This approach leverages binary search in conjunction with the Rabin-Karp (rolling hash) algorithm to find the longest duplicate substring within a given string.
We binary search on the length of the candidate substring, from 1 to s.length - 1. For each length mid produced by the binary search, a rolling hash computes the hash of every substring of length mid. Each window's hash is derived from the previous window's hash in O(1), so all windows of a given length are hashed in O(n), and a repeated hash value flags a likely duplicate.
Time Complexity: O(n log n) on average, where n is the length of the string. The binary search runs O(log n) iterations, and each iteration hashes all windows of the chosen length in O(n), assuming hash collisions are rare.
Space Complexity: O(n), for the set of hash values seen at the current length.
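The O(1) window update is the part that is easiest to get wrong, so here is a minimal sketch of it in isolation. The helper names (`directHash`, `rollingHashes`) and the choice of `MOD` are assumptions made for illustration, and the input is assumed to contain only lowercase letters. The sketch compares the rolled hashes with hashes recomputed from scratch.

```javascript
// Illustrative constants: BASE matches the alphabet size, MOD is a large prime.
const BASE = 26;
const MOD = 1000000007;
const code = (ch) => ch.charCodeAt(0) - 97; // assumes lowercase 'a'-'z'

// Hash one window from scratch in O(len).
function directHash(s, start, len) {
  let h = 0;
  for (let i = start; i < start + len; ++i) {
    h = (h * BASE + code(s[i])) % MOD;
  }
  return h;
}

// Hash every window of length len, spending O(1) per shift.
function rollingHashes(s, len) {
  const hashes = [];
  if (len <= 0 || len > s.length) return hashes;
  let power = 1; // BASE^len % MOD, the weight of the outgoing character after the shift
  for (let i = 0; i < len; ++i) power = (power * BASE) % MOD;
  let h = directHash(s, 0, len);
  hashes.push(h);
  for (let i = len; i < s.length; ++i) {
    // Shift left by one digit, remove the outgoing character, add the incoming one.
    h = (h * BASE - ((code(s[i - len]) * power) % MOD) + MOD + code(s[i])) % MOD;
    hashes.push(h);
  }
  return hashes;
}

const rolled = rollingHashes("banana", 3);
const direct = [0, 1, 2, 3].map((i) => directHash("banana", i, 3));
console.log(rolled); // [ 689, 338, 8801, 338 ]
console.log(direct); // same values, computed from scratch
```

The two occurrences of "ana" (windows 1 and 3) produce the same value, which is what the duplicate check relies on. Note that the outgoing character is weighted by BASE^len rather than BASE^(len - 1), because the subtraction happens after the hash has already been multiplied by BASE.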
function longestDupSubstring(s) {
  const MOD = 1000000007; // prime; hash * BASE stays far below 2^53
  const BASE = 26;
  const A = "a".charCodeAt(0); // assumes s contains only 'a'-'z'

  // Returns a duplicated substring of length len, or "" if there is none.
  function search(len) {
    let hash = 0,
      power = 1; // becomes BASE^len % MOD
    const seen = new Set();

    // Hash the first window and compute the weight of the outgoing character.
    for (let i = 0; i < len; ++i) {
      hash = (hash * BASE + (s.charCodeAt(i) - A)) % MOD;
      power = (power * BASE) % MOD;
    }
    seen.add(hash);
    for (let i = len; i < s.length; ++i) {
      // Slide the window: shift, drop s[i - len], append s[i].
      const out = ((s.charCodeAt(i - len) - A) * power) % MOD;
      hash = (hash * BASE - out + MOD + (s.charCodeAt(i) - A)) % MOD;
      if (seen.has(hash)) {
        const start = i - len + 1;
        const candidate = s.substring(start, i + 1);
        // Confirm against the text so a hash collision cannot yield a false match.
        if (s.indexOf(candidate) < start) return candidate;
      }
      seen.add(hash);
    }
    return "";
  }

  let left = 1,
    right = s.length - 1,
    result = "";

  while (left <= right) {
    const mid = left + ((right - left) >> 1);
    const dup = search(mid);
    if (dup) {
      left = mid + 1;
      result = dup;
    } else {
      right = mid - 1;
    }
  }
  return result;
}

console.log(longestDupSubstring("banana")); // "ana"

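This JavaScript implementation combines a binary search over the substring length with a rolling hash. A set of previously seen hash values makes each duplicate check an expected O(1) lookup. Because distinct substrings can share a hash value, a hit is only reliable once the candidate has also been compared against the text.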
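Binary search over the length is valid because the question "does some substring of length k occur twice?" is monotonic: if a substring of length k repeats, its prefix of length k - 1 repeats as well. The sketch below makes that visible with a brute-force predicate that stores the substrings themselves instead of hashes. The names `hasDuplicateOfLength` and `longestDuplicateLength` are hypothetical, and the approach is only meant as a slow reference for cross-checking on small inputs.

```javascript
// Hypothetical reference helper: does any substring of length len occur at least twice?
// Stores whole substrings, so it costs O(n * len) time and space per call.
function hasDuplicateOfLength(s, len) {
  const seen = new Set();
  for (let i = 0; i + len <= s.length; ++i) {
    const sub = s.substring(i, i + len);
    if (seen.has(sub)) return true;
    seen.add(sub);
  }
  return false;
}

// Binary search for the largest length at which the predicate still holds.
function longestDuplicateLength(s) {
  let left = 1,
    right = s.length - 1,
    best = 0;
  while (left <= right) {
    const mid = left + ((right - left) >> 1);
    if (hasDuplicateOfLength(s, mid)) {
      best = mid; // a duplicate exists, try longer
      left = mid + 1;
    } else {
      right = mid - 1; // no duplicate, try shorter
    }
  }
  return best;
}

const flags = [1, 2, 3, 4, 5].map((len) => hasDuplicateOfLength("banana", len));
console.log(flags); // [ true, true, true, false, false ]
console.log(longestDuplicateLength("banana")); // 3
```

The predicate switches from true to false exactly once as the length grows, which is the property the binary search depends on.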
This method sorts the suffixes of the input string and uses the common prefixes of neighbouring suffixes to find the longest duplicate substring.
Sorting the suffixes lexicographically places suffixes that share a prefix next to each other. The longest duplicate substring is therefore the longest common prefix (LCP) of some pair of adjacent suffixes in sorted order, so one pass over neighbouring pairs is enough and no binary search over the length is needed.
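Time Complexity: O(n^2 log n) in the worst case, where n is the length of the input string. A comparison sort typically performs O(n log n) comparisons, and each suffix comparison may scan O(n) characters.
Space Complexity: O(n) for the array of suffix pointers, since the suffixes themselves are not copied, plus the buffer that holds the result.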
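To make the adjacency argument concrete, here is a small JavaScript sketch of the same idea that stores start indices instead of pointers. The function names are hypothetical, and the sort compares freshly built suffix strings for clarity rather than speed, so it should be read as an illustration under those assumptions and not as a tuned suffix-array construction.

```javascript
// Hypothetical helper: length of the longest common prefix of two strings.
function commonPrefixLength(a, b) {
  let k = 0;
  while (k < a.length && k < b.length && a[k] === b[k]) ++k;
  return k;
}

function longestDupBySuffixSort(s) {
  // Sort suffix start indices by the suffix text they refer to.
  const order = Array.from({ length: s.length }, (_, i) => i);
  order.sort((i, j) => {
    const a = s.substring(i);
    const b = s.substring(j);
    return a < b ? -1 : a > b ? 1 : 0;
  });

  // The answer is the longest common prefix of some adjacent pair.
  let best = "";
  for (let r = 1; r < order.length; ++r) {
    const prev = s.substring(order[r - 1]);
    const cur = s.substring(order[r]);
    const k = commonPrefixLength(prev, cur);
    if (k > best.length) best = cur.substring(0, k);
  }
  return { order, best };
}

const { order, best } = longestDupBySuffixSort("banana");
console.log(order); // [ 5, 3, 1, 0, 4, 2 ]
console.log(best); // "ana"
```

For "banana" the sorted suffixes are "a", "ana", "anana", "banana", "na", "nana". The neighbouring pair "ana" and "anana" shares the three-character prefix "ana", which is the longest among all adjacent pairs.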
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Order two suffix pointers lexicographically. */
static int compare(const void *a, const void *b) {
    return strcmp(*(const char *const *)a, *(const char *const *)b);
}

/* Returns a newly allocated string that the caller must free,
   or NULL if an allocation fails. */
char *longestDupSubstring(const char *s) {
    size_t n = strlen(s);
    const char **suffixes = malloc((n ? n : 1) * sizeof *suffixes);
    if (!suffixes) return NULL;
    for (size_t i = 0; i < n; ++i) {
        suffixes[i] = s + i;
    }
    qsort(suffixes, n, sizeof *suffixes, compare);

    size_t max_len = 0;
    const char *best = s;
    for (size_t i = 1; i < n; ++i) {
        size_t len = 0;
        /* Two distinct suffixes differ in length, so the shorter one's
           '\0' always ends this loop. */
        while (suffixes[i][len] == suffixes[i - 1][len]) len++;
        if (len > max_len) {
            max_len = len;
            best = suffixes[i];
        }
    }

    /* Copy the winner once instead of allocating on every improvement. */
    char *result = malloc(max_len + 1);
    if (result) {
        memcpy(result, best, max_len);
        result[max_len] = '\0';
    }
    free(suffixes);
    return result;
}

int main(void) {
    char *dup = longestDupSubstring("banana");
    if (dup) {
        printf("%s\n", dup);
        free(dup);
    }
    return 0;
}

In this C implementation, pointers to the suffixes of the input string are sorted with qsort. The common prefix of each pair of consecutive suffixes is then measured, and the longest one is returned as the result. The approach is simple to implement, but because each suffix comparison can take O(n) time, it becomes slow on long inputs.