Given a string s, consider all duplicated substrings: (contiguous) substrings of s that occur 2 or more times. The occurrences may overlap.
Return any duplicated substring that has the longest possible length. If s does not have a duplicated substring, the answer is "".
Example 1:
Input: s = "banana" Output: "ana"
Example 2:
Input: s = "abcd" Output: ""
Constraints:
2 <= s.length <= 3 * 104s consists of lowercase English letters.The key challenge in #1044 Longest Duplicate Substring is efficiently finding the longest substring that appears at least twice in a string. A brute-force comparison of all substrings would be too slow, so optimized string-processing techniques are required.
A common approach combines Binary Search with a Rolling Hash (Rabin–Karp). The idea is to binary search the length of the substring and check if any duplicate substring of that length exists. Using a rolling hash allows substring hashes to be updated in constant time as a sliding window moves through the string, enabling fast duplicate detection with a hash set.
Another advanced technique uses a Suffix Array along with the LCP (Longest Common Prefix) array. By sorting all suffixes and computing adjacent prefix matches, we can directly determine the longest repeated substring.
The binary search with rolling hash approach typically achieves O(n log n) time, while suffix array solutions can also reach near-linear performance depending on implementation.
| Approach | Time Complexity | Space Complexity |
|---|---|---|
| Binary Search + Rolling Hash | O(n log n) | O(n) |
| Suffix Array + LCP | O(n log n) | O(n) |
NeetCode
Use these hints if you're stuck. Try solving on your own first.
Binary search for the length of the answer. (If there's an answer of length 10, then there are answers of length 9, 8, 7, ...)
To check whether an answer of length K exists, we can use Rabin-Karp 's algorithm.
This approach leverages binary search in conjunction with the Rabin-Karp (rolling hash) algorithm to find the longest duplicate substring within a given string.
We perform binary search on the length of the possible substring, starting from 1 to length of s-1. For each mid-length obtained from the binary search, we use a rolling hash function to hash each substring of length mid. This hash is used to quickly identify duplicates due to its constant time complexity for fixed-length substrings.
Time Complexity: O(n log n), where n is the length of the string. The binary search takes O(log n), and for each midpoint, hashing takes O(n).
Space Complexity: O(n), primarily for storing hash values and powers of base.
1#include <iostream>
2#include <unordered_set>
3#include <string>
4using namespace std;
5
6class Solution {
7 const int MOD = 10000007;
8 const int BASE = 26;
9public:
10 string longestDupSubstring(string s) {
11 int n = s.size();
12 int left = 1, right = n - 1;
string result = "";
while (left <= right) {
int mid = left + (right - left) / 2;
string dup = findDup(s, mid);
if (!dup.empty()) {
left = mid + 1;
result = dup;
} else {
right = mid - 1;
}
}
return result;
}
string findDup(string &s, int len) {
unordered_set<int> seen;
long long hash = 0, power = 1;
for (int i = 0; i < len; ++i) {
hash = (hash * BASE + (s[i] - 'a')) % MOD;
power = (power * BASE) % MOD;
}
seen.insert(hash);
for (int i = len; i < s.size(); ++i) {
hash = ((hash * BASE - (s[i - len] - 'a') * power % MOD + MOD) % MOD + (s[i] - 'a')) % MOD;
if (seen.count(hash)) {
return s.substr(i - len + 1, len);
}
seen.insert(hash);
}
return "";
}
};
int main() {
Solution sol;
string s = "banana";
cout << sol.longestDupSubstring(s) << endl;
return 0;
}
This C++ solution follows similar logic to the C implementation but uses C++ containers like unordered_set for efficiency in hash management. The binary search is used to determine the maximum length of duplicate substring, leveraging findDup to extract the actual string.
This method involves constructing a suffix array from the input string and then performing binary search on the suffixes to find the longest duplicate substring.
Using suffix arrays, we can efficiently sort and group starting indices of the given string. Then, by employing binary search, we determine the largest-length substring that repeats. The Longest Common Prefix (LCP) array helps in assessing the similarity of suffixes at each binary search step.
Time Complexity: O(n^2 log n), primarily due to the sorting step where n is the length of the input string.
Space Complexity: O(n^2), largely for storing pointers to suffixes.
Watch expert explanations and walkthroughs
Jot down your thoughts, approach, and key learnings
Yes, this problem is considered an advanced string algorithm question and has appeared in FAANG-style interviews. It tests knowledge of binary search, hashing techniques, and advanced string structures like suffix arrays.
A widely used approach is Binary Search combined with a Rolling Hash (Rabin–Karp). Binary search is used to guess the substring length, while the rolling hash efficiently checks if duplicate substrings of that length exist. This reduces the complexity significantly compared to brute force.
Rolling hash allows quick recalculation of substring hashes when a sliding window moves by one character. Instead of recomputing the hash from scratch, the algorithm updates it in constant time, enabling efficient duplicate detection across the string.
Hash sets are commonly used alongside rolling hashes to detect duplicate substring hashes quickly. In more advanced solutions, suffix arrays and LCP arrays are used to systematically compare suffixes and identify the longest repeated substring.
In this C implementation, suffixes of the input string are sorted using quicksort. The maximum-length common prefix between any two consecutive suffixes is determined, and the largest one is recorded as the result. This is an efficient way to identify the longest duplicate substring.