Watch 10 video solutions for Longest Duplicate Substring, a hard level problem involving String, Binary Search, Sliding Window. This walkthrough by Techdose has 43,810 views views. Want to try solving it yourself? Practice on FleetCode or read the detailed text solution.
Given a string s, consider all duplicated substrings: (contiguous) substrings of s that occur 2 or more times. The occurrences may overlap.
Return any duplicated substring that has the longest possible length. If s does not have a duplicated substring, the answer is "".
Example 1:
Input: s = "banana" Output: "ana"
Example 2:
Input: s = "abcd" Output: ""
Constraints:
2 <= s.length <= 3 * 104s consists of lowercase English letters.Problem Overview: Given a string s, return the longest substring that appears at least twice. The duplicates can overlap, but the returned substring must be the longest possible repeated sequence. If none exist, return an empty string.
Approach 1: Binary Search with Rolling Hash (Rabin–Karp) (Time: O(n log n), Space: O(n))
This approach combines binary search with a rolling hash to efficiently test substring lengths. Instead of checking every substring pair, you binary search the length L of the duplicate substring. For a given length, compute rolling hashes for all substrings of length L using a sliding window and store them in a hash set. If two substrings produce the same hash, a duplicate of length L exists.
The key insight: if a duplicate substring of length L exists, then duplicates also exist for all lengths smaller than L. That monotonic property makes binary search valid. Each check runs in O(n) using rolling hash updates while sliding across the string. Combined with log n binary search steps, the total complexity becomes O(n log n). This is the most common solution used in competitive programming and interviews.
Approach 2: Suffix Array with Binary Search (Time: O(n log n), Space: O(n))
A more classical string-processing solution uses a suffix array. First generate all suffixes of the string and sort them lexicographically. When suffixes are sorted, duplicate substrings appear as common prefixes between adjacent suffixes. You then compute the longest common prefix (LCP) between neighboring suffixes and track the maximum length.
To accelerate substring length checks, binary search can also be applied on the possible duplicate length while comparing prefixes in the suffix array. Building the suffix array typically takes O(n log n) time with standard algorithms, and scanning for the maximum LCP takes O(n). The memory usage is linear because only arrays for suffix indices and LCP values are stored.
This method avoids hash collisions entirely and relies on deterministic lexicographic ordering. It’s widely used in advanced string processing systems such as text indexing and genome analysis. The downside is implementation complexity compared to rolling hash.
Recommended for interviews: The binary search + rolling hash solution is usually expected. It shows strong understanding of string algorithms, sliding window hashing, and search optimization. A brute-force comparison of all substrings demonstrates the problem structure but runs in O(n^2) or worse. The rolling hash optimization reduces comparisons dramatically and is the practical approach most interviewers want to see.
| Approach | Time | Space | When to Use |
|---|---|---|---|
| Brute Force Substring Comparison | O(n^3) | O(1) | Conceptual understanding or very small strings |
| Binary Search + Rolling Hash | O(n log n) | O(n) | Best practical solution for interviews and competitive programming |
| Suffix Array + LCP | O(n log n) | O(n) | When using advanced string-processing structures or deterministic comparisons |