Given a string s, consider all duplicated substrings: (contiguous) substrings of s that occur 2 or more times. The occurrences may overlap.
Return any duplicated substring that has the longest possible length. If s does not have a duplicated substring, the answer is "".
Example 1:
Input: s = "banana" Output: "ana"
Example 2:
Input: s = "abcd" Output: ""
Constraints:
2 <= s.length <= 3 * 104s consists of lowercase English letters.Problem Overview: Given a string s, return the longest substring that appears at least twice. The duplicates can overlap, but the returned substring must be the longest possible repeated sequence. If none exist, return an empty string.
Approach 1: Binary Search with Rolling Hash (Rabin–Karp) (Time: O(n log n), Space: O(n))
This approach combines binary search with a rolling hash to efficiently test substring lengths. Instead of checking every substring pair, you binary search the length L of the duplicate substring. For a given length, compute rolling hashes for all substrings of length L using a sliding window and store them in a hash set. If two substrings produce the same hash, a duplicate of length L exists.
The key insight: if a duplicate substring of length L exists, then duplicates also exist for all lengths smaller than L. That monotonic property makes binary search valid. Each check runs in O(n) using rolling hash updates while sliding across the string. Combined with log n binary search steps, the total complexity becomes O(n log n). This is the most common solution used in competitive programming and interviews.
Approach 2: Suffix Array with Binary Search (Time: O(n log n), Space: O(n))
A more classical string-processing solution uses a suffix array. First generate all suffixes of the string and sort them lexicographically. When suffixes are sorted, duplicate substrings appear as common prefixes between adjacent suffixes. You then compute the longest common prefix (LCP) between neighboring suffixes and track the maximum length.
To accelerate substring length checks, binary search can also be applied on the possible duplicate length while comparing prefixes in the suffix array. Building the suffix array typically takes O(n log n) time with standard algorithms, and scanning for the maximum LCP takes O(n). The memory usage is linear because only arrays for suffix indices and LCP values are stored.
This method avoids hash collisions entirely and relies on deterministic lexicographic ordering. It’s widely used in advanced string processing systems such as text indexing and genome analysis. The downside is implementation complexity compared to rolling hash.
Recommended for interviews: The binary search + rolling hash solution is usually expected. It shows strong understanding of string algorithms, sliding window hashing, and search optimization. A brute-force comparison of all substrings demonstrates the problem structure but runs in O(n^2) or worse. The rolling hash optimization reduces comparisons dramatically and is the practical approach most interviewers want to see.
This approach leverages binary search in conjunction with the Rabin-Karp (rolling hash) algorithm to find the longest duplicate substring within a given string.
We perform binary search on the length of the possible substring, starting from 1 to length of s-1. For each mid-length obtained from the binary search, we use a rolling hash function to hash each substring of length mid. This hash is used to quickly identify duplicates due to its constant time complexity for fixed-length substrings.
This C code uses binary search and rolling hashing to find the longest duplicated substring. It iteratively searches for the length and identifies duplicates using a hash table with modular arithmetic to minimize collision.
Time Complexity: O(n log n), where n is the length of the string. The binary search takes O(log n), and for each midpoint, hashing takes O(n).
Space Complexity: O(n), primarily for storing hash values and powers of base.
This method involves constructing a suffix array from the input string and then performing binary search on the suffixes to find the longest duplicate substring.
Using suffix arrays, we can efficiently sort and group starting indices of the given string. Then, by employing binary search, we determine the largest-length substring that repeats. The Longest Common Prefix (LCP) array helps in assessing the similarity of suffixes at each binary search step.
In this C implementation, suffixes of the input string are sorted using quicksort. The maximum-length common prefix between any two consecutive suffixes is determined, and the largest one is recorded as the result. This is an efficient way to identify the longest duplicate substring.
Time Complexity: O(n^2 log n), primarily due to the sorting step where n is the length of the input string.
Space Complexity: O(n^2), largely for storing pointers to suffixes.
| Approach | Complexity |
|---|---|
| Binary Search with Rolling Hashing | Time Complexity: O(n log n), where n is the length of the string. The binary search takes O(log n), and for each midpoint, hashing takes O(n). Space Complexity: O(n), primarily for storing hash values and powers of base. |
| Suffix Array with Binary Search | Time Complexity: O(n^2 log n), primarily due to the sorting step where n is the length of the input string. Space Complexity: O(n^2), largely for storing pointers to suffixes. |
| Default Approach | — |
| Approach | Time | Space | When to Use |
|---|---|---|---|
| Brute Force Substring Comparison | O(n^3) | O(1) | Conceptual understanding or very small strings |
| Binary Search + Rolling Hash | O(n log n) | O(n) | Best practical solution for interviews and competitive programming |
| Suffix Array + LCP | O(n log n) | O(n) | When using advanced string-processing structures or deterministic comparisons |
Longest Duplicate Substring | TRIE | Rolling Hash | Binary Search | Leetcode #1044 • Techdose • 43,810 views views
Watch 9 more video solutions →Practice Longest Duplicate Substring with our built-in code editor and test cases.
Practice on FleetCodePractice this problem
Open in Editor