Longest Duplicate Substring - Solution & Explanation

Q: Is Longest Duplicate Substring easy or hard?

Longest Duplicate Substring is classified as a hard problem because it requires combining multiple algorithmic concepts such as binary search, rolling hash, or suffix arrays. Efficient solutions must avoid naive substring comparisons, which quickly become too slow for large inputs.

Q: How to solve Longest Duplicate Substring in O(n)?

True O(n) solutions exist using advanced suffix array or suffix tree algorithms with linear-time construction and longest common prefix processing. These techniques are complex and rarely expected in interviews. Most practical implementations use O(n log n) time with binary search and rolling hash.

Q: What is the best approach for Longest Duplicate Substring?

Binary search combined with rolling hash (Rabin–Karp) is the most practical approach. It searches for the maximum substring length while verifying duplicates using hash values in O(n) time per check. The overall complexity becomes O(n log n) with O(n) extra space. This solution is widely used in coding interviews and competitive programming.

Q: Is Longest Duplicate Substring asked at Google/Amazon/Meta?

Longest Duplicate Substring is considered a hard string problem and appears in interviews at companies that test advanced algorithmic knowledge. Variants involving rolling hash, suffix arrays, or longest repeated substring logic have been reported in interviews at companies like Google and Amazon.

Q: What data structure is used in Longest Duplicate Substring?

Common structures include hash sets for storing rolling hash values, arrays for prefix hashes and powers, and suffix arrays for lexicographic ordering of suffixes. These structures enable efficient substring comparison without directly comparing characters repeatedly.

Q: What is the time complexity of Longest Duplicate Substring?

The optimal solutions run in O(n log n) time. Binary search over substring length requires log n iterations, and each iteration checks duplicates using rolling hash in O(n). Suffix array based approaches also run in roughly O(n log n) time depending on the construction algorithm.

Q: Longest Duplicate Substring Python or Java solution approach?

Both Python and Java solutions typically implement binary search on substring length with a rolling hash (Rabin–Karp). A sliding window computes hash values for substrings of length L, and duplicates are detected using a hash set. This approach runs in O(n log n) time and O(n) space.

HardString Binary Search Sliding Window Rolling Hash17 min readAsked at: Amazon, Goldman Sachs, Meta +4

Practice this problem

Problem Statement

Given a string s, consider all duplicated substrings: (contiguous) substrings of s that occur 2 or more times. The occurrences may overlap.

Return any duplicated substring that has the longest possible length. If s does not have a duplicated substring, the answer is "".

Example 1:

Input: s = "banana"
Output: "ana"

Example 2:

Input: s = "abcd"
Output: ""

Constraints:

2 <= s.length <= 3 * 10⁴
s consists of lowercase English letters.

Approach Overview

Problem Overview: Given a string s, return the longest substring that appears at least twice. The duplicates can overlap, but the returned substring must be the longest possible repeated sequence. If none exist, return an empty string.

Approach 1: Binary Search with Rolling Hash (Rabin–Karp) (Time: O(n log n), Space: O(n))

This approach combines binary search with a rolling hash to efficiently test substring lengths. Instead of checking every substring pair, you binary search the length L of the duplicate substring. For a given length, compute rolling hashes for all substrings of length L using a sliding window and store them in a hash set. If two substrings produce the same hash, a duplicate of length L exists.

The key insight: if a duplicate substring of length L exists, then duplicates also exist for all lengths smaller than L. That monotonic property makes binary search valid. Each check runs in O(n) using rolling hash updates while sliding across the string. Combined with log n binary search steps, the total complexity becomes O(n log n). This is the most common solution used in competitive programming and interviews.

Approach 2: Suffix Array with Binary Search (Time: O(n log n), Space: O(n))

A more classical string-processing solution uses a suffix array. First generate all suffixes of the string and sort them lexicographically. When suffixes are sorted, duplicate substrings appear as common prefixes between adjacent suffixes. You then compute the longest common prefix (LCP) between neighboring suffixes and track the maximum length.

To accelerate substring length checks, binary search can also be applied on the possible duplicate length while comparing prefixes in the suffix array. Building the suffix array typically takes O(n log n) time with standard algorithms, and scanning for the maximum LCP takes O(n). The memory usage is linear because only arrays for suffix indices and LCP values are stored.

This method avoids hash collisions entirely and relies on deterministic lexicographic ordering. It’s widely used in advanced string processing systems such as text indexing and genome analysis. The downside is implementation complexity compared to rolling hash.

Recommended for interviews: The binary search + rolling hash solution is usually expected. It shows strong understanding of string algorithms, sliding window hashing, and search optimization. A brute-force comparison of all substrings demonstrates the problem structure but runs in O(n^2) or worse. The rolling hash optimization reduces comparisons dramatically and is the practical approach most interviewers want to see.

Approach 1: Binary Search with Rolling Hashing

This approach leverages binary search in conjunction with the Rabin-Karp (rolling hash) algorithm to find the longest duplicate substring within a given string.

We perform binary search on the length of the possible substring, starting from 1 to length of s-1. For each mid-length obtained from the binary search, we use a rolling hash function to hash each substring of length mid. This hash is used to quickly identify duplicates due to its constant time complexity for fixed-length substrings.

This C code uses binary search and rolling hashing to find the longest duplicated substring. It iteratively searches for the length and identifies duplicates using a hash table with modular arithmetic to minimize collision.

Code

C C++Java Python C#JavaScript

C++

Java

Python

JavaScript

Complexity

Time Complexity: O(n log n), where n is the length of the string. The binary search takes O(log n), and for each midpoint, hashing takes O(n).

Space Complexity: O(n), primarily for storing hash values and powers of base.

Try this approach in the editor →

Approach 2: Suffix Array with Binary Search

This method involves constructing a suffix array from the input string and then performing binary search on the suffixes to find the longest duplicate substring.

Using suffix arrays, we can efficiently sort and group starting indices of the given string. Then, by employing binary search, we determine the largest-length substring that repeats. The Longest Common Prefix (LCP) array helps in assessing the similarity of suffixes at each binary search step.

In this C implementation, suffixes of the input string are sorted using quicksort. The maximum-length common prefix between any two consecutive suffixes is determined, and the largest one is recorded as the result. This is an efficient way to identify the longest duplicate substring.

Code

C C++Java Python C#JavaScript

C++

Java

Python

JavaScript

Complexity

Time Complexity: O(n^2 log n), primarily due to the sorting step where n is the length of the input string.

Space Complexity: O(n^2), largely for storing pointers to suffixes.

Try this approach in the editor →

Approach 3: Default Approach

Code

Python Java C++Go

Python

Java

C++

Try this approach in the editor →

Complexity Comparison

Approach	Complexity
Binary Search with Rolling Hashing	Time Complexity: O(n log n), where n is the length of the string. The binary search takes O(log n), and for each midpoint, hashing takes O(n). Space Complexity: O(n), primarily for storing hash values and powers of base.
Suffix Array with Binary Search	Time Complexity: O(n^2 log n), primarily due to the sorting step where n is the length of the input string. Space Complexity: O(n^2), largely for storing pointers to suffixes.
Default Approach	—

Detailed Complexity Analysis

Approach	Time	Space	When to Use
Brute Force Substring Comparison	O(n^3)	O(1)	Conceptual understanding or very small strings
Binary Search + Rolling Hash	O(n log n)	O(n)	Best practical solution for interviews and competitive programming
Suffix Array + LCP	O(n log n)	O(n)	When using advanced string-processing structures or deterministic comparisons

Video Solution

Longest Duplicate Substring | TRIE | Rolling Hash | Binary Search | Leetcode #1044 • Techdose • 43,810 views views

Watch 9 more video solutions →

Frequently Asked Questions

Is Longest Duplicate Substring easy or hard?

Longest Duplicate Substring is classified as a hard problem because it requires combining multiple algorithmic concepts such as binary search, rolling hash, or suffix arrays. Efficient solutions must avoid naive substring comparisons, which quickly become too slow for large inputs.

How to solve Longest Duplicate Substring in O(n)?

True O(n) solutions exist using advanced suffix array or suffix tree algorithms with linear-time construction and longest common prefix processing. These techniques are complex and rarely expected in interviews. Most practical implementations use O(n log n) time with binary search and rolling hash.

What is the best approach for Longest Duplicate Substring?

Binary search combined with rolling hash (Rabin–Karp) is the most practical approach. It searches for the maximum substring length while verifying duplicates using hash values in O(n) time per check. The overall complexity becomes O(n log n) with O(n) extra space. This solution is widely used in coding interviews and competitive programming.

Is Longest Duplicate Substring asked at Google/Amazon/Meta?

Longest Duplicate Substring is considered a hard string problem and appears in interviews at companies that test advanced algorithmic knowledge. Variants involving rolling hash, suffix arrays, or longest repeated substring logic have been reported in interviews at companies like Google and Amazon.

What data structure is used in Longest Duplicate Substring?

Common structures include hash sets for storing rolling hash values, arrays for prefix hashes and powers, and suffix arrays for lexicographic ordering of suffixes. These structures enable efficient substring comparison without directly comparing characters repeatedly.

What is the time complexity of Longest Duplicate Substring?

The optimal solutions run in O(n log n) time. Binary search over substring length requires log n iterations, and each iteration checks duplicates using rolling hash in O(n). Suffix array based approaches also run in roughly O(n log n) time depending on the construction algorithm.

Longest Duplicate Substring Python or Java solution approach?

Both Python and Java solutions typically implement binary search on substring length with a rolling hash (Rabin–Karp). A sliding window computes hash values for substrings of length L, and duplicates are detected using a hash set. This approach runs in O(n log n) time and O(n) space.

Ready to solve this problem?

Practice Longest Duplicate Substring with our built-in code editor and test cases.

Practice on FleetCode

Longest Substring Without Repeating Characters

Longest Palindromic Substring

Problem Info

DifficultyHard

Acceptance31.1%

Approaches3

Reading time17 min

Asked at

Amazon Goldman Sachs Meta Coupang Google

Practice this problem

Open in Editor

Longest Duplicate Substring - Solution & Explanation

Problem Statement

Approach Overview

Approach 1: Binary Search with Rolling Hashing

Code

Complexity

Approach 2: Suffix Array with Binary Search

Code

Complexity

Approach 3: Default Approach

Code

Complexity Comparison

Detailed Complexity Analysis

Video Solution

Frequently Asked Questions

Ready to solve this problem?

Problem Info

Table of Contents

Longest Duplicate Substring - Solution & Explanation

Problem Statement

Approach Overview

Approach 1: Binary Search with Rolling Hashing

Code

Complexity

Approach 2: Suffix Array with Binary Search

Code

Complexity

Approach 3: Default Approach

Code

Complexity Comparison

Detailed Complexity Analysis

Video Solution

Frequently Asked Questions

Ready to solve this problem?

Problem Info

Table of Contents

Problem Statement

Approach Overview

Approach 1: Binary Search with Rolling Hashing

Code

Complexity

Approach 2: Suffix Array with Binary Search

Code

Complexity

Approach 3: Default Approach

Code

Complexity Comparison

Detailed Complexity Analysis

Video Solution

Frequently Asked Questions

Related Problems

Ready to solve this problem?

Problem Info

Table of Contents

Problem Statement

Approach Overview

Approach 1: Binary Search with Rolling Hashing

Code

Complexity

Approach 2: Suffix Array with Binary Search

Code

Complexity

Approach 3: Default Approach

Code

Complexity Comparison

Detailed Complexity Analysis

Video Solution

Frequently Asked Questions

Related Problems

Ready to solve this problem?

Problem Info

Table of Contents