Given a string s, return the number of distinct substrings of s.
A substring of a string is obtained by deleting any number of characters (possibly zero) from the front of the string and any number (possibly zero) from the back of the string.
Example 1:
Input: s = "aabbaba"
Output: 21
Explanation: The set of distinct strings is ["a","b","aa","bb","ab","ba","aab","abb","bab","bba","aba","aabb","abba","bbab","baba","aabba","abbab","bbaba","aabbab","abbaba","aabbaba"]
Example 2:
Input: s = "abcdefg"
Output: 28
Constraints:
1 <= s.length <= 500
s consists of lowercase English letters.
Follow up: Can you solve this problem in O(n) time complexity?
Problem Overview: Given a string s, count how many different substrings appear in it. Substrings with the same characters in the same order count only once, even if they appear multiple times at different positions.
Approach 1: Brute Force Enumeration (O(n^3) time, O(n^2) space)
The most direct strategy is to generate every possible substring and track which ones are unique. Use two nested loops to choose the start and end index of each substring, then extract s[i:j] and insert it into a hash set. Because substring creation itself can take O(n) time, the total complexity becomes O(n^3) in languages where slicing copies characters. The set ensures duplicates are removed automatically. This approach is easy to implement and useful for understanding the problem, but it becomes slow for large strings because the number of substrings grows to n(n+1)/2. It relies mainly on basic string operations and hash sets.
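The enumeration described above can be sketched in a few lines of Python. The function name `count_distinct_brute` is chosen here for illustration and does not come from the original text.

```python
def count_distinct_brute(s: str) -> int:
    """Count distinct substrings by storing every substring in a set."""
    seen = set()
    n = len(s)
    for i in range(n):                 # start index (inclusive)
        for j in range(i + 1, n + 1):  # end index (exclusive)
            seen.add(s[i:j])           # slicing copies up to n characters
    return len(seen)


print(count_distinct_brute("aabbaba"))  # 21, matching Example 1
```

With n at most 500 there are at most 125,250 substrings, so this version is still practical for the stated constraints.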
Approach 2: Rolling Hash (String Hashing) (O(n^2) time, O(n^2) space)
A faster approach avoids repeatedly copying substrings by hashing them. Use a polynomial rolling hash so each substring hash can be computed incrementally while expanding the right boundary. For each starting index i, extend the substring character by character and update the hash value in constant time using hash = hash * base + value, reduced modulo a large number so the value stays bounded. Store each computed hash in a set. Because hash updates are O(1), generating all substring hashes takes O(n^2) time instead of O(n^3). The set of hashes stands in for the distinct substrings without storing the full text, at the cost of a small chance that two different substrings collide on the same hash. This technique is common in substring comparison problems.
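A minimal sketch of this incremental hashing follows. The base 131, the modulus 2^61 - 1, and the choice to store the substring length next to the hash are assumptions made for this example, not values prescribed by the text above.

```python
def count_distinct_rolling(s: str, base: int = 131, mod: int = (1 << 61) - 1) -> int:
    """Count distinct substrings by hashing each one incrementally."""
    seen = set()
    n = len(s)
    for i in range(n):
        h = 0
        for j in range(i, n):
            # Map 'a'..'z' to 1..26 and extend the hash by one character.
            value = ord(s[j]) - ord("a") + 1
            h = (h * base + value) % mod
            # Pairing the hash with the length keeps substrings of
            # different lengths from being confused with each other.
            seen.add((j - i + 1, h))
    return len(seen)


print(count_distinct_rolling("aabbaba"))  # 21, matching Example 1
```

The result is probabilistic in general: two different substrings of the same length could in principle share a hash, although that is unlikely with a large modulus.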
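More advanced structures such as a trie, suffix automaton, or suffix array can also solve this problem by representing all suffixes of the string and counting unique paths. Inserting every suffix into a trie takes O(n^2) time and space in the worst case but needs no hashing, so the count is exact. A suffix array combined with an LCP array typically runs in O(n log n), and a suffix automaton reaches O(n) for a fixed alphabet, which answers the follow-up. These methods require significantly more implementation effort than hashing.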
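As one illustration of the linear-time route, here is a sketch of a suffix automaton. It relies on the standard property that each non-initial state v contributes len(v) - len(link(v)) distinct substrings. The function name and the list-of-dicts layout are choices made for this example.

```python
def count_distinct_sam(s: str) -> int:
    """Count distinct substrings with a suffix automaton."""
    nxt = [{}]     # nxt[v]: outgoing transitions of state v
    link = [-1]    # link[v]: suffix link of state v
    length = [0]   # length[v]: longest substring represented by state v
    last = 0       # state for the whole prefix processed so far
    for ch in s:
        cur = len(nxt)
        nxt.append({})
        length.append(length[last] + 1)
        link.append(0)
        p = last
        while p != -1 and ch not in nxt[p]:
            nxt[p][ch] = cur
            p = link[p]
        if p != -1:
            q = nxt[p][ch]
            if length[p] + 1 == length[q]:
                link[cur] = q
            else:
                # Split q: the clone keeps only the shorter substrings.
                clone = len(nxt)
                nxt.append(dict(nxt[q]))
                length.append(length[p] + 1)
                link.append(link[q])
                while p != -1 and nxt[p].get(ch) == q:
                    nxt[p][ch] = clone
                    p = link[p]
                link[q] = clone
                link[cur] = clone
        last = cur
    # State 0 is the initial state and represents only the empty string.
    return sum(length[v] - length[link[v]] for v in range(1, len(nxt)))


print(count_distinct_sam("aabbaba"))  # 21, matching Example 1
```

Unlike the hashing approaches, this count is exact, since no hash collisions are involved.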
Recommended for interviews: Start by explaining the brute force enumeration to show you understand how substrings are generated and how duplicates arise. Then move to the rolling hash optimization. Interviewers typically expect the O(n^2) hashing solution because it demonstrates knowledge of substring hashing and efficient duplicate detection while keeping the implementation manageable during a coding interview.
Brute force enumeration: enumerate all substrings and insert each one into a hash set. The answer is the number of elements in the set.
The time complexity is O(n^3), and the space complexity is O(n^2). Here, n is the length of the string.
String hashing: string hashing maps a string of any length to a non-negative integer in such a way that two different strings rarely receive the same value. Comparing hash values then lets us decide quickly whether two strings are, with high probability, equal.
We take a fixed value BASE, treat the string as a number in BASE radix, and assign a value greater than 0 to represent each character. Generally, the values we assign are much smaller than BASE. For example, for a string composed of lowercase letters, we can set a=1, b=2, ..., z=26. We take a fixed value MOD, calculate the remainder of the BASE radix number to MOD, and use it as the hash value of the string.
Generally, we take BASE=131 or BASE=13331, for which collisions are very unlikely on ordinary inputs. As long as the hash values of two strings are the same, we consider the two strings to be equal. Usually, MOD is taken as 2^64. In C++, we can directly use the unsigned long long type to store this hash value and skip explicit overflow handling: unsigned arithmetic wraps around, which is equivalent to taking the result modulo 2^64 and avoids slower explicit modulus operations.
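To make the arithmetic concrete, the sketch below computes this hash in Python. Python integers do not overflow, so the code masks to 64 bits to imitate the wraparound of an unsigned 64-bit type. The name `poly_hash` is hypothetical.

```python
BASE = 131
MASK = (1 << 64) - 1  # keeping the low 64 bits is the same as taking mod 2^64


def poly_hash(s: str) -> int:
    """Treat s as a number written in radix BASE, reduced modulo 2^64."""
    h = 0
    for ch in s:
        value = ord(ch) - ord("a") + 1  # a=1, b=2, ..., z=26
        h = (h * BASE + value) & MASK
    return h


# "abc" -> 1 * 131^2 + 2 * 131 + 3 = 17161 + 262 + 3
print(poly_hash("abc"))  # 17426
```

For short strings such as "abc" the value never reaches 2^64, so the mask has no effect. It only matters once the radix-BASE number grows past 64 bits.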
Except on data constructed specifically to defeat it, the above hash algorithm is unlikely to produce collisions, and it is generally accepted in reference solutions. We can also choose several appropriate BASE and MOD pairs (such as large primes for MOD), compute one hash per pair, and consider two strings equal only when all of the results agree, which makes it even harder to construct data that causes this hash to produce errors.
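The multi-hash idea can be sketched with prefix hashes, which give the hash of any substring in O(1) after O(n) preprocessing. The two (BASE, MOD) pairs below are assumptions chosen for illustration. Any pairs with large prime moduli would serve the same purpose.

```python
def count_distinct_double_hash(
    s: str,
    params=((131, (1 << 61) - 1), (13331, 1_000_000_007)),
) -> int:
    """Count distinct substrings using one prefix-hash table per (base, mod)."""
    n = len(s)
    tables = []
    for base, mod in params:
        prefix = [0] * (n + 1)  # prefix[k] = hash of s[:k]
        power = [1] * (n + 1)   # power[k] = base^k mod mod
        for i, ch in enumerate(s):
            value = ord(ch) - ord("a") + 1
            prefix[i + 1] = (prefix[i] * base + value) % mod
            power[i + 1] = power[i] * base % mod
        tables.append((prefix, power, mod))

    seen = set()
    for i in range(n):
        for j in range(i + 1, n + 1):
            # Hash of s[i:j] under each parameter pair; two substrings are
            # treated as equal only if the length and every hash agree.
            key = tuple(
                (prefix[j] - prefix[i] * power[j - i]) % mod
                for prefix, power, mod in tables
            )
            seen.add((j - i, key))
    return len(seen)


print(count_distinct_double_hash("aabbaba"))  # 21, matching Example 1
```

Using two independent hashes lowers the collision probability further, at roughly double the cost per substring.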
The time complexity is O(n^2), and the space complexity is O(n^2). Here, n is the length of the string.
| Approach | Time | Space | When to Use |
|---|---|---|---|
| Brute Force Enumeration with Set | O(n^3) | O(n^2) | Good for understanding substring generation or when input size is very small |
| Rolling Hash (String Hashing) | O(n^2) | O(n^2) | General solution for interviews and competitive programming where substring comparison must be efficient |