You are given a 0-indexed string s and a dictionary of words dictionary. You have to break s into one or more non-overlapping substrings such that each substring is present in dictionary. There may be some extra characters in s which are not present in any of the substrings.
Return the minimum number of extra characters left over if you break up s optimally.
Example 1:
Input: s = "leetscode", dictionary = ["leet","code","leetcode"] Output: 1 Explanation: We can break s in two substrings: "leet" from index 0 to 3 and "code" from index 5 to 8. There is only 1 unused character (at index 4), so we return 1.
Example 2:
Input: s = "sayhelloworld", dictionary = ["hello","world"] Output: 3 Explanation: We can break s in two substrings: "hello" from index 3 to 7 and "world" from index 8 to 12. The characters at indices 0, 1, 2 are not used in any substring and thus are considered as extra characters. Hence, we return 3.
Constraints:
1 <= s.length <= 501 <= dictionary.length <= 501 <= dictionary[i].length <= 50dictionary[i] and s consists of only lowercase English lettersdictionary contains distinct wordsProblem Overview: You are given a string s and a dictionary of valid words. The goal is to split the string into dictionary words so that the number of leftover characters (characters not part of any word) is minimized.
Approach 1: Dynamic Programming (O(n²) time, O(n) space)
This approach treats the problem as a prefix optimization task using dynamic programming. Create a DP array where dp[i] represents the minimum number of extra characters in the substring starting at index i. Iterate from the end of the string toward the beginning. At each position, assume the current character is extra (1 + dp[i+1]), then check every substring s[i:j]. If the substring exists in the dictionary (stored in a hash table for O(1) lookups), update dp[i] with dp[j]. This effectively tries all valid word breaks while minimizing leftover characters. Time complexity is O(n²) due to checking substrings, and space complexity is O(n) for the DP array.
Approach 2: Trie and Memoization (O(n * L) average time, O(n + totalDictChars) space)
This version improves substring checking using a Trie. Insert all dictionary words into the Trie, then traverse the string while simultaneously walking the Trie. From index i, explore characters forward as long as they match a Trie path. Every time you hit a word-ending node, recursively compute the result for the remaining suffix. Memoization caches results for each index so the same suffix is never recomputed. The key advantage is that substring checks become incremental character traversals instead of repeated string slicing. The time complexity is roughly O(n * L), where L is the maximum dictionary word length, while space complexity includes the memo array plus Trie storage.
Recommended for interviews: The dynamic programming solution is typically expected because it clearly demonstrates optimal substructure and efficient state transitions. It is straightforward to implement and easy to reason about during an interview. The Trie + memoization approach shows deeper knowledge of string optimization and can reduce redundant substring checks when the dictionary contains many overlapping prefixes.
In this approach, we use dynamic programming to solve the problem. We define a DP array where dp[i] represents the minimum number of extra characters from index 0 to i of the string s. Initially, we set dp[0] to 0 because there are no extra characters in an empty prefix. For each index i while iterating through the string s, we check every j from 0 to i-1 and see if substring s[j:i] exists in the dictionary. If yes, we update dp[i] to min(dp[i], dp[j]), otherwise, we set dp[i] to dp[i-1] + 1.
This C solution uses a dynamic array to store the minimum number of extra characters needed for each prefix of the string. For each end index i, we check all starting indices to see if a substring is in the dictionary and update the DP accordingly, minimizing extra characters added.
Time Complexity: O(N^2*M), where N is the length of the string s and M is the average length of words in the dictionary.
Space Complexity: O(N) because of the DP array.
To solve the problem using a Trie and memoization, we first build a Trie from the dictionary. We then use a recursive function with memoization to attempt to decompose the string s into valid segments. For each position in the string, we check possible substrings against the Trie, saving calculated results to avoid redundant computations.
This C solution utilizes a Trie to store dictionary words for fast matching. It also uses memoization to recursively find the minimum extra characters. Each call attempts to split the string from the current index, using the Trie to validate substrings.
Time Complexity: O(N*M), with N being the string length and M the average dictionary word length due to Trie traversal.
Space Complexity: O(N + T), N is for the memo array and T is for Trie storage.
We can use a hash table ss to record all words in the dictionary, which allows us to quickly determine whether a string is in the dictionary.
Next, we define f[i] to represent the minimum number of extra characters in the first i characters of string s, initially f[0] = 0.
When i \ge 1, the ith character s[i - 1] can be an extra character, in which case f[i] = f[i - 1] + 1. If there exists an index j \in [0, i - 1] such that s[j..i) is in the hash table ss, then we can take s[j..i) as a word, in which case f[i] = f[j].
In summary, we can get the state transition equation:
$
f[i] = min { f[i - 1] + 1, min_{j \in [0, i - 1]} f[j] }
where i \ge 1, and j \in [0, i - 1] and s[j..i) is in the hash table ss.
The final answer is f[n].
The time complexity is O(n^3 + L), and the space complexity is O(n + L). Here, n is the length of string s, and L$ is the sum of the lengths of all words in the dictionary.
Python
Java
C++
Go
TypeScript
Rust
JavaScript
We can use a trie to optimize the time complexity of Solution 1.
Specifically, we first insert each word in the dictionary into the trie root in reverse order, then we define f[i] to represent the minimum number of extra characters in the first i characters of string s, initially f[0] = 0.
When i \ge 1, the ith character s[i - 1] can be an extra character, in which case f[i] = f[i - 1] + 1. We can also enumerate the index j in reverse order in the range [0..i-1], and determine whether s[j..i) is in the trie root. If it exists, then we can take s[j..i) as a word, in which case f[i] = f[j].
The time complexity is O(n^2 + L), and the space complexity is O(n + L times |\Sigma|). Here, n is the length of string s, and L is the sum of the lengths of all words in the dictionary. Additionally, |\Sigma| is the size of the character set. In this problem, the character set is lowercase English letters, so |\Sigma| = 26.
Python
Java
C++
Go
TypeScript
| Approach | Complexity |
|---|---|
| Dynamic Programming Approach | Time Complexity: O(N^2*M), where N is the length of the string s and M is the average length of words in the dictionary. Space Complexity: O(N) because of the DP array. |
| Trie and Memoization Approach | Time Complexity: O(N*M), with N being the string length and M the average dictionary word length due to Trie traversal. Space Complexity: O(N + T), N is for the memo array and T is for Trie storage. |
| Hash Table + Dynamic Programming | — |
| Trie + Dynamic Programming | — |
| Approach | Time | Space | When to Use |
|---|---|---|---|
| Dynamic Programming with Hash Set | O(n²) | O(n) | General case. Simple implementation and common interview expectation. |
| Trie with Memoization | O(n * L) | O(n + dictionary size) | When dictionary has many overlapping prefixes and repeated substring checks become expensive. |
Extra Characters in a String - Leetcode 2707 - Python • NeetCodeIO • 23,906 views views
Watch 9 more video solutions →Practice Extra Characters in a String with our built-in code editor and test cases.
Practice on FleetCodePractice this problem
Open in Editor