A valid encoding of an array of words is any reference string s and array of indices indices such that:
words.length == indices.lengths ends with the '#' character.indices[i], the substring of s starting from indices[i] and up to (but not including) the next '#' character is equal to words[i].Given an array of words, return the length of the shortest reference string s possible of any valid encoding of words.
Example 1:
Input: words = ["time", "me", "bell"]
Output: 10
Explanation: A valid encoding would be s = "time#bell#" and indices = [0, 2, 5].
words[0] = "time", the substring of s starting from indices[0] = 0 to the next '#' is underlined in "time#bell#"
words[1] = "me", the substring of s starting from indices[1] = 2 to the next '#' is underlined in "time#bell#"
words[2] = "bell", the substring of s starting from indices[2] = 5 to the next '#' is underlined in "time#bell#"
Example 2:
Input: words = ["t"] Output: 2 Explanation: A valid encoding would be s = "t#" and indices = [0].
Constraints:
1 <= words.length <= 20001 <= words[i].length <= 7words[i] consists of only lowercase letters.Problem Overview: You are given a list of words and must encode them into a single reference string where each word appears as a suffix ending with #. The goal is to minimize the total length of this encoded string by reusing suffixes whenever possible.
Approach 1: Reverse Sorting and Hash Set (O(total characters) time, O(total characters) space)
The key observation: if a word is a suffix of another word, it does not need its own encoding in the final string. For example, time already covers me. Start by inserting all words into a set for fast lookups. Then iterate through each word and remove all of its suffixes (e.g., word[i:]) from the set. After processing every word, only the words that are not suffixes of any other remain. The final encoded length is the sum of len(word) + 1 for each remaining word to account for the trailing #. This approach relies heavily on constant-time membership checks provided by a hash table and simple substring iteration on each string. It’s concise, efficient, and usually the preferred interview solution.
Approach 2: Suffix Trie (O(total characters) time, O(total characters) space)
A more structured solution builds a Trie using reversed words. Reverse every word so suffix relationships become prefix relationships, then insert characters into a Trie. Each path represents shared suffixes between words. When a word finishes at a leaf node (a node with no children added after insertion), that word contributes len(word) + 1 to the final encoding length. Words that end at internal nodes are suffixes of longer words and do not increase the encoding size. This method explicitly models suffix sharing and avoids repeated substring checks. It’s particularly useful when practicing Trie design or when problems require explicit prefix/suffix structure handling.
Recommended for interviews: The reverse sorting + set approach is typically expected in interviews because it’s short and leverages a clear suffix insight. It demonstrates strong understanding of array iteration and hash-based pruning. The Trie solution is equally optimal in complexity but more verbose; it’s valuable when the interviewer wants to see knowledge of Trie construction and suffix transformations.
This approach involves building a trie that can be used to efficiently check if one word is a suffix of another. By storing only the unique paths in the trie, the encoding length can be minimized by pointing out the redundancies where one word is a suffix of another.
This solution uses a Trie (prefix tree) to store words in reversed order. It checks whether inserting a word introduces new paths in the Trie, indicating that it isn't a suffix of any word already added. For every new path, the length increases by the length of the word plus one for the '#'.
Python
C++
Java
C#
JavaScript
Time Complexity: O(N * K) where N is the number of words and K is the average length of a word.
Space Complexity: O(N * K) due to the storage in the Trie.
By reverse sorting the words and checking suffix existence, it's possible to efficiently determine the redundant entries. This alternative approach utilizes set operations to ascertain unique encodings, optimizing storage by including only necessary components.
This method sorts the words by reversed string, facilitating checks for suffixes as subsequent entries. By discarding suffixes and iterating over the unique words, it computes the total length effectively.
Python
C++
Java
C#
JavaScript
Time Complexity: O(N * K^2), Space Complexity: O(N * K).
| Approach | Complexity |
|---|---|
| Using Suffix Trie | Time Complexity: O(N * K) where N is the number of words and K is the average length of a word. |
| Reverse Sorting and Set | Time Complexity: O(N * K^2), Space Complexity: O(N * K). |
| Default Approach | — |
| Approach | Time | Space | When to Use |
|---|---|---|---|
| Reverse Sorting + Hash Set | O(total characters) | O(total characters) | Best general solution. Simple implementation with hash lookups and suffix removal. |
| Suffix Trie (reversed words) | O(total characters) | O(total characters) | Useful when practicing Trie data structures or when explicit suffix sharing needs to be modeled. |
Short Encoding of Words | Live Coding with Explanation | Leetcode - 820 • Algorithms Made Easy • 4,147 views views
Watch 9 more video solutions →Practice Short Encoding of Words with our built-in code editor and test cases.
Practice on FleetCode