Short Encoding of Words - Solution & Explanation

Q: Is Short Encoding of Words easy or hard?

Short Encoding of Words is rated Medium because the key insight about suffix reuse is not immediately obvious. Once you recognize that suffix words can be removed, the remaining implementation using a set or Trie becomes straightforward.

Q: Short Encoding of Words Python/Java solution

Most implementations either use a hash set to remove suffixes or build a reversed Trie. Both approaches translate cleanly into Python, Java, C++, C#, and JavaScript since they rely on basic string iteration, hash sets, and simple tree nodes.

Q: How to solve Short Encoding of Words in O(n)?

Treat the total number of characters across all words as n. Insert all words into a hash set and, for each word, remove every proper suffix (word[1:], word[2:], and so on) from the set. After pruning, sum len(word) + 1 for the remaining words. Building and hashing one suffix costs time proportional to its length, but words have at most 7 characters, so each removal is O(1) on average and the total work is O(n).

Q: What is the best approach for Short Encoding of Words?

The hash set approach (listed on this page as "Reverse Sorting and Set") is the most common solution. Insert all words into a set, then remove every proper suffix of each word. Only words that are not suffixes of another word remain, and the answer is the sum of their lengths plus one each for the '#' delimiter. Because words have at most 7 characters, this runs in O(total characters) time with O(total characters) space.

Q: Is Short Encoding of Words asked at Google/Amazon/Meta?

Suffix and Trie-based string compression problems like this appear in interviews at companies such as Amazon, Google, and Meta. Interviewers use it to evaluate understanding of suffix relationships, hash-based pruning, and Trie data structures.

Q: What data structure is used in Short Encoding of Words?

Two main structures appear in solutions: a hash set for quickly removing suffixes and a Trie for modeling shared suffix paths. The Trie version stores reversed words so suffixes become prefixes during insertion.

Q: What is the time complexity of Short Encoding of Words?

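Under the problem's constraints, both solutions run in O(total characters) time, where total characters is the sum of the lengths of all words. The suffix Trie visits each character once. The hash set approach builds up to K suffixes for a word of length K, which is O(N * K^2) in general but stays within a constant factor of the input size because K <= 7. Space complexity is O(total characters) for both.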

Medium | Array, Hash Table, String, Trie | 13 min read | Asked at: Apple

Problem Statement

A valid encoding of an array of words is any reference string s and array of indices indices such that:

words.length == indices.length
The reference string s ends with the '#' character.
For each index indices[i], the substring of s starting from indices[i] and up to (but not including) the next '#' character is equal to words[i].

Given an array of words, return the length of the shortest reference string s possible of any valid encoding of words.

Example 1:

Input: words = ["time", "me", "bell"]
Output: 10
Explanation: A valid encoding would be s = "time#bell#" and indices = [0, 2, 5].
words[0] = "time": the substring of s from indices[0] = 0 up to the next '#' is "time".
words[1] = "me": the substring of s from indices[1] = 2 up to the next '#' is "me".
words[2] = "bell": the substring of s from indices[2] = 5 up to the next '#' is "bell".
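
The three validity conditions can be checked mechanically. The following is a minimal sketch, not part of the original solution; the helper name is_valid_encoding is hypothetical and exists only to illustrate the definition on Example 1.

```python
def is_valid_encoding(words, s, indices):
    """Check the three conditions from the problem statement."""
    # Condition 1 and 2: one index per word, and s must end with '#'.
    if len(words) != len(indices) or not s.endswith("#"):
        return False
    for word, start in zip(words, indices):
        # Position of the next '#' at or after the starting index.
        end = s.find("#", start)
        # Condition 3: the text between start and that '#' must be the word.
        if end == -1 or s[start:end] != word:
            return False
    return True


print(is_valid_encoding(["time", "me", "bell"], "time#bell#", [0, 2, 5]))  # True
print(is_valid_encoding(["time", "me", "bell"], "time#bell#", [0, 1, 5]))  # False
```

The second call fails because reading from index 1 yields "ime", not "me".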

Example 2:

Input: words = ["t"]
Output: 2
Explanation: A valid encoding would be s = "t#" and indices = [0].

Constraints:

1 <= words.length <= 2000
1 <= words[i].length <= 7
words[i] consists of only lowercase letters.

Approach Overview

Problem Overview: You are given a list of words and must encode them into a single reference string in which each word can be read from some starting index up to the next '#'. The goal is to minimize the length of that string. Savings are possible whenever one word is a suffix of another, because the shorter word can share the longer word's characters and its '#'.

Approach 1: Reverse Sorting and Hash Set (O(total characters) time, O(total characters) space)

The key observation: if a word is a suffix of another word, it does not need its own entry in the reference string. For example, "time#" already covers "me", which starts at index 2. Despite the heading, this version needs no sorting. Start by inserting all words into a set, which also removes duplicates. Then, for each word, remove every proper suffix (word[i:] for i from 1 to len(word) - 1) from the set. After processing every word, only the words that are not suffixes of any other word remain. The final encoded length is the sum of len(word) + 1 over the remaining words, where the + 1 accounts for the trailing '#'. Each removal is an average constant-time hash table operation because words have at most 7 characters. The approach is concise and is usually the preferred interview solution.
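
A minimal Python sketch of the set-based pruning described above. The function name encoding_length_set is an illustrative choice, not taken from the original page.

```python
def encoding_length_set(words):
    # Start with every distinct word as a candidate for its own "word#" entry.
    remaining = set(words)
    for word in words:
        # Remove proper suffixes only (i starts at 1), so the word itself stays.
        for i in range(1, len(word)):
            remaining.discard(word[i:])  # discard() is a no-op if absent
    # Each surviving word costs its length plus one '#'.
    return sum(len(word) + 1 for word in remaining)


print(encoding_length_set(["time", "me", "bell"]))  # 10
print(encoding_length_set(["t"]))                   # 2
```

Using discard rather than remove avoids a KeyError for suffixes that were never in the input.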

Approach 2: Suffix Trie (O(total characters) time, O(total characters) space)

A more structured solution builds a Trie from reversed words. Reversing every word turns suffix relationships into prefix relationships, so words that share a suffix share a path from the root. After all words are inserted, a word whose final node is a leaf (a node with no children) contributes len(word) + 1 to the encoding length. A word whose final node has children is a suffix of a longer word and adds nothing. Duplicate words end at the same node and must be counted once. This method models suffix sharing explicitly and avoids building substrings. It is particularly useful when practicing Trie design or when a problem needs explicit prefix/suffix structure.
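
The leaf-counting idea can be sketched as follows. This is one possible implementation under the assumption that a plain dictionary of children is an acceptable Trie node; the names TrieNode and encoding_length_trie_leaves are illustrative.

```python
class TrieNode:
    def __init__(self):
        self.children = {}  # maps a character to the next TrieNode


def encoding_length_trie_leaves(words):
    root = TrieNode()
    end_nodes = {}  # final node of each distinct word -> length of that word
    for word in set(words):  # set() so duplicates are counted once
        node = root
        for ch in reversed(word):  # reversed: suffixes become prefixes
            node = node.children.setdefault(ch, TrieNode())
        end_nodes[node] = len(word)
    # Only words that end at a leaf need their own "word#" entry.
    return sum(length + 1 for node, length in end_nodes.items()
               if not node.children)


print(encoding_length_trie_leaves(["time", "me", "bell"]))  # 10
```

Because the leaf check happens only after every word has been inserted, the result does not depend on insertion order.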

Recommended for interviews: The set-based approach is typically expected because it is short and follows directly from the suffix insight. It demonstrates hash-based pruning with simple iteration. The Trie solution has the same complexity under these constraints but is more verbose; it is valuable when the interviewer wants to see Trie construction and the reversed-word transformation.

Approach 1: Using Suffix Trie

This approach builds a Trie that reveals when one word is a suffix of another. Only the distinct paths in the Trie need to be encoded, so any word that is a suffix of another word is redundant.

The solution stores each word in reversed order. If the words are inserted from longest to shortest, a word that creates at least one new node is not a suffix of any word already added, and it increases the length by len(word) + 1 (the + 1 is for the '#'). If the words are inserted in arbitrary order, this check is not enough: a short word such as "me" inserted before "time" would be counted even though it is redundant. In that case, count the leaf nodes after all insertions instead.
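
A short sketch of the new-path check, assuming words are deduplicated and inserted from longest to shortest so that any word containing the current one as a suffix is already in the Trie. The nested-dictionary Trie and the function name are illustrative choices, not the page's original code.

```python
def encoding_length_trie_longest_first(words):
    root = {}  # nested dicts: character -> child dict
    total = 0
    # Longest first: a word can only be a suffix of a longer word,
    # and that longer word has then already been inserted.
    for word in sorted(set(words), key=len, reverse=True):
        node = root
        created = False
        for ch in reversed(word):
            if ch not in node:
                node[ch] = {}
                created = True  # this word extends the Trie with a new path
            node = node[ch]
        if created:
            total += len(word) + 1  # word plus its '#'
    return total


print(encoding_length_trie_longest_first(["me", "time", "bell"]))  # 10
```

The sort adds an O(N log N) term, which the leaf-counting variant avoids.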

Complexity

Time Complexity: O(N * K) where N is the number of words and K is the average length of a word.
Space Complexity: O(N * K) due to the storage in the Trie.

Approach 2: Reverse Sorting and Set

Reverse every word and sort the reversed strings. In sorted order, all strings that share a prefix are adjacent, so if a reversed word is a prefix of any other reversed word, it is a prefix of the entry immediately after it. In terms of the original words, a word is redundant exactly when it is a suffix of its successor in this order.

After removing duplicates with a set, walk the sorted list once and add len(word) + 1 for every word that is not a suffix of its successor. A variant skips the sort entirely: put all words in a set, discard every proper suffix of every word, and sum over what remains.
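
A minimal sketch of the sorting variant, assuming duplicates are removed first and that it is enough to compare each reversed word with its immediate successor in sorted order. The function name encoding_length_sorted is illustrative.

```python
def encoding_length_sorted(words):
    # Reverse each distinct word, then sort so shared prefixes are adjacent.
    reversed_words = sorted(word[::-1] for word in set(words))
    total = 0
    for i, current in enumerate(reversed_words):
        is_last = i + 1 == len(reversed_words)
        # Keep the word unless the next reversed word starts with it,
        # which would mean the original word is a suffix of the next one.
        if is_last or not reversed_words[i + 1].startswith(current):
            total += len(current) + 1  # word plus its '#'
    return total


print(encoding_length_sorted(["time", "me", "bell"]))  # 10
```

For Example 1 the sorted reversed list is ["em", "emit", "lleb"]; "em" is skipped because "emit" starts with it.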

Complexity

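Time Complexity: O(N * K^2) for the set variant, since each of the up to K suffixes of a word costs O(K) to build and hash; the sorting variant costs O(N * K * log N).
Space Complexity: O(N * K).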

Complexity Comparison

Approach	Time	Space
Using Suffix Trie	O(N * K)	O(N * K)
Reverse Sorting and Set	O(N * K^2)	O(N * K)

N is the number of words and K is the average length of a word. The Trie's space is due to the nodes it stores.

Detailed Complexity Analysis

Approach	Time	Space	When to Use
Reverse Sorting + Hash Set	O(N * K^2), effectively O(total characters) since K <= 7	O(total characters)	Best general solution. Simple implementation with hash lookups and suffix removal.
Suffix Trie (reversed words)	O(total characters)	O(total characters)	Useful when practicing Trie data structures or when explicit suffix sharing needs to be modeled.

Video Solution

Short Encoding of Words | Live Coding with Explanation | Leetcode - 820 • Algorithms Made Easy • 4,147 views

Problem Info

Difficulty: Medium
Acceptance: 60.5%
Reading time: 13 min
Asked at: Apple
