Given an array of strings words, return the smallest string that contains each string in words as a substring. If there are multiple valid strings of the smallest length, return any of them.
You may assume that no string in words is a substring of another string in words.
Example 1:
Input: words = ["alex","loves","leetcode"] Output: "alexlovesleetcode" Explanation: All permutations of "alex","loves","leetcode" would also be accepted.
Example 2:
Input: words = ["catg","ctaagt","gcta","ttca","atgcatc"] Output: "gctaagttcatgcatc"
Constraints:
1 <= words.length <= 121 <= words[i].length <= 20words[i] consists of lowercase English letters.words are unique.The Find the Shortest Superstring problem asks us to combine a set of strings into the shortest possible string such that every word appears as a substring. A common strategy is to model the problem similarly to the Traveling Salesman Problem using dynamic programming with bitmasking.
First, compute pairwise overlaps between all words. For each pair (i, j), determine how many characters from the end of word i overlap with the beginning of word j. This allows us to avoid re‑adding duplicate characters when concatenating strings.
Next, use a DP state like dp[mask][i], where mask represents the subset of words already used and i is the last word in the current superstring. Transition by appending unused words while maximizing overlap. Bitmasking efficiently tracks which words are included.
Finally, reconstruct the path that produced the optimal result to build the final superstring. This approach efficiently explores permutations while pruning redundant states.
| Approach | Time Complexity | Space Complexity |
|---|---|---|
| Dynamic Programming with Bitmask (TSP-style) | O(n^2 * 2^n) | O(n * 2^n) |
NeetCode
This approach uses dynamic programming with bitmasking to explore all subsets of the strings. The dynamic programming state dp[mask][i] is the shortest superstring for the subset of words represented by the bitmask 'mask' ending with the 'i-th' string. By iterating through all possible states and updating them based on the previously computed states, we determine the shortest superstring that includes all words. The final superstring is constructed by backtracking through the computed states.
Time Complexity: O(n^2 * 2^n) where n is the number of words. This is because there are 2^n subsets and each subset operation can have O(n^2) complexity for managing overlaps.
Space Complexity: O(n * 2^n) for storing DP and path arrays.
1def shortestSuperstring(words):
2 n = len(words)
3
4 # Calculate the overlap between two strings
5 def get_overlap(a, b):
6 max_overlap = 0
7 for i in range(len(a)):
8 if b.startswith(a[i:]):
9 max_overlap = len(a) - i
10 break
11 return max_overlap
12
13 overlaps = [[0] * n for _ in range(n)]
14 for i in range(n):
15 for j in range(n):
16 if i != j:
17 overlaps[i][j] = get_overlap(words[i], words[j])
18
19 dp = [[0] * n for _ in range(1 << n)]
20 path = [[-1] * n for _ in range(1 << n)]
21
22 for mask in range(1, 1 << n):
23 for bit in range(n):
24 if (mask >> bit) & 1:
25 prev_mask = mask ^ (1 << bit)
26 if prev_mask == 0:
27 continue
28 for i in range(n):
29 if (prev_mask >> i) & 1:
30 value = dp[prev_mask][i] + overlaps[i][bit]
31 if value > dp[mask][bit]:
32 dp[mask][bit] = value
33 path[mask][bit] = i
34
35 max_overlap = max(dp[-1])
36 last = dp[-1].index(max_overlap)
37
38 mask = (1 << n) - 1
39 perm = []
40 while last != -1:
41 perm.append(last)
42 temp = last
43 last = path[mask][last]
44 mask ^= (1 << temp)
45
46 perm = perm[::-1]
47
48 answer = words[perm[0]]
49 for i in range(1, len(perm)):
50 overlap_len = overlaps[perm[i - 1]][perm[i]]
51 answer += words[perm[i]][overlap_len:]
52
53 return answer
54
55# Example usage
56words = ["catg", "ctaagt", "gcta", "ttca", "atgcatc"]
57print(shortestSuperstring(words))The Python solution calculates the maximum overlap for every pair of strings and uses dynamic programming to find the shortest superstring that can be constructed. Overlaps are computed once and stored in a 2D list "overlaps". The dp table keeps track of the optimal solution for every subset of strings represented by a bitmask, and backtracking on this table gives the final superstring.
This approach translates the problem into a variant of the Travelling Salesman Problem (TSP) where the cities correspond to the strings and the distances between them are the inverse of the overlap size. The solution can use heuristics to find a short path that visits all the 'cities' once, effectively creating the shortest superstring. Since exact TSP is NP-hard, approximation algorithms or heuristics can speed up finding a good solution.
Time Complexity: O(n^2 * 2^n), where n is the number of strings. This is due to calculating overlaps for every pair and iterating over all subsets.
Space Complexity: O(n * 2^n) for storing DP tables.
1#include <iostream>
2#include <vector>
3#include <string>
4#include <limits.h>
5using namespace std;
6
int calculate_overlap(string a, string b) {
int max_overlap = 0;
for (int i = 0; i < a.size(); ++i) {
if (b.find(a.substr(i)) == 0) {
max_overlap = a.size() - i;
break;
}
}
return max_overlap;
}
string shortestSuperstring(vector<string>& words) {
int n = words.size();
vector<vector<int>> overlaps(n, vector<int>(n, 0));
for (int i = 0; i < n; ++i) {
for (int j = 0; j < n; ++j) {
if (i != j) {
overlaps[i][j] = calculate_overlap(words[i], words[j]);
}
}
}
int max_mask = 1 << n;
vector<vector<int>> dp(max_mask, vector<int>(n, 0));
vector<vector<int>> parent(max_mask, vector<int>(n, -1));
for (int mask = 1; mask < max_mask; ++mask) {
for (int u = 0; u < n; ++u) {
if ((mask & (1 << u)) == 0) continue;
int prev_mask = mask ^ (1 << u);
if (prev_mask == 0) continue;
for (int v = 0; v < n; ++v) {
if ((prev_mask & (1 << v)) == 0) continue;
int val = dp[prev_mask][v] + overlaps[v][u];
if (val > dp[mask][u]) {
dp[mask][u] = val;
parent[mask][u] = v;
}
}
}
}
int max_overlap = 0, last = 0;
for (int i = 0; i < n; ++i) {
if (dp[max_mask - 1][i] > max_overlap) {
max_overlap = dp[max_mask - 1][i];
last = i;
}
}
vector<int> perm;
int mask = max_mask - 1;
while (last != -1) {
perm.push_back(last);
int temp = last;
last = parent[mask][last];
mask ^= (1 << temp);
}
reverse(perm.begin(), perm.end());
string result = words[perm[0]];
for (int i = 1; i < perm.size(); ++i) {
int overlap_len = overlaps[perm[i - 1]][perm[i]];
result += words[perm[i]].substr(overlap_len);
}
return result;
}
int main() {
vector<string> words = {"catg", "ctaagt", "gcta", "ttca", "atgcatc"};
cout << shortestSuperstring(words) << endl;
return 0;
}This approach involves generating all permutations of the words and computing the shortest superstring by calculating overlaps greedily. We concatenate each pair of words in the permutation, reducing the length by maximizing the overlap between consecutive strings. This brute force method is feasible with a small number of words due to its factorial time complexity but ensures the optimal solution is found.
Time Complexity: O(n! * len^2) where n is the number of words and len is the average length of words.
Space Complexity: O(n * len) for storing intermediate permutations and results.
This approach models the problem using dynamic programming with a bitmask and Traveling Salesman Problem (TSP) style optimization. By representing used words as a bitmask, we systematically build the shortest superstring by adding each word while maximizing overlaps, applying dynamic programming to store and reuse results of subproblems.
Time Complexity: O(n^2 * 2^n) due to combination of all subsets and pairwise overlaps.
Space Complexity: O(n * 2^n) for dp and parent tracking arrays.
1
Watch expert explanations and walkthroughs
Jot down your thoughts, approach, and key learnings
Yes, variations of the shortest superstring and overlap-based string merging problems appear in high-level coding interviews. They test dynamic programming, bitmasking, and graph-like optimization skills.
Bitmasking compactly represents subsets of words and allows efficient transitions between states in dynamic programming. It helps explore all combinations of words without storing large subset structures.
The optimal approach uses dynamic programming with bitmasking. By precomputing overlaps between strings and treating the problem like a Traveling Salesman path, we track subsets of used words and build the shortest possible superstring.
Key structures include arrays for storing overlaps and a DP table such as dp[mask][i] to represent subsets of words. Bitmasks efficiently represent which strings are already included in the current superstring.
This C++ solution uses a dynamical approach similar to TSP by using bitmasks to represent subsets of words. The overlap between strings is used to define "distances", and the algorithm tries to maximize the overlap, effectively minimizing the path length, or the resulting superstring length. The "cost" or path length is maximized by summing up the overlaps between consecutive elements in permutations.
This C program defines a function shortestSuperstring that uses permutation and greedy approach to find the shortest superstring for the given array of words. The auxiliary function findOverlap calculates maximum overlap between two strings and constructs a new string by overlapping. The permute function generates permutations and keeps track of the shortest constructed superstring.
This Python solution uses dynamic programming and bitmasking to reduce complexity similar to solving TSP. The overlap matrix is precomputed, and dp array tracks minimum superstring lengths for subsets and specific last words, optimizing the shortest path with parent aiding path reconstruction.