{x}
blog image

Find the Shortest Superstring

Given an array of strings words, return the smallest string that contains each string in words as a substring. If there are multiple valid strings of the smallest length, return any of them.

You may assume that no string in words is a substring of another string in words.

 

Example 1:

Input: words = ["alex","loves","leetcode"]
Output: "alexlovesleetcode"
Explanation: All permutations of "alex","loves","leetcode" would also be accepted.

Example 2:

Input: words = ["catg","ctaagt","gcta","ttca","atgcatc"]
Output: "gctaagttcatgcatc"

 

Constraints:

  • 1 <= words.length <= 12
  • 1 <= words[i].length <= 20
  • words[i] consists of lowercase English letters.
  • All the strings of words are unique.

Solution Explanation for Find the Shortest Superstring

This problem asks to find the shortest string that contains all strings in the input array words as substrings. The solution uses a dynamic programming approach with bit manipulation to efficiently explore the possible combinations of string concatenations.

Approach:

  1. Overlap Calculation: The solution starts by computing an overlap matrix g. g[i][j] stores the maximum overlap between the end of words[i] and the beginning of words[j]. This is crucial for determining how efficiently we can concatenate strings.

  2. Dynamic Programming: The core of the solution is a dynamic programming table dp. dp[i][j] represents the maximum total overlap achievable when considering a subset of strings represented by the bitmask i and ending with string j. A bitmask is used to efficiently represent subsets of strings; each bit corresponds to a string in words. If the j-th bit is set, the j-th string is included in the subset.

  3. Backtracking: A second table p stores the predecessor string index for each state in dp. This allows for reconstructing the optimal concatenation order after the dp table is computed. The backtracking starts from the state where all strings are included and traces back to find the optimal sequence.

  4. String Construction: Once the optimal sequence is found, the strings are concatenated considering the overlaps to produce the shortest superstring.

Time Complexity Analysis:

  • Overlap Calculation: The overlap calculation takes O(n²m) time, where n is the number of words and m is the maximum length of a word. This is because comparing overlaps between all pairs of strings can take up to O(m) time for each pair.
  • Dynamic Programming: The dynamic programming step iterates through all possible subsets (2n) and for each subset iterates through all possible ending strings (n). This results in O(n * 2n) time complexity.
  • Backtracking: Backtracking takes O(n) time in the worst case to reconstruct the sequence.
  • String Construction: String construction is also O(nm) in the worst case.

Therefore, the overall time complexity is dominated by the dynamic programming step and is O(n * 2n + n²m). The exponential term makes it clear that this algorithm is not suitable for very large numbers of strings.

Space Complexity Analysis:

The space complexity is dominated by the dp and p tables, both of which have dimensions O(n * 2n). The overlap matrix g requires O(n²) space. Therefore, the overall space complexity is O(n * 2n). This is also exponential with respect to the number of strings.

Code Explanation (Python):

The provided Python code implements this algorithm directly. The key parts are:

  • g matrix calculation: Loops through pairs of words to find the maximum overlap.
  • dp and p matrix computation: This is the dynamic programming core, recursively finding the best concatenation order using bit manipulation to represent subsets efficiently.
  • Backtracking: Using p to reconstruct the optimal sequence of word indices.
  • String construction: Concatenating the words in the found sequence, accounting for the calculated overlaps.

The code in other languages (Java, C++, Go) follows the same logic and algorithmic structure, differing only in syntax and specific library functions used. The core dynamic programming and backtracking steps remain consistent across all implementations.