A valid encoding of an array of words
is any reference string s
and array of indices indices
such that:
words.length == indices.length
s
ends with the '#'
character.indices[i]
, the substring of s
starting from indices[i]
and up to (but not including) the next '#'
character is equal to words[i]
.Given an array of words
, return the length of the shortest reference string s
possible of any valid encoding of words
.
Example 1:
Input: words = ["time", "me", "bell"]
Output: 10
Explanation: A valid encoding would be s = "time#bell#" and indices = [0, 2, 5
].
words[0] = "time", the substring of s starting from indices[0] = 0 to the next '#' is underlined in "time#bell#"
words[1] = "me", the substring of s starting from indices[1] = 2 to the next '#' is underlined in "time#bell#"
words[2] = "bell", the substring of s starting from indices[2] = 5 to the next '#' is underlined in "time#bell#"
Example 2:
Input: words = ["t"] Output: 2 Explanation: A valid encoding would be s = "t#" and indices = [0].
Constraints:
1 <= words.length <= 2000
1 <= words[i].length <= 7
words[i]
consists of only lowercase letters.The problem asks for the shortest reference string that can encode a given array of words. A valid encoding requires that each word is a substring of the reference string, delimited by '#' characters. The solutions leverage a Trie data structure to efficiently find the shortest encoding.
Approach:
Both solutions utilize a Trie, a tree-like data structure used for efficient retrieval of strings. The core idea is to insert the words (reversed) into the Trie. The length of the shortest encoding is related to the number of nodes in the Trie that are leaf nodes (i.e., nodes without children representing the end of a word).
Solution 1: Depth-First Search (DFS) on Trie
dfs
function recursively calculates the length contribution of each subtree. The total length of the shortest encoding is the sum of the lengths of paths to all leaf nodes.Solution 2: Trie Insertion with Length Tracking
This solution is more efficient. It uses a Trie similar to the previous solution. However, instead of doing DFS, it directly tracks the length during insertion:
insert
function returns the length of the word only if it represents a unique suffix (i.e., it doesn't match a prefix already in the Trie). Otherwise, it returns 0 indicating that it's already been encoded. The sum of these lengths is the final result.Time Complexity Analysis:
Space Complexity Analysis:
For both solutions, the space complexity is O(Σ|w|) due to the Trie structure, which stores all unique suffixes of the words.
In summary, Solution 2 is generally preferred because it avoids the extra overhead of DFS, making it more efficient, especially for a large number of words. Both solutions are correct and provide the same output but with different computational efficiency.