{x}
blog image

Concatenated Words

Given an array of strings words (without duplicates), return all the concatenated words in the given list of words.

A concatenated word is defined as a string that is comprised entirely of at least two shorter words (not necessarily distinct) in the given array.

 

Example 1:

Input: words = ["cat","cats","catsdogcats","dog","dogcatsdog","hippopotamuses","rat","ratcatdogcat"]
Output: ["catsdogcats","dogcatsdog","ratcatdogcat"]
Explanation: "catsdogcats" can be concatenated by "cats", "dog" and "cats"; 
"dogcatsdog" can be concatenated by "dog", "cats" and "dog"; 
"ratcatdogcat" can be concatenated by "rat", "cat", "dog" and "cat".

Example 2:

Input: words = ["cat","dog","catdog"]
Output: ["catdog"]

 

Constraints:

  • 1 <= words.length <= 104
  • 1 <= words[i].length <= 30
  • words[i] consists of only lowercase English letters.
  • All the strings of words are unique.
  • 1 <= sum(words[i].length) <= 105

Solution Explanation

This problem asks us to find all "concatenated words" within a given list of words. A concatenated word is defined as a string composed of at least two shorter words from the same list. The solution leverages a Trie data structure and Depth-First Search (DFS) for efficient processing.

Approach

  1. Sorting: The input words array is sorted by word length in ascending order. This is crucial for the algorithm's efficiency. We process shorter words first to build the Trie; longer words are then checked against this Trie.

  2. Trie Construction: A Trie (prefix tree) is used to store the shorter words. The Trie efficiently allows us to check if a prefix of a longer word exists in the word list.

  3. Depth-First Search (DFS): For each word (after sorting):

    • If the word's length is 0, it's considered a concatenated word (base case).
    • We traverse the Trie using the characters of the word.
    • If we find a word ending at some point (indicated by isEnd in the Trie node), we recursively call dfs on the remaining suffix of the word. If the recursive call returns true (meaning the suffix can also be constructed from existing words), then the current word is a concatenated word.
  4. Adding to Trie: If a word is not a concatenated word, it's added to the Trie to be used for subsequent checks.

Time Complexity Analysis

  • Sorting: The sorting step takes O(N log N) time, where N is the number of words.
  • Trie Insertion: Inserting each word into the Trie takes O(L) time on average, where L is the average length of a word. In the worst case (all words are very long and share many prefixes), this could become O(NL).
  • DFS: The DFS operation, in the worst case, could potentially explore all possible partitions of each word. The worst-case complexity here is difficult to precisely characterize but could be exponential in the length of the longest word. However, it's significantly reduced by the fact that words are processed in order of increasing length and that the Trie prevents unnecessary exploration of paths.

Considering the sorting and the Trie operations, the overall time complexity is dominated by the DFS and the Trie construction. Therefore, it's approximately O(NL + Worst-case exponential complexity due to DFS), where the exponential factor depends on the structure of the input words. In practice, for typical inputs, this will perform much better than a purely exponential solution.

Space Complexity Analysis

  • Trie: The Trie's space complexity is proportional to the sum of the lengths of all words, so O(∑L).
  • Result List: The space to store the list of concatenated words is O(M), where M is the number of concatenated words found.

Overall, the space complexity is O(∑L + M).

The provided code in Python, Java, C++, and Go demonstrates the described algorithm. Each language's implementation showcases the same core logic, adapted to the specific language's syntax and data structures. The Trie structure in each language allows efficient prefix checking, making the algorithm perform well in practice despite the potential worst-case exponential complexity of DFS.