{x}
blog image

Design Add and Search Words Data Structure

Design a data structure that supports adding new words and finding if a string matches any previously added string.

Implement the WordDictionary class:

  • WordDictionary() Initializes the object.
  • void addWord(word) Adds word to the data structure, it can be matched later.
  • bool search(word) Returns true if there is any string in the data structure that matches word or false otherwise. word may contain dots '.' where dots can be matched with any letter.

 

Example:

Input
["WordDictionary","addWord","addWord","addWord","search","search","search","search"]
[[],["bad"],["dad"],["mad"],["pad"],["bad"],[".ad"],["b.."]]
Output
[null,null,null,null,false,true,true,true]

Explanation
WordDictionary wordDictionary = new WordDictionary();
wordDictionary.addWord("bad");
wordDictionary.addWord("dad");
wordDictionary.addWord("mad");
wordDictionary.search("pad"); // return False
wordDictionary.search("bad"); // return True
wordDictionary.search(".ad"); // return True
wordDictionary.search("b.."); // return True

 

Constraints:

  • 1 <= word.length <= 25
  • word in addWord consists of lowercase English letters.
  • word in search consist of '.' or lowercase English letters.
  • There will be at most 2 dots in word for search queries.
  • At most 104 calls will be made to addWord and search.

Solution Explanation: Design Add and Search Words Data Structure

This problem requires designing a data structure that efficiently handles adding words and searching for words, potentially containing wildcard characters (.). The optimal solution utilizes a Trie data structure.

Trie (Prefix Tree): A Trie is a tree-like data structure used for efficient retrieval of keys in a set. Each node in the Trie represents a character in a word. Paths from the root to a leaf node represent words. This makes prefix-based searches very efficient.

Approach:

  1. addWord(word): This function iterates through the characters of the input word. For each character, it checks if a corresponding child node exists in the current Trie node. If not, a new node is created. The process continues until the end of the word, marking the final node as the end of a word (is_end = True).

  2. search(word): This is the more complex function. It uses Depth-First Search (DFS) or Breadth-First Search (BFS) to traverse the Trie.

    • Handling Wildcards (.): When a wildcard is encountered, the search explores all possible child nodes recursively. This is because a wildcard can match any character.

    • No Wildcards: If there's no wildcard, it follows the path corresponding to the characters in the word.

    • End of Word: The search returns true if a path corresponding to the input word is found and the end node of that path has is_end = True. Otherwise, it returns false.

Time Complexity Analysis:

  • addWord(word): O(M), where M is the length of the word being added. This is because we traverse at most M nodes in the Trie.

  • search(word): This is more nuanced.

    • In the worst-case scenario (many wildcards and a large number of words in the Trie), the time complexity could approach O(N * 26K), where N is the number of words in the Trie, and K is the number of wildcards in the search word. This is because we could potentially explore all branches of the Trie for each wildcard.
    • However, in practice, the time complexity is usually much better. The actual time taken depends on the structure of the Trie and the input words.

Space Complexity Analysis:

The space complexity is O(N * M), where N is the number of words added to the Trie and M is the average length of the words. This represents the space needed to store the Trie structure itself.

Code Examples (Python, Java, C++, Go, C#):

The provided code snippets demonstrate the implementation of this approach in various programming languages. They all follow the same basic principles: using a Trie for efficient storage and a recursive or iterative search function to handle wildcards. Note the use of an array of size 26 (children) to represent the 26 lowercase letters of the alphabet for efficient character indexing. The isEnd boolean flag indicates that a word ends at that node in the trie.

The C# example uses a queue for BFS. This avoids the potentially deep recursive calls of DFS, which could lead to stack overflow errors for very long words or large tries. The time complexity remains largely the same, while possibly improving space complexity in certain cases, due to the use of a queue instead of recursion.