
Prefix and Suffix Search

Design a special dictionary that searches the words in it by a prefix and a suffix.

Implement the WordFilter class:

  • WordFilter(string[] words) Initializes the object with the words in the dictionary.
  • f(string pref, string suff) Returns the index of the word in the dictionary, which has the prefix pref and the suffix suff. If there is more than one valid index, return the largest of them. If there is no such word in the dictionary, return -1.

 

Example 1:

Input
["WordFilter", "f"]
[[["apple"]], ["a", "e"]]
Output
[null, 0]
Explanation
WordFilter wordFilter = new WordFilter(["apple"]);
wordFilter.f("a", "e"); // return 0, because the word at index 0 has prefix = "a" and suffix = "e".

 

Constraints:

  • 1 <= words.length <= 10⁴
  • 1 <= words[i].length <= 7
  • 1 <= pref.length, suff.length <= 7
  • words[i], pref and suff consist of lowercase English letters only.
  • At most 10⁴ calls will be made to the function f.

This problem requires designing a data structure that efficiently searches for words with a given prefix and suffix. Two main solutions are presented: a hash table-based approach and a Trie-based approach.

Solution 1: Hash Table Approach

This approach uses a hash table (dictionary in Python) to store all possible prefix-suffix combinations and their corresponding word indices.

Algorithm:

  1. Initialization (__init__ or constructor):

    • Create a hash table d.
    • Iterate through the input words list.
    • For each word w:
      • Iterate over all prefixes a of the word (from the empty string to the entire word).
      • For each prefix a, iterate over all suffixes b of the word (from the entire word down to the empty string).
      • Store the pair (a, b) as a key in d with the word's index k as the value. Because words are processed in order, a later word overwrites any earlier one under the same key, so each key ends up mapping to the largest valid index.
  2. Search (f):

    • Given a pref and suff, construct the key (pref, suff).
    • Return the value associated with this key in d (the index). If the key is not found, return -1.
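The steps above can be sketched in Python as follows (the class and method names come from the problem statement; the overwrite-on-insert behavior is what guarantees the largest valid index is returned):

```python
class WordFilter:
    def __init__(self, words):
        # Map every (prefix, suffix) pair to the largest index of a word
        # that has that prefix and suffix.
        self.d = {}
        for k, w in enumerate(words):
            n = len(w)
            for i in range(n + 1):          # prefix w[:i], including ""
                for j in range(n + 1):      # suffix w[j:], including ""
                    # Later (larger) indices overwrite earlier ones.
                    self.d[(w[:i], w[j:])] = k

    def f(self, pref, suff):
        # O(1) average-case lookup; -1 if the pair was never stored.
        return self.d.get((pref, suff), -1)
```

For example, `WordFilter(["apple"]).f("a", "e")` returns 0, matching Example 1.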

Time Complexity:

  • Initialization: O(N * L^2) prefix-suffix pairs, where N is the number of words and L is the maximum length of a word; building each string key adds another factor of up to L.
  • Search (f): O(1) on average, as hash table lookups are typically O(1). In the worst case, it could be O(N) if there are many hash collisions (though this is unlikely with a good hash function).

Space Complexity: O(N * L^2) keys in the worst case, one for every prefix-suffix combination (each key itself is up to 2L characters long).

Solution 2: Trie Approach

This approach uses two Tries: one for prefixes and one for suffixes.

Algorithm:

  1. Initialization (__init__ or constructor):

    • Create two Tries, p (for prefixes) and s (for suffixes).
    • Iterate through the input words list.
    • For each word w and its index i:
      • Insert w into p.
      • Insert the reversed w into s (to efficiently search suffixes).
  2. Search (f):

    • Given pref and suff, search for pref in p and the reversed suff in s. Each search yields the list of indices of all words with that prefix (a) or suffix (b); both lists are in increasing order because words were inserted in index order.
    • If either list is empty, return -1.
    • Scan a and b backwards with two pointers, advancing whichever pointer sits on the larger index, until a common index (the largest one) is found.
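A Python sketch of this approach might look like the following. The minimal Trie class here is an assumption (the source does not give an implementation); each node records the indices of all words passing through it, so a single walk returns the full candidate list:

```python
class Trie:
    def __init__(self):
        self.children = {}
        self.indices = []   # indices of all words passing through this node

    def insert(self, word, i):
        node = self
        for ch in word:
            node = node.children.setdefault(ch, Trie())
            node.indices.append(i)   # appended in index order, so lists stay sorted

    def search(self, word):
        node = self
        for ch in word:
            if ch not in node.children:
                return []            # no word has this prefix
            node = node.children[ch]
        return node.indices


class WordFilter:
    def __init__(self, words):
        self.p = Trie()   # prefixes
        self.s = Trie()   # reversed words, so suffixes become prefixes
        for i, w in enumerate(words):
            self.p.insert(w, i)
            self.s.insert(w[::-1], i)

    def f(self, pref, suff):
        a = self.p.search(pref)
        b = self.s.search(suff[::-1])
        # Two-pointer scan from the back of both sorted lists:
        # the first common index found is the largest one.
        i, j = len(a) - 1, len(b) - 1
        while i >= 0 and j >= 0:
            if a[i] == b[j]:
                return a[i]
            if a[i] > b[j]:
                i -= 1
            else:
                j -= 1
        return -1   # covers the empty-list case as well
```

Note that the explicit empty-list check from the algorithm description is subsumed by the two-pointer loop: if either list is empty, the loop body never runs and -1 is returned.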

Time Complexity:

  • Initialization: O(N * L), where N is the number of words and L is the maximum length of a word. Insertion into a Trie takes O(L) time.
  • Search (f): O(L + N) in the worst case. Each Trie lookup takes O(L) time, and the backward two-pointer scan over the index lists is at most O(N); in practice it usually stops much sooner, as soon as a common index is found.

Space Complexity: O(N * L) in the worst case, as the Tries store all characters of all words.

Which Solution is Better?

Solution 2 (Trie-based) is generally preferred because it has a lower time complexity for both initialization and search, especially for larger datasets. While Solution 1 has a simpler implementation, its quadratic time complexity for initialization can become a bottleneck. The space complexity is also improved in Solution 2. The Trie efficiently handles the overlapping prefixes and suffixes.