{x}
blog image

Design Search Autocomplete System

Solution Explanation:

This problem requires designing an autocomplete system. The core idea is to use a Trie data structure to efficiently store and search sentences based on prefixes. A priority queue is used to manage the top 3 hot sentences.

Data Structures:

  • Trie: A Trie (prefix tree) is used to store the sentences and their frequencies. Each node in the Trie represents a character in a sentence. The paths from the root to the leaf nodes represent complete sentences. Each node stores:

    • children: An array of child nodes, representing the next characters in the sentences. The size of the array is 27 (26 lowercase letters + space).
    • v: The hotness (frequency) of the sentence ending at this node.
    • w: The sentence string (if it's a leaf node).
  • PriorityQueue (Min-Heap): A min-heap is used to maintain the top 3 hottest sentences encountered during the search. The priority is based on the frequency (v) of the sentence. If two sentences have the same frequency, they are sorted lexicographically.

Algorithm:

  1. Initialization (AutocompleteSystem constructor):

    • Creates an empty Trie.
    • Iterates through the input sentences and times arrays. For each sentence and its frequency, it inserts the sentence into the Trie using the insert function of the Trie.
  2. Input Processing (input function):

    • Handling #: If the input character c is #, it means the end of a sentence. The current sentence (built in the t StringBuilder) is inserted into the Trie with a frequency of 1, and the t is reset for the next sentence. An empty list is returned.

    • Handling other characters: If c is not #, it's appended to the t StringBuilder (the current prefix of the sentence). Then, the Trie is searched for all sentences matching this prefix using the Trie's search function.

    • Searching and Sorting: A Depth-First Search (dfs in Java, implicit in Python) is performed on the Trie node corresponding to the current prefix to collect all sentences with the prefix and their frequencies. These are stored in a PriorityQueue (q in Java), which keeps only the top 3 based on frequency (highest frequency first, then lexicographical order).

    • Return Value: The function returns a list of the top 3 (or fewer) sentences retrieved from the priority queue.

Time and Space Complexity:

  • Time Complexity:

    • Initialization: O(M), where M is the total number of characters in all input sentences. This is due to inserting each sentence into the Trie.
    • input function: The worst-case time complexity is O(N), where N is the maximum number of sentences in the Trie that match the given prefix (though in practice, it's significantly less due to the early termination of the search if the priority queue is full). The dfs has a time complexity proportional to the number of nodes in the Trie that match the prefix, and the sorting of the priority queue has a time complexity of O(log k), where k is at most 3.
  • Space Complexity:

    • O(M) to store the Trie, where M is the total number of characters in all sentences. The space complexity of the priority queue is O(1), as it keeps only up to 3 elements.

Code Examples (Python and Java):

The code examples provided in the original response clearly implement this algorithm. The Python code uses a more concise approach for the Trie implementation and search, while the Java code explicitly uses a PriorityQueue for managing the top 3 sentences. Both achieve the same functionality with a similar time and space complexity.