{x}
blog image

Design HashSet

Design a HashSet without using any built-in hash table libraries.

Implement MyHashSet class:

  • void add(key) Inserts the value key into the HashSet.
  • bool contains(key) Returns whether the value key exists in the HashSet or not.
  • void remove(key) Removes the value key in the HashSet. If key does not exist in the HashSet, do nothing.

 

Example 1:

Input
["MyHashSet", "add", "add", "contains", "contains", "add", "contains", "remove", "contains"]
[[], [1], [2], [1], [3], [2], [2], [2], [2]]
Output
[null, null, null, true, false, null, true, null, false]

Explanation
MyHashSet myHashSet = new MyHashSet();
myHashSet.add(1);      // set = [1]
myHashSet.add(2);      // set = [1, 2]
myHashSet.contains(1); // return True
myHashSet.contains(3); // return False, (not found)
myHashSet.add(2);      // set = [1, 2]
myHashSet.contains(2); // return True
myHashSet.remove(2);   // set = [1]
myHashSet.contains(2); // return False, (already removed)

 

Constraints:

  • 0 <= key <= 106
  • At most 104 calls will be made to add, remove, and contains.

Solution Explanation for Design HashSet

This problem asks to design a HashSet without using built-in hash table libraries. We'll explore two solutions: a simple static array approach and a more sophisticated approach using an array of linked lists.

Solution 1: Static Array

This solution is straightforward and efficient for a limited key range. We utilize a boolean array where the index represents the key, and the value indicates whether the key exists in the set (true) or not (false).

Algorithm:

  1. Initialization: Create a boolean array data of size 1000001 (to accommodate keys from 0 to 106), initialized with all false values.

  2. add(key): Set data[key] to true.

  3. remove(key): Set data[key] to false.

  4. contains(key): Return data[key].

Time Complexity: O(1) for all operations (add, remove, contains). This is because array access is constant time.

Space Complexity: O(106) or O(1), depending on whether we consider the array size to be constant or not. Since the array size is fixed and does not depend on the input, it's often considered O(1) in this context.

Solution 2: Array of Linked Lists (Separate Chaining)

This solution handles potential collisions more gracefully than the static array approach, especially if the key range is much larger or the number of elements in the HashSet is significant.

Algorithm:

  1. Initialization: Create an array data of size SIZE (e.g., 1000), where each element is a linked list. This SIZE determines the number of "buckets" in our hash table.

  2. hash(key): This function maps a key to an index in the data array using the modulo operator (key % SIZE). This is a simple hash function; more sophisticated ones could be used for better distribution.

  3. add(key): Compute the index idx = hash(key). Check if key already exists in data[idx] (linked list at index idx). If not, append key to data[idx].

  4. remove(key): Compute idx = hash(key). Search for key in data[idx] and remove it if found.

  5. contains(key): Compute idx = hash(key). Search for key in data[idx]. Return true if found, otherwise false.

Time Complexity:

  • Average Case: O(1) for add, remove, and contains. This assumes a relatively uniform distribution of keys across the buckets.
  • Worst Case: O(n) for all operations. This occurs if all keys hash to the same bucket (resulting in a single, very long linked list).

Space Complexity: O(n), where n is the number of elements in the HashSet. This is because we need to store the elements themselves in the linked lists. The array's size is constant.

Choosing between Solutions:

  • For a small, fixed key range (like 0-106 in this problem) and a relatively small number of elements, the static array approach (Solution 1) is simpler and faster on average.
  • For larger key ranges or a potentially large number of elements, Solution 2 (array of linked lists) provides better handling of collisions and avoids the worst-case scenario of a single massive linked list. However, it has a slightly higher space overhead.

The provided code examples in Python, Java, C++, Go, and TypeScript demonstrate both approaches. Remember to choose the solution that best suits your specific needs and constraints.