{x}
blog image

Groups of Strings

You are given a 0-indexed array of strings words. Each string consists of lowercase English letters only. No letter occurs more than once in any string of words.

Two strings s1 and s2 are said to be connected if the set of letters of s2 can be obtained from the set of letters of s1 by any one of the following operations:

  • Adding exactly one letter to the set of the letters of s1.
  • Deleting exactly one letter from the set of the letters of s1.
  • Replacing exactly one letter from the set of the letters of s1 with any letter, including itself.

The array words can be divided into one or more non-intersecting groups. A string belongs to a group if any one of the following is true:

  • It is connected to at least one other string of the group.
  • It is the only string present in the group.

Note that the strings in words should be grouped in such a manner that a string belonging to a group cannot be connected to a string present in any other group. It can be proved that such an arrangement is always unique.

Return an array ans of size 2 where:

  • ans[0] is the maximum number of groups words can be divided into, and
  • ans[1] is the size of the largest group.

 

Example 1:

Input: words = ["a","b","ab","cde"]
Output: [2,3]
Explanation:
- words[0] can be used to obtain words[1] (by replacing 'a' with 'b'), and words[2] (by adding 'b'). So words[0] is connected to words[1] and words[2].
- words[1] can be used to obtain words[0] (by replacing 'b' with 'a'), and words[2] (by adding 'a'). So words[1] is connected to words[0] and words[2].
- words[2] can be used to obtain words[0] (by deleting 'b'), and words[1] (by deleting 'a'). So words[2] is connected to words[0] and words[1].
- words[3] is not connected to any string in words.
Thus, words can be divided into 2 groups ["a","b","ab"] and ["cde"]. The size of the largest group is 3.  

Example 2:

Input: words = ["a","ab","abc"]
Output: [1,3]
Explanation:
- words[0] is connected to words[1].
- words[1] is connected to words[0] and words[2].
- words[2] is connected to words[1].
Since all strings are connected to each other, they should be grouped together.
Thus, the size of the largest group is 3.

 

Constraints:

  • 1 <= words.length <= 2 * 104
  • 1 <= words[i].length <= 26
  • words[i] consists of lowercase English letters only.
  • No letter occurs more than once in words[i].

Solution Explanation

This problem involves grouping strings based on the similarity of their constituent characters. Two strings are considered connected if their character sets differ by at most one element (addition, deletion, or replacement). The goal is to find the maximum number of groups and the size of the largest group.

The optimal solution uses a combination of bit manipulation and union-find.

1. Bit Manipulation for Character Sets:

Each string is represented by a bitmask (integer). Each bit in the bitmask corresponds to a letter in the alphabet (a=0, b=1, ..., z=25). If a letter is present in the string, the corresponding bit is set to 1; otherwise, it's 0. This allows efficient checking for connections between strings using bitwise operations.

2. Union-Find for Grouping:

The union-find algorithm is used to efficiently merge connected strings into groups. Each string's bitmask acts as its initial "group ID." The find function determines the root (representative) of a group, and the union function merges two groups by setting the root of one group to be the root of the other.

3. Connection Check:

Determining whether two strings are connected involves checking if their bitmasks differ by at most one bit.

  • Addition/Deletion: A single bit difference indicates an addition or deletion of a character.
  • Replacement: A two-bit difference where one bit is set in the first string and unset in the second, and vice-versa, indicates a replacement.

4. Algorithm:

  1. Initialization: Create a union-find data structure where each string's bitmask is its initial parent (root). Initialize a counter for group sizes.

  2. Bitmask Representation: Convert each string to its bitmask representation.

  3. Initial Union: For each string, if its size is greater than 1, decrease the count of groups.

  4. Connectivity Check and Union: Iterate through all strings and their corresponding bitmasks. For each string, check its connections with all other possible strings. If two strings are connected, perform a union operation on their groups.

  5. Result: After processing all string pairs, the number of groups is given by the remaining number of roots in the union-find structure, and the maximum group size is the maximum size found during the union operation.

Time Complexity Analysis:

  • Bitmask Creation: O(n*k) - where n is number of words and k is maximum length of a word
  • Union-Find Operations: The union-find operations (find and union) have an amortized time complexity of nearly O(α(n)) where α is the inverse Ackermann function which grows extremely slowly. In practice, this is considered O(1). The nested loops that check for connections contribute O(n * 26 * 26) in the worst case (checking all pairs and all bit changes).

Therefore, the overall time complexity is dominated by the nested loops checking for connections, resulting in O(n * 26^2).

Space Complexity Analysis:

The space complexity is O(n) due to storing the bitmasks, parent array (in union-find), and group sizes in the hashmaps.

Code Examples (Python):

from collections import Counter
 
class Solution:
    def groupStrings(self, words: List[str]) -> List[int]:
        def find(x):
            if p[x] != x:
                p[x] = find(p[x])
            return p[x]
 
        def union(a, b):
            nonlocal mx, n
            if b not in p:
                return
            pa, pb = find(a), find(b)
            if pa == pb:
                return
            p[pa] = pb
            size[pb] += size[pa]
            mx = max(mx, size[pb])
            n -= 1
 
        p = {}
        size = Counter()
        n = len(words)
        mx = 0
        for word in words:
            x = 0
            for c in word:
                x |= 1 << (ord(c) - ord('a'))
            p[x] = x
            size[x] += 1
            mx = max(mx, size[x])
            if size[x] > 1:
                n -= 1
        for x in p.keys():
            for i in range(26):
                union(x, x ^ (1 << i))
                if (x >> i) & 1:
                    for j in range(26):
                        if ((x >> j) & 1) == 0:
                            union(x, x ^ (1 << i) | (1 << j))
        return [n, mx]

The other languages (Java, C++, Go) follow a similar structure, using their respective data structures for hash maps and union-find. The core logic remains the same.