You are given a 0-indexed array of strings words
. Each string consists of lowercase English letters only. No letter occurs more than once in any string of words
.
Two strings s1
and s2
are said to be connected if the set of letters of s2
can be obtained from the set of letters of s1
by any one of the following operations:
s1
.s1
.s1
with any letter, including itself.The array words
can be divided into one or more non-intersecting groups. A string belongs to a group if any one of the following is true:
Note that the strings in words
should be grouped in such a manner that a string belonging to a group cannot be connected to a string present in any other group. It can be proved that such an arrangement is always unique.
Return an array ans
of size 2
where:
ans[0]
is the maximum number of groups words
can be divided into, andans[1]
is the size of the largest group.
Example 1:
Input: words = ["a","b","ab","cde"] Output: [2,3] Explanation: - words[0] can be used to obtain words[1] (by replacing 'a' with 'b'), and words[2] (by adding 'b'). So words[0] is connected to words[1] and words[2]. - words[1] can be used to obtain words[0] (by replacing 'b' with 'a'), and words[2] (by adding 'a'). So words[1] is connected to words[0] and words[2]. - words[2] can be used to obtain words[0] (by deleting 'b'), and words[1] (by deleting 'a'). So words[2] is connected to words[0] and words[1]. - words[3] is not connected to any string in words. Thus, words can be divided into 2 groups ["a","b","ab"] and ["cde"]. The size of the largest group is 3.
Example 2:
Input: words = ["a","ab","abc"] Output: [1,3] Explanation: - words[0] is connected to words[1]. - words[1] is connected to words[0] and words[2]. - words[2] is connected to words[1]. Since all strings are connected to each other, they should be grouped together. Thus, the size of the largest group is 3.
Constraints:
1 <= words.length <= 2 * 104
1 <= words[i].length <= 26
words[i]
consists of lowercase English letters only.words[i]
.This problem involves grouping strings based on the similarity of their constituent characters. Two strings are considered connected if their character sets differ by at most one element (addition, deletion, or replacement). The goal is to find the maximum number of groups and the size of the largest group.
The optimal solution uses a combination of bit manipulation and union-find.
1. Bit Manipulation for Character Sets:
Each string is represented by a bitmask (integer). Each bit in the bitmask corresponds to a letter in the alphabet (a=0, b=1, ..., z=25). If a letter is present in the string, the corresponding bit is set to 1; otherwise, it's 0. This allows efficient checking for connections between strings using bitwise operations.
2. Union-Find for Grouping:
The union-find algorithm is used to efficiently merge connected strings into groups. Each string's bitmask acts as its initial "group ID." The find
function determines the root (representative) of a group, and the union
function merges two groups by setting the root of one group to be the root of the other.
3. Connection Check:
Determining whether two strings are connected involves checking if their bitmasks differ by at most one bit.
4. Algorithm:
Initialization: Create a union-find data structure where each string's bitmask is its initial parent (root). Initialize a counter for group sizes.
Bitmask Representation: Convert each string to its bitmask representation.
Initial Union: For each string, if its size is greater than 1, decrease the count of groups.
Connectivity Check and Union: Iterate through all strings and their corresponding bitmasks. For each string, check its connections with all other possible strings. If two strings are connected, perform a union operation on their groups.
Result: After processing all string pairs, the number of groups is given by the remaining number of roots in the union-find structure, and the maximum group size is the maximum size found during the union operation.
Time Complexity Analysis:
Therefore, the overall time complexity is dominated by the nested loops checking for connections, resulting in O(n * 26^2).
Space Complexity Analysis:
The space complexity is O(n) due to storing the bitmasks, parent array (in union-find), and group sizes in the hashmaps.
Code Examples (Python):
from collections import Counter
class Solution:
def groupStrings(self, words: List[str]) -> List[int]:
def find(x):
if p[x] != x:
p[x] = find(p[x])
return p[x]
def union(a, b):
nonlocal mx, n
if b not in p:
return
pa, pb = find(a), find(b)
if pa == pb:
return
p[pa] = pb
size[pb] += size[pa]
mx = max(mx, size[pb])
n -= 1
p = {}
size = Counter()
n = len(words)
mx = 0
for word in words:
x = 0
for c in word:
x |= 1 << (ord(c) - ord('a'))
p[x] = x
size[x] += 1
mx = max(mx, size[x])
if size[x] > 1:
n -= 1
for x in p.keys():
for i in range(26):
union(x, x ^ (1 << i))
if (x >> i) & 1:
for j in range(26):
if ((x >> j) & 1) == 0:
union(x, x ^ (1 << i) | (1 << j))
return [n, mx]
The other languages (Java, C++, Go) follow a similar structure, using their respective data structures for hash maps and union-find. The core logic remains the same.