{x}
blog image

Similar String Groups

Two strings, X and Y, are considered similar if either they are identical or we can make them equivalent by swapping at most two letters (in distinct positions) within the string X.

For example, "tars" and "rats" are similar (swapping at positions 0 and 2), and "rats" and "arts" are similar, but "star" is not similar to "tars", "rats", or "arts".

Together, these form two connected groups by similarity: {"tars", "rats", "arts"} and {"star"}.  Notice that "tars" and "arts" are in the same group even though they are not similar.  Formally, each group is such that a word is in the group if and only if it is similar to at least one other word in the group.

We are given a list strs of strings where every string in strs is an anagram of every other string in strs. How many groups are there?

 

Example 1:

Input: strs = ["tars","rats","arts","star"]
Output: 2

Example 2:

Input: strs = ["omv","ovm"]
Output: 1

 

Constraints:

  • 1 <= strs.length <= 300
  • 1 <= strs[i].length <= 300
  • strs[i] consists of lowercase letters only.
  • All words in strs have the same length and are anagrams of each other.

Solution Explanation:

This problem asks to find the number of groups of similar strings, where similarity is defined as either being identical or differing by at most two character swaps. The input is a list of strings, all of which are anagrams of each other. This suggests using a Union-Find data structure to efficiently track connected components (groups).

Approach:

  1. Union-Find Data Structure: A Union-Find (disjoint-set) data structure is ideal for this problem. It maintains a collection of disjoint sets and provides efficient operations for finding the set a given element belongs to (find) and merging two sets (union).

  2. Similarity Check: For each pair of strings, we check if they are similar. This is done by comparing the strings character by character and counting the number of differing characters. If the difference is 0 or 2, the strings are similar and we perform a union operation in our Union-Find structure.

  3. Counting Groups: Initially, each string is in its own group. After iterating through all pairs and performing unions, the number of connected components in the Union-Find structure represents the number of similar string groups.

Time Complexity:

The dominant part of the algorithm is the nested loop comparing pairs of strings. This takes O(n^2 * m) time, where n is the number of strings and m is the length of each string. The Union-Find operations (find and union) have amortized time complexity of almost O(1) due to path compression and union by rank optimizations, so they don't significantly impact the overall time complexity.

Therefore, the overall time complexity is O(n^2 * m).

Space Complexity:

The space complexity is dominated by the Union-Find data structure, which requires O(n) space to store the parent pointers and potentially rank information. Therefore, the space complexity is O(n).

Code Explanation (Python):

The Python code implements the solution using a custom UnionFind class. Let's break down the important parts:

class UnionFind:
    def __init__(self, n):
        self.p = list(range(n)) # parent array: initially each element is its own parent
        self.size = [1] * n     # size of each set (initially 1)
 
    def find(self, x):
        if self.p[x] != x:
            self.p[x] = self.find(self.p[x]) # path compression for efficiency
        return self.p[x]
 
    def union(self, a, b):
        pa, pb = self.find(a), self.find(b)
        if pa == pb:
            return False # already in the same set
        if self.size[pa] > self.size[pb]:
            self.p[pb] = pa
            self.size[pa] += self.size[pb]
        else:
            self.p[pa] = pb
            self.size[pb] += self.size[pa]
        return True # union successful

The numSimilarGroups function iterates through pairs of strings, checks similarity, and performs unions using the UnionFind object. Finally, the number of connected components (which is initially the number of strings, and gets decremented each time a union occurs), represents the answer. The other language implementations follow a very similar structure.

Example (Python):

For the input strs = ["tars", "rats", "arts", "star"], the algorithm would perform the following (simplified):

  1. Compare "tars" and "rats": Similar (one swap), union them.
  2. Compare "tars" and "arts": Similar (one swap), union them (all three are now in one set).
  3. Compare "tars" and "star": Not similar.
  4. Compare "rats" and "arts": Similar. (Already in same set)
  5. Compare "rats" and "star": Not similar.
  6. Compare "arts" and "star": Not similar.

Finally, it would detect two connected components (groups): {"tars", "rats", "arts"} and {"star"}, resulting in the output 2.