{x}
blog image

Number of Good Ways to Split a String

You are given a string s.

A split is called good if you can split s into two non-empty strings sleft and sright where their concatenation is equal to s (i.e., sleft + sright = s) and the number of distinct letters in sleft and sright is the same.

Return the number of good splits you can make in s.

 

Example 1:

Input: s = "aacaba"
Output: 2
Explanation: There are 5 ways to split "aacaba" and 2 of them are good. 
("a", "acaba") Left string and right string contains 1 and 3 different letters respectively.
("aa", "caba") Left string and right string contains 1 and 3 different letters respectively.
("aac", "aba") Left string and right string contains 2 and 2 different letters respectively (good split).
("aaca", "ba") Left string and right string contains 2 and 2 different letters respectively (good split).
("aacab", "a") Left string and right string contains 3 and 1 different letters respectively.

Example 2:

Input: s = "abcd"
Output: 1
Explanation: Split the string as follows ("ab", "cd").

 

Constraints:

  • 1 <= s.length <= 105
  • s consists of only lowercase English letters.

Solution Explanation for Number of Good Ways to Split a String

This problem asks to find the number of ways to split a string s into two non-empty substrings, s_left and s_right, such that the number of distinct characters in s_left equals the number of distinct characters in s_right.

Approach

The optimal approach utilizes the properties of sets and counters to efficiently track distinct characters. Instead of iterating through all possible splits (which would be O(n^2)), we use a single pass through the string.

We maintain two data structures:

  1. cnt (Counter/HashMap): Stores the frequency of each character in the entire string s. This is pre-calculated for efficiency.

  2. vis (Set/HashSet): Tracks the distinct characters encountered so far in the left substring (s_left).

The algorithm iterates through the string character by character:

  • For each character c:
    • Add c to vis.
    • Decrement the count of c in cnt. If the count becomes 0, remove c from cnt.
    • Check if the size of vis (distinct characters in s_left) is equal to the size of cnt (distinct characters remaining in s_right). If they are equal, this represents a "good split," so increment the ans counter.

Time and Space Complexity

  • Time Complexity: O(n), where n is the length of the string. We iterate through the string once. The operations on the cnt and vis data structures (insertion, deletion, size check) are all amortized O(1) on average for hash-based implementations.

  • Space Complexity: O(k), where k is the number of distinct characters in the string. In the worst case, k could be equal to n (if all characters are unique), but usually, k will be much smaller than n. The space is primarily used to store cnt and vis.

Code Implementation (Python)

from collections import Counter
 
class Solution:
    def numSplits(self, s: str) -> int:
        cnt = Counter(s)  # Count character frequencies
        vis = set()       # Track distinct characters in left substring
        ans = 0           # Count of good splits
 
        for c in s:
            vis.add(c)            # Add character to left substring's distinct set
            cnt[c] -= 1           # Decrement character count
            if cnt[c] == 0:      # Remove if count reaches zero
                del cnt[c]
            if len(vis) == len(cnt):  # Check for good split
                ans += 1
        return ans
 

The implementations in Java, C++, and Go follow a similar structure, using their respective hash map and set equivalents. The core logic remains the same, ensuring linear time complexity.