You are given a string s
.
A split is called good if you can split s
into two non-empty strings sleft
and sright
where their concatenation is equal to s
(i.e., sleft + sright = s
) and the number of distinct letters in sleft
and sright
is the same.
Return the number of good splits you can make in s
.
Example 1:
Input: s = "aacaba"
Output: 2
Explanation: There are 5 ways to split "aacaba"
and 2 of them are good.
("a", "acaba") Left string and right string contains 1 and 3 different letters respectively.
("aa", "caba") Left string and right string contains 1 and 3 different letters respectively.
("aac", "aba") Left string and right string contains 2 and 2 different letters respectively (good split).
("aaca", "ba") Left string and right string contains 2 and 2 different letters respectively (good split).
("aacab", "a") Left string and right string contains 3 and 1 different letters respectively.
Example 2:
Input: s = "abcd" Output: 1 Explanation: Split the string as follows ("ab", "cd").
Constraints:
1 <= s.length <= 105
s
consists of only lowercase English letters.This problem asks to find the number of ways to split a string s
into two non-empty substrings, s_left
and s_right
, such that the number of distinct characters in s_left
equals the number of distinct characters in s_right
.
The optimal approach utilizes the properties of sets and counters to efficiently track distinct characters. Instead of iterating through all possible splits (which would be O(n^2)), we use a single pass through the string.
We maintain two data structures:
cnt
(Counter/HashMap): Stores the frequency of each character in the entire string s
. This is pre-calculated for efficiency.
vis
(Set/HashSet): Tracks the distinct characters encountered so far in the left substring (s_left
).
The algorithm iterates through the string character by character:
c
:
c
to vis
.c
in cnt
. If the count becomes 0, remove c
from cnt
.vis
(distinct characters in s_left
) is equal to the size of cnt
(distinct characters remaining in s_right
). If they are equal, this represents a "good split," so increment the ans
counter.Time Complexity: O(n), where n is the length of the string. We iterate through the string once. The operations on the cnt
and vis
data structures (insertion, deletion, size check) are all amortized O(1) on average for hash-based implementations.
Space Complexity: O(k), where k is the number of distinct characters in the string. In the worst case, k could be equal to n (if all characters are unique), but usually, k will be much smaller than n. The space is primarily used to store cnt
and vis
.
from collections import Counter
class Solution:
def numSplits(self, s: str) -> int:
cnt = Counter(s) # Count character frequencies
vis = set() # Track distinct characters in left substring
ans = 0 # Count of good splits
for c in s:
vis.add(c) # Add character to left substring's distinct set
cnt[c] -= 1 # Decrement character count
if cnt[c] == 0: # Remove if count reaches zero
del cnt[c]
if len(vis) == len(cnt): # Check for good split
ans += 1
return ans
The implementations in Java, C++, and Go follow a similar structure, using their respective hash map and set equivalents. The core logic remains the same, ensuring linear time complexity.