{x}
blog image

Longest Substring with At Least K Repeating Characters

Given a string s and an integer k, return the length of the longest substring of s such that the frequency of each character in this substring is greater than or equal to k.

if no such substring exists, return 0.

 

Example 1:

Input: s = "aaabb", k = 3
Output: 3
Explanation: The longest substring is "aaa", as 'a' is repeated 3 times.

Example 2:

Input: s = "ababbc", k = 2
Output: 5
Explanation: The longest substring is "ababb", as 'a' is repeated 2 times and 'b' is repeated 3 times.

 

Constraints:

  • 1 <= s.length <= 104
  • s consists of only lowercase English letters.
  • 1 <= k <= 105

Solution Explanation:

This problem asks for the longest substring where each character appears at least k times. A divide and conquer approach is efficient here.

Approach:

The core idea is a recursive function (dfs in the code examples) that operates on substrings.

  1. Character Frequency Count: For a given substring, it first counts the frequency of each character using a hash map (or array in the provided implementations).

  2. Splitting Condition: It checks if any character's frequency is less than k. If it finds such a character, this character becomes the "split" character. If no such character exists, it means all characters in the substring meet the criteria, and the substring's length is returned.

  3. Recursive Calls: If a "split" character is found, the algorithm splits the substring at each occurrence of the "split" character. Recursive calls to dfs are then made on the substrings between the occurrences of the split character. The longest substring found among these recursive calls is the result.

  4. Base Case: The recursion stops when a substring contains only characters with frequencies greater than or equal to k, or when the substring is empty.

Time Complexity Analysis:

The time complexity is not easily expressed in a simple Big O notation because it depends on the structure of the input string. In the worst case, the algorithm might examine every substring, leading to a complexity close to O(n^2), where n is the length of the string. However, in many practical scenarios, it performs much better due to early termination in the recursive steps. The repeated string slicing/substring creation in the Python implementation contributes to the overall time complexity.

Space Complexity Analysis:

The space complexity is dominated by the recursion depth. In the worst-case scenario where the string has many characters with frequencies less than k, the recursion depth could potentially be proportional to n. Therefore, the space complexity is O(n) in the worst case, mainly due to the recursive call stack. The hash map (or array) used to count character frequencies has a constant space complexity O(1) because the alphabet size is fixed (26 lowercase letters).

Code Explanation (Python):

The Python code uses a helper function dfs(l, r):

  • l and r are the indices representing the start and end of the current substring.
  • Counter(s[l : r + 1]) efficiently counts character frequencies.
  • next((c for c, v in cnt.items() if v < k), '') finds the first character with frequency less than k. If no such character is found, an empty string is returned.
  • The while loop iterates through the substring, splitting it at each occurrence of the "split" character.
  • Recursive calls to dfs process the substrings between split characters.
  • ans = max(ans, t) tracks the maximum length found so far.

The other language implementations (Java, C++, Go) follow the same algorithmic logic but use different syntax and data structures (arrays for frequency counting instead of Python's Counter). The core recursive divide-and-conquer strategy remains the same.