This problem asks to find the length of the longest substring containing at most k distinct characters. The optimal approach uses a sliding window technique combined with a hash table (or dictionary) to efficiently track the distinct characters within the window.
Algorithm:
Initialization:
l
: A pointer representing the left boundary of the sliding window, initialized to 0.cnt
: A hash table (or dictionary) to store the frequency of each character within the current window. The keys are characters, and the values are their counts.Sliding Window Iteration:
s
character by character.c
:
c
in the cnt
hash table (cnt[c] += 1
).cnt
(i.e., len(cnt)
in Python, cnt.size()
in Java, etc.) exceeds k
:
k
distinct characters. We need to shrink the window from the left.s[l]
) in cnt
.s[l]
becomes 0 after decrementing, remove it from cnt
(because it's no longer in the window).l
one position to the right (l += 1
).Return Result: After iterating through the entire string, the length of the longest substring with at most k
distinct characters is len(s) - l
. The l
pointer marks the start of the remaining substring that violates the condition; everything to its right forms the longest valid substring.
Time Complexity: O(n), where n is the length of the string. Each character is processed at most twice (once when it's added to the window and once when it might be removed).
Space Complexity: O(k), where k is the maximum number of distinct characters allowed. The space used by the hash table cnt
is proportional to the number of distinct characters in the window, which is at most k.
Code Examples (with explanations inline):
Python:
from collections import Counter
class Solution:
def lengthOfLongestSubstringKDistinct(self, s: str, k: int) -> int:
l = 0 # Left pointer of the sliding window
cnt = Counter() # Hash table to count character frequencies
for r, c in enumerate(s): # Iterate with index (r) for easy access to s[l]
cnt[c] += 1 # Add character to the window
while len(cnt) > k: # If too many distinct characters
cnt[s[l]] -= 1 # Remove character from left
if cnt[s[l]] == 0:
del cnt[s[l]] # Remove from dictionary if count becomes 0
l += 1 # Move left pointer
return len(s) - l # Length of the longest valid substring
Java:
import java.util.HashMap;
import java.util.Map;
class Solution {
public int lengthOfLongestSubstringKDistinct(String s, int k) {
int l = 0; // Left pointer
Map<Character, Integer> cnt = new HashMap<>(); // Character frequency map
for (int r = 0; r < s.length(); ++r) {
char c = s.charAt(r);
cnt.put(c, cnt.getOrDefault(c, 0) + 1); // Add character
while (cnt.size() > k) { // Too many distinct characters
char leftChar = s.charAt(l);
cnt.put(leftChar, cnt.get(leftChar) - 1); // Remove from left
if (cnt.get(leftChar) == 0) {
cnt.remove(leftChar); //Remove if count becomes 0
}
l++; // Move left pointer
}
}
return s.length() - l; // Result
}
}
Other languages (C++, Go, TypeScript) follow a very similar structure, adapting the syntax and data structures appropriately. The core algorithmic idea remains the same.