{x}
blog image

Longest Substring with At Most K Distinct Characters

Solution Explanation: Longest Substring with At Most K Distinct Characters

This problem asks to find the length of the longest substring containing at most k distinct characters. The optimal approach uses a sliding window technique combined with a hash table (or dictionary) to efficiently track the distinct characters within the window.

Algorithm:

  1. Initialization:

    • l: A pointer representing the left boundary of the sliding window, initialized to 0.
    • cnt: A hash table (or dictionary) to store the frequency of each character within the current window. The keys are characters, and the values are their counts.
  2. Sliding Window Iteration:

    • The code iterates through the input string s character by character.
    • For each character c:
      • Increment Count: Increment the count of c in the cnt hash table (cnt[c] += 1).
      • Check Distinct Characters: If the number of distinct characters in cnt (i.e., len(cnt) in Python, cnt.size() in Java, etc.) exceeds k:
        • Shrink Window: This means the current window has more than k distinct characters. We need to shrink the window from the left.
        • Decrement the count of the character at the left boundary (s[l]) in cnt.
        • If the count of s[l] becomes 0 after decrementing, remove it from cnt (because it's no longer in the window).
        • Move the left boundary l one position to the right (l += 1).
  3. Return Result: After iterating through the entire string, the length of the longest substring with at most k distinct characters is len(s) - l. The l pointer marks the start of the remaining substring that violates the condition; everything to its right forms the longest valid substring.

Time Complexity: O(n), where n is the length of the string. Each character is processed at most twice (once when it's added to the window and once when it might be removed).

Space Complexity: O(k), where k is the maximum number of distinct characters allowed. The space used by the hash table cnt is proportional to the number of distinct characters in the window, which is at most k.

Code Examples (with explanations inline):

Python:

from collections import Counter
 
class Solution:
    def lengthOfLongestSubstringKDistinct(self, s: str, k: int) -> int:
        l = 0  # Left pointer of the sliding window
        cnt = Counter()  # Hash table to count character frequencies
        for r, c in enumerate(s):  # Iterate with index (r) for easy access to s[l]
            cnt[c] += 1  # Add character to the window
            while len(cnt) > k:  # If too many distinct characters
                cnt[s[l]] -= 1  # Remove character from left
                if cnt[s[l]] == 0:
                    del cnt[s[l]]  # Remove from dictionary if count becomes 0
                l += 1  # Move left pointer
        return len(s) - l  # Length of the longest valid substring
 

Java:

import java.util.HashMap;
import java.util.Map;
 
class Solution {
    public int lengthOfLongestSubstringKDistinct(String s, int k) {
        int l = 0; // Left pointer
        Map<Character, Integer> cnt = new HashMap<>(); // Character frequency map
        for (int r = 0; r < s.length(); ++r) {
            char c = s.charAt(r);
            cnt.put(c, cnt.getOrDefault(c, 0) + 1); // Add character
            while (cnt.size() > k) { // Too many distinct characters
                char leftChar = s.charAt(l);
                cnt.put(leftChar, cnt.get(leftChar) - 1); // Remove from left
                if (cnt.get(leftChar) == 0) {
                    cnt.remove(leftChar); //Remove if count becomes 0
                }
                l++; // Move left pointer
            }
        }
        return s.length() - l; // Result
    }
}

Other languages (C++, Go, TypeScript) follow a very similar structure, adapting the syntax and data structures appropriately. The core algorithmic idea remains the same.