{x}
blog image

Subarrays with K Different Integers

Given an integer array nums and an integer k, return the number of good subarrays of nums.

A good array is an array where the number of different integers in that array is exactly k.

  • For example, [1,2,3,1,2] has 3 different integers: 1, 2, and 3.

A subarray is a contiguous part of an array.

 

Example 1:

Input: nums = [1,2,1,2,3], k = 2
Output: 7
Explanation: Subarrays formed with exactly 2 different integers: [1,2], [2,1], [1,2], [2,3], [1,2,1], [2,1,2], [1,2,1,2]

Example 2:

Input: nums = [1,2,1,3,4], k = 3
Output: 3
Explanation: Subarrays formed with exactly 3 different integers: [1,2,1,3], [2,1,3], [1,3,4].

 

Constraints:

  • 1 <= nums.length <= 2 * 104
  • 1 <= nums[i], k <= nums.length

Solution Explanation for Subarrays with K Different Integers

This problem asks to find the number of subarrays containing exactly k distinct integers. The efficient solution uses a sliding window approach combined with a clever technique to avoid redundant calculations.

Core Idea:

Instead of directly counting subarrays with exactly k distinct integers, the solution leverages the principle of inclusion-exclusion. It first finds the number of subarrays with at most k distinct integers and then subtracts the number of subarrays with at most k-1 distinct integers. The difference gives the precise count of subarrays with exactly k distinct integers.

Algorithm:

The f(k) function is the core of the solution. It takes the input array nums and an integer k and returns an array pos where pos[i] represents the starting index of the smallest subarray ending at index i that contains at most k distinct integers. The algorithm uses a sliding window to efficiently determine this.

  1. Initialization: A cnt array (or a hash map in languages like Python) keeps track of the frequency of each integer within the current window. s counts the number of distinct integers in the window. j is the left pointer of the window.

  2. Sliding Window: The outer loop iterates through the array. For each element:

    • Increment its count in cnt. If the count becomes 1, it's a new distinct integer, so increment s.
    • While the number of distinct integers (s) exceeds k, shrink the window from the left:
      • Decrement the count of the leftmost element (nums[j]). If its count becomes 0, it's no longer in the window, so decrement s.
      • Increment j to move the left pointer.
    • Finally, pos[i] is set to j, representing the starting index of the minimum-length subarray ending at index i with at most k distinct integers.
  3. Inclusion-Exclusion: The main function then calls f(k) and f(k-1). The difference between corresponding elements of the resulting arrays (pos from f(k-1) and f(k)) gives the number of subarrays ending at that index that have exactly k distinct integers. Summing these differences gives the total number of such subarrays.

Time Complexity Analysis:

The f(k) function iterates through the array once. The inner loop (shrinking the window) also iterates at most through the array. Therefore, the time complexity of f(k) is O(N), where N is the length of nums. The main function calls f(k) twice, so the overall time complexity is O(N).

Space Complexity Analysis:

The space complexity is dominated by the cnt array (or hash map) and pos array, both of which have a size proportional to the length of the input array. Thus, the space complexity is O(N).

Code Explanation (Python):

The Python code directly implements the algorithm described above. The Counter object from the collections module is used for efficient frequency counting. The list comprehension elegantly calculates the sum of differences between the pos arrays.

Code Explanation (Java, C++, Go):

The Java, C++, and Go codes follow the same algorithm but use different data structures (arrays and manual counting instead of Counter). The core logic of the sliding window remains identical across all languages. The memset function in C++ is used for efficient array initialization.

This solution is highly efficient due to the linear time complexity, making it suitable for handling large input arrays. The use of inclusion-exclusion cleverly avoids redundant calculations that a more naive approach might suffer from.