Given an integer array nums
and an integer k
, return the number of good subarrays of nums
.
A good array is an array where the number of different integers in that array is exactly k
.
[1,2,3,1,2]
has 3
different integers: 1
, 2
, and 3
.A subarray is a contiguous part of an array.
Example 1:
Input: nums = [1,2,1,2,3], k = 2 Output: 7 Explanation: Subarrays formed with exactly 2 different integers: [1,2], [2,1], [1,2], [2,3], [1,2,1], [2,1,2], [1,2,1,2]
Example 2:
Input: nums = [1,2,1,3,4], k = 3 Output: 3 Explanation: Subarrays formed with exactly 3 different integers: [1,2,1,3], [2,1,3], [1,3,4].
Constraints:
1 <= nums.length <= 2 * 104
1 <= nums[i], k <= nums.length
This problem asks to find the number of subarrays containing exactly k distinct integers. The efficient solution uses a sliding window approach combined with a clever technique to avoid redundant calculations.
Core Idea:
Instead of directly counting subarrays with exactly k
distinct integers, the solution leverages the principle of inclusion-exclusion. It first finds the number of subarrays with at most k
distinct integers and then subtracts the number of subarrays with at most k-1
distinct integers. The difference gives the precise count of subarrays with exactly k
distinct integers.
Algorithm:
The f(k)
function is the core of the solution. It takes the input array nums
and an integer k
and returns an array pos
where pos[i]
represents the starting index of the smallest subarray ending at index i
that contains at most k
distinct integers. The algorithm uses a sliding window to efficiently determine this.
Initialization: A cnt
array (or a hash map in languages like Python) keeps track of the frequency of each integer within the current window. s
counts the number of distinct integers in the window. j
is the left pointer of the window.
Sliding Window: The outer loop iterates through the array. For each element:
cnt
. If the count becomes 1, it's a new distinct integer, so increment s
.s
) exceeds k
, shrink the window from the left:
nums[j]
). If its count becomes 0, it's no longer in the window, so decrement s
.j
to move the left pointer.pos[i]
is set to j
, representing the starting index of the minimum-length subarray ending at index i
with at most k
distinct integers.Inclusion-Exclusion: The main function then calls f(k)
and f(k-1)
. The difference between corresponding elements of the resulting arrays (pos
from f(k-1)
and f(k)
) gives the number of subarrays ending at that index that have exactly k
distinct integers. Summing these differences gives the total number of such subarrays.
Time Complexity Analysis:
The f(k)
function iterates through the array once. The inner loop (shrinking the window) also iterates at most through the array. Therefore, the time complexity of f(k)
is O(N), where N is the length of nums
. The main function calls f(k)
twice, so the overall time complexity is O(N).
Space Complexity Analysis:
The space complexity is dominated by the cnt
array (or hash map) and pos
array, both of which have a size proportional to the length of the input array. Thus, the space complexity is O(N).
Code Explanation (Python):
The Python code directly implements the algorithm described above. The Counter
object from the collections
module is used for efficient frequency counting. The list comprehension elegantly calculates the sum of differences between the pos
arrays.
Code Explanation (Java, C++, Go):
The Java, C++, and Go codes follow the same algorithm but use different data structures (arrays and manual counting instead of Counter
). The core logic of the sliding window remains identical across all languages. The memset
function in C++ is used for efficient array initialization.
This solution is highly efficient due to the linear time complexity, making it suitable for handling large input arrays. The use of inclusion-exclusion cleverly avoids redundant calculations that a more naive approach might suffer from.