Given 2n
balls of k
distinct colors. You will be given an integer array balls
of size k
where balls[i]
is the number of balls of color i
.
All the balls will be shuffled uniformly at random, then we will distribute the first n
balls to the first box and the remaining n
balls to the other box (Please read the explanation of the second example carefully).
Please note that the two boxes are considered different. For example, if we have two balls of colors a
and b
, and two boxes []
and ()
, then the distribution [a] (b)
is considered different than the distribution [b] (a)
(Please read the explanation of the first example carefully).
Return the probability that the two boxes have the same number of distinct balls. Answers within 10-5
of the actual value will be accepted as correct.
Example 1:
Input: balls = [1,1] Output: 1.00000 Explanation: Only 2 ways to divide the balls equally: - A ball of color 1 to box 1 and a ball of color 2 to box 2 - A ball of color 2 to box 1 and a ball of color 1 to box 2 In both ways, the number of distinct colors in each box is equal. The probability is 2/2 = 1
Example 2:
Input: balls = [2,1,1] Output: 0.66667 Explanation: We have the set of balls [1, 1, 2, 3] This set of balls will be shuffled randomly and we may have one of the 12 distinct shuffles with equal probability (i.e. 1/12): [1,1 / 2,3], [1,1 / 3,2], [1,2 / 1,3], [1,2 / 3,1], [1,3 / 1,2], [1,3 / 2,1], [2,1 / 1,3], [2,1 / 3,1], [2,3 / 1,1], [3,1 / 1,2], [3,1 / 2,1], [3,2 / 1,1] After that, we add the first two balls to the first box and the second two balls to the second box. We can see that 8 of these 12 possible random distributions have the same number of distinct colors of balls in each box. Probability is 8/12 = 0.66667
Example 3:
Input: balls = [1,2,1,2] Output: 0.60000 Explanation: The set of balls is [1, 2, 2, 3, 4, 4]. It is hard to display all the 180 possible random shuffles of this set but it is easy to check that 108 of them will have the same number of distinct colors in each box. Probability = 108 / 180 = 0.6
Constraints:
1 <= balls.length <= 8
1 <= balls[i] <= 6
sum(balls)
is even.This problem asks for the probability that two boxes, each containing half of a set of balls of different colors, have the same number of distinct colors. The solution uses dynamic programming (with memoization) and combinatorics.
Approach:
Combinatorics (Pre-calculation): The code first pre-calculates binomial coefficients (combinations) using Pascal's Triangle. This is because we need to know how many ways we can choose x
balls of a particular color from the total number of balls of that color. This is stored in a 2D array c
.
Dynamic Programming (DFS with Memoization): The core of the solution is a depth-first search (DFS) function (dfs
) with memoization. The dfs
function takes three arguments:
i
: The index of the current color being considered.j
: The number of balls remaining to be distributed to the first box.diff
: The difference in the number of distinct colors between the two boxes. A positive value means the first box has more distinct colors, negative means the second box has more, and 0 means they have the same number.The base cases are:
i
reaches the end of the colors (i >= k
), it checks if j
is 0 (all balls have been distributed) and diff
is 0 (same number of distinct colors). If both are true, it returns 1 (successful distribution); otherwise, it returns 0.j
becomes negative, it means we've tried to put more balls in the first box than are available, so it returns 0.The recursive step iterates through all possible ways to distribute the balls of the current color (i
). For each x
(number of balls of color i
going to box 1), it recursively calls dfs
with updated j
(reducing balls available), and diff
(adjusting based on whether a new color was added). It multiplies the recursive result by the number of ways to choose x
balls from the available balls of color i
(obtained from the pre-calculated c
array).
Probability Calculation: Finally, the code divides the total number of successful distributions (obtained from dfs(0, n, 0)
, starting with 0 color, n balls remaining for box 1 and equal distinct colors) by the total number of possible distributions (calculated using the binomial coefficient comb(n << 1, n)
which is total number of ways to choose n
balls from 2n balls), to obtain the probability.
Time Complexity:
The time complexity is dominated by the dfs
function. In the worst case, it explores all possible distributions. The number of possible distributions is exponential in the number of colors and balls, resulting in O(n^k), where 'n' is approximately half the total number of balls and 'k' is the number of colors. However, memoization significantly reduces the runtime in practice because many subproblems are repeated. The pre-calculation of binomial coefficients takes O(m^2) time, where m is the maximum between the maximum number of balls of any single color and twice the number of total balls. This is a relatively small overhead.
Space Complexity:
The space complexity is dominated by the memoization table in dfs
, which stores results for different subproblems. The size of the table is O(k * n * k), where 'k' is the number of colors and 'n' is approximately half the total number of balls. Additionally, we have the space used for pre-calculating binomial coefficients, which is O(m^2). The overall space complexity is therefore O(k * n * k + m^2) .
The code provided shows implementations in multiple languages (Python, Java, C++, Go, Typescript). All follow the same algorithmic approach, differing only in syntax and data structure choices. The use of memoization is crucial for making this approach feasible even for relatively small input sizes as the raw complexity is exponential.