Table: Customer
+-------------+---------+ | Column Name | Type | +-------------+---------+ | customer_id | int | | product_key | int | +-------------+---------+ This table may contain duplicates rows.customer_id
is not NULL.
product_key is a foreign key (reference column) toProduct
table.
Table: Product
+-------------+---------+ | Column Name | Type | +-------------+---------+ | product_key | int | +-------------+---------+ product_key is the primary key (column with unique values) for this table.
Write a solution to report the customer ids from the Customer
table that bought all the products in the Product
table.
Return the result table in any order.
The result format is in the following example.
Example 1:
Input: Customer table: +-------------+-------------+ | customer_id | product_key | +-------------+-------------+ | 1 | 5 | | 2 | 6 | | 3 | 5 | | 3 | 6 | | 1 | 6 | +-------------+-------------+ Product table: +-------------+ | product_key | +-------------+ | 5 | | 6 | +-------------+ Output: +-------------+ | customer_id | +-------------+ | 1 | | 3 | +-------------+ Explanation: The customers who bought all the products (5 and 6) are customers with IDs 1 and 3.
The problem requires identifying customers who have purchased every product listed in the Product
table. The solution leverages SQL's grouping and aggregation capabilities to achieve this efficiently.
The core idea is to:
Count distinct products per customer: Group the Customer
table by customer_id
and count the unique product_key
values for each customer. This gives us the number of distinct products each customer bought.
Count total distinct products: Determine the total number of distinct products available using a subquery on the Product
table. This is the benchmark against which we compare each customer's purchases.
Filter customers: Compare the count of distinct products bought by each customer (from step 1) with the total number of distinct products (from step 2). Only customers whose purchase count matches the total product count have bought all products and should be included in the result.
SELECT customer_id
FROM Customer
GROUP BY 1
HAVING COUNT(DISTINCT product_key) = (SELECT COUNT(*) FROM Product);
Explanation:
SELECT customer_id
: This selects the customer_id
column to be included in the output.FROM Customer
: This specifies the Customer
table as the source of data.GROUP BY 1
: This groups the rows by the first column in the SELECT
list (customer_id
), so we get counts of distinct products for each customer.HAVING COUNT(DISTINCT product_key) = (SELECT COUNT(*) FROM Product)
: This is the crucial filtering step.
COUNT(DISTINCT product_key)
counts the number of distinct products each customer purchased within their group.(SELECT COUNT(*) FROM Product)
is a subquery that calculates the total number of unique products in the Product
table.HAVING
clause filters the grouped results, keeping only groups (customers) where the number of distinct products they bought equals the total number of products.The time complexity is dominated by the grouping and aggregation operations. In general, these operations have a time complexity of O(N log N) or O(N), where N is the number of rows in the Customer
table, depending on the database engine's optimization strategies. The subquery to count products in the Product
table is a separate operation with complexity proportional to the size of the Product
table (let's call it M). The overall complexity would be O(max(N log N, M)). If the Product
table is relatively small, the overall complexity is effectively O(N log N) or O(N).
The space complexity is determined by the intermediate data structures used during the grouping and aggregation. The space used will be proportional to the number of unique customers and distinct products, which is likely to be less than or equal to the size of the Customer
table. Therefore, the space complexity is considered to be O(N) in the worst case. In practice, this will likely be less than O(N) as not all customers will have bought all the products.