{x}
blog image

Customers Who Bought Products A and B but Not C

Solution Explanation

This problem requires finding customers who bought products "A" and "B" but not "C". The solution leverages SQL's capabilities for efficient data manipulation.

Approach

The core idea is to use joins, aggregation, and filtering.

  1. Join: A LEFT JOIN combines Customers and Orders tables based on customer_id. This ensures we have customer information along with their purchase history. A USING clause simplifies the join condition.

  2. Group By: The result is grouped by customer_id to aggregate the orders for each customer. This allows us to count occurrences of each product per customer.

  3. Having Clause: The HAVING clause filters the grouped results. The conditions SUM(product_name = 'A') > 0 and SUM(product_name = 'B') > 0 check if a customer bought "A" and "B" (at least once). SUM(product_name = 'C') = 0 ensures they did not buy "C". SUM(product_name = 'X') effectively counts the number of times product 'X' appears in the customer's order history.

  4. Order By: The final result is ordered by customer_id as requested.

MySQL Code Explained

SELECT customer_id, customer_name
FROM
    Customers
    LEFT JOIN Orders USING (customer_id)
GROUP BY 1
HAVING SUM(product_name = 'A') > 0 AND SUM(product_name = 'B') > 0 AND SUM(product_name = 'C') = 0
ORDER BY 1;
  • SELECT customer_id, customer_name: Selects the customer ID and name.
  • FROM Customers LEFT JOIN Orders USING (customer_id): Joins the tables based on customer_id. The USING clause is a shorthand for ON Customers.customer_id = Orders.customer_id. A LEFT JOIN ensures that all customers are included, even if they have no orders.
  • GROUP BY 1: Groups the rows by the first column in the SELECT statement (customer_id).
  • HAVING SUM(product_name = 'A') > 0 AND SUM(product_name = 'B') > 0 AND SUM(product_name = 'C') = 0: This is the crucial filtering step. It checks for the conditions described above.
  • ORDER BY 1: Orders the final result set by customer_id.

Time Complexity Analysis

The time complexity is dominated by the GROUP BY and HAVING operations. In a well-indexed database, these operations would typically have a time complexity that's approximately linear in the number of rows in the Orders table (O(N), where N is the number of rows in Orders). The join operation's complexity also depends on the database engine and indexing, but is generally efficient for well-structured queries. The ORDER BY operation adds a sorting step, whose complexity depends on the sorting algorithm used but is often close to O(N log N) or better with optimized database indexes.

In summary, the overall time complexity is likely to be dominated by the grouping and sorting, making it roughly O(N log N) in the worst case for a large dataset in Orders. However, with appropriate database indexing, the actual performance would be significantly faster.