{x}
blog image

Market Analysis I

Table: Users

+----------------+---------+
| Column Name    | Type    |
+----------------+---------+
| user_id        | int     |
| join_date      | date    |
| favorite_brand | varchar |
+----------------+---------+
user_id is the primary key (column with unique values) of this table.
This table has the info of the users of an online shopping website where users can sell and buy items.

 

Table: Orders

+---------------+---------+
| Column Name   | Type    |
+---------------+---------+
| order_id      | int     |
| order_date    | date    |
| item_id       | int     |
| buyer_id      | int     |
| seller_id     | int     |
+---------------+---------+
order_id is the primary key (column with unique values) of this table.
item_id is a foreign key (reference column) to the Items table.
buyer_id and seller_id are foreign keys to the Users table.

 

Table: Items

+---------------+---------+
| Column Name   | Type    |
+---------------+---------+
| item_id       | int     |
| item_brand    | varchar |
+---------------+---------+
item_id is the primary key (column with unique values) of this table.

 

Write a solution to find for each user, the join date and the number of orders they made as a buyer in 2019.

Return the result table in any order.

The result format is in the following example.

 

Example 1:

Input: 
Users table:
+---------+------------+----------------+
| user_id | join_date  | favorite_brand |
+---------+------------+----------------+
| 1       | 2018-01-01 | Lenovo         |
| 2       | 2018-02-09 | Samsung        |
| 3       | 2018-01-19 | LG             |
| 4       | 2018-05-21 | HP             |
+---------+------------+----------------+
Orders table:
+----------+------------+---------+----------+-----------+
| order_id | order_date | item_id | buyer_id | seller_id |
+----------+------------+---------+----------+-----------+
| 1        | 2019-08-01 | 4       | 1        | 2         |
| 2        | 2018-08-02 | 2       | 1        | 3         |
| 3        | 2019-08-03 | 3       | 2        | 3         |
| 4        | 2018-08-04 | 1       | 4        | 2         |
| 5        | 2018-08-04 | 1       | 3        | 4         |
| 6        | 2019-08-05 | 2       | 2        | 4         |
+----------+------------+---------+----------+-----------+
Items table:
+---------+------------+
| item_id | item_brand |
+---------+------------+
| 1       | Samsung    |
| 2       | Lenovo     |
| 3       | LG         |
| 4       | HP         |
+---------+------------+
Output: 
+-----------+------------+----------------+
| buyer_id  | join_date  | orders_in_2019 |
+-----------+------------+----------------+
| 1         | 2018-01-01 | 1              |
| 2         | 2018-02-09 | 2              |
| 3         | 2018-01-19 | 0              |
| 4         | 2018-05-21 | 0              |
+-----------+------------+----------------+

Solution Explanation for LeetCode Problem 1158: Market Analysis I

This problem requires querying three tables: Users, Orders, and Items to find the number of orders each user made as a buyer in 2019. The solution involves joining these tables and using aggregate functions.

Approach

Both solutions presented utilize LEFT JOIN to include all users from the Users table, even if they didn't place any orders in 2019. The crucial part is filtering orders to only include those from the year 2019 and then grouping the results by user_id to count orders for each user.

Solution 1: This approach uses COUNT(order_id) after the LEFT JOIN and WHERE clause to count the number of orders for each user in 2019. If a user has no orders in 2019, COUNT(order_id) will return 0.

Solution 2: This solution uses SUM(YEAR(order_date) = 2019) which leverages the fact that a boolean expression evaluates to 1 for true and 0 for false. Summing this across all orders for each user effectively counts the number of orders placed in 2019. IFNULL handles cases where a user has no orders, setting the count to 0.

Code Explanation (MySQL)

Solution 1:

SELECT
    u.user_id AS buyer_id,
    u.join_date,
    COUNT(order_id) AS orders_in_2019
FROM
    Users AS u
    LEFT JOIN Orders AS o ON u.user_id = o.buyer_id AND YEAR(order_date) = 2019
GROUP BY user_id;
  • SELECT u.user_id AS buyer_id, u.join_date: Selects the user ID and join date from the Users table.
  • COUNT(order_id) AS orders_in_2019: Counts the number of orders (order_id) for each user. Only orders from 2019 are included because of the join condition.
  • FROM Users AS u LEFT JOIN Orders AS o ON u.user_id = o.buyer_id AND YEAR(order_date) = 2019: Performs a LEFT JOIN between Users and Orders. The AND YEAR(order_date) = 2019 condition ensures that only orders from 2019 are considered. LEFT JOIN includes all users regardless of whether they have orders in 2019.
  • GROUP BY user_id: Groups the results by user ID to count orders for each user separately.

Solution 2:

SELECT
    user_id AS buyer_id,
    join_date,
    IFNULL(SUM(YEAR(order_date) = 2019), 0) AS orders_in_2019
FROM
    Users AS u
    LEFT JOIN Orders AS o ON u.user_id = buyer_id
GROUP BY 1;
  • SELECT user_id AS buyer_id, join_date: Selects the user ID and join date.
  • IFNULL(SUM(YEAR(order_date) = 2019), 0) AS orders_in_2019: Calculates the sum of boolean values (1 if the order is in 2019, 0 otherwise). IFNULL handles the case where there are no orders, giving a count of 0 instead of NULL.
  • FROM Users AS u LEFT JOIN Orders AS o ON u.user_id = buyer_id: Performs a LEFT JOIN to include all users. Filtering for 2019 is done within the SUM function.
  • GROUP BY 1: Groups the results by the first selected column (user_id).

Time Complexity Analysis

Both solutions have a similar time complexity. The dominant factor is the JOIN operation, which in the worst case, has a time complexity of O(NM), where N is the number of rows in Users and M is the number of rows in Orders. The GROUP BY and COUNT/SUM operations have a time complexity that's generally linear in the number of rows after the join (O(NM) in the worst case). However, in practice, database query optimizers usually make these operations much more efficient. The actual runtime will depend on the database engine and the size of the tables. The complexity is essentially dominated by the size of the joined data.