Table: Customer
+---------------+---------+ | Column Name | Type | +---------------+---------+ | customer_id | int | | name | varchar | | visited_on | date | | amount | int | +---------------+---------+ In SQL,(customer_id, visited_on) is the primary key for this table. This table contains data about customer transactions in a restaurant. visited_on is the date on which the customer with ID (customer_id) has visited the restaurant. amount is the total paid by a customer.
You are the restaurant owner and you want to analyze a possible expansion (there will be at least one customer every day).
Compute the moving average of how much the customer paid in a seven days window (i.e., current day + 6 days before). average_amount
should be rounded to two decimal places.
Return the result table ordered by visited_on
in ascending order.
The result format is in the following example.
Example 1:
Input: Customer table: +-------------+--------------+--------------+-------------+ | customer_id | name | visited_on | amount | +-------------+--------------+--------------+-------------+ | 1 | Jhon | 2019-01-01 | 100 | | 2 | Daniel | 2019-01-02 | 110 | | 3 | Jade | 2019-01-03 | 120 | | 4 | Khaled | 2019-01-04 | 130 | | 5 | Winston | 2019-01-05 | 110 | | 6 | Elvis | 2019-01-06 | 140 | | 7 | Anna | 2019-01-07 | 150 | | 8 | Maria | 2019-01-08 | 80 | | 9 | Jaze | 2019-01-09 | 110 | | 1 | Jhon | 2019-01-10 | 130 | | 3 | Jade | 2019-01-10 | 150 | +-------------+--------------+--------------+-------------+ Output: +--------------+--------------+----------------+ | visited_on | amount | average_amount | +--------------+--------------+----------------+ | 2019-01-07 | 860 | 122.86 | | 2019-01-08 | 840 | 120 | | 2019-01-09 | 840 | 120 | | 2019-01-10 | 1000 | 142.86 | +--------------+--------------+----------------+ Explanation: 1st moving average from 2019-01-01 to 2019-01-07 has an average_amount of (100 + 110 + 120 + 130 + 110 + 140 + 150)/7 = 122.86 2nd moving average from 2019-01-02 to 2019-01-08 has an average_amount of (110 + 120 + 130 + 110 + 140 + 150 + 80)/7 = 120 3rd moving average from 2019-01-03 to 2019-01-09 has an average_amount of (120 + 130 + 110 + 140 + 150 + 80 + 110)/7 = 120 4th moving average from 2019-01-04 to 2019-01-10 has an average_amount of (130 + 110 + 140 + 150 + 80 + 110 + 130 + 150)/7 = 142.86
This problem requires calculating a 7-day moving average of customer spending at a restaurant. The solution involves using SQL window functions or self-joins to achieve this.
This approach leverages MySQL's window functions for efficient calculation.
1. Subquery tt
: This subquery groups the Customer
table by visited_on
and sums the amount
for each day. This is necessary because multiple customers might visit on the same day.
2. CTE t
: This Common Table Expression (CTE) does the following:
SUM(amount) OVER (ORDER BY visited_on ROWS 6 PRECEDING)
: This is the core of the moving average calculation. It calculates a running sum of amount
for the current day and the six preceding days. The ROWS 6 PRECEDING
clause specifies the window frame.RANK() OVER (ORDER BY visited_on ROWS 6 PRECEDING)
: This assigns a rank to each row based on the visited_on
date within the 7-day window. This helps filter out rows where the 7-day window is not fully populated (for the first 6 days).3. Final SELECT Statement: This selects visited_on
, the calculated amount
(sum of the 7-day window), and the average amount, rounded to two decimal places using ROUND(amount / 7, 2)
. The WHERE rk > 6
clause filters out the initial rows where the 7-day window isn't complete.
MySQL Code:
WITH
t AS (
SELECT
visited_on,
SUM(amount) OVER (
ORDER BY visited_on
ROWS 6 PRECEDING
) AS amount,
RANK() OVER (
ORDER BY visited_on
ROWS 6 PRECEDING
) AS rk
FROM
(
SELECT visited_on, SUM(amount) AS amount
FROM Customer
GROUP BY visited_on
) AS tt
)
SELECT visited_on, amount, ROUND(amount / 7, 2) AS average_amount
FROM t
WHERE rk > 6;
Time Complexity: O(N log N) due to the RANK()
function, where N is the number of rows in the Customer
table after grouping by visited_on
. The sorting implicit in window functions contributes to this complexity. If RANK()
wasn't used, it would be closer to O(N).
Space Complexity: O(N) for the CTE.
This approach uses a self-join to find the sum of amounts within a 7-day window for each date.
1. Subquery a
: This selects distinct visited_on
dates from the Customer
table. This generates a list of all dates.
2. JOIN: The a
subquery is joined with the Customer
table (b
) using DATEDIFF
. This ensures that only rows within a 7-day window (including the current day) are included in the sum.
3. WHERE clause: Filters out dates that don't have a full 7-day window available (at least the first six days).
4. GROUP BY and ORDER BY: Groups the results by visited_on
to sum the amounts for each 7-day period and orders by visited_on
.
MySQL Code:
SELECT
a.visited_on,
SUM(b.amount) AS amount,
ROUND(SUM(b.amount) / 7, 2) AS average_amount
FROM
(SELECT DISTINCT visited_on FROM customer) AS a
JOIN customer AS b ON DATEDIFF(a.visited_on, b.visited_on) BETWEEN 0 AND 6
WHERE a.visited_on >= (SELECT MIN(visited_on) FROM customer) + 6
GROUP BY 1
ORDER BY 1;
Time Complexity: O(N*M), where N is the number of distinct dates and M is the average number of rows per date. This is because of the self-join. In the worst case (many customers each day), it could be O(N^2).
Space Complexity: O(N) to store the distinct dates.
In Summary:
Both approaches achieve the desired result. The window function approach is generally preferred for its conciseness and potentially better performance, especially for large datasets, although the self-join is more easily understood for those unfamiliar with window functions. The time complexity analysis highlights the potential performance differences between these two methods. Choosing the best approach often depends on database engine specific optimizations and the size of the dataset.