{x}
blog image

Human Traffic of Stadium

Table: Stadium

+---------------+---------+
| Column Name   | Type    |
+---------------+---------+
| id            | int     |
| visit_date    | date    |
| people        | int     |
+---------------+---------+
visit_date is the column with unique values for this table.
Each row of this table contains the visit date and visit id to the stadium with the number of people during the visit.
As the id increases, the date increases as well.

 

Write a solution to display the records with three or more rows with consecutive id's, and the number of people is greater than or equal to 100 for each.

Return the result table ordered by visit_date in ascending order.

The result format is in the following example.

 

Example 1:

Input: 
Stadium table:
+------+------------+-----------+
| id   | visit_date | people    |
+------+------------+-----------+
| 1    | 2017-01-01 | 10        |
| 2    | 2017-01-02 | 109       |
| 3    | 2017-01-03 | 150       |
| 4    | 2017-01-04 | 99        |
| 5    | 2017-01-05 | 145       |
| 6    | 2017-01-06 | 1455      |
| 7    | 2017-01-07 | 199       |
| 8    | 2017-01-09 | 188       |
+------+------------+-----------+
Output: 
+------+------------+-----------+
| id   | visit_date | people    |
+------+------------+-----------+
| 5    | 2017-01-05 | 145       |
| 6    | 2017-01-06 | 1455      |
| 7    | 2017-01-07 | 199       |
| 8    | 2017-01-09 | 188       |
+------+------------+-----------+
Explanation: 
The four rows with ids 5, 6, 7, and 8 have consecutive ids and each of them has >= 100 people attended. Note that row 8 was included even though the visit_date was not the next day after row 7.
The rows with ids 2 and 3 are not included because we need at least three consecutive ids.

Solution Explanation for LeetCode 601: Human Traffic of Stadium

This problem requires finding consecutive rows in the Stadium table where the ids are consecutive, the number of people is greater than or equal to 100, and there are at least three such rows. The solution uses window functions in SQL to efficiently achieve this.

Approach

The solution employs a two-step approach using Common Table Expressions (CTEs):

Step 1: Identifying Groups of Consecutive IDs with Sufficient People:

The first CTE, S, filters the Stadium table to include only rows where people >= 100. Crucially, it calculates a rank (rk) using ROW_NUMBER(). This rank subtracts the row number (based on the natural order of id) from the id. Consecutive ids will share the same rk value because the difference between id and the row number remains constant for consecutive numbers. This cleverly groups consecutive ids together.

Step 2: Counting Consecutive Rows and Filtering:

The second CTE, T, builds upon S. It uses COUNT(*) OVER (PARTITION BY rk) to count the number of rows for each rk. This counts how many consecutive rows with people >= 100 exist for each group.

The final SELECT statement then filters the results from T, selecting only those rows where cnt (the count of consecutive rows) is greater than or equal to 3. The results are ordered by id.

MySQL Code Explanation

WITH
    S AS (
        SELECT
            *,
            id - (ROW_NUMBER() OVER (ORDER BY id)) AS rk
        FROM Stadium
        WHERE people >= 100
    ),
    T AS (SELECT *, COUNT(1) OVER (PARTITION BY rk) AS cnt FROM S)
SELECT id, visit_date, people
FROM T
WHERE cnt >= 3
ORDER BY 1;
  1. WITH S AS (...): Defines the first CTE, S.

    • SELECT *, id - (ROW_NUMBER() OVER (ORDER BY id)) AS rk: Selects all columns and calculates rk. ROW_NUMBER() OVER (ORDER BY id) assigns a unique rank to each row based on ascending id order. Subtracting this rank from id groups consecutive ids.
    • FROM Stadium: Specifies the source table.
    • WHERE people >= 100: Filters rows to include only those with sufficient people.
  2. WITH T AS (...): Defines the second CTE, T.

    • SELECT *, COUNT(1) OVER (PARTITION BY rk) AS cnt: Selects all columns from S and adds a count (cnt) for each rk using a window function. The PARTITION BY rk ensures that the count is performed separately for each group of consecutive ids.
  3. SELECT id, visit_date, people FROM T WHERE cnt >= 3 ORDER BY 1;: This is the final query that selects the relevant columns from T, filters where cnt is at least 3 (meaning at least three consecutive rows meet the criteria), and orders the results by id.

Time Complexity Analysis

The time complexity is dominated by the window function operations within the CTEs. Window functions typically have a time complexity of O(N log N) or O(N) depending on the specific database implementation and the nature of the windowing operation. In this case, because we are using ROW_NUMBER() and COUNT() over partitions, we are likely closer to O(N log N) in the worst-case scenario, but database optimizations often make it perform closer to O(N) in practice. The filtering steps have a linear time complexity, O(N). Therefore, the overall time complexity is roughly O(N log N) but likely performs better than that in a real-world database system.

Space Complexity Analysis

The space complexity is determined primarily by the size of the CTEs. In the worst case, the CTEs might need to store a copy of the input data, leading to a space complexity of O(N), where N is the number of rows in the Stadium table. However, many database systems optimize these operations to minimize the actual memory used.