This problem involves two tables: Buses
and Passengers
. The goal is to determine how many passengers each bus picks up, considering that a passenger catches the first bus that arrives after or at the same time as they do.
Approach:
The most efficient approach utilizes SQL's window functions to achieve this in a single query. Here's a breakdown:
Join the Tables: We perform a LEFT JOIN
between Buses
(aliased as b
) and Passengers
(aliased as p
) based on the condition p.arrival_time <= b.arrival_time
. This ensures that all buses are included in the result, even if they don't pick up any passengers.
Group by Bus ID: We GROUP BY b.bus_id
to aggregate passengers for each bus. COUNT(passenger_id)
counts the number of passengers associated with each bus.
Window Function for Cumulative Count: The core of the solution lies in the window function LAG(COUNT(passenger_id), 1, 0) OVER (ORDER BY b.arrival_time)
. This does the following:
COUNT(passenger_id)
: Counts the passengers for each bus (as in the previous step).LAG(..., 1, 0)
: This is the lag function. It looks at the previous row's COUNT(passenger_id)
value in the order defined by ORDER BY b.arrival_time
. The 1
indicates we look one row back, and 0
is the default value if there's no previous row (for the first bus).OVER (ORDER BY b.arrival_time)
: Specifies that the lag function should look at the preceding row based on the arrival time of buses in ascending order.Subtract Cumulative Counts: Subtracting the lagged count from the current count gives us the number of passengers that specifically boarded the current bus. This is because the lagged count represents the total number of passengers who had already boarded buses up to the previous bus.
Order the Result: Finally, ORDER BY 1
(which is equivalent to ORDER BY bus_id
) orders the results in ascending order of bus IDs.
MySQL Code:
SELECT
bus_id,
COUNT(passenger_id) - LAG(COUNT(passenger_id), 1, 0) OVER (ORDER BY b.arrival_time) AS passengers_cnt
FROM
Buses AS b
LEFT JOIN Passengers AS p ON p.arrival_time <= b.arrival_time
GROUP BY 1
ORDER BY 1;
Time Complexity Analysis:
The time complexity is dominated by the GROUP BY
and LEFT JOIN
operations, which are typically O(N log N) or O(N) depending on the database's optimization techniques, where N is the total number of rows in both tables. The window function adds a small overhead, but it does not significantly change the overall time complexity.
Space Complexity Analysis:
The space complexity is primarily determined by the temporary tables or intermediate results created during the join and aggregation. In the worst-case scenario, where a large number of passengers need to be associated with each bus, the space complexity could be O(N), but typically it remains much lower due to optimizations within the database system.