{x}
blog image

Hopper Company Queries II

Solution Explanation:

This problem requires generating a report showing the percentage of working drivers for each month of 2020. The solution involves several steps and uses SQL's recursive capabilities for efficient data manipulation.

Understanding the Tables:

  • Drivers: Contains driver IDs and their join dates.
  • Rides: Contains ride IDs, user IDs, and request dates. Not all rides are accepted.
  • AcceptedRides: Contains information about accepted rides, linking to Drivers and Rides tables.

Approach:

  1. Generate Months: A recursive common table expression (CTE) called Month generates a series of numbers from 1 to 12, representing the months of the year.

  2. Identify Active Drivers for Each Month: The CTE S joins the Month CTE with the Drivers table. For each month, it identifies all drivers who joined before or on that month in 2020. This gives us the set of active drivers for each month.

  3. Identify Working Drivers for Each Month: The CTE T joins the Rides and AcceptedRides tables to find all rides accepted by drivers in 2020 along with the associated driver and request date.

  4. Calculate Working Percentage: The final SELECT statement joins S and T to count the number of distinct working drivers (those with accepted rides in the given month) and divide it by the total number of active drivers in that month. The result is then rounded to two decimal places using ROUND(). The IFNULL() function handles cases where there are no active drivers, resulting in a 0% working percentage.

MySQL Code Breakdown:

WITH RECURSIVE
    Month AS (
        SELECT 1 AS month
        UNION ALL
        SELECT month + 1
        FROM Month
        WHERE month < 12
    ),
    S AS (
        SELECT m.month, d.driver_id, d.join_date
        FROM
            Month AS m
            LEFT JOIN Drivers AS d
                ON YEAR(d.join_date) < 2020 OR (YEAR(d.join_date) = 2020 AND MONTH(d.join_date) <= m.month)
    ),
    T AS (
        SELECT ar.driver_id, r.requested_at
        FROM
            Rides r
            JOIN AcceptedRides ar USING (ride_id)
        WHERE YEAR(r.requested_at) = 2020
    )
SELECT
    s.month,
    IFNULL(ROUND(COUNT(DISTINCT t.driver_id) * 100.0 / COUNT(DISTINCT s.driver_id), 2), 0) AS working_percentage
FROM
    S s
    LEFT JOIN T t
        ON s.driver_id = t.driver_id
        AND s.join_date <= t.requested_at
        AND s.month = MONTH(t.requested_at)
GROUP BY s.month
ORDER BY s.month;
 

Time Complexity Analysis:

The time complexity is dominated by the joins and aggregations. The recursive CTE Month has a constant time complexity (O(1)). The joins in S and T and the final join have time complexity proportional to the size of the respective tables (e.g., O(n) for a table with 'n' rows, where 'n' varies for different tables). The GROUP BY and COUNT operations also have complexities related to the number of rows after joining. Therefore, the overall time complexity is roughly **O(N*M) ** where 'N' is the size of the largest table, and M represents the other tables and the intermediate results. The exact complexity depends on the specific database implementation's optimization strategies.

Space Complexity Analysis:

The space complexity is determined by the intermediate CTEs. Each CTE creates a temporary table in memory. The size of these temporary tables will be at most proportional to the size of the input tables. Therefore, the space complexity is also roughly O(N) where N represents the size of the largest table.

Note: The code is optimized to handle cases where a driver has multiple accepted rides within the same month. The COUNT(DISTINCT t.driver_id) ensures that each driver is counted only once per month, regardless of how many rides they accepted. The use of IFNULL gracefully handles potential division-by-zero errors when no active drivers exist in a given month.