This problem requires updating NULL
values in the drink
column of the CoffeeShop
table with the most recent non-NULL
value encountered. We'll explore two SQL solutions with different approaches and analyze their time complexities.
This solution leverages MySQL's variable capabilities for a concise and efficient approach.
Code:
SELECT
id,
CASE
WHEN drink IS NOT NULL THEN @cur := drink
ELSE @cur
END AS drink
FROM CoffeeShop;
Explanation:
@cur := drink
: We declare and initialize a user-defined variable @cur
. This line assigns the current non-NULL
drink
value to @cur
. This assignment happens only when drink
is NOT NULL.
ELSE @cur
: If drink
IS NULL
, the CASE
statement returns the value currently stored in @cur
(the last non-NULL
value encountered).
FROM CoffeeShop
: The query iterates through the CoffeeShop
table row by row. The order of rows is guaranteed to be preserved.
Time Complexity: O(N), where N is the number of rows in the CoffeeShop
table. The query iterates through each row once.
Space Complexity: O(1). The space used is constant regardless of the table size. The variable @cur
occupies a fixed amount of space.
This solution employs window functions to achieve the same result. It's more readable but might be slightly less efficient for very large tables.
Code:
WITH
S AS (
SELECT *, ROW_NUMBER() OVER () AS rk
FROM CoffeeShop
),
T AS (
SELECT
*,
SUM(
CASE
WHEN drink IS NULL THEN 0
ELSE 1
END
) OVER (ORDER BY rk) AS gid
FROM S
)
SELECT
id,
MAX(drink) OVER (
PARTITION BY gid
ORDER BY rk
) AS drink
FROM T;
Explanation:
CTE S
: Assigns a row number (rk
) to each row in the original table using ROW_NUMBER() OVER ()
. This is crucial for maintaining the original order and enabling subsequent operations.
CTE T
: This CTE calculates a group ID (gid
) for each consecutive sequence of non-NULL
drink values. The SUM()
with OVER (ORDER BY rk)
acts as a cumulative counter that increments only when a non-NULL drink is encountered. Rows with NULL
drinks will share the same gid
as the preceding non-NULL
row.
Final SELECT: Uses MAX(drink) OVER (PARTITION BY gid ORDER BY rk)
to find the maximum (i.e., the last non-NULL
) drink within each group (gid
) and assigns it to all rows within that group, ordered by rk
.
Time Complexity: O(N log N) or potentially O(N) depending on the specific database engine's implementation of window functions. The sorting involved in window functions can contribute to the log N factor, though some optimizations can reduce it to linear time in some cases.
Space Complexity: O(N) in the worst case because of the creation of CTEs S
and T
. The size of these CTEs is proportional to the number of rows in the input table.
Comparison:
Solution 1 (using variables) is generally more efficient, especially for very large tables, because it avoids the overhead associated with window functions. Solution 2 (using window functions) is arguably more readable and easier to understand, but the performance difference might be negligible for smaller datasets. The choice depends on the trade-off between performance and code readability.