This problem requires calculating the cumulative salary for each employee over a 3-month rolling window, excluding the most recent month and months where the employee didn't work. The solution involves using window functions in SQL.
Both solutions leverage SQL's window functions to efficiently calculate the cumulative salary. The key idea is to use the SUM() OVER()
function with a RANGE
clause to specify the 3-month rolling window.
Solution 1: This solution directly filters out the most recent month for each employee after calculating the cumulative sum.
Solution 2: This solution uses RANK()
to assign a rank to each month within each employee's record, ordered by month descending. Then it filters out the rows with rank 1 (representing the most recent month). This approach can be slightly more efficient in some database systems than the first approach, but both approaches are generally efficient for this type of problem.
Both solutions have the same time and space complexity.
Time Complexity: O(N log N), where N is the number of rows in the Employee
table. The dominant factor is the sorting required by the ORDER BY
clause within the window function and the final ORDER BY
clause. The SUM() OVER()
operation itself is typically linear in the number of rows. Although the RANK() operation adds some overhead in solution 2, it is still generally considered to be O(N log N).
Space Complexity: O(N) in the worst case. This is due to the intermediate results generated by the window function and the temporary tables (or equivalent structures) that the database system might create during query execution. The space required is roughly proportional to the size of the input data.
SELECT
id,
month,
SUM(salary) OVER (
PARTITION BY id
ORDER BY month
RANGE 2 PRECEDING
) AS Salary
FROM employee
WHERE
(id, month) NOT IN (
SELECT
id,
MAX(month)
FROM Employee
GROUP BY id
)
ORDER BY id, month DESC;
This query first calculates the 3-month rolling sum using SUM() OVER()
. The PARTITION BY id
clause ensures that the sum is calculated separately for each employee. The ORDER BY month
clause specifies the order of the window, and RANGE 2 PRECEDING
includes the current month and the two preceding months. The WHERE
clause filters out the most recent month for each employee using a subquery. Finally, the results are ordered by id
and month
(descending).
WITH
T AS (
SELECT
id,
month,
SUM(salary) OVER (
PARTITION BY id
ORDER BY month
RANGE 2 PRECEDING
) AS salary,
RANK() OVER (
PARTITION BY id
ORDER BY month DESC
) AS rk
FROM Employee
)
SELECT id, month, salary
FROM T
WHERE rk > 1
ORDER BY 1, 2 DESC;
This query uses a Common Table Expression (CTE) called T
. Within T
, it calculates both the 3-month rolling sum and the rank of each month for each employee using RANK()
. The RANK()
function assigns the same rank to rows with the same value (in this case, the same month for an employee). The final SELECT
statement filters out rows where rk = 1
(the most recent month) and orders the results.
The problem is inherently SQL-based; there's no direct equivalent in other languages like Python, Java, or C++ without using a database library to execute the SQL query. If you were to process this data outside of a database, you would need a different algorithm that could handle grouping, sorting, and windowed aggregations (e.g., using hash maps and iterators in Python).