{x}
blog image

Median Employee Salary

Solution Explanation

This problem requires finding the median salary for each company in the Employee table. The solution uses window functions to efficiently achieve this. Let's break down the SQL solution step-by-step:

1. Common Table Expression (CTE) t:

  • SELECT *, ROW_NUMBER() OVER (PARTITION BY company ORDER BY salary ASC) AS rk: This part assigns a rank (rk) to each employee within their respective company based on salary. Employees with the same salary are ranked based on their id. The ROW_NUMBER() function is a window function that partitions the data by company and orders it by salary in ascending order.

  • COUNT(id) OVER (PARTITION BY company) AS n: This calculates the total number of employees (n) for each company. Again, this is a window function partitioned by company.

2. Final SELECT Statement:

  • SELECT id, company, salary FROM t WHERE rk >= n / 2 AND rk <= n / 2 + 1: This selects the id, company, and salary from the CTE t. The WHERE clause filters the results to include only those rows whose rank (rk) falls within the median range.

    • For an even number of employees in a company, the median is the average of the two middle salaries. The condition rk >= n / 2 AND rk <= n / 2 + 1 selects both of these rows.
    • For an odd number of employees, n/2 and n/2 + 1 are the same value, effectively selecting only the middle row (the median).

Time Complexity Analysis:

The time complexity of this SQL query is dominated by the window functions used. Window functions typically have a time complexity of O(N log N) in the worst case, where N is the number of rows in the Employee table due to the sorting involved in the ROW_NUMBER() function. However, many database systems optimize window function execution, potentially achieving better performance in practice. The final WHERE clause filtering has a time complexity of O(N) in the worst case, which is lower than the complexity of the window functions.

Therefore, the overall time complexity of the SQL query can be approximated as O(N log N). The space complexity is O(N) because of the intermediate result in the CTE and the overall result size.

MySQL Code:

WITH
    t AS (
        SELECT
            *,
            ROW_NUMBER() OVER (
                PARTITION BY company
                ORDER BY salary ASC
            ) AS rk,
            COUNT(id) OVER (PARTITION BY company) AS n
        FROM Employee
    )
SELECT
    id,
    company,
    salary
FROM t
WHERE rk >= n / 2 AND rk <= n / 2 + 1;

This approach elegantly solves the problem using the power of SQL's window functions. The alternative approach of solving this without window functions (as suggested in the problem's follow-up) would generally be significantly less efficient, requiring more complex joins and subqueries.