Table: Employee
+-------------+------+ | Column Name | Type | +-------------+------+ | id | int | | salary | int | +-------------+------+ id is the primary key (column with unique values) for this table. Each row of this table contains information about the salary of an employee.
Write a solution to find the second highest distinct salary from the Employee
table. If there is no second highest salary, return null (return None in Pandas)
.
The result format is in the following example.
Example 1:
Input: Employee table: +----+--------+ | id | salary | +----+--------+ | 1 | 100 | | 2 | 200 | | 3 | 300 | +----+--------+ Output: +---------------------+ | SecondHighestSalary | +---------------------+ | 200 | +---------------------+
Example 2:
Input: Employee table: +----+--------+ | id | salary | +----+--------+ | 1 | 100 | +----+--------+ Output: +---------------------+ | SecondHighestSalary | +---------------------+ | null | +---------------------+
The problem requires finding the second highest distinct salary from an Employee
table with id
and salary
columns. If no second highest salary exists, the solution should return NULL
(or None
in Python).
Several approaches can solve this problem, each with its own trade-offs in terms of readability and performance.
MySQL:
SELECT
(
SELECT DISTINCT salary
FROM Employee
ORDER BY salary DESC
LIMIT 1, 1
) AS SecondHighestSalary;
This query uses a subquery to select distinct salaries, order them descending by salary, and then LIMIT 1, 1
selects only the second row (index 1, after skipping the first row, which is the highest salary). If there's only one salary or fewer than two distinct salaries, the subquery returns an empty set, resulting in NULL
for SecondHighestSalary
.
Python (Pandas):
import pandas as pd
def second_highest_salary(employee: pd.DataFrame) -> pd.DataFrame:
unique_salaries = employee["salary"].drop_duplicates()
second_highest = (
unique_salaries.nlargest(2).iloc[-1] if len(unique_salaries) >= 2 else None
)
if second_highest is None:
return pd.DataFrame({"SecondHighestSalary": [None]})
result_df = pd.DataFrame({"SecondHighestSalary": [second_highest]})
return result_df
The Python solution mirrors the SQL approach. It first removes duplicate salaries using .drop_duplicates()
, then finds the two largest using .nlargest(2)
. .iloc[-1]
gets the last element (second highest). Error handling ensures None
is returned if fewer than two unique salaries exist.
Time Complexity: O(N log N) due to sorting in both SQL and Python (Pandas's nlargest
has a time complexity of O(N log k) where k=2 in this case). Space Complexity: O(N) in the worst case (all salaries are distinct).
SELECT MAX(salary) AS SecondHighestSalary
FROM Employee
WHERE salary < (SELECT MAX(salary) FROM Employee);
This approach is more concise. It finds the maximum salary and then selects the maximum salary from those strictly less than the highest salary. This directly gives the second highest salary. If there's no second highest salary (only one unique salary), the query returns an empty set which translates to NULL
.
Time Complexity: Approximately O(N). The MAX()
function has a linear time complexity, and the WHERE
clause adds another linear scan. Space Complexity: O(1). This solution is generally more efficient than the subquery approach.
WITH T AS (SELECT salary, DENSE_RANK() OVER (ORDER BY salary DESC) AS rk FROM Employee)
SELECT (SELECT DISTINCT salary FROM T WHERE rk = 2) AS SecondHighestSalary;
This solution uses a Common Table Expression (CTE) called T
. It assigns a rank to each salary using DENSE_RANK()
(assigns consecutive ranks without gaps, even if there are ties). The outer query then selects the salary with rank 2. If no salary has rank 2 (meaning less than two distinct salaries), the inner query returns NULL
.
Time Complexity: O(N log N) due to the window function (sorting is implicitly done for ranking). Space Complexity: O(N) in the worst case. Although elegant, this approach isn't necessarily faster than the MAX()
method.
| Solution | Time Complexity | Space Complexity | |---|---|---| | Solution 1 (Subquery & LIMIT) | O(N log N) | O(N) | | Solution 2 (MAX()) | O(N) | O(1) | | Solution 3 (Window Function) | O(N log N) | O(N) |
Solution 2 (using MAX()
) offers the best time complexity. The choice between the solutions depends on factors like database system optimizations and readability preferences. For most cases, Solution 2 provides a balance of efficiency and simplicity.