Table: Employee
+-------------+---------+ | Column Name | Type | +-------------+---------+ | id | int | | name | varchar | | salary | int | | managerId | int | +-------------+---------+ id is the primary key (column with unique values) for this table. Each row of this table indicates the ID of an employee, their name, salary, and the ID of their manager.
Write a solution to find the employees who earn more than their managers.
Return the result table in any order.
The result format is in the following example.
Example 1:
Input: Employee table: +----+-------+--------+-----------+ | id | name | salary | managerId | +----+-------+--------+-----------+ | 1 | Joe | 70000 | 3 | | 2 | Henry | 80000 | 4 | | 3 | Sam | 60000 | Null | | 4 | Max | 90000 | Null | +----+-------+--------+-----------+ Output: +----------+ | Employee | +----------+ | Joe | +----------+ Explanation: Joe is the only employee who earns more than his manager.
This problem requires identifying employees who earn more than their respective managers. The most efficient approach involves using a self-join on the Employee
table.
Self-Join: We perform a self-join of the Employee
table with itself. This creates a new table where each row combines an employee's information with their manager's information. The join condition is e1.managerId = e2.id
, where e1
represents the employee and e2
represents their manager.
Filtering: After the join, we filter the resulting table to include only rows where the employee's salary (e1.salary
) is greater than their manager's salary (e2.salary
).
Result: Finally, we select the employee's name (e1.name
) as the output, giving us a list of employees earning more than their managers.
The self-join operation dominates the time complexity. A self-join, in the worst case, has a time complexity of O(N^2), where N is the number of rows in the Employee
table. However, if the database utilizes efficient indexing (such as an index on managerId
and id
), the join's performance will be significantly improved, closer to O(N log N) or even O(N) in many practical scenarios. The subsequent filtering step has a linear time complexity, O(N), relative to the number of rows after the join.
The space complexity is primarily determined by the size of the intermediate table created by the self-join. In the worst-case scenario (without efficient indexing), this could be proportional to N^2, but with proper indexing, it remains closer to O(N). The space for the final result is linearly proportional to the number of employees satisfying the condition, which is at most O(N).
SELECT e1.name AS Employee
FROM Employee e1
JOIN Employee e2 ON e1.managerId = e2.id
WHERE e1.salary > e2.salary;
This SQL query directly implements the steps described above. e1
aliases the Employee table to represent the employee, and e2
aliases it to represent the manager. The JOIN
clause connects them based on the managerId
and id
columns. The WHERE
clause filters based on salary. The AS Employee
renames the output column for clarity.
import pandas as pd
def find_employees(employee: pd.DataFrame) -> pd.DataFrame:
merged = employee.merge(
employee, left_on="managerId", right_on="id", suffixes=("", "_manager")
)
result = merged[merged["salary"] > merged["salary_manager"]][["name"]]
result.columns = ["Employee"]
return result
This Python code uses the Pandas library for data manipulation. employee.merge()
performs the self-join similar to the SQL query. The subsequent filtering and column selection mirror the SQL approach. The function returns a Pandas DataFrame containing the names of employees satisfying the condition.
The Pandas approach is convenient for data analysis, but its efficiency depends on Pandas' internal optimizations, which can vary based on data size and system resources. For extremely large datasets, a database-optimized SQL solution is generally preferred for performance.