{x}
blog image

Employees Earning More Than Their Managers

Table: Employee

+-------------+---------+
| Column Name | Type    |
+-------------+---------+
| id          | int     |
| name        | varchar |
| salary      | int     |
| managerId   | int     |
+-------------+---------+
id is the primary key (column with unique values) for this table.
Each row of this table indicates the ID of an employee, their name, salary, and the ID of their manager.

 

Write a solution to find the employees who earn more than their managers.

Return the result table in any order.

The result format is in the following example.

 

Example 1:

Input: 
Employee table:
+----+-------+--------+-----------+
| id | name  | salary | managerId |
+----+-------+--------+-----------+
| 1  | Joe   | 70000  | 3         |
| 2  | Henry | 80000  | 4         |
| 3  | Sam   | 60000  | Null      |
| 4  | Max   | 90000  | Null      |
+----+-------+--------+-----------+
Output: 
+----------+
| Employee |
+----------+
| Joe      |
+----------+
Explanation: Joe is the only employee who earns more than his manager.

Solution Explanation for Employees Earning More Than Their Managers

This problem requires identifying employees who earn more than their respective managers. The most efficient approach involves using a self-join on the Employee table.

Approach: Self-Join and Filtering

  1. Self-Join: We perform a self-join of the Employee table with itself. This creates a new table where each row combines an employee's information with their manager's information. The join condition is e1.managerId = e2.id, where e1 represents the employee and e2 represents their manager.

  2. Filtering: After the join, we filter the resulting table to include only rows where the employee's salary (e1.salary) is greater than their manager's salary (e2.salary).

  3. Result: Finally, we select the employee's name (e1.name) as the output, giving us a list of employees earning more than their managers.

Time Complexity Analysis

The self-join operation dominates the time complexity. A self-join, in the worst case, has a time complexity of O(N^2), where N is the number of rows in the Employee table. However, if the database utilizes efficient indexing (such as an index on managerId and id), the join's performance will be significantly improved, closer to O(N log N) or even O(N) in many practical scenarios. The subsequent filtering step has a linear time complexity, O(N), relative to the number of rows after the join.

Space Complexity Analysis

The space complexity is primarily determined by the size of the intermediate table created by the self-join. In the worst-case scenario (without efficient indexing), this could be proportional to N^2, but with proper indexing, it remains closer to O(N). The space for the final result is linearly proportional to the number of employees satisfying the condition, which is at most O(N).

Code Implementation (MySQL)

SELECT e1.name AS Employee
FROM Employee e1
JOIN Employee e2 ON e1.managerId = e2.id
WHERE e1.salary > e2.salary;

This SQL query directly implements the steps described above. e1 aliases the Employee table to represent the employee, and e2 aliases it to represent the manager. The JOIN clause connects them based on the managerId and id columns. The WHERE clause filters based on salary. The AS Employee renames the output column for clarity.

Code Implementation (Python with Pandas)

import pandas as pd
 
def find_employees(employee: pd.DataFrame) -> pd.DataFrame:
    merged = employee.merge(
        employee, left_on="managerId", right_on="id", suffixes=("", "_manager")
    )
    result = merged[merged["salary"] > merged["salary_manager"]][["name"]]
    result.columns = ["Employee"]
    return result
 

This Python code uses the Pandas library for data manipulation. employee.merge() performs the self-join similar to the SQL query. The subsequent filtering and column selection mirror the SQL approach. The function returns a Pandas DataFrame containing the names of employees satisfying the condition.

The Pandas approach is convenient for data analysis, but its efficiency depends on Pandas' internal optimizations, which can vary based on data size and system resources. For extremely large datasets, a database-optimized SQL solution is generally preferred for performance.