Table: Employee
+--------------+---------+
| Column Name | Type |
+--------------+---------+
| id | int |
| name | varchar |
| salary | int |
| departmentId | int |
+--------------+---------+
id is the primary key (column with unique values) for this table.
departmentId is a foreign key (reference column) of the ID from the Department
table.
Each row of this table indicates the ID, name, and salary of an employee. It also contains the ID of their department.
Table: Department
+-------------+---------+ | Column Name | Type | +-------------+---------+ | id | int | | name | varchar | +-------------+---------+ id is the primary key (column with unique values) for this table. Each row of this table indicates the ID of a department and its name.
A company's executives are interested in seeing who earns the most money in each of the company's departments. A high earner in a department is an employee who has a salary in the top three unique salaries for that department.
Write a solution to find the employees who are high earners in each of the departments.
Return the result table in any order.
The result format is in the following example.
Example 1:
Input: Employee table: +----+-------+--------+--------------+ | id | name | salary | departmentId | +----+-------+--------+--------------+ | 1 | Joe | 85000 | 1 | | 2 | Henry | 80000 | 2 | | 3 | Sam | 60000 | 2 | | 4 | Max | 90000 | 1 | | 5 | Janet | 69000 | 1 | | 6 | Randy | 85000 | 1 | | 7 | Will | 70000 | 1 | +----+-------+--------+--------------+ Department table: +----+-------+ | id | name | +----+-------+ | 1 | IT | | 2 | Sales | +----+-------+ Output: +------------+----------+--------+ | Department | Employee | Salary | +------------+----------+--------+ | IT | Max | 90000 | | IT | Joe | 85000 | | IT | Randy | 85000 | | IT | Will | 70000 | | Sales | Henry | 80000 | | Sales | Sam | 60000 | +------------+----------+--------+ Explanation: In the IT department: - Max earns the highest unique salary - Both Randy and Joe earn the second-highest unique salary - Will earns the third-highest unique salary In the Sales department: - Henry earns the highest salary - Sam earns the second-highest salary - There is no third-highest salary as there are only two employees
Constraints:
This problem requires finding the top three highest unique salaries for each department in an employee database. Two solutions are presented: one using pandas in Python and another using MySQL.
This solution leverages the power of pandas for data manipulation.
Steps:
Identify Top 3 Salaries per Department:
employee.drop_duplicates(["salary", "departmentId"])
: This removes duplicate salary entries within each department, ensuring only unique salaries are considered..groupby("departmentId")["salary"].nlargest(3)
: Groups the data by department ID and selects the three largest salaries within each group..groupby("departmentId").min()
: This step is crucial. After getting the top 3, this finds the minimum of those top 3 salaries for each department. This gives us the cutoff salary; anyone above or equal to this cutoff is in the top 3.Join with Department Names:
employee["Department"] = department.set_index("id")["name"][employee["departmentId"]].values
: This efficiently joins the employee
DataFrame with the department
DataFrame to add the department name to each employee's record.Filter for Top Earners:
employee["cutoff"] = salary_cutoff[employee["departmentId"]].values
: Adds a new column 'cutoff' representing the minimum salary in the top 3 for each department.employee[employee["salary"] >= employee["cutoff"]]
: Filters the DataFrame to keep only employees whose salary is greater than or equal to the cutoff.Rename and Select Columns:
.rename(columns={"name": "Employee", "salary": "Salary"})
: Renames columns for better readability.[["Department", "Employee", "Salary"]]
: Selects the desired columns for the output.Time Complexity: The pandas solution's time complexity is dominated by the groupby
and nlargest
operations. These are generally O(N log K), where N is the number of employees and K is the number of top salaries to select (in this case, K=3). The other operations are linear, so the overall complexity is approximately O(N log K). For a fixed K (like 3), this simplifies to O(N log 3) ≈ O(N).
Space Complexity: The space complexity is linear, O(N), as it creates several intermediate DataFrames proportional to the size of the input data.
This solution uses window functions in MySQL to efficiently solve the problem.
Steps:
Rank Salaries Within Departments:
WITH T AS (...)
: A common table expression (CTE) called T
is used.DENSE_RANK() OVER (PARTITION BY departmentId ORDER BY salary DESC)
: This is the core of the solution. DENSE_RANK()
assigns a rank to each employee's salary within their department, ordered descending by salary. DENSE_RANK()
handles ties (same salaries) by assigning the same rank, unlike RANK()
which would skip ranks.Join and Filter:
SELECT d.name AS Department, t.name AS Employee, salary AS Salary
: Selects the desired columns, aliasing them for clarity.FROM T AS t JOIN Department AS d ON t.departmentId = d.id
: Joins the CTE T
with the Department
table to get the department names.WHERE rk <= 3
: Filters the results to keep only employees with a rank of 3 or less (top 3).Time Complexity: The MySQL solution's time complexity is dominated by the window function DENSE_RANK()
. Window functions in MySQL generally have a complexity that is roughly linear to the number of rows, so the overall complexity is approximately O(N log N) in the worst case (though often optimized to be closer to linear in practice).
Space Complexity: The space complexity is dominated by the CTE T
, which stores a copy of the Employee
table with an added rank column. Thus, the space complexity is approximately O(N).
Summary Table:
| Solution | Language | Time Complexity | Space Complexity | |---------------|----------|-----------------|-----------------| | Solution 1 | Python | O(N) | O(N) | | Solution 2 | MySQL | O(N log N) | O(N) |
Note: The actual performance can vary depending on database optimizations and data characteristics. The pandas solution might be faster for smaller datasets, while the MySQL solution can be more efficient for very large datasets due to database optimizations.