Table: Employees
+-------------+---------+ | Column Name | Type | +-------------+---------+ | employee_id | int | | name | varchar | +-------------+---------+ employee_id is the column with unique values for this table. Each row of this table indicates the name of the employee whose ID is employee_id.
Table: Salaries
+-------------+---------+ | Column Name | Type | +-------------+---------+ | employee_id | int | | salary | int | +-------------+---------+ employee_id is the column with unique values for this table. Each row of this table indicates the salary of the employee whose ID is employee_id.
Write a solution to report the IDs of all the employees with missing information. The information of an employee is missing if:
Return the result table ordered by employee_id
in ascending order.
The result format is in the following example.
Example 1:
Input: Employees table: +-------------+----------+ | employee_id | name | +-------------+----------+ | 2 | Crew | | 4 | Haven | | 5 | Kristian | +-------------+----------+ Salaries table: +-------------+--------+ | employee_id | salary | +-------------+--------+ | 5 | 76071 | | 1 | 22517 | | 4 | 63539 | +-------------+--------+ Output: +-------------+ | employee_id | +-------------+ | 1 | | 2 | +-------------+ Explanation: Employees 1, 2, 4, and 5 are working at this company. The name of employee 1 is missing. The salary of employee 2 is missing.
The problem requires identifying employees with missing information – either their name or salary is absent from the respective tables (Employees
and Salaries
). The solution leverages SQL's set operations to efficiently achieve this.
UNION
of SubqueriesThe core idea is to identify employees missing from either table and combine the results. This is done using two subqueries and the UNION
operator.
Subquery 1: SELECT employee_id FROM Employees WHERE employee_id NOT IN (SELECT employee_id FROM Salaries)
This subquery finds all employee_id
s present in the Employees
table but absent in the Salaries
table. This means these employees have names but are missing salary information.
Subquery 2: SELECT employee_id FROM Salaries WHERE employee_id NOT IN (SELECT employee_id FROM Employees)
This subquery mirrors the first one, but in reverse. It finds all employee_id
s in Salaries
but not in Employees
. These employees have salaries but are missing name information.
UNION
Operator: UNION
combines the results of the two subqueries. UNION
automatically removes duplicate entries, ensuring each employee ID appears only once in the final output, even if they are missing both name and salary (which would be present in both subquery results).
ORDER BY
Clause: ORDER BY 1
sorts the final result set in ascending order based on the employee_id
column (column 1).
The time complexity of this SQL query is dominated by the subqueries which perform NOT IN
operations. The complexity of NOT IN
operation depends on the database implementation but is generally O(NM) in the worst case (nested loop join) where N and M are the number of rows in the two tables being compared. However, optimized database engines often use indexes and other optimization techniques to significantly improve this performance. In most practical scenarios, the performance will be much better than O(NM), possibly closer to O(N + M) or even logarithmic time with efficient indexing.
The UNION
operation has a complexity that is linear with the number of rows from the subqueries. The ORDER BY
operation adds an additional sorting step, which typically has a time complexity of O(K log K), where K is the number of rows in the combined result of the UNION
. However, database systems often use highly optimized sorting algorithms to manage this.
SELECT employee_id
FROM Employees
WHERE employee_id NOT IN (SELECT employee_id FROM Salaries)
UNION
SELECT employee_id
FROM Salaries
WHERE employee_id NOT IN (SELECT employee_id FROM Employees)
ORDER BY 1;
This query directly implements the described approach in standard SQL, making it compatible with most SQL database systems. The specific performance will depend on factors like database engine, indexing, data size, and query optimization techniques employed by the database system.