{x}
blog image

Employees With Missing Information

Table: Employees

+-------------+---------+
| Column Name | Type    |
+-------------+---------+
| employee_id | int     |
| name        | varchar |
+-------------+---------+
employee_id is the column with unique values for this table.
Each row of this table indicates the name of the employee whose ID is employee_id.

 

Table: Salaries

+-------------+---------+
| Column Name | Type    |
+-------------+---------+
| employee_id | int     |
| salary      | int     |
+-------------+---------+
employee_id is the column with unique values for this table.
Each row of this table indicates the salary of the employee whose ID is employee_id.

 

Write a solution to report the IDs of all the employees with missing information. The information of an employee is missing if:

  • The employee's name is missing, or
  • The employee's salary is missing.

Return the result table ordered by employee_id in ascending order.

The result format is in the following example.

 

Example 1:

Input: 
Employees table:
+-------------+----------+
| employee_id | name     |
+-------------+----------+
| 2           | Crew     |
| 4           | Haven    |
| 5           | Kristian |
+-------------+----------+
Salaries table:
+-------------+--------+
| employee_id | salary |
+-------------+--------+
| 5           | 76071  |
| 1           | 22517  |
| 4           | 63539  |
+-------------+--------+
Output: 
+-------------+
| employee_id |
+-------------+
| 1           |
| 2           |
+-------------+
Explanation: 
Employees 1, 2, 4, and 5 are working at this company.
The name of employee 1 is missing.
The salary of employee 2 is missing.

Solution Explanation: Finding Employees with Missing Information

The problem requires identifying employees with missing information – either their name or salary is absent from the respective tables (Employees and Salaries). The solution leverages SQL's set operations to efficiently achieve this.

Approach: Using UNION of Subqueries

The core idea is to identify employees missing from either table and combine the results. This is done using two subqueries and the UNION operator.

  1. Subquery 1: SELECT employee_id FROM Employees WHERE employee_id NOT IN (SELECT employee_id FROM Salaries) This subquery finds all employee_ids present in the Employees table but absent in the Salaries table. This means these employees have names but are missing salary information.

  2. Subquery 2: SELECT employee_id FROM Salaries WHERE employee_id NOT IN (SELECT employee_id FROM Employees) This subquery mirrors the first one, but in reverse. It finds all employee_ids in Salaries but not in Employees. These employees have salaries but are missing name information.

  3. UNION Operator: UNION combines the results of the two subqueries. UNION automatically removes duplicate entries, ensuring each employee ID appears only once in the final output, even if they are missing both name and salary (which would be present in both subquery results).

  4. ORDER BY Clause: ORDER BY 1 sorts the final result set in ascending order based on the employee_id column (column 1).

Time Complexity Analysis

The time complexity of this SQL query is dominated by the subqueries which perform NOT IN operations. The complexity of NOT IN operation depends on the database implementation but is generally O(NM) in the worst case (nested loop join) where N and M are the number of rows in the two tables being compared. However, optimized database engines often use indexes and other optimization techniques to significantly improve this performance. In most practical scenarios, the performance will be much better than O(NM), possibly closer to O(N + M) or even logarithmic time with efficient indexing.

The UNION operation has a complexity that is linear with the number of rows from the subqueries. The ORDER BY operation adds an additional sorting step, which typically has a time complexity of O(K log K), where K is the number of rows in the combined result of the UNION. However, database systems often use highly optimized sorting algorithms to manage this.

Code in MySQL

SELECT employee_id
FROM Employees
WHERE employee_id NOT IN (SELECT employee_id FROM Salaries)
UNION
SELECT employee_id
FROM Salaries
WHERE employee_id NOT IN (SELECT employee_id FROM Employees)
ORDER BY 1;

This query directly implements the described approach in standard SQL, making it compatible with most SQL database systems. The specific performance will depend on factors like database engine, indexing, data size, and query optimization techniques employed by the database system.