{x}
blog image

Count Salary Categories

Table: Accounts

+-------------+------+
| Column Name | Type |
+-------------+------+
| account_id  | int  |
| income      | int  |
+-------------+------+
account_id is the primary key (column with unique values) for this table.
Each row contains information about the monthly income for one bank account.

 

Write a solution to calculate the number of bank accounts for each salary category. The salary categories are:

  • "Low Salary": All the salaries strictly less than $20000.
  • "Average Salary": All the salaries in the inclusive range [$20000, $50000].
  • "High Salary": All the salaries strictly greater than $50000.

The result table must contain all three categories. If there are no accounts in a category, return 0.

Return the result table in any order.

The result format is in the following example.

 

Example 1:

Input: 
Accounts table:
+------------+--------+
| account_id | income |
+------------+--------+
| 3          | 108939 |
| 2          | 12747  |
| 8          | 87709  |
| 6          | 91796  |
+------------+--------+
Output: 
+----------------+----------------+
| category       | accounts_count |
+----------------+----------------+
| Low Salary     | 1              |
| Average Salary | 0              |
| High Salary    | 3              |
+----------------+----------------+
Explanation: 
Low Salary: Account 2.
Average Salary: No accounts.
High Salary: Accounts 3, 6, and 8.

Solution Explanation

The problem asks to categorize bank accounts based on their income and count the number of accounts in each category. The categories are: "Low Salary" (income < $20000), "Average Salary" ($20000 <= income <= $50000), and "High Salary" (income > $50000). The solution must return all three categories, even if a category has zero accounts.

Two approaches are presented:

Approach 1: Temporary Table + Grouping + Left Join

This approach uses a more structured method. It involves creating temporary tables to streamline the process:

  1. S (Salary Categories): A temporary table is created to list all salary categories explicitly. This ensures all categories are included in the final output, regardless of whether they contain any accounts.
  2. T (Account Counts): This table groups the Accounts table based on the salary category (calculated using a CASE statement) and counts the number of accounts in each category.
  3. LEFT JOIN: Finally, a LEFT JOIN is performed between S and T using category as the join key. This ensures all categories from S are included in the final result. IFNULL(accounts_count, 0) handles cases where a category has no matching entries in T, replacing NULL with 0.

Approach 2: Filtering + Merging

This approach is more concise. It directly filters the Accounts table three times, once for each salary category, using conditional aggregation (SUM(condition) which counts the number of rows satisfying the condition). The results from these three queries are then merged using UNION. IFNULL is used similarly to handle potential NULL values where a category is empty.

Time and Space Complexity Analysis

Approach 1:

  • Time Complexity: O(N log N) due to the GROUP BY operation in the T table (assuming efficient sorting). The LEFT JOIN operation has a time complexity that depends on the implementation of the database system but is typically O(N). The creation of the temporary table S takes constant time as it has a fixed number of rows (3).
  • Space Complexity: O(N) to store the temporary tables T and potentially intermediate results. S takes constant space.

Approach 2:

  • Time Complexity: O(N). The filtering and aggregation operations (e.g., SUM(income < 20000)) are linear in the number of rows in the Accounts table. The UNION operation is generally linear in the total number of rows across all the merged queries.
  • Space Complexity: O(1) - Constant space. It does not create any large temporary tables; the space usage is dominated by the input data.

Code Examples

The code examples are already provided in the original response in MySQL. While other languages like Python (using Pandas or SQLalchemy) could be used to achieve the same results given a CSV or similar data format, a direct translation to other SQL dialects (PostgreSQL, SQL Server, etc.) would be mostly similar. The core logic of conditional aggregation and joining/unioning would remain the same. Only minor syntactic changes would be necessary to adapt to the specific SQL dialect.