{x}
blog image

Students and Examinations

Table: Students

+---------------+---------+
| Column Name   | Type    |
+---------------+---------+
| student_id    | int     |
| student_name  | varchar |
+---------------+---------+
student_id is the primary key (column with unique values) for this table.
Each row of this table contains the ID and the name of one student in the school.

 

Table: Subjects

+--------------+---------+
| Column Name  | Type    |
+--------------+---------+
| subject_name | varchar |
+--------------+---------+
subject_name is the primary key (column with unique values) for this table.
Each row of this table contains the name of one subject in the school.

 

Table: Examinations

+--------------+---------+
| Column Name  | Type    |
+--------------+---------+
| student_id   | int     |
| subject_name | varchar |
+--------------+---------+
There is no primary key (column with unique values) for this table. It may contain duplicates.
Each student from the Students table takes every course from the Subjects table.
Each row of this table indicates that a student with ID student_id attended the exam of subject_name.

 

Write a solution to find the number of times each student attended each exam.

Return the result table ordered by student_id and subject_name.

The result format is in the following example.

 

Example 1:

Input: 
Students table:
+------------+--------------+
| student_id | student_name |
+------------+--------------+
| 1          | Alice        |
| 2          | Bob          |
| 13         | John         |
| 6          | Alex         |
+------------+--------------+
Subjects table:
+--------------+
| subject_name |
+--------------+
| Math         |
| Physics      |
| Programming  |
+--------------+
Examinations table:
+------------+--------------+
| student_id | subject_name |
+------------+--------------+
| 1          | Math         |
| 1          | Physics      |
| 1          | Programming  |
| 2          | Programming  |
| 1          | Physics      |
| 1          | Math         |
| 13         | Math         |
| 13         | Programming  |
| 13         | Physics      |
| 2          | Math         |
| 1          | Math         |
+------------+--------------+
Output: 
+------------+--------------+--------------+----------------+
| student_id | student_name | subject_name | attended_exams |
+------------+--------------+--------------+----------------+
| 1          | Alice        | Math         | 3              |
| 1          | Alice        | Physics      | 2              |
| 1          | Alice        | Programming  | 1              |
| 2          | Bob          | Math         | 1              |
| 2          | Bob          | Physics      | 0              |
| 2          | Bob          | Programming  | 1              |
| 6          | Alex         | Math         | 0              |
| 6          | Alex         | Physics      | 0              |
| 6          | Alex         | Programming  | 0              |
| 13         | John         | Math         | 1              |
| 13         | John         | Physics      | 1              |
| 13         | John         | Programming  | 1              |
+------------+--------------+--------------+----------------+
Explanation: 
The result table should contain all students and all subjects.
Alice attended the Math exam 3 times, the Physics exam 2 times, and the Programming exam 1 time.
Bob attended the Math exam 1 time, the Programming exam 1 time, and did not attend the Physics exam.
Alex did not attend any exams.
John attended the Math exam 1 time, the Physics exam 1 time, and the Programming exam 1 time.

Solution Explanation

This problem requires retrieving the number of times each student attended each exam. The solution uses SQL queries to achieve this. The core idea is to perform a series of joins to combine data from the Students, Subjects, and Examinations tables, and then use aggregation to count the occurrences.

Approach

  1. Full Cartesian Product: The solution begins by generating all possible combinations of students and subjects. This is achieved by joining the Students and Subjects tables using a JOIN. Since every student should be represented with every subject, regardless of exam attendance, a CROSS JOIN or JOIN is used. In this case, a JOIN implicitly acts as a CROSS JOIN due to the nature of the subsequent LEFT JOIN.

  2. Left Join with Examinations: A LEFT JOIN is performed with the Examinations table. The LEFT JOIN ensures that all student-subject combinations from the previous join are included in the result, even if there's no matching entry in the Examinations table (meaning the student didn't attend that particular exam). The join condition uses USING (student_id, subject_name) which efficiently joins based on both common columns.

  3. Aggregation and Grouping: COUNT(e.student_id) counts the number of times each student-subject combination appears in the Examinations table. GROUP BY 1, 3 groups the results by student_id (column 1) and subject_name (column 3), ensuring that the COUNT function aggregates correctly. Aliasing COUNT(e.student_id) as attended_exams improves readability.

  4. Ordering: ORDER BY 1, 3 orders the final result set by student_id and then subject_name as specified in the problem description.

MySQL Code Explained

SELECT student_id, student_name, subject_name, COUNT(e.student_id) AS attended_exams
FROM
    Students
    JOIN Subjects
    LEFT JOIN Examinations AS e USING (student_id, subject_name)
GROUP BY 1, 3
ORDER BY 1, 3;
  • SELECT student_id, student_name, subject_name, COUNT(e.student_id) AS attended_exams: This selects the student ID, student name, subject name, and the count of exam attendances, aliased as attended_exams.

  • FROM Students JOIN Subjects LEFT JOIN Examinations AS e USING (student_id, subject_name): This performs the joins as described above. The USING clause is a concise way to specify the join columns.

  • GROUP BY 1, 3: This groups the results by the first and third selected columns (student_id and subject_name).

  • ORDER BY 1, 3: This orders the result set by student_id and then subject_name.

Time Complexity Analysis

The time complexity is dominated by the joins and the grouping operation. In a well-indexed database (with indexes on student_id and subject_name in all three tables), the joins should have a time complexity close to O(N log N) or even better (depending on the specific database engine and query optimizer), where N represents the total number of rows across all three tables. The grouping operation also has a time complexity roughly proportional to the number of unique student-subject combinations, which should be relatively small in comparison to N.

Therefore, the overall time complexity can be approximated as O(N log N), but could be closer to linear time with efficient indexing. The space complexity is O(M), where M is the number of unique student-subject pairs and hence the size of the result set.