{x}
blog image

Movie Rating

Table: Movies

+---------------+---------+
| Column Name   | Type    |
+---------------+---------+
| movie_id      | int     |
| title         | varchar |
+---------------+---------+
movie_id is the primary key (column with unique values) for this table.
title is the name of the movie.

 

Table: Users

+---------------+---------+
| Column Name   | Type    |
+---------------+---------+
| user_id       | int     |
| name          | varchar |
+---------------+---------+
user_id is the primary key (column with unique values) for this table.
The column 'name' has unique values.

Table: MovieRating

+---------------+---------+
| Column Name   | Type    |
+---------------+---------+
| movie_id      | int     |
| user_id       | int     |
| rating        | int     |
| created_at    | date    |
+---------------+---------+
(movie_id, user_id) is the primary key (column with unique values) for this table.
This table contains the rating of a movie by a user in their review.
created_at is the user's review date. 

 

Write a solution to:

  • Find the name of the user who has rated the greatest number of movies. In case of a tie, return the lexicographically smaller user name.
  • Find the movie name with the highest average rating in February 2020. In case of a tie, return the lexicographically smaller movie name.

The result format is in the following example.

 

Example 1:

Input: 
Movies table:
+-------------+--------------+
| movie_id    |  title       |
+-------------+--------------+
| 1           | Avengers     |
| 2           | Frozen 2     |
| 3           | Joker        |
+-------------+--------------+
Users table:
+-------------+--------------+
| user_id     |  name        |
+-------------+--------------+
| 1           | Daniel       |
| 2           | Monica       |
| 3           | Maria        |
| 4           | James        |
+-------------+--------------+
MovieRating table:
+-------------+--------------+--------------+-------------+
| movie_id    | user_id      | rating       | created_at  |
+-------------+--------------+--------------+-------------+
| 1           | 1            | 3            | 2020-01-12  |
| 1           | 2            | 4            | 2020-02-11  |
| 1           | 3            | 2            | 2020-02-12  |
| 1           | 4            | 1            | 2020-01-01  |
| 2           | 1            | 5            | 2020-02-17  | 
| 2           | 2            | 2            | 2020-02-01  | 
| 2           | 3            | 2            | 2020-03-01  |
| 3           | 1            | 3            | 2020-02-22  | 
| 3           | 2            | 4            | 2020-02-25  | 
+-------------+--------------+--------------+-------------+
Output: 
+--------------+
| results      |
+--------------+
| Daniel       |
| Frozen 2     |
+--------------+
Explanation: 
Daniel and Monica have rated 3 movies ("Avengers", "Frozen 2" and "Joker") but Daniel is smaller lexicographically.
Frozen 2 and Joker have a rating average of 3.5 in February but Frozen 2 is smaller lexicographically.

Solution Explanation

This problem requires querying three tables: Movies, Users, and MovieRating to find two pieces of information:

  1. The user who rated the greatest number of movies. If there's a tie, the lexicographically smaller username is chosen.
  2. The movie with the highest average rating in February 2020. Again, if there's a tie, the lexicographically smaller movie title is selected.

The solution uses a UNION ALL clause to combine the results of two separate queries, each addressing one part of the problem.

MySQL Solution Breakdown

The solution is implemented in MySQL. Let's analyze each part of the UNION ALL query:

Part 1: Finding the user with the most ratings

(
    SELECT name AS results
    FROM
        Users
        JOIN MovieRating USING (user_id)
    GROUP BY user_id
    ORDER BY COUNT(1) DESC, name
    LIMIT 1
)
  • JOIN Clause: This joins the Users and MovieRating tables using the user_id column, connecting users to their movie ratings. USING (user_id) is a shorthand for ON Users.user_id = MovieRating.user_id.
  • GROUP BY user_id: Groups the results by user ID, allowing us to count the number of ratings per user.
  • COUNT(1): Counts the number of rows (ratings) for each user.
  • ORDER BY COUNT(1) DESC, name: Sorts the results first in descending order based on the rating count (COUNT(1)), and then in ascending order based on the user's name (name) to handle ties.
  • LIMIT 1: Selects only the top row, which represents the user with the most ratings.
  • AS results: Assigns the alias "results" to the name column for consistency with the output format.

Part 2: Finding the movie with the highest average rating in February 2020

(
    SELECT title
    FROM
        MovieRating
        JOIN Movies USING (movie_id)
    WHERE DATE_FORMAT(created_at, '%Y-%m') = '2020-02'
    GROUP BY movie_id
    ORDER BY AVG(rating) DESC, title
    LIMIT 1
)
  • JOIN Clause: Similar to Part 1, this joins MovieRating and Movies tables using movie_id.
  • WHERE DATE_FORMAT(created_at, '%Y-%m') = '2020-02': Filters the results to include only ratings from February 2020. DATE_FORMAT extracts the year and month from the created_at column.
  • GROUP BY movie_id: Groups the results by movie ID to calculate the average rating per movie.
  • AVG(rating): Calculates the average rating for each movie.
  • ORDER BY AVG(rating) DESC, title: Sorts the results first in descending order by average rating and then in ascending order by movie title (title) to handle ties.
  • LIMIT 1: Selects only the top row, representing the movie with the highest average rating.

UNION ALL Clause:

The UNION ALL combines the results of the two parts into a single result set. UNION ALL keeps all rows, including duplicates (though duplicates are unlikely in this case given the LIMIT 1 clauses). A simple UNION would remove duplicates.

Time Complexity Analysis

The time complexity of this solution is dominated by the GROUP BY and ORDER BY operations within each of the subqueries. The GROUP BY operation has a time complexity of O(N log N) in the worst case, where N is the number of rows in the joined tables (potentially a large number of rows in the MovieRating table). The ORDER BY operation also has a time complexity of O(N log N) in the worst case. Therefore, the overall time complexity of the entire query is approximately O(N log N), where N is the number of rows in the relevant joined tables. The JOIN operation itself can also contribute significantly to the execution time depending on the database optimization strategies and the size of the tables.

Space Complexity Analysis

The space complexity is determined by the intermediate data structures created during the execution of the query, including the temporary tables created for grouping and sorting. The space complexity will be proportional to the size of the input data (the number of rows in the joined tables). Therefore, the space complexity is approximately O(N), where N is the number of rows in the joined tables.