Table: Movies
+---------------+---------+ | Column Name | Type | +---------------+---------+ | movie_id | int | | title | varchar | +---------------+---------+ movie_id is the primary key (column with unique values) for this table. title is the name of the movie.
Table: Users
+---------------+---------+ | Column Name | Type | +---------------+---------+ | user_id | int | | name | varchar | +---------------+---------+ user_id is the primary key (column with unique values) for this table. The column 'name' has unique values.
Table: MovieRating
+---------------+---------+ | Column Name | Type | +---------------+---------+ | movie_id | int | | user_id | int | | rating | int | | created_at | date | +---------------+---------+ (movie_id, user_id) is the primary key (column with unique values) for this table. This table contains the rating of a movie by a user in their review. created_at is the user's review date.
Write a solution to:
February 2020
. In case of a tie, return the lexicographically smaller movie name.The result format is in the following example.
Example 1:
Input: Movies table: +-------------+--------------+ | movie_id | title | +-------------+--------------+ | 1 | Avengers | | 2 | Frozen 2 | | 3 | Joker | +-------------+--------------+ Users table: +-------------+--------------+ | user_id | name | +-------------+--------------+ | 1 | Daniel | | 2 | Monica | | 3 | Maria | | 4 | James | +-------------+--------------+ MovieRating table: +-------------+--------------+--------------+-------------+ | movie_id | user_id | rating | created_at | +-------------+--------------+--------------+-------------+ | 1 | 1 | 3 | 2020-01-12 | | 1 | 2 | 4 | 2020-02-11 | | 1 | 3 | 2 | 2020-02-12 | | 1 | 4 | 1 | 2020-01-01 | | 2 | 1 | 5 | 2020-02-17 | | 2 | 2 | 2 | 2020-02-01 | | 2 | 3 | 2 | 2020-03-01 | | 3 | 1 | 3 | 2020-02-22 | | 3 | 2 | 4 | 2020-02-25 | +-------------+--------------+--------------+-------------+ Output: +--------------+ | results | +--------------+ | Daniel | | Frozen 2 | +--------------+ Explanation: Daniel and Monica have rated 3 movies ("Avengers", "Frozen 2" and "Joker") but Daniel is smaller lexicographically. Frozen 2 and Joker have a rating average of 3.5 in February but Frozen 2 is smaller lexicographically.
This problem requires querying three tables: Movies
, Users
, and MovieRating
to find two pieces of information:
The solution uses a UNION ALL
clause to combine the results of two separate queries, each addressing one part of the problem.
The solution is implemented in MySQL. Let's analyze each part of the UNION ALL
query:
Part 1: Finding the user with the most ratings
(
SELECT name AS results
FROM
Users
JOIN MovieRating USING (user_id)
GROUP BY user_id
ORDER BY COUNT(1) DESC, name
LIMIT 1
)
JOIN
Clause: This joins the Users
and MovieRating
tables using the user_id
column, connecting users to their movie ratings. USING (user_id)
is a shorthand for ON Users.user_id = MovieRating.user_id
.GROUP BY user_id
: Groups the results by user ID, allowing us to count the number of ratings per user.COUNT(1)
: Counts the number of rows (ratings) for each user.ORDER BY COUNT(1) DESC, name
: Sorts the results first in descending order based on the rating count (COUNT(1)
), and then in ascending order based on the user's name (name
) to handle ties.LIMIT 1
: Selects only the top row, which represents the user with the most ratings.AS results
: Assigns the alias "results" to the name
column for consistency with the output format.Part 2: Finding the movie with the highest average rating in February 2020
(
SELECT title
FROM
MovieRating
JOIN Movies USING (movie_id)
WHERE DATE_FORMAT(created_at, '%Y-%m') = '2020-02'
GROUP BY movie_id
ORDER BY AVG(rating) DESC, title
LIMIT 1
)
JOIN
Clause: Similar to Part 1, this joins MovieRating
and Movies
tables using movie_id
.WHERE DATE_FORMAT(created_at, '%Y-%m') = '2020-02'
: Filters the results to include only ratings from February 2020. DATE_FORMAT
extracts the year and month from the created_at
column.GROUP BY movie_id
: Groups the results by movie ID to calculate the average rating per movie.AVG(rating)
: Calculates the average rating for each movie.ORDER BY AVG(rating) DESC, title
: Sorts the results first in descending order by average rating and then in ascending order by movie title (title
) to handle ties.LIMIT 1
: Selects only the top row, representing the movie with the highest average rating.UNION ALL
Clause:
The UNION ALL
combines the results of the two parts into a single result set. UNION ALL
keeps all rows, including duplicates (though duplicates are unlikely in this case given the LIMIT 1
clauses). A simple UNION
would remove duplicates.
The time complexity of this solution is dominated by the GROUP BY
and ORDER BY
operations within each of the subqueries. The GROUP BY
operation has a time complexity of O(N log N) in the worst case, where N is the number of rows in the joined tables (potentially a large number of rows in the MovieRating
table). The ORDER BY
operation also has a time complexity of O(N log N) in the worst case. Therefore, the overall time complexity of the entire query is approximately O(N log N), where N is the number of rows in the relevant joined tables. The JOIN
operation itself can also contribute significantly to the execution time depending on the database optimization strategies and the size of the tables.
The space complexity is determined by the intermediate data structures created during the execution of the query, including the temporary tables created for grouping and sorting. The space complexity will be proportional to the size of the input data (the number of rows in the joined tables). Therefore, the space complexity is approximately O(N), where N is the number of rows in the joined tables.