Table: ActorDirector
+-------------+---------+ | Column Name | Type | +-------------+---------+ | actor_id | int | | director_id | int | | timestamp | int | +-------------+---------+ timestamp is the primary key (column with unique values) for this table.
Write a solution to find all the pairs (actor_id, director_id)
where the actor has cooperated with the director at least three times.
Return the result table in any order.
The result format is in the following example.
Example 1:
Input: ActorDirector table: +-------------+-------------+-------------+ | actor_id | director_id | timestamp | +-------------+-------------+-------------+ | 1 | 1 | 0 | | 1 | 1 | 1 | | 1 | 1 | 2 | | 1 | 2 | 3 | | 1 | 2 | 4 | | 2 | 1 | 5 | | 2 | 1 | 6 | +-------------+-------------+-------------+ Output: +-------------+-------------+ | actor_id | director_id | +-------------+-------------+ | 1 | 1 | +-------------+-------------+ Explanation: The only pair is (1, 1) where they cooperated exactly 3 times.
This problem requires finding all pairs of (actor_id, director_id)
that have collaborated at least three times. The solution leverages SQL's aggregation capabilities to efficiently achieve this.
Approach:
The core idea is to group the ActorDirector
table's rows based on actor_id
and director_id
and then count the occurrences of each pair. We only keep pairs with a count greater than or equal to 3.
SQL Query Explanation:
SELECT actor_id, director_id
: This selects the actor ID and director ID columns which will form the output.
FROM ActorDirector
: This specifies that we're querying the ActorDirector
table.
GROUP BY 1, 2
: This groups the rows based on the first and second columns in the SELECT statement (i.e., actor_id
and director_id
). This means all rows with the same actor_id
and director_id
will be grouped together.
HAVING COUNT(*) >= 3
: This is a crucial part. COUNT(*)
counts the number of rows in each group. The HAVING
clause filters these groups, keeping only those where the count is greater than or equal to 3. This ensures that we only return pairs that have collaborated at least three times.
Time Complexity:
The time complexity of this SQL query is dominated by the GROUP BY
and HAVING
operations. The complexity depends on the database engine's optimization techniques, but generally, it's considered to be approximately O(N log N) or O(N), where N is the number of rows in the ActorDirector
table. The exact complexity can vary depending on the size of the table, the indexing, and the database engine used. Modern database engines are highly optimized for these operations, making them very efficient for large datasets.
Space Complexity:
The space complexity is determined by the intermediate data structures used during the grouping and aggregation phases. This is typically proportional to the number of unique pairs of (actor_id, director_id)
, which could range from O(1) in the best case (very few unique pairs) to O(N) in the worst case (most pairs are unique). In practice, it usually falls somewhere in between, depending on the data distribution.
Code (MySQL):
SELECT actor_id, director_id
FROM ActorDirector
GROUP BY actor_id, director_id
HAVING COUNT(*) >= 3;
This code is directly executable in a MySQL environment. Other SQL dialects (PostgreSQL, SQL Server, etc.) would have very similar syntax. The core concepts of GROUP BY
and HAVING
remain the same across different SQL databases.