This problem requires finding pairs of friends who have listened to at least three common songs on the same day. The solution leverages SQL joins and aggregate functions to efficiently achieve this.
Join Tables: We start by joining the Friendship
table with the Listens
table twice (aliased as l1
and l2
). l1
represents listening activity for user1_id
, and l2
represents listening activity for user2_id
. The join condition ensures that we only consider pairs of friends.
Filter Common Songs and Days: A WHERE
clause filters the results to include only rows where l1.song_id = l2.song_id
and l1.day = l2.day
. This ensures that we only consider songs listened to by both users on the same day.
Group and Count: We then GROUP BY
user1_id
, user2_id
, and l1.day
to group the results by friend pairs and day. The HAVING
clause filters the groups, keeping only those where COUNT(DISTINCT l1.song_id) >= 3
. This ensures that at least three distinct songs were listened to in common.
Select Distinct Pairs: Finally, SELECT DISTINCT user1_id, user2_id
is used to retrieve the unique pairs of similar friends.
SELECT DISTINCT user1_id, user2_id
FROM
Friendship AS f
LEFT JOIN Listens AS l1 ON user1_id = l1.user_id
LEFT JOIN Listens AS l2 ON user2_id = l2.user_id
WHERE l1.song_id = l2.song_id AND l1.day = l2.day
GROUP BY 1, 2, l1.day
HAVING COUNT(DISTINCT l1.song_id) >= 3;
The time complexity of this SQL query is dominated by the join operation. In the worst case, the join between Friendship
and Listens
tables could be O(NM), where N is the number of rows in Friendship
and M is the number of rows in Listens
. The grouping and counting operations are generally efficient in SQL databases, often having logarithmic or near-linear complexity depending on the database engine's optimization strategies. Therefore, the overall time complexity is roughly **O(NM)** in the worst case, although it would likely be much faster in practice due to database optimizations.
The space complexity is dominated by the intermediate tables created during the join and grouping operations. In the worst case, this could be proportional to the size of the input data. Hence, the space complexity is O(N+M), where N is the size of the Friendship
table and M is the size of the Listens
table. Again, database optimizations often reduce the actual space used.