This problem requires finding users who both follow at least one other user and are followed by at least one other user. These are the "second-degree followers." The solution uses SQL to efficiently query and aggregate the data from the Follow
table.
The solution employs a self-join and aggregation. Let's break down the approach:
Self-Join: The core idea is to join the Follow
table with itself. This allows us to identify relationships where a user (follower
in the first instance of the table) follows another user, and that same user is followed by yet another user.
Identify Second-Degree Followers: The self-join creates pairs of users, such that the first element of the pair follows the second. We are interested in users who appear as both a follower
(someone who follows others) and a followee
(someone who is followed by others).
Aggregation and Counting: After identifying second-degree followers, we group the results by follower
and use COUNT(DISTINCT followee)
to determine how many unique users each second-degree follower is followed by.
Ordering: The final ORDER BY
clause sorts the results alphabetically by follower
.
WITH
T AS (
SELECT f1.follower AS follower, f2.follower AS followee
FROM
Follow AS f1
JOIN Follow AS f2 ON f1.follower = f2.followee
)
SELECT follower, COUNT(DISTINCT followee) AS num
FROM T
GROUP BY 1
ORDER BY 1;
WITH T AS (...)
: This is a Common Table Expression (CTE). It defines a temporary result set named T
which is used in the main query. This makes the query more readable and easier to understand. The CTE performs the self-join.
SELECT f1.follower AS follower, f2.follower AS followee
: This selects the follower
from the first instance of the Follow
table (f1
) and labels it follower
, and the follower
from the second instance of the Follow
table (f2
) and labels it followee
. The JOIN
condition ensures only those pairs are included where f1.follower = f2.followee
, representing that a user is both a follower and a followee.FROM Follow AS f1 JOIN Follow AS f2 ON f1.follower = f2.followee
: This performs the self-join as explained above.SELECT follower, COUNT(DISTINCT followee) AS num
: This part of the query selects the follower
and counts the distinct number of followee
for each follower
. COUNT(DISTINCT followee)
ensures that we don't overcount if a follower is followed by the same user multiple times.
FROM T
: The main query selects from the CTE T
which contains the results of the self-join.
GROUP BY 1
: This groups the results by the first column of the SELECT
statement (which is follower
).
ORDER BY 1
: This sorts the results in ascending order based on the first column (i.e., follower
).
The time complexity of this SQL query is dominated by the self-join. In the worst case, the size of the Follow
table is n
. The self-join has a time complexity of O(n^2) in the worst case (although in practice, database optimizers may improve this). The subsequent aggregation and sorting steps have a complexity of O(n log n) in the worst case, assuming efficient sorting algorithms are used. Therefore, the overall time complexity is dominated by the self-join, making it O(n^2) in the worst-case scenario. However, with proper indexing on the follower
and followee
columns, the performance would be much better in practice, likely closer to O(n log n) because the join can take advantage of the indexes to find matches efficiently.
The space complexity is O(n) to store the intermediate results of the self-join. The final result set also has a maximum size of O(n), thus the space complexity remains O(n).