This problem requires calculating the average daily percentage of posts removed after being reported as spam. The solution involves querying two tables: Actions
and Removals
.
Approach:
The core idea is to determine, for each day, the ratio of spam-reported posts that were subsequently removed to the total number of spam-reported posts on that day. This ratio, expressed as a percentage, is then averaged across all days.
MySQL Solution:
The provided MySQL solution utilizes a Common Table Expression (CTE) called T
for clarity and efficiency. Here's a breakdown:
Inner Query (CTE T):
SELECT COUNT(DISTINCT t2.post_id) / COUNT(DISTINCT t1.post_id) * 100 AS percent
: This calculates the daily percentage. It counts the distinct post_id
s from the Removals
table (t2
) that match spam reports in the Actions
table (t1
), divides it by the total distinct post_id
s of spam reports, and multiplies by 100 to get a percentage. The DISTINCT
keyword is crucial to avoid overcounting if a post is reported multiple times on the same day.FROM Actions AS t1 LEFT JOIN Removals AS t2 ON t1.post_id = t2.post_id
: This performs a LEFT JOIN
between the Actions
and Removals
tables, connecting posts with their removal status. A LEFT JOIN
ensures that all spam reports are included, even if a post wasn't removed.WHERE extra = 'spam'
: This filters the results to only consider posts reported as spam.GROUP BY action_date
: This groups the results by date, so the percentage is calculated for each day.Outer Query:
SELECT ROUND(AVG(percent), 2) AS average_daily_percent
: This calculates the average of the daily percentages (percent
) computed in the CTE and rounds the result to 2 decimal places using ROUND()
.Time Complexity Analysis:
The time complexity is dominated by the joins and aggregations within the CTE. In the worst case, the LEFT JOIN
between Actions
and Removals
could take O(M*N) time, where M and N are the number of rows in Actions
and Removals
, respectively. However, database optimizers significantly reduce this cost in practice by using efficient join algorithms. The GROUP BY
and AVG
operations add further cost but are generally logarithmic or linear in the number of groups.
Therefore, while a worst-case theoretical bound might be O(M*N), the practical time complexity is likely closer to O(max(M,N) * log(max(M,N))) due to database optimizations and indexing.
Space Complexity Analysis:
The space complexity depends on the size of the intermediate results. The CTE T
will store at most the number of unique dates in Actions
where spam reports exist. The final result only stores a single value (the average percentage). Therefore, the space complexity is O(D), where D is the number of unique dates with spam reports. This is typically much smaller than the size of the input tables.
In summary, the solution efficiently calculates the desired average daily percentage using SQL's built-in functions and joins, offering a concise and performant way to address the problem. The database's query optimizer plays a significant role in improving the performance beyond the theoretical worst-case complexity.