{x}
blog image

Reported Posts II

Solution Explanation for LeetCode 1132: Reported Posts II

This problem requires calculating the average daily percentage of posts removed after being reported as spam. The solution involves querying two tables: Actions and Removals.

Approach:

The core idea is to determine, for each day, the ratio of spam-reported posts that were subsequently removed to the total number of spam-reported posts on that day. This ratio, expressed as a percentage, is then averaged across all days.

MySQL Solution:

The provided MySQL solution utilizes a Common Table Expression (CTE) called T for clarity and efficiency. Here's a breakdown:

  1. Inner Query (CTE T):

    • SELECT COUNT(DISTINCT t2.post_id) / COUNT(DISTINCT t1.post_id) * 100 AS percent: This calculates the daily percentage. It counts the distinct post_ids from the Removals table (t2) that match spam reports in the Actions table (t1), divides it by the total distinct post_ids of spam reports, and multiplies by 100 to get a percentage. The DISTINCT keyword is crucial to avoid overcounting if a post is reported multiple times on the same day.
    • FROM Actions AS t1 LEFT JOIN Removals AS t2 ON t1.post_id = t2.post_id: This performs a LEFT JOIN between the Actions and Removals tables, connecting posts with their removal status. A LEFT JOIN ensures that all spam reports are included, even if a post wasn't removed.
    • WHERE extra = 'spam': This filters the results to only consider posts reported as spam.
    • GROUP BY action_date: This groups the results by date, so the percentage is calculated for each day.
  2. Outer Query:

    • SELECT ROUND(AVG(percent), 2) AS average_daily_percent: This calculates the average of the daily percentages (percent) computed in the CTE and rounds the result to 2 decimal places using ROUND().

Time Complexity Analysis:

The time complexity is dominated by the joins and aggregations within the CTE. In the worst case, the LEFT JOIN between Actions and Removals could take O(M*N) time, where M and N are the number of rows in Actions and Removals, respectively. However, database optimizers significantly reduce this cost in practice by using efficient join algorithms. The GROUP BY and AVG operations add further cost but are generally logarithmic or linear in the number of groups.

Therefore, while a worst-case theoretical bound might be O(M*N), the practical time complexity is likely closer to O(max(M,N) * log(max(M,N))) due to database optimizations and indexing.

Space Complexity Analysis:

The space complexity depends on the size of the intermediate results. The CTE T will store at most the number of unique dates in Actions where spam reports exist. The final result only stores a single value (the average percentage). Therefore, the space complexity is O(D), where D is the number of unique dates with spam reports. This is typically much smaller than the size of the input tables.

In summary, the solution efficiently calculates the desired average daily percentage using SQL's built-in functions and joins, offering a concise and performant way to address the problem. The database's query optimizer plays a significant role in improving the performance beyond the theoretical worst-case complexity.