{x}
blog image

Get Highest Answer Rate Question

Solution Explanation for LeetCode 578: Get Highest Answer Rate Question

This problem requires finding the question with the highest answer rate from a SurveyLog table. The answer rate is calculated as the number of times a question was answered divided by the number of times it was shown. If multiple questions share the highest rate, the one with the smallest question_id should be returned.

We'll analyze two SQL solutions:

Solution 1: Using SUM() and GROUP BY

This solution efficiently calculates the answer rate for each question and then selects the question with the highest rate.

MySQL Code:

SELECT question_id AS survey_log
FROM SurveyLog
GROUP BY 1
ORDER BY SUM(action = 'answer') / SUM(action = 'show') DESC, 1
LIMIT 1;

Explanation:

  1. SELECT question_id AS survey_log: This selects the question_id and renames it to survey_log as required by the problem statement.

  2. FROM SurveyLog: This specifies the table to query.

  3. GROUP BY 1: This groups the rows by question_id (the first column in the SELECT statement). This is crucial for calculating the aggregate sums for each question.

  4. SUM(action = 'answer'): This counts the number of rows where the action is 'answer' for each group (question). MySQL treats boolean expressions as integers (1 for true, 0 for false) in aggregate functions.

  5. SUM(action = 'show'): Similarly, this counts the number of rows where action is 'show' for each group.

  6. SUM(action = 'answer') / SUM(action = 'show'): This calculates the answer rate for each question. We divide the number of answers by the number of times the question was shown. Note that if a question was never shown, this division will result in an error; however, the problem statement implies questions are always shown at least once.

  7. ORDER BY SUM(action = 'answer') / SUM(action = 'show') DESC, 1: This orders the results first by the calculated answer rate in descending order (DESC). The , 1 clause is a secondary sorting criterion: if two questions have the same answer rate, it will order them by question_id in ascending order (implicitly ASC), selecting the smaller question_id as per the problem requirements.

  8. LIMIT 1: This limits the result set to only the top row—the question with the highest answer rate (or the smallest question_id in case of a tie).

Time Complexity: The time complexity is dominated by the GROUP BY operation, which is typically O(N log N) or O(N) depending on the database's optimization strategies, where N is the number of rows in the SurveyLog table.

Space Complexity: The space complexity is O(M), where M is the number of distinct question_id values. This is because it needs to store intermediate results for each unique question.

Solution 2: Using Window Functions (MySQL)

This solution uses window functions to calculate the answer rate for each question in a more concise way.

MySQL Code:

WITH
    T AS (
        SELECT
            question_id AS survey_log,
            (SUM(action = 'answer') OVER (PARTITION BY question_id)) / (
                SUM(action = 'show') OVER (PARTITION BY question_id)
            ) AS ratio
        FROM SurveyLog
    )
SELECT survey_log
FROM T
ORDER BY ratio DESC, 1
LIMIT 1;

Explanation:

  1. WITH T AS (...): This defines a Common Table Expression (CTE) named T. CTEs are useful for breaking down complex queries into smaller, more manageable parts.

  2. SELECT question_id AS survey_log, ...: Similar to Solution 1, this selects the question_id.

  3. (SUM(action = 'answer') OVER (PARTITION BY question_id)): This uses a window function to calculate the sum of 'answer' actions for each question_id. PARTITION BY question_id ensures that the sum is calculated separately for each question.

  4. (SUM(action = 'show') OVER (PARTITION BY question_id)): This does the same for 'show' actions.

  5. ... / ... AS ratio: This calculates the answer rate and names it ratio.

  6. The rest of the query (SELECT survey_log FROM T ...): This part is similar to Solution 1: it selects the survey_log from the CTE T, orders the results by ratio (descending) and question_id (ascending), and limits the result to one row.

Time Complexity: The time complexity is similar to Solution 1, likely O(N log N) or O(N), dominated by the window function operations.

Space Complexity: The space complexity is also similar to Solution 1, O(M), to store the intermediate results for each unique question. The CTE adds a small constant overhead.

In summary, both solutions effectively solve the problem. Solution 1 is arguably slightly simpler to understand for those less familiar with window functions, while Solution 2 might be considered more elegant and potentially more efficient in some database systems due to the optimized implementation of window functions. The choice depends on personal preference and database system capabilities.