{x}
blog image

Queries Quality and Percentage

Table: Queries

+-------------+---------+
| Column Name | Type    |
+-------------+---------+
| query_name  | varchar |
| result      | varchar |
| position    | int     |
| rating      | int     |
+-------------+---------+
This table may have duplicate rows.
This table contains information collected from some queries on a database.
The position column has a value from 1 to 500.
The rating column has a value from 1 to 5. Query with rating less than 3 is a poor query.

 

We define query quality as:

The average of the ratio between query rating and its position.

We also define poor query percentage as:

The percentage of all queries with rating less than 3.

Write a solution to find each query_name, the quality and poor_query_percentage.

Both quality and poor_query_percentage should be rounded to 2 decimal places.

Return the result table in any order.

The result format is in the following example.

 

Example 1:

Input: 
Queries table:
+------------+-------------------+----------+--------+
| query_name | result            | position | rating |
+------------+-------------------+----------+--------+
| Dog        | Golden Retriever  | 1        | 5      |
| Dog        | German Shepherd   | 2        | 5      |
| Dog        | Mule              | 200      | 1      |
| Cat        | Shirazi           | 5        | 2      |
| Cat        | Siamese           | 3        | 3      |
| Cat        | Sphynx            | 7        | 4      |
+------------+-------------------+----------+--------+
Output: 
+------------+---------+-----------------------+
| query_name | quality | poor_query_percentage |
+------------+---------+-----------------------+
| Dog        | 2.50    | 33.33                 |
| Cat        | 0.66    | 33.33                 |
+------------+---------+-----------------------+
Explanation: 
Dog queries quality is ((5 / 1) + (5 / 2) + (1 / 200)) / 3 = 2.50
Dog queries poor_ query_percentage is (1 / 3) * 100 = 33.33

Cat queries quality equals ((2 / 5) + (3 / 3) + (4 / 7)) / 3 = 0.66
Cat queries poor_ query_percentage is (1 / 3) * 100 = 33.33

Solution Explanation:

This problem requires calculating two metrics for each query name in the Queries table: quality and poor_query_percentage.

1. Quality Calculation:

The quality is defined as the average of the ratio between the query's rating and its position. For each query, we need to:

  • Calculate the ratio rating / position for each row.
  • Average these ratios across all rows with the same query_name.

2. Poor Query Percentage Calculation:

The poor_query_percentage represents the percentage of queries with a rating less than 3 for each query_name. We need to:

  • Count the number of queries with rating < 3 for each query_name.
  • Divide this count by the total number of queries for that query_name.
  • Multiply the result by 100 to get the percentage.

SQL Solution:

The SQL solution leverages the power of aggregate functions like AVG, SUM, COUNT, and ROUND. The GROUP BY clause is crucial for performing these calculations separately for each query_name.

MySQL Code:

The provided MySQL code efficiently computes both metrics. Let's break it down:

SELECT
    query_name,
    ROUND(AVG(rating / position), 2) AS quality,
    ROUND(AVG(rating < 3) * 100, 2) AS poor_query_percentage
FROM Queries
WHERE query_name IS NOT NULL
GROUP BY 1;
  • SELECT query_name, ...: This selects the query_name and the calculated metrics.
  • ROUND(AVG(rating / position), 2): This calculates the average of rating / position for each query_name and rounds it to two decimal places. The AVG function computes the average of the ratios.
  • ROUND(AVG(rating < 3) * 100, 2): This is a clever way to compute the percentage of poor queries. The expression rating < 3 evaluates to 1 (true) or 0 (false). AVG then computes the average of these 1s and 0s, which represents the fraction of poor queries. Multiplying by 100 gives the percentage, and ROUND rounds to two decimal places.
  • FROM Queries: This specifies the table to query.
  • WHERE query_name IS NOT NULL: This filters out any rows where query_name is null, ensuring accurate calculations.
  • GROUP BY 1: This groups the results by query_name, so the aggregations are performed for each unique query name (1 refers to the first column in the SELECT statement).

Time Complexity Analysis:

The time complexity of this SQL query is dominated by the GROUP BY operation. In the worst case, if there are n rows in the Queries table, the GROUP BY operation will take O(n) time to group the rows based on query_name. The aggregate functions (AVG, SUM, COUNT) then take linear time proportional to the number of rows within each group. Therefore, the overall time complexity is O(n), where n is the number of rows in the Queries table. The sorting involved in GROUP BY is typically done using efficient algorithms, making the overall time complexity linear on average.

Space Complexity Analysis:

The space complexity depends primarily on the size of the output. In the worst case, if all query_name values are unique, the output will have a size proportional to the number of unique query names in the table. The intermediate space required for the aggregation process also depends on the size of the input, but it is generally considered linear with respect to input size. Hence, the space complexity is O(m) where m is the number of unique query_name values. In the worst-case scenario where every row has a unique query_name, this simplifies to O(n).