{x}
blog image

Average Time of Process per Machine

Table: Activity

+----------------+---------+
| Column Name    | Type    |
+----------------+---------+
| machine_id     | int     |
| process_id     | int     |
| activity_type  | enum    |
| timestamp      | float   |
+----------------+---------+
The table shows the user activities for a factory website.
(machine_id, process_id, activity_type) is the primary key (combination of columns with unique values) of this table.
machine_id is the ID of a machine.
process_id is the ID of a process running on the machine with ID machine_id.
activity_type is an ENUM (category) of type ('start', 'end').
timestamp is a float representing the current time in seconds.
'start' means the machine starts the process at the given timestamp and 'end' means the machine ends the process at the given timestamp.
The 'start' timestamp will always be before the 'end' timestamp for every (machine_id, process_id) pair.
It is guaranteed that each (machine_id, process_id) pair has a 'start' and 'end' timestamp.

 

There is a factory website that has several machines each running the same number of processes. Write a solution to find the average time each machine takes to complete a process.

The time to complete a process is the 'end' timestamp minus the 'start' timestamp. The average time is calculated by the total time to complete every process on the machine divided by the number of processes that were run.

The resulting table should have the machine_id along with the average time as processing_time, which should be rounded to 3 decimal places.

Return the result table in any order.

The result format is in the following example.

 

Example 1:

Input: 
Activity table:
+------------+------------+---------------+-----------+
| machine_id | process_id | activity_type | timestamp |
+------------+------------+---------------+-----------+
| 0          | 0          | start         | 0.712     |
| 0          | 0          | end           | 1.520     |
| 0          | 1          | start         | 3.140     |
| 0          | 1          | end           | 4.120     |
| 1          | 0          | start         | 0.550     |
| 1          | 0          | end           | 1.550     |
| 1          | 1          | start         | 0.430     |
| 1          | 1          | end           | 1.420     |
| 2          | 0          | start         | 4.100     |
| 2          | 0          | end           | 4.512     |
| 2          | 1          | start         | 2.500     |
| 2          | 1          | end           | 5.000     |
+------------+------------+---------------+-----------+
Output: 
+------------+-----------------+
| machine_id | processing_time |
+------------+-----------------+
| 0          | 0.894           |
| 1          | 0.995           |
| 2          | 1.456           |
+------------+-----------------+
Explanation: 
There are 3 machines running 2 processes each.
Machine 0's average time is ((1.520 - 0.712) + (4.120 - 3.140)) / 2 = 0.894
Machine 1's average time is ((1.550 - 0.550) + (1.420 - 0.430)) / 2 = 0.995
Machine 2's average time is ((4.512 - 4.100) + (5.000 - 2.500)) / 2 = 1.456

Solution Explanation for LeetCode 1661: Average Time of Process per Machine

This problem requires calculating the average time each machine takes to complete a process. The Activity table contains start and end timestamps for each process on each machine. The solution involves several steps:

  1. Grouping by Machine: We need to group the data by machine_id to calculate the average processing time for each machine independently.

  2. Calculating Process Time: For each machine_id, we need to find the difference between the end timestamp and the start timestamp for each process. This represents the processing time for that specific process.

  3. Averaging Process Times: After calculating the processing time for each process on a machine, we need to average these times to get the average processing time for that machine.

  4. Rounding the Result: The final result should be rounded to 3 decimal places.

SQL Solutions

Both Solution 1 and Solution 2 achieve the same result using different SQL techniques. Let's break them down:

Solution 1: Using CASE WHEN

This solution uses a CASE WHEN statement to assign positive and negative values to timestamps based on the activity_type. Start timestamps are negated, and end timestamps are kept positive. This allows us to compute the difference using AVG directly:

SELECT
    machine_id,
    ROUND(
        AVG(
            CASE
                WHEN activity_type = 'start' THEN -timestamp
                ELSE timestamp
            END
        ) * 2,  -- Multiply by 2 because each machine has 2 timestamps (start and end) for a single process
        3
    ) AS processing_time
FROM Activity
GROUP BY 1;

Solution 2: Using IF

This solution employs the IF function, which is a more concise alternative to CASE WHEN in MySQL. It achieves the same result of assigning positive and negative values to timestamps based on activity_type:

SELECT
    machine_id,
    ROUND(AVG(IF(activity_type = 'start', -1, 1) * timestamp) * 2, 3) AS processing_time
FROM Activity
GROUP BY 1;

Both solutions leverage the power of SQL's aggregate functions (AVG, ROUND) and conditional logic (CASE WHEN or IF) to efficiently calculate the desired average processing time for each machine. The multiplication by 2 accounts for having one start and one end record per process.

Time Complexity Analysis

The time complexity of both SQL solutions is dominated by the GROUP BY operation. The complexity of GROUP BY is generally considered O(N log N) or O(N) depending on the database engine's implementation and the size of the data. The other operations (AVG, ROUND, CASE WHEN/IF) have linear time complexity, O(N), where N is the number of rows in the Activity table. Therefore, the overall time complexity is O(N log N) or O(N). The exact complexity depends on the database implementation.

Space Complexity Analysis

The space complexity is primarily determined by the space used to store the intermediate results during the GROUP BY operation. In the worst case, this could be proportional to the number of unique machine_id values. Therefore, the space complexity is O(M), where M is the number of unique machines. In practice, the space used would likely be closer to O(1) compared to the total input size if the number of unique machines is relatively small in comparison to the total number of rows.