{x}
blog image

Find the Subtasks That Did Not Execute

Solution Explanation: Finding Missing Subtasks

This problem involves two tables: Tasks and Executed. Tasks defines the total number of subtasks for each task, while Executed records which subtasks have been completed. The goal is to identify and report the subtasks that were not executed for each task.

Approach: Recursive CTE (Common Table Expression) and Left Join

The most efficient approach uses a recursive CTE to generate all possible subtasks for each task and then performs a left join with the Executed table to find the missing ones.

1. Recursive CTE (T):

The CTE T recursively generates all subtask IDs for each task.

  • Base Case: It starts by selecting task_id and subtasks_count from the Tasks table. subtasks_count initially represents the highest subtask ID for that task.

  • Recursive Step: It then recursively subtracts 1 from subtask_id until it reaches 1. This generates all subtask IDs (from 1 to subtasks_count) for each task.

2. Left Join:

A LEFT JOIN is performed between the CTE T (containing all possible subtask IDs) and the Executed table. This join uses task_id and subtask_id as the join keys.

3. Filtering Missing Subtasks:

Finally, a WHERE clause filters the results. Executed.subtask_id IS NULL selects only those rows where a corresponding entry is not found in the Executed table, indicating that the subtask was not executed.

MySQL Code:

WITH RECURSIVE
    T(task_id, subtask_id) AS (
        SELECT
            task_id,
            subtasks_count
        FROM Tasks
        UNION ALL
        SELECT
            task_id,
            subtask_id - 1
        FROM t
        WHERE subtask_id > 1
    )
SELECT
    T.*
FROM
    T
    LEFT JOIN Executed USING (task_id, subtask_id)
WHERE Executed.subtask_id IS NULL;

Time Complexity Analysis

  • Recursive CTE: The time complexity of generating the CTE T is O(N*M), where N is the number of tasks and M is the maximum number of subtasks for any single task (in this case, M is capped at 20). This is because each task may require up to M recursive steps.

  • Left Join: The left join has a time complexity of O(N*M) in the worst case (if every subtask from T is present in the Executed table) because of its nested-loops nature. However, the average performance can be better due to the use of indexes on task_id and subtask_id.

  • Filtering: The filtering step has a linear time complexity O(N*M) with respect to the size of the intermediate result of the left join.

Therefore, the overall time complexity of the solution is dominated by the CTE generation and left join which is approximately O(N*M). Since M is a small constant (20), we can consider the complexity to be effectively linear with respect to the number of tasks (N).

Space Complexity Analysis

The space complexity is determined primarily by the size of the intermediate results generated by the CTE and the left join. In the worst case, the CTE could generate NM rows, and the left join could also result in an intermediate table of similar size. Therefore, the space complexity is also approximately **O(NM)**. Again, considering M as a small constant, the space complexity can be considered effectively linear with respect to N.