Watch 2 video solutions for Find the Subtasks That Did Not Execute, a hard level problem involving Database. This walkthrough by Everyday Data Science has 2,343 views views. Want to try solving it yourself? Practice on FleetCode or read the detailed text solution.
Table: Tasks
+----------------+---------+ | Column Name | Type | +----------------+---------+ | task_id | int | | subtasks_count | int | +----------------+---------+ task_id is the column with unique values for this table. Each row in this table indicates that task_id was divided into subtasks_count subtasks labeled from 1 to subtasks_count. It is guaranteed that 2 <= subtasks_count <= 20.
Table: Executed
+---------------+---------+ | Column Name | Type | +---------------+---------+ | task_id | int | | subtask_id | int | +---------------+---------+ (task_id, subtask_id) is the combination of columns with unique values for this table. Each row in this table indicates that for the task task_id, the subtask with ID subtask_id was executed successfully. It is guaranteed that subtask_id <= subtasks_count for each task_id.
Write a solution to report the IDs of the missing subtasks for each task_id.
Return the result table in any order.
The result format is in the following example.
Example 1:
Input: Tasks table: +---------+----------------+ | task_id | subtasks_count | +---------+----------------+ | 1 | 3 | | 2 | 2 | | 3 | 4 | +---------+----------------+ Executed table: +---------+------------+ | task_id | subtask_id | +---------+------------+ | 1 | 2 | | 3 | 1 | | 3 | 2 | | 3 | 3 | | 3 | 4 | +---------+------------+ Output: +---------+------------+ | task_id | subtask_id | +---------+------------+ | 1 | 1 | | 1 | 3 | | 2 | 1 | | 2 | 2 | +---------+------------+ Explanation: Task 1 was divided into 3 subtasks (1, 2, 3). Only subtask 2 was executed successfully, so we include (1, 1) and (1, 3) in the answer. Task 2 was divided into 2 subtasks (1, 2). No subtask was executed successfully, so we include (2, 1) and (2, 2) in the answer. Task 3 was divided into 4 subtasks (1, 2, 3, 4). All of the subtasks were executed successfully.
Problem Overview: Each task has a subtasks_count, meaning subtasks are numbered from 1 to subtasks_count. The Executed table records which subtasks actually ran. Your job is to return every (task_id, subtask_id) pair that should exist but does not appear in the execution log.
Approach 1: Recursive Table Generation + LEFT JOIN (O(T * S) time, O(T * S) space)
This approach constructs the complete list of expected subtasks first, then removes the ones that executed. Use a recursive common table expression (CTE) to generate subtask numbers from 1 up to the maximum subtasks_count. Join this generated sequence with the Tasks table so each task expands into its valid subtask range. This effectively creates the full expected set of (task_id, subtask_id) pairs.
Next, perform a LEFT JOIN between this generated dataset and the Executed table on both task_id and subtask_id. Rows where the execution side is NULL represent subtasks that never ran. This pattern—generate the full search space and filter missing rows—is common in SQL and database problems involving gaps or missing records.
The key insight is that the database does not store the missing subtasks explicitly. You must derive the expected rows first. Recursive CTEs make this possible by iteratively building a numeric sequence inside the query. The complexity is proportional to the total number of generated subtasks across all tasks. For typical constraints this remains efficient because the generation is linear.
This solution also keeps the logic entirely in SQL. No procedural loops are required. Databases optimize joins and filtering well, so the query stays readable and performant.
Recommended for interviews: Interviewers expect a solution that first generates the complete subtask range and then detects missing entries with a LEFT JOIN. Recursive CTE generation shows strong understanding of recursion in SQL and how to model implicit data. Simpler brute-force enumeration demonstrates the idea, but the recursive CTE version is the clean and scalable approach typically discussed in system design and database interviews.
| Approach | Time | Space | When to Use |
|---|---|---|---|
| Manual Number Table + LEFT JOIN | O(T * S) | O(S) | When a prebuilt numbers table already exists in the database |
| Recursive CTE Generation + LEFT JOIN | O(T * S) | O(T * S) | General SQL solution when no numbers table exists; commonly used in MySQL and interview settings |