Watch the video solution for Employees Project Allocation, a hard-level Database problem. This walkthrough by Everyday Data Science has 1,011 views. Want to try solving it yourself? Practice on FleetCode or read the detailed text solution.
Table: Project
+-------------+---------+
| Column Name | Type |
+-------------+---------+
| project_id | int |
| employee_id | int |
| workload | int |
+-------------+---------+
employee_id is the primary key (column with unique values) of this table.
employee_id is a foreign key (reference column) referencing the Employees table.
Each row of this table indicates that the employee with employee_id is working on the project with project_id and the workload of the project.
Table: Employees
+------------------+---------+
| Column Name      | Type    |
+------------------+---------+
| employee_id      | int     |
| name             | varchar |
| team             | varchar |
+------------------+---------+
employee_id is the primary key (column with unique values) of this table.
Each row of this table contains information about one employee.
Write a solution to find the employees who are allocated to projects whose workload exceeds the average workload of all employees on their respective teams.
Return the result table ordered by employee_id, project_id in ascending order.
The result format is in the following example.
Example 1:
Input:
Project table:
+------------+-------------+----------+
| project_id | employee_id | workload |
+------------+-------------+----------+
| 1          | 1           | 45       |
| 1          | 2           | 90       |
| 2          | 3           | 12       |
| 2          | 4           | 68       |
+------------+-------------+----------+
Employees table:
+-------------+--------+------+
| employee_id | name   | team |
+-------------+--------+------+
| 1           | Khaled | A    |
| 2           | Ali    | B    |
| 3           | John   | B    |
| 4           | Doe    | A    |
+-------------+--------+------+
Output:
+-------------+------------+---------------+------------------+
| employee_id | project_id | employee_name | project_workload |
+-------------+------------+---------------+------------------+
| 2           | 1          | Ali           | 90               |
| 4           | 2          | Doe           | 68               |
+-------------+------------+---------------+------------------+
Explanation:
- Employee with ID 1 has a project workload of 45 and belongs to Team A, where the average workload is 56.50. Since his project workload does not exceed the team's average workload, he is excluded.
- Employee with ID 2 has a project workload of 90 and belongs to Team B, where the average workload is 51.00. Since his project workload exceeds the team's average workload, he is included.
- Employee with ID 3 has a project workload of 12 and belongs to Team B, where the average workload is 51.00. Since his project workload does not exceed the team's average workload, he is excluded.
- Employee with ID 4 has a project workload of 68 and belongs to Team A, where the average workload is 56.50. Since his project workload exceeds the team's average workload, he is included.
The result table is ordered by employee_id, project_id in ascending order.
Problem Overview: You are given employee and project records and must find the employees whose project workload exceeds the average workload of their team. The task follows a common pattern: compute an aggregate statistic for groups of records (here, the average workload per team) and then match those results back to the base tables to identify the qualifying employee–project assignments.
Approach 1: Grouping Statistics + Equi-Join (O(n) time, O(g) space)
The core idea is to compute per-group statistics using GROUP BY, then join the aggregated result back to the original rows with an equi-join on the grouping key. First, join Project to Employees to attach each employee's team, and aggregate to compute the average workload per team. This produces a compact intermediate result with one row per team. Next, perform an equi-join between this aggregated table and the base employee/project rows on team, so you can filter for rows whose workload exceeds the team average. In SQL, this usually appears as a subquery or CTE joined on a key like team, project_id, or employee_id. Time complexity is O(n) for scanning and grouping the dataset, and space complexity is O(g), where g is the number of groups (teams).
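The steps above can be sketched as a runnable query. This is a minimal sketch using SQLite through Python's `sqlite3` module to exercise the query on the sample data; the table and column names follow the problem statement, and the alias names (`t`, `avg_workload`) are illustrative choices.

```python
import sqlite3

# Build the sample tables from the problem in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Project   (project_id INT, employee_id INT, workload INT);
CREATE TABLE Employees (employee_id INT, name VARCHAR(30), team VARCHAR(10));
INSERT INTO Project   VALUES (1, 1, 45), (1, 2, 90), (2, 3, 12), (2, 4, 68);
INSERT INTO Employees VALUES (1, 'Khaled', 'A'), (2, 'Ali', 'B'),
                             (3, 'John', 'B'),   (4, 'Doe', 'A');
""")

query = """
SELECT p.employee_id,
       p.project_id,
       e.name     AS employee_name,
       p.workload AS project_workload
FROM Project p
JOIN Employees e ON p.employee_id = e.employee_id
JOIN (
    -- aggregation step: one row per team with its average workload
    SELECT e2.team, AVG(p2.workload) AS avg_workload
    FROM Project p2
    JOIN Employees e2 ON p2.employee_id = e2.employee_id
    GROUP BY e2.team
) t ON e.team = t.team
WHERE p.workload > t.avg_workload   -- keep rows above the team average
ORDER BY p.employee_id, p.project_id;
"""

rows = conn.execute(query).fetchall()
print(rows)  # [(2, 1, 'Ali', 90), (4, 2, 'Doe', 68)]
```

Team A averages (45 + 68) / 2 = 56.50 and Team B averages (90 + 12) / 2 = 51.00, so only Ali (90) and Doe (68) survive the filter, matching the expected output.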
This pattern shows up frequently in database interview problems. The aggregation step extracts global information (like counts or maximum values), while the join step maps those results back to individual rows.
Approach 2: Window Functions (O(n) time, O(1) extra space)
Another option uses window functions such as AVG() OVER (PARTITION BY ...). Instead of building a separate aggregated table, the database computes the team average alongside each row, and you then filter rows against that computed value. This removes the need for an explicit join with an aggregated subquery while preserving the same logical flow. Window functions are often easier to read and can perform well on modern query engines. The overall complexity remains O(n) because the database still scans the dataset once and computes partitioned aggregates.
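The same result can be sketched with a window function, again tested on the sample data via Python's `sqlite3` (window functions require SQLite 3.25 or later; the derived-table alias `ranked` is an illustrative name). Note that a window function cannot appear directly in a WHERE clause, which is why the filter sits in an outer query.

```python
import sqlite3

# Build the sample tables from the problem in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Project   (project_id INT, employee_id INT, workload INT);
CREATE TABLE Employees (employee_id INT, name VARCHAR(30), team VARCHAR(10));
INSERT INTO Project   VALUES (1, 1, 45), (1, 2, 90), (2, 3, 12), (2, 4, 68);
INSERT INTO Employees VALUES (1, 'Khaled', 'A'), (2, 'Ali', 'B'),
                             (3, 'John', 'B'),   (4, 'Doe', 'A');
""")

query = """
SELECT employee_id, project_id, employee_name, project_workload
FROM (
    SELECT p.employee_id,
           p.project_id,
           e.name     AS employee_name,
           p.workload AS project_workload,
           -- team average computed per row; no separate aggregate table
           AVG(p.workload) OVER (PARTITION BY e.team) AS team_avg
    FROM Project p
    JOIN Employees e ON p.employee_id = e.employee_id
) ranked
WHERE project_workload > team_avg
ORDER BY employee_id, project_id;
"""

rows = conn.execute(query).fetchall()
print(rows)  # [(2, 1, 'Ali', 90), (4, 2, 'Doe', 68)]
```

The inner query keeps row-level detail while attaching the group-level statistic to every row, which is exactly the property that makes window functions a good fit here.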
Window functions are common in advanced SQL interview questions and are useful when you want group-level statistics while still keeping row-level detail.
Recommended for interviews: The grouping + equi-join solution is the most widely accepted approach because it clearly separates aggregation from filtering. Interviewers expect you to recognize the pattern: compute statistics with GROUP BY, then join the result back to the original dataset. Demonstrating both the aggregation logic and the join shows strong understanding of relational operations and data aggregation patterns.
| Approach | Time | Space | When to Use |
|---|---|---|---|
| Grouping Statistics + Equi-Join | O(n) | O(g) | Standard SQL solution when you need aggregated metrics and must map them back to original rows |
| Window Functions | O(n) | O(1) | When the database supports analytic functions and you want a cleaner query without subqueries |