Find Candidates for Data Scientist Position II - Video Solutions

MediumDatabase

Leetcode MEDIUM 3278 - Candidates for Data Scientist Position II - SQL | Everyday Data Science

13:26808 views

1 video solution available

Find Candidates for Data Scientist Position II - Video Solution

Watch the video solution for Find Candidates for Data Scientist Position II, a medium level problem involving Database. This walkthrough by Everyday Data Science has 808 views views. Want to try solving it yourself? Practice on FleetCode or read the detailed text solution.

Problem Statement

Table: Candidates

+--------------+---------+ 
| Column Name  | Type    | 
+--------------+---------+ 
| candidate_id | int     | 
| skill        | varchar |
| proficiency  | int     |
+--------------+---------+
(candidate_id, skill) is the unique key for this table.
Each row includes candidate_id, skill, and proficiency level (1-5).

Table: Projects

+--------------+---------+ 
| Column Name  | Type    | 
+--------------+---------+ 
| project_id   | int     | 
| skill        | varchar |
| importance   | int     |
+--------------+---------+
(project_id, skill) is the primary key for this table.
Each row includes project_id, required skill, and its importance (1-5) for the project.

Leetcode is staffing for multiple data science projects. Write a solution to find the best candidate for each project based on the following criteria:

Candidates must have all the skills required for a project.
Calculate a score for each candidate-project pair as follows:
- Start with 100 points
- Add 10 points for each skill where proficiency > importance
- Subtract 5 points for each skill where proficiency < importance
- If the candidate's skill proficiency equal to the project's skill importance, the score remains unchanged

Include only the top candidate (highest score) for each project. If there’s a tie, choose the candidate with the lower candidate_id. If there is no suitable candidate for a project, do not return that project.

Return a result table ordered by project_id in ascending order.

The result format is in the following example.

Example:

Input:

Candidates table:

+--------------+-----------+-------------+
| candidate_id | skill     | proficiency |
+--------------+-----------+-------------+
| 101          | Python    | 5           |
| 101          | Tableau   | 3           |
| 101          | PostgreSQL| 4           |
| 101          | TensorFlow| 2           |
| 102          | Python    | 4           |
| 102          | Tableau   | 5           |
| 102          | PostgreSQL| 4           |
| 102          | R         | 4           |
| 103          | Python    | 3           |
| 103          | Tableau   | 5           |
| 103          | PostgreSQL| 5           |
| 103          | Spark     | 4           |
+--------------+-----------+-------------+

Projects table:

+-------------+-----------+------------+
| project_id  | skill     | importance |
+-------------+-----------+------------+
| 501         | Python    | 4          |
| 501         | Tableau   | 3          |
| 501         | PostgreSQL| 5          |
| 502         | Python    | 3          |
| 502         | Tableau   | 4          |
| 502         | R         | 2          |
+-------------+-----------+------------+

Output:

+-------------+--------------+-------+
| project_id  | candidate_id | score |
+-------------+--------------+-------+
| 501         | 101          | 105   |
| 502         | 102          | 130   |
+-------------+--------------+-------+

Explanation:

For Project 501, Candidate 101 has the highest score of 105. All other candidates have the same score but Candidate 101 has the lowest candidate_id among them.
For Project 502, Candidate 102 has the highest score of 130.

The output table is ordered by project_id in ascending order.

Read full problem with examples

Approach Overview

Problem Overview: The task is to identify candidates who qualify for a Data Scientist role based on information spread across multiple database tables. You combine candidate records with related evaluation or skill data, compute statistics per candidate, and return only those who meet the required criteria.

Approach 1: Basic Join + Group By Filtering (O(n log n) time, O(n) space)

Start with a straightforward INNER JOIN between the candidate table and the related evaluation or skills table using an equi‑join condition on the candidate identifier. After joining, aggregate the rows with GROUP BY candidate_id. Use aggregate functions such as COUNT(), AVG(), or conditional sums to measure whether each candidate satisfies the hiring requirements. The HAVING clause filters candidates whose aggregated statistics meet the threshold. This approach works well when the rules depend only on grouped statistics and not on relative ranking between candidates.

Approach 2: Equi‑Join + Group Statistics + Window Function (O(n log n) time, O(n) space)

Join the relevant tables using an equi‑join on the candidate identifier, then compute aggregated statistics such as total evaluations or skill counts per candidate. After the grouping stage, apply a window function like RANK(), DENSE_RANK(), or ROW_NUMBER() over the aggregated results. Window functions allow you to compare candidates relative to others without collapsing rows further. This is useful when the problem requires selecting top candidates per metric, enforcing ordering constraints, or ensuring candidates meet multiple statistical conditions. MySQL supports this cleanly with subqueries or CTEs where the inner query performs aggregation and the outer query applies the window calculation.

Approach 3: Conditional Aggregation with Derived Table (O(n log n) time, O(n) space)

Another variation is to compute all required statistics in a derived table first. Use conditional aggregation such as SUM(CASE WHEN condition THEN 1 END) to track how many times each candidate satisfies different criteria. The derived table returns one row per candidate with all computed metrics. The outer query then filters candidates based on these metrics and optionally orders them by performance indicators. This pattern keeps the query readable when the evaluation logic involves multiple conditions.

These strategies rely heavily on relational database fundamentals such as database querying, efficient SQL joins, and analytical window functions. The core idea is always the same: join relevant data, compute statistics per candidate, then filter or rank results.

Recommended for interviews: The equi‑join with grouped statistics followed by a window function is the most expressive and closest to what interviewers expect for modern SQL problems. Showing the simpler GROUP BY solution demonstrates understanding of aggregation, while the window‑function approach shows you can handle ranking and advanced analytical queries.

Complexity Analysis

Approach	Time	Space	When to Use
Basic Join + Group By Filtering	O(n log n)	O(n)	When requirements only involve aggregated statistics per candidate
Equi-Join + Group Statistics + Window Function	O(n log n)	O(n)	When candidates must be ranked or compared using analytical metrics
Conditional Aggregation with Derived Table	O(n log n)	O(n)	When multiple filtering conditions must be calculated in a single aggregation pass

Continue Learning

Practice this problem

Solve it on FleetCode

Read Solution & Explanation

Detailed text walkthrough