Watch the video solution for Find Candidates for Data Scientist Position, a easy level problem involving Database. This walkthrough by Everyday Data Science has 627 views views. Want to try solving it yourself? Practice on FleetCode or read the detailed text solution.
Table: Candidates
+--------------+---------+ | Column Name | Type | +--------------+---------+ | candidate_id | int | | skill | varchar | +--------------+---------+ (candidate_id, skill) is the primary key (columns with unique values) for this table. Each row includes candidate_id and skill.
Write a query to find the candidates best suited for a Data Scientist position. The candidate must be proficient in Python, Tableau, and PostgreSQL.
Return the result table ordered by candidate_id in ascending order.
The result format is in the following example.
Example 1:
Input: Candidates table: +---------------+--------------+ | candidate_id | skill | +---------------+--------------+ | 123 | Python | | 234 | R | | 123 | Tableau | | 123 | PostgreSQL | | 234 | PowerBI | | 234 | SQL Server | | 147 | Python | | 147 | Tableau | | 147 | Java | | 147 | PostgreSQL | | 256 | Tableau | | 102 | DataAnalysis | +---------------+--------------+ Output: +--------------+ | candidate_id | +--------------+ | 123 | | 147 | +--------------+ Explanation: - Candidates 123 and 147 possess the necessary skills in Python, Tableau, and PostgreSQL for the data scientist position. - Candidates 234 and 102 do not possess any of the required skills for this position. - Candidate 256 has proficiency in Tableau but is missing skills in Python and PostgreSQL. The output table is sorted by candidate_id in ascending order.
Problem Overview: You are given a table of candidate skills where each row represents a candidate and a single skill they possess. The goal is to return the IDs of candidates who qualify for a Data Scientist role by having all required skills: Python, Tableau, and PostgreSQL.
Approach 1: Multiple EXISTS Subqueries (O(n) time, O(1) space)
One direct approach checks each required skill using separate EXISTS conditions. For every candidate, run three correlated subqueries that verify whether rows exist for Python, Tableau, and PostgreSQL. If all three conditions evaluate to true, the candidate qualifies. This approach is straightforward and mirrors the logical requirement exactly. However, it performs repeated scans of the same table segments, which can become inefficient as the dataset grows.
Approach 2: Self Join on Skill Filters (O(n) time, O(1) space)
Another option joins the table with itself three times, each filtered to one required skill. For example, one alias filters rows with Python, another with Tableau, and the third with PostgreSQL. Joining them on candidate_id ensures the candidate appears in all three skill sets. This approach works well conceptually but introduces extra joins, which increases query complexity and can slow execution on larger tables.
Approach 3: Conditional Filtering + GROUP BY Aggregation (O(n) time, O(1) space)
The most efficient solution filters rows to only the required skills, then groups by candidate_id. Using GROUP BY with COUNT(DISTINCT skill) lets you verify whether a candidate has all required skills. If the count equals 3, that candidate possesses every required skill. This approach scans the table once, aggregates results per candidate, and avoids redundant joins or subqueries. It relies on core SQL aggregation patterns commonly used in SQL and aggregation problems.
Example idea:
SELECT candidate_id
FROM Candidates
WHERE skill IN ('Python','Tableau','PostgreSQL')
GROUP BY candidate_id
HAVING COUNT(DISTINCT skill) = 3;
This pattern—filter first, then aggregate—is common in database interview questions involving qualification checks or requirement matching.
Recommended for interviews: The conditional filtering with GROUP BY aggregation is the expected solution. It demonstrates strong SQL fundamentals and efficient thinking. Simpler approaches like multiple EXISTS checks show understanding of the requirement, but the aggregation approach proves you can transform row-level data into candidate-level statistics efficiently.
| Approach | Time | Space | When to Use |
|---|---|---|---|
| Multiple EXISTS Subqueries | O(n) | O(1) | When expressing requirements directly with logical checks for each skill |
| Self Join on Skill Filters | O(n) | O(1) | When explicitly combining filtered skill sets using joins |
| Conditional Filtering + GROUP BY Aggregation | O(n) | O(1) | Best general solution for SQL interviews; scans once and validates skills using aggregation |