Table: Courses
+-------------+---------+ | Column Name | Type | +-------------+---------+ | student | varchar | | class | varchar | +-------------+---------+ (student, class) is the primary key (combination of columns with unique values) for this table. Each row of this table indicates the name of a student and the class in which they are enrolled.
Write a solution to find all the classes that have at least five students.
Return the result table in any order.
The result format is in the following example.
Example 1:
Input: Courses table: +---------+----------+ | student | class | +---------+----------+ | A | Math | | B | English | | C | Math | | D | Biology | | E | Math | | F | Computer | | G | Math | | H | Math | | I | Math | +---------+----------+ Output: +---------+ | class | +---------+ | Math | +---------+ Explanation: - Math has 6 students, so we include it. - English has 1 student, so we do not include it. - Biology has 1 student, so we do not include it. - Computer has 1 student, so we do not include it.
In #596 Classes More Than 5 Students, the goal is to identify classes that have at least five students enrolled. The key idea is to aggregate records by class and count how many students belong to each group.
The typical SQL approach uses the GROUP BY clause to group rows based on the class column. After grouping, an aggregate function such as COUNT() is applied to determine the number of students in each class. To filter only those classes meeting the requirement, the HAVING clause is used with a condition like COUNT(DISTINCT student) >= 5.
This method efficiently processes the dataset in a single aggregation step. It avoids unnecessary subqueries and directly filters aggregated results. In most database engines, this operation performs a full scan followed by grouping, making it efficient for typical interview-scale datasets.
Overall, the solution demonstrates how aggregation and filtering work together in SQL queries.
| Approach | Time Complexity | Space Complexity |
|---|---|---|
| GROUP BY with HAVING and COUNT | O(n) | O(k) |
Frederik Müller
The simplest solution to this problem is to exploit SQL's GROUP BY feature alongside the COUNT() function. We'll group students by their class, count the number of students in each group, and then filter those results to include only classes with five or more students.
Time Complexity: O(n), where n is the number of rows in the table, because we need to check each row to perform the GROUP BY operation.
Space Complexity: O(k), where k is the number of unique classes in the output.
1SELECT class FROM Courses GROUP BY class HAVING COUNT(student) >= 5;This SQL query begins with selecting the class from the Courses table. Then, we group the entries by class and use the COUNT() function to count the number of students associated with each class. Finally, we use HAVING to filter down to only those classes where the student count is at least five.
In languages like C, C++, Java, Python, C#, and JavaScript, we can simulate the SQL behavior with a hashmap (dictionary in Python, or an object in JavaScript). We iterate through the data, count the occurrences of each class, and then extract the classes that have counts of five or more.
Time Complexity: O(n), where n is the number of entries in the courses list, as we iterate through all entries once.
Space Complexity: O(k), the number of distinct classes, because of the hashmap.
1def find_large_classes(courses):
2
Watch expert explanations and walkthroughs
Jot down your thoughts, approach, and key learnings
Yes, SQL aggregation problems like this are commonly asked in data-focused or backend interviews at large tech companies. They test understanding of GROUP BY, filtering aggregated results, and handling duplicates correctly.
The optimal approach uses SQL aggregation with GROUP BY on the class column and a HAVING clause to filter groups with at least five students. Using COUNT(DISTINCT student) ensures each student is counted once. This keeps the query concise and efficient.
The main SQL concepts used are GROUP BY, COUNT, and HAVING. GROUP BY groups rows by class, COUNT calculates the number of students in each class, and HAVING filters the groups based on the required condition.
Conceptually, the problem resembles using a hash map where the key is the class and the value is the number of students. SQL databases internally perform similar grouping operations during aggregation.
We use a defaultdict from Python's collections module to store each class's student count. As we iterate through the list of tuples courses, we increment the count for the class each student is enrolled in. Finally, we build a result list of classes that have a count of at least five.