Watch the video solution for Find Cities in Each State II, a medium level problem involving Database. This walkthrough by Everyday Data Science has 528 views views. Want to try solving it yourself? Practice on FleetCode or read the detailed text solution.
Table: cities
+-------------+---------+ | Column Name | Type | +-------------+---------+ | state | varchar | | city | varchar | +-------------+---------+ (state, city) is the combination of columns with unique values for this table. Each row of this table contains the state name and the city name within that state.
Write a solution to find all the cities in each state and analyze them based on the following requirements:
3 cities.Return the result table ordered by the count of matching-letter cities in descending order and then by state name in ascending order.
The result format is in the following example.
Example:
Input:
cities table:
+--------------+---------------+ | state | city | +--------------+---------------+ | New York | New York City | | New York | Newark | | New York | Buffalo | | New York | Rochester | | California | San Francisco | | California | Sacramento | | California | San Diego | | California | Los Angeles | | Texas | Tyler | | Texas | Temple | | Texas | Taylor | | Texas | Dallas | | Pennsylvania | Philadelphia | | Pennsylvania | Pittsburgh | | Pennsylvania | Pottstown | +--------------+---------------+
Output:
+-------------+-------------------------------------------+-----------------------+ | state | cities | matching_letter_count | +-------------+-------------------------------------------+-----------------------+ | Pennsylvania| Philadelphia, Pittsburgh, Pottstown | 3 | | Texas | Dallas, Taylor, Temple, Tyler | 3 | | New York | Buffalo, Newark, New York City, Rochester | 2 | +-------------+-------------------------------------------+-----------------------+
Explanation:
Note:
Problem Overview: You need to group cities by their state and return only the states that satisfy a specific condition based on their cities. The task is fundamentally a database aggregation problem: collect rows per state, compute a metric such as count or list of cities, and filter the groups that meet the requirement.
Approach 1: GROUP BY + HAVING Aggregation (O(n) time, O(k) space)
The most direct SQL solution groups rows using GROUP BY state. Aggregation functions such as COUNT(), GROUP_CONCAT(), or similar operations summarize the cities belonging to each state. After grouping, apply a filter using HAVING to keep only states that meet the required condition (for example, a minimum number of cities). This works because HAVING filters aggregated groups after the grouping phase, unlike WHERE which filters individual rows.
In practice the query scans the table once, builds grouped aggregates, and discards groups that fail the condition. The time complexity is O(n) where n is the number of rows in the cities table. Space complexity is O(k) where k is the number of distinct states being tracked during aggregation. This pattern appears frequently in SQL and database interview problems.
Approach 2: Window Function + Filtering (O(n) time, O(n) space)
Another option uses window functions. Compute a per‑state metric with a window expression such as COUNT(*) OVER (PARTITION BY state). This attaches the state-level statistic to every row while preserving the original table structure. After computing the window value, filter rows using the required condition (for example, keeping rows where the state count passes the threshold).
This approach is useful when you still need row-level data after computing group statistics. Instead of collapsing rows like GROUP BY, window functions keep the detailed records. The database still scans the dataset once, so time complexity remains O(n), though intermediate storage can reach O(n) depending on the engine. Window functions are common in analytical SQL workflows and problems involving ranking, partitions, or per-group metrics.
Approach 3: Pandas groupby + filter (O(n) time, O(k) space)
In a Pandas environment, the same logic translates directly using groupby(). First group the dataframe by the state column. Then apply aggregation such as counting or collecting city names, followed by a filter() or boolean condition to keep only the qualifying states. The Pandas engine performs the grouping in linear time with respect to the dataset size.
This mirrors the SQL approach conceptually and is common in data analysis workflows involving Pandas dataframes.
Recommended for interviews: The GROUP BY + HAVING solution is the expected answer. It demonstrates that you understand relational aggregation and how SQL filters aggregated groups. Window functions show deeper SQL knowledge, but the grouped aggregation approach is simpler, faster to write, and typically preferred in database interviews.
| Approach | Time | Space | When to Use |
|---|---|---|---|
| GROUP BY + HAVING Aggregation | O(n) | O(k) | Best general SQL solution for grouping rows by state and filtering aggregated results |
| Window Function + Partition | O(n) | O(n) | Useful when you need per-row data along with state-level statistics |
| Pandas groupby + filter | O(n) | O(k) | Data analysis workflows using Python dataframes instead of SQL |