Find Cities in Each State II - Video Solutions

MediumDatabase

Leetcode MEDIUM 3328 - Find Cities in Each State II - GROUP_CONCAT in SQL | Everyday Data Science

7:58528 views

1 video solution available

Find Cities in Each State II - Video Solution

Watch the video solution for Find Cities in Each State II, a medium level problem involving Database. This walkthrough by Everyday Data Science has 528 views views. Want to try solving it yourself? Practice on FleetCode or read the detailed text solution.

Problem Statement

Table: cities

+-------------+---------+
| Column Name | Type    | 
+-------------+---------+
| state       | varchar |
| city        | varchar |
+-------------+---------+
(state, city) is the combination of columns with unique values for this table.
Each row of this table contains the state name and the city name within that state.

Write a solution to find all the cities in each state and analyze them based on the following requirements:

Combine all cities into a comma-separated string for each state.
Only include states that have at least 3 cities.
Only include states where at least one city starts with the same letter as the state name.

Return the result table ordered by the count of matching-letter cities in descending order and then by state name in ascending order.

The result format is in the following example.

Example:

Input:

cities table:

+--------------+---------------+
| state        | city          |
+--------------+---------------+
| New York     | New York City |
| New York     | Newark        |
| New York     | Buffalo       |
| New York     | Rochester     |
| California   | San Francisco |
| California   | Sacramento    |
| California   | San Diego     |
| California   | Los Angeles   |
| Texas        | Tyler         |
| Texas        | Temple        |
| Texas        | Taylor        |
| Texas        | Dallas        |
| Pennsylvania | Philadelphia  |
| Pennsylvania | Pittsburgh    |
| Pennsylvania | Pottstown     |
+--------------+---------------+

Output:

+-------------+-------------------------------------------+-----------------------+
| state       | cities                                    | matching_letter_count |
+-------------+-------------------------------------------+-----------------------+
| Pennsylvania| Philadelphia, Pittsburgh, Pottstown       | 3                     |
| Texas       | Dallas, Taylor, Temple, Tyler             | 3                     |
| New York    | Buffalo, Newark, New York City, Rochester | 2                     |
+-------------+-------------------------------------------+-----------------------+

Explanation:

Pennsylvania:
- Has 3 cities (meets minimum requirement)
- All 3 cities start with 'P' (same as state)
- matching_letter_count = 3
Texas:
- Has 4 cities (meets minimum requirement)
- 3 cities (Taylor, Temple, Tyler) start with 'T' (same as state)
- matching_letter_count = 3
New York:
- Has 4 cities (meets minimum requirement)
- 2 cities (Newark, New York City) start with 'N' (same as state)
- matching_letter_count = 2
California is not included in the output because:
- Although it has 4 cities (meets minimum requirement)
- No cities start with 'C' (doesn't meet the matching letter requirement)

Note:

Results are ordered by matching_letter_count in descending order
When matching_letter_count is the same (Texas and New York both have 2), they are ordered by state name alphabetically
Cities in each row are ordered alphabetically

Read full problem with examples

Approach Overview

Problem Overview: You need to group cities by their state and return only the states that satisfy a specific condition based on their cities. The task is fundamentally a database aggregation problem: collect rows per state, compute a metric such as count or list of cities, and filter the groups that meet the requirement.

Approach 1: GROUP BY + HAVING Aggregation (O(n) time, O(k) space)

The most direct SQL solution groups rows using GROUP BY state. Aggregation functions such as COUNT(), GROUP_CONCAT(), or similar operations summarize the cities belonging to each state. After grouping, apply a filter using HAVING to keep only states that meet the required condition (for example, a minimum number of cities). This works because HAVING filters aggregated groups after the grouping phase, unlike WHERE which filters individual rows.

In practice the query scans the table once, builds grouped aggregates, and discards groups that fail the condition. The time complexity is O(n) where n is the number of rows in the cities table. Space complexity is O(k) where k is the number of distinct states being tracked during aggregation. This pattern appears frequently in SQL and database interview problems.

Approach 2: Window Function + Filtering (O(n) time, O(n) space)

Another option uses window functions. Compute a per‑state metric with a window expression such as COUNT(*) OVER (PARTITION BY state). This attaches the state-level statistic to every row while preserving the original table structure. After computing the window value, filter rows using the required condition (for example, keeping rows where the state count passes the threshold).

This approach is useful when you still need row-level data after computing group statistics. Instead of collapsing rows like GROUP BY, window functions keep the detailed records. The database still scans the dataset once, so time complexity remains O(n), though intermediate storage can reach O(n) depending on the engine. Window functions are common in analytical SQL workflows and problems involving ranking, partitions, or per-group metrics.

Approach 3: Pandas groupby + filter (O(n) time, O(k) space)

In a Pandas environment, the same logic translates directly using groupby(). First group the dataframe by the state column. Then apply aggregation such as counting or collecting city names, followed by a filter() or boolean condition to keep only the qualifying states. The Pandas engine performs the grouping in linear time with respect to the dataset size.

This mirrors the SQL approach conceptually and is common in data analysis workflows involving Pandas dataframes.

Recommended for interviews: The GROUP BY + HAVING solution is the expected answer. It demonstrates that you understand relational aggregation and how SQL filters aggregated groups. Window functions show deeper SQL knowledge, but the grouped aggregation approach is simpler, faster to write, and typically preferred in database interviews.

Complexity Analysis

Approach	Time	Space	When to Use
GROUP BY + HAVING Aggregation	O(n)	O(k)	Best general SQL solution for grouping rows by state and filtering aggregated results
Window Function + Partition	O(n)	O(n)	Useful when you need per-row data along with state-level statistics
Pandas groupby + filter	O(n)	O(k)	Data analysis workflows using Python dataframes instead of SQL

Continue Learning

Practice this problem

Solve it on FleetCode

Read Solution & Explanation

Detailed text walkthrough