Watch 2 video solutions for Find All Unique Email Domains, a easy level problem involving Database. This walkthrough by Everyday Data Science has 238 views views. Want to try solving it yourself? Practice on FleetCode or read the detailed text solution.
Table: Emails
+-------------+---------+ | Column Name | Type | +-------------+---------+ | id | int | | email | varchar | +-------------+---------+ id is the primary key (column with unique values) for this table. Each row of this table contains an email. The emails will not contain uppercase letters.
Write a solution to find all unique email domains and count the number of individuals associated with each domain. Consider only those domains that end with .com.
Return the result table orderd by email domains in ascending order.
The result format is in the following example.
Example 1:
Input: Emails table: +-----+-----------------------+ | id | email | +-----+-----------------------+ | 336 | hwkiy@test.edu | | 489 | adcmaf@outlook.com | | 449 | vrzmwyum@yahoo.com | | 95 | tof@test.edu | | 320 | jxhbagkpm@example.org | | 411 | zxcf@outlook.com | +----+------------------------+ Output: +--------------+-------+ | email_domain | count | +--------------+-------+ | outlook.com | 2 | | yahoo.com | 1 | +--------------+-------+ Explanation: - The valid domains ending with ".com" are only "outlook.com" and "yahoo.com", with respective counts of 2 and 1. Output table is ordered by email_domains in ascending order.
Problem Overview: Given a dataset of email addresses, extract the domain part (the text after @) and return the list of unique domains. The task focuses on string extraction and deduplication, which are common operations in database queries and string processing.
Approach 1: Extract Domain + Hash Set (O(n) time, O(n) space)
Iterate through each email and locate the @ character. Everything after this character is the domain. Insert the extracted domain into a hash set to automatically remove duplicates. Each insertion and lookup in the set runs in constant time on average, making the full scan linear. This approach is straightforward in languages like Python where a set handles uniqueness efficiently and pairs naturally with basic hash table operations.
Approach 2: Using SUBSTRING_INDEX + GROUP BY (O(n) time, O(k) space)
In SQL environments such as MySQL, domain extraction can be done directly during the query. The SUBSTRING_INDEX(email, '@', -1) function returns the substring after the @ symbol. Once the domain is extracted, use GROUP BY or DISTINCT to aggregate identical domains. The database engine scans the email column once and groups matching values, giving linear time relative to the number of rows. Space complexity depends on the number of unique domains k, since only grouped domain values are stored.
This SQL-based solution is usually preferred when working directly with relational datasets. It avoids exporting data to application code and lets the database engine handle string parsing and aggregation efficiently. The logic stays compact and leverages built-in functions optimized for column scans.
Recommended for interviews: The domain-extraction + hash set approach demonstrates clear reasoning about string parsing and deduplication. Interviewers typically expect you to split the string at @, store domains in a set, and return the unique results in O(n) time. The SQL SUBSTRING_INDEX method is the production-ready solution when the problem is framed as a database query.
| Approach | Time | Space | When to Use |
|---|---|---|---|
| Extract Domain + Hash Set | O(n) | O(n) | General programming solution when processing emails in application code |
| SUBSTRING_INDEX + GROUP BY | O(n) | O(k) | Best for SQL databases where domain extraction and deduplication can happen directly in a query |
| SUBSTRING_INDEX + DISTINCT | O(n) | O(k) | Simpler SQL query when only unique domains are required without aggregation |