Watch 10 video solutions for Duplicate Emails, a easy level problem involving Database. This walkthrough by Frederik Müller has 20,369 views views. Want to try solving it yourself? Practice on FleetCode or read the detailed text solution.
Table: Person
+-------------+---------+ | Column Name | Type | +-------------+---------+ | id | int | | email | varchar | +-------------+---------+ id is the primary key (column with unique values) for this table. Each row of this table contains an email. The emails will not contain uppercase letters.
Write a solution to report all the duplicate emails. Note that it's guaranteed that the email field is not NULL.
Return the result table in any order.
The result format is in the following example.
Example 1:
Input: Person table: +----+---------+ | id | email | +----+---------+ | 1 | a@b.com | | 2 | c@d.com | | 3 | a@b.com | +----+---------+ Output: +---------+ | Email | +---------+ | a@b.com | +---------+ Explanation: a@b.com is repeated two times.
Problem Overview: You are given a Person table containing email addresses. The goal is to return all email values that appear more than once. Each duplicate email should appear only once in the output.
This problem tests basic database querying skills. You need to identify repeated values in a column, which is a common pattern in database interviews and production analytics queries.
Approach 1: Using GROUP BY and HAVING Clause (O(n) time, O(k) space)
This is the most common SQL solution. You group rows by the email column and count how many times each email appears. The HAVING clause filters grouped results where the count is greater than 1. Internally the database scans the table and aggregates counts per unique email. This approach is concise, efficient, and exactly how SQL engines are designed to detect duplicates.
Typical query pattern:
SELECT email FROM Person GROUP BY email HAVING COUNT(email) > 1;
The time complexity is O(n) for scanning and grouping rows. Space complexity is O(k), where k is the number of distinct emails stored during aggregation.
Approach 2: Using Temporary Tables (O(n) time, O(k) space)
Another SQL approach stores aggregated counts in a temporary table. First, compute email frequencies using GROUP BY. Then query the temporary table to filter rows with counts greater than one. This technique is useful when intermediate results need to be reused in larger database pipelines or reporting queries.
Although the logic still relies on aggregation, separating it into steps can make complex queries easier to debug or extend. Performance characteristics remain roughly O(n) time with O(k) storage for unique emails.
Approach 3: SQL Query with Subquery to Find Duplicates (O(n) time, O(k) space)
A slightly different SQL pattern uses a subquery that calculates counts and then filters duplicates in the outer query. The inner query groups emails and counts occurrences, while the outer query selects only those above the threshold. This approach is logically identical to the HAVING solution but may appear in codebases that prefer subquery-style filtering.
Since the database still performs grouping and aggregation, the complexity remains O(n) time and O(k) memory.
Approach 4: HashMap or Dictionary Counting (O(n) time, O(k) space)
If the same problem is asked outside SQL, a hash table or dictionary works well. Iterate through all emails, increment a count in a map, and collect keys whose frequency exceeds one. Hash lookups and updates run in constant time on average, giving overall O(n) time complexity with O(k) extra space.
This approach appears frequently in coding interviews when the dataset is represented as arrays or objects instead of a relational table.
Recommended for interviews: Interviewers expect the GROUP BY + HAVING query for SQL versions of this problem. It demonstrates understanding of aggregation and filtering in relational databases. Mentioning the hash map counting strategy also shows you understand the underlying algorithm behind duplicate detection.
| Approach | Time | Space | When to Use |
|---|---|---|---|
| GROUP BY + HAVING | O(n) | O(k) | Standard SQL duplicate detection. Most common interview solution. |
| Temporary Table Aggregation | O(n) | O(k) | Useful when intermediate results are reused in complex queries. |
| Subquery with Aggregation | O(n) | O(k) | Alternative SQL pattern using nested queries. |
| HashMap / Dictionary Counting | O(n) | O(k) | When solving the problem outside SQL using arrays or objects. |