Table: Person
+-------------+---------+
| Column Name | Type    |
+-------------+---------+
| id          | int     |
| email       | varchar |
+-------------+---------+

id is the primary key (column with unique values) for this table.
Each row of this table contains an email. The emails will not contain uppercase letters.
Write a solution to report all the duplicate emails. Note that it's guaranteed that the email field is not NULL.
Return the result table in any order.
The result format is in the following example.
Example 1:
Input:
Person table:
+----+---------+
| id | email   |
+----+---------+
| 1  | a@b.com |
| 2  | c@d.com |
| 3  | a@b.com |
+----+---------+

Output:
+---------+
| Email   |
+---------+
| a@b.com |
+---------+

Explanation: a@b.com is repeated two times.
Problem Overview: You are given a Person table containing email addresses. The goal is to return all email values that appear more than once. Each duplicate email should appear only once in the output.
This problem tests basic database querying skills. You need to identify repeated values in a column, which is a common pattern in database interviews and production analytics queries.
Approach 1: Using GROUP BY and HAVING Clause (O(n) time, O(k) space)
This is the most common SQL solution. You group rows by the email column and count how many times each email appears. The HAVING clause then filters the grouped results, keeping only groups whose count is greater than 1. Internally, the database scans the table and aggregates a count per unique email, so this approach is concise, efficient, and a natural fit for how SQL engines perform aggregation.
Typical query pattern:
SELECT email FROM Person GROUP BY email HAVING COUNT(email) > 1;
The time complexity is O(n) for scanning and grouping rows. Space complexity is O(k), where k is the number of distinct emails stored during aggregation.
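The query can be sanity-checked against an in-memory SQLite database. This is a minimal sketch: the Python/sqlite3 harness and the sample rows (mirroring Example 1) are illustrative; only the query itself comes from the text above.

```python
import sqlite3

# In-memory database seeded with the Example 1 rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Person (id INTEGER PRIMARY KEY, email TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO Person (id, email) VALUES (?, ?)",
    [(1, "a@b.com"), (2, "c@d.com"), (3, "a@b.com")],
)

# GROUP BY collapses rows that share an email; HAVING filters the groups
# after aggregation, keeping only emails counted more than once.
rows = conn.execute(
    "SELECT email FROM Person GROUP BY email HAVING COUNT(email) > 1"
).fetchall()
print(rows)  # [('a@b.com',)]
```

Note that HAVING, not WHERE, is required here: WHERE filters rows before grouping, while the duplicate test needs the per-group count that only exists after aggregation.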
Approach 2: Using Temporary Tables (O(n) time, O(k) space)
Another SQL approach stores aggregated counts in a temporary table. First, compute email frequencies using GROUP BY. Then query the temporary table to filter rows with counts greater than one. This technique is useful when intermediate results need to be reused in larger database pipelines or reporting queries.
Although the logic still relies on aggregation, separating it into steps can make complex queries easier to debug or extend. Performance characteristics remain roughly O(n) time with O(k) storage for unique emails.
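A sketch of the two-step variant, using an in-memory SQLite database for illustration. The email_counts table and cnt column follow the description in this article; the Python/sqlite3 harness and sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Person (id INTEGER PRIMARY KEY, email TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO Person (id, email) VALUES (?, ?)",
    [(1, "a@b.com"), (2, "c@d.com"), (3, "a@b.com")],
)

# Step 1: materialize per-email counts into a temporary table.
conn.execute(
    "CREATE TEMPORARY TABLE email_counts AS "
    "SELECT email, COUNT(email) AS cnt FROM Person GROUP BY email"
)

# Step 2: filter the precomputed counts. The intermediate table can be
# reused by other queries in the same session.
rows = conn.execute("SELECT email FROM email_counts WHERE cnt > 1").fetchall()
print(rows)  # [('a@b.com',)]
```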
Approach 3: SQL Query with Subquery to Find Duplicates (O(n) time, O(k) space)
A slightly different SQL pattern uses a subquery that calculates counts and then filters duplicates in the outer query. The inner query groups emails and counts occurrences, while the outer query selects only those above the threshold. This approach is logically identical to the HAVING solution but may appear in codebases that prefer subquery-style filtering.
Since the database still performs grouping and aggregation, the complexity remains O(n) time and O(k) memory.
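A sketch of the subquery form, again using an illustrative in-memory SQLite setup (the derived-table alias t is an assumption; only the grouping-and-filtering logic comes from the description above).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Person (id INTEGER PRIMARY KEY, email TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO Person (id, email) VALUES (?, ?)",
    [(1, "a@b.com"), (2, "c@d.com"), (3, "a@b.com")],
)

# The inner query produces one (email, count) pair per distinct email;
# the outer query keeps only rows whose count exceeds the threshold.
rows = conn.execute(
    "SELECT email FROM "
    "(SELECT email, COUNT(email) AS cnt FROM Person GROUP BY email) AS t "
    "WHERE t.cnt > 1"
).fetchall()
print(rows)  # [('a@b.com',)]
```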
Approach 4: HashMap or Dictionary Counting (O(n) time, O(k) space)
If the same problem is asked outside SQL, a hash table or dictionary works well. Iterate through all emails, increment a count in a map, and collect keys whose frequency exceeds one. Hash lookups and updates run in constant time on average, giving overall O(n) time complexity with O(k) extra space.
This approach appears frequently in coding interviews when the dataset is represented as arrays or objects instead of a relational table.
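A minimal Python sketch of the counting idea (the function name find_duplicate_emails is illustrative):

```python
from collections import Counter

def find_duplicate_emails(emails):
    """Return each email that appears more than once, in first-seen order."""
    counts = Counter(emails)  # one average-O(1) hash update per email
    return [email for email, c in counts.items() if c > 1]

print(find_duplicate_emails(["a@b.com", "c@d.com", "a@b.com"]))  # ['a@b.com']
```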
Recommended for interviews: Interviewers expect the GROUP BY + HAVING query for SQL versions of this problem. It demonstrates understanding of aggregation and filtering in relational databases. Mentioning the hash map counting strategy also shows you understand the underlying algorithm behind duplicate detection.
To identify duplicate emails, you can use the SQL GROUP BY clause, which groups rows that have the same values in specified columns into summary rows. Use the COUNT function to count the occurrences of each email, followed by the HAVING clause to filter the groups with more than one occurrence.
This SQL query selects the 'email' column from the Person table. The GROUP BY clause groups the rows by email, and HAVING COUNT(email) > 1 ensures that only groups with more than one occurrence are selected, effectively identifying duplicates.
SQL

SELECT email
FROM Person
GROUP BY email
HAVING COUNT(email) > 1;
Time Complexity: O(N), where N is the number of rows in the table. The query needs to scan all rows to perform the grouping and counting.
Space Complexity: O(k), where k is the number of distinct emails the engine must hold while aggregating.
Another approach to solve the problem of finding duplicate emails is by using a temporary table to store the counts of each email and then selecting from this table. This method might be helpful when performing complex operations in intermediate steps.
Firstly, create a temporary table, email_counts, to store each distinct email along with its count resulting from GROUP BY email. Then, select only those emails from this temporary table where the count, `cnt`, is greater than 1, indicating duplicates.
SQL

CREATE TEMPORARY TABLE email_counts AS
SELECT email, COUNT(email) AS cnt
FROM Person
GROUP BY email;

SELECT email
FROM email_counts
WHERE cnt > 1;
Time Complexity: O(N) due to grouping and counting operations over the entire table.
Space Complexity: Additional O(M) space for the temporary table, where M is the number of distinct emails.
This approach involves using a hash map or dictionary to count the occurrences of each email in the table. Once we have the counts, we can identify which emails appear more than once and output them.
A C implementation can use a struct to keep track of unique emails and their counts. We iterate through the list of emails and populate the structure, incrementing the count for duplicates. Finally, we print emails with counts greater than one.
Time Complexity: O(n*m), where n is the number of emails and m is the average length of an email.
Space Complexity: O(n), for storing unique emails and their counts.
Here, we will utilize SQL to directly query the table and find duplicates by grouping the records based on the email and using a HAVING clause to filter out emails that appear more than once.
This SQL query groups the records in the Person table by email and then filters for groups having more than one record (i.e., duplicate emails) using the HAVING clause.
SQL

SELECT email
FROM Person
GROUP BY email
HAVING COUNT(email) > 1;
Time Complexity: O(n) when the engine groups with a hash table, or O(n log n) if it sorts the rows to group them.
Space Complexity: O(k) of intermediate storage for the GROUP BY, where k is the number of distinct emails.
We can use the GROUP BY statement to group the data by the email field, and then use the HAVING statement to filter out the email addresses that appear more than once.
We can use a self-join to join the Person table with itself, and then filter out the records where the id is different but the email is the same.
MySQL

SELECT email
FROM Person
GROUP BY email
HAVING COUNT(email) > 1;

SELECT DISTINCT p1.email
FROM Person p1
JOIN Person p2
  ON p1.email = p2.email AND p1.id <> p2.id;
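As a sanity check, the self-join variant can be run against an in-memory SQLite database (the Python/sqlite3 harness and sample rows are illustrative). DISTINCT is what keeps each duplicated email to a single output row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Person (id INTEGER PRIMARY KEY, email TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO Person (id, email) VALUES (?, ?)",
    [(1, "a@b.com"), (2, "c@d.com"), (3, "a@b.com")],
)

# Each duplicate pair matches on email but has different ids; DISTINCT
# collapses the matched pairs back to one row per duplicated email.
rows = conn.execute(
    "SELECT DISTINCT p1.email FROM Person p1 "
    "JOIN Person p2 ON p1.email = p2.email AND p1.id <> p2.id"
).fetchall()
print(rows)  # [('a@b.com',)]
```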
| Approach | Time Complexity | Space Complexity |
|---|---|---|
| Using Group By and Having Clause | O(N), where N is the number of rows; the query scans all rows to group and count | O(k), for the distinct emails held during aggregation |
| Using Temporary Tables | O(N), due to grouping and counting over the entire table | O(M), for the temporary table of M distinct emails |
| Using HashMap or Dictionary to Count Occurrences | O(n*m), where n is the number of emails and m is the average email length | O(n), for storing unique emails and their counts |
| SQL Query to Find Duplicates (Group By + Having) | O(n) with hash aggregation, or O(n log n) if the engine sorts to group | O(k), for intermediate GROUP BY storage |
| Self-Join | — | — |
| Approach | Time | Space | When to Use |
|---|---|---|---|
| GROUP BY + HAVING | O(n) | O(k) | Standard SQL duplicate detection. Most common interview solution. |
| Temporary Table Aggregation | O(n) | O(k) | Useful when intermediate results are reused in complex queries. |
| Subquery with Aggregation | O(n) | O(k) | Alternative SQL pattern using nested queries. |
| HashMap / Dictionary Counting | O(n) | O(k) | When solving the problem outside SQL using arrays or objects. |