Table: Person
+-------------+---------+
| Column Name | Type    |
+-------------+---------+
| id          | int     |
| email       | varchar |
+-------------+---------+

id is the primary key (column with unique values) for this table. Each row of this table contains an email. The emails will not contain uppercase letters.
Write a solution to report all the duplicate emails. Note that it's guaranteed that the email field is not NULL.
Return the result table in any order.
The result format is in the following example.
Example 1:
Input:
Person table:
+----+---------+
| id | email   |
+----+---------+
| 1  | a@b.com |
| 2  | c@d.com |
| 3  | a@b.com |
+----+---------+
Output:
+---------+
| Email   |
+---------+
| a@b.com |
+---------+
Explanation: a@b.com is repeated two times.
Problem Overview: The table Person contains email addresses. Your task is to return all emails that appear more than once. Each duplicate email should be listed only once in the result.
Approach 1: Using GROUP BY and HAVING Clause (O(n) time, O(1) extra space)
This is the most direct SQL solution. Group all rows by the email column using GROUP BY so the database aggregates rows with the same email together, then apply HAVING COUNT(email) > 1 to keep only groups containing more than one row. The database scans the table once and aggregates as it goes, so the time complexity is O(n). This approach is concise and relies entirely on standard SQL aggregation.
Approach 2: Using Temporary Tables (O(n) time, O(n) space)
Create a temporary table that stores each email and the number of times it appears. First insert aggregated results using GROUP BY. Then query the temporary table to return emails where the count is greater than one. This approach separates aggregation and filtering into two steps, which can help when debugging complex queries or when intermediate results need to be reused. The database still scans the table once, giving O(n) time, while the temporary table introduces O(n) storage overhead.
Approach 3: Using HashMap or Dictionary to Count Occurrences (O(n) time, O(n) space)
If the data is processed in application code instead of SQL, use a hash map (dictionary) to count occurrences of each email. Iterate through all records and increment a counter for each email. After the pass, iterate through the map and return keys whose frequency is greater than one. Hash lookups run in constant time, so the entire operation remains O(n). This method relies on hashing concepts similar to those used in hash map problems.
Approach 4: SQL Self Join to Find Duplicates (O(n^2) worst-case time, O(1) space)
Join the Person table with itself on matching email values while ensuring the row IDs are different. When two different rows share the same email, the join produces a match. Use DISTINCT to avoid repeating the same email multiple times. This approach works without aggregation but can degrade toward O(n^2) comparisons depending on indexing and dataset size. It is mainly useful for understanding relational joins within database queries.
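The self join described above can be sketched as follows; this is a minimal demonstration using Python's built-in sqlite3 module with an in-memory database, and the sample rows are the illustrative data from the example, not part of any real schema:

```python
import sqlite3

# In-memory database seeded with the Person table from the example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Person (id INT PRIMARY KEY, email VARCHAR(255))")
conn.executemany("INSERT INTO Person VALUES (?, ?)",
                 [(1, "a@b.com"), (2, "c@d.com"), (3, "a@b.com")])

# Self join: two different rows (p1.id <> p2.id) sharing an email mean
# that email is duplicated; DISTINCT collapses the repeated matches.
rows = conn.execute("""
    SELECT DISTINCT p1.email AS Email
    FROM Person p1
    JOIN Person p2 ON p1.email = p2.email AND p1.id <> p2.id
""").fetchall()
print(rows)  # [('a@b.com',)]
```

Without an index on email, the database may compare every pair of rows, which is where the O(n^2) worst case comes from.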
Recommended for interviews: The GROUP BY + HAVING solution is what interviewers typically expect for SQL-based problems. It shows you understand aggregation and filtering in relational databases. Discussing the hash map approach demonstrates that you can translate the same logic to application code, which often comes up in system design or backend discussions.
To identify duplicate emails, you can use the SQL GROUP BY clause, which groups rows that have the same values in specified columns into summary rows. Use the COUNT function to count the occurrences of each email, followed by the HAVING clause to filter the groups with more than one occurrence.
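The GROUP BY + HAVING query can be sketched as below; the query itself follows the problem's schema, while the surrounding sqlite3 setup and sample rows are assumptions for a runnable illustration:

```python
import sqlite3

# Sample Person table seeded with the example data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Person (id INT PRIMARY KEY, email VARCHAR(255))")
conn.executemany("INSERT INTO Person VALUES (?, ?)",
                 [(1, "a@b.com"), (2, "c@d.com"), (3, "a@b.com")])

# Group rows by email; HAVING keeps only groups with more than one row.
duplicates = conn.execute("""
    SELECT email AS Email
    FROM Person
    GROUP BY email
    HAVING COUNT(email) > 1
""").fetchall()
print(duplicates)  # [('a@b.com',)]
```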
This SQL query selects the 'email' column from the Person table. The GROUP BY clause groups the rows by email, and HAVING COUNT(email) > 1 ensures that only groups with more than one occurrence are selected, effectively identifying duplicates.
Time Complexity: O(N), where N is the number of rows in the table. The query needs to scan all rows to perform the grouping and counting.
Space Complexity: O(1), as it uses a fixed amount of space independent of the size of the input data, assuming a reasonable number of distinct emails.
Another approach is to use a temporary table that stores the count of each email, and then select from that table. Separating the two steps can help when intermediate results are needed for more complex operations.
First, create a temporary table, email_counts, that stores each distinct email along with its count produced by GROUP BY email. Then select from this temporary table only those emails whose count, `cnt`, is greater than 1, indicating duplicates.
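The two steps can be sketched like this; the `email_counts` temporary table matches the description above, and the sqlite3 harness with sample rows is an assumption for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Person (id INT PRIMARY KEY, email VARCHAR(255))")
conn.executemany("INSERT INTO Person VALUES (?, ?)",
                 [(1, "a@b.com"), (2, "c@d.com"), (3, "a@b.com")])

# Step 1: aggregate each email and its count into a temporary table.
conn.execute("""
    CREATE TEMP TABLE email_counts AS
    SELECT email, COUNT(email) AS cnt
    FROM Person
    GROUP BY email
""")

# Step 2: filter the aggregated counts for duplicates.
duplicates = conn.execute(
    "SELECT email AS Email FROM email_counts WHERE cnt > 1"
).fetchall()
print(duplicates)  # [('a@b.com',)]
```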
Time Complexity: O(N) due to grouping and counting operations over the entire table.
Space Complexity: Additional O(M) space for the temporary table, where M is the number of distinct emails.
This approach involves using a hash map or dictionary to count the occurrences of each email in the table. Once we have the counts, we can identify which emails appear more than once and output them.
The C implementation uses a struct to keep track of unique emails and their counts. We iterate through the list of emails and populate our structure, incrementing the count for duplicates. Finally, we print emails with counts greater than one.
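The same counting logic can be sketched in Python, with collections.Counter standing in for the C struct; the email list below is the example data, used here as an assumed input:

```python
from collections import Counter

# Example input: the emails column from the Person table.
emails = ["a@b.com", "c@d.com", "a@b.com"]

# One pass builds the frequency map; hash lookups are O(1) on average.
counts = Counter(emails)

# Emails whose count exceeds one are duplicates.
duplicates = [email for email, cnt in counts.items() if cnt > 1]
print(duplicates)  # ['a@b.com']
```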
Time Complexity: O(n*m), where n is the number of emails and m is the average length of an email.
Space Complexity: O(n), for storing unique emails and their counts.
Here, we will utilize SQL to directly query the table and find duplicates by grouping the records based on the email and using a HAVING clause to filter out emails that appear more than once.
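This grouping-and-filtering query can be sketched as follows; COUNT(*) is used here, which is equivalent to COUNT(email) for this table since email is guaranteed non-NULL, and the sqlite3 setup with sample rows is an assumption for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Person (id INT PRIMARY KEY, email VARCHAR(255))")
conn.executemany("INSERT INTO Person VALUES (?, ?)",
                 [(1, "a@b.com"), (2, "c@d.com"), (3, "a@b.com")])

# COUNT(*) counts rows per group; since email is guaranteed non-NULL,
# it matches COUNT(email) in the earlier formulation.
duplicates = conn.execute("""
    SELECT email AS Email
    FROM Person
    GROUP BY email
    HAVING COUNT(*) > 1
""").fetchall()
print(duplicates)  # [('a@b.com',)]
```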
This SQL query groups the records in the Person table by email and then filters for groups having more than one record (i.e., duplicate emails) using the HAVING clause.
Time Complexity: O(n log n) in the worst case if the database implements GROUP BY by sorting; with hash-based aggregation it is O(n).
Space Complexity: O(n), used by the GROUP BY for intermediate storage.
| Approach | Time Complexity |
|---|---|
| Using GROUP BY and HAVING Clause | O(N), where N is the number of rows in the table: all rows are scanned once for grouping and counting. |
| Using Temporary Tables | O(N), due to grouping and counting operations over the entire table. |
| Using HashMap or Dictionary to Count Occurrences | O(n*m), where n is the number of emails and m is the average length of an email. |
| SQL Query to Find Duplicates | O(n log n) if the database sorts to group; O(n) with hash aggregation. |
| Approach | Time | Space | When to Use |
|---|---|---|---|
| GROUP BY + HAVING | O(n) | O(1) | Best SQL solution when aggregation is supported |
| Temporary Table Aggregation | O(n) | O(n) | When intermediate aggregated data needs reuse or debugging |
| HashMap / Dictionary Counting | O(n) | O(n) | When solving outside SQL using application code |
| SQL Self Join | O(n^2) | O(1) | Useful for understanding joins or when aggregation is restricted |