Table: Person
+-------------+---------+ | Column Name | Type | +-------------+---------+ | id | int | | email | varchar | +-------------+---------+ id is the primary key (column with unique values) for this table. Each row of this table contains an email. The emails will not contain uppercase letters.
Write a solution to report all the duplicate emails. Note that it's guaranteed that the email field is not NULL.
Return the result table in any order.
The result format is in the following example.
Example 1:
Input: Person table: +----+---------+ | id | email | +----+---------+ | 1 | a@b.com | | 2 | c@d.com | | 3 | a@b.com | +----+---------+ Output: +---------+ | Email | +---------+ | a@b.com | +---------+ Explanation: a@b.com is repeated two times.
The goal of #182 Duplicate Emails is to identify email addresses that appear more than once in a database table. Since the task involves analyzing records in a relational database, the solution typically relies on SQL aggregation and comparison techniques.
A common approach is to group rows by the email column and then count how many times each email appears. If an email appears more than once, it is considered a duplicate. SQL aggregation functions such as COUNT() combined with grouping logic allow you to filter only those entries that exceed a single occurrence.
Another valid strategy is using a self-join where the table is joined with itself to compare rows that share the same email but have different identifiers. This helps detect repeated email values without relying solely on aggregation.
Both methods efficiently scan the table and isolate duplicate entries. In most cases, the time complexity depends on scanning the table once and performing grouping or matching operations, which is typically O(n) for n records.
| Approach | Time Complexity | Space Complexity |
|---|---|---|
| Aggregation with GROUP BY and COUNT | O(n) | O(n) |
| Self Join Comparison | O(n log n) to O(n²) depending on indexing | O(1) to O(n) |
Frederik Müller
To identify duplicate emails, you can use the SQL GROUP BY clause, which groups rows that have the same values in specified columns into summary rows. Use the COUNT function to count the occurrences of each email, followed by the HAVING clause to filter the groups with more than one occurrence.
Time Complexity: O(N), where N is the number of rows in the table. The query needs to scan all rows to perform the grouping and counting.
Space Complexity: O(1), as it uses a fixed amount of space independent of the size of the input data, assuming a reasonable number of distinct emails.
1SELECT email FROM Person GROUP BY email HAVING COUNT(email) > 1;This SQL query selects the 'email' column from the Person table. The GROUP BY clause groups the rows by email, and HAVING COUNT(email) > 1 ensures that only groups with more than one occurrence are selected, effectively identifying duplicates.
Another approach to solve the problem of finding duplicate emails is by using a temporary table to store the counts of each email and then selecting from this table. This method might be helpful when performing complex operations in intermediate steps.
Time Complexity: O(N) due to grouping and counting operations over the entire table.
Space Complexity: Additional O(M) space for the temporary table, where M is the number of distinct emails.
1CREATE TEMPORARY TABLE email_counts AS SELECT email, COUNT(*) as cnt FROM Person GROUPThis approach involves using a hash map or dictionary to count the occurrences of each email in the table. Once we have the counts, we can identify which emails appear more than once and output them.
Time Complexity: O(n*m), where n is the number of emails and m is the average length of an email.
Space Complexity: O(n), for storing unique emails and their counts.
1emails
Here, we will utilize SQL to directly query the table and find duplicates by grouping the records based on the email and using a HAVING clause to filter out emails that appear more than once.
Time Complexity: O(n log n), as it involves sorting or grouping.
Space Complexity: O(n), used by the GROUP BY for intermediate storage.
1SELECT email FROM Person GROUP BY email HAVING COUNT(id) > 1;This SQL query groups the records in the Person table by email and then filters for groups having more than one record (i.e., duplicate emails) using the HAVING clause.
Watch expert explanations and walkthroughs
Practice problems asked by these companies to ace your technical interviews.
Explore More ProblemsJot down your thoughts, approach, and key learnings
Yes, variations of database deduplication problems are common in technical interviews. Companies often test your understanding of SQL grouping, joins, and data integrity checks.
Yes, it can also be solved using a self-join where the table is joined with itself on the email column while ensuring the row identifiers differ. This approach compares rows directly to find repeated email values.
The most efficient approach is using SQL aggregation with GROUP BY and COUNT. By grouping rows by the email field and filtering those with counts greater than one, you can quickly identify duplicate emails in the table.
This problem mainly uses SQL aggregation concepts such as GROUP BY and COUNT. These allow you to summarize rows by email and detect values that occur multiple times.
Firstly, create a temporary table, email_counts, to store each distinct email along with its count resulting from GROUP BY email. Then, select only those emails from this temporary table where the count, `cnt`, is greater than 1, indicating duplicates.
The Python solution involves using a dictionary to maintain a count of each email. We iterate over the emails, updating counts in the dictionary. Finally, we output emails whose count exceeds one.