#182 Duplicate Emails - Solution

Table: Person

+-------------+---------+
| Column Name | Type    |
+-------------+---------+
| id          | int     |
| email       | varchar |
+-------------+---------+
id is the primary key (column with unique values) for this table.
Each row of this table contains an email. The emails will not contain uppercase letters.

Write a solution to report all the duplicate emails. Note that it's guaranteed that the email field is not NULL.

Return the result table in any order.

The result format is in the following example.

Example 1:

Input: 
Person table:
+----+---------+
| id | email   |
+----+---------+
| 1  | a@b.com |
| 2  | c@d.com |
| 3  | a@b.com |
+----+---------+
Output: 
+---------+
| Email   |
+---------+
| a@b.com |
+---------+
Explanation: a@b.com is repeated two times.

The goal of #182 Duplicate Emails is to identify email addresses that appear more than once in a database table. Since the task involves analyzing records in a relational database, the solution typically relies on SQL aggregation and comparison techniques.

A common approach is to group rows by the email column and then count how many times each email appears. If an email appears more than once, it is considered a duplicate. SQL aggregation functions such as COUNT() combined with grouping logic allow you to filter only those entries that exceed a single occurrence.

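Another valid strategy is a self-join: join the table with itself and compare rows that share the same email but have different ids. Each repeated email is matched from both sides, and an email that occurs k times produces k(k-1) matching pairs, so the query needs DISTINCT to report each email only once. This detects repeated values without GROUP BY.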
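A minimal sketch of the self-join, run against SQLite through Python's built-in `sqlite3` module. SQLite stands in for the judge's database here, which is an assumption. The helper name `duplicate_emails_self_join` and the in-memory table setup are illustrative only.

```python
import sqlite3


def duplicate_emails_self_join(rows):
    """Return duplicate emails from (id, email) rows via a self-join (SQLite sketch)."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE Person (id INTEGER PRIMARY KEY, email TEXT NOT NULL)")
        conn.executemany("INSERT INTO Person (id, email) VALUES (?, ?)", rows)
        # Match every row with the other rows that share its email but not its id.
        # An email that occurs k times produces k*(k-1) matches, so DISTINCT
        # is needed to report it once.
        query = """
            SELECT DISTINCT p1.email
            FROM Person AS p1
            JOIN Person AS p2
              ON p1.email = p2.email
             AND p1.id <> p2.id
        """
        return [email for (email,) in conn.execute(query)]
    finally:
        conn.close()


print(duplicate_emails_self_join([(1, "a@b.com"), (2, "c@d.com"), (3, "a@b.com")]))
# prints ['a@b.com']
```

Using `p1.id < p2.id` instead of `<>` would halve the number of matched pairs, but DISTINCT is still required once an email occurs three or more times.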

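Both methods scan the whole table and isolate the duplicate entries, but their costs differ. With hash-based grouping, the aggregation approach reads each of the n records once, which is O(n) expected time. The cost of the self-join depends on how the database matches rows, for example whether an index on email is available, as summarized below.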

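+-------------------------------------+-------------------------------------------+------------------+
| Approach                            | Time Complexity                           | Space Complexity |
+-------------------------------------+-------------------------------------------+------------------+
| Aggregation with GROUP BY and COUNT | O(n)                                      | O(n)             |
| Self Join Comparison                | O(n log n) to O(n²) depending on indexing | O(1) to O(n)     |
+-------------------------------------+-------------------------------------------+------------------+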


Using Group By and Having Clause

To identify duplicate emails, you can use the SQL GROUP BY clause, which groups rows that have the same values in specified columns into summary rows. Use the COUNT function to count the occurrences of each email, followed by the HAVING clause to filter the groups with more than one occurrence.

Time Complexity: O(N), where N is the number of rows in the table. The query needs to scan all rows to perform the grouping and counting.
Space Complexity: O(M), where M is the number of distinct emails, since the database keeps one group and its running count per distinct email while aggregating.

SELECT email FROM Person GROUP BY email HAVING COUNT(email) > 1;

Explanation

This SQL query selects the 'email' column from the Person table. The GROUP BY clause groups the rows by email, and HAVING COUNT(email) > 1 ensures that only groups with more than one occurrence are selected, effectively identifying duplicates.
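To check the query against Example 1, the sketch below loads the sample rows into an in-memory SQLite database with Python's `sqlite3` module and runs the same statement. Using SQLite instead of the judge's database is an assumption, and the helper `duplicate_emails_group_by` is hypothetical.

```python
import sqlite3

# Example 1 from the problem statement, as (id, email) rows.
EXAMPLE_ROWS = [(1, "a@b.com"), (2, "c@d.com"), (3, "a@b.com")]


def duplicate_emails_group_by(rows):
    """Run the GROUP BY / HAVING solution against an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE Person (id INTEGER PRIMARY KEY, email TEXT NOT NULL)")
        conn.executemany("INSERT INTO Person (id, email) VALUES (?, ?)", rows)
        # One group per distinct email; HAVING keeps groups with more than one row.
        query = "SELECT email FROM Person GROUP BY email HAVING COUNT(email) > 1"
        return [email for (email,) in conn.execute(query)]
    finally:
        conn.close()


print(duplicate_emails_group_by(EXAMPLE_ROWS))  # prints ['a@b.com']
```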

Using Temporary Tables

Another way to find duplicate emails is to store the count of each email in a temporary table and then select from that table. This is mainly useful when the intermediate counts are needed by several later queries or processing steps.

Time Complexity: O(N) due to grouping and counting operations over the entire table.
Space Complexity: Additional O(M) space for the temporary table, where M is the number of distinct emails.

CREATE TEMPORARY TABLE email_counts AS SELECT email, COUNT(*) AS cnt FROM Person GROUP BY email;
SELECT email FROM email_counts WHERE cnt > 1;
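The two-step temporary-table approach can be exercised the same way. This sketch assumes SQLite via Python's `sqlite3` module (SQLite accepts `CREATE TEMPORARY TABLE ... AS SELECT`); the helper `duplicate_emails_temp_table` is a hypothetical name, while `email_counts` and `cnt` follow the statement above.

```python
import sqlite3


def duplicate_emails_temp_table(rows):
    """Two-step solution: materialize per-email counts, then filter them (SQLite sketch)."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE Person (id INTEGER PRIMARY KEY, email TEXT NOT NULL)")
        conn.executemany("INSERT INTO Person (id, email) VALUES (?, ?)", rows)
        # Step 1: one row per distinct email with its number of occurrences.
        conn.execute(
            "CREATE TEMPORARY TABLE email_counts AS "
            "SELECT email, COUNT(*) AS cnt FROM Person GROUP BY email"
        )
        # Step 2: read the duplicates back from the intermediate table.
        query = "SELECT email FROM email_counts WHERE cnt > 1"
        return [email for (email,) in conn.execute(query)]
    finally:
        conn.close()


print(duplicate_emails_temp_table([(1, "a@b.com"), (2, "c@d.com"), (3, "a@b.com")]))
# prints ['a@b.com']
```

The temporary table holds one row per distinct email, which is where the extra O(M) space comes from.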

Using HashMap or Dictionary to Count Occurrences

This approach involves using a hash map or dictionary to count the occurrences of each email in the table. Once we have the counts, we can identify which emails appear more than once and output them.

Time Complexity: O(n*m), where n is the number of emails and m is the average length of an email.
Space Complexity: O(n), for storing unique emails and their counts.
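A minimal Python sketch of the counting step, assuming the email values have already been read from Person into a list. The helper name `duplicate_emails_hash_map` is hypothetical. A plain dict plays the role of the hash map, and because Python dicts preserve insertion order, duplicates come back in first-seen order.

```python
def duplicate_emails_hash_map(emails):
    """Return every email that occurs more than once, in first-seen order."""
    counts = {}  # hash map: email -> number of occurrences
    for email in emails:
        counts[email] = counts.get(email, 0) + 1
    # Keep only the keys whose count exceeds one.
    return [email for email, n in counts.items() if n > 1]


print(duplicate_emails_hash_map(["a@b.com", "c@d.com", "a@b.com"]))  # prints ['a@b.com']
```

Building the map takes one pass over the n emails, and hashing an email of length m costs O(m), which matches the O(n*m) bound above.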


SQL Query to Find Duplicates

Here, we query the table directly with SQL: group the records by email and use a HAVING clause to keep only the emails that appear more than once.

Time Complexity: O(n log n) if the database implements GROUP BY by sorting; with hash-based grouping the expected cost is O(n).
Space Complexity: O(n), used by the GROUP BY for intermediate storage.

SELECT email FROM Person GROUP BY email HAVING COUNT(id) > 1;

Explanation

This SQL query groups the records in the Person table by email and then filters for groups having more than one record (i.e., duplicate emails) using the HAVING clause.
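The two GROUP BY solutions differ only in the argument to COUNT. COUNT(column) skips NULL values, but id is the primary key and email is guaranteed non-NULL, so COUNT(id), COUNT(email), and COUNT(*) count the same rows in every group. The sketch below checks this on a small made-up table in SQLite via Python's `sqlite3` module; both the SQLite backend and the helper name `duplicates_by_count_expr` are assumptions for illustration.

```python
import sqlite3

# Small illustrative table (made-up data): two duplicated emails, one unique.
ROWS = [(1, "a@b.com"), (2, "c@d.com"), (3, "a@b.com"), (4, "c@d.com"), (5, "e@f.com")]


def duplicates_by_count_expr(rows, count_expr):
    """Run the GROUP BY / HAVING query with COUNT(<count_expr>)."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE Person (id INTEGER PRIMARY KEY, email TEXT NOT NULL)")
        conn.executemany("INSERT INTO Person (id, email) VALUES (?, ?)", rows)
        # count_expr comes only from the fixed tuple below, never from user input.
        query = (
            "SELECT email FROM Person GROUP BY email "
            f"HAVING COUNT({count_expr}) > 1 ORDER BY email"
        )
        return [email for (email,) in conn.execute(query)]
    finally:
        conn.close()


results = {expr: duplicates_by_count_expr(ROWS, expr) for expr in ("id", "email", "*")}
print(results)  # all three lists are ['a@b.com', 'c@d.com']
```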