Table: Person
+-------------+---------+
| Column Name | Type    |
+-------------+---------+
| id          | int     |
| email       | varchar |
+-------------+---------+

id is the primary key (column with unique values) for this table. Each row of this table contains an email. The emails will not contain uppercase letters.
Write a solution to report all the duplicate emails. Note that it's guaranteed that the email field is not NULL.
Return the result table in any order.
The result format is in the following example.
Example 1:
Input:
Person table:
+----+---------+
| id | email   |
+----+---------+
| 1  | a@b.com |
| 2  | c@d.com |
| 3  | a@b.com |
+----+---------+
Output:
+---------+
| Email   |
+---------+
| a@b.com |
+---------+
Explanation: a@b.com is repeated two times.
Problem Overview: The table Person contains email addresses. Your task is to return all emails that appear more than once. Each duplicate email should be listed only once in the result.
Approach 1: Using GROUP BY and HAVING Clause (O(n) time, O(1) extra space)
This is the most direct SQL solution. Group all rows by the email column using GROUP BY so the database aggregates rows with the same email together, then apply HAVING COUNT(email) > 1 to keep only groups containing more than one row. The database scans the table once and aggregates as it goes, so the time complexity is O(n). This approach is concise and relies entirely on standard SQL aggregation.
Approach 2: Using Temporary Tables (O(n) time, O(n) space)
Create a temporary table that stores each email and the number of times it appears. First insert aggregated results using GROUP BY. Then query the temporary table to return emails where the count is greater than one. This approach separates aggregation and filtering into two steps, which can help when debugging complex queries or when intermediate results need to be reused. The database still scans the table once, giving O(n) time, while the temporary table introduces O(n) storage overhead.
Approach 3: Using HashMap or Dictionary to Count Occurrences (O(n) time, O(n) space)
If the data is processed in application code instead of SQL, use a hash map (dictionary) to count occurrences of each email. Iterate through all records and increment a counter for each email. After the pass, iterate through the map and return keys whose frequency is greater than one. Hash lookups run in constant time, so the entire operation remains O(n). This method relies on hashing concepts similar to those used in hash map problems.
Approach 4: SQL Self Join to Find Duplicates (O(n^2) worst-case time, O(1) space)
Join the Person table with itself on matching email values while ensuring the row IDs are different. When two different rows share the same email, the join produces a match. Use DISTINCT to avoid repeating the same email multiple times. This approach works without aggregation but can degrade toward O(n^2) comparisons depending on indexing and dataset size. It is mainly useful for understanding relational joins within database queries.
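The self join described above can be sketched as follows; this is a minimal demonstration using Python's built-in sqlite3 module with an in-memory database, and the sample rows are the illustrative data from the example, not part of any real schema:

```python
import sqlite3

# In-memory database seeded with the Person table from the example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Person (id INT PRIMARY KEY, email VARCHAR(255))")
conn.executemany("INSERT INTO Person VALUES (?, ?)",
                 [(1, "a@b.com"), (2, "c@d.com"), (3, "a@b.com")])

# Self join: two different rows (p1.id <> p2.id) sharing an email mean
# that email is duplicated; DISTINCT collapses the repeated matches.
rows = conn.execute("""
    SELECT DISTINCT p1.email AS Email
    FROM Person p1
    JOIN Person p2 ON p1.email = p2.email AND p1.id <> p2.id
""").fetchall()
print(rows)  # [('a@b.com',)]
```

Without an index on email, the database may compare every pair of rows, which is where the O(n^2) worst case comes from.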
Recommended for interviews: The GROUP BY + HAVING solution is what interviewers typically expect for SQL-based problems. It shows you understand aggregation and filtering in relational databases. Discussing the hash map approach demonstrates that you can translate the same logic to application code, which often comes up in system design or backend discussions.
To identify duplicate emails, you can use the SQL GROUP BY clause, which groups rows that have the same values in specified columns into summary rows. Use the COUNT function to count the occurrences of each email, followed by the HAVING clause to filter the groups with more than one occurrence.
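The GROUP BY + HAVING query can be sketched as below; the query itself follows the problem's schema, while the surrounding sqlite3 setup and sample rows are assumptions for a runnable illustration:

```python
import sqlite3

# Sample Person table seeded with the example data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Person (id INT PRIMARY KEY, email VARCHAR(255))")
conn.executemany("INSERT INTO Person VALUES (?, ?)",
                 [(1, "a@b.com"), (2, "c@d.com"), (3, "a@b.com")])

# Group rows by email; HAVING keeps only groups with more than one row.
duplicates = conn.execute("""
    SELECT email AS Email
    FROM Person
    GROUP BY email
    HAVING COUNT(email) > 1
""").fetchall()
print(duplicates)  # [('a@b.com',)]
```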
This SQL query selects the 'email' column from the Person table. The GROUP BY clause groups the rows by email, and HAVING COUNT(email) > 1 ensures that only groups with more than one occurrence are selected, effectively identifying duplicates.
Time Complexity: O(N), where N is the number of rows in the table. The query needs to scan all rows to perform the grouping and counting.
Space Complexity: O(1), as it uses a fixed amount of space independent of the size of the input data, assuming a reasonable number of distinct emails.
Another approach is to use a temporary table that stores the count of each email, and then select from that table. Separating the two steps can help when intermediate results are needed for more complex operations.
First, create a temporary table, email_counts, that stores each distinct email along with its count produced by GROUP BY email. Then select from this temporary table only those emails whose count, `cnt`, is greater than 1, indicating duplicates.
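The two steps can be sketched like this; the `email_counts` temporary table matches the description above, and the sqlite3 harness with sample rows is an assumption for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Person (id INT PRIMARY KEY, email VARCHAR(255))")
conn.executemany("INSERT INTO Person VALUES (?, ?)",
                 [(1, "a@b.com"), (2, "c@d.com"), (3, "a@b.com")])

# Step 1: aggregate each email and its count into a temporary table.
conn.execute("""
    CREATE TEMP TABLE email_counts AS
    SELECT email, COUNT(email) AS cnt
    FROM Person
    GROUP BY email
""")

# Step 2: filter the aggregated counts for duplicates.
duplicates = conn.execute(
    "SELECT email AS Email FROM email_counts WHERE cnt > 1"
).fetchall()
print(duplicates)  # [('a@b.com',)]
```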
Time Complexity: O(N) due to grouping and counting operations over the entire table.
Space Complexity: Additional O(M) space for the temporary table, where M is the number of distinct emails.
This approach involves using a hash map or dictionary to count the occurrences of each email in the table. Once we have the counts, we can identify which emails appear more than once and output them.
The C implementation uses a struct to keep track of unique emails and their counts. We iterate through the list of emails and populate our structure, incrementing the count for duplicates. Finally, we print emails with counts greater than one.
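The same counting logic can be sketched in Python, with collections.Counter standing in for the C struct; the email list below is the example data, used here as an assumed input:

```python
from collections import Counter

# Example input: the emails column from the Person table.
emails = ["a@b.com", "c@d.com", "a@b.com"]

# One pass builds the frequency map; hash lookups are O(1) on average.
counts = Counter(emails)

# Emails whose count exceeds one are duplicates.
duplicates = [email for email, cnt in counts.items() if cnt > 1]
print(duplicates)  # ['a@b.com']
```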
Time Complexity: O(n*m), where n is the number of emails and m is the average length of an email.
Space Complexity: O(n), for storing unique emails and their counts.
Here, we will utilize SQL to directly query the table and find duplicates by grouping the records based on the email and using a HAVING clause to filter out emails that appear more than once.
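This grouping-and-filtering query can be sketched as follows; COUNT(*) is used here, which is equivalent to COUNT(email) for this table since email is guaranteed non-NULL, and the sqlite3 setup with sample rows is an assumption for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Person (id INT PRIMARY KEY, email VARCHAR(255))")
conn.executemany("INSERT INTO Person VALUES (?, ?)",
                 [(1, "a@b.com"), (2, "c@d.com"), (3, "a@b.com")])

# COUNT(*) counts rows per group; since email is guaranteed non-NULL,
# it matches COUNT(email) in the earlier formulation.
duplicates = conn.execute("""
    SELECT email AS Email
    FROM Person
    GROUP BY email
    HAVING COUNT(*) > 1
""").fetchall()
print(duplicates)  # [('a@b.com',)]
```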
This SQL query groups the records in the Person table by email and then filters for groups having more than one record (i.e., duplicate emails) using the HAVING clause.
Time Complexity: O(n log n) in the worst case if the database implements GROUP BY by sorting; with hash-based aggregation it is O(n).
Space Complexity: O(n), used by the GROUP BY for intermediate storage.
| Approach | Time Complexity |
|---|---|
| Using GROUP BY and HAVING Clause | O(N), where N is the number of rows in the table: all rows are scanned once for grouping and counting. |
| Using Temporary Tables | O(N), due to grouping and counting operations over the entire table. |
| Using HashMap or Dictionary to Count Occurrences | O(n*m), where n is the number of emails and m is the average length of an email. |
| SQL Query to Find Duplicates | O(n log n) if the database sorts to group; O(n) with hash aggregation. |
| Approach | Time | Space | When to Use |
|---|---|---|---|
| GROUP BY + HAVING | O(n) | O(1) | Best SQL solution when aggregation is supported |
| Temporary Table Aggregation | O(n) | O(n) | When intermediate aggregated data needs reuse or debugging |
| HashMap / Dictionary Counting | O(n) | O(n) | When solving outside SQL using application code |
| SQL Self Join | O(n^2) | O(1) | Useful for understanding joins or when aggregation is restricted |