#196 Delete Duplicate Emails - Solution

Table: Person

+-------------+---------+
| Column Name | Type    |
+-------------+---------+
| id          | int     |
| email       | varchar |
+-------------+---------+
id is the primary key (column with unique values) for this table.
Each row of this table contains an email. The emails will not contain uppercase letters.

Write a solution to delete all duplicate emails, keeping only one unique email with the smallest id.

For SQL users, please note that you are supposed to write a DELETE statement and not a SELECT one.

For Pandas users, please note that you are supposed to modify Person in place.

After running your script, the answer shown is the Person table. The driver will first compile and run your piece of code and then show the Person table. The final order of the Person table does not matter.

The result format is in the following example.

Example 1:

Input: 
Person table:
+----+------------------+
| id | email            |
+----+------------------+
| 1  | john@example.com |
| 2  | bob@example.com  |
| 3  | john@example.com |
+----+------------------+
Output: 
+----+------------------+
| id | email            |
+----+------------------+
| 1  | john@example.com |
| 2  | bob@example.com  |
+----+------------------+
Explanation: john@example.com is repeated two times. We keep the row with the smallest Id = 1.

The goal of #196 Delete Duplicate Emails is to remove duplicate email records from a table while keeping only one instance of each email. Typically, each row contains an id and an email, and the smallest id should be preserved for each unique email.

A common strategy is to use SQL operations that identify duplicates based on the email column and then delete rows whose id is not the minimum within that group. This can be achieved using techniques such as self-joins, subqueries, or GROUP BY with aggregation. The key idea is to compare rows with the same email and retain the one with the smallest identifier.

Efficient indexing on the email column can significantly improve performance when scanning and grouping records. Most database engines can execute this logic in linear or near-linear time depending on indexing and query planning.

Approach	Time Complexity	Space Complexity
Self Join to Identify Duplicate Emails	O(n)	O(1)
Subquery with MIN(id) per Email	O(n log n)	O(1)
GROUP BY with Aggregation	O(n log n)	O(1)

Solutions (9)

SQL Using EXISTS Clause

This approach involves using the SQL EXISTS clause to handle the duplicates. By defining a subquery that checks for existence of another row with the same email but a smaller id, we can delete the current row if such a row exists.

Time Complexity: O(n^2) due to the nested subquery, where n is the number of rows.
Space Complexity: O(1) as we are performing operations in place.

1DELETE p1 FROM Person p1 WHERE EXISTS (SELECT 1 FROM Person p2 WHERE p1.email = p2.email AND p1.id > p2.id);

Explanation

This SQL query deletes rows from the Person table where there exists another row within the same table that has the same email but a smaller id. Thus, it ensures that only one unique email with the smallest id is retained.

Using SQL with Self-Join

This approach uses a self-join to create joins within the same table. We join the table on the email field itself and use the condition to keep the entry with the smallest id.

Time Complexity: O(n) primarily, because the operation groups by email and selects min in linear time, although the exact operations may vary by SQL engine.
Space Complexity: O(n) due to the creation of the temporary table holding min ids.

1DELETE FROM Person WHERE id NOT IN (SELECT min_id FROM (SELECT MIN(id)

Approach 1: Using a Hash Table (Dictionary)

This approach involves using a hash table (or dictionary) to store the smallest ID for each unique email. As you iterate over each email entry, check if the email already exists in the hash table. If it does, compare IDs and update the hash table with the smaller ID if necessary. After processing all entries, use the hash table data to filter out duplicates by retaining entries with only the smallest IDs.

Time Complexity: O(n log n) due to the sort operation.
Space Complexity: O(1) as sorting is done in place.

C C++Java Python C#JavaScript

Approach 2: SQL-Based Duplicate Removal

This approach is specifically designed for SQL contexts. The strategy here is to delete entries from a database such that for each email, only the entry with the smallest ID is retained. This involves using SQL commands to isolate entries that have duplicate emails (but higher IDs than the minimum for each specific email) and then deleting them.

Time Complexity: Generally O(n log n) for the grouping operation.
Space Complexity: O(n) with respect to holding email and associated data in memory during sub-query execution.

1DELETE FROM Person
2WHERE id NOT IN (
3  SELECT

196. Delete Duplicate Emails

Problem Statement

Approach

Complexity

Video Solution Available

Solutions (9)

SQL Using EXISTS Clause

Explanation

Using SQL with Self-Join

Approach 1: Using a Hash Table (Dictionary)

Approach 2: SQL-Based Duplicate Removal

Video Solutions

LeetCode 196: Delete Duplicate Emails [SQL]

LeetCode 182: Duplicate Emails [3 Solutions]

Delete Duplicate Emails | Leetcode 196 | Week 4 | Qs 22

Delete Duplicate Emails | Leetcode 196 | Crack SQL Interviews in 50 Qs #mysql #leetcode

Leetcode 196 - Delete Duplicate Emails | SQL CTE Interview Question

Leetcode 196 - Delete Duplicate Emails - Python Solution | FAANG Interviews

Delete Duplicate Emails - SQL - MySQL - LeetCode

LeetCode 196 - Delete Duplicate Emails (Python and SQL) [Easy]

196. Delete Duplicate Emails|197. Rising Temperature| Leetcode SQL Easy

Leetcode #196 | Delete Duplicate Emails

Asked By Companies

Prepare for Interviews

Notes

Personal Notes

Similar Problems

Related Topics

Problem Stats

Practice on LeetCode

Frequently Asked Questions

Is Delete Duplicate Emails asked in FAANG interviews?

What is the optimal approach for Delete Duplicate Emails?

What SQL concept is important for solving Delete Duplicate Emails?

What data structure or database feature helps in Delete Duplicate Emails?

Explanation

Explanation

Explanation