Unique Email Groups - Solution & Explanation

Medium | Premium | Free on FleetCode | Array, Hash Table, String | 9 min read

Problem Statement

You are given an array of strings emails, where each string is a valid email address.

Two email addresses belong to the same group if both their normalized local names and normalized domain names are identical.

The normalization rules are as follows:

The local name is the part before the '@' symbol.
- Ignore all dots '.'.
- Ignore everything after the first '+', if present.
- Convert to lowercase.
The domain name is the part after the '@' symbol.
- Convert to lowercase.

Return an integer denoting the number of unique email groups after normalization.

Example 1:

Input: emails = ["test.email+alex@leetcode.com", "test.e.mail+bob.cathy@leetcode.com", "testemail+david@lee.tcode.com"]

Output: 2

Explanation:

Email	Local	Normalized Local	Domain	Normalized Domain	Final Email
test.email+alex@leetcode.com	test.email+alex	testemail	leetcode.com	leetcode.com	testemail@leetcode.com
test.e.mail+bob.cathy@leetcode.com	test.e.mail+bob.cathy	testemail	leetcode.com	leetcode.com	testemail@leetcode.com
testemail+david@lee.tcode.com	testemail+david	testemail	lee.tcode.com	lee.tcode.com	testemail@lee.tcode.com

Unique emails are ["testemail@leetcode.com", "testemail@lee.tcode.com"]. Thus, the answer is 2.

Example 2:

Input: emails = ["A@B.com", "a@b.com", "ab+xy@b.com", "a.b@b.com"]

Output: 2

Explanation:

Email	Local	Normalized Local	Domain	Normalized Domain	Final Email
A@B.com	A	a	B.com	b.com	a@b.com
a@b.com	a	a	b.com	b.com	a@b.com
ab+xy@b.com	ab+xy	ab	b.com	b.com	ab@b.com
a.b@b.com	a.b	ab	b.com	b.com	ab@b.com

Unique emails are ["a@b.com", "ab@b.com"]. Thus, the answer is 2.

Example 3:

Input: emails = ["a.b+c.d+e@DoMain.com", "ab+xyz@domain.com", "ab@domain.com"]

Output: 1

Explanation:

Email	Local	Normalized Local	Domain	Normalized Domain	Final Email
a.b+c.d+e@DoMain.com	a.b+c.d+e	ab	DoMain.com	domain.com	ab@domain.com
ab+xyz@domain.com	ab+xyz	ab	domain.com	domain.com	ab@domain.com
ab@domain.com	ab	ab	domain.com	domain.com	ab@domain.com

All emails normalize to "ab@domain.com". Thus, the answer is 1.

Constraints:

1 <= emails.length <= 1000
1 <= emails[i].length <= 100
emails[i] consists of lowercase and uppercase English letters, digits, and the characters '.', '+', and '@'.
Each emails[i] contains exactly one '@' character.
All local and domain names are non-empty; local names do not start with '+'.
Domain names end with the ".com" suffix and contain at least one character before ".com".

Approach Overview

Problem Overview: You receive a list of email addresses and must determine how many unique groups exist after applying the normalization rules. In the local part, dots '.' are ignored, everything from the first '+' onward is discarded, and letters are lowercased; the domain is only lowercased. Different raw strings can represent the same effective email, so the goal is to normalize each address and count the distinct results.

Approach 1: Brute Force Normalization with List Comparison (O(n² · m) time, O(n · m) space)

Process each email and convert it into its normalized form using basic string operations. In the local part, drop everything from the first '+' onward, remove the dots, and lowercase what remains; then append '@' and the lowercased domain. Instead of using a hash structure, store the normalized results in a list and check each new email against all previous entries to see if it already exists. Each comparison may scan the whole string, giving O(m) cost per comparison, with up to n comparisons per email. This approach works for small inputs but becomes inefficient as the list grows.
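
The list-based baseline described above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation; the names normalize and count_groups_brute_force are hypothetical and chosen only for this example.

```python
from typing import List


def normalize(email: str) -> str:
    """Apply the problem's normalization rules to a single address."""
    local, domain = email.split("@")        # exactly one '@' is guaranteed
    local = local.split("+", 1)[0]          # drop the first '+' and everything after it
    local = local.replace(".", "").lower()  # ignore dots, then lowercase
    return local + "@" + domain.lower()


def count_groups_brute_force(emails: List[str]) -> int:
    """Count distinct normalized emails with a plain list (no hashing)."""
    seen: List[str] = []
    for email in emails:
        canonical = normalize(email)
        # Linear scan over earlier results; each == may compare up to m characters.
        if not any(canonical == prev for prev in seen):
            seen.append(canonical)
    return len(seen)


# Example 2 from the problem statement
print(count_groups_brute_force(["A@B.com", "a@b.com", "ab+xy@b.com", "a.b@b.com"]))  # 2
```

The inner scan over seen is what produces the quadratic factor: the k-th email may be compared against up to k - 1 stored strings.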

Approach 2: Hash Table for Unique Normalized Emails (O(n · m) time, O(n · m) space)

The efficient solution uses a hash table (or hash set) to track normalized email addresses. Iterate through the array of emails once. For each email, split it into local and domain parts at the '@' separator. Scan the local part character by character: skip '.', stop at the first '+', and keep every other character in lowercase. Reconstruct the normalized email as processedLocal + "@" + lowercased domain. Insert the result into a hash set; duplicates are ignored automatically, and each insertion takes constant time on average once the string has been hashed.

The key insight is that normalization guarantees that all equivalent emails map to the same canonical string. A hash set makes duplicate detection constant time on average, reducing the overall complexity to O(n · m), where n is the number of emails and m is the average length of an email string.
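
A minimal Python sketch of this character-by-character scan is shown below. The function name count_groups_hash_set is an assumption made for illustration; the logic follows the steps described in this approach.

```python
from typing import List, Set


def count_groups_hash_set(emails: List[str]) -> int:
    """Count unique email groups using a hash set of canonical strings."""
    seen: Set[str] = set()
    for email in emails:
        at = email.index("@")              # exactly one '@' is guaranteed
        local_chars = []
        for ch in email[:at]:
            if ch == "+":                  # ignore the '+' and everything after it
                break
            if ch != ".":                  # skip dots in the local part
                local_chars.append(ch.lower())
        domain = email[at + 1:].lower()    # the domain is only lowercased
        seen.add("".join(local_chars) + "@" + domain)
    return len(seen)


# Example 1 from the problem statement
print(count_groups_hash_set([
    "test.email+alex@leetcode.com",
    "test.e.mail+bob.cathy@leetcode.com",
    "testemail+david@lee.tcode.com",
]))  # 2
```

Collecting characters in a list and joining once avoids repeated string concatenation inside the loop.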

Recommended for interviews: The hash table approach is what interviewers expect. It demonstrates that you can combine string parsing with constant‑time hash lookups to deduplicate data efficiently. Mentioning the brute force approach first shows you understand the baseline solution, but implementing the hash set version proves you can optimize both time complexity and code clarity.

Solution

We can use a hash set st to store the normalized result of each email address. For each email address, we normalize it according to the problem requirements:

Split the email address into a local name and a domain name.
For the local name, remove all dots '.', and if a plus sign '+' exists, remove the plus sign and everything after it. Then convert the local name to lowercase.
For the domain name, convert it to lowercase.
Concatenate the normalized local name and domain name to obtain the normalized email address, and add it to the hash set st.

Finally, the number of elements in the hash set st is the number of unique email groups.

The time complexity is O(n · m), where n and m are the number of email addresses and the average length of each email address, respectively. The space complexity is O(n · m), reached in the worst case where all email addresses are distinct.

Code
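
Python

The following is one possible Python version of the solution described above, using built-in string methods rather than a manual scan. The function name unique_email_groups is hypothetical; the set is named st to match the description.

```python
from typing import List


def unique_email_groups(emails: List[str]) -> int:
    """Return the number of distinct emails after normalization."""
    st = set()
    for email in emails:
        local, domain = email.split("@")           # split into local and domain names
        local = local.partition("+")[0]            # keep only the text before the first '+'
        local = local.replace(".", "").lower()     # remove dots, convert to lowercase
        st.add(local + "@" + domain.lower())       # domain: lowercase only
    return len(st)


# Example 3 from the problem statement
print(unique_email_groups(["a.b+c.d+e@DoMain.com", "ab+xyz@domain.com", "ab@domain.com"]))  # 1
```

str.partition returns the whole string as its first element when no '+' is present, so addresses without a plus sign need no special case.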

Detailed Complexity Analysis

Approach	Time	Space	When to Use
Brute Force Normalization with List Comparison	O(n² · m)	O(n · m)	Useful for understanding the normalization process or when input size is very small.
Hash Table for Unique Normalized Emails	O(n · m)	O(n · m)	Best general solution. Efficient for large inputs and commonly expected in coding interviews.
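
To make the gap concrete, the two bounds can be evaluated at the largest sizes the constraints allow (n = 1000, m = 100). These are rough worst-case counts of character operations under the assumption that every comparison scans a full string, not measured running times:

```latex
\begin{aligned}
\text{Hash set:}\quad & n \cdot m = 1000 \cdot 100 = 10^{5} \\
\text{Brute force:}\quad & \frac{n(n-1)}{2} \cdot m = 499{,}500 \cdot 100 \approx 5 \times 10^{7}
\end{aligned}
```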

Frequently Asked Questions

Is Unique Email Groups easy or hard?

Unique Email Groups is generally classified as a medium problem because it combines string manipulation with hash-based deduplication. The logic is straightforward once you understand the normalization rules, but careful parsing is required to avoid edge‑case mistakes.

Unique Email Groups Python/Java solution

In both Python and Java, iterate through the email list, normalize the local part using string operations, and insert the result into a hash set. Python typically uses a set with string slicing, while Java uses a HashSet<String> with a StringBuilder to construct the normalized email.

How to solve Unique Email Groups in O(n)?

Treat each email as a string of length m and normalize it in a single pass. Split the email at '@', process the local part by skipping '.', stopping at '+', and lowercasing the kept characters, then lowercase the domain and rebuild the canonical email. Insert the normalized string into a hash set and return the set size. The algorithm performs O(n) hash operations, but each one handles a string of up to m characters, so the total running time is O(n · m), which is linear in the total input size.

What is the best approach for Unique Email Groups?

The hash table approach is the most efficient and widely expected solution. Normalize each email address by removing dots, ignoring everything from the first '+' onward, and lowercasing the local part, lowercase the domain, then insert the canonical form into a hash set. Hash lookups prevent duplicates and keep the overall complexity at O(n · m), where n is the number of emails and m is the average email length.

Is Unique Email Groups asked at Google/Amazon/Meta?

Email normalization and deduplication problems frequently appear in interviews at companies like Google, Amazon, and Meta. Variations often test string parsing, hashing, and data deduplication. The hash set solution demonstrates familiarity with practical data processing patterns.

What data structure is used in Unique Email Groups?

A hash set (hash table) is the primary data structure. It stores normalized email addresses and provides insertion and membership checks in constant time on average, apart from the O(m) cost of hashing each string. This makes it ideal for detecting duplicates while processing the list once.

What is the time complexity of Unique Email Groups?

The optimal solution runs in O(n · m) time. Each email is processed once, and normalization scans up to m characters. Hash set insertion and lookup are O(1) on average, so the total complexity scales linearly with the number of emails.

Problem Info

Difficulty: Medium

Acceptance: 87.3%

Approaches: 1

Reading time: 9 min
