You are given an array of strings emails, where each string is a valid email address.
Two email addresses belong to the same group if both their normalized local names and normalized domain names are identical.
The normalization rules are as follows:
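- The local name is the part before the '@' symbol.
  - Ignore all dots '.'.
  - Ignore everything after the first '+', if present.
  - Convert to lowercase.
- The domain name is the part after the '@' symbol.
  - Convert to lowercase.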
Return an integer denoting the number of unique email groups after normalization.
Example 1:
Input: emails = ["test.email+alex@leetcode.com", "test.e.mail+bob.cathy@leetcode.com", "testemail+david@lee.tcode.com"]
Output: 2
Explanation:
| Email | Local | Normalized Local | Domain | Normalized Domain | Final Email |
|---|---|---|---|---|---|
| test.email+alex@leetcode.com | test.email+alex | testemail | leetcode.com | leetcode.com | testemail@leetcode.com |
| test.e.mail+bob.cathy@leetcode.com | test.e.mail+bob.cathy | testemail | leetcode.com | leetcode.com | testemail@leetcode.com |
| testemail+david@lee.tcode.com | testemail+david | testemail | lee.tcode.com | lee.tcode.com | testemail@lee.tcode.com |
Unique emails are ["testemail@leetcode.com", "testemail@lee.tcode.com"]. Thus, the answer is 2.
Example 2:
Input: emails = ["A@B.com", "a@b.com", "ab+xy@b.com", "a.b@b.com"]
Output: 2
Explanation:
| Email | Local | Normalized Local | Domain | Normalized Domain | Final Email |
|---|---|---|---|---|---|
| A@B.com | A | a | B.com | b.com | a@b.com |
| a@b.com | a | a | b.com | b.com | a@b.com |
| ab+xy@b.com | ab+xy | ab | b.com | b.com | ab@b.com |
| a.b@b.com | a.b | ab | b.com | b.com | ab@b.com |
Unique emails are ["a@b.com", "ab@b.com"]. Thus, the answer is 2.
Example 3:
Input: emails = ["a.b+c.d+e@DoMain.com", "ab+xyz@domain.com", "ab@domain.com"]
Output: 1
Explanation:
| Email | Local | Normalized Local | Domain | Normalized Domain | Final Email |
|---|---|---|---|---|---|
| a.b+c.d+e@DoMain.com | a.b+c.d+e | ab | DoMain.com | domain.com | ab@domain.com |
| ab+xyz@domain.com | ab+xyz | ab | domain.com | domain.com | ab@domain.com |
| ab@domain.com | ab | ab | domain.com | domain.com | ab@domain.com |
All emails normalize to "ab@domain.com". Thus, the answer is 1.
Constraints:
- 1 <= emails.length <= 1000
- 1 <= emails[i].length <= 100
- emails[i] consists of lowercase and uppercase English letters, digits, and the characters '.', '+', and '@'.
- emails[i] contains exactly one '@' character.
- … '+'.
- Domain names end with the ".com" suffix and contain at least one character before ".com".

Problem Overview: You receive a list of email addresses and must determine how many unique groups exist after applying the normalization rules. In the local part, dots are ignored and everything from the first + onward is discarded; both the local part and the domain are then converted to lowercase. Different raw strings can therefore represent the same effective email, so the goal is to normalize each address and count the distinct results.
Approach 1: Brute Force Normalization with List Comparison (O(n² · m) time, O(n · m) space)
Process each email and convert it into its normalized form using basic string operations: split at @, discard everything from the first + onward in the local part, remove the dots, and lowercase both the local part and the domain. Instead of using a hash structure, store the normalized results in a list and check each new email against all previous entries to see whether it already exists. Each comparison may scan the whole string, giving O(m) cost per comparison, and there are up to n comparisons per email. This approach works for small inputs but becomes inefficient as the list grows.
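The brute force idea can be sketched as follows. This is a minimal illustration, not a reference solution; the names `normalize` and `count_groups_brute_force` are hypothetical, and the code assumes the constraint that every address contains exactly one @.

```python
def normalize(email: str) -> str:
    # Exactly one '@' is guaranteed by the constraints, so partition is safe.
    local, _, domain = email.partition("@")
    local = local.split("+", 1)[0]   # drop the first '+' and everything after it
    local = local.replace(".", "")   # ignore dots in the local name
    return local.lower() + "@" + domain.lower()


def count_groups_brute_force(emails: list[str]) -> int:
    seen: list[str] = []             # plain list, deliberately no hashing
    for email in emails:
        key = normalize(email)
        if key not in seen:          # linear scan over all earlier results
            seen.append(key)
    return len(seen)
```

The `key not in seen` check is the source of the quadratic factor: it compares the new key against every stored string in turn.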
Approach 2: Hash Table for Unique Normalized Emails (O(n · m) time, O(n · m) space)
The efficient solution uses a hash table (or hash set) to track normalized email addresses. Iterate through the array of emails once. For each email, split it into local and domain parts at the @ separator. Scan the local part character by character: skip ., stop at the first +, and keep every other character in lowercase. Reconstruct the normalized email as processedLocal + "@" + lowercased domain. Insert the result into a hash set; duplicates are ignored automatically, and each insertion takes constant time on average apart from the O(m) cost of hashing the string.
The key insight is that normalization guarantees that all equivalent emails map to the same canonical string. A hash set makes duplicate detection constant time on average, reducing the overall complexity to O(n · m), where n is the number of emails and m is the average length of an email string.
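A minimal sketch of the character-by-character scan with a hash set is shown below. The function name `count_unique_groups` is an assumption made for illustration, and the code relies on the constraint that each address contains exactly one @.

```python
def count_unique_groups(emails: list[str]) -> int:
    groups: set[str] = set()
    for email in emails:
        at = email.index("@")            # position of the single '@'
        kept = []
        for ch in email[:at]:            # scan the local part only
            if ch == "+":
                break                    # ignore '+' and everything after it
            if ch != ".":
                kept.append(ch.lower())  # skip dots, lowercase the rest
        domain = email[at + 1:].lower()  # the domain is only lowercased
        groups.add("".join(kept) + "@" + domain)
    return len(groups)
```

Collecting characters in a list and joining once avoids repeated string concatenation inside the loop.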
Recommended for interviews: The hash table approach is what interviewers expect. It demonstrates that you can combine string parsing with constant‑time hash lookups to deduplicate data efficiently. Mentioning the brute force approach first shows you understand the baseline solution, but implementing the hash set version proves you can optimize both time complexity and code clarity.
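We can use a hash set st to store the normalized result of each email address. For each email address, we normalize it according to the problem requirements:
- Split the address at @ into a local name and a domain name.
- For the local name, remove all dots ., and if a plus sign + exists, remove the plus sign and everything after it. Then convert the local name to lowercase.
- Convert the domain name to lowercase.
- Concatenate the normalized local name, @, and the normalized domain name, and add the result to st.

Finally, the number of elements in the hash set st is the number of unique email groups.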
The time complexity is O(n · m), where n and m are the number of email addresses and the average length of each email address, respectively. The space complexity is O(n · m) in the worst case, where all email addresses are distinct.
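A compact Python sketch of this procedure, using built-in string methods rather than a manual scan, might look like the following. The function name `unique_email_groups` is hypothetical; `st` follows the naming used in the description above.

```python
def unique_email_groups(emails: list[str]) -> int:
    st = set()
    for email in emails:
        # Exactly one '@' is guaranteed, so split yields two parts.
        local, domain = email.split("@")
        # Cut at the first '+', strip dots, then lowercase the local name.
        local = local.split("+")[0].replace(".", "").lower()
        st.add(local + "@" + domain.lower())
    return len(st)
```

For instance, `unique_email_groups(["A@B.com", "a@b.com", "ab+xy@b.com", "a.b@b.com"])` evaluates to 2, matching Example 2.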
| Approach | Time | Space | When to Use |
|---|---|---|---|
| Brute Force Normalization with List Comparison | O(n² · m) | O(n · m) | Useful for understanding the normalization process or when input size is very small. |
| Hash Table for Unique Normalized Emails | O(n · m) | O(n · m) | Best general solution. Efficient for large inputs and commonly expected in coding interviews. |