Set Mismatch - Solution & Explanation

Q: Is Set Mismatch easy or hard?

Set Mismatch is categorized as an Easy problem. The main challenge is recognizing that the array should contain numbers from 1 to n with exactly one duplicate and one missing value. Once that constraint is understood, frequency counting or simple math leads to an efficient solution.

Q: How to solve Set Mismatch in O(n)?

Iterate through the array once while tracking either frequencies or mathematical sums. A HashMap can detect the duplicate immediately and then identify the missing number during a second pass from 1..n. The math-based approach compares the expected sum of numbers 1..n with the actual array sum to derive both values.

Q: What is the best approach for Set Mismatch?

Two common approaches solve Set Mismatch in linear time. The HashMap frequency counting method is the most intuitive and easy to implement. The mathematical Sum and Difference method is more space-efficient because it runs in O(n) time with O(1) extra space.

Q: What data structure is used in Set Mismatch?

The most common data structure used is a HashMap or hash table to store frequency counts of numbers in the array. This allows O(1) average lookup time to detect duplicates and missing elements efficiently. Some optimized solutions avoid extra data structures by using mathematical properties instead.

Q: What is the time complexity of Set Mismatch?

Optimal solutions run in O(n) time where n is the size of the array. Each element is processed once either while building a frequency map or while computing sums and differences. Space complexity depends on the approach: O(n) for HashMap counting and O(1) for the mathematical method.

Q: Set Mismatch Python or Java solution approach?

Python and Java implementations typically follow the same logic. Either maintain a hash map (or dictionary) to track counts, or compute mathematical differences between the expected and actual sums of numbers. Both approaches achieve O(n) time complexity.

Q: Is Set Mismatch asked at Google, Amazon, or Meta?

Set Mismatch represents a common array and hash table pattern frequently used in technical interviews. Similar duplicate-detection problems appear in interviews at large tech companies including Amazon, Google, and Meta because they test array traversal, hashing, and reasoning about constraints.

EasyArray Hash Table Bit Manipulation Sorting24 min readAsked at: Amazon, Microsoft, Meta +3

Practice this problem

Problem Statement

You have a set of integers s, which originally contains all the numbers from 1 to n. Unfortunately, due to some error, one of the numbers in s got duplicated to another number in the set, which results in repetition of one number and loss of another number.

You are given an integer array nums representing the data status of this set after the error.

Find the number that occurs twice and the number that is missing and return them in the form of an array.

Example 1:

Input: nums = [1,2,2,4]
Output: [2,3]

Example 2:

Input: nums = [1,1]
Output: [1,2]

Constraints:

2 <= nums.length <= 10⁴
1 <= nums[i] <= 10⁴

Approach Overview

Problem Overview: You are given an array containing numbers from 1 to n. One number appears twice and another number is missing. The task is to identify the duplicated value and the missing value efficiently.

Approach 1: Sum and Difference Method (O(n) time, O(1) space)

This approach uses simple math instead of extra data structures. For a perfect sequence from 1..n, the expected sum is n(n+1)/2. When you iterate through the array and compute the actual sum, the difference between the expected and actual values reveals the relationship between the missing and duplicated numbers. You can also compute the sum of squares difference to isolate both values. The key insight is that the duplicate adds extra value while the missing number subtracts from the expected total. By solving the resulting equations, you recover both numbers in a single pass. This method runs in O(n) time and uses O(1) extra space, making it an efficient mathematical alternative when you want to avoid additional memory. It works well for problems involving arithmetic properties of an array.

Approach 2: Frequency Counting with HashMap (O(n) time, O(n) space)

This method relies on counting how many times each number appears. Iterate through the array and store frequencies in a HashMap. When a value appears twice, you immediately identify the duplicate. After building the frequency map, iterate from 1 to n and check which number does not appear in the map—that number is the missing value. The key idea is constant-time lookup using a hash table, which makes detection straightforward. The algorithm runs in O(n) time because each element is processed once and hash lookups are O(1) on average. The tradeoff is O(n) additional space for storing counts. This approach is often the most intuitive if you are comfortable with frequency counting patterns.

Recommended for interviews: The HashMap approach is usually the first solution candidates present because it directly models the problem using frequency counting. It clearly demonstrates understanding of hash tables and runs in optimal O(n) time. Strong candidates then optimize further using the mathematical sum-and-difference technique, which reduces space to O(1). Showing both approaches signals strong problem-solving depth: the HashMap version proves correctness quickly, while the math-based solution demonstrates deeper algorithmic insight with array properties.

Approach 1: Approach 1: Sum and Difference Method

This approach uses the mathematical properties of numbers from 1 to n to determine the duplicate and missing numbers. Calculate the expected sum of numbers from 1 to n. Then, iterate through the array to find the actual sum of numbers and the sum of squares. By using these properties and equations, derive the duplicate and missing numbers.

This Python function calculates the missing and duplicate numbers using the sum and difference method. It computes the total expected and actual sums, as well as the sums of squares. These computations help solving linear equations to find the exact missing and duplicate numbers.

Code

Python JavaScript

Python

JavaScript

Complexity

Time Complexity: O(n), as we iterate through the input array three times.
Space Complexity: O(1), since we use a constant amount of space for variables.

Try this approach in the editor →

Approach 2: Approach 2: Frequency Counting with HashMap

In this approach, utilize a hashmap to count the occurrences of each number. Iterate over the numbers from 1 to n and use the occurrence count to identify the duplicated and missing numbers.

This Java solution uses a HashMap to keep track of the count of each number. As we iterate over the range, we check the frequency of each number to determine which is duplicated and which is missing.

Code

Java C#

Java

Complexity

Time Complexity: O(n), where n is the length of nums, due to two iterations over the array.
Space Complexity: O(n), for storing the counts in a hashmap.

Try this approach in the editor →

Approach 3: Mathematics

We denote s_1 as the sum of all numbers from [1,..n], s_2 as the sum of the numbers in the array nums after removing duplicates, and s as the sum of the numbers in the array nums.

Then s - s_2 is the duplicate number, and s_1 - s_2 is the missing number.

The time complexity is O(n), and the space complexity is O(n), where n is the length of the array nums. Extra space is needed to store the array after de-duplication.

Code

Python Java C++Go TypeScript Rust

Python

Java

C++

TypeScript

Rust

Try this approach in the editor →

Approach 4: Hash Table

We can also use a more intuitive method, using a hash table cnt to count the occurrence of each number in the array nums.

Next, iterate through x \in [1, n], if cnt[x] = 2, then x is the duplicate number, if cnt[x] = 0, then x is the missing number.

The time complexity is O(n), and the space complexity is O(n), where n is the length of the array nums.

Code

Python Java C++Go TypeScript Rust

Python

Java

C++

TypeScript

Rust

Try this approach in the editor →

Approach 5: Bit Operation

According to the properties of the XOR operation, for integer x, we have x \oplus x = 0 and x \oplus 0 = x. Therefore, if we perform the XOR operation on all elements in the array nums and all numbers i \in [1, n], we can eliminate the numbers that appear twice, leaving only the XOR result of the missing number and the duplicate number, i.e., xs = a \oplus b.

Since these two numbers are not equal, there must be at least one bit in the XOR result that is 1. We find the lowest bit of 1 in the XOR result through the lowbit operation, and then divide all numbers in the array nums and all numbers i \in [1, n] into two groups according to whether this bit is 1. In this way, the two numbers are divided into different groups. The XOR result of one group of numbers is a, and the XOR result of the other group is b. These two numbers are the answers we are looking for.

Next, we only need to determine which of a and b is the duplicate number and which is the missing number. Therefore, iterate through the array nums, for the traversed number x, if x=a, then a is the duplicate number, return [a, b], otherwise, at the end of the iteration, return [b, a].

The time complexity is O(n), where n is the length of the array nums. The space complexity is O(1), only using a constant size of extra space.

Code

Python Java C++Go TypeScript Rust

Python

Java

C++

TypeScript

Rust

Try this approach in the editor →

Complexity Comparison

Approach	Complexity
Approach 1: Sum and Difference Method	Time Complexity: O(n), as we iterate through the input array three times. Space Complexity: O(1), since we use a constant amount of space for variables.
Approach 2: Frequency Counting with HashMap	Time Complexity: O(n), where n is the length of nums, due to two iterations over the array. Space Complexity: O(n), for storing the counts in a hashmap.
Mathematics	—
Hash Table	—
Bit Operation	—

Detailed Complexity Analysis

Approach	Time	Space	When to Use
Sum and Difference Method	O(n)	O(1)	When you want optimal space and can leverage arithmetic properties of numbers 1..n
Frequency Counting with HashMap	O(n)	O(n)	General case when clarity matters and extra memory is acceptable

Video Solution

Set Mismatch - Leetcode 645 - Python • NeetCodeIO • 15,608 views views

Watch 9 more video solutions →

Frequently Asked Questions

Is Set Mismatch easy or hard?

Set Mismatch is categorized as an Easy problem. The main challenge is recognizing that the array should contain numbers from 1 to n with exactly one duplicate and one missing value. Once that constraint is understood, frequency counting or simple math leads to an efficient solution.

How to solve Set Mismatch in O(n)?

Iterate through the array once while tracking either frequencies or mathematical sums. A HashMap can detect the duplicate immediately and then identify the missing number during a second pass from 1..n. The math-based approach compares the expected sum of numbers 1..n with the actual array sum to derive both values.

What is the best approach for Set Mismatch?

Two common approaches solve Set Mismatch in linear time. The HashMap frequency counting method is the most intuitive and easy to implement. The mathematical Sum and Difference method is more space-efficient because it runs in O(n) time with O(1) extra space.

What data structure is used in Set Mismatch?

The most common data structure used is a HashMap or hash table to store frequency counts of numbers in the array. This allows O(1) average lookup time to detect duplicates and missing elements efficiently. Some optimized solutions avoid extra data structures by using mathematical properties instead.

What is the time complexity of Set Mismatch?

Optimal solutions run in O(n) time where n is the size of the array. Each element is processed once either while building a frequency map or while computing sums and differences. Space complexity depends on the approach: O(n) for HashMap counting and O(1) for the mathematical method.

Set Mismatch Python or Java solution approach?

Python and Java implementations typically follow the same logic. Either maintain a hash map (or dictionary) to track counts, or compute mathematical differences between the expected and actual sums of numbers. Both approaches achieve O(n) time complexity.

Is Set Mismatch asked at Google, Amazon, or Meta?

Set Mismatch represents a common array and hash table pattern frequently used in technical interviews. Similar duplicate-detection problems appear in interviews at large tech companies including Amazon, Google, and Meta because they test array traversal, hashing, and reasoning about constraints.

Ready to solve this problem?

Practice Set Mismatch with our built-in code editor and test cases.

Practice on FleetCode

Find the Duplicate Number

Problem Info

DifficultyEasy

Acceptance44.0%

Approaches5

Reading time24 min

Asked at

Amazon Microsoft Meta Google GitHub

Practice this problem

Open in Editor

Problem Statement

Approach Overview

Approach 1: Approach 1: Sum and Difference Method

Code

Complexity

Approach 2: Approach 2: Frequency Counting with HashMap

Code

Complexity

Approach 3: Mathematics

Code

Approach 4: Hash Table

Code

Approach 5: Bit Operation

Code

Complexity Comparison

Detailed Complexity Analysis

Video Solution

Frequently Asked Questions

Related Problems

Ready to solve this problem?

Problem Info

Table of Contents

Problem Statement

Approach Overview

Approach 1: Approach 1: Sum and Difference Method

Code

Complexity

Approach 2: Approach 2: Frequency Counting with HashMap

Code

Complexity

Approach 3: Mathematics

Code

Approach 4: Hash Table

Code

Approach 5: Bit Operation

Code

Complexity Comparison

Detailed Complexity Analysis

Video Solution

Frequently Asked Questions

Related Problems

Ready to solve this problem?

Problem Info

Table of Contents