You have a set of integers s, which originally contains all the numbers from 1 to n. Unfortunately, due to some error, one of the numbers in s got duplicated to another number in the set, which results in repetition of one number and loss of another number.
You are given an integer array nums representing the data status of this set after the error.
Find the number that occurs twice and the number that is missing and return them in the form of an array.
Example 1:
Input: nums = [1,2,2,4] Output: [2,3]
Example 2:
Input: nums = [1,1] Output: [1,2]
Constraints:
2 <= nums.length <= 1041 <= nums[i] <= 104Problem Overview: You are given an array containing numbers from 1 to n. One number appears twice and another number is missing. The task is to identify the duplicated value and the missing value efficiently.
Approach 1: Sum and Difference Method (O(n) time, O(1) space)
This approach uses simple math instead of extra data structures. For a perfect sequence from 1..n, the expected sum is n(n+1)/2. When you iterate through the array and compute the actual sum, the difference between the expected and actual values reveals the relationship between the missing and duplicated numbers. You can also compute the sum of squares difference to isolate both values. The key insight is that the duplicate adds extra value while the missing number subtracts from the expected total. By solving the resulting equations, you recover both numbers in a single pass. This method runs in O(n) time and uses O(1) extra space, making it an efficient mathematical alternative when you want to avoid additional memory. It works well for problems involving arithmetic properties of an array.
Approach 2: Frequency Counting with HashMap (O(n) time, O(n) space)
This method relies on counting how many times each number appears. Iterate through the array and store frequencies in a HashMap. When a value appears twice, you immediately identify the duplicate. After building the frequency map, iterate from 1 to n and check which number does not appear in the map—that number is the missing value. The key idea is constant-time lookup using a hash table, which makes detection straightforward. The algorithm runs in O(n) time because each element is processed once and hash lookups are O(1) on average. The tradeoff is O(n) additional space for storing counts. This approach is often the most intuitive if you are comfortable with frequency counting patterns.
Recommended for interviews: The HashMap approach is usually the first solution candidates present because it directly models the problem using frequency counting. It clearly demonstrates understanding of hash tables and runs in optimal O(n) time. Strong candidates then optimize further using the mathematical sum-and-difference technique, which reduces space to O(1). Showing both approaches signals strong problem-solving depth: the HashMap version proves correctness quickly, while the math-based solution demonstrates deeper algorithmic insight with array properties.
This approach uses the mathematical properties of numbers from 1 to n to determine the duplicate and missing numbers. Calculate the expected sum of numbers from 1 to n. Then, iterate through the array to find the actual sum of numbers and the sum of squares. By using these properties and equations, derive the duplicate and missing numbers.
This Python function calculates the missing and duplicate numbers using the sum and difference method. It computes the total expected and actual sums, as well as the sums of squares. These computations help solving linear equations to find the exact missing and duplicate numbers.
Python
JavaScript
Time Complexity: O(n), as we iterate through the input array three times.
Space Complexity: O(1), since we use a constant amount of space for variables.
In this approach, utilize a hashmap to count the occurrences of each number. Iterate over the numbers from 1 to n and use the occurrence count to identify the duplicated and missing numbers.
This Java solution uses a HashMap to keep track of the count of each number. As we iterate over the range, we check the frequency of each number to determine which is duplicated and which is missing.
Time Complexity: O(n), where n is the length of nums, due to two iterations over the array.
Space Complexity: O(n), for storing the counts in a hashmap.
We denote s_1 as the sum of all numbers from [1,..n], s_2 as the sum of the numbers in the array nums after removing duplicates, and s as the sum of the numbers in the array nums.
Then s - s_2 is the duplicate number, and s_1 - s_2 is the missing number.
The time complexity is O(n), and the space complexity is O(n), where n is the length of the array nums. Extra space is needed to store the array after de-duplication.
We can also use a more intuitive method, using a hash table cnt to count the occurrence of each number in the array nums.
Next, iterate through x \in [1, n], if cnt[x] = 2, then x is the duplicate number, if cnt[x] = 0, then x is the missing number.
The time complexity is O(n), and the space complexity is O(n), where n is the length of the array nums.
According to the properties of the XOR operation, for integer x, we have x \oplus x = 0 and x \oplus 0 = x. Therefore, if we perform the XOR operation on all elements in the array nums and all numbers i \in [1, n], we can eliminate the numbers that appear twice, leaving only the XOR result of the missing number and the duplicate number, i.e., xs = a \oplus b.
Since these two numbers are not equal, there must be at least one bit in the XOR result that is 1. We find the lowest bit of 1 in the XOR result through the lowbit operation, and then divide all numbers in the array nums and all numbers i \in [1, n] into two groups according to whether this bit is 1. In this way, the two numbers are divided into different groups. The XOR result of one group of numbers is a, and the XOR result of the other group is b. These two numbers are the answers we are looking for.
Next, we only need to determine which of a and b is the duplicate number and which is the missing number. Therefore, iterate through the array nums, for the traversed number x, if x=a, then a is the duplicate number, return [a, b], otherwise, at the end of the iteration, return [b, a].
The time complexity is O(n), where n is the length of the array nums. The space complexity is O(1), only using a constant size of extra space.
| Approach | Complexity |
|---|---|
| Approach 1: Sum and Difference Method | Time Complexity: O(n), as we iterate through the input array three times. |
| Approach 2: Frequency Counting with HashMap | Time Complexity: O(n), where n is the length of nums, due to two iterations over the array. |
| Mathematics | — |
| Hash Table | — |
| Bit Operation | — |
| Approach | Time | Space | When to Use |
|---|---|---|---|
| Sum and Difference Method | O(n) | O(1) | When you want optimal space and can leverage arithmetic properties of numbers 1..n |
| Frequency Counting with HashMap | O(n) | O(n) | General case when clarity matters and extra memory is acceptable |
Set Mismatch - Leetcode 645 - Python • NeetCodeIO • 15,608 views views
Watch 9 more video solutions →Practice Set Mismatch with our built-in code editor and test cases.
Practice on FleetCode