The median is the middle value in an ordered integer list. If the size of the list is even, there is no middle value, and the median is the mean of the two middle values.
arr = [2,3,4], the median is 3.arr = [2,3], the median is (2 + 3) / 2 = 2.5.Implement the MedianFinder class:
MedianFinder() initializes the MedianFinder object.void addNum(int num) adds the integer num from the data stream to the data structure.double findMedian() returns the median of all elements so far. Answers within 10-5 of the actual answer will be accepted.Example 1:
Input ["MedianFinder", "addNum", "addNum", "findMedian", "addNum", "findMedian"] [[], [1], [2], [], [3], []] Output [null, null, null, 1.5, null, 2.0] Explanation MedianFinder medianFinder = new MedianFinder(); medianFinder.addNum(1); // arr = [1] medianFinder.addNum(2); // arr = [1, 2] medianFinder.findMedian(); // return 1.5 (i.e., (1 + 2) / 2) medianFinder.addNum(3); // arr[1, 2, 3] medianFinder.findMedian(); // return 2.0
Constraints:
-105 <= num <= 105findMedian.5 * 104 calls will be made to addNum and findMedian.Follow up:
[0, 100], how would you optimize your solution?99% of all integer numbers from the stream are in the range [0, 100], how would you optimize your solution?In #295 Find Median from Data Stream, numbers arrive one by one and you must efficiently return the median at any moment. A naive approach would store all values and sort them each time the median is requested, but this leads to poor performance for continuous streams.
The optimal strategy uses two heaps: a max-heap to store the smaller half of the numbers and a min-heap to store the larger half. This structure maintains balance such that the size difference between the heaps is at most one. When a new number arrives, it is inserted into the appropriate heap and the heaps are rebalanced if necessary.
The median can then be derived from the heap tops. If both heaps have equal size, the median is the average of maxHeap.top() and minHeap.top(). If one heap has an extra element, its top represents the median. This design ensures efficient insertion and quick median retrieval for a streaming dataset.
The insertion operation runs in O(log n) time due to heap adjustments, while retrieving the median takes O(1) time.
| Approach | Time Complexity | Space Complexity |
|---|---|---|
| Sorting after each insertion | O(n log n) per operation | O(n) |
| Two Heaps (Max-Heap + Min-Heap) | O(log n) insertion, O(1) median retrieval | O(n) |
NeetCode
The main idea is to use a max-heap to store the smaller half of the numbers and a min-heap to store the larger half. The max-heap allows you to efficiently fetch the largest element of the smaller half, and the min-heap allows you to efficiently fetch the smallest element of the larger half. To maintain the balance, ensure that the number of elements in the max-heap is either equal to or one more than the number of elements in the min-heap.
The time complexity for adding a number is O(log n) due to heap adjustments. Finding the median has a time complexity of O(1). Space complexity is O(n), where n is the number of elements.
1import java.util.PriorityQueue;
2
3class MedianFinder {
4 private PriorityQueue<Integer> small;
5 private PriorityQueue<Integer> large;
6
7 public MedianFinder() {
8 small = new PriorityQueue<>((a, b) -> b - a);
9 large = new PriorityQueue<>();
10 }
11
12 public void addNum(int num) {
13 small.add(num);
14 large.add(small.poll());
15 if (small.size() < large.size()) {
16 small.add(large.poll());
17 }
18 }
19
20 public double findMedian() {
21 if (small.size() > large.size()) {
22 return small.peek();
23 }
24 return (small.peek() + large.peek()) / 2.0;
25 }
26}This implementation uses Java's PriorityQueue for max and min heaps. After each insertion, elements are adjusted between two heaps to keep balance. The median is derived from the heap's top elements based on their sizes.
An alternative approach is to use a balanced binary search tree (BST) to keep track of the numbers. This allows maintaining a sorted order of elements, which can make finding the median efficient since a BST can give average-case time complexity of O(log n) for insertion and O(1) for median finding.
Insertion and balancing in the sorted set take O(log n) time, and findings of the median are O(1). Space complexity is O(n).
1using System;
2using System.Collections.Generic;
3
4public class MedianFinder {
5 private SortedSet<(int, int)> small;
6 private SortedSet<(int, int)> large;
7 private int counter = 0;
public MedianFinder() {
small = new SortedSet<(int, int)>();
large = new SortedSet<(int, int)>();
}
public void AddNum(int num) {
small.Add((num, counter++));
large.Add(small.Max);
small.Remove(small.Max);
if (large.Count > small.Count) {
small.Add(large.Min);
large.Remove(large.Min);
}
}
public double FindMedian() {
if (small.Count > large.Count) {
return small.Max.Item1;
}
return (small.Max.Item1 + large.Min.Item1) / 2.0;
}
}Watch expert explanations and walkthroughs
Practice problems asked by these companies to ace your technical interviews.
Explore More ProblemsJot down your thoughts, approach, and key learnings
Yes, this problem is commonly asked in FAANG and other top tech company interviews. It tests knowledge of heaps, streaming data handling, and efficient design of dynamic data structures.
Two heaps help maintain a balanced split of the numbers around the median. By ensuring the size difference is at most one, the median can always be derived from the top elements of the heaps without re-sorting the entire dataset.
A combination of a max-heap and a min-heap is the best data structure for this problem. The max-heap stores smaller values while the min-heap stores larger values, ensuring quick insertion and efficient median calculation.
The optimal approach uses two heaps: a max-heap for the lower half of numbers and a min-heap for the upper half. This structure keeps the dataset balanced and allows the median to be calculated directly from the heap tops in constant time.
This C# implementation uses two sorted sets as a stand-in for a balanced BST. Every number is inserted into the smaller set first, followed by balancing between the two sets. The median is retrieved based on the sizes of the sets and their boundary values.