Reservoir Sampling is a randomized algorithm used to select a random sample of k elements from a large or unknown-sized data stream. The key challenge it solves is sampling uniformly without storing the entire dataset in memory. This makes it extremely useful when working with massive streams of data where only a single pass through the input is possible.
The algorithm maintains a small "reservoir" of k selected items and updates it as new elements arrive: the i-th element (for i > k) replaces a randomly chosen reservoir slot with probability k/i, which keeps every element's selection probability equal at k/n after n elements have been seen. Because of its probabilistic nature, the technique is closely related to Randomized algorithms and concepts from Probability and Statistics.
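The update rule above can be sketched in a few lines of Python (this is the classic "Algorithm R"; the function name and signature are illustrative):

```python
import random

def reservoir_sample(stream, k):
    """Keep a uniform random sample of k items from an iterable of unknown length."""
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # The i-th item (0-indexed) should survive with probability k / (i + 1):
            # pick a random slot in [0, i] and replace only if it lands in the reservoir.
            j = random.randint(0, i)
            if j < k:
                reservoir[j] = item
    return reservoir
```

Note that the stream is consumed exactly once and only the k reservoir slots are ever stored, which is what makes the technique viable for inputs too large to hold in memory.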
Reservoir Sampling commonly appears in interview problems involving Data Stream processing, especially when memory constraints prevent storing all elements. While the input may conceptually resemble an Array, the algorithm does not require knowing the total size in advance.
Mastering Reservoir Sampling helps you handle streaming data efficiently and demonstrates strong understanding of randomized algorithm design in coding interviews.
Helps understand iteration over elements and the baseline scenario where the dataset size is known.
Reservoir Sampling is a classic randomized algorithm that relies on controlled randomness for uniform selection.
Many reservoir sampling problems involve processing elements from a continuous or unknown-length data stream.
Essential for understanding why each element has an equal probability of being selected in the reservoir.
| Status | Title | Video | Leetcode | Solve | Difficulty | Companies | Topics |
|---|---|---|---|---|---|---|---|
|  | 382. Linked List Random Node |  |  | Solve | Medium | Amazon+2 |  |
|  | 398. Random Pick Index |  |  | Solve | Medium | Amazon+2 |  |
|  | 497. Random Point in Non-overlapping Rectangles |  |  | Solve | Medium | Google |  |
|  | 519. Random Flip Matrix |  |  | Solve | Medium | Google |  |
These topics frequently appear alongside Reservoir Sampling.
Common questions about Reservoir Sampling.
Use it when the input size is extremely large or unknown and you cannot store all elements in memory. It is commonly used in streaming systems, analytics pipelines, and randomized selection problems.
Reservoir Sampling is a randomized algorithm used to select k random elements from a stream or dataset of unknown size. It guarantees that every element has an equal probability of being chosen while using only O(k) memory.
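For the common single-item case (k = 1), the same idea reduces to keeping one candidate and replacing it with the i-th element with probability 1/i. A minimal sketch (the function name is illustrative; problems like 382. Linked List Random Node follow this pattern):

```python
import random

def pick_one(stream):
    """Return one uniformly random element from an iterable of unknown length (k = 1)."""
    chosen = None
    for i, item in enumerate(stream, start=1):
        # Replace the current choice with probability 1/i.
        if random.randrange(i) == 0:
            chosen = item
    return chosen
```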
Practicing multiple variations helps you understand both single-item and k-item sampling scenarios. Solving several problems builds intuition for probability-based updates and streaming constraints.
It demonstrates your ability to design algorithms for streaming data and memory-constrained environments. Interviewers often use it to test understanding of probability and randomized techniques.
Reservoir Sampling processes each element once, so the time complexity is O(n). The space complexity is O(k) because only the reservoir of k elements is stored.
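One quick way to build confidence in the uniformity guarantee is an empirical check: repeatedly sample one element (the k = 1 case) from a small stream and verify each element is picked about equally often. This is a self-contained sketch, not part of any library:

```python
import random
from collections import Counter

def sample_one(stream):
    """Uniformly sample one element from a stream (k = 1 reservoir sampling)."""
    chosen = None
    for i, item in enumerate(stream, start=1):
        if random.randrange(i) == 0:  # replace current choice with probability 1/i
            chosen = item
    return chosen

# Each of the 5 elements should appear in roughly trials / 5 of the draws.
trials = 50000
counts = Counter(sample_one(range(5)) for _ in range(trials))
```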