Watch 10 video solutions for Statistics from a Large Sample, a medium level problem involving Array, Math, Probability and Statistics. This walkthrough by Kelvin Chandra has 417 views views. Want to try solving it yourself? Practice on FleetCode or read the detailed text solution.
You are given a large sample of integers in the range [0, 255]. Since the sample is so large, it is represented by an array count where count[k] is the number of times that k appears in the sample.
Calculate the following statistics:
minimum: The minimum element in the sample.maximum: The maximum element in the sample.mean: The average of the sample, calculated as the total sum of all elements divided by the total number of elements.median:
median is the middle element once the sample is sorted.median is the average of the two middle elements once the sample is sorted.mode: The number that appears the most in the sample. It is guaranteed to be unique.Return the statistics of the sample as an array of floating-point numbers [minimum, maximum, mean, median, mode]. Answers within 10-5 of the actual answer will be accepted.
Example 1:
Input: count = [0,1,3,4,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0] Output: [1.00000,3.00000,2.37500,2.50000,3.00000] Explanation: The sample represented by count is [1,2,2,2,3,3,3,3]. The minimum and maximum are 1 and 3 respectively. The mean is (1+2+2+2+3+3+3+3) / 8 = 19 / 8 = 2.375. Since the size of the sample is even, the median is the average of the two middle elements 2 and 3, which is 2.5. The mode is 3 as it appears the most in the sample.
Example 2:
Input: count = [0,4,3,2,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0] Output: [1.00000,4.00000,2.18182,2.00000,1.00000] Explanation: The sample represented by count is [1,1,1,1,2,2,2,3,3,4,4]. The minimum and maximum are 1 and 4 respectively. The mean is (1+1+1+1+2+2+2+3+3+4+4) / 11 = 24 / 11 = 2.18181818... (for display purposes, the output shows the rounded number 2.18182). Since the size of the sample is odd, the median is the middle element 2. The mode is 1 as it appears the most in the sample.
Constraints:
count.length == 2560 <= count[i] <= 1091 <= sum(count) <= 109count represents is unique.Problem Overview: You are given a frequency array count[256] where each index represents a value and the element at that index represents how many times the value appears in a large dataset. From this compressed distribution, compute five statistics: minimum, maximum, mean, median, and mode.
Approach 1: Iterative Approach to Calculate Statistics (Time: O(k), Space: O(1))
The simplest solution walks through the frequency array once and reconstructs the statistical values directly from the counts. During the iteration, track the first index with a non‑zero count for the minimum and the last for the maximum. Maintain a running sum value * frequency and a total count to compute the mean. For the mode, keep the index with the highest frequency encountered so far. Median calculation requires identifying the middle position(s) in the expanded dataset, so while iterating you accumulate frequencies until the cumulative count crosses the median indices. Since the domain size is fixed (256 values), the scan is effectively constant time and uses O(1) additional memory.
Approach 2: Distribution-Based Approach with Two Pass Median Calculation (Time: O(k), Space: O(1))
This approach treats the input strictly as a probability distribution and separates the computation into two passes. The first pass computes min, max, mean, and mode by iterating through the frequency array and accumulating totals. After the total element count is known, determine the median positions depending on whether the dataset size is odd or even. A second pass walks the distribution again while maintaining a cumulative frequency until those median positions are reached. The key insight is that you never expand the dataset into an explicit array; instead, you rely on prefix frequency counts to simulate the sorted order. This keeps the algorithm efficient even when the conceptual sample size is extremely large.
The core idea across both methods is exploiting the bounded range of values. Instead of sorting millions of samples, the algorithm processes the array of counts directly and reconstructs statistics mathematically. Operations such as cumulative frequency tracking and weighted summation come directly from math and probability and statistics concepts.
Recommended for interviews: The distribution-based two-pass approach is typically what interviewers expect. It clearly shows you understand how to derive statistics from a frequency distribution without reconstructing the dataset. The single-pass iterative variant also works and demonstrates good implementation skills, but explicitly reasoning about median positions from cumulative counts highlights stronger algorithmic thinking.
| Approach | Time | Space | When to Use |
|---|---|---|---|
| Iterative Approach to Calculate Statistics | O(k) | O(1) | General solution when scanning the frequency array once and computing all statistics together. |
| Distribution-Based Two Pass Median Calculation | O(k) | O(1) | Preferred when reasoning about median positions from cumulative frequencies in a statistical distribution. |