You are given a large sample of integers in the range [0, 255]. Since the sample is so large, it is represented by an array count where count[k] is the number of times that k appears in the sample.
Calculate the following statistics:
minimum: The minimum element in the sample.maximum: The maximum element in the sample.mean: The average of the sample, calculated as the total sum of all elements divided by the total number of elements.median:
median is the middle element once the sample is sorted.median is the average of the two middle elements once the sample is sorted.mode: The number that appears the most in the sample. It is guaranteed to be unique.Return the statistics of the sample as an array of floating-point numbers [minimum, maximum, mean, median, mode]. Answers within 10-5 of the actual answer will be accepted.
Example 1:
Input: count = [0,1,3,4,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0] Output: [1.00000,3.00000,2.37500,2.50000,3.00000] Explanation: The sample represented by count is [1,2,2,2,3,3,3,3]. The minimum and maximum are 1 and 3 respectively. The mean is (1+2+2+2+3+3+3+3) / 8 = 19 / 8 = 2.375. Since the size of the sample is even, the median is the average of the two middle elements 2 and 3, which is 2.5. The mode is 3 as it appears the most in the sample.
Example 2:
Input: count = [0,4,3,2,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0] Output: [1.00000,4.00000,2.18182,2.00000,1.00000] Explanation: The sample represented by count is [1,1,1,1,2,2,2,3,3,4,4]. The minimum and maximum are 1 and 4 respectively. The mean is (1+1+1+1+2+2+2+3+3+4+4) / 11 = 24 / 11 = 2.18181818... (for display purposes, the output shows the rounded number 2.18182). Since the size of the sample is odd, the median is the middle element 2. The mode is 1 as it appears the most in the sample.
Constraints:
count.length == 2560 <= count[i] <= 1091 <= sum(count) <= 109count represents is unique.Problem Overview: You are given a frequency array count[256] where each index represents a value and the element at that index represents how many times the value appears in a large dataset. From this compressed distribution, compute five statistics: minimum, maximum, mean, median, and mode.
Approach 1: Iterative Approach to Calculate Statistics (Time: O(k), Space: O(1))
The simplest solution walks through the frequency array once and reconstructs the statistical values directly from the counts. During the iteration, track the first index with a non‑zero count for the minimum and the last for the maximum. Maintain a running sum value * frequency and a total count to compute the mean. For the mode, keep the index with the highest frequency encountered so far. Median calculation requires identifying the middle position(s) in the expanded dataset, so while iterating you accumulate frequencies until the cumulative count crosses the median indices. Since the domain size is fixed (256 values), the scan is effectively constant time and uses O(1) additional memory.
Approach 2: Distribution-Based Approach with Two Pass Median Calculation (Time: O(k), Space: O(1))
This approach treats the input strictly as a probability distribution and separates the computation into two passes. The first pass computes min, max, mean, and mode by iterating through the frequency array and accumulating totals. After the total element count is known, determine the median positions depending on whether the dataset size is odd or even. A second pass walks the distribution again while maintaining a cumulative frequency until those median positions are reached. The key insight is that you never expand the dataset into an explicit array; instead, you rely on prefix frequency counts to simulate the sorted order. This keeps the algorithm efficient even when the conceptual sample size is extremely large.
The core idea across both methods is exploiting the bounded range of values. Instead of sorting millions of samples, the algorithm processes the array of counts directly and reconstructs statistics mathematically. Operations such as cumulative frequency tracking and weighted summation come directly from math and probability and statistics concepts.
Recommended for interviews: The distribution-based two-pass approach is typically what interviewers expect. It clearly shows you understand how to derive statistics from a frequency distribution without reconstructing the dataset. The single-pass iterative variant also works and demonstrates good implementation skills, but explicitly reasoning about median positions from cumulative counts highlights stronger algorithmic thinking.
The iterative approach involves traversing the given frequency array from 0 to 255. This helps to determine the minimum, maximum, and gather necessary values to compute mean, median, and mode simultaneously.
This C solution iterates over the count array to determine minimum and maximum values, cumulative sum, total count, and frequency of appearances for calculating mean, median, and mode. It uses cumulative frequency count to identify the median based on odd/even total appearance count.
Time Complexity: O(N), where N = 256 due to iterating over the fixed array.
Space Complexity: O(1), only a fixed number of variable storages are used indefinitely.
This approach is on dividing tasks into passes: A first pass supports accumulation & mean calculation; the second pass concentrates on median. Separating concerns enhances clarity.
For this C implementation, two distinct loop passes are executed: The first calculates min, max, mode, and mean; the second pass determines median using precomputed summation indices.
Time Complexity: O(N), N denotes fixed count of 256.
Space Complexity: O(1), involves a fixed number of simple variables.
Python
Java
C++
Go
TypeScript
| Approach | Complexity |
|---|---|
| Iterative Approach to Calculate Statistics | Time Complexity: O(N), where N = 256 due to iterating over the fixed array. |
| Distribution-Based Approach with Two Pass Median Calculation | Time Complexity: O(N), N denotes fixed count of 256. |
| Default Approach | — |
| Approach | Time | Space | When to Use |
|---|---|---|---|
| Iterative Approach to Calculate Statistics | O(k) | O(1) | General solution when scanning the frequency array once and computing all statistics together. |
| Distribution-Based Two Pass Median Calculation | O(k) | O(1) | Preferred when reasoning about median positions from cumulative frequencies in a statistical distribution. |
1093. Statistics from a Large Sample (LeetCode Weekly Contest 142) • Kelvin Chandra • 417 views views
Watch 9 more video solutions →Practice Statistics from a Large Sample with our built-in code editor and test cases.
Practice on FleetCode