Table: Numbers
+-------------+------+ | Column Name | Type | +-------------+------+ | num | int | | frequency | int | +-------------+------+ num is the primary key (column with unique values) for this table. Each row of this table shows the frequency of a number in the database.
The median is the value separating the higher half from the lower half of a data sample.
Write a solution to report the median of all the numbers in the database after decompressing the Numbers table. Round the median to one decimal point.
The result format is in the following example.
Example 1:
Input: Numbers table: +-----+-----------+ | num | frequency | +-----+-----------+ | 0 | 7 | | 1 | 1 | | 2 | 3 | | 3 | 1 | +-----+-----------+ Output: +--------+ | median | +--------+ | 0.0 | +--------+ Explanation: If we decompress the Numbers table, we will get [0, 0, 0, 0, 0, 0, 0, 1, 2, 2, 2, 3], so the median is (0 + 0) / 2 = 0.
Problem Overview: The table stores numbers along with how many times each value appears. Instead of expanding all values, compute the median directly from the frequency distribution.
Approach 1: Expand Frequencies into Rows (Brute Force) (Time: O(n + F), Space: O(F))
The most straightforward idea is to reconstruct the full dataset. For each row (num, frequency), generate frequency copies of num, then sort the expanded list and compute the median normally. In SQL this can be simulated using recursive queries or helper number tables. After expansion, the median is the middle element if the total count is odd, or the average of the two middle elements if the count is even.
This method mirrors how you would solve the problem in memory using arrays, but it becomes expensive when frequencies are large. The dataset size becomes the sum of all frequencies, which can explode quickly. It’s useful conceptually but rarely practical in a production database environment.
Approach 2: Cumulative Frequency with Window Functions (Optimal SQL) (Time: O(n log n), Space: O(n))
The key insight: the median depends on position, not the fully expanded data. First compute the total count of elements using SUM(frequency). The median positions are (total + 1) / 2 and (total + 2) / 2. For odd counts these two positions are identical; for even counts they represent the two middle elements whose average forms the median.
Next, sort rows by num and compute a running cumulative frequency using SQL window functions. This cumulative value tells you the index range that each number occupies in the conceptual expanded array. For example, if cumulative frequency jumps from 5 to 9, that number covers positions 6–9.
Once you know those ranges, simply select rows where the cumulative frequency interval contains either median position. Those numbers are the median candidates. Averaging them produces the correct result for both even and odd totals. The technique behaves like a prefix sum over frequencies and avoids materializing the full dataset.
Recommended for interviews: Interviewers expect the cumulative frequency idea. The brute-force expansion shows basic understanding of medians, but the window-function approach demonstrates database reasoning and the ability to transform frequency distributions into positional ranges efficiently.
MySQL
| Approach | Time | Space | When to Use |
|---|---|---|---|
| Expand Frequencies into Rows | O(n + F) | O(F) | When frequencies are very small and clarity is more important than efficiency |
| Cumulative Frequency with Window Functions | O(n log n) | O(n) | Best SQL approach for large datasets without expanding rows |
| Prefix Sum Range Detection | O(n log n) | O(n) | General technique when computing statistics from frequency tables |
Leetcode HARD 571 - Decompressing Tables with Recursive CTE - Find Median | Everyday Data Science • Everyday Data Science • 919 views views
Watch 6 more video solutions →Practice Find Median Given Frequency of Numbers with our built-in code editor and test cases.
Practice on FleetCode