Watch the video solution for Calculate Compressed Mean, an easy-level problem involving Database. This walkthrough by Everyday Data Science has 640 views. Want to try solving it yourself? Practice on FleetCode or read the detailed text solution.
Table: Orders
+-------------------+------+
| Column Name       | Type |
+-------------------+------+
| order_id          | int  |
| item_count        | int  |
| order_occurrences | int  |
+-------------------+------+

order_id is a column of unique values for this table. This table contains order_id, item_count, and order_occurrences.
Write a solution to calculate the average number of items per order, rounded to 2 decimal places.
Return the result table in any order.
The result format is in the following example.
Example 1:
Input:
Orders table:
+----------+------------+-------------------+
| order_id | item_count | order_occurrences |
+----------+------------+-------------------+
| 10       | 1          | 500               |
| 11       | 2          | 1000              |
| 12       | 3          | 800               |
| 13       | 4          | 1000              |
+----------+------------+-------------------+

Output:
+-------------------------+
| average_items_per_order |
+-------------------------+
| 2.70                    |
+-------------------------+

Explanation:
The calculation is as follows:
- Total items: (1 * 500) + (2 * 1000) + (3 * 800) + (4 * 1000) = 8900
- Total orders: 500 + 1000 + 800 + 1000 = 3300
- Therefore, the average items per order is 8900 / 3300 = 2.70
Problem Overview: The table stores the data in a compressed format where each value appears with a frequency count. Instead of listing every order individually, the dataset stores each item_count along with how many orders it applies to (order_occurrences). Your task is to compute the mean of the expanded dataset without actually expanding it.
Approach 1: Conceptual Expansion (Brute Force) (Time: O(n + total_frequency), Space: O(total_frequency))
A straightforward way to think about the problem is to expand the compressed representation into the full dataset. For every row, you would repeat its item_count exactly order_occurrences times, build the full list, then compute the average. The mean is simply sum(all numbers) / count(all numbers). While this works conceptually, it becomes extremely inefficient when frequencies are large because the expanded dataset can grow far beyond the original table size. SQL systems are not designed to materialize massive repeated rows just to compute an average.
This approach helps you understand the math behind the problem, but it should never be implemented directly in production queries. It wastes memory and increases computation time unnecessarily.
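For illustration only, the conceptual expansion could be sketched in MySQL 8+ with a recursive CTE that emits one row per individual order and then averages over the expanded rows (column names follow the Orders table above; this sketch is not suitable for large frequencies):

```sql
-- Conceptual brute force: materialize one row per individual order,
-- counting each compressed row down from order_occurrences to 1,
-- then average item_count over the expanded rows.
-- Requires MySQL 8+ (recursive CTEs).
WITH RECURSIVE expanded AS (
    SELECT order_id, item_count, order_occurrences AS remaining
    FROM Orders
    UNION ALL
    SELECT order_id, item_count, remaining - 1
    FROM expanded
    WHERE remaining > 1
)
SELECT ROUND(AVG(item_count), 2) AS average_items_per_order
FROM expanded;
```

On the example data this materializes 3300 intermediate rows just to average them, which is exactly the waste the optimal approach avoids.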
Approach 2: Aggregated Summation (Optimal) (Time: O(n), Space: O(1))
The key insight is that expanding the dataset is unnecessary. Each row contributes item_count × order_occurrences to the total sum and order_occurrences to the total count. Instead of repeating values, compute the weighted contribution directly using SQL aggregation.
You iterate through the table once and compute two aggregates:
SUM(item_count * order_occurrences) gives the total sum of the expanded dataset, while SUM(order_occurrences) gives the total number of elements. The compressed mean is simply:

SUM(item_count * order_occurrences) / SUM(order_occurrences)
This approach leverages database aggregation functions and basic SQL arithmetic operations. Because the query only scans the table once and keeps constant intermediate state, the time complexity is O(n) and the space complexity is O(1). SQL engines are highly optimized for this type of aggregation, making it both concise and efficient.
You may also round the result to a specific number of decimal places depending on the problem requirements, often using ROUND() in MySQL. This keeps the output consistent with expected precision.
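Putting the aggregation and rounding together, a minimal MySQL query for this approach (assuming the Orders schema above) looks like:

```sql
-- Weighted mean computed directly from the compressed rows:
-- total items divided by total orders, rounded to 2 decimal places.
SELECT ROUND(SUM(item_count * order_occurrences) / SUM(order_occurrences), 2)
           AS average_items_per_order
FROM Orders;
```

On the example data this evaluates 8900 / 3300 and returns 2.70, matching the expected output, with a single scan of the table.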
Recommended for interviews: Interviewers expect the aggregated summation approach. Recognizing that the compressed format represents a weighted dataset shows strong problem understanding. The brute force expansion demonstrates the intuition behind the formula, but the optimal SQL aggregation shows practical engineering judgment and knowledge of SQL aggregation patterns.
| Approach | Time | Space | When to Use |
|---|---|---|---|
| Conceptual Expansion | O(n + total_frequency) | O(total_frequency) | Useful only for understanding how compressed data represents the full dataset |
| Aggregated Summation (SQL) | O(n) | O(1) | Best approach for SQL queries; computes weighted mean directly using SUM() |