Watch 2 video solutions for Count Apples and Oranges, a medium level problem involving Database. This walkthrough by Everyday Data Science has 3,513 views views. Want to try solving it yourself? Practice on FleetCode or read the detailed text solution.
Table: Boxes
+--------------+------+ | Column Name | Type | +--------------+------+ | box_id | int | | chest_id | int | | apple_count | int | | orange_count | int | +--------------+------+ box_id is the column with unique values for this table. chest_id is a foreign key (reference column) of the chests table. This table contains information about the boxes and the number of oranges and apples they have. Each box may include a chest, which also can contain oranges and apples.
Table: Chests
+--------------+------+ | Column Name | Type | +--------------+------+ | chest_id | int | | apple_count | int | | orange_count | int | +--------------+------+ chest_id is the column with unique values for this table. This table contains information about the chests and the corresponding number of oranges and apples they have.
Write a solution to count the number of apples and oranges in all the boxes. If a box contains a chest, you should also include the number of apples and oranges it has.
The result format is in the following example.
Example 1:
Input: Boxes table: +--------+----------+-------------+--------------+ | box_id | chest_id | apple_count | orange_count | +--------+----------+-------------+--------------+ | 2 | null | 6 | 15 | | 18 | 14 | 4 | 15 | | 19 | 3 | 8 | 4 | | 12 | 2 | 19 | 20 | | 20 | 6 | 12 | 9 | | 8 | 6 | 9 | 9 | | 3 | 14 | 16 | 7 | +--------+----------+-------------+--------------+ Chests table: +----------+-------------+--------------+ | chest_id | apple_count | orange_count | +----------+-------------+--------------+ | 6 | 5 | 6 | | 14 | 20 | 10 | | 2 | 8 | 8 | | 3 | 19 | 4 | | 16 | 19 | 19 | +----------+-------------+--------------+ Output: +-------------+--------------+ | apple_count | orange_count | +-------------+--------------+ | 151 | 123 | +-------------+--------------+ Explanation: box 2 has 6 apples and 15 oranges. box 18 has 4 + 20 (from the chest) = 24 apples and 15 + 10 (from the chest) = 25 oranges. box 19 has 8 + 19 (from the chest) = 27 apples and 4 + 4 (from the chest) = 8 oranges. box 12 has 19 + 8 (from the chest) = 27 apples and 20 + 8 (from the chest) = 28 oranges. box 20 has 12 + 5 (from the chest) = 17 apples and 9 + 6 (from the chest) = 15 oranges. box 8 has 9 + 5 (from the chest) = 14 apples and 9 + 6 (from the chest) = 15 oranges. box 3 has 16 + 20 (from the chest) = 36 apples and 7 + 10 (from the chest) = 17 oranges. Total number of apples = 6 + 24 + 27 + 27 + 17 + 14 + 36 = 151 Total number of oranges = 15 + 25 + 8 + 28 + 15 + 15 + 17 = 123
Problem Overview: You have two tables: Boxes and Chests. Each record stores how many apples and oranges it contains. Some boxes are placed inside a chest via chest_id. The task is to return the total number of apples and oranges across both tables.
Approach 1: Left Join + Summation (O(n + m) time, O(1) space)
Join the Boxes table with the Chests table using a LEFT JOIN on chest_id. The join lets you access fruit counts stored in both tables in a single result set. After the join, aggregate the counts using SUM() to compute the total apples and oranges. If a box is not inside a chest, the chest columns will be NULL, so functions like COALESCE() or IFNULL() convert them to zero before summing.
The key insight is that SQL aggregation can combine values from multiple related tables in a single pass. Instead of iterating separately over boxes and chests, the database engine performs the join and aggregation efficiently during query execution. This pattern appears frequently in database interview questions where related entities must be aggregated together.
The same idea applies in Pandas using merge(). Perform a left merge on chest_id, then compute totals by summing the relevant columns. Vectorized operations handle the aggregation efficiently without explicit loops. This mirrors SQL joins and aggregation operations often used in analytics pipelines and data processing workflows involving SQL or data analysis.
Recommended for interviews: The LEFT JOIN + SUM approach is the expected solution. Interviewers want to see that you understand relational joins and aggregation together. A naive approach that queries each table separately demonstrates basic understanding, but combining them with a join shows stronger SQL fluency and the ability to write concise analytical queries.
| Approach | Time | Space | When to Use |
|---|---|---|---|
| Separate Aggregation Queries | O(n + m) | O(1) | Simple scenario where totals from each table are computed independently |
| Left Join + Summation | O(n + m) | O(1) | Preferred SQL approach when related records must be aggregated together |
| Pandas Merge + Column Sum | O(n + m) | O(n) | When solving the problem in Python data pipelines using Pandas |