Table: Activities
+---------------+---------+
| Column Name | Type |
+---------------+---------+
| activity_id | int |
| user_id | int |
| activity_type | enum |
| time_spent | decimal |
+---------------+---------+
activity_id is the column of unique values for this table.
activity_type is an ENUM (category) type of ('send', 'open').
This table contains activity id, user id, activity type and time spent.
Table: Age
+-------------+------+
| Column Name | Type |
+-------------+------+
| user_id | int |
| age_bucket | enum |
+-------------+------+
user_id is the column of unique values for this table.
age_bucket is an ENUM (category) type of ('21-25', '26-30', '31-35').
This table contains user id and age group.
Write a solution to calculate the percentage of the total time spent on sending and opening snaps for each age group. The percentage should be rounded to 2 decimal places.
Return the result table in any order.
The result format is in the following example.
Example 1:
Input:
Activities table:
+-------------+---------+---------------+------------+
| activity_id | user_id | activity_type | time_spent |
+-------------+---------+---------------+------------+
| 7274        | 123     | open          | 4.50       |
| 2425        | 123     | send          | 3.50       |
| 1413        | 456     | send          | 5.67       |
| 2536        | 456     | open          | 3.00       |
| 8564        | 456     | send          | 8.24       |
| 5235        | 789     | send          | 6.24       |
| 4251        | 123     | open          | 1.25       |
| 1435        | 789     | open          | 5.25       |
+-------------+---------+---------------+------------+
Age table:
+---------+------------+
| user_id | age_bucket |
+---------+------------+
| 123     | 31-35      |
| 789     | 21-25      |
| 456     | 26-30      |
+---------+------------+
Output:
+------------+-----------+-----------+
| age_bucket | send_perc | open_perc |
+------------+-----------+-----------+
| 31-35      | 37.84     | 62.16     |
| 26-30      | 82.26     | 17.74     |
| 21-25      | 54.31     | 45.69     |
+------------+-----------+-----------+
Explanation:
For age group 31-35:
- There is only one user belonging to this group, with the user ID 123.
- The total time spent on sending snaps by this user is 3.50, and the time spent on opening snaps is 4.50 + 1.25 = 5.75.
- The overall time spent by this user is 3.50 + 5.75 = 9.25.
- Therefore, the sending snap percentage is (3.50 / 9.25) * 100 = 37.84, and the opening snap percentage is (5.75 / 9.25) * 100 = 62.16.
For age group 26-30:
- There is only one user belonging to this group, with the user ID 456.
- The total time spent on sending snaps by this user is 5.67 + 8.24 = 13.91, and the time spent on opening snaps is 3.00.
- The overall time spent by this user is 13.91 + 3.00 = 16.91.
- Therefore, the sending snap percentage is (13.91 / 16.91) * 100 = 82.26, and the opening snap percentage is (3.00 / 16.91) * 100 = 17.74.
For age group 21-25:
- There is only one user belonging to this group, with the user ID 789.
- The total time spent on sending snaps by this user is 6.24, and the time spent on opening snaps is 5.25.
- The overall time spent by this user is 6.24 + 5.25 = 11.49.
- Therefore, the sending snap percentage is (6.24 / 11.49) * 100 = 54.31, and the opening snap percentage is (5.25 / 11.49) * 100 = 45.69.
All percentages in the output table are rounded to two decimal places.
Problem Overview: You are given database tables containing Snap activity data. The goal is to analyze Snap interactions by joining related tables and aggregating metrics such as counts or totals per entity (for example user, snap, or time period). The challenge is structuring the query so the relationships are resolved correctly before performing aggregation.
Approach 1: Correlated Subqueries / Nested Aggregation (O(n*m) time, O(1) extra space)
A straightforward approach is computing metrics using nested queries. For each row in the primary table, run a subquery that filters the related table and calculates the required count or sum. While this works for small datasets, it scales poorly because the database repeatedly scans the related table. Each row triggers another query execution, producing quadratic‑like behavior when tables grow.
This method is sometimes seen in early solutions because it mirrors procedural thinking: iterate through rows and compute values one by one. However, relational databases are optimized for set operations rather than repeated row-level processing.
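The row-at-a-time pattern described above can be mimicked in plain Python to make the cost visible: for every row of Age, the inner loop rescans all of Activities, which is exactly the O(n*m) behavior a correlated subquery produces. This is an illustrative sketch using the example data; the variable names are ours, not part of the problem.

```python
# Example data from the problem statement.
activities = [
    (7274, 123, "open", 4.50), (2425, 123, "send", 3.50),
    (1413, 456, "send", 5.67), (2536, 456, "open", 3.00),
    (8564, 456, "send", 8.24), (5235, 789, "send", 6.24),
    (4251, 123, "open", 1.25), (1435, 789, "open", 5.25),
]
age = [(123, "31-35"), (789, "21-25"), (456, "26-30")]

totals = {}
for user_id, bucket in age:              # outer pass over Age (n rows)
    send = opened = 0.0
    for _, uid, kind, t in activities:   # full rescan of Activities (m rows)
        if uid == user_id:
            if kind == "send":
                send += t
            else:
                opened += t
    s, o = totals.get(bucket, (0.0, 0.0))
    totals[bucket] = (s + send, o + opened)

percentages = {b: (round(100 * s / (s + o), 2), round(100 * o / (s + o), 2))
               for b, (s, o) in totals.items()}
print(percentages)
```

Each Age row triggers a complete scan of Activities, which is precisely what the set-based approach below avoids.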
Approach 2: Equi-Join + GROUP BY Summation (O(n log n) time, O(k) space)
The efficient solution uses an equi-join between the relevant tables and then aggregates results with GROUP BY. First join the tables using matching keys (for example user_id or snap_id). This combines related records into a single result set. Once joined, apply aggregate functions such as COUNT(), SUM(), or conditional expressions to compute the required Snap statistics.
Grouping by the desired dimension (for example user, category, or timestamp bucket) allows the database engine to compute totals in a single pass over the joined dataset. Modern SQL engines optimize joins and aggregations with indexes and hash/group operations, making this significantly faster than nested queries.
This pattern appears frequently in database interview problems: join tables to reconstruct relationships, then aggregate metrics using SQL operations such as GROUP BY. The same logic also maps naturally to Python data analysis libraries (like pandas) where you perform a merge followed by a groupby aggregation.
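The pandas analogue mentioned above can be sketched as a merge followed by a grouped pivot. This assumes the example data from the problem; the DataFrame and variable names are illustrative.

```python
import pandas as pd

# Example data from the problem statement.
activities = pd.DataFrame({
    "user_id": [123, 123, 456, 456, 456, 789, 123, 789],
    "activity_type": ["open", "send", "send", "open", "send", "send", "open", "open"],
    "time_spent": [4.50, 3.50, 5.67, 3.00, 8.24, 6.24, 1.25, 5.25],
})
age = pd.DataFrame({"user_id": [123, 789, 456],
                    "age_bucket": ["31-35", "21-25", "26-30"]})

# Merge (equi-join on user_id), then sum time_spent by bucket and activity type.
merged = activities.merge(age, on="user_id")
sums = merged.pivot_table(index="age_bucket", columns="activity_type",
                          values="time_spent", aggfunc="sum")
# Divide each row by its total and convert to rounded percentages.
result = (100 * sums.div(sums.sum(axis=1), axis=0)).round(2)
print(result)
```

The merge-then-groupby shape is a direct translation of the SQL join + GROUP BY plan.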
Recommended for interviews: Interviewers expect the Equi-Join + GROUP BY approach. The naive nested query demonstrates understanding of relational relationships, but the optimized solution shows you know how databases process large datasets efficiently using set-based operations.
We can perform an equi-join to connect the Activities table and the Age table based on user_id. Then, group by age_bucket and finally calculate the percentage of sends and opens for each age group.
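The join-and-aggregate query can be sketched end to end using Python's sqlite3 against the example data. The table and column names come from the problem statement; the SQL uses a conditional SUM (CASE WHEN) per activity type divided by the bucket total, which is one common way to express the percentages. Exact rounding semantics may differ slightly across SQL engines.

```python
import sqlite3

# Build the example tables in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Activities (activity_id INT, user_id INT,
                         activity_type TEXT, time_spent REAL);
CREATE TABLE Age (user_id INT, age_bucket TEXT);
INSERT INTO Activities VALUES
  (7274, 123, 'open', 4.50), (2425, 123, 'send', 3.50),
  (1413, 456, 'send', 5.67), (2536, 456, 'open', 3.00),
  (8564, 456, 'send', 8.24), (5235, 789, 'send', 6.24),
  (4251, 123, 'open', 1.25), (1435, 789, 'open', 5.25);
INSERT INTO Age VALUES (123, '31-35'), (789, '21-25'), (456, '26-30');
""")

# Equi-join on user_id, then conditional sums per age bucket.
query = """
SELECT a.age_bucket,
       ROUND(100.0 * SUM(CASE WHEN act.activity_type = 'send'
                              THEN act.time_spent ELSE 0 END)
             / SUM(act.time_spent), 2) AS send_perc,
       ROUND(100.0 * SUM(CASE WHEN act.activity_type = 'open'
                              THEN act.time_spent ELSE 0 END)
             / SUM(act.time_spent), 2) AS open_perc
FROM Activities AS act
JOIN Age AS a ON act.user_id = a.user_id
GROUP BY a.age_bucket;
"""
rows = {bucket: (send, opened) for bucket, send, opened in conn.execute(query)}
print(rows)
```

The CASE WHEN inside SUM lets a single pass over the joined rows produce both the send and open totals per group, so no second scan or subquery is needed.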
| Approach | Time | Space | When to Use |
|---|---|---|---|
| Correlated Subqueries / Nested Aggregation | O(n*m) | O(1) | Small datasets or quick prototype queries where performance is not critical |
| Equi-Join + GROUP BY Summation | O(n log n) | O(k) | Production queries and coding interviews requiring efficient aggregation across related tables |
LeetCode Medium 3056 - Snaps Analysis - Solved by Everyday Data Science (CASE WHEN, ROUND(), SUM())