Table: Activity
+---------------+---------+
| Column Name | Type |
+---------------+---------+
| user_id | int |
| session_id | int |
| activity_date | date |
| activity_type | enum |
+---------------+---------+
This table may have duplicate rows.
The activity_type column is an ENUM (category) of type ('open_session', 'end_session', 'scroll_down', 'send_message').
The table shows the user activities for a social media website.
Note that each session belongs to exactly one user.
Write a solution to find the average number of sessions per user for a period of 30 days ending 2019-07-27 inclusively, rounded to 2 decimal places. The sessions we want to count for a user are those with at least one activity in that time period.
The result format is in the following example.
Example 1:
Input: Activity table: +---------+------------+---------------+---------------+ | user_id | session_id | activity_date | activity_type | +---------+------------+---------------+---------------+ | 1 | 1 | 2019-07-20 | open_session | | 1 | 1 | 2019-07-20 | scroll_down | | 1 | 1 | 2019-07-20 | end_session | | 2 | 4 | 2019-07-20 | open_session | | 2 | 4 | 2019-07-21 | send_message | | 2 | 4 | 2019-07-21 | end_session | | 3 | 2 | 2019-07-21 | open_session | | 3 | 2 | 2019-07-21 | send_message | | 3 | 2 | 2019-07-21 | end_session | | 3 | 5 | 2019-07-21 | open_session | | 3 | 5 | 2019-07-21 | scroll_down | | 3 | 5 | 2019-07-21 | end_session | | 4 | 3 | 2019-06-25 | open_session | | 4 | 3 | 2019-06-25 | end_session | +---------+------------+---------------+---------------+ Output: +---------------------------+ | average_sessions_per_user | +---------------------------+ | 1.33 | +---------------------------+ Explanation: User 1 and 2 each had 1 session in the past 30 days while user 3 had 2 sessions so the average is (1 + 1 + 2) / 3 = 1.33.
Problem Overview: The table Activity stores user activity records with user_id, session_id, and activity_date. The task is to compute the average number of sessions per user during the 30‑day window ending on 2019-07-27. Each session is uniquely identified by session_id, so the goal is to count unique sessions and divide by the number of unique active users in that time range.
Approach 1: Aggregate with COUNT(DISTINCT) (O(n) time, O(1) space)
Filter the dataset to the required 30‑day window using a WHERE condition. Then compute two aggregates: the number of distinct sessions and the number of distinct users. The average sessions per user is simply COUNT(DISTINCT session_id) / COUNT(DISTINCT user_id). Since SQL engines scan the filtered rows once and maintain aggregate counters, the runtime is O(n) where n is the number of rows in the date range, and the additional memory usage is constant.
The key insight is that the problem does not require grouping by user explicitly. Instead of calculating sessions per user and averaging later, you can compute the global ratio directly using two distinct counts. This reduces query complexity and keeps the execution plan simple. MySQL handles COUNT(DISTINCT ...) efficiently using internal hashing or sorting strategies.
This approach is typical in SQL interview questions involving ratios between aggregates. You filter rows, compute aggregates, and derive the final metric in the same query.
Approach 2: Group by User then Compute Average (O(n) time, O(u) space)
Another way is to first compute the number of sessions per user in the 30‑day window. Use GROUP BY user_id and count distinct session_id for each user. That produces a per‑user session count. Then compute the overall average using AVG() over those grouped results, usually through a subquery or derived table.
This approach mirrors the conceptual definition of the metric: sessions per user, then the average of those values. The SQL engine creates an intermediate grouped dataset of size u (number of active users), so the extra space is O(u). Runtime remains O(n) because each activity row is still processed once.
While slightly more verbose, this pattern appears frequently in analytics queries and demonstrates good understanding of database aggregation workflows and aggregation pipelines.
Recommended for interviews: The direct COUNT(DISTINCT) ratio is usually preferred. It is concise, efficient, and shows that you recognize the metric can be computed with two aggregates instead of an intermediate grouping step. The grouped subquery approach is still valuable because it mirrors the conceptual logic and works well when more per‑user metrics are required.
MySQL
| Approach | Time | Space | When to Use |
|---|---|---|---|
| COUNT(DISTINCT) Ratio | O(n) | O(1) | Best for concise queries when only the global average is needed |
| Group By User + AVG() | O(n) | O(u) | Useful when computing additional per-user metrics before averaging |
LeetCode 1142 "User Activity for the Past 30 Days II" Netflix Interview SQL Question Explanation • Everyday Data Science • 1,936 views views
Practice User Activity for the Past 30 Days II with our built-in code editor and test cases.
Practice on FleetCodePractice this problem
Open in Editor