Table: Activity
+---------------+---------+
| Column Name   | Type    |
+---------------+---------+
| user_id       | int     |
| session_id    | int     |
| activity_date | date    |
| activity_type | enum    |
+---------------+---------+
This table may have duplicate rows.
The activity_type column is an ENUM (category) of type ('open_session', 'end_session', 'scroll_down', 'send_message').
The table shows the user activities for a social media website.
Note that each session belongs to exactly one user.
Write a solution to find the daily active user count for a period of 30 days ending 2019-07-27, inclusive. A user was active on some day if they made at least one activity on that day.
Return the result table in any order.
The result format is in the following example.
Example 1:
Input:
Activity table:
+---------+------------+---------------+---------------+
| user_id | session_id | activity_date | activity_type |
+---------+------------+---------------+---------------+
| 1       | 1          | 2019-07-20    | open_session  |
| 1       | 1          | 2019-07-20    | scroll_down   |
| 1       | 1          | 2019-07-20    | end_session   |
| 2       | 4          | 2019-07-20    | open_session  |
| 2       | 4          | 2019-07-21    | send_message  |
| 2       | 4          | 2019-07-21    | end_session   |
| 3       | 2          | 2019-07-21    | open_session  |
| 3       | 2          | 2019-07-21    | send_message  |
| 3       | 2          | 2019-07-21    | end_session   |
| 4       | 3          | 2019-06-25    | open_session  |
| 4       | 3          | 2019-06-25    | end_session   |
+---------+------------+---------------+---------------+
Output:
+------------+--------------+
| day        | active_users |
+------------+--------------+
| 2019-07-20 | 2            |
| 2019-07-21 | 2            |
+------------+--------------+
Explanation: Note that we do not care about days with zero active users.
In #1141 User Activity for the Past 30 Days I, the goal is to calculate the number of daily active users within the 30‑day window ending on a specified date. The key idea is to analyze activity records and determine how many distinct user_id values appear for each day.
Start by filtering the activity records to the 30-day window that ends on the reference date, which here runs from 2019-06-28 through 2019-07-27 inclusive. After narrowing the dataset, group the records by activity_date. For each date group, count the unique users with COUNT(DISTINCT user_id) so that several activities by the same user on the same day are counted only once.
This grouped result gives the number of active users per day in the required window. Optionally, sort the output by date to present results chronologically. The approach mainly relies on SQL operations such as date filtering, grouping, and distinct counting, making it efficient and straightforward.
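The filter, group, and distinct-count steps can be tried end to end on the example rows. The sketch below is only an illustration: it runs the query through Python's built-in sqlite3 module, so the start of the window is written in SQLite's dialect as date(?, '-29 days') rather than with DATE_SUB, and the helper name daily_active_users is an assumption, not part of the problem.

```python
import sqlite3

# Example rows from the problem statement:
# (user_id, session_id, activity_date, activity_type)
ROWS = [
    (1, 1, "2019-07-20", "open_session"),
    (1, 1, "2019-07-20", "scroll_down"),
    (1, 1, "2019-07-20", "end_session"),
    (2, 4, "2019-07-20", "open_session"),
    (2, 4, "2019-07-21", "send_message"),
    (2, 4, "2019-07-21", "end_session"),
    (3, 2, "2019-07-21", "open_session"),
    (3, 2, "2019-07-21", "send_message"),
    (3, 2, "2019-07-21", "end_session"),
    (4, 3, "2019-06-25", "open_session"),
    (4, 3, "2019-06-25", "end_session"),
]

# SQLite dialect (assumption): date(end, '-29 days') gives the window start.
QUERY = """
SELECT activity_date AS day, COUNT(DISTINCT user_id) AS active_users
FROM Activity
WHERE activity_date BETWEEN date(?, '-29 days') AND ?
GROUP BY activity_date
ORDER BY activity_date
"""


def daily_active_users(rows, end_date="2019-07-27"):
    """Load rows into an in-memory table and run the windowed aggregation."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE Activity (user_id INTEGER, session_id INTEGER, "
            "activity_date TEXT, activity_type TEXT)"
        )
        conn.executemany("INSERT INTO Activity VALUES (?, ?, ?, ?)", rows)
        return conn.execute(QUERY, (end_date, end_date)).fetchall()
    finally:
        conn.close()


result = daily_active_users(ROWS)
print(result)  # [('2019-07-20', 2), ('2019-07-21', 2)]
```

The row dated 2019-06-25 falls before the window start and is dropped by the filter, which matches the expected output in the example.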
The overall cost is dominated by scanning the activity records: roughly O(n) time for n rows, with O(d) additional space for the d distinct dates in the result.
| Approach | Time Complexity | Space Complexity |
|---|---|---|
| Date filtering + GROUP BY with COUNT(DISTINCT) | O(n) | O(d) |
This approach uses SQL aggregation to keep only the activities inside the 30-day window, group them by date, and count the distinct user IDs for each date. The DATE_SUB function computes the first day of the window, and GROUP BY together with COUNT(DISTINCT user_id) produces the active user count per day.
Time Complexity is approximately O(n), where n is the number of records in the table, because the database has to scan the entries to filter and group them. Space Complexity is O(d), where d is the number of unique days with activities in the range, as that is the size of the result set.
SELECT activity_date AS day, COUNT(DISTINCT user_id) AS active_users
FROM Activity
WHERE activity_date BETWEEN DATE_SUB('2019-07-27', INTERVAL 29 DAY) AND '2019-07-27'
GROUP BY activity_date;

This SQL statement selects dates and counts distinct users who were active on those dates. It filters the dates to the last 30 days ending on 2019-07-27 using the BETWEEN clause and DATE_SUB function. The GROUP BY activity_date groups the results by each day, and COUNT(DISTINCT user_id) counts the number of unique users active on each of those days.
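The interval is 29 days rather than 30 because BETWEEN includes both endpoints. A quick check of the date arithmetic, written with Python's datetime module purely as an illustration (the variable names are not from the original):

```python
from datetime import date, timedelta

end = date(2019, 7, 27)

# Subtracting 29 days gives the first day of an inclusive 30-day window.
start = end - timedelta(days=29)
window = [start + timedelta(days=i) for i in range((end - start).days + 1)]
print(start, len(window))  # 2019-06-28 30

# Subtracting 30 days instead would make the inclusive window one day too long.
too_early = end - timedelta(days=30)
print(too_early, (end - too_early).days + 1)  # 2019-06-27 31
```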
This approach involves using a script to fetch and process data entries to count unique users per day. The script aggregates the data manually by iterating over each record, filtering by date, and maintaining a set of unique users for each activity date.
Time Complexity is O(n), where n is the number of activity records, as each record is processed once. Space Complexity is O(d + u), where d is the number of days in the result and u is the total number of user IDs stored across all per-day sets (a user active on several days is stored once for each of those days).
from collections import defaultdict

def
Similar SQL aggregation and analytics problems are common in FAANG-style interviews. They often test your ability to filter datasets, group results, and compute metrics like daily active users.
The optimal approach filters activity records to the last 30 days and groups them by activity_date. For each date, count the number of distinct user_id values to determine daily active users. This method efficiently uses SQL aggregation functions.
A user may perform multiple activities on the same day. Using COUNT(DISTINCT user_id) ensures each user is counted only once per day, accurately representing the number of daily active users.
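The difference is easy to see on the four example rows dated 2019-07-20, where user 1 has three activities and user 2 has one. The sketch below uses Python's built-in sqlite3 module and a reduced two-column table, both chosen only for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Reduced table for the demonstration: only the two columns the count needs.
conn.execute("CREATE TABLE Activity (user_id INTEGER, activity_date TEXT)")
conn.executemany(
    "INSERT INTO Activity VALUES (?, ?)",
    [(1, "2019-07-20"), (1, "2019-07-20"), (1, "2019-07-20"), (2, "2019-07-20")],
)

# COUNT(user_id) counts rows; COUNT(DISTINCT user_id) counts users.
row_count, user_count = conn.execute(
    "SELECT COUNT(user_id), COUNT(DISTINCT user_id) FROM Activity "
    "WHERE activity_date = '2019-07-20'"
).fetchone()
conn.close()

print(row_count, user_count)  # 4 2
```

Without DISTINCT the day would be reported with 4 active users instead of 2.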
SQL aggregation with GROUP BY and COUNT(DISTINCT) is the most important concept for this problem. These operations allow you to group records by date and count unique users efficiently within the required time range.
This Python code uses a defaultdict to map each activity_date to a set of unique user_ids, accumulating only those records where the date falls between 2019-06-28 and 2019-07-27. The use of sets ensures all user_ids are distinct for each day. Finally, it returns a list of tuples showing each day and the count of its active users.
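Since the original listing is cut off, the following is only a sketch of what such a function might look like, reconstructed from the description above. The function name daily_active_users, its parameters, and the tuple layout of each record are assumptions made for illustration.

```python
from collections import defaultdict
from datetime import date, timedelta


def daily_active_users(records, end=date(2019, 7, 27), days=30):
    """Count distinct users per day in the inclusive window ending at `end`.

    `records` is assumed to be an iterable of
    (user_id, session_id, activity_date, activity_type) tuples,
    with activity_date given as a datetime.date.
    """
    start = end - timedelta(days=days - 1)  # 2019-06-28 for the defaults
    users_by_day = defaultdict(set)
    for user_id, _session_id, activity_date, _activity_type in records:
        if start <= activity_date <= end:
            # The set ignores repeated activities by the same user on one day.
            users_by_day[activity_date].add(user_id)
    return [(day, len(users)) for day, users in sorted(users_by_day.items())]


sample = [
    (1, 1, date(2019, 7, 20), "open_session"),
    (1, 1, date(2019, 7, 20), "scroll_down"),
    (2, 4, date(2019, 7, 20), "open_session"),
    (2, 4, date(2019, 7, 21), "send_message"),
    (3, 2, date(2019, 7, 21), "open_session"),
    (4, 3, date(2019, 6, 25), "open_session"),
]
print(daily_active_users(sample))
```

Days with no activity never get a key in the dictionary, so they are absent from the result, in line with the note in the example that zero-user days are not reported.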