Watch the video solution for User Activities within Time Bounds, a hard level problem involving Database. This walkthrough by Everyday Data Science has 513 views views. Want to try solving it yourself? Practice on FleetCode or read the detailed text solution.
Table: Sessions
+---------------+----------+ | Column Name | Type | +---------------+----------+ | user_id | int | | session_start | datetime | | session_end | datetime | | session_id | int | | session_type | enum | +---------------+----------+ session_id is column of unique values for this table. session_type is an ENUM (category) type of (Viewer, Streamer). This table contains user id, session start, session end, session id and session type.
Write a solution to find the the users who have had at least two session of the same type (either 'Viewer' or 'Streamer') with a maximum gap of 12 hours between sessions.
Return the result table ordered by user_id in ascending order.
The result format is in the following example.
Example:
Input: Sessions table: +---------+---------------------+---------------------+------------+--------------+ | user_id | session_start | session_end | session_id | session_type | +---------+---------------------+---------------------+------------+--------------+ | 101 | 2023-11-01 08:00:00 | 2023-11-01 09:00:00 | 1 | Viewer | | 101 | 2023-11-01 10:00:00 | 2023-11-01 11:00:00 | 2 | Streamer | | 102 | 2023-11-01 13:00:00 | 2023-11-01 14:00:00 | 3 | Viewer | | 102 | 2023-11-01 15:00:00 | 2023-11-01 16:00:00 | 4 | Viewer | | 101 | 2023-11-02 09:00:00 | 2023-11-02 10:00:00 | 5 | Viewer | | 102 | 2023-11-02 12:00:00 | 2023-11-02 13:00:00 | 6 | Streamer | | 101 | 2023-11-02 13:00:00 | 2023-11-02 14:00:00 | 7 | Streamer | | 102 | 2023-11-02 16:00:00 | 2023-11-02 17:00:00 | 8 | Viewer | | 103 | 2023-11-01 08:00:00 | 2023-11-01 09:00:00 | 9 | Viewer | | 103 | 2023-11-02 20:00:00 | 2023-11-02 23:00:00 | 10 | Viewer | | 103 | 2023-11-03 09:00:00 | 2023-11-03 10:00:00 | 11 | Viewer | +---------+---------------------+---------------------+------------+--------------+ Output: +---------+ | user_id | +---------+ | 102 | | 103 | +---------+ Explanation: - User ID 101 will not be included in the final output as they do not have any two sessions of the same session type. - User ID 102 will be included in the final output as they had two viewer sessions with session IDs 3 and 4, respectively, and the time gap between them was less than 12 hours. - User ID 103 participated in two viewer sessions with a gap of less than 12 hours between them, identified by session IDs 10 and 11. Therefore, user 103 will be included in the final output. Output table is ordered by user_id in increasing order.
Problem Overview: You are given a table of user activity records with timestamps. The goal is to detect activities that occur within specific time bounds for the same user. Instead of checking every pair of records, the challenge is to efficiently compare nearby activities in chronological order.
Approach 1: Self Join with Time Difference (O(n^2) time, O(1) extra space)
A straightforward method joins the activity table with itself on user_id. For each pair of rows belonging to the same user, compute the time difference using a SQL time function such as TIMESTAMPDIFF(). Then filter rows where the difference falls within the required bounds. This works but scales poorly because each row may be compared with many others. On large datasets, the quadratic comparison cost becomes expensive.
Approach 2: Window Function + Time Function (O(n log n) time, O(n) space)
A more scalable strategy orders each user's activities by timestamp and compares adjacent records using a window function. Use LAG() or LEAD() with PARTITION BY user_id ORDER BY activity_time to access the previous or next event for the same user. Then compute the difference between timestamps with a time function such as TIMESTAMPDIFF() or equivalent logic in Python. Because each row is only compared with its immediate neighbor, the number of comparisons drops dramatically. The dominant cost is sorting events by user and time, which takes O(n log n).
This pattern is common in database analytics. Window functions let you analyze sequences of events without expensive joins. The database scans the ordered partition once, making the query efficient even with millions of records.
Conceptually, the algorithm works like this: partition activities by user, sort by timestamp, fetch the previous activity with LAG(), compute the time difference, and filter rows where the difference lies inside the allowed bounds. Many production analytics pipelines use this same design for session detection, anomaly detection, and activity clustering.
Recommended for interviews: The window function approach is what interviewers expect. The self-join approach shows you understand relational comparisons, but it does not scale well. Using LAG() demonstrates familiarity with analytical SQL and efficient event-sequence processing. Problems like this frequently appear in database interview rounds that test knowledge of window functions, SQL joins, and time functions.
| Approach | Time | Space | When to Use |
|---|---|---|---|
| Self Join with Time Difference | O(n^2) | O(1) | Small datasets or quick prototype queries |
| Window Function + Time Function | O(n log n) | O(n) | Large datasets where events must be compared sequentially per user |