Table: Activity
+--------------+---------+ | Column Name | Type | +--------------+---------+ | player_id | int | | device_id | int | | event_date | date | | games_played | int | +--------------+---------+ (player_id, event_date) is the primary key (combination of columns with unique values) of this table. This table shows the activity of players of some games. Each row is a record of a player who logged in and played a number of games (possibly 0) before logging out on someday using some device.
Write a solution to report the fraction of players that logged in again on the day after the day they first logged in, rounded to 2 decimal places. In other words, you need to count the number of players that logged in for at least two consecutive days starting from their first login date, then divide that number by the total number of players.
The result format is in the following example.
Example 1:
Input: Activity table: +-----------+-----------+------------+--------------+ | player_id | device_id | event_date | games_played | +-----------+-----------+------------+--------------+ | 1 | 2 | 2016-03-01 | 5 | | 1 | 2 | 2016-03-02 | 6 | | 2 | 3 | 2017-06-25 | 1 | | 3 | 1 | 2016-03-02 | 0 | | 3 | 4 | 2018-07-03 | 5 | +-----------+-----------+------------+--------------+ Output: +-----------+ | fraction | +-----------+ | 0.33 | +-----------+ Explanation: Only the player with id 1 logged back in after the first day he had logged in so the answer is 1/3 = 0.33
Problem Overview: The table Activity records player logins by date. The task is to compute the fraction of players who logged in again exactly one day after their first login. The result should be the number of such players divided by the total number of distinct players.
Approach 1: SQL Query for Aggregation (O(n) time, O(1) extra space)
The cleanest solution relies on SQL aggregation. First determine each player's first login date using MIN(event_date) grouped by player_id. Then check whether a record exists where the same player logged in on first_login_date + 1. This can be done using a self join or subquery. Finally compute the ratio of players who satisfy this condition divided by the total number of players. This approach works well because relational databases handle grouping and filtering efficiently. It primarily uses operations from database querying and SQL aggregation.
Approach 2: Data Processing with Python (O(n log n) time, O(n) space)
If you load the table into memory, you can process it using Python data structures. Group all login dates by player_id using a dictionary. For each player, sort their login dates and identify the earliest one. Then check whether the date first_login + 1 day appears in the player's login set. Maintain a counter for players who satisfy the condition and divide by the total number of players. Sorting introduces O(n log n) complexity, but using sets for membership checks keeps each lookup constant time.
Approach 3: Using SQL to Find Consecutive Logins (O(n) time, O(n) space)
This variation focuses on detecting consecutive dates directly. Instead of explicitly computing the next-day condition in application code, the SQL query joins the table with itself where a.player_id = b.player_id and b.event_date = DATE_ADD(a.event_date, INTERVAL 1 DAY). Restrict a.event_date to the player's first login using a subquery. The result identifies players who logged in on consecutive days starting from their first session. The technique is a common pattern when working with date handling in SQL.
Approach 4: Date Handling with Multiple Passes (O(n) time, O(n) space)
Another implementation scans the dataset in two passes. First pass computes the earliest login date for every player using a hash map. Second pass checks whether any record exists where the player's login date equals first_login + 1. Maintain a boolean flag per player to avoid double counting. This approach avoids sorting and works in linear time, making it suitable when implementing the logic in languages like C++ or Java.
Recommended for interviews: The SQL aggregation solution is what most interviewers expect for database problems. It shows you understand grouping, joins, and date arithmetic directly inside SQL. The multi-pass hash map approach demonstrates the same reasoning when implementing the logic in a general-purpose language.
Approach: We will utilize SQL to solve this problem by taking advantage of its aggregation and date manipulation capabilities. First, we will identify each player's first login date. Then, we'll check if there's a login entry for the subsequent day after that first login date. The final step is to calculate the fraction of players who have logged in on consecutive days, starting from their first login date.
The SQL query performs the following steps:
SQL
The complexity of this SQL query is primarily determined by the table scan needed for aggregation:
Approach: This approach utilizes Python's data manipulation capabilities to process the table and calculate the required fraction. We will parse the data, identify each player's first login date, and check for subsequent day logins programmatically. Finally, we'll determine the desired fraction by counting players who have re-logged on the next day after their initial login.
This Python code executes the following steps:
Python
The complexity for this approach is:
This approach involves processing the input data in a structured manner using SQL queries to identify players who logged in on consecutive days starting from their first login date. We will use SQL window functions to handle date differences effectively and then calculate the desired fraction.
This solution leverages the Pandas library to mimic SQL-like operations. First, we calculate the first login date for each player using the 'groupby' and 'min' functions. We then merge this information back into the original data to allow comparison with the subsequent login dates. By checking if the event date matches the first login date plus one day, we determine the consecutive logins. Summing and dividing provides the result fraction.
Python
JavaScript
Time Complexity: O(n), where n is the number of records, as each operation scales linearly with the dataset size.
Space Complexity: O(n), as additional columns are created for processing.
This approach involves a multi-pass strategy to handle dates and detect consecutive logins by manually checking day-by-day login activity.
This C++ solution uses common library functions to handle date operations while grouping data by players. It sorts the dates for each player and checks for consecutive days. The solution counts how many players meet the consecutive login criterion, resulting in the calculated fraction.
Time Complexity: O(n log n) due to sorting necessary for detecting consecutive logins.
Space Complexity: O(n), where n represents distinct players and their login dates.
We can first find the first login date of each player, and then perform a left join with the original table, with the join condition being that the player ID is the same and the date difference is -1, which means the player logged in on the second day. Then, we only need to calculate the ratio of non-null players among the players who logged in on the second day.
We can use the LEAD window function to get the next login date of each player. If the next login date is one day after the current login date, it means that the player logged in on the second day, and we use a field st to record this information. Then, we use the RANK window function to rank the player IDs in ascending order by date, and get the login ranking of each player. Finally, we only need to calculate the ratio of non-null st values among the players with a ranking of 1.
MySQL
| Approach | Complexity |
|---|---|
| Approach 1: SQL Query for Aggregation | The complexity of this SQL query is primarily determined by the table scan needed for aggregation:
|
| Approach 2: Data Processing with Python | The complexity for this approach is:
|
| Approach 1: Using SQL to find consecutive logins | Time Complexity: O(n), where n is the number of records, as each operation scales linearly with the dataset size. |
| Approach 2: Utilizing Date Handling with Multiple Passes | Time Complexity: O(n log n) due to sorting necessary for detecting consecutive logins. |
| Grouping and Minimum Value + Left Join | — |
| Window Function | — |
| Approach | Time | Space | When to Use |
|---|---|---|---|
| SQL Aggregation with First Login | O(n) | O(1) | Best for SQL interviews and database queries |
| Python Data Processing | O(n log n) | O(n) | Useful when exporting data and analyzing outside the database |
| SQL Consecutive Login Join | O(n) | O(n) | When detecting consecutive dates directly with joins |
| Two-Pass Hash Map with Date Checks | O(n) | O(n) | Best when implementing logic in C++ or Java without SQL |
Game Play Analysis IV | Leetcode 550 | Crack SQL Interviews in 50 Qs #mysql #leetcode • Learn With Chirag • 17,575 views views
Watch 9 more video solutions →Practice Game Play Analysis IV with our built-in code editor and test cases.
Practice on FleetCode