Game Play Analysis IV - Solution & Explanation

Medium | Database | 12 min read | Asked at: Amazon, Microsoft, Meta +3

Problem Statement

Table: Activity

+--------------+---------+
| Column Name  | Type    |
+--------------+---------+
| player_id    | int     |
| device_id    | int     |
| event_date   | date    |
| games_played | int     |
+--------------+---------+
(player_id, event_date) is the primary key (combination of columns with unique values) of this table.
This table shows the activity of players of some games.
Each row is a record of a player who logged in and played a number of games (possibly 0) before logging out on some day using some device.

Write a solution to report the fraction of players that logged in again on the day after the day they first logged in, rounded to 2 decimal places. In other words, you need to count the number of players that logged in for at least two consecutive days starting from their first login date, then divide that number by the total number of players.

The result format is in the following example.

Example 1:

Input: 
Activity table:
+-----------+-----------+------------+--------------+
| player_id | device_id | event_date | games_played |
+-----------+-----------+------------+--------------+
| 1         | 2         | 2016-03-01 | 5            |
| 1         | 2         | 2016-03-02 | 6            |
| 2         | 3         | 2017-06-25 | 1            |
| 3         | 1         | 2016-03-02 | 0            |
| 3         | 4         | 2018-07-03 | 5            |
+-----------+-----------+------------+--------------+
Output: 
+-----------+
| fraction  |
+-----------+
| 0.33      |
+-----------+
Explanation: 
Only the player with id 1 logged back in on the day after their first login, so the answer is 1/3 = 0.33.

Approach Overview

Problem Overview: The table Activity records player logins by date. The task is to compute the fraction of players who logged in again exactly one day after their first login. The result should be the number of such players divided by the total number of distinct players.

Approach 1: SQL Query for Aggregation (O(n) time, O(1) extra space)

The cleanest solution relies on SQL aggregation. First determine each player's first login date using MIN(event_date) grouped by player_id. Then check whether a record exists where the same player logged in one day after that first login date. This can be done using a self join or subquery. Finally, divide the number of players who satisfy this condition by the total number of players. This approach works well because relational databases handle grouping and filtering efficiently.

Approach 2: Data Processing with Python (O(n log n) time, O(n) space)

If you load the table into memory, you can process it using Python data structures. Group all login dates by player_id using a dictionary. For each player, sort their login dates and identify the earliest one. Then check whether the date first_login + 1 day appears in the player's login set. Maintain a counter for players who satisfy the condition and divide by the total number of players. Sorting introduces O(n log n) complexity, but using sets for membership checks keeps each lookup constant time.
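
The dictionary-and-set idea above can be sketched as follows. This is a minimal illustration, not a reference implementation: the function name next_day_fraction and the (player_id, event_date) row shape are assumptions made for the example, and min() stands in for the full sort because only the earliest date is needed.

```python
from collections import defaultdict
from datetime import date, timedelta


def next_day_fraction(rows):
    """rows: iterable of (player_id, event_date) pairs (hypothetical shape)."""
    logins = defaultdict(set)  # player_id -> set of login dates
    for player_id, event_date in rows:
        logins[player_id].add(event_date)
    if not logins:
        return 0.0

    returned = 0
    for dates in logins.values():
        first_login = min(dates)  # earliest date; no full sort required
        # Set membership makes the next-day check a constant-time lookup.
        if first_login + timedelta(days=1) in dates:
            returned += 1
    return round(returned / len(logins), 2)


# Rows from Example 1, reduced to the two columns the logic needs.
SAMPLE = [
    (1, date(2016, 3, 1)), (1, date(2016, 3, 2)),
    (2, date(2017, 6, 25)),
    (3, date(2016, 3, 2)), (3, date(2018, 7, 3)),
]
print(next_day_fraction(SAMPLE))  # 0.33
```

Only player 1 has a login on the day after their first login, so the sketch reproduces the 1/3 from the example.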

Approach 3: Using SQL to Find Consecutive Logins (O(n) time, O(n) space)

This variation focuses on detecting consecutive dates directly. Instead of explicitly computing the next-day condition in application code, the SQL query joins the table with itself where a.player_id = b.player_id and b.event_date = DATE_ADD(a.event_date, INTERVAL 1 DAY). Restrict a.event_date to the player's first login using a subquery. The result identifies players who logged in on consecutive days starting from their first session. The technique is a common pattern when working with date handling in SQL.
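
A hedged sketch of this self join, run through Python's built-in sqlite3 module so it can be tried without a MySQL server. SQLite has no DATE_ADD, so date(a.event_date, '+1 day') is swapped in as an assumption; the function name and the in-memory table setup are also illustrative only.

```python
import sqlite3

# Rows from Example 1: (player_id, device_id, event_date, games_played).
SAMPLE = [
    (1, 2, "2016-03-01", 5),
    (1, 2, "2016-03-02", 6),
    (2, 3, "2017-06-25", 1),
    (3, 1, "2016-03-02", 0),
    (3, 4, "2018-07-03", 5),
]

# SQLite dialect: date(x, '+1 day') replaces MySQL's DATE_ADD(x, INTERVAL 1 DAY).
SELF_JOIN_QUERY = """
SELECT ROUND(
         1.0 * COUNT(DISTINCT b.player_id)
             / (SELECT COUNT(DISTINCT player_id) FROM Activity),
         2) AS fraction
FROM Activity AS a
JOIN Activity AS b
  ON b.player_id = a.player_id
 AND b.event_date = date(a.event_date, '+1 day')
WHERE a.event_date = (SELECT MIN(m.event_date)
                        FROM Activity AS m
                       WHERE m.player_id = a.player_id)
"""


def fraction_via_self_join(rows):
    """Load rows into an in-memory Activity table and run the self join."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Activity (player_id INTEGER, device_id INTEGER, "
        "event_date TEXT, games_played INTEGER, "
        "PRIMARY KEY (player_id, event_date))"
    )
    conn.executemany("INSERT INTO Activity VALUES (?, ?, ?, ?)", rows)
    (fraction,) = conn.execute(SELF_JOIN_QUERY).fetchone()
    conn.close()
    return fraction


print(fraction_via_self_join(SAMPLE))
```

The WHERE clause is what restricts the left side of the join to each player's first login; without it, any pair of consecutive days would count.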

Approach 4: Date Handling with Multiple Passes (O(n) time, O(n) space)

Another implementation scans the dataset in two passes. The first pass computes the earliest login date for every player using a hash map. The second pass checks whether any record exists where the player's login date equals first_login + 1 day. Maintain a boolean flag per player to avoid double counting. This approach avoids sorting and works in linear time, making it suitable when implementing the logic in languages like C++ or Java.
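
The two passes can be sketched in Python as shown here. This is an illustration under assumptions: the function name and row shape are hypothetical, a set of player ids plays the role of the per-player boolean flag, and rows must be a list (or another re-iterable sequence) because it is scanned twice.

```python
from datetime import date, timedelta


def next_day_fraction_two_pass(rows):
    """rows: list of (player_id, event_date) pairs (hypothetical shape)."""
    # Pass 1: earliest login date per player.
    first_login = {}
    for player_id, event_date in rows:
        if player_id not in first_login or event_date < first_login[player_id]:
            first_login[player_id] = event_date
    if not first_login:
        return 0.0

    # Pass 2: flag players with a login exactly one day after their first.
    # A set acts as the boolean flag and prevents double counting.
    returned = set()
    one_day = timedelta(days=1)
    for player_id, event_date in rows:
        if event_date == first_login[player_id] + one_day:
            returned.add(player_id)

    return round(len(returned) / len(first_login), 2)


SAMPLE = [
    (1, date(2016, 3, 1)), (1, date(2016, 3, 2)),
    (2, date(2017, 6, 25)),
    (3, date(2016, 3, 2)), (3, date(2018, 7, 3)),
]
print(next_day_fraction_two_pass(SAMPLE))  # 0.33
```

Neither pass sorts anything, which is where the linear running time comes from.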

Recommended for interviews: The SQL aggregation solution is what most interviewers expect for database problems. It shows you understand grouping, joins, and date arithmetic directly inside SQL. The multi-pass hash map approach demonstrates the same reasoning when implementing the logic in a general-purpose language.

Approach 1: SQL Query for Aggregation

Approach: We will utilize SQL to solve this problem by taking advantage of its aggregation and date manipulation capabilities. First, we will identify each player's first login date. Then, we'll check if there's a login entry for the subsequent day after that first login date. The final step is to calculate the fraction of players who have logged in on consecutive days, starting from their first login date.

The SQL query performs the following steps:

1. For each player, calculate the first login date using the MIN function.
2. Use a subquery to look up, for each player, the login dated exactly one day after that first login date; the result is NULL when no such login exists.
3. Using conditional aggregation, count the players with a non-null next-day login date and divide by the total count of unique players, rounding the result to two decimal places.
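
The steps above can be sketched as one query. The original code for this approach is not shown here, so this is only one possible reading of the steps, run through Python's sqlite3 module for convenience. Assumptions: SQLite's date(x, '+1 day') replaces MySQL date arithmetic, COUNT(next_day) is used to count non-null values, and the names fraction_via_aggregation, first_login, and next_day are made up for the example.

```python
import sqlite3

# Rows from Example 1: (player_id, device_id, event_date, games_played).
SAMPLE = [
    (1, 2, "2016-03-01", 5),
    (1, 2, "2016-03-02", 6),
    (2, 3, "2017-06-25", 1),
    (3, 1, "2016-03-02", 0),
    (3, 4, "2018-07-03", 5),
]

AGGREGATION_QUERY = """
SELECT ROUND(1.0 * COUNT(next_day) / COUNT(*), 2) AS fraction
FROM (
    -- Step 2: per player, the login dated first_login + 1 day, or NULL.
    SELECT f.player_id,
           (SELECT a.event_date
              FROM Activity AS a
             WHERE a.player_id = f.player_id
               AND a.event_date = date(f.first_login, '+1 day')) AS next_day
    FROM (
        -- Step 1: first login date per player.
        SELECT player_id, MIN(event_date) AS first_login
          FROM Activity
         GROUP BY player_id
    ) AS f
) AS per_player
"""


def fraction_via_aggregation(rows):
    """Load rows into an in-memory Activity table and run the query."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Activity (player_id INTEGER, device_id INTEGER, "
        "event_date TEXT, games_played INTEGER, "
        "PRIMARY KEY (player_id, event_date))"
    )
    conn.executemany("INSERT INTO Activity VALUES (?, ?, ?, ?)", rows)
    (fraction,) = conn.execute(AGGREGATION_QUERY).fetchone()
    conn.close()
    return fraction


print(fraction_via_aggregation(SAMPLE))
```

COUNT(next_day) skips NULLs while COUNT(*) counts one row per player, so their ratio is the fraction from step 3.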

Complexity

The cost of this query is dominated by the scan and grouping of the Activity table:

Time Complexity: O(n), where n is the number of entries in the Activity table, assuming the engine groups and probes with hashing or a suitable index; a sort-based plan costs O(n log n).
Space Complexity: O(1) on the client side, since only a single value is returned; inside the engine, grouping by player_id typically keeps one entry per distinct player.

Approach 2: Data Processing with Python

Approach: This approach uses Python's data manipulation capabilities to process the table and calculate the required fraction. We parse the data, identify each player's first login date, and check programmatically for a login on the following day. Finally, we determine the fraction by counting the players who logged in again on the day after their initial login.

This Python code executes the following steps:

1. Load the data into a pandas DataFrame and convert the event_date column to a datetime object.
2. Calculate each player's first login date, then compute the day after their first login date.
3. Merge the original DataFrame with the next day's calculated DataFrame to find records of repeat logins on consecutive days.
4. Calculate the fraction of players with consecutive day logins relative to the total number of players and print the result.

Complexity

The complexity for this approach is:

Time Complexity: O(n * log(n)) due to the sorting operation within groupby and merge operations.
Space Complexity: O(n) since additional DataFrames are created during processing.

Approach 3: Using SQL to Find Consecutive Logins

This approach detects the consecutive login directly: attach each player's first login date to every one of that player's rows, then keep the rows whose event_date is exactly one day after it. In SQL this is the self join from the overview; the fraction is the number of distinct players with such a row divided by the total number of players.

The Python version uses the pandas library to mimic these SQL-like operations. First, it calculates the first login date for each player using groupby and min. It then merges this information back into the original data so that each row can be compared with the player's first login. A row whose event date equals the first login date plus one day marks a consecutive login. Counting the players with such a row and dividing by the total number of players gives the fraction.

Complexity

Time Complexity: O(n), where n is the number of records, as each operation scales linearly with the dataset size.
Space Complexity: O(n), as additional columns are created for processing.

Approach 4: Date Handling with Multiple Passes

This approach involves a multi-pass strategy to handle dates and detect consecutive logins by manually checking day-by-day login activity.

This C++ solution uses common library functions to handle date operations while grouping data by players. It sorts the dates for each player and checks for consecutive days. The solution counts how many players meet the consecutive login criterion, resulting in the calculated fraction.
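
The C++ code itself is not reproduced here; the same sort-and-check idea can be sketched in Python as follows. The function name and row shape are assumptions for illustration. Because (player_id, event_date) is the primary key, a player's dates are unique, so after sorting a next-day login, if it exists, must sit at index 1.

```python
from collections import defaultdict
from datetime import date, timedelta


def next_day_fraction_sorted(rows):
    """rows: iterable of (player_id, event_date) pairs (hypothetical shape)."""
    by_player = defaultdict(list)  # player_id -> list of login dates
    for player_id, event_date in rows:
        by_player[player_id].append(event_date)
    if not by_player:
        return 0.0

    returned = 0
    for dates in by_player.values():
        dates.sort()  # this per-player sort is the O(n log n) step
        # dates[0] is the first login; with unique dates, the day after it
        # can only appear as the second element.
        if len(dates) > 1 and dates[1] - dates[0] == timedelta(days=1):
            returned += 1
    return round(returned / len(by_player), 2)


SAMPLE = [
    (1, date(2016, 3, 1)), (1, date(2016, 3, 2)),
    (2, date(2017, 6, 25)),
    (3, date(2016, 3, 2)), (3, date(2018, 7, 3)),
]
print(next_day_fraction_sorted(SAMPLE))  # 0.33
```

Only the first two sorted dates matter, so later consecutive streaks (for example days 5 and 6 after a first login on day 1) do not count.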

Complexity

Time Complexity: O(n log n) for the sorting variant described here; the two-pass hash map variant from the overview avoids the sort and runs in O(n).
Space Complexity: O(n) for the per-player login dates, where n is the number of records.

Approach 5: Grouping and Minimum Value + Left Join

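We first find each player's first login date, then left join that result with the original table. The join condition is that the player ID matches and the date difference between the first login and the joined row is -1 (for example, DATEDIFF(first_date, event_date) = -1 in MySQL), which means the joined row is a login on the day after the first login. Every player keeps exactly one row after the join, with NULL in the joined columns when there is no next-day login, so the answer is the number of rows with a non-null match divided by the number of players.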
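
A sketch of this left join, again through Python's sqlite3 module rather than MySQL. Assumptions: the difference of two julianday() values stands in for MySQL's DATEDIFF, AVG over a 0/1 expression is used to compute the ratio, and the names fraction_via_left_join and first_date are invented for the example.

```python
import sqlite3

# Rows from Example 1: (player_id, device_id, event_date, games_played).
SAMPLE = [
    (1, 2, "2016-03-01", 5),
    (1, 2, "2016-03-02", 6),
    (2, 3, "2017-06-25", 1),
    (3, 1, "2016-03-02", 0),
    (3, 4, "2018-07-03", 5),
]

# SQLite dialect: julianday(x) - julianday(y) plays the role of DATEDIFF(x, y).
LEFT_JOIN_QUERY = """
SELECT ROUND(AVG(b.event_date IS NOT NULL), 2) AS fraction
FROM (SELECT player_id, MIN(event_date) AS first_date
        FROM Activity
       GROUP BY player_id) AS f
LEFT JOIN Activity AS b
  ON b.player_id = f.player_id
 AND julianday(f.first_date) - julianday(b.event_date) = -1
"""


def fraction_via_left_join(rows):
    """Load rows into an in-memory Activity table and run the left join."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Activity (player_id INTEGER, device_id INTEGER, "
        "event_date TEXT, games_played INTEGER, "
        "PRIMARY KEY (player_id, event_date))"
    )
    conn.executemany("INSERT INTO Activity VALUES (?, ?, ?, ?)", rows)
    (fraction,) = conn.execute(LEFT_JOIN_QUERY).fetchone()
    conn.close()
    return fraction


print(fraction_via_left_join(SAMPLE))
```

Because the left join keeps one row per player, averaging the 0/1 flag "a match exists" is the same as dividing the matched players by all players.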

Approach 6: Window Function

We can use the LEAD window function to get, for each row, the same player's next login date. If the next login date is one day after the current login date, the player logged in on the following day, and we record this in a field st. We also use the RANK window function to number each player's rows in ascending order of date, so the row with rank 1 is the player's first login. Finally, we keep only the rank-1 rows and compute the fraction of them whose st indicates a next-day login.
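
A sketch of the window-function variant, run through Python's sqlite3 module. It assumes a SQLite build with window function support (3.25 or later) and uses julianday() differences in place of MySQL's DATEDIFF; the CTE name T and the function name are illustrative, while st and rk follow the field names in the description.

```python
import sqlite3

# Rows from Example 1: (player_id, device_id, event_date, games_played).
SAMPLE = [
    (1, 2, "2016-03-01", 5),
    (1, 2, "2016-03-02", 6),
    (2, 3, "2017-06-25", 1),
    (3, 1, "2016-03-02", 0),
    (3, 4, "2018-07-03", 5),
]

WINDOW_QUERY = """
WITH T AS (
    SELECT player_id,
           -- st is 1 when the next login is exactly one day later,
           -- 0 when it is later than that, NULL when there is no next login.
           (julianday(LEAD(event_date) OVER (PARTITION BY player_id
                                             ORDER BY event_date))
              - julianday(event_date)) = 1 AS st,
           RANK() OVER (PARTITION BY player_id ORDER BY event_date) AS rk
    FROM Activity
)
SELECT ROUND(1.0 * SUM(CASE WHEN st = 1 THEN 1 ELSE 0 END) / COUNT(*), 2)
       AS fraction
FROM T
WHERE rk = 1
"""


def fraction_via_window(rows):
    """Load rows into an in-memory Activity table and run the window query."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Activity (player_id INTEGER, device_id INTEGER, "
        "event_date TEXT, games_played INTEGER, "
        "PRIMARY KEY (player_id, event_date))"
    )
    conn.executemany("INSERT INTO Activity VALUES (?, ?, ?, ?)", rows)
    (fraction,) = conn.execute(WINDOW_QUERY).fetchone()
    conn.close()
    return fraction


print(fraction_via_window(SAMPLE))
```

Filtering on rk = 1 leaves exactly one row per player, so COUNT(*) in the final SELECT is the total number of players.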

Detailed Complexity Analysis

Approach	Time	Space	When to Use
SQL Aggregation with First Login	O(n)	O(1)	Best for SQL interviews and database queries
Python Data Processing	O(n log n)	O(n)	Useful when exporting data and analyzing outside the database
SQL Consecutive Login Join	O(n)	O(n)	When detecting consecutive dates directly with joins
Two-Pass Hash Map with Date Checks	O(n)	O(n)	Best when implementing logic in C++ or Java without SQL

Video Solution

Game Play Analysis IV | Leetcode 550 | Crack SQL Interviews in 50 Qs #mysql #leetcode • Learn With Chirag • 17,575 views

Frequently Asked Questions

Is Game Play Analysis IV easy or hard?

Game Play Analysis IV is rated Medium because it combines aggregation with date comparison logic. The query itself is short, but recognizing that you must compare the first login date with the next day requires careful reasoning about grouped data.

Game Play Analysis IV Python/Java solution

In Python or Java, group login dates by player_id using a dictionary or hash map. Track each player's earliest login date, then check whether first_login + 1 day exists in their login records. Count such players and divide by the total number of players to compute the fraction.

How to solve Game Play Analysis IV in O(n)?

Store the earliest login date for every player using aggregation or a hash map. Then check whether a login exists exactly one day after that date. Counting the players that satisfy this condition and dividing by the total number of players yields the required fraction in linear time.

What is the best approach for Game Play Analysis IV?

The most efficient approach uses SQL aggregation. Compute each player's first login with MIN(event_date), then check if a record exists for the same player on the next day. Finally divide the number of qualifying players by the total number of distinct players. This runs in O(n) time inside the database engine.

Is Game Play Analysis IV asked at Google/Amazon/Meta?

Database analytics problems similar to Game Play Analysis IV appear in interviews at companies like Amazon, Meta, and other data-driven organizations. They test understanding of SQL aggregation, joins, and date arithmetic rather than complex algorithms.

What data structure is used in Game Play Analysis IV?

SQL solutions rely on relational operations such as GROUP BY, joins, and date functions. In general programming languages, a hash map mapping player_id to login dates is typically used along with sets for fast date lookup.

What is the time complexity of Game Play Analysis IV?

The SQL solution scans the Activity table and groups it by player_id, which is O(n) when the engine uses hash-based grouping or an index on (player_id, event_date). Hash-map based implementations in languages like Java or C++ also run in O(n) time with O(n) additional space.

Related Problems

Game Play Analysis III

Game Play Analysis V

Problem Info

Difficulty: Medium
Acceptance: 38.4%
Approaches: 6
Reading time: 12 min
Asked at: Amazon, Microsoft, Meta, Google, Bloomberg
