Table: Activity
+--------------+---------+ | Column Name | Type | +--------------+---------+ | player_id | int | | device_id | int | | event_date | date | | games_played | int | +--------------+---------+ (player_id, event_date) is the primary key (combination of columns with unique values) of this table. This table shows the activity of players of some games. Each row is a record of a player who logged in and played a number of games (possibly 0) before logging out on someday using some device.
Write a solution to report the fraction of players that logged in again on the day after the day they first logged in, rounded to 2 decimal places. In other words, you need to count the number of players that logged in for at least two consecutive days starting from their first login date, then divide that number by the total number of players.
The result format is in the following example.
Example 1:
Input: Activity table: +-----------+-----------+------------+--------------+ | player_id | device_id | event_date | games_played | +-----------+-----------+------------+--------------+ | 1 | 2 | 2016-03-01 | 5 | | 1 | 2 | 2016-03-02 | 6 | | 2 | 3 | 2017-06-25 | 1 | | 3 | 1 | 2016-03-02 | 0 | | 3 | 4 | 2018-07-03 | 5 | +-----------+-----------+------------+--------------+ Output: +-----------+ | fraction | +-----------+ | 0.33 | +-----------+ Explanation: Only the player with id 1 logged back in after the first day he had logged in so the answer is 1/3 = 0.33
In #550 Game Play Analysis IV, the goal is to compute the fraction of players who logged back into the game exactly one day after their first login. The key idea is to first determine each player’s earliest login date and then verify whether they returned the following day.
A common strategy is to use a subquery or aggregation to find the MIN(event_date) for every player_id. This represents the player's first login. Next, check if a record exists where the same player logged in on first_login + 1 day. This can be done using a self join or conditional filtering on the Activity table.
Finally, count the number of players who satisfy this condition and divide it by the total number of distinct players. SQL date functions and grouping are essential for implementing this logic efficiently. The approach mainly involves table scans and grouping operations, making it scalable for typical database workloads.
| Approach | Time Complexity | Space Complexity |
|---|---|---|
| Aggregation with self join / subquery | O(n) | O(p) |
| Window function approach | O(n) | O(p) |
Frederik Müller
Approach: We will utilize SQL to solve this problem by taking advantage of its aggregation and date manipulation capabilities. First, we will identify each player's first login date. Then, we'll check if there's a login entry for the subsequent day after that first login date. The final step is to calculate the fraction of players who have logged in on consecutive days, starting from their first login date.
The complexity of this SQL query is primarily determined by the table scan needed for aggregation:
1SELECT ROUND(SUM(CASE WHEN NextDay IS NOT NULL THEN 1 ELSE 0 END) / COUNT(DISTINCT player_id), 2) AS fraction FROM (SELECT a.player_id, MIN(a.event_date) AS FirstLogin, (SELECT MIN(b.event_date) FROM Activity b WHERE b.player_id = a.player_id AND b.event_date > MIN(a.event_date)) AS NextDay FROM Activity a GROUP BY a.player_id) Sub;The SQL query performs the following steps:
Approach: This approach utilizes Python's data manipulation capabilities to process the table and calculate the required fraction. We will parse the data, identify each player's first login date, and check for subsequent day logins programmatically. Finally, we'll determine the desired fraction by counting players who have re-logged on the next day after their initial login.
The complexity for this approach is:
1import pandas as pd
2from datetime import timedelta
3
4data
This approach involves processing the input data in a structured manner using SQL queries to identify players who logged in on consecutive days starting from their first login date. We will use SQL window functions to handle date differences effectively and then calculate the desired fraction.
Time Complexity: O(n), where n is the number of records, as each operation scales linearly with the dataset size.
Space Complexity: O(n), as additional columns are created for processing.
1import pandas as pd
2
3data =
This approach involves a multi-pass strategy to handle dates and detect consecutive logins by manually checking day-by-day login activity.
Time Complexity: O(n log n) due to sorting necessary for detecting consecutive logins.
Space Complexity: O(n), where n represents distinct players and their login dates.
1#include <iostream>
2#include <vector>
3#include <unordered_map>
4#include <string>
5#include <sstream>
6#include <iomanip>
using namespace std;
struct Activity {
int player_id;
int device_id;
string event_date;
int games_played;
};
bool isNextDay(const string& cur, const string& next) {
tm cur_tm = {}, next_tm = {};
strptime(cur.c_str(), "%Y-%m-%d", &cur_tm);
strptime(next.c_str(), "%Y-%m-%d", &next_tm);
next_tm.tm_mday--;
mktime(&next_tm);
return cur_tm.tm_year == next_tm.tm_year && cur_tm.tm_mon == next_tm.tm_mon && cur_tm.tm_mday == next_tm.tm_mday;
}
string fractionConsecutiveLogins(vector<Activity> data) {
unordered_map<int, vector<string>> playerMap;
for (const auto& activity : data) {
playerMap[activity.player_id].push_back(activity.event_date);
}
int totalConsecutive = 0;
for (auto& pair : playerMap) {
auto& dates = pair.second;
sort(dates.begin(), dates.end());
for (size_t i = 1; i < dates.size(); ++i) {
if (isNextDay(dates[i - 1], dates[i])) {
totalConsecutive++;
break;
}
}
}
int totalPlayers = playerMap.size();
double fraction = static_cast<double>(totalConsecutive) / totalPlayers;
ostringstream out;
out << fixed << setprecision(2) << fraction;
return out.str();
}
int main() {
vector<Activity> data = {
{1, 2, "2016-03-01", 5},
{1, 2, "2016-03-02", 6},
{2, 3, "2017-06-25", 1},
{3, 1, "2016-03-02", 0},
{3, 4, "2018-07-03", 5}
};
cout << "Fraction: " << fractionConsecutiveLogins(data) << endl;
return 0;
}Watch expert explanations and walkthroughs
Practice problems asked by these companies to ace your technical interviews.
Explore More ProblemsJot down your thoughts, approach, and key learnings
Yes, variations of retention and cohort analysis queries like this are commonly asked in data and backend interviews at FAANG and other tech companies. They test SQL proficiency, analytical thinking, and understanding of time-based data.
This problem mainly uses SQL aggregation, date arithmetic, and joins or subqueries. Functions like MIN() help identify the first login, while date comparisons determine if the player returned the next day.
The optimal approach is to first compute each player's earliest login date using an aggregation like MIN(event_date). Then check if the same player logged in again on the next day using a join or conditional filter. Finally, divide the count of such players by the total number of players to get the fraction.
The key concept is relational table grouping by player_id along with date-based filtering. Efficient use of indexes on player_id and event_date can significantly improve query performance.
This Python code executes the following steps:
This solution leverages the Pandas library to mimic SQL-like operations. First, we calculate the first login date for each player using the 'groupby' and 'min' functions. We then merge this information back into the original data to allow comparison with the subsequent login dates. By checking if the event date matches the first login date plus one day, we determine the consecutive logins. Summing and dividing provides the result fraction.
This C++ solution uses common library functions to handle date operations while grouping data by players. It sorts the dates for each player and checks for consecutive days. The solution counts how many players meet the consecutive login criterion, resulting in the calculated fraction.