Game Play Analysis IV

Table: Activity +--------------+---------+ | Column Name | Type | +--------------+---------+ | player_id | int | | device_id | int | | event_date | date | | games_played | int | +--------------+---------+ (player_id, event_date) is the primary key (combination of columns with unique values) of this table. This table shows the activity of players of some games. Each row is a record of a player who logged in and played a number of games (possibly 0) before logging out on someday using some device.   Write a solution to report the fraction of players that logged in again on the day after the day they first logged in, rounded to 2 decimal places. In other words, you need to count the number of players that logged in for at least two consecutive days starting from their first login date, then divide that number by the total number of players. The result format is in the following example.   Example 1: Input: Activity table: +-----------+-----------+------------+--------------+ | player_id | device_id | event_date | games_played | +-----------+-----------+------------+--------------+ | 1 | 2 | 2016-03-01 | 5 | | 1 | 2 | 2016-03-02 | 6 | | 2 | 3 | 2017-06-25 | 1 | | 3 | 1 | 2016-03-02 | 0 | | 3 | 4 | 2018-07-03 | 5 | +-----------+-----------+------------+--------------+ Output: +-----------+ | fraction | +-----------+ | 0.33 | +-----------+ Explanation: Only the player with id 1 logged back in after the first day he had logged in so the answer is 1/3 = 0.33

Solution Explanation for LeetCode 550: Game Play Analysis IV This problem requires calculating the fraction of players who logged in on consecutive days, starting from their first login. The solution involves finding each player's first login date and then checking if they logged in the following day. Approach 1: Grouping and Minimum Value + Left Join This approach uses two steps: Find the first login date: We group the Activity table by player_id and use the MIN(event_date) aggregate function to determine each player's first login date. This is stored in a subquery (or a temporary DataFrame in Python). Left Join and Counting: We perform a left join between the result from step 1 (containing player IDs and their first login dates) and the original Activity table. The join condition is that player_id matches and the difference between the first login date and the event_date in the Activity table is -1 day (meaning the next day). A left join ensures that all players from the first step are included, even if they didn't log in the next day. By counting the number of non-NULL event_date values from the joined result, we get the count of players who logged in on the consecutive day. The result is then divided by the total number of unique players to get the fraction. Time Complexity Analysis: The GROUP BY operation in the subquery takes O(N log N) time in the worst case, where N is the number of rows in the Activity table. (This depends on the specific database implementation; some might optimize it better). The LEFT JOIN operation has a time complexity that depends on the database's join algorithm, but is typically O(M log M + N log N) or better (e.g., hash joins can achieve close to linear time in optimal conditions), where M is the number of rows in the subquery result (number of unique players). The final count and division operations take O(M) time. Therefore, the overall time complexity is dominated by the GROUP BY and JOIN operations, resulting in approximately O(N log N) in the worst case, where N is the number of rows in the Activity table. Space Complexity: The space complexity is determined by the space required to store intermediate results (the subquery result and the join result). In the worst case, this could be O(N) in the case where all players have unique login dates. Approach 2: Window Function (MySQL) This approach leverages MySQL's window functions for a more concise and potentially efficient solution: LEAD Function: The LEAD window function is used to get the next event_date for each player, ordered by event_date. The result is compared to the current event_date to determine if the next login was on the consecutive day (difference of 1). This result is stored in a common table expression (CTE) called T. RANK Function: The RANK window function is used to assign a rank to each player's login events, ordered by event_date. This allows us to easily identify the first login event for each player (rank 1). Counting and Aggregation: We count the number of players with rank 1 who also had a consecutive login (the st column is 1) and divide this by the total number of unique players. Time Complexity Analysis: The window functions (LEAD and RANK) have a time complexity of O(N log N) in the worst case, where N is the number of rows in the Activity table. This is because window functions typically require sorting or other operations with logarithmic time complexity. The subsequent counting and aggregation steps have linear time complexity O(N). Therefore, the overall time complexity is dominated by the window functions, resulting in O(N log N). Space Complexity: The space complexity is determined by the CTE (T) which can require O(N) space in the worst case to store intermediate results. Code Implementation (Python and MySQL) The Python code uses pandas for efficient data manipulation, while the MySQL code directly utilizes SQL queries. The comments in the code further clarify each step. Python (using pandas): import pandas as pd def gameplay_analysis(activity: pd.DataFrame) -> pd.DataFrame: # Find the first login date for each player activity["first"] = activity.groupby("player_id")["event_date"].transform("min") # Filter for players who logged in on the day after their first login activity_2nd_day = activity[activity["first"] + pd.DateOffset(1) == activity["event_date"]] # Calculate and format the fraction fraction = round(len(activity_2nd_day) / activity["player_id"].nunique(), 2) return pd.DataFrame({"fraction": [fraction]}) :root {--copy-icon: url("data:image/svg+xml,%3Csvg xmlns='http://www.w3.org/2000/svg' viewBox='0 0 48 48'%3E%3Cpath fill='%23adadad' d='M16.187 9.5H12.25a1.75 1.75 0 0 0-1.75 1.75v28.5c0 .967.784 1.75 1.75 1.75h23.5a1.75 1.75 0 0 0 1.75-1.75v-28.5a1.75 1.75 0 0 0-1.75-1.75h-3.937a4.25 4.25 0 0 1-4.063 3h-7.5a4.25 4.25 0 0 1-4.063-3M31.813 7h3.937A4.25 4.25 0 0 1 40 11.25v28.5A4.25 4.25 0 0 1 35.75 44h-23.5A4.25 4.25 0 0 1 8 39.75v-28.5A4.25 4.25 0 0 1 12.25 7h3.937a4.25 4.25 0 0 1 4.063-3h7.5a4.25 4.25 0 0 1 4.063 3M18.5 8.25c0 .966.784 1.75 1.75 1.75h7.5a1.75 1.75 0 1 0 0-3.5h-7.5a1.75 1.75 0 0 0-1.75 1.75'/%3E%3C/svg%3E");--success-icon: url("data:image/svg+xml,%3Csvg xmlns='http://www.w3.org/2000/svg' viewBox='0 0 24 24'%3E%3Cpath fill='%2366ff85' d='M9 16.17L5.53 12.7a.996.996 0 1 0-1.41 1.41l4.18 4.18c.39.39 1.02.39 1.41 0L20.29 7.71a.996.996 0 1 0-1.41-1.41z'/%3E%3C/svg%3E");}pre:has(code) {position: relative;}pre button.rehype-pretty-copy {right: 1px;padding: 0;width: 24px;height: 24px;display: flex;margin-top: 2px;margin-right: 8px;position: absolute;border-radius: 25%;backdrop-filter: blur(3px);& span {width: 100%;aspect-ratio: 1 / 1;}& .ready {background-image: var(--copy-icon);}& .success {display: none; background-image: var(--success-icon);}}&.rehype-pretty-copied {& .success {display: block;} & .ready {display: none;}}pre button.rehype-pretty-copy.rehype-pretty-copied {opacity: 1;& .ready { display: none; }& .success { display: block; }} MySQL (Approach 2 - using window functions): WITH T AS ( SELECT player_id, DATEDIFF(LEAD(event_date) OVER (PARTITION BY player_id ORDER BY event_date), event_date) = 1 AS st, -- Check for consecutive login RANK() OVER (PARTITION BY player_id ORDER BY event_date) AS rk -- Rank login events FROM Activity ) SELECT ROUND(COUNT(IF(st = 1, player_id, NULL)) / COUNT(DISTINCT player_id), 2) AS fraction FROM T WHERE rk = 1; -- Consider only the first login event for each player:root {--copy-icon: url("data:image/svg+xml,%3Csvg xmlns='http://www.w3.org/2000/svg' viewBox='0 0 48 48'%3E%3Cpath fill='%23adadad' d='M16.187 9.5H12.25a1.75 1.75 0 0 0-1.75 1.75v28.5c0 .967.784 1.75 1.75 1.75h23.5a1.75 1.75 0 0 0 1.75-1.75v-28.5a1.75 1.75 0 0 0-1.75-1.75h-3.937a4.25 4.25 0 0 1-4.063 3h-7.5a4.25 4.25 0 0 1-4.063-3M31.813 7h3.937A4.25 4.25 0 0 1 40 11.25v28.5A4.25 4.25 0 0 1 35.75 44h-23.5A4.25 4.25 0 0 1 8 39.75v-28.5A4.25 4.25 0 0 1 12.25 7h3.937a4.25 4.25 0 0 1 4.063-3h7.5a4.25 4.25 0 0 1 4.063 3M18.5 8.25c0 .966.784 1.75 1.75 1.75h7.5a1.75 1.75 0 1 0 0-3.5h-7.5a1.75 1.75 0 0 0-1.75 1.75'/%3E%3C/svg%3E");--success-icon: url("data:image/svg+xml,%3Csvg xmlns='http://www.w3.org/2000/svg' viewBox='0 0 24 24'%3E%3Cpath fill='%2366ff85' d='M9 16.17L5.53 12.7a.996.996 0 1 0-1.41 1.41l4.18 4.18c.39.39 1.02.39 1.41 0L20.29 7.71a.996.996 0 1 0-1.41-1.41z'/%3E%3C/svg%3E");}pre:has(code) {position: relative;}pre button.rehype-pretty-copy {right: 1px;padding: 0;width: 24px;height: 24px;display: flex;margin-top: 2px;margin-right: 8px;position: absolute;border-radius: 25%;backdrop-filter: blur(3px);& span {width: 100%;aspect-ratio: 1 / 1;}& .ready {background-image: var(--copy-icon);}& .success {display: none; background-image: var(--success-icon);}}&.rehype-pretty-copied {& .success {display: block;} & .ready {display: none;}}pre button.rehype-pretty-copy.rehype-pretty-copied {opacity: 1;& .ready { display: none; }& .success { display: block; }} Both approaches achieve the same result. The choice between them depends on factors like database system capabilities, data size, and performance optimization needs. The window function approach (MySQL) might be more efficient for very large datasets, depending on the database's optimization of window functions. The pandas solution is generally good for medium-sized datasets and offers flexibility in data manipulation.

Also Explore

DSA Questions

01 Matrix

DSA Questions

Diameter of Binary Tree

DSA Questions

Output Contest Matches

DSA Questions

Boundary of Binary Tree

DSA Questions

Remove Boxes

DSA Questions

Number of Provinces

DSA Questions

Split Array with Equal Sum

DSA Questions

Binary Tree Longest Consecutive Sequence II

DSA Questions

Game Play Analysis IV

DSA Questions

Student Attendance Record I

DSA Questions

Student Attendance Record II

DSA Questions

Optimal Division

DSA Questions

Brick Wall

DSA Questions

Split Concatenated Strings

DSA Questions

Next Greater Element III

DSA Questions

Reverse Words in a String III

DSA Questions

Game Play Analysis IV

Solution Explanation for LeetCode 550: Game Play Analysis IV

Approach 1: Grouping and Minimum Value + Left Join

Approach 2: Window Function (MySQL)

Code Implementation (Python and MySQL)

On This Page

Also Explore

01 Matrix

Diameter of Binary Tree

Output Contest Matches

Boundary of Binary Tree

Remove Boxes

Number of Provinces

Split Array with Equal Sum

Binary Tree Longest Consecutive Sequence II

Game Play Analysis IV

Student Attendance Record I

Student Attendance Record II

Optimal Division

Brick Wall

Split Concatenated Strings

Next Greater Element III

Reverse Words in a String III

Logical OR of Two Binary Grids Represented as Quad-Trees