Table: Actions
+---------------+---------+
| Column Name | Type |
+---------------+---------+
| user_id | int |
| post_id | int |
| action_date | date |
| action | enum |
| extra | varchar |
+---------------+---------+
This table may have duplicate rows.
The action column is an ENUM (category) type of ('view', 'like', 'reaction', 'comment', 'report', 'share').
The extra column has optional information about the action, such as a reason for the report or a type of reaction.
Table: Removals
+---------------+---------+ | Column Name | Type | +---------------+---------+ | post_id | int | | remove_date | date | +---------------+---------+ post_id is the primary key (column with unique values) of this table. Each row in this table indicates that some post was removed due to being reported or as a result of an admin review.
Write a solution to find the average daily percentage of posts that got removed after being reported as spam, rounded to 2 decimal places.
The result format is in the following example.
Example 1:
Input: Actions table: +---------+---------+-------------+--------+--------+ | user_id | post_id | action_date | action | extra | +---------+---------+-------------+--------+--------+ | 1 | 1 | 2019-07-01 | view | null | | 1 | 1 | 2019-07-01 | like | null | | 1 | 1 | 2019-07-01 | share | null | | 2 | 2 | 2019-07-04 | view | null | | 2 | 2 | 2019-07-04 | report | spam | | 3 | 4 | 2019-07-04 | view | null | | 3 | 4 | 2019-07-04 | report | spam | | 4 | 3 | 2019-07-02 | view | null | | 4 | 3 | 2019-07-02 | report | spam | | 5 | 2 | 2019-07-03 | view | null | | 5 | 2 | 2019-07-03 | report | racism | | 5 | 5 | 2019-07-03 | view | null | | 5 | 5 | 2019-07-03 | report | racism | +---------+---------+-------------+--------+--------+ Removals table: +---------+-------------+ | post_id | remove_date | +---------+-------------+ | 2 | 2019-07-20 | | 3 | 2019-07-18 | +---------+-------------+ Output: +-----------------------+ | average_daily_percent | +-----------------------+ | 75.00 | +-----------------------+ Explanation: The percentage for 2019-07-04 is 50% because only one post of two spam reported posts were removed. The percentage for 2019-07-02 is 100% because one post was reported as spam and it was removed. The other days had no spam reports so the average is (50 + 100) / 2 = 75% Note that the output is only one number and that we do not care about the remove dates.
Problem Overview: You’re given two tables: Actions (user reports on posts) and Removals (posts removed by moderators). The task is to compute the average daily percentage of posts reported as spam that eventually got removed. Only spam reports (action = 'report' and extra = 'spam') count, and the percentage is calculated per day before averaging across all days.
Approach 1: Aggregation with Join (O(n log n) time, O(n) space)
Start by filtering the Actions table to only rows where a post was reported as spam. For each action_date, count the number of distinct post_id values. This represents the total posts reported as spam that day. Next, join these filtered records with the Removals table on post_id to determine which reported posts were actually removed. Using COUNT(DISTINCT), compute how many of those reported posts were removed. The daily percentage becomes removed_posts / reported_posts * 100. Finally, average these daily percentages and round the result to two decimal places.
The key idea is separating the calculation into two logical layers: first compute the percentage for each day, then aggregate again to compute the overall average. SQL handles this efficiently with grouping and joins. Since grouping operations dominate runtime, the complexity is roughly O(n log n) depending on the database’s aggregation implementation.
Approach 2: Conditional Aggregation with Subquery (O(n log n) time, O(n) space)
Another option uses a subquery that groups spam reports by day while applying conditional aggregation to detect removals. After joining Actions and Removals, compute the daily removal ratio using expressions like COUNT(DISTINCT CASE WHEN ... THEN post_id END). The outer query simply averages these daily ratios. This approach reduces intermediate steps and keeps all logic inside grouped queries, which many SQL engines optimize well.
This problem heavily relies on SQL grouping techniques, database joins, and careful use of DISTINCT to avoid double-counting posts when multiple users report the same one. Handling duplicates correctly is the main challenge.
Recommended for interviews: The aggregation-with-join approach is the most readable and mirrors how analysts typically compute daily metrics in production systems. Interviewers expect you to filter spam reports first, group by date, join to Removals, and then compute percentages before averaging. Showing the step-by-step aggregation demonstrates clear reasoning about SQL analytics queries.
MySQL
| Approach | Time | Space | When to Use |
|---|---|---|---|
| Aggregation with Join | O(n log n) | O(n) | Clear step-by-step calculation of daily reported posts and removals using JOIN and GROUP BY. |
| Conditional Aggregation | O(n log n) | O(n) | Compact SQL queries where removal checks are embedded directly in aggregated expressions. |
Leetcode MEDIUM 1132 - Reported Posts II DISTINCT ROUND AVERAGE - Explained by Everyday Data Science • Everyday Data Science • 858 views views
Practice Reported Posts II with our built-in code editor and test cases.
Practice on FleetCode