Table: Events
+---------------+---------+ | Column Name | Type | +---------------+---------+ | business_id | int | | event_type | varchar | | occurrences | int | +---------------+---------+ (business_id, event_type) is the primary key (combination of columns with unique values) of this table. Each row in the table logs the info that an event of some type occurred at some business for a number of times.
The average activity for a particular event_type is the average occurrences across all companies that have this event.
An active business is a business that has more than one event_type such that their occurrences is strictly greater than the average activity for that event.
Write a solution to find all active businesses.
Return the result table in any order.
The result format is in the following example.
Example 1:
Input: Events table: +-------------+------------+-------------+ | business_id | event_type | occurrences | +-------------+------------+-------------+ | 1 | reviews | 7 | | 3 | reviews | 3 | | 1 | ads | 11 | | 2 | ads | 7 | | 3 | ads | 6 | | 1 | page views | 3 | | 2 | page views | 12 | +-------------+------------+-------------+ Output: +-------------+ | business_id | +-------------+ | 1 | +-------------+ Explanation: The average activity for each event can be calculated as follows: - 'reviews': (7+3)/2 = 5 - 'ads': (11+7+6)/3 = 8 - 'page views': (3+12)/2 = 7.5 The business with id=1 has 7 'reviews' events (more than 5) and 11 'ads' events (more than 8), so it is an active business.
Problem Overview: The table stores how many times each business triggered a specific event type. A business is considered active if its occurrence count is higher than the average for that event type in at least two different event categories. The result should return the IDs of such businesses.
Approach 1: Correlated Subquery Comparison (O(n^2) time, O(1) extra space)
A straightforward way compares each row's occurrences value with the average for the same event_type using a correlated subquery. For every record, compute AVG(occurrences) from rows sharing the same event type. If the row's value is greater than the computed average, count it toward that business. Finally, group by business_id and keep businesses where the count of above-average events is at least two.
This approach is easy to reason about but inefficient. The database repeatedly recomputes averages for each row, which leads to repeated scans of the same data. On large datasets the query planner may struggle to optimize it, resulting in near O(n^2) behavior.
Approach 2: Aggregation + Join (O(n) time, O(n) space)
A more efficient solution first calculates the average occurrences per event_type using a grouped subquery. This produces a small derived table mapping each event type to its average. Next, join this derived table back to the original Events table and filter rows where occurrences > avg_occurrences. Each remaining row represents an event where the business performed above the average.
Group these filtered rows by business_id and count how many qualifying event types each business has. Businesses with a count of at least two satisfy the problem requirement. This approach works efficiently because the average for each event type is computed once, then reused through a join.
This pattern—aggregate first, then filter with a join—is common in SQL and database interview problems. It separates the computation of global statistics (averages) from row-level filtering, which allows the optimizer to scan the data only once and leverage grouping efficiently.
Recommended for interviews: The aggregation + join approach is the expected solution. It demonstrates comfort with SQL aggregation, derived tables, and filtering grouped results using GROUP BY with HAVING. Mentioning the correlated subquery shows baseline understanding, but the optimized aggregation pattern reflects stronger SQL query design.
MySQL
| Approach | Time | Space | When to Use |
|---|---|---|---|
| Correlated Subquery Comparison | O(n^2) | O(1) | Quick prototype or small datasets where repeated average calculations are acceptable |
| Aggregation + Join (Derived Table) | O(n) | O(n) | Preferred approach for interviews and production queries with grouping and average comparisons |
LeetCode Medium 1126 Yelp Interview SQL Question with Detailed Explanation • Everyday Data Science • 2,534 views views
Watch 1 more video solutions →Practice Active Businesses with our built-in code editor and test cases.
Practice on FleetCode