Hopper Company Queries I is a hard-level Database problem.
Table: Drivers

| Column Name | Type |
|---|---|
| driver_id | int |
| join_date | date |

driver_id is the primary key (column with unique values) for this table. Each row of this table contains the driver's ID and the date they joined the Hopper company.
Table: Rides

| Column Name | Type |
|---|---|
| ride_id | int |
| user_id | int |
| requested_at | date |

ride_id is the primary key (column with unique values) for this table. Each row of this table contains the ID of a ride, the user's ID that requested it, and the day they requested it. There may be some ride requests in this table that were not accepted.
Table: AcceptedRides

| Column Name | Type |
|---|---|
| ride_id | int |
| driver_id | int |
| ride_distance | int |
| ride_duration | int |

ride_id is the primary key (column with unique values) for this table. Each row of this table contains some information about an accepted ride. It is guaranteed that each accepted ride exists in the Rides table.
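Write a solution to report the following statistics for each month of 2020:

- The number of drivers currently with the Hopper company by the end of the month (active_drivers).
- The number of accepted rides in that month (accepted_rides).

Return the result table ordered by month in ascending order, where month is the month's number (January is 1, February is 2, etc.).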
The result format is in the following example.
Example 1:
Input:

Drivers table:

| driver_id | join_date |
|---|---|
| 10 | 2019-12-10 |
| 8 | 2020-1-13 |
| 5 | 2020-2-16 |
| 7 | 2020-3-8 |
| 4 | 2020-5-17 |
| 1 | 2020-10-24 |
| 6 | 2021-1-5 |

Rides table:

| ride_id | user_id | requested_at |
|---|---|---|
| 6 | 75 | 2019-12-9 |
| 1 | 54 | 2020-2-9 |
| 10 | 63 | 2020-3-4 |
| 19 | 39 | 2020-4-6 |
| 3 | 41 | 2020-6-3 |
| 13 | 52 | 2020-6-22 |
| 7 | 69 | 2020-7-16 |
| 17 | 70 | 2020-8-25 |
| 20 | 81 | 2020-11-2 |
| 5 | 57 | 2020-11-9 |
| 2 | 42 | 2020-12-9 |
| 11 | 68 | 2021-1-11 |
| 15 | 32 | 2021-1-17 |
| 12 | 11 | 2021-1-19 |
| 14 | 18 | 2021-1-27 |

AcceptedRides table:

| ride_id | driver_id | ride_distance | ride_duration |
|---|---|---|---|
| 10 | 10 | 63 | 38 |
| 13 | 10 | 73 | 96 |
| 7 | 8 | 100 | 28 |
| 17 | 7 | 119 | 68 |
| 20 | 1 | 121 | 92 |
| 5 | 7 | 42 | 101 |
| 2 | 4 | 6 | 38 |
| 11 | 8 | 37 | 43 |
| 15 | 8 | 108 | 82 |
| 12 | 8 | 38 | 34 |
| 14 | 1 | 90 | 74 |

Output:

| month | active_drivers | accepted_rides |
|---|---|---|
| 1 | 2 | 0 |
| 2 | 3 | 0 |
| 3 | 4 | 1 |
| 4 | 4 | 0 |
| 5 | 5 | 0 |
| 6 | 5 | 1 |
| 7 | 5 | 1 |
| 8 | 5 | 1 |
| 9 | 5 | 0 |
| 10 | 6 | 0 |
| 11 | 6 | 2 |
| 12 | 6 | 1 |

Explanation:

- By the end of January --> two active drivers (10, 8) and no accepted rides.
- By the end of February --> three active drivers (10, 8, 5) and no accepted rides.
- By the end of March --> four active drivers (10, 8, 5, 7) and one accepted ride (10).
- By the end of April --> four active drivers (10, 8, 5, 7) and no accepted rides.
- By the end of May --> five active drivers (10, 8, 5, 7, 4) and no accepted rides.
- By the end of June --> five active drivers (10, 8, 5, 7, 4) and one accepted ride (13).
- By the end of July --> five active drivers (10, 8, 5, 7, 4) and one accepted ride (7).
- By the end of August --> five active drivers (10, 8, 5, 7, 4) and one accepted ride (17).
- By the end of September --> five active drivers (10, 8, 5, 7, 4) and no accepted rides.
- By the end of October --> six active drivers (10, 8, 5, 7, 4, 1) and no accepted rides.
- By the end of November --> six active drivers (10, 8, 5, 7, 4, 1) and two accepted rides (20, 5).
- By the end of December --> six active drivers (10, 8, 5, 7, 4, 1) and one accepted ride (2).
Problem Overview: You need to generate monthly analytics for the Hopper platform. For each month of 2020, compute two metrics: the number of active drivers (drivers who joined on or before the end of that month, including those who joined before 2020) and the number of accepted rides requested in that month. The result must include all 12 months, even those with no accepted rides or no new drivers.
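Because every month must appear in the output, most solutions start from a generated list of month numbers. The sketch below shows one way to build it with a recursive CTE. It runs the SQL through Python's built-in `sqlite3` module purely so the example is self-contained; that choice of engine is an assumption, and the name `MONTHS_SQL` is illustrative rather than part of the problem.

```python
import sqlite3

# Recursive CTE that yields one row per month number, 1 through 12.
MONTHS_SQL = """
WITH RECURSIVE months(month) AS (
    SELECT 1
    UNION ALL
    SELECT month + 1 FROM months WHERE month < 12
)
SELECT month FROM months ORDER BY month
"""

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
months = [row[0] for row in conn.execute(MONTHS_SQL)]
conn.close()

print(months)
```

Left joining the monthly counts onto this list is what keeps months with zero rides in the result.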
Approach 1: Correlated Subqueries with Monthly Aggregation (O(n^2) logical scans)
A straightforward SQL solution calculates each metric independently using correlated subqueries. For every month from 1 to 12, one subquery counts drivers whose join_date falls on or before the last day of that month in 2020; drivers who joined before 2020 are included and drivers who joined after 2020 never qualify. A second subquery counts accepted rides by joining Rides and AcceptedRides, restricting requested_at to 2020, and matching the month of requested_at to the current month. The outer query iterates over all months and pulls both counts from the subqueries. This approach is easy to write but inefficient because the database may rescan the driver and ride tables once for each month.
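The correlated-subquery approach can be sketched as follows. This is a minimal illustration, not a reference solution: it assumes SQLite (via Python's `sqlite3`) instead of the judge's SQL dialect, so it uses `strftime` and `date(...)` where MySQL would use `MONTH()` and `YEAR()`. The sample dates from Example 1 are zero-padded so SQLite can parse them, and `build_db` and `CORRELATED_SQL` are hypothetical names.

```python
import sqlite3

def build_db():
    """Load the Example 1 data into an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Drivers (driver_id INTEGER PRIMARY KEY, join_date TEXT);
        CREATE TABLE Rides (ride_id INTEGER PRIMARY KEY, user_id INTEGER,
                            requested_at TEXT);
        CREATE TABLE AcceptedRides (ride_id INTEGER PRIMARY KEY, driver_id INTEGER,
                                    ride_distance INTEGER, ride_duration INTEGER);
    """)
    conn.executemany("INSERT INTO Drivers VALUES (?, ?)", [
        (10, "2019-12-10"), (8, "2020-01-13"), (5, "2020-02-16"), (7, "2020-03-08"),
        (4, "2020-05-17"), (1, "2020-10-24"), (6, "2021-01-05"),
    ])
    conn.executemany("INSERT INTO Rides VALUES (?, ?, ?)", [
        (6, 75, "2019-12-09"), (1, 54, "2020-02-09"), (10, 63, "2020-03-04"),
        (19, 39, "2020-04-06"), (3, 41, "2020-06-03"), (13, 52, "2020-06-22"),
        (7, 69, "2020-07-16"), (17, 70, "2020-08-25"), (20, 81, "2020-11-02"),
        (5, 57, "2020-11-09"), (2, 42, "2020-12-09"), (11, 68, "2021-01-11"),
        (15, 32, "2021-01-17"), (12, 11, "2021-01-19"), (14, 18, "2021-01-27"),
    ])
    conn.executemany("INSERT INTO AcceptedRides VALUES (?, ?, ?, ?)", [
        (10, 10, 63, 38), (13, 10, 73, 96), (7, 8, 100, 28), (17, 7, 119, 68),
        (20, 1, 121, 92), (5, 7, 42, 101), (2, 4, 6, 38), (11, 8, 37, 43),
        (15, 8, 108, 82), (12, 8, 38, 34), (14, 1, 90, 74),
    ])
    return conn

CORRELATED_SQL = """
WITH RECURSIVE months(month) AS (
    SELECT 1 UNION ALL SELECT month + 1 FROM months WHERE month < 12
)
SELECT
    m.month,
    -- Drivers who joined before the first day of the following month.
    (SELECT COUNT(*)
       FROM Drivers d
      WHERE d.join_date < date('2020-01-01', '+' || m.month || ' months')
    ) AS active_drivers,
    -- Accepted rides requested in this month of 2020.
    (SELECT COUNT(*)
       FROM AcceptedRides a
       JOIN Rides r ON r.ride_id = a.ride_id
      WHERE strftime('%Y', r.requested_at) = '2020'
        AND CAST(strftime('%m', r.requested_at) AS INTEGER) = m.month
    ) AS accepted_rides
FROM months m
ORDER BY m.month
"""

conn = build_db()
rows = conn.execute(CORRELATED_SQL).fetchall()
for row in rows:
    print(row)  # (month, active_drivers, accepted_rides)
```

Both subqueries reference `m.month` from the outer query, which is what makes them correlated and why each one is conceptually re-evaluated for every month.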
Approach 2: Pre‑Aggregation + Window Function (O(n))
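A more efficient strategy aggregates data once and then derives the required metrics. First, count how many drivers joined in each month of 2020; drivers who joined before 2020 form a baseline that is added to January, and drivers who joined after 2020 are excluded. Grouping by MONTH(join_date) alone would mix years together, so the year must be handled explicitly. Next, compute monthly accepted rides by joining Rides with AcceptedRides, keeping only rides with requested_at in 2020, and grouping by MONTH(requested_at). Finally, left join both aggregates onto a generated month table (1–12), replacing missing counts with 0, and apply a cumulative sum such as SUM(count) OVER (ORDER BY month) across the 12 months to get the running total of active drivers. Running the window function after the join ensures that months with no new drivers still carry the previous total forward.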
This design minimizes repeated scans and relies on efficient SQL primitives like grouping and window aggregation. The driver counts are computed once and reused through a cumulative calculation, which keeps the query scalable even with large datasets.
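The pre-aggregation strategy can be sketched as below. As a hedge: this is one possible arrangement of the steps, run on SQLite through Python's `sqlite3` (an assumption; window functions need SQLite 3.25 or newer), with zero-padded Example 1 dates. `build_db` and `WINDOW_SQL` are hypothetical names, and in MySQL the `strftime` calls would become `MONTH()`.

```python
import sqlite3

def build_db():
    """Load the Example 1 data into an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Drivers (driver_id INTEGER PRIMARY KEY, join_date TEXT);
        CREATE TABLE Rides (ride_id INTEGER PRIMARY KEY, user_id INTEGER,
                            requested_at TEXT);
        CREATE TABLE AcceptedRides (ride_id INTEGER PRIMARY KEY, driver_id INTEGER,
                                    ride_distance INTEGER, ride_duration INTEGER);
    """)
    conn.executemany("INSERT INTO Drivers VALUES (?, ?)", [
        (10, "2019-12-10"), (8, "2020-01-13"), (5, "2020-02-16"), (7, "2020-03-08"),
        (4, "2020-05-17"), (1, "2020-10-24"), (6, "2021-01-05"),
    ])
    conn.executemany("INSERT INTO Rides VALUES (?, ?, ?)", [
        (6, 75, "2019-12-09"), (1, 54, "2020-02-09"), (10, 63, "2020-03-04"),
        (19, 39, "2020-04-06"), (3, 41, "2020-06-03"), (13, 52, "2020-06-22"),
        (7, 69, "2020-07-16"), (17, 70, "2020-08-25"), (20, 81, "2020-11-02"),
        (5, 57, "2020-11-09"), (2, 42, "2020-12-09"), (11, 68, "2021-01-11"),
        (15, 32, "2021-01-17"), (12, 11, "2021-01-19"), (14, 18, "2021-01-27"),
    ])
    conn.executemany("INSERT INTO AcceptedRides VALUES (?, ?, ?, ?)", [
        (10, 10, 63, 38), (13, 10, 73, 96), (7, 8, 100, 28), (17, 7, 119, 68),
        (20, 1, 121, 92), (5, 7, 42, 101), (2, 4, 6, 38), (11, 8, 37, 43),
        (15, 8, 108, 82), (12, 8, 38, 34), (14, 1, 90, 74),
    ])
    return conn

WINDOW_SQL = """
WITH RECURSIVE months(month) AS (
    SELECT 1 UNION ALL SELECT month + 1 FROM months WHERE month < 12
),
joined AS (
    -- New drivers per month; pre-2020 joiners are folded into January.
    SELECT month, COUNT(*) AS new_drivers
    FROM (
        SELECT CASE WHEN join_date < '2020-01-01' THEN 1
                    ELSE CAST(strftime('%m', join_date) AS INTEGER)
               END AS month
        FROM Drivers
        WHERE join_date < '2021-01-01'   -- drop drivers who joined after 2020
    )
    GROUP BY month
),
accepted AS (
    -- Accepted rides per month, restricted to requests made in 2020.
    SELECT CAST(strftime('%m', r.requested_at) AS INTEGER) AS month,
           COUNT(*) AS accepted_rides
    FROM AcceptedRides a
    JOIN Rides r ON r.ride_id = a.ride_id
    WHERE r.requested_at >= '2020-01-01' AND r.requested_at < '2021-01-01'
    GROUP BY CAST(strftime('%m', r.requested_at) AS INTEGER)
)
SELECT m.month,
       -- Running total over all 12 months, so empty months carry it forward.
       SUM(COALESCE(j.new_drivers, 0)) OVER (ORDER BY m.month) AS active_drivers,
       COALESCE(a.accepted_rides, 0) AS accepted_rides
FROM months m
LEFT JOIN joined j ON j.month = m.month
LEFT JOIN accepted a ON a.month = m.month
ORDER BY m.month
"""

conn = build_db()
rows = conn.execute(WINDOW_SQL).fetchall()
for row in rows:
    print(row)  # (month, active_drivers, accepted_rides)
```

Each base table is read once to build its aggregate, and the final query only touches the 12-row month list plus the two small aggregates.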
Conceptually, the query combines ideas from database queries, SQL aggregation, and prefix sum style cumulative calculations. Window functions act like prefix sums over ordered rows, which is why they fit perfectly for computing the running number of active drivers.
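To make the prefix-sum analogy concrete, here is a small sketch outside SQL. The per-month counts are read off the Drivers table in Example 1, with the single pre-2020 driver folded into January as an assumed baseline; the variable names are illustrative.

```python
from itertools import accumulate

# New drivers per month of 2020 from Example 1 (index 0 is January).
# January holds 2: driver 10 (joined 2019-12-10) plus driver 8.
new_drivers = [2, 1, 1, 0, 1, 0, 0, 0, 0, 1, 0, 0]

# A running (prefix) sum, the same shape of computation as
# SUM(new_drivers) OVER (ORDER BY month) in SQL.
active_drivers = list(accumulate(new_drivers))

for month, count in enumerate(active_drivers, start=1):
    print(month, count)
```

The resulting list lines up with the active_drivers column of the example output, one entry per month.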
Recommended for interviews: The pre‑aggregation + window function approach. The correlated subquery solution demonstrates basic SQL reasoning, but interviewers usually expect candidates to reduce repeated scans and compute cumulative values using window functions or staged aggregations.
| Approach | Time | Space | When to Use |
|---|---|---|---|
| Correlated Subqueries with Monthly Counts | O(n^2) logical scans | O(1) | Simple implementation when tables are small or query optimization is not critical |
| Pre‑Aggregation + Window Function | O(n) | O(m) | Best for large datasets; avoids repeated scans and computes cumulative driver counts efficiently |