Confirmation Rate is a medium-level Database problem.
Table: Signups
+----------------+----------+
| Column Name    | Type     |
+----------------+----------+
| user_id        | int      |
| time_stamp     | datetime |
+----------------+----------+
user_id is the column of unique values for this table.
Each row contains information about the signup time for the user with ID user_id.
Table: Confirmations
+----------------+----------+
| Column Name    | Type     |
+----------------+----------+
| user_id        | int      |
| time_stamp     | datetime |
| action         | ENUM     |
+----------------+----------+
(user_id, time_stamp) is the primary key (combination of columns with unique values) for this table.
user_id is a foreign key (reference column) to the Signups table.
action is an ENUM (category) of the type ('confirmed', 'timeout')
Each row of this table indicates that the user with ID user_id requested a confirmation message at time_stamp and that confirmation message was either confirmed ('confirmed') or expired without confirming ('timeout').
The confirmation rate of a user is the number of 'confirmed' messages divided by the total number of requested confirmation messages. The confirmation rate of a user that did not request any confirmation messages is 0. Round the confirmation rate to two decimal places.
Write a solution to find the confirmation rate of each user.
Return the result table in any order.
The result format is in the following example.
Example 1:
Input:
Signups table:
+---------+---------------------+
| user_id | time_stamp          |
+---------+---------------------+
| 3       | 2020-03-21 10:16:13 |
| 7       | 2020-01-04 13:57:59 |
| 2       | 2020-07-29 23:09:44 |
| 6       | 2020-12-09 10:39:37 |
+---------+---------------------+
Confirmations table:
+---------+---------------------+-----------+
| user_id | time_stamp          | action    |
+---------+---------------------+-----------+
| 3       | 2021-01-06 03:30:46 | timeout   |
| 3       | 2021-07-14 14:00:00 | timeout   |
| 7       | 2021-06-12 11:57:29 | confirmed |
| 7       | 2021-06-13 12:58:28 | confirmed |
| 7       | 2021-06-14 13:59:27 | confirmed |
| 2       | 2021-01-22 00:00:00 | confirmed |
| 2       | 2021-02-28 23:59:59 | timeout   |
+---------+---------------------+-----------+
Output:
+---------+-------------------+
| user_id | confirmation_rate |
+---------+-------------------+
| 6       | 0.00              |
| 3       | 0.00              |
| 7       | 1.00              |
| 2       | 0.50              |
+---------+-------------------+
Explanation:
User 6 did not request any confirmation messages. The confirmation rate is 0.
User 3 made 2 requests and both timed out. The confirmation rate is 0.
User 7 made 3 requests and all were confirmed. The confirmation rate is 1.
User 2 made 2 requests where one was confirmed and the other timed out. The confirmation rate is 1 / 2 = 0.5.
Problem Overview: You are given two tables: Signups (all registered users) and Confirmations (confirmation requests with actions such as confirmed or timeout). For each user, compute the confirmation rate defined as confirmed_requests / total_requests. Users with no confirmation requests must return a rate of 0.00.
Approach 1: Using SQL Aggregation with Joins (O(n) time, O(1) extra space)
Join Signups to Confirmations with a LEFT JOIN so that users without confirmation records still appear in the result. Group by user_id and compute the rate with conditional aggregation. A common pattern is AVG(action = 'confirmed'): in engines that evaluate a comparison to 1 or 0 (MySQL and SQLite, for example), the average is exactly the fraction of confirmed requests. For a user with no rows in Confirmations, action is NULL on the joined row and AVG returns NULL, so wrap the result in COALESCE (or IFNULL) to return 0, then apply ROUND(..., 2). The query scans the confirmation records once and groups them, which keeps it efficient on large datasets.
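The query described above can be tried out with a minimal sketch like the following. It assumes an in-memory SQLite database driven from Python's standard `sqlite3` module as a stand-in for whatever engine the problem is actually run on; the table contents are the Example 1 data, and the names `conn`, `QUERY`, and `rates` are illustrative only.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for the real engine.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Signups (user_id INTEGER PRIMARY KEY, time_stamp TEXT)")
conn.execute(
    "CREATE TABLE Confirmations ("
    " user_id INTEGER, time_stamp TEXT, action TEXT,"
    " PRIMARY KEY (user_id, time_stamp))"
)

# Example 1 data from the problem statement.
conn.executemany(
    "INSERT INTO Signups VALUES (?, ?)",
    [
        (3, "2020-03-21 10:16:13"),
        (7, "2020-01-04 13:57:59"),
        (2, "2020-07-29 23:09:44"),
        (6, "2020-12-09 10:39:37"),
    ],
)
conn.executemany(
    "INSERT INTO Confirmations VALUES (?, ?, ?)",
    [
        (3, "2021-01-06 03:30:46", "timeout"),
        (3, "2021-07-14 14:00:00", "timeout"),
        (7, "2021-06-12 11:57:29", "confirmed"),
        (7, "2021-06-13 12:58:28", "confirmed"),
        (7, "2021-06-14 13:59:27", "confirmed"),
        (2, "2021-01-22 00:00:00", "confirmed"),
        (2, "2021-02-28 23:59:59", "timeout"),
    ],
)

# LEFT JOIN keeps users with no confirmations; the comparison evaluates to
# 1 or 0, so AVG is the confirmed fraction. COALESCE turns the NULL average
# of a user with no requests into 0.
QUERY = """
SELECT s.user_id,
       ROUND(COALESCE(AVG(c.action = 'confirmed'), 0), 2) AS confirmation_rate
FROM Signups AS s
LEFT JOIN Confirmations AS c ON c.user_id = s.user_id
GROUP BY s.user_id
"""

rates = dict(conn.execute(QUERY).fetchall())
print(rates)
```

The SELECT statement itself should carry over to MySQL unchanged; engines with a strict boolean type (PostgreSQL, for instance) would need the comparison rewritten as a CASE expression.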
Approach 2: Using SQL Aggregate Functions (O(n) time, O(1) extra space)
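Another SQL strategy counts confirmed and total requests explicitly. After the LEFT JOIN, use SUM(CASE WHEN action = 'confirmed' THEN 1 ELSE 0 END) for confirmed requests and COUNT(action) for total requests; COUNT(action) skips the NULL produced for users with no confirmations, so their total is 0. Divide the two values, guard against the zero denominator (for example with NULLIF on the count and COALESCE on the result), and round to two decimal places with ROUND(). In engines where dividing two integers truncates, multiply by 1.0 or cast first. This approach is more verbose but explicit about how the metric is calculated, and it works when the database does not support boolean expressions inside aggregate functions.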
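A sketch of the explicit SUM/COUNT variant follows, again assuming an in-memory SQLite database accessed through Python's `sqlite3` module rather than the original target engine. The `NULLIF` guard and the `1.0 *` factor are choices made for this sketch: SQLite truncates integer division, and a zero count would otherwise be a zero denominator. The names `conn`, `QUERY`, and `rates` are illustrative.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for the real engine.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Signups (user_id INTEGER PRIMARY KEY, time_stamp TEXT)")
conn.execute(
    "CREATE TABLE Confirmations ("
    " user_id INTEGER, time_stamp TEXT, action TEXT,"
    " PRIMARY KEY (user_id, time_stamp))"
)
conn.executemany(
    "INSERT INTO Signups VALUES (?, ?)",
    [
        (3, "2020-03-21 10:16:13"),
        (7, "2020-01-04 13:57:59"),
        (2, "2020-07-29 23:09:44"),
        (6, "2020-12-09 10:39:37"),
    ],
)
conn.executemany(
    "INSERT INTO Confirmations VALUES (?, ?, ?)",
    [
        (3, "2021-01-06 03:30:46", "timeout"),
        (3, "2021-07-14 14:00:00", "timeout"),
        (7, "2021-06-12 11:57:29", "confirmed"),
        (7, "2021-06-13 12:58:28", "confirmed"),
        (7, "2021-06-14 13:59:27", "confirmed"),
        (2, "2021-01-22 00:00:00", "confirmed"),
        (2, "2021-02-28 23:59:59", "timeout"),
    ],
)

# SUM(CASE ...) counts confirmed requests; COUNT(c.action) counts all requests
# and ignores the NULL row produced for users with no confirmations.
# NULLIF turns a zero count into NULL so the division yields NULL, which
# COALESCE then maps to 0. The 1.0 factor forces floating-point division.
QUERY = """
SELECT s.user_id,
       ROUND(
           COALESCE(
               1.0 * SUM(CASE WHEN c.action = 'confirmed' THEN 1 ELSE 0 END)
                   / NULLIF(COUNT(c.action), 0),
               0
           ),
           2
       ) AS confirmation_rate
FROM Signups AS s
LEFT JOIN Confirmations AS c ON c.user_id = s.user_id
GROUP BY s.user_id
"""

rates = dict(conn.execute(QUERY).fetchall())
print(rates)
```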
Approach 3: Data Aggregation Using a Programming Language (O(n) time, O(n) space)
Load both tables into a structure such as dictionaries or hash maps in Python. Iterate through the Confirmations table and track per-user counts: total requests and confirmed requests. After building these counts, iterate through all users in Signups and compute the rate. This method mirrors what the SQL GROUP BY does internally. It works well when processing data outside a database pipeline or when integrating with application logic.
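The counting procedure above might look like this in Python. This is a minimal sketch that assumes the tables have already been loaded into plain lists; the function name `confirmation_rates` and the input shapes are hypothetical, and the sample data is taken from Example 1.

```python
from collections import defaultdict

# Assumed in-memory representation of the two tables (Example 1 data).
signup_ids = [3, 7, 2, 6]
confirmation_rows = [
    (3, "timeout"),
    (3, "timeout"),
    (7, "confirmed"),
    (7, "confirmed"),
    (7, "confirmed"),
    (2, "confirmed"),
    (2, "timeout"),
]


def confirmation_rates(signup_ids, confirmation_rows):
    """Return {user_id: rate} for every signed-up user."""
    total = defaultdict(int)      # user_id -> number of requests
    confirmed = defaultdict(int)  # user_id -> number of confirmed requests

    # Single pass over Confirmations, mirroring what GROUP BY accumulates.
    for user_id, action in confirmation_rows:
        total[user_id] += 1
        if action == "confirmed":
            confirmed[user_id] += 1

    # Iterate over Signups so users with no requests still get a row.
    rates = {}
    for user_id in signup_ids:
        n = total.get(user_id, 0)
        rates[user_id] = round(confirmed.get(user_id, 0) / n, 2) if n else 0.0
    return rates


result = confirmation_rates(signup_ids, confirmation_rows)
print(result)
```

Driving the final loop from `signup_ids` plays the role of the LEFT JOIN: a user absent from the counts falls through to the `0.0` branch instead of being dropped.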
Approach 4: Programmatic Calculation Using Scripting Languages (O(n) time, O(n) space)
Scripting approaches follow the same idea but emphasize step-by-step data processing. Parse confirmation records, accumulate counts using a map keyed by user_id, then calculate the ratio for each signup record. Formatting to two decimal places ensures the output matches database-style reporting. While less concise than SQL, this method gives full control over preprocessing and validation.
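As one illustration of the scripting style, the sketch below parses CSV text, validates the action values, and formats each rate to two decimal places. It assumes the tables were exported as CSV with a header row; the function name `report`, the constant names, and the CSV layout are all hypothetical, and the sample data is from Example 1.

```python
import csv
import io

# Assumed CSV exports of the two tables (Example 1 data).
SIGNUPS_CSV = """user_id,time_stamp
3,2020-03-21 10:16:13
7,2020-01-04 13:57:59
2,2020-07-29 23:09:44
6,2020-12-09 10:39:37
"""

CONFIRMATIONS_CSV = """user_id,time_stamp,action
3,2021-01-06 03:30:46,timeout
3,2021-07-14 14:00:00,timeout
7,2021-06-12 11:57:29,confirmed
7,2021-06-13 12:58:28,confirmed
7,2021-06-14 13:59:27,confirmed
2,2021-01-22 00:00:00,confirmed
2,2021-02-28 23:59:59,timeout
"""

VALID_ACTIONS = {"confirmed", "timeout"}


def report(signups_text, confirmations_text):
    """Return [(user_id, 'x.xx'), ...] in signup order."""
    counts = {}  # user_id -> [confirmed, total]

    # Step 1: parse and validate confirmation records, accumulating counts.
    for row in csv.DictReader(io.StringIO(confirmations_text)):
        action = row["action"]
        if action not in VALID_ACTIONS:
            raise ValueError(f"unexpected action: {action!r}")
        entry = counts.setdefault(int(row["user_id"]), [0, 0])
        if action == "confirmed":
            entry[0] += 1
        entry[1] += 1

    # Step 2: compute and format the ratio for each signup record.
    lines = []
    for row in csv.DictReader(io.StringIO(signups_text)):
        user_id = int(row["user_id"])
        confirmed, total = counts.get(user_id, (0, 0))
        rate = confirmed / total if total else 0.0
        lines.append((user_id, f"{rate:.2f}"))
    return lines


rows = report(SIGNUPS_CSV, CONFIRMATIONS_CSV)
for user_id, rate in rows:
    print(user_id, rate)
```

Formatting with `f"{rate:.2f}"` produces strings such as `0.50`, which matches the two-decimal layout of the expected output table, and the explicit validation step is the kind of control a pure SQL query does not offer.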
Recommended for interviews: The SQL aggregation with LEFT JOIN and GROUP BY is the expected solution. It demonstrates understanding of relational joins, conditional aggregation, and handling missing rows. Knowing both the boolean AVG() trick and the explicit SUM/COUNT version shows strong command of SQL, database queries, and data aggregation patterns.
| Approach | Time | Space | When to Use |
|---|---|---|---|
| SQL Aggregation with LEFT JOIN | O(n) | O(1) | Best general SQL solution for relational databases |
| SQL SUM/COUNT Aggregate Functions | O(n) | O(1) | When explicit conditional aggregation is preferred |
| Python Data Aggregation | O(n) | O(n) | When processing exported data outside the database |
| Programmatic Scripting Calculation | O(n) | O(n) | Useful in ETL pipelines or application-level data processing |