Top Percentile Fraud - Solution & Explanation

MediumPremiumFree on FleetCodeDatabase5 min read

Problem Statement

Table: Fraud

+-------------+---------+
| Column Name | Type    |
+-------------+---------+
| policy_id   | int     |
| state       | varchar |
| fraud_score | int     |
+-------------+---------+
policy_id is column of unique values for this table.
This table contains policy id, state, and fraud score.

The Leetcode Insurance Corp has developed an ML-driven predictive model to detect the likelihood of fraudulent claims. Consequently, they allocate their most seasoned claim adjusters to address the top 5% of claims flagged by this model.

Write a solution to find the top 5 percentile of claims from each state.

Return the result table ordered by state in ascending order, fraud_score in descending order, and policy_id in ascending order.

The result format is in the following example.

Example 1:

Input: 
Fraud table:
+-----------+------------+-------------+
| policy_id | state      | fraud_score | 
+-----------+------------+-------------+
| 1         | California | 0.92        | 
| 2         | California | 0.68        |   
| 3         | California | 0.17        | 
| 4         | New York   | 0.94        | 
| 5         | New York   | 0.81        | 
| 6         | New York   | 0.77        |  
| 7         | Texas      | 0.98        |  
| 8         | Texas      | 0.97        | 
| 9         | Texas      | 0.96        | 
| 10        | Florida    | 0.97        |  
| 11        | Florida    | 0.98        | 
| 12        | Florida    | 0.78        | 
| 13        | Florida    | 0.88        | 
| 14        | Florida    | 0.66        | 
+-----------+------------+-------------+
Output: 
+-----------+------------+-------------+
| policy_id | state      | fraud_score |
+-----------+------------+-------------+
| 1         | California | 0.92        | 
| 11        | Florida    | 0.98        | 
| 4         | New York   | 0.94        | 
| 7         | Texas      | 0.98        |  
+-----------+------------+-------------+
Explanation
- For the state of California, only policy ID 1, with a fraud score of 0.92, falls within the top 5 percentile for this state.
- For the state of Florida, only policy ID 11, with a fraud score of 0.98, falls within the top 5 percentile for this state. 
- For the state of New York, only policy ID 4, with a fraud score of 0.94, falls within the top 5 percentile for this state. 
- For the state of Texas, only policy ID 7, with a fraud score of 0.98, falls within the top 5 percentile for this state. 
Output table is ordered by state in ascending order, fraud score in descending order, and policy ID in ascending order.

Approach Overview

Problem Overview: You need to identify records that fall within the top fraud percentile based on a fraud-related metric (for example fraud rate, fraud score, or suspicious transaction count). The task is essentially a ranking problem: compute the relative position of each record and return only those that fall within the highest percentile threshold.

Approach 1: Aggregate + Threshold Subquery (O(n log n) time, O(n) space)

A straightforward approach computes the fraud metric for each entity (user, merchant, or account) using GROUP BY. After generating the aggregated results, sort them by the fraud metric and determine the cutoff for the desired percentile using a subquery. For example, you can compute the value at the 90th or 95th percentile and filter rows where the metric is greater than or equal to that threshold.

This approach works in most SQL systems but often requires multiple passes over the data. One query computes aggregated fraud metrics, another determines the percentile cutoff, and the outer query filters results. The database must perform sorting to determine percentile boundaries, giving roughly O(n log n) time due to ordering and O(n) space for intermediate results. This method is easy to reason about but becomes verbose and less flexible when you need multiple percentile-based filters.

Approach 2: Window Function Ranking (O(n log n) time, O(n) space)

The cleaner solution uses SQL window functions to compute the percentile rank directly for each row. After computing the fraud metric per entity, apply functions like PERCENT_RANK() or CUME_DIST() over an ordered window such as OVER (ORDER BY fraud_metric DESC). This assigns a percentile position between 0 and 1 to every record.

Once the percentile rank is available, filtering becomes trivial: keep rows where the percentile is greater than or equal to the required threshold (for example >= 0.9 for the top 10%). The database engine performs a single ranking pass after sorting the results. Sorting dominates the runtime, so the overall complexity is O(n log n), with O(n) space used for the ranking window.

This method is concise and expressive. It keeps the entire computation inside one query and avoids manually calculating percentile boundaries. Window functions are widely used in analytical SQL problems and are a core technique in database interview questions.

Recommended for interviews: The window function approach is the expected solution. Interviewers want to see that you understand ranking functions like PERCENT_RANK or CUME_DIST and can apply them to percentile-based filtering. Mentioning the aggregation + threshold method first shows you understand the baseline approach, but using a window function demonstrates stronger SQL skills and familiarity with modern analytical queries often used in SQL data analysis tasks.

Solution

We can use the RANK() window function to calculate the ranking of fraud scores for each state, then filter out the records with a rank of 1, and sort them as required by the problem.

Code

MySQL

Try this approach in the editor →

Detailed Complexity Analysis

Approach	Time	Space	When to Use
Aggregate + Percentile Threshold Subquery	O(n log n)	O(n)	When window functions are unavailable or when computing percentile cutoffs manually
Window Function (PERCENT_RANK / CUME_DIST)	O(n log n)	O(n)	Preferred modern SQL approach for ranking and percentile filtering

Video Solution

Leetcode MEDIUM 3055 - Top Percentile Fraud PERCENT_RANK() SQL - Explained by Everyday Data Science • Everyday Data Science • 550 views views

Watch 1 more video solutions →

Frequently Asked Questions

Is Top Percentile Fraud easy or hard?

Top Percentile Fraud is usually considered a medium difficulty database problem. The challenge is recognizing that percentile filtering requires ranking and that SQL window functions provide a clean and efficient way to compute those ranks.

Top Percentile Fraud Python/Java solution

Unlike typical algorithm problems, this challenge is solved using SQL rather than Python or Java. The query aggregates fraud metrics and applies window functions like PERCENT_RANK() or CUME_DIST() to determine which rows belong to the highest percentile.

How to solve Top Percentile Fraud in O(n)?

Pure O(n) is generally not achievable in SQL because percentile calculations require ordering. The optimal practical solution sorts the aggregated fraud metrics and applies a window function such as PERCENT_RANK(), which results in O(n log n) time due to the sort operation.

What is the best approach for Top Percentile Fraud?

The most efficient approach uses SQL window functions such as PERCENT_RANK() or CUME_DIST(). First compute the fraud metric for each entity, then rank the results using a window ordered by the fraud score. Filtering rows with percentile values above the target threshold returns the top fraud percentile in a single query.

Is Top Percentile Fraud asked at Google/Amazon/Meta?

Percentile and ranking queries are common in data engineering and analytics interviews at companies like Google, Amazon, and Meta. Variations often test knowledge of SQL window functions, ranking logic, and analytical query design rather than raw algorithmic complexity.

What data structure is used in Top Percentile Fraud?

This problem relies on SQL analytical processing rather than traditional data structures. The key mechanism is a window function that ranks rows after sorting. Internally, the database engine uses sorted result sets to compute percentile ranks efficiently.

What is the time complexity of Top Percentile Fraud?

Most SQL solutions run in O(n log n) time because the database must sort rows by the fraud metric before applying ranking functions like PERCENT_RANK or CUME_DIST. The window calculation itself is linear after sorting. Space complexity is typically O(n) for intermediate query results.

Ready to solve this problem?

Practice Top Percentile Fraud with our built-in code editor and test cases.

Practice on FleetCode

Combine Two Tables

Second Highest Salary

Problem Info

DifficultyMedium

Acceptance58.1%

Approaches1

Reading time5 min

Practice this problem

Open in Editor

Problem Statement

Table: Fraud

+-------------+---------+
| Column Name | Type    |
+-------------+---------+
| policy_id   | int     |
| state       | varchar |
| fraud_score | int     |
+-------------+---------+
policy_id is column of unique values for this table.
This table contains policy id, state, and fraud score.

Write a solution to find the top 5 percentile of claims from each state.

Return the result table ordered by state in ascending order, fraud_score in descending order, and policy_id in ascending order.

The result format is in the following example.

Example 1:

Input: 
Fraud table:
+-----------+------------+-------------+
| policy_id | state      | fraud_score | 
+-----------+------------+-------------+
| 1         | California | 0.92        | 
| 2         | California | 0.68        |   
| 3         | California | 0.17        | 
| 4         | New York   | 0.94        | 
| 5         | New York   | 0.81        | 
| 6         | New York   | 0.77        |  
| 7         | Texas      | 0.98        |  
| 8         | Texas      | 0.97        | 
| 9         | Texas      | 0.96        | 
| 10        | Florida    | 0.97        |  
| 11        | Florida    | 0.98        | 
| 12        | Florida    | 0.78        | 
| 13        | Florida    | 0.88        | 
| 14        | Florida    | 0.66        | 
+-----------+------------+-------------+
Output: 
+-----------+------------+-------------+
| policy_id | state      | fraud_score |
+-----------+------------+-------------+
| 1         | California | 0.92        | 
| 11        | Florida    | 0.98        | 
| 4         | New York   | 0.94        | 
| 7         | Texas      | 0.98        |  
+-----------+------------+-------------+
Explanation
- For the state of California, only policy ID 1, with a fraud score of 0.92, falls within the top 5 percentile for this state.
- For the state of Florida, only policy ID 11, with a fraud score of 0.98, falls within the top 5 percentile for this state. 
- For the state of New York, only policy ID 4, with a fraud score of 0.94, falls within the top 5 percentile for this state. 
- For the state of Texas, only policy ID 7, with a fraud score of 0.98, falls within the top 5 percentile for this state. 
Output table is ordered by state in ascending order, fraud score in descending order, and policy ID in ascending order.

Approach Overview

Approach 1: Aggregate + Threshold Subquery (O(n log n) time, O(n) space)

Approach 2: Window Function Ranking (O(n log n) time, O(n) space)

Approach

Time

Space

When to Use

Aggregate + Percentile Threshold Subquery

O(n log n)

O(n)

When window functions are unavailable or when computing percentile cutoffs manually

Window Function (PERCENT_RANK / CUME_DIST)

O(n log n)

O(n)

Preferred modern SQL approach for ranking and percentile filtering

Frequently Asked Questions

Is Top Percentile Fraud easy or hard?

Top Percentile Fraud Python/Java solution

How to solve Top Percentile Fraud in O(n)?

What is the best approach for Top Percentile Fraud?

Is Top Percentile Fraud asked at Google/Amazon/Meta?

What data structure is used in Top Percentile Fraud?

What is the time complexity of Top Percentile Fraud?

Top Percentile Fraud - Solution & Explanation

Problem Statement

Approach Overview

Solution

Code

Detailed Complexity Analysis

Video Solution

Frequently Asked Questions

Ready to solve this problem?

Problem Info

Table of Contents

Top Percentile Fraud - Solution & Explanation

Problem Statement

Approach Overview

Solution

Code

Detailed Complexity Analysis

Video Solution

Frequently Asked Questions

Ready to solve this problem?

Problem Info

Table of Contents

Problem Statement

Approach Overview

Solution

Code

Detailed Complexity Analysis

Video Solution

Frequently Asked Questions

Related Problems

Ready to solve this problem?

Problem Info

Table of Contents

Problem Statement

Approach Overview

Solution

Code

Detailed Complexity Analysis

Video Solution

Frequently Asked Questions

Related Problems

Ready to solve this problem?

Problem Info

Table of Contents