Table: Insurance
+-------------+-------+
| Column Name | Type  |
+-------------+-------+
| pid         | int   |
| tiv_2015    | float |
| tiv_2016    | float |
| lat         | float |
| lon         | float |
+-------------+-------+

pid is the primary key (column with unique values) for this table.
Each row of this table contains information about one policy where:
- pid is the policyholder's policy ID.
- tiv_2015 is the total investment value in 2015 and tiv_2016 is the total investment value in 2016.
- lat is the latitude of the policyholder's city. It's guaranteed that lat is not NULL.
- lon is the longitude of the policyholder's city. It's guaranteed that lon is not NULL.
Write a solution to report the sum of all total investment values in 2016 (tiv_2016) for all policyholders who:
- have the same tiv_2015 value as one or more other policyholders, and
- are not located in the same city as any other policyholder (i.e., the (lat, lon) attribute pairs must be unique).

Round tiv_2016 to two decimal places.
The result format is in the following example.
Example 1:
Input:
Insurance table:
+-----+----------+----------+-----+-----+
| pid | tiv_2015 | tiv_2016 | lat | lon |
+-----+----------+----------+-----+-----+
| 1   | 10       | 5        | 10  | 10  |
| 2   | 20       | 20       | 20  | 20  |
| 3   | 10       | 30       | 20  | 20  |
| 4   | 10       | 40       | 40  | 40  |
+-----+----------+----------+-----+-----+

Output:
+----------+
| tiv_2016 |
+----------+
| 45.00    |
+----------+

Explanation:
The first record in the table, like the last record, meets both of the criteria. Its tiv_2015 value of 10 is shared with the third and fourth records, and its location is unique.
The second record meets neither criterion: its tiv_2015 is not shared with any other policyholder, and its location is the same as the third record's, which disqualifies the third record as well.
So the result is the sum of tiv_2016 of the first and last records: 5 + 40 = 45.
Problem Overview: You are given an Insurance table with pid, tiv_2015, tiv_2016, lat, and lon. The task is to sum tiv_2016 for policies where the tiv_2015 value appears more than once, but the (lat, lon) location is unique across the dataset.
Approach 1: SQL Query with Aggregation and Grouping (O(n log n) time, O(n) space)
This approach relies on SQL aggregation. First, group records by tiv_2015 and keep only values that appear more than once using GROUP BY tiv_2015 HAVING COUNT(*) > 1. Next, identify unique locations by grouping (lat, lon) and filtering where COUNT(*) = 1. Finally, sum tiv_2016 for rows that satisfy both conditions. Most SQL engines implement grouping using sorting or hashing internally, leading to roughly O(n log n) time with O(n) intermediate storage. This is the most direct solution when working with relational databases.
Approach 2: Structural Data Handling with In-Memory Computation (O(n) time, O(n) space)
Load the table into memory and compute frequencies using dictionaries. One map counts occurrences of tiv_2015, while another tracks occurrences of each (lat, lon) pair. After building both frequency maps, iterate through the records again and add tiv_2016 to the result when tiv_2015 frequency is greater than 1 and location frequency equals 1. Hash lookups make each operation constant time, producing overall O(n) complexity. This technique mirrors solutions built with hash tables.
Approach 3: Using Sorting and Two-Pointer Technique (O(n log n) time, O(1)–O(n) space)
Sort records by tiv_2015 to quickly detect duplicates and by (lat, lon) to detect unique locations. After sorting, use pointer scans to mark which entries share the same tiv_2015 and which locations repeat. Sorting costs O(n log n), but the scanning phase is linear. This approach avoids hash tables and works well when the dataset must be processed in deterministic order. It connects closely with patterns from sorting algorithms and pointer-based scans.
Approach 4: Hash Table for O(n) Complexity (O(n) time, O(n) space)
A more explicit version of the in-memory method. Maintain two hash tables: one mapping tiv_2015 to frequency and another mapping the string or tuple (lat, lon) to frequency. After the counting pass, iterate once more and accumulate tiv_2016 when tiv_2015 count > 1 and location count = 1. Each insertion and lookup in the hash table is constant time, so the total runtime stays O(n). This is the optimal algorithmic solution outside of SQL environments and relies on concepts from database-style filtering and hashing.
Recommended for interviews: Interviewers typically expect either the hash-based counting approach or the SQL aggregation solution, depending on the role. Walking through the core logic (count duplicate tiv_2015 values, then filter for unique coordinates) demonstrates understanding of the data constraints, and implementing it with hash maps in O(n) time shows strong practical problem-solving skills.
This approach involves writing a SQL query to filter and compute the sum of all total investment values in 2016 (tiv_2016) for policyholders, ensuring they satisfy the given criteria.
In essence, we utilize SQL's aggregation functions alongside grouping features to systematically filter the data for our specific needs. This is accomplished by creating temporary tables/subqueries to collect and filter specific required datasets.
This SQL solution uses subqueries to filter the dataset according to the two constraints:
- The first subquery keeps tiv_2015 values that are shared by more than one policyholder, using GROUP BY tiv_2015 HAVING COUNT(DISTINCT pid) > 1.
- The second subquery keeps (lat, lon) pairs that belong to exactly one policyholder, using GROUP BY lat, lon HAVING COUNT(DISTINCT pid) = 1.
- The outer query sums the matching tiv_2016 values and rounds the result to two decimal places using SQL's ROUND function.
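One way to write the query these steps describe is sketched below; it uses MySQL's row-constructor IN syntax for the location filter.

```sql
SELECT ROUND(SUM(tiv_2016), 2) AS tiv_2016
FROM Insurance
WHERE tiv_2015 IN (
    -- tiv_2015 values shared by more than one policyholder
    SELECT tiv_2015
    FROM Insurance
    GROUP BY tiv_2015
    HAVING COUNT(DISTINCT pid) > 1
)
AND (lat, lon) IN (
    -- locations that belong to exactly one policyholder
    SELECT lat, lon
    FROM Insurance
    GROUP BY lat, lon
    HAVING COUNT(DISTINCT pid) = 1
);
```

On the example table, the filters keep pid 1 and pid 4, so the query returns 45.00.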
Time Complexity: O(n log n) due to sorting in aggregation and grouping.
Space Complexity: O(n) to store intermediate datasets in memory.
This approach involves using in-memory computations through programming languages with efficient data manipulation capabilities. We load the data into memory and use data structures like dictionaries and sets to maintain unique conditions and compute the resultant sum.
Utilize hashmaps to track matching tiv_2015 values and cities uniquely defined by (lat, lon).
This solution involves several parts:
- tiv_2015_count keeps track of how many policyholders share the same tiv_2015 value.
- city_set and unique_cities are used to ensure uniqueness of the city based on the (lat, lon) combination.
- A final pass sums the tiv_2016 values of the eligible policyholders.
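A minimal Python sketch of this idea, assuming the table has been loaded into memory as a list of (pid, tiv_2015, tiv_2016, lat, lon) tuples (the function name and input format are illustrative):

```python
from collections import Counter

def sum_tiv_2016(rows):
    """Sum tiv_2016 over rows with a repeated tiv_2015 and a unique city."""
    rows = list(rows)
    # How many policyholders share each tiv_2015 value.
    tiv_2015_count = Counter(r[1] for r in rows)

    city_set = set()       # every (lat, lon) seen so far
    unique_cities = set()  # (lat, lon) pairs seen exactly once
    for _, _, _, lat, lon in rows:
        city = (lat, lon)
        if city in city_set:
            unique_cities.discard(city)  # seen again: no longer unique
        else:
            city_set.add(city)
            unique_cities.add(city)

    # Final pass: accumulate tiv_2016 for eligible policyholders.
    total = sum(
        tiv_2016
        for _, tiv_2015, tiv_2016, lat, lon in rows
        if tiv_2015_count[tiv_2015] > 1 and (lat, lon) in unique_cities
    )
    return round(total, 2)
```

For the example table, sum_tiv_2016([(1, 10, 5, 10, 10), (2, 20, 20, 20, 20), (3, 10, 30, 20, 20), (4, 10, 40, 40, 40)]) returns 45.0.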
Time Complexity: O(n) as it involves two traversals over the dataset.
Space Complexity: O(n) due to the additional space required for hashmaps/databases for storing intermediate results.
This approach sorts the records instead of hashing them. Sorting by tiv_2015 brings equal investment values next to each other, so shared values can be flagged in one linear scan; sorting by (lat, lon) brings identical locations next to each other, so repeated locations can be flagged the same way.
The scan compares adjacent entries in each sorted order: whenever two neighbors share a tiv_2015 value, both are marked as duplicates, and whenever two neighbors share a location, both are marked as non-unique. Rows flagged as tiv_2015 duplicates whose locations were never flagged contribute their tiv_2016 to the sum.
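A Python sketch of this sorted-scan idea, under the same assumed tuple format as above (the helper name is illustrative):

```python
def sum_tiv_2016_sorted(rows):
    """Sorting-based variant: no hash tables, two adjacency scans."""
    rows = list(rows)
    n = len(rows)
    dup_tiv = [False] * n     # row shares its tiv_2015 with another row
    unique_loc = [True] * n   # row's (lat, lon) appears exactly once

    # Sort row indices by tiv_2015 so equal values are adjacent.
    by_tiv = sorted(range(n), key=lambda i: rows[i][1])
    for a, b in zip(by_tiv, by_tiv[1:]):
        if rows[a][1] == rows[b][1]:
            dup_tiv[a] = dup_tiv[b] = True

    # Sort row indices by (lat, lon) so repeated locations are adjacent.
    by_loc = sorted(range(n), key=lambda i: (rows[i][3], rows[i][4]))
    for a, b in zip(by_loc, by_loc[1:]):
        if (rows[a][3], rows[a][4]) == (rows[b][3], rows[b][4]):
            unique_loc[a] = unique_loc[b] = False

    total = sum(rows[i][2] for i in range(n) if dup_tiv[i] and unique_loc[i])
    return round(total, 2)
```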
Time Complexity: O(n log n) due to sorting.
Space Complexity: O(1) auxiliary if the records are sorted in place; O(n) when index or flag arrays are used to mark duplicate values and repeated locations, matching the O(1)–O(n) range noted above.
By utilizing hash tables, we can count how often each tiv_2015 value and each (lat, lon) pair occurs as we traverse the list. This allows us to identify the qualifying rows in O(n) time overall.
The program builds two frequency maps in a single counting pass: one keyed by tiv_2015 value and one keyed by (lat, lon). A second pass then accumulates tiv_2016 for every row whose tiv_2015 count is greater than 1 and whose location count equals 1.
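A sketch using plain dictionaries as the hash tables (same assumed tuple format; the function name is illustrative):

```python
def sum_tiv_2016_hash(rows):
    """Explicit two-hash-table variant of the frequency-counting approach."""
    rows = list(rows)
    tiv_freq = {}  # tiv_2015 value -> number of policyholders
    loc_freq = {}  # (lat, lon) tuple -> number of policyholders
    for _, tiv_2015, _, lat, lon in rows:
        tiv_freq[tiv_2015] = tiv_freq.get(tiv_2015, 0) + 1
        loc_freq[(lat, lon)] = loc_freq.get((lat, lon), 0) + 1

    # Accumulate tiv_2016 when the value repeats and the location is unique.
    total = sum(
        tiv_2016
        for _, tiv_2015, tiv_2016, lat, lon in rows
        if tiv_freq[tiv_2015] > 1 and loc_freq[(lat, lon)] == 1
    )
    return round(total, 2)
```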
Time Complexity: O(n), as each insertion and lookup in the hash table is O(1).
Space Complexity: O(n) for storing the hash table values.
| Approach | Time | Space | When to Use |
|---|---|---|---|
| SQL Aggregation and Grouping | O(n log n) | O(n) | Best when solving directly in a relational database using GROUP BY and HAVING |
| In-Memory Structural Processing (Python) | O(n) | O(n) | When the dataset is loaded into application memory and fast lookups are needed |
| Sorting + Two Pointer Scan | O(n log n) | O(1)–O(n) | Useful when hash tables are restricted or deterministic ordering is required |
| Hash Table Frequency Counting | O(n) | O(n) | Optimal general solution for application-layer implementations |