Investments in 2016 is a medium-level Database problem. This walkthrough by Learn With Chirag covers the SQL solution along with several algorithmic alternatives.
Table: Insurance
+-------------+-------+
| Column Name | Type  |
+-------------+-------+
| pid         | int   |
| tiv_2015    | float |
| tiv_2016    | float |
| lat         | float |
| lon         | float |
+-------------+-------+

pid is the primary key (column with unique values) for this table. Each row of this table contains information about one policy where:

- pid is the policyholder's policy ID.
- tiv_2015 is the total investment value in 2015 and tiv_2016 is the total investment value in 2016.
- lat is the latitude of the policyholder's city. It's guaranteed that lat is not NULL.
- lon is the longitude of the policyholder's city. It's guaranteed that lon is not NULL.
Write a solution to report the sum of all total investment values in 2016 (tiv_2016), for all policyholders who:

- have the same tiv_2015 value as one or more other policyholders, and
- are not located in the same city as any other policyholder (i.e., the (lat, lon) attribute pairs must be unique).

Round tiv_2016 to two decimal places.
The result format is in the following example.
Example 1:
Input:
Insurance table:
+-----+----------+----------+-----+-----+
| pid | tiv_2015 | tiv_2016 | lat | lon |
+-----+----------+----------+-----+-----+
| 1   | 10       | 5        | 10  | 10  |
| 2   | 20       | 20       | 20  | 20  |
| 3   | 10       | 30       | 20  | 20  |
| 4   | 10       | 40       | 40  | 40  |
+-----+----------+----------+-----+-----+
Output:
+----------+
| tiv_2016 |
+----------+
| 45.00    |
+----------+
Explanation: The first record in the table, like the last record, meets both criteria. Its tiv_2015 value of 10 also appears in the third and fourth records, and its location is unique. The second record meets neither criterion: its tiv_2015 is not shared with any other policyholder, and its location is the same as the third record's, which makes the third record fail too. So the result is the sum of tiv_2016 of the first and last records, which is 45.
Problem Overview: You are given an Insurance table with pid, tiv_2015, tiv_2016, lat, and lon. The task is to sum tiv_2016 for policies where the tiv_2015 value appears more than once, but the (lat, lon) location is unique across the dataset.
Approach 1: SQL Query with Aggregation and Grouping (O(n log n) time, O(n) space)
This approach relies on SQL aggregation. First, group records by tiv_2015 and keep only values that appear more than once using GROUP BY tiv_2015 HAVING COUNT(*) > 1. Next, identify unique locations by grouping (lat, lon) and filtering where COUNT(*) = 1. Finally, sum tiv_2016 for rows that satisfy both conditions. Most SQL engines implement grouping using sorting or hashing internally, leading to roughly O(n log n) time with O(n) intermediate storage. This is the most direct solution when working with relational databases.
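As a runnable sketch of this query, the following uses Python's sqlite3 module with the example data from above; the table name and schema follow the problem statement, and the row-value comparison `(lat, lon) IN (...)` requires SQLite 3.15 or later.

```python
import sqlite3

# In-memory SQLite table mirroring the Insurance schema from the problem.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Insurance (pid INT, tiv_2015 REAL, tiv_2016 REAL, lat REAL, lon REAL)"
)
conn.executemany(
    "INSERT INTO Insurance VALUES (?, ?, ?, ?, ?)",
    [(1, 10, 5, 10, 10), (2, 20, 20, 20, 20), (3, 10, 30, 20, 20), (4, 10, 40, 40, 40)],
)

query = """
SELECT ROUND(SUM(tiv_2016), 2) AS tiv_2016
FROM Insurance
WHERE tiv_2015 IN (             -- tiv_2015 shared with at least one other policy
    SELECT tiv_2015 FROM Insurance GROUP BY tiv_2015 HAVING COUNT(*) > 1
)
AND (lat, lon) IN (             -- location appears exactly once
    SELECT lat, lon FROM Insurance GROUP BY lat, lon HAVING COUNT(*) = 1
)
"""
print(conn.execute(query).fetchone()[0])  # prints 45.0
```

On MySQL (where this problem is usually solved), the same query works unchanged, since MySQL also supports row-value subquery comparisons.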
Approach 2: Structural Data Handling with In-Memory Computation (O(n) time, O(n) space)
Load the table into memory and compute frequencies using dictionaries. One map counts occurrences of tiv_2015, while another tracks occurrences of each (lat, lon) pair. After building both frequency maps, iterate through the records again and add tiv_2016 to the result when tiv_2015 frequency is greater than 1 and location frequency equals 1. Hash lookups make each operation constant time, producing overall O(n) complexity. This technique mirrors solutions built with hash tables.
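A minimal sketch of this two-pass counting approach, assuming each row is a (pid, tiv_2015, tiv_2016, lat, lon) tuple; the function name is illustrative.

```python
from collections import defaultdict

def sum_tiv_2016(rows):
    """rows: iterable of (pid, tiv_2015, tiv_2016, lat, lon) tuples."""
    tiv_count = defaultdict(int)   # frequency of each tiv_2015 value
    loc_count = defaultdict(int)   # frequency of each (lat, lon) pair
    for _, tiv15, _, lat, lon in rows:
        tiv_count[tiv15] += 1
        loc_count[(lat, lon)] += 1
    # Second pass: keep rows whose tiv_2015 repeats but whose location is unique.
    total = sum(
        tiv16
        for _, tiv15, tiv16, lat, lon in rows
        if tiv_count[tiv15] > 1 and loc_count[(lat, lon)] == 1
    )
    return round(total, 2)

rows = [(1, 10, 5, 10, 10), (2, 20, 20, 20, 20), (3, 10, 30, 20, 20), (4, 10, 40, 40, 40)]
print(sum_tiv_2016(rows))  # prints 45
```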
Approach 3: Using Sorting and Two-Pointer Technique (O(n log n) time, O(1)–O(n) space)
Sort records by tiv_2015 to quickly detect duplicates and by (lat, lon) to detect unique locations. After sorting, use pointer scans to mark which entries share the same tiv_2015 and which locations repeat. Sorting costs O(n log n), but the scanning phase is linear. This approach avoids hash tables and works well when the dataset must be processed in deterministic order. It connects closely with patterns from sorting algorithms and pointer-based scans.
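The adjacent-comparison idea can be sketched as follows: sort index arrays rather than the rows themselves, and mark flags by comparing each element with its neighbor. Row layout and function name are illustrative assumptions.

```python
def sum_tiv_2016_sorted(rows):
    """rows: list of (pid, tiv_2015, tiv_2016, lat, lon) tuples."""
    n = len(rows)
    dup_tiv = [False] * n      # tiv_2015 shared with another row?
    unique_loc = [True] * n    # (lat, lon) appears only once?

    # Scan indices ordered by tiv_2015: equal neighbors share a duplicate value.
    by_tiv = sorted(range(n), key=lambda i: rows[i][1])
    for a, b in zip(by_tiv, by_tiv[1:]):
        if rows[a][1] == rows[b][1]:
            dup_tiv[a] = dup_tiv[b] = True

    # Scan indices ordered by location: equal neighbors are not unique.
    by_loc = sorted(range(n), key=lambda i: (rows[i][3], rows[i][4]))
    for a, b in zip(by_loc, by_loc[1:]):
        if (rows[a][3], rows[a][4]) == (rows[b][3], rows[b][4]):
            unique_loc[a] = unique_loc[b] = False

    return round(sum(rows[i][2] for i in range(n) if dup_tiv[i] and unique_loc[i]), 2)
```

Sorting index arrays keeps the original row order available for the final summation pass, so no hash table is needed anywhere.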
Approach 4: Hash Table for O(n) Complexity (O(n) time, O(n) space)
A more explicit version of the in-memory method. Maintain two hash tables: one mapping tiv_2015 to frequency and another mapping the string or tuple (lat, lon) to frequency. After the counting pass, iterate once more and accumulate tiv_2016 when tiv_2015 count > 1 and location count = 1. Each insertion and lookup in the hash table is constant time, so the total runtime stays O(n). This is the optimal algorithmic solution outside of SQL environments and relies on concepts from database-style filtering and hashing.
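The same count-then-filter logic compresses nicely with collections.Counter, which makes the two frequency tables explicit; as before, the tuple layout and function name are assumptions for illustration.

```python
from collections import Counter

def investments_2016(rows):
    """rows: list of (pid, tiv_2015, tiv_2016, lat, lon) tuples."""
    # One pass each to build the two frequency tables.
    tiv_freq = Counter(r[1] for r in rows)
    loc_freq = Counter((r[3], r[4]) for r in rows)
    # Final pass: accumulate tiv_2016 where both conditions hold.
    return round(
        sum(r[2] for r in rows if tiv_freq[r[1]] > 1 and loc_freq[(r[3], r[4])] == 1),
        2,
    )
```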
Recommended for interviews: Interviewers typically expect either the hash-based counting approach or the SQL aggregation solution, depending on the role. Stating the core logic (count duplicates of tiv_2015, then keep only rows with unique coordinates) demonstrates understanding of the data constraints, and implementing it with hash maps in O(n) time shows strong practical problem-solving skills.
| Approach | Time | Space | When to Use |
|---|---|---|---|
| SQL Aggregation and Grouping | O(n log n) | O(n) | Best when solving directly in a relational database using GROUP BY and HAVING |
| In-Memory Structural Processing (Python) | O(n) | O(n) | When the dataset is loaded into application memory and fast lookups are needed |
| Sorting + Two Pointer Scan | O(n log n) | O(1)–O(n) | Useful when hash tables are restricted or deterministic ordering is required |
| Hash Table Frequency Counting | O(n) | O(n) | Optimal general solution for application-layer implementations |