Table: Insurance
+-------------+-------+
| Column Name | Type  |
+-------------+-------+
| pid         | int   |
| tiv_2015    | float |
| tiv_2016    | float |
| lat         | float |
| lon         | float |
+-------------+-------+

pid is the primary key (column with unique values) for this table.
Each row of this table contains information about one policy where:
- pid is the policyholder's policy ID.
- tiv_2015 is the total investment value in 2015 and tiv_2016 is the total investment value in 2016.
- lat is the latitude of the policyholder's city. It's guaranteed that lat is not NULL.
- lon is the longitude of the policyholder's city. It's guaranteed that lon is not NULL.
Write a solution to report the sum of all total investment values in 2016 (tiv_2016) for all policyholders who:
- have the same tiv_2015 value as one or more other policyholders, and
- are not located in the same city as any other policyholder (i.e., the (lat, lon) attribute pairs must be unique).

Round tiv_2016 to two decimal places.
The result format is in the following example.
Example 1:
Input:
Insurance table:
+-----+----------+----------+-----+-----+
| pid | tiv_2015 | tiv_2016 | lat | lon |
+-----+----------+----------+-----+-----+
| 1   | 10       | 5        | 10  | 10  |
| 2   | 20       | 20       | 20  | 20  |
| 3   | 10       | 30       | 20  | 20  |
| 4   | 10       | 40       | 40  | 40  |
+-----+----------+----------+-----+-----+

Output:
+----------+
| tiv_2016 |
+----------+
| 45.00    |
+----------+

Explanation:
The first record in the table, like the last record, meets both of the criteria. Its tiv_2015 value of 10 is shared with the third and fourth records, and its location is unique.
The second record meets neither criterion: its tiv_2015 is not shared with any other policyholder, and its location is the same as the third record's, which disqualifies the third record as well.
So the result is the sum of tiv_2016 of the first and last records: 5 + 40 = 45.
Problem Overview: You are given an Insurance table with pid, tiv_2015, tiv_2016, lat, and lon. The task is to sum tiv_2016 for policies where the tiv_2015 value appears more than once, but the (lat, lon) location is unique across the dataset.
Approach 1: SQL Query with Aggregation and Grouping (O(n log n) time, O(n) space)
This approach relies on SQL aggregation. First, group records by tiv_2015 and keep only values that appear more than once using GROUP BY tiv_2015 HAVING COUNT(*) > 1. Next, identify unique locations by grouping (lat, lon) and filtering where COUNT(*) = 1. Finally, sum tiv_2016 for rows that satisfy both conditions. Most SQL engines implement grouping using sorting or hashing internally, leading to roughly O(n log n) time with O(n) intermediate storage. This is the most direct solution when working with relational databases.
Approach 2: Structural Data Handling with In-Memory Computation (O(n) time, O(n) space)
Load the table into memory and compute frequencies using dictionaries. One map counts occurrences of tiv_2015, while another tracks occurrences of each (lat, lon) pair. After building both frequency maps, iterate through the records again and add tiv_2016 to the result when tiv_2015 frequency is greater than 1 and location frequency equals 1. Hash lookups make each operation constant time, producing overall O(n) complexity. This technique mirrors solutions built with hash tables.
Approach 3: Using Sorting and Two-Pointer Technique (O(n log n) time, O(1)–O(n) space)
Sort records by tiv_2015 to quickly detect duplicates and by (lat, lon) to detect unique locations. After sorting, use pointer scans to mark which entries share the same tiv_2015 and which locations repeat. Sorting costs O(n log n), but the scanning phase is linear. This approach avoids hash tables and works well when the dataset must be processed in deterministic order. It connects closely with patterns from sorting algorithms and pointer-based scans.
Approach 4: Hash Table for O(n) Complexity (O(n) time, O(n) space)
A more explicit version of the in-memory method. Maintain two hash tables: one mapping tiv_2015 to frequency and another mapping the string or tuple (lat, lon) to frequency. After the counting pass, iterate once more and accumulate tiv_2016 when tiv_2015 count > 1 and location count = 1. Each insertion and lookup in the hash table is constant time, so the total runtime stays O(n). This is the optimal algorithmic solution outside of SQL environments and relies on concepts from database-style filtering and hashing.
Recommended for interviews: Interviewers typically expect either the hash-based counting approach or the SQL aggregation solution, depending on the role. Walking through the core logic (count duplicate tiv_2015 values, then filter for unique coordinates) demonstrates understanding of the data constraints, and implementing it with hash maps in O(n) time shows strong practical problem-solving skills.
This approach involves writing a SQL query to filter and compute the sum of all total investment values in 2016 (tiv_2016) for policyholders, ensuring they satisfy the given criteria.
In essence, we utilize SQL's aggregation functions alongside grouping features to systematically filter the data for our specific needs. This is accomplished by creating temporary tables/subqueries to collect and filter specific required datasets.
This SQL solution uses subqueries to filter the dataset according to the two constraints:
- The first subquery keeps tiv_2015 values that are shared by more than one policyholder, using GROUP BY tiv_2015 HAVING COUNT(DISTINCT pid) > 1.
- The second subquery keeps (lat, lon) pairs that belong to exactly one policyholder, using GROUP BY lat, lon HAVING COUNT(DISTINCT pid) = 1.
- The outer query sums the matching tiv_2016 values and rounds the result to two decimal places using SQL's ROUND function.
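One way to write the query these steps describe is sketched below; it uses MySQL's row-constructor IN syntax for the location filter.

```sql
SELECT ROUND(SUM(tiv_2016), 2) AS tiv_2016
FROM Insurance
WHERE tiv_2015 IN (
    -- tiv_2015 values shared by more than one policyholder
    SELECT tiv_2015
    FROM Insurance
    GROUP BY tiv_2015
    HAVING COUNT(DISTINCT pid) > 1
)
AND (lat, lon) IN (
    -- locations that belong to exactly one policyholder
    SELECT lat, lon
    FROM Insurance
    GROUP BY lat, lon
    HAVING COUNT(DISTINCT pid) = 1
);
```

On the example table, the filters keep pid 1 and pid 4, so the query returns 45.00.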
Time Complexity: O(n log n) due to sorting in aggregation and grouping.
Space Complexity: O(n) to store intermediate datasets in memory.
This approach involves using in-memory computations through programming languages with efficient data manipulation capabilities. We load the data into memory and use data structures like dictionaries and sets to maintain unique conditions and compute the resultant sum.
Utilize hashmaps to track matching tiv_2015 values and cities uniquely defined by (lat, lon).
This solution involves several parts:
- tiv_2015_count keeps track of how many policyholders share the same tiv_2015 value.
- city_set and unique_cities are used to ensure uniqueness of the city based on the (lat, lon) combination.
- A final pass sums the tiv_2016 values of the eligible policyholders.
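A minimal Python sketch of this idea, assuming the table has been loaded into memory as a list of (pid, tiv_2015, tiv_2016, lat, lon) tuples (the function name and input format are illustrative):

```python
from collections import Counter

def sum_tiv_2016(rows):
    """Sum tiv_2016 over rows with a repeated tiv_2015 and a unique city."""
    rows = list(rows)
    # How many policyholders share each tiv_2015 value.
    tiv_2015_count = Counter(r[1] for r in rows)

    city_set = set()       # every (lat, lon) seen so far
    unique_cities = set()  # (lat, lon) pairs seen exactly once
    for _, _, _, lat, lon in rows:
        city = (lat, lon)
        if city in city_set:
            unique_cities.discard(city)  # seen again: no longer unique
        else:
            city_set.add(city)
            unique_cities.add(city)

    # Final pass: accumulate tiv_2016 for eligible policyholders.
    total = sum(
        tiv_2016
        for _, tiv_2015, tiv_2016, lat, lon in rows
        if tiv_2015_count[tiv_2015] > 1 and (lat, lon) in unique_cities
    )
    return round(total, 2)
```

For the example table, sum_tiv_2016([(1, 10, 5, 10, 10), (2, 20, 20, 20, 20), (3, 10, 30, 20, 20), (4, 10, 40, 40, 40)]) returns 45.0.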
Time Complexity: O(n) as it involves two traversals over the dataset.
Space Complexity: O(n) due to the additional space required for hashmaps/databases for storing intermediate results.
This approach sorts the records instead of hashing them. Sorting by tiv_2015 brings equal investment values next to each other, so shared values can be flagged in one linear scan; sorting by (lat, lon) brings identical locations next to each other, so repeated locations can be flagged the same way.
The scan compares adjacent entries in each sorted order: whenever two neighbors share a tiv_2015 value, both are marked as duplicates, and whenever two neighbors share a location, both are marked as non-unique. Rows flagged as tiv_2015 duplicates whose locations were never flagged contribute their tiv_2016 to the sum.
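A Python sketch of this sorted-scan idea, under the same assumed tuple format as above (the helper name is illustrative):

```python
def sum_tiv_2016_sorted(rows):
    """Sorting-based variant: no hash tables, two adjacency scans."""
    rows = list(rows)
    n = len(rows)
    dup_tiv = [False] * n     # row shares its tiv_2015 with another row
    unique_loc = [True] * n   # row's (lat, lon) appears exactly once

    # Sort row indices by tiv_2015 so equal values are adjacent.
    by_tiv = sorted(range(n), key=lambda i: rows[i][1])
    for a, b in zip(by_tiv, by_tiv[1:]):
        if rows[a][1] == rows[b][1]:
            dup_tiv[a] = dup_tiv[b] = True

    # Sort row indices by (lat, lon) so repeated locations are adjacent.
    by_loc = sorted(range(n), key=lambda i: (rows[i][3], rows[i][4]))
    for a, b in zip(by_loc, by_loc[1:]):
        if (rows[a][3], rows[a][4]) == (rows[b][3], rows[b][4]):
            unique_loc[a] = unique_loc[b] = False

    total = sum(rows[i][2] for i in range(n) if dup_tiv[i] and unique_loc[i])
    return round(total, 2)
```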
Time Complexity: O(n log n) due to sorting.
Space Complexity: O(1) auxiliary if the records are sorted in place; O(n) when index or flag arrays are used to mark duplicate values and repeated locations, matching the O(1)–O(n) range noted above.
By utilizing hash tables, we can count how often each tiv_2015 value and each (lat, lon) pair occurs as we traverse the list. This allows us to identify the qualifying rows in O(n) time overall.
The program builds two frequency maps in a single counting pass: one keyed by tiv_2015 value and one keyed by (lat, lon). A second pass then accumulates tiv_2016 for every row whose tiv_2015 count is greater than 1 and whose location count equals 1.
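A sketch using plain dictionaries as the hash tables (same assumed tuple format; the function name is illustrative):

```python
def sum_tiv_2016_hash(rows):
    """Explicit two-hash-table variant of the frequency-counting approach."""
    rows = list(rows)
    tiv_freq = {}  # tiv_2015 value -> number of policyholders
    loc_freq = {}  # (lat, lon) tuple -> number of policyholders
    for _, tiv_2015, _, lat, lon in rows:
        tiv_freq[tiv_2015] = tiv_freq.get(tiv_2015, 0) + 1
        loc_freq[(lat, lon)] = loc_freq.get((lat, lon), 0) + 1

    # Accumulate tiv_2016 when the value repeats and the location is unique.
    total = sum(
        tiv_2016
        for _, tiv_2015, tiv_2016, lat, lon in rows
        if tiv_freq[tiv_2015] > 1 and loc_freq[(lat, lon)] == 1
    )
    return round(total, 2)
```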
Time Complexity: O(n), as each insertion and lookup in the hash table is O(1).
Space Complexity: O(n) for storing the hash table values.
| Approach | Time | Space | When to Use |
|---|---|---|---|
| SQL Aggregation and Grouping | O(n log n) | O(n) | Best when solving directly in a relational database using GROUP BY and HAVING |
| In-Memory Structural Processing (Python) | O(n) | O(n) | When the dataset is loaded into application memory and fast lookups are needed |
| Sorting + Two Pointer Scan | O(n log n) | O(1)–O(n) | Useful when hash tables are restricted or deterministic ordering is required |
| Hash Table Frequency Counting | O(n) | O(n) | Optimal general solution for application-layer implementations |