Watch the video solution for Leetcodify Similar Friends, a hard level problem involving Database. This walkthrough by Everyday Data Science has 687 views views. Want to try solving it yourself? Practice on FleetCode or read the detailed text solution.
Table: Listens
+-------------+---------+ | Column Name | Type | +-------------+---------+ | user_id | int | | song_id | int | | day | date | +-------------+---------+ This table may contain duplicate rows. Each row of this table indicates that the user user_id listened to the song song_id on the day day.
Table: Friendship
+---------------+---------+ | Column Name | Type | +---------------+---------+ | user1_id | int | | user2_id | int | +---------------+---------+ (user1_id, user2_id) is the primary key (combination of columns with unique values) for this table. Each row of this table indicates that the users user1_id and user2_id are friends. Note that user1_id < user2_id.
Write a solution to report the similar friends of Leetcodify users. A user x and user y are similar friends if:
x and y are friends, andx and y listened to the same three or more different songs on the same day.Return the result table in any order. Note that you must return the similar pairs of friends the same way they were represented in the input (i.e., always user1_id < user2_id).
The result format is in the following example.
Example 1:
Input: Listens table: +---------+---------+------------+ | user_id | song_id | day | +---------+---------+------------+ | 1 | 10 | 2021-03-15 | | 1 | 11 | 2021-03-15 | | 1 | 12 | 2021-03-15 | | 2 | 10 | 2021-03-15 | | 2 | 11 | 2021-03-15 | | 2 | 12 | 2021-03-15 | | 3 | 10 | 2021-03-15 | | 3 | 11 | 2021-03-15 | | 3 | 12 | 2021-03-15 | | 4 | 10 | 2021-03-15 | | 4 | 11 | 2021-03-15 | | 4 | 13 | 2021-03-15 | | 5 | 10 | 2021-03-16 | | 5 | 11 | 2021-03-16 | | 5 | 12 | 2021-03-16 | +---------+---------+------------+ Friendship table: +----------+----------+ | user1_id | user2_id | +----------+----------+ | 1 | 2 | | 2 | 4 | | 2 | 5 | +----------+----------+ Output: +----------+----------+ | user1_id | user2_id | +----------+----------+ | 1 | 2 | +----------+----------+ Explanation: Users 1 and 2 are friends, and they listened to songs 10, 11, and 12 on the same day. They are similar friends. Users 1 and 3 listened to songs 10, 11, and 12 on the same day, but they are not friends. Users 2 and 4 are friends, but they did not listen to the same three different songs. Users 2 and 5 are friends and listened to songs 10, 11, and 12, but they did not listen to them on the same day.
Problem Overview: You have a listening log that records user_id, song_id, and day, plus a table of friendships. The goal is to find pairs of friends who listened to at least three of the same songs on the same day. The result returns the two user IDs and the day where their listening overlap reaches three or more songs.
Approach 1: Self Join on Listening History (O(n²) join, O(n) space)
The core idea is to compare listening records between two users. Perform a self join on the Listens table where L1.song_id = L2.song_id and L1.day = L2.day. Enforce L1.user_id < L2.user_id to avoid duplicate pairs and reversed combinations. Then join this result with the Friendship table to keep only valid friend pairs. After matching songs listened on the same day, aggregate with GROUP BY user1, user2, day and filter using HAVING COUNT(DISTINCT song_id) >= 3. This approach directly models the problem: iterate through shared listens and count how many overlaps each friend pair has per day.
Approach 2: Pre-Aggregation with CTE or Derived Table (O(n²) join, reduced intermediate rows)
Instead of joining the full listening table immediately, first normalize the listening data so each (user_id, song_id, day) appears once. A derived table or CTE removes duplicates using SELECT DISTINCT. Then perform the same self join between users on matching song_id and day. This reduces redundant rows if users listen to the same song multiple times in a day. After the join, group by the user pair and day, and apply HAVING COUNT(*) >= 3. The logical complexity remains similar, but the join size can shrink significantly on noisy datasets.
Both approaches rely heavily on relational operations: self joins, grouping, and filtering with aggregation. Understanding how SQL handles join cardinality is key when dealing with activity logs like this. Problems like this commonly appear in database and SQL interview rounds because they test your ability to reason about relationships and aggregates across multiple tables.
Recommended for interviews: The self-join with aggregation is the expected solution. It clearly demonstrates how to match rows across users, enforce friend relationships through a join, and compute overlaps using GROUP BY and HAVING. A brute-force mindset helps you recognize that every pair of users must be compared, but the SQL join expresses that comparison efficiently using relational algebra and joins.
| Approach | Time | Space | When to Use |
|---|---|---|---|
| Self Join + GROUP BY | O(n²) join worst case | O(n) | Standard SQL solution when comparing activity between two users |
| Pre‑Aggregated Listening + Self Join | O(n²) join but fewer rows | O(n) | Better when listening logs contain duplicates or repeated plays |