Watch the video solution for Number of Comments per Post, a easy level problem involving Database. This walkthrough by Everyday Data Science has 12,288 views views. Want to try solving it yourself? Practice on FleetCode or read the detailed text solution.
Table: Submissions
+---------------+----------+
| Column Name | Type |
+---------------+----------+
| sub_id | int |
| parent_id | int |
+---------------+----------+
This table may have duplicate rows.
Each row can be a post or comment on the post.
parent_id is null for posts.
parent_id for comments is sub_id for another post in the table.
Write a solution to find the number of comments per post. The result table should contain post_id and its corresponding number_of_comments.
The Submissions table may contain duplicate comments. You should count the number of unique comments per post.
The Submissions table may contain duplicate posts. You should treat them as one post.
The result table should be ordered by post_id in ascending order.
The result format is in the following example.
Example 1:
Input: Submissions table: +---------+------------+ | sub_id | parent_id | +---------+------------+ | 1 | Null | | 2 | Null | | 1 | Null | | 12 | Null | | 3 | 1 | | 5 | 2 | | 3 | 1 | | 4 | 1 | | 9 | 1 | | 10 | 2 | | 6 | 7 | +---------+------------+ Output: +---------+--------------------+ | post_id | number_of_comments | +---------+--------------------+ | 1 | 3 | | 2 | 2 | | 12 | 0 | +---------+--------------------+ Explanation: The post with id 1 has three comments in the table with id 3, 4, and 9. The comment with id 3 is repeated in the table, we counted it only once. The post with id 2 has two comments in the table with id 5 and 10. The post with id 12 has no comments in the table. The comment with id 6 is a comment on a deleted post with id 7 so we ignored it.
Problem Overview: The table stores both posts and comments in the same dataset. Posts have parent_id as NULL, while comments reference the post using parent_id. The goal is to return every post with the number of comments it received, including posts that have zero comments.
Approach 1: Self Join with GROUP BY (O(n) time, O(1) extra space)
This approach treats posts and comments as two logical views of the same table. First, filter rows where parent_id IS NULL to identify posts. Then perform a LEFT JOIN between posts and the same table where comment.parent_id = post.sub_id. The join connects each post with its comments. Finally, use COUNT(comment.sub_id) and GROUP BY post.sub_id to compute the number of comments per post. The LEFT JOIN ensures posts without comments still appear in the result with a count of 0.
This pattern is common in database interview problems where hierarchical data exists inside a single table. The key insight is recognizing that comments reference their parent post through a foreign key. By joining the table to itself, you reconstruct the relationship and aggregate the results efficiently. The database engine scans the table once and performs grouping, giving roughly O(n) time complexity depending on indexing.
Approach 2: Correlated Subquery Counting (O(n²) worst case, O(1) space)
Another way is to select all posts and compute the comment count using a correlated subquery. For each post row, run a subquery that counts rows where parent_id = post.sub_id. The query structure looks like SELECT sub_id, (SELECT COUNT(*) FROM Submissions c WHERE c.parent_id = p.sub_id). While this solution is easy to read, the database may execute the subquery repeatedly for each post. With many posts, this can degrade toward O(n²) time.
This pattern appears frequently in SQL exercises, but it's usually less efficient than aggregation with joins. It works best when the dataset is small or when indexes on parent_id make the lookup cheap.
Recommended for interviews: The self join with LEFT JOIN and GROUP BY is the expected solution. It demonstrates understanding of relational modeling, aggregation, and joins. Mentioning the correlated subquery shows awareness of alternative approaches, but the join-based aggregation is the scalable and production-friendly solution.
| Approach | Time | Space | When to Use |
|---|---|---|---|
| Self Join + GROUP BY | O(n) | O(1) | Best general solution. Efficient aggregation when counting related rows. |
| Correlated Subquery | O(n²) worst case | O(1) | Readable query for small datasets or quick prototypes. |