Table: Tweets
+-------------+---------+ | Column Name | Type | +-------------+---------+ | user_id | int | | tweet_id | int | | tweet_date | date | | tweet | varchar | +-------------+---------+ tweet_id is the primary key (column with unique values) for this table. Each row of this table contains user_id, tweet_id, tweet_date and tweet. It is guaranteed that all tweet_date are valid dates in February 2024.
Write a solution to find the top 3 trending hashtags in February 2024. Every tweet may contain several hashtags.
Return the result table ordered by count of hashtag, hashtag in descending order.
The result format is in the following example.
Example 1:
Input:
Tweets table:
+---------+----------+------------------------------------------------------------+------------+ | user_id | tweet_id | tweet | tweet_date | +---------+----------+------------------------------------------------------------+------------+ | 135 | 13 | Enjoying a great start to the day. #HappyDay #MorningVibes | 2024-02-01 | | 136 | 14 | Another #HappyDay with good vibes! #FeelGood | 2024-02-03 | | 137 | 15 | Productivity peaks! #WorkLife #ProductiveDay | 2024-02-04 | | 138 | 16 | Exploring new tech frontiers. #TechLife #Innovation | 2024-02-04 | | 139 | 17 | Gratitude for today's moments. #HappyDay #Thankful | 2024-02-05 | | 140 | 18 | Innovation drives us. #TechLife #FutureTech | 2024-02-07 | | 141 | 19 | Connecting with nature's serenity. #Nature #Peaceful | 2024-02-09 | +---------+----------+------------------------------------------------------------+------------+
Output:
+-----------+-------+ | hashtag | count | +-----------+-------+ | #HappyDay | 3 | | #TechLife | 2 | | #WorkLife | 1 | +-----------+-------+
Explanation:
Note: Output table is sorted in descending order by count and hashtag respectively.
Problem Overview: The task asks you to analyze post or message text and identify trending hashtags. Each post may contain multiple hashtags embedded in the text, so you must extract them, normalize them, and count how frequently they appear across the dataset to determine which ones are trending.
Approach 1: Regex Extraction + Aggregation (O(n * m) time, O(k) space)
The key challenge is extracting hashtags from free‑form text. A regular expression such as #\w+ identifies hashtag tokens directly from each post. You iterate through each row of text, apply regex extraction, and output each match as an individual record. After extraction, you perform aggregation with GROUP BY on the hashtag and compute counts. Sorting the results by frequency identifies trending tags. Time complexity is O(n * m), where n is the number of posts and m is the average text length processed by the regex engine. Space complexity is O(k) for storing unique hashtags.
This approach maps well to database queries where regex functions such as REGEXP_EXTRACT, REGEXP_SUBSTR, or equivalent features generate hashtag matches. Once the hashtags are extracted, standard SQL aggregation (COUNT, GROUP BY, ORDER BY) determines the most frequent ones.
Approach 2: Regex Extraction with Ranking Window Functions (O(n * m + k log k) time, O(k) space)
If the problem requires selecting only the top trending hashtags or ranking them, window functions simplify the query. After extracting hashtags using a regular expression, aggregate counts per tag. Then apply ranking functions such as RANK() or DENSE_RANK() ordered by frequency descending. This avoids manual sorting logic and integrates naturally into analytical SQL workflows. The complexity remains dominated by regex scanning of each post (O(n * m)), plus sorting the aggregated results (O(k log k)) where k is the number of distinct hashtags.
This technique commonly appears in SQL interview questions involving text analytics. Regex handles extraction, while grouping and ranking identify popularity patterns.
Recommended for interviews: Regex extraction combined with aggregation is the expected solution. Interviewers want to see that you can tokenize structured patterns from text using regular expressions, then leverage SQL grouping and sorting to compute frequency metrics. A naive manual string parsing approach demonstrates understanding of the problem, but the regex + aggregation method shows practical database querying skills.
We can use regular expressions to match all tags in each tweet, and then count the occurrence of each tag. Finally, we can sort the tags in descending order by the number of occurrences. If the number of occurrences is the same, we sort them in descending order by the tag name, and return the top three tags.
Python
| Approach | Time | Space | When to Use |
|---|---|---|---|
| Regex Extraction + GROUP BY Aggregation | O(n * m) | O(k) | General solution for extracting hashtags from text and counting their frequency |
| Regex Extraction + Ranking Window Functions | O(n * m + k log k) | O(k) | When the query requires top trending hashtags or ordered ranking |
| Manual String Parsing | O(n * m) | O(k) | Useful when regex functions are unavailable or restricted |
Find Trending Hashtags II • Owen Wu • 35 views views
Practice Find Trending Hashtags II with our built-in code editor and test cases.
Practice on FleetCodePractice this problem
Open in Editor