Table: Files
+-------------+---------+ | Column Name | Type | +-- ----------+---------+ | file_name | varchar | | content | text | +-------------+---------+ file_name is the column with unique values of this table. Each row contains file_name and the content of that file.
Write a solution to find the number of files that have at least one occurrence of the words 'bull' and 'bear' as a standalone word, respectively, disregarding any instances where it appears without space on either side (e.g. 'bullet', 'bears', 'bull.', or 'bear' at the beginning or end of a sentence will not be considered)
Return the word 'bull' and 'bear' along with the corresponding number of occurrences in any order.
The result format is in the following example.
Example 1:
Input: Files table: +------------+----------------------------------------------------------------------------------+ | file_name | content | +------------+----------------------------------------------------------------------------------+ | draft1.txt | The stock exchange predicts a bull market which would make many investors happy. | | draft2.txt | The stock exchange predicts a bull market which would make many investors happy, | | | but analysts warn of possibility of too much optimism and that in fact we are | | | awaiting a bear market. | | draft3.txt | The stock exchange predicts a bull market which would make many investors happy, | | | but analysts warn of possibility of too much optimism and that in fact we are | | | awaiting a bear market. As always predicting the future market is an uncertain | | | game and all investors should follow their instincts and best practices. | +------------+----------------------------------------------------------------------------------+ Output: +------+-------+ | word | count | +------+-------+ | bull | 3 | | bear | 2 | +------+-------+ Explanation: - The word "bull" appears 1 time in "draft1.txt", 1 time in "draft2.txt", and 1 time in "draft3.txt". Therefore, the total number of occurrences for the word "bull" is 3. - The word "bear" appears 1 time in "draft2.txt", and 1 time in "draft3.txt". Therefore, the total number of occurrences for the word "bear" is 2.
Problem Overview: You have a table containing text content. The goal is to count how many times specific keywords (for example bull and bear) appear across all rows. Each occurrence inside the text must be counted, even if multiple matches appear in the same string.
Approach 1: LENGTH and REPLACE Trick (O(n) time, O(1) space)
The common SQL technique for counting substring occurrences uses the difference in string length before and after removing the target word. When you call REPLACE(content, 'bull', ''), every occurrence of the word is removed from the string. The difference between LENGTH(content) and the replaced string tells you how many characters were removed. Divide that difference by the length of the word to get the number of occurrences.
Apply this formula to each row and aggregate the result using SUM(). For example: (LENGTH(content) - LENGTH(REPLACE(content,'bull',''))) / 4 counts occurrences of the word bull. Repeat the same logic for other keywords like bear. Since SQL scans each text once per expression, the runtime is linear with respect to the total text length.
This method works well because it avoids complex parsing or tokenization. SQL engines optimize built-in string functions efficiently, making this approach concise and performant for moderate datasets. It also works consistently across many relational systems discussed in database problems that involve string processing and aggregation queries.
Approach 2: REGEXP-based Counting (O(n) time, O(1) space)
If your SQL engine supports regex counting functions such as REGEXP_COUNT(), you can directly count pattern matches inside the text. This approach expresses the intent more clearly: REGEXP_COUNT(content, 'bull'). You then aggregate counts across all rows. The database internally scans the string using its regex engine, producing the same linear-time behavior.
Regex-based solutions are easier to read but may not be available in every MySQL version used in interview platforms. The LENGTH - REPLACE trick remains the most portable technique for substring frequency queries.
Recommended for interviews: Use the LENGTH - REPLACE method. It shows that you understand SQL string manipulation and aggregation. Regex functions are cleaner when available, but interviewers typically expect the portable approach used in many SQL interview questions.
MySQL
| Approach | Time | Space | When to Use |
|---|---|---|---|
| LENGTH - REPLACE String Difference | O(n) | O(1) | Most common SQL interview technique; works across MySQL versions without regex support |
| REGEXP_COUNT Pattern Matching | O(n) | O(1) | Cleaner syntax when the SQL engine supports regex counting functions |
Leetcode MEDIUM 2738 - Count Occurrences in Text - Using SUM to Count in SQL | Everyday Data Science • Everyday Data Science • 652 views views
Practice Count Occurrences in Text with our built-in code editor and test cases.
Practice on FleetCodePractice this problem
Open in Editor