Given a string paragraph and a string array of the banned words banned, return the most frequent word that is not banned. It is guaranteed there is at least one word that is not banned, and that the answer is unique.
The words in paragraph are case-insensitive and the answer should be returned in lowercase.
Example 1:
Input: paragraph = "Bob hit a ball, the hit BALL flew far after it was hit.", banned = ["hit"]
Output: "ball"
Explanation: "hit" occurs 3 times, but it is a banned word. "ball" occurs twice (and no other word does), so it is the most frequent non-banned word in the paragraph. Note that words in the paragraph are not case sensitive, that punctuation is ignored (even if adjacent to words, such as "ball,"), and that "hit" isn't the answer even though it occurs more because it is banned.
Example 2:
Input: paragraph = "a.", banned = []
Output: "a"
Constraints:
1 <= paragraph.length <= 1000
paragraph consists of English letters, space ' ', or one of the symbols: "!?',;.".
0 <= banned.length <= 100
1 <= banned[i].length <= 10
banned[i] consists of only lowercase English letters.

For #819 Most Common Word, the goal is to identify the most frequent word in a paragraph that is not included in a banned list. The key idea is to normalize and count words efficiently.
Start by converting the paragraph to lowercase and removing punctuation so that words like "Ball," and "ball" are treated the same. Split the paragraph into tokens (words). Use a hash set to store banned words for constant-time lookup, and a hash map (dictionary) to count the frequency of each valid word.
While iterating through the words, skip any word present in the banned set. For the remaining words, increment their frequency in the hash map and track the word with the maximum count. This approach leverages efficient lookups and counting using hash-based data structures.
The algorithm processes each character or word once, making it highly efficient for large paragraphs.
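The steps above can be sketched in Python as follows (a minimal illustration; the function name `most_common_word` is ours, not from the original solution):

```python
import re
from collections import defaultdict

def most_common_word(paragraph, banned):
    banned_set = set(banned)          # hash set for O(1) banned lookups
    counts = defaultdict(int)         # hash map of word -> frequency
    best = ""
    # Normalize: lowercase everything, then treat each run of letters
    # as one word, which discards punctuation automatically.
    for word in re.findall(r"[a-z]+", paragraph.lower()):
        if word in banned_set:
            continue                  # skip banned words entirely
        counts[word] += 1
        if counts[word] > counts[best]:
            best = word               # track the running maximum
    return best

print(most_common_word(
    "Bob hit a ball, the hit BALL flew far after it was hit.", ["hit"]))  # ball
```

Because the maximum is tracked during the same pass that counts the words, no second scan or sort over the frequency map is needed.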
| Approach | Time Complexity | Space Complexity |
|---|---|---|
| Hash Map Counting with Banned Set | O(n) | O(k) |
Ashish Pratap Singh
This approach involves using a hash map to count the frequency of each word in the paragraph after converting it to lowercase and removing punctuation. Then, the word with the highest count that is not in the banned list is selected as the result.
Time Complexity: O(N + M), where N is the length of the paragraph and M is the number of banned words. Space Complexity: O(N) for storing word frequencies.
function mostCommonWord(paragraph, banned) {
  const bannedSet = new Set(banned);
  const words = paragraph.toLowerCase().match(/[a-z]+/g) || [];
  const counts = {};
  for (const w of words) if (!bannedSet.has(w)) counts[w] = (counts[w] || 0) + 1;
  return Object.keys(counts).reduce((a, b) => (counts[b] > counts[a] ? b : a));
}
This JavaScript solution uses regular expressions to find words in a paragraph and converts all words to lowercase. Using a Set for banned words, it counts the frequency of non-banned words with a plain object. The word with the maximum frequency is chosen as the result.
This approach leverages the string-manipulation and collection utilities built into the language for efficient parsing and counting: the words are extracted, normalized, and counted using language-specific library methods, which makes for cleaner code.
Time Complexity: O(N log N) due to sorting, where N is the total number of words extracted. Space Complexity: O(N).
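One way to realize this sorting-based variant in Python (an illustrative sketch; the function name `most_common_word_sorted` is ours):

```python
import re
from collections import Counter

def most_common_word_sorted(paragraph, banned):
    banned_set = set(banned)
    words = re.findall(r"[a-z]+", paragraph.lower())
    counts = Counter(w for w in words if w not in banned_set)
    # Sorting the (word, count) pairs by count is the O(N log N) step;
    # the first entry after a descending sort is the answer.
    return sorted(counts.items(), key=lambda kv: kv[1], reverse=True)[0][0]
```

The sort is unnecessary when only the single maximum is needed, which is why the hash-map approach with a running maximum is preferred asymptotically.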
Yes, variations of word-frequency and string parsing problems like Most Common Word are common in technical interviews. They test understanding of hash tables, string processing, and efficient counting techniques.
The optimal approach uses a hash map to count the frequency of each word and a hash set to store banned words for quick lookup. By normalizing the paragraph and iterating through each word once, we can track the most frequent valid word efficiently.
A combination of a hash map and a hash set works best. The hash set allows constant-time checks for banned words, while the hash map keeps track of word frequencies during traversal.
Punctuation should be removed or ignored during preprocessing. Typically, the paragraph is converted to lowercase and non-letter characters are replaced with spaces so that words can be split and counted correctly.
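As a concrete illustration of this preprocessing step (one common way to do it, not the only one):

```python
import re

paragraph = "Bob hit a ball, the hit BALL flew far after it was hit."
# Lowercase first, then replace every non-letter with a space so that
# punctuation adjacent to words (like "ball,") does not corrupt tokens.
normalized = re.sub(r"[^a-z]", " ", paragraph.lower())
words = normalized.split()  # split() collapses runs of spaces
print(words)
# ['bob', 'hit', 'a', 'ball', 'the', 'hit', 'ball', 'flew', 'far', 'after', 'it', 'was', 'hit']
```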
This Python approach leverages Counter and re.findall to parse and count the words succinctly, then selects the most frequent non-banned word using the max function with a key parameter.
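A sketch of that Counter-based version (the function name `most_common_word_counter` is ours):

```python
import re
from collections import Counter

def most_common_word_counter(paragraph, banned):
    banned_set = set(banned)
    # Count every word in one pass; banned words are filtered at selection time.
    counts = Counter(re.findall(r"[a-z]+", paragraph.lower()))
    # max with key=counts.get picks the highest-count non-banned word.
    return max((w for w in counts if w not in banned_set), key=counts.get)
```

The problem guarantees at least one non-banned word and a unique answer, so the generator passed to max is never empty and ties need not be broken.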