Write a bash script to calculate the frequency of each word in a text file words.txt.
For simplicity's sake, you may assume:
words.txt contains only lowercase characters and space ' ' characters.

Example:
Assume that words.txt has the following content:
the day is sunny the the the sunny is is
Your script should output the following, sorted by descending frequency:
the 4
is 3
sunny 2
day 1
Problem Overview: You receive a text file containing words separated by spaces. The task is to count how many times each word appears and print the result sorted by frequency in descending order. The challenge focuses on efficient text processing using shell utilities or frequency counting techniques.
Approach 1: Using Command-line Utilities (O(n log n) time, O(n) space)
This approach uses standard Unix text-processing tools. First normalize the input so each word appears on its own line using tr or similar utilities. Then sort the words with sort, count duplicates using uniq -c, and finally sort again by frequency in descending order. The key insight is that uniq only counts adjacent duplicates, so sorting groups identical words together. This pipeline-based solution is concise and idiomatic for shell scripting tasks.
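The pipeline described above can be sketched as follows; the `printf` line just recreates the sample words.txt from the problem statement for demonstration:

```shell
# Create the sample input from the problem statement (for demonstration).
printf 'the day is sunny the the the sunny is is\n' > words.txt

# Split into one word per line, group duplicates, count, sort by count.
tr -s ' ' '\n' < words.txt | sort | uniq -c | sort -rn |
  awk '{ print $2, $1 }'   # uniq -c emits "count word"; swap to "word count"
```

Running this prints `the 4`, `is 3`, `sunny 2`, `day 1`, one pair per line. The `-s` flag to tr squeezes runs of delimiters so multiple spaces never produce empty lines.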
Approach 2: Using grep, sort, and uniq (O(n log n) time, O(n) space)
Another shell-based solution relies on grep -o to extract individual words from the file, printing each match on a new line. Once extracted, the pipeline continues with sort to group identical words and uniq -c to compute counts. A final descending sort arranges results by frequency. This approach works well when the input format may contain punctuation or irregular spacing, because grep can precisely control the word pattern being extracted.
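A sketch of this grep-based variant, again recreating the sample input first; `[a-z]+` matches one lowercase word per the problem's simplifying assumptions:

```shell
# Create the sample input (for demonstration).
printf 'the day is sunny the the the sunny is is\n' > words.txt

# grep -o prints each regex match on its own line; the rest of the
# pipeline groups, counts, and sorts by frequency as before.
grep -oE '[a-z]+' words.txt | sort | uniq -c | sort -rn |
  awk '{ print $2, $1 }'
```

Because the word pattern is explicit, this variant silently tolerates punctuation or irregular whitespace that would confuse a plain space-to-newline translation.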
Approach 3: Hash Map Counting (O(n) counting + O(k log k) sorting, O(k) space)
In a general-purpose language such as Python or Java, the problem becomes a classic frequency-counting task: iterate through the list of words and store counts in a hash map, where each insertion or lookup runs in average O(1) time. A recursive helper that processes one word at a time and carries the map along is possible, but there are no overlapping subproblems here, so calling this "memoization" is a misnomer; the map is simply the running frequency table. After counting, convert the map entries into a list and sort them by frequency before printing.
Approach 4: Iterative Frequency Table (O(n) counting + O(k log k) sorting, O(k) space)
An iterative approach processes the text sequentially and updates a frequency table for each encountered word: each step performs count[word] = count[word] + 1. Despite the "DP" label sometimes attached to this pattern, it is straightforward accumulation, not dynamic programming. The main data structure is a hash map keyed by the word string. Once all words are processed, sort the unique entries by frequency in descending order. This approach is common in general string-processing problems and works well outside shell environments.
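Staying within the shell, awk's associative arrays play the role of the hash map (a dict in Python, a HashMap in Java), so the count[word] update from Approaches 3 and 4 can be sketched directly:

```shell
# Create the sample input (for demonstration).
printf 'the day is sunny the the the sunny is is\n' > words.txt

# count[$i]++ is the count[word] = count[word] + 1 update: awk's
# associative array is the hash map. END emits "word count" pairs.
awk '{ for (i = 1; i <= NF; i++) count[$i]++ }
     END { for (w in count) print w, count[w] }' words.txt |
  sort -rn -k2   # order the unique entries by frequency, descending
```

The `for (w in count)` loop visits keys in arbitrary order, which is why the final numeric sort on the count column is still needed.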
Recommended for interviews: Interviewers usually expect the hash map frequency-count solution because it demonstrates understanding of constant-time lookups and sorting results by value. Showing the shell pipeline solution proves strong command-line skills, but the hash map approach highlights core algorithmic thinking.
| Approach | Time | Space | When to Use |
|---|---|---|---|
| Command-line Utilities Pipeline | O(n log n) | O(n) | Best for shell scripting and quick text processing using Unix tools |
| grep + sort + uniq | O(n log n) | O(n) | When word extraction requires pattern matching or flexible parsing |
| Hash Map Counting | O(n) + O(k log k) | O(k) | General programming solution using hash tables |
| Iterative Frequency Table | O(n) + O(k log k) | O(k) | Preferred in interviews for efficient counting and clear logic |