Write a bash script to calculate the frequency of each word in a text file words.txt.
For simplicity's sake, you may assume:
words.txt contains only lowercase characters and space ' ' characters.
Example:
Assume that words.txt has the following content:
the day is sunny the the the sunny is is
Your script should output the following, sorted by descending frequency:
the 4
is 3
sunny 2
day 1
Problem Overview: You receive a text file containing words separated by spaces. The task is to count how many times each word appears and print the result sorted by frequency in descending order. The challenge focuses on efficient text processing using shell utilities or frequency counting techniques.
Approach 1: Using Command-line Utilities (O(n log n) time, O(n) space)
This approach uses standard Unix text-processing tools. First normalize the input so each word appears on its own line using tr or similar utilities. Then sort the words with sort, count duplicates using uniq -c, and finally sort again by frequency in descending order. The key insight is that uniq only counts adjacent duplicates, so sorting groups identical words together. This pipeline-based solution is concise and idiomatic for shell scripting tasks.
Approach 2: Using grep, sort, and uniq (O(n log n) time, O(n) space)
Another shell-based solution relies on grep -o to extract individual words from the file, printing each match on a new line. Once extracted, the pipeline continues with sort to group identical words and uniq -c to compute counts. A final descending sort arranges results by frequency. This approach works well when the input format may contain punctuation or irregular spacing, because grep can precisely control the word pattern being extracted.
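A minimal sketch of this pipeline (the word pattern `[a-z]+` matches this problem's lowercase-only input; the line creating a sample words.txt is added here only so the example is self-contained):

```shell
# Sample input from the problem statement (created here for demonstration).
printf 'the day is sunny the the the sunny is is\n' > words.txt

# grep -oE prints every lowercase run on its own line; sort groups duplicates,
# uniq -c counts them, sort -rn orders by count descending, and awk swaps the
# columns to produce "word count" output.
grep -oE '[a-z]+' words.txt | sort | uniq -c | sort -rn | awk '{ print $2, $1 }'
```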
Approach 3: Recursive Approach with Memoization (O(n) time, O(k) space)
If implemented in languages like Python or Java, the problem becomes a classic frequency counting task. Iterate through the list of words and store counts in a hash map. A recursive helper can process one word at a time while memoizing previously seen counts in the map. Each insertion or lookup in the hash table runs in average O(1) time. After counting, convert the map entries into a list and sort them by frequency before printing.
Approach 4: Iterative Dynamic Programming (O(n) time counting + O(k log k) sorting, O(k) space)
An iterative approach processes the text sequentially and updates a frequency table for each encountered word. The “DP” aspect is simply building results from previous counts: each step updates count[word] = count[word] + 1. The main data structure is a hash map keyed by the word string. Once all words are processed, sort the unique entries by frequency. This approach is common in general string processing problems and works well outside shell environments.
Recommended for interviews: Interviewers usually expect the hash map frequency-count solution because it demonstrates understanding of constant-time lookups and sorting results by value. Showing the shell pipeline solution proves strong command-line skills, but the hashmap approach highlights core algorithmic thinking.
This approach uses a combination of Unix command-line utilities to achieve the desired result, leveraging the pipeline mechanism: the output of one command becomes the input to the next, so simple tools compose into an efficient whole.
This solution uses `awk` to scan each word in the file and maintain a frequency count in an associative array, `freq`. The `END` block prints each word and its count. This output is then piped into the `sort` command, which sorts the results numerically in descending order by the second column (the frequency values).
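A sketch matching this description (the original snippet is not shown in the text, so this is a reconstruction; the line creating a sample words.txt is added for demonstration):

```shell
# Sample input from the problem statement (created here for demonstration).
printf 'the day is sunny the the the sunny is is\n' > words.txt

awk '
{ for (i = 1; i <= NF; i++) freq[$i]++ }   # count every field on every line
END { for (w in freq) print w, freq[w] }   # emit "word count" pairs
' words.txt | sort -k2,2nr                 # sort numerically by count, descending
```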
The time complexity is O(n + k log k), where `n` is the total number of words and `k` the number of unique words: `awk` scans the input once, and the final `sort` orders only the `k` unique entries. The space complexity is O(k), since the associative array stores one entry per unique word.
This approach splits the input into one word per line, sorts the words to group duplicates together, and then uses `uniq` to count the frequencies.
This solution uses `tr` to replace spaces with newlines, effectively breaking words into individual lines. The output is piped into `sort` to arrange the words alphabetically. The `uniq -c` command counts consecutive identical words. Another `sort` is then used to order the results by frequency in descending order. Finally, `awk` is used again to print the final output format with the word before the frequency count.
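A sketch of the pipeline described above (reconstructed, since the original snippet is not shown; the sample-input line is added for demonstration):

```shell
# Sample input from the problem statement (created here for demonstration).
printf 'the day is sunny the the the sunny is is\n' > words.txt

tr -s ' ' '\n' < words.txt |   # one word per line (-s squeezes repeated spaces)
  sort |                       # group identical words together
  uniq -c |                    # prefix each word with its count
  sort -rn |                   # order by count, descending
  awk '{ print $2, $1 }'       # swap columns to the "word count" output format
```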
The time complexity is O(n log n), dominated by sorting all `n` words. The space complexity is O(n), since `sort` must hold the full word list.
This approach frames counting as recursion: handle one word, then recurse on the rest. The frequency map plays the role of a memo table, so a repeated word simply increments an existing entry rather than being recounted from scratch.
A recursive helper consumes one word per call and updates a shared frequency map; each lookup and increment averages O(1).
Time Complexity: O(n) for the n recursive updates, plus O(k log k) to sort the k unique words for output.
Space Complexity: O(k) for the map, plus O(n) recursion depth in the worst case.
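Although the article mentions Python or Java for this variant, the same idea can be sketched in bash 4+ (assumed available) using an associative array and a recursive function. Real files would hit recursion and argument-list limits, so this is illustrative only; the sample-input line is added for demonstration:

```shell
#!/usr/bin/env bash
# Recursive counting sketch; requires bash 4+ for associative arrays.
declare -A freq

count_words() {
  (( $# == 0 )) && return                      # base case: no words left
  local word=$1
  shift
  freq[$word]=$(( ${freq[$word]:-0} + 1 ))     # memo-style running count
  count_words "$@"                             # recurse on the remaining words
}

printf 'the day is sunny the the the sunny is is\n' > words.txt
count_words $(< words.txt)                     # word-split the file into arguments

for w in "${!freq[@]}"; do
  printf '%s %d\n' "$w" "${freq[$w]}"
done | sort -k2,2nr
```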
The iterative variant processes the words in a single loop, updating count[word] for each occurrence; no recursion is needed, and the frequency table is the only auxiliary structure.
Time Complexity: O(n) for counting, plus O(k log k) to sort the unique words.
Space Complexity: O(k) for the frequency table.
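A bash 4+ sketch of this loop (the associative array is assumed available; the sample-input line is added for demonstration):

```shell
#!/usr/bin/env bash
# Iterative frequency table; requires bash 4+ for associative arrays.
declare -A count

printf 'the day is sunny the the the sunny is is\n' > words.txt

while read -r line; do
  for word in $line; do                          # word-split each line on spaces
    count[$word]=$(( ${count[$word]:-0} + 1 ))   # each step builds on the previous count
  done
done < words.txt

for w in "${!count[@]}"; do
  printf '%s %d\n' "$w" "${count[$w]}"
done | sort -k2,2nr
```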
| Approach | Time | Space | When to Use |
|---|---|---|---|
| Command-line Utilities Pipeline | O(n log n) | O(n) | Best for shell scripting and quick text processing using Unix tools |
| grep + sort + uniq | O(n log n) | O(n) | When word extraction requires pattern matching or flexible parsing |
| Recursive Hash Map Counting | O(n) + O(k log k) | O(k) | General programming solution using hash tables |
| Iterative Frequency Table (DP-style) | O(n) + O(k log k) | O(k) | Preferred in interviews for efficient counting and clear logic |