Write a bash script to calculate the frequency of each word in a text file words.txt.
For simplicity's sake, you may assume:
words.txt contains only lowercase characters and space ' ' characters.

Example:
Assume that words.txt has the following content:
the day is sunny the the the sunny is is
Your script should output the following, sorted by descending frequency:
the 4
is 3
sunny 2
day 1
Note:
The Word Frequency problem asks you to process a text file and print each word along with how many times it appears, sorted by frequency. Since the topic is Shell, the goal is to leverage command-line text processing utilities rather than implementing logic in a traditional programming language.
A common strategy is to first normalize and separate words so each appears on its own line. Tools like tr, awk, or similar utilities can help split text and handle whitespace. After that, you can sort the words so identical words become adjacent. Once grouped, use counting utilities to determine how many times each word occurs.
Finally, sort the results by frequency in descending order to match the expected output format. This pipeline-based approach efficiently processes large text files using Unix utilities designed for streaming data. The dominant cost typically comes from sorting operations, which influences the overall time complexity.
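The pipeline described above can be sketched as follows. This is one common way to combine the utilities, not the only valid solution; it assumes the input matches the problem's guarantees (lowercase words separated by spaces):

```shell
# One word per line, group, count, then sort by count descending.
tr -s ' ' '\n' < words.txt |   # squeeze runs of spaces, split words onto lines
  sort |                       # make identical words adjacent
  uniq -c |                    # prefix each word with its occurrence count
  sort -rn |                   # order by count, highest first
  awk '{print $2, $1}'         # swap columns to "word count" format
```

The `sort | uniq -c` pair does the counting: `uniq -c` only collapses adjacent duplicate lines, which is why the initial `sort` is required.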
| Approach | Time Complexity | Space Complexity |
|---|---|---|
| Shell pipeline using text utilities (split, sort, count) | O(n log n) | O(n) |
Yes, variations of the Word Frequency problem appear in interviews to test text processing, command-line skills, and understanding of data aggregation. It is especially relevant for roles requiring scripting, DevOps, or strong familiarity with Unix tools.
The optimal approach in Shell uses a pipeline of text-processing utilities. Typically, words are separated into individual lines, sorted to group identical words, and then counted before a final sort by frequency. This approach leverages efficient Unix tools for handling large text streams.
Common commands include text-processing tools that split words, sorting utilities to group identical words, and counting utilities to calculate frequencies. These commands are often combined in a pipeline so the output of one command becomes the input of the next.
Conceptually, the problem is similar to using a hash map that counts occurrences of each word. In Shell, sorting and counting utilities simulate this behavior by grouping identical words together and then computing their frequencies.
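The hash-map analogy can also be made literal: awk's associative arrays count every word in a single pass, leaving only the final frequency sort to an external utility. A minimal sketch:

```shell
# Associative array `count` plays the role of the hash map:
# one pass to tally, then sort the "word count" lines numerically by count.
awk '
{ for (i = 1; i <= NF; i++) count[$i]++ }   # increment each word on the line
END { for (w in count) print w, count[w] }  # emit "word count" pairs
' words.txt | sort -rnk2
```

This avoids sorting every word occurrence (only the distinct words are sorted at the end), though for the input sizes typical of this problem either approach is fine.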