Write a bash script to calculate the frequency of each word in a text file words.txt.
For simplicity's sake, you may assume:
words.txt contains only lowercase characters and space ' ' characters.
Example:
Assume that words.txt has the following content:
the day is sunny the the the sunny is is
Your script should output the following, sorted by descending frequency:
the 4
is 3
sunny 2
day 1
Problem Overview: You receive a text file containing words separated by spaces. The task is to count how many times each word appears and print the result sorted by frequency in descending order. The challenge focuses on efficient text processing using shell utilities or frequency counting techniques.
Approach 1: Using Command-line Utilities (O(n log n) time, O(n) space)
This approach uses standard Unix text-processing tools. First normalize the input so each word appears on its own line using tr or similar utilities. Then sort the words with sort, count duplicates using uniq -c, and finally sort again by frequency in descending order. The key insight is that uniq only counts adjacent duplicates, so sorting groups identical words together. This pipeline-based solution is concise and idiomatic for shell scripting tasks.
Approach 2: Using grep, sort, and uniq (O(n log n) time, O(n) space)
Another shell-based solution relies on grep -o to extract individual words from the file, printing each match on a new line. Once extracted, the pipeline continues with sort to group identical words and uniq -c to compute counts. A final descending sort arranges results by frequency. This approach works well when the input format may contain punctuation or irregular spacing, because grep can precisely control the word pattern being extracted.
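A minimal sketch of this pipeline (the word pattern `[a-z]+` matches this problem's lowercase-only input; the line creating a sample words.txt is added here only so the example is self-contained):

```shell
# Sample input from the problem statement (created here for demonstration).
printf 'the day is sunny the the the sunny is is\n' > words.txt

# grep -oE prints every lowercase run on its own line; sort groups duplicates,
# uniq -c counts them, sort -rn orders by count descending, and awk swaps the
# columns to produce "word count" output.
grep -oE '[a-z]+' words.txt | sort | uniq -c | sort -rn | awk '{ print $2, $1 }'
```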
Approach 3: Recursive Approach with Memoization (O(n) time, O(k) space)
If implemented in languages like Python or Java, the problem becomes a classic frequency counting task. Iterate through the list of words and store counts in a hash map. A recursive helper can process one word at a time while memoizing previously seen counts in the map. Each insertion or lookup in the hash table runs in average O(1) time. After counting, convert the map entries into a list and sort them by frequency before printing.
Approach 4: Iterative Dynamic Programming (O(n) time counting + O(k log k) sorting, O(k) space)
An iterative approach processes the text sequentially and updates a frequency table for each encountered word. The “DP” aspect is simply building results from previous counts: each step updates count[word] = count[word] + 1. The main data structure is a hash map keyed by the word string. Once all words are processed, sort the unique entries by frequency. This approach is common in general string processing problems and works well outside shell environments.
Recommended for interviews: Interviewers usually expect the hash map frequency-count solution because it demonstrates understanding of constant-time lookups and sorting results by value. Showing the shell pipeline solution proves strong command-line skills, but the hashmap approach highlights core algorithmic thinking.
This approach uses a combination of Unix command-line utilities to achieve the desired result, leveraging the pipeline mechanism: the output of one command becomes the input to the next, so simple tools compose into an efficient whole.
This solution uses `awk` to scan each word in the file and maintain a frequency count in an associative array, `freq`. The `END` block prints each word and its count. This output is then piped into the `sort` command, which sorts the results numerically in descending order by the second column (the frequency values).
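A sketch matching this description (the original snippet is not shown in the text, so this is a reconstruction; the line creating a sample words.txt is added for demonstration):

```shell
# Sample input from the problem statement (created here for demonstration).
printf 'the day is sunny the the the sunny is is\n' > words.txt

awk '
{ for (i = 1; i <= NF; i++) freq[$i]++ }   # count every field on every line
END { for (w in freq) print w, freq[w] }   # emit "word count" pairs
' words.txt | sort -k2,2nr                 # sort numerically by count, descending
```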
The time complexity is O(n + k log k), where `n` is the total number of words and `k` the number of unique words: `awk` scans the input once, and the final `sort` orders only the `k` unique entries. The space complexity is O(k), since the associative array stores one entry per unique word.
This approach splits the input into one word per line, sorts the words to group duplicates together, and then uses `uniq` to count the frequencies.
This solution uses `tr` to replace spaces with newlines, effectively breaking words into individual lines. The output is piped into `sort` to arrange the words alphabetically. The `uniq -c` command counts consecutive identical words. Another `sort` is then used to order the results by frequency in descending order. Finally, `awk` is used again to print the final output format with the word before the frequency count.
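A sketch of the pipeline described above (reconstructed, since the original snippet is not shown; the sample-input line is added for demonstration):

```shell
# Sample input from the problem statement (created here for demonstration).
printf 'the day is sunny the the the sunny is is\n' > words.txt

tr -s ' ' '\n' < words.txt |   # one word per line (-s squeezes repeated spaces)
  sort |                       # group identical words together
  uniq -c |                    # prefix each word with its count
  sort -rn |                   # order by count, descending
  awk '{ print $2, $1 }'       # swap columns to the "word count" output format
```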
The time complexity is O(n log n), dominated by sorting all `n` words. The space complexity is O(n), since `sort` must hold the full word list.
This approach frames counting as recursion: handle one word, then recurse on the rest. The frequency map plays the role of a memo table, so a repeated word simply increments an existing entry rather than being recounted from scratch.
A recursive helper consumes one word per call and updates a shared frequency map; each lookup and increment averages O(1).
Time Complexity: O(n) for the n recursive updates, plus O(k log k) to sort the k unique words for output.
Space Complexity: O(k) for the map, plus O(n) recursion depth in the worst case.
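Although the article mentions Python or Java for this variant, the same idea can be sketched in bash 4+ (assumed available) using an associative array and a recursive function. Real files would hit recursion and argument-list limits, so this is illustrative only; the sample-input line is added for demonstration:

```shell
#!/usr/bin/env bash
# Recursive counting sketch; requires bash 4+ for associative arrays.
declare -A freq

count_words() {
  (( $# == 0 )) && return                      # base case: no words left
  local word=$1
  shift
  freq[$word]=$(( ${freq[$word]:-0} + 1 ))     # memo-style running count
  count_words "$@"                             # recurse on the remaining words
}

printf 'the day is sunny the the the sunny is is\n' > words.txt
count_words $(< words.txt)                     # word-split the file into arguments

for w in "${!freq[@]}"; do
  printf '%s %d\n' "$w" "${freq[$w]}"
done | sort -k2,2nr
```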
The iterative variant processes the words in a single loop, updating count[word] for each occurrence; no recursion is needed, and the frequency table is the only auxiliary structure.
Time Complexity: O(n) for counting, plus O(k log k) to sort the unique words.
Space Complexity: O(k) for the frequency table.
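A bash 4+ sketch of this loop (the associative array is assumed available; the sample-input line is added for demonstration):

```shell
#!/usr/bin/env bash
# Iterative frequency table; requires bash 4+ for associative arrays.
declare -A count

printf 'the day is sunny the the the sunny is is\n' > words.txt

while read -r line; do
  for word in $line; do                          # word-split each line on spaces
    count[$word]=$(( ${count[$word]:-0} + 1 ))   # each step builds on the previous count
  done
done < words.txt

for w in "${!count[@]}"; do
  printf '%s %d\n' "$w" "${count[$w]}"
done | sort -k2,2nr
```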
| Approach | Time | Space | When to Use |
|---|---|---|---|
| Command-line Utilities Pipeline | O(n log n) | O(n) | Best for shell scripting and quick text processing using Unix tools |
| grep + sort + uniq | O(n log n) | O(n) | When word extraction requires pattern matching or flexible parsing |
| Recursive Hash Map Counting | O(n) + O(k log k) | O(k) | General programming solution using hash tables |
| Iterative Frequency Table (DP-style) | O(n) + O(k log k) | O(k) | Preferred in interviews for efficient counting and clear logic |