
Word Frequency

Write a bash script to calculate the frequency of each word in a text file words.txt.

For simplicity's sake, you may assume:

  • words.txt contains only lowercase characters and space ' ' characters.
  • Each word must consist of lowercase characters only.
  • Words are separated by one or more whitespace characters.

Example:

Assume that words.txt has the following content:

the day is sunny the the
the sunny is is

Your script should output the following, sorted by descending frequency:

the 4
is 3
sunny 2
day 1

Note:

  • Don't worry about handling ties; it is guaranteed that each word's frequency count is unique.
  • Could you write it in one line using Unix pipes?

Solution Explanation for Word Frequency Problem

This problem requires counting the frequency of each word in a text file and printing the result sorted by descending frequency. The provided solution chains the standard Unix tools tr, sort, uniq, and awk into a single pipeline; the assembled one-liner appears after the breakdown. Let's break down the solution step by step:

1. cat words.txt: This command reads the content of the words.txt file.

2. tr -s ' ' '\n': This command uses tr (translate characters) to replace each space with a newline character (\n). The -s (squeeze) option collapses runs of repeated newlines in the output into a single one, so words separated by multiple spaces still end up exactly one per line.

3. sort: This command sorts the words alphabetically. This step is crucial for the next command, uniq.

4. uniq -c: This command counts the occurrences of each unique line (word). The -c option prefixes each line with the number of times it occurred, so the output looks like count word. Because uniq only collapses adjacent duplicate lines, the preceding sort is required for the counts to be correct.

5. sort -nr: This command sorts the lines numerically (-n) in reverse order (-r). This sorts the lines based on their frequency (the count) in descending order.

6. awk '{print $2, $1}': This is the final command that uses awk to rearrange the output. $1 refers to the first field (the count) and $2 refers to the second field (the word). The print $2, $1 statement swaps the order, displaying the word followed by its frequency.

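Putting the six steps together, the complete pipeline is:

    cat words.txt | tr -s ' ' '\n' | sort | uniq -c | sort -nr | awk '{print $2, $1}'

Saved as a script (with a #!/bin/bash first line), this single line also answers the one-liner question from the note above.
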
Example Walkthrough:

Let's consider the sample words.txt file:

the day is sunny the the
the sunny is is
  1. cat words.txt: Outputs the content of the file.
  2. tr -s ' ' '\n': Transforms the input into:
    the
    day
    is
    sunny
    the
    the
    the
    sunny
    is
    is
    
  3. sort: Sorts the words alphabetically:
    day
    is
    is
    is
    sunny
    sunny
    the
    the
    the
    the
    
  4. uniq -c: Counts the occurrences of each unique word:
    1 day
    3 is
    2 sunny
    4 the
    
  5. sort -nr: Sorts by count in descending order:
    4 the
    3 is
    2 sunny
    1 day
    
  6. awk '{print $2, $1}': Swaps the word and count:
    the 4
    is 3
    sunny 2
    day 1
    

This is the desired output.
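
To reproduce the walkthrough locally, you can recreate the sample words.txt (the file name comes from the problem statement; the printf line is simply a convenient way to build the test input) and run the pipeline:

    printf 'the day is sunny the the\nthe sunny is is\n' > words.txt
    cat words.txt | tr -s ' ' '\n' | sort | uniq -c | sort -nr | awk '{print $2, $1}'
    # prints:
    # the 4
    # is 3
    # sunny 2
    # day 1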

Time Complexity Analysis:

The time complexity of this solution is dominated by the sorting steps. sort has an average-case time complexity of O(N log N), where N is the number of words. All other operations (reading the file, tr, uniq, awk) have linear time complexity, O(N). Therefore, the overall time complexity of the solution is O(N log N).

Space Complexity Analysis:

The space complexity is determined by the intermediate data held during the pipeline. The frequency table produced by uniq -c only needs space proportional to the number of unique words, O(M), where M <= N. The first sort, however, must process all N words (holding them in memory or spilling to temporary files for very large inputs), so the pipeline's worst-case space usage is O(N). In practice, the number of unique words is typically much smaller than the total number of words, so everything after uniq -c is comparatively cheap.
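
For comparison, here is a rough alternative sketch (not the pipeline described above) that counts words in a single awk pass using an associative array, so only the M unique words reach the final sort:

    # count each word in one pass, then sort the unique words by count (descending)
    awk '{ for (i = 1; i <= NF; i++) count[$i]++ }
         END { for (w in count) print w, count[w] }' words.txt | sort -k2,2nr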