teachsomebody

Counting the number of occurrences of a word in a file in Linux

Tue Apr 27 2021

Let's take a file called input.txt containing this text:

Geology describes the structure of the Earth on and beneath its surface, and the processes that have shaped that structure. 
It also provides tools to determine the relative and absolute ages of rocks found in a given location, and also to describe the histories of those rocks.
By combining tools, geologists are able to chronicle the geological history of the Earth as a whole, and also to demonstrate the age of the Earth. 
Geology provides the primary evidence for plate tectonics, the evolutionary history of life, and the Earth's past climates. Source - Wikipedia

If we count manually, we find 12 occurrences of "the", either standing alone or as part of another word.

Let's start by running "grep" in a terminal and counting the number of lines in its output.

N.B. - The grep (Global Regular Expression Print) command in Linux searches a specified file for lines that match a pattern and prints those lines.

| - denotes the pipe symbol. A pipe is a shell operator, not a command: it connects two or more commands so that the output of one command serves as the input to the next.

wc -l counts the number of lines in its input. Here that input is the output of grep, not the file itself.
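As a quick illustration of how a pipe and wc -l work together, the sketch below sends three made-up lines from printf into wc -l. The sample strings are placeholders and have nothing to do with input.txt.

```shell
# printf writes three newline-terminated lines to its standard output.
# The pipe hands that output to wc -l, which counts the lines it receives.
line_count=$(printf 'one\ntwo\nthree\n' | wc -l)

# Three lines went in, so the count is 3 (some systems pad it with spaces).
echo "$line_count"
```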

% grep -i "the" input.txt | wc -l
4

This outputs 4, which is the number of lines in the text that contain "the". Note that the "-i" option tells grep to ignore case, so "The" and "the" both match.
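grep can also report the number of matching lines itself through its "-c" option, which makes the extra wc step unnecessary for this first count. A minimal sketch follows; it uses a small throwaway file with made-up text (not input.txt) so that it can be run on its own.

```shell
# Create a temporary sample file (hypothetical content, not input.txt).
sample=$(mktemp)
printf 'The cat sat.\nNo match here.\nOver there, the dog ran to the gate.\n' > "$sample"

# -c prints how many lines contain at least one match; -i ignores case.
# Lines 1 and 3 match, so this counts 2 lines even though line 3 holds
# several matches.
matching_lines=$(grep -c -i "the" "$sample")
echo "$matching_lines"

rm -f "$sample"
```

Like grep piped into wc -l, "-c" counts lines rather than individual occurrences.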

To count all occurrences of "the", we need the "-o" option, which tells "grep" to print only the matched part of each line, with every match on a line of its own. Counting those lines with wc -l then gives the number of occurrences.

% grep -i -o "the" input.txt | wc -l
12

This outputs 12, which is indeed the number of times "the" appears in the text above.
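Because the pattern is matched anywhere in a line, this count also includes "the" inside longer words. If only the standalone word is wanted, the "-w" option restricts matches to whole words. The sketch below assumes a grep that supports "-o" and "-w" (GNU and BSD grep both do) and uses a made-up sample sentence, not input.txt, to show the difference.

```shell
# Create a temporary sample file (hypothetical content, not input.txt).
sample=$(mktemp)
printf 'The other day the weather was fine.\nNothing here.\n' > "$sample"

# Every occurrence, including the ones inside "other" and "weather".
all_matches=$(grep -i -o "the" "$sample" | wc -l)

# Whole-word matches only: "The" and "the".
whole_words=$(grep -i -o -w "the" "$sample" | wc -l)

echo "all occurrences: $((all_matches)), whole words: $((whole_words))"

rm -f "$sample"
```

Here the first count is 4 and the second is 2.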
