Counting the number of occurrences of a word in a file in Linux

April 27, 2021

Let's take a file called input.txt containing this text:

Geology describes the structure of the Earth on and beneath its surface, and the processes that have shaped that structure. 
It also provides tools to determine the relative and absolute ages of rocks found in a given location, and also to describe the histories of those rocks.
By combining tools, geologists are able to chronicle the geological history of the Earth as a whole, and also to demonstrate the age of the Earth. 
Geology provides the primary evidence for plate tectonics, the evolutionary history of life, and the Earth's past climates. Source - Wikipedia
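
To follow along, the sample file can be created from the terminal. This is a minimal sketch that assumes a POSIX-style shell and that it is acceptable to overwrite any existing input.txt in the current directory:

```shell
# Write the four sample lines to input.txt.
# Quoting 'EOF' keeps the shell from expanding anything inside the text.
cat > input.txt <<'EOF'
Geology describes the structure of the Earth on and beneath its surface, and the processes that have shaped that structure.
It also provides tools to determine the relative and absolute ages of rocks found in a given location, and also to describe the histories of those rocks.
By combining tools, geologists are able to chronicle the geological history of the Earth as a whole, and also to demonstrate the age of the Earth.
Geology provides the primary evidence for plate tectonics, the evolutionary history of life, and the Earth's past climates. Source - Wikipedia
EOF

# Sanity check: the file should contain four lines.
wc -l < input.txt
```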

If we count manually, we find 12 occurrences of "the", either standalone or as part of another word.

Let's start by using "grep" on a terminal and counting the number of lines in its output.

N.B. - The grep (Global Regular Expression Print) command in Linux searches a specified file for a string of characters or a pattern.

| - denotes the pipe symbol. A pipe is a feature of the shell rather than a command: it connects two or more commands so that the output of one serves as the input to the next.

wc -l counts the number of lines in its input, which can be a text file or, as here, the output of another command.
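
As a small illustration of how a pipe feeds one command into another, the sketch below sends three lines of made-up text into wc -l. The text and the variable name line_count are only for illustration:

```shell
# printf writes three lines; the pipe hands them to wc -l, which counts them.
printf 'first\nsecond\nthird\n' | wc -l   # prints 3

# The same pipeline with its result captured in a variable.
line_count=$(printf 'first\nsecond\nthird\n' | wc -l)
echo "lines: $line_count"
```

Some implementations of wc pad the number with leading spaces, so the exact spacing of the output can differ between systems.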

% grep -i "the" input.txt | wc -l

4

This outputs 4, which is the number of lines of the text containing "the", not the number of occurrences. Note that the "-i" option tells grep to ignore case.
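
grep can also count matching lines by itself with the "-c" option, which gives the same result as piping into wc -l. The sketch below uses a hypothetical three-line sample (not the article's text) so that it runs on its own:

```shell
# Hypothetical sample: lines 1 and 3 contain "the" (in any case), line 2 does not.
sample='The cat saw the dog.
No match here.
Then the end.'

# -c reports the number of matching lines, not the number of matches.
matching_lines=$(printf '%s\n' "$sample" | grep -c -i "the")
echo "$matching_lines"   # prints 2, although "the" occurs four times
```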

To count all occurrences of "the", we need the "-o" option, which tells "grep" to print only the matching part of each line, with every match on its own line.

% grep -i -o "the" input.txt | wc -l
12

This outputs 12, which is indeed the number of times "the" appears in the text above.
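
Because "-o" matches the characters "the" wherever they appear, words such as "other" or "then" are counted too. If only the standalone word should be counted, one option is the "-w" flag, which is available in GNU and BSD grep but is not part of POSIX. The sketch below uses a hypothetical two-line sample to show the difference:

```shell
# Hypothetical sample: "the" appears as a word twice and inside "other" and "Then".
sample='The other one.
Then the rest.'

# -o prints every match on its own line: The, the, The, the.
printf '%s\n' "$sample" | grep -i -o "the"

# Count all substring matches, then only whole-word matches.
substring_count=$(printf '%s\n' "$sample" | grep -i -o "the" | wc -l)
word_count=$(printf '%s\n' "$sample" | grep -i -o -w "the" | wc -l)

# Expect 4 substring matches and 2 whole-word matches.
echo "substrings: $substring_count, whole words: $word_count"
```

For the sample text in input.txt both counts happen to agree, since no other word in it contains "the".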


Created by

Evans Boateng Owusu

Evans is a Computer Engineer and cloud technology enthusiast. He has a Master's degree in Embedded Systems (focusing on Software design) from the Technical University of Eindhoven (The Netherlands) and a Bachelor of Science in Electronic and Computer Engineering from the Polytechnic University of Turin (Italy). In addition, he has worked in the high-tech industry in the Netherlands and for other large corporations for over seven years.


© Copyright 2024, The BoesK Partnership