Count the Number of Words in a File in Python
To count the number of words in a file in Python, we can use the open() function to read the file, split its contents using split(), and determine the word count using the len() function.
Let us go through some examples.
Examples for Counting Words in a File
1. Counting Words in a File Using read() and split()
In this example, we open a text file sample.txt, read its contents, split the text into words, and count them.
sample.txt
Hello, World!
Learning Python is fun.
main.py
# Open the file in read mode
with open("sample.txt", "r") as file:
    content = file.read()       # Read the entire file
    words = content.split()     # Split text into words
    word_count = len(words)     # Count the number of words

# Print the word count
print("Total Number of Words:", word_count)
Explanation:
We achieve the word count using the following steps:
- The open() function is used to open sample.txt in read mode.
- file.read() reads the entire file into the variable content.
- content.split() splits the text into words based on whitespace. Reference: string split() method.
- The len() function calculates the total number of words.
Output:
Total Number of Words: 6
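The same steps can be wrapped in a reusable function. The following is a minimal sketch, not part of the original example: the helper name count_words and the explicit encoding="utf-8" argument are assumptions made for illustration. To stay self-contained, it writes the sample text to a temporary file instead of relying on an existing sample.txt.

```python
import tempfile
from pathlib import Path


def count_words(path, encoding="utf-8"):
    """Return the number of whitespace-separated words in the file at path.

    count_words is a hypothetical helper; the encoding default is an assumption.
    """
    with open(path, "r", encoding=encoding) as file:
        return len(file.read().split())


# Demo: write the tutorial's sample text to a temporary file, then count it.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.txt"
    sample.write_text("Hello, World!\nLearning Python is fun.\n", encoding="utf-8")
    total = count_words(sample)

print("Total Number of Words:", total)
```

Passing the encoding explicitly avoids depending on the platform default, which can differ between systems.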
2. Counting Words Line by Line
Instead of reading the entire file at once, we process the file sample.txt line by line.
sample.txt
Hello, World!
Learning Python is fun.
main.py
# Open the file in read mode
with open("sample.txt", "r") as file:
    word_count = sum(len(line.split()) for line in file)  # Count words per line

# Print the word count
print("Total Number of Words:", word_count)
Explanation:
Here, we iterate over the file one line at a time and add up the word counts as we go:
- The open() function opens the file.
- We use a generator expression inside sum() to iterate through each line.
- line.split() splits each line into words.
- The len() function counts the words in each line.
- The sum of these values gives the total word count.
Output:
Total Number of Words: 6
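To see what the generator expression adds up, the sketch below prints the count for each line before totalling them. It is an illustration only: the helper name count_words_in_lines is hypothetical, and io.StringIO stands in for an open text file so the code runs without sample.txt.

```python
import io


def count_words_in_lines(lines):
    """Sum the word counts of an iterable of lines, such as an open text file.

    count_words_in_lines is a hypothetical helper used for illustration.
    """
    return sum(len(line.split()) for line in lines)


# io.StringIO behaves like an open text file: iterating over it yields lines.
fake_file = io.StringIO("Hello, World!\nLearning Python is fun.\n")

# Word count for each individual line.
per_line = [len(line.split()) for line in fake_file]

# Rewind to the start and compute the total in a single pass.
fake_file.seek(0)
total = count_words_in_lines(fake_file)

print("Words per line:", per_line)       # Words per line: [2, 4]
print("Total Number of Words:", total)   # Total Number of Words: 6
```

Because only one line is held in memory at a time, the same function can be passed a real file object for large files.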
3. Counting Words in a File While Ignoring Punctuation
This example uses the re module to extract only the words, so punctuation is not counted as part of a word.
sample.txt
Hello, World!
Learning Python is fun.
main.py
import re

# Open the file in read mode
with open("sample.txt", "r") as file:
    content = file.read()                     # Read entire file
    words = re.findall(r'\b\w+\b', content)   # Extract words using regex
    word_count = len(words)                   # Count words

# Print the word count
print("Total Number of Words:", word_count)
Explanation:
- We import the re module for regular expressions.
- The read() method reads the file into content.
- re.findall(r'\b\w+\b', content) extracts words while ignoring punctuation.
- The len() function calculates the total number of words.
Output:
Total Number of Words: 6
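One caveat with the \b\w+\b pattern is that an apostrophe is not a word character, so a contraction such as "Don't" is split into two matches. The sketch below compares it with an alternative pattern that keeps apostrophes inside words. The alternative pattern and the sample sentence are assumptions chosen for illustration, not part of the original example.

```python
import re

# Hypothetical sample sentence containing contractions.
text = "Don't stop: it's fun."

# Pattern from the example above: runs of word characters only.
simple = re.findall(r"\b\w+\b", text)

# Assumed alternative: word characters, optionally joined by apostrophes.
with_apostrophes = re.findall(r"\w+(?:'\w+)*", text)

print(simple)            # ['Don', 't', 'stop', 'it', 's', 'fun']
print(with_apostrophes)  # ["Don't", 'stop', "it's", 'fun']
print(len(simple), len(with_apostrophes))  # 6 4
```

Which pattern is appropriate depends on how contractions should be counted for the text at hand.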
4. Counting Unique Words in a File
We can use a set to count only the unique words in a file.
In this example, we count the unique words in the sample.txt file.
sample.txt
Hello, World!
Learning Python is fun. And Hello World!
main.py
# Open the file in read mode
with open("sample.txt", "r") as file:
    content = file.read().lower()    # Read and convert to lowercase
    words = set(content.split())     # Get unique words using a set
    unique_word_count = len(words)   # Count unique words

# Print the unique word count
print("Total Number of Unique Words:", unique_word_count)
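Explanation:
- The read() method reads the file contents.
- We convert the text to lowercase using lower() so that the comparison is case-insensitive.
- set(content.split()) stores only unique words. See Python Sets.
- The len() function returns the count of unique words.
Note that split() leaves punctuation attached to words, so "hello," and "hello" are counted as two different words here.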
Output:
Total Number of Unique Words: 8
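Splitting on whitespace keeps punctuation attached, so tokens such as "hello," and "hello" end up as separate set members. The sketch below, which is an illustration and not part of the original example, combines the regular expression from example 3 with a set, and uses collections.Counter to show how often each word occurs. The sample text is held in a string so the code runs without sample.txt.

```python
import re
from collections import Counter

# Same text as sample.txt in this example, held in a string.
text = "Hello, World!\nLearning Python is fun. And Hello World!"
lowered = text.lower()

# Whitespace splitting keeps punctuation attached to the words.
unique_by_split = set(lowered.split())

# Extracting runs of word characters drops the punctuation first.
tokens = re.findall(r"\b\w+\b", lowered)
unique_by_regex = set(tokens)

# Counter maps each word to the number of times it occurs.
frequencies = Counter(tokens)

print("Unique words (split):", len(unique_by_split))  # 8
print("Unique words (regex):", len(unique_by_regex))  # 7
print("Occurrences of 'hello':", frequencies["hello"])  # 2
```

The two counts differ because the regex version treats "hello," and "hello" as the same word.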
Conclusion
In this tutorial, we covered different ways to count words in a file with examples:
- Using read() and split(): Simple method to count words.
- Processing line by line: Reduces memory usage for large files.
- Ignoring punctuation: Uses regular expressions for accurate word counting.
- Counting unique words: Uses a set to filter duplicates.