Count the Number of Words in a File in Python

To count the number of words in a file in Python, we can open the file with open(), read its contents, split the text into words with split(), and count the words with len().

Let us go through some examples.


Examples for Counting Words in a File

1. Counting Words in a File Using read() and split()

In this example, we open a text file sample.txt, read its contents, split the text into words, and count them.

sample.txt

Hello, World!
Learning Python is fun.  

main.py

# Open the file in read mode
with open("sample.txt", "r") as file:
    content = file.read()  # Read the entire file
    words = content.split()  # Split text into words
    word_count = len(words)  # Count the number of words

# Print the word count
print("Total Number of Words:", word_count)

Explanation:

We achieve the word count using the following steps:

  • The open() function is used to open sample.txt in read mode.
  • file.read() reads the entire file into the variable content.
  • content.split() splits the text on any run of whitespace (spaces, tabs, newlines) and returns a list of words. Punctuation stays attached to the word it touches, so "Hello," counts as one word.
  • The len() function calculates the total number of words.
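To see how split() behaves when called with no arguments, the short sketch below runs it on an in-memory string instead of a file. The string is made up for illustration and is not the content of sample.txt; it suggests that runs of spaces, tabs, and newlines act as a single separator, which is why no empty strings end up in the word list.

```python
# Illustrative string (an assumption, not read from sample.txt)
text = "Hello,   World!\n\tLearning Python  is fun.  \n"

# split() with no arguments splits on any run of whitespace and
# ignores leading/trailing whitespace, so no empty strings appear
words = text.split()
print(words)
print("Total Number of Words:", len(words))

# By contrast, split(" ") splits at every single space and keeps
# empty strings, which would inflate the count
pieces = text.split(" ")
print("Pieces from split(' '):", len(pieces))
```

For word counting, the no-argument form is usually the safer default, since extra spaces or blank lines in the file do not change the result.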

Output:

Total Number of Words: 6

2. Counting Words Line by Line

Instead of reading the entire file at once, we process the file sample.txt line by line.

sample.txt

Hello, World!
Learning Python is fun.  

main.py

# Open the file in read mode
with open("sample.txt", "r") as file:
    word_count = sum(len(line.split()) for line in file)  # Add up the word counts of all lines

# Print the word count
print("Total Number of Words:", word_count)

Explanation:

Here, we iterate over the file one line at a time and add up the word counts of the individual lines:

  • The open() function opens the file.
  • We use a generator expression inside sum() to iterate through each line.
  • line.split() splits each line into words.
  • The len() function counts the words in each line.
  • The sum of these values gives the total word count.
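The line-by-line approach can also be wrapped in a reusable function. The sketch below is one possible arrangement rather than part of the original example: the function name count_words and the explicit encoding="utf-8" argument are assumptions, and the demo writes the sample text to a temporary file so that it runs without an existing sample.txt.

```python
import os
import tempfile


def count_words(path, encoding="utf-8"):
    """Return the number of whitespace-separated words in the file at path."""
    with open(path, "r", encoding=encoding) as file:
        # Iterating over the file object yields one line at a time,
        # so only the current line is held in memory
        return sum(len(line.split()) for line in file)


# Demo: write the sample text to a temporary file, then count its words
with tempfile.NamedTemporaryFile(
    "w", suffix=".txt", delete=False, encoding="utf-8"
) as tmp:
    tmp.write("Hello, World!\nLearning Python is fun.  \n")
    tmp_path = tmp.name

try:
    print("Total Number of Words:", count_words(tmp_path))
finally:
    os.remove(tmp_path)  # Clean up the temporary file
```

Passing the encoding explicitly is a design choice: without it, open() falls back to a platform-dependent default, which may misread files containing non-ASCII text.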

Output:

Total Number of Words: 6

3. Counting Words in a File While Ignoring Punctuation

This example uses the re module to extract words with a regular expression, so punctuation is neither counted as a word nor left attached to one.

sample.txt

Hello, World!
Learning Python is fun.  

main.py

import re

# Open the file in read mode
with open("sample.txt", "r") as file:
    content = file.read()  # Read entire file
    words = re.findall(r'\b\w+\b', content)  # Extract words using regex
    word_count = len(words)  # Count words

# Print the word count
print("Total Number of Words:", word_count)

Explanation:

  • We import the re module for regular expressions.
  • file.read() reads the entire file into content.
  • re.findall(r'\b\w+\b', content) returns every run of word characters (letters, digits, and underscores) as a list, leaving punctuation out.
  • The len() function calculates the total number of words.
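The pattern \b\w+\b has a few edge cases worth knowing about. The sketch below applies it to a made-up string (not sample.txt) and compares it with a hypothetical letters-only pattern that keeps contractions together; the second pattern is an illustrative assumption, not something the example above uses.

```python
import re

# Illustrative text (an assumption) with a contraction,
# an underscore, and a number
text = "Don't count file_name or 42 twice!"

# \w matches letters, digits, and underscores, so the apostrophe splits
# "Don't" into two tokens while "file_name" and "42" each count as one
simple_tokens = re.findall(r"\b\w+\b", text)
print(simple_tokens)

# Hypothetical alternative: letters only, with an optional internal
# apostrophe, so "Don't" stays whole but "42" is skipped
letter_tokens = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
print(letter_tokens)
```

Which pattern is appropriate depends on what should count as a word in the text being measured, so it is worth testing the pattern on a representative sample first.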

Output:

Total Number of Words: 6

4. Counting Unique Words in a File

We can use a set to count only the unique words in a file. In this example, we count the unique words in sample.txt.

sample.txt

Hello, World!
Learning Python is fun. And Hello World!

main.py

# Open the file in read mode
with open("sample.txt", "r") as file:
    content = file.read().lower()  # Read and convert to lowercase
    words = set(content.split())  # Get unique words using a set
    unique_word_count = len(words)  # Count unique words

# Print the unique word count
print("Total Number of Unique Words:", unique_word_count)

Explanation:

  • file.read() reads the file contents.
  • We convert the text to lowercase with lower() so that the comparison is case-insensitive.
  • set(content.split()) keeps one copy of each distinct token. Because punctuation is not stripped, "hello," and "hello" are treated as different words.
  • The len() function returns the count of unique words.
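Because split() leaves punctuation attached, the set above treats "hello," and "hello" as two different words and reports 8 unique words for this sample. If that is not the intended behavior, one possible variant strips punctuation from both ends of each token first. The sketch below is an assumption about how this could be done, using string.punctuation from the standard library and an in-memory copy of the sample text.

```python
import string

# Same text as sample.txt in this example, held in a string for illustration
content = "Hello, World!\nLearning Python is fun. And Hello World!"

# Lowercase the text, split on whitespace, and strip punctuation
# from both ends of every token
stripped = (word.strip(string.punctuation) for word in content.lower().split())

# Drop tokens that consisted of punctuation only, then deduplicate with a set
unique_words = {word for word in stripped if word}

print("Total Number of Unique Words:", len(unique_words))
```

With punctuation stripped, "hello," and "hello" collapse into one entry, so this variant reports 7 unique words for the same text.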

Output:

Total Number of Unique Words: 8

Conclusion

In this tutorial, we covered different ways to count words in a file with examples:

  1. Using read() and split(): Simple method to count words.
  2. Processing line by line: Reads one line at a time, which reduces memory usage for large files.
  3. Ignoring punctuation: Uses a regular expression so that punctuation is not counted as part of a word.
  4. Counting unique words: Uses a set to filter duplicates.