Extract Numbers from a Text File in Python
To extract numbers from a text file in Python, you can use regular expressions (the re module), or split the text into words and keep the numeric ones with isdigit() in a list comprehension. The file can be read all at once or line by line. This tutorial covers several of these approaches.
Examples
1. Using Regular Expressions (re.findall())
Regular expressions extract numbers from text by matching a pattern. The re.findall() function returns every non-overlapping match of that pattern as a list of strings.
sample.txt
The old bridge was built in 1923 and renovated in 1985.
An old man, 75 years old, shared stories from the 1800s.
This car is 20 years old but still runs like new.
They found an old coin from the year 1901 in the attic.
She replaced her old phone, which she used for 5 years.
main.py
import re
# Open and read the file
with open("sample.txt", "r") as file:
    content = file.read()
# Extracting numbers using regex
numbers = re.findall(r'\d+', content)
# Convert extracted numbers from strings to integers
numbers = list(map(int, numbers))
# Printing the extracted numbers
print("Extracted Numbers:", numbers)
Explanation:
- The open() function opens the file, and file.read() reads its entire content into the content variable.
- The re.findall(r'\d+', content) call uses a regular expression to match every run of one or more consecutive digits in the text.
- Since re.findall() returns a list of strings, we convert them to integers using map(int, numbers).
- Finally, we print the list of extracted numbers.
Output:
Extracted Numbers: [1923, 1985, 75, 1800, 20, 1901, 5]
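Because \d+ matches any run of digits, it also picks up 1800 from the token 1800s. If only standalone numbers are wanted, one option is to wrap the pattern in word boundaries (\b). The following is a minimal sketch of that idea, not part of the original example; the variable names are illustrative, and two lines of sample.txt are inlined as a string so the code runs without the file.

```python
import re

# Two lines taken from sample.txt, inlined so the sketch runs on its own.
text = (
    "The old bridge was built in 1923 and renovated in 1985.\n"
    "An old man, 75 years old, shared stories from the 1800s.\n"
)

# \d+ matches any run of digits, even inside a longer token such as "1800s".
all_digit_runs = [int(n) for n in re.findall(r"\d+", text)]

# \b\d+\b matches only digit runs that are not attached to letters or underscores.
standalone = [int(n) for n in re.findall(r"\b\d+\b", text)]

print("All digit runs:", all_digit_runs)    # [1923, 1985, 75, 1800]
print("Standalone only:", standalone)       # [1923, 1985, 75]
```

Whether 1800s should count as a number depends on the data, so treat the choice between the two patterns as a design decision rather than a correction.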
2. Using split() and isdigit()
This method reads the file line by line, splits each line into words, and keeps only the words that isdigit() identifies as numbers. It uses the same sample.txt file as the first example.
main.py
# Open and read the file
with open("sample.txt", "r") as file:
    numbers = [int(word) for line in file for word in line.split() if word.isdigit()]
# Printing the extracted numbers
print("Extracted Numbers:", numbers)
Explanation:
- The file is opened and read line by line.
- Each line is split into words using split().
- The isdigit() method keeps only the words made up entirely of digits.
- The numbers are converted to integers and stored in a list.
Output:
Extracted Numbers: [1923, 75, 20, 1901, 5]
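Please note that this method did not pick up 1985 or 1800. After splitting on whitespace, those tokens are 1985. and 1800s., and isdigit() returns False for both: the first ends with a period, and the second contains the letter s followed by a period.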
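One possible way to recover numbers followed by punctuation, such as 1985., is to strip punctuation from both ends of each word before calling isdigit(). The sketch below assumes this is the desired behavior; the token variable is illustrative, and two lines of sample.txt are inlined as a list so the code runs without the file.

```python
import string

# Two lines taken from sample.txt, inlined so the sketch runs on its own.
lines = [
    "The old bridge was built in 1923 and renovated in 1985.",
    "An old man, 75 years old, shared stories from the 1800s.",
]

numbers = []
for line in lines:
    for word in line.split():
        # Remove leading and trailing punctuation: "1985." becomes "1985".
        token = word.strip(string.punctuation)
        # "1800s" still contains a letter, so it is skipped.
        if token.isdigit():
            numbers.append(int(token))

print("Extracted Numbers:", numbers)  # [1923, 1985, 75]
```

With this change 1985 is found, while 1800s is still excluded because of the trailing letter.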
3. Extracting Decimal Numbers (Floats)
Regular expressions can also be used to extract floating-point numbers from a text file.
sample.txt
The old clock stopped working after 12.5 years.
An old bridge collapsed after 75.3 years of wear.
She sold her old car for 2500.99 dollars.
main.py
import re
# Open and read the file
with open("sample.txt", "r") as file:
    content = file.read()
# Extracting both integers and floating-point numbers
numbers = re.findall(r'\d+\.\d+|\d+', content)
# Convert extracted numbers to float
numbers = list(map(float, numbers))
# Printing the extracted numbers
print("Extracted Numbers:", numbers)
Explanation:
- The file content is read using read().
- The regex pattern \d+\.\d+|\d+ captures both decimal and integer numbers.
- The extracted numbers are converted into floating-point numbers.
- The final list of extracted numbers is printed.
Output:
Extracted Numbers: [12.5, 75.3, 2500.99]
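The order of the alternatives in the pattern matters, because the regex engine tries them from left to right at each position. The short sketch below compares the tutorial's pattern with the reversed order, and with an equivalent form that uses an optional fractional part. The variable names are illustrative, and one line of sample.txt is inlined as a string.

```python
import re

line = "The old clock stopped working after 12.5 years."

# Decimal alternative first: "12.5" is matched as a whole.
decimal_first = re.findall(r"\d+\.\d+|\d+", line)

# Integer alternative first: "12" matches, then "5" is found separately.
integer_first = re.findall(r"\d+|\d+\.\d+", line)

# Equivalent single pattern: digits with an optional fractional part.
optional_fraction = re.findall(r"\d+(?:\.\d+)?", line)

print(decimal_first)      # ['12.5']
print(integer_first)      # ['12', '5']
print(optional_fraction)  # ['12.5']
```

The (?:...) group is non-capturing, so re.findall() still returns the whole match rather than only the group contents.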
4. Extracting Numbers and Their Positions in the File
This example extracts numbers and their positions (line numbers) from the text file.
sample.txt
The old bridge was built in 1923 and renovated in 1985.
An old man, 75 years old, shared stories from the 1800s.
This car is 20 years old but still runs like new
main.py
import re
# Open the file and read it line by line
with open("sample.txt", "r") as file:
    for line_number, line in enumerate(file, start=1):
        numbers = re.findall(r'\d+', line)
        if numbers:
            print(f"Line {line_number}: Extracted Numbers: {list(map(int, numbers))}")
Explanation:
- The file is opened and read line by line.
- The enumerate() function keeps track of line numbers.
- The re.findall() function extracts numbers from each line.
- The extracted numbers are converted to integers and printed along with their line numbers.
Output:
Line 1: Extracted Numbers: [1923, 1985]
Line 2: Extracted Numbers: [75, 1800]
Line 3: Extracted Numbers: [20]
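If the position within the line is also needed, re.finditer() can be used in place of re.findall(), since each match object reports where it starts. The following is a minimal sketch under that assumption. The positions list and its tuple layout are illustrative choices, and two lines of sample.txt are inlined as a list so the code runs without the file.

```python
import re

# Two lines taken from sample.txt, inlined so the sketch runs on its own.
lines = [
    "The old bridge was built in 1923 and renovated in 1985.",
    "This car is 20 years old but still runs like new",
]

positions = []
for line_number, line in enumerate(lines, start=1):
    # finditer() yields match objects instead of plain strings.
    for match in re.finditer(r"\d+", line):
        # match.start() is the 0-based character offset within the line.
        positions.append((line_number, match.start(), int(match.group())))

for line_number, offset, value in positions:
    print(f"Line {line_number}, offset {offset}: {value}")
```

Each tuple holds the 1-based line number, the 0-based offset of the match in that line, and the number itself.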
Conclusion
In this tutorial, we covered different ways to extract numbers from a text file in Python, with examples:
- Regular Expressions (re.findall()): Best for extracting numbers from unstructured text.
- split() and isdigit(): A simple approach when numbers appear as separate, whitespace-delimited words.
- Extracting Decimal Numbers: Extends regex usage to float values.
- Finding Numbers with Line Numbers: Useful for extracting numbers along with their positions.