Extract Numbers from a Text File in Python

To extract numbers from a text file in Python, you can match them with regular expressions (the re module), or split each line into words and test each word with isdigit() inside a list comprehension. This tutorial covers both approaches, and also shows how to extract decimal numbers and how to report the line on which each number appears.


Examples

1. Using Regular Expressions (re.findall())

The re.findall() function returns every non-overlapping match of a pattern, which makes it a direct way to pull all digit sequences out of unstructured text.

sample.txt

The old bridge was built in 1923 and renovated in 1985.
An old man, 75 years old, shared stories from the 1800s.
This car is 20 years old but still runs like new.
They found an old coin from the year 1901 in the attic.
She replaced her old phone, which she used for 5 years.

main.py

import re

# Open and read the file
with open("sample.txt", "r") as file:
    content = file.read()

# Extracting numbers using regex
numbers = re.findall(r'\d+', content)

# Convert extracted numbers from strings to integers
numbers = list(map(int, numbers))

# Printing the extracted numbers
print("Extracted Numbers:", numbers)

Explanation:

  1. The open() function opens the file, and file.read() reads its entire content into the content variable.
  2. The call re.findall(r'\d+', content) returns every sequence of one or more consecutive digits in the text.
  3. Since re.findall() returns a list of strings, we convert them to integers using map(int, numbers).
  4. Finally, we print the list of extracted numbers.

Output:

Extracted Numbers: [1923, 1985, 75, 1800, 20, 1901, 5]
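For large files, the same idea can be applied one line at a time instead of loading everything with read(). The following is a minimal sketch under that assumption; the names DIGITS, extract_integers, and sample_lines are illustrative and not part of the original example, and the list of strings stands in for the first two lines of sample.txt.

```python
import re

# Compiled once so the same pattern object is reused for every line.
DIGITS = re.compile(r"\d+")


def extract_integers(lines):
    """Return every run of digits found in an iterable of text lines, as ints."""
    numbers = []
    for line in lines:
        numbers.extend(int(token) for token in DIGITS.findall(line))
    return numbers


# Stand-in for the first two lines of sample.txt (hypothetical test data).
sample_lines = [
    "The old bridge was built in 1923 and renovated in 1985.\n",
    "An old man, 75 years old, shared stories from the 1800s.\n",
]

print(extract_integers(sample_lines))  # [1923, 1985, 75, 1800]
```

Because a file object yields its lines when iterated, the result of open("sample.txt") inside a with block could be passed to extract_integers() in place of the list.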

2. Using split() and isdigit()

This method reads the file line by line, splits each line into words, and keeps only the words that consist entirely of digits, as tested by isdigit(). It uses the same sample.txt file as the first example.

main.py

# Open and read the file
with open("sample.txt", "r") as file:
    numbers = [int(word) for line in file for word in line.split() if word.isdigit()]

# Printing the extracted numbers
print("Extracted Numbers:", numbers)

Explanation:

  1. The file is opened and read line by line.
  2. Each line is split into words using split().
  3. The isdigit() method keeps only the words that consist entirely of digits.
  4. The numbers are converted to integers and stored in a list.

Output:

Extracted Numbers: [1923, 75, 20, 1901, 5]

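Note that this approach skips 1985 and 1800. Because split() separates words only at whitespace, the words it produces are "1985." and "1800s.": the first ends with a period, and the second contains the letter s as well as a period, so isdigit() returns False for both.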
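One way to recover numbers that are only followed or preceded by punctuation is to strip punctuation from each word before testing it. The sketch below assumes that behaviour is wanted; extract_whole_word_integers and sample_lines are hypothetical names, and the strings stand in for the first two lines of sample.txt.

```python
import string


def extract_whole_word_integers(lines):
    """Collect words that are all digits once surrounding punctuation is removed."""
    numbers = []
    for line in lines:
        for word in line.split():
            # Remove leading and trailing punctuation such as "." or ",".
            token = word.strip(string.punctuation)
            if token.isdigit():
                numbers.append(int(token))
    return numbers


# Stand-in for the first two lines of sample.txt (hypothetical test data).
sample_lines = [
    "The old bridge was built in 1923 and renovated in 1985.",
    "An old man, 75 years old, shared stories from the 1800s.",
]

print(extract_whole_word_integers(sample_lines))  # [1923, 1985, 75]
```

With this change 1985 is picked up, while 1800s is still skipped because the letter s remains after stripping. If such values should count as numbers, the regular expression approach from the first example is the better fit.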

3. Extracting Decimal Numbers (Floats)

Regular expressions can also be used to extract floating-point numbers from a text file.

sample.txt

The old clock stopped working after 12.5 years.
An old bridge collapsed after 75.3 years of wear.
She sold her old car for 2500.99 dollars.

main.py

import re

# Open and read the file
with open("sample.txt", "r") as file:
    content = file.read()

# Extracting both integers and floating-point numbers
numbers = re.findall(r'\d+\.\d+|\d+', content)

# Convert extracted numbers to float
numbers = list(map(float, numbers))

# Printing the extracted numbers
print("Extracted Numbers:", numbers)

Explanation:

  1. The file content is read using read().
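  2. The regex pattern \d+\.\d+|\d+ tries the decimal form first and falls back to a plain integer, so a value such as 12.5 is captured as a single number rather than as 12 and 5.
  3. All extracted values, including any integers, are converted to floating-point numbers with float().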
  4. The final list of extracted numbers is printed.

Output:

Extracted Numbers: [12.5, 75.3, 2500.99]
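The order of the two alternatives in the pattern matters, because the regex engine takes the first alternative that matches at a given position. The following sketch illustrates this on a short string; text, decimal_first, integer_first, and typed are illustrative names, and the sentence is made-up test data rather than part of sample.txt.

```python
import re

text = "Stopped after 12.5 years; sold for 2500 dollars."

# Decimal alternative first: "12.5" is matched as one token.
decimal_first = re.findall(r"\d+\.\d+|\d+", text)

# Integer alternative first: \d+ succeeds on "12", so the decimal
# alternative is never tried at that position and "12.5" is split.
integer_first = re.findall(r"\d+|\d+\.\d+", text)

print(decimal_first)  # ['12.5', '2500']
print(integer_first)  # ['12', '5', '2500']

# Optional: keep integers as int instead of converting everything to float.
typed = [float(token) if "." in token else int(token) for token in decimal_first]
print(typed)  # [12.5, 2500]
```

Converting every match with float(), as the example above does, is simpler; the last step is only worth adding when the distinction between 2500 and 2500.0 matters downstream.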

4. Extracting Numbers and Their Positions in the File

This example extracts numbers and their positions (line numbers) from the text file.

sample.txt

The old bridge was built in 1923 and renovated in 1985.
An old man, 75 years old, shared stories from the 1800s.
This car is 20 years old but still runs like new.

main.py

import re

# Open the file and read it line by line
with open("sample.txt", "r") as file:
    for line_number, line in enumerate(file, start=1):
        numbers = re.findall(r'\d+', line)
        if numbers:
            print(f"Line {line_number}: Extracted Numbers: {list(map(int, numbers))}")

Explanation:

  1. The file is opened and read line by line.
  2. The enumerate() function keeps track of line numbers.
  3. The re.findall() function extracts numbers from each line.
  4. The extracted numbers are converted to integers and printed along with their line numbers.

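Output:

Line 1: Extracted Numbers: [1923, 1985]
Line 2: Extracted Numbers: [75, 1800]
Line 3: Extracted Numbers: [20]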
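If the column within the line is needed as well as the line number, re.finditer() can be used in place of re.findall(), since each match object it yields exposes its start offset. This is a minimal sketch under that assumption; locate_numbers and sample_lines are hypothetical names, and the two strings are made-up test data.

```python
import re


def locate_numbers(lines):
    """Yield (line_number, column, value) for each run of digits."""
    for line_number, line in enumerate(lines, start=1):
        for match in re.finditer(r"\d+", line):
            # match.start() is a 0-based offset; add 1 for a 1-based column.
            yield line_number, match.start() + 1, int(match.group())


# Hypothetical test data standing in for lines read from a file.
sample_lines = [
    "Built in 1923 and renovated in 1985.",
    "This car is 20 years old.",
]

for line_number, column, value in locate_numbers(sample_lines):
    print(f"Line {line_number}, column {column}: {value}")
```

As with the earlier example, a file object opened with open("sample.txt") could be passed to locate_numbers() directly, because iterating over it yields one line at a time.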

Conclusion

In this tutorial, we covered different ways to extract numbers from a text file in Python, with examples:

  1. Regular Expressions (re.findall()): Best for extracting numbers from unstructured text.
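  2. split() and isdigit(): A simple approach when numbers appear as standalone words separated by whitespace; numbers attached to punctuation or letters are skipped.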
  3. Extracting Decimal Numbers: Extends regex usage to float values.
  4. Finding Numbers with Line Numbers: Useful for extracting numbers along with their positions.