How to Handle Missing Values in a CSV File in Python

Handling Missing Values in a CSV File in Python

In Python, missing values in a CSV file can be handled with the pandas library, which provides methods such as fillna(), dropna(), and interpolate() to replace, remove, or estimate missing values, respectively. In this tutorial, we will explore several techniques for handling missing values in a CSV file.
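
Of the three methods named above, interpolate() is the only one not demonstrated in the numbered examples, so here is a minimal sketch of it. The CSV text, the column names Day and Temp, and the values are hypothetical, and io.StringIO stands in for a file on disk.

```python
import io

import pandas as pd

# Hypothetical CSV data; the empty Temp fields are read as NaN
csv_text = """Day,Temp
1,10
2,
3,30
4,
5,
6,60
"""
df = pd.read_csv(io.StringIO(csv_text))

# Linear interpolation (the default method) estimates each gap from the
# values on either side, treating the rows as evenly spaced
df["Temp"] = df["Temp"].interpolate()

print(df["Temp"].tolist())  # [10.0, 20.0, 30.0, 40.0, 50.0, 60.0]
```

Unlike filling with a single value, interpolation uses each missing value's neighbours, which suits ordered data such as time series.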

Examples to Handle Missing Values in a CSV File

1. Detecting Missing Values in a CSV File

Before handling missing values, we need to identify where they exist. In this example, we will read a CSV file using pandas and check for missing values using isnull() and sum().

main.py

import pandas as pd

# Reading the CSV file
df = pd.read_csv("data.csv")

# Checking for missing values
missing_values = df.isnull().sum()

# Printing missing values count for each column
print("Missing values in each column:\n", missing_values)

Explanation:

pd.read_csv("data.csv"): Reads the CSV file into a DataFrame.
df.isnull(): Returns a DataFrame with True for missing values and False otherwise.
df.isnull().sum(): Counts the number of missing values in each column.
print(): Displays the missing values count for each column.

Output:

Missing values in each column:
Name       0
Age        2
Salary     1
City       3
dtype: int64
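
The counts above depend on what pandas recognizes as missing. Empty fields and common markers such as NA or N/A are read as NaN by default, but a custom placeholder is not. The following sketch, using hypothetical inline CSV data, passes the placeholder "?" through the na_values parameter of read_csv() and then uses any(axis=1) to list the rows that contain at least one missing value.

```python
import io

import pandas as pd

# Hypothetical CSV: an empty field, the default marker "N/A",
# and a custom placeholder "?"
csv_text = """Name,Age,City
Alice,30,Paris
Bob,,London
Carol,25,N/A
Dave,?,Berlin
"""

# "N/A" is already in pandas' default missing-value list; "?" is not,
# so it is added explicitly through na_values
df = pd.read_csv(io.StringIO(csv_text), na_values=["?"])

# Per-column counts of missing values
print(df.isnull().sum())

# Rows that have at least one missing value (Bob, Carol, Dave)
print(df[df.isnull().any(axis=1)])
```

Without na_values=["?"], the Age column would be read as text and Dave's "?" would not be counted as missing.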

2. Removing Rows with Missing Values

If rows containing missing values are not useful for the analysis, we can remove them with the dropna() method.

main.py

import pandas as pd

# Reading the CSV file
df = pd.read_csv("data.csv")

# Removing rows with missing values
df_cleaned = df.dropna()

# Printing the cleaned DataFrame
print(df_cleaned)

Explanation:

df.dropna(): Returns a new DataFrame without the rows that contain at least one missing value; df itself is unchanged.
df_cleaned: Stores the cleaned DataFrame without missing values.
print(df_cleaned): Displays the DataFrame after removing missing values.

Output:

(DataFrame output without missing values)
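
By default, dropna() discards a row if any value in it is missing, which can remove more data than intended. It also accepts how, subset, and thresh parameters that make removal more selective. A minimal sketch on a small hypothetical DataFrame (standing in for the CSV contents) compares them:

```python
import numpy as np
import pandas as pd

# Hypothetical data standing in for the CSV contents
df = pd.DataFrame({
    "Name":   ["Alice", "Bob", "Carol", "Dave", np.nan],
    "Age":    [30, np.nan, 25, np.nan, np.nan],
    "Salary": [50000, 60000, np.nan, np.nan, np.nan],
    "City":   ["Paris", np.nan, "London", np.nan, np.nan],
})

# Drop a row only if every value in it is missing
all_missing_dropped = df.dropna(how="all")

# Drop rows where Age is missing, ignoring gaps in other columns
age_required = df.dropna(subset=["Age"])

# Keep only rows that have at least 2 non-missing values
mostly_complete = df.dropna(thresh=2)

print(all_missing_dropped["Name"].tolist())  # ['Alice', 'Bob', 'Carol', 'Dave']
print(age_required["Name"].tolist())         # ['Alice', 'Carol']
print(mostly_complete["Name"].tolist())      # ['Alice', 'Bob', 'Carol']
```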

3. Replacing Missing Values with a Default Value

Instead of removing rows, we can replace missing values with a default value using the fillna() method.

main.py

import pandas as pd

# Reading the CSV file
df = pd.read_csv("data.csv")

# Replacing missing values with a default value
df_filled = df.fillna("Unknown")

# Printing the updated DataFrame
print(df_filled)

Explanation:

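df.fillna("Unknown"): Replaces all missing values with "Unknown". Because "Unknown" is a string, numeric columns that contain missing values, such as Age and Salary, become object (text) columns.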
df_filled: Stores the updated DataFrame with replaced values.
print(df_filled): Displays the DataFrame after replacing missing values.

Output:

(DataFrame output with "Unknown" replacing missing values)
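
To choose a different replacement for each column, for example 0 for a numeric column and "Unknown" for a text column, fillna() also accepts a dictionary keyed by column name. A minimal sketch with hypothetical data:

```python
import numpy as np
import pandas as pd

# Hypothetical data standing in for the CSV contents
df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Carol"],
    "Age": [30, np.nan, 25],
    "City": ["Paris", np.nan, "London"],
})

# Map each column to its own replacement value; columns not listed
# in the dictionary are left untouched
df_filled = df.fillna({"Age": 0, "City": "Unknown"})

print(df_filled["Age"].tolist())   # [30.0, 0.0, 25.0]
print(df_filled["City"].tolist())  # ['Paris', 'Unknown', 'London']
print(df_filled.dtypes)            # Age stays float64
```

Because Age is filled with a number, it remains a numeric column and can still be used in calculations.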

4. Filling Missing Values with the Column Mean

When dealing with numerical data, filling missing values with the column mean is a common approach.

main.py

import pandas as pd

# Reading the CSV file
df = pd.read_csv("data.csv")

# Filling missing values in the 'Age' column with the column mean
df["Age"] = df["Age"].fillna(df["Age"].mean())

# Printing the updated DataFrame
print(df)

Explanation:

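df["Age"].mean(): Computes the mean of the 'Age' column, skipping missing values.
df["Age"] = df["Age"].fillna(df["Age"].mean()): Fills missing values in 'Age' with that mean and assigns the result back to the column. The older pattern df["Age"].fillna(..., inplace=True) is a chained assignment: recent pandas versions warn about it, and with Copy-on-Write enabled it does not update df.
print(df): Displays the DataFrame with missing values replaced by the mean.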

Output:

(DataFrame output with missing 'Age' values replaced by mean)
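
The same idea extends to every numeric column at once: df.mean(numeric_only=True) returns one mean per numeric column, and fillna() accepts that Series as a column-to-value mapping. A sketch with hypothetical data:

```python
import numpy as np
import pandas as pd

# Hypothetical data standing in for the CSV contents
df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Carol", "Dave"],
    "Age": [30, np.nan, 20, 40],
    "Salary": [50000, 70000, np.nan, 60000],
})

# numeric_only=True skips text columns such as Name; NaN values are
# ignored when each mean is computed
column_means = df.mean(numeric_only=True)

# Each numeric column is filled with its own mean
df_filled = df.fillna(column_means)

print(df_filled["Age"].tolist())     # [30.0, 30.0, 20.0, 40.0]
print(df_filled["Salary"].tolist())  # [50000.0, 70000.0, 60000.0, 60000.0]
```

If a column contains outliers, df.median(numeric_only=True) can be used in place of the mean, since the median is less affected by extreme values.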

Conclusion

Handling missing values in a CSV file is essential for accurate data analysis. Here are the key techniques:

Detecting missing values using isnull().sum().
Removing missing values using dropna().
Replacing missing values with a default value using fillna().
Filling missing numerical values with the column mean using fillna() and mean().

TutorialKart