How Does pandas fillna() Work in Python? Best Example


Handling missing data is an essential skill when working with pandas in Python. One of the most useful functions for filling in those troublesome NaN values is fillna(). In this article, I’ll walk you through how pandas fillna() works, providing the best examples and explanations to ensure you can use this function effectively in your own data processing.

Understanding Missing Data in Pandas

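Before diving into fillna(), it’s crucial to understand what missing values are in pandas. For numeric data, pandas represents a missing value as NaN (Not a Number), a special floating-point value that comes from NumPy. Python’s None is also treated as missing, which is why an integer column containing None is stored as floats. A missing value can appear for various reasons, such as: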

  • Incomplete datasets
  • Errors during data extraction
  • Intentional placeholders

Let’s start by creating a simple DataFrame with missing values:

import pandas as pd

# Creating a sample dataset with NaN values
data = {'A': [1, 2, None, 4], 'B': [None, 2, 3, 4], 'C': [10, None, 30, 40]}
df = pd.DataFrame(data)

print(df)

This will output:


     A    B     C
0  1.0  NaN  10.0
1  2.0  2.0   NaN
2  NaN  3.0  30.0
3  4.0  4.0  40.0
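Before filling anything, it often helps to see where the gaps are. The following is a minimal sketch, using the same sample DataFrame, that counts missing values per column with isna(); the variable names mask and missing_per_column are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, None, 4], 'B': [None, 2, 3, 4], 'C': [10, None, 30, 40]})

# Boolean mask: True wherever a value is missing
mask = df.isna()

# Summing the mask counts the missing values in each column
missing_per_column = mask.sum()
print(missing_per_column.to_dict())  # {'A': 1, 'B': 1, 'C': 1}
```

Each column in the sample has exactly one missing value, which matches the printed table above.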

Basic Usage of fillna()

The fillna() method replaces every missing value with a value you specify and, by default, returns a new DataFrame rather than modifying the original. Here’s the simplest example:

df_filled = df.fillna(0)
print(df_filled)

In the result, every NaN has been replaced with 0:


     A    B     C
0  1.0  0.0  10.0
1  2.0  2.0   0.0
2  0.0  3.0  30.0
3  4.0  4.0  40.0
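One detail worth checking for yourself is that fillna() leaves the original DataFrame alone unless you assign the result back. A short sketch, assuming the same sample data:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, None, 4], 'B': [None, 2, 3, 4], 'C': [10, None, 30, 40]})

# fillna() returns a new DataFrame by default
df_filled = df.fillna(0)

# Count the missing values that remain in each object
remaining_in_original = int(df.isna().sum().sum())
remaining_in_copy = int(df_filled.isna().sum().sum())

print(remaining_in_original)  # 3
print(remaining_in_copy)      # 0
```

If you want to keep the filled version, assign it to a variable (or back to df), as the examples in this article do.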

Filling Missing Values with Different Strategies

Filling with a Specific Value Per Column

Instead of filling all missing values with the same value, you can specify different values for each column using a dictionary:

df_filled = df.fillna({'A': 99, 'B': 50, 'C': 0})
print(df_filled)
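The dictionary does not have to name every column. As a small sketch on the same sample data, filling only column A leaves the gaps in B and C untouched; the name partial is just for illustration.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, None, 4], 'B': [None, 2, 3, 4], 'C': [10, None, 30, 40]})

# Only column A is listed, so only its NaN is replaced
partial = df.fillna({'A': 99})

print(partial['A'].tolist())            # [1.0, 2.0, 99.0, 4.0]
print(int(partial['B'].isna().sum()))   # 1 (still missing)
print(int(partial['C'].isna().sum()))   # 1 (still missing)
```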

Forward and Backward Filling

You can propagate values forward (ffill) or backward (bfill) to fill missing data:

  • ffill(): Fills each missing value with the last known value above it.
  • bfill(): Fills each missing value with the next known value below it.

The older spelling df.fillna(method='ffill') is deprecated since pandas 2.1, so call the dedicated methods instead:

df_ffill = df.ffill()
df_bfill = df.bfill()
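Two edge cases are easy to miss with forward and backward propagation. The sketch below uses the DataFrame.ffill() and DataFrame.bfill() methods, which perform this same propagation, to show that a gap in the first row survives a forward fill, and that the limit parameter caps how far a value is carried. The Series s is a made-up example, not part of the article's dataset.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, None, 4], 'B': [None, 2, 3, 4], 'C': [10, None, 30, 40]})

# Column B starts with NaN, so there is nothing above it to carry forward
forward = df.ffill()
print(forward['B'].tolist())  # [nan, 2.0, 3.0, 4.0]

# Chaining a backward fill afterwards closes the leading gap
both = df.ffill().bfill()
print(int(both.isna().sum().sum()))  # 0

# limit=1 carries a value across at most one consecutive gap
s = pd.Series([1, None, None, None, 5])
limited = s.ffill(limit=1)
print(limited.tolist())  # [1.0, 1.0, nan, nan, 5.0]
```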

Filling with Column Mean, Median, or Mode

Another smart way to handle missing values is by filling them with statistical measures such as the mean, median, or mode:

df_filled_mean = df.fillna(df.mean())
df_filled_median = df.fillna(df.median())

# Filling with mode (most frequent value)
df_filled_mode = df.fillna(df.mode().iloc[0])
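Real datasets often mix numeric and text columns, and a mean only makes sense for the numeric ones. The sketch below assumes a hypothetical extra column named city (it is not in the article's dataset) and fills numeric columns with their mean and the text column with its mode.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, None, 4],
    'B': [None, 2, 3, 4],
    'C': [10, None, 30, 40],
    'city': ['Oslo', None, 'Oslo', 'Rome'],  # hypothetical text column
})

# numeric_only=True skips the text column when computing the means
means = df.mean(numeric_only=True)
filled = df.fillna(means)

# The text column is filled separately with its most frequent value
most_common_city = filled['city'].mode().iloc[0]
filled['city'] = filled['city'].fillna(most_common_city)

print(filled['B'].tolist())     # [3.0, 2.0, 3.0, 4.0]
print(filled['city'].tolist())  # ['Oslo', 'Oslo', 'Oslo', 'Rome']
```

Keep in mind that mode() returns every value tied for most frequent, sorted, so .iloc[0] picks the smallest one when all values in a column are unique.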

Replacing Only a Specific Subset of Data

To replace missing values only within a specific row or column range, use slicing:

# Fill NaN only in rows 1 and 2 of the first column (A); everything else is left alone
df.iloc[1:3, 0] = df.iloc[1:3, 0].fillna(55)
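To make the effect of a subset fill visible, here is a small sketch on copies of the sample data: one fill by position, as in the line above, and one by column label. The names by_position and by_label are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, None, 4], 'B': [None, 2, 3, 4], 'C': [10, None, 30, 40]})

# Position-based: rows 1 and 2 of the first column
by_position = df.copy()
by_position.iloc[1:3, 0] = by_position.iloc[1:3, 0].fillna(55)
print(by_position['A'].tolist())  # [1.0, 2.0, 55.0, 4.0]

# Label-based: fill columns A and B, leave column C as it is
by_label = df.copy()
by_label[['A', 'B']] = by_label[['A', 'B']].fillna(0)
print(by_label['B'].tolist())          # [0.0, 2.0, 3.0, 4.0]
print(int(by_label['C'].isna().sum())) # 1
```

Working on a copy keeps the original df intact, which is handy when comparing several strategies side by side.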

Performance Considerations

When working with large datasets, performance matters. Let’s examine how efficient fillna() is:

Method          Time Complexity  Best Use Case
Constant value  O(n)             Quick and simple replacements
Mean, median    O(n)             Numeric data with missing values
ffill, bfill    O(n)             Time-series or sequential data
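If you want numbers for your own machine, a rough timing loop is enough to compare the strategies. The sketch below is only an illustration: the dataset size, the 10% missing rate, and the helper function names are assumptions, and the timings will vary with hardware and pandas version.

```python
import time

import numpy as np
import pandas as pd

# Synthetic data: 100,000 rows, roughly 10% of values set to NaN (assumed sizes)
rng = np.random.default_rng(0)
values = rng.random((100_000, 3))
values[rng.random(values.shape) < 0.1] = np.nan
big = pd.DataFrame(values, columns=['A', 'B', 'C'])


def fill_constant(frame):
    return frame.fillna(0)


def fill_mean(frame):
    return frame.fillna(frame.mean())


def fill_forward(frame):
    return frame.ffill()


repeats = 5
timings = {}
for name, func in [('constant', fill_constant), ('mean', fill_mean), ('ffill', fill_forward)]:
    start = time.perf_counter()
    for _ in range(repeats):
        func(big)
    # Average seconds per call
    timings[name] = (time.perf_counter() - start) / repeats
    print(f'{name}: {timings[name] * 1000:.2f} ms per call')
```

Doubling the number of rows and rerunning the loop is a simple way to check that the cost grows roughly linearly, as the table suggests.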

Final Thoughts

Now you know exactly how pandas fillna() works in Python and the best examples to use in real-world scenarios. Whether you want to use a constant value, statistical methods, or forward/backward filling, pandas gives you the flexibility to handle missing data effortlessly. The best approach depends on your dataset and the problem you’re solving, so choose wisely!
