
When working with data in Python, you’ll often encounter missing values. Whether you’re handling financial data, scientific experiments, or simple user records, dealing with NaN (Not a Number) values is a crucial part of data processing. Enter pandas.notna(), a simple yet powerful function that identifies non-missing data. In this article, I’ll walk you through how pandas.notna() works, with clear examples, use cases, and a comparison with similar methods.
Understanding pandas.notna()
The function pandas.notna() is part of the pandas library and is used to detect non-missing values in a DataFrame or Series. Unlike its counterpart pandas.isna(), which identifies missing values, notna() does the opposite: it returns True for values that are not missing (anything other than NaN, None, or NaT) and False for those that are.
The syntax is straightforward:
pandas.notna(obj)
Where obj can be any of the following:
- A single scalar value
- A Pandas Series
- A Pandas DataFrame
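To make the scalar case concrete, here is a minimal sketch of what comes back for a few single values. The variable names are illustrative only, and the NumPy array at the end is an extra input type beyond the three listed above, included on the assumption that array-like input is also of interest:

```python
import numpy as np
import pandas as pd

# Scalars: each call returns a single boolean.
scalar_checks = {
    "int": pd.notna(5),
    "None": pd.notna(None),
    "float NaN": pd.notna(float("nan")),
    "NaT": pd.notna(pd.NaT),
}
print(scalar_checks)

# Array-like input (an addition to the list above): the result is a
# boolean NumPy array with the same shape as the input.
array_result = pd.notna(np.array([1.0, np.nan, 3.0]))
print(array_result)
```

For a Series or DataFrame the result has the same shape and index as the input, which is what makes it usable as a filter mask later on.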
Basic Example of pandas.notna()
Let’s start with a simple example using a Pandas Series:
import pandas as pd
data = pd.Series([10, None, 25, float('NaN'), "Hello"])
result = pd.notna(data)
print(result)
Output:
0 True
1 False
2 True
3 False
4 True
dtype: bool
As you can see, the function marks None and NaN as False and every valid entry as True.
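It helps to know what notna() does not treat as missing. The sketch below, which assumes pandas' default settings and uses a made-up Series named edge_cases, checks a few values that often look "empty" to a human reader:

```python
import numpy as np
import pandas as pd

# Values that look empty or odd, followed by two genuinely missing ones.
edge_cases = pd.Series(["", 0, False, np.inf, None, np.nan], dtype="object")

mask = pd.notna(edge_cases)
print(mask.tolist())
# Only the last two entries (None and NaN) are reported as missing;
# the empty string, 0, False, and infinity all count as present.
```

If empty strings should count as missing in your data, they need to be converted to NaN or None first; notna() will not do that for you.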
Using notna() with DataFrames
Now let’s apply the function to a DataFrame.
df = pd.DataFrame({
"A": [1, 2, None, 4],
"B": ["apple", None, "banana", "cherry"],
"C": [None, 5.5, float('NaN'), 7.2]
})
result_df = pd.notna(df)
print(result_df)
Output:
A B C
0 True True False
1 True False True
2 False True False
3 True True True
Each cell in the DataFrame is evaluated, returning True for valid values and False for missing ones.
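Because the result is a DataFrame of booleans, it can be aggregated directly. The following sketch rebuilds the same df and uses the method form, DataFrame.notna(), which behaves like pd.notna(df); the names non_missing_per_column and fully_populated_rows are chosen here for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    "A": [1, 2, None, 4],
    "B": ["apple", None, "banana", "cherry"],
    "C": [None, 5.5, float("nan"), 7.2],
})

# True counts as 1 when summed, so this gives non-missing values per column.
non_missing_per_column = df.notna().sum()
print(non_missing_per_column)

# A row is fully populated only if every column in it is non-missing.
fully_populated_rows = df.notna().all(axis=1)
print(fully_populated_rows)
```

This kind of summary is a quick way to see which columns are the sparsest before deciding how to clean them.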
Practical Use Cases of notna()
Now that you’ve seen how pandas.notna() works, let’s explore some practical use cases where it can be incredibly useful:
1. Filtering Out Missing Values
One of the most common use cases is filtering out NaN values from a Pandas Series:
filtered_data = data[pd.notna(data)]
print(filtered_data)
This effectively removes any missing values while keeping valid ones.
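One detail worth knowing is that boolean filtering keeps the original index labels, so the result has gaps where values were dropped. A small sketch, with the renumbering step added as an optional extra rather than something the filter requires:

```python
import pandas as pd

data = pd.Series([10, None, 25, float("nan"), "Hello"])

# The mask keeps positions 0, 2 and 4; their labels are preserved.
filtered_data = data[pd.notna(data)]
print(filtered_data.index.tolist())

# Optional: renumber from zero if the original labels are not needed.
renumbered = filtered_data.reset_index(drop=True)
print(renumbered.index.tolist())
```

Keeping the original labels is useful when the filtered values need to be matched back to the source data; resetting is mostly a matter of tidiness.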
2. Selecting Non-Empty Rows in a DataFrame
You may want to select only the rows where a specific column has non-missing values:
filtered_df = df[pd.notna(df["A"])]
print(filtered_df)
In this case, rows where column “A” contains missing values are removed.
Comparison: notna() vs. isna()
It’s worth comparing notna() with isna() to appreciate their differences:
Function | Description | Returns True for |
---|---|---|
pandas.notna() | Checks if a value is NOT missing | Valid (non-null) values |
pandas.isna() | Checks if a value is missing | NaN / None |
In short, notna() flips the boolean values of isna(). If you ever find yourself negating isna() with the ~ operator, as in ~pd.isna(x), you’re better off using notna() directly.
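The relationship is easy to verify. This short sketch, using a throwaway Series named values, compares the negated isna() mask with the notna() mask:

```python
import pandas as pd

values = pd.Series([1.5, None, 3.0])

negated_isna = ~pd.isna(values)   # negate the "is missing" mask
direct_notna = pd.notna(values)   # ask for "is present" directly

# Both masks are element-wise identical.
print(negated_isna.equals(direct_notna))
```

The two forms give the same mask, so the choice is about readability: notna() states the intent without making the reader parse a negation.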
Best Example of pandas.notna()
Here’s a real-world example where I’ll demonstrate how to clean a messy dataset:
data = pd.DataFrame({
"Name": ["Alice", "Bob", None, "Dave"],
"Age": [25, None, 30, 40],
"City": ["New York", "Los Angeles", "Chicago", None]
})
cleaned_data = data[pd.notna(data["Name"]) & pd.notna(data["Age"])]
print(cleaned_data)
Output:
Name Age City
0 Alice 25.0 New York
3 Dave 40.0 None
In this example, we removed rows where “Name” or “Age” contained missing values. Dave’s row is kept even though “City” is None, because only the two checked columns decide which rows survive.
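When the list of required columns grows, chaining masks with & gets unwieldy. As a sketch of two alternatives (the names people and required are hypothetical, and neither form is claimed to be better than the original), the same filter can be written with notna().all(axis=1) or with DataFrame.dropna(subset=...):

```python
import pandas as pd

people = pd.DataFrame({
    "Name": ["Alice", "Bob", None, "Dave"],
    "Age": [25, None, 30, 40],
    "City": ["New York", "Los Angeles", "Chicago", None],
})

required = ["Name", "Age"]  # columns that must be present

# Keep rows where every required column is non-missing.
via_mask = people[people[required].notna().all(axis=1)]

# dropna restricted to the same columns selects the same rows.
via_dropna = people.dropna(subset=required)

print(via_mask.equals(via_dropna))
print(via_mask.index.tolist())
```

The mask form is handy when the boolean condition needs to be combined with other filters; dropna is shorter when removing rows is all you need.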
Conclusion
Understanding how pandas.notna() works in Python is an essential skill for any data scientist or analyst. Whether you’re cleaning data, filtering missing values, or preparing datasets for modeling, this function provides a quick and efficient way to identify non-missing entries.
The best part? It’s intuitive, easy to use, and complements Pandas’ missing data handling capabilities beautifully. The next time you’re debugging a DataFrame filled with NaN values, remember that pandas.notna() can be your best friend.