
When working with data in Python, one of the most common tasks is counting unique values in a dataset. Thankfully, Pandas provides an efficient method for this: nunique(), available on both Series and DataFrame objects. In this article, I’ll go through how nunique() works, with clear examples and best practices.
What is nunique()?
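The nunique() method counts the number of distinct values in a Series, or in each column (or row) of a DataFrame. It is called on an object, as Series.nunique() or DataFrame.nunique(); there is no top-level pandas.nunique() function. It helps us quickly see how many different values a dataset contains without manually filtering or iterating over elements.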
Basic Syntax of nunique()
The method can be used on both Series and DataFrame objects. Here’s the basic syntax:
Series.nunique(dropna=True)
DataFrame.nunique(axis=0, dropna=True)
- dropna (default: True): Determines whether to exclude NaN (missing values) from the count.
- axis (for DataFrame; default: 0): If 0, it calculates unique values for each column; if 1, it calculates unique values for each row.
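As a rough illustration of the two signatures, the sketch below calls nunique() on a small Series and a small DataFrame. The data and the variable names (s, df, per_column, per_row) are made up for this illustration only.

```python
import pandas as pd

# Hypothetical toy data, chosen only to show the return types
s = pd.Series(['x', 'y', 'x'])
df = pd.DataFrame({'p': [1, 1, 2], 'q': [1, 3, 2]})

# On a Series, nunique() returns a single integer
n_distinct = s.nunique()

# On a DataFrame, nunique() returns a Series of counts
per_column = df.nunique(axis=0)  # one count per column (the default)
per_row = df.nunique(axis=1)     # one count per row

print(n_distinct)
print(per_column)
print(per_row)
```

The main thing to notice is the shape of the result: one number for a Series, and one number per column or per row for a DataFrame.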
Example 1: Using nunique() on a Series
Let’s start with a simple example using a Pandas Series:
import pandas as pd
# Create a Series
s = pd.Series([1, 2, 2, 3, 4, 4, 4, None])
# Count unique values
unique_count = s.nunique()
print(unique_count)
Output:
4
The None entry is stored as NaN, and by default NaN values are not counted, so the unique values are {1, 2, 3, 4}, which gives a result of 4.
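One way to sanity-check this result is to compare it with the length of the array returned by unique(). The sketch below rebuilds the Series from Example 1; the variable names distinct_non_missing and distinct_with_nan are introduced here for illustration.

```python
import pandas as pd

# Same Series as in Example 1
s = pd.Series([1, 2, 2, 3, 4, 4, 4, None])

# Drop missing values first, then list the distinct values
distinct_non_missing = s.dropna().unique()

# unique() on the raw Series keeps NaN as one of the distinct values
distinct_with_nan = s.unique()

print(s.nunique(), len(distinct_non_missing))
print(s.nunique(dropna=False), len(distinct_with_nan))
```

Each printed pair should match: nunique() agrees with the count of distinct non-missing values, and nunique(dropna=False) agrees with the count that includes NaN.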
Example 2: Using nunique() on a DataFrame
Now, let’s see how nunique() works on a DataFrame:
df = pd.DataFrame({
    'A': [1, 2, 2, 3, 3, 4, None],
    'B': ['a', 'b', 'b', 'c', None, 'a', 'a']
})
# Count unique values for each column
col_unique_counts = df.nunique()
print(col_unique_counts)
Output:
A    4
B    3
dtype: int64
This tells us that column A has 4 unique values and column B has 3 unique values (ignoring NaN values).
Example 3: Counting Unique Values Row-Wise
We can also use axis=1 to count unique values for each row:
row_unique_counts = df.nunique(axis=1)
print(row_unique_counts)
Output:
0    2
1    2
2    2
3    2
4    1
5    2
6    1
dtype: int64
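This returns a Series indexed by row label, where each value is the number of distinct non-missing entries in that row. Rows 4 and 6 each contain a missing value, so their count is 1.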
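To see what the row-wise count is doing, it can help to compare it with an explicit per-row call. The sketch below rebuilds the DataFrame from Example 2 and applies Series.nunique() to each row by hand; the name manual_counts is introduced here for illustration.

```python
import pandas as pd

# Same DataFrame as in Example 2
df = pd.DataFrame({
    'A': [1, 2, 2, 3, 3, 4, None],
    'B': ['a', 'b', 'b', 'c', None, 'a', 'a'],
})

# Built-in row-wise count
row_unique_counts = df.nunique(axis=1)

# Equivalent explicit version: treat each row as a Series and count
manual_counts = df.apply(lambda row: row.nunique(), axis=1)

print(row_unique_counts.tolist())
print(manual_counts.tolist())
```

Both lists should be identical. Because each row is handled as its own Series, this also hints at why row-wise counting tends to cost more on tall DataFrames.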
Example 4: Counting Unique Values Including NaN
If we want to include NaN values in our count, we can set dropna=False:
df.nunique(dropna=False)
With dropna=False, NaN is treated as one additional distinct value, so the count for any column that contains missing data increases by one, no matter how many missing entries it has.
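As a small worked example, the sketch below runs both variants on the DataFrame from Example 2 and prints them side by side. The names default_counts, with_nan_counts and comparison are introduced here for illustration.

```python
import pandas as pd

# Same DataFrame as in Example 2
df = pd.DataFrame({
    'A': [1, 2, 2, 3, 3, 4, None],
    'B': ['a', 'b', 'b', 'c', None, 'a', 'a'],
})

default_counts = df.nunique()               # missing values ignored
with_nan_counts = df.nunique(dropna=False)  # missing values counted once

# Put the two results next to each other for comparison
comparison = pd.DataFrame({
    'dropna=True': default_counts,
    'dropna=False': with_nan_counts,
})
print(comparison)
```

Each column holds exactly one missing entry here, so both counts go up by one: A goes from 4 to 5 and B from 3 to 4.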
Performance Considerations
When working with large datasets, nunique() relies on pandas’ optimized routines and is typically much faster than a manual Python loop over the values. However, keep these considerations in mind:
- Using dropna=False can change the running time slightly, though the difference is usually small.
- For large DataFrames, applying nunique() across rows (axis=1) can be slower than column-wise operations, because every row has to be processed separately.
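If you want to check the points above on your own machine, a rough timing sketch along these lines may help. The DataFrame shape, the random data and the names wide, col_time and row_time are arbitrary assumptions, and the timings will vary with hardware and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical test data: 2,000 rows by 20 columns of small integers
rng = np.random.default_rng(0)
wide = pd.DataFrame(rng.integers(0, 10, size=(2_000, 20)))

# Time a few repetitions of each direction
col_time = timeit.timeit(lambda: wide.nunique(axis=0), number=3)
row_time = timeit.timeit(lambda: wide.nunique(axis=1), number=3)

print(f"column-wise: {col_time:.4f}s  row-wise: {row_time:.4f}s")
```

Treat the printed numbers as a rough guide rather than a benchmark; for serious measurements, use data that resembles your real workload.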
Summary Table: Common Use Cases
| Use Case | Syntax |
|---|---|
| Count unique values in a Series | Series.nunique() |
| Count unique values in each DataFrame column | DataFrame.nunique() |
| Count unique values per row | DataFrame.nunique(axis=1) |
| Include NaN values in the count | DataFrame.nunique(dropna=False) |
Conclusion
The nunique() method is a simple yet effective tool for gauging how varied the values in a dataset are. Whether you’re working with a Series or a DataFrame, it provides quick insights into unique values, making it an essential method in data analysis and preprocessing.