How pandas nunique works in Python? Best example


When working with data in Python, one of the most common tasks is counting unique values in a dataset. Pandas provides an efficient method for this: nunique(), available on both Series and DataFrame objects. In this article, I’ll go through how nunique() works, with clear examples and best practices.

What is nunique() in pandas?

The nunique() method counts the number of distinct values in a Series, or in each column or row of a DataFrame. It gives a quick sense of how varied the data is without manually filtering or iterating over elements.
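Conceptually, nunique() returns the number of distinct non-missing values. The short sketch below checks that idea against two manual approaches, dropna() followed by unique(), and a Python set over the non-missing values. The Series `s` is made-up illustrative data.

```python
import pandas as pd

# Illustrative data with one duplicate ("red") and one missing value
s = pd.Series(["red", "green", "red", "blue", None])

# Built-in count of distinct non-missing values
built_in = s.nunique()

# Manual equivalent 1: drop missing values, then count the unique ones
via_unique = len(s.dropna().unique())

# Manual equivalent 2: a Python set over the non-missing values
via_set = len(set(s.dropna()))

print(built_in, via_unique, via_set)  # 3 3 3
```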

Basic Syntax of nunique()

The method can be used on both Series and DataFrame objects. Here’s the basic syntax:

Series.nunique(dropna=True)
DataFrame.nunique(axis=0, dropna=True)
  • dropna (default: True) – Determines whether to exclude NaN (missing values) from the count.
  • axis (for DataFrame) – If 0 (default), it calculates unique values for each column; if 1, it calculates unique values for each row.
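As a quick sketch of both parameters (the small DataFrame here is made up for illustration), axis also accepts the string aliases "index" and "columns", which are equivalent to 0 and 1, and dropna=False adds missing values to each count:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "x": [1, 1, np.nan],
    "y": [5, 6, 7],
})

# axis=0 / axis="index": one count per column
per_column = df.nunique(axis="index")
# axis=1 / axis="columns": one count per row
per_row = df.nunique(axis="columns")

print(per_column.equals(df.nunique(axis=0)))  # True
print(per_row.equals(df.nunique(axis=1)))     # True

# Column "x" now counts NaN as a value: {1.0, NaN}
print(df.nunique(dropna=False).tolist())      # [2, 3]
```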

Example 1: Using nunique() on a Series

Let’s start with a simple example using a Pandas Series:

import pandas as pd

# Create a Series
s = pd.Series([1, 2, 2, 3, 4, 4, 4, None])

# Count unique values
unique_count = s.nunique()
print(unique_count)

Output:

4

By default, missing values are not counted (the None is stored as NaN in this numeric Series), so the unique values are {1, 2, 3, 4}, which gives a result of 4.
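nunique() only returns the count. When you also want to know how often each value occurs, value_counts() lists the distinct values with their frequencies, and its length matches nunique() as long as both use the same dropna setting. A small check using the Series from Example 1:

```python
import pandas as pd

s = pd.Series([1, 2, 2, 3, 4, 4, 4, None])

counts = s.value_counts()            # frequency of each non-missing value
print(counts)
print(len(counts) == s.nunique())    # True

# With dropna=False on both, NaN is included in each
print(len(s.value_counts(dropna=False)) == s.nunique(dropna=False))  # True
```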

Example 2: Using nunique() on a DataFrame

Now, let’s see how nunique() works on a DataFrame:

df = pd.DataFrame({
    'A': [1, 2, 2, 3, 3, 4, None],
    'B': ['a', 'b', 'b', 'c', None, 'a', 'a']
})

# Count unique values for each column
col_unique_counts = df.nunique()
print(col_unique_counts)

Output:

A    4
B    3
dtype: int64

This tells us that column A has 4 unique values and column B has 3 unique values (ignoring NaN values).
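A common use of the column-wise result is spotting columns that carry no information (at most one distinct value) or that look like identifiers (every row distinct). Below is a minimal sketch with a made-up DataFrame; `profile_columns` is a hypothetical helper written for this example, not a pandas function.

```python
import pandas as pd


def profile_columns(df: pd.DataFrame) -> tuple[list[str], list[str]]:
    """Hypothetical helper: return (constant columns, all-distinct columns)."""
    counts = df.nunique()  # distinct non-missing values per column
    # <= 1 also catches columns that are entirely missing (count 0)
    constant = counts[counts <= 1].index.tolist()
    # Every row holds a different value, e.g. an ID column
    all_distinct = counts[counts == len(df)].index.tolist()
    return constant, all_distinct


df = pd.DataFrame({
    "id": [101, 102, 103, 104],
    "country": ["DE", "DE", "DE", "DE"],
    "city": ["Berlin", "Munich", "Berlin", "Hamburg"],
})

constant, all_distinct = profile_columns(df)
print(constant)      # ['country']
print(all_distinct)  # ['id']
```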

Example 3: Counting Unique Values Row-Wise

We can also use axis=1 to count unique values for each row:

row_unique_counts = df.nunique(axis=1)
print(row_unique_counts)

Output:

0    2
1    2
2    2
3    2
4    1
5    2
6    1
dtype: int64

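This returns a Series with one count per row. The values in each row are compared across the columns, so row 0 (1.0 and 'a') has 2 unique values, while row 4 (3.0 and a missing value) has only 1 because the missing value is dropped.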
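To make the per-row logic explicit, the sketch below rebuilds the Example 3 DataFrame and checks that nunique(axis=1) gives the same result as calling Series.nunique() on each row through apply():

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 2, 3, 3, 4, None],
    'B': ['a', 'b', 'b', 'c', None, 'a', 'a']
})

built_in = df.nunique(axis=1)

# Same count computed explicitly: each row is passed to the lambda as a Series
by_hand = df.apply(lambda row: row.nunique(), axis=1)

print(built_in.tolist())         # [2, 2, 2, 2, 1, 2, 1]
print(built_in.equals(by_hand))  # True
```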

Example 4: Counting Unique Values Including NaN

If we want to include NaN values in our count, we can set dropna=False:

print(df.nunique(dropna=False))

Output:

A    5
B    4
dtype: int64

With dropna=False, the missing values in each column are counted as one additional distinct value, so both counts are one higher than in Example 2.
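In a numeric Series, where None is converted to NaN, all missing entries collapse into a single extra value no matter how many there are. A quick check with a made-up Series:

```python
import numpy as np
import pandas as pd

# Three missing entries (two NaN, one None stored as NaN) plus one real value
s = pd.Series([np.nan, np.nan, None, 1.0])

print(s.nunique())              # 1  (missing values dropped)
print(s.nunique(dropna=False))  # 2  (all missing values count as one)
```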

Performance Considerations

When working with large datasets, nunique() is much faster than looping over the values in Python, because the counting runs in pandas' compiled, vectorized code. However, keep these considerations in mind:

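  • The dropna setting mainly changes the result rather than the speed; the performance difference between dropna=True and dropna=False is usually negligible.
  • For large DataFrames, counting across rows (axis=1) can be much slower than counting column-wise, because each row is processed as a separate Series and DataFrames usually have far more rows than columns.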
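If row-wise counts on a large, purely numeric DataFrame become a bottleneck, one possible workaround is to drop down to NumPy: sort each row, then count the positions where the value changes. This is a sketch under stated assumptions, not a pandas feature: `rowwise_nunique` is a hypothetical helper, and it assumes every column is numeric and there are no missing values (NaN would need extra handling). Timings will vary by machine, so none are shown.

```python
import time

import numpy as np
import pandas as pd


def rowwise_nunique(df: pd.DataFrame) -> pd.Series:
    """Hypothetical helper (not part of pandas): distinct values per row.

    Assumes every column is numeric, there are no missing values,
    and the DataFrame has at least one column.
    """
    values = np.sort(df.to_numpy(), axis=1)   # sort the values within each row
    changes = np.diff(values, axis=1) != 0    # True wherever a new value starts
    counts = changes.sum(axis=1) + 1          # first value plus each change
    return pd.Series(counts, index=df.index)


# Illustrative data: 2,000 rows of small random integers
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.integers(0, 5, size=(2_000, 6)))

start = time.perf_counter()
slow = df.nunique(axis=1)
t_pandas = time.perf_counter() - start

start = time.perf_counter()
fast = rowwise_nunique(df)
t_numpy = time.perf_counter() - start

print((fast.to_numpy() == slow.to_numpy()).all())  # True
print(f"nunique(axis=1): {t_pandas:.4f}s, NumPy sketch: {t_numpy:.4f}s")
```

Sorting first turns "count distinct values" into "count boundaries between runs of equal values", which NumPy can do for all rows at once instead of one row at a time.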

Summary Table: Common Use Cases

| Use case | Syntax |
|---|---|
| Count unique values in a Series | Series.nunique() |
| Count unique values in each DataFrame column | DataFrame.nunique() |
| Count unique values per row | DataFrame.nunique(axis=1) |
| Include NaN values in the count | DataFrame.nunique(dropna=False) |

Conclusion

The nunique() method is a simple yet effective tool for measuring how diverse the values in a dataset are. Whether you’re working with a Series or a DataFrame, it gives quick insight into the number of distinct values, which makes it useful in both exploratory analysis and preprocessing.
