How does pandas value_counts work in Python? Best example

When working with data in Python, we often need to analyze categorical data or count the occurrences of different values in a dataset. This is where pandas.value_counts() comes in handy. Whether we’re dealing with survey responses, transaction records, or log data, this function simplifies the process of summarizing frequency distributions. Let’s dive deep into how pandas.value_counts() works and how to use it effectively.

Understanding pandas.value_counts()

The value_counts() method counts how often each unique value occurs in a pandas Series, such as a single DataFrame column. It returns a new Series whose index holds the unique values and whose values hold the counts, sorted in descending order of count by default. This is especially useful when dealing with categorical data or when we want an overview of the distribution of values.

Basic Usage of pandas.value_counts()

Let’s start with a simple example. Suppose we have a Pandas Series with some repeated values:

import pandas as pd

data = pd.Series(['apple', 'banana', 'apple', 'orange', 'banana', 'apple'])

counts = data.value_counts()
print(counts)

The output will be (pandas 2.0 and later print "Name: count, dtype: int64" as the last line):

apple     3
banana    2
orange    1
dtype: int64

The function automatically counts the occurrences of each unique value and sorts them in descending order. This makes analyzing categorical data quick and easy.
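Because the result is itself a Series, the usual Series tools apply to it. The following is a minimal sketch of reading values back out of the result; the variable names most_common, apple_count, and top_two are illustrative and not part of pandas.

```python
import pandas as pd

data = pd.Series(['apple', 'banana', 'apple', 'orange', 'banana', 'apple'])
counts = data.value_counts()

# The unique values become the index; the counts become the values.
most_common = counts.index[0]   # label with the highest count
apple_count = counts['apple']   # look up a single count by label
top_two = counts.head(2)        # the two most frequent values

print(most_common, apple_count)  # apple 3
print(top_two.to_dict())         # {'apple': 3, 'banana': 2}
```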

Applying pandas.value_counts() to a DataFrame

In a DataFrame, we usually apply value_counts() to a specific column. Here’s an example:

df = pd.DataFrame({
    'Fruit': ['Apple', 'Banana', 'Apple', 'Orange', 'Banana', 'Apple'],
    'Quantity': [5, 3, 2, 4, 1, 2]
})

fruit_counts = df['Fruit'].value_counts()
print(fruit_counts)

Again, the function counts the occurrences of each unique fruit in the “Fruit” column.

Sorting Options in pandas.value_counts()

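By default, value_counts() sorts the results by count in descending order. We can change this behavior with the sort and ascending parameters.

sort=True (default) – Sort results by count, in descending order.
sort=False – Do not sort by count. Recent pandas versions then return values in order of first appearance, but it is safer not to rely on a particular order.
ascending=True – Sort by count in ascending order, least frequent first.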

Example:

df['Fruit'].value_counts(sort=False)
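As a small sketch of the sorting options, the snippet below rebuilds the fruit data as a standalone Series (the name fruit is just for illustration) and compares a few orderings. The order returned by sort=False is deliberately not asserted here, since it may depend on the pandas version.

```python
import pandas as pd

fruit = pd.Series(['Apple', 'Banana', 'Apple', 'Orange', 'Banana', 'Apple'])

# ascending=True puts the least frequent value first.
rarest_first = fruit.value_counts(ascending=True)
print(rarest_first.index.tolist())   # ['Orange', 'Banana', 'Apple']

# sort=False skips sorting by count; the counts themselves are unchanged.
unsorted_counts = fruit.value_counts(sort=False)
print(unsorted_counts.to_dict())

# To order by label instead of by count, sort the index of the result.
by_label = fruit.value_counts().sort_index()
print(by_label.index.tolist())       # ['Apple', 'Banana', 'Orange']
```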

Handling NaN Values with pandas.value_counts()

By default, value_counts() excludes NaN (missing) values. If we want to include them, we use the dropna=False parameter.

# Use a separate name so the DataFrame df defined earlier is not overwritten.
fruits = pd.Series(['Apple', 'Banana', 'Apple', 'Orange', None, 'Banana', 'Apple'])

print(fruits.value_counts(dropna=False))

Output:

Apple     3
Banana    2
Orange    1
NaN       1
dtype: int64

As we can see, the NaN value is also counted.
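One quick way to see the effect of dropna is to compare totals: with the default, the counts add up to the number of non-missing entries, while with dropna=False they add up to the length of the Series. This is a minimal sketch using made-up survey responses; the names responses, default_counts, and with_missing are illustrative.

```python
import numpy as np
import pandas as pd

responses = pd.Series(['yes', 'no', np.nan, 'yes', np.nan])

default_counts = responses.value_counts()             # NaN excluded
with_missing = responses.value_counts(dropna=False)   # NaN included

print(default_counts.sum())   # 3
print(with_missing.sum())     # 5

# Pick out the count of missing values from the result.
missing_count = with_missing[with_missing.index.isna()].iloc[0]
print(missing_count)          # 2
```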

Normalizing Results with pandas.value_counts()

If we want relative frequencies instead of absolute counts, we can use normalize=True. This returns proportions instead of raw counts.

df['Fruit'].value_counts(normalize=True)

The output will show proportions between 0 and 1 that sum to 1, not percentages. To get percentages, multiply the result by 100.
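As a brief sketch, assuming the same fruit values held in a standalone Series (the names fruit, shares, and percentages are illustrative), the proportions can be converted to rounded percentages like this:

```python
import pandas as pd

fruit = pd.Series(['Apple', 'Banana', 'Apple', 'Orange', 'Banana', 'Apple'])

# normalize=True divides each count by the total number of counted values.
shares = fruit.value_counts(normalize=True)
print(shares['Apple'])   # 0.5

# Convert the fractions to percentages, rounded to one decimal place.
percentages = (shares * 100).round(1)
print(percentages)
```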

Using the bins Parameter in pandas.value_counts()

When working with numerical data, we might want to group values into bins. This is useful for histograms or understanding distributions.

quantities = pd.Series([5, 3, 2, 4, 1, 2, 10, 12, 15, 18])

print(quantities.value_counts(bins=3))

With an integer, the bins parameter splits the range of the data into that many equal-width intervals and counts the values that fall within each one.
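The sketch below shows the binned counts in interval order. For custom bin edges it uses pd.cut followed by value_counts(). The edges 0, 5, 10, 20 and the labels low, mid, high are arbitrary choices made for this example.

```python
import pandas as pd

quantities = pd.Series([5, 3, 2, 4, 1, 2, 10, 12, 15, 18])

# Three equal-width bins over the range 1..18, listed in interval order.
binned = quantities.value_counts(bins=3).sort_index()
print(binned.tolist())   # [6, 2, 2]

# Custom edges: cut the values into labelled intervals first, then count.
labelled = pd.cut(quantities, bins=[0, 5, 10, 20], labels=['low', 'mid', 'high'])
label_counts = labelled.value_counts()
print(label_counts.to_dict())
```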

Example of pandas.value_counts() with a DataFrame

Let’s consider a more realistic dataset where we want to analyze customer orders.

df = pd.DataFrame({
    'Customer': ['Alice', 'Bob', 'Alice', 'David', 'Alice', 'Bob', 'David', 'Alice'],
    'Product': ['Laptop', 'Phone', 'Tablet', 'Laptop', 'Phone', 'Tablet', 'Phone', 'Laptop']
})

print(df['Customer'].value_counts())

Output:

Alice    4
Bob      2
David    2
dtype: int64

Comparison of pandas.value_counts() and groupby()

Sometimes, we might want to compare value_counts() to groupby(). Both can be used to analyze categorical distributions, but groupby() is better suited for multi-column aggregation.

df.groupby('Customer')['Product'].count()

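The difference is that Series.value_counts() counts the values of a single column and sorts the result by count, while groupby() orders the result by group key and can apply other aggregations, such as sums or means, to other columns. Note that count() skips rows where Product is missing, so size() is the closer equivalent of value_counts(). To count combinations of several columns at once, DataFrame.value_counts() (available since pandas 1.1) can be used.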
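To make the comparison concrete, here is a minimal sketch that counts customer and product combinations both ways. It rebuilds the orders data under the illustrative name orders and relies on DataFrame.value_counts() accepting a list of column names, which requires pandas 1.1 or later.

```python
import pandas as pd

orders = pd.DataFrame({
    'Customer': ['Alice', 'Bob', 'Alice', 'David', 'Alice', 'Bob', 'David', 'Alice'],
    'Product': ['Laptop', 'Phone', 'Tablet', 'Laptop', 'Phone', 'Tablet', 'Phone', 'Laptop']
})

# Count each unique (Customer, Product) pair, most frequent first.
pair_counts = orders.value_counts(['Customer', 'Product'])
print(pair_counts[('Alice', 'Laptop')])   # 2

# The same counts via groupby(); the result is ordered by group key instead.
pair_sizes = orders.groupby(['Customer', 'Product']).size()
print(pair_counts.to_dict() == pair_sizes.to_dict())   # True
```

Both results hold the same numbers; they differ only in ordering, which is why value_counts() tends to be the more convenient choice for a quick frequency check.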

Summary: Key Features of pandas.value_counts()

Feature	Functionality
Default Sorting	Descending order of count
Count NaN	Use `dropna=False`
Normalize	Use `normalize=True` for proportions (fractions of the total)
Bins	Use `bins=n` to group numerical data into n equal-width intervals

Conclusion

The pandas.value_counts() function is an essential tool when working with categorical or numerical data in Python. It allows us to quickly analyze distributions, identify trends, and clean datasets efficiently. Mastering this function will make data exploration more efficient and insightful.
