How Does pandas agg() Work in Python? A Worked Example

If you work with data in Python, you’ve probably used the pandas library. And if you’ve dealt with aggregating data, there’s a very useful function that can make your life easier: pandas.DataFrame.agg(). But how exactly does it work? Let’s dive deep into it.

What is pandas agg() in Python?

The agg() method in pandas (an alias for aggregate()) applies one or more aggregation functions to a DataFrame or Series. It accepts a single function, a function name given as a string, a list of these, or a dict that maps column labels to them, which makes it a convenient tool for summarizing data.

Basic Usage of pandas agg()

Let’s start with a simple example. Suppose we have a DataFrame that contains sales data:

import pandas as pd

# Sample data
data = {
    'Category': ['A', 'B', 'A', 'B', 'A', 'B'],
    'Sales': [100, 200, 150, 250, 300, 400]
}

df = pd.DataFrame(data)

Now, let’s apply the agg() method to the “Sales” column:

result = df['Sales'].agg(['sum', 'mean', 'max'])
print(result)

Output:

sum     1400.000000
mean     233.333333
max      400.000000
Name: Sales, dtype: float64

As you can see, agg() has computed multiple aggregation functions at once.
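The shape of the result depends on what you pass in. The following is a minimal sketch, rebuilding the same sales figures as a standalone Series (the variable names are only illustrative): a single function name gives back one scalar, while a list of names gives back a Series labeled by those names.

```python
import pandas as pd

# Same sales figures as in the example above
sales = pd.Series([100, 200, 150, 250, 300, 400], name="Sales")

# A single function name reduces the Series to one scalar value
total = sales.agg("sum")

# A list of function names returns a Series indexed by those names
summary = sales.agg(["sum", "mean", "max"])

print(total)
print(summary)
```

Because the list form returns a Series, individual statistics can be looked up by label, for example summary["mean"].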

Using Custom Functions with agg()

You can also pass custom functions to agg(). Here’s an example:

def range_func(x):
    return x.max() - x.min()

result = df['Sales'].agg(['sum', 'mean', range_func])
print(result)

Output:

sum           1400.000000
mean           233.333333
range_func     300.000000
Name: Sales, dtype: float64

In this case, we defined a function that calculates the range (max – min) and applied it using agg().
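Any function that takes a Series and returns a single value can be used the same way. As a further sketch, here is a hypothetical iqr helper (not part of pandas) that computes the interquartile range. The label in the result comes from the function's own name.

```python
import pandas as pd

sales = pd.Series([100, 200, 150, 250, 300, 400], name="Sales")

def iqr(x):
    # Interquartile range: 75th percentile minus 25th percentile.
    # Hypothetical helper written for this example, not a pandas built-in.
    return x.quantile(0.75) - x.quantile(0.25)

# Built-in names (strings) and custom functions can be mixed in one list
result = sales.agg(["mean", iqr])
print(result)
```

Giving the function a descriptive name is worthwhile, since that name is what appears in the output index.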

Applying agg() on Multiple Columns

You can also use agg() on a DataFrame to apply different aggregation functions to different columns.

# Create a more complex DataFrame
data = {
    'Category': ['A', 'B', 'A', 'B', 'A', 'B'],
    'Sales': [100, 200, 150, 250, 300, 400],
    'Profit': [20, 50, 30, 80, 90, 120]
}

df = pd.DataFrame(data)

# Apply different functions to different columns
result = df.agg({
    'Sales': ['sum', 'mean'],
    'Profit': ['min', 'max']
})

print(result)

Output:

            Sales  Profit
sum   1400.000000     NaN
mean   233.333333     NaN
min           NaN    20.0
max           NaN   120.0

Here, we computed the sum and mean for the “Sales” column and the min and max for the “Profit” column.
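The NaN entries appear because each row label exists for only one of the two columns. If you need just one statistic per column, a sketch like the following, which maps each column to a single function name instead of a list, returns a compact Series with no gaps.

```python
import pandas as pd

df = pd.DataFrame({
    "Category": ["A", "B", "A", "B", "A", "B"],
    "Sales": [100, 200, 150, 250, 300, 400],
    "Profit": [20, 50, 30, 80, 90, 120],
})

# One function per column: the result is a Series indexed by column name
result = df.agg({"Sales": "sum", "Profit": "max"})
print(result)
```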

Using agg() with GroupBy

One of the best use cases for agg() is when working with groupby(). Let’s say we want to aggregate sales and profit data by category.

grouped = df.groupby('Category').agg({
    'Sales': ['sum', 'mean'],
    'Profit': ['sum', 'min', 'max']
})

print(grouped)

Output:

         Sales             Profit
           sum        mean    sum min  max
Category
A          550  183.333333    140  20   90
B          850  283.333333    250  50  120

This neatly summarizes sales and profit values per category, saving you from writing multiple aggregation steps manually.
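The dict form produces two-level column labels, which can be awkward to work with later. One alternative is pandas' named aggregation syntax, sketched below: each keyword argument names an output column and supplies a (column, function) pair. The output names total_sales, avg_sales, and max_profit are chosen only for this example.

```python
import pandas as pd

df = pd.DataFrame({
    "Category": ["A", "B", "A", "B", "A", "B"],
    "Sales": [100, 200, 150, 250, 300, 400],
    "Profit": [20, 50, 30, 80, 90, 120],
})

# Named aggregation: output_name=(input_column, function)
summary = df.groupby("Category").agg(
    total_sales=("Sales", "sum"),
    avg_sales=("Sales", "mean"),
    max_profit=("Profit", "max"),
)
print(summary)
```

The result has flat, single-level column names, so a value can be read with summary.loc["A", "total_sales"].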

Comparing agg() to apply()

Sometimes people confuse agg() with apply(). The difference is:

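apply() takes a single function and passes each column (or each row, with axis=1) to it; the function may return a scalar, a Series, or a result of any other shape.
agg() accepts one function, a list of functions, or a dict of them, and is intended for functions that reduce their input to a single value.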

For example, using apply() for the same task:

df[['Sales', 'Profit']].apply(lambda x: x.sum())

This sums both columns, but to get several statistics at once you would have to assemble them yourself inside the function, whereas agg() accepts a list of functions directly.
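To make the contrast concrete, here is a small sketch under the same sample data. The apply() call uses a function that does not reduce at all (it returns each value as a share of its column total), while the agg() call produces a summary table from a list of function names.

```python
import pandas as pd

df = pd.DataFrame({
    "Sales": [100, 200, 150, 250, 300, 400],
    "Profit": [20, 50, 30, 80, 90, 120],
})

# apply(): the function may return a full column, so the shape is preserved
shares = df.apply(lambda col: col / col.sum())

# agg(): each function reduces a column to one value, giving a summary table
summary = df.agg(["sum", "mean"])

print(shares)
print(summary)
```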

Performance Considerations

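Using agg() is mainly a convenience rather than a speed-up: when given a list of functions, pandas still evaluates each one in turn, so the cost is comparable to calling sum(), mean(), and max() separately. What matters more is how the functions are specified. String names such as 'sum' and 'mean' dispatch to pandas' optimized built-in implementations, while custom Python functions and lambdas are generally slower, especially with groupby(), where they are called once per group.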
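Timings depend on the pandas version and the size of the data, so it is worth measuring on your own workload. The sketch below uses the standard-library timeit module to compare one agg() call against three separate method calls. The helper names are hypothetical, and no particular outcome is assumed.

```python
import timeit

import pandas as pd

sales = pd.Series([100, 200, 150, 250, 300, 400], name="Sales")

def with_agg():
    # One call that computes all three statistics
    return sales.agg(["sum", "mean", "max"])

def with_separate_calls():
    # Three separate method calls collected into a Series
    return pd.Series({
        "sum": sales.sum(),
        "mean": sales.mean(),
        "max": sales.max(),
    })

# Run each variant a few hundred times and report the total elapsed time
t_agg = timeit.timeit(with_agg, number=200)
t_separate = timeit.timeit(with_separate_calls, number=200)
print(f"agg(): {t_agg:.4f}s, separate calls: {t_separate:.4f}s")
```

On a six-row Series the numbers mostly reflect call overhead; repeat the measurement with realistically sized data before drawing conclusions.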

Conclusion

The agg() method is a powerful tool in pandas for summarizing and analyzing data. Whether you’re applying it to a single column, multiple columns, or grouped data, it lets you express several aggregation steps in one concise call.
