How pandas pivot_table works in Python? Best example

How pandas pivot_table works in Python? Best example

When working with data in Python, especially when using the pandas library, one of the most powerful tools for data transformation and summarization is the pivot_table() function. If you’ve ever used pivot tables in Excel, you’ll find this function operates similarly, helping you reshape data into meaningful summaries. In this article, I’ll break down how pandas.pivot_table() works, explain its parameters, and provide a hands-on example.

Understanding pandas.pivot_table()

The pandas.pivot_table() function is used to summarize data by aggregating values across multiple categories. It allows us to transform long-format data into a more readable and structured format. The key benefits include:

  • Summarizing large datasets efficiently.
  • Aggregating data based on one or more columns.
  • Customizing the output using different aggregation functions.

Basic Syntax

Here’s the general syntax for pandas.pivot_table():

import pandas as pd

pd.pivot_table(data, 
               values=None, 
               index=None, 
               columns=None, 
               aggfunc='mean', 
               fill_value=None, 
               margins=False)

Let’s break down the key parameters:

  • data – The DataFrame containing the data.
  • values – The column(s) whose data should be aggregated.
  • index – The column(s) to group by (rows).
  • columns – The column(s) to pivot (columns in the resulting table).
  • aggfunc – The aggregation function (default is ‘mean’ but can be ‘sum’, ‘count’, etc.).
  • fill_value – Replaces NaN values with a specific value.
  • margins – Adds row and column totals when set to True.

Practical Example

To fully understand how pandas pivot_table works, let’s go through a practical example.

Step 1: Prepare the Data

Let’s say we have a dataset of sales transactions:

import pandas as pd

data = {'Region': ['East', 'West', 'East', 'West', 'East', 'West'],
        'Product': ['A', 'A', 'B', 'B', 'A', 'B'],
        'Sales': [200, 150, 300, 250, 100, 400]}

df = pd.DataFrame(data)
print(df)

This will create the following DataFrame:

Region Product Sales
East A 200
West A 150
East B 300
West B 250
East A 100
West B 400

Step 2: Creating a Pivot Table

Now, let’s generate a pivot table to see the total sales per region and product:

pivot = pd.pivot_table(df, 
                       values='Sales', 
                       index='Region', 
                       columns='Product', 
                       aggfunc='sum')

print(pivot)

The output will look like this:

Product A B
East 300 300
West 150 650

Advanced Features

Using Multiple Aggregation Functions

We can use multiple aggregation functions simultaneously:

pivot = pd.pivot_table(df, 
                       values='Sales', 
                       index='Region', 
                       columns='Product', 
                       aggfunc=['sum', 'mean'])

print(pivot)

Handling Missing Data

If the dataset has missing values, we can replace them using fill_value:

pivot = pd.pivot_table(df, 
                       values='Sales', 
                       index='Region', 
                       columns='Product', 
                       aggfunc='sum', 
                       fill_value=0)

print(pivot)

Adding Row and Column Totals

To include total values, we enable the margins parameter:

pivot = pd.pivot_table(df, 
                       values='Sales', 
                       index='Region', 
                       columns='Product', 
                       aggfunc='sum', 
                       margins=True)

print(pivot)

The output will now include a row and column for the grand totals:

Product A B All
East 300 300 600
West 150 650 800
All 450 950 1400

Conclusion

Now you know exactly how pandas.pivot_table() works in Python! It’s a powerful feature that allows you to aggregate and summarize data efficiently. Whether you need to compute sums, averages, or other aggregations, pivot tables are a game-changer for data analysis.

Try out these examples on your own dataset, tweak the parameters, and experiment with different aggregation functions to fully grasp the power of pivot tables!

 

Other interesting article:

How pandas join works in Python? Best example