How pandas diff works in Python? Best example

When working with time series or sequential data in Python, one of the most common tasks is calculating the difference between consecutive values. This is where pandas' diff() method (available as DataFrame.diff() and Series.diff()) comes in handy. In this article, I’ll walk you through how diff() works, using a simple yet effective example.

Understanding diff() in pandas

The diff() method in pandas calculates the difference between consecutive elements in a Series or DataFrame. By default, it subtracts the previous value from the current one, which makes it easy to see how values change from one row to the next.
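With the default periods=1, diff() gives the same result as subtracting a copy of the data shifted down by one position. The sketch below checks this on a small, hypothetical Series `s`; it illustrates the idea and is not necessarily how pandas computes diff() internally.

```python
import pandas as pd

# Hypothetical sequence of readings
s = pd.Series([3, 7, 12, 20])

# diff() subtracts the previous element from the current one
via_diff = s.diff()

# The same computation written out with shift(): current minus previous
via_shift = s - s.shift(1)

print(via_diff.tolist())           # [nan, 4.0, 5.0, 8.0]
print(via_diff.equals(via_shift))  # True
```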

Here’s the basic syntax:

DataFrame.diff(periods=1, axis=0)
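  • periods: The number of positions to shift before taking the difference. Default is 1; negative values are also allowed.
  • axis: Whether to take differences between rows (axis=0, the default: each row minus the row above it) or between columns (axis=1: each column minus the column to its left).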
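To see what a negative periods value does, here is a sketch on a hypothetical Series. With periods=-1, each element is compared with the next one instead of the previous one, so the NaN ends up at the end. Note that Series.diff() takes only periods, since a Series has a single axis.

```python
import pandas as pd

# Hypothetical values with uneven steps
s = pd.Series([10, 20, 35, 55])

# Default periods=1: current minus previous, NaN at the start
print(s.diff().tolist())            # [nan, 10.0, 15.0, 20.0]

# periods=-1: current minus next, NaN at the end
print(s.diff(periods=-1).tolist())  # [-10.0, -15.0, -20.0, nan]
```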

Example: How diff() Works

Let’s start by creating a simple DataFrame to see diff() in action:

import pandas as pd

data = {'A': [10, 20, 30, 40, 50], 
        'B': [5, 15, 25, 35, 45]}
df = pd.DataFrame(data)

print(df.diff())

This will produce the following output:

      A     B
0   NaN   NaN
1  10.0  10.0
2  10.0  10.0
3  10.0  10.0
4  10.0  10.0

As you can see, the first row contains NaN values because there’s no previous value to calculate the difference from. Every subsequent row shows the difference between the current and previous values.
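Because NaN is a floating-point value, diff() returns float64 columns even when the input holds integers, which is why the output shows 10.0 rather than 10. A short check on the same DataFrame, plus dropna() for the common case where the leading NaN row is not wanted:

```python
import pandas as pd

df = pd.DataFrame({'A': [10, 20, 30, 40, 50],
                   'B': [5, 15, 25, 35, 45]})
result = df.diff()

# The input columns hold integers ...
print(pd.api.types.is_integer_dtype(df['A']))  # True
# ... but diff() returns float64, since NaN cannot be stored as an integer
print(result['A'].dtype)                       # float64

# Drop the all-NaN first row if only the actual differences are needed
print(result.dropna())
```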

Using periods in diff()

The periods parameter allows us to compute the difference between non-adjacent rows. Let me show you an example:

print(df.diff(periods=2))

The result will be:

      A     B
0   NaN   NaN
1   NaN   NaN
2  20.0  20.0
3  20.0  20.0
4  20.0  20.0

Now, instead of comparing consecutive rows, each row is compared with the row two positions earlier. The first two rows have no such row, so they are NaN.
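One way to think about periods=2: each two-step difference is the sum of two consecutive one-step differences, since x[t] - x[t-2] = (x[t] - x[t-1]) + (x[t-1] - x[t-2]). The sketch uses a hypothetical Series with uneven steps, because evenly spaced data makes every difference look the same:

```python
import pandas as pd

# Hypothetical readings with uneven steps
s = pd.Series([2, 5, 11, 12, 20])

# Each value minus the value two positions earlier
two_step = s.diff(periods=2)
print(two_step.tolist())  # [nan, nan, 9.0, 7.0, 9.0]

# Sum of the current and previous one-step differences gives the same result
one_step = s.diff()
print(two_step.equals(one_step + one_step.shift(1)))  # True
```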

Using axis=1 for Column-Wise Differences

If we want to calculate the difference between columns instead of rows, we use axis=1:

print(df.diff(axis=1))

The output:

    A    B
0 NaN -5.0
1 NaN -5.0
2 NaN -5.0
3 NaN -5.0
4 NaN -5.0

Each column now holds its value minus the value in the column to its left, so column B shows B - A = -5 in every row. Column A has no column to its left, so it is NaN.
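With more than two columns, axis=1 applies the same rule to every adjacent pair of columns. A sketch with a hypothetical three-column DataFrame `df3`:

```python
import pandas as pd

# Hypothetical three-column frame with uneven values
df3 = pd.DataFrame({'A': [1, 10], 'B': [4, 30], 'C': [9, 25]})

# Each column minus the column immediately to its left; 'A' has no left neighbour
wide = df3.diff(axis=1)
print(wide)

# Column C of the result matches C - B computed by hand
print(wide['C'].tolist())  # [5.0, -5.0]
```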

Real-World Use Cases of diff()

Here are some common use cases where diff() proves to be invaluable:

  1. Stock Market Analysis: Calculate daily stock price changes.
  2. Sensor Data Processing: Detect changes between consecutive sensor readings.
  3. Website Traffic Analysis: Measure the difference in visitors between days.
  4. Financial Data: Track period-over-period changes in revenue (diff() gives absolute changes; pct_change() gives percentage growth).
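As a sketch of the stock market case, the snippet below computes day-over-day changes for a hypothetical price series. The prices and dates are made up for illustration; with real data you would load them from your own source.

```python
import pandas as pd

# Hypothetical closing prices for five consecutive days (illustrative numbers only)
prices = pd.Series(
    [101.5, 103.0, 102.25, 104.0, 104.5],
    index=pd.date_range("2024-01-01", periods=5, freq="D"),
    name="close",
)

# Day-over-day change: each close minus the previous day's close
daily_change = prices.diff()
print(daily_change)

# Date with the largest increase (idxmax skips the leading NaN by default)
print(daily_change.idxmax())
```

Because the Series has a DatetimeIndex, the differences stay aligned with their dates, so idxmax() reports the day of the biggest rise rather than a bare position.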

Summary

If you’ve been wondering how pandas diff works in Python, I hope this article clarifies it for you. The diff() method provides a simple yet powerful way to compute differences in both tabular and time series data. The periods parameter controls how far back (or, with a negative value, how far forward) each comparison reaches, and the axis parameter controls whether rows or columns are compared.
