How pandas shift works in Python? Best example


When working with data in Python, the pandas library provides many useful functions for manipulating and analyzing datasets. One such function is shift(), which allows us to shift values in a DataFrame or Series vertically or horizontally. This is particularly useful for time series data, calculating differences between consecutive rows, or aligning data for analysis.

Understanding pandas.shift()

shift() is a method available on both DataFrame and Series. It moves values along an axis (up or down for rows, left or right for columns) while leaving the index unchanged. By default, it shifts values down by one row and fills the vacated top row with NaN. Its most common use is creating lagged columns for time series analysis, but it applies anywhere a row needs to be compared with its neighbors.

The basic syntax of shift() is:

DataFrame.shift(periods=1, freq=None, axis=0, fill_value=None)
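  • periods – The number of positions to shift (default: 1). Positive values shift down (or right); negative values shift up (or left).
  • axis – Whether to shift along rows (axis=0) or columns (axis=1).
  • freq – For a datetime-like index, an offset such as ‘D’ for days. When it is given, the index is shifted by that offset and the values stay in place.
  • fill_value – The value used to fill the newly created gaps. If omitted, pandas uses a missing-value marker (NaN for numeric data).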
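
The axis parameter is easiest to see on a small frame with several columns. The sketch below uses a made-up DataFrame (the column names q1, q2 and q3 are illustrative and do not appear elsewhere in this article) and shifts it one position to the right:

```python
import pandas as pd

# Hypothetical figures; names and numbers are for illustration only.
wide = pd.DataFrame({
    "q1": [1.0, 2.0],
    "q2": [3.0, 4.0],
    "q3": [5.0, 6.0],
})

# axis=1 shifts values across columns: each column takes the values
# of the column to its left, and the first column is filled with NaN.
shifted_right = wide.shift(periods=1, axis=1)
print(shifted_right)
```

The column labels stay where they are; only the values move, which mirrors how the row index stays fixed when shifting along axis=0.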

Basic Example of pandas.shift()

Let’s see how shift() works using a simple DataFrame:

import pandas as pd

data = {'A': [10, 20, 30, 40, 50]}
df = pd.DataFrame(data)

df['Shifted'] = df['A'].shift(1)

print(df)

The output will be:

A    Shifted
10   NaN
20   10.0
30   20.0
40   30.0
50   40.0

Here, each row in the “Shifted” column contains the previous row’s value from column “A”. The first row has NaN because there is no previous value.

Using Negative Shifts

We can also use a negative shift to move values upward:

df['Shifted_Up'] = df['A'].shift(-1)
print(df)

The result:

A    Shifted  Shifted_Up
10   NaN      20.0
20   10.0     30.0
30   20.0     40.0
40   30.0     50.0
50   40.0     NaN

Since we used -1 as the periods parameter, we shifted the values up instead of down.
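
A negative shift is handy when each row needs to look ahead. As a small sketch (the column name Next_Change is just an illustrative choice), this computes how much the value changes going into the next row:

```python
import pandas as pd

df = pd.DataFrame({"A": [10, 20, 30, 40, 50]})

# shift(-1) pulls the next row's value up, so subtracting A gives
# the change from the current row to the following one.
df["Next_Change"] = df["A"].shift(-1) - df["A"]
print(df)
```

The last row ends up as NaN because there is no following row to compare against.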

Setting a Custom Fill Value

By default, pandas fills the new empty cells with NaN. However, we can specify a fill_value:

df['Shifted_Filled'] = df['A'].shift(1, fill_value=0)
print(df)

The missing values will be replaced with 0 instead of NaN:

A    Shifted_Filled
10   0
20   10
30   20
40   30
50   40
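
One side effect worth checking is the column's dtype. NaN is a floating-point value, so an integer column becomes float after a plain shift, while an integer fill_value lets it stay an integer. A quick check, assuming the default NumPy-backed integer dtype:

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40, 50])

plain = s.shift(1)                 # gap filled with NaN
filled = s.shift(1, fill_value=0)  # gap filled with 0

print(plain.dtype)   # a float dtype, because NaN cannot be stored as an integer
print(filled.dtype)  # still an integer dtype
```

This explains why the earlier outputs show 10.0 and 20.0 in the shifted columns while Shifted_Filled shows 10 and 20.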

Using shift() with Time Series Data

When working with time series data, we might want to shift by a span of time instead of a fixed number of rows. The freq parameter does this when the data has a datetime index.

date_rng = pd.date_range(start='2024-01-01', periods=5, freq='D')
df_time = pd.DataFrame({'Date': date_rng, 'Value': [100, 200, 300, 400, 500]})
df_time.set_index('Date', inplace=True)

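shifted = df_time['Value'].shift(periods=1, freq='D')
print(shifted)

With freq set, pandas moves the index forward by one day and leaves the values where they are, so the result runs from 2024-01-02 to 2024-01-06 and contains no NaN. Print the shifted Series directly rather than assigning it back as a column of df_time: assignment realigns the result on the original dates, which produces the same column as a plain shift(1) and drops the value that moved to 2024-01-06.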
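
To make the difference between the two modes concrete, this sketch shifts the same series once by rows and once by time and prints both. The variable names are illustrative:

```python
import pandas as pd

dates = pd.date_range(start="2024-01-01", periods=5, freq="D")
values = pd.Series([100, 200, 300, 400, 500], index=dates)

by_rows = values.shift(1)            # values move down, index stays put
by_time = values.shift(1, freq="D")  # index moves forward, values stay put

print(by_rows)
print(by_time)
```

The row-based shift keeps the original dates and introduces a NaN at the top, while the time-based shift keeps all five values and relabels them with dates one day later.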

Practical Applications of pandas.shift()

The shift() function is helpful in various scenarios, such as:

  1. Calculating differences between consecutive rows: df['Difference'] = df['A'] - df['A'].shift(1)
  2. Aligning data for feature engineering: Useful in machine learning models to create lagged features.
  3. Detecting trends in time-series data: Compare past and present values.
  4. Controlling the gaps a shift creates: Use fill_value to put a fixed number in the vacated rows instead of NaN.
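
The first three uses can be sketched together on the small DataFrame from the basic example. The column names A_lag1, A_lag2 and Rising are illustrative choices, not pandas conventions:

```python
import pandas as pd

df = pd.DataFrame({"A": [10, 20, 30, 40, 50]})

# 1. Row-to-row difference; gives the same result as df["A"].diff().
df["Difference"] = df["A"] - df["A"].shift(1)

# 2. Lagged features: the value one and two rows back.
for lag in (1, 2):
    df[f"A_lag{lag}"] = df["A"].shift(lag)

# 3. Simple trend flag: is the current value higher than the previous one?
df["Rising"] = df["A"] > df["A"].shift(1)

print(df)
```

The first row of Rising is False because a comparison against NaN evaluates to False, so it is worth deciding how to treat that row before using such a flag.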

Conclusion

Understanding how shift() works in pandas helps when working with time series data, creating lagged variables, or aligning datasets. The method covers several needs with one call: shifting rows up or down, moving a time-based index by a given frequency, and filling the resulting gaps with a custom value. It is easy to use and shows up in many data analysis workflows.
