How pandas resample works in Python? Best example


If you work with time series data in Python, chances are you’ve come across the pandas library. One of its most useful features is the resample() method, available on Series and DataFrame objects, which lets you change the frequency of your time series data in a single call. In this guide, I’ll break down how resample() works, when to use it, and provide some practical examples along the way.

What is pandas.resample()?

The resample() method is specifically designed for time series data. It helps in changing the frequency of observations by either upsampling (increasing frequency) or downsampling (decreasing frequency). This method is particularly useful when dealing with datasets that have irregular time intervals or when aggregating data.
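To make the two directions concrete, here is a minimal sketch on a made-up six-point series (the values and the 12-hour spacing are assumptions chosen only so the result is easy to check by hand):

```python
import pandas as pd

# Six observations, one every 12 hours, starting at midnight on 2024-01-01.
idx = pd.date_range("2024-01-01", periods=6, freq="12h")
s = pd.Series([1, 2, 3, 4, 5, 6], index=idx)

# Downsampling: fewer, wider buckets (the two readings of each day are summed).
daily = s.resample("D").sum()

# Upsampling: more, narrower buckets; asfreq() leaves the new slots as NaN.
six_hourly = s.resample("6h").asfreq()

print(daily)
print(six_hourly)
```

Downsampling always needs an aggregation (here sum()), while upsampling needs a decision about how to fill the rows that did not exist before.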

Basic Syntax of pandas.resample()

The syntax for using resample() is straightforward:

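DataFrame.resample(rule, closed=None, label=None, on=None, level=None, origin='start_day', offset=None)

This is an abridged signature. closed and label default to None, which resolves to 'left' for most frequencies and to 'right' for week-, month-, quarter-, and year-end frequencies. Older releases also accepted axis, convention, kind, loffset, and base; these have since been deprecated or removed, so check the documentation for your installed version.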

While there are several parameters available, you will primarily work with:

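  • rule: Defines the new frequency (e.g., 'D' for daily, 'M' for monthly).
  • on: Specifies the column containing datetime values (useful when the datetimes are not in the index).
  • Aggregation or fill method: this is not a parameter of resample() itself. resample() returns a Resampler object, and you chain a method such as sum(), mean(), max(), or ffill() onto it to decide how each bucket is aggregated or filled.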
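Two of the other parameters in the signature, closed and label, decide which edge of each bucket is inclusive and which edge gives the bucket its name. The sketch below is modeled on the kind of example the pandas documentation uses; the nine-point minute series is made up for illustration:

```python
import pandas as pd

# Nine readings, one per minute, with values 0..8.
idx = pd.date_range("2000-01-01", periods=9, freq="min")
series = pd.Series(range(9), index=idx)

# Default for minute frequencies: buckets are closed and labeled on the left,
# so the 00:00 bucket covers 00:00, 00:01 and 00:02.
left = series.resample("3min").sum()

# Closed and labeled on the right: the 00:03 bucket covers 00:01, 00:02 and 00:03.
right = series.resample("3min", closed="right", label="right").sum()

print(left)
print(right)
```

With the right-closed variant the very first reading falls into a bucket of its own, which is why that result has one more row.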

Common Resampling Frequencies

When resampling data, you can specify various frequencies using predefined string aliases. Here are some commonly used ones:

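Alias   pandas 2.2+ spelling   Description
'D'     'D'                    Daily frequency
'W'     'W'                    Weekly frequency
'M'     'ME'                   Month-end frequency
'Q'     'QE'                   Quarter-end frequency
'A'     'YE'                   Year-end frequency
'H'     'h'                    Hourly frequency
'T'     'min'                  Minute frequency

The aliases in the first column are the long-standing ones. Starting with pandas 2.2 they emit a deprecation warning where the second column differs, so prefer the newer spelling if your pandas version supports it.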
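A rule can also carry a multiplier, such as '15min' or '6h'. The following sketch applies a few rules to the same made-up series of one-minute readings and counts the resulting buckets. It uses the spellings 'h' and 'min' rather than 'H' and 'T', since pandas 2.2 and later warn about the uppercase forms:

```python
import numpy as np
import pandas as pd

# Two days of one-minute readings (2880 rows), all equal to 1.
idx = pd.date_range("2024-01-01", periods=2 * 24 * 60, freq="min")
s = pd.Series(np.ones(len(idx), dtype=int), index=idx)

# Number of buckets produced by each rule.
rows_per_rule = {
    rule: len(s.resample(rule).sum())
    for rule in ["15min", "h", "6h", "D"]
}
print(rows_per_rule)
```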

Downsampling: Reducing Frequency

Downsampling is when we reduce the frequency of data, such as converting hourly data into daily aggregates.

Here’s an example where we take hourly data and resample it to daily sums:

import pandas as pd
import numpy as np

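# Creating hourly time series data ('h' is the hourly alias; the older 'H' is deprecated since pandas 2.2)
# Note: end is inclusive, so 2024-01-07 contributes only its 00:00 reading
date_rng = pd.date_range(start='2024-01-01', end='2024-01-07', freq='h')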
df = pd.DataFrame({'date': date_rng, 'value': np.random.randint(1, 100, size=len(date_rng))})

# Setting the date column as index
df.set_index('date', inplace=True)

# Resampling to daily frequency using sum
df_daily = df.resample('D').sum()
print(df_daily.head())

By specifying 'D', we tell pandas to group the rows into calendar-day buckets, and sum() then adds up the hourly values that fall into each bucket.
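One thing worth checking after downsampling is how many observations ended up in each bucket. The sketch below rebuilds the same date range as the example above, with a constant value of 1 instead of random numbers (an assumption made so the counts are predictable), and uses count() to reveal that the last day is only partially covered:

```python
import pandas as pd

# Same range as the example: the end point is inclusive.
date_rng = pd.date_range(start="2024-01-01", end="2024-01-07", freq="h")
df = pd.DataFrame({"value": 1}, index=date_rng)

# Observations per daily bucket.
counts = df.resample("D")["value"].count()
print(counts)
```

Six days contain 24 readings each, while 2024-01-07 contains a single reading at midnight, so its daily sum is not comparable to the others.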

Upsampling: Increasing Frequency

Upsampling is when we increase the frequency of data by filling in missing intervals. This often requires interpolation or forward-filling techniques.

Here’s an example converting daily data into hourly data:

# Upsampling to hourly frequency with forward fill ('h' replaces the deprecated 'H')
df_hourly = df_daily.resample('h').ffill()
print(df_hourly.head(10))

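Using ffill(), each new hourly row takes the most recent available value. Keep in mind that df_daily holds daily totals, so every hour of a given day repeats that day's total rather than an actual hourly figure.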
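Forward fill is only one option. As a rough comparison, the sketch below upsamples a made-up two-point daily series to 12-hour steps and applies four fill strategies side by side:

```python
import pandas as pd

# Two daily readings (illustrative values).
daily = pd.Series(
    [10.0, 20.0],
    index=pd.date_range("2024-01-01", periods=2, freq="D"),
)

upsampler = daily.resample("12h")
filled = pd.DataFrame({
    "asfreq": upsampler.asfreq(),            # new slots stay NaN
    "ffill": upsampler.ffill(),              # carry the last value forward
    "bfill": upsampler.bfill(),              # pull the next value backward
    "interpolate": upsampler.interpolate(),  # linear interpolation by default
})
print(filled)
```

Which one is appropriate depends on the data: forward fill suits values that hold until the next update, while interpolation suits quantities that change gradually.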

Applying Aggregation Methods

The real power of resample() comes from the ability to apply different aggregation methods. Some common ones include:

  • sum() – Summing values within each resampled period
  • mean() – Taking the average of values
  • max() – Finding the maximum value
  • min() – Finding the minimum value
  • count() – Counting the number of occurrences

Example using monthly mean:

df_monthly = df.resample('M').mean()
print(df_monthly.head())

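By changing the frequency to 'M' (spelled 'ME' from pandas 2.2 onward), we compute the mean value for each month. Because the sample data covers a single week in January 2024, the result has just one row.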
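Several of the aggregations listed above can also be computed in one pass with agg(). This is a small sketch on made-up hourly data (the values 0 to 47 are an assumption chosen so the results are easy to verify):

```python
import pandas as pd

# Two days of hourly readings with values 0..47.
idx = pd.date_range("2024-01-01", periods=48, freq="h")
df = pd.DataFrame({"value": range(48)}, index=idx)

# One row per day, one column per aggregation.
summary = df["value"].resample("D").agg(["sum", "mean", "min", "max", "count"])
print(summary)
```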

Handling Non-DateTime Index

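If your DataFrame doesn’t have a DateTime index, you must specify which column contains dates using the on parameter. In the running example the date column was already moved into the index, so we first turn it back into a regular column:

df_flat = df.reset_index()  # 'date' becomes a regular column again
df_resampled = df_flat.resample('D', on='date').sum()

If the index is not a DatetimeIndex (or a TimedeltaIndex or PeriodIndex) and on is not specified, resample() raises a TypeError.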
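Both situations can be seen in a short sketch. The DataFrame here is made up for illustration and keeps its dates in an ordinary column:

```python
import pandas as pd

# Dates live in a regular column; the index is a plain RangeIndex.
df = pd.DataFrame({
    "date": pd.date_range("2024-01-01", periods=48, freq="h"),
    "value": 1,
})

# Without on=, pandas has no datetime-like index to work with.
try:
    df.resample("D").sum()
    error_name = None
except TypeError as exc:
    error_name = type(exc).__name__
    print(f"{error_name}: {exc}")

# With on=, the named column supplies the timestamps.
daily = df.resample("D", on="date")["value"].sum()
print(daily)
```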

Combining Multiple Resampling Operations

Sometimes, you might need to apply multiple resampling operations sequentially. For example, after downsampling the hourly data to daily sums (df_daily above), you can resample those daily totals to weekly buckets and take the mean:

df_weekly = df_daily.resample('W').mean()
print(df_weekly.head())

This approach is useful when preparing data for machine learning models or summarizing trends.
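The order of operations matters when chaining. The sketch below, on made-up data where every hourly reading is 1, contrasts a two-step resample (hourly to daily sums, then weekly mean of those sums) with a direct weekly mean of the hourly values:

```python
import pandas as pd

# Two full weeks of hourly readings, Monday 2024-01-01 through Sunday 2024-01-14.
idx = pd.date_range("2024-01-01", periods=14 * 24, freq="h")
hourly = pd.Series(1.0, index=idx)

# Step 1: daily totals. Step 2: average daily total per week.
daily_totals = hourly.resample("D").sum()
weekly_avg_daily_total = daily_totals.resample("W").mean()

# A single step answers a different question: the average hourly reading per week.
weekly_avg_hourly = hourly.resample("W").mean()

print(weekly_avg_daily_total)
print(weekly_avg_hourly)
```

The chained version reports 24.0 per week (the typical daily total), while the direct version reports 1.0 (the typical hourly value).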

Conclusion

The resample() method is a versatile tool for working with time series data. Whether you need to downsample, upsample, or apply aggregations, it gives you one consistent interface for manipulating time-indexed data.

Now that you know how resample() works in pandas, experiment with it on your own datasets to get a feel for it. Hope this guide helped clarify any doubts!
