
When working with data in Python, one of the most common operations is calculating cumulative sums. Whether you’re analyzing financial data, tracking running totals, or performing time-series analysis, the pandas cumsum() method is an essential tool. In this article, I’ll break down exactly how cumsum() works and walk through examples that illustrate its use.
What is pandas.cumsum()?
Despite the shorthand pandas.cumsum(), there is no top-level function by that name: cumsum() is a method of Series and DataFrame objects, called as Series.cumsum() or DataFrame.cumsum(). It computes the cumulative sum of elements along a specified axis and returns an object of the same shape as its input.
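To make "cumulative sum" concrete, here is a minimal sketch that compares cumsum() with a hand-written running total. The variable names (values, manual, result) are illustrative only and do not come from pandas.

```python
import pandas as pd

values = pd.Series([1, 2, 3, 4, 5])

# Build the running total by hand: each entry is the sum of
# everything seen so far, including the current element.
running = []
total = 0
for v in values:
    total += v
    running.append(total)

manual = pd.Series(running, index=values.index)
result = values.cumsum()

# Both approaches should yield the same running totals.
print(result.tolist())
print(manual.tolist())
```

The loop is only there to show what the method computes; in practice cumsum() is the idiomatic choice.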
How to Use pandas.cumsum()
Using cumsum() is straightforward. Let’s start with a basic example on a pandas Series:
import pandas as pd
# Creating a sample Series
data = pd.Series([1, 2, 3, 4, 5])
# Applying cumsum()
cumulative_sum = data.cumsum()
print(cumulative_sum)
Output:
0     1
1     3
2     6
3    10
4    15
dtype: int64
How cumsum() Works with DataFrames
cumsum() works just as well on DataFrames. By default (axis=0) it accumulates down the rows, producing a running total within each column; pass axis=1 to accumulate across the columns of each row instead.
import pandas as pd
# Creating a sample DataFrame
df = pd.DataFrame({
'A': [1, 2, 3, 4, 5],
'B': [10, 20, 30, 40, 50]
})
# Applying cumsum() down each column (default axis=0)
df_cumsum = df.cumsum()
print(df_cumsum)
Output:
    A    B
0   1   10
1   3   30
2   6   60
3  10  100
4  15  150
Handling Missing (NaN) Values
If your data contains NaN values, cumsum() skips them by default (skipna=True): each NaN position stays NaN in the result, while the running total carries on with the next non-null value:
import numpy as np
df_nan = pd.DataFrame({
'A': [1, np.nan, 3, 4, np.nan],
'B': [10, 20, np.nan, 40, 50]
})
print(df_nan.cumsum())
Output:
     A      B
0  1.0   10.0
1  NaN   30.0
2  4.0    NaN
3  8.0   70.0
4  NaN  120.0
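For contrast, the sketch below runs the same kind of data through cumsum() with the default skipna=True and with skipna=False. The Series s and the names default and strict are made up for illustration.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, np.nan, 3, 4])

# Default behavior: the NaN slot stays NaN, the total keeps going.
default = s.cumsum()

# skipna=False: the first NaN poisons every later entry.
strict = s.cumsum(skipna=False)

print(default.tolist())  # [1.0, nan, 4.0, 8.0]
print(strict.tolist())   # [1.0, nan, nan, nan]
```

skipna=False can be useful when a gap in the data should invalidate everything after it rather than be silently skipped.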
Using cumsum() Along Columns
By setting axis=1, we can compute cumulative sums across the columns of each row:
df_col_cumsum = df.cumsum(axis=1)
print(df_col_cumsum)
Output:
   A   B
0  1  11
1  2  22
2  3  33
3  4  44
4  5  55
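One way to sanity-check an axis=1 result is to note that the last column of the cumulative sum should match the plain row totals. The sketch below assumes a small hypothetical three-column frame (wide) and is not part of the example above.

```python
import pandas as pd

# Hypothetical three-column frame, used only for this check.
wide = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [10, 20, 30],
    'C': [100, 200, 300],
})

across = wide.cumsum(axis=1)     # running total along each row
row_totals = wide.sum(axis=1)    # plain sum of each row

print(across)
# The final column of the running total equals the row sum.
print(across['C'].tolist() == row_totals.tolist())
```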
Comparison Table
Here’s a quick comparison of different cumsum() behaviors:
Operation | Description | Example |
---|---|---|
df.cumsum() | Cumulative sum down the rows (default, axis=0) | Running total within each column |
df.cumsum(axis=1) | Cumulative sum across the columns | Running total within each row |
df.cumsum(skipna=False) | Does not skip NaN values | Every value from the first NaN onward is NaN |
Best Practices for Using pandas.cumsum()
When using cumsum(), keep these best practices in mind:
- Clean your data first to avoid unexpected NaN handling.
- Use axis=1 when you need a running total across the columns of each row.
- If necessary, use fillna() before applying cumulative summation to ensure continuity.
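The fillna() suggestion can be sketched as follows. Filling with 0 is an assumption that a missing value means "nothing added"; if that does not hold for your data, a different fill strategy may be more appropriate. The names s, filled, and carried are illustrative.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, np.nan, 3, 4, np.nan])

# Treat missing values as zero so the running total has no gaps.
filled = s.fillna(0).cumsum()

# Alternative: keep the default cumsum() and carry the last
# known total forward into the NaN slots.
carried = s.cumsum().ffill()

print(filled.tolist())   # [1.0, 1.0, 4.0, 8.0, 8.0]
print(carried.tolist())  # [1.0, 1.0, 4.0, 8.0, 8.0]
```

The two approaches agree here, but they differ when the Series starts with NaN: fillna(0) yields 0 at that position, while ffill() has no earlier total to carry forward and leaves it as NaN.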
Conclusion
Understanding how cumsum() works in pandas can help you efficiently compute running totals in both Series and DataFrames. The method is useful across many data analysis tasks, from financial modeling to time-series aggregation.