How does pandas cumsum work in Python? Best example

When working with data in Python, one of the most common operations is calculating cumulative sums. Whether you’re analyzing financial data, tracking running totals, or performing time-series analysis, the pandas cumsum() method is an essential tool. In this article, I’ll break down how cumsum() works and walk through examples that illustrate its use.

What is pandas cumsum()?

cumsum() is a method of pandas Series and DataFrame objects that computes the cumulative sum of elements along a specified axis: each entry in the result is the sum of all entries up to and including that position. Note that pandas has no top-level pd.cumsum() function; you call the method on the object itself, as in data.cumsum().
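
To make the definition concrete, here is a minimal sketch that builds a running total by hand with the standard library's itertools.accumulate and compares it with the result from cumsum(). The sample values are made up for illustration.

```python
from itertools import accumulate

import pandas as pd

# Made-up sample values, used only for illustration
values = [2, 5, 1, 4]

# Running total computed with the standard library
manual = list(accumulate(values))

# The same running total computed by pandas
via_pandas = pd.Series(values).cumsum().tolist()

print(manual)      # [2, 7, 8, 12]
print(via_pandas)  # [2, 7, 8, 12]
```

Both approaches give the same numbers; the pandas version also keeps the index of the original Series, which is what makes it convenient for labelled data.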

How to Use cumsum()

Using cumsum() is straightforward. Let’s start with a basic example on a pandas Series:

import pandas as pd

# Creating a sample Series
data = pd.Series([1, 2, 3, 4, 5])

# Applying cumsum()
cumulative_sum = data.cumsum()

print(cumulative_sum)

Output:

0     1
1     3
2     6
3    10
4    15
dtype: int64

How pandas cumsum works with a DataFrame

cumsum() works the same way on a DataFrame. By default (axis=0) it accumulates down the rows, producing a running total within each column; pass axis=1 to accumulate across the columns of each row instead.

import pandas as pd

# Creating a sample DataFrame
df = pd.DataFrame({
    'A': [1, 2, 3, 4, 5],
    'B': [10, 20, 30, 40, 50]
})

# Running total down each column (axis=0, the default)
df_cumsum = df.cumsum()

print(df_cumsum)

Output:

    A    B
0   1   10
1   3   30
2   6   60
3  10  100
4  15  150
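
A common pattern for tracking running totals is to store the cumulative sum as a new column next to the original values. The sketch below assumes a hypothetical sales table; the column names and figures are invented for illustration.

```python
import pandas as pd

# Hypothetical daily sales figures, used only for illustration
sales = pd.DataFrame({
    'day': ['Mon', 'Tue', 'Wed', 'Thu'],
    'amount': [100, 250, 75, 300],
})

# Apply cumsum() to the numeric column only and keep the result
# alongside the original values
sales['running_total'] = sales['amount'].cumsum()

print(sales['running_total'].tolist())  # [100, 350, 425, 725]
```

Selecting the numeric column first keeps the text column out of the calculation.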

Handling Missing (NaN) Values

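If your data contains NaN values, cumsum() skips them by default (skipna=True): each NaN position stays NaN in the result, and the running total picks up again at the next non-null value:

import numpy as np
import pandas as pd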

df_nan = pd.DataFrame({
    'A': [1, np.nan, 3, 4, np.nan],
    'B': [10, 20, np.nan, 40, 50]
})

print(df_nan.cumsum())

Output:

     A      B
0  1.0   10.0
1  NaN   30.0
2  4.0    NaN
3  8.0   70.0
4  NaN  120.0

Using cumsum() Along Columns

By setting axis=1, we can accumulate across the columns of each row. Using the df defined earlier:

df_col_cumsum = df.cumsum(axis=1)

print(df_col_cumsum)

Output:

   A   B
0  1  11
1  2  22
2  3  33
3  4  44
4  5  55

Comparison Table

Here’s a quick comparison of different cumsum() behaviors:

Operation                 Description                                          Result
df.cumsum()               Cumulative sum down the rows (axis=0, the default)   Running total within each column
df.cumsum(axis=1)         Cumulative sum across the columns                    Running total within each row
df.cumsum(skipna=False)   NaN values are not skipped                           Every entry from the first NaN onward is NaN
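
The skipna=False row is easiest to understand side by side with the default. The following sketch uses a small made-up Series to show the difference:

```python
import numpy as np
import pandas as pd

# Made-up values with one missing entry
s = pd.Series([1, np.nan, 3, 4])

# Default behavior: the NaN is skipped and the total continues
default_result = s.cumsum()

# skipna=False: the NaN is not skipped, so every later total is NaN too
strict_result = s.cumsum(skipna=False)

print(default_result.tolist())  # [1.0, nan, 4.0, 8.0]
print(strict_result.tolist())   # [1.0, nan, nan, nan]
```

skipna=False can be useful when a missing value should invalidate everything that follows rather than be passed over silently.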

Best Practices for Using cumsum()

When using cumsum(), keep these best practices in mind:

  • Check for NaN values before summing: with the default skipna=True they stay NaN in the output while the running total continues past them.
  • Use axis=1 when you need a running total across the columns of each row.
  • If missing values should count as zero, call fillna(0) before cumsum() so the running total has no gaps.
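
As a minimal sketch of the fillna() advice, the example below treats missing values as zero before accumulating. Whether zero is the right replacement depends on your data, so take that as an assumption made for illustration.

```python
import numpy as np
import pandas as pd

# Made-up values with two missing entries
s = pd.Series([1, np.nan, 3, 4, np.nan])

# Assumption: a missing value contributes nothing to the total
filled_total = s.fillna(0).cumsum()

print(filled_total.tolist())  # [1.0, 1.0, 4.0, 8.0, 8.0]
```

Compared with a plain s.cumsum(), the result has no NaN entries: positions that were missing carry the previous total forward.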

Conclusion

Understanding how cumsum() works can help you efficiently compute running totals in both Series and DataFrames. The method is useful across many data analysis tasks, from financial modeling to time-series aggregation.
