How does pandas itertuples() work in Python? Best example


When working with large datasets in pandas, iterating over rows efficiently can be a challenge. One method pandas provides for this is itertuples(). It iterates over DataFrame rows as named tuples, which is usually much faster than other row-wise iteration methods such as iterrows().

What is itertuples() in pandas?

itertuples() is a DataFrame method that returns an iterator yielding each row of a pandas.DataFrame as a named tuple. Building a named tuple is much cheaper than building a pandas Series, so this method can be considerably more efficient than iterrows(), which converts each row into a Series.
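To make the "named tuple" part concrete, here is a minimal sketch that pulls a single row from the iterator and inspects it. The two-column DataFrame is made up for illustration only.

```python
import pandas as pd

# Small illustrative DataFrame (not taken from a real dataset)
df = pd.DataFrame({"Name": ["Alice", "Bob"], "Age": [25, 30]})

rows = df.itertuples()   # an iterator; rows are produced one at a time
first = next(rows)       # take only the first row

print(type(first).__name__)   # name of the tuple class ("Pandas" by default)
print(first._fields)          # field names: the index first, then the columns
print(first)                  # the whole named tuple
print(first[1], first.Name)   # positional and attribute access reach the same value
```

Because each row is an ordinary tuple subclass, it is read-only: assigning to row.Name raises an error instead of changing the DataFrame.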

Basic Usage of itertuples()

Let’s take a simple example of using itertuples() in a DataFrame:

import pandas as pd

# Creating a sample DataFrame
data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago']
}

df = pd.DataFrame(data)

# Using itertuples to iterate over DataFrame rows
for row in df.itertuples():
    print(row.Name, row.Age, row.City)

Each row is returned as a named tuple whose fields are the column names, so values can be read as attributes. The loop prints:

Alice 25 New York
Bob 30 Los Angeles
Charlie 35 Chicago

Comparing itertuples() with iterrows()

Both itertuples() and iterrows() allow row-wise iteration, but there are key differences:

Feature             itertuples()                   iterrows()
Data structure      Named tuple                    pandas Series
Performance         Faster                         Slower
Memory usage        Lower                          Higher
Column name access  Dot notation (e.g., row.Name)  Dictionary-style (e.g., row['Name'])
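One more difference is easy to miss: iterrows() packs each row into a single Series, so mixed numeric columns are converted to one common dtype, while itertuples() keeps each value as it is. The sketch below illustrates this with made-up column names (qty, price).

```python
import pandas as pd

# Illustrative DataFrame with one integer and one float column
df = pd.DataFrame({"qty": [3, 5], "price": [1.5, 2.25]})

# iterrows() yields (index, Series); the Series needs one dtype for the whole row
_, series_row = next(df.iterrows())

# itertuples() yields a named tuple; each field keeps its own type
tuple_row = next(df.itertuples(index=False))

print(series_row.dtype)            # dtype shared by the whole row
print(type(series_row["qty"]))     # the integer was converted to a float
print(type(tuple_row.qty))         # the integer stays an integer
```

If the exact types matter for later processing, this is another reason to reach for itertuples().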

Why Choose itertuples() over iterrows()?

itertuples() is generally preferred because:

  • It’s significantly faster than iterrows().
  • It consumes less memory.
  • It provides direct attribute access to columns.
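Attribute access has one caveat worth knowing. A column name that is not a valid Python identifier (for example, one containing a space) is replaced by a positional name in the named tuple. A small sketch, using a hypothetical "first name" column:

```python
import pandas as pd

# "first name" contains a space, so it cannot be used as an attribute name
df = pd.DataFrame({"first name": ["Alice"], "Age": [25]})

row = next(df.itertuples())

print(row._fields)   # the invalid name is replaced by its position, e.g. '_1'
print(row._1)        # access through the positional name
print(row[1])        # or simply by position
```

Renaming such columns before iterating (for example to first_name) keeps the dot notation readable.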

However, keep in mind that iterating over a pandas DataFrame row by row is rarely the fastest approach. Vectorized operations, which act on whole columns at once, are usually preferred.
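A rough way to check these claims on your own machine is to time the three approaches on the same task. The sketch below is only an illustration: the DataFrame is random data, the function names are made up, and the timings will differ between machines and pandas versions, so no numbers are shown here.

```python
import timeit

import numpy as np
import pandas as pd

# Random illustrative data: two integer columns with 5,000 rows
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "a": rng.integers(0, 100, 5_000),
    "b": rng.integers(0, 100, 5_000),
})

def with_iterrows():
    # Each row is a Series; values are looked up by column label
    return sum(row["a"] * row["b"] for _, row in df.iterrows())

def with_itertuples():
    # Each row is a named tuple; values are read as attributes
    return sum(row.a * row.b for row in df.itertuples(index=False))

def vectorized():
    # No Python-level loop: multiply whole columns, then sum
    return int((df["a"] * df["b"]).sum())

for fn in (with_iterrows, with_itertuples, vectorized):
    seconds = timeit.timeit(fn, number=3)
    print(f"{fn.__name__:>16}: {seconds:.4f} s for 3 runs")
```

All three functions compute the same total, so the only thing being compared is how the rows are visited.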

Using itertuples() with a Custom Index

By default, itertuples() includes the index as the first element of each named tuple, available as row.Index. If we want to exclude it, we can pass the index=False argument:

for row in df.itertuples(index=False):
    print(row.Name, row.Age, row.City)

With index=False, each tuple contains only the column values, so positional access starts at the first column (row[0] is Name).
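When a DataFrame does have a custom index, the index label of each row is still available as row.Index. The sketch below assumes we use the Name column as the index, and also shows the name=None option, which returns plain tuples instead of named tuples.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 35],
})

# Use the Name column as a custom index (an assumption for this example)
by_name = df.set_index("Name")

for row in by_name.itertuples():
    # row.Index holds the custom index label of the row
    print(row.Index, row.Age)

# name=None yields regular tuples without field names
plain_rows = list(df.itertuples(index=False, name=None))
print(plain_rows)
```

Plain tuples are handy when the rows are passed straight to code that expects simple sequences, such as a database insert.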

Best Example: Applying a Function Using itertuples()

Let’s look at a practical example where we use itertuples() to apply a function:

def categorize_age(age):
    return "Young" if age < 30 else "Adult"

# Creating a new list using itertuples
categories = [(row.Name, categorize_age(row.Age)) for row in df.itertuples(index=False)]

# Converting the list back to a DataFrame
result_df = pd.DataFrame(categories, columns=['Name', 'Category'])

print(result_df)

The output would be:

      Name Category
0    Alice    Young
1      Bob    Adult
2  Charlie    Adult
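For a simple rule like this one, the same result can also be produced without a row loop. The following is a sketch of one vectorized alternative using numpy.where; it is not the only option, and it assumes the same sample data as above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 35],
})

# Keep only the Name column, then add the category column in one step
result_df = df[["Name"]].copy()

# np.where evaluates the condition for the whole Age column at once
result_df["Category"] = np.where(df["Age"] < 30, "Young", "Adult")

print(result_df)
```

itertuples() remains the better fit when the per-row logic is too involved to express as column operations.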

Final Thoughts

Using itertuples() is an efficient way to iterate over rows when working with pandas in Python. It balances readability and performance, making it a solid choice when row-wise iteration is necessary. However, whenever possible, use vectorized operations or built-in pandas functions, since they are typically much faster than any row-by-row loop.
