How does pandas concat work in Python? Best example


When working with pandas in Python, we often need to combine datasets. One of the most useful functions for this purpose is pandas.concat(), which joins pandas objects end to end along either axis: stacking rows on top of each other or placing columns side by side.

Understanding pandas.concat()

The pandas.concat() function helps in combining two or more pandas objects such as Series or DataFrame. Unlike merge() or join(), which focus on merging based on keys, concat() primarily works by stacking data either vertically (along rows) or horizontally (along columns).
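To make the contrast with merge() concrete, here is a minimal sketch. The frames left and right and their column names are hypothetical sample data, not part of the examples that follow.

```python
import pandas as pd

# Hypothetical sample data: the two frames share the 'id' column.
left = pd.DataFrame({'id': [1, 2], 'x': [10, 20]})
right = pd.DataFrame({'id': [2, 3], 'y': [200, 300]})

# merge() matches rows by the values in 'id' (an inner join by default),
# so only id == 2 survives.
merged = pd.merge(left, right, on='id')

# concat() never compares values; it stacks the frames and aligns
# them by column label, filling the gaps with NaN.
stacked = pd.concat([left, right])

print(merged)
print(stacked)
```

Here merge() returns a single row, while concat() returns all four rows with the columns id, x and y.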

Here is a simplified form of the pandas.concat() signature, showing only the most commonly used parameters:

import pandas as pd

pd.concat(objs, axis=0, join='outer', ignore_index=False, keys=None)

Where:

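  • objs: A sequence or mapping of Series or DataFrame objects to concatenate.
  • axis: The axis to concatenate along (0 for rows, 1 for columns). Default is 0.
  • join: How to handle the labels on the other axis (the columns when axis=0, the row index when axis=1). 'outer' (the default) keeps the union of the labels; 'inner' keeps only the labels shared by all objects.
  • ignore_index: If True, discards the existing labels along the concatenation axis and numbers the result 0, 1, ..., n - 1.
  • keys: Labels, one per object, used to build the outermost level of a hierarchical index (MultiIndex) along the concatenation axis.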
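Since objs may be a mapping as well as a sequence, a short sketch of that case may help. The labels 'x' and 'y' are arbitrary names chosen for illustration; with a dict, the dict keys play the role of the keys argument.

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2]})
df2 = pd.DataFrame({'A': [3, 4]})

# Passing a dict: its keys ('x' and 'y', arbitrary labels) become
# the outer level of the resulting index.
by_dict = pd.concat({'x': df1, 'y': df2})

# The equivalent call with a list plus an explicit keys argument.
by_keys = pd.concat([df1, df2], keys=['x', 'y'])

print(by_dict)
print(by_dict.equals(by_keys))
```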

Concatenating DataFrames Vertically

By default, pandas.concat() stacks DataFrames row-wise, meaning along axis=0. Let’s see an example:

import pandas as pd

df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
df2 = pd.DataFrame({'A': [5, 6], 'B': [7, 8]})

result = pd.concat([df1, df2])

print(result)

The output will be:

   A  B
0  1  3
1  2  4
0  5  7
1  6  8

Notice that pandas.concat() preserves the original indexes. If we want to reset them, we simply pass ignore_index=True.
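As a quick sketch of the ignore_index option, here are the same two frames again with the index renumbered:

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
df2 = pd.DataFrame({'A': [5, 6], 'B': [7, 8]})

# ignore_index=True drops the original row labels (0, 1, 0, 1)
# and replaces them with a fresh 0..3 range.
result = pd.concat([df1, df2], ignore_index=True)

print(result)
#    A  B
# 0  1  3
# 1  2  4
# 2  5  7
# 3  6  8
```

This is usually what we want when the original row labels carry no meaning, because duplicate labels make label-based lookups such as result.loc[0] return more than one row.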

Concatenating DataFrames Horizontally

To concatenate along columns, we use axis=1:

df1 = pd.DataFrame({'A': [1, 2]})
df2 = pd.DataFrame({'B': [3, 4]})

result = pd.concat([df1, df2], axis=1)

print(result)

Which outputs:

   A  B
0  1  3
1  2  4

Using Different Indexes

What happens if the DataFrames have different indexes and different columns? Let’s try:

df1 = pd.DataFrame({'A': [1, 2]}, index=['a', 'b'])
df2 = pd.DataFrame({'B': [3, 4]}, index=['c', 'd'])

result = pd.concat([df1, df2])

print(result)

Output:

     A    B
a  1.0  NaN
b  2.0  NaN
c  NaN  3.0
d  NaN  4.0

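When stacking rows, the row labels are simply carried over, and the columns are aligned by name. Since df1 has no column B and df2 has no column A, NaN values appear where data is missing, and the integer columns are converted to floats to hold them.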
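When we stack rows, the join parameter applies to the column labels. The sketch below uses two hypothetical frames that share only column B to compare the default 'outer' behaviour with 'inner':

```python
import pandas as pd

# Hypothetical frames: only column 'B' is present in both.
df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
df2 = pd.DataFrame({'B': [5, 6], 'C': [7, 8]})

# Default join='outer': union of the columns, gaps filled with NaN.
outer = pd.concat([df1, df2])

# join='inner': only the columns found in every frame are kept.
inner = pd.concat([df1, df2], join='inner')

print(outer)
print(inner)
```

With 'outer' the result has the columns A, B and C; with 'inner' only B remains, so no NaN values are introduced.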

Merging with Inner Join

If we want to keep only the index labels that appear in every DataFrame, we use join='inner':

df1 = pd.DataFrame({'A': [1, 2]}, index=['a', 'b'])
df2 = pd.DataFrame({'B': [3, 4]}, index=['b', 'c'])

result = pd.concat([df1, df2], axis=1, join='inner')

print(result)

Output:

   A  B
b  2  3

Only the common index 'b' is retained.
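For comparison, here is a sketch of the same call with the default join='outer', which keeps every index label and fills the gaps with NaN:

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2]}, index=['a', 'b'])
df2 = pd.DataFrame({'B': [3, 4]}, index=['b', 'c'])

# join='outer' is the default: the result holds the union of the
# row labels ('a', 'b', 'c'), and only row 'b' is fully populated.
outer = pd.concat([df1, df2], axis=1)

print(outer)
```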

Creating a MultiIndex with Keys

We can create a hierarchical index using keys:

df1 = pd.DataFrame({'A': [1, 2]})
df2 = pd.DataFrame({'A': [3, 4]})

result = pd.concat([df1, df2], keys=['First', 'Second'])

print(result)

Output:

          A
First  0  1
       1  2
Second 0  3
       1  4

The MultiIndex helps differentiate between original DataFrames.
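A short sketch of how the keys can be used afterwards. The level names 'source' and 'row' are arbitrary labels chosen for illustration; they are passed through the names parameter of pd.concat().

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2]})
df2 = pd.DataFrame({'A': [3, 4]})

# names labels the two index levels ('source' and 'row' are
# arbitrary names chosen for this example).
result = pd.concat([df1, df2], keys=['First', 'Second'],
                   names=['source', 'row'])

# Select every row that came from the first DataFrame.
first_part = result.loc['First']

# Select a single cell by (key, original row label) and column.
value = result.loc[('Second', 1), 'A']

print(first_part)
print(value)
```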

Using pandas.concat() with Series

We can also concatenate Series objects:

s1 = pd.Series([1, 2, 3])
s2 = pd.Series([4, 5, 6])

result = pd.concat([s1, s2])

print(result)

Output:

0    1
1    2
2    3
0    4
1    5
2    6
dtype: int64
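Series can also be placed side by side. In the sketch below, axis=1 turns the two Series into the columns of a DataFrame, and keys supplies the column names ('first' and 'second' are arbitrary labels for this example):

```python
import pandas as pd

s1 = pd.Series([1, 2, 3])
s2 = pd.Series([4, 5, 6])

# Stack end to end and renumber the index 0..5.
stacked = pd.concat([s1, s2], ignore_index=True)

# Place the Series side by side; with axis=1 the keys become
# the column names of the resulting DataFrame.
side_by_side = pd.concat([s1, s2], axis=1, keys=['first', 'second'])

print(stacked)
print(side_by_side)
```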

Final Thoughts

Understanding how pandas.concat() works in Python is crucial for efficient data processing. Whether we need to stack data vertically, merge columns horizontally, or handle different indexing techniques, concat() provides a flexible and powerful solution.

If you work with data in Python, mastering pandas.concat() will certainly improve your workflow and make data manipulation seamless. Try experimenting with different parameters to see how they affect the output!
