How pandas merge works in Python? Best example

How pandas merge works in Python? Best example
“`html

When working with data in Python, one of the most common tasks is combining multiple datasets. Whether you’re dealing with CSV files, database tables, or API responses, pandas.merge() allows you to efficiently merge, join, and integrate data. In this article, I’ll give you the best example of how pandas merge works in Python and cover everything you need to know to use it effectively.

Understanding pandas.merge()

The pandas.merge() function is used to combine two DataFrames based on a common column or index. It operates similarly to SQL joins, offering different types of joins such as inner, outer, left, and right. By specifying the right parameters, you can control how data is merged to fit your use case.

Basic Syntax of pandas.merge()

The basic syntax of pandas.merge() is as follows:

import pandas as pd

merged_df = pd.merge(df1, df2, how='inner', on='common_column')

Here’s what each parameter does:

  • df1, df2: The two DataFrames you want to merge.
  • how: Specifies the type of merge:
    • inner – Returns only matching rows (default).
    • outer – Returns all rows from both DataFrames, filling in missing values with NaN.
    • left – Returns all rows from the first DataFrame and matching rows from the second.
    • right – Returns all rows from the second DataFrame and matching rows from the first.
  • on: Specifies the key column(s) to merge on.

Best Example: Merging Two DataFrames

Let’s dive into an example of how pandas merge works in Python using the best approach.

import pandas as pd

# Creating the first DataFrame
data1 = {'ID': [1, 2, 3, 4], 'Name': ['Alice', 'Bob', 'Charlie', 'David'], 'Age': [25, 30, 35, 40]}
df1 = pd.DataFrame(data1)

# Creating the second DataFrame
data2 = {'ID': [3, 4, 5, 6], 'City': ['New York', 'San Francisco', 'Los Angeles', 'Chicago'], 'Salary': [70000, 80000, 90000, 100000]}
df2 = pd.DataFrame(data2)

# Merging the DataFrames on the 'ID' column
merged_df = pd.merge(df1, df2, on='ID', how='inner')

print(merged_df)

Understanding the Output

The result of this merge would be:

ID Name Age City Salary
3 Charlie 35 New York 70000
4 David 40 San Francisco 80000

Since we performed an inner join, only rows with matching ID values in both DataFrames (3 and 4) appear in the output.

Other Merge Types: Left, Right, and Outer Joins

Here’s how you can perform different types of merges:

1. Left Join

In a left join, all rows from the left DataFrame are retained, while non-matching rows from the right DataFrame are replaced with NaN.

left_join_df = pd.merge(df1, df2, on='ID', how='left')
print(left_join_df)

2. Right Join

In a right join, all rows from the right DataFrame are retained, and non-matching rows from the left DataFrame are replaced with NaN.

right_join_df = pd.merge(df1, df2, on='ID', how='right')
print(right_join_df)

3. Outer Join

An outer join returns all rows from both DataFrames. Missing values are filled with NaN where no match is found.

outer_join_df = pd.merge(df1, df2, on='ID', how='outer')
print(outer_join_df)

Using Multiple Columns to Merge

Sometimes, merging on just one column isn’t enough. We can specify multiple columns for a more precise merge.

merged_df = pd.merge(df1, df2, on=['ID', 'Name'], how='inner')

Handling Column Name Conflicts

When both DataFrames contain columns with the same name (other than the key column), pandas.merge() adds suffixes to avoid conflicts.

merged_df = pd.merge(df1, df2, on='ID', how='inner', suffixes=('_df1', '_df2'))

Conclusion

Understanding how pandas.merge() works in Python is essential for efficient data analysis and manipulation. Whether you need an inner, outer, left, or right join, this function provides a flexible way to combine datasets. With the best example illustrated above, now you can confidently merge DataFrames and manage relational data with ease.

“` Other interesting article: How pandas concat works in Python? Best example