How Does pandas set_index() Work in Python? Best Examples

One of the most useful methods in the Pandas library is set_index(). It changes the index of a DataFrame, which changes how we organize and access the data. Whether we want meaningful row labels, convenient label-based lookups, or simply a cleaner dataset, set_index() is a good tool for the job. In this article, I’ll explain how DataFrame.set_index() works in Python, with examples.

What is `set_index()` in Pandas?

set_index() is a DataFrame method in Pandas that allows us to assign one or more columns as the index of a DataFrame. By default, DataFrames in Pandas have a numerical index starting from 0 (a RangeIndex). However, if we have a column that identifies each row (like an ID or a date), we can use it as the index to make our data more meaningful. set_index() does not require the values to be unique, although unique labels make lookups unambiguous.

Basic Syntax of `set_index()`

Here’s the basic syntax of set_index():


import pandas as pd

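df.set_index(keys, drop=True, append=False, inplace=False, verify_integrity=False)

The key parameters in set_index() are:

keys: The column label (or list of column labels) to be set as the index. Arrays, Series, and Index objects of the same length as the DataFrame are also accepted.
drop: If True (the default), the column used as the index is removed from the DataFrame’s columns.
append: If True, the new index is added as an extra level on the existing index instead of replacing it (default is False).
inplace: If True, the operation modifies the original DataFrame and returns None instead of returning a new one (default is False).
verify_integrity: If True, the new index is checked for duplicates and an error is raised if any are found (default is False).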
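To make the parameters above concrete, here is a minimal sketch that compares drop and append on a small DataFrame. The sample data and the variable names (dropped, kept, appended) are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data, used only to illustrate the parameters.
df = pd.DataFrame({
    'ID': [101, 102, 103],
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Score': [90, 85, 88],
})

# drop=True (default): 'ID' moves into the index and leaves the columns.
dropped = df.set_index('ID')

# drop=False: 'ID' becomes the index and also stays as a regular column.
kept = df.set_index('ID', drop=False)

# append=True: 'ID' is added as a second level next to the existing 0..2 index.
appended = df.set_index('ID', append=True)

print(list(dropped.columns))    # ['Name', 'Score']
print(list(kept.columns))       # ['ID', 'Name', 'Score']
print(appended.index.nlevels)   # 2

# inplace defaults to False, so the original DataFrame is unchanged.
print(list(df.columns))         # ['ID', 'Name', 'Score']
```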

Example 1: Setting a Single Column as Index

Let’s start with a simple DataFrame:


import pandas as pd

data = {
    'ID': [101, 102, 103, 104],
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Score': [90, 85, 88, 92]
}

df = pd.DataFrame(data)
print(df)

The output looks like this:

    ID     Name  Score
0  101    Alice     90
1  102      Bob     85
2  103  Charlie     88
3  104    David     92

Now, let’s use set_index() to make the ID column the new index:


df = df.set_index('ID')
print(df)

Now, our DataFrame looks like this:

        Name  Score
ID
101    Alice     90
102      Bob     85
103  Charlie     88
104    David     92
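With ID as the index, rows can be selected by their ID label using .loc. The following is a small sketch of that idea, rebuilding the same DataFrame so it runs on its own.

```python
import pandas as pd

df = pd.DataFrame({
    'ID': [101, 102, 103, 104],
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Score': [90, 85, 88, 92],
}).set_index('ID')

# Select one row by its index label; the result is a Series.
row = df.loc[102]
print(row['Name'])             # Bob

# Select a single cell by row label and column name.
print(df.loc[103, 'Score'])    # 88

# Select several rows by passing a list of labels.
subset = df.loc[[101, 104]]
print(list(subset['Name']))    # ['Alice', 'David']
```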

Example 2: Keeping the Original Column

By default, set_index() removes the column from the DataFrame. If we want to keep it, we can use drop=False:


df = pd.DataFrame(data)
df = df.set_index('ID', drop=False)
print(df)

Now, the ID column remains part of the DataFrame while still serving as the index.

Example 3: Setting Multiple Columns as Index

We can also set multiple columns as the index:


df = df.set_index(['ID', 'Name'])
print(df)

This creates a hierarchical index (a MultiIndex) where ID and Name together identify each row.
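As a rough illustration of how rows are addressed in a two-level index, the sketch below selects by a full (ID, Name) tuple and by the first level alone. The variable name multi is only for this example.

```python
import pandas as pd

df = pd.DataFrame({
    'ID': [101, 102, 103, 104],
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Score': [90, 85, 88, 92],
})

multi = df.set_index(['ID', 'Name'])

# Both columns are now index levels; only 'Score' remains a column.
print(list(multi.index.names))   # ['ID', 'Name']
print(list(multi.columns))       # ['Score']

# A full (ID, Name) tuple identifies exactly one row.
score = multi.loc[(101, 'Alice'), 'Score']
print(score)                     # 90

# Selecting on the first level alone returns the matching rows,
# indexed by the remaining level ('Name').
bob_rows = multi.loc[102]
print(list(bob_rows.index))      # ['Bob']
```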

Example 4: Resetting the Index

If we ever need to go back to the default index, we can use reset_index():


df = df.reset_index()
print(df)

This will bring back all indexed columns as regular columns.
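reset_index() also accepts a drop argument, which is not covered above. With drop=True, the current index is discarded instead of being restored as a column. A short sketch of the difference, using illustrative variable names:

```python
import pandas as pd

df = pd.DataFrame({
    'ID': [101, 102, 103, 104],
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Score': [90, 85, 88, 92],
})

indexed = df.set_index('ID')

# Default: the index comes back as a regular column.
restored = indexed.reset_index()
print(list(restored.columns))    # ['ID', 'Name', 'Score']

# drop=True: the index is thrown away and replaced by 0, 1, 2, ...
discarded = indexed.reset_index(drop=True)
print(list(discarded.columns))   # ['Name', 'Score']
print(list(discarded.index))     # [0, 1, 2, 3]
```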

Example 5: Using `inplace=True`

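By default, set_index() returns a new DataFrame and leaves the original unchanged. If we want to modify the original DataFrame directly, we can use inplace=True. In that case the method returns None, so the result should not be assigned back to a variable: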


df.set_index('ID', inplace=True)
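A common mistake is to combine inplace=True with an assignment. The sketch below shows what the call returns and what happens to the original DataFrame; the variable name result is only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'ID': [101, 102, 103, 104],
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Score': [90, 85, 88, 92],
})

# With inplace=True the DataFrame is modified and the call returns None.
result = df.set_index('ID', inplace=True)

print(result)               # None
print(df.index.name)        # ID
print(list(df.columns))     # ['Name', 'Score']

# Pitfall: writing `df = df.set_index('ID', inplace=True)` would
# rebind df to None and lose the DataFrame.
```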

When to Use `set_index()`?

Using set_index() makes sense in numerous cases:

When we have a unique identifier column (such as an Order ID or User ID).
When sorting and selecting subsets of data based on index values.
When merging or joining with another DataFrame based on index.
When working with time series data, where a datetime column should be the index.
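Two of the cases above, joining on the index and working with dates, can be sketched as follows. All data and names here (scores, names, readings) are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Joining on the index: both frames use 'ID' as their index.
scores = pd.DataFrame(
    {'ID': [101, 102, 103], 'Score': [90, 85, 88]}
).set_index('ID')
names = pd.DataFrame(
    {'ID': [101, 102, 104], 'Name': ['Alice', 'Bob', 'David']}
).set_index('ID')

# join() aligns rows on the index; 'inner' keeps IDs present in both frames.
joined = scores.join(names, how='inner')
print(list(joined.index))    # [101, 102]

# Time series: a datetime index allows selecting by a partial date string.
readings = pd.DataFrame({
    'Date': pd.to_datetime(['2024-01-01', '2024-01-02', '2024-02-01']),
    'Value': [1, 2, 3],
}).set_index('Date')

january = readings.loc['2024-01']
print(list(january['Value']))   # [1, 2]
```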

Conclusion

Using set_index() in Pandas is a great way to better structure and organize data. Whether we want to work with a single index or create multi-index DataFrames, it provides flexibility and makes label-based access to the data simpler. Now that you understand how DataFrame.set_index() works in Python, you can apply it to your own datasets with confidence!
