How does pandas set_index work in Python? Best example


One of the most useful methods in the Pandas library is set_index(). It lets us change the index of a DataFrame, which changes how we organize and access the data. Whether we want meaningful row labels, lookups by label, or joins that align on the index, set_index() is the tool to reach for. In this article, I’ll explain how DataFrame.set_index() works in Python with examples.

What is set_index() in Pandas?

set_index() is a DataFrame method that assigns one or more columns as the index of the DataFrame. By default, a DataFrame has a numerical index that starts at 0 and counts up by one per row. If we have a column that identifies each row (like an ID, a date, or a category), we can use it as the index to make the data more meaningful. Pandas does not require index values to be unique, but unique values make label-based lookups unambiguous.

Basic Syntax of set_index()

Here’s the basic syntax of set_index():


import pandas as pd

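df.set_index(keys, drop=True, append=False, inplace=False, verify_integrity=False)

The key parameters in set_index() are:

  • keys: The column label (or list of column labels) to be set as the index.
  • drop: If True (the default), the column used as the index is removed from the DataFrame’s columns.
  • append: If True, the column is added to the existing index as an extra level instead of replacing it (default is False).
  • inplace: If True, the operation modifies the original DataFrame and returns None instead of returning a new DataFrame (default is False).
  • verify_integrity: If True, the new index is checked for duplicate values and an error is raised if any are found (default is False).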
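None of the examples in this article use append, so here is a minimal sketch of it. The small DataFrame is hypothetical sample data made up for illustration.

```python
import pandas as pd

# Hypothetical sample data for illustration
df = pd.DataFrame({
    'ID': [101, 102, 103],
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Score': [90, 85, 88],
})

# append=True keeps the existing 0..n-1 index and adds 'ID' as a second level
appended = df.set_index('ID', append=True)
print(appended.index.nlevels)     # 2
print(list(appended.index.names)) # [None, 'ID']

# inplace defaults to False, so the original DataFrame is left untouched
print(list(df.columns))           # ['ID', 'Name', 'Score']
```

The first level has no name because the default index is unnamed.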

Example 1: Setting a Single Column as Index

Let’s start with a simple DataFrame:


import pandas as pd

data = {
    'ID': [101, 102, 103, 104],
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Score': [90, 85, 88, 92]
}

df = pd.DataFrame(data)
print(df)

The output looks like this:

    ID     Name  Score
0  101    Alice     90
1  102      Bob     85
2  103  Charlie     88
3  104    David     92

Now, let’s use set_index() to make the ID column the new index:


df = df.set_index('ID')
print(df)

Now, our DataFrame looks like this:

        Name  Score
ID
101    Alice     90
102      Bob     85
103  Charlie     88
104    David     92
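Once ID is the index, we can look rows up by their ID with .loc. The sketch below rebuilds the same DataFrame so it runs on its own. The variable names row, score, and subset are only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'ID': [101, 102, 103, 104],
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Score': [90, 85, 88, 92],
}).set_index('ID')

# Look up a full row by its ID label
row = df.loc[102]
print(row['Name'])            # Bob

# Look up a single cell by (row label, column name)
score = df.loc[104, 'Score']
print(score)                  # 92

# Select several rows at once with a list of labels
subset = df.loc[[101, 103]]
print(list(subset['Name']))   # ['Alice', 'Charlie']
```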

Example 2: Keeping the Original Column

By default, set_index() moves the column into the index, so it no longer appears among the DataFrame’s columns. If we want to keep it as a regular column too, we can use drop=False:


df = pd.DataFrame(data)
df = df.set_index('ID', drop=False)
print(df)

Now, the ID column remains part of the DataFrame while still serving as the index.
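To see the difference, we can compare both settings side by side. This is a small sketch using the same data. The names dropped and kept are just illustrative.

```python
import pandas as pd

data = {
    'ID': [101, 102, 103, 104],
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Score': [90, 85, 88, 92],
}

dropped = pd.DataFrame(data).set_index('ID')             # default: drop=True
kept = pd.DataFrame(data).set_index('ID', drop=False)    # keep the column too

print('ID' in dropped.columns)  # False
print('ID' in kept.columns)     # True
print(kept.index.name)          # ID
```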

Example 3: Setting Multiple Columns as Index

We can also set multiple columns as the index by passing a list of column names:


df = pd.DataFrame(data)
df = df.set_index(['ID', 'Name'])
print(df)

This creates a hierarchical index (a MultiIndex) in which each row is identified by the combination of ID and Name.
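With a two-level index, rows are selected by a tuple that holds one label per level. Here is a minimal sketch, rebuilt from the same data so it is self-contained:

```python
import pandas as pd

df = pd.DataFrame({
    'ID': [101, 102, 103, 104],
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Score': [90, 85, 88, 92],
}).set_index(['ID', 'Name'])

print(list(df.index.names))   # ['ID', 'Name']

# Select one cell with a tuple holding a label for each index level
score = df.loc[(102, 'Bob'), 'Score']
print(score)                  # 85

# Select by the first level only; the result is indexed by the remaining level
rows = df.loc[101]
print(list(rows.index))       # ['Alice']
```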

Example 4: Resetting the Index

If we ever need to go back to the default index, we can use reset_index():


df = df.reset_index()
print(df)

This moves every index level back into a regular column and restores the default integer index.
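A quick way to convince ourselves of this is a round trip. The sketch below also shows reset_index(drop=True), which discards the index levels instead of restoring them as columns. The variable names are illustrative.

```python
import pandas as pd

original = pd.DataFrame({
    'ID': [101, 102, 103, 104],
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Score': [90, 85, 88, 92],
})

indexed = original.set_index(['ID', 'Name'])
restored = indexed.reset_index()

print(list(restored.columns))      # ['ID', 'Name', 'Score']
print(restored.equals(original))   # True

# drop=True throws the index levels away instead of making them columns
discarded = indexed.reset_index(drop=True)
print(list(discarded.columns))     # ['Score']
```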

Example 5: Using inplace=True

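By default, set_index() returns a new DataFrame and leaves the original unchanged. If we want to modify the original DataFrame directly, we can use inplace=True. In that case the method returns None, so we should not assign its result back to df: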


df.set_index('ID', inplace=True)
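The sketch below, on a tiny made-up DataFrame, shows what the call returns and a common pitfall to avoid:

```python
import pandas as pd

# Hypothetical sample data for illustration
df = pd.DataFrame({'ID': [101, 102], 'Score': [90, 85]})

# With inplace=True the DataFrame is changed and the call returns None
result = df.set_index('ID', inplace=True)
print(result)          # None
print(df.index.name)   # ID

# Pitfall: assigning that return value would replace df with None
# df = df.set_index('ID', inplace=True)
```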

When to Use set_index()?

Using set_index() makes sense in several cases:

  • When we have a unique identifier column (such as an Order ID or User ID).
  • When sorting and selecting subsets of data based on index values.
  • When merging or joining with another DataFrame based on index.
  • When working with time series data, where a datetime column usually serves as the index.
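Two of these cases can be sketched briefly: joining on the index and selecting from a time series. All data and names below (scores, names, readings) are hypothetical and only for illustration.

```python
import pandas as pd

# Two hypothetical tables that share an ID index
scores = pd.DataFrame({'ID': [101, 102, 103],
                       'Score': [90, 85, 88]}).set_index('ID')
names = pd.DataFrame({'ID': [101, 102, 103],
                      'Name': ['Alice', 'Bob', 'Charlie']}).set_index('ID')

# join() aligns rows on the index by default
combined = scores.join(names)
print(combined.loc[102, 'Name'])   # Bob

# A hypothetical time series with a datetime column as the index
readings = pd.DataFrame({
    'date': pd.to_datetime(['2024-01-30', '2024-01-31', '2024-02-01']),
    'value': [1.0, 2.0, 3.0],
}).set_index('date')

# With a datetime index, a partial date string selects a whole month
january = readings.loc['2024-01']
print(len(january))                # 2
```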

Conclusion

Using set_index() in Pandas is a great way to structure and organize data. Whether we want a single-column index or a multi-index DataFrame, it gives us flexible, label-based access to rows. Now that you understand how DataFrame.set_index() works in Python, you can apply it to your own datasets with confidence!
