How pandas sort_index() works in Python, with examples

Sorting is a fundamental operation when working with data in pandas. To order a DataFrame or Series by its index labels rather than by its values, use sort_index(). Let’s look at how it works and walk through some practical examples.

What is `sort_index()` in pandas?

The sort_index() method in pandas allows us to sort a DataFrame or Series by its index values. This is extremely useful when dealing with labeled data, such as time series or multi-indexed DataFrames.
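As a quick illustration of the time series case mentioned above, here is a minimal sketch that puts out-of-order dated readings into chronological order. The dates and values are made up for the example.

```python
import pandas as pd

# Hypothetical daily readings, recorded out of chronological order
readings = pd.Series(
    [3.1, 2.7, 3.4],
    index=pd.to_datetime(["2024-01-03", "2024-01-01", "2024-01-02"]),
)

# sort_index() returns a new Series ordered by its DatetimeIndex;
# the original Series is left unchanged
chronological = readings.sort_index()
print(chronological)
```

The values travel with their labels, so after sorting the reading for 2024-01-01 comes first.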

Basic Syntax of `sort_index()`

The basic syntax of sort_index() is pretty straightforward:

DataFrame.sort_index(*, axis=0, level=None, ascending=True, inplace=False, kind='quicksort', na_position='last', sort_remaining=True, ignore_index=False, key=None)

Here’s what each parameter does:

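axis: Which axis to sort along. axis=0 (the default) sorts the row labels; axis=1 sorts the column labels.
level: For a MultiIndex, the level or list of levels (by position or name) to sort on.
ascending: Sorts in ascending order by default. Use False for descending order. For a MultiIndex, a list of booleans sets the direction per level.
inplace: If True, modifies the original DataFrame instead of returning a new one.
kind: The sorting algorithm: 'quicksort', 'mergesort', 'heapsort', or 'stable'. Only 'mergesort' and 'stable' are stable. For DataFrames, it is only applied when sorting on a single label or level.
na_position: 'first' places NaNs at the start, 'last' (the default) places them at the end. Not implemented for MultiIndex.
sort_remaining: When sorting a MultiIndex by level, True (the default) also sorts by the other levels, in order, after the specified level.
ignore_index: If True, the sorted axis is relabeled 0, 1, …, n - 1 instead of keeping the original labels.
key: A vectorized function applied to the index before sorting. It receives an Index and should return an Index of the same shape.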
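The key and ignore_index parameters are easiest to understand by example. The sketch below uses a small made-up DataFrame (df_mixed is a name chosen for this illustration) with mixed-case labels to show default ordering, case-insensitive ordering through key, and relabeling through ignore_index.

```python
import pandas as pd

df_mixed = pd.DataFrame({"A": [1, 2, 3]}, index=["b", "C", "a"])

# Default ordering compares the raw strings, so uppercase "C" sorts before "a"
default_order = df_mixed.sort_index()
print(list(default_order.index))      # ['C', 'a', 'b']

# key receives the whole Index and must return something of the same length
case_insensitive = df_mixed.sort_index(key=lambda idx: idx.str.lower())
print(list(case_insensitive.index))   # ['a', 'b', 'C']

# ignore_index=True sorts first, then replaces the labels with 0..n-1
renumbered = df_mixed.sort_index(ignore_index=True)
print(renumbered)
```

Note that key only affects the comparison: the original labels (including their case) are what appear in the result.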

Sorting a DataFrame by Index

Let’s explore a simple example where we sort a DataFrame by its index.

import pandas as pd

# Creating a DataFrame
data = {'A': [10, 20, 30], 'B': [40, 50, 60]}
df = pd.DataFrame(data, index=['c', 'a', 'b'])

print("Original DataFrame:")
print(df)

# Sorting by index
sorted_df = df.sort_index()

print("\nDataFrame sorted by index:")
print(sorted_df)

Output:

Original DataFrame:
    A   B
c  10  40
a  20  50
b  30  60

DataFrame sorted by index:
    A   B
a  20  50
b  30  60
c  10  40
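The same method can sort column labels instead of row labels by passing axis=1. The following sketch uses a separate, made-up DataFrame (df_cols) whose columns start out of order.

```python
import pandas as pd

# Columns are deliberately created in the order B, A
df_cols = pd.DataFrame({"B": [40, 50], "A": [10, 20]}, index=["y", "x"])

# axis=1 sorts the column labels; the row order is left untouched
by_columns = df_cols.sort_index(axis=1)
print(by_columns)
```

Here the columns come back as A, B while the rows stay in their original y, x order.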

Sorting in Descending Order

If we want our index to be sorted in descending order, we just set ascending=False:

sorted_df_desc = df.sort_index(ascending=False)
print(sorted_df_desc)

Output:

    A   B
c  10  40
b  30  60
a  20  50

Sorting with NaN Index Values

When the index contains missing values, the na_position parameter controls whether they are placed first or last (the default) in the sorted result.

import numpy as np

# Creating a DataFrame with NaN index
data = {'A': [1, 2, 3, 4], 'B': [5, 6, 7, 8]}
index = [np.nan, 'c', 'a', 'b']
df_nan = pd.DataFrame(data, index=index)

print("Original DataFrame with NaN index:")
print(df_nan)

# Sorting with NaN first
sorted_nan_first = df_nan.sort_index(na_position='first')

print("\nDataFrame sorted placing NaNs first:")
print(sorted_nan_first)

Output:

Original DataFrame with NaN index:
     A  B
NaN  1  5
c    2  6
a    3  7
b    4  8

DataFrame sorted placing NaNs first:
     A  B
NaN  1  5
a    3  7
b    4  8
c    2  6
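For comparison, here is a small sketch of the default behavior (na_position='last') next to na_position='first'. It uses a Series with a numeric index so that the NaN is the only unusual label; the data is invented for the example.

```python
import numpy as np
import pandas as pd

s_nan = pd.Series(["a", "b", "c", "d"], index=[3, 2, 1, np.nan])

# Default: the NaN label goes to the end
nan_last = s_nan.sort_index()
print(nan_last.tolist())    # ['c', 'b', 'a', 'd']

# na_position='first': the NaN label goes to the start
nan_first = s_nan.sort_index(na_position="first")
print(nan_first.tolist())   # ['d', 'c', 'b', 'a']
```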

Sorting a MultiIndex DataFrame

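Calling sort_index() on a MultiIndex DataFrame with no arguments sorts lexicographically by every level, from outermost to innermost. The level parameter lets you choose which level (or levels) to sort by first.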

# Creating a MultiIndex DataFrame
arrays = [['A', 'A', 'B', 'B'], [2, 1, 2, 1]]
index = pd.MultiIndex.from_tuples(list(zip(*arrays)), names=['first', 'second'])
data = {'value': [10, 20, 30, 40]}
df_multi = pd.DataFrame(data, index=index)

print("Original MultiIndex DataFrame:")
print(df_multi)

# Sorting by the first level (the second level is sorted as well,
# because sort_remaining=True by default)
sorted_multi = df_multi.sort_index(level='first')

print("\nSorted MultiIndex DataFrame by first level:")
print(sorted_multi)
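To see what sort_remaining changes, the sketch below sorts a small made-up MultiIndex DataFrame (df_levels) by its 'first' level with and without it. The order of ties when sort_remaining=False is deliberately not shown as an expected result, since only the ordering of the requested level should be relied on.

```python
import pandas as pd

index = pd.MultiIndex.from_arrays(
    [["B", "A", "B", "A"], [1, 2, 2, 1]], names=["first", "second"]
)
df_levels = pd.DataFrame({"value": [10, 20, 30, 40]}, index=index)

# Default (sort_remaining=True): sort by 'first', then by 'second'
full_sort = df_levels.sort_index(level="first")
print(full_sort)

# sort_remaining=False: order by 'first' only; 'second' is not used for sorting
first_only = df_levels.sort_index(level="first", sort_remaining=False)
print(first_only)

# With no level argument, all levels are sorted lexicographically
print(df_levels.sort_index().equals(full_sort))   # True
```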

Performance Considerations

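Sorting can be slow on large datasets. The kind parameter selects the underlying NumPy sorting algorithm, and for DataFrames it is only applied when sorting on a single label or level. The available options are:

'quicksort': The default. Typically the fastest, but not stable.
'mergesort': A stable sort, so rows with equal index labels keep their original relative order.
'heapsort': Not stable and usually the slowest of the three; rarely needed.
'stable': Asks NumPy for a stable algorithm; like 'mergesort', it preserves the order of equal labels.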

Comparison of Sorting Techniques

The following table compares different sorting techniques and their characteristics:

Sorting Method	Stable	Relative speed
quicksort	No	Fastest
mergesort	Yes	Moderate
heapsort	No	Slowest
stable	Yes	Moderate (NumPy maps 'stable' and 'mergesort' to the same implementation)
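Stability only matters when index labels repeat. The following sketch, using an invented DataFrame with duplicate integer labels, shows that kind='stable' keeps rows with the same label in the order they originally appeared. The order_seen column is a hypothetical helper that records each row's original position.

```python
import pandas as pd

df_dupes = pd.DataFrame(
    {"order_seen": [0, 1, 2, 3, 4]},
    index=[2, 1, 2, 1, 2],
)

# A stable sort keeps equal labels in their original relative order
stable_sorted = df_dupes.sort_index(kind="stable")
print(stable_sorted)
```

Within label 1 the rows stay in the order 1, 3, and within label 2 they stay in the order 0, 2, 4. With a non-stable kind, that relative order is not guaranteed.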

Final Thoughts

With sort_index(), we can sort data by its index, whether that is a single index, a MultiIndex, or an index containing NaN values. Knowing its parameters and default behavior makes index-based data manipulation more predictable. Whether you’re dealing with time-series data or hierarchical indexing, sort_index() is a core pandas tool worth knowing well.
