How does pandas sort_values work in Python? Best example


Understanding DataFrame.sort_values() in pandas

If you work with data in Python, you have probably encountered pandas. One of the most essential methods for organizing data in a DataFrame is sort_values(). It sorts the rows by one or more columns, in ascending or descending order, and lets you control where missing values are placed.

Basic Usage of DataFrame.sort_values()

The sort_values() method is used to sort a pandas DataFrame based on the values in one or more columns. Let’s take a look at the basic syntax:

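DataFrame.sort_values(by, *, axis=0, ascending=True, inplace=False, kind='quicksort', na_position='last', ignore_index=False, key=None)

In pandas 2.0 and later, every argument after by must be passed by keyword. Here’s what each parameter does:

  • by: Column name (or list of column names) to sort by.
  • axis: The axis to sort along. 0 (the default) reorders the rows based on column values; 1 reorders the columns based on the values in the row(s) named in by.
  • ascending: Sort in ascending (True) or descending (False) order. Pass a list of booleans, one per entry in by, to set the direction for each column separately.
  • inplace: If True, sorts the DataFrame in place and returns None; otherwise, it returns a new sorted DataFrame and leaves the original unchanged.
  • kind: Sorting algorithm to use. Options: ‘quicksort’, ‘mergesort’, ‘heapsort’, ‘stable’. For a DataFrame, this is only applied when sorting on a single column or label.
  • na_position: Whether to place NaN values at the beginning (‘first’) or end (‘last’).
  • ignore_index: If True, the result is relabeled 0, 1, …, n - 1 instead of keeping the original index labels.
  • key: Optional function applied to each column in by before sorting. It receives a Series and must return a Series of the same shape.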
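As a rough sketch of how ignore_index, inplace, and key behave (key is a parameter available in pandas 1.1 and later that transforms the values before they are compared), consider the example below. The DataFrame and its mixed-case names are made up purely for illustration.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration
df = pd.DataFrame({'Name': ['alice', 'Bob', 'charlie', 'David'],
                   'Score': [85, 92, 78, 88]})

# ignore_index=True relabels the result 0..n-1 instead of keeping the old labels
by_score = df.sort_values(by='Score', ignore_index=True)
print(by_score.index.tolist())      # [0, 1, 2, 3]

# key receives the column as a Series; here names are compared case-insensitively
by_name = df.sort_values(by='Name', key=lambda col: col.str.lower())
print(by_name['Name'].tolist())     # ['alice', 'Bob', 'charlie', 'David']

# inplace=True modifies df itself and returns None
result = df.sort_values(by='Score', inplace=True)
print(result)                       # None
print(df['Score'].tolist())         # [78, 85, 88, 92]
```

Without the key function, the uppercase names would sort ahead of the lowercase ones, because string comparison is case-sensitive by default.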

Sorting a DataFrame by a Single Column

Let’s start with a simple example to sort a DataFrame by a single column.

import pandas as pd

# Create a sample DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
        'Score': [85, 92, 78, 88]}

df = pd.DataFrame(data)

# Sort by Score in ascending order
sorted_df = df.sort_values(by='Score')

print(sorted_df)

The result will be:

      Name  Score
2  Charlie     78
0    Alice     85
3    David     88
1      Bob     92

Sorting in Descending Order

If you want to sort in descending order, simply set ascending=False:

sorted_df_desc = df.sort_values(by='Score', ascending=False)
print(sorted_df_desc)

This will return:

      Name  Score
1      Bob     92
3    David     88
0    Alice     85
2  Charlie     78

Sorting by Multiple Columns

Sometimes, you might want to sort by multiple columns. This is easily done by passing a list of column names:

df_multisort = df.sort_values(by=['Score', 'Name'], ascending=[True, False])
print(df_multisort)

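Here, the DataFrame is first sorted by Score in ascending order; rows with equal scores are then ordered by Name in descending order. The sample scores are all distinct, so the second sort key has no visible effect on this particular DataFrame.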
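To see the tie-breaking behavior in action, here is a small sketch using a made-up DataFrame in which two scores repeat (the data is an assumption for the example, not part of the sample above):

```python
import pandas as pd

# Hypothetical data with tied scores, so the second sort key matters
df_ties = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie', 'David'],
                        'Score': [85, 92, 85, 92]})

# Score ascending; within each score, Name descending
ranked = df_ties.sort_values(by=['Score', 'Name'], ascending=[True, False])
print(ranked['Name'].tolist())   # ['Charlie', 'Alice', 'David', 'Bob']
```

Within the 85 group, Charlie comes before Alice because names are sorted in descending order; the same applies to David and Bob in the 92 group.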

Handling Missing Values

If your DataFrame contains NaN values, you can control their position using the na_position parameter.

data_with_nan = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
                 'Score': [85, None, 78, 92]}

df_nan = pd.DataFrame(data_with_nan)

# Sort and place NaN values first
df_sorted_nan = df_nan.sort_values(by='Score', na_position='first')
print(df_sorted_nan)

This places the row with the missing score at the top rather than at the bottom (the default). Note that the None in the input is stored as NaN, so the Score column becomes float64.
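The following sketch compares the default placement with na_position='first', reusing the same sample data. It also shows that NaN stays at the end when sorting in descending order unless na_position says otherwise.

```python
import pandas as pd

df_nan = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie', 'David'],
                       'Score': [85, None, 78, 92]})

# None is converted to NaN, which forces a floating-point column
print(df_nan['Score'].dtype)              # float64

# Default: NaN rows go last
default_order = df_nan.sort_values(by='Score')
print(default_order['Name'].tolist())     # ['Charlie', 'Alice', 'David', 'Bob']

# Descending order does not move NaN to the top; it still goes last
desc_order = df_nan.sort_values(by='Score', ascending=False)
print(desc_order['Name'].tolist())        # ['David', 'Alice', 'Charlie', 'Bob']

# na_position='first' puts NaN rows ahead of all real values
first_order = df_nan.sort_values(by='Score', na_position='first')
print(first_order['Name'].tolist())       # ['Bob', 'Charlie', 'Alice', 'David']
```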

Sorting Index Instead of Values

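To sort by the index labels rather than by column values, use sort_index() instead. The sample df is already in index order, so the call below returns it unchanged; the effect only shows on a DataFrame whose index is out of order, such as sorted_df from the first example.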

df_sorted_index = df.sort_index()
print(df_sorted_index)
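A minimal sketch of sort_index() on a DataFrame whose index is out of order: sorting by Score shuffles the index labels, and sort_index() puts the rows back in their original order.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie', 'David'],
                   'Score': [85, 92, 78, 88]})

# Sorting by value keeps the original labels, so the index is now out of order
sorted_df = df.sort_values(by='Score')
print(sorted_df.index.tolist())   # [2, 0, 3, 1]

# sort_index() orders the rows by their index labels again
restored = sorted_df.sort_index()
print(restored.index.tolist())    # [0, 1, 2, 3]
print(restored.equals(df))        # True
```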

Performance Considerations

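pandas exposes NumPy’s sorting algorithms through the kind parameter:

  • ‘quicksort’ – The default. Fast, but not stable.
  • ‘mergesort’ – Stable, but typically somewhat slower than quicksort.
  • ‘heapsort’ – Not stable, and usually the slowest of the options in practice.
  • ‘stable’ – Guarantees that equal values retain their original relative order. In current NumPy versions it selects the same underlying implementation as ‘mergesort’.

When rows with equal keys must keep their original order, choose ‘mergesort’ or ‘stable’. Keep in mind that for a DataFrame, kind is only applied when sorting on a single column or label.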
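What stability means in practice can be sketched with a small made-up DataFrame (the Customer and Order columns are assumptions for the example). With a stable algorithm, rows that share the same key keep the order they had before the sort.

```python
import pandas as pd

# Hypothetical orders, listed in the order they were placed
df_orders = pd.DataFrame({'Customer': ['Dana', 'Eli', 'Dana', 'Eli', 'Dana'],
                          'Order': [1, 2, 3, 4, 5]})

# A stable sort on Customer preserves the original order within each customer
stable_sorted = df_orders.sort_values(by='Customer', kind='stable')
print(stable_sorted['Order'].tolist())   # [1, 3, 5, 2, 4]
```

With an unstable algorithm such as quicksort, the orders within each customer are not guaranteed to stay in sequence, even if they happen to on small inputs.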

Final Thoughts

The sort_values() method is a powerful and flexible way to organize data in pandas. Whether sorting by single or multiple columns, handling missing values, or choosing the best sorting algorithm, this method offers a wide range of functionalities. Mastering it will make data manipulation much easier and more efficient.
