Sort Pandas DataFrame with Examples

The DataFrame is the core data structure in Pandas, and sorting is one of the most common operations performed on it. Arranging rows in a consistent order makes the data easier to explore and analyze.

sort_values()

You can use the DataFrame.sort_values() method to sort a DataFrame by the values in one or more columns.

sort_values(by, axis=0, ascending=True, kind='quicksort', na_position='last')

The sort_values() method can sort a DataFrame by one column or by several, in ascending or descending order. By default it returns a new, sorted DataFrame and leaves the original unchanged.

df.sort_values(by=["Name"])

The code above sorts by the "Name" column in the default ascending order.

Let's create a DataFrame to work with.

import pandas as pd
import numpy as np

df = pd.DataFrame()
df['Name'] = ['John', 'Doe', 'Bill', 'Jim', 'Harry', 'Ben']
df['TotalMarks'] = [82, 38, 63, 22, 55, 40]
df['Grade'] = ['A', 'E', 'B', 'E', 'C', 'D']
df['Promoted'] = [True, False, True, False, True, True]
df

    Name  TotalMarks Grade  Promoted
0   John          82     A      True
1    Doe          38     E     False
2   Bill          63     B      True
3    Jim          22     E     False
4  Harry          55     C      True
5    Ben          40     D      True

Sort by single column

df.sort_values(by=["Name"])

    Name  TotalMarks Grade  Promoted
5    Ben          40     D      True
2   Bill          63     B      True
1    Doe          38     E     False
4  Harry          55     C      True
3    Jim          22     E     False
0   John          82     A      True

Sort by two columns

df.sort_values(by=["TotalMarks","Name"])
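
Because no two rows in the sample data share a TotalMarks value, the second column never acts as a tie-breaker in the call above. The sketch below uses Grade as the first key instead, since two rows share grade "E". The variable name by_grade_then_name is only illustrative.

```python
import pandas as pd

# Same data as the DataFrame created earlier in this article.
df = pd.DataFrame({
    "Name": ["John", "Doe", "Bill", "Jim", "Harry", "Ben"],
    "TotalMarks": [82, 38, 63, 22, 55, 40],
    "Grade": ["A", "E", "B", "E", "C", "D"],
    "Promoted": [True, False, True, False, True, True],
})

# Sort by Grade first; rows that share a grade are then ordered by Name.
by_grade_then_name = df.sort_values(by=["Grade", "Name"])
print(by_grade_then_name)
```

The two grade "E" rows appear with Doe before Jim, because the second column decides the order only where the first column ties.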

The kind parameter selects the sorting algorithm: 'quicksort' (the default), 'mergesort', 'heapsort', or 'stable'. The choice affects speed and whether rows with equal keys keep their original relative order (which 'mergesort' and 'stable' guarantee). It does not change which values end up where. For DataFrames, pandas applies this option only when sorting on a single column or label.
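
One place where the algorithm choice is visible is a two-pass sort. The sketch below, with illustrative variable names, first orders the rows by TotalMarks and then re-sorts by Grade with a stable algorithm, so rows with the same grade keep their TotalMarks order.

```python
import pandas as pd

# Same data as the DataFrame created earlier in this article.
df = pd.DataFrame({
    "Name": ["John", "Doe", "Bill", "Jim", "Harry", "Ben"],
    "TotalMarks": [82, 38, 63, 22, 55, 40],
    "Grade": ["A", "E", "B", "E", "C", "D"],
})

# First pass: order rows by TotalMarks (ascending).
by_marks = df.sort_values(by="TotalMarks")

# Second pass: mergesort is stable, so rows with equal Grade
# stay in the TotalMarks order produced by the first pass.
by_grade_stable = by_marks.sort_values(by="Grade", kind="mergesort")
print(by_grade_stable)
```

Within grade "E", Jim (22) stays ahead of Doe (38). The default quicksort does not guarantee this.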

Sort by column in descending order

By default, the DataFrame is sorted in ascending order. To sort in descending order, pass ascending=False to sort_values().

df.sort_values(by=["TotalMarks"], ascending=False)

    Name  TotalMarks Grade  Promoted
0   John          82     A      True
2   Bill          63     B      True
4  Harry          55     C      True
5    Ben          40     D      True
1    Doe          38     E     False
3    Jim          22     E     False
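
The ascending argument also accepts a list with one boolean per sort column, so each column can have its own direction. As a minimal sketch (the variable name mixed is illustrative), the following sorts by Promoted ascending and then by TotalMarks descending:

```python
import pandas as pd

# Same data as the DataFrame created earlier in this article.
df = pd.DataFrame({
    "Name": ["John", "Doe", "Bill", "Jim", "Harry", "Ben"],
    "TotalMarks": [82, 38, 63, 22, 55, 40],
    "Grade": ["A", "E", "B", "E", "C", "D"],
    "Promoted": [True, False, True, False, True, True],
})

# One direction per column: Promoted ascending (False before True),
# then TotalMarks descending within each Promoted group.
mixed = df.sort_values(by=["Promoted", "TotalMarks"], ascending=[True, False])
print(mixed)
```

The list passed to ascending must have the same length as the list passed to by.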

sort_values() also controls where missing values (NaN) end up. The na_position parameter accepts 'first' or 'last' (the default), which places rows with NaN in the sort column at the start or the end of the result.

Sort with missing values

df = pd.DataFrame({'x': [1.0, np.nan, 3.0, 4.0]})
df

     x
0  1.0
1  NaN
2  3.0
3  4.0

Sort with missing values first/last

df.sort_values(by=["x"], na_position='first')

     x
1  NaN
0  1.0
2  3.0
3  4.0

The key parameter (available since pandas 1.1.0) adds another level of control. It takes a function that receives each sort column as a Series and returns a Series of the same length, and the rows are then ordered by the transformed values. This makes it possible to sort case-insensitively, or by a derived quantity, without adding a helper column.
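
As a small illustration of key, the sketch below sorts names case-insensitively. The names DataFrame and its mixed-case values are made up for this example and are not part of the data used elsewhere in the article.

```python
import pandas as pd

# Hypothetical data with mixed-case names, for illustration only.
names = pd.DataFrame({"Name": ["john", "Doe", "bill", "Jim"]})

# Default string ordering puts uppercase letters before lowercase ones.
default_order = names.sort_values(by="Name")

# The key function receives the column as a Series and must return
# a Series of the same length; here it lowercases every value.
case_insensitive = names.sort_values(by="Name", key=lambda col: col.str.lower())

print(default_order)
print(case_insensitive)
```

The key function only affects the comparison. The values stored in the result keep their original casing.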

To place missing values last instead, set na_position='last'. This is also the default behavior.

df.sort_values(by=["x"], na_position='last')

     x
0  1.0
2  3.0
3  4.0
1  NaN
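
The na_position setting is independent of the sort direction. The following sketch, with illustrative variable names, shows that a descending sort still places NaN last unless na_position='first' is passed:

```python
import numpy as np
import pandas as pd

# Same single-column data as in the missing-value example above.
df = pd.DataFrame({"x": [1.0, np.nan, 3.0, 4.0]})

# Descending sort: NaN still goes last, because na_position defaults to 'last'.
desc_nan_last = df.sort_values(by="x", ascending=False)

# Descending sort with NaN moved to the top.
desc_nan_first = df.sort_values(by="x", ascending=False, na_position="first")

print(desc_nan_last)
print(desc_nan_first)
```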

Conclusion

The sort_values() method covers most DataFrame sorting needs in Pandas. It sorts by one column or several, in ascending or descending order, and gives control over the placement of missing values, the sorting algorithm, and a custom sort key. Ordering the data this way is often the first step toward spotting patterns and trends during exploratory analysis.