How to Select Rows from Pandas DataFrame

Pandas is a popular data manipulation library for Python, built on top of NumPy. It provides two main data structures: Series, which holds one-dimensional labeled data, and DataFrame, which holds two-dimensional tabular data. A DataFrame can hold columns of the same or of different types (numbers, strings, booleans), which makes it versatile for data analysis. It also supports common row operations such as selecting, deleting, adding, and renaming rows, which makes data manipulation and exploration efficient.

Create a Pandas DataFrame with data

import pandas as pd
import numpy as np

# Start from an empty DataFrame and add one column at a time
df = pd.DataFrame()
df['Name'] = ['John', 'Doe', 'Bill', 'Jim', 'Harry', 'Ben']
df['TotalMarks'] = [82, 38, 63, 22, 55, 40]
df['Grade'] = ['A', 'E', 'B', 'E', 'C', 'D']
df['Promoted'] = [True, False, True, False, True, True]
df
    Name  TotalMarks  Grade  Promoted
0   John          82      A      True
1    Doe          38      E     False
2   Bill          63      B      True
3    Jim          22      E     False
4  Harry          55      C      True
5    Ben          40      D      True
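The same DataFrame can also be built in a single call by passing a dictionary whose keys become the column names. This is a minimal alternative sketch rather than the author's original code; the column order follows the dictionary's insertion order.

```python
import pandas as pd

# Each key becomes a column name; each list holds that column's values
df = pd.DataFrame({
    'Name': ['John', 'Doe', 'Bill', 'Jim', 'Harry', 'Ben'],
    'TotalMarks': [82, 38, 63, 22, 55, 40],
    'Grade': ['A', 'E', 'B', 'E', 'C', 'D'],
    'Promoted': [True, False, True, False, True, True],
})

print(df.shape)             # (6, 4): six rows, four columns
print(df.columns.tolist())  # ['Name', 'TotalMarks', 'Grade', 'Promoted']
```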

Selecting rows using []

Slicing with square brackets selects rows by position. For example, df[2:5] returns the rows at positions 2, 3, and 4; as with Python list slicing, the stop position is excluded. Note that a single label inside the brackets, such as df['Name'], selects a column rather than a row; to select a row by its label, use df.loc['label'] instead.

df[2:4]
   Name  TotalMarks  Grade  Promoted
2  Bill          63      B      True
3   Jim          22      E     False
Select the rows at positions 2 and 3 (the slice stops before position 4), with all columns.
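A short sketch of how the brackets behave on this DataFrame, assuming the default 0 to 5 integer index: a slice selects rows and stops before its end position, while a single string selects a column. The variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Doe', 'Bill', 'Jim', 'Harry', 'Ben'],
    'TotalMarks': [82, 38, 63, 22, 55, 40],
    'Grade': ['A', 'E', 'B', 'E', 'C', 'D'],
    'Promoted': [True, False, True, False, True, True],
})

# A slice inside [] selects rows by position; the stop (4) is excluded
rows = df[2:4]
print(rows['Name'].tolist())      # ['Bill', 'Jim']

# A single string inside [] selects a column (a Series), not a row
names = df['Name']
print(type(names).__name__)       # Series

# Negative positions count from the end, as with Python lists
last_two = df[-2:]
print(last_two['Name'].tolist())  # ['Harry', 'Ben']
```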

Selecting specific columns

When using square brackets to access data from a Pandas DataFrame, you can specify the column names as well. For example, df['column_name'] would retrieve the entire column with the name 'column_name', and df[['column1', 'column2']] would retrieve a subset of the DataFrame containing only the 'column1' and 'column2' columns. This allows you to select specific columns of interest and perform operations on them.

df[2:4][['TotalMarks', 'Grade']]
   TotalMarks  Grade
2          63      B
3          22      E
Select the rows at positions 2 and 3, keeping only the 'TotalMarks' and 'Grade' columns.
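The chained form df[2:4][['TotalMarks', 'Grade']] performs two separate lookups. One way to do it in a single step, sketched here as an alternative rather than the article's method, is to convert the column names to positions with Index.get_indexer and pass both rows and columns to iloc[]. The variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Doe', 'Bill', 'Jim', 'Harry', 'Ben'],
    'TotalMarks': [82, 38, 63, 22, 55, 40],
    'Grade': ['A', 'E', 'B', 'E', 'C', 'D'],
    'Promoted': [True, False, True, False, True, True],
})

# Convert the column names to their integer positions: [1, 2]
col_positions = df.columns.get_indexer(['TotalMarks', 'Grade'])

# One iloc[] call: rows at positions 2 and 3, plus the two chosen columns
subset = df.iloc[2:4, col_positions]
print(subset)

# Same contents as the chained version
print(subset.equals(df[2:4][['TotalMarks', 'Grade']]))  # True
```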

Selecting rows using iloc[]

df.iloc[2:4]
   Name  TotalMarks  Grade  Promoted
2  Bill          63      B      True
3   Jim          22      E     False
Select the rows at integer positions 2 and 3 (as with Python slicing, the stop position 4 is excluded), with all columns.
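Positions and labels only coincide here because the DataFrame uses the default 0 to 5 index. The following sketch, with a hypothetical by_name DataFrame that uses the Name column as its index, shows that iloc[] keeps counting positions once the labels change, while loc[] works with the new labels.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Doe', 'Bill', 'Jim', 'Harry', 'Ben'],
    'TotalMarks': [82, 38, 63, 22, 55, 40],
    'Grade': ['A', 'E', 'B', 'E', 'C', 'D'],
    'Promoted': [True, False, True, False, True, True],
})

# Use the Name column as row labels, so labels no longer match positions
by_name = df.set_index('Name')

# iloc[] counts positions and excludes the stop: the third and fourth rows
print(by_name.iloc[2:4].index.tolist())          # ['Bill', 'Jim']

# loc[] uses the new labels, and a label slice includes its end label
print(by_name.loc['Bill':'Jim'].index.tolist())  # ['Bill', 'Jim']

# A row label plus a column label gives a single value
print(by_name.loc['Doe', 'TotalMarks'])          # 38
```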

Selecting rows and columns using loc[]

When using the loc method to access data from a Pandas DataFrame, you can specify both row and column labels. The syntax for using loc is df.loc[row_label, column_label].

For example, df.loc[3, 'column_name'] retrieves the value in the row labelled 3 and the column named 'column_name'. Similarly, df.loc[:, ['column1', 'column2']] retrieves all rows but only the 'column1' and 'column2' columns.

Using loc provides more flexibility in selecting specific rows and columns based on their labels, making it useful for various data selection tasks in Pandas.

df.loc[2:4, ['TotalMarks', 'Grade']]
   TotalMarks  Grade
2          63      B
3          22      E
4          55      C
Select the rows labelled 2 through 4, keeping only the 'TotalMarks' and 'Grade' columns. Unlike an iloc[] slice, a loc[] label slice includes its end label, which is why row 4 appears.
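A side-by-side sketch of the slicing rules on the same data: a loc[] slice includes its end label, an iloc[] slice excludes its stop position, and a row label paired with a column label returns a single value.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Doe', 'Bill', 'Jim', 'Harry', 'Ben'],
    'TotalMarks': [82, 38, 63, 22, 55, 40],
    'Grade': ['A', 'E', 'B', 'E', 'C', 'D'],
    'Promoted': [True, False, True, False, True, True],
})

# loc[] slices by label and includes the end label 4
print(df.loc[2:4].index.tolist())   # [2, 3, 4]

# iloc[] slices by position and excludes the stop position 4
print(df.iloc[2:4].index.tolist())  # [2, 3]

# One row label plus one column label returns a single value
print(df.loc[3, 'Name'])            # Jim

# ':' keeps every row; the list keeps two columns
print(df.loc[:, ['Name', 'Grade']].shape)  # (6, 2)
```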

Select rows based on condition using loc

df.loc[df['Grade'] == 'E']
   Name  TotalMarks  Grade  Promoted
1   Doe          38      E     False
3   Jim          22      E     False
Select all rows from the DataFrame where Grade is 'E'.
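The expression inside loc[] is a boolean Series with one True or False per row. Storing it in a variable (named mask here purely for illustration) makes that visible and lets you reuse it; plain square brackets accept the same mask.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Doe', 'Bill', 'Jim', 'Harry', 'Ben'],
    'TotalMarks': [82, 38, 63, 22, 55, 40],
    'Grade': ['A', 'E', 'B', 'E', 'C', 'D'],
    'Promoted': [True, False, True, False, True, True],
})

# The comparison yields one boolean per row, aligned with df's index
mask = df['Grade'] == 'E'
print(mask.tolist())                  # [False, True, False, True, False, False]

# loc[] keeps only the rows where the mask is True
print(df.loc[mask, 'Name'].tolist())  # ['Doe', 'Jim']

# Plain [] accepts the same mask and returns the same rows
print(df[mask].equals(df.loc[mask]))  # True
```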

Using 'loc' and '!='

df.loc[df['Grade'] != 'E']
    Name  TotalMarks  Grade  Promoted
0   John          82      A      True
2   Bill          63      B      True
4  Harry          55      C      True
5    Ben          40      D      True
Select all rows whose Grade is not 'E'.
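An equivalent way to write the same filter is to invert the equality mask with the ~ operator, which is handy when the condition you want to negate is more complex than a single comparison. A minimal sketch:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Doe', 'Bill', 'Jim', 'Harry', 'Ben'],
    'TotalMarks': [82, 38, 63, 22, 55, 40],
    'Grade': ['A', 'E', 'B', 'E', 'C', 'D'],
    'Promoted': [True, False, True, False, True, True],
})

# ~ flips every True to False and every False to True
not_e = df.loc[~(df['Grade'] == 'E')]
print(not_e['Name'].tolist())  # ['John', 'Bill', 'Harry', 'Ben']

# Same rows as the != version
print(not_e.equals(df.loc[df['Grade'] != 'E']))  # True
```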

Combine multiple conditions with & operator

df.loc[(df['TotalMarks'] >= 50) & (df['TotalMarks'] <= 79)]
    Name  TotalMarks  Grade  Promoted
2   Bill          63      B      True
4  Harry          55      C      True
Select all rows where TotalMarks is between 50 and 79, inclusive. Each condition is wrapped in parentheses because & binds more tightly than comparison operators such as >= and <=.
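For an inclusive range, Series.between(50, 79) is a shorter alternative, since both bounds are inclusive by default. Conditions can also be joined with | (OR); as with &, each comparison needs its own parentheses. The second filter below is an illustrative example, not one from the original article.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Doe', 'Bill', 'Jim', 'Harry', 'Ben'],
    'TotalMarks': [82, 38, 63, 22, 55, 40],
    'Grade': ['A', 'E', 'B', 'E', 'C', 'D'],
    'Promoted': [True, False, True, False, True, True],
})

# between() includes both bounds by default, like >= 50 and <= 79
in_range = df.loc[df['TotalMarks'].between(50, 79)]
print(in_range['Name'].tolist())  # ['Bill', 'Harry']

# | means OR: grade 'A', or not promoted (~ inverts the boolean column)
a_or_not_promoted = df.loc[(df['Grade'] == 'A') | ~df['Promoted']]
print(a_or_not_promoted['Name'].tolist())  # ['John', 'Doe', 'Jim']
```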

Selecting specific columns using loc

df.loc[(df['TotalMarks'] >= 50) & (df['TotalMarks'] <= 79), ['Name','TotalMarks', 'Grade']]
    Name  TotalMarks  Grade
2   Bill          63      B
4  Harry          55      C
Retrieve the Name, TotalMarks, and Grade columns for rows where TotalMarks is between 50 and 79, inclusive.
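Whether the column selector is a single label or a list changes the result type: a bare label returns a Series, while a list (even with one name) returns a DataFrame. A sketch, reusing the same range condition stored in an illustrative mask variable:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Doe', 'Bill', 'Jim', 'Harry', 'Ben'],
    'TotalMarks': [82, 38, 63, 22, 55, 40],
    'Grade': ['A', 'E', 'B', 'E', 'C', 'D'],
    'Promoted': [True, False, True, False, True, True],
})

mask = (df['TotalMarks'] >= 50) & (df['TotalMarks'] <= 79)

# A single column label returns a Series
names = df.loc[mask, 'Name']
print(type(names).__name__, names.tolist())     # Series ['Bill', 'Harry']

# A one-element list returns a DataFrame with that single column
names_df = df.loc[mask, ['Name']]
print(type(names_df).__name__, names_df.shape)  # DataFrame (2, 1)
```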

Using loc[] and isin()

df.loc[df['Grade'].isin(['A', 'B'])]
   Name  TotalMarks  Grade  Promoted
0  John          82      A      True
2  Bill          63      B      True
Select all rows where Grade is 'A' or 'B'.
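To exclude the listed grades instead, invert the isin() mask with ~. A minimal sketch on the same data:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Doe', 'Bill', 'Jim', 'Harry', 'Ben'],
    'TotalMarks': [82, 38, 63, 22, 55, 40],
    'Grade': ['A', 'E', 'B', 'E', 'C', 'D'],
    'Promoted': [True, False, True, False, True, True],
})

# ~ inverts the isin() mask: keep rows whose grade is neither 'A' nor 'B'
others = df.loc[~df['Grade'].isin(['A', 'B'])]
print(others['Name'].tolist())  # ['Doe', 'Jim', 'Harry', 'Ben']
```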

Selecting specific columns using loc[] and isin()

df.loc[df['Grade'].isin(['A', 'B']), ['Name', 'TotalMarks', 'Grade']]
   Name  TotalMarks  Grade
0  John          82      A
2  Bill          63      B
Select only the Name, TotalMarks, and Grade columns for rows where Grade is 'A' or 'B'.

Using DataFrame.query()

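df.query('Grade == "A" or Grade == "B"')
   Name  TotalMarks  Grade  Promoted
0  John          82      A      True
2  Bill          63      B      True
Select all rows where Grade is 'A' or 'B', this time written as a query string in which or joins the two conditions.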
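query() also understands the in operator, which mirrors isin(), and an @ prefix that refers to a Python variable in the surrounding scope. A short sketch, where min_marks is an illustrative variable name:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Doe', 'Bill', 'Jim', 'Harry', 'Ben'],
    'TotalMarks': [82, 38, 63, 22, 55, 40],
    'Grade': ['A', 'E', 'B', 'E', 'C', 'D'],
    'Promoted': [True, False, True, False, True, True],
})

# 'in' inside the query string plays the role of isin()
in_ab = df.query('Grade in ["A", "B"]')
print(in_ab['Name'].tolist())   # ['John', 'Bill']

# @min_marks reads the Python variable defined just above
min_marks = 50
passed = df.query('TotalMarks >= @min_marks')
print(passed['Name'].tolist())  # ['John', 'Bill', 'Harry']
```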

Conclusion

In a Pandas DataFrame, you can select rows by position (slicing with [] or iloc[]), by label (loc[]), or by condition (boolean masks, isin(), or query()). Keep in mind that [] and iloc[] slices exclude the stop position, while loc[] label slices include the end label.