Selecting columns from a Pandas DataFrame

You can select columns from a Pandas DataFrame in several ways: with square brackets [] and a column name or a list of column names, with the attribute operator . and the column name, or with the loc and iloc accessors for selection based on labels or integer positions. These methods let you retrieve subsets of a DataFrame during data manipulation and analysis. This article discusses different ways to select columns, which is especially useful when a DataFrame has a large number of columns.

Create a DataFrame with data

import pandas as pd
import numpy as np
df = pd.DataFrame()
df['Name'] = ['John', 'Doe', 'Bill', 'Jim', 'Harry', 'Ben']
df['TotalMarks'] = [82, 38, 63, 22, 55, 40]
df['Grade'] = ['A', 'E', 'B', 'E', 'C', 'D']
df['Promoted'] = [True, False, True, False, True, True]
    Name  TotalMarks Grade  Promoted
0   John          82     A      True
1    Doe          38     E     False
2   Bill          63     B      True
3    Jim          22     E     False
4  Harry          55     C      True
5    Ben          40     D      True

Selecting a single column from a Pandas DataFrame

To select a single column, pass its name as a string inside square brackets. The result is a Pandas Series holding that column's values. Column selection can also be combined with selection filters: boolean indexing filters rows based on a condition, and logical operators combine multiple conditions. Together these let you extract a specific subset of the DataFrame for analysis or manipulation.

df['Name']
0     John
1      Doe
2     Bill
3      Jim
4    Harry
5      Ben
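
The boolean indexing and logical operators mentioned above can be sketched as follows. This is a minimal illustration on the same sample data; the threshold of 50 marks and the variable names passed, passed_and_promoted and names are assumptions chosen for the example, not part of the original article.

```python
import pandas as pd

# Same sample data as in the article, built in one call
df = pd.DataFrame({
    "Name": ["John", "Doe", "Bill", "Jim", "Harry", "Ben"],
    "TotalMarks": [82, 38, 63, 22, 55, 40],
    "Grade": ["A", "E", "B", "E", "C", "D"],
    "Promoted": [True, False, True, False, True, True],
})

# A boolean mask holds one True/False value per row
# (the threshold 50 is an arbitrary value for illustration)
passed = df["TotalMarks"] > 50

# Combine conditions with & (and) or | (or); wrap comparisons in parentheses
passed_and_promoted = (df["TotalMarks"] > 50) & df["Promoted"]

# Keep only the rows where the mask is True, and only the 'Name' column
names = df.loc[passed_and_promoted, "Name"]
print(names.tolist())  # ['John', 'Bill', 'Harry']
```

Passing the mask and the column name to loc in one step avoids chaining two separate selections.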

Selecting multiple columns from a Pandas DataFrame

When you want to select multiple columns from a DataFrame, you can use a list of column names within the selection brackets '[]'. For example, if you have a DataFrame called 'df' and you want to select columns 'column1' and 'column2', you can do it as follows:

selected_columns = df[['column1', 'column2']]

This will create a new DataFrame called 'selected_columns' containing only the data from 'column1' and 'column2' of the original DataFrame 'df'. Using a list of column names allows you to efficiently retrieve specific subsets of data from the DataFrame based on your analysis or processing requirements.

df[['Name','TotalMarks']]
    Name  TotalMarks
0   John          82
1    Doe          38
2   Bill          63
3    Jim          22
4  Harry          55
5    Ben          40

Here the inner square brackets [] define a Python list of column names, whereas the outer brackets [] select the data from the DataFrame.
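
The difference between single brackets, double brackets, and the attribute operator mentioned in the introduction can be checked directly. The sketch below is only an illustration; the variable names as_series, as_frame and as_attr are made up for the example.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["John", "Doe", "Bill", "Jim", "Harry", "Ben"],
    "TotalMarks": [82, 38, 63, 22, 55, 40],
    "Grade": ["A", "E", "B", "E", "C", "D"],
    "Promoted": [True, False, True, False, True, True],
})

as_series = df["Name"]    # single brackets with a string: a Series
as_frame = df[["Name"]]   # outer brackets with a one-item list: a DataFrame
# Attribute access works only when the column name is a valid Python
# identifier and does not clash with an existing DataFrame attribute or method
as_attr = df.Name

print(type(as_series).__name__)    # Series
print(type(as_frame).__name__)     # DataFrame
print(as_frame.shape)              # (6, 1)
print(as_attr.equals(as_series))   # True
```

Because of the naming restrictions on attribute access, the bracket form is the safer choice in general-purpose code.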

To get the dimensionality (rows, columns) of the selected DataFrame, use its shape attribute:

df[['Name','TotalMarks']].shape
(6, 2)

Selecting a range of columns

# select the second and third columns with all rows
df[df.columns[1:3]]
   TotalMarks Grade
0          82     A
1          38     E
2          63     B
3          22     E
4          55     C
5          40     D
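
To see why this works, it may help to look at what df.columns[1:3] evaluates to before it is passed to the brackets. The following sketch assumes the same sample data; the names cols, by_label and by_position are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["John", "Doe", "Bill", "Jim", "Harry", "Ben"],
    "TotalMarks": [82, 38, 63, 22, 55, 40],
    "Grade": ["A", "E", "B", "E", "C", "D"],
    "Promoted": [True, False, True, False, True, True],
})

# Slicing df.columns by position yields the labels at positions 1 and 2
# (the end position 3 is excluded, as with any Python slice)
cols = df.columns[1:3]
print(list(cols))  # ['TotalMarks', 'Grade']

# Those labels are then used like a list of column names
by_label = df[cols]

# The same columns selected purely by position
by_position = df.iloc[:, 1:3]
print(by_label.equals(by_position))  # True
```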

Select two columns with the first three rows

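DataFrame.loc[] is a label-based indexer in Pandas that lets you access a group of rows and columns using labels or boolean arrays. Unlike ordinary Python slices, label slices in loc include both endpoints, so 0:2 selects the rows labeled 0, 1 and 2, and 'Name':'TotalMarks' selects both of those columns.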

df.loc[0:2, 'Name':'TotalMarks']
   Name  TotalMarks
0  John          82
1   Doe          38
2  Bill          63
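
A common source of confusion is that loc slices by label and includes the end label, while iloc slices by position and excludes the end position. The sketch below contrasts the two on the sample data and also shows loc with a boolean array; the variable names are assumptions made for the example.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["John", "Doe", "Bill", "Jim", "Harry", "Ben"],
    "TotalMarks": [82, 38, 63, 22, 55, 40],
    "Grade": ["A", "E", "B", "E", "C", "D"],
    "Promoted": [True, False, True, False, True, True],
})

# Label-based: rows labeled 0, 1 and 2, columns 'Name' through 'TotalMarks'
by_label = df.loc[0:2, "Name":"TotalMarks"]
print(by_label.shape)  # (3, 2)

# Position-based: rows at positions 0 and 1 only, columns at positions 0 and 1
by_position = df.iloc[0:2, 0:2]
print(by_position.shape)  # (2, 2)

# loc also accepts a boolean array for the rows
promoted = df.loc[df["Promoted"], ["Name", "Grade"]]
print(promoted["Name"].tolist())  # ['John', 'Bill', 'Harry', 'Ben']
```

With the default integer index the row labels happen to equal the row positions, which is why the two forms look so similar here.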

Select all columns of the first row

df.loc[0, :]
Name          John
TotalMarks      82
Grade            A
Promoted      True

How to select column values from a Pandas DataFrame

Select all rows with the first three columns

df.iloc[:, 0:3]
    Name  TotalMarks Grade
0   John          82     A
1    Doe          38     E
2   Bill          63     B
3    Jim          22     E
4  Harry          55     C
5    Ben          40     D

Select the first three rows with the first four columns

df.iloc[0:3, 0:4]
   Name  TotalMarks Grade  Promoted
0  John          82     A      True
1   Doe          38     E     False
2  Bill          63     B      True
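
Besides slices, iloc also accepts a list of positions and negative positions, which is handy when the wanted columns are not adjacent. This is a small sketch under the same sample data; the variable names first_and_third, last_col and pos are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["John", "Doe", "Bill", "Jim", "Harry", "Ben"],
    "TotalMarks": [82, 38, 63, 22, 55, 40],
    "Grade": ["A", "E", "B", "E", "C", "D"],
    "Promoted": [True, False, True, False, True, True],
})

# A list of positions selects non-adjacent columns
first_and_third = df.iloc[:, [0, 2]]
print(list(first_and_third.columns))  # ['Name', 'Grade']

# A negative position counts from the end; a single position returns a Series
last_col = df.iloc[:, -1]
print(last_col.name)  # Promoted

# Look up the position of a column from its label
pos = df.columns.get_loc("Grade")
print(pos)  # 2
```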

Conclusion

Selecting columns from a Pandas DataFrame is a common task in data manipulation. You can use the indexing operator [] or the loc and iloc accessors to retrieve one or more columns based on their labels, integer positions, or boolean conditions.