Iterating over rows and columns in Pandas DataFrame

By: Rajesh P.S.

Iterating over rows and columns in a Pandas DataFrame can be done in several ways, but explicit iteration is best avoided wherever possible because it is much slower than the vectorized operations Pandas offers. Prefer the built-in functions and methods, which are optimized for large datasets and run considerably faster.

First, let's create a DataFrame with some values.

    import pandas as pd
    import numpy as np

    df = pd.DataFrame()
    df['Name'] = ['John', 'Doe', 'Bill', 'Jim', 'Harry', 'Ben']
    df['TotalMarks'] = [82, 38, 63, 22, 55, 40]
    df['Grade'] = ['A', 'E', 'B', 'E', 'C', 'D']
    df['Promoted'] = [True, False, True, False, True, True]
    df

        Name  TotalMarks Grade  Promoted
    0   John          82     A      True
    1    Doe          38     E     False
    2   Bill          63     B      True
    3    Jim          22     E     False
    4  Harry          55     C      True
    5    Ben          40     D      True


A simple for loop through Pandas DataFrame using index

    for index in df.index:
        print(df['Name'][index], " , ", df['Promoted'][index])

    John  ,  True
    Doe  ,  False
    Bill  ,  True
    Jim  ,  False
    Harry  ,  True
    Ben  ,  True

Important!!

Iterating through a Pandas DataFrame with traditional loops is generally slow and is not an efficient way to work with data in Pandas. Where possible, use vectorized operations, which process whole columns in optimized compiled code. When no vectorized form is available, a list comprehension over the columns or DataFrame.apply() is usually a better choice than an index-based loop, although both still run Python code for every element or row. The performance gap grows with the size of the dataset.
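As a rough illustration of what "vectorized" means here, the sketch below rebuilds the same sample DataFrame and answers two questions without any explicit loop. The names `promoted_names`, `passed` and the `PASS_MARK` threshold are assumptions made for this example and do not come from the original data.

```python
import pandas as pd

# Same sample data as the DataFrame created earlier in the article.
df = pd.DataFrame({
    "Name": ["John", "Doe", "Bill", "Jim", "Harry", "Ben"],
    "TotalMarks": [82, 38, 63, 22, 55, 40],
    "Grade": ["A", "E", "B", "E", "C", "D"],
    "Promoted": [True, False, True, False, True, True],
})

# A boolean column can be used directly as a row mask: no Python-level loop.
promoted_names = df.loc[df["Promoted"], "Name"].tolist()

# Element-wise comparison applied to the whole column at once.
PASS_MARK = 40  # hypothetical threshold, chosen only for illustration
passed = df["TotalMarks"] >= PASS_MARK

print(promoted_names)
print(passed.tolist())
```

Both results come from single expressions on whole columns, which is the pattern to look for before reaching for a loop.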

Pandas DataFrame loop using list comprehension

    result = [(x, y, z) for x, y, z in zip(df['Name'], df['Promoted'], df['Grade'])]
    result

    [('John', True, 'A'),
     ('Doe', False, 'E'),
     ('Bill', True, 'B'),
     ('Jim', False, 'E'),
     ('Harry', True, 'C'),
     ('Ben', True, 'D')]
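A list comprehension is often used to derive one value per row. The sketch below shows such a comprehension next to a possible vectorized equivalent using `numpy.where`. The labels "promoted" and "held back" are hypothetical and chosen only for the example.

```python
import numpy as np
import pandas as pd

# Two of the article's columns are enough for this illustration.
df = pd.DataFrame({
    "Name": ["John", "Doe", "Bill", "Jim", "Harry", "Ben"],
    "Promoted": [True, False, True, False, True, True],
})

# List comprehension: one Python-level step per value in the column.
status_loop = ["promoted" if p else "held back" for p in df["Promoted"]]

# Vectorized alternative: np.where evaluates the whole column in one call.
status_vec = np.where(df["Promoted"], "promoted", "held back")

print(status_loop)
print(status_vec.tolist())
```

Both approaches produce the same labels; the vectorized form tends to pay off as the number of rows grows.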

Pandas DataFrame loop using DataFrame.apply()

    result = df.apply(
        lambda row: row["Name"] + " , " + str(row["TotalMarks"]) + " , " + row["Grade"],
        axis=1,
    )
    result

    0     John , 82 , A
    1      Doe , 38 , E
    2     Bill , 63 , B
    3      Jim , 22 , E
    4    Harry , 55 , C
    5      Ben , 40 , D
    dtype: object
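With `axis=1`, `apply()` calls the lambda once per row. For simple string building like this, the same result can usually be obtained with column-level operations. The following is a minimal sketch under that assumption; the variable names `via_apply` and `combined` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["John", "Doe", "Bill", "Jim", "Harry", "Ben"],
    "TotalMarks": [82, 38, 63, 22, 55, 40],
    "Grade": ["A", "E", "B", "E", "C", "D"],
})

# Row-wise version: the lambda runs once for every row.
via_apply = df.apply(
    lambda row: row["Name"] + " , " + str(row["TotalMarks"]) + " , " + row["Grade"],
    axis=1,
)

# Column-wise version: convert the numeric column to strings once,
# then concatenate whole columns.
combined = df["Name"] + " , " + df["TotalMarks"].astype(str) + " , " + df["Grade"]

print(combined.tolist())
```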

Other Pandas DataFrame looping methods (DON'T!)

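Using loc[]

    # loc is label-based; range(len(df)) works here only because df
    # has the default 0..n-1 integer index.
    for i in range(len(df)):
        print(df.loc[i, "Name"], ", ", df.loc[i, "Promoted"])

    John ,  True
    Doe ,  False
    Bill ,  True
    Jim ,  False
    Harry ,  True
    Ben ,  True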

Using iloc[]

    # iloc is position-based: columns 0, 1 and 3 are Name, TotalMarks and Promoted.
    for i in range(len(df)):
        print(df.iloc[i, 0], " , ", df.iloc[i, 1], " , ", df.iloc[i, 3])

    John  ,  82  ,  True
    Doe  ,  38  ,  False
    Bill  ,  63  ,  True
    Jim  ,  22  ,  False
    Harry  ,  55  ,  True
    Ben  ,  40  ,  True

Using iterrows()

    # Each row is returned as an (index, Series) pair.
    for index, row in df.iterrows():
        print(row["Name"], " , ", row["TotalMarks"], " , ", row["Grade"])

    John  ,  82  ,  A
    Doe  ,  38  ,  E
    Bill  ,  63  ,  B
    Jim  ,  22  ,  E
    Harry  ,  55  ,  C
    Ben  ,  40  ,  D

Using itertuples()

    # Each row is returned as a namedtuple, so columns are available as attributes.
    for row in df.itertuples(index=True, name='Pandas'):
        print(row.Name, " , ", row.TotalMarks)

    John  ,  82
    Doe  ,  38
    Bill  ,  63
    Jim  ,  22
    Harry  ,  55
    Ben  ,  40
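The two row iterators above differ in one way that is easy to miss: `iterrows()` packs each row into a Series, so values may be upcast to a common dtype, while `itertuples()` keeps each column's own type. The sketch below illustrates this with a hypothetical two-column frame (`scores`, `marks` and `weight` are names invented for the example).

```python
import pandas as pd

# Hypothetical frame with one integer column and one float column.
scores = pd.DataFrame({"marks": [82, 38], "weight": [0.5, 1.5]})

# iterrows(): the row becomes a single Series, so the integer is upcast to float.
_, first_row = next(scores.iterrows())
print(type(first_row["marks"]), first_row["marks"])

# itertuples(): each field keeps the type of its column.
first_tuple = next(scores.itertuples(index=False))
print(type(first_tuple.marks), first_tuple.marks)
```

If a row-wise loop cannot be avoided, this is one reason `itertuples()` is often suggested over `iterrows()`, in addition to it generally being faster.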

Using items() (iteritems() in older versions)

    # iteritems() was deprecated in pandas 1.5 and removed in 2.0;
    # items() is the replacement and yields (column name, Series) pairs.
    for key, value in df.items():
        print(key, value)
        print()

    Name 0     John
    1      Doe
    2     Bill
    3      Jim
    4    Harry
    5      Ben
    Name: Name, dtype: object

    TotalMarks 0    82
    1    38
    2    63
    3    22
    4    55
    5    40
    Name: TotalMarks, dtype: int64

    Grade 0    A
    1    E
    2    B
    3    E
    4    C
    5    D
    Name: Grade, dtype: object

    Promoted 0     True
    1    False
    2     True
    3    False
    4     True
    5     True
    Name: Promoted, dtype: bool

Conclusion

Iterating over rows and columns in a Pandas DataFrame using traditional loops is generally slow and considered an anti-pattern. Prefer vectorized operations, which use Pandas' built-in optimizations, and fall back to list comprehensions or DataFrame.apply() only when no vectorized form exists.
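To see the difference on your own machine, a small timing sketch along the following lines can help. It assumes a hypothetical larger frame (`big`) filled with random marks. The helper functions are illustrative, and the measured times will vary by hardware and pandas version, so no figures are quoted here.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical larger frame so that the difference is measurable.
rng = np.random.default_rng(0)
big = pd.DataFrame({"TotalMarks": rng.integers(0, 101, size=5_000)})


def total_with_iterrows(frame):
    """Sum a column with an explicit row loop (the slow pattern)."""
    total = 0
    for _, row in frame.iterrows():
        total += row["TotalMarks"]
    return total


def total_vectorized(frame):
    """Sum the same column with a single vectorized call."""
    return frame["TotalMarks"].sum()


# Run each version three times and report the total elapsed time.
loop_time = timeit.timeit(lambda: total_with_iterrows(big), number=3)
vec_time = timeit.timeit(lambda: total_vectorized(big), number=3)

print(f"iterrows:   {loop_time:.4f} s")
print(f"vectorized: {vec_time:.4f} s")
```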
