Randomly Shuffle DataFrame Rows in Pandas

You can use the following methods to shuffle DataFrame rows:

  1. Using pandas: pandas.DataFrame.sample()
  2. Using numpy: numpy.random.permutation()
  3. Using sklearn: sklearn.utils.shuffle()

Let's create a sample DataFrame:

import pandas as pd
import numpy as np

df = pd.DataFrame()
df['Name'] = ['John', 'Doe', 'Bill', 'Jim', 'Harry', 'Ben']
df['TotalMarks'] = [82, 38, 63, 22, 55, 40]
df['Grade'] = ['A', 'E', 'B', 'E', 'C', 'D']
df['Promoted'] = [True, False, True, False, True, True]
df
    Name  TotalMarks  Grade  Promoted
0   John          82      A      True
1    Doe          38      E     False
2   Bill          63      B      True
3    Jim          22      E     False
4  Harry          55      C      True
5    Ben          40      D      True

pandas.DataFrame.sample()

Shuffle the rows of a pandas DataFrame with the sample() method. Its frac argument specifies the fraction of rows to return in the random sample:
df.sample(frac=1)

    Name  TotalMarks  Grade  Promoted
3    Jim          22      E     False
0   John          82      A      True
5    Ben          40      D      True
1    Doe          38      E     False
2   Bill          63      B      True
4  Harry          55      C      True
The argument frac=1 returns all rows, in random order. The original index labels stay attached to their rows, as the output shows.
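A minimal sketch of two more things sample() can do: passing random_state (a documented parameter of DataFrame.sample) makes the shuffle reproducible, and a frac below 1 returns only part of the rows. The seed value 42 is arbitrary, and the two-column frame is a trimmed copy of the example above.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["John", "Doe", "Bill", "Jim", "Harry", "Ben"],
    "TotalMarks": [82, 38, 63, 22, 55, 40],
})

# frac=1 returns every row in random order; random_state fixes the seed
shuffled_a = df.sample(frac=1, random_state=42)
shuffled_b = df.sample(frac=1, random_state=42)

# The same seed produces the same row order
print(shuffled_a.index.equals(shuffled_b.index))  # True

# frac=0.5 returns a random half of the rows (3 of 6)
half = df.sample(frac=0.5, random_state=42)
print(len(half))  # 3
```

Leaving random_state unset gives a different order on each call, which is usually what you want outside of tests and tutorials.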

sample() returns a new DataFrame rather than shuffling in place, so to shuffle df and reset the index, assign the result back:

df = df.sample(frac=1).reset_index(drop=True)
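Since pandas 1.3, sample() also accepts ignore_index=True, which relabels the result 0..n-1 and so folds the reset_index(drop=True) step into the call. A small check, assuming pandas 1.3 or later, that both forms give the same frame for the same seed (seed 0 is arbitrary):

```python
import pandas as pd

df = pd.DataFrame({"Name": ["John", "Doe", "Bill", "Jim", "Harry", "Ben"]})

# Two-step version: shuffle, then discard the old index
two_step = df.sample(frac=1, random_state=0).reset_index(drop=True)

# One-step version: ignore_index=True (pandas >= 1.3) relabels rows 0..n-1
one_step = df.sample(frac=1, random_state=0, ignore_index=True)

print(two_step.equals(one_step))  # True
print(list(one_step.index))       # [0, 1, 2, 3, 4, 5]
```

On older pandas versions the ignore_index keyword is not available, so the two-step form is the portable choice.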

numpy.random.permutation()

# iloc selects by position, so permute the positions 0..len(df)-1
tmpDF = df.iloc[np.random.permutation(len(df))].reset_index(drop=True)
tmpDF
    Name  TotalMarks  Grade  Promoted
0    Jim          22      E     False
1   John          82      A      True
2    Ben          40      D      True
3    Doe          38      E     False
4   Bill          63      B      True
5  Harry          55      C      True
Here, specifying drop=True prevents .reset_index from creating a column containing the old index entries.
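np.random.permutation(df.index) permutes index labels, while iloc selects by position, so the two only line up when the index is the default 0..n-1. A sketch that permutes positions instead, using NumPy's Generator API (np.random.default_rng) rather than the legacy np.random functions. The frame with labels 10, 20, 30 is hypothetical, chosen so that passing df.index to iloc would raise an IndexError (label 30 is not a valid position).

```python
import numpy as np
import pandas as pd

# Hypothetical frame whose index labels are NOT 0..n-1
df = pd.DataFrame(
    {"Name": ["John", "Doe", "Bill"]},
    index=[10, 20, 30],
)

rng = np.random.default_rng(seed=0)

# Permute positions 0..n-1, then select rows by position with iloc
positions = rng.permutation(len(df))
shuffled = df.iloc[positions].reset_index(drop=True)
print(shuffled)
```

Permuting positions works for any index, including string or datetime labels, which is why len(df) is the safer argument here.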

sklearn.utils.shuffle()

from sklearn.utils import shuffle

shuffle(df)

    Name  TotalMarks  Grade  Promoted
5    Ben          40      D      True
4  Harry          55      C      True
1    Doe          38      E     False
3    Jim          22      E     False
0   John          82      A      True
2   Bill          63      B      True
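sklearn.utils.shuffle() accepts several arrays at once and applies the same permutation to all of them, which helps keep features and labels aligned. A sketch, assuming scikit-learn is installed; splitting the example frame into X and y this way is only for illustration, and seed 0 is arbitrary.

```python
import pandas as pd
from sklearn.utils import shuffle

df = pd.DataFrame({
    "Name": ["John", "Doe", "Bill", "Jim", "Harry", "Ben"],
    "TotalMarks": [82, 38, 63, 22, 55, 40],
    "Promoted": [True, False, True, False, True, True],
})

X = df[["Name", "TotalMarks"]]
y = df["Promoted"]

# Both inputs are shuffled with the same permutation, so rows stay aligned
X_shuf, y_shuf = shuffle(X, y, random_state=0)

# Index labels still match pairwise after shuffling
print((X_shuf.index == y_shuf.index).all())  # True
```

Like sample(), shuffle() keeps the original index labels, so chain .reset_index(drop=True) if you need a fresh 0..n-1 index.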