Drop Rows with NaN Values in Pandas DataFrame

NaN stands for "Not a Number," and Pandas treats NaN and None values as interchangeable representations of missing or null values. The presence of missing values can be a significant challenge in data analysis. The dropna() method in Pandas provides a way to identify and remove rows or columns containing NaN values from a DataFrame using various strategies.

dropna(axis=0, how='any', thresh=None, subset=None, inplace=False)

First, let's create a DataFrame that contains some missing values.

import pandas as pd
import numpy as np

# Build the DataFrame column by column; np.nan marks a missing value
df = pd.DataFrame()
df['Name'] = ['John', 'Doe', 'Bill', np.nan, 'Harry', 'Ben']
df['TotalMarks'] = [82, np.nan, 63, np.nan, 55, 40]
df['Grade'] = ['A', 'E', 'B', np.nan, 'C', 'D']
df['Promoted'] = [np.nan, np.nan, np.nan, np.nan, np.nan, 'True']
df  # in a notebook or REPL, this displays the DataFrame

    Name  TotalMarks Grade Promoted
0   John        82.0     A      NaN
1    Doe         NaN     E      NaN
2   Bill        63.0     B      NaN
3    NaN         NaN   NaN      NaN
4  Harry        55.0     C      NaN
5    Ben        40.0     D     True

How to check if any value is NaN in a Pandas DataFrame

To check if any value is NaN in a Pandas DataFrame, you can use the isnull() method followed by the sum() method. The isnull() method creates a DataFrame of the same shape as the original one, where each element is a boolean value indicating if it is NaN or not. The sum() method then counts the number of True values in each column, effectively giving you the count of NaN values in each column of the DataFrame.

df.isnull().sum()
Name          1
TotalMarks    2
Grade         1
Promoted      5
dtype: int64

The output above shows how many null values are in each column of the DataFrame.
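
To answer the yes/no question in the heading directly, the per-column counts can be reduced further. The sketch below rebuilds the same example DataFrame: .values.any() collapses the boolean frame to a single flag, .sum().sum() gives the total number of missing cells, and sum(axis=1) counts missing values per row instead of per column.

```python
import numpy as np
import pandas as pd

# Same example data as above
df = pd.DataFrame({
    'Name': ['John', 'Doe', 'Bill', np.nan, 'Harry', 'Ben'],
    'TotalMarks': [82, np.nan, 63, np.nan, 55, 40],
    'Grade': ['A', 'E', 'B', np.nan, 'C', 'D'],
    'Promoted': [np.nan, np.nan, np.nan, np.nan, np.nan, 'True'],
})

# True if at least one cell anywhere in the DataFrame is missing
has_nan = bool(df.isnull().values.any())

# Total number of missing cells across all columns
total_nan = int(df.isnull().sum().sum())

# Number of missing values in each row (sum across the columns)
nan_per_row = df.isnull().sum(axis=1)

print(has_nan)                # True
print(total_nan)              # 9
print(nan_per_row.tolist())   # [1, 2, 1, 4, 1, 0]
```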

How to drop all rows that have at least one NaN value

To drop all rows that have at least one NaN value in a Pandas DataFrame, you can use the dropna() method without any arguments. By default, the dropna() method will remove any row that contains at least one NaN value, effectively dropping all rows with missing values from the DataFrame.

df.dropna()
  Name  TotalMarks Grade Promoted
5  Ben        40.0     D     True

Only one row is returned, because every other row contains at least one NaN value.
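
Note that dropna() returns a new DataFrame and leaves the original untouched unless inplace=True is passed (the last parameter in the signature at the top). A short sketch of the difference, using the same example data: with inplace=True the method modifies df itself and returns None.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Doe', 'Bill', np.nan, 'Harry', 'Ben'],
    'TotalMarks': [82, np.nan, 63, np.nan, 55, 40],
    'Grade': ['A', 'E', 'B', np.nan, 'C', 'D'],
    'Promoted': [np.nan, np.nan, np.nan, np.nan, np.nan, 'True'],
})

# Default: a new DataFrame is returned and df still has all 6 rows
cleaned = df.dropna()
print(len(cleaned), len(df))   # 1 6

# In place: df itself is modified and the call returns None
result = df.dropna(inplace=True)
print(result, len(df))         # None 1
```

Reassigning the result (df = df.dropna()) is an equally valid way to keep only the cleaned data.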

The axis parameter in the dropna() function is used to specify whether you want to drop rows or columns with NaN values. By default, axis=0, which means the function will drop rows with NaN values. If you want to drop columns with NaN values, you can set axis=1.

df.dropna(axis=1)
Empty DataFrame
Columns: []
Index: [0, 1, 2, 3, 4, 5]

The result has no columns, because every column contains at least one NaN value. The rows themselves are kept, which is why the index still lists 0 to 5.
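
Dropping every column that has a single NaN is often too aggressive. One option, sketched below, is to combine axis=1 with thresh, which keeps a column only if it has at least that many non-NaN values. The threshold of 4 is an arbitrary choice for this example; with it, only 'Promoted' (a single non-NaN value) is removed.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Doe', 'Bill', np.nan, 'Harry', 'Ben'],
    'TotalMarks': [82, np.nan, 63, np.nan, 55, 40],
    'Grade': ['A', 'E', 'B', np.nan, 'C', 'D'],
    'Promoted': [np.nan, np.nan, np.nan, np.nan, np.nan, 'True'],
})

# Non-NaN values per column, for comparison
non_nan_per_col = df.notna().sum()
print(non_nan_per_col.tolist())   # [5, 4, 5, 1]

# Keep only columns with at least 4 non-NaN values
kept = df.dropna(axis=1, thresh=4)
print(kept.columns.tolist())      # ['Name', 'TotalMarks', 'Grade']
```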

How to drop rows where a particular column is NaN?

The dropna() method lets you restrict the NaN check to particular labels with the subset parameter. When dropping rows (the default), subset lists the columns to examine, so a row is dropped only if it has a NaN in one of those columns; NaNs in other columns are ignored. When dropping columns with axis=1, subset lists row labels instead. This is helpful when only certain columns matter for your analysis, rather than the entire DataFrame.

df.dropna(subset=['TotalMarks'])
    Name  TotalMarks Grade Promoted
0   John        82.0     A      NaN
2   Bill        63.0     B      NaN
4  Harry        55.0     C      NaN
5    Ben        40.0     D     True

In the output above, rows 1 and 3 (the second and fourth rows) are missing because their 'TotalMarks' value is NaN.
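
subset also accepts several columns, and it combines with the how parameter. A sketch with the same data: with the default how='any', a row is dropped if any listed column is NaN; with how='all', it is dropped only if every listed column is NaN.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Doe', 'Bill', np.nan, 'Harry', 'Ben'],
    'TotalMarks': [82, np.nan, 63, np.nan, 55, 40],
    'Grade': ['A', 'E', 'B', np.nan, 'C', 'D'],
    'Promoted': [np.nan, np.nan, np.nan, np.nan, np.nan, 'True'],
})

# Drop rows where TotalMarks OR Grade is NaN
any_missing = df.dropna(subset=['TotalMarks', 'Grade'])
print(any_missing.index.tolist())    # [0, 2, 4, 5]

# Drop rows only where Name AND Grade are both NaN
both_missing = df.dropna(subset=['Name', 'Grade'], how='all')
print(both_missing.index.tolist())   # [0, 1, 2, 4, 5]
```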

To find the rows in which a particular column is NaN, filter with a boolean mask built from isnull():

df[df['TotalMarks'].isnull()]
  Name  TotalMarks Grade Promoted
1  Doe         NaN     E      NaN
3  NaN         NaN   NaN      NaN

Two rows are returned, which means the 'TotalMarks' column contains two NaN values.
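
The same mask can be inverted. In the sketch below, notnull() selects the rows where 'TotalMarks' is present, which gives the same result as df.dropna(subset=['TotalMarks']), and applying the isnull() mask to df.index yields just the labels of the missing rows.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Doe', 'Bill', np.nan, 'Harry', 'Ben'],
    'TotalMarks': [82, np.nan, 63, np.nan, 55, 40],
    'Grade': ['A', 'E', 'B', np.nan, 'C', 'D'],
    'Promoted': [np.nan, np.nan, np.nan, np.nan, np.nan, 'True'],
})

# Labels of the rows where TotalMarks is missing
missing_labels = df.index[df['TotalMarks'].isnull()].tolist()
print(missing_labels)   # [1, 3]

# Keep only the rows where TotalMarks is present
present = df[df['TotalMarks'].notnull()]
print(present.equals(df.dropna(subset=['TotalMarks'])))   # True
```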

How to drop rows only if ALL columns are NaN

The dropna() function in Pandas allows you to drop rows (axis=0) or columns (axis=1) based on NaN values. By using the how parameter with the value 'all', you can specify that you want to drop rows only if all columns in that row have NaN values. This means that if there is at least one non-NaN value in any column of a row, that row will not be dropped.

df.dropna(how='all')
    Name  TotalMarks Grade Promoted
0   John        82.0     A      NaN
1    Doe         NaN     E      NaN
2   Bill        63.0     B      NaN
4  Harry        55.0     C      NaN
5    Ben        40.0     D     True

Only row 3 (the fourth row) is dropped, because every one of its values is NaN.
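
how='all' works the same way along columns. No column in the example data is entirely NaN, so the sketch below adds a hypothetical all-NaN 'Remarks' column (not part of the original data) to show that df.dropna(axis=1, how='all') removes it and keeps the partially filled columns.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Doe', 'Bill', np.nan, 'Harry', 'Ben'],
    'TotalMarks': [82, np.nan, 63, np.nan, 55, 40],
    'Grade': ['A', 'E', 'B', np.nan, 'C', 'D'],
    'Promoted': [np.nan, np.nan, np.nan, np.nan, np.nan, 'True'],
})

# Hypothetical empty column, added only for this illustration
df['Remarks'] = np.nan

# Drop only the columns in which every value is NaN
trimmed = df.dropna(axis=1, how='all')
print(trimmed.columns.tolist())   # ['Name', 'TotalMarks', 'Grade', 'Promoted']
```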

How to drop rows that do not have at least two non-NaN values

df.dropna(thresh=2)
    Name  TotalMarks Grade Promoted
0   John        82.0     A      NaN
1    Doe         NaN     E      NaN
2   Bill        63.0     B      NaN
4  Harry        55.0     C      NaN
5    Ben        40.0     D     True

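Here too, only row 3 (the fourth row) is missing. The thresh parameter sets the minimum number of non-NaN values a row must have to be kept. Row 3 has no non-NaN values at all, while every other row has at least two (row 1, for example, has 'Doe' and 'E').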
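
Because thresh counts non-NaN values per row, its effect can be checked by hand. In the sketch below, df.notna().sum(axis=1) gives each row's count of non-missing values, and keeping the rows whose count is at least k selects the same rows as dropna(thresh=k). Raising the threshold to 3 also drops row 1 ('Doe'), which has only two non-NaN values.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Doe', 'Bill', np.nan, 'Harry', 'Ben'],
    'TotalMarks': [82, np.nan, 63, np.nan, 55, 40],
    'Grade': ['A', 'E', 'B', np.nan, 'C', 'D'],
    'Promoted': [np.nan, np.nan, np.nan, np.nan, np.nan, 'True'],
})

# Number of non-NaN values in each row
non_nan = df.notna().sum(axis=1)
print(non_nan.tolist())          # [3, 2, 3, 0, 3, 4]

# thresh=3: keep rows with at least 3 non-NaN values
stricter = df.dropna(thresh=3)
print(stricter.index.tolist())   # [0, 2, 4, 5]

# The same rows, selected with an explicit boolean mask
manual = df[non_nan >= 3]
print(stricter.equals(manual))   # True
```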

Conclusion

In a Pandas DataFrame, you can use the dropna() method to remove rows containing NaN values. By default, it drops any row that has at least one NaN value. You can customize this behavior with the axis parameter (drop columns instead of rows), subset (which columns to check for NaN values), how (drop a row only if all checked values are NaN) and thresh (the minimum number of non-NaN values a row needs in order to be kept).