Drop Rows with NaN Values in Pandas DataFrame
NaN stands for Not a Number. Pandas treats None and NaN as essentially interchangeable for representing missing or null values. Missing values are a common problem in real-world data, and in some cases you have to find and remove them from a DataFrame. The pandas dropna() method lets you delete rows or columns that contain NaN values in several different ways.
dropna(axis=0, how='any', thresh=None, subset=None, inplace=False)
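As a minimal sketch of two points above (None and NaN both count as missing, and inplace defaults to False), the snippet below uses a small hypothetical frame named demo that is not part of the original example:

```python
import numpy as np
import pandas as pd

# Hypothetical frame: one missing value written as None, one as np.nan.
demo = pd.DataFrame({'x': [1.0, None, 3.0], 'y': ['a', 'b', np.nan]})

# isnull() flags both spellings of "missing" in the same way.
missing_mask = demo.isnull()

# With the default inplace=False, dropna() returns a new DataFrame
# and leaves the original untouched.
cleaned = demo.dropna()
```

Here cleaned holds only the first row, while demo still has all three rows.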
First let's create a data frame with values.
import pandas as pd
import numpy as np
df = pd.DataFrame()
df['Name'] = ['John', 'Doe', 'Bill', np.nan, 'Harry', 'Ben']
df['TotalMarks'] = [82, np.nan, 63, np.nan, 55, 40]
df['Grade'] = ['A', 'E', 'B', np.nan, 'C', 'D']
df['Promoted'] = [np.nan, np.nan, np.nan, np.nan, np.nan, 'True']
df
Name TotalMarks Grade Promoted
0 John 82.0 A NaN
1 Doe NaN E NaN
2 Bill 63.0 B NaN
3 NaN NaN NaN NaN
4 Harry 55.0 C NaN
5 Ben 40.0 D True
How to check if any value is NaN in a Pandas DataFrame
df.isnull().sum()
Name 1
TotalMarks 2
Grade 1
Promoted 5
The output above shows how many null values each column of the DataFrame contains.
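If you only need a yes/no answer or a single total rather than per-column counts, something like the following may help. It rebuilds the same example DataFrame so that it runs on its own; the variable names has_nan, total_nan and cols_with_nan are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Doe', 'Bill', np.nan, 'Harry', 'Ben'],
    'TotalMarks': [82, np.nan, 63, np.nan, 55, 40],
    'Grade': ['A', 'E', 'B', np.nan, 'C', 'D'],
    'Promoted': [np.nan, np.nan, np.nan, np.nan, np.nan, 'True'],
})

# True if at least one cell anywhere in the DataFrame is missing.
has_nan = bool(df.isnull().values.any())

# Total number of missing cells across all columns.
total_nan = int(df.isnull().sum().sum())

# Names of the columns that contain at least one missing value.
cols_with_nan = df.columns[df.isnull().any()].tolist()
```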
How to drop all rows that have at least one NaN value
df.dropna()
Name TotalMarks Grade Promoted
5 Ben 40.0 D True
The output contains only one row because every other row has at least one NaN value.
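Note that the surviving row keeps its original index label (5). If you would rather have a fresh 0-based index after dropping, one possible approach is to chain reset_index(drop=True), as sketched here on the same example data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Doe', 'Bill', np.nan, 'Harry', 'Ben'],
    'TotalMarks': [82, np.nan, 63, np.nan, 55, 40],
    'Grade': ['A', 'E', 'B', np.nan, 'C', 'D'],
    'Promoted': [np.nan, np.nan, np.nan, np.nan, np.nan, 'True'],
})

# Drop every row containing a NaN, then renumber the remaining rows from 0.
# drop=True discards the old index instead of adding it as a column.
cleaned = df.dropna().reset_index(drop=True)
```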
The axis parameter tells the dropna() function whether you want to drop rows (axis=0) or drop columns (axis=1).
df.dropna(axis=1)
Empty DataFrame
Columns: []
Index: [0, 1, 2, 3, 4, 5]
The output contains no columns because every column has at least one NaN value. The row index is kept.
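Dropping every column that has any NaN is often too aggressive. As a sketch, the how and thresh parameters from the signature can be combined with axis=1 to be more selective (they are used in separate calls here, since recent pandas versions do not accept both at once):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Doe', 'Bill', np.nan, 'Harry', 'Ben'],
    'TotalMarks': [82, np.nan, 63, np.nan, 55, 40],
    'Grade': ['A', 'E', 'B', np.nan, 'C', 'D'],
    'Promoted': [np.nan, np.nan, np.nan, np.nan, np.nan, 'True'],
})

# Drop only columns that are entirely NaN; none are, so all four remain.
no_empty_cols = df.dropna(axis=1, how='all')

# Keep only columns with at least five non-NaN values.
well_filled_cols = df.dropna(axis=1, thresh=5)
```

With this data, well_filled_cols keeps 'Name' and 'Grade', which each have five non-NaN values.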
How to drop a row whose particular column is NaN?
You can use dropna() with the subset parameter to specify which columns to check for NaN:
df.dropna(subset=['TotalMarks'])
Name TotalMarks Grade Promoted
0 John 82.0 A NaN
2 Bill 63.0 B NaN
4 Harry 55.0 C NaN
5 Ben 40.0 D True
In the output above, the second and fourth rows (index 1 and 3) are missing because their 'TotalMarks' values are NaN.
To find the rows where a particular column has NaN values:
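The subset parameter also accepts more than one column. In the sketch below (same example data, illustrative variable names), a row is dropped if any of the listed columns is NaN:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Doe', 'Bill', np.nan, 'Harry', 'Ben'],
    'TotalMarks': [82, np.nan, 63, np.nan, 55, 40],
    'Grade': ['A', 'E', 'B', np.nan, 'C', 'D'],
    'Promoted': [np.nan, np.nan, np.nan, np.nan, np.nan, 'True'],
})

# Drop rows where 'Name' or 'Grade' is NaN; only the row at index 3 qualifies.
named_and_graded = df.dropna(subset=['Name', 'Grade'])

# Drop rows where 'TotalMarks' or 'Promoted' is NaN; only index 5 survives.
marked_and_promoted = df.dropna(subset=['TotalMarks', 'Promoted'])
```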
df[df['TotalMarks'].isnull()]
Name TotalMarks Grade Promoted
1 Doe NaN E NaN
3 NaN NaN NaN NaN
The output contains two rows, which means the 'TotalMarks' column has two NaN values.
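The same kind of boolean mask can be inverted to keep the rows instead of listing them. As a rough illustration on the same data, filtering with notnull() gives the same result as dropna() with subset:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Doe', 'Bill', np.nan, 'Harry', 'Ben'],
    'TotalMarks': [82, np.nan, 63, np.nan, 55, 40],
    'Grade': ['A', 'E', 'B', np.nan, 'C', 'D'],
    'Promoted': [np.nan, np.nan, np.nan, np.nan, np.nan, 'True'],
})

# Keep only the rows whose 'TotalMarks' value is present.
kept_by_mask = df[df['TotalMarks'].notnull()]

# The equivalent call using dropna().
kept_by_dropna = df.dropna(subset=['TotalMarks'])

same_result = kept_by_mask.equals(kept_by_dropna)
```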
How to drop rows only if ALL columns are NaN
df.dropna(how='all')
Name TotalMarks Grade Promoted
0 John 82.0 A NaN
1 Doe NaN E NaN
2 Bill 63.0 B NaN
4 Harry 55.0 C NaN
5 Ben 40.0 D True
Here the fourth row (index 3) is missing because every column in that row is NaN.
How to drop a row if it does not have at least two values that are not NaN
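To see which rows how='all' will remove before dropping them, one option is to build the mask yourself. This sketch uses the same example data; all_missing is just an illustrative name:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Doe', 'Bill', np.nan, 'Harry', 'Ben'],
    'TotalMarks': [82, np.nan, 63, np.nan, 55, 40],
    'Grade': ['A', 'E', 'B', np.nan, 'C', 'D'],
    'Promoted': [np.nan, np.nan, np.nan, np.nan, np.nan, 'True'],
})

# True for rows in which every column is NaN.
all_missing = df.isnull().all(axis=1)

# Index labels of the rows that dropna(how='all') would remove.
rows_to_drop = df.index[all_missing].tolist()

# Filtering with the inverted mask matches dropna(how='all').
same_result = df[~all_missing].equals(df.dropna(how='all'))
```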
df.dropna(thresh=2)
Name TotalMarks Grade Promoted
0 John 82.0 A NaN
1 Doe NaN E NaN
2 Bill 63.0 B NaN
4 Harry 55.0 C NaN
5 Ben 40.0 D True
Here, too, the fourth row (index 3) is missing: it has no non-NaN values at all, which is fewer than the two that thresh=2 requires.
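Since thresh counts non-NaN values, it can help to look at those counts directly. The sketch below (same example data) computes them per row and shows how a stricter threshold changes the result:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Doe', 'Bill', np.nan, 'Harry', 'Ben'],
    'TotalMarks': [82, np.nan, 63, np.nan, 55, 40],
    'Grade': ['A', 'E', 'B', np.nan, 'C', 'D'],
    'Promoted': [np.nan, np.nan, np.nan, np.nan, np.nan, 'True'],
})

# Number of non-NaN values in each row; thresh is compared against this.
non_null_counts = df.notnull().sum(axis=1).tolist()

# Rows need at least two non-NaN values to survive.
at_least_two = df.dropna(thresh=2)

# A stricter threshold also removes the row at index 1 (only two values).
at_least_three = df.dropna(thresh=3)
```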