Check for NaN Values: Pandas DataFrame

In Pandas, a DataFrame is a two-dimensional tabular data structure for storing and manipulating data. Checking for NaN (Not a Number) values is an essential step in data analysis and data cleaning, because missing data can skew aggregates, break computations, and undermine the validity of your results.

Pandas provides two main methods for checking NaN values in a DataFrame: isnull() and isna(). Both return a boolean DataFrame of the same shape as the input, where True marks a missing value and False marks a value that is present. The same methods are available on a single column (a Series).

Check for NaN in a single column

df[ColumnName].isnull().values.any()

Count the NaN values in a single column

df[ColumnName].isnull().values.sum()
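
As a minimal sketch, the two single-column expressions above can be tried on the small sample DataFrame used later in this article. The variable column_name is a hypothetical stand-in for ColumnName, and 'Age' is chosen only for illustration.

```python
import pandas as pd

# Small sample DataFrame; 'Age' contains one missing value.
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', None],
    'Age': [25, 30, None, 22],
    'Salary': [50000, None, 60000, 45000],
})

column_name = 'Age'  # hypothetical stand-in for ColumnName

# True if at least one value in the column is missing
has_nan = bool(df[column_name].isnull().values.any())

# Number of missing values in the column
nan_count = int(df[column_name].isnull().values.sum())

print(has_nan, nan_count)
```

The bool() and int() wrappers only convert NumPy scalars to plain Python values for printing; the checks work without them.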

isnull() method

The isnull() method detects missing values in a DataFrame. It returns a boolean DataFrame of the same shape, where an element is True if the value is missing (NaN, None, or NaT) and False otherwise.

import pandas as pd

# Create a sample DataFrame with some NaN values
data = {
    'Name': ['Alice', 'Bob', 'Charlie', None],
    'Age': [25, 30, None, 22],
    'Salary': [50000, None, 60000, 45000]
}
df = pd.DataFrame(data)

# Check for NaN values using isnull()
result_isnull = df.isnull()
print(result_isnull)

Output:

    Name    Age  Salary
0  False  False   False
1  False  False    True
2  False   True   False
3   True  False   False

In this example, result_isnull is a DataFrame with the same shape as the original DataFrame df. It indicates that the first row has no NaN values, the second row has a NaN value in the 'Salary' column, the third row has a NaN value in the 'Age' column, and the fourth row has a NaN value in the 'Name' column.
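
The example above mixes None with numeric data, so it may help to see which values isnull() counts as missing. The sketch below uses a hypothetical Series that is not part of the original example: None, float NaN, and NaT are flagged, while an empty string and 0 are treated as ordinary values.

```python
import numpy as np
import pandas as pd

# Illustrative values: three missing-value markers plus two ordinary values
s = pd.Series([np.nan, None, pd.NaT, '', 0], dtype=object)

# Boolean mask: True where pandas considers the value missing
mask = s.isnull()
print(mask.tolist())  # [True, True, True, False, False]
```

If empty strings should count as missing in your data, they need to be converted to NaN first.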

Check for NaN in the entire DataFrame

df.isnull().values.any()

Count the NaN values in the entire DataFrame

df.isnull().sum().sum()
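
A short sketch of the two whole-DataFrame expressions above, run against the article's sample data. The first sum() counts missing values per column, and the second adds those counts into a single total. The variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', None],
    'Age': [25, 30, None, 22],
    'Salary': [50000, None, 60000, 45000],
})

# True if any cell in the DataFrame is missing
any_nan = bool(df.isnull().values.any())

# Missing values per column (a Series indexed by column name)
per_column = df.isnull().sum()

# Total number of missing cells
total_nan = int(per_column.sum())

print(per_column)
print(any_nan, total_nan)
```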

isna() method

The isna() method does exactly the same job. In the pandas documentation, isnull() is described as an alias for isna(), so the two are entirely interchangeable and detect the same missing values.

import pandas as pd

# Create a sample DataFrame with some NaN values
data = {
    'Name': ['Alice', 'Bob', 'Charlie', None],
    'Age': [25, 30, None, 22],
    'Salary': [50000, None, 60000, 45000]
}
df = pd.DataFrame(data)

# Check for NaN values using isna() (equivalent to isnull())
result_isna = df.isna()
print(result_isna)

Output:

    Name    Age  Salary
0  False  False   False
1  False  False    True
2  False   True   False
3   True  False   False

As you can see, the result_isna DataFrame is identical to the previous result_isnull DataFrame, confirming that both methods produce the same output.
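
Rather than comparing the printed output by eye, the equivalence can be checked programmatically. This is a small sketch using DataFrame.equals() on the same sample data; the variable name same is illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', None],
    'Age': [25, 30, None, 22],
    'Salary': [50000, None, 60000, 45000],
})

# equals() returns True when both frames have the same shape and elements
same = df.isna().equals(df.isnull())
print(same)  # True
```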

Which rows have NaNs in a specific column

df[df[ColumnName].isnull()]

Which rows have NaN values

df[df.isnull().any(axis=1)]

How many rows there are with "one or more NaNs"

df.isnull().any(axis=1).sum()
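
The three row-level checks above can be sketched together on the sample data. The column 'Age' and the variable names are chosen only for illustration, and axis=1 is written as a keyword argument here.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', None],
    'Age': [25, 30, None, 22],
    'Salary': [50000, None, 60000, 45000],
})

# Rows where one specific column is missing
rows_missing_age = df[df['Age'].isnull()]

# Rows where at least one column is missing (axis=1 checks across columns)
rows_with_any_nan = df[df.isnull().any(axis=1)]

# Number of rows with one or more missing values
n_rows_with_nan = int(df.isnull().any(axis=1).sum())

print(rows_missing_age.index.tolist())   # [2]
print(rows_with_any_nan.index.tolist())  # [1, 2, 3]
print(n_rows_with_nan)                   # 3
```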

Display the columns that have nulls

df.loc[:, df.isnull().any()].columns

Check the percentage of nulls in every column

df.isna().sum()/(len(df))*100
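
As a rough sketch of the two column-level summaries above, the sample data is extended here with a hypothetical 'City' column that has no missing values, so the difference between complete and incomplete columns is visible.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', None],
    'Age': [25, 30, None, 22],
    'Salary': [50000, None, 60000, 45000],
    'City': ['Oslo', 'Lima', 'Cairo', 'Pune'],  # hypothetical complete column
})

# Names of the columns that contain at least one missing value
cols_with_nan = df.loc[:, df.isnull().any()].columns.tolist()

# Percentage of missing values in every column
pct_missing = df.isna().sum() / len(df) * 100

print(cols_with_nan)  # ['Name', 'Age', 'Salary']
print(pct_missing)
```

Each of the first three columns has one missing value out of four rows, so they report 25.0, while 'City' reports 0.0.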

Conclusion

Pandas provides the isnull() and isna() methods to efficiently detect NaN values in a DataFrame, and you can use them interchangeably. By understanding and utilizing these methods, you can identify and handle missing data effectively in your data analysis workflows.