How to calculate Inter-Quartile Range (IQR)
The Inter-Quartile Range (IQR) measures the spread of the middle 50% of a dataset. It is the difference between the third quartile Q3 (the 75th percentile, or 0.75 quantile) and the first quartile Q1 (the 25th percentile, or 0.25 quantile) of a dataset. It can also be used to detect outliers in the data.
IQR = Q3 − Q1
Interquartile Range of a single array
import numpy as np
#define data
data = np.array([18, 22, 32, 38, 41, 46, 53, 58, 67, 71, 78, 84, 91, 98])
# find the third quartile (Q3) and the first quartile (Q1)
q3, q1 = np.percentile(data, [75, 25])
#calculate the interquartile range
iqr = q3 - q1
print("Interquartile Range : " , iqr)
Interquartile Range : 37.5
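To see where Q3 = 76.25 and Q1 = 38.75 come from, here is a minimal sketch of the linear interpolation that np.percentile applies by default. The helper linear_percentile is hypothetical and written only for illustration; it places the p-th percentile at position (n − 1) × p / 100 in the sorted data and interpolates between the two neighbouring values.

```python
import numpy as np

def linear_percentile(values, p):
    """Hypothetical helper: p-th percentile by linear interpolation between
    the two closest ranks (meant to mirror np.percentile's default)."""
    x = np.sort(np.asarray(values, dtype=float))
    # Fractional 0-based position of the p-th percentile in the sorted data
    pos = (len(x) - 1) * p / 100
    lo = int(np.floor(pos))
    hi = min(lo + 1, len(x) - 1)
    frac = pos - lo
    # Move `frac` of the way from x[lo] towards x[hi]
    return x[lo] + frac * (x[hi] - x[lo])

data = np.array([18, 22, 32, 38, 41, 46, 53, 58, 67, 71, 78, 84, 91, 98])
q1 = linear_percentile(data, 25)  # position 3.25 -> 38 + 0.25 * (41 - 38)
q3 = linear_percentile(data, 75)  # position 9.75 -> 71 + 0.75 * (78 - 71)
print(q1, q3, q3 - q1)  # 38.75 76.25 37.5
```

Other percentile definitions (for example, nearest-rank methods) can give slightly different quartiles on small datasets, so it is worth knowing which one a library uses.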
Interquartile Range of a single column in a DataFrame
import pandas as pd
import numpy as np
df = pd.DataFrame([[32, 24, 30, 40], [17, 24, 21, 28], [50, 25, 28, 32],
                   [25, 34, 21, 48], [17, 31, 18, 28], [35, 24, 19, 42]],
                  columns=['Physics', 'Chemistry', 'Biology', 'Maths'],
                  index=['Student-1', 'Student-2', 'Student-3', 'Student-4',
                         'Student-5', 'Student-6'])
print(df)
# find the third quartile (Q3) and the first quartile (Q1) of one column
q3, q1 = np.percentile(df['Chemistry'], [75, 25])
# calculate the interquartile range
iqr = q3 - q1
print("Interquartile Range : ", iqr)
           Physics  Chemistry  Biology  Maths
Student-1       32         24       30     40
Student-2       17         24       21     28
Student-3       50         25       28     32
Student-4       25         34       21     48
Student-5       17         31       18     28
Student-6       35         24       19     42
Interquartile Range : 5.5
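If you are already working in pandas, a Series can compute its own quartiles with Series.quantile, which interpolates linearly by default, just like np.percentile. This sketch rebuilds only the Chemistry marks from the DataFrame above so that it runs on its own.

```python
import pandas as pd

# Same Chemistry marks as in the DataFrame above
chemistry = pd.Series([24, 24, 25, 34, 31, 24], name="Chemistry")

# Series.quantile takes a fraction (0.25, 0.75) rather than a percentage
q1 = chemistry.quantile(0.25)
q3 = chemistry.quantile(0.75)
print("Interquartile Range :", q3 - q1)  # Interquartile Range : 5.5
```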
Interquartile Range of multiple columns in a DataFrame
To find the IQR of several columns in a DataFrame, one approach is to define a function that calculates the IQR of a single column and then apply it to each selected column with DataFrame.apply().
# define a function that returns Q3 - Q1 for a single column
def single_iqr(x):
    return np.subtract(*np.percentile(x, [75, 25]))
#calculate IQR for 'Physics' and 'Chemistry' columns
df[['Physics', 'Chemistry']].apply(single_iqr)
Physics 15.25
Chemistry 5.50
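As an alternative to a custom function, DataFrame.quantile computes a quantile for every selected column at once (again with linear interpolation by default), so the IQR becomes a subtraction of two Series. The sketch rebuilds the example DataFrame so it runs standalone; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame([[32, 24, 30, 40], [17, 24, 21, 28], [50, 25, 28, 32],
                   [25, 34, 21, 48], [17, 31, 18, 28], [35, 24, 19, 42]],
                  columns=['Physics', 'Chemistry', 'Biology', 'Maths'],
                  index=['Student-1', 'Student-2', 'Student-3', 'Student-4',
                         'Student-5', 'Student-6'])

cols = df[['Physics', 'Chemistry']]
# Each quantile call returns one value per column; subtracting aligns by column name
iqr_per_column = cols.quantile(0.75) - cols.quantile(0.25)
print(iqr_per_column)
```

This avoids calling a Python function once per column, which can matter for wide DataFrames.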
To find the IQR of every column in the DataFrame, apply the same function to the whole DataFrame:
# calculate IQR for all columns
df.apply(single_iqr)
Physics 15.25
Chemistry 5.50
Biology 6.75
Maths 12.50
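The IQR is often used to flag outliers. One widely used convention (Tukey's fences) treats a value as a potential outlier if it lies below Q1 − 1.5 × IQR or above Q3 + 1.5 × IQR. The sketch below applies that rule column by column; the helper iqr_fences and its factor k=1.5 are assumptions chosen for illustration, not part of the method shown above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[32, 24, 30, 40], [17, 24, 21, 28], [50, 25, 28, 32],
                   [25, 34, 21, 48], [17, 31, 18, 28], [35, 24, 19, 42]],
                  columns=['Physics', 'Chemistry', 'Biology', 'Maths'],
                  index=['Student-1', 'Student-2', 'Student-3', 'Student-4',
                         'Student-5', 'Student-6'])

def iqr_fences(x, k=1.5):
    """Hypothetical helper: return the fences Q1 - k*IQR and Q3 + k*IQR."""
    q3, q1 = np.percentile(x, [75, 25])
    spread = q3 - q1
    return pd.Series({"lower": q1 - k * spread, "upper": q3 + k * spread})

# One column of fences per subject, with rows "lower" and "upper"
fences = df.apply(iqr_fences)
# Comparing a DataFrame with a Series aligns on column names
is_outlier = (df < fences.loc["lower"]) | (df > fences.loc["upper"])
print(fences)
print(is_outlier.any())
```

For these marks no value falls outside its column's fences, so every column reports False. A larger k flags fewer values and a smaller k flags more.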
How to Validate?
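The code above computes the IQR from scratch. To save time, or to check these results, you can use the iqr() function from scipy.stats, which by default also uses linear interpolation between the 25th and 75th percentiles.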
from scipy.stats import iqr
iqr(df['Physics'])
15.25
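scipy.stats.iqr also accepts an axis argument, so a single call can compute the IQR of every column and be compared with the NumPy-based function. This is a sketch that assumes SciPy's default interpolation='linear', which matches the default of np.percentile.

```python
import numpy as np
import pandas as pd
from scipy.stats import iqr

df = pd.DataFrame([[32, 24, 30, 40], [17, 24, 21, 28], [50, 25, 28, 32],
                   [25, 34, 21, 48], [17, 31, 18, 28], [35, 24, 19, 42]],
                  columns=['Physics', 'Chemistry', 'Biology', 'Maths'],
                  index=['Student-1', 'Student-2', 'Student-3', 'Student-4',
                         'Student-5', 'Student-6'])

# axis=0 computes the IQR down each column, returning a NumPy array
scipy_iqr = iqr(df, axis=0)
# The from-scratch version, applied column by column
numpy_iqr = df.apply(lambda x: np.subtract(*np.percentile(x, [75, 25])))

print(pd.Series(scipy_iqr, index=df.columns))
print(np.allclose(scipy_iqr, numpy_iqr))  # True
```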
Visualization
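A box plot shows these statistics for each column of the DataFrame: the box spans the 25th percentile (Q1) to the 75th percentile (Q3), so its height is the IQR, and the line inside the box marks the 50th percentile (median). With showmeans=True, a marker also shows the mean.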
import pandas as pd
import matplotlib.pyplot as plt
# draw one box per column of the DataFrame defined above
ax = df.plot.box(figsize=(8, 6), showmeans=True)
ax.grid()
plt.show()
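The numbers behind each box can also be read programmatically. matplotlib.cbook.boxplot_stats returns one dictionary per dataset with keys such as 'q1', 'med', 'q3' and 'iqr'. The sketch below is a suggestion rather than part of the plotting code above: it passes each column as a separate dataset and pairs the results with the column names itself.

```python
import pandas as pd
from matplotlib.cbook import boxplot_stats

df = pd.DataFrame([[32, 24, 30, 40], [17, 24, 21, 28], [50, 25, 28, 32],
                   [25, 34, 21, 48], [17, 31, 18, 28], [35, 24, 19, 42]],
                  columns=['Physics', 'Chemistry', 'Biology', 'Maths'],
                  index=['Student-1', 'Student-2', 'Student-3', 'Student-4',
                         'Student-5', 'Student-6'])

# One 1-D array per column, so each column becomes its own box
stats = boxplot_stats([df[col].to_numpy() for col in df.columns])
for name, s in zip(df.columns, stats):
    print(name, "Q1 =", s["q1"], "median =", s["med"],
          "Q3 =", s["q3"], "IQR =", s["iqr"])
```

For Physics this reports Q1 = 19.0, Q3 = 34.25 and IQR = 15.25, matching the values computed earlier.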
