R-Squared in Regression Analysis
After building a machine learning model, you need to determine how well it fits the data. R-squared is a statistical measure of how close the data are to the fitted regression line: the proportion of the variation in the dependent variable (from 0 to 1, often quoted as a percentage) that is explained by the model.

- 0% indicates that the model explains none of the variation around the mean; this usually signals a poor fit, but not in all cases.
- 100% indicates that the model explains all of the variation, i.e., the predictions match the observed values exactly.
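The two extremes above can be sketched directly from the defining formula, R² = 1 − SSres/SStot. The helper function below is ours, not from any library; note that for arbitrary predictions (e.g. on held-out data) R² can even drop below 0 when the model does worse than simply predicting the mean:

```python
import numpy as np

def r2(actual, predicted):
    ss_res = np.sum((actual - predicted) ** 2)        # residual sum of squares
    ss_tot = np.sum((actual - np.mean(actual)) ** 2)  # total sum of squares
    return 1 - ss_res / ss_tot

y = np.array([3.0, 5.0, 7.0, 9.0])
print(r2(y, y))                                  # perfect predictions -> 1.0
print(r2(y, np.full_like(y, y.mean())))          # predicting the mean -> 0.0
print(r2(y, np.array([9.0, 3.0, 9.0, 3.0])))     # worse than the mean -> negative
```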
R-squared manual calculation
import numpy as np

# Manual calculation
actual = np.array([56, 45, 68, 49, 26, 40, 52, 38, 30, 48])
predicted = np.array([58, 42, 65, 47, 29, 46, 50, 33, 31, 47])
ssres = np.sum((actual - predicted) ** 2)          # residual sum of squares
sstot = np.sum((actual - np.mean(actual)) ** 2)    # total sum of squares
r2_m = 1 - (ssres / sstot)
print("R-Squared:", r2_m)
R-Squared: 0.9262792714657415
R-squared using sklearn.metrics
import numpy as np
import sklearn.metrics as metrics

actual = np.array([56, 45, 68, 49, 26, 40, 52, 38, 30, 48])
predicted = np.array([58, 42, 65, 47, 29, 46, 50, 33, 31, 47])
r2_sk = metrics.r2_score(actual, predicted)
print("R-Squared:", r2_sk)
R-Squared: 0.9262792714657415
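In scikit-learn, R-squared is also the default metric returned by a regressor's `score` method, so `r2_score` and `model.score` agree on the same data. A minimal sketch with `LinearRegression` and an assumed single-feature dataset:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

# Assumed example data: one feature, the targets from the section above
X = np.arange(10).reshape(-1, 1)
y = np.array([56, 45, 68, 49, 26, 40, 52, 38, 30, 48], dtype=float)

model = LinearRegression().fit(X, y)
pred = model.predict(X)

# model.score computes R-squared internally
print(model.score(X, y))
print(r2_score(y, pred))  # same value
```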
Limitations of R-Squared
R-squared does not indicate whether a regression model is adequate. You can have a low R-squared value for a good model, or a high R-squared value for a model that does not fit the data.
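The second failure mode can be demonstrated with a small sketch (assumed toy data): fitting a straight line to clearly quadratic data still yields an R-squared above 0.9, even though the residuals show a systematic pattern that a residual plot would immediately expose.

```python
import numpy as np
from sklearn.metrics import r2_score

x = np.arange(10, dtype=float)
y = x ** 2                               # deliberately non-linear data

slope, intercept = np.polyfit(x, y, 1)   # force a straight-line fit
pred = slope * x + intercept

print(r2_score(y, pred))   # high (~0.93) despite the wrong model form
print(y - pred)            # residuals swing +, -, + : a systematic curve
```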