R-Squared in Regression Analysis

After building a machine learning model, you need to determine how well the model fits the data. R-squared (the coefficient of determination) is a statistical measure of how close the data are to the fitted regression line. It is the proportion of the variance in the dependent variable that is explained by the independent variable(s), expressed either as a fraction from 0 to 1 or as a percentage.

In other words, R-squared compares the residual sum of squares (SSres) with the total sum of squares (SStot). The residual for a data point is the difference between the actual value and the value predicted by your linear regression model. Squaring the residuals and adding them up gives SSres. SStot is the sum of the squared differences between each actual value and the mean of the actual values; it is the error of a baseline "average model" that always predicts the mean.

R-squared is calculated by dividing SSres by SStot and subtracting the result from 1. For a least-squares linear regression with an intercept, evaluated on the data it was fitted to, R-squared lies between 0 and 100%:
  1. 0% indicates that the model explains none of the variability of the dependent variable around its mean, so it does no better than the average model. A low value often signals a weak model, but not in all cases.
  2. 100% indicates that the model explains all of the variability: every predicted value equals the actual value, so all residuals are zero. In simple linear regression with one predictor, this means the two variables are perfectly linearly correlated.
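Written out as formulas, with y_i the actual values, \hat{y}_i the predicted values, \bar{y} the mean of the actual values and n the number of data points, the definitions above are:

```latex
\begin{aligned}
e_i      &= y_i - \hat{y}_i \\
SS_{res} &= \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 \\
SS_{tot} &= \sum_{i=1}^{n} (y_i - \bar{y})^2 \\
R^2      &= 1 - \frac{SS_{res}}{SS_{tot}}
\end{aligned}
```

For the example data in the manual calculation, SSres = 102 and SStot = 1383.6, so R² = 1 - 102/1383.6 ≈ 0.9263.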

R-squared manual calculation

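import numpy as np

# Manual calculation
actual = np.array([56, 45, 68, 49, 26, 40, 52, 38, 30, 48])
predicted = np.array([58, 42, 65, 47, 29, 46, 50, 33, 31, 47])

# Residual sum of squares: squared differences between actual and predicted values
ssres = sum((actual - predicted) ** 2)
# Total sum of squares: squared differences between actual values and their mean
sstot = sum((actual - np.mean(actual)) ** 2)

r2_m = 1 - (ssres / sstot)
print("R-Squared:", r2_m)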
R-Squared: 0.9262792714657415
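To make the intermediate quantities visible, here is a minimal sketch that wraps the same steps in reusable functions. The names sums_of_squares and r_squared, and the check for a constant target (where SStot is 0 and R-squared is undefined), are hypothetical illustrations, not part of any library.

```python
import numpy as np


def sums_of_squares(actual, predicted):
    """Return (SSres, SStot) for two equal-length sequences (hypothetical helper)."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    if actual.shape != predicted.shape:
        raise ValueError("actual and predicted must have the same shape")
    residuals = actual - predicted                       # e_i = y_i - y_hat_i
    ss_res = float(np.sum(residuals ** 2))               # residual sum of squares
    ss_tot = float(np.sum((actual - actual.mean()) ** 2))  # total sum of squares
    return ss_res, ss_tot


def r_squared(actual, predicted):
    """Return 1 - SSres/SStot (hypothetical helper)."""
    ss_res, ss_tot = sums_of_squares(actual, predicted)
    if ss_tot == 0:
        # All actual values are identical, so there is no variance to explain.
        raise ValueError("R-squared is undefined when all actual values are equal")
    return 1.0 - ss_res / ss_tot


actual = [56, 45, 68, 49, 26, 40, 52, 38, 30, 48]
predicted = [58, 42, 65, 47, 29, 46, 50, 33, 31, 47]

ss_res, ss_tot = sums_of_squares(actual, predicted)
print("SSres:", ss_res)                              # 102.0
print("SStot:", round(ss_tot, 4))                    # 1383.6
print("R-Squared:", round(r_squared(actual, predicted), 4))  # 0.9263
```

Raising an error for a constant target is one possible design choice; a library may instead return a fixed value or a warning in that case.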

R-squared using sklearn.metrics

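import numpy as np
import sklearn.metrics as metrics

actual = np.array([56, 45, 68, 49, 26, 40, 52, 38, 30, 48])
predicted = np.array([58, 42, 65, 47, 29, 46, 50, 33, 31, 47])

# r2_score(y_true, y_pred) computes 1 - SSres/SStot, matching the manual result
r2_sk = metrics.r2_score(actual, predicted)
print("R-Squared:", r2_sk)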
R-Squared: 0.9262792714657415
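The 0 to 100% range only holds for a least-squares fit evaluated on its own training data. The formula 1 - SSres/SStot itself has no lower bound: when the predictions are worse than always predicting the mean, SSres exceeds SStot and the score becomes negative (the scikit-learn documentation for r2_score notes that it can be negative). A minimal NumPy sketch with made-up toy values; the helper name r2 is hypothetical:

```python
import numpy as np


def r2(actual, predicted):
    # Same formula as the manual calculation: 1 - SSres / SStot
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    ss_res = np.sum((actual - predicted) ** 2)
    ss_tot = np.sum((actual - actual.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


actual = [1, 2, 3]

# Always predicting the mean (2.0) explains nothing: SSres = SStot = 2
print(r2(actual, [2, 2, 2]))   # 0.0

# Predictions that run opposite to the data: SSres = 8, SStot = 2
print(r2(actual, [3, 2, 1]))   # -3.0
```

A negative value is a useful warning sign, for example when scoring a model on held-out data.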

Limitations of R-Squared

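R-squared does not indicate whether a regression model is adequate. You can have a low R-squared value for a good model (for example, when the data are inherently noisy), or a high R-squared value for a model that does not fit the data (for example, a straight line fitted to a curved relationship).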
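To illustrate the high-R-squared case, the sketch below fits a straight line to toy data that follow an exact curve, y = x², for x = 0 to 10 (data chosen purely for illustration). The line scores an R-squared of about 0.93, yet the residuals form a systematic U shape (positive at both ends, negative in the middle), which shows that a linear model is the wrong form. NumPy's polyfit is used here to get the least-squares line.

```python
import numpy as np

# Toy data that follow a curve exactly: y = x^2
x = np.arange(11, dtype=float)   # 0, 1, ..., 10
y = x ** 2

# Least-squares straight line; polyfit returns [slope, intercept] for degree 1
slope, intercept = np.polyfit(x, y, 1)
predicted = slope * x + intercept

residuals = y - predicted
ss_res = np.sum(residuals ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r2 = 1 - ss_res / ss_tot

print(f"Fitted line: y = {slope:.1f}x {intercept:+.1f}")  # y = 10.0x -15.0
print("R-Squared:", round(r2, 3))                        # 0.928
print("Residuals:", np.round(residuals, 1))
```

The residuals work out to 15, 6, -1, -6, -9, -10, -9, -6, -1, 6, 15: a clear pattern that the single R-squared number hides.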