# R-Squared in Regression Analysis

After building a **machine learning model**, you need to determine how well the model fits the data.

**R-squared** is a statistical measure of how close the data are to the fitted **regression line**. It is the proportion of variance (from 0 to 1) in the dependent variable that is explained by the independent variable(s). In other words, it compares the **residual sum of squares** (SSres) with the **total sum of squares** (SStot). The residual for a data point is the difference between the **actual value** and the value predicted by your linear **regression model**. Summing the squares of these residuals gives **SSres** (the residual sum of squares), while summing the squared differences between each actual value and the mean of the actual values gives **SStot** (the total sum of squares). **R²** is then calculated by dividing SSres by SStot and subtracting the result from 1:

R² = 1 − (SSres / SStot)

**For a least-squares regression fit, R-squared lies between 0% and 100%:**

- 0% indicates that the model explains none of the variation in the dependent variable. By itself this does not prove the model is invalid, but it is a strong warning sign.
- 100% indicates that the model explains all of the variation, i.e., the predictions match the actual values exactly.

## R-squared manual calculation

```python
import numpy as np

# Manual calculation
actual = np.array([56, 45, 68, 49, 26, 40, 52, 38, 30, 48])
predicted = np.array([58, 42, 65, 47, 29, 46, 50, 33, 31, 47])

ssres = np.sum((actual - predicted) ** 2)
sstot = np.sum((actual - np.mean(actual)) ** 2)
r2_m = 1 - (ssres / sstot)
print("R-Squared:", r2_m)
```

```
R-Squared: 0.9262792714657415
```

## R-squared using sklearn.metrics

```python
import numpy as np
import sklearn.metrics as metrics

actual = np.array([56, 45, 68, 49, 26, 40, 52, 38, 30, 48])
predicted = np.array([58, 42, 65, 47, 29, 46, 50, 33, 31, 47])

r2_sk = metrics.r2_score(actual, predicted)
print("R-Squared:", r2_sk)
```

```
R-Squared: 0.9262792714657415
```

**Limitations of R-squared:**

R-squared alone does not indicate whether a regression model is adequate. A good model can have a low R-squared value, and a model that does not fit the data well can still have a high R-squared value.
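A related caveat worth knowing: scikit-learn's `r2_score` is not actually floored at 0. Predicting the mean of the actual values for every point scores exactly 0, and predictions worse than that mean baseline score negative. A minimal sketch, reusing the `actual` array from the examples above (the names `mean_baseline` and `bad_predicted` are illustrative, not from the original):

```python
import numpy as np
from sklearn.metrics import r2_score

# Same actual values as in the examples above
actual = np.array([56, 45, 68, 49, 26, 40, 52, 38, 30, 48])

# Predicting the mean for every point makes SSres equal SStot, so R² is 0.
mean_baseline = np.full_like(actual, actual.mean(), dtype=float)
print(r2_score(actual, mean_baseline))  # 0.0

# Predictions worse than the mean baseline drive R² below zero;
# for arbitrary predictions, r2_score has no lower bound.
bad_predicted = actual[::-1]  # the actual values reversed: a deliberately poor "model"
print(r2_score(actual, bad_predicted))  # negative
```

This is why a negative score from `r2_score` on held-out data usually means the model is performing worse than simply predicting the average.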
