Model Evaluation and Model Validation

Once the model is built, the next step is to evaluate and validate it. Model evaluation is an essential part of the model development process. It tests the final performance of the algorithm on the test set, helps you find the model that best represents your data, and indicates how well the chosen model is likely to work on future data.

Model validation is the set of processes and activities intended to confirm that models are performing as expected. Effective validation helps you ensure that models are sound. It also identifies potential limitations and assumptions and assesses their possible impact.

Several measures can be used to find out how well a regression model predicts or how well a classifier classifies the data. Key terms and concepts:

R-squared (R2)

The coefficient of determination R2 (R-squared) is a statistical measure for a regression model that tells you what proportion of the variance in the dependent variable is explained by the independent variables.

R-squared = Explained variation / Total variation

It measures the strength of the relationship between your model and the dependent variable on a convenient 0 to 100% scale. A value of 0% means the model explains none of the variation around the mean, while 100% means it explains all of it.
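The ratio above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name r_squared and the sample values are made up for the example, and it uses the equivalent form 1 - (residual variation / total variation), which matches explained / total for a least-squares fit that includes an intercept.

```python
def r_squared(actual, predicted):
    """Return R-squared as 1 - SS_residual / SS_total."""
    mean_actual = sum(actual) / len(actual)
    # Total variation: squared distance of each actual value from the mean.
    ss_total = sum((y - mean_actual) ** 2 for y in actual)
    # Unexplained variation: squared distance of each actual value from its prediction.
    ss_residual = sum((y - p) ** 2 for y, p in zip(actual, predicted))
    return 1 - ss_residual / ss_total


# Hypothetical data, chosen only to illustrate the calculation.
actual = [3.0, 5.0, 7.0, 9.0]
predicted = [2.5, 5.5, 7.5, 8.5]

r2 = r_squared(actual, predicted)
print(f"R-squared: {r2:.2%}")
```

With these values the total variation is 20 and the unexplained variation is 1, so the model accounts for 95% of the variation.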

Adjusted R-squared (Adj R2)

The Adjusted R-squared is a modified version of R-squared that measures the proportion of variation explained by only those independent variables that really help in explaining the dependent variable. It is adjusted for the number of predictors used to predict the target variable, so adding a predictor does not automatically raise it the way it raises plain R-squared. It increases when a new term improves the model more than would be expected by chance, and it decreases when a predictor improves the model by less than expected.
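One common way to write this adjustment is Adj R2 = 1 - (1 - R2)(n - 1) / (n - p - 1), where n is the number of observations and p is the number of predictors. The sketch below assumes that formula; the function name adjusted_r_squared and the numbers are hypothetical and only show how a weak extra predictor can raise R-squared yet lower the adjusted value.

```python
def adjusted_r_squared(r2, n_samples, n_predictors):
    """Penalize R-squared for the number of predictors in the model."""
    if n_samples - n_predictors - 1 <= 0:
        raise ValueError("need more observations than predictors + 1")
    return 1 - (1 - r2) * (n_samples - 1) / (n_samples - n_predictors - 1)


# Hypothetical scenario: 50 observations, 3 predictors, R-squared of 0.950.
adj_three = adjusted_r_squared(0.950, n_samples=50, n_predictors=3)

# A fourth, weak predictor nudges R-squared up to 0.951 ...
adj_four = adjusted_r_squared(0.951, n_samples=50, n_predictors=4)

# ... but the adjusted value goes down, flagging the predictor as unhelpful.
print(f"3 predictors: {adj_three:.4f}")
print(f"4 predictors: {adj_four:.4f}")
```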

Mean Absolute Percentage Error (MAPE)

Error metrics help you evaluate how accurate a model is. Mean Absolute Percentage Error (MAPE) is a statistical error metric that expresses the accuracy of a model on a given dataset as a percentage: it is the average of the absolute differences between actual and predicted values, each divided by the actual value. It is often used to benchmark a forecasting process. Because it is scale-free, it also serves as a figure of merit for judging whether a data mining method is performing well, although it is undefined when an actual value is zero.
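As a rough sketch of the calculation, the snippet below averages the absolute percentage errors. The function name mape and the sample figures are invented for illustration, and rejecting zero actual values is one possible way to handle the undefined case, not the only one.

```python
def mape(actual, predicted):
    """Mean Absolute Percentage Error, returned as a percentage."""
    if any(a == 0 for a in actual):
        # Percentage error is undefined when the actual value is zero.
        raise ValueError("MAPE is undefined when an actual value is 0")
    errors = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100 * sum(errors) / len(errors)


# Hypothetical actual values and forecasts.
actual = [100.0, 200.0, 400.0]
predicted = [110.0, 180.0, 400.0]

mape_value = mape(actual, predicted)
print(f"MAPE: {mape_value:.2f}%")
```

Here the individual errors are 10%, 10% and 0%, so the forecasts are off by about 6.67% on average.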

Mean Squared Error (MSE)

The Mean Squared Error (MSE) is a measure of how close a fitted line is to the data points. It is the average of the squared differences between the predicted and actual values of the target variable. In other words, it is the average squared difference between the estimated values and what is being estimated. Because the errors are squared, large errors are penalized more heavily than small ones.
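The definition translates almost directly into code. The following is a minimal sketch with a hypothetical function name and made-up data:

```python
def mean_squared_error(actual, predicted):
    """Average of the squared differences between actual and predicted values."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return sum(squared_errors) / len(squared_errors)


# Hypothetical data: every prediction is off by 0.5 in one direction or the other.
actual = [3.0, 5.0, 7.0, 9.0]
predicted = [2.5, 5.5, 7.5, 8.5]

mse = mean_squared_error(actual, predicted)
print(f"MSE: {mse}")
```

Note that the result is in squared units of the target variable, which is one reason the square root (RMSE) is often reported instead.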

Root Mean Squared Error (RMSE)

Root Mean Squared Error (RMSE) is the square root of the Mean Squared Error (MSE), which puts the error back in the same units as the target variable. To construct the RMSE, you first need to determine the residuals, which are the differences between the actual values and the predicted values. Squaring the residuals, averaging the squares, and taking the square root gives you the RMSE.
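The three steps described above can be followed one by one. This is an illustrative sketch; the variable names and data are assumptions made for the example.

```python
import math

# Hypothetical actual and predicted values.
actual = [3.0, 5.0, 7.0, 9.0]
predicted = [2.5, 5.5, 7.5, 8.5]

# Step 1: residuals are actual minus predicted.
residuals = [a - p for a, p in zip(actual, predicted)]

# Step 2: square the residuals and average the squares (this is the MSE).
mean_of_squares = sum(r ** 2 for r in residuals) / len(residuals)

# Step 3: take the square root to get back to the original units.
rmse = math.sqrt(mean_of_squares)

print(f"Residuals: {residuals}")
print(f"RMSE: {rmse}")
```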

Analysis of Residuals

A residual is the difference between the observed value and the predicted value. The observed value is the actual data point, while the predicted value is the value obtained from the regression equation. Analysis of residuals is a method for checking whether a regression model is a "good fit". It is used to test the validity of the statistical model and to check the assumptions made about the error term, so it plays an important role in validating the regression model. For a good model, the residuals should be randomly scattered around zero, with no visible pattern, and approximately normally distributed.
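In practice residuals are usually inspected with plots, such as residuals against fitted values or a normal Q-Q plot. As a rough numeric stand-in, the sketch below fits a simple least-squares line, computes the residuals, and reports how many fall within one and two standard deviations of their mean (roughly 68% and 95% would be expected for normal data). The helper names fit_line and residual_summary, the data, and the choice of checks are all assumptions for illustration, and with only a handful of points the shares are indicative at best.

```python
from statistics import mean, pstdev


def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    x_bar, y_bar = mean(x), mean(y)
    slope = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
             / sum((xi - x_bar) ** 2 for xi in x))
    return slope, y_bar - slope * x_bar


def residual_summary(observed, predicted):
    """Residuals plus a few rough indicators of centering and spread."""
    residuals = [o - p for o, p in zip(observed, predicted)]
    center, spread = mean(residuals), pstdev(residuals)
    n = len(residuals)
    return {
        "residuals": residuals,
        "mean": center,
        "std": spread,
        # Share of residuals within 1 and 2 standard deviations of the mean.
        "within_1sd": sum(abs(r - center) <= spread for r in residuals) / n,
        "within_2sd": sum(abs(r - center) <= 2 * spread for r in residuals) / n,
    }


# Hypothetical observations that follow a roughly linear trend.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]

slope, intercept = fit_line(x, y)
predicted = [intercept + slope * xi for xi in x]
summary = residual_summary(y, predicted)

print(f"mean of residuals: {summary['mean']:.6f}")
print(f"within 1 sd: {summary['within_1sd']:.0%}, within 2 sd: {summary['within_2sd']:.0%}")
```

A mean close to zero is guaranteed for a least-squares line with an intercept, so it mainly confirms the fit was computed correctly. Patterns such as curvature or a widening spread are easier to spot in a plot than in these summary numbers.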