Ordinary Least Squares Regression | Python

Machine Learning (ML) develops algorithms (models) that predict an output value, within an acceptable error margin, from a set of known input parameters. Ordinary Least Squares (OLS) is a widely used regression technique and falls under supervised learning. It estimates the unknown parameters of a linear model by minimizing the sum of the squared errors between the observed data and the model's predictions. In other words, given a regression line through the data, you calculate the vertical distance from each data point to the line, square it, and sum all of the squared errors together; this sum is the quantity that ordinary least squares minimizes.
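The squared-error objective can be sketched numerically. The data points and candidate lines below are made up for illustration:

```python
import numpy as np

# Made-up (x, y) data points, for illustration only
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 4.2, 5.9, 8.1])

def sum_squared_errors(slope, intercept):
    """Sum of squared vertical distances from each point to the line y = slope*x + intercept."""
    predicted = slope * x + intercept
    return float(np.sum((y - predicted) ** 2))

# OLS chooses the slope and intercept that make this sum as small as possible
print(sum_squared_errors(2.0, 0.0))  # a line close to the data: small sum
print(sum_squared_errors(1.0, 0.0))  # a poorer line: much larger sum
```

OLS has a closed-form solution for the minimizing slope and intercept, which is what libraries such as statsmodels compute for you.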
What is Ordinary Least Squares Regression? | Python
The OLS method works for both univariate datasets (a single independent variable and a single dependent variable) and multivariate datasets (multiple independent variables and a single dependent variable). An example of a scenario in which one may use OLS is predicting the price of a meal from a dataset that includes food quality and service quality.

Ordinary Least Squares Example:

Consider the restaurant dataset restaurants.csv. A restaurant guide collects several variables from a group of restaurants in a city. The variables are described below:

Field             Description
Restaurant_ID     Restaurant code
Food_Quality      Measure of food quality, in points
Service_Quality   Measure of service quality, in points
Price             Price of the meal

Restaurant data sample


Loading required Python packages

import pandas
import statsmodels.api as sm

Importing dataset

The Python pandas module can read CSV files and return a DataFrame object. The file is meant for testing purposes only; you can download it here: restaurants.csv.
df = pandas.read_csv("restaurants.csv")

From the restaurants.csv dataset, use the price of the meal ('Price') as your response Y and the measure of food quality ('Food_Quality') as your predictor X.

X = df['Food_Quality']
Y = df['Price']

Fit the Model

The sm.OLS() constructor takes the dependent (Y) and independent (X) values as arguments, and its fit() method estimates the model. Add a constant term first so that the intercept of your linear model is fitted.
X = sm.add_constant(X)
model = sm.OLS(Y, X).fit()


The summary() method returns a table that gives an extensive description of the regression results.

Full Source | Python
import pandas
import statsmodels.api as sm

df = pandas.read_csv("restaurants.csv")
X = df['Food_Quality']
Y = df['Price']
X = sm.add_constant(X)
model = sm.OLS(Y, X).fit()
summary = model.summary()
print(summary)


Ordinary Least Squares in Python

Description of some of the terms in the table:

  1. R-squared - a statistical measure of how well the regression line approximates the real data points.
  2. Adj. R-squared - R-squared adjusted for the number of independent variables in the model.
  3. F-statistic - the ratio of the mean square explained by the model to the mean squared error of the residuals; it tests whether the model as a whole is significant.
  4. AIC - Akaike Information Criterion; estimates the relative quality of statistical models for a given dataset.
  5. BIC - Bayesian Information Criterion; used as a criterion for model selection among a finite set of models.
  6. coef - the estimated coefficients of the independent variables and the constant term in the equation.
  7. std err - the standard error of the coefficient estimate.
  8. t - the coefficient divided by its standard error; a measure of how statistically significant the coefficient is.
  9. P > |t| - the p-value for the null hypothesis that the coefficient equals 0; a small value suggests the coefficient is significant.