Simple Linear Regression | Python
Simple Linear Regression is one of the most popular and best understood Machine Learning algorithms. It predicts a result from a known parameter that is correlated with the output. Of the two variables involved, one is the dependent variable (the variable you are trying to predict) and the other is the independent variable (the input variable used in the prediction). Linear regression with a single input variable is called Simple Linear Regression.

It is used as a predictive model that assumes a linear relationship between the dependent variable and the independent variable, so the way the input variable relates to the output variable can be represented by a straight line. Simple Linear Regression models your data as follows:

y(pred) = B0 + B1 * x

This is a line where y(pred) is the output variable you want to predict, x is the input variable, and B0 and B1 are the coefficients you need to estimate: B0 is the intercept and B1 is the slope of the line.
Simple Linear Regression example
The following Machine Learning example creates a dataset that has two variables: Stock_Value (dependent variable, y) and Interest_Rate (independent variable, x). The purpose of this example is to:
- Find out whether there is any correlation between the two variables (x, y).
- Find the best-fit line for the dataset.
- See how the output variable changes as the input variable changes.

The example works through the following steps:
- Process the data
- Split your dataset to train and test
- Fitting the data to the Training Set
- Prediction of test set result
- Final Result
- Visualizing Results
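The correlation and the best-fit line can also be worked out by hand before any library model is fitted. The following is a minimal sketch, assuming NumPy is available; the variable names (interest_rate, stock_value, b0, b1, r) are chosen for illustration, and the values are the same ones used in the dataset of this example. It uses the standard closed-form least-squares estimates for B0 and B1.

```python
import numpy as np

# Same values as the dataset used in this example.
interest_rate = np.array([2.75, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.25, 2.25, 2.25,
                          2, 2, 2, 1.75, 1.75, 1.75, 1.75, 1.75, 1.75, 1.75, 1.75])
stock_value = np.array([1464, 1394, 1357, 1293, 1256, 1254, 1234, 1195, 1159, 1167,
                        1130, 1075, 1047, 965, 943, 958, 971, 949, 884, 866, 876],
                       dtype=float)

# Pearson correlation coefficient between x and y (always between -1 and 1).
r = np.corrcoef(interest_rate, stock_value)[0, 1]

# Closed-form least-squares estimates for y(pred) = B0 + B1 * x.
x_mean = interest_rate.mean()
y_mean = stock_value.mean()
b1 = (np.sum((interest_rate - x_mean) * (stock_value - y_mean))
      / np.sum((interest_rate - x_mean) ** 2))   # slope
b0 = y_mean - b1 * x_mean                         # intercept

print(f"correlation r = {r:.3f}")
print(f"B0 = {b0:.2f}, B1 = {b1:.2f}")
```

Note that this sketch fits the line on all 21 rows, so its coefficients will differ somewhat from those of a model trained on only part of the data.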
Process the data
Here there are two variables: Interest_Rate (independent variable) and Stock_Value (dependent variable). The first step is to create a dataset by adding values to these variables.
import pandas as pd
df = pd.DataFrame({"Interest_Rate": ([2.75,2.5,2.5,2.5,2.5,2.5,2.5,2.25,2.25,2.25,2,2,2,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75]),
"Stock_Value":[1464,1394,1357,1293,1256,1254,1234,1195,1159,1167,1130,1075,1047,965,943,958,971,949,884,866,876]})
You can see the dataset in your Spyder IDE screen by clicking on the variable explorer option.
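If you are not working in Spyder, the same check can be done from code. This is a small sketch using standard pandas calls (shape, head() and describe()) on the DataFrame built above.

```python
import pandas as pd

df = pd.DataFrame({
    "Interest_Rate": [2.75, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.25, 2.25, 2.25,
                      2, 2, 2, 1.75, 1.75, 1.75, 1.75, 1.75, 1.75, 1.75, 1.75],
    "Stock_Value": [1464, 1394, 1357, 1293, 1256, 1254, 1234, 1195, 1159, 1167,
                    1130, 1075, 1047, 965, 943, 958, 971, 949, 884, 866, 876],
})

print(df.shape)       # (number of rows, number of columns)
print(df.head())      # first five rows
print(df.describe())  # count, mean, std, min, quartiles and max per column
```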

The next step is to extract the dependent (Y) and independent (X) variables from the given dataset. The independent variable is Interest_Rate, and the dependent variable is Stock_Value.
X = df[['Interest_Rate']]
Y = df['Stock_Value']

After this step, the X (Interest_Rate) variable and the Y (Stock_Value) variable have been extracted from the given dataset.
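The double brackets in df[['Interest_Rate']] matter: they return a two-dimensional DataFrame, which is the shape scikit-learn estimators expect for the input features, while the single brackets used for Y return a one-dimensional Series. The short sketch below, which rebuilds the same dataset so that it runs on its own, prints both shapes.

```python
import pandas as pd

df = pd.DataFrame({
    "Interest_Rate": [2.75, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.25, 2.25, 2.25,
                      2, 2, 2, 1.75, 1.75, 1.75, 1.75, 1.75, 1.75, 1.75, 1.75],
    "Stock_Value": [1464, 1394, 1357, 1293, 1256, 1254, 1234, 1195, 1159, 1167,
                    1130, 1075, 1047, 965, 943, 958, 971, 949, 884, 866, 876],
})

X = df[["Interest_Rate"]]  # list of column names -> 2-D DataFrame
Y = df["Stock_Value"]      # single column name   -> 1-D Series

print(type(X).__name__, X.shape)
print(type(Y).__name__, Y.shape)
```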
Split your dataset to train and test
The train-test split is a technique for evaluating the performance of a machine learning algorithm.
- Train Set: Used to fit the machine learning model.
- Test Set: Used to evaluate the fitted machine learning model.
So, the next step is to split both variables into a Test Set and a Train Set. Here, test_size=1/3 holds out one third of the rows for the test set and leaves two thirds for the train set (7 and 14 of the 21 rows), and random_state=0 makes the shuffle reproducible.
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test= train_test_split(X, Y, test_size= 1/3, random_state=0)
Now your dataset is prepared, and you can start building a Simple Linear Regression model for the given problem.
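To confirm what the split produced, you can print the sizes of the resulting sets. The sketch below rebuilds the dataset so that it runs on its own, and also repeats the split to show that a fixed random_state returns the same rows each time; the names x_train2 and x_test2 are introduced only for this check.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.DataFrame({
    "Interest_Rate": [2.75, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.25, 2.25, 2.25,
                      2, 2, 2, 1.75, 1.75, 1.75, 1.75, 1.75, 1.75, 1.75, 1.75],
    "Stock_Value": [1464, 1394, 1357, 1293, 1256, 1254, 1234, 1195, 1159, 1167,
                    1130, 1075, 1047, 965, 943, 958, 971, 949, 884, 866, 876],
})
X = df[["Interest_Rate"]]
Y = df["Stock_Value"]

x_train, x_test, y_train, y_test = train_test_split(
    X, Y, test_size=1/3, random_state=0)

print(len(x_train), len(x_test))  # rows in the train set and in the test set

# Splitting again with the same random_state selects the same rows.
x_train2, x_test2, _, _ = train_test_split(X, Y, test_size=1/3, random_state=0)
print(x_test.index.equals(x_test2.index))
```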
Fitting the data to the Training Set
In this step, fit your model to the training dataset.
from sklearn.linear_model import LinearRegression
rg= LinearRegression()
rg.fit(x_train, y_train)
The above code creates an object of the LinearRegression class named rg and uses the fit() method to fit it to the training set. fit() takes the independent variable (x_train) and the dependent variable (y_train) and estimates the coefficients of the line, so that the model learns the relationship between the predictor and the target variable.
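After fitting, the estimated coefficients of y(pred) = B0 + B1 * x are available on the model object: scikit-learn stores the intercept (B0) in intercept_ and the slope (B1) in coef_. The sketch below rebuilds the earlier steps so that it runs on its own, and checks that applying the equation by hand gives the same values as predict(); the name manual is introduced only for this illustration.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

df = pd.DataFrame({
    "Interest_Rate": [2.75, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.25, 2.25, 2.25,
                      2, 2, 2, 1.75, 1.75, 1.75, 1.75, 1.75, 1.75, 1.75, 1.75],
    "Stock_Value": [1464, 1394, 1357, 1293, 1256, 1254, 1234, 1195, 1159, 1167,
                    1130, 1075, 1047, 965, 943, 958, 971, 949, 884, 866, 876],
})
X = df[["Interest_Rate"]]
Y = df["Stock_Value"]
x_train, x_test, y_train, y_test = train_test_split(
    X, Y, test_size=1/3, random_state=0)

rg = LinearRegression()
rg.fit(x_train, y_train)

print("B0 (intercept):", rg.intercept_)
print("B1 (slope):", rg.coef_[0])

# Apply y(pred) = B0 + B1 * x by hand and compare with predict().
manual = rg.intercept_ + rg.coef_[0] * x_test["Interest_Rate"].to_numpy()
print(np.allclose(manual, rg.predict(x_test)))
```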
Prediction of test set result
Now your model is ready to predict the output for new observations. Next, provide the test dataset (new observations) to the model to check whether it can predict the correct output. To do so, create two prediction vectors: y_pred, which contains the predictions for the test set, and x_pred, which contains the predictions for the training set.
y_pred= rg.predict(x_test)
x_pred= rg.predict(x_train)
The above code creates two variables, y_pred and x_pred, which appear in the variable explorer and contain the Stock_Value predictions for the test set and the training set respectively.
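One way to inspect these predictions without an IDE is to place them next to the actual values in a small table. The following sketch rebuilds the earlier steps so that it runs on its own; the comparison DataFrame and its column names (Actual, Predicted, Residual) are introduced here only for illustration.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

df = pd.DataFrame({
    "Interest_Rate": [2.75, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.25, 2.25, 2.25,
                      2, 2, 2, 1.75, 1.75, 1.75, 1.75, 1.75, 1.75, 1.75, 1.75],
    "Stock_Value": [1464, 1394, 1357, 1293, 1256, 1254, 1234, 1195, 1159, 1167,
                    1130, 1075, 1047, 965, 943, 958, 971, 949, 884, 866, 876],
})
X = df[["Interest_Rate"]]
Y = df["Stock_Value"]
x_train, x_test, y_train, y_test = train_test_split(
    X, Y, test_size=1/3, random_state=0)

rg = LinearRegression()
rg.fit(x_train, y_train)
y_pred = rg.predict(x_test)

# One row per test observation: input, actual value, prediction and error.
comparison = pd.DataFrame({
    "Interest_Rate": x_test["Interest_Rate"].to_numpy(),
    "Actual": y_test.to_numpy(),
    "Predicted": y_pred,
}, index=x_test.index)
comparison["Residual"] = comparison["Actual"] - comparison["Predicted"]

print(comparison.round(1))
```

A residual close to zero means the prediction for that row is close to the actual Stock_Value.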
Final Result
You can verify the result in the variable explorer of the Spyder IDE by comparing the values in y_pred with those in y_test. Comparing the predictions with the actual values (y_pred against y_test, and x_pred against y_train) shows how well your model is performing.
Visualizing the Train set results
The scatter() function of the pyplot library creates a scatter plot of the observations. Plot Interest_Rate on the x-axis and Stock_Value on the y-axis.
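import matplotlib.pyplot as mpl
# Observed training points
mpl.scatter(x_train, y_train, color="green")
# Fitted regression line (predictions for the training inputs)
mpl.plot(x_train, x_pred, color="red")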
mpl.title("Interest_Rate vs Stock_Value (Training Dataset)")
mpl.xlabel("Interest Rate")
mpl.ylabel("Stock Price")
mpl.show()
In the resulting plot, the observations appear as green dots and the predicted values lie on the red regression line. The regression line shows the relationship between the dependent and independent variables.
Visualizing the Test set results
In the above section you visualized the performance of your model on the training set. This section does the same for the test set. The only difference from the plot above is that the scatter points use x_test and y_test instead of x_train and y_train.
# Observed test points
mpl.scatter(x_test, y_test, color="blue")
# Same regression line as before, fitted on the training set
mpl.plot(x_train, x_pred, color="red")
mpl.title("Interest_Rate vs Stock_Value (Test Dataset)")
mpl.xlabel("Interest Rate")
mpl.ylabel("Stock Price")
mpl.show()
In this plot, the observations are shown in blue and the predictions are given by the red regression line. Most of the observations lie close to the regression line, which suggests that the Simple Linear Regression model fits this data well and can make reasonable predictions.
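The visual impression can be backed by numbers. The sketch below, which is an addition to the original example, uses r2_score and mean_absolute_error from sklearn.metrics to score the model on both sets; the names r2_train, r2_test and mae_test are chosen for illustration. With only 7 test rows these figures are noisy, so treat them as a rough indication rather than a firm measure of quality.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split

df = pd.DataFrame({
    "Interest_Rate": [2.75, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.25, 2.25, 2.25,
                      2, 2, 2, 1.75, 1.75, 1.75, 1.75, 1.75, 1.75, 1.75, 1.75],
    "Stock_Value": [1464, 1394, 1357, 1293, 1256, 1254, 1234, 1195, 1159, 1167,
                    1130, 1075, 1047, 965, 943, 958, 971, 949, 884, 866, 876],
})
X = df[["Interest_Rate"]]
Y = df["Stock_Value"]
x_train, x_test, y_train, y_test = train_test_split(
    X, Y, test_size=1/3, random_state=0)

rg = LinearRegression()
rg.fit(x_train, y_train)
y_pred = rg.predict(x_test)
x_pred = rg.predict(x_train)

# R^2: share of the variance in Stock_Value explained by the line (1.0 is perfect).
r2_train = r2_score(y_train, x_pred)
r2_test = r2_score(y_test, y_pred)

# Mean absolute error: average size of the prediction error, in Stock_Value units.
mae_test = mean_absolute_error(y_test, y_pred)

print(f"R^2 (train): {r2_train:.3f}")
print(f"R^2 (test):  {r2_test:.3f}")
print(f"MAE (test):  {mae_test:.1f}")
```

A test score far below the training score would be a sign that the line does not generalize well to unseen rows.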
Full Source | Python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
import matplotlib.pyplot as mpl
df = pd.DataFrame({"Interest_Rate": ([2.75,2.5,2.5,2.5,2.5,2.5,2.5,2.25,2.25,2.25,2,2,2,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75]),
"Stock_Value":[1464,1394,1357,1293,1256,1254,1234,1195,1159,1167,1130,1075,1047,965,943,958,971,949,884,866,876]})
X = df[['Interest_Rate']]
Y = df['Stock_Value']
x_train, x_test, y_train, y_test= train_test_split(X, Y, test_size= 1/3, random_state=0)
rg= LinearRegression()
rg.fit(x_train, y_train)
y_pred= rg.predict(x_test)
x_pred= rg.predict(x_train)
mpl.scatter(x_train, y_train, color="green")
mpl.plot(x_train, x_pred, color="red")
mpl.title("Interest_Rate vs Stock_Value (Training Dataset)")
mpl.xlabel("Interest Rate")
mpl.ylabel("Stock Price")
mpl.show()
mpl.scatter(x_test, y_test, color="blue")
mpl.plot(x_train, x_pred, color="red")
mpl.title("Interest_Rate vs Stock_Value (Test Dataset)")
mpl.xlabel("Interest Rate")
mpl.ylabel("Stock Price")
mpl.show()