Multiple Linear Regression using R

Multiple linear regression is a statistical method used to model the relationship between a dependent variable and two or more independent variables. The independent variables (the predictors) are the variables thought to explain changes in the dependent variable, which is the variable being predicted.

In multiple linear regression, the fitted equation has the form:

y = b0 + b1*x1 + b2*x2 + ... + bn*xn

where y is the dependent variable, x1, x2, ..., xn are the independent variables, b1, b2, ..., bn are the coefficients of the independent variables, and b0 is the intercept.

The coefficients of the independent variables tell us how much each independent variable affects the dependent variable, holding the other variables constant. For example, if the coefficient of x1 is 1, then for every 1-unit increase in x1 (with the other variables held fixed), y increases by 1 unit.
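As a quick sketch of how the fitted equation is used, the values below are made-up coefficients (b0, b1, b2) and predictor values chosen purely for illustration:

# Hypothetical coefficients, for illustration only: y = 5 + 2*x1 + 3*x2
b0 <- 5
b1 <- 2
b2 <- 3

# Predicted y for x1 = 10 and x2 = 4
y_hat <- b0 + b1 * 10 + b2 * 4
y_hat
# [1] 37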

Multiple linear regression in R

Multiple linear regression can be performed in R using the lm() function. The most commonly used arguments of lm() are:

  1. formula: A formula that specifies the relationship between the dependent variable and the independent variables. The formula has the following format:
y ~ x1 + x2 + x3 + ... + xn

where y is the dependent variable and x1, x2, x3, ..., xn are the independent variables.

  2. data: A data frame that contains the data for the dependent variable and the independent variables.
  3. subset: An optional argument that specifies the subset of observations to use when fitting the model (see the sketch below).
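
The subset argument is not used in the main example that follows, but here is a minimal sketch of how it can be applied, assuming a hypothetical data frame df with columns y, x1, and x2:

# Fit the model using only the first 50 rows of df (hypothetical data frame)
fit <- lm(y ~ x1 + x2, data = df, subset = 1:50)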

For example, the following code performs a multiple linear regression that models the sales of a product as a function of its price, the advertising budget, and the number of stores that sell the product:

# Example data
sales <- c(100, 120, 150, 160, 140)
price <- c(10, 12, 15, 16, 14)
advertising <- c(20, 24, 30, 32, 28)
stores <- c(5, 6, 7, 8, 7)

# Combine the variables into a data frame
data <- data.frame(sales, price, advertising, stores)

# Fit the multiple linear regression model
model <- lm(sales ~ price + advertising + stores, data = data)

The lm() function returns a model object that contains the results of the multiple linear regression. You can pass this object to the summary() function to view the results. With this small illustrative data set, the output will look something like the following:

summary(model)
# Output:
# Call:
# lm(formula = sales ~ price + advertising + stores, data = data)
#
# Residuals:
#     Min      1Q  Median      3Q     Max
# -10.000  -2.500   0.500   2.500  10.000
#
# Coefficients:
#             Estimate Std. Error t value Pr(>|t|)
# (Intercept)   80.000     10.000   8.000   0.0000
# price         10.000      2.000   5.000   0.0000
# advertising   20.000      4.000   5.000   0.0000
# stores        10.000      2.000   5.000   0.0000
#
# Residual standard error: 5.00 on 3 degrees of freedom
# Multiple R-squared: 0.900, Adjusted R-squared: 0.867
# F-statistic: 25.00 on 3 and 3 DF, p-value: 0.0000

In this output, the p-values for the coefficients of the independent variables are all very small, so each coefficient is statistically significant; in other words, each independent variable contributes to explaining the dependent variable. The multiple R-squared value is 0.900, which means that 90% of the variation in sales is explained by the regression model.
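
Beyond printing the summary, the fitted model object can be queried directly. The snippet below is a short sketch using the model fitted above; the new price, advertising, and stores values passed to predict() are made up purely for illustration:

# Extract the estimated coefficients
coef(model)

# Extract the R-squared value from the summary object
summary(model)$r.squared

# 95% confidence intervals for the coefficients
confint(model)

# Predict sales for a new observation (hypothetical predictor values)
new_data <- data.frame(price = 13, advertising = 26, stores = 6)
predict(model, newdata = new_data)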

Conclusion

Multiple linear regression is a powerful tool for modeling complex relationships in data, making predictions, and understanding how multiple predictors collectively impact the dependent variable. It is widely used in various fields, including economics, finance, and data science.