R - Linear Regression

Simple linear regression is a statistical method for modeling the relationship between two variables: an independent variable and a dependent variable. The independent variable (also called the predictor or explanatory variable) is the one used to explain changes in the dependent variable. The dependent variable (also called the response) is the one being predicted. A regression model describes an association; by itself it does not prove that the independent variable causes the change.

In simple linear regression, the equation for the line is:

y = mx + b

where y is the dependent variable, x is the independent variable, m is the slope of the line, and b is the y-intercept.

The slope, m, is the expected change in the dependent variable for each one-unit increase in the independent variable. The y-intercept, b, is the predicted value of the dependent variable when the independent variable is 0.
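To make m and b concrete, the sketch below computes them with the standard least-squares formulas, m = sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2) and b = mean(y) - m * mean(x), and then checks them against lm(). The x and y values are hypothetical numbers chosen for illustration and are deliberately not perfectly linear.

```r
# Hypothetical illustrative data (not from the example later in this article)
x <- c(160, 170, 180, 190, 200)  # independent variable
y <- c(62, 68, 81, 88, 101)      # dependent variable

# Least-squares slope: sum of cross-deviations over sum of squared x-deviations
m <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)

# Least-squares intercept: the fitted line passes through (mean(x), mean(y))
b <- mean(y) - m * mean(x)

m              # 0.98
b              # -96.4
m * 185 + b    # predicted y at x = 185, using y = mx + b: 84.9

# lm() should return the same intercept and slope
coef(lm(y ~ x))
```

The closed-form formulas are shown only to explain what lm() estimates; in practice you would call lm() directly.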

Simple Linear Regression in R

Simple linear regression is performed in R with the lm() function. Its two most commonly used arguments are:

  1. formula: A formula that specifies the relationship between the two variables. The formula is in the following format:
y ~ x

where y is the dependent variable and x is the independent variable.

  2. data: A data frame that contains the data for the two variables.
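The two arguments can be seen in isolation in the sketch below. A formula is an ordinary R object whose variable names are looked up as columns of the data frame passed as data. The data frame df and its values are hypothetical, chosen for illustration.

```r
# Hypothetical data frame; the formula refers to these column names
df <- data.frame(height = c(160, 170, 180, 190, 200),
                 weight = c(62, 68, 81, 88, 101))

# A formula is an object in its own right: response on the left, predictor on the right
f <- weight ~ height
class(f)          # "formula"
all.vars(f)       # "weight" "height"

# Naming both arguments makes the call self-documenting
model <- lm(formula = f, data = df)
coef(model)       # intercept -96.4, slope 0.98
```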

For example, the following code performs a simple linear regression to model the relationship between the height and weight of a group of people:

height <- c(160, 170, 180, 190, 200)
weight <- c(60, 70, 80, 90, 100)
data <- data.frame(height, weight)
model <- lm(weight ~ height, data = data)
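These sample values lie exactly on the line weight = height - 100, which matters when reading the fitted model. The sketch below rebuilds the example so it runs on its own and uses the standard accessor functions coef(), fitted(), residuals(), and df.residual() to confirm that lm() recovers an intercept of -100 and a slope of 1 with zero residuals.

```r
# Rebuild the example so this block is self-contained
height <- c(160, 170, 180, 190, 200)
weight <- c(60, 70, 80, 90, 100)
data <- data.frame(height, weight)
model <- lm(weight ~ height, data = data)

class(model)                 # "lm"
coef(model)                  # intercept -100, slope 1 (up to rounding error)
fitted(model)                # 60 70 80 90 100: every point lies on the line
round(residuals(model), 10)  # all 0
df.residual(model)           # 3 = 5 observations - 2 estimated coefficients
```

The intercept of -100 is the predicted weight at a height of 0, which has no physical meaning here; it is an extrapolation far outside the observed heights.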

The output of the lm() function is an object that contains the results of the simple linear regression. You can use the summary() function to view the results of the simple linear regression:

summary(model)
Because the five example points lie exactly on the line weight = height - 100, the fit is perfect. The Coefficients table reports an estimate of -100 for (Intercept) and 1 for height, the residuals are zero apart from floating-point rounding error, Multiple R-squared is 1, and the residual standard error is based on 3 degrees of freedom (5 observations minus 2 estimated coefficients). For data this clean, summary() typically also warns "essentially perfect fit: summary may be unreliable", because the standard errors, t values, and p-values are all computed from residuals that are essentially zero.

The coefficient of the independent variable, height, is 1. This means that each 1-unit increase in height is associated with a 1-unit increase in weight. With real data, which always contain some scatter, two further quantities deserve attention. The Pr(>|t|) value for height tests whether the slope differs from zero; a value below 0.05 is conventionally read as statistically significant. Multiple R-squared is the proportion of the variation in weight explained by the model: an R-squared of 0.816, for example, would mean that 81.6% of the variation in weight is explained by the linear model. In this example it is 1, since the model explains all of the variation.
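To see the p-value and R-squared behave as they would with real measurements, the sketch below fits the same model to slightly noisy, hypothetical weights (made-up values, not from the original example) and extracts the key numbers from the summary object. The elements coefficients, r.squared, adj.r.squared, and sigma are standard components of the object that summary() returns for an lm fit.

```r
# Hypothetical noisy measurements (illustrative values, not real data)
df <- data.frame(height = c(160, 170, 180, 190, 200),
                 weight = c(62, 68, 81, 88, 101))
model <- lm(weight ~ height, data = df)
s <- summary(model)

# Coefficient table with columns Estimate, Std. Error, t value, Pr(>|t|)
s$coefficients

slope   <- s$coefficients["height", "Estimate"]   # 0.98
p_value <- s$coefficients["height", "Pr(>|t|)"]   # well below 0.05

s$r.squared       # about 0.986, i.e. 960.4 / 974
s$adj.r.squared   # about 0.981
s$sigma           # residual standard error, about 2.13 on 3 degrees of freedom
```

Here roughly 98.6% of the variation in the hypothetical weights is explained by height, and the slope is statistically significant at the 0.05 level.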

Here are some other things to keep in mind about simple linear regression in R:

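  1. The lm() function can also be used to perform multiple linear regression, which is a regression with multiple independent variables. Additional predictors are added to the right-hand side of the formula with +, as in y ~ x1 + x2.
  2. The predict() function returns predicted values of the dependent variable for new values of the independent variable, supplied as a data frame through its newdata argument.
  3. To plot the data and the fitted line, draw a scatter plot with plot(weight ~ height, data = data) and then add the line with abline(model). Calling plot(model) on the fitted object instead produces diagnostic plots of the residuals.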
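The three points above can be sketched together as follows. The data frame is hypothetical, and the age column is an invented second predictor used only to show the multiple-regression formula syntax.

```r
# Hypothetical data; 'age' is an invented second predictor for illustration
df <- data.frame(height = c(160, 170, 180, 190, 200),
                 age    = c(25, 32, 47, 51, 38),
                 weight = c(62, 68, 81, 88, 101))

# 1. Multiple regression: add predictors to the right-hand side with +
multi <- lm(weight ~ height + age, data = df)
length(coef(multi))   # 3: intercept, height, age

# 2. Prediction from the simple model; newdata needs a 'height' column
simple <- lm(weight ~ height, data = df)
new_people <- data.frame(height = c(165, 185))
predict(simple, newdata = new_people)   # 65.3 and 84.9

# 3. Scatter plot of the data with the fitted line drawn on top
plot(weight ~ height, data = df, main = "Weight vs. height")
abline(simple, col = "red")
```

With only five observations, a model with three coefficients leaves just two residual degrees of freedom, so the multiple-regression fit here is for demonstrating syntax, not for drawing conclusions.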

Conclusion

Simple linear regression in R, fitted with lm(), is a straightforward way to model the relationship between two continuous variables and to make predictions from it. The fitted slope quantifies how much the dependent variable is expected to change for each unit change in the independent variable.