R - Poisson Regression

Poisson regression is a type of regression analysis used to model count data, that is, data recording the number of times an event occurs. Examples include the number of cars that pass a certain point in an hour, the number of employees who call in sick on a given day, and the number of customers who visit a store in a month.

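Poisson regression is a generalized linear model (GLM) that assumes the dependent variable follows a Poisson distribution and, by default, models the logarithm of its expected value as a linear combination of the independent variables. The Poisson distribution is a discrete probability distribution that describes the probability of a given number of events occurring in a fixed interval of time or space, when events occur independently at a constant average rate.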
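As a small illustration of the distribution itself, the sketch below evaluates the Poisson probability mass function, P(Y = k) = lambda^k * exp(-lambda) / k!, by hand and compares it with R's built-in dpois(). The rate lambda = 3 and the range of counts are arbitrary values chosen only for illustration.

```r
# Assumed average number of events per interval (illustrative value)
lambda <- 3
# Counts at which to evaluate the probability
k <- 0:6

# Probability mass function written out from its definition
pmf_manual <- lambda^k * exp(-lambda) / factorial(k)

# The same probabilities from R's built-in Poisson density function
pmf_builtin <- dpois(k, lambda = lambda)

# Show both columns side by side
print(data.frame(k, pmf_manual, pmf_builtin))
```

The two columns should agree up to floating-point rounding.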

The Poisson regression model is fitted using a method called maximum likelihood estimation. Maximum likelihood estimation is a statistical method that finds the parameters of a model that maximize the likelihood of the data.
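To make the idea of maximum likelihood concrete, here is a minimal sketch that maximizes the Poisson log-likelihood directly with optim() and compares the result with glm(). The data are simulated, and the "true" coefficients (0.5 and 0.8), the sample size, and the names neg_loglik, fit_optim, and fit_glm are assumptions made up for this illustration.

```r
# Simulated data under assumed "true" parameters (illustrative only)
set.seed(1)
x <- runif(200, 0, 2)
y <- rpois(200, lambda = exp(0.5 + 0.8 * x))

# Negative Poisson log-likelihood for the model log(mu) = b0 + b1 * x
neg_loglik <- function(beta) {
  mu <- exp(beta[1] + beta[2] * x)
  -sum(dpois(y, lambda = mu, log = TRUE))
}

# Minimizing the negative log-likelihood maximizes the likelihood
fit_optim <- optim(c(0, 0), neg_loglik, method = "BFGS")

# glm() finds the same maximum using iteratively reweighted least squares
fit_glm <- glm(y ~ x, family = poisson)

print(fit_optim$par)
print(coef(fit_glm))
```

The two sets of estimates should agree closely; small differences come from the numerical tolerance of the general-purpose optimizer.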

Poisson Regression in R

Poisson regression can be performed in R using the glm() function. The three main arguments used for a Poisson regression are:

  1. formula: A formula that specifies the relationship between the dependent variable and the independent variables. The formula is in the following format:
y ~ x1 + x2 + x3 + ... + xn

where y is the dependent variable and x1, x2, x3, ..., xn are the independent variables.

  2. family: A family object that specifies the distribution of the dependent variable. For Poisson regression, the family object is poisson.
  3. data: A data frame that contains the data for the dependent variable and the independent variables.

For example, the following code performs a Poisson regression to predict the number of car accidents that occur in a given city, given the number of cars in the city:

cars <- c(1000, 2000, 3000, 4000, 5000)
accidents <- c(10, 20, 30, 40, 50)
data <- data.frame(cars, accidents)
model <- glm(accidents ~ cars, family = poisson, data = data)

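The glm() function returns a fitted model object. You can use the summary() function to view the coefficient estimates, their standard errors, and the associated significance tests. The output below shows the general layout; its figures are illustrative, and fitting the five-row example above produces different values: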

summary(model)
# Output:
# Call:
# glm(formula = accidents ~ cars, family = poisson, data = data)
#
# Deviance Residuals:
#     Min       1Q   Median       3Q      Max
# -2.0045  -0.8129  -0.3278   0.5285   2.3701
#
# Coefficients:
#             Estimate Std. Error z value Pr(>|z|)
# (Intercept)  -1.7642     0.6310  -2.808  0.00519 **
# cars          0.0118     0.0036   3.279  0.00105 **
#
# Log-Likelihood: -10.131 on 3 degrees of freedom
# AIC: 20.262

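The summary shows that the coefficient of the independent variable, cars, is statistically significant (p = 0.00105). This means that the number of cars in the city is associated with the number of car accidents that occur, although significance alone does not establish a causal effect. Because Poisson regression uses a log link, the coefficient acts on the logarithm of the expected count rather than on the count itself. The estimated coefficient of 0.0118 therefore means that each 1 unit increase in the number of cars multiplies the expected number of car accidents by exp(0.0118), which is approximately 1.012, an increase of about 1.2%.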
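Since glm() with family = poisson uses a log link by default, coefficients are reported on the log scale, and exponentiating them gives multiplicative effects on the expected count. The following sketch refits the example model, converts the coefficients to rate ratios, and predicts the expected count for a new observation. The names rate_ratios and new_city and the value of 3500 cars are hypothetical, chosen only for illustration.

```r
# Refit the example model so this block runs on its own
cars <- c(1000, 2000, 3000, 4000, 5000)
accidents <- c(10, 20, 30, 40, 50)
data <- data.frame(cars, accidents)
model <- glm(accidents ~ cars, family = poisson, data = data)

# Coefficients are on the log scale; exponentiate to get rate ratios
rate_ratios <- exp(coef(model))
print(rate_ratios)

# Expected number of accidents for a hypothetical city with 3500 cars
new_city <- data.frame(cars = 3500)
expected <- predict(model, newdata = new_city, type = "response")
print(expected)
```

With type = "response", predict() returns the expected count itself rather than its logarithm.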

Conclusion

Poisson regression is a valuable tool for modeling count data and understanding the factors that influence event occurrences. It is widely used in various fields for analyzing and predicting count-based outcomes.