22  Generalized linear models

22.1 Model

\[\begin{align} g(\mathbb{E}(Y|X)) &= \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k \\ \Leftrightarrow g(\mu) &= \beta^\top X \end{align}\]

There are three components to a GLM:

  1. Systematic component: The right-hand side is a linear combination of predictors: \(\beta_0 + \beta_1 x_1 + \dots + \beta_k x_k\).

  2. Random component: The outcome \(Y\) on the left-hand side must follow a distribution from the exponential family.

  3. Link function: The mean \(\mu = \mathbb{E}(Y|X)\), not \(Y\) itself, is wrapped in a link function \(g(\mu)\) that relates it to the linear predictor.
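The three components can be inspected in R through its family objects, which pair a distribution with a default link. The following is a minimal sketch using base R only; the variable names (`fam_poisson`, `eta`, `mu`) are illustrative and not part of the text above.

```r
# Each family object bundles a random component with a default link function.
fam_gaussian <- gaussian()
fam_binomial <- binomial()
fam_poisson  <- poisson()

# Name of the default link for each family
links <- c(gaussian = fam_gaussian$link,
           binomial = fam_binomial$link,
           poisson  = fam_poisson$link)
print(links)

# linkfun maps a mean to the linear-predictor scale; linkinv maps it back.
eta <- fam_poisson$linkfun(10)  # log(10)
mu  <- fam_poisson$linkinv(eta) # exp(log(10)), back on the mean scale
```

The link is applied to the mean, so `linkinv(linkfun(mu))` recovers the mean itself.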

22.3 Poisson regression

Poisson regression models are generalized linear models in which the outcome follows a Poisson distribution and the logarithm is the link function.
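Written out in the notation of Section 22.1, and assuming the usual log link, a Poisson regression can be sketched as:

\[\begin{align} Y \mid X &\sim \text{Poisson}(\mu) \\ \log(\mu) &= \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k \\ \Leftrightarrow \mu &= \exp(\beta_0 + \beta_1 x_1 + \dots + \beta_k x_k) \end{align}\]

Because the mean is an exponential of the linear predictor, increasing \(x_j\) by one unit while holding the other predictors fixed multiplies the expected count by \(\exp(\beta_j)\).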

22.4 Offset

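A rate is a count divided by the amount of exposure, such as time:

rate = count / unit time

An offset lets a Poisson model for counts be read as a model for the rate.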

week <- 1:7
amount_time <- c(0.85, 1.1, 0.85, 1.2, 0.95, 1.1, 0.9)
count <- c(90, 76, 37, 27, 19, 13, 9)
rate <- count / amount_time
df <- data.frame(week, amount_time, count, rate)

\[\log(\text{count}) = \beta_0 + \beta \cdot \text{week}\]

If we are fitting \(\text{rate}\) instead of \(\text{count}\):

\[\begin{align} \log(\text{rate}) = \log\left(\frac{\text{count}}{\text{amount time}}\right) &= \beta_0 + \beta \cdot \text{week} \\ \Leftrightarrow \log(\text{count}) - \log(\text{amount time}) &= \beta_0 + \beta \cdot \text{week} \\ \Leftrightarrow \log(\text{count}) &= \beta_0 + \beta \cdot \text{week} + \log(\text{amount time}) \end{align}\]

The \(\log(\text{amount time})\) term is an offset: it enters the linear predictor with its coefficient fixed at 1 instead of being estimated.

library(ggplot2)
mod <- glm(count ~ week + offset(log(amount_time)), family = poisson)
mod

Call:  glm(formula = count ~ week + offset(log(amount_time)), family = poisson)

Coefficients:
(Intercept)         week  
      5.081       -0.435  

Degrees of Freedom: 6 Total (i.e. Null);  5 Residual
Null Deviance:      164.3 
Residual Deviance: 2.313    AIC: 42.68

pred <- predict(mod, type = "response")
ggplot(df, aes(x = week, y = count)) +
  geom_point() +
  geom_line(aes(y = pred)) +
  theme_classic()
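The fitted coefficients can also be read on the rate scale. The sketch below refits the same model so that it runs on its own, then computes the weekly rate ratio and the predicted rate per unit of time. The names `new_df`, `rate_ratio`, `pred_rate`, and `manual_rate` are illustrative, and setting `amount_time = 1` is an assumption made here so that the offset \(\log(1) = 0\) drops out.

```r
# Data copied from the example above
week <- 1:7
amount_time <- c(0.85, 1.1, 0.85, 1.2, 0.95, 1.1, 0.9)
count <- c(90, 76, 37, 27, 19, 13, 9)
df <- data.frame(week, amount_time, count)

# Same model as above, with the data passed explicitly
mod <- glm(count ~ week + offset(log(amount_time)),
           family = poisson, data = df)

# Multiplicative change in the rate for each additional week
rate_ratio <- exp(coef(mod)[["week"]])

# Predicted rate per unit time: with amount_time = 1 the offset is log(1) = 0
new_df <- data.frame(week = 1:7, amount_time = 1)
pred_rate <- predict(mod, newdata = new_df, type = "response")

# The same rates computed directly from the coefficients
manual_rate <- exp(coef(mod)[["(Intercept)"]] +
                   coef(mod)[["week"]] * new_df$week)
all.equal(unname(pred_rate), manual_rate)
```

With the week coefficient reported above (about -0.435), `rate_ratio` is roughly \(\exp(-0.435) \approx 0.65\), meaning the rate falls by about 35% per week.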