<- 1:7
week <- c(0.85, 1.1, 0.85, 1.2, 0.95, 1.1, 0.9)
amount_time <- c(90, 76, 37, 27, 19, 13, 9)
count <- count / amount_time
rate <- data.frame(week, amount_time, count, rate) df
22 Generalized linear models
22.1 Model
\[\begin{align} g(\mathbb{E}(Y|X)) &= \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k \\ \Leftrightarrow g(\mu) &= \beta^\top X \end{align}\]
There are three components to a GLM:
Systematic component: The right-hand side is a linear combination of predictors: \(\beta_0 + \beta_1 x_1 + \dots + \beta_k x_k\).
Random component: The outcome \(Y\) on the left-hand side, must follow a distribution from the exponential family.
Link function: \(Y\) is wrapped in a link function \(g(Y)\) that relates the mean \(\mathbb{E}(Y)\) to the linear predictors.
22.2 Canonical link function
A canonical link function is the link function that directly connects the mean \(\mu\) of a distribution to its natural (canonical) parameter \(\theta\) in the exponential family form. Which means we want:
\[g(\mu) = \beta^\top X = \theta\]
For instance, suppose \(Y\) follows a Poisson distribution. We know from the exponential‐family representation Table 27.1, that the canonical parameter \(\theta = \log(\lambda)\), where \(\lambda = \mu\) because \(\mathbb{E}(Y) = \lambda\) for Poisson.
Therefore, a log link function is the canonical link for a Poisson distribution:
\[g(\mu) = \log(\mu) = \log(\lambda) = \theta\]
Likewise, you can ask ChatGPT to prove that a logit link is the canonical link if \(Y\) follows a Bernoulli, or Binomial distribution, and many other cases.
22.3 Poisson regression
Poisson regression models are generalized linear models with the logarithm as the link function.
22.4 Offset
rate = count / unit time
offset converts a count to rate
\[\log(\text{count}) = \beta_0 + \beta \cdot \text{week}\]
If we are fitting \(\text{rate}\) instead of \(\text{count}\):
\[\begin{align} \log(\text{rate}) = \log\left(\frac{\text{count}}{\text{amount time}}\right) &= \beta_0 + \beta \cdot \text{week} \\ \Leftrightarrow \log(\text{count}) - \log(\text{amount time}) &= \beta_0 + \beta \cdot \text{week} \\ \Leftrightarrow \log(\text{count}) &= \beta_0 + \beta \cdot \text{week} + \log(\text{amount time}) \end{align}\]
The \(\log(\text{amount time})\) is an offset.
library(ggplot2)
<- glm(count ~ week + offset(log(amount_time)), family = poisson)
mod mod
Call: glm(formula = count ~ week + offset(log(amount_time)), family = poisson)
Coefficients:
(Intercept) week
5.081 -0.435
Degrees of Freedom: 6 Total (i.e. Null); 5 Residual
Null Deviance: 164.3
Residual Deviance: 2.313 AIC: 42.68
<- predict(mod, type = "response")
pred ggplot(df, aes(x = week, y = count)) +
geom_point() +
geom_line(aes(y = pred)) +
theme_classic()