library(tidyr)
library(odin)
library(bbmle)
library(ggplot2)
library(glue)
library(ggsci)
library(RColorBrewer)
df <- read.csv("data/flu_inc.csv")
8 Maximum likelihood estimation
8.1 Likelihood
\[\mathbb{L}(\theta | x) = \mathbb{P}(x | \theta)\]
The likelihood function is the probability of observing the data \(x\) given that \(\theta\) are the true parameters.
8.2 Methods
Maximum likelihood estimation follows four steps:
- Step 1. Write down the likelihood function \(\mathbb{L}(\theta)\). The likelihood function is a function of \(\theta\): the product of the \(n\) mass/density function terms evaluated at the observed values \(y_i\).
\[\mathbb{L}(\theta) = \prod_{i = 1}^{n} f_Y(y_i| \theta)\]
- Step 2. Take the natural log of the likelihood, \(\log \mathbb{L}(\theta)\), or log-likelihood. The product in the likelihood becomes a sum in the log-likelihood. See Section 1.2.2 for more details.
\[\log \mathbb{L}(\theta) = \sum_{i = 1}^{n} \log \left[ f_Y(y_i| \theta) \right]\]
- Step 3. Take the negative log-likelihood \(-\log \mathbb{L}(\theta)\). Because optimizers work by minimizing a function, we have to work with the negative log-likelihood. See Section 1.2.3 for more details.
\[-\log \mathbb{L}(\theta) = - \sum_{i = 1}^{n} \log \left[ f_Y(y_i| \theta) \right]\]
- Step 4. The optimizer tries different values of \(\theta\) in the parameter space \(\Theta\) to find the value for which the negative log-likelihood \(-\log \mathbb{L}(\theta)\) is minimized.
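The four steps above can be sketched with a toy example. Here we estimate the rate \(\lambda\) of a Poisson sample by minimizing the negative log-likelihood with `optim()`; the data and the names `y` and `nll` are made up for illustration.

```r
# Toy example of the four MLE steps: estimate a Poisson rate lambda
set.seed(42)
y <- rpois(n = 200, lambda = 4)   # simulated "observed" counts

# Steps 1-3: negative log-likelihood as a function of log(lambda)
# (optimizing on the log scale keeps lambda positive)
nll <- function(log_lambda) {
  -sum(dpois(x = y, lambda = exp(log_lambda), log = TRUE))
}

# Step 4: the optimizer searches the parameter space
fit <- optim(par = 0, fn = nll, method = "Brent", lower = -10, upper = 10)
exp(fit$par)   # the Poisson MLE equals the sample mean, mean(y)
```

For the Poisson the MLE of \(\lambda\) has a closed form (the sample mean), so this is a useful sanity check: the optimizer should land on `mean(y)`.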
8.3 Statistical distributions for fitting
Choosing the right distribution depends on the nature of your data:
Count data (number of cases)
- Poisson: when mean \(\approx\) variance.
- Negative Binomial: if data are overdispersed (variance > mean).
- Zero-Inflated Poisson/Negative Binomial: if there are excess zeros.
Non-negative continuous (positive real)
- Gamma: often used for waiting times or skewed data.
- Weibull: used in reliability and survival analysis.
- Log-Normal: when the log of the data is normally distributed.
Binary or proportions (0 or 1, or strictly between)
- Binomial: if data are 0 or 1, or aggregated counts out of a total.
- Beta: if values lie strictly in (0, 1).
Unbounded continuous (all real numbers)
- Gaussian (Normal): commonly assumed for symmetric data.
- Student’s t: if data have heavier tails or potential outliers.
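For count data, a quick diagnostic before choosing between Poisson and negative binomial is to compare the sample mean and variance. The counts below are hypothetical, purely for illustration.

```r
# Hypothetical daily case counts
counts <- c(0, 2, 1, 5, 0, 9, 3, 0, 14, 2, 1, 0)

mean(counts)   # sample mean
var(counts)    # sample variance

# If the variance is much larger than the mean, the data are
# overdispersed: a negative binomial (dnbinom) likelihood is usually
# a better choice than a Poisson (dpois)
var(counts) > mean(counts)
```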
8.4 Code
We will use the same data from the H1N1 influenza outbreak in an elementary school as in Section 7.2 and fit the deterministic SEIR model from Listing 2.1.
Looking at the original paper, the school has 370 students (Cauchemez et al., 2011). So we set the initial values as below.
S0 <- 369
E0 <- 0
I0 <- 1
R0 <- 0
- Step 1. Write down the likelihood function \(\mathbb{L}(\theta)\).
We assume the incidence data are generated from a normal distribution whose mean \(\mu_{inc}\) is the predicted incidence and whose standard deviation is \(\sigma_{inc}\).
\[\mathbb{L}(\theta) = \prod_{i = 1}^{n} \mathcal{N}(y_i \mid \mu_{inc}, \sigma_{inc})\]
We use the dnorm() function to define this likelihood.
dnorm(x = data, mean = pred, sd = sd_inc)
- Step 2. Take the natural log of the likelihood, \(\log \mathbb{L}(\theta)\); the product becomes a sum.
\[\log \mathbb{L}(\theta) = \sum_{i = 1}^{n} \log \left[ \mathcal{N}(y_i \mid \mu_{inc}, \sigma_{inc}) \right]\]
sum(dnorm(x = data, mean = pred, sd = sd_inc, log = TRUE))
- Step 3. Take the negative log-likelihood \(-\log \mathbb{L}(\theta)\).
\[-\log \mathbb{L}(\theta) = - \sum_{i = 1}^{n} \log \left[ \mathcal{N}(y_i \mid \mu_{inc}, \sigma_{inc}) \right]\]
-sum(dnorm(x = data, mean = pred, sd = sd_inc, log = TRUE))
- Step 4. Pass the negative log-likelihood \(-\log \mathbb{L}(\theta)\) to the optimizer.
mll <- function(beta, sigma, gamma, sd_inc) {
  # Make sure that parameters are positive
  beta <- exp(beta)
  sigma <- exp(sigma)
  gamma <- exp(gamma)
  sd_inc <- exp(sd_inc)

  pred <- seir_mod(beta = beta, sigma = sigma, gamma = gamma, S0 = S0, E0 = E0, I0 = I0, R0 = R0, times = 0:(length(df$inc) - 1))
  pred <- pred$Inc

  # Return the negative log-likelihood
  -sum(dnorm(x = df$inc, mean = pred, sd = sd_inc, log = TRUE))
}
starting_param_val <- list(beta = 0.004, sigma = 0.5, gamma = 0.5, sd_inc = 3)

estimates <- mle2(minuslogl = mll, start = lapply(starting_param_val, log), method = "Nelder-Mead")
summary(estimates)
Maximum likelihood estimation
Call:
mle2(minuslogl = mll, start = lapply(starting_param_val, log),
method = "Nelder-Mead")
Coefficients:
Estimate Std. Error z value Pr(z)
beta -5.063479 0.092641 -54.6567 < 2.2e-16 ***
sigma 0.708384 0.134498 5.2669 1.388e-07 ***
gamma 0.600399 0.092968 6.4581 1.060e-10 ***
sd_inc 0.530870 0.099628 5.3285 9.902e-08 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
-2 log L: 198.8602
params <- exp(coef(estimates))
params
beta sigma gamma sd_inc
0.006323524 2.030707475 1.822846603 1.700410622
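Since the parameters were optimized on the log scale, interval estimates can be back-transformed with exp() in the same way as the point estimates. Below is a minimal self-contained sketch of this idea using a toy normal model fitted with bbmle::mle2; the data and the names `y`, `nll`, and `fit` are illustrative, not part of the outbreak analysis.

```r
library(bbmle)

set.seed(1)
y <- rnorm(100, mean = 5, sd = 2)

# Fit mu and sd, with sd optimized on the log scale to keep it positive
nll <- function(mu, log_sd) {
  -sum(dnorm(x = y, mean = mu, sd = exp(log_sd), log = TRUE))
}
fit <- mle2(minuslogl = nll, start = list(mu = 0, log_sd = 0))

exp(coef(fit)["log_sd"])        # sd estimate back on the natural scale
exp(confint(fit)["log_sd", ])   # 95% profile CI, back-transformed
```

Note that profile-likelihood intervals from confint() can be slow or fail to converge for stiff models, so treat this as a sketch rather than a guaranteed recipe.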
Compare the model with data.
pred <- seir_mod(beta = params[1], sigma = params[2], gamma = params[3], S0 = S0, E0 = E0, I0 = I0, R0 = R0, times = df$day)
df_plot <- pred[,c("t", "Inc")]

# Compute 95% confidence interval
lwr <- qnorm(p = 0.025, mean = pred$Inc, sd = params[4])
upr <- qnorm(p = 0.975, mean = pred$Inc, sd = params[4])

my_palette <- brewer.pal(11, "PuOr")[c(10, 1, 4, 3, 8)]
ggplot(df_plot, aes(x = t, y = Inc)) +
geom_line(color = my_palette[3], linewidth = 1.2) +
geom_ribbon(aes(ymin = lwr, ymax = upr), fill = my_palette[3], alpha = 0.15) +
geom_point(data = df, aes(x = day, y = inc)) +
coord_cartesian(ylim = c(0, max(upr))) +
labs(x = "Day", y = "Incidence") +
theme_minimal()
It looks very similar to the model we fitted with LSE in Figure 7.1.
Let's compute \(R_0\). For a "closed population" SEIR model, \(R_0 = \frac{\beta}{\gamma} S_0\). Again, this is similar to the \(R_0\) estimated with LSE in Listing 7.1. The reason is that MLE is equivalent to LSE when we choose a normal likelihood; see Section 1.2.4 for more details.
rnum0 <- params[1] * S0 / params[3]
rnum0
beta
1.280075
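The equivalence between MLE with a normal likelihood and LSE mentioned above can be seen directly by expanding the negative log-likelihood:

\[-\log \mathbb{L}(\theta) = -\sum_{i = 1}^{n} \log \left[ \frac{1}{\sigma_{inc}\sqrt{2\pi}} \exp\left( -\frac{(y_i - \mu_i)^2}{2\sigma_{inc}^2} \right) \right] = n \log \left( \sigma_{inc}\sqrt{2\pi} \right) + \frac{1}{2\sigma_{inc}^2} \sum_{i = 1}^{n} (y_i - \mu_i)^2\]

For a fixed \(\sigma_{inc}\), the first term is constant, so minimizing the negative log-likelihood over the model parameters that determine the predictions \(\mu_i\) is the same as minimizing the sum of squared errors \(\sum_{i} (y_i - \mu_i)^2\).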