9 MCMC
9.1 MCMC
MCMC stands for Markov chain Monte Carlo:
- A Markov chain is a way to generate a sequence of random samples, where each sample depends only on the previous one.
- Monte Carlo refers to using these random samples and simulations to approximate complex systems.
MCMC is a method for drawing samples from distributions for which we know only the shape or structure (up to a constant, Definition A.5), but not the exact functional form (Definition A.4). The output of MCMC is a set of samples from the posterior distribution.
The posterior, or joint posterior distribution, is the probability distribution of the parameters \(\theta\), conditional on the model \(M\) and the data \(D\):
\[\mathbb{P}(\theta | D, M) = \frac{\mathbb{P}(D | \theta, M) \, \mathbb{P}(\theta | M)}{\mathbb{P}(D | M)} \]
The denominator \(\mathbb{P}(D | M)\) is just a normalisation constant (it makes the total probability sum or integrate to 1), and MCMC does not depend on it. So in MCMC we usually work with:
\[\underbrace{\mathbb{P}(\theta | D, M)}_{\text{Posterior}} \propto \underbrace{\mathbb{P}(D | \theta, M)}_{\text{Likelihood}} \, \underbrace{\mathbb{P}(\theta | M)}_{\text{Prior}}\]
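To see why only the unnormalised posterior is needed, consider a minimal random-walk Metropolis sketch in Python (the target density and all names here are illustrative, not from the text): the unknown normalising constant appears in both the numerator and denominator of the acceptance ratio, so it cancels.

```python
import numpy as np

rng = np.random.default_rng(42)

def log_target(theta):
    # Unnormalised log-density: a standard normal shape, deliberately
    # ignoring the 1/sqrt(2*pi) normalising constant.
    return -0.5 * theta**2

def metropolis(log_target, theta0, n_iter=10_000, step=1.0):
    """Random-walk Metropolis: a proposal is accepted with probability
    min(1, target ratio); any constant factor of the target cancels
    in that ratio, so the normalisation is never needed."""
    theta = theta0
    samples = np.empty(n_iter)
    for i in range(n_iter):
        proposal = theta + rng.normal(scale=step)
        # Compare on the log scale; log(u) < log-ratio <=> u < ratio
        if np.log(rng.uniform()) < log_target(proposal) - log_target(theta):
            theta = proposal
        samples[i] = theta
    return samples

samples = metropolis(log_target, theta0=0.0)
print(samples.mean(), samples.std())  # ~0 and ~1 for the normal shape
```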
9.2 Terminology
- Iterations: The total number of samples drawn in MCMC.
- Warm-up (Burn-in): The initial phase before the algorithm reaches equilibrium. Samples from this period are discarded as they may not accurately represent the target distribution.
- Chains: Independent sequences of samples generated by the MCMC algorithm, each exploring the parameter space through multiple iterations.
- Convergence: The point at which the chains have reached their stationary distribution, i.e. the target (posterior) distribution, so subsequent samples are representative of it.
- Mixing: How efficiently a chain moves around the parameter space. A well-mixed chain explores the target distribution quickly, with low correlation between successive samples.
9.3 Diagnostics
- Trace plots: Graphs that display the sampled parameter values across iterations for each chain, helping to assess convergence and mixing.
- Auto-correlation plots: Visualizations showing the correlation between samples at different lags, indicating how much each sample depends on previous ones.
- Effective Sample Size (ESS): An estimate of the number of independent samples in the MCMC chains, reflecting the quality of the sampling.
- Gelman-Rubin diagnostic (\(\hat{R}\)): A statistic comparing the variance between multiple chains to the variance within each chain, used to assess convergence. Values close to 1 suggest convergence.
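To make the \(\hat{R}\) statistic concrete, here is a minimal numpy sketch of the basic (non-split) Gelman-Rubin diagnostic; the function name and example are ours for illustration, and real analyses would typically rely on an established package.

```python
import numpy as np

def gelman_rubin(chains):
    """Basic (non-split) Gelman-Rubin R-hat for one parameter.

    chains: array of shape (m_chains, n_iterations).
    Compares between-chain variance B to within-chain variance W;
    values close to 1 suggest the chains agree (convergence).
    """
    chains = np.asarray(chains)
    m, n = chains.shape
    chain_means = chains.mean(axis=1)
    W = chains.var(axis=1, ddof=1).mean()   # mean within-chain variance
    B = n * chain_means.var(ddof=1)         # between-chain variance
    var_hat = (n - 1) / n * W + B / n       # pooled variance estimate
    return np.sqrt(var_hat / W)

# Example: two chains drawn from the same distribution
rng = np.random.default_rng(0)
print(gelman_rubin(rng.normal(size=(2, 1000))))   # close to 1
```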
9.4 MCMC in infectious disease modelling
Suppose that we have an iid (see Definition A.14) dataset of \(n\) incubation periods \(y = (y_1, y_2, \cdots, y_n)\). We assume these data are independent draws from a Gamma distribution with shape \(\alpha\) and rate \(\beta\).
A random variable \(x\) that is gamma-distributed with shape \(\alpha\) and rate \(\beta\) has this pdf:
\[f(x | \alpha, \beta) = \frac{\beta^{\alpha}}{\Gamma(\alpha)} x^{\alpha - 1} e^{-\beta x}\]
\(f(x | \alpha, \beta)\) or sometimes written as \(f(x; \alpha, \beta)\) (Wikipedia) reads as: probability density of random variable \(x\) given that we know parameters \(\alpha\) and \(\beta\).
Step 1. Write the likelihood function of the dataset.
The likelihood of a single data point is \(f(y_1| \alpha, \beta)\).
Because the samples are independent, the likelihood of the dataset is the product of the individual likelihoods:
\[\begin{align} f(y| \alpha, \beta) &= f(y_1| \alpha, \beta) \times f(y_2| \alpha, \beta) \times \cdots \times f(y_n| \alpha, \beta) \\ &= \prod_{i = 1}^{n} \frac{\beta^{\alpha}}{\Gamma(\alpha)} y_i^{\alpha - 1} e^{-\beta y_i} \\ &= \frac{\beta^{n\alpha}}{\Gamma(\alpha)^n} \prod_{i = 1}^{n} y_i^{\alpha - 1} e^{-\beta y_i} \end{align}\]
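As a sketch of how this likelihood is evaluated in practice, we work on the log scale (multiplying many small densities underflows quickly) using scipy. Note that scipy parameterises the Gamma by shape and scale, so the rate \(\beta\) enters as scale \(= 1/\beta\). The simulated data and parameter values below are illustrative only.

```python
import numpy as np
from scipy import stats

def log_likelihood(y, alpha, beta):
    """Gamma log-likelihood of the whole dataset.

    scipy's gamma uses shape `a` and `scale`, so rate beta
    enters as scale = 1 / beta.
    """
    return stats.gamma.logpdf(y, a=alpha, scale=1.0 / beta).sum()

# Example with simulated incubation periods (days)
rng = np.random.default_rng(1)
y = rng.gamma(shape=2.0, scale=1.0 / 0.4, size=50)  # true alpha=2, beta=0.4
print(log_likelihood(y, alpha=2.0, beta=0.4))
```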
Step 2. Assign prior distributions to the parameters of interest.
Both the shape \(\alpha\) and the rate \(\beta\) are positive, so a natural distribution to represent our prior beliefs about them is a Gamma distribution. We will also assume that, a priori (i.e. before we see the data), \(\alpha\) and \(\beta\) are independent.
\[\alpha \sim \text{Gamma}(\lambda_\alpha, \nu_\alpha)\]
The pdf of \(\alpha\) is:
\[f(\alpha| \lambda_\alpha, \nu_\alpha) = \frac{\nu_\alpha^{\lambda_\alpha}}{\Gamma(\lambda_\alpha)} \alpha^{\lambda_\alpha - 1} e^{-\nu_\alpha \alpha}\]
Because \(\frac{\nu_\alpha^{\lambda_\alpha}}{\Gamma(\lambda_\alpha)}\) does not depend on \(\alpha\), we can drop it when working up to proportionality:
\[\begin{align} f(\alpha| \lambda_\alpha, \nu_\alpha) &= \frac{\nu_\alpha^{\lambda_\alpha}}{\Gamma(\lambda_\alpha)} \alpha^{\lambda_\alpha - 1} e^{-\nu_\alpha \alpha} \\ &\propto \alpha^{\lambda_\alpha - 1} e^{-\nu_\alpha \alpha} \end{align}\]
Similarly:
\[\beta \sim \text{Gamma}(\lambda_\beta, \nu_\beta)\]
\[\begin{align} f(\beta| \lambda_\beta, \nu_\beta) &= \frac{\nu_\beta^{\lambda_\beta}}{\Gamma(\lambda_\beta)} \beta^{\lambda_\beta - 1} e^{-\nu_\beta \beta} \\ &\propto \beta^{\lambda_\beta - 1} e^{-\nu_\beta \beta} \end{align}\]
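A minimal Python sketch of the corresponding log-prior follows (the default hyperparameter values of 1.0 are placeholders, not values from the text):

```python
from scipy import stats

def log_prior(alpha, beta,
              lam_alpha=1.0, nu_alpha=1.0,   # placeholder hyperparameters
              lam_beta=1.0, nu_beta=1.0):
    """Independent Gamma(lambda, nu) priors on alpha and beta.

    Shape-rate parameterisation: scipy uses scale, so scale = 1/nu.
    """
    return (stats.gamma.logpdf(alpha, a=lam_alpha, scale=1.0 / nu_alpha)
            + stats.gamma.logpdf(beta, a=lam_beta, scale=1.0 / nu_beta))
```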
Step 3. Write the joint posterior distribution.
\[\begin{align} f(\alpha, \beta | y) &\propto f(y| \alpha, \beta) \times f(\alpha) \times f(\beta) \\ &= f(y| \alpha, \beta) \times f(\alpha| \lambda_\alpha, \nu_\alpha) \times f(\beta| \lambda_\beta, \nu_\beta) \\ &= \frac{\beta^{n\alpha}}{\Gamma(\alpha)^n} \prod_{i = 1}^{n} y_i^{\alpha - 1} e^{-\beta y_i} \times \alpha^{\lambda_\alpha - 1} e^{-\nu_\alpha \alpha} \times \beta^{\lambda_\beta - 1} e^{-\nu_\beta \beta} \end{align}\]
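Putting the pieces together, here is a self-contained sketch of the unnormalised log joint posterior (again with placeholder hyperparameters). This is exactly the quantity an MCMC sampler needs, since the normalising constant \(\mathbb{P}(D|M)\) is not required:

```python
import numpy as np
from scipy import stats

def log_posterior(y, alpha, beta,
                  lam_alpha=1.0, nu_alpha=1.0,   # placeholder hyperparameters
                  lam_beta=1.0, nu_beta=1.0):
    """Unnormalised log joint posterior: log-likelihood plus log-priors."""
    if alpha <= 0 or beta <= 0:
        return -np.inf                            # outside the support
    return (stats.gamma.logpdf(y, a=alpha, scale=1.0 / beta).sum()
            + stats.gamma.logpdf(alpha, a=lam_alpha, scale=1.0 / nu_alpha)
            + stats.gamma.logpdf(beta, a=lam_beta, scale=1.0 / nu_beta))
```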
Step 4. Derive the full conditionals.
The full conditional of a parameter is its distribution given the data and all the other parameters. Reading off the joint posterior as a function of \(\beta\) only (treating \(\alpha\) as fixed):
\[\begin{align} f(\beta | \alpha, y) &\propto \beta^{n\alpha} e^{-\beta \sum_{i = 1}^{n} y_i} \times \beta^{\lambda_\beta - 1} e^{-\nu_\beta \beta} \\ &= \beta^{n\alpha + \lambda_\beta - 1} e^{-\left(\sum_{i = 1}^{n} y_i + \nu_\beta\right) \beta} \end{align}\]
This is the kernel of a \(\text{Gamma}\left(n\alpha + \lambda_\beta, \sum_{i = 1}^{n} y_i + \nu_\beta\right)\) distribution, so \(\beta\) can be sampled directly (a Gibbs step). As a function of \(\alpha\) only:
\[f(\alpha | \beta, y) \propto \frac{\beta^{n\alpha}}{\Gamma(\alpha)^n} \left(\prod_{i = 1}^{n} y_i\right)^{\alpha - 1} \alpha^{\lambda_\alpha - 1} e^{-\nu_\alpha \alpha}\]
This is not a standard distribution, so \(\alpha\) is typically updated with a Metropolis-Hastings step inside the Gibbs sampler (Metropolis-within-Gibbs).
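Below is a minimal Metropolis-within-Gibbs sketch in Python, assuming the setup above; the starting values, proposal step size, and hyperparameters (all set to 1) are illustrative choices, not values from the text. \(\beta\) is drawn exactly from its Gamma full conditional, and \(\alpha\) is updated with a random-walk Metropolis step.

```python
import numpy as np
from scipy.special import gammaln

rng = np.random.default_rng(2)

def log_cond_alpha(alpha, beta, y, lam_a=1.0, nu_a=1.0):
    """Log full conditional of alpha, up to an additive constant."""
    if alpha <= 0:
        return -np.inf
    n = len(y)
    return (n * alpha * np.log(beta) - n * gammaln(alpha)
            + (alpha - 1) * np.sum(np.log(y))
            + (lam_a - 1) * np.log(alpha) - nu_a * alpha)

def gibbs_sampler(y, n_iter=5000, lam_a=1.0, nu_a=1.0,
                  lam_b=1.0, nu_b=1.0, step=0.2):
    alpha, beta = 1.0, 1.0                 # arbitrary starting values
    draws = np.empty((n_iter, 2))
    for i in range(n_iter):
        # Gibbs step: beta | alpha, y ~ Gamma(n*alpha + lam_b, sum(y) + nu_b)
        # (numpy's gamma takes shape and scale, so scale = 1/rate)
        beta = rng.gamma(len(y) * alpha + lam_b, 1.0 / (np.sum(y) + nu_b))
        # Metropolis step for alpha (no closed-form full conditional)
        prop = alpha + rng.normal(scale=step)
        if (np.log(rng.uniform())
                < log_cond_alpha(prop, beta, y, lam_a, nu_a)
                - log_cond_alpha(alpha, beta, y, lam_a, nu_a)):
            alpha = prop
        draws[i] = alpha, beta
    return draws

y = rng.gamma(shape=2.0, scale=1.0 / 0.4, size=100)  # simulated data
draws = gibbs_sampler(y)
print(draws[1000:].mean(axis=0))   # posterior means after warm-up
```

In practice you would run several chains from dispersed starting values and check the diagnostics from Section 9.3 (trace plots, \(\hat{R}\), ESS) before using the samples.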