9 MCMC

9.1 MCMC

MCMC stands for Markov chain Monte Carlo:

Markov chain is the way to generate a sequence of random samples, where each sample is dependent only on the previous one.
Monte Carlo is using these random samples and simulations to approximate complex systems.

MCMC is a method to draw samples from distributions that we only know the shape or structure (up to a constant Definition A.5), but not the exact functional form Definition A.4. Outputs of MCMC is a set of samples from the posterior distribution.

Posterior, or the joint posterior distribution: is the probability distribution of parameters \(\theta\), conditioning on the model \(M\) and data \(D\).

\[\mathbb{P}(\theta | D, M) = \frac{\mathbb{P}(D | \theta, M) \, \mathbb{P}(\theta | M)}{\mathbb{P}(D | M)} \]

The denominator \(\mathbb{P}(D | M)\) is just a normalisation factor (to help the total probability sums or integrates to 1), MCMC doesn’t care about it. So in MCMC we usually use:

\[\underbrace{\mathbb{P}(\theta | D, M)}_{\text{Posterior}} \propto \underbrace{\mathbb{P}(D | \theta, M)}_{\text{Likelihood}} \, \underbrace{\mathbb{P}(\theta | M)}_{\text{Prior}}\]

9.2 Terminology

Iterations: The total number of samples being drawn in MCMC
Warm-up (Burn-in): The initial phase before the algorithm reaches equilibrium. Samples from this period are discarded as they may not accurately represent the target distribution.
Chains: Independent sequences of samples generated by the MCMC algorithm, each exploring the parameter space through multiple iterations.
Convergence:
Mixing:

9.3 Diagnostics

Trace plots: Graphs that display the sampled parameter values across iterations for each chain, helping to assess convergence and mixing.
Auto-correlation plots: Visualizations showing the correlation between samples at different lags, indicating how much each sample depends on previous ones.
Effective Sample Size (ESS): An estimate of the number of independent samples in the MCMC chains, reflecting the quality of the sampling.
Gelman-Rubin diagnostic (\(\hat{R}\)): A statistic comparing the variance between multiple chains to the variance within each chain, used to assess convergence. Values close to 1 suggest convergence.

9.4 MCMC in infectious disease modelling

Suppose that we have an iid (see Definition A.14) incubation periods dataset of \(n\) samples \(y = (y_1, y_2, \cdots, y_n)\). We assume these data are independent draws from a Gamma distribution with shape \(\alpha\) and rate \(\beta\).

pdf of Gamma distribution

A random variable \(x\) that is gamma-distributed with shape \(\alpha\) and rate \(\beta\) has this pdf:

\[f(x | \alpha, \beta) = \frac{\beta^{\alpha}}{\Gamma(\alpha)} x^{\alpha - 1} e^{-\beta x}\]

\(f(x | \alpha, \beta)\) or sometimes written as \(f(x; \alpha, \beta)\) (Wikipedia) reads as: probability density of random variable \(x\) given that we know parameters \(\alpha\) and \(\beta\).

Step 1. Write the likelihood function of the dataset.

Likelihood of a data point is \(f(y_1| \alpha, \beta)\).

Likelihood of the dataset is:

\[\begin{align} f(y| \alpha, \beta) &= f(y_1| \alpha, \beta) \times f(y_2| \alpha, \beta) \times \cdots \times f(y_n| \alpha, \beta) \\ &= \prod_{i = 1}^{n} \frac{\beta^{\alpha}}{\Gamma(\alpha)} y_i^{\alpha - 1} e^{-\beta y_i} \\ &= \frac{\beta^{n\alpha}}{\Gamma(\alpha)^n} \prod_{i = 1}^{n} y_i^{\alpha - 1} e^{-\beta y_i} \end{align}\]

Step 2. Assign prior distributions to the parameters of interest.

Both shape \(\alpha\) and rate \(\beta\) are positive so a natural to distribution to represent our prior beliefs them is a Gamma distribution. We will also assume that a priori (i.e. before we see the data) that \(\alpha\) and \(\beta\) are independent.

\[\alpha \sim \text{Gamma}(\lambda_\alpha, \nu_\alpha)\]

The pdf of \(\alpha\) is:

\[f(\alpha| \lambda_\alpha, \nu_\alpha) = \frac{\nu_\alpha^{\lambda_\alpha}}{\Gamma(\lambda_\alpha)} \alpha^{\lambda_\alpha - 1} e^{-\nu_\alpha \alpha}\]

Because \(\frac{\nu_\alpha^{\lambda_\alpha}}{\Gamma(\lambda_\alpha)}\) does not depends on \(\alpha\).

\[\begin{align} f(\alpha| \lambda_\alpha, \nu_\alpha) &= \frac{\nu_\alpha^{\lambda_\alpha}}{\Gamma(\lambda_\alpha)} \alpha^{\lambda_\alpha - 1} e^{-\nu_\alpha \alpha} \\ &\propto \alpha^{\lambda_\alpha - 1} e^{-\nu_\alpha \alpha} \end{align}\]

Similarly:

\[\beta \sim \text{Gamma}(\lambda_\beta, \nu_\beta)\]

\[\begin{align} f(\beta| \lambda_\beta, \nu_\beta) &= \frac{\nu_\beta^{\lambda_\beta}}{\Gamma(\lambda_\beta)} \beta^{\lambda_\beta - 1} e^{-\nu_\beta \beta} \\ &\propto \beta^{\lambda_\beta - 1} e^{-\nu_\beta \beta} \end{align}\]

Step 3. Write the joint posterior distribution.

\[\begin{align} f(\alpha, \beta | y) &\propto f(y| \alpha, \beta) \times f(\alpha) \times f(\beta) \\ &= f(y| \alpha, \beta) \times f(\alpha| \lambda_\alpha, \nu_\alpha) \times f(\beta| \lambda_\beta, \nu_\beta) \\ &= \frac{\beta^{n\alpha}}{\Gamma(\alpha)^n} \prod_{i = 1}^{n} y_i^{\alpha - 1} e^{-\beta y_i} \times \alpha^{\lambda_\alpha - 1} e^{-\nu_\alpha \alpha} \times \beta^{\lambda_\beta - 1} e^{-\nu_\beta \beta} \end{align}\]

Step 4. Derive the full conditionals.

The full conditional