8  MCMC

8.1 Monte Carlo simulation

Monte Carlo simulation is a method that uses repeated random sampling to obtain numerical results.

We will use Monte Carlo simulation to create a circle and estimate \(\pi\).

8.1.1 Create a circle

To create a circle of radius \(r = 1\) centered at \((0, 0)\), we first define a square with side length 2, extending from \([-1, 1]\) in both \(x\) and \(y\) directions. We generate random points uniformly within this square by drawing \(x\) and \(y\) values from the interval \([-1, 1]\).

set.seed(7)

n_points <- 1000
xs <- runif(n_points, min = -1, max = 1)
ys <- runif(n_points, min = -1, max = 1)

The distance from each point to the center \((0, 0)\) is \(\sqrt{x^2 + y^2}\). A point lies within the circle if \(\sqrt{x^2 + y^2} \leq 1\).

in_circle <- xs ^ 2 + ys ^ 2 <= 1
df <- data.frame(n_points = 1:n_points, xs, ys, in_circle)

8.1.2 Estimate \(\pi\)

The area of a circle is \(\pi r^2\).

The area of the bounding square is \((2r)^2 = 4r^2\).

Dividing the area of the circle by the area of the square, we get a ratio:

\[\frac{\text{area of circle}}{\text{area of square}} = \frac{\pi r^2}{4r^2} = \frac{\pi}{4}\]

Therefore,

\[\pi = 4 \times \frac{\text{area of circle}}{\text{area of square}}\]

If we approximate the area of circle by the number of simulated points within the circle, and the area of square by the total number of simulated points, we have:

\[\pi \approx 4 \times \frac{\text{simulated points within circle}}{\text{total number of simulated points}}\]

8.1.2.1 One simulation

Here we we run a single simulation to estimate \(\pi\). Let explore how the approximation of \(\pi\) changes as more points are sampled.

df$my_pi <- 4 * cumsum(df$in_circle) / df$n_points

As more points are sampled, the approximation of \(\pi\) improves. In the left figure, you can see that increasing the number of points better estimates the areas of the circle and square, leading to a closer approximation of the true value of \(\pi\).

8.1.2.2 Multiple simulations

Instead of a single simulation with many points, we run \(N\) smaller simulations. In each simulation, we sample a small number of points to estimate \(\pi_i\). The final estimation is then averaged across simulations:

\[\pi = \frac{1}{N}\sum_{i = 1}^{N} \pi_i\]

As shown in the one-simulation section, increasing the number of points improves accuracy. However, even with fewer points per simulation (e.g., 50), running many simulations and averaging the results enhances the approximation of \(\pi\). This technique leverages the law of large numbers, where the average of repeated estimates converges to the true value.

8.1.3 MCMC in infectious disease modelling

Suppose that we have an iid (see Definition A.8) incubation periods dataset of \(n\) samples \(y = (y_1, y_2, \cdots, y_n)\). We assume these data are independent draws from a Gamma distribution with shape \(\alpha\) and rate \(\beta\).

A random variable \(x\) that is gamma-distributed with shape \(\alpha\) and rate \(\beta\) has this pdf:

\[f(x | \alpha, \beta) = \frac{\beta^{\alpha}}{\Gamma(\alpha)} x^{\alpha - 1} e^{-\beta x}\]

\(f(x | \alpha, \beta)\) or sometimes written as \(f(x; \alpha, \beta)\) (Wikipedia) reads as: probability density of random variable \(x\) given that we know parameters \(\alpha\) and \(\beta\).

Step 1. Write the likelihood function of the dataset.

Likelihood of a data point is \(f(y_1| \alpha, \beta)\).

Likelihood of the dataset is:

\[\begin{align} f(y| \alpha, \beta) &= f(y_1| \alpha, \beta) \times f(y_2| \alpha, \beta) \times \cdots \times f(y_n| \alpha, \beta) \\ &= \prod_{i = 1}^{n} \frac{\beta^{\alpha}}{\Gamma(\alpha)} y_i^{\alpha - 1} e^{-\beta y_i} \\ &= \frac{\beta^{n\alpha}}{\Gamma(\alpha)^n} \prod_{i = 1}^{n} y_i^{\alpha - 1} e^{-\beta y_i} \end{align}\]

Step 2. Assign prior distributions to the parameters of interest.

Both shape \(\alpha\) and rate \(\beta\) are positive so a natural to distribution to represent our prior beliefs them is a Gamma distribution. We will also assume that a priori (i.e. before we see the data) that \(\alpha\) and \(\beta\) are independent.

\[\alpha \sim \text{Gamma}(\lambda_\alpha, \nu_\alpha)\]

The pdf of \(\alpha\) is:

\[f(\alpha| \lambda_\alpha, \nu_\alpha) = \frac{\nu_\alpha^{\lambda_\alpha}}{\Gamma(\lambda_\alpha)} \alpha^{\lambda_\alpha - 1} e^{-\nu_\alpha \alpha}\]

Because \(\frac{\nu_\alpha^{\lambda_\alpha}}{\Gamma(\lambda_\alpha)}\) does not depends on \(\alpha\).

\[\begin{align} f(\alpha| \lambda_\alpha, \nu_\alpha) &= \frac{\nu_\alpha^{\lambda_\alpha}}{\Gamma(\lambda_\alpha)} \alpha^{\lambda_\alpha - 1} e^{-\nu_\alpha \alpha} \\ &\propto \alpha^{\lambda_\alpha - 1} e^{-\nu_\alpha \alpha} \end{align}\]

Similarly:

\[\beta \sim \text{Gamma}(\lambda_\beta, \nu_\beta)\]

\[\begin{align} f(\beta| \lambda_\beta, \nu_\beta) &= \frac{\nu_\beta^{\lambda_\beta}}{\Gamma(\lambda_\beta)} \beta^{\lambda_\beta - 1} e^{-\nu_\beta \beta} \\ &\propto \beta^{\lambda_\beta - 1} e^{-\nu_\beta \beta} \end{align}\]

Step 3. Write the joint posterior distribution.

\[\begin{align} f(\alpha, \beta | y) &\propto f(y| \alpha, \beta) \times f(\alpha) \times f(\beta) \\ &= f(y| \alpha, \beta) \times f(\alpha| \lambda_\alpha, \nu_\alpha) \times f(\beta| \lambda_\beta, \nu_\beta) \\ &= \frac{\beta^{n\alpha}}{\Gamma(\alpha)^n} \prod_{i = 1}^{n} y_i^{\alpha - 1} e^{-\beta y_i} \times \alpha^{\lambda_\alpha - 1} e^{-\nu_\alpha \alpha} \times \beta^{\lambda_\beta - 1} e^{-\nu_\beta \beta} \end{align}\]

Step 4. Derive the full conditionals.

The full conditional