Stochastic models

Concepts of stochasticity

The dynamics of every population have both deterministic (predictable) and stochastic (unpredictable) forces (Lande et al., 2003). These two processes act at the same time, so real-world data often look “noisy”. We can split that noise into three basic forms of stochasticity, which fall into two groups:

  1. Process noise: the randomness within the process that generates the data. It has two forms of stochasticity:

    • Demographic stochasticity (statistical noise): random events which are usually conceived as being independent among individuals (Lande et al., 2003). For example, each individual might die or survive with a certain probability per unit time (Lande et al., 2003). Because this stochasticity operates independently among individuals, it tends to average out in large populations and has a greater impact on small populations (Lande et al., 2003).

    • Environmental stochasticity: random temporal changes that affect all individuals equally, no matter the population size (Lande et al., 2003). It overtakes demographic stochasticity in large populations (Ganyani et al., 2021). Examples include external unpredictable events (temperature, rainfall, humidity) and individual variation (variability in contact patterns, superspreaders). These factors translate into fluctuations in the transmission parameter, often captured by overdispersion in Poisson models (Ganyani et al., 2021).

  2. Sampling/observational/measurement error: refers to errors caused by observation or sampling methods (Dennis et al., 2006), because real data collection is never perfect.

A hidden Markov process (also known as a state-space model) (Endo et al., 2019) can illustrate these sources of noise.

\[x_{t + 1} = \underbrace{f(x_t)}_{\text{deterministic}} + \underbrace{\varepsilon_{\text{process}}}_{\text{stochastic}}\]

\[y_{t + 1} = x_{t + 1} + \underbrace{\varepsilon_{\text{observation}}}_{\text{stochastic}}\]

Process noise \(\varepsilon_{\text{process}}\) is propagated over time because the state feeds into itself:

\[\begin{align}x_{t + 2} &= f(x_{t + 1}) + \varepsilon_{\text{process}} \\ &= f(f(x_t) + \varepsilon_{\text{process}}) + \varepsilon_{\text{process}} \end{align}\]

Observation noise \(\varepsilon_{\text{observation}}\) only shows up once when we measure the system. It doesn’t affect later steps:

\[y_{t + 2} = x_{t + 2} + \varepsilon_{\text{observation}}\]
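
To make this difference concrete, here is a minimal sketch in R, assuming a simple linear state update \(f(x) = 0.9x\) and Gaussian noise (both illustrative choices): process noise is carried forward through the state, while observation noise is added fresh at each measurement.

set.seed(1)

n_steps   <- 100
sigma_pro <- 0.5   # sd of process noise (illustrative)
sigma_obs <- 2.0   # sd of observation noise (illustrative)

x <- numeric(n_steps)   # latent states
x[1] <- 10

for (t in 1:(n_steps - 1)) {
  # process noise enters the state and is carried forward to every later step
  x[t + 1] <- 0.9 * x[t] + rnorm(1, sd = sigma_pro)
}

# observation noise is added once per measurement and does not propagate
y <- x + rnorm(n_steps, sd = sigma_obs)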

A general stochastic SIR model

Most stochastic compartmental epidemic models are Markov processes, whereby the future state of the population at time \(t + 1\) depends only on the current state at time \(t\) (Ganyani et al., 2021). Recall that a deterministic SIR model follows:

\[\frac{dS}{dt} = -\frac{\beta SI}{N}\]

\[\frac{dI}{dt} = \frac{\beta SI}{N} - \gamma I\]

\[\frac{dR}{dt} = \gamma I\]
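
For later comparison, here is a minimal deterministic implementation of this model, a sketch using the deSolve package (the initial conditions and parameter values are illustrative):

library(deSolve)

sir_ode <- function(time, state, parms) {
  with(as.list(c(state, parms)), {
    N  <- S + I + R
    dS <- -beta * S * I / N
    dI <-  beta * S * I / N - gamma * I
    dR <-  gamma * I
    list(c(dS, dI, dR))
  })
}

out <- ode(
  y     = c(S = 999, I = 1, R = 0),   # illustrative initial conditions
  times = seq(0, 100, by = 1),
  func  = sir_ode,
  parms = c(beta = 0.3, gamma = 0.1)  # illustrative rates
)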

Demographic stochasticity

Let \(I_t\) and \(R_t\) be Poisson processes counting the number of individuals who have been infected and the number who have recovered by time \(t\). The probability that a new infection or a new recovery occurs in the small interval \((t, t + h]\) is:

\[\mathbb{P}(I_{t + h} = I_t + 1 | I_t) = \frac{\beta S_t I_t}{N} h + o(h)\]

\[\mathbb{P}(R_{t + h} = R_t + 1 | R_t) = \gamma I_t h + o(h)\]

Gillespie algorithm

The algorithm proceeds in two steps (Ganyani et al., 2021), sketched in code below:

  1. Simulate the time at which the next event occurs: the waiting time until the next event follows an exponential distribution with rate equal to the sum of the rates of all possible events.
  2. Given the simulated time, simulate which event occurs: each event is chosen with probability proportional to its rate.
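
A minimal sketch of the Gillespie algorithm for the SIR model above, using base R (parameter values are illustrative):

set.seed(42)

beta  <- 0.3   # transmission rate (illustrative)
gamma <- 0.1   # recovery rate (illustrative)
S <- 999; I <- 1; R <- 0
N <- S + I + R
t <- 0; t_end <- 100

while (t < t_end && I > 0) {
  rate_infection <- beta * S * I / N
  rate_recovery  <- gamma * I
  rate_total     <- rate_infection + rate_recovery

  # Step 1: the waiting time to the next event is exponential,
  # with rate equal to the sum of all event rates
  t <- t + rexp(1, rate = rate_total)

  # Step 2: choose which event occurs, proportional to its rate
  if (runif(1) < rate_infection / rate_total) {
    S <- S - 1; I <- I + 1   # infection
  } else {
    I <- I - 1; R <- R + 1   # recovery
  }
  # in practice, record (t, S, I, R) here to keep the trajectory
}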

Environmental stochasticity

Environmental stochasticity is typically modeled by randomising the transmission rate with white noise (Ganyani et al., 2021). One way to incorporate it in the SIR model is to multiply the transmission rate by a Lévy white noise process \(\xi_t = \frac{d \Gamma(t)}{dt}\) that fluctuates around one, with \(\Gamma(t)\) having Gamma increments so that:

\[\Gamma(t + h) - \Gamma(t) \sim \mathrm{Gamma}\left(\frac{h}{\sigma^2}, \sigma^2\right)\]

With this (shape, scale) parameterisation, each increment has mean \(h\), so \(\xi_t\) has mean one.

Then the infection event probability becomes:

\[\mathbb{P}(I_{t + h} = I_t + 1 | I_t) = \frac{\beta \xi_t S_t I_t}{N} h + o(h)\]

This lets random environment-driven fluctuations scale the transmission rate for everyone, on top of individual-level randomness.
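
As a sketch, this noise can be drawn directly in R (the time step \(h\) and intensity \(\sigma\) are illustrative): increments of \(\Gamma(t)\) have mean \(h\), so dividing by \(h\) gives a multiplier that fluctuates around one.

set.seed(7)

h     <- 0.1   # time step (illustrative)
sigma <- 0.5   # noise intensity (illustrative)

# Gamma increments have mean h, so xi = increment / h fluctuates around one
d_gamma <- rgamma(1000, shape = h / sigma^2, scale = sigma^2)
xi <- d_gamma / h

# Environment-driven fluctuations scale the transmission rate
beta   <- 0.3
beta_t <- beta * xi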

Observational error

A typical example of observation error in infectious disease modelling is under-reporting. Not everyone who falls ill goes to the GP or hospital, and many aren’t correctly diagnosed. We often capture this by a binomial process, with a reporting probability \(\varepsilon\) that covers both missed cases and diagnosis errors (Endo et al., 2019).

\[g(y_t | x_t) = \mathrm{Bin}(y_t; x_t, \varepsilon)\]

In a hidden Markov process, the measurement model links our observed data \(\{y_{t_n}\}\) (collected at discrete times \(\{t_1, \cdots, t_N\}\)) to the underlying process. To separate environmental noise from possibly overdispersed observational stochasticity, we often choose a measurement model that can handle overdispersion (Ganyani et al., 2021). For count data, a Poisson or negative binomial model is typically used (Ganyani et al., 2021).
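
As a sketch, both measurement models can be simulated directly (the true case counts, reporting probability, and dispersion parameter below are all illustrative):

set.seed(7)

x_t     <- c(120, 150, 200, 180)   # true (latent) case counts (illustrative)
epsilon <- 0.4                     # reporting probability (illustrative)

# Binomial under-reporting: each true case is observed with probability epsilon
y_binom <- rbinom(length(x_t), size = x_t, prob = epsilon)

# Negative binomial alternative allowing overdispersion:
# mean epsilon * x_t, variance inflated as the size parameter shrinks
y_nbinom <- rnbinom(length(x_t), mu = epsilon * x_t, size = 10)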

Types of stochastic epidemic models

Stochastic models can conveniently be classified according to their contact structure (Britton et al., 2015):

  1. Global: no structure, often referred to as homogeneous mixing. Individuals’ probabilities of interaction do not depend on their location in the population.

  2. Network: Any individual-based epidemic model can be thought of as a network or random graph: with individuals as nodes, and infection of one by another as a link.

  3. Metapopulation: The population is partitioned into non-overlapping groups, e.g. households; individuals have one contact rate with individuals in different groups, and another (higher) rate for individuals in the same group. More general metapopulation models allow an individual to belong to several different types of group, each with its own contact rate, or allow more levels of mixing.

  4. Spatial: these models range from simple lattices with only nearest-neighbour interactions, for which some theoretical analysis is possible, to complex models with long-distance interactions, for which only qualitative and approximate results are known. A key feature of spatial models is that they display slower-than-exponential growth, even in their earliest stages; this makes it difficult to approximate them adequately by deterministic models, and even to define threshold parameters.

Monte Carlo simulation

Monte Carlo simulation is a method that uses repeated random sampling to obtain numerical results.

We will use Monte Carlo simulation to create a circle and estimate \(\pi\).

Create a circle

To create a circle of radius \(r = 1\) centered at \((0, 0)\), we first define a square with side length 2, extending from \([-1, 1]\) in both \(x\) and \(y\) directions. We generate random points uniformly within this square by drawing \(x\) and \(y\) values from the interval \([-1, 1]\).

set.seed(7)

n_points <- 1000
xs <- runif(n_points, min = -1, max = 1)  # x-coordinates, uniform in [-1, 1]
ys <- runif(n_points, min = -1, max = 1)  # y-coordinates, uniform in [-1, 1]

The distance from each point to the center \((0, 0)\) is \(\sqrt{x^2 + y^2}\). A point lies within the circle if \(\sqrt{x^2 + y^2} \leq 1\).

in_circle <- xs ^ 2 + ys ^ 2 <= 1  # TRUE when the point falls within the unit circle
df <- data.frame(n_points = 1:n_points, xs, ys, in_circle)

Estimate \(\pi\)

The area of a circle is \(\pi r^2\).

The area of the bounding square is \((2r)^2 = 4r^2\).

Dividing the area of the circle by the area of the square, we get a ratio:

\[\frac{\text{area of circle}}{\text{area of square}} = \frac{\pi r^2}{4r^2} = \frac{\pi}{4}\]

Therefore,

\[\pi = 4 \times \frac{\text{area of circle}}{\text{area of square}}\]

If we approximate the area of circle by the number of simulated points within the circle, and the area of square by the total number of simulated points, we have:

\[\pi \approx 4 \times \frac{\text{simulated points within circle}}{\text{total number of simulated points}}\]

One simulation

Here we run a single simulation to estimate \(\pi\), and explore how the approximation of \(\pi\) changes as more points are sampled.

df$my_pi <- 4 * cumsum(df$in_circle) / df$n_points  # running estimate of pi after each new point

As more points are sampled, the approximation of \(\pi\) improves: increasing the number of points better estimates the areas of the circle and square, leading to a closer approximation of the true value of \(\pi\).

Multiple simulations

Instead of a single simulation with many points, we run \(N\) smaller simulations. In each simulation, we sample a small number of points to estimate \(\pi_i\). The final estimate is the average across simulations:

\[\pi \approx \frac{1}{N}\sum_{i = 1}^{N} \pi_i\]

As shown in the one-simulation section, increasing the number of points improves accuracy. However, even with fewer points per simulation (e.g., 50), running many simulations and averaging the results improves the approximation of \(\pi\). This technique leverages the law of large numbers: the average of repeated estimates converges to the true value.
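
A minimal sketch of this scheme, with 1000 simulations of 50 points each (both numbers are illustrative):

set.seed(7)

n_sims    <- 1000   # number of simulations (illustrative)
n_per_sim <- 50     # points per simulation (illustrative)

# One small Monte Carlo estimate of pi from n random points
estimate_pi <- function(n) {
  xs <- runif(n, min = -1, max = 1)
  ys <- runif(n, min = -1, max = 1)
  4 * mean(xs ^ 2 + ys ^ 2 <= 1)
}

pi_i   <- replicate(n_sims, estimate_pi(n_per_sim))
pi_hat <- mean(pi_i)   # averaged estimate across simulations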

Have I run enough simulations?

A sufficient number of simulations has been reached when the results stabilize, showing approximate convergence: additional simulations no longer change the outcome appreciably. See the Multiple simulations section for a demonstration of this convergence.

Steps to calculate the needed number of simulations (source):

  1. Run the simulation with a default number of runs \(R_0\) (usually \(R_0 = 1000\)). Now you should have a vector with the results \(x_0\) where \(\text{length}(x_0) = R_0\).

  2. Calculate the mean value \(\overline{x_0}\) and standard deviation \(\sigma_0\).

  3. Specify the allowed level of error \(\epsilon\) and the uncertainty \(\alpha\) you are willing to accept. A common choice is \(\epsilon = \alpha = 0.05\) (i.e. 5%).

  4. Use this equation to get the required number of simulations:

\[R \geq \left( \frac{Z_{1 - \frac{\alpha}{2}} \times \sigma_0}{\epsilon \times \overline{x_0}} \right)^2\]

  5. Use the Student t-distribution rather than the normal distribution for small \(R_0\), as noted in the sketch below.
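
As a sketch, these steps can be applied to the \(\pi\) estimates from the multiple-simulations section (the pilot size \(R_0\), \(\epsilon\), and \(\alpha\) follow the defaults above):

set.seed(7)

# Step 1: pilot run of R0 simulations (50 points each, as before)
R0 <- 1000
x0 <- replicate(R0, {
  xs <- runif(50, min = -1, max = 1)
  ys <- runif(50, min = -1, max = 1)
  4 * mean(xs ^ 2 + ys ^ 2 <= 1)
})

# Step 2: mean and standard deviation of the pilot results
x0_bar <- mean(x0)
sigma0 <- sd(x0)

# Step 3: allowed relative error and accepted uncertainty
epsilon <- 0.05
alpha   <- 0.05

# Step 4: required number of simulations
z <- qnorm(1 - alpha / 2)   # for small R0, use qt(1 - alpha / 2, df = R0 - 1) instead
R_needed <- ceiling((z * sigma0 / (epsilon * x0_bar)) ^ 2)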