13  Particle filter

13.1 Particle filter

Particle filters are used for sampling latent states in systems where we cannot observe all variables directly. The methodology is applied to inference in Hidden Markov Models and to nonlinear filtering problems. The primary goal of a particle filter is to estimate the posterior density of the state variables given the observations.

13.1.1 Key terminology

  • Particles: Samples representing the posterior distribution.

  • Weighting: Assigning likelihood weights to particles based on their agreement with observations.

  • Resampling: Drawing a new set of particles according to their weights, so that particles with higher weights are duplicated more often. (Applying a particle filter to robot localisation, as in the example below, is known as Monte Carlo localisation.)

13.1.2 Example

A classic example is estimating the position of a robot vacuum on a 2D map of a room:

  • The robot moves on a 2D grid from \((-10, -10)\) to \((10, 10)\).
  • Motion model: the robot follows a noisy circular trajectory around the origin. The velocity vector is \(v = (-y, x) \cdot V / \sqrt{x^2 + y^2}\), which produces counter-clockwise motion, with a small Gaussian process noise added at each step.
  • Observation model: a sensor reports only the robot’s \(x\)-coordinate with Gaussian noise of standard deviation \(\sigma\).

Given a stream of observations, we want the posterior over the robot’s true position. The bootstrap particle filter iterates four ingredients:

  1. Initialisation (\(t = 0\)): sample \(N\) particles from the prior \(p(x_0)\). Each particle starts with weight \(1/N\).
  2. For \(t = 1, 2, \ldots\), repeat:
    1. Predict: propagate each particle forward, \(x_t^{(i)} \sim f(x_t \mid x_{t-1}^{(i)})\), including process noise so the cloud spreads out.
    2. Weight: on receiving observation \(y_t\), assign weights \(w_t^{(i)} \propto g(y_t \mid x_t^{(i)})\) and normalise so they sum to 1.
    3. Resample: draw \(N\) new particles, with replacement, using probabilities equal to the weights. After resampling, all weights reset to \(1/N\).
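The loop above can be sketched in a few lines of Python; the numeric values (speed, step size, noise levels, particle count) are illustrative assumptions, not settings taken from the demo:

```python
import numpy as np

rng = np.random.default_rng(0)

V, dt = 1.0, 0.1               # speed and step size (assumed values)
sigma_p, sigma_o = 0.05, 0.5   # process / observation noise (assumed)
N, T = 500, 100                # number of particles / time steps

def step(pos, noise_sd):
    """Noisy counter-clockwise motion: v = (-y, x) * V / sqrt(x^2 + y^2)."""
    x, y = pos[:, 0], pos[:, 1]
    r = np.sqrt(x**2 + y**2) + 1e-9
    vel = np.stack([-y, x], axis=1) * (V / r)[:, None]
    return pos + vel * dt + rng.normal(0, noise_sd, pos.shape)

# Simulate the true robot and its noisy x-only observations
truth = np.array([[5.0, 0.0]])
traj, obs = [], []
for _ in range(T):
    truth = step(truth, sigma_p)
    traj.append(truth[0].copy())
    obs.append(truth[0, 0] + rng.normal(0, sigma_o))

# Bootstrap filter: initialise from a uniform prior over the room
particles = rng.uniform(-10, 10, (N, 2))
estimates = []
for y_t in obs:
    particles = step(particles, sigma_p)                       # 1. predict
    w = np.exp(-0.5 * ((y_t - particles[:, 0]) / sigma_o)**2)  # 2. weight
    w /= w.sum()
    particles = particles[rng.choice(N, size=N, p=w)]          # 3. resample
    estimates.append(particles.mean(axis=0))

err = np.linalg.norm(np.array(estimates)[-20:] - np.array(traj)[-20:], axis=1).mean()
print(f"mean position error over the last 20 steps: {err:.2f}")
```

Because only the \(x\)-coordinate is observed, the \(y\)-coordinate is pinned down indirectly through the motion model, which is one reason the process noise in the predict step matters.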
Note: a small correction to the original version

The verbal algorithm above (Predict → Weight → Resample → repeat) is exactly the bootstrap particle filter, so your high-level description is correct. A few small things worth flagging about the earlier implementation, though:

  • The R code weighted the particles before moving them and then resampled using those pre-movement weights. The rule is that resampling should use the weights from the current time step. Either “predict → weight → resample” or “weight → resample → predict” is a valid cycle, but mixing them (resampling with stale weights) breaks the filter.
  • The predict step didn’t add any process noise. Without it, particles can only spread through resampling, which quickly leads to particle depletion — the posterior collapses to a handful of distinct values.
  • With \(v = (-y, x) \cdot V / \sqrt{x^2 + y^2}\), the motion is counter-clockwise (not clockwise as the original text stated). At \((r, 0)\) for example, \(v = (0, V)\), so the robot moves upward.

Hit ▶ Play to watch the vacuum trundle around the room. The left panel is the physical world: the dark robot follows a circular path, leaves a dashed trail, and at each time step the sensor reports its \(x\)-coordinate with noise (dashed red line, red pulse on the robot). The right panel is the filter’s internal state: the four algorithm boxes at the top light up as the current phase changes, the middle strip shows every particle on the \(x\)-axis (with the observation likelihood curve overlaid during Weight), and the histogram at the bottom is the running posterior density \(p(x_t\mid y_{1:t})\). Sliders let you change \(N\), observation noise, process noise, and playback speed; the scrubber at the very bottom jumps to any frame.

13.2 POMP

Maximum likelihood estimation via iterated filtering; see the pomp short-course notes:

https://kingaa.github.io/short-course/mif/mif.html

13.3 PMCMC

PMCMC stands for Particle Markov Chain Monte Carlo. It is a fully Bayesian approach that combines:

  • Particle filtering (to sample the hidden states of a system), and
  • MCMC (Markov Chain Monte Carlo) (to sample the unknown parameters).

The main components of the Hidden Markov Model (HMM) are:

  1. Hidden state \(x_t\): This represents the true number of infectious individuals. In an SIR model, the state is “hidden” because:
    • Only the I compartment can be observed, while the others are hidden
    • Observation noise hides the true value of I
  2. Time evolution process \(f(x_{t + 1}|x_t)\): Because this is a Markov process, the state at the next time step \(x_{t + 1}\) depends only on the current state: \(\mathbb{P}(x_{t + 1}|x_t) = f(x_{t + 1}|x_t)\).
  3. Observations \(y_t\): The reported case counts at each time \(t\).
  4. Observation process \(g(y_t|x_t)\): The probability distribution of observing \(y_t\) given \(x_t\). Typically this captures underreporting, represented by a binomial process with reporting probability \(\epsilon\): \(\mathbb{P}(y_t|x_t) = g(y_t|x_t) = \mathrm{Bin}(n = x_t, p = \epsilon)\).
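As a quick illustration of this observation process, here is a minimal Python sketch; the infectious counts and the reporting probability \(\epsilon = 0.3\) are made-up numbers:

```python
import numpy as np

rng = np.random.default_rng(1)

epsilon = 0.3                                        # reporting probability (assumed)
true_I = np.array([10, 40, 120, 250, 180, 90, 30])   # hypothetical true infectious counts

# g(y_t | x_t) = Bin(n = x_t, p = epsilon): each infectious individual
# is reported independently with probability epsilon
reported = rng.binomial(n=true_I, p=epsilon)
print(reported)   # roughly epsilon * true_I, distorted by binomial noise
```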

We are interested in the time evolution process \(f(x_{t + 1}|x_t)\), modelled with a set of parameters \(\theta\). Given the case counts \(y_{1:T}\), the posterior distribution of \(\theta\) is:

\[\mathbb{P}(\theta|y_{1:T}) \propto \mathbb{P}(y_{1:T}|\theta) \cdot \mathbb{P}(\theta)\]

We need to compute the likelihood of the data \(\mathbb{P}(y_{1:T}|\theta)\). This likelihood is often referred to as the marginal likelihood, as it is obtained by marginalising \(\mathbb{P}(x_{1:T}, y_{1:T}|\theta)\) over the hidden states. For example, suppose we have only 2 time steps \(t = 1, 2\) and we know the initial state \(x_1\):

\[\mathbb{P}(y_2|\theta) = \underbrace{\int \underbrace{\mathbb{P}(y_2|x_2, \theta)}_{y_2 \text{ is what we see from } x_2} \cdot \underbrace{\mathbb{P}(x_2|x_1, \theta)}_{x_2 \text{ depends on } x_1} dx_2}_{\substack{\text{We can't observe } x_2 \text{, it has many possible values,} \\ \text{so must integrate for all possible values of } x_2}}\]

If we have 3 time steps:

\[\mathbb{P}(y_3|\theta) = \int \mathbb{P}(y_3|x_3, \theta) \cdot \mathbb{P}(x_3|x_2, \theta) \cdot \mathbb{P}(x_2|x_1, \theta) \ dx_3 \ dx_2\]

The more time steps we have, the more complicated this computation will be. The generalised form of this is:

\[\mathbb{P}(y_{1:T}|\theta) = \int \mathbb{P}(y_{1:T}|x_{1:T}, \theta) \cdot \mathbb{P}(x_{1:T}| \theta) \ dx_{1:T}\]

The particle filter is a sub-algorithm used to approximate this marginal likelihood.

The Markov property allows for the use of particle filtering (Endo et al., 2019). The algorithm:

  • For \(t = 1, 2, \ldots, T\), sample the current state \(x_t\) using the samples of the previous state \(x_{t-1}\) and the current observation \(y_t\):

\[\begin{align} \mathbb{P}(x_t|y_t) & \propto \mathbb{P}(y_t|x_t) \cdot \mathbb{P}(x_t) \\ \underbrace{\mathbb{P}(x_t| y_t, x_{t - 1}, \theta)}_{\substack{\text{add } x_{t - 1} \text{, } \theta \text{ to the condition}}} &\propto \underbrace{\mathbb{P}(y_t|x_t, x_{t - 1}, \theta)}_{\substack{= \mathbb{P}(y_t|x_t, \theta) \text{ because } y_t \\ \text{ does not depend on } x_{t - 1}}} \cdot \mathbb{P}(x_t|x_{t - 1}, \theta) \\ &= \mathbb{P}(y_t|x_t, \theta) \cdot \mathbb{P}(x_t|x_{t - 1}, \theta) \\ &= g_{\theta}(y_t|x_t) \cdot f_{\theta}(x_t|x_{t - 1}) \end{align}\]

  • So, given \(g_{\theta}(y_t|x_t)\) and \(f_{\theta}(x_t|x_{t - 1})\), we can obtain the current state \(x_t\):

    • Generate particles of \(x_t\) from the previous state \(x_{t-1}\) using \(f_{\theta}(x_t|x_{t - 1})\).

    • Filter those particles so that only those consistent with the observation \(y_t\) are selected based on \(g_{\theta}(y_t|x_t)\).

  • Use the produced \(x_{1:T}\) to approximate the marginal likelihood \(\mathbb{P}(y_{1:T}|\theta)\). First, apply the chain rule of probability to \(\mathbb{P}(y_{1:T}|\theta)\):

\[\begin{align} \mathbb{P}(A_1, A_2, A_3) &= \mathbb{P}(A_3|A_2, A_1) \cdot \mathbb{P}(A_2, A_1) \\ &= \mathbb{P}(A_3|A_2, A_1) \cdot \mathbb{P}(A_2|A_1) \cdot \mathbb{P}(A_1) \\ &= \mathbb{P}(A_1) \cdot \mathbb{P}(A_2|A_1) \cdot \mathbb{P}(A_3|A_2, A_1) \end{align}\]

\[\begin{align} \mathbb{P}(y_{1:T}|\theta) &= \mathbb{P}(y_1|\theta) \cdot \mathbb{P}(y_2|y_1, \theta) \cdot \mathbb{P}(y_3|y_2, y_1, \theta) \cdots \mathbb{P}(y_T|y_{1:T-1}, \theta) \\ &= \mathbb{P}(y_1|\theta) \cdot \prod^{T}_{t = 2} \mathbb{P}(y_t|y_{1:t - 1},\theta) \end{align}\]

In which:

\[\mathbb{P}(y_1|\theta) = \int \mathbb{P}(y_1|x_1, \theta) \cdot \underbrace{\mathbb{P}(x_1|\theta) \ dx_1}_{\substack{\text{marginalise over all possible} \\ \text{values of the unobserved } x_1}}\]

\[\prod^{T}_{t = 2} \mathbb{P}(y_t|y_{1:t - 1},\theta) = \prod^{T}_{t = 2} \int \mathbb{P}(y_t|x_t, y_{1:t - 1}, \theta) \cdot \mathbb{P}(x_t|y_{1:t - 1}, \theta) \ dx_t\]
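This decomposition is exactly what the particle filter exploits: at each step, the mean unnormalised weight of the particle cloud estimates one factor \(\mathbb{P}(y_t|y_{1:t - 1}, \theta)\), and the logarithms of these estimates are summed. A minimal Python sketch, using a linear Gaussian random-walk model chosen purely for illustration (the model and all values are assumptions, not from the original):

```python
import numpy as np

rng = np.random.default_rng(2)

def log_marginal(y, theta, sigma_p=1.0, sigma_o=1.0, N=1000):
    """Particle-filter estimate of log P(y_{1:T} | theta) for the toy model
    x_t = x_{t-1} + theta + N(0, sigma_p^2), y_t = x_t + N(0, sigma_o^2)."""
    x = np.zeros(N)          # particles; x_0 = 0 is assumed known
    logL = 0.0
    for y_t in y:
        x = x + theta + rng.normal(0, sigma_p, N)       # sample from f_theta(x_t | x_{t-1})
        logw = -0.5 * ((y_t - x) / sigma_o) ** 2 - 0.5 * np.log(2 * np.pi * sigma_o**2)
        m = logw.max()                                  # log-sum-exp for stability
        logL += m + np.log(np.mean(np.exp(logw - m)))   # mean weight ~ P(y_t | y_{1:t-1}, theta)
        w = np.exp(logw - m)
        x = x[rng.choice(N, size=N, p=w / w.sum())]     # resample
    return logL

# Data simulated with true drift theta = 0.5
x_true = np.cumsum(0.5 + rng.normal(0, 1.0, 50))
y = x_true + rng.normal(0, 1.0, 50)
print(log_marginal(y, theta=0.5), log_marginal(y, theta=2.0))  # the true theta should score higher
```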

13.4 PMCMC algorithm

  1. Arbitrarily choose an initial parameter value \(\theta^{(0)}\) (step \(n = 0\)). Run particle filtering to generate samples \(X_{1:T}\) and the approximated marginal likelihood \(\mathbb{P}(y_{1:T}|\theta^{(0)})\). Randomly choose one trajectory \(x^{(0)}_{1:T}\) from \(X_{1:T}\).
  2. For step \(n \geq 1\), use the previous parameter value \(\theta^{(n - 1)}\) to propose a new value \(\theta^{*(n)}\) by sampling from the proposal distribution \(q(\theta^{*(n)}; \theta^{(n - 1)})\).
  3. Run particle filtering to generate samples \(X^{*}_{1:T}\) and the approximated marginal likelihood \(\mathbb{P}(y_{1:T}|\theta^{*(n)})\). Randomly choose one trajectory \(x^{*(n)}_{1:T}\) from \(X^{*}_{1:T}\).
  4. Accept or reject the proposal. With probability

\[\min \Big[ 1, \frac{\mathbb{P}(y_{1:T}|\theta^{*(n)})}{\mathbb{P}(y_{1:T}|\theta^{(n - 1)})} \cdot \frac{q(\theta^{(n - 1)}; \theta^{*(n)})}{q(\theta^{*(n)}; \theta^{(n - 1)})} \Big]\] update \(\theta^{(n)} = \theta^{*(n)}\) and \(X^{(n)}_{1:T} = X^{*(n)}_{1:T}\). Otherwise, keep the values from the previous step: \(\theta^{(n)} = \theta^{(n - 1)}\) and \(X^{(n)}_{1:T} = X^{(n - 1)}_{1:T}\).

  5. Repeat steps 2-4 until the Markov chain converges.
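The steps above can be sketched as a particle marginal Metropolis-Hastings loop in Python. This is a minimal illustration under assumptions not in the original: the toy model \(x_t = x_{t-1} + \theta + \mathcal{N}(0, \sigma_p^2)\), \(y_t = x_t + \mathcal{N}(0, \sigma_o^2)\), a flat prior on \(\theta\), and a symmetric Gaussian proposal (so the \(q\)-ratio cancels); for brevity it keeps only the \(\theta\)-chain and does not store state trajectories.

```python
import numpy as np

rng = np.random.default_rng(3)

def pf_loglik(y, theta, sigma_p=1.0, sigma_o=1.0, N=500):
    """Bootstrap-filter estimate of log P(y_{1:T} | theta) for the toy
    model x_t = x_{t-1} + theta + N(0, sigma_p^2), y_t = x_t + N(0, sigma_o^2)."""
    x = np.zeros(N)
    logL = 0.0
    for y_t in y:
        x = x + theta + rng.normal(0, sigma_p, N)       # predict
        logw = -0.5 * ((y_t - x) / sigma_o) ** 2        # weight (Gaussian kernel)
        m = logw.max()
        w = np.exp(logw - m)
        logL += m + np.log(w.mean()) - 0.5 * np.log(2 * np.pi * sigma_o**2)
        x = x[rng.choice(N, size=N, p=w / w.sum())]     # resample
    return logL

# Simulated data with true drift theta = 0.5
x_true = np.cumsum(0.5 + rng.normal(0, 1.0, 40))
y = x_true + rng.normal(0, 1.0, 40)

# Outer Metropolis-Hastings chain over theta
theta, logL = 0.0, pf_loglik(y, 0.0)                    # step 1: initialise
chain = []
for n in range(300):
    theta_star = theta + rng.normal(0, 0.3)             # step 2: propose
    logL_star = pf_loglik(y, theta_star)                # step 3: inner particle filter
    if np.log(rng.uniform()) < logL_star - logL:        # step 4: accept/reject
        theta, logL = theta_star, logL_star
    chain.append(theta)

print(f"posterior mean of theta ~ {np.mean(chain[100:]):.2f}")
```

Replacing the exact likelihood with the particle-filter estimate in the acceptance ratio still targets the correct posterior, because the estimate is unbiased; this is the key property of PMCMC.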

13.4.1 Visual walk-through

The hardest part of PMCMC is keeping the two nested loops straight: the outer Metropolis–Hastings chain walks around in parameter space \(\theta\), and for every single proposal it fires off a whole particle filter to get a (noisy) estimate of \(\mathbb{P}(y_{1:T}\mid\theta)\).

The demo below uses a tiny linear state-space model — \(x_t = x_{t-1} + \theta + \mathcal{N}(0,\sigma_p^2)\), \(y_t = x_t + \mathcal{N}(0,\sigma_o^2)\) — so that the reference log-likelihood profile \(\log L(\theta)\) (top of the left panel, indigo curve) can be pre-computed on a grid for orientation. Hit ▶ Play to watch the algorithm.

  • Top bar: the four algorithm steps light up in turn — ① initialise → ② propose \(\theta^*\) → ③ run particle filter → ④ accept/reject → loop back to ②.
  • Left panel (outer MCMC): the top shows \(\log L(\theta)\) as a reference. The bottom is the chain trace: each horizontal row is one iteration \(n\), going downward. The black dot is \(\theta^{(n-1)}\) (current state), the purple dot is the proposal \(\theta^{*(n)}\), and the arrow shows the jump drawn from \(q(\cdot\mid\theta^{(n-1)})\). Green dots are accepted states; the green step-line is the chain trajectory (flat segments = rejections, where \(\theta\) stays put).
  • Right panel (inner particle filter): the red dots are the fixed observations \(y_{1:T}\). During step ③ a cloud of blue trajectories sweeps across; these are the particles propagated under \(\theta^*\). Trajectories that pass close to the red dots get high weights, and their average weight at each step gives \(\log \hat{\mathbb{P}}(y_{1:T}\mid\theta^*)\). The two bars underneath compare the current and proposed log-likelihoods.
  • Bottom strip: the Metropolis–Hastings decision — the acceptance ratio \(\alpha\), the uniform draw \(u\), and a green ACCEPT or red REJECT badge.

The sliders let you change the chain length, the proposal width \(\sigma_q\), the number of particles in the inner filter, and the playback speed.