An autoregressive (\(AR\)) model is one way to describe or approximate a stochastic process. In an AR model, the current value of the time series depends linearly on its own previous values and on a stochastic term. An AR model of order \(p\), denoted by \(AR(p)\), is written as:
where \(\varepsilon_t\) is (often) a white-noise process (see Definition A.6).
In particular, an \(AR(1)\) takes the simpler form:
\[X_t = \varphi_1 X_{t-1} + \varepsilon_t\]
If \(\varepsilon_t\) are i.i.d, then this \(AR(1)\) is a Markov process (or specifically, a first-order Markov chain), because \(X_t\) depends only on \(X_{t-1}\) and not on \(X_{t-2}\), \(X_{t-3}\)…
5.2 Uncertainty propagation
In an \(AR\) model, each future value depends on the past value(s) plus a random noise term \(\varepsilon_t\). That randomness at each step accumulates over time, so the uncertainty about future values typically propagates as you move forward through the time series.
For example, if you forecast multiple steps ahead in an \(AR(1)\) model, you are carrying forward the uncertainty from \(\varepsilon_t\), \(\varepsilon_{t - 1}\), \(\varepsilon_{t - 2}\)… Therefore, your prediction intervals get wider the further out you go.
Code
library(tidyverse)library(ggplot2)# Define a function that simulates "branching" from day t-1 to day tsimulate_branching <-function(n_days =5, # total daysK =3, # children per nodephi =0.7, # AR(1) coefficientsigma =1.0, # noise std devx0 =0# initial value on day 1 ) {# We'll keep a data frame of points (each "dot") with:# ID = unique ID for each point# day = which day the point is on# value = the y-value (in an AR sense)# parentID = ID of the parent point (so we can connect them)# parent_val = parent's y-value# Start with day 1 points <-data.frame(ID =1,day =1,value = x0,parentID =NA, # no parent for the first dotparent_val =NA ) current_max_ID <-1# Loop over days 2 to n_daysfor(d in2:n_days) {# Extract all points from the previous day prev_day_points <-subset(points, day == d -1)# For each point from day d-1, generate K new child points for day d new_points_list <-lapply(seq_len(nrow(prev_day_points)), function(i) { parent_id <- prev_day_points$ID[i] parent_val <- prev_day_points$value[i]# AR(1): child value = phi * parent_val + random noise child_values <- phi * parent_val +rnorm(K, mean =0, sd = sigma)data.frame(ID = current_max_ID +seq_len(K), # assign new IDsday = d,value = child_values,parentID = parent_id,parent_val = parent_val ) } )# Combine into one data frame new_points <-do.call(rbind, new_points_list)# Update the overall data frame points <-rbind(points, new_points)# Update the max ID so the next iteration doesn’t reuse IDs current_max_ID <-max(new_points$ID) }return(points)}set.seed(7)df_branch <-simulate_branching(n_days =5,K =3,phi =0.7,sigma =1,x0 =0)df_uncertainty <- df_branch |>group_by(day) |>summarise(min_value =min(value),max_value =max(value))ggplot() +geom_ribbon(df_uncertainty, mapping =aes(ymin = min_value, ymax = max_value, x = day), fill ="#204fc1", alpha =0.2) +geom_segment(df_branch, mapping =aes(x = day -1,y = parent_val,xend = day,yend = value ), alpha =0.3) +geom_point(df_branch, mapping =aes(x = day, y = value), size =2, alpha =0.7) +scale_x_continuous(limits =c(1, 5)) +labs(x ="Day", y ="Value") +theme_classic()