Analysis of Financial Time Series in Stan - Chapter 2 - Autoregressive models


In this series we are trying to reproduce the models and examples listed in the book “Analysis of Financial Time Series, 3rd Edition”, by Ruey S. Tsay, using Stan https://mc-stan.org/ and the package RStan https://cran.r-project.org/web/packages/rstan/index.html. The main repository for the presented models and data can be found at https://github.com/marcomarconi/AFTS_with_Stan.

Most of the models covered in Chapter 2 have already been translated into Stan as part of the official documentation and can be found at https://mc-stan.org/docs/2_26/stan-users-guide/time-series-chapter.html. Still, we will go through them in the current document.

{
  require(tidyverse)
  require(rstan)
  require(bayesplot)
  require(loo)
  require(LaplacesDemon)
  library(xts)
  library(quantmod)
  library(FKF)
  library(MARSS)
  library(forecast)
  rstan_options(auto_write = TRUE)
  rstan_options(javascript = FALSE)
  theme_set(theme_classic(base_size = 24))
}

SIMPLE AR MODELS

The typical AR(K) model can be coded in Stan as follows:

data {
  int<lower=0> K;        // AR order
  int<lower=0> N;        // number of observations
  vector[N] y;           // observed time series
}
parameters {
  real intercept;
  real ar[K];            // AR coefficients
  real<lower=0> sigma;   // innovation standard deviation
}
model {
  intercept ~ normal(0, 1);
  ar ~ normal(0, 1);
  sigma ~ cauchy(0, 1);

  for (n in (K+1):N) {
    real mu = intercept;
    for (k in 1:K)
      mu += ar[k] * y[n-k];
    y[n] ~ normal(mu, sigma);
  }
}
generated quantities {
  vector[N] log_lik;   // pointwise log-likelihood (used later by loo)
  vector[N] y_hat;     // posterior predictive draws
  {
    // the first K observations are evaluated around the stationary mean
    for (n in 1:K){
      log_lik[n] = normal_lpdf(y[n] | intercept / (1 - sum(ar)), sigma);
      y_hat[n] = normal_rng(intercept / (1 - sum(ar)), sigma);
    }
    
    for (n in (K+1):N)  {
      real mu = intercept;
      for (k in 1:K)
        mu += ar[k] * y[n-k];
      log_lik[n] = normal_lpdf(y[n] | mu, sigma);
      y_hat[n] = normal_rng(mu, sigma);
    }
  }
}

Example 2.1.

The first example applies an AR(3) model to the quarterly growth rate of U.S. real gross national product (GNP), seasonally adjusted, from the second quarter of 1947 to the first quarter of 1991.

gnp <- scan(file="../data/dgnp82.txt")
gnp1 <- ts(gnp,frequency=4,start=c(1947,2))
plot(gnp1)
points(gnp1)

Let’s compile the Stan model and fit it to the data:

#AR_model <- stan_model("../models/Chapter_2/AR.stan")
AR_model <- readRDS("../models/Chapter_2/AR.rds")
AR_gnp <- sampling(AR_model, data=list(N=length(gnp1), y=gnp1, K=3), chains=4, cores=4)

As shown below, the parameter estimates are similar to those obtained in the book via maximum likelihood:

AR_gnp %>% print(pars=names(AR_gnp)[1:5],digits=4)
## Inference for Stan model: AR.
## 4 chains, each with iter=2000; warmup=1000; thin=1; 
## post-warmup draws per chain=1000, total post-warmup draws=4000.
## 
##              mean se_mean     sd    2.5%     25%     50%     75%  97.5% n_eff
## intercept  0.0047  0.0000 0.0011  0.0026  0.0039  0.0047  0.0054 0.0068  4246
## ar[1]      0.3476  0.0015 0.0780  0.1957  0.2934  0.3460  0.4007 0.5031  2653
## ar[2]      0.1828  0.0016 0.0790  0.0302  0.1303  0.1809  0.2338 0.3410  2539
## ar[3]     -0.1446  0.0015 0.0779 -0.2992 -0.1976 -0.1445 -0.0913 0.0101  2735
## sigma      0.0100  0.0000 0.0006  0.0090  0.0096  0.0099  0.0103 0.0111  2933
##             Rhat
## intercept 1.0003
## ar[1]     1.0000
## ar[2]     1.0001
## ar[3]     1.0001
## sigma     1.0009
## 
## Samples were drawn using NUTS(diag_e) at Sat May 15 20:51:19 2021.
## For each parameter, n_eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor on split chains (at 
## convergence, Rhat=1).

We can also check, using the Ljung-Box statistic Q(12), that the residuals show no significant serial correlation and therefore resemble white noise:

y_hat <-  colMeans(extract(AR_gnp, pars="y_hat")[[1]])
(gnp - y_hat)  %>%  Box.test(lag=12, type="Ljung-Box")
## 
##  Box-Ljung test
## 
## data:  .
## X-squared = 8.6586, df = 12, p-value = 0.7318
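
Strictly speaking, since three AR coefficients were estimated, the degrees of freedom of the Q(12) statistic should be reduced accordingly (to 12 - 3 = 9). A small variation of the test above, using the fitdf argument of Box.test:

# adjust the degrees of freedom for the 3 estimated AR coefficients
(gnp - y_hat)  %>%  Box.test(lag=12, type="Ljung-Box", fitdf=3)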

Forecasting can be done iteratively starting from the last known values (see pag. 56 of the book for the formula). To show this we refit the model leaving out the last 12 observations and compare the 12-step-ahead predictions against the real data.
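
The recursion implemented below is the standard multistep forecast for an AR(3): with forecast origin \(T\) (the last observation kept in the training set),

\[\hat{y}_{T+h} = \hat{\phi}_0 + \sum_{k=1}^{3} \hat{\phi}_k \, \hat{y}_{T+h-k}, \qquad \hat{y}_{T+j} = y_{T+j} \;\; \text{for } j \le 0,\]

where \(\hat{\phi}_0\) is the intercept and \(\hat{\phi}_k\) are the AR coefficients; earlier forecasts are plugged in once the required lags are no longer observed: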

# refit the model omitting the last 12 observations
N_train <- length(gnp1) - 12
AR_gnp_12 <- sampling(AR_model, data=list(N=N_train, y=gnp1[1:N_train], K=3), chains=4, cores=4)
# extract the posterior draws and compute the forecasts recursively (see pag. 56 of the book for the formula)
pars <- AR_gnp_12 %>% as.matrix()
r_hat <- matrix(nrow=nrow(pars), ncol=13)
r_hat[,1] <- gnp1[N_train] # set the last known observation as first value
for(i in 1:12){
  mu <- pars[,1] # intercept
  for(p in 1:3){
    # use earlier forecasts when available, otherwise fall back to the observed series
    y_lag <- if (i - p >= 0) r_hat[, i - p + 1] else gnp1[N_train + i - p]
    mu <- mu + pars[, 1 + p] * y_lag
  }
  r_hat[, i+1] <- mu
}
# get mean and quantiles of the predictions
r_hat_df <- data.frame(m = colMeans(r_hat),
                       q1 = apply(r_hat, 2, function(x) quantile(x, probs=0.025)),
                       q2 = apply(r_hat, 2, function(x) quantile(x, probs=0.975)))
ggplot(data.frame(gnp1=gnp1)) +
  geom_line(aes(x=1:length(gnp1), y=gnp1)) +
  geom_point(aes(x=1:length(gnp1), y=gnp1)) +
  geom_ribbon(data=r_hat_df, aes(x=(length(gnp1)-12):length(gnp1), ymin=q1, ymax=q2), alpha=0.5, col="blue", fill="blue") +
  xlab("Time")
## Don't know how to automatically pick scale for object of type ts. Defaulting to continuous.

We can also compare models using the loo package. This kind of model checking is probably not correct for time series if our scope is to assess out-of-sample predictive performance, as we would be including data from the future when performing approximate LOO cross-validation. A better solution would be to use leave-future-out cross-validation (LFO-CV), see https://mc-stan.org/loo/articles/loo2-lfo.html. However, if we are simply interested in evaluating different models’ ability to learn the relevant structure of the data, loo can be used for model selection, which is still better than the typical AIC used in the book. For example, we can compare the previous AR(3) model with a simpler AR(1) model:
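
A minimal sketch of such a comparison (assuming the AR.stan program above, whose generated quantities already return log_lik) could look like this:

# fit an AR(1) to the same data and compare it with the AR(3) fit via PSIS-LOO
AR_gnp_1 <- sampling(AR_model, data=list(N=length(gnp1), y=gnp1, K=1), chains=4, cores=4)
loo_3 <- loo(extract_log_lik(AR_gnp, parameter_name="log_lik"))
loo_1 <- loo(extract_log_lik(AR_gnp_1, parameter_name="log_lik"))
loo_compare(loo_3, loo_1)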

##        elpd_diff se_diff
## model1  0.0       0.0   
## model2 -1.2       3.0

As can be seen, the difference is negligible. We will keep using loo for model comparison throughout this presentation.

REGRESSION MODELS WITH TIME SERIES ERRORS

Consider the common linear regression model:

\[y_t = \alpha + \beta x_{t} + e_{t} \]

where \(y_t\) and \(x_t\) are two time series and \(e_t\) denotes the error term, which is assumed to be white noise. Now, consider the U.S. weekly interest rate data shown in the example:

y1 <- read.table("../data/w-gs1yr.txt", header=T)
y3 <- read.table("../data/w-gs3yr.txt", header=T)
y <- ts(cbind(y1$rate, y3$rate), start = c(1962, 1), frequency = 52)
colnames(y) <- c("Treasury1", "Treasury3")
plot.ts(y, plot.type = "single", ylab="Percent")

We can fit a simple linear regression model to the data, and observe the residuals plot and ACF:

fit <- lm(Treasury3 ~ Treasury1, data=y)
summary(fit)
## 
## Call:
## lm(formula = Treasury3 ~ Treasury1, data = y)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.82319 -0.37691 -0.01462  0.38661  1.35679 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.83214    0.02417   34.43   <2e-16 ***
## Treasury1    0.92955    0.00357  260.40   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.5228 on 2465 degrees of freedom
## Multiple R-squared:  0.9649, Adjusted R-squared:  0.9649 
## F-statistic: 6.781e+04 on 1 and 2465 DF,  p-value: < 2.2e-16
plot.ts(fit$residuals)

acf(fit$residuals)

Notice that when regressing one of these two time series on the other the errors are serially correlated, which does not agree with the assumption that they should be white noise. We can also apply the linear regression model to the differenced time series:

y_diff <- diff(y)
fit_diff <- lm(Treasury3 ~ 0+Treasury1, data=y_diff)
summary(fit_diff)
## 
## Call:
## lm(formula = Treasury3 ~ 0 + Treasury1, data = y_diff)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.42469 -0.03589 -0.00127  0.03456  0.48911 
## 
## Coefficients:
##           Estimate Std. Error t value Pr(>|t|)    
## Treasury1 0.791935   0.007337   107.9   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.06896 on 2465 degrees of freedom
## Multiple R-squared:  0.8253, Adjusted R-squared:  0.8253 
## F-statistic: 1.165e+04 on 1 and 2465 DF,  p-value: < 2.2e-16
plot.ts(fit_diff$residuals)

acf(fit_diff$residuals)

The linear dependence is still strong, and the ACF still shows some significant serial correlations in the residuals, but the magnitudes of the correlations are much smaller. A simple solution is to model the error term as a time series itself. In the example, the error term is modeled as an MA(1) process:

\[\Delta y_t = \beta \Delta x_{t} + e_{t}, \quad e_t = a_t - \theta_1 a_{t-1} \]

where \(a_t\) is white noise. The resulting model is a simple example of a linear regression with time series errors. Such a model can be easily implemented in Stan:

data {
  int<lower=0> N;
  vector[N] y;
  vector[N] x;
}
parameters {
  real beta;
  real<lower=-1, upper=1> theta;
  real<lower=0> sigma;
}
transformed parameters {
  vector[N] a;    // innovation (white noise) term at time t
  a[1] = (y[1] - beta*x[1]);
  // recover the innovations recursively from the regression residuals; note that
  // this recursion implies e_t = a_t + theta * a_{t-1}, so the estimate of theta
  // will have the opposite sign with respect to the formula above
  for (t in 2:N) 
    a[t] = (y[t] - beta*x[t]) - theta * a[t - 1];

}
model {
  beta ~ normal(0, 1);
  sigma ~ cauchy(0, 1);
  target += normal_lpdf(a | 0, sigma);
}

We fit the proposed model to the U.S. weekly interest rate data shown in the example, and plot the residuals and their ACF:

TSE <- stan_model("../models/Chapter_2/TSE.stan")
tse <- sampling(TSE, data=list(N=nrow(y_diff), y=y_diff[,2], x=y_diff[,1]), chains=1)
## 
## SAMPLING FOR MODEL 'TSE' NOW (CHAIN 1).
## Chain 1: 
## Chain 1: Gradient evaluation took 0.000165 seconds
## Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 1.65 seconds.
## Chain 1: Adjust your expectations accordingly!
## Chain 1: 
## Chain 1: 
## Chain 1: Iteration:    1 / 2000 [  0%]  (Warmup)
## Chain 1: Iteration:  200 / 2000 [ 10%]  (Warmup)
## Chain 1: Iteration:  400 / 2000 [ 20%]  (Warmup)
## Chain 1: Iteration:  600 / 2000 [ 30%]  (Warmup)
## Chain 1: Iteration:  800 / 2000 [ 40%]  (Warmup)
## Chain 1: Iteration: 1000 / 2000 [ 50%]  (Warmup)
## Chain 1: Iteration: 1001 / 2000 [ 50%]  (Sampling)
## Chain 1: Iteration: 1200 / 2000 [ 60%]  (Sampling)
## Chain 1: Iteration: 1400 / 2000 [ 70%]  (Sampling)
## Chain 1: Iteration: 1600 / 2000 [ 80%]  (Sampling)
## Chain 1: Iteration: 1800 / 2000 [ 90%]  (Sampling)
## Chain 1: Iteration: 2000 / 2000 [100%]  (Sampling)
## Chain 1: 
## Chain 1:  Elapsed Time: 0.821126 seconds (Warm-up)
## Chain 1:                0.848955 seconds (Sampling)
## Chain 1:                1.67008 seconds (Total)
## Chain 1:
extract(tse, pars="a")[[1]] %>% colMeans() -> a
plot.ts(a)

acf(a)

Box.test(a, lag=12, type="Ljung-Box")
## 
##  Box-Ljung test
## 
## data:  a
## X-squared = 54.397, df = 12, p-value = 2.32e-07

The serial correlations in the residuals are now strongly reduced.

Addendum: AR-GAS models

Blasques, Koopman and Lucas proposed an alternative AR model with time-varying temporal dependence (https://www.slideshare.net/SYRTOproject/timevarying-temporal-dependene-in-autoregressive-models), where the AR coefficient is allowed to vary over time driven by a score function:

\[y_t = f_t \times y_{t-1} + u_t, \quad f_t = \omega + \alpha \frac{u_{t-1}\, y_{t-2}}{\sigma^2_u} + \beta f_{t-1} \]

The AR-GAS model can be coded in Stan as follows:

data {
  int<lower=0> N;
  vector[N] y;

}
parameters {
  real omega;
  real<lower=0> alpha;
  real<lower=0> beta;
  real<lower=0> sigma;
}
transformed parameters {
  vector[N] ar;    // time-varying AR parameter (unconstrained scale)
  real err[N];     // innovations u_t
  ar[1] = 1;
  ar[2] = 1;
  err[1] = y[1];
  err[2] = y[2] - 2*(inv_logit(ar[2])-0.5) * y[1];
  for (t in 3:N) {
    // score-driven update: the driver err[t-1]*y[t-2]/sigma^2 is the score
    // of the Gaussian observation density at time t-1
    ar[t] = (omega
                    + alpha * (err[t-1] * y[t-2]) / sigma^2
                    + beta * ar[t-1]);
    // 2*(inv_logit(ar)-0.5) maps ar to the interval (-1, 1)
    err[t] = y[t] - 2*(inv_logit(ar[t])-0.5) * y[t-1];
  }
  }
}
model {
  omega ~ normal(0, 0.1);
  alpha ~ normal(0, 1);
  beta ~ normal(0, 1);
  sigma ~ cauchy(0, 1);
  y[3:N] ~ normal(2*(inv_logit(ar[3:N])-0.5) .* y[2:(N-1)], sigma);
}
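
The updating term in the transformed parameters block is the score of the Gaussian observation density with respect to the AR coefficient at the previous time step; a one-line derivation, using the model equation above:

\[\frac{\partial}{\partial f_{t-1}} \log p(y_{t-1} \mid f_{t-1}) = \frac{(y_{t-1} - f_{t-1} y_{t-2})\, y_{t-2}}{\sigma^2_u} = \frac{u_{t-1}\, y_{t-2}}{\sigma^2_u},\]

which is exactly the quantity err[t-1] * y[t-2] / sigma^2 driving the recursion (in the code, ar is additionally passed through 2*(inv_logit(ar)-0.5) so that the effective AR coefficient stays in (-1, 1)).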

We fit the AR-GAS model to the unemployment insurance claims data (as in the example at slide 27):

uic <- scan("../data/UIC.txt")
#AR_GAS_model <- stan_model("../models/Chapter_2/AR-GAS.stan")
AR_GAS_model <- readRDS("../models/Chapter_2/AR-GAS.rds")
AR_GAS_uic <- sampling(AR_GAS_model, data=list(N=length(uic), y=uic), chains=4, cores=4)

There will be a few divergences; better priors would probably fix that. We can now look at the time-varying behaviour of the AR component:

par(mfrow=c(2,1), mai = c(1, 1, 0.1, 0.1))
plot.ts(uic)
ar <- extract(AR_GAS_uic, pars="ar")[[1]] %>% colMeans()
plot.ts(2*( invlogit(ar)-0.5), ylab="AR")
