Analysis of Financial Time Series in Stan - Chapter 5 - High-Frequency Data Analysis and Market Microstructure

Chapter_5.knit

In this series we are trying to reproduce the models and examples listed in the book “Analysis of Financial Time Series, 3rd Edition”, by Ruey S. Tsay, using Stan https://mc-stan.org/ and the package RStan https://cran.r-project.org/web/packages/rstan/index.html. The main repository for the presented models and data can be found at https://github.com/marcomarconi/AFTS_with_Stan.

{
  require(tidyverse)
  require(rstan)
  require(bayesplot)
  require(loo)
  require(LaplacesDemon)
  library(xts)
  library(quantmod)
  library(FKF)
  library(MARSS)
  library(forecast)
  library(NTS)
  
  rstan_options(auto_write = TRUE)
  rstan_options(javascript = FALSE)
  theme_set(theme_classic(base_size = 24))
}

5.4 Models for Price Change - Decomposition Model

When analyzing high frequency intraday data it is common to encounter long periods of “no change” in price data. In order to account for this phenomenon we are going to see the ADS Decomposition model. In this model price change is decomposed into three components: indicator for price change, the direction of price movement if there is a change, and the size of price change if a change occurs. Specifically, the price change at the ith transaction can be written as:

\[ y_i = P_{t_i} - P_{t_i-1} = A_iD_iS_i \]

Where A_i is binary variable that takes the value of 0 if there is a price change at the ith trade and 1 otherwise. D_i is also a discrete variable conditional on A_i that takes the value of 1 if the price has increase or -1 if the price has decreased. S_i is the size of the price change in ticks if there is a change at the ith trade and S_i = 0 if there is no price change at the ith trade. Let’s look at the Stan code, notice that the model can be formulated in both ways presented in the book (we are going to use the analytical likelihood specified in the book at formula (5.29)):

functions {
  real geometric(int y, real phi){
    return(log(phi * (1 - phi)^y));
  }
  real ADS_loglikelihood(vector I, int A, int D, int S, real peta, real eta, real lambda_u, real lambda_d){
    return(I[1] * log(1 - peta) + 
           I[2] * (log(peta) + log(eta) + log(lambda_u) + (S-1) * log(1 - lambda_u)) +
           I[3] * (log(peta) + log(1-eta) + log(lambda_d) + (S-1) * log(1 - lambda_d))
           );
  }
}

data {
  int<lower=0> N;
  int A[N];
  int D[N];
  int S[N];
}
transformed data {
  vector[N] A_real;
  vector[N] D_real;
  vector[3] I[N] ;
  int D_count = 0;
  for(t in 1:N) {
    A_real[t] = A[t];
    D_real[t] = D[t];
    D_count = D_count + 1;
    I[t][1] = 0;
    I[t][2] = 0;
    I[t][3] = 0;
    if(A[t] == 0)
      I[t][1] = 1;
    if(A[t] == 1 && D[t] == 1)
      I[t][2] = 1;
    if(A[t] == 1 && D[t] == -1)
      I[t][3] = 1;
      
  }

}

parameters {
  real beta0 ;
  real beta1 ;
  real gamma0 ;
  real gamma1 ;
  real phi0_d ;
  real phi0_u ;
  real phi1_d ;
  real phi1_u ;
}

model {
  beta0 ~ normal(0,1);
  beta1 ~ normal(0,1);
  gamma0 ~ normal(0,1);
  gamma1 ~ normal(0,1);
  phi0_d ~ normal(0,1);
  phi0_u ~ normal(0,1);
  phi1_d ~ normal(0,1);
  phi1_u ~ normal(0,1);
  /* formulas 5.25 to 5.28
  for(t in 2:N) {
    A[t] ~ bernoulli_logit(beta0 + beta1 * A[t-1]);
    if(A[t] == 1) {
      (D[t]==1) ~ bernoulli_logit(gamma0 + gamma1 * D[t-1]);
      if(D[t] == 1)
        target += geometric(S[t]-1, inv_logit(phi0_u + phi1_u * S[t-1]));
      else
        target += geometric(S[t]-1, inv_logit(phi0_d + phi1_d * S[t-1]));
    }
  }*/
  // formula 5.29
  for(t in 2:N)
    target += ADS_loglikelihood(I[t], A[t], D[t], S[t],
        inv_logit(beta0 + beta1 * A[t-1]),  
        inv_logit(gamma0 + gamma1 * D[t-1]), 
        inv_logit(phi0_u + phi1_u * S[t-1]), 
        inv_logit(phi0_d + phi1_d * S[t-1])) ;
}

We illustrate the decomposition model by analyzing the intraday transactions of IBM stock from November 1, 1990, to January 31, 1991. There were 63 trading days and 59,838 intraday transactions in the normal trading hours:

ibm91_ads <- read.table("data/ibm91-ads.dat", header=F )
colnames(ibm91_ads) <- c("A", "D", "S")
plot(ibm91_ads$S*ibm91_ads$D, xlab="Time", ylab="Price Change")

Let’s fit the ADS model to the data:

ADS <- stan_model("models/Chapter_5/ADS.stan")
fit_ADS <- sampling(ADS, data = list(N = length(ibm91_ads$A), A=ibm91_ads$A, D=ibm91_ads$D, S=ibm91_ads$S), chains = 4, cores = 4, iter=2000)

As you can see, the parameters recovered by Stan as almost the same as the ones showed in the book (it’s no surprise, we used the exact same analytic likelihood):

print(fit_ADS)

## Inference for Stan model: ADS.
## 4 chains, each with iter=500; warmup=250; thin=1; 
## post-warmup draws per chain=250, total post-warmup draws=1000.
## 
##             mean se_mean   sd      2.5%       25%       50%       75%     97.5%
## beta0      -1.06    0.00 0.01     -1.08     -1.06     -1.06     -1.05     -1.03
## beta1       0.96    0.00 0.02      0.92      0.95      0.96      0.97      0.99
## gamma0     -0.07    0.00 0.02     -0.10     -0.08     -0.07     -0.06     -0.03
## gamma1     -2.30    0.00 0.03     -2.37     -2.33     -2.30     -2.28     -2.24
## phi0_d      2.08    0.00 0.03      2.02      2.06      2.08      2.10      2.15
## phi0_u      2.23    0.00 0.03      2.17      2.21      2.23      2.25      2.30
## phi1_d     -0.51    0.00 0.02     -0.54     -0.52     -0.51     -0.50     -0.47
## phi1_u     -0.67    0.00 0.02     -0.71     -0.68     -0.67     -0.66     -0.63
## lp__   -56476.65    0.09 1.91 -56480.95 -56477.65 -56476.32 -56475.31 -56473.83
##        n_eff Rhat
## beta0   1219 1.00
## beta1   1082 1.00
## gamma0  1361 1.00
## gamma1  1161 1.00
## phi0_d   984 1.00
## phi0_u   862 1.00
## phi1_d   880 1.00
## phi1_u   793 1.00
## lp__     406 1.01
## 
## Samples were drawn using NUTS(diag_e) at Sat Jun  5 15:52:29 2021.
## For each parameter, n_eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor on split chains (at 
## convergence, Rhat=1).

The model estimation allows us to determine many interesting features. For example, we can calculate the probability of a price change depends on whether there was a previous price change:

pars <- rstan::extract(fit_ADS)
pA1_0 <- invlogit(pars$beta0 + 0*pars$beta1)
pA1_1 <- invlogit(pars$beta0 + 1*pars$beta1)
plot(density(pA1_0), xlim=c(0.2,0.5), col="red", xlab="Probability", main="")
lines(density(pA1_1), xlim=c(0.2,0.5), col="blue")
legend("topright", legend=c("P(Ai = 1 | Ai-1 = 0)", "P(Ai = 1 | Ai-1 = 1)"), col=c("red", "blue"), lty=1:1, lwd=2, cex=1.2)

This plot shows that price change may occur in clusters and, as expected, most transactions are without price change.

Moreover, we can see that

there is no preference for price increase or decrease after no price change (red line). So little margin for price change prediction.
Consecutive price increases or decreases are very low (blue line).
There is evidence of bid-ask spread bounce, as there is high chance of price increase after a price decrease (green line).

pD1_D0_A0 <- invlogit(pars$gamma0 + 0*pars$gamma1)
pD1_D1_A1 <- invlogit(pars$gamma0 + 1*pars$gamma1)
pD1_D_1_A1 <- invlogit(pars$gamma0 + -1*pars$gamma1)
plot(density(pD1_D0_A0), xlim=c(0,1), ylim=c(0, 200), col="red", xlab="Probability", main="")
lines(density(pD1_D1_A1), xlim=c(0,1), col="blue")
lines(density(pD1_D_1_A1), xlim=c(0,1), col="green")
legend("topright", legend=c("P(Di = 1 | Di-1 = 0)", "P(Di = 1 | Di-1 = 1)", "P(Di = 1 | Di-1 = -1)"), col=c("red", "blue", "green"), lty=1:1, lwd=2, cex=1.2)

Finally, we can see that there is weak evidence suggesting that big price changes have a higher probability to be followed by another big price change:

pS <- pgeom(0, prob=invlogit(mean(pars$phi0_u) + (1:10)*mean(pars$phi1_u))) 
par(mar=c(5,5,5,5))
plot(pS, ylab = "Probality of Price increase by 1", xlab=expression(S[t-1]), ylim=c(0,1),cex.lab=1.5, cex.axis=1.5)

5.5 DURATION MODELS

Duration models address the problem of time intervals between trades. Longer time periods between transactions indicate lack of trading activities, meaning that no information can be retrieved during these periods.

ACD Model

The ACD model can be viewed as an adaptation of GARCH to time durations:

\[ x_i = \psi_i \epsilon_i \\ \psi_i = \omega + \sum_{j=1}^{r} \gamma_i x_{i-j} + \sum_{j=1}^{s} \omega_i \psi_{i-j} \] where x_i is the time elapsed between to consecutive transactions. The error term _i can be modelled with an exponential distribution (EACD), Weibull (WACD) or gamma (GACD).
The ACD model can be coded in Stan as follows:


data {
  int<lower=0> N;
  vector[N] y;
  int family;
  real psi1;
}
parameters {
  real<lower=0> omega0;
  real<upper=1> omega1;
  real<upper=(1-omega1)> epsilon1;
  real<lower=0> alpha;
  vector<lower=0>[family==1] ka;
}
transformed parameters {
  vector[N] psi;
  psi[1] = psi1;
  for(t in 2:N)
    psi[t] = omega0 + epsilon1 * y[t-1] + omega1 * psi[t-1];
}

model {
  omega0 ~ normal(0.5,0.5);
  omega1 ~ normal(0.5,0.5);
  epsilon1 ~ normal(0.5,0.5);
  alpha ~ normal(1,1);
  ka ~ normal(1,1);
  if(family==0)
    y ~ weibull(alpha, psi);
  else
    y ~ gamma(alpha, 1. ./ (ka[1]*psi));
}
generated quantities {
   vector[N] log_lik;
   {
    if(family==0)
      log_lik[1] = weibull_lpdf(y[1] | alpha, psi[1]);
    else
      log_lik[1] = gamma_lpdf(y[1] | alpha, 1. ./ (ka[1]*psi[1]));
    
     for (t in 2:(N)) {
       if(psi[t] < 0) {
        log_lik[t] = 0;
        continue;
       }
       if(family==0)
        log_lik[t] = weibull_lpdf(y[t] | alpha, psi[t]);
       else
        log_lik[t] = gamma_lpdf(y[t] | alpha, 1. ./ (ka[1]*psi[t]));
     }
   }
}

Let’s apply the WACD to the IBM stock transaction durations on five consecutive trading days from November 1 to November 7, 1990:

ibm <- scan("data/ibm1to5-dur.txt")
par(mfrow=c(1,3))
plot.ts(ibm, ylab="Adjusted Duration", main="IBM trading duration")
hist(ibm)
Acf(ibm, ylab="Adjusted Duration", ylim=c(-0.1,0.2))

ACD <- stan_model("models/Chapter_5/ACD.stan")
fit_ACD_ibm <- sampling(ACD, data = list(N = length(ibm), y=ibm, family=0, psi1=1, M=0, m=vector()), chains = 4, cores = 4, iter=1000, control=list(adapt_delta=0.9))

print(fit_ACD_ibm, pars=names(fit_ACD_ibm)[1:4])

## Inference for Stan model: ACD.
## 4 chains, each with iter=1000; warmup=500; thin=1; 
## post-warmup draws per chain=500, total post-warmup draws=2000.
## 
##          mean se_mean   sd 2.5%  25%  50%  75% 97.5% n_eff Rhat
## omega0   0.20       0 0.06 0.10 0.15 0.19 0.23  0.34   542 1.01
## omega1   0.87       0 0.03 0.81 0.85 0.87 0.89  0.91   579 1.00
## epsilon1 0.07       0 0.01 0.05 0.06 0.06 0.07  0.09  1182 1.00
## alpha    0.88       0 0.01 0.86 0.87 0.88 0.89  0.90   751 1.00
## 
## Samples were drawn using NUTS(diag_e) at Sun Jun  6 09:32:32 2021.
## For each parameter, n_eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor on split chains (at 
## convergence, Rhat=1).

The parameters recovered are very similar to Tsay’s.
We can also see that the standardized innovations do not show any sign of serial dependence, confirming the adequacy of the model:

pars <- rstan::extract(fit_ACD_ibm)
residuals <- ibm / colMeans(pars$psi)
Acf(residuals)

5.6 NONLINEAR DURATION MODELS

We apply Tsay’s test for nonlinearity on the residuals from the WACD model discussed before. This test against the null hypothesis that the time series follows some AR process, we see that the test suggests the presence of non-linearity:

Tsay(residuals, p = 4)

## Non-linearity test & its p-value:  1.89638 0.04110427 
## Degrees of freedom of the test:  10 3519

We therefore fit a threshold WACD model to the data (see the file TAR-ACD.stan to see the implementation in Stan), assuming two regimes with a threshold of 3.79:

TARACD <- stan_model("models/Chapter_5/TAR-ACD.stan")
fit_TARACD_ibm <- readRDS("models/Chapter_5/saved_RDS/fit_TARACD_ibm.rds")

TARACD <- stan_model("models/Chapter_5/TAR-ACD.stan")
fit_TARACD_ibm <- sampling(TARACD, data = list(N = length(ibm), y=ibm, threshold=3.79, psi1=1, M=0, m=vector()), chains = 4, cores = 4, iter=1000, control=list(adapt_delta=0.90))

We can see that parameters estimation is similar to Tsay, and that the non-linearity in the residuals is removed. Moreover, loo comparison seems to strongly prefer the non-linear model:

print(fit_TARACD_ibm, pars=names(fit_TARACD_ibm)[1:8])

## Inference for Stan model: TAR-ACD.
## 4 chains, each with iter=1000; warmup=500; thin=1; 
## post-warmup draws per chain=500, total post-warmup draws=2000.
## 
##            mean se_mean   sd 2.5%  25%  50%  75% 97.5% n_eff Rhat
## omega0_l   0.15    0.00 0.06 0.06 0.11 0.14 0.18  0.28   849 1.00
## omega0_g   0.89    0.01 0.25 0.45 0.71 0.88 1.05  1.40   609 1.01
## omega1_l   0.84    0.00 0.03 0.77 0.82 0.85 0.87  0.90   892 1.00
## omega1_g   0.74    0.00 0.08 0.58 0.69 0.75 0.80  0.88   627 1.01
## epsilon1_l 0.15    0.00 0.03 0.09 0.12 0.14 0.16  0.21  1060 1.00
## epsilon1_g 0.03    0.00 0.02 0.00 0.02 0.03 0.04  0.07  2356 1.00
## alpha_l    0.90    0.00 0.01 0.87 0.89 0.90 0.91  0.93  1337 1.00
## alpha_g    0.84    0.00 0.02 0.80 0.83 0.84 0.85  0.88  1619 1.00
## 
## Samples were drawn using NUTS(diag_e) at Sun Jun  6 10:25:01 2021.
## For each parameter, n_eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor on split chains (at 
## convergence, Rhat=1).

pars <- rstan::extract(fit_TARACD_ibm)
residuals_nl <- ibm / colMeans(pars$psi)
Tsay(residuals, p = 4)

## Non-linearity test & its p-value:  1.89638 0.04110427 
## Degrees of freedom of the test:  10 3519

Tsay(residuals_nl, p = 4)

## Non-linearity test & its p-value:  1.384131 0.1808949 
## Degrees of freedom of the test:  10 3519

loo_compare(loo(fit_ACD_ibm), loo(fit_TARACD_ibm))

##        elpd_diff se_diff
## model2   0.0       0.0  
## model1 -14.8       5.2

5.7 BIVARIATE MODELS FOR PRICE CHANGE AND DURATION

We now introduce a combined model for both price change and transaction duration. The PCD model focus on transactions that result in price change. According to this model and the notation described before, stock prices evolve over time as:

\[ P_{t_i} = P_{t_i-1} + D_iS_i \] The PCD model can be coded in Stan as follows:

functions {
  real geometric(real y, real phi){
    return(log(phi * (1 - phi)^y));
  }

}

data {
  int<lower=0> T; // length of data
  vector[T] log_dt; // log of time durations
  vector[T] D; // direction of movement
  int S[T]; // size of price change
  vector[T] N; // number of transactions
}

parameters {
  real  beta0 ;
  real  beta1 ;
  real  beta2 ;
  real<lower=0>  sigma ;
  real  alpha0 ;
  real  alpha1 ;
  real  gamma0 ;
  real  gamma1 ;
  real  omega0 ;
  real  omega1 ;
  real  omega2 ;
  real  Dbeta ;
  real  eta0_d ;
  real  eta1_d ;
  real  eta2_d ;
  real  eta3_d ;
  real  eta0_u ;
  real  eta1_u ;
  real  eta2_u ;
  real  eta3_u ;
}

model {
  sigma ~ cauchy(0,1);
  beta0 ~ normal(0,5);
  beta1 ~ normal(0,1);
  beta2 ~ normal(0,1);
  gamma0 ~ normal(0,5);
  gamma1 ~ normal(0,5);
  alpha0 ~ normal(0,5);
  alpha1 ~ normal(0,5);
  omega0 ~ normal(0,5);
  omega1 ~ normal(0,5);
  omega2 ~ normal(0,5);
  Dbeta ~ normal(0,1);
  eta0_d ~ normal(0,5);
  eta1_d ~ normal(0,5);
  eta2_d ~ normal(0,5);
  eta3_d ~ normal(0,5);
  eta0_u ~ normal(0,5);
  eta1_u ~ normal(0,5);
  eta2_u ~ normal(0,5);
  eta3_u ~ normal(0,5);
  
  for(t in 5:T) {
    // formula (5.48)
    log_dt[t] ~ normal(beta0 + beta1*log_dt[t-1] + beta2*S[t-1], sigma);
    // formula (5.49) and (5.50)
    if (N[t] == 0)
      target += bernoulli_logit_lpmf(0 | alpha0 + alpha1*log_dt[t]);
    else
      target += bernoulli_logit_lpmf(1 | alpha0 + alpha1*log_dt[t])
                              + geometric(N[t]-1, inv_logit(gamma0 + gamma1*log_dt[t]));
    // we use the comulative normal to represent formula (5.51), nice trick!
    (D[t]==1) ~ bernoulli(normal_cdf(0, omega0 + omega1*D[t-1] + omega2*log_dt[t], exp(Dbeta * fabs(sum(D[(t-4):(t-1)])))));  
    // formulas (5.52) and (5.53)
    if(D[t] == -1)
      (S[t]-1) ~ poisson(exp( eta0_d + eta1_d*log(N[t]+1) + eta2_d*log_dt[t] + eta3_d*S[t-1]));
    else
      (S[t]-1) ~ poisson(exp( eta0_u + eta1_u*log(N[t]+1) + eta2_u*log_dt[t] + eta3_u*S[t-1]));
  }
}

Let’s fit it intraday transactions of IBM stock on November 21, 1990. There are 194 price changes within normal trading hours. Tsay needed to implement a Gibbs sampler with 9500 iterations, in Stan we get it for free and with only 100 iterations:

day15 <- read.csv("data/day15v.dat", header = F, sep="\t")
colnames(day15) <- c("Day", "Time", "TT", "D", "S", "N", "I", "C")
par(mfrow=c(2,2))
hist(log(day15$TT)); hist(day15$S); hist(day15$D); hist(day15$N)

PCD <- stan_model("models/Chapter_5/PCD.stan")
fit_PCD<- sampling(PCD, data = list(T = nrow(day15), log_dt=log(day15$TT), D=day15$D, S=day15$S, N=day15$N), chains = 4, cores = 4, iter=1000, control=list(adapt_delta=0.90))

print(fit_PCD)

## Inference for Stan model: PCD.
## 4 chains, each with iter=1000; warmup=500; thin=1; 
## post-warmup draws per chain=500, total post-warmup draws=2000.
## 
##           mean se_mean   sd    2.5%     25%     50%     75%   97.5% n_eff Rhat
## beta0     3.89    0.01 0.52    2.89    3.53    3.88    4.27    4.92  2233    1
## beta1     0.04    0.00 0.07   -0.10   -0.01    0.05    0.10    0.18  2998    1
## beta2    -0.05    0.01 0.41   -0.85   -0.32   -0.05    0.23    0.74  2539    1
## sigma     1.45    0.00 0.08    1.30    1.39    1.44    1.50    1.61  2266    1
## alpha0   -6.86    0.02 0.98   -8.89   -7.49   -6.83   -6.17   -4.98  2198    1
## alpha1    1.74    0.01 0.24    1.29    1.58    1.74    1.89    2.23  2209    1
## gamma0    3.22    0.01 0.69    1.88    2.73    3.23    3.69    4.59  2560    1
## gamma1   -0.85    0.00 0.13   -1.11   -0.94   -0.85   -0.76   -0.59  2518    1
## omega0   -0.12    0.01 0.37   -0.87   -0.36   -0.12    0.12    0.58  1998    1
## omega1    0.79    0.00 0.13    0.54    0.70    0.78    0.87    1.04  2810    1
## omega2    0.01    0.00 0.08   -0.15   -0.05    0.00    0.06    0.17  2035    1
## Dbeta     0.28    0.01 0.24   -0.07    0.13    0.24    0.38    0.82  1016    1
## eta0_d   -1.05    0.09 3.21   -7.08   -3.20   -1.19    0.98    5.51  1250    1
## eta1_d   -0.70    0.03 1.09   -2.83   -1.39   -0.68   -0.02    1.47  1867    1
## eta2_d    0.26    0.02 0.62   -0.89   -0.17    0.24    0.66    1.54  1405    1
## eta3_d   -3.35    0.08 2.87   -9.64   -5.15   -3.06   -1.17    1.16  1259    1
## eta0_u   -4.36    0.04 1.88   -7.97   -5.63   -4.41   -3.11   -0.63  1799    1
## eta1_u   -2.26    0.04 1.49   -5.82   -3.10   -2.01   -1.21    0.02  1354    1
## eta2_u    0.22    0.01 0.40   -0.51   -0.05    0.21    0.48    1.08  2221    1
## eta3_u    0.85    0.04 1.44   -2.37    0.05    1.03    1.86    3.27  1521    1
## lp__   -611.89    0.11 3.22 -619.29 -613.78 -611.63 -609.63 -606.56   829    1
## 
## Samples were drawn using NUTS(diag_e) at Sun Jun  6 10:55:09 2021.
## For each parameter, n_eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor on split chains (at 
## convergence, Rhat=1).

We can see that parameter estimation is pretty close to Tsay.

Search This Blog

No Time for Bayes