Analysis of Financial Time Series in Stan - Chapter 7 - Extreme Values, Quantiles, and Value at Risk


In this series we are trying to reproduce the models and examples listed in the book “Analysis of Financial Time Series, 3rd Edition”, by Ruey S. Tsay, using Stan https://mc-stan.org/ and the package RStan https://cran.r-project.org/web/packages/rstan/index.html. The main repository for the presented models and data can be found at https://github.com/marcomarconi/AFTS_with_Stan.

{
  require(tidyverse)
  require(rstan)
  require(bayesplot)
  require(loo)
  require(LaplacesDemon)
  library(xts)
  library(quantmod)
  library(FKF)
  library(MARSS)
  library(forecast)
  library(NTS)
  
  rstan_options(auto_write = TRUE)
  rstan_options(javascript = FALSE)
  theme_set(theme_classic(base_size = 24))
}

7.1 VALUE AT RISK

The Value at Risk (VaR) is a point estimate of potential financial loss: it tries to quantify the potential loss of a holding position due to adverse market movements during a certain period. VaR is commonly used to ensure that a financial institution is still in business even after a catastrophic event. VaR is usually defined as the upper quantile of a loss function L over a time period h; it expresses the amount of value we are risking over that time horizon at a given probability. Computing the VaR therefore requires a predictive distribution of the future returns of the financial position. For most of the models described in the previous posts the h-steps-ahead forecast is easily available, and so is the VaR. The first approach to VaR calculation we are going to see is the RiskMetrics methodology developed at J.P. Morgan in 1995. In its simple form, RiskMetrics assumes that the continuously compounded daily return of a portfolio follows a conditional normal distribution. This simple formulation can be applied to most of the models described in previous posts. Consider the IGARCH model applied to the IBM log returns seen in the post about Chapter 3 (GARCH models):
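
As a quick reference, in its simplest form RiskMetrics models the variance as an exponentially weighted moving average (an IGARCH(1,1) with \(\alpha_0 = 0\) and zero mean), and the 1-day VaR of a long position of value \(V\) is just an upper quantile of the conditional normal predictive distribution (1.65 and 2.326 being the approximate 95% and 99% standard normal quantiles):

\[\begin{equation} r_t \mid F_{t-1} \sim N(0, \sigma_t^2), \qquad \sigma_t^2 = \alpha \sigma_{t-1}^2 + (1-\alpha) r_{t-1}^2, \qquad VaR_{5\%} \approx 1.65 \, \sigma_{t+1} V, \qquad VaR_{1\%} \approx 2.326 \, \sigma_{t+1} V \end{equation}\]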

ibm <- read.table("data/d-ibm6298.txt", header=T)
ibm$date <- as.Date(ibm$date %>% as.character(), format="%Y%m%d")
plot(ibm$date,ibm$rtn)

IGARCH <- stan_model("models/Chapter_3/IGARCH.stan")
fit_ibm <- sampling(IGARCH, data = list(N = length(ibm$rtn), y=ibm$rtn, sigma1=sd(ibm$rtn)), chains = 4, cores = 4, iter=500)
print(fit_ibm, names(fit_ibm)[1:4])
## Inference for Stan model: IGARCH.
## 4 chains, each with iter=500; warmup=250; thin=1; 
## post-warmup draws per chain=250, total post-warmup draws=1000.
## 
##        mean se_mean   sd 2.5%  25%  50%  75% 97.5% n_eff Rhat
## mu     0.00       0 0.00 0.00 0.00 0.00 0.00  0.00  1227 1.00
## alpha0 0.00       0 0.00 0.00 0.00 0.00 0.00  0.00   353 1.01
## alpha1 0.06       0 0.01 0.05 0.06 0.06 0.07  0.08   490 1.00
## beta1  0.93       0 0.01 0.91 0.92 0.93 0.93  0.94   402 1.00
## 
## Samples were drawn using NUTS(diag_e) at Sat Jun 12 15:47:04 2021.
## For each parameter, n_eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor on split chains (at 
## convergence, Rhat=1).

We can easily calculate the VaR at the 5% and 1% thresholds. Notice that in the Bayesian context we also get the uncertainty about the VaR estimate for free:
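
The quantity sd_1 computed below is the 1-step-ahead volatility forecast implied by the fitted IGARCH(1,1): with \(\alpha_0 \approx 0\), \(\beta_1 = 1 - \alpha_1\) and standardized residual \(\epsilon_h = a_h / \sigma_h\), the recursion can be rewritten as

\[\begin{equation} \sigma_{h+1}^2 = \alpha_0 + \beta_1 \sigma_h^2 + \alpha_1 a_h^2 \approx \sigma_h^2 + \alpha_1 \sigma_h^2 (\epsilon_h^2 - 1) \end{equation}\]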

pars <- extract(fit_ibm)
sd <- pars$sigma
res <- sapply(1:ncol(sd), function(i)  ibm$rtn[i] / sd[,i] )  # standardized residuals at each time point
sd_1 <- sqrt( sd[,ncol(sd)]^2 + pars$alpha1 * sd[,ncol(sd)]^2 * (res[,ncol(res)]^2 - 1))  # 1-step-ahead volatility forecast
# VaR of a $10 million position (1.65 and 2.326 are the ~95% and ~99% standard normal quantiles)
VaR_0.05 <- (1.65 * (sd_1) * 10e+6) 
VaR_0.01 <- (2.326 * (sd_1) * 10e+6) 
ggplot(rbind(data.frame(VaR = VaR_0.01, Quantile="VaR 1%"), data.frame(VaR = VaR_0.05, Quantile="VaR 5%"))) +
  geom_density(aes(VaR/1e6, fill=Quantile)) + xlab("VaR (millions $)") 

7.5 EXTREME VALUE THEORY

Extreme value theory is concerned with the study of extraordinary events, for example extreme deviations from the median of some probability distribution. In the case of financial prices we are going to analyze the tail behavior of the returns distribution. One approach is to study the distribution of the maxima and minima of returns. However, for a given sample there is only a single minimum or maximum, so one simple idea is to split the dataset into subsamples and take the maximum and minimum of each subsample. In the case of stock prices the size of each subsample is generally guided by practical considerations (e.g. there are 21 trading days in a month). Maxima and minima tend to follow extreme value distributions, such as the generalized extreme value distribution (GEV). We can model the GEV in Stan as follows:
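
For reference, the cumulative distribution function of the GEV with location \(\mu\), scale \(\sigma\) and shape \(\xi\) is (on the support \(1 + \xi(x-\mu)/\sigma > 0\)); the GEVd_lpdf function in the model below is the log of the corresponding density:

\[\begin{equation} F(x) = \begin{cases} \exp\{-[1 + \xi(x-\mu)/\sigma]^{-1/\xi}\} & \text{if } \xi \neq 0 \\ \exp\{-\exp[-(x-\mu)/\sigma]\} & \text{if } \xi = 0 \end{cases} \end{equation}\]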

functions {
  real GEVd_lpdf(real y, real mu, real sigma, real xi) {
    //  GEV log pdf 
    real t;
    real log_t;
    if(xi == 0) 
      t = exp(-(y-mu)/sigma);
    else 
      t = pow((1 + xi*((y-mu)/sigma)), -1/xi);
    return log(1/sigma * pow(t, xi+1) * exp(-t) );
  }
}

data {
  int N;
  vector[N] y;
}

parameters {
  real mu;
  real xi;
  real<lower=0> sigma;
}

model {
  mu ~ normal(0,10);
  xi ~ normal(0,1);
  sigma ~ cauchy(0,1);
  for(i in 1:N)
    y[i] ~ GEVd(mu, sigma, xi);
}

Let’s apply the GEV to the IBM returns seen before. We are going to use 21 days as the subperiod length and extract the maxima and minima of each subperiod:

dates <- split(ibm$date, ceiling(seq_along(ibm$date)/21)) %>% lapply(function(x) head(x,1)) %>% unlist %>% as.Date
mins <- split(ibm$rtn, ceiling(seq_along(ibm$rtn)/21))  %>% lapply(min) %>% unlist 
maxs <- split(ibm$rtn, ceiling(seq_along(ibm$rtn)/21))  %>% lapply(max) %>% unlist 
names(maxs) <- dates
names(mins) <- dates
par(mfrow=c(2,1), mar=c(2,2,2,2))
barplot(maxs, cex.names = 1)
barplot(mins, cex.names = 1)

Now we can estimate the parameters of the generalized extreme value distribution for IBM daily log returns:

GEV <- stan_model("models/Chapter_7/GEV.stan")
fit_gev_max <- sampling(GEV, data = list(N = length(maxs), y=maxs*100), chains = 4, cores = 4)
fit_gev_min <- sampling(GEV, data = list(N = length(mins), y=-mins*100), chains = 4, cores = 4)
print(fit_gev_max)
## Inference for Stan model: GEV.
## 4 chains, each with iter=2000; warmup=1000; thin=1; 
## post-warmup draws per chain=1000, total post-warmup draws=4000.
## 
##          mean se_mean   sd    2.5%     25%     50%     75%   97.5% n_eff Rhat
## mu       2.21    0.00 0.05    2.11    2.18    2.21    2.25    2.31  2710    1
## xi       0.18    0.00 0.04    0.11    0.15    0.18    0.20    0.25  2455    1
## sigma    0.96    0.00 0.04    0.88    0.93    0.96    0.99    1.04  2800    1
## lp__  -716.73    0.03 1.21 -719.80 -717.30 -716.43 -715.84 -715.31  1949    1
## 
## Samples were drawn using NUTS(diag_e) at Sun Jun 13 17:09:18 2021.
## For each parameter, n_eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor on split chains (at 
## convergence, Rhat=1).
print(fit_gev_min)
## Inference for Stan model: GEV.
## 4 chains, each with iter=2000; warmup=1000; thin=1; 
## post-warmup draws per chain=1000, total post-warmup draws=4000.
## 
##          mean se_mean   sd    2.5%     25%     50%     75%   97.5% n_eff Rhat
## mu       1.89    0.00 0.04    1.80    1.86    1.89    1.92    1.97  2111    1
## xi       0.19    0.00 0.04    0.12    0.16    0.19    0.21    0.26  2508    1
## sigma    0.81    0.00 0.03    0.75    0.79    0.81    0.84    0.88  2140    1
## lp__  -645.06    0.03 1.21 -648.25 -645.62 -644.74 -644.17 -643.66  1660    1
## 
## Samples were drawn using NUTS(diag_e) at Sun Jun 13 17:09:21 2021.
## For each parameter, n_eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor on split chains (at 
## convergence, Rhat=1).

The \(\xi\) parameter is significantly positive, meaning that the daily log returns may have tails heavier than the normal distribution (they follow the Fréchet family). This means that the common normality assumption for daily log returns is not adequate. We can also inspect the residuals of the GEV fit and check that they follow a standard exponential distribution:
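
If the GEV fit is adequate, the transformed observations used as residuals in the code below should behave approximately like independent draws from a standard exponential distribution:

\[\begin{equation} w_i = \left[1 + \xi \, \frac{x_i - \mu}{\sigma}\right]^{-1/\xi} \sim \mathrm{Exp}(1) \end{equation}\]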

pars <- extract(fit_gev_max, pars=c("mu", "sigma", "xi")) 
pars_mean <- pars %>%  lapply(mean)
w <- (1 + pars_mean$xi * (maxs*100 - pars_mean$mu) / pars_mean$sigma)^(-1/pars_mean$xi)
par(mfrow=c(1,2))
plot(w, ylab="Residuals")
qqplot(w, rexp(1000), ylab="Exponential quantiles")

7.6 EXTREME VALUE APPROACH TO VAR

The previously described extreme value approach can be used to calculate the VaR. After estimating the parameters of the GEV, the VaR of a financial position given a small upper tail probability p is (with n the length of the subperiods, 21 trading days in our case):

\[\begin{equation} VaR = \begin{cases} \mu - \sigma/\xi \{1- [-n\ln(1-p) ]^{-\xi} \} & \text{if } \xi \neq 0 \\ \mu - \sigma \ln[-n\ln(1-p) ] & \text{if } \xi = 0 \end{cases} \end{equation}\]

Let’s compare this VaR calculation based on extreme value theory with the previous one from RiskMetrics. We also include an empirical estimate based on bootstrap resampling of the quantiles of the returns distribution (see section 7.4 of the book):

# VaR based on extreme value theory
pars <- extract(fit_gev_min, pars=c("mu", "sigma", "xi")) 
n <- 21
p <- 0.01
VaR_EVT <- (pars$mu - pars$sigma/pars$xi * (1- (-n * log(1-p))^(-pars$xi))) / 100 * 10e+6
# VaR based on empirical quantile estimation: bootstrap the 99% quantile of the negated log returns (in percent)
nibm <- -log(ibm[,2]+1)*100
VaR_quantile <- sapply(1:1000, function(x) quantile(sample(nibm, size = length(nibm), replace = TRUE),  0.99) / 100 * 10e+6 )
# plot all together
df <- rbind(data.frame(VaR = VaR_0.01, Measure="RiskMetrics"), data.frame(VaR = VaR_EVT, Measure="EVT"), data.frame(VaR = VaR_quantile, Measure="Quantiles"))
ggplot(df) + geom_density(aes(VaR/1e6, fill=Measure), alpha=0.75) + xlab("VaR (millions $)") 

Curiously, the EVT estimation is the lowest one. According to Tsay, “VaR obtained here via the traditional extreme value theory may not be adequate because the independent assumption of daily log returns is often rejected by statistical testings. Finally, the use of sub-period maxima overlooks the fact of volatility clustering in the daily log returns”.

7.7 NEW APPROACH BASED ON THE EXTREME VALUE THEORY

Instead of subdividing the sample into n subperiods and taking the maxima/minima, we can select the events whose returns exceed some threshold value. These “extreme” returns can then be fitted with a generalized Pareto distribution (GPD), from which we can infer a VaR.
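
In the parametrization used in the Stan model below (threshold \(y_{min}\), shape \(k\) and scale \(\sigma\)), the GPD has cumulative distribution function, for \(y > y_{min}\):

\[\begin{equation} G(y) = \begin{cases} 1 - [1 + k(y - y_{min})/\sigma]^{-1/k} & \text{if } k \neq 0 \\ 1 - \exp[-(y - y_{min})/\sigma] & \text{if } k = 0 \end{cases} \end{equation}\]
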
The GPD can be modeled in Stan as follows:

functions {
  real gpareto_lpdf(vector y, real ymin, real k, real sigma) {
    // generalised Pareto log pdf 
    int N = rows(y);
    real inv_k = inv(k);
    if (k<0 && max(y-ymin)/sigma > -inv_k)
      reject("k<0 and max(y-ymin)/sigma > -1/k; found k, sigma =", k, sigma)
    if (sigma<=0)
      reject("sigma<=0; found sigma =", sigma)
    if (fabs(k) > 1e-15)
      return -(1+inv_k)*sum(log1p((y-ymin) * (k/sigma))) -N*log(sigma);
    else
      return -sum(y-ymin)/sigma -N*log(sigma); // limit k->0
  }
  real gpareto_cdf(vector y, real ymin, real k, real sigma) {
    // generalised Pareto cdf
    real inv_k = inv(k);
    if (k<0 && max(y-ymin)/sigma > -inv_k)
      reject("k<0 and max(y-ymin)/sigma > -1/k; found k, sigma =", k, sigma)
    if (sigma<=0)
      reject("sigma<=0; found sigma =", sigma)
    if (fabs(k) > 1e-15)
      return exp(sum(log1m_exp((-inv_k)*(log1p((y-ymin) * (k/sigma))))));
    else
      return exp(sum(log1m_exp(-(y-ymin)/sigma))); // limit k->0
  }
  real gpareto_lcdf(vector y, real ymin, real k, real sigma) {
    // generalised Pareto log cdf
    real inv_k = inv(k);
    if (k<0 && max(y-ymin)/sigma > -inv_k)
      reject("k<0 and max(y-ymin)/sigma > -1/k; found k, sigma =", k, sigma)
    if (sigma<=0)
      reject("sigma<=0; found sigma =", sigma)
    if (fabs(k) > 1e-15)
      return sum(log1m_exp((-inv_k)*(log1p((y-ymin) * (k/sigma)))));
    else
      return sum(log1m_exp(-(y-ymin)/sigma)); // limit k->0
  }
  real gpareto_lccdf(vector y, real ymin, real k, real sigma) {
    // generalised Pareto log ccdf
    real inv_k = inv(k);
    if (k<0 && max(y-ymin)/sigma > -inv_k)
      reject("k<0 and max(y-ymin)/sigma > -1/k; found k, sigma =", k, sigma)
    if (sigma<=0)
      reject("sigma<=0; found sigma =", sigma)
    if (fabs(k) > 1e-15)
      return (-inv_k)*sum(log1p((y-ymin) * (k/sigma)));
    else
      return -sum(y-ymin)/sigma; // limit k->0
  }
  real gpareto_rng(real ymin, real k, real sigma) {
    // generalised Pareto rng
    if (sigma<=0)
      reject("sigma<=0; found sigma =", sigma)
    if (fabs(k) > 1e-15)
      return ymin + (uniform_rng(0,1)^-k -1) * sigma / k;
    else
      return ymin - sigma*log(uniform_rng(0,1)); // limit k->0
  }
}
data {
  real ymin;
  int<lower=0> N;
  vector<lower=ymin>[N] y;
}
transformed data {
  real ymax = max(y);
}
parameters {
  real<lower=0> sigma; 
  real<lower=-sigma/(ymax-ymin)> k; 
}
model {
  y ~ gpareto(ymin, k, sigma);
}
generated quantities {
  vector[N] log_lik;
  vector[N] yrep;
  for (n in 1:N) {
    log_lik[n] = gpareto_lpdf(rep_vector(y[n],1) | ymin, k, sigma);
    yrep[n] = gpareto_rng(ymin, k, sigma);
  }

}

Let’s extract the extreme events above the 3% threshold and fit them with the GPD:

GPD <- stan_model("models/Chapter_7/GPD.stan")
threshold <- 0.03
y <- (-ibm$rtn)[(-ibm$rtn) > threshold]  # negated returns exceeding the threshold
fit_gpd <- sampling(GPD, data = list(ymin = threshold, N = length(y), y=y), chains = 4, cores = 4)
print(fit_gpd, names(fit_gpd)[1:2], digits=5)
## Inference for Stan model: GPD.
## 4 chains, each with iter=2000; warmup=1000; thin=1; 
## post-warmup draws per chain=1000, total post-warmup draws=4000.
## 
##          mean se_mean      sd    2.5%     25%     50%     75%   97.5% n_eff
## sigma 0.00851 0.00002 0.00102 0.00666 0.00779 0.00845 0.00917 0.01062  1764
## k     0.30452 0.00227 0.09272 0.14437 0.23894 0.29799 0.36255 0.50408  1662
##          Rhat
## sigma 1.00304
## k     1.00205
## 
## Samples were drawn using NUTS(diag_e) at Sun Jun 13 17:09:23 2021.
## For each parameter, n_eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor on split chains (at 
## convergence, Rhat=1).

We can now extract the corresponding VaR and compare it to the previous ones. I am using formula (7.36) in the book to calculate the VaR from the GPD. Notice that the parametrization we use does not include the location parameter \(\beta\), therefore I am using the sample mean as an approximation. The alternative parametrization is discussed in the book in section “7.7.5 Alternative Parameterization”:
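
Written out in the notation of the Stan model above, with the sample mean \(\bar{y}\) standing in for the location \(\beta\), \(u\) the threshold and \(D = 252\) trading days, the expression evaluated in the code below (before scaling by the position size) is:

\[\begin{equation} VaR = \bar{y} - \frac{\sigma - k(u - \bar{y})}{k}\left\{1 - [-D\ln(1-p)]^{-k}\right\} \end{equation}\]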

pars <- extract(fit_gpd)
D <- 252
p <- 0.01
# formula (7.36) in the book. Notice that I have to use the sample mean as location.
VaR_GPD <- (mean(y)-((pars$sigma-pars$k*(threshold-mean(y)))/pars$k)*(1-(-D * log(1-p))^(-pars$k)) )  * 10e6
df <- rbind(data.frame(VaR = VaR_0.01, Measure="RiskMetrics"), 
            data.frame(VaR = VaR_EVT, Measure="EVT"), 
            data.frame(VaR = VaR_quantile, Measure="Quantiles"),
            data.frame(VaR = VaR_GPD, Measure="GPD")
            ) %>% mutate(Measure=factor(Measure, levels=c("RiskMetrics", "Quantiles", "EVT", "GPD")))
ggplot(df) + geom_density(aes(VaR/1e6, fill=Measure), alpha=0.75) + xlab("VaR (millions $)") 

It looks like the GPD, in this situation, returns the most conservative estimate.
