Introduction
Since Tinbergen (1933)'s modelling work on business cycles post World War I, many econometric methods were developed with the aim of constructing ever better economic and financial forecasts. The subject evolved through statistical linear and non-linear relationships with Autoregressive (AR) (Yule (1927)), Moving Average (MA) (N. and Wold (1939)), Autoregressive Moving Average (ARMA) (Whittle (1951)), Autoregressive Conditional Heteroskedasticity (ARCH) (Engle (1982)), Generalized ARCH (GARCH) (Bollerslev (1986)), Exponential GARCH (EGARCH) (Nelson (1991)), threshold nonlinear ARMA (Tong (1990)), and Glosten- Jagannathan-Runkle GARCH (GJR-GARCH) (Glosten, Jagannathan, and Runkle (1993))) models amongst others. $$ \\ $$
Stylized Facts of Financial Return (SFFR)
These models’ uses in financial time-series analysis have grown to become staple tools in academia; but many came upon what are now well established asset return forecasting challenges: Stylized Facts of Financial Return (SFFR) - a set of properties found in most asset return datasets (Booth and Gurun (2004); Harrison (1998); Mitchell, Brown, and Easton (2002)). Many SFFR exist, but only a few are of interest in this instance: asset returns' (i) non-Gaussian distribution (unconditional leptokurtosis/excess-kurtosis and negative-skewness/asymmetry), (ii) weak serial correlation (id est (i.e.): the lack of correlation between returns on different days), (iii) volatility clustering (heteroskedasticity), as well as (iv) serial correlation in squared returns, and (v) leverage effects (the tendency for variance to rise more following large negative price changes than positive ones of similar magnitude). (Taylor (2005))
Numerous time-series processes fail to encapsulate these SFFR, exempli gratia (e.g.): AR, MA and ARMA models may necessitate the assumption of homoscedasticity (i.e.: constant variance in error terms) - failing SFFR (iii), (iv) and (v) - which is a useful characteristic to include as large changes (in asset prices) tend to be followed by large changes-of either sign-and small changes tend to be followed by small changes" (Mandelbrot (1963), p.418).
In this article, I address these challenges using aforementioned models' forecasts in the framework outlined in Christoffersen and Diebold (2006) (C&D hereon) to focus on asset return sign changes instead of their level changes as highlighted in Pesaran and Timmermann (2000). My investigation expands on the work of Chronopoulos, Papadimitriou, and Vlastakis (2018) (CPV hereon) - replicating their research on the United States (U.S.)' Standard & Poor's 500 Index (SPX), and studies its feasible financial impact by constructing and assessing investment strategies stemming from findings.
Evidence that an efficient trading strategy may be constructible for the SPX are found.
Breakdown
In this article, I primarily study the frameworks outlined in CPV to investigate the explanatory and forecasting powers of information demand on the SPX. $$ \\ $$ Financial applications of my models are studied via the perspective of a single profit maximising investor in a manner akin to CPV (Kandel and Stambaugh (1996)). In the interest of time, I only investigate the scenario where the investing agent's risk aversion implies taking a risk as soon as its expected reward is positive. Later we will look into a framework to optimise this risk aversion level to allow for a truly profit maximising strategy.
- Part 1: Data Retrieval and Analysis: In this part, we start the R code, construct some rudimentary functions and collect data from Refinitiv all while explaining their use. We also go through the concept of risk-free rates and excess-returns as well as realised variance & realised volatility in addition to Google’s Search Vector Index before lightly analysing the retrieved data.
- Part 2: Econometric Modelling of Non-Linear Variances: In this section, there is no coding. Instead we go – step by step and including working outs – through ARMA models, their parameter estimation process (via log-likelihood), the concepts of stationarity, Taylor’s expansion, the calculus’ chain rule, first order conditions, and GARCH models (including the GACH, GJRGARCH and EGARCH models). Finally, we outline the way in which we construct error values from which to evaluate the performance of our models.
- Part 3: Coding Non-Linear Variance Econometric Models: In this section we encode the mathematical models and concepts in Part 2. We build our code, explaining it step-by-step, using in-series (computationally slow) and in-parallel (computationally fast) methods. Finally we expose the Diebold-Mariano test results for each pair of model performances, with and without using SVIs.
- Part 4: Sign Forecast Frameworks: Here we outline the Christopherson and Diebold model to estimate the probability of a positive returns, we compare the predictive performance of each model via rudimentary methods, Brier Scores and Diebold and Mariano statistics.
- Part 5: Financial significance: Finally, we put ourselves in the shoes of a profit maximising investor who reinvests – on a daily basis - in the risk-free asset or the S&P500 if the model suggests that its excess return will be positive with more probability than not. We graph the cumulative returns of such an investor for each model-following-strategy and discuss the results with the help of Sharpe-Ratios before concluding.
Development Tools & Resources
Refinitiv's DataStream Web Services for R (DatastreamDSWS2R): Access to DataStream data. A DataStream or Refinitiv Workspace IDentification (ID) will be needed to run the code bellow.
Part 4: Sign Forecast Frameworks
Following from C\&D, Christoffersen, Diebold, Mariano, Tay & Tse (2006)'s working paper outlined how the conditional probability of a positive return in the next time period is $$\begin{equation} \mathbb{P}(R_{t+1} > 0 | \Omega_{t}) = 1 - \mathbb{P} \bigg( \frac{R_{t+1} - \mu}{\sigma_{t+1|t}} \leq \frac{-\mu}{\sigma_{t+1|t}} \bigg) = 1 - F\bigg( \frac{-\mu}{\sigma_{t+1|t}} \bigg) \end{equation}$$
where $\mathbb{P}$ denotes probability, where there is no conditional mean predictability in returns $\textbf{: } \mu_{t+1|t} = \mu$, where $R_{t+1}\sim D(\mu,\sigma_{t+1|t})$ for a generic distribution $D$ dependent only on $\mu$ and $\sigma$ (such as the Normal distribution), and where $F$ is the distribution function of the “standardized” return $\frac{R_{t+1} - \mu}{\sigma_{t+1|t}}$. Then $R_{t+1}$'s sign is predictable even if its conditional mean isn't, provided that $\mu \neq 0$. If $F$ is asymmetric - as per SFFR(i) - then forecast constructions are still possible if $\mu = 0$.
$R_{t+1}$ sign forecasts are formulated as $$\begin{equation} \label{eq:PiCandD} \hat{\pi}_{C\&D_M, t+1} = \hat{\mathbb{P}}_{C\&D_M}(R_{t+1|t}>0) = 1 - \frac{1}{t} \sum_{k=1}^{t} I \bigg( \frac{R_{k} - {\hat{\mu}_{k|k-1}}}{\hat{\sigma}_{k|k-1}} \leq \frac{-\hat{\mu}_{t+1|t}}{\hat{\sigma}_{t+1|t}} \bigg) \end{equation}$$
where $M$ is the model used to compute recursive one-step-ahead out-of-sample forecasts $\hat{\mu}$ and $\hat{\sigma}$ ranging from GARCH to EGARCH-SVI (note that the model used to estimate $\hat{\mu}_{t+1|t}$ and $\hat{\sigma}_{t+1|t}$ are not specified on the right-hand-side of this equation to keep the notation uncluttered; it is so throughout the article), and where $k \in \mathbb{Z}$.
Remember our Naïve Framework. A Naïve model of sign change is used as a benchmark to construct comparisons. It is formulated as $$ \hat{\pi}_{Naive, t+1} = \hat{\mathbb{P}}_{Naive}(R_{t+1|t}>0) = \frac{1}{t} \sum_{k=1}^{t} I (R_k > 0) \text{ .}$$
# Naive-Model's Indicator function according to the GARCH11 Model:
Naive_I = ifelse(matrix(df[, c("SPX_R")])>0 , 1 , 0)
# Naive Model:
Naive = (1:(length(matrix(df[, c("SPX_R")]))))
for(t in c(1:(length(matrix(df[, c("SPX_R")])))))
{Naive[t] = (1/t) * sum(matrix(df[, c("SPX_R")])[1:t])}
Naive_zoo = zoo(Naive, index(df[, c("SPX_R")]))
A few pre-required variables (and R variables) have to be constructed first:
# mean of return up to time 'k'
R_mu = (1:length(matrix(df[, c("SPX_R")])))
for (i in c(1:length(matrix(df[, c("SPX_R")]))))
{R_mu[i] = mean(matrix(df[, c("SPX_R")])[1:i])}
# standard deviation of return up to time 'k'.
# Note that its 1st value is NA since there is no standard deviation for a
# constant (according to R).
R_sigma = (1:length(matrix(df[, c("SPX_R")])))
for (i in c(1:length(matrix(df[, c("SPX_R")]))))
{R_sigma[i] = sd(matrix(df[, c("SPX_R")])[1:i])}
R_sigma[1] = 0 # Since the variance of a constant is 0.
Saving our data this far to help with de-bugging if needed:
save.image(file = "IDaSRP_work_space_p4.02.RData")
# # To restore your workspace, type this:
# load("IDaSRP_work_space_p4.02.RData")
Computing in series the values $\hat{\pi}_{C\&D_{GARCH(1,1)}, t+1} $
CnDprocess_ser = function(regressand = df[, c("SPX_R")],
GARCH_mu_hat,
GARCH_sigma_hat)
{
# # Create our variables:
# C&D's Indicator function according to the GARCH11 Model:
GARCH_CandD_I = matrix(, nrow = roll, ncol = roll)
GARCH_CandD_I_k = matrix(, nrow = roll, ncol = roll)
GARCH_CandD_I_t = matrix( - (GARCH_mu_hat/GARCH_sigma_hat))
for(t in c(1:roll))
{for (k in c(1:t))
{GARCH_CandD_I_k[k,t] =
ifelse(((matrix(regressand)[252+k] - GARCH_mu_hat[k])/GARCH_sigma_hat[k])
<= GARCH_CandD_I_t[t+1],
1,0)
# Note that we have some missing values due to the model not always managing
# to converge to estimates of sigma and mu; since we need the t+1 model
# estimates to compute its CandD_I_k, we also loose its t'th value for
# each model's missing value.
# We also lose the last value (the roll'th value) for the same reason.
GARCH_CandD_I[k,t] = ifelse((is.na(GARCH_CandD_I_k[k,t])), NA,
(sum(GARCH_CandD_I_k[1:k,t], na.rm = TRUE)))}}
# C&D's model according to the GARCH11 Model:
GARCH_CandD = (1:(roll-1))
for(i in c(1:(roll-1)))
{GARCH_CandD[i] = 1 - ((GARCH_CandD_I[i,i])/
length(na.omit(GARCH_CandD_I[1:i,i])))}
GARCH_CandD_zoo = zoo(GARCH_CandD, index(df[, c("SPX_R")][(253+1):(253+(roll-1))]))
# This zoo object has missing values, so we can remove them:
GARCH_CandD_zoo_no_nas = na.omit(GARCH_CandD_zoo)
## Cleaning datasets
# rm(GARCH_CandD_I_t)
# rm(GARCH_CandD_I_k)
# rm(GARCH_CandD_I)
return(list(GARCH_CandD_I_k,
GARCH_CandD_I,
GARCH_CandD,
GARCH_CandD_zoo,
GARCH_CandD_zoo_no_nas
))
}
GARCH11_CandD_process_ser = CnDprocess_ser(regressand = df[, c("SPX_R")],
GARCH_mu_hat = GARCH11_mu_hat_ser,
GARCH_sigma_hat = GARCH11_sigma_hat_ser)
GARCH11_CandD_I_k_ser = GARCH11_CandD_process_ser[[1]]
GARCH11_CandD_I_ser = GARCH11_CandD_process_ser[[2]]
GARCH11_CandD_ser = GARCH11_CandD_process_ser[[3]]
GARCH11_CandD_zoo_ser = zoo(GARCH11_CandD_ser, index(df[, c("SPX_R")][(253+1):(253+(roll-1))]))
GARCH11_CandD_zoo_no_nas_ser = zoo(na.omit(GARCH11_CandD_zoo_ser))
Saving our data this far to help with de-bugging if needed:
save.image(file = "IDaSRP_work_space_p4.03.RData")
# # To restore your workspace, type this:
# load("IDaSRP_work_space_p4.03.RData")
Computing in parallel the values $\hat{\pi}_{C\&D_{GARCH(1,1)}, t+1} $
CnDprocess_t_par = function(
t,
regressand = df[, c("SPX_R")],
pre_GARCH_mu_hat = GARCH11_mu_hat_par,
pre_GARCH_sigma_hat = GARCH11_sigma_hat_par,
pre_GARCH_CandD_I_k = matrix(, nrow = roll, ncol = 1),
pre_GARCH_CandD_I = matrix(, nrow = roll, ncol = 1))
{
pre_GARCH_CandD_I_t = matrix( - (pre_GARCH_mu_hat/pre_GARCH_sigma_hat))
for (k in c(1:t))
{pre_GARCH_CandD_I_k[k] =
ifelse(((matrix(regressand)[252+k] -
pre_GARCH_mu_hat[k])/
pre_GARCH_sigma_hat[k])
<= pre_GARCH_CandD_I_t[t+1],
1,0)
# Note that we have some missing values due to the model not always managing
# to converge to estimates of sigma and mu; since we need the t+1 model
# estimates to compute its CandD_I_k, we also loose its t'th value for
# each model's missing value.
# We also lose the last value (the roll'th value) for the same reason.
pre_GARCH_CandD_I[k] =
ifelse((is.na(pre_GARCH_CandD_I_k[k])),
NA,
(sum(pre_GARCH_CandD_I_k[1:k],
na.rm = TRUE)))}
return(list(pre_GARCH_CandD_I_k, pre_GARCH_CandD_I))
}
no_of_clusters = detectCores(logical = FALSE)
cl = makeCluster(no_of_clusters)
GARCH11_CnDprocess_par = clusterApply(cl,
x = 1:roll, # t
fun = CnDprocess_t_par,
regressand = df[, c("SPX_R")],
pre_GARCH_mu_hat = GARCH11_mu_hat_par,
pre_GARCH_sigma_hat = GARCH11_sigma_hat_par,
pre_GARCH_CandD_I_k = matrix(, nrow = roll, ncol = 1),
pre_GARCH_CandD_I = matrix(, nrow = roll, ncol = 1))
stopCluster(cl)
# C&D's model according to the GARCH11 Model:
GARCH11_CandD_par = (1:(roll-1))
for(i in c(1:(roll-1)))
{GARCH11_CandD_par[i] = 1 - ((GARCH11_CnDprocess_par[[i]][[2]][i])/
length(na.omit(GARCH11_CnDprocess_par[[i]][[2]])))}
GARCH11_CandD_zoo_par = zoo(GARCH11_CandD_par, index(df[, c("SPX_R")][(253+1):(253+(roll-1))]))
# This zoo object has missing values, so we can remove them:
GARCH11_CandD_zoo_no_nas_par = na.omit(GARCH11_CandD_zoo_par)
Saving our data this far to help with de-bugging if needed:
save.image(file = "IDaSRP_work_space_p4.04.RData")
# # To restore your workspace, type this:
# load("IDaSRP_work_space_p4.04.RData")
It is interesting to see that each 'CnDprocess_t' column is not the same. In addition to having 'NA's at different locations, they were built off recursive modellings of $\hat{\mu_t}$ and $\hat{\sigma_t}$. This is most apparent in the second element of the last list in 'CnDprocess_t', where the summation of previous outputs emphasise this effect:
Computing in series the values $\hat{\pi}_{C\&D_{GARCH(1,1)-SVI}, t+1} $
GARCH11SVI_CandD_process_ser = CnDprocess_ser(regressand = df[, c("SPX_R")],
GARCH_mu_hat = GARCH11SVI_mu_hat_ser,
GARCH_sigma_hat = GARCH11SVI_sigma_hat_ser)
GARCH11SVI_CandD_I_k_ser = GARCH11SVI_CandD_process_ser[[1]]
GARCH11SVI_CandD_I_ser = GARCH11SVI_CandD_process_ser[[2]]
GARCH11SVI_CandD_ser = GARCH11SVI_CandD_process_ser[[3]]
GARCH11SVI_CandD_zoo_ser = zoo(GARCH11SVI_CandD_ser, index(df[, c("SPX_R")][(253+1):(253+(roll-1))]))
GARCH11SVI_CandD_zoo_no_nas_ser = zoo(na.omit(GARCH11SVI_CandD_zoo_ser))
Saving our data this far to help with de-bugging if needed:
save.image(file = "IDaSRP_work_space_p4.05.RData")
# # To restore your workspace, type this:
# load("IDaSRP_work_space_p4.05.RData")
Computing in parallel the values $\hat{\pi}_{C\&D_{GARCH(1,1)-SVI}, t+1} $
no_of_clusters = detectCores(logical = FALSE)
cl = makeCluster(no_of_clusters)
GARCH11SVI_CnDprocess_par = clusterApply(cl,
x = 1:roll, # t
fun = CnDprocess_t_par,
regressand = df[, c("SPX_R")],
pre_GARCH_mu_hat = GARCH11SVI_mu_hat_par,
pre_GARCH_sigma_hat = GARCH11SVI_sigma_hat_par,
pre_GARCH_CandD_I_k = matrix(, nrow = roll, ncol = 1),
pre_GARCH_CandD_I = matrix(, nrow = roll, ncol = 1))
stopCluster(cl)
# C&D's model according to the GARCH11SVI Model:
GARCH11SVI_CandD_par = (1:(roll-1))
for(i in c(1:(roll-1)))
{GARCH11SVI_CandD_par[i] = 1 - ((GARCH11SVI_CnDprocess_par[[i]][[2]][i])/
length(na.omit(GARCH11SVI_CnDprocess_par[[i]][[2]])))}
GARCH11SVI_CandD_zoo_par = zoo(GARCH11SVI_CandD_par, index(df[, c("SPX_R")][(253+1):(253+(roll-1))]))
# This zoo object has missing values, so we can remove them:
GARCH11SVI_CandD_zoo_no_nas_par = na.omit(GARCH11SVI_CandD_zoo_par)
Saving our data this far to help with de-bugging if needed:
save.image(file = "IDaSRP_work_space_p4.06.RData")
# # To restore your workspace, type this:
# load("IDaSRP_work_space_p4.06.RData")
Computing in series the values $\hat{\pi}_{C\&D_{GJRGARCH(1,1)}, t+1} $
GJRGARCH11_CandD_process_ser = CnDprocess_ser(regressand = df[, c("SPX_R")],
GARCH_mu_hat = GJRGARCH11_mu_hat_ser,
GARCH_sigma_hat = GJRGARCH11_sigma_hat_ser)
GJRGARCH11_CandD_I_k_ser = GJRGARCH11_CandD_process_ser[[1]]
GJRGARCH11_CandD_I_ser = GJRGARCH11_CandD_process_ser[[2]]
GJRGARCH11_CandD_ser = GJRGARCH11_CandD_process_ser[[3]]
GJRGARCH11_CandD_zoo_ser = zoo(GJRGARCH11_CandD_ser, index(df[, c("SPX_R")][(253+1):(253+(roll-1))]))
GJRGARCH11_CandD_zoo_no_nas_ser = zoo(na.omit(GJRGARCH11_CandD_zoo_ser))
Saving our data this far to help with de-bugging if needed:
save.image(file = "IDaSRP_work_space_p4.07.RData")
# # To restore your workspace, type this:
# load("IDaSRP_work_space_p4.07.RData")
Computing in parallel the values $\hat{\pi}_{C\&D_{GJRGARCH(1,1)}, t+1} $
no_of_clusters = detectCores(logical = FALSE)
cl = makeCluster(no_of_clusters)
GJRGARCH11_CnDprocess_par = clusterApply(cl,
x = 1:roll, # t
fun = CnDprocess_t_par,
regressand = df[, c("SPX_R")],
pre_GARCH_mu_hat = GJRGARCH11_mu_hat_par,
pre_GARCH_sigma_hat = GJRGARCH11_sigma_hat_par,
pre_GARCH_CandD_I_k = matrix(, nrow = roll, ncol = 1),
pre_GARCH_CandD_I = matrix(, nrow = roll, ncol = 1))
stopCluster(cl)
# C&D's model according to the GJRGARCH11 Model:
GJRGARCH11_CandD_par = (1:(roll-1))
for(i in c(1:(roll-1)))
{GJRGARCH11_CandD_par[i] = 1 - ((GJRGARCH11_CnDprocess_par[[i]][[2]][i])/
length(na.omit(GJRGARCH11_CnDprocess_par[[i]][[2]])))}
GJRGARCH11_CandD_zoo_par = zoo(GJRGARCH11_CandD_par, index(df[, c("SPX_R")][(253+1):(253+(roll-1))]))
# This zoo object has missing values, so we can remove them:
GJRGARCH11_CandD_zoo_no_nas_par = na.omit(GJRGARCH11_CandD_zoo_par)
Saving our data this far to help with de-bugging if needed:
save.image(file = "IDaSRP_work_space_p4.08.RData")
# # To restore your workspace, type this:
# load("IDaSRP_work_space_p4.08.RData")
Computing in series the values $\hat{\pi}_{C\&D_{GJRGARCH(1,1)-SVI}, t+1} $
GJRGARCH11SVI_CandD_process_ser = CnDprocess_ser(regressand = df[, c("SPX_R")],
GARCH_mu_hat = GJRGARCH11SVI_mu_hat_ser,
GARCH_sigma_hat = GJRGARCH11SVI_sigma_hat_ser)
GJRGARCH11SVI_CandD_I_k_ser = GJRGARCH11SVI_CandD_process_ser[[1]]
GJRGARCH11SVI_CandD_I_ser = GJRGARCH11SVI_CandD_process_ser[[2]]
GJRGARCH11SVI_CandD_ser = GJRGARCH11SVI_CandD_process_ser[[3]]
GJRGARCH11SVI_CandD_zoo_ser = zoo(GJRGARCH11SVI_CandD_ser, index(df[, c("SPX_R")][(253+1):(253+(roll-1))]))
GJRGARCH11SVI_CandD_zoo_no_nas_ser = zoo(na.omit(GJRGARCH11SVI_CandD_zoo_ser))
Saving our data this far to help with de-bugging if needed:
save.image(file = "IDaSRP_work_space_p4.09.RData")
# # To restore your workspace, type this:
# load("IDaSRP_work_space_p4.09.RData")
Computing in parallel the values $\hat{\pi}_{C\&D_{GJRGARCH(1,1)-SVI}, t+1} $
no_of_clusters = detectCores(logical = FALSE)
cl = makeCluster(no_of_clusters)
GJRGARCH11SVI_CnDprocess_par = clusterApply(cl,
x = 1:roll, # t
fun = CnDprocess_t_par,
regressand = df[, c("SPX_R")],
pre_GARCH_mu_hat = GJRGARCH11SVI_mu_hat_par,
pre_GARCH_sigma_hat = GJRGARCH11SVI_sigma_hat_par,
pre_GARCH_CandD_I_k = matrix(, nrow = roll, ncol = 1),
pre_GARCH_CandD_I = matrix(, nrow = roll, ncol = 1))
stopCluster(cl)
# C&D's model according to the GJRGARCH11SVI Model:
GJRGARCH11SVI_CandD_par = (1:(roll-1))
for(i in c(1:(roll-1)))
{GJRGARCH11SVI_CandD_par[i] = 1 - ((GJRGARCH11SVI_CnDprocess_par[[i]][[2]][i])/
length(na.omit(GJRGARCH11SVI_CnDprocess_par[[i]][[2]])))}
GJRGARCH11SVI_CandD_zoo_par = zoo(GJRGARCH11SVI_CandD_par, index(df[, c("SPX_R")][(253+1):(253+(roll-1))]))
# This zoo object has missing values, so we can remove them:
GJRGARCH11SVI_CandD_zoo_no_nas_par = na.omit(GJRGARCH11SVI_CandD_zoo_par)
Saving our data this far to help with de-bugging if needed:
save.image(file = "IDaSRP_work_space_p4.10.RData")
# # To restore your workspace, type this:
# load("IDaSRP_work_space_p4.10.RData")
Computing in series the values $\hat{\pi}_{C\&D_{EGARCH(1,1)}, t+1} $
EGARCH11_CandD_process_ser = CnDprocess_ser(regressand = df[, c("SPX_R")],
GARCH_mu_hat = EGARCH11_mu_hat_ser,
GARCH_sigma_hat = EGARCH11_sigma_hat_ser)
EGARCH11_CandD_I_k_ser = EGARCH11_CandD_process_ser[[1]]
EGARCH11_CandD_I_ser = EGARCH11_CandD_process_ser[[2]]
EGARCH11_CandD_ser = EGARCH11_CandD_process_ser[[3]]
EGARCH11_CandD_zoo_ser = zoo(EGARCH11_CandD_ser, index(df[, c("SPX_R")][(253+1):(253+(roll-1))]))
EGARCH11_CandD_zoo_no_nas_ser = zoo(na.omit(EGARCH11_CandD_zoo_ser))
Saving our data this far to help with de-bugging if needed:
save.image(file = "IDaSRP_work_space_p4.11.RData")
# # To restore your workspace, type this:
# load("IDaSRP_work_space_p4.11.RData")
Computing in parallel the values $\hat{\pi}_{C\&D_{EGARCH(1,1)}, t+1} $
no_of_clusters = detectCores(logical = FALSE)
cl = makeCluster(no_of_clusters)
EGARCH11_CnDprocess_par = clusterApply(cl,
x = 1:roll, # t
fun = CnDprocess_t_par,
regressand = df[, c("SPX_R")],
pre_GARCH_mu_hat = EGARCH11_mu_hat_par,
pre_GARCH_sigma_hat = EGARCH11_sigma_hat_par,
pre_GARCH_CandD_I_k = matrix(, nrow = roll, ncol = 1),
pre_GARCH_CandD_I = matrix(, nrow = roll, ncol = 1))
stopCluster(cl)
# C&D's model according to the EGARCH11 Model:
EGARCH11_CandD_par = (1:(roll-1))
for(i in c(1:(roll-1)))
{EGARCH11_CandD_par[i] = 1 - ((EGARCH11_CnDprocess_par[[i]][[2]][i])/
length(na.omit(EGARCH11_CnDprocess_par[[i]][[2]])))}
EGARCH11_CandD_zoo_par = zoo(EGARCH11_CandD_par, index(df[, c("SPX_R")][(253+1):(253+(roll-1))]))
# This zoo object has missing values, so we can remove them:
EGARCH11_CandD_zoo_no_nas_par = na.omit(EGARCH11_CandD_zoo_par)
Saving our data this far to help with de-bugging if needed:
save.image(file = "IDaSRP_work_space_p4.12.RData")
# # To restore your workspace, type this:
# load("IDaSRP_work_space_p4.12.RData")
Computing in series the values $\hat{\pi}_{C\&D_{EGARCH(1,1)-SVI}, t+1} $
EGARCH11SVI_CandD_process_ser = CnDprocess_ser(regressand = df[, c("SPX_R")],
GARCH_mu_hat = EGARCH11SVI_mu_hat_ser,
GARCH_sigma_hat = EGARCH11SVI_sigma_hat_ser)
EGARCH11SVI_CandD_I_k_ser = EGARCH11SVI_CandD_process_ser[[1]]
EGARCH11SVI_CandD_I_ser = EGARCH11SVI_CandD_process_ser[[2]]
EGARCH11SVI_CandD_ser = EGARCH11SVI_CandD_process_ser[[3]]
EGARCH11SVI_CandD_zoo_ser = zoo(EGARCH11SVI_CandD_ser, index(df[, c("SPX_R")][(253+1):(253+(roll-1))]))
EGARCH11SVI_CandD_zoo_no_nas_ser = zoo(na.omit(EGARCH11SVI_CandD_zoo_ser))
Saving our data this far to help with de-bugging if needed:
save.image(file = "IDaSRP_work_space_p4.13.RData")
# # To restore your workspace, type this:
# load("IDaSRP_work_space_p4.13.RData")
Computing in parallel the values $\hat{\pi}_{C\&D_{EGARCH(1,1)-SVI}, t+1} $
no_of_clusters = detectCores(logical = FALSE)
cl = makeCluster(no_of_clusters)
EGARCH11SVI_CnDprocess_par = clusterApply(cl,
x = 1:roll, # t
fun = CnDprocess_t_par,
regressand = df[, c("SPX_R")],
pre_GARCH_mu_hat = EGARCH11SVI_mu_hat_par,
pre_GARCH_sigma_hat = EGARCH11SVI_sigma_hat_par,
pre_GARCH_CandD_I_k = matrix(, nrow = roll, ncol = 1),
pre_GARCH_CandD_I = matrix(, nrow = roll, ncol = 1))
stopCluster(cl)
# C&D's model according to the EGARCH11SVI Model:
EGARCH11SVI_CandD_par = (1:(roll-1))
for(i in c(1:(roll-1)))
{EGARCH11SVI_CandD_par[i] = 1 - ((EGARCH11SVI_CnDprocess_par[[i]][[2]][i])/
length(na.omit(EGARCH11SVI_CnDprocess_par[[i]][[2]])))}
EGARCH11SVI_CandD_zoo_par = zoo(EGARCH11SVI_CandD_par, index(df[, c("SPX_R")][(253+1):(253+(roll-1))]))
# This zoo object has missing values, so we can remove them:
EGARCH11SVI_CandD_zoo_no_nas_par = na.omit(EGARCH11SVI_CandD_zoo_par)
Saving our data this far to help with de-bugging if needed:
save.image(file = "IDaSRP_work_space_p4.14.1.RData")
# # To restore your workspace, type this:
# load("IDaSRP_work_space_p4.14.1.RData")
Forecast error mean, sd (standard deviation), Brier Scores and Diebold&Mariano statistics based on results from the parallel methods used above:
Mean of the probabilistic forecast errors derived from each model letting $ \pi_{Observed,t} = I(R_{SPX,t} > 0) = \begin{Bmatrix} 1 & \text{if } R_{SPX,t} > 0 \\ 0 & \text{otherwise} \end{Bmatrix} $ and $ \mathbf{Observed\_pi} = $ $\left[ \begin{matrix} \pi_{Observed,1} \\ \pi_{Observed,2} \\ \vdots\\ \pi_{Observed,T_{Observed\_pi}} \end{matrix} \right] $ = $\left[ \begin{matrix} I(R_{SPX,1} > 0) \\ I(R_{SPX,2} > 0) \\ \vdots\\ I(R_{SPX,T_{Observed\_pi}} > 0) \end{matrix} \right] $
Observed_pi = ifelse(matrix(df[, c("SPX_R")])>0,1,0)
Observed_pi_zoo = zoo(Observed_pi, index(df[, c("SPX_R")]))
To asses the accuracy of volatility and positive sign probability forecasts, this sub-section specifies error functions.
Variance forecasts from the models GARCH to EGARCH-SVI are simply compared against realised volatilities to compute variance forecast errors as $$ \hat{\sigma}_{t|t-1} - RV_t \text{ .}$$
Sign forecast errors are computed as
$$\begin{equation} \label{eq:SignForecastErrors} \hat{\pi}_{j, t+1} - \mathbb{P}(R_{t+1} > 0 | \Omega_{t \textbf{+1}}) = \hat{\pi}_{j, t+1} - \pi_{Observed,t+1} \end{equation}$$where $j = (C\&D_M, Naive)$, and where $M$ can be replaced by a model name ranging from GARCH to EGARCH-SVI (Note the bold `$+1$' in $\Omega_{t \textbf{+1}}$ here to emphasise how $\mathbb{P}(R_{t+1}>0|\Omega_{t \textbf{+1}})=I(R_{t+1}>0)$.)
Naive_pi_error = Naive_zoo - Observed_pi_zoo
GARCH11_pi_error = GARCH11_CandD_zoo_no_nas_par - Observed_pi_zoo
GARCH11SVI_pi_error = GARCH11SVI_CandD_zoo_no_nas_par - Observed_pi_zoo
GJRGARCH11_pi_error = GJRGARCH11_CandD_zoo_no_nas_par - Observed_pi_zoo
GJRGARCH11SVI_pi_error = GJRGARCH11SVI_CandD_zoo_no_nas_par - Observed_pi_zoo
EGARCH11_pi_error = EGARCH11_CandD_zoo_no_nas_par - Observed_pi_zoo
EGARCH11SVI_pi_error = EGARCH11SVI_CandD_zoo_no_nas_par - Observed_pi_zoo
rbind(Statistics_Table("Naive_pi_error", na.omit(Naive_pi_error)),
Statistics_Table("GARCH11_pi_error", na.omit(GARCH11_pi_error)),
Statistics_Table("GARCH11SVI_pi_error", na.omit(GARCH11SVI_pi_error)),
Statistics_Table("GJRGARCH11_pi_error", na.omit(GJRGARCH11_pi_error)),
Statistics_Table("GJRGARCH11SVI_pi_error", na.omit(GJRGARCH11SVI_pi_error)),
Statistics_Table("EGARCH11_pi_error", na.omit(EGARCH11_pi_error)),
Statistics_Table("EGARCH11SVI_pi_error", na.omit(EGARCH11SVI_pi_error)))
Only the 'Mean_of_Absolute_Values' and 'Standard_Deviation' really are of interest here. We are looking for the lowest absolute mean and standard deviation. The SVI-implementing model's results are seen to perform better than their SVI-free counterparts (bar the GACH model's). Apart from that, little at this time can be concluded - unfortunately.
Brier scores of the probabilistic forecast errors derived from each model
Brier_Score_Table = function(name, data)
{as.data.table(list(Data_Set = name, Brier_Score = (1/length(data))*sum(data^2)))}
rbind(Brier_Score_Table("Naive_pi_error", na.omit(Naive_pi_error)),
Brier_Score_Table("GARCH11_pi_error", na.omit(GARCH11_pi_error)),
Brier_Score_Table("GARCH11SVI_pi_error", na.omit(GARCH11SVI_pi_error)),
Brier_Score_Table("GJRGARCH11_pi_error", na.omit(GJRGARCH11_pi_error)),
Brier_Score_Table("GJRGARCH11SVI_pi_error", na.omit(GJRGARCH11SVI_pi_error)),
Brier_Score_Table("EGARCH11_pi_error", na.omit(EGARCH11_pi_error)),
Brier_Score_Table("EGARCH11SVI_pi_error", na.omit(EGARCH11SVI_pi_error)))
The higher the Brier score, the worst the model did. Expectedly, the naive model did worst. Interestingly, SVI-implementing models always outperformed SVI-free ones (apart from the EGARCH case).
Diebold & Mariano statistics
Here the alternative hypothesis is that method 2 is more accurate than method 1; remember that a small p-value indicates strong evidence against the Null Hypothesis.
GARCH11_pi_error_dm = GARCH11_pi_error - (GARCH11SVI_pi_error*0)
GARCH11SVI_pi_error_dm = GARCH11SVI_pi_error - (GARCH11_pi_error*0)
GARCH11_dm_test = dm.test(matrix(GARCH11_pi_error_dm), matrix(GARCH11SVI_pi_error_dm), alternative = "greater")
show(GARCH11_dm_test)
Diebold-Mariano Test
data: matrix(GARCH11_pi_error_dm)matrix(GARCH11SVI_pi_error_dm)
DM = 1.8532, Forecast horizon = 1, Loss function power = 2, p-value = 0.03197
alternative hypothesis: greater
GJRGARCH11_pi_error_dm = GJRGARCH11_pi_error - (GJRGARCH11SVI_pi_error*0)
GJRGARCH11SVI_pi_error_dm = GJRGARCH11SVI_pi_error - (GJRGARCH11_pi_error*0)
GJRGARCH11_dm_test = dm.test(matrix(GJRGARCH11_pi_error_dm), matrix(GJRGARCH11SVI_pi_error_dm), alternative = c("greater"))
show(GJRGARCH11_dm_test)
Diebold-Mariano Test
data: matrix(GJRGARCH11_pi_error_dm)matrix(GJRGARCH11SVI_pi_error_dm)
DM = 0.7234, Forecast horizon = 1, Loss function power = 2, p-value = 0.2347
alternative hypothesis: greater
EGARCH11_pi_error_dm = EGARCH11_pi_error - (EGARCH11SVI_pi_error*0)
EGARCH11SVI_pi_error_dm = EGARCH11SVI_pi_error - (EGARCH11_pi_error*0)
EGARCH11_dm_test = dm.test(matrix(EGARCH11_pi_error_dm), matrix(EGARCH11SVI_pi_error_dm), alternative = c("greater"))
show(EGARCH11_dm_test)
Diebold-Mariano Test
data: matrix(EGARCH11_pi_error_dm)matrix(EGARCH11SVI_pi_error_dm)
DM = -0.042896, Forecast horizon = 1, Loss function power = 2, p-value = 0.5171
alternative hypothesis: greater
The only D-M test with a small enough p-value to allow for a statistically significant conclusion is the GARCH's (at the 5% Confidence Level / 95% Significance Level). It shows a slight preference for the SVI-implementing model.
Saving our data this far to help with de-bugging if needed:
save.image(file = "IDaSRP_work_space_p4.15.RData")
# # To restore your workspace, type this:
# load("IDaSRP_work_space_p4.15.RData")