Breakdown
In this article, I primarily study the frameworks outlined in CPV to investigate the explanatory and forecasting power of information demand on the SPX. $$ \\ $$ Financial applications of my models are studied from the perspective of a single profit-maximising investor, in a manner akin to CPV (Kandel and Stambaugh (1996)). In the interest of time, I only investigate the scenario where the investing agent's risk aversion implies taking a risk as soon as its expected reward is positive. Later we will look into a framework to optimise this risk-aversion level to allow for a truly profit-maximising strategy.
- Part 1: Data Retrieval and Analysis: In this part, we start the R code, construct some rudimentary functions and collect data from Refinitiv, explaining its use as we go. We also go through the concepts of risk-free rates and excess returns, realised variance and realised volatility, and Google's Search Vector Index, before lightly analysing the retrieved data.
- Part 2: Econometric Modelling of Non-Linear Variances: In this section, there is no coding. Instead we go – step by step and including worked-out derivations – through ARMA models, their parameter-estimation process (via the log-likelihood), the concepts of stationarity, Taylor's expansion, the chain rule of calculus, first order conditions, and GARCH models (including the GARCH, GJRGARCH and EGARCH models). Finally, we outline the way in which we construct error values from which to evaluate the performance of our models.
- Part 3: Coding Non-Linear Variance Econometric Models: In this section we encode the mathematical models and concepts of Part 2. We build our code, explaining it step by step, using in-series (computationally slow) and in-parallel (computationally fast) methods. Finally, we present the Diebold-Mariano test results for each pair of model performances, with and without using SVIs.
- Part 4: Sign Forecast Frameworks: Here we outline the Christoffersen and Diebold model to estimate the probability of a positive return, and we compare the predictive performance of each model via rudimentary methods, Brier Scores and Diebold-Mariano statistics.
- Part 5: Financial Significance: Finally, we put ourselves in the shoes of a profit-maximising investor who reinvests – on a daily basis – in either the risk-free asset or the S&P500, choosing the index whenever the model suggests that its excess return will be positive with probability greater than one half. We graph the cumulative returns of such an investor for each model-following strategy and discuss the results with the help of Sharpe Ratios before concluding.
Development Tools & Resources
Refinitiv's DataStream Web Services for R (DatastreamDSWS2R): Access to DataStream data. A DataStream or Refinitiv Workspace identification (ID) will be needed to run the code below.
Part 2: Econometric Modelling of Non-Linear Variances
Part 2 Contents
- ARMA models
- ARMA models' Stationarity
- Generic ARMA(p,q) model Estimation
- GARCH models
- Part 2 References
An AutoRegressive Moving Average model of order $p$ and $q$ (ARMA($p,q$)) represents $y_t$ in terms of $p$ of its own lags and $q$ lags of a white noise process such that:
$$\begin{align} y_t &= c + \phi_1 y_{t-1} + \phi_2 y_{t-2} + ... + \phi_p y_{t-p} + \epsilon_t + \psi_1 \epsilon_{t-1} + \psi_2 \epsilon_{t-2} + ... + \psi_q \epsilon_{t-q} \\ y_t &= c + \left( \sum_{i=1}^{p}\phi_i y_{t-i} \right)+ \epsilon_t + \left( \sum_{j=1}^{q} \psi_j \epsilon_{t-j} \right) \\ y_t - \left( \sum_{i=1}^{p}\phi_i y_{t-i} \right) &= c + \epsilon_t + \sum_{j=1}^{q} \psi_j \epsilon_{t-j} \\ y_t - \phi_1 y_{t-1} - \phi_2 y_{t-2} - ... - \phi_p y_{t-p} &= c + \epsilon_t + \psi_1 \epsilon_{t-1} + \psi_2 \epsilon_{t-2} + ... + \psi_q \epsilon_{t-q} \\ y_t \left( 1 - \phi_1 L^1 - \phi_2 L^2 - ... - \phi_p L^{p} \right) &= c + \epsilon_t \left( 1 + \psi_1 L^1 + \psi_2 L^2 + ... + \psi_q L^{q} \right) \\ \label{eq:ARMApq} \tag{1} y_t \left( 1 - \sum_{i=1}^{p}\phi_i L^{i} \right) &= c + \epsilon_t \left( 1 + \sum_{j=1}^{q} \psi_j L^{j} \right) \end{align}$$where $L$ is the Lag operator such that $x_t L^i = x_{t-i}$ $ \forall $ $x, t$ and $i \in \mathbb{N}$.
It is important to note that, because we use $p$ autoregressive and $q$ moving average components, we use $p$ lags of our dependent variable - $y$ - and $q$ lags of our error term - $\epsilon$. This means that the first forecast produced by an ARMA($p,q$) model must use at least $max(p,q)$ lags, and we thus lose these observations from our observation data-set.
$\textbf{Example:}$ Let $p = 2$ and $q = 3$, then
$$\begin{align} y_t &= c + \left( \sum_{i=1}^{2}\phi_i y_{t-i} \right)+ \epsilon_t + \left( \sum_{j=1}^{3} \psi_j \epsilon_{t-j} \right) \\ y_t - \left( \sum_{i=1}^{2}\phi_i y_{t-i} \right) &= c + \epsilon_t + \sum_{j=1}^{3} \psi_j \epsilon_{t-j} \\ y_t - \phi_1 y_{t-1} - \phi_2 y_{t-2} &= c + \epsilon_t + \psi_1 \epsilon_{t-1} + \psi_2 \epsilon_{t-2} + \psi_3 \epsilon_{t-3} \\ y_t \left( 1 - \phi_1 L^1 - \phi_2 L^2 \right) &= c + \epsilon_t \left( 1 + \psi_1 L^1 + \psi_2 L^2 + \psi_3 L^{3} \right) \\ y_t \left( 1 - \sum_{i=1}^{2}\phi_i L^{i} \right) &= c + \epsilon_t \left( 1 + \sum_{j=1}^{3} \psi_j L^{j} \right) \end{align}$$
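Although the coding proper is saved for Part 3, a minimal R sketch (with purely hypothetical, stationarity-respecting coefficients) can make this ARMA(2,3) example concrete:

```r
# Simulate the ARMA(2,3) of the example above with base R, then fit an
# ARMA(2,3) back onto the simulated series for comparison.
set.seed(0)
y <- arima.sim(model = list(ar = c(0.5, 0.2), ma = c(0.4, 0.3, 0.2)), n = 500)
arima(y, order = c(2, 0, 3))  # the phi and psi estimates should lie near the simulation values
```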
Now we can see that \eqref{eq:ARMApq} can be rewritten as
$$\begin{align} y_t \left( 1 - \sum_{i=1}^{p}\phi_i L^{i} \right) &= c + \epsilon_t \left( 1 + \sum_{j=1}^{q} \psi_j L^{j} \right) \\ y_t \Phi_p(L) &= c + \epsilon_t \Psi_q(L) \end{align}$$
where the AR (AutoRegressive) component of our model is encapsulated in $\Phi_p(z) \equiv (1 - \sum_{i=1}^{p}\phi_i z^{i})$ and the MA (Moving Average) component in $\Psi_q(z) \equiv (1 + \sum_{j=1}^{q} \psi_j z^{j})$, for any orders $p,q$ and any $z$. From there, we can see that
$$\begin{align} y_t &= \frac{c + \epsilon_t \Psi_q(L)}{\Phi_p(L)} \\ y_t &= \frac{c}{\Phi_p(L)} + \frac{\epsilon_t \Psi_q(L)}{\Phi_p(L)} \end{align}$$
$c$ is just an arbitrary constant, so we can use an arbitrary $c_{_{\Phi}}$ for $\frac{c}{\Phi_p(L)}$ such that:
$$\begin{align} y_t &= c_{_{\Phi}} + \epsilon_t \frac{\Psi_q(L)}{\Phi_p(L)} \\ &= c_{_{\Phi}} + \epsilon_t \Psi_q(L) \frac{1}{\Phi_p(L)} \\ &= c_{_{\Phi}} + \epsilon_t \Psi_q(L) \frac{1}{1 - \phi_1 L^1 - \phi_2 L^2 - ... - \phi_p L^p} \end{align}$$
By Taylor's expansion, $\frac{1}{1 - x} = \sum_{n=0}^{∞}{x^n}$, an equality that only holds when $|x| < 1$; the series $\sum_{n=0}^{∞}{x^n}$ diverges ($\rightarrow ∞$) if $\begin{Bmatrix} |x| \geq 1 & \text{if } x \in \mathbb{R} \\ x \text{ lies on or outside the unit circle } & \text{if } x \in \mathbb{C} \end{Bmatrix}$.
$\textbf{Example 1:}$ The last equality only applies thanks to Taylor's expansion: $$\begin{equation} \frac{1}{\Phi_1(L)} = \frac{1}{1 - \phi_1 L^1} = \sum_{n=0}^{∞}{(\phi_1 L)^n} \end{equation}$$
$\textbf{Example 2:}$ The last equality only applies thanks to Taylor's expansion: $$\begin{equation} \frac{1}{\Phi_p(L)} = \frac{1}{1 - \phi_1 L^1 - \phi_2 L^2 - ... - \phi_p L^p} = \frac{1}{1 - (\phi_1 L^1 + \phi_2 L^2 + ... + \phi_p L^p)} = \sum_{n=0}^{∞}{(\phi_1 L^1 + \phi_2 L^2 + ... + \phi_p L^p)^n} \end{equation}$$ similarly to how $$\begin{equation} \frac{1}{\Phi_p(L)} = \frac{1}{1 - \sum_{i=1}^{p}\phi_i L^{i}} = \sum_{n=0}^{∞}{\left(\sum_{i=1}^{p}\phi_i L^{i} \right)^n} \end{equation}$$
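As a quick numerical sanity check of this expansion (the value $x = 0.5$ is arbitrary but inside the unit circle):

```r
# Geometric-series check: for |x| < 1, the truncated sum of x^n approaches 1 / (1 - x)
x <- 0.5
sum(x^(0:100))  # ~ 2
1 / (1 - x)     # exactly 2
```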
Thus, whenever $\sum_{i=1}^{p}\phi_i L^{i}$ lies on or outside the unit circle, we end up with $\frac{1}{\Phi_p(L)} \rightarrow ∞$; i.e.:
$ \\ $
An AR process is called **stationary** if the roots of its characteristic equation $\Phi_p(z) = 0$ (denoted with a star: $z^*$) lie outside the unit circle.
In practice, an ARMA($p$, $q$) model (${y_t \Phi_p(L) = c + \epsilon_t \Psi_q(L)}$) is stationary if its AR part ($\Phi_p(L)$) is stationary, which itself only occurs if its roots lie outside the unit circle (i.e.: ${|z^*| > 1}$). More explicitly, for $\Phi_p(z)$:
$$\begin{align} \Phi_p(z) &= 0 \\ 1 - (\phi_1 z^1 + \phi_2 z^2 + ... + \phi_p z^p) &= 0 \\ \label{eq:roots}\phi_1 z^1 + \phi_2 z^2 + ... + \phi_p z^p &= 1 \tag{2} \end{align}$$The $z$ values abiding by \eqref{eq:roots} are its roots, $z^*$, and there is often more than one. As long as their moduli $|z^*|$ are all greater than one, the process is stationary.
$\textbf{Example:}$ For AR(1) processes, (i.e.: when $p = 1$) $$\begin{align} \Phi_1(z) &= 0 \\ 1 - \phi_1 z^1 &= 0 \\ \phi_1 z &= 1 \\ z &= \frac{1}{\phi_1} \\ \end{align}$$
Then the modulus is simply the absolute value; and so ${|z^*| > 1}$ if ${|\phi_1| < 1}$. By extension: when $\sum_{i=1}^{p}\phi_i L^{i}$ lies inside the unit circle, $\frac{1}{\Phi_p(L)}$ is finite (or, with notational abuse: $\frac{1}{\Phi_p(L)} < ∞$).
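This root check is easy to automate. A minimal R sketch (the AR(2) coefficients below are purely hypothetical) using base R's `polyroot`:

```r
# Stationarity check: find the roots of Phi_p(z) = 1 - phi_1 z - ... - phi_p z^p
phi   <- c(0.5, 0.3)           # hypothetical AR(2) coefficients
roots <- polyroot(c(1, -phi))  # solves 1 - 0.5 z - 0.3 z^2 = 0
Mod(roots)                     # the moduli |z*|
all(Mod(roots) > 1)            # TRUE => every root lies outside the unit circle => stationary
```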
In this study, we will use the Maximum Likelihood method to estimate our unknown parameters.
The likelihood function of independently and identically distributed $y$ variables is defined as $L_T(\Theta) \equiv \prod_{t=1}^T{f(y_t|y_{1}, y_{2}, ... , y_{t-1}; \Theta)}.$ Its logarithm is often normalised (without loss of information or generalisation) by $T$, so that we work with $\frac{1}{T} \sum_{t=1}^T{ln \left[ f(y_t|y_{1}, y_{2}, ... , y_{t-1}; \Theta) \right]}.$
Looking into the likelihood function with normally, identically and independently distributed variables and with known parameters $\Theta = $[${\mu, \sigma^2}$], we start by assuming independently and normally distributed variables {$y_t$}$^T_1$, i.e.: assuming that ${y_t \stackrel{iid}{\sim} N \left( \mu, \sigma^2 \right)}$.
Then, we define the log-likelihood function as ${ln[ L_T(\Theta) ]}$ such that: $$\begin{align} ln[ L_T(\Theta) ] \equiv ln \left[ \prod_{t=1}^T{f(y_t;\Theta)} \right] &= ln \left( \prod_{t=1}^T{\frac{1}{\sqrt[2]{2 \pi \sigma^2}} exp \left[ -\frac{(y_t - \mu)^2}{2 \sigma^2} \right]} \right) \\ &= \sum_{t=1}^T{ln [ f(y_t;\Theta) ] } \\ \end{align}$$
Without loss of generality, and to take a per-observation (average) approach - normalising by $T$ - we can write
$$\begin{align} ln[ L_T(\Theta) ] &= \frac{1}{T} \sum_{t=1}^T{ln [ f(y_t;\Theta) ] } \\ &= \frac{1}{T} \sum_{t=1}^T{ln \left( \frac{1}{\sqrt[2]{2 \pi \sigma^2}} exp \left[ -\frac{(y_t - \mu)^2}{2 \sigma^2} \right] \right) } \\ &= \frac{1}{T} \sum_{t=1}^T{ln \left( \left[2 \pi \sigma^2 \right]^{ - \frac{1}{2}} exp \left[ -\frac{(y_t - \mu)^2}{2 \sigma^2} \right] \right) } \\ &= \frac{1}{T} \sum_{t=1}^T{ \left[ - \frac{1}{2} ln(2 \pi) - \frac{1}{2} ln \left( \sigma^2 \right) - \frac{(y_t - \mu)^2}{2 \sigma^2} \right]} \\ &= - \frac{1}{2} ln(2 \pi) - \frac{1}{2} ln \left( \sigma^2 \right) - \frac{1}{2 \sigma^2 T} \sum_{t=1}^T{ (y_t - \mu)^2} \\ \end{align}$$We may have situations where we do not know all the parameters in $\Theta$ (which, as a reminder, previously was [${\mu, \sigma^2}$]). Then, we may estimate it as:
$ \begin{equation} { \hat{\Theta} = \stackrel{\Theta}{arg max} ln[L_T(\Theta)]}. \end{equation} $
There, the notation $\stackrel{x}{arg max} f(x)$ denotes 'the value of $x$ maximising the function $f(x)$', i.e.: $\hat{\Theta} = \stackrel{\Theta}{arg max}[f(\Theta)]$ denotes 'the value of $\Theta$ maximising the function $f(\Theta)$'.
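Although this part stays code-free in spirit, a minimal R sketch helps fix ideas; the sample below is simulated and its true parameters ($\mu = 1.5$, $\sigma^2 = 4$) are purely hypothetical:

```r
# Recover Theta = [mu, sigma^2] of i.i.d. Normal data by numerically maximising
# the normalised log-likelihood derived above, using base R's optim().
set.seed(1)
y <- rnorm(n = 2000, mean = 1.5, sd = 2)

avg_loglik <- function(theta, y) {
  mu   <- theta[1]
  sig2 <- exp(theta[2])  # exp() keeps sigma^2 strictly positive during the search
  -0.5 * log(2 * pi) - 0.5 * log(sig2) - mean((y - mu)^2) / (2 * sig2)
}

fit <- optim(par = c(0, 0), fn = function(th) -avg_loglik(th, y))  # optim() minimises
c(mu_hat = fit$par[1], sigma2_hat = exp(fit$par[2]))               # should be close to (1.5, 4)
```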
With normally, identically and independently distributed errors and unknown ARMA parameters (i.e.: $\Theta = $[${c, \phi_1, ... , \phi_p, \psi_1, ... , \psi_q, \sigma^2}$]), that distributional assumption translates to: $ \epsilon_t \stackrel{\textbf{iid}}{\sim} N \left( 0, \sigma^2 \right) $, which itself implies that $ \epsilon_{t \vert t-1} \sim N \left( 0, \sigma^2 \right) $.
We may define our (fixed but) unknown parameters as $\Theta = \left[ {c, \phi_1, ... , \phi_p, \psi_1, ... , \psi_q, \sigma^2}\right]$ and estimate them as $\hat{\Theta} = \left[ {\hat{c}, \hat{\phi_1}, ... , \hat{\phi_p}, \hat{\psi_1}, ... , \hat{\psi_q}, \hat{\sigma^2}} \right]$:
$$\begin{array}{ll} \hat{\Theta} & = \stackrel{\Theta}{arg max} ln[L_T(\Theta)] \\ & = \stackrel{\Theta}{arg max} \left[ - \frac{ln(2 \pi)}{2} - \frac{ln \left( \sigma^2 \right)}{2} - \frac{1}{2 \sigma^2 T} \sum_{t=1}^T{ (y_t - \mu)^2} \right] & \text{since } ln[L_T(\Theta)] = \left[ - \frac{ln(2 \pi)}{2} - \frac{ln \left( \sigma^2 \right)}{2} - \frac{1}{2 \sigma^2 T} \sum_{t=1}^T{ (y_t - \mu)^2} \right] \\ & = \stackrel{\Theta}{arg max} \left[ - \frac{ln(2 \pi)}{2} - \frac{ln \left( \sigma^2 \right)}{2} - \frac{1}{2 \sigma^2 T} \sum_{t=1}^T{ (y_t - \mathbb{E} \left( y_t \vert y_{t-1} \right) )^2} \right] & \text{since $\mu = \mathbb{E} \left( y_t \vert y_{t-1} \right)$, where $ \vert $ is the standard notation for 'knowing'; here we only look at the conditional Likelihood} \\ & = \stackrel{\Theta}{arg max} \left[ - \frac{ ln(2 \pi)}{2} - \frac{ln \left( \sigma^2 \right)}{2} - \frac{1}{2 \sigma^2 T} \sum_{t=1}^T{ \left( y_t - \mathbb{E} \left[ c + \left( \sum_{i=1}^{p}\phi_i y_{t-i} \right) + \epsilon_t + \left( \sum_{j=1}^{q} \psi_j \epsilon_{t-j} \right) \vert y_{t-1} \right] \right)^2 } \right] & \text{since } y_t = c + \left( \sum_{i=1}^{p}\phi_i y_{t-i} \right)+ \epsilon_t + \left( \sum_{j=1}^{q} \psi_j \epsilon_{t-j} \right) \\ & = \stackrel{\Theta}{arg max} \left[ - \frac{ ln(2 \pi)}{2} - \frac{ln \left( \sigma^2 \right)}{2} - \frac{1}{2 \sigma^2 T} \sum_{t=1}^T{ \left( y_t - \left[ \mathbb{E}(c \vert y_{t-1} ) + \left( \sum_{i=1}^{p} \mathbb{E}( \phi_i y_{t-i} \vert y_{t-1}) \right) + \mathbb{E}(\epsilon_t \vert y_{t-1} ) + \left( \sum_{j=1}^{q} \mathbb{E} (\psi_j \epsilon_{t-j} \vert y_{t-1} ) \right) \right] \right)^2 } \right] & \text{since the Expectation of sums is the sum of Expectations} \\ & = \stackrel{\Theta}{arg max} \left[ - \frac{ ln(2 \pi)}{2} - \frac{ln \left( \sigma^2 \right)}{2} - \frac{1}{2 \sigma^2 T} \sum_{t=1}^T{ \left( y_t - c - \left( \sum_{i=1}^{p}\phi_i y_{t-i} \right) - \mathbb{E}(\epsilon_t \vert y_{t-1} ) - \left( \sum_{j=1}^{q} \psi_j \epsilon_{t-j} \right) \right)^2 } \right] & \text{since only } \epsilon_t \text{ is random in } y_t \text{ (knowing } y_{t-1}\text{)} \\ & = \stackrel{\Theta}{arg max} \left[ - \frac{ ln(2 \pi)}{2} - \frac{ln \left( \sigma^2 \right)}{2} - \frac{1}{2 \sigma^2 T} \sum_{t=1}^T{ \left( y_t - c - \left( \sum_{i=1}^{p}\phi_i y_{t-i} \right) - 0 - \left( \sum_{j=1}^{q} \psi_j \epsilon_{t-j} \right) \right)^2 } \right] & \text{since } \mathbb{E}(\epsilon_t \vert y_{t-1} ) = \mathbb{E}(\epsilon_{t \vert t-1} ) = 0 \text{, since } \epsilon_t \sim N \left( 0, \sigma^2 \right) \\ & = \stackrel{\Theta}{arg max} \left[ - \frac{ ln(2 \pi)}{2} - \frac{ln \left( \sigma^2 \right)}{2} - \frac{1}{2 \sigma^2 T} \sum_{t=1}^T{ \left( c + \left( \sum_{i=1}^{p}\phi_i y_{t-i} \right)+ \epsilon_t + \left( \sum_{j=1}^{q} \psi_j \epsilon_{t-j} \right) - c - \left( \sum_{i=1}^{p}\phi_i y_{t-i} \right) - \left( \sum_{j=1}^{q} \psi_j \epsilon_{t-j} \right) \right)^2 } \right] & \text{since } y_t = c + \left( \sum_{i=1}^{p}\phi_i y_{t-i} \right)+ \epsilon_t + \left( \sum_{j=1}^{q} \psi_j \epsilon_{t-j} \right) \\ & = \stackrel{\Theta}{arg max} \left[ - \frac{ ln(2 \pi)}{2} - \frac{ln \left( \sigma^2 \right)}{2} - \frac{1}{2 \sigma^2 T} \sum_{t=1}^T{\epsilon_t^2 } \right] & \text{rearranging} \end{array}$$
Thus:
$$ \begin{array}{ll} \hat{\Theta} &= \stackrel{\Theta}{arg max} \begin{Bmatrix} - \frac{ ln(2 \pi)}{2} - \frac{ln \left( \sigma^2 \right)}{2} - \frac{1}{2 \sigma^2 T} \sum_{t=1}^T{ \left( {\left[ y_t - c - \left( \sum_{i=1}^{p}\phi_i y_{t-i} \right) - \left( \sum_{j=1}^{q} \psi_j \epsilon_{t-j} \right) \right]}^2 \right)} \end{Bmatrix} \\ \hat{\Theta} &= \stackrel{\Theta}{arg max} \begin{Bmatrix} - \frac{ ln(2 \pi)}{2} - \frac{ln \left( \sigma^2 \right)}{2} - \frac{1}{2 \sigma^2 T} \sum_{t=1}^T{ \left( \left[ y_t - c - \left( \sum_{i=1}^{p}\phi_i y_{t-i} \right) - \left( \sum_{j=1}^{q} \psi_j \epsilon_{t-j} \right) \right] \left[ y_t - c - \left( \sum_{i=1}^{p}\phi_i y_{t-i} \right) - \left( \sum_{j=1}^{q} \psi_j \epsilon_{t-j} \right) \right] \right)} \end{Bmatrix} \\ \\ &= \stackrel{\Theta}{arg max} \begin{Bmatrix} - \frac{ ln(2 \pi)}{2} - \frac{ln \left( \sigma^2 \right)}{2} - \frac{1}{2 \sigma^2 T} \sum_{t=1}^T{ \left[ {y_t}^2 - y_t c - y_t \left( \sum_{i=1}^{p}\phi_i y_{t-i} \right) - y_t \left( \sum_{j=1}^{q} \psi_j \epsilon_{t-j} \right) \\ - c y_t + c^2 + c \left( \sum_{i=1}^{p}\phi_i y_{t-i} \right) + c \left( \sum_{j=1}^{q} \psi_j \epsilon_{t-j} \right) \\ - \left( \sum_{i=1}^{p}\phi_i y_{t-i} \right) {y_t} + \left( \sum_{i=1}^{p}\phi_i y_{t-i} \right) c + {\left( \sum_{i=1}^{p}\phi_i y_{t-i} \right)}^2 + \left( \sum_{i=1}^{p}\phi_i y_{t-i} \right) \left( \sum_{j=1}^{q} \psi_j \epsilon_{t-j} \right) \\ - \left( \sum_{j=1}^{q} \psi_j \epsilon_{t-j} \right) {y_t} + \left( \sum_{j=1}^{q} \psi_j \epsilon_{t-j} \right) c + \left( \sum_{j=1}^{q} \psi_j \epsilon_{t-j} \right) \left( \sum_{i=1}^{p}\phi_i y_{t-i} \right) + {\left( \sum_{j=1}^{q} \psi_j \epsilon_{t-j} \right)}^2 \right] } \end{Bmatrix} \\ \\ &= \stackrel{\Theta}{arg max} \begin{Bmatrix} - \frac{ln(2 \pi)}{2} - \frac{ln \left( \sigma^2 \right)}{2} - \frac{1}{2 \sigma^2 T} \sum_{t=1}^T{ \left[ {y_t}^2 - 2 y_t c - 2 y_t \left( \sum_{i=1}^{p}\phi_i y_{t-i} \right) - 2 y_t \left( \sum_{j=1}^{q} \psi_j \epsilon_{t-j} \right) \\ + c^2 + 2 c \left( \sum_{i=1}^{p}\phi_i y_{t-i} \right) + 2 c \left( \sum_{j=1}^{q} \psi_j \epsilon_{t-j} \right) \\ + { \left( \sum_{i=1}^{p}\phi_i y_{t-i} \right) }^2 + 2 \left( \sum_{i=1}^{p}\phi_i y_{t-i} \right) \left( \sum_{j=1}^{q} \psi_j \epsilon_{t-j} \right) \\ + { \left( \sum_{j=1}^{q} \psi_j \epsilon_{t-j} \right) }^2 \right] } \end{Bmatrix} \end{array} $$Recall the chain rule of calculus: if $f$ and $g$ are both differentiable and $F(x)$ is the composite function defined by $F(x) = f(g(x))$, then $F$ is differentiable and its 1st derivative (i.e.: $F'$) is given by the product:
$$ F'(x) = f'(g(x))g'(x) $$
To maximise $- \frac{ln(2 \pi)}{2} - \frac{ln \left( \sigma^2 \right)}{2} - \frac{1}{2 \sigma^2 T} \sum_{t=1}^T{\epsilon_t^2}$ with respect to $\Theta$, we differentiate with respect to $\Theta$ and equate the derivative to $0$; this is known as First Order Condition (FOC) maximisation. Thus:
First Order Condition with Respect to $c$: $$ \begin{array}{ll} \frac{\delta ln[L_T(\Theta)]}{\delta c} &= - \frac{1}{2 \sigma^2 T} \sum_{t=1}^T{ \left[ - 2 y_t + 2 c + 2 \left( \sum_{i=1}^{p}\phi_i y_{t-i} \right) + 2 \left( \sum_{j=1}^{q} \psi_j \epsilon_{t-j} \right) \right] } \\ &= - \frac{1}{2 \sigma^2 T} \sum_{t=1}^T{ 2 \left[ - y_t + c + \left( \sum_{i=1}^{p}\phi_i y_{t-i} \right) + \left( \sum_{j=1}^{q} \psi_j \epsilon_{t-j} \right) \right] } \\ &= - \frac{2}{2 \sigma^2 T} \sum_{t=1}^T{ \left[ - y_t + c + \left( \sum_{i=1}^{p}\phi_i y_{t-i} \right) + \left( \sum_{j=1}^{q} \psi_j \epsilon_{t-j} \right) \right] } \\ &= - \frac{1}{\sigma^2 T} \sum_{t=1}^T{ \left[ - y_t + c + \left( \sum_{i=1}^{p}\phi_i y_{t-i} \right) + \left( \sum_{j=1}^{q} \psi_j \epsilon_{t-j} \right) \right] } \end{array} $$
By the FOC, $\Theta = \widehat{\Theta}$ when $\frac{\delta ln[L_T(\Theta)]}{\delta c} = 0$, therefore $\frac{\delta ln[L_T(\widehat{\Theta})]}{\delta \widehat{c}} = 0$, and:
$$ \begin{array}{ll} \frac{\delta ln[L_T(\widehat{\Theta})]}{\delta \widehat{c}} = 0 &= - \frac{1}{\widehat{\sigma^2} T} \sum_{t=1}^T{ \left[ - y_t + \widehat{c} + \left( \sum_{i=1}^{p} \widehat{\phi_i} y_{t-i} \right) + \left( \sum_{j=1}^{q} \widehat{\psi_j} \epsilon_{t-j} \right) \right] } & \\ &= - \frac{1}{T} \sum_{t=1}^T{ \left[ - y_t + \widehat{c} + \left( \sum_{i=1}^{p} \widehat{\phi_i} y_{t-i} \right) + \left( \sum_{j=1}^{q} \widehat{\psi_j} \epsilon_{t-j} \right) \right] } & \text{multiplying both sides by } \widehat{\sigma^2} \\ &= - \frac{T \widehat{c}}{T} - \frac{1}{T} \sum_{t=1}^T{ \left[ - y_t + \left( \sum_{i=1}^{p} \widehat{\phi_i} y_{t-i} \right) + \left( \sum_{j=1}^{q} \widehat{\psi_j} \epsilon_{t-j} \right) \right] } & \\ &= - \widehat{c} - \frac{1}{T} \sum_{t=1}^T{ \left[ - y_t + \left( \sum_{i=1}^{p} \widehat{\phi_i} y_{t-i} \right) + \left( \sum_{j=1}^{q} \widehat{\psi_j} \epsilon_{t-j} \right) \right] } \end{array} $$then:
$$ \begin{array}{ll} \widehat{c} &= - \frac{1}{T} \sum_{t=1}^T{ \left[ - y_t + \left( \sum_{i=1}^{p} \widehat{\phi_i} y_{t-i} \right) + \left( \sum_{j=1}^{q} \widehat{\psi_j} \epsilon_{t-j} \right) \right] } \end{array} $$First Order Condition with Respect to $\phi_i$:
The notation here might seem a little confusing, but it is actually ergonomic. We are going to investigate the $ln[L_T(\Theta)]$ function with respect to any one $\phi_i$ for any applicable $i$. The below is thus for any one applicable $i$:
$$ \begin{array}{ll} \frac{\delta ln[L_T(\Theta)]}{\delta \phi_i} &= - \frac{1}{2 \sigma^2 T} \sum_{t=1}^T{ \left[- 2 y_t y_{t-i} + 2 c y_{t-i} + 2 { \left( \sum_{k=1}^{p}\phi_k y_{t-k} \right) } y_{t-i} + 2 y_{t-i} \left( \sum_{j=1}^{q} \psi_j \epsilon_{t-j} \right) \right] } \\ &= - \frac{2}{2 \sigma^2 T} \sum_{t=1}^T{ \left[- y_t y_{t-i} + c y_{t-i} + { \left( \sum_{k=1}^{p}\phi_k y_{t-k} \right) } y_{t-i} + y_{t-i} \left( \sum_{j=1}^{q} \psi_j \epsilon_{t-j} \right) \right] } \\ &= - \frac{1}{\sigma^2 T} \sum_{t=1}^T{ \left[- y_t y_{t-i} + c y_{t-i} + { \left( \sum_{k=1}^{p}\phi_k y_{t-k} \right) } y_{t-i} + y_{t-i} \left( \sum_{j=1}^{q} \psi_j \epsilon_{t-j} \right) \right] } \end{array} $$
By the FOC, $\Theta = \widehat{\Theta}$ when $\frac{\delta ln[L_T(\Theta)]}{\delta \phi_i} = 0$, therefore $\frac{\delta ln[L_T(\widehat{\Theta})]}{\delta \widehat{\phi_i}} = 0$, and (again, for any applicable $i$):
$$ \begin{array}{ll} \frac{\delta ln[L_T(\widehat{\Theta})]}{\delta \widehat{\phi_i}} = 0 &= - \frac{1}{\widehat{\sigma^2} T} \sum_{t=1}^T{ \left[- y_t y_{t-i} + \widehat{c} y_{t-i} + { \left( \sum_{k=1}^{p} \widehat{\phi_k} y_{t-k} \right) } y_{t-i} + y_{t-i} \left( \sum_{j=1}^{q} \widehat{\psi_j} \epsilon_{t-j} \right) \right] } \\ &= \sum_{t=1}^T{ \left[- y_t y_{t-i} + \widehat{c} y_{t-i} + { \left( \sum_{k=1}^{p} \widehat{\phi_k} y_{t-k} \right) } y_{t-i} + y_{t-i} \left( \sum_{j=1}^{q} \widehat{\psi_j} \epsilon_{t-j} \right) \right] } \\ &= \sum_{t=1}^T{ y_{t-i} \left[- y_t + \widehat{c} + { \left( \sum_{k=1}^{p} \widehat{\phi_k} y_{t-k} \right) } + \left( \sum_{j=1}^{q} \widehat{\psi_j} \epsilon_{t-j} \right) \right] } \end{array} $$First Order Condition with Respect to $\psi_j$:
The notation here might seem a little confusing, but it is actually ergonomic. We are going to investigate the $ln[L_T(\Theta)]$ function with respect to any one $\psi_j$ for any applicable $j$. The below is thus for any one applicable $j$:
$$ \begin{array}{ll} \frac{\delta ln[L_T(\Theta)]}{\delta \psi_j} &= - \frac{1}{2 \sigma^2 T} \sum_{t=1}^T{ \left[- 2 y_t \epsilon_{t-j} + 2 c \epsilon_{t-j} + 2 { \left( \sum_{i=1}^{p}\phi_i y_{t-i} \right) } \epsilon_{t-j} + 2 \left( \sum_{k=1}^{q} \psi_k \epsilon_{t-k} \right) \epsilon_{t-j} \right] } \\ &= - \frac{2}{2 \sigma^2 T} \sum_{t=1}^T{ \left[- y_t \epsilon_{t-j} + c \epsilon_{t-j} + { \left( \sum_{i=1}^{p}\phi_i y_{t-i} \right) } \epsilon_{t-j} + \left( \sum_{k=1}^{q} \psi_k \epsilon_{t-k} \right) \epsilon_{t-j} \right] } \\ &= - \frac{1}{\sigma^2 T} \sum_{t=1}^T{ \left[- y_t \epsilon_{t-j} + c \epsilon_{t-j} + { \left( \sum_{i=1}^{p}\phi_i y_{t-i} \right) } \epsilon_{t-j} + \left( \sum_{k=1}^{q} \psi_k \epsilon_{t-k} \right) \epsilon_{t-j} \right] } \end{array} $$
By the FOC, $\Theta = \widehat{\Theta}$ when $\frac{\delta ln[L_T(\Theta)]}{\delta \psi_j} = 0$, therefore $\frac{\delta ln[L_T(\widehat{\Theta})]}{\delta \widehat{\psi_j}} = 0$, and (again, for any applicable $j$):
$$ \begin{array}{ll} \frac{\delta ln[L_T(\widehat{\Theta})]}{\delta \widehat{\psi_j}} = 0 &= - \frac{1}{\widehat{\sigma^2} T} \sum_{t=1}^T{ \left[- y_t \epsilon_{t-j} + \widehat{c} \epsilon_{t-j} + { \left( \sum_{i=1}^{p} \widehat{\phi_i} y_{t-i} \right) } \epsilon_{t-j} + \left( \sum_{k=1}^{q} \widehat{\psi_k} \epsilon_{t-k} \right) \epsilon_{t-j} \right] } \\ &= \sum_{t=1}^T{ \left[- y_t \epsilon_{t-j} + \widehat{c} \epsilon_{t-j} + { \left( \sum_{i=1}^{p} \widehat{\phi_i} y_{t-i} \right) } \epsilon_{t-j} + \left( \sum_{k=1}^{q} \widehat{\psi_k} \epsilon_{t-k} \right) \epsilon_{t-j} \right] } \\ &= \sum_{t=1}^T{ \epsilon_{t-j} \left[- y_t + \widehat{c} + { \left( \sum_{i=1}^{p} \widehat{\phi_i} y_{t-i} \right) } + \left( \sum_{k=1}^{q} \widehat{\psi_k} \epsilon_{t-k} \right) \right] } \end{array} $$First Order Condition with Respect to $\sigma^2$:
Letting
$$ S(\widehat{c}, \widehat{\phi_i}, \widehat{\psi_j}) = \sum_{t=1}^T { \left[ {y_t}^2 - 2 y_t \widehat{c} - 2 y_t \left( \sum_{i=1}^{p} \widehat{\phi_i} y_{t-i} \right) - 2 y_t \left( \sum_{j=1}^{q} \widehat{\psi_j} \epsilon_{t-j} \right) \\ + \widehat{c}^2 + 2 \widehat{c} \left( \sum_{i=1}^{p} \widehat{\phi_i} y_{t-i} \right) + 2 \widehat{c} \left( \sum_{j=1}^{q} \widehat{\psi_j} \epsilon_{t-j} \right) \\ + { \left( \sum_{i=1}^{p} \widehat{\phi_i} y_{t-i} \right) }^2 + 2 \left( \sum_{i=1}^{p} \widehat{\phi_i} y_{t-i} \right) \left( \sum_{j=1}^{q} \widehat{\psi_j} \epsilon_{t-j} \right) \\ + { \left( \sum_{j=1}^{q} \widehat{\psi_j} \epsilon_{t-j} \right) }^2 \right] } $$
then:
$$ \begin{array}{ll} \frac{\delta ln[L_T(\Theta)]}{\delta \sigma^2} &= - \frac{1}{2 \sigma^2} + \frac{1}{2 {\left( \sigma^2 \right)}^2 T} S(\widehat{c}, \widehat{\phi_i}, \widehat{\psi_j}) \end{array} $$
Note that the $+$ in $+ \frac{1}{2 {\left( \sigma^2 \right)}^2 T}$ comes from the differential of $- \frac{1}{2 \sigma^2 T}$, i.e.: of $- \frac{(\sigma^2)^{-1}}{2 T}$, whose derivative with respect to $\sigma^2$ is $+ \frac{(\sigma^2)^{-2}}{2 T}$.
By the FOC, $\Theta = \widehat{\Theta}$ when $\frac{\delta ln[L_T(\Theta)]}{\delta \sigma^2} = 0$, therefore $\frac{\delta ln[L_T(\widehat{\Theta})]}{\delta \widehat{\sigma^2}} = 0$, and:
$ \begin{array}{ll} \frac{\delta ln[L_T(\widehat{\Theta})]}{\delta \widehat{\sigma^2}} = 0 &= - \frac{1}{2 \widehat{\sigma^2}} + \frac{1}{2 {\left( \widehat{\sigma^2} \right)}^2 T} S(\widehat{c}, \widehat{\phi_i}, \widehat{\psi_j}) \end{array} $
Thus:
$ \begin{array}{ll} \frac{1}{2 \widehat{\sigma^2}} &= \frac{1}{2 {\left( \widehat{\sigma^2} \right)}^2 T} S(\widehat{c}, \widehat{\phi_i}, \widehat{\psi_j}) \end{array} $
then:
$ \begin{array}{ll} \frac{1}{\widehat{\sigma^2}} &= \frac{1}{{\left( \widehat{\sigma^2} \right)}^2 T} S(\widehat{c}, \widehat{\phi_i}, \widehat{\psi_j}) \end{array} $
then:
$$ \begin{array}{ll} \widehat{\sigma^2} &= \frac{1}{T} S(\widehat{c}, \widehat{\phi_i}, \widehat{\psi_j}) \\ &= \frac{1}{T} \sum_{t=1}^T { \left[ {y_t}^2 - 2 y_t \widehat{c} - 2 y_t \left( \sum_{i=1}^{p} \widehat{\phi_i} y_{t-i} \right) - 2 y_t \left( \sum_{j=1}^{q} \widehat{\psi_j} \epsilon_{t-j} \right) \\ + \widehat{c}^2 + 2 \widehat{c} \left( \sum_{i=1}^{p} \widehat{\phi_i} y_{t-i} \right) + 2 \widehat{c} \left( \sum_{j=1}^{q} \widehat{\psi_j} \epsilon_{t-j} \right) \\ + { \left( \sum_{i=1}^{p} \widehat{\phi_i} y_{t-i} \right) }^2 + 2 \left( \sum_{i=1}^{p} \widehat{\phi_i} y_{t-i} \right) \left( \sum_{j=1}^{q} \widehat{\psi_j} \epsilon_{t-j} \right) \\ + { \left( \sum_{j=1}^{q} \widehat{\psi_j} \epsilon_{t-j} \right) }^2 \right] } \\ \end{array} $$$ \\ $
First Order Conditions
To summarise, our FOC equations produced the expressions for $\widehat{c}$, $\widehat{\phi_i}$, $\widehat{\psi_j}$ and $\widehat{\sigma^2}$ above.
From there, one can only compute the parameters numerically. It may not look like it, owing to the large number of substitutions that would be required, but these FOC equations offer a closed system solvable iteratively from the data we have. Remember that the data we have consist of $y_t, y_{t-1}, ... , y_1, \epsilon_{t-1}, \epsilon_{t-2}, ..., \epsilon_{1}$, i.e.: everything without a hat.
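To make this 'numerical only' point concrete without pre-empting Part 3, here is a minimal R sketch of conditional Maximum Likelihood for an ARMA(1,1); the simulated series, the starting values and the initialisation of $\epsilon_1$ at its expectation of $0$ are all illustrative assumptions:

```r
# Conditional ML estimation of an ARMA(1,1) by numerically maximising ln[L_T(Theta)].
set.seed(42)
y <- as.numeric(arima.sim(model = list(ar = 0.5, ma = 0.3), n = 1000))

neg_loglik_arma11 <- function(theta, y) {
  c0   <- theta[1]        # intercept c
  phi1 <- theta[2]        # AR(1) coefficient
  psi1 <- theta[3]        # MA(1) coefficient
  sig2 <- exp(theta[4])   # variance, kept positive via exp()
  T_   <- length(y)
  eps  <- numeric(T_)     # epsilon_1 initialised at its expectation, 0
  for (t in 2:T_) {
    eps[t] <- y[t] - c0 - phi1 * y[t - 1] - psi1 * eps[t - 1]
  }
  # Conditional Gaussian log-likelihood (normalised), as derived above
  ll <- -0.5 * log(2 * pi) - 0.5 * log(sig2) - sum(eps[-1]^2) / (2 * sig2 * (T_ - 1))
  -ll                     # optim() minimises, so return the negative
}

fit <- optim(par = c(0, 0.1, 0.1, log(var(y))), fn = neg_loglik_arma11, y = y)
fit$par[4] <- exp(fit$par[4])  # back-transform the variance
round(fit$par, 3)              # phi and psi should be close to arima(y, order = c(1, 0, 1))
```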
Let $\{R_t\}_{t \in \mathbb{Z}}$ be the level change in index excess return such that (i.e.: $\textbf{:}$) $R_t \in \mathbb{R}$ at time $t$. Then one may attempt to encapsulate the movement in $R_t$ via the discrete-time stochastic process $$\begin{equation} \label{eq:AR1} R_t = C + \phi_1 R_{t-1} + \epsilon_t \tag{3} \end{equation}$$ where $C \in \mathbb{R}$ is an unknown constant, $\phi_1$ is the coefficient of the regressor $R_{t-1}$ - the 1$^{st}$ lag of $R_{t}$ - and the error term $\epsilon_{t} \stackrel{iid}{\sim} N(0,\sigma^2)$ has standard deviation $\sigma \in \mathbb{R}$. This is an AR process of order one - AR(1). It is then easy to incorporate SVI - enabling us to study its effect on $R_t$ - with the coefficient $\phi_2 \in \mathbb{R}$ via the AR(1)-SVI model: $$\begin{equation} \label{eq:AR1SVI} R_t = C + \phi_1 R_{t-1} + \phi_2 \Delta SVI_{t-1} + \epsilon_t \text{ .} \tag{4} \end{equation}$$
These two models will be used throughout this article to model the level change in index excess returns.
As aforementioned, the lack of time-dependence in $\sigma$ results in the AR(1) and AR(1)-SVI models of Equations \ref{eq:AR1} and \ref{eq:AR1SVI} failing to represent SFFR (iii). To account for this heteroscedasticity, ARCH models of order $q \in \mathbb{Z}$ where $q > 0$ can be used. These ARCH($q$) models render $\sigma$ into the time-dependent $\sigma_{t}$ $\textbf{:}$ $\epsilon_t \sim N(0,\sigma^2_{t})$. To do so, let $\Omega_t$ denote the information set at time $t$; let the conditional mean and variance of $R_{t+1}$ respectively be $\mu_{t+1|t} = E(R_{t+1} | \Omega_{t})$ and $\sigma_{t+1|t}^2 = Var(R_{t+1} | \Omega_t)$ where $E$ and $Var$ are respectively the expectation and variance operators; let
$$\begin{equation} \epsilon_t = \sigma_{t|t-1} \eta_t \end{equation}$$
where $\eta_t$ is independent from $\Omega_t$ and $\eta_t \stackrel{iid}{\sim} N(0,1)$; and let
$$\begin{equation} \sigma_{t|t-1}^2 = c + \sum_{i=1}^{q}\alpha_i \epsilon_{t-i}^2 \end{equation}$$
where $c,\alpha_i \in \mathbb{R}$, $c>0$ and $\alpha_i \geq 0 \quad \forall i=(1,2,...,q)$ to allow only for positive - thus valid - variances.
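As an illustration of this recursion (all parameter values below are hypothetical), a minimal R sketch simulating an ARCH(1) process:

```r
# Simulate an ARCH(1): sigma^2_{t|t-1} = c + alpha_1 * eps_{t-1}^2, with eps_t = sigma_{t|t-1} * eta_t
set.seed(7)
n  <- 1000
c0 <- 0.1; alpha1 <- 0.4               # c > 0 and alpha_1 >= 0, as required above
eps <- numeric(n); sigma2 <- numeric(n)
sigma2[1] <- c0 / (1 - alpha1)         # start at the unconditional variance
eps[1]    <- sqrt(sigma2[1]) * rnorm(1)
for (t in 2:n) {
  sigma2[t] <- c0 + alpha1 * eps[t - 1]^2
  eps[t]    <- sqrt(sigma2[t]) * rnorm(1)  # eta_t ~ N(0, 1)
}
```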
Several extensions were applied to the ARCH model; the ones of interest here are the GARCH, GJRGARCH and EGARCH models of order one.
GARCH(1,1) models account for SFFR (iv) and (v) in addition to (iii), and for non-normal unconditional distributions in levels, which works in parallel with SFFR (i). To do so, an MA component is added to the ARCH(1):
$$\begin{equation} \sigma_{t|t-1}^2 = c + \alpha \epsilon_{t-1}^2 + \beta \sigma_{t-1}^2 \end{equation}$$
where elements are as aforementioned, $\beta \in \mathbb{R}$ is positive and in this specific case: $(\alpha + \beta) < 1$ for stationarity. I implement the SVI factor with positive $\delta \in \mathbb{R}$ - enabling us to study its effect on $R_t$'s variance - via:
$$\begin{equation} \sigma_{t|t-1}^2 = c + \alpha \epsilon_{t-1}^2 + \beta \sigma_{t-1}^2 + \delta \Delta SVI_{t-1} \end{equation}$$
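A minimal R sketch of this GARCH(1,1)-SVI recursion (the function, its parameter values and the initialisation at the sample variance are all illustrative assumptions):

```r
# Filter the conditional variance given residuals eps and lagged SVI changes dSVI.
garch11_svi_filter <- function(eps, dSVI, c0, alpha, beta, delta) {
  n <- length(eps)
  sigma2 <- numeric(n)
  sigma2[1] <- var(eps)  # initialise at the sample variance of the residuals
  for (t in 2:n) {
    sigma2[t] <- c0 + alpha * eps[t - 1]^2 + beta * sigma2[t - 1] + delta * dSVI[t - 1]
  }
  sigma2
}
# e.g.: sigma2_hat <- garch11_svi_filter(eps, dSVI, c0 = 0.05, alpha = 0.08, beta = 0.90, delta = 0.01)
```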
Whilst GARCH models are a natural fit in this study by the virtues mentioned, “in general, models that incorporate volatility asymmetry such as EGARCH and GJR-GARCH perform better” (Poon and Granger (2003), p.507). Indeed, GARCH models cannot account for the leverage effects of SFFR (v), whereas GJRGARCH and EGARCH models may; CPV therefore considered them too - as do I. The thicker bottom-left tail of the Empirical Probability Distribution Functions (EPDFs) suggests such leverage effects.
GJRGARCH models account for asymmetry (SFFR (i)) via the last summand in
$$\begin{equation} \sigma_{t|t-1}^2 = c + \alpha \epsilon_{t-1}^2 + \beta \sigma_{t-1}^2 + \gamma I(\epsilon_{t-1}<0) \epsilon_{t-1}^2 \end{equation}$$
and second to last in $$\begin{equation} \sigma_{t|t-1}^2 = c + \alpha \epsilon_{t-1}^2 + \beta \sigma_{t-1}^2 + \gamma I(\epsilon_{t-1}<0) \epsilon_{t-1}^2 + \delta \Delta SVI_{t-1} \end{equation}$$ where a leverage case would result in $\gamma > 0$, and $I(\cdot)$ is the indicator function returning $1$ if its bracketed statement is true, $0$ otherwise - in this case:
$$\begin{equation} I(\epsilon_{t-1}<0) = \begin{Bmatrix} 1 & \text{if } \epsilon_{t-1}<0 \\ 0 & \text{otherwise} \end{Bmatrix} . \end{equation}$$
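In code, the GJR term only adds $\gamma I(\epsilon_{t-1}<0)\epsilon_{t-1}^2$ to the GARCH(1,1) recursion; a minimal (illustrative) R sketch of one step, where the indicator is simply a logical value coerced to $0$/$1$:

```r
# One step of the GJRGARCH(1,1) variance recursion
gjr_step <- function(eps_lag, sigma2_lag, c0, alpha, beta, gamma) {
  c0 + alpha * eps_lag^2 + beta * sigma2_lag + gamma * (eps_lag < 0) * eps_lag^2
}
```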
The use of the logarithm function ensures the positivity of the variance in EGARCH models
$$\begin{equation} ln(\sigma_{t|t-1}^2) = c + \alpha \Bigg[ \frac{|\epsilon_{t-1}|}{\sigma_{t-1}} - \sqrt{\frac{2}{\pi}} \Bigg] + \beta ln(\sigma_{t-1}^2) + \gamma \frac{\epsilon_{t-1}}{\sigma_{t-1}} \end{equation}$$
and
$$\begin{equation} ln(\sigma_{t|t-1}^2) = c + \alpha \Bigg[ \frac{|\epsilon_{t-1}|}{\sigma_{t-1}} - \sqrt{\frac{2}{\pi}} \Bigg] + \beta ln(\sigma_{t-1}^2) + \gamma \frac{\epsilon_{t-1}}{\sigma_{t-1}} + \delta \Delta SVI_{t-1} \end{equation}$$
where ln($\cdot$) and |$\cdot$| denote the logarithm base $e$ and the absolute value functions respectively, and where $c, \alpha, \beta, \gamma \in \mathbb{R}$ are not constrained $\textbf{: } c + \alpha + \beta + \gamma > 0$, since the log formulation already guarantees a positive variance. If $Var(R_t|\Omega_{t-1})$ and $R_t$ react negatively to each other, $\gamma$ would be negative, and vice versa.
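A minimal R sketch (hypothetical function and parameters) of one step of the EGARCH(1,1)-SVI recursion, which works in log-variance and therefore keeps $\sigma^2_{t|t-1}$ positive by construction:

```r
# One step of the EGARCH(1,1)-SVI recursion in log-variance
egarch_step <- function(eps_lag, sigma2_lag, dSVI_lag, c0, alpha, beta, gamma, delta) {
  z <- eps_lag / sqrt(sigma2_lag)                  # standardised lagged shock
  log_sigma2 <- c0 + alpha * (abs(z) - sqrt(2 / pi)) + beta * log(sigma2_lag) +
    gamma * z + delta * dSVI_lag
  exp(log_sigma2)                                  # return sigma^2_{t|t-1}
}
```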
ARMA(p,q)-GARCH(P,Q), ARMA(p,q)-GJRGARCH(P,Q) and ARMA(p,q)-EGARCH(P,Q) models' estimates can be computed via the same method as the generic ARMA(p,q) model above, but substituting in the modelled value for $\sigma_{t|t-1}^2$. This would then be a (Conditional) Quasi Maximum Likelihood estimation method where (assuming normality in the errors):
$$ L_T(\Theta) = \prod_{t=1}^T{\frac{1}{\sqrt[2]{2 \pi \sigma_{t|t-1}^2}} exp \left[ -\frac{(y_t - \mu_{t|t-1})^2}{2 \sigma_{t|t-1}^2} \right]} $$
where $\sigma_{t|t-1}^2$ is recursively computed $\forall t \geq 1$ as dictated by the non-linear variance model GARCH(P,Q), GJRGARCH(P,Q) or EGARCH(P,Q), and $\mu_{t|t-1}$ is the conditional mean implied by the ARMA(p,q) part.
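As a minimal sketch (not the full Part 3 implementation), the Gaussian quasi log-likelihood with a time-varying conditional variance could be written in R as:

```r
# Gaussian quasi log-likelihood; sigma2 is the vector of conditional variances
# produced by one of the recursions above (GARCH, GJRGARCH or EGARCH) for a
# candidate parameter vector Theta, and eps the corresponding residuals.
quasi_loglik <- function(eps, sigma2) {
  sum(-0.5 * log(2 * pi) - 0.5 * log(sigma2) - eps^2 / (2 * sigma2))
}
# Maximising quasi_loglik over Theta (e.g. with optim()) yields the QML estimates.
```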
To assess the accuracy of volatility and positive-sign-probability forecasts, this sub-section specifies error functions.
Variance forecasts from the models GARCH to EGARCH-SVI are simply compared against realised volatilities to compute variance forecast errors as $$ \hat{\sigma}_{t|t-1} - RV_t \text{ .}$$
Sign forecast errors are computed as
$$\begin{equation} \label{eq:SignForecastErrors} \hat{\pi}_{t+1}^{j} - \mathbb{P}(R_{t+1} > 0 | \Omega_{t \textbf{+1}}) \end{equation}$$
where $j = (C\&D_M, Naive)$, and where $M$ can be replaced by a model name ranging from GARCH to EGARCH-SVI (note the bold '$+1$' in $\Omega_{t \textbf{+1}}$ here to emphasise how $\mathbb{P}(R_{t+1}>0|\Omega_{t \textbf{+1}})=I(R_{t+1}>0)$), and also where:
A Naïve model of sign change is used as a benchmark to construct comparisons. It is formulated as $$ \hat{\pi}_{t+1}^{Naive} = \hat{\mathbb{P}}_{Naive}(R_{t+1|t}>0) = \frac{1}{t} \sum_{k=1}^{t} I (R_k > 0) \text{ .}$$
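As a minimal R sketch (the excess-return vector `R` is a placeholder for the data of Part 1), the Naïve probability is just the running proportion of past positive excess returns:

```r
# hat{pi}_{t+1}^{Naive}: cumulative proportion of positive excess returns up to time t
naive_prob <- function(R) cumsum(R > 0) / seq_along(R)
# Sign forecast errors against the realised indicator I(R_{t+1} > 0):
# err <- naive_prob(R)[-length(R)] - as.numeric(R[-1] > 0)
```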
$ \ $
P.S.: Plenty of help on MLE can be found at:
http://www.science.unitn.it/AnalisiInfoTLC/SSP/SSP14_15/lectures/lecture14.pdf
https://faculty.washington.edu/ezivot/econ584/notes/armaestimation.pdf
https://pdfs.semanticscholar.org/e3f5/90f5614a25b688c3d8bea67aad21c6840bc0.pdf
http://fmwww.bc.edu/repec/sce2004/up.27698.1075991479.pdf
http://www.math.chalmers.se/~palbin/dongthesis.pdf
https://www.youtube.com/watch?v=ul3u2yLPwU0
https://dachxiu.chicagobooth.edu/download/NGQMLE.pdf
https://rady.ucsd.edu/faculty/directory/valkanov/pub/classes/mfe/docs/lecture6_2010.pdf
https://jeanmariedufour.github.io/ResE/Dufour_2008_C_TS_ARIMA_Estimation.pdf
https://msu.edu/~baillie/822/MLE.pdf
http://www.maths.qmul.ac.uk/~bb/TimeSeries/TS_Chapter6_3_3.pdf
http://www.phdeconomics.sssup.it/documents/Lesson12.pdf
https://math.berkeley.edu/~btw/thesis4.pdf
https://pdfs.semanticscholar.org/9e19/c169fd51c9beddf1007c8b95331619ceea20.pdf