Refining value-at-risk estimates using a Bayesian Markov-switching GJR-GARCH copula-EVT model

In this paper, we propose a model for forecasting Value-at-Risk (VaR) using a Bayesian Markov-switching GJR-GARCH(1,1) model with skewed Student's-t innovations, copula functions and extreme value theory (EVT). A Bayesian Markov-switching GJR-GARCH(1,1) model, which identifies non-constant volatility over time and allows the GARCH parameters to vary following a Markov process, is combined with copula functions and EVT to formulate the Bayesian Markov-switching GJR-GARCH(1,1) copula-EVT VaR model, which is then used to forecast the level of risk on financial asset returns. We further propose a new method for threshold selection in EVT analysis, which we term the hybrid method. Empirical and back-testing results show that the proposed VaR models capture the level of risk reasonably well in periods of calm and in periods of crisis.


Introduction
In recent decades, Value-at-Risk (VaR) has become a key tool for measuring market risk; it provides risk managers with a quantitative measure of the downside risk of a firm or investment portfolio over a given time frame. VaR attempts to summarise the total risk in a portfolio of assets or exposures to risk factors in a single number over a target horizon.
There are several methods to estimate VaR; the most commonly used by financial institutions are the variance-covariance, historical simulation and Monte Carlo simulation methods (see [1][2][3] and the references therein). Historical simulation relies on actual data and is based on the assumption that history will repeat itself; the VaR is estimated by running hypothetical portfolios through historical data [4]. The variance-covariance and Monte Carlo simulation methods assume that asset returns are independent and identically distributed, a major weakness of these VaR models.
Traditional VaR models assume asset returns in financial markets to be normally distributed; thus, changes in asset prices are independent of each other and exhibit constant volatility over time. This is not the case in real life, i.e., financial asset returns are leptokurtic and exhibit time-varying volatility.


Methodology

Markov-switching GJR-GARCH model
Let $r_t$ represent a time series; then a general Markov-switching GARCH specification can be represented as
$$ r_t \mid (\Delta_t = k, \mathcal{O}_{t-1}) \sim D(0, h_{k,t}, \Theta_k), $$
where $\Delta_t$ is a Markov chain (a stochastic variable) defined on the space $S = \{1, \ldots, K\}$ that indexes the regimes of the model, $D(0, h_{k,t}, \Theta_k)$ is a continuous distribution with zero mean and conditional variance $h_{k,t}$, the noise variables are assumed to follow a skewed Student's-t distribution, $\mathcal{O}_{t-1}$ is the information set observed up to time $t-1$, and $\Theta_k$ is a vector of shape parameters. We define a $K \times K$ transition probability matrix $P$ with elements
$$ p_{ij} = \Pr(\Delta_t = j \mid \Delta_{t-1} = i), $$
where $p_{ij}$ is the probability of transition from state $\Delta_{t-1} = i$ to state $\Delta_t = j$, and $k$ indexes each regime of the Markov chain. The conditional variances $h_{k,t}$, for $k = 1, \ldots, K$, are assumed to follow $K$ separate GARCH-type processes that evolve in parallel [11,14]. Markov-switching GARCH models use a stochastic process to define the unknown states [17]. The reliability of a VaR model depends on the type of volatility model it incorporates. As discussed above, most financial asset returns are not independently and identically distributed; they exhibit fat tails and leverage effects, and their volatility is not constant over time. Volatility reacts differently to large negative returns than to positive returns, reflecting leverage effects [2]. GARCH models often fail to capture these movements. A good volatility estimator must capture the true behaviour of risk factor returns, should be easy to implement for a wide range of risk factors, and should be extensible to portfolios with a number of different risk factors [3]. It is well known that traditional GARCH models cannot capture the asymmetric response of volatility. Several extensions of GARCH models have since been developed as possible solutions to these drawbacks. The most common are the exponential generalised ARCH (EGARCH) model [18], the threshold GARCH (TGARCH) model [19], and the GJR-GARCH model [20]. The only significant, albeit minor, difference between the TGARCH and GJR-GARCH models is that TGARCH uses the standard deviation instead of the variance in its specification [21]. We employ the Markov-switching GARCH model of [11] to capture the differences in the variance dynamics of high and low volatility periods [14], and use the GJR-GARCH(1,1) model to capture the asymmetric response of the conditional volatility process, hence the Markov-switching GJR-GARCH(1,1) model (MS-GJR-GARCH(1,1)).
The conditional variance of the MS-GJR-GARCH model is defined as
$$ h_{k,t} = \alpha_{0,k} + \left(\alpha_{1,k} + \alpha_{2,k}\, I_{\{r_{t-1} < 0\}}\right) r_{t-1}^{2} + \beta_k\, h_{k,t-1}, $$
where $\alpha_{2,k}$ controls the degree of asymmetry in the conditional volatility with respect to past shocks in regime $k$ [14]. Thus, $\alpha_{2,k} > 0$ indicates the presence of a leverage effect, which implies that previous negative returns have a higher influence on volatility. The constraints $\alpha_{0,k} > 0$, $\alpha_{1,k} + \alpha_{2,k} \geq 0$ and $\beta_k \geq 0$ ensure a positive variance, while covariance stationarity is achieved by ensuring that
$$ \alpha_{1,k} + \alpha_{2,k}\, E\!\left[\eta_{k,t}^{2}\, I_{\{\eta_{k,t} < 0\}}\right] + \beta_k < 1, $$
where $\eta_{k,t} = r_t / \sqrt{h_{k,t}}$ is the standardised innovation in regime $k$, and $I_{\{\cdot\}} = 1$ if the condition holds and 0 otherwise. Note that $E[\eta_{k,t}^{2} I_{\{\eta_{k,t} < 0\}}] = \tfrac{1}{2}$ when $\eta_k$ is symmetrically distributed.
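The variance recursion above can be written compactly in code. The following is a minimal single-regime sketch; in the Markov-switching case the same recursion runs in parallel for each regime $k$. The parameter values and the simulated return series are illustrative, not estimates from the paper.

```python
import numpy as np

def gjr_garch_variance(returns, alpha0, alpha1, alpha2, beta):
    """GJR-GARCH(1,1) conditional variance recursion:
    h_t = alpha0 + (alpha1 + alpha2 * 1{r_{t-1} < 0}) * r_{t-1}^2 + beta * h_{t-1}."""
    h = np.empty_like(returns)
    h[0] = np.var(returns)                  # initialise with the sample variance
    for t in range(1, len(returns)):
        leverage = alpha2 if returns[t - 1] < 0 else 0.0
        h[t] = alpha0 + (alpha1 + leverage) * returns[t - 1] ** 2 + beta * h[t - 1]
    return h

# Illustrative parameter values and data (not estimates from the paper)
rng = np.random.default_rng(0)
r = rng.standard_t(df=8, size=1000) * 0.01
h = gjr_garch_variance(r, alpha0=1e-6, alpha1=0.05, alpha2=0.08, beta=0.90)
```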
For the conditional distribution of $r_t$ in each regime of the Markov chain, we employ a skewed and fat-tailed error distribution, the skewed Student's-t distribution. We use the skewed Student's-t distribution because it accounts for the excess kurtosis in the conditional distribution that is common in financial time series [22]. Moreover, recent studies [23,24] have shown that the skewed Student's-t error distribution is a good choice when compared to a range of existing alternatives. The probability density function (PDF) of the standardised Student's-t distribution is defined as
$$ f(\eta_k; \nu) = \frac{\Gamma\!\left(\frac{\nu+1}{2}\right)}{\Gamma\!\left(\frac{\nu}{2}\right)\sqrt{\pi(\nu-2)}} \left(1 + \frac{\eta_k^{2}}{\nu-2}\right)^{-\frac{\nu+1}{2}}, $$
where the constraint on the degrees of freedom parameter $\nu > 2$ is imposed to guarantee that the second-order moment exists, and $\Gamma(\cdot)$ is the Gamma function. Skewness is introduced by an additional parameter $\gamma_k > 0$ as defined in [25]; that is,
$$ p(\eta_k \mid \nu, \gamma_k) = \frac{2}{\gamma_k + \gamma_k^{-1}} \left[ f(\eta_k / \gamma_k; \nu)\, I_{\{\eta_k \geq 0\}} + f(\gamma_k \eta_k; \nu)\, I_{\{\eta_k < 0\}} \right]. $$
When $\gamma_k \neq 1$, the distribution $p(\eta_k \mid \nu, \gamma_k)$ loses symmetry (see [14,25,26] for more details on the skewed Student's-t distribution).
We use Bayesian statistics to estimate the posterior distribution of the variance equation because the Bayesian estimation method provides reliable results even for finite samples. Moreover, with the Bayesian approach it is usually straightforward to obtain the posterior distribution of any non-linear function of the model parameters. By comparison, under the classical maximum likelihood method it is not easy to perform inference on non-linear functions of the model parameters, the convergence rate is slow, and there are limitations when the residuals are heavy tailed. The constraints on the GARCH parameters that guarantee a positive variance can be incorporated via priors, whereas under the classical maximum likelihood method such constraints may impede some optimisation procedures [27,28].
We define a vector of the risk factor returns as $r = (r_1, \ldots, r_T)'$, $\theta_k = (\alpha_{0,k}, \alpha_{1,k}, \alpha_{2,k}, \beta_k, P)'$, and a vector of the model parameters as $\Lambda = (\theta_1, \Theta_1, \ldots, \theta_K, \Theta_K)$, with $\Theta_K = (\nu_K, \gamma_K)$. Then, from Bayes' theorem and the prior distribution of the model parameters $p(\Lambda)$, we have
$$ p(\Lambda \mid r) \propto L(\Lambda \mid r)\, p(\Lambda), $$
where $f(r_t \mid \Delta_t = j, \mathcal{O}_{t-1}; \Lambda)$ is the conditional probability density of $r_t$ at time $t$ conditional on $\mathcal{O}_{t-1}$ and regime $j$. Therefore we have
$$ f(r_t \mid \mathcal{O}_{t-1}; \Lambda) = \sum_{j=1}^{K} \Pr(\Delta_t = j \mid \mathcal{O}_{t-1}; \Lambda)\, f(r_t \mid \Delta_t = j, \mathcal{O}_{t-1}; \Lambda) $$
and a likelihood function
$$ L(\Lambda \mid r) = \prod_{t=1}^{T} f(r_t \mid \mathcal{O}_{t-1}; \Lambda). $$
The Metropolis-Hastings (MH) algorithm of Markov chain Monte Carlo (MCMC) is then employed to estimate the parameter values of the posterior distribution. As discussed in [22], because of the recursive nature of the variance equation, the prior density $p(\Lambda)$ and the posterior density $p(\Lambda \mid r)$ do not belong to the same distributional family and, consequently, the posterior cannot be expressed in closed form. The MH algorithm allows draws to be generated from a density whose normalising constant is unknown.
In the MH algorithm, $\Lambda$ is treated as a random variable and a Markov chain $\Lambda^{[0]}, \ldots, \Lambda^{[j]}, \ldots$ is generated in the parameter space. As the number of realised draws tends to infinity, the chain converges to its stationary distribution, the posterior $p(\Lambda \mid r)$ [29], and the optimal mean values of the posterior distribution parameters are realised. [22] summarises the MH algorithm as follows: (i) Initialise the iteration counter to $j = 1$ and set the initial value $\Lambda^{[0]}$. (ii) Move the chain to a new value $\Lambda^{*}$ generated from a proposal density $q(\cdot \mid \Lambda^{[j-1]})$. (iii) Accept $\Lambda^{*}$ with probability $\min\!\left\{1, \frac{p(\Lambda^{*} \mid r)\, q(\Lambda^{[j-1]} \mid \Lambda^{*})}{p(\Lambda^{[j-1]} \mid r)\, q(\Lambda^{*} \mid \Lambda^{[j-1]})}\right\}$; otherwise set $\Lambda^{[j]} = \Lambda^{[j-1]}$. (iv) Finally, change the counter from $j$ to $j + 1$ and go back to step (ii) until convergence is reached. More details on MH algorithms can be found in [30][31][32][33].
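A minimal sketch of a random-walk MH sampler of the kind described above is given below. A placeholder log-posterior (a bivariate standard normal) stands in for the actual GARCH posterior; with a symmetric proposal the $q$-ratio cancels in the acceptance probability, and the burn-in and thinning mirror the estimation procedure described later. All settings are illustrative.

```python
import numpy as np

def random_walk_mh(log_post, theta0, n_draws=40_000, step=0.05, seed=0):
    """Random-walk Metropolis-Hastings: propose theta* = theta + step * N(0, I)
    and accept with probability min(1, p(theta*|r) / p(theta|r))."""
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta0, dtype=float)
    lp = log_post(theta)
    draws = np.empty((n_draws, theta.size))
    for j in range(n_draws):
        proposal = theta + step * rng.standard_normal(theta.size)
        lp_prop = log_post(proposal)
        if np.log(rng.random()) < lp_prop - lp:     # MH acceptance step
            theta, lp = proposal, lp_prop
        draws[j] = theta
    return draws

# Placeholder target: a standard bivariate normal log-density
log_post = lambda th: -0.5 * np.sum(th ** 2)
chain = random_walk_mh(log_post, theta0=np.zeros(2))
# Burn-in and thinning as described in the estimation section below
posterior_sample = chain[20_000::10]
```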

Copula theory
Copula theory enables the construction of a flexible multivariate distribution with varying margins and dependence structures; it is free from assumptions of normality or linear correlation. In addition, copulas can easily capture the tail dependence of asset returns, i.e., the joint probability of large market movements. Copula theory was first developed by [34] to describe the dependence structure between random variables. It was later introduced to the finance literature by [35,36]. Subsequently, [37] introduced the application of copula theory to financial asset returns, and [38] expanded the framework of copula theory with respect to the time-varying nature of financial dependence schemes. Copula theory has also been used in risk management to measure the VaR of portfolios, including both unconditional [39][40][41] and conditional distributions [42][43][44].
In multivariate settings, we use the following version of Sklar's theorem, as given by [41], for the purpose of VaR estimation. Sklar's theorem: Consider an $n$-dimensional joint distribution function $F(x)$ with margins $F_1(x_1), \ldots, F_n(x_n)$, where $x = (x_1, \ldots, x_n)$ and $-\infty \leq x_i \leq \infty$. Then there exists a copula $C : [0,1]^n \to [0,1]$ such that
$$ F(x_1, \ldots, x_n) = C(F_1(x_1), \ldots, F_n(x_n)), $$
which, for absolutely continuous margins, is uniquely determined as
$$ C(u_1, \ldots, u_n) = F\!\left(F_1^{-1}(u_1), \ldots, F_n^{-1}(u_n)\right); $$
otherwise, $C$ is uniquely determined on the range $R(F_1) \times \ldots \times R(F_n)$. Equally, if $C$ is a copula and $F_1, \ldots, F_n$ are univariate distribution functions, then Eq (12) is a joint distribution function with margins $F_1, \ldots, F_n$ [45]. The copula $C(u_1, \ldots, u_n)$ has an associated density $c(u_1, \ldots, u_n)$ defined as
$$ c(u_1, \ldots, u_n) = \frac{\partial^{n} C(u_1, \ldots, u_n)}{\partial u_1 \cdots \partial u_n}, $$
and it is related to the joint density function $f$ of continuous random variables by the canonical copula representation [16]
$$ f(x_1, \ldots, x_n) = c(F_1(x_1), \ldots, F_n(x_n)) \prod_{i=1}^{n} f_i(x_i), $$
where the $f_i$ are the marginal densities, which may differ from each other [41,43,45,46]. [16,47] discuss two families of copulas commonly used in financial applications: the elliptical and the Archimedean copulas.
Elliptical copulas are derived from elliptical distributions by applying Sklar's theorem. The most common are the Gaussian and the Student's-t copulas, which are symmetric. Their dependence structure is determined by a standardised correlation or dispersion matrix because of the invariance property of copulas. Consider a symmetric positive definite matrix $\rho$ with $\mathrm{diag}(\rho) = (1, 1, \ldots, 1)'$; the multivariate Gaussian copula (MGC) can be represented as
$$ C^{Ga}_{\rho}(u_1, \ldots, u_n) = \Phi_{\rho}\!\left(\Phi^{-1}(u_1), \ldots, \Phi^{-1}(u_n)\right), $$
where $\Phi_{\rho}$ is the standardised multivariate normal distribution function with correlation matrix $\rho$ and $\Phi^{-1}$ is the inverse of the standard univariate normal distribution function. If the margins are normal, then the Gaussian copula generates the standard Gaussian joint distribution function, with density function
$$ c^{Ga}_{\rho}(u_1, u_2, \ldots, u_n) = \frac{1}{\sqrt{|\rho|}} \exp\!\left( -\tfrac{1}{2}\, B'\!\left(\rho^{-1} - I\right) B \right), $$
where $B = (\Phi^{-1}(u_1), \ldots, \Phi^{-1}(u_n))'$ and $I$ is the identity matrix. On the other hand, the multivariate Student's-t copula (MTC) has the form
$$ C^{t}_{\rho,\nu}(u_1, \ldots, u_n) = t_{\rho,\nu}\!\left(t_{\nu}^{-1}(u_1), \ldots, t_{\nu}^{-1}(u_n)\right), $$
with density function
$$ c_{\rho,\nu}(u_1, \ldots, u_n) = |\rho|^{-\frac{1}{2}}\, \frac{\Gamma\!\left(\frac{\nu+n}{2}\right)}{\Gamma\!\left(\frac{\nu}{2}\right)} \left[ \frac{\Gamma\!\left(\frac{\nu}{2}\right)}{\Gamma\!\left(\frac{\nu+1}{2}\right)} \right]^{n} \frac{\left(1 + \frac{1}{\nu}\, B' \rho^{-1} B\right)^{-\frac{\nu+n}{2}}}{\prod_{j=1}^{n}\left(1 + \frac{b_j^{2}}{\nu}\right)^{-\frac{\nu+1}{2}}}, $$
where $t_{\rho,\nu}$ is the standardised multivariate Student's-t distribution with correlation matrix $\rho$ and $\nu$ degrees of freedom, $t_{\nu}^{-1}$ is the inverse of the univariate Student's-t distribution function, and $B = (b_1, \ldots, b_n)' = (t_{\nu}^{-1}(u_1), \ldots, t_{\nu}^{-1}(u_n))'$.
Archimedean copulas are useful in risk management analysis because they capture asymmetric tail dependence between financial asset returns. The most common are the Gumbel [48], Clayton [49] and Frank [50] copulas [51]. These copulas are built via a generator as
$$ C(u_1, \ldots, u_n) = \varphi^{-1}\!\left(\varphi(u_1) + \ldots + \varphi(u_n)\right), $$
with density function
$$ c(u_1, \ldots, u_n) = \left(\varphi^{-1}\right)^{(n)}\!\left(\sum_{i=1}^{n}\varphi(u_i)\right) \prod_{i=1}^{n} \varphi'(u_i), $$
where $\varphi$ is the copula generator and $\varphi^{-1}$ is completely monotonic on $[0, \infty)$. That is, $\varphi^{-1}$ must be infinitely differentiable with derivatives of ascending order and alternating sign, such that $\varphi^{-1}(0) = 1$ and $\lim_{x \to +\infty} \varphi^{-1}(x) = 0$ [47]. Thus, $\varphi'(u) < 0$ (i.e., $\varphi$ is strictly decreasing) and $\varphi''(u) > 0$ (i.e., $\varphi$ is strictly convex). The Gumbel copula captures upper tail dependence, is limited to positive dependence, and has generator function $\varphi(u) = (-\ln u)^{\alpha}$ and generator inverse $\varphi^{-1}(x) = \exp(-x^{1/\alpha})$. This generates a Gumbel $n$-copula represented by
$$ C(u_1, \ldots, u_n) = \exp\!\left\{ -\left[ \sum_{i=1}^{n} (-\ln u_i)^{\alpha} \right]^{1/\alpha} \right\}. $$
The generator function for the Clayton copula is given by $\varphi(u) = u^{-\alpha} - 1$ with generator inverse $\varphi^{-1}(x) = (1 + x)^{-1/\alpha}$, which yields a Clayton $n$-copula represented by
$$ C(u_1, \ldots, u_n) = \left[ \sum_{i=1}^{n} u_i^{-\alpha} - n + 1 \right]^{-1/\alpha}. $$
The Frank copula has generator function $\varphi(u) = -\ln\!\left(\frac{\exp(-\alpha u) - 1}{\exp(-\alpha) - 1}\right)$ and generator inverse $\varphi^{-1}(x) = -\frac{1}{\alpha} \ln\!\left(1 + e^{-x}\left(e^{-\alpha} - 1\right)\right)$, which results in a Frank $n$-copula represented by
$$ C(u_1, \ldots, u_n) = -\frac{1}{\alpha} \ln\!\left( 1 + \frac{\prod_{i=1}^{n} \left(e^{-\alpha u_i} - 1\right)}{\left(e^{-\alpha} - 1\right)^{n-1}} \right). $$
We follow [52] and employ the Gaussian, Student's-t, Gumbel, Frank and Clayton copulas in this study.
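As a small illustration of how simulation from these copula families can proceed, the sketch below draws from a bivariate Gaussian copula by transforming correlated normals, and from a Clayton copula via the standard gamma-frailty (Marshall-Olkin) construction. The correlation matrix and Clayton parameter are illustrative values, not estimates from the paper.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n = 10_000

# Gaussian copula: transform correlated normals to uniforms via the normal CDF
rho = np.array([[1.0, 0.6], [0.6, 1.0]])
z = rng.multivariate_normal(mean=np.zeros(2), cov=rho, size=n)
u_gauss = stats.norm.cdf(z)                      # each margin is Uniform(0, 1)

# Clayton copula via the gamma-frailty (Marshall-Olkin) construction
alpha = 2.0                                      # illustrative dependence parameter
w = rng.gamma(shape=1.0 / alpha, scale=1.0, size=n)
e = rng.exponential(size=(n, 2))
u_clayton = (1.0 + e / w[:, None]) ** (-1.0 / alpha)
```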

Modelling dependence
The traditional way to measure the relationship between markets and risk factors is to look at their linear correlations, which depend both on the marginal and the joint distributions of the risk factors. If there is a non-linear relationship (i.e., in the case of non-normality), the results might be misleading [47]. In this situation, non-parametric invariant measures that do not depend on the marginal probability distributions, such as Kendall's τ or Spearman's ρ, are more appropriate. Copulas measure a form of dependence between pairs of risk factors (i.e., asset returns) known as concordance using these invariant measures. Consider $n$ paired continuous observations $(x_i, y_i)$ ranked from smallest to largest, with the smallest ranked 1, the second smallest ranked 2, and so on. Then Kendall's τ is defined as the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs, i.e., the probability of concordance minus the probability of discordance:
$$ \tau = \frac{C - D}{\binom{n}{2}}, $$
where $C$ is the number of concordant pairs below a particular rank that are larger in value than that particular rank, and $D$ is the number of discordant pairs below a particular rank that are smaller in value than that particular rank. Spearman's ρ, on the other hand, is defined as the probability of concordance minus the probability of discordance of the pairs of vectors $(x_1, y_1)$ and $(x_2, y_3)$ with the same margins. That is,
$$ \rho = 3\left( \Pr[(x_1 - x_2)(y_1 - y_3) > 0] - \Pr[(x_1 - x_2)(y_1 - y_3) < 0] \right). $$
The joint distribution function of $(x_1, y_1)$ is $H(x, y)$, while the joint distribution function of $(x_2, y_3)$ is $F(x)G(y)$ because $x_2$ and $y_3$ are independent [54]. Alternatively,
$$ \rho = 1 - \frac{6 \sum_{i=1}^{n} d_i^{2}}{n(n^{2} - 1)}, $$
where $d_i$ is the difference between the ranked samples. A study by [54] has shown that Kendall's τ and Spearman's ρ depend on the vectors $(x_1, y_1), (x_2, y_2)$ and $(x_1, y_1), (x_2, y_3)$, respectively, through their copulas $C$, and that the following relationships hold:
$$ \tau_{X,Y} = 4 \int_{0}^{1}\!\!\int_{0}^{1} C(u, v)\, dC(u, v) - 1 \quad \text{and} \quad \rho_{X,Y} = 12 \int_{0}^{1}\!\!\int_{0}^{1} u\, v\, dC(u, v) - 3. $$
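These invariant measures can be computed directly from the ranks of the data. A minimal sketch on synthetic data follows (the series and their dependence level are illustrative), with the linear Pearson correlation shown for contrast.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
x = rng.standard_t(df=5, size=500)
y = 0.5 * x + rng.standard_t(df=5, size=500)     # two dependent, heavy-tailed series

tau, _ = stats.kendalltau(x, y)                  # concordance-based, rank-invariant
rho_s, _ = stats.spearmanr(x, y)                 # Spearman's rank correlation
rho_p = np.corrcoef(x, y)[0, 1]                  # linear (Pearson) correlation, for contrast
print(f"Kendall tau = {tau:.3f}, Spearman rho = {rho_s:.3f}, Pearson rho = {rho_p:.3f}")
```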

Extreme value theory
EVT is a statistical approach for estimating extreme events with low frequency but high severity. This technique is widely used in financial risk management since empirical evidence from various studies [5,6] shows that, in the majority of cases, financial asset return distributions are heavy-tailed, especially in times of financial instability.
There are two fundamental approaches for modelling extreme events with low frequency but high severity: the block maxima method and the peaks-over-threshold (POT) method. The POT method is commonly used to model extreme events in financial time series data. The block maxima method, on the other hand, is not commonly used for statistical inference on financial time series data for a few reasons: (i) the method does not make sufficient use of the data, as it uses only the sub-period maxima, (ii) the choice of sub-period length is not clearly defined, and (iii) the method is unconditional and does not take into account the effects of other explanatory variables [55]. In this paper we use the POT method based on the generalised Pareto distribution (GPD). The POT method focuses on modelling the exceedances of the losses above a certain threshold ϑ and their times of occurrence. The threshold is selected such that there are enough data points to carry out a meaningful statistical analysis. Techniques for selecting a proper threshold are discussed below.
Let $\{x_i\}_{i=1}^{T}$ represent the loss variables of an asset return series. Then, as $T \to \infty$, $\{x_i\}_{i=1}^{T}$ is assumed to be independent and identically distributed, and $(x - \mu)/\sigma$ follows a generalised extreme value (GEV) distribution:
$$ H_{\xi}(x) = \begin{cases} \exp\!\left\{ -\left(1 + \xi\, \frac{x - \mu}{\sigma}\right)^{-1/\xi} \right\}, & \xi \neq 0, \\[4pt] \exp\!\left\{ -\exp\!\left(-\frac{x - \mu}{\sigma}\right) \right\}, & \xi = 0, \end{cases} $$
where $\xi$ is the shape parameter and $1/\xi$ is the tail index of the GEV distribution.
Also, let the conditional distribution of the excesses over the threshold $\vartheta$ be
$$ F_{\vartheta}(y) = \Pr(x - \vartheta \leq y \mid x > \vartheta) = \frac{F(y + \vartheta) - F(\vartheta)}{1 - F(\vartheta)}. $$
Again, as $T \to \infty$, $(y + \vartheta - \mu)/\sigma$ follows a GEV distribution; see Eq (26). Therefore, the excess distribution is approximated by the generalised Pareto distribution
$$ G_{\xi, \psi(\vartheta)}(y) = \begin{cases} 1 - \left(1 + \xi\, \frac{y}{\psi(\vartheta)}\right)^{-1/\xi}, & \xi \neq 0, \\[4pt] 1 - \exp\!\left(-\frac{y}{\psi(\vartheta)}\right), & \xi = 0, \end{cases} $$
where $y > 0$, $\psi(\vartheta) = \sigma + \xi(\vartheta - \mu)$, and $y \in \left[0, -\frac{\psi(\vartheta)}{\xi}\right]$ when $\xi < 0$. If $\xi = 0$, then Eq (30) becomes an exponential distribution with parameter $1/\psi(\vartheta)$ [55]. Let $y = x - \vartheta$; then Eq (28) can be written as
$$ F(x) = \left(1 - F(\vartheta)\right) G_{\xi, \psi(\vartheta)}(x - \vartheta) + F(\vartheta), \qquad x > \vartheta. $$
The tail estimator for the underlying distribution $F(x \mid \xi, \psi(\vartheta))$ is constructed using an empirical estimate of $F(\vartheta)$, i.e., $\hat F(\vartheta) = (T - N_{\vartheta})/T$, as
$$ \hat F(x) = 1 - \frac{N_{\vartheta}}{T} \left(1 + \hat\xi\, \frac{x - \vartheta}{\hat\psi(\vartheta)}\right)^{-1/\hat\xi}, $$
where $N_{\vartheta}$ is the number of observations above the threshold. We obtain the $q$th quantile, $F^{-1}_{q} = \mathrm{VaR}_q$, by inverting Eq (33), for any given small upper tail probability $p$, as
$$ \mathrm{VaR}_q = \vartheta + \frac{\hat\psi(\vartheta)}{\hat\xi} \left[ \left(\frac{T}{N_{\vartheta}}\, p\right)^{-\hat\xi} - 1 \right], $$
where $q = 1 - p$ [4,55,56]. After deciding on the choice of $\vartheta$, and assuming that the points above $\vartheta$ are independent and identically distributed, the parameters $\psi(\vartheta)$ and $\xi$ can be estimated by maximum likelihood, with likelihood function built from the GPD density of the exceedances $y_i$,
$$ L(\xi, \psi(\vartheta)) = \prod_{i=1}^{N_{\vartheta}} \frac{1}{\psi(\vartheta)} \left(1 + \xi\, \frac{y_i}{\psi(\vartheta)}\right)^{-\frac{1}{\xi} - 1}. $$
The choice of the threshold $\vartheta$ is an important step in the POT method because Eq (34) depends on $\vartheta$ and on the number of points (i.e., exceedances) above $\vartheta$, since the parameters are estimated from the exceedances. Thus, it is very important to find a proper threshold value. No clear-cut or wholly satisfactory method for determining a proper threshold has been established to date. [57] developed a semi-parametric estimator for the tails of the distribution that selects the threshold by a bootstrap approximation of the mean square error (MSE) of the tail index, minimising the MSE over the choice of threshold. [58] further used a two-step subsample bootstrap method to determine the threshold that minimises the asymptotic MSE. [59,60] propose graphical tools to identify the proper threshold, known as the Hill plot and the mean excess plot, respectively. In this paper, we use the mean excess plot and propose an extension of it, which we call the hybrid method, as discussed later.
The mean excess function of $x$ over a threshold $\vartheta$ is defined as
$$ e(\vartheta) = E[x - \vartheta \mid x > \vartheta]. $$
A property of the GPD states that if the excess distribution of $x$ over a threshold $\vartheta_0$ is a GPD with shape parameter $\xi$ and scale parameter $\psi(\vartheta_0)$, then for any higher threshold $\vartheta > \vartheta_0$ the excess distribution over $\vartheta$ is also a GPD with shape parameter $\xi$ and scale parameter
$$ \psi(\vartheta) = \psi(\vartheta_0) + \xi(\vartheta - \vartheta_0). $$
Then
$$ e(\vartheta) = \frac{\psi(\vartheta_0) + \xi(\vartheta - \vartheta_0)}{1 - \xi}, $$
which is a linear function of $\vartheta - \vartheta_0$ with slope $\xi/(1 - \xi)$ for $\vartheta > \vartheta_0$. From the ordered sample $\{x_i\}$, we calculate and plot the empirical mean excess function, i.e., Eq (37), against each chosen $\vartheta_i$ for $\vartheta_i > \vartheta_0$. The threshold $\vartheta$ is then identified as the lowest point on the mean excess plot above which the graph appears to be approximately linear. However, the choice of $\vartheta$ from the mean excess plot is subjective [8,55] and might differ from one bank to another using the same data because of different risk tolerances. Different $\vartheta$ values will give different estimates of the shape and scale parameters. A very high threshold will leave too few data points in the left tail for any meaningful statistical analysis. In contrast, a very low threshold will place a number of the points above the threshold close to the body of the sample data. This results in a poor approximation, because the GPD is a limiting distribution as $\vartheta \to \infty$ and is not a good approximation for the body of the sample data [8,56]. We propose a hybrid method for selecting a proper threshold value that significantly diminishes the possibility of obtaining different $\vartheta$ values from the same data.
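A minimal sketch of the GPD fit to the exceedances and the inversion of the tail estimator (Eq (34)) to obtain the VaR quantile is given below, assuming a series of losses and a threshold already chosen. The simulated Student's-t losses and the threshold are illustrative, not the paper's data.

```python
import numpy as np
from scipy import stats

def evt_var(losses, threshold, q=0.99):
    """Fit a GPD to exceedances over `threshold` and invert the tail estimator
    to obtain the q-th quantile (VaR), following the POT approach above."""
    excess = losses[losses > threshold] - threshold
    xi, _, psi = stats.genpareto.fit(excess, floc=0)   # shape xi, scale psi(threshold)
    T, N = len(losses), len(excess)
    p = 1.0 - q
    return threshold + (psi / xi) * ((T / N * p) ** (-xi) - 1.0)

# Illustrative heavy-tailed losses (standardised residuals), not the paper's data
rng = np.random.default_rng(3)
losses = rng.standard_t(df=4, size=10_000)
print(evt_var(losses, threshold=2.0, q=0.99))
```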

Data
In some of the literature, the stability of financial systems is measured using a portfolio consisting of several banks (i.e., by considering the dependence among the banks), while other studies focus on individual banks. This paper considers both measures. Therefore, using the stock prices of each bank, we calculate the log-return series and apply risk-factor mapping to construct a simulated portfolio of returns for all the banks as follows. Consider a portfolio consisting of $N$ risk factors represented in vector form as $S_N = (s_{1t}, \ldots, s_{Nt})$; the log-returns $r_t$ are calculated as
$$ r_{i,t} = \ln\!\left(\frac{s_{i,t}}{s_{i,t-1}}\right). $$
Let $Inv$ be the total amount invested in the portfolio and $x_i$ the amount invested in stock $i$, with $r_{i,t}$ the return of stock $i$ at time $t$; then the weight applied to $r_{i,t}$ is the fraction of the portfolio invested in stock $i$, calculated as $w_i = x_i / Inv$. Since the stocks are all from banks of roughly equal strength (i.e., the top five banks in the UK), we may assume equal weights. Therefore, the expected return on the portfolio at time $t$ is given by
$$ \bar R_{p,t} = \sum_{i=1}^{N} w_i\, r_{i,t}, $$
which is a weighted average of the returns on the individual stocks in the portfolio. Table 1 presents summary statistics of the data. We see from the table that the log-return series for each bank and for the portfolio are far from normally distributed, as indicated by their high excess kurtosis and skewness. Furthermore, Jarque-Bera normality tests, Ljung-Box tests on the squared residuals $a_{i,t}^{2}$, where $a_{i,t} = r_{i,t} - \mu_i$ ($\mu_i$ being the unconditional mean), and Lagrange multiplier tests for autoregressive conditional heteroscedasticity (ARCH LM tests) on the residuals $a_{i,t}$, as described in [55,61,62], are significant at the 5% level.
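As a concrete illustration of the return and portfolio construction above, the following is a minimal sketch; the `prices` DataFrame here is synthetic and stands in for the banks' daily closing prices, which are not reproduced.

```python
import numpy as np
import pandas as pd

# Synthetic placeholder for a DataFrame of daily closing prices, one column per bank
prices = pd.DataFrame(
    np.exp(np.cumsum(np.random.default_rng(4).normal(0, 0.02, size=(1000, 5)), axis=0)),
    columns=[f"bank_{i}" for i in range(1, 6)],
)

log_returns = np.log(prices / prices.shift(1)).dropna()   # r_t = ln(s_t / s_{t-1})
weights = np.full(5, 1.0 / 5.0)                           # equal weights across the five banks
portfolio_returns = log_returns @ weights                 # weighted average of individual returns
```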

Modelling the marginal distributions of volatility equations
As noted, the log-return series are leptokurtic and skewed. Thus, to capture the tail distribution and the dynamics of fluctuations in the time series data, we consider single-state, $k = 1$, and two-state, $k = \{1, 2\}$, Markov-switching GARCH specifications. The underlying volatility model is a GJR-GARCH(1,1) model with skewed Student's-t distribution. Since we use just one variance specification (i.e., GJR-GARCH), the two-state Markov-switching GARCH is generated by setting the number of regimes in the conditional distribution to 2. For the single-state model, the length of the variance specification is equal to the length of the conditional distribution, which is 1 (see [14]). Also note that the single-state Markov-switching GJR-GARCH(1,1) model corresponds to a GJR-GARCH(1,1) model without regime change. Therefore, we simply refer to the single-state and two-state Markov-switching GJR-GARCH(1,1) models as the GJR-GARCH(1,1) and MS-GJR-GARCH(1,1) models, respectively (see [14]). The GARCH parameters are estimated using Bayesian statistics as follows: (i) we assign a prior distribution with initial hyperparameters and generate MCMC simulations of 40000 draws; (ii) if convergence is attained, we discard the first 20000 draws and keep only every 10th draw from each chain, so that the autocorrelation between draws is reduced to almost zero, and merge the two chains to obtain a sample of 2000 observations; (iii) if convergence is not attained, we repeat (i) using the parameter estimates from the previous draw as the hyperparameters to increase the chance of convergence. The mean value of each parameter with respect to its posterior distribution is the optimal parameter estimate of the Bayesian GJR-GARCH(1,1) and Bayesian MS-GJR-GARCH(1,1) models with skewed Student's-t distributions. Estimation results are presented in Tables 2 and 3, with standard errors in parentheses.
For the MS-GJR-GARCH(1,1) model, the degrees of freedom parameter $\nu$ is fixed across the regimes. Applying Eq (2), we then obtain a matrix $S$ consisting of the filtered marginal standardised residuals, $\{\eta_{i,t}\}_{t=1}^{T}$, of the overall process for the MS-GJR-GARCH(1,1) and GJR-GARCH(1,1) models. That is,
$$ S = \begin{pmatrix} \eta_{1,1} & \eta_{2,1} & \cdots & \eta_{N,1} \\ \vdots & \vdots & \ddots & \vdots \\ \eta_{1,T} & \eta_{2,T} & \cdots & \eta_{N,T} \end{pmatrix}. $$
The ARCH LM test and Ljung-Box test on the standardised residuals and standardised squared residuals, respectively, for lags 5 and 10 are presented in Table 4.

Modelling dependence with copulas
We model the dependence structure among the stock returns using copula functions. Copula parameters are estimated by the canonical maximum likelihood (CML) method [41]. This entails the use of pseudo-observations of the standardised residuals to estimate the marginals. We then estimate the copula parameters by inversion of Kendall's τ, which is one of the most commonly used invariant measures and has been shown to provide an efficient way of estimating correlations [63,64]. The copula that fits the data best is selected by the maximum likelihood estimation (MLE) method, maximising the likelihood function
$$ \hat\theta_C = \arg\max_{\theta_C} \sum_{t=1}^{T} \ln c\!\left(\hat F_1(\eta_{1,t}), \ldots, \hat F_n(\eta_{n,t}); \theta_C\right), $$
where $\hat\theta_C$ are the estimates of the copula parameters and $\hat F_i$ are the empirical marginal distribution functions. The estimated copula parameters are reported in Table 5, along with their Akaike information criterion (AIC) values. For both models, the Frank and Student's-t copulas are selected from their respective copula families based on the highest MLE values. From Table 5, the same copula types are selected based on the AIC values (the copula with the smallest AIC value is preferred). Note that the Gaussian copula gives a higher MLE value than the Archimedean copulas but also a higher AIC value. Kendall's τ values for the Gaussian and Student's-t copula parameter estimates are also reported. Thus, the Gaussian copula is not a good fit for the data. The analysis continues based on the selected copulas. Next, we specify the desired marginal distributions, which we set to the Student's-t distribution, and, using the estimated copula parameters, we generate 10000 simulations to obtain a new matrix of marginal standardised residuals, $\hat Z = \{z_{i,t}\}$, which is free from assumptions of normality and linear correlation. To confirm this, we employ a multivariate ARCH test based on the Ljung-Box statistic $Q_k(m)$ and its modification $Q^{r}_{k}(m)$, known as the robust test, on the log returns at the 5% significance level,
$$ Q_k(m) = T^{2} \sum_{j=1}^{m} \frac{1}{T - j}\, b_j'\left(\hat\rho_0^{-1} \otimes \hat\rho_0^{-1}\right) b_j, $$
where $m$ is the number of lags of cross-correlation matrices used in the test, $k$ is the dimension of $r_{i,t}$, $T$ is the sample size, $b_j = \mathrm{vec}(\hat\rho_j')$, and $\hat\rho_j$ is the lag-$j$ cross-correlation matrix of the series. The modification $Q^{r}_{k}(m)$ involves discarding those observations from the return series whose corresponding standardised residuals exceed the 95th quantile, in order to reduce the effect of heavy tails. The motivation for the $Q^{r}_{k}(m)$ test is that $Q_k(m)$ may fare poorly in finite samples when the residuals of the time series $r_{i,t}$ have heavy tails [45]. The tests show no evidence of conditional heteroscedasticity at lags $m = 10$; see Table 7.
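The inversion of Kendall's τ mentioned above has closed forms for several of the copulas used here: ρ = sin(πτ/2) for the Gaussian and Student's-t copulas, α = 2τ/(1 − τ) for Clayton and α = 1/(1 − τ) for Gumbel, while the Frank copula has no closed-form inversion and is usually solved numerically. A minimal sketch on illustrative pseudo-observations follows.

```python
import numpy as np
from scipy import stats

def copula_params_from_tau(u, v):
    """Invert Kendall's tau to obtain copula parameters, as in the CML step above.
    Closed forms: rho = sin(pi*tau/2) for Gaussian/Student's-t,
    alpha = 2*tau/(1-tau) for Clayton, alpha = 1/(1-tau) for Gumbel."""
    tau, _ = stats.kendalltau(u, v)
    return {
        "tau": tau,
        "gaussian_t_rho": np.sin(np.pi * tau / 2.0),
        "clayton_alpha": 2.0 * tau / (1.0 - tau),
        "gumbel_alpha": 1.0 / (1.0 - tau),
    }

# Pseudo-observations of two standardised-residual series (illustrative data)
rng = np.random.default_rng(5)
z = rng.multivariate_normal([0, 0], [[1, 0.5], [0.5, 1]], size=2_000)
u = stats.rankdata(z[:, 0]) / (len(z) + 1)
v = stats.rankdata(z[:, 1]) / (len(z) + 1)
print(copula_params_from_tau(u, v))
```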
We follow the approach of [9] and apply the POT method of EVT to each of the marginal distributions of $\{z_{i,t}\}$ (i.e., Eq (42)) to obtain the $q$th quantile, $\mathrm{VaR}(Z)_q$, of the noise variables for VaR estimation. Let $\{\chi_{i,\tau}\}$ denote the negative variables of the marginal distributions of $\{z_{i,t}\}$, such that $\{\chi_{i,\tau}\} \subseteq \{z_{i,t}\}$. Then, from the ordered sample of $\{\chi_{i,\tau}\}$, we calculate and plot the mean excess function to help identify the threshold. As an example, Figs 2 and 3 show the mean excess plots for Bank 1 under the Bayesian GJR-GARCH(1,1) Student's-t and Frank copula models, respectively. The plots suggest threshold values of about 1.3 and 1.4 for Figs 2 and 3, respectively, these being the lowest points on the graphs above which the graphs appear approximately linear. However, if we select these points as the threshold values, we will have 1402 exceedances for Fig 2 and 1039 exceedances for Fig 3, which are too many relative to the size of the data (i.e., T = 10000). The exceedances would then lie towards the body of the data, which would inevitably result in a poor approximation of the GPD parameters and hence lead to inaccuracies in the VaR estimate. In addition, this threshold selection method is very subjective and will differ from one analyst to another based on their preferences. We propose an extension of the mean excess plot for threshold selection: the hybrid method. That is, from the mean excess plot we identify the lowest point above which the graph appears approximately linear, a point $\vartheta_0$. We then draw a tangent line from $\vartheta_0$ through the rest of the points $\vartheta_i$, where $\vartheta_i > \vartheta_0$; see Fig 4. Since the tangent to a linear curve is the curve itself, and the mean excess function is a linear function of the threshold, we take the average of the set of points that touch the tangent line as the threshold value, a point $\vartheta^{*}$. This point $\vartheta^{*}$ leads to a better approximation of the VaR estimates than $\vartheta_0$ because the inference is restricted to the left tail. Apart from giving a better approximation of the VaR estimates, this method significantly reduces the probability of obtaining different VaR estimates from the same data, and also the probability of selecting a very low or very high threshold value. Let $\vartheta_1, \ldots, \vartheta_{\hbar}$ be the set of points that touch the tangent line; then we obtain the value of $\vartheta^{*}$ as
$$ \vartheta^{*} = \frac{1}{\hbar} \sum_{i=1}^{\hbar} \vartheta_i, $$
where $\hbar$ is the number of points in the set. It can be seen in Fig 4 that the points touching the tangent line from $\vartheta_0$ are too compact and might lead us to miss some important points. A better way of selecting these points is to fit a regression line $\hat y = b_0 + b_1 x$, based on the least squares method, to the points $\{\vartheta_i\}_{i=1}^{\hbar}$, where $\hat y$ is the estimate of the dependent variable and $x$ is the independent variable, with intercept $b_0$ and slope $b_1$. In the presence of heteroscedasticity and outliers, it may be advantageous to fit a robust regression line instead. Robust regression methods are not unduly influenced by outliers and are also very useful when there are problems with heteroscedasticity in the data set. This is demonstrated for Bank 1 in Figs 5 and 6, which illustrate a comparison between the simple linear and robust regression methods. It can be seen that the regression lines for the standard regression model are affected by outliers in the left tails of the mean excess plots; hence the robust regression model is more reliable.
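A minimal sketch of the hybrid idea is given below, assuming ϑ0 has already been identified from the mean excess plot and using a residual tolerance to decide which candidate thresholds lie on the fitted robust regression line; the tolerance and the simulated losses are illustrative choices, not part of the method as stated above.

```python
import numpy as np
import statsmodels.api as sm

def mean_excess(losses, thresholds):
    """Empirical mean excess function e(theta) = mean(x - theta | x > theta)."""
    return np.array([np.mean(losses[losses > t] - t) for t in thresholds])

def hybrid_threshold(losses, theta0, tol=0.05):
    """Hybrid threshold: fit a robust regression line to the mean excess plot
    beyond theta0 and average the candidate thresholds lying on that line."""
    thetas = np.sort(losses[(losses > theta0) & (losses < np.quantile(losses, 0.999))])
    e = mean_excess(losses, thetas)
    X = sm.add_constant(thetas)
    fit = sm.RLM(e, X).fit()                       # robust (M-estimator) regression line
    on_line = np.abs(e - fit.fittedvalues) < tol   # points close to the fitted line
    return thetas[on_line].mean()

rng = np.random.default_rng(6)
losses = rng.standard_t(df=4, size=10_000)
print(hybrid_threshold(losses, theta0=1.5))
```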
Following the above analysis, we obtain a threshold value of 2.2448 with 333 exceedances for Fig 5, and 2.6476 with 176 exceedances for Fig 6. The analysis is restricted to the tails, and the data are sufficient to allow reasonable statistical inference with EVT. Tables 8 and 9 present the POT parameter estimates and the forecast VaR estimates. The diversified portfolio VaR is obtained from the individual VaR estimates as
$$ \mathrm{VaR}_p = \sqrt{ \sum_{i=1}^{N} \sum_{j=1}^{N} \rho_{i,j}\, \mathrm{VaR}_i\, \mathrm{VaR}_j }, $$
where $\rho_{i,j}$ is the Pearson cross-correlation coefficient between the returns of the $i$th and $j$th stocks. As noted, the overall risk measures are quite stable for both models and for different thresholds, indicating that the model has effectively captured the dynamics of fluctuations in the left tails of the return distributions. This claim may be validated through back-testing the model. We can also see the effect of diversification of the risk of the individual banks on the portfolio VaR. Employing Eq (46), the one-step-ahead VaR is then calculated as
$$ \mathrm{VaR}^{p}_{q,t+1} = \bar R_{p,t+1} + \hat h^{1/2}_{\Delta_t, t+1}\, \mathrm{VaR}(Z)_q. $$
Note that $\hat h^{1/2}_{\Delta_t, t+1}$ is the one-step-ahead forecast of the overall conditional volatility of the portfolio at time $t+1$ in state $k$, $\Delta_t$ is a Markov chain as defined in Eqs (1) and (2), and $\bar R_p$ is given by Eq (39), that is, $\bar R_{p,t} \mid (\Delta_t = k, \mathcal{O}_{t-1})$, with the parameters sampled from the posterior distribution using the MH algorithm. Figs 7-10 show time plots of the profit and loss (P&L) of the portfolio return series together with the forecast portfolio VaR estimates at the 99% and 95% confidence levels. A visual inspection of the plots suggests that the VaR models perform quite well in capturing the dynamics of the portfolio return series.
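A minimal sketch of these final assembly steps is shown below, assuming the conditional volatility forecast, the EVT quantile VaR(Z)_q, the individual VaR estimates and the correlation matrix are already available; all numbers are illustrative, not the paper's estimates.

```python
import numpy as np

def one_step_var(vol_forecast, z_var_q, mean_forecast=0.0):
    """One-step-ahead VaR: scale the EVT quantile of the standardised noise,
    VaR(Z)_q, by the conditional volatility forecast h_{t+1}^{1/2}."""
    return mean_forecast + np.sqrt(vol_forecast) * z_var_q

def diversified_var(individual_vars, corr):
    """Portfolio VaR from individual VaRs and the Pearson correlation matrix:
    VaR_p = sqrt(VaR' * corr * VaR)."""
    v = np.asarray(individual_vars)
    return float(np.sqrt(v @ corr @ v))

# Illustrative inputs (not estimates from the paper)
print(one_step_var(vol_forecast=2.5e-4, z_var_q=3.1))
print(diversified_var([0.021, 0.025, 0.019],
                      np.array([[1, .6, .5], [.6, 1, .55], [.5, .55, 1]])))
```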

Model checking
The reliability of a VaR model is often tested by back-testing. This involves comparing the estimated VaRs for a given time horizon and observation period with the subsequent returns, and recording the number of days on which the loss on the portfolio exceeds the VaR. The number of days $T_1$ on which the loss on the portfolio exceeds the VaR is recorded as the number of exceptions or failures. Too many exceptions imply that the VaR model underestimates the level of risk, and too few exceptions imply that the model overestimates risk. For the VaR model to be accepted as a reliable risk measure, the number of exceptions produced over any given observation period should satisfy the unconditional coverage (UC) and independence (IND) properties. We define an indicator function on the exceptions at time $t$ as
$$ I_t(1 - q) = \begin{cases} 1, & L_t > \mathrm{VaR}^{p}_{q,t}, \\ 0, & \text{otherwise}, \end{cases} $$
where the indicator function equals 1 when the loss on the portfolio $L_t$ exceeds $\mathrm{VaR}^{p}_{q,t}$, and 0 otherwise; note that $q$ is the chosen confidence level. For the UC property, $\Pr[I_t(1 - q) = 1] \approx 1 - q$ for all $t$; i.e., the number of exceptions should be reasonably close to $T_w(1 - q)$, depending on the choice of $q$, and should follow a binomial distribution. $T_w$ is the size of the window over which back-testing is conducted. For the IND property, the exceptions produced on day $t - 1$ should be independent of the exceptions produced on day $t$ and evenly spread over time.
In this study, we use several back-testing methods to test the accuracy of the proposed VaR models. The most common are Kupiec's proportion of failures (POF) test for UC [65], Christoffersen's test for UC and IND [66], Engle and Manganelli's dynamic quantile (DQ) test [67], and Santos and Alves' new class of independence tests [68]. We also consider the Basel traffic light test proposed by the Basel Committee on Banking Supervision (BCBS) [69].
Kupiec defined an approximate 95% confidence region; the number of exceptions produced by the VaR model must lie within this interval for the model to be considered a reliable risk measure. The test is based on the likelihood ratio
$$ LR_{POF} = -2 \ln\!\left[ \frac{(1 - p)^{T_0}\, p^{T_1}}{\left(1 - \frac{T_1}{T_w}\right)^{T_0} \left(\frac{T_1}{T_w}\right)^{T_1}} \right], $$
where $p = 1 - q$ and $T_0 = T_w - T_1$. Under UC, the null hypothesis for $LR_{POF}$ is $H_0 : E[I_t(1 - q)] = 1 - q$. [66] extended Kupiec's POF test to test for independence and conditional coverage. Under the null hypothesis that the exceptions produced are independent and evenly spread over time, $\pi_{01} = \pi_{11} = \pi$, with likelihood ratio
$$ LR_{IND} = -2 \ln\!\left[ \frac{(1 - \pi)^{T_{00} + T_{10}}\, \pi^{T_{01} + T_{11}}}{(1 - \pi_{01})^{T_{00}}\, \pi_{01}^{T_{01}}\, (1 - \pi_{11})^{T_{10}}\, \pi_{11}^{T_{11}}} \right], $$
where $T_{ij}$, with $i, j = 0$ (no violation), $1$ (violation), is the number of observed events in which the $j$th event follows the $i$th, and $\pi_{01}$, $\pi_{11}$ and $\pi$ are estimates of the probabilities of the $T_{i,j}$ [70].
The hypothesis is $\Pr[I_t(1 - q) = 1 \mid \mathcal{O}_{t-1}] = 1 - q$ for all $t$, against $\Pr[I_t(1 - q) = 1 \mid \mathcal{O}_{t-1}] \neq 1 - q$, where $\mathcal{O}_{t-1}$ is the information available on day $t - 1$. The model is rejected for the conditional coverage property if $LR_{CC} > \chi^{2}_{2} = 5.99$. The DQ test utilises the criterion that the exceptions produced on day $t$ should be independent of the information available on day $t - 1$. The function is defined as
$$ \mathrm{Hit}_t = I_t(1 - q) - (1 - q). $$
The $\mathrm{Hit}_t$ function assumes the value $q$ when the loss on the portfolio at time $t$ exceeds $\mathrm{VaR}^{p}_{q,t}$, and $-(1 - q)$ otherwise. As explained in [67], clearly $E[\mathrm{Hit}_t] = 0$, $E[\mathrm{Hit}_t \mid \mathcal{O}_{t-1}] = 0$, and $\mathrm{Hit}_t$ must be uncorrelated with its own lagged values. The test statistic is given by
$$ DQ = \frac{\mathrm{Hit}'\, X (X'X)^{-1} X'\, \mathrm{Hit}}{q(1 - q)}, $$
where $X_t$ is a vector containing all the values of $\mathrm{Hit}_t$, $\mathrm{VaR}^{p}_{q,t}$ and its lags. Under the null hypothesis $E[\mathrm{Hit}_t] = 0$ and $E[\mathrm{Hit}_t \mid \mathcal{O}_{t-1}] = 0$, $\mathrm{Hit}_t$ and $X_t$ are orthogonal and $\mathrm{Hit}_t$ must be uncorrelated with its own lagged values [67,71]. The DQ test is easy to perform and does not depend on the estimation procedure; all that is needed is a series of VaRs and the corresponding values of the portfolio returns [67]. In this study, we follow [67,72,73] and use a constant and four lagged values of $\mathrm{Hit}_t$.
For Santos and Alves' new class of independence tests [68], we first define the duration between two consecutive exceptions as $D_i = t_i - t_{i-1}$, where $t_i$ denotes the time of exception number $i$, and $t_0 = 0$ implies that $D_1$ is the time until the first exception. We denote a sequence of $N$ durations by $\{D_i\}_{i=1}^{N}$, with order statistics $D_{1:N} \leq \ldots \leq D_{N:N}$. The test statistic, denoted $T_{N,[N/2]}$, is based on these order statistics of the durations; see [68] for more details on this test. Finally, the BCBS developed a set of requirements that a VaR model must satisfy for it to be considered a reliable risk measure and proposed the Basel traffic light test. That is, (i) VaR must be calculated at 99% confidence, (ii) back-testing must be done using a minimum of a one-year observation period and must be tested over at least 250 days, (iii) regulators should be 95% confident that they are not erroneously rejecting a valid VaR model, and (iv) Basel specifies a one-tailed test, as it is only interested in the underestimation of risk [74]. [2] summarises the acceptance region for the Basel traffic light approach to back-testing VaR models.
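A minimal sketch of the exception-counting and Kupiec POF steps described above is given below, on illustrative data; the other tests follow the same pattern of comparing a likelihood-ratio or regression-based statistic against a chi-squared critical value.

```python
import numpy as np
from scipy import stats

def kupiec_pof(exceptions, T_w, q=0.99):
    """Kupiec's proportion-of-failures LR test of unconditional coverage."""
    p = 1.0 - q                         # expected exception rate
    T1 = int(np.sum(exceptions))        # observed number of exceptions
    T0 = T_w - T1
    phat = T1 / T_w
    loglik_null = T0 * np.log(1 - p) + T1 * np.log(p)
    # Guard against log(0) when there are no exceptions
    loglik_alt = T0 * np.log(1 - phat) + (T1 * np.log(phat) if T1 > 0 else 0.0)
    lr_pof = -2.0 * (loglik_null - loglik_alt)
    p_value = 1.0 - stats.chi2.cdf(lr_pof, df=1)
    return lr_pof, p_value

# Exceptions: 1 when the realised loss exceeds the forecast VaR, 0 otherwise
rng = np.random.default_rng(7)
losses = rng.standard_normal(250)
var_forecasts = np.full(250, 2.33)      # illustrative constant 99% VaR
exceptions = (losses > var_forecasts).astype(int)
print(kupiec_pof(exceptions, T_w=250, q=0.99))
```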
We use out-of-sample data of $m = T - n$ observations for back-testing; thus we have $n = 1869$ return observations for the VaR estimation procedure, containing the 2008 global financial crisis period, and $m = 1000$ return observations for back-testing. VaR is then estimated following a rolling-window approach. The out-of-sample data are further divided into blocks of 250, 500, and 1000 trading days to observe how the models behave over both longer and shorter observation periods. The division of the out-of-sample data is also employed to meet the BCBS requirements. Table 10 presents the expected and observed numbers of exceptions produced by each model for a portfolio consisting of all five banks. At the 99% confidence level and 250 trading days, the copula-EVT VaR models registered 3 exceptions for the single-state model and 0 exceptions for the two-state MS-GJR-GARCH(1,1) model. Thus, following the Basel rules for back-testing, the VaR models pass the reliability test and are placed in the green zone. Back-testing results based on the $LR_{UC}$, $LR_{IND}$, $LR_{CC}$, DQ and $T_{N,[N/2]}$ tests are presented in Tables 11 and 12. For the DQ test, we use four lagged values.
In Tables 13 and 14 we present, as benchmarks for our VaR models, back-testing results for VaR models constructed using GJR-GARCH(1,1) and standard GARCH(1,1) (sGARCH(1,1)) volatility models with skewed Student's-t errors but without copula functions and EVT. It can be seen from the number of exceptions recorded that the MS-GJR-GARCH(1,1) copula-EVT VaR model and the benchmark VaR models do not underestimate risk but are rather too conservative at the 99% and 95% confidence levels, and are thus preferred by most financial institutions. Most financial institutions prefer VaR models with zero or very few exceptions, as they routinely produce plots of P&L that show no violations of their 99% confidence VaR models over long periods, stating that this supports their risk models. "The amount of economic capital banks currently hold is in excess of their regulatory capital. As a result, banks may prefer to report higher VaR numbers to avoid the possibility of regulatory intrusion" [2]. Kupiec's UC test ($LR_{UC}$) will reject VaR models that produce 0 exceptions. The GJR-GARCH(1,1) copula-EVT VaR model captures VaR quite well in periods of calm and in periods of crisis, for both short and long observation periods. It does not overestimate or underestimate the level of risk on the portfolio and is thus considered reliable as a measure of risk. Performance evaluations for rejection or acceptance of the VaR models at the 5% significance level are presented in Table 15.

Conclusion
In recent decades, VaR has become the most common risk measure used by financial institutions to assess the market risk of financial assets. Since VaR models often focus on the behaviour of asset returns in the left tail, it is important that the models are calibrated such that they do not underestimate or overestimate the proportion of outliers, as this has significant effects on the allocation of economic capital for investments. Due to the extremistan [75] nature of financial asset returns and volatility, the real tail risk of a financial asset is not stable over time, and the maximum loss is difficult to predict. Therefore, to implement a reliable VaR model, the time horizon and the type of volatility model used are very important. We constructed our VaR models by combining single-state and two-state Bayesian MS-GJR-GARCH(1,1) models with skewed Student's-t distributions for the underlying volatility model, copula functions to model dependence, and EVT to model the left tail. The single-state Bayesian MS-GJR-GARCH(1,1) is a GJR-GARCH(1,1) model without regime change, hence the names Bayesian GJR-GARCH(1,1) copula-EVT VaR model for the single-state MS-GJR-GARCH(1,1) and Bayesian MS-GJR-GARCH(1,1) copula-EVT VaR model for the two-state MS-GJR-GARCH(1,1). We use as benchmarks VaR models constructed using GJR-GARCH(1,1) and sGARCH(1,1) volatility models with skewed Student's-t distributions, but without copula functions and EVT, to compare the performance of our VaR models. Back-testing results show that the GJR-GARCH(1,1) copula-EVT VaR model is more reliable than the MS-GJR-GARCH(1,1) copula-EVT VaR model and the benchmark VaR models. Back-testing results further indicate that the GJR-GARCH(1,1) copula-EVT VaR model does not overestimate or underestimate the level of risk on the portfolio, whereas the two-state MS-GJR-GARCH(1,1) copula-EVT VaR model and the benchmark VaR models seem to overestimate the level of risk.
It is also important to draw attention to the fact that Eq (34) is a point estimate with an error band that becomes larger as we move to more extreme quantiles. It is concerned only with the number of exceedances above a certain threshold and is not affected by data outside the tail of the distribution [8]. This can be problematic in some cases due to limited data points in the tail, which can inhibit proper analysis.
Eq (34) also depends on the threshold and on the number of points (i.e., exceedances) above the threshold, because the parameters are estimated from the exceedances. Thus, it is logical to say that the reliability of Eq (34) rests largely on the choice of threshold, which is subjective. The proposed hybrid method for threshold selection addresses this issue and diminishes the possibility of selecting a less suitable threshold value. This method is reliable and can be implemented with other conditional multivariate volatility models that provide positive-definite volatility matrices.