Compound distributions for financial returns

In this paper, we propose six Student’s t based compound distributions where the scale parameter is randomized using functional forms of the half normal, Fréchet, Lomax, Burr III, inverse gamma and generalized gamma distributions. For each of the proposed distribution, we give expressions for the probability density function, cumulative distribution function, moments and characteristic function. GARCH models with innovations taken to follow the compound distributions are fitted to the data using the method of maximum likelihood. For the sample data considered, we see that all but two of the proposed distributions perform better than two popular distributions. Finally, we perform a simulation study to examine the accuracy of the best performing model.


Introduction
The Student's t distribution due to Gosset [1] is the most common and parsimonious model for economic and financial data [2,3]. It not only offers the potential to fit the leptokurtic properties of financial data but also, can serve as a foundation for building complex statistical models that can describe more subtle features of financial data such as volatility clustering. In recent times, many notable modifications to its functional form have been proposed, for example, see Hansen [4], Fernández and Steel [5], Theodossiou [6], Jones and Faddy [7], Sahu et al. [8], Bauwens and Laurent [9], Aas and Haff [10], Zhu and Galbraith [11,12] and Papastathopoulos and Tawn [13]. They have been applied beyond Bayesian finite and infinite variance models [14], Markov regime switching models [15] as well as multivariate stochastic volatility models [16]. A detailed review of various modifications of the Student's t distribution is provided by Li and Nadarajah [17] but the list is still by no means complete.
One of the Student's t popular generalizations, often recommended for risk quantification in finance as noted by McNeil et al. [18] is the generalized hyperbolic distribution (GHYP) due to Barndorff-Nielsen [19]. The GHYP distribution offers a flexible functional form and possesses a number of attractive properties. For instance, the GHYP distribution can be both symmetric and skewed and is classified as a normal mean-variance mixture distribution and has the Student's t as one of its special cases. Normal mean-variance distributions are also widespread and not uncommon. For example, mixing of this type can be traced back to Press [20] and Praetz [21], followed by Andrews and Mallows [22], Barndorff-Nielsen [19], Barndorff-Nielsen et al. [23], Kon [24], West [25], Madan and Seneta [26], Madan et al. [27], Tjetjep and Seneta [28], Luciano and Semeraro [29], Geweke and Amisano [30], Nadarajah [31], among others. Typically, these class of models (compound distributions) capture heterogeneous characteristics of financial data by randomizing one of the parameters (often the scale parameter) of the parent distribution with appropriate mixing distributions, for example, see McDonald and Butler [32], Hoogerheide et al. [33] and Ardia et al. [34,35].
Recently, Afuecheta et al. [36], unlike the previous compositions which are based on the normal distribution, introduced mixture models based on scale mixing of the Student's t distribution by specifically focusing on the leptokurtic properties of financial data. In particular, they provided flexible compositions of the Student's t with three mixing distributions: exponential, Weibull and gamma. Their models were shown to provide better fits than some of the popular and more complicated generalizations of the Student's t distribution, including the GHYP distribution. Hence, given good empirical performance of these models and because of the increasing interest in terms of methodology and applications, we extend this work by considering six mixing distributions. We proceed with the assumption that the conditional distribution for financial returns follows the Student's t distribution. The variance (volatility) of returns is assumed to follow any of the six mixing distributions: one parameter half normal, two parameter Fréchet, two parameter Lomax, two parameter Burr III, two parameter inverse gamma and three parameter generalized gamma distributions. With these distributions, our research offers six new compound distributions.
The primary objectives of this paper are: (i) to propose six new compound distributions based on the Student's t distribution; (ii) to illustrate applications of these distributions using real financial data sets; (iii) to compare the proposed distributions with two of the most popular parametric distributions used in finance-the GHYP distribution and asymmetric Student's t (AST) distribution due to Zhu and Galbraith [11,12]. For each of the proposed compound distribution, we provide its probability density function (PDF), cumulative distribution function (CDF), moments and characteristics functions. We perform our estimations using the method of maximum likelihood (ML). For the samples considered, empirical comparisons are made using a common set of log-likelihood based criteria. We show that all but one of the proposed distributions perform better than the GHYP distribution under the selection criteria. We also show that all but two of the proposed distributions perform better than the AST distribution under the selection criteria.
The rest of this paper is organized as follows. In Section 2 and corresponding subsections, we present the general form of the proposed distributions; Section 3 describes the data, conducts some exploratory analysis linked to the proposed distributions and outlines evaluation criteria; the results and their discussion are given in Section 4. In Section 5, we conduct a simulation study to assess the performance of the ML estimators with respect to sample size n and to demonstrate the ability of the best performing model. The simulation study also helps to evaluate the uncertainty surrounding the parameters of the best performing model, which ensures that the results obtained are reproducible if the same model is applied to the same data sets, but at a different time interval; finally, Section 6 concludes and summarises our work.
Two of the data sets used are data on cryptocurrencies. There are many papers on risk estimation for cryptocurrency data. Most notable papers include Acereda et al. [37], Trucios et al. [38] and Jimenez et al. [39].

Compound distributions
In this section, we begin by writing down the general form of the proposed distributions. Let X denote a continuous random variable representing the observed financial data series; in our case, log-returns of two financial stock indices, two fuel commodities and two cryptocurrencies exchange rates. Assuming that the conditional asset return distribution is Student't with the PDF given by Having obtained the general expressions for the PDF given by (4), the CDF given by (6), the kth moment given by (7), and the characteristic function given by (9), we shall proceed to obtain expressions for any given mixing distribution, g(�). The choice of the mixing distributions (two parameter inverse gamma distribution, two parameter Lomax distribution, the generalize gamma distribution, two parameter Burr distribution, two parameter Fréchet and one parameter half normal) is motivated by Fig 1, showing the histograms of the volatility for financial series considered in Section 3. The volatility is measured by the standard deviation taken over non-overlapping windows of length 50 days. From Fig 1, we see that g(�) corresponds to an exponential-type family of distributions with unimodal PDF, suggesting overall appropriateness of the choices. Notably, the following procedure was used for the choice of g(�): (i) fit the considered g(�) forms to the standard deviation series obtained using MLE; (ii) select the best performing g(�) based on the lowest negative log-likelihood and provide the best fitting parametric outcome for each histogram shown in Fig 1. With this, we observe that the volatility for stock indices and cryptocurrencies is best described by the generalized gamma PDF.
The calculations in the following sections make use of two special functions: the generalized hypergeometric function defined by the Wright [40] generalized hypergeometric function defined by The properties of these special functions can be found in Prudnikov et al. [41], Gradshteyn and Ryzhik [42], Mathai and Saxena [43] and Srivastava et al. [44].

Two parameter inverse gamma: With g taking the form
GðaÞ for τ > 0, α > 0 and β > 0. Note that β and α are the scale and shape parameters, respectively. This PDF has a unique mode (which is found at τ = β/(α+ 1)) and skewed moderately to the right. It can be used to describe a wide range of physical phenomenon in diverse disciplines, including climatology, reliability, option pricing, economics, finance and survival analysis. See Bouchaud and Potters [45] for some application of the inverse gamma distribution to stock returns. For the two parameter inverse gamma distribution, Hence, from (4), (6), (7), and (9) we obtain the closed form expressions for the PDF, CDF, moments and characteristic function as and respectively.

Two parameter Lomax: With g taking the form
gðtÞ ¼ ab a ðb þ tÞ aþ1 for τ > 0, β > 0 and α > 0. The scale and shape parameters are respectively governed by β and α. This PDF has a unique mode (with the mode at zero). It is notable for characterizing business failure. As a distribution within the Pareto family it has often used in modelling tail losses of returns. In fact, this distribution is also known as type II Pareto distribution and is a special case of the generalized Pareto. It has also been used extensively in analyzing lifetime data. See Benckert and Jung [46], Revankar et al. [47], Arnold [48], Hogg and Klugman [49] and Nair and Hitha [50] for some applications of the Lomax distribution. For the two parameter Lomax distribution, Hence, from (4), (6), (7), and (9) we obtain the closed form expressions for the PDF, CDF, moments and characteristic function as and respectively.

Generalized gamma: With g taking the form
The scale, first shape and second shape parameters are respectively given by β, λ and α. This PDF has a unique mode and skewed to the right. The generalized gamma distribution has extensive applications in different areas, including hydrology, water resources, biology, and economics. It encompasses a number of other distributions often used in survival analysis. For example, if λ = α = 1 then the generalized gamma distribution becomes the exponential distribution; if λ = 1 the generalized reduces to the gamma distribution; and if α = 1 the generalized becomes the Weibull distribution. For applications of this family of distribution to stock returns, see Madan and Seneta [26] and Tjetjep and Seneta [28]. For the generalized gamma distribution, GðaÞ : Hence, from (4), (6), (7), and (9) we obtain the closed form expressions for the PDF, CDF, moments and characteristic function as and respectively.

Two parameter Burr III distribution: With g taking the form
for τ > 0, c > 0 and λ > 0. The two parameters are commonly referred to as the shape (c, λ) parameters. This distribution has a unique mode and moderately skewed to the right. The Burr distribution is one of the popular distribution in statistics. It is often used in reliability analysis as more flexible alternative to other competing distributions such as the lognormal, etc. It has a wide range of applications in other areas such as forestry, meteorology, etc. For the two parameter Burr III distribution, Hence, from (4), (6), (7), and (9) we obtain the closed form expressions for the PDF, CDF, moments and characteristic function as and respectively.

Two parameter Fréchet: With g taking the form
for τ > 0, α > 0 and β > 0. This distribution has a unique mode and skewed to the right. The shape and scale parameters are, respectively, governed by α and β. The distribution is also known as inverse Weibull distribution because if 1/O has the Weibull distribution then O will have the Fréchet distribution. It is a special case of the generalized extreme value distribution which is widely used in characterization of "tail risks" in fields ranging from insurance to finance. Some other application areas of the Fréchet distribution include business and operations research, economics, hydrology, materials and product technology. For the two parameter Fréchet distribution, Hence, from (4), (6), (7), and (9) we obtain the closed form expressions for the PDF, CDF, moments and characteristic function as and respectively.

One parameter half normal: With g taking the form
for τ > 0 and θ > 0. The half normal distribution is a normal distribution with scale parameter θ bounded from below at zero. Its applications cut across many areas. For instance, see Meeusen and van Den Broeck [51] and Chou and Liu [52] for applications of the half normal distribution in production processes; Lawless [53] and Cooray and Ananda [54] for applications in life data analysis; Dobzhansky and Wright [55] for applications in genetics; and Bland and Altman [56] for applications in biological sciences. For the one parameter half normal distribution, Hence, from (4), (6), (7), and (9) we obtain the closed form expressions for the PDF, CDF, moments and characteristic function as respectively.

Skewness and kurtosis
By definition, each of the six compound distributions has zero skewness. The kurtosis values can be computed using (11), (13), (15), (17), (19) and (21). These values versus the degree of freedom parameter, ν, are shown in Fig 2. We see that kurtosis is a decreasing function of ν for each compound distribution. The kurtosis for each distribution takes larger values compared to the Student's t distribution; hence, they are more flexible with respect to heavy tailed data. Over the plotted range, the compound Lomax distribution takes the largest kurtosis values. We note further that the kurtosis is a decreasing function of: α for the compound inverse gamma distribution; α for the compound Lomax distribution; λ for the compound generalized gamma distribution; c for the compound Burr distribution; α for the compound Fréchet distribution. For details about how skewness and kurtosis can be used to improve model fitting and forecasting performance, see Feunou et al. [57] and Lalancette and Simonato [58].

Data
To investigate the empirical performance of the proposed distributions, we consider six popular financial series. These include: two financial stock indices, two fuel commodities prices and two cryptocurrencies exchange rates.  [59] and the references therein. In general, there are no specific rationale for composing our data set, though alongside some very common stock indices (S&P500 and DJI) we aim to have some financial series with notable tail (excess kurtosis) characteristics (Propane and LTC), since our work is partially motivated by the heavy tail potential of the parent distribution of the compound distributions. For the above discussed financial series, we computed log-returns as where R i,t is the return on the index i for the period t, P i,t is the closing rate/price of the index at the end of period t and P i,t−1 is the price of the index at the end of the period t − 1. The histogram of the transformed data and their kernel density evaluations are shown in Fig 3. Their characteristics described in Table 1 are: minimum, first quartile (Q1), median, mean, third quartile (Q3), maximum, skewness, kurtosis, standard deviation (SD), variance, range and inter quartile range (IQR). From Table 1, we observe the highest range is for the cryptocurrencies returns, followed by the commodities and the smallest for the stock indices. Fig 3 shows the time series plots of returns which appear to oscillate around zero. The oscillations vary a great deal in magnitude, but are almost constant in average over period of the study. Also, from the plot we observe that for each return, periods of high volatility are followed by the periods of low volatility and vice versa. This is not surprising as it is a typical nature of financial indices [60][61][62]. Notably, from  where Θ = (θ 1 . . .θ k ) 0 is the parameter vector. Consequently, the optimal estimates for Θ arê . All our computations were performed using the standard Nelder-Mead optimization routine with optim command in R as provided by R Core Team [63].
Since the considered distributions are not nested, discrimination among them is performed using the Akaike information criterion (AIC) due to Akaike [64], the Bayesian information criterion (BIC) due to Schwarz [65], the corrected Akaike information criterion (AICc) due to Hurvich and Tsai [66], the Hannan-Quinn criterion (HQC) due to Hannan and Quinn [67], and the consistent Akaike information criterion (CAIC) due to Bozdogan [68]. Extensive discussion on these commonly used criteria is provided by Burnham and Anderson [69] and Fang [70]. Roughly speaking, the smaller the values of these criteria the better the fit.

Estimation results and discussion
The GARCH (1, 1) model with the six innovation distributions proposed in Section 2 was fitted to the data described in Section 3. The six innovation distributions do not allow for asymmetry. Also fitted is the GARCH (1, 1) model with the AST and GHYP distributions chosen as the innovation distributions. These two distributions allow for asymmetry of the volatility, which has been noted in the literature for cryptocurrency and energy data sets [37,71,72]. We have chosen GARCH (1, 1) as a baseline model, because it is the most simple and accessible model available in the R packages fGarch and rugarch for fitting GARCH type models. We fitted also GARCH models of higher orders, but they did not provide significantly better fits. The method of ML was used for fitting all of the models. For fitting the GARCH (1, 1) model with GHYP innovations, we used the rugarch package. For fitting the GARCH (1, 1) model with AST innovations, we used the VaRES package. The log-likelihood values and the values of two of the five selection criteria (AIC and BIC) for all the proposed distributions are provided in Table 2. The values of the three remaining selection criteria can be obtained from the authors. They led to the same conclusions. Table 2 also gives the differences in empirical and fitted estimates of kurtosis.
According to the selection criteria and the kurtosis values in Table 2, the GARCH (1, 1) with compound generalized gamma innovations gives the best fit, the compound Burr innovations give the second best fit, the compound Fréchet innovations give the third best fit, the compound inverse gamma innovations give the fourth best fit, the AST innovations give the fifth best fit, the compound Lomax innovations give the sixth best fit and the GHYP innovations give the seventh best fit. The worst fit is given by the GARCH (1, 1) model with compound half normal innovations. These conclusions are the same for all the returns.
The probability plots of the standardized residuals for the best fitting GARCH (1, 1) model with compound generalized gamma innovations are shown in Fig 4. The corresponding quantile plots are shown in Fig 5. Both figures suggest that the fit of the model is adequate.
The p-values of Vuong [73]'s likelihood ratio test to see if the best fitting model is significantly better than the other seven models are given in Table 3. The p-values of Amisano and Giacomini [74]'s likelihood ratio test to see if the best fitting model is significantly better than the other seven models in the left and right tails are given in Table 4. The p-values in all these tables show that the GARCH (1, 1) model with compound generalized gamma innovations provides significantly better fits than all other models. Vuong [73]'s test was performed using the command vuongtest in the nonnest2 package. Amisano and Table 2

Returns
Innovation Giacomini [74]'s test was performed using the code available in https://sites.google.com/site/ gianniamisanowebsite/. Table 5 tests the significant difference between mean squared errors when the GARCH (1, 1) models were fitted to rolling windows of length 100 days and used to predict the 101th data value [75]. The GARCH (1, 1) with compound generalized gamma innovations is used as the baseline model. The R package fDMA was used to perform the tests. The p-values show that the GARCH (1, 1) model with compound generalized gamma innovations provides significantly better mean squared errors than all other models. These conclusions were the same when the widow length was taken to be 200, 300, . . ., 1000 days.
Finally, Table 6 gives the p-values of three backtesting methods at 99 percent value-at-risk. In each triplet, the first is the p-value of Kupiec's proportion of failures [76] test, the second is the p-value of Escanciano and Olmo [77]'s test, and the third is the p-value of peak over threshold's method. For the last method, we used the evd package. The threshold was chosen by the mean residual plot which was drawn using the command mrlplot. As expected, the peak over threshold's method gives the largest p-values. For the first two methods, the GARCH (1, 1) with compound generalized gamma innovations gives the largest p-values, the compound Burr innovations give the second largest p-values, the compound Fréchet innovations give the third largest p-values, the compound inverse gamma innovations give the fourth largest

Simulation study
In this section, we conduct a simulation study to assess the performance and accuracy of the ML estimators of the best fitting GARCH(1, 1) model with compound generalized gamma innovations. The following scheme was used: 2. estimate (ν, β, λ, α) and the three GARCH parameters; 3. repeat steps 1 and 2 ten thousand times; 4. hence, estimate the biases and the mean squared errors for the seven parameters; 5. repeat steps 1 to 4 for n = 20, 21, . . ., 500.
The plots of the biases versus n are shown in Fig 6. The plots of the mean squared errors versus n are shown in Fig 7. We can observe the following from the figures: the biases can be positive or negative but approach zero as n approaches 500; the biases appear largest forb and smallest forn; the biases appear reasonably small at around n = 500; the mean squared errors gradually decrease with increasing n; the mean squared errors appear largest forb and smallest forn; the mean squared errors appear reasonably small at around n = 500. In the simulation scheme, we have taken the initial parameter values as the estimated values for S&P500 returns. The results were similar for a wide range of other initial values including the estimated values for the other five returns.

Conclusions
In this paper, based on the scale mixing of the Student's t distribution, we have developed six new compound distributions. We have also derived their basic properties such as the PDF, CDF, moments and characteristic functions. With these distributions taken as innovations for the GARCH(1, 1) model, we have shown that all but one (respectively, two) of the six distributions perform better than the GARCH(1, 1) model with generalized hyperbolic (respectively, asymmetric Student's t) innovations. The comparison was made in terms Akaike information criterion values, Bayesian information criterion values, consistent Akaike information criterion values, corrected Akaike information criterion values, Hannan-Quinn criterion values, pvalues of Vuong [73]'s likelihood ratio test, p-values of Amisano and Giacomini [74]'s likelihood ratio test for the left tails, p-values of Amisano and Giacomini [74]'s likelihood ratio test for the right tails, mean squared errors of one-day ahead forecasts, and three backtesting methods.
In addition, we have performed a simulation study to examine the accuracy of the best fitting GARCH(1, 1) model with compound generalized gamma innovations. The accuracy was assessed in terms of biases and mean squared errors. Both decreased in magnitude when the sample size increased. Both appeared reasonably small when the sample size was as large as 500. The sample sizes of all six data sets considered are well above 500. The results showed that the GARCH (1, 1) model with compound generalized gamma innovations is valid and worth considering in the general financial context of risk exposure modelling.
Nearly all of the data sets we have considered have skewness close to zero. Hence, there is no need for the compound distributions in Section 2 to incorporate a skewness parameter. However, there are several ways that these distributions can be extended to incorporate skewness. A prominent approach is described in Theodossiou and Savva [78] and Savva and Theodossiou [79]. Another prominent approach is described in Fernández and Steel [5].
An extension of the paper is an analysis of the finiteness of the return distribution unconditional moments through the tail-index according to the "power law" literature; see Gabaix et al. [80], Ibragimov et al. [81] and references therein. This analysis could lead to a better understanding of the empirical results on the existence of the unconditional higher-order moments under the proposed distributions.
Further extensions to the GARCH time series frameworks could be also considered. However, framework of the Generalized Autoregressive Score (GAS) models of Creal et al. [82] and Harvey [83] is more intriguing. The GAS framework allows relatively straightforward introduction of the time-varying dynamics for any desired parameters and can enhance empirical performance of the suggested models further.