Abstract
SVR-ARMA-GARCH models provide flexible model fitting and good predictive power for nonlinear heteroscedastic time series. In this study, we explore the change point detection problem in the SVR-ARMA-GARCH model using the residual-based CUSUM test. For this task, we propose an alternating recursive estimation (ARE) method to improve the estimation accuracy of residuals. Moreover, we suggest a new testing method with a time-varying control limit that significantly improves the detection power of the CUSUM test. Our numerical analysis exhibits the merits of the proposed methods in SVR-ARMA-GARCH models. A real data example using BDI data is also provided for illustration, which confirms the validity of our methods.
Citation: Wang H, Guo M, Lee S, Chua C-H (2022) Forecasting and change point test for nonlinear heteroscedastic time series based on support vector regression. PLoS ONE 17(12): e0278816. https://doi.org/10.1371/journal.pone.0278816
Editor: Petre Caraiani, Institute for Economic Forecasting, Romanian Academy, ROMANIA
Received: August 2, 2022; Accepted: November 25, 2022; Published: December 30, 2022
Copyright: © 2022 Wang et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: All relevant data are within the paper and its Supporting information files.
Funding: This work was supported in part by Ministry of Science and Technology, Taiwan under grant MOST 110-2118-M-110-002-MY2 (Wang, Guo and Chua) and National Research Foundation of Korea under grant NRF-2021R1A2C1004009 (Lee).
Competing interests: No authors have competing interests.
Introduction
Non-linearity and conditional heteroscedasticity are two major characteristics of financial time series data. The SVR-ARMA-GARCH models proposed by [1] provide flexible model fitting and good predictive ability for data with these two characteristics. However, major events such as the financial tsunami and the COVID-19 pandemic often bring shocks and changes to the financial market, making the original models no longer suitable for future decisions. Therefore, it is important to detect change points in a timely manner so that the model can be updated. The objective of this paper is twofold: first, to improve forecasting capability and thereby obtain more accurate parameter and residual estimates for the SVR-ARMA-GARCH time series; second, to enhance the detection power of the residual-based CUSUM test of [2] when changes occur near the current observing time.
For the first, we propose an alternating recursive estimation (ARE) method that jointly estimates the parameters of the conditional mean and conditional variance equations of SVR-ARMA-GARCH models. Our numerical studies show that the proposed ARE method outperforms the method of [1] in terms of forecasting performance and thereby improves the accuracy of parameter/residual estimation in SVR-ARMA-GARCH models. This improvement in residual estimation also greatly enhances the ability to detect a change point.
For the second, we aim to improve the residual-based CUSUM test proposed by [2]. Their test uses only time series observations and model-based residuals, so it has the merit of being simpler and more flexible than other types of CUSUM tests. We refer to [3] for the theoretical background and history of CUSUM tests for time series, particularly those based on residuals. Although the test of [2] performs adequately in many situations, its statistic Tmax tends to yield low power when a change occurs near the current observing time, which results in late detection. To resolve this problem, we propose an alternative testing procedure that supplies a time-varying upper control limit (UCL) for Ts, wherein the underlying process asymptotically behaves like a Brownian bridge standardized by its time point. We additionally consider the test statistic of [4], whose limiting distribution is that of the supremum of a Brownian bridge standardized by its maximizing point.
Our numerical analysis reveals that the three tests each yield high power at different stages of the time period: the test of [4], Tmax, and Ts with UCL perform best for detecting a change in the early, middle, and late stages, respectively. Since the CUSUM test under consideration is retrospective, the late stage corresponds to times close to the current observing time. The last testing procedure is therefore conducive to quickly detecting a change point near the current time, and is better suited to dynamic monitoring and timely model updating, particularly when a moving window scheme is adopted.
The organization of this paper is as follows. Section 2 introduces the proposed alternating recursive estimation (ARE) method for the SVR-ARMA-GARCH model. Section 3 introduces the aforementioned three residual-based CUSUM tests. Simulation and empirical studies are conducted in Sections 4 and 5 for illustration. Section 6 provides the conclusions.
Materials and methods
ARE of the SVR-ARMA-GARCH models
Time series forecasting is crucial for capturing the characteristics of a time series and detecting anomalies in statistical process monitoring. Conventionally, linear ARMA models have been used for this purpose, but as time series often have significant nonlinear features, forecasts based on ARMA models can be inaccurate and hard to use in monitoring processes. As an alternative, researchers can consider using support vector regression (SVR). SVR originates from the statistical learning theory of [5, 6], and uses nonlinear functions to map input variables into a higher dimensional space, which helps explore information that can only be observed in that space. Under a variety of circumstances, it provides flexibility and excellent forecasting accuracy, and satisfies the structural risk minimization principle; see [7–9]. Thus far, SVR has been applied to various practical problems; for example, [10] used it for stock prediction and [11] used it for anomaly detection.
Chen et al. [1] adopted the SVR method to estimate the parameters in SVR-based nonlinear GARCH models and showed that the SVR-GARCH models significantly outperform classical parametric models in terms of one-period-ahead volatility forecasting. In this study, we aim to improve and refine their method as described below.
Let rt denote an asset return at time t. The rt is said to follow an ARMA(p, q)-GARCH(m, s) model if
(1)
(2)
where
is a martingale difference sequence and
It is understood that αi = 0 for i > m and βj = 0 for j > s.
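To make the model class concrete, the following minimal sketch simulates an ARMA(1,1)-GARCH(1,1) path. Since Eqs (1) and (2) are not reproduced in this text, the sketch assumes the common textbook form stated in the docstring, including the minus sign on the moving-average term. The function name and all parameter values are hypothetical choices for illustration only.

```python
import math
import random

def simulate_arma_garch(n, phi0=0.0, phi1=0.5, theta1=0.3,
                        alpha0=0.1, alpha1=0.1, beta1=0.8,
                        burn_in=200, seed=0):
    """Simulate an ARMA(1,1)-GARCH(1,1) path (textbook form, assumed here).

    r_t       = phi0 + phi1 * r_{t-1} + a_t - theta1 * a_{t-1}
    a_t       = sigma_t * eps_t,  eps_t ~ i.i.d. N(0, 1)
    sigma_t^2 = alpha0 + alpha1 * a_{t-1}^2 + beta1 * sigma_{t-1}^2
    """
    rng = random.Random(seed)
    sigma2 = alpha0 / (1.0 - alpha1 - beta1)  # start at the unconditional variance
    r_prev, a_prev = 0.0, 0.0
    returns, variances = [], []
    for t in range(n + burn_in):
        sigma2 = alpha0 + alpha1 * a_prev ** 2 + beta1 * sigma2
        a = math.sqrt(sigma2) * rng.gauss(0.0, 1.0)
        r = phi0 + phi1 * r_prev + a - theta1 * a_prev
        if t >= burn_in:  # discard the burn-in period
            returns.append(r)
            variances.append(sigma2)
        r_prev, a_prev = r, a
    return returns, variances

returns, variances = simulate_arma_garch(500)
```

The conditional variance never falls below alpha0, which is a quick sanity check on the recursion.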
The SVR-ARMA-GARCH model replaces the linear functions in Eqs (1) and (2) with the nonlinear functions h(⋅) and g(⋅) and is expressed as:
(3)
(4)
where
with rt,p = (rt−1, rt−2, …, rt−p) and at,q = (at−1, at−2, …, at−q).
The functions ϕh(⋅) and ϕg(⋅) are defined using a Gaussian kernel, i.e.
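As a small illustration of the kernel involved, the sketch below evaluates a Gaussian kernel in the parametrization exp(-γ‖x − y‖²). This parametrization is an assumption chosen to match the free parameters γh and γg mentioned later in the text; the function names are hypothetical.

```python
import math

def gaussian_kernel(x, y, gamma):
    """Gaussian (RBF) kernel exp(-gamma * ||x - y||^2) for two equal-length vectors."""
    sq_dist = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    return math.exp(-gamma * sq_dist)

def gram_matrix(points, gamma):
    """Kernel matrix for a list of input vectors, e.g. lagged returns and residuals."""
    return [[gaussian_kernel(p, q, gamma) for q in points] for p in points]

# Three hypothetical input vectors of lagged values.
inputs = [(0.0, 0.0), (1.0, 1.0), (0.5, -0.5)]
K = gram_matrix(inputs, gamma=0.5)
```

The resulting matrix is symmetric with ones on the diagonal, as expected for this kernel.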
Chen et al. [1] proposed to estimate wh and wg separately by optimizing the two objective functions in Eqs (5) and (6) below. Namely, Eq (5) is optimized to obtain wh and the estimated residuals, which are then used to optimize Eq (6) for obtaining wg:
(5)
and
(6)
However, this method does not consider the impact of volatility on the returns when estimating wh. As such, we replace the residuals in Eq (5) with the standardized residuals in Eq (7) to estimate wh as follows:
(7)
To optimize Eqs (6) and (7), we adopt an alternating recursive estimation (ARE) method, illustrated in Fig 1. At each iteration, we fix wg to update wh, then fix wh to update wg, and continue alternating until the two residual sequences have sufficiently converged. This method yields more accurate predicted values and residuals, which are used for detecting a change point as described in Section 4 below, where the superiority of the ARE method is demonstrated empirically through Monte Carlo simulations.
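The alternating structure of the procedure can be sketched as follows. This is not the SVR-based estimator of the paper: to keep the example self-contained, the two SVR fits are replaced by simple least-squares stand-ins (a weighted AR(1) fit for the conditional mean and a regression of squared residuals on their lag for the conditional variance). The stopping rule, which compares residuals from successive iterations, is also an assumption, and all names are hypothetical.

```python
import math
import random

def weighted_line_fit(x, y, w):
    """Weighted least squares for y ~ b0 + b1 * x; returns (b0, b1)."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    b1 = sxy / sxx
    return my - b1 * mx, b1

def alternating_fit(r, max_iter=50, tol=1e-8):
    """Alternate between updating the mean fit and the variance fit."""
    x, y = r[:-1], r[1:]
    sigma2 = [1.0] * len(y)  # start from a constant conditional variance
    prev_resid, n_iter = None, 0
    for _ in range(max_iter):
        n_iter += 1
        # Step 1: fix the variance, update the mean using weights 1 / sigma_t^2
        # (least squares on standardized residuals).
        b0, b1 = weighted_line_fit(x, y, [1.0 / s for s in sigma2])
        resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
        # Step 2: fix the mean, update the variance from squared residuals.
        sq = [a * a for a in resid]
        mean_sq = sum(sq) / len(sq)
        c0, c1 = weighted_line_fit(sq[:-1], sq[1:], [1.0] * (len(sq) - 1))
        floor = 0.1 * mean_sq  # keep fitted variances positive and weights bounded
        sigma2 = [mean_sq] + [max(c0 + c1 * s, floor) for s in sq[:-1]]
        # Stop once the residual sequence no longer changes between iterations.
        if prev_resid is not None:
            if max(abs(a - b) for a, b in zip(resid, prev_resid)) < tol:
                break
        prev_resid = resid
    return (b0, b1), (c0, c1), resid, sigma2, n_iter

# Toy data: AR(1) mean with ARCH(1) errors (hypothetical parameter values).
rng = random.Random(1)
r, a_prev = [0.0], 0.0
for _ in range(2000):
    cond_var = 0.5 + 0.3 * a_prev ** 2
    a_prev = math.sqrt(cond_var) * rng.gauss(0.0, 1.0)
    r.append(0.5 * r[-1] + a_prev)

mean_coef, var_coef, resid, sigma2_hat, n_iter = alternating_fit(r)
```

In an SVR-based version, each of the two steps would call an SVR fit instead of `weighted_line_fit`; the loop and the stopping rule would keep the same shape.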
Residual-based CUSUM test
This section introduces three different test statistics to detect a change point.
Tmax of Lee et al. [2].
Let us consider the location-scale time series model of the form:
(8)
where ηt are i.i.d. random variables with mean 0 and a finite fourth moment. Here ht(⋅) and gt(⋅) denote the conditional mean and variance of yt at time t. A structural change occurs if ht(⋅) or gt(⋅) changes. Given observations y1, …, yn, we set up the null and alternative hypotheses:
To test these hypotheses, [2] used the test:
with
where
and
is the estimated residual at time t. Since the statistic Ts involves only observations and residuals, the residual-based CUSUM test can be constructed for detecting a change point whenever the residuals of the model are estimable. For parametric location-scale models, [3, 12] proved that under H0 and regularity conditions, Tmax converges weakly to the supremum of the absolute value of a standard Brownian bridge. Lee et al. [2] applied this paradigm to Model (8) with a hybrid of SVR methods and demonstrated its validity through empirical studies.
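The mechanics of a CUSUM statistic can be illustrated with a generic sketch. The exact summands used in the test of [2] are not reproduced in this text, so the code below works on an arbitrary scalar sequence z (which in practice would be built from observations and estimated residuals) and uses the plain sample standard deviation as the scale estimate. Both choices are assumptions, and the function name is hypothetical.

```python
import math

def cusum_path(z):
    """Return T_k = |sum_{t<=k} z_t - (k/n) * sum_t z_t| / (sqrt(n) * tau), k = 1..n."""
    n = len(z)
    mean = sum(z) / n
    tau = math.sqrt(sum((v - mean) ** 2 for v in z) / n)  # scale estimate
    path, partial = [], 0.0
    for v in z:
        partial += v - mean  # centered partial sum up to time k
        path.append(abs(partial) / (math.sqrt(n) * tau))
    return path

# Toy sequence with a level shift exactly in the middle.
z = [0.0] * 50 + [1.0] * 50
path = cusum_path(z)
t_max = max(path)
k_hat = path.index(t_max) + 1  # estimated change location (1-based)
```

For this toy sequence the path peaks at the true change location, k = 50.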
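According to [13], for a standard Brownian bridge {Bs, 0 ≤ s ≤ 1},
$$P\left(\sup_{0\le s\le 1}|B_s|\le x\right)=1+2\sum_{k=1}^{\infty}(-1)^{k}e^{-2k^{2}x^{2}},\quad x>0.$$
(9)
Hence, for a given significance level α, we can find a critical value K from Eq (9) satisfying
$$P\left(\sup_{0\le s\le 1}|B_s|>K\right)=\alpha.$$
(10)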
In particular, when α = 0.05, the corresponding K is 1.358. When Tmax is greater than the critical value K, we reject the null hypothesis.
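The critical value can be checked numerically. The sketch below evaluates the classical Kolmogorov series for the tail probability of sup|Bs|, namely 2 Σ_{k≥1} (−1)^{k+1} exp(−2k²x²), and solves for K by bisection. The function names and the bracketing interval are hypothetical choices.

```python
import math

def sup_bridge_tail(x, terms=100):
    """P(sup |B_s| > x) via the Kolmogorov series 2 * sum (-1)^(k+1) exp(-2 k^2 x^2)."""
    return 2.0 * sum((-1) ** (k + 1) * math.exp(-2.0 * k * k * x * x)
                     for k in range(1, terms + 1))

def critical_value(alpha, lo=0.3, hi=3.0, tol=1e-10):
    """Bisection for K with P(sup |B_s| > K) = alpha; the tail is decreasing in x."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if sup_bridge_tail(mid) > alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

K_05 = critical_value(0.05)  # approximately 1.358
```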
Fig 2 plots 500 paths of |Bs|, where the red horizontal line corresponds to the critical value K = 1.358 for the significance level α = 0.05.
Standardized CUSUM test of Ferger [4].
Instead of Tmax, one can consider another residual-based CUSUM test:
(11)
Ferger [4] obtained the limiting distribution of this statistic and used it to improve the power of the Kolmogorov-Smirnov test. More specifically, [4] proved that the c.d.f. of R:
(12)
has a form of
(13)
for x ≥ 0 and αi = 2i + 1, where Φ and ϕ denote the distribution function and density of a standard normal random variable. For a given significance level α, the null hypothesis is rejected if the test statistic in Eq (11) is greater than the critical value
. In particular, using Eq (13), we obtain
and
.
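The idea of standardizing by the maximizing point can be sketched as follows. Eq (11) is not reproduced in this text, so the form used here, max|Ts| divided by sqrt(ŝ(1 − ŝ)) with ŝ the maximizing time fraction, is an assumption based on the description of [4] as a supremum standardized by its maximizing point. The function name is hypothetical.

```python
import math

def standardized_by_argmax(path):
    """Divide the maximum of a CUSUM path by sqrt(s_hat * (1 - s_hat)).

    path[k - 1] holds T_k for k = 1..n; the endpoint k = n is excluded
    because s = 1 would give a zero denominator.
    """
    n = len(path)
    k_hat = max(range(1, n), key=lambda k: path[k - 1])
    s_hat = k_hat / n
    return path[k_hat - 1] / math.sqrt(s_hat * (1.0 - s_hat)), s_hat

# Two toy paths with the same maximum, peaking in the middle and early.
stat_mid, s_mid = standardized_by_argmax([0.3, 0.9, 0.6, 0.0])
stat_early, s_early = standardized_by_argmax([0.9, 0.3, 0.2, 0.0])
```

With equal maxima, the path that peaks away from the middle receives the larger standardized value, which is consistent with this kind of test being more sensitive to changes near the boundaries.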
Ts with a time-varying control limit.
Although the test of [4] can overcome a shortcoming of Tmax, it considers only the standardized maximum and not the standardized values at other time points. As such, we construct time-varying control limits for Ts based on the following fact:
(14)
where
and
refer to Corollary A.3.1 of [14]. For example, when α = 0.05 and n = 500, we have
. The time-varying control limit of Ts is exhibited in Fig 3.
This result enables us to set up an adjusted control limit for improvement, but for small s, there can be a concern regarding excessive type 1 errors in its implementation. For example, when the AR(1) model below is considered,
(15)
where at are i.i.d. N(0, 1) errors, the test based on 500 observations from this model shows that the rejection ratio of the null hypothesis out of 500 replications is 0.114 which is much larger than the significance level of α = 0.05.
In Fig 4, the x-axis represents the time points where the type 1 error occurs, and the y-axis represents the number of occurrences. The figure indicates that the control limit based on Eq (14) does not perform well at an early stage.
To overcome this problem, we propose to use the control limit for Ts, s ∈ (0, 1), as follows:
(16)
Then, we reject the null hypothesis when Ts > UCL(s) for some s ∈ (0, 1). We refer to this testing method as Ts with UCL.
For a given significance level α, the upper control limit can be obtained via finding K, K*, and s* in Eq (16) through the following steps:
- Find K with P(sup|Bs| > K) = α.
- Simulate 100,000 standard Brownian bridge paths of sample size n.
- Find K* ∈ [2, 4] with
satisfying
Fig 5 shows the plot of α versus K* for n = 500. Fig 6 shows the upper control limit of Ts for α = 0.05 (the red solid line) which is determined by the following formula:
(17)
We further investigate the stability of the critical value K* with respect to the sample size n. Fig 7 shows that K* increases gradually with n when n < 400, but appears stable when 400 ≤ n ≤ 1000.
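The simulation step used to calibrate a control limit can be sketched as below. The code draws discretized Brownian bridge paths and estimates the probability that |Bs| crosses a user-supplied limit function at some grid point. The specific UCL of Eq (16) is not reproduced in this text, so only the constant limit K = 1.358 is shown. The function names are hypothetical, and far fewer paths are used here than the 100,000 mentioned above, purely to keep the example fast.

```python
import math
import random

def bridge_path(n, rng):
    """Discretized Brownian bridge: path[k - 1] approximates B(k / n)."""
    steps = [rng.gauss(0.0, 1.0) for _ in range(n)]
    total = sum(steps)
    path, partial = [], 0.0
    for k, step in enumerate(steps, start=1):
        partial += step
        path.append((partial - k / n * total) / math.sqrt(n))
    return path

def crossing_rate(limit, n=200, n_paths=1000, seed=0):
    """Fraction of simulated paths with |B(s)| > limit(s) at some grid point s < 1."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_paths):
        path = bridge_path(n, rng)
        if any(abs(b) > limit(k / n) for k, b in enumerate(path[:-1], start=1)):
            hits += 1
    return hits / n_paths

# Constant limit K = 1.358 from the alpha = 0.05 case.
rate_constant = crossing_rate(lambda s: 1.358)
```

Any time-varying limit can be passed as `limit`, so a search over a constant such as K* amounts to calling `crossing_rate` repeatedly until the estimated rate matches α. Because the path is observed only on a grid, the estimated rate for the constant limit tends to fall somewhat below the nominal 0.05.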
Results and discussion
Numerical analysis
We evaluate the performance of the SVR-ARMA-GARCH model using the proposed ARE method and the residual-based CUSUM test based on Ts. For this task, we generate the data following a Lorenz system, namely, yt, t = 1, 2, …, n, is from the following model:
(18)
where βt and γt originate from the Lorenz system, as shown in Eq (19). Under the null of no changes, we consider ϕ = 1. The Lorenz system is a three-dimensional dynamical system derived from a simplified model of atmospheric convection:
(19)
For each simulation, 1100 observations are simulated, and the first 100 are removed. The first 500 observations are used as a training set to fit the three models: ARMA-GARCH, and SVR-ARMA-GARCH based on the ARE method and on the method of [1]. The last 500 observations are used as a test set for prediction. The last 200 observations in the training set are used as a validation set to decide the free parameters of the SVR-ARMA-GARCH model. Fig 8 shows a time series realization simulated from the Lorenz system and the partition into training, validation, and testing sets. We use a rolling window to evaluate the 500 one-step-ahead forecasts of each of the three fitted models in the testing period.
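The data generation and partition can be sketched as follows. Eqs (18) and (19) are not reproduced in this text, so the code integrates the classical Lorenz equations with the commonly used parameters σ = 10, ρ = 28, β = 8/3 by a fourth-order Runge-Kutta scheme, and uses the first state component as a stand-in series. The parameter values, step size, initial state, and function names are all assumptions.

```python
def lorenz_rhs(state, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Right-hand side of the classical Lorenz equations."""
    x, y, z = state
    return (sigma * (y - x), x * (rho - z) - y, x * y - beta * z)

def rk4_step(state, dt):
    """One fourth-order Runge-Kutta step of size dt."""
    def shifted(k, h):
        return tuple(s + h * ki for s, ki in zip(state, k))
    k1 = lorenz_rhs(state)
    k2 = lorenz_rhs(shifted(k1, 0.5 * dt))
    k3 = lorenz_rhs(shifted(k2, 0.5 * dt))
    k4 = lorenz_rhs(shifted(k3, dt))
    return tuple(s + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))

def simulate_lorenz(n, dt=0.01, start=(1.0, 1.0, 1.0)):
    """Return n successive states of the Lorenz system."""
    state, out = start, []
    for _ in range(n):
        state = rk4_step(state, dt)
        out.append(state)
    return out

# 1100 observations, first 100 removed, then a 500/500 train/test split.
trajectory = simulate_lorenz(1100)[100:]
x_series = [s[0] for s in trajectory]
train, test = x_series[:500], x_series[500:]
validation = train[-200:]  # last 200 training observations
```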
Evaluation metrics.
We use five evaluation metrics to measure the prediction accuracy. Let yt denote the series of the returns to predict, and denote the estimated residuals.
- Evaluation metrics for conditional mean predictor
- Evaluation metrics for conditional variance predictor
Table 1 shows the averaged performance for each evaluation metric over 500 repetitions. As the data are generated from the nonlinear Lorenz system, the ARMA-GARCH model performs poorly. A pairwise t-test demonstrates that the ARE method performs significantly better than the method of [1].
Next, we evaluate the performance of the three tests: , Tmax, and
, using the SVR-ARMA-GARCH model fitted by the ARE method to estimate the residuals. Each simulation uses samples of size 500 and the significance level α = 0.05. Under the null hypothesis, the rejection ratios of the three tests are 0.03, 0.038, and 0.044, respectively, all of which are close to α = 0.05. In Fig 9, the x-axis represents the time points where the type 1 error occurs, and the y-axis represents the number of occurrences in 500 experiments.
Finally, we investigate the power of the three tests by changing the value of the parameter ϕ in the test set at different time points and computing the proportion of rejections of the null hypothesis. Fig 10 presents the powers of the three tests at different locations. The left side presents the changed parameter. The x-axis shows the position where the change point occurs, while the y-axis shows the rejection rate of the null hypothesis. The results show that when the change point appears in an early period, the power of the test of [4] is 20% higher than that of Tmax, whereas when the change point appears in a later stage, the power of Ts with UCL is 60% higher than that of Tmax.
Empirical study of the BDI data
Data description.
The Baltic Dry Index (BDI) is issued daily by the Baltic Exchange in London. The BDI is a composite of the Capesize, Panamax, and Supramax Timecharter Averages. It is reported worldwide as a proxy for dry bulk shipping stocks and a general shipping market bellwether. We use the BDI daily log return data to perform a one-step-ahead forecast. We split the data into a training set from 2015/01/01 to 2016/12/31, a validation set from 2017/01/01 to 2018/12/31, and a test set from 2019/01/01 to 2021/04/06. See Fig 11.
Prediction model.
First, we determine the free parameters in the SVR-ARMA-GARCH model. The orders of the ARMA and GARCH models are determined by the extended autocorrelation function (EACF) and AIC. The free parameters of the SVR model are determined by a grid search. As a result, we first obtain the ARMA(2,2) model:
(25)
and then an ARMA(2,2)-GARCH(1,1) model that minimizes AIC:
(26)
We then apply a grid search to the validation set to determine the free parameters of SVR. Parameters γh, γg, and ϵg are selected from {0.01, 0.02, …, 0.1}, and parameter ϵh is selected from {0.1, 0.2, …, 1}. Also, parameters Ch and Cg are selected from {0.5, 0.6, …, 1.5}.
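The candidate grids described above can be written out as follows. The sketch only builds the grids and counts the combinations; whether the six parameters are searched jointly or in stages is not stated, so the joint product shown here is an assumption, and the helper name `grid` is hypothetical.

```python
from itertools import product

def grid(start, stop, step):
    """Evenly spaced grid from start to stop inclusive, rounded to avoid float drift."""
    count = round((stop - start) / step) + 1
    return [round(start + i * step, 10) for i in range(count)]

gamma_h = grid(0.01, 0.1, 0.01)    # {0.01, 0.02, ..., 0.1}
gamma_g = grid(0.01, 0.1, 0.01)
epsilon_g = grid(0.01, 0.1, 0.01)
epsilon_h = grid(0.1, 1.0, 0.1)    # {0.1, 0.2, ..., 1}
C_h = grid(0.5, 1.5, 0.1)          # {0.5, 0.6, ..., 1.5}
C_g = grid(0.5, 1.5, 0.1)

grids = [gamma_h, gamma_g, epsilon_g, epsilon_h, C_h, C_g]
n_combinations = 1
for g in grids:
    n_combinations *= len(g)

# Lazy iterator over all joint settings; each item is a 6-tuple of candidates.
candidates = product(*grids)
first_candidate = next(candidates)
```

A joint search over all six parameters would involve 1,210,000 candidate settings, which is one reason a staged search may be preferable in practice.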
For comparison, we also consider three neural network methods: RNN, LSTM, and GRU, each with two hidden layers. Here, the number of hidden layer neurons in the network is selected by a grid search from {2, 4, 8, …, 256}. We also build a neural network model to predict the conditional variance. To measure the accuracy of prediction, we use the metrics NMSE, Sign error, and RMSE_var, with 100 repetitions. Table 2 shows that the neural network models have better predictive ability than the traditional time series models (ARMA and ARMA-GARCH) but underperform the SVR-ARMA-GARCH model, and further, that the ARE method performs better within the SVR-ARMA-GARCH model.
Change point detection.
To perform a residual-based CUSUM test, we use the residuals obtained from the ARE method and apply the three tests to the BDI data to detect a change point from 2019/01/01 to 2021/04/06. In this task, we use a rolling window scheme with a width of one year and a step of one day. See Fig 12. When a change point is detected, the point exceeding the control limit the most is declared as a change point.
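The rolling window scheme can be sketched as follows. The generator yields half-open index ranges of a fixed width that advance by one observation, and `first_alarm` reports the last index of the first window in which a user-supplied test rejects. Both function names are hypothetical, and the rejection rule in the toy example is a placeholder, not one of the CUSUM tests.

```python
def rolling_windows(n_obs, width, step=1):
    """Yield (start, end) index pairs of consecutive windows, end exclusive."""
    start = 0
    while start + width <= n_obs:
        yield start, start + width
        start += step

def first_alarm(series, width, rejects, step=1):
    """Index of the last observation in the first window where `rejects` is True."""
    for start, end in rolling_windows(len(series), width, step):
        if rejects(series[start:end]):
            return end - 1
    return None

# Toy series with a level shift at index 10 and a placeholder rejection rule.
toy = [0.0] * 10 + [1.0] * 5
alarm_index = first_alarm(toy, width=5, rejects=lambda window: max(window) > 0.5)
windows = list(rolling_windows(len(toy), 5))
```

With daily data, a one-year width would correspond to the number of trading days in a year, which depends on the dataset's calendar.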
One can conjecture that a true change point occurred on 2019/12/24, as the COVID-19 outbreak began around that time and triggered a sudden change in the global financial market, resulting in a plunge in stock prices. To examine this, the three tests are conducted. The results show that Ts with UCL detects a change point most quickly, on 2020/01/10, while both Tmax and the test of [4] detect the change point on 2020/06/11, which is quite late compared to the first test. See Fig 13.
The second change point is claimed to be on 2020/03/30, as the price of Brent crude oil fell 9% on that date to US $23 per barrel, the lowest price since 2002. As all ships in the BDI need oil to travel, the abnormality in Brent crude oil prices triggered a serious change in the BDI. In this case, Ts with UCL, the test of [4], and Tmax detect a change point on 2020/04/14, 2020/05/15, and 2020/06/11, respectively. See Fig 14.
This finding reveals that Ts with UCL can detect a change about half a month after the occurrence of such major events, much faster than Tmax and the test of [4], which strongly supports the validity of our proposed method when used in tandem with the rolling window scheme.
Conclusion
In this study, we proposed an alternating recursive estimation (ARE) method for the SVR-ARMA-GARCH model. Our numerical study showed that our one-step-ahead prediction method has better predictive ability and supplies more accurate residual estimation than the method of [1]. We also proposed a new residual-based CUSUM test employing a time-varying control limit. We demonstrated via Monte Carlo simulations that Ts with UCL, Tmax of [2], and the test of [4] each have their own advantage of producing higher power in different time periods. Our findings in the empirical study using the daily BDI (Baltic Dry Index) data demonstrate the superiority of Ts with UCL over the others in terms of quick detection. All these results affirm the validity of the proposed methods.
References
- 1. Chen S, Härdle WK, Jeong K. Forecasting volatility with support vector machine-based GARCH model. Journal of Forecasting. 2010;29(4):406–433.
- 2. Lee S, Lee S, Moon M. Hybrid change point detection for time series via support vector regression and CUSUM method. Applied Soft Computing. 2020;89:106101–106101.
- 3. Lee S. Location and scale-based CUSUM test with application to autoregressive models. Journal of Statistical Computation and Simulation. 2020;90(13):2309–2328.
- 4. Ferger D. On the supremum of a Brownian bridge standardized by its maximizing point with applications to statistics. Statistics & Probability Letters. 2018;134:63–69.
- 5. Vapnik VN. The Nature of Statistical Learning Theory. Springer; 1995.
- 6. Vapnik VN. Statistical learning theory. John Wiley & Sons; 1997.
- 7. Cao LJ, Tay FEH. Support vector machine with adaptive parameters in financial time series forecasting. IEEE Transactions on Neural Networks. 2003;14(6):1506–1518. pmid:18244595
- 8. Cavalcante RC, Brasileiro RC, Souza VL, Nobrega JP, Oliveira AL. Computational intelligence and financial markets: A survey and future directions. Expert Systems with Applications. 2016;55:194–211.
- 9. Sapankevych NI, Sankar R. Time series prediction using support vector machines: a survey. IEEE Computational Intelligence Magazine. 2009;4(2):24–38.
- 10. Dash RK, Nguyen TN, Cengiz K, Sharma A. Fine-tuned support vector regression model for stock predictions. Neural Computing and Applications. 2021;44:1–15.
- 11. Lee H, Li G, Rai A, Chattopadhyay A. Real-time anomaly detection framework using a support vector regression for the safety monitoring of commercial aircraft. Advanced Engineering Informatics. 2020;44:10171–10171.
- 12. Oh H, Lee S. Modified residual CUSUM test for location-scale time series models with heteroscedasticity. Annals of the Institute of Statistical Mathematics. 2019;71:1059–1091.
- 13. Kolmogorov A. Sulla determinazione empirica di una legge di distribuzione. Inst. Ital. Attuari, Giorn. 1933;4:83–91.
- 14. Csörgö M, Horváth L. Limit theorems in change-point analysis. John Wiley & Sons; 1997.