Financial time series forecasting using twin support vector regression

Financial time series forecasting is a crucial tool for making financial decisions more robust throughout the world. Noisy data and nonstationary information are the two key difficulties in financial time series prediction. This paper proposes twin support vector regression for financial time series prediction to deal with noisy data and nonstationary information. Financial time series datasets across a wide range of industries, such as information technology, the stock market, the banking sector, and the oil and petroleum sector, are used for numerical experiments. Further, to test the accuracy of the prediction of the time series, the root mean squared error and the standard deviation are computed, which indicate the usefulness and applicability of the proposed method. The twin support vector regression is computationally faster than the standard support vector regression on the given 44 datasets.


Introduction
For the last two decades in the machine learning area, support vector machines (SVMs) have been a computationally powerful kernel-based tool for classification problems, such as pattern recognition, as well as for regression and function approximation [1]. SVM has an advantage over other methods such as artificial neural networks (ANN): ANNs focus on minimizing the empirical risk in the training phase, whereas SVM was developed on the structural risk minimization principle [1], which minimizes the upper bound on the generalization error. Another advantage of SVM is that it forms a convex optimization problem, a single large quadratic programming problem (QPP) that yields a unique global solution. The SVM has been applied in many fields to solve well-known real-world problems, including image classification [2], remote sensing image classification [3], text characterization [4], biomedicine [5,6], time series prediction [7,8] and business prediction [9], which justifies its popularity.
To obtain an optimal regressor function for a given set of training data, support vector regression (SVR) was introduced by Vapnik [1]; it fits the regressor either in the input space or in a higher dimensional space via kernel mapping. The SVR has the advantage of better generalization performance than the other regression methods. However, standard SVM has a drawback in that it optimizes a computationally expensive cost function: for large-scale datasets the training cost is high, i.e., $O(m^3)$, where $m$ is the number of training samples. Due to this high training cost, it is not easy to find the optimal parameters from a large set of parameters. To address this issue, different variants of SVM have been proposed, such as chunking and decomposition methods [10,11], the exact SVM training algorithm SMO [12], approximate SVM training algorithms [13][14][15] and LS-SVM [16,17]. Mangasarian and Wild [18] suggested a new method for binary classification, the generalized eigenvalue proximal support vector machine (GEPSVM), which is based on two nonparallel hyperplanes. To find the nonparallel hyperplanes, GEPSVM solves two eigenvalue problems whose size depends on the dimension of the input space. The GEPSVM outperforms the standard SVM in terms of computational speed and accuracy. In the spirit of GEPSVM, the twin support vector machine (TWSVM) has been proposed [19] for binary classification problems. It constructs two nonparallel planes such that each plane is close to the data points of one of the two classes and as far as possible from the data points of the other class. In TWSVM, two QPPs of smaller size are solved to obtain the two nonparallel hyperplanes instead of one QPP of large size. This strategy gives TWSVM good generalization ability, making it better than GEPSVM and approximately four times faster than the standard SVM.
The main difference between GEPSVM and TWSVM is that GEPSVM solves two generalized eigenvalue problems to obtain the hyperplanes, whereas TWSVM solves two related SVM-type problems. Peng [20] recently proposed a twin support vector regression (TSVR) technique based on TWSVM in which an unknown regressor function is generated by the construction of nonparallel ε-insensitive up-bound and down-bound functions. In this case, a pair of smaller-sized QPPs is solved, unlike the single large QPP solved in the case of SVR.
To address financial prediction problems through machine learning approaches, various methods have been applied, such as artificial neural networks [21], statistical learning [22], fuzzy logic [23][24][25][26], neural networks [27][28][29], evolutionary algorithms [30] and hidden Markov models [31]. Eugene et al. [32] estimated that high expected returns that are due to future price increases are offset only by a decrease in the current price. Therefore, time-varying expected returns generate temporary components of prices. Lewellen et al. [33] proposed an approach, named predictive regression, for testing the predictive power of aggregate financial ratios in the presence of small-sample biases. Goh et al. [34] examined the relationship between the U.S. and Chinese economic variables and predicted the economic variables of each country to determine which country's variables are greater than the other's. In 2017, Shen et al. [35] presented a novel method for predicting Chinese stock returns for different asset values using the Baidu index. Similarly, Li et al. [36] found in 2018 that idiosyncratic volatility grows significantly once internet stock message boards have been established.
The prediction of stock market indices has been a focus of interest from the day the stock market came into existence. Researchers have several goals and motivations for trying to predict stock market prices. One of the motivations could be to make life easier and more luxurious. Many investment professionals, along with researchers, are trying to find a superior system that will yield high returns in terms of financial gain. Considerable work has been performed to predict the behavior of the stock market. Financial time series prediction involves various parameters: (a) the price of the last trade performed during the day, (b) the total number of commodities traded during the day, and (c) the lowest and highest traded prices [37]. Because of these parameters and the nonlinearity and uncertainty involved in financial time series forecasting, this paper proposes the use of TSVR. To determine the effectiveness of TSVR on financial time series datasets, this paper first discusses the formulation of TSVR and then reports numerical experiments on various financial datasets. The experimental results of TSVR are compared with those of the standard SVR formulation in terms of average RMSE and training time.
The remainder of this paper is organized as follows: Sections 2 and 3 discuss the formulation of SVR and TSVR, respectively. Section 4 presents the experimental results of TSVR on different financial time series datasets and compares them with those of SVR. Finally, conclusions are drawn in Section 5.

Support vector regression
This section describes the standard formulation of support vector regression (SVR). Assume that the set of training samples is $\{(x_i, y_i)\}_{i=1}^{m}$, where $x_i = (x_{i1}, x_{i2}, \ldots, x_{in})^t \in \mathbb{R}^n$ is the input example, $y_i \in \mathbb{R}$ is the target value for $i = 1, 2, \ldots, m$, and $m$ is the number of training samples. Let the matrix $D \in \mathbb{R}^{m \times n}$ denote the input examples, where $x_i^t$ is the $i$-th row, and let $y = (y_1, \ldots, y_m)^t$ be the vector of observed values. The main goal of SVR is to approximate the regression function $f(\cdot)$ in the form
$$f(x) = x^t w + b, \tag{1}$$
where the unknowns are the vector $w \in \mathbb{R}^n$ and the scalar $b$. Vapnik [1] suggested the formulation of SVR by introducing the ε-insensitive loss function and determining the unknown variables $w$ and $b$ by solving the following QPP:
$$\min_{w, b, \xi_1, \xi_2} \; \frac{1}{2} w^t w + C \left( e^t \xi_1 + e^t \xi_2 \right) \tag{2}$$
subject to:
$$y - (Dw + be) \le \varepsilon e + \xi_1, \quad (Dw + be) - y \le \varepsilon e + \xi_2, \quad \xi_1, \xi_2 \ge 0,$$
where $\xi_1 = (\xi_{11}, \ldots, \xi_{1m})^t$ and $\xi_2 = (\xi_{21}, \ldots, \xi_{2m})^t$ are slack variables in vector form, $e$ is the vector of ones of dimension $m$, and $C > 0$ and $\varepsilon > 0$ denote the input parameters.
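Here, the solution of the above problem is obtained by introducing Lagrange multipliers, which leads to the dual QPP
$$\min_{\lambda_1, \lambda_2} \; \frac{1}{2} (\lambda_1 - \lambda_2)^t D D^t (\lambda_1 - \lambda_2) - y^t (\lambda_1 - \lambda_2) + \varepsilon e^t (\lambda_1 + \lambda_2) \tag{3}$$
subject to:
$$e^t (\lambda_1 - \lambda_2) = 0, \quad 0 \le \lambda_1, \lambda_2 \le Ce,$$
where the Lagrange multipliers $\lambda_1 = (\lambda_{11}, \ldots, \lambda_{1m})^t$ and $\lambda_2 = (\lambda_{21}, \ldots, \lambda_{2m})^t$ in $\mathbb{R}^m$ give the solution to the above quadratic problem. The training points with nonzero Lagrange multipliers in Eq (3), which are known as support vectors, are useful for predicting the regression function, which is defined for any $x \in \mathbb{R}^n$ as
$$f(x) = \sum_{i=1}^{m} (\lambda_{1i} - \lambda_{2i}) \, x^t x_i + b. \tag{4}$$
For a nonlinear regressor, the input data are mapped to a higher dimensional feature space using a kernel function $k(\cdot,\cdot)$, which is defined by the Gaussian kernel as $k(x_i, x_j) = \exp(-\mu \|x_i - x_j\|^2)$ for $i, j = 1, 2, \ldots, m$, where $\mu > 0$ is a parameter. The nonlinear case can be obtained as
$$\min_{\lambda_1, \lambda_2} \; \frac{1}{2} (\lambda_1 - \lambda_2)^t K(D, D^t) (\lambda_1 - \lambda_2) - y^t (\lambda_1 - \lambda_2) + \varepsilon e^t (\lambda_1 + \lambda_2) \tag{5}$$
subject to:
$$e^t (\lambda_1 - \lambda_2) = 0, \quad 0 \le \lambda_1, \lambda_2 \le Ce.$$
The nonlinear prediction function $f(\cdot)$ is obtained from the values of $\lambda_1$ and $\lambda_2$ that solve the problem in Eq (5); for any $x \in \mathbb{R}^n$,
$$f(x) = \sum_{i=1}^{m} (\lambda_{1i} - \lambda_{2i}) \, k(x, x_i) + b.$$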
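As an illustration of the kernel expansion above, the following is a minimal sketch, assuming NumPy is available, of how the Gaussian kernel matrix and the nonlinear prediction function could be evaluated once the multipliers and the bias are known. The function names `gaussian_kernel_matrix` and `kernel_predict` are hypothetical and are not taken from the authors' MATLAB implementation; the multipliers are assumed to come from some external QPP solver.

```python
import numpy as np

def gaussian_kernel_matrix(A, B, mu):
    """Return K with K[i, j] = exp(-mu * ||A[i] - B[j]||^2)."""
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    # Squared Euclidean distances via ||a||^2 + ||b||^2 - 2 a.b
    sq = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    np.maximum(sq, 0.0, out=sq)  # clip tiny negatives caused by round-off
    return np.exp(-mu * sq)

def kernel_predict(X, D, lam1, lam2, b, mu):
    """Evaluate f(x) = sum_i (lam1_i - lam2_i) k(x, x_i) + b for each row x of X.

    lam1, lam2 and b are assumed to be supplied by a QPP solver (hypothetical inputs).
    """
    coef = np.asarray(lam1, dtype=float) - np.asarray(lam2, dtype=float)
    return gaussian_kernel_matrix(X, D, mu) @ coef + b
```

Only rows of the kernel matrix with a nonzero coefficient $\lambda_{1i} - \lambda_{2i}$ contribute to the sum, which is why the support vectors alone determine the prediction.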

Twin support vector regression
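To further improve the generalization performance and training time of SVR, a new approach, termed TSVR, was proposed by Peng [20]. TSVR constructs a pair of nonparallel hyperplanes, one of which determines the ε-insensitive down-bound function $f_1(x) = x^t w_1 + b_1$ and the other the ε-insensitive up-bound function $f_2(x) = x^t w_2 + b_2$, to identify the end regression function. TSVR solves a pair of smaller QPPs with $m$ constraints each instead of a single large QPP with $2m$ constraints. The formulation of TSVR determines the regression function by the following pair of constrained QPPs:
$$\min_{w_1, b_1, \xi} \; \frac{1}{2} \left\| y - \varepsilon_1 e - (D w_1 + b_1 e) \right\|^2 + C_1 e^t \xi \tag{6}$$
subject to:
$$y - (D w_1 + b_1 e) \ge \varepsilon_1 e - \xi, \quad \xi \ge 0,$$
and
$$\min_{w_2, b_2, \eta} \; \frac{1}{2} \left\| y + \varepsilon_2 e - (D w_2 + b_2 e) \right\|^2 + C_2 e^t \eta \tag{7}$$
subject to:
$$(D w_2 + b_2 e) - y \ge \varepsilon_2 e - \eta, \quad \eta \ge 0,$$
where $C_1, C_2 > 0$ and $\varepsilon_1, \varepsilon_2 \ge 0$ denote input parameters, $\xi = (\xi_1, \ldots, \xi_m)^t$ and $\eta = (\eta_1, \ldots, \eta_m)^t$ denote the vectors of slack variables, and $e$ is the vector of ones of dimension $m$. With Lagrange multipliers $\lambda_1, \nu_1 \in \mathbb{R}^m$ for problem (6) and $\lambda_2, \nu_2 \in \mathbb{R}^m$ for problem (7), the Lagrangian functions are
$$L_1 = \frac{1}{2} \left\| y - \varepsilon_1 e - (D w_1 + b_1 e) \right\|^2 + C_1 e^t \xi - \lambda_1^t \left( y - (D w_1 + b_1 e) - \varepsilon_1 e + \xi \right) - \nu_1^t \xi \tag{8}$$
and
$$L_2 = \frac{1}{2} \left\| y + \varepsilon_2 e - (D w_2 + b_2 e) \right\|^2 + C_2 e^t \eta - \lambda_2^t \left( (D w_2 + b_2 e) - y - \varepsilon_2 e + \eta \right) - \nu_2^t \eta. \tag{9}$$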
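By applying the KKT conditions to the Lagrangian function in Eq (8), we obtain:
$$-D^t (y - D w_1 - b_1 e - \varepsilon_1 e) + D^t \lambda_1 = 0, \tag{10}$$
$$-e^t (y - D w_1 - b_1 e - \varepsilon_1 e) + e^t \lambda_1 = 0, \tag{11}$$
$$C_1 e - \lambda_1 - \nu_1 = 0, \tag{12}$$
$$y - (D w_1 + b_1 e) \ge \varepsilon_1 e - \xi, \quad \xi \ge 0, \tag{13}$$
$$\lambda_1^t \left( y - (D w_1 + b_1 e) - \varepsilon_1 e + \xi \right) = 0, \quad \lambda_1 \ge 0, \tag{14}$$
$$\nu_1^t \xi = 0, \quad \nu_1 \ge 0. \tag{15}$$
Since $\nu_1 \ge 0$, Eq (12) gives
$$0 \le \lambda_1 \le C_1 e. \tag{16}$$
Similarly, for the Lagrangian function in Eq (9), we obtain
$$-D^t (y + \varepsilon_2 e - D w_2 - b_2 e) - D^t \lambda_2 = 0, \tag{17}$$
$$-e^t (y + \varepsilon_2 e - D w_2 - b_2 e) - e^t \lambda_2 = 0, \tag{18}$$
$$C_2 e - \lambda_2 - \nu_2 = 0, \tag{19}$$
$$(D w_2 + b_2 e) - y \ge \varepsilon_2 e - \eta, \quad \eta \ge 0, \tag{20}$$
$$\lambda_2^t \left( (D w_2 + b_2 e) - y - \varepsilon_2 e + \eta \right) = 0, \quad \lambda_2 \ge 0, \tag{21}$$
$$\nu_2^t \eta = 0, \quad \nu_2 \ge 0. \tag{22}$$
Since $\nu_2 \ge 0$, Eq (19) gives
$$0 \le \lambda_2 \le C_2 e. \tag{23}$$
Combining Eq (10) with Eq (11) and Eq (17) with Eq (18), we obtain
$$-[D \;\; e]^t \left( y - \varepsilon_1 e - [D \;\; e] \begin{bmatrix} w_1 \\ b_1 \end{bmatrix} \right) + [D \;\; e]^t \lambda_1 = 0, \tag{24}$$
$$-[D \;\; e]^t \left( y + \varepsilon_2 e - [D \;\; e] \begin{bmatrix} w_2 \\ b_2 \end{bmatrix} \right) - [D \;\; e]^t \lambda_2 = 0. \tag{25}$$
Let us define
$$S = [D \;\; e], \quad h_1 = y - \varepsilon_1 e, \quad h_2 = y + \varepsilon_2 e, \quad u_1 = \begin{bmatrix} w_1 \\ b_1 \end{bmatrix}, \quad u_2 = \begin{bmatrix} w_2 \\ b_2 \end{bmatrix}, \tag{26}$$
and then we have $-S^t h_1 + S^t S u_1 + S^t \lambda_1 = 0$ and $-S^t h_2 + S^t S u_2 - S^t \lambda_2 = 0$, i.e.,
$$u_1 = (S^t S)^{-1} S^t (h_1 - \lambda_1) \tag{27}$$
and
$$u_2 = (S^t S)^{-1} S^t (h_2 + \lambda_2). \tag{28}$$
Here, note that $S^t S$ is positive semidefinite. To handle the situation in which its inverse does not exist, $\sigma I$ is introduced as a regularization term, so that $(S^t S + \sigma I)$ becomes positive definite, where $\sigma$ is a very small positive number, such as $\sigma = 10^{-7}$. Thus, we have
$$u_1 = (S^t S + \sigma I)^{-1} S^t (h_1 - \lambda_1), \tag{29}$$
$$u_2 = (S^t S + \sigma I)^{-1} S^t (h_2 + \lambda_2). \tag{30}$$
Substituting Eq (29) into the primal Lagrangian function Eq (8) and using Eqs (13) to (16), the dual problem of Eq (6) is obtained as
$$\max_{\lambda_1} \; -\frac{1}{2} \lambda_1^t S (S^t S + \sigma I)^{-1} S^t \lambda_1 + h_1^t S (S^t S + \sigma I)^{-1} S^t \lambda_1 - h_1^t \lambda_1 \tag{31}$$
subject to:
$$0 \le \lambda_1 \le C_1 e.$$
Similarly, substituting Eq (30) into the primal Lagrangian function Eq (9) and using Eqs (20) to (23), the dual problem of Eq (7) is obtained as
$$\max_{\lambda_2} \; -\frac{1}{2} \lambda_2^t S (S^t S + \sigma I)^{-1} S^t \lambda_2 - h_2^t S (S^t S + \sigma I)^{-1} S^t \lambda_2 + h_2^t \lambda_2 \tag{32}$$
subject to:
$$0 \le \lambda_2 \le C_2 e.$$
The vectors $\lambda_1$ and $\lambda_2$ are calculated by solving the dual QPPs in Eqs (31) and (32). Finally, for any data point $x \in \mathbb{R}^n$, the end regressor $f(\cdot)$ is given by:
$$f(x) = \frac{1}{2} \left( f_1(x) + f_2(x) \right) = \frac{1}{2} x^t (w_1 + w_2) + \frac{1}{2} (b_1 + b_2). \tag{33}$$
To extend TSVR to the nonlinear case, TSVR finds the regression function by solving the following primal problems:
$$\min_{w_1, b_1, \xi} \; \frac{1}{2} \left\| y - \varepsilon_1 e - (K(D, D^t) w_1 + b_1 e) \right\|^2 + C_1 e^t \xi \tag{34}$$
subject to:
$$y - (K(D, D^t) w_1 + b_1 e) \ge \varepsilon_1 e - \xi, \quad \xi \ge 0,$$
and
$$\min_{w_2, b_2, \eta} \; \frac{1}{2} \left\| y + \varepsilon_2 e - (K(D, D^t) w_2 + b_2 e) \right\|^2 + C_2 e^t \eta \tag{35}$$
subject to:
$$(K(D, D^t) w_2 + b_2 e) - y \ge \varepsilon_2 e - \eta, \quad \eta \ge 0,$$
where $K(D, D^t)$ is the kernel matrix of order $m$ whose $(i, j)$-th element is given by $K(D, D^t)_{ij} = k(x_i, x_j)$. In a similar manner, writing $T = [K(D, D^t) \;\; e]$ for the augmented kernel matrix, the dual formulations of the QPPs in Eqs (34) and (35) are given by Eqs (36) and (37), respectively:
$$\max_{\lambda_1} \; -\frac{1}{2} \lambda_1^t T (T^t T + \sigma I)^{-1} T^t \lambda_1 + h_1^t T (T^t T + \sigma I)^{-1} T^t \lambda_1 - h_1^t \lambda_1 \quad \text{subject to: } 0 \le \lambda_1 \le C_1 e, \tag{36}$$
and
$$\max_{\lambda_2} \; -\frac{1}{2} \lambda_2^t T (T^t T + \sigma I)^{-1} T^t \lambda_2 - h_2^t T (T^t T + \sigma I)^{-1} T^t \lambda_2 + h_2^t \lambda_2 \quad \text{subject to: } 0 \le \lambda_2 \le C_2 e. \tag{37}$$
After solving Eqs (36) and (37), we find the values of $u_1$ and $u_2$ as
$$u_1 = (T^t T + \sigma I)^{-1} T^t (h_1 - \lambda_1), \quad u_2 = (T^t T + \sigma I)^{-1} T^t (h_2 + \lambda_2).$$
Finally, for any data sample $x \in \mathbb{R}^n$, the end regression function $f(\cdot)$ is given by:
$$f(x) = \frac{1}{2} K(x^t, D^t) (w_1 + w_2) + \frac{1}{2} (b_1 + b_2),$$
where $K(x^t, D^t) = (k(x, x_1), \ldots, k(x, x_m))$.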
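The linear TSVR procedure can be illustrated with a short sketch, assuming NumPy is available. It forms the augmented matrix $S = [D \;\; e]$, solves the two box-constrained duals of Eqs (31) and (32), recovers $u_1$ and $u_2$ from Eqs (29) and (30), and averages the two bound functions as in Eq (33). The experiments in this paper use the MOSEK toolbox; here a simple projected gradient ascent is swapped in as the QPP solver so that the example is self-contained, and the function names and default parameter values are hypothetical.

```python
import numpy as np

def solve_box_qp(Q, c, upper, iters=2000):
    """Maximise -0.5 a'Qa + c'a subject to 0 <= a <= upper.

    Projected gradient ascent; a stand-in for a proper QP solver (assumption).
    """
    a = np.zeros_like(c)
    for _ in range(iters):
        # The eigenvalues of Q are at most 1 here, so a unit step is safe.
        a = np.clip(a + (c - Q @ a), 0.0, upper)
    return a

def fit_linear_tsvr(D, y, C1=1.0, C2=1.0, eps1=0.1, eps2=0.1, sigma=1e-7):
    """Return u1 = [w1; b1] and u2 = [w2; b2] of the down- and up-bound functions."""
    D = np.asarray(D, dtype=float)
    y = np.asarray(y, dtype=float)
    m = len(y)
    S = np.hstack([D, np.ones((m, 1))])                 # S = [D e]
    # G = (S'S + sigma I)^{-1} S', with sigma I as the regularization term
    G = np.linalg.solve(S.T @ S + sigma * np.eye(S.shape[1]), S.T)
    Q = S @ G                                           # S (S'S + sigma I)^{-1} S'
    h1, h2 = y - eps1, y + eps2
    lam1 = solve_box_qp(Q, Q @ h1 - h1, C1)             # dual of the down-bound problem
    lam2 = solve_box_qp(Q, h2 - Q @ h2, C2)             # dual of the up-bound problem
    u1 = G @ (h1 - lam1)
    u2 = G @ (h2 + lam2)
    return u1, u2

def predict_tsvr(X, u1, u2):
    """End regressor: the mean of the down-bound and up-bound functions."""
    X = np.asarray(X, dtype=float)
    u = 0.5 * (u1 + u2)
    return X @ u[:-1] + u[-1]
```

Each dual has only $m$ box constraints and no equality constraint, which is what makes such a simple projection step applicable; a dedicated QP solver would normally converge faster.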

Numerical experiments
In this section, numerical experiments are conducted to test the generalization performance and the computational efficiency of TSVR on standard datasets in comparison with SVR. This paper considers 44 benchmark datasets divided into two groups. The first group consists of 24 individual company stocks, and the second group consists of 20 stock market index datasets from the Yahoo financial website, i.e., http://finance.yahoo.com. The two groups are described in Table 1 and Table 2, respectively. All computations are carried out on a PC running Windows 7 (32 bit) with a 3.10 GHz Intel Core i5-2400 processor and 4 GB of RAM under the MATLAB R2012b environment. This paper used the MOSEK optimization toolbox, which is taken from http://www.mosek.com [39], to solve the quadratic programming problems in the SVR and TSVR formulations.
All the datasets are normalized in the following manner so that each feature value lies in [0, 1]:
$$\bar{x}_{ij} = \frac{x_{ij} - x_j^{\min}}{x_j^{\max} - x_j^{\min}},$$
where $x_{ij}$ is the $(i, j)$-th element of the input matrix $D$, and $x_j^{\min}$ and $x_j^{\max}$ are the minimum and maximum values of the $j$-th column of $D$. The prediction accuracy is measured by the root mean squared error (RMSE),
$$\mathrm{RMSE} = \sqrt{\frac{1}{P} \sum_{i=1}^{P} (y_i - \tilde{y}_i)^2},$$
where the total number of test samples is denoted by $P$, and $\tilde{y}_i$ is the predicted value corresponding to the observed value $y_i$. To construct a nonlinear regressor, we use the Gaussian kernel $k(x, y) = \exp(-\mu \|x - y\|^2)$, where $x, y \in \mathbb{R}^n$ and $\mu > 0$. The optimal values of the parameters $C = C_1 = C_2$ and $\mu$ are selected from the sets $\{10^{-5}, \ldots, 10^{5}\}$ and $\{2^{-5}, \ldots, 2^{5}\}$, respectively, using 10-fold cross validation on the training data. Using the optimal values, the whole dataset is divided at random into 10 equal parts, of which one part is used for testing and the remaining parts for training, to obtain the test accuracy. Finally, the average RMSE over the test parts is taken as the measure of prediction accuracy.
For the individual stock datasets with the linear kernel, Table 3 shows the average RMSE for the optimal parameter values with the standard deviation and the training time in seconds. Fig 1 shows the absolute prediction error of SVR and TSVR for the linear kernel on the SHI dataset. Fig 2 shows the actual and predicted values of SVR and TSVR for the linear kernel on the SHI dataset. To verify the performance of both algorithms statistically on the 24 individual stock datasets, we perform a simple, safe nonparametric test, i.e., the Friedman test with the corresponding post hoc test [40]. For this, the average ranks of the two algorithms on the 24 datasets for the linear case are tabulated in Table 4. The Friedman statistic [40] is computed under the null hypothesis from the average ranks in Table 4.
For the nonlinear kernel, Table 5 shows the average RMSE for the optimal parameter values with the standard deviation and the training time in seconds. From Table 5, we can conclude that TSVR gives better results in 19 of the 24 datasets in terms of average RMSE, which indicates the better prediction performance of TSVR in comparison to SVR. Additionally, Table 5 shows the superiority of TSVR with respect to SVR in terms of computational time.
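The preprocessing and evaluation steps described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the function names are hypothetical, constant columns are mapped to 0 by convention, and the parameter grids are assumed to contain the integer powers of 10 and of 2 between the stated end points.

```python
import math

def min_max_normalize(rows):
    """Scale every column of `rows` to [0, 1] using that column's min and max."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [
        [
            (v - lo) / (hi - lo) if hi > lo else 0.0  # constant column: assumed convention
            for v, lo, hi in zip(row, lows, highs)
        ]
        for row in rows
    ]

def rmse(observed, predicted):
    """Root mean squared error over the P test samples."""
    p = len(observed)
    return math.sqrt(sum((y - yh) ** 2 for y, yh in zip(observed, predicted)) / p)

def parameter_grid():
    """Candidate (C, mu) pairs, assuming integer exponents from -5 to 5."""
    c_values = [10.0 ** k for k in range(-5, 6)]
    mu_values = [2.0 ** k for k in range(-5, 6)]
    return [(c, mu) for c in c_values for mu in mu_values]
```

Under these assumptions the grid contains 11 × 11 = 121 candidate pairs, each of which would be scored by the average RMSE over the 10 cross-validation folds.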

Individual company stock datasets
Similar to the linear case, for the individual stock datasets, the Friedman statistic is computed from Table 4 under the null hypothesis that both algorithms have a similar performance. Since the difference between the average ranks of SVR and TSVR (1.791667 − 1.208333 = 0.583334) is greater than 0.3358, we conclude that TSVR is significantly better than SVR for individual stock datasets. For the nonlinear case, the absolute prediction error of SVR and TSVR is shown in the corresponding figures.
For the stock market index datasets with the linear kernel, Table 6 shows the average RMSE for the optimal parameter values with the standard deviation and the training time in seconds. We can conclude that TSVR gives better results in 13 of the 20 datasets in terms of average RMSE. Additionally, the training time of TSVR is lower than that of SVR. The Friedman test with the post hoc test is performed on the average ranks of the 20 financial datasets from Table 7. The Friedman statistic [40] is computed under the null hypothesis for the linear case to test whether there is a significant difference between these two algorithms. Fig 7 shows the absolute prediction error of SVR and TSVR for the linear kernel on the BFX dataset. Fig 8 shows the actual and predicted values of SVR and TSVR for the linear kernel on the BFX stock market index dataset. One can observe that the predictions of TSVR are in closer agreement with the target values than those of SVR.
Nonlinear case. For the nonlinear kernel, Table 8 shows the average RMSE for the optimal parameter values with the standard deviation and the training time in seconds. We can conclude that TSVR gives better results in 19 of the 20 datasets in terms of average RMSE. The training time of TSVR is less than that of SVR because a pair of smaller-sized QPPs is solved instead of a single large QPP, as in the case of SVR. This shows the superiority of TSVR with respect to SVR.
In the nonlinear case for the stock market index datasets, the Friedman statistic is computed under the null hypothesis from Table 7, with a critical difference of approximately 0.3678 at p = 0.10.
Since the difference between the average ranks of SVR and TSVR (1.95 − 1.05 = 0.90) is greater than 0.3678, we conclude that TSVR is significantly better than SVR for stock market index datasets. For the nonlinear case, the absolute prediction error of SVR and TSVR is shown in Figs 9, 10 and 11 for the BVSP, DJI and IXIC datasets, respectively. The actual and predicted values of SVR and TSVR are plotted in Figs 12, 13 and 14 for the BVSP, DJI and IXIC datasets, respectively. It can be observed from these figures that the predictions of TSVR are in closer agreement with the desired output than those of SVR, which demonstrates the applicability and usefulness of TSVR.
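The rank comparison used above can be reproduced with a short sketch in plain Python. It uses the standard forms of the Friedman chi-square statistic, its F-distributed variant, and the post hoc critical difference for $k$ algorithms on $N$ datasets. The critical value 1.645 is an assumption: it is chosen because, for two algorithms, it reproduces the thresholds 0.3358 (N = 24) and 0.3678 (N = 20) quoted in the text. The function names are hypothetical, and any statistic values produced by this sketch are computed from the reported average ranks, not quoted from the paper.

```python
import math

def friedman_statistics(avg_ranks, n_datasets):
    """Return (chi-square, F) Friedman statistics from the average ranks."""
    k = len(avg_ranks)
    chi2 = 12.0 * n_datasets / (k * (k + 1)) * (
        sum(r * r for r in avg_ranks) - k * (k + 1) ** 2 / 4.0
    )
    f_stat = (n_datasets - 1) * chi2 / (n_datasets * (k - 1) - chi2)
    return chi2, f_stat

def critical_difference(k, n_datasets, q_alpha=1.645):
    """Post hoc critical difference; q_alpha = 1.645 is an assumed critical value."""
    return q_alpha * math.sqrt(k * (k + 1) / (6.0 * n_datasets))

# Thresholds for two algorithms on 24 and on 20 datasets
print(round(critical_difference(2, 24), 4))  # 0.3358
print(round(critical_difference(2, 20), 4))  # 0.3678
```

Two algorithms are declared significantly different when the gap between their average ranks exceeds the critical difference, which is the comparison made for the rank gaps 0.583334 and 0.90 above.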

Conclusion
In this paper, the support vector regression and twin support vector regression formulations are discussed in detail and applied to predict stock prices on individual company stock datasets from the information technology, banking, and oil and petroleum industries and on stock market index datasets of different countries. In TSVR, a pair of smaller-sized QPPs is solved instead of a single large QPP, as in the case of SVR, thus reducing the computational cost. To verify the effectiveness of TSVR, we performed numerical experiments for both linear and Gaussian kernels on financial time series datasets. In the experimental results, TSVR shows a better learning speed and a better generalization ability than SVR for both linear and Gaussian kernels. In fact, the computation time of TSVR is approximately four times lower than that of the standard SVR, which indicates its usability. In future work, a new model that is able to handle noise and outliers for predicting the prices of stock indices can be explored.