A Feature Fusion Based Forecasting Model for Financial Time Series

Predicting the stock market has become an increasingly interesting research area for both researchers and investors, and many prediction models have been proposed. In these models, feature selection techniques are used to pre-process the raw data and remove noise. In this paper, a prediction model is constructed to forecast stock market behavior with the aid of independent component analysis, canonical correlation analysis, and a support vector machine. First, two types of features are extracted by independent component analysis from the historical closing prices and from 39 technical variables. Second, canonical correlation analysis is used to combine the two types of features and extract intrinsic features that improve the performance of the prediction model. Finally, a support vector machine is applied to forecast the next day's closing price. The proposed model is applied to the Shanghai stock market index and the Dow Jones index, and experimental results show that it predicts more accurately than two other similar models.


Introduction
Detecting financial time series trends is a decision support process, and stock data is typically representative of a financial time series. Two types of approaches are used to predict the stock market, namely fundamental and technical analysis. The former predicts the stock price trend by using economic factors, while the latter utilizes historical data or some technical variables to forecast the stock price. The technical analysis model can be regarded as a pattern recognition problem [1]. The model is trained using historical data or technical variables, and current data is used to predict the future stock price.
Accordingly, there are two types of stock market forecasting systems. One predicts the direction of the stock price movement, which can be regarded as a classification problem. The other predicts the value of the stock price, which is commonly regarded as a regression problem. For the latter, two types of forecasting frameworks have been proposed in previous research: auto regression models and multi-variable regression models. The auto regression model deals with the problem using the principle of time series prediction. More specifically, the time series is divided into several segments, and the segments are then used as raw data to predict the future stock price. The basic idea of a multi-variable regression model is that related technical variables are selected as raw data to predict the future stock price.
Both auto regression and multi-variable regression models face the same problem: data pre-processing. Many methods have been proposed to remove noise and reduce the dimension of the raw data, such as Principal Component Analysis (PCA) [2], Kernel Principal Component Analysis [3], Perceptually Important Points [4], Piecewise Aggregate Approximation [5], Singular Spectral Analysis [6], Discrete Fourier Transform, Discrete Wavelet Transform [7], the Landmarks model [8], and Random Matrix Theory [9], [65]. Zhao and Zhang [10] proposed a dimension reduction framework for time series that keeps more coefficients for recent data and fewer for older data. Recently, Independent Component Analysis (ICA) has become a popular tool in the fields of signal processing and pattern recognition, where it is commonly used for feature extraction and blind signal separation.
Regarding prediction tools, soft computing methods such as Artificial Neural Networks (ANNs) and the Support Vector Machine (SVM) have become popular for stock market forecasting due to their excellent nonlinear regression performance. Feed-forward Neural Networks were the first models used to detect regularities in the stock market [11]. Subsequently, the Back Propagation Neural Network [12], Procedural Neural Networks [13], the Probabilistic Neural Network [14], the Functional Link Artificial Neural Network [15], the Recurrent Neural Network [16], and the Radial Basis Function Neural Network (RBFNN) [17] have been applied to stock market forecasting. However, ANNs are based on the Empirical Risk Minimization principle, which carries the risk of over-fitting and of convergence to local minima. Support Vector Regression (SVR) [18] is based on the structural risk minimization principle and offers a regression approach with good generalization ability. It has been successfully applied to financial time series prediction problems, as reported in [19], [20], [21] and [22].
When modeling financial time series using SVR, noise in the data can lead to over-fitting or under-fitting [23], so data pre-processing is a key problem in this task. As a novel pre-processing tool, ICA uses higher-order statistical information to separate the signals, rather than the second-order information of the sample covariance used in PCA. ICA can therefore reveal some underlying structure in the data, giving a fresh perspective on the mechanisms that influence stock market data. Recently, hybrid models that combine ICA and SVR have become popular for time series prediction tasks. When ICA and SVR are used under the auto regression framework, the result is known as the AICA-SVR model, as in [24], [25] and [26]. When ICA and SVR are used under the multi-variable regression framework, the result is called MICA-SVR, as in [27], [28] and [29]. Both apply ICA to extract features from the raw data and use SVR to predict the future price, so that ICA and SVR are jointly employed to improve the predictive performance. However, AICA-SVR focuses on the influence of the historical data on the closing price movement, while MICA-SVR is concerned with the influence of other technical variables. In fact, the stock price trend is related to both the closing price history and the current technical variables. In this study, a data-driven model named ICA-CCA-SVR is proposed, which predicts the stock closing price by considering the influence of both the historical closing price and the current technical variables, combining ICA, Canonical Correlation Analysis (CCA), and SVR. Experimental results on the Shanghai stock market index and the Dow Jones index show that the ICA-CCA-SVR model performs better than AICA-SVR and MICA-SVR.
The article is organized as follows. In Section 2, we provide a brief explanation of the theoretical background of ICA, SVR, and the AICA-SVR and MICA-SVR models. In the subsequent section, the proposed model ICA-CCA-SVR will be explained in depth. Section 4 presents the research design and experiments, and the experimental results are presented and discussed. The final section gives the conclusion and the limitations of this study.

Related Works
As a pre-processing tool, ICA is used in many prediction models. Lu et al [20] proposed a method to predict time series using ICA as a pre-processing tool. Matteson and Tsay [30] presented an ICA for multivariable time series analysis. Ahn et al [31] used ICA as a pre-processing tool together with hybrid ANNs for prediction in Customer Relationship Management, and their experimental results show that ICA outperforms PCA. Mok et al [32] used ICA to extract the underlying "news factors" from intraday stock data and used these factors to improve stock index predictions. Lizieri et al [33] applied an ICA procedure based on a kurtosis maximization algorithm to Real Estate Investment Trust data. The results show that ICA successfully captures the kurtosis characteristics of Real Estate Investment Trust returns. Kwak et al [69] applied ICA as a dimensionality reduction tool for data mining. Lu [34] proposed an integrated ICA-based denoising scheme with a neural network to predict the TAIEX closing index and the Nikkei 225 opening index. Wu and Yu [35] proposed the ICA-GARCH model, which is computationally efficient in estimating volatilities. The experimental results show that this method is more effective for modeling multivariate time series than PCA-GARCH. Cao and Chong [36] compared the performance of PCA, Kernel Principal Component Analysis, and ICA as feature extraction methods for an SVM that predicts the stock price. Among these studies, a typical auto regression prediction model based on ICA and SVR is the one proposed by Yeh et al [26]. They regard a stock market index as a chaotic time series, and predict the index by combining ICA and SVR after phase space reconstruction. Wu and Wei [29] proposed a multivariable regression model based on ICA and SVR that selects 18 technical variables as its input.
In the following section, we introduce the basic idea of ICA and SVR, as well as two important prediction models, AICA-SVR and MICA-SVR.
The principle of ICA
ICA is a tool for solving the blind source separation problem. The basic idea of ICA is to extract a set of statistically independent components (ICs) from the observed signal X. Originally, ICA was used for voice signal processing and digital image processing. Later, some researchers introduced this method to financial signal analysis in order to find the independent factors hidden in complex financial phenomena [37].
To describe the principle of ICA, consider $m$ observed signals $x_1(t), x_2(t), \cdots, x_m(t)$ and $n$ independent random signals $s_1(t), s_2(t), \cdots, s_n(t)$. The relationship between the vectors $x(t) = [x_1(t), x_2(t), \cdots, x_m(t)]^T$ and $s(t) = [s_1(t), s_2(t), \cdots, s_n(t)]^T$ can be described as follows:

$$x_i(t) = \sum_{j=1}^{n} a_{ij}\, s_j(t), \quad i = 1, 2, \cdots, m \qquad (1)$$

or, in matrix form,

$$x(t) = A\, s(t)$$

where $A$ is called the mixing matrix, $W = A^{-1}$ is the separation matrix, and each element $a_{ij}$ of $A$ is an unknown mixture coefficient. From formula (1), we can see that each observed signal $x_i(t)$ is a linear combination of the independent random signals $s_1(t), s_2(t), \cdots, s_n(t)$. Because they are statistically independent, the random signals $s_1(t), s_2(t), \cdots, s_n(t)$ are also linearly independent. They span a linear subspace and form a basis of that subspace. The coefficients $a_{ij}$ of the linear combination can therefore be regarded as the coordinates of $x_i(t)$ projected onto the subspace spanned by $s(t)$.
Hence, the $i$th row of matrix $A$ can represent the observed signal $x_i(t)$ as an intrinsic feature. In general, the mixing matrix $A$ and the independent components $s(t)$ are unknown, so the basic idea of ICA is to build an estimation model that obtains $A$, $W$ and $s(t)$ from the observed signal $x(t)$ under the assumption that the factors are statistically independent. Since the idea of ICA was proposed, various algorithms have been suggested to implement it, such as minimizing higher-order moments [38], [39], maximizing the mutual information of the outputs or the output entropy [40], minimizing the Kullback-Leibler divergence between the joint distribution and the product of the marginal distributions of the outputs [41], and a fixed-point algorithm for ICA [42]. Among these algorithms, the fixed-point algorithm has become a very popular way to implement ICA, due to its fast convergence and good stability. For details of the fixed-point algorithm, please refer to [43].
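The estimation step can be illustrated with a minimal sketch of a symmetric fixed-point iteration. This is not the exact algorithm of [42], [43]: the tanh nonlinearity, the PCA whitening step, the function names and the argument `k` (number of ICs kept) are all assumptions made here for illustration, and NumPy is assumed to be available.

```python
import numpy as np

def whiten(X, k):
    """Center X (rows = observed signals) and project onto k whitened components."""
    Xc = X - X.mean(axis=1, keepdims=True)
    cov = Xc @ Xc.T / Xc.shape[1]
    eigvals, eigvecs = np.linalg.eigh(cov)            # eigenvalues in ascending order
    eigvals, eigvecs = eigvals[-k:], eigvecs[:, -k:]  # keep the k largest
    K = (eigvecs / np.sqrt(eigvals)).T                # whitening matrix, shape (k, m)
    return K @ Xc, K

def sym_decorrelate(W):
    """Return (W W^T)^(-1/2) W, which makes the rows of W orthonormal."""
    s, u = np.linalg.eigh(W @ W.T)
    return (u / np.sqrt(s)) @ u.T @ W

def fast_ica(X, k, n_iter=200, tol=1e-8, seed=0):
    """Estimate a mixing matrix A (m x k) and components S (k x T) with X_centered ~ A S."""
    Z, K = whiten(X, k)
    T = Z.shape[1]
    rng = np.random.default_rng(seed)
    W = sym_decorrelate(rng.standard_normal((k, k)))
    for _ in range(n_iter):
        G = np.tanh(W @ Z)                            # assumed contrast nonlinearity
        # Fixed-point update: E[z g(w^T z)] - E[g'(w^T z)] w, for all rows at once
        W_new = G @ Z.T / T - np.diag((1.0 - G ** 2).mean(axis=1)) @ W
        W_new = sym_decorrelate(W_new)
        # Converged when every row keeps its direction (up to sign)
        change = np.max(np.abs(np.abs(np.sum(W_new * W, axis=1)) - 1.0))
        W = W_new
        if change < tol:
            break
    S = W @ Z                                         # estimated independent components
    A = np.linalg.pinv(W @ K)                         # estimated mixing matrix
    return A, S
```

Under these assumptions, row `A[i]` plays the role of the intrinsic feature of the observed signal $x_i(t)$, and `k` controls the reduced feature dimension.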
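The theory of SVR
SVM can be used for both classification and regression problems; the former is called a Support Vector Classifier and the latter SVR. To describe the principle of SVR, consider a training set $X = \{x_1, x_2, \cdots, x_N\}$, $Y = \{y_1, y_2, \cdots, y_N\}$, where $x_i$ is the input of the SVR and $y_i$ is the output. SVR approximates the function as Eq. (4),

$$f(x) = w \cdot \varphi(x) + b \qquad (4)$$

where $w$ is the weight vector, $b$ is a constant, and $\varphi(x)$ is a nonlinear function that maps $x$ from the input space to a high-dimensional space in order to transform the nonlinear problem into a linear one. Any function that meets Mercer's condition can be used as the kernel function, such as the Gaussian kernel function, the polynomial kernel function, and the perceptron (sigmoid) kernel function. In their standard forms, the mathematical expressions are $K(x, x_i) = \exp(-\|x - x_i\|^2 / (2\sigma^2))$, $K(x, x_i) = (x \cdot x_i + 1)^p$, and $K(x, x_i) = \tanh(\beta\, x \cdot x_i + \theta)$, respectively. $w$ and $b$ can be estimated by minimizing the function

$$\min_{w, b} \; \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{N} L_\varepsilon\big(y_i, f(x_i)\big) \qquad (5)$$

where $\frac{1}{2}\|w\|^2$ is the regularization term. Minimizing $\frac{1}{2}\|w\|^2$ controls the capacity of the model and improves its generalization performance. $C$ is a regularization constant that specifies the trade-off between the empirical risk and the regularization term. $L_\varepsilon(\cdot)$ is the $\varepsilon$-insensitive loss function, which is defined as Eq. (6),

$$L_\varepsilon\big(y, f(x)\big) = \begin{cases} 0, & |y - f(x)| \le \varepsilon \\ |y - f(x)| - \varepsilon, & \text{otherwise} \end{cases} \qquad (6)$$

where $\varepsilon$ is a precision parameter that represents the tube size of the SVR. Both $C$ and $\varepsilon$ need to be preset before the SVR is built. By introducing the positive slack variables $\xi_i$ and $\xi_i^*$, we can transform Eq. (5) into the following objective, Eq. (7):

$$\min_{w, b, \xi, \xi^*} \; \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{N} (\xi_i + \xi_i^*) \qquad (7)$$

subject to

$$y_i - w \cdot \varphi(x_i) - b \le \varepsilon + \xi_i, \quad w \cdot \varphi(x_i) + b - y_i \le \varepsilon + \xi_i^*, \quad \xi_i, \xi_i^* \ge 0, \quad i = 1, \cdots, N.$$

By introducing Lagrange multipliers $\alpha_i$, $\alpha_i^*$ and solving the quadratic programming problem, the decision function can be expressed as Eq. (8),

$$f(x) = \sum_{i=1}^{N} (\alpha_i - \alpha_i^*)\, K(x, x_i) + b \qquad (8)$$

where $K$ is the kernel matrix, and the element $K(x_i, x_j)$ equals the inner product of $\varphi(x_i)$ and $\varphi(x_j)$ in the high-dimensional space, which can be computed by the kernel function.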
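The ingredients of this formulation can be sketched in a few lines. The sketch below assumes the common $\exp(-\|x - z\|^2 / (2\sigma^2))$ parameterization of the Gaussian kernel and the usual kernel-expansion form of the SVR decision function; the function names and the `coeffs` argument (standing for the differences of the two Lagrange multipliers of each support vector) are illustrative and not taken from the paper.

```python
import math

def gaussian_kernel(x, z, sigma):
    """Gaussian kernel with bandwidth sigma (assumed parameterization)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def eps_insensitive_loss(y, f, eps):
    """Zero inside the tube of half-width eps, linear outside it."""
    return max(0.0, abs(y - f) - eps)

def svr_predict(x, support_vectors, coeffs, b, sigma):
    """Kernel expansion: sum_i coeffs[i] * K(x, x_i) + b."""
    return sum(c * gaussian_kernel(x, sv, sigma)
               for c, sv in zip(coeffs, support_vectors)) + b
```

Errors smaller than `eps` cost nothing, which is why only the samples on or outside the tube end up as support vectors with non-zero coefficients.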

The auto regression model based on ICA and SVR
In order to obtain a return on investment, investors commonly care more about the future stock price, especially the closing price, than about other issues. The auto regression forecasting model is built on time series analysis and aims to predict the future closing price from historical data. Since both the input and the output are closing prices, we call this type of model an auto regression model. Given a stock time series, the training and testing samples are obtained sequentially as the sliding window moves from the beginning to the end. For example, typical input and output data of the auto regression model for stock market forecasting are shown in Fig. 1. The gray block represents the closing prices of the previous $n$ trading days, and the white block represents the closing price of the $(n+1)$th trading day. As the window slides, $N$ input samples are obtained from the $m$-year trading data set. If we want to predict the stock closing price at time $t+1$, then $x(t), x(t-1), \cdots, x(t-n+1)$ are used as the input of the model, and the output is $x(t+1)$. Here $n$ is the length of the sliding window, which is selected empirically.
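The sample construction described above can be sketched as follows. The function name and the list-based data layout are illustrative assumptions.

```python
def sliding_window_samples(prices, n):
    """Pair each window of n consecutive closing prices with the next day's price."""
    inputs, targets = [], []
    for end in range(n, len(prices)):
        inputs.append(prices[end - n:end])  # x(t-n+1), ..., x(t)
        targets.append(prices[end])         # x(t+1)
    return inputs, targets
```

For a series of length $L$, this yields $L - n$ input and output pairs, each input being one segment of the series.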
Yeh et al [26] proposed the auto regression model based on ICA and SVR for time series prediction. Here, we call it the AICA-SVR model. Fig. 2 gives the stock market forecasting framework to which the model is applied. We can see that the model contains three stages: (1) the sliding window is used to prepare the input data, (2) the data are pre-processed by ICA, and (3) the forecast is made by SVR. The AICA-SVR model focuses only on the effect of the closing price itself, and does not pay attention to other related factors. In other words, this model behaves as if all the related factors were reflected in the closing price of the stock, so that the historical closing price decides the future trends. In reality, the movement of the stock price is determined by numerous factors [44], and one single factor cannot represent all the aspects needed to predict the future trends accurately.

The multi-variable regression model based on ICA and SVR
In previous studies [45], [22], [46], researchers have held that some technical variables are useful for predicting price movement, such as moving averages, the relative strength index, the oscillator, the Williams index and so on. Based on this concept, numerous multi-variable regression models have been proposed to predict the stock price. Under this type of framework, current technical variables are selected as the input of the forecasting model and the next day's price is the output. In Fig. 3, the top row of blocks represents the output of the model, and the bottom row represents the input. For example, at time $t$, the technical variables $I_1, I_2, \cdots, I_N$ are used as the input to predict the closing price $x(t+1)$ at time $t+1$. However, different models use different variables, and there is no unified measure for the selection of input technical variables [47] [48] [49]. For example, Ettes [50] selected only two input variables while Zorin and Boriso [51] used sixty-one input variables. Recently, an alternative approach has been proposed in which a sufficiently large set of variables is selected first, and component analysis methods are then used to extract the intrinsic features from them [49], [68]. After the features are extracted, the dimension of the raw data is reduced and the noise is filtered.
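The input and output pairing of this framework can be sketched as below. The function name and the row-per-day data layout are illustrative assumptions.

```python
def multivariable_samples(indicators, close):
    """Pair the technical variables of day t with the closing price of day t+1.

    indicators[t] is the list of technical variables observed on day t,
    and close[t] is the closing price of the same day.
    """
    if len(indicators) != len(close):
        raise ValueError("indicators and close must cover the same days")
    inputs = indicators[:-1]   # I_1, ..., I_N at time t
    targets = close[1:]        # x(t+1)
    return inputs, targets
```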
Lu and Wang [27], Samsudin et al [28], and Wu and Wei [29] proposed multi-variable regression models based on ICA and SVR for stock market prediction. Here, we call this type of model the MICA-SVR regression prediction model. Its framework is depicted in Fig. 4. We can see that this model contains three stages: (1) the initial exploration to prepare the technical variables, (2) dimension reduction by ICA, and (3) the forecast by SVR. Compared to AICA-SVR, MICA-SVR takes some related factors into account and stresses the importance of other technical variables. However, it is not reasonable to treat the stock closing price as coequal with the other technical variables. In fact, the historical values of the stock closing price play the most important role in determining its future values.

Feature fusion
Information fusion is a new, high-level technology which collects different information about the same object from multiple sensors and removes redundant information or noise from the mutual information. Commonly, there are three different fusion levels: data level fusion, feature level fusion, and decision level fusion [52]. Due to its simplicity, feature level fusion is widely used in image recognition and fault diagnosis. The basic idea of feature fusion is to extract more than one type of feature from the original data, and to combine these features by using some fusion technique. In terms of form, there are three different types of fusion: series fusion, parallel fusion, and complex vector fusion, all of which have been applied in various research fields [53], [54], [55], [67]. Feature fusion supplies a useful method to combine different features into a union feature for the same recognition problem. The advantage of feature fusion is that the new union feature not only keeps the useful information of the original features, but also eliminates redundant information to a certain degree.

Method
The forecasting model based on ICA, CCA and SVR
Both the AICA-SVR and the MICA-SVR models can be regarded as pattern recognition systems. For this type of problem, the features of the input are a key factor affecting the prediction accuracy. AICA-SVR focuses on the influence of the historical price on the closing price movement, while MICA-SVR is more concerned with the influence of other technical variables. The two models thus use different features to deal with the same problem, and these two types of features are both correlated and complementary.
In this paper, we propose a stock market predictive framework based on feature fusion. In this framework, an auto regression module extracts feature A from the history data of the closing price, and a multi-variable regression extracts feature B from related technical variables. The feature fusion module combines feature A and B to create a union feature by using certain fusion methods. The union feature is the input of prediction tool such as ANNs or SVR, and the output is the predicted future closing price.
In [53], a feature fusion framework is proposed that adopts the idea of CCA for pattern recognition. Inspired by this method, we discuss a specific stock market prediction model based on a feature fusion framework. This model combines AICA-SVR and MICA-SVR, and uses CCA as the feature fusion tool to extract the union feature. SVR is used to predict the future closing price. Since this model comprehensively applies ICA, CCA and SVR, we call it the ICA-CCA-SVR model.
The first stage of this model extracts two types of features, from the historical closing price and from the multiple technical variables, respectively. The time series of historical closing prices is divided into segments by the sliding window, and ICA is then used to extract the first type of feature, feature A, from each segment. The length of a segment equals the width of the sliding window. There is no definite conclusion about how to select the width of the sliding window. Some studies have indicated that recent data have a bigger influence on the future price than older data [56], [57]. At the same time, some researchers believe that the movement of the stock price is periodic [58], [59], so a year, a month or a week may be used as the unit of length. Based on these considerations, we selected 30 days as the width of the sliding window to produce the segments in this study.
Another type of feature, feature B, is extracted from pre-selected technical variables by the ICA method. Thirty-nine technical variables for each day are selected as the raw data of the MICA module in this study. The names and descriptions of the variables are displayed in Table 1; the variables are taken from [48], [49]. Their meaning and role can be interpreted as follows. Open price, close price, high price and low price present the basic information on the movement of the stock market. The moving average is used to identify the direction of the price trend. BIAS serves as an indicator of overbought or oversold conditions and of price breakouts. The exponential moving average returns the exponential moving average of the specified period. Moving average convergence divergence displays trend-following and momentum characteristics. The stochastic oscillator measures how much a price tends to close in the upper or lower areas of its trading range. Price rate of change shows the speed at which a stock's price is changing. True range returns a numeric value containing the difference between the true high and true low of the price. Momentum can help pinpoint the end of a decline or advance. The Williams index uses stochastics to determine overbought and oversold conditions. The oscillator shows how a stock's price is doing relative to past movements. The Relative Strength Index shows how strongly a stock is moving in its current direction. The psychological line reflects the buying power in relation to the selling power. On balance volume combines price and volume to show how money may be flowing into or out of a stock. The Bollinger band shows the upper and lower limits of normal price movements based on the standard deviation of prices. The emotional index is the most important index for measuring changes in the balance of power between the long and short sides. The AR index, known as the popularity indicator, and the BR index, known as the willingness-to-trade indicator, are both important measures of changes in the strength of long and short market forces. Other variables (e.g. $I_5$, $I_{12}$, $I_{33}$ to $I_{39}$) reflect the change of the close price, exponential moving average, stochastic %K and %D, and moving average. For more detail about the technical indicators, please refer to [64].
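To make a few of these variables concrete, the sketch below computes four of them from a list of closing prices. The formulas are common textbook definitions and are an assumption here; the exact definitions and periods used in Table 1 may differ, and the function names are illustrative.

```python
def moving_average(close, n):
    """Simple n-day moving average, one value per day from day n onward."""
    return [sum(close[i - n + 1:i + 1]) / n for i in range(n - 1, len(close))]

def bias(close, n):
    """Percentage deviation of the close from its n-day moving average."""
    ma = moving_average(close, n)
    return [(c - m) / m * 100.0 for c, m in zip(close[n - 1:], ma)]

def momentum(close, n):
    """Difference between today's close and the close n days earlier."""
    return [close[i] - close[i - n] for i in range(n, len(close))]

def rate_of_change(close, n):
    """Percentage change of the close over n days."""
    return [(close[i] - close[i - n]) / close[i - n] * 100.0
            for i in range(n, len(close))]
```

Each function returns a shorter series than its input because the first few days lack enough history, so the indicator series have to be aligned by date before they are stacked into the daily input vector.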
In order to extract features A and B, the historical data and the technical variable data are organized into the observed data $x(t) = [x_1(t), x_2(t), \cdots, x_m(t)]^T$ according to Figs 1 and 3, respectively. Then, a fixed-point algorithm is used to generate the mixing matrix $A$ and the independent components $s(t)$ in formula (1). The $i$th row $A_i$ of matrix $A$ is regarded as the ICA feature of the observed data $x_i(t)$. If $x(t)$ is generated from the historical data of the stock price, then $A_i$ is feature A; if $x(t)$ is generated from the technical variable data, then $A_i$ is feature B.
In the ICA algorithm, the selection of the ICs subspace is a key issue. Bartlett et al [43] have proposed three methods to tackle this problem: (1) a method based on the amplitude of the weight vector; (2) the PCA-ICA method; and (3) a scaling factor method based on cluster analysis. Method (2) depends heavily on PCA, and the selected subspace is confined to the PCA subspace. Method (3) is suited to classification problems but not to regression problems. Therefore, Method (1), based on the amplitude of the weight vector, is selected to reduce the dimension of the raw data in this study.
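The second stage of the model is the fusion module, in which CCA is used as the fusion tool. Hotelling [60] developed CCA to analyze the correlation between two random vectors. Suppose that $x = \{x_1, x_2, \cdots, x_N\}$ and $y = \{y_1, y_2, \cdots, y_N\}$ represent features A and B extracted from the AICA module and the MICA module, respectively, where $x_i \in \mathbb{R}^p$ and $y_i \in \mathbb{R}^q$ are the features of the $i$th sample. The basic idea of CCA is to find two projection directions $a$ and $b$ that maximize the correlation between $a^T x_i$ and $b^T y_i$ while minimizing the correlation among the elements within each projected feature. The Pearson correlation coefficient can be used to measure the relationship between $a^T x_i$ and $b^T y_i$. We search for the optimal $a$ and $b$ that maximize $\mathrm{Corr}(a^T x_i, b^T y_i)$, so the following objective function is given:

$$\max_{a, b} \; \mathrm{Corr}(a^T x, b^T y) = \frac{a^T S_{xy}\, b}{\sqrt{a^T S_{xx}\, a}\, \sqrt{b^T S_{yy}\, b}} \qquad (9)$$

where $S_{xx} = E[xx^T]$ and $S_{yy} = E[yy^T]$ are the covariance matrices of $x$ and $y$, respectively, while $S_{xy} = E[xy^T]$ denotes the between-set covariance matrix of $x$ and $y$. Given the constraint

$$a^T S_{xx}\, a = b^T S_{yy}\, b = 1 \qquad (10)$$

and introducing Lagrange multipliers $\lambda_1, \lambda_2$, maximizing the objective function (9) is transformed into maximizing

$$L(a, b) = a^T S_{xy}\, b - \frac{\lambda_1}{2}\left(a^T S_{xx}\, a - 1\right) - \frac{\lambda_2}{2}\left(b^T S_{yy}\, b - 1\right) \qquad (11)$$

The partial derivatives of $L(a, b)$ with respect to $a$ and $b$ are then set to zero:

$$\frac{\partial L}{\partial a} = S_{xy}\, b - \lambda_1 S_{xx}\, a = 0 \qquad (12)$$

$$\frac{\partial L}{\partial b} = S_{yx}\, a - \lambda_2 S_{yy}\, b = 0 \qquad (13)$$

where $S_{yx} = S_{xy}^T$. Left-multiplying Eq. (12) by $a^T$ and Eq. (13) by $b^T$ and applying the constraint (10) gives $\lambda_1 = \lambda_2 = \lambda = a^T S_{xy}\, b$. Substituting Eq. (13) into Eq. (12), and vice versa, we obtain the following eigen-equations:

$$S_{xy} S_{yy}^{-1} S_{yx}\, a = \lambda^2 S_{xx}\, a \qquad (14)$$

$$S_{yx} S_{xx}^{-1} S_{xy}\, b = \lambda^2 S_{yy}\, b \qquad (15)$$

where $a$ and $b$ are the corresponding eigenvectors. The projection matrices $A_d = \{a_1, a_2, \cdots, a_d\}$ and $B_d = \{b_1, b_2, \cdots, b_d\}$ are composed of the eigenvectors corresponding to the $d$ largest eigenvalues of Eqs. (14) and (15). After $A_d$ and $B_d$ are calculated, the fusion feature of $x$ and $y$ can be obtained by the following equation:

$$z = \begin{pmatrix} z_1 \\ z_2 \end{pmatrix} = \begin{pmatrix} A_d^T x \\ B_d^T y \end{pmatrix}$$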
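A minimal numerical sketch of this fusion step is given below, assuming NumPy is available. It solves the CCA eigenproblem through a singular value decomposition of the whitened cross-covariance, which is mathematically equivalent to the eigenvector formulation but is a choice made here for numerical convenience. The small `ridge` term that keeps the covariance matrices invertible, the function names, and the row-per-sample layout are likewise assumptions of this sketch.

```python
import numpy as np

def inv_sqrt(S):
    """Inverse square root of a symmetric positive definite matrix."""
    w, V = np.linalg.eigh(S)
    return (V / np.sqrt(w)) @ V.T

def cca_fuse(X, Y, d, ridge=1e-8):
    """Serially fuse two feature sets with CCA.

    X: (N, p) feature A, Y: (N, q) feature B, rows are samples.
    Returns the fused features (N, 2d), the projections A_d and B_d,
    and the first d canonical correlations.
    """
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    N = X.shape[0]
    Sxx = Xc.T @ Xc / N + ridge * np.eye(X.shape[1])
    Syy = Yc.T @ Yc / N + ridge * np.eye(Y.shape[1])
    Sxy = Xc.T @ Yc / N
    Sxx_is, Syy_is = inv_sqrt(Sxx), inv_sqrt(Syy)
    # Singular values are the canonical correlations, in descending order
    U, rho, Vt = np.linalg.svd(Sxx_is @ Sxy @ Syy_is)
    A_d = Sxx_is @ U[:, :d]       # satisfies A_d^T Sxx A_d = I
    B_d = Syy_is @ Vt.T[:, :d]    # satisfies B_d^T Syy B_d = I
    Z = np.hstack([Xc @ A_d, Yc @ B_d])
    return Z, A_d, B_d, rho[:d]
```

The first `d` columns of `Z` are the projected feature A and the last `d` columns the projected feature B; column `j` of the first half is maximally correlated with column `j` of the second half, while columns within each half are uncorrelated.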
In order to solve Eq. (14), $S_{xx}$ and $S_{yy}$ must be nonsingular. If $S_{xx}$ and $S_{yy}$ are singular, we can use the perturbation method (Hong 1999) to modify them. The main idea is that a small perturbation is added to the singular $S_{xx}$ and $S_{yy}$ so that they become nonsingular, i.e. full-rank matrices. For more details, please refer to [66].
The third stage of the model is the SVR, which is used to predict the future value of the closing price. After the fusion feature is obtained, it can be used to train the stock market prediction model based on SVR. The first step is to choose the kernel function, since different kernel functions may yield different performance. Research indicates that the Gaussian kernel function performs well on forecasting problems [20]. As it is suitable for financial series prediction, we choose the Gaussian kernel function for SVR. Another key problem of the SVR is to decide the parameters $C$, $\varepsilon$ and $\sigma$, which strongly affect the predictive performance. The selection of $C$, $\varepsilon$ and the kernel parameter $\sigma$ is an open problem, and a cross-validation method is commonly used in practice.

Model analysis
In this section, we give an intuitive analysis of the ICA-CCA-SVR model. The advantages of the proposed model can be attributed to the following reasons.
First, as a component analysis tool, ICA is often compared to another popular component analysis tool, PCA. The first difference between ICA and PCA is that the components of PCA are orthogonal, while those of ICA are independent. The second difference is that PCA can only extract second-order statistical characteristics of the observed signal, whereas ICA can capture the higher-order statistical characteristics hidden in the signal. Moreover, applying PCA to signal analysis presumes that the original data follow a normal distribution. Unfortunately, this condition is not satisfied in practical applications such as stock market analysis. ICA does not require the original data to follow a normal distribution.
Second, the main characteristic of ICA-CCA-SVR is that it combines the two types of features that are used in the AICA-SVR and MICA-SVR models. The advantages of this structure are the following. (1) From the feature extraction principle of ICA, the features in the AICA-SVR and MICA-SVR models are the coordinates of the raw data projected into the ICA subspace, that is, the row vectors of the mixing matrix $A$. In the ICA subspace, the basis vectors are independent because they represent the independent components of the raw data. However, from Eq. (3), we can see that the coefficients of the linear combination are not independent, indicating that the features in the AICA-SVR and MICA-SVR models are not independent. In this case, the features in both models must contain redundant information. (2) The features in AICA-SVR and MICA-SVR describe the same predictive object. The information common to the two features is clearly the most important factor for predictive performance. That is, the two types of features have a certain correlation. (3) The features in AICA-SVR and MICA-SVR are extracted from different observation points of the same predictive object. The former pays attention to the influence of the historical data of the predictive object, and the latter focuses on related factors outside the predictive object. Thus the two types of features are mutually complementary. Based on the discussion above, removing redundant information and combining the two types of features is quite useful for improving the predictive performance. From section 3.1, the union feature of the ICA-CCA-SVR model is made up of two parts, $z_1$ and $z_2$. Based on the principle of CCA, the elements within $z_1$ and within $z_2$ have minimum correlation, so compared to the original $x$ and $y$, $z_1$ and $z_2$ have lower information redundancy. At the same time, there is maximum correlation between $z_1$ and $z_2$, so serial feature fusion strengthens the predictive information.
Finally, compared to ANNs and the Radial Basis Function Neural Network, SVR, which is based on structural risk minimization, has a stronger generalization capability for tackling the stock market forecasting problem. A cross-validation method is used to select the SVM parameters, which gives the model better adaptability.
We give a computational complexity analysis of the three models. Let the dimensions of the original features of AICA-SVR and MICA-SVR be $d_1$ and $d_2$, respectively, the number of training samples be $N$, the dimension of feature A and feature B be $d$, and $N_i$ be the maximum number of iterations of the ICA algorithm. Then the computational complexities of AICA, MICA, SVR and CCA are $O(d^3 \cdots)$. In the proposed model, $N \gg d_1, d_2 > d$ (e.g. $N = 697$, $d_1 = 30$, $d_2 = 39$ and $d < 30$ in the case study on the Shanghai stock market), so the computational complexity of the three models is dominated by $O(N^3)$. That is, the computational complexities of the three models have the same order of magnitude.

Experiments
To evaluate the performance of the ICA-CCA-SVR model, we performed experiments on two real-world datasets: the Shanghai stock market index and the Dow Jones index. Comparisons were made with the AICA-SVR and the MICA-SVR models. The experiments were run on a PC with an Intel(R) Core(TM) i3 CPU and 2 GB of RAM, on the MATLAB 7.0 platform.

Data set description
The Shanghai stock market index data collected from January 4, 2003 to December 31, 2005 are used in this experiment. The overall data includes 1180 trading days' data, which are split into two parts: January 4, 2003 to December 31, 2004 and January 1, 2005 to December 31, 2005. The former, which includes 726 trading days' data, is used as the training set, and the latter, which includes 242 trading days' data, is used as the testing set.
To test the robustness of the model, we selected three years' worth of data from the Dow Jones index, comprising two years of data for the training set and one year of data for the testing set. The Dow Jones index data were collected from January 2, 2003 to December 31, 2005. The overall data are split into two parts. The former, which includes 507 trading days' data, is used as the training set, and the latter, which includes 252 trading days' data, is used as the testing set.

Experimental settings and measure index selection
As discussed in Section 3, a sliding window of length 30 days is used to build the raw data of the AICA module, and the 39 technical variables in Table 1 are used as the raw data of the MICA module. The forecasting performance of the proposed ICA-CCA-SVR model is compared with that of the AICA-SVR and MICA-SVR models. AICA-SVR uses previous prices to predict the next day's price, MICA-SVR uses current technical variables, and ICA-CCA-SVR uses both previous prices and current technical variables. To build the three models, we use the libsvm toolbox developed by Chih-Jen Lin, a professor at National Taiwan University (http://www.csie.ntu.edu.tw/~cjlin/), to run the SVM algorithm. Cao and Tay [56] showed that SVR is insensitive to $\varepsilon$ as long as it takes a reasonable value. Therefore, we choose $\varepsilon = 0.01$ in all the experiments in this study. To determine the kernel bandwidth $\sigma$ and the margin parameter $C$, three-fold cross validation is used to choose the values that yield the best results, where $\sigma$ and $C$ range from $2^{-8}$ to $2^{8}$ and the exponent is varied in steps of 1. ICs are ordered by the method based on the amplitude of the weight vector.
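The parameter search described above can be sketched as follows. This is a minimal illustration rather than the code used in the experiments: the function names, the contiguous fold layout, and the `cv_error` callback (which stands in for training an SVR on one fold and returning its validation error) are all assumptions introduced here. The mapping between the bandwidth $\sigma$ and the kernel parameter that libsvm actually receives is not stated in the text, so the sketch simply searches over the stated range.

```python
import itertools


def exponent_grid(low=-8, high=8, step=1):
    """Candidate values 2**low, ..., 2**high, with the exponent advancing by `step`."""
    return [2.0 ** e for e in range(low, high + 1, step)]


def three_fold_indices(n_samples, n_folds=3):
    """Yield (train_idx, valid_idx) pairs.

    Contiguous folds are an assumption; the text does not say how folds were formed.
    """
    fold_sizes = [n_samples // n_folds + (1 if i < n_samples % n_folds else 0)
                  for i in range(n_folds)]
    start = 0
    for size in fold_sizes:
        valid = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, valid
        start += size


def grid_search(n_samples, cv_error, epsilon=0.01):
    """Return (best_mean_error, C, sigma) over the 17 x 17 grid.

    `cv_error(train_idx, valid_idx, C, sigma, epsilon)` is a hypothetical callback
    that fits an SVR on the training fold and returns the validation error.
    """
    best = None
    for C, sigma in itertools.product(exponent_grid(), exponent_grid()):
        fold_errors = [cv_error(train, valid, C, sigma, epsilon)
                       for train, valid in three_fold_indices(n_samples)]
        mean_error = sum(fold_errors) / len(fold_errors)
        if best is None or mean_error < best[0]:
            best = (mean_error, C, sigma)
    return best
```

With an exponent step of 1, each parameter takes 17 candidate values, so the search evaluates 289 parameter pairs, each with three fits.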
The predictive performance is evaluated using the following measures: Correlation Coefficient (r), Nonlinear Regression Multiple Correlation Coefficient ($R^2$), Mean Absolute Error (MAE), Mean Absolute Percentage Error (MAPE), Mean Squared Error (MSE), and Root Mean Square Error (RMSE) [61], [62], [63]. The descriptions and formulae of these indicators are given in columns 2 and 3 of Table 2. These indicators measure how close the predicted values are to the actual values. Larger values of r and $R^2$ indicate that the predicted values are closer to the actual values, as do smaller values of MAE, MAPE, MSE, and RMSE. In the table, $y(t)$ and $\hat{y}(t)$ represent the actual value and the predicted value, respectively.
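As an illustration, the six indicators can be computed as in the sketch below. It assumes the standard textbook definitions, including $R^2 = 1 - \mathrm{SSE}/\mathrm{SST}$ and MAPE expressed as a fraction; the formulae in Table 2 may differ in detail, and the function name `forecast_metrics` is introduced here for illustration only.

```python
import math


def forecast_metrics(actual, predicted):
    """Return r, R2, MAE, MAPE, MSE and RMSE for two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    errors = [a - p for a, p in zip(actual, predicted)]

    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    # MAPE as a fraction (0.01 means 1%); actual values must be non-zero.
    mape = sum(abs(e / a) for e, a in zip(errors, actual)) / n

    # Pearson correlation between actual and predicted values.
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    r = cov / math.sqrt(var_a * var_p)

    # Assumed definition: R2 = 1 - SSE / SST.
    r2 = 1.0 - sum(e * e for e in errors) / var_a

    return {"r": r, "R2": r2, "MAE": mae, "MAPE": mape,
            "MSE": mse, "RMSE": math.sqrt(mse)}


# Toy values for illustration only; they are not taken from the paper's data.
metrics = forecast_metrics([100.0, 102.0, 104.0, 106.0],
                           [101.0, 101.0, 105.0, 105.0])
print(metrics)
```

Under these definitions RMSE is the square root of MSE, so only one of the two needs to be inspected when ranking models.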

Experimental result
As discussed in Section 2, the selection of ICs is the key issue in data pre-processing. Different numbers of ICs correspond to different dimensionalities of features A and B, which strongly influence the predictive performance of the models. To determine the dimensionality of features A and B in our framework, we compare the correlation coefficient r as we vary the dimensionality Dim alone. For the AICA-SVR model, since the dimensionality of the raw data is 30, the dimensionality of feature A ranges from 1 to 29. To allow comparison with the AICA-SVR model, the dimensionality of feature B is also varied from 1 to 29.
The curves of r against Dim for the Shanghai stock index and the Dow Jones index are displayed in Figs. 6 and 7, respectively. These figures not only identify the optimal dimensionality but also illustrate the validity of the features selected by the three models. Figs. 6 and 7 compare AICA-SVR, MICA-SVR, and ICA-CCA-SVR in terms of r. The number Dim on the X-axis of the figures refers to the dimensionality of the features produced by the different feature selection models. Note that the actual dimensionality of the ICA-CCA-SVR model is $2 \times p$ when the dimensionality of the MICA-SVR and AICA-SVR models is $p$, based on Eq. (15). For convenience, we draw the curves of r against Dim for the three models in a single figure. From the curves in Figs. 6 and 7, we can see that the AICA-SVR model fluctuates the most among all models, indicating that it is not as stable as the other two models. One possible underlying reason is that a single variable does not contain sufficient information, whereas multiple variables do. On the Shanghai stock market index, when the dimensionality is smaller than 5, the r of the AICA-SVR model is higher than that of the MICA-SVR model. However, when the dimensionality is more than 8, the r of the MICA-SVR model is much higher than that of AICA-SVR. The highest r of the MICA-SVR model is 0.91685, which is also much higher than the 0.8174 of AICA-SVR. On the Dow Jones index, the r of the MICA-SVR model is higher than that of AICA-SVR over the whole range of dimensionality.
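The doubling of dimensionality noted above can be illustrated with a short sketch. It assumes that the fusion concatenates the two projected feature vectors (the serial strategy named in the Conclusion); Eq. (15) is not reproduced here, and the function name `serial_fusion` is hypothetical.

```python
def serial_fusion(feature_a, feature_b):
    """Concatenate two p-dimensional projected feature vectors into one of length 2p."""
    if len(feature_a) != len(feature_b):
        raise ValueError("both projected features must have the same dimensionality p")
    return list(feature_a) + list(feature_b)


# Toy vectors with p = 3, for illustration only.
fused = serial_fusion([0.1, 0.2, 0.3], [1.0, 2.0, 3.0])
print(len(fused))
```

A point plotted at Dim = p for ICA-CCA-SVR therefore corresponds to an SVR input of length 2p under this assumption.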
The performance of the ICA-CCA-SVR model becomes superior to that of both MICA-SVR and AICA-SVR as Dim increases. On the Shanghai stock market index, the ICA-CCA-SVR curve lies above those of MICA-SVR and AICA-SVR at all but the first two points, where it shows no distinct advantage and is even lower than AICA-SVR because the predictive information is not sufficient at low dimensionality. The highest r of the ICA-CCA-SVR model is 0.95174, which is also much higher than that of AICA-SVR and MICA-SVR. At the same time, as Dim increases, the ICA-CCA-SVR model shows stable predictive performance compared with the other models. On the Dow Jones index, when the dimensionality is smaller than 5, the r of the ICA-CCA-SVR model is lower than that of the MICA-SVR model, and it is even lower than that of AICA-SVR when Dim is equal to 3, 4 and 5. However, when the dimensionality is increased to 6, the r of the ICA-CCA-SVR model is much higher than that of AICA-SVR and MICA-SVR. The ICA-CCA-SVR model obtains the highest r among the three models, 0.87446.
The actual Shanghai stock market index and the values predicted by all three models are illustrated in Fig. 8, and Fig. 9 shows the same curves for the Dow Jones index. It can be observed from Fig. 8 that the predicted values obtained from the proposed ICA-CCA-SVR model are closer to the actual values than those of the MICA-SVR and AICA-SVR models. From Fig. 9, we can see that the predictions of all three models fit the Dow Jones index less well than they fit the Shanghai stock market index. Even so, the ICA-CCA-SVR model remains superior to the other two models.
For comparison, the AICA-SVR and MICA-SVR models were applied to evaluate the prediction accuracy of the proposed ICA-CCA-SVR model. Table 3 and Table 4 show the prediction results for the AICA-SVR, MICA-SVR and proposed ICA-CCA-SVR models. In the tables, Dim, BestC, and Bestg represent the optimal dimensionality, parameter $C$, and parameter $\sigma$ for each model, respectively. The comparison results show that the proposed ICA-CCA-SVR model has the smallest RMSE, MAPE, MSE and MAE values and the highest $R^2$ and r values in comparison with MICA-SVR and AICA-SVR. Table 3 compares the forecasting results of the three models for the Shanghai stock market index. It can be seen from the table that the ICA-CCA-SVR model performs much better than the other two models. All the performance measures of the ICA-CCA-SVR model improve markedly after feature fusion. For example, the MAPE, RMSE, MAE and MSE values of the ICA-CCA-SVR model are 0.011, 16.54, 12.638 and 273.56, respectively, which are much lower than those of the AICA-SVR and MICA-SVR models. The $R^2$ and r values of the ICA-CCA-SVR model reach 0.9486 and 0.95174, which are much higher than those of the AICA-SVR and MICA-SVR models. Of the two baseline models, MICA-SVR performs better than AICA-SVR.
Table 4 compares the forecasting results of the three models for the Dow Jones index. The three models do not work as well as they do on the Shanghai stock market index. The results show that the proposed model again has the lowest MAPE, RMSE, MAE and MSE and the highest $R^2$ and r values, and outperforms the AICA-SVR and MICA-SVR models. We conclude that the proposed ICA-CCA-SVR model produces lower prediction errors and higher prediction accuracy in the direction of price change, and outperforms the MICA-SVR and AICA-SVR methods in forecasting the closing prices of the Shanghai stock market index and the Dow Jones index.

Discussion
From the above results, we conclude that the ICA-CCA-SVR model performs well and surpasses the AICA-SVR and MICA-SVR models. The predicted stock price is influenced both by its historical price and by related technical variables. However, the AICA-SVR and MICA-SVR models each extract features from only one of these sources. The ICA-CCA-SVR model further removes redundant information from the AICA and MICA features, and combines the retained useful information to improve predictive performance. We also notice that the performance of ICA-CCA-SVR is no better than that of the AICA-SVR and MICA-SVR models when the projecting dimensionality is low, that is, less than 3 for the Shanghai stock market index and less than 6 for the Dow Jones index. We believe there are two possible explanations. (1) The lower the dimensionality, the greater the ratio of noise to useful information contained in the features. In this case, feature fusion amplifies the noise, which harms the predictive performance. (2) For the CCA algorithm, the components of the extracted features are uncorrelated but not independent, which means that the components have no influence on each other in the sense of a statistical average. However, this property does not show clearly when there are too few components.

Conclusion
This paper builds a forecasting model to predict the closing price of the stock market. It utilizes ICA and CCA as tools to extract predictive features before constructing an SVR stock market forecasting model. Experimental results on the Shanghai stock market index and on the Dow Jones index show that the ICA-CCA-SVR model proposed in this paper obtains better performance than both the AICA-SVR and MICA-SVR models.
Noise and redundant information exist in original stock market data, so feature extraction is a vital step in a forecasting model. Many existing forecasting models emphasize only the classifier and pay little attention to the pre-processing of the data. In this study, we introduce ICA as the pre-processing tool to reduce the dimensions and to extract features from two different perspectives. CCA is used as a feature fusion tool to extract the intrinsic features of the input raw data. Owing to this fused feature extraction, ICA-CCA shows better performance in evaluating stock market data.
Although the proposed model provides many insights, it also has minor weaknesses. The forecasting accuracy of the model is not particularly high; for example, the highest correlation coefficient is 0.95174 for the Shanghai stock market index and 0.87446 for the Dow Jones index. We believe the main reason is that the proposed model has a certain sensitivity to the data. Another weakness is that the optimal feature dimensionality of the ICA-CCA-SVR model may sometimes be higher than that of the other models, owing to the serial feature fusion method. To address this problem, a more effective method, such as the parallel fusion method, will be investigated in further study.

Author Contributions
Conceived and designed the experiments: ZG HW QL. Performed the experiments: ZG. Analyzed the data: ZG. Contributed reagents/materials/analysis tools: HW JY. Wrote the paper: ZG HW.