A novel stock forecasting model based on High-order-fuzzy-fluctuation Trends and Back Propagation Neural Network

In this paper, we propose a hybrid method for forecasting stock prices, called the High-order-fuzzy-fluctuation-Trends-based Back Propagation (HTBP) Neural Network model. First, we compare each value of the historical training data with the previous day's value to obtain a fluctuation trend time series (FTTS). The FTTS is then fuzzified into a fuzzy-fluctuation time series (FFTS) according to the direction and amplitude of each fluctuation (increase, no change, or decrease). Since the relationship between the FFTS and future fluctuation trends is nonlinear, the HTBP neural network algorithm is used to learn the mapping rules through self-learning. Finally, the output of the algorithm is used to predict future fluctuations. The proposed model provides the following innovative features: (1) It combines fuzzy set theory with a neural network algorithm to avoid the overfitting problems of traditional models. (2) The BP neural network algorithm can automatically discover the internal rules that actually exist in sequential data, without the need to analyze the influencing factors of specific rules or their paths of action. (3) The hybrid model can reasonably remove noise from the internal rules through appropriate fuzzy treatment. This paper takes the TAIEX data set of the Taiwan Stock Exchange as an example and compares and analyzes the prediction performance of the model. The experimental results show that this method can predict the stock market in a very simple way. We also use the method to predict the Shanghai Stock Exchange Composite Index, which further verifies its effectiveness and universality.


Introduction
Forecasting is an important means of reducing risk and increasing revenue in the financial sector. Stock price prediction models can be divided into two categories: statistical models and artificial intelligence models. The former include ANFIS [1], ARIMA [2], ARCH [3], GARCH [4], and so on. In such models, the variables must strictly obey restrictive assumptions such as linearity or normality. However, because of the uncertainty and complexity of the stock market, strict normality assumptions are difficult to justify for a linear prediction model. Wang [5] studied the relationship between stock price and changes in investors' […] explore the optimal solution through technology integration. So far, in stock market forecasting, combinations of fuzzy sets, fluctuation trends, and BPNN are very rare.
The aim of this paper is to propose a new neural model that improves learning efficiency and predictive power. We therefore propose a hybrid forecasting method called the High-order-fuzzy-fluctuation-Trends-based Back Propagation (HTBP) neural network model. In this model, the original data are first decomposed into multiple layers by the high-order fuzzy-fluctuation series, and the number of input nodes of the algorithm equals the order of the fluctuation sequence. This paper is the first attempt to use an HTBP-based algorithm for forecasting stock prices. The advantages of the model can be summarized as follows: (1) It combines fuzzy set theory with a neural network algorithm to avoid the overfitting problems of traditional models. (2) The BP neural network algorithm can automatically discover the internal rules that actually exist in sequential data, without the need to analyze the influencing factors of specific rules or their paths of action. (3) The hybrid model can reasonably remove noise from the internal rules through appropriate fuzzy treatment. The HTBP model is used to predict the TAIEX from 1997 to 2005 and the Shanghai Stock Exchange Composite Index (SHSECI) from 2007 to 2015. Furthermore, we show the superiority of our model by comparing HTBP with a traditional model based on a single BP neural network. We also compare the prediction results with several other existing methods and conclude that the model predicts better than general prediction models.
The remainder of this paper is organized as follows. Section 2 reviews research on fuzzy time series and introduces the concept and model of the BP neural network. Section 3 describes a prediction method based on the BP neural network, fuzzy fluctuation trends, and logical relationships. In Section 4, the model is used to predict the stock market from 1997 to 2005 using different data sets. Section 5 summarizes the conclusions and potential problems for future research.

Definition of fuzzy-fluctuation time series (FFTS)
Song and Chissom [19][20][21] combined fuzzy set theory with time series and presented the definitions of fuzzy time series. In this section, we extend fuzzy time series to fuzzy-fluctuation time series (FFTS) and propose the related concepts. Definition 1. Let $L = \{l_1, l_2, \ldots, l_g\}$ be a fuzzy set in the universe of discourse $U = \{u_1, u_2, \ldots, u_i, \ldots, u_l\}$. It can be defined by its membership function $\mu_L: U \to [0, 1]$, where $\mu_L(u_i)$ denotes the grade of membership of $u_i$.
The fluctuation trends of a stock market can be expressed by a linguistic set $L = \{l_1, l_2, l_3\} = \{\text{down}, \text{equal}, \text{up}\}$. The element $l_i$ is strictly monotonically increasing in its subscript $i$ [26], so a function $f$ can be defined by $l_i = f(i)$. To preserve all of the given information, the discrete set $L = \{l_1, l_2, \ldots, l_g\}$ can also be extended to a continuous label set $\bar{L} = \{l_a \mid a \in \mathbb{R}\}$, which satisfies the above characteristics. $\bar{L}'$ is defined as the forecasting value. $M$ is defined as a constant that scales the range of $\bar{S}(i)$ to facilitate machine learning, and $\bar{Q}(i)$ is defined as the value of $\bar{S}(i)$ after scaling.
Definition 2. Let $F(t)$ $(t = 1, 2, \ldots, T)$ be a time series of real numbers, where $T$ is the length of the time series. $G(t)$ is defined as a fluctuation time series, where $G(t) = F(t) - F(t-1)$ $(t = 2, 3, \ldots, T)$. Each element of $G(t)$ can be represented by a fuzzy set $S(t)$ $(t = 2, 3, \ldots, T)$ as defined in Definition 1. We then say that the time series $G(t)$ is fuzzified into a fuzzy-fluctuation time series (FFTS) $S(t)$.
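The fluctuation and fuzzification steps of Definition 2 can be sketched in Python as follows. This section does not state the interval boundaries that map $G(t)$ to down/equal/up, so the sketch assumes a symmetric band of half the mean fluctuation length `len` around zero: changes in $[-len/2, len/2)$ count as "equal". This assumption agrees with the worked TAIEX example later in the paper ($G(2) = 47.48$, $len = 85$, $S(2) = 3$), but it is not confirmed here, and the function names are hypothetical.

```python
from typing import List, Sequence

# Label indices for L = {l1, l2, l3} = {down, equal, up}
DOWN, EQUAL, UP = 1, 2, 3


def fluctuations(prices: Sequence[float]) -> List[float]:
    """G(t) = F(t) - F(t-1) for t = 2..T; the result has length T-1."""
    return [prices[t] - prices[t - 1] for t in range(1, len(prices))]


def fuzzify(g: float, length: float) -> int:
    """Map one fluctuation G(t) to a label index S(t).

    ASSUMPTION (not stated in this section): changes in [-length/2, length/2)
    are 'equal', smaller ones are 'down', and larger ones are 'up'.
    """
    if g < -length / 2:
        return DOWN
    if g < length / 2:
        return EQUAL
    return UP


def to_ffts(prices: Sequence[float], length: float) -> List[int]:
    """Fuzzify the whole fluctuation series into S(2), ..., S(T)."""
    return [fuzzify(g, length) for g in fluctuations(prices)]


ffts = to_ffts([6152.43, 6199.91], 85.0)  # G(2) = 47.48 >= 42.5, so S(2) = 3 (up) -> [3]
```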
$$S(t-1), S(t-2), \ldots, S(t-n) \rightarrow \bar{Q}(t) \qquad (4)$$

Basic concept of BP neural network
The BP neural network is a hierarchical network with powerful nonlinear processing ability. It does not need prior knowledge of the form of the data distribution or of the relationships among the variables; it organizes training and learning on its own from the observed training data. In addition, it establishes a nonlinear mapping between the input variables and the output. The network is trained by error feedback: the errors at the output are propagated back through the network, and the weights are adjusted to minimize them. With the BP neural network algorithm, we can predict future stock market fluctuations by learning historical fuzzy fluctuations. The activation function of the model is tanh(x). Unlike the sigmoid function, tanh(x) is zero-centered, which overcomes a shortcoming of the sigmoid. The range of tanh(x) is (−1, 1).
The input layer of the BP neural network model has 9 nodes, representing the 9th-order historical fuzzy-fluctuation trends (Fig 1). The output layer has 1 node, representing the right-hand side (RHS) of Eq (4). The learning effect was best with 5 hidden-layer nodes.
$x_i$ represents the input value of node $i$ of the input layer, $z_j$ represents hidden-layer node $j$, $w_{ij}$ represents the weight between input node $i$ and hidden node $j$, and $y_j$ represents the output-layer node.
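A minimal NumPy sketch of the 9-5-1 network described above, with tanh activations in both layers and plain batch gradient descent on the mean squared error. The hidden-to-output weights, the bias terms, the initialization, the learning rate, the shift of labels 1..3 to -1..1, and the synthetic training data are all assumptions made for illustration; the paper does not specify them in this section.

```python
import numpy as np

# Layer sizes taken from the text: 9 inputs (9th-order trend), 5 hidden nodes, 1 output.
N_IN, N_HIDDEN, N_OUT = 9, 5, 1


def init_params(rng: np.random.Generator) -> dict:
    """Small random weights; W1[i, j] plays the role of w_ij (input i -> hidden j)."""
    return {
        "W1": rng.normal(0.0, 0.1, size=(N_IN, N_HIDDEN)),
        "b1": np.zeros(N_HIDDEN),
        # Hidden -> output weights and both bias vectors are assumptions of this sketch.
        "W2": rng.normal(0.0, 0.1, size=(N_HIDDEN, N_OUT)),
        "b2": np.zeros(N_OUT),
    }


def forward(p: dict, X: np.ndarray):
    """Return hidden activations z (n, 5) and outputs y (n, 1), both in (-1, 1)."""
    z = np.tanh(X @ p["W1"] + p["b1"])
    y = np.tanh(z @ p["W2"] + p["b2"])
    return z, y


def train_step(p: dict, X: np.ndarray, target: np.ndarray, lr: float = 0.05) -> float:
    """One batch back-propagation step; returns the MSE before the update."""
    z, y = forward(p, X)
    n = X.shape[0]
    err = y - target                                # output error, shape (n, 1)
    d_out = err * (1.0 - y ** 2)                    # tanh'(a) = 1 - tanh(a)^2
    d_hid = (d_out @ p["W2"].T) * (1.0 - z ** 2)    # error propagated back to hidden layer
    # MSE gradient up to a constant factor of 2, which is absorbed into lr.
    p["W2"] -= lr * (z.T @ d_out) / n
    p["b2"] -= lr * d_out.mean(axis=0)
    p["W1"] -= lr * (X.T @ d_hid) / n
    p["b1"] -= lr * d_hid.mean(axis=0)
    return float(np.mean(err ** 2))


# Usage with synthetic data: labels 1..3 shifted to -1..1 (a scaling choice assumed here).
rng = np.random.default_rng(0)
X = rng.integers(1, 4, size=(60, N_IN)) - 2.0
target = 0.5 * np.tanh(X.sum(axis=1, keepdims=True) / 3.0)
params = init_params(rng)
losses = [train_step(params, X, target) for _ in range(300)]
```

The error at the hidden layer is computed before any weight is updated, so each step uses one consistent set of weights, which is the standard back-propagation order.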

A novel forecasting model based on BP Neural Network
In this paper, we propose a novel forecasting model based on high-order fuzzy-fluctuation trends and BP neural network machine learning. To compare the forecasting results with other researchers' work, the actual TAIEX (Taiwan Stock Exchange Capitalization Weighted Stock Index) is used to illustrate the forecasting process. The data from January 1999 to October 1999 are used as the training time series, and the data from November 1999 to December 1999 are used as the testing dataset. The basic steps of the proposed model are shown in Fig 2.
Step 2. Establish nth-order FFLRs for the forecasting model. According to Eq (2), each $S(t)$ $(t \ge n + 2)$ can be represented by its previous $n$ days' fuzzy-fluctuation numbers. Therefore, the total number of FFLRs for the historical training data is $p_n = T - n - 1$.
Step 3. Determine the parameters of the forecasting model with the BP neural network machine learning algorithm. In this paper, the BP neural network is employed to learn the fuzzy-fluctuation logical relationships.
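Building the FFLRs in Step 2 amounts to sliding a window of length $n$ over the FFTS. The sketch below (the function name and the toy series are hypothetical) pairs each $S(t)$ with its $n$ predecessors and reproduces the count $p_n = T - n - 1$.

```python
from typing import List, Sequence, Tuple

FFLR = Tuple[Tuple[int, ...], int]  # (LHS: previous n labels, RHS: current label)


def build_fflrs(s: Sequence[int], n: int) -> List[FFLR]:
    """Build nth-order FFLRs from the FFTS s = [S(2), ..., S(T)].

    Each LHS lists the n preceding labels oldest-first (S(t-n), ..., S(t-1));
    reverse it to get the S(t-1), ..., S(t-n) order of Eq (4).
    The first usable index is t = n + 2, so there are T - n - 1 relationships.
    """
    if n < 1:
        raise ValueError("order n must be at least 1")
    return [(tuple(s[k - n:k]), s[k]) for k in range(n, len(s))]


s = [3, 2, 2, 1, 3]  # hypothetical S(2)..S(6), so T = 6
fflrs = build_fflrs(s, n=2)
# -> [((3, 2), 2), ((2, 2), 1), ((2, 1), 3)], i.e. p_2 = 6 - 2 - 1 = 3
```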
Step 4. Forecast the test time series. For each data point in the test time series, its next value can be forecast according to Eq (7), based on the output of the BP neural network for its nth-order fuzzy-fluctuation trends.
Step 1. Calculate the fluctuation of each element of the historical training dataset. Then, fuzzify the fluctuation trends into an FFTS using the overall mean of the fluctuation numbers of the training dataset. For example, the overall mean for the TAIEX1999 historical dataset from January to October is 85, that is, len = 85. For F(1) = 6152.43 and F(2) = 6199.91, G(2) = 47.48 and S(2) = 3. In this way, the historical training dataset can be represented by a fuzzified fluctuation dataset, as shown in S1 Table.
Step 2. Based on the FFTS from 5 January 1999 to 30 October 1999 shown in S1 Table, the nth-order FFLRs for the forecasting model are established as shown in S2 Table. For convenience, the subscript $i$ is used to represent element $l_i$ in the FFLRs.
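The length `len` used in Step 1 can be computed as sketched below. The sketch assumes that "the overall mean of the fluctuation numbers" means the mean absolute daily change, since signed changes typically cancel out and give a much smaller value; the function name is hypothetical. The last lines check the worked example against the half-len threshold assumed earlier for fuzzification.

```python
from typing import Sequence


def mean_abs_fluctuation(prices: Sequence[float]) -> float:
    """Mean of |F(t) - F(t-1)| over the training window.

    ASSUMPTION: 'the overall mean of the fluctuation numbers' is read as the
    mean absolute daily change.
    """
    diffs = [abs(b - a) for a, b in zip(prices, prices[1:])]
    if not diffs:
        raise ValueError("need at least two prices")
    return sum(diffs) / len(diffs)


# Worked example from the text: len = 85 for TAIEX 1999 (January to October).
length = 85.0
g2 = 6199.91 - 6152.43    # G(2) = 47.48
is_up = g2 >= length / 2  # True under the assumed half-len threshold, matching S(2) = 3
```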
Step 4. Use the FFLRs obtained from the historical training data to forecast the test dataset from 1 November 1999 to 30 December 1999.
First, for 1 November 1999, the 9th-order historical fuzzy-fluctuation trend 3,2,2,2,2,3,1,2,2 yields the network output 0.14506. The forecasted fuzzy-fluctuation number is obtained from this output, and the forecasted fluctuation from the current value to the next value is obtained by defuzzifying that fuzzy number. The other forecasting results are shown in Table 1 and Fig 3. To evaluate prediction performance, this paper compares the predicted values with the actual values. Widely used indexes for comparing time series models include the mean squared error (MSE), root mean squared error (RMSE), mean absolute error (MAE), and mean percentage error (MPE). These indicators are defined starting with Eq (8). Letting the order n vary from 2 to 10, the RMSEs of the different nth-order forecasting models are listed in Table 2. The item "Average" refers to the RMSE of the averaged forecasting results of these nth-order (n = 2, 3, ..., 10) models.
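The four error indexes can be computed as sketched below. MSE, RMSE, and MAE follow their standard definitions; for MPE the sketch assumes the common signed form $100 \cdot \frac{1}{n}\sum (a - f)/a$, because Eq (8) and the equations after it are not reproduced in this section (some forecasting papers use an absolute version instead). The function name is hypothetical.

```python
import math
from typing import Dict, Sequence


def forecast_errors(actual: Sequence[float], forecast: Sequence[float]) -> Dict[str, float]:
    """MSE, RMSE, MAE and MPE between actual and forecast values.

    MPE here is the signed mean percentage error, 100 * mean((a - f) / a);
    this is an assumed definition.
    """
    if len(actual) != len(forecast) or not actual:
        raise ValueError("series must be non-empty and of equal length")
    n = len(actual)
    errs = [f - a for a, f in zip(actual, forecast)]
    mse = sum(e * e for e in errs) / n
    return {
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        "MAE": sum(abs(e) for e in errs) / n,
        "MPE": 100.0 * sum((a - f) / a for a, f in zip(actual, forecast)) / n,
    }


m = forecast_errors([100.0, 200.0], [110.0, 190.0])
# -> MSE 100.0, RMSE 10.0, MAE 10.0, MPE -2.5
```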
In practical forecasting, the average of the results of the different nth-order (n = 2, 3, ..., 9) forecasting models is adopted to reduce the uncertainty of choosing a single order. The proposed method is employed to forecast the TAIEX from 1997 to 2005. The forecasting results and errors are shown in Fig 4 and Table 3. Table 4 compares the RMSEs of different methods for predicting TAIEX1999. As the table shows, the performance of the proposed method is acceptable. The main advantage of this method is that neither the target function nor the mapping rules need to be determined in advance; the algorithm learns the rules from the data. Although some other methods achieve lower RMSEs than the method presented here, they usually require complex rules to be determined before prediction. In practice, however, it is often difficult to establish proper rules. The method presented in this paper is very simple and easy to implement as a computer program.
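Averaging across orders, as described above, can be sketched as an element-wise mean of the per-order forecasts followed by an RMSE. The dictionary layout (order mapped to day-by-day forecasts), the function names, and the numbers in the usage lines are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def average_over_orders(by_order: Dict[int, Sequence[float]]) -> List[float]:
    """Average the day-by-day forecasts of several nth-order models."""
    series = list(by_order.values())
    if not series:
        raise ValueError("no forecasts given")
    length = len(series[0])
    if any(len(s) != length for s in series):
        raise ValueError("every order must forecast the same test days")
    return [sum(s[d] for s in series) / len(series) for d in range(length)]


def rmse(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Root mean squared error of a forecast series."""
    return math.sqrt(sum((f - a) ** 2 for a, f in zip(actual, forecast)) / len(actual))


# Hypothetical forecasts from the order-2 and order-3 models for two test days.
avg = average_over_orders({2: [100.0, 102.0], 3: [104.0, 98.0]})  # -> [102.0, 100.0]
score = rmse([101.0, 100.0], avg)                                  # sqrt(0.5), about 0.707
```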

Friedman test
To verify the validity of the proposed model, we applied the Friedman test for significance testing, following Janez Demšar's study [35]. The Friedman test is a nonparametric statistical test proposed by Milton Friedman [36][37][38][39]. It ranks the algorithms on each data set: the best algorithm gets rank 1, the second best gets rank 2, and so on, as shown in Table 6. Let $r_i^j$ be the rank of the j-th of k algorithms on the i-th of N data sets. The Friedman test compares the average ranks of the algorithms, $R_j = \frac{1}{N}\sum_i r_i^j$. When N and k are large enough, the Friedman statistic $\chi_F^2 = \frac{12N}{k(k+1)}\left[\sum_j R_j^2 - \frac{k(k+1)^2}{4}\right]$ is distributed according to $\chi^2$ with k − 1 degrees of freedom. Iman and Davenport [40] showed that Friedman's $\chi_F^2$ is undesirably conservative and proposed a better statistic, $F_F = \frac{(N-1)\chi_F^2}{N(k-1) - \chi_F^2}$, which is distributed according to the F-distribution with k − 1 and (k − 1)(N − 1) degrees of freedom.
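The Friedman statistic and the Iman-Davenport correction can be computed from a rank table as sketched below, using the standard formulas from Demšar's formulation: $\chi_F^2 = \frac{12N}{k(k+1)}\left[\sum_j R_j^2 - \frac{k(k+1)^2}{4}\right]$ and $F_F = \frac{(N-1)\chi_F^2}{N(k-1) - \chi_F^2}$. The rank table in the usage lines is hypothetical and is not taken from the paper's tables; the function names are also hypothetical.

```python
from typing import List, Sequence, Tuple


def average_ranks(ranks: Sequence[Sequence[float]]) -> List[float]:
    """R_j = (1/N) * sum_i r_i^j, where ranks[i][j] is algorithm j's rank on data set i."""
    n_sets = len(ranks)
    return [sum(row[j] for row in ranks) / n_sets for j in range(len(ranks[0]))]


def friedman_iman_davenport(ranks: Sequence[Sequence[float]]) -> Tuple[float, float]:
    """Return (chi2_F, F_F).

    chi2_F ~ chi-squared with k-1 dof; F_F ~ F with k-1 and (k-1)(N-1) dof.
    """
    n_sets, k = len(ranks), len(ranks[0])
    r = average_ranks(ranks)
    chi2 = 12.0 * n_sets / (k * (k + 1)) * (sum(x * x for x in r) - k * (k + 1) ** 2 / 4.0)
    denom = n_sets * (k - 1) - chi2
    if denom <= 1e-12:
        raise ValueError("F_F is undefined when every data set gives the same ranking")
    return chi2, (n_sets - 1) * chi2 / denom


# Hypothetical ranks for 3 algorithms on 3 data sets -> chi2_F = 8/3, F_F = 1.6
chi2, ff = friedman_iman_davenport([[1, 2, 3], [1, 3, 2], [2, 1, 3]])
```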
The Nemenyi test [41] is used when all classifiers are compared with each other. The performance of two classifiers is significantly different if their average ranks differ by at least the critical difference $CD = q_\alpha \sqrt{\frac{k(k+1)}{6N}}$.
This article ranks the methods on each data set from 1999 to 2004 according to their RMSE, as shown in Table 5. With 9 methods and 6 data sets, $F_F$ is distributed according to the F-distribution with 9 − 1 = 8 and (9 − 1) × (6 − 1) = 40 degrees of freedom. The critical value of F(8, 40) for α = 0.05 is 2.18; $F_F$ exceeds it, so we reject the null hypothesis. Next, we used the Nemenyi test for pairwise comparisons. For α = 0.05 and 9 methods, the critical value $q_\alpha$ is 3.102, which gives CD = 3.102 × √(9 × 10 / (6 × 6)) ≈ 4.9.
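The text gives 3.102 as the α = 0.05 value for 9 methods; in Demšar's formulation this is the studentized-range quantile $q_\alpha$, and the Nemenyi critical difference is $CD = q_\alpha \sqrt{k(k+1)/(6N)}$. The sketch below (hypothetical helper names) evaluates it for k = 9 and N = 6 and checks the 7.83 − 2.17 difference quoted in the text.

```python
import math


def nemenyi_cd(q_alpha: float, k: int, n_sets: int) -> float:
    """Nemenyi critical difference: CD = q_alpha * sqrt(k (k + 1) / (6 N))."""
    return q_alpha * math.sqrt(k * (k + 1) / (6.0 * n_sets))


def significantly_different(rank_a: float, rank_b: float, cd: float) -> bool:
    """Two methods differ significantly if their average ranks differ by at least CD."""
    return abs(rank_a - rank_b) >= cd


cd = nemenyi_cd(3.102, k=9, n_sets=6)  # about 4.905 for 9 methods and 6 data sets
print(round(cd, 3), significantly_different(7.83, 2.17, cd))  # 4.905 True
```

The result, about 4.90, is the threshold that the difference of average ranks 7.83 − 2.17 = 5.66 exceeds.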
According to the average ranks in the table, the difference between methods A and B exceeds the critical difference, while the other differences do not. Therefore, there are significant differences among methods A, B, and D (7.83 − 2.17 > 4.9), and no significant differences among the other algorithms. Overall, there is no significant difference in forecasting error between the proposed method and the latest methods.

Forecasting Shanghai Stock Exchange Composite Index
The SHSECI is China's most representative stock market index. In further research, we apply the method to forecasting the SHSECI from 2007 to 2015. For each year, the actual SHSECI closing prices from January to October are used as training data, and those from November to December are used as test data. The RMSEs of the prediction errors are shown in Table 6.
From Table 6, we can see that this method can successfully predict the SHSECI.

Conclusions
This paper presents a prediction model based on high-order fuzzy fluctuations and a BP neural network. The method builds on the high-order fuzzy logical relationships of the time series and then uses the self-learning ability of the BPNN to automatically find the optimal prediction rules for forecasting fluctuation trends. The greatest advantage of this approach is that it combines fuzzy theory, a stock market fluctuation model, and a neural network algorithm into a new model, which addresses the overfitting and over-fuzzification problems of existing models. Experiments show that the parameters generated from the training data set can also be applied to future data sets.
To compare performance with other methods, we take TAIEX1999 as an example. We also verified the validity and universality of the method on TAIEX 1997-2005 and the Shanghai Stock Exchange Composite Index (SHSECI) from 2007 to 2015. The model presented in this paper has significant advantages in universality, flexibility, and comprehensibility. However, because of the influence of changing external factors, the accuracy of its forecasts is only acceptable compared with other models. In future research, we will give more consideration to external factors to improve accuracy. We will also consider other factors that may affect stock market volatility, such as trading volume, opening value, and closing value, as well as the impact of other stock markets, such as the Dow Jones and NASDAQ.
Supporting information S1