A Hybrid Short-Term Traffic Flow Prediction Model Based on Singular Spectrum Analysis and Kernel Extreme Learning Machine

Short-term traffic flow prediction is one of the most important issues in the field of intelligent transport system (ITS). Because of the uncertainty and nonlinearity, short-term traffic flow prediction is a challenging task. In order to improve the accuracy of short-time traffic flow prediction, a hybrid model (SSA-KELM) is proposed based on singular spectrum analysis (SSA) and kernel extreme learning machine (KELM). SSA is used to filter out the noise of traffic flow time series. Then, the filtered traffic flow data is used to train KELM model, the optimal input form of the proposed model is determined by phase space reconstruction, and parameters of the model are optimized by gravitational search algorithm (GSA). Finally, case validation is carried out using the measured data of an expressway in Xiamen, China. And the SSA-KELM model is compared with several well-known prediction models, including support vector machine, extreme learning machine, and single KLEM model. The experimental results demonstrate that performance of the proposed model is superior to that of the comparison models. Apart from accuracy improvement, the proposed model is more robust.


Introduction
Short-term traffic flow prediction is one of the most important issues in the field of intelligent transport system (ITS) [1][2]. Many subsystems of ITS such as Advanced Traffic Management System (ATMS) and Advanced Traveler Information Systems (ATIS) can benefit from improved prediction of traffic flow parameters (such as traffic volume, average traffic speed, and average occupancy) in a short-term future.
Many researchers have paid more attention to short-term traffic flow prediction because of its importance. As a result, a large number of relevant methods have been published in the academic literature. In general, these methods are categorized into three types [3][4]: traffic model-based methods, statistical methods and machine learning-based methods. A brief summary of these three types of methods is shown in Table 1. 1. Traffic model-based methods, which use the traffic flow model (include macroscopic model and microscopic model) to predict the evolving mechanism of traffic flow. Macroscopic model regards the traffic flow as fluid for exploring its evolution mechanism based on fluid dynamics [5][6]. Microscopic model focuses on behavior of single vehicle and the interactions between vehicles, such as lane changing model [7] and car following model [8].
2. Statistical methods, which is implemented by using the historical data to obtain optimal parameters in the fitting process. Typical statistical methods have been proposed and applied for years, such as local linear regression model [9] and Auto-Regressive Integrated Moving Average (ARIMA) [10].
3. Machine learning-based methods, which can effectively capture the nonlinearity relationship between the input and output existed in the data by using intelligent learning algorithm. Such as Artificial Neural Network (ANN) [11][12] and Support Vector Machine (SVM) [13][14].
Reader interested in details of these prediction models can refer to review papers such as Refs. [15][16][17]. By analyzing the characteristics of the three types of methods, traffic modelbased methods and statistical methods often assume several restrictive assumptions, such as a predefined model structure, the normality of residuals and the stationary of the time series, which are seldom satisfied in the case of nonlinear and chaotic traffic flow. However, machine learning-based methods are not restricted by previous assumptions. Particularly, ANN and SVM are more popular because of their mature theoretical basis and the excellent prediction performance.
In recent years, a novel learning algorithm called Extreme Learning Machine (ELM) is proposed for training Single Layer Feed-forward Neural Network (SLFN) [18]. In ELM, input weights and hidden biases are assigned randomly instead of being exhaustively tuned. For this reason, ELM training is fast and saves a lot of computing resources. Moreover, ELM also avoids falling into local minima in the learning process. Therefore, ELM achieves superior performance than traditional SLFN and Back-Propagation (BP) learning algorithm. Because of these advantages of ELM, it has been applied in different fields successfully [19][20][21][22][23][24][25]. In order to improve the generalization ability of ELM and reduce time consumption for determining the number of hidden layer nodes, Kernel Extreme Learning Machine (KELM) is developed more recently by replacing the ELM hidden layer with a kernel function [26]. Different with SVM, the kernel function of KELM does not need to satisfy Mercer's theorem. Thus application of KELM is easier than SVM. In addition, KELM is applied to solve many prediction or classification problems and achieves comparative or superior performance [27][28][29][30]. In spite of great success in some scientific fields, the KELM method is not utilized for predicting short-term traffic Gains knowledge from training data, do not need to predefined model structure, good robustness, ability to approximate any degree of complexity of traffic flow flow at present. Thus, we have reasons to believe that KELM method has a better application prospect in traffic flow prediction and try to employ KELM method for building a traffic flow prediction model in this paper. At the same time, to obtain the optimal KELM model, it is important to choose a kernel function and determine the model parameters. If the parameters are not selected correctly, the performance of KELM model is greatly reduced. In this paper, Gravitational Search Algorithm (GSA) is introduced to optimize the parameters of KELM model, which is a heuristic optimization method based on the Newtonian gravity law [31]. The GSA has demonstrated potential in parameter determination for nonlinear models [32][33][34]. Moreover, the input form of KELM model affects its performance and training speed. With the development of chaos theory, recent studies such as Refs. [35][36][37] have concluded that the nonlinear chaotic phenomena existed in short-term traffic flow time series. Phase Space Reconstruction (PSR) is the basis of chaotic time series analysis, which affects the prediction performance directly. It can map scalar time series to the multi-dimensional phase space and mining the implicit information thoroughly. Therefore, PSR is used to determine the optimal input form of KELM model in this paper. Delay time and embedding dimension are the key parameters for PSR. At present, there are many methods for choosing the two parameters. The methods can be used to calculate delay time including Autocorrelation Function method [38], Mutual Information method [39] and Average Displacement method [40]. The methods can be used to calculate embedding dimension including G-P method [41], False Nearest Neighbors method [42] and Cao method [43]. However, C-C method [44] is different from these methods and can simultaneously estimate delay time and embedding dimension based on statistical results. The advantages of C-C method are small amount of calculation, strong antiinterference ability and easy to use. Thus, we choose C-C method to determine the two parameters for PSR.
The data pre-processing is necessary before training the KELM model. Due to inevitable interference in the process of data acquisition or transmission, the original traffic data contain some noise components and unpredictable components (outliers) which have less useful information and often lead to over-fitting. In pre-processing process, removing noise and outliers of chaotic time series can reduce its complexity and increase its predictability [45]. Spectrum Analysis (SSA) is a relatively new time series analysis technique which combines multivariate statistic, probability theory and signal processing [46]. SSA is suitable for time series with various features and structures, such as stationary and non-stationary, linear and nonlinear time series [47]. It can extract the main features of time series, remove noise and unpredictable components effectively. So far, SSA has received increasing attentions and has been employed in several areas successfully, such as medicine, energy, climatology, and economics [48][49][50][51]. Therefore, we choose SSA to filter traffic flow time series in the pre-processing process.
The main objective of this work is to propose a hybrid short-term traffic flow prediction model named SSA-KELM using SSA and KELM. The KELM can achieve a better prediction performance using the filtered traffic flow time series data, because noise components of the original traffic time series have been removed by SSA during the pre-processing process. The novelty of the proposed model is highlighted in the following aspects.
1. The KELM as a relatively novel machine-learning method is first studied for short-term traffic flow prediction.
4. The parameters of KELM are tuned by gravitational search algorithm (GSA) to achieve a better prediction performance.
5. Non-parametric statistical tests are used for multiple comparisons of different prediction models on multiple data sets.
The rest of this paper is organized as follows. In Section 2, SSA method and KELM model are described briefly, and the hybrid traffic flow prediction model (SSA-KELM) is introduced. In Section 3, empirical analysis is performed, the prediction results of several different prediction models are given and discussed. In Section 4, conclusions and recommendations for future study are presented.

Methodology
In this setction, the basic methods related to the SSA-KELM model are briefly introduced, including Singular Spectrum Analysis (SSA), Kernel Extreme Learning Machine (KELM), Phase Space Reconstruction (PSR) and Gravitational Search Algorithm (GSA). Then the main steps of SSA-KELM model are given. In the SSA-KELM model, KELM is the basic prediction method, SSA is used to filter the noise components from the original traffic data, PSR is used to determine the optimal input form of KELM and the GSA is used to optimized the paramenters of the KELM.

Singular spectrum analysis (SSA)
SSA performs four steps including embedding, singular value decomposition, grouping and diagonal averaging. The first two steps are usually named decomposition of time series, while the third step and fourth step called reconstruction of time series. A brief review of SSA is as follows (more information can be obtained in Ref. [52]).
Step 1: Embedding. This step is to transform the original series into a sequence of multidimensional vector. For one dimensional series vector with length N as X = (x 1 , Á Á Á, x N ). Given a window length L(1 < L < N), the initial series is mapped into K lagged vectors: Then, the trajectory matrix is expressed as Eq (2). The matrix T is a Hankel matrix with size of L × K, which has equal elements x ij along the anti-diagonals where i + j = const, Step 2: Singular Value Decomposition (SVD). Let S = TT T , the eigenvalues of S are calculated and arranged in decreasing order, denoted as λ 1 ! Á Á Á !λ L ! 0. The eigenvectors of matrix S corresponding to these eigenvalues are denoted as U 1 , Á Á Á U L . Then the SVD of trajectory matrix X is defined as Eq (3): where d is the rank of T, termed as ith eigen triple of SVD. The contribution of T i can be measured by ratio of eigenvalues and given by Eq (4), Step 3: Grouping. Indices set {1, Á Á Á, d} is divided into m disjointed subsets I 1 , Á Á Á, I m . Let I = {i 1 , Á Á Á, i p } and the resultant trajectory matrix T I corresponding to the group I is defined as The resultant trajectory matrices are calculated for every group I = I 1 , Á Á Á, I m and the expansion of Eq (3) leads to the decomposition as follows.
Step 4: Diagonal averaging: the grouped decomposition in Eq (5) is transformed a new series with length N. Let Y 1 = (y 1 , Á Á Á, y N ) be the transformed one dimensional series of T I 1 , elements of Y 1 is calculated by Eq (6). where

Kernel extreme learning machine (KELM)
A brief description of ELM and KELM is given in this section. Interested readers can refer to Refs. [18,26] for more details. The output function of ELM for generalized SLFN is where g i (Á) denotes the output of the i-th hidden node with respect to the input x, i.e., activation function, such as "Sigmoid" function. w i 2 R n is the weight vector connecting the ith hidden layer neuron and the input layer neurons, b i the bias of the ith hidden layer neuron, β i 2 R is the weight connecting the ith hidden neuron and the output neuron, and f L (x) is the output of the SLFN. w i and b i are randomly assigned before learning. If SLFN can approximate theses N samples with zero error, there exists β i , w i and b i such that Thus Eq (9) is expressed compactly as . . .
Where H is the hidden layer output matrix, β is output weight matrix, and T is the target matrix. In the ELM, β is the only parameter that needs to be calculated and can be easily achieved by Least Squares Estimate (LSE) Where H † is Moore-Penrose generalized inverse of matrix H. one of the methods for calculating H † is the orthogonal projection method: According to the ridge regression theory, the diagonal of HH T can be add a positive value for regularization, then we have Where I is a unit matrix, C is penalty parameter. When the hidden feature mapping function h (x) is unknown, a kernel matrix for ELM is used according to the following equation: where K(x i , x j ) is a kernel function. Thus, the output function of ELM is denoted compactly as Because of the application of kernel function in ELM, this novel learning algorithm is named KELM. In KELM, weight vector w i , bias b i , the feature mapping function h(x) and the number of hidden layer neurons L are not taken into consideration. Instead, KELM only focuses on the kernel functions K(x i , x j ) and the training data set. Many kernel functions can be used for KELM, such as linear, polynomial and Gaussian radial basis function (RBF). We choose the Gaussian RBF kernel K(u, v) = exp(−ku − vk 2 / 2σ 2 ) to construct KELM model because of its superior performance [26,30]. In the KELM with Gaussian RBF kernel, the are two key parameters (penalty parameter C and kernel parameter σ) need to be determined.

Phase space reconstruction (PSR)
According to Takens' Embedding Theorem [53], as enough delayed coordinates are used, scalar time series is sufficient to reconstruct the dynamic of the underlying systems. For a time series {x(i), i = 1, 2, Á Á Á, N}, the phase space can be reconstructed according to Where τ is delay time, m is embedding dimension. According to phase space reconstruction. Input of the model is [ The C-C method is used to determine delay time τ the embedding dimension m. According to Ref. [39], the basic principle of C-C method is as follows.
The correlation integral is defined as where r is the search radius, M = N-(m-1)τ is the number of embedded points in m-dimensional space, N is the data number of the time series.
The correlation integral is a cumulative distribution function and denotes the probability of distance that between any pairs of points in the phase space is not greater than r.
The time series {x(i), i = 1, 2, Á Á Á, N} is divided into t disjoint time series: The test statistics is As N ! 1, the Eq (22) can be expressed as The maximum deviation ΔS(m, t) of S(m, r, t)~t with r is defined as According to the BDS statistical results obtained by Brock et al. [54], m = 2, 3, 4, 5, r i = iσ / 2 and i = 1, 2, 3, 4 is selected in general, where σ is standard deviation of time seires. Calculate the following variables, SðtÞ ¼ 1 16 where SðtÞ is the mean of S(m, r, t) for all subsequence. Let t equal or be smaller than 200, the first local minimum point of D SðtÞ $ t is the delay time τ. The global minimum point of Scor (t)~t is the delay time window τ w . The embedding dimension m is calculated according to τ w = (m − 1)τ.

Gravitational search algorithm (GSA)
In GSA, a set of agents has been given to find optimum solution based on Newtonian gravity law. Assume agents with number N in the dimension of n, and the position of ith agent is defined as: Where x d i is the position of ith agent in the dth dimension. The main steps of GSA are as follows: Step 1: Initialize velocity and positon of each agent.
Step 2: Evaluate the fitness of each agent.
Step 3: Update gravitational constant G(t) according to the following equation: where G 0 is the initial gravitational constant, α is decay rate (some arbitrary constant) and T is the maximum number of iterations.
Step 4: Calculate the gravitational (inertia) mass of each agent using the following equation: Where M i (t) and fit i (t) are the mass and the fitness value of the agent i at time t, respectively. worst(t) and best(t) are the worst fitness value of agent and the best fitness value of agent, respectively.
Step 5: Calculate the resultant force of agents according to the law of universal gravitation.
where F d ij t ð Þ is the gravitation between the agent i in the d dimension and agent j at time t; M i (t) and M j (t) are the passive gravitational mass of agent i and agent j respectively; ε is the small constant for avoiding the divisor equal to zero; x d i t ð Þ is the position of agent i in d dimension and x d j t ð Þ is the position of agent j at time t; R ij (t) = kx i (t), x j (t)k 2 is the Euclidean distance between agent i and agent j.
Step 6: Calculate accelerated velocity of agent. Accelerated velocity a d i t ð Þ at time t in d dimension is expressed as follows.
where rand is a random number in the interval [0, 1].
Step 7: Update the positon and velocity of each agent according to Eqs (33) and (34) Step 8: If the stopping criterion is satisfied, output optimal parameters, else proceed to Step 2.

SSA-KELM model
The overall flowchart of SSA-KELM model is illustrated as Fig 1. And the main steps of SSA-KELM model are as follows.
Step 1: The original traffic flow time series is filtered by SSA and get a reconstruction time series without nosie components. Step 2: The optimal input form of the KELM is determined by PSR. The filtered traffic flow time series is used to train the KELM model, while the original traffic flow time series is used to test the KELM model.
Step 3: The parameters of KELM model are optimized by GSA.  (3.6) Judge whether the terminated conditions are reached (usually the default calculation accuracy or iterations). If reached, continue to step 3.7; otherwise, go to step 3.2 and continue to search.
(3.7) Output the optimal parameters of KELM model.

Empirical research
In this section, the measured data is used to validate the performance of SSA-KELM model. First, a detailed description of the data source is given. Then, the traffic data is used to illustrate how to construct and optimize the SSA-KELM model. Finally, the performance of the SSA-KELM model is compared to that of several prediction models on multiple data sets. Besides the traditional statistical indices are employed to evaluate the performances of all the prediction models, non-parametric tests are used to analyze the results of the contrast experiment.

Data Source
All the experimental traffic volume data is obtained from two loop detectors (No. DC00004965 and No. DC00004966) installed on an expressway named Lianqian W Rd in Xiamen, China. The two detectors' locations are approximately shown in Fig 2, where No. DC00004965 detector locates in the westbound direction while No. DC00004966 detector locates in the eastbound direction. The traffic data is collected every 5 min in five consecutive working days (January 5, 2015 to January 9, 2015). Although average speed is also available, only traffic volume is considered in this study. The original traffic volume data is shown in Fig 3. The traffic volume data of the first four days is used to construct the SSA-KELM model, while the traffic volume data of the fifth day is used to test the KELM model. In the following several subsections, traffic volume data from No. DC00004966 detector is used as an example to illustrate how to build and optimize the SSA-KELM model in detail, while traffic volume data from the two detectors is all used to evaluate the performance of the SSA-KELM model.

SSA decomposition and reconstruction
As mentioned in subsection 2.1, the window length L is the only parameter in the decomposition process. If the time series has a periodic component, the window length is taken proportional to that period to get better separability [55]. Therefore, L = 288 is assumed here, which corresponds to daily variations of traffic volume time series. This window length results in 288 Eigen triples. Each Eigen triples corresponds to a singular value.

Determination of the optimal input form
The C-C method is used to implement phase space reconstruction for determining the optimal input form of the proposed model.  input ¼ . . . output ¼ x 1þð15À1ÞÂ9þ1 x 2þð15À1ÞÂ9þ1 x 3þð15À1ÞÂ9þ1 . . .

Parameter Optimization
GSA is employed to optimize the two parameters of the KELM model. In order to ensure the algorithm efficiency, the mean absolute percentage error is taken as fitness function. Then the fitness function is calculated as follows.
Where y i is the real value in time interval i,ỹ i is the prediction value for time interval i and n is the total number in the time series. The specific parameters of GSA are as follows: population size is set to 20, the initial gravitational constant G 0 = 100, the attenuation rate α = 10, the maximum number of iteration is set to 100, the value range of parameter C is set to [0.1, 1000] and the value range of kernel parameter σ is set to [0.01, 100].
In the process of parameter optimization, the k-cross validation method [56] is used to train the model. Because this method can make full use of the information in the sample and avoid over-fitting and under-fitting. In other words, it can improve the generalization ability of the model under the premise of ensuring good prediction accuracy. In k-cross validation, the training data set is randomly divided into k subsets. The k − 1 subsets are used as training set for building the model and the kth subset is used as a validation set for verifying model performance. Each subset is used as a validation set and the verifying is repeated k times in total. The average value of the results of k times verifying is used to evaluate the model performance.   with the increase of the number of iterations, the fitness curve is gradually convergent. When the number of iterations is 96, both the best fitness and average fitness are 0.0135 (MAPE = 1.35%). And the corresponding optimal parameter combination of the model is C = 7.28 and σ = 0.15.

Performance Evaluation Index
In order to evaluate the performance of the proposed model, two statistical indices are utilized to measure the prediction accuracy. These indices are the MAE (mean absolute error) and MAPE (mean absolute percent error). The smaller values of MAE and MAPE, the more accurate prediction results are. The calculation formulas for the MAE and MAPE are as follows: where y i is the real value in time interval i,ỹ i is the prediction value for time interval i and n is the total number of the time series.

Model Performance and Analysis
Traffic volume data from the fifth day is used as test set to evaluate the performance of the proposed model. For illustrating the model performance intuitively, Figs 11 and 12 show prediction results of the proposed model using NO. DC00004965 detector's data and NO. DC00004966 detector's data, respectively. Figs 11(a) and 12(a) present the curves of prediction data and measured data. The prediction curves can fit the measured curves well. Figs 11(b) and 12(b) present the scatterplots between the predicted data and the measured data. It is clear that these scatter points distribute near the measured line (the red line) without large deviation. Figs 11(c) and 12(c) show APE (absolute percent error) of the predicted data. The APE are mostly within 15%. However, the APE from 24 to 48 (correspond to 2:00-4:00) is larger, and the reason is that the actual traffic volume data is too small during that period.
To illustrate the superiority of the SSA-KELM model, contrast experiment is carried out. In this paper, the related models for comparsion are as follows: the single model (KELM), the  Table 2.
Besides related models (SSA-SVM, KELM and SSA-ELM) mentioned above, two state-ofthe-art traffic flow prediction models are used for comparison, that are Hybrid Particle Swarm Optimization Support Vector Regression (HPSO-SVR) model [57] and Long Short-Term Memory Neural Network (LSTM-NN) model [58]. In the contrast experiment, we implement the two state-of-the-art models according to detail steps of the corresponding references. Fig 13(a) describes the prediction results of the related models based on NO. DC00004965 detector's data, Fig 13(b), 13(c) and 13(d) are specific parts of the prediction results. Fig 14(a) describes the prediction results of the state-of-the-art models based on NO. DC00004965 detector's data. Fig 14(b), 14(c) and 14(d) are specific parts of the prediction results. Fig 15(a) describes the prediction results of related models based on NO. DC00004966 detector's data. Fig 15(b) is a specific part of the comparison results. Fig 16(a) describes the prediction results of state-of-the-art models based on NO. DC00004966 detector's data. Fig 16(b) is a specific part of the comparison results. As shown in Figs   For the SSA-KELM model, we can clearly see that frequency distribution of absolute errors in the interval [0,5) is the highest, and almost all the absolute errors of SSA-KELM model are less than 25, which illustrates the prediction model has better stability. Table 3 gives the prediction accuracy by using two statistical indices (MAE and MAPE). The prediction accuracy of the SSA-KELM model is improved significantly compared with the   Table 4, which illustrate that performance of the SSA-KELM model is also better than the other models, when traffic volume changes greatly.

Non-parametric tests for multiple comparisons of different models
In order to find significant differences among the results obtained by the studied models, statistical analysis should be performed. In this paper, non-parametric tests are used as suggested in Refs. [59][60][61]. The parametric statistical analysis loses its credibility because the initial conditions guaranteeing its reliability may not be satisfied [59]. They can offer safe and reliable procedures to contrast the differences between different techniques, especially in multipleproblem analysis.
To compare the performance of multiple models on the multiple data sets, Friedman aligned-ranks test is conducted according to the suggestions of Ref. [60]. Table 5 shows the average ranking of all models. In terms of both MAE and MAPE, the performances of all models can be sorted by average ranking into the following order: SSA-KELM, LSTM-NN,

Conclusions
A novel short-term traffic flow prediction model (SSA-KELM) is proposed based on SSA and KELM. The prediction accuracy of SSA-KELM model compares with that of SSA-ELM model, SSA-SVM model and single KELM model using the same real-world traffic volume data. The prediction results are encouraging, especially at the period that traffic volume changes greatly.
The main contributions of this paper are not only the introduction of KELM method for traffic flow prediction and how to optimize the model parameters based on GSA, but also considering the chaotic characteristics of short-term traffic flow and determining the model's optimal input form based on PSR. It is worth noting that filtering original traffic volume data in the preprocessing stage is important. The new time series reconstructed by SSA retains the main characteristics of the original traffic volume time series (the contribution ratio is 98.27%), but has filtered the main noise components. A conclusion can be got that the proposed model is an effective and accurate method for short-time traffic flow prediction, which can provide satisfactory prediction results. Although the experimental results presented here are promising and the SSA-KELM model can be successfully applied to predict traffic flow, this model suffers from weak interpretability like other ANN-based models. In future work, the model with well interpretability can be introduced into the proposed model, such as fuzzy system and its improvement [62][63][64]. It is difficult to improve the interpretability of the model while keeping the accuracy of the model. And it is a difficulty that how to reflect the characteristics of traffic flow in the model. To obtain more general and robust conclusions, traffic flow data from different roadways and more complicated traffic conditions should be used to test the proposed model. Moreover, it is required that the other data sets of traffic flow parameters (such as travel time, average traffic speed and average occupancy, this study chooses the traffic volume as the demonstration) are applied to test the model. Furthermore, traffic flow parameters data set with different time intervals (such as 1 min, 2 min and 10 min) should be used to test the proposed model for further study. Other advanced optimization algorithms should be further studied to search for more appropriate parameter combinations of the model and to obtain more accurate results of short-term traffic flow prediction.
Supporting Information S1 Dataset. The traffic volume data from the two detectors. (XLSX)

Acknowledgments
We thank DX Yu and HX zhou for their help in traffic data collection and analysis.