A study on ship collision conflict prediction in the Taiwan Strait using the EMD-based LSSVM method

Ship collision accidents are the primary threat to traffic safety at sea. They can cause casualties and environmental pollution. Collision risk is a major indicator that navigators and surveillance operators use to judge the danger of collision between meeting ships. The number of collision accidents per unit time in a given water area can be used to describe the regional collision risk. However, historical ship collision accidents are contingent, small in sample size and weak in regularity; hence, ship collision conflicts can be used as a substitute for ship collision accidents in characterizing the maritime traffic safety situation, and they have become an important part of methods that quantitatively study traffic safety problems and their countermeasures. In this work, an EMD-QPSO-LSSVM approach, a hybrid of empirical mode decomposition (EMD) and a least squares support vector machine (LSSVM) model optimized by quantum-behaved particle swarm optimization (QPSO), is proposed to forecast ship collision conflicts. First, the original ship collision conflict time series is decomposed into a collection of intrinsic mode functions (IMFs) and a residue with EMD. Second, the IMF components and the residue are each used to establish a corresponding LSSVM model, whose key parameters are optimized by the QPSO algorithm. Then, each subseries is predicted with the corresponding LSSVM. Finally, the prediction of the original ship collision conflict series is obtained by summing the forecasts of all subseries. The prediction results of the proposed method are compared with those of GM, the Lasso regression method and EMD-ENN, and the comparison indicates that the proposed method is efficient and can be used for ship collision conflict prediction.


Introduction
The global shipping industry has boomed in recent decades as economic globalization gains speed and the trend toward world economic integration intensifies. According to the Review of Maritime Transport 2019, about 90 percent of global trade by weight of goods is carried by shipping, so shipping undoubtedly plays an irreplaceable role in the global economy [1]. However, shipping has long been regarded as a complex and high-risk activity, and maritime accidents often lead to serious damage, death, loss, injury or pollution, and may also have significant political, economic and environmental consequences [2].
Each subseries corresponds to a range of frequencies, shows much more regularity and can be predicted more accurately than the original ship collision conflict series. EMD, proposed by Huang [21], exhibits strong generality in dealing with non-stationary data. This method can reflect the physical characteristics of the original time series signal without a pre-set basis function. As a special signal processing technique, EMD can decompose a complex signal into a collection of IMFs and a residue, which are relatively stationary subseries and can be readily modelled [22,23]. The discrete wavelet transform (DWT) is also a powerful method for dealing with non-stationary and nonlinear signals [24,25]. However, the processing procedure of DWT is not autoregressive, and its decomposition accuracy is affected by the band-pass filters that are chosen to decompose the target signals. The wavelet basis function and the number of decomposition layers also affect the decomposition results. Therefore, the decomposition accuracy of DWT is relatively lower than that of EMD, and EMD is used to decompose the ship collision conflict time series.
Based on the above analysis, an EMD-QPSO-LSSVM approach, a hybrid of empirical mode decomposition and a least squares support vector machine model optimized by quantum-behaved particle swarm optimization, is proposed in this work to forecast ship collision conflicts. In the approach, the original ship collision conflict time series is decomposed into a collection of IMFs and a residue with EMD. Then, the IMF components and the residue are each used to establish a corresponding LSSVM model, where the key parameters of each LSSVM model are optimized by the quantum-behaved PSO algorithm. Finally, the prediction of the original ship collision conflict series is calculated by summing the forecasts of all subseries. The effectiveness of the proposed model is verified using real data on ship collision conflicts in the Taiwan Strait in 2014. The prediction results can, to some extent, provide a theoretical basis for the maritime department to develop effective maritime management countermeasures and will be helpful in guiding the management of maritime traffic safety.

Objectives and contributions
Maritime transport plays an extremely important role in international trade and makes great contributions to national economic development. Shipping has long been regarded as a complex and high-risk activity, and maritime accidents often lead to serious damage, death, loss, injury or pollution, and may also have significant political, economic and environmental consequences. Collision risk is a major indicator that navigators and surveillance operators use to judge the danger of collision between meeting ships. To measure the collision risk, ship collision conflicts are used as an important index of maritime traffic safety and maritime management. The objective of this study is to propose an efficient method for predicting the future state by analysing the historical data of ship collision conflicts in the Taiwan Strait. The contribution of the work is to provide the administrative department with a reference for developing maritime management countermeasures that reduce ship collision accidents.

Methodology
A hybrid of empirical mode decomposition and a least squares support vector machine model, named the EMD-QPSO-LSSVM method, is proposed to forecast ship collision conflicts. The flowchart is shown in Fig 1. In the approach, the original ship collision conflict time series is decomposed into a collection of IMFs and a residue by EMD. Then, the IMF components and the residue are each used to establish a corresponding LSSVM model, where the key parameters of each LSSVM model are optimized by the quantum-behaved particle swarm optimization algorithm. Finally, the prediction of the original ship collision conflict series is calculated by summing the forecasts of all subseries.

Empirical mode decomposition
The empirical mode decomposition method was first proposed by Huang [21]. EMD processing is very beneficial in the prediction of non-stationary time series. The method can reflect the physical characteristics of the original time series signal without setting a basis function beforehand. The basic idea of empirical mode decomposition is that any signal consists of a limited number of intrinsic mode functions. According to the time scale characteristics of the data itself, the time series is decomposed step by step to extract IMFs with different characteristic scales. Each IMF represents an intrinsic characteristic vibration form of the signal. An IMF needs to satisfy the following two basic conditions: i) the number of extrema and the number of zero-crossings should be equal or differ by one; ii) the average value of the upper envelope formed by the local maxima and the lower envelope formed by the local minima should be zero.
Given an original ship collision conflict time series $x(t)$, the EMD calculation can be described as follows:

$$x(t) = \sum_{k=1}^{n} \mathrm{imf}_k(t) + \mathrm{res}(t)$$

where $\mathrm{imf}_k(t)$ is the $k$th IMF and $\mathrm{res}(t)$ is the residue after the IMFs are derived. The empirical mode decomposition steps are as follows:
Step 1. Find all the maximum and minimum points of the original data sequence $x(t)$, and fit all the maximum points with a cubic spline function. This curve is the upper envelope of the data. Similarly, all minimum points are fitted with a cubic spline function to obtain the lower envelope of the data. Let $m_1(t)$ be the mean of the upper and lower envelopes. Subtracting the mean value $m_1(t)$ from $x(t)$ gives a new data sequence $h_1(t) = x(t) - m_1(t)$.
If $h_1(t)$ does not satisfy the two basic requirements of an IMF, the above procedure is repeated with $h_1(t)$ as the original data until, after $k$ repetitions, $h_k(t)$ meets the two requirements. Then $h_k(t)$ is taken as $\mathrm{imf}_1(t)$.
Step 2. A new data sequence $x_2(t) = x(t) - \mathrm{imf}_1(t)$ is obtained by subtracting $\mathrm{imf}_1(t)$ from the original data $x(t)$.
Step 3. Repeat the above steps $n$ times until the last data sequence $x_{n+1}(t)$ cannot be decomposed into an IMF. This data sequence $x_{n+1}(t)$ is named the residue $\mathrm{res}(t)$ of the original data.
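The sifting steps above can be illustrated with a minimal Python sketch. It is not the implementation used in this study: to stay dependency-free it replaces the cubic spline envelopes with piecewise-linear ones, and the stopping rule, the function names and the parameters `max_sift`, `tol` and `max_imfs` are assumptions made for the example.

```python
import math

def local_extrema(x):
    """Return the indices of interior local maxima and minima of x."""
    maxima, minima = [], []
    for i in range(1, len(x) - 1):
        if x[i - 1] < x[i] >= x[i + 1]:
            maxima.append(i)
        elif x[i - 1] > x[i] <= x[i + 1]:
            minima.append(i)
    return maxima, minima

def envelope(x, idx):
    """Piecewise-linear envelope through (i, x[i]) for i in idx.
    Assumption: the paper uses cubic splines; linear pieces keep the sketch short.
    The two end points of x are added so the envelope covers the whole range."""
    knots = [0] + idx + [len(x) - 1]
    env = []
    for a, b in zip(knots[:-1], knots[1:]):
        for i in range(a, b):
            w = (i - a) / (b - a)
            env.append((1 - w) * x[a] + w * x[b])
    env.append(x[-1])
    return env

def sift(x, max_sift=50, tol=1e-3):
    """Step 1: extract one IMF by repeatedly subtracting the envelope mean."""
    h = list(x)
    for _ in range(max_sift):
        maxima, minima = local_extrema(h)
        if not maxima or not minima:
            break
        upper, lower = envelope(h, maxima), envelope(h, minima)
        mean = [(u + l) / 2 for u, l in zip(upper, lower)]
        h = [hi - mi for hi, mi in zip(h, mean)]
        # Assumed stopping rule: envelope mean negligible relative to the candidate IMF.
        if sum(m * m for m in mean) <= tol * sum(v * v for v in h):
            break
    return h

def emd(x, max_imfs=8):
    """Steps 2 and 3: peel off IMFs until the remainder has too few extrema."""
    imfs, residue = [], list(x)
    for _ in range(max_imfs):
        maxima, minima = local_extrema(residue)
        if len(maxima) + len(minima) < 3:
            break
        imf = sift(residue)
        imfs.append(imf)
        residue = [r - c for r, c in zip(residue, imf)]
    return imfs, residue

# Synthetic example: two oscillations plus a slow trend.
t = [i / 10 for i in range(200)]
signal = [math.sin(5 * v) + 0.5 * math.sin(0.7 * v) + 0.05 * v for v in t]
imfs, residue = emd(signal)
recon = [sum(c[i] for c in imfs) + residue[i] for i in range(len(signal))]
```

By construction, the IMFs and the residue add back up to the input, which mirrors the decomposition formula for $x(t)$.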

Quantum-behaved PSO-LSSVM
The least squares SVM is a very active artificial intelligence method and is widely applied in modelling and control problems [19,26]. Different algorithms have been used in the literature to optimize the LSSVM parameters [20,27-31]. The quantum-behaved particle swarm optimization algorithm is an intelligent optimization algorithm developed from particle swarm optimization. It can be used to solve nonlinear and complex optimization problems, and it features few control parameters, easy setup and strong global search capability [32,33]. In this work, a modified QPSO algorithm is adopted [20]. In the position update rule of this algorithm, $\varphi[t]$ and $u[t]$ are random numbers in $[0,1]$ at step $t$, $N$ is the size of the swarm, $D$ is the dimension of the particles, $g_{best}[t]$ is the entire swarm's best known position, $p_{best_i}[t]$ is the $i$th particle's best known position, and $p[t]$ is called a local attractor. The inertia weight $\beta[t]$ depends on two quantities $\chi[t]$ and $\lambda[t]$, where $\chi[1] = \lambda[1] = 0$, $\chi[t] \triangleq \mathrm{FIT}(g_{best}[t]) / \mathrm{FIT}(g_{best}[t-1])$, $\lambda[t]$ is defined from $\mathrm{FIT}(p_{best_i}[t-1])$, and the coefficients $\beta_0, \beta_1, \beta_2$ satisfy the constraints $\beta_1 < \beta_0$ and $\beta_0 + \beta_2 < 1.78$, as proved in [33].
Given training data pairs $(x_i, y_i)$, where $x_i \in \mathbb{R}^m$ is the input data in the input space and $y_i \in \mathbb{R}$ is the output value for the given input, an LSSVM-based prediction model is built for the nonlinear function. The kernel function Kernel(·) is chosen as the RBF kernel function, and the parameters $\gamma_0$, $\rho$ and $\sigma^2$ are determined by the QPSO algorithm. The flow chart of the QPSO-based parameter adjustment is depicted in Fig 2. The optimization procedure is repeated several times in an attempt to reach the most probable global optimum of the fitness function.
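For orientation, the following Python sketch shows the commonly cited basic form of QPSO, in which each particle is drawn toward a local attractor and perturbed around it according to its distance from the mean of the personal best positions. It is not the modified algorithm of [20]: the linearly decreasing `beta` schedule, the bound clipping, the function name `qpso` and the test function `sphere` are all assumptions made for illustration.

```python
import math
import random

def qpso(fitness, dim, bounds, n_particles=20, n_iter=100, seed=0):
    """Minimise `fitness` over a box using a basic QPSO update (illustrative only)."""
    rng = random.Random(seed)
    lo, hi = bounds
    x = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    pbest = [xi[:] for xi in x]
    pbest_fit = [fitness(xi) for xi in x]
    g = min(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    for t in range(n_iter):
        # Assumed schedule: beta decreases linearly from 1.0 towards 0.5.
        beta = 1.0 - 0.5 * t / n_iter
        # Mean of the personal best positions, computed once per iteration.
        mbest = [sum(p[d] for p in pbest) / n_particles for d in range(dim)]
        for i in range(n_particles):
            for d in range(dim):
                phi = rng.random()
                u = 1.0 - rng.random()  # lies in (0, 1], so log(1/u) is finite
                attractor = phi * pbest[i][d] + (1 - phi) * gbest[d]
                step = beta * abs(mbest[d] - x[i][d]) * math.log(1 / u)
                x[i][d] = attractor + step if rng.random() < 0.5 else attractor - step
                x[i][d] = min(max(x[i][d], lo), hi)  # keep the particle inside the box
            f = fitness(x[i])
            if f < pbest_fit[i]:
                pbest[i], pbest_fit[i] = x[i][:], f
                if f < gbest_fit:
                    gbest, gbest_fit = x[i][:], f
    return gbest, gbest_fit

def sphere(v):
    """Toy fitness function with its minimum at the origin."""
    return sum(c * c for c in v)

best, best_fit = qpso(sphere, dim=3, bounds=(-5.0, 5.0))
```

In the setting of this work, the fitness function would instead score an LSSVM trained with a candidate parameter vector, for example by its validation error.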

Error measures
To assess the performance of the prediction models, four error measures are used for model comparison: the mean absolute error (MAE), the mean relative error (MRE), the mean square error (MSE) and the mean absolute percentage error (MAPE).
Each measure is computed from the prediction errors $z_p(j) - z(j)$, $j = 1, 2, \cdots, N_{Pred}$, where $N_{Pred}$ is the prediction sample size and $z(j)$ and $z_p(j)$ are the actual and forecast values during a time period, respectively.
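As a rough illustration, the sketch below computes the four measures using their common textbook definitions. These are assumptions rather than the exact formulas of this paper; in particular, MAPE is taken here as the MRE expressed in percent, and the sample values are made up for the example.

```python
def error_measures(actual, predicted):
    """Return MAE, MRE, MSE and MAPE under common textbook definitions (assumed)."""
    n = len(actual)
    errors = [p - a for a, p in zip(actual, predicted)]
    mae = sum(abs(e) for e in errors) / n
    # Relative error is taken with respect to the actual value, which must be non-zero.
    mre = sum(abs(e) / abs(a) for e, a in zip(errors, actual)) / n
    mse = sum(e * e for e in errors) / n
    mape = 100.0 * mre  # the same quantity as MRE, in percent
    return {"MAE": mae, "MRE": mre, "MSE": mse, "MAPE": mape}

# Made-up daily conflict counts and forecasts, for illustration only.
measures = error_measures([200, 220, 240], [210, 220, 228])
```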

Ship collision conflict datasets
To verify the validity of the proposed hybrid approach, ship collision conflict data from the Taiwan Strait are employed. The data consist of actual daily ship collision conflicts from 1999 to 2014 [34], and the verification is performed on the data from 2014, as presented in Table 1.

Data processing
The data are processed in the following steps:
Step 1: EMD of the ship collision conflict time series. Due to the intrinsic complexity of the original ship collision conflict time series, its variation tendency is difficult to predict. To improve the prediction accuracy, EMD is used to decompose the original ship collision conflict time series $z = (z_1, z_2, \cdots, z_T)$ with $T = 365$, which yields seven IMF components $\mathrm{imf}_k = (z_{k1}, z_{k2}, \cdots, z_{kT})$ $(k = 1, 2, \cdots, 7)$ and a residue $\mathrm{res} = (r_1, r_2, \cdots, r_T)$, as illustrated in Fig 3.
Step 2: Data normalization. For ease of expression, denote $\mathrm{imf}_k$ by $z_k$ $(k = 1, 2, \cdots, 6)$ and $\mathrm{res}$ by $z_7 = \{z_{71}, z_{72}, \cdots, z_{7T}\}$. Then normalize the sequences $\{z_k = \{z_{k1}, z_{k2}, \cdots, z_{kT}\}\}_{k=1}^{7}$ by the Min-Max Normalization method [35] in the following form:

$$\bar{z}_{kj} = \frac{z_{kj} - \min_{1 \le i \le T} z_{ki}}{\max_{1 \le i \le T} z_{ki} - \min_{1 \le i \le T} z_{ki}}, \quad j = 1, 2, \cdots, T$$

Step 3: Data phase space reconstruction. To sufficiently extract the useful information from the time series $\bar{z}_k = (\bar{z}_{k1}, \bar{z}_{k2}, \cdots, \bar{z}_{kT})$, the commonly used method is the phase space reconstruction (PSR) method in delay coordinates proposed by Packard et al. [36]. Theoretically speaking, a time series can sufficiently reconstruct an original dynamic system according to Takens [37]. With this procedure, the time series $\bar{z}_k$ can be reconstructed in a multidimensional phase space as follows:

$$x_{ki} = \left(\bar{z}_{ki}, \bar{z}_{k(i+\tau)}, \cdots, \bar{z}_{k(i+(m-1)\tau)}\right), \quad i = 1, 2, \cdots, T-(m-1)\tau$$

where $\tau$ is the delay parameter and $m$ is the embedding dimension. It is very important to select a suitable pair of embedding dimension $m$ and time delay $\tau$ when performing PSR [38-40]. There is no exact way to determine the values of $\tau$ and $m$. The result in [41] indicates that a larger value for $\tau$ than necessary should be selected to prevent system information from being ignored. In addition, according to the result in Brock et al. [42], appropriate values for the embedding dimension $m$ lie between 2 and 5. In the following simulations, the embedding dimension $m$ is set to 4 and the time delay is taken as one day ($\tau = 1$).
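Steps 2 and 3 can be sketched in a few lines of Python. The function names below are hypothetical, and pairing each delay vector with the value $\tau$ steps after its last element as the prediction target is an assumption about how the training pairs are formed.

```python
def min_max_normalize(series):
    """Scale a series to [0, 1]; also return (lo, hi) for the reverse normalization."""
    lo, hi = min(series), max(series)
    if hi == lo:  # constant series: map everything to 0 to avoid dividing by zero
        return [0.0] * len(series), lo, hi
    return [(v - lo) / (hi - lo) for v in series], lo, hi

def denormalize(values, lo, hi):
    """Reverse normalization back to the original scale."""
    return [v * (hi - lo) + lo for v in values]

def phase_space_pairs(series, m=4, tau=1):
    """Build (input, target) pairs from delay coordinates.
    Each input holds m values spaced tau apart; the target is the value
    tau steps after the last input element (assumed one-step-ahead setup)."""
    inputs, targets = [], []
    for i in range(len(series) - m * tau):
        inputs.append([series[i + j * tau] for j in range(m)])
        targets.append(series[i + m * tau])
    return inputs, targets

raw = [float(v) for v in range(10)]
scaled, lo, hi = min_max_normalize(raw)
X, y = phase_space_pairs(raw, m=4, tau=1)
restored = denormalize(scaled, lo, hi)
```

A series of length $T$ yields $T - m\tau$ pairs under this construction, so $T = 365$, $m = 4$ and $\tau = 1$ give 361 pairs.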

Prediction by QPSO-LSSVM and representation
The data pairs $\{(x_{ki}, y_{ki})\}_{i=1}^{T_1}$ are used to train the QPSO-LSSVM and obtain an optimal parameter set $(\gamma_{k0}, \rho_k, \sigma_k^2)$, where $T_1$ is the number of sample data in the training set. Then, the trained LSSVM is used to predict the remaining values $\bar{y}_{k(T_1+1)}, \cdots, \bar{y}_{k(T-m\tau)}$. The final step is to carry out the reverse normalization on $\bar{z}_k = (\bar{z}_{k1}, \bar{z}_{k2}, \cdots, \bar{z}_{kT_1}, \bar{y}_{k(T_1+1)}, \cdots, \bar{y}_{k(T-m\tau)})$, which yields the sequence $z'_k = (z'_{k1}, z'_{k2}, \cdots, z'_{kT})$, and the prediction result is

$$z' = \sum_{k=1}^{7} z'_k$$
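To make the training and prediction step concrete, the sketch below fits a textbook LSSVM regressor with an RBF kernel by solving its linear system directly. It is a simplified stand-in, not the model of this paper: it uses only a regularization parameter `gamma` and a kernel width `sigma2`, omits the parameter $\rho$ (whose role is not recoverable from the text), and fixes the parameters by hand instead of tuning them with QPSO. All function names are hypothetical.

```python
import math

def rbf(a, b, sigma2):
    """RBF kernel exp(-||a - b||^2 / (2 * sigma2)); the factor 2 is an assumed convention."""
    return math.exp(-sum((p - q) ** 2 for p, q in zip(a, b)) / (2.0 * sigma2))

def solve(A, rhs):
    """Solve A z = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    M = [row[:] + [r] for row, r in zip(A, rhs)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (M[r][n] - sum(M[r][k] * z[k] for k in range(r + 1, n))) / M[r][r]
    return z

def lssvm_fit(X, y, gamma, sigma2):
    """Solve [[0, 1^T], [1, K + I/gamma]] [b; alpha] = [0; y] for b and alpha."""
    n = len(X)
    A = [[0.0] + [1.0] * n]
    for i in range(n):
        row = [1.0] + [rbf(X[i], X[j], sigma2) for j in range(n)]
        row[i + 1] += 1.0 / gamma
        A.append(row)
    sol = solve(A, [0.0] + list(y))
    return sol[0], sol[1:]

def lssvm_predict(x, X, b, alpha, sigma2):
    """Evaluate f(x) = b + sum_i alpha_i * K(x, x_i)."""
    return b + sum(a * rbf(x, xi, sigma2) for a, xi in zip(alpha, X))

# Toy data: samples of sin(x) on [0, 6].
X_train = [[0.25 * i] for i in range(25)]
y_train = [math.sin(v[0]) for v in X_train]
b, alpha = lssvm_fit(X_train, y_train, gamma=100.0, sigma2=1.0)
y_hat = lssvm_predict([1.6], X_train, b, alpha, sigma2=1.0)
```

In the workflow described above, one such model would be fitted per subseries on its phase-space pairs, and the de-normalized predictions would then be summed.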

Analysis of prediction results
To evaluate the prediction accuracy, the dataset is partitioned into a training dataset (90%) and a validation dataset (10%). The training dataset is used to establish the prediction model, and the validation dataset is used to validate the effectiveness of the model. The grey model (GM) is easy to set up, and its prediction result is presented in Fig 4. The GM prediction is unsatisfactory, and most of its predictions are higher than the actual data. When an LSSVM with key parameters γ0 = 10, ρ = 0, σ = 2 is applied, the prediction results for the training dataset and the testing dataset are as shown in Fig 4. The LSSVM clearly performs better than the GM. However, its maximum error is about 24% and its mean square error is about 5, so it is still not suitable for real applications.
To improve the prediction accuracy, the QPSO algorithm is applied to search for an optimal set of key parameters (γ0, ρ, σ²). Here, K-fold cross-validation is adopted to prevent overfitting: the training dataset is divided randomly into 9 folds, one of which is selected as the validation set each time for model selection, while the rest are used for model training. Table 2 illustrates the performance of the LSSVM with 9-fold cross-validation.
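The 9-fold procedure can be sketched as follows. The helper names, the fixed random seed and the use of validation MSE as the score are assumptions made for the example; `fit_predict` stands for any routine that trains on one part of the data and predicts on the other, such as an LSSVM with a candidate parameter set.

```python
import random

def k_fold_indices(n_samples, k=9, seed=0):
    """Yield (train, validation) index lists for a random k-fold split."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, folds[i]

def cross_val_mse(fit_predict, X, y, k=9, seed=0):
    """Average validation MSE over the k folds.
    fit_predict(X_train, y_train, X_val) must return predictions for X_val."""
    scores = []
    for train, val in k_fold_indices(len(X), k, seed):
        preds = fit_predict([X[j] for j in train], [y[j] for j in train],
                            [X[j] for j in val])
        scores.append(sum((p - y[j]) ** 2 for p, j in zip(preds, val)) / len(val))
    return sum(scores) / k

def mean_predictor(X_train, y_train, X_val):
    """Placeholder model: always predicts the training mean."""
    mean = sum(y_train) / len(y_train)
    return [mean] * len(X_val)

X_demo = [[float(i)] for i in range(90)]
y_demo = [float(i) for i in range(90)]
splits = list(k_fold_indices(90, k=9))
score = cross_val_mse(mean_predictor, X_demo, y_demo, k=9)
```

During parameter search, a score of this kind could serve as the fitness value that the optimizer minimizes for each candidate parameter set.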
Moreover, due to the intrinsic complexity of ship collisions, the regularity of the conflict time series is not obvious, and predictions made directly from the original dataset are unsatisfactory. Since a ship accident depends on the climate, which has specific cycles such as year, month and week, the ship collision conflict time series can be considered a combination of subseries characterized by different frequencies. Each subseries corresponds to a range of frequencies, shows much more regularity and can be predicted more accurately than the original ship collision conflict series. The IMF components and the residue obtained by EMD are shown in Fig 3. The regularity of the last five IMFs and the residue is clearly stronger than that of the first two IMFs. By establishing a separate LSSVM for each IMF component and for the residue, satisfactory prediction results can be obtained. The parameters of each LSSVM are obtained by the QPSO-based parameter adjustment procedure. The predictions of the quantum-behaved PSO-LSSVM for each IMF component and the residue are shown in Fig 5. The final prediction of the original ship collision conflict numbers is calculated as the sum of the predictions of the subseries, as shown in Fig 6. It can be seen that the prediction accuracy is greatly improved. This indicates that the proposed method can be used to predict ship collision conflicts as a substitute for ship collision accidents in characterizing the maritime traffic safety situation.
To evaluate the performance of the proposed method, a statistical test is carried out on the real data and the prediction result of EMD-QPSO-LSSVM, as shown in Table 3. The significance value (sig.) is 0.212, which is greater than 0.05, so the test finds no statistically significant difference between the predicted and actual values at the 5% level. Thus, the proposed method is suitable for predicting the ship collision conflict numbers.
To verify the efficiency of the proposed method, it is compared with GM, Lasso Regression, Bayes Regression, LSSVR and EMD-ENN. The comparison results are shown in Fig 7 and Table 4, where the ENN contains 15 neurons. It can be seen that EMD-QPSO-LSSVM performs better than the other methods. It should also be pointed out, however, that EMD-ENN is a suitable method for ship collision conflict prediction as well.
Since there is no exact way to determine the choice of the embedded dimension, according to Brock et al [38], different simulations are carried out to show the influence of embedded dimension m, as shown in Table 5. For the ship collision conflicts, the embedded dimension can be set equal to 4 or 5 when the time delay is one.

Conclusion
The Taiwan Strait is a large channel between northern and southern China and is an important maritime passage connecting the Korean Peninsula, Japan, Southeast Asian countries, Hong Kong and Macao. The ship traffic flow is large, the navigation risk is high, and the daily average number of ship collision conflicts is approximately 220. The number of collision accidents per unit time in a given water area can be used to describe the regional collision risk, which is the main index for evaluating maritime traffic safety and measuring maritime management. Predicting the short-term occurrence of ship collision conflicts in the Taiwan Strait from historical collision conflicts is of great significance for maritime administrative authorities in formulating strategies to reduce ship collision accidents. Drawing on the advantages of the empirical mode decomposition method and of the least squares support vector machine optimized by quantum-behaved PSO, a hybrid EMD and QPSO-LSSVM model is proposed to forecast ship collision conflicts. The original ship collision conflict time series is first decomposed into a collection of IMFs and a residue by the EMD method. Then, the IMF components and the residue are each used to establish a corresponding LSSVM model, where the key parameters of the LSSVM are optimized by the quantum-behaved PSO algorithm. Each subseries is predicted using the corresponding LSSVM. Finally, the prediction of the original ship collision conflict series is calculated as the sum of the forecasts of all subseries. The prediction results show that EMD-QPSO-LSSVM is an efficient method and can be used to forecast ship collision conflicts.
Supporting information S1 File.