CSFPre: Expressway key sections based on CEEMDAN-STSGCN-FCM during the holidays for traffic flow prediction

  • Libiao Chen,

    Roles Conceptualization, Formal analysis, Methodology, Writing – review & editing

    Affiliation Fujian Expressway Science & Technology Innovation Research Institute Co., Ltd., Fuzhou, Fujian, China

  • Qiang Ren ,

    Roles Conceptualization, Data curation, Methodology, Writing – original draft

    2200601001@smail.fjut.edu.cn

    Affiliations Fujian Key Laboratory of Automotive Electronics and Electric Drive, Fujian University of Technology, Fuzhou, Fujian, China, Research Institute for Transportation Big Data of Digital Fujian, Fuzhou, Fujian, China

  • Juncheng Zeng,

    Roles Investigation, Supervision

    Affiliation Fujian Expressway Science & Technology Innovation Research Institute Co., Ltd., Fuzhou, Fujian, China

  • Fumin Zou,

    Roles Funding acquisition, Project administration

    Affiliations Fujian Key Laboratory of Automotive Electronics and Electric Drive, Fujian University of Technology, Fuzhou, Fujian, China, Research Institute for Transportation Big Data of Digital Fujian, Fuzhou, Fujian, China

  • Sheng Luo,

    Roles Methodology

    Affiliation Fujian Expressway Science & Technology Innovation Research Institute Co., Ltd., Fuzhou, Fujian, China

  • Junshan Tian,

    Roles Resources

    Affiliation Fujian Expressway Science & Technology Innovation Research Institute Co., Ltd., Fuzhou, Fujian, China

  • Yue Xing

    Roles Visualization

    Affiliations Fujian Key Laboratory of Automotive Electronics and Electric Drive, Fujian University of Technology, Fuzhou, Fujian, China, Research Institute for Transportation Big Data of Digital Fujian, Fuzhou, Fujian, China

Abstract

Toll-free travel during holidays causes severe traffic congestion on expressways. Real-time and accurate holiday traffic flow forecasts can help the traffic management department guide diversion and reduce expressway congestion. However, most current prediction methods focus on traffic flow on ordinary working days or weekends. Few studies address traffic flow prediction for festivals and holidays, and predicting holiday traffic flow accurately is challenging because of its sudden and irregular characteristics. Therefore, we put forward a data-driven expressway traffic flow prediction model for holidays. Firstly, Electronic Toll Collection (ETC) gantry data and toll data are preprocessed to ensure data integrity and accuracy. Secondly, the preprocessed traffic flow is decomposed by Complete Ensemble Empirical Mode Decomposition with Adaptive Noise (CEEMDAN) into trend terms and random terms, and the spatial-temporal correlation and heterogeneity of each component are captured simultaneously using the Spatial-Temporal Synchronous Graph Convolutional Networks (STSGCN) model. Finally, the fluctuating traffic flow of holidays is predicted using the Fluctuation Coefficient Method (FCM). In experiments on real ETC gantry data and toll data from Fujian Province, the method outperforms all baseline methods. It can provide a reference for future public travel choices and further road network operation.

Introduction

At the end of 2020, car ownership in Fujian Province had reached 7.313 million and expressway mileage had exceeded 6,000 km, ranking Fujian third among China's provinces for expressway road network density [1]. However, expressway mileage is growing much more slowly than car ownership, which aggravates the imbalance between expressway supply and demand and causes serious traffic congestion. Traffic flow during holidays has more complex spatial-temporal characteristics and a wider range of variation, and its patterns differ greatly from those on weekdays [2, 3], as shown in Fig 1. In addition, to make travel more convenient, the State Council issued a notice in 2012 exempting small passenger vehicles from expressway tolls during major holidays (Spring Festival, Qingming Festival, Labor Day, and National Day) [4]. Since this policy was implemented, national expressway traffic has increased drastically on major holidays, and traffic accidents and large-scale congestion have emerged, seriously affecting the quality of people's travel. Relieving congestion on toll-free expressways during major holidays has therefore become one of the key research directions in China's traffic management. To further improve the operational efficiency of China's expressways, 27,000 ETC gantries have been built nationwide, and 390 toll stations and 1,063 ETC main line physical gantries have been built in Fujian Province [5]. ETC gantry data and toll data record the driving conditions of each vehicle on the expressway. Therefore, ETC gantry data and toll data can be used to estimate the location of ETC gantries [6], predict traffic flow [7], travel time [8], and travel speed [9], and recognize vehicles in service areas and estimate their dwell time [10].

Because the expressway traffic management department cannot fully grasp how road conditions change during holidays, travelers do not learn of changes in road conditions in time. Vehicle trips therefore become too concentrated, effective traffic control and diversion become difficult, and congestion results [11]. Thus, accurately predicting traffic flow during holidays and reducing the occurrence of traffic congestion is an important challenge for the fine-grained management of expressways [12].

Fig 1. Traffic flow of Fuzhou-Xiamen expressway on May Day 2021.

https://doi.org/10.1371/journal.pone.0283898.g001

To address this challenge, several scholars have studied expressway traffic flow during holidays. For example, Liu and Dai proposed a deep learning method that uses BP neural network models and wavelet filtering techniques to predict holiday traffic flow [13]. Since the Genetic Algorithm (GA) can effectively improve the efficiency and feasibility of the algorithm, Chen et al. proposed an improved GA-BP to predict holiday traffic flow [14]. In addition, Ji and Ge analyzed the spatial and temporal characteristics of holiday traffic flow and proposed a deep learning model based on Long Short-Term Memory neural network-Support Vector Regression (LSTM-SVR) for holiday traffic flow prediction [15]. These prediction models fill the gap in holiday expressway traffic flow prediction, but they have several shortcomings. Firstly, most of the research relies on vehicle detector data; the detectors are installed at fixed locations and report traffic flow only at those locations, their damage rate is high, and their precision is low. Secondly, most models ignore the fact that traffic flow data may contain a large amount of noise, and they use separate components to capture spatial and temporal correlations rather than capturing local spatial-temporal correlations directly. Finally, the limited number of holiday traffic flow samples is ignored. Therefore, this paper studies how to extract the salient features of traffic flow under limited data conditions while simultaneously capturing these complex local spatial-temporal correlations and heterogeneities, in order to accurately predict traffic flow on holidays.

In this paper, a holiday traffic flow prediction model is established. Firstly, the traffic flow derived from ETC gantry data and toll data is adaptively decomposed by the CEEMDAN algorithm into Intrinsic Mode Functions (IMFs) at multiple scales. Secondly, the STSGCN model is used to predict each IMF, fully exploring the local spatial-temporal correlation of the traffic flow data. Finally, the FCM method is used to predict the fluctuating part of holiday traffic flow, which addresses the shortage of holiday traffic flow data. Experiments on holiday expressway ETC data and toll data from Fuzhou to Xiamen in Fujian Province verify that the model has good prediction performance. The main contributions of this paper are as follows: (1) A new method is proposed to predict traffic flow on key sections of expressways during holidays by using ETC gantry data and toll data; (2) The holiday traffic flow prediction model is composed of CEEMDAN, STSGCN and FCM. The model can extract the uncertainty and nonlinear characteristics of traffic flow well. A localized spatial-temporal graph is established, which considers only the spatial-temporal correlation of the ETC gantries and toll stations associated with the target section. FCM is used to predict the fluctuating part of holiday traffic flow; (3) Extensive experiments on real holiday ETC gantry datasets and toll datasets verify that the proposed prediction method outperforms all baseline methods.

The rest of the paper is organized as follows: Section II reviews the literature on expressway traffic flow forecasting. Section III presents the problem definition of holiday traffic flow prediction. Section IV introduces the proposed models and methods. Section V conducts a detailed experiment on the developed model, and presents the experimental results as well as the prediction performance of the proposed model. Finally, our conclusions are presented.

Related work

At present, expressway traffic flow prediction methods fall mainly into two categories: regression models based on mathematical statistics, and machine learning models. In traffic flow forecasting, the most commonly used statistical model is the Autoregressive Integrated Moving Average model (ARIMA) and its variants [16-18]. Yao et al. [19] proposed a hybrid model of ARIMA and Generalized Autoregressive Conditional Heteroscedasticity (GARCH) for short-term traffic flow prediction. Lu [20] proposed a combined short-term traffic flow prediction method based on the ARIMA model and an LSTM neural network, which uses ARIMA and LSTM to obtain the linear and nonlinear features of traffic data, respectively, and combines the two predictions through dynamic weighting over sliding windows. Zhang et al. [21] proposed a wavelet transform based on seasonal ARIMA and combined it with external factors to predict traffic flow. However, the core problem of traffic flow prediction is how to capture temporal and spatial correlations under nonlinear relationships. Traffic flow data are usually nonlinear, whereas ARIMA by nature captures only linear relationships and cannot reflect nonlinear relationships and temporal correlation.

In capturing nonlinearities and temporal correlations, machine learning models have shown excellent performance. For example, Support Vector Machine (SVM) [22] and random forest [7] adequately capture nonlinear features and temporal characteristics. Lin et al. [23] proposed a new spatial time delays traffic sequences screening algorithm based on maximum information coefficients. On this basis, the SVR method and k-nearest neighbor algorithm are combined to transform the selected time delays traffic sequence into a traffic state vector to predict traffic flow. Based on the SVR algorithm, Zou et al. [24] established a prediction model by sliding window, and optimized the SVR model by Particle Swarm Optimization (PSO). However, most of these models only consider the temporal correlation and do not adequately consider the spatial correlation of traffic flow.

In recent years, as an important branch of machine learning, deep learning models have been widely applied in the field of traffic flow prediction. Yang et al. [25] proposed a multi-feature fusion framework that combines a Convolutional Neural Network (CNN) and external factors (weather and holidays) for traffic flow prediction. Qiao et al. [26] proposed a method that uses a one-dimensional convolutional neural network (1DCNN) to extract spatial information and an LSTM to obtain temporal information from traffic data. To make efficient use of spatial-temporal dependency, the Graph Convolutional Network (GCN) was proposed to make up for the deficiencies of the CNN model. Zhao et al. [27] proposed a temporal graph convolutional network that combines GCN and the Gated Recurrent Unit (GRU) to obtain the spatial-temporal characteristics of traffic flow. Yu et al. [28] proposed a Spatial-temporal Graph Convolutional Network (STGCN) with multiple graph convolutional layers to capture spatial features and one-dimensional convolution to fit temporal features. Guo et al. [29] added a spatial-temporal attention mechanism on this basis, which can adaptively capture dynamic spatial-temporal information in traffic data. These traffic flow prediction models perform well on both weekdays and weekends. However, traffic flow during holidays is highly irregular and volatile, so these models cannot adequately capture holiday characteristics, which affects the validity of the prediction. Therefore, further improving the accuracy of expressway traffic flow prediction during holidays remains a meaningful research question.

To accurately predict holiday expressway traffic flow, many scholars have conducted research. Xie et al. [30] used improved back propagation neural network (IGA-BPNN) optimized by GA to develop non-holiday and holiday traffic flow prediction models, and achieved accurate prediction. Lu et al. [31] analyzed the holiday traffic flow characteristics, divided the holiday traffic flow into regular and fluctuating parts, and predicted the regular flow by LSTM and the fluctuating flow by FCM. Luo et al. [32] proposed a hybrid prediction method combining the Discrete Fourier Transform (DFT) and SVR. Zhang et al. [33] proposed the use of CNN, GRU, and Convolutional LSTM (ConvLSTM) networks to analyze the temporal and spatial characteristics of holiday traffic flow.

The above research mainly focuses on the regularity of holiday traffic flow. However, because the sample of historical holiday traffic flow is limited, the specific patterns cannot be modeled well, and the flow data for the same period differ noticeably between years, especially years that are far apart, so prediction accuracy cannot be ensured. Prediction accuracy can be improved by using ETC gantry data and toll station data while capturing their temporal and spatial correlation. In this paper, we propose a method that uses ETC gantry data and toll data to predict expressway traffic flow during holidays.

Problem definition

In this section, the relevant definitions are given. We first define some critical parameters, and then formulate the problem of traffic flow prediction on key expressway sections during holidays. The purpose of this research is to use historical ETC gantry data and toll data to predict the traffic flow in the target section at the next time interval. The traffic flow extracted from the historical ETC gantry data and toll data is counted as a time series with a time interval of 15 minutes.
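The 15-minute counting described above could be sketched as follows. This is a minimal illustration, not the authors' pipeline: the record layout `(node_id, passage_time)` and the node names are assumptions made for the example.

```python
from collections import Counter
from datetime import datetime

INTERVAL_MIN = 15  # interval length stated in the text


def bin_start(ts: datetime) -> datetime:
    """Round a timestamp down to the start of its 15-minute interval."""
    return ts.replace(minute=ts.minute - ts.minute % INTERVAL_MIN,
                      second=0, microsecond=0)


def count_flow(records):
    """Count vehicle passages per (node, interval start).

    `records` is an iterable of (node_id, passage_time) pairs; this record
    layout is hypothetical and used only for illustration.
    """
    return Counter((node, bin_start(ts)) for node, ts in records)


records = [
    ("G1", datetime(2021, 5, 1, 8, 3)),
    ("G1", datetime(2021, 5, 1, 8, 14)),
    ("G1", datetime(2021, 5, 1, 8, 15)),
    ("G2", datetime(2021, 5, 1, 8, 20)),
]
flow = count_flow(records)
```

Sorting the counts of one node by interval start then gives that node's traffic flow time series.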

Definition 1 (Expressway QD): Each ETC gantry and toll station on the expressway is generically referred to as a Node, and two adjacent nodes on the road constitute an expressway section QD, QD = (Q, Distance), Q = (Node1, Time1, Node2, Time2), where Node1 is the start of the section, Node2 is the end of the section, and Distance is the actual distance of the section.

Definition 2 (Trajectory of a vehicle Traj): The sequence of nodes formed by a vehicle entering the toll station, passing through the ETC gantry, and finally exiting from the toll station is called Traj = (Node1, …, NodeN), where Node1 is called the starting point of the trajectory, NodeN is called the end point of the trajectory.

Definition 3 (The expressway network G): In order to express the spatial topology of each ETC gantry and toll station in the whole expressway network, the graph G = (V, E, A) is defined, where V = (Node1, …, NodeN) denotes the set of all nodes on the expressway, N denotes the number of nodes, E denotes the set of edges connected between nodes. A is the adjacency matrix of road network G.
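As a rough illustration of Definition 3, the adjacency matrix A can be filled from a list of sections. The node names below are hypothetical, and treating every section as an undirected edge is an assumption of this sketch; a direction-aware network would set only one of the two entries.

```python
def build_adjacency(nodes, sections):
    """Return the N x N 0/1 adjacency matrix A of the network G.

    `sections` lists (start_node, end_node) pairs of adjacent nodes.
    Treating the graph as undirected is an assumption of this sketch.
    """
    index = {name: i for i, name in enumerate(nodes)}
    n = len(nodes)
    adj = [[0] * n for _ in range(n)]
    for start, end in sections:
        i, j = index[start], index[end]
        adj[i][j] = 1
        adj[j][i] = 1
    return adj


# Hypothetical node names: two toll stations with two gantries in between.
nodes = ["Toll_A", "Gantry_1", "Gantry_2", "Toll_B"]
sections = [("Toll_A", "Gantry_1"),
            ("Gantry_1", "Gantry_2"),
            ("Gantry_2", "Toll_B")]
A = build_adjacency(nodes, sections)
```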

Definition 4 (Traffic flow matrix $X_G^{t} \in \mathbb{R}^{N \times P}$): P is the number of node attribute features (the length of the traffic flow history time series), and t denotes the time step. This matrix represents the observed values of the expressway road network G at time t.

Therefore, the traffic flow prediction problem for the key sections of the expressway during holidays can be formulated as follows. Learn a mapping function F that maps the historical traffic flow spatial-temporal data $(X_G^{t-T+1}, \ldots, X_G^{t-1}, X_G^{t})$ to future observations in the expressway road network G, where $X_G^{t}$ is the traffic flow matrix at time t and T denotes the length of the historical traffic flow time series. The problem of predicting the traffic flow for the next time interval t+1 in the target section can then be defined as:

$$X_G^{t+1} = F\left(G; \left(X_G^{t-T+1}, \ldots, X_G^{t-1}, X_G^{t}\right)\right) \tag{1}$$

Methodology

In this section, we introduce in detail our proposed method for forecasting traffic flow on key sections of expressways based on holidays, which combines holiday traffic flow forecasting and holiday traffic flow fluctuation forecasting. First, the basic framework of traffic flow forecasting on the holiday key sections is introduced. Then, the four components of data preprocessing, CEEMDAN, STSGCN, and FCM will be introduced separately.

Overview of the overall framework

Expressway traffic flow during holidays is more random and volatile, which makes prediction more difficult. In this paper, we propose a method for predicting holiday expressway traffic flow based on ETC gantry data and toll data, as shown in Fig 2. First, the data are preprocessed to ensure their accuracy. Vehicle trajectories are then constructed in time order from the gantry topology data, and trajectories with missed records are repaired by the vehicle trajectory repair algorithm to ensure the completeness of the data. Because the nonlinearity and uncertainty of holiday traffic flow are prominent, the CEEMDAN algorithm is applied, adding adaptive positive and negative Gaussian white noise to the holiday traffic flow time series so that it is decomposed adaptively. At the same time, because of the limited spatial distance between ETC gantries and the limited time range of the traffic flow sequence, we construct a localized spatial-temporal graph to represent the spatial-temporal correlations between the target gantry and its related gantries and toll stations at adjacent 15-minute intervals, and use the STSGCN [34] model to extract the spatial-temporal features of this graph. Each decomposed IMF is predicted by the STSGCN model, and the predicted values of the IMF components are superimposed to give the prediction result of the STSGCN model. During holidays, the weekday traffic flow pattern is broken and the traffic flow in certain periods shows a sustained peak, so we use FCM to predict the fluctuating flow. Finally, the predicted holiday traffic flow is obtained.

Data preprocessing

During the acquisition of ETC gantry data and toll data, equipment faults, wireless crosstalk, bad weather, and other uncontrollable random factors cause three main kinds of abnormal data [35]. (1) Data duplication: network or system faults or software vulnerabilities cause data to be uploaded repeatedly, resulting in duplicate transaction records. (2) Data missing: data are not collected, usually because of traffic congestion, the ETC gantry failing to identify the vehicle, or a device fault. (3) Data error: records are inconsistent with normal traffic rules, for example the entrance toll station time is later than the ETC gantry transaction time, or the vehicle is detected by a gantry in the opposite direction. These abnormal data seriously reduce the value of the data for mining. To reduce the impact of errors on the accuracy of the prediction model, this type of data is removed first.
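A minimal sketch of the removal step for cases (1) and (3) is given below. The field names `vehicle`, `node`, `time`, and `entry_time` are hypothetical, and only the entrance-time rule from case (3) is checked; missing records (case 2) cannot be filtered out and are left to trajectory repair.

```python
from datetime import datetime


def clean_records(records):
    """Drop duplicate and inconsistent transaction records.

    Each record is a dict with the hypothetical keys `vehicle`, `node`,
    `time` (gantry transaction time) and `entry_time` (entrance toll
    station time).
    """
    seen = set()
    cleaned = []
    for rec in records:
        key = (rec["vehicle"], rec["node"], rec["time"])
        if key in seen:                      # (1) repeated upload
            continue
        if rec["entry_time"] > rec["time"]:  # (3) entrance later than gantry
            continue
        seen.add(key)
        cleaned.append(rec)
    return cleaned


entry = datetime(2021, 5, 1, 8, 0)
raw = [
    {"vehicle": "V1", "node": "G1", "time": datetime(2021, 5, 1, 8, 10), "entry_time": entry},
    {"vehicle": "V1", "node": "G1", "time": datetime(2021, 5, 1, 8, 10), "entry_time": entry},
    {"vehicle": "V2", "node": "G1", "time": datetime(2021, 5, 1, 7, 50), "entry_time": entry},
]
cleaned = clean_records(raw)
```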

Vehicle trajectory repair algorithm: after the abnormal records are removed from the ETC gantry transaction data, the vehicle trajectory data are missing records to varying degrees. As shown in Fig 3, the black parts indicate ETC gantry records that were captured normally and the red parts indicate missing ETC gantry records. If the vehicle trajectories are not repaired, some vehicles will not be counted when the traffic flow of each section is tallied, which affects the final prediction results. Therefore, vehicle trajectory repair is necessary to improve the reliability of prediction.

The driving trajectory of each vehicle is constructed in time order. Using the expressway road network G, each pair of consecutive ETC gantries in a trajectory is checked to see whether the two gantries are adjacent in the topology of G. If they are not, a path search between the two gantries is performed on the road network and the missing nodes are interpolated into the trajectory. The average speed of the vehicle between the two gantries is calculated and taken as its average speed on every section between them. From the distance of each section, the travel time of each section can then be calculated, so that the vehicle trajectory is repaired.
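The time interpolation in this repair step can be sketched as below. The sketch assumes the path between the two recorded gantries has already been found in G and is given as a list of section lengths; the function name and the use of seconds and kilometres are illustrative choices.

```python
def repair_trajectory(t_start, t_end, section_km):
    """Estimate passage times at the nodes missed between two gantries.

    t_start, t_end: passage times (seconds) at the two recorded gantries.
    section_km: lengths of the consecutive sections linking them, in order.
    Returns the estimated passage time at the end of each section; the last
    value coincides with the recorded time at the second gantry.
    """
    total_km = sum(section_km)
    avg_speed = total_km / (t_end - t_start)  # km per second, assumed constant
    times, elapsed = [], t_start
    for km in section_km:
        elapsed += km / avg_speed
        times.append(elapsed)
    return times


# Two missed gantries between the recorded ones: sections of 2, 3 and 5 km
# covered in 600 s overall.
times = repair_trajectory(0.0, 600.0, [2.0, 3.0, 5.0])
```

With these numbers the two missed gantries are assigned passage times of about 120 s and 300 s, in proportion to the distance covered.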

CEEMDAN

Because the ETC gantry system is susceptible to external factors, the gathered data contain considerable noise. Empirical Mode Decomposition (EMD) is a classical adaptive method, based on noise frequency processing, for nonlinear and non-stationary signals. It does not need any basis function to be set in advance and relies only on the time scale characteristics of the signal to decompose it into IMFs with different amplitudes and a residue (res) [36]. However, the IMF components obtained by EMD suffer from mode mixing, which causes an inaccurate time-frequency distribution and makes some IMFs lose their physical meaning. Ensemble EMD (EEMD) [37] adds Gaussian white noise to the original signal to solve the mode mixing problem, but the added noise cannot be fully eliminated after decomposition and the completeness of the decomposition is poor, resulting in large reconstruction errors. To overcome these problems, Torres et al. [38] proposed CEEMDAN, which improves on EEMD by adding adaptive Gaussian white noise at each stage of the decomposition. This improves the completeness of the decomposition, reduces the reconstruction error, and lowers the computational cost.

The traffic flow time series is decomposed by CEEMDAN as follows. Let $y(t)$ denote the original traffic flow time series, $IMF_k(t)$ the kth-order IMF obtained by the decomposition, $E_k(\cdot)$ the operator that returns the kth-order mode produced by EMD, $v^j(t)$ the Gaussian white noise following the standard normal distribution that is added in the jth realization (j = 1, 2, …, N), and $\varepsilon_k$ the coefficient that sets the signal-to-noise ratio at the kth stage of the decomposition of $y(t)$.

(1) Add the Gaussian white noise $v^j(t)$ to the original traffic flow time series $y(t)$ for the jth realization. The noisy traffic flow time series can be expressed as:

$$y^j(t) = y(t) + \varepsilon_0 v^j(t) \tag{2}$$

(2) Decompose each $y^j(t)$ by EMD to obtain its 1st-order modal component $IMF_1^j(t) = E_1(y^j(t))$. The mean value is $IMF_1(t)$ and the 1st-order residual component is $res_1(t)$.

$$IMF_1(t) = \frac{1}{N}\sum_{j=1}^{N} IMF_1^j(t) \tag{3}$$

$$res_1(t) = y(t) - IMF_1(t) \tag{4}$$

(3) Obtain the 2nd-order modal component $IMF_2(t)$ and the 2nd-order residual component $res_2(t)$.

$$IMF_2(t) = \frac{1}{N}\sum_{j=1}^{N} E_1\left(res_1(t) + \varepsilon_1 E_1\left(v^j(t)\right)\right) \tag{5}$$

$$res_2(t) = res_1(t) - IMF_2(t) \tag{6}$$

(4) Obtain the kth-order residual component $res_k(t)$ and the (k+1)th-order modal component $IMF_{k+1}(t)$.

$$res_k(t) = res_{k-1}(t) - IMF_k(t) \tag{7}$$

$$IMF_{k+1}(t) = \frac{1}{N}\sum_{j=1}^{N} E_1\left(res_k(t) + \varepsilon_k E_k\left(v^j(t)\right)\right) \tag{8}$$

(5) Repeat the above steps until the residual component cannot be decomposed further (it is a monotonic function or has no more than two extreme points). With K the number of IMFs obtained, the final residual component can be expressed as:

$$res(t) = y(t) - \sum_{k=1}^{K} IMF_k(t) \tag{9}$$

The IMFs together constitute the characteristics of the original signal on different time scales. The residual components clearly show the trend of the original traffic flow time series and effectively reduce the prediction error.

STSGCN

The STSGCN model works end to end, taking historical traffic flow spatial-temporal sequence data as input and producing future traffic flow spatial-temporal sequence data as output. First, the spatial-temporal sequence of historical traffic flow is input to a fully connected layer, which maps the low-dimensional traffic flow features to a high-dimensional space. Then, the Spatial-Temporal Synchronous Graph Convolutional Layer (STSGCL) automatically divides the input into localized spatial-temporal graphs and the corresponding traffic flow matrices, obtains the spatial-temporal correlation of each localized spatial-temporal graph, and aggregates and crops the output of the convolution module for each graph. The STSGCN framework is shown in Fig 4.

Input layer.

The input traffic flow spatial-temporal sequence data $X_G \in \mathbb{R}^{N \times T}$ is converted to $X_G \in \mathbb{R}^{N \times P \times T}$ by a fully connected layer. To directly capture the influence of each node on its neighbors, each node is connected to itself at the previous and next time steps, giving a graph that spans three consecutive time steps, which is called the localized spatial-temporal graph [39]. The adjacency matrix of the spatial graph is $A \in \mathbb{R}^{N \times N}$, and the adjacency matrix of the localized spatial-temporal graph constructed from 3 consecutive spatial graphs is $A' \in \mathbb{R}^{3N \times 3N}$. For ETC gantry i in the spatial graph, its new index in the localized spatial-temporal graph is $(t-1)N + i$, where t ($0 < t \le 3$) is the time step within the graph. The spatial connectivity of the nodes is the same at each time step t. If two nodes are connected in this localized spatial-temporal graph, the corresponding value in the adjacency matrix is set to 1. Fig 5(b) shows the localized spatial-temporal graph extended from the spatial graph in Fig 5(a).
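The construction of the localized spatial-temporal adjacency matrix can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: the function name is hypothetical, node indices are 0-based, and self-loops on the diagonal are left out.

```python
def localized_st_adjacency(adj, steps=3):
    """Build the (steps*N) x (steps*N) localized spatial-temporal adjacency.

    Node i at time step t (1-based) gets index (t - 1) * N + i. Spatial
    edges are copied inside every step, and each node is linked to itself
    in the adjacent steps. Leaving the diagonal at 0 is a choice of this
    sketch.
    """
    n = len(adj)
    size = steps * n
    big = [[0] * size for _ in range(size)]
    for t in range(steps):              # t is 0-based here
        for i in range(n):
            for j in range(n):
                big[t * n + i][t * n + j] = adj[i][j]
            if t + 1 < steps:
                # same node, consecutive time steps
                big[t * n + i][(t + 1) * n + i] = 1
                big[(t + 1) * n + i][t * n + i] = 1
    return big


A = [[0, 1],
     [1, 0]]          # two connected gantries
A_local = localized_st_adjacency(A)
```

For two connected gantries this yields a 6 x 6 matrix in which a gantry at the first step is linked to itself at the second step but not at the third.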

Fig 5. Spatial-temporal graph representation.

(a) Spatial graph; (b) Localized spatial-temporal graph.

https://doi.org/10.1371/journal.pone.0283898.g005

Placing nodes from different time steps in the same localized spatial-temporal graph blurs the spatial-temporal properties of each node. For the traffic flow spatial-temporal sequence data $X_G \in \mathbb{R}^{N \times P \times T}$ produced by the fully connected layer, a learnable temporal embedding matrix $T_e \in \mathbb{R}^{P \times T}$ and a learnable spatial embedding matrix $S_e \in \mathbb{R}^{N \times P}$ are established for the extraction of spatial-temporal features. To enhance the modeling capability of spatial-temporal correlation, these two embedding matrices are added to the spatial-temporal sequence with a broadcast operation [40]. Thus, the new representation of the traffic flow spatial-temporal sequence data is:

$$X_{G+T_e+S_e} = X_G + T_e + S_e \in \mathbb{R}^{N \times P \times T} \tag{10}$$
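The broadcast addition of the two embeddings can be illustrated with nested lists. This sketch assumes the data has shape N x P x T, the temporal embedding P x T, and the spatial embedding N x P; in the model the embeddings are learned, whereas here they are fixed numbers chosen for readability.

```python
def add_embeddings(x, t_emb, s_emb):
    """Return X + T_e + S_e with broadcasting.

    x:     N x P x T nested list (nodes, features, time steps)
    t_emb: P x T temporal embedding, shared by all nodes
    s_emb: N x P spatial embedding, shared by all time steps
    """
    return [[[x[n][p][t] + t_emb[p][t] + s_emb[n][p]
              for t in range(len(x[n][p]))]
             for p in range(len(x[n]))]
            for n in range(len(x))]


x = [[[1, 2, 3]], [[4, 5, 6]]]   # N = 2, P = 1, T = 3
t_emb = [[10, 20, 30]]           # P x T
s_emb = [[100], [200]]           # N x P
embedded = add_embeddings(x, t_emb, s_emb)
```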

Spatial-temporal synchronous graph convolutional module (STSGCM).

A spatial-temporal synchronous graph convolutional module is constructed to obtain local spatial-temporal features from the embedded traffic flow spatial-temporal sequence data $X_{G+T_e+S_e}$. It contains two graph convolutional layers and one aggregation layer, as in Fig 6(a). The adjacency matrix determines the size of the aggregated weights. If the adjacency matrix of the localized spatial-temporal graph contains only 0 and 1, then when the graph convolution operation is applied, the features of topologically connected ETC gantries are aggregated even at time steps where they are not correlated. To enhance the performance of spatial-temporal feature extraction, a learnable mask matrix $W_{mask} \in \mathbb{R}^{3N \times 3N}$ is introduced. The mask matrix $W_{mask}$ and the adjacency matrix $A'$ are multiplied element-wise to generate the weighted localized spatial-temporal graph adjacency matrix $A'_{adjusted}$. The calculation formula is as follows.

$$A'_{adjusted} = W_{mask} \otimes A' \in \mathbb{R}^{3N \times 3N} \tag{11}$$

Fig 6. STSGCM architecture.

(a) STSGCM; (b) Aggregation operation; (c) Cropping operation.

https://doi.org/10.1371/journal.pone.0283898.g006

Currently, there are two main types of graph convolutional neural networks, one based on the spatial domain [41] and the other based on the frequency domain [42]. In this paper, we perform graph convolution in the spatial domain and use GLU as the activation function. The purpose is to aggregate, through the defined aggregation function, the information of the target gantry and its neighboring nodes in order to update the traffic flow features of the target gantry. This graph convolution operation can be expressed by Eq (12).

$$h^{(l)} = \sigma\left(A'_{adjusted}\, h^{(l-1)} W + b\right) \tag{12}$$

where $h^{(l)}$ represents the output of the lth GCN layer, $\sigma$ represents the GLU activation function, $A'_{adjusted}$ represents the weighted localized spatial-temporal graph adjacency matrix, $h^{(l-1)}$ represents the input of the lth GCN layer (the output of the (l-1)th GCN layer), and W and b are parameters.
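One masked graph convolution with a GLU activation might look like the sketch below. It assumes the usual GLU definition, in which the linear output is split into two halves and the first half is gated by the sigmoid of the second; how the paper's W and b are split is not stated, so that layout is an assumption, as are the helper names and the toy sizes.

```python
import math


def matmul(left, right):
    """Plain matrix product of two nested lists."""
    return [[sum(left[i][k] * right[k][j] for k in range(len(right)))
             for j in range(len(right[0]))] for i in range(len(left))]


def hadamard(left, right):
    """Element-wise product of two equally sized matrices."""
    return [[x * y for x, y in zip(ra, rb)] for ra, rb in zip(left, right)]


def glu_graph_conv(adj, mask, h, w, bias):
    """One spatial-domain graph convolution followed by GLU.

    adj, mask: square adjacency and mask matrices of the same size
    h:         input features, one row per node
    w, bias:   weights and bias whose first half of columns feeds the
               linear part and whose second half feeds the gate (assumed)
    """
    a_adj = hadamard(mask, adj)              # weighted adjacency
    z = matmul(matmul(a_adj, h), w)
    half = len(bias) // 2
    out = []
    for row in z:
        row = [v + b for v, b in zip(row, bias)]
        # linear part times sigmoid(gate part)
        out.append([row[c] / (1.0 + math.exp(-row[half + c]))
                    for c in range(half)])
    return out


adj = [[0, 1], [1, 0]]
mask = [[1.0, 0.5], [0.5, 1.0]]
h = [[1.0], [2.0]]
w = [[1.0, 0.0]]      # one input feature, one output feature plus its gate
bias = [0.0, 0.0]
out = glu_graph_conv(adj, mask, h, w, bias)
```

In this toy case the gate input is zero, so each aggregated value is simply halved by the sigmoid.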

To capture the spatial-temporal correlation of ETC gantries over a wider range, multiple stacked spatial-temporal synchronous graph convolution operations are designed to expand the region of their feature aggregation. For the multi-layer STSGCM, the output of each graph convolution operation is input into the aggregation layer for aggregation, as in Fig 6(b); on the basis of it, the contents of the previous and the next time steps are deleted, and the aggregated features of the intermediate time steps are retained, as in Fig 6(c).

There are two operational steps in the aggregation layer: the aggregation operation and the cropping operation. In the aggregation operation, element-wise max-pooling is applied over all the outputs of the graph convolution layers to obtain the aggregated representation of the spatial-temporal information, $h_{Agg} = \max(h^{(1)}, h^{(2)}, \ldots, h^{(l)})$. Then, in the cropping operation, only the aggregated features of the intermediate time step are kept. This reduces the redundancy in the model and improves its generalization ability.
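The two steps of the aggregation layer can be sketched as follows, assuming each layer output is a 3N x C matrix whose rows are ordered by time step (previous, intermediate, next). The function name and the toy values are illustrative.

```python
def aggregate_and_crop(layer_outputs, n):
    """Element-wise max over the layer outputs, then keep the middle step.

    layer_outputs: list of 3N x C matrices, one per graph convolution layer
    n:             number of nodes N in one time step
    """
    rows, cols = len(layer_outputs[0]), len(layer_outputs[0][0])
    agg = [[max(h[r][c] for h in layer_outputs) for c in range(cols)]
           for r in range(rows)]
    return agg[n:2 * n]   # rows of the intermediate time step only


h1 = [[1, 5], [2, 0], [9, 9]]   # N = 1, C = 2, three time steps
h2 = [[3, 1], [1, 4], [0, 0]]
h_mid = aggregate_and_crop([h1, h2], n=1)
```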

STSGCL.

The aim is to predict the traffic flow of the target section in the next 15 minutes. The localized spatial-temporal graph captures the local spatial-temporal characteristics of the traffic flow in the 15 minutes before and after each time step. In practice, however, the input spatial-temporal sequence covers a longer time range, and the traffic flow has different spatial-temporal dependencies at different times and locations. Therefore, STSGCL is constructed using sliding windows and multi-component modeling to handle the spatial-temporal heterogeneity of the spatial-temporal network sequence and to extract the long-range spatial-temporal features of the traffic flow data.

For traffic flow spatial-temporal sequence data of time length T, T-2 time segments can be cut out by a sliding window. An independent localized spatial-temporal graph and the corresponding traffic flow matrix are then constructed on each of the T-2 segments and input to T-2 STSGCMs to capture the local spatial-temporal correlation. The outputs of the T-2 STSGCMs are combined to form a new spatial-temporal sequence of traffic flows, and the output M of the STSGCL is expressed as:

$$M = [M_1, M_2, \ldots, M_{T-2}] \tag{13}$$

where $M_i$ represents the output of the ith STSGCM and T represents the number of time steps of the STSGCL input.
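The sliding-window split can be sketched in a few lines, assuming a window of three consecutive time steps moved one step at a time; the step labels are placeholders for the per-step traffic flow matrices.

```python
def sliding_windows(series, width=3):
    """Cut a length-T sequence into T - width + 1 overlapping windows."""
    return [series[i:i + width] for i in range(len(series) - width + 1)]


steps = ["t1", "t2", "t3", "t4", "t5"]   # T = 5 time steps
windows = sliding_windows(steps)
```

With T = 5 this gives T - 2 = 3 windows, one per STSGCM.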

Output layer.

An output layer is deployed after the STSGCL, which consists of two fully connected layers to map the final output to the target sequence. It is first transposed, then its dimensions are reorganized and put into two fully connected layers. The output is .

FCM

Fluctuation coefficients of traffic flow at different times of holidays are directly related to the fluctuation coefficients of historical traffic flows and also related to the prediction coefficients of traffic flow changes at that time of the month. Therefore, the traffic flow during the holiday period can be considered as two parts: the weekday traffic flow and the fluctuating traffic flow based on the weekday traffic flow. (14) Where is the traffic flow at the next moment of the holiday, is the traffic flow at the next moment of the weekday, and α indicates the rate of the average traffic flow between the current moment of the holiday and the corresponding moment of the month.
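A minimal sketch of how a fluctuation coefficient could be applied is given below. It assumes a multiplicative form in which the holiday flow is the weekday forecast scaled by α, with α taken as the ratio of the average holiday flow to the average flow of the month at the same time of day. This form is an assumption made for illustration, and Eq (14) may combine the terms differently. The function names are hypothetical.

```python
def fluctuation_coefficient(holiday_flows, monthly_flows):
    """Ratio of the mean holiday flow to the mean monthly flow at one time of day."""
    holiday_mean = sum(holiday_flows) / len(holiday_flows)
    monthly_mean = sum(monthly_flows) / len(monthly_flows)
    return holiday_mean / monthly_mean

def holiday_forecast(weekday_forecast, alpha):
    """ASSUMPTION: holiday flow = alpha * weekday flow (illustrative form only)."""
    return alpha * weekday_forecast

# Toy numbers: historical holiday flows vs. flows of the month at the same slot.
alpha = fluctuation_coefficient([300.0, 300.0], [200.0, 200.0])
print(alpha)                           # 1.5
print(holiday_forecast(250.0, alpha))  # 375.0
```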

Experiments

In this section, we use a real holiday dataset from the Fuzhou-Xiamen expressway to validate the feasibility of the expressway holiday traffic flow prediction model. We first introduce the dataset used in our study and then describe the setup and evaluation metrics of our model. We also select several conventional models as baselines to compare predictive performance. Finally, the experimental results are analyzed from several angles.

Data

This study is based on the Fuzhou-Xiamen expressway road network in Fujian Province in 2021. The data are of two types. The first is the ETC gantry data and toll data from April 30 to May 5, 2021, from Fujian Expressway Science and Technology Innovation Research Institute Co., Ltd.; the main attributes are shown in Tables 1 and 2. The second is the expressway road network topology data, which includes the connection relationships and actual distances between ETC gantries and between ETC gantries and toll stations.

Table 1. ETC gantry system partial transaction data attribute table.

https://doi.org/10.1371/journal.pone.0283898.t001

Table 2. Toll system partial transaction data attribute table.

https://doi.org/10.1371/journal.pone.0283898.t002

The ETC gantry data is matched with the toll data to obtain the trajectory of each vehicle. Because of limitations of the expressway ETC system, some transactions are missing or wrong. For example, when a vehicle passes through the gantry of the upstream section, its transaction may be missed because a larger vehicle in front blocks it, or a wrong transaction may be recorded because the vehicle is too close to the gantry of the downstream section. A vehicle trajectory repair algorithm is therefore applied to restore the integrity of the data set. The traffic flow of each ETC gantry and toll station is then counted at 15-minute intervals, as shown in Table 3. The z-score is used for global normalization, as shown in Eq (15): $Z = \frac{X - X_{mean}}{X_{std}}$ (15) where Z represents the normalized data, and X, $X_{mean}$, and $X_{std}$ represent the initial value, the mean, and the standard deviation of the historical time series traffic flow, respectively.
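The global z-score normalization can be sketched in a few lines of NumPy. The helper names are hypothetical, and the use of the population standard deviation (NumPy's default) is an assumption, since the text does not say which estimator is used.

```python
import numpy as np

def zscore_fit(history):
    """Compute the global mean and standard deviation of the historical flows."""
    return float(np.mean(history)), float(np.std(history))  # ASSUMPTION: ddof = 0

def zscore(x, mean, std):
    """Normalize flows: Z = (X - X_mean) / X_std."""
    return (np.asarray(x, dtype=float) - mean) / std

def inverse_zscore(z, mean, std):
    """Map normalized predictions back to vehicle counts."""
    return np.asarray(z, dtype=float) * std + mean

history = np.array([10.0, 20.0, 30.0, 40.0])  # toy 15-minute counts
mean, std = zscore_fit(history)
z = zscore(history, mean, std)
restored = inverse_zscore(z, mean, std)
```

The inverse transform is needed before computing errors in vehicle counts, because the model is trained on normalized values.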

Table 3. Flow statistics table for each gantry and toll station.

https://doi.org/10.1371/journal.pone.0283898.t003

Evaluation metrics and experimental setup

To evaluate the prediction performance of different models, the experiments use Mean Absolute Error (MAE), Mean Absolute Percentage Error (MAPE), and Root Mean Square Error (RMSE) as evaluation metrics. Huber loss is used as the loss function. Compared with common loss functions such as the squared error, Huber loss is less sensitive to outliers and improves the speed of training. $MAE = \frac{1}{N}\sum_{i=1}^{N}\left|y_i - \hat{y}_i\right|$ (16) $MAPE = \frac{1}{N}\sum_{i=1}^{N}\left|\frac{y_i - \hat{y}_i}{y_i}\right|$ (17) $RMSE = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2}$ (18) where $y_i$ is the actual traffic flow, $\hat{y}_i$ is the predicted traffic flow, and N is the number of ETC gantries and toll stations.
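The three metrics and the Huber loss can be written compactly in NumPy, using their standard definitions. This is an illustrative sketch: the threshold `delta=1.0` is a hypothetical default, as the text does not report the value used, and MAPE is expressed as a fraction and is undefined when an actual value is zero.

```python
import numpy as np

def mae(y, y_hat):
    return float(np.mean(np.abs(y - y_hat)))

def mape(y, y_hat):
    # Fractional form; requires all actual values to be non-zero.
    return float(np.mean(np.abs((y - y_hat) / y)))

def rmse(y, y_hat):
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))

def huber(y, y_hat, delta=1.0):
    """Quadratic for small errors, linear for large ones (delta is an assumption)."""
    err = np.abs(y - y_hat)
    quadratic = 0.5 * err ** 2
    linear = delta * (err - 0.5 * delta)
    return float(np.mean(np.where(err <= delta, quadratic, linear)))

y = np.array([100.0, 200.0, 400.0])      # toy actual flows
y_hat = np.array([110.0, 180.0, 400.0])  # toy predicted flows
print(mae(y, y_hat), mape(y, y_hat), rmse(y, y_hat), huber(y, y_hat))
```

Because the Huber loss grows only linearly beyond `delta`, a single extreme holiday peak contributes less to the gradient than it would under a squared-error loss.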

All data sets were split into a training set and a validation set at a ratio of 8:2. To reduce the effect of randomness, all experiments were repeated ten times and the results averaged. The STSGCN model was implemented in Python with the MXNet [43] library; the learning rate is set to 0.001, the batch size to 32, and the number of epochs to 3000. The STSGCN model consists of four STSGCLs, and each STSGCM contains three graph convolution operations with 64, 64, and 64 filters. The input time series length has a significant impact on traffic flow prediction. To analyze this effect, the historical input length is tested from 1 to 16. Fig 7 shows the trend of RMSE with increasing input series length, and the blue line shows the trend of MAE. The figure shows that a historical input sequence length of 12 performs best. Therefore, we use the past 12 consecutive time intervals to predict the traffic flow for the next time interval.
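The sample construction described above, 12 past intervals predicting the next one with an 8:2 split, might look like the following NumPy sketch. The chronological (unshuffled) split is an assumption, since the text does not state how the split was drawn, and the function names are hypothetical.

```python
import numpy as np

def make_samples(series, input_len=12, horizon=1):
    """series: (T, N) flows. Returns X of shape (S, input_len, N) and Y of shape (S, N)."""
    inputs, targets = [], []
    for start in range(len(series) - input_len - horizon + 1):
        inputs.append(series[start:start + input_len])
        targets.append(series[start + input_len + horizon - 1])
    return np.stack(inputs), np.stack(targets)

def chronological_split(X, Y, train_ratio=0.8):
    """ASSUMPTION: earlier samples train the model, later samples validate it."""
    cut = int(len(X) * train_ratio)
    return (X[:cut], Y[:cut]), (X[cut:], Y[cut:])

# Toy series: 20 intervals of 15 minutes at 2 sites.
series = np.arange(40, dtype=float).reshape(20, 2)
X, Y = make_samples(series)
(train_X, train_Y), (val_X, val_Y) = chronological_split(X, Y)
print(X.shape, Y.shape, len(train_X), len(val_X))  # (8, 12, 2) (8, 2) 6 2
```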

Experimental results and analysis

Before inputting the data into the model, we first decompose the original non-stationary traffic flow signal with CEEMDAN; in the decomposition process, we add 500 groups of white noise signals with a standard deviation of 0.2. The decomposition results are shown in Fig 8: the original traffic flow time series is decomposed into 7 IMF components with different randomness, and adjacent IMF components fluctuate to a similar degree.

Fig 8. Decomposition results of the traffic flow after CEEMDAN.

https://doi.org/10.1371/journal.pone.0283898.g008

To analyze the predictive performance of our proposed model, we use ETC gantry data and toll data to compare the following methods, which include regression models based on mathematical statistics, machine learning models and deep learning models.

Autoregressive Integrated Moving Average model (ARIMA): ARIMA is widely used in traffic flow forecasting as a time series forecasting method based on mathematical and statistical models.

Support Vector Regression (SVR): SVR model applies linear support vector machines for regression analysis.

Gated Recurrent Unit (GRU): A gated recurrent neural network used for time series prediction.

Graph Convolutional Network (GCN): Well known for its excellent performance in capturing spatial correlation and widely used in time series data processing.

STDN [44]: Proposes a flow-gated local CNN for spatial modeling of dynamic similarity between locations, and an LSTM for handling long-term periodic information and temporal variation in a layered manner.

STGCN: Consists of a graph convolution layer and a convolutional sequence learning layer to model spatial and temporal dependencies, respectively.

ASTGCN: Attention-based spatial-temporal graph convolutional networks are designed to capture complex dynamic spatial-temporal correlations with spatial attention mechanism and temporal attention mechanism, respectively.

STSGCN: Merges three time steps into one graph to capture spatial dependencies and temporal correlations simultaneously.

Prediction performance comparison of key road sections.

During the holidays, not all expressway sections have significant holiday characteristics. Traffic flows in sections adjacent to key cities may have obvious holiday characteristics, while sections that mainly carry daily commuter traffic or lie in suburban areas show no obvious holiday characteristics.

In this paper, according to the document “Information and Distribution Map of 32 Congestion-prone Sections on Fujian Expressway” published by Fujian Expressway Information Technology Co., Ltd, we selected congestion-prone road sections in different areas on the Fuzhou-Xiamen Expressway to show the prediction performance of our proposed method. The four sections are the Xinglin Interchange Section, the Xiang’an Interchange Section, the Chidian Interchange Section, and the Putian Interchange Section. Tables 4 and 5 show the prediction performance of our model compared with the baseline models.

Table 4. Comparison of prediction results between Xinglin Interchange Section and Xiang’an Interchange Section.

https://doi.org/10.1371/journal.pone.0283898.t004

Table 5. Comparison of prediction results between Chidian Interchange Section and Putian Interchange Section.

https://doi.org/10.1371/journal.pone.0283898.t005

Across these four road sections, ARIMA performs the worst regardless of the section, with RMSE, MAE, and MAPE of 64.893, 40.524, and 0.2381, respectively. The reason is that ARIMA has great limitations in modeling complex traffic flow data and can only capture a certain amount of temporal correlation. In addition, we compare our model with other machine learning and deep learning prediction models. SVR can only capture a limited range of nonlinear features, so its prediction performance is better only than that of ARIMA. GCN can capture more spatial correlation, and GRU can capture more temporal correlation, so their performance is better than that of SVR. T-GCN, STDN, STGCN, and ASTGCN consider spatial and temporal features simultaneously and thus achieve improved prediction accuracy. However, these models were designed for general weekday traffic flows and are therefore not well suited to predicting holiday traffic flows. For this reason, we propose a deep learning-based traffic flow prediction model for holidays.

To better capture the characteristics of holiday traffic flows, we consider the effects of both the quality and the quantity of traffic flow data on the accuracy of holiday traffic flow prediction. The CEEMDAN algorithm addresses the uncertainty of traffic flow, STSGCN captures local spatial-temporal correlation and topological information, and the FCM algorithm addresses the lack of traffic flow data during holidays. On this basis, we propose a holiday traffic flow prediction model. The proposed combined model, CEEMDAN-STSGCN-FCM, achieves the best prediction performance, with RMSE, MAE, and MAPE of 26.428, 15.235, and 0.1331, respectively.

Fig 9 shows the predicted results for the Xinglin Interchange Section, the Xiang’an Interchange Section, the Chidian Interchange Section, and the Putian Interchange Section, respectively. The traffic flow shows obvious holiday characteristics: the peaks are more pronounced than usual. Our proposed model captures the spatial-temporal characteristics of different types of traffic flows, and its prediction results closely match the actual values. Therefore, the CEEMDAN-STSGCN-FCM model is robust and can make accurate traffic flow predictions for different types of key sections.

Fig 9. Comparison of the prediction performance for different models.

(a) Xinglin Interchange Section; (b) Xiang’an Interchange Section; (c) Chidian Interchange Section; (d) Putian Interchange Section.

https://doi.org/10.1371/journal.pone.0283898.g009

Prediction performance for different time periods.

To explore in depth the prediction performance of CEEMDAN-STSGCN-FCM for different time periods, the average loss on the Xinglin Interchange Section was analyzed. Fig 10 compares the prediction performance of our proposed model and the baseline models for different time periods. When the traffic flow increases sharply, the error metrics increase as well. The metrics of the statistics-based model ARIMA show the most pronounced peaks, while those of our CEEMDAN-STSGCN-FCM model show the smallest peaks in all periods. The results show that CEEMDAN-STSGCN-FCM exhibits better stability and robustness in both peak and off-peak periods and achieves better forecasting results during holidays.

Fig 10. Performance comparison of models for different time periods on the Xinglin interchange section.

https://doi.org/10.1371/journal.pone.0283898.g010

Prediction performance of different components.

To further explore the impact of different components on prediction accuracy in our model architecture, we studied the construction of each component. First, we removed the CEEMDAN component or FCM component and kept the others unchanged in structure to show their effects on the prediction performance. Then, we compared various modules in the STSGCN model, for example, without setting the spatial-temporal embedding matrix and the mask matrix. Finally, we compared the RMSE, MAE, and MAPE of the prediction results.

  1. Without CEEMDAN: Remove CEEMDAN from CEEMDAN-STSGCN-FCM.
  2. Without FCM: Remove FCM from CEEMDAN-STSGCN-FCM.
  3. Without mask: The mask matrix does not exist in the STSGCN model.
  4. Without emb: The spatial-temporal embedding matrix is not added in STSGCL.

Table 6 and Fig 11 show the results of the component comparison experiments. According to the results, the full CEEMDAN-STSGCN-FCM outperforms the other variants, with the lowest RMSE of 26.428, the lowest MAE of 15.235, and the lowest MAPE of 0.133.

Table 6. Comparison of prediction performance of different components.

https://doi.org/10.1371/journal.pone.0283898.t006

CEEMDAN-STSGCN-FCM without the spatial-temporal embedding matrix performs the worst among the variants analyzed, indicating that the spatial-temporal embedding matrix in STSGCN helps capture the spatial-temporal characteristics of traffic flow and thus effectively improves the prediction performance. Adding a mask matrix to adjust the weights between nodes in the graph convolution significantly improves the prediction accuracy. The prediction performance decreases when the model does not have CEEMDAN, indicating that CEEMDAN improves prediction because it enhances the ability to capture nonlinear and non-stationary signal features. The prediction accuracy of CEEMDAN-STSGCN is low when the FCM algorithm is not used and improves significantly once it is added, indicating that the FCM algorithm better reflects the characteristics of holiday traffic flow and effectively overcomes the inadequate prediction accuracy caused by the limited sample size of holiday traffic flow. The above results show that our model architecture captures the spatial and temporal characteristics of traffic flow on key sections during holidays, thus ensuring a better prediction of holiday traffic flow.

Conclusion

Because expressway traffic flow during holidays is sudden and irregular, forecasting it is a difficult task for traffic management. We propose a CEEMDAN-STSGCN-FCM model based on ETC gantry data and toll data for short-term traffic flow prediction during holidays. The main conclusions are as follows.

  1. The proposed CEEMDAN-STSGCN-FCM has obvious advantages in capturing the spatial-temporal correlation of traffic flow, especially in the expressway road network on holidays.
  2. Our proposed method not only efficiently captures the spatial-temporal correlation of vehicles en route, but also considers the spatial-temporal correlation of vehicles that will enter from toll stations.
  3. The results of testing on real ETC gantry data and toll data show that CEEMDAN-STSGCN-FCM performs well under different studies and shows good robustness.

However, there are some limitations to our study. For example, the effects of other factors on traffic flow are not considered. In the next step, we will consider multiple data sources, such as COVID-19 data and weather data, to improve prediction accuracy based on real-time changes in data.

References

  1. Chen H, Zou F, Guo F, Gu Q, Yu X, Luo Y, et al. A ETC gantry information calibration method based on trajectory data of special transportation vehicles. In: International Conference on Frontiers of Electronics, Information and Computation Technologies; 2021. p. 1–7.
  2. Rahman SA, Mourad A, El Barachi M, Al Orabi W. A novel on-demand vehicular sensing framework for traffic condition monitoring. Vehicular Communications. 2018;12:165–178.
  3. Rahman SA, Mourad A, El Barachi M. An infrastructure-assisted crowdsensing approach for on-demand traffic condition estimation. IEEE Access. 2019;7:163323–163340.
  4. Lin X, Susilo YO, Shao C, Liu C. The implication of road toll discount for mode choice: Intercity travel during the chinese spring festival holiday. Sustainability. 2018;10(8):2700.
  5. Huang L. Construction scheme of intelligent comprehensive application platform for expressway toll collection big data; 2022; 8: 17. Available from: https://mp.weixin.qq.com/s/rgA5bitqJfBqDacL48KJlw.
  6. Guo F, Zou F, Luo S, Chen H, Yu X, Zhang C, et al. Positioning method of expressway ETC gantry by multi-source traffic data. IET Intelligent Transport Systems. 2022; https://doi.org/10.1049/itr2.12280
  7. Tian J, Zou F, Guo F, Gu Q, Ren Q, Xu G. Expressway Traffic Flow Forecasting based on SF-RF Model via ETC Data. In: International Conference on Frontiers of Electronics, Information and Computation Technologies; 2021. p. 1–7.
  8. Luo S, Zou F, Zhang C, Tian J, Guo F, Liao L. Multi-view travel time prediction based on electronic toll collection data. Entropy. 2022;24(8):1050. pmid:36010714
  9. Zou F, Ren Q, Tian J, Guo F, Huang S, Liao L, et al. Expressway Speed Prediction Based on Electronic Toll Collection Data. Electronics. 2022;11(10):1613.
  10. Cai Q, Yi D, Zou F, Zhou Z, Li N, Guo F. Recognition of Vehicles Entering Expressway Service Areas and Estimation of Dwell Time Using ETC Data. Entropy. 2022;24(9):1208. pmid:36141094
  11. Ma Y, Lu S, Zhang X, Wei W. Model of highway travel selection considering individual risk preference difference. Journal of Jilin University (Engineering and Technology Edition). 2021;51(5):1673–1683.
  12. Liu Q, Yang Z, Cai L. Predicting short-term traffic flow on expressway based on ETC gantry system data. Journal of Highway and Transportation Research and Development. 2022;39(4):123–130.
  13. Liu Q, Dai H. Wavelet filtering of the BP neural network of highway congestion forecast analysis during the holidays. Highway Engineering. 2016;41(6):98–102.
  14. Chen X, Wang K, Peng J, Li Y. Prediction expressway traffic flow on holidays with modified GA-BP model. Technology of Highway and Transport. 2018;34(6):114–117.
  15. Ji X, Ge Y. Holiday highway traffic flow prediction method based on deep learning. Journal of System Simulation. 2020;32(6):1164–1171.
  16. Lin X, Huang Y. Short-term high-speed traffic flow prediction based on ARIMA-GARCH-M model. Wireless Personal Communications. 2021;117(4):3421–3430.
  17. Shahriari S, Ghasri M, Sisson S, Rashidi T. Ensemble of ARIMA: combining parametric and bootstrapping technique for traffic flow prediction. Transportmetrica A: Transport Science. 2020;16(3):1552–1573.
  18. Hou Y, Deng Z, Cui H. Short-term traffic flow prediction with weather conditions: based on deep learning algorithms and data fusion. Complexity. 2021;2021.
  19. Yao R, Zhang W, Zhang L. Hybrid methods for short-term traffic flow prediction based on ARIMA-GARCH model and wavelet neural network. Journal of Transportation Engineering, Part A: Systems. 2020;146(8):04020086.
  20. Lu S, Zhang Q, Chen G, Seng D. A combined method for short-term traffic flow prediction based on recurrent neural network. Alexandria Engineering Journal. 2021;60(1):87–94.
  21. Zhang H, Wang X, Cao J, Tang M, Guo Y. A multivariate short-term traffic flow forecasting method based on wavelet analysis and seasonal time series. Applied Intelligence. 2018;48(10):3827–3838.
  22. Tang J, Chen X, Hu Z, Zong F, Han C, Li L. Traffic flow prediction based on combination of support vector machine and data denoising schemes. Physica A: Statistical Mechanics and its Applications. 2019;534:120642.
  23. Lin G, Lin A, Gu D. Using support vector regression and K-nearest neighbors for short-term traffic flow prediction based on maximal information coefficient. Information Sciences. 2022;608:517–531.
  24. Zou Z, Hao L, Li Q, Chen H, Knag L. Short-term traffic flow prediction of expressway based on particle swarm optimization-support vector regression. Science Technology and Engineering. 2021;21(12):5118–5123.
  25. Yang D, Li S, Peng Z, Wang P, Wang J, Yang H. MF-CNN: traffic flow prediction using convolutional neural network and multi-features fusion. IEICE TRANSACTIONS on Information and Systems. 2019;102(8):1526–1536.
  26. Qiao Y, Wang Y, Ma C, Yang J. Short-term traffic flow prediction based on 1DCNN-LSTM neural network structure. Modern Physics Letters B. 2021;35(02):2150042.
  27. Zhao L, Song Y, Zhang C, Liu Y, Wang P, Lin T, et al. T-gcn: A temporal graph convolutional network for traffic prediction. IEEE Transactions on Intelligent Transportation Systems. 2019;21(9):3848–3858.
  28. Yu B, Yin H, Zhu Z. Spatio-temporal graph convolutional networks: A deep learning framework for traffic forecasting. arXiv preprint arXiv:1709.04875. 2017.
  29. Guo S, Lin Y, Feng N, Song C, Wan H. Attention based spatial-temporal graph convolutional networks for traffic flow forecasting. In: Proceedings of the AAAI conference on artificial intelligence; 2019. p. 922–929.
  30. Xie B, Sun Y, Huang X, Yu L, Xu G. Travel characteristics analysis and passenger flow prediction of intercity shuttles in the pearl river delta on holidays. Sustainability. 2020;12(18):7249.
  31. Lu G, Li J, Chen J, Chen A, Gu J, Pang R. A long-term highway traffic flow prediction method for holiday. In: Advanced Multimedia and Ubiquitous Engineering. Springer; 2018. p. 153–159.
  32. Luo X, Li D, Zhang S. Traffic flow prediction during the holidays based on DFT and SVR. Journal of Sensors. 2019;2019.
  33. Zhang W, Yao R, Du X, Ye J. Hybrid Deep Spatio-Temporal Models for Traffic Flow Prediction on Holidays and Under Adverse Weather. IEEE Access. 2021;9:157165–157181.
  34. Song C, Lin Y, Guo S, Wan H. Spatial-temporal synchronous graph convolutional networks: A new framework for spatial-temporal network data forecasting. In: Proceedings of the AAAI Conference on Artificial Intelligence; 2020. p. 914–921.
  35. Guo F, Zou F, Luo S, Liao L, Wu J, Yu X, et al. The fast detection of abnormal ETC data based on an improved DTW algorithm. Electronics. 2022;11(13):1981.
  36. Huang NE, Shen Z, Long SR, Wu MC, Shih HH, Zheng Q, et al. The empirical mode decomposition and the Hilbert spectrum for nonlinear and non-stationary time series analysis. Proceedings of the Royal Society of London Series A: mathematical, physical and engineering sciences. 1998;454(1971):903–995.
  37. Wu Z, Huang NE. Ensemble empirical mode decomposition: a noise-assisted data analysis method. Advances in adaptive data analysis. 2009;1(01):1–41.
  38. Torres ME, Colominas MA, Schlotthauer G, Flandrin P. A complete ensemble empirical mode decomposition with adaptive noise. In: 2011 IEEE international conference on acoustics, speech and signal processing (ICASSP). IEEE; 2011. p. 4144–4147.
  39. Ge L, Li S, Wang Y, Chang F, Wu K. Global spatial-temporal graph convolutional network for urban traffic speed prediction. Applied Sciences. 2020;10(4):1509.
  40. Yu Q, Li Z. Correlated load forecasting in active distribution networks using Spatial-Temporal Synchronous Graph Convolutional Networks. IET Energy Systems Integration. 2021;3(3):355–366.
  41. Li Y, Yu R, Shahabi C, Liu Y. Diffusion convolutional recurrent neural network: Data-driven traffic forecasting. arXiv preprint arXiv:1707.01926. 2017.
  42. Diao Z, Wang X, Zhang D, Liu Y, Xie K, He S. Dynamic spatial-temporal graph convolutional neural networks for traffic forecasting. In: Proceedings of the AAAI conference on artificial intelligence; 2019. p. 890–897.
  43. Chen T, Li M, Li Y, Lin M, Wang N, Wang M, et al. Mxnet: A flexible and efficient machine learning library for heterogeneous distributed systems. arXiv preprint arXiv:1512.01274. 2015.
  44. Yao H, Tang X, Wei H, Zheng G, Li Z. Revisiting spatial-temporal similarity: A deep learning framework for traffic prediction. In: Proceedings of the AAAI conference on artificial intelligence; 2019. p. 5668–5675.