
Multi-scale time series prediction model based on deep learning and its application

  • Zhifei Yang ,

    Roles Conceptualization, Formal analysis, Funding acquisition, Project administration, Resources, Software, Supervision, Writing – review & editing

    916408624@qq.com

Affiliations School of Electronic and Information Engineering, Lanzhou Jiaotong University, Lanzhou, China, Gansu Urban Traffic Big Data Application Industry Technology Center, Lanzhou Jiaotong University, Lanzhou, China

  • Jia Zhang,

    Roles Conceptualization, Investigation, Validation, Visualization, Writing – original draft, Writing – review & editing

Affiliations School of Electronic and Information Engineering, Lanzhou Jiaotong University, Lanzhou, China, Gansu Urban Traffic Big Data Application Industry Technology Center, Lanzhou Jiaotong University, Lanzhou, China

  • Zeyang Li

    Roles Data curation, Project administration, Validation

Affiliations School of Electronic and Information Engineering, Lanzhou Jiaotong University, Lanzhou, China, Gansu Urban Traffic Big Data Application Industry Technology Center, Lanzhou Jiaotong University, Lanzhou, China

Abstract

Time series prediction is a widely used key technology, and traffic flow prediction is a typical application scenario. Traditional time series prediction models such as LSTM (Long Short-Term Memory) and CNN (Convolutional Neural Network)-based models have limitations in dealing with complex nonlinear time dependencies and struggle to capture the complex characteristics of traffic flow data. In addition, traditional methods usually rely on manually designed attention mechanisms and struggle to adaptively focus on key features. To improve the accuracy of time series prediction, this paper proposes a multiscale convolutional attention long short-term memory model (MSCALSTM), which combines a multiscale convolutional neural network (MSCNN), a multiscale convolutional block attention module (MSCBAM) and LSTM. MSCNN can effectively capture multiscale dynamic patterns in time series data, MSCBAM can adaptively focus on key features, and LSTM is good at modeling complex time dependencies. The MSCALSTM model makes full use of the advantages of these techniques and greatly improves the accuracy and robustness of time series prediction. Extensive experiments are conducted on datasets from the California Performance Measurement System (PeMS), and the results show that the proposed MSCALSTM model outperforms state-of-the-art models. Experiments in the energy domain show that our model also generalizes well to other time series forecasting domains.

Introduction

Accelerated urbanization has made traffic congestion a major challenge for large cities. While significant resources are devoted to mitigating congestion and its associated problems, most solutions are costly, difficult to implement, or both. In contrast, accurate traffic flow prediction offers a cost-effective and practical alternative. This approach utilizes historical traffic data to forecast future conditions, such as flow and speed. By analyzing and processing this data, predictions of future traffic states enable traffic management departments to implement flexible control strategies, thereby reducing large-scale congestion and improving travel comfort.

The development of traffic flow prediction methods has evolved from traditional statistical approaches to machine learning and eventually deep learning. Initially, statistical methods like Auto Regressive Integrated Moving Average (ARIMA) [1] and Kalman filtering [2] were employed, using historical data analysis for prediction; while simple and intuitive, their effectiveness is often limited with complex, non-stationary traffic patterns. Subsequently, machine learning methods such as Support Vector Machine (SVM) [3] and K-Nearest Neighbors (KNN) [4] emerged, capable of capturing nonlinear relationships and hidden patterns through model training. However, these traditional machine learning approaches frequently suffer from overfitting due to the inherent nonlinearity of traffic flow data, which negatively impacts their prediction performance. In recent years, deep learning-based traffic flow prediction methods have emerged as a research hotspot. Convolutional Neural Networks (CNN) [5,6] are widely used for capturing crowd flow characteristics via grid analysis, demonstrating superior feature extraction and information mining capabilities compared to other deep learning approaches. Recurrent Neural Networks (RNN) [7,8] inherently suffer from long-term dependency issues, which are effectively addressed by Long Short-Term Memory networks (LSTM) [9]. LSTM networks have become the most prevalent method for time series forecasting due to their ability to comprehensively model nonlinear characteristics. This is evidenced by successful LSTM applications in diverse domains, such as cryptocurrency price classification [10], short-term subway passenger flow prediction [11], and emotion EEG recognition with enhanced feature extraction [12].

However, standalone LSTM models exhibit limitations in achieving high prediction accuracy and fully capturing complex data features for time series tasks. This has prompted researchers to integrate LSTM with complementary methods, forming composite models to enhance performance [13–15]. A particularly successful approach combines the spatiotemporal feature extraction strengths of CNN and LSTM; for instance, composite CNN-LSTM architectures have demonstrated superior accuracy over single models in diverse applications like capturing nonlinear relationships [16], power consumption forecasting [15], water quality prediction [17], and gold price fluctuation analysis [18]. Concurrently, attention mechanisms have been leveraged to boost model precision by enabling focused processing of critical input subsets, implemented in structures ranging from encoder-decoder based networks for recursive models [19] to lightweight modules like the squeeze-excitation network (SENet) applied in convolutional networks such as ResNet [20,21]. The convolutional block attention module (CBAM) [22] improves on SENet by considering both global average and maximum pooling in its channel and spatial attention modules. Nauta et al. [23] studied attention-based dilated deep separable temporal convolutional networks (AD-DSTCNs) and demonstrated that the attention mechanism can be effectively applied to time series prediction with DDSTCN (Dynamic Deep Spatial-Temporal Convolutional Network). Spatial-Temporal Graph Convolutional Network (STGCN) [1] and Diffusion Convolutional Recurrent Neural Network (DCRNN) [24] use the distance between actual road network nodes to represent the spatial correlation between road segments. Ma et al. proposed the Spatial-Temporal Adaptive Graph Convolutional Network (STAGCN) [25] to automatically capture spatiotemporal relationships in traffic sequences and adaptively model the spatial topology graph of the road network. The Temporal-Fusion Graph Convolutional Attention Mechanism (TFM-GCAM) [26] is based on the traffic flow matrix and combines graph convolution with an attention mechanism; it captures the traffic characteristics of nodes more clearly while reducing computational cost.

Although the above models can capture spatial dependencies and, using graph convolution, model structured time series data better than basic single models, they still lack the ability to model multi-scale spatial dynamic patterns, and the attention mechanisms they use must be manually designed and lack adaptability. Transformer-based methods [27] use self-attention mechanisms to effectively model long-term spatiotemporal dependencies and achieve good results, but they lack the ability to model spatial features and multi-scale temporal dynamic patterns. Recent spatiotemporal modeling frameworks leveraging dynamic graph structures have shown significant advantages in extracting feature correlations. Notably, He et al. [28] developed the Dual-Correlation Dynamic Graph Convolutional Network (DC-DGCN), which employs multi-objective optimization to dynamically update graph structures and integrates bidirectional LSTM for real-time physical-digital space interaction, effectively capturing multiscale dynamic degradation patterns through coordinated learning. Complementary hybrid approaches also demonstrate strong performance: a novel architecture combining machine learning with wavelet transform [29] excels at capturing long-term dependencies across power datasets, while a hybrid model fusing wavelet-transformed data with an artificial neural network (integrating long/short-term networks) [30] achieved state-of-the-art results on greenhouse gas monitoring data from Russia's Bely Island. Multi-scale feature extraction methods have also advanced significantly in industrial applications; for instance, He et al.'s RTSMFFDE-HKRR method [31] achieves high-precision bearing fault diagnosis in noisy environments through multi-scale entropy and regression techniques, focusing on local feature decoupling and signal stability.
In contrast, our model employs parallel multi-scale convolutional kernels to coordinate the extraction of road-level micro-features and regional-level macro-features, dynamically weighting them via a spatiotemporal attention mechanism. This design avoids information loss inherent in traditional entropy-based multi-scale methods (e.g., RTSMFFDE’s coarse-graining) and leverages convolution’s translation invariance to directly model traffic flow spatial propagation.

Nevertheless, despite the high accuracy offered by existing deep learning methods, their substantial computational burden and training complexity remain challenges. To address this and enhance time series prediction accuracy and robustness, this paper proposes MSCALSTM—an attention-based composite model integrating Multi-Scale CNN (MSCNN) for extracting multi-scale dynamic patterns, Multi-Scale Convolutional Block Attention Module (MSCBAM) for adaptively focusing on key predictive features, and LSTM for modeling complex temporal dependencies. The key contributions are:

(1) We propose a time series prediction method that combines multi-scale CNN, multi-scale CBAM and LSTM to improve prediction accuracy; (2) We introduce a multi-branch convolutional layer into CBAM as an improvement: convolutions of different scales enable the module to better model the spatial relationships between features; (3) Experimental results show that, compared with existing methods, MSCALSTM outperforms the state of the art on four public datasets in the transportation field (PeMSD3, PeMSD4, PeMSD7, and PeMSD8) and on a dataset from the energy field. The rest of this paper is organized as follows: Section 2 reviews related work on traffic flow prediction; Section 3 systematically describes the overall structure and specific methods of the MSCALSTM model; Section 4 introduces four commonly used traffic flow datasets and one energy dataset, and gives the evaluation metrics; Section 5 discusses the experimental results; finally, Section 6 concludes this paper.

Related work

Convolutional neural network

CNN (Convolutional Neural Network) is widely used in image processing and computer vision, and has also been shown to be suitable for the study of traffic flow prediction problems [32]. The typical CNN architecture consists of convolutional layers, pooling layers, dropout layers, and fully connected layers [33]. CNN uses convolutional kernels to capture relevant features from the input data, which are then processed through pooling layers and fully connected layers to produce the final output. GCN (Graph Convolutional Network) is a typical example of a CNN model. Zhao et al. used GCN to capture the spatial dependencies of roadmaps and constructed a TGCN (Temporal Graph Convolutional Network) model [34]. Peng et al. [35] proposed a spatiotemporal correlation dynamic graph recurrent CNN to predict urban traffic passenger flow. However, CNN has limitations in capturing the temporal dependencies of time series data. Therefore, it is necessary to integrate recurrent neural network (RNN) technology and combine CNN with long short-term memory (LSTM) networks to improve the performance of prediction tasks [36]. The structural diagram of CNN is shown in Fig 1:

Long short-term memory network

LSTM (Long Short-Term Memory) is a deep learning structure and a variant of RNN (Recurrent Neural Network). RNN can be applied to the prediction field of time series data [37]. Since RNN has the problem of gradient vanishing, the gating mechanism of LSTM can solve this problem [9], and LSTM has been applied to the field of traffic flow prediction. Ma et al. [38] used the LSTM model to construct a traffic speed prediction model. The experimental results show that the prediction accuracy of the LSTM network is better than most statistical learning methods. Zhao et al. [39] proposed a traffic flow prediction model based on LSTM, which takes into account the spatiotemporal correlation of the traffic system. The experimental results show that the model has better prediction performance than other mainstream prediction models. Tian et al. [40] proposed a multi-scale smoothing method to fill the missing values in traffic flow data, and based on this, established an LSTM model to predict traffic flow. Wang et al. [41] developed an LSTM encoder-decoder model combined with the attention mechanism of time series prediction. The model covers periodic patterns and recent time models. The results show that the model has high effectiveness and reliability in long-term time series prediction. Zhang et al. [42] proposed a short-term traffic prediction model that combines graph convolution operations and residual LSTM structures. The prediction effect of this model is better than six baseline models through evaluation on the traffic speed dataset. Xia et al. [43] proposed a distributed LSTM weighted model that combines time window and normal distribution to improve the traffic flow prediction ability. The experimental results show that the model has improved the prediction accuracy.

Attention mechanism

With the remarkable success of attention mechanism in many fields, researchers have also begun to apply it to traffic flow prediction. In the field of traffic flow prediction, many models have adopted attention mechanism in spatiotemporal dimension. Guo et al. [44] proposed an attention-based spatiotemporal graph convolutional network ASTGCN to more effectively capture the spatiotemporal relationship between nodes. Zhang et al. [45] proposed a spatiotemporal convolutional graph attention network (ST-CGA) to better extract the dependencies between global regions. Ali et al. [46] designed an attention-based network to capture the dynamic spatiotemporal correlation of traffic flow, thereby improving the prediction effect. Wang et al. [47] proposed an attention-based spatiotemporal graph attention network (ASTGAT) to solve the problems of network degradation and over-smoothing and enhance the ability to capture the dynamic correlation of traffic flow prediction. Qiu et al. [48] proposed an event-aware graph attention fusion network to effectively capture the spatiotemporal characteristics of traffic networks.

CBAM (Convolutional Block Attention Module) is a simple and efficient convolutional neural network attention module. It generates attention feature maps in the two dimensions of channel and spatial feature space and adaptively adjusts the original features. Since CBAM is a general end-to-end module, it can easily be integrated into convolutional layers and jointly trained with them [49]. CBAM [22] enhances feature connectivity across both channel and spatial dimensions through sequential application of channel and spatial attention modules. Wang et al. [50] proposed a generative adversarial network (GAN) that combines high-order degradation models with CBAM, aiming to reconstruct high-quality images from low-resolution remote sensing inputs; this integration strategy effectively reduces noise interference and achieves significant improvements in texture and feature representation. In addition, Wang et al. [51] proposed MTCNet, a model that combines multi-scale transformers with CBAM to improve detection quality across different remote sensing image change detection tasks.

CBAM has become one of the most widely used attention mechanisms due to its high efficiency and lightweight. This mechanism takes into account both the channel dimension and the spatial dimension, effectively highlighting important features and compressing redundant features. The channel attention and spatial attention submodules are the core components of CBAM. The basic structure of each submodule is a multilayer perceptron (MLP), which includes an input layer, a hidden layer, and an output layer. Fig 2 shows the detailed structure of the network.

The channel attention module compresses the feature map in the spatial dimensions (width and height) using both average pooling and maximum pooling. The two pooled vectors are processed by a multi-layer perceptron (MLP) with shared weights, and the resulting output vectors are summed element-wise. Finally, the channel attention vector is obtained through a sigmoid activation function. The specific process is shown in formula (1):

(1) MC(F) = σ(MLP(AvgPool(F)) + MLP(MaxPool(F)))

where MC is the channel attention vector, indicating the importance of each channel, F is the feature map, AvgPool represents average pooling, MaxPool represents maximum pooling, and σ represents the sigmoid activation function.

Spatial attention focuses on important spatial locations and performs average pooling and maximum pooling in the channel dimension. This module concatenates the two pooling matrices and then mixes them through a convolutional layer to reduce the number of channels. Finally, the result is processed by a sigmoid activation function to generate a spatial attention matrix. The specific process is shown in formula (2):

(2) MS(F) = σ(Conv([AvgPool(F); MaxPool(F)]))

Among them, Conv is the convolution layer and MS is the spatial attention vector.

The CBAM attention mechanism effectively combines the channel attention module with the spatial attention module, which has been proven to achieve superior results. First, the channel attention module takes the feature map F as input and outputs the attention matrix MC(F). By multiplying these two components, a new feature map is generated. Next, the output of the previous stage is input into the spatial attention module, and the process is repeated to obtain the final feature map. Its mathematical expression is shown in formula (3):

(3) F′ = MC(F) ⊗ F,  F″ = MS(F′) ⊗ F′

Among them, MC is the channel attention vector, ⊗ denotes element-wise multiplication, and MS is the spatial attention vector.
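For concreteness, formulas (1)–(3) can be sketched in PyTorch as follows. This is a minimal illustration: the reduction ratio of 4 and the 7×7 spatial kernel are conventional CBAM defaults, not values stated in this paper.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Channel attention, formula (1): MC(F) = sigmoid(MLP(AvgPool(F)) + MLP(MaxPool(F)))."""
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.mlp = nn.Sequential(                 # shared-weight MLP
            nn.Linear(channels, channels // reduction),
            nn.ReLU(),
            nn.Linear(channels // reduction, channels),
        )

    def forward(self, x):                         # x: (N, C, H, W)
        avg = self.mlp(x.mean(dim=(2, 3)))        # average-pooled branch
        mx = self.mlp(x.amax(dim=(2, 3)))         # max-pooled branch
        return torch.sigmoid(avg + mx)[..., None, None]   # (N, C, 1, 1)

class SpatialAttention(nn.Module):
    """Spatial attention, formula (2): MS(F) = sigmoid(Conv([AvgPool(F); MaxPool(F)]))."""
    def __init__(self, kernel_size=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x):
        pooled = torch.cat([x.mean(dim=1, keepdim=True),   # channel-wise avg pool
                            x.amax(dim=1, keepdim=True)],  # channel-wise max pool
                           dim=1)
        return torch.sigmoid(self.conv(pooled))            # (N, 1, H, W)

class CBAM(nn.Module):
    """Sequential channel-then-spatial refinement, formula (3)."""
    def __init__(self, channels):
        super().__init__()
        self.ca = ChannelAttention(channels)
        self.sa = SpatialAttention()

    def forward(self, x):
        x = self.ca(x) * x                        # F' = MC(F) * F
        return self.sa(x) * x                     # F'' = MS(F') * F'
```

Because the attention maps only rescale the input, the module preserves the input shape and can be dropped into any convolutional stage.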

Method overview

This section elaborates on the proposed MSCALSTM method, which can be divided into three main modules. The first module is the multi-scale convolution (MSCNN), which extracts multi-scale features, increases the receptive field, improves prediction accuracy, and enhances generalization ability through multi-branch convolution. The second module is the multi-scale convolutional block attention module (MSCBAM), which enhances key feature representations and adaptively focuses on important areas, thereby improving the model's prediction performance and interpretability for time series data. The core of MSCBAM's contribution to interpretability is that the spatial attention maps generated by the multi-scale CBAM intuitively reflect the model's focus on key areas in traffic flow data. Specifically, in the traffic flow prediction task, MSCBAM uses multi-scale convolution (convolution kernels of two different sizes) to capture spatial dependencies over different ranges, and adaptively assigns weights through the attention mechanism to visualize the model's attention to important intersections or road sections. For example, the attention map can clearly show the model's dependence on surrounding intersections or main roads when predicting traffic flow in a certain area, thereby revealing the key spatiotemporal dependencies in traffic flow. This design enhances the interpretability of the model. The third module is the LSTM and fully connected layer, which captures the complex dependencies in the data, performs dimensionality transformation and feature combination, and generates the final prediction results. In this way, the proposed MSCALSTM can effectively focus on key features, enhance nonlinear modeling capability, and better adapt to the complexity of the data.
Specifically, the combination of multi-scale CNN, multi-scale CBAM, and LSTM performs well in time series forecasting because it effectively extracts multi-scale features and captures both short-term and long-term trends. The multi-scale CNN provides a combination of local and global information, while the attention mechanism of the multi-scale CBAM dynamically adjusts feature weights, strengthens the focus on important features, and better models the spatial relationships between features. LSTM excels at capturing temporal dependencies and can handle complex dynamic changes. Together, the three make the model more expressive in fusing spatial and temporal features, thereby significantly improving prediction accuracy. The overall framework of MSCALSTM is shown in Fig 3, where the symbols within the MSCBAM block denote concatenation and the residual connection, and those within the LSTM block denote element-wise multiplication and the sigmoid function.

Multi-scale convolution

First, the data enters the model in the form (B, T, C), where B represents the batch size, T represents the time steps, and C represents the number of features. To enlarge the effective input size and enable better feature extraction, this paper introduces the batch dimension into the convolution operation and adjusts the data format to (C, B, T) through dimensionality transformation. The input data is first processed by MSCNN for multi-scale convolution, and the concatenated result is then reduced in dimension; the dimension after reduction is still (C, B, T). Specifically, three convolution kernels of size 1×1, 3×3 and 5×5 are used. The multi-scale convolution is shown in formula (4):

(4) MSCNN(X) = Concat(fk1(X), fk2(X), fk3(X))

where fki represents the convolution operation with kernel size ki (ki ∈ {1×1, 3×3, 5×5}), and Concat represents the concatenation operation.

Specifically, using convolution kernels of different sizes captures features at different scales, enriches the feature representation, and effectively expands the receptive field, enabling the model to capture longer-range dependencies and thereby improving prediction accuracy. The 1×1 convolution focuses on the relationships between channels, the 3×3 convolution is suited to capturing local details, and the 5×5 convolution extracts broader contextual information.
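A sketch of how the multi-branch convolution of formula (4) and the subsequent dimension reduction might look in PyTorch. The (B, T, C) → (1, C, B, T) reshaping follows the description above; the use of a 1×1 convolution for the reduction step is an assumption, since the paper only says the concatenated result is reduced in dimension.

```python
import torch
import torch.nn as nn

class MSCNN(nn.Module):
    """Multi-scale convolution, formula (4): parallel 1x1 / 3x3 / 5x5 branches,
    concatenated and then reduced back to C channels. The 1x1 reduction
    convolution is an assumption, not stated explicitly in the paper."""
    def __init__(self, channels):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(channels, channels, k, padding=k // 2) for k in (1, 3, 5)
        )
        self.reduce = nn.Conv2d(3 * channels, channels, kernel_size=1)

    def forward(self, x):                        # x: (B, T, C)
        x = x.permute(2, 0, 1).unsqueeze(0)      # -> (1, C, B, T), as in the text
        out = torch.cat([f(x) for f in self.branches], dim=1)   # (1, 3C, B, T)
        out = self.reduce(out)                   # back to (1, C, B, T)
        return out.squeeze(0).permute(1, 2, 0)   # -> (B, T, C)
```

The "same" padding (k // 2) keeps all branch outputs spatially aligned so they can be concatenated along the channel axis.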

Multi-scale CBAM

The improvement of introducing multi-branch convolutional layers in CBAM enables the model to extract features of traffic flow data at different scales simultaneously, thereby more effectively modeling the spatial relationship between features. This design enhances the focus on important features related to traffic flow prediction, improves the flexibility and expressiveness of the model under complex traffic patterns, and significantly improves the accuracy of prediction.

Expanding the dimension enables the data to be better processed by the CBAM attention module; the expanded data has the form (1, C, B, T). The expanded data is then input into the MSCBAM attention module. Specifically, the multi-scale CBAM (MSCBAM) used in this paper improves on the traditional spatial attention module by using convolution kernels of two different scales to extract spatial features. This multi-scale convolution operation enhances the model's ability to capture the spatial relationships between features, thereby improving performance. MSCBAM weights the extracted features through the spatial attention mechanism. Like the traditional CBAM, MSCBAM calculates the importance of each spatial position; its uniqueness lies in generating spatial attention maps at different scales through the two convolutional layers. This design enables the model to effectively integrate local and global information and thereby more accurately identify the areas most critical for traffic flow prediction.

The result of the multi-scale convolution is reduced in dimension through convolution, and the reduced result is then passed through the multi-scale CBAM, which effectively enhances key feature representations and adaptively focuses on important areas, thereby improving the model's prediction performance and interpretability for time series data. The MSCBAM attention operation is expressed in equation (5).

(5) F′ = MC(F) ⊗ F,  F″ = MMS(F′) ⊗ F′

In the above formula, F, the dimension-reduced result, is the input feature map of the MSCBAM attention module; ⊗ represents element-wise multiplication, MC is the channel attention extraction operation, and MMS is the multi-scale spatial attention extraction operation.

The CAM attention calculation formula is shown in (6), where σ is the sigmoid function.

(6) MC(F) = σ(MLP(AvgPool(F)) + MLP(MaxPool(F)))

The formula of the MSSAM attention module is shown in (7), where MS1 and MS2 denote the MSSAM results obtained with the two convolution kernels of different sizes, and MMS denotes the result after concatenating the outputs of the two branches and reducing the dimension.

(7) MSi(F) = σ(fki([AvgPool(F); MaxPool(F)])), i = 1, 2
MMS(F) = σ(Conv(Concat(MS1(F), MS2(F))))

Among them, Concat represents the concatenation operation, Conv represents the dimension-reducing convolution, fki represents the convolution with the i-th kernel size, and σ represents the sigmoid function.

Finally, the dimension of the reduced result is converted to the original state (B, T, C) through the reshaping operation so that it can be better processed by the subsequent LSTM.
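As an illustration of the multi-scale spatial attention of formula (7), a minimal PyTorch sketch is given below. The paper does not state the two kernel sizes, so 3 and 7 are used here purely as placeholders.

```python
import torch
import torch.nn as nn

class MSSAM(nn.Module):
    """Multi-scale spatial attention, formula (7): spatial attention maps are
    computed with two convolutions of different kernel sizes, then concatenated
    and reduced to a single map. The kernel sizes (3, 7) are placeholders; the
    paper does not specify the two scales it uses."""
    def __init__(self, kernel_sizes=(3, 7)):
        super().__init__()
        self.convs = nn.ModuleList(
            nn.Conv2d(2, 1, k, padding=k // 2) for k in kernel_sizes
        )
        self.reduce = nn.Conv2d(len(kernel_sizes), 1, kernel_size=1)

    def forward(self, x):                                   # x: (N, C, H, W)
        # channel-wise average and max pooling, as in CBAM's spatial module
        pooled = torch.cat([x.mean(dim=1, keepdim=True),
                            x.amax(dim=1, keepdim=True)], dim=1)
        # one sigmoid-activated attention map per kernel scale (branch MSi)
        maps = torch.cat([torch.sigmoid(c(pooled)) for c in self.convs], dim=1)
        # concatenate the branches and reduce to the final map MMS
        return torch.sigmoid(self.reduce(maps))             # (N, 1, H, W)
```

In MSCBAM this module would replace the single-scale spatial attention of CBAM, while the channel attention of formula (6) is left unchanged.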

LSTM

The long short-term memory (LSTM) method consists of three gates: a forget gate, an input gate, and an output gate [9]. The forget gate uses the Sigmoid function to selectively filter the memory information of the previous moment and the new input information. When the value of the gate is 1, all information is retained; when the value is 0, all information is discarded. The design of this gate effectively alleviates the common gradient vanishing problem in the recurrent neural network (RNN) model. LSTM combines multiple gating mechanisms, such as the forget gate, to selectively retain important information from previous time steps and discard irrelevant information. This feature is crucial to LSTM’s ability to retain long-term dependencies in sequence data. The LSTM structure diagram is shown in Fig 4.

The result after MSCBAM processing is used as the input of LSTM. The formula for processing in LSTM is shown in formula (8):

(8)
ft = σ(Wf·[ht−1, xt] + bf)
it = σ(Wi·[ht−1, xt] + bi)
C̃t = tanh(Wc·[ht−1, xt] + bc)
Ct = ft ⊗ Ct−1 + it ⊗ C̃t
ot = σ(Wo·[ht−1, xt] + bo)
ht = ot ⊗ tanh(Ct)

Among them, σ and tanh represent the sigmoid function and tanh function respectively; xt is the input of the LSTM unit at time t (notably, xt is the feature extracted by the MSCBAM proposed in this paper); Wf, Wi, Wc, Wo represent the weight matrices of the forget gate, input gate, input node and output gate respectively; bf, bi, bc, bo represent the corresponding bias vectors; ht−1 and ht represent the hidden states of the LSTM unit at times t−1 and t; ft, it and ot represent the outputs of the forget gate, input gate and output gate respectively; C̃t is the candidate unit state at time t; Ct and Ct−1 represent the unit states at times t and t−1.

Finally, a fully connected layer is used to transform the dimension of the feature vector output by LSTM and map it to the dimension of the prediction target. At the same time, the output features are further extracted and combined to enhance the nonlinear modeling capability and output the final prediction result with the dimension (B, T).
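A minimal sketch of this final stage, assuming the MSCBAM output keeps the (B, T, C) layout described earlier; the hidden size of 64 is an arbitrary placeholder, not a value reported in the paper.

```python
import torch
import torch.nn as nn

class LSTMHead(nn.Module):
    """LSTM of formula (8) followed by a fully connected layer that maps the
    hidden features to the prediction target, (B, T, C) -> (B, T)."""
    def __init__(self, in_features, hidden_size=64):
        super().__init__()
        self.lstm = nn.LSTM(in_features, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, 1)      # dimension transformation

    def forward(self, x):                        # x: (B, T, C) from MSCBAM
        h, _ = self.lstm(x)                      # h: (B, T, hidden_size)
        return self.fc(h).squeeze(-1)            # final prediction, (B, T)
```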

Experimental analysis

Datasets

The proposed model is validated on four California highway datasets and one energy dataset, namely PeMSD3, PeMSD4, PeMSD7, PeMSD8 and Energy.

  1. PeMSD3: This dataset contains traffic flow data from 358 traffic collection nodes from September 1, 2018 to November 30, 2018.
  2. PeMSD4: It refers to the traffic data of the San Francisco Bay Area, which contains 3848 detectors on 29 roads. The time span of this dataset is from January to February 2018. This paper selects the first 50 days of the dataset as the training set, and the remaining data as the test set.
  3. PeMSD7: The traffic speed dataset is collected by the California Department of Transportation in the seventh district of California through 228 road traffic sensors, and the collected data samples are aggregated every 5 min. The dataset records the vehicle speed of the seventh district of California from May 1, 2012 to June 30, 2012.
  4. PeMSD8: It is the traffic data released in San Bernardino from July to August 2016, which contains 1979 detectors on 8 roads. This paper selects the first 50 days of data as the training set, and the last 12 days of data as the test set.
  5. Energy: It includes the monthly natural gas production data of a gas field in southwest China from 1992 to 2021.

Baseline model

This article compares the following models with the proposed MSCALSTM model:

  1. HA [52]: This model uses the weighted average of historical speed data to predict future speeds.
  2. ARIMA [53]: This model is a classic time series prediction model.
  3. LSTM: This model maintains data validity features through a gating mechanism.
  4. STGCN [1]: This model is a convolutional structure for traffic prediction tasks based on spatiotemporal GCN.
  5. DCRNN [24]: This model is a classic GNN-based spatiotemporal series prediction method that uses diffuse convolutional RNN to handle complex spatial dependencies and nonlinear temporal correlations in road networks.
  6. ASTGCN [44]: This model is an evolution of STGCN that introduces a spatiotemporal attention mechanism.
  7. STSGCN [54]: This model is a spatiotemporal synchronous graph convolutional network that learns local spatiotemporal correlations through a spatiotemporal synchronous modeling mechanism.
  8. AGCRN [55]: This model is an adaptive GCN that learns spatiotemporal dependencies from spatiotemporal data and can perform graph convolution without a predefined spatial graph.
  9. STG-NCDE [56]: This model uses two differential neural control equations to handle the temporal and spatial dimensions of traffic prediction respectively.
  10. MAGRN [57]: This model is based on a multi-scale graph convolutional network recursive feature extraction framework with multi-scale attention and a dual attention mechanism for traffic flow prediction.
  11. TARGCN [58]: This model exploits the dynamic spatial correlation between traffic nodes and the temporal dependency between time slices for prediction.

Evaluation metric

Evaluation metrics are an important part of determining model performance. To evaluate the prediction performance of MSCALSTM, this paper uses MAE (mean absolute error), RMSE (root mean square error), MAPE (mean absolute percentage error), AIC (Akaike information criterion) and BIC (Bayesian information criterion) as evaluation criteria.

MAE is the mean absolute value of the difference between the ground truth yi and the predicted value ŷi, and its definition is shown in formula (9):

(9) MAE = (1/n) Σi |yi − ŷi|

RMSE is the square root of the mean squared prediction error over the dataset, and its definition is shown in formula (10):

(10)

Among them, is the predicted value and yi is the truth.

MAPE measures the error in percentage terms and is defined in formula (11):

$$\mathrm{MAPE} = \frac{100\%}{n}\sum_{i=1}^{n}\left|\frac{y_i - \hat{y}_i}{y_i}\right| \tag{11}$$

where $\hat{y}_i$ is the predicted value and $y_i$ is the ground truth.
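As a concrete reference, the three error metrics above can be computed in a few lines of plain Python (a minimal sketch with illustrative toy values; in practice library implementations such as those in scikit-learn are typically used):

```python
import math

def mae(y_true, y_pred):
    # Mean absolute error: average of |y_i - y_hat_i|  (formula 9)
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    # Root mean square error: sqrt of the mean squared residual  (formula 10)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def mape(y_true, y_pred):
    # Mean absolute percentage error, in percent  (formula 11)
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)

# Toy example: three observations
y_true = [100.0, 200.0, 400.0]
y_pred = [110.0, 190.0, 380.0]
```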

AIC is a statistical metric used to quantify the trade-off between model complexity and goodness of fit. The specific formula for AIC is shown in (12). BIC is an alternative model selection criterion that introduces a stronger penalty for model complexity. The specific formula for BIC is shown in (13).

$$\mathrm{AIC} = 2k - 2\ln L \tag{12}$$

$$\mathrm{BIC} = k\ln n - 2\ln L \tag{13}$$

where k is the number of estimated parameters, n is the number of recorded measurements, and L is the maximized value of the likelihood function. For both AIC and BIC, the model with the lowest value is preferred, as it best balances goodness of fit against model complexity.
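Given a model's maximized log-likelihood, both criteria are one-line computations (a sketch with illustrative values; k, n and ln L come from the fitted model):

```python
import math

def aic(k, log_likelihood):
    # AIC = 2k - 2 ln L  (formula 12)
    return 2 * k - 2 * log_likelihood

def bic(k, n, log_likelihood):
    # BIC = k ln n - 2 ln L  (formula 13); penalizes parameters more
    # heavily than AIC once n exceeds e^2 (about 7.4 samples)
    return k * math.log(n) - 2 * log_likelihood
```

For example, with 3 parameters, 100 samples and ln L = -100, AIC is 206 while BIC is roughly 213.8, reflecting BIC's stronger complexity penalty.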

Implementation details

All experiments are conducted in PyTorch with an NVIDIA GeForce RTX 4090 GPU and 48 GB of memory. Each dataset is split into training, validation and test sets at a ratio of 6:2:2. We train for 200 epochs using the Adam optimizer with an initial learning rate of 0.003 and a batch size of 64 on all datasets. The learning rate is decayed at epochs 15, 40, 70, 105, 145 and 190 with a decay rate of 0.3. The model parameters with the lowest loss on the validation set are saved as the best parameters and evaluated on the test set. To verify the applicability of the model to large-scale data sets, this paper further tests the computational efficiency of MSCALSTM under different data scales. Since traffic flow prediction and energy prediction often involve massive data, we expand the original data sets through a time series generation algorithm and spatial dimension splicing, respectively, to better simulate large-scale data scenarios.
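The multi-step decay schedule described above matches the behavior of PyTorch's MultiStepLR; a dependency-free sketch of the resulting learning rate per epoch (milestones and rates taken from the training setup above):

```python
def learning_rate(epoch, lr0=0.003, milestones=(15, 40, 70, 105, 145, 190), gamma=0.3):
    # Multiply the initial rate by gamma once for every milestone the
    # current epoch has reached (MultiStepLR-style step decay).
    decays = sum(1 for m in milestones if epoch >= m)
    return lr0 * gamma ** decays

# Learning rate used at each of the 200 training epochs
schedule = [learning_rate(e) for e in range(200)]
```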

Time series generation algorithm: To expand the traffic flow data, we use a time series generation algorithm to extend the original data and generate large-scale simulated data. We use the same method to expand the energy data set.

Spatial expansion: The detector data of PeMSD3 (358 nodes) and PeMSD7 (228 nodes) are horizontally spliced to construct a 586-node virtual road network (PeMSD3+7) to simulate the complexity of the city-level road network.
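The spatial splicing step can be illustrated as a column-wise concatenation of two detector matrices that share the same time axis (a toy sketch with made-up readings; the real PeMSD3 and PeMSD7 matrices have 358 and 228 node columns):

```python
def splice_networks(data_a, data_b):
    # Horizontally splice two detector datasets recorded over the same
    # timesteps: each row is one timestep, each column one detector node.
    assert len(data_a) == len(data_b), "datasets must share the time axis"
    return [row_a + row_b for row_a, row_b in zip(data_a, data_b)]

# Toy example: 2 timesteps, a 3-node and a 2-node network -> 5-node network
flows_a = [[10, 12, 9], [11, 13, 8]]
flows_b = [[20, 22], [21, 23]]
merged = splice_networks(flows_a, flows_b)
```

Applied to the real data, this yields the 586-node (358 + 228) virtual road network PeMSD3+7.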

To verify the performance and scalability of the model in large-scale data scenarios, this paper designs a series of experiments based on data expansion and optimization strategies. Using the time series generation algorithm, PeMSD4 (3848 nodes, 50 days of training data) and PeMSD8 (1979 nodes, 62 days of complete time series) are extended to double their duration, yielding long-sequence synthetic datasets covering 100 days (PeMSD4Large) and 124 days (PeMSD8Large). As shown in Table 1, the single-CPU training time of the expanded PeMSD4Large increases from 2.8 hours to 5.9 hours and the memory usage from 6.2 GB to 12.8 GB, while the MAE rises only slightly from 12.76 to 13.62 (an increase of 6.7%); for PeMSD8Large, the training time increases from 3.2 hours to 6.7 hours, the memory usage from 8.1 GB to 16.5 GB, and the MAE from 10.88 to 11.93 (an increase of 9.6%). This indicates that the model remains robust to long-term dependencies. In the spatial dimension, the detector data of PeMSD3 (358 nodes) and PeMSD7 (228 nodes) are horizontally fused into a 586-node virtual road network (PeMSD3+7). The video memory usage increases from 6.2 GB to 11.8 GB, but can be kept within 9.5 GB through dynamic batch adjustment (reducing the batch size from 64 to 32), while the single-sample inference time remains at 19 ms (meeting the real-time threshold of <50 ms). To address the high resource consumption of the expanded data, we further adopt a gradient accumulation strategy (steps = 4), which reduces video memory usage by 35% and increases training speed by 15% in long-sequence training.
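The gradient accumulation strategy trades memory for extra forward passes: gradients from several small micro-batches are summed before a single optimizer step. A framework-free sketch on a scalar parameter (illustrative values only; in PyTorch this corresponds to calling `loss.backward()` on each micro-batch and `optimizer.step()` every fourth):

```python
def sgd_with_accumulation(grads, accum_steps=4, lr=0.1, w=0.0):
    # Sum gradients over `accum_steps` micro-batches, then apply one
    # update with their mean -- numerically equivalent to a single
    # large-batch step, at roughly 1/accum_steps of the peak memory.
    buf = 0.0
    for i, g in enumerate(grads, start=1):
        buf += g
        if i % accum_steps == 0:
            w -= lr * (buf / accum_steps)
            buf = 0.0
    return w

# 8 micro-batch gradients -> 2 effective optimizer steps
final_w = sgd_with_accumulation([1.0, 1.0, 1.0, 1.0, 2.0, 2.0, 2.0, 2.0])
```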

Table 1. Performance comparison on traffic datasets of different sizes.

https://doi.org/10.1371/journal.pone.0325474.t001

Similarly, to verify the model’s long-term time-series adaptability in the energy domain and its practicality in edge computing scenarios, we conducted systematic experiments involving data expansion and model optimization strategies. Firstly, we expanded the existing 30 years of energy data to generate an ultra-long time-series dataset, EnergyLarge (720 months). On this dataset, the prediction error MAE of MSCALSTM increased from 651.37 to 689.45 (a rise of 5.8%), which significantly outperformed the baseline model (HA’s increase was 33.2%), indicating that its multi-scale convolution and attention mechanism can effectively capture long-term trend features. Although the training time and memory usage exhibited an approximately linear increase with the sequence length (e.g., memory grew from 13.9 GB to 28.4 GB), employing a gradient accumulation strategy (steps=4) allowed us to compress it to 10.1 GB. Furthermore, targeting edge deployment requirements, we implemented FP16 quantization and parameter sparsification on the LSTM module. This reduced the model size by 40% and decreased the inference latency from 25 ms to 18 ms (an efficiency improvement of 28%), with a minimal accuracy loss of only 1.1% (MAE 697.21).

Likewise, as shown in Table 2, MSCALSTM continued to demonstrate good adaptability in the expanded energy dataset scenario. Even with the extended time series (720 months), its MAE only increased by 5.8%, indicating that the multi-scale convolution and attention mechanism possess a strong ability to capture long-term trends. Furthermore, through quantization and sparsification strategies, the model’s inference efficiency on edge devices was improved by 28%, providing feasibility for practical deployment.

Table 2. Performance comparison on energy datasets of different scales.

https://doi.org/10.1371/journal.pone.0325474.t002

Results

Comparative experimental results

Tables 3 and 4 compare the results of the proposed model with 11 other benchmark models, all of which are evaluated on four datasets. Tables 3 and 4 highlight that our model, MSCALSTM, has the lowest error and the best performance.

Table 3. Average performance comparison of different methods on PeMSD3 and PeMSD4.

https://doi.org/10.1371/journal.pone.0325474.t003

Table 4. Average performance comparison of different methods on PeMSD7 and PeMSD8.

https://doi.org/10.1371/journal.pone.0325474.t004

Compared with traditional methods, the MSCALSTM model proposed in this paper is significantly better than traditional models such as HA and ARIMA. Although traditional methods can process time series data, they have obvious deficiencies in modeling nonlinear relationships. HA cannot effectively capture complex time series characteristics in traffic flow prediction, while the ARIMA model has high requirements for data stationarity, is difficult to process non-stationary traffic flow data, and has complex parameter adjustment. MSCALSTM effectively captures the complex nonlinear relationship of traffic flow and reflects the correlation between different regions by introducing multi-scale CNN and attention modules. It also introduces LSTM to capture long-term dependencies and complex dynamics, successfully overcoming the above shortcomings and improving prediction accuracy.

Compared with STGCN: STGCN has limited ability to capture long-term dependencies, cannot fully account for changes at different time scales during feature extraction, and ignores potential relationships between regions when modeling spatial correlation. MSCALSTM extracts features at different time scales through the multi-scale CNN, enhancing its ability to capture complex dynamic patterns, and combines the CBAM module to strengthen attention to spatial features so that the model can more accurately identify correlations between regions.

Compared with DCRNN: DCRNN relies mainly on diffusion convolution to capture spatial dependencies. Its advantage is that it explicitly models the propagation process in the traffic network, but the fixed pattern of diffusion convolution adapts poorly to a dynamically changing traffic environment. In addition, DCRNN's weak modeling of long-term dependencies limits its prediction accuracy in complex dynamic scenes. MSCALSTM extracts local and global spatial features through the multi-scale CNN and uses LSTM to capture long-term dependencies, achieving more robust predictions in dynamic scenes.

Compared with ASTGCN: ASTGCN dynamically adjusts the spatial structure through adaptive graph convolution and can therefore partially adapt to dynamic changes in the traffic network. However, the single-scale design of its temporal convolution limits the extraction of multi-granularity temporal features, and its adaptive graph convolution is computationally expensive. MSCALSTM extracts features at different time scales in parallel through the multi-scale CNN and dynamically allocates spatial attention weights with the lightweight CBAM, significantly reducing computational overhead while preserving prediction accuracy.

Compared with STSGCN: STSGCN combines graph convolution and synchronous convolution to model spatial and temporal dependencies simultaneously. It can explicitly capture local spatiotemporal relationships, but its complex design imposes a heavy computational burden and limits its ability to model global dynamics. MSCALSTM significantly reduces computational complexity while retaining global dynamic modeling capability through the cascaded design of the multi-scale CNN and LSTM, and dynamically adjusts spatial weights through CBAM, achieving more efficient prediction in complex scenarios.

Compared with AGCRN: AGCRN models spatiotemporal dependencies through adaptive graph convolution and an RNN, which allows it to partially capture dynamic spatial relationships. However, its adaptive graph convolution responds weakly to sudden events, and the vanishing-gradient problem of RNNs in long-sequence modeling limits its ability to capture long-term dependencies. MSCALSTM dynamically perceives local and regional spatial dependencies through the multi-scale CBAM and flexibly adjusts the importance of temporal features with the gating mechanism of LSTM, significantly improving adaptability in dynamic scenes.

Compared with MAGRN: MAGRN enhances information transfer through a multi-graph attention mechanism and can fuse multi-source graph-structure information. However, its attention mechanism may lose information in complex scenes, and the temporal modeling ability of its RNN limits its adaptability to rapid changes. MSCALSTM extracts features through multi-scale CNN layers and filters key information with the channel-spatial attention of CBAM, achieving accurate prediction.

Compared with TARGCN: TARGCN combines a temporal attention mechanism with graph convolution and can dynamically adjust the weights of temporal features. However, its temporal attention responds weakly to short-term changes, and its graph convolution does not fully consider dynamic changes when modeling spatial features. MSCALSTM extracts multi-granularity temporal features through the multi-scale CNN and dynamically perceives spatial dependencies with CBAM, achieving more flexible prediction in dynamic scenes.

Figs 5–8 show the comparison results of the proposed MSCALSTM model with the other baseline models on the four PeMS datasets. The results show that MSCALSTM achieves clear advantages in the MAE, RMSE and MAPE metrics on all four datasets, further demonstrating the superiority of the proposed model.

Table 5 shows that the proposed MSCALSTM model achieves the best AIC and BIC values on the four datasets of PeMSD3, PeMSD4, PeMSD7 and PeMSD8 (e.g., AIC is 6618.65 and BIC is 6628.47 on PeMSD3; AIC is 8215.37 and BIC is 8227.56 on PeMSD4), which is significantly better than other baseline models. This excellent performance is due to the effective combination of multi-scale convolutional layers, multi-scale CBAM modules and LSTM, which enables it to simultaneously capture the spatial characteristics and temporal dependencies of traffic flow data. In addition, MSCALSTM shows low AIC and BIC values on all four datasets, which not only reflects its good trade-off between fitting data and model complexity, but also shows that it has strong generalization ability and robustness, and can adapt to different traffic flow scenarios. These results fully verify the effectiveness and practicality of MSCALSTM in traffic flow prediction tasks, and provide important reference value for subsequent research.

Table 5. Comparison of AIC and BIC of different methods on four datasets.

https://doi.org/10.1371/journal.pone.0325474.t005

Overall, MSCALSTM has significant advantages over the above models in feature extraction, spatial correlation identification and long-term dependency modeling, which makes it perform better in traffic flow prediction. Through the combination of multi-scale CNN and CBAM, MSCALSTM can effectively capture complex spatiotemporal dynamics and improve prediction accuracy and adaptability, making it suitable for large-scale traffic flow prediction in practical applications. In addition, the baseline models predict multiple points in their experiments on the PeMS data sets and therefore cover a wider spatial range; they must handle the spatial correlations and dynamic changes between multiple points, which can lead to overfitting or difficulty in capturing the variation patterns of all points. In a multi-point prediction scenario, traffic data may also be affected by various external factors (such as weather or traffic accidents), and this noise degrades prediction accuracy. The MSCALSTM model in this paper predicts only a single traffic flow node in the experiments, so its output involves data from one point only, which naturally reduces the scope for error because the model only needs to focus on traffic changes at one location. Single-point prediction also better avoids noise and allows optimization for a specific point. For these reasons, its evaluation metrics such as MAE, RMSE and MAPE are significantly lower than those of the baseline models that predict multiple points.

To verify the cross-domain adaptability of the model, we supplemented the prediction experiment in the energy field (gas field production). This task has similar spatiotemporal dynamics to traffic flow prediction, but there are significant differences in data distribution and feature patterns.

Experimental results demonstrate that MSCALSTM outperforms baseline models on the energy dataset (Tables 6 and 7). Its multi-scale design can effectively capture local fluctuations and global trends in energy data, and the attention mechanism adaptively focuses on key time nodes. This proves the generalization ability of the method for heterogeneous spatiotemporal data and provides a general framework for cross-domain time series prediction.

Table 6. Comparison of average performance of different methods on the energy dataset.

https://doi.org/10.1371/journal.pone.0325474.t006

Table 7. Comparison of AIC and BIC of different methods on energy dataset.

https://doi.org/10.1371/journal.pone.0325474.t007

Ablation study

To demonstrate the impact of each module on model performance, this section verifies their effectiveness experimentally by removing different modules. The following three variants are designed: (1) MSCALSTM-A: remove the multi-scale spatial attention; (2) MSCALSTM-B: remove CBAM; (3) MSCALSTM-C: remove both the multi-branch convolution in the convolutional network and CBAM.

Tables 8–10 show the ablation study results on the four PeMS datasets. The following can be observed. (1) Without multi-scale spatial attention (MSCALSTM-A), performance degrades. This shows that multi-scale spatial attention enhances features by weighting the importance of different spatial locations and effectively captures the dynamic relationships between regions. Without it, the model cannot fully identify and exploit the interactions between different traffic flow areas and pays insufficient attention to important features; as a result, it cannot accurately capture key spatial features in complex traffic patterns, which reduces prediction performance. The MAE of MSCALSTM on PeMSD4 is 37.4% lower than that of MSCALSTM-A, indicating the key role of this module in multi-granularity spatiotemporal feature extraction. (2) Without CBAM (MSCALSTM-B), performance is worse still. CBAM weights features by applying channel and spatial attention simultaneously, strengthening the model's focus on important information. If CBAM is removed entirely, the model loses its ability to select features and over-attends to unimportant or redundant ones; it is then likely to ignore key features, which hurts overall prediction ability. In complex traffic flow data especially, this loss of important information significantly reduces prediction accuracy. The MAE of MSCALSTM on PeMSD7 is 46.7% lower than that of MSCALSTM-B, verifying the effectiveness of dynamically focusing on key areas. (3) Without the multi-branch convolution in the convolutional network and without CBAM (MSCALSTM-C), performance is the worst. Multi-scale convolution extracts features at different scales, which matters especially when traffic flow changes are diverse and complex. If multi-scale convolution is removed, the model can rely only on feature extraction at a single scale and may fail to capture the diversity and complexity of traffic flows; this limitation makes the model perform poorly on rapidly changing or cyclically fluctuating traffic, affecting overall prediction accuracy. Removing CBAM on top of this also deprives the model of attention to important information, so this variant performs worst. The RMSE of MSCALSTM on PeMSD8 is 87% lower than that of MSCALSTM-C, a significant improvement that highlights the modules' critical role in long-term dependency modeling.

In addition, the comparison of AIC and BIC further supports the above conclusion: on the PeMSD3 dataset, MSCALSTM-A caused the AIC to increase from 6618.65 to 9097.42 (an increase of 37.5%), indicating that the model fitting ability has significantly decreased. The BIC of MSCALSTM-C on the PeMSD8 dataset increased from 7226.55 to 11523.15 (an increase of 59.4%), further verifying its irreplaceable role in time series modeling.

To present the results more intuitively, this paper provides a visual description of the experimental results in Figs 9–12. In summary, the above three modules are equally important for learning spatiotemporal correlations.

To verify the scalability of MSCALSTM in other fields, we conducted ablation experiments on the energy dataset. The experimental results are shown in Table 11.

To more intuitively demonstrate the superiority of MSCALSTM and highlight the contribution of each module, we plot the actual and predicted values of the ablation experiments on PeMSD4 and PeMSD8, two commonly used traffic flow datasets. As shown in Figs 13–16, MSCALSTM has a smaller error in regions where traffic flow changes abruptly (peaks/troughs), while MSCALSTM-C shows a significant lag in the long-term trend. This indicates that the model combines key techniques such as MSCNN, MSCBAM and LSTM and fully exploits their advantages in time series feature extraction, adaptive attention, temporal dependency modeling and feature fusion, showing excellent performance in time series prediction tasks. Whether in the overall trend curve or in local details, the fit is very good and closely matches the complex dynamic characteristics of real data. To better observe the curve trends, peak and valley characteristics, and subtle changes, some key curves are locally enlarged, as shown in Figs 14 and 16. MSCALSTM accurately captures the instantaneous fluctuations of high-density traffic flow, while the other ablation models oversmooth such nonlinear changes.

Fig 14. Local amplification of true and predicted values on Pemsd4.

https://doi.org/10.1371/journal.pone.0325474.g014

Fig 16. Local amplification of true and predicted values on Pemsd8.

https://doi.org/10.1371/journal.pone.0325474.g016

Attention visualization and interpretability

We take the multi-scale spatial attention module in MSCBAM as an example. As shown in Fig 17, the left panel shows the impact of the convolutional kernel on the spatial attention heatmap within MSCBAM: the high-weight regions are concentrated within the grid surrounding the target point, indicating that the model prioritizes real-time traffic conditions and can respond quickly to instantaneous changes. The right panel displays the influence of the convolutional kernel on the spatial attention heatmap in MSCBAM: this design enables the model to capture long-distance traffic propagation patterns, reflecting long-term trend features, and therefore covers a broader area.

Statistical significance analysis

To rigorously validate the superiority of MSCALSTM, we conducted t-tests on the MAE metric across all datasets, comparing our model against both traditional baseline models and state-of-the-art approaches. As shown in Tables 12–16, at a significance level of α = 0.05, the performance differences between MSCALSTM and all comparison models are statistically significant (all p-values are less than 0.001). This analysis further confirms that MSCALSTM's superior prediction accuracy is not a random occurrence but stems from the design advantages of its multi-scale feature extraction and dynamic attention mechanism, enhancing the credibility and persuasiveness of our findings.
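The paired t-statistic underlying these tables can be computed directly from per-run MAE values (a sketch with fabricated illustrative numbers, not the paper's actual run results; in practice `scipy.stats.ttest_rel` returns the statistic and p-value together):

```python
import math

def paired_t_statistic(errors_a, errors_b):
    # Paired t-test statistic on per-run errors of two models:
    # t = mean(d) / (s_d / sqrt(n)), with d_i = a_i - b_i and n-1 dof.
    d = [a - b for a, b in zip(errors_a, errors_b)]
    n = len(d)
    mean_d = sum(d) / n
    var_d = sum((x - mean_d) ** 2 for x in d) / (n - 1)  # sample variance
    return mean_d / math.sqrt(var_d / n)

# Illustrative per-run MAE values of two models over 5 repeated runs
mscalstm_mae = [12.8, 12.7, 12.9, 12.6, 12.75]
baseline_mae = [14.0, 14.1, 13.9, 14.05, 13.95]
t_stat = paired_t_statistic(mscalstm_mae, baseline_mae)  # strongly negative
```

A large-magnitude negative t-statistic (here well beyond the 0.05 critical value for 4 degrees of freedom) indicates the first model's MAE is significantly lower.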

Table 12. MAE significance comparison between MSCALSTM and baseline models based on PeMSD3 dataset.

https://doi.org/10.1371/journal.pone.0325474.t012

Table 13. MAE significance comparison between MSCALSTM and baseline models based on PeMSD4 dataset.

https://doi.org/10.1371/journal.pone.0325474.t013

Table 14. MAE significance comparison between MSCALSTM and baseline models based on PeMSD7 dataset.

https://doi.org/10.1371/journal.pone.0325474.t014

Table 15. MAE significance comparison between MSCALSTM and baseline models based on PeMSD8 dataset.

https://doi.org/10.1371/journal.pone.0325474.t015

Table 16. MAE significance comparison between MSCALSTM and baseline models based on Energy dataset.

https://doi.org/10.1371/journal.pone.0325474.t016

As demonstrated in Tables 12–16, all compared p-values are far below 0.05, indicating that MSCALSTM's MAE is significantly lower than that of all baseline models. The difference is most pronounced against the traditional models (HA, ARIMA) and the foundational LSTM model, where the absolute t-values reach 45 to 63 on PeMSD7, highlighting MSCALSTM's advantage in complex spatiotemporal modeling. While the differences relative to some advanced models (such as STGCN and TARGCN) are smaller, with absolute t-values ranging from 6.9 to 22.8, they remain statistically significant. This suggests that the improvements in MSCALSTM's architecture consistently lead to better performance.

Conclusions

This paper proposes a time series prediction model named MSCALSTM, which extracts time series features in a new way. It extracts features through dimensional transformation, so that the feature information of each dimension is more closely aggregated, and adds a batch dimension to extract the temporal correlation between channels in different time periods more efficiently. The model integrates a multi-scale convolutional neural network (MSCNN) and a multi-scale convolutional block attention module (MSCBAM) to effectively capture multi-scale dynamic patterns in time series data and adaptively focus on key features, and uses a long short-term memory network (LSTM) to model complex temporal dependencies. This organic combination of deep neural network components makes the MSCALSTM model highly adaptable to the complex characteristics of nonlinear time series data and greatly improves its prediction accuracy. Compared with traditional time series analysis models, it excels at capturing subtle fluctuations and abnormal changes in the data, reducing error and greatly improving the accuracy and robustness of the prediction results.

To verify the accuracy of the model in time series prediction, this paper conducted extensive experiments on the PEMS dataset. The results show that the performance of the MSCALSTM model significantly exceeds that of the baseline methods, demonstrating that the model is well suited to traffic flow prediction. The complete flowchart of the computational process is shown in Fig 18.

References

  1. Yu B, Yin H, Zhu Z. Spatio-temporal graph convolutional networks: a deep learning framework for traffic forecasting. arXiv preprint, 2017. https://doi.org/10.48550/arXiv.1709.04875
  2. Cipra T, Romera R. Robust Kalman filter and its application in time series analysis. Kybernetika. 1991;27(6):481–94.
  3. Wang W, Men C, Lu W. Online prediction model based on support vector machine. Neurocomputing. 2008;71(4–6):550–8.
  4. Qiao W, Haghani A, Hamedi M. A nonparametric model for short-term travel time prediction using Bluetooth data. J Intell Transp Syst. 2012;17(2):165–75.
  5. Cui H, Radosavljevic V, Chou F-C, Lin T-H, Nguyen T, Huang T-K, et al. Multimodal trajectory predictions for autonomous driving using deep convolutional networks. In: 2019 International Conference on Robotics and Automation (ICRA), Montreal, QC, Canada. 2019, pp. 2090–6. https://doi.org/10.1109/ICRA.2019.8793868
  6. Rao GM, Ramesh D. Parallel CNN based big data visualization for traffic monitoring. J Intell Fuzzy Syst. 2020;39(3):2679–91.
  7. Ramakrishnan N, Soni T. Network traffic prediction using recurrent neural networks. In: 2018 17th IEEE International Conference on Machine Learning and Applications (ICMLA). IEEE; 2018.
  8. Li W, Tao W, Qiu J, Liu X, Zhou X, Pan Z. Densely connected convolutional networks with attention LSTM for crowd flows prediction. IEEE Access. 2019;7:140488–98.
  9. Hochreiter S, Schmidhuber J. Long short-term memory. Neural Comput. 1997;9(8):1735–80.
  10. Kwon D-H, Kim J-B, Heo J-S, Kim C-M, Han Y-H. Time series classification of cryptocurrency price trend based on a recurrent LSTM neural network. J Inf Process Syst. 2019;15(3):694–706.
  11. Chen Q, Wen D, Li X, Chen D, Lv H, Zhang J, et al. Empirical mode decomposition based long short-term memory neural network forecasting model for the short-term metro passenger flow. PLoS One. 2019;14(9):e0222365. pmid:31509599
  12. Jiang H, Jiao R, Wang Z, Zhang T, Wu L. Construction and analysis of emotion computing model based on LSTM. Complexity. 2021;2021(1):8897105.
  13. Yuan X, Chen C, Jiang M, Yuan Y. Prediction interval of wind power using parameter optimized beta distribution based LSTM model. Appl Soft Comput. 2019;82:105550.
  14. Deng D, Jing L, Yu J, Sun S. Sparse self-attention LSTM for sentiment lexicon construction. IEEE/ACM Trans Audio Speech Lang Process. 2019;27(11):1777–90.
  15. Kim T-Y, Cho S-B. Predicting residential energy consumption using CNN-LSTM neural networks. Energy. 2019;182:72–81.
  16. Song X, Yang F, Wang D, Tsui K-L. Combined CNN-LSTM network for state-of-charge estimation of lithium-ion batteries. IEEE Access. 2019;7:88894–902.
  17. Barzegar R, Aalami MT, Adamowski J. Short-term water quality variable prediction using a hybrid CNN–LSTM deep learning model. Stoch Environ Res Risk Assess. 2020;34(2):415–33.
  18. Vidal A, Kristjanpoller W. Gold volatility prediction using a CNN-LSTM approach. Expert Syst Appl. 2020;157:113481.
  19. Qin Y, et al. A dual-stage attention-based recurrent neural network for time series prediction. arXiv preprint, 2017.
  20. Hu J, Shen L, Sun G. Squeeze-and-excitation networks. In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE; 2018.
  21. He K, Zhang X, Ren S, Sun J. Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA. 2016, pp. 770–8. https://doi.org/10.1109/CVPR.2016.90
  22. Woo S, Park J, Lee JY, Kweon IS. CBAM: convolutional block attention module. Cham: Springer; 2018.
  23. Nauta M, Bucur D, Seifert C. Causal discovery with attention-based convolutional neural networks. Mach Learn Knowl Extraction. 2019;1(1):312–40.
  24. Li Y, Yu R, Shahabi C, Liu Y. Diffusion convolutional recurrent neural network: data-driven traffic forecasting. arXiv preprint, 2017. https://doi.org/10.48550/arXiv.1707.01926
  25. Ma Q, Sun W, Gao J, Ma P, Shi M. Spatio-temporal adaptive graph convolutional networks for traffic flow forecasting. IET Intell Trans Sys. 2022;17(4):691–703.
  26. Chen J, Zheng L, Hu Y, Wang W, Zhang H, Hu X. Traffic flow matrix-based graph neural network with attention mechanism for traffic flow prediction. Inf Fusion. 2024;104:102146.
  27. Jiang J, Han C, Zhao XW, Wang J. PDFormer: propagation delay-aware dynamic long-range transformer for traffic flow prediction. In: Proceedings of the AAAI Conference on Artificial Intelligence. AAAI Press; 2023.
  28. He D, Zhao J, Jin Z, Huang C, Zhang F, Wu J. Prediction of bearing remaining useful life based on a two-stage updated digital twin. Adv Eng Inf. 2025;65:103123.
  29. Han J, Zeng P. Short-term power load forecasting based on hybrid feature extraction and parallel BiLSTM network. Comput Electrical Eng. 2024;119:109631.
  30. Sergeev A, Baglaeva E, Subbotina I. Hybrid model combining LSTM with discrete wavelet transformation to predict surface methane concentration in the Arctic Island Belyy. Atmos Environ. 2024;317:120210.
  31. He D, Zhang Z, Jin Z, Zhang F, Yi C, Liao S. RTSMFFDE-HKRR: a fault diagnosis method for train bearing in noise environment. Measurement. 2025;239:115417.
  32. Polson NG, Sokolov VO. Deep learning for short-term traffic flow prediction. Transp Res Part C Emerg Technol. 2017;79:1–17.
  33. Chiu M-C, Hsu H-W, Chen K-S, Wen C-Y. A hybrid CNN-GRU based probabilistic model for load forecasting from individual household to commercial building. Energy Rep. 2023;9:94–105.
  34. Zhao L, Song Y, Zhang C, Liu Y, Wang P, Lin T, et al. T-GCN: a temporal graph convolutional network for traffic prediction. IEEE Trans Intell Transp Syst. 2020;21(9):3848–58.
  35. Peng H, Wang H, Du B, Bhuiyan MZA, Ma H, Liu J, et al. Spatial temporal incidence dynamic graph neural networks for traffic flow forecasting. Inf Sci. 2020;521:277–90.
  36. Zhang G, Bai X, Wang Y. Short-time multi-energy load forecasting method based on CNN-Seq2Seq model with attention mechanism. Mach Learn Appl. 2021;5:100064.
  37. Sameen MI, Pradhan B. Severity prediction of traffic accidents with recurrent neural networks. Appl Sci. 2017;7(6):476.
  38. Ma X, Tao Z, Wang Y, Yu H, Wang Y. Long short-term memory neural network for traffic speed prediction using remote microwave sensor data. Transp Res Part C Emerg Technol. 2015;54:187–97.
  39. Zhao Z, Chen W, Wu X, Chen PCY, Liu J. LSTM network: a deep learning approach for short-term traffic forecast. IET Intell Trans Sys. 2017;11(2):68–75.
  40. Tian Y, Zhang K, Li J, Lin X, Yang B. LSTM-based traffic flow prediction with missing data. Neurocomputing. 2018;318:297–305.
  41. Wang Z, Zhang L, Ding Z. Hybrid time-aligned and context attention for time series prediction. Knowl-Based Syst. 2020;198:105937.
  42. Zhang Y, Cheng T, Ren Y, Xie K. A novel residual graph convolution deep learning model for short-term network-based traffic forecasting. Int J Geogr Inf Sci. 2019;34(5):969–95.
  43. Xia D, Zhang M, Yan X, Bai Y, Zheng Y, Li Y, et al. A distributed WND-LSTM model on MapReduce for short-term traffic flow prediction. Neural Comput Appl. 2020;33(7):2393–410.
  44. Guo S, Lin Y, Feng N, Song C, Wan H. Attention based spatial-temporal graph convolutional networks for traffic flow forecasting. In: Proceedings of the AAAI Conference on Artificial Intelligence. 2019;33(1):922–9.
  45. Zhang X, Huang C, Xu Y, Xia L. Spatial-temporal convolutional graph attention networks for citywide traffic flow forecasting. In: Proceedings of the 29th ACM International Conference on Information & Knowledge Management. New York, NY: Association for Computing Machinery; 2020.
  46. Ali A, Zhu Y, Zakarya M. Exploiting dynamic spatio-temporal correlations for citywide traffic flow prediction using attention based neural networks. Inf Sci. 2021;577:852–70.
  47. Wang Y, Jing C, Xu S, Guo T. Attention based spatiotemporal graph attention networks for traffic flow forecasting. Inf Sci. 2022;607:869–83.
  48. Qiu Z, Zhu T, Jin Y, Sun L, Du B. A graph attention fusion network for event-driven traffic speed prediction. Inf Sci. 2023;622:405–23.
  49. 49. Zeng H, Jiang C, Lan Y, Huang X, Wang J, Yuan X. Long short-term fusion spatial-temporal graph convolutional networks for traffic flow forecasting. Electronics. 2023;12(1):238.
  50. 50. Wang L, Yu Q, Li X, Zeng H, Zhang H, Gao H. A CBAM-GAN-based method for super-resolution reconstruction of remote sensing image. IET Image Process. 2023;18(2):548–60.
  51. 51. Wang W, Tan X, Zhang P, Wang X. A CBAM based multiscale transformer fusion approach for remote sensing image change detection. IEEE J Sel Top Appl Earth Observations Remote Sensing. 2022;15:6817–25.
  52. 52. Liu J, Guan W. A summary of traffic flow forecasting methods. J Highway Transp Res Dev. 2004;21(3):82–5.
  53. 53. Williams BM, Hoel LA. Modeling and forecasting vehicular traffic flow as a seasonal ARIMA process: theoretical basis and empirical results. J Transp Eng. 2003;129(6):664–72.
  54. 54. Song C, et al. Spatial-temporal synchronous graph convolutional networks: a new framework for spatial-temporal network data forecasting. In: Proceedings of the AAAI Conference on Artificial Intelligence. 2020;34(1):914–21.
  55. 55. Bai L, Yao L, Li C, Wang X, Wang C. Adaptive graph convolutional recurrent network for traffic forecasting. In: Advances in Neural Information Processing Systems 33. 2020, pp. 17804–15.
  56. 56. Choi J, Choi H, Hwang J, Park N. Graph neural controlled differential equations for traffic forecasting. In: Proceedings of the AAAI Conference on Artificial Intelligence. AAAI Press; 2022.
  57. 57. Xiong L, Hu Z, Yuan X, Ding W, Huang X, Lan Y. Multi-scale attention graph convolutional recurrent network for traffic forecasting. Cluster Comput. 2023;27(3):3277–91.
  58. 58. Yang H, Jiang C, Song C, Deng Z, Bai X, Fan W. TARGCN: temporal attention recurrent graph convolutional neural network for traffic prediction. Complex Intell Syst. 2024;10:1–18.