Abstract
Accurate traffic flow forecasting plays a critical role in alleviating urban road congestion. Despite the success of existing models (e.g., graph-based or attention-based methods), three key limitations persist: (1) inflexible spatial dependency modeling, where static graph structures fail to adapt to dynamic traffic patterns; (2) decoupled spatiotemporal learning, where spatial and temporal correlations are processed separately, leading to information loss; and (3) limited long-term trend awareness, as traditional attention mechanisms overlook local contextual cues (e.g., rush-hour fluctuations). To address these limitations, a new model of traffic flow forecasting based on Spatiotemporal Interactive Learning and Temporal Attention (STIL-TA) is proposed. This model effectively enhances the accuracy of traffic flow predictions by jointly modeling the spatiotemporal characteristics of road networks. Specifically, STIL-TA consists of two key components: (1) an interactive learning module built upon interactive dynamic graph convolution, which adopts a divide-and-conquer strategy to synchronize interactions and share the dynamically captured spatiotemporal features across different time periods, and (2) a temporal multi-head trend-aware self-attention mechanism, which utilizes local contextual information to transform the numerical sequence, enabling the capture of dynamic temporal dependencies in traffic flow and improving long-term prediction accuracy. Experimental results on four real-world traffic datasets demonstrate that the proposed STIL-TA model outperforms existing approaches, achieving significant improvements in forecasting accuracy.
Citation: Chen L, Chen L, Wang H, Zhao J (2025) STIL-TA: A new model of traffic flow forecasting based on spatiotemporal interactive learning and temporal attention. PLoS One 20(8): e0331095. https://doi.org/10.1371/journal.pone.0331095
Editor: Jinlei Zhang, School of Systems Science, Beijing Jiaotong University, CHINA
Received: June 5, 2025; Accepted: August 11, 2025; Published: August 25, 2025
Copyright: © 2025 Chen et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: All relevant data are publicly available from the figshare repository (https://figshare.com/s/e4d9f6d9fddb01e5b6aa).
Funding: This work is supported by the Youth Science and Technology Talent Growth Project of Guizhou Provincial Education Department (No. QJJ2024272); the University Research Fund Project of Guiyang Institute of Humanities and Technology (2024rwjs04).
Competing interests: The authors have declared that no competing interests exist.
1 Introduction
As urbanization accelerates, traffic problems have become increasingly severe, particularly in large cities. The continuous growth in traffic volume has led to the overload of road traffic systems and frequent congestion. To address this challenge, governments and research institutions worldwide have actively promoted the development and application of Intelligent Transportation Systems (ITS) [1], with accurate traffic flow prediction being regarded as a core component of ITS. Traffic flow prediction not only significantly enhances transportation efficiency but also effectively alleviates traffic congestion, reduces accident rates, minimizes energy consumption, and improves environmental quality.
Spatiotemporal traffic data, generated by sensors deployed at traffic network nodes, consists of sequential traffic data recorded at fixed time intervals. These data reflect the real-time operational state of the traffic network, making them a key data source for traffic flow prediction research. However, urban traffic conditions are influenced by various factors, and traffic flow exhibits high dynamic variation and uncertainty over time, posing significant challenges to accurate prediction. Therefore, the primary task in addressing traffic flow prediction is to identify effective methods that can capture the spatiotemporal data characteristics and handle its complex uncertainties.
Currently, the main approaches to traffic flow prediction can be categorized into three types: statistical methods, machine learning methods, and deep learning methods. Statistical methods capture the temporal characteristics of traffic flow through historical data analysis, with common models including the historical average model [2], the Autoregressive Moving Average (ARMA) model [3], and the Vector Autoregression (VAR) model [4]. For instance, Hamed et al. [5] proposed the ARIMA model for traffic flow prediction based on the analysis of historical data. However, statistical methods often struggle to handle nonlinear and high-dimensional complex data, leading to limitations in their predictive capabilities. To overcome these shortcomings, machine learning models have emerged, such as k-Nearest Neighbors (KNNs) [6] and Support Vector Regression (SVR) [7]. Mathew and Rawther [8] incorporated the correlation between traffic flows into the prediction by optimizing the k-NN classifier, while Wu et al. [9] applied SVR for travel time prediction, demonstrating its strong generalization ability. Nonetheless, machine learning models are still influenced by prior knowledge and parameter selection, making it difficult to guarantee consistent prediction performance.
In recent years, deep learning models, particularly deep belief networks (DBNs) and recurrent neural networks (RNNs), have shown significant potential in handling time-series data. Huang et al. [10] introduced a more accurate traffic flow prediction method by combining DBNs with regression models. RNNs [11,12] and their variants, such as Long Short-Term Memory (LSTM) [13] networks and Gated Recurrent Units (GRUs) [14], have become the mainstream approaches for time-series data due to their exceptional memory capabilities, achieving remarkable performance, especially in traffic flow forecasting. Despite the ability of statistical and machine learning methods to capture temporal characteristics of traffic flow effectively, they still exhibit considerable limitations when faced with high-dimensional traffic networks with spatiotemporal correlations [15].
Convolutional Neural Networks (CNNs) offer a novel solution to this problem. With their strong spatial feature extraction capabilities, CNNs help capture the spatial correlations within traffic networks. The integration of CNNs with RNNs has emerged as a promising approach to effectively model the spatiotemporal dependencies of traffic flow, becoming a hot topic in current research. For instance, Shi et al. [16] proposed a Convolutional LSTM (ConvLSTM) network that incorporates CNNs for extracting spatial features of road networks and combines them with LSTM to process temporal data, thus enabling spatiotemporal dependency modeling of traffic flow. Ke et al. [17] introduced the FCL-Net model, which combines ConvLSTM, standard LSTM, and convolutional layers, allowing it to address both spatiotemporal dependencies and external influencing factors. Zhang et al. [18], based on deep residual CNNs, proposed the ST-ResNet model for pedestrian flow prediction within urban areas, which demonstrated significant effectiveness.
Furthermore, many existing studies define various adjacency matrices to represent the deep structure of traffic flow in order to capture hidden dynamic spatial features. For instance, Wu et al. [19] use GCNs, 1D CNNs, and adaptive adjacency matrices to learn hidden spatial features, while Song et al. [20] combine multiple adjacency matrices with embedded structures for traffic flow prediction. Adaptive adjacency matrices explore hidden relationships between road network nodes to improve the model’s learning of the spatial heterogeneity of traffic flow. However, because they are fixed once training ends, adaptive adjacency matrices fail to learn the time-varying dynamic relationships between graph nodes. Despite the positive progress made by existing methods in capturing spatiotemporal features, there remains a deficiency in the interactive learning ability of dynamic spatiotemporal features. This limitation restricts the model’s capacity to perceive the periodicity, trend changes, and dynamic features of traffic flow, ultimately affecting prediction accuracy.
To address this challenge, we propose a novel traffic flow prediction model, STIL-TA, which significantly enhances prediction accuracy by fully exploiting the dynamic spatiotemporal features within traffic flow time series. First, we introduce a dynamic graph convolutional network (DGCN) that leverages prior knowledge to generate dynamic graphs, thereby capturing the latent spatial features of traffic flow. By embedding DGCN into an interactive learning framework, we construct the interactive dynamic graph convolutional network (IDGCN). This network analyzes the periodicity of traffic flow, segments the sequence into time intervals, and captures deep spatiotemporal dynamic features through interactive learning between subsequences. Additionally, we propose adaptive adjacency matrices and dynamic adjacency matrices to further uncover the time-varying dynamic relationships between nodes. To better handle the nonlinear temporal characteristics of traffic flow, we introduce a temporal multi-head trend-aware self-attention mechanism (TMHTAAtt) module, which effectively perceives local context and further integrates spatiotemporal features. By considering the periodicity and dynamic properties of traffic flow, STIL-TA overcomes the limitations of existing methods in modeling spatiotemporal dependencies, effectively capturing the dynamic spatiotemporal features of traffic flow and making it well-suited for traffic flow prediction tasks.
The main contributions are summarized as follows:
- A novel spatiotemporal traffic flow prediction model, STIL-TA: This model embeds dynamic graph convolution into an interactive learning framework and introduces a new temporal multi-head trend-aware self-attention mechanism. The interactive learning structure captures spatiotemporal dependencies, while the self-attention mechanism further integrates spatiotemporal features, improving prediction accuracy.
- A dynamic graph convolutional network is proposed to capture spatiotemporal features. The network generates a fusion of adaptive adjacency matrices and learnable adjacency matrices. The former captures the heterogeneity of traffic flow time series, while the latter learns the dynamic relationships between road network nodes.
- An innovative temporal multi-head trend-aware self-attention mechanism is designed, enabling effective perception of local context and in-depth exploration of dynamic temporal relationships in traffic flow, further enhancing the model’s prediction accuracy.
- Extensive comparative experiments on four traffic datasets demonstrate that the proposed model outperforms existing baseline methods in terms of prediction performance, achieving the best predictive results.
2 Related work
2.1 Traffic flow forecasting
Traffic flow forecasting has long been a critical research area within the field of transportation information systems. As one of the key issues in traffic management and optimization, it has attracted significant attention from the academic community. Over the past several decades, researchers have developed a variety of accurate traffic flow prediction models, utilizing both traditional mathematical models and data-driven approaches. Traditional mathematical models typically rely on statistical methods to analyze historical traffic data, assuming that future traffic flow shares certain similarities with past data, thereby enabling predictions.
With the continuous advancement of computational power and the rise of artificial intelligence technologies, traffic flow forecasting has once again become a focal point of research. Although traditional autoregressive models (e.g., ARIMA, VAR) and support vector regression (SVR) have shown promising results in certain applications, they often struggle to effectively handle non-stationary or complex time-series data, resulting in suboptimal forecasting performance in many real-world scenarios. In contrast to these traditional methods, deep neural networks and their variants, such as Long Short-Term Memory networks and Gated Recurrent Units, have demonstrated superior performance in capturing temporal dependencies within traffic flow data. These models are capable of extracting features from large volumes of sequential data and learning complex patterns. However, simple RNN models are still limited in their ability to leverage the spatial information inherent in traffic data. As a result, modeling spatial dependencies within spatiotemporal data has become a major challenge. To address this issue, researchers have explored the use of CNNs to capture spatial variations in Euclidean space. However, this approach is restricted to regular grid data. More recently, research has shifted towards the use of GCNs to model non-Euclidean relationships in road networks [21]. This method has shown significant potential in simulating the spatiotemporal dependencies of traffic flow, demonstrating its effectiveness in capturing the complex interactions within transportation systems [22,23].
Recently, Zhang et al. [24] propose the Event Flow Transformer Network (EF-former), a deep learning model for multi-step passenger flow prediction in Urban Rail Transit (URT) during large-scale events, using normal and extra outflow data to predict actual outflow and identify sudden passenger flow occurrences. Qiu et al. [25] propose a Spatial–Temporal Multi-Task Learning (STMTL) framework for predicting short-term passenger inflow and outflow in Urban Rail Transit (URT) systems during holidays. The framework includes a Multi-Graph Channel Attention Network (MGCA) that extracts and integrates both static and dynamic spatial dependencies from inter-station interaction graphs. Zhang et al. [26] propose a Multi-Frequency Spatial-Temporal Graph Neural Network (MFST-GNN) for accurately predicting metro Origin-Destination (OD) demand during public health emergencies. The model leverages multiple OD demand patterns, including real-time, daily, and weekly, to capture periodic spatial-temporal features. It includes a multi-frequency temporal feature extraction module for periodic temporal features and an adaptive spatial feature extraction module for complex hidden spatial features. Zhang et al. [27] propose a unified framework named Physics-Guided Adaptive Graph Spatial-Temporal Attention Network (PAG-STAN) for predicting metro Origin-Destination (OD) demand under pandemic conditions. Specifically, PAG-STAN includes a real-time OD estimation module to estimate complete real-time OD demand matrices and a novel dynamic OD demand matrix compression module to generate dense real-time OD demand matrices.
In recent years, the application of Large Language Models (LLM) [28] in the field of spatio-temporal data analysis has made significant progress, especially in the task of traffic flow prediction, where researchers have significantly improved the prediction accuracy and interpretability of the models through techniques such as fine-tuning, dynamic modeling, and multimodal alignment. For example, Zhao et al. [28] propose a novel method named Large Language Model Enhanced Traffic Flow Predictor (LEAF) to improve traffic flow forecasting by integrating LLM. LEAF consists of two branches: one using graph structures and the other using hypergraph structures to capture different spatio-temporal relationships. These branches are pre-trained separately and generate different predictions during testing. A large language model is then employed to select the most likely prediction result. Additionally, a ranking loss is applied as the learning objective to enhance the prediction capabilities of both branches. Guo et al. [29] propose a traffic flow prediction model named xTP-LLM, which leverages LLM to generate explainable traffic predictions. The model converts multi-modal traffic data into natural language descriptions to capture complex time-series patterns and external factors. The LLM framework is fine-tuned with language-based instructions to align with spatial-temporal traffic flow data.
2.2 Graph Convolution Networks
Graph Convolutional Networks (GCNs) are an innovative technique that extends traditional convolution methods to graph-structured data. GCN approaches are generally classified into two main types: one generalizes spatial neighborhood aggregation through convolutional filters, with the key challenge being the selection of neighboring nodes. A seminal work in this area is the graph convolution method based on attention mechanisms, proposed by Veličković et al. [30]. This method enhances the flexibility and expressive power of graph convolutions by assigning different weights to neighboring nodes. The other approach extends the convolution operation to the spectral domain via Fourier transforms, thereby enabling the processing of graph-structured data. For instance, Kipf and Welling [31] further simplified this approach, proposing the Graph Convolutional Network (GCN), which optimizes the processing efficiency of graph data. With the continuous evolution of GCN technology, many improved models have emerged. For example, Li et al. [22] proposed the DCRNN model based on ChebNet, which effectively learns the spatial diffusion process of traffic flow data. Wu et al. [32] proposed an adaptive adjacency matrix that learns the graph structure and captures temporal dependencies through a data-driven approach. Additionally, Guo et al. [33] introduced a spatiotemporal attention mechanism that significantly enhances the model’s ability to learn dynamic spatiotemporal features.
The initial development of graph convolutions was marked by the work of Bruna et al. [34], who proposed a general graph convolution architecture based on the graph Laplacian operator. This laid the foundation for the field. Later, Defferrard et al. [35] introduced a Chebyshev polynomial approximation in graph theory, successfully circumventing the high computational cost associated with calculating the Laplacian eigenvectors, and achieved significant results. Building upon this, Yu et al. [36] proposed a gated graph convolutional network (GCN) specifically designed for traffic flow prediction. However, this model struggled to fully capture the dynamic spatiotemporal correlations present in traffic data.
In contrast, spatial-domain graph convolution [20] employs inductive learning to directly extract spatial features of nodes in the graph, as well as the spatial features of dynamically changing graphs. For example, Hechtlinger et al. [37] proposed the Graph CNNs model, where the convolution operation is defined as constructing neighborhoods using a random walk method and selecting a fixed number of neighboring nodes based on their expected size to build the neighborhood. Hamilton et al. [38] performed a certain number of samples on adjacent nodes and used an aggregation function to gather information from neighboring nodes to predict the node’s value. Although these methods have made certain improvements in capturing the dynamic spatiotemporal features of traffic flow, they still exhibit poor interactive learning capabilities in the spatiotemporal modules that extract dynamic spatiotemporal features. This limitation affects the ability of traffic flow prediction models to perceive the periodicity and trend variations of time series and results in insufficient capture of the dynamic spatiotemporal features of traffic flow. Chen et al. [39] propose a Time-Aware Structural Semantic Coupled Graph Network (TASSGN) that learns both structural and semantic features of graphs through a new graph learning module, captures temporal features with a self-sampling method and a time-aware graph encoder, and generates sparse graphs to capture unique node features. Jiang et al. [40] propose a recurrent network utilizing a memory network and incorporate this concept into the Meta-Graph Convolutional Recurrent Network (MegaCRN) by integrating a Meta-Graph Learner, enhanced by a Meta-Node Bank, into the GCRN encoder-decoder framework. Lai et al. [41] propose a Long-term Explicit–Implicit Spatio-Temporal Network (LEISN) for traffic flow prediction. 
This network features a Long-term Dependency Module to store hidden states from multiple previous time steps and utilizes two graph convolution-based branches to extract explicit and implicit spatial features. The fusion of all these features enables the prediction of the next state.
2.3 Attention mechanism
The attention mechanism has gained widespread application in various fields such as natural language processing, traffic flow prediction, and speech recognition due to its flexibility and efficacy in modeling complex dependencies. The core concept of attention is to identify and focus on the most relevant portions of the vast amounts of data, thereby enhancing the model’s expressive power and predictive performance. For instance, Liang et al. [42] introduced a multi-layer attention mechanism designed to model the dynamic spatiotemporal correlations between geographical sensors, effectively addressing the challenges of spatiotemporal data prediction. However, this approach often requires the training of a separate model for each time series, which can be computationally expensive. Guo et al. [33] proposed a graph convolutional network with an attention mechanism, which achieved promising results in traffic flow prediction.
In contrast to previous methods, this paper considers both the graph structure of traffic networks and the dynamic spatiotemporal characteristics of traffic data. We propose a novel temporal multi-head trend-aware self-attention mechanism, which not only captures the dynamic dependencies within the traffic network but also generates more effective feature representations. This, in turn, provides enhanced modeling capabilities for more accurate traffic flow prediction.
3 Methodology
3.1 Problem definition
The traffic network can be represented as a weighted directed graph $G = (V, E)$, where $V$ denotes the set of $N$ nodes (representing the sensors in the traffic network), and $E$ is the set of edges connecting the nodes in $V$ (indicating the strength of connections between nodes). The graph $G$ is represented by the spatial adjacency matrix $A \in \mathbb{R}^{N \times N}$, where $A_{ij}$ denotes the proximity (measured by the similarity of node features) or distance (measured by the Euclidean distance between road sensors) between nodes $v_i$ and $v_j$. Specifically, if $v_i, v_j \in V$ and $(v_i, v_j) \in E$, then $A_{ij} > 0$; otherwise, $A_{ij} = 0$. The traffic state at time step $t$ is represented as a graph signal $X_t \in \mathbb{R}^{N \times C}$ on the traffic network $G$, where each element corresponds to the $C$ traffic features (e.g., speed, flow) observed by the respective sensors. Traffic flow prediction is a typical time-series forecasting task. Formally, given the velocity observations at $N$ nodes over the past $T$ time steps, represented as $X = (X_{t-T+1}, \ldots, X_t) \in \mathbb{R}^{T \times N \times C}$, the objective is to predict the traffic flow velocities at all nodes over the subsequent $T'$ time steps, denoted as $Y = (X_{t+1}, \ldots, X_{t+T'})$. Based on this notation, the traffic prediction problem is generally defined as

$$Y = f(X; G),$$

where $f$ is the function to be learned.
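To make the notation concrete, the following minimal Python sketch illustrates the tensor shapes involved; the sizes and the placeholder forecaster are illustrative assumptions, not the paper's configuration:

```python
import numpy as np

# Illustrative sizes (hypothetical; not the paper's configuration)
N, C, T, T_out = 207, 2, 12, 12   # nodes, features per node, input/output horizon

X = np.random.rand(T, N, C)       # past T graph signals X_{t-T+1}, ..., X_t
A = np.random.rand(N, N)          # spatial adjacency matrix of G

def f(X, A):
    """Placeholder forecaster: repeats the last observed frame T_out times."""
    return np.repeat(X[-1:], T_out, axis=0)

Y = f(X, A)                       # predicted signals X_{t+1}, ..., X_{t+T'}
assert Y.shape == (T_out, N, C)
```

Any learned model, STIL-TA included, is a more expressive instance of the mapping `f` from the past window and the graph to the future window.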
3.2 Framework of STIL-TA
This paper proposes a novel approach, STIL-TA, designed to simultaneously capture the dynamic spatiotemporal correlations of traffic flow. The overall framework of the model is illustrated in Fig 1, which primarily consists of an interactive dynamic graph convolutional network (IDGCN), a concatenation fusion module, and a temporal multi-head trend-aware self-attention (TMHTAAtt) mechanism. First, the raw data is fed into the Linear layer, which generates a high-dimensional spatial representation of the data to capture deeper spatiotemporal dependencies. Next, based on dynamic graph convolutional networks (DGCN), the IDGCN further processes the features extracted by the start convolution layer through an interactive learning strategy. Specifically, during the partitioning phase, the input data is split into two equal-length subsequences (each having half the length of the original sequence) via an interleaved sampling recursive method. These two subsequences are then subjected to interactive learning in the IDGCN, where the model shares the features learned by each subsequence. By embedding the DGCN into the interactive learning structure, the model not only captures temporal dependencies but also effectively learns the dynamic spatial characteristics of traffic flow. Once the spatiotemporal features are extracted by the IDGCN, the model outputs the two subsequences. Subsequently, these subsequences are reordered based on their temporal indices and are input into the diffusion graph convolution layer through the concatenation fusion module, further extracting the global dynamic spatiotemporal features of the traffic flow. Finally, the captured dynamic spatiotemporal features are passed to the TMHTAAtt mechanism and a multilayer perceptron layer for the final traffic flow prediction.
3.3 Interactive Learning (IL)
The interactive learning framework consists of three identical interactive dynamic graph convolutional network (IDGCN) modules, with the core component being the IDGCN structure, as illustrated in Fig 2a. The mechanism of the interactive learning framework is to capture the respective dynamic spatio-temporal features through the interactive learning of two subsequences. Each subsequence is first preprocessed by a convolution operation, and then these two subsequences share parameter weights within the IDGCN, which allows them to efficiently capture the dynamic spatio-temporal dependencies between them. This design allows the model to flexibly adjust its learning focus over different time periods, better adapting to the nonlinear and nonsmooth characteristics of time series.
The mathematical formulation of the interleaved sampling recursive method is as follows. Given an input sequence $X \in \mathbb{R}^{T \times N \times C}$ ($T$ time steps, $N$ nodes, $C$ features), the sampling process is

$$X_{odd} = \{X_1, X_3, X_5, \ldots\}, \qquad X_{even} = \{X_2, X_4, X_6, \ldots\},$$

where $X_{odd}, X_{even} \in \mathbb{R}^{(T/2) \times N \times C}$. This non-overlapping division preserves the integrity of the original sequence while enabling the two subsequences to capture complementary spatiotemporal patterns through interactive learning.
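The interleaved split itself amounts to strided indexing; a small numpy sketch (array sizes are illustrative):

```python
import numpy as np

def interleave_split(x):
    """Split a sequence of shape (T, N, C) into the odd- and even-indexed
    subsequences, each of length T/2."""
    return x[0::2], x[1::2]

x = np.arange(8 * 3 * 2, dtype=float).reshape(8, 3, 2)  # T=8, N=3, C=2
x_odd, x_even = interleave_split(x)
assert x_odd.shape == x_even.shape == (4, 3, 2)

# The division is non-overlapping: re-merging by time index recovers the original
merged = np.empty_like(x)
merged[0::2], merged[1::2] = x_odd, x_even
assert np.array_equal(merged, x)
```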
Let the input to IDGCN be denoted as $X$, which is interleave-sampled to obtain two subsequences (the odd and even sequences), $X_{odd}$ and $X_{even}$. The first stage of interactive learning produces outputs $X'_{odd}$ and $X'_{even}$; through further interaction, the final output sequences are $X^{out}_{odd}$ and $X^{out}_{even}$. The specific operations in IDGCN are as follows:

$$X'_{odd} = X_{odd} \odot \sigma\big(\mathrm{DGCN}(\phi(X_{even}))\big), \qquad X'_{even} = X_{even} \odot \sigma\big(\mathrm{DGCN}(\psi(X_{odd}))\big),$$

$$X^{out}_{odd} = X'_{odd} + \sigma\big(\mathrm{DGCN}(\rho(X'_{even}))\big), \qquad X^{out}_{even} = X'_{even} + \sigma\big(\mathrm{DGCN}(\eta(X'_{odd}))\big),$$

where $\phi$, $\psi$, $\rho$, and $\eta$ represent 1D convolution operations, $\sigma$ denotes the activation function, $\odot$ denotes the Hadamard product, and $\mathrm{DGCN}$ refers to the dynamic graph convolution in IDGCN.
Because the DGCN parameters are shared by the odd and even subsequences, their gradient updates are coupled: when the gradient ratio between the two branches falls out of range, the anomalous gradient is automatically attenuated by a gating factor. Interleaved sampling also guarantees that both subsequences come from the same data stream and have the same shape, so the shared DGCN must accommodate both the odd and even subsequence distributions, and its parameters converge to the “intersection space” of the two substreams.
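The exchange pattern between the two branches can be sketched schematically in numpy. Here the shared DGCN is replaced by a single node-mixing matrix and the four 1D convolutions are omitted, so this is only an assumed simplification of the interactive step, not the model's actual implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
T, N, C = 8, 5, 4
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

# Stand-in for the shared dynamic graph convolution: one node-mixing matrix
# whose weights are shared by both the odd and even branches.
W_shared = rng.standard_normal((N, N)) * 0.1
dgcn = lambda h: np.einsum('ij,tjc->tic', W_shared, h)

x = rng.standard_normal((T, N, C))
x_odd, x_even = x[0::2], x[1::2]          # interleaved split

# Stage 1: each branch is modulated by features learned from the *other* branch
h_odd  = x_odd  * sigmoid(dgcn(x_even))
h_even = x_even * sigmoid(dgcn(x_odd))

# Stage 2: a further exchange refines both branches
out_odd  = h_odd  + np.tanh(dgcn(h_even))
out_even = h_even + np.tanh(dgcn(h_odd))
assert out_odd.shape == out_even.shape == (T // 2, N, C)
```

Because `dgcn` is shared, gradients flowing from either branch would update the same `W_shared`, which is the source of the coupled updates described above.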
Compared with many existing models, the interactive learning framework offers the following improvements and unique advantages: first, it captures spatio-temporal dependencies more effectively by processing two subsequences in parallel. Second, the model is able to adjust the learning focus according to different time periods, better adapting to the dynamic changes of time series. Third, by sharing parameter weights, the model is able to integrate information from different time steps more effectively and improve learning efficiency. Finally, the model is able to better identify and utilize the periodic features in the time series while adapting to the continuous time pattern through the interactive learning mechanism.
3.4 Dynamic Graph Convolution Network (DGCN)
The dynamic graph convolutional network (DGCN) primarily consists of the diffusion graph convolution network and the graph generation module, as illustrated in Fig 2b. By leveraging the diffusion graph convolution and the graph generation module, DGCN effectively captures deeper dynamic spatial features, thereby enhancing the ability of STIL-TA to capture spatial heterogeneity. DGCN feeds the hidden feature matrix $H$ and a predefined initial adjacency matrix $A_{pre}$ into the diffusion graph convolution network, followed by feeding the output to the graph generator to produce a discrete matrix $A_{dyn}$ containing spatiotemporal information. The representation is as follows:

$$A_{dyn} = \mathrm{Generator}\big(\mathrm{MLP}\big(\Theta_{\star\mathcal{G}}(H, A_{pre})\big)\big),$$

where $\Theta_{\star\mathcal{G}}$ and $\mathrm{Generator}$ represent the diffusion convolution and graph generator operations, and $\mathrm{MLP}$ denotes a multilayer perceptron.
To ensure differentiability during training, the STIL-TA model employs the Gumbel reparameterization:

$$A_{dyn} = \mathrm{softmax}\big((\log \tilde{A} + g)/\tau\big), \qquad g = -\log(-\log u), \quad u \sim \mathrm{Uniform}(0, 1),$$

where $g$ denotes a random variable drawn from the Gumbel distribution, $\tau$ is the temperature, set to 0.5, and $A_{dyn}$ is the adjacency matrix generated by the graph generator to model the dynamic dependencies between nodes. The temperature parameter $\tau$ follows an exponential decay strategy (initial value 0.5, 15% decay every 10 epochs) in order to progressively strengthen the discretization. The random variable $g$ is generated by inverse transform sampling, which keeps the gradient computable during backpropagation.
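A minimal numpy sketch of the Gumbel reparameterization as described; the matrix size and input scores are illustrative, and a deep learning framework would carry gradients through the same arithmetic:

```python
import numpy as np

def gumbel_softmax(logits, tau=0.5, rng=None):
    """Near-discrete sampling via the Gumbel trick: g = -log(-log u),
    u ~ Uniform(0, 1), followed by a temperature-scaled row-wise softmax."""
    rng = rng or np.random.default_rng(0)
    u = rng.uniform(1e-9, 1.0, size=logits.shape)
    g = -np.log(-np.log(u))                     # Gumbel noise by inverse transform
    y = (logits + g) / tau
    y = np.exp(y - y.max(axis=-1, keepdims=True))
    return y / y.sum(axis=-1, keepdims=True)

logits = np.log(np.random.rand(4, 4) + 1e-9)    # scores for a 4-node graph
A_dyn = gumbel_softmax(logits, tau=0.5)
assert np.allclose(A_dyn.sum(axis=-1), 1.0)     # each row is a distribution
```

Lowering `tau` sharpens each row toward one-hot, which matches the decay strategy of progressively strengthening the discretization.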
Furthermore, an adaptive adjacency matrix $A_{adp}$, which does not require any prior knowledge, is constructed as follows:

$$A_{adp} = \mathrm{softmax}\big(\mathrm{ReLU}(E_1 E_2^{\top})\big),$$

where $E_1$ and $E_2$ represent learnable node embedding parameters, and the initial value of $A_{adp}$ is based on the adjacency matrix $A_{pre}$ derived from the original graph data.
DGCN uses an adaptive fusion module to merge $A_{adp}$ and $A_{dyn}$, producing a dynamic adjacency matrix $A_{fusion}$, which is then fed into the diffusion graph convolution network to extract hidden dynamic spatiotemporal correlations in traffic roads. The operation of this fusion module is as follows:

$$A_{fusion} = \lambda A_{adp} + (1 - \lambda) A_{dyn},$$

where $\lambda$ is a learnable adaptive parameter factor.
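The adaptive matrix and the fusion step can be sketched together in numpy; the embedding dimension, the softmax-over-ReLU construction of the adaptive matrix, and the sigmoid squashing of the fusion factor are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
N, d = 5, 8

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Adaptive adjacency from two learnable node embeddings (no prior knowledge)
E1, E2 = rng.standard_normal((N, d)), rng.standard_normal((N, d))
A_adp = softmax(np.maximum(E1 @ E2.T, 0.0))     # softmax(ReLU(E1 E2^T))

# A_dyn would come from the graph generator; random rows here for illustration
A_dyn = softmax(rng.standard_normal((N, N)))

# Adaptive fusion with a learnable scalar factor; sigmoid keeps it in (0, 1)
lam = 1.0 / (1.0 + np.exp(-0.3))
A_fusion = lam * A_adp + (1.0 - lam) * A_dyn
assert np.allclose(A_fusion.sum(axis=-1), 1.0)  # convex mix of row-stochastic matrices
```

Because both inputs are row-stochastic, any convex combination of them remains a valid transition matrix for the subsequent diffusion convolution.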
By gradient descent optimization with L2 regularization (coefficient $\beta$), the update of $\lambda$ can be expressed as:

$$\lambda \leftarrow \lambda - \eta\left(\frac{\partial \mathcal{L}}{\partial \lambda} + 2\beta\lambda\right),$$

where $\mathcal{L}$ is the loss function and $\eta$ is the learning rate.
In the graph generator network, the fusion graph convolution, and the concatenation fusion module, diffusion graph convolutions are applied, and the input to the diffusion graph convolution is uniformly defined as $H$.
In the graph generator network, the diffusion graph convolution is defined as:

$$Z = \sum_{k=0}^{K} A_{dyn}^{\,k} H W_{k},$$

where $k$ represents the diffusion step, $K$ is the maximum diffusion step, and $W_{k}$ denotes the parameter matrix.
In the fusion graph convolution module, $A_{fusion}$ is the input adjacency matrix, and the diffusion graph convolution is represented as:

$$Z = \sum_{k=0}^{K} A_{fusion}^{\,k} H W_{k}.$$
In the concatenation module, the dynamic spatiotemporal features extracted by the IL module are recombined in temporal order and fed into the diffusion graph convolution layer to capture and correct features across the entire time series. Additionally, the forward and backward transition matrices $A_{f}$ and $A_{b}$ of the initial adjacency matrix $A_{pre}$ are used. The diffusion graph convolution in the concatenation fusion module is represented as:

$$Z = \sum_{k=0}^{K} \big(A_{f}^{\,k} H W_{k,1} + A_{b}^{\,k} H W_{k,2}\big).$$
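A compact numpy sketch of a $K$-step diffusion convolution of this form; a single transition matrix is used for brevity, and all sizes are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
N, C_in, C_out, K = 5, 4, 8, 2

def diffusion_conv(H, A, Ws):
    """Z = sum_{k=0..K} A^k H W_k : K-step feature diffusion over the graph."""
    Z = np.zeros((H.shape[0], Ws[0].shape[1]))
    P = np.eye(A.shape[0])            # A^0 = identity
    for W_k in Ws:
        Z += P @ H @ W_k              # aggregate k-hop neighborhood features
        P = P @ A                     # advance one diffusion step
    return Z

H = rng.standard_normal((N, C_in))                     # node features
A = rng.random((N, N)); A /= A.sum(1, keepdims=True)   # row-normalized transitions
Ws = [rng.standard_normal((C_in, C_out)) * 0.1 for _ in range(K + 1)]
Z = diffusion_conv(H, A, Ws)
assert Z.shape == (N, C_out)
```

The bidirectional variant in the concatenation module simply runs this once with the forward transition matrix and once with the backward one, each with its own weights, and sums the results.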
3.5 Temporal multi-head trend-aware self-attention
To capture the complexity and trend of traffic flow, we propose a temporal multi-head trend-aware self-attention (TMHTAAtt) mechanism that incorporates local contextual information. Traditional self-attention is a specific implementation of the attention mechanism, where queries, keys, and values are derived from the same symbol representation sequence. The multi-head self-attention mechanism is the most widely used variant in practice, as it enables simultaneous attention to information from multiple representation subspaces. The fundamental operations in multi-head self-attention are as follows:
$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{QK^{\top}}{\sqrt{d_{k}}}\right)V,$$

where $Q$ represents the query, $K$ represents the key, $V$ represents the value, and $d_{k}$ is the key dimension.
In multi-head self-attention, the query, keys, and values are initially projected into separate subspaces. The attention functions are then executed in parallel. The resulting outputs are concatenated and further projected to obtain the final output, represented as follows:
\[ \mathrm{MultiHead}(Q, K, V) = \mathrm{Concat}(\mathrm{head}_{1}, \ldots, \mathrm{head}_{h}) W^{O}, \quad \mathrm{head}_{i} = \mathrm{Attention}(Q W_{i}^{Q}, K W_{i}^{K}, V W_{i}^{V}) \]
where \(h\) represents the number of attention heads; \(W_{i}^{Q}\), \(W_{i}^{K}\), and \(W_{i}^{V}\) are the projection matrices applied to \(Q\), \(K\), and \(V\), respectively; and \(W^{O}\) represents the final output projection matrix. Multi-head self-attention offers a flexible approach to capturing complex correlation dynamics in traffic data, leading to accurate long-term forecasting.
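The projection, per-head attention, and concatenation steps above can be sketched in a few lines of numpy (a minimal single-sequence version; the function signature and shapes are our illustration):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))  # numerically stable softmax
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(X, W_q, W_k, W_v, W_o, h):
    """Multi-head scaled dot-product self-attention over a sequence X of shape (T, d_model)."""
    T, d = X.shape
    d_h = d // h                          # per-head subspace dimension
    Q, K, V = X @ W_q, X @ W_k, X @ W_v   # project into query/key/value spaces
    heads = []
    for i in range(h):                    # attention in each representation subspace
        s = slice(i * d_h, (i + 1) * d_h)
        scores = Q[:, s] @ K[:, s].T / np.sqrt(d_h)
        heads.append(softmax(scores) @ V[:, s])
    return np.concatenate(heads, axis=-1) @ W_o  # concatenate heads, final projection
```

Because each head attends in its own subspace, the mechanism can track several distinct temporal correlation patterns simultaneously.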
However, the multi-head self-attention mechanism was initially designed to handle discrete tokens and does not account for the inherent local trend information in continuous data. Therefore, directly applying it to traffic signal sequence transformation may lead to mismatching issues. TMHTAAtt addresses the local trend unawareness problem of traditional multi-head self-attention in numerical data prediction. This mechanism is a variant of convolutional self-attention. By employing convolutional operations, which compute representations based on local context as input, the model is able to capture the underlying local variation trends present in traffic data. Formally, the TMHTAAtt mechanism is defined as follows:
\[ Q = \Phi_{Q} * X, \qquad K = \Phi_{K} * X \]
where \(*\) represents the convolution operation, while \(\Phi_{Q}\) and \(\Phi_{K}\) represent the parameters of the convolution kernels.
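The idea can be illustrated with a short numpy sketch of convolutional self-attention, in which queries and keys are computed by causal 1D convolutions over local context rather than pointwise projections (the causal left-padding, function names, and the linear value map are our assumptions for illustration, not the paper's exact design):

```python
import numpy as np

def causal_conv1d(X, W):
    """Causal 1D convolution along time. X: (T, d); W: (k, d, d_out)."""
    k = W.shape[0]
    Xp = np.vstack([np.zeros((k - 1, X.shape[1])), X])  # left-pad: no future leakage
    return np.stack([np.einsum('kd,kde->e', Xp[t:t + k], W)
                     for t in range(X.shape[0])])

def trend_aware_attention(X, W_q, W_k, W_v):
    """Queries/keys summarise local trends via convolution; values use a linear map."""
    Q = causal_conv1d(X, W_q)            # (T, d_out), each row sees a local window
    K = causal_conv1d(X, W_k)
    V = X @ W_v
    scores = Q @ K.T / np.sqrt(Q.shape[1])
    A = np.exp(scores - scores.max(axis=-1, keepdims=True))
    A /= A.sum(axis=-1, keepdims=True)   # softmax over time steps
    return A @ V
```

Because each query/key now encodes a window of recent values, two time steps match when their local trends (e.g., a rush-hour ramp-up) are similar, not merely when their point values coincide.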
3.6 Other components
Huber loss [43] is a widely used loss function that combines the mean squared error (MSE) and the linear term of absolute error. It behaves like squared loss when the predicted values are close to the true values and like absolute loss when the predicted values are far from the true values. This characteristic allows Huber loss to effectively mitigate the impact of outliers during model training while maintaining stable performance. Consequently, Huber loss is employed as the loss function in the optimization process of this study.
\[ L_{\delta}(y, \hat{y}) = \begin{cases} \dfrac{1}{2}(y - \hat{y})^{2}, & |y - \hat{y}| \le \delta \\[4pt] \delta |y - \hat{y}| - \dfrac{1}{2}\delta^{2}, & \text{otherwise} \end{cases} \]
where \(\delta\) is the hyperparameter that balances the squared error and absolute error terms, and \(\hat{y}\) and \(y\) are the predicted and true values, respectively.
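A compact numpy version of the piecewise definition (a sketch assuming the standard Huber formulation with threshold `delta`):

```python
import numpy as np

def huber_loss(y_pred, y_true, delta=1.0):
    """Quadratic near zero, linear in the tails; delta balances the two regimes."""
    r = np.abs(y_pred - y_true)
    quad = 0.5 * r ** 2                       # MSE-like region, |r| <= delta
    lin = delta * r - 0.5 * delta ** 2        # MAE-like region, robust to outliers
    return np.where(r <= delta, quad, lin).mean()
```

For a residual of 0.5 with `delta=1.0` the loss is the quadratic 0.125, while a residual of 2.0 falls in the linear regime and contributes 1.5, growing only linearly with the outlier's magnitude.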
4 Experiment
4.1 Datasets
The predictive performance of the STIL-TA model was evaluated on four publicly available traffic datasets: METR-LA, PEMS-BAY, PEMS04, and PEMS08. METR-LA consists of traffic speed statistics recorded by 207 sensors located across highways in Los Angeles County over a four-month period. PEMS-BAY comprises traffic speed data collected by 325 sensors deployed along roadways in the San Francisco Bay Area over a six-month span. Both datasets include information on the sensor locations, the dates of data collection, and the types of data recorded. PEMS04 and PEMS08 are real-world traffic datasets collected in real time every 30 seconds by the California Department of Transportation’s Performance Measurement System (PeMS). A detailed description of the experimental datasets is provided in Table 1.
4.2 Parameter setting
In the experiment, historical traffic data from the past hour (\(T = 12\) time steps) were used to predict the traffic flow for the next 60 minutes (\(T' = 12\) time steps). The entire dataset was split into training, validation, and test sets in a 6:2:2 ratio, maintaining chronological order. Traffic flow was predicted at 15-, 30-, and 60-minute horizons. The batch size was set to 64, the learning rate to 0.001, dropout to 0.3, the maximum number of epochs to 500, and weight decay to 0.0001; the Python version was 3.6.2. All models were trained for 200 epochs using the Adam optimizer. All experiments were performed on a 22 vCPU AMD EPYC 7T83 64-core processor with an RTX 4090 GPU card. Model performance was evaluated using three metrics: mean absolute error (MAE), root mean square error (RMSE), and mean absolute percentage error (MAPE), as follows:
\[ \mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| \hat{y}_{i} - y_{i} \right|, \qquad \mathrm{RMSE} = \sqrt{ \frac{1}{n} \sum_{i=1}^{n} \left( \hat{y}_{i} - y_{i} \right)^{2} }, \qquad \mathrm{MAPE} = \frac{100\%}{n} \sum_{i=1}^{n} \left| \frac{\hat{y}_{i} - y_{i}}{y_{i}} \right| \]
where \(n\) represents the number of samples, and \(\hat{y}_{i}\) and \(y_{i}\) denote the predicted and ground truth values for the \(i\)-th sample, respectively. The smaller the values of MAE, RMSE, and MAPE, the better the performance of the STIL-TA model in traffic flow prediction.
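The three metrics can be sketched directly from their definitions (a minimal numpy version; the small `eps` guard against zero true values is our addition):

```python
import numpy as np

def mae(y_pred, y_true):
    """Mean absolute error."""
    return np.mean(np.abs(y_pred - y_true))

def rmse(y_pred, y_true):
    """Root mean square error."""
    return np.sqrt(np.mean((y_pred - y_true) ** 2))

def mape(y_pred, y_true, eps=1e-8):
    """Mean absolute percentage error, in percent; eps avoids division by zero."""
    return 100.0 * np.mean(np.abs((y_pred - y_true) / np.maximum(np.abs(y_true), eps)))
```

For example, with ground truth `[1, 2, 4]` and predictions `[1, 3, 2]`, MAE is 1.0 and MAPE is about 33.3%.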
4.3 Baselines
To evaluate the performance of STIL-TA, we compare it with the following models:
- ● HA [2]: The Historical Average model, which predicts traffic flow based on the historical average traffic data.
- ● VAR [4]: The Vector Auto-Regression model, a statistical model that captures the linear relationships among multiple time series.
- ● SVR [7]: Support Vector Regression (SVR) uses a linear support vector machine to model the relationship between input features and the output traffic flow, thereby making predictions.
- ● ARIMA [3]: The Auto-Regressive Integrated Moving Average model, combined with the Kalman filter, is employed for time series forecasting.
- ● FNN [23]: A Feedforward Neural Network with two hidden layers and L2 regularization, used for learning complex patterns in the traffic data.
- ● FC-LSTM [23]: A Recurrent Neural Network architecture that integrates Fully Connected Long Short-Term Memory (LSTM) units for capturing temporal dependencies in traffic flow data.
- ● DCRNN [22]: The Diffusion Convolutional Recurrent Neural Network (DCRNN) leverages diffusion convolutional networks to learn spatial information, coupled with a sequence-to-sequence model to capture temporal dynamics.
- ● STGCN [36]: The Spatio-Temporal Graph Convolutional Network (STGCN) combines graph convolution with 1D convolution to model both spatial and temporal dependencies in traffic flow.
- ● ASTGCN [33]: The Attention-based Spatio-Temporal Graph Convolutional Network (ASTGCN), which employs an attention mechanism to enhance the learning of spatio-temporal relationships in traffic data.
- ● STSGCN [20]: The Spatial-Temporal Synchronous Graph Convolutional Network (STSGCN), which captures both spatial and temporal characteristics by stacking multiple local Graph Convolutional Network (GCN) layers along the temporal dimension.
- ● T-GCN [44]: The T-GCN framework integrates Graph Convolutional Networks (GCNs) with Gated Recurrent Units (GRUs), where GCNs learn the spatial topology of the road network and GRUs capture the temporal dependencies in traffic data.
- ● Graph WaveNet [32]: Graph WaveNet constructs adaptive adjacency matrices that preserve implicit spatial relationships while also designing a framework for efficiently capturing spatio-temporal dependencies through the fusion of dilated causal convolutions with graph convolutions.
- ● MRA-BGCN [45]: The Multi-Scale Residual Attention-based Bi-directional Graph Convolution Network (MRA-BGCN) builds node graphs based on road network distances and edge graphs based on edge interaction patterns. It models node and edge correlations separately using bicomponent graph convolution.
- ● TASSGN [39]: A time-aware structural semantic coupled graph network that learns both structural and semantic features of graphs through a new graph learning module, employs a self-sampling method and a time-aware graph encoder to capture temporal features, and generates sparse graphs to capture node-specific features.
- ● MegaCRN [40]: The Meta-Graph Convolutional Recurrent Network, a recurrent model that incorporates a memory network by integrating a Meta-Graph Learner, enhanced by a Meta-Node Bank, into the GCRN encoder-decoder framework.
- ● LEISN-ED [41]: A Long-term Explicit–Implicit Spatio-Temporal Network (LEISN) for traffic flow prediction. It includes a long-term dependency module that stores hidden states from multiple previous time steps, and two graph convolution-based branches that extract explicit and implicit spatial features; all features are fused to predict the next state.
4.4 Experimental results
The performance of the STIL-TA model was compared against 16 commonly used baseline models in forecasting tasks over 15-, 30-, and 60-minute horizons. The experimental results shown in Table 2 and Table 3 indicate that the proposed STIL-TA model achieved near-optimal predictive performance on the four datasets. For instance, in the 15-minute and 60-minute prediction tasks, STIL-TA outperformed the state-of-the-art MegaCRN model by 5.42% and 6.77% in MAE and by 0.37% and 4.19% in RMSE, respectively. Compared to the best baseline model, STIL-TA also demonstrated improvements at the other prediction horizons.
It is evident that statistical methods (HA, VAR, ARIMA) and traditional machine learning models such as SVR and FC-LSTM did not perform as well, consistently falling short of the deep learning-based approaches. SVR and FC-LSTM only consider temporal features, failing to effectively capture spatial dependencies, which are crucial for spatiotemporal traffic forecasting. Consequently, their predictive performance was suboptimal. Although the VAR model can represent spatial and temporal correlations across different time series, its ability to model nonlinear and dynamic spatiotemporal dependencies is limited, leading to poor performance in spatiotemporal traffic forecasting tasks.
GCN based models, on the other hand, are capable of handling non-Euclidean traffic data and effectively capturing hidden relationships between road network nodes. Therefore, spatiotemporal GCN models such as STGCN and STSGCN performed well in the experiments. Despite STSGCN’s ability to simultaneously capture spatiotemporal features, it uses a simple sliding window to model temporal dependencies, neglecting the fine-grained temporal patterns, which resulted in suboptimal performance. Attention-based models, such as ASTGCN, performed relatively well due to their flexibility in capturing temporal dependencies within the sequence.
RNN based methods are limited in their ability to capture long-term temporal dependencies. In contrast, the STIL-TA model significantly outperformed RNN based models, especially in long-term prediction tasks. STGCN and Graph WaveNet are two classic CNN based spatiotemporal models. They model temporal correlations using 1D CNNs or TCNs along the time dimension, employing small convolutional kernels to capture local features. However, these models face challenges in long-term predictions due to their inability to effectively capture long-range temporal information. Compared to Graph WaveNet, our method demonstrated more accurate long-term predictions, while the short-term performance was similar. This can be attributed to the fact that spatial and temporal dependencies are often more stable in short-term predictions, and Graph WaveNet lacks an attention mechanism to further exploit spatiotemporal features. In contrast, STIL-TA leverages attention mechanisms to focus on relevant information from each time slice in a data-driven manner, effectively capturing long-term temporal dependencies.
In general, as the prediction horizon increases, model performance tends to be influenced by more complex and uncertain factors. However, the STIL-TA model exhibited only a marginal decline in performance for long-term predictions, indicating its robustness in handling complex situations, particularly in long-term forecasting tasks. The experimental results validate the effectiveness of STIL-TA in capturing dynamic spatial dependencies and long-term temporal dependencies. Furthermore, they demonstrate that STIL-TA can uncover hidden dynamic associations between road network nodes, thus effectively capturing spatial correlations. Although the difficulty of prediction tasks increases with the forecast horizon, as shown in Table 2, STIL-TA still outperformed other models in long-term predictions, further validating the effectiveness of STIL-TA’s interactive learning strategy.
Based on the experimental results in Table 3, we can analyze the performance of different traffic flow prediction models on the PEMS04 and PEMS08 datasets. First, the STIL-TA model performs well at all prediction intervals (15, 30, and 60 minutes) on both datasets, and its MAE, RMSE, and MAPE are significantly lower than those of the other models, demonstrating its superiority in capturing the spatio-temporal characteristics of traffic flow. On the PEMS08 dataset, the STIL-TA model also performs best at all prediction intervals; for example, at the 15-minute interval, it achieves the lowest MAE of 13.86, RMSE of 22.04, and MAPE of 8.88%. These results indicate that the STIL-TA model not only performs well in short-term forecasting but also maintains high accuracy and stability in long-term forecasting. In contrast, traditional methods such as HA, VAR, and SVR underperform STIL-TA at all prediction intervals and on both datasets. Even some deep learning-based models, such as Graph WaveNet and ASTGCN, although stronger in certain cases, remain inferior to the STIL-TA model overall.
In summary, the STIL-TA model effectively improves the accuracy of traffic flow prediction through its interactive learning and temporal attention mechanisms, and especially performs well in dealing with complex spatio-temporal dependencies. These results indicate that the STIL-TA model has great potential for application in practical traffic management and can provide more reliable support for traffic flow prediction.
To provide a clearer illustration of the advantages of the STIL-TA model, we visualized the experimental results of STIL-TA compared to FNN, FC-LSTM, Graph WaveNet, and STGCN on the PEMS-BAY dataset, as shown in Fig 3, Fig 4 and Fig 5. The results clearly demonstrate that STIL-TA significantly outperforms FNN, FC-LSTM, Graph WaveNet, and STGCN in terms of prediction performance. This indicates that the proposed model is more effective at capturing the dynamic spatiotemporal characteristics of traffic flow. Further analysis reveals that as the prediction horizon increases, the growth of prediction errors remains relatively small. When the prediction horizon exceeds 15 minutes, the prediction error of STIL-TA consistently remains significantly lower than that of the other models, further validating the superior performance of this model in long-term predictions.
4.5 Ablation study
To further investigate the performance of various modules within the STIL-TA model, this study designed seven variants of the STIL-TA model. The performance of these variants was evaluated based on experiments conducted on the METR-LA and PEMS-BAY datasets. Specifically, we computed and visualized the performance of these seven variants in terms of MAE, RMSE, and MAPE, as shown in Fig 6, Fig 7 and Fig 8. The differences between each variant and the original STIL-TA model are as follows:
- ● w/o GCN: This variant is based on STIL-TA but removes the diffusion GCNs module.
- ● w/o DGCN: This variant replaces the DGCN module in STIL-TA with a standard diffusion GCN, using a predefined initial adjacency matrix as the input to the GCN.
- ● w/o Conv: This variant is based on STIL-TA but eliminates the 1D convolution module from the interactive learning structure.
- ● w/o Interaction: This variant replaces the interactive learning structure in STIL-TA with a temporal convolutional network (TCN) module, which is concatenated with a dynamic convolution module. The TCN consists of 6 layers, with 64 feature channels.
- ● w/o Apt Adj: This variant is based on STIL-TA but removes the adaptive adjacency matrix from the DGCN module, replacing it with a predefined initial adjacency matrix as the input to the graph generator.
- ● w/o Learned Adj: This variant is based on STIL-TA but removes the graph generator structure, retaining the adaptive adjacency matrix while replacing the fusion GCN in the DGCN module with a diffusion GCN.
- ● w/o TMHTAAtt: This variant is based on STIL-TA but removes the temporal multi-head trend-aware self-attention (TMHTAAtt) module.
The graph convolutional network plays a crucial role in the STIL-TA model. Furthermore, the proposed IDGCN and TMHTAAtt modules are also critical to enhancing the overall model performance. Specifically, the 1D convolution is essential for expanding the receptive field and serves as a core component in the interactive learning structure. The results of the ablation study indicate that 1D convolution significantly improves the model’s performance.
Moreover, an ablation analysis was performed on the two adjacency matrices defined within the DGCN module in the STIL-TA model. As shown in Fig 6, Fig 7 and Fig 8, the adaptive adjacency matrix is crucial for the model’s prediction accuracy. The combination of the learnable adjacency matrix and the adaptive adjacency matrix generates a dynamic adjacency matrix. Further analysis revealed that this dynamic adjacency matrix enables the graph convolution to better capture the hidden spatial correlations within traffic data, thereby validating the effectiveness of the two core structures: interactive learning and dynamic graph convolution. Finally, the TMHTAAtt mechanism effectively captures local context, fully exploits the spatiotemporal information of traffic flow, and captures dynamic temporal relationships, which is essential for the STIL-TA model.
4.6 Visualization analysis
Furthermore, Fig 9 and Fig 10 present a visualization of the true and predicted traffic flow values at specific time intervals for STIL-TA on PEMS-BAY, focusing on the 15-minute (Horizon 3) and 60-minute (Horizon 12) prediction horizons. It is evident that the STIL-TA model accurately captures peak periods and overall traffic patterns, effectively predicting the fluctuations in traffic flow.
Additionally, the results demonstrate that the 15-minute predictions are more accurate than the 60-minute predictions. This discrepancy can be attributed to the complex dynamic spatiotemporal characteristics of traffic flow, which make long-term predictions inherently more challenging. Nevertheless, STIL-TA’s 60-minute predictions still closely follow the true traffic fluctuations, further validating the accuracy and effectiveness of the proposed model in traffic flow forecasting.
4.7 Comparison of time complexity
As shown in Table 4, we compare the training and inference times of various models on the METR-LA dataset. During the training phase, STGCN demonstrates the fastest speed; however, its performance in practical predictions is suboptimal. In contrast, DCRNN exhibits a significantly slower training speed due to its reliance on the RNN structure, which requires additional time to learn temporal features of the time series data. In the inference phase, STIL-TA achieves the fastest inference time. On the other hand, both DCRNN and STGCN have relatively slower inference speeds, primarily because they necessitate multiple iterative computations to generate predictions. STIL-TA and Graph WaveNet benefit from shorter inference times, as they are capable of generating 12-step predictions in a single computation. While STIL-TA incurs slightly higher computational costs during the training phase compared to STGCN, it consistently outperforms other models in terms of traffic flow prediction accuracy.
In terms of deployment trade-offs, although STIL-TA's training phase is more time-consuming (136.55 s/epoch) than STGCN's (51.35 s/epoch), this additional computational cost yields significant accuracy gains (a 15.49% reduction in MAE). Notably, STIL-TA exhibits the best real-time performance in the inference phase (a single forward computation generates a 12-step prediction), which makes it particularly suitable for real-world traffic system scenarios with stringent timeliness requirements, e.g., adaptive signal control with 200 ms-level response. The model design balances accuracy and efficiency through three strategies: (1) the training phase uses a parallel computing architecture for dynamic graph learning, which increases the initial training burden but avoids the sequential computation bottleneck of RNN structures; (2) the inference phase exploits the single feed-forward property of the spatio-temporal attention mechanism to reduce computational complexity while maintaining prediction accuracy; and (3) the modular design allows deployment on resource-constrained edge devices by retaining only critical computational paths (e.g., shedding the interactive learning branches).
5 Conclusion
This paper presents STIL-TA, an efficient and accurate traffic flow prediction model that integrates non-Euclidean traffic flow characteristics with an interactive learning strategy and temporal multi-head trend-aware self-attention (TMHTAAtt). By capturing dynamic spatiotemporal features, STIL-TA addresses key challenges of traditional models, such as limited interaction, incomplete spatiotemporal feature capture, and difficulties in long-term forecasting. The model generates dynamic graph structures from spatiotemporal data and uses a predefined adjacency matrix to model dynamic node dependencies, uncovering hidden spatial correlations. The incorporation of dynamic graph convolutional networks within an interactive framework allows STIL-TA to capture periodic trends and spatiotemporal dependencies. Additionally, the TMHTAAtt mechanism enhances its ability to identify dynamic temporal patterns, improving prediction accuracy. Experimental results demonstrate that STIL-TA outperforms existing methods on real-world traffic datasets, validating its effectiveness.
This study has achieved significant results in modeling the spatio-temporal dynamics of traffic flow, but the task remains complicated by external factors such as weather, special events, and holidays. In future work, we will consider the effects of external factors (e.g., weather and social events) on traffic flow prediction and explore the application of STIL-TA to large-scale datasets to further enhance its predictive capability.
References
- 1. Zhu W, Chen X, Jiang L. PV-LaP: Multi-sensor fusion for 3D Scene Understanding in intelligent transportation systems. Signal Processing. 2025;227:109749.
- 2. Tian W, Zhang Y, Zhang Y, Chen H, Liu W. A Short-Term Traffic Flow Prediction Method for Airport Group Route Waypoints Based on the Spatiotemporal Features of Traffic Flow. Aerospace. 2024;11(4):248.
- 3. Shahriari S, Sisson SA, Rashidi T. Modelling time series with temporal and spatial correlations in transport planning using hierarchical ARIMA-copula Model: A Bayesian approach. Expert Systems with Applications. 2025;274:126977.
- 4. Wu J, Wang Y-G, Zhang H. Augmented support vector regression with an autoregressive process via an iterative procedure. Applied Soft Computing. 2024;158:111549.
- 5. Hamed MM, Al-Masaeid HR, Said ZMB. Short-Term Prediction of Traffic Volume in Urban Arterials. J Transp Eng. 1995;121(3):249–54.
- 6. Buzzio-García J, Vergara J, Ríos-Guiral S, Garzón C, Gutiérrez S, Botero JF, et al. Exploring Traffic Patterns Through Network Programmability: Introducing SDNFLow, a Comprehensive OpenFlow-Based Statistics Dataset for Attack Detection. IEEE Access. 2024;12:42163–80.
- 7. Xu C, Chen Y, Zeng Q, Yang S, Zhang W, Li H. Informer–SVR: Traffic Volume Prediction Hybrid Model Considering Residual Autoregression Correction. J Transp Eng, Part A: Systems. 2025;151(4).
- 8. Mathew A, Rawther FA. Hadoop based short — term traffic flow prediction on D2its using correlation model and KNN HSsine. In: 2017 IEEE International Conference on Power, Control, Signals and Instrumentation Engineering (ICPCSI), 2017. 1123–9.
- 9. Wu C-H, Ho J-M, Lee DT. Travel-Time Prediction With Support Vector Regression. IEEE Trans Intell Transport Syst. 2004;5(4):276–81.
- 10. Huang W, Song G, Hong H, Xie K. Deep architecture for traffic flow prediction: Deep belief networks with multitask learning. IEEE Transactions on Intelligent Transportation Systems. 2014;15(5):2191–201.
- 11. Gräber T, Hossny M, Nahavandi S. A Hybrid Approach to Side-Slip Angle Estimation with Recurrent Neural Networks and Kinematic Vehicle Models. IEEE Transactions on Intelligent Vehicles. 2019;4(1):39–47.
- 12. Saleh K, Hossny M, Nahavandi S. Intent Prediction of Pedestrians via Motion Trajectories Using Stacked Recurrent Neural Networks. IEEE Trans Intell Veh. 2018;3(4):414–24.
- 13. Khairdoost N, Shirpour M, Bauer MA, Beauchemin SS. Real-Time Driver Maneuver Prediction Using LSTM. IEEE Trans Intell Veh. 2020;5(4):714–24.
- 14. Chauhan NS, Kumar N, Eskandarian A. A Novel Confined Attention Mechanism Driven Bi-GRU Model for Traffic Flow Prediction. IEEE Trans Intell Transport Syst. 2024;25(8):9181–91.
- 15. Zheng H, Lin F, Feng X, Chen Y. A Hybrid Deep Learning Model With Attention-Based Conv-LSTM Networks for Short-Term Traffic Flow Prediction. IEEE Trans Intell Transport Syst. 2021;22(11):6910–20.
- 16. Shi X, Chen Z, Wang H, Yeung D, Wong W, Woo W. Convolutional LSTM Network: A Machine Learning Approach for Precipitation Nowcasting. In: Proceedings of the 31st International Conference on Neural Information Processing Systems, 2015. 802–10.
- 17. Ke J, Zheng H, Yang H, Chen X (Michael). Short-term forecasting of passenger demand under on-demand ride services: A spatio-temporal deep learning approach. Transportation Research Part C: Emerging Technologies. 2017;85:591–608.
- 18. Zhang J, Zheng Y, Qi D, Li R, Yi X, Li T. Predicting citywide crowd flows using deep spatio-temporal residual networks. Artificial Intelligence. 2018;259:147–66.
- 19. Wu Z, Pan S, Long G, Jiang J, Chang X, Zhang C. Connecting the Dots: Multivariate Time Series Forecasting with Graph Neural Networks. In: Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2020:753–63.
- 20. Song C, Lin Y, Guo S, Wan H. Spatial-Temporal Synchronous Graph Convolutional Networks: A New Framework for Spatial-Temporal Network Data Forecasting. AAAI. 2020;34(01):914–21.
- 21. Kumar R, Panwar R, Chaurasiya VK. Urban traffic forecasting using attention based model with GCN and GRU. Multimed Tools Appl. 2023;83(16):47751–74.
- 22. Li Y, Yu R, Shahabi C, Liu Y. Diffusion Convolutional Recurrent Neural Network: Data-Driven Traffic Prediction. International Conference on Learning Representations. 2017.
- 23. Zhang S, Ju Y, Kong W, Qu H, Huang L. sAMDGCN: sLSTM-Attention-Based Multi-Head Dynamic Graph Convolutional Network for Traffic Flow Forecasting. Mathematics. 2025;13(2):185.
- 24. Zhang J, Mao S, Zhang S, Yin J, Yang L, Gao Z. EF-former for short-term passenger flow prediction during large-scale events in urban rail transit systems. Information Fusion. 2025;117:102916.
- 25. Qiu H, Zhang J, Yang L, Han K, Yang X, Gao Z. Spatial–temporal multi-task learning for short-term passenger inflow and outflow prediction on holidays in urban rail transit systems. Transportation. 2025.
- 26. Zhang J, Zhang S, Zhao H, Yang Y, Liang M. Multi-frequency spatial-temporal graph neural network for short-term metro OD demand prediction during public health emergencies. Transportation. 2025.
- 27. Zhang S, Zhang J, Yang L, Chen F, Li S, Gao Z. Physics Guided Deep Learning-Based Model for Short-Term Origin–Destination Demand Prediction in Urban Rail Transit Systems Under Pandemic. Engineering. 2024;41:276–96.
- 28. Zhao Y, Luo X, Wen H, Xiao Z, Ju W, Zhang M. Embracing large language models in traffic flow forecasting. 2024.
- 29. Guo X, Zhang Q, Jiang J, Peng M, Zhu M, Yang H. Towards explainable traffic flow prediction with large language models. 2024.
- 30. Veličković P, Cucurull G, Casanova A, Romero A, Liò P, Bengio Y. Graph Attention Networks. International Conference on Learning Representations. 2018.
- 31. Kipf T, Welling M. Semi-Supervised Classification with Graph Convolutional Networks. International Conference on Learning Representations. 2016.
- 32. Wu Z, Pan S, Long G, Jiang J, Zhang C. Graph WaveNet for Deep Spatial-Temporal Graph Modeling. In: Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, 2019. 1907–13.
- 33. Guo S, Lin Y, Feng N, Song C, Wan H. Attention Based Spatial-Temporal Graph Convolutional Networks for Traffic Flow Forecasting. AAAI. 2019;33(01):922–9.
- 34. Bruna J, Zaremba W, Szlam A, LeCun Y. Spectral Networks and Locally Connected Networks on Graphs. International Conference on Learning Representations. 2013.
- 35. Defferrard M, Bresson X, Vandergheynst P. Convolutional Neural Networks on Graphs with Fast Localized Spectral Filtering. Advances in Neural Information Processing Systems. 2016;3837–45.
- 36. Yu B, Yin H, Zhu Z. Spatio-Temporal Graph Convolutional Networks: A Deep Learning Framework for Traffic Forecasting. In: Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, 2018:3634–40.
- 37. Hechtlinger Y, Chakravarti P, Qin J. A generalization of convolutional neural networks to graph-structured data. In: Proceedings of the 31st International Conference on Neural Information Processing Systems (NeurIPS), 2017:1024–34.
- 38. Hamilton W, Ying Z, Leskovec J. Inductive Representation Learning on Large Graphs. In: Proceedings of the 31st International Conference on Neural Information Processing Systems, 2017:1024–34.
- 39. Chen M, Han L, Xu Y, Zhu T, Wang J, Sun L. Temporal-aware structure-semantic-coupled graph network for traffic forecasting. Information Fusion. 2024;107:102339.
- 40. Jiang R, Wang Z, Yong J, Jeph P, Chen Q, Kobayashi Y, et al. Spatio-Temporal Meta-Graph Learning for Traffic Forecasting. AAAI. 2023;37(7):8078–86.
- 41. Lai Q, Chen P. LEISN: A long explicit–implicit spatio-temporal network for traffic flow forecasting. Expert Systems with Applications. 2024;245:123139.
- 42. GeoMAN: Multi-level Attention Networks for Geo-sensory Time Series Prediction. In: Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence (IJCAI-18). 3428–34.
- 43. Xie J, Liu S, Chen J, Jia J. Huber loss based distributed robust learning algorithm for random vector functional-link network. Artif Intell Rev. 2022;56(8):8197–218.
- 44. Zhao L, Song Y, Zhang C, Liu Y, Wang P, Lin T, et al. T-GCN: A Temporal Graph Convolutional Network for Traffic Prediction. IEEE Trans Intell Transport Syst. 2020;21(9):3848–58.
- 45. Chen W, Chen L, Xie Y, Cao W, Gao Y, Feng X. Multi-Range Attentive Bicomponent Graph Convolutional Network for Traffic Forecasting. AAAI. 2020;34(04):3529–36.