Abstract
Accurate traffic flow prediction is essential for Intelligent Transportation Systems (ITS), yet capturing the complex spatiotemporal relationships within traffic data remains challenging due to dynamic traffic patterns and the non-Euclidean structure of road networks. Existing models struggle to adapt in real time, limiting their prediction accuracy and reliability. This study introduces the Spatiotemporal Feature Alignment with Graph Convolutional and Gated Recurrent Unit (STF-GGRU) model to address these limitations. By integrating a novel Integrated Spatiotemporal Feature Alignment (ISTFA) module, which combines Dynamic K-Nearest Neighbor (D-KNN) and Centered Kernel Alignment (CKA), the model dynamically captures critical spatial and temporal interactions. The STF-GGRU model achieves superior prediction accuracy, with RMSE values of 27.18 and 11.1 on the PeMSD4 and PeMSD8 datasets, outperforming traditional methods such as ARIMA, GRU, LSTM, and advanced neural models. These results demonstrate STF-GGRU’s potential for robust, real-time traffic predictions, marking a significant advancement in ITS capabilities.
Citation: Ata KI, Hassan MK, Al-Haddad SAR, Alquthami T, Rahman RZA, Alani S, et al. (2026) Integrated spatial-temporal feature alignment with graph convolutional and gated recurrent networks for traffic flow prediction. PLoS One 21(4): e0337661. https://doi.org/10.1371/journal.pone.0337661
Editor: Guangyin Jin, National University of Defense Technology, CHINA
Received: November 11, 2025; Accepted: March 18, 2026; Published: April 28, 2026
Copyright: © 2026 Ata et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: The dataset used in this study is publicly available in the figshare repository. https://doi.org/10.6084/m9.figshare.31830970.
Funding: This work is supported by a research grant from the Research, Development, and Innovation Authority (RDIA), Saudi Arabia, grant no. 13010-Tabuk-2023-UT-R-3-1-SE.
Competing interests: The authors have declared that no competing interests exist.
1. Introduction
With rapid urbanization and the explosive growth of transportation networks worldwide, urban traffic congestion has emerged as one of the most pressing societal challenges. Traffic congestion not only causes significant economic losses, estimated at hundreds of billions of dollars annually, but also contributes to increased greenhouse gas emissions, environmental degradation, and reduced quality of life for urban residents. Accurate and timely traffic flow prediction is a critical component of Intelligent Transportation Systems (ITS). It empowers city planners and traffic operators to implement proactive congestion management, dynamic route guidance, and adaptive traffic signal control [1].
However, traffic flow prediction remains inherently difficult because of its highly nonlinear, dynamic, and spatiotemporally correlated nature. Traffic conditions are influenced not only by local patterns but also by distant upstream or downstream interactions, unpredictable events (e.g., accidents, public gatherings), and external factors (e.g., weather conditions, roadwork). These complex dependencies require sophisticated modeling techniques that can capture both spatial correlations across the road network and temporal evolution over time [2]. Traditional statistical approaches, such as Historical Average (HA), Autoregressive Integrated Moving Average (ARIMA), and Vector Autoregression (VAR), offer simplicity and computational efficiency. However, they are built on restrictive assumptions of linearity and stationarity [3–5]. Consequently, they often fail to adapt to abrupt changes and nonlinear interactions present in real-world traffic systems [6]. Machine learning methods such as K-Nearest Neighbor (KNN) and Support Vector Regression (SVR) introduced more flexibility by learning from historical patterns without requiring explicit model formulations. Nonetheless, they rely heavily on manually engineered features and lack the capacity to model deeper, hierarchical spatiotemporal relationships [6].
Recent advances in intelligent transportation research emphasize the importance of optimizing sensation, perception, and data-driven decision-making in complex traffic environments. For example, recent studies have investigated the optimal deployment of roadside sensing systems, such as LiDAR-based cooperative perception, to improve vehicle detection accuracy and traffic awareness in dense and dynamic traffic conditions [7]. These works demonstrate that intelligent sensing configuration and adaptive modeling are essential for improving traffic system performance, reinforcing the need for predictive models that can dynamically adapt to evolving spatiotemporal patterns.
The rise of deep learning has revolutionized traffic prediction by enabling automatic hierarchical feature extraction. Temporal modeling methods, including Recurrent Neural Networks (RNNs) [8], Long Short-Term Memory (LSTM) networks [9], and Gated Recurrent Units (GRUs) [10], have demonstrated strong capabilities in capturing sequential dependencies and long-term temporal patterns. Yet their focus is primarily on temporal dynamics, often overlooking intricate spatial relationships inherent in urban road networks [8].
To address spatial dependencies, Convolutional Neural Networks (CNNs) have been used to capture local spatial features. However, they are limited by their reliance on grid-like structures, which do not accurately represent the non-Euclidean topology of real-world road networks [9]. Graph Convolutional Networks (GCNs) overcome this limitation by effectively modeling traffic data as graphs, capturing complex spatial correlations across irregular and dynamic urban networks [10]. Hybrid architectures, such as Temporal Graph Convolutional Networks (T-GCN) and Diffusion Convolutional Recurrent Neural Networks (DCRNN), integrate both spatial and temporal features by combining GCNs and recurrent structures [11], [12]. Despite this progress, most models still depend on static or predefined graph structures. This limits their adaptability in dynamic scenarios such as accidents, unexpected events, or temporary road closures [13]. Recent studies have explored dynamic graph structures and attention mechanisms to improve flexibility and accuracy. Dynamic spatiotemporal GCNs update graph connectivity over time [14], while attention-based networks selectively emphasize critical spatiotemporal patterns to improve learning performance [15]. Furthermore, Generative Adversarial Networks (GANs) have been employed to augment data and improve model robustness against noise and sparsity [16]. Although these methods represent significant steps forward, they typically fail to explicitly integrate feature-based similarities across sensors. They also often do not provide a unified, adaptive framework capable of simultaneously modeling both spatial proximity and feature relationships in real time [17]. To address these challenges, we propose a novel and unified traffic flow prediction framework named STF-GGRU (Spatiotemporal Feature-aligned Graph Gated Recurrent Unit). 
The core innovation lies in the Integrated Spatiotemporal Feature Alignment (ISTFA) module, which explicitly integrates both spatial proximity and feature-based similarity using a combination of Dynamic K-Nearest Neighbor (D-KNN) and Centered Kernel Alignment (CKA). Unlike previous approaches that rely on static adjacency matrices or focus solely on attention mechanisms, ISTFA provides a dynamic and unified feature representation that evolves adaptively as traffic conditions change.
The primary contributions of this work are summarized as follows:
- A novel Integrated Spatiotemporal Feature Alignment (ISTFA) module: We propose a unique module that dynamically integrates spatial proximity and feature-based similarities using D-KNN and CKA, providing a more comprehensive and flexible graph representation. To our knowledge, this is the first attempt to combine these techniques for traffic flow prediction.
- A hybrid STF-GGRU architecture: We develop a hybrid model that combines Graph Convolutional Networks (GCNs) with Gated Recurrent Units (GRUs) to jointly capture complex spatial correlations and temporal dependencies, improving predictive accuracy and robustness.
- Extensive validation on large-scale real-world datasets: We rigorously evaluate STF-GGRU on two widely used traffic datasets, PeMSD4 and PeMSD8. Experimental results demonstrate that STF-GGRU consistently outperforms state-of-the-art baseline models across multiple horizons and metrics, confirming its practical effectiveness.
- Comprehensive scalability analysis: We conduct additional experiments to analyze the scalability of STF-GGRU under different network sizes and sensor densities. The results indicate that STF-GGRU maintains stable performance and computational efficiency, highlighting its potential for real-world deployment in large-scale urban transportation systems.
- Enhanced interpretability and deployment readiness: By integrating feature alignment and adaptive graph construction, STF-GGRU offers better interpretability of traffic relationships and is designed to support integration with distributed or edge computing environments, paving the way for intelligent and responsive traffic management applications.
The remainder of this paper is organized as follows. Section 2 presents a comprehensive review of related work. Section 3 details the proposed STF-GGRU model and its key modules. Section 4 describes the experimental setup, Section 5 presents the datasets, and Section 6 introduces the evaluation metrics. Section 7 reports the results, and Section 8 discusses the ablation study. Finally, Section 9 concludes the paper and outlines future directions.
2. Related work
Accurate traffic flow prediction is a cornerstone of intelligent transportation systems (ITS), directly impacting congestion mitigation, urban mobility planning, and safety. Classical statistical methods, such as Historical Average (HA), Autoregressive Integrated Moving Average (ARIMA), and Vector Autoregression (VAR), have been widely used due to their simplicity and interpretability [18,19]. However, these methods rely on assumptions of linearity and stationarity and often fail to capture abrupt and nonlinear variations characteristic of real-world urban traffic patterns, especially under incident scenarios or abnormal conditions. Machine learning techniques such as K-Nearest Neighbor (KNN) and Support Vector Regression (SVR) provide better adaptability by learning from data without explicit parametric assumptions [20,21]. Nonetheless, these methods depend heavily on manually engineered features and lack the capacity to capture hierarchical, nonlinear relationships, making them unsuitable for large-scale dynamic environments.
The advent of deep learning has fundamentally transformed traffic prediction. Recurrent Neural Networks (RNNs) and Long Short-Term Memory (LSTM) networks effectively model temporal dependencies by learning from sequential patterns [2,22]. However, LSTMs are resource-intensive and prone to overfitting, particularly when trained on sparse or noisy datasets [23]. Gated Recurrent Units (GRUs) simplify the gating structure and retain comparable performance while reducing computational demands [24]. Despite this, RNN-based models often focus solely on temporal dependencies and struggle to fully capture complex spatial relationships present in large sensor networks.
To address spatial dependencies, Convolutional Neural Networks (CNNs) have been applied but are limited by their requirement for grid-structured data, which does not fit the non-Euclidean nature of road networks [25]. Graph Convolutional Networks (GCNs) overcome this limitation by modeling traffic as a graph structure, effectively capturing irregular topologies and non-local spatial correlations [26]. Yet, most GCN-based models utilize static adjacency matrices derived from historical connectivity or road maps, thus failing to adapt to dynamic and evolving traffic conditions.
Hybrid models that combine spatial and temporal learning have shown promising improvements. The Temporal Graph Convolutional Network (T-GCN) integrates GCNs with GRUs to jointly learn spatial and temporal features [27], while the Diffusion Convolutional Recurrent Neural Network (DCRNN) models traffic as a diffusion process over a graph, effectively capturing propagation dynamics [28]. These approaches mark significant advances but mainly focus on spatial adaptivity, often overlooking detailed feature-level similarity between sensors. A notable contribution by Chen et al. [29] at the Beijing Institute of Technology proposed the TFM-GCAM model, which integrates attention mechanisms with graph convolution and constructs a traffic flow matrix grounded in traffic flow theory. This framework successfully captures dynamic node relationships and fuses spatiotemporal features, outperforming more conventional hybrid architectures. However, similar to other attention-based approaches, TFM-GCAM relies predominantly on learned attention weights and does not explicitly integrate dynamic feature similarity matrices across sensors, potentially limiting its interpretability and ability to generalize across diverse sensor types. Yan et al. [30] proposed a vehicle path planning and prediction algorithm that utilizes attention mechanisms at complex intersections. Their approach significantly improved collaborative decision-making and safety at intersections but focused primarily on short-term vehicle trajectory prediction rather than holistic network-level flow forecasting, which limits its direct applicability to large-scale traffic flow prediction tasks.
Furthermore, Ahmad Ali et al. [31] introduced a data aggregation-based approach that exploits dynamic spatiotemporal correlations for citywide crowd flow prediction within fog computing environments. Their method emphasized scalability and edge-computing capabilities, addressing resource constraints in distributed systems. Despite its effectiveness in decentralized settings, the approach primarily targets crowd dynamics rather than continuous vehicle traffic flow and lacks fine-grained feature similarity integration, which is crucial for high-fidelity predictions in vehicular networks. Parallel to these graph-based advancements, Generative Adversarial Networks (GANs) have been increasingly employed to generate synthetic traffic data, augment sparse datasets, and enhance prediction robustness [32]. A systematic review highlighted GANs' potential for improving data diversity and addressing missing-data issues. However, GAN-based approaches primarily focus on data generation and augmentation rather than directly modeling spatiotemporal dependencies for predictive purposes. Attention-based models, such as the Attention-based Periodic-Temporal Neural Network (APTN) [33], further advance the field by enabling selective focus on relevant temporal segments and periodic patterns. These models have demonstrated superior performance, especially in highly dynamic and periodic traffic scenarios. Nevertheless, attention mechanisms alone cannot fully substitute for robust, dynamically integrated spatial and feature similarity modeling, which remains essential for network-level prediction accuracy and interpretability. Another line of research has focused on dynamic spatiotemporal graph construction, which aims to update graph structures as traffic conditions evolve. Two representative studies in this category are DHSTNet and GSTRGCT. DHSTNet, introduced by Ali et al. [34], adaptively updates spatiotemporal graph structures and captures multi-scale spatial correlations. Although DHSTNet improves spatial adaptivity, it relies solely on structural correlations and does not incorporate feature-level similarity. As a result, it cannot combine spatial and feature information, and it offers limited interpretability and responsiveness to abrupt traffic changes.
GSTRGCT, proposed by Xiong et al. [35], employs tensor decomposition and autocorrelation-driven attention to model non-stationary spatiotemporal patterns. This approach captures complex dynamics but is computationally intensive and difficult to interpret. It also lacks a kernel-based similarity mechanism, and its transformer architecture reacts slowly to rapid fluctuations in traffic. Furthermore, GSTRGCT does not integrate spatial, temporal, and feature similarity within a unified framework. Table 1 provides a comparative overview of DHSTNet, GSTRGCT, ISTFA, and STF-GGRU.
Despite significant progress in traffic prediction, many existing models still face important limitations. Many approaches rely on static or partially dynamic graph structures and do not include feature-based sensor similarity. This reduces their flexibility when traffic patterns shift suddenly. It also limits their ability to capture detailed relationships between sensors in real time. Although several studies have explored dynamic graph construction, most of these methods depend mainly on structural updates. Many also use isolated attention mechanisms that do not create deeper semantic alignment between sensor features. Table 2 provides a clear summary of the related traffic prediction studies, outlining their main approaches, key features, and observed limitations.
To address these gaps, recent research has introduced techniques that incorporate richer feature information into dynamic graph learning. The ISTFA module contributes to the development of dynamic graph construction with explicit feature similarity. It does this by combining Dynamic K-Nearest Neighbour (D-KNN) with Centred Kernel Alignment (CKA). This combination quantifies feature-level similarity in a clear and interpretable way. It also supports fast and adaptive updates to the graph structure. As a result, the representation can evolve with changes in traffic conditions and sensor behaviours.
When used within a broader GCN-GRU architecture such as STF-GGRU, ISTFA provides a unified framework that brings together spatial proximity, temporal dependencies, and feature similarity. This integration improves the model’s ability to capture complex, context-dependent traffic patterns. It also maintains computational efficiency that is suitable for real-time prediction.
3. Materials and methods
3.1. Problem definition
Traffic congestion is a critical issue in modern cities, directly affecting travel time, fuel consumption, and air pollution. To manage traffic effectively, it is essential to accurately predict future traffic conditions. Traffic flow prediction aims to forecast the number of vehicles, their speed, and occupancy at specific locations and times, based on previously observed traffic patterns. Formally, the problem can be defined as forecasting future traffic flow using past traffic data collected from road sensors. This data is represented as a multivariate time series

$$X_t \in \mathbb{R}^{N \times C},$$

where $X_t$ collects the readings of the $N$ sensors at time step $t$, $C$ is the number of traffic features recorded per sensor, and $N = 307$ for PeMSD4 and $N = 170$ for PeMSD8.
Spatial dependencies among sensors are modeled as a graph

$$G_t = (V, E, A_t),$$

where $V$ is the set of $N$ sensor nodes, $E$ is the set of edges, and $A_t \in \mathbb{R}^{N \times N}$ is the time-varying adjacency matrix that captures relationships between sensors, including both sparse and direct connections.
The goal is to predict the traffic flow over the next $T'$ time steps using data from the past $T$ time steps:

$$(\hat{X}_{t+1}, \dots, \hat{X}_{t+T'}) = f(X_{t-T+1}, \dots, X_t; G_t),$$

where $f$ is the prediction function, and the objective is to minimize the error between the predicted and actual traffic flow:

$$\mathcal{L} = \frac{1}{N} \sum_{i=1}^{N} \left( \hat{x}_i(t) - x_i(t) \right)^2,$$

where $N$ is the total number of sensors, $\hat{x}_i(t)$ is the predicted traffic flow of sensor $i$ at time $t$, and $x_i(t)$ is the actual observed traffic flow of sensor $i$ at the same time. The loss function $\mathcal{L}$ represents the Mean Squared Error (MSE), which measures the average squared difference between the predicted and actual values. The objective of the model is to minimize this error to improve the accuracy of traffic flow forecasting. To address this problem, a comprehensive framework is required to integrate key components: an optimized spatial kernel to capture localized spatial dependencies, an optimized feature kernel to model global temporal patterns, and adjacency matrices to represent raw and bidirectional spatial relationships. Hence, ISTFA is proposed as in formula 20.
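As a quick sanity check, the MSE objective can be expressed in a few lines of NumPy (a minimal sketch; the function and variable names are illustrative, not from the paper's codebase):

```python
import numpy as np

def mse_loss(y_pred, y_true):
    """Mean squared error averaged over all sensors and horizons."""
    return float(np.mean((y_pred - y_true) ** 2))

# Toy example: 4 sensors, a horizon of 3 steps, every prediction off by 1.
y_true = np.array([[10., 12., 11.],
                   [20., 21., 19.],
                   [ 5.,  6.,  5.],
                   [30., 28., 29.]])
y_pred = y_true + 1.0
print(mse_loss(y_pred, y_true))  # -> 1.0
```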
3.2. Overview of the proposed model
The STF-GGRU model presented in this study is structured into three key components: (1) the Feature Extraction Module, (2) the Deep Learning Modules encompassing both spatial and temporal learning, and (3) the Prediction Module, as illustrated in Fig 1. The STF-GGRU begins with collecting and preparing traffic data from 307 sensors, each recording speed, occupancy, and flow measurements every 5 minutes over a two-month period. To make sense of this high-dimensional data, the measurements are aggregated into daily, hourly, and weekly patterns using statistical methods such as mean, standard deviation, and min-max scaling. This aggregation reduces dimensionality and reveals meaningful patterns in the data, providing a clearer picture of traffic behaviour for each sensor over time.
Once the data is prepared, it enters the ISTFA module. Here, the D-KNN method identifies relevant neighbours for each sensor in two ways: spatially and by feature similarity. The spatial neighbours are identified based on the geographic distance between sensors, ensuring that nearby sensors are considered together. In contrast, the feature-based neighbours are determined by comparing aggregated traffic patterns, such as daily or weekly similarities, to find sensors with similar traffic behaviour. This twofold neighbour selection provides a rich understanding of both geographic proximity and behaviourally similar sensors. To refine this selection, the CKA technique is then applied to align the spatial and feature-based neighbour matrices. CKA maximizes the congruence between these matrices, retaining only those neighbours that share both spatial and feature-based similarities. The result is a merged, comprehensive neighbour matrix that captures the most relevant spatiotemporal relationships for each sensor, which are essential for accurate traffic prediction. With the refined neighbour matrix as input, the model splits the data into spatial and temporal pathways. The GCN layer processes the spatial relationships, learning patterns and dependencies in the sensor network structure. Simultaneously, the GRU layer handles the temporal aspect, capturing sequential dependencies over time to understand how traffic patterns evolve. Finally, the outputs from the spatial GCN and temporal GRU pathways converge in the fully connected (FCL) layers, which synthesize these insights to produce the final traffic flow predictions. This layered approach, moving from raw sensor data through spatiotemporal alignment to predictive modelling, enables the proposed model to deliver precise and contextually relevant traffic forecasts.
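The aggregation step described in the overview (per-window mean, standard deviation, and min-max scaling) can be sketched as follows, assuming the datasets' 5-minute sampling rate (12 samples per hour); the helper names are hypothetical:

```python
import numpy as np

def aggregate_sensor(series, period):
    """Aggregate a 5-minute series into per-period mean and std
    (period = 12 for hourly blocks, 288 for daily blocks)."""
    n_blocks = len(series) // period
    blocks = series[: n_blocks * period].reshape(n_blocks, period)
    return blocks.mean(axis=1), blocks.std(axis=1)

def min_max_scale(x):
    """Scale a feature vector to [0, 1], guarding against a constant series."""
    span = x.max() - x.min()
    return (x - x.min()) / span if span > 0 else np.zeros_like(x)

readings = np.arange(48, dtype=float)   # 4 hours of synthetic 5-minute samples
hourly_mean, hourly_std = aggregate_sensor(readings, 12)
print(hourly_mean)                      # -> [ 5.5 17.5 29.5 41.5]
print(min_max_scale(hourly_mean))       # first entry 0.0, last entry 1.0
```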
3.3. Input module
In this study, we incorporate both spatial and temporal features to enhance the model’s ability to predict traffic flow accurately. The spatial features capture real-time, dynamic relationships among traffic sensors, while the temporal segments, including hourly, daily, and weekly intervals, capture recurring patterns over different time scales for each sensor. Fig 2 illustrates that the sensors are represented as blue dots, with lines indicating dynamically updated connections based on real-time traffic conditions. This approach allows the model to capture dynamic spatial relationships that adjust with traffic flow variations, effectively reflecting immediate changes in the sensor network structure.
In addition to adjacent sensors, the model also incorporates non-adjacent sensor relationships to account for long-range dependencies. Adjacent sensors provide immediate spatial correlations, which are essential for localized traffic predictions. However, non-adjacent sensors, even those that are physically distant, can exhibit similar patterns over time due to recurring traffic trends, such as periodic congestion or shared routes.
The proximity and density of sensors are also significant factors in the model. In urban environments, sensor distribution is uneven; certain areas, such as intersections, have densely connected sensors, while others, such as less frequented roads, have sparse connections. This variability is evident in the sensor layout, where clusters of connected sensors contrast with isolated points. By capturing these proximity and density variations, the model can accurately represent areas of high traffic flow as well as isolated sections with unique traffic patterns.
The input data is organized to capture different patterns over time, as shown in Fig 3. The data is divided into three main segments: hourly, daily, and weekly, each tailored to identify specific recurring trends. The first segment, labelled $\mathcal{X}_h$ and highlighted in green, focuses on the most recent hour leading up to the current moment. This segment utilizes data from the hour immediately before the target prediction period on February 26, 2018, from 08:00–09:00 AM. It is represented mathematically as:

$$\mathcal{X}_h = (X_{t-T_h+1}, \dots, X_t),$$

where $T_h$ is the number of samples in the most recent hour. The second segment, shown in blue and labelled $\mathcal{X}_d$, captures daily patterns by examining data from the same timeframe on previous days, such as January 1 and 2, 2018. This helps the model learn from past days to predict traffic flow for the current day. It is expressed as:

$$\mathcal{X}_d = (X_{t-q+1}, \dots, X_{t-q+T_h},\; X_{t-2q+1}, \dots, X_{t-2q+T_h}, \dots),$$

where $q$ is the number of samples per day. The third segment, indicated by the red lines and labelled $\mathcal{X}_w$, captures weekly recurring patterns by using data from the same hours but from previous weeks, specifically from January 1 to January 26, 2018. This is formulated as:

$$\mathcal{X}_w = (X_{t-7q+1}, \dots, X_{t-7q+T_h},\; X_{t-14q+1}, \dots, X_{t-14q+T_h}, \dots),$$

where $7q$ is the number of samples per week.
By combining these segments, the model can recognize both short-term fluctuations and longer-term patterns in traffic, leading to more accurate predictions. This approach is essential for understanding and predicting traffic dynamics, providing a solid foundation for the model’s effectiveness.
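A minimal sketch of this three-way segmentation, assuming the 5-minute sampling used by the PeMS datasets (12 samples per hour, 288 per day, 2016 per week); the function and constant names are illustrative:

```python
import numpy as np

SAMPLES_PER_HOUR, SAMPLES_PER_DAY, SAMPLES_PER_WEEK = 12, 288, 2016

def build_segments(series, t, n_days=2, n_weeks=2):
    """Slice the recent-hour window, the same-hour windows on previous
    days, and the same-hour windows in previous weeks, ending at index t."""
    hourly = series[t - SAMPLES_PER_HOUR : t]
    daily = [series[t - d * SAMPLES_PER_DAY - SAMPLES_PER_HOUR : t - d * SAMPLES_PER_DAY]
             for d in range(1, n_days + 1)]
    weekly = [series[t - w * SAMPLES_PER_WEEK - SAMPLES_PER_HOUR : t - w * SAMPLES_PER_WEEK]
              for w in range(1, n_weeks + 1)]
    return hourly, daily, weekly

series = np.arange(3 * SAMPLES_PER_WEEK, dtype=float)   # 3 weeks of samples
hourly, daily, weekly = build_segments(series, t=len(series))
print(hourly.shape, daily[0].shape, weekly[0].shape)    # -> (12,) (12,) (12,)
```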
3.4. Integrated spatiotemporal feature alignment
The Integrated Spatiotemporal Feature Alignment (ISTFA) module is designed to address a key limitation in existing traffic prediction models, namely the inability to jointly and dynamically integrate spatial proximity with fine-grained temporal and feature-level similarity. As shown in Fig 4, ISTFA operates on multi-resolution traffic data, including hourly, daily, and weekly observations, and constructs an adaptive representation that evolves with traffic conditions. The importance of ISTFA lies in its ability to align spatial and temporal relationships in real time, enabling the model to distinguish between sensors that are geographically close but behaviorally different, and those that are spatially distant yet temporally correlated. By combining Dynamic K-Nearest Neighbours (D-KNN) with Centered Kernel Alignment (CKA), ISTFA produces an interpretable and adaptive graph structure that enhances responsiveness to abrupt traffic changes and improves the robustness of downstream graph convolutional learning.
3.5. Dynamic neighbourhood matrix construction
Dynamic KNN is used to build an adaptive neighbourhood structure that reflects the evolving nature of traffic flow. Unlike static KNN, where neighbours remain fixed, the neighbourhood in this approach is updated at every time step. This is achieved by recalculating similarity scores, adjusting the similarity threshold dynamically, selecting the most relevant neighbouring sensors, and weighting their contributions according to their importance for prediction.
In a real traffic network such as PeMSD4, which contains 307 loop detectors, this means that the neighbours of a given sensor at 07:00 AM during morning congestion may differ from those at 02:00 AM when traffic is light, even though the physical road network itself does not change.
At each time step $t$, the similarity between traffic sensors is measured using the Pearson correlation coefficient across multiple temporal resolutions, including hourly ($\mathcal{X}_h$), daily ($\mathcal{X}_d$), and weekly ($\mathcal{X}_w$) patterns. This allows the model to capture both short-term fluctuations and long-term trends in traffic behaviour.

The similarity between sensor $i$ and sensor $j$ within the time window $T$ is computed as:

$$S_{ij}(t) = \frac{\sum_{\tau=t-T+1}^{t} \left( x_i(\tau) - \bar{x}_i \right)\left( x_j(\tau) - \bar{x}_j \right)}{\sqrt{\sum_{\tau=t-T+1}^{t} \left( x_i(\tau) - \bar{x}_i \right)^2}\,\sqrt{\sum_{\tau=t-T+1}^{t} \left( x_j(\tau) - \bar{x}_j \right)^2}}$$

Here, $S_{ij}(t)$ represents the similarity score between sensors $i$ and $j$ at time $t$, while $\bar{x}_i$ and $\bar{x}_j$ denote their mean traffic values over the selected interval. For example, two sensors located on different road segments may still show a high similarity score if congestion builds up and dissipates at the same time during peak hours.
Once similarity scores are computed, a binary selection rule is applied to determine whether two sensors should be considered neighbours:

$$A_{ij}(t) = \begin{cases} 1, & S_{ij}(t) \ge \theta(t) \\ 0, & \text{otherwise} \end{cases}$$

The threshold $\theta(t)$ is not fixed. Instead, it is updated dynamically based on the distribution of similarity values at time $t$:

$$\theta(t) = \frac{1}{N(N-1)} \sum_{i \ne j} S_{ij}(t)$$

Here, $N$ denotes the total number of sensors. In practice, this mechanism raises the threshold during periods of widespread congestion, keeping only the strongest relationships, and lowers it during off-peak hours, allowing a broader set of neighbours to be considered.
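This selection rule can be sketched with a mean-based adaptive threshold (an illustrative choice consistent with the description above; the paper's exact threshold statistic may differ):

```python
import numpy as np

def dynamic_adjacency(S):
    """Binary neighbour selection: the threshold is the mean off-diagonal
    similarity at the current time step, so it rises when similarities are
    broadly high and falls when they are low."""
    N = S.shape[0]
    off_diag = ~np.eye(N, dtype=bool)
    theta = S[off_diag].mean()
    A = ((S >= theta) & off_diag).astype(float)
    return A, theta

S = np.array([[1.0, 0.9, 0.2],
              [0.9, 1.0, 0.4],
              [0.2, 0.4, 1.0]])
A, theta = dynamic_adjacency(S)
print(round(theta, 2))  # -> 0.5 (mean of the off-diagonal similarities)
print(A[0])             # only the strong 0-1 link survives in row 0
```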
For each sensor $i$ at time $t$, the top $k$ neighbours are selected and combined across different temporal patterns:

$$S_{ij}^{\mathrm{comb}}(t) = \alpha_d\, S_{ij}^{d}(t) + \alpha_h\, S_{ij}^{h}(t) + \alpha_w\, S_{ij}^{w}(t)$$

The weights $\alpha_d$, $\alpha_h$, and $\alpha_w$ control the relative importance of daily, hourly, and weekly information. For instance, during weekday rush hours, short-term hourly patterns often dominate, whereas weekly trends become more informative during weekends.
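The weighted fusion and top-k selection can be sketched as follows (the weight values and function names are illustrative, not the paper's tuned settings):

```python
import numpy as np

def combine_and_topk(S_h, S_d, S_w, alphas=(0.3, 0.5, 0.2), k=2):
    """Fuse hourly/daily/weekly similarity matrices with weights
    (alpha_h, alpha_d, alpha_w), then pick each sensor's k strongest
    neighbours."""
    a_h, a_d, a_w = alphas
    S = a_h * S_h + a_d * S_d + a_w * S_w
    np.fill_diagonal(S, -np.inf)         # a sensor is not its own neighbour
    idx = np.argsort(-S, axis=1)[:, :k]  # indices of the k largest entries per row
    return S, idx

rng = np.random.default_rng(0)
S_h, S_d, S_w = rng.random((4, 4)), rng.random((4, 4)), rng.random((4, 4))
S, neighbours = combine_and_topk(S_h, S_d, S_w)
print(neighbours.shape)  # -> (4, 2)
```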
To fully describe sensor relationships, two complementary kernel matrices are constructed from the dynamic neighbourhood information.

The spatial kernel matrix $K_s(t) \in \mathbb{R}^{N \times N}$ captures proximity-based relationships derived mainly from daily traffic behaviour, with each entry defined as:

$$[K_s(t)]_{ij} = \left\lVert x_i^{d}(t) - x_j^{d}(t) \right\rVert_2,$$

where $x_i^{d}(t)$ denotes the aggregated daily pattern of sensor $i$. Lower values in this matrix indicate stronger spatial affinity, reflecting sensors that exhibit similar daily traffic patterns.

In parallel, a feature similarity kernel $K_f(t) \in \mathbb{R}^{N \times N}$ is constructed to capture temporal behaviour beyond spatial proximity, with entries computed as:

$$[K_f(t)]_{ij} = S_{ij}(t).$$

This kernel highlights sensors that behave similarly over short and long time horizons, even if they are geographically distant.
3.6. Kernel alignment using CKA
To integrate spatial and feature-based relationships, Centered Kernel Alignment (CKA) is applied. The alignment score between the two kernels is given by:

$$\rho(t) = \mathrm{CKA}\!\left(K_s(t), K_f(t)\right) = \frac{\left\langle \bar{K}_s(t), \bar{K}_f(t) \right\rangle_F}{\left\lVert \bar{K}_s(t) \right\rVert_F \left\lVert \bar{K}_f(t) \right\rVert_F},$$

where $\bar{K}$ denotes a centred kernel matrix and $\langle \cdot, \cdot \rangle_F$ the Frobenius inner product. A higher alignment score indicates stronger consistency between spatial proximity and temporal behaviour, guiding the model toward a more meaningful kernel combination.

The aligned spatial and feature kernels, $K_s^{*}(t)$ and $K_f^{*}(t)$, are obtained by weighting the centred kernels with the alignment score. Finally, the Integrated Spatiotemporal Feature Alignment (ISTFA) matrix is formed as:

$$K_{\mathrm{ISTFA}}(t) = \begin{bmatrix} K_s^{*}(t) & A(t) \\ A(t)^{\top} & K_f^{*}(t) \end{bmatrix},$$

where $A(t)$ is the dynamic adjacency matrix.
This matrix jointly represents spatial proximity, temporal behaviour, and their interactions. It serves as the adjacency structure for the GCN layers, enabling effective information propagation and improving traffic flow prediction accuracy. The overall procedure for constructing the ISTFA similarity kernel, including dynamic neighbourhood selection, threshold adaptation, kernel construction, and CKA-based alignment, is formally summarized in Algorithm 1.
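The alignment score itself is straightforward to compute; the sketch below implements the standard CKA formulation (Frobenius inner product of double-centred kernels) and checks two of its defining properties:

```python
import numpy as np

def center(K):
    """Double-centre a kernel matrix: H K H with H = I - (1/n) 11^T."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return H @ K @ H

def cka(K1, K2):
    """Centered Kernel Alignment: <Kc1, Kc2>_F / (||Kc1||_F ||Kc2||_F)."""
    Kc1, Kc2 = center(K1), center(K2)
    return float(np.sum(Kc1 * Kc2) / (np.linalg.norm(Kc1) * np.linalg.norm(Kc2)))

X = np.random.default_rng(1).random((5, 3))
K = X @ X.T                        # a linear kernel over 5 sensors
print(round(cka(K, K), 6))         # a kernel aligns perfectly with itself -> 1.0
print(round(cka(K, 2.0 * K), 6))   # CKA is invariant to kernel scaling -> 1.0
```

Because CKA is bounded in [0, 1] and invariant to isotropic scaling, it provides a stable score for comparing the spatial and feature kernels even when their magnitudes differ.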
Traffic data (daily, hourly, and weekly) is processed through Dynamic KNN to capture spatial relationships, generating the spatial adjacency ($K_s$) and feature similarity ($K_f$) matrices, which are aligned by CKA to produce the optimized kernels $K_s^{*}$ and $K_f^{*}$. These are then integrated into the ISTFA array, which serves as input to the GCN.
Algorithm 1. ISTFA
Input:
• D: historical dataset {x₁, x₂, …, x₍ₘ ₋ ₁₎}
• X₀: feature vector of the target sensor at time t
• P: sensor position matrix
• k: number of neighbours
• ε: small constant
• θ_min, θ_max: minimum and maximum similarity thresholds
• M₀: number of missing sensors
• M: total number of sensors
• γ: threshold scaling parameter
• βₛ, β𝒻: spatial and feature fusion weights
Output:
• K_ISTFA(t): ISTFA similarity kernel matrix at time t
Begin
m ← |D|;
distances ← NULL;
X_e ← NULL;
X_d ← NULL;
for i ← 1 to m do
Compute distᵢ between xᵢ and X₀;
Add (distᵢ, xᵢ) to distances;
end for
Sort distances in ascending order;
X_e ← first k elements of distances;
D_s ← (M₀ / M) × 100;
θ ← θ_min + (θ_max − θ_min) × (1 − γ × D_s / 100);
for each (distᵢ, xᵢ) in distances do
if distᵢ < θ then
Add xᵢ to X_d;
end if
end for
for each selected neighbour xᵢ do
wᵢ ← 1 / (distᵢ + ε);
end for
Normalize all weights;
X_c(t) ← weighted combination of X_e and X_d;
Construct spatial kernel Kₛ(t) using P and X_c(t);
Construct feature kernel K𝒻(t) using X_c(t);
Apply CKA to align Kₛ(t) and K𝒻(t);
Obtain optimized kernels Kₛ*(t) and K𝒻*(t);
K_ISTFA(t) ← [Kₛ*(t), A; Aᵀ, K𝒻*(t)];  (2 × 2 block matrix)
Return K_ISTFA(t);
End
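The core of Algorithm 1, the dynamic neighbour selection with the adaptive threshold θ, can be sketched as follows. This is an illustrative reading of the pseudocode above with plain Euclidean distances; the function name and parameter values in the example are hypothetical.

```python
import numpy as np

def dknn_select(X, x0, k, theta_min, theta_max, gamma, m0, m, eps=1e-8):
    """Dynamic KNN step of Algorithm 1: the exact k nearest neighbours (X_e),
    an adaptively thresholded neighbour set (X_d), and inverse-distance weights."""
    d = np.linalg.norm(X - x0, axis=1)          # dist_i between x_i and X_0
    X_e = np.argsort(d)[:k]                     # indices of the k nearest sensors
    D_s = (m0 / m) * 100                        # percentage of missing sensors
    theta = theta_min + (theta_max - theta_min) * (1 - gamma * D_s / 100)
    X_d = np.where(d < theta)[0]                # dynamic, threshold-based set
    w = 1.0 / (d + eps)                         # w_i = 1 / (dist_i + eps)
    w /= w.sum()                                # normalize all weights
    return X_e, X_d, w
```

Note that with no missing sensors (D_s = 0) the threshold equals θ_max; as the missing-sensor ratio grows, θ shrinks toward θ_min.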
3.7. Spatial learning
This lays the groundwork for the next critical phase: feeding the enriched array into a GCN for intricate traffic flow learning. The ISTFA array is primed to serve as the initial feature matrix H⁽⁰⁾ for our GCN. This initial representation encompasses the comprehensive sensor relationships at time t and is pivotal for the GCN to discern and exploit the underlying structure within the traffic network. The first layer of the GCN receives H⁽⁰⁾ and performs the following operation:
Here, H⁽¹⁾ signifies the nodes’ feature representations after the first convolutional layer, σ represents a non-linear activation function (ReLU) that introduces non-linearity into the model, Ã = A + I is the adjacency matrix of the graph with self-connections, facilitating the inclusion of a node’s own features in the update process, D̃ is the degree matrix corresponding to Ã, ensuring that the feature aggregation is normalized, and W⁽⁰⁾ is the weight matrix for the first layer, which is subject to optimization during the training phase. Fig 5 presents the flowchart of the ISTFA array.
The output of each layer then acts as the input for the subsequent layer. The final output of the GCN after L layers can be aligned with the traffic flow labels through a prediction layer. Fig 6 shows the architecture of the input and output matrices of the GCN in the spatial learning module.
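As a concrete sketch of the layer operation above, the following minimal NumPy function implements the standard symmetrically normalized GCN propagation rule with self-connections and a ReLU activation. The weight matrix would normally be learned during training; the function name is illustrative.

```python
import numpy as np

def gcn_layer(A, H, W):
    """One GCN layer: ReLU(D^{-1/2} (A + I) D^{-1/2} H W)."""
    A_hat = A + np.eye(A.shape[0])                  # add self-connections
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))   # degree normalization
    A_norm = A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(0.0, A_norm @ H @ W)          # ReLU activation
```

Stacking two such layers, with the ISTFA matrix supplying both the adjacency structure and the initial features, mirrors the two-layer spatial module described above.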
3.8. Temporal learning
The temporal module, depicted in Fig 7, is designed to capture and model the sequential dependencies across different time scales, specifically weekly, daily, and hourly patterns. This module integrates these temporal dependencies using a series of GRU layers arranged in a multi-layered structure, with each layer corresponding to a specific temporal pattern. The input to the temporal module consists of three distinct sequences representing the weekly, daily, and hourly data at the i-th sensor. Each GRU cell receives the inputs corresponding to these temporal patterns: the weekly, daily, and hourly features of the first sensor feed its corresponding GRU cells, those of the second sensor feed the next, and so on for all N sensors. The GRU cells process these inputs at each time step to generate hidden states that capture the temporal dependencies within each time scale. The operation of each GRU cell is governed by the following equations:
- 1. Update gate: z_t = σ(W_z · [h_(t−1), x_t])
Here, z_t determines the extent to which the previous hidden state h_(t−1) will contribute to the current hidden state, W_z represents the weight matrix for the update gate, and σ is the sigmoid activation function.
- 2. Reset gate: r_t = σ(W_r · [h_(t−1), x_t])
The reset gate decides how much of the previous hidden state should be forgotten; W_r is the weight matrix associated with the reset gate.
- 3. Candidate hidden state: h̃_t = tanh(W_h · [r_t ⊙ h_(t−1), x_t])
The candidate hidden state represents the new content that could be added to the current hidden state, modulated by the reset gate; W_h is the weight matrix for the candidate hidden state, and tanh denotes the hyperbolic tangent activation function.
- 4. Final hidden state: h_t = (1 − z_t) ⊙ h_(t−1) + z_t ⊙ h̃_t
The final hidden state is a combination of the previous hidden state h_(t−1) and the candidate hidden state h̃_t, weighted by the update gate.
The hidden states are updated according to these GRU equations, where the update gate z_t, reset gate r_t, and candidate hidden state h̃_t are computed to maintain the temporal continuity of the sequences. The architecture allows for horizontal connections between the GRU cells within the same row, which propagate temporal information forward across the time sequence. Additionally, vertical connections between different rows, representing different temporal scales, enable the integration of multi-scale temporal features, ensuring that each temporal level informs the others. The final hidden states from each temporal scale, i.e., the weekly, daily, and hourly outputs, are then used to make the final predictions. These outputs are effectively combined to generate a comprehensive prediction that encapsulates information from all three temporal scales. This temporal module is a critical component of the model, integrating multi-scale temporal data to enhance prediction accuracy. The structured configuration of the GRU cells ensures that both short-term and long-term dependencies are captured, providing predictions that are informed by comprehensive temporal insights.
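The four GRU equations above can be sketched in a few lines of NumPy. This is an illustrative single-cell step (biases omitted for brevity, names hypothetical), not the paper’s TensorFlow/PyTorch implementation:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_cell(x_t, h_prev, W_z, W_r, W_h):
    """One GRU time step over the concatenated input [h_(t-1), x_t]."""
    v = np.concatenate([h_prev, x_t])
    z = sigmoid(W_z @ v)                         # update gate
    r = sigmoid(W_r @ v)                         # reset gate
    v_r = np.concatenate([r * h_prev, x_t])      # reset-modulated input
    h_tilde = np.tanh(W_h @ v_r)                 # candidate hidden state
    return (1.0 - z) * h_prev + z * h_tilde      # final hidden state
```

Running one such cell per temporal scale (weekly, daily, hourly) and chaining the hidden states across time steps yields the multi-layered structure described above.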
3.9. Prediction module
The Prediction Module, shown in Fig 8, combines the features learned from the spatial relationships captured by the GCN and the time-based patterns modelled by the GRU network to produce a final traffic flow prediction. The inputs to the fully connected layers (FCLs) are the outputs from the temporal module and the outputs from the spatial module, which represent the time-based features and the spatial relationships between different traffic sensors. These input vectors are combined into a single feature vector, which is then fed into the first FCL. The fully connected structure includes two hidden layers. The first hidden layer receives the combined feature vector as its input; its nodes are indexed j = 1, 2, …, 63. The process in this layer can be described by the equation:
where h₁ is the output of the first hidden layer, W₁ is the weight matrix, b₁ is the bias vector, and σ is the activation function, here a ReLU function. ReLU is chosen as the activation function due to its ability to introduce non-linearity into the model while maintaining computational efficiency. Non-linearity is crucial for capturing complex traffic flow patterns that a linear model would miss. ReLU is defined as:
This simple yet powerful function outputs the input directly if it is positive and outputs zero otherwise. Its use helps mitigate the vanishing gradient problem, common in deep networks, thereby ensuring faster convergence and more stable training. The output from the first hidden layer, h₁, is then fed into the second hidden layer, whose nodes are indexed k = 1, 2, …, 31. The operation in this layer is given by:
where h₂ is the output of the second hidden layer, W₂ is the weight matrix, and b₂ is the bias vector. Finally, the second hidden layer’s output, h₂, is passed into the output layer, which generates the predicted traffic flow value ŷ. This final prediction is calculated using the equation:
where w_out is the weight vector connecting the second hidden layer to the output, and b_out is the bias term. The use of ReLU activation in the hidden layers helps the model effectively learn and capture complex patterns in the data. The final output ŷ is then compared with the actual traffic flow to measure the prediction error, using MSE. This ensures that the fully connected layers effectively translate the spatial and temporal features learned by the model into accurate and reliable traffic flow predictions.
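For concreteness, the two-hidden-layer prediction head can be sketched as below. The layer sizes (63 and 31) follow the description above; the function and parameter names are illustrative only, and the weights would normally be learned:

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def predict_flow(f, W1, b1, W2, b2, w_out, b_out):
    """Two ReLU hidden layers followed by a single linear output neuron."""
    h1 = relu(W1 @ f + b1)         # first hidden layer (63 nodes)
    h2 = relu(W2 @ h1 + b2)        # second hidden layer (31 nodes)
    return w_out @ h2 + b_out      # predicted traffic flow (scalar)
```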
4. Experiments
The implementation of the STF-GGRU model, as illustrated in Fig 9, begins with data aggregation into daily, hourly, and weekly intervals to capture temporal patterns. The D-KNN method extracts feature arrays (Kf) and spatial arrays (Ks), which are aligned using CKA to generate the ISTFA matrix, encapsulating spatiotemporal dependencies. The ISTFA matrix is then split into validation, training, and testing subsets for hyperparameter tuning, model training, and evaluation. The STF-GGRU structure is configured with GRU layers for temporal dependencies and GCN layers for spatial relationships, followed by fully connected layers for feature integration. The model undergoes fine-tuning and evaluation to produce the final predictive model for traffic flow forecasting. Each step of this process will be explained in detail in the subsequent sections.
5. Datasets
This research employs the PeMSD4 and PeMSD8 datasets, which are well-established benchmarks for traffic flow prediction. Tables 3 and 4 showcase samples from these datasets. The PeMSD4 dataset comprises traffic data collected from 307 sensors between January 1, 2018, and February 28, 2018. In contrast, the PeMSD8 dataset includes data from 170 sensors recorded from July 1, 2016, to August 31, 2016. Both datasets feature records from multiple sensors, capturing essential metrics such as traffic flow, occupancy, and speed at various timestamps. For example, in the PeMSD4 dataset, Sensor 0 recorded a flow of 62 vehicles on “2018-01-01T00:00:00Z,” with an occupancy rate of 0.0074 and a speed of 67.3 km/h. Similarly, in the PeMSD8 dataset, Sensor 0 recorded a flow of 133 vehicles on “2016-07-01T00:00:00Z,” with an occupancy rate of 0.0637 and a speed of 65.8 km/h.
6. Evaluation of metrics and baseline models
To evaluate the performance of the proposed model, three key metrics are employed: Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), and Mean Absolute Percentage Error (MAPE). These metrics assess the accuracy of the predicted traffic flow, denoted as ŷ = {ŷ₁, ŷ₂, …, ŷ_N}, compared to the actual observed traffic flow, y = {y₁, y₂, …, y_N}, where N represents the number of stations in the traffic network.
RMSE measures the standard deviation of the prediction errors, giving greater weight to larger errors. Lower RMSE values indicate higher prediction accuracy, especially when large prediction errors are minimized. It is calculated as:
MAE provides the average magnitude of the prediction errors, offering a straightforward measure of accuracy by averaging absolute differences between predicted and actual values. It is determined by:
MAPE expresses error as a percentage, highlighting the model’s accuracy in relative terms. It’s particularly useful for understanding prediction performance across different scales in traffic flow. It is determined by:
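The three metrics can be written directly from their definitions. A minimal NumPy version is shown below; the small eps guard against zero traffic counts in MAPE is an implementation detail added here, not part of the paper’s formulation:

```python
import numpy as np

def rmse(y, y_hat):
    """Root Mean Squared Error: penalizes large errors more heavily."""
    return np.sqrt(np.mean((y - y_hat) ** 2))

def mae(y, y_hat):
    """Mean Absolute Error: average magnitude of the errors."""
    return np.mean(np.abs(y - y_hat))

def mape(y, y_hat, eps=1e-8):
    """Mean Absolute Percentage Error, in percent (eps guards zero flows)."""
    return 100.0 * np.mean(np.abs((y - y_hat) / (y + eps)))
```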
Both RMSE and MAE offer insights into the differences between predicted and actual traffic flows, with lower values indicating higher prediction accuracy. The proposed STF-GGRU model for long-term traffic flow prediction is compared against the following baseline methods:
- Autoregressive Integrated Moving Average (ARIMA) [37]: This model treats the temporal data sequence as a stochastic sequence and utilizes autocorrelation analysis to forecast future values based on the historical time series data.
- Gated Recurrent Unit (GRU) [38]: This model applies GRU networks to urban traffic flow prediction. By integrating traffic data, episodic events, and weather information, it aims to improve the accuracy of traffic congestion forecasts.
- Long Short-Term Memory (LSTM) [39]: This model addresses the issue of gradient vanishing using a “gate” mechanism, making it a popular choice for time series prediction. It takes the traffic flow from the previous time step as input to predict the flow at the next time step.
- K-Nearest Neighbour LSTM (KNN-LSTM) [40]: This model, based on KNN and two layers of LSTM, captures the spatial dependence on the most related neighbouring stations and mines the variability of the traffic flow.
- Spatiotemporal Graph Convolutional Network (STGCN) [41]: This model is based on a fixed Laplacian matrix for spatiotemporal data.
- Multi-Component Spatiotemporal GCN (MSTGCN) [42]: This is a spatiotemporal graph convolution model incorporating a spatiotemporal attention mechanism. It utilizes the road network structure to model station relationships, forming the foundation for graph convolution. It has demonstrated remarkable accuracy in predicting road traffic flow.
- Attention-based Periodic-Temporal Neural Network (APTN) [33]: This model uses spatiotemporal attention to capture complex dependencies by incorporating encoder and temporal attention mechanisms. Its ability to enable inter-node interactions while preventing overfitting makes it effective for traffic prediction challenges.
6.1. Experiment setup
The PeMSD4 and PeMSD8 datasets were used to evaluate the STF-GGRU model, split into training, validation, and test sets with a 7:1.5:1.5 ratio, and 5-fold cross-validation ensured robust results. Input features (traffic flow, occupancy, speed) were normalized to [0, 1], and data was aggregated into hourly, daily, and weekly segments to capture multi-scale temporal patterns. In terms of architecture, the model includes GRU layers and a two-layer GCN with 64 units each. Key hyperparameters were optimized: a learning rate of 0.001, batch size of 64, dropout rate of 0.3, and up to 100 training epochs with early stopping and a learning rate scheduler reducing the rate by 0.1 if validation loss plateaued for 10 epochs. In terms of model sensitivity, increasing the number of GCN layers from 1 to 3 reduced RMSE by 8% on PeMSD4, and 64 GRU units were found to balance efficiency and accuracy. Additionally, a dynamic KNN threshold of 0.7 in ISTFA improved MAE by 5%. To ensure robust experimentation, experiments were conducted on Google Colab Pro (Intel Xeon, 13GB RAM, NVIDIA Tesla T4 GPU) using TensorFlow, PyTorch, and Scikit-learn. Metrics such as RMSE, MAE, and MAPE benchmarked STF-GGRU against baseline models, including ARIMA, GRU, LSTM, KNN-LSTM, STGCN, MSTGCN, and APTN. Finally, paired t-tests (0.05 significance level) confirmed the importance of hyperparameter optimization for achieving state-of-the-art performance in traffic flow prediction.
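The preprocessing steps described above, min-max normalization to [0, 1] and the chronological 7:1.5:1.5 split, can be sketched as follows. This is a simplified illustration with hypothetical function names, not the exact pipeline:

```python
import numpy as np

def minmax_scale(x):
    """Scale each feature column (flow, occupancy, speed) to [0, 1]."""
    lo, hi = x.min(axis=0), x.max(axis=0)
    return (x - lo) / (hi - lo + 1e-12)

def split_7_15_15(x):
    """Chronological 7:1.5:1.5 split into train / validation / test."""
    n = len(x)
    i1, i2 = int(0.70 * n), int(0.85 * n)
    return x[:i1], x[i1:i2], x[i2:]
```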
7. Results
7.1. Comparison with other baseline methods
The STF-GGRU model was evaluated against baseline models using RMSE, MAE, and MAPE metrics on the PeMSD4 and PeMSD8 datasets, as summarized in Tables 5 and 6. The results were obtained by averaging the outcomes over 5 independent runs with different random seeds to ensure robustness, and the standard deviations are reported to reflect performance variability. These comparisons demonstrate STF-GGRU’s ability to significantly outperform traditional and neural network-based models due to optimized parameter choices and its advanced spatiotemporal modelling capabilities. In particular, the dynamic KNN threshold in the ISTFA module was set to 0.7, ensuring effective spatiotemporal feature alignment. Additionally, the use of two GCN layers enabled the model to capture intricate spatial dependencies across sensors, while the inclusion of 64 GRU units balanced the ability to model long-term temporal patterns with computational efficiency. These parameter optimizations resulted in a significant reduction in RMSE values, achieving 27.18 ± 0.85 for PeMSD4 and 11.01 ± 0.50 for PeMSD8. In contrast, the GRU model recorded an RMSE of 58.5 ± 2.5 on PeMSD4, and ARIMA showed higher RMSE values of 54.14 ± 2.00 on PeMSD4 and 44 ± 1.80 on PeMSD8, as shown in Fig 10.
Further illustrating its effectiveness, MAE results confirmed STF-GGRU’s superior performance. The model achieved the lowest MAE values of 16.2 ± 0.60 for PeMSD4 and 10.75 ± 0.40 for PeMSD8. By comparison, ARIMA, GRU, and KNN-LSTM exhibited much higher MAE values with greater variability. This performance was supported by a carefully tuned learning rate of 0.001 and a dropout rate of 0.3, which enhanced convergence stability and mitigated overfitting, as shown in Fig 11, where error bars represent the standard deviation across 5 independent runs. This highlights the model’s consistent performance and confirms minimal overlap with competing methods. The MAPE measures the average percentage error between predicted and actual values, with lower values indicating higher accuracy. Fig 12 compares the MAPE of various models, highlighting the STF-GGRU model’s superior performance due to optimized spatiotemporal feature extraction. All results are averaged over five independent runs, with standard deviations included to ensure reliability and illustrate performance variability. For the PeMSD4 dataset, traditional models like ARIMA (25.3 ± 1.2%) and GRU (23.5 ± 1.1%) exhibited higher MAPE values, while LSTM (21.2 ± 1.0%) and KNN-LSTM (21.1 ± 0.9%) showed moderate improvements. Advanced spatiotemporal models, STGCN (20.4 ± 0.8%) and MSTGCN (19.86 ± 0.7%), further reduced errors. STF-GGRU achieved the lowest MAPE of 9.67 ± 0.3%, attributed to parameters like a dynamic KNN threshold of 0.7 and two GCN layers for capturing intricate spatial dependencies. In the less complex PeMSD8 dataset, all models performed better. ARIMA (18.6 ± 1.0%), GRU (15.9 ± 0.9%), and LSTM (15.43 ± 0.8%) showed steady improvements, while KNN-LSTM (15.2 ± 0.7%) and STGCN (14.64 ± 0.6%) demonstrated further accuracy gains. MSTGCN reduced errors to 14 ± 0.5%, but STF-GGRU achieved the lowest MAPE of 6.97 ± 0.2%.
The learning rate of 0.001 and a dropout rate of 0.3 played critical roles in preventing overfitting and ensuring convergence, validating the robustness of STF-GGRU across datasets. Overall, these results highlight the importance of strategic parameter selection, optimization, and robustness in evaluation. The STF-GGRU model’s performance on PeMSD4 and PeMSD8 demonstrates its ability to deliver state-of-the-art accuracy in traffic flow prediction while maintaining stability across multiple runs, surpassing other benchmark models.
7.2. Comparison of STF-GGRU and other baseline methods under data variability
The model performance variability analysis includes two case studies that assess how well the model adjusts to varying conditions. Case 1 investigates its spatial precision using data collected from a variety of sensors. Case 2 examines how consistently the model predicts different traffic volumes. Together, these analyses assess the model’s reliability and practicality in predicting traffic conditions.
7.3. Case 1: Scalability of STF-GGRU at different sensors
This case study examines the scalability and predictive performance of the proposed STF-GGRU model in comparison with several benchmark methods across all sensors in the PeMSD4 and PeMSD8 datasets. The strong performance of STF-GGRU is mainly attributed to the effective integration of dynamic KNN-based graph construction, stacked GCN layers, GRU-based temporal modeling, and the ISTFA module. Careful parameter tuning, including optimization of the dynamic KNN threshold, selection of two GCN layers, and a GRU hidden size of 64, played an important role in improving model adaptability and prediction accuracy. As shown in Fig 13, STF-GGRU closely follows the ground-truth traffic patterns on PeMSD4 and clearly outperforms the baseline models, demonstrating its ability to capture complex spatiotemporal relationships. Similarly, Fig 14 illustrates stable and reliable performance on the PeMSD8 dataset despite the smaller number of sensors, highlighting the robustness of the proposed model. The quantitative results presented in Figs 15, 16 further confirm the superiority of STF-GGRU. On PeMSD4, the model achieves the lowest RMSE of 30.15, MAE of 22.48, and MAPE of 10, along with the highest coefficient of determination with an R² value of 0.91, outperforming APTN and other competing approaches. These performance trends are consistently maintained on PeMSD8, where STF-GGRU again records the lowest prediction errors and the highest R² of 0.93, demonstrating strong generalization across datasets with different network sizes.
In terms of computational complexity, the STF-GGRU model combines GCNs, GRUs, and the ISTFA module to capture complex spatiotemporal dependencies. Although this integrated design leads to state-of-the-art prediction accuracy, it also results in increased computational requirements. The cost of the GCN component grows quadratically with the feature dimension, which becomes computationally expensive, especially in dense graph structures. The GRU component remains relatively efficient, but its cost increases with the length of the temporal sequence. The ISTFA module represents the main source of computational overhead, particularly for large-scale sensor networks. Overall, the combined complexity of these components highlights the trade-off between high predictive performance and computational cost.
Despite these computational demands, STF-GGRU remains feasible for medium-sized datasets such as PeMSD4 and PeMSD8 when deployed on modern GPU platforms. However, real-time and large-scale deployment scenarios may benefit from further algorithmic optimizations, including sensor clustering strategies and sparse matrix computations, to reduce computational and memory overhead. This scalability is further validated by the experimental results presented in Table 7, where the number of nodes varied from 50 to 300. Across this range, prediction accuracy remains stable, with only marginal increases observed in MAE, RMSE, and MAPE. At the same time, training time, GPU memory consumption, and inference latency increase gradually and remain within practical limits. These results confirm that STF-GGRU scales effectively while maintaining a strong balance between accuracy and efficiency. In summary, STF-GGRU not only achieves state-of-the-art predictive performance but also demonstrates robust scalability, making it well suited for real-world deployment in large-scale urban traffic management systems.
7.4. Performance of the model prediction at different traffic volumes
A focused analysis of our traffic prediction model’s performance across various sensor volumes and times uses RMSE as the key metric to assess accuracy. By comparing actual and predicted traffic data, we evaluate the model’s ability to capture traffic patterns, particularly during rush hours and transitional periods. The comparison between the predicted and actual traffic flow is illustrated in Figs 17–20 for four representative sensors from the PeMSD4 and PeMSD8 datasets. Specifically, Fig 17 presents the prediction results for Sensor 16 on PeMSD4, while Fig 18 shows the results for Sensor 250 on PeMSD4. The corresponding RMSE values of 17.8 and 15.79 indicate that the proposed model accurately captures the underlying traffic trends for both sensors. In addition, Fig 19 illustrates the prediction performance for Sensor 14 on PeMSD8, which achieves the lowest RMSE value of 13, demonstrating superior accuracy on this dataset. Finally, Fig 20 depicts the results for Sensor 105 on PeMSD8, further confirming the model’s ability to closely follow real traffic dynamics across different sensors and network scales.
The model performance is further examined in Figs 21–24. Fig 21 shows the prediction results for Sensor 250 on the PeMSD4 dataset during the 15:00–16:00 period, where an RMSE of 27.96 indicates the increased difficulty of prediction under highly variable traffic conditions. Fig 22 presents the results for Sensor 160 on PeMSD4 during the rush hour, achieving better accuracy with an RMSE of 20.90, as the predicted values closely follow the actual traffic patterns. For the PeMSD8 dataset, Fig 23 illustrates the results for Sensor 14 during the rush hour, which records the lowest RMSE of 18.43, highlighting the model’s strong performance during peak traffic periods. In addition, Fig 24 shows the prediction results for Sensor 105 on PeMSD8 during the 14:00–15:00 period, with an RMSE of 17.81, demonstrating reliable performance even during transitional traffic conditions. Overall, these results suggest that the model performs more effectively during rush-hour periods than during non-rush or transitional intervals. Although prediction accuracy varies slightly across sensors, the consistently reasonable RMSE values across all cases confirm that the model provides reliable and accurate traffic flow predictions.
8. The ablation analysis of STF-GGRU
The STF-GGRU model is designed to capture traffic flow patterns by jointly modelling temporal evolution, spatial relationships, and adaptive feature importance. To examine how each component contributes to prediction accuracy, an ablation study is conducted using data from Sensor 75 in the PeMSD4 dataset. This sensor recorded the highest traffic volume on February 9, 2018, particularly during the morning rush hour shown in Fig 25. The traffic profile on this day is highly dynamic, showing a rapid rise in flow, pronounced peaks, and noticeable fluctuations. These characteristics make it a suitable case for evaluating the model under realistic and challenging conditions.
As illustrated in Fig 25, traffic flow increases sharply during the rush-hour period and then gradually decreases. Several short-term variations are also observed throughout the day. Accurately capturing this behaviour requires a model that can respond quickly to changes over time. This requirement is clearly reflected in Fig 26, where the temporal module is removed. Without temporal modelling, the predicted traffic curve becomes overly smooth and lags the actual data. The model struggles to follow sudden increases and sharp drops in traffic, particularly during peak congestion. As a result, prediction accuracy deteriorates significantly. The RMSE increases to approximately 79.86, compared to 29.62 when the temporal module is included. This large gap confirms that temporal modelling is the most influential factor in achieving accurate traffic predictions.
The importance of spatial relationships is examined in Fig 27 by removing the spatial module. In this case, the model continues to follow the overall temporal pattern. However, noticeable deviations from the actual traffic flow appear during transition phases, such as the buildup and release of congestion. These differences suggest that information from neighbouring sensors helps refine local predictions. Without spatial modelling, the RMSE increases to about 32.48. This result indicates that spatial modelling provides a meaningful improvement, although its impact is smaller than that of temporal modelling and adaptive feature extraction.
In addition to temporal modelling, STF-GGRU relies on the ISTFA module to adaptively highlight the most relevant spatiotemporal features. When this module is removed, as shown in Fig 28, the predicted traffic flow becomes less responsive to rapid changes. Peak values are underestimated, and fluctuations are smoothed out, especially during busy periods. Although the general trend is still captured, the model fails to reflect the true intensity of congestion. This limitation leads to an RMSE of approximately 41.01, compared to 29.74 for the complete model. These results show that adaptive feature extraction plays a key role in handling complex traffic behaviour.
Further insight is gained by analysing the individual elements within the ISTFA module. Fig 29 shows the effect of removing the CKA alignment while keeping the D-KNN mechanism. The predictions remain generally accurate but display slight inconsistencies. Alignment with peak values is also reduced. This behaviour results in an RMSE of approximately 33.9. The result suggests that CKA helps stabilize feature representations and improve consistency, even if its impact is relatively subtle. Similarly, Fig 30 illustrates the effect of removing the D-KNN mechanism. Without dynamic neighbour selection, the model becomes less sensitive to sudden traffic changes, particularly during peak periods. This leads to an RMSE of approximately 35.4. These findings indicate that D-KNN contributes to adaptability under varying traffic densities.
Overall, the ablation results show that the temporal module has the strongest influence on prediction accuracy. It is followed by the ISTFA module, which enhances the model’s ability to adapt to complex and rapidly changing traffic patterns. The spatial module further improves prediction quality by capturing interactions between sensors. The D-KNN and CKA components provide additional gains through adaptability and representation stability. Together, these components allow STF-GGRU to produce accurate and reliable traffic flow predictions under challenging real-world conditions.
9. Conclusion and future work
In this paper, we introduced STF-GGRU, a novel traffic flow prediction framework designed to address the limitations of existing models in handling complex, nonlinear, and dynamic urban traffic patterns. The key innovation of our approach is the Integrated Spatiotemporal Feature Alignment (ISTFA) module, which uniquely combines Dynamic K-Nearest Neighbor (D-KNN) and Centered Kernel Alignment (CKA) to dynamically integrate both spatial proximity and feature-based similarities into a unified graph representation. Through extensive experiments on large-scale real-world datasets (PeMSD4 and PeMSD8), STF-GGRU demonstrated superior predictive accuracy compared to state-of-the-art baseline models across multiple short-term and long-term horizons. Our additional scalability analysis further validated the model’s robustness and computational efficiency under varying network sizes and sensor densities, highlighting its practical viability for deployment in large-scale intelligent transportation systems and real-time urban mobility management. The proposed STF-GGRU model can be practically implemented within modern urban traffic management systems by integrating it into either centralized traffic control centers or distributed edge computing infrastructures, such as roadside processing units or local traffic servers. Real-time traffic flow data collected from existing sensing networks, including loop detectors, traffic cameras, GPS-equipped vehicles, and connected vehicle systems, can be streamed directly to the model for continuous prediction without requiring fundamental changes to current infrastructure. The adaptive nature of the ISTFA module allows STF-GGRU to process dynamic traffic changes promptly and provide timely flow forecasts that can inform signal control, congestion mitigation strategies, and route guidance services.
Furthermore, computational cost analysis shows that the model maintains moderate training times, reasonable GPU memory usage, and fast inference speeds (ranging from 18 to 42 milliseconds), ensuring its practicality for real-time deployment. The model can also be periodically updated using newly collected data to sustain accuracy over time. These features collectively enable STF-GGRU to be seamlessly integrated into existing smart city platforms and intelligent transportation infrastructures, supporting more efficient, adaptive, and data-driven urban mobility management. In contrast to prior works that primarily focus on either static spatial relationships or simplified temporal dependencies, STF-GGRU offers a comprehensive and adaptive framework capable of modeling intricate spatiotemporal correlations. This original contribution opens new avenues for advanced traffic prediction research and provides a solid foundation for developing intelligent, responsive, and scalable traffic management solutions in smart cities. Future work will focus on integrating external factors such as weather conditions, major events, and multi-modal transportation data to further enhance prediction accuracy and system resilience. Additionally, exploring edge and fog computing deployment architectures will be pursued to support distributed, real-time traffic control applications.