Abstract
With the rapid growth in global air traffic volume, accurate airspace traffic prediction has become critical for enhancing aviation safety and promoting sustainable airspace management. However, most existing approaches have demonstrated success primarily in short-term forecasting, whereas long-term traffic prediction remains challenging. This difficulty arises because, as the temporal span of the forecast grows, the predictive model’s ability to learn traffic dynamics declines and positional information is more easily lost over long sequences, resulting in diminished forecasting accuracy. To address these challenges, this paper proposes a novel intrinsic dynamics capture architecture, termed IDCformer, which capitalizes on the intrinsic characteristics of airspace flow to achieve long-term sequence prediction. IDCformer comprises three core modules: a Trend and Seasonality Extraction (TSE) module, a Patch Time Series Transformer (PatchTST) with enhanced feature representation and position awareness, and a Local Self-Attention (LAT) module. Specifically, the TSE module first preprocesses the input data to stabilize it and extract long-term dynamics; second, the position-aware PatchTST alleviates the loss of temporal order in long sequences by integrating convolutional positional signals; finally, the LAT module provides hierarchical refined processing to capture local fluctuations, thereby improving the accuracy of long-term forecasting. Experimental results based on real-world air traffic data indicate that our method surpasses other state-of-the-art models in predictive performance. Furthermore, this paper investigates the capacity of IDCformer to incorporate external information; the findings demonstrate that when external data are introduced as additional input features, IDCformer’s long-term prediction performance is further enhanced, illustrating its potential for effectively leveraging multisource information.
Citation: Liu B, Tang W, Huang Z (2026) An intrinsic dynamics capture network for long-term airspace traffic prediction. PLoS One 21(1): e0338949. https://doi.org/10.1371/journal.pone.0338949
Editor: Guangyin Jin, National University of Defense Technology, CHINA
Received: June 3, 2025; Accepted: November 28, 2025; Published: January 5, 2026
Copyright: © 2026 Liu et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: As the data involves the specific traffic conditions of the airport, we are not authorized to publicly disclose it. However, researchers seeking access to relevant data may apply to the appropriate authority—in this case, Fraport AG (the operator of Frankfurt Airport, which oversees the air traffic data in question). Requests can be directed to Fraport AG via their official email: info@fraport.de.
Funding: This work was supported by the Civil Aviation Administration of China Safety Capacity Building Funding Projects with grants awarded to WT (MHAQ2022032 and MHAQ2025021). This work was also supported by the Civil Aviation Administration of China Education Talent Funding Project with a grant awarded to WT (MHJY2022015).
Competing interests: The authors have declared that no competing interests exist.
Introduction
The air transport industry serves as a crucial hub connecting global regions, playing an irreplaceable role in promoting economic growth and enhancing cultural exchanges. However, the rapid increase in air traffic has posed unprecedented challenges to airspace traffic management [1–3]. Effective airspace traffic prediction is essential not only for ensuring aviation safety and optimizing resource allocation but also for enhancing the overall efficiency of the aviation system [4,5].
Long-term airspace traffic forecasts enable airlines to anticipate future traffic trends, facilitating more rational route planning [6]. While short-term forecasts are vital for real-time scheduling and emergency responses, they have a limited prediction range and struggle to address fluctuations and trends over extended periods. In contrast, long-term forecasts, covering time scales of several days [7,8], can better capture the impacts of weather changes, flight schedule adjustments, and holiday effects on airspace traffic. This approach aids airlines in optimizing flight scheduling and resource allocation and provides air traffic control (ATC) with more time for airspace planning and adjustments. Additionally, long-term forecasts assist in developing more reasonable air density control strategies, reducing delays and congestion, and improving the operational efficiency of the overall aviation system [9,10]. Therefore, developing accurate long-term airspace traffic prediction models is significant for enhancing the flexibility and responsiveness of aviation operations.
Currently, airspace traffic prediction primarily relies on historical data analysis, statistical models, and recent machine learning methods [11–13]. These approaches have achieved some success in short-term prediction but face limitations in long-term forecasting [14,15]. Firstly, traditional statistical models often assume linear data properties, making it difficult to capture complex nonlinear dynamic changes. Secondly, although machine learning methods excel in handling large-scale data and complex pattern recognition [16], their generalization ability in long-term trend and seasonal forecasting needs further improvement. Moreover, airspace traffic is influenced by various factors, including weather conditions and aviation policies, whose complex interactions increase the difficulty of long-term prediction [17,18].
In recent years, deep learning technology has emerged as a novel prediction tool in airspace traffic prediction research, demonstrating potential in handling complex data and nonlinear problems [19–21]. Unlike traditional statistical models, deep learning can automatically extract and learn complex relationships hidden within large-scale data through multi-layer network structures and powerful parameter learning capabilities [22,23]. Several studies have recognized that airspace traffic is affected by multiple factors, such as weather variations, flight altitude, aviation policies, and holiday effects [24–26]. For example, Liu et al. [27] combined weather data to analyze the effects of different meteorological conditions, such as snowstorms and high winds, on airspace traffic flow, proposing a multifactor fusion prediction model. Han et al. [28] improved the prediction model by incorporating aviation policy and holiday data, demonstrating that these external factors significantly impact traffic flow and substantially improve prediction accuracy when introduced into the model. Additionally, Spatharis et al. [29] proposed a reinforcement learning-based traffic prediction model considering not only airspace traffic flow but also flight altitude and policy differences among airlines.
Despite the significant progress achieved by these advanced technologies, their performance in long-term forecasting still faces fundamental structural limitations. The primary challenges in long-term prediction are not the external factors themselves—which often act as valuable auxiliary variables—but rather the model’s intrinsic ability to capture long-term dependencies and mitigate error accumulation. For example, architectures reliant on recurrent or attention mechanisms, while powerful, often face inherent difficulties in preserving precise temporal context and avoiding information decay over the extended horizons required for long-term forecasting. While incorporating external factors can enhance performance, it cannot compensate for a core architecture that is ill-suited for these long-horizon tasks. Consequently, a critical challenge in this domain is to design a model architecture that is inherently robust to these long-term challenges, capable of maintaining predictive accuracy by effectively modeling the intrinsic dynamics of the traffic data itself.
To overcome the limitations of existing methodologies, this paper introduces a deep learning model specifically designed to analyze trends, fluctuations, and seasonality in airspace traffic dynamics. Specifically, a linear layer is utilized to capture the data’s linear relationships, while the data concurrently flows into a Trend and Seasonality Extraction (TSE) layer. This layer, comprising two convolutional networks, is tailored to extract nonlinear relationships, trends, and seasonal features from airspace traffic data. Leveraging the characteristics of convolutional operations and patching techniques, the processed features are divided into multiple patches embedded with implicit positional information. These patches are subsequently fed into a Transformer [30] encoder to enable the learning and capture of global patterns. Finally, a localized self-attention mechanism is applied to the globally learned features, enabling the model to capture short-term local fluctuations effectively and achieve comprehensive perception of airspace traffic data.
Overall, our study makes the following three main contributions:
- To improve the prediction accuracy of long-term airspace traffic, we designed a network architecture that captures the internal dynamics of traffic from three synergistic aspects. First, a Trend and Seasonality Extraction (TSE) module pre-processes the data to stabilize and extract long-range dynamics, tackling the issue of declining long-term accuracy. Second, an enhanced position-aware PatchTST [31] mitigates the loss of temporal order over long sequences by integrating robust convolutional positional signals. Finally, a Local Self-Attention (LAT) module provides a hierarchical refinement, capturing short-term local fluctuations to improve the precision of the global forecast. This process of pre-extraction, positionally-aware encoding, and local refinement ensures both stability and accuracy in long-term predictions.
- To investigate the model’s capacity to integrate external information, we systematically analysed exogenous variables, including flight altitude, heading angle, and latitude/longitude. By incorporating these factors as input features, we quantified the model’s ability to learn from external information. Interestingly, the experimental results indicate that the combined utilisation of multiple external variables does not always outperform the use of a single variable. This finding suggests potential redundancy or interference among the external variables.
- Experiments on real-world air traffic datasets demonstrate that our proposed method outperforms state-of-the-art machine learning and deep learning methods.
Literature review
Airspace traffic forecasting is crucial in aviation management, aiming to optimize air traffic resource allocation, reduce delays [32], and enhance operational efficiency [33] by predicting future traffic changes. The rapid growth of global air traffic has increased airspace complexity, making accurate traffic prediction increasingly important. Over the past decades, researchers have developed various methods to address this problem, ranging from traditional statistical models to modern machine learning and deep learning techniques. Each method has unique advantages and limitations, especially when facing the complex and dynamic airspace environment. Achieving traffic predictions that meet expected criteria has become a significant research challenge. To contextualize the specific challenges addressed in this paper, this review is organized as follows: we first touch upon the broader field of land-based traffic prediction, then survey methods for short-term air traffic forecasting, and finally focus on the advancements and remaining gaps in long-term air traffic forecasting.
Land-based traffic flow prediction
Traffic flow prediction is a mature and extensively studied field, particularly in the context of land-based transportation systems such as urban road networks. Research in this area has progressed from traditional statistical models to machine learning and, more recently, deep learning approaches designed to handle complex spatio-temporal dependencies. State-of-the-art methods for ground traffic often employ Graph Neural Networks (GNNs) [34] to explicitly model the fixed topology of road networks, effectively capturing the intricate spatial relationships between interconnected roads and intersections. This is necessary because ground traffic is characterized by high stochasticity and complex, grid-like spatial dependencies influenced by numerous factors like traffic signals, accidents, and local events. In ground transportation safety research, two studies based on data from North Carolina, USA, focus on vulnerable road users (VRUs) [35,36]. One develops a priority-based framework using 2014–2019 data to identify high-risk locations, noting the changing impacts of factors such as traffic control (declining influence) and daylight (growing influence) on crash severity; the other uses nine years of data and seasonal random-parameter models to reveal seasonal variation in pedestrian crash factors (e.g., hit-and-run in spring, alcohol impairment in summer) and higher crash counts in darker seasons. Furthermore, recent studies have demonstrated the extensive potential of advanced deep learning and stochastic optimization frameworks in addressing complex challenges within intelligent transportation systems, ranging from infrastructure planning and anomaly detection to data synthesis under scarcity [37–41].
In contrast, airspace traffic is more highly regulated, follows more structured long-range trends, and its network topology is more fluid and less rigidly defined than a physical road graph [42,43]. These fundamental differences necessitate the development of specialized models tailored to the unique dynamics of the aviation domain, rather than simply adapting land-based models. This review will therefore focus on methods specifically applied to or suitable for air traffic forecasting.
Short-term air traffic flow prediction
Short-term forecasting is essential for tactical air traffic management, focusing on real-time adjustments and safety assurance. In the early stages, traditional statistical models were widely used for this purpose. For example, Nieto et al. [44] employed Autoregressive Integrated Moving Average (ARIMA) models to effectively capture short-term traffic fluctuations in low-complexity environments. Similarly, the dynamic updating capability of Kalman filtering [45] has been applied in air traffic control to adjust predictions in real time by incrementally updating flight data, providing timely and accurate forecasts in response to immediate changes [46].
More recently, deep learning methods have been applied to short-term tasks. Lin [47] proposed a model based on ConvLSTM modules to accurately predict flow distributions at varying altitudes, while Jardines [48] developed a Convolutional Neural Network (CNN)-based model to predict thunderstorms, enhancing tactical air traffic planning. To handle the complex network topology of the airspace system in real time, some studies have used Graph Convolutional Networks (GCNs) to effectively model the intricate relationships between nodes in large-scale airspace systems [49]. In parallel, other advanced Transformer-based architectures, such as the Temporal Fusion Transformer (TFT), have been successfully applied to predict numerical airport arrival delays at a high temporal resolution, demonstrating strong performance with multi-factor inputs [50]. Valuable methodological insights can also be drawn from the adjacent domain of urban rail transit (URT), where researchers have developed sophisticated models to handle sharp, non-routine traffic fluctuations during events, holidays, or public health emergencies. These works demonstrate the efficacy of advanced deep learning architectures for such scenarios. For instance, Transformer-based models have been designed to explicitly separate and predict the “extra” passenger flow generated by large-scale events. Furthermore, Graph Neural Networks (GNNs) have been extended to not only capture spatial dependencies but also to integrate multi-frequency temporal patterns (e.g., real-time, daily, and weekly) and even incorporate physics-guided loss functions to enhance model interpretability. Other approaches have utilized multi-task learning frameworks to jointly predict related variables like station inflow and outflow, capturing their complex interactions [51–54]. While effective for immediate operational needs, these methods are primarily designed for tactical decision-making and often struggle to capture the broader patterns required for long-range strategic planning.
Long-term air traffic flow prediction
Long-term forecasting, which covers time scales of several days, is critical for strategic planning, including optimizing airline schedules and airspace resource allocation. This task requires models that can effectively capture underlying trends and seasonality. Traditional methods like the Holt-Winters technique have been used to forecast aggregated passenger data by decomposing the time series into trend and seasonal components [55].
To handle the non-linearities inherent in long-term data, machine learning and deep learning models have become the primary focus. Ensemble learning methods such as Random Forests have been used to analyze features like flight origins and destinations to predict air traffic flows and flight levels [12]. Deep learning models, which excel at handling complex long-term dependencies, have also been a major area of research [56–58]. For instance, Gui et al. [59] applied Long Short-Term Memory (LSTM) networks to airway traffic prediction, demonstrating strong performance, particularly when abnormal factors are considered.
A more recent and transformative trend is the application of Large Language Models (LLMs) to spatio-temporal forecasting. This emerging field aims to create spatio-temporal foundational models with strong generalization capabilities. For example, UrbanGPT [60] has been proposed to integrate spatio-temporal dependency encoders with the instruction-tuning paradigm, enabling LLMs to make accurate predictions even in data-scarce or zero-shot scenarios. Alongside the development of new models, efforts are also underway to create comprehensive benchmarks, such as STBench [61], to systematically evaluate the spatio-temporal knowledge, reasoning, and application capabilities of existing LLMs.
To further improve long-term accuracy, researchers have increasingly incorporated external variables, such as weather conditions and holidays, into deep learning models [62]. However, while these advanced models have shown progress, they face significant challenges. As the number of input features grows, the computational complexity and resource requirements escalate rapidly. Furthermore, for high-dimensional inputs and multi-layer neural networks, the total number of parameters can expand swiftly, increasing model complexity without guaranteeing better performance. There is an urgent need for a model that can efficiently and effectively learn the fundamental nonlinear variations, trends, and seasonal patterns from the airspace traffic data itself, forming the primary motivation for this work.
Problem formulation
The historical traffic data for a single, predefined airspace can be represented as a two-dimensional tensor $X \in \mathbb{R}^{T \times D}$. Each element $x_t^d$ in this tensor corresponds to the value of the $d$-th feature at the $t$-th time step. The feature dimension, $D$, includes variables representing the airspace traffic flow. This historical data is fused with other relevant feature sequences (e.g., flight traffic state such as flight altitude and heading) via concatenation at each time step to form an integrated feature matrix, $\tilde{X}$. If the additional feature sequence has a dimension of $D'$, the resulting feature dimension of the integrated matrix becomes $D + D'$, i.e., $\tilde{X} \in \mathbb{R}^{T \times (D + D')}$.

Our goal is to use the historical integrated feature sequence to predict the airspace traffic over a future prediction horizon of $P$ time steps. This sequence of future predictions is denoted as $\hat{Y} = \{\hat{y}_{T+1}, \hat{y}_{T+2}, \ldots, \hat{y}_{T+P}\}$. Each value $\hat{y}_{T+p}$ in this sequence represents the predicted number of flights in a particular airspace or altitude stratum at the future time step $T + p$.

Essentially, airspace traffic prediction can be viewed as finding a suitable mapping function between historical and future data that enables it to predict future traffic based on a sequence of historical composite features:

$$\hat{Y} = f_{\theta}(\tilde{X}),$$

where $\theta$ are model parameters that need to be learnt from historical airspace data, and $f_{\theta}$ is a predictive model designed to capture the complex spatio-temporal dependencies of airspace traffic.
Model architecture
Overall structure
This paper introduces a deep learning model, IDCformer, specifically designed for long-term airspace traffic prediction. As depicted in Fig 1, the model builds upon the transformer architecture, incorporating multiple feature extraction techniques. The extracted features are divided into patches enriched with positional information, enabling stronger feature representation and positional awareness capabilities. IDCformer is composed of several modules, including the Trend and Seasonality Extraction (TSE) module, the PatchTST module with enhanced feature representation and positional awareness, and the Local Self-Attention (LAT) module.
IDCformer aims to enhance the modeling capability for complex air traffic flow data through multi-module feature extraction and patch-based processing. Specifically, the processing workflow for the input sequence is as follows: First, the input sequence undergoes parallel processing, where linear transformations are applied, and the Trend and Seasonality Extraction (TSE) module captures trend and seasonal features. Next, the processed feature representations are segmented into equal-length patches using patching techniques and mapped to a high-dimensional space. Subsequently, a Transformer encoder is employed to extract global dependencies. Building on this, a local attention mechanism is utilized to strengthen local feature representations. Finally, the predicted air traffic flow is obtained through a linear layer.
Feature extraction module
The feature extraction module applies two sub-modules, Linear and TSE, to the raw input series: the former learns the linear relationships in the data, while the latter captures the characteristics of flow changes in terms of global trends and seasonality.
The Linear module extracts implicit information directly at the sequence level through a two-layer linear mapping, using the neural network layer to process the linear relationships in the original sequence. The input sequence $X$ is linearly transformed as:

$$X_{\mathrm{lin}} = W X + b, \tag{1}$$

where $W$ is the weight matrix of the linear mapping and $b$ is the bias vector.
In time series analysis, trend and seasonality features typically manifest as correlations between neighbouring time points. Convolutional operations effectively learn smooth trends and seasonal patterns in time series due to their local connectivity and parameter-sharing properties. Fig 2 compares Recurrent Neural Network (RNN)-based methods and Convolutional Neural Network (CNN)-based methods, with Fig 2(a) clearly demonstrating the superiority of convolution in capturing trends and seasonal features.
(a) CNN-based Methods: Node w acquires and fuses information from its neighboring node p within a short time window at the current and preceding time steps to efficiently capture trends and seasonality. (b) RNN-based Methods: At the current time step, node w receives raw information from its neighboring node p and integrates latent representations from the previous time step to capture temporal features.
As shown in Fig 2(a), the CNN-based method processes information in parallel. A convolutional filter aggregates weighted information from multiple neighbouring points within a local window simultaneously. This parallel, windowed view allows the model to directly perceive the local shape and structure of the time series, making it highly effective at identifying local slopes (trends) and periodic patterns (seasonality). In contrast, Fig 2(b) depicts the RNN-based method, which processes information sequentially. At each time step, it considers only the current input and its own past hidden state. While excellent for capturing long-term temporal dependencies, this point-by-point processing is less direct for identifying local shapes, as an RNN must infer a pattern by remembering a sequence of individual points, whereas a CNN recognizes the pattern of that sequence in a single operation. Therefore, for the specific task of extracting trend and seasonality, the convolutional approach is inherently more direct and efficient.
For sequential data, one-dimensional convolution simplifies computation by avoiding the need to process additional dimensional information, unlike two-dimensional convolution. Furthermore, as the convolution kernel slides along the temporal axis, the extracted features at each time step vary according to the local context at that specific position, inherently making the convolution operation position aware. Consequently, our TSE module employs one-dimensional convolution to capture long-term trends and seasonal components in traffic data. This process is represented as:
$$X_{\mathrm{TSE}} = \mathrm{Conv1D}(X;\, K,\, p), \tag{2}$$

where $K$ is the convolution kernel, $p$ is the padding parameter, and $\mathrm{Pos}_{\mathrm{conv}}$ denotes the implicit position information carried by the 1D convolution operation. Specifically, to effectively capture both long-term trends and shorter-term seasonality, the TSE module consists of two parallel 1D convolutional layers, organized into a two-branch structure. The first branch, designed for trend extraction, employs a 1D convolutional layer with a larger kernel size (48) to smooth the sequence and capture low-frequency signals. The second branch focuses on seasonality, using a 1D convolutional layer with a smaller kernel size (16) to identify more local, periodic patterns. Following each convolutional layer, a GELU (Gaussian Error Linear Unit) activation function is applied to introduce non-linearity. The outputs from these two branches are then fused through element-wise addition, allowing the model to learn a combined representation of both trend and seasonal dynamics.
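To make this concrete, the following is a minimal PyTorch sketch of the two-branch TSE design under the kernel sizes stated above (48 and 16); the single-channel configuration and the use of ‘same’ padding are our assumptions, as the paper does not specify them.

```python
import torch
import torch.nn as nn

class TSE(nn.Module):
    """Sketch of the Trend and Seasonality Extraction module: two parallel
    1D convolutions (large kernel for trend, small kernel for seasonality),
    each followed by GELU, fused by element-wise addition."""

    def __init__(self, channels: int = 1):
        super().__init__()
        # 'same' padding keeps the sequence length unchanged (stride 1)
        self.trend = nn.Conv1d(channels, channels, kernel_size=48, padding="same")
        self.season = nn.Conv1d(channels, channels, kernel_size=16, padding="same")
        self.act = nn.GELU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, seq_len)
        return self.act(self.trend(x)) + self.act(self.season(x))

# e.g., a batch of 8 univariate traffic sequences, 192 steps each
out = TSE()(torch.randn(8, 1, 192))   # -> (8, 1, 192)
```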
The final output, combining the linearly transformed component with the trend and seasonal components, is:

$$H = X_{\mathrm{lin}} + X_{\mathrm{TSE}}. \tag{3}$$
Through the collaboration of these two modules, the model extracts richer features of flow changes, thereby effectively enhancing its ability to learn from data.
Enhanced feature representation and position-aware PatchTST
PatchTST is a Transformer-based model for long-term time series forecasting, introduced in recent years. By segmenting the original sequence into patches and leveraging the Transformer’s global attention mechanism, PatchTST effectively models long-range dependencies across patches, enabling accurate long-term time series predictions.
In this paper, we enhance the long-term prediction capability of PatchTST by strengthening the feature representation of each patch and injecting position information into it. The core idea of the original PatchTST is to divide the raw time series into a number of non-overlapping patches $\{P_1, P_2, \ldots, P_N\}$, perform a linear mapping (i.e., patch embedding) on each $P_i$, and feed all embedded patches into a Transformer encoder for global modelling. However, directly dividing the raw sequence into patches can introduce redundant information or noisy features, making it difficult for the model to effectively discriminate the correct feature representation. The enhanced PatchTST in this paper instead divides the high-level feature representation $H$ from Eq. (3) into $N$ patches carrying implicit position information:

$$H \rightarrow \{P_1, P_2, \ldots, P_N\}. \tag{4}$$

Before patch division, the TSE module in Eq. (2) captures a more advanced and smoother representation of the temporal features, so that the divided patches have clearer information in the time dimension and smoother articulation between patches. Each patch $P_i$ can be represented as:

$$P_i = H\left[(i-1)L_p + 1 : i L_p\right], \quad i = 1, \ldots, N, \tag{5}$$

where $L_p$ is the patch length. Each patch is then mapped to a higher-dimensional space via an embedding matrix:

$$E_i = W_e P_i + b_e, \tag{6}$$

where $W_e$ denotes the embedding matrix and $b_e$ is the bias term. After the above processing, the embedded feature sequence can be obtained:

$$E = \{E_1, E_2, \ldots, E_N\}. \tag{7}$$
This process involves segmenting the sequences, organizing them into multiple patches, and embedding them into sequences that the Transformer can learn from and subsequently pass to the Transformer encoder. The patch size is a critical hyperparameter that determines the granularity of the input to the Transformer. A size that is too large can average out important short-term patterns, while a size that is too small increases computational cost and may lose local contextual information. Therefore, in this paper, we adopt a patch size of 16, which is commonly used in relevant studies.
After embedding the patches, the original PatchTST applies a learnable positional encoding to capture the temporal positions within the time series. Meanwhile, the convolutional property of the TSE module, as described in Eq. (2), inherently provides relative positional information for neighbouring moments, which is equivalent to embedding linearly refined local positional signals. The integration of these positional encodings further enhances PatchTST’s ability to sense position and temporal patterns in traffic data, as represented by:

$$\mathrm{Pos}_{\mathrm{final}} = \mathrm{Pos}_{\mathrm{conv}} + \mathrm{Pos}_{\mathrm{learn}}, \tag{8}$$

where $\mathrm{Pos}_{\mathrm{final}}$ denotes the final position information and $\mathrm{Pos}_{\mathrm{learn}}$ denotes the position information of the learnable positional code. This learnable positional encoding is implemented as an embedding layer, where each position in the sequence of patches is assigned a unique vector that is optimized during training. The positional information is combined with the patch embedding via element-wise addition; for this operation to be possible, the embedding size for both the patch and the positional encoding must be identical. The choice of embedding size affects the model’s capacity: a size that is too small may lead to underfitting, while one that is too large increases the risk of overfitting and the computational cost. To balance these considerations, the embedding dimension is set to 128.
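The patching and position-aware embedding steps can be sketched as follows, assuming non-overlapping patches of length 16 and an embedding size of 128 as stated above; all variable names are illustrative.

```python
import torch
import torch.nn as nn

patch_len, d_model = 16, 128                  # values stated in the text

class PatchEmbedding(nn.Module):
    """Sketch: split a feature sequence into non-overlapping patches,
    map each patch to d_model, and add a learnable positional vector."""

    def __init__(self, seq_len: int):
        super().__init__()
        assert seq_len % patch_len == 0
        self.n_patches = seq_len // patch_len
        self.proj = nn.Linear(patch_len, d_model)          # W_e, b_e of Eq. (6)
        self.pos = nn.Embedding(self.n_patches, d_model)   # learnable positions

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # h: (batch, seq_len) — the high-level representation from Eq. (3)
        patches = h.unfold(-1, patch_len, patch_len)       # (batch, n_patches, patch_len)
        idx = torch.arange(self.n_patches, device=h.device)
        return self.proj(patches) + self.pos(idx)          # element-wise addition

tokens = PatchEmbedding(seq_len=192)(torch.randn(8, 192))  # -> (8, 12, 128)
```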
The global features are then extracted using a Transformer encoder, which models the long-range dependencies in traffic data through a multi-head self-attention mechanism. Each patch is processed through multiple layers of Transformer encoders. By calculating the correlations between sequence elements, the self-attention mechanism generates a weight matrix that is used to compute a weighted sum of the value vectors, producing a new representation while emphasizing distinct subspace features. Furthermore, the Transformer encoder incorporates a feed-forward neural network, enabling independent linear transformations of the representations at each position, thereby enhancing the model’s representational capacity.
Local self-attention
Although a global self-attention mechanism is applied to patches to capture long-term dependencies and semantic relationships, the local associations between patches are insufficiently addressed. This limitation arises because the Transformer encoder primarily applies attention between patches, hindering the model’s ability to detect local fluctuations effectively. To overcome this, we design a local attention mechanism that employs window segmentation to enable finer-grained interactions at a local scale, building upon features that already contain global contextual information.
While our LAT module utilizes a self-attention mechanism, it is fundamentally different from the global attention in the main Transformer encoder in its scope, architectural role, and the data it processes. Unlike the main encoder’s attention, which operates globally across all patches to learn long-range dependencies, the LAT module is a subsequent, refining step that operates locally within fixed, non-overlapping windows to capture short-term fluctuations. Critically, it takes the globally aware feature representations from the main encoder as its input, rather than the initial embeddings. This hierarchical approach, which first models the global trend and then focuses on local details, enables the model to achieve long-term stability without compromising local accuracy.
Following the modelling of long-range dependencies in PatchTST, short-term fluctuations in airspace traffic data are captured through the local self-attention mechanism. With 96 data steps per day and a patch size of 16, the window size is set to 6 patches, so that each window spans one full day of time steps. Within each window, the multi-head self-attention process is applied. The specific self-attention computation is as follows:
$$Q = X_w W_Q, \quad K = X_w W_K, \quad V = X_w W_V, \tag{9}$$

where $Q$, $K$, and $V$ are the Query, Key, and Value matrices, respectively, obtained by linear transformation; $X_w$ is the feature representation of the input window; and $W_Q$, $W_K$, $W_V$ are the weight matrices of the Query, Key, and Value, respectively. The self-attention calculation formula is as follows:

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) V, \tag{10}$$

where $d_k$ is the scaling factor, usually equal to the dimension of the key vectors, used to stabilize the gradient. The attention weight matrix is generated by computing the dot product of the queries and keys, which is then normalized by the softmax function. Finally, this attention matrix is used to generate a new feature representation as a weighted sum of the value vectors.
Local self-attention is primarily used for short-term temporal dependencies, thereby enhancing the model’s capacity to capture local patterns. Following the application of local self-attention, the model integrates the outputs with the global features extracted by the Transformer encoder via residual connections.
$$Z = Z_{\mathrm{local}} + Z_{\mathrm{global}}, \tag{11}$$

where $Z_{\mathrm{global}}$ is the global feature representation. This residual linkage effectively combines global and local information, giving the model both an understanding of global dependencies and a keen capture of local variations.
Finally, the model aggregates the fused feature sequence using average pooling to obtain a unified feature representation:

$$\bar{z} = \frac{1}{N} \sum_{i=1}^{N} z_i, \tag{12}$$

where $z_i$ is the $i$-th element in the fused feature sequence. The aggregated feature representation is then mapped to the final prediction through a linear layer:

$$\hat{Y} = W_o \bar{z} + b_o, \tag{13}$$

where $W_o$ is the weight matrix of the output layer and $b_o$ is the bias term.
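Putting these steps together, the sketch below applies multi-head self-attention within non-overlapping windows of six patches, fuses the result with the global features via the residual connection of Eq. (11), average-pools, and projects to the prediction horizon. Reusing `nn.MultiheadAttention` for the windowed attention is our simplification, not necessarily the exact implementation.

```python
import torch
import torch.nn as nn

class LATHead(nn.Module):
    """Sketch of the LAT stage: windowed self-attention over globally
    encoded patches, residual fusion, average pooling, and a linear head."""

    def __init__(self, d_model=128, n_heads=4, window=6, pred_len=96):
        super().__init__()
        self.window = window
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.head = nn.Linear(d_model, pred_len)

    def forward(self, g: torch.Tensor) -> torch.Tensor:
        # g: (batch, n_patches, d_model) — output of the global Transformer encoder
        b, n, d = g.shape
        w = g.reshape(b * n // self.window, self.window, d)   # non-overlapping windows
        local, _ = self.attn(w, w, w)                         # attention within a window
        z = local.reshape(b, n, d) + g                        # residual: local + global, Eq. (11)
        return self.head(z.mean(dim=1))                       # pool (Eq. 12), project (Eq. 13)

pred = LATHead()(torch.randn(8, 12, 128))                     # -> (8, 96)
```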
Through the above process, the model in this paper is able to make full use of the self-contained features of the input sequences themselves to capture the complex temporal dependencies and achieve multi-step prediction of airspace traffic.
Experiment
Dataset and experimental settings
In this section, we use real-world airspace traffic data to evaluate the performance of the proposed model. The dataset comprises real-time ADS-B data collected at 15-minute intervals from 1 June to 30 June 2018 within a 300 km radius centered on Frankfurt Airport, covering all flights within this airspace. Specifically, the data encompass information such as flight ID, flight altitude, flight heading angle, and the latitude and longitude of each aircraft. ADS-B data acquisition and transmission can be affected by various factors, including signal occlusion and multipath effects, leading to data loss. Therefore, interpolation was employed to impute missing data during the preprocessing stage.
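As a hedged illustration of this preprocessing step, the pandas sketch below aligns raw counts to a regular 15-minute grid and fills gaps by interpolation; the column layout and the choice of time-based linear interpolation are our assumptions.

```python
import pandas as pd

# hypothetical ADS-B-derived series: timestamped aircraft counts with a gap
counts = pd.Series(
    [42.0, 45.0, None, 51.0],
    index=pd.to_datetime(["2018-06-01 00:00", "2018-06-01 00:15",
                          "2018-06-01 00:30", "2018-06-01 00:45"]),
)

# align to a regular 15-minute grid, then impute missing values
counts = counts.resample("15min").mean().interpolate(method="time")
print(counts)
```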
The dataset was divided into training, validation, and test sets in a ratio of 7:1.5:1.5. The TSE module consists of two parallel 1D convolutional layers organized into a two-branch structure. Three layers of Transformer encoders are used, each with 4 attention heads, matching the number of local attention heads in the final part of the model. In the subsequent LAT module, local attention is applied with a window size of 6 using non-overlapping windows. The resulting local features are then integrated with the global features from the main encoder via a residual connection, as described in the “Local self-attention” section. The batch size was set to 64 to ensure robust model performance on the dataset.
To ensure robust convergence and mitigate overfitting, we implemented a rigorous training strategy. The model was trained using the Adam optimizer with an initial learning rate of 1e-3. We employed a Cosine Annealing learning rate scheduler, which gradually decreases the learning rate to a minimum of 1e-6 over the course of training, facilitating smoother convergence in later epochs. The batch size was set to 64. The maximum number of training epochs was set to 100. To further prevent overfitting, we adopted an early stopping strategy with a patience of 10 epochs; training was automatically terminated if the validation loss did not improve for 10 consecutive epochs. Fig 3 presents the training and validation loss curves. As shown, both curves descend rapidly from an initial value of approximately 0.14 and converge to a low error range after 80 epochs. The consistent alignment between training and validation loss demonstrates the stability of the training process and the effectiveness of the regularization strategy. All experiments were conducted on a desktop computer equipped with a 13th Gen Intel® Core™ i7-13700H processor, 16 GB of RAM, and an NVIDIA GeForce RTX 4060 Desktop GPU.
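The training recipe described above can be condensed into the following sketch; the toy model and data loaders are placeholders standing in for IDCformer and the ADS-B dataset.

```python
import torch
import torch.nn as nn

# toy stand-ins so the loop below runs; replace with IDCformer and real data
model = nn.Linear(192, 96)
batches = [(torch.randn(64, 192), torch.randn(64, 96)) for _ in range(4)]
train_loader, val_loader = batches[:3], batches[3:]

criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
    optimizer, T_max=100, eta_min=1e-6)         # anneal 1e-3 -> 1e-6

best_val, patience, bad_epochs = float("inf"), 10, 0
for epoch in range(100):                        # at most 100 epochs
    model.train()
    for x, y in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step()
    scheduler.step()

    model.eval()
    with torch.no_grad():
        val_loss = sum(criterion(model(x), y).item() for x, y in val_loader)

    if val_loss < best_val:                     # early-stopping bookkeeping
        best_val, bad_epochs = val_loss, 0
    else:
        bad_epochs += 1
        if bad_epochs >= patience:              # stop after 10 stale epochs
            break
```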
Evaluation metrics
We divide the whole airspace traffic data into three common airspaces according to altitude: high-altitude airspace (above 6,000 m), medium-altitude airspace (3,000–6,000 m), and low-altitude airspace (below 3,000 m). Consequently, the objective was to predict flight traffic not only across the entire airspace but also within these three specific airspaces, providing controllers with more detailed decision-making guidance.
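For illustration, per-stratum aircraft counts can be derived from the ADS-B records along the following lines; the column names and the use of pandas are hypothetical, since the paper does not describe its exact tooling.

```python
import pandas as pd

# hypothetical ADS-B snapshot: one row per aircraft position report
df = pd.DataFrame({
    "timestamp": pd.to_datetime(["2018-06-01 00:07"] * 3),
    "flight_id": ["DLH400", "BAW902", "AFR1018"],
    "altitude_m": [8500, 4200, 1500],
})

# assign each report to a stratum and count distinct flights per 15-min bin
bins = [-float("inf"), 3000, 6000, float("inf")]
labels = ["low", "medium", "high"]
df["stratum"] = pd.cut(df["altitude_m"], bins=bins, labels=labels)
counts = (df.groupby([pd.Grouper(key="timestamp", freq="15min"), "stratum"],
                     observed=True)["flight_id"].nunique())
print(counts)
```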
Additionally, long-term traffic prediction in the airspace involves multiple time steps; therefore, Root Mean Square Error (RMSE), Mean Absolute Error (MAE), and Mean Absolute Percentage Error (MAPE) were employed to evaluate the model’s performance:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{t=1}^{n} (y_t - \hat{y}_t)^2}, \quad \mathrm{MAE} = \frac{1}{n} \sum_{t=1}^{n} |y_t - \hat{y}_t|, \quad \mathrm{MAPE} = \frac{100\%}{n} \sum_{t=1}^{n} \left| \frac{y_t - \hat{y}_t}{y_t} \right|,$$

where $n$ denotes the sequence length of the airspace traffic data, $y_t$ is the observed ADS-B data value, and $\hat{y}_t$ is the airspace traffic prediction value. To interpret these metrics, it is crucial to understand that the prediction target, $\hat{y}_t$, is a single scalar value: the total number of aircraft in a defined airspace. Therefore, the metrics directly quantify the error in this aircraft-count prediction.
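These metrics can be computed directly from the observed and predicted aircraft counts, for example:

```python
import numpy as np

def rmse(y, y_hat):
    return np.sqrt(np.mean((y - y_hat) ** 2))

def mae(y, y_hat):
    return np.mean(np.abs(y - y_hat))

def mape(y, y_hat):
    return np.mean(np.abs((y - y_hat) / y)) * 100  # assumes no zero counts

y_true = np.array([120.0, 95.0, 110.0])   # observed aircraft counts
y_pred = np.array([112.0, 101.0, 104.0])
print(rmse(y_true, y_pred), mae(y_true, y_pred), mape(y_true, y_pred))
```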
The practical significance of this is best explained using MAE, which represents the average absolute difference between the predicted and actual aircraft counts. For the 24-hour prediction (Table 1 in the subsequent “Model Comparison” section), the baseline HA model has an MAE of 106.98, meaning its prediction is, on average, wrong by about 107 aircraft. In stark contrast, our IDCformer model achieves an MAE of 45.19, meaning its average error is only about 45 aircraft. This improvement from an average error of 107 aircraft to 45 is practically significant. It elevates the prediction from a rough estimate to a reliable decision-support tool, enabling air traffic managers to proactively optimize sector configurations and staffing with much greater confidence, thereby enhancing operational safety and efficiency.
Baseline methods
To comprehensively assess the model’s performance, eight state-of-the-art baseline methods were selected for comparison, encompassing both machine learning and deep learning approaches:
- (1). HA: Historical Average. It uses the average value of historical data to predict future values.
- (2). SVR [63]: Support Vector Regression with a linear kernel.
- (3). LSTM [64]: Long Short-Term Memory. A specialized form of recurrent neural network, LSTM addresses the issues of gradient vanishing and explosion in traditional RNNs by introducing a gating mechanism, thereby effectively capturing long-term dependencies in sequences.
- (4). Autoformer [65]: Based on the transformer architecture, it incorporates a series decomposition module and an auto-correlation mechanism.
- (5). PatchTST [31]: Patch Time Series Transformer. A Transformer-based model that segments time series data into several patches, each containing a sequence of consecutive time steps and treated as a token.
- (6). iTransformer [66]: An inverted Transformer model that performs inverted modelling of time series within an encoder-only framework, deeply exploring the correlations between variables through an attention mechanism.
- (7). ModernTCN [67]: Modern Temporal Convolutional Network. An advancement of Temporal Convolutional Networks (TCN), ModernTCN enhances the receptive field of the convolutional kernel and adopts a lightweight design while maintaining performance.
- (8). AMD [68]: An MLP-based framework that decomposes time series into multiple scales and adaptively synthesizes predictions. Its core is an Adaptive Multi-predictor Synthesis (AMS) block configured with several predictors.
These baseline models have undergone extensive research and practical validation, occupying a core position in the field of traffic flow prediction and demonstrating stable performance and strong adaptability in scenarios similar to air traffic. Incorporating them into comparative experiments allows coverage of a wide range of classical algorithmic approaches and advanced technical solutions. To ensure a fair comparison of air traffic flow prediction performance, the input sequence length of all baseline models is kept consistent with that of IDCformer.
Model comparison
Table 1 presents the performance of the proposed method alongside other baseline methods across different test datasets, illustrating the variation in error for each model in predicting various time steps within different airspace traffic datasets. The analysis focuses on the following aspects:
Superiority of Deep Learning Models in long-term prediction: A comparison between the classical deep learning model LSTM and SVR on several datasets reveals that LSTM’s prediction error exceeds that of SVR only at relatively short time steps. As the prediction horizon increases, the error of LSTM becomes significantly smaller than that of SVR, while the errors of the other state-of-the-art deep learning models are considerably lower than that of LSTM. This indicates that deep learning models are better suited for long-term predictions of flight traffic across different spatial domains. They effectively learn complex fluctuation patterns from large volumes of ADS-B data, primarily owing to their deeper network structures and more sophisticated iterative mechanisms compared with SVR.
Accuracy of IDCformer in long-term forecasting: We compared the proposed model with state-of-the-art deep learning approaches, including ModernTCN, PatchTST, AMD and iTransformer. As shown in Table 1, the proposed model consistently outperforms the competing models across all time scales. Specifically, compared with the second-best model, PatchTST, IDCformer achieves an overall reduction of 4.12% in RMSE, 4.20% in MAE, and 4.87% in MAPE. These results demonstrate that the proposed model is better suited for predicting long-term airspace flight traffic, owing to its superior ability to effectively capture the dynamics of airspace traffic conditions.
The stability and statistical significance of IDCformer: To rigorously evaluate our model’s ability to learn in a stable and reliable manner from the overall traffic flow, we conducted both 5-fold time-series cross-validation and hypothesis testing on the entire altitude dataset. The cross-validation results, presented in Figs 4–6, demonstrate that the proposed IDCformer not only consistently achieves the lowest average error across all metrics and prediction horizons but also exhibits small error bars. This indicates that its superior performance is stable and robust when evaluated on different partitions of the data. Furthermore, to formally verify that this outperformance is not due to random chance, hypothesis testing was conducted against the top three baseline models. As shown in Table 2, the p-values from the Diebold-Mariano test are consistently below the 0.05 significance level. This provides strong statistical evidence that the IDCformer’s performance represents a significant and reliable improvement over other advanced methods for long-term forecasting of the entire airspace’s traffic.
To comprehensively understand the interpretability of the proposed model, Fig 7 exhibits the integrated attention weight heatmap across different time periods of the day. This heatmap is derived by averaging the weights from the enhanced feature representation and position-aware PatchTST and the Local Attention module, providing a holistic view of the model’s focus. The vertical and horizontal axes denote the indices of query patches and key patches, respectively, where each patch corresponds to a 4-hour interval within the 96-step daily sequence (e.g., Patch 0 represents 00:00–04:00). Color brightness corresponds to attention weight magnitudes, with brighter colors indicating a higher degree of dependency and focus.
The visualization reveals that the model captures intrinsic dynamics by identifying critical periodic structures rather than distributing attention uniformly. Specifically, the model assigns the highest attention weights (reaching 0.24) to Patch 4 (16:00–20:00), identifying the evening peak as a deterministic anchor for the day’s traffic pattern. Furthermore, a significant structural “look-back” mechanism is observed: at the end of the sequence (Patch 5), the model heavily attends to Patch 2 (08:00–12:00) with a weight of 0.23. This demonstrates that IDCformer effectively utilizes the daily periodicity of airspace traffic, querying historical morning peak data to calibrate predictions for the night, thereby validating its ability to learn complex intrinsic dynamics beyond simple temporal proximity.
Visual analytics
The MAPE effectively reflects the degree of fit between predicted and actual traffic flow. Given that MAPE values are generally higher for models in mid-altitude and low-altitude airspaces, we focus on these two airspaces to visualise the actual traffic flow alongside predictions from state-of-the-art models.
Figs 8 and 9 offer a detailed visual comparison of the prediction performance of our proposed model and the PatchTST baseline. The top two rows of each figure illustrate the individual performance of our proposed model and PatchTST, respectively. For each model, two subplots are provided: a time-series plot on the left compares the predicted traffic (bar chart) against the actual values (red dots) to visualize the temporal fit, while a correlation plot on the right maps predicted values against actual values, where the proximity of the scatter points to the 45-degree line of perfect prediction indicates accuracy. The bottom panel then provides a direct time-series overlay, visualizing the predictions of both models as line graphs against the real traffic data, with an inset offering a magnified view of a specific interval to highlight performance on more volatile segments.
Figs 8 and 9 illustrate that flight traffic in the Mid-altitude and Low-altitude airspace exhibits significant fluctuations. These variations are primarily attributable to the impact of low-altitude fog, which reduces visibility and directly increases the intervals between take-offs and landings, thereby causing fluctuations in flight traffic data. In the Mid-altitude airspace, the influence of clouds differs from that at lower altitudes but remains substantial. Mid-altitude clouds are closely associated with weather systems, often heralding precipitation or storm systems. Such weather events prompt aircraft to alter flight paths or adjust altitudes, thereby introducing variability in mid-altitude airspace traffic.
Our model consistently demonstrates superior fit to actual traffic flow compared to the second-best model across all time intervals, with a notable advantage during periods of heightened prediction difficulty. This underscores the effectiveness of employing one-dimensional convolution to separately model trends and seasonality in air traffic flow. Additionally, the use of linear layers proves sufficient for capturing linear patterns within the data. These findings further validate the robustness of the proposed model in handling the complexities of dynamic airspace environments.
Ablation studies
To evaluate the contribution of each component in the airspace traffic prediction task, a series of ablation experiments were conducted on five IDCformer variants using the traffic data of the entire airspace. These experiments systematically dissected the impact of key modules within the model. The variants are defined as follows:
- IDCformer-RLAT: LAT module removed.
- IDCformer-RL: Linear layer removed.
- IDCformer-RC: A set of Conv1d in TSE removed.
- IDCformer-RTSE: TSE module removed.
- IDCformer-RP: Enhanced PatchTST removed.
The input variables for each model are listed in Table 3. The prediction evaluations are illustrated in Fig 10, demonstrating that each component contributes to the model’s predictive performance. Fig 11 depicts the mean error increase for each ablated model compared to the original model, showing that the removal of any component results in decreased model performance. Based on this, the following conclusions can be drawn:
- Effectiveness of the Patching Technique and the Enhanced Position-Aware PatchTST: The results highlight the effectiveness of both the general patching technique and our specific enhancements. A comparison between IDCformer-RP (which removes the PatchTST module entirely) and the full model reveals that IDCformer-RP exhibits the poorest performance. This demonstrates that the patching technique itself is fundamentally important, as it efficiently encodes the structural features of the input data and provides essential contextual information for the Transformer encoder. Furthermore, to specifically evaluate the contribution of our proposed enhancements, we tested a variant where the Enhanced PatchTST was replaced by the original PatchTST. This variant also underperformed the full IDCformer model, confirming the value of our modifications. By applying patching to high-level feature representations and integrating implicit convolutional positional signals, our enhanced module provides the model with a more robust understanding of temporal context. This is particularly crucial in long-term forecasting as it helps prevent the loss of positional information over long sequences, leading to a significant improvement in prediction accuracy.
- Effectiveness of TSE and LAT Modules: As shown in Fig 10, a comparison of the errors among IDCformer-RTSE, IDCformer-RC, IDCformer-RLAT, and the full IDCformer model demonstrates that the model’s error increases significantly, whether TSE is incrementally removed or LAT is directly removed. This phenomenon arises because the convolutional operations within TSE facilitate hierarchical feature abstraction, aiding the model in recognizing higher-level patterns and traffic flow trends. Moreover, when the TSE module is removed, the positional information inherently encoded in the patches due to convolutional characteristics is also lost. This leads to ambiguity in sequence positional information, particularly for long-term predictions. The LAT module, on the other hand, effectively captures local dependencies and mitigates the influence of long-range noise. Consequently, the removal of either TSE or LAT substantially impairs the predictive performance of the model.
- Effectiveness of the Overall Architecture: The full IDCformer model outperforms all variants, underscoring the pivotal role of its constituent modules. The high accuracy of IDCformer in long-term prediction is attributable not only to the superior performance of individual components but also to the synergistic integration of these components within the model architecture.
Impact of different input window sizes
For deep neural networks, the input window (i.e., look-back window) is a key parameter that significantly affects the model’s ability to learn essential information from historical data. Selecting an appropriate input window can significantly improve the model’s prediction performance. Fig 12 demonstrates the error performance of various models on the entire airspace under different input window sizes. This analysis aims to evaluate how different input window sizes affect the model’s prediction performance while maintaining a consistent output window (i.e., prediction horizon).
Analysis by Different Metrics: Each row in Fig 12 represents a distinct error metric, specifically RMSE, MAE, and MAPE. It is evident from the figure that, irrespective of the metric utilized, the error of our proposed model consistently remains lower than that of the other state-of-the-art models as both input and output windows increase.
Analysis by Different Output Windows: Each column in Fig 12 corresponds to a different output window size. The performance of each model improves with increasing input window size only when the output window is set to 192. For output window sizes of 96, 288, and 384, the performance of each model peaks at an input window size of 192, after which it deteriorates as the input window size increases further. Consequently, an input window size of 192 is deemed optimal compared to other input window sizes. Additionally, we observed that the error of each model decreases when the input window is increased from 96 to 192. However, the prediction error increases sharply when the input window is expanded from 192 to 288 and 384. When the input window was set to 96, the models lacked sufficient information to accurately predict the changing patterns of flight traffic in future airspace. Conversely, an input window size of 384 introduced excessive redundant information into the model, leading to inaccuracies in predicting future traffic flows and thus diminishing model performance.
Efficiency analysis
To evaluate the suitability of the proposed model for the airspace traffic forecasting task, we use the entire altitude dataset to analyze the computational cost of the proposed model. The experiments select training time, inference time, and GPU memory usage as the evaluation metrics, conducted in a unified environment with a fixed batch size of 64 to ensure comparability.
The results in Table 4 indicate that the MLP-based AMD is the most computationally efficient, followed by the iTransformer. Our proposed IDCformer, owing to its more complex, multi-stage architecture that includes dedicated modules for feature extraction (TSE) and local refinement (LAT), exhibits moderately higher computational costs. However, these costs remain well within a practical range for real-world deployment. For critical applications such as airspace traffic management, predictive accuracy is paramount to ensure operational safety and efficiency. As demonstrated in the preceding sections, IDCformer consistently achieves the highest forecasting accuracy. Therefore, the modest increase in computational resources is considered a justifiable and acceptable trade-off for the significant and reliable gains in predictive performance.
The external information integration ability of IDCformer
Although the IDCformer proposed in this study is primarily designed to address the internal dynamics of airspace traffic data, the dynamic variations in flight traffic during actual air traffic operations are often influenced by multiple external factors. Therefore, this section focuses on evaluating whether IDCformer possesses the capability to effectively learn and integrate external information. Let the original feature vector at time $t$, containing the internal traffic variables (e.g., number of flights), be denoted as $x_t$. The selected external features (e.g., average flight altitude, longitude, and latitude) at the same time step are represented by another vector, $e_t$. These two vectors are then concatenated to form a new, augmented feature vector, $\tilde{x}_t$:

$$\tilde{x}_t = x_t \,\|\, e_t, \tag{14}$$

where $\|$ denotes the concatenation operation. The model then uses the historical sequence of these augmented feature vectors, $\{\tilde{x}_1, \ldots, \tilde{x}_T\}$, as its new input to predict the future traffic count. This formulation allows the model to learn from both the intrinsic traffic dynamics and the external factors provided. The subsequent experiments analyze the impact of including these external variables.
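Concretely, the augmentation amounts to a per-time-step concatenation of the internal and external feature vectors, e.g.:

```python
import torch

T = 192                                     # input window length
flow = torch.randn(T, 1)                    # internal variable: flight count
external = torch.randn(T, 3)                # e.g., mean altitude, longitude, latitude
augmented = torch.cat([flow, external], dim=-1)   # (T, 1 + 3), per Eq. (14)
print(augmented.shape)                      # torch.Size([192, 4])
```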
In this study, four external factors—aircraft heading, longitude, latitude, and flight altitude—were selected as features to analyze airspace traffic data. The prediction error of IDCformer was calculated, and Fig 13 illustrates the impact of these factors on long-term traffic prediction performance.
Feature Description and Rationale for Selection: Aircraft heading represents the azimuth of flight relative to true north; it reflects an aircraft's flight path and helps capture the distribution of aircraft across different routes. Longitude and latitude give the geographic coordinates in the east-west and north-south directions, respectively, and help the model capture spatial patterns of traffic change. Flight altitude is the vertical distance of an aircraft above sea level; changes in altitude directly affect route selection and airspace utilization. Incorporating these features supports a comprehensive understanding of the spatial and temporal dynamics of air traffic.
Impact of Adding Features: Analyzing Fig 13, we observed that when the prediction horizon is 72 hours (i.e., output window = 288), adding two of the external factors, flight altitude and aircraft heading, results in higher prediction errors than the original IDCformer. Apart from this exception, including any individual feature improved the model's long-term prediction accuracy, highlighting its inherent capability to adaptively learn from external features. This adaptive learning is a direct result of the model's hierarchical attention architecture. The global self-attention mechanism in the main Transformer encoder first assesses the overall importance of each external feature across the entire prediction horizon. Subsequently, the Local Self-Attention module (LAT) refines this understanding by focusing on smaller time windows, allowing the model to capture how an external factor's influence may vary over shorter periods. This synergy between global and local attention enables the model to dynamically amplify the weights of consistently relevant features while suppressing those that introduce noise, so it learns effectively in a complex, multi-feature environment.
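To make the local-attention idea concrete, the following sketch implements windowed self-attention in the spirit of the LAT module described above: attention is restricted to fixed-size, non-overlapping windows so that short-horizon fluctuations are modeled separately from the global context. This is an illustrative reading of the mechanism, not the paper's exact module; all hyperparameters are assumed.

```python
import torch
import torch.nn as nn

class LocalSelfAttention(nn.Module):
    """Windowed self-attention sketch: attention is computed independently
    within fixed-size windows, so each position attends only to its local
    neighborhood. Illustrative of the idea behind LAT, not the paper's
    exact implementation."""

    def __init__(self, d_model: int, n_heads: int, window: int):
        super().__init__()
        self.window = window
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, d = x.shape
        pad = (-t) % self.window                  # pad so t divides evenly
        if pad:
            x = torch.cat([x, x.new_zeros(b, pad, d)], dim=1)
        # Fold windows into the batch dimension, attend locally, unfold.
        n_win = (t + pad) // self.window
        w = x.reshape(b * n_win, self.window, d)
        out, _ = self.attn(w, w, w)
        out = out.reshape(b, t + pad, d)
        return out[:, :t]                         # drop the padding

# Example: refine a 192-step sequence in windows of 24 steps.
lat = LocalSelfAttention(d_model=64, n_heads=4, window=24)
print(lat(torch.randn(8, 192, 64)).shape)         # torch.Size([8, 192, 64])
```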
Table 5 presents the prediction performance of the model after introducing different features, including the prediction error after combining latitude and longitude. This allows a clear quantification of the performance improvements obtained from capturing each type of information. The abbreviations in Table 5 correspond to the following feature combinations: ‘Only F’ is the baseline model using only traffic flow; ‘&H’ adds aircraft heading; ‘&LA’ adds latitude; ‘&LO’ adds longitude; ‘&FA’ adds flight altitude; and ‘&LA&LO’ adds latitude and longitude together. When heading was added as a feature, prediction accuracy improved by only 2.05% on average over the baseline. This limited improvement is attributed to the extensive coverage of the study area, the large number of flights, and the significant variation in heading angles, which may render heading less effective as a feature input.
In contrast, adding latitude and longitude features individually enhanced the model’s prediction accuracy by 4.04% and 5.91%, respectively. This suggests that changes in latitude and longitude significantly impact overall traffic flow, and our model effectively captures the relationship between spatial location and traffic flow changes. However, although flight altitude is a crucial parameter in aviation operations, its performance enhancement was relatively limited. This may be due to the high number of flights introducing randomness or insufficient correlation with traffic patterns, making it challenging for the model to extract effective information from the altitude feature to improve prediction accuracy.
When both geographic features, latitude and longitude, were input into the model, the improvement in predictive accuracy was not as pronounced as when they were added separately. This suggests that the design of flight routes leads to high correlation between longitude and latitude, resulting in multicollinearity between these features. Feature redundancy can impede the model’s ability to extract effective information and may introduce noise, thereby affecting prediction performance. Therefore, the correlation and information gain between features must be carefully considered during the feature selection process.
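One standard way to screen for the multicollinearity discussed here is the variance inflation factor (VIF), obtained from the diagonal of the inverse correlation matrix of standardized features. The sketch below is an illustration we add for completeness, not an analysis reported in the paper; the synthetic data and names are placeholders.

```python
import numpy as np

def variance_inflation_factors(features: np.ndarray) -> np.ndarray:
    """VIF_i is the i-th diagonal entry of the inverse correlation matrix.
    A quick screening heuristic for multicollinearity (VIF >> 1 is a
    warning sign). Columns of `features` are variables."""
    corr = np.corrcoef(features, rowvar=False)
    return np.diag(np.linalg.inv(corr))

# Hypothetical example: columns = [longitude, latitude]
rng = np.random.default_rng(0)
lon = rng.normal(size=500)
lat = 0.44 * lon + rng.normal(scale=0.9, size=500)  # moderately correlated
print(variance_inflation_factors(np.column_stack([lon, lat])))
```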
To succinctly quantify the contribution of each external feature and investigate the reasons for feature conflicts, we conducted a Pearson correlation analysis. Fig 14 presents the correlation matrix between the target variable (Traffic Flow) and external factors. The analysis yields two critical insights that explain the experimental results in Table 5:
Weak Correlation of Heading: The correlation coefficient between Heading and Traffic Flow is relatively low (r = 0.12). This confirms that while aircraft heading contains some valid information (contributing to the minor 2% performance gain), it is a weak predictor compared to spatial features, limiting its potential for significant improvements.
Feature Entanglement and Interference: A moderate correlation (r = 0.44) is observed between Longitude and Latitude. More importantly, we observe consistent cross-correlations between auxiliary features (Heading/Altitude) and spatial features (ranging from 0.30 to 0.39). This indicates a complex entanglement among external factors. When all features are integrated simultaneously, this coupling introduces inter-feature interference, which hinders the model from isolating unique information gains from each variable. This explains why the “combined” performance plateaued compared to individual feature integration.
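For completeness, a correlation matrix like the one in Fig 14 can be computed with a few lines of pandas. The column names and synthetic values below are placeholders, since the underlying flight records are not public; the illustrative longitude/latitude centers are assumptions.

```python
import numpy as np
import pandas as pd

# Sketch of a Pearson correlation analysis over one row per time step.
rng = np.random.default_rng(42)
n = 1000
df = pd.DataFrame({
    "traffic_flow": rng.poisson(30, n).astype(float),  # target variable
    "heading": rng.uniform(0, 360, n),
    "longitude": rng.normal(8.57, 0.5, n),   # illustrative center only
    "latitude": rng.normal(50.03, 0.5, n),
    "altitude": rng.normal(10000, 2000, n),
})
corr = df.corr(method="pearson")             # full correlation matrix
print(corr["traffic_flow"].sort_values(ascending=False))
```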
In summary, our experimental results demonstrate that although IDCformer is specifically designed to capture the dynamic variations inherent in traffic data, it also learns and integrates external information effectively. When external factors are incorporated as feature inputs, IDCformer achieves notable improvements in long-term prediction performance. Furthermore, the experiments reveal that factors added individually, rather than jointly with correlated ones, contribute more to the accuracy of long-term air traffic flow forecasting.
Conclusion
This paper addresses the challenge of long-term airspace traffic forecasting by proposing a novel intrinsic dynamics capture network designed to overcome the limitations of traditional statistical models and machine learning methods in capturing complex dynamics and performing long-term predictions.
To enable in-depth analysis and modeling of airspace traffic data, we devised three core components: the Trend and Seasonal Extraction module (TSE), the enhanced feature representation and position-aware PatchTST, and the Local Self-Attention module (LAT). Experimental results on real-world datasets demonstrate the model's mechanistic superiority. Specifically, the synergistic integration of TSE and LAT robustly addresses long-term dependency modeling by isolating and capturing multi-scale temporal dynamics, while the position-aware PatchTST mitigates the loss of temporal order over extended horizons, preserving precise positional information. In terms of practical value for airspace management, IDCformer can serve as a critical tool for strategic Air Traffic Flow Management. Its capability to accurately forecast long-term traffic peaks enables authorities to move from reactive flow control to proactive congestion mitigation, facilitating data-driven decisions such as dynamic sector configuration and optimized controller staffing schedules, thereby enhancing both operational safety and efficiency.
Additionally, as internal variations in airspace traffic are influenced by external factors, we further investigated the model’s ability to integrate and learn from external information. Results show that incorporating independent external variables, particularly spatial positional information such as latitude and longitude, significantly improves long-term forecasting accuracy, validating the model’s applicability in real-world scenarios. This finding underscores the model’s dual capability to extract intrinsic data features while effectively learning spatiotemporal patterns. However, when multiple external factors are introduced simultaneously, the performance gains diminish compared to the inclusion of a single factor. This can be attributed to interference among external factors or the model’s suboptimal ability to capture the coupling relationships between different variables. This finding highlights the need for further optimization in model design to efficiently integrate the intrinsic patterns of traffic data with external information, aiming to achieve synergistic enhancements in predictive performance.
Future research will focus on enhancing the model’s predictive capability in response to sudden climate changes or abnormal conditions. We may also consider integrating additional dimensions of external factors and exploring optimal strategies for combining these factors to further improve the robustness and accuracy of airspace traffic prediction.
References
- 1. Pang Y, Zhao X, Yan H, Liu Y. Data-driven trajectory prediction with weather uncertainties: a Bayesian deep learning approach. Transp Res Part C Emerg Technol. 2021;130:103326.
- 2. Gui G, Liu F, Sun J, Yang J, Zhou Z, Zhao D. Flight delay prediction based on aviation big data and machine learning. IEEE Trans Veh Technol. 2020;69(1):140–50.
- 3. Zhang X, Mahadevan S. Ensemble machine learning models for aviation incident risk prediction. Decis Support Syst. 2019;116:48–63.
- 4. Du X, Lu Z, Wu D. An intelligent recognition model for dynamic air traffic decision-making. Knowl-Based Syst. 2020;199:105274.
- 5. Murca MCR, Hansman RJ. Identification, characterization, and prediction of traffic flow patterns in multi-airport systems. IEEE Trans Intell Transp Syst. 2019;20(5):1683–96.
- 6. Jiang W, Luo J. Graph neural network for traffic forecasting: a survey. Expert Syst Appl. 2022;207:117921.
- 7. Wu H, Xu J, Wang J, Long M. Autoformer: decomposition transformers with auto-correlation for long-term series forecasting. Adv Neural Inf Process Syst. 2021.
- 8. Shaygan M, Meese C, Li W, Zhao X (George), Nejad M. Traffic prediction using artificial intelligence: Review of recent advances and emerging opportunities. Transp Res Part C Emerg Technol. 2022;145:103921.
- 9. Monechi B, Servedio VDP, Loreto V. Congestion transition in air traffic networks. PLoS One. 2015;10(5):e0125546. pmid:25993476
- 10. Wang C, Hu M, Yang L, Zhao Z. Prediction of air traffic delays: an agent-based model introducing refined parameter estimation methods. PLoS One. 2021;16(4):e0249754. pmid:33826641
- 11. Dalmau R. Predicting the likelihood of airspace user rerouting to mitigate air traffic flow management delay. Transp Res Part C Emerg Technol. 2022;144:103869.
- 12. Pérez Moreno F, Gómez Comendador VF, Delgado-Aguilera Jurado R, Zamarreño Suárez M, Janisch D, Arnaldo Valdés RM. Methodology of air traffic flow clustering and 3-D prediction of air traffic density in ATC sectors based on machine learning models. Expert Syst Appl. 2023;223:119897.
- 13. Du W, Li B, Chen J, Lv Y, Li Y. A spatiotemporal hybrid model for airspace complexity prediction. IEEE Intell Transport Syst Mag. 2023;15(2):217–24.
- 14. Li A, Li Y, Xu Y, Li X, Zhang C. Multi-scale convolution enhanced transformer for multivariate long-term time series forecasting. Neural Netw. 2024;180:106745. pmid:39340967
- 15. Zhang D, Zhang Z, Chen N, Wang Y. RFNet: multivariate long sequence time-series forecasting based on recurrent representation and feature enhancement. Neural Netw. 2025;181:106800.
- 16. Dhiman HS, Deb D, Guerrero JM. Hybrid machine intelligent SVR variants for wind forecasting and ramp events. Renew Sustain Energy Rev. 2019;108:369–79.
- 17. Coulibaly L, Kamsu-Foguem B, Tangara F. Rule-based machine learning for knowledge discovering in weather data. Future Gener Comput Syst. 2020;108:861–78.
- 18. Taszarek M, Kendzierski S, Pilguj N. Hazardous weather affecting European airports: climatological estimates of situations with limited visibility, thunderstorm, low-level wind shear and snowfall from ERA5. Weather Clim Extrem. 2020;28:100243.
- 19. Zheng P, Wang Z, Chen C-H, Pheng Khoo L. A survey of smart product-service systems: key aspects, challenges and future perspectives. Adv Eng Inform. 2019;42:100973.
- 20. Ng KKH, Chen C-H, Lee CKM, Jiao J (Roger), Yang Z-X. A systematic literature review on intelligent automation: aligning concepts from theory, practice, and future perspectives. Adv Eng Inform. 2021;47:101246.
- 21. Yu X, Chen C-H, Yang H. Air traffic controllers’ mental fatigue recognition: a multi-sensor information fusion-based deep learning approach. Adv Eng Inform. 2023;57:102123.
- 22. Ding Y, Zhao W, Song L, Jiang C, Tao Y. Traffic flow prediction based on spatiotemporal encoder-decoder model. PLoS One. 2025;20(5):e0321858. pmid:40445887
- 23. Huang D, He J, Tu Y, Ye Z, Xie L. Spatiotemporal information enhanced multi-feature short-term traffic flow prediction. PLoS One. 2024;19(7):e0306892. pmid:39008494
- 24. Gonzalo J, Domínguez D, López D, García-Gutiérrez A. An analysis and enhanced proposal of atmospheric boundary layer wind modelling techniques for automation of air traffic management. Chin J Aeronaut. 2021;34(5):129–44.
- 25. Yuan J, Pei Y, Xu Y, Li X, Ge Y. Automatic interval management for aircraft based on dynamic fuzzy speed control considering uncertainty. Chin J Aeronaut. 2023;36(11):354–72.
- 26. Gössling S, Humpe A. The global scale, distribution and growth of aviation: implications for climate change. Glob Environ Change. 2020;65:102194. pmid:36777089
- 27. Cai K, Tang S, Qian S, Shen Z, Yang Y. Multi-faceted spatio-temporal network for weather-aware air traffic flow prediction in multi-airport system. Chin J Aeronaut. 2024;37(7):301–16.
- 28. Han Y, Zhang T, Wang M. Holiday travel behavior analysis and empirical study with Integrated Travel Reservation Information usage. Transp Res Part A Policy Pract. 2020;134:130–51.
- 29. Spatharis C, Bastas A, Kravaris T, Blekas K, Vouros GA, Cordero JM. Hierarchical multiagent reinforcement learning schemes for air traffic management. Neural Comput Appl. 2021;35(1):147–59.
- 30. Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, et al. Attention is all you need. Proceedings of the 31st International Conference on Neural Information Processing Systems; 2017.
- 31. Nie Y, Nguyen NH, Sinthong P, Kalagnanam J. A time series is worth 64 words: long-term forecasting with transformers. arXiv. 2023. Available from: http://arxiv.org/abs/2211.14730
- 32. Cai Q, Alam S, Duong VN. A spatial–temporal network perspective for the propagation dynamics of air traffic delays. Engineering. 2021;7(4):452–64.
- 33. Di Vaio A, Varriale L. Blockchain technology in supply chain management for sustainable performance: evidence from the airport industry. Int J Inf Manag. 2020;52:102014.
- 34. Zhou J, Cui G, Hu S, Zhang Z, Yang C, Liu Z, et al. Graph neural networks: a review of methods and applications. arXiv. 2021.
- 35. Abdulrazaq MA, Fan W (David). A priority based multi-level heterogeneity modelling framework for vulnerable road users. Transportmetrica A: Transp Sci. 2025:1–34.
- 36. Abdulrazaq MA, Fan WD. Temporal dynamics of pedestrian injury severity: a seasonally constrained random parameters approach. IJTST. 2024.
- 37. Wang X, Jiang H, Zeng T, Dong Y. An adaptive fused domain-cycling variational generative adversarial network for machine fault diagnosis under data scarcity. Inf Fusion. 2026;126:103616.
- 38. Chen Y, Yu F, Chen L, Jin G, Zhang Q. Predictive modeling and multi-objective optimization of magnetic core loss with activation function flexibly selected Kolmogorov-Arnold networks. Energy. 2025;334:137730.
- 39. Yan J, Cheng Y, Zhang F, Li M, Zhou N, Jin B, et al. Research on multimodal techniques for arc detection in railway systems with limited data. Struct Health Monit. 2025.
- 40. Chen Y, Zheng L, Tan Z. Roadside LiDAR placement for cooperative traffic detection by a novel chance constrained stochastic simulation optimization approach. Transp Res Part C Emerg Technol. 2024;167:104838.
- 41. Chen Y, Zhang Q, Yu F. Transforming traffic accident investigations: a virtual-real-fusion framework for intelligent 3D traffic accident reconstruction. Complex Intell Syst. 2024;11(1):76.
- 42. Ermagun A, Levinson D. Spatiotemporal traffic forecasting: review and proposed directions. Transp Rev. 2018;38(6):786–814.
- 43. Yan J, Hu H, Wang Y, Ma X, Hu M, Delahaye D, et al. Robust pre-departure scheduling for a nation-wide air traffic flow management. Chin J Aeronaut. 2025;38(4):103223.
- 44. Nieto MR, Carmona-Benítez RB. ARIMA + GARCH + Bootstrap forecasting method applied to the airline industry. JATM. 2018;71:1–8.
- 45. Fang H, Tian N, Wang Y, Zhou M, Haile MA. Nonlinear Bayesian estimation: from Kalman filtering to a broader horizon. IEEE/CAA J Autom Sinica. 2018;5(2):401–17.
- 46. Xu L, Liang Y, Duan Z, Zhou G. Route-based dynamics modeling and tracking with application to air traffic surveillance. IEEE Trans Intell Transp Syst. 2020;21(1):209–21.
- 47. Lin Y, Zhang J, Liu H. Deep learning based short-term air traffic flow prediction considering temporal–spatial correlation. Aerosp Sci Technol. 2019;93:105113.
- 48. Jardines A, Eivazi H, Zea E, García-Heras J, Simarro J, Otero E, et al. Thunderstorm prediction during pre-tactical air-traffic-flow management using convolutional neural networks. Expert Syst Appl. 2024;241:122466.
- 49. Zhao L, Song Y, Zhang C, Liu Y, Wang P, Lin T, et al. T-GCN: a temporal graph convolutional network for traffic prediction. IEEE Trans Intell Transp Syst. 2020;21(9):3848–58.
- 50. Liu K, Ding K, Cheng X, Xu G, Hu X, Liu T, et al. Airport delay prediction with temporal fusion transformers. Proceedings of the 17th ACM SIGSPATIAL International Workshop on Computational Transportation Science GenAI and Smart Mobility Session. Atlanta (GA): ACM; 2024. p. 5–11. https://doi.org/10.1145/3681772.3698212
- 51. Zhang J, Mao S, Zhang S, Yin J, Yang L, Gao Z. EF-former for short-term passenger flow prediction during large-scale events in urban rail transit systems. Inf Fusion. 2025;117:102916.
- 52. Zhang J, Zhang S, Zhao H, Yang Y, Liang M. Multi-frequency spatial-temporal graph neural network for short-term metro OD demand prediction during public health emergencies. Transportation. 2025.
- 53. Zhang S, Zhang J, Yang L, Chen F, Li S, Gao Z. Physics guided deep learning-based model for short-term origin–destination demand prediction in urban rail transit systems under pandemic. Engineering. 2024;41:276–96.
- 54. Qiu H, Zhang J, Yang L, Han K, Yang X, Gao Z. Spatial–temporal multi-task learning for short-term passenger inflow and outflow prediction on holidays in urban rail transit systems. Transportation. 2025.
- 55. Dantas TM, Cyrino Oliveira FL, Varela Repolho HM. Air transportation demand forecast through Bagging Holt Winters methods. JATM. 2017;59:116–23.
- 56. Hua Y, Zhao Z, Li R, Chen X, Liu Z, Zhang H. Deep learning with long short-term memory for time series prediction. IEEE Commun Mag. 2019;57(6):114–9.
- 57. Zhou T, Ma Z, Wen Q, Wang X, Sun L, Jin R. FEDformer: frequency enhanced decomposed transformer for long-term series forecasting. International Conference on Machine Learning. PMLR; 2022. p. 27268–86. Available from: https://proceedings.mlr.press/v162/zhou22g.html
- 58. Qu L, Li W, Li W, Ma D, Wang Y. Daily long-term traffic flow forecasting based on a deep neural network. Expert Syst Appl. 2019;121:304–12.
- 59. Gui G, Zhou Z, Wang J, Liu F, Sun J. Machine learning aided air traffic flow analysis based on aviation big data. IEEE Trans Veh Technol. 2020;69(5):4817–26.
- 60. Li Z, Xia L, Tang J, Xu Y, Shi L, Xia L, et al. UrbanGPT: spatio-temporal large language models. arXiv. 2024.
- 61. Li W, Yao D, Zhao R, Chen W, Xu Z, Luo C, et al. STBench: assessing the ability of large language models in spatio-temporal analysis. Companion Proceedings of the ACM on Web Conference 2025. Sydney (NSW): ACM; 2025. p. 749–52. https://doi.org/10.1145/3701716.3715293
- 62. Du W, Chen S, Li Z, Cao X, Lv Y. A spatial-temporal approach for multi-airport traffic flow prediction through causality graphs. IEEE Trans Intell Transp Syst. 2024;25(1):532–44.
- 63. Pisner DA, Schnyer DM. Chapter 6 - Support vector machine. In: Mechelli A, Vieira S, editors. Machine learning. Academic Press; 2020. p. 101–21. https://doi.org/10.1016/B978-0-12-815739-8.00006-7
- 64. Graves A. Long short-term memory. In: Supervised sequence labelling with recurrent neural networks. Berlin, Heidelberg: Springer; 2012. p. 37–45. https://doi.org/10.1007/978-3-642-24797-2_4
- 65. Wu H, Xu J, Wang J, Long M. Autoformer: decomposition transformers with auto-correlation for long-term series forecasting. arXiv. 2022. Available from: http://arxiv.org/abs/2106.13008
- 66. Liu Y, Hu T, Zhang H, Wu H, Wang S, Ma L, et al. iTransformer: inverted transformers are effective for time series forecasting. arXiv. 2024.
- 67. Donghao L, Xue W. ModernTCN: a modern pure convolution structure for general time series analysis; 2023. Available from: https://openreview.net/forum?id=vpJMJerXHU
- 68. Hu Y, Liu P, Zhu P, Cheng D, Dai T. Adaptive multi-scale decomposition framework for time series forecasting; 2025.