Fig 1.
Overall architecture of KALFormer.
The framework integrates Long Short-Term Memory (LSTM) units for temporal encoding, a self-attention mechanism for capturing global dependencies, a Graph Neural Network (GNN)-based Knowledge-Augmented Network (KAN) for nonlinear relational interactions, and a multi-layer Transformer encoder–decoder for feature fusion and sequence prediction. The model outputs are normalized, then passed through a linear layer and a Softmax to produce forecasting probabilities.
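The pipeline described in the caption can be summarized as a composition of stages. The sketch below mirrors that flow with placeholder stubs; it is an illustration of the data path only, not the authors' implementation, and all function names are hypothetical.

```python
import numpy as np

# Hypothetical stage stubs mirroring the Fig 1 pipeline; each real module
# would transform its input, but here they are identity placeholders.
def lstm_encode(x):        # temporal encoding (LSTM units)
    return x
def self_attention(h):     # global-dependency modeling (self-attention)
    return h
def kan_interact(h):       # nonlinear relational interactions (GNN-based KAN)
    return h
def transformer_fuse(h):   # feature fusion and decoding (Transformer stack)
    return h

def prediction_head(h, W):
    """Linear layer followed by a Softmax over the last axis,
    yielding forecasting probabilities as in the caption."""
    z = h @ W
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def kalformer_forward(x, W):
    # End-to-end data path: LSTM -> self-attention -> KAN -> Transformer -> head
    h = transformer_fuse(kan_interact(self_attention(lstm_encode(x))))
    return prediction_head(h, W)
```

Each Softmax output row sums to one, so the head emits a valid probability distribution per time step.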
Fig 2.
Structure of the multi-head attention mechanism.
Each attention head performs scaled dot-product attention with its own query (Q), key (K), and value (V) projections. The outputs of all heads are concatenated and linearly transformed to form the final attention representation, enabling the model to jointly attend to information from different representation subspaces.
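The mechanism in Fig 2 can be sketched numerically. The following is a minimal, batch-free numpy illustration of standard multi-head scaled dot-product attention; the function name and weight-matrix layout are assumptions for illustration, not the paper's code.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(x, Wq, Wk, Wv, Wo, n_heads):
    """Multi-head scaled dot-product attention on a single sequence.
    x: (seq_len, d_model); Wq, Wk, Wv, Wo: (d_model, d_model)."""
    seq_len, d_model = x.shape
    d_head = d_model // n_heads

    # Project input and split into heads: (n_heads, seq_len, d_head)
    def project(W):
        return (x @ W).reshape(seq_len, n_heads, d_head).transpose(1, 0, 2)
    Q, K, V = project(Wq), project(Wk), project(Wv)

    # Scaled dot-product attention within each head
    scores = Q @ K.transpose(0, 2, 1) / np.sqrt(d_head)   # (n_heads, L, L)
    attn = softmax(scores, axis=-1)
    heads = attn @ V                                       # (n_heads, L, d_head)

    # Concatenate heads and apply the final linear transform
    concat = heads.transpose(1, 0, 2).reshape(seq_len, d_model)
    return concat @ Wo
```

The `1/sqrt(d_head)` scaling keeps the dot products in a range where the softmax retains usable gradients as the head dimension grows.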
Table 1.
Main characteristics of each dataset used in the experiments.
Table 2.
Experimental environment, model configuration, and training hyperparameters.
Table 3.
Performance comparison between KALFormer and state-of-the-art models.
Fig 3.
Performance comparison across benchmark datasets.
Heatmap visualization of MSE and MAE values for different models on six public datasets. Darker blue shades indicate lower error values. KALFormer consistently achieves the lowest MSE and MAE across all datasets, demonstrating superior generalization and robustness.
Table 4.
Ablation study results for different module combinations.
Fig 4.
Ablation study on MAE and MSE performance.
Each configuration represents a variant of the KALFormer architecture, isolating the contribution of individual modules. Results are reported as mean ± Std over three independent runs. KALFormer achieves the lowest error, confirming the effectiveness of multi-level fusion.
Fig 5.
Accuracy and loss comparison in ablation experiments.
The figure reports trend-based accuracy (%) and loss (mean ± Std) for each model variant. KALFormer exhibits the highest accuracy (95.19%) and the lowest loss (0.1631), highlighting its predictive precision and training stability compared with other configurations.
Fig 6.
Attention mechanism visualization on representative datasets.
Normalized attention maps of KALFormer on ETTm1 and Electricity datasets for 96- and 192-step forecasts, showing a shift from local diagonal focus to broader periodic patterns across variables.
Fig 7.
Evolution of attention energy across temporal scales.
As the forecasting horizon extends, attention becomes more diffuse, reflecting adaptive balance between short-term precision and global contextual awareness.
Table 5.
Complexity and efficiency comparison among LSTM, BiLSTM, and KALFormer.