Fig 1.
Overall architecture of KALFormer.
The framework integrates Long Short-Term Memory (LSTM) units for temporal encoding, a self-attention mechanism for capturing global dependencies, a Graph Neural Network (GNN)-based Knowledge-Augmented Network (KAN) for nonlinear relational interactions, and a multi-layer Transformer encoder–decoder for feature fusion and sequence prediction. The model outputs are normalized, then passed through a linear layer and a Softmax to produce forecasting probabilities.
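The pipeline described in the caption can be summarized as a composition of stages. The sketch below mirrors that flow with placeholder stubs; it is an illustration of the data path only, not the authors' implementation, and all function names are hypothetical.

```python
import numpy as np

# Hypothetical stage stubs mirroring the Fig 1 pipeline; each real module
# would transform its input, but here they are identity placeholders.
def lstm_encode(x):        # temporal encoding (LSTM units)
    return x
def self_attention(h):     # global-dependency modeling (self-attention)
    return h
def kan_interact(h):       # nonlinear relational interactions (GNN-based KAN)
    return h
def transformer_fuse(h):   # feature fusion and decoding (Transformer stack)
    return h

def prediction_head(h, W):
    """Linear layer followed by a Softmax over the last axis,
    yielding forecasting probabilities as in the caption."""
    z = h @ W
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def kalformer_forward(x, W):
    # End-to-end data path: LSTM -> self-attention -> KAN -> Transformer -> head
    h = transformer_fuse(kan_interact(self_attention(lstm_encode(x))))
    return prediction_head(h, W)
```

Each Softmax output row sums to one, so the head emits a valid probability distribution per time step.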
Fig 2.
Structure of the multi-head attention mechanism.
Each attention head performs scaled dot-product attention with its own query (Q), key (K), and value (V) projections. The outputs of all heads are concatenated and linearly transformed to form the final attention representation, enabling the model to jointly attend to information from different representation subspaces.
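The mechanism in Fig 2 can be sketched numerically. The following is a minimal, batch-free numpy illustration of standard multi-head scaled dot-product attention; the function name and weight-matrix layout are assumptions for illustration, not the paper's code.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(x, Wq, Wk, Wv, Wo, n_heads):
    """Multi-head scaled dot-product attention on a single sequence.
    x: (seq_len, d_model); Wq, Wk, Wv, Wo: (d_model, d_model)."""
    seq_len, d_model = x.shape
    d_head = d_model // n_heads

    # Project input and split into heads: (n_heads, seq_len, d_head)
    def project(W):
        return (x @ W).reshape(seq_len, n_heads, d_head).transpose(1, 0, 2)
    Q, K, V = project(Wq), project(Wk), project(Wv)

    # Scaled dot-product attention within each head
    scores = Q @ K.transpose(0, 2, 1) / np.sqrt(d_head)   # (n_heads, L, L)
    attn = softmax(scores, axis=-1)
    heads = attn @ V                                       # (n_heads, L, d_head)

    # Concatenate heads and apply the final linear transform
    concat = heads.transpose(1, 0, 2).reshape(seq_len, d_model)
    return concat @ Wo
```

The `1/sqrt(d_head)` scaling keeps the dot products in a range where the softmax retains usable gradients as the head dimension grows.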
Table 1.
Main characteristics of each dataset used in the experiments.
Table 2.
Experimental environment, model configuration, and training hyperparameters.
Table 3.
Performance comparison between KALFormer and state-of-the-art models.
Fig 3.
Performance comparison across benchmark datasets.
Heatmap visualization of MSE and MAE values for different models on six public datasets. Darker blue shades indicate lower error values. KALFormer consistently achieves the lowest MSE and MAE across all datasets, demonstrating superior generalization and robustness.
Table 4.
Ablation study results for different module combinations.
Fig 4.
Ablation study on MAE and MSE performance.
Each configuration represents a variant of the KALFormer architecture, isolating the contribution of individual modules. Results are reported as mean ± Std over three independent runs. KALFormer achieves the lowest error, confirming the effectiveness of multi-level fusion.
Fig 5.
Accuracy and loss comparison in ablation experiments.
The figure reports trend-based accuracy (%) and loss (mean ± Std) for each model variant. KALFormer exhibits the highest accuracy (95.19%) and the lowest loss (0.1631), highlighting its predictive precision and training stability compared with other configurations.
Fig 6.
Attention mechanism visualization on representative datasets.
Normalized attention maps of KALFormer on ETTm1 and Electricity datasets for 96- and 192-step forecasts, showing a shift from local diagonal focus to broader periodic patterns across variables.
Fig 7.
Evolution of attention energy across temporal scales.
As the forecasting horizon extends, attention becomes more diffuse, reflecting adaptive balance between short-term precision and global contextual awareness.
Table 5.
Complexity and efficiency comparison among LSTM, BiLSTM, and KALFormer.