Fig 1.
Overview of the proposed framework.
(1) Multi-scale Masking: Segments in the global and local regions are masked separately. (2) Multi-scale Cross-attention Encoding: Unmasked segments from both regions are concatenated and fed into a lightweight Transformer-based encoder for cross-attention. (3) Multi-scale Reconstruction: Masked segments in the global and local regions are reconstructed by a single-layer Transformer block, using a mean squared error (MSE) loss computed after per-segment normalization. (4) Anomaly Score Aggregation: An aggregation strategy enhances the sample-level and point-level anomaly scores used for anomaly detection and localization, respectively. "Transformer Block" denotes a standard Transformer block.
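Steps (1) and (3) can be illustrated with a short NumPy sketch. It assumes each region (global or local) has already been split into equal-length segments stored as a `(num_segments, segment_len)` array, that masking picks a random fraction θ of segments per region, and that per-segment normalization standardizes each target segment to zero mean and unit variance (in the style of the normalized-pixel loss used by masked autoencoders). The helper names `mask_segments` and `normalized_masked_mse`, and the segment counts in the usage lines, are hypothetical and not taken from the paper.

```python
import numpy as np


def mask_segments(num_segments, mask_ratio, rng):
    """Randomly mask a fraction `mask_ratio` (theta) of the segments in one region.

    Returns a boolean array of shape (num_segments,); True marks a masked segment.
    """
    num_masked = int(round(num_segments * mask_ratio))
    masked = np.zeros(num_segments, dtype=bool)
    masked[rng.permutation(num_segments)[:num_masked]] = True
    return masked


def normalized_masked_mse(pred, target, masked, eps=1e-6):
    """MSE over masked segments only, with each target segment normalized first.

    pred, target: arrays of shape (num_segments, segment_len).
    masked: boolean array of shape (num_segments,); assumed to contain at least one True.
    """
    # Per-segment normalization (assumed form): zero mean, unit variance per segment.
    mean = target.mean(axis=1, keepdims=True)
    var = target.var(axis=1, keepdims=True)
    target_norm = (target - mean) / np.sqrt(var + eps)
    # Squared error averaged within each segment, then over masked segments only.
    per_segment = ((pred - target_norm) ** 2).mean(axis=1)
    return float(per_segment[masked].mean())


# Usage: the global and local regions are masked separately, as in step (1).
# The segment counts (40 global, 10 local) are hypothetical.
rng = np.random.default_rng(0)
global_mask = mask_segments(num_segments=40, mask_ratio=0.5, rng=rng)
local_mask = mask_segments(num_segments=10, mask_ratio=0.5, rng=rng)
```

Computing the loss only on masked segments means the model is scored on what it had to infer from the unmasked context; normalizing the targets keeps segments with large amplitude from dominating the loss.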
Table 1.
Comparison of methods.
Table 2.
Comparison of computational complexity and model requirements.
Fig 2.
(A) Receiver operating characteristic (ROC) curve and (B) precision–recall (PR) curve of the proposed method, illustrating its discrimination ability across varying thresholds.
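The areas under the two curves in Fig 2 summarize discrimination across all thresholds. The following is a minimal NumPy sketch of how such areas are commonly computed from sample-level labels (1 = abnormal) and anomaly scores; it is not necessarily the evaluation code behind the figure. The function names are illustrative, and the conventions (trapezoidal ROC area, step-wise average precision, tied scores sharing one threshold) are intended to agree with scikit-learn's `roc_auc_score` and `average_precision_score`.

```python
import numpy as np


def _cumulative_counts(labels, scores):
    """Sort by descending score and count TPs/FPs at each distinct threshold."""
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(-scores, kind="mergesort")
    y, s = labels[order], scores[order]
    # Keep the last index of each run of tied scores so ties share one threshold.
    last_of_run = np.r_[np.flatnonzero(np.diff(s)), len(s) - 1]
    tp = np.cumsum(y)[last_of_run]
    fp = np.cumsum(~y)[last_of_run]
    return tp, fp, labels.sum(), (~labels).sum()


def roc_auc(labels, scores):
    """Area under the ROC curve via the trapezoidal rule over swept thresholds."""
    tp, fp, n_pos, n_neg = _cumulative_counts(labels, scores)
    tpr = np.r_[0.0, tp / n_pos]
    fpr = np.r_[0.0, fp / n_neg]
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2))


def average_precision(labels, scores):
    """Area under the PR curve as the sum of precision times recall increments."""
    tp, fp, n_pos, _ = _cumulative_counts(labels, scores)
    precision = tp / (tp + fp)
    recall = np.r_[0.0, tp / n_pos]
    return float(np.sum(np.diff(recall) * precision))
```

For example, `roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])` gives 0.75 and `average_precision` on the same inputs gives about 0.833.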
Table 3.
Detailed performance metrics of the proposed method at different recall (sensitivity) levels, including corresponding precision, F1-score, and specificity.
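Operating points like those in Table 3 are typically obtained by choosing the highest score threshold that reaches a target recall and then reading off the other metrics at that threshold. The sketch below assumes that convention (a sample is flagged as abnormal when its score is at least the threshold); `metrics_at_recall` is a hypothetical helper, and the paper's exact threshold-selection rule may differ.

```python
import numpy as np


def metrics_at_recall(labels, scores, target_recall):
    """Precision, F1, and specificity at the highest threshold with recall >= target.

    labels: 1 = abnormal, 0 = normal; assumes at least one positive and one negative.
    """
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    # Scan distinct thresholds from highest (strictest) to lowest.
    for thr in np.unique(scores)[::-1]:
        pred = scores >= thr
        tp = np.sum(pred & labels)
        recall = tp / labels.sum()
        if recall >= target_recall:
            fp = np.sum(pred & ~labels)
            tn = np.sum(~pred & ~labels)
            precision = tp / (tp + fp)
            f1 = 0.0 if precision + recall == 0 else 2 * precision * recall / (precision + recall)
            return {
                "threshold": float(thr),
                "recall": float(recall),
                "precision": float(precision),
                "f1": float(f1),
                "specificity": float(tn / (tn + fp)),
            }
    # The lowest threshold flags every sample (recall = 1), so this only fires
    # when target_recall > 1.
    raise ValueError("target_recall must be at most 1")
```

Raising the target recall lowers the threshold, which generally trades precision and specificity for sensitivity; this is the trade-off Table 3 tabulates.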
Fig 3.
Examples of anomaly localization on the PTB-XL dataset across different types of ECG abnormalities.
Ground-truth regions, annotated by cardiologists, are highlighted with red boxes on the ECG signals; the corresponding anomaly localization results of the proposed method, based on the point-level anomaly score defined in Eq (5), are shown below each signal. Detailed descriptions are provided in S1 Appendix.
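To compare point-level scores with box-style ground-truth annotations such as those in Fig 3, one simple option is to threshold the scores, group consecutive above-threshold points into intervals, and measure their overlap with the annotated intervals. The sketch below is purely illustrative: the threshold, the half-open `[start, end)` interval format, and the helpers `score_to_intervals` and `interval_iou` are assumptions and are not the localization procedure reported in the paper.

```python
import numpy as np


def score_to_intervals(point_scores, threshold):
    """Group consecutive points with score > threshold into [start, end) intervals."""
    above = np.asarray(point_scores) > threshold
    # Pad with False so every run of True has both a rising and a falling edge.
    padded = np.concatenate([[False], above, [False]]).astype(int)
    edges = np.flatnonzero(np.diff(padded))
    return [(int(s), int(e)) for s, e in zip(edges[::2], edges[1::2])]


def interval_iou(pred_intervals, gt_intervals, length):
    """Point-wise intersection over union between predicted and ground-truth intervals."""
    pred = np.zeros(length, dtype=bool)
    gt = np.zeros(length, dtype=bool)
    for s, e in pred_intervals:
        pred[s:e] = True
    for s, e in gt_intervals:
        gt[s:e] = True
    union = np.sum(pred | gt)
    # Two empty masks are treated as perfect agreement.
    return float(np.sum(pred & gt) / union) if union else 1.0
```

For instance, scores `[0, 0.9, 0.8, 0.1, 0.7, 0.2]` with threshold 0.5 yield the intervals `[(1, 3), (4, 5)]`, whose IoU with a ground-truth interval `(1, 4)` over 6 points is 0.5.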
Table 4.
Ablation study results for different model configurations.
Fig 4.
Ablation study results for different masking ratios and values of H.
In (A), the x-axis shows the masking ratio θ used in Eqs (1) and (2); in (B), the x-axis shows H, used in the probability defined in Eq (3).