A lightweight and robust method for electrocardiogram anomaly detection and localization using multi-scale masked autoencoder

doi:10.1371/journal.pone.0343571

Fig 1.

Overview of the proposed framework.

(1) Multi-scale Masking: Segments in the global and local regions are masked separately. (2) Multi-scale Cross-attention Encoding: The unmasked segments from both regions are concatenated and fed into a lightweight Transformer-based encoder for cross-attention. (3) Multi-scale Reconstruction: The masked segments in the global and local regions are reconstructed by a single-layer Transformer block trained with a mean squared error loss after per-segment normalization. (4) Anomaly Score Aggregation: An aggregation strategy enhances the sample-level and point-level anomaly scores used for anomaly detection and localization, respectively. "Transformer Block" denotes a standard Transformer block.
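Steps (1) to (3) can be illustrated with a minimal NumPy sketch. It rests on several assumptions that the caption does not state. The global region is taken to be the whole single-lead signal and the local region a fixed window of it. Segment lengths are arbitrary. Masking is uniform random with ratio `theta`, the symbol the Fig 4 caption uses for the masking ratio. Random projections stand in for learned segment embeddings. All function and variable names are hypothetical. The loss normalizes each target segment to zero mean and unit variance and averages the squared error over masked segments only.

```python
import numpy as np

def segment(signal, seg_len):
    """Split a 1-D signal into non-overlapping segments (drops any remainder)."""
    n = len(signal) // seg_len
    return signal[: n * seg_len].reshape(n, seg_len)

def random_mask(num_segments, theta, rng):
    """Boolean mask with round(theta * num_segments) True entries (True = masked)."""
    num_masked = int(round(theta * num_segments))
    mask = np.zeros(num_segments, dtype=bool)
    mask[rng.permutation(num_segments)[:num_masked]] = True
    return mask

def normalized_mse(pred, target, mask, eps=1e-6):
    """MSE on masked segments after normalizing each target segment (zero mean, unit variance)."""
    mean = target.mean(axis=-1, keepdims=True)
    var = target.var(axis=-1, keepdims=True)
    target_n = (target - mean) / np.sqrt(var + eps)
    per_segment = ((pred - target_n) ** 2).mean(axis=-1)
    return per_segment[mask].mean()

rng = np.random.default_rng(0)
ecg = rng.standard_normal(5000)            # synthetic stand-in for one ECG lead
global_segs = segment(ecg, 50)             # hypothetical global scale: 100 segments of 50 samples
local_segs = segment(ecg[2000:2500], 10)   # hypothetical local window: 50 segments of 10 samples

theta = 0.5
g_mask = random_mask(len(global_segs), theta, rng)  # masked separately per scale
l_mask = random_mask(len(local_segs), theta, rng)

# Random projections to a shared width stand in for learned segment embeddings,
# so visible tokens from both scales can be concatenated into one sequence.
d = 16
W_g = rng.standard_normal((50, d))
W_l = rng.standard_normal((10, d))
tokens = np.concatenate([global_segs[~g_mask] @ W_g, local_segs[~l_mask] @ W_l], axis=0)

# An all-zero prediction gives a loss close to 1 on normalized targets.
loss = normalized_mse(np.zeros_like(global_segs), global_segs, g_mask)
```

Masking each scale independently keeps the number of visible tokens at both scales proportional to `1 - theta`. In this sketch that yields 75 visible tokens (50 global, 25 local) for the encoder.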

Table 1.

Comparison of methods.

Table 2.

Comparison of computational complexity and model requirements.

Fig 2.

(A) Receiver operating characteristic (ROC) curve and (B) precision–recall (PR) curve of the proposed method, illustrating its discrimination ability across varying thresholds.
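The caption does not say how the curves were summarized. The areas under both curves can be computed from sample-level anomaly scores, and a minimal NumPy sketch follows. It assumes binary labels (1 = abnormal) and treats a larger score as more anomalous. AUROC uses the pairwise Mann-Whitney formulation, counting ties as one half. The PR summary is step-wise average precision and assumes no tied scores. The function names are hypothetical, and the pairwise comparison uses O(n_pos * n_neg) memory, so it suits small evaluation sets.

```python
import numpy as np

def roc_auc(scores, labels):
    """Probability that a random abnormal sample outscores a random normal one (ties count 1/2)."""
    s = np.asarray(scores, dtype=float)
    y = np.asarray(labels, dtype=bool)
    pos, neg = s[y], s[~y]
    diff = pos[:, None] - neg[None, :]
    return ((diff > 0).sum() + 0.5 * (diff == 0).sum()) / (len(pos) * len(neg))

def average_precision(scores, labels):
    """Step-wise area under the PR curve; assumes scores contain no ties."""
    s = np.asarray(scores, dtype=float)
    y = np.asarray(labels, dtype=bool)
    order = np.argsort(-s)                 # rank samples from most to least anomalous
    hits = y[order]
    tp = np.cumsum(hits)
    precision = tp / np.arange(1, len(s) + 1)
    return precision[hits].mean()          # mean precision at each true positive

scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
labels = [1, 1, 0, 1, 0, 0]
auc = roc_auc(scores, labels)              # 8 of 9 positive/negative pairs ordered correctly
ap = average_precision(scores, labels)     # (1 + 1 + 0.75) / 3
```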

Table 3.

Detailed performance metrics of the proposed method at different recall (sensitivity) levels, including corresponding precision, F1-score, and specificity.
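Table 3 reports precision, F1-score, and specificity at fixed recall levels, but the threshold selection rule is not given in the caption. One plausible rule picks the highest score threshold whose recall reaches the target. The sketch below implements that rule and then reads off the other metrics. The rule, the function name, and the `target_recall` parameter (assumed to lie in (0, 1]) are assumptions.

```python
import numpy as np

def metrics_at_recall(scores, labels, target_recall):
    """Metrics at the highest threshold whose recall >= target_recall (flag score >= threshold)."""
    s = np.asarray(scores, dtype=float)
    y = np.asarray(labels, dtype=bool)
    for t in np.unique(s)[::-1]:           # candidate thresholds, highest first
        pred = s >= t
        tp = np.sum(pred & y)
        fn = np.sum(~pred & y)
        fp = np.sum(pred & ~y)
        tn = np.sum(~pred & ~y)
        recall = tp / (tp + fn)
        if recall >= target_recall:
            precision = tp / (tp + fp)
            f1 = 2 * precision * recall / (precision + recall)
            specificity = tn / (tn + fp)
            return {"threshold": float(t), "recall": float(recall),
                    "precision": float(precision), "f1": float(f1),
                    "specificity": float(specificity)}
    raise ValueError("target recall not reachable (no positive samples?)")

result = metrics_at_recall([0.9, 0.8, 0.7, 0.6, 0.5, 0.4], [1, 1, 0, 1, 0, 0], 0.9)
```

On this toy input the chosen threshold is 0.6. It gives recall 1.0, precision 0.75, F1 of about 0.857, and specificity 2/3.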

Fig 3.

Examples of anomaly localization on the PTB-XL dataset across different types of ECG abnormalities.

Ground-truth regions, annotated by cardiologists, are highlighted with red boxes on the ECG signals, and the corresponding anomaly localization results of the proposed method, based on the point-level anomaly score defined in Eq (5), are shown below them. Detailed descriptions are provided in S1 Appendix.
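The caption does not describe how point-level scores become highlighted regions. One simple post-processing step thresholds the scores and groups consecutive above-threshold points into intervals, which can then be compared with the annotated boxes. The sketch below implements that step; the threshold, `min_len`, and the function name are hypothetical, and this is not claimed to be the paper's procedure. Intervals are returned as half-open sample-index ranges `(start, end)`.

```python
import numpy as np

def score_to_intervals(point_scores, threshold, min_len=1):
    """Group consecutive points with score > threshold into half-open [start, end) intervals."""
    above = np.asarray(point_scores) > threshold
    padded = np.concatenate([[False], above, [False]])
    edges = np.flatnonzero(padded[1:] != padded[:-1])  # alternating rise/fall positions
    starts, ends = edges[::2], edges[1::2]
    return [(int(a), int(b)) for a, b in zip(starts, ends) if b - a >= min_len]

scores = [0.1, 0.8, 0.9, 0.2, 0.7, 0.1]
intervals = score_to_intervals(scores, 0.5)               # two candidate regions
long_only = score_to_intervals(scores, 0.5, min_len=2)    # drop single-point regions
```

Dividing the indices by the sampling rate converts them to seconds. PTB-XL provides records at both 100 Hz and 500 Hz.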

Table 4.

Ablation study results for different model configurations.

Fig 4.

Ablation study results for different masking ratios and values of H.

In (A), the x-axis shows the masking ratio θ used in Eqs (1) and (2); in (B), the x-axis shows the value of H used in the probability defined in Eq (3).
