
Fig 1.

The proposed framework includes feature embedding and fusion layers, an adaptive time-frequency fusion Transformer encoder and decoder with a deformable attention mechanism, and a medical prior-guided attention module.

The model achieves time-frequency complementarity and semantic alignment through gated connections and residual fusion, thereby improving acoustic interpretability while maintaining discriminative performance.

Fig 2.

Schematic diagram of the Adaptive Time-Frequency Fusion Transformer architecture.

The ATF-Transformer encoder on the left uses multimodal feature fusion and a time-frequency attention mechanism to jointly model time and frequency domain features. The decoder on the right uses learnable weights and gating mechanisms to achieve feature reconstruction and semantic alignment, resulting in an interpretable time-frequency fusion representation.
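The gating described above can be illustrated with a learned sigmoid gate that blends time- and frequency-domain features, plus a residual connection. A minimal NumPy sketch; the single-gate form, weight shapes, and random toy inputs are illustrative assumptions, not the paper's exact parameterization:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_fusion(time_feat, freq_feat, w_gate, b_gate):
    """Fuse time- and frequency-domain features with a learned sigmoid gate.

    The gate g weighs the two branches per channel; a residual term keeps
    the original time-domain features in the output.
    """
    concat = np.concatenate([time_feat, freq_feat], axis=-1)
    g = sigmoid(concat @ w_gate + b_gate)          # per-channel gate in (0, 1)
    fused = g * time_feat + (1.0 - g) * freq_feat  # convex combination
    return fused + time_feat                       # residual connection

# toy example with random weights
rng = np.random.default_rng(0)
d = 4
t = rng.standard_normal((2, d))
f = rng.standard_normal((2, d))
w = rng.standard_normal((2 * d, d)) * 0.1
out = gated_fusion(t, f, w, np.zeros(d))
print(out.shape)  # (2, 4)
```

Because the gate is a convex combination, identical branch inputs pass through unchanged (up to the residual), which makes the fusion easy to sanity-check.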

Fig 3.

A schematic diagram of the Medical Guided Interpretable Attention Map (MGIAM) structure.

The module introduces medical feature channels between Transformer blocks and, through layer normalization, feature allocation, and interpretable feature generation, dynamically couples medical semantics with attention, so that both forward and backward propagation retain medical interpretability and structural consistency.
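The layer normalization step named in the caption can be shown in isolation. This is standard layer normalization over the feature axis, not the module's full pipeline; the toy input is invented for illustration:

```python
import numpy as np

def layer_norm(x, gamma, beta, eps=1e-5):
    """Layer normalization over the feature (last) axis.

    Each token's features are rescaled to zero mean and unit variance,
    then affinely transformed by the learned gamma and beta.
    """
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return gamma * (x - mean) / np.sqrt(var + eps) + beta

x = np.array([[1.0, 2.0, 3.0, 4.0]])
y = layer_norm(x, gamma=np.ones(4), beta=np.zeros(4))
print(y.mean(), y.std())  # ~0.0, ~1.0
```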

Fig 4.

Mel spectrograms of representative samples from the three datasets.
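Mel spectrograms rest on the mel frequency scale. A minimal sketch of the standard HTK-style Hz-to-mel conversion used when placing mel filterbank center frequencies; the 8-band/8 kHz values are arbitrary toy choices, not the paper's settings:

```python
import numpy as np

def hz_to_mel(f_hz):
    """HTK-style Hz-to-mel conversion used when building mel filterbanks."""
    return 2595.0 * np.log10(1.0 + np.asarray(f_hz) / 700.0)

def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (np.asarray(m) / 2595.0) - 1.0)

# band edges for an 8-band mel filterbank up to 8 kHz:
# spaced uniformly in mel, then mapped back to Hz
mels = np.linspace(hz_to_mel(0.0), hz_to_mel(8000.0), 8 + 2)
centers_hz = mel_to_hz(mels)
print(np.round(centers_hz[:3], 1))
```

Spacing the bands uniformly on the mel scale gives finer frequency resolution at low frequencies, mimicking human pitch perception.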

Table 1.

Experimental configuration.

Table 2.

Performance comparison of different models on Dataset 1 for pertussis sound recognition (Mean ± Std).

Table 3.

Performance comparison of different models on Dataset 2 for pertussis sound recognition (Mean ± Std).

Table 4.

Performance comparison of different models on Dataset 3 for pertussis sound recognition (Mean ± Std).

Table 5.

Ablation study of ATF and MGIAM modules across three datasets for pertussis sound recognition (Mean ± Std).

Fig 5.

SHAP feature-importance maps comparing experimental results with and without the MGIAM module.
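Computing true SHAP values requires the `shap` library and a fitted model. As a self-contained stand-in, the sketch below computes permutation importance, which shares the core idea of attributing score changes to individual features; the toy model and data are invented for illustration:

```python
import numpy as np

def permutation_importance(model, X, y, n_repeats=10, seed=0):
    """Mean drop in accuracy when each feature column is shuffled.

    A model-agnostic stand-in for SHAP-style attribution: features whose
    shuffling hurts accuracy most are deemed most important.
    """
    rng = np.random.default_rng(seed)
    base = np.mean(model(X) == y)
    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        drops = []
        for _ in range(n_repeats):
            Xp = X.copy()
            rng.shuffle(Xp[:, j])          # break the feature-target link
            drops.append(base - np.mean(model(Xp) == y))
        importances[j] = np.mean(drops)
    return importances

# toy classifier that only looks at feature 0
model = lambda X: (X[:, 0] > 0).astype(int)
rng = np.random.default_rng(1)
X = rng.standard_normal((200, 3))
y = (X[:, 0] > 0).astype(int)
imp = permutation_importance(model, X, y)
print(np.argmax(imp))  # feature 0 dominates
```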

Fig 6.

Confusion matrices comparing the proposed algorithm with the baseline Transformer.
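A confusion matrix like those in the figure tabulates true versus predicted classes; correct predictions fall on the diagonal. A minimal NumPy sketch with toy labels:

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    """Rows are true classes, columns are predicted classes."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm

y_true = [0, 0, 1, 1, 1, 2]
y_pred = [0, 1, 1, 1, 2, 2]
cm = confusion_matrix(y_true, y_pred, 3)
print(cm)
# [[1 1 0]
#  [0 2 1]
#  [0 0 1]]
```

The trace of the matrix divided by the number of samples gives overall accuracy (here 4/6).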

Fig 7.

t-SNE visualization of the proposed algorithm's results on the test set.

Fig 8.

Comparison of experimental results between the proposed algorithm and the Transformer architecture.

Fig 9.

Dependency graph comparison between the Transformer and the proposed algorithm.

Table 6.

Model stability analysis under different noise intensities across three datasets (Mean ± Std).

Table 7.

Computational cost comparison between the Transformer baseline and the proposed model.
