Prediction of cognitive impairment through speech data analysis: A comparative evaluation of deep learning models

doi:10.1371/journal.pone.0349412

Table 1.

Characteristics of the voice dataset.

Fig 1.

Architecture of the 1D CNN (1-Dimensional Convolutional Neural Network) model.

The diagram shows how raw audio features pass through successive feature-extraction layers to the final classification layer, reflecting the model's suitability for sequential, time-series acoustic data.
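The caption does not give the layer configuration, so the following is only a minimal, dependency-free sketch of what a 1D CNN forward pass over a sequence of acoustic feature frames can look like for a binary normal-vs-impaired decision. The function names, the toy weights, and the single conv/pool stage are hypothetical illustrations, not the authors' architecture.

```python
import math

def conv1d(x, kernels, biases):
    """Valid (no padding), stride-1 1D convolution along the time axis.
    x: list of time steps, each a list of input-channel values.
    kernels: one kernel per output channel, shaped [width][in_channels]."""
    width = len(kernels[0])
    out = []
    for t in range(len(x) - width + 1):
        step = []
        for kernel, b in zip(kernels, biases):
            acc = b
            for k in range(width):
                for c, w in enumerate(kernel[k]):
                    acc += w * x[t + k][c]
            step.append(acc)
        out.append(step)
    return out

def relu(seq):
    return [[max(0.0, v) for v in step] for step in seq]

def max_pool(seq, size=2):
    # Non-overlapping max pooling over time; a trailing partial window is dropped.
    return [
        [max(seq[t + i][c] for i in range(size)) for c in range(len(seq[0]))]
        for t in range(0, len(seq) - size + 1, size)
    ]

def global_avg_pool(seq):
    # Average each channel over all remaining time steps (input must be non-empty).
    return [sum(step[c] for step in seq) / len(seq) for c in range(len(seq[0]))]

def dense_sigmoid(vec, weights, bias):
    z = bias + sum(w * v for w, v in zip(weights, vec))
    return 1.0 / (1.0 + math.exp(-z))

def forward(features, params):
    # Hypothetical single conv stage for illustration; the study's actual
    # layer stack is not specified in this caption.
    h = relu(conv1d(features, params["kernels"], params["biases"]))
    h = max_pool(h, 2)
    return dense_sigmoid(global_avg_pool(h), params["out_w"], params["out_b"])

if __name__ == "__main__":
    # Toy input: 8 time frames x 2 feature channels (made-up values).
    frames = [[0.1 * t, 1.0 - 0.1 * t] for t in range(8)]
    toy_params = {
        "kernels": [[[0.5, -0.2], [0.1, 0.3], [0.2, 0.0]],    # output channel 0
                    [[-0.3, 0.4], [0.0, 0.1], [0.6, -0.1]]],  # output channel 1
        "biases": [0.0, 0.1],
        "out_w": [1.2, -0.7],
        "out_b": 0.0,
    }
    print("P(impaired) =", round(forward(frames, toy_params), 3))
```

Global average pooling is one common way to turn a variable-length sequence into a fixed-size vector before the classifier; flattening or recurrent layers are alternatives.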

Fig 2.

Architecture of the AST (Audio Spectrogram Transformer) model.

This schematic shows how 2D audio spectrograms are split into local patches that are processed by transformer encoder blocks, whose self-attention mechanisms learn global acoustic context.
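To make the patching step concrete, here is a minimal pure-Python sketch that tiles a [frequency][time] spectrogram into overlapping square patches and mixes the resulting tokens with single-head scaled dot-product self-attention. The 16×16 patch size and stride of 10 (an overlap of 6) follow the configuration described for the original AST; whether this study kept those defaults is an assumption. The real model also projects each patch to an embedding, adds positional embeddings and a [CLS] token, and uses learned multi-head Q/K/V projections, all of which are omitted here.

```python
import math

def extract_patches(spec, patch=16, stride=10):
    """Split a [freq][time] spectrogram into flattened patch x patch tiles.
    stride < patch gives overlapping tiles; tiles that would overrun an edge are skipped."""
    n_freq, n_time = len(spec), len(spec[0])
    patches = []
    for f0 in range(0, n_freq - patch + 1, stride):
        for t0 in range(0, n_time - patch + 1, stride):
            patches.append([spec[f][t]
                            for f in range(f0, f0 + patch)
                            for t in range(t0, t0 + patch)])
    return patches

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(tokens):
    """Single-head scaled dot-product attention with identity Q/K/V projections
    (a simplification): each output token is a softmax(q.k / sqrt(d))-weighted
    average of all tokens, so every patch can draw on global context."""
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, tokens)) for j in range(d)])
    return out
```

With 128 mel bins and 100 frames (about 1 s, assuming a 10 ms hop), this tiling yields 12 × 9 = 108 patches of 256 values each, which become the token sequence the encoder attends over.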

Fig 3.

Architecture of the Wav2Vec 2.0 model.

The figure shows the self-supervised learning pipeline: raw speech waveforms are first encoded by a CNN block, and a transformer network then learns deep contextualized representations from the encoded sequence.
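The CNN block downsamples the raw waveform before the transformer sees it. As a sketch, assuming the feature-encoder configuration published with the original wav2vec 2.0 (seven temporal convolutions with kernel widths 10, 3, 3, 3, 3, 2, 2 and strides 5, 2, 2, 2, 2, 2, 2, applied without padding to 16 kHz audio), the arithmetic below derives the receptive field, hop, and number of latent frames. The exact checkpoint used in the study is not named in this caption, and the function names are illustrative.

```python
# Kernel widths and strides of the seven temporal convolutions in the
# wav2vec 2.0 feature encoder, as published with the original model (assumed here).
KERNELS = (10, 3, 3, 3, 3, 2, 2)
STRIDES = (5, 2, 2, 2, 2, 2, 2)

def num_frames(n_samples, kernels=KERNELS, strides=STRIDES):
    """Number of latent frames the CNN encoder emits (no padding)."""
    n = n_samples
    for k, s in zip(kernels, strides):
        if n < k:          # input too short to fill this layer's kernel
            return 0
        n = (n - k) // s + 1
    return n

def receptive_field(kernels=KERNELS, strides=STRIDES):
    """Return (receptive field, hop) of one output frame, in input samples."""
    rf, jump = 1, 1
    for k, s in zip(kernels, strides):
        rf += (k - 1) * jump   # each layer widens the window by (k - 1) input hops
        jump *= s              # cumulative stride so far
    return rf, jump

if __name__ == "__main__":
    sr = 16_000  # wav2vec 2.0 models expect 16 kHz input
    rf, hop = receptive_field()
    print(f"receptive field: {rf} samples ({1000 * rf / sr:.0f} ms), "
          f"hop: {hop} samples ({1000 * hop / sr:.0f} ms)")
    print("frames for 1 s of audio:", num_frames(sr))
```

Under these assumptions each latent frame covers 400 samples (25 ms) and frames advance every 320 samples (20 ms), so one second of speech becomes a sequence of 49 vectors for the transformer.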

Table 2.

Performance metrics and statistical comparisons for MCI and AD classification models.

Fig 4.

Confusion matrices for the Wav2Vec 2.0 and 1D CNN (Spectrogram) models across binary classification tasks.

(A) Wav2Vec 2.0 for NC vs. MCI. (B) Wav2Vec 2.0 for NC vs. AD. (C) 1D CNN (Spectrogram) for NC vs. MCI. (D) 1D CNN (Spectrogram) for NC vs. AD. Values represent aggregated predictions across all five cross-validation folds. NC, Normal Cognition; MCI, Mild Cognitive Impairment; AD, Alzheimer’s Disease.
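The caption states that each matrix aggregates predictions from all five cross-validation folds. A minimal sketch of that aggregation, assuming label 0 = NC and 1 = the impaired class (MCI or AD) and using hypothetical helper names, sums the per-fold counts before deriving summary metrics.

```python
def confusion_counts(y_true, y_pred):
    """Binary confusion counts; 1 = impaired (MCI or AD), 0 = NC (assumed encoding)."""
    pairs = list(zip(y_true, y_pred))
    return {
        "tp": sum(1 for t, p in pairs if t == 1 and p == 1),
        "tn": sum(1 for t, p in pairs if t == 0 and p == 0),
        "fp": sum(1 for t, p in pairs if t == 0 and p == 1),
        "fn": sum(1 for t, p in pairs if t == 1 and p == 0),
    }

def aggregate_folds(folds):
    """Sum confusion counts over CV folds given as (y_true, y_pred) pairs."""
    total = {"tp": 0, "tn": 0, "fp": 0, "fn": 0}
    for y_true, y_pred in folds:
        for key, value in confusion_counts(y_true, y_pred).items():
            total[key] += value
    return total

def _ratio(num, den):
    return num / den if den else float("nan")

def summary(c):
    """Metrics computed once from the pooled counts."""
    return {
        "accuracy": _ratio(c["tp"] + c["tn"], sum(c.values())),
        "sensitivity": _ratio(c["tp"], c["tp"] + c["fn"]),
        "specificity": _ratio(c["tn"], c["tn"] + c["fp"]),
    }
```

Metrics computed from pooled counts generally differ slightly from the mean of per-fold metrics, so it matters which convention a reported figure uses.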