Prediction of cognitive impairment through speech data analysis: A comparative evaluation of deep learning models
Fig 2. Architecture of the Audio Spectrogram Transformer (AST) model.
The schematic shows how a 2D audio spectrogram is split into local patches, which are then passed through transformer encoder blocks whose self-attention lets each patch draw on every other patch, capturing global acoustic context.
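The two steps in the caption, patching and self-attention, can be illustrated with a minimal pure-Python sketch. It is not the model evaluated in the paper: the function names, the toy 4x6 input, the 2x2 non-overlapping patch size, and the identity query/key/value projections are all assumptions made for illustration. A real AST also uses learned projections, positional embeddings, multiple heads, and feed-forward layers, which are omitted here.

```python
import math


def split_into_patches(spec, patch_h, patch_w):
    """Cut a 2D spectrogram (list of rows) into non-overlapping patches.

    Each patch is flattened row by row into one token vector.
    Non-overlapping patches are a simplifying assumption of this sketch.
    """
    n_rows, n_cols = len(spec), len(spec[0])
    patches = []
    for r in range(0, n_rows - patch_h + 1, patch_h):
        for c in range(0, n_cols - patch_w + 1, patch_w):
            patches.append(
                [spec[r + i][c + j] for i in range(patch_h) for j in range(patch_w)]
            )
    return patches


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Single-head scaled dot-product self-attention.

    Queries, keys, and values are the tokens themselves (identity
    projections, an assumption), so each output is a weighted average
    of all input tokens.
    """
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, tokens)) for j in range(d)])
    return out


# Toy "spectrogram": 4 frequency bins x 6 time frames, made-up values.
spec = [[float(r * 6 + c) / 10 for c in range(6)] for r in range(4)]

tokens = split_into_patches(spec, 2, 2)   # 6 patches, each a 4-dim token
attended = self_attention(tokens)         # same shape, globally mixed
```

Because every attention weight is computed against all patches, each row of `attended` mixes information from the whole spectrogram, which is the "global acoustic context" the caption refers to.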