Fig 1.
Raw data as a waveform and its spectrogram, mel-spectrogram, and MFCC representations.
Fig 2.
The architecture of the proposed 1D-CNN model.
Table 1.
Layers and parameters of the proposed 1D-CNN model.
Fig 3.
The architecture of the 2DS-CNN model.
Table 2.
Layers and parameters of the 2DS-CNN model.
Fig 4.
The architecture of the 2DM-CNN model.
Table 3.
Layers and parameters of the 2DM-CNN model.
Fig 5.
Speech, spectrogram, mel-spectrogram, and MFCC images from the datasets used.
Table 4.
Hardware specifications used for training, testing, and analysis.
Fig 6.
Comparison of complexity and time parameters of all models (lower is better).
Table 5.
Comparison results on the digital speech dataset (mean ± std).
Table 6.
Comparison results on the spectrogram dataset.
Table 7.
Comparison results on the mel-spectrogram dataset.
Table 8.
Comparison results on the MFCC dataset.
Fig 7.
Results of the proposed algorithms (higher is better).
Fig 8.
The training process of the End2End and 2DM-CNN models in the first run.
Fig 9.
Confusion matrices of the End2End and 2DM-CNN models in the first run.
Fig 10.
First-run confusion matrices of the proposed neural networks on the mel-spectrogram dataset.
Fig 11.
Model ranking (lower is better).