Crossmixed convolutional neural network for digital speech recognition

doi:10.1371/journal.pone.0302394

Fig 1.

Raw data as a waveform and its representations as a spectrogram, mel-spectrogram, and MFCC.

Fig 2.

The architecture of the proposed 1D-CNN model.

Table 1.

Layers and parameters of the proposed 1D-CNN model.

Fig 3.

The architecture of the 2DS-CNN model.

Table 2.

Layers and parameters of the 2DS-CNN model.

Fig 4.

The architecture of the 2DM-CNN model.

Table 3.

Layers and parameters of the 2DM-CNN model.

Fig 5.

Speech, spectrogram, mel-spectrogram, and MFCC images from the datasets used.

Table 4.

Hardware specifications used for training, testing, and analysis.

Fig 6.

Comparison of complexity and time parameters of all models (lower is better).

Table 5.

Comparison results on the digital speech dataset (mean ± std).

Table 6.

Comparison results on the spectrogram dataset.

Table 7.

Comparison results on the mel-spectrogram dataset.

Table 8.

Comparison results on the MFCC dataset.

Fig 7.

Results of the proposed algorithms (higher is better).

Fig 8.

The training process of the End2End and 2DM-CNN models in the first run.

Fig 9.

Confusion matrices of the End2End and 2DM-CNN models in the first run.

Fig 10.

Confusion matrices from the first run of the proposed neural networks on the mel-spectrogram dataset.

Fig 11.

Model ranking (lower is better).
