Data augmentation using Variational Autoencoders for improvement of respiratory disease classification

doi:10.1371/journal.pone.0266467

Fig 1.

Summary of techniques used in automated respiratory sounds auscultation.

Table 1.

Literature review of classification models proposed for lung sound auscultation.

Table 2.

A literature review of data augmentation techniques for audio classification.

Fig 2.

Distribution of crackles and wheezes in the respiratory cycle.

Fig 3.

Patient-wise diagnosis in the ICBHI dataset.

Fig 4.

Count of audio files for various respiratory diseases.

Fig 5.

Distribution of respiratory cycle per class.

Fig 6.

Class-wise split of audio segments into train and test sets.

Fig 7.

Proposed methodology.

Fig 8.

Histogram showing the distribution of respiratory cycle durations.

Fig 9.

Padded raw audio segments of all classes used in the study.

Fig 10.

Structure of variational autoencoder.

Fig 11.

Mel spectrograms of various respiratory diseases.

Fig 12.

Overall architecture of MLP-VAE.

Fig 13.

Architecture of CNN-VAE.

Fig 14.

Overall architecture of conditional VAE.

Table 3.

Samples generated by proposed variational autoencoders.

Fig 15.

Procedure for computing MFCCs.

Fig 16.

MFCC for various respiratory classes.

Table 4.

Hyperparameter configuration of the proposed classification models.

Fig 17.

Visual representation of MLP model.

Fig 18.

Visual representation of CNN model.

Fig 19.

Visual representation of RNN-LSTM model.

Fig 20.

Visual representation of RESNET-50 transfer learning model.

Fig 21.

Visual representation of EFFICIENT NET B0 transfer learning model.

Fig 22.

Computation of FAD.

Fig 23.

FAD of synthetic samples w.r.t real samples for minority classes.

Table 5.

FAD of synthetic samples of minority classes w.r.t real samples.

Fig 24.

Principal components of MFCCs of synthetic (MLP-VAE) and real samples of minority classes.

Fig 25.

Principal components of MFCCs of synthetic (CNN-VAE) and real samples of minority classes.

Fig 26.

Principal components of MFCCs of synthetic (Conditional VAE) and real samples of minority classes.

Fig 27.

Correlation heatmap between sampled synthetic (MLP-VAE) and real audio segments for all minority classes.

Fig 28.

Correlation heatmap between sampled synthetic (CNN-VAE) and real audio segments for all minority classes.

Fig 29.

Correlation heatmap between sampled synthetic (Conditional-VAE) and real audio segments for all minority classes.

Table 6.

Cross-correlation between sampled synthetic and real audio segments for each class.

Fig 30.

Mean Mel Cepstral Distortion between the mel cepstra of the synthetic and real audio samples for all classes.

Fig 31.

Confusion matrix.

Fig 32.

Class-wise comparison of F1 scores achieved by the classifiers with different training sets.

Table 7.

Impact of VAE augmentation on the performance of classification models.

Fig 33.

Confusion matrices for ANN classifier with imbalanced and augmented training sets.

Fig 34.

Confusion matrices for CNN classifier with imbalanced and augmented training sets.

Fig 35.

Confusion matrices for LSTM classifier with imbalanced and augmented training sets.

Fig 36.

Confusion matrices for RESNET-50 classifier with imbalanced and augmented training sets.

Fig 37.

Confusion matrices for Efficient Net B0 classifier with imbalanced and augmented training sets.

Table 8.

Statistical significance of performance metrics achieved by various classifiers with imbalanced and augmented training sets.

Fig 38.

Comparative summary of recent works undertaken towards respiratory sounds classification.

Table 9.

Comparison of our results with recent works undertaken towards multi-class respiratory disease classification.
