Fig 1.
Summary of techniques used in automated respiratory sounds auscultation.
Table 1.
Literature review of classification models proposed for lung sound auscultation.
Table 2.
A literature review of data augmentation techniques for audio classification.
Fig 2.
Distribution of crackles and wheezes in the respiratory cycle.
Fig 3.
Patient-wise diagnosis in the ICBHI dataset.
Fig 4.
Count of audio files for various respiratory diseases.
Fig 5.
Distribution of respiratory cycles per class.
Fig 6.
Class-wise split of audio segments into train and test sets.
Fig 7.
Proposed methodology.
Fig 8.
Histogram showing the distribution of respiratory cycle durations.
Fig 9.
Padded raw audio segments of all classes used in the study.
Fig 10.
Structure of variational autoencoder.
Fig 11.
Mel spectrograms of various respiratory diseases.
Fig 12.
Overall architecture of MLP-VAE.
Fig 13.
Architecture of CNN-VAE.
Fig 14.
Overall architecture of conditional VAE.
Table 3.
Samples generated by proposed variational autoencoders.
Fig 15.
Procedure for computing MFCCs.
Fig 16.
MFCC for various respiratory classes.
Table 4.
Hyperparameter configuration of the proposed classification models.
Fig 17.
Visual representation of MLP model.
Fig 18.
Visual representation of CNN model.
Fig 19.
Visual representation of RNN-LSTM model.
Fig 20.
Visual representation of ResNet-50 transfer learning model.
Fig 21.
Visual representation of EfficientNet-B0 transfer learning model.
Fig 22.
Computation of FAD.
Fig 23.
FAD of synthetic samples with respect to real samples for minority classes.
Table 5.
FAD of synthetic samples of minority classes with respect to real samples.
Fig 24.
Principal components of MFCCs of synthetic (MLP-VAE) and real samples of minority classes.
Fig 25.
Principal components of MFCCs of synthetic (CNN-VAE) and real samples of minority classes.
Fig 26.
Principal components of MFCCs of synthetic (Conditional VAE) and real samples of minority classes.
Fig 27.
Correlation heatmap between sampled synthetic (MLP-VAE) and real audio segments for all minority classes.
Fig 28.
Correlation heatmap between sampled synthetic (CNN-VAE) and real audio segments for all minority classes.
Fig 29.
Correlation heatmap between sampled synthetic (Conditional VAE) and real audio segments for all minority classes.
Table 6.
Cross-correlation between sampled synthetic and real audio segments for each class.
Fig 30.
Mean Mel Cepstral Distortion between the mel cepstra of the synthetic and real audio samples for all classes.
Fig 31.
Confusion matrix.
Fig 32.
Class-wise comparison of F1 scores achieved by the classifiers with different training sets.
Table 7.
Impact of VAE augmentation on the performance of classification models.
Fig 33.
Confusion matrices for ANN classifier with imbalanced and augmented training sets.
Fig 34.
Confusion matrices for CNN classifier with imbalanced and augmented training sets.
Fig 35.
Confusion matrices for LSTM classifier with imbalanced and augmented training sets.
Fig 36.
Confusion matrices for ResNet-50 classifier with imbalanced and augmented training sets.
Fig 37.
Confusion matrices for EfficientNet-B0 classifier with imbalanced and augmented training sets.
Table 8.
Statistical significance of performance metrics achieved by various classifiers with imbalanced and augmented training sets.
Fig 38.
Comparative summary of recent works undertaken towards respiratory sounds classification.
Table 9.
Comparison of our results with recent works undertaken towards multi-class respiratory disease classification.