Fig 1.
The proposed deep learning framework. Here, n denotes the number of time points for a subject, which varies across subjects.
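Because n differs across subjects, the per-subject sequences cannot be stacked into a single array as they are. The caption does not state how the framework batches them. The Python sketch below assumes right-padding to the longest sequence plus a 0/1 mask, a common approach for recurrent models. `pad_and_mask` and the toy batch are hypothetical.

```python
from typing import List, Tuple

Sequence = List[List[float]]  # one subject: n time points x d features


def pad_and_mask(subjects: List[Sequence], pad_value: float = 0.0) -> Tuple[List[Sequence], List[List[int]]]:
    """Right-pad each subject's sequence to the longest n in the batch (hypothetical helper).

    Returns the padded sequences and a 0/1 mask marking real (unpadded) time points.
    """
    max_n = max(len(seq) for seq in subjects)
    d = len(subjects[0][0])  # assumes every time point has the same number of features
    padded, mask = [], []
    for seq in subjects:
        n = len(seq)
        padded.append([list(t) for t in seq] + [[pad_value] * d for _ in range(max_n - n)])
        mask.append([1] * n + [0] * (max_n - n))
    return padded, mask


# Toy batch: subject A has n = 3 time points, subject B has n = 1 (d = 2 features each)
batch = [[[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]], [[0.7, 0.8]]]
padded, mask = pad_and_mask(batch)
print(mask)  # [[1, 1, 1], [1, 0, 0]]
```

A recurrent model would then ignore masked positions, so that padding does not affect the prediction for a subject.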
Fig 2.
Architecture of the autoencoder used in the proposed model.
It has three hidden layers with 60, 25, and 60 neurons, respectively. The output layer has one neuron per raw feature (215 in our case).
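To make the layer widths concrete, the following pure-Python sketch builds a fully connected 215-60-25-60-215 network, counts its trainable parameters, and runs one forward pass. The caption does not specify activations, initialization, or even that all layers are fully connected. ReLU hidden layers, a linear output layer, a 215-unit input, and uniform random initialization are assumptions, and all function names are hypothetical.

```python
import math
import random
from typing import Callable, List, Tuple

# Widths from the caption: 215 raw features -> 60 -> 25 -> 60 -> 215.
# Assumption: the input layer also has 215 units, since the network reconstructs its input.
LAYER_SIZES = [215, 60, 25, 60, 215]
N_ENCODER_LAYERS = 2  # 215 -> 60 -> 25; the 25-unit output is the latent code

Layer = Tuple[List[List[float]], List[float]]  # (weights as n_out rows of n_in, biases)


def count_parameters(sizes: List[int]) -> int:
    """Weights plus biases of a fully connected network with the given widths."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes, sizes[1:]))


def relu(v: float) -> float:
    return max(0.0, v)


def identity(v: float) -> float:
    return v


def dense(x: List[float], layer: Layer, act: Callable[[float], float]) -> List[float]:
    """One fully connected layer: act(W x + b)."""
    weights, bias = layer
    return [act(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(weights, bias)]


def init_layers(sizes: List[int], seed: int = 0) -> List[Layer]:
    """Uniform random weights scaled by 1/sqrt(fan_in) and zero biases (an assumption)."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(sizes, sizes[1:]):
        scale = 1.0 / math.sqrt(n_in)
        weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
        layers.append((weights, [0.0] * n_out))
    return layers


def forward(x: List[float], layers: List[Layer]) -> Tuple[List[float], List[float]]:
    """Return (latent code, reconstruction); ReLU hidden layers, linear output (assumed)."""
    h = x
    for layer in layers[:N_ENCODER_LAYERS]:
        h = dense(h, layer, relu)
    latent = h
    decoder = layers[N_ENCODER_LAYERS:]
    for i, layer in enumerate(decoder):
        h = dense(h, layer, identity if i == len(decoder) - 1 else relu)
    return latent, h


layers = init_layers(LAYER_SIZES)
rng = random.Random(1)
x = [rng.random() for _ in range(LAYER_SIZES[0])]  # stand-in for one 215-feature profile
latent, recon = forward(x, layers)
print(count_parameters(LAYER_SIZES))  # 29160
print(len(latent), len(recon))  # 25 215
```

Under these assumptions, the 25-dimensional `latent` vector plays the role of the extracted latent features, and the reconstruction has the same width as the raw input.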
Fig 3.
Summary of the DIABIMMUNE dataset.
(A) Number of subjects allergic to milk, egg, and peanut in the DIABIMMUNE cohort after filtering out missing data. The food-allergy group combines the milk, egg, and peanut allergy groups. (B) Time distribution of the 731 stool samples from 195 subjects (281 samples from 71 Finnish, 197 samples from 70 Estonian, and 253 samples from 54 Russian subjects) sequenced using MGS in the DIABIMMUNE project. The collected samples are inconsistent in several ways; for example, the number of samples differs from subject to subject.
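The counts in panel (B) are internally consistent. Dividing samples by subjects also shows how uneven the sampling is across cohorts. The short check below uses only the numbers given in the caption.

```python
# Panel (B) counts: country -> (number of samples, number of subjects)
COUNTS = {"Finland": (281, 71), "Estonia": (197, 70), "Russia": (253, 54)}

total_samples = sum(samples for samples, _ in COUNTS.values())
total_subjects = sum(subjects for _, subjects in COUNTS.values())
print(total_samples, total_subjects)  # 731 195

# Mean samples per subject differs by cohort, one aspect of the inconsistency noted above
for country, (samples, subjects) in COUNTS.items():
    print(f"{country}: {samples / subjects:.2f} samples per subject")
```

This gives about 3.96, 2.81, and 4.69 samples per subject for the Finnish, Estonian, and Russian cohorts. These means hide the per-subject spread.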
Table 1.
Losses (training, validation, and testing) of the best trained model for several autoencoder architectures.
We chose the 60x25x60 architecture because it achieves a small loss with the fewest parameters.
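The selection criterion (small loss, fewest parameters) can be written as an explicit rule. The sketch below is one hypothetical reading of it. Every architecture whose validation loss falls within a relative tolerance of the best is accepted, and the one with the fewest weights and biases is chosen. The tolerance value and the function names are assumptions, not the authors' procedure.

```python
from typing import Dict, List, Tuple


def count_parameters(sizes: List[int]) -> int:
    """Weights plus biases of a fully connected network with the given layer widths."""
    return sum(a * b + b for a, b in zip(sizes, sizes[1:]))


def select_architecture(val_loss: Dict[Tuple[int, ...], float],
                        n_features: int = 215,
                        tolerance: float = 0.05) -> Tuple[int, ...]:
    """Pick the smallest network whose validation loss is within `tolerance`
    (relative) of the best validation loss among all candidates.

    `val_loss` maps hidden-layer widths, e.g. (60, 25, 60), to validation loss.
    The relative tolerance is a hypothetical stand-in for "small loss".
    """
    best = min(val_loss.values())
    acceptable = [h for h, loss in val_loss.items() if loss <= best * (1 + tolerance)]
    # Among near-best candidates, prefer the one with the fewest parameters
    return min(acceptable, key=lambda h: count_parameters([n_features, *h, n_features]))
```

Setting `tolerance=0.0` reduces the rule to "lowest validation loss wins". Larger tolerances trade a little loss for a smaller model.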
Fig 4.
(A) Training and validation loss (Loss(2)) trajectories of the autoencoder with the 60x25x60 architecture. The lowest validation loss was 16.79; applying this best model to the test set gave a loss of 17.04. (B) Training and validation loss (Loss(1)) trajectories of the LSTM. The lowest validation loss was 0.91; applying this best model to the test set gave a loss of 1.02.
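The reported numbers follow a best-checkpoint protocol. The model state with the lowest validation loss is kept, and only that state is scored on the test set. Below is a minimal sketch, assuming one saved checkpoint per epoch. The helper names are hypothetical.

```python
from typing import Callable, Sequence, Tuple, TypeVar

Model = TypeVar("Model")


def best_checkpoint(val_losses: Sequence[float]) -> int:
    """Index of the epoch with the lowest validation loss (the first one on ties)."""
    return min(range(len(val_losses)), key=lambda i: val_losses[i])


def report_best(checkpoints: Sequence[Model],
                val_losses: Sequence[float],
                test_loss: Callable[[Model], float]) -> Tuple[int, float, float]:
    """Return (epoch, best validation loss, test loss of that same checkpoint).

    The test set is evaluated once, after the checkpoint has been chosen on
    validation loss alone, so test data never influences model selection.
    """
    if len(checkpoints) != len(val_losses):
        raise ValueError("need one validation loss per checkpoint")
    epoch = best_checkpoint(val_losses)
    return epoch, val_losses[epoch], test_loss(checkpoints[epoch])
```

`test_loss` can be any function that scores a saved model on the held-out test data. A slightly higher test loss than the best validation loss, as in both panels, is the expected pattern.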
Table 2.
Evaluation of auROC and MCC for the proposed LSTM model versus baseline models.
We evaluated six classifiers: LSTM, HMM, MLPNN, RF, SVM, and LASSO. For each classifier, we evaluated four types of input features: latent features extracted by the trained autoencoder, 25 features selected by the mRMR method, the 25 most variable features, and the 215 raw taxonomic profile features. The auROC and MCC values shown are averages of the auROC and MCC measured on the test folds. To test the robustness of each classifier, 10-fold cross-validation was repeated 10 times, with the samples reshuffled between repetitions. P-values were calculated using the Mann-Whitney U test, comparing LSTM-mRMR-25 against each corresponding method.
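The metrics and the test named in the caption can be stated compactly. The pure-Python sketch below defines four pieces:

- MCC.
- auROC, computed through its equivalence with the Mann-Whitney U statistic.
- A two-sided Mann-Whitney U p-value under the normal approximation.
- A repeated 10-fold split generator that reshuffles between repetitions.

It is illustrative only. The caption does not give the authors' tie handling, whether p-values are exact or approximate, or whether folds are formed over subjects or samples. All function names are hypothetical.

```python
import math
import random
from typing import Iterator, List, Sequence, Tuple


def mcc(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Matthews correlation coefficient for binary labels in {0, 1}."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0  # 0 by convention when undefined


def mann_whitney_u(x: Sequence[float], y: Sequence[float]) -> float:
    """U statistic for sample x: pairs (xi, yj) with xi > yj, ties counted as 1/2."""
    return sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)


def auroc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """auROC equals U(positive scores, negative scores) / (n_pos * n_neg)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    return mann_whitney_u(pos, neg) / (len(pos) * len(neg))


def mann_whitney_p(x: Sequence[float], y: Sequence[float]) -> float:
    """Two-sided p-value via the normal approximation, without tie or continuity
    correction; an exact test is preferable for very small samples."""
    n1, n2 = len(x), len(y)
    mu = n1 * n2 / 2.0
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (mann_whitney_u(x, y) - mu) / sigma
    return math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))


def repeated_kfold(n_items: int, k: int = 10, repeats: int = 10,
                   seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_idx, test_idx) pairs; indices are reshuffled before each repeat."""
    rng = random.Random(seed)
    idx = list(range(n_items))
    for _ in range(repeats):
        rng.shuffle(idx)
        folds = [idx[f::k] for f in range(k)]  # k disjoint folds covering all items
        for f in range(k):
            test = set(folds[f])
            yield [i for i in idx if i not in test], sorted(test)
```

Under our assumption, the p-value for LSTM-mRMR-25 versus another method would come from passing the two methods' per-repetition auROC (or MCC) values to `mann_whitney_p`.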