Predicted meta-omics: A potential solution to multi-omics data scarcity in microbiome studies

doi:10.1371/journal.pone.0345919

Fig 1.

Overview of our experimental set-up.

(A) We perform a pre-evaluation of MelonnPan [16] on three paired metagenomics and metabolomics datasets (S1 Table), comparing two filtering approaches for microbial features. In our main experimental pipeline, we use pre-processed gut microbiome data (B) to train regression models as meta-omics predictors (C), and subsequently evaluate these predictions for the downstream task of IBD prediction (D). Abbreviations: IBD (inflammatory bowel disease), ESRD (end-stage renal disease), IBDMDB (The Inflammatory Bowel Disease Multi’omics Database), mGx (metagenomics), mTx (metatranscriptomics), mPx (metaproteomics), mBx (metabolomics), ECs (enzyme commission numbers), LC-MS (liquid chromatography-mass spectrometry), CLR (centered log-ratio).
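The centered log-ratio (CLR) step named in panel (B) can be illustrated with a minimal sketch. It assumes that one sample is a list of non-negative abundances and adds a small pseudocount so that zeros can be log-transformed. The function name `clr` and the pseudocount of 1e-6 are illustrative assumptions, not the paper's pre-processing code.

```python
import math


def clr(abundances, pseudocount=1e-6):
    """Centered log-ratio transform of one sample's feature abundances.

    Each value becomes log(x / geometric_mean(x)). The pseudocount is an
    assumed choice that lets zero abundances be log-transformed.
    """
    shifted = [a + pseudocount for a in abundances]
    logs = [math.log(x) for x in shifted]
    mean_log = sum(logs) / len(logs)  # log of the geometric mean
    return [value - mean_log for value in logs]


if __name__ == "__main__":
    sample = [10.0, 5.0, 0.0, 85.0]  # hypothetical counts for four features
    print([round(v, 3) for v in clr(sample)])
```

Because every value is centered on the sample's log geometric mean, the transformed values of a sample always sum to zero. This removes the arbitrary total that compositional sequencing data carry.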

Fig 2.

Comparison of meta-omics predictors.

(A) Mean test performance of cross-omics regression models on several prediction tasks, averaged across 10 different dataset partitions. Performance is the average Spearman’s rank correlation coefficient over the 50 best-predicted features of each output type. (B) For metatranscriptomics (mTx) and metaproteomics (mPx) predictions generated by MelonnPan [16], we also plot kernel density estimates that compare two sets of correlations: those between the input data and the ground-truth mTx/mPx data, and those between the predicted data and the ground-truth data. We perform this analysis both on the 50 best-predicted features of each output type and on all predicted features. Input types are distinguished by color, and cross-omics models by color intensity. Abbreviations: neural network (NN), metagenomics (mGx), metatranscriptomics (mTx), metaproteomics (mPx), metabolomics (mBx).
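The summary score in panel (A) can be sketched as follows. First, compute Spearman’s correlation between predicted and ground-truth values for each output feature on one test partition. Then keep the 50 highest correlations and average them. This minimal pure-Python sketch makes two assumptions. It ranks the "best predicted" features by that same test-set correlation, and it drops features whose correlation is undefined because their values are constant. All function names are hypothetical.

```python
def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    if sx == 0 or sy == 0:
        return float("nan")  # undefined for constant vectors
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def mean_top_k_spearman(y_true, y_pred, k=50):
    """Mean Spearman correlation over the k best-predicted features.

    y_true, y_pred: hypothetical dicts mapping feature name -> list of
    per-sample values on one test partition.
    """
    scores = [spearman(y_true[f], y_pred[f]) for f in y_true]
    scores = [s for s in scores if s == s]  # drop NaNs (NaN != NaN)
    best = sorted(scores, reverse=True)[:k]
    if not best:
        return float("nan")
    return sum(best) / len(best)
```

Repeating this for each of the 10 partitions and averaging the results would give one value per input-output combination.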

Fig 3.

Performance comparison of multi-omics and single-omics input data using MelonnPan [16].

Results for single-omics input types are shown as vertical colored lines. Model performance on multi-omics data is shown relative to the best single-omics input type in each combination. Each combination is displayed as a two- or three-color diagonally spliced bar whose colors indicate the included data types. Improvements or declines in performance are indicated by arrows and by the size of the bar. Abbreviations: metagenomics (mGx, blue), metatranscriptomics (mTx, pink), metaproteomics (mPx, green), metabolomics (mBx).
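The relative comparison in Fig 3 amounts to subtracting the score of the strongest single-omics member of a combination from the combination's own score. This minimal sketch assumes that scores are mean Spearman correlations and that "relative" means a signed difference rather than a ratio. The function name is hypothetical.

```python
def compare_to_best_single(combination, single_scores, combo_score):
    """Compare a multi-omics score with its best single-omics component.

    combination: tuple of input types, e.g. ("mGx", "mTx")
    single_scores: dict mapping input type -> score when used alone
    Returns (best single type, signed difference, "up"/"down"/"equal").
    """
    best_type = max(combination, key=lambda t: single_scores[t])
    delta = combo_score - single_scores[best_type]
    direction = "up" if delta > 0 else "down" if delta < 0 else "equal"
    return best_type, delta, direction
```

The sign of the difference corresponds to the arrow direction, and its magnitude corresponds to the bar size.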

Fig 4.

Analysis of well-predicted features.

(A) Jaccard similarities between the sets of the 25% of features best predicted by MelonnPan [16] for each output type (mTx, mPx, mBx), compared across 10 different train/test partitions. All predictions were generated from mGx data. (B) Jaccard similarities and Venn diagram of the sets of the 25% best-predicted proteins, compared across input types. (C) Jaccard similarities and Venn diagram of the sets of the 25% best-predicted metabolites, compared across input types. (D) Performance of a deep neural network model (S3 Note) trained on feature subsets of different sizes (all features, 50%, 25%, and 10%), with features selected in a pre-training step (S3 Note and S4 Fig). The best result for each input-output combination is highlighted. Abbreviations: metagenomics (mGx), metatranscriptomics (mTx), metaproteomics (mPx), metabolomics (mBx).
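The stability analysis in panels (A) to (C) compares sets of top-ranked features using the Jaccard index, |A ∩ B| / |A ∪ B|. This minimal sketch makes three assumptions: each partition has a per-feature score (for example, test-set Spearman correlation), the 25% cutoff is rounded down, and similarities are averaged over all pairs of partitions. The caption does not specify how the 10 partitions are aggregated, and the function names are hypothetical.

```python
from itertools import combinations


def top_fraction(scores, fraction=0.25):
    """Return the names of the top `fraction` of features by score."""
    k = max(1, int(len(scores) * fraction))  # assumed: round the cutoff down
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:k])


def jaccard(a, b):
    """Jaccard index |a & b| / |a | b| (1.0 for two empty sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def mean_pairwise_jaccard(score_dicts, fraction=0.25):
    """Average Jaccard similarity of top-feature sets over all partition pairs."""
    if len(score_dicts) < 2:
        raise ValueError("need scores from at least two partitions")
    sets = [top_fraction(s, fraction) for s in score_dicts]
    pairs = list(combinations(sets, 2))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)
```

The same `jaccard` function applies to panels (B) and (C). In those panels, the sets come from different input types rather than different partitions.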

Fig 5.

Accuracy of random forest classifiers on the binary task of inflammatory bowel disease prediction, averaged across 10 test partitions.

Accuracy is reported as balanced accuracy. Within each group of bars, from left to right, we show the performance of classifiers using the input meta-omics data, the predicted meta-omics data (striped; generated with MelonnPan [16]), and the ground-truth meta-omics data corresponding to the predictions. Gray boxes indicate the performance of random guessing, obtained from Dummy classifiers based on stratified sampling. Abbreviations: metagenomics (mGx), metatranscriptomics (mTx), metaproteomics (mPx), metabolomics (mBx).
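Balanced accuracy is the mean of the per-class recalls. On a binary task, a classifier that always predicts the majority class therefore scores 0.5, regardless of the majority-class fraction. The sketch below computes this metric and a stratified random baseline in pure Python. scikit-learn provides equivalents (`balanced_accuracy_score` and `DummyClassifier(strategy="stratified")`), but whether the authors used them is an assumption. The function names and labels here are hypothetical.

```python
import random
from collections import Counter


def balanced_accuracy(y_true, y_pred):
    """Mean per-class recall over the classes present in y_true."""
    recalls = []
    for cls in set(y_true):
        idx = [i for i, y in enumerate(y_true) if y == cls]
        hits = sum(1 for i in idx if y_pred[i] == cls)
        recalls.append(hits / len(idx))
    return sum(recalls) / len(recalls)


def stratified_dummy_predict(y_train, n, seed=0):
    """Random labels drawn with the class frequencies of the training set."""
    rng = random.Random(seed)
    counts = Counter(y_train)
    classes = list(counts)
    weights = [counts[c] for c in classes]
    return rng.choices(classes, weights=weights, k=n)


if __name__ == "__main__":
    y_train = ["IBD", "IBD", "nonIBD"]  # hypothetical labels
    y_test = ["IBD", "nonIBD", "IBD", "nonIBD"]
    baseline = stratified_dummy_predict(y_train, len(y_test))
    print(balanced_accuracy(y_test, baseline))
```

Because the stratified baseline ignores the features, its expected balanced accuracy on a binary task is 0.5. This makes it a natural reference level for the gray boxes.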
