Multi-omics data integration reveals metabolome as the top predictor of the cervicovaginal microenvironment

doi:10.1371/journal.pcbi.1009876

Fig 1.

Schematic of a multi-omics approach to study the complex interplay between HPV, host and microbiota in women across cervical neoplasia.

In this multicenter study, n = 72 women were enrolled with invasive cervical carcinoma (ICC) or high- or low-grade squamous intraepithelial lesions (HSIL, LSIL), as well as HPV-positive and healthy HPV-negative controls (Ctrl). Two vaginal swabs and a cervicovaginal lavage (CVL) were collected from each participant. Vaginal swabs were used for microbiome analysis and to evaluate vaginal pH. CVL samples were used for metabolome and immunoproteome analyses. The vaginal microbiota composition was determined by 16S rRNA gene sequencing, which revealed 763 amplicon sequence variants (ASVs). Cervicovaginal metabolic fingerprints in CVL samples were profiled by liquid chromatography-mass spectrometry, which identified 467 unique metabolites. Levels of immune mediators and other cancer-related proteins in CVL samples (68 targets) were evaluated using multiplex cytometric bead arrays. Principal component, hierarchical clustering, neural network (mmvec) and Random Forest analyses were used to explore associations among the multi-omics datasets and to predict Lactobacillus dominance (dominant vs. non-dominant), vaginal pH (low ≤5 vs. high >5), evidence of genital inflammation (high, low, none) and disease status (Ctrl HPV–, Ctrl HPV+, LSIL, HSIL, ICC).

Fig 2.

Metabolome features cluster most significantly according to patient covariate groups.

A-D. Principal coordinate analysis (PCoA) of the Jaccard distances calculated from microbiome samples. Differences among groups were tested for significance using PERMANOVA on the distance matrices. E-L. For metabolome (E-H) and immunoproteome (I-L) features, principal component analysis (PCA) was performed on log-transformed and scaled features (zero mean and unit variance). Differences among groups were assessed using a multivariate analysis of variance (MANOVA) model on the first two principal components.
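
As a rough illustration of the two preprocessing ideas named in this caption, the sketch below computes a Jaccard distance from ASV presence/absence and standardizes log-transformed feature values. It is not the authors' pipeline: the function names, ASV labels and numbers are hypothetical, and the use of the natural log and the population standard deviation are assumptions.

```python
import math
import statistics


def jaccard_distance(sample_a, sample_b):
    """Jaccard distance based on which ASVs are present (count > 0) in each sample."""
    present_a = {asv for asv, count in sample_a.items() if count > 0}
    present_b = {asv for asv, count in sample_b.items() if count > 0}
    union = present_a | present_b
    if not union:
        return 0.0  # convention chosen here: two empty samples are identical
    return 1.0 - len(present_a & present_b) / len(union)


def log_scale(values):
    """Log-transform positive values, then scale to zero mean and unit variance.

    Assumes natural log and population standard deviation; the values must be
    positive and not all equal.
    """
    logs = [math.log(v) for v in values]
    mean = statistics.fmean(logs)
    sd = statistics.pstdev(logs)
    return [(x - mean) / sd for x in logs]


# Hypothetical ASV counts for two samples (invented for illustration).
sample_1 = {"ASV_1": 120, "ASV_2": 30, "ASV_3": 0}
sample_2 = {"ASV_1": 5, "ASV_3": 40, "ASV_4": 2}
distance = jaccard_distance(sample_1, sample_2)

# Hypothetical intensities of one metabolite across three samples.
scaled = log_scale([1.0, 10.0, 100.0])
```

Because the Jaccard distance ignores abundance, two samples that share the same ASVs at very different proportions still have a distance of 0.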

Fig 3.

Microbiome-metabolome interaction probabilities via mmvec predict strong associations between lipid metabolites with Prevotella, Streptococcus, Atopobium, Sneathia and other clades.

A. The principal component analysis (PCA) biplot displays the top correlations, colored by genus (for microbial features) or by super pathway (for metabolite features). The correlations were tested using mmvec. This method uses neural networks for estimating microbe-metabolite interactions through their co-occurrence probabilities. Microbes (points) and metabolites (arrows) that appear closer to each other in the biplot have a higher likelihood of co-occurring. B. The heatmap depicts the correlation coefficients between ASVs and metabolites; hierarchical clustering was done via average weighted Bray-Curtis distance. ASVs were determined using the consensus taxonomy (see Materials and Methods section).

Fig 4.

Metabolites (particularly xenobiotics, carbohydrates, amino acids and peptides) and the inflammatory cytokine MIF can accurately predict Lactobacillus dominance.

Integrated vaginal metabolome and immunoproteome profiles were used as predictive features for training cross-validated Random Forest classifiers to predict whether a subject’s vaginal microbiota is Lactobacillus dominant (LD; Lactobacillus ASVs account for ≥ 80% of relative abundance) or non-LD (NLD; lactobacilli account for < 80% of relative abundance). The combined measurements predict Lactobacillus dominance with an overall accuracy of 86.1%, a 1.6-fold improvement over baseline accuracy. Receiver operating characteristic (ROC) analysis shows true and false positive rates for each group, indicating excellent predictive accuracy for both the LD (AUC = 0.93) and NLD (AUC = 0.93) groups (A). The confusion matrix illustrates the proportion of times each sample receives the correct classification when evaluating the classifier at a threshold of 0.5 (B). The graphs depict the 25 most strongly predictive features ranked by their mean Gini importance score across all 10 trained classifiers, a measure of their overall contribution to classifier accuracy (C).
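
The LD/NLD label described above can be sketched as a simple threshold on the summed relative abundance of Lactobacillus ASVs. This is a minimal illustration under stated assumptions: the function names, the ASV-to-genus mapping and the read counts are hypothetical, and only the 80% cutoff comes from the caption.

```python
LD_THRESHOLD = 0.80  # from the caption: LD means >= 80% Lactobacillus relative abundance


def lactobacillus_fraction(asv_counts, asv_genus):
    """Fraction of a sample's reads assigned to ASVs of the genus Lactobacillus."""
    total = sum(asv_counts.values())
    if total == 0:
        raise ValueError("sample has no reads")
    lacto = sum(
        count
        for asv, count in asv_counts.items()
        if asv_genus.get(asv) == "Lactobacillus"
    )
    return lacto / total


def dominance_label(asv_counts, asv_genus):
    """Return 'LD' at or above the threshold, otherwise 'NLD'."""
    fraction = lactobacillus_fraction(asv_counts, asv_genus)
    return "LD" if fraction >= LD_THRESHOLD else "NLD"


# Hypothetical taxonomy and read counts (invented for illustration).
genus_of = {"ASV_1": "Lactobacillus", "ASV_2": "Prevotella", "ASV_3": "Lactobacillus"}
label_a = dominance_label({"ASV_1": 850, "ASV_2": 100, "ASV_3": 50}, genus_of)
label_b = dominance_label({"ASV_1": 300, "ASV_2": 700}, genus_of)
```

Here `label_a` is derived from a sample with 90% Lactobacillus reads and `label_b` from one with 30%, so they fall on opposite sides of the cutoff.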

Fig 5.

Metabolites (particularly amino acids, peptides and nucleotides) and the inflammatory cytokine MIF are the best predictors of vaginal pH.

Integrated vaginal microbiome, metabolome, and immunoproteome profiles were used as predictive features for training cross-validated Random Forest classifiers to predict whether a subject’s vaginal pH was low (≤ 5.0) or high (> 5.0). The combined measurements predict vaginal pH with an overall accuracy of 77.8%, a 1.5-fold improvement over baseline accuracy. Receiver operating characteristic (ROC) analysis shows true and false positive rates for each group, indicating weak predictive accuracy (micro-average AUC = 0.72) for both the low (AUC = 0.71) and high (AUC = 0.71) pH groups (A). The confusion matrix illustrates the proportion of times each sample receives the correct classification when evaluating the classifier at a threshold of 0.5 (B). The graphs depict the 25 most strongly predictive features ranked by their mean Gini importance score across all 10 trained classifiers, a measure of their overall contribution to classifier accuracy (C).
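
To make the AUC values reported in these captions more concrete, the sketch below computes a binary AUC as the probability that a randomly chosen positive sample receives a higher classifier score than a randomly chosen negative one. It is an illustrative pure-Python version, not the software used in the study; the function name, labels and scores are hypothetical.

```python
def roc_auc(labels, scores):
    """Binary AUC via pairwise comparison of positive and negative scores.

    labels: 1 for the positive class (e.g. high pH), 0 for the negative class.
    scores: predicted probability of the positive class. Ties count as 0.5.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical labels and predicted probabilities (invented for illustration).
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2]
auc = roc_auc(labels, scores)
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is the scale on which the values in the caption can be read.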

Fig 6.

Various metabolites (particularly long-chain fatty acids, sphingolipids and glucose) and protein biomarkers (IL-6, IL-10, MIP-1α) are the best predictors of genital inflammation.

Integrated vaginal microbiome, metabolome, and immunoproteome profiles (excluding the 7 cytokines used to score genital inflammation) were used as predictive features for training cross-validated Random Forest classifiers to predict whether a subject’s genital inflammation score was “no inflammation” (0), low (1–4), or high (≥ 5.0). The combined measurements predict the inflammation score with an overall accuracy of 77.8%, a 1.7-fold improvement over baseline accuracy. Receiver operating characteristic (ROC) analysis shows true and false positive rates for each group, indicating moderate average accuracy (micro-average AUC = 0.90) and weak to good predictive accuracy for each group (A). The confusion matrix illustrates the proportion of times each sample receives the correct classification when evaluating the classifier at a threshold of 0.5 (B). The graphs depict the 25 most strongly predictive features ranked by their mean Gini importance score across all 10 trained classifiers, a measure of their overall contribution to classifier accuracy (C).
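
The ranking in panel C can be pictured as averaging each feature's importance over the trained classifiers and keeping the top entries. The sketch below assumes each classifier's importances are available as a dictionary; the function name, feature names and values are hypothetical placeholders, not results from the study.

```python
from statistics import fmean


def rank_by_mean_importance(importances_per_model, top_n=25):
    """Average per-feature importance across models and return the top_n features.

    importances_per_model: list of {feature_name: importance} dicts, one per
    trained classifier. A feature missing from a model is treated as 0.0.
    """
    features = set().union(*importances_per_model)
    mean_importance = {
        feature: fmean([model.get(feature, 0.0) for model in importances_per_model])
        for feature in features
    }
    # Sort by descending importance; break ties by name for a stable order.
    ranked = sorted(mean_importance.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_n]


# Hypothetical importances from three classifiers (invented for illustration).
models = [
    {"feature_A": 0.5, "feature_B": 0.3, "feature_C": 0.2},
    {"feature_A": 0.4, "feature_B": 0.4, "feature_C": 0.2},
    {"feature_A": 0.6, "feature_B": 0.1, "feature_C": 0.3},
]
top_two = rank_by_mean_importance(models, top_n=2)
```

Averaging across classifiers smooths out the fold-to-fold variation that any single Random Forest's importance scores would show.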

Fig 7.

Integrating multiple omics datasets does not dramatically improve overall prediction accuracy; however, different combinations of measurements are needed for the best prediction of distinct features.

Graphs show stepwise accuracy levels for Lactobacillus dominance (A), vaginal pH (B) and genital inflammation (C) when Random Forest models are trained on a single omics dataset or on combined data containing 2–3 omics datasets. Lactobacillus dominance can be explained mostly by metabolome data, vaginal pH by metabolome and microbiome datasets, and genital inflammation by metabolome and immunoproteome datasets. Combining omics datasets leads to higher average accuracy scores for vaginal pH and genital inflammation classifications, but not for Lactobacillus dominance classification.
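
The stepwise comparison described above amounts to training a model on every single dataset and on every combination of datasets. The sketch below only enumerates those combinations and merges the matching feature tables for one sample; the names and values are hypothetical, and model training itself is left out.

```python
from itertools import combinations

OMICS = ("microbiome", "metabolome", "immunoproteome")


def omics_combinations(datasets=OMICS):
    """All non-empty combinations of the given datasets, smallest first."""
    return [
        combo
        for size in range(1, len(datasets) + 1)
        for combo in combinations(datasets, size)
    ]


def merge_features(tables, combo):
    """Merge one sample's feature dicts for the chosen datasets.

    Feature names are prefixed with the dataset name to avoid collisions.
    """
    merged = {}
    for name in combo:
        for feature, value in tables[name].items():
            merged[f"{name}:{feature}"] = value
    return merged


# Hypothetical features for a single sample (invented for illustration).
one_sample = {
    "microbiome": {"ASV_1": 0.7},
    "metabolome": {"metabolite_1": 1.2},
    "immunoproteome": {"protein_1": 3.4},
}
all_combos = omics_combinations()
merged = merge_features(one_sample, ("metabolome", "immunoproteome"))
```

With three datasets this yields seven feature sets to compare; when only metabolome and immunoproteome profiles are used as predictors, as for Lactobacillus dominance, it yields three.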
