A unified framework for unconstrained and constrained ordination of microbiome read count data

doi:10.1371/journal.pone.0205474

Fig 1.

Unconstrained ordination methods.

(A): Principal coordinates analysis (PCoA) sample ordination with Bray-Curtis dissimilarity on relative abundances of the Turnbaugh mice dataset. Taxon scores were added as weighted sample scores. Coloured symbols represent mice; percentages on the axes give each eigenvalue as a fraction of the sum of all eigenvalues. Only the six taxa with taxon scores furthest from the origin are plotted. (B): Biplot of the unconstrained RC(M) ordination of the same dataset. Arrows represent taxa; the ratios of the ψ parameters reflect the relative importance of the corresponding dimensions. Only the six taxa with the strongest departure from homogeneity are shown for clarity. The sample ordination is similar to that of PCoA, but the RC(M) method offers a more principled approach to identifying the taxa that contribute most to the separation of the samples. LF/PP: low-fat, plant polysaccharide-rich.
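The dissimilarity underlying panel (A) can be illustrated with a short Python sketch. This is a minimal illustration and not the authors' code: the function names and the toy count table are hypothetical, and the PCoA eigendecomposition that follows this step is omitted.

```python
def relative_abundances(counts):
    """Divide one sample's read counts by its library size (the row sum)."""
    total = sum(counts)
    if total == 0:
        raise ValueError("sample has zero reads")
    return [c / total for c in counts]


def bray_curtis(u, v):
    """Bray-Curtis dissimilarity: sum_j |u_j - v_j| / sum_j (u_j + v_j)."""
    numerator = sum(abs(a - b) for a, b in zip(u, v))
    denominator = sum(a + b for a, b in zip(u, v))
    return numerator / denominator


def dissimilarity_matrix(count_table):
    """Pairwise Bray-Curtis dissimilarities on relative abundances."""
    props = [relative_abundances(row) for row in count_table]
    n = len(props)
    return [[bray_curtis(props[i], props[j]) for j in range(n)]
            for i in range(n)]


# Hypothetical toy data: rows are samples, columns are taxa.
toy_counts = [
    [10, 0, 30],
    [20, 0, 60],  # same composition as the first sample, twice the depth
    [0, 40, 0],   # shares no taxa with the first two samples
]
D = dissimilarity_matrix(toy_counts)
```

Because the counts are converted to relative abundances first, the two samples with identical composition but different library sizes are at dissimilarity zero, while samples that share no taxa are at the maximum of one.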


Fig 2.

Constrained ordination methods.

(A): Triplot of canonical correspondence analysis (CCA) of the Zeller data. Dots represent samples; the taxon labels indicate the location of the peaks of the taxon response functions under strict assumptions. For clarity, only the eight taxa with peaks furthest from the origin are shown. Percentages along the axes indicate the fractions of total inertia (departure from sample homogeneity) explained by each dimension. Arrows depict the contribution of the variables to the environmental gradient. (B): Triplot of the constrained ordination of the same dataset by the RC(M) method with linear response functions. Arrows represent taxon response functions, and labels represent the variables constituting the environmental gradient. The ratio of the ψ parameters reflects the relative importance of the corresponding dimensions. Only the eight taxa that react most strongly to the environmental gradients (the longest arrows) are shown. Two Fusobacterium species are among the taxa most sensitive to the environmental gradient and are more abundant in cancer patients than in the other samples, in accordance with the findings of [11].


Fig 3.

Effect of conditioning on unconstrained RC(M) ordination.

(A): Unconstrained RC(M) sample ordination of the anterior nares samples of the HMP dataset without conditioning. (B): Ordination of the same samples, but after conditioning on the main sequencing center (Washington University genome center (WUGC), J. Craig Venter Institute (JCVI), Baylor College of Medicine (BCM) and Broad Institute (BI)). The ratio of the ψ parameters reflects the relative importance of the corresponding dimensions.


Fig 4.

Diagnostic plots for the constrained RC(M) model with linear response functions on the Zeller data.

(A): Triplot with samples coloured by deviance. No clusters of samples with high deviance are visible; such clusters would have pointed to a group of poorly fit samples. (B): Residual plot as a function of the first environmental gradient. A clear increase in positive deviance residuals is visible towards positive environmental scores, which points to a violation of the linearity assumption. (C): Triplot with samples coloured by their influence on the parameter for the “Cancer” level of the diagnosis variable. On the right side of the plot, one sample with a strong negative and one with a strong positive influence on the parameter estimate are visible. These samples may deserve further scrutiny.


Fig 5.

RC(M) ordination with nonparametric response functions.

One-dimensional triplot of the first dimension of the constrained RC(M) ordination with non-parametrically estimated response functions of the Zeller data. Coloured lines represent taxon response functions. The horizontal dotted line represents the expected taxon abundances under sample homogeneity. Only the eight taxa that react most strongly to changes in the environmental score are shown for clarity. Black labels show the variables constituting the gradient and vertical dashes at the bottom represent the sample scores. The horizontal positions of the variable labels with respect to the vertical dashed line at zero indicate how much they contribute to the environmental gradient; the vertical stacking is only for readability.


Fig 6.

Results of simulations without signal.

Boxplots of the pseudo-F statistic for sample clustering (y-axis) for several ordination methods (x-axis) for 100 parametric simulation runs. All samples have the same mean taxon composition, but four groups of samples differ in mean library sizes or mean dispersions. See Section Competitor ordination methods for the meaning of the abbreviations. As clustering according to library size or dispersion is undesirable, a small pseudo-F value is preferred. Top: Four groups with differences in library sizes. Bottom: Four groups with differences in dispersions. See S1 Appendix for details.
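To make the clustering measure concrete, the following Python sketch computes a pseudo-F statistic on ordination scores as the ratio of between-group to within-group mean squares in Euclidean space. It is a sketch under that assumption only: the exact definition used for the figure is given in S1 Appendix and may differ, and the function name and toy scores are hypothetical.

```python
def pseudo_f(scores, groups):
    """Pseudo-F = (SS_between / (k - 1)) / (SS_within / (n - k)).

    scores: list of sample coordinates in the ordination (equal length).
    groups: list of group labels, one per sample.
    """
    n = len(scores)
    labels = sorted(set(groups))
    k = len(labels)
    dim = len(scores[0])

    def centroid(points):
        return [sum(p[d] for p in points) / len(points) for d in range(dim)]

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    overall = centroid(scores)
    ss_between = 0.0
    ss_within = 0.0
    for g in labels:
        members = [s for s, lab in zip(scores, groups) if lab == g]
        c = centroid(members)
        ss_between += len(members) * sq_dist(c, overall)
        ss_within += sum(sq_dist(m, c) for m in members)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical two-dimensional sample scores forming two tight clouds.
toy_scores = [(0, 0), (0, 1), (1, 0), (1, 1),
              (10, 10), (10, 11), (11, 10), (11, 11)]
clustered = ["a", "a", "a", "a", "b", "b", "b", "b"]  # labels follow the clouds
unrelated = ["a", "b", "a", "b", "a", "b", "a", "b"]  # labels ignore the clouds

f_clustered = pseudo_f(toy_scores, clustered)
f_unrelated = pseudo_f(toy_scores, unrelated)
```

When the group labels coincide with the two clouds the statistic is large; when the labels are unrelated to the sample positions it stays below one. In the no-signal setting of this figure, the latter behaviour is the desirable one.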


Fig 7.

Results of biological signal simulations.

Boxplots of the silhouette (top), pseudo-F statistic (center) and taxon ratio (bottom) for several ordination methods (x-axis) over 100 parametric simulation runs. See Section Competitor ordination methods for the meaning of the abbreviations. 10% of the taxa were made differentially abundant in each of 4 sample groups, with a fold change of 5. As there are true differences in composition between the groups, a large pseudo-F value is preferred. Columns correspond to the simulation scenario: negative binomial (NB) (cor: data generation with taxon correlation, phy: phylogenetically correlated taxa were made differentially abundant), Dirichlet multinomial (DM) and zero-inflated negative binomial (ZINB). See S1 Appendix for details.
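The silhouette reported in the top row can be sketched in Python as follows. This is the standard mean silhouette width computed from Euclidean distances between sample scores, assumed here for illustration; the paper's exact computation is described in S1 Appendix, and the function name and toy data are hypothetical.

```python
import math


def mean_silhouette(scores, groups):
    """Mean silhouette width over all samples.

    For sample i, a is the mean distance to the other members of its own
    group and b is the smallest mean distance to any other group; the
    silhouette width is (b - a) / max(a, b).
    """
    n = len(scores)
    labels = set(groups)
    widths = []
    for i in range(n):
        own = [math.dist(scores[i], scores[j])
               for j in range(n) if j != i and groups[j] == groups[i]]
        if not own:
            widths.append(0.0)  # convention for single-member groups
            continue
        a = sum(own) / len(own)
        b = min(
            sum(math.dist(scores[i], scores[j])
                for j in range(n) if groups[j] == g) / groups.count(g)
            for g in labels if g != groups[i]
        )
        largest = max(a, b)
        widths.append((b - a) / largest if largest > 0 else 0.0)
    return sum(widths) / n


# Hypothetical two-dimensional sample scores forming two tight clouds.
toy_scores = [(0, 0), (0, 1), (1, 0), (1, 1),
              (10, 10), (10, 11), (11, 10), (11, 11)]
clustered = ["a", "a", "a", "a", "b", "b", "b", "b"]  # labels follow the clouds
unrelated = ["a", "b", "a", "b", "a", "b", "a", "b"]  # labels ignore the clouds

sil_clustered = mean_silhouette(toy_scores, clustered)
sil_unrelated = mean_silhouette(toy_scores, unrelated)
```

Values close to one indicate that samples lie much nearer to their own group than to any other, which is the preferred outcome when the groups truly differ in composition; values near or below zero indicate no separation.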
