Fig 1.
The four stages of the causal inference framework [21] adapted to the exploration of environment-gut microbiome relationships.
Stage 1: Formulate a plausible hypothetical intervention (e.g., decreasing inhaled environmental exposures) to examine its impact on the gut microbiome. Stage 2: Construct a hypothetical paired-randomized experiment in which the environmental intervention has been implemented randomly. Stage 3: Choose powerful test statistics that compare the gut microbiome of the subjects had they been hypothetically randomized to the environmental intervention versus not, and test the sharp null hypotheses of no effect of the intervention at different aggregation levels of the data. Stage 4: Interpret the statistical analyses and make recommendations for future studies or for implementation of the intervention.
Table 1.
Potential outcomes for the subjects of the hypothetical experiment.
Table 2.
Number of units before and after matching.
The thresholds for the air pollution experiment are based on 90th and 10th percentiles of the PM2.5 distribution.
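The percentile-based thresholds can be illustrated with a minimal sketch. It assumes (this is not stated in the caption) that units at or below the 10th percentile form the low-exposure group, units at or above the 90th percentile form the high-exposure group, and the remaining units are set aside before matching. The function name `exposure_groups` and the labels are hypothetical.

```python
import numpy as np

def exposure_groups(pm25, low_pct=10, high_pct=90):
    """Split units into exposure groups using percentile thresholds.

    Assumption for illustration: values <= low threshold are "low",
    values >= high threshold are "high", everything else is "excluded".
    """
    pm25 = np.asarray(pm25, dtype=float)
    low, high = np.percentile(pm25, [low_pct, high_pct])
    labels = np.full(pm25.shape, "excluded", dtype=object)
    labels[pm25 <= low] = "low"
    labels[pm25 >= high] = "high"
    return low, high, labels

# Toy PM2.5 values (not study data).
pm25_toy = np.arange(1, 102, dtype=float)
low_thr, high_thr, groups = exposure_groups(pm25_toy)
```

Only the units labeled "low" or "high" would then enter a matching step.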
Table 3.
Data transformation and choice of test statistics.
Table 4.
Baseline characteristics of the study population in the air pollution reduction experiment (left table) and the smoking prevention experiment (right table).
Continuous variables: mean and standard deviation (St. d.). Categorical variables: number of samples per category (N) and proportion of category (%).
Fig 2.
Boxplots (with median), values of the test statistics from the betta regression [54], and one-sided randomization-based p-values based on 10,000 permutations of the intervention assignment following a matched-pair design. (A) Boxplots of the richness. (B) Boxplots of the α-diversity.
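A one-sided randomization-based p-value under a matched-pair design can be sketched as follows. This is a simplified illustration, not the analysis behind the figure: the test statistic here is the mean within-pair difference (a stand-in for the betta regression statistic), the p-value is the plain proportion of permuted statistics at least as large as the observed one, and the function name `paired_randomization_pvalue` is hypothetical. In a matched-pair design, each permutation swaps the intervention labels within a pair with probability 1/2, which is equivalent to flipping the sign of that pair's difference.

```python
import numpy as np

def paired_randomization_pvalue(treated, control, n_perm=10_000, seed=0):
    """One-sided randomization p-value for a matched-pair design.

    treated[i] and control[i] are the outcomes of the two units in pair i.
    Statistic (assumption for illustration): mean within-pair difference.
    """
    rng = np.random.default_rng(seed)
    diffs = np.asarray(treated, dtype=float) - np.asarray(control, dtype=float)
    observed = diffs.mean()
    # Swapping labels within a pair flips the sign of that pair's difference.
    signs = rng.choice([-1.0, 1.0], size=(n_perm, diffs.size))
    null_stats = (signs * diffs).mean(axis=1)
    p_value = np.mean(null_stats >= observed)
    return observed, float(p_value)

# Toy outcomes (not study data): every treated unit exceeds its matched control.
toy_control = np.arange(20, dtype=float)
toy_treated = toy_control + 5.0
obs_stat, p_one_sided = paired_randomization_pvalue(toy_treated, toy_control)
```

Because the sharp null hypothesis fixes every unit's outcome regardless of assignment, the permuted statistics form the reference distribution without any model assumptions.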
Table 5.
Microbiome Regression-based Kernel Association Test (MiRKAT), unadjusted and adjusted one-sided randomization-based p-values for 10,000 permutations of the intervention assignment following a matched-pair design.
Table 6.
Compositional equivalence test.
Test statistic for high-dimensional data suggested by [56] and one-sided randomization-based p-values for 10,000 permutations of the intervention assignment following a matched-pair design.
Fig 3.
For each genus, adjusted two-sided randomization-based p-values based on 10,000 permutations of the smoking prevention intervention assignment following a matched-pair design. Genera with no tip point belong to the set of reference taxa. Black-circled tip point: differentially abundant genus (Marvinbryantia) in the air pollution reduction experiment.
Fig 4.
Genus-genus associations of smokers and never-smokers (n = 271, p = 140).
(A) Visualization of the genus-genus partial correlations estimated with the SPIEC-EASI method. Edge thickness is proportional to the magnitude of the partial correlation, and edge color indicates its sign (red: negative partial correlation; green: positive partial correlation). Node size is proportional to the centered log-ratio of the genus abundances, and node color indicates phylum. Triangle-shaped nodes are differentially abundant genera (see Fig 3). (B) Zoom on the largest connected component and differential associations (bold genera).
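The edge weights in such a network are partial correlations, which follow from an inverse covariance (precision) matrix Θ through the standard relation ρ_ij = −Θ_ij / sqrt(Θ_ii Θ_jj). The sketch below shows only this conversion step; it does not reproduce SPIEC-EASI, which additionally estimates a sparse precision matrix from transformed abundances. The function name `partial_correlations` is hypothetical.

```python
import numpy as np

def partial_correlations(precision):
    """Convert a precision matrix into a matrix of partial correlations."""
    precision = np.asarray(precision, dtype=float)
    scale = np.sqrt(np.diag(precision))
    pcor = -precision / np.outer(scale, scale)
    np.fill_diagonal(pcor, 1.0)  # convention: unit diagonal
    return pcor

# Toy 3x3 precision matrix (not study data).
theta = np.array([[2.0, -1.0, 0.0],
                  [-1.0, 2.0, 0.0],
                  [0.0, 0.0, 1.0]])
pcor_toy = partial_correlations(theta)
```

A zero entry in the precision matrix corresponds to no edge between the two genera, which is why a sparse estimate yields a readable graph.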
Table 7.
Differential associations of genera.
The five smallest adjusted two-sided randomization-based p-values based on 10,000 permutations of the intervention assignment following a matched-pair design.
Fig 5.
Lipid metabolites exploration.
(A) Correlation of lipid metabolites with selected genera from the smoking prevention experiment (green). (B) Scatterplots of high-density lipoprotein (HDL) cholesterol and triglycerides vs. centered log-ratio transformed relative abundances of the genera Ruminococcaceae-UCG-005 and Christensenellaceae-R-7-group.
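The centered log-ratio (CLR) transformation used for the abundances can be sketched as below: each sample's log abundances are centered by their mean, which is the log of the sample's geometric mean. The pseudocount of 0.5 used to handle zero counts is an assumption made for this illustration, not a value taken from the study, and the function name `clr` is hypothetical.

```python
import numpy as np

def clr(counts, pseudocount=0.5):
    """Centered log-ratio transform, applied row-wise (one row per sample).

    pseudocount is an illustrative choice to avoid log(0).
    """
    x = np.asarray(counts, dtype=float) + pseudocount
    log_x = np.log(x)
    return log_x - log_x.mean(axis=1, keepdims=True)

# Toy genus counts for two samples (not study data).
toy_counts = np.array([[10, 0, 5, 85],
                       [20, 40, 30, 10]])
clr_toy = clr(toy_counts)
```

Each transformed row sums to zero, and without a pseudocount the result is unchanged when a sample's counts are rescaled, so relative abundances and raw counts give the same values.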