Multivariable association discovery in population-scale meta-omics studies
Fig 3
MaAsLin 2 facilitates multivariable association discovery in large-scale human epidemiological and other microbial community studies.
Synthetic datasets containing five “metadata” with varying types of induced feature associations were analyzed using a variety of multivariable approaches (S1C Fig). As measured by power (recall) and false discovery rate (FDR), MaAsLin 2’s default linear model outperformed other methods in controlling FDR while maintaining power across true-positive fold-change values, regardless of the total number of features. As expected, MaAsLin 2 had greater power at stronger effect sizes, eventually attaining the highest power among all FDR-controlling methods (full results in S1–S8 Data). The red line parallel to the x-axis marks the nominal FDR. Values are averages over 100 iterations for each parameter combination. The x-axis (effect size) within each panel represents the linear effect size parameter; a higher value represents a stronger association. For visualization purposes, only the best-performing method from each class of models (as measured by average F1 score) is shown. Methods are sorted in increasing order of average F1 score across all simulation parameters in this setting.
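The three summary metrics named in the caption (power, FDR, and F1 score) can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function names `score_calls` and `average_over_iterations`, the toy feature labels, and the convention that FDR is 0 when no features are called are all assumptions made for illustration.

```python
from statistics import mean

def score_calls(called, truth):
    """Score one simulated iteration (hypothetical helper, not from the paper).

    called: features a method reported as significant
    truth:  features with a truly induced (spiked-in) association
    """
    called, truth = set(called), set(truth)
    tp = len(called & truth)   # true positives
    fp = len(called - truth)   # false positives
    fn = len(truth - called)   # false negatives
    # Power (recall): fraction of true associations that were recovered.
    power = tp / (tp + fn) if truth else 0.0
    # FDR: fraction of reported associations that are false.
    # Assumption: defined as 0 when nothing is reported.
    fdr = fp / (tp + fp) if called else 0.0
    # F1: harmonic mean of precision (1 - FDR) and recall.
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return {"power": power, "fdr": fdr, "f1": f1}

def average_over_iterations(runs):
    """Average each metric over (called, truth) pairs, one pair per iteration."""
    scores = [score_calls(called, truth) for called, truth in runs]
    return {key: mean(s[key] for s in scores) for key in ("power", "fdr", "f1")}

# Two toy iterations with made-up feature labels.
runs = [
    ({"f1", "f2", "f3", "f9"}, {"f1", "f2", "f3", "f4"}),
    ({"f1", "f2"}, {"f1", "f2", "f3", "f4"}),
]
summary = average_over_iterations(runs)
print(summary)
```

In this sketch a method controls FDR when its average `fdr` stays at or below the nominal level, and methods that do so can then be compared on average `power` or `f1`.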