Fig 1.
Overview of the proposed bioinformatics pipeline for microbial community dynamics analysis.
The pipeline offers an integrated suite of computational tools that allow researchers to identify disease-related microorganisms, stratify samples into clinically relevant subtypes, construct disease progression models, and delineate disease-specific community dynamics at both organism and functional levels.
Fig 2.
Microbial community dynamics analysis performed on a human gut microbiome dataset (n = 275) obtained from a Crohn’s disease study.
(a) The number of clusters was estimated to be five by the gap statistic. (b) Resampling-based consensus clustering analysis identified five robust and stable clusters. (c) Silhouette width analysis further confirmed the robustness of the clustering assignments. A total of 255 of 275 (93%) samples had a positive silhouette width, and the average silhouette width was 0.1. (d) By combining the principal-tree and clustering results, a microbial progression model of Crohn’s disease was constructed, and four progression paths were identified. Each node represents an identified cluster, and the pie chart in each node depicts the percentage of the samples in the node having one of the CD behaviors (left panel) or belonging to one of the CD subtypes (right panel). (e-g) Visualization analysis provided a general view of the sample distribution supported by the selected microorganisms. Each point represents a sample, which was projected onto a three-dimensional space by using the DDRTree method. Samples were color-coded by cluster index (e), CD behavior (f), and CD subtype (g). The solid line represents the constructed principal tree. HC: healthy control, cCD: colonic Crohn’s disease, iCD: ileal Crohn’s disease, icCD: ileocolonic Crohn’s disease, r/nr: with/without ileocaecal resection.
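As an illustration of the silhouette summary reported in panel (c), the following is a minimal sketch of how per-sample silhouette widths and the fraction of positive widths can be computed. It assumes Euclidean distance and at least two clusters; the function name `silhouette_widths` and the toy coordinates are hypothetical and are not taken from the study, which may have used a different distance measure.

```python
import math
from collections import defaultdict

def silhouette_widths(points, labels):
    """Return the silhouette width of every sample (Euclidean distance assumed)."""
    clusters = defaultdict(list)
    for idx, label in enumerate(labels):
        clusters[label].append(idx)

    widths = []
    for i, label in enumerate(labels):
        own = [j for j in clusters[label] if j != i]
        if not own:
            widths.append(0.0)  # common convention for singleton clusters
            continue
        # a: mean distance to the other members of the sample's own cluster
        a = sum(math.dist(points[i], points[j]) for j in own) / len(own)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        widths.append((b - a) / denom if denom > 0 else 0.0)
    return widths

# Toy data (hypothetical): two well-separated groups of three samples each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels = [0, 0, 0, 1, 1, 1]

widths = silhouette_widths(points, labels)
fraction_positive = sum(w > 0 for w in widths) / len(widths)
average_width = sum(widths) / len(widths)
```

A positive width means a sample lies closer, on average, to its own cluster than to the nearest other cluster, which is the property summarized by the 255 of 275 figure above.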
Fig 3.
Changes in microbiome alpha diversity along identified progression paths and clinical characteristics of healthy states and disease endpoints.
(a) Spearman’s rank correlation analysis of alpha diversity, as measured by the Chao1 and Shannon indices, along the four identified progression paths (see Fig 2d). To aid in visualization, each sample was annotated by its clinical behavior. (b) Comparison of alpha diversity among the five detected clusters. The asterisks indicate the levels of significance determined by ANOVA. *: p-value < 0.05, **: p-value < 0.01, ***: p-value < 0.001. See also S3 Table. (c) Enterotype analysis of the HC samples in Cluster 1 (HC-C1) and Cluster 2 (HC-C2). HC-C1 and HC-C2 correspond to the enterotypes driven by Bacteroides and Prevotella, respectively. (d-e) Comparison of clinical characteristics of patients in the two disease endpoints (i.e., Clusters 4 and 5). Cluster 5 contained a significantly higher proportion of patients with active inflammation (fecal calprotectin >150 μg/g) than Cluster 4 (p-value = 0.016, χ² test). Clusters 4 and 5 exhibited significantly different female-to-male ratios and CD behavior compositions (p-value < 0.01, χ² test).
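For readers unfamiliar with the two alpha diversity measures named in panel (a), the sketch below computes both from a vector of per-taxon read counts. It assumes the natural-log Shannon index and the bias-corrected form of Chao1; the caption does not state which variants were used, and the function names and example counts are hypothetical.

```python
import math

def shannon_index(counts):
    """Shannon index H = -sum(p_i * ln p_i) over taxa with nonzero counts."""
    observed = [c for c in counts if c > 0]
    total = sum(observed)
    return -sum((c / total) * math.log(c / total) for c in observed)

def chao1(counts):
    """Bias-corrected Chao1 (assumed variant): S_obs + F1*(F1-1) / (2*(F2+1)).

    F1 and F2 are the numbers of taxa observed exactly once and exactly twice.
    """
    observed = [c for c in counts if c > 0]
    singletons = sum(1 for c in observed if c == 1)
    doubletons = sum(1 for c in observed if c == 2)
    return len(observed) + singletons * (singletons - 1) / (2 * (doubletons + 1))

# Hypothetical read counts for one sample (one entry per OTU).
example_counts = [1, 1, 2, 5, 0]
richness_estimate = chao1(example_counts)
evenness_aware_diversity = shannon_index(example_counts)
```

Chao1 estimates richness by extrapolating from rare taxa, whereas the Shannon index also reflects how evenly reads are spread across taxa, so the two can respond differently along a progression path.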
Fig 4.
Heatmap of microorganisms whose relative abundances were highly correlated with at least one of the four identified CD progression paths.
Each row represents an OTU, and each column represents a sample. The samples were first ordered by cluster labels and then by progression distances. For the purpose of visualization, the relative abundance of each OTU was log-transformed and scaled into the range of [0, 1]. See S5 Fig for additional details.
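The per-OTU transformation described above can be sketched as follows. The caption states only that abundances were log-transformed and scaled into [0, 1]; the base-10 logarithm, the pseudocount used to handle zeros, and the function name `log_minmax_scale` are assumptions made for illustration.

```python
import math

def log_minmax_scale(abundances, pseudocount=1e-6):
    """Log-transform one OTU's relative abundances, then min-max scale to [0, 1].

    The pseudocount (an assumption, not stated in the caption) avoids log(0).
    """
    logged = [math.log10(x + pseudocount) for x in abundances]
    lo, hi = min(logged), max(logged)
    if hi == lo:
        # A constant row carries no contrast; map it to 0 everywhere.
        return [0.0 for _ in logged]
    return [(v - lo) / (hi - lo) for v in logged]

# Hypothetical relative abundances of one OTU across four samples.
row = [0.0, 0.001, 0.01, 0.1]
scaled_row = log_minmax_scale(row)
```

Scaling each row separately makes low-abundance and high-abundance OTUs comparable in a single color map, at the cost of hiding absolute differences between rows.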
Fig 5.
Microbial interaction networks inferred by the gLV method applied to pseudo-time series data recovered from four identified CD progression paths.
Each node represents an OTU; node size is proportional to the number of edges directed out of the node (i.e., out-degree), and face color represents the sign of the correlation between the relative abundance of the OTU and a progression path (red: positive, blue: negative). Only the nodes with out-degrees larger than 10 were annotated. Because compositionality was not considered in the analysis, spurious links may arise. See S9 Fig for detailed annotations.
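The node sizing and annotation rule can be illustrated with a short sketch. It assumes the inferred network is available as a list of directed (source, target) pairs; that representation, the OTU labels, and the function names are hypothetical, and the inference of the edges themselves by the gLV method is not shown.

```python
from collections import Counter

def out_degrees(edges):
    """Count, for every node, the number of edges directed out of it."""
    return Counter(source for source, _target in edges)

def annotated_nodes(edges, min_out_degree=10):
    """Return nodes whose out-degree is strictly larger than the threshold."""
    degrees = out_degrees(edges)
    return sorted(node for node, d in degrees.items() if d > min_out_degree)

# Hypothetical directed edges: (influencing OTU, influenced OTU).
edges = [
    ("OTU_A", "OTU_B"),
    ("OTU_A", "OTU_C"),
    ("OTU_B", "OTU_C"),
]
degrees = out_degrees(edges)
# A threshold of 1 is used here only because the toy network is tiny;
# the caption uses a threshold of 10.
labelled = annotated_nodes(edges, min_out_degree=1)
```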
Fig 6.
Microbial community dynamics analysis of individual patients.
(a) The progression distances of the samples collected from individual participants over a two-year period. The participants were first ordered by CD subtypes and then by median progression distances. (b) The microbiome composition of a patient was significantly altered by medication. Sample 0 contained only a few hundred reads and thus was omitted. (c) Samples collected from a two-year study provided only a partial picture of microbial community dynamics associated with disease development. Each circle represents a patient, the face color represents the CD subtype, and the radius equals 1.5 times the MAD of the progression distances of the samples collected from the patient. MAD: median absolute deviation.
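The circle radius in panel (c) can be reproduced with a few lines. This is a minimal sketch assuming the unscaled MAD (no normal-consistency constant); the function names and the example distances are hypothetical.

```python
from statistics import median

def median_absolute_deviation(values):
    """Unscaled MAD: the median of absolute deviations from the median."""
    center = median(values)
    return median([abs(v - center) for v in values])

def circle_radius(progression_distances, factor=1.5):
    """Radius used to draw one patient: factor times the MAD of the distances."""
    return factor * median_absolute_deviation(progression_distances)

# Hypothetical progression distances for one patient's longitudinal samples.
distances = [1, 2, 3, 4, 100]
radius = circle_radius(distances)
```

Because the MAD is based on medians, a single outlying sample (the value 100 in the toy data) barely affects the radius, which makes it a robust summary of within-patient variation.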