Boolean analysis reveals systematic interactions among low-abundance species in the human gut microbiome

The analysis of microbiome compositions in the human gut has gained increasing interest due to the broader availability of data and functional databases and substantial progress in data analysis methods, but also due to the high relevance of the microbiome in human health and disease. While most analyses infer interactions among highly abundant species, the large number of low-abundance species has received less attention. Here we present a novel analysis method based on Boolean operations applied to microbial co-occurrence patterns. We calibrate our approach with simulated data based on a dynamical Boolean network model from which we interpret the statistics of attractor states as a theoretical proxy for microbiome composition. We show that for given fractions of synergistic and competitive interactions in the model our Boolean abundance analysis can reliably detect these interactions. Analyzing a novel data set of 822 microbiome compositions of the human gut, we find a large number of highly significant synergistic interactions among these low-abundance species, forming a connected network, and a few isolated competitive interactions.

One limitation of the Boolean model used to simulate (binary) abundance patterns from species interaction networks is that only a limited range of connectivities produces a sufficiently rich set of abundance patterns (i.e. attractors). Figure D shows the number of attractors as a function of connectivity for species interaction networks with 15 nodes for three different schemes of changing connectivity. Because the number of attractors decreases rapidly with connectivity in all three schemes, we restrict ourselves to the choice M − = const = 15 and M + = const = 15.
Regarding our choice of parameters, two points should be noted: (1) There is no reliable a priori information about suitable ranges of connectivity. In particular, one of the main findings of the present investigation is that the dominant (and well-known) interactions among high-abundance species are embedded in a large network of (mostly positive) interactions of low-abundance species. (2) Our choice of requiring more than 100 distinct attractors is purely heuristic. It is intuitively clear that the prediction quality should decrease with decreasing numbers of attractors. However, we have not studied this decrease in detail.
The prediction quality is, to a certain extent, arbitrary, as it is based on specific thresholds: we count as correctly classified the cases where the Jaccard index was larger than 0.6, and as incorrectly classified (thus counting negatively) those cases where the Jaccard index was smaller than 0.4. Figure E shows a histogram of prediction qualities for the Boolean AND and for the Jaccard index for 80 simulated species interaction networks (N = 15, M + = M − = 15) at a binary noise level of p = 0.2 and a threshold for the Jaccard index of 0.8. For this choice of parameters, the Boolean AND performs substantially better in recovering positive interactions. Based on a wide range of such simulations, we are convinced that the Boolean analysis is the better choice if connectivity and noise levels are not known a priori. We also intend to extend the ESABO analysis so that the user can choose whether binarized abundance vectors are evaluated via the entropy shifts based on (several) Boolean operations or via the Jaccard index.
Figure E (color coding). Blue: prediction quality obtained from the Jaccard index (detection threshold 0.8: if the Jaccard index for a positive interaction is 0.8 or higher, the link is counted as 'correctly predicted'; if it is below 0.8, it is counted as 'incorrectly predicted'). Red: prediction quality obtained from ESABO (Boolean AND). (Note that 'mixed colors' appear where the histograms overlap.)
Given the binarized abundance data, two additional filtering steps may be applied: (1) Eliminating duplicates from the abundance vectors (filter D). In the analysis of the simulated abundance data, this step significantly improved the detection quality (cf. Figure S2). (2) Discarding taxa with near-constant abundance vectors (i.e. taxa that are either almost always present or almost always absent; filter C).
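The Jaccard index and the two filtering steps can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the matrix orientation (rows are taxa, columns are samples) and the cutoff `min_fraction` used to decide what counts as "near-constant" are assumptions made here for illustration only.

```python
import numpy as np


def jaccard_index(x, y):
    """Jaccard index of two binary abundance vectors: |x AND y| / |x OR y|."""
    x = np.asarray(x, dtype=bool)
    y = np.asarray(y, dtype=bool)
    union = np.logical_or(x, y).sum()
    if union == 0:
        # Assumption: two taxa that are absent everywhere get index 0.
        return 0.0
    return float(np.logical_and(x, y).sum() / union)


def filter_duplicates(matrix):
    """Filter D: keep each distinct sample (column) only once.

    The order of first occurrence is preserved.
    """
    matrix = np.asarray(matrix)
    _, first_index = np.unique(matrix, axis=1, return_index=True)
    return matrix[:, np.sort(first_index)]


def filter_near_constant(matrix, min_fraction=0.05):
    """Filter C: drop taxa (rows) that are almost always present or absent.

    min_fraction is a hypothetical cutoff, not a value from the text.
    Returns the reduced matrix and the boolean mask of kept rows.
    """
    matrix = np.asarray(matrix)
    presence = matrix.mean(axis=1)  # fraction of samples containing the taxon
    keep = (presence >= min_fraction) & (presence <= 1.0 - min_fraction)
    return matrix[keep], keep
```

Applying `filter_duplicates` before `filter_near_constant` mirrors the order in which the two steps are listed above; whether the order matters for a given data set would have to be checked case by case.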
Table A shows how many steady-state abundance vectors and how many taxa remain after each of these steps, and how many positive and negative links the ESABO analysis yields when applied with and without these filtering steps. The overall picture emerging from applying the ESABO analysis to different taxonomic levels is that there is a substantial number of significant interactions and that positive interactions tend to be much more frequent than negative interactions. As pointed out above, due to the binarization, the ESABO analysis focuses on the low-abundance species. The multi-level analysis summarized in Table A thus supports our key result presented in section Analysis of the human gut microbiome compositions, namely that the well-known, strong (mostly inhibitory) links in microbial interaction networks are embedded in a dense systematic network of (mostly positive) interactions among low-abundance species. For the detailed phyla-level analysis presented in section Analysis of the human gut microbiome compositions, we opted for the unfiltered version, as the data matrix becomes very small under these filtering steps.

Definition of prediction quality
The prediction quality for the ESABO score shown in Figure 4 is computed using the following 'template': (number of successful predictions − number of unsuccessful predictions)/(total number of cases in this category). For positive interactions, the total number of cases is M +, and, e.g., |z > 1| denotes the number of times (out of the M + positive interactions) a z-score larger than 1 has been found. For negative interactions, the total number of cases is M −.
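The template can be made concrete with a short Python sketch. It rests on assumptions that go beyond the text: a prediction for a positive interaction is taken to be successful if z > 1 and unsuccessful if z < −1 (the lower cutoff is chosen here by symmetry), with the roles swapped for negative interactions. The function name and its arguments are hypothetical.

```python
import numpy as np


def prediction_quality(z_scores, sign, z_threshold=1.0):
    """(successes - failures) / (number of cases) for one interaction category.

    z_scores: ESABO z-scores of all true interactions in the category.
    sign: +1 for positive interactions, -1 for negative interactions.
    z_threshold: assumed symmetric cutoff; only z > 1 is named in the text.
    """
    z = np.asarray(z_scores, dtype=float)
    if z.size == 0:
        raise ValueError("need at least one interaction in the category")
    above = int(np.sum(z > z_threshold))   # |z > 1|
    below = int(np.sum(z < -z_threshold))  # |z < -1| (assumed counterpart)
    if sign > 0:
        successes, failures = above, below
    else:
        successes, failures = below, above
    return (successes - failures) / z.size
```

With this convention the quality lies between −1 (every link detected with the wrong sign) and 1 (every link detected with the right sign); z-scores between the two cutoffs count as neither success nor failure.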

Abundance and co-abundance tables
Table of correlation coefficients for pairs of phyla (larger than 0.4 or smaller than -0.10). The third column lists the Pearson correlation coefficient c_ij for the pairs of phyla with the numbers i and j displayed in columns 1 and 2. As expected, phyla 4 and 11 display a strong anticorrelation of -0.892683, and the two pairs 2-4 and 11-18 show a small anticorrelation. Among the positive correlations there are nine stronger co-abundances, of which three (1-16, 1-17 and 1-13) have a correlation coefficient ≥ 0.5.

Re-analysis of the American Gut dataset
Figure F: Links of the phyla interaction network, as compiled for the within-gut interactions from the data of Faust et al. [12]. The network comprises three weak positive interactions, one strong inhibiting link (between Bacteroidetes and Firmicutes) and five weak negative interactions.
Here, the arguments of ESABO are:
filename: defines the filename including the path
random: defines the number of randomized vectors for calculation of the z-scores
booleanOp: specifies the Boolean operation (here: AND)
threshold: sets the binarization threshold of the abundance data
The main results (pairs of species names, entropy shift z-score) were then obtained by
evaluateESABO["phyla_1000.otu_matrix_row", 1000, BitAnd, 1]
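The role of these arguments can be illustrated with a minimal Python sketch of an entropy-shift z-score for a single pair of taxa. It is an independent illustration, not a port of the authors' Mathematica function evaluateESABO. Several details are assumptions made here: abundances at or above the threshold are binarized to 1, the randomized vectors are obtained by shuffling one of the two binarized vectors, and the z-score is (observed entropy − mean of randomized entropies) / standard deviation. All function names are hypothetical.

```python
import numpy as np


def binarize(abundances, threshold=1):
    """Assumed binarization rule: 1 if abundance >= threshold, else 0."""
    return (np.asarray(abundances) >= threshold).astype(int)


def binary_entropy(vector):
    """Shannon entropy (in bits) of the fraction of ones in a binary vector."""
    p = float(np.mean(vector))
    if p == 0.0 or p == 1.0:
        return 0.0
    return float(-p * np.log2(p) - (1.0 - p) * np.log2(1.0 - p))


def esabo_z_score(x, y, n_random=1000, boolean_op=np.bitwise_and, seed=None):
    """Entropy-shift z-score of boolean_op(x, y) against shuffled versions of y.

    x, y: binarized abundance vectors of two taxa over the same samples.
    n_random: number of randomized vectors (cf. the 'random' argument).
    boolean_op: elementwise Boolean operation (cf. 'booleanOp', here AND).
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=int)
    y = np.asarray(y, dtype=int)
    observed = binary_entropy(boolean_op(x, y))
    null = np.array([
        binary_entropy(boolean_op(x, rng.permutation(y)))
        for _ in range(n_random)
    ])
    sigma = null.std()
    if sigma == 0.0:
        return 0.0  # assumption: no variation in the null model, no signal
    return float((observed - null.mean()) / sigma)
```

For two rarely present taxa that always occur together, the AND vector contains more ones than expected by chance, so its entropy exceeds the randomized values and the z-score is positive; for two taxa that never co-occur, the AND vector is all zeros and the z-score is negative.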