Inference of Network Dynamics and Metabolic Interactions in the Gut Microbiome

Steven N. Steinway; Matthew B. Biggs; Thomas P. Loughran Jr; Jason A. Papin; Reka Albert

doi:10.1371/journal.pcbi.1004338

Abstract

We present a novel methodology to construct a Boolean dynamic model from time series metagenomic information and integrate this modeling with genome-scale metabolic network reconstructions to identify metabolic underpinnings for microbial interactions. We apply this in the context of a critical health issue: clindamycin antibiotic treatment and opportunistic Clostridium difficile infection. Our model recapitulates known dynamics of clindamycin antibiotic treatment and C. difficile infection and predicts therapeutic probiotic interventions to suppress C. difficile infection. Genome-scale metabolic network reconstructions reveal metabolic differences between community members and are used to explore the role of metabolism in the observed microbial interactions. In vitro experimental data validate a key result of our computational model, that B. intestinihominis can in fact slow C. difficile growth.

Author Summary

The community of bacteria that live in our intestines (called the “gut microbiome”) is important to normal intestinal function, and destruction of this community has a causative role in diseases including obesity, diabetes, and even neurological disorders. Clostridium difficile is an opportunistic pathogenic bacterium that causes potentially life-threatening intestinal inflammation and diarrhea and frequently occurs after antibiotic treatment, which wipes out the normal intestinal bacterial community. We use a mathematical model to identify how the normal bacterial community interacts and how this community changes with antibiotic treatment and C. difficile infection. We use this model to identify bacteria that may inhibit C. difficile growth. Our model and subsequent experiments indicate that Barnesiella intestinihominis inhibits C. difficile growth. This result suggests that B. intestinihominis could potentially be used as a probiotic to treat or prevent C. difficile infection.

Citation: Steinway SN, Biggs MB, Loughran TP Jr, Papin JA, Albert R (2015) Inference of Network Dynamics and Metabolic Interactions in the Gut Microbiome. PLoS Comput Biol 11(6): e1004338. https://doi.org/10.1371/journal.pcbi.1004338

Editor: Costas D. Maranas, The Pennsylvania State University, UNITED STATES

Received: February 18, 2015; Accepted: May 13, 2015; Published: June 23, 2015

Copyright: © 2015 Steinway et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: The code generated for this paper is publicly available in this repository: https://bitbucket.org/gutmicrobiomepaper/microbiomenetworkmodelpaper/src.

Funding: SNS and MBB were funded by The Jefferson Trust/ University of Virginia Data Science Institute Collaborative Research Award in Big Data: http://gradstudies.virginia.edu/bigdata. MBB and JAP were funded by R01 GM108501 National Institute of General Medical Sciences, National Institutes of Health: http://www.nigms.nih.gov/Research/Mechanisms/Pages/ResearchProjectGrants.aspx. MBB was funded by University of Virginia Biotechnology Training Grant: http://faculty.virginia.edu/biotech/Home.html. SNS was funded by F30 DK093234 National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health: http://grants.nih.gov/grants/guide/contacts/parent_F30.html. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests: The authors have declared that no competing interests exist.

Introduction

Human health is inseparably connected to the billions of microbes that live in and on us. Current research shows that our associations with microbes are, more often than not, essential for our health [1]. The microbes that live in and on us (collectively our “microbiome”) help us to digest our food, train our immune systems, and protect us from pathogens [2,3]. The gut microbiome is an enormous community, consisting of hundreds of species and trillions of individual interacting bacteria [4]. Microbial community composition often persists for years without significant change [5].

When change comes, however, it can have unpredictable and sometimes fatal consequences. Acute and recurring infections by Clostridium difficile have been strongly linked to changes in gut microbiota [6]. The generally accepted paradigm is that antibiotic treatment (or some other perturbation) significantly disrupts the microbial community structure in the gut, which creates a void that C. difficile will subsequently fill [7–10]. Such infections occur in roughly 600,000 people in the United States each year (this number is on the rise), with an associated mortality rate of 2.3% [11]. Each year, healthcare costs associated with C. difficile infection are in excess of $3.2 billion [11]. An altered gut flora has further been identified as a causal factor in obesity, diabetes, some cancers and behavioral disorders [12-17].

What promotes the stability of a microbial community, or causes its collapse, is poorly understood. Until we know what promotes stability, we cannot design targeted treatments that prevent microbiome disruption, nor can we rebuild a disrupted microbiome. Studying the system level properties and dynamics of a large community is impossible using traditional microbiology approaches. However, network science is an emerging field which provides a powerful framework for the study of complex systems like the gut microbiome [18–23]. Previous efforts to capture the essential dynamics of the gut have made heavy use of ordinary differential equation (ODE) models [24,25]. Such models require the estimation of many parameters. With so many degrees of freedom, it is possible to overfit the underlying data, and it is difficult to scale up to larger communities [26,27]. Boolean dynamic models, conversely, require far less parameterization. Such models capture the essential dynamics of a system, and scale to larger systems. Boolean models have been successfully applied at the molecular [28,29], cellular [20], and community levels [30]. Here we present the first Boolean dynamic model constructed from metagenomic sequence information and the first application of Boolean modeling to microbial community analysis.

We analyze the dynamic nature of the gut microbiome, focusing on the effect of clindamycin antibiotic treatment and C. difficile infection on gut microbial community structure. We generate a microbial interaction network and dynamical model based on time-series metagenomic data from a population of mice. We present the results of a dynamic network analysis, including steady-state conditions, how those steady states are reached and maintained, how they relate to the health or disease status of the mice, and how targeted changes in the network can transition the community from a disease state to a healthy state. Furthermore, knowing how microbes positively or negatively impact each other, particularly for key microbes in the community, increases the therapeutic utility of the inferred interaction network. We produced genome-scale metabolic reconstructions of the taxa represented in this community [31], and probe how metabolism could (and could not) contribute to the mechanistic underpinnings of the observed interactions. We present validating experimental evidence consistent with our computational results, indicating that a member of the normal gut flora, Barnesiella, can in fact slow C. difficile growth.

Methods

Data Sources

Buffie et al. reported treating mice with clindamycin and tracking microbial abundance by 16S sequencing [32]. Mice treated with clindamycin were more susceptible to C. difficile infection than controls. The collection of 16S sequences corresponding to these experiments was analyzed by Stein et al. [24]. First, Stein et al. aggregated the data by quantifying microbial abundance at the genus level. Abundances of the ten most abundant genera and an “other” group were presented as operational taxonomic unit (OTU) counts per sample. We use the aggregated abundances from Stein et al. as the starting point for our modeling pipeline (Fig 1).


Fig 1. Dynamic analysis workflow.

Time course genus abundance information was acquired from metagenomic sequencing of mouse gastrointestinal tracts under varying experimental conditions. Missing time points from experimental data were estimated such that genus abundances existed at the same time points across all treatment groups. Next, genus abundances were binarized such that Boolean regulatory relationships could be inferred. A dynamic Boolean model was constructed to explore gut microbial dynamics, therapeutic interventions, and metabolic mediators of bacterial regulatory relationships.

https://doi.org/10.1371/journal.pcbi.1004338.g001

This processed dataset consisted of nine samples and three treatment groups (n = 3 replicates per treatment group). The first treatment group (here called “Healthy”) received spores of C. difficile at t = 0 days, and was used to determine the susceptibility of the native microbiota to invasion. The second treatment group (here called “clindamycin treated”) received a single dose of clindamycin at t = -1 days to assess the effect of the antibiotic alone, and the third treatment group (here called “clindamycin+ C. difficile treated”) received a single dose of clindamycin (at t = -1 days) and, on the following day, was inoculated with C. difficile spores (S1A Fig). Under the clindamycin+ C. difficile treatment group conditions, C. difficile could colonize the mice and produce colitis; however this was not possible under the first two treatment group conditions.

Interpolation of Missing Genus Abundance Information

The gut bacterial genus abundance dataset included some variation in the time points at which genera were sampled. That is, genus abundances were measured between 0 and 23 days; however, not all samples had measurements at all the time points (S1A Fig). In particular, the healthy population only included time points at 0, 2, 6, and 13 days, and Sample 1 of the clindamycin+ C. difficile treated population was missing the 9 day time point. Missing abundance values for these 4 points were estimated using an interpolation approach (S1B Fig). For healthy samples, the 16 and 23 day time points could not be interpolated because the last experimentally identified time point for these samples is at 13 days. For these samples, the approximating polynomial is assumed to extend linearly, using the slope of the interpolating curve at the nearest data point. Because genus abundances are fairly stable across time in this treatment group (i.e. the slope of most of the genus abundances is approximately zero), extrapolating two time points was deemed reasonable. A principal component analysis was completed on the interpolated data (Fig 2A) and shows that the interpolated time series bacterial genus abundance data cluster by experimental treatment group in the first two principal components. Furthermore, the results of the binarization for the healthy population suggest that interpolation did not have any concerning effects on the 16 and 23 day time points (S2 Fig).


Fig 2. Construction of a network model of the gut microbiome from time course metagenomic genus abundance information.

Principal component analysis was completed on the metagenomic genus abundance dataset, and the coefficients associated with each sample are shown for A) interpolated genus abundances and B) binarized interpolated genus abundances. ‘*’ = Healthy; ‘^’ = clindamycin treated; ‘#’ = clindamycin+ C. difficile treated. C) Consensus binarization of genus abundance information. Each heatmap represents the consensus binarization for each treatment group. The horizontal axis represents the day of the experiment that the sample came from. The vertical axis represents the specific genera being modeled. Each genus was binarized to a 1 (ON; above activity threshold) or 0 (OFF; below activity threshold). D) Interaction rules were inferred from the binarized data. The interaction rules were simplified for visualization (compound rules were broken into simple one-to-one edges).

https://doi.org/10.1371/journal.pcbi.1004338.g002

Natural cubic spline interpolation was used to estimate genus abundances at missing time points in some samples. A cubic spline is constructed of piecewise third order (cubic) polynomials which pass through the known data points and has continuous first and second derivatives across all points in the dataset. Natural cubic spline is a cubic spline that has a second derivative equal to zero at the end points of the dataset [33]. Natural splines were interpolated such that all datasets had time points at single day intervals through the 23 day time point (S1B Fig).
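The interpolation step can be sketched as follows. This is a minimal illustration rather than the authors' code: it assumes SciPy's `CubicSpline` with `bc_type="natural"`, the function name `interpolate_daily` is hypothetical, and the abundance values are made up. Beyond the last measured day it follows the tangent of the spline at the last data point, as described for the healthy samples.

```python
import numpy as np
from scipy.interpolate import CubicSpline

def interpolate_daily(days, abundance, last_day=23):
    """Natural cubic spline on a daily grid, linear extrapolation past the last sample."""
    days = np.asarray(days, dtype=float)
    abundance = np.asarray(abundance, dtype=float)
    # bc_type="natural" sets the second derivative to zero at both end points
    spline = CubicSpline(days, abundance, bc_type="natural")
    grid = np.arange(days[0], last_day + 1, dtype=float)
    inside = grid <= days[-1]
    estimate = np.empty_like(grid)
    estimate[inside] = spline(grid[inside])
    # Past the last measurement, continue along the slope at the last data point
    end_slope = spline(days[-1], nu=1)
    estimate[~inside] = abundance[-1] + end_slope * (grid[~inside] - days[-1])
    return grid, estimate

# Hypothetical abundances measured on the healthy group's sampling days
grid, est = interpolate_daily([0, 2, 6, 13], [120.0, 130.0, 125.0, 128.0])
```

The spline passes through the measured values exactly, so only the unmeasured days carry estimated abundances.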

Network Modeling Framework

We use a Boolean framework in which each network node is described by one of two qualitative states: ON or OFF. We chose this framework because of its computational feasibility and capacity to be constructed with minimal and qualitative biological data [34]. The ON (logical 1) state means an above-threshold abundance of a bacterial genus, whereas the OFF (logical 0) state means a below-threshold abundance. The putative biological relationships among genera are expressed as mathematical equations using Boolean operators [29,34]. We inferred putative Boolean regulatory functions for each node, which are able to best capture the trends in the bacterial abundances. These rules (edges in the interaction network) can be assigned a direction, representing information flow, i.e. effect from the source (upstream) node to the target (downstream) node. Furthermore, edges can be characterized as positive (growth promoting) or negative (growth suppressing). An additional layer of network analysis is the dynamic model, which is used to express the behavior of a system over time by characterizing each node by a state variable (e.g., abundance) and a function that describes its regulation. Dynamic models can be categorized as continuous or discrete, according to the type of node state variable used. Continuous models use a set of differential equations; however, the paucity of known kinetic details for inter-genus and/or inter-species interactions makes these models difficult to implement.

Binarization

Genus abundance data was binarized (converted to a presence-absence dataset) to enable inference of Boolean relationships for modeling applications. We adapted a previously developed approach called iterative k-means binarization with a clustering depth of 3 (KM3) for this purpose [35]. This approach was employed because binarized data is able to maintain complex oscillatory behavior in Boolean models constructed from this data, whereas other binarization approaches fail to maintain these features [35].

Briefly, this approach uses k-means clustering with a depth of clustering d and an initial number of clusters k = 2^d. In each iteration, data for a specific genus G are clustered into k unique clusters C¹_G,…,C^k_G, then for each cluster, Cⁿ_G, all the values are replaced by the mean value of Cⁿ_G. For the next iteration, the value of d is decreased by one and clustering is repeated. This procedure is repeated until d = 1. This approach with d = 3 (here called KM3 binarization) has previously been demonstrated to be superior to other binarization approaches for Boolean model construction because it conserves oscillatory behavior [35]. These analyses were performed using custom Python code based on a previously written algorithm [35], which is available in the supplemental materials.
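A minimal sketch of iterative k-means binarization for one genus is shown below. It is not the authors' supplemental code: the function names, the plain Lloyd's k-means, the random initialization from observed values, and the example data are all assumptions made for illustration.

```python
import numpy as np

def kmeans_1d(values, k, rng, n_iter=100):
    """Plain Lloyd's k-means on 1-D data; returns one cluster label per value."""
    uniq = np.unique(values)
    k = min(k, len(uniq))  # cannot have more clusters than distinct values
    centers = rng.choice(uniq, size=k, replace=False)  # stochastic initial grouping
    for _ in range(n_iter):
        labels = np.argmin(np.abs(values[:, None] - centers[None, :]), axis=1)
        new_centers = np.array([
            values[labels == j].mean() if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

def km_binarize(values, depth=3, seed=0):
    """Cluster with k = 2^d, replace values by cluster means, decrease d until d = 1."""
    rng = np.random.default_rng(seed)
    x = np.asarray(values, dtype=float).copy()
    for d in range(depth, 0, -1):
        labels = kmeans_1d(x, 2 ** d, rng)
        for j in np.unique(labels):
            x[labels == j] = x[labels == j].mean()
    # After the d = 1 pass at most two distinct values remain: low is OFF, high is ON
    return (x > x.min()).astype(int)

# Hypothetical abundance time series for one genus
binary = km_binarize([0, 1, 2, 1, 0, 50, 55, 60, 52, 1])
```

Because the initial cluster centers are drawn at random, different seeds can give different binarizations on less clearly separated data, which is why repeated runs are useful.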

Because KM3 binarization has a stochastic component (the initial grouping of binarization clusters), we employed KM3 binarization on the entire bacterial genus abundance time series dataset 1000 times. The average binarization for each sample (S2 Fig) was used to determine the most probable binarized state of each genus in each sample at each time point (S3 Fig). A principal component analysis of the most probable binarized genus abundances for each sample demonstrates that as with the continuous time series abundances (Fig 2A), binarized bacterial genus abundance data cluster by experimental treatment group (Fig 2B). For inference of Boolean rules from the binarized genus abundances (S3 Fig), the consensus of two of three samples for each treatment population was used as the binarized state of each genus at each time point in each sample (Fig 2C).
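The two aggregation steps described above (averaging repeated binarizations, then taking the consensus of replicates) can be sketched as follows. The array layout, the function names, and the tie rule (an average of exactly 0.5 counts as ON) are assumptions for illustration and are not stated in the text.

```python
import numpy as np

def most_probable_state(runs):
    """runs: 0/1 array of shape (n_runs, n_genera, n_times) from repeated binarization."""
    # Assumed tie rule: an average of exactly 0.5 is treated as ON
    return (np.mean(runs, axis=0) >= 0.5).astype(int)

def consensus_of_replicates(samples):
    """samples: 0/1 array of shape (3, n_genera, n_times); ON if at least two of three agree."""
    samples = np.asarray(samples)
    return (samples.sum(axis=0) >= 2).astype(int)

# Toy example: three binarization runs for one genus at two time points
runs = np.array([[[1, 0]], [[1, 0]], [[0, 0]]])
probable = most_probable_state(runs)

# Toy example: three replicate samples, two genera, two time points
replicates = np.array([
    [[1, 0], [0, 0]],
    [[1, 1], [0, 1]],
    [[0, 1], [0, 0]],
])
consensus = consensus_of_replicates(replicates)
```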

Inference of Boolean Rules from Time Series Genus Abundance Information

The Best-Fit extension was applied to learn Boolean rules from the binarized time series genus abundance information [36]. For each variable (genus) X_i in the binarized time series genus abundance data, Best-Fit identifies the set of Boolean rules with k variables (regulators) that explains the variable’s time pattern with the least error size. The algorithm uses partially defined Boolean functions pdBf(T, F), where the sets of true (T) and false (F) vectors are defined as T = {X′ ∈ {0, 1}^k: X_i (t + 1) = 1} and F = {X′ ∈ {0, 1}^k: X_i (t + 1) = 0}. Intuitively, the partial Boolean function summarizes the states of the putative regulators that correspond to a turning ON (T) or turning OFF (F) of the target variable. The error size ε of pdBf(T, F) is defined as the minimum number of inconsistencies within X′ that best classifies the T and F values of the dataset. The Best-Fit extension works by identifying the smallest-size X′ for X_i. For more detailed information refer to [36]. In line with this, we considered the most parsimonious representation of the rules with the smallest ε. If the most parsimonious rule was self-regulation, we also considered rules with the same ε that included another regulator. If multiple rules fit these criteria for a given X_i, it implied that they can independently represent the inferred regulatory relationships. In cases where the alternatives had the same value of (non-zero) ε, we explored combinations (such as appending them by an OR rule) and used the combination that best described the experimentally observed final (steady state) outcomes. For example, we combined the two alternative rules for Blautia with an OR relationship. In the case of Barnesiella, we chained three rules ("Other", "Lachnospiraceae_other", "Lachnospiraceae") by an OR relationship, and "not Clindamycin" by an AND relationship to incorporate the loss of Barnesiella in the presence of clindamycin (Fig 2C). This was also done for rules for “Lachnospiraceae”, “Lachnospiraceae_other” and “Other”, and all four nodes attained the same rule. There are six nodes with multiple inferred (alternative) rules: “Barnesiella”, “Blautia”, “Enterococcus”, “Lachnospiraceae”, “Lachnospiraceae_other”, and “Other” had 4, 2, 5, 4, 4, and 4 rules, respectively. The six other nodes had a single inferred rule. The network in Fig 2D represents the union of all of the alternative rules produced by Best-Fit; in other words, it is a super-network of all alternative rules. Any alternative network would be a sub-network of what we show. A strongly connected component between the nodes inhibited by clindamycin is a feature of the vast majority of these sub-networks. We used the implementation of Best-Fit in the R package BoolNet [37].
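The idea behind Best-Fit can be illustrated with a brute-force sketch. This is not the BoolNet implementation used in the paper: the function name `best_fit`, the data layout, and the toy time series are hypothetical. For each candidate set of k regulators, every observed regulator pattern is assigned the majority next state of the target, and the error size is the number of observations that disagree with that assignment.

```python
from itertools import combinations
from collections import defaultdict

def best_fit(series, target, k):
    """series: list of 0/1 state tuples ordered in time.
    Returns (minimal error, list of (regulators, truth_table)) over all k-regulator sets."""
    n_nodes = len(series[0])
    best_err, best = None, []
    for regs in combinations(range(n_nodes), k):
        # counts[pattern] = [times target was 0 next, times target was 1 next]
        counts = defaultdict(lambda: [0, 0])
        for now, nxt in zip(series, series[1:]):
            counts[tuple(now[r] for r in regs)][nxt[target]] += 1
        err = sum(min(c) for c in counts.values())  # inconsistencies under majority rule
        table = {p: int(c[1] >= c[0]) for p, c in counts.items()}
        if best_err is None or err < best_err:
            best_err, best = err, [(regs, table)]
        elif err == best_err:
            best.append((regs, table))
    return best_err, best

# Toy series of three nodes in which node 2 at t + 1 equals NOT node 0 at t
toy_series = [(1, 0, 0), (1, 1, 0), (0, 1, 0), (0, 0, 1), (1, 0, 1), (1, 1, 0)]
error, candidates = best_fit(toy_series, target=2, k=1)
```

When several regulator sets share the minimal error, all of them are returned, which mirrors the alternative rules discussed above.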

Dynamic Analysis

Dynamic analysis is performed by applying the inferred Boolean functions in succession until a steady state is reached. Boolean models and discrete dynamic models in general focus on state transitions instead of following the system in continuous time. Thus, time is an implicit variable in these models. The network transitions from an initial condition (initial state of the bacterial community) until an attractor is reached. An attractor can be a fixed point (steady state) or a set of states that repeat indefinitely (a complex attractor). The basin of attraction refers to the set of initial conditions that lead the system to a specific attractor. For the network under consideration, the complete state space can be traversed by enumerating every possible combination of node states (2¹²) and applying the inferred Boolean functions (or “update rules”) to determine paths linking those states. The state transition network describes all possible community trajectories from initial conditions to steady states, given the observed interactions between bacteria in the community.
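Exhaustive enumeration of the state space can be sketched on a toy network. The three rules below are hypothetical and chosen only for illustration; they are not the rules inferred in this work (S1 Table), and the 12-node model would be handled the same way with 2¹² states.

```python
from itertools import product

NODES = ("clindamycin", "barnesiella", "c_difficile")

# Hypothetical toy rules, NOT the inferred rules of the paper
RULES = {
    "clindamycin": lambda s: s["clindamycin"],
    "barnesiella": lambda s: s["barnesiella"] and not s["clindamycin"],
    "c_difficile": lambda s: s["c_difficile"] and not s["barnesiella"],
}

def sync_step(state):
    """Apply every update rule once, all based on the same previous state."""
    s = dict(zip(NODES, state))
    return tuple(int(bool(RULES[n](s))) for n in NODES)

def state_transition_map():
    """Map each of the 2^n states to its successor under synchronous update."""
    return {st: sync_step(st) for st in product((0, 1), repeat=len(NODES))}

def fixed_points():
    """States that map to themselves are fixed point attractors."""
    return sorted(st for st, nxt in state_transition_map().items() if st == nxt)

toy_fixed_points = fixed_points()
```

Following the successor map from any initial state until a state repeats gives that state's attractor, and grouping initial states by attractor gives the basins of attraction.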

We made use of two update schemes to simulate network dynamics: synchronous (deterministic) and asynchronous (stochastic). Synchronous models are the simplest update method: all nodes are updated at multiples of a common time step based on the previous state of the system. The synchronous model is deterministic in that the sequence of state transitions is definite for identical initial conditions of a model. In asynchronous models, the nodes are updated individually, depending on the timing information, or lack thereof, of individual biological events. In the general asynchronous model used here, a single node is randomly updated at each time step [38]. The general asynchronous model is useful when there is heterogeneity in the timing of network events but when the specific timing is unknown. Due to the heterogeneous mechanisms by which bacteria interact, we made the assumption of time heterogeneity without specifically known time relationships. Synchronous and asynchronous Boolean models have the same fixed points, because fixed points are independent of the implementation of time. However, the basin of attraction of each fixed point (i.e. the initial conditions that lead to each fixed point) may differ between synchronous and asynchronous models (S2 Table). For identification of all of the fixed points in the network (the attractor landscape), the synchronous updating scheme was used. However, for the perturbation analysis, the asynchronous updating scheme was used because it more realistically models the possible trajectories in a stochastic and/or time-heterogeneous system. The simulations of the gut microbiome model were performed using custom Python code built on top of the BooleanNet Python library, which facilitates Boolean simulations [39]. Our custom Python code is available in the supplemental materials.
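The general asynchronous scheme can be sketched as below. This is a plain-Python illustration and not the BooleanNet API; the toy rules are hypothetical and are not the inferred rules of the model. It shows how the same initial state can reach different fixed points depending on the random update order.

```python
import random

NODES = ("clindamycin", "barnesiella", "c_difficile")

# Hypothetical toy rules, NOT the inferred rules of the paper
RULES = {
    "clindamycin": lambda s: s["clindamycin"],
    "barnesiella": lambda s: s["barnesiella"] and not s["clindamycin"],
    "c_difficile": lambda s: s["c_difficile"] and not s["barnesiella"],
}

def async_run(initial, steps=60, rng=None):
    """General asynchronous update: one randomly chosen node is updated per time step."""
    rng = rng or random.Random()
    state = dict(zip(NODES, initial))
    for _ in range(steps):
        node = rng.choice(NODES)
        state[node] = int(bool(RULES[node](state)))
    return tuple(state[n] for n in NODES)

rng = random.Random(1)
outcomes = [async_run((1, 1, 1), rng=rng) for _ in range(200)]
```

In this toy network, C. difficile is cleared only if its node happens to be updated before Barnesiella is lost, so the runs split between two fixed points. Under synchronous update the same initial state always leads to a single outcome.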

Perturbation Analysis

To capture the effect of removal (knockout) or addition (probiotic; forced overabundance) of genera, we modified the states and rules of the corresponding nodes. These modifications were implemented in BooleanNet by setting the corresponding nodes to either OFF (removal) or ON (addition) and then removing the corresponding updating rules for these nodes for the simulations. By examining many such forced perturbations, we can identify potential therapeutic strategies, many of which may not be obvious or intuitive, particularly as network complexity increases. We used asynchronous update when simulating the effect of perturbations on the microbial communities. In each case we performed 1000 simulations and report the percentage of simulations that achieve a certain outcome.
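A perturbation experiment of this kind can be sketched as follows. The code is a plain-Python illustration, not the BooleanNet-based code used in the paper, and the toy rules and function names are hypothetical. A forced node is clamped to its value and excluded from updating, which corresponds to removing its update rule.

```python
import random

NODES = ("clindamycin", "barnesiella", "c_difficile")

# Hypothetical toy rules, NOT the inferred rules of the paper
RULES = {
    "clindamycin": lambda s: s["clindamycin"],
    "barnesiella": lambda s: s["barnesiella"] and not s["clindamycin"],
    "c_difficile": lambda s: s["c_difficile"] and not s["barnesiella"],
}

def simulate(initial, forced=None, steps=60, rng=None):
    """Asynchronous simulation with optional clamped nodes (knockout = 0, probiotic = 1)."""
    rng = rng or random.Random()
    forced = forced or {}
    state = dict(zip(NODES, initial))
    state.update(forced)                            # clamp the perturbed nodes
    free = [n for n in NODES if n not in forced]    # their update rules are dropped
    for _ in range(steps):
        node = rng.choice(free)
        state[node] = int(bool(RULES[node](state)))
    return state

def percent_cleared(initial, forced=None, n_sim=1000, seed=0):
    """Percentage of simulations that end with C. difficile OFF."""
    rng = random.Random(seed)
    cleared = sum(
        simulate(initial, forced, rng=rng)["c_difficile"] == 0 for _ in range(n_sim)
    )
    return 100.0 * cleared / n_sim

# Start from a toy infected state: clindamycin ON, Barnesiella OFF, C. difficile ON
baseline = percent_cleared((1, 0, 1))
probiotic = percent_cleared((1, 0, 1), forced={"barnesiella": 1})
```

Comparing `baseline` with `probiotic` gives the kind of percentage summary reported for the perturbation analysis.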

Generating Genus-Level Genome-Scale Metabolic Reconstructions

To generate draft metabolic network reconstructions for each of the ten genera in the paper, we first obtained genome sequences for representative species by searching the “Genomes” database of the National Center for Biotechnology Information (NCBI). Complete genomes for the first ten (or if less than ten, all) species within the appropriate genus were downloaded. During the process of reconstructing genus-level metabolic reconstructions, some genera were underrepresented (fewer than 10 species genomes) in the NCBI Genome database, including Akkermansia, Barnesiella and Coprobacillus (S3 Table). The search result order is based on record update time, and so it is quasi-random. Genomes were uploaded to the rapid annotations using subsystems technology (RAST) server for annotation [40]. Draft metabolic network reconstructions were generated by providing the RAST annotations to the Model SEED service [41]. Metabolic network reconstructions were downloaded in “.xls” format. Genus-level metabolic reconstructions were produced by taking the union of all species-level reconstructions corresponding to each genus, as has been done previously [42]. The one exception was C. difficile, which was produced by taking the union of three strain-level reconstructions.
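The union step that turns species-level reconstructions into a genus-level reconstruction amounts to a set union over reaction identifiers. The sketch below assumes each reconstruction has already been reduced to a set of reaction IDs; the species names and IDs are placeholders, not data from the paper.

```python
def genus_reconstruction(species_models):
    """species_models: dict mapping species name -> set of reaction IDs.
    Returns the union of all reactions found in any species of the genus."""
    return set().union(*species_models.values())

# Placeholder species and reaction identifiers, for illustration only
toy_genus = genus_reconstruction({
    "species_A": {"rxn_1", "rxn_2"},
    "species_B": {"rxn_2", "rxn_3"},
    "species_C": {"rxn_4"},
})
```

A union is permissive: a reaction present in a single species is attributed to the whole genus, so genera with more sequenced species tend to yield larger reconstructions.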

Subsystem Enrichment Analysis

Subsystems were defined as the Kyoto Encyclopedia of Genes and Genomes (KEGG) map with which each reaction was associated [43,44]. These associations were determined based on annotations in the Model SEED database [41]. To quantify enrichment, the complete set of unique reactions from all genus-level reconstructions was pooled, and the subsystem annotations corresponding to those reactions were counted. To determine enrichment for a given subset of the community (either a single genus-level reconstruction, or a set of reconstructions corresponding to a subnetwork), the subsystem occurrences were counted within the subset. The probability of a reconstruction containing N total subsystem annotations, with M or more occurrences of subsystem I, was determined by taking the sum of a hypergeometric probability distribution function (PDF) from M to the total occurrences of subsystem I in the overall population. Enrichment analysis was performed in Matlab [45].
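The enrichment probability described above is the upper tail of a hypergeometric distribution. The sketch below uses only the Python standard library (the paper's analysis was done in Matlab), and the function and argument names are hypothetical.

```python
from math import comb

def enrichment_p_value(total_annotations, total_in_subsystem,
                       subset_annotations, subset_in_subsystem):
    """P(at least subset_in_subsystem occurrences of a subsystem among
    subset_annotations annotations drawn without replacement from the pooled set)."""
    # Terms above min(N, occurrences in the pool) are zero, so the sum stops there
    upper = min(subset_annotations, total_in_subsystem)
    denom = comb(total_annotations, subset_annotations)
    tail = sum(
        comb(total_in_subsystem, m)
        * comb(total_annotations - total_in_subsystem, subset_annotations - m)
        for m in range(subset_in_subsystem, upper + 1)
    )
    return tail / denom

# Toy numbers: pool of 10 annotations, 4 in the subsystem; a subset of 3 contains 2
p = enrichment_p_value(10, 4, 3, 2)
```

A small value indicates that the subsystem occurs in the subset more often than expected from the pooled annotations.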

Identifying Seed Sets and Defining Metabolic Competition and Mutualism Scores

To quantify metabolic interactions, we started by utilizing the seed set detection algorithm developed by Borenstein et al. [46,47]. The algorithm follows three steps:

1) The genome-scale metabolic network reconstruction is reduced into simple one-to-one edges, such that for each reaction, each substrate and product pair forms an edge (e.g. A + B → C would become A → C and B → C).
2) The network is divided into strongly connected components, those groups of nodes for which two paths of opposite directions (e.g. A → B and B → A) exist between any two nodes in the group.
3) Nodes (and strongly connected components with five or fewer nodes) for which there are exclusively outgoing edges are defined as “inputs” to the model, or seed metabolites.

The rationale is that metabolites that feed into the network, but cannot be produced by any reactions within the network, must be obtained from the environment.
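The three steps can be sketched in plain Python as follows. This is an illustrative reimplementation under stated assumptions, not the Matlab scripts used in the paper: reactions are given as (substrates, products) pairs, strongly connected components are found with Kosaraju's two-pass algorithm, and a component counts as a seed when no edge enters it from outside. The toy reactions are hypothetical.

```python
from collections import defaultdict

def reactions_to_edges(reactions):
    """Step 1: A + B -> C becomes the edges A -> C and B -> C."""
    return {(s, p) for subs, prods in reactions for s in subs for p in prods}

def strongly_connected_components(nodes, edges):
    """Step 2 (Kosaraju): returns a dict mapping each node to a component label."""
    fwd, rev = defaultdict(set), defaultdict(set)
    for u, v in edges:
        fwd[u].add(v)
        rev[v].add(u)
    order, seen = [], set()
    for root in nodes:  # pass 1: record nodes by DFS finishing time
        if root in seen:
            continue
        seen.add(root)
        stack = [(root, iter(fwd[root]))]
        while stack:
            node, it = stack[-1]
            for nxt in it:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append((nxt, iter(fwd[nxt])))
                    break
            else:
                order.append(node)
                stack.pop()
    comp = {}
    for root in reversed(order):  # pass 2: DFS on the reversed graph
        if root in comp:
            continue
        comp[root] = root
        stack = [root]
        while stack:
            node = stack.pop()
            for prev in rev[node]:
                if prev not in comp:
                    comp[prev] = root
                    stack.append(prev)
    return comp

def seed_set(reactions, max_component_size=5):
    """Step 3: small components with no incoming edge from outside are seeds."""
    edges = reactions_to_edges(reactions)
    nodes = sorted({n for e in edges for n in e})
    comp = strongly_connected_components(nodes, edges)
    members = defaultdict(set)
    for n, c in comp.items():
        members[c].add(n)
    has_external_input = {comp[v] for u, v in edges if comp[u] != comp[v]}
    return {n for c, group in members.items()
            if c not in has_external_input and len(group) <= max_component_size
            for n in group}

# Hypothetical toy network: A + B -> C, C <-> D, E <-> F, E -> C
toy_reactions = [({"A", "B"}, {"C"}), ({"C"}, {"D"}), ({"D"}, {"C"}),
                 ({"E"}, {"F"}), ({"F"}, {"E"}), ({"E"}, {"C"})]
toy_seeds = seed_set(toy_reactions)
```

In the toy network, C and D can be produced from other metabolites, while A, B and the small E/F cycle have no external source and are therefore reported as seeds.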

Competition metrics were generated following the process of Levy and Borenstein [46]. For a given pair of genera, the competition score is defined as:

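\[ \mathrm{CompetitionScore}_{i,j} = \frac{|\mathrm{SeedSet}_i \cap \mathrm{SeedSet}_j|}{|\mathrm{SeedSet}_i|} \tag{1} \]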

Here SeedSet_i is the set of obligatory input metabolites to the metabolic network reconstruction for genus i, and |SeedSet_i| is the number of metabolites contained in SeedSet_i. The competition score indicates the fractional overlap of inputs that genus i shares with genus j, and so ranges between zero and one. The higher the score, the more similar the metabolic inputs to the two networks, making competition more likely.

For a given pair of genera, the mutualism score is defined as:

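\[ \mathrm{MutualismScore}_{i,j} = \frac{|\mathrm{SeedSet}_i \cap \neg\mathrm{SeedSet}_j|}{|\mathrm{SeedSet}_i|} \tag{2} \]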

Here ¬SeedSet_j is the set of metabolites that can be produced by the metabolic network for species j (i.e. all non-seed metabolites). The mutualism score indicates the fractional overlap of inputs that genus i consumes which genus j can potentially provide. The mutualism score ranges between zero and one. The higher the score, the more potential there is for nutrient sharing between species. While the score does not measure “mutualism” per se (it cannot necessarily distinguish between other interactions such as commensalism or amensalism [48]), for simplicity, we will refer to these scores as the competition and mutualism scores.
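Both scores reduce to set operations on seed sets, as in the following sketch. The function names and the toy metabolite sets are hypothetical; the non-seed set of genus j is taken to be all metabolites in its network minus its seed set, following the definition above.

```python
def competition_score(seeds_i, seeds_j):
    """Fraction of genus i's seed metabolites that are also seeds of genus j."""
    return len(seeds_i & seeds_j) / len(seeds_i)

def mutualism_score(seeds_i, metabolites_j, seeds_j):
    """Fraction of genus i's seed metabolites that genus j can potentially produce."""
    producible_j = metabolites_j - seeds_j  # non-seed metabolites of genus j
    return len(seeds_i & producible_j) / len(seeds_i)

# Toy metabolite sets, for illustration only
seeds_i = {"a", "b", "c", "d"}
seeds_j = {"a", "b", "x"}
metabolites_j = {"a", "b", "x", "c", "y"}

competition = competition_score(seeds_i, seeds_j)
mutualism = mutualism_score(seeds_i, metabolites_j, seeds_j)
```

Neither score is symmetric: swapping i and j changes the denominator, so each ordered pair of genera has its own competition and mutualism score.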

All metabolic reconstructions, seed sets, competition scores and mutualism scores are available in the supplemental materials. Seed set generation was performed using custom Matlab scripts [45], which are available in the supplement. Statistical tests were performed in R [49].

Co-culture and Spent Media Experiments

Barnesiella intestinihominis DSM 21032 and Clostridium difficile VPI 10463 were grown anaerobically in PRAS chopped meat medium (CMB) (Anaerobe Systems, Morgan Hill, CA) at 37 °C. To prepare B. intestinihominis spent medium, B. intestinihominis was grown in CMB until stationary phase (44 hours). The saturated culture was centrifuged, and the supernatant was filter sterilized (0.22 μm pore size). Growth curves were obtained by inoculating batch cultures in 96-well plates and gathering optical density measurements (870 nm) using a small plate reader that fits in the anaerobic chamber [50]. Single cultures were inoculated from overnight liquid culture to a starting density of 0.01. The co-cultures were started at a 1:1 ratio, for a total starting density of 0.02. Optical density was measured every 2 minutes for 24 hours, and the resulting growth curves were analyzed in Matlab [45]. Maximum growth rates were calculated by fitting a smooth line to each growth curve and taking the maximum of the instantaneous growth rates over the whole time course: [log(OD_{t+1}) - log(OD_t)] / [t_{+1} - t]. The achieved bacterial density in a culture, measured as the area under the growth curve (AUC), was calculated by integrating over the growth curve in each experiment using the “trapz()” function in Matlab. It can be thought of as representing the total biomass produced over time. The simple additive null model was calculated by fitting a Lotka-Volterra model [24] to the single cultures for both B. intestinihominis and C. difficile. The null model of co-culture (assuming zero interaction between species) was simulated by using the parameters from single culture, and summing the predicted OD870 values.
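The two growth-curve summaries can be sketched with NumPy as below. This is an illustration rather than the Matlab analysis used in the paper: the smoothing step is omitted, the function names are hypothetical, and the optical density values are made up. The area under the curve uses the trapezoidal rule, which is what Matlab's trapz computes.

```python
import numpy as np

def max_growth_rate(times, od):
    """Largest instantaneous rate [log(OD_next) - log(OD_now)] / (t_next - t_now)."""
    times = np.asarray(times, dtype=float)
    od = np.asarray(od, dtype=float)
    rates = np.diff(np.log(od)) / np.diff(times)
    return float(rates.max())

def area_under_curve(times, od):
    """Trapezoidal integral of OD over time (total biomass produced over time)."""
    times = np.asarray(times, dtype=float)
    od = np.asarray(od, dtype=float)
    return float(np.sum(np.diff(times) * (od[:-1] + od[1:]) / 2.0))

# Hypothetical growth curve: OD doubles twice, then plateaus
toy_times = [0.0, 1.0, 2.0, 3.0]
toy_od = [0.01, 0.02, 0.04, 0.04]
rate = max_growth_rate(toy_times, toy_od)
auc = area_under_curve(toy_times, toy_od)
```

On raw plate-reader data sampled every 2 minutes, differences between consecutive readings are dominated by noise, which is why a smooth fit is applied before taking the maximum.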

All scripts used to analyze the data are available at https://bitbucket.org/gutmicrobiomepaper/microbiomenetworkmodelpaper/wiki/Home.

Results

Processing of a Microbial Genus Abundance Dataset for Network Inference

To capture the dynamics of inter-genus interactions in the intestinal tract we employed a pipeline (Fig 1) which translates metagenomic genus abundance information into a dynamic Boolean model. This approach involves three steps: 1) discretization (binarization) of genus abundances, 2) learning Boolean relationships among genera, and 3) translation of genus associations into a Boolean (discrete) dynamic model.

Construction of a Dynamic Network Model from Binarized Time Series Microbial Genus Abundance Information

Boolean rules (S1 Table) were inferred from the time series binarized genus abundances using an implementation of the Best-fit extension [36] in the R Boolean network inference package BoolNet [37](see Methods). A network of 12 nodes and 33 edges was inferred (Fig 2D). The inferred interaction network has a clustered structure: the cluster (subnetwork) containing the two Lachnospiraceae nodes and Barnesiella is strongly influenced by clindamycin whereas the other subnetwork is largely independent of the first, except for the single edge between Barnesiella and C. difficile (Fig 2D). In fact, Lachnospiraceae nodes, Barnesiella and the group of “Other” genera form a strongly connected component; that is, every node is reachable from every other node. Most nodes of the second subnetwork are positively influenced by C. difficile, with the exception of Coprobacillus, for which no regulation by other nodes was inferred, and Akkermansia, which is inferred to be regulated only by Coprobacillus. These latter two genera are transiently present (around day 5) in the clindamycin treatment group, but they do not appear in the final states of any of the treatment groups (see S1 Fig). This network structure is consistent with published data in which the dominant Firmicutes (Lachnospiraceae) and Bacteroidetes (Barnesiella) are devastated by antibiotic administration [51,52]. Furthermore, the clustered structure (Fig 2D) supports the established mechanism of C. difficile colitis: loss of normal gut flora, which normally suppresses opportunistic infection (clindamycin cluster), and the presence of C. difficile at a minimum inoculum (C. difficile cluster) [10,53]. The network clusters have a single route of interaction between Barnesiella and C. difficile.

The negative influence of Barnesiella on C. difficile is in agreement with recently published findings in which Barnesiella was strongly correlated with C. difficile clearance [54]. Barnesiella has also been shown in mice to inhibit another pathogen, vancomycin-resistant Enterococci (VRE) [55]; this effect is visible in the network model as an indirect relationship between Barnesiella and Enterococcus (Fig 2D). Related species of Bacteroidetes have been shown to play vital roles in protection from C. difficile infection in mice [56]. Furthermore, the network structure shows that Lachnospiraceae positively interacts with Barnesiella, leading to an indirect suppression of C. difficile. Interestingly, the two Lachnospiraceae nodes and the “Other” node form a strongly connected component, suggesting that they play a similar role in the network, particularly in promoting the growth of Barnesiella, which directly suppresses C. difficile. In support of this finding, Lachnospiraceae has been shown to protect mice against C. difficile colonization [52,57]. Therefore, the structure of the network is both a parsimonious representation of the current data set and supported by literature evidence.

We analyzed the dynamics of the microbiome network model under the synchronous updating scheme (see Methods) to determine all of its possible steady states. A 12-node network has 2¹² = 4,096 possible network states; we simulated the model from every one of these states and identified all of its fixed points. This exploration reveals 23 possible fixed point attractors (S4 Fig). Three of the identified attractors (Fig 3A) are in exact agreement with the experimentally identified terminal time points of binarized genus abundances (Fig 2C). These attractors make up a small subset of the entire microbiome network state space (S2 Table).
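The exhaustive search for fixed points can be sketched on a much smaller network. In the minimal example below, the four nodes and their update rules are hypothetical stand-ins chosen for illustration (they are not the rules in S1 Table); a state is a fixed point when one synchronous update leaves it unchanged.

```python
from itertools import product

NODES = ("clindamycin", "Lachnospiraceae", "Barnesiella", "C_difficile")


def step(state):
    """Apply one synchronous update: every node reads the same current state.

    The rules are hypothetical and serve only to illustrate the procedure.
    """
    clinda, lachno, barn, cdiff = state
    return (
        clinda,                          # source node keeps its value
        int(lachno and not clinda),      # persists unless the antibiotic is present
        int(lachno and not clinda),      # supported by Lachnospiraceae
        int(cdiff and not barn),         # persists unless Barnesiella is present
    )


# Visit all 2**4 = 16 states and keep those mapped onto themselves.
fixed_points = [s for s in product((0, 1), repeat=len(NODES)) if step(s) == s]
```

For this toy rule set the search returns five fixed points, including a healthy-like state `(0, 1, 1, 0)` and a clindamycin plus C. difficile state `(1, 0, 0, 1)`. The same enumeration applies to a 12-node model with 4,096 states.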

Fig 3. Steady states and node perturbations in the gut microbiome model.

A) Heatmap of the three steady states in the gut microbiome model. These steady states are identical to the steady states identified in the three experimental groups. B) The effect of node perturbations, represented by four heatmaps. The y-axis of each heatmap lists the nodes (genera) in each steady state. The x-axis of each heatmap lists the steady state found under normal model conditions (i.e., no node perturbations) and the steady states found under perturbation of a single network node. The two heatmaps in the left column demonstrate the effect of addition (forced overabundance) of individual genera, and the two heatmaps in the right column demonstrate the effect of removal (knockout) of individual genera. The top row heatmaps show the effect of node perturbations on the clindamycin treated group and the bottom row heatmaps show the effect of node perturbations on the clindamycin + C. difficile treatment group. *A genus abundance of 0 means the genus is present in 0% of asynchronous simulations and is indicated in blue; a genus abundance of 1 means the genus is present in all (100%) asynchronous simulations and is shown in yellow. n = 1000 simulations were run for all Boolean model simulations.

https://doi.org/10.1371/journal.pcbi.1004338.g003
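The perturbation analysis summarized in Fig 3B can be sketched as follows: a node is clamped ON (forced overabundance) or OFF (knockout), the remaining nodes are updated asynchronously, and the fraction of simulations in which a genus ends up ON is recorded. The example below is a minimal sketch under stated assumptions: the four-node rule set is hypothetical (not S1 Table), the random-order update in which every free node is updated once per round is an assumed scheme, and the names `simulate` and `on_fraction` are illustrative.

```python
import random

NODES = ("clindamycin", "Lachnospiraceae", "Barnesiella", "C_difficile")

# Hypothetical update rules for illustration only (not the rules in S1 Table).
RULES = {
    "clindamycin": lambda s: s["clindamycin"],
    "Lachnospiraceae": lambda s: s["Lachnospiraceae"] and not s["clindamycin"],
    "Barnesiella": lambda s: s["Lachnospiraceae"] and not s["clindamycin"],
    "C_difficile": lambda s: s["C_difficile"] and not s["Barnesiella"],
}


def simulate(start, forced=None, rounds=10, rng=None):
    """Run random-order asynchronous updates, keeping `forced` nodes clamped."""
    rng = rng or random.Random()
    forced = forced or {}
    state = {**start, **forced}
    free = [node for node in NODES if node not in forced]
    for _ in range(rounds):
        rng.shuffle(free)  # assumed scheme: each free node updated once per round
        for node in free:
            state[node] = int(RULES[node](state))
    return state


def on_fraction(start, node, forced=None, n=1000, seed=0):
    """Fraction of n simulations in which `node` is ON at the end."""
    rng = random.Random(seed)
    hits = sum(simulate(start, forced, rng=rng)[node] for _ in range(n))
    return hits / n


clinda_cdiff = {"clindamycin": 1, "Lachnospiraceae": 0, "Barnesiella": 0, "C_difficile": 1}
baseline = on_fraction(clinda_cdiff, "C_difficile")
with_barnesiella = on_fraction(clinda_cdiff, "C_difficile", forced={"Barnesiella": 1})
```

In this toy setting C. difficile stays ON in every unperturbed simulation and is OFF in every simulation with Barnesiella clamped ON, which corresponds to the 0-to-1 abundance scale used in the heatmaps.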

The attractor landscape can be divided into six groups based on the abundance patterns the attractors share (S4 Fig). Group 1 is made up of a single attractor in which all genera are absent (OFF). Group 2 consists of the experimentally defined healthy state (Attractor 2) and attractors in which genera of the C. difficile subnetwork are abundant (ON) independently of the clindamycin subnetwork. Group 3 contains the clindamycin treated steady state (Attractor 7) and attractors in which genera of the C. difficile subnetwork survive in the presence of clindamycin. Group 4 contains the clindamycin plus C. difficile steady state (Attractor 12) and its subsets in which one or both of the source nodes Mollicutes and Enterobacteriaceae are absent. Group 5 contains attractors in which clindamycin is absent and C. difficile is present. Even if clindamycin is absent, our model suggests that C. difficile can thrive if Lachnospiraceae and Barnesiella are absent, i.e., these states represent a clindamycin-independent loss of Lachnospiraceae and Barnesiella. Lastly, in the Group 6 attractors both clindamycin and C. difficile are OFF. Blautia and Enterococcus are always abundant in these attractors; indeed, because of the mutual activation between Blautia and Enterococcus, they always appear together. Attractors in this group may also include the abundance (ON state) of the source nodes Mollicutes and Enterobacteriaceae.