Dynamics of the Microbiota in Response to Host Infection

Longitudinal studies of the microbiota are important for discovering changes in microbial communities that affect the host. The complexity of these ecosystems requires rigorous integrated experimental and computational methods to identify temporal signatures that promote physiologic or pathophysiologic responses in vivo. Employing a murine model of infectious colitis with the pathogen Citrobacter rodentium, we generated a 2-month time-series of 16S rDNA gene profiles, and quantitatively cultured commensals, from multiple intestinal sites in infected and uninfected mice. We developed a computational framework to discover time-varying signatures for individual taxa, and to automatically group signatures to identify microbial sub-communities within the larger gut ecosystem that demonstrate common behaviors. Application of this model to the 16S rDNA dataset revealed dynamic alterations in the microbiota at multiple levels of resolution, from effects on systems-level metrics to changes across anatomic sites for individual taxa and species. These analyses revealed unique, time-dependent microbial signatures associated with host responses at different stages of colitis. Signatures included a Mucispirillum OTU associated with early disruption of the colonic surface mucus layer, prior to the onset of symptomatic colitis, and members of the Clostridiales and Lactobacillales that increased with successful resolution of inflammation, after clearance of the pathogen. Quantitative culture data validated findings for predominant species, further refining and strengthening model predictions. These findings provide new insights into the complex behaviors found within host ecosystems, and define several time-dependent microbial signatures that may be leveraged in studies of other infectious or inflammatory conditions.

As a consequence of this combination of techniques, in the present analysis MC-TIMME can effectively detect changes between infected and uninfected populations even when the number of replicates at each time-point is small, which simple statistical methods cannot do.

Comments on Bayesian methods and mixture models
Unlike classical frequentist approaches, which regard model parameters as fixed but unknown quantities, Bayesian approaches regard model parameters as random variables governed by probability distributions. This leads to different inference tasks for the two frameworks.
For a frequentist approach, the inference task is to calculate point estimates θ̂ of the unknown model parameters θ based on the observed data y. One then calculates a p-value, which is the probability of obtaining the observed estimate θ̂, or one more extreme, under a null model (i.e., a model with pre-specified parameter settings θ₀).
In contrast, the Bayesian inference task is to estimate the posterior probability distribution p(θ | y), which is the distribution of model variables θ given all observed data y. Using Bayes' rule, this can be written in terms of the data likelihood p(y | θ) and the prior probability distribution of model parameters p(θ):

p(θ | y) = p(y | θ) p(θ) / p(y).

Because in the Bayesian setting θ is assumed to have a probability distribution, one can directly calculate posterior probabilities of interest regarding model parameters. For instance, in a clinical trial of an antihypertensive drug, one might be interested in the posterior probability that systolic blood pressure decreases by some prespecified percentage in subjects. In the present application of MC-TIMME, we are interested in the posterior probability that time-series of counts from different taxa (or the same taxon under different conditions) are assigned to the same prototype signature.

A classic problem in statistics and signal processing is the detection of signals in noisy data. A common approach is to model the noise only, and to assume under the null hypothesis that a given piece of data (e.g., an observed time-series) comes from this noise distribution. One then computes a p-value and rejects the null hypothesis if the p-value falls below some threshold. An inherent limitation of p-values is that they quantify only the strength of evidence against the null model; they do not provide a direct measure of confidence that the data arise from an alternate model [5].
Mixture Models provide a natural framework for incorporating multiple hypotheses, and thus for computing statistics that provide information about multiple hypotheses simultaneously. Bayesian Mixture Models are particularly useful in this context, as they allow one to incorporate prior knowledge and to compute distributions over mixture parameters. Consider a simple example of a two-component Bayesian Mixture Model, in which the two components and their weights π₁ and π₂ are prespecified. To make the example concrete, one can imagine that the components represent two distinct populations (e.g., microbes exhibiting two different types of growth signatures) or signal and noise components. Assume that y_i is a time-series of counts, and let z_i denote a latent variable indicating the component to which y_i belongs. Then, the posterior probability that y_i belongs to component 2 is given by:

p(z_i = 2 | y_i) = π₂ f(y_i; θ₂) / (π₁ f(y_i; θ₁) + π₂ f(y_i; θ₂)).

Thus, we see from this equation that the posterior probability incorporates information about both model components.
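As a toy illustration of this calculation, the sketch below evaluates the component-2 posterior for a single count, using Poisson densities as stand-in component models (the actual model uses the Negative Binomial); all names and parameter values here are hypothetical.

```python
import math

def poisson_pmf(y, lam):
    """Poisson pmf, used here as a stand-in component density f(y; theta)."""
    return math.exp(-lam) * lam ** y / math.factorial(y)

def posterior_component2(y, pi1, pi2, lam1, lam2):
    """Posterior probability that observation y belongs to component 2:
    pi2*f(y;theta2) / (pi1*f(y;theta1) + pi2*f(y;theta2))."""
    num = pi2 * poisson_pmf(y, lam2)
    den = pi1 * poisson_pmf(y, lam1) + num
    return num / den

# A count of 20 is far more consistent with the "signal" component
# (mean 15) than with the "noise" component (mean 2).
p = posterior_component2(20, pi1=0.5, pi2=0.5, lam1=2.0, lam2=15.0)
```

Note that the posterior weighs both components at once, rather than measuring evidence against a single null model.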
Bayesian Mixture Models can also inherently "adjust for multiple hypotheses" (see [6]). To see this, consider again a simple two-component Mixture Model, but now one in which the mixing parameters π are regarded as random variables with an appropriate prior distribution. Assume that we have observed a large number of microbial time-series, and that most of these time-series belong to component 1 (e.g., the "noise" component). This means that the mixture weight π₁ for component 1 will have a posterior probability distribution centered close to one, and π₂ will have a posterior probability distribution centered close to zero. Thus, if we observe a new microbial time-series y_i that truly belongs to component 2 (e.g., is a true "signal"), a large π₁ "bias" tending to assign y_i to component 1 (e.g., tending to identify the time-series as noise) will need to be overcome. This bias is overcome only if y_i fits the component 2 probability density f(y_i; θ₂) very well (e.g., the time-series really looks like a "signal"). Note that the posterior probability distributions for π₁ and π₂ depend on all the data, and so inherently account for multiple hypotheses. In this sense, Bayesian Mixture Models automatically control for false assignments or detections. This argument extends to Bayesian Mixture Models with multiple components, including Infinite Mixture Models such as MC-TIMME. We tested this feature of our model empirically using repeated random permutations of our dataset, and estimated a false detection rate across the entire dataset of less than 5%, indicating that our model behaved as we theoretically predicted. See Section 7 for details. Figure 1 depicts the extended MC-TIMME model structure in graphical model form with plate notation. The model is hierarchical, with variables at higher levels parameterizing prior distributions for variables at lower levels.
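The shrinkage effect described above can be sketched numerically. Assuming hypothetical Poisson components and a uniform Beta(1, 1) prior on the weights, the posterior-mean weights learned from many noise-dominated assignments bias a new observation toward the noise component unless its likelihood strongly favors the signal component:

```python
import math

def poisson_pmf(y, lam):
    return math.exp(-lam) * lam ** y / math.factorial(y)

# Suppose 98 of 100 previous time-series were assigned to the noise
# component and 2 to the signal component; with a uniform Beta(1, 1)
# prior the posterior-mean mixture weights are:
n_noise, n_signal = 98, 2
pi_noise = (1 + n_noise) / (2 + n_noise + n_signal)
pi_signal = (1 + n_signal) / (2 + n_noise + n_signal)

def p_signal(y, lam_noise=2.0, lam_signal=8.0):
    """Posterior probability that y belongs to the signal component,
    weighted by the learned mixture weights."""
    num = pi_signal * poisson_pmf(y, lam_signal)
    return num / (pi_noise * poisson_pmf(y, lam_noise) + num)

# A mildly elevated count is still attributed to noise (the pi_noise
# "bias"), while a strongly elevated count overcomes it.
p_mild, p_strong = p_signal(4), p_signal(15)
```

This is the sense in which learned mixture weights act as an automatic multiplicity adjustment.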
We assume that we have observed counts (e.g., sequencing reads, colony forming units, etc.) for G taxa in S sites (e.g., body tissues) in I conditions (e.g., infected and control subjects) at T time-points. We further assume we have R(g, s, i, t) replicates for a given experiment, which may vary as a function of taxon, site, condition and time-point.

Model definition
We assume that there are a potentially infinite number of prototype signatures with mixture weights π and mixture parameters Θ. Each taxon g in site s and condition i is assigned to a prototype signature via a variable z gsi | π ∼ Multinomial(π).
To extend MC-TIMME for modeling the microbiota in a host with an ongoing infection, we use the following continuous-time model of dynamics. For a given prototype signature k, the parameter vector θ_k consists of signature shape variables δ_kt and a variable ε_k controlling data variance:

θ_k = (δ_k1, . . . , δ_kT, ε_k).

Here, δ_k1 corresponds to the starting value for signature k. To capture temporal dependencies, we assume that the value of the signature at time t, η_k(t), is generated via a random walk over the δ_kt variables:

η_k(t) = Σ_{t′=1}^{t} δ_kt′.

The data generation model is then as follows. Let y_gsitr represent the observed counts for taxon g in site s in condition i at time-point t and for replicate r. As in the original MC-TIMME model [1], we assume that the conditional distribution for the observation y_gsitr is the Negative Binomial Distribution (NBD), parameterized by the exponentiated random walk of δ_k variables and the variable ε_k controlling the variance of the data for prototype signature k:

y_gsitr | z_gsi = k, δ_k, ε_k ∼ Neg-Bin(exp(η_k(t)), ε_k).    (2)

Thus, the data are assumed to be generated via a Generalized Linear Model (GLM) using the Negative Binomial Distribution, with parameterization dependent on the prototype signature to which the data are assigned.

Figure 1: The MC-TIMME model depicted in graphical model form. Circles represent random variables and edges represent dependencies. Rectangles (plates) represent repeated variables. For example, the variable y_gsitr represents the observed counts for taxon g in site s in condition i at time-point t and for replicate r. This variable lies within two overlapping plates, one of which represents repetition over the G taxa, S sites and I experimental conditions, and the other of which represents repetition over the T time-points and R replicates. Arrows going into y_gsitr indicate that this variable depends directly on several others. The variable z_gsi represents a probabilistic assignment of taxon g in site s in condition i to a particular prototype signature. The variables δ_k1 and δ_kt specify the shape of prototype signature k, and ε_k controls the variability of data generated by the signature. The infinity sign in the corner of the plate surrounding these variables indicates that there are a potentially unlimited number of prototype signatures. Thus, the value of an observed data point y_gsitr depends on its probabilistic assignment to a prototype signature and the corresponding shape and variability parameters for that signature. Variables higher in the hierarchy specify prior distributions for child variables, or hyperparameters for those prior distributions.

To complete the model, priors are placed on variables as follows. We assume that the δ_kt variables specifying signature values have Normal priors. For the starting value of the signature, δ_k1, we assume a Normal prior with mean β and variance σ²:

δ_k1 | β, σ² ∼ Normal(β, σ²).

For subsequent signature values δ_kt with t > 1, we assume a Normal prior with mean zero and time-varying variance ρ²Δt:

δ_kt | ρ² ∼ Normal(0, ρ²Δt).

Here, Δt is the interval between time-point t and the previous time-point, so the variance of the random walk increases with larger time intervals. For β, the mean of the δ_k1 signature starting value, we also assume a Normal prior, β ∼ Normal(m_β, v_β).
For σ², the variance of δ_k1, we assume a conjugate scaled inverse-chi-square prior, σ² ∼ Inv-χ²(v_σ, ξ_σ). For ρ², the random walk variance, we also assume a conjugate scaled inverse-chi-square prior, ρ² ∼ Inv-χ²(v_ρ, ξ_ρ). For ε_k, the parameter controlling the variance of the NBD for component k, we assume a Gamma prior, ε_k | ω_ε ∼ Gamma(ω_ε1, ω_ε2). The mixture weights π are assumed to have a stick-breaking prior, π ∼ Stick(α), where α is a concentration parameter. The stick-breaking prior generates a countably infinite partition of the unit interval [0, 1], with the "evenness" of the partition controlled by the concentration parameter α. The stick-breaking prior on π is defined constructively as:

v_k | α ∼ Beta(1, α),   π_k = v_k Π_{j=1}^{k−1} (1 − v_j).

See [7] for further details regarding the stick-breaking prior formulation for the Dirichlet Process. Finally, we assume that α has a Gamma prior, α | ω_α ∼ Gamma(ω_α1, ω_α2).
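The generative process just described (stick-breaking weights, a Gaussian random walk for each prototype signature, and Negative Binomial counts) can be sketched as follows. This is an illustrative simulation, not the authors' code; the parameter values are arbitrary.

```python
import math, random

random.seed(0)

def stick_breaking(alpha, k_max=50):
    """Truncated draw of mixture weights pi ~ Stick(alpha):
    v_k ~ Beta(1, alpha), pi_k = v_k * prod_{j<k}(1 - v_j)."""
    weights, remaining = [], 1.0
    for _ in range(k_max):
        v = random.betavariate(1.0, alpha)
        weights.append(v * remaining)
        remaining *= 1.0 - v
    return weights

def signature(T, beta=2.0, sigma=0.5, rho=0.3, dt=1.0):
    """Prototype signature eta_k(t): cumulative sum of delta increments,
    with delta_k1 ~ Normal(beta, sigma^2), delta_kt ~ Normal(0, rho^2*dt)."""
    delta = [random.gauss(beta, sigma)]
    delta += [random.gauss(0.0, rho * math.sqrt(dt)) for _ in range(T - 1)]
    eta, s = [], 0.0
    for d in delta:
        s += d
        eta.append(s)
    return eta

def neg_bin_sample(mu, eps):
    """Negative Binomial with mean mu, drawn via its Gamma-Poisson mixture
    representation; the Poisson draw uses Knuth's multiplication method."""
    lam = random.gammavariate(eps, mu / eps)
    L, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= random.random()
        if p <= L:
            return k
        k += 1

pi = stick_breaking(alpha=1.0)
eta = signature(T=10)
counts = [neg_bin_sample(math.exp(e), eps=5.0) for e in eta]
```

The exponentiation of η_k(t) corresponds to the log link of the GLM formulation used in the inference sections below.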

Further modifications for handling quantitative culture data
Culture data yield information not only on observed counts of Colony Forming Units (CFUs), but also on the mass of tissue assayed and the dilution factor of the sample. We incorporate this information as a per-sample offset term in the model. To be precise, let y_gsitr be the CFUs for organism g in site s in condition i at time t, let weight_gsitr be the weight of the tissue sample in grams, and let dil_gsitr be the dilution factor. The offset term o_gsitr is then calculated as:

o_gsitr = log(weight_gsitr / dil_gsitr).

The conditional probability in Equation 2 for y_gsitr, the observed CFU counts, is then modified to include the offset in the NBD mean:

y_gsitr | z_gsi = k, δ_k, ε_k ∼ Neg-Bin(exp(η_k(t) + o_gsitr), ε_k).
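A minimal sketch of the offset computation, assuming the convention that observed CFU counts scale as exp(η_k(t)) · weight/dil, so that the signature lives on a per-gram, undiluted scale; the exact sign convention for the dilution factor is an assumption here.

```python
import math

def offset(weight_g, dil):
    """Per-sample offset, ASSUMING observed CFUs = exp(eta) * weight / dil
    (i.e. o = log(weight / dil)); this convention is an assumption."""
    return math.log(weight_g / dil)

# Hypothetical sample: 0.1 g of tissue plated at a 1:1000 dilution
o = offset(weight_g=0.1, dil=1000.0)
mean_counts = math.exp(2.5 + o)  # NBD mean exp(eta_k(t) + o) for eta = 2.5
```

In GLM terms this is a standard log-link offset: it shifts the linear predictor without adding a free parameter.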

Hyperparameter settings
We set the means of prior distributions empirically from the data, and we set the variances of prior distributions to relatively large values to create diffuse or "uninformative" priors. We performed sensitivity analyses on hyperparameters and found that our results were relatively insensitive to the particular settings used (see below). For the concentration parameter α | ω_α ∼ Gamma(ω_α1, ω_α2), we set the hyperparameters to weakly favor assigning all data to a single signature (thus biasing toward no change in infected versus uninfected samples). It can be shown that the expected number of non-empty components K for a dataset with N data points is given by [8]:

E[K] ≈ α log(1 + N/α).

In this context, N is the total number of observed time-series of counts, i.e., N = G · I, where G is the number of taxa and I = 2 (accounting for both infected subjects and uninfected controls). We thus set the expectation equal to one, solve for α, and set ω_α1 = α. We set ω_α2 = 0.5; we tested a 5-fold range of values for ω_α2 around the value used in the manuscript, and found that our results were insensitive to the parameter setting. For ε_k | ω_ε ∼ Gamma(ω_ε1, ω_ε2), the variable controlling the NBD variance for prototype signature k, we set the hyperparameters as follows. We first computed the empirical NBD mean μ_gsit and variance v_gsit using a maximum likelihood estimator for each set of replicates for a given taxon g in site s in condition i and time-point t. The hyperparameter ω_ε1 was then set from these empirical estimates. The hyperparameter ω_ε2 was set to 0.5; we tested a 5-fold range of values for ω_ε2 around the value used in the manuscript, and found that our results were insensitive to the parameter setting.
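The solve-for-α step can be sketched numerically. This sketch assumes the standard Dirichlet Process approximation E[K] ≈ α log(1 + N/α), which is increasing in α, and a hypothetical dataset size:

```python
import math

def expected_k(alpha, n):
    """Approximate expected number of non-empty DP mixture components:
    E[K] ~= alpha * log(1 + n/alpha); increasing in alpha."""
    return alpha * math.log(1.0 + n / alpha)

def solve_alpha(n, target_k=1.0, lo=1e-8, hi=100.0):
    """Bisection solve of expected_k(alpha, n) = target_k."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if expected_k(mid, n) < target_k:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# e.g. a hypothetical G = 100 taxa in I = 2 conditions, so N = 200
alpha = solve_alpha(n=200)
```

Setting the target expectation to one yields a small α, which is exactly the bias toward a single shared signature described above.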
For β | m_β, v_β ∼ Normal(m_β, v_β), the variable specifying the mean of the prior on the initial signature values δ_k1, we set the hyperparameters as follows. We first computed the empirical NBD mean μ₁ and variance v₁ over all samples at the first time point. We then set m_β to the link-transformed mean, m_β = log(μ₁). We set v_β to a large variance of 10³; we tested a 5-fold range of values for v_β around the value used in the manuscript, and found that our results were insensitive to the parameter setting. For σ², the variance of the initial signature values δ_k1, we set the hyperparameters as follows. We set v_σ = v_β. The parameter ξ_σ can be interpreted as a prior sample size that weights the contribution of v_σ to σ². We assumed a small prior sample size of 10 · T (e.g., 10 OTUs measured across the time-series) to limit the contribution from the prior. For ρ², the random walk variance, we set the hyperparameters as follows. For each taxon g in site s, condition i, and time-point t, we computed the empirical count ratio δ̂_gsit, and set v_ρ from the empirical distribution of these ratios. The parameter ξ_ρ can be interpreted as a prior sample size; we assumed a small prior sample size of 10 to limit the contribution from the prior.

Model inference
As discussed in Section 2, the model inference task for MC-TIMME is to estimate the posterior distribution of all model variables given the observed data. Bayesian inference is notoriously computationally intensive, and presents particular challenges for models such as MC-TIMME that capture temporal dependencies and use non-conjugate priors. Below, we describe an efficient Markov Chain Monte Carlo (MCMC) algorithm for approximating the posterior distribution of the MC-TIMME model, which is similar to the inference algorithm for the original MC-TIMME model [1], but with additional steps to handle the extended model of signature dynamics. Our solution uses a combination of Gibbs sampling, Metropolis-Hastings (MH), and auxiliary variable augmentation steps.
The MCMC sampling loop is as follows: 1. Sample z_gsi, the assignment of taxon g in site s in condition i to a prototype signature.
2. Sample the prototype signature shape variables δ_kt.
3. Sample the prototype signature variable ε_k controlling NBD variance.
4. Sample β and σ², the mean and variance for the prior on initial signature values.
5. Sample ρ², the random walk variance.
6. Sample the concentration parameter α.

Step 1: sampling assignments of taxa to prototype signatures
For the first step in the MCMC loop, we use the Gibbs sampling method outlined in [9]. We sample the assignment z_gsi for taxon g in site s in experimental condition i, conditional on all the data and the other model variables, including the assignments of the other taxa. In this step there are two cases: taxon g in site s in condition i is assigned to an existing component k, or it is assigned to a new component. The update equations are then:

p(z_gsi = k | z_−gsi, y) ∝ n_k^−gsi f(y_gsi; θ_k)   for an existing component k,
p(z_gsi = k_new | z_−gsi, y) ∝ α f(y_gsi; θ*)   for a new component.

Here, n_k^−gsi is the number of taxa assigned to prototype signature k excluding taxon g in site s in condition i, and z_−gsi denotes the assignments of all taxa excluding taxon g in site s in condition i. The variable θ* represents the parameters for a new component (prototype signature). The data likelihood for y_gsi is given by:

f(y_gsi; θ_k) = Π_t Π_r Neg-Bin(y_gsitr; μ_k(t), ε_k).

The Gibbs sampling procedure thus either assigns the taxon to an existing prototype signature, or generates a new prototype signature. Because the prior distribution for θ is non-conjugate, it is not possible to analytically integrate out θ*. Thus, we follow the method outlined in [9], and sample θ* from its prior distribution. Because the sample from the prior distribution is unbiased, this method samples correctly from the underlying posterior distribution.
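The assignment step can be sketched as follows. For brevity this sketch uses a constant Poisson rate per component in place of the full Negative Binomial signature likelihood; the structure (weight existing components by n_k, a fresh component by α, with θ* drawn from the prior) follows the method described above. All names and values are hypothetical.

```python
import math, random

random.seed(1)

def log_poisson(y, lam):
    """Log Poisson pmf, a stand-in for the signature likelihood."""
    return -lam + y * math.log(lam) - math.lgamma(y + 1)

def sample_assignment(y_series, components, counts, alpha, prior_draw):
    """One Gibbs update for z: an existing component k has weight
    n_k * f(y; theta_k); a fresh component has weight alpha * f(y; theta*),
    with theta* drawn from the prior (for non-conjugate priors)."""
    log_w = []
    for theta, n_k in zip(components, counts):
        log_w.append(math.log(n_k) + sum(log_poisson(y, theta) for y in y_series))
    theta_star = prior_draw()
    log_w.append(math.log(alpha) + sum(log_poisson(y, theta_star) for y in y_series))
    # Gumbel-max trick: sample an index from unnormalized log weights
    g = [lw - math.log(-math.log(random.random())) for lw in log_w]
    z = max(range(len(g)), key=lambda i: g[i])
    return z, theta_star

# Two existing "prototype signatures" (constant rates for illustration),
# with 8 and 3 taxa currently assigned to them.
components, counts = [2.0, 15.0], [8, 3]
z, theta_star = sample_assignment(
    [14, 16, 15], components, counts, alpha=0.5,
    prior_draw=lambda: random.gammavariate(1.0, 5.0))
```

If z equals the last index, a new prototype signature with parameters θ* is instantiated.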

Step 2: sampling prototype signature shape variables
For the second step in the MCMC loop, sampling the prototype signature shape variables δ_kt, we use Metropolis-Hastings steps. As noted, for a particular prototype signature k, data are assumed to be generated via a GLM using the Negative Binomial Distribution. A general strategy for Bayesian estimation of GLMs is to use Gibbs sampling combined with Adaptive Rejection Sampling, since the conditional distributions of individual regression coefficients are log-concave [10]. However, this method is not efficient for the MC-TIMME model, which incorporates dependence via the random walk assumption; in practice, we obtained very poor mixing with it. We instead use a strategy that employs Metropolis-Hastings steps [11]. The key to this method is to generate good linear approximations to the actual GLM distribution to use as proposal distributions. These approximations are directly connected to the Weighted Least Squares (WLS) computations used to obtain maximum likelihood estimates of GLM parameters.
To make the GLM connection explicit, we can write the NBD mean for prototype signature k at time t in terms of canonical GLM parameters as:

μ_k(t) = h^{−1}(X_t δ_k),

with the NBD variance determined by μ_k(t) and ε_k. Here, h(·) represents the link function, X_t is the vector of covariates, and δ_k plays the role of the regression coefficients. We use the natural logarithm as the link function h(·). For the random walk used in the MC-TIMME model, X_tj = 1 for j ≤ t and X_tj = 0 otherwise, so that h^{−1}(X_t δ_k) = exp(η_k(t)). The procedure for creating a WLS estimate is as follows. We first create transformed ("working") data values ỹ_gsitr for all data points assigned to prototype signature k, and then use these transformed values to generate a diagonal weight matrix W(δ_k). The moments m^(j) and C^(j) of the Normal proposal distribution used in the MH steps are then given by the corresponding WLS estimates; here, j is the MCMC iteration number. The vector a represents the prior mean for the start of the random walk, i.e., a_1 = β and zero otherwise. The matrix Λ is the covariance for the start of the random walk, i.e., Λ_11 = σ² and zero otherwise. The acceptance probability q for the δ_k MH steps is computed from the standard MH ratio of target and proposal densities, with the proposal covariance scaled by a factor scale_δ = tune_δ · 5.76/T. The parameter tune_δ is used to tune the acceptance rate to ≈ 0.23, the conventionally used target acceptance rate for multidimensional MH (see [12] for details).
The function l_sk(y; δ_k, ε_k) is the data likelihood restricted to data in site s assigned to prototype signature k:

l_sk(y; δ_k, ε_k) = Π_{g,i : z_gsi = k} Π_{t,r} Neg-Bin(y_gsitr; μ_k(t), ε_k).

The conditional probability p(δ_k | β, σ, ρ) is the product of the Normal priors on δ_k. The MH procedure for sampling δ_k is then: 1. start with δ_k sampled from the prior, i.e., δ_k1 ∼ Normal(β, σ²) and δ_kt ∼ Normal(0, ρ²Δt); 2. at iteration j, calculate the proposal distribution moments, draw a proposal from the resulting Normal distribution, and accept it with probability q. The MH procedure for the culture data model requires a slight modification of Equation 7: we must incorporate the offset term in the data transformation.

Step 3: sampling prototype signature variables controlling NBD variance

We also use MH steps for sampling ε_k. In this case, because we are sampling a single variable, we use a simple Normal proposal distribution centered at the current value. The parameter tune_ε is used to tune the acceptance rate to ≈ 0.44, the conventionally used target acceptance rate for single-dimensional MH (see [12] for details). The acceptance probability for the ε_k MH steps is again given by the standard MH ratio.

Step 4: sampling the mean and variance for the prior on initial signature values
Sampling β and σ² is straightforward because these parameters have conjugate priors. The prior for β is Normal(m_β, v_β). Thus, the posterior distribution for β is Normal with variance v′ = (1/v_β + K/σ²)^{−1} and mean m′ = v′ (m_β/v_β + Σ_k δ_k1 / σ²), where the sum runs over the non-empty components. Here K is the number of non-empty components.
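The conjugate update for β can be sketched directly; this is the standard Normal-Normal update under the stated priors, with hypothetical numbers:

```python
import math, random

random.seed(3)

def sample_beta(delta_starts, m_beta, v_beta, sigma2):
    """Conjugate posterior draw for beta given the K non-empty components'
    starting values delta_k1 (each a priori Normal(beta, sigma2)),
    with prior beta ~ Normal(m_beta, v_beta)."""
    K = len(delta_starts)
    v_post = 1.0 / (1.0 / v_beta + K / sigma2)
    m_post = v_post * (m_beta / v_beta + sum(delta_starts) / sigma2)
    return random.gauss(m_post, math.sqrt(v_post)), m_post, v_post

# With a diffuse prior, the posterior mean tracks the average delta_k1.
beta_draw, m_post, v_post = sample_beta(
    delta_starts=[2.1, 1.9, 2.0, 2.2, 1.8],
    m_beta=0.0, v_beta=1e3, sigma2=0.25)
```

The same precision-weighted pattern underlies the Inv-χ² updates for σ² and ρ² in the following steps.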
The prior for σ² is Inv-χ²(v_σ, ξ_σ). Thus, the posterior distribution for σ² is also a scaled inverse-chi-square distribution, with its parameters updated by the K sampled values δ_k1.

Step 5: sampling the random walk variance

Sampling ρ², the random walk variance, is straightforward because ρ² has a conjugate prior of Inv-χ²(v_ρ, ξ_ρ). Thus, the posterior distribution for ρ² is also a scaled inverse-chi-square distribution.

Step 6: sampling the concentration parameter

For sampling the concentration parameter α, we use an efficient auxiliary variable sampling method developed by Escobar and West [13]. It can be shown that the conditional distribution for α is given by (see [14]):

p(α | K, N) ∝ p(α) α^{K−1} (α + N) B(α + 1, N).

Here K is the number of non-empty prototype signatures, N is the number of input data items, and B(·, ·) is the standard Beta function defined as:

B(a, b) = Γ(a) Γ(b) / Γ(a + b).

We can then write p(α | K, N) as a marginalization over an auxiliary variable φ:

p(α | K, N) ∝ p(α) α^{K−1} (α + N) ∫₀¹ φ^α (1 − φ)^{N−1} dφ.

From the joint distribution, we can see that:

φ | α, N ∼ Beta(α + 1, N),    (12)

α | φ, K ∼ π_φ Gamma(ω_α1 + K, ω_α2 − log φ) + (1 − π_φ) Gamma(ω_α1 + K − 1, ω_α2 − log φ), where π_φ / (1 − π_φ) = (ω_α1 + K − 1) / (N (ω_α2 − log φ)).    (13)

Thus, by alternating sampling of α and φ according to Equations 12 and 13 above, we sample from the posterior for α within the MCMC iterations.
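The Step 6 update can be sketched as follows, assuming a Gamma(a, b) prior on α in shape-rate form (the mapping of a, b onto ω_α1, ω_α2 is an assumption of this sketch):

```python
import math, random

random.seed(2)

def sample_alpha(alpha, K, N, a=1.0, b=1.0):
    """One Escobar-West auxiliary-variable update: draw
    phi | alpha ~ Beta(alpha + 1, N), then alpha | phi, K from a
    two-component mixture of Gammas sharing the rate b - log(phi)."""
    phi = random.betavariate(alpha + 1.0, N)
    rate = b - math.log(phi)
    odds = (a + K - 1.0) / (N * rate)
    use_upper = random.random() < odds / (1.0 + odds)
    shape = a + K if use_upper else a + K - 1.0
    # random.gammavariate takes (shape, scale), so invert the rate
    return random.gammavariate(shape, 1.0 / rate)

# A few sweeps, as they would occur inside the MCMC loop
alpha = 1.0
for _ in range(100):
    alpha = sample_alpha(alpha, K=3, N=200)
```

Each sweep is cheap (two Beta/Gamma draws), which is why this update adds negligible cost to the overall sampler.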

Detecting changes in signatures from infected subjects versus uninfected controls
As discussed in Section 2, MC-TIMME is a fully Bayesian model, which allows us to directly compute posterior probabilities of interest. To detect whether the behavior of a taxon g in site s changes in infected subjects versus uninfected controls, we are interested in the probability p_gs that the taxon's time-series of counts in the two conditions are assigned to different prototype signatures. The samples from our MCMC inference method described in Section 6 allow us to readily estimate this probability:

p_gs = (1/N_MCMC) Σ_{j=1}^{N_MCMC} 1[ z_gs1^(j) ≠ z_gs2^(j) ].

Here, N_MCMC is the number of MCMC iterations and z_gsi^(j) is the assignment of taxon g in site s and condition i (infected or uninfected) at MCMC iteration j.
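Given stored MCMC assignments, p_gs is just the fraction of iterations in which the uninfected and infected series land in different prototype signatures. A toy trace (hypothetical values):

```python
# Toy MCMC trace for one taxon/site: tuples of
# (z for uninfected condition, z for infected condition) per iteration.
z_samples = [(0, 0), (0, 1), (0, 1), (0, 1), (2, 1),
             (0, 1), (0, 1), (0, 0), (0, 1), (0, 1)]

# p_gs: posterior probability that the two conditions use different signatures
p_gs = sum(z1 != z2 for z1, z2 in z_samples) / len(z_samples)  # -> 0.8
```

In the actual analysis this is computed per taxon and site over the full chain, after burn-in.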
We also characterize the posterior distribution of the assigned prototype signature shape parameters with median and 95% credible interval statistics. The estimated centiles of this posterior distribution for taxon g in site s in condition i are calculated from the corresponding MCMC samples. Our criterion for detecting differences between infected and control signatures for taxon g in site s was to require 1 − p_gs < 0.05 and that the 95% credible intervals for the infected and uninfected signatures not overlap for at least two time-points. The rationale for requiring changes at two or more time-points was that in the present study we were interested in taxa with sustained alterations in behavior, rather than those exhibiting just a single large change. Note that for detecting differences in the behavior of taxa, MC-TIMME takes into account the entire time-series as well as the behavior of other taxa. This provides significant advantages over the simple strategy of performing many independent statistical tests.
To estimate the false detection rate, we performed permutation testing. Each trial consisted of randomly permuting data for each taxon g across time-points and replicates, and then running MC-TIMME and detecting changes between infected and uninfected signatures as described above. The permuted data approximates a "null distribution" in which no taxon exhibits significant changes in the infected versus the uninfected state. We performed 100 trials, and detected an average of 4% of taxa as significantly changing, suggesting a false positive rate of < 5% on our dataset.

Time to recovery at each site
We define the time to recovery at a specific anatomic site as the latest time at which a weighted measure of changing OTUs is less than 5%. We define the weighted measure as follows. The median fraction of counts assigned to taxon g in site s in condition i at time t is given by:

f_gsi(t) = μ_gsi(t) / Σ_{g′} μ_g′si(t).

Here, μ_gsi(t) is the median of the assigned signature for taxon g in site s in condition i at time t, as defined in equation 14.
Let I gst denote an indicator variable that is equal to one (and zero otherwise) if the 95% credible intervals as defined in Section 7 for taxon g in site s at time t do not overlap for the infected and uninfected conditions, i.e., the indicator is equal to one if there is a change between the two conditions at that time point.
The weighted measure of changing OTUs in site s at time t is then defined as:

C_s(t) = sqrt( (1/2) Σ_g I_gst ( √f_gs1(t) − √f_gs2(t) )² ),

where f_gsi(t) is the median fraction of counts defined above. This measure may be viewed as a truncated Hellinger distance [15] between the distributions of estimated taxa counts in the infected and uninfected conditions. The Hellinger distance is a general measure of distance between two probability distributions, with a particularly simple form in the case of multinomial distributions. We use a truncated version that sums only the contributions of taxa detected as changing between the two conditions. Thus, the measure is zero if no taxa are detected as changing, and attains a maximum of one if all taxa are detected as changing and the difference between their proportions in the infected and uninfected states is maximal.
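A sketch of the truncated Hellinger measure, using hypothetical relative-abundance vectors for the infected and uninfected conditions and a boolean change indicator per taxon:

```python
import math

def truncated_hellinger(f_inf, f_uninf, changed):
    """Truncated Hellinger distance between infected and uninfected
    relative-abundance vectors, summing only taxa flagged as changed."""
    s = sum((math.sqrt(p) - math.sqrt(q)) ** 2
            for p, q, c in zip(f_inf, f_uninf, changed) if c)
    return math.sqrt(0.5 * s)

f_inf = [0.5, 0.3, 0.2]     # hypothetical median fractions, infected
f_uninf = [0.1, 0.3, 0.6]   # hypothetical median fractions, uninfected

d_none = truncated_hellinger(f_inf, f_uninf, [False, False, False])
d_some = truncated_hellinger(f_inf, f_uninf, [True, False, True])
```

With no detected changes the measure is zero; with fully disjoint distributions and all taxa flagged, it reaches its maximum of one.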

Time-maps
The purpose of time-maps is to visualize cascading changes in taxa behavior with infection (relative to the uninfected baseline) in a particular site, as well as to visualize groups of functionally related taxa in that site. Each row in the time-map visualizes a particular taxon g in site s across time.
We define the change for taxon g in site s at time t as the difference between the infected and uninfected assigned signature values:

Ψ(g, s, t) = η_{z_gs2}(t) − η_{z_gs1}(t).

Here, z_gs1 is the index of the prototype signature to which taxon g in site s is assigned in the uninfected state, and z_gs2 is the corresponding index for the infected state. The visualized function is a normalized version of Ψ(g, s, t), in which we normalize by the standard deviation of the vector (Ψ(g, s, 1), . . . , Ψ(g, s, T)). The visualized color at a time-point t is red if Ψ(g, s, t) > 0 and blue if Ψ(g, s, t) < 0. The intensity of the color at each time-point t is scaled by the magnitude of Ψ(g, s, t), and standardized across the entire figure by the upper and lower 95th percentile values of Ψ(·, s, ·) over all time-points and taxa. To improve the visualization, we linearly interpolate between time-points. Further, for time-points (or interpolated regions) in which the 95% credible intervals of the infected and uninfected assigned prototype signatures overlap, we quench the signal by 80% to indicate regions in which the significance of the change is uncertain.
Each taxon g is then assigned a vector consisting of: 1) the average time of maximal change for its functional group; 2) the index of the functional group to which the taxon belongs; and 3) the time of maximal change for the taxon itself. The rows in the figure are then sorted by this vector. The net effect is that the rows are organized into functional groups, with groups exhibiting their average maximal changes earliest at the top of the figure, and taxa within groups sorted by their times of maximal change.