Stochastic models allow improved inference of microbiome interactions from time series data

doi:10.1371/journal.pbio.3002913

Fig 1.

Parameter inference workflow proposed and microbiome data properties.

(A) Mathematical models serve as a link between parameters and data, either to simulate biological processes or to infer parameters from data. (B) Datasets are obtained by longitudinal sampling of the same hosts or of an ensemble of hosts. (C) Workflow from the microscopic rates of a model and experimental data to the inference of parameter values by approximate Bayesian computation (ABC). The microscopic rates describe possible eco-evolutionary events (such as birth, migration, mutation, or speciation), leading to macroscopic patterns (statistical moments of abundance). Data sets describe absolute abundances (counts) or relative abundances (frequencies) of microbes. To quantify the probability of parameter values given a data set, prior knowledge about the parameters is updated to a posterior distribution based on the agreement of the model with the data. Note that because the model describes the dynamics continuously, no correlation between time points is needed. Figure created in BioRender.com under a CC-BY-NC-ND license.
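The prior-to-posterior update described in panel C can be illustrated with a minimal rejection-ABC sketch. This is a simpler relative of the ABC-SMC scheme used in the workflow, not the workflow itself. The toy model (a deterministic logistic mean abundance), the prior range, the tolerance, and all function names are assumptions made for illustration.

```python
import math
import random


def model_mean(growth_rate, times, n0=10.0, capacity=1000.0):
    """Hypothetical toy model: logistic mean abundance at each time point."""
    return [
        capacity / (1.0 + (capacity / n0 - 1.0) * math.exp(-growth_rate * t))
        for t in times
    ]


def distance(simulated, observed):
    """Relative Euclidean mismatch between model output and data."""
    return sum(((s - o) / o) ** 2 for s, o in zip(simulated, observed)) ** 0.5


def abc_rejection(observed, times, n_draws=5000, epsilon=0.1, seed=1):
    """Keep prior draws whose model output lies within epsilon of the data."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_draws):
        rate = rng.uniform(0.0, 3.0)  # assumed uniform prior on the growth rate
        if distance(model_mean(rate, times), observed) <= epsilon:
            accepted.append(rate)
    return accepted


times = [0.5 * k for k in range(1, 11)]
observed = model_mean(1.5, times)  # synthetic "data" with true rate 1.5
posterior = abc_rejection(observed, times)
posterior_mean = sum(posterior) / len(posterior)
```

The accepted draws approximate the posterior: they are the subset of the prior for which the model agrees with the data. A sequential scheme such as ABC-SMC tightens the tolerance over generations instead of fixing it in advance.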

Fig 2.

Inferring true parameters from simulated data.

(A, B) Time series comparison between simulations (dots, derived from only 4 replicates) and equations for the statistical moments (lines) of absolute (n_k) and relative abundance (x_k) sharing true parameters (found in Tables 1 and 2). Two models with 3 microbial types (S = 3) were tested: (A) logistic growth with immigration and death, with 3S + 1 = 10 parameters, and (B) Lotka–Volterra, with S + S² = 12 parameters. Inferred parameter posteriors from the relative abundance are compared to true parameters (dashed lines) and priors (black distributions). All microbial types shared the same priors (Tables 3 and 4). (C) The inferred interactions for the Lotka–Volterra model resembled the true interactions, qualitatively (arrowheads) and quantitatively (arrow thickness), with varying certainty (grayscale, defined by the ratio of SD of posterior to prior). (D) For both data sets, the most probable model was identified correctly. The settings for the inference are listed in Table 6 (a.u.: arbitrary units; time units are determined by the rates, see Tables 1 and 2). Networks on the left of A and B were created in BioRender.com under a CC-BY-NC-ND license. The data underlying this figure can be found in https://doi.org/10.5281/zenodo.13958305.
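The certainty measure in panel C, the ratio of the SD of the posterior to that of the prior, can be sketched as follows. The sample sets are hypothetical stand-ins for a wide prior and a narrow posterior, and the function name is an assumption.

```python
import random
import statistics


def uncertainty_ratio(posterior_samples, prior_samples):
    """SD(posterior) / SD(prior).

    Values near 0 mean the data narrowed the parameter considerably;
    values near 1 mean the posterior is about as wide as the prior.
    """
    return statistics.stdev(posterior_samples) / statistics.stdev(prior_samples)


rng = random.Random(0)
prior = [rng.uniform(0.0, 3.0) for _ in range(10_000)]    # hypothetical wide prior
posterior = [rng.gauss(1.5, 0.1) for _ in range(10_000)]  # hypothetical narrow posterior
ratio = uncertainty_ratio(posterior, prior)
```

Because the ratio is normalized by the prior width, it allows a relative comparison between parameters whose priors differ in scale.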

Fig 3.

Outcome comparison of our workflow and linear regression of the deterministic Lotka–Volterra model.

We inferred all parameter values (Table 2) from simulated absolute abundance data as in Fig 2. While our workflow used the same setup as in Fig 2, the linear regression method was based on [8] without time-dependent perturbations or regularization. Our Bayesian workflow successfully “locates” the true parameter values, along with their uncertainty, even when the linear regression method does not. The initial parameter guess for linear regression was close to the true value (1.5 for growth rates, and −10⁻⁴ and 0 for intra- and inter-specific interactions). For our workflow, we used the same parameter priors as in Fig 2, summarized in Table 4. The data underlying this figure can be found in https://doi.org/10.5281/zenodo.13958305.
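For orientation, one common way to fit a deterministic Lotka–Volterra model by linear regression is to regress the log-differences of abundance on the abundances themselves. The sketch below shows that idea on noise-free synthetic data; it is not necessarily the exact procedure of [8], and the parameter values, step size, and function names are assumptions.

```python
import numpy as np


def simulate_glv(r, A, n0, dt, steps):
    """Log-space Euler steps of dn_i/dt = n_i * (r_i + sum_j A_ij * n_j)."""
    traj = [np.asarray(n0, dtype=float)]
    for _ in range(steps):
        n = traj[-1]
        traj.append(n * np.exp(dt * (r + A @ n)))
    return np.array(traj)


def fit_glv(traj, dt):
    """Least-squares estimate of growth rates r and interaction matrix A."""
    # Response: per-capita growth, approximated by log-differences.
    y = np.diff(np.log(traj), axis=0) / dt
    # Design matrix: intercept column (growth rate) plus abundances.
    X = np.hstack([np.ones((len(traj) - 1, 1)), traj[:-1]])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[0], coef[1:].T


# Hypothetical two-type system, chosen only for illustration.
r_true = np.array([1.5, 1.0])
A_true = np.array([[-1e-3, -5e-4],
                   [2e-4, -1e-3]])
traj = simulate_glv(r_true, A_true, n0=[50.0, 80.0], dt=0.05, steps=100)
r_hat, A_hat = fit_glv(traj, dt=0.05)
```

On this noise-free deterministic trajectory the regression recovers the parameters. With stochastic fluctuations or measurement noise the log-differences become noisy, and the regression returns point estimates without an uncertainty, which is the situation compared in this figure.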

Fig 4.

Effect of data measurement noise on the uncertainty of an inferred Lotka–Volterra parameter.

We inferred all parameters from simulated data as shown in Fig 2 (see Table 2). For simplicity, we show the effect of noise on a single parameter with true value I_3,2 = −4.1 · 10⁻⁵ (dashed line). The effect on all parameters is shown in S2 Fig. We simplified the nuances of empirical noise [24] by assuming a scenario where all microbial abundances are affected proportionally. Concretely, a uniform noise distribution was shared among all microbial types and was constant through time. For low noise, data could be altered by up to ±5%, while for medium and high noise, by up to ±10% and ±20%, respectively. Noise was sampled independently for each microbial type at each time point, affecting the absolute abundances from which relative abundances were computed. (A) A larger number of replicates and/or time points helps reduce the increased uncertainty caused by noise. In particular, the number of time points has a stronger effect than the number of replicates. (B) The uncertainty obtained from relative abundance data is consistently larger than that from absolute abundance data. Still, more replicates and/or sampling time points help to reduce the uncertainty. The data underlying this figure can be found in https://doi.org/10.5281/zenodo.13958305.
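The noise scenario described above can be sketched as proportional uniform noise applied to absolute abundances, followed by normalization to relative abundances. The abundance values and function names below are hypothetical; only the three noise bounds come from the caption.

```python
import random

# Maximum proportional deviation for each noise level (from the caption).
NOISE_LEVELS = {"low": 0.05, "medium": 0.10, "high": 0.20}


def add_noise(abundances, level, rng):
    """Perturb each type at each time point independently by up to +/- max_dev."""
    max_dev = NOISE_LEVELS[level]
    return [
        [n * (1.0 + rng.uniform(-max_dev, max_dev)) for n in row]
        for row in abundances
    ]


def relative_abundance(abundances):
    """Convert absolute abundances to frequencies at each time point."""
    return [[n / sum(row) for n in row] for row in abundances]


rng = random.Random(0)
# Hypothetical absolute abundances: rows are time points, columns are types.
clean = [[100.0, 200.0, 700.0], [150.0, 250.0, 600.0]]
noisy = add_noise(clean, "high", rng)
noisy_relative = relative_abundance(noisy)
```

Because the noise enters before normalization, an error in one type propagates to the relative abundance of every other type at that time point.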

Fig 5.

Inferring parameters from empirical data.

The parameters of a logistic growth with immigration and death model were inferred from a mouse dataset. The Oligo-Mouse-Microbiota (OMM¹²) data set [28] tracks a defined 12-species mouse microbiome (S = 12), where the absolute abundances in the same individuals were sampled from feces 11 times over 99 days. (A) We analyzed the first 21 days, for which 4 replicates are available; shown here is the abundance of all 12 types averaged over the 4 replicates. We used the underlying data to infer the parameters of a logistic growth model with growth, death, and immigration, using a total of S + S² = 156 moments for the inference. (B) Of the 3S + 1 = 37 parameters inferred, we show only the posteriors of the 5 most certain ones (defined by the ratio of SD of posterior to prior as a relative comparison of the certainty gained between parameters). All microbial types shared the same uniform priors (black lines, Table 5) to obtain a fair measure of the reduction in parameter uncertainty. (C) The parameters inferred for each species varied widely, with varying certainty. For the shared carrying capacity, we found an average N ≈ 1.45 · 10⁷ bacterial cells (±3.49 · 10⁵ cells), with an uncertainty of 0.0582. A system of 156 equations was solved (S = 12 first moments and S² = 144 second moments and co-moments). The settings for the inference are listed in Table 6. The data underlying this figure can be found in https://doi.org/10.5281/zenodo.13958305.
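The count of S + S² = 156 moments can be made concrete with a short sketch that estimates first moments and second moments and co-moments across replicates at one time point. Whether the workflow uses raw or central moments is not stated here, so raw moments are an assumption, as are the synthetic replicate values and the function name.

```python
import random


def empirical_moments(replicates):
    """Estimate raw moments across replicates at a single time point.

    replicates: list of abundance vectors, one per replicate.
    Returns (first, second), where first[i] estimates E[n_i] and
    second[i][j] estimates E[n_i * n_j].
    """
    n_rep = len(replicates)
    n_types = len(replicates[0])
    first = [sum(rep[i] for rep in replicates) / n_rep for i in range(n_types)]
    second = [
        [sum(rep[i] * rep[j] for rep in replicates) / n_rep for j in range(n_types)]
        for i in range(n_types)
    ]
    return first, second


rng = random.Random(0)
S = 12  # number of microbial types, as in the OMM12 data set
# Hypothetical abundances for 4 replicates.
replicates = [[rng.uniform(1e5, 1e7) for _ in range(S)] for _ in range(4)]
first, second = empirical_moments(replicates)
n_moments = len(first) + sum(len(row) for row in second)  # S + S**2
```

Counting all ordered pairs (i, j) gives S² second-order entries, which matches the 144 stated above, although the matrix is symmetric and so contains duplicated co-moments.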

Table 1.

Parameters in simulated logistic growth with immigration and death (Fig 2A).

The growth and death rates as well as the immigration parameters were chosen for illustration only; thus, time units are arbitrary. We used a relatively small population size for simplicity, but larger population sizes can easily be tested.

Table 2.

Parameters in simulated Lotka–Volterra (Fig 2B).

The interaction parameters as well as the growth rates were chosen for illustration only; thus, time units are arbitrary. We used relatively small initial populations for simplicity, but larger initial populations can easily be tested.

Table 3.

Priors for simulated logistic growth with immigration and death (Fig 2A).

A combination of uninformative (uniform) and informative (normal) priors was used for illustration. These priors span a wide range of values to test the ability of the inference workflow to find the true parameters in simulations (Table 1). U(a, b) indicates a uniform distribution in the range from a to b. N(a, b) indicates a normal distribution with mean a and standard deviation b.

Table 4.

Priors for simulated Lotka–Volterra (Fig 2B).

A combination of uninformative (uniform) and informative (normal) priors was used for illustration. These priors span a wide range of values to test the ability of the inference workflow to find the true parameters in simulations (Table 2). U(a, b) indicates a uniform distribution in the range from a to b. N(a, b) indicates a normal distribution with mean a and standard deviation b.

Table 5.

Priors for logistic growth with immigration and death in empirical mouse data (Fig 5).

Available evidence and back-of-the-envelope calculations (marked by *) were used to propose wide priors. U(a, b) indicates a uniform distribution in the range from a to b. N(a, b) indicates a normal distribution with mean a and standard deviation b.

Table 6.

Settings for ABC-SMC code.

These settings were chosen to decrease the computing time while still robustly minimizing the distance between data and model. We used tools from the Python package pyABC [34], mainly ABCSMC. The maximum number of generations, the minimum mismatch threshold (ε), and the minimum change in ε between generations are all stopping criteria (marked by *). LSODA is a numerical solver that automatically switches between stiff and non-stiff methods according to the stiffness of a system of differential equations. NA: not applicable.
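How the three stopping criteria interact can be sketched in plain Python. This illustrates the logic only and is not pyABC code; the function name, the order in which the criteria are checked, and the numerical values are assumptions (the actual settings are those listed in the table).

```python
def stopping_reason(epsilons, max_generations, min_epsilon, min_epsilon_change):
    """Return why an ABC-SMC run would stop, or None if it should continue.

    epsilons: mismatch thresholds of the generations completed so far.
    """
    if len(epsilons) >= max_generations:
        return "maximum number of generations reached"
    if epsilons and epsilons[-1] <= min_epsilon:
        return "minimum epsilon reached"
    if len(epsilons) >= 2 and abs(epsilons[-2] - epsilons[-1]) < min_epsilon_change:
        return "epsilon change between generations too small"
    return None


# Hypothetical threshold histories and settings.
history = [1.0, 0.5, 0.3, 0.299]
reason = stopping_reason(
    history, max_generations=20, min_epsilon=0.01, min_epsilon_change=0.01
)
```

In this example the run stops because the threshold has plateaued, even though neither the generation limit nor the minimum ε has been reached.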
