Integrating a tailored recurrent neural network with Bayesian experimental design to optimize microbial community functions

doi:10.1371/journal.pcbi.1011436

Fig 1.

The Microbiome Recurrent Neural Network (MiRNN) learns system dynamics and proposes new designs.

(Design) An experimental design space is a set of individual experimental conditions, q, where a particular condition could be, for example, a set of species in a community or the initial concentrations of resources. MiRNN predictions of outcomes for a set of experimental conditions, q, are scored by an acquisition function, f, which balances the expected information gain (EIG) of an experimental design against its expected profit. (Test) The optimal experimental design, q*, defines the set of experimental conditions to be measured, and these measurements are collected in the test phase. (Learn) Data collected in the test phase, together with all previously collected data, are used to fit an updated MiRNN model. The updated model is then used again in the design phase, completing the design, test, learn cycle.
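
The design step can be read as a scoring problem over candidate conditions. The sketch below is a minimal illustration under stated assumptions, not the acquisition function used in the paper: it assumes the model supplies a predictive mean and standard deviation of the target for each candidate, uses the mean as a stand-in for expected profit, uses the Gaussian information gain 0.5·log(1 + σ²/σ_noise²) as a proxy for EIG, and introduces hypothetical parameters `explore_weight` and `noise_std`.

```python
import numpy as np

def acquisition(pred_mean, pred_std, explore_weight=1.0, noise_std=1.0):
    """Score each candidate as exploitation plus weighted exploration (illustrative)."""
    pred_mean = np.asarray(pred_mean, dtype=float)
    pred_std = np.asarray(pred_std, dtype=float)
    # Assumption: Gaussian information gain stands in for the EIG term.
    info_gain = 0.5 * np.log1p((pred_std / noise_std) ** 2)
    # Assumption: the predictive mean stands in for expected profit.
    return pred_mean + explore_weight * info_gain

def select_design(candidates, pred_mean, pred_std, **kwargs):
    """Return the highest-scoring candidate (playing the role of q*) and all scores."""
    scores = acquisition(pred_mean, pred_std, **kwargs)
    return candidates[int(np.argmax(scores))], scores

# Hypothetical candidates and model outputs, for illustration only.
candidates = ["q1", "q2", "q3"]
pred_mean = [1.0, 1.2, 0.8]   # q2 has the highest predicted outcome
pred_std = [0.1, 0.1, 2.0]    # q3 is the most uncertain

q_star, scores = select_design(candidates, pred_mean, pred_std, explore_weight=1.0)
q_greedy, _ = select_design(candidates, pred_mean, pred_std, explore_weight=0.0)
```

With these made-up numbers, setting `explore_weight=0.0` reduces the score to pure exploitation and picks the candidate with the highest predicted mean, while a positive weight shifts the choice toward the most uncertain candidate.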

Fig 2.

The predictive capability of the MiRNN outperforms an unconstrained RNN model using simulated data over a range of sparsity levels.

(a.) A comparison of the MiRNN architecture to a standard RNN, where the constraint highlighted in blue prevents the model from predicting the spontaneous emergence of a species. (b.) Schematic of simulated data generation, indicating that a ground truth computational bioreactor model is used to simulate species abundances over a time span of 130 hours, with measurements of species abundances taken at 26-hour intervals. (c.) Comparison of RNN (green) and MiRNN (orange) performance in species predictions according to the average Pearson correlation coefficient (R) over all species between predictions and measured values. The height of the bars and error bars correspond to the median and interquartile range in prediction performance after running 10-fold cross-validation over 5 trials, where samples were randomly shuffled in each trial. (d.) Same as panel (c.), but showing root-mean-square error (RMSE) instead of Pearson correlation.
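
The evaluation in panels (c.) and (d.) can be sketched as repeated, shuffled k-fold cross-validation summarized by a median and interquartile range. The code below is a minimal sketch under stated assumptions: the synthetic data, the least-squares stand-in model, and the helper names (`kfold_scores`, `median_iqr`, `least_squares_fit_predict`) are hypothetical, and pooling the scores of all folds and trials before taking the median is an assumption, since the caption does not say how the scores were aggregated.

```python
import numpy as np

def mean_pearson(y_true, y_pred):
    """Pearson R per species (column), averaged over all species."""
    rs = [np.corrcoef(y_true[:, j], y_pred[:, j])[0, 1]
          for j in range(y_true.shape[1])]
    return float(np.mean(rs))

def rmse(y_true, y_pred):
    """Root-mean-square error over all samples and species."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def kfold_scores(X, Y, fit_predict, n_folds=10, n_trials=5, seed=0):
    """Run n_trials of shuffled n_folds-fold CV; return one score per fold."""
    rng = np.random.default_rng(seed)
    r_scores, rmse_scores = [], []
    for _ in range(n_trials):
        order = rng.permutation(len(X))  # reshuffle samples in each trial
        for test_idx in np.array_split(order, n_folds):
            train_idx = np.setdiff1d(order, test_idx)
            Y_hat = fit_predict(X[train_idx], Y[train_idx], X[test_idx])
            r_scores.append(mean_pearson(Y[test_idx], Y_hat))
            rmse_scores.append(rmse(Y[test_idx], Y_hat))
    return np.array(r_scores), np.array(rmse_scores)

def median_iqr(scores):
    """Median and (25th, 75th) percentiles of the pooled scores."""
    q25, q50, q75 = np.percentile(scores, [25, 50, 75])
    return q50, (q25, q75)

def least_squares_fit_predict(X_train, Y_train, X_test):
    """Hypothetical stand-in model: ordinary least squares without intercept."""
    W, *_ = np.linalg.lstsq(X_train, Y_train, rcond=None)
    return X_test @ W

# Synthetic data for illustration only (100 samples, 4 inputs, 3 "species").
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 4))
Y = X @ rng.normal(size=(4, 3)) + 0.1 * rng.normal(size=(100, 3))

r_scores, rmse_scores = kfold_scores(X, Y, least_squares_fit_predict)
r_median, r_iqr = median_iqr(r_scores)
rmse_median, rmse_iqr = median_iqr(rmse_scores)
```

Any model can be swapped in through the `fit_predict` argument, so the same loop could compare a constrained and an unconstrained model on identical folds.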

Fig 3.

The predictive capability of the MiRNN outperforms an unconstrained RNN model using experimental data.

(a.) Schematic of the experiment, in which 95 unique microbial consortia were selected from a set of 25 health-relevant human gut bacteria. After inoculation, species abundances and metabolite concentrations were measured at 16-hour intervals over the course of 48 hours. (b.) A comparison of the MiRNN architecture to a standard RNN, where the constraint highlighted in blue prevents the model from predicting the spontaneous emergence of a species. (c.) Comparison of RNN and MiRNN performance in species predictions according to the Pearson correlation coefficient between predictions and measured values. The height of the bars and error bars correspond to the median and interquartile range in prediction performance after running 20-fold cross-validation over 10 trials, where samples were randomly shuffled in each trial. (d.) Same as panel (c.), except that metabolite prediction performance is shown. (e.) Representative temporal changes in MiRNN-predicted metabolite concentrations, where measured values are shown as dots, the mean predicted value is shown as a line, and the uncertainty region shows ± 1 standard deviation.

Fig 4.

Optimization of resources and feed rate to maximize product.

(a.) A schematic of the fed-batch bioreactor to be optimized, where the rate of a feed stream and the presence of resources in the feed (depicted as yellow and pink shapes) can both be adjusted to maximize production. Species (green shapes) that produce a valuable metabolite (orange star) compete for resources with species that do not produce the metabolite. (b.) A diagram of the inputs to the MiRNN model, which include species abundances, metabolite concentration, resource concentrations, and feed rate at time point t − 1. The model predicts species abundances and metabolite concentration at the next time step, t, and these predictions are then used as inputs to predict the following time step. (c.) A comparison of prediction performance (Pearson correlation coefficient) for end-point metabolite concentration between the proposed experimental design strategy, which combines exploration and exploitation (blue), and pure exploitation (green), pure exploration (orange), and random sampling (purple). Solid lines show the median of the best recorded production (y-axis) up to each design, test, learn (DTL) cycle (x-axis) and uncertainty regions show the interquartile range computed over 30 trials, each with random initial experimental designs. (d.) A comparison of metabolite maximization (showing median and interquartile range over 30 trials) between the proposed combination of exploration and exploitation (blue) and pure exploitation (green), pure exploration (orange), and random sampling (purple).
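
The "best recorded production up to each cycle" curve and its uncertainty band can be computed with a running maximum followed by per-cycle percentiles across trials. The sketch below is illustrative only: the `production` array holds made-up numbers (3 trials, 4 cycles), and the function name `best_so_far_summary` is hypothetical.

```python
import numpy as np

def best_so_far_summary(production):
    """production: array of shape (n_trials, n_cycles), one value per cycle.

    Returns the running best per trial, plus the median and the
    25th/75th percentiles across trials at each cycle.
    """
    production = np.asarray(production, dtype=float)
    # Running maximum along the cycle axis: best value recorded so far.
    best_so_far = np.maximum.accumulate(production, axis=1)
    median = np.median(best_so_far, axis=0)
    q25, q75 = np.percentile(best_so_far, [25, 75], axis=0)
    return best_so_far, median, q25, q75

# Made-up production values for illustration only.
production = np.array([
    [1.0, 3.0, 2.0, 4.0],
    [2.0, 1.0, 5.0, 3.0],
    [0.5, 0.7, 0.6, 0.9],
])
best, median, q25, q75 = best_so_far_summary(production)
```

Because the running maximum never decreases, every per-trial curve is monotone, and so are the median and percentile curves derived from it.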

Fig 5.

Microbiome Recurrent Neural Network architecture.

Inputs to the RNN at time step t − 1 include the state of species abundances, metabolite concentrations, control inputs, and a latent vector that stores information from previous steps and whose dimension determines the flexibility of the model. The output from each MiRNN block is the predicted system state and the latent vector at the next time step, t. To avoid the physically unrealistic emergence of previously absent species, a constrained feed-forward neural network (FFNN) outputs a zero abundance for any species whose abundance at the previous time step was zero.
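
The zero-abundance constraint can be illustrated with a toy recurrent block. This is a minimal sketch under stated assumptions, not the published architecture: the single tanh layer, the random untrained weights, the non-negativity clamp, and all names (`init_params`, `mirnn_step`, `rollout`) are hypothetical. Only the masking rule, that a species absent at t − 1 stays absent at t, comes from the description above.

```python
import numpy as np

def init_params(n_species, n_metabolites, n_controls, n_latent, seed=0):
    """Random, untrained weights for a single-layer block (illustrative)."""
    rng = np.random.default_rng(seed)
    n_in = n_species + n_metabolites + n_controls + n_latent
    n_out = n_species + n_metabolites + n_latent
    return {"W": 0.1 * rng.normal(size=(n_in, n_out)), "b": np.zeros(n_out)}

def mirnn_step(params, species, metabolites, controls, latent):
    """Map the state at t - 1 to the predicted state and latent vector at t."""
    x = np.concatenate([species, metabolites, controls, latent])
    out = np.tanh(x @ params["W"] + params["b"])
    n_s, n_m = len(species), len(metabolites)
    raw_species = out[:n_s]
    next_metabolites = out[n_s:n_s + n_m]
    next_latent = out[n_s + n_m:]
    # Assumption: clamp abundances at zero so they stay non-negative.
    # Constraint: a species with zero abundance at t - 1 is forced to zero at t.
    next_species = np.where(species > 0, np.maximum(raw_species, 0.0), 0.0)
    return next_species, next_metabolites, next_latent

def rollout(params, species0, metabolites0, controls_seq, n_latent):
    """Feed each prediction back in as the input for the next time step."""
    species = np.asarray(species0, dtype=float)
    metabolites = np.asarray(metabolites0, dtype=float)
    latent = np.zeros(n_latent)
    trajectory = []
    for controls in controls_seq:
        controls = np.asarray(controls, dtype=float)
        species, metabolites, latent = mirnn_step(
            params, species, metabolites, controls, latent)
        trajectory.append((species, metabolites))
    return trajectory

# Hypothetical initial condition: the second species is absent.
params = init_params(n_species=3, n_metabolites=1, n_controls=1, n_latent=4)
trajectory = rollout(params, [0.5, 0.0, 0.2], [1.0], [[0.1]] * 3, n_latent=4)
```

Because each step's masked output becomes the next step's input, the absent species stays at zero for the whole rollout regardless of the weights.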
