Substantive changes in gene expression, metabolism, and the proteome are manifested in overall changes in microbial population growth. Quantifying how microbes grow is therefore fundamental to areas such as genetics, bioengineering, and food safety. Traditional parametric growth curve models capture the population growth behavior through a set of summarizing parameters. However, estimation of these parameters from data is confounded by random effects such as experimental variability, batch effects or differences in experimental material. A systematic statistical method to identify and correct for such confounding effects in population growth data is not currently available. Further, our previous work has demonstrated that parametric models are insufficient to explain and predict microbial response under non-standard growth conditions. Here we develop a hierarchical Bayesian non-parametric model of population growth that identifies the latent growth behavior and response to perturbation, while simultaneously correcting for random effects in the data. This model enables more accurate estimates of the biological effect of interest, while better accounting for the uncertainty due to technical variation. Additionally, modeling hierarchical variation provides estimates of the relative impact of various confounding effects on measured population growth.
Quantifying how microbes grow in response to stress is required for effective treatment of microbial infections, food safety, and understanding the effects of environmental change. Current models that quantify microbial growth characteristics such as exponential growth rate are based on assumptions that microbial growth curves will adopt a sigmoid form with characteristic lag, logarithmic, and stationary phases. These models are therefore inaccurate when applied to microbes growing under stress. Substantial variability across experiments that measure microbial growth further compounds the issue. Here we report a new statistical model freed from the assumption of optimum growth. This model also properly corrects for experimental variability, enabling researchers to monitor, quantify, and understand how microbial growth changes in response to gradations of stress. We apply this model to two microbial test systems to accurately quantify how pathogenic bacteria respond to acidic antimicrobial treatments, and how environmentally important microbes withstand stress.
Citation: Tonner PD, Darnell CL, Bushell FML, Lund PA, Schmid AK, Schmidler SC (2020) A Bayesian non-parametric mixed-effects model of microbial growth curves. PLoS Comput Biol 16(10): e1008366. https://doi.org/10.1371/journal.pcbi.1008366
Editor: Jason A. Papin, University of Virginia, UNITED STATES
Received: January 16, 2020; Accepted: August 30, 2020; Published: October 26, 2020
This is an open access article, free of all copyright, and may be freely reproduced, distributed, transmitted, modified, built upon, or otherwise used by anyone for any lawful purpose. The work is made available under the Creative Commons CC0 public domain dedication.
Data Availability: All relevant data and computer code are available for downloaded via the following GitHub repositories: https://github.com/ptonner/phenom https://github.com/ptonner/hsalinarum_tf_phenotype https://github.com/amyschmid/pseudomonas-organic-acids.
Funding: This study was funded by National Science foundation grants DMS-1407622 to SCS; NSF-MCB-1615685, -1651117, and -1642283 to AKS; NSF Graduate Research Fellowship to PDT (nsf.gov). Research on this project in PAL’s laboratory was supported by grant number BB/K019171/1 from the UK Biotechnology and Biological Sciences Research Council (bbsrc.ukri.org/). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
Microbial growth phenotypes inform studies in microbiology, including gene functional discovery, bioengineering process development, and food safety testing [1–3]. For example, recent advances in microbial functional genomics and phenotyping, or “phenomics”, have enabled transformative insights into gene functions, proving critical for mapping the genotype to phenotype relationship . Methods such as genome-wide CRISPRi  and targeted genome-scale deletion libraries [6, 7] frequently rely upon accurate quantitation of microbial population growth as an assay to identify novel mutants with significant growth phenotypes. Population growth, as measured by the growth curve of a microbial culture, is an aggregate measure of all cellular processes and captures how microbial cells adapt and survive in their environmental niche . Because microbial culturing is a necessary precursor to many experimental procedures in microbiology , reproducible results require accurate quantification of the variability in culture state measured through growth [9, 10].
Typical analyses of microbial population growth involve estimating parametric models under the assumptions of standard growth conditions comprised of three successive growth phases: (1) lag phase, in which the population adapts to a new environment, typically fresh growth medium at culture inoculation; (2) log phase, when the population grows exponentially at a rate dependent on nutrients in the environment; and (3) stationary phase, where measurable population growth terminates thereby reaching the culture carrying capacity . Recent studies have shown that the estimates of parameters in these models are highly uncertain [12–14]. This uncertainty arises both from factors of biological interest, such as differences in genetic background and environment, as well as uncontrolled technical noise from experimental manipulation of microbial cultures. While such sources of variability can be modeled using fixed and random effects [15–19], parametric population growth models have additional limitations. Specifically, the parametric assumptions of these growth models require that growth measurements match the sigmoid shape expected for the growth curve under optimum conditions. When population growth deviates from the standard sigmoid shape assumed in these models, model extensions must be developed on a case by case basis for each new experimental perturbation [20, 21]. Additionally, we have shown in previous work that in cases such as extreme stress or strongly deleterious mutations, no parametric growth model accurately represents the growth curve, regardless of extension [19, 22, 23].
Factors affecting microbial growth measurements include both fixed and random effects . Fixed effects are assumed to be drawn from a finite set of perturbations of interest, for example the effect of different concentrations of a chemical on growth that are entirely represented in the dataset. Random effects, conversely, can be viewed as a random sample from a larger set of interest. For example, repeating the same design over many experiments corresponds to sampling the random experimental effect from the theoretical set of all possible experiments that could be conducted with this design [3, 25]. Random effects arising from repeated experimental design are typically referred to as batch effects [26, 27]. Batch effects are often a significant component of measurement noise in high-throughput genomics experiments . However, random effects are not always due to experimental noise, and may represent quantities of direct scientific interest; for example, assaying a set of genetic backgrounds may be viewed as sampling from the set of all possible genetic variants [29–33]. Models which include both fixed and random effects are referred to as mixed effects models.
In this study we present phenom, a general model for analysis of phenomic growth curve experiments based on a Bayesian non-parametric functional mixed effects model of microbial growth. We demonstrate the utility of phenom to analyze population growth measurements of two microorganisms: the hypersaline adapted archaeon, Halobacterium salinarum; and the opportunistic bacterial pathogen, Pseudomonas aeruginosa. H. salinarum is a model organism for transcriptional regulation of stress response in the domain of life Archaea [34–36]. H. salinarum is particularly well adapted to resisting oxidative stress (OS), which arises from the buildup of reactive oxygen species and causes damage to many critical cellular components, including DNA, protein, and lipids [37–43]. Population growth measurements of H. salinarum under OS have been used previously to quantify these harmful effects on physiology, as well as identify regulatory factors important for OS survival [22, 40–42]. The presence of batch effects in H. salinarum OS response was reported (and corrected for) previously , but these efforts did not include modeling of individual batch effects for each term in the model. This motivated the explicit deconstruction of batch effects between different factors (e.g. strain and stress), which we have implemented in phenom and reported here.
Pseudomonas aeruginosa is an opportunistic microbial pathogen and a growing problem in hospital-borne infections. Rising antimicrobial resistance of these organisms has necessitated the development of alternative treatment strategies. For example, topical treatment of infected burn wounds with acetic or organic acids (OAs) has been successful . OA impact on growth depends on external pH levels—in acidic environments the OA does not dissociate, but rather freely traverses the cellular membrane as an uncharged particle. Within the neutral cytoplasm, the OA dissociates, and the protons released induce acid stress . Here we apply phenom to the P. aeruginosa dataset, which is foundational for a larger study of P. aeruginosa strains responding to pH and OA perturbation as a potential novel treatment of pathogenic bacterial infections .
Stress occurs constantly in the environment: as conditions change, mild to severe cellular damage occurs, and cells must regulate their molecular components to survive [46–49]. Population growth measurements are particularly vital to the study of stress response by providing a quantitative measure of growth differences against a non-stressed control . Our model recovers fixed effects due to high and low levels of OS in H. salinarum and interactions between organic acid concentration and pH in P. aeruginosa. Random effects from multiple sources are corrected, thus enabling more accurate estimates of the biological significance of the stress treatment effect. Notably, in cases where random effect and fixed effect sizes are comparable, we demonstrate that mixed modeling is critical for accurate quantification of model uncertainty. If random effects are not included in the model, the significance of the effect of stress treatments on population growth can be erroneously overestimated. We discuss the implications of these findings for multiple areas of microbiology research.
Materials and methods
Experimental growth data
H. salinarum growth was performed as described previously . Briefly, starter cultures of H. salinarum NRC-1 Δura3 control strain  were recovered from frozen stock and streaked on solid medium for single colonies. Four individual colonies per strain were grown at 42°C with shaking at 225 r.p.m. to an optical density at 600 nm (OD600) ∼1.8–2.0 in 3 mL of Complete Medium (CM; 250 NaCl, 20 g/l MgSO4•7H2O, 3 g/l sodium citrate, 2 g/l KCl, 10 g/l peptone) supplemented with uracil (50 μg/ml). These starter cultures were diluted to OD600 ∼0.05 in 200 μl CM (“biological replicates”) then transferred in triplicate (“technical replicates”) into individual wells of a microplate. Cultures were grown in a high throughput microplate reader (Bioscreen C, Growth Curves USA, Piscataway, NJ), and culture density was monitored automatically by OD600 every 30 minutes for 48 hours at 42°C. High and low levels of OS were induced by adding 0.333 mM and 0.083 mM of paraquat to the media, respectively, at culture inoculation.
For P. aeruginosa, laboratory strain PAO1 (ATCC 15692) was grown as described in reference . Briefly, cultures were grown in M9 minimal media supplemented with 0.4% (w/v) glucose and 0.2% (w/v) casamino acids and buffered with 100 mM each of MES and MOPS buffers. Initial cultures were diluted to a starting OD of 0.05 before growth in a microplate reader at a total volume of 200 μl per well. Population growth was measured with a CLARIOstar automated microplate reader (BMG Labtech) at 37°C with 300 rpm continuous shaking. The OD600 was recorded automatically every 15 minutes for a total of 24 hours. A full factorial design of pH and OA concentration was performed for benzoate, citric acid, and malic acid. An experimental batch corresponded to two repetitions of the experiment on separate days with a minimum of three biological replicates of each condition on each day. Two batches for each OA were performed.
All data generated or analyzed during this study are included in this published article (see github repository associated with this study, https://github.com/ptonner/phenom).
Parametric growth curve estimation
For comparison with our non-parametric methods, parametric growth curve models were estimated using the grofit package in R with default parameters . The logistic model was used to fit each curve. Kernel density estimates of parameter distributions were calculated with the Python scipy package with default kernel bandwidth parameters .
phenom: A hierarchical Gaussian process model of microbial growth
A Gaussian process (GP) defines a non-parametric distribution over functions f(t), defined by the property that any finite set of observations of f follow a multivariate normal distribution . A GP is fully defined by a mean function and a covariance function κ(t, t′) (Eq (1)): (1)
GPs are commonly used for non-parametric curve fitting  where is typically set to 0, which we do here. Similarly, we use a common choice for covariance function defined by a radial basis function (RBF) kernel (Eq (2)). (2) Where σ2 is the variance and ℓ is the length-scale. The parameter σ2 controls the overall magnitude of fluctuation in the population of functions described in the GP distribution, while ℓ controls the expected smoothness, with larger ℓ making smoother, slower varying functions more likely. In the process of non-parametric modeling of growth curves, these parameters are adaptively estimated from the dataset.
We first define the fixed effects models used in this study; these will be augmented with random effects in the next section. We consider fixed effects models of increasing complexity: a mean growth phenotype, a single treatment phenotype, and a combinatorial phenotype with interactions between treatments. All of these models fall under the functional analysis of variance (ANOVA) framework [22, 54]. To estimate a mean growth profile, as in the case of measuring a single condition, a mean function m(t) is estimated from the data by modeling each replicate yr(t) for 1 ≤ r ≤ R as consisting of an unknown mean function observed with additive noise (Eq (3)). (3) Where m(t) ∼ GP(0, κm(t, t′)) provides a prior distribution over m, and κm is an RBF kernel with hyperparameters . Here is Gaussian white noise.
When estimating the effect of a perturbation on growth, as in the case of OS, we add a second function δ(t) that represents the effect of the stress being considered. The model then becomes (Eq (4)): (4) where δ(t) ∼ GP(0, κδ(t, t′)) also follows a GP prior independently of m, and κδ has hyperparameters .
When incorporating possible interaction effects such as those between pH and organic acids in the P. aeruginosa dataset, the model becomes (Eq (5)): (5) for pH p and molar acid concentration c, with αp(t) representing the main effect of pH, βc(t) the main effect of acid concentration, and (αβ)p,c(t) the interaction between them. Each effect is drawn from a treatment specific GP prior (Eq (6)). (6)
Again, each covariance function is specified by a RBF kernel with corresponding variance and lengthscale hyperparameters that adapt to the observed data. All models in this section correspond to Mnull for their respective analyses, as they do not include any random effects.
The first random effects added to the model were those used to account for batch effects, in the model Mbatch. Under this model, each fixed functional effect is modified by a GP describing the population of possible batch-specific curves. For example, under the model of interaction effects on growth (Eq 5), replicate r from batch k is modeled as (Eq 7)): (7) where the functions shared with Mnull are highlighted, the functions m(k)(t), , , and are the corresponding random batch effects, and . Similar to the fixed effects, the batch effect functions are drawn from shared GP priors (Eq (8)): (8) with kernel hyperparameters , , and that are distinct from those for the corresponding fixed effects, allowing for different variance and lengthscales between fixed and random effects. Other Mnull models are converted to Mbatch similarly, with each fixed effect becoming a mean of a GP prior for each batch effect.
Mfull develops the hierarchy one step deeper by adding replicate effects to Mbatch (Eq 7). Specifically, the error model ϵk,r is now described by a GP: ϵk,r ∼ GP(0, κy(t, t′)) with corresponding hyperparameters, accounting for replicate-specific variability rather than simply white noise.
The unknown functions (m(t), δ(t), αp(t), βc(t), and (αβ)p,c(t)), kernel hyperparameters ( and ℓl) for each group of latent functions, and observation noise parameters () are all estimated by Bayesian statistical inference. In Bayesian inference, prior distributions on unknown quantities (e.g. p(m(t), θm) = p(m(t)|θm) × p(θm)) are combined with the likelihood, p(y(t)|m(t)) to obtain the posterior distribution p(m(t), θm|y(t)).
Latent functions are grouped by shared kernel hyperparameters into related sets (e.g. treatment effects, interaction effects, batch effects), which then provide the GP prior for the latent function. For each group, is assigned a Gamma(α, β) prior, with fixed effects assigned as a Gamma(10, 10) prior and random effects assigned a Gamma(7, 10) prior. ℓl was assigned an inverse- Gamma(α, β) prior, with parameter α = 6 and β = 1 for H. salinarum, and α = 2, β = 3 for P. aeruginosa fixed effects and α = 10, β = 1 for P. aeruginosa random effects. Noise variance was also assigned a gamma prior.
Bayesian inference was then performed, with the posterior distribution obtained by sampling using Markov chain Monte Carlo (MCMC) implemented with the Stan library, which uses a Hamilitonian Monte-Carlo procedure with No-U-turn sampling . Multiple chains were run to diagnose convergence, with all parameter posterior means confirmed to have converged within as recommended .
Hierarchical batch effects typical in phenomics datasets render parametric models ineffective
In the dataset used here, population growth for each of P. aeruginosa and H. salinarum cultures was monitored under standard (non-stressed) conditions vs. stress conditions (see Materials and methods and references [22, 23] for precise definition of “standard conditions” for each organism). Specifically, cultures were grown in liquid medium in a high throughput growth plate reader that measured population density at 30 minute intervals over the course of 24 hours (P. aeruginosa) or 48 hours (H. salinarum); the resulting data are shown in Fig 1. Experimental designs for each organism included biological replicates (growth curves from different colonies on a plate), technical replicates (multiple growth curves from the same colony), varying conditions (stress vs standard), and batches. Throughout, we define “batch” as a single run of the high throughput growth plate reader. In each run, this plate reader measures the growth of 200 individual cultures across a range of perturbations, including varying stress conditions and genetic mutations (see Methods). H. salinarum was grown under high (0.333 mM paraquat (PQ)) and low (0.083 mM PQ) levels of oxidative stress (OS); the data are combined from published [19, 22, 41] and unpublished studies (Fig 1A). The OS responses of H. salinarum were compared to a control of standard growth in rich medium, representing optimal conditions for the population. The experimental design was replicated in biological quadruplicate and technical triplicate, across nine batches (Fig 1A, individual curves and axes). P. aeruginosa was grown in the presence of increasing concentrations of three different organic acid (OA) chemicals (0–20mM; benzoate, citric acid, and malic acid), each combined with a gradient of pH (5.0–7.0) . Each P. aeruginosa growth condition was repeated across 3 biological replicates and two batches (Fig 1B). The different P. aeruginosa and H. salinarum experimental designs with varying numbers of replicates at each level provides a rich test bed for modeling the impact of random effects with phenom (Fig 1B, S1 and S2 Figs).
A: Population growth measurements of H. salinarum under standard conditions (blue), and low (orange) and high (green) levels of OS. Individual measurement curves are replicates and each graph panel is a different batch. B: Growth of P. aeruginosa strain PA01 under gradient of pH (5–7) and citric acid (0–20 mM). Colors represent different batches.
Figs 1 and 2 demonstrate the two key issues described above and addressed in this paper. First, batch effects are present in both H. salinarum and the P. aeruginosa datasets. For H. salinarum, clear differences in growth under both standard and stress conditions are observed in the raw data across experimental batches (i.e. separate runs of the growth plate reader instrument; Fig 1). Some batches show a different phenotype, with either a complete cessation of growth or an intermediate effect with decreased growth relative to standard conditions. For example, in some batches, populations stressed with low OS grow at the same rate and reach the same carrying capacity as populations grown under standard conditions. For P. aeruginosa, a clear difference between batches grown under 10 mM citric acid at pH = 5.5 is observed [Fig 1B (graph in fourth column, third row) and Fig 2D]. Like with citric acid, batch effects were also found in some of the other conditions considered (e.g. growth under malic acid, S1 and S2 Figs).
(A) Parametric fits to H. salinarum growth curves. (B) Residuals of parametric growth curve fit. (C) Growth of H. salinarum under standard conditions (blue), low (orange) and high (green) OS across three batches. (D) Measurement of P. aeruginosa growth under 10mM citric acid at 5.5 pH. Measurements for each condition vary significantly with batch.
Second, standard parametric growth curve models fail to describe experimental measurements adequately (Fig 2A and 2B), as we have shown previously with both datasets [19, 22, 23]. In Fig 2, we examined the impact of batch and replicate effects on our data by considering how they change parameters estimated from a mixed effects parametric model of population growth . We focused on calculating μmax, the maximum instantaneous growth rate attained by the population, as this is a commonly used parameter for comparisons between conditions [19, 57]. Variation in μmax estimates were observed both on the replicate and batch level, as shown by the kernel density estimates (KDE) of μmax for each stress level (S3 Fig). The variance in μmax is remarkably high: the 95% confidence interval for μmax under standard growth is 0.050–0.141, a nearly 3-fold change between the lower and upper interval limits. Thus, while the t-test conducted on μmax estimates between standard conditions and each stress level is statistically significant (S3 Fig), it is difficult to conclude: (a) what the true magnitude of the stress effects may be; and (b) to what degree the variation due to replicate and batch should inform biological conclusions. The error of the logistic growth model under each PQ condition was also examined. Error increased under high OS (S4 Fig). High OS induces a growth phenotype that deviates heavily from the sigmoidal growth curve assumed in the logistic model as well as in other commonly used growth models. This leads to a poor fit under the high OS condition as has been shown previously (S4 Fig, ). The residuals under standard, low, and high OS conditions also appear to be dependent. Our previous work also demonstrated poor fits to the P. aeruginosa data using parametric models . Taken together, the initial assessment of these two datasets indicates that: (a) technical variation due to batch and replicate in growth curve data can be high; and (b) commonly used standard parametric models are not able to adequately capture or correct for these sources of variability. These sources of error need to be corrected in order to model true growth behavior and inform biological conclusions from the data.
A hierarchical Bayesian model of functional random effects in microbial growth
We previously established the ability of non-parametric Bayesian methods to improve the modeling of growth phenotypes [19, 22, 23]. Here, we describe phenom, a fully hierarchical Bayesian non-parametric functional mixed effects model for population growth data. We highlight the utility of phenom to correct for confounding, random effects in growth phenotypes.
In order to model both biological and technical variation in microbial growth (Fig 3), we first assume that a set of population growth measurements are driven by an (unobserved) population curve m(t) (Fig 3A, blue curve) of unknown shape. For example, m(t) might represent the average growth behavior of an organism under standard conditions. This mean growth behavior may be altered by a treatment effect, represented by an additional unknown curve δ(t) (Fig 3A, orange curve). For example δ(t) may represent the effects on growth induced by low or high levels of OS (Fig 2A). The average growth behavior of a population under stress conditions would then be described by the curve f(t) = m(t) + δ(t).
Representative diagram of hierarchical variation present in microbial growth data. Each tier of graphs represents a different variation source, and lines indicate relationship between them: experimental condition is the true growth behavior of interest, with the condition repeated across batches, and replicates repeated within each batch. (A) Functional phenotypes m(t) (blue), m(t) + δ(t) (orange), and δ(t) (green curve in inset). (B) Batch effects on m(t) and m(t) + δ(t). Each plot is a different batch, solid lines are the true functions as in (A), and the dashed lines are the observed batch effect of m(t) and m(t) + δ(t) for the corresponding batch. (C) Replicate effect within batches. Each axis is a different replicate, solid and dashed lines as in (B), dotted-dashed line is the observed replicate function. (D) Observations from the model described in (A-C). Each curve is sampled with a mean drawn from the global mean, with added batch and replicate effects (dotted-dashed lines in C) and iid observation noise. Each axis is a different batch. The smooth solid lines are the true functions m(t) and m(t) + δ(t) in (A).
When considering a combinatorial experimental design, such as that described for P. aeruginosa growth (Fig 1B), we model independent effects of different treatments as well as their interaction via the form (Eq (9)): (9)
Here, y(t, i, j) denotes the observed population size at time t with treatments i and j of two independent stress conditions. Additionally, αi(t) and βj(t) are the independent effects of each stress condition, and (αβ)i,j(t) is their interaction. This model corresponds to a functional analysis of variance , which we have previously used to estimate independent and interaction effects of microbial genetics and stress . For the analysis of P. aeruginosa, we model the effect of pH (αp for pH = p), organic acid combination (βc for concentration = c) and their interaction ((αβ)p,c), as well as their random functional effect equivalents (see Section “phenom: A hierarchical Gaussian process model of microbial growth”).
Variability around these fixed effect growth models is described by additional, random curves associated with two major sources of variation: batch and replicate (Fig 3B and 3C). Batches correspond to a single high-throughput growth experiment and replicates are the individual curve observations within a batch. Using phenom throughout this study, we only compare replicates that are contained within the same batch. This is due to the nested structure between batch and replicates (Fig 3). Noise due to both replicate and batch do not appear to be independent identically distributed (iid), as observed in the correlated residuals around the mean for each experimental variate (S5A and S5B Fig). Each observed growth curve is therefore described by a combination of the fixed effects and the corresponding batch and replicate effects (Fig 3D). Both replicate and batch variation are modeled as random effects because the variation due to both sources cannot be replicated, i.e. a specific batch effect cannot be purposefully re-introduced in subsequent experiments. Instead, these variates are assumed to be sampled from a latent distribution . Combining the fixed and random effects, we arrive at a mixed-effects model of microbial phenotypes.
We adopted a hierarchical Bayesian framework to model these mixed effects. In this framework, batch effects are described by a shared generative distribution, allowing them to take on distinct values while still pooling across replicates for accurately estimating the generating distribution . We use Gaussian process (GP) distributions for all groups in the model. GPs are flexible, non-parametric distributions suitable for smooth functions . To assess the impact of incorporating random effects on estimation of the treatment effect of interest, we analyze three models of increasing complexity: Mnull excludes all hierarchical random effects, Mbatch incorporates batch variation only, and Mfull incorporates both batch and replicate variation. These models, collectively called phenom, were implemented using the probabilistic programming language Stan  to perform Bayesian statistical inference for all unknown functions and model parameters through Hamiltonian Monte Carlo sampling (see Materials and methods).
In previous work  we identified and corrected for batch effects in a single transcription factor mutant’s stress response, but this model did not provide an explicit deconstruction of batch effects between different factors (e.g. strain and stress) and could therefore not determine which factors were most strongly impacted by batch effects. Moreover, this approach utilized a standard GP regression framework, which has well-established limitations on dataset size, limiting its applicability to the large datasets we consider here. In reference  we described a functional ANOVA model for microbial growth phenotypes, which corresponds to the Mnull model in the phenom case. Again, a global batch effects term was included but individual batch effects were not modeled, and the computational approach utilized (Gibbs sampling) was prohibitively slow for the complete phenom model. phenom represents a significant advance on these previous modeling approaches and computational methodologies.
In order to demonstrate the impact of batch effects on the conclusions drawn from the analysis of microbial growth data, we estimated the latent functions driving both H. salinarum and P. aeruginosa growth using the Mnull model of phenom, with each batch analyzed separately (Fig 4). This corresponds to the analysis that would be conducted after generating any single set of experiments from a batch, without considering or controlling for batch effects, and therefore provides a test of the impact of ignoring batch effects.
Posterior intervals of functions are shown for different analyses where phenom Mnull was fit using data from each batch separately. In all plots, solid line represents posterior mean, shaded region indicates 95% credible region, and each color corresponds to a different posterior conditioned on data from a single batch. (A) Posterior intervals of m(t) under standard growth, and δ(t) under low and high OS response for H. salinarum. (B) Posterior interval of interaction function (αβ)p,c(t) for P. aeruginosa growth in indicated pH and acid concentration.
For H. salinarum, growth data under standard conditions was used to estimate a single mean function, m(t), and fixed effects were estimated for differential growth under low and high OS as δ(t) (Fig 4A). For the P. aeruginosa dataset, batch effects on the interaction between pH and organic acid concentration was represented by a function (αβ)p,c(t), again estimated non-parametrically (Fig 4B). However, rather than reporting (αβ)p,c(t) directly, we report its time derivative, which has the interpretation of instantaneous growth rate rather than absolute amount of growth, and provides an alternative metric for assessing the significance of a treatment effect on growth . Specifically, assessing growth curve models can benefit from the estimates of derivatives as they may more accurately represent the differences between growth curves .
Fitting the Mnull model to each separate batch reveals that the posterior distributions obtained for each function of interest (m(t), δ(t), and (αβ)p,c(t)) are highly variable across batches (Fig 4). This is observed in both the H. salinarum and P. aeruginosa datasets, where the experimental conditions, and therefore the underlying true mean functions, remain constant across batches in each case. Such variability can impact conclusions. We specifically assess the changes in statistically significant treatment effects, i.e. at time points where the effect (δ(t) or (αβ)p,c(t)) has a 95% posterior credible interval excluding zero, indicating high confidence that the treatment effect at that time-point differs from the control. For example, in the low OS condition in the H. salinarum dataset, both the statistical significance of δ(t) and the sign (improved vs. impaired growth) differs between batches (Fig 4A, center). Additionally, the effect of low oxidative stress at time zero is estimated to be non-zero for many of the batches. This is due to technical variation that introduces an artificial offset in OD measurements at the beginning of the growth experiment. Such variation can arise from various factors, including variation between growth state in starter cultures and technical variation in plate reader measurements at low OD (Fig 1A). A similar batch variability was observed under high OS, but due to the stronger effect of the stress perturbation, estimates of δ(t) are less affected by batch and replicate variation (Fig 4A, right). Similarly, the batch variability observed in the raw P. aeruginosa growth data (Fig 1B) results in significantly different posterior estimates of the interaction effect (αβ)p,c(t) across batches, as seen by the lack of overlap between 95% credible intervals (Fig 4B). Differences observed include the timing and length of negative growth impact (benzoate and citric acid), and completely opposite effects with either strong or no interaction (malic acid). In addition, the posterior variance of each function, which indicates the level of uncertainty remaining, is low for each batch modeled separately. This indicates high confidence in the estimated function despite observed differences across batches. These analyses suggest that use of a single experimental batch leads to overconfidence in explaining the true underlying growth behavior.
Hierarchical models correct for batch effects in growth data
To demonstrate the use of phenom to combat the impact of batch effects on growth curve analysis, we combined data across all batches and performed the analysis using each of the Mnull, Mbatch, and Mfull models (Fig 5). Estimates of m(t) between each model were largely similar, likely due to the abundance of data present to estimate this variable (S6 Fig). Instead, we focus on the estimates of δ(t) for low and high OS response of H. salinarum (Fig 5A) and the interaction (αβ)p,c between pH and OA concentration effects on P. aeruginosa growth (Fig 5B). In cases where Mnull differs from Mbatch and Mfull, this indicates an inability for this model to correctly represent the uncertainty due to random effects in the data, which have been shown to be prevalent across different batches of experiments (Fig 4).
Posterior intervals of functions estimated by models of increasing hierarchical complexity: Mnull (blue), Mbatch (orange), and Mfull (green). Solid line indicates posterior mean and shaded regions indicate 95% credible regions. We also plot Mnull and Mfull in separate axes to highlight the posterior intervals estimated by these models. (A) Posterior interval of δ(t) for low (left) and high (right) OS response by H. salinarum. (B) Posterior interval of interaction function (αβ)p,c(t) for P. aeruginosa growth in indicated pH and organic acid concentration.
Growth impairment in the presence of low OS relative to standard conditions (i.e. δ(t)) is estimated to be significant during the time points of ∼0–13 and ∼19–40 hours under Mnull. In contrast, only time points ∼23–31 are significantly non-zero under Mbatch, and no significant effect is identified under Mfull(Fig 5A, left). Conversely, due to the stronger stress effect in the high OS condition (Fig 5A, right), estimates of δ(t) were significantly non-zero under all three models, with only minor differences between the three model estimates. This highlights the importance of controlling for batch and replicate variability as in Mfull: even when estimating the low OS treatment effect under Mnull with all available data, without accounting for batch and replicate random effects the posterior estimates of δ(t) are overconfident and do not accurately represent the uncertainty with respect to the true treatment effect. The lack of significance under Mfull suggests that additional data are needed to confidently identify the true treatment effect in the presence of batch and replicate variation. Modeling the batch effects has also corrected for variability in treatment effects due to technical variation at the inoculation of the growth plate (time zero of the experiments) (S7 Fig).
The impact of modeling hierarchical variation on estimating interaction effects in P. aeruginosa growth was condition dependent (Fig 5B). Across all conditions, a decrease in posterior certainty on the true shape of the underlying function was again observed under Mbatch and Mfull. For benzoate and malic acid, the interaction between pH and acid concentration no longer appears to be a significant effect after accounting for batch and replicate variation, while the larger interaction under citric acid remains significant. As in the comparison of oxidative stress treatments in H. salinarum, stronger effect sizes are required to be confidently distinguished in the face of batch and replicate variability. Finally, the relative conclusions made for the absolute function scale are comparable to those of the derivative estimates for P. aeruginosa, highlighting the flexibility with which treatment effects can be analyzed as most relevant to the researcher (S8 Fig).
For both H. salinarum response to OS and P. aeruginosa growth under pH and OA exposure, an increase in posterior variance was observed under Mbatch and Mfull compared to Mnull (S9 Fig). However, posterior variance of δ(t) in the H. salinarum OS response was higher under Mbatch compared to Mfull. In this case, controlling for replicate effects appears to increase the signal needed to identify δ(t). In contrast, these variances are equal in the P. aeruginosa data, indicating that the relative improvement in variance afforded by modeling batch vs. replicate effects may be dataset dependent.
Variance components demonstrate the importance of controlling for batch effects
Variance components, which correspond to the estimated variance of each effect in the model, can be used to compare the impact each group has on the process of interest . To better understand sources of variability in growth curve studies, we used phenom to estimate the variance components for each dataset above. In our hierarchical non-parametric setup, these variance components are the variance hyperparameters (e.g. σ2) of the Gaussian process kernels for each fixed and random effect group. These parameters control the magnitude of function fluctuations modeled by the GP distribution. Larger variance implies higher effect sizes and therefore a larger impact on the observations.
We show the value of variance components by considering the effects identified by Mfull for H. salinarum under low OS (Fig 6). The variance of the data is partitioned between the mean growth (m(t)), the OS (δ(t)), batch effects (batch curves of m(t) and δ(t)), biological noise (e.g. replicate variability) and instrument noise (). This analysis confirms that batch effects, compared to the other sources of experimental variability in the dataset (replicate noise and measurement error), are between 2 to 10 times more impactful on the phenotype measurements. Additionally, variance components enable comparisons between the experimental and treatment factors in the data. Of particular note is that the variance of the treatment of interest, δ(t), and the batch effects are similar in magnitude, at least in the case of a low-magnitude stress such as 0.083 PQ for H. salinarum. This suggests that proper modeling of this treatment requires both sufficient batch replication and accurate modeling of batch effects in those data. Future studies of similar phenotypes can be guided by these estimates in experimental design, choosing an appropriate batch replication for the degree of noise expected . However, the extent of replication required may depend upon the dataset (factorial design, treatment severity, etc). Taken together, variance components provide an aggregated view of the contribution by various factors and guide future experimentation. Iterative rounds of phenom analysis and growth experiments can then determine the best suited designs, for example by leveraging the estimated batch effect variance from pilot experiments to determine the number of batch replications necessary to reliably estimate a treatment effect of given magnitude. The use of phenom for such formal statistical experimental design calculations represents an exciting direction for future work.
Posterior intervals are shown for the kernel variance hyperparameter for different groups of effects from phenom estimated on H. salinarum growth under low OS. Groups correspond to m(t) (mean), δ(t) (stress), batch effects (batch), replicate noise (biological), and measurement error (noise).
We have provided a framework to test and control for random effects in microbial growth data using the hierarchical non-parametric Bayesian model, phenom (Fig 3). Analysis with phenom indicates that random effects (both batch and replicate) appear in the two microbial population growth datasets studied here, and constitute significant portions of the variability (Fig 1). Failure to correct for these effects confounds the interpretation of growth phenotypes for factors of interest in a large scale phenotyping analysis (Fig 4). phenom controls for these random effects and provides accurate estimates of the growth behavior of interest (Fig 5). Additionally, phenom can be used to estimate variance components, providing information about the relative impact of various sources of noise in the data (Fig 6). Controlling for batch effects in these datasets was therefore key to making accurate biological conclusions.
Related fields of functional genomics, such as transcriptomics, have seen considerable interest in controlling for different experimental sources of variation, broadly labeled as batch effects [28, 62–67]. These studies have shown that differences between batches first need to be corrected to avoid erroneous conclusions . Here we have shown that, like in transcriptomics data, controlling for sources of variation in phenomics data—particularly due to batch—are an important step in making accurate biological conclusions regarding population growth. Additionally, the use of random batch effects in phenom highlights cases where additional information may be gained by further experimentation. Specifically, in cases where treatment effects differ strongly across batches, there may be underlying biological differences driving the variation. Follow-up experimental designs can then aim to delineate these effects directly in a way not confounded by batch. phenom establishes a complete and general method of controlling batch effects in microbial growth phenotypes, overcoming significant weaknesses of previously developed techniques.
Although we focus here on replicate and batch variation, the phenom model is easily extended to incorporate alternative or additional random and fixed effects appropriate for settings with other sources of variation. For example, depending on the experimental design, phenom could control for variation among labs, experimental material, culture history, or genetic background [25, 69–75]. phenom flexibly incorporates additional sources of variation and/or interaction between design variables, as demonstrated with the two different designs analyzed for H. salinarum and P. aeruginosa here. This flexibility allows phenom to be applied to control for many sources of technical variation within microbial population growth data, thereby improving the analysis and resulting conclusions regarding quantitative microbial phenotypes. We therefore expect our model to find broad applications in fields such as bioprocess control, microbial bioengineering, and microbial physiology.
S1 Fig. P. aeruginosa growth under benzoate and pH gradient.
Growth of P. aeruginosa strain PAO1 under gradient of pH (7–5) and benzoate (0–20). Colors represent different batches.
S2 Fig. P. aeruginosa growth under malic acid and pH gradient.
Growth of P. aeruginosa strain PAO1 under gradient of pH (7–5) and malic acid (0–20). Colors represent different batches.
S3 Fig. KDE of μmax for H. salinarum growth across batches.
Crosses indicate significant difference between μmax standard conditions and each OS level (one-sided t-test, p < 0.05).
S4 Fig. Error in parametric growth models.
Distribution of error (MSE) for each condition when fit with a logistic growth curve. The box show shows the inter-quartile range, red line is the median, whiskers show the 1.5 inter-quartile range, and the individual points are outliers.
S5 Fig. Residual structure of microbial growth data across batches.
(A) Individual replicate curve residuals around the mean of the respective batch. Only standard conditions are shown. (B) Residual of the mean behavior for each batch around the global mean (standard condition only).
S6 Fig. Posterior comparison of m(t) for H. salinarum growth across batches.
Posterior interval of m(t) for H. salinarum standard growth.
S7 Fig. Posterior intervals of low oxidative stress batch effects estimated from Mfull.
Full estimates of the δ(t) batch effect under Mfull are shown, with solid lines representing posterior mean and shaded region representing 95% credible intervals (left). 95% posterior credible interval of batch effects for δ(t) at time zero are shown (right), with crosses marking posterior means. Many of the batch effects for δ(t) are estimated to be non-zero at the start of the experiment, reflecting the impact of technical variation in the high-throughput readings at the start of the growth curves.
S8 Fig. Posterior intervals of interactions for P. aeruginosa on an absolute growth scale.
S9 Fig. Posterior variance of function estimates under different models.
Each plot shows the posterior variance of a function at each time point under each of Mbatch and Mfull versus Mnull. (A) δ(t) estimated for H. salinarum growth under low (left) and high (right) OS. (B) (αβ)p,c(t) at pH = 5, mM malic acid = 10.
- 1. Darnell CL, Schmid AK. Systems biology approaches to defining transcription regulatory networks in halophilic archaea. Methods. 2015;86:102–114. pmid:25976837
- 2. Woodruff LBA, Boyle NR, Gill RT. Engineering improved ethanol production in Escherichia coli with a genome-wide approach. Metabolic Engineering. 2013;17:1–11. pmid:23376594
- 3. Aryani DC, den Besten HMW, Hazeleger WC, Zwietering MH. Quantifying strain variability in modeling growth of Listeria monocytogenes. International Journal of Food Microbiology. 2015;208:19–29. pmid:26011600
- 4. Gray AN, Koo BM, Shiver AL, Peters JM, Osadnik H, Gross CA. High-throughput bacterial functional genomics in the sequencing era. Curr Opin Microbiol. 2015;27:86–95. pmid:26336012
- 5. Peters JM, Colavin A, Shi H, Czarny TL, Larson MH, Wong S, et al. A Comprehensive, CRISPR-based Functional Analysis of Essential Genes in Bacteria. Cell. 2016;165(6):1493–1506. pmid:27238023
- 6. Koo BM, Kritikos G, Farelli JD, Todor H, Tong K, Kimsey H, et al. Construction and Analysis of Two Genome-Scale Deletion Libraries for Bacillus subtilis. Cell Syst. 2017;4(3):291–305. pmid:28189581
- 7. Baba T, Ara T, Hasegawa M, Takai Y, Okumura Y, Baba M, et al. Construction of Escherichia coli K-12 in-frame, single-gene knockout mutants: the Keio collection. Mol Syst Biol. 2006;2:2006.0008. pmid:16738554
- 8. Schaechter M. From growth physiology to systems biology. International Microbiology: The Official Journal of the Spanish Society for Microbiology. 2006;9(3):157–161. pmid:17061205
- 9. Egli T. Microbial growth and physiology: a call for better craftsmanship. Frontiers in Microbiology. 2015;6. pmid:25926822
- 10. Neidhardt FC. Apples, oranges and unknown fruit. Nature Reviews Microbiology. 2006;4(12):876–876. pmid:17120340
- 11. Zwietering MH, Jongenburger I, Rombouts FM, van’t Riet K. Modeling of the Bacterial Growth Curve. Applied and Environmental Microbiology. 1990;56(6):1875–1881. pmid:16348228
- 12. Pouillot R, Albert I, Cornu M, Denis JB. Estimation of uncertainty and variability in bacterial growth using Bayesian inference. Application to Listeria monocytogenes. International Journal of Food Microbiology. 2003;81(2):87–104. pmid:12457583
- 13. Nauta MJ. Separation of uncertainty and variability in quantitative microbial risk assessment models. International Journal of Food Microbiology. 2000;57(1–2):9–18.
- 14. Aryani DC, Besten HMWd, Zwietering MH. Quantifying Variability in Growth and Thermal Inactivation Kinetics of Lactobacillus plantarum. Applied and Environmental Microbiology. 2016;82(16):4896–4908. pmid:27260362
- 15. Lianou A, Koutsoumanis KP. Strain variability of the behavior of foodborne bacterial pathogens: A review. International Journal of Food Microbiology. 2013;167(3):310–321. pmid:24184609
- 16. Arroyo-López FN, Orlić S, Querol A, Barrio E. Effects of temperature, pH and sugar concentration on the growth parameters of Saccharomyces cerevisiae, S. kudriavzevii and their interspecific hybrid. International journal of food microbiology. 2009;131(2-3):120–127. pmid:19246112
- 17. Liu B, Liu H, Pan Y, Xie J, Zhao Y. Comparison of the Effects of Environmental Parameters on the Growth Variability of Vibrio parahaemolyticus Coupled with Strain Sources and Genotypes Analyses. Frontiers in Microbiology. 2016;7. pmid:27446034
- 18. Carlin F, Albagnac C, Rida A, Guinebretière MH, Couvert O, Nguyen-the C. Variation of cardinal growth parameters and growth limits according to phylogenetic affiliation in the Bacillus cereus Group. Consequences for risk assessment. Food Microbiology. 2013;33(1):69–76. pmid:23122503
- 19. Tonner PD, Darnell CL, Engelhardt BE, Schmid AK. Detecting differential growth of microbial populations with Gaussian process regression. Genome Research. 2017;27:320–333. pmid:27864351
- 20. Peleg M, Corradini M. Microbial growth curves: what the models tell us and what they cannot. Critical Reviews in Food Science and Nutrition. 2011;51(10):917–945. pmid:21955092
- 21. Peleg M, Normand MD, Corradini MG. The Arrhenius Equation Revisited. Critical Reviews in Food Science and Nutrition. 2012;52(9):830–851. pmid:22698273
- 22. Darnell CL, Tonner PD, Schmidler S, Schmid AK. Systematic discovery of archaeal transcription factor functions in regulatory networks through quantitative phenotyping analysis. mSystems. 2017;2:e00032–17. pmid:28951888
- 23. Bushell FML, Tonner PD, Jabbari S, Schmid AK, Lund PA. Synergistic Impacts of Organic Acids and pH on Growth of Pseudomonas aeruginosa: A Comparison of Parametric and Bayesian Non-parametric Methods to Model Growth. Frontiers in Microbiology. 2019;9:3196. pmid:30671033
- 24. Searle SR, Casella G, McCulloch CE. Variance Components. Wiley Series in Probability and Statistics. Wiley; 1992. Available from: https://books.google.com/books?id=CWcPAQAAMAAJ.
- 25. Van Derlinden E, Lule I, Boons K, Van Impe J. On the influence of the experimental set-up on the heterogeneous heat response of E. coli K12. Procedia Food Science. 2011;1:1067–1072.
- 26. Churchill GA. Fundamentals of experimental design for cDNA microarrays. Nature Genetics. 2002;32:490–495. pmid:12454643
- 27. Baryshnikova A, Costanzo M, Kim Y, Ding H, Koh J, Toufighi K, et al. Quantitative analysis of fitness and genetic interactions in yeast on a genome scale. Nature Methods. 2010;7(12):1017. pmid:21076421
- 28. Leek JT, Scharpf RB, Bravo HC, Simcha D, Langmead B, Johnson WE, et al. Tackling the widespread and critical impact of batch effects in high-throughput data. Nature Reviews Genetics. 2010;11(10):733+. pmid:20838408
- 29. Fisher RA. XV.—The Correlation between Relatives on the Supposition of Mendelian Inheritance. Transactions of the Royal Society of Edinburgh. 1919;52(2):399–433.
- 30. Marchini J, Cardon LR, Phillips MS, Donnelly P. The effects of human population structure on large genetic association studies. Nature Genetics. 2004;36:512. pmid:15052271
- 31. Spor A, Dillmann C, Wang S, de V, Sicard D, Parent E. Hierarchical bayesian modelling for Saccharomyces cerevisiae population dynamics. International Journal of Food Microbiology. 2010;142(1-2):25–35. pmid:20576304
- 32. Heydari J, Lawless C, Lydall DA, Wilkinson DJ. Bayesian hierarchical modelling for inferring genetic interactions in yeast. Journal of the Royal Statistical Society Series C, Applied Statistics. 2016;65(3):367–393. pmid:27134314
- 33. Reimherr M, Nicolae D. Estimating Variance Components in Functional Linear Models With Applications to Genetic Heritability. Journal of the American Statistical Association. 2016;111(513):407–422.
- 34. Ng WV, Kennedy SP, Mahairas GG, Berquist B, Pan M, Shukla HD, et al. Genome sequence of Halobacterium species NRC-1. Proceedings of the National Academy of Sciences. 2000;97(22):12176–12181. pmid:11016950
- 35. Bonneau R, Facciotti MT, Reiss DJ, Schmid AK, Pan M, Kaur A, et al. A Predictive Model for Transcriptional Control of Physiology in a Free Living Cell. Cell. 2007;131(7):1354–1365. pmid:18160043
- 36. Brooks AN, Reiss DJ, Allard A, Wu WJ, Salvanha DM, Plaisier CL, et al. A system-level model for the microbial regulatory genome. Molecular Systems Biology. 2014;10(7):740. pmid:25028489
- 37. Imlay JA. Pathways of oxidative damage. Annual Review of Microbiology. 2003;57:395–418. pmid:14527285
- 38. Imlay JA. The molecular mechanisms and physiological consequences of oxidative stress: lessons from a model bacterium. Nature Reviews Microbiology. 2013;11(7):443–454. pmid:23712352
- 39. Wu Y, Vulić M, Keren I, Lewis K. Role of oxidative stress in persister tolerance. Antimicrobial Agents and Chemotherapy. 2012;56(9):4922–4926. pmid:22777047
- 40. Tonner PD, Pittman AMC, Gulli JG, Sharma K, Schmid AK. A Regulatory Hierarchy Controls the Dynamic Transcriptional Response to Extreme Oxidative Stress in Archaea. PLoS Genet. 2015;11(1):e1004912. pmid:25569531
- 41. Sharma K, Gillum N, Boyd JL, Schmid A. The RosR transcription factor is required for gene expression dynamics in response to extreme oxidative stress in a hypersaline-adapted archaeon. BMC genomics. 2012;13:351. pmid:22846541
- 42. Kaur A, Van PT, Busch CR, Robinson CK, Pan M, Pang WL, et al. Coordination of frontline defense mechanisms under severe oxidative stress. Molecular Systems Biology. 2010;6:393. pmid:20664639
- 43. Soppa J. From genomes to function: haloarchaea as model organisms. Microbiology. 2006;152(3):585–590. pmid:16514139
- 44. Halstead FD, Rauf M, Moiemen NS, Bamford A, Wearn CM, Fraise AP, et al. The Antibacterial Activity of Acetic Acid against Biofilm-Producing Pathogens of Relevance to Burns Patients. PLoS ONE. 2015;10(9):e0136190. pmid:26352256
- 45. Mira NP, Teixeira MC. Microbial mechanisms of tolerance to weak acids: an overview. Frontiers E-books; 2014.
- 46. Zakrzewska A, Eikenhorst Gv, Burggraaff JEC, Vis DJ, Hoefsloot H, Delneri D, et al. Genome-wide analysis of yeast stress survival and tolerance acquisition to analyze the central trade-off between growth rate and cellular robustness. Molecular Biology of the Cell. 2011;22(22):4435–4446. pmid:21965291
- 47. Altıntaş A, Martini J, Mortensen UH, Workman CT. Quantification of oxidative stress phenotypes based on high-throughput growth profiling of protein kinase and phosphatase knockouts. FEMS yeast research. 2016;16(1):fov101. pmid:26564984
- 48. Van Derlinden E, Lule I, Bernaerts K, Van Impe Jf. Quantifying the heterogeneous heat response of Escherichia coli under dynamic temperatures. Journal of Applied Microbiology. 2010;108(4):1123–1135. pmid:19735318
- 49. Ihssen J, Egli T. Specific growth rate and not cell density controls the general stress response in Escherichia coli. Microbiology. 2004;150(Pt 6):1637–1648. pmid:15184550
- 50. Peck RF, Dassarma S, Krebs MP. Homologous gene knockout in the archaeon Halobacterium salinarum with ura3 as a counterselectable marker. Mol Microbiol. 2000;35(3):667–76. pmid:10672188
- 51. Kahm M, Hasenbrink G, Lichtenberg-Fraté H, Ludwig J, Kschischo M. grofit: Fitting Biological Growth Curves with R. Journal of Statistical Software, Articles. 2010;33(7):1–21.
- 52. Jones E, Oliphant T, Peterson P, et al. SciPy: Open source scientific tools for Python; 2001. Available from: http://www.scipy.org/.
- 53. Rasmussen CE, Williams CKI. Gaussian Processes for Machine Learning. University Press Group Limited; 2006.
- 54. Kaufman CG, Sain SR. Bayesian functional ANOVA modeling using Gaussian process prior distributions. Bayesian Analysis. 2010;5(1):123–149.
- 55. Carpenter B, Gelman A, Hoffman M, Lee D, Goodrich B, Betancourt M, et al. Stan: A Probabilistic Programming Language. Journal of Statistical Software, Articles. 2017;76(1):1–32.
- 56. Brooks SP, Gelman A. General Methods for Monitoring Convergence of Iterative Simulations. Journal of Computational and Graphical Statistics. 1998;7(4):434–455.
- 57. Beer KD, Wurtmann EJ, Pinel N, Baliga NS. Model organisms retain an “ecological memory” of complex ecologically relevant environmental variation. Applied and Environmental Microbiology. 2014;80(6):1821–1831. pmid:24413600
- 58. Ramsay J, Silverman BW. Functional Data Analysis (Springer Series in Statistics). Springer; 2005. Available from: https://www.amazon.com/Functional-Data-Analysis-Springer-Statistics/dp/038740080X?SubscriptionId=AKIAIOBINVZYXZQZ2U3A&tag=chimbori05-20&linkCode=xm2&camp=2025&creative=165953&creativeASIN=038740080X.
- 59. Gelman A. Analysis of variance—why it is more important than ever. The Annals of Statistics. 2005;33(1):1–53.
- 60. Gelman A, Carlin JB, Stern HS, Dunson DB, Vehtari A, Rubin DB. Bayesian Data Analysis, Third Edition. 3rd ed. Boca Raton: Chapman and Hall/CRC; 2013.
- 61. Solak E, Murray-Smith R, Leithead WE, Leith DJ, Rasmussen CE. Derivative Observations in Gaussian Process Models of Dynamic Systems. In: Proceedings of the 15th International Conference on Neural Information Processing Systems. NIPS’02. Cambridge, MA, USA: MIT Press; 2002. p. 1057–1064. Available from: http://dl.acm.org/citation.cfm?id=2968618.2968750.
- 62. Hu J, Coombes KR, Morris JS, Baggerly KA. The importance of experimental design in proteomic mass spectrometry experiments: some cautionary tales. Briefings in Functional Genomics & Proteomics. 2005;3(4):322–331. pmid:15814023
- 63. Nicholson G, Holmes C. A note on statistical repeatability and study design for high-throughput assays. Statistics in Medicine. 2017;36(5):790–798. pmid:27882571
- 64. Mecham BH, Nelson PS, Storey JD. Supervised normalization of microarrays. Bioinformatics. 2010;26(10):1308–1315. pmid:20363728
- 65. Leek JT, Storey JD. Capturing Heterogeneity in Gene Expression Studies by Surrogate Variable Analysis. PLoS Genetics. 2007;3(9):e161. pmid:17907809
- 66. Ioannidis JPA. Microarrays and molecular research: noise discovery? Lancet (London, England). 2005;365(9458):454–455. pmid:15705441
- 67. Teschendorff AE, Zhuang J, Widschwendter M. Independent surrogate variable analysis to deconvolve confounding factors in large-scale microarray profiling studies. Bioinformatics. 2011;27(11):1496–1505. pmid:21471010
- 68. Johnson WE, Li C, Rabinovic A. Adjusting batch effects in microarray expression data using empirical Bayes methods. Biostatistics. 2007;8(1):118–127. pmid:16632515
- 69. Delignette-Muller ML, Rosso L. Biological variability and exposure assessment. International Journal of Food Microbiology. 2000;58(3):203–212. pmid:10939270
- 70. Koutsoumanis KP, Sofos JN. Effect of inoculum size on the combined temperature, pH and Aw limits for growth of Listeria monocytogenes. International Journal of Food Microbiology. 2005;104(1):83–91. pmid:16005535
- 71. Ryall B, Eydallin G, Ferenci T. Culture history and population heterogeneity as determinants of bacterial adaptation: the adaptomics of a single environmental transition. Microbiology and molecular biology reviews: MMBR. 2012;76(3):597–625. pmid:22933562
- 72. Jaloustre S, Guillier L, Morelli E, Noël V, Delignette-Muller ML. Modeling of Clostridium perfringens vegetative cell inactivation in beef-in-sauce products: A meta-analysis using mixed linear models. International Journal of Food Microbiology. 2012;154(1):44–51. pmid:22236760
- 73. Cao R, Francisco-Fernández M, Quinto EJ. A random effect multiplicative heteroscedastic model for bacterial growth. BMC Bioinformatics. 2010;11:77. pmid:20141635
- 74. Bradley MD, Neu D, Bahar F, Welch RD. Inter-laboratory evolution of a model organism and its epistatic effects on mutagenesis screens. Scientific Reports. 2016;6.
- 75. Draper JL, Hansen LM, Bernick DL, Abedrabbo S, Underwood JG, Kong N, et al. Fallacy of the Unique Genome: Sequence Diversity within Single Helicobacter pylori Strains. mBio. 2017;8(1). pmid:28223462