A Bayesian non-parametric mixed-effects model of microbial growth curves

Substantive changes in gene expression, metabolism, and the proteome are manifested in overall changes in microbial population growth. Quantifying how microbes grow is therefore fundamental to areas such as genetics, bioengineering, and food safety. Traditional parametric growth curve models capture the population growth behavior through a set of summarizing parameters. However, estimation of these parameters from data is confounded by random effects such as experimental variability, batch effects or differences in experimental material. A systematic statistical method to identify and correct for such confounding effects in population growth data is not currently available. Further, our previous work has demonstrated that parametric models are insufficient to explain and predict microbial response under non-standard growth conditions. Here we develop a hierarchical Bayesian non-parametric model of population growth that identifies the latent growth behavior and response to perturbation, while simultaneously correcting for random effects in the data. This model enables more accurate estimates of the biological effect of interest, while better accounting for the uncertainty due to technical variation. Additionally, modeling hierarchical variation provides estimates of the relative impact of various confounding effects on measured population growth.

Quantifying how microbes grow in response to stress is required for effective treatment of microbial infections, food safety, and understanding the effects of environmental change. Current models that quantify microbial growth characteristics such as exponential growth rate are based on assumptions that microbial growth curves will adopt a sigmoid form with characteristic lag, logarithmic, and stationary phases. These models are therefore inaccurate when applied to microbes growing under stress. Substantial variability across experiments that measure microbial growth further compounds the issue. Here we report a new statistical model freed from the assumption of optimum growth. This

Introduction
Microbial growth phenotypes inform studies in microbiology, including gene functional discovery, bioengineering process development, and food safety testing [1][2][3]. For example, recent advances in microbial functional genomics and phenotyping, or "phenomics", have enabled transformative insights into gene functions, proving critical for mapping the genotype to phenotype relationship [4]. Methods such as genome-wide CRISPRi [5] and targeted genome-scale deletion libraries [6,7] frequently rely upon accurate quantitation of microbial population growth as an assay to identify novel mutants with significant growth phenotypes. Population growth, as measured by the growth curve of a microbial culture, is an aggregate measure of all cellular processes and captures how microbial cells adapt and survive in their environmental niche [8]. Because microbial culturing is a necessary precursor to many experimental procedures in microbiology [9], reproducible results require accurate quantification of the variability in culture state measured through growth [9,10]. Typical analyses of microbial population growth involve estimating parametric models under the assumptions of standard growth conditions comprised of three successive growth phases: (1) lag phase, in which the population adapts to a new environment, typically fresh growth medium at culture inoculation; (2) log phase, when the population grows exponentially at a rate dependent on nutrients in the environment; and (3) stationary phase, where measurable population growth terminates thereby reaching the culture carrying capacity [11]. Recent studies have shown that the estimates of parameters in these models are highly uncertain [12][13][14]. This uncertainty arises both from factors of biological interest, such as differences in genetic background and environment, as well as uncontrolled technical noise from experimental manipulation of microbial cultures. While such sources of variability can be modeled using fixed and random effects [15][16][17][18][19], parametric population growth models have additional limitations. Specifically, the parametric assumptions of these growth models require that growth measurements match the sigmoid shape expected for the growth curve under optimum conditions. When population growth deviates from the standard sigmoid shape assumed in these models, model extensions must be developed on a case by case basis for each new experimental perturbation [20,21]. Additionally, we have shown in previous work that in cases such as extreme stress or strongly deleterious mutations, no parametric growth model accurately represents the growth curve, regardless of extension [19,22,23].
Factors affecting microbial growth measurements include both fixed and random effects [24]. Fixed effects are assumed to be drawn from a finite set of perturbations of interest, for example the effect of different concentrations of a chemical on growth that are entirely represented in the dataset. Random effects, conversely, can be viewed as a random sample from a larger set of interest. For example, repeating the same design over many experiments corresponds to sampling the random experimental effect from the theoretical set of all possible experiments that could be conducted with this design [3,25]. Random effects arising from repeated experimental design are typically referred to as batch effects [26,27]. Batch effects are often a significant component of measurement noise in high-throughput genomics experiments [28]. However, random effects are not always due to experimental noise, and may represent quantities of direct scientific interest; for example, assaying a set of genetic backgrounds may be viewed as sampling from the set of all possible genetic variants [29][30][31][32][33]. Models which include both fixed and random effects are referred to as mixed effects models.
In this study we present phenom, a general model for analysis of phenomic growth curve experiments based on a Bayesian non-parametric functional mixed effects model of microbial growth. We demonstrate the utility of phenom to analyze population growth measurements of two microorganisms: the hypersaline adapted archaeon, Halobacterium salinarum; and the opportunistic bacterial pathogen, Pseudomonas aeruginosa. H. salinarum is a model organism for transcriptional regulation of stress response in the domain of life Archaea [34][35][36]. H. salinarum is particularly well adapted to resisting oxidative stress (OS), which arises from the buildup of reactive oxygen species and causes damage to many critical cellular components, including DNA, protein, and lipids [37][38][39][40][41][42][43]. Population growth measurements of H. salinarum under OS have been used previously to quantify these harmful effects on physiology, as well as identify regulatory factors important for OS survival [22,[40][41][42]. The presence of batch effects in H. salinarum OS response was reported (and corrected for) previously [19], but these efforts did not include modeling of individual batch effects for each term in the model. This motivated the explicit deconstruction of batch effects between different factors (e.g. strain and stress), which we have implemented in phenom and reported here.
Pseudomonas aeruginosa is an opportunistic microbial pathogen and a growing problem in hospital-borne infections. Rising antimicrobial resistance of these organisms has necessitated the development of alternative treatment strategies. For example, topical treatment of infected burn wounds with acetic or organic acids (OAs) has been successful [44]. OA impact on growth depends on external pH levels-in acidic environments the OA does not dissociate, but rather freely traverses the cellular membrane as an uncharged particle. Within the neutral cytoplasm, the OA dissociates, and the protons released induce acid stress [45]. Here we apply phenom to the P. aeruginosa dataset, which is foundational for a larger study of P. aeruginosa strains responding to pH and OA perturbation as a potential novel treatment of pathogenic bacterial infections [23].
Stress occurs constantly in the environment: as conditions change, mild to severe cellular damage occurs, and cells must regulate their molecular components to survive [46][47][48][49]. Population growth measurements are particularly vital to the study of stress response by providing a quantitative measure of growth differences against a non-stressed control [1]. Our model recovers fixed effects due to high and low levels of OS in H. salinarum and interactions between organic acid concentration and pH in P. aeruginosa. Random effects from multiple sources are corrected, thus enabling more accurate estimates of the biological significance of the stress treatment effect. Notably, in cases where random effect and fixed effect sizes are comparable, we demonstrate that mixed modeling is critical for accurate quantification of model uncertainty. If random effects are not included in the model, the significance of the effect of stress treatments on population growth can be erroneously overestimated. We discuss the implications of these findings for multiple areas of microbiology research.

Experimental growth data
H. salinarum growth was performed as described previously [22]. Briefly, starter cultures of H. salinarum NRC-1 Δura3 control strain [50] were recovered from frozen stock and streaked on solid medium for single colonies. Four individual colonies per strain were grown at 42˚C with shaking at 225 r.p.m. to an optical density at 600 nm (OD 600 ) �1.8-2.0 in 3 mL of Complete Medium (CM; 250 NaCl, 20 g/l MgSO4•7H2O, 3 g/l sodium citrate, 2 g/l KCl, 10 g/l peptone) supplemented with uracil (50 μg/ml). These starter cultures were diluted to OD 600 �0.05 in 200 μl CM ("biological replicates") then transferred in triplicate ("technical replicates") into individual wells of a microplate. Cultures were grown in a high throughput microplate reader (Bioscreen C, Growth Curves USA, Piscataway, NJ), and culture density was monitored automatically by OD 600 every 30 minutes for 48 hours at 42˚C. High and low levels of OS were induced by adding 0.333 mM and 0.083 mM of paraquat to the media, respectively, at culture inoculation.
For P. aeruginosa, laboratory strain PAO1 (ATCC 15692) was grown as described in reference [23]. Briefly, cultures were grown in M9 minimal media supplemented with 0.4% (w/v) glucose and 0.2% (w/v) casamino acids and buffered with 100 mM each of MES and MOPS buffers. Initial cultures were diluted to a starting OD of 0.05 before growth in a microplate reader at a total volume of 200 μl per well. Population growth was measured with a CLARIOstar automated microplate reader (BMG Labtech) at 37˚C with 300 rpm continuous shaking. The OD 600 was recorded automatically every 15 minutes for a total of 24 hours. A full factorial design of pH and OA concentration was performed for benzoate, citric acid, and malic acid. An experimental batch corresponded to two repetitions of the experiment on separate days with a minimum of three biological replicates of each condition on each day. Two batches for each OA were performed.
All data generated or analyzed during this study are included in this published article (see github repository associated with this study, https://github.com/ptonner/phenom).

Parametric growth curve estimation
For comparison with our non-parametric methods, parametric growth curve models were estimated using the grofit package in R with default parameters [51]. The logistic model was used to fit each curve. Kernel density estimates of parameter distributions were calculated with the Python scipy package with default kernel bandwidth parameters [52].

phenom: A hierarchical Gaussian process model of microbial growth
Gaussian processes. A Gaussian process (GP) defines a non-parametric distribution over functions f(t), defined by the property that any finite set of observations of f follow a multivariate normal distribution [53]. A GP is fully defined by a mean functionf ðtÞ and a covariance function κ(t, t 0 ) (Eq (1)): f ðtÞ � GPðf ðtÞ; kðt; t 0 ÞÞ ð1Þ GPs are commonly used for non-parametric curve fitting [53] wheref ðtÞ is typically set to 0, which we do here. Similarly, we use a common choice for covariance function defined by a radial basis function (RBF) kernel (Eq (2)).
Where σ 2 is the variance and ℓ is the length-scale. The parameter σ 2 controls the overall magnitude of fluctuation in the population of functions described in the GP distribution, while ℓ controls the expected smoothness, with larger ℓ making smoother, slower varying functions more likely. In the process of non-parametric modeling of growth curves, these parameters are adaptively estimated from the dataset. Fixed effects. We first define the fixed effects models used in this study; these will be augmented with random effects in the next section. We consider fixed effects models of increasing complexity: a mean growth phenotype, a single treatment phenotype, and a combinatorial phenotype with interactions between treatments. All of these models fall under the functional analysis of variance (ANOVA) framework [22,54]. To estimate a mean growth profile, as in the case of measuring a single condition, a mean function m(t) is estimated from the data by modeling each replicate y r (t) for 1 � r � R as consisting of an unknown mean function observed with additive noise (Eq (3)).
Where m(t) � GP(0, κ m (t, t 0 )) provides a prior distribution over m, and κ m is an RBF kernel with hyperparameters fs 2 m ; ' m g. Here � r ðtÞ � Nð0; s 2 y IÞ is Gaussian white noise. When estimating the effect of a perturbation on growth, as in the case of OS, we add a second function δ(t) that represents the effect of the stress being considered. The model then becomes (Eq (4)): where δ(t) � GP(0, κ δ (t, t 0 )) also follows a GP prior independently of m, and κ δ has hyperparameters fs 2 d ; ' d g. When incorporating possible interaction effects such as those between pH and organic acids in the P. aeruginosa dataset, the model becomes (Eq (5)): y r ðt; p; mÞ ¼ mðtÞ þ a p ðtÞ þ b c ðtÞ þ ðabÞ p;c ðtÞ þ � r ðtÞ; ð5Þ for pH p and molar acid concentration c, with α p (t) representing the main effect of pH, β c (t) the main effect of acid concentration, and (αβ) p,c (t) the interaction between them. Each effect is drawn from a treatment specific GP prior (Eq (6)).
Again, each covariance function is specified by a RBF kernel with corresponding variance and lengthscale hyperparameters that adapt to the observed data. All models in this section correspond to M null for their respective analyses, as they do not include any random effects.
Random effects. The first random effects added to the model were those used to account for batch effects, in the model M batch . Under this model, each fixed functional effect is modified by a GP describing the population of possible batch-specific curves. For example, under the model of interaction effects on growth (Eq 5), replicate r from batch k is modeled as (Eq 7)): M batch : y k;r ðt; p; mÞ ¼ mðtÞ þ a p ðtÞ þ b c ðtÞ þ ðabÞ p;c ðtÞ zffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl }|ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl { with kernel hyperparameters fs 2 a;batch ; ' a;batch g, fs 2 b;batch ; ' b;batch g, and fs 2 ab;batch ; ' ab;batch g that are distinct from those for the corresponding fixed effects, allowing for different variance and lengthscales between fixed and random effects. Other M null models are converted to M batch similarly, with each fixed effect becoming a mean of a GP prior for each batch effect.
M full develops the hierarchy one step deeper by adding replicate effects to M batch (Eq 7). Specifically, the error model � k,r is now described by a GP: � k,r � GP(0, κ y (t, t 0 )) with corresponding hyperparameters, accounting for replicate-specific variability rather than simply white noise.
Bayesian inference. The unknown functions (m(t), δ(t), α p (t), β c (t), and (αβ) p,c (t)), kernel hyperparameters (s 2 l and ℓ l ) for each group of latent functions, and observation noise parameters (s 2 y ) are all estimated by Bayesian statistical inference. In Bayesian inference, prior distributions on unknown quantities (e.g. p(m(t), θ m ) = p(m(t)|θ m ) × p(θ m )) are combined with the likelihood, p(y(t)|m(t)) to obtain the posterior distribution p(m(t), θ m |y(t)). Latent functions are grouped by shared kernel hyperparameters y l ¼ fs 2 l ; ' l g into related sets (e.g. treatment effects, interaction effects, batch effects), which then provide the GP prior for the latent function. For each group, s 2 l is assigned a Gamma(α, β) prior, with fixed effects assigned as a Gamma(10, 10) prior and random effects assigned a Gamma(7, 10) prior. ℓ l was assigned an inverse-Gamma(α, β) prior, with parameter α = 6 and β = 1 for H. salinarum, and α = 2, β = 3 for P. aeruginosa fixed effects and α = 10, β = 1 for P. aeruginosa random effects. Noise variance s 2 y was also assigned a gamma prior. Bayesian inference was then performed, with the posterior distribution obtained by sampling using Markov chain Monte Carlo (MCMC) implemented with the Stan library, which uses a Hamilitonian Monte-Carlo procedure with No-U-turn sampling [55]. Multiple chains were run to diagnose convergence, with all parameter posterior means confirmed to have converged withinR < 1:1 as recommended [56].

Hierarchical batch effects typical in phenomics datasets render parametric models ineffective
In the dataset used here, population growth for each of P. aeruginosa and H. salinarum cultures was monitored under standard (non-stressed) conditions vs. stress conditions (see Materials and methods and references [22,23] for precise definition of "standard conditions" for each organism). Specifically, cultures were grown in liquid medium in a high throughput growth plate reader that measured population density at 30 minute intervals over the course of 24 hours (P. aeruginosa) or 48 hours (H. salinarum); the resulting data are shown in Fig 1. Experimental designs for each organism included biological replicates (growth curves from different colonies on a plate), technical replicates (multiple growth curves from the same colony), varying conditions (stress vs standard), and batches. Throughout, we define "batch" as a single run of the high throughput growth plate reader. In each run, this plate reader measures the growth of 200 individual cultures across a range of perturbations, including varying stress conditions and genetic mutations (see Methods). H. salinarum was grown under high (0.333 mM paraquat (PQ)) and low (0.083 mM PQ) levels of oxidative stress (OS); the data are combined from published [19,22,41] and unpublished studies ( Fig 1A). The OS responses of H. salinarum were compared to a control of standard growth in rich medium, representing optimal conditions for the population. The experimental design was replicated in biological quadruplicate and technical triplicate, across nine batches (Fig 1A, individual curves and axes). P. aeruginosa was grown in the presence of increasing concentrations of three different organic acid (OA) chemicals (0-20mM; benzoate, citric acid, and malic acid), each combined with a gradient of pH (5.0-7.0) [23]. Each P. aeruginosa growth condition was repeated across 3 biological replicates and two batches (Fig 1B). The different P. aeruginosa and H. salinarum experimental designs with varying numbers of replicates at each level provides a rich test bed for modeling the impact of random effects with phenom ( Fig 1B, S1 and S2 Figs).
Figs 1 and 2 demonstrate the two key issues described above and addressed in this paper. First, batch effects are present in both H. salinarum and the P. aeruginosa datasets. For H. salinarum, clear differences in growth under both standard and stress conditions are observed in the raw data across experimental batches (i.e. separate runs of the growth plate reader instrument; Fig 1). Some batches show a different phenotype, with either a complete cessation of growth or an intermediate effect with decreased growth relative to standard conditions. For example, in some batches, populations stressed with low OS grow at the same rate and reach the same carrying capacity as populations grown under standard conditions. For P. aeruginosa, a clear difference between batches grown under 10 mM citric acid at pH = 5.5 is observed [ Fig  1B (graph in fourth column, third row) and Fig 2D]. Like with citric acid, batch effects were also found in some of the other conditions considered (e.g. growth under malic acid, S1 and S2 Figs).
Second, standard parametric growth curve models fail to describe experimental measurements adequately (Fig 2A and 2B), as we have shown previously with both datasets [19,22,23]. In Fig 2, we examined the impact of batch and replicate effects on our data by considering how they change parameters estimated from a mixed effects parametric model of population growth [32]. We focused on calculating μ max , the maximum instantaneous growth rate attained by the population, as this is a commonly used parameter for comparisons between conditions [19,57]. Variation in μ max estimates were observed both on the replicate and batch level, as shown by the kernel density estimates (KDE) of μ max for each stress level (S3 Fig). The variance in μ max is remarkably high: the 95% confidence interval for μ max under standard growth is 0.050-0.141, a nearly 3-fold change between the lower and upper interval limits. Thus, while the t-test conducted on μ max estimates between standard conditions and each stress level is statistically significant (S3 Fig), it is difficult to conclude: (a) what the true magnitude of the stress effects may be; and (b) to what degree the variation due to replicate and batch should inform biological conclusions. The error of the logistic growth model under each PQ condition was also examined. Error increased under high OS (S4 Fig). High OS induces a growth phenotype that deviates heavily from the sigmoidal growth curve assumed in the logistic model as well as in other commonly used growth models. This leads to a poor fit under the high OS condition as has been shown previously (S4 Fig, [19]). The residuals under standard, low, and high OS conditions also appear to be dependent. Our previous work also demonstrated poor fits to the P. aeruginosa data using parametric models [23]. Taken together, the initial assessment of these two datasets indicates that: (a) technical variation due to batch and replicate in growth curve data can be high; and (b) commonly used standard parametric models are not able to adequately capture or correct for these sources of variability. These sources of error need to be corrected in order to model true growth behavior and inform biological conclusions from the data.

A hierarchical Bayesian model of functional random effects in microbial growth
We previously established the ability of non-parametric Bayesian methods to improve the modeling of growth phenotypes [19,22,23]. Here, we describe phenom, a fully hierarchical Bayesian non-parametric functional mixed effects model for population growth data. We highlight the utility of phenom to correct for confounding, random effects in growth phenotypes.
In order to model both biological and technical variation in microbial growth (Fig 3), we first assume that a set of population growth measurements are driven by an (unobserved) population curve m(t) (Fig 3A, blue curve) of unknown shape. For example, m(t) might represent the average growth behavior of an organism under standard conditions. This mean growth behavior may be altered by a treatment effect, represented by an additional unknown curve δ(t) (Fig 3A, orange curve). For example δ(t) may represent the effects on growth induced by low or high levels of OS (Fig 2A).

The average growth behavior of a population under stress conditions would then be described by the curve f(t) = m(t) + δ(t).
When considering a combinatorial experimental design, such as that described for P. aeruginosa growth (Fig 1B), we model independent effects of different treatments as well as their interaction via the form (Eq (9)): yðt; i; jÞ ¼ mðtÞ þ a i ðtÞ þ b j ðtÞ þ ðabÞ i;j ðtÞ: ð9Þ Here, y(t, i, j) denotes the observed population size at time t with treatments i and j of two independent stress conditions. Additionally, α i (t) and β j (t) are the independent effects of each stress condition, and (αβ) i,j (t) is their interaction. This model corresponds to a functional analysis of variance [58], which we have previously used to estimate independent and interaction effects of microbial genetics and stress [22]. For the analysis of P. aeruginosa, we model the effect of pH (α p for pH = p), organic acid combination (β c for concentration = c) and their interaction ((αβ) p,c ), as well as their random functional effect equivalents (see Section "phenom: A hierarchical Gaussian process model of microbial growth").
Variability around these fixed effect growth models is described by additional, random curves associated with two major sources of variation: batch and replicate (Fig 3B and 3C). Batches correspond to a single high-throughput growth experiment and replicates are the individual curve observations within a batch. Using phenom throughout this study, we only compare replicates that are contained within the same batch. This is due to the nested structure between batch and replicates (Fig 3). Noise due to both replicate and batch do not appear to be independent identically distributed (iid), as observed in the correlated residuals around the mean for each experimental variate (S5A and S5B Fig). Each observed growth curve is therefore described by a combination of the fixed effects and the corresponding batch and replicate effects (Fig 3D). Both replicate and batch variation are modeled as random effects because the variation due to both sources cannot be replicated, i.e. a specific batch effect cannot be purposefully re-introduced in subsequent experiments. Instead, these variates are assumed to be sampled from a latent distribution [59]. Combining the fixed and random effects, we arrive at a mixed-effects model of microbial phenotypes. We adopted a hierarchical Bayesian framework to model these mixed effects. In this framework, batch effects are described by a shared generative distribution, allowing them to take on distinct values while still pooling across replicates for accurately estimating the generating distribution [60]. We use Gaussian process (GP) distributions for all groups in the model. GPs are flexible, non-parametric distributions suitable for smooth functions [53]. To assess the impact of incorporating random effects on estimation of the treatment effect of interest, we analyze three models of increasing complexity: M null excludes all hierarchical random effects, M batch incorporates batch variation only, and M full incorporates both batch and replicate variation. These models, collectively called phenom, were implemented using the probabilistic programming language Stan [55] to perform Bayesian statistical inference for all unknown functions and model parameters through Hamiltonian Monte Carlo sampling (see Materials and methods).
In previous work [19] we identified and corrected for batch effects in a single transcription factor mutant's stress response, but this model did not provide an explicit deconstruction of batch effects between different factors (e.g. strain and stress) and could therefore not determine which factors were most strongly impacted by batch effects. Moreover, this approach utilized a standard GP regression framework, which has well-established limitations on dataset size, limiting its applicability to the large datasets we consider here. In reference [22] we described a functional ANOVA model for microbial growth phenotypes, which corresponds to the M null model in the phenom case. Again, a global batch effects term was included but individual batch effects were not modeled, and the computational approach utilized (Gibbs sampling) was prohibitively slow for the complete phenom model. phenom represents a significant advance on these previous modeling approaches and computational methodologies.
In order to demonstrate the impact of batch effects on the conclusions drawn from the analysis of microbial growth data, we estimated the latent functions driving both H. salinarum and P. aeruginosa growth using the M null model of phenom, with each batch analyzed separately (Fig 4). This corresponds to the analysis that would be conducted after generating any single set of experiments from a batch, without considering or controlling for batch effects, and therefore provides a test of the impact of ignoring batch effects.
For H. salinarum, growth data under standard conditions was used to estimate a single mean function, m(t), and fixed effects were estimated for differential growth under low and high OS as δ(t) (Fig 4A). For the P. aeruginosa dataset, batch effects on the interaction between pH and organic acid concentration was represented by a function (αβ) p,c (t), again estimated non-parametrically ( Fig 4B). However, rather than reporting (αβ) p,c (t) directly, we report its time derivative, which has the interpretation of instantaneous growth rate rather than absolute amount of growth, and provides an alternative metric for assessing the significance of a treatment effect on growth [61]. Specifically, assessing growth curve models can benefit from the estimates of derivatives as they may more accurately represent the differences between growth curves [58].
Fitting the M null model to each separate batch reveals that the posterior distributions obtained for each function of interest (m(t), δ(t), and (αβ) p,c (t)) are highly variable across batches (Fig 4). This is observed in both the H. salinarum and P. aeruginosa datasets, where the experimental conditions, and therefore the underlying true mean functions, remain constant across batches in each case. Such variability can impact conclusions. We specifically assess the changes in statistically significant treatment effects, i.e. at time points where the effect (δ(t) or (αβ) p,c (t)) has a 95% posterior credible interval excluding zero, indicating high confidence that the treatment effect at that time-point differs from the control. For example, in the low OS condition in the H. salinarum dataset, both the statistical significance of δ(t) and the sign (improved vs. impaired growth) differs between batches (Fig 4A, center). Additionally, the effect of low oxidative stress at time zero is estimated to be non-zero for many of the batches. This is due to technical variation that introduces an artificial offset in OD measurements at the beginning of the growth experiment. Such variation can arise from various factors, including variation between growth state in starter cultures and technical variation in plate reader measurements at low OD (Fig 1A). A similar batch variability was observed under high OS, but due to the stronger effect of the stress perturbation, estimates of δ(t) are less affected by batch and replicate variation (Fig 4A, right). Similarly, the batch variability observed in the raw P. aeruginosa growth data (Fig 1B) results in significantly different posterior estimates of the interaction effect (αβ) p,c (t) across batches, as seen by the lack of overlap between 95% credible intervals (Fig 4B). Differences observed include the timing and length of negative growth impact (benzoate and citric acid), and completely opposite effects with either strong or no interaction (malic acid). In addition, the posterior variance of each function, which indicates the level of uncertainty remaining, is low for each batch modeled separately. This indicates high confidence in the estimated function despite observed differences across batches. These analyses suggest that use of a single experimental batch leads to overconfidence in explaining the true underlying growth behavior.

Hierarchical models correct for batch effects in growth data
To demonstrate the use of phenom to combat the impact of batch effects on growth curve analysis, we combined data across all batches and performed the analysis using each of the M null , M batch , and M full models (Fig 5). Estimates of m(t) between each model were largely similar, likely due to the abundance of data present to estimate this variable (S6 Fig). Instead, we focus on the estimates of δ(t) for low and high OS response of H. salinarum (Fig 5A) and the

PLOS COMPUTATIONAL BIOLOGY
interaction (αβ) p,c between pH and OA concentration effects on P. aeruginosa growth ( Fig  5B). In cases where M null differs from M batch and M full , this indicates an inability for this model to correctly represent the uncertainty due to random effects in the data, which have been shown to be prevalent across different batches of experiments (Fig 4).
Growth impairment in the presence of low OS relative to standard conditions (i.e. δ(t)) is estimated to be significant during the time points of �0-13 and �19-40 hours under M null . In contrast, only time points �23-31 are significantly non-zero under M batch , and no significant effect is identified under M full (Fig 5A, left). Conversely, due to the stronger stress effect in the high OS condition (Fig 5A, right), estimates of δ(t) were significantly non-zero under all three models, with only minor differences between the three model estimates. This highlights the importance of controlling for batch and replicate variability as in M full : even when estimating the low OS treatment effect under M null with all available data, without accounting for batch and replicate random effects the posterior estimates of δ(t) are overconfident and do not accurately represent the uncertainty with respect to the true treatment effect. The lack of significance under M full suggests that additional data are needed to confidently identify the true treatment effect in the presence of batch and replicate variation. Modeling the batch effects has also corrected for variability in treatment effects due to technical variation at the inoculation of the growth plate (time zero of the experiments) (S7 Fig). The impact of modeling hierarchical variation on estimating interaction effects in P. aeruginosa growth was condition dependent (Fig 5B). Across all conditions, a decrease in posterior certainty on the true shape of the underlying function was again observed under M batch and M full . For benzoate and malic acid, the interaction between pH and acid concentration no longer appears to be a significant effect after accounting for batch and replicate variation, while the larger interaction under citric acid remains significant. As in the comparison of oxidative stress treatments in H. salinarum, stronger effect sizes are required to be confidently distinguished in the face of batch and replicate variability. Finally, the relative conclusions made for the absolute function scale are comparable to those of the derivative estimates for P. aeruginosa, highlighting the flexibility with which treatment effects can be analyzed as most relevant to the researcher (S8 Fig).
For both H. salinarum response to OS and P. aeruginosa growth under pH and OA exposure, an increase in posterior variance was observed under M batch and M full compared to M null (S9 Fig). However, posterior variance of δ(t) in the H. salinarum OS response was higher under M batch compared to M full . In this case, controlling for replicate effects appears to increase the signal needed to identify δ(t). In contrast, these variances are equal in the P. aeruginosa data, indicating that the relative improvement in variance afforded by modeling batch vs. replicate effects may be dataset dependent.

Variance components demonstrate the importance of controlling for batch effects
Variance components, which correspond to the estimated variance of each effect in the model, can be used to compare the impact each group has on the process of interest [24]. To better understand sources of variability in growth curve studies, we used phenom to estimate the variance components for each dataset above. In our hierarchical non-parametric setup, these variance components are the variance hyperparameters (e.g. σ 2 ) of the Gaussian process kernels for each fixed and random effect group. These parameters control the magnitude of function fluctuations modeled by the GP distribution. Larger variance implies higher effect sizes and therefore a larger impact on the observations. We show the value of variance components by considering the effects identified by M full for H. salinarum under low OS (Fig 6). The variance of the data is partitioned between the mean growth (m(t)), the OS (δ(t)), batch effects (batch curves of m(t) and δ(t)), biological noise (e.g. replicate variability) and instrument noise (s 2 y ). This analysis confirms that batch effects, compared to the other sources of experimental variability in the dataset (replicate noise and measurement error), are between 2 to 10 times more impactful on the phenotype measurements. Additionally, variance components enable comparisons between the experimental and treatment factors in the data. Of particular note is that the variance of the treatment of interest, δ(t), and the batch effects are similar in magnitude, at least in the case of a low-magnitude stress such as 0.083 PQ for H. salinarum. This suggests that proper modeling of this treatment requires both sufficient batch replication and accurate modeling of batch effects in those data. Future studies of similar phenotypes can be guided by these estimates in experimental design, choosing an appropriate batch replication for the degree of noise expected [62]. However, the extent of replication required may depend upon the dataset (factorial design, treatment severity, etc). Taken together, variance components provide an aggregated view of the contribution by various factors and guide future experimentation. Iterative rounds of phenom analysis and growth experiments can then determine the best suited designs, for example by leveraging the estimated batch effect variance from pilot experiments to determine the number of batch replications necessary to reliably estimate a treatment effect of given magnitude. The use of phenom for such formal statistical experimental design calculations represents an exciting direction for future work.

Discussion
We have provided a framework to test and control for random effects in microbial growth data using the hierarchical non-parametric Bayesian model, phenom (Fig 3). Analysis with phenom indicates that random effects (both batch and replicate) appear in the two microbial population growth datasets studied here, and constitute significant portions of the variability (Fig  1). Failure to correct for these effects confounds the interpretation of growth phenotypes for factors of interest in a large scale phenotyping analysis (Fig 4). phenom controls for these random effects and provides accurate estimates of the growth behavior of interest (Fig 5). Additionally, phenom can be used to estimate variance components, providing information about the relative impact of various sources of noise in the data (Fig 6). Controlling for batch effects in these datasets was therefore key to making accurate biological conclusions.
Related fields of functional genomics, such as transcriptomics, have seen considerable interest in controlling for different experimental sources of variation, broadly labeled as batch effects [28,[62][63][64][65][66][67]. These studies have shown that differences between batches first need to be corrected to avoid erroneous conclusions [68]. Here we have shown that, like in transcriptomics data, controlling for sources of variation in phenomics data-particularly due to batch -are an important step in making accurate biological conclusions regarding population growth. Additionally, the use of random batch effects in phenom highlights cases where additional information may be gained by further experimentation. Specifically, in cases where treatment effects differ strongly across batches, there may be underlying biological differences driving the variation. Follow-up experimental designs can then aim to delineate these effects directly in a way not confounded by batch. phenom establishes a complete and general method of controlling batch effects in microbial growth phenotypes, overcoming significant weaknesses of previously developed techniques. Although we focus here on replicate and batch variation, the phenom model is easily extended to incorporate alternative or additional random and fixed effects appropriate for settings with other sources of variation. For example, depending on the experimental design, phenom could control for variation among labs, experimental material, culture history, or genetic background [25,[69][70][71][72][73][74][75]. phenom flexibly incorporates additional sources of variation and/or interaction between design variables, as demonstrated with the two different designs analyzed for H. salinarum and P. aeruginosa here. This flexibility allows phenom to be applied to control for many sources of technical variation within microbial population growth data, thereby improving the analysis and resulting conclusions regarding quantitative microbial phenotypes. We therefore expect our model to find broad applications in fields such as bioprocess control, microbial bioengineering, and microbial physiology.