In silico model development and optimization of in vitro lung cell population growth

Tissue engineering predominantly relies on trial and error in vitro and ex vivo experiments to develop protocols and bioreactors to generate functional tissues. As an alternative, in silico methods have the potential to significantly reduce the timelines and costs of experimental programs for tissue engineering. In this paper, we propose a methodology to formulate, select, calibrate, and test mathematical models to predict cell population growth as a function of the biochemical environment and to design optimal experimental protocols for model inference of in silico model parameters. We systematically combine methods from the experimental design, mathematical statistics, and optimization literature to develop unique and explainable mathematical models for cell population dynamics. The proposed methodology is applied to the development of this first published model for a population of the airway-relevant bronchio-alveolar epithelial (BEAS-2B) cell line as a function of the concentration of metabolic-related biochemical substrates. The resulting model is a system of ordinary differential equations that predict the temporal dynamics of BEAS-2B cell populations as a function of the initial seeded cell population and the glucose, oxygen, and lactate concentrations in the growth media, using seven parameters rigorously inferred from optimally designed in vitro experiments.


Introduction
Tissue engineering is a subfield of biomedical engineering that aims to construct functional tissues and organs using engineering principles applied to biological systems.This field has long relied on rigorous in vitro, ex vivo, and in vivo empirical studies to identify culture conditions that achieve experimental objectives, such as maximizing cell yield/function or minimize variability, among others [1].Once optimal culture conditions have been identified, bioreactor devices and associated protocols can be designed to consistently and economically achieve them.
In silico approaches, i.e., the use of mathematical models and relevant computational tools, have become a new paradigm in bioengineering and biomedical studies over the last few years [2].In tissue engineering applications, in silico approaches help researchers to formulate experimentally testable hypotheses for how the cells behave with quantitative and qualitative predictions about cell populations and population ratios [3].In addition, mathematical models can improve our understanding of complex biological phenomena, guide experimental studies, and support the design and optimization of bioreactors and associated experimental protocols [4] for scaled-up production of functional biological tissues.Notably, using mathematical models for tissue engineering applications allows researchers to leverage advances in Simulation-based Design & Optimization (SBDO), a diverse collection of methods and practices for the efficient, systematic use of computer and physical models to support the design and optimization of engineering systems.SBDO has the potential to significantly advance tissue engineering by accelerating discovery and by reducing the time and cost required for the development of next-generation bioreactor devices and experimental protocols [5,6].
Several mathematical models have been proposed for neotissue growth focusing on bone [7,8], cartilage [9], and neural tissues [10,11].Most of these models consist of a system of ordinary and partial differential equations for the cell population and biochemical concentrations [4,8,[10][11][12], while others also consider mechanical cues such as shear stress [7,9,13].These differential equations include coefficients and mathematical expressions representing different aspects of system dynamics, such as diffusion rates, chemical reactions, and modulating effects of biochemical and mechanical cues, all of which must be calibrated and validated.
Due to data availability, cost, and timeline constraints, not all the mathematical models found in the literature have been calibrated and/or validated based on specially designed in vitro or ex vivo experiments [9].Instead, researchers may rely on model parameters estimated under different experimental contexts [7], or calibrate their models to reproduce qualitative behavior only [14].Additionally, in most cases, the proposed models have not been analyzed for uniqueness or identifiability [7,8,10,11,13].Without this, the estimated subset of model parameters is not guaranteed to be physically consistent [12,15] and generalizable.
One of the potential application areas of the SBDO paradigm is the engineering of functional lung and airway tissues, organoids, and even whole organs, with potential applications in screening and development of drug therapies [16], disease modeling [17] and, eventually, to create de novo tissues and organs for human transplantation [18].A large body of work has focused on engineering airway tissues using synthetic [19][20][21] and donor-derived scaffolds [22,23].Among the many scientific and translational challenges of this line of research, one of the most significant is the design and optimization of protocols and devices to sustain the relevant cells as they deposit and attach to suitable scaffolds, proliferate, migrate, and differentiate into the targeted cell types required for functional tissues.A promising approach focuses on creating scaffolds through partial or total decellularization of donor organs.This strategy can result in 3D scaffolds that already have the necessary biochemical and mechanical cues needed for subsequent recellularization [22,24] with recipient-derived adult cells [24], embryonic stem cells [25], or induced pluripotent stem cells (iPSC) [26,27].Still heavily under research, several airway-relevant cell lines are used instead of donor cells for both accelerating discovery and proof-of-concept studies, including BEAS-2Bs [28][29][30], A549s [31], Calu-3s [32], and human tracheal epithelial cells [33].Despite the utility of these cell types, no mathematical models currently exist to describe their population dynamics under targeted culture conditions in vitro and/or ex vivo.The availability of such mathematical models can help accelerate discovery and translation in tissue engineering.
In this work, we propose a detailed, rigorous methodology for developing in silico models for neotissue growth in vitro and ex vivo.We propose a model-based design of experimental protocols (MBDEP) approach that, leveraging these in silico models, defines the spatio-temporal sampling frequency required to optimally infer the model parameters based on experimental data.Starting from a set of biologically-informed model proposals, the proposed model development methodology uniquely combines methods from model inference (e.g., non-linear regression), model selection (data splitting), design of experiments, mathematical statistics (e.g., identifiability analysis [15,34]), sensitivity analysis [35]), and optimization.
As a case study to illustrate our model development approach, this paper describes the development of the first mathematical model capable of predicting the population dynamics of bronchio-alveolar epithelial cells (BEAS-2Bs).BEAS-2Bs are non-cancer, immortalized cells that grow in monolayers [36], and replace normal human bronchial epithelial cells as a model in various toxicology studies [37].BEAS-2Bs were chosen to showcase the proposed methodology because they are often used in tissue engineering, organ regeneration, and transplantation studies, including decellularization and recellularization of airway tissue scaffolds [24,38,39], due to their ability to grow in a lab setting and mimic the function of the human airway epithelium [29,37].Thus, the mathematical model developed in this paper will be directly applicable to these contexts.Importantly, the proposed model development methodology is directly applicable to the formulation, calibration and validation of mathematical models for any other single cell line population and, with a suitable family of model proposals, to multicellular population dynamics.

Experimental setup
Cell culture.In vitro experiments are run for five days with four initial cell populations of 25,000, 50,000, 100,000, and 200,000 cells/well in 6-well plates.Each replicate has three wells seeded with the specified population of BEAS-2Bs [28,37].Each experiment has three replicates (3 × 3 = 9 total wells per initial cell density).The experiments did not involve any media change to observe the effect of more extreme concentrations.We ran four experiments at different glucose and oxygen levels to see their effects on cell population dynamics.For the first experiment, the media was 3 mL Dulbecco's Modified Eagle Medium (DMEM, Gibco, USA) with high glucose and pyruvate, and the cells were cultured at 37˚C, normoxic incubator [40] (18.6% oxygen), with 5% carbon dioxide.The second experiment used the same configuration as experiment one with DMEM with low glucose and pyruvate as the culture medium.For the third and fourth experiments, we altered experiments one and two by culturing the cells at 37˚C in the tri-gas incubator [41], CellXpert C170i (Eppendorf, Germany), with 5% oxygen and 5% carbon dioxide.
Measurements.In each experiment, we removed the plates from the respective incubator and took 200 μL media samples and measured glucose, lactate, oxygen, potassium, sodium, and calcium concentrations using RAPIDPoint 500 Blood Gas Systems (Siemens Healthcare Limited, Canada) six hours after seeding and then every 12 hours (Experiment 1), or 24 hours (Experiments 2, 3, and 4), with the last measurement taken at 114 hours (5 days).At the same intervals, we took five images of the wells (EVOS FL Cell Imaging System) to estimate the total cell count.

Model development methodology
This paper proposes a methodology for the development of mathematical models for neotissue growth dynamics.The overarching problem here is to identify, calibrate and validate a mathematical (in silico) model for the population dynamics of a given cell type as influenced by a pre-defined set of biochemical stimuli.Sections 2.2.2 to 2.2.10 below discuss in more detail each of the corresponding steps of the proposed methodology, shown in Fig 1 .Results of applying this methodology for in silico modeling of cell population dynamics during in vitro culture of BEAS-2Bs cells are presented in Sec. 3.
First, we start with designing a model for cell culture in well plates under static (no flow) conditions with different biochemical environments.This model will focus on the effect of chemical substrates (external stimuli) on proliferation and apoptosis rates (cellular responses).The model proposals are informed by the known biology and physics of the system.After creating a library of candidate models that govern the dynamics, the models are studied through structural identifiability analysis (section 2.2.2).The structurally identifiable models are considered for the next steps of model development.The next step is to define the objective function that encodes the model inference goal, typically minimizing the prediction error of the model with respect to the experimental data.Alternative goals may be formulated as part of the objective function or as optimization constraints, e.g., minimizing the variance of the parameter estimates, minimizing the number of parameters to be estimated, or minimizing residuals with respect to physical or empirical laws, among others.Then, an experimental protocol for data collection (e.g., sampling frequency) is designed so that it is optimal for the family of model proposals over a range of assumed noise levels.Once the experiments are conducted and the data collected and post-processed, the methodology focuses on model inference, i.e., model calibration (fitting) and selection.The result of this process is a single model selected from the set of model proposals that best fits the data according to the previously defined objective function.Next, practical identifiability analysis confirms that the inferred model parameters are unique and finite, i.e., that the objective function has a single global optimum.Then, the goodness of fit of the model is quantified (model validation) to provide an estimate of the expected predictive performance of the model under experimental conditions that are different from those used during model calibration and selection.We conclude the model development procedure using global sensitivity analysis to rank the controllable experimental parameters according to their predicted effect on the cell population.Sensitivity analysis is also used as a diagnostic tool for model calibration by identifying the subset of model parameters that have the greatest influence on model fit.

Model proposals.
In this work, we will discuss the modeling methodology focusing only on the dynamics of the cell population, including proliferation and apoptosis of a single cell line in a biochemical environment that is governed by advection, diffusion, and reaction phenomena.However, the methodology can be applied for inference of mathematical models, including other aspects of neotissue growth, such as scaffold biomechanics, cell-cell, and cellscaffold interactions, cell migration via chemotaxis and haptotaxis, and cell differentiation, by including additional equations codifying the underlying physics [42].
Using the mass balance equation for cell density and for the chemical species concentrations (also referred to as "substrates" in the literature) leads to a set of coupled advection-diffusion-reaction equations that describe the spatiotemporal cells densities and substrates concentrations fields [43].These spatiotemporal equations relate the rate of change of specific concentrations and densities to the diffusion, advection, and reaction rates.The equations are defined as, Here, c i is concentration, and n j is density for each chemical species, i, or cell line, j, respectively.D c i and D n j are the diffusion coefficients, u is the fluid velocity, M c i is the substrate reaction rate, M n j is the cellular responses (proliferation, death, and differentiation), X is the position vector, t is time, C and N are vectors containing all concentrations and densities, respectively, and r� is the divergence operator.The general governing Eqs 3 and 4 have been applied to a broad spectrum of growth and transport of biological processes [44][45][46].Simplified versions of Eqs 1 and 2 can be obtained when the concentration and cell density fields are spatially homogeneous, such as in the case of in vitro submerged static cultures on well plates.Since there is no spatial variability, the diffusion and advection terms in Eqs 1 and 2 can be disregarded, thus Eqs 3 and 4 show that the rate of change for concentrations and cell densities are defined by the reaction and response rates in a homogeneous domain.These rates are the summation of the production and consumption rates in the case of chemical species concentrations, and the summation of proliferation, death, and differentiation rates in the case of cell line densities [8,10,47].
Let us now look at specific, biologically informed functional forms for the right hand side of Eqs 3 and 4. For neotissues consisting of a single cell line that does not differentiate, we propose a model for the growth of the cell population incorporating proliferation and apoptosis rates [42] Here, n, c o , c g , c l , n max , δ, and β are the cell density, concentrations of oxygen, glucose, and lactate, maximum cell density, apoptosis rate constant, and coefficient of proliferation, respectively.As seen in the equation, it consists of two terms, one for the proliferation rate and the other for the death rate.The proliferation rate term is proportionally modulated by two effects [48], the first shows the biochemical stimuli effect, and the other shows the proliferation rate limited by the physical space.A logistic function for the latter term shows that the growth rate per capita linearly drops as the population increases until cell population saturation [49,50].
In Eq 5, the potential modulating effect of the biochemical substrates is usually defined in the literature as, where K i is the Michaelis-Menten Kinetics (MMK) constant.Each of the three substrates can either have a zero-order [8], linear [10], or Michaelis-Menten [7,47] effect on cell growth rate.
The modulating effects depend on the cell type and the chemical substrate under consideration and, when unknown, can be the subject of data-driven model selection methods.It is important to note that, depending on the specific cell type, additional chemical substrates may have significant modulating effects on the cell population, e.g., growth factors, blockers and inhibitors, among others.These can be trivially included in the model through additional modulating terms in Eq 4 and additional advection-diffusion-reaction equations (Eq 1).
For the purpose of describing the methodology and illustrating it with a specific cell line, we will limit our discussion to the modulating effects of glucose, oxygen, and lactate.These nutrients affect growth the most and play a crucial role in tissue viability [7,47,51].The reaction rates in Eqs 3 and 4 are dependent on concentrations and neotissue density n as, where i 0 is the inhibitor and R i (c i , c i 0 ) is referred to as the reaction term and can take one of several mathematical forms, each representing different kinetics [10,42,52], Here, V i , � c i , � c ii 0 , are the reaction constant, the MMK constant, and the inhibition constant, respectively.With a set of biologically-informed mathematical models, model calibration and selection methods utilizing the experimental observations will find the best kinetics for each substrate and its corresponding parameters (Eqs 6 and 8).
The proposed mass-conservation-based neotissue growth model treats the entire population of cells as homogeneous in their density and type without considering multiple cell types, their interactions, and cell transitions from one type to another (i.e., cell differentiation).However, the framework we propose can be easily applied to multiple cell types by adding additional equations similar to Eq 2 for each type, and including transitions between cell types through the response terms.Future spatiotemporal (nonhomogeneous) models will include diffusion and advection terms for the cell population similar to Eqs 1 and 2 [53].

Structural identifiability analysis.
Mathematical models in biology are usually defined with differential equations as [15], In this equation, Y is a vector containing a set of state variables, e.g., substrate concentration and cell densities, Θ is a vector of model parameters, u is a vector of the external stimuli, and _ Y is the rate of change of the state variables over time.Usually, not all the state variables can be measured directly during the experiments; thus, the observables, z are denoted as, A mathematical model is identifiable whenever a unique set of observations or measurements would result in one and only one set of model parameters [50].Mathematically, if Θ and Φ are two valid sets of model parameters, then gðY; Θ; uÞ ¼ gðY; Φ; uÞ Structural identifiability analysis is performed on the mathematical model before model calibration, and it focuses on the relation between the state variables and observables.The model is discarded or modified if it is deemed that data collected about the observables, regardless of the amount of data, will not result in a unique set of model parameters.If a model is structurally identifiable, all the model parameters can be estimated from a sufficiently large number of observable measurements [4,54].To make this determination, the analysis consists of creating a modified differential algebraic equation (DAE) form from model equations that meets a certain rank criterion.The observability matrix is then obtained via symbolic techniques.The model is considered structurally identifiable if the observability matrix has full rank [55].

Objective function definition.
Generally, inferring a parametric model from data is a mathematical optimization problem, in which a set of model parameters is estimated that maximize a user-defined measure of 'goodness of fit'.Two standard methods for parameter estimation of mathematical models are maximum likelihood estimation (MLE) and nonlinear least squares (NLS) [56], which use the likelihood function and the sum of squared errors as objective functions, respectively.The choice of objective function consolidates the assumptions about the data into the calibration process.In MLE, the posterior probability of the observed data is maximized based on the known or assumed statistical distribution of the data.In contrast, NLS minimizes the sum of squared residuals between observations and predictions (NLS) without any distributional assumption about the data.
In this work, we employ MLE because it produces a suitable frequentist formulation of the calibration process without imposing unwarranted assumptions about the data and noise distribution.The likelihood function for independent and identically distributed observations (i.i.d.), Z i , is defined as the multiplication of their probability functions (p) as, Let us assume that the observations are normally distributed, with a time-dependent but unknown mean as predicted by the model with unknown parameters Θ and unknown standard deviation.Then, minimizing the negative log-likelihood objective function, −ℓ(Θ;Z), is mathematically equivalent to the weighted NLS problem defined as [4], where Here, n k , n z , z j , Z j , and σ j (t k ) are, respectively, the number of time points, the number of observables, the model predictions, and the mean and standard deviation of each observable at each time point across experimental replicates, respectively.

Model-based design of experimental protocols (MBDEP).
We propose an approach to design the data collection protocol so that it minimizes the error of estimated model parameters for a given set of model proposals under expected experimental noise levels.Given the non-linearity of the models we use here, our approach is based on statistical simulation, as follows.
First, we identify a set of plausible model parameters, e.g., taking parameter values from the relevant literature, similar experiments with other cell types, or other experiments done with the cell type of interest.Alternatively, some model parameters may be roughly estimated based on the known biology and physics of the process.Next, we do a forward modeling step, in which we use the models with these assumed parameter values to generate simulated noiseless experimental data about the observables.Gaussian noise is then added to this data, i.e., where, � is the noise level, z j (t k ) represents the simulated value of observable j at time t k , and N is the Gaussian probability distribution function.Different noise levels, � 2 {0, 0.05, 0.1, 0.2, 0.3, 0.4, 0.5}, can be used for this analysis to observe how different experimental noise levels would affect model inference.Note that here we assume that all observables can be measured with a similar level of experimental error (noise).However, the same approach can be used with different noise levels for each observable, e.g., representing the availability of different measurement techniques or equipment with different levels of accuracy.The simulated data with added noise is then used for MLE or NLS estimation of model parameters (Sec.2.2.7), and the difference between the assumed and the estimated values of the model parameters is calculated.
These steps are repeated with multiple sets of simulated data and different spatio-temporal sampling frequencies (i.e., with different amounts of data).The results of this effort allow us to identify the sampling frequencies that are required to ensure that the estimation of model parameters is robust to the expected levels of experimental noise.Furthermore, if there are external constraints on the experimental procedure (e.g., equipment/personnel availability, cost, timelines, etc.), the proposed MBDEP can be used to check the feasibility of model inference and thus signal the need for reformulating experimental goals.
2.2.7 Model inference.Once the experimental data has been collected, the data is split into three subsets for calibration (60% of the data), selection (20%), and validation (20%), following best practices.The calibration data set is then used to formulate, for each candidate model, a nonlinear optimization problem to determine the set of model parameters that best fit the data, i.e., The optimization problem posed in Eq 15 is solved through a multi-start strategy, which helps with detecting and averting the local minima [57].A maxi-min Latin hypercube method is utilized to generate a set of initial guesses that cover the search space with guaranteed lowerdimension projection properties [58,59].Starting from each initial parameter guess, the optimization problem is solved using the Adam stochastic gradient decent-based method [60].
Then, the best-performing solutions are used as starting points for a further optimization stage using the BFGS method [61] to ensure final convergence.
Model selection is made between the candidate models by checking how well they match the time evolution of state variables, i.e., cell populations and biochemical concentrations on the selection dataset.Several error metrics (e.g., MSE, RMSE) can be used for this purpose.However, given that the candidate models may have different numbers of parameters, in this work, we use the Akaike Information Criterion (AIC) or Bayesian Information Criterion (BIC), to select the model that best balances model complexity and goodness of fit [4,62,63].In Eq 16, log(x) is the natural logarithm of x, k is the number of inferred parameters, and m s is the number of observations in the selection dataset.The use of AIC and BIC has been widely discussed in the literature, and although no definitive recommendations have been made, it is commonly accepted that AIC is an optimal rule for selecting a model among a set that may not contain the true model for the purpose of predicting the dependent variable over unseen data.In contrast, BIC is an optimal rule for selecting the true model (or the lowest-dimension model that best describes the data) among a set of models that contains the true model [64].
Note that model selection is ultimately performed by humans, using expert knowledge about the physics and biology of the situation, and with a clear experimental goal (e.g., prediction vs. description).However, when knowledge about the underlying physical or biological phenomena is incomplete, data-driven model selection between plausible models can help researchers gain knowledge about the phenomenon of interest and may point to unconsidered physics.
2.2.8 Practical identifiability analysis.Practical identifiability analysis is performed after model calibration and selection.It utilizes the inferred model and the experimental measurements to find confidence intervals for each inferred parameter.The sparsity of the experimental data set, or large experimental variability can result in practical unidentifiability [12].
There are several methods to conduct practical identifiability analysis of models.A commonly used method calculates the local sensitivities of the model with respect to its parameters to construct the Fisher information matrix (FIM).The method then either uses the eigenvectors of FIM or constructs the correlation matrix to finalize the analysis [65,66].Null eigenvectors and high correlations point to the unidentifiable parameters.An alternative, which we use in this work, is profile likelihood-based methods, which are invariant to model reparameterization, are not limited to symmetric confidence intervals, and can even detect structural unidentifiabilities [15].
Profile likelihood-based methods consider one model parameter (e.g., θ i ) as fixed at a given value, and then find the MLE of the rest of the parameters (θ j , 8j 6 ¼ i) in Eq 15, i.e., This process is systematically repeated for different values of the fixed model parameter θ i , and then for each parameter θ j in turn.Then, the confidence interval for each parameter is defined as the set where Δ α is the α-quantile of the χ 2 distribution with one degree of freedom.A model parameter is deemed as practically identifiable if the corresponding confidence interval is finite.2.2.9 Goodness of fit.Goodness of fit refers to the assessment of how well the model represents the data using a suitable error metric.In the context of model selection among a set of calibrated models, an unbiased assessment of goodness of fit must rely on hold-out data, i.e., data from the same experimental context (same generating process) but distinct from that used for model calibration and selection.This hold-out data is referred to herein as the validation data set.In this work, we use the mean relative prediction error of the inferred model on the validation dataset as, ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi This value predicts the model accuracy when predicting the state variables for any case within the convex hull of the experimental data used to calibrate, select and validate the models.
As an additional metric to assess the accuracy of the model predictions, we calculate the mean relative experimental variance of the data.In other words, this is the standard deviation, among the number of experiment replicates (9), of each observable at each time point, normalized with the value of the mean value of the observable at that time point, and averaged over all time points and observables, i.e.
Note that this value expresses the variance among replicates in normalized terms.This information is critical to properly evaluate the prediction error of the models, which cannot be meaningfully expected to be lower than the intrinsic variability of the experimental data.

Global sensitivity analysis.
Global sensitivity analysis (GSA) quantifies the effect that any input has on the system output, averaged over the input domain, i.e., over the hypercube formed by the Cartesian product of the ranges of each input.In the context of the proposed methodology, GSA can be used to rank the model parameters in terms of their quantitative impact on the model predictions.It also is used for identifying unimportant parameters and parameter space regions where each parameter is most important [67].More importantly, GSA applied to the validated model can allow us to rank the variables that can be controlled during an experiment in terms of their effect on any observable of interest.Whether applied to the model parameters or to the experimentally controllable variables, GSA is an often overlooked and crucial step for both quality assurance and practical application of in silico models [35].
Variance-based GSA methods such as Sobol's [68] and Saltelli's [69] are the most commonly used in the literature, mainly due to their suitability for non-linear, high-dimensional responses, and for their consideration of interactions between input variables [70].Variancebased GSA methods are based on a seminal work by Sobol [68], which showed a general decomposition of non-linear, continuous functions (e.g., Y k in Eq 9) into a set of integrals of increasing dimensionality representing the overall mean, main effects, and interactions of increasing order between variables [71].
where f 0 is a constant, f i (X i ) is the main effect of X i (effect of only varying X i ), and f ij (X i , X j ) is the first-order interaction between X i and X j .Assuming Eq 21 to be square-integrable, the functional decomposition will result in, with, where X *i means set of all variables except X i and E is the expectation operator.Using Sobol's method, we calculate first and total-order interactions for all the model parameters as, S i measures the effect of varying X i alone averaged over variations in other input parameters, standardized by the total variance.These values are calculated with Monte Carlo sampling, allowing the creation of confidence intervals for the sensitivity indices.

Results
In this section, we describe the application of the proposed methodology for inferring the population dynamics of the airway-relevant BEAS-2Bs cell line, under different biochemical conditions in a (no flow) static culture environment.The following subsections will illustrate how the methodology is applied, step by step, in the development of a model for BEAS-2B cells and show the challenges and benefits of this strategy.The reader can refer to the Materials and Methods section for a full description of the methodology.

Model proposals
This stage creates a set of model proposals based on the existing knowledge about the specific cell line under study, in this case, BEAS-2B cells.BEAS-2B is a human bronchial epithelial cell line derived from normal, non-cancerous human bronchial tissue closely resembling the primary human bronchial epithelial cell's morphology and functional characteristics.These cells do not differentiate, and their growth rate is known to be affected by oxygen, glucose, and lactate levels [39,72].In a homogeneous, static growth environment with no cell migration or shear stress, Eqs 5 and 6 are reasonable model hypotheses for cell population dynamics of a single cell line that does not differentiate.Also, spatial variations in biochemical concentrations would be negligible, and thus Eq 1 would simplify to Eq 3. The Michaelis-Menten reaction rates for glucose and oxygen (as defined in Eq 8) will yield the initial system of ordinary differential equations (ODEs) as, In these equations, lactate production is calculated using the relationship between aerobic respiration and glycolysis [7,8].
For each of the terms f 1 (c o (t)) and f 2 (c g (t)), which account for the potential effect of oxygen and glucose on the cell population growth rate, we select three candidate models, namely zeroorder (Eq 6a), first-order (Eq 6b), and MMK with positive feedback (Eq 6c).Similar candidate models are chosen for the lactate effect, f 3 (c l (t)), but using MMK with negative feedback (Eq 6d) instead.Thus, the model proposal stage results in a total of 27 different candidate models, which will be investigated later in the model inference step via model selection methods.

Structural identifiability analysis
To study the structural identifiability of the models proposed for the BEAS-2B cell line, we focus on the most complex candidate model, i.e., the model with the largest number of parameters and/or most severe non-linearity.The implicit assumption is that, if it is deemed structurally identifiable, then simpler models will be as well.This assumption is reasonable because there are only minor differences between the models being considered.
Global structural identifiability for the model was confirmed using StructuralIdentifiability. jl; an open-source SciML package for structural identifiability analysis [73][74][75].In the specific case of the BEAS-2B candidate models proposed in the previous step, the most complex model corresponds to Eq 25 with MMK effects for the three substrates.The observables for performing this analysis are the cell population density and glucose and lactate concentrations.Results showed that this model is structurally identifiable.

Objective function definition
The next step in the methodology is defining the objective function.For the specific case of the BEAS-2B model, we use non-linear least squares (NLS).Let Z j 2 fN i ; C g i ; C l i g and z i 2 fn i ; c g i ; c l i g, then the least square problem becomes finding the parameters Θ* by, where S T is the training set.As mentioned in Sec.2.2.3, minimizing Eq 26 is equivalent to using the maximum likelihood estimator in Eq 12 [15] if the experimental data is assumed to have a Gaussian distribution.

Model-based design of experimental protocols
We applied the proposed approach for model-based design of experimental protocols (Sec.2.2.4) to the inference of a model for cell population dynamics of BEAS-2B cells.Since this is the first mathematical model for BEAS-2Bs, there is no available data or information on which to base decisions about the frequency and resolution required for cell growth experiments, i.e., the number and timing of the experimental measurements required to properly capture the system dynamics.Let us assume that one of the model proposals selected in Sec.3.1, e.g., Eq 25 with MMK effects, is an appropriate description of the system.We then assume that the model parameters have known values (see supplementary material), which are taken from similar experiments published in the literature for nerve cells [10], osteoblast cells [8], and mesenchymal stromal cells [47].Using this model with initial conditions consistent with our experimental setup (Sec.2.1) and assumed model parameters, we generate synthetic data about the observables, namely cell population and concentrations of glucose, lactate, and oxygen as a function of time, for a total of 16 different initial conditions.Gaussian noise was added to the data as described in Sec.

2.2.4.
Solving Eq 26 with the generated synthetic data results in the inferred parameters (Θ*).This process was repeated five times with data sets with different sizes, corresponding to the measurement of the observables at periods of 1, 2, 4, 24, and 48 hours.To compare the performance of the parameter inferences using different sampling periods, we defined error as the difference between the assumed and inferred (Eq 26) parameters, Fig 2 shows the resulting error in parameter inference as a function of the sampling frequency for the observables.For estimating parameters for the set of model proposals considered here, taking cell population and concentration measurements every 24 hours results in a 7.32% error.Since more frequent measurements do not result in better parameter estimates, we selected 24 hours as the sampling period for our BEAS-2B experiments.

Running and postprocessing the experiments
Experiments were conducted as described in Sec.2.1.We conducted nine replicates of each experimental condition by doing three runs of each experiment, using three well plates in each run.Two-way statistical analysis of variance (ANOVA) test (using the Pingouin [76] Python library) demonstrated statistically significant differences in cell populations between different culture conditions.Specifically, cell populations at time t = 102 hours were significantly affected by the change in the culture environment, suggesting that oxygen and initial glucose concentrations affect the growth dynamics.Notably, since oxygen and glucose affect cell population growth, this implies that five of the 27 model proposals can be discarded, since these would make the cell population after t = 102 hours independent of oxygen or glucose levels.Hence, only 22 model proposals were considered in the next steps of the methodology.
To prepare the experimental data for model inference, we implemented a data splitting strategy, following best practices in model fitting, selection, and validation typically found in the machine learning literature.The full dataset contains the time evolution of state variables under four different culture environments with four different initial cell densities, giving a total of 16 experimental conditions.This dataset was split into calibration, selection, and validation datasets containing 60%, 20%, and 20% of the data, respectively.Through this process, we implemented a stratified sampling approach to ensure that all experimental conditions were equally represented in each dataset.

Model inference using in vitro data
As described in Sec.2.2.7, using the calibration dataset we inferred the parameters ðb; d; V g ; K o ; K g ; K l ; � c g Þ for the 22 candidate models.Due to experimental constraints, it was not possible to measure the oxygen concentration in the growth media, and thus it is assumed to be constant and equal to the oxygen concentration in the environment.We consider this a reasonable assumption, given the time scales involved and the fact that oxygen from the incubator chamber continuously diffuses into the growth media, replenishing the oxygen consumed by the cells.As a result, the advection-diffusion-reaction equation (Eq 1 in Materials and Methods) for oxygen was modified by setting M c o ðNðtÞ; CðtÞÞ ¼ 0, thus indicating a zero rate of change for oxygen concentration.Note, however, that oxygen concentration does affect the cell growth dynamics of BEAS-2B cells.Hence we included K o in our set of model parameters to be estimated.
Optimization runs were conducted over a search space spanning multiple orders of magnitude, defined as a hypercube of dimension 7 over the range [10 −7 , 10 7 ].A total of 1300 starting points for the optimization were selected as a maxi-min Latin hypercube over the search space to ensure good coverage, for a total of 1300 × 22 = 28600 runs.This process was implemented in the Julia programming language using Flux.jl[77] and Optim.jl[78] libraries for Adam and BFGS optimizers, respectively.
Since parameter calibration is a non-convex multimodal optimization problem with potentially many local optima, we further analyzed the resulting losses and inferred parameters to confirm convergence to a global optimum (S1 Fig in S1 File).These analyses suggest that the model inference does indeed have multiple local optima, but also provide evidence that the optimization strategy used here converges to what is likely the globally optimal model parameters for all 22 candidate models.Once the parameters for all candidate models were inferred with the calibration dataset, we used the selection dataset to calculate the BIC values for all models.Fig 3 shows the resulting BIC values, with different markers/colors indicating the number of first-order modulating effects (Eq 6) present in the model.The BIC criterion strongly suggests that model performance deteriorates as more of the substrates are assumed to have first-order effects on the cell population.Thus, we focused on the top five models (lowest BIC), which include MMK-type effects in at least one of the substrates, namely Lac : where the models have been named according to which substrates incorporate an MMK effect.
As seen in Fig 3, the top-performing models have very similar BIC values, so a purely datadriven model selection strategy fails in this case to identify a single winner.However, we note that the model OxyGluLac is mathematically equivalent to the other four models at infinitely large or low values of K i 's, e.g., when K l ! 1, the OxyGluLac model is equivalent to the Oxy-Glu model.Hence we select the OxyGluLac model for the remainder of the study, as the best representation of the underlying cell behavior.This choice is also supported by the known biology of BEAS-2B cells.
Table 1 shows the inferred parameters for the OxyGluLac model resulting from the calibration process.Observing the values for K i s, it can be seen that, in the range of the experimental conditions tested in this work,i.The figure shows that the model is fairly balanced in underpredicting and overpredicting the state variables.The model predictions are mostly between error bars for the in vitro experiments.Perhaps the only exception is the top left subfigure, corresponding to the prediction of cell density under normoxia and high glucose concentration in the culture media.In this case, the model predictions consistently underpredict the experimental data starting at the first time point.These differences are more suggestive of a constant bias than of a prediction error, which can be attributed to systematic errors in our measurements for this specific condition.This observation is supported by the larger dispersion seen in the cell population measurements at the first time point.Further analysis (not shown here for brevity, see S2-S4 Figs in S1 File) confirms that experimental observations are within the confidence intervals of the model predictions.

Practical identifiability analysis
We perform profile likelihood-based practical identifiability analysis using the ProfileLikelihood.jl[79] package.Table 1 shows the resulting confidence intervals for all estimated model parameters are finite, thus proving the practical identifiability of the model based on our experimental data [80].

Goodness of fit
The relative root-mean-square prediction error of the inferred model was calculated using the validation data set as described in Sec.2.2.9, Eq 19.Results show that the inferred model has a RMSE of 18.3%.For context, the experimental error (noise), calculated as the average of the standard deviations of experimental replicates for each time point and experimental condition, is 18.7%.Based on this, we consider that the model is sufficiently accurate for its applications in support of BEAS-2B tissue engineering.

Global sensitivity analysis
A global sensitivity analysis of the OxyGluLac model was performed.Specifically, we calculated the sensitivity of cell population and glucose and lactate concentrations at time t = 114 hr with respect to the experimental conditions that can be controlled, i.e., oxygen concentration in the incubator, initial glucose concentration in the culture media, and initial seeded cell density.For this purpose, we use Sobol's method (Sec.2.2.10) with 40,000 Monte Carlo samples.Firstorder Sobol indices rank the importance of each condition alone, while total-order Sobol indices also include parameter interactions.
Fig 5A shows the resulting Sobol indices.It can be seen that the initial cell population has the largest effect on both the terminal cell population and lactate concentration while having only a small effect on the terminal glucose concentration.Taken together, these observations suggest that the culture conditions used in the experiments did not impose significant metabolic constraints on the cells during the first t = 114 hr.This is also consistent with the lack of significant parameter interactions, as evidenced by the similarity between the first-order and total-order Sobol indices.To further analyze the effect of the culture conditions on the cell population, Fig 5B shows the total-order Sobol sensitivity indices of the cell population throughout the duration of the experiment.It can be observed that the sensitivity increases as the experiment unfolds, suggesting that if the experiment were to be run for longer (e.g., 10 to 15 days), we would see a more significant effect of the biochemical substrates on the cell population.

Application: Optimizing culture conditions
One application of the SBDO paradigm using our model development methodology is studying the effect of different experimental settings efficiently [7,81].Here we study how different media refreshment regimens affect cell population dynamics.In silico, this study is implemented by resetting the concentration values of chemical substrates to be equal to their initial values at series of refreshment time points.Fig 6 shows the estimated cell yields after 43 days of culture under different refreshment periods, from 2 to 24 days, indicated as vertical lines.It is observed that refreshing the culture media every 2 to 10 days maintains a relatively stable cell population, with small oscillations that increase in amplitude as the media refreshment period increases.Conducting the experiment with media refreshments every 10 days or more results in drastic decreases in cell population, which become unrecoverable if the media is not refreshed at least every 14 to 16 days.Note that this in silico study takes only minutes to run on a desktop computer once an inferred and validated model is available.However, conducting all these experiments in vitro would take significant time and incur costs in supplies, equipment, and personnel.

Discussion
The comprehensive methodology for the development of in silico models for neotissue growth proposed in this work leverages state-of-the-art methods for experimental design, data-driven model inference and selection, and post-hoc analysis, validation, and interpretation of biologically informed models based on systems of differential equations.
The proposed methodology addresses the complexity, sparsity, and variability of tissue engineering experiments by implementing specific model formulation calibration, selection, and validation steps that are not typically considered in previous works.For instance, Coy et al. [10] developed a model for the regeneration of nerve cells but did not conduct identifiability and sensitivity analysis, which was used by Villaverde et al. [4].Our proposed methodology uses both structural and practical identifiability analysis to properly assess the suitability of a model and the ability to infer a unique set of parameters based on the available (noisy) experimental data.In comparison with [4], we implement additional model selection and validation steps based on hold-out data.Furthermore, in contrast with [4,65], we add an extra step for global sensitivity analysis, which provides important insights for model diagnosis and interpretation [71].
Uniquely, in our methodology, we propose a novel method for model-based design of experimental protocols (MBDEP).Starting from the set of biologically informed model proposals being considered, and leveraging prior information about model parameters, MBDEP optimally selects the spatial and temporal sampling frequency required to minimize the estimation error of the inferred model parameters, taking into account the expected experimental noise.In addition, MBDEP provides valuable information regarding the suitability of the measurement equipment and techniques, vis-a-vis their expected measurement error, for the intended model inference application.This is in contrast with SBDO literature in non-biology related applications, which has focused instead on experimental designs balancing the needs for model inference and optimization.For instance, [82][83][84] combined simulation models and data-driven surrogate models for the adaptive, sequential design of experiments.In particular, starting from an existing mathematical model of the system, these works determined a set of experimental conditions (i.e., the actual sampling points) required to improve a figure of merit, such as maximizing the information content of the data [85], or optimizing a response variable based on a limited set of computationally expensive deterministic computer experiments [82].In contrast, our proposed model-based experimental protocol design approach aims at determining, a priori, the quantity and quality of noisy data needed for robust inference of mathematical models for neotissue growth formulated as a system of differential equations.However, once a mathematical model has been inferred for the system, adaptive/sequential sampling methods may be used to gather further data to improve the model, capture additional dynamics, or optimize according to experimental goals.
In the context of our in vitro experiments with BEAS-2B cells on well plates, where observables vary in time but not in space, the proposed MBDEP allowed us to adequately capture the time evolution of the observables objectively establishing that measuring cell density every 24 hr was necessary to infer model parameters from experimental data over a wide range of measurement noise.Comparatively, previous work in the literature does not typically provide a rationale for certain aspects of their experimental protocols.For instance, Coy et al. [10] used only the terminal cell density after 24 hr for model inference (a total of 15 measurements), Eleftheriadou et al. [11] used 36 measurements to infer a 16-parameter model, while Duchesne et al. [12] used 10 measurements to infer 6 to 8-parameter model.Future extensions of the BEAS-2B models where spatial variations in cell density and substrate concentrations are present could use the proposed MBDEP to determine the distance between measurements (i.e., spatial sampling frequency) and/or their specific location for robust model inference.
We applied the proposed methodology to develop the first mathematical model to predict population dynamics of BEAS-2B cells in vitro under different biochemical environments characterized by glucose, oxygen, and lactate concentrations.Over the validation dataset, the resulting BEAS-2B model exhibited a prediction error of 18.3%, an accuracy comparable to the experimental noise (18.7%), thus suggesting the suitability of the inferred model for the design and optimization of bioreactor devices and experimental protocols to maximize cell yield, reduce variability, improve cell coverage, among others [1,86].
BEAS-2B cells are widely used as an in vitro platform for numerous studies in airway/lung homeostasis and disease [28,[87][88][89][90] spanning drug screening, toxicology, viral cellular response, and more recently in tissue engineering applications focusing on optimization of methods for generating bioengineered lungs [23,91].As such, a mathematical model accurately predicting BEAS-2B growth and metabolic activity would be valuable in determining experimental conditions in each context.
As an illustration, the inferred model was used to study the effect of the media refreshment period on the resulting BEAS-2B cell population.It was shown that changing the culture media every 1 to 10 days did not have a significant effect on the final cell population after 43 days.Comparatively, cell culture protocols for BEAS-2B cells and similar airway-relevant tissues typically prescribe media changes every 48 hr to 72 hr in growth media.Insights from experimentally validated in silico models may thus translate to significant savings in supplies (e.g., 10X reduction in growth media in this case) and personnel, especially in the context of commercial production of engineered tissues.

Concluding remarks
The in silico model considered in this work is based on a system of coupled differential equations describing advection-diffusion-reaction of biochemical substrates and their effect on neotissue growth of a single cell type, without considering multiple cell types and transitions between them.The mathematical model family proposed here can be applied to other single cell line populations with their dynamics affected by glucose, lactate and oxygen concentrations.In such applications, all the model parameters would need to be re-calibrated using relevant experimental data.Also, in the case of biochemical stimuli other than glucose, lactate and oxygen, additional terms representing their rates of change would need to be added to the system of equations, along with their potential effect on the cell proliferation term.
The framework we propose can be easily applied to multiple cell types by adding additional equations similar to Eq 5 for each cell type, and including transitions between cell types through the response terms [92].Duchesne et al. [12] proposed such a model for the differentiation of chicken erythroid progenitor cells, in which transitions between cell types depended on cell densities only, without considering bio-chemo-mechanical cues.Such formulations could be combined with the proposed methodology for studying the directed differentiation of pluripotent stem cells under different environments [27,93].Additional effects such as shear stress [39], scaffold stiffness, and air-liquid interface exposure could be added to the model, e.g., through Eq 6.
In vitro neotissue growth, whether under static or perfusion conditions, is a complex multiscale multi-physics phenomenon [94].To realize the full potential of SBDO applications in tissue engineering, there is a need for end-to-end in silico modeling, including perfusion cell seeding, deposition, attachment, proliferation, migration, and differentiation in response to both biochemical and mechanical cues.
In this context, hybrid Lagrangian-Eulerian formulations that consider scaffold biomechanics, cell-cell, and cell-scaffold interactions [95] while tracking the motion of individual cells or cell parcels within a flow field [39,96], are promising approaches.Neotissue growth models such as those presented in this work could be combined with hybrid Lagrangian-Eulerian formulations to achieve end-to-end in silico neotissue growth modeling.

Fig 1 .
Fig 1. Model development framework.The green box shows the mathematical model development steps, blue boxes depict the mathematical checks on the model, orange boxes display the experimental steps, and yellow boxes show the initial steps in model development.https://doi.org/10.1371/journal.pone.0300902.g001

Fig 2 .
Fig 2. Parameter inference errors for different temporal sampling periods for substrate frequency increases.Parameter inference error decreases as concentration and cell population data is collected more frequently.However, collecting samples at intervals shorter than 24 hrs has no further impact on estimation errors for model parameters.https://doi.org/10.1371/journal.pone.0300902.g002

Fig 3 .
Fig 3. BIC values for the 22 candidate models.Different markers show the number of first-order effects on the cell proliferation rate.https://doi.org/10.1371/journal.pone.0300902.g003

20] mol m − 3 ,
lactate has the highest effect among the biochemical substrates considered.This observation is confirmed later in the Global Sensitivity Analysis step.Fig 4 compares the inferred in silico model results versus the experimental observations in all experiments.In this figure array, each column in the figure refers to an observable variable (cell density, glucose, and lactate concentrations), and each row represents a different set of experimental conditions (initial concentrations of glucose and oxygen).In each plot, different lines represent different initial cell densities.As the figure illustrates, the calibrated model is indeed able to capture the effect of different biochemical conditions well and accurately predict the resulting cell population through the experiment for all experimental conditions.Fig 4 also shows the difference between the model prediction and the experimental observations versus the noise of the experimental data, defined as standard deviation over the mean.

Fig 4 .
Fig 4. Model inference.Inferred model versus the in vitro observations.The in silico model results are shown with curves, and the in vitro model results are shown as dots with error bars showing standard deviation.https://doi.org/10.1371/journal.pone.0300902.g004

Fig 5 .
Fig 5. Global sensitivity analysis.A: GSA on terminal values of observables.The x-axis shows the three observables.B: Time evolution of the global sensitivity of cell population to variables controlled during the experiments.https://doi.org/10.1371/journal.pone.0300902.g005

Fig 6 .
Fig 6.Study of refreshment periods.Lower media refreshment periods result in higher and more stable cell populations.https://doi.org/10.1371/journal.pone.0300902.g006 as, dn dt ðtÞ ¼ f 1 ðc o ðtÞÞf 2 ðc g ðtÞÞf 3 ðc l ðtÞÞ |ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl {zffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl } ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl }|ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl {