## Abstract

We present a nonlinear programming (NLP) framework for the scalable solution of parameter estimation problems that arise in dynamic modeling of biological systems. Such problems are computationally challenging because they often involve highly nonlinear and stiff differential equations as well as many experimental data sets and parameters. The proposed framework uses cutting-edge modeling and solution tools which are computationally efficient, robust, and easy-to-use. Specifically, our framework uses a time discretization approach that: i) avoids repetitive simulations of the dynamic model, ii) enables fully algebraic model implementations and computation of derivatives, and iii) enables the use of computationally efficient nonlinear interior point solvers that exploit sparse and structured linear algebra techniques. We demonstrate these capabilities by solving estimation problems for synthetic human gut microbiome community models. We show that an instance with 156 parameters, 144 differential equations, and 1,704 experimental data points can be solved in less than 3 minutes using our proposed framework (while an off-the-shelf simulation-based solution framework requires over 7 hours). We also create large instances to show that the proposed framework is scalable and can solve problems with up to 2,352 parameters, 2,304 differential equations, and 20,352 data points in less than 15 minutes. The proposed framework is flexible and easy-to-use, can be broadly applied to dynamic models of biological systems, and enables the implementation of sophisticated estimation techniques to quantify parameter uncertainty, to diagnose observability/uniqueness issues, to perform model selection, and to handle outliers.

## Author summary

Constructing and validating dynamic models of biological systems spanning biomolecular networks to ecological systems is a challenging problem. Here we present a scalable computational framework to rapidly infer parameters in complex dynamic models of biological systems from large-scale experimental data. The framework was applied to infer parameters of a synthetic microbial community model from large-scale time series data. We also demonstrate that this framework can be used to analyze parameter uncertainty, to diagnose whether the experimental data are sufficient to uniquely determine the parameters, to determine the model that best describes the data, and to infer parameters in the face of data outliers.

**Citation: **Shin S, Venturelli OS, Zavala VM (2019) Scalable nonlinear programming framework for parameter estimation in dynamic biological system models. PLoS Comput Biol 15(3): e1006828. https://doi.org/10.1371/journal.pcbi.1006828

**Editor: **Qing Nie, University of California Irvine, UNITED STATES

**Received: **September 5, 2018; **Accepted: **January 30, 2019; **Published: ** March 25, 2019

**Copyright: ** © 2019 Shin et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

**Data Availability: **The full Julia script is available at https://github.com/zavalab/JuliaBox/tree/master/MicrobialPLOS.

**Funding: **SS and VMZ acknowledge financial support from the Office of the Vice Chancellor for Research and Graduate Education at the University of Wisconsin-Madison. OSV acknowledges support from the Army Research Office under Young Investigator Award W911NF-17-1-0296 and the National Institutes of Health under award NIGMS 1 R35 GM124774-01. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

**Competing interests: ** The authors have declared that no competing interests exist.

## Introduction

Dynamic modeling is essential for understanding the behavior of biological systems. Systems of interest in this domain include microbial communities and microbiomes, gene regulatory networks, and metabolic pathways [1–3]. An important task that arises in modeling studies is validation against experimental data by using parameter estimation techniques. This task is computationally challenging because it requires solving optimization problems constrained by differential equations. Challenges arise from the dimensionality, nonlinearity, and stiffness of the dynamic model, from the incomplete observation of the system states, from the need to estimate many parameters, and from the need to handle a large number of experimental data sets.

Extensive research on solution methods for estimation problems with differential equations has been reported in the computational biology literature (see [4, 5] for comprehensive reviews). These methods target maximum likelihood estimation formulations (which are derived from Bayesian principles). In these formulations, one aims to find parameters that maximize the likelihood function. The most used strategy to handle such formulations is the so-called *simulation-based* approach. Here, the idea is to perform repetitive simulations of the dynamic model at different trial parameter values to identify a set of parameters that maximizes the likelihood. The trial parameter values are updated using derivative-based or derivative-free search schemes [6–8]. While the simulation-based approach is intuitive, repetitive solutions of large dynamic models become computationally expensive and differential equation solvers can fail at trial parameter values that are non-physical or that trigger unstable responses. In addition, techniques to compute first and second order derivatives for derivative-based schemes (e.g., finite differences, forward and adjoint sensitivities) involve intrusive procedures and are often limited to first-order derivatives [7]. The need for derivatives can be bypassed by using derivative-free search schemes [9, 10], which are widely popular in computational biology. Such methods include simulated annealing [11, 12], genetic algorithms [13, 14], particle swarms [15], approximate Bayesian computation [16, 17], and various other methods [18, 19]. Derivative-free schemes do not scale well in the number of parameters (a larger number of trial parameter values often need to be explored compared to derivative-based schemes). Moreover, second order derivative information is needed to determine if the parameter estimates are unique/observable given available experimental data [20, 21]. 
The uniqueness/observability test of the parameter estimates is based on curvature information of the likelihood function at the solution.

Simulation-based estimation frameworks previously reported in the computational biology literature have focused on problems that usually contain fewer than 100 parameters [10, 16, 22]. To the best of our knowledge, the largest estimation problem solved using a simulation-based framework contains 3,780 data points and 1,801 parameters [7]. That problem was solved (to partial optimality) with a derivative-based search scheme that uses first-order derivative information (computed with an adjoint method) and required over 5 hours of computing time. The scalability limitations of simulation-based approaches present an important obstacle in considering models of higher fidelity, in exploiting high-throughput experimental data, in analyzing parameter uncertainties, and in implementing sophisticated techniques such as ensemble modeling.

In this work, we propose a nonlinear programming (NLP) framework for solving estimation problems with embedded dynamic models [23, 24]. The framework is based on a *direct transcription* approach wherein the dynamic model is converted into a large set of algebraic equations by applying time-discretization techniques. The algebraic equations are then embedded directly as constraints in the optimization problem (a nonlinear program, or NLP). The NLPs arising from time discretization are of high dimension (easily reaching hundreds of thousands to millions of variables and constraints) but are also *sparse* and *structured*. Moreover, by transforming the dynamic model into algebraic equations, it becomes possible to use automatic differentiation techniques available in modern algebraic modeling languages to compute first and second derivatives. Exploitation of sparsity and structure, together with the availability of derivative information, enables the solution of estimation problems with complex dynamic models and the efficient handling of many parameters and experimental data sets.

Discretization-based estimation approaches have been widely studied in diverse fields such as chemical engineering [25–27] and aerospace engineering [28–30] (see [31] for a comprehensive review) but less so in computational biology. A major factor that has hindered wider adoption is the lack of easy-to-use computational frameworks that facilitate access to *non-expert users*. In this work, we demonstrate that modern modeling and solution tools can be combined to create scalable, robust, easy-to-use, and flexible frameworks. We demonstrate the benefits by solving challenging estimation problems arising in microbial community models.

The proposed framework enables the implementation of higher level tasks such as observability analysis and uncertainty quantification. Uncertainty quantification (UQ) seeks to characterize the parameter posterior distribution, which is necessary to obtain confidence levels/regions and parameter correlation information. Conventionally, UQ is performed by using second order derivative (Hessian) information of the likelihood function to construct an approximate parameter posterior covariance matrix [24, 32] or by using Markov chain Monte Carlo (MCMC) techniques [33–36]. The Hessian-based approach is scalable but it requires intrusive computation (the Hessian cannot be automatically computed by the solver) and does not capture well the effect of nonlinearities and physical constraints [32]. In MCMC, one samples parameters from the prior parameter density and compares the associated model outputs with experimental data to decide whether to accept that sample or not. By repeating these accept/reject steps one can construct an approximate parameter posterior. MCMC is rather easy to implement (it is not intrusive and does not require solving an optimization problem) but, being simulation-based, it also suffers from potential failures of the differential equation solver at non-physical parameter samples, does not scale well with the number of parameters, and cannot handle constraints directly (e.g., nonnegative concentrations) [37]. In this work, we propose to overcome some of these challenges by using a randomized maximum a posteriori (rMAP) framework [36–39]. This method computes approximate (to second order) samples from the parameter posterior distribution by performing random perturbations on the experimental data and by re-solving the estimation problem. This allows exploration of the parameter space more efficiently compared to the MCMC scheme because the samples can be computed in parallel (MCMC is sequential). 
Moreover, the rMAP approach is non-intrusive, can capture nonlinear and physical constraints effects, and avoids potential failures of differential equation solvers. The proposed estimation framework is flexible and can easily accommodate advanced estimation formulations. To demonstrate this, we implement formulations that use different prior regularization schemes and *k*-max norms (the mean of a specified fraction of largest values) to mitigate large outliers [40, 41].

The main contributions of this paper are summarized as follows. First, we provide an overview of parameter estimation and uncertainty quantification that leverages state-of-the-art optimization methods. Second, we demonstrate the computational capability of the nonlinear programming framework to handle large-scale parameter estimation problems for biological models. Lastly, we propose a novel problem formulation for parameter estimation using risk measures. Fig 1 shows an outline of the parameter estimation framework presented in this paper.

Mathematical models for biological systems are often expressed as systems of differential equations with parameters that need to be estimated from experimental data. We formulate the estimation problem using a maximum a posteriori (MAP) formulation. This yields optimization problems constrained by differential equations that are transformed into fully algebraic nonlinear programs by using discretization schemes. The resulting NLPs can be easily implemented in algebraic modeling languages such as JuMP and Plasmo.jl that compute derivatives automatically and that are interfaced to powerful interior-point optimization solvers that exploit sparsity and structure to achieve high computational efficiency. The proposed framework is scalable, robust, easy to use, and flexible. These capabilities facilitate high-level tasks such as identification of parameter observability/uniqueness issues, model selection, and uncertainty quantification.

The paper is organized as follows. The methods section provides a general form for the estimation problem under study and discusses how this can be cast as a sparse NLP by using time-discretization techniques. Furthermore, we introduce basic concepts behind NLP solvers that exploit sparsity and structures at the linear algebra level. In addition, we discuss rMAP and outlier mitigation schemes. In the results section, we demonstrate that the proposed framework can handle challenging estimation problems arising in microbial community models.

## Methods

### Estimation for dynamic models

We consider estimation problems of the following form:
$$\min_{\theta,\,x(\cdot)} \;\; \varphi(\theta) + \sum_{k\in\mathcal{K}} \varphi_k(y_k,\eta_k) \tag{1}$$
$$\text{s.t.} \;\; \dot{x}_k(t) = f_k(x_k(t),\theta), \quad t\in[0,T_k],\; k\in\mathcal{K} \tag{2}$$
$$x_k(0) = x_k^0, \quad k\in\mathcal{K} \tag{3}$$
$$y_k(t) = \phi_k(x_k(t),\theta), \quad t\in\mathcal{T}_k,\; k\in\mathcal{K} \tag{4}$$
$$h_k(x_k(t),\theta) \geq 0, \quad t\in[0,T_k],\; k\in\mathcal{K} \tag{5}$$
Here, 𝒦 = {1, …, *K*} is the *set of experiments* and 𝒯_{k} is the set of measurement (sampling) times in experiment *k* ∈ 𝒦. Time is denoted as *t* ∈ [0, *T*_{k}], where *T*_{k} is the duration of experiment *k*. The vectors *x*_{k}(*t*) are the differential *state* time trajectories, *θ* are the model *parameters*, *y*_{k}(*t*) are the model predicted *outputs* with corresponding *experimental observations* *η*_{k}(*t*) at time *t* ∈ 𝒯_{k} and experiment *k* ∈ 𝒦, and *x*_{k}^{0} are the initial conditions for experiment *k*. For convenience in the notation, we define the output vectors *y*_{k} and the experimental output vectors *η*_{k} for experiment *k* (obtained by stacking *y*_{k}(*t*) and *η*_{k}(*t*) over *t* ∈ 𝒯_{k}) as well as the total output vector *y* = (*y*_{1}, ⋯, *y*_{K}) and the total experimental output vector *η* = (*η*_{1}, ⋯, *η*_{K}). The vector function *f*_{k}(⋅) denotes the dynamic model mapping, *φ*(⋅) and *φ*_{k}(⋅) are objective function mappings, *ϕ*_{k}(⋅) is the state-to-output mapping, and *h*_{k}(⋅) is the constraint mapping. All the mappings are assumed to be at least twice continuously differentiable with respect to all the arguments. The estimation formulation (1)–(5) captures the main features of our proposed framework. The framework can also accommodate more general features; for instance, the initial conditions (3) can be treated as unknown variables that need to be estimated, and we can define non-additive objective functions that penalize large errors.

Problem (1)–(5) can be derived from Bayesian principles. To see this and introduce some useful notation, we start from Bayes theorem:
$$p(\theta \mid \eta) = \frac{p(\eta \mid \theta)\,p(\theta)}{p(\eta)} \tag{6}$$
Here, *p*(*θ* | *η*) is the parameter posterior density (i.e., the parameter density given knowledge of the outputs), *p*(*η* | *θ*) is the output conditional density (i.e., the density of the outputs given knowledge of the parameters), *p*(*θ*) is the prior density (i.e., the parameter density before knowledge of the outputs), and *p*(*η*) is the output marginal density. In a maximum a posteriori (MAP) formulation, the goal is to find the parameters that maximize *p*(*θ* | *η*). Because *p*(*η*) does not depend on *θ*, this can also be achieved by maximizing *p*(*η*|*θ*)*p*(*θ*). Maximizing *p*(*η*|*θ*)*p*(*θ*) is equivalent to maximizing the log-likelihood function *L*(*θ*) = log *p*(*θ*) + log *p*(*η*|*θ*). If the outputs from the experiments are independent (which is usually the case), we have that *p*(*η*|*θ*) = ∏_{k} *p*(*η*_{k}|*θ*) and thus:
$$L(\theta) = \log p(\theta) + \sum_{k=1}^{K} \log p(\eta_k \mid \theta) \tag{7}$$
The observed outputs are random variables that are usually considered to be Gaussian and thus *η*_{k} ∼ 𝒩(*y*_{k}, Σ_{k}), where Σ_{k} is the covariance matrix. The prior density *p*(*θ*) is also often assumed to be Gaussian and thus *θ* ∼ 𝒩(*θ̄*, Σ_{θ}), where *θ̄* is the mean of the prior distribution and Σ_{θ} is its covariance. With this, we obtain:
$$\log p(\theta) = -\tfrac{1}{2}(\theta-\bar{\theta})^{T}\,\Sigma_\theta^{-1}\,(\theta-\bar{\theta}) + \text{const} \tag{8}$$
$$\log p(\eta_k \mid \theta) = -\tfrac{1}{2}(y_k-\eta_k)^{T}\,\Sigma_k^{-1}\,(y_k-\eta_k) + \text{const} \tag{9}$$
Here, const collects terms that do not depend on *θ*. By comparing (7) with (1) we can see that minimizing *φ*(*θ*) + ∑_{k} *φ*_{k}(*y*_{k}, *η*_{k}) is equivalent to minimizing −*L*(*θ*). The dynamic model together with the state-to-output mapping defines a *parameter-to-output mapping* of the form *y*_{k} = *m*_{k}(*θ*). In a simulation-based estimation approach, the mapping *m*_{k}(*θ*) is computed by simulating the dynamic model (2)–(3) at a trial value *θ* using a differential equation solver and by evaluating the outputs at the sampling times using (4). In a discretization-based approach, the mapping *m*_{k}(*θ*) is not computed explicitly (we use it here only as a mathematical representation to explain some relevant concepts). Constraints (5) restrict the parameter space to be explored.

After dropping all constant terms in the likelihood function we obtain:
$$\varphi(\theta) = \tfrac{1}{2}(\theta-\bar{\theta})^{T}\,\Sigma_\theta^{-1}\,(\theta-\bar{\theta}) \tag{10}$$
$$\varphi_k(y_k,\eta_k) = \tfrac{1}{2}(y_k-\eta_k)^{T}\,\Sigma_k^{-1}\,(y_k-\eta_k) \tag{11}$$
The function *φ*(*θ*) is usually known as the *prior* term and provides a *regularization* effect that stabilizes the solution of the estimation problem when the parameters cannot be uniquely inferred from the available data [42–45]. This regularization term arises from the prior density *p*(*θ*) and provides a mechanism to encode knowledge on the parameters. Assuming that the prior density is Gaussian gives rise to a prior term that is defined by a weighted L2 norm. Recently, the machine learning community has also proposed the use of regularization terms based on L1 norms of the parameters. The L1 norm induces sparsity in the parameters and corresponds to assuming that the prior density is Laplacian. One can also show that an L1 norm acts as an exact penalty function and implicitly induces constraints on the parameters. Similarly, one can also use the inequality constraints *h*_{k}(⋅) to directly embed physical knowledge in the MAP formulation (e.g., concentrations can only be positive).
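As a minimal illustration of how the MAP objective combines a prior term with squared fitting errors, the following Python sketch evaluates both contributions for a toy one-parameter model. The names `neg_log_posterior`, `predict`, `prior_var`, and `noise_var`, the 1/2 scaling, the diagonal covariances, and the linear toy model are assumptions made for this example; they are not taken from the authors' Julia implementation.

```python
def neg_log_posterior(theta, theta_prior, prior_var, predict, data, noise_var):
    """Prior term plus the sum of squared-error terms over experiments.

    Assumes diagonal covariances: prior_var[i] is the prior variance of
    theta[i] and noise_var is a common measurement variance.
    """
    # Gaussian prior: weighted L2 distance to the prior mean.
    prior = 0.5 * sum((th - tb) ** 2 / v
                      for th, tb, v in zip(theta, theta_prior, prior_var))
    # One squared-error term per experiment k.
    fit = 0.0
    for k, eta_k in enumerate(data):
        y_k = predict(k, theta)  # model outputs at the sampling times
        fit += 0.5 * sum((y - e) ** 2 / noise_var for y, e in zip(y_k, eta_k))
    return prior + fit


# Hypothetical toy model: y(t) = theta[0] * t, a single experiment.
times = [[1.0, 2.0]]
data = [[2.0, 5.0]]
model = lambda k, theta: [theta[0] * t for t in times[k]]
print(neg_log_posterior([2.0], [1.0], [4.0], model, data, 1.0))  # 0.625
```

In a simulation-based approach, `predict` would call a differential equation solver at every trial value of `theta`; in the discretization-based approach described in this paper, the outputs are instead variables of the optimization problem.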

From (10) and (11) we see that *φ*_{k}(⋅) are squared error terms and thus the MAP problem minimizes the *sum* of the squared errors across all experiments *k* = 1, …, *K*. This approach offers limited control on large errors that might result from data outliers. Here, we propose to use a *k*-max norm to mitigate these issues. Our proposal is based on the observation that a *k*-max norm is equivalent to a conditional-value-at-risk (CVaR) norm [40, 41, 46]. The CVaR_{β} norm of a vector **e** = (*e*_{1}, ⋯, *e*_{K}) with components *e*_{k} = *φ*_{k}(*y*_{k}, *η*_{k}) is defined as the average of the *β*-fraction of largest elements of the vector (where *β* ∈ [0, 1] is a parameter that defines the size of the fraction) [46]. One can show that, when *β* → 1, the CVaR norm is the largest fitting error and, when *β* → 0, the CVaR norm is the sum of fitting errors (as in the standard MAP formulation). A key computational property of the CVaR norm is that it can be formulated as a standard optimization problem. In particular, the MAP problem with a CVaR error norm can be expressed as:
(12)
(13)
where [⋅]^{+} = max(0, ⋅) is the max function and *γ* is an auxiliary variable [41].
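The link between a *k*-max mean and an optimization problem with an auxiliary variable *γ* and the max function can be checked numerically. The sketch below compares the mean of the largest errors with the classical minimization form of Rockafellar and Uryasev. The function names and the `tail_fraction` argument (the fraction of largest elements that is averaged) are hypothetical, the scaling may differ from the one used in (12)–(13), and the equality is only checked for the case where `tail_fraction * K` is an integer.

```python
def k_max_mean(errors, tail_fraction):
    """Mean of the k largest errors, with k = tail_fraction * len(errors)."""
    k = round(tail_fraction * len(errors))
    assert k >= 1, "tail_fraction must select at least one element"
    return sum(sorted(errors, reverse=True)[:k]) / k


def cvar_min_form(errors, tail_fraction):
    """min over gamma of gamma + sum([e - gamma]^+) / (tail_fraction * K).

    The objective is convex and piecewise linear in gamma, so a minimizer
    can be found among the data points themselves.
    """
    K = len(errors)

    def objective(gamma):
        excess = sum(max(0.0, e - gamma) for e in errors)
        return gamma + excess / (tail_fraction * K)

    return min(objective(g) for g in errors)


errors = [1.0, 5.0, 2.0, 8.0]
for frac in (0.25, 0.5, 1.0):
    print(frac, k_max_mean(errors, frac), cvar_min_form(errors, frac))
```

For this data, averaging only the top quarter returns the largest error, while averaging everything returns the plain mean, which mirrors the two limiting cases described above.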

### Nonlinear programming formulation

To solve the MAP problem (1)–(5) we approximate the differential equations by using a discretization scheme. This enables the use of computationally efficient NLP solvers and facilitates high-level UQ and observability monitoring tasks.

#### Time discretization.

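Discretization schemes such as Euler, Runge-Kutta, and orthogonal collocation are commonly used to transform differential equations into algebraic ones. Orthogonal collocation is often preferred because accurate approximations can be obtained with few discretization points [23]. To simplify the presentation we use an implicit Euler scheme, which can be shown to be a special type of orthogonal collocation scheme (it is a one-point Radau collocation scheme). We discretize the time domain [0, *T*_{k}] into a set of intervals with fixed discrete-time points *t*_{0}, *t*_{1}, …, *t*_{N_k} for each experiment *k* (where *t*_{0} = 0 and *t*_{N_k} = *T*_{k}). The associated index set is represented by 𝒥_{k} = {0, 1, …, *N*_{k}}. By applying an implicit Euler scheme, the dynamic model (2) is converted into a set of nonlinear algebraic equations of the form:
$$x_{k,j+1} = x_{k,j} + (t_{j+1}-t_j)\, f_k(x_{k,j+1},\theta), \quad j\in\mathcal{J}_k\setminus\{N_k\},\; k=1,\dots,K \tag{14}$$
$$x_{k,0} = x_k^0, \quad k=1,\dots,K \tag{15}$$
With this, we can approximate the MAP problem (1)–(5) using the NLP:
$$\min_{\theta,\,x} \;\; \varphi(\theta) + \sum_{k=1}^{K} \varphi_k(y_k,\eta_k) \tag{16}$$
$$\text{s.t.} \;\; x_{k,j+1} = x_{k,j} + (t_{j+1}-t_j)\, f_k(x_{k,j+1},\theta), \quad j\in\mathcal{J}_k\setminus\{N_k\},\; k=1,\dots,K \tag{17}$$
$$x_{k,0} = x_k^0, \quad k=1,\dots,K \tag{18}$$
$$y_k(t_j) = \phi_k(x_{k,j},\theta), \quad t_j\in\mathcal{T}_k,\; k=1,\dots,K \tag{19}$$
$$h_k(x_{k,j},\theta) \geq 0, \quad j\in\mathcal{J}_k,\; k=1,\dots,K \tag{20}$$
Here, we use *x*_{k,j} as short-hand notation to represent the states at time *t*_{j} and experiment *k*, and 𝒯_{k} denotes the sampling times of experiment *k* (assumed to coincide with discretization points). It is important that a sufficient number of time discretization points is used so that the differential equations are accurately approximated. If the number of time points is not sufficient, the solution of the NLP can be sensitive to the choice of the number of points. To verify that the discretization is sufficient, we can use a test that assesses the sensitivity of the solution with respect to discretization. For example, one can increase the number of time points and check that the solutions remain reasonably close.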
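The following sketch illustrates the implicit Euler transcription on a deliberately simple, hypothetical scalar model (first-order decay, d*x*/d*t* = −*θx*), which is not one of the microbial community models studied in this paper. The function `euler_residuals` returns the algebraic equations that would be imposed as equality constraints on the states and parameters, and `final_state` is used for the grid-refinement check described above. All function names are assumptions made for this example.

```python
import math


def f(x, theta):
    """Hypothetical scalar dynamic model: first-order decay."""
    return -theta * x


def euler_residuals(x, theta, t):
    """Residuals of x[j+1] = x[j] + (t[j+1] - t[j]) * f(x[j+1], theta).

    In the NLP, x and theta are both decision variables and these
    residuals are required to be zero.
    """
    return [x[j + 1] - x[j] - (t[j + 1] - t[j]) * f(x[j + 1], theta)
            for j in range(len(t) - 1)]


def implicit_euler_trajectory(x0, theta, t):
    """States that satisfy the residuals for a fixed theta.

    Because f is linear here, each implicit step has a closed form.
    """
    x = [x0]
    for j in range(len(t) - 1):
        h = t[j + 1] - t[j]
        x.append(x[-1] / (1.0 + h * theta))
    return x


def final_state(n_steps, theta=1.0, horizon=1.0, x0=1.0):
    t = [horizon * j / n_steps for j in range(n_steps + 1)]
    return implicit_euler_trajectory(x0, theta, t)[-1]


# Refinement check: the approximation error shrinks as the grid is refined.
exact = math.exp(-1.0)
for n in (10, 20, 40):
    print(n, abs(final_state(n) - exact))
```

Implicit Euler is first-order accurate, so halving the step size roughly halves the error; a higher-order collocation scheme would reach the same accuracy with fewer points.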

For convenience, we express (16)–(20) in the following abstract form:
$$\min_{w} \;\; \Phi(w) \tag{21}$$
$$\text{s.t.} \;\; \Pi(w) = 0 \tag{22}$$
$$w \geq 0 \tag{23}$$
where *w* ∈ ℝ^{n} is a large-dimensional vector containing all the discrete-time states *x*_{k,j}, parameters *θ*, and additional auxiliary variables. The mapping Φ: ℝ^{n} → ℝ is the objective function and Π: ℝ^{n} → ℝ^{m} are equality constraints that contain the algebraic equations obtained from discretization of the dynamic model and other auxiliary equations. General inequality constraints can be transformed into equality constraints and simple non-negativity bounds by using auxiliary slack variables (i.e., *h*_{k}(⋅) ≥ 0 can be written as *h*_{k}(⋅) − *s*_{k} = 0 with *s*_{k} ≥ 0).

A useful representation of the NLP results from noticing that the parameters *θ* are the only complicating (coupling) variables across experiments *k* = 1, …, *K*. Consequently, we can express the NLP in the *structured* form [47]:
(24)
(25)
(26)
Here, the variable vector *w*_{k} contains all the discrete-time states and auxiliary variables of experiment *k*, Φ(⋅) is the prior term, Φ_{k}(⋅) is the contribution of experiment *k* to the likelihood function, and Π_{k}(⋅) contains the discretized dynamic model equations and auxiliary equations for experiment *k*. As we discuss next, this representation can be used to derive parallel solution approaches.

#### Interior-point solvers.

The NLPs that result from time discretization exhibit a high degree of *algebraic sparsity* (only a few variables appear in each constraint) and are highly structured. Sparsity and structure permeate down to the linear algebra operations performed inside the optimization solver. This is in sharp contrast to the simulation-based approach, which induces dense linear algebra operations in the space of the parameters *θ*. Most modern large-scale NLP solvers such as Ipopt and Knitro seek to exploit sparsity and structure at the linear algebra level to achieve high computational efficiency [48, 49]. Interior-point solvers, in particular, provide a flexible framework to do this. These solvers replace the variable bounds with a logarithmic barrier function. In the context of NLP (21)–(23), this results in a logarithmic barrier subproblem of the form:
$$\min_{w} \;\; \Phi(w) - \mu \sum_{i=1}^{n} \log w^{(i)} \tag{27}$$
$$\text{s.t.} \;\; \Pi(w) = 0 \tag{28}$$
where *μ* > 0 is the so-called barrier parameter. The logarithmic term becomes large as *w*^{(i)} approaches the boundary of the feasible region. This ensures that variables remain in the *interior* of the feasible region (hence the origin of the term *barrier*). A key observation is that one can recover a solution of the original NLP (21)–(23) by solving a sequence of barrier problems for decreasing values of *μ* [50]. An important property of interior-point methods is that the original NLP with bounds is converted into a sequence of NLPs with equality constraints. This removes the combinatorial complexity of identifying the set of bounds that are active or inactive at the solution (a bottleneck in active-set solvers).
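The effect of driving the barrier parameter to zero can be seen on a hypothetical one-variable problem, min (*w* + 1)² subject to *w* ≥ 0, whose solution lies on the bound (*w** = 0). The sketch below solves the barrier subproblem by Newton's method for a decreasing sequence of *μ* values, warm-starting each solve from the previous one. The test problem, the function names, and the simple step safeguard (a crude fraction-to-boundary rule) are assumptions made for this example and do not reproduce the logic of Ipopt.

```python
import math


def solve_barrier_subproblem(mu, w_start, tol=1e-12, max_iter=200):
    """Minimize (w + 1)^2 - mu * log(w) over w > 0 by Newton's method."""
    w = w_start
    for _ in range(max_iter):
        grad = 2.0 * (w + 1.0) - mu / w   # first derivative of the barrier objective
        if abs(grad) < tol:
            break
        hess = 2.0 + mu / w ** 2          # second derivative (always positive)
        # Safeguard: never move more than 99% of the way to the bound w = 0.
        w = max(w - grad / hess, 0.01 * w)
    return w


def barrier_path(mus, w_start=1.0):
    """Solve a sequence of barrier subproblems with warm starts."""
    path, w = [], w_start
    for mu in mus:
        w = solve_barrier_subproblem(mu, w)
        path.append((mu, w))
    return path


for mu, w in barrier_path([1.0, 1e-2, 1e-4, 1e-6, 1e-8]):
    exact = (-1.0 + math.sqrt(1.0 + 2.0 * mu)) / 2.0  # root of 2w^2 + 2w - mu = 0
    print(mu, w, exact)
```

Every iterate stays strictly positive, and the subproblem solutions approach the bound only as *μ* decreases, so the solver never has to guess which bounds are active.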

#### Sparse linear algebra.

Interior-point methods enable efficient linear algebra implementations. To explain how this is done, we note that the optimality conditions of the barrier problem are given by the following set of nonlinear equations:
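$$\nabla_w \Phi(w) + \nabla_w \Pi(w)^{T}\lambda - \nu = 0 \tag{29}$$
$$\Pi(w) = 0 \tag{30}$$
$$W V \mathbf{1} = \mu \mathbf{1} \tag{31}$$
where λ are the Lagrange multipliers of the equality constraints, *ν* are the Lagrange multipliers of the bound constraints, *V* = diag(*ν*) and *W* = diag(*w*) are diagonal matrices, and **1** is a vector of all ones.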

By applying Newton’s method to (29)–(31), we obtain the following linear algebra system:
(32)
Here, *ℓ* is the Newton iteration index, Δ*w*^{ℓ} is the search direction for the primal variables, Δλ^{ℓ} is the search direction for the dual variables, and *H*(*w*^{ℓ}, λ^{ℓ}) is the Hessian of the Lagrange function ℒ(*w*, λ) = Φ(*w*) + λ^{T}Π(*w*). The matrix *M*^{ℓ}(*κ*_{w}) is known as the *augmented matrix*. The Newton step computation in a simulation-based approach operates only in the space of the parameters *θ* (the states are implicitly eliminated by simulation). In the time discretization approach, the Newton search is in the space of both the discretized states and parameters (contained in the high-dimensional variable vector *w*). Interestingly, however, the augmented matrix found in typical applications is highly sparse (fewer than 1% of its entries are non-zero) [24].

The constant *κ*_{w} ≥ 0 is a *Hessian regularization parameter* which plays a key role in the context of parameter estimation. In particular, one can prove that the augmented matrix *M*^{ℓ}(*κ*_{w}) is non-singular (and thus the linear algebra system has a unique solution) if the reduced Hessian matrix *Z*^{T} *H*(*w*^{ℓ}, λ^{ℓ})*Z* is positive definite and the Jacobian matrix ∇_{w} Π(*w*^{ℓ}) has full row rank. Here, the matrix *Z* is such that its columns span the null-space of the Jacobian (i.e., ∇_{w} Π(*w*^{ℓ})*Z* = 0). Moreover, the number of columns of *Z* equals the number of degrees of freedom (in our context this is precisely the number of parameters). When the reduced Hessian is positive definite (i.e., all its eigenvalues are positive) and the Jacobian has full row rank, one can prove that the Newton step of the primal variables Δ*w*^{ℓ} obtained from the solution of (32) is a descent direction for the objective function (i.e., (Δ*w*^{ℓ})^{T}∇_{w} Φ(*w*^{ℓ}) < 0) when the constraints are close to being satisfied (i.e., Π(*w*^{ℓ}) ≈ 0). This is key because it indicates that the Newton step improves the objective function (in our context, the negative likelihood function). This property cannot be guaranteed when the reduced Hessian is not positive definite. When such a situation is encountered, one can increase the regularization parameter *κ*_{w} until the reduced Hessian is positive definite and a descent direction is obtained. This approach is closely connected to the Levenberg-Marquardt method used in simulation-based estimation approaches (in which one regularizes the Hessian of the negative likelihood function by adding a positive multiple of the identity matrix) [51]. Another key observation is that, when the reduced Hessian is positive definite at the solution *w**, the estimated parameters are *unique*. This provides an indication that the experimental data are sufficiently informative to identify the parameters uniquely (i.e., the parameters are observable). 
We note that using a prior term *φ*(*θ*) in the MAP formulation has the effect of adding the positive definite matrix Σ_{θ}^{−1} to the reduced Hessian. This artificially regularizes the problem (as is done in the Levenberg-Marquardt scheme by adding a multiple of the identity matrix). Consequently, when testing for observability/uniqueness, it is necessary to drop the prior term from the MAP formulation. Testing for observability also requires exact second order derivative information because the Hessian is needed. In the time-discretization approach, such information can be obtained directly from algebraic modeling languages. Simulation-based solution approaches often cannot check observability of the parameters (computing second derivatives using adjoint and sensitivity schemes is complicated).
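A small numerical illustration shows why the prior term must be dropped when testing observability. For a hypothetical model that is linear in two parameters, with unit measurement variance and no constraints, the Hessian of the squared-error objective equals the sum of outer products of the output sensitivities, and the reduced Hessian coincides with it. In the sketch below, the two parameters of the model *y*(*t*) = (*θ*_{1} + *θ*_{2})*t* cannot be distinguished, so the Hessian is singular, yet adding a prior precision makes it positive definite. The models, function names, and numerical values are assumptions made for this example.

```python
def normal_matrix(sensitivities):
    """Sum of outer products of sensitivity rows dy_i/dtheta (2 parameters)."""
    return [[sum(row[a] * row[b] for row in sensitivities) for b in range(2)]
            for a in range(2)]


def add_prior_precision(hessian, precision):
    """Add a diagonal prior precision (inverse variance) to the Hessian."""
    return [[hessian[a][b] + (precision if a == b else 0.0) for b in range(2)]
            for a in range(2)]


def is_positive_definite_2x2(hessian, tol=1e-9):
    """Sylvester's criterion for a symmetric 2 x 2 matrix."""
    det = hessian[0][0] * hessian[1][1] - hessian[0][1] * hessian[1][0]
    return hessian[0][0] > tol and det > tol


times = [1.0, 2.0, 3.0]

# y(t) = (theta1 + theta2) * t: both sensitivities equal t (not identifiable).
h_bad = normal_matrix([[t, t] for t in times])
# y(t) = theta1 * t + theta2 * t**2: sensitivities t and t**2 (identifiable).
h_good = normal_matrix([[t, t ** 2] for t in times])

print(is_positive_definite_2x2(h_bad))                              # False
print(is_positive_definite_2x2(add_prior_precision(h_bad, 0.01)))   # True
print(is_positive_definite_2x2(h_good))                             # True
```

The second line of output is the reason for caution: with the prior included, the test reports a positive definite matrix even though the data alone cannot separate the two parameters.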

Computing the eigenvalues of the reduced Hessian to check for positive definiteness is expensive. Interestingly, one can also determine if the reduced Hessian is positive definite by using inertia information of the augmented matrix *M*^{ℓ}(*κ*_{w}). The inertia of a matrix *M* is denoted as Inertia(*M*) = {*n*_{+}, *n*_{−}, *n*_{0}} where *n*_{+}, *n*_{−}, and *n*_{0} are the number of positive, negative, and zero eigenvalues of matrix M, respectively. One can prove that the reduced Hessian matrix is positive definite if Inertia(*M*^{ℓ}(*κ*_{w})) = {*n*, *m*, 0}, where we recall that *n* is the dimension of the variable vector *w* and *m* is the number of constraints. Notably, one can obtain the inertia of *M*^{ℓ}(*κ*_{w}) without computing the eigenvalues of the matrix. This is done by using modern sparse symmetric factorization routines such as MA57 or Pardiso [50]. Such routines factorize the matrix *M*^{ℓ}(*κ*_{w}) as *LBL*^{T} where *L* is a lower triangular matrix and *B* is a matrix with 1 × 1 and 2 × 2 blocks in the diagonal. One can show that the number of positive and negative eigenvalues of *M*^{ℓ}(*κ*_{w}) are the number of positive and negative eigenvalues of *B* (which are easy to determine).
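The inertia test can be sketched on a tiny dense example. The code below assembles an augmented matrix from a hypothetical 2 × 2 Hessian and a 1 × 2 Jacobian and counts the signs of the pivots of a symmetric factorization, which by Sylvester's law of inertia match the signs of the eigenvalues. This sketch uses 1 × 1 pivots without pivoting and fails on a zero pivot; routines such as MA57 use pivoting with 1 × 1 and 2 × 2 blocks and operate on sparse matrices. The function names and test matrices are assumptions made for this example.

```python
def augmented_matrix(hessian, jacobian, kappa=0.0):
    """Assemble [[H + kappa*I, J^T], [J, 0]] as a dense list of rows."""
    n, m = len(hessian), len(jacobian)
    top = [[hessian[i][j] + (kappa if i == j else 0.0) for j in range(n)]
           + [jacobian[r][i] for r in range(m)] for i in range(n)]
    bottom = [list(jacobian[r]) + [0.0] * m for r in range(m)]
    return top + bottom


def ldl_inertia(matrix, tol=1e-12):
    """Return (n_plus, n_minus, n_zero) from the pivots of an unpivoted LDL^T."""
    a = [row[:] for row in matrix]
    n = len(a)
    n_plus = n_minus = 0
    for j in range(n):
        pivot = a[j][j]
        if abs(pivot) < tol:
            raise ZeroDivisionError("zero pivot; a pivoting factorization is required")
        if pivot > 0:
            n_plus += 1
        else:
            n_minus += 1
        # Symmetric elimination of column j from the remaining rows.
        for i in range(j + 1, n):
            factor = a[i][j] / pivot
            for c in range(j + 1, n):
                a[i][c] -= factor * a[j][c]
    return (n_plus, n_minus, 0)


jac = [[1.0, 1.0]]                 # one constraint, two variables (n = 2, m = 1)
h_convex = [[2.0, 0.0], [0.0, 1.0]]
h_nonconvex = [[1.0, 0.0], [0.0, -3.0]]

print(ldl_inertia(augmented_matrix(h_convex, jac)))                # (2, 1, 0)
print(ldl_inertia(augmented_matrix(h_nonconvex, jac)))             # (1, 2, 0)
print(ldl_inertia(augmented_matrix(h_nonconvex, jac, kappa=4.0)))  # (2, 1, 0)
```

In the second case the null space of the Jacobian is spanned by (1, −1), the reduced Hessian equals 1 − 3 = −2, and the inertia differs from {*n*, *m*, 0}; adding a sufficiently large regularization restores the expected inertia.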

Modern interior-point solvers are equipped with highly sophisticated safeguarding techniques that enable the solution of highly nonlinear problems. A powerful approach is called a filter line-search method, in which one seeks to find a step-size *κ* such that the trial Newton iteration *w*^{ℓ+1} = *w*^{ℓ} + *κ*Δ*w*^{ℓ} either decreases the objective function or the constraint violation ||Π(*w*^{ℓ})||. If the step is accepted, the current values for the objective and constraint violation (Φ(*w*^{ℓ}), Π(*w*^{ℓ})) are stored in a filter (a history of previous successful iterations). At the next iterate, one requires that the Newton step is not in the filter and that it improves either the objective or the constraint violation. This rather simple strategy is extremely effective in practice.
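The acceptance logic of a filter can be sketched in a few lines. In this simplified version, a trial point is acceptable if, against every stored pair, it improves either the objective value or the constraint violation. The function names are hypothetical, and the sufficient-decrease margins and switching conditions used by production solvers are deliberately omitted.

```python
def acceptable_to_filter(trial, filter_entries):
    """trial and filter entries are (objective, constraint_violation) pairs.

    The trial point is rejected if some stored pair is at least as good in
    both measures.
    """
    obj, viol = trial
    return all(obj < f_obj or viol < f_viol for f_obj, f_viol in filter_entries)


def update_filter(trial, filter_entries):
    """Add an accepted pair and drop stored pairs that it dominates."""
    obj, viol = trial
    kept = [(o, v) for o, v in filter_entries if o < obj or v < viol]
    return kept + [trial]


history = [(10.0, 1.0), (8.0, 2.0)]
print(acceptable_to_filter((9.0, 0.5), history))   # True
print(acceptable_to_filter((10.5, 1.5), history))  # False
```

A step length would be reduced (backtracking) until the resulting trial point passes this test.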

We highlight the fact that the proposed discretization approach bypasses the need to repetitively simulate the dynamic model (the discretized dynamic model contained in Π(*w*) is solved progressively by Newton’s method). This brings substantial computational savings. Moreover, since the discrete-time model is solved at the solution *w** of the NLP (21)–(23), we have that the discrete-time states *x*_{k,j} approximate the state trajectories *x*_{k}(*t*_{j}). In the absence of inequality constraints, one can also show that the reduced Hessian *Z*^{T} *H*(*w**, λ*)*Z* approximates the Hessian of the negative log-likelihood function −∇_{θθ} *L*(*θ**) in a neighborhood of *w** (which contains *θ**). We thus have that the reduced Hessian approximates the inverse of the parameter covariance matrix. When inequality constraints are present, some of the parameters or state variables might hit their physical bounds and this deteriorates the approximation. When the parameters are not unique (the reduced Hessian has zero eigenvalues), the inverse covariance matrix is singular and the parameter covariance matrix is not well defined. We highlight that such information can be obtained as a by-product of the solution from the nonlinear programming solver. Therefore, the observability of the parameter estimation problem can be analyzed without further computational effort. We also highlight that parameter observability/identifiability can be checked prior to solving the estimation problem [52, 53]. Unfortunately, such a priori methods require computing parameter-to-output sensitivity matrices, which requires specialized implementations and which might fail to capture the impact of nonlinearities (the sensitivity is computed only at a reference point). Moreover, such approaches do not capture the impact of constraints. Under the proposed NLP framework, observability is checked automatically by using the inertia of the augmented matrix at the MAP point.

#### Structured linear algebra.

A key advantage of using interior-point solvers is that they enable *modular linear algebra* implementations. For instance, the multi-experiment structure of problem (24)–(26) permeates down to the linear algebra system, to give a system of the form:
(33)
where Δ*θ* is the Newton step for the parameters and Δ*w*_{k} = (Δ*x*_{k}, Δλ_{k}) is the Newton step for variables in experiment *k*. The above system is said to have a block-bordered diagonal (BBD) structure. Here, we have that:
(34)
where , *T*_{k} = ∇_{θ} Π_{k}, , , , , and .

The BBD matrix is a permutation of the augmented matrix *M*^{ℓ}(*κ*_{w}) (obtained by ordering variables by experiment). The BBD matrix can thus be expressed as *P*^{T}*M*^{ℓ}(*κ*_{w})*P* where *P* is a permutation matrix. The permutation does not affect the eigenvalues of the matrix. The BBD system (33) can be solved in parallel by using a Schur complement decomposition approach [47, 54] or specialized preconditioning strategies [55]. In this work, we focus on the Schur complement-based approach. This requires the solution of the linear algebra systems:
(35)
(36)
Here, *S* is the *Schur complement matrix*, which has the same dimension as the number of degrees of freedom (in our case the number of parameters). The key observation is that the experiment matrices *K*_{k} can be factorized *in parallel* by using an *LBL*^{T} factorization (with MA57 or PARDISO). As a result, Schur decomposition can achieve high computational efficiency in estimation problems with many experimental data sets.
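
The two-stage solve can be sketched in a few lines of Python. The block names below (`K[k]` for the experiment blocks, `B[k]` for the border blocks that couple experiment *k* to the parameters, `K0` for the parameter block, and the right-hand sides `r[k]`, `r0`) are assumptions made for this illustration, and the numbers are arbitrary. A dense exact solver stands in for the sparse *LBL*^{T} factorizations, and the loop over experiments is written serially although each iteration is independent.

```python
from fractions import Fraction

def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def sub(A, B):
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def solve(A, B):
    """Solve A X = B exactly by Gauss-Jordan elimination (B is a list of rows)."""
    n = len(A)
    M = [[Fraction(v) for v in A[i]] + [Fraction(v) for v in B[i]] for i in range(n)]
    for c in range(n):
        p = next(r for r in range(c, n) if M[r][c] != 0)  # row with a nonzero pivot
        M[c], M[p] = M[p], M[c]
        pivot = M[c][c]
        M[c] = [v / pivot for v in M[c]]
        for r in range(n):
            if r != c and M[r][c] != 0:
                f = M[r][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    return [row[n:] for row in M]

def schur_solve(K, B, K0, r, r0):
    """Solve [[diag(K), B], [B^T, K0]] [dw; dtheta] = [r; r0] block by block."""
    S, rhs = K0, r0
    for Kk, Bk, rk in zip(K, B, r):  # each experiment can be processed independently
        S = sub(S, matmul(transpose(Bk), solve(Kk, Bk)))
        rhs = sub(rhs, matmul(transpose(Bk), solve(Kk, rk)))
    dtheta = solve(S, rhs)  # small dense system in the parameters
    dw = [solve(Kk, sub(rk, matmul(Bk, dtheta))) for Kk, Bk, rk in zip(K, B, r)]
    return dw, dtheta, S

# Two hypothetical experiments with one state and one multiplier each, one parameter.
K = [[[2, 1], [1, 0]], [[3, 1], [1, 0]]]
B = [[[0], [1]], [[0], [2]]]
r = [[[1], [2]], [[0], [1]]]
K0, r0 = [[1]], [[3]]

dw, dtheta, S = schur_solve(K, B, K0, r, r0)
print(S, dtheta, dw)
```

Only the Schur complement system couples the experiments; its size equals the number of parameters, regardless of how many experiments are included.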

When using a Schur decomposition, one can estimate the inertia of the BBD matrix by using Haynsworth’s formula:
Inertia(*M*^{ℓ}(*κ*_{w})) = Inertia(*S*) + ∑_{k} Inertia(*K*_{k}) (37)
We recall that *n* = *n*_{θ} + ∑_{k} *n*_{k} and *m* = ∑_{k} *m*_{k}. Consequently, if we have that Inertia(*K*_{k}) = {*n*_{k}, *m*_{k}, 0} for all *k*, then Inertia(*M*^{ℓ}(*κ*_{w})) = {*n*, *m*, 0} if and only if Inertia(*S*) = {*n*_{θ}, 0, 0} (i.e., the Schur complement is positive definite). One can obtain the inertia of the blocks *K*_{k} and *S* using an *LBL*^{T} factorization. This allows us to test observability of the parameters.
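
The additivity of inertia under a Schur complement can be checked on a small example. The following Python sketch is an illustration under stated assumptions: the blocks `K`, `B`, and `K0` are hypothetical, a single experiment is used, and the inertia is read from the pivots of an unpivoted symmetric elimination (a production code would use a pivoted *LBL*^{T} factorization such as the one in MA57).

```python
from fractions import Fraction

def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def inertia(M):
    """Return (n_pos, n_neg, n_zero) from the pivots of symmetric elimination.

    A zero pivot is accepted only when the rest of its row is zero (a decoupled
    zero eigenvalue); otherwise a pivoted factorization would be required.
    """
    A = [[Fraction(v) for v in row] for row in M]
    n = len(A)
    pos = neg = zero = 0
    for j in range(n):
        d = A[j][j]
        if d == 0:
            if any(A[j][c] != 0 for c in range(j + 1, n)):
                raise ValueError("zero pivot: a pivoted LBL^T factorization is needed")
            zero += 1
            continue
        pos, neg = pos + (d > 0), neg + (d < 0)
        for r in range(j + 1, n):
            f = A[r][j] / d
            for c in range(j + 1, n):
                A[r][c] -= f * A[j][c]
    return (pos, neg, zero)

def inv2(K):
    (a, b), (c, d) = K
    det = Fraction(a * d - b * c)
    return [[d / det, -b / det], [-c / det, a / det]]

def schur(K, B, K0):
    """S = K0 - B^T K^{-1} B for one 2x2 experiment block."""
    BtKinvB = matmul(transpose(B), matmul(inv2(K), B))
    return [[K0[i][j] - BtKinvB[i][j] for j in range(len(K0))] for i in range(len(K0))]

def assemble(K, B, K0):
    """Assemble the bordered matrix [[K, B], [B^T, K0]]."""
    return [rk + rb for rk, rb in zip(K, B)] + [rbt + r0 for rbt, r0 in zip(transpose(B), K0)]

def check(rho):
    K = [[2, 1], [1, 0]]       # one state and one multiplier: inertia (1, 1, 0)
    B = [[0, 0], [1, 1]]       # both parameters enter the constraint identically
    K0 = [[rho, 0], [0, rho]]  # curvature contributed by a quadratic prior
    total = inertia(assemble(K, B, K0))
    parts = tuple(a + b for a, b in zip(inertia(K), inertia(schur(K, B, K0))))
    return total, parts

print(check(0))                # no prior: a zero eigenvalue appears
print(check(Fraction(1, 10)))  # with prior: the expected inertia (n, m, 0) = (3, 1, 0)
```

In both cases the inertia of the full matrix equals the sum of the block and Schur complement inertias, and the zero eigenvalue of the Schur complement flags the non-unique parameters.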

We highlight that Schur decomposition and block cyclic reduction techniques can also be used to decompose the estimation problem along the time horizon. This enables the solution of problems with fine time resolutions and long horizons (see [56]). Unfortunately, scalable implementations of such techniques are currently not available (this is an interesting direction of future work). We also highlight that there exist generic parallel linear solvers (such as MA97) that seek to identify and exploit structures on-the-fly. Such approaches are in general not competitive with approaches that communicate structure directly to the solver (such as Schur decomposition).

### Uncertainty quantification

The estimation problem under the MAP framework gives the values of the parameters *θ** that maximize the parameter posterior density. However, a characterization of the entire posterior is necessary to assess parameter uncertainty. The posterior covariance may be approximated from the reduced Hessian at the solution of the problem *w** and the covariance matrix can be used to determine ellipsoidal level sets of the posterior (confidence regions). This approach, however, might fail to capture nonlinear and constraint effects [24]. In this work, we circumvent these issues by using a randomized maximum a posteriori (rMAP) approach. Under this method, the posterior distribution is explored by using random perturbations on the experimental data (which can be easily parallelized). The rMAP framework can also deliver *approximate samples* from the parameter posterior distribution and implicitly captures nonlinear and constraint effects. To show this, we use the implicit mapping representation . Under this representation, the posterior density (6) can be expressed as:
(38)
(39)
where *m*(*θ*) ≔ (*θ*, *m*_{1}(*θ*), ⋯, *m*_{K}(*θ*)), Σ ≔ diag(Σ_{θ}, Σ_{1}, Σ_{2}, ⋯, Σ_{K}), and . Here, we redefine to enable compact notation. Since *θ** is a solution of the MAP problem, we have that:
(40)
If the mapping *m*_{k}(⋅) is continuously differentiable, we have that:
(41)
To enable compact notation we define *η** = *m*(*θ**) and ∇*m** = ∇*m*(*θ**). We have that *θ** satisfies the stationary condition of (40):
(42)
We use (41) to obtain a second-order Taylor approximation of the posterior as:
(43)
This implies that the posterior is *approximately* represented as:
(44)
We recall that the output error is Gaussian and we can thus write *η* = *m*(*θ*) + *ϵ* with . We now consider the MAP problem with randomly perturbed data:
(45)
and note that
(46)
where *C* is a constant. Consequently, for sufficiently small *ϵ*, we can linearize the mapping *m*(⋅) to obtain an approximate solution of (45) of the form:
(47)
Here, we observe that the right-hand side of (47) is Gaussian with mean *θ** and covariance:
(48)
We thus have that:
(49)
Consequently, solving (45) provides an approximate sample from the posterior distribution *p*(*θ* ∣ *η*). The sampling procedure (45) is accurate up to second order. To obtain exact samples from the posterior, one needs to implement a rigorous MCMC scheme. The MCMC scheme removes the bias that appears in the rMAP sample density, which results from the second-order approximation [35–37]. This effect is illustrated with an example in [37]. Several works in the literature, however, report that accurate posterior densities can be obtained using an rMAP scheme [38, 57, 58]. We also note that the rMAP scheme implicitly captures nonlinearities and physical constraints when computing samples from the posterior (MCMC does not capture constraints). In particular, solving the perturbed problem (45) corresponds to solving the MAP problem (1)–(5) with randomly perturbed data, and the MAP problem enforces constraints and handles the full nonlinear model. The computational framework provided in this work focuses on the solution of the estimation problem and seeks to highlight that these capabilities enable the implementation of advanced UQ procedures. A detailed analysis and comparison of different UQ methods for biological dynamical models is an interesting topic of future work. Systematic comparisons in other settings are provided in [35, 36].
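
The linearization argument can be condensed into a few lines. The sketch below is a summary under stated assumptions and is not a reproduction of the numbered equations: *J* denotes the Jacobian of *m*(⋅) at *θ** (notation introduced here), the constraints are ignored, and the perturbation *ϵ* is taken to be Gaussian with covariance Σ.

```latex
% Notation assumed for this sketch: J is the Jacobian of m evaluated at \theta^*.
\begin{align*}
\theta^* &\in \arg\min_{\theta}\ \tfrac{1}{2}\,(\eta - m(\theta))^{T}\Sigma^{-1}(\eta - m(\theta))
  \quad\Longrightarrow\quad J^{T}\Sigma^{-1}\bigl(\eta - m(\theta^*)\bigr) = 0,\\
m(\theta) &\approx m(\theta^*) + J\,(\theta - \theta^*),\\
\theta_{\epsilon} &\in \arg\min_{\theta}\ \tfrac{1}{2}\,(\eta + \epsilon - m(\theta))^{T}\Sigma^{-1}(\eta + \epsilon - m(\theta)),
  \qquad \epsilon \sim \mathcal{N}(0,\Sigma),\\
\theta_{\epsilon} &\approx \theta^* + \bigl(J^{T}\Sigma^{-1}J\bigr)^{-1}J^{T}\Sigma^{-1}\epsilon,\\
\operatorname{Cov}(\theta_{\epsilon}) &\approx \bigl(J^{T}\Sigma^{-1}J\bigr)^{-1}J^{T}\Sigma^{-1}\,\Sigma\,\Sigma^{-1}J\,\bigl(J^{T}\Sigma^{-1}J\bigr)^{-1}
  = \bigl(J^{T}\Sigma^{-1}J\bigr)^{-1}.
\end{align*}
```

The last line is the covariance of the Gaussian approximation of the posterior around *θ**, which is why the perturbed solutions behave as approximate posterior samples when the linearization is accurate.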

### Algebraic modeling platforms

Having an algebraic representation of the estimation problem has many practical and computational advantages. In particular, one can implement the estimation problem in easy-to-use and open-source modeling languages such as JuMP [59], Plasmo.jl [60], and Pyomo [61, 62]. These modeling languages are equipped with automatic differentiation techniques that compute exact first and second derivatives. Derivative information is communicated to optimization solvers without any user intervention. Modern algebraic modeling languages such as Plasmo.jl and Pyomo also allow users to convey structural information to the solvers. This is beneficial in the case of parameter estimation, where the structure can be exploited to enable parallelism and the use of high-performance computing clusters. In our framework, we use the modeling language Plasmo.jl to express multi-experiment estimation problems as graphs. Our implementation using Plasmo.jl is illustrated in Fig 2. The full Julia script is available at https://github.com/zavalab/JuliaBox/tree/master/MicrobialPLOS. We highlight that the same script can be used to solve the estimation problem using a general NLP solver such as Ipopt on a single-processor computer or with a structure-exploiting parallel NLP solver such as PIPS-NLP on multiple parallel computing processors (this might be a multi-core computing server or a large-scale computing cluster). This allows users with limited knowledge on scientific computing to gain access to advanced high-performance computing capabilities.

## Results

Human gut microbial communities and microbiomes are highly dynamic networks coupled by positive or negative interactions and numerous feedback loops that display complex behaviors [63–66]. The generalized Lotka-Volterra (gLV) model provides a useful approach to capture such behavior [67–71]. Specifically, gLV captures single species growth rates and intra-species and inter-species positive and negative interactions. We apply the proposed NLP framework to estimate the growth and interaction parameters of the gLV model from experimental data collected in [66]. The microbial species involved in the experiments are shown in Table 1. Experiments were designed to study the synthetic ecology encompassing 12 prevalent human-associated intestinal species.

The gLV model is given by:
d*x*_{s}(*t*)/d*t* = *x*_{s}(*t*) (*μ*_{s} + ∑_{s′} *α*_{ss′} *x*_{s′}(*t*)) for every species *s* (Model 1)
where the sum runs over the set of microbial species, *x*_{s}(⋅) is the trajectory of the abundance of species *s*, *μ*_{s} is the growth rate of species *s*, and *α*_{ss′} is the interaction parameter that captures the effect of the abundance of species *s*′ on the growth rate of species *s*. Species *s* and species *s*′ are referred to as recipient species and donor species, respectively. Since (Model 1) only includes smooth mappings, one can apply the parameter estimation framework presented in the previous section.
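
A minimal Python sketch of the right-hand side of the gLV model, written in the standard form d*x*_{s}/d*t* = *x*_{s}(*μ*_{s} + ∑_{s′} *α*_{ss′} *x*_{s′}), may help fix the index convention. The function and argument names are illustrative, and the numerical values are hypothetical.

```python
def glv_rhs(x, mu, alpha):
    """Time derivative of every species abundance under the gLV model.

    x[s]       : abundance of species s
    mu[s]      : growth rate of species s
    alpha[s][t]: effect of donor species t on the growth rate of recipient species s
    """
    n = len(x)
    return [x[s] * (mu[s] + sum(alpha[s][t] * x[t] for t in range(n)))
            for s in range(n)]

# Single species: logistic growth with carrying capacity -mu/alpha = 1.
print(glv_rhs([0.5], [1.0], [[-1.0]]))

# Two hypothetical species: species 1 inhibits species 0.
print(glv_rhs([0.5, 0.5], [1.0, 1.0], [[-1.0, -1.0], [0.0, -1.0]]))
```

The right-hand side is polynomial in the states and linear in the parameters, which is what makes a fully algebraic implementation straightforward.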

The parameters (growth rates and interactions) cannot be calculated directly from first principles and must be estimated from experimental data. The means of the prior densities of the parameters are assumed to be and their standard deviations are assumed to be . Such values are empirically determined by selecting the standard deviation values that give biologically feasible parameter estimates (the ranges of biologically feasible parameter values are 0.09 < *μ*_{s} < 2.1, −10 < *α*_{ij} < 10, and −10 < *α*_{ii} < 0). The variances for the output measurements are assumed to be *σ*_{k,s}(*t*) = 0.05 max(0.1, *η*_{k,s}(*t*)). There are a total of 156 parameters, including 12 monospecies growth rates and 144 interaction parameters (12 × 12). The set of experiments includes 12 monospecies experiments and 66 pairwise community experiments (total of *K* = 78 experiments). The estimation problem contains a total of 144 differential equations (one for each of the 12 monospecies experiments and two for each of the 66 pairwise experiments; i.e., the model is a system of ordinary differential equations on ℝ^{144}). The computational characteristics of the estimation problem are summarized in Table 2 (labeled as P1).

The dynamic model is discretized using an implicit Euler scheme with five equally-spaced discretization points (monospecies experiments) and 120 equally-spaced discretization points (pairwise experiments). The sufficiency of discretization is verified by checking the sensitivity of the solution with respect to the change of discretization. The following scheme is used:

- Solve the problem to obtain the set of parameters *θ**.
- Increase the number of discretization points by a factor of two (i.e., reduce the time step by half), and re-solve the problem to obtain a refined set of parameters.
- Check whether *θ** and the refined parameters satisfy the following sensitivity criterion: (50)

We use *ϵ*_{abs} = *ϵ*_{rel} = 0.01. We have verified that the above-mentioned discretization scheme yields solutions that satisfy (50).
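
The step-halving check can be sketched on a toy problem. The Python example below is an illustration under stated assumptions: it uses a single-species gLV (logistic) model with a known interaction coefficient, fits the growth rate to one noise-free synthetic datum by bisection, and compares the estimates obtained with 50 and 100 implicit Euler steps. The tolerance test is written in a mixed absolute/relative form, which is an assumption about the criterion; `eps_abs` and `eps_rel` are hypothetical argument names.

```python
import math

def implicit_euler_logistic(x0, mu, alpha, t_end, n_steps):
    """Implicit Euler for dx/dt = x*(mu + alpha*x); each step is solved by Newton."""
    h = t_end / n_steps
    x = x0
    for _ in range(n_steps):
        z = x  # initial guess: previous state
        for _ in range(50):
            g = z - x - h * z * (mu + alpha * z)   # residual of the implicit step
            dg = 1.0 - h * (mu + 2.0 * alpha * z)  # derivative of the residual
            step = g / dg
            z -= step
            if abs(step) < 1e-14:
                break
        x = z
    return x

def fit_growth_rate(y_end, x0, alpha, t_end, n_steps, lo=0.5, hi=2.0):
    """Bisection on mu so that the discretized model reproduces the datum y_end."""
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if implicit_euler_logistic(x0, mid, alpha, t_end, n_steps) < y_end:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def within_tolerance(theta, theta_refined, eps_abs=0.01, eps_rel=0.01):
    """Assumed form of the check: |theta_i - refined_i| <= eps_abs + eps_rel*|theta_i|."""
    return all(abs(a - b) <= eps_abs + eps_rel * abs(a)
               for a, b in zip(theta, theta_refined))

# Synthetic datum from the exact logistic solution with mu = 1 and alpha = -1.
x0, alpha, t_end = 0.1, -1.0, 5.0
y_end = 1.0 / (1.0 + (1.0 / x0 - 1.0) * math.exp(-t_end))

mu_coarse = fit_growth_rate(y_end, x0, alpha, t_end, n_steps=50)
mu_fine = fit_growth_rate(y_end, x0, alpha, t_end, n_steps=100)
print(mu_coarse, mu_fine, within_tolerance([mu_coarse], [mu_fine]))
```

The toy fit uses repeated simulation only to keep the example short; in the NLP framework the discretized equations enter the problem as constraints and no separate simulation loop is needed.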

### Observability and regularization

Parameter observability was checked by solving the MAP formulation for P1 (which uses the available experimental data) and by checking the inertia of the augmented system at the solution (reported by Ipopt). Here, we omitted the prior regularization term *φ*(⋅). We found that parameters obtained from P1 are *not unique* (not observable from the available data). Moreover, we found that the estimated parameter values without regularization have unrealistic (non-physical) values, see Fig 3(a). This observation justifies the need to use prior information. We have used L1 and L2 priors with uniform standard deviations (other forms of prior can also be used as long as they can be reformulated as smooth functions). The results are presented in Fig 3(b) and 3(c). Unique parameter estimates were found when L1 or L2 priors were used. We also found that the L1 prior induces sparser solutions (many parameters are zero). For the remainder of the results, we use the formulation with an L2 prior.
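
The requirement that priors be expressible as smooth functions can be made concrete for the L1 case. The following is one standard reformulation, given as an assumption (the text does not state which reformulation was used); the prior means and the weights *λ*_{i} are hypothetical notation.

```latex
% Hypothetical notation: \bar\theta_i is the prior mean, \lambda_i > 0 a prior weight.
\begin{align*}
\text{L2 prior:}\quad & \varphi(\theta) = \sum_i \lambda_i\,(\theta_i - \bar\theta_i)^2
  \quad\text{(already smooth)},\\
\text{L1 prior:}\quad & \varphi(\theta) = \sum_i \lambda_i\,\lvert\theta_i - \bar\theta_i\rvert
  = \min_{t^+,\,t^- \ge 0}\ \sum_i \lambda_i\,(t_i^+ + t_i^-)
  \quad\text{s.t.}\quad \theta_i - \bar\theta_i = t_i^+ - t_i^- .
\end{align*}
```

The auxiliary variables turn the non-differentiable absolute value into a linear objective with linear constraints and bounds, which an interior-point solver handles directly.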

**(a)** Estimates using MAP formulation with no prior information. **(b)** Estimates using L1 prior. **(c)** Estimates using L2 prior. (a-c) The first row shows values for the growth rate parameters *μ*_{s} and the rest of the rows show values for the interaction parameters *α*_{ss′}. The species name corresponding to *s* and *s*′ are presented on the *x* and *y* axes. Recipient and donor species are on the *x* and *y*-axis, respectively.

### Model fitting

Model validation was performed by assessing the goodness of fit to the experimental data (Fig 4). We can see that the model is capable of fitting most of the data points, but there are a number of experiments where the model prediction deviates significantly from the experimental data (such experiments are highlighted with red boxes). Furthermore, we can observe outliers at single data points (highlighted with red circles). Poor fitting can be caused by either bad local minima (the optimization solver finds a locally optimal solution rather than the globally optimal solution) or by a structural model error (the model structure is incapable of capturing the actual behavior of the system). To avoid bad local minima, we solved the MAP problem with multiple starting points. Such an approach increases the probability of finding the global optimum, but obtaining a rigorous certificate of a global minimum is computationally challenging (rigorous global optimization techniques are currently not scalable to large problems). We found that the use of multiple starting points does not improve the model fit. Consequently, we attribute fitting errors to the model structure itself. In particular, the gLV model neglects various physical and biological phenomena such as lag phase or interaction coefficients that change as a function of time [66]. To investigate structural errors, we solved the MAP problem with a variant of the gLV model. In particular, we investigated the saturable gLV model [72, 73] (Model 2):
(Model 2)
where *K*_{ss′} > 0 are additional interaction parameters. The saturable model exhibits a much higher degree of nonlinearity than the gLV model and includes *S*^{2} more parameters (the number of degrees of freedom increases from *S*^{2} + *S* to 2*S*^{2} + *S*). As a result, the saturable gLV model provides more flexibility to improve model fitting. These results thus seek to illustrate that the proposed NLP framework can be used to handle challenging dynamic models. The model fitting obtained with the saturable gLV form is illustrated in Fig 5. As can be seen, significant improvements were made; in particular, the overall fitting error
(51)
was reduced by 30%. Increasing the number of degrees of freedom can cause overfitting, however, and this can make the model less predictive. Consequently, there is a trade-off between fit and predictability.
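
The growth in the number of degrees of freedom can be checked with a short Python sketch. The saturating interaction term used below, *α*_{ss′} *x*_{s′} / (*K*_{ss′} + *x*_{s′}), is an assumption about the functional form of (Model 2); only the parameter counts *S*^{2} + *S* and 2*S*^{2} + *S* are taken from the text. Function names and numerical values are illustrative.

```python
def saturable_glv_rhs(x, mu, alpha, K):
    """Assumed saturable gLV form: each interaction saturates as x/(K + x)."""
    n = len(x)
    return [x[s] * (mu[s] + sum(alpha[s][t] * x[t] / (K[s][t] + x[t]) for t in range(n)))
            for s in range(n)]

def n_parameters(n_species, saturable=False):
    """S growth rates plus S^2 interaction terms (and S^2 saturation constants)."""
    return (2 if saturable else 1) * n_species**2 + n_species

print(n_parameters(12), n_parameters(12, saturable=True))
print(saturable_glv_rhs([1.0], [1.0], [[-2.0]], [[1.0]]))
```

For the 12-species community the count rises from 156 to 300 parameters, which is the added flexibility (and the overfitting risk) discussed above.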

Subplots show the measured and predicted species abundance in the microbial community. The subplots on the diagonal show fitting for mono-species experiments. Subplots on the *i*-th row and *j*-th column show fitting for the corresponding pairwise culture (the abundance of species *i* in the presence of species *j*). Recipient and donor species are listed in rows and columns, respectively. For each subplot, the *y*-axis represents the absolute abundance in the community based on relative abundance multiplied by total biomass (OD600) and the *x*-axis represents the experiment time in hours. Data points are denoted by grey dots and dynamic model trajectories are denoted by solid lines. The data points highlighted with red circles are data points corresponding to the ten largest errors. The subplots highlighted with red boxes are subplots for the experiments with the ten largest total prediction errors.

Model fitting for 4 experiments selected among the experiments with 10 largest total prediction errors. The gLV model fits (dotted line) are compared with those of the saturable model (solid line). **(a)** Model fits to monospecies experiment with CH. **(b)** Model fits to monospecies experiment with CA. **(c)** Model fits to pairwise community experiment with PC in the presence of EL. **(d)** Model fits for pairwise community experiment with CA in the presence of ER.

To mitigate outliers, we solved the MAP problem with a CVaR norm and *β* = 0.9 (to penalize the 10% largest errors). The relevant results are summarized in Fig 6. It can be observed that the fitting errors for the outliers obtained with the standard MAP formulation are reduced. The effect of the CVaR formulation is also evident when analyzing the prediction error histogram (see Fig 7). In particular, we observe that the tail of high prediction errors becomes less pronounced under the CVaR formulation. Specifically, the mean of the 10% largest errors decreases by 18% (from 167.81 to 137.04). On the other hand, it can also be observed that the mean error increases under CVaR and that the tail of small errors shrinks. This illustrates the fundamental trade-off that usually arises in robust statistics. The behavior induced with CVaR aids estimator performance because it prevents overfitting experimental data sets.
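
The tail statistic reported here (the mean of the 10% largest errors) can be computed with a short Python helper. This is a sketch of the empirical quantity only, not of the CVaR formulation used inside the estimation problem; the function name and the sample errors are illustrative.

```python
import math

def tail_mean(errors, beta=0.9):
    """Mean of the largest (1 - beta) fraction of the errors (empirical CVaR)."""
    # round() guards against floating-point noise such as (1 - 0.7)*10 = 3.0000000000000004
    n_tail = max(1, math.ceil(round((1.0 - beta) * len(errors), 9)))
    worst = sorted(errors, reverse=True)[:n_tail]
    return sum(worst) / n_tail

errors = [float(v) for v in range(1, 11)]  # hypothetical prediction errors
print(tail_mean(errors, beta=0.9), tail_mean(errors, beta=0.8), sum(errors) / len(errors))

# Relative reduction of the tail mean quoted in the text (167.81 -> 137.04).
print(round(100 * (167.81 - 137.04) / 167.81))
```

Penalizing this tail mean instead of the plain mean shifts the fit toward the worst-fitting data points, which is the trade-off visible in the histograms.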

Model fits for 4 experiments selected among the experiments with 10 largest prediction errors . The model fits from standard MAP formulation (dotted line) are compared with the model fits from CVaR formulation with *β* = 0.9 (solid line). **(a)** gLV model fits to pairwise community experiment of PC in the presence of EL. **(b)** gLV model fits to pairwise community experiment of CA in the presence of ER. **(c)** gLV model fits to pairwise community experiment with CH in the presence of PC. **(d)** gLV model fits to pairwise community experiment with CH in the presence of BO.

**(a)** Error histogram for the standard MAP formulation. **(b)** Tail region of (a). **(c)** Error histogram for the CVaR formulation. **(d)** Tail region of (c). (a-d) The *x*-axis represents the value of the prediction error evaluated at the solution and the *y*-axis represents the frequency. The red and blue lines represent quantiles: the overall mean of prediction errors (red) and the mean of the largest 10% errors (blue).

### Inference (posterior) analysis

We used rMAP to assess the uncertainty of the 156 parameters estimated from P1 using the available experimental data. To do so, we draw data perturbations as *η*_{k,s}(*t*) ← *η*_{k,s}(*t*) + *ϵ*_{k,s}(*t*) with . We solved 500 MAP problems to obtain parameter samples and used these to approximate the covariance matrix of the posterior. The standard deviations are shown in Fig 8(a) and the marginals for the posterior are shown in Fig 9. A large standard deviation indicates that the estimated parameter value is not reliable. We note that about half of the parameters can be estimated reliably while the other half exhibit significant uncertainty. This indicates that more experimental data should be obtained. From the sample covariance, we generated 95% ellipsoidal confidence regions for each pair of parameters. The correlation plots of *μ*_{s} against *α*_{ss′} for are shown in Fig 10 and the Pearson correlation coefficients are shown in Fig 8(b). In an ideal case, the parameters should be uncorrelated because data should be sufficient to estimate each parameter reliably. Using our data set, however, we can observe strong correlations between the parameters *μ*_{s} and *α*_{ss} in Fig 10, and strong positive and negative correlations can also be found in Fig 8(b).
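
The sampling loop can be sketched on a linear-Gaussian toy problem, for which the MAP estimate and the posterior variance are available in closed form. Everything in the Python example below is hypothetical (a scalar parameter, three data points, and the variable names); it also perturbs the prior mean along with the data, which follows the stacked notation in which the prior mean is treated as an additional observation. In the actual framework each sample requires the solution of an NLP rather than a closed-form expression.

```python
import random
import statistics

def map_estimate(a, eta, sigma, theta_bar, sigma_theta):
    """Closed-form MAP estimate and posterior variance for eta_k = a_k*theta + noise."""
    precision = sum(ak * ak for ak in a) / sigma**2 + 1.0 / sigma_theta**2
    rhs = sum(ak * ek for ak, ek in zip(a, eta)) / sigma**2 + theta_bar / sigma_theta**2
    return rhs / precision, 1.0 / precision

def rmap_samples(a, eta, sigma, theta_bar, sigma_theta, n_samples, seed=0):
    """Re-solve the MAP problem with randomly perturbed data and prior mean."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n_samples):  # the solves are independent and could run in parallel
        eta_pert = [ek + rng.gauss(0.0, sigma) for ek in eta]
        prior_pert = theta_bar + rng.gauss(0.0, sigma_theta)
        theta, _ = map_estimate(a, eta_pert, sigma, prior_pert, sigma_theta)
        samples.append(theta)
    return samples

a, eta = [1.0, 2.0, 3.0], [1.1, 1.9, 3.2]   # hypothetical model coefficients and data
sigma, theta_bar, sigma_theta = 0.5, 0.0, 1.0

theta_map, post_var = map_estimate(a, eta, sigma, theta_bar, sigma_theta)
samples = rmap_samples(a, eta, sigma, theta_bar, sigma_theta, n_samples=5000)
print(theta_map, post_var, statistics.mean(samples), statistics.variance(samples))
```

For this linear model the sample mean and variance approach the exact posterior mean and variance as the number of samples grows; for nonlinear models the agreement is only approximate.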

**(a)** Heat map presents the standard deviation for the parameter posterior density. The first row shows the standard deviations of the growth rate parameters *μ*_{s} and the rest of the rows show the standard deviations of the interaction coefficients *α*_{ss′}. Recipient and donor species are on the *x* and *y*-axis, respectively. The data points highlighted with green circles are data points corresponding to the ten largest standard deviations. **(b)** The heat map represents the Pearson correlation coefficients of the posterior distributions. The *x*-axis and the *y*-axis represent the index of parameters where the parameter vector is constructed as *θ* = (*μ*_{1}, *α*_{11}⋯*α*_{1S}, ⋯, *μ*_{S}, *α*_{S1}, ⋯ *α*_{SS}). The block on the *i*-th row and *j*-th column is the Pearson correlation between the *i*-th and *j*-th component of parameter vector *θ*.

Each subplot shows the histogram of the samples from the approximate parameter posterior. The *x*-axis represents the values of the estimated parameters and the *y*-axis represents the frequencies. The subplots on the first column show the distribution of the growth rates *μ*_{s} and the rest of the subplots show distributions of the interaction parameters *α*_{ss′}. Recipient and donor species are listed in rows and columns, respectively. The *x*-axis is scaled to show *μ* ± 3*σ* where the *μ* is the mean and the *σ* is the standard deviation of the posterior distribution.

Each subplot shows the 95% confidence regions (solid ellipses) of the approximate parameter posterior distributions and the sample points (dots). The subplots on the *s*-th row and *s*′-th column show the correlation of *μ*_{s} and *α*_{ss′}. Recipient and donor species are listed in rows and columns, respectively. Only a representative subset of parameter pairs is presented (there are a total of 12,090 pairs).

Furthermore, since the whole approximate distribution is obtained in the inference analysis based on the rMAP framework, we can perform more sophisticated analyses of the characteristics of the distribution. In particular, one can investigate third and fourth moments (Fig 11) to examine the skewness and the kurtosis of the distribution. Such information can be used to investigate the deviation of the posterior distribution from the normal distribution. If the posteriors are strictly normally distributed, the third and fourth central moments should be zero and 3*σ*^{4}, respectively. However, we can observe that many posterior distributions deviate from such expectations. Thus, we can see that some of the distributions are not close to the normal distribution.
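
The normalized moments can be computed directly from the samples. The Python sketch below is illustrative (the function name and the test data are not from the study); it returns the third central moment divided by *σ*^{3} and the fourth central moment divided by *σ*^{4}, which equal 0 and 3 for a normal distribution.

```python
import random

def standardized_moments(samples):
    """Return (third moment / sigma^3, fourth moment / sigma^4) about the sample mean."""
    n = len(samples)
    mean = sum(samples) / n
    m2 = sum((v - mean) ** 2 for v in samples) / n
    m3 = sum((v - mean) ** 3 for v in samples) / n
    m4 = sum((v - mean) ** 4 for v in samples) / n
    return m3 / m2**1.5, m4 / m2**2

rng = random.Random(0)
normal_samples = [rng.gauss(0.0, 2.0) for _ in range(20000)]
skewed_samples = [v * v for v in normal_samples]  # a clearly non-normal sample

print(standardized_moments(normal_samples))
print(standardized_moments(skewed_samples))
```

Values far from (0, 3) indicate that a Gaussian (ellipsoidal) description of the corresponding posterior marginal is inadequate.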

**(a)** Heat map presents the third moment of the parameter posterior density (normalized by *σ*^{3}). **(b)** The heat map represents the fourth moment of the posterior density (normalized by *σ*^{4}). (a-b) The first row shows the moments for the growth rate parameters *μ*_{s} and the rest of the rows show the moments for the interaction coefficients *α*_{ss′}. Recipient and donor species are on the *x* and *y*-axis, respectively.

### Computational scalability

We assessed the computational scalability of the estimation framework by analyzing problems with different sizes and characteristics. Problem P1 was implemented in the algebraic modeling platform JuMP and solved with the NLP solver Ipopt configured with the sparse linear solver MA57. Problem P1 with the gLV model and L2 prior was solved in 134 seconds and 78 NLP iterations on a standard computing server with an Intel(R) Xeon(R) CPU E5-2698 v3 processor running at 2.30GHz. Problem P1 with the gLV model and L1 prior was solved in 219 seconds and 68 NLP iterations with the same hardware. A comparable problem requires over 7 hours to solve using a simulation-based approach implemented in Matlab that uses finite differences to obtain first derivatives [66]. Despite the significant gains in computational performance obtained with Ipopt, its solution time scales nearly quadratically with the number of data sets. To overcome this scalability issue, we compared the performance of the serial solver Ipopt against that of the parallel solver PIPS-NLP (which uses a Schur complement decomposition to perform linear algebra operations). To test the scalability of PIPS-NLP, we generated a larger version of the estimation problem (labeled as P2). This problem is created by adding synthetic data sets. The NLP corresponding to P2 has over one million variables and constraints (but the number of parameters is the same as that of P1). This problem was implemented in Plasmo.jl. The benefit of using a parallel approach is clearly seen in Fig 12. Here, we highlight that PIPS-NLP solved P2 in less than 10 minutes and 94 NLP iterations using 16 cores while Ipopt requires around 30 minutes and 67 iterations. Furthermore, Ipopt found a different local solution and the solution from PIPS-NLP had a better objective value. Fig 12(b) also shows that PIPS-NLP achieves nearly perfect strong scaling (speedup increases linearly with the number of cores).

**(a)** Solution time for P2 using Ipopt and PIPS-NLP. The *y*-axis shows the solution time and the *x*-axis shows the number of cores used. For Ipopt the single core solution time is given by the horizontal blue line. **(b)** The *y*-axis represents the speed-up (the single-core solution time divided by the multi-core solution time). The blue line is the single-core solution time of PIPS-NLP divided by the single-core solution time of Ipopt. The grey dashed line represents the strong scaling line.

In rMAP-based uncertainty quantification, the main computational challenge is the repeated solution of the optimization problems. However, this challenge can be overcome by reusing existing solution information. The number of iterations required by the NLP solver can be greatly reduced when a good starting point (initial guess of the solution) is available (often referred to as a *warm start*). Since only small modifications are made to the original problem to formulate the rMAP problem, the NLP solution of the rMAP problem is very similar to that of the original problem. Thus, by warm-starting the NLP with the original NLP solution, the computational effort to solve the rMAP problem can be significantly reduced. In particular, most rMAP sampling problems were solved in less than 10 NLP iterations, while the original problem required 78 NLP iterations.
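
The effect of a good initial guess on the iteration count can be illustrated with a scalar Newton iteration. The Python sketch below is a toy analogue and not a model of the interior-point solver (whose warm start also involves multipliers and the barrier parameter); the residual function, the data values, and the tolerance are hypothetical.

```python
def newton(residual, derivative, z0, tol=1e-10, max_iter=100):
    """Plain Newton iteration; returns the root and the number of iterations used."""
    z = z0
    for it in range(1, max_iter + 1):
        step = residual(z) / derivative(z)
        z -= step
        if abs(step) < tol:
            return z, it
    raise RuntimeError("Newton did not converge")

def solve_for(d, z0):
    """Solve z^3 + z = d, a stand-in for the optimality conditions with data d."""
    return newton(lambda z: z**3 + z - d, lambda z: 3.0 * z**2 + 1.0, z0)

z_nominal, cold_iters = solve_for(10.0, z0=0.0)          # original problem, cold start
z_pert, warm_iters = solve_for(10.1, z0=z_nominal)       # perturbed data, warm start
_, cold_iters_pert = solve_for(10.1, z0=0.0)             # perturbed data, cold start
print(cold_iters, warm_iters, cold_iters_pert)
```

Because the perturbed problem has a solution close to the nominal one, the warm-started iteration begins inside the region of fast local convergence.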

We also assessed computational capability in estimation problems with a larger number of species in the microbial community (which increases the number of differential equations and parameters). Here, we generated synthetic data using simulations for larger communities. The generated data are summarized in Table 2. The numbers of parameters and data points scale nearly quadratically with the size of the community. The computation times are shown in Fig 13. The results indicate that, by using PIPS-NLP, one can solve estimation problems with up to 48 species in *less than 15 minutes and 40 NLP iterations* (using 12 parallel computing cores). We highlight that, to the best of our knowledge, problem S4 is the largest estimation problem reported in the computational biology literature. This problem contains 2,304 differential equations, 2,352 parameters, and 20,352 data points. The corresponding NLP contains 1.3 million variables and constraints.
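
The quadratic growth of the problem size can be verified with a few lines of Python. The sketch assumes that the synthetic instances follow the same experimental design as P1 (one monospecies experiment per species and one experiment per species pair); this assumption is consistent with the counts quoted for P1 and S4, and the function name is illustrative.

```python
from math import comb

def problem_size(n_species):
    """Return (parameters, experiments, differential equations) for a community."""
    n_params = n_species + n_species**2              # growth rates + interactions
    n_experiments = n_species + comb(n_species, 2)   # monospecies + pairwise
    n_odes = n_species * 1 + comb(n_species, 2) * 2  # one or two states per experiment
    return n_params, n_experiments, n_odes

for n_species in (12, 24, 48):
    print(n_species, problem_size(n_species))
```

For 12 species this gives 156 parameters, 78 experiments, and 144 differential equations; for 48 species it gives 2,352 parameters and 2,304 differential equations.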

**(a)** Number of variables against community size (total number of species). **(b)** The computation times for problems S1-S4 (see Table 2). The problems were solved with PIPS-NLP on 12 parallel cores (Intel(R) Xeon(R) CPU E5-2698 v3 processor running at 2.30GHz).

## Discussion

The high computational efficiency achieved with the proposed framework can enable kinetic modeling of complex biological systems ranging from biomolecular networks to high-dimensional microbial communities [74]. Indeed, the proposed framework can be used to construct and analyze high-fidelity models of whole-cells or microbiomes [75, 76]. In particular, these methods can be applied to develop predictive dynamic models of multi-gene synthetic circuits interacting with host-cell processes for accurately predicting cell growth and synthetic circuit activity [77] or kinetic models of metabolite transformations driving community dynamics. The proposed framework can also be used to handle more sophisticated models that arise in biological systems such as delay differential equations, differential and algebraic equations, and partial differential equations. Dynamic biological models with embedded metabolic flux formulations (giving rise to non-smooth behavior) can be handled with the proposed framework by using reformulations [78]. Estimation problems for stochastic differential equations (such as stochastic chemical kinetics models) cannot be handled with the proposed framework and remain a challenging class of problems [79].

The methods provided in this work will advance our capability of integrating mechanistic modeling frameworks with large-scale experimental data. Furthermore, uncertainty quantification and observability analysis can provide valuable information to guide and accelerate experimental data collection. These capabilities are also essential in diagnosing structural model errors. The proposed framework uses state-of-the-art and easy-to-use modeling and solution tools that can be broadly applied to diverse biological systems and are accessible to a wide range of users. In the future, the proposed framework can be interfaced with systems biology markup languages such as SBML [80] and CellML [81] to allow broader applicability. Together, these advances can ultimately transform biology into a predictive and model-guided discipline.

## References

- 1. Venturelli OS, Zuleta I, Murray RM, El-Samad H. Population diversification in a yeast metabolic program promotes anticipation of environmental shifts. PLoS biology. 2015;13(1):e1002042. pmid:25626086
- 2. Friedman J, Gore J. Ecological systems biology: The dynamics of interacting populations. Current Opinion in Systems Biology. 2017;1:114–121.
- 3. Venayak N, Anesiadis N, Cluett WR, Mahadevan R. Engineering metabolism through dynamic control. Current opinion in biotechnology. 2015;34:142–152. pmid:25616051
- 4. Ashyraliyev M, Fomekong-Nanfack Y, Kaandorp JA, Blom JG. Systems biology: Parameter estimation for biochemical models. FEBS Journal. 2009;276(4):886–902. pmid:19215296
- 5. Sun J, Garibaldi JM, Hodgman C. Parameter estimation using metaheuristics in systems biology: A comprehensive review. IEEE/ACM Transactions on Computational Biology and Bioinformatics. 2012;9(1):185–202. pmid:21464505
- 6. Raue A, Schilling M, Bachmann J, Matteson A, Schelke M, Kaschek D, et al. Lessons learned from quantitative dynamical modeling in systems biology. PloS one. 2013;8(9):e74335. pmid:24098642
- 7. Fröhlich F, Kaltenbacher B, Theis FJ, Hasenauer J. Scalable parameter estimation for genome-scale biochemical reaction networks. PLoS computational biology. 2017;13(1):e1005331. pmid:28114351
- 8. Leppavuori JT, Domach MM, Biegler LT. Parameter estimation in batch bioreactor simulation using metabolic models: Sequential solution with direct sensitivities. Industrial & Engineering Chemistry Research. 2011;50(21):12080–12091.
- 9. Mendes P, Kell D. Non-linear optimization of biochemical pathways: applications to metabolic engineering and parameter estimation. Bioinformatics (Oxford, England). 1998;14(10):869–883.
- 10. Moles CG, Mendes P, Banga JR. Parameter estimation in biochemical pathways: a comparison of global optimization methods. Genome research. 2003;13(11):2467–2474. pmid:14559783
- 11. Kirkpatrick S, Gelatt CD, Vecchi MP, et al. Optimization by simulated annealing. Science. 1983;220(4598):671–680. pmid:17813860
- 12. Chen WW, Schoeberl B, Jasper PJ, Niepel M, Nielsen UB, Lauffenburger DA, et al. Input–output behavior of ErbB signaling pathways as revealed by a mass action model trained against dynamic data. Molecular systems biology. 2009;5(1):239. pmid:19156131
- 13. Kikuchi S, Tominaga D, Arita M, Takahashi K, Tomita M. Dynamic modeling of genetic networks using genetic algorithm and S-system. Bioinformatics. 2003;19(5):643–650. pmid:12651723
- 14. Tominaga D, Koga N, Okamoto M. Efficient numerical optimization algorithm based on genetic algorithm for inverse problem. In: Proceedings of the 2nd Annual Conference on Genetic and Evolutionary Computation. Morgan Kaufmann Publishers Inc.; 2000. p. 251–258.
- 15. Yang XS. Nature-inspired metaheuristic algorithms. Luniver press; 2010.
- 16. Toni T, Welch D, Strelkowa N, Ipsen A, Stumpf MP. Approximate Bayesian computation scheme for parameter inference and model selection in dynamical systems. Journal of the Royal Society Interface. 2009;6(31):187–202.
- 17. Toni T, Stumpf MP. Simulation-based model selection for dynamical systems in systems and population biology. Bioinformatics. 2009;26(1):104–110. pmid:19880371
- 18. Balsa-Canto E, Peifer M, Banga JR, Timmer J, Fleck C. Hybrid optimization method with general switching strategy for parameter estimation. BMC systems biology. 2008;2(1):26. pmid:18366722
- 19. Vaz AIF, Vicente LN. A particle swarm pattern search method for bound constrained global optimization. Journal of Global Optimization. 2007;39(2):197–219.
- 20. Zavala VM, Biegler LT. Optimization-based strategies for the operation of low-density polyethylene tubular reactors: Moving horizon estimation. Computers & Chemical Engineering. 2009;33(1):379–390.
- 21. López-Negrete R, Biegler LT. A moving horizon estimator for processes with multi-rate measurements: A nonlinear programming sensitivity approach. Journal of Process Control. 2012;22(4):677–688.
- 22. Lillacci G, Khammash M. Parameter estimation and model selection in computational biology. PLoS computational biology. 2010;6(3):e1000696. pmid:20221262
- 23. Biegler LT. Nonlinear programming: concepts, algorithms, and applications to chemical processes. vol. 10. SIAM; 2010.
- 24. Zavala VM. Computational strategies for the optimal operation of large-scale chemical processes. Carnegie Mellon University; 2008.
- 25. Albuquerque JS, Biegler LT. Decomposition algorithms for on-line estimation with nonlinear DAE models. Computers & chemical engineering. 1997;21(3):283–299.
- 26. Leibman M, Edgar T, Lasdon L. Efficient data reconciliation and estimation for dynamic processes using nonlinear programming techniques. Computers & chemical engineering. 1992;16(10-11):963–986.
- 27. Tjoa IB, Biegler LT. Simultaneous solution and optimization strategies for parameter estimation of differential-algebraic equation systems. Industrial & Engineering Chemistry Research. 1991;30(2):376–385.
- 28. Betts JT. Optimal interplanetary orbit transfers by direct transcription. Journal of the Astronautical Sciences. 1994;42(3):247–268.
- 29. Betts JT, Cramer EJ. Application of direct transcription to commercial aircraft trajectory optimization. Journal of Guidance, Control, and Dynamics. 1995;18(1):151–159.
- 30. Bottasso CL, Croce A. Optimal control of multibody systems using an energy preserving direct transcription method. Multibody System Dynamics. 2004;12(1):17–45.
- 31. Biegler LT. An overview of simultaneous strategies for dynamic optimization. Chemical Engineering and Processing: Process Intensification. 2007;46(11):1043–1053.
- 32. Pirnay H, López-Negrete R, Biegler LT. Optimal sensitivity based on IPOPT. Mathematical Programming Computation. 2012;4(4):307–331.
- 33. Chib S, Greenberg E. Understanding the Metropolis-Hastings algorithm. The American Statistician. 1995;49(4):327–335.
- 34. Gamerman D, Lopes HF. Markov chain Monte Carlo: stochastic simulation for Bayesian inference. Chapman and Hall/CRC; 2006.
- 35. Petra N, Martin J, Stadler G, Ghattas O. A computational framework for infinite-dimensional Bayesian inverse problems, Part II: Stochastic Newton MCMC with application to ice sheet flow inverse problems. SIAM Journal on Scientific Computing. 2014;36(4):A1525–A1555.
- 36. Wang K, Bui-Thanh T, Ghattas O. A randomized maximum a posteriori method for posterior sampling of high dimensional nonlinear Bayesian inverse problems. SIAM Journal on Scientific Computing. 2018;40(1):A142–A171.
- 37. Bardsley JM, Solonen A, Haario H, Laine M. Randomize-then-optimize: a method for sampling from posterior distributions in nonlinear inverse problems. SIAM Journal on Scientific Computing. 2014;36(4):A1895–A1910.
- 38. Oliver DS. Metropolized Randomized Maximum Likelihood for sampling from multimodal distributions. arXiv preprint arXiv:1507.08563. 2015.
- 39. Oliver DS, He N, Reynolds AC. Conditioning permeability fields to pressure data. In: ECMOR V-5th European Conference on the Mathematics of Oil Recovery; 1996.
- 40. Rockafellar RT, Uryasev S. Optimization of conditional value-at-risk. Journal of risk. 2000;2:21–42.
- 41. Rockafellar RT, Uryasev S. Conditional value-at-risk for general loss distributions. Journal of banking & finance. 2002;26(7):1443–1471.
- 42. Boyd S, Vandenberghe L. Convex optimization. Cambridge university press; 2004.
- 43. Tikhonov A. Numerical methods for the solution of ill-posed problems.
- 44. Golub GH, Van Loan CF. Matrix computations. vol. 3. JHU Press; 2012.
- 45. Tibshirani R. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society Series B (Methodological). 1996; p. 267–288.
- 46. Pavlikov K, Uryasev S. CVaR norm and applications in optimization. Optimization Letters. 2014;8(7):1999–2020.
- 47. Zavala VM, Laird CD, Biegler LT. Interior-point decomposition approaches for parallel solution of large-scale nonlinear parameter estimation problems. Chemical Engineering Science. 2008;63(19):4834–4845.
- 48. Wächter A, Biegler LT. On the implementation of an interior-point filter line-search algorithm for large-scale nonlinear programming. Mathematical Programming. 2006;106(1):25–57.
- 49. Byrd RH, Nocedal J, Waltz RA. KNITRO: An integrated package for nonlinear optimization. In: Large-scale nonlinear optimization. Springer; 2006. p. 35–59.
- 50. Zavala VM, Biegler LT. Nonlinear programming strategies for state estimation and model predictive control. In: Nonlinear model predictive control. Springer; 2009. p. 419–432.
- 51. Bard Y. Nonlinear parameter estimation. 1974.
- 52. Lopez C D C, Wozny G, Flores-Tlacuahuac A, Vasquez-Medrano R, Zavala VM. A Computational Framework for Identifiability and Ill-Conditioning Analysis of Lithium-Ion Battery Models. Industrial & Engineering Chemistry Research. 2016;55(11):3026–3042.
- 53. McLean KA, McAuley KB. Mathematical modelling of chemical processes–obtaining the best model predictions and parameter estimates using identifiability and estimability procedures. The Canadian Journal of Chemical Engineering. 2012;90(2):351–366.
- 54. Kang J, Chiang N, Laird CD, Zavala VM. Nonlinear programming strategies on high-performance computers. In: Decision and Control (CDC), 2015 IEEE 54th Annual Conference on. IEEE; 2015. p. 4612–4620.
- 55. Cao Y, Laird CD, Zavala VM. Clustering-based preconditioning for stochastic programs. Computational optimization and applications. 2016;64(2):379–406.
- 56. Wan W, Eason JP, Nicholson B, Biegler LT. Parallel cyclic reduction decomposition for dynamic optimization problems. Computers & Chemical Engineering. 2019;120:54–69.
- 57. Emerick AA, Reynolds AC. Investigation of the sampling performance of ensemble-based methods with a simple reservoir model. Computational Geosciences. 2013;17(2):325–350.
- 58. Gao G, Zafari M, Reynolds AC, et al. Quantifying uncertainty for the PUNQ-S3 problem in a Bayesian setting with RML and EnKF. In: SPE reservoir simulation symposium. Society of Petroleum Engineers; 2005.
- 59. Dunning I, Huchette J, Lubin M. JuMP: A modeling language for mathematical optimization. SIAM Review. 2017;59(2):295–320.
- 60. Jalving J, Abhyankar S, Kim K, Hereld M, Zavala VM. A graph-based computational framework for simulation and optimisation of coupled infrastructure networks. IET Generation, Transmission & Distribution. 2017;11(12):3163–3176.
- 61. Hart WE, Laird CD, Watson JP, Woodruff DL, Hackebeil GA, Nicholson BL, et al. Pyomo–optimization modeling in python. vol. 67. 2nd ed. Springer Science & Business Media; 2017.
- 62. Hart WE, Watson JP, Woodruff DL. Pyomo: modeling and solving mathematical programs in Python. Mathematical Programming Computation. 2011;3(3):219–260.
- 63. Huttenhower C, Gevers D, Knight R, Abubucker S, Badger JH, Chinwalla AT, et al. Structure, function and diversity of the healthy human microbiome. Nature. 2012;486(7402):207.
- 64. Tropini C, Earle KA, Huang KC, Sonnenburg JL. The Gut microbiome: connecting spatial organization to function. Cell host & microbe. 2017;21(4):433–442.
- 65. Earle KA, Billings G, Sigal M, Lichtman JS, Hansson GC, Elias JE, et al. Quantitative imaging of gut microbiota spatial organization. Cell host & microbe. 2015;18(4):478–488.
- 66. Venturelli OS, Carr AC, Fisher G, Hsu RH, Lau R, Bowen BP, et al. Deciphering microbial interactions in synthetic human gut microbiome communities. Molecular Systems Biology. 2018;14(6). pmid:29930200
- 67. Lotka AJ. Elements of physical biology. Science Progress in the Twentieth Century (1919-1933). 1926;21(82):341–343.
- 68. Volterra V. Variations and fluctuations of the number of individuals in animal species living together. ICES Journal of Marine Science. 1928;3(1):3–51.
- 69. Stein RR, Bucci V, Toussaint NC, Buffie CG, Rätsch G, Pamer EG, et al. Ecological modeling from time-series inference: insight into dynamics and stability of intestinal microbiota. PLoS computational biology. 2013;9(12):e1003388. pmid:24348232
- 70. Mounier J, Monnet C, Vallaeys T, Arditi R, Sarthou AS, Hélias A, et al. Microbial interactions within a cheese microbial community. Applied and environmental microbiology. 2008;74(1):172–181. pmid:17981942
- 71. Widder S, Allen RJ, Pfeiffer T, Curtis TP, Wiuf C, Sloan WT, et al. Challenges in microbial ecology: building predictive understanding of community function and dynamics. The ISME journal. 2016;10(11):2557. pmid:27022995
- 72. Momeni B, Xie L, Shou W. Lotka-Volterra pairwise modeling fails to capture diverse pairwise microbial interactions. Elife. 2017;6. pmid:28350295
- 73. Thébault E, Fontaine C. Stability of ecological communities and the architecture of mutualistic and trophic networks. Science. 2010;329(5993):853–856. pmid:20705861
- 74. Stanford NJ, Lubitz T, Smallbone K, Klipp E, Mendes P, Liebermeister W. Systematic construction of kinetic models from genome-scale metabolic networks. PloS one. 2013;8(11):e79195. pmid:24324546
- 75. Karr JR, Sanghvi JC, Macklin DN, Gutschow MV, Jacobs JM, Bolival B Jr, et al. A whole-cell computational model predicts phenotype from genotype. Cell. 2012;150(2):389–401. pmid:22817898
- 76. Macklin DN, Ruggero NA, Covert MW. The future of whole-cell modeling. Current opinion in biotechnology. 2014;28:111–115. pmid:24556244
- 77. Weiße AY, Oyarzún DA, Danos V, Swain PS. Mechanistic links between cellular trade-offs, gene expression, and growth. Proceedings of the National Academy of Sciences. 2015; p. 201416533.
- 78. Raghunathan AU, Pérez-Correa JR, Agosin E, Biegler LT. Parameter estimation in metabolic flux balance models for batch fermentation–Formulation & Solution using Differential Variational Inequalities (DVIs). Annals of Operations Research. 2006;148(1):251–270.
- 79. Srivastava R, Anderson DF, Rawlings JB. Comparison of finite difference based methods to obtain sensitivities of stochastic chemical kinetic models. The Journal of chemical physics. 2013;138(7):074110. pmid:23445000
- 80. Hucka M, Finney A, Sauro HM, Bolouri H, Doyle JC, Kitano H, et al. The systems biology markup language (SBML): a medium for representation and exchange of biochemical network models. Bioinformatics. 2003;19(4):524–531. pmid:12611808
- 81. Lloyd CM, Halstead MD, Nielsen PF. CellML: its future, present and past. Progress in biophysics and molecular biology. 2004;85(2-3):433–450. pmid:15142756