Bayesian parameter estimation for dynamical models in systems biology

Dynamical systems modeling, particularly via systems of ordinary differential equations, has been used to effectively capture the temporal behavior of different biochemical components in signal transduction networks. Despite the recent advances in experimental measurements, including sensor development and ‘-omics’ studies that have helped populate protein-protein interaction networks in great detail, modeling in systems biology lacks systematic methods to estimate kinetic parameters and quantify associated uncertainties. This is because of multiple reasons, including sparse and noisy experimental measurements, lack of detailed molecular mechanisms underlying the reactions, and missing biochemical interactions. Additionally, the inherent nonlinearities with respect to the states and parameters associated with the system of differential equations further compound the challenges of parameter estimation. In this study, we propose a comprehensive framework for Bayesian parameter estimation and complete quantification of the effects of uncertainties in the data and models. We apply these methods to a series of signaling models of increasing mathematical complexity. Systematic analysis of these dynamical systems showed that parameter estimation depends on data sparsity, noise level, and model structure, including the existence of multiple steady states. These results highlight how focused uncertainty quantification can enrich systems biology modeling and enable additional quantitative analyses for parameter estimation.


Introduction
Mathematical modeling is an integral part of systems biology; indeed, the use of approaches from dynamical systems analyses resulted in a paradigmatic shift in our understanding of biochemical signal transduction and enabled the identification of the emergent properties of a signaling network [1][2][3][4]. Additionally, mathematical models allow us to investigate the dynamics of biological systems beyond what is experimentally possible [5][6][7][8]. A classical approach to modeling the dynamics of signal transduction is the use of systems of ordinary differential equations (ODEs) [9,10]. Often these equations include nonlinear functions to capture complex biochemical interactions using Michaelis-Menten kinetics and Hill functions for cooperative binding [11]. One of the ongoing challenges in developing and constraining predictive models of signal transduction has been the estimation and identification of the kinetic parameters associated with these reactions and quantifying the associated uncertainty [12][13][14]. The use of rigorous, quantitative approaches to estimate kinetic parameters and their uncertainties is in its early stages in systems biology [12,[15][16][17][18] even though such methods are far more prevalent in the greater computational science community under the field of uncertainty quantification (UQ) [19][20][21][22].
There are many sources of uncertainty in dynamical systems modeling of signal transduction, including the model structure itself, the values of model parameters, and the quality of parameters. Gutenkunst et al. [28] found that most models in systems biology contain parameters with a wide range of sensitivities, which they termed 'sloppy'. Despite this challenge, sensitivity analysis helps rank the set of identifiable parameters by their contributions to specified model outputs, [39] as was done in Mortlock et al. [40] for the prolactin-mediated JAK-STAT signaling pathway. This analysis enabled them to select a subset of 33 out of 60 total parameters that significantly contribute to variations in the model outputs.
Commonly used methods to estimate parameters for systems biology models include frequentist [17] and Bayesian approaches [18]. In the frequentist setting, parameter estimation is formulated as an optimization problem, and the solution to the parameter estimation problem is the set of parameters that best recapitulates the data [19]. Additionally, frequentist approaches quantify uncertainty via estimated confidence intervals around the optimal parameters [19,41,42]. In contrast, in the Bayesian perspective, parameters are assumed to be random variables whose unknown probability distributions, called posterior distributions, quantify the probability of assuming any value in the parameter space [19,21,27]. The advantage of Bayesian approaches comes from their ability to characterize the entire posterior distribution and quantify the uncertainty in parameter estimates via credible intervals [27]. Many methods have been developed for Bayesian parameter estimation [18,[43][44][45][46][47][48][49] that all aim to characterize the posterior distribution, by leveraging Bayes' rule [19,27]. For example, Mortlock et al. [40] successfully used Bayesian estimation to study the uncertainty in the model predictions and assess the statistical significance of their modeling results.
Despite the successes of Bayesian parameter estimation in systems biology [12,15,16,18], failure to account for all sources of uncertainty in a model can significantly inhibit parameter estimation and uncertainty quantification [16,26,28]. Thus, a comprehensive framework for UQ in systems biology should include rigorous accounting of uncertainties in the model structure, nonidentifiable parameters, mixed parameter sensitivities, and noisy, sparse, or incomplete experimental data. While identifiability and sensitivity analyses are typically performed prior to parameter estimation [16,29], accounting for model form uncertainty requires us to consider a stochastic model instead of a deterministic one [23,26,30,50]. One promising approach to account for model form uncertainty is the Unscented Kalman filter Markov chain Monte Carlo (UKF-MCMC) method [26,51]. This method includes statistical models for noisy data and model form uncertainty simultaneously; however, it has not been adapted for dealing with the unique challenges in system biology. Similarly, the parameter estimation and model selection method in [30] accounts for model form uncertainty with extended Kalman filtering but does not provide complete uncertainty estimates because it takes a frequentist approach for parameter estimation. Thus, there is a need for a framework that combines structural identifiability analysis, global sensitivity analysis, a statistical model for data and model form uncertainty, and Bayesian parameter estimation for comprehensive UQ of dynamical models in systems biology.
The novelty of this work lies in the comprehensive workflow we have developed to advance the current state of the art for uncertainty quantification in systems biology. Additionally, the proposed framework accounts for uncertainty in the data and model structure by building on the Bayesian framework and the UKF-MCMC method [26,51]. Our proposed workflow begins with structural identifiability analysis [36,52] and global sensitivity analysis [37,53] to eliminate parameters that cannot be learned from the measurements or that are not affecting the model outputs and then extends UKF-MCMC for systems biology by leveraging the constrained interval unscented Kalman filter (CIUKF) [54]. Taken together, each of these steps quantitatively addresses uncertainties encountered during model development and calibration to improve predictive modeling in systems biology.
The remainder of this paper details our comprehensive workflow for uncertainty quantification in systems biology and presents several examples to highlight this analysis. First, in Section 1 we introduce the proposed framework and provide the mathematical details needed to understand and apply the approach. Next, in Section 2 we apply this framework to three systems biology models of increasing complexity, including a simple two-state model [55], a model of the core mitogen-activated protein kinase (MAPK) signaling pathway [56], and a phenomenological model of synaptic plasticity to capture long-term potentiation/ depression [57]. We found that even in simple models, estimation of parameters depends on the level of data noise and data sparsity. Finally, the framework enables uncertainty quantification for model structures that include non-linearities and multistability. In all of these cases, we leverage identifiability and sensitivity analyses to narrow the subset of parameters for estimation and then use Bayesian estimation to determine the role of model structure in parameter estimation. These results establish an uncertainty quantification-focused approach to systems biology that can enable rigorous parameter estimation and analysis. Lastly, in Section 3 we discuss these findings in the context of the three examples, address challenges in applying Bayesian methods, and provide directions for future UQ in systems biology.

Materials and methods
This section describes the technical details of the comprehensive framework for uncertainty quantification (see Fig 1) proposed in Section 1.1. Next, Section 1.2 introduces dynamical systems biology models and the parameter estimation problem. Sections 1.3 and 1.4 respectively overview structural identifiability and global sensitivity analyses to reduce the dimension of the parameter space. Then Section 1.5 introduces Bayesian estimation, Section 1.6 outlines the CIUKF-MCMC algorithm, and Section 1.7 describes the constrained interval unscented Kalman filter in detail. Following this, Section 1.8 discusses how to construct prior distributions and Section 1.9 details Markov chain Monte Carlo sampling. Lastly, Section 1.10 discusses output uncertainty propagation with ensemble modeling, Section 1.11 highlights choosing point estimators for the parameters, Section 1.12 delineates synthetic data generation and Section 1.13 outlines limit cycle analysis.

Framework for comprehensive uncertainty quantification for dynamical models in systems biology
This section previews the proposed comprehensive framework for parameter estimation and uncertainty quantification of dynamical models in systems biology. Fig 1 outlines the framework and its components, which are then described in much more detail in the subsequent sections. The proposed framework follows three main steps. First, we assume that dynamical models of intracellular signal transduction (panel A1 in Fig 1) use classical biochemical rate laws, such as mass action kinetics, Michaelis-Menten kinetics, and Hill functions [9][10][11] (panel A2 in Fig 1). The key challenge to applying these models is estimating the associated parameters, such as the rate constants k and V max , equilibrium coefficients K m and K A , and Hill coefficients n in panel A2 in Fig 1, from available experimental data (panel A4 in Fig 1). The comprehensive framework uses Bayesian inference to estimate a statistical model (a probability distribution; see Fig 1B) for the model parameters given a set of noisy measurement data and a specific model form.
We argue that identifiability and sensitivity analysis are necessary steps to perform before parameter estimation (panel B1 in Fig 1). To eliminate uncertainty due to nonidentifiable parameters, we perform global structural identifiability analysis using the Structural Identifiability Analyzer (SIAN) [36,52] (see Section 1.3 for details). The nonidentifiable parameters are fixed to their nominal values from the literature or based on their physiological ranges. Next, variance-based global sensitivity analysis [19,37] is performed to rank the identifiable parameters in order of their contributions to the variance of the model outputs (see Section 1.4 for details). A subset of the identifiable parameters with the largest sensitivity indices is selected for parameter estimation. The remaining model parameters are then fixed to their nominal values in the same fashion as nonidentifiable parameters.
Bayesian parameter estimation completely characterizes uncertainty in the model parameters by estimating a nonparametric statistical model. Bayes' rule (see panel B2 in Fig 1) provides the best guess distribution, called the posterior distribution, for the parameters starting from an initial guess (the prior distribution) that is transformed by the available experimental Model development in systems biology begins with model construction and data collection. Dynamical models in systems biology typically involve a system of ODEs that capture the dynamics of the concentrations of different chemical species in the system (A1). The reaction rates associated with these concentration changes are usually mass action, Michaelis Menten kinetics, or cooperative kinetics represented by the Hill equation (A2). The free parameters in these models include kinetic rate constants, e.g. k, V max , equilibrium constants, e.g. K m , K A , and Hill coefficients, e.g. n. These parameters are first constrained by best guess values based on physiological ranges and typical values of model parameters from the literature (A3). Finally, the model needs experimental data for validation; this data can either be from published work or new experiments. (B) Parameter preprocessing and Bayesian parameter estimation with the CIUKF-MCMC algorithm. First structural identifiability and global sensitivity analyses on the entire parameter set reduce the set of free parameters that can be estimated (B1). Next, we perform Bayesian parameter estimation for this reduced set of parameters to learn their posterior distributions. The posterior distribution is the parameter distribution conditioned on the data (B2). Bayes' rule relates the posterior distribution to the product of the prior distribution and the likelihood function. The prior distribution encodes known information about the parameters and the likelihood function (which requires simulating the model) measures the misfit between predictions and the data. A stateconstrained Unscented Kalman filter approximates the likelihood function to account for uncertainty in the model equations. Although Bayes' rule provides a means to evaluate the posterior distribution at specific points in the parameter space, we use Markov chain Monte Carlo (MCMC) sampling (B3) to characterize the entire distribution. (C) The posterior distributions enable uncertainty analysis of model outputs through ensemble simulation. We perform simulations using the posterior parameter samples to propagate parameter uncertainty through the model (C1). Statistical analysis of the ensemble enables us to compute uncertainty intervals and study various system behaviors, for example, the statistics of the steady state values (C2).
https://doi.org/10.1371/journal.pcbi.1010651.g001 data with the likelihood function. The likelihood function measures the mismatch between the data and the model predictions and returns higher probabilities for parameter sets that produce outputs closer to the data. We use the CIUKF-MCMC algorithm [26,51] to approximate the likelihood function and account for uncertainty in the model formulation, data, and parameters. Markov chain Monte Carlo sampling, with either delayed rejection adaptive Metropolis [58] or the affine invariant ensemble sampler [59], generates a set of samples that represents the posterior distribution (panel B3 in Fig 1).
Lastly, we leverage the posterior distribution to quantify how uncertainty in the model parameters affects uncertainty in the model predictions ( Fig 1C). An ensemble simulation with the parameter samples generates sets of trajectories (see panel C1 in Fig 1) that capture the uncertainty in the predicted dynamics. Computing uncertainty intervals such as the 95% credible intervals presented in panel C2 in Fig 1 provides a visualization of this uncertainty. Notably, credible intervals are different from confidence intervals because credible intervals capture a specified percentage of the samples whereas confidence intervals are random variables that capture regions where estimators will lie at a specified probability level [27] (see Fig 2 for an example of a credible interval). Additionally, statistical analysis of the ensemble enables quantitative analysis of computational modeling results in the same way that running multiple experimental trials enables analysis of experimental results. Before discussing the methods that enable comprehensive uncertainty quantification in the following sections, the next section formally introduces parameter estimation for dynamical models in systems biology. Secondary modes (red dashed-dotted line) can effect the quality of a point estimate. Additionally, the green shaded region highlights the 95% credible interval, the region between the 2.5th and 97.5th percentiles, that is used to capture the uncertainty in an estimate. https://doi.org/10.1371/journal.pcbi.1010651.g002

Parameter estimation for systems biology models in the form of partially-observed systems of ordinary differential equations
We consider nonlinear ordinary differential equation models of the form where x 2 R d �0 is the state vector of nonnegative species concentrations and y 2 R m is the vector of potentially incomplete, m � d, measurements of x. The functions f i ð�; �Þ : R d � R p ! R d govern the rates of the involved biochemical reactions and are derived using biochemical theory (see panel A2 in Fig 1 for example terms). Further, θ f 2 R p f is the vector of biological model parameters, including but not limited to rate constants, binding coefficients, and equilibrium coefficients. The function hð�; �Þ : R d � R p ! R m is the measurement function that maps from the states to the set of observables (experimental data), where θ h is the vector of associated parameters. Lastly, the measurements y(t) are corrupted by independently and identically distributed (iid) Gaussian measurement noise ηðtÞ 2 R m with zero mean and covariance matrix Γ 2 R m�m . The covariance matrix is parameterized by θ G 2 R m such that Γðθ G Þ ¼ diagðθ G Þ. The parameter space is then defined as the multidimensional space of all possible values of θ ¼ ½θ f , θ G ], e.g., θ 2 R p f þm .
In this work, we make several simplifying assumptions to the model in Eqs (1)- (2). First, we assume that the measurement function h(�;�) is linear and that all the parameters in θ h are known, so Eq (2) becomes with H 2 R m�d . Second, we assume that the initial condition x(t = 0) = x 0 is known, so it is excluded from parameter estimation. Although Eqs (1)-(2) define a continuous-time dynamical system, we mostly deal with discrete observations. The set of n measurements Y n ¼ fy 1 ; . . . ; y n g denotes the experimental data taken at time instances t 1 , � t 2 , . . ., � t n , where y k ¼ Hx k þ η k . In this discrete setting, x k is the state vector at time t k and η k is an independent realization of the measurement noise. Additionally, the set of states at the discrete measurement times defined above is X n ¼ fx 1 ; . . . ; x n g. Note that the full (internal) states may not be available for estimation, only the measurements.
We can now define the problem setting of the proposed parameter estimation framework. This work assumes that the model form is known and seeks to estimate the model parameters and associated uncertainties by learning a probability distribution for the parameters. The following problem statement formalizes the parameter estimation problem. (1) and a set of noisy, sparse and incomplete, experimental measurements Y n ¼ fy 1 ; . . . ; y n g at time instances t 1 � t 2 , . . ., � t n , estimate the complete probability distribution, pðθjY n Þ, for the model parameters

Problem 1 Given a known model form as in Eq
The next section outlines the structural identifiability and global sensitivity analyses performed to reduce the dimension of θ before estimating parameters.

Global structural identifiability analysis with the structural identifiability analyzer (SIAN)
Structural identifiability analysis determines if parameters can be uniquely estimated from the available measurement function [29]. Structural identifiability is a mathematical property of the model itself and does not consider the quality or quantity of the available experimental data. The following definition from [60] provides an intuitive condition for global structural identifiability and can be shown to be equivalent to alternative definitions [33,36].
Definition 1 (Global structural identifiability [60]) A parameter θ i is globally structurally identifiable if, for almost all θ 2 R p and for all times t > 0, Global structural identifiability, as in Definition 1, implies that a parameter θ i can be uniquely identified from data. If a parameter is globally structurally identifiable, then there is a single unique value of that parameter that gives each observed output value with the same initial conditions [36]. Alternatively, a parameter may be locally structurally identifiable if the condition in Definition 1 only holds in a local neighborhood, Vðy i Þ, of parameter space around θ i , e.g., for y i 2 Vðy i Þ [29,33,36]. This condition implies that a finite number of values of a locally identifiable parameter can give the same output values [36]. Lastly, if a parameter is nonidentifiable, then infinitely many values of that parameter can give the same model output. While models with locally structurally identifiable and nonidentifiable parameters are still valid over the entire parameter space, the existence of such parameters can confound successful parameter estimation. Many computational methods have been developed to assess structural identifiability [29,33,36,61] for ODE-based models. In this work, we use the differential algebra and power series-based approach presented in [36,52], because the method specifically assesses global identifiability. The SIAN (Structural Identifiability ANalyser) software [52] provides a numerical implementation of the approach proposed in [36]. The SIAN algorithm for assessing global structural identifiability uses a combination of symbolic and probabilistic computation [36,52]. First, SIAN uses Taylor expansions of the model equations to obtain a polynomial representation of the system. Second, the algorithm truncates the polynomial system to produce a minimal system containing all parameter identifiability information. Third, SIAN solves the identifiability problem for a single parameter set that is randomly selected to guarantee correctness up to a user-specified probability level p (see Theorem 5 in [36]). Fourth, the algorithm uses the results in the third step to separate the parameters into globally identifiable, locally identifiable, and nonidentifiable sets. SIAN is implemented in Maple (Maplesoft, Waterloo, ON) and Julia (The Julia Project [62]). We refer the reader to [36] for additional mathematical details on SIAN.
We use the Julia implementation of the SIAN algorithm with the default probability of correctness, p = 0.99 (available at https://github.com/alexeyovchinnikov/SIAN-Julia). Furthermore, we set the additional p_mod parameter to 2 29 −3 to enable the algorithm to run faster [63]. Any parameters that are not globally structurally identifiable are fixed to nominal values informed by the available literature following identifiability analysis. While these parameters may convey meaningful biological information, nonidentifiability implies that it is mathematically impossible to identify them from the available data. Next, global sensitivity analysis is used to further reduce the set of identifiable parameters.

Variance-based global sensitivity analysis
Parametric sensitivity analysis quantifies the contributions of the model parameters to variations in the model output [19,37]. Specifically, global sensitivity analysis aims to quantify the effects of the model parameters on the output quantity of interest over the entire parameter space [19,37] and is well suited for studying parameters in nonlinear models [32,37]. This work applies Sobol's method [53] for variance-based sensitivity analysis because it provides a quantitative output to rank parameters and leverages the prior distributions defined for the parameters. Sobol sensitivity analysis decomposes the variance of model outputs based on contributions from individual parameters and interactions between parameters [37,53]. The total variance of the output quantity f ðθÞ is where f 0 ≔ R R p f ðθÞdθ is the mean of the output. The following definition for the analysis of variance (ANOVA) representation provides an expansion for the output variance in a highdimensional representation (HDMR), also known as a Sobol representation [19,53].
The ANOVA representation (see Def 2) expands the total variance, Eq (4), as where the variances D i and D i,j are Note that it is possible to compute higher-order variances by increasing the dimension of the integral and following the recursion in Def 2, however we limit our discussion to second-order or lower variances for brevity. The first and second-order Sobol sensitivity indices are then defined using the variance terms in Eq (5) as The first-order sensitivity index S i quantifies the fraction of the total variance attributed to parameter θ i , and the second-order sensitivity index S i,j quantifies this for the interactions between θ i and θ j . Lastly, the total-order sensitivity is which quantifies all contributions from parameter θ i on the output variance. This work uses the UQLab toolbox [64,65] to perform Sobol sensitivity analysis in Matlab (MathWorks, Natick, MA) and the DifferentialEquations.jl package [66] for analysis in Julia. Both softwares use Monte Carlo estimators to compute Sobol sensitivity indices from parameter samples (see [19,37] and references therein for details). In Matlab, unless otherwise specified, first and total sensitivity indices are computed using Sobol pseudo-random sampling (e.g. SOpts.Sobol.Sampling = 'sobol') and the default estimator (SOpts.Sobol.Sampling = 'janon'), see [65] for details. Additionally, the number of samples, SOpts.Sobol.SampleSize, is set specifically for each problem in Section 2. In Julia, we perform similar Sobol sampling with QuasiMonteCarlo.jl (https://github. com/SciML/QuasiMonteCarlo.jl) with the SobolSample() sampler, and the default estimator for the sensitivity indices, e.g. Ei_estimator set to :jansen1999.
Sobol sensitivity analysis computes sensitivity indices for scalar output quantities rather than for an entire trajectory. Thus, we define quantities of interest (QoI) for each problem that capture the biologically relevant information in a trajectory, such as the steady state value of a certain biochemical species. Using the computed sensitivity indices for the given QoI, we rank the parameters and aim to find the most influential parameters by selecting those with the greatest sensitivity indices. The critical challenge is selecting a cutoff point that separates the influential parameters from the complete set. To do so, we chose a threshold corresponding to a pronounced decline in the sensitivity index value or a logical value of that index, for example, S i � 0.1. All parameters with sensitivity indices above the cutoff value are left free for estimation, and the remaining parameters are fixed to their nominal values. The following section recasts the parameter estimation problem (Problem 1) in the Bayesian framework to estimate the free, influential parameters.

Bayesian parameter estimation
Bayesian parameter estimation solves Problem 1 by considering the model parameters, θ, as random variables that do not have a pre-specified parametric distribution. Note that parametric distributions have probability density functions that are fully described by a closed-form parametric equation, such as a normal distribution that is characterized by its mean and variance. The approach characterizes the posterior probability distribution for the parameters pðθjY n Þ conditioned on a given dataset Y n and provides the best guess probability distribution for θ given the data. We can use Bayes' rule to express the posterior distribution as where pðθÞ is known as the prior distribution, and Lðθ; Y n Þ is the likelihood function (see panel B2 in Fig 1 for a visual representation of Bayes' rule).
Intuitively, Bayes' rule updates our best guess about the distribution of the model parameters as new data is being incorporated. The prior distribution pðθÞ represents the best guess before any data are collected and encodes any assumptions on the parameters. For instance, the prior may convey the physiological ranges for parameter values or may weigh known values more heavily (see Section 1.8). The likelihood function Lðθ; Y n Þ ¼ pðY n jθÞ updates our belief state by measuring the misfit between the data and model predictions with a specific parameter set. Parameter sets that are more likely to occur will produce model predictions that better match the data and thus have higher likelihood probabilities. For example, although the prior in panel B2 in wherex k is the predicted state at time t k with the parameters θ, |�| denotes the matrix determinant, and the C-weighted norm is defined as jjajj 2 C ≔ a > C À 1 a, where C is a symmetric positive definite matrix. Lastly, the posterior distribution conveys the best guess after collecting and incorporating the data Y n into the statistical model and can be further refined if more data are included. In panel B2 in Fig 1, the posterior illustrates how the likelihood function re-weights the prior to update our belief state.
A fundamental difficulty in Bayesian parameter estimation is that Bayes' rule only enables evaluating the posterior distribution at specific points in parameter space. This is in contrast to, for example, the formula for a Gaussian distribution (with mean μ and standard deviation σ) pðxÞ ¼ 1 ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi 2ps 2 p exp À ðx À mÞ 2 s 2 � � that can be analytically evaluated at all values of x. Therefore, parameter samples are drawn from the posterior help to characterize the distribution over the entire parameter space. Markov chain Monte Carlo (MCMC) algorithms enable sampling from arbitrary distributions, such as the posterior distribution (see Section 1.9 for details). Before performing Bayesian estimation the next section introduces the constrained interval unscented Kalman filter Markov chain Monte Carlo (CIUKF-MCMC) algorithm that accounts for uncertainty in the model and the data.

Constrained interval unscented Kalman filter Markov chain Monte Carlo (CIUKF-MCMC)
Complete uncertainty quantification of dynamical models in systems biology must account for uncertainty in the model form, parameters and noisy data. In [26], Galioto and Gorodetsky suggest adding a process noise term to Eq (1) to account for model form uncertainty in the system. Following this suggestion, the model in Eqs (1-2) is recast as a discrete time stochastic process where k is the discrete time index for t k , and ψ(�;�) is the discrete state propagator that evolves the state from time t k−1 to time t k . Additionally, ξ k and η k are Gaussian process and measurement noise (stochastic noise processes) with covariances Sðθ S Þ and Γðθ G Þ, respectively. The discrete state propagator ψ(�;�) in Eq (8) is the discrete operator that evolves the state according to the differential equation in Eq (1), and, for example, could be a forward-Euler finite-difference approximation. This work deploys the Matlab function ode15s() to construct a state propagator that guarantees the necessary stability to handle systems biology models.
Bayesian estimation of the model parameters, θ ¼ ½θ f ; θ S ; θ G � > , of the extended system in Eqs (8)-(9) accounts for uncertainty in the data, model, and parameters. The introduction of process noise increases the dimension of the parameters to estimate by requiring estimates for θ S . Further, the addition of stochastic process noise means the state variables are random variables; Bayes' rule for this system becomes pðθ; X n jY n Þ / pðθÞLðθ; Y n ; X n Þ; ð10Þ to account for the additional uncertainty in the states. The key step of the UKF-MCMC algorithm is the marginalization of uncertainty in the states out of Eq (11) to enable estimation of the parameter posterior distribution [26]. The UKF-MCMC algorithm begins by constructing an expression for the joint likelihood of the states and the parameters, Lðθ; Y n ; X n Þ. Two probability distributions implied by the stochastic system in Eqs (8)-(9) are needed to define an expression for the joint likelihood. First, the probability of the current state x k given the past state x k−1 is where the norm kx k À cðx kÀ 1 ; θ f Þk 2 S quantifies the misfit between the past state and the predicted current state. Next, the probability of a measurement y k given x k is where the norm ky k À Hx k k 2 Γ quantifies the residual between the measurement and the true states. By combining Eqs (11) and (12) the joint likelihood is Marginalizing out the uncertain states by integration yields the likelihood for the uncertain parameters However, there is no obvious computationally tractable approach to integrate over a set of uncertain states directly. Theorem 1, stated below, provides a recursive algorithm to marginalize the states out of the likelihood, e.g., to perform the integration in Eq (14). Although Theorem 1 assumes that the initial condition is uncertain (and it is therefore estimated), we do not use that estimate in this work, as we start with a known initial condition x 0 .

Theorem 1 (Marginal likelihood (Theorem 1 of [26] and 12.1 of [67])) Let Y k denote the set of all observations up to time t k as defined in Section 1.2. Let the initial condition be uncertain with distribution pðx 0 jθÞ. Then the marginal likelihood is defined recursively in three stages:
for k = 1, 2, . . .

Predict the new state from previous data
2. update the prediction with the current data 3. and marginalize out uncertainty in the states The recursion defined in Theorem 1 closely resembles a Bayesian filter [67]; thus, it is evaluated with Kalman filtering algorithms [26]. For linear models, the standard linear Gaussian Kalman filter can be used to evaluate the recursion (see Algorithm 2 in [26]). However, exact solutions to the recursion are not possible if the model or measurement processes are nonlinear. Therefore, approximations such as extended Kalman filters (EKF), unscented Kalman filters (UKF), or ensemble Kalman filters (EnKF) [26,67] must be employed. The original implementations of the UKF-MCMC algorithm use the UKF [68] for this approximation because the UKF is generally stable and can handle nonlinear models and measurement processes [26,51]. However, the UKF is not suitable for systems biology models because it ignores constraints on the state variables, such as the nonnegativity of chemical concentrations. Section 1.7 describes the constrained interval unscented Kalman filter (CIUKF) [54] implemented in this work to enforce state constraints during filtering. Thus, we refer to the UKF-MCMC from [26] that uses the constrained interval unscented Kalman filter [54] as CIUKF-MCMC.

Constrained interval unscented Kalman filter (CIUKF)
We implement the constrained interval Unscented Kalman filter (CIUKF) [54,69] algorithm to enforce all state constraints in CIUKF-MCMC. This algorithm assumes that the state is subject to an interval constraint x LB � x � x UB . Since we only seek to enforce nonnegativity in systems biology, the interval constraint is 0 � x � 1. We choose the CIUKF over alternative state constrained Kalman filters [54,70] because it enforces constraints in both the predict and update steps of the algorithm and retains the same structure as the standard UKF [67]. We outline the steps of the CIUKF below.
The CIUKF algorithm predicts the state x k + 1 from all preceding measurement data Y n . Following the structure of the linear Kalman filter, CIUKF is a recursive algorithm that iterates over all data and performs prediction and update steps at each time point [67]. For simplicity, we outline a single iteration of the CIUKF that moves the state forward in time from x k to x k + 1 . Let C xx k be the state covariance matrix at time t k , Sðθ S Þ be the process noise covariance matrix, Γðθ G Þ be the measurement noise covariance matrix, and θ f be the model parameters. There are two steps to the CIUKF, prediction, and update.
First, the prediction step uses the the interval constrained unscented transform [54,69] (the state-constrained equivalent to the unscented transform [68,71]) to predict the state and its covariance matrix at the next time point after propagation by the nonlinear model, e.g., Eq (8). The interval constrained unscented transform constructs a set of sigma points that capture the covariance C xx k at time t k . Each sigma point is propagated in time by the nonlinear model to approximate the new state and its covariance at the next time t k+ 1 . The set of 2d + 1 sigma points, X, is given by where ξ i is the ith sigma point coefficient, is the matrix square root of A, and i = 1, . . .d. The sigma point coefficients ξ i control the distances of the sigma points around the initial state x k and are chosen to ensure that no sigma points violate the state constraints. They are Xði; jÞ ≔ ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi where λ is a parameter of the algorithm. Alternatively, in the standard unscented transform, the coefficients are all equal to ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi d þ l p [68,71]. Next, a set of weights, w i , are assigned to each sigma point as where ξ i are as defined in Eqs (16)- (18). Importantly the sum of the weights equals one, P 2d i¼0 w i ¼ 1. The prediction step then uses the nonlinear model, Eq (8), to propagate each sigma point forward in timeX for i = 0, . . ., 2d. The prediction mean x � k and covariance C �xx k are then respectively computed as Eqs (15)-(22) describe the constrained interval unscented transform that approximates the mean and covariance of the state after propagation by the nonlinear model. The prediction mean, Eq (21), and covariance, Eq (22), provide the predicted state and its covariance, respectively, that are then updated using the available data y k . The update step begins by constructing a new set of sigma points centered around x � k , where Additionally, a new set of weights are where fw ðmÞ i g are used to compute the mean and fw ðcÞ i g are used to compute the covariance matrix. These weights are indeed equal for all sigma points and are equivalent to those used in the standard UKF. Next, the measurement function is applied to each sigma point, yielding a set of predicted measurements where the measurement function h(�;�) is possibly nonlinear with parameters θ h . Then the mean and covariance matrices of the predicted measurements are computed with the weighted sums, and the Kalman gain is Lastly, the updated state x k + 1 is found by solving the following constrained nonlinear optimization problem, where the objective function is This optimization problem can be solved in Matlab using the fmincon() optimizer. Additionally, the updated covariance matrix is given by In offline state estimation problems, such as CIUKF-MCMC, this filter is iterated over all available data, e.g. from time t 0 to time t n if n data points are available [26,51]. In practice, the CIUKF algorithm is substantially more compute-intensive than the standard UKF [54] because the CIUKF update step involves solving a constrained nonlinear optimization problem, e.g., Eq (24). However, the objective function in Eq (25) can be simplified given the linear measurement assumptions made in Section 1.2. The simplified objective function becomes Expanding this and recognizing that minimizing which is equivalent objective function to Eq (26) and is a quadratic form. Thus, the constrained optimization problem in Eq (24) becomes a quadratic program when using Eq (27) as the objective function. We use the quadprog() function in Matlab to solve the quadratic program with the 'Algorithm' option set to 'trust-region-reflective'. We keep all other settings for the solver as defaults. As expected, solving the quadratic program is substantially more efficient than solving the general nonlinear problem. Throughout this work, we set λ = 1, α = 1 × 10 −3 , and β = 1 for the CIUKF. Furthermore the Cholesky decomposition, A = L L > , is used to compute the matrix square roots in Eqs (15) and (23), because covariance matrices are always positive definite. Additionally, we compute all covariance matrices as P � ¼ 1 2 ðP þ P > Þ þ 2 I, where � = 1 × 10 −10 , to ensure all they remain symmetric positive definite. The next section discusses how to choose prior distributions for Bayesian estimation.

Prior distribution
The prior distribution, pðθÞ encodes our belief state about parameters before collecting data and performing Bayesian estimation [19,27]. The form of the prior distribution allows us to incorporate varying levels of prior knowledge into our models. If the values of a parameter are known, informative priors can be used to shift to the possible values for that parameter towards the known values [27]; for example, a log-normal prior distribution can be used to center the prior around experimental measurements of a parameter [72]. However, the Bayesian inference literature often warns that informative priors should only be used in combination with good information on the parameter values [27] as it can take a large amount of data to overcome a "bad" prior. Alternatively, noninformative or weakly-informative priors reflect a lack of good prior knowledge about the parameters. For example, if we only know a parameter's physiological ranges, we could construct a uniform prior that states there is an equal probability of any parameter value within this range. Noninformative or weakly-informative priors rely on the data to provide information on the parameters, so the choice of such priors is often safer when prior knowledge of the parameters is limited [27].
In applying the CIUKF-MCMC algorithm, this work constructs prior distributions for two sets of parameters, the biological model parameters, and the noise covariance parameters. We choose to use uniform priors for the biological model parameters, θ f , to replicate the typical modeling setting where only the possible ranges for model parameters are known. S1, S2 and S4 Tables list the upper and lower bounds of all biological model parameters. Furthermore, we follow the choices in [26] and use right-half-normal priors for the measurement and process noise covariance parameters, θ S and θ G , respectively (this is further motivated in Section 7.1 of [73]). The choice of covariance and upper bound of these priors was found to significantly affect the convergence of MCMC sampling. Thus, the prior distributions for the noise covariance parameters were scaled to match the respective state variable. For example, the prior for a measurement noise covariance θ S i that corresponds to measurements fy i 1 ; y i 2 ; . . . ; y i n g would be a right-half-normal distribution with mean zero and standard deviation equal to a fraction of the standard deviation σ y i of the data. The prior is then y S i � Right-Half-Normalð0; bs x i Þ truncated to ½0; s x i � > , where b < 1 and s x i is the standard deviation of the state x i . We chose b ¼ 1 3 unless otherwise specified. These choices are motivated by the observation that measurement noise and process noise covariances will always be smaller than or equal to the covariance of the available data if there is any meaningful information in the data. The next section introduces MCMC sampling to characterize posterior distributions that use CIUKF to approximate the likelihood function.

Markov chain Monte Carlo sampling
Markov chain Monte Carlo (MCMC) algorithms enable sampling from arbitrary probability distributions [19,[74][75][76]. The key idea of MCMC is to construct a Markov chain of samples θ 1 ; θ 2 ; . . . ; θ N whose distribution converges to the target distribution, pðθÞ, that we wish to sample [27,75]. We apply MCMC sampling to Bayesian parameter estimation by constructing a Markov chain where the target distribution is the posterior distribution, that is pðθÞ ¼ pðθjY n Þ. In this work, we use two MCMC sampling algorithms, delayed rejection adaptive Metropolis (DRAM) [58] and affine invariant ensemble sampler (AIES) [59]. These samplers build upon the classical Metropolis-Hastings algorithm [77,78] that we introduce in Section 1.9.1. We outline DRAM in Section 1.9.2 and AIES Section 1.9.3. Lastly, we discuss convergence assessment with the integrated autocorrelation time [59] in Section 1.9.5. We focus these discussions on the practical aspects of MCMC and refer the reader to [19,27,75] for additional theoretical details. [77,78] constructs a Markov chain whose probability distribution is guaranteed to converge to the target distribution and forms the foundation for a large family of MCMC samplers [27,75]. The MH algorithm consists of two steps, a proposal and an accept-reject step, repeated to draw the set of samples θ 1 ; θ 2 ; . . . ; θ N . The Markov chain starts with an initial sample θ 0 that the user chooses. We outline one iteration of the MH algorithm moving from step i to step i+ 1.

Metropolis-Hastings. The Metropolis-Hastings (MH) algorithm
The first step in Metropolis-Hastings, called the proposal step, is to propose a new sample θ � . We draw θ � from the proposal distribution qðθ � jθ i Þ. The proposal distribution is specific to each MCMC algorithm, but, for example, could be a normal distribution centered around the previous sample such that θ � � N ðθ i ; s 2 IÞ where σ is specified (this is random walk Metropolis (RWM) [75]).
Next, the accept-reject step decides if the proposal is accepted, set θ iþ1 ¼ θ � , or rejected, set θ iþ1 ¼ θ i . In the MH algorithm, this decision is probabilistic, e.g., the proposal is accepted or rejected with acceptance probability aðθ � jθ i Þ. The acceptance probability is which guarantees that the stationary distribution of the samples (the distribution that the samples converge to in the infinite sample limit) equals the target distribution [27,75]. These steps, propose and accept-reject, are repeated until the distribution of the set of samples has converged to its stationary distribution. Although convergence is guaranteed in the infinite sample limit [27,75], assessing convergence in practice is nontrivial. We chose to use an approach from [59] that uses the integrated autocorrelation time to asses convergence as outlined in Section 1.9.5 (see [19,27,75] for additional approaches).

Delayed rejection adaptive Metropolis (DRAM).
The delayed rejection adaptive Metropolis (DRAM) algorithm is based on the random walk Metropolis algorithm and combines delayed rejection with adaptive Metropolis [58]. DRAM closely follows the Metropolis-Hastings algorithm but specifically uses a Gaussian proposal distribution such that θ � � N ðθ i ; CÞ, where C is the covariance matrix. Delayed rejection (DR) [58,79] and adaptive Metropolis (AM) [58,80] are two modifications to the random walk Metropolis algorithm that help to improve the convergence rate of the Markov chain.
First, delayed rejection adds an additional round of Metropolis-Hastings (propose and accept-reject) if the first proposal is rejected. That is, if θ � is rejected a new proposal, θ �2 � q 2 ðθ �2 jθ � ; θ i Þ is drawn, where q 2 (�) is the new proposal distribution. The new proposal distribution is also Gaussian, however the covariance is scaled to a fraction of the original proposal covariance, e.g. q 2 ðθ �2 jθ � ; θ i Þ ¼ N ðθ i ; gCÞ, where γ < 1 and is a tuning parameter of the algorithm [58]. Thus the second proposal is closer to the previous point and is more likely to be accepted. DRAM evaluates the new proposal with a Metropolis-Hastings accept-reject step with the acceptance probability where α(�|� 0 ) is as defined in Eq (28). Although most implementations impose a single DR step [19], delayed rejection can be repeated more than once, where the proposal distribution and acceptance probability are modified accordingly at additional each round. Adaptive Metropolis acts separately from delayed rejection and aims to move the proposal distribution closer to the target distribution [58], by replacing the covariance matrix of the proposal, C, with the covariance matrix of the samples. In practice, adaptation begins after a set number of i 0 samples have been drawn and updates the covariance matrix at every step as ( where C 0 is the initial covariance matrix, s p is an algorithm tuning parameter that is often set to s p = 2.38 2 /p [58,74] where p is the dimension of θ, and � is a small positive number. We implement the DRAM algorithm with a single round of delayed rejection following [26] with γ = 0.01, i 0 = 200, and � = 1 × 10 −10 . Furthermore, the initial covariance matrix C 0 is tuned to accept between 20-40% of proposals. We refer the reader to [19,58] for further details on DRAM and tuning the initial covariance matrix. Additionally, each Markov chain is initialized to the MAP point by using fmincon() (default options except: 'UseParallel' set to true and 'MaxFunctionEvaluations' set to 10, 000) in Matlab to minimize the negative log-posterior (equivalent to maximizing the posterior) as in [26].

Affine invariant ensemble sampler (AIES).
While random walk Metropolis-based algorithms such as DRAM can adequately sample complex posterior distributions, these methods will show very slow convergence when the target distribution is highly anisotropic [59]. The posterior distribution with CIUKF-MCMC in systems biology are anisotropic because the scales of model parameters can vary by several orders of magnitude, and the noise covariance parameters often have different scaling than the model parameters. Fortunately, AIES provides an algorithm to sample such anisotropic distributions [59] effectively. The motivation for affine invariance is that anisotropic distributions can be transformed to isotropic distributions with an affine transformation. Thus an algorithm that is invariant to such transformations will effectively sample an isotropic distribution when sampling an anisotropic distribution [59].
The AIES algorithm differs from DRAM and random walk Metropolis because it leverages an ensemble of Markov chains rather than a single chain. Each chain in the ensemble of N e Markov chains is called a walker, and we denote the set of walkers at step i with fθ i 1 ; . . . ; θ i Ne g where the subscript is the walker index and the superscript is the step-index. Note that the number of walkers must be larger than the dimension of θ that is N e > p, for θ 2 R p . At the end of N steps, a set of N e Markov chains with N samples is obtained. The first chain in the ensemble is, for example, θ 1 1 ; θ 2 1 ; . . . ; θ N 1 . Note that the total ensemble will have N�N e samples. Each step of the AIES algorithm involves a proposal and a Metropolis-Hastings update for each chain in the ensemble. One iteration of these steps to move from i to i + 1 for a single walker, θ i n , is outlined below and these steps are repeated to update the entire ensemble. First, a proposal for the current walker θ i n is chosen using the stretch move [59] that ensures affine invariance of the sampler. The stretch move proposes a new point that lies along the line that connects the current walker θ i n and another walker in θ i k randomly chosen from the ensemble. Here, Z is a random variable that is sampled following where a > 1 is an algorithm tuning parameter that the user must specify. Second, the proposal is accepted or rejected using a Metropolis-Hastings-like accept-reject step. The acceptance probability is where Z is as defined above, and p is the dimension of θ. This formulation of the acceptance probability guarantees convergence to the target distribution [59]. We use the Matlab implementation of the AIES algorithm [81] from the UQLab toolbox [64]. Unless otherwise specified, the ensemble size is N e = 150 because we observed improved sampling with a large ensemble. To accelerate sampling, the likelihood is evaluated for each ensemble member in parallel using a parallel for loop (e.g., parfor in Matlab) with at most 24 parallel threads. Additionally, default value of a = 2 is used for the stretch move tuning parameter. Lastly, each Markov chain in the ensemble is initialized to a random point drawn uniformly over the support of the prior.

Markov chain burn-in.
In MCMC, Markov chains (or ensembles of chains) often display an initial transient, called burn-in, before converging to their stationary distributions [19,27,76]. Importantly, these samples during burn-in are not distributed according to the stationary distribution and should therefore be excluded from the final set of samples. Common practice in the MCMC literature is to simply discard these initial samples to remove the effects of burn-in [19,27]. The choice of the burn-in length is often nontrivial and is best informed by an analysis of the Markov chain [76]. Unless otherwise specified the integrated autocorrelation time (described below) dictates the number of samples to discard as burn-in in this work. Specifically, we compute the integrated autocorrelation time after collecting many samples and set the burn-in length to 5-10 times the computed value.

Convergence assessment with the integrated autocorrelation time.
A key challenge in Markov chain Monte Carlo sampling is determining the appropriate number of samples N to collect. In this work, we use the integrated autocorrelation time [59,76] to determine when the Markov chain has approximately converged to its stationary distribution. We outline the theory and motivation behind the integrated autocorrelation time for a single Markov chain and refer the reader to [59] for a discussion of ensemble methods. The use of the integrated autocorrelation time is motivated by the typical use of MCMC sampling to compute an expectation E½x� ¼ Z xpðxÞdx: Given a Markov chain of length N, we can estimate the expectation with the Monte Carlo estimatorm In general it is common to consider the variance ofm, varðmÞ, as the estimation error [59]. This variance is given by where the integrated autocorrelation time τ s is given by Here, the autocovariance function C s (T) with lag T 2 N is given by where t � 2 N. Thus, the Monte Carlo estimation error is proportional to integrated autocorrelation time for a fixed chain length. The integrated autocorrelation time can be interpreted as the time it takes for the samples in a Markov chain to become uncorrelated [76]. Additionally, the effective number of samples, N eff , can be defined using the integrated autocorrelation time as, N eff = N/τ s . In this work, we use the integrated autocorrelation time for two purposes. First, the computation of the integrated autocorrelation time is used to choose the correct burn-in length. We compute the integrated autocorrelation time after collecting many samples and then discard between 5-10 times τ s as burn-in. Second, after discarding the initial samples, the integrated autocorrelation time helps to determine if enough samples have been collected, e.g., N eff is large. Should the effective sample size be small, the MCMC sampler is run longer to collect more samples. We compute the integrated autocorrelation time using a Matlab function associated with [82] (available at https://www.physik.hu-berlin.de/de/com/UWerr_fft.m) with default algorithm parameters. We compute τ s for each parameter and take the maximum of these values. For an ensemble, we compute the mean τ s for the walkers of each parameter and take the maximum value of the means. After completing MCMC and evaluating the chains, we leverage the posterior samples to quantify uncertainty in model predictions.

Ensemble simulation and output uncertainty analysis
Markov chain Monte Carlo sampling provides a set of samples, fθ 1 ; . . .θ n g, that converge in distribution to the posterior distribution (e.g. panel B3 in Fig 1). One is often interested in how uncertainty in the parameter estimates, which is conveyed by the posterior distribution, propagates to uncertainty in the model predictions. Fortunately, an ensemble of simulations (see panel B1 in Fig 1) distributed according to the posterior can be run using the posterior samples. This approach to uncertainty propagation is known as sampling-based uncertainty propagation [19] and is feasible because the simulation of dynamical systems biology models is computationally efficient. We refer the reader to [40,83] for examples of sampling-based uncertainty propagation in systems biology. Each simulation is run by solving the differential equation with the ode15s() integrator in Matlab with a unique sample from fθ 1 ; . . .θ n g for the parameters. We use default tolerances for the integrator and supply the Jacobian matrix (specify the 'Jacobian' option) to improve computations. We compute the Jacobian by hand and evaluate it with the same parameter as the differential equation.
Following ensemble simulation, statistical analyses of the ensemble of predicted trajectories and any relevant quantities of interest (QoI) can be performed. For example, panel B2 in Fig 1  highlights the uncertainty in the trajectory with 95% credible intervals to show the region where 95% of the trajectories fall. Additionally, we compute the statistics of relevant QoIs, such as the steady state values or limit cycle period, from the ensemble. The next section discusses how to choose single point estimates that best represents the estimated parameters.

Parameter point estimates from posterior samples
One often wants to compute a point estimate for each of the parameters in addition to characterizing the entire posterior distribution (black line in Fig 2). Common choices for point estimates in Bayesian statistics include the mean, median, or mode of the posterior distribution [27]. These point estimates are computed from posterior samples acquired via MCMC sampling. While sample statistics, such as the mean and median, are computed directly from the samples, computing the MAP point requires estimating the posterior of the probability density function and finding the maximal point. Direct computation of the sample mode will not yield the MAP point because the parameters are continuous random variables, so no two-parameter samples are expected to be identical. One approach to estimate the MAP point involves computing a histogram of samples and using the center of the bin with the most associated probability as the MAP. However, we found that the histogram of posterior samples is often noisy and can be sensitive to the bin size, so that the MAP estimate may be erroneous. We chose to take an alternative approach that fits a kernel density estimator [84] to approximate the posterior distribution and subsequently computes that MAP point. The kernel density estimator provides a non-parametric approximation of the posterior distribution and, intuitively, smooths the posterior histogram. We use the ksdensity() function for kernel density estimation in Matlab. The 'support' option is set to be the region bounded by the prior distribution and the 'BoundaryCorrection' option is set to use the 'reflection' method to account for these bounds. Lastly, we use the default values for the 'Bandwidth' unless otherwise specified. All other options are kept to the defaults as defined in the Matlab documentation. The MAP point is the point with maximum probability in the ksdensity() output. The next section moves from the details of CIUKF-MCMC and Bayesian estimation to outline how synthetic data is generated for the examples in this paper.

Synthetic data generation for numerical experiments
The synthetic data in this work aims to replicate noisy data found in biological experiments. We generate noisy synthetic data by drawing samples from deterministic model simulations and simulate measurement noise by adding independently and identically distributed (iid) perturbations to each sample. First, a nominal set of biological model parameters and an initial condition are chosen for data generation. These values become the ground truth for the estimation problem, and are informed by the available literature when possible. Next, a numerical solution to the system provides true trajectories of the state variables. Unless otherwise specified the ode15s() integrator in Matlab is used with default tolerances, and the Jacobian matrix is supplied (specify the 'Jacobian' option with the analytical Jacobian Matrix) to improve computations. We then apply the measurement function and sub-sample the true trajectory to simulate sparse sampling. Lastly, we corrupt the data by adding a realization of an iid noise stochastic process to each sample. To meet the assumptions of the CIUKF-MCMC algorithm (see Section 1.6) we use mean-zero, normally distributed perturbations with the diagonal covariance matrix, Γ. We chose the entries of Γ to be proportional to the variances of the respective state variables, e.g. where b is a positive constant typically chosen to be less than one that controls the noise level. The last section discusses how to compute the amplitude and period of a limit cycle oscillation.

Limit cycle analysis
Limit cycle amplitude and period are used to characterize limit cycle oscillations for global sensitivity and output uncertainty analysis. These quantities are relevant in intracellular signaling because the strength and timing of signals, amplitude and period, respectively, are thought to encode different inputs [85]. The limit cycle amplitude, y lca , quantifies the difference between the maximum and minimum values of the oscillations and is defined as where x min and x max denote the minimum and maximum values of the state x over a single complete oscillation. Further, the limit cycle period, y period , is the time to complete an oscillation and is defined as To compute these quantities we find trajectories that show limit cycle oscillations and then extract the two quantities of interest. This approach leverages the findpeaks() function that returns the locations of the local maxima (peaks) in a trajectory for these computations in Matlab. The findpeaks() function can also find the local minima by applying it to the negative of the trajectory. The first task in computing the limit cycle features is to detect actual limit cycle oscillations. A trajectory is discarded if it reaches a fixed point (steady state) when no peaks are detected (findpeaks() returns an empty set). Next, the difference between the heights of the identified peaks is used to discard trajectories that show decaying oscillations to a fixed point. A threshold on one half of this difference, called for the helpPeakThreshold, is set to 17.0 unless otherwise specified; a trajectory is discarded if its difference in peak values exceeds this threshold. Any remaining trajectories will show limit cycles or will contain numerical artifacts which are falsely detected as limit cycles.
Lastly, the limit cycle amplitude and period are computed. The limit cycle amplitude is the mean difference between each pair of detected peaks and minima that correspond to one oscillation of the limit cycle. The limit cycle period is then computed as the mean time between two peaks and the time between two minima. Trajectories with numerical artifacts are eliminated by discarding those with a limit cycle amplitude that is smaller than the LCAminThresh, with a range of limit cycle values greater than the Decaythresh or those that return no limit cycle period (e.g., the empty matrix on Matlab). Unless otherwise specified, we set the LCA-minThresh to 1.0 and the Decaythresh to 5.0. These computations assume that the period of the limit cycle is stable and that no frequency modulation occurs. We note that a small number of oscillating trajectories can be misclassified as fixed points due to numerical errors in the simulated trajectory. We specifically observed this when there was large peak-topeak variability. However, misclassification is rare, so we do not attempt manual relabelling of these trajectories.

Results
We applied the Bayesian parameter estimation framework to three different models that represent signal transduction cascades of increasing biological and mathematical complexity. Section 2.1 uses the first model, a simple kinetic scheme, to describe a series of computational experiments that illustrate the effects of measurement noise and data sparsity on estimation uncertainty (Fig 3). Next, we tested our framework on two models that are more representative of the nonlinearities and overparameterization observed in systems biology models. Section 2.2 analyzes a representative model of the mitogen-activated protein kinase (MAPK) pathway [56] that exhibits multistability depending on the choice of model parameters, and Section 1 in S1 Text uses this model to illustrate the necessity of structural identifiability and global sensitivity analyses. Finally, Section 2.3 analyzes a simplified model of synaptic plasticity [57] that illustrates the effects of higher parameter uncertainty even in phenomenological biological representations.

Measurement noise and sparsity increase estimation uncertainty in a simple model of signal transduction
The first model we consider is a relatively simple two-state model from [55] shown in Fig 3A. The governing equations for this model are where the states are x(t) = [x 1 (t), x 2 (t)] > , the biological model parameters are θ f ¼ ½k 1e ; k 12 ; k 21 ; b� > , and u(t) is the input function. The input function (illustrated in Fig 3A) uðtÞ ensures that all four biological model parameters are structurally identifiable because the input function has at least one nonzero derivative [55]. S1 Table lists the nominal parameter values and the initial condition was x 0 = [0.5, 0.5] > . Sensitivity analysis was not performed on this model because the state variables are linearly dependent on the parameters. Synthetic data with full state measurements (see Section 1.12) was used to perform two parameter estimation experiments-one with increasingly noisy data and another with increasingly sparse data-to investigate how measurement noise and data sparsity affect A first hypothesis we tested was that noisy data increases uncertainty because measurement noise limits the ability to constrain the dynamics of the state variables. To test this hypothesis, we performed Bayesian parameter estimation from synthetic data with increasing measurement noise (circle marks in Fig 3C). We observed that the marginal posterior distributions in Fig 3B (kernel density estimator fit to the posterior samples as in Section 1.11) widen and flatten with increasing noise levels, indicating increased uncertainty. However, the most probable value for each parameter, the MAP point, lies close to the nominal parameter values (dashed lines in Fig 3B) for every noise level, suggesting that the data provide information about the parameter irrespective of the noise level. Additionally, Fig 3C shows that the width of the 95% credible interval for the dynamics of x 1 (t) grew as the noise increased from the lowest level (2.5%) to 5.0% and then remained similarly wide at the highest values (see S1 Fig for the respective trajectories of x 2 (t)). While the uncertainty bound did not widen above the 5.0% noise level, the shape of the trajectories began to shift further from the truth (dotted line in Fig  3C), indicating the estimates began to take on a bias. These experiments validated our hypothesis that even in a simple dynamical system measurement noise increases estimation uncertainty of kinetic parameters.
Next, we hypothesized that data sparsity (fewer data points) would increase the uncertainty in parameter estimates. To test this, we fixed the measurement noise to the 2.5% level (see Fig  3) and varied the number of measurements (e.g., the sampling rate) that were included in the data used for estimation. We tested three sparsity levels with 40 experimental samples (Δt = 0.05), 20 experimental samples (Δt = 0.1) and 10 experimental samples (Δt = 0.2) over the simulation time 0 � t � 2. Fig 3D highlights the widening of the estimated marginal posterior distributions for each model parameter as we decreased the number of data points (increased sparsity). Additionally, Fig 3E shows that the increased parameter estimation uncertainty translates to increased uncertainty (wider 95% credible interval) in the trajectory of x 1 (t) (see S1 Fig for the trajectories of x 2 (t)). In both of these experiments, the proposed uncertainty quantification framework qualitatively and quantitatively confirmed that increasing the noise or sparsity level increases estimation uncertainty.

Parameter estimation for a model of the MAPK cascade
We chose a simplified model of the highly conserved mitogen-activated protein kinase (MAPK; also known as the MEK/ERK cascade) signaling pathway [86] as a second test case for our parameter estimation framework. This pathway is known to exhibit bifurcations in its dynamical behavior [85,87]; the system can reach a stable steady state or exhibit limit cycle sensitivity for the MAPK model parameters. All parameters except the total concentrations, S 1t , S 2t and S 3t , exponents, n 1 and n 2 , and locally identifiable K 1 and K 2 , are varied uniformly over the identified ranges (see S2 Table). We use 5,000 and 15,000 samples for the bistable and limit cycle regimes, respectively. (C) Sensitivity indices for bistable behavior dynamics. We use the steady state value of x 2 (t) and x 3 (t) for both the high and low steady states as quantities of interest. By selecting the two most sensitive parameters for the four quantities of interest, we reduce the set of free parameters to θ f ¼ ½k 2 ; k 4 ; k 5 ; k 6 � > . (D) Sobol sensitivity indices for a set of free parameters that contribute to limit cycle behavior. We show the first-order sensitivity indices S i and the total-order indices S T i for the limit cycle amplitude and period of x 3 (t). We reduce the number of free parameters by selecting those with S i > 10 −3 across both output quantities, that is, θ f ¼ ½k 2 ; k 6 ; k 4 ; a� > . https://doi.org/10.1371/journal.pcbi.1010651.g004 oscillations. We focused on a phenomenological model of the MAPK pathway from [56] (see diagram in Fig 4A) that includes the mixed feedback (negative and positive feedback) necessary to predict the range of dynamical behavior observed in experiments. This model has three states, x 1 (t), x 2 (t), x 3 (t) that correspond to phosphorylated RAF, MEK, and MAPK/ERK, respectively [88,89] and 14 model parameters. The differential equations are with the biological model parameters θ f ¼ ½k 1 , k 2 , k 3 , k 4 , k 5 , k 6 , K 1 , K 2 S 1t , S 2t , S 3t , α, n 1 , n 2 ] > . A previous analysis of the model in [56] found that it can predict three regimes of dynamical behavior that depend on the model parameters; these regimes are limit cycle oscillations, bistability and mixed multistability. Here we focus on using CIUKF-MCMC to estimate model parameters that produce two of the three dynamical regimes-bistability and limit cycle oscillations (see Fig 4B for example trajectories). S2 and S3 Tables list the nominal parameter values and initial conditions (as defined in [56]) used to produce each of these dynamics. First, we performed identifiability and sensitivity analysis to find the subsets of relevant parameters to estimate each dynamical regime. The SIAN software [52] (see Section 1.3) showed that 12 of the 14 biological model parameters are structurally identifiable from measurements of all three state variables; however, K 1 and K 2 are only locally structurally identifiable. SIAN cannot assess parameters that appear in an exponent [36,52], so we fixed n 1 and n 2 to their nominal values listed in S2 Table. To avoid global versus local identifiability complications, we fixed K 1 and K 2 to their nominal values. Additionally, we omitted the total concentration parameters, S 1t , S 2t , and S 3t , from further analysis because we assume they would be specified according to the cell type that corresponds to the available data. As a result, we narrowed the free parameters down to a set of 9 parameters, from a set of 14 originally.
Next, we used global sensitivity analysis as described in Section 1.4 to further reduce the number of free parameters. We choose the quantities of interest for sensitivity analysis for the bistable and oscillatory regimes separately. The quantities of interest for the bistable regime are the steady state values of x 2 and x 3 (the values at t = 30 min). Those for the oscillatory regime are the limit cycle amplitude and limit cycle period (computed following Section 1.13). In the computations, we allowed the biological model parameters to vary uniformly over the ranges listed in S2 Table with 5,000 samples for the bistable cases and 15,000 samples for the limit cycle regime. Fig 4C and 4D show the computed Sobol sensitivity indices for the parameters ranked by decreasing sensitivity index. We selected the parameters with the greatest sensitivity indices, that is, the parameters to which the output quantities of interest are most sensitive. For the bistable case, we selected the four most sensitive parameters, which were k 2 , k 4 , k 5 , and k 6 . For the oscillatory case, we selected the parameters with a first-order sensitivity index S i greater than 10 −3 , which were k 2 , k 4 , k 6 , and α. All remaining biological model parameters were fixed to the nominal values listed in S2 Table. Sensitivity analysis highlighted that k 2 , k 4 , and k 6 are important in predicting both dynamical regimes. Meanwhile, parameters such as k 5 and α are only important for the bistable and oscillatory regimes, respectively.
After identifiability and sensitivity analyses, we applied the CIUKF-MCMC method to estimate model parameters that predict the correct steady state in the bistable case. To simulate noisy experimental data, we generated two synthetic datasets (see Section 1.12), sampled from the high steady state and the low steady state (circle and square markers in Fig 4B). Each dataset had 30 full-state measurements evenly spaced over 0 � t � 30 (min) with measurement noise covariances set to 2.5% of the variances of the true trajectories. The prior assumptions were specified according to Section 1.8 for all model parameters with the bounds listed in S2 Table. Using the CIUKF-MCMC algorithm with AIES (Section 1.9.3), we ran an ensemble of 150 Markov chains with 3,500 samples per chain (525,000 total samples) for each of the datasets (as shown in S10 Fig). The maximum integrated autocorrelation times (Section 1.9.5) were 120.06 for the low steady state and 169.03 for the high steady state, leading us to discard 840 samples for the low steady state and 1,183 samples for the high steady state as burn-in.
The estimated marginal posterior distributions (solid black line; Fig 5A and 5B for the low and high steady states, respectively) indicated varying levels of uncertainty between the model parameters and across the two steady states. For example, in both the low and high steady states, the marginal posterior for k 2 has most of its probability mass centered around the nominal value (dashed black lines), while that for k 6 has probability mass spread over a broader range of the prior support (range of the prior bounds). Additionally, the MAP points (dotted blue line) for the low steady state closely correspond to the nominal values for all model parameters, whereas there is a significant discrepancy between the MAP and the nominal values for k 4 , k 5 and k 6 in the high steady state case.
An ensemble of 30,000 simulations with randomly selected posterior samples (see Section 1.10) represented the posterior distributions of the dynamics for both steady states. For the low steady state, the trajectory evaluated at the MAP point (dotted blue line in Fig 5C) closely matches the true trajectory (dashed black line) and the 95% credible interval (green shaded region) tightly constrains these trajectories. However, the trajectory at the MAP point for the high steady state (Fig 5D) reaches the low steady state rather than the high steady state (solid green line). Furthermore, the 95% credible interval for the high steady state closely follows the initial transient (0 � t � 10 (min)), but it covers both steady states by the end of the simulation, e.g., for 10 � t (min). The considerable uncertainty and bias in the estimated dynamics of the high steady state are unsurprising, given the uncertainty observed in the marginal posterior distributions. This comprehensive uncertainty analysis of the bistable MAPK dynamics showed that the presence of multiple steady states makes parameter estimation harder for the same set of parameters and governing equations. In particular, the estimation uncertainty is much lower when data from the low steady state is supplied for estimation than when data from the high steady state is used.
Next, we used CIUKF-MCMC to estimate posterior distributions for the reduced set of model parameters that predict limit cycle oscillations in x 3 (t). Synthetic data with 15 samples evenly spaced over 30 � t � 60 (min) simulated noisy measurements from the oscillating trajectory at a noise level of 1% of the variance of the true trajectory (green points in Fig 6B). We refer to this data as the oscillations only data, because it only includes samples from the limit cycle and not from the initial transient. Using CIUKF-MCMC with AIES, we ran 30 Markov chains (shown in S11 Fig) with 51,000 steps per chain to sample the parameter posterior distributions. We discarded 7,477 samples per chain, seven times the integrated autocorrelation time of 1,068, to account for burn-in. The marginal posterior distributions estimated from the remaining posterior samples (Fig 6A) tightly concentrate the probability around the nominal parameter values. This high level of certainty in the parameter values leads to a posterior distribution of x 3 (t) (represented with an ensemble of 30,000 simulations in Fig 6B) that closely follows the true limit cycle oscillations (see S4 Fig for those of x 1 (t) and x 2 (t)). Additionally, the trajectory evaluated at the MAP point (dotted blue line) follows the true trajectory (dashed black line) and shows very similar oscillations.
Closer examination of a subset of 50 out of the 30,000 posterior trajectories (green lines in Fig 6C) revealed that the 95% credible interval includes many limit cycles that are similar to the true trajectory (dashed black lines), a smaller number of limit cycles that do not match the true oscillations, and yet an even smaller number of trajectories that reach fixed points (for example, the dotted blue line in Fig 6C). We further leveraged the posterior samples to quantify the variability in the predicted limit cycles. First, we characterized all of the trajectories in the ensemble (Fig 6D) and found that 90.6% (27,182 samples) correctly produce limit cycle oscillations while 9.6% (2,818 samples) reach a fixed point. Quantification of the limit cycle amplitude and period for each of the 27,182 oscillating trajectories showed that despite the small uncertainties in model parameters, we still observe variability in the characteristics of predicted limit cycles. We find that the most probable limit cycle amplitude and period (see histograms in Fig 6E) are close to the true values (26.12 for the amplitude and 13.04 for the period; indicated by vertical dashed black lines). However, we still observe variability in these quantities despite the high level of certainty in the parameter estimates. Specifically, we find that the limit cycle amplitude often deviates by up to 30% (10 nM) from the true amplitude. While the period deviates by a smaller amount, up to 15% (2 min), we find that its distribution is skewed to periods that are shorter than the true period. Based on these findings, we conclude that small uncertainties in the model parameters result in limit cycles with varied amplitudes and periods, but do not strongly affect our ability to predict oscillatory dynamics.
Next, we investigated whether the nature of the data supplied for estimation affects our ability to predict limit cycle oscillations. To investigate this, we repeated parameter estimation . The 95% credible interval shows the region between the 2.5th and 97.5th percentiles that contains 95% of 30,000 posterior trajectories. (C) Sample posterior trajectories (50 out of 30,000 total) reveal that most trajectories closely match the true limit cycles. Additionally, several trajectories that reach a fixed point are shown. (D) Quantification of the fraction of the 30,000 sample trajectories that produce limit cycles oscillations, 90.6% (27,182 samples), or reach a fixed point, 9.4% (2,818 samples). (E) Histograms quantify the variability in limit cycle amplitude and period for the 27,182 trajectories that show limit cycle oscillations. We define the limit cycle amplitude as the peak-to-peak difference for one oscillation, and the period is the time to complete an oscillation. The vertical black lines show these quantities for the true trajectory.
https://doi.org/10.1371/journal.pcbi.1010651.g006 with two additional noisy limit cycle data sets (shown in Fig 7A with the noise level set to 1% of the variance of the true trajectory). We refer to these new data as the equidistant sampling data and the non-equidistant sampling data. The equidistant sampling data includes 20 samples taken every two minutes over 0 < t � 60 minutes and covers the initial transient and several periods of oscillations (see Fig 7A). The non-equidistant sampling data also covers the first 60 minutes, however it only has five samples taken every five minutes for the initial transient (the first 30 minutes) and the same 15 samples taken every two minutes for the remaining 30 minutes (20 samples in total). We compared this to the results for the oscillations only data (Fig 6) to investigate how samples from the initial transient affects estimation. We performed MCMC with 150 walkers for 22,500 steps for the equidistant sampling data (S12 Fig) and 30 AIES walkers for 36,500 steps for the non-equidistant sampling data (S13 Fig). To account for the MCMC burn in, we discarded the first 5,669 samples per chain for the equidistant data and 11,065 samples for the non-equidistant data. Additionally, we simulated ensembles of 30,000 trajectories with random posterior samples for the two new data sets; S5  Comparison of the fraction of the 30,000 simulated trajectories that show oscillations clearly indicates that the input data affects our ability to predict limit cycles (see Fig 7B). As a baseline, we compare the three data sets to simulations with samples from the prior distribution, which we think of as the setting where no data are provided. Fig 7B shows that all cases with data predict a higher faction of limit cycle trajectories than the prior samples, where only 17.6% of simulations oscillated. We found that only 45.9% (13,794 samples) of the simulations corresponding to the equidistant sampling data show sustained oscillations. However as the number of samples from the initial transient (data from 0 to 30 minutes) is decreased, the fraction of oscillating simulations increased; 63.1% (18,921 samples) and 90.6% (27,182 samples) of the simulations corresponding to the non-equidistant sampling and the oscillation only data, respectively, showed limit cycle oscillations.
We additionally found that uncertainty in the nature of the dynamics is closely tied to the uncertainty in the model parameters. Fig 7C, 7D, and 7E show the marginal posterior distributions corresponding to the equidistant sampling, non-equidistant sampling, and oscillations only data, respectively. Specifically, we observed that we can always predict k 2 with high certainty, but the amount of uncertainty in k 4 , k 6 and α varies between the three data sets. For example, for the equidistant sampling data, the marginal distribution for k 4 (Fig 7C) has three distinct peaks of high probability (modes); one concentrated at very small values, another centered near the nominal value and a wider mode at higher values. The k 4 marginal distribution for the non-equidistant sampling data in Fig 7D places less probability at the two modes far from the nominal value, but the mode near the nominal value is shifted towards higher values of k 4 . Lastly, for the oscillations only data, we see only one prominent mode for k 4 that is very close to the nominal value (Fig 7F). Similar trends are seen for the α marginal distributions. The posterior distribution of dynamics in S5 Fig and the distributions of the limit cycle characteristics in S6 Fig show that the increased uncertainty in these model parameters increases variability in the predicted dynamics. In cases with high uncertainty in the model parameters, the 95% credible intervals for x 3 (t), for example in S5 Fig, do not tightly constrain the dynamics, and the limit cycle characteristics vary more ( S6 Fig). Lastly, we used the results of our Bayesian analysis to discover relationships between parameter pairs and the nature of the corresponding predicted dynamics. Based on observations that increased uncertainty in both k 4 and α results in higher fractions of incorrectly predicted fixed point trajectories and prior knowledge of how the model parameters interact to control the interplay of positive and negative feedback [56], we hypothesized that we might observe correlations between pairs of model parameters and the predicted dynamics. To test this, we plotted Distributions are visualized by fitting a kernel density estimator to 2,524,800 samples for the equidistant sampling data, 763,080 for the nonequidistant sampling data, and 1,305,720 for the oscillations only data. (F-H) Two-dimensional scatter plots reveal relationships between k 4 and α that are necessary to produce limit oscillations. Simulations with blue points produce limit cycle oscillations, and those with red points produce fixed points. Darker regions indicate a higher probability of observing the corresponding parameter values. two-dimensional scatter plots of all pairs of parameters for each data set in S7-S9 Figs, and for k 4 and α in Fig 7F, 7G, and 7H. The points are colored blue if the corresponding trajectory produced a limit cycle and red if the trajectory reached a fixed point. We observed the strongest correlation between k 4 and α, where, for the equidistant sampling data, (see Fig 7F), the values of both parameters needed to be above the smallest values, but not too large, to produce oscillations. The plots for the non-equidistant sampling data ( Fig 7G) and oscillations only data (Fig 7H) further highlight the relationship between k 4 and α. Based on these results, we conclude that k 4 and α are correlated such that their values must be within specific ranges to produce limit cycles. We note that we were able to discover this correlation through a Bayesian analysis; while, it is visually most apparent for the equidistant sampling data due to the high estimation uncertainty, it is a property of the model itself and not of the data.
In summary, comprehensive UQ for the MAPK model highlighted how the existence of multistability introduces additional uncertainties into parameter estimation. Specifically, sensitivity analysis identified two parameters of the MAPK model, k 5 and α, that were only important for the bistable dynamics and the oscillatory dynamics, respectively. In both cases, Bayesian parameter estimation was able to predict the correct type of dynamics but showed remaining uncertainty in the specific characteristics of the dynamics. By varying the data supplied for estimation, we found that the nature of the data set provided to CIUKF-MCMC affects our ability to predict limit cycles. Interestingly, we found that more data does not always improve estimation and that having less data that is more focused on the limit cycles reduced estimation uncertainties and gave better posterior parameter distributions. Additionally, we discovered that the values of both k 4 and α need to be within a certain range to predict oscillatory MAPK dynamics. Overall, the proposed framework for comprehensive UQ found parameters that were important to each dynamical regime and directly quantified how uncertainties in these parameters contributes to uncertainty in the dynamics.

Parameter estimation in a phenomenological model of long-term potentiation/depression
A phenomenological model of coupled kinase and phosphatase switches whose activities affect the level of membrane-bound AMPAR (alpha-amino-3-hydroxy-5-methyl-4-isoxazolepropionic acid receptor) as a reporter of synaptic plasticity was proposed in [57] to capture the key events in synaptic plasticity. This kinase-phosphatase model has three states pK(t), P(t), and A(t) that correspond to active forms of kinase (CaMKII in [57]), phosphatase (PP2A in [57]), and membrane-bound AMPAR, respectively. The differential equations are dAðtÞ dt ¼ ðc 1 � pKðtÞ þ c 3 Þ � ðA tot À AðtÞÞ À ðc 2 � PðtÞ þ c 4 Þ � AðtÞ; where the 24 biological model parameters are θ f ¼ ½k 1 , k 2 , k 3 , k 4 , k 5 , k 6 , k 7 , k 8 , K m 1 , K m 2 , K m 3 , K m 4 , K m 5 , K 0 , P 0 , K tot , P tot , A tot , c 1 , c 2 , c 3 , c 4 ] > . The nominal values and physiological ranges for these parameters are listed in S4 Table. This model predicts tristability (three steady states) in the level of excitatory postsynaptic potential (EPSP) as a function of the calcium Ca 2+ (t) input. The normalized EPSP is the membrane-bound AMPAR A(t) level, normalized to the initial condition, e.g., normalized EPSP = A(t)/A(t = 0), as defined in [57]. Fig 8B shows simulations of the three expected responses with the nominal parameter values from [57]. The initial condition x 0 = [0.0228, 0.0017, 0.4294] > used for all simulations was determined by allowing the system to reach steady state with the baseline Ca 2+ (t) level of Ca 2+ (t) � 0.1 [μM]. The three steady states are the initial baseline, the higher long-term potentiation state (LTP; trajectory depicted in dashed black Fig 8B), and the lower long-term depression state (LTD; trajectory depicted in solid green Fig 8B). The LTP state is obtained by applying a constant stimulus of Ca 2+ (t) � 4.0 [μM] from 1 � t � 3 (sec), while the LTD state is reached by applying a constant stimulus of Ca 2+ (t) � 2.2 [μM] in the same time interval; Ca 2+ (t) is set to the baseline level before (t < 1 (sec)) and after (t > 3 (sec)) the stimulus is applied. We investigated how well the proposed uncertainty quantification framework could estimate the model parameters for LTP and LTD from synthetic data of an LTP-inducing calcium input.
Following the proposed framework, the parameter space is reduced by performing identifiability and sensitivity analysis. First, identifiability analysis showed that all model parameters are globally structurally identifiable from full-state measurements. Next, global sensitivity analysis of the steady states in response to an LTP-inducing and an LTD-inducing input ranked the 22 globally identifiable model parameters. All free parameters were uniformly varied over two orders of magnitude centered around the nominal values, y � i , for each parameter, e.g. y i � Uð0:1 � y � i ; 10 � y � i Þ. In order to maintain conservation of mass, the lower bounds of the total concentration parameters, K tot , P tot , A tot , were chosen to be greater than or equal to the initial condition. Fig 8C shows the computed Sobol sensitivity indices for the LTP-inducing input and Fig 8D shows those for the LTD-inducing input. The sensitivity analyses point to the same group of seven model parameters for both the LTP-inducing and LTD-inducing inputs. These are θ f ¼ ½k 2 , k 6 , K 0 , P 0 , K tot , P tot , A tot ] > , whose first-order indices were greater than 0.05, e.g., S i > 0.05. We chose to estimate these seven parameters and fix the remaining model parameters to the nominal values listed in S4 Table. Next, we used the CIUKF-MCMC algorithm with AIES to estimate the posterior distribution for the reduced set of parameters. The parameters were estimated from noisy synthetic data with 36 full-state measurements of the LTP response. The data are spread uniformly over the domain 0 � t � 9 (sec) at a noise level of 1% of the variance of the true trajectory for the respective states. The maximum integrated autocorrelation time of the ensemble of 150 Markov chains with 8,000 steps per chain was 621.58, leading us to discard 4,351 samples as burnin. Traces of the ensemble of Markov chains for all parameters are shown in S14 Fig. Fig 9A shows the estimated marginal posterior distributions for the free model parameters, k 2 , k 6 , P 0 , K 0 , P tot , K tot , A tot . We observed different levels of uncertainty across the estimated parameters. For instance, the marginal posterior for A tot indicated a very high level of certainty with almost all of the probability mass, and thus the MAP point, aligned to the nominal value. However, the marginal posterior distributions for P 0 and K 0 show much more significant uncertainty because we observe posterior probability spread over the entire support of the prior. Additionally, the marginal posterior distributions for k 2 , k 6 , P tot , and K tot show a large mode around the MAP point that is shifted from the nominal value and a smaller mode at the nominal value. ] from 2 � t � 3 (sec)) and long-term depression (LTD; pulse of Ca 2+ (t) � 2.2 [μM] from 2 � t � 3 (sec)) inducing calcium inputs. The calcium level is set to a baseline of Ca 2+ (t) � 0.1 [μM] before and after stimulus. We compute normalized EPSP by normalizing A(t) to its initial condition as described in [57]. The synthetic noisy data for the LTP and LTD cases are indicated by the black square and green circle marks, respectively, with the noise covariance equal to 1% of the variance of the data. (C and D) Sobol sensitivity indices for all free model parameters in response to LTP-inducing and LTD-inducing inputs, respectively. The quantities of interest are the steady state values of each state variable. We show both the first-order sensitivity indices S i and the totalorder indices S T i . We select a reduced set of free parameters by choosing the parameters whose first-order sensitivity index is greater than 0.05, e.g., S i > 0.05. This gives us the same set of free parameters, θ f ¼ ½k 2 , k 6 , K 0 , P 0 , K tot , P tot , A tot ] > , for both the LTP and LTD cases. Remaining model parameters are fixed to the nominal values in S4 Table. https://doi.org/10.1371/journal.pcbi.1010651.g008

PLOS COMPUTATIONAL BIOLOGY
Using the posterior samples, an ensemble of 30,000 simulations (see Section 1.10) represented the posterior distribution of the predicted dynamics in response to an LTP-inducing input, as shown in Fig 9B. We observed that the 95% credible interval for the normalized EPSP covered both LTP (normalized EPSP >1) and LTD (normalized EPSP <1) responses even though the input was LTP-inducing. However, the trajectory evaluated at the MAP point (dotted blue line) matched the true trajectory (dashed black line), indicating that most trajectories align with the expected LTP response. Examination of the individual trajectories within the ensemble simulation (Fig 9B, top row) and the normalized EPSP steady state values (Fig 9C, top row) confirmed that there are both LTP (blue traces) and LTD (black traces) responses to the LTP-inducing input. Specifically, we found that 76.59% (22,972 samples) of the responses correctly predict LTP, and 23.41% (7,024 samples) of response incorrectly predict LTD (Fig 9D). Therefore, despite the high-quality data (many measurements and low noise), we still observed substantial uncertainty in the predicted normalized EPSP.
Lastly, we investigated if the posterior distribution estimated in the LTP regime can predict the response to an LTD-inducing input. An initial hypothesis was that in response to an LTDinducing input, most trajectories would predict LTD, but a significant subset would predict the incorrect response, LTP. An additional ensemble of 30,000 simulations, with an LTDinducing input of Ca 2+ (t) � 2.2 [μM] (pulse from 2 � t � 3 (sec)) was used to determine the posterior distribution of the dynamics. Visualization of 100 of the 30,000 trajectories in Fig 9E  (bottom row) again showed both LTP (blue) and LTD (black) responses; however, we unexpectedly observed more LTP than LTD in response to the LTD-inducing input. The distribution of normalized EPSP responses (bottom row of Fig 9D) confirms that there are a large number of responses in the LTP regime with a minor mode around the correct response. Quantification of these results highlights that only 31.07% (9,320 samples) of responses are in the LTD state (expected response) while 68.93% (20,680 samples) of the responses are in the LTP state (unexpected response). In summary, this model formulation can correctly capture the same LTP behavior over a range of model parameters while losing the ability to predict LTD behavior.
Despite the LTP and LTD responses being sensitive to the same set of parameters (see Fig  8C and 8D), the posterior distribution estimated from measurements of LTP places more probability on parameter sets that robustly predict LTP over those that correctly predict both responses. From these results, we conclude that for this model, we need to learn the model parameters with a high degree of certainty in order to disambiguate the LTP versus the LTD response because sensitivity analyses revealed that the same set of parameters governs these two different outputs. This finding highlights that sensitivity analyses are not sufficient to distinguish parameter uncertainty for systems with multistability and a comprehensive framework as outlined here is necessary to shed light on such model complexities.

Discussion
In this work, we developed a framework (see Section 1.1) for comprehensive uncertainty quantification of dynamical models in systems biology. The proposed framework leverages identifiability and sensitivity analysis to reduce the parameter space (Sections 1.3 and 1.4) followed by Bayesian parameter estimation with CIUKF-MCMC (Section 1.6). We applied this framework to three systems biology models to demonstrate its applicability and highlight how a focus on uncertainty can transform modeling-based studies. First, we performed two computational experiments on a simple two-state model that showed how noise and data sparsity contribute to estimation uncertainty. Next, we applied our framework to two models, the MAPK and the synaptic plasticity models, which better resemble the models used to capture biological readouts. Using these models, we highlight how comprehensive uncertainty quantification enables quantitative analysis of two biologically relevant dynamical behaviors, limit cycles and steady state responses. We also found that good quality data cannot always overcome uncertainty due to the model structure. These examples provide an essential starting point for applying our framework in practice and interpreting systems biology studies under uncertainty.
Our results highlight how a focus on uncertainty quantification can give new insights in modeling-based studies. For example, in Section 2.2 we were able to learn a posterior distribution for the parameters that predict limit cycles with high probability. The posterior distribution is our best guess for the distribution of the model parameters after incorporating the available data into the statistical model. Therefore, the posterior distribution for the dynamics (approximated via an ensemble simulation as in Section 2.10) provides our best guess for the dynamics, given everything we know about the model and the data. For the MAPK limit cycle oscillations, this best guess includes dynamics with a range of limit cycle properties. Additional analysis of the parameters that produce limit cycle oscillations revealed correlations between k 4 and α and that their values must both be within a certain range to produce the expected dynamics. Therefore, incorporating uncertainty quantification into modeling provides additional context and understanding of the system of interest.
We also observe that predictions do not always capture the correct bistability in the example of high MAPK steady state (Section 2.2) and the LTP/LTD response (Section 2.3). In both cases, the 95% credible intervals of the ensemble of predicted dynamics cover both the higher and lower steady states (LTP and LTD in the synaptic plasticity example). These results imply that the estimated posterior distributions for these models include parameter sets that no longer show bistability at the specified input (the initial condition for the MAPK model or the calcium level for the synaptic plasticity model). Further analysis of these systems could test if these parameter sets lead to a complete loss of bistability or merely shift the bifurcation point, the value of the input that changes which steady state is reached. Overall our results point to a complex interplay between model parameters and inputs that potentially confounds parameter estimation of multistable systems.
As highlighted in Sections 2.1 and 2.2, we showed that estimation uncertainty is a result of the quality, quantity, and nature of the available experimental data. As expected, we demonstrated, for the simple two state model (Fig 3,) that data with more noise and fewer samples leads to estimates with wider uncertainty intervals. Furthermore, for the MAPK oscillations (Fig 7), we found that the sampling strategy, either including data from the initial transient or focusing on sampling from the oscillations, drastically affects estimation uncertainty. This finding contradicts the common intuition that more data is always better, as we had the smallest estimation uncertainty in predictions made using data that had the fewest samples, the oscillations only data. We suspect that these observations can be understood by considering that the likelihood function measures the mismatch between the data and estimated trajectory with an l 2 -like norm, for example see Eq (12). When we include data from both the initial transient and oscillations, as in the equidistant and non-equidistant sampling cases in Section 2.2, the likelihood probability is similar for all trajectories that capture the initial transient, regardless of whether they oscillate or reach a fixed point. However, the oscillations only data focuses the likelihood on the long-time dynamics and places lower probability on trajectories that reach a fixed point. While l 2 -norm-based likelihood functions are often used in Bayesian methods, one can also design a likelihood function to explicitly determine if a trajectory displays the same features of the data, such as the feature-based approach in [90]. These findings imply that thoughtful experimental design can reduce unnecessary and potentially costly data collection and can increase certainty in parameter estimates and model predictions.
In the proposed framework, we stress that global identifiability and sensitivity analyses are essential to practical parameter estimation and uncertainty quantification. By running variations of the proposed framework without these analyses in Section 1 in S1 Text, we found that each analysis reduces estimation uncertainty and leads to an easier estimation problem. In the complete framework, we isolate globally identifiable parameters and choose a cutoff value of the Sobol sensitivity index (see Section 1.4) to select the influential parameter subset. Throughout this work, we choose a logical sensitivity index threshold that corresponds either to a sharp drop in or a logical value of the sensitivity index, such as the S i � 0.1 criterion that we apply in several of the example problems. However, we caution that selecting the subset of influential parameters is an intricate problem that requires users to decide the acceptable amount of information to discard from the uncertainty analysis [91,92]. Alternative methods to the thresholding approach include those discussed in [91], group sensitivity analysis [37] or lasso regression [93]. While these methods aim to select a subset of influential parameters, they are similar to the thresholding approach in that each method hinges upon determining the acceptable number of degrees of freedom to exclude from the analysis. Future research should develop methods for fully quantitative parameter subset selection methods.
While the computations remained tractable for the three example models in this paper, it is essential to consider how complete uncertainty quantification can increase the computational costs of parameter estimation. In [26], the authors showed that the original UKF-MCMC algorithm required more floating point operations than methods that do not account for all sources of uncertainty. Similarly, we found (see Section 2 in S1 Text) that a single CIUKF likelihood evaluation takes 81 times longer than an evaluation of a similar likelihood function that does not account for uncertainty in the model form. A deeper look into the execution of the CIUKF likelihood function revealed that approximately 60% of the runtime was spent on calls to ode15s(), which we use to discretize the differential equation model. The added computational cost of CIUKF may be reduced in the future by developing optimized codes for discretizing the differential equation model and evaluating the likelihood function.
Bayesian methods, such as CIUKF-MCMC, enable complete uncertainty quantification; however, the need for MCMC sampling and the associated repeated likelihood evaluations can pose an increased computational expense. In particular, as models include greater numbers of state variables and parameters, the added expense of simulation, the increased dimensionality, and the potentially complex geometry of the posterior distributions can challenge standard MCMC approaches. Innovative methods from the broader UQ community, combined with the increasing accessibility of high-performance computing resources, enable Bayesian analysis of models with large numbers of state variables and parameters [19,94,95]. Additionally, while we showed that identifiability and sensitivity analyses could significantly reduce the dimensionality of the parameter space, this reduced space may remain relatively high dimensional for specific applications. Many MCMC sampling algorithms that incorporate information about the geometry of the distribution or run parallel Markov chains can sample highdimensional and possibly complex posterior distributions. These methods include parallel MCMC samplers [96,97], Hamiltonian Monte Carlo [27] and dimension independent, likelihood-informed methods [98]. Although comprehensive uncertainty quantification comes with added computational costs, recent innovations in UQ methods can alleviate some of the added computational burdens of comprehensive analysis.
Throughout this work, we make several assumptions regarding the availability of the model equations and the statistical properties of associated errors. First, we assume that the model equations are known prior to parameter estimation. This assumption reflects standard modeling practices where models use biochemical theories that assume equations for the kinetics of biochemical reactions. In using the CIUKF-MCMC we somewhat weaken this assumption because it introduces process noise to account for uncertainty in the model form [26]. In reality, all models have some level of uncertainty because they rely on assumptions about the system. Therefore, accounting for model form uncertainty regularizes the dynamics to account for a mismatch between the predicted dynamics and the data [26]. However, in cases where the model is not known a priori, it may be necessary to estimate the model structure and parameters from the data simultaneously. One approach to disambiguate a model structure is to learn the biochemical reaction network [99,100] or the mathematical model directly from data [101][102][103][104][105]. Additionally, it is possible to cast these problems in the Bayesian perspective to learn the model form and the associated uncertainties [26,106].
Furthermore, we make several assumptions when choosing statistical models for measurement and process noise and simulating biological measurement data. First, we assume Gaussian measurement and process noises in applying the CIUKF-MCMC method. Standard Kalman filtering approaches revolve around assumptions of normally distributed noise processes [67]. However, we may be better able to describe noise in biological systems and experiments with alternate probability distributions (see, e.g., [46]). Assuming a non-Gaussian distribution for the measurement noise would require reformulating the likelihood function (it would alter the distribution in Eq (12)); however, incorporating alternative distributions for the process noise would require approximations for the marginal likelihood that go beyond Kalman filtering. Next, this work considers linear measurement functions, e.g., Eq (3), but CIUKF-MCMC is also well-suited to handle nonlinear measurement functions [26]. Lastly, we assume that we only have access to a single time series of measurements, e.g., one trial of an experiment; however, most experiments in biology perform several repeated trials. A straightforward approach to incorporate all available data to inform the statistical model would use the mean of each time point and estimate parameters from the time series of means. Additionally, one could estimate parameters from each time series separately and then analyze several posterior distributions that one could merge via meta-analysis or information fusion principles [107][108][109], or construct a statistical model that accounts for the multiple time series [27] simultaneously.
The framework introduced in this work can be applied and extended to enable comprehensive uncertainty quantification for most dynamical models encountered in systems biology. Future research should focus on rigorous methods for parameter subset selection, applying MCMC samplers that are well suited for complex, high-dimensional distributions to systems biology, and incorporating more specific statistical models of biological data and noise. In summary, research at the intersection of uncertainty quantification and systems biology modeling will strengthen parameter estimation and enable models that more accurately predict the dynamics observed in experimental measurements.  Fig 3C. We observe that increasing the data noise level increased the uncertainty in the predicted dynamics. We control the noise level by setting the noise covariances to the specified percentage of the standard deviation of each state variable. The dashed black vertical lines indicate each parameter's nominal (true) value. (B) corresponds to the data sparsity experiment, Fig 3E. We observe that decreasing the number of experimental samples supplied for estimation increased estimation uncertainty. The noise level in the data was fixed to the 2.5% level shown above. (TIF)   Table. Long-term potentiation/depression model parameters from [57] and ranges. The ranges are given by ½0:1 � y � i ; 10 � y � I�, where y � i is the nominal value. Note: we do not include ranges for n 1 and n 2 because these parameters are always set to the nominal values.