Efficient Characterization of Parametric Uncertainty of Complex (Bio)chemical Networks

Parametric uncertainty is a particularly challenging and relevant aspect of systems analysis in domains such as systems biology where, both for inference and for assessing prediction uncertainties, it is essential to characterize the system behavior globally in the parameter space. However, current methods based on local approximations or on Monte-Carlo sampling cope only insufficiently with high-dimensional parameter spaces associated with complex network models. Here, we propose an alternative deterministic methodology that relies on sparse polynomial approximations. We propose a deterministic computational interpolation scheme which identifies most significant expansion coefficients adaptively. We present its performance in kinetic model equations from computational systems biology with several hundred parameters and state variables, leading to numerical approximations of the parametric solution on the entire parameter space. The scheme is based on adaptive Smolyak interpolation of the parametric solution at judiciously and adaptively chosen points in parameter space. As Monte-Carlo sampling, it is “non-intrusive” and well-suited for massively parallel implementation, but affords higher convergence rates. This opens up new avenues for large-scale dynamic network analysis by enabling scaling for many applications, including parameter estimation, uncertainty quantification, and systems design.


Materials and Methods
For the approximation of the forward and inverse problem based on parametric, nonlinear deterministic systems of ODEs arising in the analysis of cellular networks, we use an adaptive Smolyak-type algorithm described in section 2. The algorithm successively identifies parameters with a large impact on quantities of interest, which leads to an adaptive refinement in the parameter space exploiting the sparsity and nonisotropic behavior of the underlying model. This strategy yields convergence rates independent of the number of unknown parameters and superior to MC based sampling strategies. The nonintrusive nature of the proposed algorithm allows to use available forward solvers and is well-suited for a parallel implementation.
All simulations in this work were performed on the BRUTUS and SC02 computing cluster at ETH Zurich (using up to 128 cores for a parallel simulation).

Theory and Proposed Method -Adaptive Smolyak Approach
We briefly review theoretical underpinnings of the proposed numerical method, in particular we make precise the notion of sparsity of the parameter dependence of the solution x(t) on the vector p of parameters. To this end, in this section we make explicit the dependence of x(t) on the parameter sequence p and write x(t; p). Since we are particularly interested in numerical methods whose performance does not deteriorate for a large number n p of parameters, we derived in [1] bounds on the approximability of the parametric solution x(t; p) in the case when n p = ∞, i.e. when p = (p j ) j≥1 denotes a parameter sequence. The local Lipschitz conditions for v(x(t), u(t), p) are satisfied, in particular, for v(x(t), u(t), p) = diag (p) ρ (x(t)) + Ou(t) which case occurs in mass-action kinetic models. In this case, the dependence of v(x(t), u(t), p) on p is affine, i.e. the right hand side is of the form f (x(t), u(t), p) = φ 0 (x(t), u(t)) + j≥1 p j φ j (x(t), u(t)) (1) and the mathematical results of [1] apply. Using that in (1), we may rescale the parameters and the φ j , we assume in the following wlog. that p j ∈ [−1, 1] for all j ∈ N.
The results of [1] imply, in particular, that under a smallness condition on the φ j (required to ensure convergence of the series in (1)), there exists a unique solution x(t; p) ( [1, Theorem 4]), the solution x(t; p) can be analytically continued as a function of the parameters p into the complex domain, and can be represented by so-called generalized polynomial chaos expansions with respect to multivariate tensorized, Legendreand Chebyshev polynomials of p which converge unconditionally on the possibly infinitedimensional parameter space U = [−1, 1] N . To state these results (which are formulated and valid even for problems with infinitely many parameters indexed by N, ie., for parameter sequences) we introduced in [1] the multi-index set As any ν ∈ F has only finitely many nonzero entries, the definitions for multi-factorials, the length of a multi-index ν and for the partial derivative of order ν are well-defined for ν ∈ F. At every p ∈ U ⊂ R N , the solution x(t) of (1) can be represented by the formal Taylor expansion where, for ν ∈ F : p ν = p ν1 1 p ν2 2 ... and The convergence of the formal Taylor expansion (3) is shown in [1] to be unconditional and pointwise in [0, T ]×U , i.e. we have pointwise convergence in U as a mapping taking values in the space C k+1 ([0, T ]; S), for any finite order k of differentiability with respect to the time variable t. Therefore, in particular, the convergence theory for any standard timestepping scheme for the approximate numerical solution of the parametric IVP (1) applies.
In our computational approach, we truncate the Taylor series (3) to a finite number of N terms, i.e. we consider with suitable index sets M N ⊂ F of at most N indices. Evidently, the question is what the best/ optimal index set M N is, second what the convergence rate for the error x(t) − x M N (t) is and, third, how such sets M N could be spotted computationally. It was shown in [1] that there exist so-called monotone sequences of sparse index sets M N ⊂ F (to which we will also refer as "sparsity models" for the parametric dependence of x(t)) which identify at most N "active" Taylor coefficients T ν (t), ν ∈ M N in (3), which can be constructed such that the corresponding, finitely truncated parametric expansions (4) realize a convergence rate with respect to the number N of terms which equals the so-called best N -term convergence rate. Second, once such truncations have been selected, it will be necessary to solve the initial value problems by ODE solvers for approximation of the expansion coefficients T ν (t) in (3).
A key role in our algorithms is played by index sets M ⊂ F that we refer to as lower or monotone index sets. This class of index sets was introduced in [2] in the context of adaptive, multivariate Taylor approximations of parametric elliptic partial differential equations. We now strengthen the N -term approximation properties of the Taylor series by constraining the admissible, nonempty index sets M ⊂ F.
The notion of lower index sets is based on the following semi-ordering of F: for any two indices µ, ν ∈ F, we say that µ ≤ ν if and only if µ j ≤ ν j for all j ≥ 1. We will also say that µ < ν if and only if µ ≤ ν for all j ∈ N and if µ j < ν j for at least one value of j.
A sequence (a ν ) ν∈F of nonnegative real numbers is said to be monotone decreasing if and only if for all µ, ν ∈ F µ ≤ ν ⇒ a ν ≤ a µ .
A nonempty set M ⊂ F is called monotone if and only if ν ∈ M and µ ≤ ν ⇒ µ ∈ M. We now present the sparsity result [1,Theorem 5]. We consider the parametric IVP ODE (1) for parameter vectors p ∈ U = [−1, 1] N . We assume that there exist real numbers R > 0 and 0 < κ < 1 with the following properties: 1. In (1) the vector field f depends on the parameter vector p in the affine fashion (1) with φ j satisfying for some sparsity parameter 0 < σ < 1 where the polyradius ρ is given by ρ j = max 1, δ 4Lj (R) for some arbitrary fixed δ > 0, and for L j (R) := φ j Lip0(S,R) .
2. The initial data x 0 ∈ C([0, T ] × U ; S) satisfies a smallness condition (see [1,orem 5] for details). sup Then the Taylor expansion (3) of the parametric solution x(t; p) of (1) is σ-sparse in the following sense: for every N ∈ N there exists a monotone sparsity model M N ⊂ F with #M N = N such that it holds, with rate r = 1/σ − 1, where the constant C > 0 is independent of n p and of N , and where with P M (U ; C k+1 ([0, T ]; S)) denoting the span of multivariate polynomials of p ∈ U with coefficients in the linear (Banach) space C k+1 ([0, T ]; S). The actual, computational approximation of the parametric solution x(t; p) family in the polynomial space P M (U ; C k+1 ([0, T ]; S)) can be constructed by sparse interpolation so that the (dimension-independent) convergence rate 1/σ−1 is realized by our proposed interpolation scheme. We now present some details of this scheme. Denoting by (z k ) k≥0 a sequence of univariate interpolation points and by (I k ) k≥0 the corresponding sequence of interpolation operators, the sparse interpolation operator is defined by with ∆ j = I j − I j−1 , j ≥ 0. Under the assumption (5) and (6), and if the univariate sequence (z k ) k≥0 is chosen such that the Lebesgue constant λ k ≤ (k+1) θ for some θ ≥ 1, there exists a constant C and a nested sequence of monotone index sets ( with r = 1 σ − 1, (independent of number N of interpolation points and of the number of parameters), see [3]. Analogous to [3][4][5], we propose a deterministic, adaptive algorithm, which iteratively determines a nested sequence (M N ) N ≥1 of monotone index sets to realize the rate (10).
The refinement strategy of the algorithm is based on a greedy-type approach aiming at localizing most profitable indices in a reduced neighborhood of the current index set defined by with j(M) = max{j : ν j > 0 for some ν ∈ M}, I ν = {j ∈ N : ν j = 0} ⊂ N. The estimated profit of an index ν ∈ N (M k ) in iteration k, denoted by g ν , is measured according to its expected contribution to the approximation in terms of g ν (M k ; Ξ) = max t∈Ξ |x(t)(z ν )−I M k x(t)(z ν )| with z ν denoting the corresponding interpolation point of the index ν and Ξ ⊂ [0, T ] is a suitable, finite subset, so that the dimension-adaptive index set is constructed recursively as follows.

4:
Determine the set of reduced neighbors N (M 1 ).

8:
Determine the set of reduced neighbors N (M k+1 ).

11:
end while 12: end function Details on the algorithm, which was originally proposed in the current version in [4], can also be found in [3,6] and in the references therein. In the present case, after rescaling, the parameter domain U = [−1, 1] np (cp. Eqn [4] of the main text).
Furthermore, the sparsity results presented allow to design efficient, deterministic algorithms for identification of parameters in the underlying system from noisy measurements. The problem of computational inference of responses of uncertain systems in the presence of noisy observational data can be solved using a Bayesian approach, suitably generalized to (infinite-dimensional) function space settings in [7,8]. In Bayesian prediction, the goal of computation is to evaluate numerically a mathematical expectation over all possible realizations of the unknown parameter sequence p conditional on given data D ∈ R K stemming from K observations where the uncertainty-to-observation map h : U → R K is given by h = O • G with O :→ R K denoting a bounded, linear observation operator and with G : U → C k+1 ([0, T ]; S)) denoting the solution map. We assume throughout what follows that the Bayesian prior measure on the uncertain parameters p is the uniform measure denoted by µ 0 (dp) and that the observational noise η ∈ R K is an additive Gaussian random variable with law N (0, Γ) with some positive covariance matrix Γ that is assumed to be known. The present analysis can be generalized to the case of nonuniform, separable priors with compactly supported densities. In [8], it is shown that assuming the boundedness and continuity of h(p), then µ D (dp), the distribution of p ∈ U given D, is absolutely continuous with respect to µ 0 (dp), i.e.
with the parametric Bayesian likelihood Θ given by with δ(p; D) := 1 2 |y − h(p)| 2 Γ and the normalization constant Note that this formulation is equivalent to the presentation in the main text, for example, with the normalization constant Z corresponding to the Bayesian evidence for the data.
In general, our aim is to compute the conditional expectation over all parameters p of a quantity of interest (QoI) Φ under the given noisy observational data and, in particular, we are interested in the case Φ = h(p), the states of the underlying system or higher moments of the states. To this end, Bayes' formula gives In [9], regularity and sparsity of the posterior density as a function of the parameter vector p ∈ U is analyzed. The results in [9], together with the sparsity analysis in [1], imply sparse approximations of the posterior with convergence rate r determined only by the model sparsity, but independent of the model size. These results are the basis for sparse, adaptive Smolyak quadrature algorithms to efficiently approximate the possibly infinite-dimensional integrals Z , Z defined in (13). Similar to the definition of the sparse interpolation operator in (9), we introduce the sparse quadrature operator for any finite monotone index set M ⊂ F by with the quadrature difference operators ∆ ν = j≥1 ∆ νj , ∆ j = Q j −Q j−1 and (Q k ) k≥0 sequence of univariate quadrature formulas, see [6,9] for details on the construction of the tensorized multivariate quadrature formulas. Then it can be proven, under appropriate assumptions on the univariate quadrature formulas and on the forward problem, i.e. under the sparsity assumption (5) and the smallness condition (6), that there exists a sequence of finite, monotone index sets and with r = 1 σ − 1. Analogously to the sparse interpolation, the construction of the monotone index sets (M N ) N ≥1 is based on the greedy-type strategy summarized in Algorithm 1.
Note that the convergence rates presented are given with respect to the cardinality of the index set M. Estimates of the work required for the evaluation of the adaptive Smolyak approximation rely on the specific choice of the univariate interpolation and quadrature formulas. In [3], details on univariate interpolation formulas leading to the same convergence rates with respect to the number of interpolation points can be found. The quadrature case requires a slight adaption of the rate by a factor of log 2 3, see [10, Proposition 1].

Models
To investigate the performance of the adaptive Smolyak approach, the proposed algorithm is applied to three models from the biochemical literature. The first model describes the uptake of glucose into cells of bakers' yeast S. cerevisiae, which is the first step of glycolysis. This model has 10 parameters that were assumed to be uncertain. Due to the moderate size, this model gives an opportunity to verify that our method is correctly implemented by comparison to Monte-Carlo simulations.
The second model describes the short-term response of the epidermal growth factor receptor (EGFR) pathway for stimulation with EGF. This model has 50 parameters, which is a typical number of parameters for models of cell signaling in the literature, but constitutes a substantial challenge for any known computational method for statistical inference due to the curse of dimensionality.
The third model describes the coupling of signaling pathways in mammalian cells. With 227 parameters, it is to our knowledge one of the largest dynamical models reported in the literature.

Model 1 -Glucose Transport in Yeast
The ODEs that describe the system dynamics are based on mass-action kinetics and take the form: where the nine state variables correspond to concentrations of the following chemical species: external glucose x e Glc , internal glucose x i Glc , internal G6P-bound carrier x i E−G6P , internal glucose and G6P-bound carrier x i E−Glc−G6P , internal G6P x i G6P , external glucose-bound carrier x e E−Glc , internal glucose-bound carrier x i E−Glc , external free carrier x e E , and internal free carrier x i E . The model has 10 rate parameters: We also need to specify at least three initial conditions for (external) glucose, G6P, and for carrier E, if we assume that the concentrations of the other chemical species are initially negligible.
This model was originally introduced in a reduced form in Rizzi et al. [11], and a derivation of the reduced model from Eq. (17) is presented in [12]. For this article, the rate parameters have been chosen as in [11] to reflect the assumptions used to derive the reduced model (e.g., the time scales of reactions resulting in quasi-steady state relations). The artificial observation data used in the numerical experiments is generated from a randomly chosen reference parameter (10 ±0.25 variations around the nominal point).
Observations are assumed to be available for the first state (external glucose) at time points 20.0, 30.0, 60.0 and for the first, second and fifth state at time point 30.0. The noise in the measurements is given by a normal distribution with variance 0.1 2 .

Model 2 -Epidermal growth factor receptor (EGFR) signaling
This model describes the short-term (up two minutes after stimulation) response of the epidermal growth factor receptor (EGFR) pathway upon stimulation with EGF [13].
The model was created to explain why the concentration of phosphorylated EGFR, phospholipase C-γ (PLC γ ), and the total concentration of Grb2-EGFR complexes, peak after about half a minute and then return to low levels within a few minutes upon EGF stimulation. Specifically, it was proposed that the peak in EGFR phosphorylation could be explained by EGFR being protected from phosphatases when bound to target molecules (Grb2, Shc, or PLC γ ), rather than by a rapid activation of (tyrosine) phosphatases. The model has 50 kinetic parameters, which were determined from previous experimental results in the literature. The one-dimensional sensitivity analysis in [13] demonstrated that most parameters could be individually varied without significant changes in the pathway response. However, changing all parameters simultaneously may result in a "marked change in the response dynamics", although it was also pointed out that a multiplication of all estimated parameters values by 2 only results in a scaling in time.
It was also noted that the model is sensitive to the relative concentrations of signaling proteins. By applying our method it is possible to investigate the model response to variations in any combination of parameter values around the estimated optimal parameter point. Within the inverse setting, we use artificial data, generated from a randomly chosen parameter value in the range of 10 ±0.1 and 10 ±0.25 around nominal (assuming a uniform prior distribution) and perturbed by normally distributed noise. For all numerical experiments presented below, the first state of the model is observed at discrete time points 20.0, 30.0, 60.0.

Model 3 -Coupled signaling pathways
This model describes the EGF (and heregulin) activated response in, and interaction of, a number of signaling pathways in mammalian (rat) cells. Components of the signaling network that can be simulated with this model are the mammalian ErbB signaling pathways and the MAPK and Akt cascades.
The model exists in two forms: with and without rules for time-dependent assignments of parameter values. We used the model without rules, downloaded from the supplementary material of [14], since only this model is directly compatible with our odeSD solver [15]. However, as pointed out by Chen et al. the difference between the models is small, although the model without rules cannot be simulated in the "pre-incubation" phase. We decided to keep the 500 initial conditions for the state variables of the model fixed to the values specified in the model, and to explore the impact of all the 227 model parameters in the dimension-adaptive algorithm.
Detailed computational results of 20 runs of the optimiziation algorithm by Chen et al. are provided in the supplementary material in [14], and 48 parameters and 9 initial conditions were calibrated to different values (in the A431 cell line). The remaining model parameters were either fixed to the values predefined in the model, or estimated to very similar values in all the runs of the optimization algorithm (numerically equivalent for the precision provided in the file: Parameters_A431_20models.mat in the supplementary material in [14]). Note that the model response variables should be sensitive to the parameters that are always estimated to similar values.
Our computations of the local sensitivities were made for the parameter point that is incorporated in the following model file (supplementary material of [14]): ErbB-Chen_et_al_2008-A431-norules.xml which was found in the folder:

Numerical Experiments
In the following section, we present numerical results of the proposed method for all three models introduced in Sec. S3. We compare the results and the performance of the adaptive Smolyak algorithm to state-of-the art methods (first-order sensitivity approach and MCMC methods) and discuss the improvements achieved as well as our method's limitations.

Comparison between the adaptive Smolyak interpolation and a firstorder sensitivity approach
In systems biology, common methods to analyze the system behavior with respect to changes in the parameter space are local methods based on first-order-sensitivity information. In the following, we will compare the proposed adaptive Smolyak interpolation with a first-order approach approximating the solution The sensitivity profile, which quantifies the maximum (absolute value of the) sensitivity over states and time with respect to the parameters, is shown in Fig. S1. The sensitivity index σ, defined by with n x number of states and n t number of discrete time points, equals 0.1866 for model 1. with n k = 2 · k + 1, for k ≥ 0.   We first notice that the Smolyak interpolation based on CC and RL points leads to errors in the range of 10 −5 , as predicted by the error estimator. The approximation error of the first-order approach is in the range of 10 −3 , which means a loss in accuracy of two orders of magnitude compared to the sparse grid interpolation. Increased accuracy can be crucial to suppress numerical errors in the analysis of system quantities and calibration to measurements. The adaptive sparse interpolation provides an efficient way to approximate the solution on the entire parameter space and thus, in contrast to local methods, allows to consider larger domains of parameter variations.

Bayesian inverse problem
We also estimated error curves for Bayesian inference with the Smolyak based method, which are shown for the normalization constant Z in Fig. S6, and for the quantity Z of the first moment of the six non-measurable state variables in Fig. S7, given observations of the first state variable (external glucose). The first and second moments of the six non-observable state variables over time are shown in Fig. S8 and Fig. S9, respectively.

Comparison between Monte Carlo and the adaptive Smolyak approach
To analyze the glucose transport model in section 3.1, we used a Metropolis Hastings Markov Chain Monte Carlo (MH-MCMC) algorithm [16], and the results are then compared to those of the Smolyak approach. Assume that we have the following (artificial) observations of the first state variable in the model x e Glc (external glucose) at time instants: x e Glc (t = 20) = 1.4695, x e Glc (t = 30) = 1.2386, x e Glc (t = 60) = 0.9625. We also assume that the standard deviation for the measurement noise is known and equal to 0.1 for all three time instants. We explore the parameter space in the interval [p 0 · 10 −0.25 , p 0 · 10 +0.25 ], where p 0 is the nominal point of model 1. The proposal distribution that we use for MH-MCMC is a uniform distribution on the interval of considered parameter values (±0.25, around the nominal parameter point in log-space) to compute estimates of the model entities. This makes a burn-in period (with discarded parameters) unnecessary, since the proposal distribution remains the same along the chain. However, we also tested the performance of proposal distributions with narrower distributions, for which the results were similar in terms of the convergence rate (data not shown). This strategy is of course only feasible in the case of artificial data, since in general, the unknown data is not at hand. Hence, the burn-in period is considerable and leads to a significant increase in the overall number of forward simulations needed to sample from the posterior.
To investigate the convergence rate of the MH-MCMC approach we generated a sequence of parameter points from the posterior distribution, and computed the posterior first moment of the state variables of interest based on all drawn parameter points. A sufficient number of parameter points was drawn (100.000 points) to ensure that the drawn parameter points could be used to compute the first moments to a sufficient numerical precision. We then simulated a second, independent sequence of parameter points from the posterior and estimated the distance to convergence for the points along this sequence, with the first chain as a reference. We used an empirical measure E k for convergence, after k samples have been drawn from the prior, which is given by: where the index i refers to the six non-observed state variables, t 1 , . . . , t 10 denote the discrete time instants, y ijk = 1 k k l=1 y ijl , k = 1, . . . , K is the mean of state i at time instant j for the first k points of the Markov chain, andŷ ijK is the estimated averaged over K points from a separate Markov chain.
The results are presented in Fig. 2E. Monte Carlo has a convergence rate of around 0.5 (in mean square with respect to the prior measure) as expected, which should be compared to the rate of around 2.6/4.1 ≈ 0.65 (in the maximum norm, ie., in the worstcase setting) for the adaptive Smolyak approach. However, note that the convergence rate of Smolyak is close to that of MH-MCMC for a couple of hundred ODE solves, before transitioning to a new (higher) asymptotic convergence rate which is stable (at around 1.2) after the first few thousand ODE solves. The convergence rate of the adaptive Smolyak approach is higher than for the MH-MCMC approach, indicating a significant improvement. However, the adaptive Smolyak approach does not seem to reach its full potential in this case since the parametric response of the model does not appear to be sparse (the model response appears to be equally sensitive to most parameters). Finally, we note that other numerical experiments, with different response variables and observation time points, show better convergence rates (> 1), cf. Fig. S6 and Fig. S7.
To verify that the adaptive Smolyak algorithm is correctly implemented we computed the first two moments of the state variables for the posterior with MH-MCMC, and compared those to the moments computed with adaptive Smolyak. The results for the first moment are shown in Fig. S10, where we observe an excellent agreement. The second moment of the state variables computed with MH-MCMC is presented as the covariance over time, and is shown in Fig. S11. Also here the results of MH-MCMC agree with those of adaptive Smolyak. We therefore conclude that the adaptive Smolyak method is correctly implemented, and that it converges faster than MH-MCMC, although the rate of convergence is strongly dependent on the sparsity of the model (cf. Fig. 2E).   A sensitivity analysis of the model, the sensitivity profile is displayed in Fig. S12, gives the sensitivity index σ, defined by (18) with n x = 23, n t = 6, σ = 5.8058e3. The sparse Smolyak interpolation approach gives the estimated error curves shown in Fig. 3A, which suggest a convergence rate of 0.75 with respect to the number of ODE solves needed. As discussed in section 2, the convergence rate is determined by the sparsity of the underlying problem.
Taking a look at the index set computed by the adaptive algorithm, we notice that the interpolation of the states requires an almost isotropic refinement (with interpolation formulas of order 1 and 2), cf. Fig. 3B, limiting the convergence rate.
Next, we compare the proposed approach with the first-order approximation by measuring the approximation error at three randomly chosen realizations of the 50 parameters, displayed in Fig. 3C, Fig. S13 and Fig. S14.  Figure S14. Comparison of the (absolute) errors for the adaptive sparse grid based on CC interpolation nodes (above) and the first-order sensitivity approximation (below), variations of ±0.25p 0 , 3rd realization, second model.
As already observed in the previous numerical example, the local method cannot capture the global behavior of the states and leads to an error of order 1, whereas the Smolyak approximation shows an error of the estimated order 10 −2 .

Analysis of 'sloppy' parameter sensitivities
In [17], changes in model behavior upon parametric perturbations are measured by the average squared change in the model states where p 0 denotes the nominal value of the parameters, n x denotes the number of states, n t is the number of time points considered in the model and σ k is defined as the maximum of state k over all (discrete) time points, i.e. σ k = max t∈{t0,...,tn t −1} x k (t, p 0 ). The Hessian matrix of χ 2 is computed to analyze the sensitivity of the models to parameter variations. Since in our models, we already reparametrized the parameter space, such that the parameter variations are in the range of [−1, 1], we do not consider derivatives w.r.t. log p as suggested in [17] to overcome scaling issues. The Hessian matrix H jk = dχ 2 dp j dp k is then approximated using finite differences, i.e. we compute an approximationH bỹ since χ 2 (p 0 ) = 0. The Hessian approximation is computed with a finite difference step size = 10 −6 .

Bayesian inverse problem
In our next numerical experiment, we are interested in the inverse problem, i.e. the goal of computation is the conditional expectation of a quantity of interest, state 1, with respect to noisy measurements. The 50 parameters are assumed to be uniformly distributed with variations in the log space of the range p 0 · 10 ±0.1 . The noise of the (synthetic) measurement data is modeled by a Gaussian random variable with zero mean and standard deviation 1. We observe a convergence order of almost 1 with respect to the number of ODE solves required for the approximation of the normalization constant and of the quantity Z defined by (13), see Fig. S15 and Fig. S16. Enlarging the parameter variations to p 0 · 10 ±0.25 (Fig. S17), we observe a deterioration of the convergence rate, caused by the increasing nonlinearity of the underlying model. However, compared to the state of the art Monte-Carlo methods, the convergence rate r is only limited by the intrinsic sparsity of parametric system response, with, in addition, maximum-norm (or worst-case) error bounds thus enabling computational uncertainty quantification and parameter identification also for complex reaction networks on high-dimensional parameter spaces.   The numerical experiment discussed above clearly demonstrates the potential of the proposed approach, especially for problems which depend on high-dimensional parameter spaces. The convergence rates and approximation error could be significantly improved compared to the state of the art methods. Limitations of the adaptive Smolyak approach arise by increasing the parameter domain leading to a highly nonlinear behavior of the underlying model, which cannot be captured properly by global approximations. The deterioration of the convergence rate can be also observed for standard methods and is expected due to the nonlinearity of the underlying biochemical systems. However, depending on the parameter ranges, the adaptive Smolyak algorithm shows in our experiments consistently a significantly better performance than other state of the art methods and thus, is able to deal with the computational forward and Bayesian inversion analysis even in large-scale applications.

Comparison between the adaptive Smolyak interpolation and a firstorder sensitivity approach
A detailed analysis of the sensitivities with respect to the parameters of the model reveals that the sensitivities at the nominal point p 0 of 135 parameters are larger than 1e4, illustrated in the sensitivity profile of the model in Fig. 4A. The sensitivity index σ (n x = 500, n t = 10) for this model equals 2.4860e15 rendering the forward analysis and Bayesian inference infeasible even for moderate parameter ranges. We therefore restricted the variations of the parameters to the range of ±0.01 to ensure bounded variations in the states. The Smolyak algorithm fails to converge for larger parameter ranges indicating the ill-conditioning due to modeling issues related to sensitivities of these magnitudes.
As shown in Fig. S18, the estimated error curves of the Smolyak algorithm indicate an asymptotic rate of convergence of 1.5 with respect to the number N of ODE solves needed.  Figure S19. Comparison of the (absolute) errors for the adaptive sparse grid based on CC interpolation nodes (above) and the first-order sensitivity approximation (below), variations of ±0.01p 0 (1st realization), third model.

Bayesian inverse problem
We now turn to the inverse problem, i.e. the goal of computation is the conditional expectation of a quantity of interest given noisy observational measurement data. Bearing in mind that the parameter sensitivities are > 1e5 for more than half of the parameters, we restrict, as in the previous experiment, the parameter variations to ±0.01 around their nominal values. The observational noise is assumed to be independently, normally distributed with zero mean and variance σ 2 = 0.  To further explore the parameter space, we propose the following ad-hoc computational remedy, leaving aside the issue of modeling correctness: the idea of the adaptive Smolyak algorithm relies on the efficient detection of dimensions in the parameter space with a large impact on the quantities of interest. The information provided by the already activated indices can be used to enlarge the parameter variations of parameters showing less/ no influence on the quantities of interest, i.e. to enlarge the parameter domain for parameters not activated by the dimension-adaptive Smolyak algorithm. In [18], a reduction of the parameter dimension based on first order sensitivity information is proposed, leading to an overall reduction from 227 to 75 parameters. In contrast to this a priori choice of the active parameters, our approach allows to successively explore and enlarge the parameter space based on the information gained by the adaptive exploration of the parameter space by the Smolyak algorithm.
The number of parameters activated by the algorithm is 112 for variations in the range of ±0.01 · p 0 . We define a vector p 1 0 by p 1 0 ∈ R 227 , p 1 0 = 1, if index i is activated and a second vector p 2 0 by p 2 0 ∈ R 227 , p 2 0 = 1, if index i is not activated, indicating the set of parameters with enlarged parameter variations ±0.1p 0 . The convergence results of this heuristic strategy are summarized in Fig. S21.   Figure S22. Estimated (absolute) error curves of the normalization constant Z of the three model variables Akt (left), Erk (middle), ErbB (right) with respect to with respect to the number of ODE solves needed based on the sequences CC, uniform distribution of the parameters, variations of p 0 ± 0.01 · p 1 0 / ± 0.1 · p 2 0 / ± 1.0 · p 3 0 , third model.
As displayed in Fig. S22, we finally end up with a similar convergence behavior as in the case of variations in the range of ±0.01p 0 and could moreover significantly enlarge the parameter space. The heuristic strategy proposed above could be a promising approach to exploit the additional information given by the adaptively determined index sets, namely the influence of each parameter on the quantities of interest in the output. The numerical experiments presented clearly indicate the high impact of some parameters on the quantities of interest, but there are also parameters having no or less influence on the response. This nonisotropic behavior can be exploited by the adaptive algorithm, in particular by the proposed heuristic strategy, which allows to enlarge the parameter space. Nevertheless, sensitivities of the magnitude appearing in the sensitivity profile of the underlying model will cause numerical stability problems for any algorithms in system and parameter identification problems. For "reasonable" parameter sensitivities, the proposed dimension-adaptive Smolyak algorithm can construct a parametric surrogate model and allows for computational Bayesian estimation with work vs. accuracy which is dimension-robust (ie., with work which scales linearly with the number of active model parameters) and which can be, asymptotically, superior to that afforded by Monte Carlo algorithms, in all examples which we investigated in the present work. A necessary prerequisite for the success of the presently proposed computational strategy, however, is proper model specification. Model misspecification can severely hamper the performance of any numerical approach. The presently proposed (linear scaling with respect to the model size) computation of "sensitivity profiles" prior to Bayesian inversion allows for a fast identification of possible model misspecification and, in any case, alerts the modeler to computational pitfalls due to intrinsic model stiffness and ill-conditioning.
Parameter sensitivities of computed system responses of the magnitude reported in Fig. 4A indicate, in our view, serious issues with numerical stability of forward simulation and calibration, and are to be taken as evidence for model misspecification, which can not (and should not) be addressed by efficient numerical simulation alone.
Assuming the absence of ill-conditioning or of model misspecification, the presently proposed, dimension adaptive interpolation and quadrature algorithms was found to scale and perform reliably in computations for nonlinear models of biochemical reaction pathways with up to several hundred states and parameters. Based on these observations and on recent theoretical results [1] by some of the authors on intrinsic sparsity in polynomial chaos expansions of the parametric responses, linear scaling and comparable performance are expected for considerably larger models with affine-parametric structure imposed by mass-action kinetics. We add that affine-parametric models are not a prerequisite for the applicability of the presently proposed algorithm. In further experiments, we found similar performance also for Michaelis-Menten kinetics and, based on theoretical results for parametric partial differential equation models in [19], analogous (theoretical and computational) results can be expected also for models with nonlinear, holomorphic parametric dependencies.