Scalable Parameter Estimation for Genome-Scale Biochemical Reaction Networks

Mechanistic mathematical modeling of biochemical reaction networks using ordinary differential equation (ODE) models has improved our understanding of small- and medium-scale biological processes. While the same should in principle hold for large- and genome-scale processes, the computational methods for the analysis of ODE models which describe hundreds or thousands of biochemical species and reactions are missing so far. While individual simulations are feasible, the inference of the model parameters from experimental data is computationally too intensive. In this manuscript, we evaluate adjoint sensitivity analysis for parameter estimation in large scale biochemical reaction networks. We present the approach for time-discrete measurement and compare it to state-of-the-art methods used in systems and computational biology. Our comparison reveals a significantly improved computational efficiency and a superior scalability of adjoint sensitivity analysis. The computational complexity is effectively independent of the number of parameters, enabling the analysis of large- and genome-scale models. Our study of a comprehensive kinetic model of ErbB signaling shows that parameter estimation using adjoint sensitivity analysis requires a fraction of the computation time of established methods. The proposed method will facilitate mechanistic modeling of genome-scale cellular processes, as required in the age of omics.


Derivation of Forward and Adjoint Sensitivity Analysis for Gradient Evaluation
AMICI supports first and second order forward sensitivities and first order adjoint sensitivities. In the following we will provide all necessary equations for sensitivity and gradient calculation.

Objective function
We consider the objective function with parameter dependent noise variance σ 2 ij (θ). For simplicity of notation we assume that t 0 ≤ t 1 < ... < t N .
The derivative of the objective function with respect to parameter θ k , i = 1, ..., n θ , is given by s y i,k (t j ) (2) in which s y i,k (t j ) denotes the sensitivity of the output y i = h i (x, θ) at time t j with respect to parameter θ k . The sensitivity of the output can be expressed in terms of the sensitivity of the state

First and second order forward sensitivity analysis
The governing equations for first order forward sensitivities s x Second order equations can be obtained by differentiating both sides of (4) with respect to the parameter θ l : → R nx×n θ ×n θ denotes the second order sensitivity of states x with respect to parameters θ k and θ l . Note that for two times continuously differentiable functions f the second order sensitivities are, according to Schwarz' Theorem, symmetric in k and l.

Adjoint sensitivity analysis
The evaluation of every gradient entry ∂J/∂θ k using (2) requires the solution of (4) which can be prohibitively computationally demanding for complex systems with a large number of parameters. In the following we will derive an evaluation scheme of this gradient of objective functions of the form (1) which does depend on the state sensitivities s x k (t) and thus does not require a solution to (4). This will be achieved by reformulating the term which contains the sensitivity of the state with respect to the parameters, We introduce the adjoint state p(t) : [t 0 , t N ] → R nx as the solution to the backward differential equation which is defined on intervals (t j−1 , t j ), j = 1, . . . , N . The dimension of the adjoint state p(t) equals the number of state equations, which is usually the same as the number of state variables x(t). From (6) it follows that for each of the individual subintervals with j ∈ {0, . . . , N − 1} it holds that: By integration by parts of (7) we obtain We proceed by simplifying the second term of the integral by using the forward sensitivity equation (4) and by evaluating the limit from above lim t→t + j p(t) using the definition of the end value on the intervals, see the last line of (6). This yields Note that the latter only holds for j > 0 while for j = 0 (for which no measurement is available), the last summand is 0. Summing (9) over j from 0 to N − 1 yields All terms in ( * ) either cancel or evaluate to 0 except for p(t N )s x k (t N ) and p(t 0 ) T s x k (t 0 ). Additionally merging the sum over integrals into one integral and using the last line of (6) for expressing the term By replacing p(t 0 ) T s x k (t 0 ) with the definition of the initial condition of the forward sensitivity equation in (4) we obtain an expression for (5) which is independent of the state sensitivities s x k (t 0 ): This alternative formulation of the problematic term (5) is then used to formulate the gradient (2). We obtain the objective function gradient with k = 1, ..., n θ . This expression no longer depends on the state sensitivities s x k (t) but instead on the adjoint state p(t). In contrast to the state sensitivities, the adjoint state does not depend on parameters und thus only needs to be evaluated once. This reduces the computational complexity of one gradient evaluation from n p + 1 systems of the size n x to 2 systems of the size n x . The computational complexity of the numerical integration required to evaluate (10) is usually negligible.  where method ∈ {adjoint', forward, finite difference} indicates the employed gradient approximation scheme. For method ∈ {adjoint, forward, finite difference} we used default tolerances default accuracies (absolute error < 10 −16 , relative error < 10 −8 )) and for method=adjoint' we used high accuracies (absolute error < 10 −32 , relative error < 10 −16 ). For finite differences we used a step size of 10 −3 . The maximum over both relative error variants ensures symmetry with respect to the choice of methods and accounts for the fact that we do not know the ground truth.

Scaling of Forward and Adjoint Sensitivity Analysis with respect to the Number of Time Points
We compared the scaling of adjoint and forward sensitivity analysis with respect to the number of time points at which data is available. We found that for both appraoches the computation time increases linearly with the number of time points (see Figure 2). An additional analysis suggests that for forward sensitivity analysis this is caused by the need to evaluate the sensitivities at more time points than required by the employed multi-step solver. This appears to a slight shortcoming of the the multi-step solver in CVODES [1], which requires the evaluation of a polynomial equation for each sensitivity. For adjoint sensitivity analysis the increase in computation time is caused by the reinitialization of adjoint sensitivities at the discontinuities at the data time points. This is also a shortcoming of using a multi-step solver as the order of the solver and the time-step needs are reset to small values at every discontinuity. For both, forward and adjoint sensitivity analysis, the adverse scaling with the number of time points could be eliminated by switching to a single-step solver. We are, however, not aware of any publicly available general purpose differential equation solvers with A key challenge for large-scale mathematical modeling is data availability. Insufficient data can easily result in structural and practical identifiability problems. To assess the identifiability of the model for ErbB signalling we investigated the eigenvalue spectrum of the Fisher Information Matrix at the nominal parameter value. We found that the nonzero eigenvalues span almost 30 orders of magnitude and that there are 73 eigenvalues equal to zero. This suggests that most parameters in the model are poorly or not at all identifiable from the considered experimental data.

Comparison of AMICI and odeSD
We compared AMICI and odeSD [2], a specialized ODE integrator for the efficient computation of forward sensitivities. For our evaluation we used the newest publicly available version of odeSD (odeSD v1.1) and considered the Kholodenko model [3]. This model has been studied by [2]. To avoid spurious effects from thread starvation, MATLAB was started with the -singeCompThread flag to ensure single-threading. This was necessary as odeSD uses multi-threaded routines from the MATLAB Math Kernel Library, which resulted in generally slower computation times during high computational load on the testing machine.
For the Kholodenko model with ten equidistantly spaced time points between t = 0 and t = 50, odeSD with mex right hand side required 14 seconds for 100 repeated integrations while AMICI required 17 seconds for 100 repeated integrations for the calculation of the solution. Accordingly, the computation time is comparable while AMICI offers additional options, including adjoint sensitivity analysis. If the number of intermediate time points increases, the computation time for odeSD remains roughly the same. For AMICI the overall computation time increases slightly as the time requirement for evaluating the (differentiated) interpolating polynomial becomes increasingly important for the total computation time.