Experimental Design for Stochastic Models of Nonlinear Signaling Pathways Using an Interval-Wise Linear Noise Approximation and State Estimation

Background Computational modeling is a key technique for analyzing models in systems biology. There are well established methods for the estimation of the kinetic parameters in models of ordinary differential equations (ODE). Experimental design techniques aim at devising experiments that maximize the information encoded in the data. For ODE models there are well established approaches for experimental design and even software tools. However, data from single cell experiments on signaling pathways in systems biology often shows intrinsic stochastic effects prompting the development of specialized methods. While simulation methods have been developed for decades and parameter estimation has been targeted for the last years, only very few articles focus on experimental design for stochastic models. Methods The Fisher information matrix is the central measure for experimental design as it evaluates the information an experiment provides for parameter estimation. This article suggest an approach to calculate a Fisher information matrix for models containing intrinsic stochasticity and high nonlinearity. The approach makes use of a recently suggested multiple shooting for stochastic systems (MSS) objective function. The Fisher information matrix is calculated by evaluating pseudo data with the MSS technique. Results The performance of the approach is evaluated with simulation studies on an Immigration-Death, a Lotka-Volterra, and a Calcium oscillation model. The Calcium oscillation model is a particularly appropriate case study as it contains the challenges inherent to signaling pathways: high nonlinearity, intrinsic stochasticity, a qualitatively different behavior from an ODE solution, and partial observability. The computational speed of the MSS approach for the Fisher information matrix allows for an application in realistic size models.

Given observations O 0 , O 1 , . . . , O n at time points t 0 , t 1 , . . . , t n , the observation based likelihood function L(O, θ) gives the probability to obtain the data O given a parameter θ. This likelihood function can be factorized into

Measurement noise
The first factor describes the measurement noise: P (O i ; ν i ) is the probability to measure O i if being in state ν i . As this is in most cases a Gaussian distribution, its calculation is computationally fast because it does neither involve stochastic simulations nor ODE solutions.

Transition probability
The second probability is the transition probability for a transition from to ν i−1 to ν i . Its distribution is generally unknown. It could be calculated with the use of simulations and density estimation methods or the solution of a high dimensional chemical master equation system. Both are from computational point very time consuming. Therefore, [13,14] suggest to approximate it with a normal distribution: where f (y|µ, Σ) is the probability density function of a multivariate normal distribution with mean µ and covariance Σ which is calculated by a linear noise approximation as in equation (12).
As the Gaussian distribution has a continuous support, the probability for (36) is calculated with an integral instead of the sum: where Λ i stands for the state space at time point t i . In many cases the state space will be constant over time, hence Λ = Λ 1 = Λ 2 , . . . = Λ n .

State estimation
The third probability P (ν i−1 ; O i−1 , . . . , O 0 , θ) is the probability to be in a state ν i−1 given the observations O i−1 , . . . , O 0 . This probability is challenging from a computational point of view due to two reasons: first, its evaluation needs computational expensive formulas from hidden Markov processes (equation 1 in [51]). Second, even once it is evaluated for every point ν i−1 in the state space Λ i−1 , the transition probability p(ν i ; ν i−1 , θ) has to be calculated for any of these points. This is, even with the approximation, computationally challenging (see also comment in discussion).
Therefore, a state updating is used instead of the full probability distribution for i = 1, . . . , n − 1 and x as in equation (2) and Σ as in equation (12). The initial statê ν 0 is included into the optimization vector. This is a quadratic optimization problem with an analytical solution and therefore computationally very fast.
By definition the state estimate incorporates information available from the previous observations. The state estimateν i−1 is then used to evaluate P The state estimation procedure can be considered as a 1-point evaluation of the integral or an approximation of the distribution with a point distribution. Although this seems to be a rough approximation, extensive simulation studies [14] have shown that this allows to estimate states and parameters. When considering different models, there are two options to check whether the approximation is appropriate: first, stochastically simulating time courses, applying this approach to them, and checking how well the states are estimated. Second, using updates formulas for P (ν i−1 ; O i−1 , . . . , O 0 , θ) (such as equation 1 in [51]) to evaluate the term in equation (36) and comparing the result to equation (41). The wide use of Kalman-filtering techniques [52] shows that the use of state estimates allows for an successful analysis.

The MSS objective function
Taking the logarithm of equation (35), inserting the approximation of equation (41) and considering Gaussian measurement error, leads to the MSS objective function as displayed in equation (15).

Settings for the parameter estimations
Immigration-Death model: The parameter estimation for the Immigration-Death model was carried out by optimizing the MSS objective function (equation (15)). The optimization was performed with the FindMinimum algorithm in the software Mathematica 9 [53]. The search was initialized with the true parameter and terminated with at maximum 500 iterations. Only for the parameter estimations with the benchmark method, the version 10.4 [54] was used as the version 9.0 seemed to be much slower here.

Lotka-Volterra model:
The parameter estimation for the Lotka-Volterra model was also carried out with the FindMinimum algorithm in the software Mathematica 9. The search was initialized with the true parameter θ (0) and (1.1, 1.1, 1.1, 0.9)θ (0) as a second start value.

Calcium oscillation model:
The parameter estimation for the fully observed Calcium oscillation model was carried out using a Particle Swarm [55] algorithm implemented in Mathematica [53].
with 300 iterations and 150 particles per iteration.
Calculation of the Fisher information for the exact approach for the Immigration-Death model The Fisher information F I ex (equation 30) is derived as follows (see also [56]) with