Computational modeling is a key technique for analyzing models in systems biology. There are well established methods for the estimation of the kinetic parameters in models of ordinary differential equations (ODE). Experimental design techniques aim at devising experiments that maximize the information encoded in the data. For ODE models there are well established approaches for experimental design and even software tools. However, data from single cell experiments on signaling pathways in systems biology often shows intrinsic stochastic effects prompting the development of specialized methods. While simulation methods have been developed for decades and parameter estimation has been targeted for the last years, only very few articles focus on experimental design for stochastic models.
The Fisher information matrix is the central measure for experimental design as it evaluates the information an experiment provides for parameter estimation. This article suggests an approach to calculate a Fisher information matrix for models containing intrinsic stochasticity and high nonlinearity. The approach makes use of a recently suggested multiple shooting for stochastic systems (MSS) objective function. The Fisher information matrix is calculated by evaluating pseudo data with the MSS technique.
The performance of the approach is evaluated with simulation studies on an Immigration-Death, a Lotka-Volterra, and a Calcium oscillation model. The Calcium oscillation model is a particularly appropriate case study as it contains the challenges inherent to signaling pathways: high nonlinearity, intrinsic stochasticity, a qualitatively different behavior from an ODE solution, and partial observability. The computational speed of the MSS approach for the Fisher information matrix allows for an application in realistic size models.
Citation: Zimmer C (2016) Experimental Design for Stochastic Models of Nonlinear Signaling Pathways Using an Interval-Wise Linear Noise Approximation and State Estimation. PLoS ONE 11(9): e0159902. https://doi.org/10.1371/journal.pone.0159902
Editor: Ramon Grima, University of Edinburgh, UNITED KINGDOM
Received: September 3, 2015; Accepted: July 11, 2016; Published: September 1, 2016
Copyright: © 2016 Christoph Zimmer. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: All relevant data are within the paper and its Supporting Information files.
Funding: The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. CZ was supported by Bioms.
Competing interests: The author has declared that no competing interests exist.
Computational modeling is widely used to deepen the understanding of biological processes. Due to advances in experimental techniques (e.g. the possibility to measure small numbers of molecules in single cells), the importance of stochastic modeling is increasing. This article focuses on experimental time course data that show intrinsic stochasticity, such as data from signaling pathways (Fig 1). Stochastic simulation algorithms have been developed for decades, resulting in many variants today. Recently, the development of parameter estimation techniques suited for stochastic models began. These techniques can be classified into approaches based on the chemical master equation, moment closure methods [5–8], Monte Carlo methods [9, 10] and approximations [11–14].
The left panel shows the deterministic behavior of the Calcium oscillation model, the right panel two stochastic realizations showing the special characteristics to which the experimental design methodology in this article can be applied: qualitatively different behavior from deterministic modeling, bursting oscillations, high nonlinearity and fast dynamics (e.g. from almost 0 to 10000 molecules within a few time units).
The development of experimental design techniques for stochastic models in systems biology is a very new field. The goal of experimental design is to design an experiment by choosing the experimental conditions (e.g. time points of measurements or components that are measured) in such a way as to maximize the amount of information that can be obtained from the data. In contrast to parameter estimation, experimental design is independent from measurements and can be calculated before performing any experiment. Therefore, it is a tool to reduce experimental costs by obtaining a certain predefined accuracy or maximizing the accuracy with a predefined cost.
The most common quantity for measuring this information for models of ordinary differential equations (ODE) is the Fisher information matrix [15, 16]. As the Fisher information matrix is a parameter dependent measure, its application needs some prior knowledge about the system’s parameters. As this is not always readily available, techniques such as robust experimental design [17, 18] have been developed to broaden its applicability. Under regularity conditions, the Fisher information matrix is the inverse of the asymptotic variance of a maximum likelihood estimator. Therefore, the Fisher information has a high impact on analyzing and improving parameter estimation: it is used to calculate confidence intervals for the parameter estimates. Furthermore, it facilitates the investigation of the correlation between parameters and the optimization of experimental design. For experimental design the Fisher information matrix is first mapped to a scalar by so-called optimality criteria. This scalar function can then be optimized. The optimality criteria reflect the experimenter’s interest in the parameters. Additionally, optimality criteria can also be used for model selection. This article suggests an approach for calculating a Fisher information matrix in stochastic models.
The Fisher information has been applied to ODE or differential algebraic models [15, 21]. A review of experimental design techniques in systems biology is presented by . The mitogen-activated protein (MAP) kinase cascade is investigated in  who compares the confidence ellipsoid of the Fisher information to parameter estimates gained from inference on simulated data. There are also approaches to experimental design without the Fisher information matrix such as Bayesian experimental design  or design strategies based on profile likelihoods .
Current approaches for calculating a Fisher information matrix in stochastic models in epidemics and systems biology approximate the process with a multivariate normal distribution, taking into account the inter-temporal covariances. [26, 27] use moment closure techniques to compare experimental moments to parametrized theoretical moments. Based on that, the authors show how to design optimal experiments to investigate gene expression. A Monte Carlo based technique has also been suggested to derive optimal perturbation experiments for transcription.
The article at hand makes use of a recently developed multiple shooting for stochastic systems (MSS) objective function that treats the intervals between measurements separately. On each interval a linear noise approximation (LNA) is used in combination with a state updating scheme to handle non-observable components. This separate treatment of the intervals means that the LNA is only needed on the relatively short time interval between measurements. The FIMSS Fisher information is calculated based on this MSS objective function and pseudo data. The pseudo data is generated using the same approximation scheme as for the MSS objective function. The reasons for using the Fisher information are its theoretical properties, its wide use in deterministic models and its computational speed.
In contrast to  data from only one time course is sufficient for the new method of this article. The approach of  will be used as a benchmark for comparison.  calculated a Fisher information matrix without Monte Carlo simulations. This approach assumes that the observations are distributed with a multivariate normal distribution (MVN). The mean of this MVN is the ODE solution and the covariance matrix is calculated with the help of a LNA and it contains all inter-temporal covariances. This means that this approach applies the LNA to the whole systems horizon, in contrast to the MSS method which only applies to the intervals between measurements. This difference is of major importance: If the LNA holds over the whole systems horizon, it also holds on a shorter time scale. However, if it holds on shorter time scale, it does not necessarily hold over the whole systems horizon. The results section will illustrate the impact of this on experimental design methods.
Simple examples, such as the Immigration-Death model, allow for an exact calculation of estimates and Fisher information matrices. This makes it possible to compare the performance of the MSS method with an exact approach.
As the Fisher information is an asymptotic measure and is calculated using the MSS objective function’s approximation, it is essential to investigate its performance. To address this matter, the Fisher information, which corresponds to the inverse of the covariance of the maximum likelihood estimator, is compared to a covariance matrix gained from different realizations of this estimator. These realizations are computed by simulating stochastic data sets and performing a parameter estimation on each of the data sets. The resulting estimates are used to calculate a covariance matrix, which is then compared to the Fisher information matrix. This comparison is based on optimality criteria, the correlation structure, or two-dimensional projections of the confidence intervals (which give the best visual interpretation and have also been suggested in the literature). As the Fisher information matrix is an asymptotic measure and calculated by approximations, an exact coverage of confidence intervals is unlikely. However, since the information content can be captured well enough to identify more informative designs, this is sufficient for experimental design. Note that the comparison of the Fisher information matrix to the covariance matrix from estimates only serves to evaluate the performance. When designing optimal experiments in practice, only the Fisher information needs to be evaluated, not the covariance of the estimates.
The methods section will recapitulate stochastic modeling, introduce the Fisher information matrix, and define the FIMSS Fisher information matrix. The results section investigates the performance of the FIMSS Fisher information matrix for three models: an Immigration-Death, a Lotka-Volterra and a Calcium oscillation model. Calcium oscillations play an important role in cell development and death as well as fertilization . On top of that, the Calcium oscillation model is an especially challenging test case as it is highly nonlinear and shows a qualitatively different behavior in stochastic modeling than in deterministic modeling.
Stochastic Modeling of Biochemical Reactions
Computational modeling is a key technique for the analysis of complex systems in science. This subsection will introduce stochastic modeling and explain in which situations it is important.
Let X = (X1, …, XD) denote the D reactants in a system with r reactions in which qij denotes the number of educt molecules of species Xi for reaction j and uij the number of product molecules of species Xi for reaction j. Hence the system reads as
\[ q_{1j} X_1 + \dots + q_{Dj} X_D \longrightarrow u_{1j} X_1 + \dots + u_{Dj} X_D, \qquad j = 1, \dots, r. \tag{1} \]
The stoichiometric matrix S is a D × r dimensional matrix. Its entries sij = uij−qij describe the net effect of reaction j on species Xi. In terms of ODEs the system reads as
\[ \frac{dx}{dt} = S \, v(x, \theta), \qquad x(0) = x_0, \tag{2} \]
with a rate law v = (v1, …, vr)T describing the speed of the reactions, an initial concentration x0 and a parameter vector θ.
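As an illustration of Eq (2), the following minimal sketch builds the right-hand side S v(x, θ) for the Immigration-Death model and integrates it with a fixed-step Runge-Kutta scheme. The rate law (constant production θ1, linear degradation θ2 x), the step size and all function names are assumptions chosen for illustration, not the implementation used in this article.

```python
import math

def ode_rhs(x, theta, S, rate_law):
    """Right-hand side dx/dt = S v(x, theta) for D species and r reactions."""
    v = rate_law(x, theta)
    return [sum(S[i][j] * v[j] for j in range(len(v))) for i in range(len(S))]

def immigration_death_rates(x, theta):
    # Assumed rate law: constant production theta[0], linear degradation theta[1] * x.
    return [theta[0], theta[1] * x[0]]

def rk4_step(x, h, theta, S, rate_law):
    """One classical fourth-order Runge-Kutta step of size h."""
    def shifted(k, c):
        return [xi + c * h * ki for xi, ki in zip(x, k)]
    k1 = ode_rhs(x, theta, S, rate_law)
    k2 = ode_rhs(shifted(k1, 0.5), theta, S, rate_law)
    k3 = ode_rhs(shifted(k2, 0.5), theta, S, rate_law)
    k4 = ode_rhs(shifted(k3, 1.0), theta, S, rate_law)
    return [xi + h / 6.0 * (a + 2.0 * b + 2.0 * c + d)
            for xi, a, b, c, d in zip(x, k1, k2, k3, k4)]

S = [[1, -1]]          # one species, two reactions: production and degradation
theta = (1.0, 0.1)
x = [0.0]
for _ in range(1000):  # integrate from t = 0 to t = 100 with h = 0.1
    x = rk4_step(x, 0.1, theta, S, immigration_death_rates)

# Analytic solution of this linear ODE for comparison.
x_exact = (theta[0] / theta[1]) * (1.0 - math.exp(-theta[1] * 100.0))
```

The numerical solution approaches the steady state θ1/θ2, which is the only parameter combination an ODE fit to steady state data can recover.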
Stochastic modeling is important in systems with small numbers of molecules, where stochastic fluctuations can influence the system’s behavior. It focuses on single particles and considers each reaction explicitly. Both order of reactions and waiting times are stochastic quantities depending on the system’s state and the rate laws. The chemical master equation (CME, Eq 3) describes the time evolution of the probability P(ν, t) of the system to be in a state ν:
\[ \frac{\partial P(\nu, t)}{\partial t} = \sum_{j=1}^{r} \left[ a_j(\nu - s_j; \theta) \, P(\nu - s_j, t) - a_j(\nu; \theta) \, P(\nu, t) \right] \tag{3} \]
with a vector sj = (s1j, …, sDj)T and with a propensity a_j that can be calculated from the rate law v and describes the speed of the reactions in terms of particle numbers. The rate constant θ needs to be defined in a volume independent way; otherwise, it has to be transformed accordingly. Higher order reactions require additional care in the stochastic formulation of the rate laws vj.
The Gillespie algorithm  is the method of choice to simulate stochastic time courses. It is an iterative algorithm simulating reaction event after reaction event using functions of random numbers to determine both time step and reaction. The resulting time course is then a discrete state continuous time Markov jump process, see also  for details. The stochastic time courses shown in this manuscript were simulated using the Gillespie implementation in COPASI .
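The iteration described above can be sketched for the Immigration-Death model as follows. This is a minimal direct-method sketch in plain Python, assuming the propensities θ1 (production) and θ2 x (degradation); it is not the COPASI implementation used for the figures, and the function name is hypothetical.

```python
import random

def gillespie_immigration_death(x0, theta1, theta2, t_end, rng):
    """Simulate one stochastic time course with the direct method."""
    t, x = 0.0, x0
    times, states = [t], [x]
    while True:
        a1 = theta1          # propensity of the production reaction
        a2 = theta2 * x      # propensity of the degradation reaction
        a0 = a1 + a2
        if a0 <= 0.0:
            break            # no reaction can fire any more
        t += rng.expovariate(a0)   # exponentially distributed waiting time
        if t > t_end:
            break
        # Choose the reaction with probability proportional to its propensity.
        if rng.random() * a0 < a1:
            x += 1
        else:
            x -= 1
        times.append(t)
        states.append(x)
    return times, states

rng = random.Random(1)  # fixed seed for reproducibility
times, states = gillespie_immigration_death(10, 1.0, 0.1, 50.0, rng)
```

Each entry of `times` is a reaction event, so the trajectory is piecewise constant between events and changes by exactly one molecule per event.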
Stochastic modeling can show system behavior that cannot be seen with ODE modeling: stochasticity can, for example, introduce bi-modality in genetic toggle switches, which have a stable steady state in ODE modeling. The structure of Calcium oscillations may change qualitatively from ODE to stochastic modeling. Furthermore, intrinsic stochasticity may provide information, e.g. regarding reactivity, which allows identifiability problems to be solved. This emphasizes the importance of stochastic modeling.
The Fisher information Matrix
The Fisher information matrix measures the information provided by an experimental set-up for the estimation of the parameters. The theoretical result, which serves as the basis for the wide use [15, 16, 21, 22] of the Fisher information, states that under certain regularity conditions the inverse of the Fisher information matrix FI is the asymptotic variance of a maximum likelihood estimator \hat{ϑ} for a parameter ϑ:
\[ \hat{\vartheta} \xrightarrow{\text{dist}} \mathrm{MVN}\left( \vartheta_{true}, \, FI^{-1}(\vartheta_{true}, T) \right) \tag{4} \]
where “dist” stands for convergence in distribution, MVN for multivariate normal distribution, ϑtrue for the true parameter value, and T = (t0, …, tn) for the set of time points at which measurements are recorded. In case of structurally non-identifiable parameters, the determinant of the Fisher information is zero and its inverse does not exist. Therefore, the relation (Eq 4) only holds for scenarios in which all parameters are identifiable.
A maximum likelihood estimator can be obtained by maximizing the likelihood function L over a parameter ϑ:
\[ \hat{\vartheta} = \arg\max_{\vartheta} L(\vartheta) \tag{5} \]
where the likelihood function describes the probability to observe a data set given a parameter ϑ. Intuitively, the more sensitive the likelihood function is to changes in ϑ, the more precise is an estimation of ϑ. The Fisher information matrix FI captures this sensitivity. Its components are defined as
\[ \left( FI(\vartheta, T) \right)_{kl} = - E\left[ \frac{\partial^2 \log L(\vartheta)}{\partial \vartheta_k \, \partial \vartheta_l} \right], \qquad k, l = 1, \dots, n_\vartheta, \tag{6} \]
where nϑ is the dimension of the parameter vector ϑ, and the expectation is calculated over all possible combinations of observations. The Fisher information matrix is a symmetric matrix. The diagonal entries of its inverse describe the variances of the parameter estimates, and the off-diagonal entries of its inverse describe their covariances and thereby the correlation of the parameter estimates.
Use of the Fisher information matrix.
Due to the relation of Eq (4) the Fisher information can be used to calculate confidence intervals for each parameter or a multidimensional confidence area for all parameters. This includes information on the volume of the confidence ellipsoid and on the axis in the parameter space that can be identified with the lowest precision. Furthermore, the Fisher information can be used to obtain relative errors of the parameter estimates and extract correlation information between the parameters.
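A minimal sketch of these uses for a two-parameter model is given below. The numerical entries of the Fisher information matrix and the parameter estimate are hypothetical values chosen for illustration, and the helper names are not taken from any library.

```python
import math

def invert_2x2(m):
    """Invert a 2 x 2 matrix; fails for non-identifiable (singular) cases."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0.0:
        raise ValueError("singular Fisher information: parameters not identifiable")
    return [[d / det, -b / det], [-c / det, a / det]]

def summarize_fisher_information(fi, theta_hat, z=1.96):
    """Confidence intervals, relative errors and correlation from FI^{-1}."""
    cov = invert_2x2(fi)  # asymptotic covariance of the estimator, Eq (4)
    std = [math.sqrt(cov[0][0]), math.sqrt(cov[1][1])]
    intervals = [(th - z * s, th + z * s) for th, s in zip(theta_hat, std)]
    relative_errors = [s / abs(th) for th, s in zip(theta_hat, std)]
    correlation = cov[0][1] / (std[0] * std[1])
    return intervals, relative_errors, correlation

# Hypothetical Fisher information matrix and estimate (illustrative values only).
fi = [[400.0, -3000.0], [-3000.0, 40000.0]]
theta_hat = (1.0, 0.1)
intervals, relative_errors, correlation = summarize_fisher_information(fi, theta_hat)
```

The factor z = 1.96 gives approximate 95% intervals under the normal approximation of Eq (4).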
As the Fisher information only depends on the time points of the measurements but not on the actual outcome of an experiment, it is possible to calculate it before performing any experiment. This means that Fisher information matrices can be calculated for different experimental set-ups, allowing the selection of the most informative design. This procedure is called experimental design. The goal is to obtain a parameter estimate that is as precise as possible, which means that its variance is as small as possible. As the Fisher information matrix is the inverse of the covariance matrix of the estimator, minimizing the variance means maximizing the Fisher information matrix. However, the task of maximizing a matrix is not well defined. To overcome this problem, several so-called optimality criteria have been introduced, which map the Fisher information matrix to a real number.
Optimality criteria reflect measures of the parameter estimates’ confidence ellipsoids, which correspond to the Fisher information matrix. This article will use two optimality criteria:
- D-criterion: maps the Fisher information matrix to its determinant. The determinant corresponds to the volume of the confidence ellipsoid.
- E-criterion: maps the Fisher information matrix to its minimum eigenvalue. The minimum eigenvalue corresponds to the largest axis of the confidence ellipsoid.
While there are many more optimality criteria, the choice of the criterion depends on the focus of the experimenter. This article selects the D- and E-criterion as they give important information on the size and shape of the confidence ellipsoid (volume and largest axis).
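For a model with two parameters, both criteria can be written in closed form. The sketch below assumes a symmetric 2 × 2 Fisher information matrix; the function names are chosen for illustration.

```python
import math

def d_criterion(fi):
    """Determinant of a symmetric 2 x 2 Fisher information matrix."""
    a, b, d = fi[0][0], fi[0][1], fi[1][1]
    return a * d - b * b

def e_criterion(fi):
    """Minimum eigenvalue of a symmetric 2 x 2 Fisher information matrix."""
    a, b, d = fi[0][0], fi[0][1], fi[1][1]
    center = 0.5 * (a + d)
    radius = math.sqrt((0.5 * (a - d)) ** 2 + b * b)
    return center - radius

fi_example = [[2.0, 0.0], [0.0, 3.0]]
d_value = d_criterion(fi_example)   # product of the eigenvalues
e_value = e_criterion(fi_example)   # smallest eigenvalue
```

A larger determinant means a smaller confidence ellipsoid volume, and a larger minimum eigenvalue means a shorter longest axis, so both criteria are maximized in experimental design.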
Computation of the Fisher information matrix for stochastic models.
Eq (6) defines the Fisher information matrix. Its calculation is straightforward in principle. However, the required computational time poses a big challenge, which makes the straightforward calculation infeasible for most realistic size models in systems biology. One reason is the expectation in Eq (6). Theoretically, this is a sum over all possible data sets. If there are n measured time points, there are “number of points in state space” to the power of “n” summands. This is computationally infeasible in most scenarios. Therefore, this article approximates this expectation with a mean over a subset of all data sets. This subset is created by generating M pseudo data sets using the Gillespie algorithm or with an alternative approach, as described in the “How to generate the pseudo data” subsection.
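The idea of replacing the expectation by a mean over M pseudo data sets can be illustrated on a toy problem with a known answer. The sketch below is not the FIMSS computation: it assumes independent normal observations with unknown variance s, for which the exact Fisher information is n/(2 s²), and it approximates the second derivative of the negative log-likelihood by central finite differences. All names and settings are illustrative assumptions.

```python
import math
import random

def neg_log_lik(s, data):
    """Negative log-likelihood of zero-mean normal data with variance s."""
    n = len(data)
    return 0.5 * n * math.log(2.0 * math.pi * s) + sum(y * y for y in data) / (2.0 * s)

def second_derivative(f, x, h=1e-3):
    """Central finite-difference approximation of f''(x)."""
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / (h * h)

def monte_carlo_fisher(s, n_obs, n_sets, rng):
    """Average the Hessian of the negative log-likelihood over pseudo data sets."""
    total = 0.0
    for _ in range(n_sets):
        data = [rng.gauss(0.0, math.sqrt(s)) for _ in range(n_obs)]
        total += second_derivative(lambda v: neg_log_lik(v, data), s)
    return total / n_sets

rng = random.Random(0)
fi_mc = monte_carlo_fisher(4.0, 20, 2000, rng)   # M = 2000 pseudo data sets
fi_exact = 20 / (2.0 * 4.0 ** 2)                  # n / (2 s^2)
```

The sample mean fluctuates around the exact value with a standard error that shrinks like 1/√M, which is the same trade-off that governs the choice of M for realistic models.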
Another challenge is the evaluation of the likelihood function L. Analytical solutions are usually not available for models in systems biology. Approximations have to be accurate but still fast enough so that M can be chosen high enough to get a good approximation of the expectation. An MSS approximation for the log-likelihood function has been suggested by [13, 14]. The resulting FMSS objective function will be introduced in the next subsection. (A more detailed explanation can be found in the appendix or the original articles [13, 14].)
The FIMSS Fisher information matrix is based on the MSS objective function and reads as (7) where the parameter ϑ contains the potential unobservable initial states ν0, the measurement noise covariance matrix Σmeas and the kinetic parameters θ.
This FIMSS Fisher information matrix can be used to assess the precision of the parameter estimates for an experimental set-up. It is possible to use pseudo data for a different number of system components to see the gain (or loss) in information by measuring more (or fewer) components. Furthermore, by varying the number of time points in T, the required number of time points for a predefined precision can be determined. This can be accomplished by calculating the optimal design for different numbers of measurements and then choosing the smallest number of measurements for which the accuracy requirements are met. Additionally, it is possible to calculate an optimum experimental design by choosing the design T* that maximizes the value of an optimality criterion Φ:
\[ T^{*} = \arg\max_{T} \, \Phi\left( FI_{MSS}(\vartheta, T) \right) \tag{8} \]
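The search over candidate designs can be sketched as a simple maximization over a finite set. In the sketch below the Fisher information is a toy stand-in, not FIMSS: it assumes a deterministic exponential decay y(t) = exp(−θ t) observed with Gaussian noise of standard deviation σ, for which the information of a design T is Σ (t exp(−θ t))² / σ². With a single parameter the D-criterion equals this scalar. All names are hypothetical.

```python
import math

def fisher_information_toy(times, theta, sigma=0.1):
    """Toy scalar Fisher information for y(t) = exp(-theta t) + Gaussian noise."""
    return sum((t * math.exp(-theta * t)) ** 2 for t in times) / sigma ** 2

def best_design(candidates, criterion):
    """Return the candidate design with the largest criterion value."""
    return max(candidates, key=criterion)

theta = 0.5
# Candidate designs: a single measurement at t = 0.5, 1.0, ..., 10.0.
candidates = [[0.5 * k] for k in range(1, 21)]
t_star = best_design(candidates, lambda T: fisher_information_toy(T, theta))
```

For this toy model the most informative single time point is t = 1/θ, which the grid search recovers. Replacing `fisher_information_toy` by an evaluation of a Fisher information matrix followed by an optimality criterion gives the structure of the design problem in Eq (8).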
The power and accuracy of the FIMSS Fisher information matrix will be demonstrated in the Results section on different test models.
The MSS objective function
The MSS objective function has been suggested and shown to work for parameter estimation in stochastic models [12–14]. It assumes that observations are taken at discrete time points t0, t1, …, tn, where the system’s state is ν = (ν0, ν1, …, νn). This system’s state is usually only imperfectly observable. This means that only some of the components can be observed and these observations are noisy: . The main characteristics of the MSS objective function are:
- the time course data is split into intervals that are treated separately,
- unobserved states are handled by state updating,
- the distribution of a current state given its precursor is approximated with a normal distribution with mean and covariance gained by a linear noise approximation.
The MSS objective function will be derived briefly here, the details can be found in the S1 Text.
The observation based likelihood function gives the probability to obtain the data given a parameter θ: (9) where is the probability to observe given previous observations . This probability can be written as (10)
The first factor describes the measurement noise: is the probability to measure if being in state νi.
The second probability is the transition probability for a transition from νi−1 to νi. Its distribution is generally unknown and [13, 14] suggest to approximate it with a normal distribution:
\[ P(\nu_i \mid \nu_{i-1}; \theta) \approx f\left( \nu_i \mid x(\Delta t; \theta, \nu_{i-1}), \, \Sigma(\Delta t; \theta, \nu_{i-1}) \right) \tag{11} \]
where f(y|μ, Σ) is the probability density function of a multivariate normal distribution with mean μ and covariance Σ which is calculated by a linear noise approximation (12) with x = x(Δt; θ, νi−1), the solution of Eq (2), a volume Ω and ' denoting the transpose of a matrix.
As the Gaussian distribution has a continuous support, the probability for in Eq (10) is calculated with an integral instead of the sum: (13) where Λi stands for the state space at time point ti. In many cases the state space will be constant over time, hence Λ = Λ1 = Λ2, … = Λn.
The third probability is the probability to be in a state νi−1 given the observations .  suggests to use a state updating procedure instead of the full probability distribution to estimate the state νi−1 at time point ti−1: Given a state estimate at time ti−1, the probability to see the observation at time ti is the product of the probability to move from state to a state νi and the probability to see if the state is νi. A state estimate can be defined as the state that leads to the highest probability to observe : (14) for i = 1, …, n − 1 and x as in Eq (2) and Σ as in Eq (12). The initial state is included into the optimization vector.
The MSS objective function.
The parameter ϑ = (θ, ν0, Σmeas) is composed of the kinetic parameters θ, the initial state ν0 and the measurement noise covariance matrix Σmeas.
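To make the interval-wise construction concrete, the following sketch evaluates an objective of this type for the Immigration-Death model under strong simplifying assumptions: the state is fully observed, there is no measurement noise, the volume is Ω = 1, and the objective is taken to be the negative log of the product of Gaussian transition densities (the sign convention is an assumption). For this linear model the LNA mean and variance over one interval are available in closed form. It is an illustration, not the objective function of [13, 14].

```python
import math

def lna_moments(nu_prev, dt, theta1, theta2):
    """LNA mean and variance after time dt, restarted at the observed state."""
    decay = math.exp(-theta2 * dt)
    level = theta1 / theta2                      # deterministic steady state
    mean = level + (nu_prev - level) * decay     # solution of the ODE from nu_prev
    var = nu_prev * decay * (1.0 - decay) + level * (1.0 - decay)
    return mean, var

def interval_objective(states, dt, theta1, theta2):
    """Sum of negative log Gaussian transition densities over all intervals."""
    total = 0.0
    for nu_prev, nu in zip(states[:-1], states[1:]):
        mean, var = lna_moments(nu_prev, dt, theta1, theta2)
        total += 0.5 * math.log(2.0 * math.pi * var) + (nu - mean) ** 2 / (2.0 * var)
    return total

states = [10, 12, 9, 11, 10]   # illustrative observations, not real data
obj_true = interval_objective(states, 5.0, 1.0, 0.1)
obj_wrong = interval_objective(states, 5.0, 5.0, 0.1)
```

Because the approximation is restarted at every observation, the variance starts from zero on each interval, which is why the LNA only has to hold between two measurements.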
How to generate the pseudo data?
Eq (7) approximates the expectation over all possible data sets by the mean over a subset of all data sets. This subset is created by simulating pseudo data sets. Therefore, one needs a way to generate these pseudo data sets. The following scheme has been indicated in . The distribution of νi at time point ti given the knowledge of previous state νi−1 is approximated with νi|νi−1 ∼ N(x(Δt; θ, νi−1), Σ(Δt; θ, νi−1)). This can be used to generate pseudo data trajectories ν(T) by iteratively drawing random numbers according to this distribution. The argument “(T)” is hereby used to denote the time points t0, …, tn of the pseudo data. Given a state νi−1, the next state νi and the next pseudo observation are calculated as (16) with , and random numbers ui ∼ N(0D, 1D×D) and with a D × D matrix 1D×D with diagonal entries 1 and a vector 0D of length D with zero entries. The length of the vector of observables is denoted by “obs”.
To ensure consistency between the generation of pseudo data and the evaluation of the MSS Fisher information matrix, this scheme was applied (instead of e.g. a standard Gillespie algorithm). Furthermore, the suggested pseudo data generation scheme has the advantage that it allows for a continuous dependency between parameters and system’s state (by leaving out the “Round” operation), which is not possible using a standard Gillespie algorithm.
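The iterative drawing of pseudo data can be sketched for the Immigration-Death model as follows. The sketch assumes a fully observed state without measurement noise, a volume Ω = 1, and closed-form LNA moments for this linear model; clamping negative draws to zero is an additional assumption made here to keep the variance non-negative. The `round_states` flag mirrors the optional “Round” operation.

```python
import math
import random

def generate_pseudo_data(nu0, n_obs, dt, theta1, theta2, rng, round_states=True):
    """Draw nu_i | nu_{i-1} ~ N(x(dt), Sigma(dt)) iteratively for n_obs intervals."""
    decay = math.exp(-theta2 * dt)
    level = theta1 / theta2
    trajectory = [nu0]
    for _ in range(n_obs):
        prev = trajectory[-1]
        mean = level + (prev - level) * decay
        var = prev * decay * (1.0 - decay) + level * (1.0 - decay)
        draw = mean + math.sqrt(var) * rng.gauss(0.0, 1.0)
        draw = max(draw, 0.0)            # assumption: no negative particle numbers
        if round_states:
            draw = int(round(draw))      # discrete particle numbers
        trajectory.append(draw)
    return trajectory

rng = random.Random(42)
discrete = generate_pseudo_data(10, 100, 1.0, 1.0, 0.1, rng)
continuous = generate_pseudo_data(10, 100, 1.0, 1.0, 0.1, rng, round_states=False)
```

Without rounding, a pseudo data set generated from fixed random numbers changes smoothly with the parameters, which is convenient when derivatives with respect to the parameters are needed.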
A benchmark approach for comparison
The method from  is used as a benchmark. It is also based on a LNA, however, it is assumed that the LNA holds for the whole time horizon, namely from t0 until tn.  calculates inter-temporal covariances by (17)
The fundamental matrix Φ of the non-autonomous system is calculated by (18) with an identity matrix I. The observation sequence ν = (ν0, …, νn) is then considered to be multivariate normal distributed (19) with (20) and a symmetric matrix (21)
A parameter estimate can be calculated by maximizing the probability of the MVN distribution in Eq (19) to observe ν over the parameter θ. A Fisher information matrix FIBench can be calculated using  with and (22)
Note that the first summand needs to contain the inverse of ΣB as in .
Design of the simulation study
To ensure the validity of the FIMSS Fisher information approximation, its accuracy has to be evaluated. The Fisher information is in itself an asymptotic measure, and the FIMSS Fisher information matrix for stochastic models introduces a further approximation because it relies on the LNA used in the MSS objective function.
Two quantities are calculated for the accuracy evaluation: on one hand the new FIMSS Fisher information matrix for stochastic models. On the other hand, Nsim stochastic time courses are simulated with the Gillespie algorithm , and for each of them a parameter estimation is performed with the MSS objective function  (see the “Settings for the parameter estimation” section in the S1 Text). This results in Nsim parameter estimates. A covariance matrix is calculated from these Nsim parameter estimates and denoted by Cov. Its inverse Cov−1 is denoted by FIemp. As FIMSS corresponds asymptotically to the inverse of the covariance matrix, Eq (4), FIMSS can be compared to FIemp.
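The construction of FIemp from a set of parameter estimates can be sketched for two parameters as follows. The estimates in the example are made-up numbers that only demonstrate the arithmetic; in the simulation study they come from Nsim estimations on simulated data.

```python
def covariance_2d(estimates):
    """Sample covariance matrix of a list of two-dimensional estimates."""
    n = len(estimates)
    m0 = sum(e[0] for e in estimates) / n
    m1 = sum(e[1] for e in estimates) / n
    c00 = sum((e[0] - m0) ** 2 for e in estimates) / (n - 1)
    c11 = sum((e[1] - m1) ** 2 for e in estimates) / (n - 1)
    c01 = sum((e[0] - m0) * (e[1] - m1) for e in estimates) / (n - 1)
    return [[c00, c01], [c01, c11]]

def invert_2x2(m):
    """Invert a 2 x 2 matrix."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

# Made-up estimates for illustration only.
estimates = [(1.0, 0.0), (-1.0, 0.0), (0.0, 1.0), (0.0, -1.0)]
cov = covariance_2d(estimates)
fi_emp = invert_2x2(cov)   # empirical counterpart of the Fisher information
```

The same optimality criteria can then be applied to `fi_emp` and to the Fisher information matrix to compare the two.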
Different measures are employed to compare the two matrices FIMSS and FIemp:
- a comparison based on optimality criteria such as D- and E-criterion (introduced in the “Optimality criteria” subsection),
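- The i−th diagonal entry of the inverse of the Fisher information matrix corresponds to the variance of the i−th component of the parameter estimate. Therefore, the average relative squared error (ARSE) \[ \mathrm{ARSE}_i = \frac{1}{N_{sim}} \sum_{k=1}^{N_{sim}} \left( \frac{\hat{\theta}_i^{(k)} - \theta_i^{(0)}}{\theta_i^{(0)}} \right)^2 \tag{23} \] can be compared to \( (FI_{MSS}^{-1})_{ii} / (\theta_i^{(0)})^2 \) with known true parameter θ(0). The average ARSE over all parameter components is sometimes named “A-criterion”,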
- a visualization of the 2-d projections of the confidence ellipsoids and a comparison of its shape to the cloud of points of the parameter estimates as also suggested by ,
- a comparison of the parameter correlations.
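The ARSE comparison from the list above can be sketched as follows. The estimates and the covariance matrix are made-up values, and the prediction from the Fisher information is taken to be the diagonal of its inverse divided by the squared true parameter, which is an assumption based on the variance interpretation given in the list.

```python
def arse(estimates, theta_true):
    """Average relative squared error per parameter component."""
    n = len(estimates)
    return [
        sum(((e[i] - theta_true[i]) / theta_true[i]) ** 2 for e in estimates) / n
        for i in range(len(theta_true))
    ]

def predicted_relative_variance(inverse_fi, theta_true):
    """Relative variance implied by the inverse Fisher information matrix."""
    return [inverse_fi[i][i] / theta_true[i] ** 2 for i in range(len(theta_true))]

theta_true = (1.0, 0.1)
estimates = [(1.1, 0.1), (0.9, 0.1)]          # made-up estimates
inverse_fi = [[0.01, 0.0], [0.0, 0.0001]]     # made-up inverse Fisher information
empirical = arse(estimates, theta_true)
predicted = predicted_relative_variance(inverse_fi, theta_true)
```

Both quantities are dimensionless, so they can be compared directly across parameters of very different magnitude.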
The Immigration-Death model can be used as a simple model of constitutive gene expression, where X is the amount of transcript, θ1 the transcription rate and θ2 the mRNA degradation rate.
The Immigration-Death model has a very interesting property: using an ODE model and steady state data, only the quotient of the parameters θ1 and θ2 is identifiable but not their absolute values. However, using a stochastic model, the information encoded in the intrinsic fluctuations allows the identification of both parameters . This property has to be reflected in the Fisher information matrix. Therefore, this relatively simple model is a valuable part of the test set for the FIMSS Fisher information matrix.
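The identifiability argument can be made explicit with a short worked calculation. It assumes the standard formulation of the model with constant production rate θ1 and degradation rate θ2 X, and uses the known stationary behavior of this linear birth-death process.

```latex
% ODE model at steady state: only the quotient enters.
0 = \theta_1 - \theta_2 x^{*} \quad \Longrightarrow \quad x^{*} = \frac{\theta_1}{\theta_2}

% Stochastic model: the stationary distribution is Poisson,
X \sim \mathrm{Poisson}\!\left( \frac{\theta_1}{\theta_2} \right),
\qquad E[X] = \mathrm{Var}[X] = \frac{\theta_1}{\theta_2},

% but the stationary autocovariance depends on \theta_2 alone through its decay:
\mathrm{Cov}\left( X_t, X_{t+\tau} \right) = \frac{\theta_1}{\theta_2} \, e^{-\theta_2 \tau}.
```

The decay of the autocovariance with the lag τ determines θ2, and θ1 then follows from the mean θ1/θ2. This also indicates why designs with inter-sample distances far beyond the auto-correlation time 1/θ2 carry little information on the absolute parameter values.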
Various scenarios are taken into account, each with 100 observations and different inter-sample distances ranging from Δt = 0.1 to Δt = 15. x0 = 10 is chosen as initial value with θ1 = 1 and θ2 = 0.1, as this configuration leads to a steady state. The FIMSS Fisher information is calculated for all inter-sample distances according to Eq 7 with M = 1000 pseudo data sets. The pseudo data sets are generated as explained in the “How to generate the pseudo data” subsection.
The FIMSS Fisher information matrix is calculated based on a finite pseudo data set (M = 1000) where each entry of FIMSS represents a sample mean. Therefore, the first question is how the accuracy of the entries of FIMSS depends on the number of pseudo data sets M. Fig 2 shows how the entries of FIMSS converge with increasing M. One can see that already a number of M as low as 200 leads to an acceptable accuracy for FIMSS.
Each panel shows one entry of FIMSS. Note that the 2 × 1 entry is identical to the 1 × 2 entry due to the symmetry of the Fisher information matrix. The x-axis shows the number of pseudo data sets M used for calculating the sample mean (shown as solid line) of Eq 7. Gray color indicates the area of the sample mean plus / minus one standard deviation. As the width of the gray area decreases, the accuracy increases, and an acceptable accuracy is reached at values around M = 200.
As the Fisher information is an asymptotic measure and the FIMSS Fisher information matrix is based on the MSS objective function, the next step is to investigate the accuracy of FIMSS compared to FIemp. FIemp is the inverse of the covariance of estimates gained from simulated data. For each experimental design Nsim = 1000 data sets are simulated and 1000 parameter estimates calculated. The experimental designs vary in their inter-sample distance and contain, for better comparison, 100 observations each. Whenever an estimate was greater than 3, it was counted as non-converging. This happened 28 times for Δt = 15, 7 times for Δt = 12.5, and 3 times for Δt = 10. Convergence was achieved for each data set for all remaining inter-sample distances.
Fig 3 shows the 95%-confidence ellipsoid calculated for FIMSS and the first 50 estimates. One can see a distinct change in the shape of the confidence ellipsoid from almost round for small inter-sample distance to rather stretched for larger inter-sample distances. The reason is that higher inter-sample distances allow for a better determination of the steady state and, therefore, for the quotient of the parameters. But, these designs collect less information on the intrinsic fluctuations as the inter-sample distance approaches the auto-correlation time and, therefore, the absolute value can be identified with lower precision only. The FIMSS Fisher information covers this change very well.
Each panel considers one experimental design with varying inter-sample distances Δt and 100 observations. The confidence ellipsoid (red) of the FIMSS is able to represent the shape of the distribution of the estimates.
Fig 4 shows the evaluation of the performance of the FIMSS in comparison to FIemp based on the D- and E-criteria. While the FIMSS covers the dynamics well, there is a bias towards overestimating the information content due to its asymptotic nature. However, it is still accurate enough to allow for experimental design, namely choosing an inter-sample distance that leads to an E-optimal design (Δt = 1) or a D-optimal design, which needs a larger inter-sample distance of 5 to 10. While the FIMSS would suggest the higher value, FIemp would suggest the lower value. This difference is again due to the non-asymptotic scenario and the approximation in the MSS objective function.
FIMSS Fisher information and FIemp are calculated for different inter-sample distances. The solid line is an interpolation of the values of FIMSS and the “X” denote the values of FIemp.
The D-criterion values of FIMSS between Δt = 5 and Δt = 10 are relatively similar. The same holds for the values of FIemp. This demonstrates a good performance of the MSS Fisher information but also that choosing a good design is fairly robust towards the inter-sample distance. This is important information because it means that deviation of the experimental schedule by 5 time units will not strongly influence the quality of the outcome. However, looking at the E-criterion, a deviation of 5 time units from the optimal inter-sampling schedule will have a stronger impact as the values of the E-criterion for e.g. Δt = 0.3 are much smaller than the optimal values at Δt = 1.
Additionally, the performance of FIMSS is also evaluated based on the ARSE (Fig 5). While there is a slight underestimation of the ARSE, again due to the asymptotic nature of the Fisher information, the FIMSS captures the dependency on the inter-sample distance well.
FIMSS and FIemp are calculated for different inter-sample distances. The solid line is an interpolation of the values of FIMSS and the “X” denote the values of FIemp.
The Fisher information matrix can also be used to extract information on the correlation between the parameters θ1 and θ2. Table 1 summarizes the correlation based on the inverse of FIMSS denoted by Corr(FIMSS) and the correlation of the parameter estimates from the simulated data denoted by Corremp for four designs.
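Both correlation measures can be computed in a few lines. The following is a minimal sketch with hypothetical function names: the first function normalizes the inverse of a Fisher information matrix, the second uses the sample correlation of the parameter estimates (one estimate per row).

```python
import numpy as np

def correlation_from_fim(fim):
    """Parameter correlation matrix implied by the inverse Fisher information."""
    cov = np.linalg.inv(np.asarray(fim, dtype=float))
    sd = np.sqrt(np.diag(cov))
    return cov / np.outer(sd, sd)

def correlation_from_estimates(estimates):
    """Sample correlation matrix of parameter estimates (rows = estimates)."""
    return np.corrcoef(np.asarray(estimates, dtype=float), rowvar=False)

# Toy example: a positive off-diagonal information entry implies negative correlation.
corr = correlation_from_fim([[2.0, 1.0], [1.0, 2.0]])
```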
An evaluation of the new FIMSS Fisher information matrix with M = 1000 takes less than 1 minute on an Intel Core i7-3770 CPU with 16GB RAM using one core.
Comparison with the benchmark approach.
The approach of  is applied to the Immigration-Death model as a benchmark. The Fisher information matrix FIBench is calculated as well as 1000 parameter estimates for the same data set used for the MSS method. FIemp,Bench is calculated from these estimates. As for the MSS method, estimates with a θ1 > 3 are counted as non-converging. Out of the 1000 estimates, 2 were non-converging for Δt = 10, 9 for Δt = 12.5 and 33 for Δt = 15.
Fig 6 shows the first 50 estimates and the 2-dimensional confidence ellipsoid of FIBench. The accuracy of the parameter estimates is comparable to the MSS method (Fig 3) as the estimates have a similar distance to the true value (1, 0.1). The Fisher information FIBench leads to a 2-dimensional confidence interval that captures the location of the estimates similarly well as FIMSS.
Each panel considers one experimental design with varying inter-sample distances Δt and 100 observations. The confidence ellipsoid (red) of the FIBench corresponds well with the location of the parameter estimates.
Figs 7 and 8 and Table 2 confirm that the benchmark approach is able to capture the changes in the volume (D-criterion), the longest axis (E-criterion), the ARSEs as well as the correlation. Therefore, both the MSS and the benchmark are well suited for parameter estimation and a calculation of a Fisher information matrix for this model.
FI⋅ Fisher information and FIemp,⋅ are calculated for different inter-sample distances. The solid line is an interpolation of the values of FI⋅ and the “x” denote the values of FIemp,⋅. Red color corresponds to the MSS method, blue to the benchmark and black to the exact method. Symbols partially overlapping.
FI⋅ and FIemp,⋅ are calculated for different inter-sample distances. The solid line is an interpolation of the values of FIex and the “X” denote the values of FIemp,⋅. Red color corresponds to the MSS method, blue to the benchmark and black to the exact method. Symbols partially overlapping.
Comparison with an exact method.
The Immigration-Death example with the above mentioned parametrization is small enough to apply a state truncation and an exact method (ex) for the parameter estimation as well as the calculation of the Fisher information matrix. This approach is based on an analytical calculation of the transition probabilities P(νi; νi−1, θ) as described in : (27) (28)
See S1 Text for details. The sum ∑νi,νi−1 is infinite. However, it is replaced by ∑νi,νi−1 ≤ 30 as the probability to reach a higher number is very small even for large time scales: , 1500 being the longest observation duration used in this simulation study.
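To give a feeling for why a cutoff at 30 is harmless, the following sketch uses the textbook closed form for an immigration-death process with constant immigration rate θ1 and per-capita death rate θ2: the survivors of the previous state are binomially distributed and the surviving immigrants are Poisson distributed. This is an assumption about the model form and not a reproduction of Eqs (27) and (28); the function name is hypothetical, and the tail probability shown is that of the stationary distribution, which is a simpler quantity than the one reported in the text.

```python
import numpy as np
from scipy.stats import binom, poisson

def transition_pmf(n_new, n_old, theta, dt):
    """P(nu_i = n_new | nu_{i-1} = n_old) for an immigration-death process
    with immigration rate theta[0] and per-capita death rate theta[1]."""
    theta1, theta2 = theta
    p_survive = np.exp(-theta2 * dt)               # survival probability of one particle
    lam = theta1 / theta2 * (1.0 - p_survive)      # mean number of surviving immigrants
    k = np.arange(0, min(n_old, n_new) + 1)        # survivors among the old particles
    return float(np.sum(binom.pmf(k, n_old, p_survive) * poisson.pmf(n_new - k, lam)))

theta_true = (1.0, 0.1)
# The stationary distribution is Poisson(theta1 / theta2); mass above the cutoff of 30:
tail_mass = poisson.sf(30, theta_true[0] / theta_true[1])
```

For the true parameter (1, 0.1) the stationary mean is 10, so states above 30 carry a probability mass far below one in a million.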
Next, the exact approach is applied to the Immigration-Death model. The Fisher information matrix FIex is calculated as well as 1000 parameter estimates for the same data set used for the MSS method and the method of . FIemp,ex is calculated from these estimates. As with the MSS method, estimates with a θ1 > 3 are counted as non-converging. Out of the 1000 estimates for each scenario, this happened 2 times for Δt = 1, 2 times for Δt = 10, 9 times for Δt = 12.5 and 32 times for Δt = 15.
Fig 9 shows the first 50 estimates and the 2-dimensional confidence ellipsoid of FIex. The accuracy of the parameter estimates is comparable to the MSS method in Fig 3 (showing that the approximation does not lead to a loss in accuracy). The 2-dimensional confidence intervals describe the change in the cloud of estimates from rather round to rather stretched well and, more importantly, the 2-dimensional confidence intervals are similar to those calculated with FIMSS in Fig 3.
Each panel considers one experimental design with varying inter-sample distances Δt and 100 observations. The confidence ellipsoid (red) of the FIex is similar to the confidence ellipsoid of the FIMSS in Fig 3.
The observation for the D-criterion, the E-criterion and the ARSEs is similar: FIex and FIemp,ex correspond very well. Again, and more importantly, the results from FIMSS and FIex are very similar (see Figs 7 and 8), which shows that the MSS method is able to calculate accurate Fisher information matrices. The same holds for capturing the correlation (MSS results in Table 1, exact method in Table 3).
Fig 7 also shows that all three approaches overestimate the D- and E-criterion values. To ensure that this does not depend on the specific data set, the exact method was also investigated using different data sets (S6 Fig), showing consistent results. The reason for the overestimation seems to be that the Fisher information is an asymptotic measure and this causes difficulties for longer step-sizes in the Immigration-Death model.
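The next example is a Lotka-Volterra model, which shows oscillatory behavior. The model consists of three reactions:

$$Y^{(1)} \xrightarrow{\theta_1} 2\,Y^{(1)}, \qquad Y^{(1)} + Y^{(2)} \xrightarrow{\theta_2} 2\,Y^{(2)}, \qquad Y^{(2)} \xrightarrow{\theta_3} \emptyset \tag{31}$$

where Y(1) and Y(2) denote prey and predator, respectively, and θ1, θ2, θ3 are parameters. The first reaction of Eq (31) is the prey reproduction, the second the predator reproduction, and the third is the predator death. In terms of ODEs this system reads as

$$\frac{dY^{(1)}}{dt} = \theta_1 Y^{(1)} - \theta_2 Y^{(1)} Y^{(2)}, \qquad \frac{dY^{(2)}}{dt} = \theta_2 Y^{(1)} Y^{(2)} - \theta_3 Y^{(2)} \tag{32}$$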
The true parameter is set to θ(0) = (0.5, 0.0025, 0.3) and the initial values to Y = (71, 79) as in .
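Simulated data for such a model are typically generated with the Gillespie algorithm. The following is a minimal sketch, assuming the standard stoichiometry for the three reactions described above (prey reproduction, predator reproduction through predation, predator death); the function name and the `max_events` safeguard are illustrative choices and not part of the paper.

```python
import numpy as np

def gillespie_lotka_volterra(theta, y0, t_end, rng, max_events=1_000_000):
    """Simulate one trajectory of the three-reaction Lotka-Volterra model."""
    # Net state changes: prey birth, predation (prey becomes predator), predator death.
    stoich = np.array([[1, 0], [-1, 1], [0, -1]])
    t, y = 0.0, np.array(y0, dtype=int)
    times, states = [t], [y]
    for _ in range(max_events):
        rates = np.array([theta[0] * y[0], theta[1] * y[0] * y[1], theta[2] * y[1]])
        total = rates.sum()
        if total <= 0.0:                       # both species extinct
            break
        t += rng.exponential(1.0 / total)      # waiting time to the next reaction
        if t > t_end:
            break
        y = y + stoich[rng.choice(3, p=rates / total)]
        times.append(t)
        states.append(y)
    return np.array(times), np.array(states)

rng = np.random.default_rng(0)
times, states = gillespie_lotka_volterra((0.5, 0.0025, 0.3), (71, 79), 40.0, rng)
```

Observations at the design time points can then be read off as the last simulated state before each time point.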
Four different experimental designs of the Lotka-Volterra model are compared to analyze the impact on the amount of information gained in the experiment. The first experimental design LV1 consists of 40 equidistant observations of both prey and predator with an inter-sample distance of 1. As the parameter space is three-dimensional, Fig 10(LV1) shows the three two-dimensional projections of the Nsim = 50 estimates and the confidence ellipsoid from FIMSS for LV1. In general, there is a clear agreement between the estimates and the confidence ellipsoid, which is also reflected in the D- and E-criteria as well as the ARSE as summarized in Table 4 (first row).
Each row shows one of the scenarios LV1 to LV4. In each row the three panels show one two dimensional projection of the three dimensional parameter space. In each panel the black dots are the estimates from simulated data and the confidence ellipsoid from FIMSS is marked red.
Next, an extended observation time frame until T = 200 is considered retaining an inter-sample distance of Δt = 1. This greatly increases accuracy of the estimates and one once again sees a good agreement for the confidence ellipsoids (Fig 10(LV2)) and the optimality criteria (Table 4(second row)). Furthermore, the ARSE is reduced by 50% yielding important information about the benefits in extending the observation time frame.
To investigate whether experimental costs can be reduced by decreasing the number of measured time points while maintaining the information on the parameters, a scan over different inter-sample distances is performed based on an equidistant design with 10 observations (Fig 11). This scan is only performed for the FIMSS Fisher information; a comparison to a covariance from estimates is omitted due to computational time requirements. Based on the optimality criteria, a sample distance of Δt = 7 or Δt = 9 would be preferable to Δt = 1, which was used in LV1 and LV2.
The left panel shows interpolations of values of the D-criterion and the right panel of the E-criterion for different inter-sample distances.
Furthermore, there is no need to restrict the search to equidistant designs; experimental designs with varying inter-sample distances can be considered as well. Based on the D-criterion, an optimal design with 10 observation points is calculated. This leads to an optimization problem as in Eq (8) with the set of time points T as optimization variable. Here, a particle swarm algorithm is chosen for the optimization with 20 iterations, 25 particles and M = 500. The potential inter-sample distances are limited to be within 1 to 12, as the evaluation of the D- and E-criterion (Fig 11) does not suggest higher values. The resulting optimal design is denoted by LV3. It is assumed that the experimental set-up does not allow for arbitrary precision with respect to the observed time points and that observations may only be recorded at integer time points. The resulting optimal design consists of the time points T = (0, 7, 14, 22, 28, 36, 43, 47, 59, 66, 73). A three-fold reduction in observation time points has no impact on the accuracy, as evident in the comparison of confidence ellipsoids and estimates (Fig 10(LV3)) and in the evaluation of optimality criteria (Table 4 (third row)). Depending on the experimental set-up, this is a huge reduction in costs. As the FIMSS Fisher information captures this gain precisely, it is a valuable tool to reduce experimental costs.
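One way an optimizer could handle the integer and range constraints is to search over inter-sample distances and project every candidate onto the admissible set before evaluating the criterion. The paper does not state how the constraints were encoded, so the following sketch with the hypothetical helpers `design_from_gaps` and `project_gaps` is only one possible parametrization.

```python
import numpy as np

def design_from_gaps(gaps, t0=0):
    """Turn inter-sample distances into observation time points."""
    return np.concatenate(([t0], t0 + np.cumsum(gaps)))

def project_gaps(raw_gaps, lower=1, upper=12):
    """Round candidate gaps to integers and clip them to the allowed range."""
    return np.clip(np.rint(raw_gaps), lower, upper).astype(int)

# The design LV3 reported in the text, expressed through its gaps.
lv3 = np.array([0, 7, 14, 22, 28, 36, 43, 47, 59, 66, 73])
lv3_gaps = np.diff(lv3)                    # 7, 7, 8, 6, 8, 7, 4, 12, 7, 7
candidate = design_from_gaps(project_gaps([0.2, 6.6, 15.0]))
```

Expressed this way, the reported design uses gaps between 4 and 12 time units, so it respects the stated bounds of 1 to 12.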
The Lotka-Volterra model offers the chance to investigate scenarios with partial observation, namely scenarios in which only one species can be observed. The experimental design LV4 has the identical set-up as LV1, except that only the prey is observed. As Fig 10(LV4) demonstrates, the confidence ellipsoid of the new FIMSS captures the shape of the estimates precisely even in partially observed scenarios. Interestingly, the correlation between the parameters θ1 and θ2 as well as θ1 and θ3 changes from positive in the fully observed scenarios to negative in the partially observed case. This is represented in the changed spatial orientation of the estimates’ cloud for LV4 when compared to the other scenarios in Fig 10. Again, the FIMSS is able to capture this change. This is highlighted in Fig 12, where the three-dimensional ellipsoids are compared to the cloud of estimates for LV1 and LV4. Furthermore, the D- and E-criteria as well as the ARSE shown in Table 4 support the fact that the FIMSS is even applicable in partially observed cases.
Left panel fully observed scenario LV1, right panel partially observed scenario LV4. In each panel the black dots are the estimates from simulated data and the three dimensional confidence ellipsoid from FIMSS is marked yellow. One can see that the FIMSS Fisher information captures the change in correlation between the parameters.
Table 5 summarizes the evaluation of the correlation between the parameters for different experimental designs. The fully observed designs (LV1-LV3) show moderate levels of correlation for the estimates, which is mapped by the FIMSS Fisher information in the cases of LV1 and LV2. Due to the FIMSS being an asymptotic measure a sparse sample of ten observed time points is likely the cause for a reduced precision in LV3. The design LV4 has a strong correlation between the second and third component of the parameter vector and the FIMSS Fisher information captures this very well. Furthermore, the sign change of the correlation between the fully observed scenarios to the partially observed case is also represented in the FIMSS.
The computational evaluation of the FIMSS Fisher information matrix with M = 1000 takes approximately 22 minutes for LV1 and 2 hours for LV2 on an Intel Core i7-3770 CPU with 16GB RAM using one core. The increase in computational time by a factor of 5 is due to the length of the time series. As LV3 contains even fewer points than LV1, its evaluation is even faster and takes 6 minutes. At 30 minutes, the evaluation of LV4 takes slightly longer than that of LV1 because additional derivatives for the unobserved initial state have to be calculated.
Comparison with benchmark approach.
The benchmark method  is applied to the LV1 scenario. First, the Fisher information FIBench is calculated; next, 50 estimates are computed from the simulated data set. These estimates are used to construct a covariance matrix and its inverse FIemp,Bench. Table 6 shows the comparison of D- and E-criterion as well as ARSE of FIBench, FIemp,Bench, FIMSS and FIemp. The accuracy of the parameter estimation with the benchmark is similar to the MSS (last column in Table 6). A Wilcoxon signed-rank test is applied to see whether the small differences in ARSE are significant. The benchmark is significantly better for θ1 in the scenario with 5 observations (0.1 ≥ p ≥ 0.01). The MSS method is significantly better for θ1 (20 observations), θ2 (20, 30 and 40 observations) and θ3 (30 and 40 observations) and strongly significantly better (p < 0.01) for θ1 (30 and 40 observations).
More importantly, the benchmark Fisher information matrix FIBench exhibits a strong overestimation of the precision for larger observation horizons (30 and 40) as its values for the D- and E-criterion are a lot higher than the corresponding values of FIemp,Bench. The same holds for the parameter-wise ARSEs, which are strongly underestimated.
Fig 13 shows the 2-dimensional projections of the parameter estimates for MSS and benchmark. For small observation horizons (≤ 20) the FIBench performs slightly worse than the MSS method but it roughly captures the location of the estimates. However, for larger observation horizons (30 and 40) the 2-dimensional confidence ellipsoid of FIBench is too small to capture the location of the estimates. Table 7 shows that this is not only a problem of the size of the ellipsoid but also the correlations, as they are also not well reflected.
Each row shows an experimental design. In each row, each panel shows one two dimensional projection of the three dimensional parameter space. In each graphic the black dots are the estimates from MSS and the green dots from Bench. The confidence ellipsoid of FIMSS is marked red and the confidence ellipsoid of FIBench green. The confidence ellipsoid of FIBench for the last row with 30 observations is so small that it can hardly be seen.
The striking difference in performance can be explained as follows: Both methods rely on approximations, namely, the MSS method on an interval-wise LNA and the benchmark on a LNA over the whole time horizon of the system. As mentioned in the introduction, the second is a lot more restrictive than the first. Whether the approximation holds can be easily tested. The benchmark’s approximation requires ν ∼ MVN(μ, ΣB) (see Eq 19). If this is fulfilled, then it follows that (ν − μ)AB with (AB′ AB)−1 = ΣB is a vector of independent standard normally distributed random variables. A Kolmogorov-Smirnov test can be applied to test this. Similarly, the MSS method requires νi ∼ N(x(Δi; θ, νi−1), Σ(Δi; θ)) for i = 1, …, n, which implies that (νi − x(Δi; θ, νi−1))A, with (A′ A)−1 = Σ(Δi; θ), is standard normally distributed for i = 1, …, n. This can also be tested by a Kolmogorov-Smirnov test.
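Such a check can be sketched in a few lines. The example below assumes that a Cholesky factor of the covariance matrix is an admissible choice for the whitening transformation; the function names and the synthetic mean and covariance are illustrative and do not come from either model.

```python
import numpy as np
from scipy.stats import kstest

def whitened_residuals(obs, mean, cov):
    """Map obs ~ N(mean, cov) to independent standard normal variables."""
    chol = np.linalg.cholesky(cov)                    # cov = chol @ chol.T
    return np.linalg.solve(chol, np.asarray(obs) - np.asarray(mean))

def gaussian_assumption_pvalue(obs, mean, cov):
    """Kolmogorov-Smirnov p-value of the whitened residuals against N(0, 1)."""
    return kstest(whitened_residuals(obs, mean, cov), "norm").pvalue

rng = np.random.default_rng(1)
dim = 200
mean = np.full(dim, 10.0)
base = rng.normal(size=(dim, dim))
cov = base @ base.T / dim + np.eye(dim)               # synthetic positive definite covariance
obs = rng.multivariate_normal(mean, cov)

p_ok = gaussian_assumption_pvalue(obs, mean, cov)          # correctly specified model
p_bad = gaussian_assumption_pvalue(obs, mean, 0.04 * cov)  # covariance understated 25-fold
```

A covariance that is far too small, which corresponds to an overestimated precision, inflates the whitened residuals and drives the p-value towards zero.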
Fig 14 shows the p-values of the Kolmogorov-Smirnov test for the benchmark (left panel) and the MSS (right panel) in dependence of the total observation horizon. One can clearly see that the MSS method's assumption is not significantly violated but the benchmark’s assumption clearly fails with increasing observation duration. This shows the strong benefits of applying the multiple shooting approach and using the LNA only on the intervals between observations. Fig 14, therefore, also explains the differences in performance for the benchmark Fisher information FIBench and the MSS Fisher information FIMSS.
P-values for Kolmogorov-Smirnov tests whether the approximation is fulfilled for different observation horizons; left panel shows results for benchmark and right panel for MSS. Each color stands for one of the 50 data sets. Test is performed with the true parameter θ = (0.5, 0.0025, 0.3).
Fig 15 shows the comparison of the mean and the mean plus and minus two standard deviations calculated from the LNA and 100 stochastic simulations over time, demonstrating why the performance of the LNA decreases with increasing observation duration. The LNA yields an accurate approximation in the beginning (until time 20) but does not lead to an accurate description for any later time points (from time 20 onwards).
The upper row shows 100 stochastic simulations in gray color and the mean from a LNA in solid red color as well as mean plus and minus two standard deviations in dashed red color. The lower row shows p-values of a Kolmogorov-Smirnov test for each time point whether the 100 stochastic simulations follow a normal distribution with mean and variance from a LNA. The solid line at a p-value of 0.05 illustrates that all values below show significant differences to the LNA approximation. One can see that the quality of the LNA approximation decreases over time. Test is performed with the parameter θ = (0.5, 0.0025, 0.3).
Calcium oscillation model
The third model used to evaluate the new approach is a Calcium oscillation model : (33) where Ca(t) stands for cytosolic Calcium, G(t) for the active subunit of the G-protein and PLC(t) for the activated form of phospholipase C . The behavior of this model differs qualitatively between stochastic and deterministic modeling for small particle numbers as presented in . The true parameter vector is (34) and the initial value is (Ca, G, PLC)(0) = (10, 10, 10). This model shows highly nonlinear oscillations in stochastic modeling but only small amplitude regular oscillations in deterministic modeling (Fig 1). Therefore, this model is excellent for testing any methods analyzing models with intrinsic stochasticity. Moreover, Calcium oscillations are of high practical relevance in cell development and death as well as fertilization .
Even though the system is highly nonlinear, the FIMSS can be calculated with a moderate number of M = 400 pseudo data sets, as representatively shown for the 2 × 2 entry in Fig 16. The remaining entries of the FIMSS can be found in S3 Fig. The fact that the FIMSS can be calculated with a moderate number of pseudo data sets even in highly nonlinear systems is essential as otherwise the computational costs would be too high for performing experimental design, which needs multiple FIMSS calculations. The respective plots for the partially observed case can all be found in S4 Fig.
The x-axis shows the number of pseudo data sets M used for calculating the 2 × 2 entry of FIMSS, the mean is shown as solid line. Gray color shows the area from sample mean plus / minus one standard deviation. As the width of the gray area is decreasing, the accuracy is increasing. One can see that values as small as M = 400 already give a good approximation.
Fig 17 shows the consensus between the FIMSS Fisher information matrix and the 2-dimensional projections of the cloud of the Nsim = 50 estimates from simulated data. Each panel shows one 2-dimensional projection of the parameter space. As with the previous models, there is a nice agreement between the FIMSS Fisher information and the shape and the size of the cloud of estimates for all projections.
The panels show the two dimensional projections of the 12-dimensional parameter space for the Δt = 0.5 design. In each panel the black dots are the estimates from simulated data and the confidence ellipsoid of the Fisher information is marked red. “k” is used as an abbreviation for “thousand”.
Table 8 shows the consensus between FIMSS and FIemp on the D- and E-criterion. The FIMSS Fisher information captures the volume reduction of the confidence ellipsoid (D-criterion) with larger inter-sample distance. The FIMSS Fisher information also demonstrates robustness of the minimal eigenvalue (E-criterion) towards changes in the inter-sample distance. This means that the newly defined FIMSS Fisher information matrix is able to capture key properties of experimental design in systems with highly stochastic oscillations. The fact that the D-criterion changes, while the E-criterion is almost constant over the three experimental designs, means that the volume of the confidence ellipsoid reduces while its axis in the parameter space with the smallest precision does not improve. A similar effect has been observed for the Immigration-Death model with designs ID-Δt = 1.0 and ID-Δt = 7.5 in Fig 3.
An investigation of the ARSE for the three different designs (Table 9) shows that the FIMSS performs equally well for all three designs.
In contrast to the D-criterion, which improves with increasing inter-sample distance, the ARSE remains fairly constant. A similar phenomenon could be observed in the Immigration-Death model (Fig 3) where the ARSE is nearly the same for Δt = 0.5 and Δt = 7.5 while the volume is a lot smaller for the second design.
During the analysis of the Calcium model one potential drawback of the LNA based MSS objective function was encountered. The LNA in the MSS objective function breaks down for large inter-sample distances as the influence of nonlinear effects on the dynamics increases. In addition to the theoretical condition on the LNA (discussed in detail in ), there is an easy way to detect such situations: Calculate M1 = FIMSS(ϑ, T) and M2 = FIMSS(ϑ, T′) with designs T = (t0, t1, t2, …, tn) and . If the LNA holds, the entries of M1 and M2 should have a similar size. Therefore, the quotients of the diagonal elements of M1 and M2 are calculated for comparison. The mean and two standard deviations serve as an indication of how close these values are to 1 (which would indicate a similar size). If they are close to 1, the choice of the time step does not influence the result. Their mean and standard deviation are 1.02 ± 0.20 for Δt = 0.10, 1.00 ± 0.15 for Δt = 0.20 and 0.72 ± 0.48 for Δt = 0.50. The LNA no longer holds for Δt = 1.0 (4.9 × 10−5 ± 1.9 × 10−4), Δt = 1.5 (2.2 × 10−8 ± 3.9 × 10−8) and Δt = 2.0 (1.0 × 10−9 ± 6.0 × 10−10). Therefore, experimental designs with inter-sample distances of Δt = 1.0 or higher cannot be recommended. For the Lotka-Volterra and Immigration-Death model, there was no such indication and the LNA was valid for all considered step-sizes.
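The quotient check itself is a few lines of code once the two matrices are available. The sketch below takes M1 and M2 as given inputs; the function names, the use of the sample standard deviation and the tolerance are assumptions made for illustration and are not values from the paper.

```python
import numpy as np

def diagonal_quotient_summary(m1, m2):
    """Mean and sample standard deviation of the quotients of the diagonal entries."""
    quotients = np.diag(m1) / np.diag(m2)
    return float(quotients.mean()), float(quotients.std(ddof=1))

def lna_looks_consistent(m1, m2, tol=0.5):
    """Heuristic flag: the mean quotient should stay close to 1.
    The tolerance is an illustrative choice, not a value from the paper."""
    mean_q, _ = diagonal_quotient_summary(m1, m2)
    return abs(mean_q - 1.0) <= tol

# Toy matrices: similar diagonals pass, diagonals that differ by orders of magnitude fail.
m_ref = np.diag([100.0, 200.0, 300.0])
ok = lna_looks_consistent(np.diag([110.0, 190.0, 300.0]), m_ref)
broken = lna_looks_consistent(np.diag([0.005, 0.01, 0.01]), m_ref)
```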
The Calcium model also indicates that it is more consistent to use the scheme of Eq (16) for generating the pseudo data than the Gillespie algorithm. As even the interval-wise LNA becomes critical with longer inter-sample distances, one can either use a rough model approximation with the MSS and then calculate the Fisher information consistently with the scheme, or use the correct (Gillespie) model and a rough approximation for the Fisher information. As the calculation of the Fisher information includes derivatives, the first option seems to be more robust towards rough approximations. S7 Fig creates the same plot as in Fig 17 but with Gillespie simulations instead of the scheme. The result still shows a good agreement of FIemp and FIMSS, but the use of the scheme is favorable as the consistency between FIemp and FIMSS is better in Fig 17. The D-criterion for FIMSS with the Gillespie algorithm is 1.1 × 10−6, which underlines that the scheme is better suited (Table 8 shows that the D-criterion of FIemp is 6.6 × 10−10 and the D-criterion of FIMSS with the scheme is 1.2 × 10−9). Whenever the interval-wise LNA is not rough, there is no statistical difference between pseudo data from the scheme or the Gillespie algorithm, so the choice does not matter.
To evaluate the parameter correlation calculated by the FIMSS, correlation matrices are composed of the FIMSS and the estimates for a design with an inter-sample distance of Δt = 0.1. The consensus is evaluated based on the absolute values of their difference, which is illustrated in Fig 18(A).
A shows the fully observed Calcium oscillation model and B the partially observed scenario. Two correlation matrices are calculated for each of the cases, one from FIMSS and the other from the estimates. The absolute value of their differences is color-coded.
An evaluation of the Fisher information matrix with M = 1000 takes roughly 6 hours on an Intel Core i7-3770 CPU with 16GB RAM using eight cores.
The state estimation procedure for the partially observed case has been developed and shown to work well previously . However, in highly nonlinear stochastic models with few observables, imprecise state estimates are in principle possible.  (Fig 5) shows that, even in such case, the method can extract information from current data and re-adapt the estimates of the unobservables to their underlying dynamics. Nevertheless, it means that there is at least one poor state estimate which might have resulted in one very unlikely transition to the following data point. This unlikely transition leads to a very small probability, which results in a very high negative log-likelihood value and in an unrealistically high term of the Fisher information matrix.
To circumvent this issue, the transitions between time points ti−1 and ti are determined for which holds for all observed components k, with q15 being the 10−15 quantile of a standard normal distribution. For time steps that do not fulfill this condition, the corresponding components of the MSS objective function derivatives are set to zero and not counted for FIMSS, which means that these time intervals are disregarded. Using the 10−15-quantile means that only very strong outliers are not counted as information in the FIMSS Fisher information matrix. In fact, when calculating the FIMSS for a scenario in which only Calcium is observable, this happened on average for 2.8 in 100 intervals over the M = 1000 pseudo data sets; for a scenario in which both Calcium and PLC are observable, only 0.5 in 100 intervals failed this condition. No occurrences were detected in the partially observed Lotka-Volterra system. Furthermore, the components of the state estimates are bounded from below by 0.1 for numerical reasons.
Investigating the ARSE for the partially observed scenario showed a good agreement between the FIMSS and the estimates for most of the parameters (Table 10). The deviations for e.g. θ10 and θ11 might indicate that the landscape of the parameter space contains some strong nonlinearities which cannot be captured by the Fisher information matrix as it is, by definition, a quadratic measure. These nonlinearities could lead to non-identifiabilities that are not captured by the Fisher information. A similar phenomenon has been observed previously  and the use of the profile likelihood techniques  has been suggested to improve the analysis.
However, the important gain from this analysis is that the FIMSS Fisher information matrix can be used to compare the three designs, fully observed, Calcium and PLC observed, and only Calcium observed. This comparison leads to the insight that the additional measurement of PLC gives a modest increase in accuracy compared to only measuring Ca. The further additional measurement of G (leading to a full observation) does not have a remarkable impact on the information and accuracy anymore. This means that one can easily save the cost of measuring G. Depending on the costs for measuring PLC, a compromise between accuracy and cost can be reached. The newly suggested FIMSS Fisher information matrix covers these differences well and can, therefore, serve as a valuable instrument in deciding on the design of an experiment.
The correlation structure between the parameters can also be reproduced as shown in Fig 18B for the scenario with observation of Calcium and PLC.
This work introduces an approach to calculate a FIMSS Fisher information matrix for stochastic models based on the MSS objective function . The FIMSS approach is able to successfully capture important experimental design properties such as precision and correlation in challenging models. Furthermore, it allows the comparison of the information content of different experimental designs and thereby the choice of an optimal design. The article demonstrates that these features hold for highly nonlinear models that might even show a qualitatively different behavior in stochastic modeling than in deterministic modeling. Therefore, the method is particularly suited for application on signaling pathways in systems biology.
The calculation of the FIMSS Fisher information is based on the MSS objective function [13, 14]. The MSS objective function treats the intervals between succeeding observations separately. On each interval a LNA is used and the unobserved states are updated with a state estimation procedure. The FIMSS Fisher information is calculated based on this MSS objective function and the use of pseudo data which is gained by the same MSS approximation.
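For the fully observed Immigration-Death model, the interval-wise idea can be written down compactly because the LNA mean and variance equations have a closed-form solution on each interval. The following is a minimal sketch under that assumption (immigration rate θ1, per-capita death rate θ2, variance reset to zero at every observation); it is not the author's implementation, needs no state estimation because nothing is unobserved, and all function names are hypothetical.

```python
import numpy as np

def lna_interval_moments(nu_prev, theta, dt):
    """LNA mean and variance after dt, started at the observed value nu_prev
    with zero initial variance (Immigration-Death model)."""
    theta1, theta2 = theta
    p = np.exp(-theta2 * dt)
    mean = nu_prev * p + theta1 / theta2 * (1.0 - p)
    var = nu_prev * p * (1.0 - p) + theta1 / theta2 * (1.0 - p)
    return mean, var

def interval_wise_objective(theta, observations, dt):
    """Sum of Gaussian negative log-likelihood terms, one per inter-sample interval."""
    obs = np.asarray(observations, dtype=float)
    mean, var = lna_interval_moments(obs[:-1], theta, dt)
    return float(0.5 * np.sum(np.log(2.0 * np.pi * var) + (obs[1:] - mean) ** 2 / var))

value = interval_wise_objective((1.0, 0.1), [10, 11, 9, 10], dt=1.0)
```

Because each interval restarts from the observed value, the Gaussian approximation only has to hold over one inter-sample distance rather than over the whole time horizon.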
The dependency of the FIMSS precision on the number of pseudo data sets used for the calculation was investigated. As illustrated in Figs 2 and 16 and S1 Fig to S4 Fig, a few hundred pseudo data sets are sufficient to obtain a good approximation. As there are many evaluations of the Fisher information matrix involved in finding the optimal experimental design, this is a critical characteristic of the new method.
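The role of the number of pseudo data sets M can be illustrated with a generic Monte Carlo estimator of a Fisher information matrix. The sketch below averages outer products of finite-difference scores over pseudo data drawn from an interval-wise Gaussian approximation of the Immigration-Death model. Whether the paper uses outer products or second derivatives is not stated in this section, so the estimator, the clipping of pseudo data at zero and all names are assumptions made for illustration.

```python
import numpy as np

def moments(nu_prev, theta, dt):
    """Interval-wise mean and variance for the Immigration-Death model."""
    p = np.exp(-theta[1] * dt)
    lam = theta[0] / theta[1] * (1.0 - p)
    return nu_prev * p + lam, nu_prev * p * (1.0 - p) + lam

def objective(theta, obs, dt):
    """Gaussian negative log-likelihood summed over all intervals."""
    mean, var = moments(obs[:-1], theta, dt)
    return 0.5 * np.sum(np.log(2.0 * np.pi * var) + (obs[1:] - mean) ** 2 / var)

def pseudo_data(theta, nu0, n_obs, dt, rng):
    """Draw each observation from the interval-wise Gaussian approximation."""
    obs = [float(nu0)]
    for _ in range(n_obs):
        mean, var = moments(obs[-1], theta, dt)
        obs.append(max(rng.normal(mean, np.sqrt(var)), 0.0))  # keep the state non-negative
    return np.array(obs)

def score(theta, obs, dt, rel_step=1e-4):
    """Central finite-difference gradient of the objective."""
    theta = np.asarray(theta, dtype=float)
    grad = np.zeros(theta.size)
    for j in range(theta.size):
        step = np.zeros(theta.size)
        step[j] = rel_step * theta[j]
        grad[j] = (objective(theta + step, obs, dt)
                   - objective(theta - step, obs, dt)) / (2.0 * step[j])
    return grad

def monte_carlo_fim(theta, nu0, n_obs, dt, n_pseudo, rng):
    """Average outer product of the scores over n_pseudo pseudo data sets."""
    scores = np.array([score(theta, pseudo_data(theta, nu0, n_obs, dt, rng), dt)
                       for _ in range(n_pseudo)])
    return scores.T @ scores / n_pseudo

fim = monte_carlo_fim((1.0, 0.1), 10, 100, 1.0, 400, np.random.default_rng(2))
```

Repeating the call for increasing `n_pseudo` and plotting one matrix entry against it gives the kind of convergence picture shown in Figs 2 and 16.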
The Fisher information is an asymptotic description of the inverse of the covariance matrix of a maximum likelihood estimator. In the stochastic case in particular, it is also approximate. Thus, it is very important to investigate whether its accuracy is still satisfactory under realistic (particularly finite) data scenarios. Therefore, this work compares the FIMSS Fisher information matrix to FIemp, the inverse of a covariance matrix calculated from parameter estimates. These parameter estimates are gained by performing parameter estimations on simulated data sets. Both FIMSS and FIemp are then compared based on
- (a) two-dimensional projections of the confidence ellipsoids and the estimates, which is easiest for visualization,
- (b) optimality criteria such as determinant (corresponding to volume of confidence ellipsoid) and minimal eigenvalue (corresponding to the largest axis of the confidence ellipsoid),
- (c) average relative squared errors and
- (d) the correlation structure.
All this is solely done to evaluate the accuracy of the suggested methodology. There is no need for the comparison in real life applications where it is enough to calculate the FIMSS Fisher information matrix for designing experiments.
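The reference quantity FIemp used in these comparisons can be computed directly from the parameter estimates. A minimal sketch with a hypothetical function name, in which runs flagged as non-converged are dropped before the covariance is formed:

```python
import numpy as np

def empirical_information(estimates, converged=None):
    """Inverse of the sample covariance matrix of the parameter estimates
    (rows = estimates, columns = parameters)."""
    est = np.asarray(estimates, dtype=float)
    if converged is not None:
        est = est[np.asarray(converged, dtype=bool)]   # drop non-converged runs
    return np.linalg.inv(np.cov(est, rowvar=False))

# Toy estimates; the last row mimics a non-converged run and is excluded.
toy = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0], [10.0, 1.0]])
fi_emp = empirical_information(toy, converged=toy[:, 0] <= 3.0)
```

The resulting matrix can be passed to the same criteria (determinant, smallest eigenvalue, correlation) as a Fisher information matrix.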
Three test models were used to demonstrate the power of the newly suggested FIMSS Fisher information matrix: an Immigration-Death model, a Lotka-Volterra model, and a Calcium oscillation model. The newly defined FIMSS Fisher information matrix proved to be successful for all four test measures (a-d). Figs 3, 10 and 17 show that it reflects the shape and size of the two-dimensional projections of the confidence intervals. Furthermore, Fig 4 and Tables 4 and 8 show that it covers the volume (D-criterion) and the largest axis (E-criterion) of the multi-dimensional confidence ellipsoid. The average relative squared error (ARSE) is covered precisely as well (Fig 5 and Tables 4 and 9). Additionally, the correlation structure is reflected accurately (Tables 1 and 5 and Fig 18).
During the evaluation of the Immigration-Death model a larger optimal inter-sample distance was obtained based on the D-criterion compared to the ARSE (Fig 4). Depending on the experimenter’s interest, the newly introduced FIMSS Fisher information matrix aids in choosing an appropriate experimental design. The analysis of the Lotka-Volterra model demonstrates the gain in precision by extending the observed time frame (five-fold from LV1 to LV2). An even greater gain can be achieved by allowing for non-equidistant designs. Here, a similar amount of information can be obtained with 10 observations compared to 40 observations at equidistant time intervals. Depending on the experimental set-up, this is a huge reduction in costs. The Calcium model showed that there is an increase in information when measuring PLC and Calcium instead of only Calcium. Measuring also G (hence all three variables) does not lead to a strong further increase in information. As this analysis can be run before performing any experiments, the Fisher information is a very valuable tool for experimental design. The FIMSS Fisher information matrix extends this applicability to signaling pathways with high nonlinearity and intrinsic stochastic effects that lead to a qualitatively different behavior from the deterministic solution.
The approach of  uses the expected Kullback-Leibler divergence between prior and posterior distribution to measure the information content of an experiment. A potential lack of prior knowledge about the parameter can be handled with an uninformative prior. This is an advantage compared to the MSS method using the Fisher information matrix, which is a parameter-dependent measure. However,  uses Monte Carlo simulations thrice to explore a) the parameter space, b) the observation space and c) the state space. While the additional computational cost of a) leads to a broader applicability (in case of poor prior knowledge regarding the parameter) and the cost of b) is comparable to the MSS method’s simulation cost, avoiding the additional computational cost of c) is a critical advantage of the MSS method, especially in signaling pathways with fast dynamics and a huge state space such as the Calcium model (in which the states of all three components take values from 0 to 10 000 within a few time units).
 is suited for an experimental setup with multiple measurements per time point, comparing their moments with parametrized theoretical moments based on a moment closure without the use of simulations. The MSS approach differs in that it is suited to experiments with measurements from only one time course (and not multiple measurements per time point) and it uses simulations to generate the pseudo data for the FIMSS. Furthermore, MSS employs an LNA in contrast to moment closure; see  for a comparison of LNA and moment closure, which are both used to calculate moments of stochastic systems.
In contrast to other recent approaches from [11, 25], the FIMSS Fisher information matrix only needs the LNA on the relatively short time interval between two succeeding measurement points. This makes it less restrictive than an LNA on the whole time horizon, as a comparison with a benchmark () has shown. This benchmark treats the observations as samples from a multivariate normal distribution with a mean equaling the deterministic solution and a covariance matrix containing all inter-temporal covariances. The LNA is applied to calculate these inter-temporal covariances. If the system can be approximated with an LNA over the whole time horizon, the benchmark approach has two advantages: a) it takes the inter-temporal correlations into account, which provide additional information that cannot be exploited with the MSS method, and b) it does not need Monte Carlo simulations for the calculation of a Fisher information matrix. It needs only one ODE solution and one calculation of the inter-temporal covariances, which increases computational speed. This is also of benefit for parameter estimation because  needs only one computation of the ODE and of the inter-temporal covariance system, independent of the number of single-cell trajectories. However, the number of rows and columns of the inter-temporal covariance matrix scales with the product of the number of time points and the number of components.
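The reason such a Gaussian benchmark needs no Monte Carlo simulation can be seen from the standard closed-form Fisher information of a multivariate normal distribution with parameter-dependent mean and covariance. The notation below is introduced here for illustration and is not the article's: y stacks all observations (n time points, d components), μ(θ) is the deterministic solution at the measurement times, and Σ(θ) collects the inter-temporal covariances obtained from the LNA.

```latex
y \sim \mathcal{N}\bigl(\mu(\theta),\, \Sigma(\theta)\bigr),
\qquad y,\ \mu(\theta) \in \mathbb{R}^{nd},
\qquad \Sigma(\theta) \in \mathbb{R}^{nd \times nd},

\mathcal{I}_{ij}(\theta)
  = \frac{\partial \mu^{\top}}{\partial \theta_i}\,
    \Sigma^{-1}\,
    \frac{\partial \mu}{\partial \theta_j}
  + \frac{1}{2}\,
    \operatorname{tr}\!\left(
      \Sigma^{-1} \frac{\partial \Sigma}{\partial \theta_i}\,
      \Sigma^{-1} \frac{\partial \Sigma}{\partial \theta_j}
    \right).
```

The first term uses only sensitivities of the ODE solution, the second only sensitivities of the covariance; the nd × nd size of Σ is the scaling issue noted above.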
The results of the Immigration-Death model show a similar performance of MSS and the benchmark method regarding parameter estimation and the calculation of the Fisher information matrix. As there is little inter-temporal correlation in the model, the benefit “a)” of the benchmark method is small. The Lotka-Volterra model shows an acceptable performance of the benchmark for small observation horizons. However, the MSS method performs better even here (Fig 13). For larger observation horizons (30 and 40) the difference becomes striking (last two rows of Fig 13). The benchmark is no longer able to reflect the location of the estimates, while the MSS method still describes it accurately. The reason is that the benchmark’s approximation is no longer valid, while the less restrictive approximation of the MSS method still holds. This has been evaluated using a Kolmogorov-Smirnov test (Fig 14). Comparing this figure to S5 Fig, which shows the results for the Immigration-Death model, explains the benchmark’s acceptable performance for the Immigration-Death model and its poor performance for the Lotka-Volterra model. Fig 13 also shows that the accuracy of the parameter estimation is less affected than the accuracy of the Fisher information, possibly because the variance acts as a weighting factor in the parameter estimation. A very rough approximation of this weighting factor might have little influence (as indicated in ) on the parameter estimation but a stronger influence on the Fisher information matrix, which describes how well the (parameter-dependent) changes in the variance can be exploited.
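A Kolmogorov-Smirnov check of a Gaussian approximation can be sketched as follows. This section does not spell out how the test was set up, so the sketch assumes one plausible variant: observations are standardized with the approximation's predicted mean and variance and then compared with a standard normal distribution. All function names are hypothetical, and the p-value uses the asymptotic Kolmogorov series, which is only rough for small samples.

```python
import math


def normal_cdf(x):
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def standardize(observations, means, variances):
    """Residuals scaled by the predicted mean and variance (assumed setup)."""
    return [(y - m) / math.sqrt(v)
            for y, m, v in zip(observations, means, variances)]


def ks_statistic(sample, cdf=normal_cdf):
    """Largest distance between the empirical CDF and the reference CDF."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # Compare against the empirical CDF just after and just before x.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


def ks_pvalue(d, n, terms=100):
    """Asymptotic p-value: 2 * sum_{k>=1} (-1)^(k-1) * exp(-2 k^2 lam^2)."""
    lam = math.sqrt(n) * d
    if lam < 0.2:
        # The alternating series converges poorly here; the p-value is ~1.
        return 1.0
    total = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
                for k in range(1, terms + 1))
    return min(1.0, max(0.0, 2.0 * total))
```

Small p-values indicate that the standardized residuals are not compatible with the Gaussian approximation, which is the pattern reported for the benchmark at long observation horizons.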
While LNAs over the whole system’s horizon have been successfully used to model single-cell experiments [45, 46], the presented examples demonstrate that this is not generally possible and that a less restrictive approximation such as the MSS is needed. The theoretical conditions for the LNA used in the MSS objective function are discussed in [13, 14] and reviewed in the supporting information S1 Text. Even the interval-wise LNA used in MSS can fail to hold (e.g. when only very few molecules are present in the system); see  for details on theoretical and practical limitations of LNAs. Including higher order terms in the calculation of the covariance can improve the situation . However, this has not been necessary for the presented models. A way to evaluate the validity of the LNA was introduced using the Calcium model. The time steps for the creation of the pseudo data sets were varied and the impact on the entries of the FIMSS was compared. This control helps to identify designs that carry a high amount of information and for which the MSS objective function is applicable for parameter estimation.
Work by  and  suggests an objective function using an LNA embedded in a Kalman filtering framework. This approach is similar to the MSS objective function as it also treats intervals between measurements separately and uses an LNA. However, the MSS method is more general as the state updating formulation (Eq 14) could also be straightforwardly extended to non-Gaussian measurement noise. In case of Gaussian measurement noise the state updating formula (Eq 14) is equal to the state updating of [49, 50]. Differences can be found in the initialization of the variance/covariance for each interval. The MSS initializes with 0 (as in Eq 12), whereas [49, 50] use a Kalman filter recursion. This is an alternative to the updating procedure for MSS and might allow for a more precise description of the variance and covariance. Therefore, it might be a promising objective function to be incorporated into an experimental design framework. However, this approach has not yet been used for calculating a Fisher information matrix or for experimental design.
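The idea of restarting the LNA on each interval with zero initial variance can be illustrated on an immigration-death process. The sketch below is a simplification under stated assumptions, not the article's MSS objective: it assumes the standard reaction scheme (immigration at constant rate theta1, death at rate theta2 times the population), full observation without measurement noise, and hypothetical function names. For this linear model the LNA moment equations dm/dt = theta1 - theta2 m and dV/dt = -2 theta2 V + theta1 + theta2 m with V(0) = 0 have the closed-form solution used here.

```python
import math


def lna_interval(x0, theta, dt):
    """Mean and variance after time dt, starting from state x0.

    The variance is reset to 0 at the start of the interval.
    """
    theta1, theta2 = theta
    stationary_mean = theta1 / theta2
    decay = math.exp(-theta2 * dt)
    mean = stationary_mean + (x0 - stationary_mean) * decay
    var = (1.0 - decay) * (stationary_mean + x0 * decay)
    return mean, var


def interval_wise_neg_log_likelihood(observations, theta, dt):
    """Sum of Gaussian negative log-likelihoods over successive intervals.

    Each interval restarts from the previously observed state, so the
    approximation only has to hold between two measurement points.
    """
    nll = 0.0
    for previous, current in zip(observations[:-1], observations[1:]):
        mean, var = lna_interval(previous, theta, dt)
        nll += 0.5 * (math.log(2.0 * math.pi * var)
                      + (current - mean) ** 2 / var)
    return nll


# Illustrative time course (made-up numbers), theta = (1, 0.1) as in S5 Fig.
example_nll = interval_wise_neg_log_likelihood([10, 11, 9, 10], (1.0, 0.1), 1.0)
```

A Kalman-type recursion would instead carry the variance from one interval into the next rather than resetting it, which is the difference in initialization described above.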
Calculating exact Fisher information matrices is only possible in small example models such as the Immigration-Death model. However, the comparison of the MSS method to an exact method is an important part of a performance study. The results show that the accuracy of parameter estimation and of the calculation of the Fisher information matrix are comparable to the exact approach (MSS in Figs 3, 4 and 5, exact method in Figs 9, 7 and 8). This is an important message as it shows that the use of the interval-wise LNA approximation does not lead to a loss of accuracy for the Immigration-Death model.
The Fisher information is a parameter-dependent measure. Therefore, its power for experimental design depends on the knowledge of parameters based on professional expertise or previous experiments. If there is no such knowledge, robust experimental design [17, 18] or Bayesian experimental design  suggest strategies for a deterministic framework, which are applicable to the FIMSS.
It is possible to fix the random seed before the computation, which makes the FIMSS Fisher information a deterministic function of the design. This is very advantageous for the optimization, where the user can apply Bayesian techniques as well as global optimization or gradient-based methods.
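The effect of fixing the seed can be shown with a toy Monte Carlo estimate of a Fisher information. This is a minimal sketch, not the FIMSS computation: it assumes a single observation drawn from a normal distribution with unknown mean theta and unit variance (whose exact Fisher information is 1), and the function name is hypothetical.

```python
import random


def mc_fisher_information(theta, n_pseudo, seed):
    """Sample mean of squared scores over n_pseudo pseudo observations."""
    rng = random.Random(seed)  # private generator, fully fixed by the seed
    total = 0.0
    for _ in range(n_pseudo):
        x = rng.gauss(theta, 1.0)  # one pseudo observation
        score = x - theta          # d/dtheta of log N(x | theta, 1)
        total += score * score
    return total / n_pseudo


# Same seed, same value: the estimate is a deterministic function of theta.
first = mc_fisher_information(2.0, 2000, seed=0)
second = mc_fisher_information(2.0, 2000, seed=0)
```

Because repeated evaluations with the same seed return identical values, an optimizer sees a noise-free objective, which is what makes gradient-based or global optimization methods applicable.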
None of the computations in this article required the use of a computing cluster. One evaluation of the FIMSS Fisher information matrix takes less than one minute on one core for the Immigration-Death model, approximately 20 minutes (LV1) to 120 minutes (LV2) on one core for the Lotka-Volterra model, and roughly 6 hours on eight cores for the Calcium oscillation model. This demonstrates the applicability of the FIMSS Fisher information matrix to realistic-size models from a computational point of view.
S1 Fig. Dependence of the accuracy of the FIMSS entries on the number of pseudo data sets for the fully observed Lotka-Volterra model.
Each panel shows one entry of the Fisher information matrix. Note that the i × j entry is identical to the j × i entry due to the symmetry of the matrix. The x-axis shows the number of pseudo data sets used for calculating the sample mean of Eq (7), shown as a solid line. The gray area marks the sample mean plus/minus one standard deviation. As the width of the gray area decreases, the accuracy increases. Even small values such as M = 200 already give a good approximation.
S2 Fig. Dependence of the accuracy of the FIMSS entries on the number of pseudo data sets for the partially observed Lotka-Volterra model.
Same setting as in S1 Fig.
S3 Fig. Dependence of the accuracy of the FIMSS entries on the number of pseudo data sets for the fully observed Calcium oscillation model.
Same setting as in S1 Fig.
S4 Fig. Dependence of the accuracy of the FIMSS entries on the number of pseudo data sets for the partially observed Calcium oscillation model.
Same setting as in S1 Fig.
S5 Fig. Testing the approximation for different observation horizons.
P-values of Kolmogorov-Smirnov tests of whether the approximation holds for different observation horizons in the Immigration-Death model; the left panel shows results for the benchmark and the right panel for MSS. Each color stands for one of the 50 data sets. The test is performed with the true parameter θ = (1, 0.1) and the scenario with the largest inter-sample distance, namely Δt = 15.
S6 Fig. D-criterion and E-criterion for the exact method for the Immigration-Death model with multiple evaluations of FIemp,ex.
FIex Fisher information and FIemp,ex are calculated for different inter-sample distances. The solid line is an interpolation of the values of FIex and the “x” denote the 10 values of FIemp,ex.
S7 Fig. Parameter estimates and confidence ellipsoid from FIMSS for Calcium model using the Gillespie algorithm to generate the pseudo data.
The panels show the two dimensional projections of the 12-dimensional parameter space for the Δt = 0.5 design. In each panel the black dots are the estimates from simulated data and the confidence ellipsoid of the Fisher information is marked red. “k” is used as an abbreviation for “thousand”.
Christoph Zimmer was supported by Bioms. CZ would like to thank Ruth Großeholz for proofreading and helpful comments. CZ would also like to thank the two anonymous reviewers for their valuable comments.
- Conceived and designed the experiments: CZ.
- Performed the experiments: CZ.
- Analyzed the data: CZ.
- Contributed reagents/materials/analysis tools: CZ.
- Wrote the paper: CZ.
- 1. Raj A, van Oudenaarden A. Single-Molecule Approaches to Stochastic Gene Expression. Annu Rev Biophys. 2009;38:255–270. pmid:19416069
- 2. Gillespie DT. A General Method for Numerically Simulating the Stochastic Time Evolution of coupled Chemical Reactions. Journal of Computational Physics. 1976;22 (4):403–434.
- 3. Pahle J. Biochemical simulations: stochastic, approximate stochastic and hybrid approaches. Briefings in Bioinformatics. 2009;10 (1):53–64. pmid:19151097
- 4. Andreychenko A, Mikeev L, Spieler D, Wolf V. Approximate maximum likelihood estimation for stochastic chemical kinetics. EURASIP Journal on Bioinformatics and Systems Biology. 2012;9.
- 5. Gillespie CS. Moment-closure approximations for mass-action models. IET Systems Biology. 2009;3 (1):52–58. pmid:19154084
- 6. Gillespie CS, Golightly A. Bayesian inference for generalized stochastic population growth models with application to aphids. Applied Statistics. 2010;59 (2):341–357.
- 7. Hasenauer J, Wolf V, Kazeroonian A, Theis FJ. Method of conditional moments (MCM) for the Chemical Master Equation; A unified framework for the method of moments and hybrid stochastic-deterministic models. J Math Biol. 2013; pmid:23918091
- 8. Mikeev L, Wolf V. Parameter Estimation for Stochastic Hybrid Models of Biochemical Reaction Networks. HSCC 12, Beijing. 2012;.
- 9. Boys RJ, Wilkinson DJ, Kirkwood TBL. Bayesian inference for a discretely observed stochastic kinetic model. StatComput. 2008;18:125–135.
- 10. Wang Y, Christley S, Mjolsness E, Xie X. Parameter inference for discretely observed stochastic kinetic models using stochastic gradient descent. BMC Systems Biology. 2010;4:99. pmid:20663171
- 11. Komorowski M, Costa MJ, Rand DA, Stumpf MPH. Sensitivity, robustness, and identifiability in stochastic chemical kinetics models. PNAS. 2011;108:21:8645–8650. pmid:21551095
- 12. Zimmer C, Sahle S. Parameter Estimation for Stochastic Models of Biochemical Reactions. Journal of Computer Science & Systems Biology. 2012;6:011–021.
- 13. Zimmer C, Sahle S. Deterministic inference for stochastic systems using multiple shooting and a linear noise approximation for the transition probabilities. IET Systems Biology. 2015;9:181–192. pmid:26405142
- 14. Zimmer C. Reconstructing the hidden states in time course data of stochastic models. Mathematical BioSciences. 2015;269:117–129. pmid:26363082
- 15. Körkel S, Kostina E. Numerical Methods for Nonlinear Experimental Design. In: Bock HG, Kostina E, Phu HX, Rannacher R, editors. Modelling, Simulation and Optimization of Complex Processes, Proceedings of the International Conference on High Performance Scientific Computing. Hanoi, Vietnam: Springer; 2004. p. 255–272.
- 16. Faller D, Klingmüller U, Timmer J. Simulation Methods for Optimal Experimental Design in Systems Biology. SIMULATION. 2003;79:717–725.
- 17. Bock HG, Körkel S, Kostina E, Schlöder JP. Robustness Aspects in Parameter Estimation, Optimal Design of Experiments and Optimal Control. In: Reactive Flows, Diffusion and Transport. From Experiments via Mathematical Modeling to Numerical Simulation and Optimization Final Report of SFB (Collaborative Research Center) 359. Jäger, W. and Rannacher, R. and Warnatz, J.; 2007. p. 117–146.
- 18. Körkel S, Kostina E, Bock HG, Schlöder JP. Numerical Methods for Optimal Control Problems in Design of Robust Optimal Experiments for Nonlinear Dynamic Processes. Optimization Methods and Software. 2004;19:327–338.
- 19. Lehmann EL, Casella G. Theory of Point Estimation. Springer; 1998.
- 20. Fedorov VV, Hackl P. Model-Oriented Design of Experiments; 1997.
- 21. Bauer I, Bock HG, Köerkel S, Schlöder JP. Numerical Methods for Optimum Experimental Design in DAE systems. Journal of Computational and Applied Mathematics. 2000;120:1–25.
- 22. Kreutz C, Timmer J. Systems biology: experimental design. FEBS Journal. 2009;276:923–942. pmid:19215298
- 23. Chaloner K, Verdinelli I. Bayesian Experimental Design: A Review. Statistical Science. 1995;10:273–304.
- 24. Steiert B, Raue A, Timmer J, Kreutz C. Experimental Design for Parameter Estimation of Gene Regulatory Networks. PlosONE. 2012;7:e40052.
- 25. Pagendam DE. Experimental Design and Inference for Population Models. PhD thesis, University of Queensland. 2010;.
- 26. Ruess J, Milias-Argeitis A, Lygeros J. Designing experiments to understand the variability in biochemical reaction networks. Journal of the Royal Society Interface. 2013;10.
- 27. Ruess J, Parise F, Milias-Argeitis A, Khammash M, Lygeros J. Iterative experiment design guides the characterization of a light-inducible gene expression circuit. PNAS. 2015;112:8148–8153. pmid:26085136
- 28. Nandy P, Unger M, Zechner C, Koeppl H. Optimal Perturbations for the Identification of Stochastic Reaction Dynamics. IFAC; 2012. p. 686–691.
- 29. Berridge MJ, Bootman MD, Lipp P. Calcium—a life and death signal. Nature, news and views feature. 1998;395 (6703):645–648.
- 30. Kurtz TG. The Relationship between Stochastic and Deterministic Models for Chemical Reactions. The Journal of Chemical Physics. 1972;57(7):2976–2978.
- 31. Wilkinson DJ. Stochastic Modelling for Systems Biology. Boca Raton: Chapman & Hall/CRC, Mathematical and Computational Biology Series; 2006.
- 32. Wu S, Fu J, Cao Y, Petzold L. Michaelis-Menten speeds up tau-leaping under a wide range of conditions. The Journal of Chemical Physics. 2011;134 (13):134112. pmid:21476748
- 33. Wilkinson DJ. Stochastic modeling for quantitative description of heterogeneous biological systems. Nature Reviews Genetics. 2009;10 (2):122–133. pmid:19139763
- 34. Hoops S, Sahle S, Gauges R, Lee C, Pahle J, Simus N, et al. COPASI—a COmplex PAthway SImulator. Bioinformatics. 2006;22 (24):3067–3074. pmid:17032683
- 35. Gardner TS, Cantor CR, Collins JJ. Construction of a genetic toggle switch in Escherichia coli. Letters to Nature. 2000;403:339–342.
- 36. Kummer U, Krajnc B, Pahle J, Green AK, Dixon CJ, Marhl M. Transition from Stochastic to Deterministic Behavior in Calcium Oscillations. Biophysical Journal. 2005;89:1603–1611. pmid:15994893
- 37. Zimmer C, Sahle S, Pahle J. Exploiting intrinsic fluctuations to identify model parameters. IET Systems Biology. 2015;9:64–73. pmid:26672148
- 38. Porat B, Friedlander B. Computation of the exact information matrix of Gaussian time series with stationary random components. IEEE T Acoust Speech. 1986;34:118–130.
- 39. Munsky B, Neuert G, van Oudenaarden A. Using Gene Expression Noise to Understand Gene Regulation. Science. 2012;336:183–187. pmid:22499939
- 40. Munsky B, Trinh B, Khammash M. Listening to the noise: random fluctuations reveal gene network parameters. Molecular Systems Biology. 2009;5 (318). pmid:19888213
- 41. Piaggio HTH. An elementary treatise on differential equations and their applications. London: Bell & Hyman; 1982.
- 42. Fröhlich F, Theis FJ, Hasenauer J. Uncertainty Analysis for Non-identifiable Dynamical Systems: Profile Likelihoods, Bootstrapping and More. In: Computational Methods in Systems Biology;.
- 43. Raue A, Kreutz C, Maiwald T, Bachmann J, Schilling M, Klingmüller U, et al. Structural and practical identifiability analysis of partially observed dynamical models by exploiting the profile likelihood. Bioinformatics;25 (15). pmid:19505944
- 44. Grima R. A study of the accuracy of moment-closure approximations for stochastic chemical kinetics. J Chem Phys. 2015;136.
- 45. Pahle J, Challenger JD, Mendes P, McKane AJ. Biochemical fluctuations, optimisation and the linear noise approximation. BMC Systems Biology. 2012;6. pmid:22805626
- 46. Elf J, Ehrenberg M. Fast Evaluation of Fluctuations in Biochemical Networks With the Linear Noise Approximation. Genome Research. 2003;13:2475–2484. pmid:14597656
- 47. Grima R. An effective rate equation approach to reaction kinetics in small volumes: Theory and application to biochemical reactions in nonequilibrium steady-state conditions. The Journal of Chemical Physics. 2010;133:035101. pmid:20649359
- 48. Thomas P, Matuschek H, Grima R. How reliable is the linear noise approximation of gene regulatory networks? BMC Genomics. 2013;14. pmid:24266939
- 49. Finkenstädt B, Woodcock DJ, Komorowski M, Harper C, Davis JRE, White MRH, et al. Quantifying intrinsic and extrinsic noise in gene transcription using the linear noise approximation: an application to single cell data. The Annals of Applied Statistics. 2013;7:1960–1982.
- 50. Fearnhead P, Giagos V, Sherlock C. Inference for Reaction Networks Using the Linear Noise Approximation. Biometrics. 2014;70:457–466.