## Figures

## Abstract

Modern optical imaging experiments not only measure single-cell and single-molecule dynamics with high precision, but they can also perturb the cellular environment in myriad controlled and novel settings. Techniques, such as single-molecule fluorescence in-situ hybridization, microfluidics, and optogenetics, have opened the door to a large number of potential experiments, which begs the question of how to choose the best possible experiment. The Fisher information matrix (FIM) estimates how well potential experiments will constrain model parameters and can be used to design optimal experiments. Here, we introduce the finite state projection (FSP) based FIM, which uses the formalism of the chemical master equation to derive and compute the FIM. The FSP-FIM makes no assumptions about the distribution shapes of single-cell data, and it does not require precise measurements of higher order moments of such distributions. We validate the FSP-FIM against well-known Fisher information results for the simple case of constitutive gene expression. We then use numerical simulations to demonstrate the use of the FSP-FIM to optimize the timing of single-cell experiments with more complex, non-Gaussian fluctuations. We validate optimal simulated experiments determined using the FSP-FIM with Monte-Carlo approaches and contrast these to experiment designs chosen by traditional analyses that assume Gaussian fluctuations or use the central limit theorem. By systematically designing experiments to use all of the measurable fluctuations, our method enables a key step to improve co-design of experiments and quantitative models.

## Author summary

A main objective of quantitative modeling is to predict the behaviors of complex systems under varying conditions. In a biological context, stochastic fluctuations in expression levels among isogenic cell populations have required modeling efforts to incorporate and even rely upon stochasticity. At the same time, new experimental variables such as chemical induction and optogenetic control have created vast opportunities to probe and understand gene expression, even at single-molecule and single-cell precision. With many possible measurements or perturbations to choose from, researchers require sophisticated approaches to choose which experiment to perform next. In this work, we provide a new tool, the finite state projection based Fisher information matrix (FSP-FIM), which considers all cell-to-cell fluctuations measured in modern data sets, and can design optimal experiments under these conditions. Unlike previous approaches, the FSP-FIM does not make any assumptions about the shape of the distribution being measured. This new tool will allow experimentalists to optimally perturb systems to learn as much as possible about single-cell processes with a minimum of experimental cost or effort.

**Citation: **Fox ZR, Munsky B (2019) The finite state projection based Fisher information matrix approach to estimate information and optimize single-cell experiments. PLoS Comput Biol 15(1):
e1006365.
https://doi.org/10.1371/journal.pcbi.1006365

**Editor: **Oleg A. Igoshin,
Rice University, UNITED STATES

**Received: **July 11, 2018; **Accepted: **December 2, 2018; **Published: ** January 15, 2019

**Copyright: ** © 2019 Fox, Munsky. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

**Data Availability: **All relevant data and algorithms are within the paper and its Supporting Information files. All computational analysis codes are available at https://github.com/Munsky/FSPFIM_2018.

**Funding: **Research reported in this publication was supported by the National Institute of General Medical Sciences of the National Institutes of Health under award number R35GM124747. The work reported here was partially supported by a National Science Foundation grant (DGE-1450032). Any opinions, findings, conclusions or recommendations expressed are those of the authors and do not necessarily reflect the views of the National Science Foundation. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

**Competing interests: ** The authors have declared that no competing interests exist.

## Introduction

Recent labeling and imaging technologies have greatly increased capabilities to measure biological phenomena at the single-cell and single-molecule levels. When conducted under different conditions, single-cell experiments can probe processes for different spatial or temporal resolutions, for different population sizes, under different stimuli, at different times during a response, and for myriad other controllable or observable factors [1–7]. As these experiments have become more capable to precisely perturb or measure different biological species, they have also become more expensive, which imposes a limit on the number and type of experiments that can be conducted in any given study. Clearly, not all experiment designs provide the same information, and different experiments may be “optimal” to answer different questions about the system. However, the inherent diversity of modern experiments makes it difficult to intuit which experiments will be most informative and in which circumstances. Computational tools for model-driven experiment design could help to select more informative experiments, provided that existing tools can be adapted to overcome the unique challenges presented by single-cell data.

One model-driven approach to optimal experiment design is to use the *Fisher information matrix* (FIM), which describes the precision to which a model’s parameters can be estimated for any particular experiment [8–13]. To improve estimates of model parameters, the FIM can be used iteratively in a Bayesian framework by specifying maximally informative experimental conditions, collecting data under these conditions, using new data to constrain parameters, and using the newly constrained parameters to design the next round of experiments [9, 12–15]. The formalism of the FIM for experiment design has been used to great effect in engineering disciplines, such as radar, astrophysics, and optics [16–18]. In principle, similar analyses could introduce a natural feedback in the co-design of single-cell experiments and discrete stochastic models, but for this to work, accurate analyses are needed to extract more meaning from the data and to provide better predictions about how biological systems will behave under new conditions.

Experimentally observed cell-to-cell variability has been well demonstrated to provide substantial quantitative insight to constrain and identify the mechanisms and parameters of gene regulation models [1–6, 19–21]. Therefore, the FIM analysis for the optimal design of single-cell experiments should explicitly consider such single-cell variability. Standard FIM analyses assume continuous-valued observables with Gaussian-distributed *measurement* noise. However, in contrast to most classical engineering applications, the distributions of integer-valued RNA or protein levels across an isogenic cell population can be highly complex and subject to intrinsic and extrinsic variations, with nonlinear interactions that lead to multiple peaks and long tails [2, 22–24]. Because the FIM is not computable for general discrete stochastic processes with non-Gaussian distributions, computational biologists have applied various approximations to estimate the FIM. A few recent biological studies use the Linear Noise Approximation [25] to treat single-cell distributions as Gaussian, which allows for the use of standard Fisher information analyses [8]. This approach, which we refer to as the LNA-FIM, should be valid for large numbers of molecules, but it is unlikely to be accurate for systems with high intrinsic noise corresponding to low gene, RNA, or protein counts. A different approach to estimate the FIM uses the central limit theorem (CLT) to approximate the sample mean and covariance to be jointly Gaussian and uses higher-order moments of the chemical master equation to estimate the likelihood of these moments [9]. This approach, which we refer to as the sample moments approach (SM-FIM), should be valid for large numbers of cells as can be collected in high-throughput experimental approaches, such as flow cytometry. However, when distributions have long asymmetric tails and sample sizes are limited, higher moments become very difficult to estimate and can lead to surprising model estimation errors [26]. Beyond these few Gaussian assumptions, there has been little work devoted to improve the design of time-varying single-cell experiments for systems with arbitrary probability distributions.

In this study, we introduce a formulation of the Fisher information for use with discrete stochastic models and data sets containing intrinsic variability that is measurable with single-biomolecule resolution. Our approach utilizes the finite state projection (FSP) approach [27] to solve the chemical master equation (CME) [25, 28], and compute the likelihood of single-cell data given a discrete stochastic model [2, 21, 24]. The FSP solves for the probability distribution over discrete numbers of biomolecules to any arbitrary error tolerance. By utilizing the full probability distributions, as opposed to finite order or approximate moments of these distributions, our approach makes no assumptions and works well for distributions with multiple peaks or long tails.

In the next section, we introduce the FSP and derive the sensitivities of the FSP solution to small perturbations in parameters. Next, we derive the likelihood function and its local sensitivity for discrete stochastic models and discrete data. These allow us to formulate and compute the FSP-FIM. Next, we use a combination of analytical results and numerical simulations to verify the FSP-FIM for two common models of gene expression. Finally, we demonstrate how the FSP-FIM can be applied to design nontrivial experiments for a simulated system with nonlinear reaction rates.

### Chemical master equation and finite state projection

Stochastic gene expression can be modeled as a discrete state, continuous time Markov process, where different states represent the *N*_{s} species of interest. In a biological context, the species *η* often correspond to gene configurations, RNA or protein abundances. Transitions to state **x**_{i} + *ψ*_{ν} from **x**_{i} occur with probabilities *w*_{ν}(**x**_{i}, *t*)*dt* in an infinitesimal time step of length *dt*, where *w*_{ν} and *ψ*_{ν} are the propensity function and the stoichiometric vector corresponding to reaction *ν* ∈ {1, 2, …, *N*_{r}}. Using the propensity functions and stoichiometry vectors, one can describe the evolution of probability mass for each **x**_{i} using the chemical master equation (CME, [25, 28]) given by:
(1)
By enumerating all possible **x**_{i}, one can define the probability mass vector as **p** = [*p*(**x**_{1};*t*), *p*(**x**_{2};*t*), …]^{T} and reformulate the CME in matrix form as [27].

Many systems described by the CME are not closed, i.e. the vector **p** has infinite dimension. In such cases, most states are extremely rare, and the sum of their corresponding probabilities is negligible. Thus, a natural approximation for the CME is to separate it into two exhaustive and disjoint sets, **X**_{J} and **X**_{J′}, with **X**_{J} being a finite set and **X**_{J′} being a set of low probability states. Defining **p**_{J}(*t*) ≡ *p*(**X**_{J}; *t*), the CME can be reordered and written as:
(2)
The finite state projection (FSP) approach [27], obtains an approximation of **p**_{J}(*t*) for finite times by replacing the set of states **X**_{J′}(*t*) with an absorbing sink state whose probability mass is *g*(*t*),
(3)
The FSP provides the exact total error of this approximation for all states in **X**_{J} and **X**_{J′} as:
(4)
where the |.|_{1} denotes the absolute sum of the vector [24, 27]. The FSP solution is also guaranteed to be a lower bound on the true solution [24, 27],
(5)
For simplicity, we will hereafter refer to the approximated states **p**_{FSP}(*t*) as **p**(*t*) and the corresponding matrix **A**_{JJ} as **A**. Next, we derive the likelihood function for FSP models and single-cell data.

### The FSP enables computation of the likelihood of single-cell data

A common task in single-cell analyses is to analyze snapshot measurements of independent cell populations, such as those collected using single-molecule fluorescent in-situ hybridization (smFISH) [22, 23]. For such measurements, cells are fixed in the process of quantifying their RNA, and individual cells cannot be tracked over time. However, snapshots can be collected at different points in time to quantify a population’s response to changing conditions [2, 29, 30]. For such experiments, we assume that measurements at all time points {*t*_{k}} are independent. The measured RNA counts for *N*_{s} different labeled species for each of *N*_{c} individual cells at time *t* can be collected into the data matrix . We define *L*(**D**;** θ**) as the likelihood that all measured data come from a model parameterized by

**= [**

*θ**θ*

_{1},

*θ*

_{2}, …,

*θ*

_{k}].

For FSP models, the likelihood and its logarithm for *N*_{c} measured cells can be written directly as:
(6)
(7)
A common task in systems biology is to estimate parameters that maximize the likelihood that data could have come from a given model, and this form of the likelihood function has been used multiple times to estimate parameters from single-cell data [2, 6, 21, 24, 31, 32]. In addition to estimating parameters from data, the likelihood function can also be used to estimate the sensitivity of parameter estimates to sampling errors in the experimental measurements, which can in turn be used to design better experiments. In the following sections, we will use this fact to derive the FIM for FSP models.

### Derivation of the Fisher information for FSP Models

The FIM, which describes the amount of information that can be expected by performing a particular experiment with *N*_{c} cells, is defined as
(8)
where the expectation is taken over *p*(**X**; ** θ**), corresponding to the density from which future (or hypothetical) data could be sampled. For FSP models, this density is the discrete distribution found by solving Eq 3. Eq 8 is positive semi-definite and is additive for collections of independent observations [10]. The inverse of the FIM is known as the Cramèr-Rao bound (CRB), which provides a useful lower bound on the variance for any unbiased estimator of model parameters [11]. The notion of information stems from the fact that new experiments should increase the FIM, corresponding to additional knowledge about

**and a tighter CRB. More specifically, the well-known asymptotic normality of the maximum likelihood estimator (MLE) states that as the number of measurements**

*θ**N*

_{c}increases, the MLE estimates will converge in distribution to a multivariate normal probability density with a variance given by the CRB, (9) where is the

**that maximizes Eq 6 and**

*θ**** are the “true” model parameters that produced the observed data [10, 11]. Designing experiments to maximize a given metric of the FIM can be expected to provide a more accurate estimate of**

*θ***, where different definitions of ‘accuracy’ (i.e., different vector norms for parameter errors) can be implemented through the choice of different FIM metrics.**

*θ*To derive the FIM requires one must take the partial derivative of the log-likelihood (Eq 7) with respect to the parameters ** θ**,
(10)
The expression ∇

_{θ}

*p*(

**X**;

**) is the**

*θ**sensitivity matrix*,

**S**, which has dimensions

*N*×

*N*

_{θ}, where

*N*is the dimension of the CME or its FSP projection. As described in the Materials and Methods, we derive an equation similar to that presented in [33] to define the time evolution of the sensitivity for each state’s probability density,

*p*(

**x**

_{l};

**), to each parameter**

*θ**θ*

_{j}. However, unlike previous analyses that rely on stochastic simulations and finite difference approaches, the FSP enables direct approximation of the sensitivities. Using the sensitivity matrix, the entries of the FIM can be computed as: (11) Taking the expectation over all

*l*on (1,

*N*) yields the elements of the FIM: (12) which quantifies Fisher information for the model evaluated at a single time point. For smFISH data, each time point is independent. If

*N*

_{c}(

*t*

_{k}) cells are measured at each

*k*

^{th}time point, the FIM is summed, and the total information is computed as: (13)

The Fisher information can be found using Eq 13 for any model for which the FSP (Eq 3) can be solved. This formulation explicitly quantifies how the number of cells and number of time points impact the information, and is easily extended to include other experiment design aspects such as the interval of successive measurements or changes in applied inputs, as we will demonstrate in the following sections. Because one is often interested in the relative sensitivity of parameters rather than the absolute sensitivity, a logarithmic parameterization of the FIM can easily be obtained from Eq 13 by multiplying by the corresponding entries of ** θ** (see supplemental information for full details),
(14)

In the following sections, we will verify the FIM using several common models of gene expression, and demonstrate experiment designs using these approaches.

## Results

### The FSP-FIM captures the exact information for constitutive gene expression

To demonstrate and validate the FSP-FIM method, we begin with a simple birth and death model for constitutive gene expression as shown in Fig 1. This model, which has been fit to capture the variability for many housekeeping genes [1, 20], consists of two reactions, corresponding to the constant transcription and first order decay of RNA,
The production and degradation parameters are defined as ** θ** = [

*k*

_{r},

*γ*].

The Fisher information for the two model parameters *k*_{r} (a) and *γ* (b) for various values of the mean expression level, λ. The analytical form of the FIM for a Gaussian approximation and that computed using Eq 37 (purple line) match to one another. The value computed using the FSP-FIM (blue) matches to the exact form of the analytical Poisson distribution (orange dashed). As λ becomes large, all four approaches are consistent.

Given an initial condition of zero RNA for this process, the population of RNA at any later time is a random integer sampled from a Poisson distribution,
(15)
where λ is the time varying average population size,
(16)
We have chosen the constitutive gene expression model to verify the FSP-FIM because the exact solution for the Fisher information for Poisson fluctuations can be derived in terms of λ as [10]:
(17)
For convenience, the derivation of Eq 17 is included in the supplementary text. Fig 1 shows the exact value of Fisher information (orange) versus the mean expression level for the two parameters *k*_{r} and *γ*. Fig 1 also shows that the FSP-FIM (blue) matches the exact solution for the information on both parameters at all expression levels, which verifies the FSP-FIM for this known analytical form.

Having demonstrated that the FSP-FIM matches to the exact solution, it is instructive to compare how well the previous LNA-FIM and SM-FIM estimates match to the exact FIM computation. For the Poisson distribution, the mean and variance are both equal to λ. Using this fact, the FIM can be approximated using the LNA-FIM for normal distributions (see Eq 37 in the Materials and methods). This expression, which is derived in the supplementary text, reduces to (18) when both the mean and variance are λ. As λ becomes large, the Poisson distribution becomes well approximated by a normal distribution [11]. Eqs 17 and 18 show that for this limit of large λ, the first term in Eq 18 dominates, and reduces to , yielding nearly equivalent values for the expected information. However at low mean expression λ ≤ 1, the strictly positive Poisson and the symmetric Gaussian distributions are less similar, and the Gaussian approximation predicts more information than is actually possible given the exact Poisson distribution. These trends are shown in Fig 1, where the LNA-FIM approach only matches to the exact solution at high expression levels (compare orange and purple lines). For this example, the sample-moments based FIM (SM-FIM) is exact and matches to the analytical and FSP-FIM solutions at all expression levels [9].

### The FSP-FIM matches the simulated information for bursting gene expression

Next, we consider a slightly more complicated model of bursting gene expression, in which a single gene undergoes stochastic transitions between active and inactive states with rates *k*_{on} and *k*_{off}. This switching model, which is depicted in Fig 2(a), has been studied in detail [20, 34–40], and it has been used to capture single-cell smFISH measurements in mammalian cells [30, 37], yeast cells [2, 36], and bacterial cells [29]. When active, the gene transcribes RNA with constant rate *k*_{r} and these RNA degrade in a first order reaction with rate *γ*. The four reactions of the system are:
(19)
(20)
(21)
(22)
For the examples below, we use the baseline parameters given by: *k*_{on} = 0.05*α* min^{−1}, *k*_{off} = 0.15*α* min^{−1}, *k*_{r} = 5.0 min^{−1}, and *γ* = 0.05 min^{−1}. In particular, the mRNA degradation rate, which sets the overall time-scale, was chosen to be representative of the average decay times (approximately 20 minutes) for mRNA in yeast [41].

(a) Schematic of the standard bursting gene expression model. Parameters are defined as given in the text to yield an “on” fraction of 0.25 and a mean expression of 25 mRNA per cell. (b) At slow switching rates, unique “on” and “off” modes are apparent, and distributions of molecule numbers are bimodal. (c) For intermediate switching rates, the distributions are geometric. (d) At high switching rates, the distributions are nearly Poisson (d). For each switch rate scale (labeled I, II, or III), the distribution of RNA computed with the FSP (blue) is compared to a Gaussian with the same mean and variance (purple).

For the bursting gene expression model, rescaling the transition rates *k*_{on} and *k*_{off} by a common factor does not affect the mean expression level, because the fraction of time spent in the active state remains unchanged. This fraction can be written
(23)
and is the same for any *α* > 0. For the parameters given above, the average expression at steady state is given by *k*_{r} *f*_{on}/*γ* = 25. However, rescaling the transition rates does change the shape of the distribution as shown in Fig 2(b)–2(d) [20]. When switching is slow, the gene stays in the “on” and “off” states long enough to observe individual high and low peaks corresponding to the “on” and “off” states, as in shown in Fig 2(b). However, for intermediate switching rates, the gene does not spend enough time in the “off” state for bursts to decay or enough time in the “on” state for large populations to accumulate (see Fig 2(c)). At fast switching rates the “on” and “off” states come to a fast quasi-equilibrium, and the time-averaged system approaches a Poisson process, where the effective production rate is *k*_{r} *f*_{on}. For the bursting gene expression model, all moments of the distributions can be computed exactly from Eq 35 in the Materials and Methods section, even when the RNA distributions are highly non-Gaussian [42].

Since the previous example has already verified the accuracy of the FSP-FIM when the expression has a Poisson distribution, we now verify the FSP-FIM for the slow switching case in which the distribution is bimodal (*α* = 0.1). To our knowledge an exact FIM solution is not known for the bursting gene expression model, so we evaluate the different FIM approximations by finding the sampling distribution of the MLE, and we compare the covariance of this distribution to the inverse of the FIM [11]. To do this, we sample from *p*(**X**; *t*, ** θ***) under reference parameter set

*** to generate 200 simulated data sets, each with independent RNA measurements for 1,000 cells. We then allow**

*θ**k*

_{off}and

*k*

_{r}to be free parameters, and we find for each of the 200 data sets. Fig 3 compares the 95% confidence intervals found by taking the inverse of the FIM and through MLE estimation using simulated data for the FSP likelihood (Eq 6) shown in Fig 3(a), the LNA-based likelihood (Eq 36 in the Methods section) shown in Fig 3(b), and the SM-based likelihood (Eq 36 in the Methods section, Supplementary Eq. 10) shown in Fig 3(c). Fig 3(a) shows that the CRB predicted by the FSP-FIM matches almost perfectly to the confidence intervals determined by MLE analysis of independent data sets. S3 Fig (left column) shows that this estimate is consistently accurate over multiple different experiment designs. In contrast, the LNA-FIM dramatically overestimates the information and predicts confidence intervals that are much smaller than are actually possible (Fig 3(b) and S3 Fig, center column). The SM-FIM does a better job than the LNA in that it matches the MLE analysis for some experimental conditions (Fig 3(c)) but much less well for other conditions (S3 Fig, right column). We note that the three different FIM estimates yield different principle directions and different magnitudes for parameter uncertainty (Fig 3(d)), but in all cases the FSP-MLE matches to the FSP-FIM and results in the tightest MLE estimation.

The inverse of the FIM is a lower bound on the variance of the MLE estimator. Here, we simulate 200 data sets with 1,000 cells in each data set. We then find the MLE (scatter plots) for each, and compare the covariance of these samples to the inverse of the FIM for the (a) FSP-, (b) LNA-, and (c) SM-FIM approaches. Panel (d) shows the FIM matrices for all approximations on the same axes. Simulated data were generated using the parameters given in the main text and at 10 time points evenly distributed between 0 and 200 minutes.

Having verified the FSP-FIM for the bursting gene expression model with multiple parameter sets, we next explore how the information changes as a function of the system parameters. Fig 4 shows the determinant of the FIM (also known as the D-optimality or information density) for the bursting gene expression model as a function of the switch rate scaling factor, *α*, using the LNA-FIM (purple), SM-FIM (green) and FSP-FIM (blue) approximations. In the limit of fast switching (i.e. *α* → ∞), the expected information converges to nearly the same value for all approaches, as expected for a Poisson distribution with high effective population size of λ = 25 RNA. However, in the non-Gaussian regimes with slow switch rates, the LNA-FIM over-estimates and SM-FIM under-estimates the information compared to the verified FSP-FIM approach. We note that these differences arise despite the fact that the bursting gene expression model has linear propensity functions, which allows for closed and exact computation of the statistical moments.

The determinant FIM for the LNA-FIM (purple), FSP-FIM (blue), and SM-FIM (green) as a function of the gene switching rate scale, *α*. Labels I, II, III correspond to the switch rates for which distributions are plotted in Fig 2(a)–2(c). Parameters are given in the main text and data are assumed to be collected at 10 equally separated time points between 0 and 200 minutes.

### The FSP-FIM can design more informative single-cell experiments

Next, having verified the FSP-FIM for its ability to accurately estimate the FIM for different parameter sets, we explore the use of the FSP-FIM to design experiments that maximize information. Specifically, we will use classical FIM-based experiment design approaches to choose single-cell experiments first for the bursting gene expression model above, and then for a nonlinear toggle model for which moments can no longer be computed exactly. We consider two different metrics of the FIM, which are frequently used in model-driven experiment design [9, 12]. The first of these is E-optimality (presented in the main figures), which corresponds to the smallest eigenvalue of the FIM. By finding the experiment which maximizes this eigenvalue, the information is increased in the principle direction of parameter space in which the least information is known (i.e. the parameter uncertainty is highest). The second FIM criteria is D-optimality (presented in supplemental figures), which corresponds to the determinant of the FIM. By maximizing the determinant of the FIM over the experiment design space, one finds an experiment which minimizes the volume of the uncertainty in parameter space. We note that many other experimental design criteria are possible, and the choice of criteria depends on what one desires to learn about the system.

#### Optimizing the sampling rate for bursting gene expression.

Our first demonstration of FSP-FIM based experiment design is to select the optimal single-cell sampling period with which to identify the parameters of the bursting gene expression model. For this, we have chosen to analyze E-optimality criteria, which seeks to maximize the smallest eigenvalue of the FIM. We consider a potential experiment design space consisting of 60 logarithmically distributed sampling periods Δ*t* from 2 × 10^{−2} minutes and 7×10^{2} minutes. For each sampling period, a total of five evenly spaced temporal measurements would be taken. Fig 5(a) compares the information expected versus the sampling period using the different FIM approximations: LNA-FIM (purple), SM-FIM (green) and FSP-FIM (blue). For each potential experiment, we then simulate 200 data sets for 1,000 cells each by sampling *p*(**X**;*t*, ** θ***), use Eq 7 to find the MLE parameter estimate for each data set, and then compute the covariance matrix from the MLE parameter sets. This covariance matrix is inverted, and its minimum eigenvalues are depicted as orange triangles in Fig 5(a). Fig 5(b) also shows a scatterplot to compare the relationship between the MLE-observed information and the predicted information for all FIM approaches. Once again, the FSP-FIM consistently matches the observed E-optimality at all experimental conditions. However, the LNA approach is much less consistent, sometimes over-estimating and sometimes under-estimating the real information for the different experimental conditions. The SM-FIM consistently underestimates the true information for this example, although it is not clear if this trend would hold for all sets of parameters and experimental conditions.

(a) E-optimality (i.e., smallest eigenvalue of the FIM) for the standard bursting gene expression model versus sampling period, Δ*t*, using FSP-FIM (blue), LNA-FIM (purple), and SM-FIM (green). Maximizing E-optimality corresponds to minimizing variance in the most variable direction of parameter space. The orange triangles show MLE-based confirmation of the E-optimality, using 200 simulated data sets for each sampling period. The green shaded region represents experiments that are feasible using smFISH, from minute resolution [2] to hour resolution [29] (b) Comparison of the FSP-FIM (x-axis) versus the observed information (y-axis) for various sampling periods using the FSP-FIM (blue circles), LNA-FIM (purple squares), and SM-FIM (green crosses). Kinetic parameters are given in the main text.

From Fig 5(a), it is clear that the amount of expected information depends strongly on the sampling period. When the sampling period is much longer than the characteristic time to reach the steady state distribution (Δ*t* ≫ 1/*γ*), the information does not change because all snapshots are already close to steady state. When the sampling period is too short (Δ*t* ≪ 1/*γ*), there is insufficient time for the distributions to change and the information tends to zero. Despite conserving these trends, the three different FIM analyses result in substantially different optimal experiments for the E-optimality design criteria. Using the FSP-FIM, the optimal experiment is Δ*t* = 6.1 minutes, which we verified using the MLE sampling approach (compare orange triangles and blue line in Fig 5(a)). This optimal design is well-aligned with smFISH experimental technique, which can capture cell populations with one minute resolution [2] to one hour resolution [29]. However, the LNA-FIM selects a much faster sampling period of Δ*t* = 1.1 minutes, and the SM-FIM selects a much slower sampling period of Δ*t* = 420 minutes. Thus, the FSP-FIM not only provides more information compared to moments-based approaches, but it also provides a better estimate of the expected information. In turn, these improved estimates can help to avoid potentially misleading experiments and select optimal designs.

#### The FSP-FIM accurately estimates information for systems with nonlinearities and bimodal responses.

To demonstrate the utility of the FSP-FIM approach for models with nonlinear reaction propensities and multiple species, we turn to the toggle model first introduced by Gardner et al [43], with a stochastic formulation by Tian and Burrage [44]. Fig 6(a) shows a schematic of the toggle model, which consists of two mutually repressing proteins, *x* ≡ LacI and y ≡ λcI, where the production of each species depends non-linearly on the concentration of its competitor. The reactions in the toggle model can be written
(24)
(25)
where
(26)
(27)
In this formulation, we have assumed that the degradation of λ*cI* is controlled by an ultraviolet (UV) radiation through the light-induced circuit described by Kobayashi et al [45]. Similar to [46], we assume that the UV level affects the degradation of λcI according to the function:
(28)
where the minimum degradation rate has been chosen to match dilution due to the *E. coli* half life of 30 min [46]. The remaining parameters used for this example are given by ** θ*** in Table 1. The system’s initial condition at

*t*= 0 is assumed to be the equilibrium distribution when no UV is applied. For this biological system and these parameters, different levels of UV radiation will give rise to different dynamics. At low levels of radiation, switching to the high LacI state is rare, and the distribution tends to have a single peak. At intermediate levels of radiation, switching between low and high levels of LacI expression is possible, and LacI distributions may be bimodal. Finally, at high levels of radiation, the system very quickly switches into the high LacI state.

(a) Model schematic of the two genes, *lacI* and λcI, which are mutually repressing [43]. Degradation of λcI is controlled by UV radiation. (b) Verification of the FSP-FIM (black ellipse) for 200 MLE estimates of 1,000 cells each (black dots) for two free model parameters, *α*_{xy} and *b*_{y}.

** θ*** is the “true” parameter set from which data is generated, and is the MLE parameter set fit to a baseline data set generated assuming 0 UV (see S5 Fig for further discussion). Here,

*N*is used to denote the units of single-molecules.

Because this model has complex nonlinear propensity functions, the statistical moments cannot be calculated in closed form, and the LNA-FIM and SM-FIM estimates are no longer expected to provide accurate estimates for information or optimal experiment designs. In contrast, the FSP analysis remains unchanged, and the FSP-FIM can be computed exactly as above. As before, we verify the FSP-FIM for this nonlinear case using a set of 200 simulated data sets measured at 1 hr, 4 hr, and 8 hr, each with 1,000 cells, and we found MLE parameter estimates for each simulated data set. Fig 7(a) shows this verification in a simple case with two free parameters, *b*_{y} and *α*_{xy}, and S4 Fig shows the verification where all parameters free except for Hill coefficients *η*_{xy} and *η*_{yx}. In this and all subsequent analysis of the toggle model, we have used the logarithmic parameterization of the FIM (Eq 14).

(a) Degradation rate of λcI is controlled by UV as shown in Fig 6(a). The magnitude and duration (*β*) of UV exposure are free experiment design parameters, along with the time between measurements Δ*t*. (b) E-optimality (the smallest eigenvalue of the FIM) versus the 3-dimensional experiment design space, where the FIM is computed using (b) the reference parameter set, (c) by averaging the E-optimality over 100 unique parameter sets and (d) using the “true” parameter values. The black circle is the optimal design chosen according to (c). The black triangle denotes a nearby, but less informative, experiment. (e) For the experiments corresponding to the black circle and triangle in (b-d), E-optimality values are shown for each sampled parameter set.

Next, we aim to design more complex experiments for the toggle model described above. We consider an experiment design space where the measurement sampling period (Δ*t*), pulse duration (*β*), and pulse magnitude (UV) can all be changed, as illustrated in Fig 7(a). Each pulse of UV starts at *t* = 1 hr. We then compute the FSP-FIM for each experiment {UV, *β*, Δ*t*}.

To capture the more realistic situation where parameters are unknown prior to experimentation, we next explore how parameter uncertainty affects the estimation of the FIM and the design of optimal experiments. To begin, we assume that parameters have been partially estimated from a simple initial experiment corresponding to measurements of the unperturbed steady state at zero UV input to the system. In practice, similar preliminary parameter estimates could be acquired from literature, from previous less-optimized experiments, or by comparison to related pathways or organisms. For our analysis, the prior estimate for parameters is described by a multivariate lognormal distribution with a geometric mean of given in Table 1 and covariances given in S1 Table. Parameters sampled from this distribution are substantially different from the “true” parameter, ** θ***, which is also shown in Table 1. Fig 7(b) shows the E-optimality criteria for parameter set as a function of the experiment design parameters {UV,

*β*, Δ

*t*}. Next, we sampled 100 random sets of parameters from the prior distribution (S5 Fig), and we computed the E-optimality for each set. Fig 7(c) presents expected information versus experiment design averaged over these 100 parameter sets. For comparison, Fig 7(d) shows the information versus experiment designs if one had exact knowledge of the true parameters.

From Fig 7(b)–7(d), we observe that relative estimates of the FIM remain consistent despite substantial changes to the parameters from which the FIM is computed. To explore this observation more closely, we selected the experiment that maximizes the averaged E-optimality in Fig 7(c). This experiment is denoted by a black circle in Fig 7(b)–7(d), and we compare it to another similar experiment design, shown by the black triangle in Fig 7(b)–7(d). S6 Fig shows the expected parameter uncertainty for these two designs and shows that the optimal experiment reduces variance in some parameter directions by more than an order of magnitude compared to the sub-optimal experiment. To explore how different parameters change the ranking of these two experiments, we analyze the ranking of Experiment A and Experiment B not only based on their average E-optimality value as in Fig 7(c), but at each of 100 random parameter combinations. Fig 7(e) shows that for 97 of the 100 parameter samples, the relative ranking of the experiments is consistent, even though the absolute value of the E-optimality criteria varies over several orders of magnitude.

The analysis shown in Fig 7 assumes a fixed initial distribution at *t* = 0, which was specified by the steady state distribution under the true parameters in the absence of UV radiation. Under this assumption, the initial sensitivity matrix *S*(0) in Eq 34 was set to zero. S7 Fig extends the analysis to compute the initial sensitivity at steady state, which slightly increases the estimate of information for the early time points, but has relatively little effect on the choice of optimal experiment design.

We next seek to understand how optimal experiments depend on one’s plans to perform multiple experiments. The “single experiment” in Table 2 refers to designing a single experiment, , to maximize the expected FIM design criteria, such as finding the maximal combination in Fig 7(c). The “dual greedy” approach also chooses the same and then seeks to find the most complementary additional experiment, , to maximize the overall FIM design criteria. Finally, the “dual simultaneous” search finds the optimal combination of any two possible experiments, and to maximize the design criteria. Since the optimal choice for and can admit the other choices, it must yield at least as high a design criteria as and . By comparing the three design strategies for the current toggle model, we find indeed that the simultaneous approach discovers a substantially more informative experiment than does the greedy approach. In other words, the overall expected value of an experiment, can depend not only on the current parameter values, but also upon which other experiments one intends to conduct. If one has plans to do multiple experiments, it may be better to consider the potential information from all experiments as a whole rather than to design each experiment one at a time.

## Discussion

Fluctuations in biological systems complicate modeling by introducing substantial variability in gene expression among individual cells within a homogeneous population. This variability contains valuable and quantifiable insights [20], but data with multiple peaks and long tails, such as those collected using smFISH, are often poorly described by modeling approaches that only make use of low-order moments of such distributions [26]. The FSP approach [27] has previously been used to identify and predict gene expression dynamics for complex and rich single-molecule, single-cell data [2, 29, 30]. In this work, we have developed the FSP-based Fisher information matrix, which extends the FSP analysis to allow rigorous design of experiments that are optimally informative about the model’s parameters.

The FSP-FIM uses a novel sensitivity analysis, which requires solving a system of ODEs that is twice the size of the FSP dimension for each parameter, and therefore should be computationally tractable for any problem to which the FSP can be applied. The local sensitivity of each parameter is independent of the other parameters, so the computation is easily parallelized among multiple processors. We verified that the FSP-FIM approach matches the information for the constitutive gene expression model, whose response follows a Poisson distribution (Fig 1), and for which the FIM can be computed exactly. The FSP-FIM also matches to classical FIM approaches that assume normally distributed data (LNA-FIM) or very large data sets (SM-FIM) in the limiting case when the data distributions are close to being Gaussian (Figs 1–4). For systems where data is not Gaussian and for which there is no exact FIM formula, we showed that the FSP-FIM is more accurate than traditional approaches (Figs 4 and 5), which we validated by generating many independent data sets and comparing the inverse of the FSP-FIM to the variance in the MLE estimates (Figs 3 and 6).

We showed that the choice of FIM analysis can lead to different optimal experiment designs (Fig 5). For example, Fig 5 and S3 Fig show that the LNA-FIM can substantially overestimate the information of certain experiments compared to the full, correct information obtain using the FSP-FIM, which could mislead researchers to choose experiment designs that are much worse than they expect. In practice, overestimation of the Fisher information can have the further deleterious effect of overconfidence in poor parameter estimates, which can result in model bias and poor predictions as we observed recently in [26]. Furthermore, the LNA-FIM is not self-consistent, and often overestimates the information even compared to the ellipse found from sampling the MLE with the Gaussian likelihood function. On the other hand, we found that the SM-FIM under-estimated the information for the bursting gene model, but the amount of underestimation varied substantially with experimental conditions, which could cause researchers to reject otherwise informative experiments. In contrast to these moment-based approaches, the MLE sampling using the FSP approach always provided the best parameter estimates (Fig 3 and S3 Fig), and the FSP-FIM was always consistent with the confidence intervals verified by sampling (Figs 1, 3 and 5, S1–S3 Figs), even for the case of nonlinear reaction propensities for which exact moments cannot be found (Fig 6(a), and S4 Fig).

In our analysis of the non-linear toggle model, we allowed for the independent control of three experimental variables (Fig 7a), and found experiments that optimize particular criteria of the FIM. Furthermore, we showed that other experiments very near to the optimal experiment in the design space can be significantly less informative than the optimal experiment (Fig 7(b)–7(e) and S6 Fig). Choosing between such similar experiment designs is non-trivial and would be difficult or impossible using intuition alone. On the other hand, we explored the effects of parameter uncertainty on FSP-FIM-based experiment design, and we found that parameter rankings are relatively robust to parameter uncertainty, even when the absolute value of the FSP-FIM is sensitive (Fig 7).

We found that that the choice of optimal experiments depends on the number of experiments to be completed (Table 2). For example, the optimal set of two experiments may not contain the optimal single experiment. Sometimes, the FIM for a given experiment may be singular or nearly singular, indicating that the model under investigation is unidentifiable for the current parameterization and experiment design. In such a case, the FIM-eigenvectors corresponding to the near-zero eigenvalues indicate specific linear combinations of parameters that are poorly constrained (e.g., ‘sloppy’ directions [47]). If a second complementary experiment can shift the orientation of these sloppy vectors, then those parameters may yet be uncovered through combinations of multiple experiments. Alternatively, if a given combination of parameters remains unidentifiable for all admissible experiments, then these irrevocably sloppy directions may be used to reformulate the model into one that has a reduced set of fully identifiable parameters. We note that as one conducts new experiments and collects new data, parameter posteriors will need to be updated. As this occurs, optimal experiments may also need to be adjusted (e.g., through application of a Bayesian experiment design framework [48]), and future developments are needed to incorporate FSP-FIM computations within such iterative frameworks.

Our results show that the FSP-FIM performs better than previous approaches for gene regulation models with low molecule counts or nonlinear reaction rates. Previous studies have demonstrated many realistic systems for which such FSP can be used to identify and predict stochastic dynamics in numerous biological systems [2, 6, 19, 26, 29–32, 49]. Each of these studies has used different experimental input signals, such as temporal salinity profiles [2, 26], temperature [29], or chemical induction [19, 30]. Modern optogenetic experiments promise to allow for even more robust and flexible control of input signals to control cellular behavior [7, 50, 51]. For such studies, the FSP-FIM could now be used to optimize these signals to achieve maximally informative experiments.

Like any other tool, the FSP-FIM also has its associated challenges. Our initial investigations focused on intrinsic stochastic fluctuations of small biochemical processes, and we used simulated data to verify our new computational tools. For models with large molecular counts of four or more species or with the addition of mechanisms to account for extrinsic variability, existing methods to solve the FSP-FIM will remain intractable until more efficient probability density based CME analyses can be developed to address such problems [52–56]. Until higher dimension CME approaches are developed, approximate moment-based experiment design methods, such as the SM-FIM and LNA-FIM, may remain the only available options to design experiments for large biochemical pathways. We also note that real experiments come with additional sources of noise, such as the errors or uncertainties associated with experimental measurements. For example, in smFISH data analysis, image processing settings give rise to variability in final RNA counts due to density dependent co-localization of RNA molecules. This measurement uncertainty may have a non-negligible effect on parameter inference, and future controlled experiments are needed to elucidate the degree to which such effects depend on optical imaging settings. Fortunately, such variabilities are easily incorporated within the framework of the FSP analysis. For example, previous work has used a simple linear transformation to adapt FSP analyses to include the effects of noisy GFP fluorescence emission and background autofluorescence when comparing integer-valued biochemical models to flow cytometry data in arbitrary continuous units of fluorescence [19]. Once adapted to take these transformations into account, the FSP-FIM could be used to design experiments to minimize the effects of measurement noise.

New experimental capabilities are creating an enormous potential to probe single-cell biological responses. These capabilities are making it ever more difficult to choose what species in the system to measure, whether to measure joint distributions (i.e. measure the RNA counts from multiple genes in the same cells) or marginal distributions (only measure RNA counts from a single gene at a time), or in what condition. Furthermore, different experiments have different costs, and the experimentalists must not only optimize their information about model parameters, but also consider the trade-off between collecting more data and the cost of a given experiment. By providing a new computational tool to iteratively improve models and design experiments for an important class of biological problems, the FSP-FIM will help to improve quantitative predictive modeling of gene expression.

## Materials and methods

### Derivation of sensitivities for FSP models

The change of probability *p*(**x**_{l}) with respect to small changes in parameter *θ*_{j} describes the sensitivity of the *l*^{th} state in the Markov process to the *j*^{th} parameter [33, 57]. These local sensitivities can be calculated by transforming the linear ODEs describing the time evolution of the probabilities of the state space into a set of ODEs describing the time evolution of the sensitivities. Given an initial condition, the solution to the CME is:
(29)
Taking partial derivatives with respect to ** θ**,
(30)
We can now describe the sensitivities

**S**≡ ∇

_{θ}

**p**as they evolve with time, by taking the time derivative of the equation above. For the FSP, the right-hand side

*f*(

**p**(

*t*;

**),**

*θ***,**

*θ**t*) =

**A**(

**,**

*θ**t*)

**p**(

*t*), and (31) (32) In many cases, including all models formulated using mass-action kinetics, the generator

**A**can be written as a linear combination of the model parameters, i.e.

**A**= ∑

*θ*

_{i}

**B**

_{i}, and the derivative with respect to the

*i*

^{th}parameter can be found, (33) Using this notation, Eq 30 is reduced to the set of linear ODEs for each parameter

*θ*_{i}, (34) In practice, Eq 34 can be computed in parallel for each parameter, and the computation of sensitivities for

*K*parameters is equivalent to solving

*K*sparse systems of ODEs, each twice the size of the FSP computation.

### Moment-based FIM approximations

Current state-of-the-art approaches for single-cell, single-molecule experiment design rely on computing moments of the CME. Such statistical moments may be computed exactly for systems with affine-linear propensities [42]. The uncentered moments of the CME, , where is a vector of integers corresponding to the power of the species in **x**, and the entire moment **x**^{m} is found according to the following formula:
(35)

In the limit of large numbers of molecules reacting in a well-mixed solution, the linear noise approximation (LNA) may be applied to CME [25]. In such cases, molecule numbers are considered to be Gaussian, and the well-known Gaussian form of the FIM may be applied [8]. If the observed data is assumed to come from a multivariate Gaussian distribution with means and covariance matrix **Σ**(*t*;** θ**), such as those in Eq 35, the likelihood is given by:
(36)
and the FIM is well-known [10, 11]
(37)

Another approach, developed in [9] is to use a likelihood function that takes the sample mean and sample variance to be jointly Gaussian, and thus requires the computation of up to the 4th moments in Eq 35. In the supplement, we reproduce the formulae from [9] relevant to this study.

## Supporting information

### S1 Text. Includes the derivations logarithmic parameterization of the FSP-FIM, central limit theorem approximation to the FIM, and the FIMs for Gaussian and Poisson fluctuations.

The text also includes our methodology for generating and fitting simulated data.

https://doi.org/10.1371/journal.pcbi.1006365.s001

(PDF)

### S1 Fig. Optimal experiment design for the bursting gene expression model using the determinant of the FIM, D-optimality.

https://doi.org/10.1371/journal.pcbi.1006365.s002

(PDF)

### S2 Fig. Optimal experiment design for the bursting gene expression model using E-optimality determined using the logarithmic parameterization of the FIM.

https://doi.org/10.1371/journal.pcbi.1006365.s003

(PDF)

### S3 Fig. Verification of the optimal experiment design for the sampling period Δ*t* of a bursting gene expression model.

https://doi.org/10.1371/journal.pcbi.1006365.s004

(PDF)

### S4 Fig. Verification of the FSP-FIM for the seven free parameters for the toggle model.

https://doi.org/10.1371/journal.pcbi.1006365.s005

(PDF)

### S5 Fig. Sampled parameters for the uncertainty analysis of different experiment designs.

https://doi.org/10.1371/journal.pcbi.1006365.s006

(PDF)

### S6 Fig. Different experiment designs’ effects on parameter uncertainty.

https://doi.org/10.1371/journal.pcbi.1006365.s007

(PDF)

### S7 Fig. Toggle model experiment design with non-zero initial sensitivities.

https://doi.org/10.1371/journal.pcbi.1006365.s008

(PDF)

### S1 Table. Variance and covariance of the log of each parameter for the toggle model for prior uncertainty in the toggle model.

https://doi.org/10.1371/journal.pcbi.1006365.s009

(PDF)

## References

- 1. Zenklusen D, Larson DR, Singer RH. Single-RNA counting reveals alternative modes of gene expression in yeast. Nature structural & molecular biology. 2008;15(12):1263–1271.
- 2. Neuert G, Munsky B, Tan RZ, Teytelman L, Khammash M, van Oudenaarden A. Systematic identification of signal-activated stochastic gene regulation. Science. 2013;339(6119):584–587. pmid:23372015
- 3. Golding I, Paulsson J, Zawilski SM, Cox EC. Real-time kinetics of gene activity in individual bacteria. Cell. 2005;123(6):1025–1036. pmid:16360033
- 4. Octavio LM, Gedeon K, Maheshri N. Epigenetic and conventional regulation is distributed among activators of FLO11 allowing tuning of population-level heterogeneity in its expression. PLoS genetics. 2009;5(10):e1000673. pmid:19798446
- 5. Zechner C, Ruess J, Krenn P, Pelet S, Peter M, Lygeros J, et al. Moment-based inference predicts bimodality in transient gene expression. Proceedings of the National Academy of Sciences. 2012;109(21):8340–8345.
- 6. Gomez-Schiavon M, Chen LF, West AE, Buchler NE. BayFish: Bayesian inference of transcription dynamics from population snapshots of single-molecule RNA FISH in single cells. Genome biology. 2017;18(1):164. pmid:28870226
- 7. Baumschlager A, Aoki SK, Khammash M. Dynamic Blue Light-Inducible T7 RNA Polymerases (Opto-T7RNAPs) for Precise Spatiotemporal Gene Expression Control. ACS synthetic biology. 2017;6(11):2157–2167. pmid:29045151
- 8. Komorowski M, Costa MJ, Rand DA, Stumpf MPH. Sensitivity, robustness, and identifiability in stochastic chemical kinetics models. Proceedings of the National Academy of Sciences of the United States of America. 2011;108(21):8645–8650. pmid:21551095
- 9. Ruess J, Milias-Argeitis A, Lygeros J. Designing experiments to understand the variability in biochemical reaction networks. Journal of The Royal Society Interface. 2013;10(88).
- 10.
Kay SM. Fundamentals of Statistical Signal Processing: Estimation Theory. Upper Saddle River, NJ, USA: Prentice-Hall, Inc.; 1993.
- 11.
Casella G, Berger RL. Statistical inference. Pacific Grove, CA: Wadsworth and Brooks/Cole; 1990.
- 12. Kreutz C, Timmer J. Systems biology: experimental design. The FEBS Journal. 2009;276(4):923–942. pmid:19215298
- 13. Steiert B, Raue A, Timmer J, Kreutz C. Experimental Design for Parameter Estimation of Gene Regulatory Networks. PloS one. 2012;7(7):e40052. pmid:22815723
- 14. Ruess J, Parise F, Milias-Argeitis A, Khammash M, Lygeros J. Iterative experiment design guides the characterization of a light-inducible gene expression circuit. Proceedings of the National Academy of Sciences of the United States of America. 2015;112(26):8148–8153. pmid:26085136
- 15. Zimmer C. Experimental design for stochastic models of nonlinear signaling pathways using an interval-wise linear noise approximation and state estimation. PloS one. 2016;11(9):e0159902. pmid:27583802
- 16. Vallisneri M. Use and abuse of the Fisher information matrix in the assessment of gravitational-wave parameter-estimation prospects. Physical Review D. 2008;77(4).
- 17. Frehlich R. Cramer-Rao bound for Gaussian random processes and applications to radar processing of atmospheric signals. IEEE Transactions on Geosciences and Remote Sensing. 1993;31(6):1123–1131.
- 18. Shechtman Y, Sahl SJ, Backer AS, Moerner WE. Optimal point spread function design for 3D imaging. Physical review letters. 2014;113(13):133902. pmid:25302889
- 19. Munsky B, Trinh B, Khammash M. Listening to the noise: random fluctuations reveal gene network parameters. Molecular Systems Biology. 2009;5(318):318. pmid:19888213
- 20. Munsky B, Neuert G, van Oudenaarden A. Using gene expression noise to understand gene regulation. Science (New York, NY). 2012;336(6078):183–187.
- 21. Fox Z, Neuert G, Munsky B. Finite state projection based bounds to compare chemical master equation models using single-cell data. Journal of Chemical Physics. 2016;145. pmid:27544081
- 22. Femino AM, Fay FS, Fogarty K, Singer RH. Visualization of single RNA transcripts in situ. Science. 1998;280(5363):585–590. pmid:9554849
- 23. Raj A, van den Bogaard P, Rifkin SA, van Oudenaarden A, Tyagi S. Imaging individual mRNA molecules using multiple singly labeled probes. Nature Methods. 2008;5(10):877–879. pmid:18806792
- 24. Munsky B, Fox Z, Neuert G. Integrating single-molecule experiments and discrete stochastic models to understand heterogeneous gene transcription dynamics. Methods. 2015;. pmid:26079925
- 25.
Van Kampen NG, Godfried N. Stochastic processes in physics and chemistry. Elsevier; 1992.
- 26. Munsky B, Li G, Fox ZR, Shepherd DP, Neuert G. Distribution shapes govern the discovery of predictive models for gene regulation. Proceedings of the National Academy of Sciences. 2018;.
- 27. Munsky B, Khammash M. The finite state projection algorithm for the solution of the chemical master equation. The Journal of Chemical Physics. 2006;124(4):044104. pmid:16460146
- 28. McQuarrie DA. Stochastic Approach to Chemical Kinetics. Journal of Applied Probability. 1967;4(3):413.
- 29. Shepherd DP, Li N, Micheva-Viteva SN, Munsky B, Hong-Geller E, Werner JH. Counting small RNA in pathogenic bacteria. Analytical chemistry. 2013;85(10):4938–4943. pmid:23577771
- 30. Senecal A, Munsky B, Proux F, Ly N, Braye FE, Zimmer C, et al. Transcription factors modulate c-Fos transcriptional bursts. Cell reports. 2014;8(1):75–83. pmid:24981864
- 31. Xu H, Skinner SO, Sokac AM, Golding I. Stochastic kinetics of nascent RNA. Physical review letters. 2016;117(12).
- 32. Sepúlveda LA, Xu H, Zhang J, Wang M, Golding I. Measurement of gene regulation in individual cells reveals rapid switching between promoter states. Science. 2016;351(6278):1218–1222. pmid:26965629
- 33. Gunawan R, Cao Y, Petzold L, Doyle FJ. Sensitivity analysis of discrete stochastic systems. Biophysical journal. 2005;88(4):2530–2540. pmid:15695639
- 34. Peccoud J, Ycart B. Markovian modeling of gene-product synthesis. Theoretical Population Biology. 1995;48(2):222–234.
- 35. Kepler TB, Elston TC. Stochasticity in transcriptional regulation: origins, consequences, and mathematical representations. Biophysical journal. 2001;81(6):3116–3136. pmid:11720979
- 36. Raser JM. Control of stochasticity in eukaryotic gene expression. Science. 2004;304(5678):1811–1814. pmid:15166317
- 37. Raj A, Peskin CS, Tranchina D, Vargas DY, Tyagi S. Stochastic mRNA synthesis in mammalian cells. PLoS biology. 2006;4(10):e309. pmid:17048983
- 38. Shahrezaei V, Swain PS. Analytical distributions for stochastic gene expression. Proceedings of the National Academy of Sciences. 2008;105(45):17256–17261.
- 39. Iyer-Biswas S, Hayot F, Jayaprakash C. Stochasticity of gene products from transcriptional pulsing. Physical Review E. 2009;79(3):2323.
- 40. Golding I. Deciphering the stochastic kinetics of gene regulation. Biophysical journal. 2017;112(3):342a.
- 41. Wang Y, Liu CL, Storey JD, Tibshirani RJ, Herschlag D, Brown PO. Precision and functional specificity in mRNA decay. Proceedings of the National Academy of Sciences. 2002;99(9):5860–5865.
- 42. Singh A, Hespanha JP. Approximate moment dynamics for chemically reacting systems. IEEE Transactions on Automatic Control. 2011;56(2):414–418.
- 43. Gardner TS, Cantor CR, Collins JJ. Construction of a Genetic Toggle Switch in Escherichia coli. Nature. 2000;403(6767):339–342. pmid:10659857
- 44. Tian T, Burrage K. Stochastic models for regulatory networks of the genetic toggle switch. Proceedings of the National Academy of Sciences. 2006;103(22):8372–8377.
- 45. Kobayashi H, Kærn M, Araki M, Chung K, Gardner TS, Cantor CR, et al. Programmable cells: interfacing natural and engineered gene networks. Proceedings of the National Academy of Sciences. 2004;101(22):8414–8419.
- 46.
Munsky B. Modeling Cellular Variability. In: Wall ME, editor. Quantitative biology: from molecular to cellular systems. CRC Press; 2012. p. 234–266.
- 47. Gutenkunst R, Waterfall J, Casey F, Brown K, Myers C, Sethna J. Universally sloppy parameter sensitivities in systems biology models. PLoS computational biology. 2007;3(10):1871–1878. pmid:17922568
- 48. Vanlier J, Tiemann CA, Hilbers PAJ, van Riel NAW. A Bayesian approach to targeted experiment design. Bioinformatics. 2012;28(8):1136–1142. pmid:22368245
- 49. Lou C, Stanton B, Chen YJ, Munsky B, Voigt CA. Ribozyme-based insulator parts buffer synthetic circuits from genetic context. Nature Biotechnology. 2012;30(11):1137–1142. pmid:23034349
- 50. Rullan M, Benzinger D, Schmidt GW, Milias-Argeitis A, Khammash M. An optogenetic platform for real-time, single-cell interrogation of stochastic transcriptional regulation. Molecular cell. 2018;70(4):745–756. pmid:29775585
- 51. Stewart-Ornstein J, Chen S, Bhatnagar J, Weissman J, El-Samad H. Model-guided optogenetic study of PKA signaling in budding yeast. Molecular Biology of the cell. 2017;28(1). pmid:28035051
- 52. Peles S, Munsky B, Khammash M. Reduction and solution of the chemical master equation using time scale separation and finite state projection. The Journal of chemical physics. 2006;125(20):204104. pmid:17144687
- 53. Munsky B, Khammash M. A multiple time interval finite state projection algorithm for the solution to the chemical master equation. Journal of Computational Physics. 2007;226(1):818–835.
- 54. Munsky B, Khammash M. Transient analysis of stochastic switches and trajectories with applications to gene regulatory networks. IET systems biology. 2008;2(5):323–333. pmid:19045827
- 55.
Munsky B, Tapia JJ, Faeder J. Adaptive coarse-graining for transient and quasi-equilibrium analyses of stochastic gene regulation. 51st IEEE Conference on Decision and Control (CDC). 2012;.
- 56. Vo HD, Fox ZR, Baetica A, M B, 2018. Bayesian estimation for stochastic gene expression using multifidelity models. biorxiv. 2018;.
- 57. Costanza V, Seinfeld JH. Stochastic sensitivity analysis in chemical kinetics. The Journal of chemical physics. 1981;74(7):3852–3858.