
The authors have declared that no competing interests exist.

Conceived and designed the experiments: DT EK. Performed the experiments: DT. Analyzed the data: DT. Wrote the paper: DT.

The ability of systems and synthetic biologists to observe the dynamics of cellular behavior is hampered by the limitations of the sensors, such as fluorescent proteins, available for use in time-lapse microscopy. In this paper, we propose a generalized solution to the problem of estimating the state of a stochastic chemical reaction network from limited sensor information generated by microscopy. We mathematically derive an observer structure for cells growing under time-lapse microscopy that incorporates the effects of cell division in order to estimate the dynamically changing state of each cell in the colony. Furthermore, the observer can be used to discriminate between models by treating model indices as states whose values do not change with time. We derive necessary and sufficient conditions that specify when stochastic chemical reaction network models, interpreted as continuous-time Markov chains, can be distinguished from each other under both continual and periodic observation. We validate the performance of the observer on the Thattai-van Oudenaarden model of transcription and translation. The observer structure is most effective when the system model is well-parameterized, suggesting potential applications in synthetic biology where standardized biological parts are available. However, further research is necessary to develop computationally tractable approximations to the exact generalized solution presented here.

Developing an understanding of biological phenomena through modeling requires the notion of a state that captures the essential components of the system and a model that describes its essential functions. When a collection of cells is considered in aggregate, measurement noise is usually the main factor complicating the identification of states and model parameters in genetic networks. At the single-cell level, the presence of cellular variability in experimental data

The stochastic phenomenon of systemic noise in individual cells can be detected by observing the variation that occurs during the growth of isogenic colonies imaged using time-lapse microscopy

In the standard mesoscopic formulation of stochastic chemical kinetics

In this study, we propose an observer-based method for estimating system states, estimating parameters, and discriminating between mechanisms from a single colony of cells observed through time-lapse microscopy. We derive equations for calculating the posterior probability distributions for states and parameters from the observation of both a single cell and a complete colony. We derive necessary and sufficient conditions that specify when a set of models can be distinguished from each other using our method. We illustrate our approach by analyzing the Thattai-van Oudenaarden model

We consider a single-celled organism as a single, well-mixed compartment. Consider a reaction network in a chamber that satisfies the standard assumptions of stochastic chemical kinetics

The system is observed at a sequence of time points

The first problem we consider, which we call the forward problem, is to find the

For the idealized case of continual, noise-free observation, we can also describe the dynamic evolution of

In the idealized case, the observed trajectory is a jump process with constant output between jumps. The left-hand equation describes the behavior of the system while the output is continually observed to be

The forward observers use the “predict-and-update” structure of observers found in control theory, such as the Kalman filter
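As a rough illustration of the predict-and-update structure for periodic observations, the Python sketch below propagates a row-vector state distribution over one sampling interval with the transition matrix P = exp(QΔt) and then reweights it by the sensor likelihood. The truncated birth-death generator, the truncation level `n_max`, the Gaussian sensor in the usage lines, and the helper names `birth_death_generator`, `transition_matrix`, and `forward_observer` are illustrative assumptions, not the paper's code; `scipy.linalg.expm` could replace the hand-written matrix exponential.

```python
import numpy as np


def birth_death_generator(k_birth, k_death, n_max):
    """Generator Q of a birth-death chain truncated to states 0..n_max."""
    Q = np.zeros((n_max + 1, n_max + 1))
    for n in range(n_max + 1):
        if n < n_max:
            Q[n, n + 1] = k_birth          # birth: n -> n + 1
        if n > 0:
            Q[n, n - 1] = k_death * n      # death: n -> n - 1
        Q[n, n] = -Q[n].sum()              # each row of a generator sums to zero
    return Q


def transition_matrix(Q, dt, terms=20):
    """P = exp(Q * dt) by scaling and squaring with a truncated Taylor series."""
    A = Q * dt
    norm = np.abs(A).sum(axis=1).max()
    s = max(0, int(np.ceil(np.log2(norm))) + 1) if norm > 0 else 0
    A = A / 2.0 ** s                       # scale so the series converges quickly
    term = np.eye(len(A))
    P = np.eye(len(A))
    for k in range(1, terms):
        term = term @ A / k
        P = P + term
    for _ in range(s):                     # undo the scaling by repeated squaring
        P = P @ P
    return P


def forward_observer(p0, P, likelihoods):
    """Return P(x_k | y_1..y_k) for each discrete observation y_k.

    p0 is the state distribution before the first sample, P is the one-step
    transition matrix, and likelihoods[k][x] = P(y_k | x_k = x).
    """
    p = np.asarray(p0, dtype=float)
    posteriors = []
    for lik in likelihoods:
        p = p @ P                              # predict: propagate one interval
        p = p * np.asarray(lik, dtype=float)   # update: weight by the sensor
        p = p / p.sum()                        # normalize
        posteriors.append(p)
    return posteriors


if __name__ == "__main__":
    n = np.arange(21)
    Q = birth_death_generator(k_birth=2.0, k_death=0.5, n_max=20)
    P = transition_matrix(Q, dt=1.0)
    # Hypothetical sensor: the reading is Gaussian around the true count (sd = 1).
    lik = lambda y: np.exp(-0.5 * (y - n) ** 2)
    post = forward_observer(np.full(21, 1 / 21), P, [lik(3), lik(4), lik(4)])
    print("most probable count:", post[-1].argmax())
```

Normalizing at every step keeps the numbers in range; the discarded normalization constants are the per-sample evidence, which a model-comparison variant would accumulate instead.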

The expected value taken with respect to the distribution

The forward observer thus computes the probability distribution of the current state of a process while an experiment is on-line. The second problem we consider is the related “backward” problem of finding the

Given the results of the forward observer for discrete observations (Eq. 1), the probability

The names “forward observer” and “backward observer” are taken from the direction of calculation in time; the forward probability is calculated starting at
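For discrete observations, one standard way to realize a forward sweep followed by a backward sweep is the forward-backward recursion for hidden Markov models. The sketch below (the function name `forward_backward` is hypothetical) assumes a one-step transition matrix `P`, for example exp(QΔt), and one likelihood vector per sample; it multiplies each filtered distribution by a backward message summarizing the later observations. It illustrates the idea under these assumptions and is not claimed to reproduce the paper's equations term for term.

```python
import numpy as np


def forward_backward(p0, P, likelihoods):
    """Smoothed posteriors P(x_k | y_1..y_K) for a discretely observed chain.

    p0: distribution before the first sample; P: one-step transition matrix;
    likelihoods[k][x] = P(y_k | x_k = x). Returns a (K, n) array.
    """
    L = np.asarray(likelihoods, dtype=float)
    K, n = L.shape
    alpha = np.zeros((K, n))
    p = np.asarray(p0, dtype=float)
    for k in range(K):                     # forward sweep: filtered estimates
        p = (p @ P) * L[k]
        p = p / p.sum()
        alpha[k] = p
    beta = np.ones((K, n))                 # beta[K-1] = 1: no later data
    for k in range(K - 2, -1, -1):         # backward sweep from the last sample
        b = P @ (L[k + 1] * beta[k + 1])   # P(y_{k+1..K} | x_k), up to scale
        beta[k] = b / b.sum()              # rescale to avoid underflow
    smoothed = alpha * beta
    return smoothed / smoothed.sum(axis=1, keepdims=True)
```

At the final sample the smoothed and filtered distributions coincide, since no later data remain.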

(a) Inputs to the observer algorithm. (i) The system model consists of a birth-death reaction of a single species

The forward and backward observer algorithms used to determine the state of the cellular process can, with a straightforward modification, also be used to distinguish between a finite set of candidate models of the process. These models can have different reaction structures or they can contain the same set of reaction channels but have differing reaction rates. Suppose that we wish to discriminate between a finite set of
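A minimal sketch of this modification for periodic observations, assuming each candidate model is already given as a one-step transition matrix with its own initial distribution and sensor likelihoods: stacking the models gives an augmented state (model index, state) whose transition matrix is block diagonal, because the model index never changes, and the exponential of a block-diagonal generator is block diagonal. The names `block_diagonal` and `model_posterior` and the input layout are hypothetical choices for illustration.

```python
import numpy as np


def block_diagonal(mats):
    """Stack square matrices along the diagonal of one larger matrix."""
    n = sum(m.shape[0] for m in mats)
    out = np.zeros((n, n))
    i = 0
    for m in mats:
        k = m.shape[0]
        out[i:i + k, i:i + k] = m
        i += k
    return out


def model_posterior(prior, p0s, Ps, likelihoods):
    """Posterior probability of each candidate model after discrete observations.

    prior[j]: prior probability of model j; p0s[j]: its initial state
    distribution; Ps[j]: its one-step transition matrix; likelihoods[k][j][x]:
    P(y_k | model j, state x). The model index never changes, so the augmented
    transition matrix is block diagonal.
    """
    P = block_diagonal([np.asarray(m, dtype=float) for m in Ps])
    p = np.concatenate([w * np.asarray(q, dtype=float) for w, q in zip(prior, p0s)])
    for lik in likelihoods:
        p = (p @ P) * np.concatenate([np.asarray(l, dtype=float) for l in lik])
        p = p / p.sum()                    # joint posterior over (model, state)
    edges = np.cumsum([0] + [len(q) for q in p0s])
    # Marginalize out the state within each model's block.
    return np.array([p[a:b].sum() for a, b in zip(edges[:-1], edges[1:])])
```

Because normalization is applied to the whole augmented vector, the mass left in each block is the posterior probability of that model.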

When a single cell grows into a colony of isogenic cells, different daughter cells will produce different sequences of observations due to the inherent stochasticity in both the chemical reactions and the observation process. If the observer algorithms are used for model discrimination on each daughter cell separately, they will produce differing probability distributions over the model space and will likely disagree about which model is most probable.

The final problem we consider is integrating the observations from many different cells that are all descendants of a single ancestral cell. Denote the single ancestral cell by

Denote by

The first factor,

(a) Run the forward observer on each cell, breaking up the lineage so that each segment is only counted once. (b) Run the backward observer on each cell and “pinch” together a cell and its ancestor when the birth of a cell is observed. One forward and backward sweep of the lineage determines the posterior probability distribution
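The division step can be sketched for a single species under a binomial partitioning assumption (each molecule independently joins either daughter with probability 1/2). Going forward, a daughter's initial distribution is the mother's distribution passed through the binomial kernel; going backward, the two daughters' backward messages are merged into one message for the mother, using the fact that they receive k and n − k of her n molecules. The helper names `split_kernel`, `split_forward`, and `pinch` are hypothetical, and for several species that partition independently the kernel would be a product of one binomial kernel per species.

```python
import numpy as np
from math import comb


def split_kernel(n_max):
    """K[n, k] = P(a daughter receives k of the mother's n molecules).

    Assumes each molecule independently joins either daughter with
    probability 1/2 (binomial partitioning).
    """
    K = np.zeros((n_max + 1, n_max + 1))
    for n in range(n_max + 1):
        for k in range(n + 1):
            K[n, k] = comb(n, k) * 0.5 ** n
    return K


def split_forward(p_mother):
    """Forward step at division: one daughter's initial count distribution."""
    p_mother = np.asarray(p_mother, dtype=float)
    return p_mother @ split_kernel(len(p_mother) - 1)


def pinch(beta_1, beta_2):
    """Backward step at division: merge two daughters' backward messages.

    beta_i[k] is proportional to P(daughter i's observations | it starts with
    k molecules). Since the daughters receive k and n - k molecules,
    beta_mother[n] = sum_k C(n, k) 2^-n beta_1[k] beta_2[n - k].
    """
    beta_1 = np.asarray(beta_1, dtype=float)
    beta_2 = np.asarray(beta_2, dtype=float)
    assert len(beta_1) == len(beta_2)
    K = split_kernel(len(beta_1) - 1)
    beta_m = np.zeros(len(beta_1))
    for n in range(len(beta_1)):
        k = np.arange(n + 1)
        beta_m[n] = np.sum(K[n, k] * beta_1[k] * beta_2[n - k])
    return beta_m
```

If neither daughter carries information (both messages are all ones), the merged message is also all ones, because each row of the binomial kernel sums to one.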

Once the

To find the probability distribution for the state of each cell at times after

When performing an experiment for model discrimination, an important question to answer beforehand is whether the models are identifiable, i.e., whether, regardless of which outputs are observed, it will be possible to converge on a point estimate in the model space.

Consider a set of models

For a stochastic chemical reaction network with reversible transitions in which the probability that any species population increases without bound is zero, the underlying continuous-time Markov chain is positive recurrent and thus there exists a unique steady-state distribution
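On a truncated, finite state space, the steady-state distribution π can be computed numerically by solving πQ = 0 together with the normalization Σπ = 1. The sketch below assumes an irreducible finite generator `Q` and uses NumPy's least-squares solver on the stacked system; the function name `stationary_distribution` is our own. A check of this kind could, for example, confirm that two candidate models share a steady state, in which case only transient data can separate them.

```python
import numpy as np


def stationary_distribution(Q):
    """Solve pi Q = 0 with sum(pi) = 1 for an irreducible finite generator Q."""
    n = Q.shape[0]
    A = np.vstack([Q.T, np.ones((1, n))])  # balance equations plus normalization
    b = np.zeros(n + 1)
    b[-1] = 1.0
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)  # consistent system: exact solution
    return pi
```

For a truncated birth-death chain with constant birth rate λ and death rate n, detailed balance gives π(n) proportional to λ^n/n!, which is a convenient sanity check.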

For simplicity of presentation, we consider the case where each state

The output rate

The proof of this theorem is included in Section 2

The Thattai-van Oudenaarden model

We denote this model by

Panels (a) and (b) of

(a) The unobserved mRNA population from a Gillespie SSA run of a colony in which cells divide every 20 minutes and the system is observed every minute. (b) The observed protein number from the same Gillespie SSA run. (c) The estimate of the mRNA population in each cell as a function of time generated by the forward observer. (d) The estimate of the mRNA population in each cell as a function of time generated by the backward observer. In this example, we assumed that, when the cell divides, each molecule of mRNA and protein is equally likely to go to either daughter cell. Each cell’s ancestor in the cell lineage is indicated by a red vertical line connecting the plot for a daughter cell to that of its mother cell. Brighter shades of green indicate mRNA populations that are more probable.
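Simulated data of the kind described in this caption can be sketched with a Gillespie SSA of the four Thattai-van Oudenaarden reactions (mRNA production, mRNA decay, translation, protein decay) along a single lineage, with division every 20 minutes, binomial partitioning, and a sample every minute. The function name `ssa_lineage`, the choice to follow only one daughter, and the rate values in the usage line are illustrative assumptions; they are not the paper's parameter set. Scheduled divisions and observations are handled by jumping to them and redrawing the waiting time, which is valid because exponential waiting times are memoryless.

```python
import numpy as np


def ssa_lineage(k_r, g_r, k_p, g_p, t_end, t_div=20.0, dt_obs=1.0, seed=0):
    """Gillespie SSA of the Thattai-van Oudenaarden reactions along one lineage.

    Reactions: 0 -> M (k_r), M -> 0 (g_r * M), M -> M + P (k_p * M),
    P -> 0 (g_p * P). At every multiple of t_div the cell divides and each
    molecule independently stays in the followed daughter with probability 1/2.
    Returns observation times and the true (mRNA, protein) counts at them.
    """
    rng = np.random.default_rng(seed)
    m, p, t = 0, 0, 0.0
    next_div, next_obs = t_div, dt_obs
    times, states = [], []
    while next_obs <= t_end:
        rates = np.array([k_r, g_r * m, k_p * m, g_p * p], dtype=float)
        total = rates.sum()
        tau = rng.exponential(1.0 / total) if total > 0 else np.inf
        t_sched = min(next_obs, next_div)
        if t + tau >= t_sched:
            # A scheduled event comes first; jump to it and redraw tau afterwards.
            t = t_sched
            if next_div <= next_obs:       # divide before a coincident sample
                m, p = rng.binomial(m, 0.5), rng.binomial(p, 0.5)
                next_div += t_div
            else:
                times.append(t)
                states.append((m, p))
                next_obs += dt_obs
            continue
        t += tau
        r = rng.choice(4, p=rates / total)  # which reaction fires
        if r == 0:
            m += 1
        elif r == 1:
            m -= 1
        elif r == 2:
            p += 1
        else:
            p -= 1
    return np.array(times), np.array(states)


if __name__ == "__main__":
    # Illustrative rates per minute, not the standard parameter set.
    times, states = ssa_lineage(0.5, 0.2, 1.0, 0.05, t_end=60.0)
    print(states[:5])
```

Only the protein column would be passed to an observer (possibly through a sensor model); the mRNA column serves as the hidden ground truth.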

Note that the output of the backward observer is continuous in time except at multiples of 20 minutes, when cell division occurs.

We also demonstrate how to use the idealized forward observers for model discrimination. Panel (a) of

We consider three models in this figure:

We consider two alternate models. The first,

A Gillespie simulation from this model is shown in Panel 4(b). The second alternative model,

A Gillespie simulation from this model is shown in Panel 4(c).

These three models all have the same steady-state distribution, so, in order to distinguish them, it is necessary to use transient data. As an example, we assume that the generated trajectories are observed continually and construct the block-diagonal observer matrix for model discrimination. According to the distinguishability condition, all three models are distinguishable from each other. Because the system is observed continually, we use the observer equations from Eq. 2.
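For idealized, noise-free continual observation of a deterministic output h(x), one standard formulation (we do not claim it matches Eq. 2 term for term) evolves the distribution between output jumps with the generator restricted to states sharing the current output, and at each observed jump applies the block of Q from the old output class to the new one. The sketch below applies this to model discrimination with two hypothetical birth-death models stacked into a block-diagonal generator; both have the stationary distribution Poisson(4), so, as with the three models discussed here, only transient behavior separates them. All names and rates are illustrative assumptions.

```python
import numpy as np


def birth_death_generator(k_birth, k_death, n_max):
    """Generator of a birth-death chain truncated to states 0..n_max."""
    Q = np.zeros((n_max + 1, n_max + 1))
    for n in range(n_max + 1):
        if n < n_max:
            Q[n, n + 1] = k_birth
        if n > 0:
            Q[n, n - 1] = k_death * n
        Q[n, n] = -Q[n].sum()
    return Q


def expm(A, terms=20):
    """Matrix exponential by scaling and squaring with a truncated Taylor series."""
    norm = np.abs(A).sum(axis=1).max() if A.size else 0.0
    s = max(0, int(np.ceil(np.log2(norm))) + 1) if norm > 0 else 0
    A = A / 2.0 ** s
    term, E = np.eye(len(A)), np.eye(len(A))
    for k in range(1, terms):
        term = term @ A / k
        E = E + term
    for _ in range(s):
        E = E @ E
    return E


def continual_observer(rho0, Q, h, jump_times, outputs, t_end):
    """Posterior over states at t_end under noise-free continual observation.

    h[x] is the output of state x. The observed output is outputs[0] until
    jump_times[0], then outputs[1] until jump_times[1], ..., and outputs[-1]
    until t_end.
    """
    h = np.asarray(h)
    rho = np.where(h == outputs[0], np.asarray(rho0, dtype=float), 0.0)
    t = 0.0
    ends = list(jump_times) + [t_end]
    for i, t_next in enumerate(ends):
        keep = h == outputs[i]
        # Evolve with Q restricted to the current output class; its diagonal
        # keeps the total exit rate, accounting for "no output jump was seen".
        rho[keep] = rho[keep] @ expm(Q[np.ix_(keep, keep)] * (t_next - t))
        t = t_next
        if i + 1 < len(outputs):
            new = h == outputs[i + 1]
            # Observed jump: apply the rates from the old to the new output class.
            jumped = np.zeros_like(rho)
            jumped[new] = rho[keep] @ Q[np.ix_(keep, new)]
            rho = jumped
        rho = rho / rho.sum()  # rescale; only relative weights matter
    return rho
```

With the model index stored as part of the state, summing the final posterior over each model's block gives the posterior probability of that model.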

Consider the trajectory generated by

The trajectories generated by

A fundamental issue limiting our understanding of the dynamics of cellular networks is that of sensing. Fluorescent proteins, the most commonly used sensors in the laboratory today, have multiple limitations that make their indiscriminate use inadvisable

In light of these issues with the current state-of-the-art in sensing, it is imperative that we develop methods to extract as much information as possible out of the limited measurement techniques we do have available to us. Model-based approaches allow the experimenter to extract additional information and meaning from limited data indirectly through the design of observation algorithms and platforms, but require a reasonable amount of confidence in the accuracy of both the model of the system being studied and the experimental environment in which the measurements are being carried out.

In this paper, we develop a general theoretical method for observing the state of a process inside a single-celled organism based on the assumptions of stochastic chemical kinetics. This algorithm takes as its input a sequence of observations and outputs a probability distribution over the state space or parameter space of the system. We present forward observer algorithms for both discrete and continual observations, which estimate the state of the system using only past data; a backward observer algorithm, which estimates the state using all the collected data, including future data; and a colony algorithm for integrating the different trajectories generated by daughters of the same ancestral cell. For simplicity, we have presented the algorithm using the notation of finite-state, time-invariant Markov chains. However, the observer approach described here is more generally applicable as long as the chosen system model provides a method of constructing the transition semigroup

The two main limitations of our method are the “curse of dimensionality” and the need for accurate parameterization of the system and sensor models. The state of a stochastic chemical kinetic system is a

The accuracy of the posterior probability distributions calculated by the observer algorithms is dependent on the accuracy of the parameters in both the system model and the sensor model. Therefore, the applicability of our method is limited by the experimentalist’s ability to determine not only reaction rates and network structures in the system being studied, but also the dynamical properties of the type of sensor (e.g., fluorescent proteins) being used. Due to these limitations, we expect that our approach will be of more interest to synthetic biologists, who typically study systems with fewer parameters than those studied by systems biologists. However, the need for accurate parameter values is a problem that needs to be addressed in this approach for systems of all sizes.

We demonstrated the algorithm on the Thattai-van Oudenaarden model of transcription and translation. Because this model contains only two species whose populations change with time, it is possible to solve the chemical master equation with negligible truncation error and thus to make computationally tractable estimates of the unobservable mRNA population without resorting to more advanced approximation techniques. Furthermore, there exists a standard set of parameters for this model, allowing us to sidestep the problem of inaccurate parameterization. By applying the necessary and sufficient conditions for models to be distinguishable from each other, we can determine in advance that the observer is potentially effective in detecting differences in both the protein and the mRNA dynamics, although more time is needed to distinguish models that differ in their hidden mRNA dynamics than models that differ in their visible protein dynamics. However, because the distinguishability result describes the asymptotic behavior of the observer, it does not guarantee that the systems can be distinguished in a reasonable amount of time. Further research is required to quantify the rate of distinguishability for general stochastic chemical reaction networks.

As computational and experimental capabilities continue to grow, we hope that the method described in this paper can be built upon to uncover finer details of the dynamics of cellular operation. To address the realistic situation in which the model cannot be accurately parameterized before the observer is applied, future theoretical work will include adaptive observer algorithms that estimate the parameters and the states simultaneously. To apply the observer to the estimation of unknown quantities when a parameterized model is available, we envision the following general procedure. First, apply the observer algorithm for state estimation to a few cells or colonies, and then perform a more expensive experimental test to verify the observer’s predictions. Once satisfied with the observer’s performance, the experimenter can use the observer for high-throughput analysis of live cells, taking advantage of its indirect sensing to perform experiments more rapidly and cost-effectively.

All simulations were carried out in MATLAB R2011b. Code is available in the supporting information.


This work was performed while D. Thorsley was with the Department of Electrical Engineering, University of Washington, Seattle, USA.