Conceived and designed the experiments: DS-G. Performed the experiments: DS-G SC. Analyzed the data: DS-G JNA. Contributed reagents/materials/analysis tools: JNA. Wrote the paper: DS-G JNA SC.

The authors have declared that no competing interests exist.

In both prokaryotic and eukaryotic cells, gene expression is regulated across the cell cycle to ensure “just-in-time” assembly of select cellular structures and molecular machines. However, all time-series gene expression measurements contain variability that arises from both systematic error in the cell synchrony process and variance in the timing of cell division at the level of the single cell. Thus, gene or protein expression data collected from a population of synchronized cells are an inaccurate measure of what occurs in the average single cell across a cell cycle. Here, we present a general computational method to extract “single-cell”-like information from population-level time-series expression data. This method removes the effects of 1) variance in growth rate and 2) variance in the physiological and developmental state of the cell. Moreover, this method represents an advance in the deconvolution of molecular expression data in its flexibility, its minimal assumptions, and its use of a cross-validation analysis to determine the appropriate level of regularization. Applying our deconvolution algorithm to cell cycle gene expression data from the dimorphic bacterium

Time-series analyses of cellular regulatory processes have successfully drawn attention to the importance of temporal regulation in biological systems. A number of model systems can be synchronized such that data collected on cell populations better reflect the dynamic properties of the individual cell. However, experimental synchronization is never perfect, and the degree of synchrony that does exist at the outset of an experiment is quickly lost over time as cells grow at different rates and enter different developmental or physiological states on cell division. Thus, data collected from a population of synchronized cells can lead to incorrect models of temporal regulation. Here we demonstrate that the problem of relating population data to the individual cell can be resolved with a computational method that effectively removes the effects of both imperfect synchrony and time-dependent loss of synchrony. Application of this deconvolution algorithm to a cell cycle time-series data set from the model bacterium

Recent technological advances have made feasible studies of biological systems at the single-cell level

Among the properties hidden by population averaging is cell-to-cell variability, such as that found in gene expression and protein production

From a mathematical perspective, population asynchrony may be modeled as a kernel function that maps the average of an observable in the absence of asynchronous variability to the value measured at the population level. Population asynchrony has been modeled in yeast as both a time-dependent
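As a generic illustration of such a kernel model (the symbols below are notational assumptions introduced here and are not necessarily those used in the numbered equations of this paper), the population-level measurement G(t) can be written as an integral of the single-cell profile g over cell cycle phase, weighted by a kernel Q that gives the fraction of the population at phase φ at time t. Discretizing phase into N points turns the integral into a matrix-vector product, which is the form a numerical inversion would work with:

```latex
G(t) = \int_0^1 Q(\phi, t)\, g(\phi)\, d\phi
\quad\longrightarrow\quad
G(t_i) \approx \sum_{j=1}^{N} Q(\phi_j, t_i)\, g(\phi_j)\, \Delta\phi
```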

Population asynchrony characterization is most easily done with a synchronizable system such as the dimorphic bacterium

We propose a simple model for the time-dependent distribution of

To effectively remove the effects of population asynchrony from measured data, we must first establish a model describing the temporal position of cells within their own cell cycles and how they are distributed in the population. In this section we develop this model in the context of

We refer to the position of a cell within its own cell cycle as the cell's

At time

All three of these cell-specific quantities are random variables;

The conditional distribution

Having constructed a model for the distribution of cell types, we now show how this distribution can be used to map gene expression at the single-cell level to the expression data derived from cellular populations. The signal intensity measured in a typical microarray experiment is proportional to the population-level concentration of the measured species

It has been previously shown that the

Using the above approximations, the total concentration of gene

The kernel mapping function

The rule-based ^{6} synchronized cells at

(A) The simulated steady state cell cycle phase distribution shown here is achieved after ∼10 average cell division times. Each cell

Rewriting

For a desired

Hence, Eq. (9) and Eq. (10), combined with a rule-based model of the evolution of cell types within a population enable us to compute the kernel transformation needed to invert population measurements into single-cell data. The kernel

At the outset of the experiment, all cells can be found in the SW stage. The distribution broadens as experiment time goes on and cells progress through their cycles at different rates. Following division, new peaks emerge in the distribution as daughter cells enter the population with different cell cycle phases: SW cells with
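The qualitative behavior described above can be reproduced with a toy rule-based simulation. The sketch below is only illustrative and is not the model used in this paper: the function name, the normally distributed cycle times, the 150-minute mean cycle, the 10% coefficient of variation, and the stalked-cell starting phase `st_start = 0.15` are all assumptions chosen for the example.

```python
import numpy as np

def simulate_phase_distribution(n_cells=2000, t_end=150.0, dt=1.0,
                                mean_cycle=150.0, cv=0.1,
                                st_start=0.15, n_bins=50, seed=0):
    """Toy rule-based simulation of cell cycle phase in a synchronized population.

    Returns (times, edges, hist), where hist[i, j] is the fraction of cells
    whose phase lies in bin j at time times[i]. All parameters are illustrative.
    """
    rng = np.random.default_rng(seed)

    def draw_cycles(n):
        # Per-cell full-cycle durations (minutes), clipped to stay positive.
        return rng.normal(mean_cycle, cv * mean_cycle, n).clip(min=0.5 * mean_cycle)

    phase = np.zeros(n_cells)      # every cell starts as a swarmer at phase 0
    cycle = draw_cycles(n_cells)
    times = np.arange(0.0, t_end + dt, dt)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    hist = np.empty((times.size, n_bins))

    for i in range(times.size):
        counts, _ = np.histogram(phase, bins=edges)
        hist[i] = counts / phase.size
        phase = phase + dt / cycle             # each cell advances at its own rate
        divided = phase >= 1.0
        n_new = int(divided.sum())
        if n_new:
            # The stalked daughter re-enters the cycle at st_start ...
            phase[divided] = st_start
            cycle[divided] = draw_cycles(n_new)
            # ... and a new swarmer daughter enters the population at phase 0.
            phase = np.concatenate([phase, np.zeros(n_new)])
            cycle = np.concatenate([cycle, draw_cycles(n_new)])
    return times, edges, hist
```

Each row of `hist` plays the role of one time slice of an asynchrony kernel: it starts as a single occupied bin and spreads over more bins as time advances.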

With the complete noiseless measurement model given as the integral equation in Eq. (6), extracting average single-cell information involves solving the integral equation for

In order to estimate the expression function, which is solely specified by

The cost function

The final optimization problem is to minimize
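A minimal numerical sketch of this kind of regularized inversion follows. It assumes a discretized kernel matrix `K`, a measurement vector `y`, a second-difference smoothness penalty weighted by `lam`, and a positivity constraint; the continuity constraint is omitted. The solver shown is plain projected gradient descent, chosen here for brevity and not necessarily the solver used for the results in this paper. All names and the Gaussian test kernel are hypothetical.

```python
import numpy as np

def deconvolve(K, y, lam, n_iter=5000):
    """Minimize ||K f - y||^2 + lam * ||D f||^2 subject to f >= 0.

    D is the second-difference operator (smoothness penalty). The problem is
    solved by projected gradient descent with a fixed step of 1/L, where L is
    the largest eigenvalue of the (halved) Hessian.
    """
    n = K.shape[1]
    D = np.diff(np.eye(n), n=2, axis=0)      # shape (n - 2, n), rows [1, -2, 1]
    H = K.T @ K + lam * (D.T @ D)            # half of the Hessian of the cost
    b = K.T @ y
    step = 1.0 / np.linalg.eigvalsh(H)[-1]   # eigvalsh returns ascending eigenvalues
    f = np.zeros(n)
    for _ in range(n_iter):
        # Gradient step on 0.5 * f^T H f - b^T f, then projection onto f >= 0.
        f = np.maximum(f - step * (H @ f - b), 0.0)
    return f

# Synthetic example: a Gaussian smearing kernel (an assumption, for testing only).
phi = np.linspace(0.0, 1.0, 40)              # phase grid
t = np.linspace(0.0, 1.0, 30)                # normalized measurement times
K = np.exp(-0.5 * ((t[:, None] - phi[None, :]) / 0.08) ** 2)
K /= K.sum(axis=1, keepdims=True)            # each row sums to one
f_true = np.exp(-((phi - 0.5) / 0.15) ** 2)  # hypothetical single-cell profile
y = K @ f_true                               # noiseless population measurement
f_hat = deconvolve(K, y, lam=1e-3)
```

Larger values of `lam` trade fidelity to the data for a smoother estimate, which is why the choice of this parameter needs a principled criterion.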

The solution to the optimization problem (Eq. (15)) depends on the value of the smoothness parameter

For a fixed value of
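One common way to choose a smoothness parameter by cross-validation is sketched below. It uses leave-one-out cross-validation over the measurement times, which is an assumption on our part about the partitioning; for brevity it also drops the positivity constraint and solves each fit as an ordinary penalized least-squares problem. The function names, the candidate grid `lams`, and the synthetic data are hypothetical.

```python
import numpy as np

def fit_smooth(K, y, lam):
    """Penalized least squares: minimize ||K f - y||^2 + lam * ||D f||^2."""
    n = K.shape[1]
    D = np.diff(np.eye(n), n=2, axis=0)            # second-difference operator
    A = np.vstack([K, np.sqrt(lam) * D])           # stack data and penalty rows
    b = np.concatenate([y, np.zeros(n - 2)])
    return np.linalg.lstsq(A, b, rcond=None)[0]

def loo_cv_score(K, y, lam):
    """Mean squared prediction error over left-out time points."""
    m = len(y)
    err = 0.0
    for i in range(m):
        keep = np.arange(m) != i                   # drop the i-th measurement
        f = fit_smooth(K[keep], y[keep], lam)
        err += (K[i] @ f - y[i]) ** 2              # predict the held-out point
    return err / m

def choose_lambda(K, y, lams):
    """Return the candidate with the lowest cross-validation score."""
    scores = [loo_cv_score(K, y, lam) for lam in lams]
    return lams[int(np.argmin(scores))], scores

# Synthetic noisy example (all values are illustrative assumptions).
rng = np.random.default_rng(1)
phi = np.linspace(0.0, 1.0, 40)
t = np.linspace(0.0, 1.0, 30)
K = np.exp(-0.5 * ((t[:, None] - phi[None, :]) / 0.08) ** 2)
K /= K.sum(axis=1, keepdims=True)
y = K @ np.exp(-((phi - 0.5) / 0.15) ** 2) + 0.02 * rng.standard_normal(t.size)
lams = np.logspace(-6, 2, 9)
best_lam, scores = choose_lambda(K, y, lams)
```

The selected value is the one whose fit best predicts measurements that were withheld from the fit, which guards against both under- and over-smoothing.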

The cell-type distribution model enables us to mathematically determine the probability that a cell taken from a synchronized population is in a given phase. For example, the probability that a single

Our simulated distribution, with cells grouped broadly into the SW, ST, EPD, and LPD types, is shown alongside the experimentally-determined distribution in

A comparison of the simulated and experimentally-determined distributions shows that the population fractions of SW cells, young ST cells, early predivisional (EPD) cells, and late predivisional (LPD) cells are similar in both. Experimental data are reproduced from Judd et al.

There are over 500 cell cycle-regulated genes in the

Shown here in arbitrary units are the original microarray data (blue line), the model-predicted measurements

In general, the deconvolution procedure yielded expression profiles with peaks shifted to later times relative to the population data, and recovered details lost in the population averaging. For example, the deconvolved expression profile for

The average

Fortunately, the natural adhesion and asymmetric division of

Histograms of single-cell division times for ST cells only (A) and for the full cell cycle (B), measured under microfluidic conditions, show an average SW-to-ST transition time of 10.5 minutes (difference between the two histogram means). This translates to a

It is clear from our microfluidic growth assays that the mean SW-to-ST transition phase is dependent on growth and/or environmental conditions. Our choice of

To evaluate this impact, we replaced the

| Gene name | Mean absolute normalized residual | Spearman rank correlation | Mean absolute normalized residual | Spearman rank correlation |
| | 0.10 | 0.9580 | 0.060 | 0.9846 |
| | 0.10 | 0.8882 | 0.022 | 0.9941 |
| | 0.11 | 0.8058 | 0.025 | 0.9942 |
| | 0.11 | 0.8741 | 0.019 | 0.9980 |
| | 0.10 | 0.7922 | 0.021 | 0.9923 |
| | 0.09 | 0.9378 | 0.011 | 0.9995 |
| | 0.09 | 0.7685 | 0.017 | 0.9958 |
| | 0.08 | 0.9453 | 0.014 | 0.9985 |
| | 0.10 | 0.8850 | 0.028 | 0.9898 |
| | 0.12 | 0.8653 | 0.015 | 0.9986 |

The minimal effect of variation in model parameters is characterized by (i) the mean absolute value of the normalized residuals and (ii) the Spearman rank correlation coefficients

The function for the phase-dependent volume of a single cell (Eq. (5)) is an additional aspect of the model for which there has been no prior detailed investigation. We chose a reasonable piecewise linear model based on the measured average volume fraction of SW vs. ST cells; however, as with the transition phase, an analysis of the effect of changes to the single-cell volume function was warranted. We therefore reapplied the expression estimation, replacing the volume function Eq. (5) with a constant cell volume, and discretized the functions into 100 phase points as before. The normalized residuals were calculated analogously to those in Eq. (18). The mean absolute value of the residuals and Spearman correlation coefficient for each gene are shown in

While population-level experimental techniques typically allow for high-throughput and fast data collection, they are unable to capture many of the details present at the level of single cells. This is an unavoidable consequence of population averaging; population-based data are in fact transforms of organism- and condition-specific population asynchrony kernels with single-cell data. Thus, an assumption of equivalence of population and single-cell data is an assumption of a non-physical delta function integral kernel. Recognizing this, cell distribution models have been proposed with the aim of extracting more detailed information from biological time-series data. Perhaps the simplest improvement on the delta function model is a fixed kernel such as a Gaussian. Further improvements have been made by allowing for a Gaussian kernel whose width increases with time (e.g.,
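For concreteness, the three kernel families mentioned above can be written as follows. The notation is an assumption introduced for this illustration: Q(φ, t) is the fraction of the population at phase φ at time t, T is a mean cell cycle duration, and σ is a kernel width. Wrap-around of phase at division is ignored in this simplified form.

```latex
\begin{aligned}
Q_{\delta}(\phi, t) &= \delta\!\left(\phi - t/T\right)
  && \text{(perfect synchrony)} \\
Q_{\sigma}(\phi, t) &= \frac{1}{\sqrt{2\pi}\,\sigma}
  \exp\!\left[-\frac{(\phi - t/T)^2}{2\sigma^2}\right]
  && \text{(fixed Gaussian)} \\
Q_{\sigma(t)}(\phi, t) &= \frac{1}{\sqrt{2\pi}\,\sigma(t)}
  \exp\!\left[-\frac{(\phi - t/T)^2}{2\sigma(t)^2}\right],
  \quad \frac{d\sigma}{dt} > 0
  && \text{(Gaussian with growing width)}
\end{aligned}
```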

The aforementioned parameters and initial synchronization state are specific to a given model system and experimental condition. For a synchronized population of

We note that we have assumed a perfect

Along with characterization of cell distribution, there has been considerable interest in recent years in extracting “single-cell”-like information from population data using deconvolution-type algorithms

Even with a detailed and accurate kernel and an accepted deconvolution-type algorithm, the precise shape of a deconvolved function is in general highly sensitive to the value of the regularization parameter (

By construction, the model-based deconvolution method presented in this paper mitigates the effects of synchronization loss in expression experiments. However, as with all time-series experiments, the estimates remain dependent on the sample rate of the data. If the sample rate is insufficiently high to capture salient gene activity, important events in the expression profile may be missed. In principle, lower sampling rates may be accommodated by increasing the number of assumptions made about the expression profile to be estimated. In this paper, smoothness (Eq. (12)), positivity (Eq. (13)), and continuity (Eq. (14)) were all used to decrease the effective degrees of freedom and supply a maximal, yet realistic, amount of

The synchronous average expression profiles extracted using our generalized deconvolution algorithm are, with the effects of population asynchrony removed, a much-improved reflection of biological reality. We demonstrated this with

Looking only at the population-level microarray expression data for

These deconvolution results appear to be relatively insensitive to changes in model parameters. Of the parameters used in the cell cycle phase distribution model, the mean SW-to-ST transition phase is the one that is known with the least certainty. However, we found that precise knowledge of the mean transition phase under a given condition is not absolutely necessary for extraction of average single-cell data with our deconvolution algorithm. Even a substantial change in the assumed SW-to-ST transition phase had only a small effect on the deconvolved profiles. With respect to the single-cell volume model employed in the deconvolution algorithm, even the extreme and false assumption of fixed cell volume had an insignificant effect on the shape of the deconvolved expression profile.
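A sensitivity comparison of this kind can be quantified with two simple statistics computed between a reference profile and a profile obtained under altered model assumptions. The sketch below is illustrative only: the normalization of the residuals by the maximum of the reference profile is our assumption for the example and may differ from the definition used in the numbered equation of this paper, and the function names are hypothetical.

```python
import numpy as np

def rankdata(x):
    """Ranks starting at 1, with tied values sharing their average rank."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="mergesort")
    ranks = np.empty(len(x))
    ranks[order] = np.arange(1, len(x) + 1)
    for value in np.unique(x):
        tie = x == value
        ranks[tie] = ranks[tie].mean()
    return ranks

def spearman(a, b):
    """Spearman rank correlation: the Pearson correlation of the ranks."""
    return float(np.corrcoef(rankdata(a), rankdata(b))[0, 1])

def mean_abs_normalized_residual(ref, alt):
    """Mean |alt - ref|, scaled by the largest magnitude in the reference.

    The choice of normalization is an assumption made for this sketch.
    """
    ref = np.asarray(ref, dtype=float)
    alt = np.asarray(alt, dtype=float)
    return float(np.mean(np.abs(alt - ref)) / np.max(np.abs(ref)))
```

A small residual together with a rank correlation close to one indicates that the altered assumption changes neither the magnitude nor the ordering of features in the deconvolved profile.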

One

To our knowledge, our deconvolution method is the first to specifically deal with the unique analytical challenges posed by dimorphic organisms. Although this method can be applied to any time-series measurement made on a cellular population, we have demonstrated its utility with an analysis of cell-cycle regulated gene expression in

Supporting Text

(0.08 MB PDF)

A comparison of expression functions calculated using different values of the SW-to-ST transition phase.

(0.86 MB EPS)

The kernel structure, shown here with 0.5 minute resolution, is highly time-dependent and not well modeled by any common form.

(1.46 MB MOV)

The authors would like to thank Malgorzata Rowicka-Kudlicka, Andrzej Kudlicki, and Patrick McGrath for helpful discussions, and Alison Hottes for valuable comments on the manuscript.