Model-Based Deconvolution of Cell Cycle Time-Series Data Reveals Gene Expression Details at High Resolution

In both prokaryotic and eukaryotic cells, gene expression is regulated across the cell cycle to ensure “just-in-time” assembly of select cellular structures and molecular machines. However, present in all time-series gene expression measurements is variability that arises from both systematic error in the cell synchrony process and variance in the timing of cell division at the level of the single cell. Thus, gene or protein expression data collected from a population of synchronized cells is an inaccurate measure of what occurs in the average single-cell across a cell cycle. Here, we present a general computational method to extract “single-cell”-like information from population-level time-series expression data. This method removes the effects of 1) variance in growth rate and 2) variance in the physiological and developmental state of the cell. Moreover, this method represents an advance in the deconvolution of molecular expression data in its flexibility, minimal assumptions, and the use of a cross-validation analysis to determine the appropriate level of regularization. Applying our deconvolution algorithm to cell cycle gene expression data from the dimorphic bacterium Caulobacter crescentus, we recovered critical features of cell cycle regulation in essential genes, including ctrA and ftsZ, that were obscured in population-based measurements. In doing so, we highlight the problem with using population data alone to decipher cellular regulatory mechanisms and demonstrate how our deconvolution algorithm can be applied to produce a more realistic picture of temporal regulation in a cell.


Supplementary Text S1
Quadratic form of the cost function C(λ) The predicted value G(t m ) of the mth measurement G(t m ) may be approximated by an inner product of sampled functions where q(t m ) = [Q(φ 1 , t m ) . . . Q(φ N k , t m )] T and δ = 1/N k is the knot spacing.
The cost function C(λ), given in Models as: with G(t m ) = Q(φ, t m )f (φ)dφ, may be written as a quadratic form in α as follows: define Q as the N m × N k measurement matrix Q = [q(t 1 ) . . . q(t Nm )] T that maps the sampled expression function f to the predicted measurements g = [ In Eq. (S3) and what follows, we absorb the height scaling constant δ into the expression function f that is to be estimated. With , the second term of Eq. (S2) may be written where Letting R be a diagonal matrix of measurement variances σ 2 1 . . . σ 2 Nm , the cost function (Eq. (S2)) may be written which is a quadratic form in α.
Justification for use of the synchronous average expression function f (φ) It has been shown that as a result of noise in gene expression, the levels of expression (measured with fluorescent reporters) are normally-distributed about the population mean [1]. Letting f (φ) = (1/N ) k f k (φ) denote the average population expression, we may then write the single-cell expression function f k (φ) for cell k as the product of a scaling factor s k with the population average where s k is independent of all other cell parameters θ, φ and is drawn from a normal distribution with mean 1. For a given species of RNA, the total number of transcripts in the population at time t is then given by: where E X [. . .] denotes statistical expectation over the random variable X. As s is random and independent of θ and φ, the expectation of the product of s and f (φ)v θ (φ) is equal to the product of their expected values (1 and f (φ)Q(φ, t)dφ, respectively).

Microarray data noise model
It has been previously shown that the dominant source of noise in experiments using Affymetrix microarrays is hybridization noise [2]. This leads to a signal variance on each gene expression value that is, above a certain threshold, proportional to the value itself, where G is the level of gene expression, G 0 is the threshold, and β is a constant. Although the experiments in [2] use mRNA from a human Burkitts lymphoma cell line, it was suggested that the hybridization noise component proportional to signal intensity does not depend on the type of genechip and the sample being used. Consistent with that work, we used a noise model with β=5. Also, the threshold is the 10th percentile of all expression values, which, for the entire Caulobacter data set analyzed here, is 0.188. Far below this threshold, i.e., at low levels of expression, we assumed that the variance is constant. To combine these two limits into a single noise model, we assumed that at the threshold intensity the variance diverges from proportionality by 5%, so that

Constraints on the average single-cell expression functions
There are typically multiple solutions to inversion problems of the kind used here. We therefore constrained the single-cell expression profiles by excluding non-physical solutions for which RNA concentration became negative. We also applied a continuity constraint, since for every cell k the concentration of any RNA species at φ = 1 must be equal to the volume-weighted sum of concentrations at φ = 0 and φ = φ (sst) k . Mathematically, we have that where f k (φ) and v k (φ) are the expression in and volume of cell k at phase φ. We use the previously established values of 0.4 and 0.6 for the average SW and ST cell volume fractions [3]. The synchronous average expression over N cells is f (φ) = (1/N ) k f k (φ), which we apply to the left-and right-hand sides of Eq. (S10) to get As before, we assume that the variability between the f k is independent of φ (sst) , and thus we replace the individual f k with their mean value (see Supplementary Information), and rewrite Eq. (S11) as where N (φ (sst) ; µ sst , σ 2 sst ) is a Gaussian probability density function evaluated at φ (sst) , φ 1 = 0, φ N k = 1, and w = [w 1 . . . w N k ] T , with (S13)

Division time assays
To determine the division times for Caulobacter SW and ST cells, we used a simple microfluidic apparatus and followed the microfluidic protocol described previously [4,5]. ST cells attach to the glass surface of the microfliuidic via the adhesive holdfast and are oriented along the direction of the flow of growth medium. and glass cover slip were cleaned and sealed using a Plasma Prep II plasma cleaner (SPI Supplies). Sodium hydroxide (2M solution), ethanol, and water were sequentially flowed into the channels to clean the interior before cell loading.
Individual colonies of wild-type Caulobacter crescentus strain CB15 [6] were taken from a peptone/yeast extract (PYE)-agar plate and grown overnight in 5 ml PYE medium at 30 • C, diluted to 0.1 optical density at 660 nm (OD 660 ) and regrown for 2 hours. Cells were then loaded into the microfluidic chamber and incubated for an additional hour prior to imaging. A Harvard Apparatus PHD2000 infuser was used to induce a constant flow of PYE medium at a rate of 12 µl/min for the duration of the experiment.
Cells were imaged with a Leica DM5000 at 630x magnification in phase contrast mode. Images were collected at 2 minute intervals on a Hamamatsu Orca-ER digital camera, and the light dosage was limited to 200 msec exposure and ∼5 second manual focus time per exposure. The temperature in the room was maintained at 30 • C. Cell growth and division were monitored for 12-14 hours during each of four independent experimental runs.
Images were imported into ImageJ (National Institutes of Health, Bethesda, MD) for processing. The images were converted into binary stacks by subtracting the image backgrounds and adjusting the threshold pixel intensity. Cell areas were calculated in ImageJ, and data were further analyzed using Mathematica (Wolfram Research, Inc., Champaign, IL).