
The authors have declared that no competing interests exist.

Performed experiments, image processing, and inheritance and correlation analysis: AL. Performed parameter search and model validations: AGV. Contributed software for automated image analysis: CV. Performed identifiability analysis: EC. Coordinated the research: GFT, PH, GB. Analyzed the results and wrote the manuscript: AL, AGV, EC, GFT, PH, GB.

Current address: Universidad Santiago de Cali, Cali, Colombia

‡ GFT, PH, and GB also contributed equally to this work.

Significant cell-to-cell heterogeneity is ubiquitously observed in isogenic cell populations. Consequently, parameters of models of intracellular processes, usually fitted to population-averaged data, should instead be fitted to individual cells to obtain a population of models of similar but non-identical individuals. Here, we propose a quantitative modeling framework that attributes specific parameter values to single cells for a standard model of gene expression. We combine high-quality single-cell measurements of the response of yeast cells to repeated hyperosmotic shocks with state-of-the-art statistical inference approaches for mixed-effects models to infer multidimensional parameter distributions describing the population, and then derive specific parameters for individual cells. The analysis of single-cell parameters shows that single-cell identity (

Because of non-genetic variability, cells in an isogenic population respond differently to the same stimulus. Therefore, the mean behavior of a cell population does not generally correspond to the behavior of the mean cell; more generally, neglecting cell-to-cell differences biases our quantitative representation and understanding of how cellular systems function. Here we introduce a statistical inference approach for calibrating (a population of) single-cell models that differ by their parameter values. It makes it possible to view time-lapse microscopy data as many experiments performed on one cell rather than one experiment performed on many cells. By harnessing existing cell-to-cell differences, one can then learn how environmental cues affect (non-observed) intracellular processes. Our approach is generic and makes it possible to exploit, in an unprecedented manner, the rich information content of single-cell longitudinal data.

It is well recognized that cellular heterogeneity exists within populations of isogenic cells [

Regarding dynamical models of gene expression, the most widely accepted approach to account for cell-to-cell variability so far has been to model transcription as a stochastic process [

Therefore, a purely stochastic representation of cellular heterogeneity is not appropriate for a large proportion of genes and biological processes. Given that validating a model encompassing both types of variability against data is still very difficult with current experimental capabilities [

Here we analyzed the temporal evolution of the expression level of an inducible fluorescent reporter in a population of yeast cells growing in a microfluidic device. By selecting a strong inducible promoter and using a stable reporter, we placed ourselves in experimental conditions where extrinsic variability dominates the (neglected) intrinsic component. In addition, we directly assess how the inferred individuality in gene expression relates to measurable features of the cell's phenotype and physiology, and therefore to typical biological measures of cellular identity. We use a modeling approach in which, for a standard model of gene expression in yeast, each single cell is given specific parameter values while the cell population is described by a multidimensional parameter distribution (

We propose several validations of the inference results and analyze the obtained parameter distributions representing cell populations. We then focus on single cells and analyze the correlations across parameters, and between parameters and other single-cell features related to phenotypic and physiological variability. Finally, we assess the heritability of the gene expression parameters. Taken together, our results demonstrate that, using the proposed framework, biologically relevant model parameters can be attributed to individual cells and related to single-cell features, while the population of cells is represented concisely. As such, this work is an important step towards identifying the major determinants of extrinsic cell-to-cell variability and towards introducing the concept of single-cell identity quantitatively.

Using microfluidics and time-lapse microscopy we acquired longitudinal data of the response of individual yeast cells subjected to repeated hyperosmotic shocks (see

Mixed-effects (ME) models are a class of statistical models introduced to describe the response of different individuals within a population to known stimuli. Here, we used a ME model where the response of individual cells was described in terms of a simple dynamical model of gene expression. Denoting with _{m} and _{m} for the mRNA, and _{p} and _{p} for the protein, respectively. To relate fluorescence measurements to actual protein concentrations, we accounted for protein folding time using a delay _{m}, _{m}, _{p}, and _{p} vary within the population. Differences in parameter values may typically originate from differences in the level of key components of the gene expression machinery (_{m}, _{m}, _{p}, _{p}) with
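To make the structure of such a single-cell model concrete, here is a minimal sketch assuming the standard two-stage form: mRNA is produced at a rate proportional to a transcription-factor input u(t) and degraded linearly, protein is translated from mRNA and degraded linearly, and fluorescence lags protein by a folding delay. The names `km`, `gm`, `kp`, `gp`, `tau`, the square-pulse input `shock_input`, and all numerical values are hypothetical illustrations, not the paper's exact formulation or estimates.

```python
import numpy as np


def simulate_cell(km, gm, kp, gp, tau, u, t_end=600.0, dt=0.1):
    """Forward-Euler sketch of an assumed two-stage gene expression model:

        dm/dt = km * u(t) - gm * m     (mRNA)
        dp/dt = kp * m    - gp * p     (protein)

    The fluorescence readout is p(t - tau), i.e. protein delayed by a folding time tau.
    Times are in minutes; m(0) = p(0) = 0.
    """
    n = int(round(t_end / dt)) + 1
    t = np.linspace(0.0, t_end, n)
    m = np.zeros(n)
    p = np.zeros(n)
    for i in range(n - 1):
        m[i + 1] = m[i] + dt * (km * u(t[i]) - gm * m[i])
        p[i + 1] = p[i] + dt * (kp * m[i] - gp * p[i])
    # Shift the protein trace by the folding delay (zero fluorescence before tau).
    lag = int(round(tau / dt))
    fluo = np.concatenate([np.zeros(lag), p[: n - lag]])
    return t, fluo


def shock_input(t, period=90.0, duration=8.0):
    """Hypothetical input: transcription factor active for `duration` min every `period` min."""
    return 1.0 if (t % period) < duration else 0.0


# Arbitrary illustrative parameter values, not estimates from the paper.
t, fluo = simulate_cell(km=1.0, gm=0.05, kp=0.5, gp=0.01, tau=12.0, u=shock_input)
```

In the ME setting, every cell would get its own (`km`, `gm`, `kp`, `gp`, `tau`) vector, while the model structure stays shared across the population.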

Here, we are looking for a multidimensional distribution defined by its center of mass (
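A common way to describe such a distribution for positive rate constants is a multivariate log-normal: a mean vector (the center of mass) and a covariance matrix for the log-parameters. The sketch below assumes this form and shows how a virtual population can be drawn, either keeping the covariance between parameters or discarding it by resampling each parameter independently from its marginal. The function name and the example values are hypothetical.

```python
import numpy as np


def sample_population(mu_log, cov_log, n_cells, keep_covariance=True, seed=0):
    """Draw n_cells parameter vectors from an assumed multivariate log-normal distribution.

    mu_log, cov_log: mean vector and covariance matrix of the log-parameters.
    If keep_covariance is False, off-diagonal covariances are set to zero, so each
    log-parameter is drawn independently with its original marginal variance.
    """
    rng = np.random.default_rng(seed)
    cov = np.asarray(cov_log, dtype=float)
    if not keep_covariance:
        cov = np.diag(np.diag(cov))
    log_params = rng.multivariate_normal(mu_log, cov, size=n_cells)
    return np.exp(log_params)  # back to natural (positive) scale


# Hypothetical example: four parameters, the first two strongly correlated in log scale.
mu_log = np.log([1.0, 0.05, 0.5, 0.01])
sd = np.full(4, 0.3)
corr = np.eye(4)
corr[0, 1] = corr[1, 0] = 0.8
cov_log = np.outer(sd, sd) * corr

with_cov = sample_population(mu_log, cov_log, 10000)
without_cov = sample_population(mu_log, cov_log, 10000, keep_covariance=False)
```

Both populations have the same marginals, but only the first preserves the joint structure, which is what the comparison between resampling with and without covariance probes.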

Both the

A. Representation of the experimental dataset. B. Simulated behavior obtained when using the parameters of each observed cell in the dataset (325 cells) inferred with the SAEM approach. C. Simulated behavior obtained when using the parameters of each observed cell in the dataset (325 cells) inferred with the naive approach. D. Simulated behavior of 10000 cells when resampling the population joint distribution inferred with SAEM (pink). E. Simulated behavior of 10000 cells when resampling the population joint distribution inferred with the naive approach. F. As an illustration, we show the simulated behavior of 10000 cells when resampling the population parameter distribution as in D but without preserving the covariance between parameters (

We then evaluated the capability of the obtained parameter

To investigate the causes of the marked differences between the predictive power of the ME models inferred using either the naive approach or the SAEM algorithm, we compared the corresponding parameter distributions. In both cases, the mean values of the parameters were comparable and within the expected ranges (see

We then tested the robustness of the inference approach, an essential property for learning algorithms. Interestingly, the performance of the SAEM inference method degraded gracefully as the number of single-cell trajectories available for identification was decreased to as few as 32 cells (

A Predictions obtained for a ME model having parameter distributions estimated on only 32 randomly-chosen cell trajectories (see also

At this point, we have shown how to efficiently and robustly extract, from a collection of longitudinal single-cell data, the distributions of the parameters of a standard model of gene expression, as well as a set of parameters for each cell in the population. While we are mostly interested here in the details of the parameter distribution, we can also extract the average value of each model parameter. Importantly, these averages differ from the parameters obtained by fitting our model of gene expression directly to the population-averaged behavior. This is illustrated on

A-B. Starting from an experimental dataset (A), one can either extract the parameters that describe the average behavior (in blue), or use our framework to extract the entire collection of single-cell parameters (black dots in B) and compute the average parameters (in yellow). B-C. The average parameters do not match the parameters that best describe the average behavior. C. Visualization of 1000 simulated single-cell behaviors (thin blue lines) based on the parameter distributions shown (partially) in B. The solid blue line is a (good) simulation of the average behavior (also shown in blue in panel A). The solid yellow line is the behavior of the “average cell”, whose parameters are the means of the parameter distributions. The “average cell” behavior clearly differs from the averaged behavior.
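The gap between the “average cell” and the averaged behavior follows from the model being nonlinear in its parameters (for instance, the steady-state protein level scales with 1/(gm·gp), and the mean of 1/g exceeds 1/(mean of g)). The sketch below reproduces the effect under stated assumptions: a two-stage model with constant induction, and independent log-normal variability (log-sd 0.5) on hypothetical parameters `km`, `gm`, `kp`, `gp`. None of these values come from the paper.

```python
import numpy as np


def protein_trajectories(km, gm, kp, gp, t_end=300.0, dt=0.5):
    """Vectorized Euler integration of dm/dt = km - gm*m, dp/dt = kp*m - gp*p
    (assumed constant induction, m(0) = p(0) = 0); one column per cell."""
    km, gm, kp, gp = (np.atleast_1d(np.asarray(x, dtype=float)) for x in (km, gm, kp, gp))
    n_steps = int(round(t_end / dt))
    m = np.zeros_like(km)
    p = np.zeros_like(km)
    out = [p.copy()]
    for _ in range(n_steps):
        # Tuple assignment: both updates use the values from the previous step.
        m, p = m + dt * (km - gm * m), p + dt * (kp * m - gp * p)
        out.append(p.copy())
    return np.array(out)  # shape (n_steps + 1, n_cells)


rng = np.random.default_rng(0)
n_cells = 2000
medians = {"km": 1.0, "gm": 0.05, "kp": 0.5, "gp": 0.01}
# Independent log-normal variability for every parameter (an assumption).
params = {k: v * np.exp(0.5 * rng.standard_normal(n_cells)) for k, v in medians.items()}

traj = protein_trajectories(**params)
average_behavior = traj.mean(axis=1)  # mean over simulated cells
average_cell = protein_trajectories(*(params[k].mean() for k in ("km", "gm", "kp", "gp")))[:, 0]
```

Here the averaged behavior ends well above the “average cell” trajectory, which is the qualitative point of the figure; the exact ratio depends entirely on the assumed spread.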

Non-identifiability arises when the information contained in data along with a model structure does not allow for the proper estimation of parameter values: several parameter values (or more usually combinations of parameter values) yield equally-good results given the available data. In our framework, very high correlations between parameter values may indicate the existence of non-identifiability relations among parameters. The first application of the SAEM algorithm showed that _{m} and _{p} were highly correlated, and, indeed, checking single-cell values suggested that the rates of transcription and translation could hardly, if at all, be quantified independently. A detailed identifiability analysis showed that, at the level of individual cells, these two parameters are structurally non-identifiable; only their product can be quantified (_{p} when inferring parameter distributions using SAEM, and introduced the protein production rate _{mp}, defined as the product of _{m} and _{p}, for the single-cell models. With these changes, shrinkage was then found to be negligible (
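The scaling symmetry behind this kind of structural non-identifiability can be checked numerically. The sketch assumes a two-stage model (dm/dt = km - gm·m, dp/dt = kp·m - gp·p) with zero initial mRNA and protein, where only the protein is observed; multiplying the transcription rate by any c > 0 and dividing the translation rate by c leaves the protein trajectory unchanged, so only the product km·kp is constrained by the data. Names and values are hypothetical.

```python
import numpy as np


def protein_only(km, gm, kp, gp, t_end=300.0, dt=0.5):
    """Euler sketch: dm/dt = km - gm*m, dp/dt = kp*m - gp*p, m(0) = p(0) = 0.
    Only the protein trajectory is returned, mimicking a fluorescence-only readout."""
    m, p = 0.0, 0.0
    traj = [p]
    for _ in range(int(round(t_end / dt))):
        m, p = m + dt * (km - gm * m), p + dt * (kp * m - gp * p)
        traj.append(p)
    return np.array(traj)


base = protein_only(km=1.0, gm=0.05, kp=0.5, gp=0.01)
# Rescale transcription by c and translation by 1/c: the product km * kp is unchanged.
for c in (0.1, 3.0, 50.0):
    scaled = protein_only(km=c * 1.0, gm=0.05, kp=0.5 / c, gp=0.01)
    print(c, np.max(np.abs(scaled - base)))  # differences at floating-point round-off level
```

This is why, at the single-cell level, it makes sense to estimate a single production rate equal to the product of the two.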

Having identified single-cell parameter values, one may wonder whether they can be used to retrieve known facts, or discover new ones, about the physiology of the cell response to hyperosmotic shocks. In our model, hyperosmotic shocks affect all cells identically. However, in the microfluidic device, the intensity of the shock perceived by different cells varied, as evidenced by differences in the reduction of cellular volume following shocks. Therefore, protein production parameters inferred for the most severely impacted cells should be statistically higher than average. We thus estimated the perceived shock intensities as the time-averaged reduction of cellular volume following shocks, and compared, across all cells, the inferred parameter values with the perceived shock intensities. We found a strong correlation between protein production rates and shock intensities, in agreement with our hypothesis. Moreover, an equally strong correlation was found with mRNA degradation rates (

A. Correlations between the perceived intensity of hyperosmotic shocks and single-cell parameter estimates are provided with their corresponding _{mp} and mRNA degradation rates _{m} for each individual cell. Their strong correlation (Spearman coefficient: 0.88; ^{−15}) together with their mutual increase with perceived shock intensity indicates that these two processes are jointly regulated in response to hyperosmotic shocks. Inset plot and colored background represent perceived shock intensity for 9 groups of 35 cells along the regression line.
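The two quantities used in this analysis, a per-cell perceived shock intensity and a Spearman rank correlation, can be sketched as follows. The exact windowing is not specified in the text, so `perceived_shock_intensity` assumes a hypothetical definition: the mean relative volume drop over a fixed number of frames after each shock, relative to the frame just before it, averaged over shocks. The Spearman coefficient is computed as the Pearson correlation of ranks and assumes no ties.

```python
import numpy as np


def perceived_shock_intensity(volume, shock_frames, window):
    """Hypothetical definition: mean relative volume drop over `window` frames after
    each shock (relative to the frame just before the shock), averaged over shocks."""
    volume = np.asarray(volume, dtype=float)
    drops = []
    for s in shock_frames:
        v_pre = volume[s - 1]
        post = volume[s : s + window]
        drops.append(np.mean((v_pre - post) / v_pre))
    return float(np.mean(drops))


def spearman(x, y):
    """Spearman rank correlation as the Pearson correlation of ranks (assumes no ties)."""
    rx = np.argsort(np.argsort(x))
    ry = np.argsort(np.argsort(y))
    return float(np.corrcoef(rx, ry)[0, 1])


# Toy volume trace (arbitrary units) with one shock at frame 2.
volume = np.array([10.0, 10.0, 8.0, 9.0, 10.0, 10.0])
intensity = perceived_shock_intensity(volume, shock_frames=[2], window=2)  # (0.2 + 0.1) / 2
```

Across cells, one would compute `intensity` per cell and then `spearman(intensities, estimated_rates)`; significance testing is a separate step.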

In addition to hyperosmotic shocks, several features related to the cell physiology or local environment are also expected to relate to gene expression [_{p}, and the cell division rate. Indeed, as the fluorescent reporter we used has a long half-life and photobleaching is negligible (see Initial parameters values _{p} (

Local cellular density, division rate, size and age were quantified with single-cell resolution (_{p} and the cell division rate. The proportion of variance accounted for by each principal component is indicated in parenthesis.

More generally, one may ask how the different measured cell features relate to the overall (multivariate) parameter variability. We conducted a principal component analysis (PCA) of the set of inferred single-cell parameter values. This yielded a new parameterization of the model (the new parameters being the principal components PC1, PC2 and PC3) that is particularly relevant for investigating variability because, unlike the natural parameters, each principal component is uncorrelated with the others. The analysis showed that the first two components, PC1 and PC2, represented 87% and 12% of the overall variance in single-cell parameter values, respectively, and that these principal components correlated very significantly with measured cell features. We then ranked the various features by their correlation with the variability captured by the inferred ME model. For a given feature, this is defined as the weighted average correlation with the different PCs, with weights equal to the importance (
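The ranking procedure can be sketched as a PCA followed by a weighted average of feature-PC correlations. The text is truncated where the weights are defined, so this sketch assumes each PC's weight is its fraction of explained variance, uses absolute Spearman correlations, and standardizes the log-parameters before PCA; all three choices, the function names, and the synthetic data are hypothetical.

```python
import numpy as np


def ranks(x):
    """Ranks 0..n-1 (ties are not averaged; assumed absent)."""
    return np.argsort(np.argsort(np.asarray(x)))


def pca(log_params):
    """PCA of single-cell log-parameters (cells x parameters) via SVD.
    Columns are centered and scaled to unit variance (an assumption)."""
    X = np.asarray(log_params, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    U, S, _ = np.linalg.svd(Z, full_matrices=False)
    scores = U * S                   # PC scores, one column per component
    explained = S**2 / np.sum(S**2)  # fraction of total variance per component
    return scores, explained


def feature_relevance(feature, scores, explained):
    """Weighted average of |Spearman correlation| between a cell feature and each PC,
    with weights equal to the fraction of variance explained (hypothetical weighting)."""
    rf = ranks(feature)
    rhos = np.array([abs(np.corrcoef(rf, ranks(scores[:, k]))[0, 1])
                     for k in range(scores.shape[1])])
    return float(np.sum(explained * rhos) / np.sum(explained))


# Synthetic example: one latent factor drives three log-parameters.
rng = np.random.default_rng(1)
latent = rng.normal(size=300)
log_params = np.column_stack([latent + 0.1 * rng.normal(size=300),
                              -latent + 0.1 * rng.normal(size=300),
                              0.5 * latent + 0.3 * rng.normal(size=300)])
scores, explained = pca(log_params)
good = feature_relevance(latent + 0.2 * rng.normal(size=300), scores, explained)
bad = feature_relevance(rng.normal(size=300), scores, explained)
```

A feature tied to the dominant axis of parameter variability scores high, while an unrelated feature scores near zero.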

Finally, we investigated the inheritance of single-cell parameters. Statistical tests showed that the parameters of mother and daughter cells were significantly closer to each other than the parameters of random cell pairs (_{p} (resp. _{m}, _{mp}). Although mild in absolute terms, bootstrap testing showed the presence of a statistically strong inheritance effect (^{−15} for all parameters,

(A) The distances between the parameters of related mother and daughter cells (MD) and of non-related mother and daughter cells (nMD) were compared. (B-D) Distribution, for each parameter, of the average distance between 40 pairs of MD (red) and nMD (blue) for 50000 combinations obtained by bootstrapping (^{−15}).
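A bootstrap comparison of this kind can be sketched as below. The text only states that 40 pairs were used and 50000 combinations drawn, so the resampling scheme here is an assumption: related pairs are resampled with replacement, non-related pairs are formed by pairing a mother with the daughter of a different mother, and the empirical p-value is the fraction of iterations in which non-related pairs are at least as close as related ones. The function name and the synthetic lineage are hypothetical.

```python
import numpy as np


def inheritance_bootstrap(values, md_pairs, n_pairs=40, n_boot=50000, seed=0):
    """Hypothetical bootstrap comparison for a single parameter.

    values   : per-cell parameter estimates (1D array, e.g. log-parameters)
    md_pairs : array of (mother_index, daughter_index) rows
    Returns the MD and nMD mean-distance distributions and the fraction of iterations
    in which the nMD mean distance was not larger than the MD one (empirical p-value).
    Requires at least two distinct mothers.
    """
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    md = np.asarray(md_pairs, dtype=int)
    d_md = np.empty(n_boot)
    d_nmd = np.empty(n_boot)
    for b in range(n_boot):
        # Related pairs, resampled with replacement.
        idx = rng.integers(0, len(md), size=n_pairs)
        d_md[b] = np.mean(np.abs(values[md[idx, 0]] - values[md[idx, 1]]))
        # Non-related pairs: the mother of one pair with the daughter of another pair.
        m_idx = rng.integers(0, len(md), size=n_pairs)
        d_idx = rng.integers(0, len(md), size=n_pairs)
        related = md[m_idx, 0] == md[d_idx, 0]
        while np.any(related):  # redraw daughters that belong to the chosen mother
            d_idx[related] = rng.integers(0, len(md), size=int(related.sum()))
            related = md[m_idx, 0] == md[d_idx, 0]
        d_nmd[b] = np.mean(np.abs(values[md[m_idx, 0]] - values[md[d_idx, 1]]))
    p_value = float(np.mean(d_nmd <= d_md))
    return d_md, d_nmd, p_value


# Synthetic lineage: 100 mothers, each with one daughter resembling it.
rng = np.random.default_rng(42)
mothers = rng.normal(0.0, 1.0, size=100)
daughters = mothers + 0.3 * rng.normal(size=100)
values = np.concatenate([mothers, daughters])
md_pairs = np.column_stack([np.arange(100), np.arange(100, 200)])
d_md, d_nmd, p = inheritance_bootstrap(values, md_pairs, n_boot=5000)
```

With 50000 iterations, the smallest non-zero empirical p-value is 1/50000, so much smaller reported p-values would require a different test statistic or a parametric comparison of the two distributions.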

In this work, we proposed an approach for capturing the biological variability observed in single-cell time-lapse microscopy experiments by

Our approach is suited to calibrating models that explicitly account for extrinsic variability. From a mechanistic viewpoint, two components of biological variability, termed intrinsic and extrinsic noise, have been proposed. For a given cellular process, intrinsic variability is mostly related to fast fluctuations arising from the stochasticity of molecular reactions, while extrinsic variability includes more stable cell-to-cell differences in intracellular and extracellular environments [

The possibility of identifying single-cell models opens new perspectives. Indeed, our results support the approach advocated by Pelkmans and coworkers

All experiments were performed using a STL1::yECitrine-HIS5, Hog1-mCherry-hph yeast strain derived from the S288C background [

The cells were imaged using an automated inverted microscope (IX81; Olympus) equipped with an X-Cite 120PC fluorescent illumination system (EXFO) and a QuantEM 512 SC camera (Roper Scientific). The temperature of the microscope chamber, which also contains the media reservoirs, was constantly held at 30°C by a temperature control system (Life Imaging Services). All of these components were driven by the open-source software μManager, which was interfaced with Matlab. Images were taken using a 100× oil immersion objective (PlanApo 1.4 NA; Olympus). The fluorescence exposure time was 200 ms, with the fluorescence illumination intensity set to 50% of maximal power. This exposure time was chosen such that the fluorescent illumination did not cause noticeable effects on cellular growth over extended periods of time. Importantly, illumination, exposure time, and camera gain were not changed between experiments, and apart from background and autofluorescence subtraction (defined as the minimum intensity in the first frame), no data renormalization or processing was done. Imaging was performed at a frequency of one frame every 3 min for bright-field and one frame every 6 min for fluorescence measurements. The duration of the experiments was 10 hours.

Single-cell gene expression profiles were obtained in two experiments: one for identification (

We assumed that the transcription factor activity, _{c}(_{v}(

Two methods were proposed to infer ME population models: a naive approach and SAEM. The naive approach used the local optimization algorithm fminsearch from Matlab to maximize the (log-)likelihood of the parameters tested, given the observed data for the considered cell. The parameter distribution for the ME model is then defined based on the set of single-cell parameters. The SAEM approach aims directly at maximizing the likelihood of the population (high-level) parameters describing the distributions of the model parameters, given all the single-cell data. We used the SAEM implementation of Monolix software. Lastly, having inferred a distribution for the model parameters of a population of cells, one could estimate the most likely parameter values for each single cell (ME single-cell models). We used the local optimization tool fminsearch from Matlab to implement a maximum
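The difference between the naive per-cell fit and the single-cell estimate informed by the population distribution can be illustrated on a toy problem. This sketch assumes the latter is a maximum a posteriori (MAP) estimate, in which the population distribution acts as a Gaussian prior on a log-parameter; the one-parameter readout `model`, the noise level, and the prior values are hypothetical, and a dense grid search stands in for Matlab's fminsearch (a Nelder-Mead simplex method).

```python
import numpy as np


def model(theta, t):
    """Hypothetical one-parameter readout: production rate exp(theta), fixed relaxation rate."""
    return np.exp(theta) * (1.0 - np.exp(-0.02 * t))


def fit_cell(y, t, sigma, mu=None, omega=None, grid=None):
    """Return the grid value of theta that maximizes the Gaussian log-likelihood
    (naive fit, mu is None) or the log-posterior with a N(mu, omega^2) population prior (MAP)."""
    if grid is None:
        grid = np.linspace(-5.0, 5.0, 20001)
    pred = model(grid[:, None], t[None, :])          # shape (n_grid, n_times)
    sse = np.sum((y[None, :] - pred) ** 2, axis=1)
    objective = -0.5 * sse / sigma**2                # log-likelihood up to a constant
    if mu is not None:
        objective = objective - 0.5 * (grid - mu) ** 2 / omega**2  # log-prior from the population
    return float(grid[np.argmax(objective)])


t = np.arange(0.0, 300.0, 6.0)   # one fluorescence frame every 6 min
y = model(1.0, t)                # noise-free toy data with true theta = 1.0
theta_ml = fit_cell(y, t, sigma=0.5)
theta_map = fit_cell(y, t, sigma=0.5, mu=0.0, omega=0.3)
```

The MAP estimate is pulled from the maximum-likelihood value toward the population mean; the pull is stronger when a cell's data are noisy or uninformative, which is how the population distribution helps constrain poorly informed single cells.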

The analysis of the correlations between the perceived shocks or the single-cell measured features and the estimated parameters was performed using the Spearman coefficient of correlation. The significance of the correlations (

Additional information on experimental design, data analysis, estimation of single cell quantitative features, cell lineage reconstruction, modeling of the osmostress-induced gene expression, parameter inference, simulation of population behavior, correlation with quantitative single-cell measurements, and heritability analysis.

(PDF)

Influence of the cell number and of the learning time horizon.

(PDF)

Predicting population behavior on two validation data sets.

(PDF)

Parameters _{m} and _{p} cannot be assigned values unambiguously no matter the quality and quantity of fluorescence measurements.

(PDF)

Statistical properties of parameters that are not distinguishable at the single-cell level can nevertheless be constrained in a population approach.

(PDF)

A. Minimum, maximum and average cellular fluorescence levels in the identification dataset

(PDF)

In the naive approach, optimization is used to seek, for each cell, parameter values fitting the individual behavior of the cell via residual minimization (top, step 1). The distribution describing all the estimated parameter values is then deduced (top, step 2). In the proposed method, the SAEM tool is used to infer a distribution that explains the set of individual behaviors at the distribution level (bottom, step 1). Parameter values for single cells are then estimated based on the particular behavior of the cell and the inferred distribution for the population, using maximum

(PDF)

A. 2D plot describing the distribution of the (logarithm of) single-cell parameters for two parameters (inset: same data shown in natural scale). The ellipses represent the region in which 50% of the parameters are distributed. B. Two metrics were computed to quantify the difference in the structure of the parameter distributions at a more global level. The first metric was the average of the off-diagonal coefficients of the correlation matrix, $\mathrm{cov}_{ij}/(\sigma_i\sigma_j)$, between the parameters of the model; it represents the amount of structure in the parameter distribution and shows that SAEM yielded a more structured parameter distribution. The second metric was the volume, in parameter space, of the 95%-confidence ellipsoid associated with the covariance matrix. This gives a measure of the typical volume of parameter space occupied by the parameter distribution, and therefore quantifies the spread of the parameter distributions. It showed that the SAEM approach described the population with a more compact distribution.
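Both metrics can be computed from a matrix of single-cell log-parameters, as sketched below. The correlation metric averages the signed off-diagonal coefficients, as the text describes (taking absolute values would be a variant). The ellipsoid volume uses the standard formula V = V_d · c^{d/2} · sqrt(det Σ), where V_d is the volume of the unit d-ball and c is the 95% chi-square quantile with d degrees of freedom; the quantiles are hard-coded for d = 1 to 4 to avoid extra dependencies. Function names are hypothetical.

```python
import math

import numpy as np

# 95% quantiles of the chi-square distribution for d = 1..4 degrees of freedom.
CHI2_95 = {1: 3.841459, 2: 5.991465, 3: 7.814728, 4: 9.487729}


def mean_offdiag_correlation(log_params):
    """Average of the off-diagonal entries cov_ij / (sigma_i * sigma_j) of the correlation matrix."""
    C = np.corrcoef(np.asarray(log_params, dtype=float), rowvar=False)
    d = C.shape[0]
    return float(C[~np.eye(d, dtype=bool)].mean())


def ellipsoid_volume_95(log_params):
    """Volume of the 95% ellipsoid {x : (x - mu)^T Sigma^-1 (x - mu) <= c} of a Gaussian fit."""
    X = np.asarray(log_params, dtype=float)
    d = X.shape[1]
    cov = np.cov(X, rowvar=False)
    c = CHI2_95[d]
    unit_ball = math.pi ** (d / 2) / math.gamma(d / 2 + 1)
    return float(unit_ball * c ** (d / 2) * math.sqrt(np.linalg.det(cov)))


# Hypothetical usage on synthetic 3-parameter log-data.
rng = np.random.default_rng(0)
demo = rng.multivariate_normal([0.0, 0.0, 0.0], np.diag([0.1, 0.2, 0.3]), size=1000)
structure = mean_offdiag_correlation(demo)   # near 0: no correlation structure
spread = ellipsoid_volume_95(demo)
```

A more structured, more compact distribution would show a larger average correlation and a smaller ellipsoid volume.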

(PDF)

The blue bar represents the average distance in parameters between 55 mother-daughter pairs from experiment

(PDF)

(A) Initial values for the search have been obtained by global optimization (CMAES) on the mean behavior starting from literature-based parameters. The value of the delay _{m} and _{p} (only their product is relevant in single-cell models): the mean of _{p} is kept at a constant value during the search. No constraints are placed on its variance though. The value of the delay _{m}] where it was 8%.

(PDF)

(ZIP)

The authors acknowledge Jean-Marc Di Meglio, Benoit Sorre, and Hidde de Jong for insightful discussions. A.L. is grateful to the doctoral program Frontiers of Living Systems.