Exploiting Cell-To-Cell Variability To Detect Cellular Perturbations

Any single-cell-resolved measurement generates a population distribution of phenotypes, characterized by a mean, a variance, and a shape. Here we show that changes in the shape of a phenotypic distribution can signal perturbations to cellular processes, providing a way to screen for underlying molecular machinery. We analyzed images of a Drosophila S2R+ cell line perturbed by RNA interference, and tracked 27 single-cell features which report on endocytic activity, and cell and nuclear morphology. In replicate measurements feature distributions had erratic means and variances, but reproducible shapes; RNAi down-regulation reliably induced shape deviations in at least one feature for 1072 out of 7131 genes surveyed, as revealed by a Kolmogorov-Smirnov-like statistic. We were able to use these shape deviations to identify a spectrum of genes that influenced cell morphology, nuclear morphology, and multiple pathways of endocytosis. By preserving single-cell data, our method was even able to detect effects invisible to a population-averaged analysis. These results demonstrate that cell-to-cell variability contains accessible and useful biological information, which can be exploited in existing cell-based assays.


Introduction
Advances in labeling and imaging have made it possible to collect quantitative information on a variety of cellular processes at single-cell resolution, and to generate high-quality population distribution data. Cell-to-cell variability is evident in any such measurement [1], [2]. This variability plays an essential role in processes ranging from bet-hedging in unicellular organisms, to cell differentiation, ageing, and infection in metazoa [3]- [7]. Cellto-cell variability can be generated by intrinsic stochastic mechanisms and shaped by regulatory molecular circuitry: transcriptional regulation impacts protein expression distributions in bacteria and yeast [8]- [10]; the molecular machinery controlling cell shape generates morphological variability in metazoan cells [11], [12]; DNA-damage checkpoints produce noisy oscillations in cancer cell lines [13]. A corollary of these results is that the entire population distribution of a phenotype can be used to study the underlying biology. Here we present an explicit demonstration of this idea: rather than investigating how specific molecular mechanisms generate variability, we reverse the process and use variability itself as a general probe of those mechanisms. We first show that genetic perturbations reliably cause changes to the population distributions of a variety of phenotypes related to cell morphology and activity. We then demonstrate how such changes in phenotypic distributions can be exploited, in conjunction with a screening approach such as genome-wide RNA interference (RNAi), to probe cellular processes with unprecedented sensitivity. An application of this method, to study the molecular basis of multiple endocytic pathways in metazoan cells, is reported in a companion paper [14].
Image-based RNAi screens have previously been used to understand the molecular basis of diverse cellular processes, including: pathogen entry mechanisms; intracellular traffic; cell motility, growth, and differentiation; and cell death and aging [15]- [18]. Though these screens often collect data at single-cell resolution [19]- [24], they invariably focus on population-averaged values to select hits, and rely on heuristic normalization techniques to compensate for labeling and imaging artifacts which cause these values to be erratic [25], [26]. Here we use a radically different strategy, which paradoxically relies on the occurrence of cell-to-cell variability: we focus purely on the shapes of population distributions which, as we show, are robust against measurement artifacts. The shapes of these distributions change under RNAi down-regulation; we quantify these changes, and thus identify genes which influence various cellular phenotypes. This shapebased strategy complements existing methods for analyzing imagebased screens. It can be directly applied to any perturbation experiment which generates single-cell data, for a variety of phenotypes, and is able to identify subtle hits which are missed using standard approaches.

Single-cell-resolved features from an image-based RNAi screen
We applied these ideas in the context of an image-based RNAi screen, with the goal of identifying molecular machinery involved in multiple pathways of metazoan endocytosis. The results of this endocytic screen are presented in the companion paper [14]; here we focus on the use of cell-to-cell variability as a general cellbiological probe. We simultaneously tracked two endocytic pathways [27] in Drosophila S2R+ cells using a pulse-labeling assay [28]: the clathrin-and dynamin-independent CLIC/GEEC endocytic pathway [29] responsible for fluid-phase uptake (probed using FITC-conjugated dextran; green, Fig. 1A,B); and the canonical clathrin-dependent pathway [30] responsible for receptor-mediated endocytosis (probed by using Alexa568-conjugated Transferrin, which is taken up by an ectopically expressed human Transferrin receptor [28]; red, Fig. 1A,B). We used fluorescence microscopy and automated image analysis to extract 27 features for each cell (see Table S1 in File S1, and companion paper [14] for further details): 12 intensity-dependent features describing total uptake levels of the two endocytic probes, and surface levels of the Transferrin receptor as labeled by the monoclonal antibody Okt9 [28] (orange cartoon, and I1-I12; Fig. 1C,D); and 15 geometric features quantifying the shape, size and number of endocytic compartments, as well as nuclear and cell morphology (purple cartoon, and G1-G15; Fig.  1C,D). The screen was performed on custom-designed glass slide arrays of 300 wells printed with double-stranded RNA (dsRNA) (Fig. 1E): 30 wells were negative controls (15 with no dsRNA and 15 with dsRNA targeting the gene for zeocin resistance, which is absent in the Drosophila genome; black in Fig. 1D); 8 wells were positive controls (dsRNA targeting Shibire [31] for the receptor-mediated pathway, and dsRNA targeting Sec23 and Arf1 for the fluid-phase GEEC pathway [28]; shades of blue in Fig. 1D); the remaining wells contained dsRNA targeting individual genes to be screened. Each slide was assayed in triplicate, with scrambled dsRNA patterns (   Table S1 in File S1 for detailed feature descriptions. (E) The screen was carried out on glass slides printed with 300 wells in a 10630 format, each well containing dsRNA targeted against different genes. Colors represent negative (black) or positive (blue) control wells, while white represents test wells. Details of the image analysis and the experimental conditions are provided in the companion paper [14]. doi:10.1371/journal.pone.0090540.g001 S1A in File S1). Cells were grown for four days in the presence of dsRNA, then fixed and imaged. We tested a total of 7216 dsRNAs targeting 7131 unique genes.

Cell-to-cell and well-to-well variability in feature distributions
We measured phenotypic distributions of the 27 single-cell features for each well (hollow histograms, Fig. 2A,B), with population sizes ranging from 200 to 500 cells. All the distributions we measured showed significant cell-to-cell variability; but we also saw a great deal of well-to-well variability between replicate measurements. Even among negative controls, feature distributions were erratic, though geometric features were typically more robust than intensity features (compare hollow purple histograms in Fig. 2B to hollow orange histograms in Fig. 2A) with the latter showing significant slide-positional artifacts (orange heat map, Fig.  2C). To further quantify this effect we carried out 1-way ANOVA [32] for two scale-related measures (mean and variance) and two dimensionless shape-related measures (skewness and kurtosis) of each distribution, using the ANOVA F-statistic to compare the variability of these measures between and within the rows of each slide. This analysis confirmed (Fig. 2E) that the mean and variance of intensity features (but not of geometric features) were susceptible to row-dependent artifacts, but shape-related measures for all features less so. The same trends were observed for variability between columns of a slide and between entire slides, and between negative controls on different slides, across all intensity and geometry features (Fig. S1B,C in File S1). Based on these results, we surmised that the well-to-well differences in the mean and variance of intensity features had their origin in background (additive) and scale (multiplicative) artifacts (from dye loading or illumination, for example) which don't influence geometric measurements. These artifacts should neutralized by the standard procedure of subtracting the mean and dividing by the standard-deviation, leaving a distribution that is characterized by its shape alone. Indeed, phenotypic distributions from replicate measurements converged to nearly identical shapes once normalized in this way, for both intensity and geometry features (filled histograms, Fig. 2A,B). However, if the feature axis was re-scaled non-linearly before normalization (such as by a logarithmic or power-law transformation), geometric distributions typically converged while intensity distributions did not. This provided further evidence that normalization was working to counter affine (additive and multiplicative) artifacts.

Feature distributions change shape under perturbations
Our key observation was that positive and negative control wells could be distinguished even after normalizing out mean and variance (Fig. 2F); the resulting distributions are characterized entirely by their shapes. We used a Kolmogorov-Smirnov-like (KS) statistic [32], [33] to assign a Z-score to each gene on a slide ( Fig.  3; Methods: A Z-score to quantify shape changes of phenotypic distributions). This Z-score quantifies distribution shape changes between test and negative control wells; the higher the score, the greater the shape deviation. Since each gene was tested in triplicate, we calculated three Z-scores for every gene, and pooled these data over the entire screen. We classified a gene as a hit if it occurred two or more times above a given Z-score threshold (Fig.  3B). Figure 4A shows the number of hits selected from the screen (green curve) compared to the number of hits selected from randomly permuted genes (grey band) as the Z-score threshold is varied. The deviation of the green curve from the grey band reveals the presence of reproducible hits in the dataset. The maximal deviation occurs near a Z-score threshold of 3 across all features ( Fig. S2A in File S1). Using this threshold we identified 1072 unique genes as hits for one or more features [14].

Shape-based scoring reveals a spectrum of weak-tostrong genetic contributions
The response to perturbations over triplicate measurements revealed an unexpectedly complex connection between genes and phenotypes. For each feature, at a given Z-score threshold we can classify genes into four bins: those occurring either zero, one, two, or three times above threshold. Each of these bins will contain a combination of true hits and negatives. We can infer false-positive rates (FP: the fraction of above-threshold negatives; Fig. 4B, top panel) and true-positive rates (TP: the fraction of above-threshold hits; Fig 4B, bottom three panels) by fitting the observed gene number in each bin to a statistical model ( Fig. 4C; Methods: Assessing statistical power from triplicate data). The inferred FP rates matched well to the FP rates measured for negative controls (Fig. 4D). However, we were not able to infer a single TP rate consistent with the data. The behavior of the positive controls highlights the problem: different genes appear to have different, characteristic TP rates (Fig. 4B).
Extending this idea, we postulated that hits over the entire screen had a distribution of TP rates. Under this assumption, it was possible to fit all the observed data, and therefore to infer FP rates (Fig. 4E, x-axes) and the range of TP rates (Fig. 4E, green band along y-axes) as the Z-score threshold was varied. For most features, a Z-score threshold of 3 corresponds to FP , 0.1 and TP , 0.5 for single measurements; the FP rate is lower and the TP rate higher if we use triplicate data with a 2/3 rule (Fig. 4C). In support of our calculation, measured TP rates from different positive controls fell within the inferred band of TP rates (Fig. 4E, solid lines; Fig. S3 in File S1). The broad distribution of TP rates is a property of the underlying biology, related to the varying degrees of influence different genes can have on the phenotype of interest.

Shape-based scoring outperforms other scoring methods
There are many possible variations of the KS-based Z-score, differing on how the phenotypic distributions are initially modified. We compared three options: (U) Un-normalized or raw distributions; (P) Partially normalized distributions, transformed to have the average mean and variance of nearest   Figure 5A; the bar graphs indicate the fraction of intensity (orange) or geometric (purple) features in each bin; left panels show observed improvements for positive controls, right panels show inferred improvements for all genes. We find that for intensity features, which are plagued by measurement artifacts, normalization actually increases the power of the screen; for geometric features, which are less susceptible to artifacts, our method performs at least as well using normalized distributions as using raw distributions. Specifically: at FP = 0.1, the inferred average TP rate with normalization (that is, using shape alone; Fig. 4E, solid green dots) exceeds that without normalization (Fig. 4E, hollow green dots) for all intensity features and some geometric features (Fig. 5, top panels). We did however find that normalization typically does not improve or harm performance when applied to positive controls (Fig. 4, left panels) which are associated with strong perturbations.
We next compared the performance of the KS-based Z-score with that of a Z-score where the 'signal' is restricted only to the mean values of feature distributions. This 'traditional' Z-score is defined as the squared difference between the signal of a given well and the average signal of the wells on a given slide, normalized by the variance of all the signals. As before, we can use triplicate data with a 2/3 rule to select hits; Figure S2B in File S1 shows the number of hits selected using the traditional Z-score (green curve) compared to the number of hits selected from randomly permuted genes (grey band). Unlike the KS-based Z-score ( Fig. S2A in File S1), at no threshold does the number of hits selected using the traditional Z-score rise above that expected by chance. For further validation, we screened a subset of predicted hits using an independent experimental assay. This validation assay was designed to minimize positional artifacts at the expense of lower throughput, so the mean values of feature distributions could be used to select hits [14]. We found that for both intensity and geometric features, less than 10% of the traditional Z-scores of validated genes exceeded unity in absolute value; most of these hits would have been completely missed by the high-throughput screen. Conversely, there was a strong overlap between genes selected by shape-based scoring with those selected from the mean-value analysis in the validation assay; that is, the hits which had been selected only for their ability to modify the shapes of phenotypic distributions had a strong tendency to perturb the means of those distributions as well (Fig. 6A).

Biological relevance of shape-based scoring
We applied three independent criteria to test whether our shape-based scoring strategy generates biologically meaningful information. First, we examined whether our hit selection might have been influenced by overall cell health and proliferation, using cell density as a proxy. We found that the cell densities observed in negative control wells broadly overlapped with those in test wells and in wells corresponding to hits. This suggests that the observed changes in the shapes of phenotypic distributions do not arise from gross deficiencies in cell health due to RNA interference. However, there were subtle differences in cell density between hit subsets (Fig. 6B): the number of cells per field among fluid and Transferrin uptake hits (medians 90 and 92, respectively) were only slightly below those among negative control wells (median 98), but nuclear hits showed a striking correlation with high cell number (median 139). This suggests an unexpected connection between nuclear morphology and proliferative capacity, though this must be further explored to rule out confounding factors. Second, we checked whether individual genes influenced multiple types of features in the expected manner. Of the 1072 hits, 470 influenced fluid-phase uptake, 602 influenced Transferrin uptake, 267 influenced nuclear morphology, and 26 influenced cell size (Fig. 6C). Consistent with the expectation that different endocytic pathways share core molecular machinery, there was a high degree of overlap between the fluid and Transferrin uptake gene sets (27%: 211 genes compared to 34 expected by chance). In contrast, there was no significant overlap between endocytic hits and those influencing nuclear morphology (5%: 56 genes compared to 31 expected by chance). Third, we asked whether the lists of hits were enriched for protein complexes, as defined by the Gene Ontology [34] 'cellular component' classification system (Fig. 6D). Protein complexes involved in basic cellular processes such as transcription, mRNA processing, and proteolysis, all emerged as strong hits; we also found components of the cytoskeletal and traffic machinery enriched among endocytic hits. These complexes are expected to play a pervasive role in cellular processes; what is surprising is that they can be detected through their influence on the shapes of phenotypic distributions alone. By all these criteria our shape-based analysis appears to be well suited to probe the entire spectrum of genes involved in complex cellular processes, covering genes with both subtle and strong effects across a variety of phenotypes.

A 'cell state' model for feature distribution shape changes
We have seen that population distributions change shape when a perturbation is applied; this suggests that different cells in the population might be in distinct states, causing them to respond differently to the perturbation. We can define this hidden 'cell state' [35] to be a feature with the following properties: (1) it is not itself affected by a perturbation; (2) it can be used predict the response of some other feature to that perturbation. We refer to such a cell-state feature as a ''classifier'', and to the feature being perturbed as the ''output''. We searched for potential classifier-output pairs among the 12 intensity features, as well as the cell-size feature (geometric features had narrow or discrete distributions, and were therefore poor candidates for classifiers). Cells were sorted into three classifier percentile bins (A: 5-%-15%, B: 45%-55%, C: 85%-95%) and the fraction of genes occurring zero, one, two, or three times above a Z-score threshold of 3; circles show actual data, bars show the inferred composition of each bin, in terms of positives (green) and negatives (grey). The lower-left panel shows the fraction of positives in each bin; genes occurring two or more times above threshold are strongly enriched in positives. The two right panels show the performance when hits are selected from a single measurement rather than using triplicates. (D) The grey band shows the range of inferred FP rates for 25 features (excepting features G11 and G15 for which the inference procedure fails to converge); the black line shows the mean of the measured FP rates for the same features. (E) Inferred TP rates. Green bands show the range of inferred true positive rates (1-b 0 6 s; see Methods: Assessing statistical power from triplicate data) as a function of inferred false positive rates (a); blue lines show the observed TP and FP rates among positive control genes (light: Shibire; medium: Sec23; dark: Arf1). The box to the right of each graph shows the inferred average TP rate (1-b 0 ) at FP = 0.1; the solid dot shows the performance using normalized distributions, the hollow dot shows the performance using un-normalized distributions. doi:10.1371/journal.pone.0090540.g004 mean value of the output was calculated for each bin (m A , m B , m C ). These were used to define a correlation score: y = (m A -m B )/(m B -m C ). For a given classifier-output candidate pair { i, j }, we calculated the median value of y over all positive control wells, and subtracted from it the median value over all negative control wells, to get the final score Dy ij . This 13613 matrix is shown as a heatmap (Fig. S4A in File S1) separately for the positive controls Arf1, Shi, and Sec23. A feature i will be a good candidate for a classifier if the diagonal entry Dy ii is close to zero (since it must be unaffected by the perturbation); a corresponding feature j is a good candidate for the output if the entry Dy ij is high in magnitude, which occurs when the three binned populations respond differentially to RNAi. One candidate pair is shown in the Shibire heatmap (Fig. S4A in File S1): the classifier is cell size (G11) and the output is the intensity feature I9. To explore this classifier-output pair further, we carried out pairwise KS-tests of Shibire-RNAi wells against negative control wells, for both I9 and G11 (Fig. S4B in File S1). It is clear that I9 is strongly affected by the perturbation, but G11 is not. The same result was evident when we pooled data from group of similar wells (Fig. S4B in File S1, red box) and examined the distributions of G11 and I9 without (Fig. S4C in File S1, grey histogram) and with (Fig.  S4C in File S1, blue histogram) the RNAi perturbation. We next split the cells into five classifier percentile bins, and examined the mean value of the output in each bin without (Fig. S4C in File S1, grey curve) and with (Fig. S4C in File S1, blue curve) RNAi. We found that the response of the output to RNAi depended strongly on cell size. This implies that the variation in response is at least partly due to the background variation in cell size; however, most of the variation remains unexplained. This is not surprising, since we have only tested a handful of features for explanatory value.

Discussion
Cell biologists, when limited by the sensitivity of their assays, are sometimes forced to pool data from large numbers of cells. For example, transcriptional, proteomic, or metabolic analyses require large sample sizes in order to enhance signal-to-noise and thus ensure reproducibility. Behind the interpretation of such measurements is the tacit assumption that population averages accurately reflect the state of individual cells. However, as single-cell-resolved data have become more widely available, the ''myth of the average cell'' has been found to be a poor reflection of reality [1]. Cell-to-cell variability is ubiquitous, and population averages blur much of the complexity of cell-biological phenomena. Several elegant studies have begun to reveal details about the processes which give rise to this variability. Here we have taken a complementary approach, using cell-to-cell variability itself as the cell-biological probe. Rather than relying on traditional population-averaged measures to detect perturbations, this is precisely the information we ignore -we normalize all our phenotypic distributions to have zero mean and unit variance, and focus purely on shape. This operation neutralizes the scale and background artifacts inherent in the dye-labeling and imaging approaches central to many cell biological assays. But there is a danger that normalization might cause initially distinct distributions to collapse onto a single curve, thus throwing the baby out with the bathwater. Our key finding is that, for a broad range of phenotypes, normalized distributions do not collapse; instead, perturbations cause robust changes to their shapes. By detecting these changes, we can screen for candidate genes involved in a variety of cellular processes.
As a practical matter, the fact that we have been able to exploit cell-to-cell variability in this way without needing to understand its mechanistic basis implies that our methods are broadly applicable. However, our results also raise questions of a fundamental nature. What we observe as the individuality of cells in their response to perturbations might be intrinsically stochastic; but it might also reflect a prior heterogeneity of biochemical and biophysical cell states in the population, arising from autonomous factors such as transcriptional, metabolic, and cytoskeletal variations, or higherorder effects involving cell-to-cell communication and coordination. Disentangling these possibilities is an important problem in its own right: the more we discover about the origins of cell-to-cell variability, the better will we understand the secret lives of single cells.

Methods
A Z-score to quantify shape changes of phenotypic distributions We start with a list of values of some phenotype of interest measured in any well. For each data point, we subtract the mean and divide by the standard deviation, resulting in a distribution with zero mean and unit variance. We then compare the normalized distributions from two wells by using the Kolmogorov-Smirnov (KS) test statistic D: the maximum vertical distribution between the cumulative distribution functions [32], [33]. If the two sampes have N 1 and N 2data points, the effective sample size is given by 1/N e = 1/N 1 + 1/N 2 , and the final statistic is: The Z-score itself is calculated by quantifying the shape deviation from negative controls. For each negative control well, we We validated hits using an independent experimental assay based on the population-averaged mean value of each phenotype [14]. We carried out this measurement both for hits as well as for a number of non-hits (genes with below-threshold Z-scores). The y-axis shows the fraction of original hits validated (FP = 0.1); the x-axis shows the fraction of sub-threshold genes validated. Each dot gives the result for a single feature type; hits for features G11 and G15 were not included in the secondary measurement. Horizontal and vertical dotted lines show the FP rate. The distinction between hits and sub-threshold genes was based on shape-based scoring alone, but the former are detected at a much higher rate than the latter using the population-average-based assay. This demonstrates a strong correlation between the ability of a gene to influence the mean value of a phenotype and the shape of a phenotypic population distribution. (B) Relationships of hit subsets to cell density. We show the cell number per imaging field as a vertical histogram. The median (horizontal line), mean (box), and percentiles (5%, 25%, 75%, 95%) of the cell number distribution are overlaid. Histograms are separately shown for negative control wells, all test wells, and for fluid uptake, Transferrin uptake, and nuclear morphology hits. (C) Area-proportional Venn diagram of hits that influence fluid-phase uptake (F), Transferrinreceptor-mediated uptake (T), or nuclear and cell morphology (N). Of the 26 genes that influence cell size, 21 which do not influence other features have been omitted. Numbers give the sizes of non-overlapping subsets. The total number of hits is 1051 (shown) + 21 (not shown). (D) Functional enrichment. We annotated genes according to the Gene Ontology (GO) 'cellular component' classification system, using only the most specific term for each gene. We used the one-tailed Fisher's exact test to determine an enrichment p-value for each teach GO term among the 1072 hits, given its background occurrence among the 7216 RNAi probes. To correct p-values for multiple hypotheses, we used 1000 simulated datasets in which GO terms had been randomly permuted. The table shows the seven GO terms with corrected p-values , 0.1, along with the observed and expected number of genes among the seven non-overlapping gene subsets of the Venn diagram. doi:10.1371/journal.pone.0090540.g006 calculate its D * value against all 300 wells of a slide; this 300-length vector is normalized to have zero mean and unit variance. Arranging each such vector as a row of a 306300 matrix, the average of each column represents the shape deviation of the corresponding well. In practice we first removed the 10 worstscoring negatives so that we could use them to estimate falsepositive rates. This left a 206300 matrix shown as a heat map in Figure 3A, with lighter boxes indicating higher shape deviations. The Z-score of any well is the average value of the corresponding column; the higher the score, the greater the shape deviation.

Assessing statistical power from triplicate data
Each gene is tested in triplicate, and therefore assigned three Zscores. At a given Z-score threshold, genes can be classified into four bins: occurring zero, one, two, or three times above that threshold. Each bin contains some to-be-determined combination of negatives and true hits. We assume a fraction h of all tested genes to be true hits. At a fixed threshold, we assume hits to have a Gaussian distribution of false-negative (FN) rates with mean b 0 and variance s 2 , while negatives have a false-positive (FP) rate a. Under these assumptions, genes fall into triplicate bins as follows: bin hits non{hits Eq: 2 where the brackets S Á Á ÁT denote averages over the distribution of b values, easily computed under the Gaussian assumption. We next estimate the values of fa,b 0 ,s,hg which best describe the observed triplicate bin data. At each threshold, four normalized bins correspond to three degrees of freedom, while we are trying to estimate four unknown parameters. However, since h is a constant independent of threshold, the system is ''almost'' determined, and we are able to find a unique solution with a non-zero least-squares error. We can thus find fa,b 0 ,sg as a function of the threshold, as well as a fixed value of h, for each parameter (Fig. S3 in File S1).

Supporting Information
File S1 Contains the following tables and figures: Table S1. Feature descriptions. Figure S1. Slide layout and positional artifacts. Figure S2. Presence of reproducible hits. Figure S3. Performance inferred from triplicate data. Figure S4. Cell states and cell-to-cell variability. (PDF)