Head-to-head comparison of three experimental methods of quantifying competitive fitness in C. elegans

Organismal fitness is relevant in many contexts in biology. The most meaningful experimental measure of fitness is competitive fitness, in which two or more entities (e.g., genotypes) are allowed to compete directly. In theory, competitive fitness is simple to measure: an experimental population is initiated with the different types in known proportions and allowed to evolve under experimental conditions to a predefined endpoint. In practice, there are several obstacles to obtaining robust estimates of competitive fitness in multicellular organisms, the most pervasive of which is simply the time it takes to count many individuals of different types from many replicate populations. Methods by which counting can be automated in high throughput are desirable, but for automated methods to be useful, the bias and technical variance associated with the method must be (a) known, and (b) sufficiently small relative to other sources of bias and variance to make the effort worthwhile. The nematode Caenorhabditis elegans is an important model organism, and the fitness effects of genotype and environmental conditions are often of interest. We report a comparison of three experimental methods of quantifying competitive fitness, in which wild-type strains are competed against GFP-marked competitors under standard laboratory conditions. Population samples were split into three replicates and counted (1) "by eye" from a saved image, (2) from the same image using CellProfiler image analysis software, and (3) with a large particle flow cytometer (a "worm sorter"). From 720 replicate samples, neither the frequency of wild-type worms nor the among-sample variance differed significantly between the three methods. CellProfiler and the worm sorter provide at least a tenfold increase in sample handling speed with little (if any) bias or increase in variance.


PLOS ONE | https://doi.org/10.1371/journal.pone.0201507 October 19, 2018

Introduction
In the context of evolutionary biology, fitness is the contribution of an individual to the next generation. Researchers working with C. elegans and related nematodes are often interested in comparing the average fitness of different strains. The most straightforward way to quantify fitness is to count the total number of offspring produced by an individual over the course of its lifetime. The number of offspring produced over the lifetime of an individual i is its absolute fitness, usually denoted $W_i$. Relative fitness, $w_i$, is the absolute fitness of an individual scaled relative to that of a reference, usually either the most fit individual in the population, i.e., $w_i = W_i / W_{\max}$, or the population mean, i.e., $w_i = W_i / \bar{W}$. From the perspective of evolution, only relative fitness matters. All else being equal, greater absolute fitness means greater relative fitness. However, all else is often not equal, for several reasons. First, demography matters: offspring produced early in an individual's life contribute more to fitness than offspring produced late in life [1]. More importantly, interactions between individuals can influence fitness in ways that will not be apparent if the different strains are not allowed to interact directly. Moreover, differences in relative fitness may manifest only under competitive conditions [2], because small differences in performance that have no detectable effect on fecundity (e.g., sprint speed in gazelles) may translate into qualitative differences in fitness (e.g., which gazelle gets caught and eaten by the lion).
A standard laboratory assay of competitive fitness in many organisms, including Caenorhabditis, is to allow different strains of interest ("focal" strains; usually strain = genotype) to compete against a standard, marked competitor strain [3,4]. Experimental populations are initiated with a known number of focal and competitor individuals, and the population is allowed to grow until a predefined endpoint, at which time the individuals are either enumerated in totality or sampled. The proportion of individuals of the focal strain at the assay endpoint is designated p; the proportion of competitors is 1-p. The ratio p/(1-p), i.e., the odds that an individual in the population is of the focal type, is often called the "competitive index", CI [5], and provides an estimate of the fitness of the focal strain relative to the marked competitor. Relative competitive fitness of focal strains i and j against the same competitor strain can be assessed by the odds ratio $CI_i/CI_j = [p_i/(1-p_i)]/[p_j/(1-p_j)]$. There are several indexes of competitive fitness, all of which are fundamentally based on the proportion p [6].
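The quantities above can be illustrated with a minimal Python sketch. It assumes natural logarithms for log(CI) (the base is not stated here), and the function names and worm counts are hypothetical, chosen only for illustration.

```python
import math


def competitive_index(n_focal: int, n_competitor: int) -> dict:
    """Return p, CI = p/(1-p), and log(CI) for one competition plate."""
    if n_focal <= 0 or n_competitor <= 0:
        # CI is 0 or infinite, and log(CI) undefined, if either type is absent
        raise ValueError("both focal and competitor worms must be present")
    p = n_focal / (n_focal + n_competitor)
    ci = p / (1 - p)  # the odds of being focal; equals n_focal / n_competitor
    return {"p": p, "CI": ci, "logCI": math.log(ci)}  # natural log (assumed)


def odds_ratio(ci_i: float, ci_j: float) -> float:
    """Relative competitive fitness of focal strain i vs. focal strain j,
    both competed against the same marked competitor."""
    return ci_i / ci_j


# Hypothetical plate: 99 non-fluorescent (focal) and 77 GFP (competitor) worms
plate = competitive_index(99, 77)
print(plate)
```

Because log(CI_i/CI_j) = log(CI_i) - log(CI_j), comparisons of focal strains against a common competitor can equivalently be made on the log(CI) scale.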
Experimental measurement of competitive fitness in many organisms, including Caenorhabditis, has been greatly facilitated by the availability of heritable fluorescent markers, e.g., GFP, which can be scored in much higher throughput than traditional phenotypic markers such as dumpy or unc mutants. The simplest competitive fitness assay is to pick a known number of worms of the focal and fluorescently-marked competitor strains onto a seeded plate, incubate the plate until the food is consumed, then count the worms under transmitted light and fluorescent light. Focal and competitor worms alike will be visible under transmitted light, whereas only the competitor worms will be visible under fluorescent light (Fig 1). Images of the plate can be captured under transmitted and fluorescent light and the worms counted, either by eye or by means of image analysis software.
Two methods exist by which the throughput of competitive fitness assays can be significantly increased. First, worms can be washed into the wells of a microtiter plate and a motorized stage used to automate image capture, followed by automated counting by image analysis. Second, a large-particle flow cytometer (a "worm sorter") can be employed. Both methods involve significant initial investment, especially the worm sorter. However, if the relevant hardware is available, it is useful to know the time/accuracy trade-offs involved with the different methods.
Here we provide a head-to-head comparison of three experimental methods of quantifying competitive fitness in C. elegans. Method 1 is our standard "by eye" competitive fitness assay, in which nothing is automated [7]. Method 2 employs the image-analysis software CellProfiler [8,9] to automate worm counting. Method 3 uses a Union Biometrica BioSorter large particle flow cytometer to count worms. For each method, worms are washed from agar competition plates into the wells of a 96-well plate to facilitate counts. Our primary interests are: (1) quantifying the variability of the data, (2) identifying potential bias, and (3) quantifying the time per datum collected.

Variability
Measures of variance are often correlated with the mean, which can confound inferences regarding differences in variance when both quantities differ between groups [10,11]. To begin, we assessed three estimators of competitive fitness: the frequency of the focal type, p, the competitive index CI = p/(1-p), and log(CI). For each estimator of competitive fitness we calculated two measures of variation: the within-block standard deviation (SD) of a given focal strain/competitor strain/method combination, and the mean within-block Median-Levene statistic [12],

$$Md = \frac{1}{n}\sum_{i=1}^{n} \left| x_{ijkl} - \tilde{x}_{jkl} \right|,$$

where $x_{ijkl}$ is the i-th observation in the block, $\tilde{x}_{jkl}$ is the block median of the estimator of competitive fitness of focal strain j against competitor strain k using method l, and n is the number of observations in a block of a given focal strain/competitor strain/method combination.

[Figure caption fragment: "... the CellProfiler-generated worm outlines and GFP objects, respectively, for the area bounded by the red rectangle in (C). Occasional CellProfiler worm-untangling errors are shown in (D); "m1" shows misaligned worm outlines for overlapping worms, and "m2" shows two worms mistaken as one."]
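As a sketch of the two variation measures, assuming the SD uses the usual n - 1 denominator (not stated in the text) and using hypothetical log(CI) values, the per-block statistics could be computed as follows:

```python
import statistics


def within_block_sd(values):
    """Sample SD (n - 1 denominator, assumed) of one block's fitness estimates."""
    return statistics.stdev(values)


def median_levene(values):
    """Md = (1/n) * sum(|x_i - median|): mean absolute deviation from the block median."""
    med = statistics.median(values)
    return sum(abs(x - med) for x in values) / len(values)


# Hypothetical log(CI) estimates for one strain/competitor/method combination in two blocks
blocks = [[0.10, 0.25, -0.05, 0.30], [0.00, 0.20, 0.15, -0.10]]
for b, values in enumerate(blocks, start=1):
    print(f"block {b}: SD = {within_block_sd(values):.3f}, Md = {median_levene(values):.3f}")
```

Using the median rather than the mean as the center makes Md less sensitive to a single extreme replicate, which is the usual motivation for the median-based Levene variant.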
Plots of the mean-variance relationships are shown in Fig 2. The correlations are weakest for log(CI), slightly greater for p, and nearly perfect for CI. For each of the three measures of competitive fitness, the standard deviation (SD) is less correlated with the mean than is the Median-Levene statistic (Md); the correlation is smallest for the SD of log(CI). Given these findings, our assessment of the three assay methods is based on $SD_{\log(CI)}$.
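One way to reproduce the kind of comparison summarized in Fig 2 is sketched below: transform the same per-replicate p values into p, CI, and log(CI), then correlate the group means with the group SDs. The grouping and the p values are hypothetical, and a hand-written Pearson correlation keeps the sketch free of version-specific library calls.

```python
import math
import statistics


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mean_sd_correlation(groups):
    """Correlation, across groups, between each group's mean and its SD."""
    means = [statistics.fmean(g) for g in groups]
    sds = [statistics.stdev(g) for g in groups]
    return pearson(means, sds)


def compare_estimators(groups_p):
    """Mean-SD correlation for p, CI = p/(1-p), and log(CI) from the same p values."""
    transforms = {
        "p": lambda p: p,
        "CI": lambda p: p / (1 - p),
        "logCI": lambda p: math.log(p / (1 - p)),
    }
    return {
        name: mean_sd_correlation([[f(p) for p in g] for g in groups_p])
        for name, f in transforms.items()
    }


# Hypothetical per-replicate p values for three groups
print(compare_estimators([[0.30, 0.35, 0.40], [0.50, 0.58, 0.66], [0.70, 0.80, 0.88]]))
```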
Box-plots of p and $SD_{\log(CI)}$ for the three methods, averaged over focal strains and competitor strains, are shown in Fig 3. The data are summarized in S1 Table and raw data are given in S2 Table. Averaged over focal strains and competitors, p does not differ significantly between the three methods ($F_{2,69.1} = 1.71$, P > 0.18), although it is slightly larger when estimated by CellProfiler. $SD_{\log(CI)}$ does not differ significantly between the three methods ($F_{2,49.5} = 2.23$, P > 0.12), but CellProfiler is slightly less variable than the other two methods. However, because of the weak negative correlation between the mean and variance of log(CI), it is uncertain whether CellProfiler is inherently slightly less variable than the other methods, or whether the slightly lower variability would disappear if the means were equal.
The repeatability of the "by eye" and CellProfiler methods can be assessed by counting the same image(s) twice (S3 Table). To quantify the repeatability of the by eye method, the same counter (SS) re-counted a subset of 59 images approximately one month after the original counts. The correlation between the two counts was >99.9% for both the total count and the GFP count. The mean absolute difference between the two counts, expressed as a fraction of the average of the two counts, was 0.73% for the total count and 0.46% for the GFP count. The correlation between the two estimates of the proportion of focal worms, p, was 99.8%. We re-counted all 720 images counted by CellProfiler using identical software parameters on a different computer running the same version of CellProfiler (v2.2.0); the counts were identical in every case.
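The two repeatability metrics can be sketched as below. Whether the relative difference is averaged per image (as here) or computed from summed differences is our assumption, and the counts are hypothetical.

```python
import math
import statistics


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def repeatability(first, second):
    """Correlation between repeated counts, and the mean absolute difference
    expressed as a fraction of the average of the two counts (per image).
    Assumes no image has a count of zero in both passes."""
    rel_diff = [abs(a - b) / ((a + b) / 2) for a, b in zip(first, second)]
    return pearson(first, second), statistics.fmean(rel_diff)


# Hypothetical total counts for four images counted twice by the same person
r, rel = repeatability([176, 140, 210, 95], [175, 142, 210, 96])
print(f"r = {r:.4f}, mean relative difference = {rel:.2%}")
```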
Unlike images, worm sorter samples are ephemeral, in that a specific data point cannot be permanently associated with a given individual. To assess the repeatability of the worm sorter counts, we split 336 samples from the same assay well into two wells on the counting plate, one of which contained approximately 1/4 of the sample by volume ($n_{\text{worms}} = 143$) and the other the remaining ~3/4 ($n_{\text{worms}} = 551$) (S3 Table). Of the 336 samples, 280 met the (arbitrary) criteria of $n_{\text{worms}} > 10$ and $0.01 < p < 0.99$. The mean absolute difference in the proportion of focal worms, p, is 5.4%, and the correlation between p in the 1/4 and 3/4 split samples is 96.5%. The variation between the two sorter samples includes the sampling variance of p (= technical variance), whereas the variation between the two counts by eye and by CellProfiler includes only counting error. The standard error of a binomial proportion is $\sqrt{pq/n}$, with $q = 1-p$, so the difference in standard errors between the two samples equals $\sqrt{pq/143} - \sqrt{pq/551}$. The relative contribution of binomial sampling and biological variation among competition plates to the total variance is an empirical question. For the 1/4 volume sample, p = 0.553 with standard error 0.0159; for the 3/4 volume sample, p = 0.524 with standard error 0.0154. For the 1/4 volume sample, mean log(CI) = 0.153 with standard error 0.0947; for the 3/4 volume sample, mean log(CI) = -0.0058 with standard error 0.0946. The standard errors of the two split samples are nearly identical even though the 3/4 volume sample contains almost four times as many worms, so for the sample sizes employed in this study (> 100 worms per plate, on average), binomial within-sample variance is small relative to the biological variance among plates of worms.
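The binomial argument can be made concrete with a short calculation using the values reported above. It assumes (our reading, not stated explicitly) that the reported standard errors are standard errors of the mean p across the 280 retained samples, so that the per-sample variance is SE² × 280, and it approximates each sample's worm count by the reported 143 or 551.

```python
import math


def binomial_se(p, n):
    """Standard error of a binomial proportion: sqrt(p * q / n), with q = 1 - p."""
    return math.sqrt(p * (1 - p) / n)


def binomial_share(p, n_worms, se_among, n_samples):
    """Approximate fraction of the per-sample variance in p due to binomial sampling.

    Assumes se_among is the standard error of the mean p over n_samples samples,
    so the total per-sample variance is se_among**2 * n_samples; the mean worm
    count stands in for the (unreported) per-sample counts.
    """
    total_var = se_among**2 * n_samples
    return (p * (1 - p) / n_worms) / total_var


# Values reported in the text for the split samples
for label, p, n, se in [("1/4", 0.553, 143, 0.0159), ("3/4", 0.524, 551, 0.0154)]:
    share = binomial_share(p, n, se, n_samples=280)
    print(f"{label} volume: binomial SE = {binomial_se(p, n):.4f}, binomial share ~ {share:.1%}")
```

Under these assumptions the binomial term is a few percent of the per-sample variance at most, in line with the conclusion drawn in the text.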

Bias
If any of the methods are biased, the bias must be small enough to have escaped detection, given the sample sizes. However, we suspect that the slightly greater value of p (and thus of CI and log(CI)) estimated by CellProfiler relative to the other two methods (S1 Table) may in fact reflect a slight bias toward inflating the frequency of non-fluorescent worms, due both to false positives (non-worms called as worms under brightfield) and to false negatives (failure to identify fluorescent worms as fluorescent). Successive reduction in edge-cropping of the images analyzed by CellProfiler (see Methods) leads to increasing values of p, relative to the value of p estimated by eye from the same (cropped) image (S1 Fig); presumably the reduction in p with increased cropping is due to the reduction in false positives.
Detecting a fluorescent worm boils down to a two-step process in CellProfiler: first, the worm body must be identified in the brightfield image, and then the fluorescent signal must be identified within that worm body in the associated fluorescent image. Occasionally, CellProfiler fails to identify the full worm (most often truncating the head or tail region), and if the fluorescent signal happens to fall in the missed region, the worm will be mischaracterized as nonfluorescent. We cannot quantify the magnitude of that effect beyond reiterating that it must be small relative to the other sources of variation in the data.

Time per datum
Each of the three methods has unique investments of time associated with it; we return to those in the Discussion. For the head-to-head comparison, each sample was first divided into aliquots (see Methods), so the clock starts at the time the samples were divided. The "by eye" and CellProfiler methods both involve the analysis of images captured from samples in a 96-well plate. It took approximately 20 minutes to capture the 192 images (one bright-field and one GFP image per well) from one 96-well plate with the stage movement rate set to "slow" (faster image capture is possible), or about 12.5 seconds per sample.
An experienced counter (SS) took ~5-7 minutes to count both images for one sample by eye (bright field and GFP; $n_{BF} \approx 176$, $n_{GFP} \approx 77$). The rate at which images can be processed by CellProfiler will depend on the computer platform as well as on the images. Working with a far-from-state-of-the-art Dell Optiplex 9020 PC running Windows 7, CellProfiler averaged 35.6 ± 16 minutes to process a 96-well plate, or about 22 seconds per sample.
As we applied it, the worm sorter took about 50 minutes to collect data from a 96-well plate, or about 30 seconds per sample. Once data are acquired by the sorter, counts can be obtained computationally essentially instantaneously.
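The per-sample times above can be combined in a short Python calculation. It assumes that both image-based methods incur the ~12.5 s imaging cost and the sorter does not, and it ignores pipeline development and sample preparation; it also counts CellProfiler's processing as elapsed time even though that processing presumably needs no operator attention.

```python
# Per-sample handling times (seconds) derived from the figures reported in the text
IMAGING_S = 20 * 60 / 96          # ~12.5 s per well to capture both images
BY_EYE_S = (5 * 60, 7 * 60)       # ~5-7 min to count both images by eye
CELLPROFILER_S = 35.6 * 60 / 96   # ~22 s per well of CellProfiler processing
SORTER_S = 50 * 60 / 96           # ~31 s per well on the worm sorter


def per_sample_seconds():
    """Seconds per sample, starting the clock when aliquots are divided."""
    return {
        "by eye (fast)": IMAGING_S + BY_EYE_S[0],
        "by eye (slow)": IMAGING_S + BY_EYE_S[1],
        "CellProfiler": IMAGING_S + CELLPROFILER_S,
        "sorter": SORTER_S,
    }


times = per_sample_seconds()
for method, secs in times.items():
    print(f"{method:>14}: {secs:6.1f} s/sample, {96 * secs / 3600:5.2f} h per 96-well plate")

speedup = times["by eye (fast)"] / times["CellProfiler"]
print(f"CellProfiler vs. fastest by-eye counting: {speedup:.1f}x")
```

With these inputs, counting by eye is roughly 9 to 14 times slower per sample than either automated method.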

Discussion
The results of this study are good news for worm biologists wanting reliable high-throughput estimates of competitive fitness using fluorescently-labeled strains. Both automated image analysis with CellProfiler and the BioSorter provide estimates of competitive fitness that are nearly unbiased and no more inherently variable than the human eye. For samples of a few hundred worms, CellProfiler and the BioSorter can count worms approximately an order of magnitude faster than even a well-trained, well-motivated human, and the time savings will increase as sample size increases.
A caveat is that both CellProfiler and the BioSorter require a non-trivial initial investment of time to optimize the data-collection protocol, whereas a sighted human can begin collecting data right away. In our experience, the CellProfiler image-analysis pipelines took nearly 40 person-hours to develop and implement. However, less time should be required to adapt our existing pipelines (S1 File) to a different imaging system. Importantly, the time required to develop the CellProfiler pipelines could be reduced by standardizing imaging conditions so that block-specific pipelines and worm models are not required. For example, in our experiments variable magnification and lamp illumination settings across blocks required us to develop unique pipelines for individual blocks, each of which required a few hours to prepare. We also chose to develop block-specific worm models because of variation in worm morphology across blocks. We believe that much of this variability could be avoided by standardizing levamisole treatments prior to imaging (see Methods). Levamisole is a potent cholinergic agonist that causes hypercontracted paralysis followed by relaxation and death, and the size and shape of worms change significantly during early exposure. We recommend incubating worms for at least 30 min in 5 mM levamisole prior to imaging, so that worms are consistently imaged in the relaxation phase following hypercontraction.

Competitive fitness assay
Five replicate blocks of the competitive fitness assay were conducted in the spring of 2016. Blocks consisted of 24 replicate competitions for all 18 combinations of focal and competitor strains and assay methods, except in one block for which "by eye" data were not collected. Blocks were initiated by bleach-synchronization of strains that had been maintained as mixed-stage populations on standard 100 mm NGM plates. Each competition began by transferring a single L4-stage focal hermaphrodite and a single L4-stage, GFP-marked competitor hermaphrodite to the same well of a 24-well plate. Each 24-well plate contained NGMA agar supplemented with nystatin (20 μg/ml) and streptomycin (50 μg/ml) and seeded with 10 μl of OP50-1 E. coli. Replicates for each of the six competitions were distributed equally among six 24-well plates, i.e., four replicates per 24-well plate. Competition between the focal and competitor strains persisted for 168 h at 20˚C, at which point the food source in nearly all wells had been completely consumed. The resulting nematode populations were then washed from the competition plates by adding 1.5 ml of M9 buffer to each well with a repeat pipettor, aspirating up and down three times with a disposable transfer pipet, then transferring the nematode suspension to a 96-well deep-well plate (2 ml capacity per well). Once filled, the 96-well plate was centrifuged for 1 min at 1,000 × g, and the supernatant was aspirated with an 8-channel strip aspirator. To retain the nematode pellet, the strip aspirator was set to leave 100 μl in each well. The wells in the 96-well plate were then washed once more by adding 1.4 ml of M9 buffer, centrifuging, and aspirating, followed by the addition of 1.4 ml of M9. A 12-channel pipet was then used to gently mix the nematode suspension by pipetting up and down five times. Once mixed, a 110 μl sample of the resulting worm suspension was transferred into the appropriate rows of each of two 96-well, clear-bottom plates.
One of the 96-well plates was used to quantify competitive fitness using the image-based methods (manual counting and CellProfiler counting) while the other was used for sorter-based counting.
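The design of one block can be sketched as a plate layout. The strain names come from the Data analysis section below; the assignment of competitions to specific wells is not described here, so the randomized well order and the well labels are a hypothetical illustration.

```python
import itertools
import random
from collections import Counter

FOCAL = ["N2", "PB306", "CB4856"]
COMPETITOR = ["ST2", "VP604"]
REPLICATES = 24   # replicate competitions per focal/competitor pair per block
N_PLATES = 6      # 24-well plates per block
WELLS = [f"{row}{col}" for row in "ABCD" for col in range(1, 7)]  # 4 x 6 = 24 wells


def block_layout(seed=None):
    """Place 4 replicates of each of the 6 competitions on each of 6 plates.

    The within-plate well order is randomized purely for illustration."""
    rng = random.Random(seed)
    competitions = list(itertools.product(FOCAL, COMPETITOR))  # 6 focal/competitor pairs
    per_plate = REPLICATES // N_PLATES                          # 4 replicates per plate
    layout = {}
    for plate in range(1, N_PLATES + 1):
        contents = [pair for pair in competitions for _ in range(per_plate)]  # 24 wells
        rng.shuffle(contents)
        layout[plate] = dict(zip(WELLS, contents))
    return layout


layout = block_layout(seed=1)
print(layout[1]["A1"])
print(Counter(pair for plate in layout.values() for pair in plate.values()))
```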

Imaging
Quantification of competitive fitness by image-based methods requires both a bright-field and a paired GFP fluorescence image to distinguish focal animals (non-fluorescent) from competitors (fluorescent). To ensure that nematodes do not move between the bright-field and GFP fluorescence images, levamisole was added to the wells at a final concentration of 5 mM. Bright-field and GFP fluorescence images (470±20 nm excitation, 525±25 nm emission) were captured at 20× or 30× magnification for each well using an automated epifluorescence microscope (IX-70, Olympus, Pittsburgh, PA, USA) fitted with a CCD camera (Retiga-2000R, QImaging, Surrey, BC, Canada), XYZ stage and focus motors (Prior Scientific, Cambridge, UK), and controlled by Image Pro-plus software (Media Cybernetics, Rockville, MD, USA). The bright-field and GFP fluorescence image pairs for each well were saved as a single stacked image before being used for the image-based competitive fitness quantification methods.

Quantification method
illumination that remains. Subsequent modules then identify putative nematode objects against the background via thresholding. Next, individual nematode outlines are resolved from clusters of nematodes or debris using the block-specific worm model. Using the paired GFP fluorescence image, fluorescent objects within each nematode outline are identified by thresholding and counted. The pipelines output a .csv file containing the number of nematode objects in a well, along with the number of child GFP objects counted within each nematode object. A nematode is assigned the focal genotype if it contains no GFP objects, or the competitor genotype if it contains one or more GFP objects (Fig 1D). Finally, the frequency of the focal type, p, the competitive index CI = p/(1-p), and log(CI) are calculated for each well.
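The classification rule can be sketched as post-processing of the per-object .csv output. CellProfiler exports an ImageNumber column and, via its RelateObjects module, a per-parent count of child objects, but the exact column names depend on how the pipeline (S1 File) names its objects, so the Children_GFP_Count default below is a hypothetical placeholder, as is treating ImageNumber as the well identifier.

```python
import csv
import io
import math
from collections import defaultdict


def well_summary(csv_text, well_col="ImageNumber", gfp_col="Children_GFP_Count"):
    """Classify each worm object as focal (0 GFP children) or competitor (>= 1),
    then compute p, CI, and log(CI) per well. Column names are hypothetical."""
    counts = defaultdict(lambda: [0, 0])  # well -> [focal, competitor]
    for row in csv.DictReader(io.StringIO(csv_text)):
        is_competitor = int(float(row[gfp_col])) > 0
        counts[row[well_col]][1 if is_competitor else 0] += 1
    out = {}
    for well, (focal, comp) in counts.items():
        p = focal / (focal + comp)
        ci = focal / comp if comp else math.inf
        # log(CI) is undefined when either type is absent from the well
        log_ci = math.log(ci) if 0 < ci < math.inf else float("nan")
        out[well] = {"focal": focal, "competitor": comp, "p": p, "CI": ci, "logCI": log_ci}
    return out


demo = "ImageNumber,Children_GFP_Count\n1,0\n1,2\n1,0\n2,1\n"
print(well_summary(demo))
```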
BioSorter. Our BioSorter competitive fitness analysis pipeline is presented in detail in Supplementary Appendix A1 of [7]; we reprise the basics here. After sample preparation, we aliquot 110 μl of the sample into a well of a 96-well plate, from which the LP Sampler aspirates it to the sorter. The LP Sampler script is optimized for 100 μl input, but there is a risk of drawing bubbles into the fluid line if the sample volume falls short. We include the extra 10 μl as a safety margin and use four LP Sampler wash steps to ensure that we pick up most of the sample. In a typical wash step, the LP Sampler aspirates and dispenses water, cumulatively picking up most of the sample over the course of the four wash steps. Any sample that is left behind is unlikely to introduce bias in a competitive assay.
The competitive fitness assay was extended until plates were starved of E. coli (or nearly so), which greatly reduces the frequency of false positives in the sorter counts: E. coli tends to clump together and register as many small events (particles), decreasing the signal-to-noise ratio for smaller worms.
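The size-gating idea can be sketched as a filter on sorter events. We assume each event carries a time-of-flight (size) value and a green-fluorescence value; the field names and both thresholds are hypothetical and would need calibration against known samples.

```python
from dataclasses import dataclass


@dataclass
class Event:
    tof: float    # time of flight, used here as a proxy for object length (hypothetical units)
    green: float  # green (GFP) fluorescence signal (hypothetical units)


def count_worms(events, min_tof=100.0, gfp_threshold=50.0):
    """Drop small events (e.g., bacterial clumps), then split the remaining
    worms into (focal, competitor) by green fluorescence."""
    worms = [e for e in events if e.tof >= min_tof]
    competitors = sum(1 for e in worms if e.green >= gfp_threshold)
    return len(worms) - competitors, competitors


events = [Event(20, 0), Event(150, 10), Event(300, 400), Event(200, 80)]
print(count_worms(events))
```

Raising the size cutoff removes more bacterial clumps but risks discarding the smallest larvae, which is presumably why running the assay until the food is exhausted is preferable to relying on gating alone.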
Data analysis. The dependent variable for purposes of quantifying variation is the standard deviation of log(CI), $SD_{\log(CI)}$. CI was measured from 24 replicate plates for each combination of focal strain (N2, PB306, CB4856), competitor strain (ST2, VP604), and quantification method (by eye, CellProfiler, Sorter) in five assay blocks. Thus, there are five estimates of $SD_{\log(CI)}$ for each unique combination of focal strain/competitor strain/method (four estimates for "by eye"). Method is the independent variable of interest; for the purposes of this study we are not interested in the variation among focal strains or competitor strains. However, because the sample sizes are small, we treat focal strain, competitor strain, and their interactions as fixed effects. Data were analyzed by general linear model (GLM) as implemented in the MIXED procedure of SAS v. 9.4. Variance components were estimated by restricted maximum likelihood (REML). The full linear model is

$$y_{ijkl} = \mu + f_i + c_j + m_k + t_{i \times j} + u_{i \times k} + v_{j \times k} + w_{i \times j \times k} + \varepsilon_{l|ijk},$$

where $y_{ijkl}$ is the estimate of $SD_{\log(CI)}$ in block l, $\mu$ is the overall mean, $f_i$ is the effect of focal strain i, $c_j$ is the effect of competitor strain j, $m_k$ is the effect of method k, $t_{i \times j}$ is the effect of the interaction between focal strain i and competitor strain j, $u_{i \times k}$ is the effect of the interaction between focal strain i and method k, $v_{j \times k}$ is the effect of the interaction between competitor strain j and method k, $w_{i \times j \times k}$ is the effect of the three-way interaction, and $\varepsilon_{l|ijk}$ is the residual (among-block) error. We initially estimated the residual variance separately for each focal/competitor/method combination, then pooled the residual variance over different combinations of groups, using the minimum corrected Akaike's Information Criterion (AICc) as the criterion for the best model. Similarly, competitor strain, focal strain, and their interactions were removed and the AICc calculated.
The smallest AICc was given by the model with only method included as a fixed effect and the residual variance estimated separately for each method, pooling residual variance over focal and competitor strains within a method. Significance of fixed effects was assessed by F-test on type III sums of squares.
We repeated the above analysis for the fraction of focal worms, p, with block included as an additional random effect and replicate (nested within block) as the unit of observation. The block for which we did not collect "by eye" data was omitted from the analysis.
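The model-selection step can be sketched with the standard AICc formula. With three focal strains, two competitors, three methods, and five blocks (four for "by eye"), there are 84 estimates of $SD_{\log(CI)}$. The -2 log-likelihoods and parameter counts below are invented for illustration, not the study's results, and SAS's exact choice of n and k under REML may differ from this simple count. Note also that REML likelihoods are strictly comparable only among models with the same fixed effects.

```python
def aicc(neg2_loglik: float, k: int, n: int) -> float:
    """AICc = -2 log L + 2k + 2k(k + 1) / (n - k - 1)."""
    if n - k - 1 <= 0:
        raise ValueError("AICc is undefined when n <= k + 1")
    return neg2_loglik + 2 * k + 2 * k * (k + 1) / (n - k - 1)


def select_model(candidates: dict, n: int):
    """Return the candidate with the smallest AICc, plus all scores.

    candidates maps a model label to (-2 log L, number of parameters k)."""
    scores = {name: aicc(m2ll, k, n) for name, (m2ll, k) in candidates.items()}
    return min(scores, key=scores.get), scores


N_ESTIMATES = 6 * 5 * 2 + 6 * 4  # 84 SD estimates ("by eye" missing in one block)

# Hypothetical fits, for illustration only
hypothetical = {
    "full model, one residual variance": (-150.0, 19),
    "method only, residual variance per method": (-160.0, 6),
    "method only, single residual variance": (-140.0, 4),
}
best, scores = select_model(hypothetical, N_ESTIMATES)
print(best, scores)
```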