Unbiased Estimation of Mutation Rates under Fluctuating Final Counts

Estimation methods for mutation rates (or probabilities) in Luria-Delbrück fluctuation analysis usually assume that the final number of cells remains constant from one culture to another. We show that this leads to systematically underestimate the mutation rate. Two levels of information on final numbers are considered: either the coefficient of variation has been independently estimated, or the final number of cells in each culture is known. In both cases, unbiased estimation methods are proposed. Their statistical properties are assessed both theoretically and through Monte-Carlo simulation. As an application, the data from two well known fluctuation analysis studies on Mycobacterium tuberculosis are reexamined.


Introduction
Since the pioneering work of Luria and Delbrück [1], fluctuation analysis has been the object of many studies: see [2][3][4][5][6][7] for reviews. In the past twenty years, the stress has been put on the estimation of the expected number of mutations, for which reliable methods are now available [8][9][10][11][12][13][14][15]. However, as Stewart puts it (p. 1140 of [4]): The parameter L [expected number of mutations] is not, in itself, of biological interest because the experimenter can vary it at will simply by changing the size of the culture vessel or the richness of the medium. What he really wants to know is not L, but the mutation rate.
Deriving a mutation rate (i.e. the probability for a mutation to occur upon any given cell division) from an expected number of mutations seems easy: the former is the quotient of the latter by the final number of cells at the end of the experiment. The problem is the definition given to ''final number of cells''. The simplest view is expressed by Kendal and Frost (p. 1062 of [2]).
N is obtained by averaging the final number of cells from each parallel culture.
Other authors have developed a more cautious approach, like Foster (p. 198 of [5]).
The validity of the mutation rate calculation requires that N t be the same in each culture. Usually, but not always, this can be accomplished by growing cells to saturation. If achieving an uniform N t is a problem, the cell number in each culture can be monitored before mutant selection by measuring the optical density or by counting cells microscopically (e.g. using a Petroff-Hausser chamber). Because there is currently no valid method to correct for different N t 's, deviant cultures must be eliminated from the analysis.
Even under the most careful monitoring, final numbers of cells vary [16]. Yet, final number data are rarely reported in fluctuation analysis experiments, although exceptions exist such as [17,18]. Theoretical models considering variations in the population size have previously been proposed by Angerer [19] and Komarova et al. [20]. Yet, to the best of our knowledge, Foster's assertion that ''there is currently no valid method to correct for different N t 's'' remains true to this date. This paper proposes several such methods.
As we shall see, dividing an estimated expected number of mutations by a mean final number of cells, induces a negative bias on mutation rates. Not only the mutation rate, but also the variance of the estimator are underestimated, thus potentially inducing wrong conclusions in statistical testing. Two levels of knowledge on the fluctuations of final numbers are considered. Either the mean and variance of final numbers have been estimated separately, or the final number is known for each culture. In the first case, ifp p denotes the estimate of the mutation rate assuming constant final numbers, the unbiased estimatep p ub is obtained by: where m and C denote the mean and coefficient of variation of the final number of cells. When final numbers are known for all cultures, better results are obtained by the Maximum Likelihood method. The qualities of the proposed estimators have been assessed on a simulation study. The impact on real experiments is discussed, using Mycobacterium tuberculosis data published by David [17], and Werngren & Hoffner [21]. Our R [22] implementation of the simulation function and the different estimators is provided in File S1.

Simulation experiments
Six different estimates of p were computed on 1000 simulated samples of 50 couples mutant counts -final numbers. Our choice for the sample size was motivated by two opposite reasons. On the one hand, sample sizes in practice rarely exceed a few tens. On the other hand, confidence interval calculations are all based on asymptotic normality, which requires the sample size to be large enough. A sample size of 50 seemed a reasonable compromise. Boxplots for the estimates are represented on Figure 1. The first boxplot corresponds to the 1000 estimates by the p 0 -method, assuming the mean final number is known; it is negatively biased as predicted by the theory. The next boxplot represents estimates from the Maximum Likelihood method with known mean final number; it is coherent with the previous one, and similarly biased as expected. On the next two boxplots, the estimates have been multiplied by the unbiasing factor (1). The unbiasing is correct for both methods. For the last two boxplots, each estimate has been computed using the 50 couples with no prior knowledge on the mean and coefficient of variation of final numbers. The best results are obtained by the maximum likelihood method (last boxplot).
The p 0 -method (label MLP0) performs nearly as well. Since the last two boxplots do not use any prior information, one could have expected their dispersions to be higher than those of the first four. This was not the case, which proves that prior knowledge on the distribution of N is not a real improvement over measuring final numbers for each culture.
Each estimation method returns a (theoretical) standard deviation, from which confidence intervals can be computed. It is is based on a large sample approximation. The sample size in current fluctuation analysis experiments usually ranges from 20 to 50. Since the estimated standard deviation is of high importance for statistical decision, it was necessary to check whether theoretical standard deviations matched observations. On the same samples, the empirical standard deviation of the 1000 estimates was computed, and compared to the mean value of theoretical standard deviations. For each of the estimators, the theoretical standard deviation was smaller than the observed one; yet, the relative error was smaller than 5%, which validates the theoretical value. For instance, the empirical standard deviation for the maximum likelihood estimate (rightmost boxplot of Figure 1) was 1:85|10 {10 , whereas the theoretical value was 1:80|10 {10 .

Published data sets
In the two references studied here [17,21], the authors used Luria & Delbrück's method of the mean. Luria & Delbrück [1] themselves had remarked that the method is very sensitive to the size of jackpots and induces important biases; see also Lea & Coulson [23], and Pope et al. [6] for a more recent reference. Table 1 reports mutation rate estimates for the data in Table 1 of David [17]. Since detailed data were not avaible, only the p 0method could be used. The second column contains the author's estimates. The next two columns contain the unbiased p 0 -estimate and its 95% confidence interval. Observe that, even though confidence intervals are large due to the small sample sizes, the author's estimates are outside the confidence interval in 5 cases out  Table 1), the method of the mean may overestimate p by several orders of magnitude. The main conclusion of [17] was a significant difference in mutation rates, depending on the drug (Isoniazid, Streptomycin, Rifampin, or Ethambutol). Indeed that difference is confirmed by an ANOVA of the estimated mutation rates (P~0:012). Table 2 of David [17] contains two paired samples of mutant counts and final numbers. All possible estimates were computed. Values ranged between 1:81|10 {10 and 2:14|10 {10 . The two values that we consider most reliable, obtained by the maximum likelihood method, were very similar: 1:98|10 {10 and 1:97|10 {10 . The estimate reported by the author is 7:53|10 {10 . Again, the difference is due to the bias induced by the author's estimation method. Table 2 reports mutation rate estimates by the ML method, from data in Table 1 of Werngren & Hoffner [21]. The second column contains the authors' estimates, calculated by Luria & Delbrück method of the mean. The next two columns contain the unbiased ML estimate and its 95% confidence interval. Except for two strains, the authors' estimate is outside the confidence interval. Here, the method of the mean used by the authors has underestimated the mutation rate, because of the very small number of jackpots in the data. The main conclusion of [21] was that no significant difference had been observed between non-Beijing strains (first seven lines) and Beijing strains (last six lines). Actually, the average mutation rate over the first seven lines is 4:37|10 { 8 , over the last six lines it is 2:69|10 { 8 . The difference is significant at threshold 5% (Welsh Two Sample ttest, P~0:047).

Discussion
In any estimation problem, three levels must be distinguished: the reality which is and will remain unknown, the mathematical model which involves more or less realistic hypotheses, and the estimation method. Minimal requirements for an estimator are consistence (outputs should be close to the unknown value of the parameter), and a computable asymptotic variance (to allow statistical inference). Since there is no way to validate all mathematical hypotheses that define the model, another quality is desirable: robustness. Indeed, designing an estimator for a given model and applying it to a different one usually induces a bias: the smaller the bias, the more robust the estimator. For mutation rate estimates, several sources of bias have been identified, such as cell deaths [19,[24][25][26], unknown division time distribution [15], etc. Since there is no way to double check estimates on real data, the usual approach for evaluating an estimation method consists in repeating in silico experiments, i.e. simulate mathematical models for a given value of the parameter, estimate that value repeatedly, and study the distribution of the obtained estimates. A general simulation algorithm described in [15] permits extensive Monte-Carlo experiments.
Usually, only the expected number of mutations is considered as the parameter of interest. Among the many estimation procedures that have been proposed, we have focused on the p 0 -method and the maximum likelihood (ML); they satisfy the basic requirements of statistical inference. As for most other parametric estimation problems, the ML method is the most precise. Provided cell deaths are neglected, the p 0 -method stands out as the most robust.
All estimation methods are valid only if all observed mutant counts come from the same Luria-Delbrück distribution, i.e. if they have been obtained under a fixed expected number of mutations. However, the parameter of real interest which must be considered as fixed, is the mutation rate. For each culture the expected number of mutations is the product of the mutation rate by the final number of cells. Since final numbers vary from one culture to another, so do expected numbers of mutations. As shown here, applying the p 0 -and ML procedures to the fluctuating final number case as if final numbers were constant, induces a bias. Two solutions have been proposed. In the case where the final numbers of each culture are unknown, but a coefficient of variation is available, an unbiasing factor has been defined, and validated on simulation experiments. The unbiasing factor (1) measures the error induced by neglecting final number fluctuations: the relative error is of order aC 2 =2 where a~pm is the expected number of mutations and C the coefficient of variation of final numbers.
The more favorable case is when final numbers are available. Of course measuring the final number of cells for each culture leads to reducing the volume of the culture in which the mutants are counted, and therefore underestimating mutations. This should be accounted for, by proportionally adjusting the estimates of final numbers. When coupled mutant counts -final numbers have been collected, variants of the p 0 -and ML methods are available. Both yield quite precise estimates. As in the constant final number case, the p 0 -method is more robust, and almost as precise as the ML method. Only the ML method can output relative fitness estimates. Does the correction for fluctuating final numbers have an impact on the interpretation of the data? We have reexamined the data in two examples chosen from the literature. In both cases, important discrepancies were oberved, that do not only come from neglecting final numbers: they are essentially due to the author's use of Luria-Delbrück's method of the mean, which is very sensitive to jackpots, and can bias the mutation rate estimate by several orders of magnitude. In David's paper, the ethambutol mutation rate had been estimated around 10 {7 whereas our estimation is of order 10 {9 . The demonstration is even more striking in Werngren and Hoffner's paper. They compared mutation rate between Beijing and non Beijing M. tuberculosis strains and concluded that it was not different and thus could not explain the strong association between Beijing strains and multidrug resistance phenotype. However we re-calcutated the mutation rate and showed that it was significantly higher for Beijing vs. non-Beijing strains. This result is consistent with a recent paper [27] showing that lineage 2 (Beijing) M. tuberculosis strains have a higher mutation rate than lineage 4 (non-Beijing) strains. Given the importance of mutation rates on the risk of selection of drug resistant mutants, an accurate evaluation is very important. We hope that our results will help improving precision in the evaluation of mutation rates.

Conclusion
Dealing with classical estimation methods, Foster [5] was right in recommending that cultures with deviant final numbers be eliminated from fluctuation analysis. Indeed, under varying final numbers those methods underestimate mutation rates, and the relative bias is proportional to the squared coefficient of variation of final numbers. Yet, instead of being discarded as a nuisance, variations in final numbers should be added to the available information to improve estimation: the best mutation rates estimates are obtained when couples mutation count -final number are used.
Two possibilities exist. If mutant counts contain enough zeros (say 10% or more), the p 0 -method gives reliable results in virtually null computer time, and is robust both to relative fitness and division time distribution changes. If mutant counts do not contain enough zeros, or if an estimate of relative fitness is sought for, then the joint estimation of the mutation rate and relative fitness should be carried through by the maximum likelihood method.
We are currently working on an optimized implementation of these methods into a forthcoming R [22] package that will be made freely available.

Methods
Here, N denotes the final number of cells in a Luria-Delbrück fluctuation analysis experiment. Contrarily to the traditional point of view [5], fluctuations on N are considered, i.e. N is viewed as a random variable. In the following subsections, different levels of information are assumed on the distribution of N: either its Laplace transform is known, or only its expectation and variance are known, or nothing is known, but the final numbers of cells have been measured together with mutant counts for each experiment. Notations for the different parameters are summarized in Table 3.
As usual, adding a 'hat' to the notation of a parameter denotes an estimator of that parameter. We shall consider only strongly consistent, asymptotically Gaussian estimators. If h is any parameter, and s denotes the sample size, then ffiffi s p (ĥ h{h) converges to a centered Gaussian distribution as s tends to infinity. The variance of that distribution, called asymptotic variance ofĥ h, will be denoted by vĥ h .
In the next four subsections, the focus is on the so-called p 0method, introduced by Luria and Delbrück [1] (see also [5,28]). The problem of jointly estimating the mutation rate p and the relative fitness r by he maximum likelihood method will be treated after.
Unbiasing p 0 -estimates The final number of cells N is viewed as a random variable with probability distribution function G on ½0,z?). The distribution of N is supposed to be known and its Laplace transform is denoted by L.
The expectation and variance of N are denoted by m and s 2 respectively. Let U be a random variable, with uniform distribution on ½0,1, independent from N. The indicator X for the mutant count being null is defined as: Consider a sample of size s, i.e. s independent copies of X : (X 1 , . . . ,X s ). Denote byp p 0 the empirical mean of the X i 's, i.e. the relative frequency of zeros among mutant counts.
p p 0~1 s X s i~1 X i : By the central limit theorem, ffiffi s p (p p 0 {p 0 ) converges in distribution to the centered Gaussian distribution with variance p 0 (1{p 0 ), i.e.p p 0 has asymptotic variance vp p 0~p 0 (1{p 0 ).
The p 0 -method consists of estimating the mean number of mutations a by the negative logarithm ofp p 0 , then divide by m to obtain an estimate of p. a a 0~{ log (p p 0 ) andp p 0~â a 0 m : Actually,â a 0 is a consistent estimator of: { log (p 0 )~{ log L(p) ð Þ: If N is constant, then L(p)~e {pm~e{a , and { log (p 0 )~a: in that caseâ a 0 is asymptotically unbiased. If N is not constant, because of the convexity of the exponential, and by Jensen's inequality, { log L(p) ð Þis smaller than a, i.e.â a 0 underestimates a, and thereforep p 0 underestimates p.
Denote by L {1 the inverse of L (assumed to be injective). Define a new estimator of p by: By construction,p p ub is a strongly consistent estimator of p, and therefore it is asymptotically unbiased. Its asymptotic variance is obtained by the traditional delta-method (see e.g.  [23]; see also [5,28]. Families of distributions for which explicit expressions ofp p ub and vp p ub can be obtained are scarce. Two examples are given below.
Gamma distributions. They depend on two parameters, usually denoted by a and l. The expectation and variance are: m~a l and s 2~a l 2 : The squared coefficient of variation is the inverse of the shape parameter: C 2~1 =a. The Laplace transform at p is: One gets: Expressed in terms ofâ a 0 and C 2 , these expressions become: and vp p ub~1 zaC 2 À Á 2 vp p 0 : As we shall see in the next subsection, the last two expressions, which are exact for inverse Gaussian distributions, hold as a first order approximation for any distribution.

First order approximation
If the probability distribution of N is known, the bias can be exactly corrected by inverting the Laplace transform of N. However, this is only a theoretical viewpoint. The best that can be hoped for in practice is an estimate of the expectation of N together with its variance. It turns out that whatever the distribution of N, and provided the product of the coefficient of variation by the expected number of mutations remains relatively small, the bias can be corrected. Here, we only assume that the first two moments of N, m and s 2 are known, but the full distribution of N, and in particular its Laplace transform, remains unknown. As we have seen, the expectation ofâ a 0 is { log (L(p)). Consider the terms of the series expansion of L(p) in p up to order 2 (see e.g. [30]): Expressed in terms of a and C 2 , the relative bias is: To unbiasp p 0 , one must divide by the relative bias or (as a first order approximation), multiply by 1zâ a 0 C 2 2 . Hence (1): The asymptotic variance, obtained through the delta-method is: These expressions are exact for inverse Gaussian distributions, only approximations for any other distribution.
To assess the validity range of the unbiasing factor, a simulation experiment was conducted. For the same value of p~10 {9 , samples of final numbers were simulated with a log-normal distribution with mean m~a=p and coefficient of variation C. The values of a ranged from 0:1 to 2, those of C from 0 to 1. The results are shown on Figure 2. Red curves show the actual relative bias of the p 0 -method; for blue curves, the bias has been corrected by the unbiasing factor (1). The correction maintains the bias under acceptable values even for relatively large a and C.
The p 0 -method by maximum likelihood In this section, nothing is assumed about the distribution of N. A couple (X ,N) of random variables is considered, where X represents the indicator of a null mutant count, and N the total number of cells at the end of the experiment. The conditional distribution of X knowing N~n, is defined as before: ½ X~1D N~n ~e {pn : Assume that s experiments have been repeated independently, yielding s couples (x i ,n i ), where x i is 1 or 0 according to whether zero or a positive number of mutants have been counted, and n i is the final number of cells. The likelihood is the probability of the observation: The likelihood depends only on the products pn i . If all n i ' s are divided by a given constant, then the maximum likelihood estimator will be multiplied by the same constant. Since the n i 's are very large and p very small, rescaling both can make the calculation numerically more stable.
The log-likelihood and its derivatives are: The maximum likelihood estimatorp p ml is the solution of d' dp (p p ml )~0, and its asymptotic variance is computed from (see [29]). This is essentially the method used by de la Iglesia et al. [18] in a similar case.

Bivariate maximum likelihood estimation
In cases where no null mutant counts have been observed, or if an estimate of the relative fitness is desired together with the mutation rate, another procedure must be used. Estimating the two parameters of a classical Luria-Delbrück distribution by the method of maximum likelihood was proposed long ago [8,12,31,32]. Using well known explicit formulas, the method has been implemented [11,14,33]. In [15] it was shown that similar algorithms apply not only to the classical Luria-Delbrück distribution (in which division times are exponentially distributed), but also to the so-called Haldane model in which distribution times are supposed constant [34,35]. The situation here is only slightly different. Instead of being considered as a sample of a fixed Luria-Delbrück distribution, mutant counts can be viewed as independent realizations of different distributions. Denote by LD(a,r) a Luria-Delbrück distribution with expected number of mutations a and relative fitness r. If a pair mutant count -final number (m,n) has been observed, m is viewed as a realization of the LD(pn,r), and the likelihood is computed accordingly. Thus the pair (p,r) is jointly estimated, as the pair (a,r) in the constant final number case.
Here is the mathematical model: for each experiment a pair of numbers giving the number of mutants and the final number of cells is obtained. An experiment is modelled by a couple (M,N) of random variables, where M represents the number of mutants and N the total number of cells at the end of the experiment. The conditional distribution of M knowing N~n is assumed to be the generalized Luria-Delbrück distribution GLD(pn,r,F). The notation is that of [15]: the expected number of mutations a is the product of p by the expected final number of cells, the relative fitness (ratio of the growth rate of the population of normal cells divided by that of mutants) is r, and the distribution of mutant division times is given by F . As in [15], we assume that a model has been chosen for the distribution of division times, so that only the mutation probability p and the relative fitness r are to be estimated.
The sample size being s, for i~1, . . . ,s experiment number i has yielded a couple (m i ,n i ), where m i is the mutant count and n i is the final number of cells. As in [14,15], we denote by q m (a,r) the probability of a mutant count equal to m, under the Luria-Delbrück distribution with parameters a (expected number of mutations) and r (relative fitness). The computation algorithms of the q m (a,r) are well known and will not be reproduced here: see [12,14,15]. With that notation, the mutant count at the end of the i-th experiment is equal to m i with probability q mi (pn i ,r). No assumption being made on the final counts, we consider the stuple of mutant counts (m i ) i~1,...,s as a realization of a sample of independent random variables.
The log-likelihood is: The computation of the gradient and Hessian of '(p,r) are only slightly different from those needed for the calculation of the maximum likelihood estimates of a and r in the classical case [12,14]. In the formulas below, be shall omit the dependence in (p,r) for clarity. The first and second derivatives of ' are evaluated at (p,r), those of q mi are evaluated at (pn i ,r). The gradient is computed by: The Hessian is computed by: Lr , The first and second derivatives of q m (a,r) in a and r are obtained by recursive algorithms that will not be reproduced here [12,14].
It is a well known fact in statistics, that the most easy looking maximum likelihood problem usually conceals algorithmic difficulties: numeric instability, bad conditionning of the Hessian, etc. [36]. Here, the procedure looks straightforward from (5) and (6): solving the gradient by a quasi-Newton or conjugate gradient method should be done quite efficiently at low computing cost. However, depending on the values in the sample, some optimization techniques may be more efficient than others. For the results described in this article, we have used the statistical software R [22], and compared several optimization algorithms: quasi-Newton, BFGS, conjugate gradient, simulated annealing [37]. The calculation of the Hessian at the maximum likelihood solution, which is needed to output asymptotic variances poses a numerical problem, already signalled in [14]. For the results of the article a numeric evaluation of the Hessian was used instead of (6) [37]. In File S1, only the simplest method has been included: it consists in solving the gradient by the Raphson-Newton method, from (5) and (6). It is not the best method by far. We are presently working on an optimized implementation, to be included in a forthcoming R package.

Model for simulations
In the simulation study reported in the Results section, we have chosen to draw samples of final numbers according to a lognormal distribution with fixed expectation m and coefficient of variation C. Other similarly shaped distributions could have been used: gamma, inverse Gaussian, Weibull, etc. Our choice of the log-normal was motivated by fitting real data, and by previously published results: see [16] and references therein.
If some value of the mutation rate p has been fixed, and the final number of cells N has been simulated, a mutant count can be drawn according to a Luria-Delbrück distribution with expected number of mutations a~pN and relative fitness r. As explained in [15], an additional choice must be made: that of a probability distribution for division times. Neither of the two extreme choices that leads to computable versions of the Luria-Delbrück distribu- tion (exponential and constant division times) is realistic. We have chosen the same distribution as in [15]: the best adjustment on Kelly and Rahn's observation on Bacterium aerogenes [38].
Simulations have been conducted for different sets of parameters. Results are reported for the following values, considered as representative: p~10 {9 , m~10 9 , C~0:5, r~1 : One thousand samples of size 50 of pairs (mutant counts -final numbers) were simulated. For each sample, six estimates of p were computed, together with their theoretical standard deviation. N Classical methods: the estimate of the expected number of mutations a was computed by two different methods: the p 0method [1,5,28], and the maximum likelihood (ML) method [12,31,32], both applied to the sample of mutant counts. Dividing by the expected final number m, assumed to be known, leads to two estimates for p. N unbiased estimates: to each of the two previous estimates, the unbiasing formulas (1) and (3) were applied, assuming that the true value of the coefficient of variation was known which lead to two more estimates of p. There again, the expected final number m was supposed to be known, as well as the coefficient of variation C.
N p 0 -method on the pairs: no prior information being assumed, the maximum likelihood determination of p by the p 0 -method was applied to the sample of pairs mutant counts -final numbers.
N maximum likelihood for p and r: taking againg the sample of pairs with no prior information, a joint estimation for p was obtained.
The data in Table 1 of [17] are not detailed, so only the p 0method could be applied. The bias correction (1) was applied, using a coefficient of variation of 0:35 (estimated from Table 2 in the same reference). Table 2 of [17] shows 10 pairs mutant counts -final numbers. All possible estimates were computed together with their confidence intervals. However, it must be remarked that standard deviation computations rely upon asymptotic results, and do not apply to such a small sample.
In Table 1 of [21] mutant counts are explicitly given. The maximum likelihood estimate with exponential division time was computed, then unbiased using a coefficient of variation of 0:44 (estimated from the given final counts).

Supporting Information
File S1 File S1 is a script of the R functions that have been used for the simulation experiments described here. It is a preliminary version of a forthcoming R package. The functions have not been protected nor optimized. (R)