Avoiding Dangerous Missense: Thermophiles Display Especially Low Mutation Rates

Rates of spontaneous mutation have been estimated under optimal growth conditions for a variety of DNA-based microbes, including viruses, bacteria, and eukaryotes. When expressed as genomic mutation rates, most of the values were in the vicinity of 0.003–0.004, with a range of less than twofold. Because the genome sizes varied by roughly 10^4-fold, the mutation rates per average base pair varied inversely by a similar factor. Although the reason for this common genomic rate remains unexplained, its constancy implies that mutation rates in unstressed microbes can be finely tuned by evolution. An insight originating in the 1920s and maturing in the 1960s proposed that the genomic mutation rate would reflect a balance between the deleterious effect of the average mutation and the cost of further reducing the mutation rate. If this view is correct, then increasing the deleterious impact of the average mutation should be countered by reducing the genomic mutation rate. It is a common observation that many neutral or nearly neutral mutations become strongly deleterious at higher temperatures, in which case they are called temperature-sensitive mutations. Recently, the kinds and rates of spontaneous mutations were described for two microbial thermophiles, a bacterium and an archaeon. An updated method for extrapolating from mutation-reporter genes to whole genomes reveals that the rate of base substitutions is substantially lower in these two thermophiles than in mesophiles. This result provides the first experimental support for the concept of an evolved balance between the total genomic impact of mutations and the cost of further reducing the basal mutation rate.


Introduction
It has become increasingly clear that the basal rate of spontaneous mutation per genome per replication is remarkably invariant in DNA microbes: using a classical correction factor for estimating the ratio of all base-pair substitutions (BPSs) to detected base-pair substitutions, genomic mutation rates (mutations per genome per replication) vary by less than twofold while genome sizes vary by about 6,000-fold (Table 1). Thus, when mutation rates are expressed per average base pair, they vary inversely by a similarly large factor. This narrow spread implies that basal mutation rates characteristic of unstressed microbial populations can evolve to finely tuned values. The theory of mutation rates has its roots in Haldane's 1927 formulation of the impact of selection and mutation on fitness [1], followed by Sturtevant's 1937 conjecture that the deleterious character of most mutations would generate selective pressures that should lower mutation rates indefinitely [2]. In 1967, Kimura offered the hypothesis that there would be a "physiological cost" to each reduction in rate, leading to an equilibrium value at which the cost of a further reduction would outweigh the gain in fitness [3]. The surprise has been that the observed genomic rates are so narrowly distributed among DNA microbes despite a wide variety of life histories and genome sizes. An even deeper mystery, not addressed here, is why the particular microbial genomic rate of about 0.003–0.004 has been adopted by microbes of such diverse life histories and genome sizes.
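The inverse relationship is simple arithmetic: with the genomic rate held near a fixed value, the average rate per base pair is that value divided by the genome size G, so a roughly 6,000-fold range of genome sizes produces a roughly 6,000-fold range of per-base-pair rates. A minimal Python sketch follows; the two genome sizes are hypothetical round numbers chosen only to span 6,000-fold, not entries from Table 1.

```python
# Genomic rate near the middle of the observed 0.003-0.004 range.
GENOMIC_RATE = 0.0035  # mutations per genome per replication

def per_bp_rate(genomic_rate: float, genome_size: float) -> float:
    """Average mutation rate per base pair per replication (genomic rate / G)."""
    return genomic_rate / genome_size

# Hypothetical genome sizes (bp) spanning 6,000-fold; illustrative only.
small_genome, large_genome = 5e3, 3e7
small_rate = per_bp_rate(GENOMIC_RATE, small_genome)
large_rate = per_bp_rate(GENOMIC_RATE, large_genome)

print(f"{small_rate:.2e} vs {large_rate:.2e} per bp")
print(f"fold difference: {small_rate / large_rate:.0f}")  # 6000
```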
If the Kimura conjecture is correct, then increasing the average deleterious impact of a spontaneous mutation (and thus converting many neutral or nearly neutral mutations to deleterious mutations) would lower the rate of mutation, at least on an evolutionary time scale. The concept of an equilibrium basal mutation rate is difficult to test in the laboratory, both because any imposed resetting of the equilibrium would probably require numbers of generations large even by microbial standards and because only one or a few habitats could be explored. However, it has recently proven possible to test the concept by examining a natural evolutionary experiment: life at high temperatures. Those who gather mutants for fun or profit have often observed that the most common class of mutations confers temperature sensitivity, indicating that many missense mutations are well tolerated at the standard growth temperature but become much more deleterious, often to the point of lethality, at a temperature only 5°C to 10°C higher. This widespread anecdotal observation implies that macromolecular stability becomes increasingly dependent on structural integrity as temperatures rise, a reasonable conjecture in keeping with the considerable constraints observed in the proteins of thermophilic microbes (e.g., [4]). It is therefore likely that the average missense mutation harms thermophiles more than mesophiles (the hypothesis of dangerous missense). This simple prediction was supported by the observation that missense mutations accumulated to a lesser extent (compared to synonymous mutations) in thermophiles than in mesophiles during the course of molecular evolution (dN/dS falling from 0.14 to 0.09), implying stronger purifying selection in thermophiles [5]. Here, direct measurements of the rate and character of spontaneous mutation are compared for mesophilic and thermophilic microbes.

Two Extrapolation Problems
The first phase of determining genomic mutation rates involves measuring a mutation frequency, converting the frequency to a rate, and taking precautions to exclude or account for perturbations such as differential growth rates of mutants versus wild type and delayed expression of the mutant phenotype. In addition to measuring rates, it is crucial to identify the kinds of mutations that arise in order to exclude biases due to massive mutational hotspots or to bizarre classes of mutations. The typical result is a rate for a mutation-reporter gene, which is then extrapolated to the whole genome provided that the spectrum of mutations is fairly ordinary. However, there is a substantial problem here: while most indels are detected, most BPSs fail to produce a phenotypic change detectable in the laboratory, so their full frequency must be estimated. (An exception is the still rare case in which mutations are detected by phenotype-blind genomic DNA sequencing.) Two methods have been applied. Both make the reasonable assumption that almost all indels and chain-termination (CT) BPSs in protein-coding sequences are detected with high efficiency. (Exceptions occur, but they are infrequent and tend to lie at the extreme downstream end of a gene.) The first method was based in part on the average relative frequencies of CT and non-CT BPSs in a handful of spectra and provided a correction factor for base substitutions of 4.726 [6]. This method was used for almost all of the entries in Table 1; however, the values averaged to obtain 4.726 spanned a wide range, reducing reliability. The second method is based exclusively on CT mutations. It involves examining the reporter sequence for all possible BPSs capable of generating CTs, dividing the observed CT mutation frequency by that reduced target size, and multiplying by 3 (to account for the three BPSs that can arise at any site) to obtain an average mutation rate per base pair.
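The core arithmetic of the two corrections can be sketched in Python. This is a minimal illustration under stated assumptions, not the published calculation: it assumes that the CT rate in the reporter is obtained by multiplying the measured reporter rate by the fraction of sequenced mutants that are CT base substitutions, and that the genomic BPS rate is the per-base-pair rate times the genome size G. The function names and example numbers are hypothetical.

```python
HISTORICAL_FACTOR = 4.726  # all BPSs / detected BPSs, historical method [6]

def historical_total_bps(detected_bps: float) -> float:
    """Historical method: scale the detected BPS count up to all BPSs."""
    return detected_bps * HISTORICAL_FACTOR

def ct_bps_rate_per_bp(mu_target: float, n_ct: int, n_sequenced: int,
                       ct_target: int) -> float:
    """CT method: average BPS rate per base pair from chain-termination mutants.

    mu_target   -- measured mutation rate for the whole reporter sequence
    n_ct        -- sequenced mutants that are CT base substitutions
    n_sequenced -- all sequenced mutants (M)
    ct_target   -- number of distinct BPSs in the reporter that would create a stop codon
    """
    if n_sequenced <= 0 or ct_target <= 0:
        raise ValueError("n_sequenced and ct_target must be positive")
    mu_ct = mu_target * n_ct / n_sequenced  # CT rate in the reporter (assumed partition)
    per_specific_bps = mu_ct / ct_target     # rate per specific CT-producing substitution
    return 3 * per_specific_bps              # three possible BPSs at any base pair

def genomic_bps_rate(per_bp: float, genome_size: float) -> float:
    """Extrapolate a per-base-pair BPS rate to the whole genome (assumed: times G)."""
    return per_bp * genome_size

# Hypothetical example: 10 of 100 sequenced mutants are CTs; 50 CT-producing BPSs.
per_bp = ct_bps_rate_per_bp(mu_target=1e-6, n_ct=10, n_sequenced=100, ct_target=50)
print(f"{per_bp:.1e} per bp")  # 6.0e-09 per bp
```

Because the CT method never touches non-CT missense mutations, it does not depend on how detectable those mutations are, which is the property that matters when comparing organisms whose missense mutations may differ in phenotypic impact.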
The CT method also has drawbacks. First, it cannot report A·T→G·C mutations, but these generally arise at approximately average BPS rates, suggesting a minimal problem. Second, CT mutations are typically a minority of all mutations, so many spectra contain only a few CTs, reducing sampling accuracy.
The other major barrier to accurate extrapolation from a mutation-reporter gene to the whole genome becomes manifest when sequencing reveals a major hotspot. Mutation rates at particular sites vary greatly: in most mutational spectra, the number of mutations per site ranges from 1 up to hotspots holding from several percent to as much as a quarter of the whole collection. A hotspot containing a quarter of all the mutations has only a modest impact, but some genes contain single hotspots bearing the large majority of mutations; the classic example is the E. coli lacI gene, where about 72% of all mutations are indels arising at a 13-bp stretch consisting of 3.25 repeats of a tetramer [7]. However, such massive indel hotspots are infrequent among genes, and when one occurs it is reasonable to report genomic rates both with and without it.
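Reporting a reporter rate with and without a massive hotspot can be sketched as follows. The formula is an assumption, labeled here because the text does not spell it out: removing a hotspot is taken to scale the reporter rate by the fraction of sequenced mutants lying outside it. The function name is hypothetical.

```python
def rate_excluding_hotspot(mu_target: float, n_sequenced: int, n_hotspot: int) -> float:
    """Reporter mutation rate after discarding mutants at a single hotspot.

    Assumes the rate partitions in proportion to the sequenced mutants.
    """
    if not 0 <= n_hotspot < n_sequenced:
        raise ValueError("need 0 <= n_hotspot < n_sequenced")
    return mu_target * (n_sequenced - n_hotspot) / n_sequenced

# A lacI-like case: about 72% of mutants sit at one indel hotspot.
with_hotspot = 1.0  # arbitrary units
without_hotspot = rate_excluding_hotspot(with_hotspot, n_sequenced=100, n_hotspot=72)
print(f"{with_hotspot / without_hotspot:.1f}-fold lower without the hotspot")  # 3.6
```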

Author Summary
Spontaneous mutations are key drivers of evolution and disease. In microbes, most mutations are deleterious, some are neutral (without significant impact), and a few are advantageous. Because deleterious mutations reduce fitness, there should be constant selection for antimutator mutations that reduce rates of spontaneous mutation. However, such reductions are necessarily achieved at some cost. Therefore, a mutation rate should converge evolutionarily on a value that reflects this trade-off. For DNA microbes, the observed genomic mutation rate is remarkably (and mysteriously) invariant, in the neighborhood of 0.003–0.004 with a range of less than twofold, despite huge variation in the rate per average base pair among organisms with a wide diversity of life histories. Would an environmental condition that increased the average deleterious impact of a mutation be balanced by additional investments in antimutator mutations? It is widely observed that many mutations with mild impacts become strongly deleterious at higher temperatures, so mutation rates were measured in two thermophiles, a bacterium and an archaeon. Remarkably, both displayed genomic mutation rates about fivefold lower than the characteristic mesophilic value, most of the decrease reflecting a 10-fold reduction in the rate of base substitutions.

Thermophiles Versus Mesophiles
All informative microbial mutation rates obtained before 2000 were for mesophilic species, but rates and spectra are now available for two genes in each of two very different thermophiles, the crenarchaeon Sulfolobus acidocaldarius [6] and the bacterium Thermus thermophilus [8], both growing at close to 75°C. In the first study, with S. acidocaldarius, BPSs were a smaller fraction of the spectrum than in mesophiles, and this observation prompted the hypothesis of dangerous missense. Note, however, that if greater fractions of missense mutations are phenotypically detectable in thermophiles than in mesophiles, then the historical method of correcting for undetected BPSs, which is based on mesophiles, becomes inappropriate. It is therefore advisable to rely exclusively on the CT method for estimating total BPS rates, and that estimate is the central result of this report. Table 2 lists genomic mutation rates estimated using the CT method (or its lacZα equivalent), sometimes based on the same sources as for Table 1 but excluding some reports whose sequencing information was inadequate for the CT method. The nine entries at the top are for mesophiles and reveal no significant departures from the values in Table 1, providing empirical confidence in the robustness of the CT method. The two entries at the bottom are for thermophiles, whose numbers of CTs are small. (The data for the two mutation-reporter genes are combined in each organism because of the small number of CTs.) The thermophile BPS rates are substantially lower, by about 10-fold, than their mesophile counterparts.
When major indel hotspots are included, indel rates are less than twofold lower in thermophiles, while total genomic rates are about fivefold lower. (When the indel hotspots are removed from the analysis, the indel rate decrease is threefold and the total genomic rate decrease is sevenfold.) Although these ratios are somewhat uncertain because of the small numbers of CTs for five of the seven mesophiles and for both thermophiles, the mean difference is large enough to support the inference that BPS rates are lower in thermophiles. The mesophile and thermophile values were compared using randomization t-tests [9], nonparametric tests that require no assumptions about normality or equal variances of the mutation rates. The resulting one-sided p values are 0.018 for both the total mutation rate and its BPS component, and 0.27 for the indel values that include the hotspots.
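The specific implementation of the randomization t-test in [9] is not reproduced here. The Python sketch below shows one standard exact version under that caveat: it enumerates every relabelling of the pooled values and counts how many are at least as extreme as the observed split. The input values are placeholders, not Table 2 data, and the function name is hypothetical. With nine mesophile and two thermophile entries there are only C(11, 2) = 55 distinct splits, so the smallest attainable exact one-sided p value is 1/55 ≈ 0.018.

```python
from itertools import combinations
from math import comb
from statistics import mean

def randomization_test(group_a, group_b, tol=1e-12):
    """Exact one-sided randomization test of H1: mean(group_a) > mean(group_b).

    Every way of assigning len(group_b) of the pooled values to group B is
    enumerated. Under relabelling, the difference in means ranks the splits
    exactly as the pooled-variance two-sample t statistic does, so it is used
    as the test statistic.
    """
    pooled = list(group_a) + list(group_b)
    n, n_b = len(pooled), len(group_b)

    def mean_difference(b_indices):
        chosen = set(b_indices)
        a_vals = [x for i, x in enumerate(pooled) if i not in chosen]
        b_vals = [pooled[i] for i in b_indices]
        return mean(a_vals) - mean(b_vals)

    observed = mean_difference(range(n - n_b, n))  # the actual labelling
    as_extreme = sum(
        1 for b_idx in combinations(range(n), n_b)
        if mean_difference(b_idx) >= observed - tol  # tol absorbs float round-off
    )
    return as_extreme / comb(n, n_b)

# Placeholder values (NOT Table 2 data): nine "mesophile" and two "thermophile" rates.
meso = [3.1, 2.8, 3.6, 4.0, 2.9, 3.3, 3.8, 3.0, 3.5]
thermo = [0.6, 0.4]
print(f"one-sided p = {randomization_test(meso, thermo):.3f}")  # 0.018
```

Exact enumeration is feasible here because the groups are tiny; for larger samples one would instead draw random relabellings.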

The Central Result
Genomic mutation rates have long been suspected to evolve as a balance between the deleterious impact of the average mutation and the cost of further reducing the mutation rate. A test of this conjecture on the evolutionary scale could consist of estimating mutation rates in organisms whose environment increases the impact of the average mutation. Because many base substitutions do greater harm at higher temperatures, thermophiles were suitable candidate organisms. For both a bacterium and an archaeon, the thermophiles display sharply reduced rates of base pair substitutions compared to the typical mesophile.
The lower mutation rates in thermophiles are likely to reflect their higher optimal growth temperatures. No aspect of life history other than temperature obviously sets the two thermophiles apart from the mesophiles. The %(G+C) values for the ten organisms in Tables 1 and/or 2, in ascending order, are 35, 36, 37, 38, 41, 50, 50, 51, 68, and 69, with the thermophiles at 37 (S. acidocaldarius) and 69 (T. thermophilus); this scatter provides no hint of a role for this variable, as also noted in the earlier molecular-evolution study [5]. Thus, the Kimura conjecture, that the equilibrium mutation rate reflects a balance between the impact of the average mutation and the cost of keeping mutations in check, is supported by a natural experiment.
The hypothesis of dangerous missense predicts that BPS rates will be reduced in thermophiles but does not speak directly to indel rates. However, indel rates are also reduced, although less strongly than BPS rates and not significantly (p = 0.27 for these data). One candidate explanation for this difference is that the reduction in BPS rates is achieved by the accumulation of modifiers selected to target BPS mutagenesis that act at most incidentally on indel mutagenesis. Single-base additions and deletions are the large majority of indels in mesophiles (35 single-base indels/38 total indels in phage λ, 20/23 in phage T4, 45/45 in herpes simplex virus, 604/641 in E. coli, 88/97 in S. cerevisiae, and 24/32 in S. pombe) and are similarly frequent in thermophiles (84/95 in S. acidocaldarius and 46/54 in T. thermophilus), so these small indels must be the main targets of antimutagenic modifiers acting on indels generally. In well-studied model organisms such as E. coli and S. cerevisiae, both single-base indels and BPSs result from insertion errors followed by failures of proofreading and DNA mismatch repair, but little is known about the sources of spontaneous mutations in S. acidocaldarius and T. thermophilus.
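The single-base fractions quoted above can be checked directly from the listed counts. The short Python sketch below uses only those counts; pooling across organisms is an illustrative choice, not a calculation made in the text.

```python
# (single-base indels, total indels), as listed in the text.
MESOPHILES = {
    "phage lambda": (35, 38), "phage T4": (20, 23), "herpes simplex virus": (45, 45),
    "E. coli": (604, 641), "S. cerevisiae": (88, 97), "S. pombe": (24, 32),
}
THERMOPHILES = {"S. acidocaldarius": (84, 95), "T. thermophilus": (46, 54)}

def pooled_fraction(counts: dict) -> float:
    """Fraction of all indels that are single-base indels, pooled across organisms."""
    single = sum(s for s, _ in counts.values())
    total = sum(t for _, t in counts.values())
    return single / total

for group in (MESOPHILES, THERMOPHILES):
    for name, (single, total) in group.items():
        print(f"{name}: {single / total:.2f}")

print(f"pooled mesophiles: {pooled_fraction(MESOPHILES):.2f}")      # 0.93
print(f"pooled thermophiles: {pooled_fraction(THERMOPHILES):.2f}")  # 0.87
```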

New Fishing Holes
Are there likely to be other outliers with informative deviations from the mutational pattern that is consistently displayed among the mesophilic microbes examined to date with respect to either the mutation rate or the BPS:indel ratio?
Mutations to cold sensitivity are rarely reported and are anecdotally described as difficult to discover. If they are indeed rare, perhaps fewer missense mutations produce mutant phenotypes in psychrophiles than in mesophiles. One evolutionary consequence might then be a relaxation to a higher spontaneous rate of BPS mutation, perhaps with little effect on the rate of indel mutation.
Because of incomplete buffering against the impacts of their environments, halophiles and acidophiles experience relatively high internal concentrations of Na+ and H+, respectively, compared to other microbes. These ionic environments might be unusually stressful to mutants carrying missense mutations, resulting in adjustments to their mutational patterns in the same direction as seen for thermophiles. Although the difference is not significant because of sampling constraints, Table 2 attributes a fivefold lower BPS mutation rate to the acidophile S. acidocaldarius than to the nonacidophile T. thermophilus. Unfortunately, an attempt to characterize mutation in the halophilic archaeon Haloferax volcanii failed, probably because this mesophile is highly polyploid [10].
The lactic acid bacterium Oenococcus oeni, used in wine making to convert malic acid to lactic acid, lacks the usual bacterial DNA mismatch repair (MMR) system and has a high mutation rate as judged by mutations conferring resistance to rifampin and erythromycin, as does Oenococcus kitaharae [11]. These results suggest a powerful genus-wide mutator condition, which would normally be highly deleterious. The question then arises whether the lack of MMR is so strongly adaptive in these species as to outweigh the sharply decreased fitness of the mutator condition, or whether the species have been unable to re-acquire the MMR genes by horizontal transfer.
Whereas the above two species lack MMR function and display mutator phenotypes, the crenarchaea as a whole, including S. acidocaldarius, lack all known bacterial MMR genes, yet S. acidocaldarius, at least, displays an antimutator phenotype compared with mesophiles. How can this be? In Escherichia coli, the mutation rate per average base pair is ≈8 × 10^-10 (Tables 1 and 2). Based on the strengths of mutator mutations, replication infidelity can be estimated as the product of three components during DNA replication: insertion errors ≈0.9 × 10^-5, proofreading failures ≈1.7 × 10^-2, and MMR failures ≈5 × 10^-3 [12,13]. In bacteriophage T4, which does not employ a general MMR system, the mutation rate per average base pair is ≈2 × 10^-8 (Tables 1 and 2). Based on the strengths of mutator mutations, replication infidelity can be estimated as the product of two components during DNA replication: insertion errors ≈1 × 10^-5 and proofreading failures ≈2 × 10^-3 [13]. Thus, T4 makes up for the lack of MMR by a proofreading potency about an order of magnitude greater than that operating in E. coli. The mutation rate per base pair for S. acidocaldarius is ≈3 × 10^-10, which might be achieved by a product of factors applied to the T4 insertion and proofreading accuracies that together produce a 70-fold improvement. Alternatively, S. acidocaldarius may possess an MMR system so distinct from the standard mutHLS model as to have escaped recognition by genomic scans. Note also that both thermophiles have genomes about twofold smaller than the E. coli genome.
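The multiplicative fidelity model in this paragraph can be checked numerically. The Python sketch below multiplies the quoted per-base-pair failure probabilities and computes the fold improvement over T4 that S. acidocaldarius would need; it uses only numbers stated in the text, and the variable names are illustrative.

```python
from math import prod

# Per-base-pair failure probabilities quoted in the text.
E_COLI = {"insertion": 0.9e-5, "proofreading": 1.7e-2, "mmr": 5e-3}
T4 = {"insertion": 1e-5, "proofreading": 2e-3}
S_ACIDOCALDARIUS_RATE = 3e-10  # observed rate per base pair

e_coli_rate = prod(E_COLI.values())  # ≈ 7.65e-10, consistent with ≈ 8e-10
t4_rate = prod(T4.values())          # ≈ 2e-8
proofreading_ratio = E_COLI["proofreading"] / T4["proofreading"]  # ≈ 8.5
required_gain = t4_rate / S_ACIDOCALDARIUS_RATE                   # ≈ 67

print(f"E. coli: {e_coli_rate:.2e}, T4: {t4_rate:.2e}")
print(f"T4 proofreading advantage: {proofreading_ratio:.1f}-fold")
print(f"gain needed over T4 to reach S. acidocaldarius: {required_gain:.0f}-fold")
```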

General Procedures
We begin in possession of values for the following:
G = the genome size in bases or base pairs.
T = the number of bases or base pairs in the target (the mutation-reporter sequence).
μ_T = the measured mutation rate at T, corrected where necessary for mutants expressing the characteristic phenotype but revealed by sequencing to lack mutations in the reporter gene, but not corrected for mutants with two or more mutations (which are infrequent and sometimes absent). In many cases, μ_T = f/ln(μ_T N), where f = the measured mutation frequency for the given target and N = the final population size, and the median μ_T over several cultures is used [14], a method that is robust compared to the classical fluctuation test provided the average number of mutational events per culture is ≥30 [15].
M = the number of sequenced mutants = B + I, where B = the number of BPS mutants and I = the number of indel mutants, the latter also including complex mutants (a minority, if present at all) regardless of their components.
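Because μ_T appears on both sides of μ_T = f/ln(μ_T N), the formula must be solved numerically (it can also be written with the Lambert W function). The Python sketch below uses simple fixed-point iteration, a technique chosen here for illustration; the cited sources [14,15] may solve it differently. It then takes the median over cultures as described. Function names are hypothetical.

```python
import math
import statistics

def solve_mu_t(f: float, n_final: float, rel_tol: float = 1e-12, max_iter: int = 200) -> float:
    """Solve mu_T = f / ln(mu_T * N) for mu_T by fixed-point iteration.

    Intended for the regime the text specifies (many mutational events per
    culture, so mu_T * N is well above e); the iteration contracts there.
    """
    if f <= 0 or n_final <= 0:
        raise ValueError("f and N must be positive")
    mu = f  # starting guess: the mutant frequency itself
    for _ in range(max_iter):
        events = mu * n_final  # expected mutational events per culture
        if events <= 1.0:
            raise ValueError("mu_T * N must exceed 1 for ln(mu_T * N) to be positive")
        updated = f / math.log(events)
        if abs(updated - mu) <= rel_tol * updated:
            return updated
        mu = updated
    raise RuntimeError("fixed-point iteration did not converge")

def median_mu_t(cultures):
    """Median mu_T over several cultures, each given as (f, N)."""
    return statistics.median(solve_mu_t(f, n) for f, n in cultures)

# Round trip: a culture built with mu_T = 1e-8 and N = 1e10 (about 100 events).
f = 1e-8 * math.log(1e-8 * 1e10)
print(f"{solve_mu_t(f, 1e10):.3e}")  # 1.000e-08
```

At the fixed point the iteration's slope has magnitude 1/ln(μ_T N), so with the ≥30 events per culture required by the text it shrinks the error by roughly a factor of 3 or more per step.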
For the "historical" method, we correct for undetected BPSs by multiplying the number of detected BPSs by 4.726 [6].

Calculations
Phage M13. G = 6407. This system is unique among popular mutation reporters. It consists of an E. coli lacZα transgene embedded in the single-stranded DNA of the M13 genome and carrying both an upstream regulatory region and the beginning of the lacZ gene. Because thousands of mutants have been sequenced, it has become apparent which mutations are detectable when present singly and which are not [14,16]. The target sizes for base substitutions (T_B = 245) and for single-base indels (T_±1 = 177) are thus well defined, and we further assume that the infrequent larger indels are fully detectable (T_L = 239). The measured mutation frequency f was 5.86 × 10^-4 [17], M = 117, B = 67, I_±1 = 11, and I_L = 39. Assuming that virtually all replication occurs by a rolling-circle mechanism, the mutation rate is calculated as for RNA viruses, μ = f/(2c), where c is the number of consecutive cycles of infection [18]. The following protocol was used to grow the stock ([17] and T. A. Kunkel, personal communication). The contents of one plaque (≥10^13 pfu) were added to 1 L of medium containing E. coli cells diluted from an overnight culture to about 10^7 cells/ml, so that the multiplicity of infection was about 10^3. The input of infected cells from the plaque was ≤10^8, so that the input concentration of infected cells = 10^8/10^3 ml = 10^5/ml, that is, no more than 10^5/10^7 = 0.01 of all cells. c ≈ 2.5 in the plaque + 1 in the liquid culture