Genetic information should be accurately transmitted from cell to cell; conversely, the adaptation in evolution and disease is fueled by mutations. In the case of cancer development, multiple genetic changes happen in somatic diploid cells. Most classic studies of the molecular mechanisms of mutagenesis have been performed in haploids. We demonstrate that the parameters of the mutation process are different in diploid cell populations. The genomes of drug-resistant mutants induced in yeast diploids by base analog 6-hydroxylaminopurine (HAP) or AID/APOBEC cytosine deaminase PmCDA1 from lamprey carried a stunning load of thousands of unselected mutations. Haploid mutants contained almost an order of magnitude fewer mutations. To explain this, we propose that the distribution of induced mutation rates in the cell population is uneven. The mutants in diploids with coincidental mutations in the two copies of the reporter gene arise from a fraction of cells that are transiently hypersensitive to the mutagenic action of a given mutagen. The progeny of such cells were never recovered in haploids due to the lethality caused by the inactivation of single-copy essential genes in cells with too many induced mutations. In diploid cells, the progeny of hypersensitive cells survived, but their genomes were saturated by heterozygous mutations. The reason for the hypermutability of cells could be transient faults of the mutation prevention pathways, like sanitization of nucleotide pools for HAP or an elevated expression of the PmCDA1 gene or the temporary inability of the destruction of the deaminase. The hypothesis on spikes of mutability may explain the sudden acquisition of multiple mutational changes during evolution and carcinogenesis.
Evolution and carcinogenesis are driven by mutations. Cells maintain constant mutation rates and can afford only transient mutagenesis bursts for adaptation. The nature of the mutational avalanches is not very clear. We sequenced the whole genomes of mutants induced in haploid and diploid yeast by nucleobase analog HAP and by DNA editing cytosine deaminase. Mutants selected in diploids are saturated with passenger mutations. Far fewer mutations are found in haploid mutants. Treatment with a mutagen without selection results in intermediate mutagenesis. The observed transient hypermutability of diploids under mutagenic insult helps to explain the wellspring of mutations that arise during evolution and carcinogenesis.
Citation: Lada AG, Stepchenkova EI, Waisertreiger ISR, Noskov VN, Dhar A, Eudy JD, et al. (2013) Genome-Wide Mutation Avalanches Induced in Diploid Yeast Cells by a Base Analog or an APOBEC Deaminase. PLoS Genet 9(9): e1003736. doi:10.1371/journal.pgen.1003736
Editor: Nancy Maizels, University of Washington, United States of America
Received: May 22, 2013; Accepted: July 5, 2013; Published: September 5, 2013
This is an open-access article, free of all copyright, and may be freely reproduced, distributed, transmitted, modified, built upon, or otherwise used by anyone for any lawful purpose. The work is made available under the Creative Commons CC0 public domain dedication.
Funding: The work was supported by UNMC Eppley Cancer Center seed grants and, in part, by NCI grant CA129925; Smoking Disease Research Program DHHS grant 2013-21; by the Russian Federal Government Research Program “Innovative Scientific Personnel”; State Contract 14.740.11.0916 to YIP; Federal Grant-in-Aid Program “Human Capital for Science and Education in Innovative Russia 2009–2013” to YIP; and the Federal Grant-in-Aid Program “Human Capital for Science and Education in Innovative Russia” (Governmental Contract No. 8654). AGL was supported by a Graduate Research Fellowship from the University of Nebraska Medical Center. MH was supported by NIH grants R01AI072435 and R01GM100151. IBR is supported by the Intramural Research Program of the National Library of Medicine at the National Institutes of Health/DHHS. The University of Nebraska DNA Sequencing Core receives partial support from the NCRR (1S10RR027754-01, 5P20RR016469, RR018788-08) and the National Institute for General Medical Science (NIGMS) (8P20GM103427, GM103471-09). This publication's contents are the sole responsibility of the authors and do not necessarily represent the official views of the NIH, NCI or NIGMS. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
The precise balance between genome stability and mutagenesis is vital for the survival of a species. It ensures the maintenance of the optimal combinations and frequencies of alleles with high fitness and, simultaneously, the introduction of new mutations that are the raw material for the natural selection that drives adaptation in a changing environment. A wealth of data indicate that this balance shifts toward higher mutation rates during sub-optimal conditions, and then returns to normal levels. Similar mechanisms have been proposed to explain the evolution of tumors. Sequencing of cancer genomes shows that tumor genomes are highly enriched with mutations. The accumulated mutation load cannot be explained by normal mutation rates and requires highly mutable cells. A stable mutator phenotype would inexorably reduce tumor fitness due to the accumulation of mutations in regulatory and essential genes. In order to account for this discrepancy, it has been hypothesized that the mutator phenotype in cancer is transient. Spikes of hypermutability can be caused by epigenetic changes and/or the defective regulation of DNA repair and replication, abnormally high expression of DNA editing deaminases, and other processes.
Another layer of complexity is added by the fact that the mechanisms of the appearance of mutants are different in haploid and diploid organisms. In haploid cells, a mutation-causing defect of the gene product is expressed immediately. In diploid cells, a wild-type allele will mask a recessive mutation, and only the effects of dominant mutations will be observed (Fig. 1). For recessive mutations, the mutant phenotype will only be expressed in diploid cells when the second allele is inactivated. This can occur in various ways. First, either gene conversion or recombination between the mutated allele and the centromere will lead to a reduction to homozygosity. Second, chromosome loss or deletion of the region encoding the wild-type allele will result in a reduction to hemizygosity. Third, the wild-type allele may acquire an independent, typically heteroallelic mutation. The classic example illustrating the importance of two-step mutagenesis is Knudson's theory of retinoblastoma development via the inactivation of both alleles of a tumor suppressor gene , .
Mutation (red bar) in a gene (blue rectangle) occurring with frequency “m” will lead to a phenotypic change in diploid, either by a concomitant loss of heterozygosity (by recombination with frequency “r”) or deletion (“d”) or chromosome information loss/inactivation (“l”). The frequency of such events “f” should be the product of the frequencies of each independent event. Mutations can also occur independently in both alleles of a diploid but at different sites, and their frequency should be the square of “m”.
If measured by phenotypic change, mutation frequency should be much lower in diploids than haploids (Fig. 1); however, in yeast, it is only several-fold less. Most mutagens act in yeast by a two-step mechanism involving mutation and segregation, because they induce a high frequency of recombination events, while replication infidelity caused by non-recombinogenic base analogs or proofreading exonuclease defects somehow induces a high level of independent mutations in both homologs.
Most of our knowledge of the mechanisms of mutagenesis comes from classical studies in haploid models, such as E. coli, haploid yeast strains, or Drosophila germ cells. The molecular mechanisms of mutagenesis in diploid cells have not been studied in depth. In this work, we induced mutations in isogenic haploid and diploid yeast using one of two different types of mutagens that generate non-canonical bases in DNA: the base analog 6-hydroxylaminopurine (HAP), and the ectopically produced editing cytosine deaminase PmCDA1 from sea lamprey. Because yeast is characterized by a high level of recombination, we chose these mutagens and genetic backgrounds to avoid the induction of recombination by the mutagens. Our conditions were therefore well suited to studying mutagenesis that more closely resembles the processes in human cells, where recombination is rare.
HAP and PmCDA1 enhance replication infidelity and create a mutator phenotype on demand. HAP is incorporated while cells grow in medium containing the analog and is rapidly cleared from cells after transfer to medium without it. Nucleotide pools are known to be constantly and rapidly renewed in yeast cells. The expression of PmCDA1 in our system is under the control of a regulatable promoter and can be turned on and off. After mutagenic treatment, we selected forward mutants resistant to the antibiotic canavanine or the toxic drug 5-fluoroorotic acid (5-FOA) and resequenced their genomes. This allowed us to determine the accumulated DNA sequence changes specific to each mutagen. The numbers of induced base substitutions were almost an order of magnitude higher in diploid mutants than in haploid mutants. The genomes of diploid clones treated with either mutagen but not selected for resistance also contained significantly fewer mutations than the diploid mutant clones. This indicates heterogeneity in mutability between different cells and implies that the selected mutants came from a fraction of cells that experienced the most dramatic mutagenesis. We call such cells hypermutable. Diploid hypermutated cells survived because most of the induced mutations were recessive and did not result in phenotypic changes when heterozygous. Haploids with similar levels of mutagenesis die due to the inactivation of essential genes. For the first time, to our knowledge, this work suggests that cells have a wide range of mutability in a genetically homogeneous population of eukaryotic cells exposed to a mutagen. This may explain the rapid appearance of mutations (mutation avalanches and recently discovered kataegis) in evolution and disease progression, especially in sporadic cancer.
HAP and PmCDA1 are highly mutagenic in haploid and diploid yeast
HAP is an adenine base analog that has an ambiguous base-pairing capacity. In imine form it can pair with thymine, whereas in its rarer amine form it pairs with cytosine. HAP is a universal mutagen that is active in most organisms, from humans to bacteria and their phages , . The conversion of HAP in cells to the corresponding deoxyribonucleotide triphosphate (dHAPTP), followed by its incorporation into DNA by replicative polymerases, results in A-T to G-C and G-C to A-T transition mutations (see Fig. 2A, 2B) , , , , . PmCDA1 belongs to the AID/APOBEC superfamily of editing deaminases , . These enzymes are found in different vertebrate species and perform a variety of functions, including immunoglobulin gene diversification (AID), RNA editing (APOBEC1), restriction of retroviruses (APOBEC3s), and possibly active DNA demethylation , , , . PmCDA1 is involved in the diversification of genes encoding immunoglobulin analogs in sea lamprey and is closely related to other APOBEC enzymes . AID/APOBECs fulfill their functions by catalyzing cytosine deamination, which results in the formation of uracil in the substrate DNA or RNA. Uracil can then be processed by the base-excision repair pathway protein uracil-DNA-glycosylase, followed by repair, which may result in mutations and recombination. If uracil escapes repair during the next round of replication, a C-G to T-A transition occurs (Fig. 2C) , .
A. Induction of G-C to A-T transitions by the mechanism of HAP misincorporation. B. Induction of A-T to G-C transition mutations by the mechanism of misincorporation opposite HAP. C. Induction of G-C to A-T mutations by cytosine deamination followed by incorporation of adenine opposite uracil. D. Numbers of SNVs induced by HAP in haploid (LAN201-1 to LAN201-4) and diploid (LAN211-1 to LAN211-10) strains. The proportions of substitution types are shown by color. E. Numbers of SNVs in haploid (LAN200-L1 to LAN200-L4) and diploid (LAN210-L1 to LAN210-L7) PmCDA1-induced CanR mutants, and in diploid PmCDA1-induced FOAR mutants (LAN210-FOA-1 and LAN210-FOA-2). The proportions of substitution types are shown by color.
Neither HAP treatment nor the ectopic production of PmCDA1 is very toxic, but both are strongly mutagenic in wild-type yeast as measured by different reporter systems detecting base-pair substitutions. In contrast to other organisms, HAP does not induce recombination in Saccharomyces cerevisiae, most likely because a key enzyme required to excise HAP-containing DNA is absent in yeast. In addition, mismatch repair, one of the key safeguards of genome stability, does not seem to recognize HAP in the DNA, in contrast to other base analogs such as dP. These unique properties provide the opportunity to detect a genuine signature of base analog-induced mutations. PmCDA1 was chosen as a prototype of the AID/APOBEC family because it has the highest mutagenic effect in the group when produced in yeast. PmCDA1 is recombinogenic in wild-type S. cerevisiae, but inactivation of uracil-DNA-glycosylase (ung1) completely blocks deaminase-induced recombination. Thus, HAP and PmCDA1 are perfect tools for studying mutagenesis in diploid cells under conditions in which induced recombination is suppressed and mutants arise through independent mutations in the two homologs (Fig. 1, right panel).
We examined the effect of ploidy on HAP- and PmCDA1-induced mutagenesis. The median frequency of canavanine-resistant mutants (CanR, mutation in the CAN1 locus) in the HAP-treated haploid strain LAN201 (see Table S1 for genotypes of strains) is 2.51×10⁻⁵ (Table 1), a 23-fold increase over the background level. The frequency of HAP-induced CanR mutants in the isogenic diploid strain LAN211 is 5×10⁻⁷ (Table 1), 833-fold higher than expected based on the mutation frequency in the haploid strain; both copies of the CAN1 gene have to be inactivated to produce an antibiotic-resistant phenotype, so the expected diploid frequency is the square of the haploid frequency (Fig. 1). We did not find CanR clones in diploids in the absence of mutagen. This is consistent with previous observations that the spontaneous frequency of mutants in wild-type diploids is extremely low. Overall, the results are in full agreement with our earlier genetic data on the mutagenesis of diploids with HAP obtained with a different reporter, the LYS2 gene (i.e., the effect is reporter-independent).
The expression of PmCDA1 in the ung1 (uracil-DNA-glycosylase-deficient) haploid strain LAN200 leads to a 22-fold increase in CAN1 mutagenesis over the background frequency (1.6×10⁻⁴ vs. 7.2×10⁻⁶). As with HAP, the frequency of PmCDA1-induced mutants in the diploid ung1 strain LAN210 is much higher than expected based on the observed haploid frequency (2.3×10⁻⁶ vs. 2.5×10⁻⁸, a 92-fold increase) (Table 1). Similar results have been obtained with the URA3 reporter gene (mutants resistant to 5-FOA): the frequency of FOAR mutants in the diploid strain was much higher than predicted from the measured frequency in the haploid strain (see Table 1). The viability of haploid and diploid cells treated with deaminase was 65% and 90%, respectively.
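The comparison between observed and expected diploid frequencies can be illustrated with a minimal Python sketch. It assumes, following Fig. 1, that the expected frequency of two independent hits is the square of the haploid frequency; the function names are hypothetical and the inputs are the rounded values quoted in the text, not the underlying data in Table 1.

```python
def expected_diploid_frequency(haploid_freq: float) -> float:
    """Expected frequency of independent mutations in both homologs (m squared)."""
    return haploid_freq ** 2


def fold_over_expected(observed_diploid: float, haploid_freq: float) -> float:
    """Ratio of the observed diploid mutant frequency to the m-squared expectation."""
    return observed_diploid / expected_diploid_frequency(haploid_freq)


# Rounded frequencies quoted in the text (assumption: Table 1 holds the exact values).
hap_fold = fold_over_expected(observed_diploid=5e-7, haploid_freq=2.51e-5)
pmcda1_fold = fold_over_expected(observed_diploid=2.3e-6, haploid_freq=1.6e-4)

print(f"HAP: {hap_fold:.0f}-fold over expectation")
print(f"PmCDA1: {pmcda1_fold:.0f}-fold over expectation")
```

With these rounded inputs the ratios come out at roughly 790-fold and 90-fold, close to but not identical with the 833-fold and 92-fold values reported above, which were presumably calculated from unrounded frequencies.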
High-throughput “next-generation” DNA sequencing (NGS) has revolutionized biomedical research. In order to better understand the phenomenon of an unexpectedly high mutation rate in diploid strains, we used NGS to determine the genome-wide spectra of mutations induced by HAP and PmCDA1 in yeast. To make the analysis of mutant clones possible, we first determined the sequences of the genomes of our wild-type strains. DNA from LAN201, LAN211, LAN200 and LAN210 (Table S1) was extracted, sequenced on an Illumina HiSeq 2000 instrument, and reference genome sequences were de novo assembled from the sequencing data (see Materials and Methods for details of sequencing and genome assembly). Since LAN201 and LAN211 — as well as LAN200 and LAN210 — are isogenic to each other, the sequences of their genomes were identical, with the exception of the MAT locus. However, the related UNG1 and ung1 strains (LAN201 and LAN211 vs. LAN200 and LAN210) differ by seven single-nucleotide variations (SNVs), in addition to disruption of the UNG1 gene by a cassette conferring hygromycin resistance (Table S2). Overall, the sequence of our LAN-specific reference genome contains 12,077,153 bp and covers 92.74% of the S288C nuclear genome. Other genome parameters, such as the number of genes and the GC percent, are similar between the LAN and S288C reference genomes (Table S3).
Resequencing of HAP-induced haploid and diploid mutants
Next, we resequenced the genomes of canavanine-resistant clones induced by HAP in the LAN201 and LAN211 strains. Four haploid and 10 diploid genomes were sequenced. We detected numerous mutations in all 14 genomes (Table 2). All mutations detected in haploid clones have SNV frequencies of 80–100%. This confirms that all cells in the sequenced colony were derived from one mutated progenitor cell. Rare cases in which SNVs in haploid clones had a frequency between 40 and 80% were assembly errors (see Materials and Methods). In diploid clones, most of the mutations are truly heterozygous (i.e., SNV frequencies between 40 and 80%). Rarely, two or more SNVs are found in the same gene. They could be clustered mutations in one copy of the gene or changes in both copies, i.e. heteroallelic. Such cases in our reporter gene lead to a detectable phenotype due to the inactivation of both copies of the reporter and, therefore, were true heteroalleles. We cannot predict from the sequencing data whether a given mutation will be recessive or dominant. However, most of the heterozygous mutations are expected to be recessive, because gain of function is a rather specific event. In addition, not all SNVs lead to phenotypic changes (also see section “Prediction of effects of multiple SNVs on viability” below). Therefore, it is likely that the functions of the majority of the genes with SNVs are not disrupted in diploid mutants, even in cases where multiple SNVs are present. In the case of the CAN1 reporter gene, no dominant mutations have ever been reported, to our knowledge. This is expected because the resistance phenotype is due to the loss of function of Can1p, a one-subunit arginine permease (www.yeastgenome.org). Because we selected for the loss of permease function in diploids, both copies of the CAN1 gene must be damaged (Fig. 1). The predominant mechanism of the appearance of CanR mutants was independent mutations in the two homologs, which yield heteroallelic mutations where both alleles are non-functional: nine clones out of 10 possessed heteroallelic mutations in the CAN1 gene. One clone had a homozygous mutation and is discussed below.
Some mutations in the genomes were found with a SNV frequency of more than 80% and were classified as homozygous. The majority of these rare homozygous mutations in diploid clones apparently result from spontaneous recombination events (see Table 2). This includes the homozygous mutation in the CAN1 gene of clone LAN211-4, which belongs to the group of 4 homozygous mutations localized on the distal end of the left arm of chromosome V and, therefore, is a result of a mutation-segregation mechanism via recombination (Table 2, Fig. 1). The mutational load is strikingly different in the haploid and diploid clones (P <0.005, see Materials and Methods). Four haploid clones contain 54 to 356 mutations, whereas diploids had from 1020 to 1747 SNVs per genome (Table 2 and Fig. 2D). The average number of SNVs per 100 Kb is 1.3 for haploids and 6.05 for diploids (Table 2). All mutations are A-T to G-C and G-C to A-T transitions, in agreement with the mechanism of HAP action during replication (Table 2 and Table S4). In most sequenced genomes, mutations in the G-C pairs were more abundant than mutations in the A-T pairs (see right column in Table S4 and Fig. 2D), which is consistent with earlier data with specific reporters , . The bias toward mutations in G-C pairs suggests that most of the effects of dHAPTP are attributable to its misincorporation opposite C in the first replication cycle (Fig. 2). However, the variability of the ratio of mutations in the G-C pair to mutations in A-T pairs in individual genomes was high, from 0.5 in LAN211-1 to 5.3 in LAN211-7. In particular, we observed a strong bias toward mutations in A-T pairs in one diploid HAP-induced mutant clone (LAN211-1). The reason for these differences is unknown and may reflect cell-to-cell variability in HAP metabolism and/or DNA replication (see Discussion). This highlights the value of whole-genome resequencing studies, which provide a snapshot of the mutagenic process in individual cells.
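The frequency thresholds used to classify SNVs, and the normalization of mutation counts to SNVs per 100 Kb, can be sketched in Python as follows. This is an illustration only: the function names, the labels returned, the handling of frequencies below 40% and of the exact 80% boundary, and the use of ploidy times the LAN reference length as the denominator are all assumptions, not the pipeline described in Materials and Methods.

```python
GENOME_BP = 12_077_153  # length of the LAN-specific reference genome quoted in the text


def classify_snv(freq: float, ploidy: int) -> str:
    """Label an SNV by the fraction of reads supporting it (0.0 to 1.0)."""
    if not 0.0 <= freq <= 1.0:
        raise ValueError("freq must be between 0 and 1")
    if freq > 0.80:
        # Haploid clones: mutation present in all cells; diploids: both homologs changed.
        return "fixed" if ploidy == 1 else "homozygous"
    if freq >= 0.40:
        # In haploids the text attributes such calls to assembly errors.
        return "suspect" if ploidy == 1 else "heterozygous"
    return "unclassified"  # assumption: the text does not discuss lower frequencies


def snvs_per_100kb(n_snvs: int, ploidy: int) -> float:
    """Mutation density, assuming the denominator is ploidy x reference length."""
    return n_snvs / (GENOME_BP * ploidy) * 100_000


# The extremes of the diploid range quoted above (1020 and 1747 SNVs per genome).
for n in (1020, 1747):
    print(n, round(snvs_per_100kb(n, ploidy=2), 2))
```

Under this assumed denominator, the diploid range of 1020 to 1747 SNVs corresponds to about 4.2 to 7.2 SNVs per 100 Kb, which brackets the reported diploid average of 6.05.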
Analysis of the sequence context of these mutations did not reveal any strong biases toward any particular sequence contexts for HAP-induced SNVs (Fig. 3). However, we observed a slight preference for A/T-rich sequences in our genome-wide data for both G-C to A-T and A-T to G-C transitions. Mutational spectra obtained using reporter genes show different results depending on the substitution type and reporter used (Fig. 3, first column of consensus sequences; see Discussion).
The spectra of mutations induced by HAP in genomes are from this study. Data for the URA3 reporter is from this work and  and for the LYS2 reporter from . PmCDA1-induced mutation spectra in reporter genes and in lamprey VLRs are a combination of data from this work and .
Genomes of PmCDA1-induced haploid and diploid mutants
We sequenced four haploid CanR, seven diploid CanR and two diploid FOAR (mutations in the URA3 locus confer resistance to 5-FOA) mutant clones induced by PmCDA1. Similar to the results obtained with HAP, all mutations in haploid PmCDA1-induced mutant strains have an SNV frequency >80%, whereas the majority of mutations in diploid clones are heterozygous (Table 3). It is important to note that the average number of mutations in haploids was very close to what was found in yeast ung1 haploids after induction of hyper-active AID deaminase or APOBEC3B. PmCDA1 induces slightly more homozygous mutations in diploids than HAP. As opposed to the results with HAP-induced diploid mutants (see Table 2), homozygous SNVs in PmCDA1-induced diploid mutants are mostly scattered throughout the genome and, therefore, are not due to recombination events. Even when homozygous mutations were found very close to each other (such as in the most hypermutable region on chromosome X, see Table S6 for details), they were always accompanied by heterozygous SNVs in close proximity, and sometimes the heterozygous SNVs were found in between the homozygous ones. These data indicate that homozygous mutations in the genomes of PmCDA1-induced diploid mutants are unlikely to be due to recombination or gene conversion. It is plausible that regions of the genome that are prone to PmCDA1-dependent deamination can accumulate multiple independent mutations, sometimes leading to homozygous SNVs. In the CAN1 reporter, heteroallelic mutations are present in six diploid CanR mutant clones, while only one mutant clone is homozygous. Both FOAR diploid clones possess heteroallelic SNVs in the URA3 reporter gene. Diploids accumulate more PmCDA1-induced SNVs than haploids (4.38 vs. 0.74 SNVs/100 Kb; 5.9-fold increase; p = 0.005); however, the variation in the number of mutations is higher in PmCDA1-induced diploids than in HAP-induced diploid clones (Table 3 and Fig. 2D, 2E).
All SNVs are C-G to T-A transitions, as expected from cytosine deamination (Table S5, Table 3, and Fig. 2E). Interestingly, a small fraction of mutations (about 0.6%) are tandem, i.e. two consecutive cytosines or guanines are mutated (CC→TT and GG→AA tandem transitions, see Table S5). We found one triplet CCC→TTT mutation in clone LAN210-FOA-L1. In addition, there are strong regional hotspots in the genome-wide distribution of PmCDA1-induced mutations that are not present in HAP-induced mutants. The observed local regions that are saturated with mutations cannot be associated with recombinational hotspots and long regions of ssDNA formed during resection, because PmCDA1 does not induce recombination in ung1 yeast. The high number of hotspots per genome cannot be explained in our system by spontaneous DSBs in yeast cells, as was recently proposed (also see Discussion). The hotspots of deaminase-induced mutations are described in detail in our recent paper, and the underlying mechanisms are currently under investigation.
Types of HAP-induced mutations near origins of replication
The mutation rate can be affected by replication timing. The mutagenic mechanism of HAP (Fig. 2) allows for the discrimination of errors on the lagging versus the leading strand during DNA replication. Previous studies examining site-specific reversions reported a preference for HAP-induced errors on the leading strand. Our genome-wide analyses permitted us to reinvestigate this phenomenon independently of selection for specific mutations. These new, genome-wide analyses of the locations of C to T versus G to A mutations found that their distribution between the leading and lagging strands is random. In order to detect potential bias close to the origins of replication, we analyzed mutations within +/−2000 nucleotides of each known origin of replication. We extracted all cases of neighboring mutations where two or more mutations are found in the vicinity of the same origin. We assumed that if there is a strand-specific asymmetry of mutations near the origins of replication, this should be reflected by the distribution of the types of neighboring mutations. The changes on opposite sides of the origin of replication should be complementary, because the leading and lagging strands are swapped. For example, if two G-C→A-T mutations are located to the right of an origin, they both should be of the same type, either G→A or C→T, while mutations to the left of this origin should be reciprocal (i.e., C→T and G→A). Analysis of 489 pairs of such neighbor mutations revealed a marginally significant deviation from the random expectation of ½: 270 pairs of mutations are consistent with the model of strand-specific asymmetry of mutations, whereas 219 are inconsistent with this model (P sign test = 0.024). This result is in agreement with the model that most errors induced by HAP occur with equal probability on the lagging or leading DNA strand, while in some regions/sites the bias could be substantial. Earlier work with site-specific reversions may have described only a minor and specific pathway of HAP mutagenesis at such specific sites. We recently reached the same conclusion for HAP-induced forward mutations in the URA3 reporter gene.
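The sign test on the 270 versus 219 split can be reproduced with a short Python sketch. It assumes an exact two-sided binomial test against a probability of ½; the function name is hypothetical, and the exact procedure used for the reported P value is not specified in this section.

```python
from math import comb


def sign_test_two_sided(successes: int, n: int) -> float:
    """Exact two-sided binomial test of `successes` out of `n` against p = 0.5."""
    k = max(successes, n - successes)
    # Probability of a split at least this extreme in one direction.
    upper_tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    # The null distribution is symmetric, so the two-sided value doubles one tail.
    return min(1.0, 2 * upper_tail)


# 270 pairs consistent with strand asymmetry out of 489 analyzed pairs.
p_value = sign_test_two_sided(270, 489)
print(f"P = {p_value:.3f}")
```

The result falls between 0.02 and 0.03, in line with the marginal significance reported above.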
Genomes of random unselected HAP- and PmCDA1-treated clones (called “non-mutants” in the text)
Resequencing of the genomes of haploid and diploid HAP- and PmCDA1-induced mutants indicates that there is significant variability in mutation levels in yeast cell populations. Since diploid mutant clones were selected for concomitant mutations in the two copies of the CAN1 gene, we then investigated the mutational load in cells treated with either mutagen but not selected for canavanine or 5-FOA resistance. We picked arbitrary diploid clones from the same YPDU plates that were used to treat strains with HAP before replica-plating to the canavanine-containing media, and from synthetic complete plates that were used to estimate the viability in the case of PmCDA1 (see Materials and Methods for details). We sequenced the genomes of eight HAP-treated and four PmCDA1-treated non-mutants. Analysis of SNVs revealed that HAP (Table 2 and Table S4) and PmCDA1 (Table 3 and Table S5) induce the same types of mutations in non-mutant clones as in selected CanR and FOAR mutants, albeit at significantly lower frequencies (Fig. 4A and Fig. 4B, respectively). Most of the mutations in non-mutant clones are heterozygous. Interestingly, HAP-treated non-mutant diploid clones accumulate more SNVs than HAP-induced CanR haploid mutants, whereas PmCDA1-treated non-mutant clones contain fewer SNVs than CanR PmCDA1-induced haploid mutants (Fig. 4). These results provide additional evidence that levels of HAP- and PmCDA1-induced mutagenesis vary widely, even in the absence of selection (see Discussion).
A. Number of mutations for different types of HAP-treated genomes. B. Number of mutations for different types of PmCDA1-treated genomes. Bars are median values. Note the logarithmic scale. p values for Mann-Whitney test are shown for significant differences. C. Viability of haploid spores obtained from different diploid clones. Spores resulting from dissecting tetrads produced from wild-type strains (LAN211 and LAN210), mutants induced by HAP (LAN211-5) and PmCDA1 (LAN210-L4), and non-mutant clones treated with HAP (LAN211-NM3) and PmCDA1 (LAN210-NM2) are shown. Each vertical row of four colonies represents the progeny of four haploid spores produced in a single meiosis.
Viability of haploid progeny of diploid clones with known genome sequence
Recessive heterozygous mutations in diploid genomes have no effect on survival but can cause lethality in haploids. We performed tetrad analysis to estimate the viability of the haploid progeny of wild-type diploid strains, as well as progeny from HAP- and PmCDA1-treated mutant and unselected mutagenized clones. Most of the haploid spores obtained from wild-type strains (LAN211 and LAN210) are viable (Table 4 and Fig. 4C, top row). On the other hand, most of the spores from HAP-induced mutants are inviable (see example in Fig. 4C, second row). A few viable spores were detected for only two mutants tested (LAN211-5 and LAN211-6, see Table 4). Similarly, the majority of spores obtained from PmCDA1-induced mutants do not grow (see e.g. Fig. 4C, second row; see also Table 4). HAP-treated, non-mutant clones show variable viability. All LAN211-NM1 progeny are inviable, whereas viability is very high in LAN211-NM2 and LAN211-NM4 progeny. LAN211-NM3 progeny display an intermediate level of viability (44.4%) and considerable heterogeneity among viable spores. Some of the spores were of normal size, while others were small (Fig. 4C, bottom row; Table 4). The viability of the haploid progeny of PmCDA1-treated non-mutant clones is similar to that of the wild-type strains.
Prediction of effects of multiple SNVs on viability
About 75% of HAP-induced mutations were found in open-reading frames (ORFs) of protein-coding genes (Fig. 5A), as expected, given that ORFs encompass about 73% of our reference genomes. Among these mutations, two-thirds (comprising about 50% of all SNVs) are non-synonymous, whereas about one-third (∼25% of all SNVs) are synonymous. SNVs resulting in protein truncations range from 2% to 3% in different genome types (Fig. 5). Interestingly, we found eight mutations predicted to result in the extension of an encoded protein sequence (Table S6). Unexpectedly, we found no difference in the distribution of the types of substitutions between all types of clones - haploid mutants, diploid mutants and diploid non-mutants (Fig. 5).
Results for haploid and diploid mutant and diploid non-mutant clones are shown for HAP (A) and for haploid and diploid mutants for PmCDA1 (B). Numbers are mean values of percentages with 95% confidence limits in parentheses. PmCDA1-treated, non-mutant clones are excluded from the analysis due to low levels of mutations in their genomes.
The same analysis was performed for SNVs in PmCDA1-induced mutants (Fig. 5B). Here, many more SNVs are present in regions outside of CDS, as compared to the HAP results. Sixty-five and 56 percent of SNVs were found in non-protein-coding regions in haploid and diploid mutant clones, respectively. These values are much greater than expected given that non-protein-coding regions comprise only about 25% of the yeast genome. Also, the fraction of non-synonymous SNVs is much lower for PmCDA1-induced clones than for HAP-induced clones (21% and 26% vs. 45–48%). The fraction of synonymous SNVs for PmCDA1-induced clones ranges from 11% to 16%. The fractions of truncation mutations were similar for HAP and PmCDA1 (3% in PmCDA1 genomes and 2–3% for HAP).
We estimated from our data that 0.3 to 1.4% of all HAP-induced base substitutions cause lethal mutations in haploid cells. Our logic is as follows. Considering that about 18% of yeast genes are essential, and given that about half of the SNVs in HAP-treated genomes are either non-synonymous or lead to protein truncation (Fig. 5), we estimate that up to 9% (0.18 × 0.5) of all SNVs can potentially be lethal in haploid progeny. This translates into 43, 4, 40 and 15 such potentially lethal SNVs in the genomes of LAN211-NM1, LAN211-NM2, LAN211-NM3 and LAN211-NM4, respectively. To estimate how many of these potentially lethal SNVs are actually lethal, we performed the following calculations. Roughly half (55.6%) of the spores obtained from LAN211-NM3 are inviable, indicating the presence of a single latent lethal heterozygous mutation in this clone. That means that about 2.5% (one mutation out of 40 potentially lethal SNVs) of non-synonymous SNVs in ORFs of essential genes lead to lethality. Strain LAN211-NM1 has a similar number of SNVs but none of its spores are viable (28 tetrads with 112 spores analyzed, all spores inviable; see Table 4). Therefore, spore viability in this strain is less than ∼1% (1/112), which translates into at least six or seven latent lethal heterozygous SNVs in this clone, assuming that the mutations are not linked (viability of spores = (1/2)^n, where n = number of heterozygous mutations lethal in homozygous state; for 1% viability n≈6.5). At least ∼15% (6.5/43) of the non-synonymous SNVs in essential genes in this clone are lethal. Taken together, our data show that 3 to 15% of non-synonymous SNVs (or 0.3 to 1.4% of all SNVs) in our HAP-induced mutant clones are lethal in the homozygous state.
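The arithmetic behind this estimate can be retraced with a short sketch. It assumes, as the text does, that latent lethal mutations are unlinked, so that spore viability equals (1/2)^n. The function names are illustrative and are not part of the original analysis.

```python
import math

def latent_lethals(spore_viability):
    """Number n of unlinked heterozygous lethals implied by viability = (1/2)**n."""
    return math.log2(1.0 / spore_viability)

def lethal_fraction(n_lethal, n_potentially_lethal):
    """Share of potentially lethal SNVs that behave as lethal."""
    return n_lethal / n_potentially_lethal

# About 9% of all SNVs are potentially lethal (18% essential genes x ~50%
# non-synonymous or truncating SNVs), as stated in the text.
POTENTIALLY_LETHAL_SHARE = 0.18 * 0.5

# LAN211-NM3: about half of the spores are inviable -> one latent lethal
# among 40 potentially lethal SNVs.
nm3_fraction = lethal_fraction(latent_lethals(0.5), 40)

# LAN211-NM1: the text uses n of about 6.5 (viability of about 1%) among 43
# potentially lethal SNVs.
nm1_fraction = lethal_fraction(6.5, 43)

# Lower bound on n if viability is below 1/112 (no viable spore in 112).
nm1_bound = latent_lethals(1 / 112)

print(f"NM3: {nm3_fraction:.3f} of potentially lethal SNVs are lethal")
print(f"NM1: {nm1_fraction:.3f} of potentially lethal SNVs are lethal")
print(f"NM1: n > {nm1_bound:.2f} latent lethals")
print(f"Share of all SNVs: {POTENTIALLY_LETHAL_SHARE * nm3_fraction:.4f} "
      f"to {POTENTIALLY_LETHAL_SHARE * nm1_fraction:.4f}")
```

The printed range differs slightly from the rounded values quoted in the text because the text rounds 2.5% up to 3% before scaling by 9%.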
Fraction of HAP or PmCDA1 hypermutable cells
Earlier studies using next-generation sequencing documented rare spontaneous mutations in yeast haploid and diploid strains. Here, we extend these findings by comparing strains with different ploidy and by applying two different mutagens. We found intrinsic differences in the ability of cells from the same population to mutate after treatment with two different mutagens. Canavanine-resistant mutants induced in diploid yeast by two types of mutagens accumulate many more SNVs than haploid mutants (Figs. 2D, 2E, 3A, 3B, Tables 2 and 3). This is in agreement with the high mutation frequency observed in diploids (Table 1). The canavanine-resistance phenotype (CanR) in diploids is a result of two genetic events needed to inactivate both copies of the CAN1 gene (Fig. 1). Since neither HAP nor PmCDA1 (in ung1 strains) induces recombination in our system, both alleles of CAN1 are inactivated by independent mutations (right branch in Fig. 1), except for rare cases of spontaneous mitotic recombination. Thus, by selection for can1 mutants in diploid cells, we essentially select the progeny of cells that experienced high levels of mutagenesis.
The effect of transient hypermutability is not specific for CanR selection. First, PmCDA1-induced FOAR diploid mutants possess the same high level of mutations as their CanR counterparts (Figs. 2E, 3B, Table 3). Second, transient hypermutability is observed with other reporters, e.g. the LYS2 forward mutagenesis reporter gene. We demonstrated previously that the selection for mutants in haploid strains with a duplication of the reporter gene results in a much smaller number of mutants compared to normal diploids (Fig. 6A). The levels of HAP mutagenesis are the same in triploid strains and in diploid strains with a duplicated reporter gene on one of the homologous chromosomes (Fig. 6B). Thus, in these model systems, high levels of mutagenesis require that cells be diploid or have higher ploidy.
HAP solution was spotted on a disk in the center of plates and colonies of mutants appear as a circle around the place of application. A. Mutant colonies induced with HAP in diploid (left) and a haploid strain with a duplicated LYS2 reporter (right). B. HAP-induced mutants in a triploid strain (left) and diploid strains with duplication of LYS2 reporter in one homologous chromosome (right).
Observed spikes of mutability in individual cells are also not specific to one particular mutagen. Progeny of such cells were observed in the case of both HAP and PmCDA1, underscoring the fact that different mutagens can induce hypermutagenesis. However, the types of mutations found were mutagen-specific, suggesting that the principal mechanism of mutations in the hypermutable fraction is the same as in all other cells. The genome resequencing and genetic results show that the distribution of the mutation load is highly uneven in cell populations. Some cells accumulate dramatically more mutations than others. In other words, the mutation frequency, as virtually any other variable, follows a certain distribution (Fig. 7). Cells that survive very high levels of mutagenesis constitute a hypermutable fraction of a population and impact the overall estimated mutation rate. For example, 1% of cells with a mutation rate three orders of magnitude higher than that in regular cells will elevate the detected rate for a given cell culture by about ten-fold. These cells survive in diploid clones and were found as the canavanine-, 5-FOA- or aminoadipic acid-resistant mutants that we selected. Haploid cells cannot tolerate such a high level of mutagenesis due to the inactivation of housekeeping genes. The nature and shape of the mutability distribution require additional investigation, with hundreds of genomes sequenced from mutagenized but randomly sampled (i.e. non-mutant) clones.
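The ten-fold figure in the example above follows from a simple weighted average, sketched below. The base rate is a hypothetical placeholder; only the 1% fraction and the 1000-fold increase come from the text.

```python
def population_mean_rate(base_rate, hyper_fraction, fold_increase):
    """Mean mutation rate of a two-class population (regular + hypermutable cells)."""
    regular_part = (1.0 - hyper_fraction) * base_rate
    hyper_part = hyper_fraction * base_rate * fold_increase
    return regular_part + hyper_part

BASE_RATE = 1e-7  # hypothetical per-gene rate; any positive value gives the same ratio

mean_rate = population_mean_rate(BASE_RATE, hyper_fraction=0.01, fold_increase=1000)
fold_elevation = mean_rate / BASE_RATE

print(f"Population-level rate is elevated {fold_elevation:.2f}-fold")
```

Under these assumptions the hypermutable 1% contributes roughly ten times more to the measured rate than the remaining 99% of cells combined.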
In the cartoon we used a normal distribution as an example. The right side of the distribution (highlighted by light green and red rectangles) contains cells with a very high induced mutation rate. The fraction of hypermutable cells explains the observed increase of mutation frequencies in diploids. The selection of CanR driver mutants in diploids results in the recovery of thousands of passenger mutations. These mutants originated from a transiently hypermutable fraction of cells that survive an extremely high frequency of mutations. Such cells die in haploids (red zone) but survive in diploids. Only the hypermutators from the light green zone survive in haploids. The size of this fraction can be substantial, because mutation avalanches, which are evidence for transient hypermutagenesis, were observed even in the genomes of unselected HAP-treated non-mutant clones.
Since the majority of prior studies on the molecular mechanisms of mutagenesis have been performed in haploid model systems, the hypermutable fraction of diploid cells described here has evaded detection in the earlier literature. To our knowledge, the only exception is the detection of transiently hypermutable populations of cells that arise during adaptive mutagenesis in bacteria. The existence of these hypermutable bacterial cells is restricted to the specific conditions of nutrient starvation. Importantly, hypermutable cells have never been directly detected in eukaryotic species, although genetic studies are consistent with their presence. Hypermutable cells can potentially be responsible for the accumulation of multiple mutations during carcinogenesis and evolution.
We further corroborated our model by analyzing the genomes of several non-mutant clones treated with the mutagen. These clones have far fewer SNVs than their CanR mutant diploid counterparts (Fig. 4). When PmCDA1 is used, the number of SNVs in non-mutant clones is very low (10, 14, 4 and 34 mutations in LAN210-NM1 to LAN210-NM4, respectively), indicating that only a small fraction of cells producing PmCDA1 experience extremely high levels of induced mutagenesis. Therefore, the distribution of cells with different mutation rates is narrower compared to HAP (compare Tables 2 and 3 and Fig. 4A and 3B). It appears that every mutagen causes a different distribution of levels of mutagenesis among cells. The shape of this distribution may be modified by the type of organism, environmental conditions and degree of variation of the mutagen processing physiology in the cells. As a result, the size and parameters of the fraction of hypermutable cells are different for different mutagens. The shape of the “default” distribution of levels of mutagenesis (that is characteristic of a certain cell population not treated with any mutagen) is modified by the application of the mutagen. Mutagens not only increase the integral mutability in the cell population, but they also change the overall shape of the distribution of mutation rates in individual cells, as evidenced by the comparison of mutation loads in non-mutant clones treated with HAP (intermediate mutation load) and PmCDA1 (very few mutations).
Several mechanisms could contribute to the uneven mutability of cells in a population. In the case of cells not treated with the mutagens, it could be cell-to-cell fluctuations in DNA mismatch repair efficiency in strains with defective DNA polymerase proofreading. In the case of mutagenized cells, the effective intracellular concentration of a mutagen may differ between cells. HAP-induced mutation rates can be influenced by differences in HAP uptake and subsequent metabolism (conversion to dHAPTP by salvage and de novo nucleotide synthesis pathways and hydrolysis of dHAPTP by the Ham1 protein). It is known that the deletion of the HAM1 gene increases the sensitivity of yeast to the mutagenic action of HAP by almost two orders of magnitude. In the case of PmCDA1, the level of mutagenesis could be modulated by differences in deaminase gene expression, protein degradation and aggregation, availability of substrate ssDNA, and fluctuations in levels of proteins that protect the genome from deamination (such as RPA) or stimulate deamination. Transiently hypermutable cells are likely to exist in any cell population. Accumulating evidence suggests that gene expression profiles vary between cells of the same type in the same tissue (see a recent paper about immune cells and references therein). Such single-cell differences may affect the response of the cells to a particular mutagen or induce the expression of mutator proteins, such as APOBEC. However, the mechanisms underlying these effects are different for different organisms, cell types and the mutagen or mutator backgrounds used. The types of mutations found in the progeny of hypermutable cells and their distributions over the genome depend on the conditions, whether cells were mutagenized and, if so, what mutagen was used.
Even when the same mutagen was applied, the level of mutagenesis and its specificity both vary between different cells. The ratio of mutations in C-G pairs to mutations in A-T pairs varies widely between different HAP-treated clones (Table S4). Moreover, one of the sequenced HAP-induced diploid mutants (LAN211-1) shows a non-typical bias toward A-T to G-C transitions, whereas in other sequenced clones and in published reports using reporter genes, G-C to A-T transitions are more frequent. It is hard to explain this extremely interesting phenomenon of clone-to-clone variability. One possibility is that it could be due to cell-to-cell differences in DNA replication. Eukaryotes replicate DNA with the aid of different polymerases. One can speculate that the main replicative DNA polymerases δ and ε differ in the rules of HAP incorporation and replication of HAP-containing DNA. In this scenario, if the partition of labor between pol δ and pol ε varies from cell to cell, then this could account for the deviation from the expected behavior during HAP-induced mutagenesis, where more G-C to A-T transitions are typically observed. The use of genome-wide sequencing enabled the detection of both transiently hypermutable diploid cells and cell-to-cell variability in the type of changes induced by the same mutagen in the same population of cells. Similar to new paradigms emerging from single-molecule techniques in biochemistry, our analysis revealed that cells undergoing mutagenesis are not identical and differ significantly from the averaged sample estimates.
Effects of mutations on viability
Heterozygous mutations in diploid mutants have no effect on fitness as long as they are recessive. To estimate the effects of these mutations on viability, we induced sporulation of diploid yeast clones and dissected the resulting tetrads of haploid spores. The severe decrease in the viability of spores from CanR mutants (Fig. 4C and Table 4) indicates that these diploids possess multiple lethal mutations in the heterozygous state. As expected from their low mutational load, the viability of spores derived from non-mutant PmCDA1-treated diploids is similar to the wild-type level. HAP-treated non-mutant clones show very interesting results after meiosis and tetrad dissection. Although all spores from clone LAN211-NM1 (474 heterozygous SNVs) are inviable, LAN211-NM2 (40 heterozygous SNVs) and LAN211-NM4 (161 heterozygous SNVs) display near-wild-type spore viability. Of the spores from LAN211-NM3 (449 heterozygous SNVs), 55.6% are inviable. Among the spores of the LAN211-NM3 clone, 38 formed colonies of normal size and 47 formed very small, barely visible colonies, which were not able to grow any further after being transferred into YPDAU broth and thus were classified as inviable. Most likely, the ability of haploid spores to grow reflects the segregation of several lethal and conditionally lethal mutations. The segregation pattern differed from one individual spore to another (see Fig. 4C). These results indicate that the upper threshold for the number of heterozygous SNVs per parental diploid genome after mutagenesis that haploid meiotic progeny will tolerate is somewhere around 460.
The effects of mutations and PmCDA1-induced genome instability
Analyses of the predicted effects of SNVs on genes in different types of HAP-treated clones did not reveal any significant differences in the ratio of synonymous to non-synonymous SNVs and to mutations outside the CDS. PmCDA1-treated clones show similar results, though variability is higher. On the other hand, the deaminase induced many more mutations in non-CDS regions than HAP. This result is unexpected because AID/APOBEC deaminases are known to act on ssDNA, especially during transcription. It is possible that PmCDA1 deaminates genomic regions corresponding to 5′- and/or 3′-UTRs of the genes. Another possibility is that deaminases may have access to the ssDNA formed during both transcription and replication in yeast, which results in mutations in transcribed and non-transcribed regions. Further studies are required to clarify the observed effect of preferential enzymatic deamination of non-CDS regions in yeast.
Tandem CC→TT and GG→AA mutations are present in all seven diploid (with one clone possessing a triplet CCC→TTT mutation) and one of the haploid PmCDA1-induced mutants. These tandem substitutions are indeed due to enzymatic deamination and not due to oxidative damage to the DNA, because CC→TT mutations have been found exclusively in the genomes of clones treated with PmCDA1. In addition, dense localized clusters of mutations are present in several loci. These clusters of mutations are highly similar to the clusters recently discovered in yeast under chronic exposure to a mutagen and in human cancers. It has been hypothesized that AID/APOBEC deaminases are involved in the formation of these clusters. The existence of tandem SNVs and mutation clusters induced by PmCDA1 is likely a result of the processive action of deaminase on certain regions of the yeast genome (i.e., where it binds to ssDNA and slides back and forth, catalyzing multiple deaminations). The processive action of deaminase in the genome may also help to explain the higher numbers of mutations in non-protein-coding regions (Fig. 5). Clones with a high frequency of mutations in ORFs that result from processive deaminase activity are likely to be counter-selected due to the dominant nature of the resulting mutant alleles. Our data provide a direct link between AID/APOBECs and mutational thunderstorms (kataegis); we concentrate on analysis of these clustered mutations induced by deaminase in our parallel paper. Two other groups have recently used a yeast system to study deaminase-induced genome-wide mutagenesis and have come to similar conclusions. Taylor and colleagues proposed (and demonstrated using an SceI-induced double-strand break (DSB)) that resection of DSBs induced by the repair of deaminated cytosine or by other means (independent of deaminase) leads to exposure of ssDNA, which is preferentially deaminated by APOBEC, causing clustered mutations.
The Gordenin group proposed a similar mechanism and recently reported clustered APOBEC3G-induced mutations in a reporter localized in the overhang resulting from uncapped telomeres. Recombination induced by deaminase is completely blocked by uracil-DNA-glycosylase disruption in our strain, but we still observe multiple genome-wide mutation clusters (this work). We conclude that the high level of mutagenesis in diploids allowed for the detection of clustered mutations induced independently of recombination. Considering the genome-wide distribution of mutations induced by PmCDA1, possible sources of ssDNA for deaminase could be intermediates of replication and transcription.
Genome-wide analysis of the sequence context of mutations
Whole-genome resequencing provides an unprecedented opportunity to analyze the genome-wide distribution of mutations and their sequence context. We compared the genome-wide mutational sequence context data that we obtained for HAP and PmCDA1 mutagenesis with prior results obtained using reporter genes. We found that HAP has a slight preference for A-T-rich sequences in the genome compared to results obtained using the URA3 gene as a reporter (Fig. 3, left column of consensus sequences) (published data and this work). An even stronger bias is evident in the spectrum of HAP-induced mutations in the LYS2 gene, where a major hotspot at position 3165 in the LYS2 ORF severely affects the results of sequence context analysis (Fig. 3, bottom consensus in the left column). PmCDA1 mutagenesis shows a strong preference for deamination of cytosines at ATC motifs in the yeast genome, which agrees with the PmCDA1-induced mutational spectra obtained from sea lamprey lymphocyte receptor gene variable regions, further corroborating evidence that PmCDA1 is responsible for VLR diversification. Our genome-wide mutation sequence context results are very similar to the spectra of PmCDA1-induced mutations in the yeast URA3 and CAN1 genes when they are used as reporter genes (this work). In contrast, the CTC motif (mutated base underlined) is favored by PmCDA1 when the E. coli rpoB gene is used as a reporter, which is primarily due to a strong hotspot at position 1592 in the rpoB ORF (Fig. 3, bottom consensus, right column). Taken together, we conclude that analyses of the sequence context preferences of mutagens using reporter genes should be interpreted carefully, especially when the number of detectable positions in the reporter is limited and strong hotspots are found for the reporter/mutagen combination under study.
Materials and Methods
All S. cerevisiae strains used in this study (see Table S1 for genotypes) are derived from 1B-D770. The mutant ura3–4 allele in this strain was reverted to wild type by transformation with wild-type URA3 DNA obtained by PCR, yielding the LAN201 strain. LAN211 is an auto-diploid of LAN201 obtained by HO endonuclease expression followed by selection for diploids. The haploid ung1-deficient strain LAN200 was described previously. Auto-diploidization of LAN200 resulted in the diploid ung1 strain LAN210.
Standard yeast media were used. For selection of mutants we used synthetic complete (SC) agar plates without arginine with 60 mg/L of L-canavanine or 0.1% of FOA. For induction of deaminase expression, minimal synthetic medium with addition of 1% raffinose and 2% galactose was used.
Mutagenesis in yeast
Mutation frequencies were determined by fluctuation analysis as described previously. For the HAP experiment, independent LAN201 or LAN211 clones were grown in rich YPD medium overnight. HAP was added to the medium, where applicable, to a final concentration of 50 µg/ml. After overnight incubation at 30°C, cultures were plated undiluted on synthetic complete medium with canavanine (SC+CAN) to select for can1 mutants, and with dilution on complete (SC) plates to estimate viability. The CAN1 gene encodes arginine permease, which transports the toxic arginine analog canavanine into cells. Inactivation of CAN1 renders cells resistant to canavanine.
For the PmCDA1 experiments, plasmid pESC-LEU-PmCDA1 was constructed as follows. Total RNA was extracted from the blood of sea lamprey (Petromyzon marinus) and reverse-transcribed with oligo(dT). PmCDA1 was amplified with primers NotICDA1N-F (5′-TTTGCGGCCGCACCATGACCGACGCTGAGTAC, location 118–135 in GenBank accession EF094822) and SpeICDA1C-R (5′-TTTACTAGTGCAACAGCAGGACTCTTAGTG, location 724–742 in EF094822) and cloned into the pESC-LEU vector (Stratagene). For yeast experiments, LAN200 or LAN210 strains were transformed with the pESC-LEU-PmCDA1 expression plasmid or with the vector only. Colony-purified transformants were inoculated in 5 ml of synthetic liquid medium without leucine containing 1% raffinose. After overnight incubation, galactose was added to cultures at a final concentration of 2%. Galactose activates the GAL1-10 promoter in the pESC-LEU vector, which induces the expression of deaminase. After one day of incubation, culture suspensions were plated undiluted on SC+CAN and with dilution on complete plates.
Isolation of clones for genome sequencing
LAN201 and LAN211 were streaked on YPDAU plates and grown overnight. The next day, they were replica-plated on fresh YPD plates, and a drop of HAP was added to sterile filter paper placed on the agar surface so that different yeast patches receive a similar HAP dose. The next day, streaks were replica-plated on SC+CAN plates to select for mutants. One CanR colony was picked from one streak, then colony-purified and frozen as a glycerol stock at −80°C. To obtain non-mutant HAP-treated clones, yeast from YPD plates with HAP (the same plates used to obtain CanR clones) were streaked on YPDAU plates without HAP and then colony-purified. All of the isolated HAP-treated, non-mutant clones were confirmed to be CanS.
In the PmCDA1 experiments, LAN200 and LAN210 were transformed with pESC-LEU-PmCDA1. Individual transformants were then inoculated in 5 ml of liquid synthetic medium containing glucose and without leucine, followed by incubation for one day at 30°C with shaking. Cells were then pelleted, washed once with sterile water, then resuspended in 12 ml of synthetic medium without leucine containing 2% galactose and 1% raffinose, followed by incubation for 3 days at 30°C with shaking. Aliquots of the resulting yeast suspensions were plated on synthetic complete medium containing canavanine to select for can1 mutants. Aliquots of diluted cultures were plated on synthetic complete (SC) plates to estimate viability. Individual CanR colonies (one per independent culture) were colony-purified and stored at −80°C. PmCDA1-treated non-mutant clones were picked arbitrarily from SC plates. These clones were confirmed to be CanS.
Purification of yeast genomic DNA for sequencing
We used a previously described method with slight modifications. Cells were collected from 30 ml of saturated culture (OD600≈10) grown in YPDAU medium, washed once with water, and resuspended in 3 ml of lysis buffer (0.1 M Tris-HCl pH 8.0, 50 mM EDTA, 1% SDS). Then 150 µl of 5 M NaCl and ∼1.2 ml of glass beads were added to the suspension. Cells were disrupted by vortexing (2 cycles, 2 min each) in a cold room and then the lysate was centrifuged (13,000 g, 10 min). DNA was purified from the supernatant using phenol-chloroform extraction followed by ethanol precipitation. The DNA pellet was dissolved in DNA-grade water and treated with RNase A (Qiagen, 10 µl of 10 mg/ml per sample, 1 h at 37°C). DNA was purified again by phenol-chloroform extraction followed by ethanol precipitation, and finally resuspended in DNA-grade water. The concentration and quality of DNA preparations were monitored by agarose gel electrophoresis and with a NanoDrop spectrophotometer (Thermo Scientific) and a Qubit fluorometer (Invitrogen).
Library construction and whole-genome resequencing
Isolated yeast genomic DNA was used to construct fragment libraries using the recommended kits for sequencing on the UNMC NGS Core Laboratory's HiSeq 2000 instrument. We multiplexed individual yeast libraries, each derived from an individual clone, in a single lane of an Illumina flow cell. Each of the yeast genomes was sequenced at 100× to 300× coverage (depending on the run) by sequencing 101 bp from each end of the individual DNA fragments in the library (101 bp paired-end sequencing), according to Illumina's recommendations. During the instrument run and after sequencing of the yeast libraries was completed, a variety of quality assurance (QA) measurements were made to ensure the integrity of the DNA sequence data. The DNA “bar codes” used for multiplexing were first used to partition the reads into their respective sample-specific bins, and then the bar codes were stripped from the reads to yield sample-specific DNA sequences of interest. Base-calling error correction was performed on each sample-specific set of de-multiplexed raw reads using Quake. Raw Illumina resequencing data for the LAN211 strain and for the various mutant and non-mutant clones were deposited in the NCBI Sequence Read Archive (www.ncbi.nlm.nih.gov/sra, accession numbers SRA057025 and SRP014741).
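As a rough consistency check on the stated coverage, the expected depth can be computed from the read count, read length and genome size. The sketch below assumes 10 to 20 million read pairs per clone (the range given in the assembly sections) and a haploid yeast genome of about 12.1 Mb; the genome size is an approximation introduced here, not a value from the text.

```python
def expected_coverage(read_pairs, read_length_bp, genome_size_bp):
    """Nominal sequencing depth for paired-end reads (two reads per pair)."""
    return 2 * read_pairs * read_length_bp / genome_size_bp

GENOME_SIZE_BP = 12_100_000  # assumption: approximate size of the S288C genome
READ_LENGTH_BP = 101         # from the text: 101 bp paired-end sequencing

low = expected_coverage(10_000_000, READ_LENGTH_BP, GENOME_SIZE_BP)
high = expected_coverage(20_000_000, READ_LENGTH_BP, GENOME_SIZE_BP)

print(f"Nominal coverage: {low:.0f}x to {high:.0f}x")
```

The nominal values are upper bounds on usable depth, since reads that fail quality filtering or do not map are not subtracted here.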
De novo assemblies of reference genomes of parent strains
About ten million paired-end reads generated by sequencing of the whole-genome library obtained from the LAN201 reference strain were used for de novo genome assembly using CLC Bio's Genomics Workbench software (CLC Bio, Aarhus, Denmark). This resulted in 458 contigs of various lengths (from 200 to 217,386 bp). These contigs were aligned to the genome of the standard yeast strain S288C using batch BLAST. After sorting of the contigs by chromosome, each set was scaffolded (ordered and oriented) against the corresponding chromosome using Geneious Pro software (Biomatters Ltd, Auckland, New Zealand). Consensus sequences were extracted from the scaffolds and used in the next step of reference genome assembly. Raw sequencing data (the same 10 million reads that were used for the de novo assembly) were then assembled to the extracted consensus sequences, and SNVs were detected using Geneious Pro. After manual identification of false positives and the correction of alignments, a new consensus was obtained. This “version 1” LAN201 draft genome assembly covers 92.74% of the standard S288C yeast genome downloaded from the Saccharomyces Genome Database (SGD, www.yeastgenome.org) in October 2011. This draft assembly has a GC content of 38.24% (compared with 38.31% for S288C; see Table S2). LAN211 is identical to LAN201 except for the mating type locus. To obtain references of ung1-deficient strains (LAN200 and LAN210), sequencing reads corresponding to LAN210 were assembled on the LAN211 draft in Geneious Pro. SNVs were called, manually checked, and then a consensus sequence was deduced. This resulted in a LAN210 draft genome assembly. The re-assembly of the LAN200 reads on the LAN210 reference followed by SNV calling confirmed that the two strains are isogenic. These draft genome assemblies were used to analyze the genomes of mutant clones. UNG1 and ung1 reference genomes are compared in Table S2.
Comparative assemblies of genomes of mutant and non-mutant clones treated with the mutagen on reference genomes and SNV calling
Ten to 20 million paired-end reads per clone were comparatively assembled on the LAN211 (for HAP) or LAN210 (for PmCDA1) reference genomes using Geneious Pro. SNVs were called using homozygous (SNV frequency ≥ 80%) and heterozygous (40% ≤ SNV frequency ≤ 80%) modes. The threshold SNV call frequencies (40% and 80%) were selected based on pilot experiments designed to optimize the detection of true SNVs and reduce the number of false positives. Regions of high and low coverage (more than two standard deviations from the mean) were excluded from the analysis. At this point, a fraction of non-expected substitution types was observed in the genomes. These included non-G-C→A-T and non-A-T→G-C mutations in HAP-treated genomes and non-G-C→A-T mutations in PmCDA1-treated genomes. The majority of these SNVs were found in regions where reads were clearly misaligned to the reference (i.e. having low mapping quality). The rest of the non-expected SNVs were found in otherwise good regions of the assembly. We PCR-amplified some of the representative genomic regions where expected and non-expected SNV types were detected, and then sequenced these amplicons using the Sanger method. All SNVs of the expected types were indeed present in the genomes, whereas all non-expected SNVs were found to be assembly errors. For example, we observed frequent putative A→C “transversions” in ACC or ACCCC sequence motifs, but these were not confirmed by Sanger sequencing. Based on these results, we removed all non-standard SNVs from the data set. Finally, detected SNVs were extracted from the alignments for further analyses (Table S6).
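The filtering rules described above can be expressed as a minimal sketch. This is not the Geneious Pro workflow used in the study; the `Variant` record, the function names and the handling of a frequency of exactly 80% (treated here as homozygous) are assumptions made for illustration.

```python
from dataclasses import dataclass

# Transition types expected for each mutagen, written on the reference strand.
EXPECTED_SUBSTITUTIONS = {
    "HAP": {("G", "A"), ("C", "T"), ("A", "G"), ("T", "C")},  # G-C->A-T and A-T->G-C
    "PmCDA1": {("G", "A"), ("C", "T")},                       # G-C->A-T only
}

@dataclass
class Variant:
    ref: str          # reference base
    alt: str          # variant base
    frequency: float  # fraction of reads supporting the variant (0..1)
    coverage: float   # read depth at the position

def zygosity(frequency, het_min=0.40, hom_min=0.80):
    """Classify a call by read frequency; returns None below the 40% threshold."""
    if frequency >= hom_min:
        return "homozygous"
    if frequency >= het_min:
        return "heterozygous"
    return None

def keep_variant(variant, mutagen, mean_coverage, sd_coverage):
    """Apply the coverage, frequency and substitution-type filters in turn."""
    if abs(variant.coverage - mean_coverage) > 2 * sd_coverage:
        return False  # region of unusually high or low coverage
    if zygosity(variant.frequency) is None:
        return False  # below the heterozygous calling threshold
    return (variant.ref, variant.alt) in EXPECTED_SUBSTITUTIONS[mutagen]

# Example: an A->C call at normal coverage is discarded for HAP.
print(keep_variant(Variant("A", "C", 0.5, 150), "HAP", 150, 30))
```

Keeping the three filters in one function makes it easy to see which rule removed a call when the function is stepped through for a single variant.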
Other bioinformatics techniques
The predicted effects of SNVs on proteins were analyzed in Geneious Pro. To extract the genomic sequence context of mutations, we used an ad hoc program, Rseq1. Consensus sequences, also called sequence logos (Fig. 3), were created using WebLogo 3 (http://weblogo.threeplusone.com/) with adjustment for the GC composition of the corresponding genomes and reporter genes.
The Mann-Whitney test was used to compare differences in mutation loads in different types of mutant and non-mutant clones (see Fig. 4). The results suggest that the difference between haploid and diploid strains is significant, reflecting differences in their ability to tolerate the high frequency of induced mutations.
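Context extraction of this kind can be sketched as follows. This is not the Rseq1 program, whose implementation is not described here; the function names, the 0-based coordinate convention and the choice to report every site on the strand where the mutated base is a pyrimidine are assumptions made for illustration.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def mutation_context(reference, position, flank=2):
    """Return the bases around a mutated site (0-based position).

    If the reference base is a purine, the window is reverse-complemented so
    that the mutated base is always reported as C or T. Returns None when the
    window would run past either end of the sequence.
    """
    start, end = position - flank, position + flank + 1
    if start < 0 or end > len(reference):
        return None
    window = reference[start:end].upper()
    if window[flank] in "AG":
        window = reverse_complement(window)
    return window

# A cytosine in an ATC motif, seen on either strand, yields the same context.
print(mutation_context("AATCGG", 3))
print(mutation_context("CCGATT", 2))
```

Windows collected this way can be written out as an alignment of equal-length sequences, which is the input format that logo-drawing tools generally expect.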
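For readers unfamiliar with the test, the sketch below computes the Mann-Whitney U statistic and an exact two-sided permutation p-value in plain Python. The comparison shown, SNV counts in HAP-treated versus PmCDA1-treated non-mutant diploid clones, uses the counts quoted in the text and is chosen only as an illustration; it is not necessarily one of the comparisons reported in Fig. 4, and the software used for the original tests is not specified.

```python
from itertools import combinations

def mann_whitney_u(x, y):
    """U statistic for x: pairs with x > y count 1, ties count 0.5."""
    return sum((xi > yj) + 0.5 * (xi == yj) for xi in x for yj in y)

def exact_two_sided_p(x, y):
    """Exact permutation p-value; practical only for small samples."""
    pooled = list(x) + list(y)
    expected_u = len(x) * len(y) / 2
    observed = abs(mann_whitney_u(x, y) - expected_u)
    extreme = total = 0
    for chosen in combinations(range(len(pooled)), len(x)):
        chosen = set(chosen)
        group_x = [pooled[i] for i in chosen]
        group_y = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        if abs(mann_whitney_u(group_x, group_y) - expected_u) >= observed:
            extreme += 1
    return extreme / total

# SNV counts quoted in the text for non-mutant diploid clones.
hap_non_mutants = [474, 40, 449, 161]    # LAN211-NM1 to LAN211-NM4
pmcda1_non_mutants = [10, 14, 4, 34]     # LAN210-NM1 to LAN210-NM4

u_stat = mann_whitney_u(hap_non_mutants, pmcda1_non_mutants)
p_value = exact_two_sided_p(hap_non_mutants, pmcda1_non_mutants)
print(f"U = {u_stat}, exact two-sided p = {p_value:.4f}")
```

With four clones per group, the smallest attainable two-sided p-value is 2/70, which is reached here because every HAP-treated count exceeds every PmCDA1-treated count.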
List of yeast strains used in this work.
Nucleotide sequence differences between wild-type and ung1 reference strains.
Genome assembly parameters of reference strain LAN211 compared to strain S288C from the Saccharomyces Genome Database (www.yeastgenome.org). a CDS – coding sequence. b ORF – open reading frame.
Distributions of substitution types (as percentages) in HAP-mutagenized genomes.
Distributions of substitution types (as percentages) in PmCDA1-treated genomes. a One triple GGG→AAA mutation found.
SNVs detected in sequenced genomes. The table consists of 6 sheets, representing the following data: 1. Mutations detected in the genomes of HAP-induced haploid CanR mutants. 2. Mutations detected in the genomes of HAP-induced diploid CanR mutants. 3. Mutations detected in the genomes of HAP-treated diploid non-mutants. 4. Mutations detected in the genomes of PmCDA1-induced haploid CanR mutants. 5. Mutations detected in the genomes of PmCDA1-induced diploid CanR and FOAR mutants. 6. Mutations detected in the genomes of PmCDA1-treated diploid non-mutants. Description of column names: Document Name - Name of sequenced clone. Sequence Name - Name of chromosome file. Track Name - Homozygous (80%) or heterozygous (40%) mutation. Minimum - Coordinate of mutation start in reference genome. Maximum - Coordinate of mutation end in reference genome (same as “Minimum” except for tandem mutations).
We are grateful to Sarah Schmoker for help with the mutagenesis experiments, to Anna Zhuk (St. Petersburg State University) for help with data analysis, and to Drs. Polina Shcherbakova, Tran Hiep and Sergei Mirkin for critical reading of the manuscript.
Conceived and designed the experiments: AGL YIP. Performed the experiments: AGL EIS ISRW VNN AD MH. Analyzed the data: AGL RJB IBR YIP. Contributed reagents/materials/analysis tools: JDE. Wrote the paper: AGL RJB IBR YIP.
- 1. Hanawalt PC (2007) Paradigms for the three Rs: DNA replication, recombination, and repair. Mol Cell 28: 702–707.
- 2. Lynch M (2010) Evolution of the mutation rate. Trends Genet 26: 345–352.
- 3. Kirschner M, Gerhart J (1998) Evolvability. Proc Natl Acad Sci U S A 95: 8420–8427.
- 4. Herr AJ, Ogawa M, Lawrence NA, Williams LN, Eggington JM, et al. (2011) Mutator suppression and escape from replication error-induced extinction in yeast. PLoS Genet 7: e1002282.
- 5. Drake JW, Charlesworth B, Charlesworth D, Crow JF (1998) Rates of spontaneous mutation. Genetics 148: 1667–1686.
- 6. Daee DL, Mertz TM, Shcherbakova PV (2010) A cancer-associated DNA polymerase delta variant modeled in yeast causes a catastrophic increase in genomic instability. Proc Natl Acad Sci U S A 107: 157–162.
- 7. Drake JW, Bebenek A, Kissling GE, Peddada S (2005) Clusters of mutations from transient hypermutability. Proc Natl Acad Sci U S A 102: 12849–12854.
- 8. Loeb LA (2011) Human cancers express mutator phenotypes: origin, consequences and targeting. Nat Rev Cancer 11: 450–457.
- 9. Nik-Zainal S, Alexandrov LB, Wedge DC, Van Loo P, Greenman CD, et al. (2012) Mutational Processes Molding the Genomes of 21 Breast Cancers. Cell 149: 979–993.
- 10. Bielas JH, Loeb KR, Rubin BP, True LD, Loeb LA (2006) Human cancers express a mutator phenotype. Proc Natl Acad Sci U S A 103: 18238–18242.
- 11. Loeb LA, Springgate CF, Battula N (1974) Errors in DNA replication as a basis of malignant changes. Cancer Res 34: 2311–2321.
- 12. Loeb LA (2001) A mutator phenotype in cancer. Cancer Res 61: 3230–3239.
- 13. Richards B, Zhang H, Phear G, Meuth M (1997) Conditional mutator phenotypes in hMSH2-deficient tumor cell lines. Science 277: 1523–1526.
- 14. Loeb LA (1997) Transient expression of a mutator phenotype in cancer cells. Science 277: 1449–1450.
- 15. Matsumoto Y, Marusawa H, Kinoshita K, Endo Y, Kou T, et al. (2007) Helicobacter pylori infection triggers aberrant expression of activation-induced cytidine deaminase in gastric epithelium. Nat Med 13: 470–476.
- 16. Burns MB, Lackey L, Carpenter MA, Rathore A, Land AM, et al. (2013) APOBEC3B is an enzymatic source of mutation in breast cancer. Nature 494: 366–370.
- 17. Knudson AG Jr (1971) Mutation and cancer: statistical study of retinoblastoma. Proc Natl Acad Sci U S A 68: 820–823.
- 18. Berger AH, Knudson AG, Pandolfi PP (2011) A continuum model for tumour suppression. Nature 476: 163–169.
- 19. Pavlov YI, Shcherbakova PV (2010) DNA polymerases at the eukaryotic fork - 20 years later. Mutat Res 685: 45–53.
- 20. Gordenin DA, Inge-Vechtomov SG (1981) [Mechanism of mutant induction in the ade2 gene of diploid Saccharomyces cerevisiae yeasts by ultraviolet rays]. Genetika 17: 822–831.
- 21. Pavlov Iu I, Noskov VN, Chernov Iu O, Gordenin DA (1988) [Mutability of LYS2 gene in diploid Saccharomyces yeasts. II. Frequency of mutants induced by 6-N-hydroxylaminopurine and propiolactone]. Genetika 24: 1752–1760.
- 22. Tran HT, Degtyareva NP, Gordenin DA, Resnick MA (1999) Genetic factors affecting the impact of DNA polymerase delta proofreading activity on mutation avoidance in yeast. Genetics 152: 47–59.
- 23. Kumar D, Viberg J, Nilsson AK, Chabes A (2010) Highly mutagenic and severely imbalanced dNTP pools can escape detection by the S-phase checkpoint. Nucleic Acids Res 38: 3975–3983.
- 24. Kozmin SG, Schaaper RM, Shcherbakova PV, Kulikov VN, Noskov VN, et al. (1998) Multiple antimutagenesis mechanisms affect mutagenic activity and specificity of the base analog 6-N-hydroxylaminopurine in bacteria and yeast. Mutat Res 402: 41–50.
- 25. Menezes MR, Waisertreiger IS, Lopez-Bertoni H, Luo X, Pavlov YI (2012) Pivotal role of inosine triphosphate pyrophosphatase in maintaining genome stability and the prevention of apoptosis in human cells. PLoS One 7: e32313.
- 26. Shcherbakova PV, Pavlov YI (1993) Mutagenic specificity of the base analog 6-N-hydroxylaminopurine in the URA3 gene of the yeast Saccharomyces cerevisiae. Mutagenesis 8: 417–421.
- 27. Stepchenkova EI, Koz'min SG, Alenin VV, Pavlov Iu I (2009) [Genetic control of metabolism of mutagenic purine base analogs 6-hydroxylaminopurine and 2-amino-6-hydroxylaminopurine in yeast Saccharomyces cerevisiae]. Genetika 45: 471–477.
- 28. Shcherbakova PV, Noskov VN, Pshenichnov MR, Pavlov YI (1996) Base analog 6-N-hydroxylaminopurine mutagenesis in the yeast Saccharomyces cerevisiae is controlled by replicative DNA polymerases. Mutat Res 369: 33–44.
- 29. Pavlov YI, Suslov VV, Shcherbakova PV, Kunkel TA, Ono A, et al. (1996) Base analog N6-hydroxylaminopurine mutagenesis in Escherichia coli: genetic control and molecular specificity. Mutat Res 357: 1–15.
- 30. Burgis NE, Cunningham RP (2007) Substrate specificity of RdgB protein, a deoxyribonucleoside triphosphate pyrophosphohydrolase. J Biol Chem 282: 3531–3538.
- 31. Rogozin IB, Iyer LM, Liang L, Glazko GV, Liston VG, et al. (2007) Evolution and diversification of lamprey antigen receptors: evidence for involvement of an AID-APOBEC family cytosine deaminase. Nat Immunol 8: 647–656.
- 32. Samaranayake M, Bujnicki JM, Carpenter M, Bhagwat AS (2006) Evaluation of molecular models for the affinity maturation of antibodies: roles of cytosine deamination by AID and DNA repair. Chem Rev 106: 700–719.
- 33. Conticello SG, Langlois MA, Yang Z, Neuberger MS (2007) DNA deamination in immunity: AID in the context of its APOBEC relatives. Adv Immunol 94: 37–73.
- 34. Lada AG, Iyer LM, Rogozin IB, Aravind L, Pavlov Iu I (2007) [Vertebrate immunity: mutator proteins and their evolution]. Genetika 43: 1311–1327.
- 35. Teperek-Tkacz M, Pasque V, Gentsch G, Ferguson-Smith AC (2011) Epigenetic reprogramming: is deamination key to active DNA demethylation? Reproduction 142: 621–632.
- 36. Maizels N (2005) Immunoglobulin gene diversification. Annu Rev Genet 39: 23–46.
- 37. Lada AG, Krick CF, Kozmin SG, Mayorov VI, Karpova TS, et al. (2011) Mutator effects and mutation signatures of editing deaminases produced in bacteria and yeast. Biochemistry (Mosc) 76: 131–146.
- 38. Pavlov YI, Lange EK, Chromov-Borisov NN (1979) Studies on genetic activity of 6-hydroxylaminopurine and its riboside in strains of Salmonella typhimurium and Saccharomyces cerevisiae. Research of Biological Effects of Antropogenic Factors on Water Reservoirs. Irkutsk. pp. 11–30.
- 39. Pavlov YI (1986) Mutants Highly Sensitive to the Mutagenic Action of 6-N-hydroxylaminopurine. Soviet Genetics 22: 2235–2243.
- 40. Modrich P (2006) Mechanisms in eukaryotic mismatch repair. J Biol Chem 281: 30305–30309.
- 41. Negishi K, Loakes D, Schaaper RM (2002) Saturation of DNA mismatch repair and error catastrophe by a base analogue in Escherichia coli. Genetics 161: 1363–1371.
- 42. Roberts SA, Sterling J, Thompson C, Harris S, Mav D, et al. (2012) Clustered mutations in yeast and in human cancers can arise from damaged long single-strand DNA regions. Mol Cell 46: 424–435.
- 43. Kulikov VV, Derkatch IL, Noskov VN, Tarunina OV, Chernoff YO, et al. (2001) Mutagenic specificity of the base analog 6-N-hydroxylaminopurine in the LYS2 gene of yeast Saccharomyces cerevisiae. Mutat Res 473: 151–161.
- 44. Taylor BJ, Nik-Zainal S, Wu YL, Stebbings LA, Raine K, et al. (2013) DNA deaminases induce break-associated mutation showers with implication of APOBEC3B and 3A in breast cancer kataegis. Elife 2: e00534.
- 45. Lada AG, Dhar A, Boissy RJ, Hirano M, Rubel AA, et al. (2012) AID/APOBEC cytosine deaminase induces genome-wide kataegis. Biol Direct 7: 47.
- 46. Hicks WM, Kim M, Haber JE (2010) Increased mutagenesis and unique mutation signature associated with mitotic gene conversion. Science 329: 82–85.
- 47. Poltoratsky V, Heacock M, Kissling GE, Prasad R, Wilson SH (2010) Mutagenesis dependent upon the combination of activation-induced deaminase expression and a double-strand break. Mol Immunol 48: 164–170.
- 48. Stamatoyannopoulos JA, Adzhubei I, Thurman RE, Kryukov GV, Mirkin SM, et al. (2009) Human mutation rate associated with DNA replication timing. Nat Genet 41: 393–395.
- 49. Shcherbakova PV, Pavlov YI (1996) 3′→5′ exonucleases of DNA polymerases ε and δ correct base analog induced DNA replication errors on opposite DNA strands in Saccharomyces cerevisiae. Genetics 142: 717–726.
- 50. Pavlov YI, Newlon CS, Kunkel TA (2002) Yeast origins establish a strand bias for replicational mutagenesis. Mol Cell 10: 207–213.
- 51. Waisertreiger IS, Liston VG, Menezes MR, Kim HM, Lobachev KS, et al. (2012) Modulation of mutagenesis in eukaryotes by DNA replication fork dynamics and quality of nucleotide pools. Environ Mol Mutagen 53: 699–724.
- 52. Winzeler EA, Shoemaker DD, Astromoff A, Liang H, Anderson K, et al. (1999) Functional characterization of the S. cerevisiae genome by gene deletion and parallel analysis. Science 285: 901–906.
- 53. Giaever G, Chu AM, Ni L, Connelly C, Riles L, et al. (2002) Functional profiling of the Saccharomyces cerevisiae genome. Nature 418: 387–391.
- 54. Nishant KT, Wei W, Mancera E, Argueso JL, Schlattl A, et al. (2010) The baker's yeast diploid genome is remarkably stable in vegetative growth and meiosis. PLoS Genet 6: e1001109.
- 55. Lynch M, Sung W, Morris K, Coffey N, Landry CR, et al. (2008) A genome-wide view of the spectrum of spontaneous mutations in yeast. Proc Natl Acad Sci U S A 105: 9272–9277.
- 56. Noskov V (1988) Studies of the mutagenic action of 6-N-hydroxylaminopurine and beta-propiolactone in diploid yeast Saccharomyces cerevisiae [Candidate of Biological Sciences]. Leningrad: Leningrad State University. 167 p.
- 57. Hall BG (1990) Spontaneous point mutations that occur more often when advantageous than when neutral. Genetics 126: 5–16.
- 58. Torkelson J, Harris RS, Lombardo MJ, Nagendran J, Thulin C, et al. (1997) Genome-wide hypermutation in a subpopulation of stationary-phase cells underlies recombination-dependent adaptive mutation. EMBO J 16: 3303–3311.
- 59. Rosche WA, Foster PL (1999) The role of transient hypermutators in adaptive mutation in Escherichia coli. Proc Natl Acad Sci U S A 96: 6862–6867.
- 60. Foster PL (2004) Adaptive mutation in Escherichia coli. J Bacteriol 186: 4846–4852.
- 61. Tran HT, Degtyareva NP, Gordenin DA, Resnick MA (1999) Genetic factors affecting the impact of DNA polymerase δ proofreading activity on mutation avoidance in yeast. Genetics 152: 47–59.
- 62. Lada AG, Waisertreiger IS, Grabow CE, Prakash A, Borgstahl GE, et al. (2011) Replication protein A (RPA) hampers the processive action of APOBEC3G cytosine deaminase on single-stranded DNA. PLoS One 6: e24848.
- 63. Basu U, Meng FL, Keim C, Grinstein V, Pefanis E, et al. (2011) The RNA exosome targets the AID cytidine deaminase to both strands of transcribed duplex DNA substrates. Cell 144: 353–363.
- 64. Shalek AK, Satija R, Adiconis X, Gertner RS, et al. (2013) Single-cell transcriptomics reveals bimodality in expression and splicing in immune cells. Nature 498: 236–240.
- 65. Eckert KA, Sweasy JB (2012) DNA polymerases and their role in genomic stability. Environ Mol Mutagen 53: 643–684.
- 66. Pham P, Bransteitter R, Petruska J, Goodman MF (2003) Processive AID-catalysed cytosine deamination on single-stranded DNA simulates somatic hypermutation. Nature 424: 103–107.
- 67. Pham P, Calabrese P, Park SJ, Goodman MF (2011) Analysis of a single-stranded DNA-scanning process in which activation-induced deoxycytidine deaminase (AID) deaminates C to U haphazardly and inefficiently to ensure mutational diversity. J Biol Chem 286: 24931–24942.
- 68. Chan K, Sterling JF, Roberts SA, Bhagwat AS, Resnick MA, et al. (2012) Base damage within single-strand DNA underlies in vivo hypermutability induced by a ubiquitous environmental agent. PLoS Genet 8: e1003149.
- 69. Shcherbakova PV, Kunkel TA (1999) Mutator phenotypes conferred by MLH1 overexpression and by heterozygosity for mlh1 mutations. Mol Cell Biol 19: 3177–3183.
- 70. Sherman F, Fink GR, Hicks JB (1986) Methods in yeast genetics. Cold Spring Harbor Laboratory Press. 200 p.
- 71. Otero JM, Vongsangnak W, Asadollahi MA, Olivares-Hernandes R, Maury J, et al. (2010) Whole genome sequencing of Saccharomyces cerevisiae: from genotype to phenotype for improved metabolic engineering applications. BMC Genomics 11: 723.
- 72. Kelley DR, Schatz MC, Salzberg SL (2010) Quake: quality-aware detection and correction of sequencing errors. Genome Biol 11: R116.
- 73. Drummond AJ, Ashton B, Buxton S, Cheung M, Cooper A, et al. (2012) Geneious v5.6. Available from http://www.geneious.com
- 74. Crooks GE, Hon G, Chandonia JM, Brenner SE (2004) WebLogo: a sequence logo generator. Genome Res 14: 1188–1190.