Empirical determinants of adaptive mutations in yeast experimental evolution

High-throughput sequencing technologies have expanded the scope of genetic screens to identify mutations that underlie quantitative phenotypes, such as fitness improvements that occur during the course of experimental evolution. This new capability has allowed us to describe the relationship between fitness and genotype at a level never possible before, and to ask deeper questions, such as how genome structure, the available mutation spectrum, and other factors drive evolution. Here we combined functional genomics and experimental evolution to first map, on a genome scale, the distribution of potential beneficial mutations available as a first step to an evolving population, and then to compare these to the mutations actually observed in order to define the constraints acting upon evolution. We first constructed a single-step fitness landscape for the yeast genome using barcoded gene deletion and overexpression collections, competitive growth in continuous culture, and barcode sequencing. By quantifying the relative fitness effects of thousands of single-gene amplifications or deletions simultaneously, we revealed hundreds of accessible evolutionary paths. To determine the mutation spectrum actually used in evolution, we built a catalog of >1000 mutations selected during experimental evolution. By combining both datasets, we were able to ask how and why evolution is constrained. We identified adaptive mutations in laboratory-evolved populations, derived mutational signatures in a variety of conditions and ploidy states, and determined that half of the accumulated mutations positively affect cellular fitness. We also uncovered hundreds of potential beneficial mutations never observed in the mutational spectrum derived from the experimental evolution catalog, and found that these adaptive mutations become accessible in the absence of the dominant adaptive solution.
This comprehensive functional screen explored the set of potential adaptive mutations on one genetic background and allows us, for the first time at this scale, to compare the mutational path with the actual, spontaneously derived spectrum of mutations.

AUTHOR SUMMARY

Whole genome sequencing of thousands of cancer genomes has been conducted to characterize variants including point mutations and structural changes, providing a large catalogue of critical polymorphisms associated with tumorigenesis. Despite the high prevalence of mutations in cancer and technological advances in their genotyping, cancer genetics still presents many open questions regarding the prediction of selection and the functional impact of mutations on cellular fitness. Long-term experimental evolution using model organisms has allowed the selection of strains bearing recurrent and rare mutations, mimicking the genetic aberrations acquired by tumor cells. Here, we evaluate the functional impact of thousands of single-gene losses and amplifications on the cellular fitness of yeast. Our results show that hundreds of beneficial mutations are possible during adaptation, but not all of them have been selected in the evolution experiments performed so far. Together, our results provide evidence that 50% of the mutations found in experimentally evolved populations are advantageous, and that alternative beneficial mutations could be selected in the absence of the main adaptive mutations of higher fitness.

BLURB

A combined view of potential adaptive mutations, generated by a systematic screening approach, coupled with the mutational spectrum derived from experimentally evolved yeast, reveals the usage of accessible evolutionary solutions.

Whole genome sequencing of thousands of human tumors has uncovered a huge number of variants including point mutations and structural changes, providing a large catalog of mutated genes across all major cancer types [1-4]. Recent advances in profiling initiatives and systematic

different conditions [10-18]. Within this rapidly increasing dataset, only a few mutations have been fully characterized with regard to function. Similar to studies investigating human disease candidate genes, large-scale studies from the microbial community have distinguished adaptive mutations from background neutral mutations on the basis of statistical approaches such as frequency, enrichment and recurrence [10,11,17,19-23].

Despite sophisticated genetic systems, dissecting the functional consequences of every mutation observed in a population is still tedious, though generally experimentally straightforward. For example, simple genetics can be used to reassort mutations, followed by fitness

and correspond to potential evolutionary solutions. We next compared the single-step mutation fitness to the actual mutation spectrum derived from experimental evolution studies performed in this study and also collected from the literature. We found that 50% of the mutations are predicted to positively affect fitness. In sulfate-limited conditions, mutations in one gene dominate both the single-step fitness landscape and the observed mutational spectrum, while in the two other conditions the increase in fitness is driven by a large number of beneficial mutations of smaller effect size.
Finally, we show that these constraints can be modified by eliminating the highest-fitness paths, upon which the evolving cultures explore alternative beneficial mutations.

We performed a dose-response curve for ~80% of all the genes of the yeast genome in three different nutrient limitations using five different yeast barcoded genomic collections (Table S1), outlined as follows: two deletion collections in which each gene is replaced by a selectable marker and a unique DNA barcode, one in a haploid and one in a heterozygous diploid background [34]; one control collection where thousands of unique barcodes have been placed at a single known neutral genomic location [35]; and finally two collections of diploid strains bearing plasmids where each gene and its native promoter has been cloned into a barcoded plasmid present at either low or high copy [32,33]. A schematic description of the method is presented in

The functional screening of mutations uncovers hundreds of accessible adaptive mutations

We quantified a total of 100,853 relative fitnesses ranging from -36.5 to +42.8% based on an average of 462 reads per gene per competition and created an experimental fitness landscape

beneficial mutation with a 10% fitness increase will reach 5% of the population in ~200 generations and will fix in ~500 generations [41]. This analysis suggests that mutations causing less than a 10% fitness increase will rarely be observed in our experimental evolution timescale. The functional screening of pooled mutants revealed that most of the mutants display wild-type fitness. Using the 10% cut-off, we detected an enrichment, relative to the control pool, of mutants with decreased fitness (n=1693 vs. 19 for the control pool) and with increased fitness (n=506 vs. 80 for the control pool) (Chi-square, p<0.0001) (Figure 2).
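The ~200- and ~500-generation figures come from [41]. As a rough illustration only, the deterministic selective-sweep approximation below estimates how many generations a mutant with relative fitness advantage s needs to rise from an initial frequency x0 to a target frequency. The discrete-generation model (odds multiplied by 1 + s per generation) and the starting frequency are assumptions, not the model used in [41]. The sketch ignores drift, establishment and clonal interference, so it is not expected to reproduce the cited numbers exactly.

```python
import math


def generations_to_frequency(s: float, x0: float, x_target: float) -> float:
    """Generations for a mutant with advantage s to rise from x0 to x_target.

    Deterministic haploid selection with discrete generations: the
    mutant/wild-type odds ratio is multiplied by (1 + s) each generation.
    """
    if not (0.0 < x0 < 1.0 and 0.0 < x_target < 1.0):
        raise ValueError("frequencies must lie strictly between 0 and 1")
    if s <= 0.0:
        raise ValueError("s must be positive for the mutant to increase")
    odds_start = x0 / (1.0 - x0)
    odds_end = x_target / (1.0 - x_target)
    return math.log(odds_end / odds_start) / math.log(1.0 + s)


if __name__ == "__main__":
    # Hypothetical starting frequency of a single new mutant in a large population.
    x0 = 1e-8
    for s in (0.05, 0.10, 0.20):
        t5 = generations_to_frequency(s, x0, 0.05)
        t99 = generations_to_frequency(s, x0, 0.99)
        print(f"s={s:.2f}: 5% after ~{t5:.0f} generations, 99% after ~{t99:.0f} generations")
```

Because the time scales roughly as 1/ln(1 + s), halving s roughly doubles the time needed to reach a given frequency, which is the intuition behind treating mutations below the 10% cut-off as unlikely to sweep within a few hundred generations.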
We focused first on the 506 mutants showing increased fitness, hypothesizing that mutations affecting these genes would be more likely to be adaptive during growth under strong selection. Despite making up just 47% of the mutations tested, 73% of the beneficial mutations we detected are from the plasmid collections where the gene copy number is increased, suggesting that in diploids, gain-of-function mutations and duplications are more likely to produce fitness gains than are loss-of-function mutations. Among the genes associated with a fitness increase, SUL1 was notable, with the highest fitness measure (42.8% in sulfate-limited conditions for a strain carrying a high copy number plasmid). We previously demonstrated that the amplification of this gene is recurrently selected during experimental evolution in sulfate limitation, and that increasing the copy number of SUL1 via expression on both low and high

amplicon, we detected only three independent clones where the SUL1 amplicon excluded the gene BSD2. The fitness of each of the 13 strains harboring an amplification containing both SUL1 and BSD2 is higher than the fitness of the three strains with an amplicon containing only SUL1 but not BSD2 [43]. Reintroducing BSD2 into one of the three strains using a low copy plasmid increased fitness by 6.1 percentage points (from 37.7% to 43.8%), demonstrating that the functional screen with pooled strains is a reliable method to detect small-effect and secondary adaptive mutations, and suggesting that the two mutations have an additive effect on fitness.

Our functional screens revealed the presence of hundreds of possible beneficial mutations (223 in sulfate-, 210 in glucose- and 73 in phosphate-limited conditions). We next sought to apply the functional knowledge gained from the genome-wide analyses described above to the hundreds of de novo mutations identified in laboratory evolution experiments.
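The classification step can be sketched as follows: each mutant's relative fitness (in %) is compared with the 10% cut-off, and beneficial calls are tallied by condition and by collection. This is a minimal sketch, assuming a symmetric, inclusive cut-off; the record layout, collection labels and all example values except the SUL1 measurement are hypothetical.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class Measurement(NamedTuple):
    gene: str
    collection: str     # hypothetical labels: "haploid_del", "het_del", "low_copy", "high_copy"
    condition: str      # "sulfate", "glucose" or "phosphate"
    fitness_pct: float  # relative fitness in percent; 0 means wild-type fitness


CUTOFF_PCT = 10.0  # cut-off from the text; treating it as inclusive is an assumption
PLASMID_COLLECTIONS = {"low_copy", "high_copy"}


def classify(fitness_pct: float, cutoff: float = CUTOFF_PCT) -> str:
    """Label a mutant as beneficial, deleterious or neutral with a symmetric cut-off."""
    if fitness_pct >= cutoff:
        return "beneficial"
    if fitness_pct <= -cutoff:
        return "deleterious"
    return "neutral"


def counts_by_condition(measurements: Iterable[Measurement]) -> Counter:
    """Count (condition, class) pairs, e.g. beneficial mutants per nutrient limitation."""
    return Counter((m.condition, classify(m.fitness_pct)) for m in measurements)


def plasmid_share_of_beneficial(measurements: Iterable[Measurement]) -> float:
    """Fraction of beneficial calls coming from the copy-number-gain (plasmid) collections."""
    beneficial = [m for m in measurements if classify(m.fitness_pct) == "beneficial"]
    if not beneficial:
        return 0.0
    return sum(m.collection in PLASMID_COLLECTIONS for m in beneficial) / len(beneficial)


# Hypothetical records; only the SUL1 value (42.8%, high copy, sulfate) is taken from the text.
example = [
    Measurement("SUL1", "high_copy", "sulfate", 42.8),
    Measurement("geneA", "haploid_del", "glucose", 12.0),
    Measurement("geneB", "low_copy", "glucose", -15.0),
    Measurement("geneC", "het_del", "phosphate", 3.0),
]

if __name__ == "__main__":
    print(counts_by_condition(example))
    print(plasmid_share_of_beneficial(example))
```

Applied to the full dataset, the plasmid share among beneficial calls (73% in the text) would then be compared with the plasmid share among all mutants tested (47%).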
Using this combined dataset, our goal was to ask which particular adaptive mutations are selected and why.

Mutational spectrum in microbial evolution experiments

To determine the mutational signature of adaptation using laboratory evolution, we

[10-12,16,43,46]. Not all the conditions overlap with our functional screens, but they are useful for cross-condition comparison. In total, we compiled 1,167 mutations in 1,088 genes from 106 long-term laboratory evolution experiments conducted in eleven different conditions from nine published studies in addition to this one (Table S4). The features of these studies and the resulting mutations are summarized in Table 1. The complete list of mutations, their frequencies, and their predicted effects are given in Table S4. The compiled mutation catalog does not take into account chromosomal rearrangements, as these events were not always measured in the different studies.

Comparing the mutational spectrum across many environments, strains and ploidies allows us to extract mutational signatures and infer the properties of beneficial mutations in yeast. Ploidy in particular has been a subject of much interest since the observation that haploids and diploids adapt at different rates [40]. Two recent studies have shown that loss-of-function mutations were commonly selected in evolved populations of haploid yeast [10,11]

is consistent with our previous observations that evolved diploid strains contain more and larger gene and chromosome copy number variants than evolved haploids. We next investigated whether the difference between haploids and diploids was a general rule across environments.
Using only mutations discovered in haploid and diploid strains evolved under matched conditions, we found that the mutational signature differed between haploids and diploids in glucose limitation (Fisher exact test, p<10^-14), with an enrichment of LOF mutations in haploids (Chi-square, n=224, p<10^-9), whereas only a slight tendency was observed in phosphate limitation (Fisher exact test, n=54, p=0.053) and none in sulfate-limited conditions (Fisher exact test, n=100, p=0.72). The difference between ploidies is likely explained by the tendency of LOF mutations to be recessive [49], compared to mutations that increase gene expression, which may be more likely to have an effect as a heterozygote. Though loss of heterozygosity has been observed in diploid populations [42,49], such events are relatively rare. To test this directly, we determined how many LOF mutations detected as beneficial in a haploid context might lose this effect when heterozygous in a diploid. We compared 58 beneficial mutants from the haploid deletion collection to the fitness of the corresponding heterozygous diploid mutants and found that these mutations do show a tendency to be recessive,

Mutational pathways are constrained

Recurrence-based models, which assume that oncogenes are recurrently mutated in several tumor samples, are still among the most widely used approaches to identify putative driver genes in cancer [50-52]. The repeatability of adaptive trajectories has also been extensively observed in the microbial research community and has led to the discovery of drivers of adaptation such as SUL1, HXT6/7, and RIM15 in S. cerevisiae and rpoS in Escherichia coli [11,19,20,42,53]. Of the 1,088 genes mutated in the catalog we compiled, 154 genes were found with a mutation in more than one sample, and among them 19 genes were found mutated more than five times independently (Figure 3A).
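The ploidy comparisons above rely on Fisher's exact test on a 2×2 table (for example, LOF vs. other mutations in haploids vs. diploids). Below is a minimal, dependency-free sketch of the two-sided test, using the common convention of summing the probabilities of all tables, with the observed margins, that are no more probable than the observed one. The example counts are hypothetical, not the study's data; in practice, scipy.stats.fisher_exact implements the same test.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact test p-value for the 2x2 table [[a, b], [c, d]].

    Rows could be ploidy (haploid, diploid) and columns mutation class
    (LOF, other). With all margins fixed, `a` follows a hypergeometric law.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def prob(x: int) -> float:
        # Probability that the top-left cell equals x given the fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    p_obs = prob(a)
    # Small relative tolerance so tables as probable as the observed one are not lost to rounding.
    total = sum(p for p in (prob(x) for x in range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))
    return min(1.0, total)


if __name__ == "__main__":
    # Hypothetical counts: rows = haploid, diploid; columns = LOF, other.
    print(fisher_exact_two_sided(30, 70, 10, 90))
```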
We found that recurrently mutated genes are highly enriched in mutations categorized as high impact (Fisher exact test, p<10^-16) (Figure 3B) and are longer than genes with only one hit (Wilcoxon rank-sum test, p<10^-16) (Figure S3B). To detect true adaptive mutations and discard false positives, several studies have developed tools to correct for gene length [5]; in our study we decided instead to infer the functional impact of mutations on cellular fitness using our screen results.
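The recurrence counts above (genes hit in more than one sample, genes hit more than five times) can be derived from a flat list of (sample, gene) calls. A minimal sketch, assuming that a gene hit more than once within the same sample counts once for that sample; sample and gene identifiers are hypothetical.

```python
from collections import Counter
from typing import Iterable, Tuple


def recurrence_summary(mutations: Iterable[Tuple[str, str]]) -> dict:
    """Summarise how often each gene is hit across independent samples.

    `mutations` holds (sample_id, gene) pairs; duplicate pairs are collapsed,
    so each count is the number of independent samples hitting the gene.
    """
    hits = Counter(gene for _, gene in set(mutations))
    n_genes = len(hits)
    return {
        "genes_mutated": n_genes,
        "hit_more_than_once": sum(1 for n in hits.values() if n > 1),
        "hit_more_than_five_times": sum(1 for n in hits.values() if n > 5),
        "singleton_fraction": (sum(1 for n in hits.values() if n == 1) / n_genes) if n_genes else 0.0,
    }


# Hypothetical calls: geneA is hit twice in sample s1 but counted once for that sample.
example = [("s1", "geneA"), ("s2", "geneA"), ("s1", "geneA"), ("s3", "geneB"), ("s4", "geneC")]

if __name__ == "__main__":
    print(recurrence_summary(example))
```

Collapsing duplicate (sample, gene) pairs keeps the count aligned with "mutated independently", rather than inflating genes that carry several mutations in one population.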

Prediction of evolutionary response to strong selection

Despite the presence of more than one hundred recurrently mutated genes, a large number of genes are mutated in only single populations. Since the number of Evolve and Resequence experiments is currently still relatively small, akin to a non-saturating genetic screen, adaptive mutations are likely to be found in the class of singletons and would be missed by a recurrence method. As an alternative strategy for specifically identifying adaptive mutations, we compared the mutations found in the evolution experiments with known beneficial mutations identified by our genomic screen.
From the functional screen described above, we detected 506 beneficial mutations targeting 458 genes; among them, 86 genes were found with a hit in our compiled mutation

five genes were recurrently mutated but had no obvious benefit in their given conditions (VPS25, MNN4, FRE5 and GSH1 in glucose and PHO84 in phosphate). For example, MNN4 has been found mutated in two independent populations grown in glucose limitation; however, we measured no fitness benefit in our functional screen, and no fitness benefit was reported in a competitive assay using evolved clones [24]. These five genes could be recurrently mutated by

benefit in a specific genetic background. These data show that convergent evolution cannot be used as the only parameter to predict evolutionary outcomes, and that more comprehensive and unbiased detection of adaptive mutations requires a more direct method such as functional screening.

50% of the mutations accumulated during experimental evolution are adaptive

Next we wanted to determine how many adaptive mutations were carried by each sequenced population and clone, using the frequency of recurrence combined with data from the functional screen. We determined that 91% of the samples (clones and populations) carried at least one predicted driver mutation. Of these samples, each contained an average of 5.

Table S6). Three populations with no predicted beneficial mutations were cultivated in nitrogen-limiting conditions. However, these strains have been shown to carry copy number variants (CNVs) [12], and we did not include nitrogen limitation in our functional screen. We also detected 24 mutations from the experimental evolution studies in genes that are associated with deleterious mutations based on our functional screen performed in the same conditions.
However, none of these mutations were predicted to have a high impact on the function of the gene, and so they might instead be neutral or near-neutral passenger mutations. Thus, by combining functional screening of mutations with whole genome sequencing of populations and clones in this way, we are able to identify both drivers of adaptation and unexplored fitness peaks. We conclude that evolution is partly predictable based on the repeatability of adaptive trajectories in independent evolution experiments, and that it reflects, at least in part, the underlying fitness distribution of possible mutations.
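The per-sample driver calls combine two sources: genes recurrently mutated in the catalog and genes with a beneficial effect in the screen under the same condition. A minimal sketch of that annotation, assuming a mutation is called a predicted driver if its gene meets either criterion (the exact combination rule is not spelled out in this excerpt); sample identifiers and the genes other than SUL1 are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Mutation = Tuple[str, str, str]  # (sample_id, condition, gene)


def annotate_drivers(
    mutations: Iterable[Mutation],
    beneficial_in_screen: Dict[str, Set[str]],  # condition -> genes beneficial in the screen
    recurrent_genes: Set[str],                  # genes hit in more than one sample
) -> Dict[str, Tuple[int, int]]:
    """Return sample_id -> (predicted driver mutations, total mutations)."""
    per_sample = defaultdict(lambda: [0, 0])
    for sample, condition, gene in mutations:
        is_driver = gene in recurrent_genes or gene in beneficial_in_screen.get(condition, set())
        per_sample[sample][0] += int(is_driver)
        per_sample[sample][1] += 1
    return {s: (d, t) for s, (d, t) in per_sample.items()}


def summarise(per_sample: Dict[str, Tuple[int, int]]) -> Tuple[float, float]:
    """(fraction of samples with >=1 predicted driver, fraction of all mutations called drivers)."""
    with_driver = sum(1 for d, _ in per_sample.values() if d > 0)
    drivers = sum(d for d, _ in per_sample.values())
    total = sum(t for _, t in per_sample.values())
    return with_driver / len(per_sample), drivers / total


# Hypothetical input.
mutations = [
    ("c1", "sulfate", "SUL1"),
    ("c1", "sulfate", "geneX"),
    ("c2", "glucose", "geneY"),
    ("c3", "glucose", "geneZ"),
]
beneficial = {"sulfate": {"SUL1"}, "glucose": {"geneY"}}
recurrent = {"SUL1"}

if __name__ == "__main__":
    per = annotate_drivers(mutations, beneficial, recurrent)
    print(per, summarise(per))
```

The two numbers returned by `summarise` correspond to the kinds of quantities reported in the text: the share of samples carrying at least one predicted driver and the share of accumulated mutations predicted to be beneficial.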

The set of beneficial mutations reveals potential drivers of adaptation

The analysis above defines the subset of adaptive mutations actually utilized by experimental evolution. However, the screen for beneficial mutations identified a large mutational reservoir with many additional accessible evolutionary paths [54]. To determine what differentiates the actual mutation spectrum from the potential mutation pool, we excluded the mutations that had already been identified in experimental evolution, and found 369 potential

We used our functional screen to determine whether the mutations actually selected for during experimental evolution differed from the potential adaptive mutations that were never recovered. We detected a statistical difference between the fitness of the beneficial mutations

phosphate-limitations (n=94 and 54) may have limited our ability to detect a fitness differential similar to that observed in glucose limitation (n=224). This would suggest that the observed mutation spectrum is driven by the fitness of potential beneficial mutations. The observed mutation spectrum could also be biased away from the highest-fitness mutations by differences in mutation rate, as previously proposed [10]. The lack of mutations in these genes is likely the result of a combination of all of these factors, including random chance, epistatic interactions between mutations, and/or a failure of the pool experiment to adequately recapitulate the fitness of the de novo mutations. Clonal interference is also likely to play a large role. Consistent with previous findings, SUL1 dominates both the functional screen and the mutational spectrum (Figure 5B), but other highly beneficial mutations (>20% fitness increase), such as mutations in MAC1 and PHO3, two genes coding for proteins implicated in copper and phosphate-sulfate metabolism, respectively, are also potential drivers but are never recovered (Figure 5B) [10,55].
Conversely, in glucose limitation, many beneficial mutations of similar fitness are possible, and so more variety in outcomes and broader sampling of the mutational reservoir is observed.
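Comparing the fitness of beneficial mutations that were recovered in evolution experiments with those never recovered is a two-sample comparison. The test used is not stated in this excerpt; as an assumption, here is a dependency-free sketch of the Mann-Whitney U (Wilcoxon rank-sum) test, the test named elsewhere in this section, with a normal approximation and no tie or continuity correction. The fitness values are hypothetical.

```python
import math
from typing import Sequence, Tuple


def mann_whitney_u(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Return (U for x, two-sided p-value) using the normal approximation.

    U counts the pairs (xi, yj) with xi > yj, ties counting one half.
    No tie correction is applied to the variance, so for heavily tied or very
    small samples an exact test (e.g. scipy.stats.mannwhitneyu) is preferable.
    """
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both samples must be non-empty")
    u = sum(1.0 if xi > yj else 0.5 if xi == yj else 0.0 for xi in x for yj in y)
    mean_u = n1 * n2 / 2.0
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u - mean_u) / sd_u
    p = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return u, p


if __name__ == "__main__":
    observed = [22.0, 18.5, 30.1, 25.4, 19.9]  # hypothetical fitness (%) of recovered mutations
    never = [12.3, 14.8, 11.0, 16.2, 13.5]     # hypothetical fitness (%) of never-recovered mutations
    print(mann_whitney_u(observed, never))
```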
Mutational spectrum in the absence of the main adaptive mutation

To investigate the discrepancy we observed between the single-step fitness landscape and the observed mutational spectrum, and to test the predictability of experimental evolution, we asked whether we could detect unobserved adaptive mutations by inhibiting the selection of the main driver of adaptation. We have shown in previous work that SUL1 amplification dominates the mutational spectrum [42,43], and it is the mutation with the highest fitness in our screen (Figure 5B). Additional adaptive mutations might be undetectable in sulfate-limited conditions due to the presence of such a strong fitness peak. We hypothesized that by eliminating the

outcome more similar to the pattern observed in glucose limitation. To explore the mutational landscape in the absence of the main adaptive mutation, we screened two evolved populations in which no SUL1 amplification was detected by qPCR (Figure 6A) and aCGH (data not shown) even after 200 generations of cultivation in sulfate-limited conditions. The fitness of the clones and populations without SUL1 amplification (~30%) (Figure 6B) is at the lower end of the fitness range of previously studied evolved clones with SUL1 locus amplifications (37% to 53%) [43]. To establish which mutations were responsible for this phenotype, we performed whole genome sequencing and called SNPs and INDELs in the clones and populations isolated at generation 200. One nonsense mutation was detected in the previously identified adaptive gene SGF73 in one of the clones (Table S4). Two independent non-synonymous mutations (N263H and N250K) in the coding region of SUL1 were also detected in both populations.
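SUL1 amplification was screened by qPCR. The quantification scheme is not given in this excerpt; a common approach, assumed here, is the comparative Ct (2^-ΔΔCt) method. It estimates the target's copy number relative to a single-copy reference gene, normalised to an unevolved control, and assumes roughly 100% amplification efficiency. The Ct values below are hypothetical.

```python
def relative_copy_number(ct_target_sample: float, ct_ref_sample: float,
                         ct_target_control: float, ct_ref_control: float,
                         efficiency: float = 2.0) -> float:
    """Comparative Ct estimate of target copy number relative to a control strain.

    efficiency = 2.0 means the product doubles every cycle (100% efficiency).
    A result near 1.0 indicates no copy-number change relative to the control.
    """
    delta_sample = ct_target_sample - ct_ref_sample      # normalise to the reference gene
    delta_control = ct_target_control - ct_ref_control
    delta_delta = delta_sample - delta_control           # normalise to the control strain
    return efficiency ** (-delta_delta)


if __name__ == "__main__":
    # Hypothetical Ct values: control (target 24.0, reference 22.0).
    print(relative_copy_number(22.0, 22.0, 24.0, 22.0))  # target amplified relative to control
    print(relative_copy_number(24.1, 22.0, 24.0, 22.0))  # no amplification detected
```

Because copy number is expressed relative to the control, a diploid going from two to three copies of the locus would appear as a ratio of about 1.5 rather than 2.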
Wild type strains containing those mutations were created and we detected a fitness increase of 23.1%

Using those two datasets, we were able for the first time to compare potential and actual beneficial mutations and begin to understand why some mutations are selected or not.

Patterns and reproducibility of evolution

By compiling a catalog of >1,000 mutations identified in 109 independent evolution experiments from this study and others (Table S4)

Figure 3A). We found that the same beneficial phenotype can arise through identical genomic changes (recurrently mutated genes) [10,17] and also through different, apparently unrelated mechanisms, as 85% of the genes were hit only once by a mutation. As the recurrence

Evolution is constrained by the fitness of adaptive mutations

By combining the beneficial mutations detected in the functional screening with the mutational spectrum of evolved clones and populations, we were able to determine that 50% of the mutations detected in evolving populations are beneficial. As would be expected, this number is higher than previous estimates of the null distribution of mutation fitness from mutation accumulation lines in yeast (6% to 13% of all mutations) [65]. We also found that some mutations dominate the mutational spectrum because they confer the largest fitness benefit among the available beneficial mutations. For instance, a particular large-effect mutation is nearly always observed in sulfate-limited conditions, while a diversity of smaller-effect beneficial mutations was detected in both glucose and phosphate limitations.

The comparison also revealed a large number of potential beneficial mutations that have never been observed in any Evolve and Resequence study so far (Figure 5B). We wanted to see whether those mutations corresponded to inaccessible evolutionary paths or whether they could be selected in some specific conditions. We decided to focus on sulfate limitation, as one primary evolutionary path is utilized in this condition (SUL1 amplification). We looked in evolved strains without this mutation, and found that alternative routes could then be explored. The fitness of the evolved population linked to the deletion of two adjacent genes (IPT1 and SNF11).

Table S8). PCR amplifications were performed in a 100 µl volume, using Roche FastStart DNA polymerase with the following conditions: 94°C/3 min; 25 cycles of 94°C/30 s, 55°C/30 s, 72°C/30 s; followed by 72°C/3 min. PCR products were then purified using the Qiagen MinElute PCR Purification kit (cat. No. 28004), quantified using a Qubit fluorometer and then adjusted to a concentration of 10 µg/ml. Equal volumes of normalized DNA were then pooled and gel purified from 6% polyacrylamide TBE gels (Invitrogen) using a soak and crush method, followed by purification and concentration using the Qiagen QIAquick PCR purification kit.
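Normalizing each purified PCR product to 10 µg/ml before pooling equal volumes is a C1V1 = C2V2 dilution. A small sketch of that arithmetic; the 50 µl final volume and the example Qubit reading are hypothetical.

```python
from typing import Tuple


def dilution_volumes(measured_ug_per_ml: float, target_ug_per_ml: float = 10.0,
                     final_volume_ul: float = 50.0) -> Tuple[float, float]:
    """Return (sample volume, diluent volume) in microlitres.

    Uses C1 * V1 = C2 * V2, so V1 = C2 * V2 / C1 and the diluent makes up the rest.
    """
    if measured_ug_per_ml < target_ug_per_ml:
        raise ValueError("sample is below the target concentration; it cannot be diluted to it")
    sample_ul = target_ug_per_ml * final_volume_ul / measured_ug_per_ml
    return sample_ul, final_volume_ul - sample_ul


if __name__ == "__main__":
    # Hypothetical Qubit reading of 40 ug/ml for one PCR product.
    print(dilution_volumes(40.0))
```

Once every library is at the same concentration, pooling equal volumes gives each sample a roughly equal share of the sequencing reads.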

After quantification using a Qubit fluorometer, libraries were sequenced using the standard Illumina protocol as multiplexed single-read 36-base cycles on several lanes on an Illumina

were reassigned to a particular sample using a custom Perl script (Supplementary File 1). Then, each barcode was reassigned to a gene using a standard binary search program (program in C,

Table S7). The fitnesses are similar in both assays, and we observed a strong positive correlation (R²=0.83) between the large pool screen and the individual fitness measurements (Figure S4 and Table S7). A second concern is that use of the yeast collections to determine the association of fitness changes could be compromised by mutations or copy number changes preexisting elsewhere in the genomes of the pooled strains. To limit this known artifact, most of

(Table S7). We also detected no copy number changes at the population level using microarray analysis of the last sample of the competition of the low copy plasmid collection, though this approach would only detect CNVs that achieved at least ~10% population frequency (data not shown). All sequencing data from this study have been submitted to the NCBI Sequence Read Archive

during Evolve and Resequence studies. 154 genes were found to be hit by more than one
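The barcode-to-gene step is described as a binary search over a sorted barcode table (the original program is in C). Here is a Python sketch of the same idea using the standard-library bisect module, assuming exact matches only; it does not tolerate sequencing errors, and the barcode sequences and gene assignments are hypothetical.

```python
from bisect import bisect_left
from collections import Counter
from typing import Iterable, List, Optional, Tuple


class BarcodeIndex:
    """Exact-match barcode -> gene lookup backed by a sorted list and binary search."""

    def __init__(self, pairs: Iterable[Tuple[str, str]]):
        entries = sorted(pairs)  # sort by barcode sequence so bisect can be used
        self._barcodes: List[str] = [b for b, _ in entries]
        self._genes: List[str] = [g for _, g in entries]

    def lookup(self, barcode: str) -> Optional[str]:
        i = bisect_left(self._barcodes, barcode)
        if i < len(self._barcodes) and self._barcodes[i] == barcode:
            return self._genes[i]
        return None  # no exact match for this read


def count_reads_per_gene(index: BarcodeIndex, reads: Iterable[str]) -> Counter:
    """Tally reads per gene; unmatched reads are collected under 'unassigned'."""
    counts: Counter = Counter()
    for read in reads:
        gene = index.lookup(read)
        counts[gene if gene is not None else "unassigned"] += 1
    return counts


# Hypothetical barcode table and reads.
index = BarcodeIndex([
    ("TTGCAAGGCTTACCGATGCA", "geneB"),
    ("ACGTACGTACGTACGTACGT", "geneA"),
])
reads = ["ACGTACGTACGTACGTACGT", "ACGTACGTACGTACGTACGT", "GGGGGGGGGGGGGGGGGGGG"]

if __name__ == "__main__":
    print(count_reads_per_gene(index, reads))
```

Each lookup costs O(log n) comparisons on the sorted table. In Python a dict would give the same exact-match behaviour; the bisect version is shown only because it mirrors the binary-search design described in the text.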

DATA ACCESS
The ratio of driver mutations to total mutations is not condition-specific (p=0.28, 0.70, 0.36, 0.78 and 0.36 for glucose-limited, sulfate-limited, YPD, other and phosphate-limited conditions, respectively).

represents the relative frequency of one strain over time, plotted as the log2 ratio of the frequency at generation x relative to its frequency at generation 0 over the ~20 generations of steady-state competition. Each line, colored blue or red, represents the linear regression used to calculate the relative fitness between generations 6 and 20.

Table S4: Identities, frequencies, and predicted effects of mutations discovered in experimental evolution studies.
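The legend describes relative fitness as the slope of a linear regression of the log2 frequency ratio against generations, fitted between generations 6 and 20. Below is a minimal ordinary least-squares sketch of that calculation. Whether the published values use the raw slope, a rescaled slope (for example multiplied by ln 2 to move to a natural-log scale), or a slope normalised to the control barcodes is not stated in this excerpt, so the function returns the raw slope. The frequencies are hypothetical.

```python
import math
from typing import Sequence


def fitness_slope(generations: Sequence[float], freqs: Sequence[float],
                  start: float = 6.0, end: float = 20.0) -> float:
    """OLS slope of log2(freq_t / freq_0) vs. generation, fitted on start <= t <= end.

    The first entry of `freqs` must be the frequency at generation 0.
    """
    if len(generations) != len(freqs):
        raise ValueError("generations and freqs must have the same length")
    f0 = freqs[0]
    pts = [(g, math.log2(f / f0)) for g, f in zip(generations, freqs) if start <= g <= end]
    if len(pts) < 2:
        raise ValueError("need at least two time points in the fitting window")
    n = len(pts)
    mean_x = sum(x for x, _ in pts) / n
    mean_y = sum(y for _, y in pts) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pts)
    return sxy / sxx


# Hypothetical strain whose frequency doubles every 10 generations.
gens = [0, 3, 6, 9, 12, 15, 18, 20]
freqs = [0.001 * 2 ** (g / 10) for g in gens]

if __name__ == "__main__":
    print(fitness_slope(gens, freqs))
```

Restricting the fit to generations 6 to 20, as in the legend, excludes the earliest time points from the regression while still using generation 0 as the baseline frequency.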