Genome-wide association across Saccharomyces cerevisiae strains reveals substantial variation in underlying gene requirements for toxin tolerance

Cellulosic plant biomass is a promising sustainable resource for generating alternative biofuels and biochemicals with microbial factories. But a remaining bottleneck is engineering microbes that are tolerant of toxins generated during biomass processing, because mechanisms of toxin defense are only beginning to emerge. Here, we exploited natural diversity in 165 Saccharomyces cerevisiae strains isolated from diverse geographical and ecological niches, to identify mechanisms of hydrolysate-toxin tolerance. We performed genome-wide association (GWA) analysis to identify genetic variants underlying toxin tolerance, and gene knockouts and allele-swap experiments to validate the involvement of implicated genes. In the process of this work, we uncovered a surprising difference in genetic architecture depending on strain background: in all but one case, knockout of implicated genes had a significant effect on toxin tolerance in one strain, but no significant effect in another strain. In fact, whether or not the gene was involved in tolerance in each strain background had a bigger contribution to strain-specific variation than allelic differences. Our results suggest a major difference in the underlying network of causal genes in different strains, suggesting that mechanisms of hydrolysate tolerance are very dependent on the genetic background. These results could have significant implications for interpreting GWA results and raise important considerations for engineering strategies for industrial strain improvement.


Introduction
The increased interest in renewable energy has focused attention on non-food plant biomass for the production of biofuels and biochemicals [1]. Lignocellulosic plant material contains significant amounts of sugars that can be extracted through a variety of chemical pretreatments and used for microbial production of alcohols and other important molecules [2][3][4][5]. However, there are major challenges to making biofuel production from plant biomass economically viable [6]. One significant hurdle with regards to microbial fermentation is the presence of toxic compounds in the processed plant material, or hydrolysate, including weak acids, furans and phenolics released or generated by the pretreatment process [7][8][9][10]. The concentrations and composition of these inhibitors vary for different pretreatment methods and depend on the plant feedstocks [7,9,11]. These toxins decrease cell productivity by generating reactive oxygen species, damaging DNA, proteins, cell membranes [12][13][14], and inhibiting important physiological processes, including enzymes required for fermentation [15], de novo nucleotide biosynthesis [16], and translation [17]. Despite knowledge of these targets, much remains to be learned about how the complete suite of hydrolysate toxins (HTs) acts synergistically to inhibit cells. Furthermore, how the effects of HTs are compounded by other industrial stresses such as high osmolarity, thermal stress, and end-product toxicity remains murky.
Engineering strains with improved tolerance to industrial stresses including those in the plant hydrolysate is of the utmost importance for making biofuels competitive with fuels already in the market [6]. A goal in industrial strain engineering is to improve lignocellulosic stress tolerance, often through directed engineering. Many approaches have been utilized to identify genes and processes correlated with increased stress tolerance, including transcriptomic profiling of cells responding to industrial stresses [18][19][20][21], genetic mapping in pairs of strains with divergent phenotypes [22][23][24][25], and directed evolution to compare strains selected for stress tolerance with starting strains [26][27][28][29]. However, in many cases the genes identified from such studies do not have the intended effect when engineered into different genetic backgrounds [30][31][32][33]. One reason is that there are likely to be substantial epistatic interactions between the genes identified in one strain and the genetic background from which it was identified [34]. A better understanding of how tolerance mechanisms vary across genetic backgrounds is an important consideration in industrial engineering.
Exploring variation in HT tolerance across strain background could also reveal additional defense mechanisms. The majority of functional studies in Saccharomyces cerevisiae are carried out in a small number of laboratory strains that do not represent the rich diversity found in this species [35,36]. The exploration of natural diversity in S. cerevisiae has revealed a wide range of genotypic and phenotypic variability within the species [36][37][38][39][40]. In some cases, trait variation is correlated with genetic lineage [36,[41][42][43], indicating a strong influence of population history. At least 6 defined lineages have been identified in the species, including strains from Malaysia, West Africa, North America, Europe/vineyards, and Asia [41] as well as recently identified populations from China [38,44]. In addition to genetic variation, phenotypic variation has cataloged natural differences across strains, in transcript abundance [37,45,46], protein abundance [47][48][49], metabolism [50][51][52], and growth in various environments [32,36,37,42,[52][53][54]. Thus, S. cerevisiae as a species presents a rich resource for dissecting how genetic variation contributes to phenotypic differences. In several cases this perspective has benefited industry in producing novel strains by combining genetic backgrounds or mapping the genetic basis for trait differences [25,[55][56][57][58][59].
We used genome-wide association (GWA) in S. cerevisiae strains responding to synthetic hydrolysate (SynH), both to identify new genes and processes important for HT tolerance and to explore the extent to which genetic background influences mechanism. We tested 20 genes associated with HT tolerance and swapped alleles across strains to validate several allele-specific effects. However, in the process of allele exchange we discovered striking differences in gene contributions to the phenotype: out of 14 gene knockouts tested in two strains with opposing phenotypes, 8 (57%) had a statistically significant effect on HT tolerance in one of the backgrounds but little to no significant effect in the other background. In most of these cases, the specific allele had little observable contribution to the phenotype. Thus, although GWA successfully implicated new genes and processes involved in HT tolerance, the causal variation in the tested strains is not at the level of the allele but rather whether or not the gene's function is important for the phenotype in that background. This raises important implications for considering natural variation in functional networks to explain phenotypic variation.

Genetic variation across 165 S. cerevisiae strains
We obtained 165 Saccharomyces cerevisiae strains, representing a range of geographical and ecological niches, that have high quality whole genome sequencing reads (coverage~30X), coming from published sequencing projects across the yeast community [39, 42, 52, 60] (S1 Table). We identified 486,302 high quality SNPs (see Methods). 68% of them had a minor allele frequency less than 5%. Nucleotide variation compared to the well-studied S288c-derived reference strain varied from as low as 0.08% for the closely related W303 lab strain and as high as 0.72% for the bakery strain YS4 (S1 Table). The majority of strains were largely homozygous (in some cases due to strain manipulation by sequencing projects); however, we identified 21 strains with >20% heterozygous sites. Most of these were from natural environments (11 strains) but they also included clinical samples (5 strains), baking strains (3 strains), a sugar cane fermenter (1) and a laboratory strain (FL100, which was scored as 98% heterozygous and may have mated with another strain in its recent history (S1 Table)).
Sixty-three percent of the variants were present in coding regions (S2 Table), which is lower than random expectation (since 75% of the yeast genome is coding) and consistent with purifying selection acting on most gene sequences. Indeed, coding variants predicted to have high impact, such as SNPs that introduce a stop codon, eliminate the start codon, or introduce a defect in the splicing region, were very rare (0.004% of genic SNPs)-a third of these were in dubious ORFs (22%) or genes of unknown function (8%) [61] that are likely nonfunctional and under relaxed constraint. However, 54 genes with debilitating polymorphisms are reportedly essential in the S288 background; nearly half of these polymorphisms are present in at least 3 strains and in some cases are lineage specific (S3 Table). Tolerance of these polymorphisms could arise through duplication of a functional gene copy [62], but could also arise due to evolved epistatic effects as has been previously reported [63], highlighting the complexity behind genetic networks and the role of genetic variation in determining their regulation.
Principal component analysis of the genomic data recapitulated the known lineages represented in the collection, including the European/wine, Asian/sake, North American (NA), Malaysian, West African (WA), and mosaic groups [36, 41, 42, 64] (S1 Table). Our analysis split the West African population into three subgroups not previously defined ( Fig 1A). Construction of a simple neighbor-joining (NJ) tree broadly confirmed the population groups present in the 165-strain collection ( Fig 1B).

Phenotypic variation in SynH tolerance is partly correlated with ancestral group
We scored variation in lignocellulosic hydrolysate tolerance in several ways. Strains that are sensitive to hydrolysate grow slower and consume less sugars over time [65], thus we measured final cell density and percent of glucose consumed after 24 hours to represent SynH tolerance. Growth and glucose consumption were significantly correlated (R 2 = 0.79), although there was some disagreement for particular strains (including flocculant strains) (S1 Table). We also determined tolerance to HTs specifically, to distinguish stress inflicted by HTs from effects of the base medium that has unusual nutrient composition and high osmolarity due to sugar concentration. To do this, we calculated the relative percent-glucose consumed and final OD 600 in media with (SynH) and without HT toxins (SynH-HTs, see Methods) (S1 Table). Tolerance to SynH base medium without toxins (SynH-HT) and SynH with the toxins was only partly correlated (R 2 = 0.48) (S1 Fig), suggesting that there are separable mechanisms of growing in base medium and surviving the toxins.
There is wide variation in tolerance to lignocellulosic hydrolysate that partly correlates with populations (Fig 2, S1 Fig). North American and Malaysian strains displayed the highest The entire collection of high quality SNPs (486,302) was used as input for principal component analysis (PCA) (A) and to generate a neighbor-joining (NJ) tree (B). Populations assigned in circles were defined manually using published population structure data. Strains are color-coded according to genetic similarity, with matching colors between the PCA and NJ tree (generated by Adegenet [110]). The population and/or niche is represented by the key, with the number of strains in each group indicated in parentheses.
https://doi.org/10.1371/journal.pgen.1007217.g001 tolerance to SynH. As expected, phenotypic variation within each population was related to genetic variation, e.g. West African strains in Population 5 showed low genetic and phenotypic variation while mosaic strains with genetic admixture showed the widest range of phenotypes.

Genome-wide association analysis reveals genes associated with SynH tolerance
We used GWA to map the genetic basis for the differences in SynH tolerance, for each of the four phenotypes introduced above. The population signatures in S. cerevisiae are problematic for GWA, since the strong correlations between phenotype and ancestry obscure the identification of causal polymorphisms [66,67]. To overcome this, we incorporated a large number of mosaic strains in the analysis and used a mixed-linear model to account for strain relationships, as implemented in the program GAPIT [68] (see Methods). We used as input SNPs that were present in at least 3 strains, eliminating 42% of SNPs in the dataset (see Methods). Of the remaining SNPs, 45% have a minor allele frequency of less than 5%; only those with an allele frequency >2% were used for GWA. GWA identified loci whose variation correlated with phenotypic variation. None of the GWA-implicated loci passed the stringent Bonferroni p-value correction based on the number of effective tests (see Methods), which is not uncommon for GWA at this scale [42, 69,70]. We therefore used a somewhat arbitrary p-value cutoff of 1e-04 and performed additional filtering to minimize false positive associations (see Methods).  The combined analysis yielded 76 SNPs that met our p-value threshold (S4 Table,  Thirty-eight of these SNPs, linked to 33 genes, passed additional filtering (See Methods, Table 1). Of these, 17 SNPs are associated with growth in SynH, while 23 SNPs are associated with tolerance to HTs specifically ( Table 1). Eight of the SNPs are intergenic and 20 are located within genes, with 13 of those predicted to change the coding sequence. Although we would expect that SNPs linked to HT tolerance should be identified in both sets of analyses, only 2 SNPs were significantly associated with both SynH and HT tolerance. This almost certainly highlights limited statistical power with the small set of strains used here. For most SNPs, the allele associated with tolerance was more frequent in our strain collection (Fig 3A), but for some it was the allele associated with sensitivity that was nearly fixed. We carried out additional GWA filtering to ensure that results were not driven by population structure (see Methods), since we note that many of sensitive alleles were prominent in the Asian population (S3 Fig). As expected for a largely additive trait, there was a significant linear correlation between the number of deleterious alleles a strain harbored and its tolerance to hydrolysate (R 2 = 0.48, p = 2.2e-16, Fig 3B).
Interestingly, the genes associated with the 38 implicated SNPs capture functionally related processes, suggesting mechanistic underpinnings of hydrolysate tolerance. Lignocellulosic hydrolysate contains a large number of toxins that affect multiple cellular functions and can target energy stores, membrane fluidity, protein and DNA integrity, and other processes [10,65]. Our analysis implicated several genes involved in redox reactions (ADH4, ALD3), protein folding or modification (CYM1, UBP5, UFD2, AOS1), ergosterol or fatty acid synthesis (ERG12, HMG2, NSG2), DNA metabolism and repair (REV1, DAT1, MCM5, SHE1), mRNA transcription and export (LEU3, SIR3, ELF1, RIM20, MEX67), mitochondrial function (MNE1, MAS1), and flocculation (FLO1, FLO10). Several of these processes were already known to be associated with hydrolysate stress, including flocculation [71], ubiquitin-dependent processes that may be linked to protein folding challenges [13,72,73], and sterol biosynthesis which affects tolerance to multiple stresses present in this media [32, 74,75]. Nearly a third of these genes were identified as differentially expressed in our previous study comparing strain responses to SynH and rich medium [32], although this was not enriched above what is expected by chance. Thus, although gene expression differences can be informative in suggesting affected cellular processes, many of the genes implicated by GWA cannot be predicted by expression differences, especially SNPs that affect function without altering gene expression. Additional genes identified here belong to functional groups previously identified in our differential expression analysis, such as amino-acid and NAD biosynthesis.

Gene knockouts confirm functional requirement for some implicated genes
We sought to confirm the importance of the GWA-implicated genes in SynH tolerance, first through gene-knockout analysis and then with allelic replacement in two different strains backgrounds. We began by knocking out 19 of the implicated genes in the tolerant North American strain, YPS128. Of these, 37% (7/19) of the knockout mutants had a significant phenotype when grown on SynH: four displayed decreased SynH tolerance, while 3 showed increased performance (Fig 4A). We note that 4 of the 7 knockouts had a mild phenotypic effect in standard growth medium (that was generally exacerbated in SynH), while 3 of these had a phenotype only in response to SynH (S4 Fig). The most significant knockouts decreasing tolerance in the YPS128 strain included the transcription factor LEU3, ribosomal protein RPL21B, protein phosphatase subunit SAP190, and to a milder extend the mitotic spindle protein SHE1. None of these genes has been directly implicated in tolerance to hydrolysate in previous studies. The effect of deleting LEU3, which encodes the leucine-responsive transcription factor, was intriguing, since our prior work reported that amino-acid biosynthesis genes are induced specifically in response to HTs [32]. To confirm that this response was due to the toxicity found in the media and not due to amino acid shortage in SynH, we compared growth in synthetic complete (SC) medium, which has similar levels of branched-chain amino acids compared to SynH. The LEU3 knockout strain grew as well as the wild type in SC, but it grew to 54% lower final density in SynH-HT medium and 79% lower density in SynH medium with the toxins added ( Fig 5A). The defect was not fully complemented by supplementing synthetic hydrolysate with 10X the normal amino acid mix (Fig 5B), indicating that amino acid shortage in the medium is unlikely to fully explain the growth defect.
The most striking phenotypic improvement was caused by deletion MNE1, encoding a splicing factor for the cytochrome c oxidase-encoding COX1 mRNA [76]. Aerobically, the mutant grew to roughly similar cell densities but consumed 44.7% more glucose and generated 64% more ethanol than the wild type, generating significantly more ethanol per cell (S5 Fig). A logical hypothesis is that this mutant has a defect in respiration and thus relies more on glycolysis to generate ATP and ethanol than wild-type cells [76]. Under this hypothesis, the effect of the mutation should be normalized when cells are grown anaerobically because both the  mutant and wild type must rely on fermentation. However, under anaerobic conditions the mutant grew significantly better than the wild type (Fig 6A), consumed 70% more glucose ( Fig  6B), and produced 63% more ethanol after 24-hour growth ( Fig 6C). Thus, a simple defect in

Extensive background effects influence gene involvement in SynH tolerance
We next knocked out 16 genes in the sensitive strain YJM1444, with the intention of allelic exchange ( Fig 4B). We were unable to recover knockouts for some of the genes tested in YPS128, but of those we acquired 14 overlapped the YPS128 knockouts, and two (REV2 and HMG2) that we were unable to knock out in the tolerant strain were added. Remarkably, knockouts had strikingly different effects between the two genetic backgrounds-while three of the gene deletions affected hydrolysate tolerance in YJM1444, there was no overlap with the gene deletions causing a statistically significant effect in YPS128 (although some mild effects may be below our statistical power to detect). The three knockouts specific to YJM1444 improved SynH tolerance and included two genes involved in sterol biosynthesis (NSG2 and HMG2) and one involved in flocculation ( Fig 4B). In fact, deletion of FLO1 dramatically reduced the flocculation phenotype of YJM1444 and resulted in >236% increased glucose consumption in SynH. This single mutation converted YJM1444 tolerance to the level of SynH tolerance seen in YPS128 (S6 Fig). To test that this phenotypic effect was directly caused by the FLO1 allele, we deleted its paralog FLO5, which caused neither a change in flocculation nor increased glucose consumption of the culture (S7 Fig).
There appeared to be subtle, but not significant, effects of the MNE1 deletion in YJM1444 and we wondered if the was obscured by flocculation. Therefore, we measured glucose consumption in high-rpm shake flasks that disrupt flocculation. Indeed, MNE1 deletion had a significant benefit under these conditions; however, the magnitude of the effect was more subtle than MNE1 deletion in YPS128 (S8A Fig). We also tested this deletion in an industrial strain, Ethanol Red (E. Red). Deletion of MNE1 in a haploid spore derived from E. Red produced a minor, reproducible benefit although it was not statistically significant (S8A Fig). Nonetheless, these results show that MNE1 plays a role in SynH tolerant, albeit to different levels, in three different strain backgrounds.

Genetic background effects dominate the effect of allelic variation in HT tolerance
We tested allelic effects in two ways. First, we introduced a plasmid-borne copy of the tolerant allele or sensitive allele (S5 Table) into YPS128 lacking the native gene, and measured percent  Fig 7A). The assay was fairly noisy, nonetheless, there was a clear effect for the FLO1 allele, which caused YPS128 to become flocculant and dramatically decreased growth in the SC with HTs. We did not observe other allele-dependent effects that overcame the variability of the assay, including for the genes whose knockout produced a defect in YPS128. Second, we performed reciprocal hemizygosity analysis for six genes, including three genes that whose deletion produced differential effects in YPS128 and YJM1444. We crossed the YPS128 and YJM1444 backgrounds such that the resulting diploid was hemizigous for either the tolerant or sensitive allele (Fig 7B). In this case, none of the six genes had an allele-specific effect-surprisingly, this included FLO1 for which there was clear allelic impact in the haploid backgrounds. We realized a unique phenotype in the YPS128-YJM1444 hybrid: whereas the strain is heterozygous for the functional FLO1 allele, the hybrid lost much of the flocculence of the YJM1444 strain (S8B Fig). FLO1 expression is known to be repressed in some diploid strains [77]. Thus, simply mating the strains in effect created a new genetic background that changed the allelic impact of the gene.
We wondered if this effect explained the lack of allele-specific phenotypes for other implicated genes. We therefore created homozygous deletions in the diploid hybrid for six genes whose deletion had strain-specific impacts in the haploids (Fig 7C). Two of the knockouts (leu3Δ and sap190Δ) produced a defect in the hybrid, similar to the effect seen in YPS128. Homozygous deletion of MNE1 produced a unique growth defect in 24-well plates that was not seen in the haploids or the hemizigous diploids. This appeared to be due to increased flocculation in the hybrid diploid; growth in shake flasks to disrupt flocculation resulted in a mild but statistically insignificant benefit to the hybrid when grown in flasks, similar to that seen for YJM1444. In contrast, deletion of RIM20 or FLO1 had no effect under these growth conditions-this explains the lack of allele specific effect, because the genes are no longer important in this background and under these growth conditions.
Mating YJM128 and YJM1444 created a new background that surpassed performance of YPS128 (Fig 7D). We wondered if hybridization could benefit other strains as well. We mated industrial strain E. Red crossed to YJM1444 and YPS128. E. Red and YJM1444 were both scored as sensitive and perform similarly in SynH (Fig 7D). However, the hybrid had a striking jump in SynH tolerance, exceeding the tolerance of YPS128. This benefit may be in part because the new diploid background changes the flocculation phenotype. On the other hand, YJM1444 and E. Red harbor alternate alleles at 71% of the SNPs implicated by GWA, raising the possibility that complementation of recessive alleles could also contribute to the strain improvement (see Discussion).

Discussion
Engineering strains for tolerance to lignocellulosic hydrolysate has proven difficult due to the complex stress responses required to deal with the combinatory effects of toxins, high osmolarity, and end products such as alcohols and other chemicals. Even when the cellular targets of stressors are known, the mechanisms for increasing tolerance are not always clear. We leveraged phenotypic and genetic variation to implicate new mechanisms of hydrolysate tolerance, by finding correlations between phenotypic and genetic differences among a collection of Saccharomyces cerevisiae strains, which allowed us to implicate specific genes and alleles involved in hydrolysate tolerance. The results indicate several important points relevant to engineering improved hydrolysate tolerance and genetic architecture of tolerance more broadly.
Perhaps the most striking result is the level to which gene involvement varies across the strains in our study. We expected that swapping alleles of implicated SNPs should contribute to variation in the phenotype. Most alleles did not detectably affect tolerance, although it is likely that they may have a minor contribution below our limit of detection. Indeed, strains that harbor more deleterious alleles are significantly more sensitive to SynH (Fig 3B), as expected for an additive trait. But at the same time, we uncovered significant variation in whether the underlying gene was involved in the phenotype. Among the genes that we were able to knockout in both strains (14 genes), 57% produced a phenotype (to varying levels and significance) in one of the two strains we tested. This indicates substantial epistatic interactions with the genetic background, such that the gene is important in one strain and but dispensable in another. Even more striking is the case of FLO1: knocking out the functional gene in YJM1444 produced a major benefit to that strain, whereas introducing the functional allele to YPS128 was very detrimental to SynH tolerance. Yet neither the allele nor the gene itself influenced SynH tolerance in the hybrid, because the hybrid is much less flocculant under these conditions (despite carrying functional YJM1444 FLO1 gene).
While it may not be surprising that gene knockouts result in quantitatively different phenotypes, we did not expect that most knockouts would have no detectible effect in specific backgrounds. It will be important to investigate the extent to which this effect is true in other organisms and for other phenotypes. However, evidence in the literature hints at the breadth of this result: several genes are required for viability in one yeast strain but not another [63,78], while overexpression of other genes produces a phenotype in one background but not others [32]. Genetic background effects on gene contributions have been reported before, in yeast and other organisms [35, [79][80][81][82][83][84]; however, the extent to which different genes appear to be involved in toxin tolerance in the different strains studied here suggests an important consideration that is underappreciated in GWA analysis: that the network of genes contributing to the phenotype could be largely different depending on genomic context. Dissecting these epistatic interactions is likely to be daunting, since a major challenge in most GWA studies remains identifying the epistatic interactions due to the high statistical hurdle [34, 85,86]. We propose that emerging network-based approaches to augment linear contributions will be an important area in identifying genetic contributions in the context of background-specific effects.
QTL mapping has allowed the characterization of the genetic architecture of industrially relevant stresses, including tolerance to ethanol [22,87], acetic acid [23, 56], and plant hydrolysate [25] among many others [24, [88][89][90]. But while this method exploits the genetic diversity between two strains, with GWA we were able to study a much larger collection of genetic diversity, providing unique insights. SynH tolerance is clearly a complex trait, with many genes likely contributing. Previous studies have shown that part of the growth inhibition can be explained by a re-routing of resources to convert toxins into less inhibitory compounds [18,19,[91][92][93][94] and to repair damage generated by reactive oxygen species in membranes and proteins [13,14,95]. One of the most significant effects was caused by deletion of LEU3, the transcription factor regulating genes involved in branched amino acid biosynthesis. Interestingly, weak acids have been shown to inhibit uptake of aromatic amino acids causing growth arrest [96], and it is possible that Leu3 is required to combat this effect. Chemical genomic experiments suggest an additional role for Leu3 in managing oxidative stress in the cell [97], which could relate to oxidative stressors in hydrolysate [13,14,32]. We also uncovered a gene, Mne1, that when deleted significantly increases ethanol production in SynH. Mne1 aids the splicing of COX1 mRNA [76] and has not been previously linked to stress tolerance. Interestingly, MNE1 mutants produced more ethanol per cell aerobically, but also grew substantially better in SynH anaerobically, raising the possibility that Mne1 plays an additional, unknown role in cellular physiology that can be utilized to increase fermentation yields. Finally, although flocculation has been previously shown to increase cell survival in hydrolysate [71], our study showed that flocculation reduced the rate of sugar consumption in the culture, likely because cells in the middle of the clump are nutrient restricted. Together, these results shed new light on SynH tolerance and mechanisms for future engineering.
Our results raise broader implications for strain engineering, based on the genetic architectures uncovered here. Given the implication of gene-by-background interactions, the best route for improving strain performance may be crossing strains for hybrid vigor [98][99][100].
Indeed, we unexpectedly generated a strain that outperformed the tolerant YPS128, by crossing two poor performers in SynH. This improved vigor could emerge if the hybrid complements recessive deleterious alleles in each strain, or if mating creates a new genetic background that changes the requirements (and fitness) of the strain. We believe that both models-weak but additive allelic contributions in the context of epistatic background effectsare at work in our study. For additive traits, GWA and genomic studies can have significant practical power, by predicting where individual strains fall on the genotype-phenotype spectrum and by suggesting which strains should be crossed for maximal phenotypic effect.

Strains
Strains used in the GWA are listed in S1 Table Gene knockouts were performed in strains derived from North American strain YPS128 and mosaic strain YJM1444. The homozygous diploid parental strains were first engineered into stable haploids by knocking out the homothallic switching endonuclease (HO) locus with the KAN-MX antibiotic marker [101], followed by sporulation in 1% potassium acetate plates and dissection of tetrads to attain heterothallic MATa and MATα derivatives. Gene knockouts were generated through homologous recombination with the HERP1.1 drug resistance cassette [102] and verified by 3 or 4 diagnostic PCRs (validating that the cassette was integrated into the correct locus and that no PCR product was generated from within the gene that was deleted). Most knockouts removed the gene from ATG to stop codon, but in some cases (e.g. kdx1) additional flanking sequence was removed, without removing neighboring genes. Genes from YPS128 or strains carrying the sensitive allele (S5 Table) were cloned by homologous recombination onto a CEN plasmid, taking approximately 1,000 bps upstream and 600 bps downstream from each genome, and verified by diagnostic PCR. Phenotyping of strains harboring alternate alleles on plasmids was performed in as previously described, except that the pre-culture was grown in YPD with 100 mg/ L nourseothricin (Werner BioAgents, Germany) to maintain the plasmid expressing each allele. We note that plasmid-bourn expression of the gene complemented the gene-deletion phenotype, where applicable, in all cases tested (not shown). Allele specific effects were additionally tested by reciprocal hemizygosity analysis (RHA) [103]. The HO locus was replaced with the nourseothricin resistance cassette (NAT-MX) for each mating type of YPS128 and YJM1444. These were then crossed with the appropriate deletion strain of opposite mating type and harboring the KanMX cassette, selecting for mated cells resistant to both drugs, to generate heterozygous strains that were hemizigous for the gene in question (crosses shown in S6 Table).

Media, growth, and phenotyping conditions
Synthetic Hydrolysate (SynH) medium mimics the lignocellulosic hydrolysate generated from AFEX ammonium treated corn stover with 90 g glucan/L loading and was prepared as in Sardi et al. (2016). Two versions were prepared to represent the complete hydrolysate (SynH) and the hydrolysate without the hydrolysate toxin cocktail (HT) (SynH-HT), as previously published [32].
Phenotyping for GWA, gene deletion assessment, and RHA, was performed using high throughput growth assays in 24 well plates (TPP1 tissue culture plates, Sigma-Aldrich, St. Louis, MO). To prepare the cultures, 10 μl of thawed frozen cell stock were pinned onto YPD agar plates (1% yeast extract, 2% peptone, 2% dextrose, 2% agar) and grown for 3 days at 30˚C. Cells were then pre-cultured in 24 well plates containing 1.5 ml of YPD liquid, sealed with breathable tape (AeraSeal, Sigma-Aldrich, St. Louis, MO), covered with a lid and incubated at 30˚C while shaking for 24 h. Next, 10 μl of saturated culture was transferred to a 24 well plate containing 1.5 ml of SynH or SynH-HT where indicated, and grown as the preculture for 24 h. Cell density was measured by optical density at 600 nm (OD 600 ) as 'final OD'. Culture medium collected after cells were removed by centrifugation was used to determine glucose concentrations by YSI 2700 Select high performance liquid chromatography (HPLC) and refractive index detection (RID) (YSI Incorporated, Yellow Springs, OH). Biological replicates were performed on different days.
For GWA, we used four different but related phenotype measures of cells growing in SynH or SynH-HTs: 1) final OD 600 as a measure of growth, 2) percent of starting glucose consumed after 24 hours in SynH, 3) HT tolerance based on OD 600 (calculated as the ratio of final OD 600 in SynH versus final OD 600 in SynH -HTs), and 4) HT tolerance based on glucose consumption (calculated as the ratio of glucose consumed in SynH versus in SynH -HTs). Strains and phenotype scores are listed in S1 Table. Initial phenotyping for GWA was performed in biological duplicates; knockout strains and hemizigous strains were phenotyped in five biological replicates to increase statistical power, whereas homozygous deletion strains were phenotyped in triplicate. Replicates for each batch of strains shown in each figure were performed on separate days, for paired statistical analysis.
Experiments done for allele replacements expressed on plasmids were performed in glass tubes using modified synthetic complete medium (SC) with high sugar concentrations and the toxin cocktail where indicated (Sardi et al 2016) to mimic SynH but with no ammonium to support nourseothricin selection [104] (1.7 g/L YNB w/o ammonia sulfate and amino acids, 1 g/L monosodium glutamic acid, 2 g/L amino acid drop-out lacking leucine, 48 μg/L leucine, 90 g/L dextrose, 45 g/L xylose). This was required since nourseothricin selection does not work in highammonium containing SynH. First, we precultured strains carrying plasmids in SC medium with nourseothricin (200 ug/ml) for 24 h. Next, we inoculated a fresh culture at a starting OD 600 of 0.1 in 7 ml of the modified synthetic complete medium with nourseothricin (200 ug/ml) and HTs. Cultures were grown for 24 h and phenotyped as described above. Replicates were performed on different days, and thus samples were paired by replicate date for t-test analysis.
Anaerobic phenotyping was performed in the anaerobic chamber, where cells were grown in flasks containing 25 ml SynH or SynH-HT and maintained in suspension using a magnetic stir bar. Ethanol production was measured over time by HPLC RID analysis. Paired t-test analysis was performed to determine significance, pairing samples by replicate date.

Genomic sequencing and analysis
We obtained publicly available whole genome sequencing reads from Saccharomyces cerevisiae sequencing projects [39,42,52,60]. Sequencing reads were mapped to reference genome S288C (NC_001133, version 64 [105]) using bwa-mem [106] with default settings. Single nucleotide polymorphisms (SNPs) were identified using GATK [107] Unified Genotyper, analyzing all the strains together to increase detection power. GATK pipeline included base quality score calibration, indel realignment, duplicate removal, and depth coverage analysis. Default parameters were used except for -mbq 25 to reduce false positives. Variants were filtered using GATK suggested criteria: QD < 2, FS > 60, MQ < 40. A dataset with high quality SNPs was generated using VCFtools [108] by applying additional filters of a quality value above 2000 and excluding sites with more that 80% missing data. Genetic variant annotation was performed using SNPEff [109]. Principal component analysis and the neighbor-joining tree were performed with the R package Adegenet 1.3-1 [110] using the entire collection of high quality SNPs (486,302 SNPs). SNP data are available in the EBI under accession number PRJEB24747.

Genome-wide association analysis
Correlations between genotype and phenotype were performed using a mixed linear model implemented in the software GAPIT [68]. Only SNPs with a minor allele frequency (MAF) of at least 2% were used for this analysis (282,150 SNPs). Multiple models, each incorporating a different number of principal components to capture population structure (from 0-3), were analyzed. The final model was manually chosen as the one with the greatest overall agreement between the distribution of expected and the observed p-values, i.e. based on QQ plots with the least skew across the majority of SNPs. We performed four analyses, one for each for the four related phenotypes measured. The model used to map SynH final OD 600 and SynH percent glucose consumed used 0 principal components, with population structure corrected using only the kinship generated by GAPIT. The model used to map HT tolerance based on relative final OD 600 used 2 principal components, and the model to map HT tolerance based on glucose consumed incorporated 1 principal component. The threshold for significance accounting for multiple-test correction was identified by dividing the critical p-value cutoff of 0.05 by the number of independent tests estimated by the SimpleM method [111], which decreased the number of tests from 282,150 to 137,398 to produce a p-value threshold of 3.6e-7 [112]. However, none of our tests passed this threshold, which is likely overly conservative. We therefore used a p-value threshold of 1e-04 to identify genes for detailed follow-up analysis. We realized that the extreme phenotypes of Asian/sake strains coupled with their strong population structure might be confounding the analysis [66]. Therefore, to further reduce the chance of false positives due to residual population influences, we reran the analyses without the 11 sake strains and removed from the original list of significant SNPs those with p>5e-3. For each locus carrying a significant SNP, we plotted phenotypic distributions for each possible genotype. We focused subsequent downstream analysis on individual SNPs whose effects were additive across strains that were heterozygous and homozygous at that site, assessed visually. Genes affected by each SNP were determined by the SNPEff annotation, which predicted the effect of variants on genes. YPS128 deletion mutants were transformed with an empty plasmid (pKI), a plasmid carrying the tolerant allele (pT), or a plasmid carrying the sensitive allele (pS). Cells were grown in SynH, which does not allow the use antibiotics due to the presence of ammonium sulfate in the medium. Although most cells likely retain the plasmid over the duration of this experiment, we were unable to detect allele-specific effects that overcome the variation in the experiments. (EPS) S1  Table. Initial identification of SNPs correlated with SynH tolerance. Initial set of SNPs whose p-value passed our threshold in any of the GWA are shown, ranked by significance. Phenotypes to which the SNP was associated are listed in the first column; (1) Final OD 600 in SynH, (2) Percent of glucose consumed in SynH, (3) HT tolerance based on OD 600 , (4) HT tolerance based on glucose consumed. SNPs identified in multiple GWA, the most significant pvalue is listed in the last column. SNP type was determined by SNPeff: syn, synonymous; mis, missense; int, intergenic.