Complex Ancestries of Lager-Brewing Hybrids Were Shaped by Standing Variation in the Wild Yeast Saccharomyces eubayanus

Lager-style beers constitute the vast majority of the beer market, and yet, the genetic origin of the yeast strains that brew them has been shrouded in mystery and controversy. Unlike ale-style beers, which are generally brewed with Saccharomyces cerevisiae, lagers are brewed at colder temperatures with allopolyploid hybrids of Saccharomyces eubayanus x S. cerevisiae. Since the discovery of S. eubayanus in 2011, additional strains have been isolated from South America, North America, Australasia, and Asia, but only interspecies hybrids have been isolated in Europe. Here, using genome sequence data, we examine the relationships of these wild S. eubayanus strains to each other and to domesticated lager strains. Our results support the existence of a relatively low-diversity (π = 0.00197) lineage of S. eubayanus whose distribution stretches across the Holarctic ecozone and includes wild isolates from Tibet, new wild isolates from North America, and the S. eubayanus parents of lager yeasts. This Holarctic lineage is closely related to a population with higher diversity (π = 0.00275) that has been found primarily in South America but includes some widely distributed isolates. A second diverse South American population (π = 0.00354) and two early-diverging Asian subspecies are more distantly related. We further show that no single wild strain from the Holarctic lineage is the sole closest relative of lager yeasts. Instead, different parts of the genome portray different phylogenetic signals and ancestry, likely due to outcrossing and incomplete lineage sorting. Indeed, standing genetic variation within this wild Holarctic lineage of S. eubayanus is responsible for genetic variation still segregating among modern lager-brewing hybrids. We conclude that the relationships among wild strains of S. eubayanus and their domesticated hybrids reflect complex biogeographical and genetic processes.


Introduction
Humans changed from living in hunter-gatherer societies to agricultural societies in part through the domestication of animals and plants [1,2]. At the same time, humans began unwittingly domesticating microorganisms for the production of fermented beverages and foods, but the underlying source populations and genetic processes for microbial domestication are not well understood [3]. Beer is the most common fermented beverage in the world and can be classified as ale or lager, depending on the fermentation conditions and yeasts used. Ale-style beers are mainly produced by strains of S. cerevisiae [4]. In contrast, 94% of the beer market is dominated by lager-style beers, which are fermented at colder temperatures by allopolyploid hybrids of S. cerevisiae x S. eubayanus (syn. S. pastorianus syn. S. carlsbergensis) [5].
Although the S. cerevisiae parent of lager yeasts seems to be closely related to modern ale strains [6,13,15], identifying close relatives of the S. eubayanus parent has proven more challenging. Since the discovery of the species in 2011 in Patagonia, South America [5], rare strains of S. eubayanus have been isolated in North America [20], Asia [21], and New Zealand [22]. Other than interspecies hybrids [5,23], no European isolates of S. eubayanus have been reported. Genome sequence comparisons have shown the Patagonian type strain to be 99.56% identical to the S. eubayanus subgenome of a lager-brewing hybrid [5], while a Tibetan isolate was shown to be 99.82% identical [21].
Previous population and phylogenetic studies of S. eubayanus suggest that it may contain up to five known phylogenetically distinct clades. Two distinct and highly diverse populations have been described in South America (Patagonia A and Patagonia B) where they have been commonly associated with Nothofagus [20], as well as Araucaria araucana [24]. Recently, an isolate from New Zealand was shown to belong to the Patagonia B clade by multi-locus phylogenetic analysis [22]. Previously isolated North American strains were shown to be the result of recent admixture between the two Patagonian populations [20]. Three lineages have been isolated in Asia, mostly in association with Quercus, including the Tibetan lineage and two earlydiverging lineages that could be regarded as distinct subspecies (Sichuan and West China) [21]. Analyses of population differentiation and genetic diversity have not been performed on the latter three lineages, and all five lineages have not been thoroughly analyzed together in the same phylogenetic study.
To improve our understanding of the genetic diversity and phylogeography of S. eubayanus and its domesticated European hybrids, we have integrated existing multi-locus datasets and added several new isolates from North America (North Carolina, Washington, and New Brunswick). To extend these analyses, we have also performed whole genome sequencing (WGS) on available isolates. These results support the existence of a relatively low-diversity Holarctic lineage, which includes wild isolates from Tibet and North Carolina, as well as the hypothetical ancestor of the European interspecies hybrids. Depending on the region of the genome examined, this Holarctic lineage is embedded within or sister to one of the Patagonian populations of S. eubayanus. Genomic analyses further show that none of the known wild S. eubayanus strains is the sole closest relative to the lager-brewing hybrids. Instead, we infer that lager yeasts drew from alleles that were segregating among a Holarctic lineage of S. eubayanus.

Broad Saccharomyces eubayanus geographic and ecological distribution
Our ongoing high-sugar enrichment surveys of yeast from soil, leaves, bark, mushrooms, and other natural substrates in North America isolated seven new strains of S. eubayanus: one from Washington State, USA; two from North Carolina, USA; and four from New Brunswick, Canada ( Fig 1A, S1 Table). The new S. eubayanus strains were isolated from novel tree hosts, including the bark of Cedrus sp., the bark and soil of Pinus taeda, and the bark of Quercus rubra. North American isolates of S. eubayanus remained quite rare overall (<1% of yeast isolates), except at specific sampling sites, and were only slightly biased toward the tree order Fagales (S1 Text, S1 Fig).

Some North American strains are closely related to lager-brewing yeasts
To determine how the new North American strains are related to South American [5,20], Asian [21], and New Zealand strains [22], we performed multi-locus phylogenetic analyses. Specifically, we partially sequenced nine nuclear coding sequences and three nuclear intergenic regions, consisting of a total of~9.8 kbp, as well as one mitochondrial gene (500 bp). Existing multi-locus data was utilized at this stage, rather than WGS data, because the Chinese strains are not available for study.
North American strains displayed three different types of ancestry: 1) the strain from Washington was embedded within the Patagonia B clade and was more closely related to the strain from New Zealand than any other Patagonia B strain, 2) the strains from New Brunswick were identical at these loci to three previously characterized admixed strains from Wisconsin, USA [20], and 3) the strains from North Carolina were closely related to the strains from Tibet and lager beer (Fig 1B, S1 Text). This latter "Holarctic" subgroup of strains (Tibet, North Carolina, and Lager) was well supported phylogenetically and was more closely related to the Patagonia B clade than to any other population ( Fig 1B). Phylogenetic supernetwork analysis and examination of the individual gene trees revealed a complex history for the strains in the Patagonian populations and their close Holarctic relatives, but it failed to unambiguously identify the closest relative of lager yeasts (S2 and S3 Figs, S1 Text).
To determine the consensus relationships among the wild populations of S. eubayanus and the domesticated lager-brewing hybrids, we compared the complete genome sequences of 33 strains, including representatives of both known lager yeast lineages (Saaz and Frohberg) and S. uvarum as the outgroup. In contrast to previously reported topologies citing a personal communication [25] and weak support in a multi-locus dataset [22], WGS data strongly agreed with our multi-locus phylogenetic tree and placed the Patagonia A population as an outgroup to a clade containing the Patagonia B population plus the strains from the Holarctic lineage ( Fig 1C). Even with WGS data, it remained unclear whether the Holarctic subgroup was embedded within the Patagonia B population or was sister to it. In contrast, the New Zealand strain was closely related to the Washington strain, both falling within Patagonia B. These analyses further showed that, on average, the S. eubayanus subgenomes of both the Saaz and Frohberg lager yeast lineages were more closely related to the representative strain from Tibet than to known strains from North Carolina or Patagonia. Nonetheless, analysis of the full single nucleotide polymorphism (SNP) dataset revealed extensively conflicting phylogenetic signals, which are displayed by the presence of nodes subtended by multiple edges in a phylogenetic network (Fig 1D).
No wild isolate is the sole closest relative of lager-brewing yeasts Concatenated phylogenies display the consensus topology supported by a dataset, which can obscure phylogenetic incongruence due to recombination, incomplete lineage sorting, and other biological processes. When genome-scale datasets are used, maximum support values can be obtained, even when different loci strongly support conflicting topologies [26,27]. To explore how recombination within and between populations has influenced the ancestry of S. eubayanus strains, we developed a simple and easily visualized test statistic and assessed its performance on one of the seven nearly identical admixed strains from North America ( Fig 2D). First, across the genome, we plotted the average pairwise nucleotide sequence divergence (and Genome-wide analysis of admixed strains. A) Pairwise nucleotide sequence divergence of the admixed strain yHKS210 compared to strains from the Patagonia A and Patagonia B populations. Average pairwise divergence comparisons are represented with red and blue dots for Patagonia A and Patagonia B, respectively. Standard deviations of pairwise divergence among Patagonia A and Patagonia B are represented by shadows, with broader regions corresponding to higher genetic diversity within populations. B) To directly visualize which population is closest to each region of the genome, we calculated the log 2 ratio of the minimum PB-Admixed nucleotide sequence divergence (d B-Ad ) and the minimum PA-Admixed nucleotide sequence divergence (d A-Ad ) in 50-kbp windows. log 2 < 0 or >0 indicate that part of the genome is more closely related to Patagonia A or Patagonia B, respectively. Regions lacking values are due to filters imposed based on coverage, data quality, or their absence in some strains (see S1 Text). C) Admixture ancestry assignment based on PCAdmix (i.e. an inference of which population is contributed that portion of the genome). Portions are defined by 20 SNPs. Blue indicates a chromosomal region inferred to share ancestry with PB-1, red indicates shared ancestry with PA-2, and white indicates that the method cannot make an inference. Roman numerals represent chromosomes. D) Unrooted ML phylogenetic tree reconstructed using SNPs. The scale bar shows the number of substitutions per site, corrected for invariant sites. standard deviation) of this strain compared to the Patagonia B and Patagonia A strains, clearly demonstrating regions more closely related to one population or the other (Fig 2A). This approach also revealed genomic regions of high genetic diversity within populations (Fig 2A) (e.g. the broader standard deviations of the left arm of chromosome IV among Patagonia A, and of the left arm of chromosome VII among Patagonia B strains). Next, for each window, we calculated the log 2 of the pairwise divergence ratio using the strain with the minimum pairwise divergence value from each population. This ratio produced sharp transitions between positive and negative values, which corresponded to likely recombination breakpoints ( Fig 2B). Our quantitative log 2 ratio approach was generally concordant with a well-established program (PCAdmix) that uses a principal component analysis (PCA)-based method with hidden Markov model smoothing to assign ancestry to chromosomal regions according to the population contributing to it ( Fig 2C). All seven admixed strains shared the same population ancestry in each chromosomal region, suggesting a recent radiation of this admixed lineage across the Great Lakes-Saint Lawrence Seaway.
Similar plots were constructed to determine whether the sequenced Tibetan strain was the closest relative of lager yeasts at all loci or whether there was indeed evidence for a more complex ancestry (Fig 3). Although most of the genomes of both the Saaz and Frohberg lager Genome-wide pairwise nucleotide sequence divergence to lager yeasts. A) and C) are pairwise nucleotide divergence comparisons to a Saaz and a Frohberg representative, respectively. Comparisons are made to the Patagonia A population, the Patagonia B strains, the two North Carolina strains, and the Tibetan representative. Dots represent average values, while standard deviations from the average are represented by the colored shadow area; red for Patagonia A, dark blue for Patagonia B, blue for Tibet (T), and light blue for North Carolina (NC). B) and D) are the log 2 ratios of the minimum NC-Lager divergence (d NC-X ) and the T-Lager nucleotide divergence (d T-X ) in 50-kbp windows, where X is B) Saaz (S) or D) Frohberg (F). log 2 < 0 or >0 indicate whether that part of the genome is more closely related to T or NC, respectively. Red lines in B) and D) are significance thresholds established by permutation tests (unbiased P < 0.019). Regions lacking values are due to filters imposed based on coverage, data quality, or their absence in some strains (see S1 Text). Roman numerals represent chromosomes. representatives were more closely related to the Tibetan genome than to the North Carolina genomes (i.e. log 2 divergence ratio values < 0), 19 regions were more closely related to the North Carolina genomes in both the Saaz and Frohberg strains (i.e. log 2 divergence ratio > 0.118 or 0.096 for Saaz and Frohberg, respectively, unbiased P < 0.019, permutation test) (Figs 3B, 3D and 4A). Each of these regions was supported by PCAdmix (Fig 4B), and PCAdmix detected several additional regions where the lager strains seemed to be more closely related to the North Carolina strains than to the Tibetan strain. The log 2 ratio statistic and PCAdmix define windows differently, either based on physical genomic distance or the number of SNPs, respectively. Therefore, as expected, the methods did not always partition genomes in exactly the same places.
Strong support for this alternative topology was confirmed by conventional phylogenetic analyses (Fig 4C and 4D, S4 Fig). In a handful of cases, a Patagonia B representative was actually more closely related to the parent of one or both of the lager lineages than the Tibetan strain was (S5 and S6 Figs). These regions could be due to incomplete lineage sorting, Phylogenetically supported regions of interest are indicated with brown bars when North Carolina (NC) was more closely related to Frohberg, while Tibet (T) was more closely related to Saaz; a red bar when NC was more closely related to Saaz and T to Frohberg; and purple bars (2 representative regions shown out of 19) when NC was the closest relatives of both the Saaz and Frohberg strains. Roman numerals represent chromosomes. B) Lager chromosome ancestry based on PCAdmix inference of which strain is most closely related to that portion of the genome. Portions are defined by 20 SNPs. Blue indicates inferred shared ancestry with T, light blue indicates shared ancestry with NC, and black marks where the method cannot infer the ancestry. PCAdmix and the log 2 ratios produce largely overlapping results, but minor discrepancies are expected due to the differences in how the chromosomes are partitioned. C-G) ML phylogenetic trees supporting the relationships indicated by the colors of the bars, as defined above. introgression, or different rates of evolution among wild S. eubayanus strains, but overall, they show that lager yeasts and wild strains of S. eubayanus have complex ancestries. In particular, none of the known wild isolates of S. eubayanus is the sole closest relative to lager-brewing strains. Instead, as in the case for most natural, sexually reproducing species, the data suggest an important role for outcrossing and incomplete lineage sorting in maintaining genetic variation and creating recombinant individuals.
Standing genetic variation in S. eubayanus persists in hybrid lagerbrewing yeasts Surprisingly, comparison of the log 2 divergence ratio values of the Saaz and Frohberg representatives against the North Carolina strains and the reference of Tibet (Fig 4, S5A and S6A Figs) highlighted at least five genomic regions where the ancestries of the Saaz and Frohberg representatives differed dramatically (Fig 4A). Several additional loci also had non-overlapping log 2 ratios between Saaz and Frohberg, which provides further evidence of the complex ancestries of these lineages ( Fig 4A). We closely inspected seven regions where the log 2 divergence ratio, PCAdmix, or both methods suggested that the lager lineages had different alleles (Fig 4). The discordant ancestries of three of these regions were strongly supported by conventional phylogenetic analyses (Fig 4E-4G). In each case, the North Carolina strains were more closely related to one lager strain, while the Tibetan strain was more closely related to the other.
To ensure that the phylogenetic signals in these three regions were not artifacts, we closely inspected them using several orthogonal methods, including de novo assembly, PCR, local investigation of conflicting phylogenetic signals, examination of heterozygosity, and examination of copy-number variants. For example, the strongest phylogenetic signal for the region on chromosome X came from a 3-kbp region that placed the Frohberg and Tibetan strains sister to each other on a long branch (S7 Fig). Although this region contains a solo LTR in most strains, de novo assembly confirmed that the solo LTR was absent in the Tibetan and Frohberg strains and was not responsible for the phylogenetic signal. Additionally, although the Frohberg strain had multiple copies of the S. eubayanus subgenome in this region, there was no detectable heterozygosity. Heterozygosity was also too low in the other regions of phylogenetic interest to confound results (S8 Fig); indeed, overall these regions had less heterozygosity (1.08 Ã 10 −4 and 8.49 Ã 10 −5 heterozygous sites/bp for Saaz and Frohberg, respectively) than the genome as a whole (2.08 Ã 10 −4 and 4.86 Ã 10 −4 heterozygous sites/bp for Saaz and Frohberg, respectively) (S9 Fig). Differences between the regions of interest and the genome as a whole in copy-number variation (S8 and S9 Figs) and genetic diversity (S8 Fig, S2 Table) were also not the cause of the phylogenetic incongruence. Instead, we infer that the Saaz and Frohberg strains examined possess different alleles that were drawn from standing variation segregating among wild strains of S. eubayanus.
Evidence that lager-brewing yeasts are descended from a Holarctic lineage of S. eubayanus To delineate the number of populations of S. eubayanus and determine how well differentiated they are, we analyzed the multi-locus data from the complete strain set using STRUCTURE (S1 Text). Strains from West China were inferred to be an independent population and excluded from subsequent analyses. Analyses of WGS data using multiple methods suggested that Patagonia A and Patagonia B-Holarctic were independent populations and recovered the admixed strains (Fig 5). Although divisions beyond K = 2 were not significant with STRUCTURE ( Fig  5A), principal component and coancestry analysis with fineSTRUCTURE provided some support for dividing Patagonia A into two subpopulations (PA-1 and PA-2, Fig 5B and 5C).
Similarly, these analyses split Patagonia B-Holarctic into three subpopulations, one containing most of the non-admixed strains from Holarctic ecozone (Holarctic: North Carolina, Lager, Tibet), one containing only S. eubayanus strains from South America (PB-2), and a final subpopulation containing South-American and non-South American strains (PB-1).
These analyses also provided additional information about closest relatives of the admixed and lager strains. The fineSTRUCTURE coancestry heatmap suggested that PB-1 and PA-2 were the closest relatives of the admixed strains (Fig 5B). These results were also supported by analysis of D-statistics, where the most significant values were obtained when PB-1 and PA-2 were tested as donors to the admixed strains (S3 Table). Analysis with PCAdmix suggested that PB-1 contributed about 58% of the genome to the admixed strains, whereas PA-2 contributed 42%, results consistent with the phylogenetic analyses and an f4-ratio test (S3 Table, Fig  1D). Analysis with PCAdmix for the lager genomes further suggested that strains more closely related to the Tibetan strain contributed 66% of the S. eubayanus genetic material, whereas strains more closely related to those from North Carolina contributed 34% (S1 Text). Nonetheless, we caution that the few available data are best interpreted as pointing to the existence of standing variation across the Holarctic lineage, rather than direct ancestry or admixture involving these specific extant strains. These results, together with the nucleotide diversity statistics (Fig 6A), the pairwise comparison of F st , the distribution of SNPs (Fig 6B), and phylogenetic analysis (Fig 1B) support at least four distinct populations of S. eubayanus: Patagonia A, Patagonia B-Holarctic, Sichuan, and West China (Fig 6A). The nucleotide diversities of the West China population and the Holarctic lineage were lower than either population from Patagonia (Fig 6A, S4 Table). In contrast to the other populations or groups, including the Holarctic lineage as a whole, only the 10 strains from Tibet had significantly negative values for Tajima's D, Fu and Li's D, and Fu's F (S4 Table). The Tibet group's Fay and Wu's H value was not significantly different from zero Sichuan, SI; Tibet, T; and West China, WC. The genetic distances estimated by the multi-locus approach and by WGS comparisons were generally congruent, with differences ranging between 0.00% and 0.12%. Prior the estimation of pairwise genetic distance (Tamura-Nei corrected) and nucleotide diversity, we deleted gaps for each pairwise comparison. 1 Value inferred from [21] since sequence data for the sole strain from Sichuan were not deposited in GenBank. (H = 0.76 P > 0.05, calculated using Patagonia B strains as an outgroup), which is consistent with a neutral demographic explanation, such as a recent local population expansion across the vast region of Tibet surveyed.

Discussion
Parallels between the biogeography of S. eubayanus and its sister species The patterns of diversification and differentiation between S. eubayanus populations are remarkably reminiscent of those described recently for its sister species, S. uvarum (S10 Fig) [23]. Specifically, both species include early-diverging subspecies in East Asia or Australasia. Both species have two highly diverse, partially sympatric populations in Patagonia that are about 1% divergent in DNA sequence. In both cases, one of these populations is closely related to a relatively low-diversity lineage with a Holarctic distribution that gave rise to domesticated hybrid yeasts that ferment economically important products. In contrast to the process of introgression seen in domesticated strains of S. uvarum, lager yeasts were generated through allopolyploidization of S. eubayanus and S. cerevisiae. Genetic mechanisms of hybridization aside, the deep parallels between the diversifications of these two sister species in the wild suggest that similar biogeographical and ecological forces may explain their distributions. The presence of wild S. uvarum in Europe further suggests that Holarctic representatives of S. eubayanus are present, or may have been present in the past, somewhere in Europe.

The importance of understanding the Holarctic lineage of S. eubayanus
Although non-hybrid isolates of European S. eubayanus remain elusive, we expect European strains of S. eubayanus would have relatively low genetic diversity, belong to the Holarctic lineage, and be genetically similar to isolates from Tibet and North Carolina, as well as to the parents of lager yeasts. Importantly, any European strains that might eventually be discovered will not be the closest relative to all lager yeasts at all loci because, as this study shows, standing genetic variation in S. eubayanus made it through the bottleneck of hybridization that generated modern lager yeasts. All of the currently proposed models of hybridization are compatible with this data, including multiple hybridization events [6,[12][13][14][15], differential loss-ofheterozygosity among heterozygous ancestors [11], or more complicated backcrossing scenarios [9][10][11]14,28]. The complexity of lager yeast ancestry means that identifying the alleles relevant for specific traits may require a broad sampling of S. eubayanus genetic diversity from across the Holarctic ecozone.
In contrast to the frequent isolation of S. eubayanus from Nothofagus in Patagonia [5], the rare Northern Hemisphere strains of S. eubayanus described here and in other recent studies [20,21] were isolated in association with several different tree genera (S1 Fig). These findings suggest that our understanding of S. eubayanus ecology is still quite limited or may be an indication of its generalist character, as has recently been argued for S. cerevisiae [29]. Expanded sampling of substrates beyond the conventional hosts of Quercus and Nothofagus [30], even in South America [24], may be critical to gaining a fuller view of the ecological and genetic diversity of S. eubayanus.
Additional isolates will also be key for evaluating competing demographic models to explain the relationship between the Holarctic lineage and the Patagonia B population. One possibility is that a large ancestral population was split by vicariance, perhaps as the climate warmed following the last glacial period. Alternatively, long-range dispersal could have occurred between the Northern Hemisphere and South America, potentially in either or both directions. The relative diversities of the Holarctic and Patagonia B lineages and the confinement of a signature of recent demographic expansion to the Tibetan strains argue that dispersal from South America into the Holarctic may be more likely. Nonetheless, the distribution of clades defies a simple explanation and appears to require cladogenic events in multiple locations, both for S. eubayanus and its sister species S. uvarum.
Human activity is not required to explain the dispersal of S. eubayanus to Europe Although humans undoubtedly played a role in selecting for the allopolyploid hybrids that became lager yeasts, human activity is not required to explain the spread of wild S. eubayanus across the Holarctic ecozone. Even conservative molecular clock estimations place all S. eubayanus cladogenic events, including the origin of the Holarctic lineage, well outside of the range of written human history (S11 Fig). Moreover, no known strain is a close enough relative to the ancestor of lager yeasts to be compatible with human-mediated transfer to Europe via the Silk Road [21] or any hypothesis involving colonial era transfer to Europe from South America [5] or North America.
How yeasts migrate is still controversial. Proposed natural mechanisms include long-range dispersal by birds [31,32], short-range dispersal by insects [33], or dispersal by wind [34]. The former may be particularly relevant because some bird migration flyways from Patagonia to Greenland or Alaska, overlap with European or Asian migration routes, respectively [35]. Clear cases for human-associated yeast dispersal have been made for industrial strains of S. cerevisiae, including the dispersal of Wine/European strains to wine-making regions all over the world [36][37][38][39][40][41], as well as some interspecies hybrids used in wine production [42]. Interestingly, Wine/European strains of S. cerevisiae have retained considerable genetic diversity, perhaps because large effective population sizes were maintained and because of the semipermeable nature of the vineyard environment [41]. European strains of S. paradoxus have also been inferred to have been dispersed to North America and New Zealand, possibly in association with Quercus [25,39,43]. A recent population genomic analysis of the former case revealed extremely low levels of diversity and a coalescence date consistent with colonial era dispersal [44].
The genomic diversity that we observed among the admixed strains of S. eubayanus from Wisconsin and New Brunswick is also consistent with a very recent dispersal to opposite ends of the Great Lakes-Saint Lawrence Seaway. The number of inferred breakpoints (40 total crossovers, Fig 2B) is similar to the number observed in one round of meiosis in S. cerevisiae [45], and each Patagonian population seems to have contributed approximately half of their genomes. Since all seven admixed strains share the same breakpoints and have nearly identical genome sequences (of 325 variable SNPs, only 37 differentiate Wisconsin from New Brunswick, Fig 2D), they are likely descended quite recently from a single individual that underwent haploselfing after an outcrossing event and one round of meiosis. Although we cannot be certain whether this dispersal across North America and the dispersal of S. paradoxus to North America were anthropic [44], they demonstrate that recent continent-scale dispersal is detectable in yeast using WGS data. In contrast, the mean genetic distance among S. eubayanus Holarctic genomes is well over 100 times higher (0.1989% for the Tibetan, North Carolina, and lager strains versus 0.0013% for the admixed strains of S. eubayanus and 0.0009% for the North American strains of S. paradoxus from Europe).

Conclusion
In conclusion, S. eubayanus biogeography and the origins of lager yeasts have proven more complex, but also much richer, than initially hypothesized. Here we have presented evidence that lager yeasts are derived from a relatively low-diversity lineage of S. eubayanus with a Holarctic distribution. These strains from the Holarctic lineage diversified from within one of two diverse populations found primarily in Patagonia. This pattern of diversification is similar to that of its sister species, S. uvarum. Although the S. eubayanus subgenomes of lager yeasts were drawn from the Holarctic lineage, none of the known S. eubayanus isolates is their sole nearest relative. Indeed, for the first time, we have shown that variation segregating among wild S. eubayanus persists among the allopolyploid lager-brewing yeasts. These findings strongly suggest that further sampling of the Northern Hemisphere for S. eubayanus will, not only enhance our understanding of the natural history and genetic diversity of this important species, but offer valuable insight into the sources of diversity among modern brewing strains.

Yeast isolation
New S. eubayanus strains were isolated from two locations in the USA, Washington State (yHKS509) and North Carolina (yHRVM107, yHRVM108), by following previously described high-sugar enrichment protocols at 10°C [46]. Four new S. eubayanus were isolated by enrichment from New Brunswick (yHDPN421-yHDPN424), Canada, as previously described [47], with the exception that the samples were incubated in liquid medium for seven months at 4°C, followed by a second culture step on solid medium for two weeks at 4°C. Strains were initially identified by PCR and Sanger-sequencing of the ITS region of the rDNA locus (see S1 Text). Complete results of these yeast biodiversity surveys will be reported elsewhere, and our recent publications represent less than half of the yeast strains isolated [46,47].

Multi-locus sequence data generation
For the phylogenetic and nucleotide diversity analyses, we selected genes and intergenic sequences to integrate the maximum amount of sequencing data available from previous studies [20][21][22] (S1 Table). Additional genes from Patagonian and the newly isolated S. eubayanus strains were PCR-amplified and Sanger-sequenced (S4 Table). Reads from sequenced genes were assembled using the STADEN Package v1.7 [48]. The COX2 sequence of strain CDFM21L.1 was assembled in GENEIOUS v6.1.6 using the reads retrieved by BLASTing the S. eubayanus COX2 sequence against SRR1507225 from the SRA database of NCBI [21]. Individual genes of strain P1C1 were retrieved by BLASTing against its genome assembly (S1 Text). New sequences generated were deposited in GenBank under accession numbers KR871406-KR871626.

Individual phylogenetic gene trees and supernetworks
Phylogenetic gene trees and the supernetwork were reconstructed following our previous approach [20]. The supernetwork was reconstructed using the relative average for edge weights and using the filter option to discard the splits from PDR10 (a gene undergoing balancing selection or reciprocal introgression between some populations) (Dataset A) (S1 Text). An additional Neighbor-Net phylogenetic network was reconstructed for the SNP dataset using SplitsTree v4.12.8 [49].

Genome sequencing and analyses
Genomic libraries for available S. eubayanus strains (S1 Table), one representative strain from the Saaz lineage of lager yeast (CBS1503), and one representative strain from the Frohberg lineage of lager yeast (W34/70) were generated as described previously [50] and sequenced using Illumina paired-end sequencing (S5 Table). Details on the identification of high-quality single nucleotide polymorphisms (SNPs) can be found in S1 Text. Illumina reads were deposited in the SRA database of NCBI under accession number SRP064616.
After removing positions with gaps in any strain, whole genome nucleotide divergence graphs were constructed by calculating the pairwise number of segregating sites per nucleotide or divergence (d) in windows of 50,000 bp using the PopGenome package for R [51]. To compare how closely related various strains of interest (i.e. lager or admixed) were to a portion of the genome of two defined reference strains (e.g. North Carolina and Tibet), the value of the log 2 of the ratio of the d values were calculated for each window (see S1 Text).
The whole genome phylogenetic tree was reconstructed from WGS data using RAxML v8.1 [52]. For phylonetwork and population analyses, SNPs were selected using strict coverage and quality filters (details in S1 Text). Based on the comparisons of the log 2 divergence ratios or the PCAdmix results, genomic regions of interest were extracted for phylogenetic analyses (see S1 Text). Regions of interest were extracted from whole genome assemblies reconstructed using iWGSv1.01 [53].

Population genetics and genomics
A multi-locus concatenated alignment from Dataset A (~7.7 kbp) was generated using FAScon-CAT v1.0 [54]. Multi-locus concatenated alignment and WGS data were used for diversity statistics, polymorphism comparisons, and population analyses (see S1 Text). The concatenated alignment was also used to reconstruct a Maximum-Likelihood phylogenetic tree in RAxML v8.1 using the same parameters as for the individual gene trees.
A second recombinant-free concatenated alignment of the coding sequences from Dataset B (Dataset A where IntMD, MET2, and MLS1 sequences, which had low information content, were discarded) was generated using IMGC [55] and FASconCAT. The 380 fourfold degenerate sites in this alignment were used to estimate divergence times. Divergence time reconstruction was performed as we described previously [20].
The number of populations for the SNP dataset were inferred using STRUCTURE v2.3.4 [56]. fineSTRUCTURE v2 [57] was used to generate coancestry heatmaps and to perform PCA. Parental contributions to the genomes of Wisconsin, New Brunswick, Saaz, and Frohberg strains were estimated using a hidden Markov model of evolution implemented in PCAdmix v1.0 [58], and chromosomes were partitioned according to the output results. Analyses of f-and D-statistics were performed in ADMIXTOOLS v3.0 [59].
Supporting Information S1 Text. Supplementary materials. (DOCX) S1  Table. PCR primer sequences and conditions used in the present study.  Table) were, in general, lower than the average value found genome-wide among the strains from the Patagonia A-Patagonia B-Holarctic clade (0.57%). For 14 of 44 genes the values were higher but less than twice the genome-wide diversity values. Based on comparisons to the multi-locus dataset, the false positive rate of our pipeline at calling non-heterozygous sites was low (4.63 Ã 10 −5 SNPs/site) and not sufficient to influence conclusions regarding the regions of interest. (PDF) S10 Fig. S. eubayanus and S. uvarum phylogenetic tree comparison. S. eubayanus and S. uvarum phylogenetic trees are shown in A) and B), respectively. Color bars represent populations for each species, and are colored according to the colors used in the previous S. eubayanus phylogenetic tree figures. Demographically similar populations of S. uvarum use the analogous colors from S. eubayanus. The multi-locus S. eubayanus phylogenetic tree is from Fig 1B, [21] suggested that the TTH27L.1 and PYCC 6148 T (= CRUB 1568 T ) MLS1 sequences were not recombinant; however, the sequences deposited in GenBank (KF892364 and KF892348, respectively) appeared to be recombinant. Our copy of the strain PYCC 6148 T did not possess a recombinant MLS1, but we could not check the strain TTH27L.1 because it is not available for study. We noted that the apparent recombination point for both strains is at the junction of the promoter and coding sequence, so we suspect that errors were introduced in silico while the sequences were uploaded to GenBank or when multiple Sanger sequencing reads were assembled. Absent further direct verification of TTH27L.1 MLS1, we suggest that the apparent recombination is likely an artifact. (DOCX)