Population genomics of Fusarium graminearum reveals signatures of divergent evolution within a major cereal pathogen

The cereal pathogen Fusarium graminearum is the primary cause of Fusarium head blight (FHB) and a significant threat to food safety and crop production. To elucidate population structure and identify genomic targets of selection within major FHB pathogen populations in North America we sequenced the genomes of 60 diverse F. graminearum isolates. We also assembled the first pan-genome for F. graminearum to clarify population-level differences in gene content potentially contributing to pathogen diversity. Bayesian and phylogenomic analyses revealed genetic structure associated with isolates that produce the novel NX-2 mycotoxin, suggesting a North American population that has remained genetically distinct from other endemic and introduced cereal-infecting populations. Genome scans uncovered distinct signatures of selection within populations, focused in high diversity, frequently recombining regions. These patterns suggested selection for genomic divergence at the trichothecene toxin gene cluster and thirteen additional regions containing genes potentially involved in pathogen specialization. Gene content differences further distinguished populations, in that 121 genes showed population-specific patterns of conservation. Genes that differentiated populations had predicted functions related to pathogenesis, secondary metabolism and antagonistic interactions, though a subset had unique roles in temperature and light sensitivity. Our results indicated that F. graminearum populations are distinguished by dozens of genes with signatures of selection and an array of dispensable accessory genes, suggesting that FHB pathogen populations may be equipped with different traits to exploit the agroecosystem. These findings provide insights into the evolutionary processes and genomic features contributing to population divergence in plant pathogens, and highlight candidate genes for future functional studies of pathogen specialization across evolutionarily and ecologically diverse fungi.


Introduction
Fusarium graminearum is the primary cause of Fusarium head blight (FHB), a devastating disease affecting wheat, barley and other cereal crops worldwide [1,2]. During host infection, the PLOS  fungus destroys grain and produces toxic metabolites, like deoxynivalenol, that contaminate cereals and damage the immune, gastrointestinal and reproductive systems of humans and animals [1,3]. Repeated FHB outbreaks in major cereal-producing regions around the world have resulted in substantial reductions in cereal yield, quality and value, and billions of dollars in economic losses [4,5].
In North America FHB has re-emerged in recent decades, with outbreaks occurring regularly across major cereal growing regions [2, 6,7]. North America has historically been dominated by a single, genetically diverse population of FHB pathogens (referred to as NA1) that typically produce the 15-acetyl-deoxynivalenol (15ADON) trichothecene toxin analog [8]. However, a highly virulent, introduced population of F. graminearum (referred to as NA2) that typically produce 3-acetyl-deoxynivalenol (3ADON) has rapidly increased in frequency in some regions compared to the endemic population [9][10][11][12]. Although the geographic origin of NA2 remains uncertain, this population exhibits reduced diversity consistent with a clonal expansion following a bottleneck, which likely occurred after founder isolates were transcontinentally introduced [11]. Trichothecenes are known virulence factors in wheat [13], and chemotype differences may therefore be contributing to the invasion of NA2/3ADON isolates in some regions. However, NA2 isolates can also deposit more toxin in grain, grow faster and spread more quickly than NA1 isolates on certain lines of wheat [11,12,14,15], indicating that other fitness traits may be involved in pathogen adaptive shifts. Changes in FHB pathogen composition therefore warrants concern about food safety and the sustainability of FHB-resistant crops and fungicides.
Additional diversity in the F. graminearum gene pool was recently discovered in the Midwestern US, where a novel group of isolates that have the 3ADON haplotype at the trichothecene biosynthetic gene cluster (TRI-cluster), but produce a distinct trichothecene analog, 3αacetoxy, 7α,15-dihydroxy-12,13-epoxytrichothec-9-ene (NX-2) were found to cause FHB on wheat [16][17][18]. The evolution of NX-2 in F. graminearum likely reflects a change in selective constraint on Tri1, a trichothecene-modifying P450 enzyme located on chromosome 1, outside of the TRI-cluster [16]. Recent inquiries into the demographic history of NX-2 F. graminearum were unable to resolve their relationship with NA1 and NA2; genotypes based on tandem repeat markers suggested these isolates share the NA1 genetic background [19], whereas restriction site polymorphisms indicated shared ancestry with NA2 [20]. However, both of these previous studies failed to reject the hypothesis that the NX-2 chemotype may be associated with a distinct population.
The significant changes in FHB pathogen diversity in North America have resulted in geographic clines in chemotype frequency [11] and region-specific biases in FHB population composition; NA2 is dominant in parts of the Midwestern US and western and Maritime provinces of Canada, NA1 is dominant in eastern Canadian provinces of Québec and Ontario [19], and NX-2 F. graminearum are exclusively found in the northern US and southern Canada [16,21]. The geographic distribution and population dynamics of these pathogens suggest that FHB populations inhabiting North America have been influenced by a complex selective landscape [19], and may be differentially adapted across their continental range. Currently, however, we know very little about the genetic variation in F. graminearum populations that could be contributing to phenotypic divergence in FHB pathogens.
Genome comparisons of two F. graminearum isolates highlighted genomic regions with high evolutionary potential [22], and strain-specific genes that may influence virulence [23]. However, genomic diversity across the broad spectrum of FHB pathogens in North America has never been evaluated. Therefore, to identify candidate genes for functional studies of FHB pathogen diversity, we sequenced the genomes of 60 F. graminearum comprising NA1, NA2 and NX-2 isolates sampled across North America, and 11 genomes representing four related Fusarium outgroup species (S1 Table). We performed phylogenomic and Bayesian analyses to gain a genome-wide perspective on the population identity of NX-2 isolates and clarify their genetic relationships with endemic NA1 and introduced NA2 pathogens. Then, we scanned the genome for signatures of population divergence, by searching for localized reductions in nucleotide diversity and pronounced inter-population differentiation that occurs around alleles targeted by selection [24,25]. In addition, we assembled the first F. graminearum pangenome to identify gene sets that distinguished populations and may therefore contribute to phenotypic variation in FHB pathogens. Biological functions of the candidate selected genes and gene sets that differentiated populations were then investigated to make inferences about fungal divergence, pathogen population dynamics and future prospects for crop disease management.

Population genomic structure
We generated whole-genome sequences for 60 isolates of F. graminearum, 10 isolates from three related species in the Fusarium graminearum species complex (FGSC), F. gerlachii (N = 3), F. louisianense (N = 2), and F. boothii (N = 5), and an isolate of F. pseudograminearum, a member of the Fusarium sambucinum species complex that is an outgroup to the FGSC [26]. On average, genome re-sequencing of the F. graminearum isolates resulted in 50X coverage across 99% of the PH-1 reference genome. The AT rich (>90% AT) regions near the centromeres and ends of chromosomes [27] were generally not well resolved due to repetitive DNA content, and were therefore excluded from further analyses. A total of 505,748 quality-filtered, nuclear SNPs identified across the 60 isolates of F. graminearum were retained for population genomic analyses, which comprised a subset of the 1,083,417 SNPs identified across all isolates and species in the FGSC. The genomic distribution of SNPs within F. graminearum varied extensively, consistent with previous reports demonstrating that regions near subtelomeres and central portions of chromosomes 1, 2 and 4, corresponding to putative ancestral chromosome fusion sites, had the highest density of polymorphic sites [22,23,28].
Preliminary Bayesian clustering analyses indicated that sampling SNPs at 5 kb intervals across the genome provided sufficient resolution to assess population structure and admixture, while significantly reducing the computational load and minimizing the influence of linkage. Analyses of this reduced dataset comprising 6,852 variant sites provided strong evidence for three distinct populations, which was in agreement with evolutionary relationships depicted by a reticulating phylogenetic network of F. graminearum that accounted for intraspecific recombination (Fig 1A). Across three replicate Bayesian clustering analyses, likelihood and ΔK values were maximized for models with K = 3. Population structure inferred from the optimal model indicated that endemic NA1/15ADON and introduced NA2/3ADON F. graminearum belonged to two separate populations, based on q-value estimates of each isolate's ancestry in each population (average q NA1 = 1.0, average q NA2 = 0.99). However, a single 3ADON isolate (F328) and the majority (60%) of NX-2 F. graminearum were assigned to a third genetic population, hereafter referred to as NA3 (average q NA3 = 1.0). The remaining eight NX-2 isolates had admixed genomes, in that they shared the majority of their ancestry with the NA1 population (average q NA1 = 0.85), but were assigned with significantly lower probability compared to other isolates (P < 0.001) and also shared a proportion of ancestry with the NA2 (average q NA2 = 0.13) and NA3 (average q NA3 = 0.02) populations.
The maximum likelihood phylogeny inferred from all 1,083,417 SNPs identified in isolates of F. graminearum and FGSC species was consistent with Bayesian clustering analyses and phylogenetic networks of F. graminearum, in that the three genetic populations comprised strongly supported (100% of bootstraps) clades ( Fig 1B). The phylogeny indicated that the NA3 clade split from the common ancestor of the NA1 and NA2 populations following divergence of F. graminearum from F. gerlachii and F. louisianense, two sister species found in the US.
Collectively, Bayesian and phylogenomic analyses indicated that eight isolates with the NX-2 toxin type clustered with NA1/15ADON isolates. Recombination events between admixed NX-2 F. graminearum and NA1, NA2 and NA3 F. graminearum were evident in the phylogenetic network as well ( Fig 1A). Moreover, in the maximum likelihood phylogeny (Fig 1B), admixed isolates occupied a basal position within a clade that included all of the NA1/15A DON isolates, consistent with expectations for isolates with a recombinant background. A separate maximum likelihood phylogeny of TRI1, the gene responsible for differences between NX-2 and 3ADON or 15ADON toxin types, confirmed that the gene sequences encoding the NX-2 trichothecene toxin had a single evolutionary origin and did not arise separately in admixed and NA3 isolates (S1 Fig), consistent with previous studies [16]. Further, genomewide averages of sliding-window F ST values indicated significantly higher levels of gene flow between admixed isolates and NA1 isolates (average F ST = 0.16, P < 0.001), and to a lesser extent, between admixed isolates and NA2 isolates (average F ST = 0.43, P < 0.001) compared to all other population pairs. As such, the NX-2 F. graminearum associated with the NA1 population appear to have genomes that reflect a history of recombination between populations. Therefore, to limit the confounding effects of gene flow, these admixed NX-2 isolates were excluded from further population-based analyses, but were examined separately where appropriate.

Genome-wide diversity of F. graminearum populations
Genomic scans of diversity performed in non-overlapping, 10 kb sliding windows revealed distinct patterns of DNA variation within NA1, NA2 and NA3. Average nucleotide diversity (π) and polymorphism (θ) indicated reduced genetic variation in NA2, and a relatively low number of nucleotide differences between isolates (Table 1). Furthermore, Tajima's D values, which describe the spectrum of allele frequencies in each population, were significantly higher and the variance of D values was significantly larger (Bartlett's test P < 0.001) in NA2 than in the other two populations. These genetic patterns are consistent with a recent bottleneck in NA1/15ADON isolates, 20 NA2/3ADON isolates, 20 NX-2 isolates, and for the maximum likelihood analyses, representatives from three FGSC species were included: F. boothii (N = 5), F. gerlachii (N = 3) and F. louisianense (N = 2). Different colors indicate clustering assignment of each isolate in the three populations inferred from Bayesian analyses: NA2 = red, NA1 = blue and NA3 = green. A. SplitsTree4 [98] was used to construct a network wherein F. graminearum isolates are represented by terminal nodes, and relationships are depicted as branches with parallel edges indicating reticulate events (i.e. recombination, gene transfer). B. In the maximum likelihood phylogeny, all bootstrap values were 100%, except those indicated at branch nodes. The tree was rooted with F. boothii and drawn to scale, with branch lengths measured in the number of substitutions per site. Colored bars indicate Bayesian estimates of ancestry (q) for each isolate in the three populations. NA2. In contrast, the endemic NA1 population was the most diverse, and isolates were genetically heterogeneous as indicated by the relatively large number of pairwise nucleotide differences (Table 1), and the positive average Tajima's D [29]. NA3 showed slightly lower levels of diversity than NA1, but values of Tajima's D were relatively low and negatively biased, which could indicate a recent population expansion [30]. The demographic histories inferred from the observed allele frequency spectrums of the three populations were consistent with the evolutionary history of North American F. graminearum reported previously [11,16,19,20], however, average Tajima's D values suggested deviations from neutral expectations, and should thus be interpreted with caution.

Signatures of selection highlight genomic regions involved in population divergence
After characterizing genome-wide patterns of diversity within populations, we utilized combined signals from several sliding-window statistics to pinpoint genomic regions contributing to divergence in the three populations. We identified 14 regions with genetic signatures of selection (i.e., outliers, Table 2, Fig 2). These regions exhibited significant interpopulation divergence (P < 0.001 for D xy , F ST and Tajima's D between population pairs) and reduced diversity (P < 0.001 for π) within one or more populations as compared to a null distribution of values generated through random permutation. Patterns of genetic variation in the vicinity of these regions indicated that ten of the 14 outliers had signatures of selection that localized to a single 10 kb window, while the other four had signatures that extended across consecutive 10 kb windows, with three outliers spanning 20 kb and one spanning 30 kb ( Table 2). All but one of the outliers (o9, chromosome 2) were clustered in the subtelomeric and interstitial portions of chromosomes (Fig 2), within known polymorphic islands [22,28]. For six outliers evidence of selection was limited to haplotypes from a single population, i.e., only one of the three populations exhibited reductions in diversity and high divergence from the other two populations ( Table 2, Fig 3A), suggesting that a single selective sweep had targeted one population. In contrast, at the remaining eight regions multiple, population-specific outlier haplotypes were observed, such that two or more populations exhibited reductions in diversity and high divergence from the other populations (Fig 3B), suggesting that soft sweeps had resulted in the fixation of multiple, divergent alleles in populations, or that the same genomic regions had experienced recurrent selective events. At all of these regions, sequences within the outlier population(s) were highly similar, but had an extremely large number of fixed differences compared to sequences in the other populations (D xy P < 0.001). Although outliers tended to reside in genomic regions that generally exhibited frequent recombination and are known to harbor recombination hot-spots [22,38], in most cases, there was a notable reduction in the number of cross-over events within 10 kb outlier windows of the affected population(s) as compared to the flanking regions (Fig 2).
Admixed isolates had sequences that were 98-100% identical to sequences of NA1, NA2 or NA3 isolates at all outlier regions. Further, for nine of the 14 outliers, sequences in at least some of the F. graminearum isolates (Table 2) were found in related FGSC species as well. Phylogenies of genes residing within outlier regions were discordant with the genome-wide phylogeny based on SNPs (Fig 1B), in that trees constraining F. graminearum isolates to a monophyletic clade were significantly (P < 0.001) worse than maximum likelihood inferred phylogenies, with a single exception (FGRRES_02171). The majority of outliers (11 of 14) contained at least one gene that showed trans-specific patterns of inheritance, in that alleles from one F. graminearum population were more closely related to the alleles in other FGSC species than to the alleles in different F. graminearum populations (S2 Fig). For example, NA1 and F. boothii isolates shared the 15ADON haplotype at the TRI-cluster genes located on chromosome 2, while all NA2, NA3 and admixed isolates with the NX-2 chemotype shared the 3ADON TRI-cluster haplotype (Table 2). Outlier regions did not distort our whole-genome phylogeny, however, as the topology of the genome-wide SNP phylogeny did not change when these 14 regions were omitted (S3 Fig).
Of the 81 genes residing in outlier regions (Table 2), 56 contained domains that were functionally annotated in the InterPro database. Functional enrichment analyses of domains associated with a Gene Ontology (GO) term (N = 31) indicated that zinc ion binding domains (GO:0008270), specifically, fungal transcription factors (IPR007219, IPR001138), and membrane proteins (GO:0016021; IPR011701, IPR005828, IPR000175) were significantly overrepresented among the population differentiating genes in outlier regions as compared to all F. graminearum genes (Fig 4). Additionally, a substantial proportion of outliers (71%) contained genes that encoded one or more hydrolytic or degradative enzymes, and at least four contained genes with predicted functions related to pathogenicity, including short, cysteine-rich, secreted effector proteins [31]. In addition, another outlier locus encoded a secreted carboxylesterase Sliding-window values of intrapopulation diversity and recombination (Panel A, Tajima's D, π and R m ) and interpopulation differentiation (Panel B, Tajima's D, D xy, and F ST ) were calculated in 10 kb windows to identify genomic regions with signatures of selection in NA1 (omitting admixed NX-2 isolates), NA2 or NA3 populations of F. graminearum. The fourteen outliers (o1-o14, yellow highlight) showed significant (P < 0.001) divergence between populations based on pairwise (interpopulation) values of Tajima's D, D xy, and F ST , coupled with significantly reduced diversity (π) within the divergent population(s). Significance was assessed by comparing observed sliding-window values of each metric against a null genome-wide distribution derived through random permutation.
https://doi.org/10.1371/journal.pone.0194616.g002 (FGRRES_09181) that is likely involved in the breakdown of plant material during FHB infection [36,39], flanked by a DEAD helicase (FGRRES_13527) and a polyketide synthase (Pgl1) (Fig 3A). The strongest signals of selection were observed across the length of the TRI-cluster genes (Table 2, Fig 2, Fig 3B) encoding enzymes and proteins that are directly involved in the biosynthesis of trichothecene mycotoxins.
Not all effector genes identified in outliers were implicated in pathogenicity. Two of these regions contained secreted, cysteine-rich effector proteins that colocalized with fungal incompatibility cell death domains (HeLo, PF14479; HET, PF06985; [40]). Two other outliers encoded secreted, antimicrobial proteins (Table 2). Moreover, all three populations were highly differentiated at the putative anti-fungal protein FGRRES_10561, containing an Hce2 Functional enrichment was assessed based on Gene Ontology (GO) terms assigned to domains annotated in candidate genes identified in NA1 (left, blue bars), NA2 (center, red bars) and NA3 (right, green bars). Candidate genes included genes in outlier regions showing signatures of selection and differentially conserved genes in each population. Hypergeometric tests were performed by calculating Z-scores that compared the relative frequency of GO terms assigned to each candidate gene set to GO terms assigned to all functionally annotated F. graminearum genes. Z-scores and the number of domains in each gene set (N) are only shown for GO terms that were significantly overrepresented after a Benjamini-Hochberg adjustment for multiple testing (adjusted P < 0.01) [134]. InterPro domains associated with each GO term are listed in parentheses below the term description in bold type.

Heterogeneity in gene content in FHB pathogen populations
After mapping reads to the PH-1 genome sequence, we performed a de novo assembly of the unmapped reads from each isolate to explore genomic features that were not represented in the reference genome. This allowed us to assemble a pan-genome for North American F. graminearum, comprising core genes shared by all isolates and accessory genes that were missing in one or more isolates. On average, orphan contig assembly resulted in an additional 705 kb of nucleotide sequence per isolate, though the percentage of AT bases in these sequences was higher than the genome-wide average; that is, the AT content was 56% for orphan contigs versus 52% in the reference genome. This suggests that these sequences tended to reside close to the AT-rich regions near centromeres and the ends of chromosomes [27]. Due to their low sequence complexity, most orphan contigs could not be anchored to specific chromosomal regions using bioinformatics approaches. However, Sanger sequencing into the flanking regions of a subset of orphan contigs (N = 5) placed them in the telomere proximal regions of chromosomes 1 and 3.
To determine if differences in gene content contributed to population divergence, we examined presence/absence polymorphism in homologs of all genes present in the reference genome and genes present on orphan contigs in the 60 sampled genomes. We utilized stringent criteria for ortholog assignment (single linkage clustering with inclusion threshold of D i,j ! 0.5) that were able to distinguish 97% of all annotated reference genes. A total of 13,632 genes were present in all 60 genomes and the PH-1 reference genome, though a proportion of these (9%) contained premature stop codons or were missing start codons in a subset of genomes and were thus classified as separate orthologs. Still, core genome size may be slightly overestimated, as our clustering algorithm based on BLAST-alignment scores was unlikely to distinguish full-length orthologs from proteins variants with short, carboxy terminal truncations in some cases. Nonetheless, 1,681 genes were considered accessory genes because they were not conserved among all genomes examined. That is, orthologs were missing in one or more genomes, or exhibited considerable divergence from the reference sequence (< 62% identity on average) such that BLAST-based clustering partitioned them into a distinct ortholog group. Accessory genes were relatively rare among sampled isolates, and 27% were unique to a single isolate (S4 Fig), suggesting that many of these genes are poorly conserved or were recently acquired in the sampled isolates. Locations of accessory genes that could be mapped to the PH-1 reference sequence (N = 512), overlapped extensively with high diversity, frequently recombining regions of the genome near subtelomeric and interstitial portions of chromosomes [22,27]. Functional enrichment analyses of GO terms assigned to the 1,681 accessory genes indicated that zinc ion binding transcription factors (GO:0008270; IPR00 1878, IPR003604, IPR007219, IPR000679, IPR001138, IPR006026), protein binding domains such as WD40, ankyrin and tetratricopeptide repeats (GO:0005515; IPR001810, IPR001680, IPR001214, IPR002110, IPR019734, IPR003877), and nuclear associated DNA binding proteins (GO:0003676, GO:0005634; IPR001878, IPR011129, IPR003604, IPR013087, IPR007219, IPR001138, IPR004367) were significantly overrepresented (P < 0.01) in the accessory genome as compared to all F. graminearum genes. A dendrogram based on gene presence/absence polymorphisms in core and accessory genes from the 60 isolates was highly consistent with the maximum likelihood phylogeny and population assignments based on SNP diversity (Fig 5). Isolates from the same genetic population shared similar gene sets, with admixed isolates Pairwise distances were computed between genomes based on gene presence/absence polymorphism, and the resulting matrix was used to cluster isolates with the neighbor joining method implemented in MEGA 7 [126]. Branch lengths represent the number of genes that differ in presence/absence status between strains, such that genomes with identical gene content would have a distance of 0. https://doi.org/10.1371/journal.pone.0194616.g005 Population genomics of the head blight pathogen Fusarium graminearum being primarily associated with the NA1 cluster, and to a lesser extent, the NA3 cluster. A few isolates, however, showed considerable genomic divergence due to a relatively large number of unique genes. Isolate F328, for example, had 103 genes that were not found in any other F. graminearum isolates, but most of these (81%) showed significant homology (BLAST E < 10 −10 ) to proteins from fusaria that are not members of the FGSC (e.g., F. oxysporum, F. langsethiae, F. poae). To confirm the presence of these accessory genes in F328, the fungus was re-isolated from a single conidium, and the resulting isolate was subjected to genome sequence analysis. The accessory gene content of the resulting genome sequence was the same as in the original F328 genome sequence. This finding indicates that the accessory genes in F328 could reflect gene acquisition from other fusaria.
Population-specific patterns of gene conservation were evident in a small proportion of the accessory genome. Of the 1,681 accessory genes analyzed, 121 genes were differentially conserved among populations (E test P < 0.01), showing high conservation in one population, but were absent or rare in the other two populations (S2 Table and Fig 6). Only 38 of these genes were exclusive to a single population, in that they were found in most members from one population, but were completely absent in the other two populations; 9 such genes were detected in NA1, 11 in NA2 and 18 in NA3. However, HMMER pattern searches indicated that three of the 121 differentially conserved genes had the same Pfam domain content and order; thus, while BLAST-clustering assigned them to distinct ortholog groups, these proteins may have divergent homologs segregating in different populations (symbols in Fig 6). Nonetheless, these three potential orthologs were found in different genomic contexts and were not present in all isolates, suggesting they are accessory proteins that have diverged in a population-specific manner.
Of the 121 genes that were differentially conserved across populations, 88 could be functionally characterized based on annotations in the PH-1 reference genome, InterPro domains or homology to known proteins (Fig 6). Each population was associated with distinct heterokaryon incompatibility genes (IPR010730, IPR024983). Moreover, degradative enzymes, and in particular hydrolases (IPR013094, IPR016035, IPR015955), but also pathogenicity-related lipases, proteases and effectors [31,36] were commonly found among genes that were differentially conserved. As such, each population had a unique collection of pathogenicity-related genes (Fig 6, S2 Table). Furthermore, we identified numerous putative secondary metabolite biosynthetic genes that were associated with a single population. These genes encoded nonribosomal peptide synthetase (NRPS) enzymes, cytochrome P450 enzymes and a variety of transcription factors and transmembrane transporters (S2 Table). In fact, in each population we identified at least one differentially conserved putative secondary metabolite gene cluster, wherein synthetase enzymes, transporters, transcription factors and modifying enzymes colocalized to the same contig (S2 Table, gene clusters F, I, J and U). Furthermore, functional enrichment analyses of the subset of differentially conserved genes that were assigned GO terms (N = 68) showed that transcription regulators (GO:0006355) and domains involved in redox reactions (GO:0016491, GO:0055114, GO:0050660) and transmembrane transport (GO:0055085) were significantly overrepresented (adjusted P < 0.01) in these genes.
Taken together, genome scans and gene content surveys showed that each population could be distinguished by a subset of genomic features with unique biological functions. For instance, regions that were differentially conserved or showed signatures of selection in NA1 were enriched for membrane proteins, transmembrane transporters and domains involved in DNA dependent regulation of transcription (adjusted P < 0.001, Fig 4). The NA2 population, on the other hand, showed enrichment for zinc finger transcription factors (adjusted P < 0.001), whereas the NA3 population had an overabundance of candidate genes with functions related to oxidation-reduction (adjusted P < 0.001).

F. graminearum population genomics
Our phylogenomic analyses revealed a previously unidentified population of pathogens, herein termed NA3, corresponding to a clade of F. graminearum that produce the NX-2 toxin. Studies utilizing VNTR and RFLP-based genotyping were unable to elucidate the genetic identity of NX-2 F. graminearum, but the restricted distribution of these isolates in the northern US and southern Canada suggested they were likely endemic to North America [16,17]. Accordingly, we found no evidence of a recent introduction in NA3, but rather, the relatively high levels of diversity in this population supported an endemic origin and a possible recent expansion (Table 1). Still, despite evidence of gene flow between NA1 and NA3, NA3 isolates have remained genetically distinct (Fig 1) and have genes that distinguish them from isolates belonging to the two predominant populations of FHB pathogens in North America (Fig 5). One possible explanation for these findings is that NA3 isolates recently switched from nonagricultural grasses to domesticated cereals [16,18]. Lofgren et al. (2017) reported that F. graminearum inhabit diverse wild grass species as endophytes, and wild grasses may be the ancestral hosts for FHB pathogens [41]. Thus, the genomic traits that differentiate F. graminearum populations may to some degree reflect different evolutionary trajectories taken by these populations as they transitioned from an endophytic lifestyle on wild grasses to pathogens of cultivated cereal.
Indeed, F. graminearum isolates co-occupying geographic ranges in southern Canada and the northern US are highly substructured compared to pathogens from other regions of the world, which may indicate that they have not coexisted for sufficient time to be in mutationselection-drift equilibrium. Recent surveys of F. graminearum conducted in Germany for instance, reported that 213 randomly sampled F. graminearum over a > 500 km transect belonged to a single, recombining metapopulation [42]. In contrast, our results support the hypothesis that North America harbors multiple F. graminearum populations that have acquired the ability to infect cultivated cereals brought to the region in the last few hundred years [41].

Genomic signatures of population-specific selection
After elucidating the rich evolutionary history of North American F. graminearum, we sought to identify genes underlying pathogen specialization in this species. The outlier regions identified tended to be located near subtelomeres and sites of putative chromosome fusion events ( Table 2, Fig 2), wherein patterns of nucleotide variation suggested that selection had contributed to population divergence. At seven of the 14 outlier regions, signatures of selection were detected in multiple populations. For instance, one of the outliers located on chromosome 2 (Table 2, o6) showed patterns consistent with divergent selection targeting NA1 and NA3, and this region contained a gene harboring a SNP associated with a known quantitative trait Alleles at these outliers may have been recently exchanged among species, or they may be under balancing selection to maintain diversity. The TRI-cluster genes, for example, showed patterns of transspecies evolution, and were previously reported to have evolved under balancing selection acting on chemotype polymorphisms within the B trichothecene producing clade of Fusarium, which includes all of the FGSC species [15].
Individual populations also showed evidence of being uniquely targeted by positive selection, as five outliers had patterns of variation consistent with an independent sweep of distinct genes/mutations in a single population (Table 2, Fig 3A). Notably, a signature of positive selection was identified in the NA3 population on chromosome 2 between 2.89-2.90 Mb, a region that is directly downstream and approximately 5 kb from a SNP associated with a known quantitative trait that regulates trichothecene production in F. graminearum [SNP:12335 43], suggesting that this genomic region may encode adaptations that regulate mycotoxin-associated virulence toward sensitive hosts. Likewise, in other fungi similar population-specific sweeps of alleles have been reported at genes encoding adaptive traits [44][45][46]. In Neurospora crassa for example, genomic signatures of population divergence localized to genes that proved to be essential for cold adaptation [44]. In Microbotryum fungi many genes with footprints of selection were differentially expressed in planta, suggesting that these loci correspond to putative adaptations responding to host-mediated positive selection [46].
Collectively, our genome scans suggested that recurrent episodes of selection in F. graminearum have allowed independently evolving populations to accumulate divergent alleles at dozens of genes located in rapidly evolving regions of the genome. Core genes encoding membrane proteins, transcription factors and degradative enzymes were preferentially targeted, perhaps indicating that these genes represent essential, but highly dynamic traits that provide evolutionary flexibility to the fungus. However, populations also maintained distinct sets of accessory genes (Fig 6, S2 Table). Comparative genomic studies in bacterial species have shown that such population-specific patterns of gene conservation can result from selection for traits involved in niche specificity [47,48], consistent with the tendency for accessory genes to localize in genomic regions implicated in pathogen specialization [22,23,27,38]. Many of the population-differentiating accessory genes in F. graminearum had functions related to transmembrane transport, transcription regulation and secondary metabolism suggesting they may also be important for pathogen specialization, though functional studies are warranted to determine their adaptive value.
Interestingly, most (63%) population-differentiating accessory genes were present in some admixed isolates (Fig 6) and the vast majority (98%) were conserved to some degree in other Fusarium species (S2 Table), with a few exhibiting > 90% identity to homologs in distantly related species (e.g., F. oxysporum). These findings are in accordance with previous studies [23], indicating that accessory genes are often exchanged among F. graminearum and other species of fusaria, or they were present in the ancestral gene pool and were lost and/or duplicated in a population-specific manner. As such, standing variation in the Fusarium gene pool may provide a reservoir of interchangeable niche-specific traits that enhance the adaptive capacity of F. graminearum by facilitating rapid evolution [49].

Functional implications of genomic divergence in FHB pathogens
Based on patterns of genomic divergence, it appears that F. graminearum cereal pathogens may rely on a small subset of rapidly evolving core genes, and an array of modular, accessory genes, to fine-tune the genome in response to selection. This genomic architecture has allowed for the evolution of divergent gene sets within populations (Fig 4), which in turn, could contribute to differences in the functional capacity of sympatric FHB pathogens. Given the strong selective constraints imposed by agricultural landscapes [6], it was not surprising that many of the population-specific candidate genes encoded pathogenicity-related proteins. Previous studies identified 55 candidate virulence genes that were unique to the PH-1 reference strain [23], however, 95% of these genes were prevalent in the genomes sampled herein (found in 38% of isolates on average), and only two showed patterns of conservation that were suggestive of population-specific adaptations. The genomic features identified in our study, on the other hand, provided evidence that populations could be differentiated by numerous gene variants and combinations of accessory genes encoding proteins involved in pathogenicity. These included plant cell-wall degrading enzymes, major facilitator transporters and secreted effector proteins (Table 2, Fig 6, S2 Table), which are known to be expressed in planta, and may be associated with different strategies for host invasion and nutrient acquisition from degraded plant tissues [22,36,[50][51][52].
The FHB pathogens in our study were known to produce different analogs of trichothecene mycotoxins [11,17,19], which can affect virulence [13,53]. However, we uncovered additional population-level diversity in putative secondary metabolite-related genes, including four previously undescribed gene clusters, multiple genes encoding NRPS enzymes, and several cytochrome P450 enzymes (Fig 6, S2 Table). For instance, a uniquely conserved putative gene cluster in the NA2 population encoded a fungal transcription factor, a monoterpene hydrolase and a salicylate hydroxylase. In other studies, salicylate hydrolases were shown to degrade the plant defense hormone, salicylic acid [54,55]. On the other hand, we found evidence of broad diversification in oxidoreductive gene function in the NA3 population (Fig 4), which was uniquely associated with a previously undescribed secondary metabolite gene cluster showing high similarity (91-93% identity) to genes in F. oxysporum (S2 Table, Y). Although these genes have not been functionally characterized, based on similarity to secondary metabolite genes in fungal pathogens, we speculate that many may be involved in the biosynthesis of metabolites that function as plant effectors and virulence factors [56][57][58][59][60][61]. Thus, the three populations may employ different metabolites during infection [62], or utilize different virulence factors to avoid host recognition [63].
Our findings also provided evidence that competition has imposed strong selective constraints on NA1, NA2 and NA3 isolates, resulting in population-level divergence in genes that allow the fungus to detect and respond to antagonists. F. graminearum populations could be differentiated by multiple heterokaryon incompatibility genes (het), and genes encoding NACHT, STAND, NB-ARC and HELO domains (Table 2, Fig 6, S2 Table) involved in nonself recognition and heterokaryon cell death, [40,64,65]. In addition, we detected populationlevel divergence in antimicrobial effectors, including an anti-fungal toxin [32], a secreted virucidal protein [66] (Table 2) and a putative antibiotic biosynthetic gene cluster (S2 Table, F). For example, different protein variants of Hce2 chitinase, an effector that likely functions to inhibit fungal competitors [32], were found in the three populations, and in NA3 there was strong evidence that selection has resulted in the fixation of a distinct variant of Hce2 (Table 2). Population-level divergence of incompatibility genes could indicate that these loci contribute to reproductive isolation [67], though patterns of trans-species polymorphism at non-self recognition and fungal immunity genes in FGSC species (Table 2, FGRRES_10558, FGRRES_12215, FGRRES_13758) could indicate balancing or frequency-dependent selection to maintain diversity in nature [68][69][70][71]. Taken together, divergence of toxigenic peptides and non-self recognition proteins suggests that like other filamentous fungi, F. graminearum utilizes diverse extracellular antimicrobial toxins during antagonistic interactions [40]. In F. verticillioides, encounters with other fungi result in production of secondary metabolites that kill and inhibit growth of fungal competitors [72]. In the wild, F. graminearum co-occurs with other microbes, including other Fusarium species [73][74][75]. Several studies have shown that F. graminearum interacts competitively with other Fusarium species in vitro and in the field [73,76], and in planta, intraspecific interactions can inhibit mycotoxin production and pathogenicity [23]. Therefore, it is conceivable that F. graminearum populations differ in their ability to interact competitively, which could impact FHB pathogen composition and diversity of the cereal microbiome.
Although environmental factors, particularly temperature, have been shown to influence the distribution of FGSC species, there appear to be few climatic constraints on F. graminearum [77]. However, there is evidence that certain isolates have enhanced cold tolerance. For example, some isolates with the 3ADON chemotype appear to be more resistant to freezing than 15ADON isolates [74], which may confer an advantage to certain populations of the fungus in cooler climates [73]. Herein, we found that populations of F. graminearum have evolved unique genomic traits that could influence how these pathogens respond to environmental factors. First, in NA2 we found strong evidence of a selective sweep targeting a locus encoding Pgl1, a polyketide synthase involved in pigment production [37], a DEAD helicase (FGRRES_13527), and a putative plant effector (FGRRES_09181) [36] (Fig 3A). The discovery of a DEAD helicase was striking, because homologs of this gene have been implicated in the adaptive evolution of multiple fungal species [44,45], and are essential for local adaptation to climate in N. crassa [44]. Due to the close linkage of these genes, it is unclear which was targeted by selection. However, parallel findings in other fungal species strongly support the hypothesis that divergence of the DEAD helicase gene in populations of F. graminearum reflects adaptation to different local conditions. This may indicate that sympatric populations evolved in different environments or have different environmental optima.
NA1 also showed evidence of significant divergence at genomic regions involved in cold adaptation, as this population was uniquely associated with a homolog of the cold shock protein Crp1, located within a putative plant effector biosynthetic gene cluster (S2 Table; J). Crp1 is a multifaceted RNA chaperone that promotes cold tolerance, by stabilizing RNA intermediates at suboptimal temperatures, but also regulates fungal pathogenicity [78]. We posit that this variant may perform similar functions within the NA1 genomic background. We also identified an ortholog of the circadian clock gene frequency (frq) that was only conserved in NA1 (FGRRES_08069 Fig 6, S2 Table). Recent genome scans identified this gene as a driver of local adaptation in N. crassa, presumably because it allows the fungus to adjust circadian oscillations in response to photoperiod [44]. A frq-regulated circadian clock has not been described in F. graminearum, but previous studies reported a single ortholog within the genome [79] (FGRRES_06054; 3:4,152,665-4,155,702). We found orthologs of this gene that exhibited 99-100% identity in all F. graminearum genomes examined in this study, indicating that it is part of the core genome. In contrast, FGRRES_08069 was located on a different chromosome (2:305,728-306,903), and while it was highly conserved in NA1 isolates, it was found in only one NA2 isolate (Fig 6), likely reflecting the history of frq loss and duplication in F. graminearum [79]. Alignment with the N. crassa protein indicated low identity (34%), but revealed that the coil-coiled domain essential for FRQ function [80] was intact. Knowing that N. crassa utilizes different isoforms of FRQ to fine-tune clock-regulated protein expression in response to environmental stimuli [81], it is possible that the frq variant found in NA1 may also be involved in light or temperature sensitivity.
The NA3 population exhibited a selective sweep targeting a different circadian regulator, a DASH cryptochrome, which colocalized with genes encoding a kinase and a methylcitrate dehydratase (Table 2). DASH cryptochromes are blue/UV-A light photoreceptors and DNA repair enzymes that have been shown to play essential roles in regulating stress response and fungal growth [82]. In F. fujikuroi, the DASH cryptochrome, CryD, also regulates production of secondary metabolites, such as pigments, and the plant hormone, gibberellin, in a lightdependent manner [33]. The cryptochrome in NA3 was distantly related to the F. fujikuroi CryD homolog (64% identical) and exhibited considerable divergence from the reference F. graminearum CryD homolog (89% identical), but was highly similar to an ortholog found in F. culmorum (94% identity), an FHB pathogen found in the same geographic areas as the NA3 population [19]. Further analyses of the adaptive significance of the NA3 CryD ortholog are warranted, but our findings may indicate non-phylogenetic sorting of this gene, perhaps reflecting selection or recent genetic introgression at this locus.
Our comparative genomics study suggests that there may be marked ecological diversity among sympatric NA1, NA2 and NA3 F. graminearum, likely reflecting the unique demographic histories and population-specific selection pressures experienced by independently evolving populations of the fungus. All three populations had distinct genomic traits that could provide a selective advantage under specific ecological scenarios. Specifically, the outliers found on chromosome 2 that were associated with known quantitative trait polymorphisms in F. graminearum provide strong evidence that populations have divergent fitness traits involved in fungicide resistance and mycotoxin-associated virulence. Many of the population-differentiating genes appeared to be involved in host-infection, however, it remains to be determined if they evolved as adaptations to wild grasses, domesticated cereals or other plant hosts. Regardless, like other phytopathogens, F. graminearum exhibits evidence of genome evolution in response to the constant arms-race between pathogen and host [62,63]. Yet, standing variation in the Fusarium gene pool seems to be an important source of divergent alleles, and potentially, a reservoir of adaptations maintained in different populations through balancing selection. The undeniable parallels between candidate genomic adaptations identified herein and those found in saprophytic and ectomycorrhizal fungi, suggests commonalities in the genetic bases of adaptive population divergence, such as the importance of circadian regulators and helicases in adaptation to light and temperature [44,83]. In the future, the candidate genes identified in this study will enable functional testing of specific hypotheses about the genes that have contributed to the evolutionary dynamics of FHB pathogens, and also provide the foundation for further investigations of adaptive evolution in other fungal species.

Conclusions
By combining high-resolution genome scans with surveys of gene content, we were able to identify genes involved in population divergence of a major fungal phytopathogen. Our results provided compelling evidence that population-specific selection pressures have left distinct signatures in the genomes of F. graminearum isolates, resulting in differentiation of an array of genes involved in host invasion, antagonism and environmental adaptation. The diffusion of adaptations among populations via localized gene flow or transcontinental introductions [11] could present future challenges to FHB management, because selection could continue to drive emergence of highly adapted, virulent pathogens. In this regard, persistent surveillance at a global scale will be critical for monitoring the influx of pathogens with novel adaptations and assessing the sustainability of resistant cultivars and fungicides used to control FHB. Moreover, gene deletion analyses of candidate traits in planta could provide insight into ecological significance of the pathogenicity-related genes identified herein, while direct competition experiments between different populations or species will allow us to reevaluate fungal effectors in regards to antagonistic interactions and microbial diversity. However, future studies need to consider that fitness differences in F. graminearum could also be related to gene expression or transcription regulation [84,85], and probably involve a multitude of quantitative traits that depend on host genotype, climate and the microbiome. As such, comparative genomics and transcriptomic studies of FHB populations sampled across space and time could be an efficient way to understand genome-wide adaptation in relation to specific climatic and agronomic factors.
Fungal cultures were prepared for DNA extraction following methods described in O'Donnell et al. [86]. DNA from each isolate was quantified using the Quant-iT™ Qubit™ dsDNA HS Assay Kit (Invitrogen) and diluted to a final concentration of 1ng for library preparation with the Illumina Nextera XT DNA Library Prep Kit, following manufacturer's instructions for 2x300bp MiSeq runs. All genomic data generated herein were deposited in the NCBI Sequence Read Archive (Accession numbers SRR5948910-SRR5948979, SRR6396633). Before analysis, sequencing reads were trimmed to remove adaptors, low quality (Q < 20) bases and ambiguous nucleotides, and reads were filtered to eliminate bacterial and human DNA contaminants.

Reference-based mapping and de novo assembly
We used CLC Genomics Workbench version 9.5.2 (https://www.qiagenbioinformatics.com/) to map reads from all isolates to the genome sequence of the F. graminearum reference strain, PH-1 (Accession no. PRJEB5475), using gene and domain annotations from King et al. (2015) (S1 File). After mapping, duplicate reads were removed and local realignment was performed to resolve gaps and indels. To characterize genomic regions that were too divergent to be mapped (< 80% identity over > 50% of the read) or were not represented in the reference genome, we also performed de novo assembly of unmapped reads using default parameters in CLC Genomics Workbench, and then applied filters to remove regions with less than 5X coverage per genome. Augustus version 3.2 [87] was used to employ a Hidden Markov model to predict genes on these contigs, utilizing validated parameters settings based on experimentally validated introns from F. graminearum [88]. Coding sequences were identified on both strands of DNA, requiring ATG start codons and stop codons. Herein we refer to contigs assembled from unmapped reads as orphan contigs, and genes annotated on orphan contigs as orphan genes.

SNP discovery
We used the Genome Analysis Tool Kit [89] and followed best practices for genomic variant discovery [90] to recalibrate base-quality scores of read mappings and perform multi-sample SNP discovery. Indels were not considered as population genomic analyses focused on methods that utilized models of nucleotide substitution. Previously published, known sites of variation identified in the genome of PH-1 [22] were used to guide base-quality score recalibration and SNP validation. SNP calls with a Phred quality score < 30 or a mapping quality score < 20 in single samples were not considered during multi-sample SNP identification. In addition, we performed variant-quality score recalibration, utilizing filters to remove variants that exhibited strand bias, low quality or low read depth per genome, and ensuring that SNP sites included in genomic analyses had an estimated false discovery rate (FDR) <1%. To avoid ascertainment bias associated with sampling depth, we limited our analyses to sites having valid SNP calls in all samples [91].

Population genomic structure
Bayesian clustering analyses implemented in the program STRUCTURE [92] were used to establish genome-level demographic structure, discern the population identity of NX-2 F. graminearum and examine admixture among NA1, NA2 and NX-2 isolates. To minimize the effects of linkage disequilibrium and reduce computational load, we limited analyses to SNPs that were at least 5 kb apart (6,852 sites) as pilot analyses indicated that this sampling interval provided sufficient resolution to confidently assess population structure and admixture. We simulated K, the number of populations, accounting for shared ancestry by utilizing the linkage model [93], and used 50,000 Monte Carlo Markov Chain (MCMC) iterations following a 20,000 iteration burn-in for each of three replicate runs performed for K 1-7. The model that maximized ΔK, the rate of change in log likelihood values, was considered optimal based on methods described in Evanno et al. [94]. Estimates of q, the proportion of each isolate's membership in each of the inferred populations, were calculated using the optimal model parameters to assess probability of assignment and genetic admixture among populations.
Phylogenetic trees were generated from the full set of quality-filtered SNPs using rapid bootstrap [95] and maximum likelihood methods implemented in RAxML version 8.0.26 [96]. The General-Time-Reversible (GTR) model of nucleotide substitution, with corrections for ascertainment bias [97] and rate heterogeneity (ASC_GTRGAMMA), was used to infer a phylogeny of FGSC species. The SNPs identified after reference-based mapping (methods described above) of genome sequences from isolates of F. boothii, F. gerlachii and F. louisianense were included as outgroups in this tree. In addition, we built a phylogenetic network of F. graminearum isolates using SplitsTree4 version 4.14.6 [98] to depict an evolutionary history that reflected intraspecific recombination events.

High-resolution genome scans for selection
To identify regions that have been impacted by selection in the NA1, NA2 and NA3 populations we calculated several population genetic summary statistics in non-overlapping, 10 kb sliding windows using the R package PopGenome [99]. We chose 10 kb intervals based on patterns of linkage disequilibrium in the F. graminearum genome, which become negligible at distances > 10 kb [42]. Further, pilot analyses exploring a range of window sizes (5 kb-20 kb) indicated that a 10 kb resolution generally captured covarying signals from linked loci, while distinguishing signals from adjacent loci in linkage equilibrium.
We quantified within-population diversity by calculating π (nucleotide diversity, the average number of differences between individuals, [100]), θ (nucleotide polymorphism based on segregating sites, [101]) and Tajima's D [30], and estimated the number of recombination events within a population using R M (four-gamete test, [102]). In addition we quantified differentiation between populations by calculating pairwise values of D xy (nucleotide divergence based on the absolute number of differences between two populations, [100,103]), F ST ([104], per haplotype, adjusted for unequal population sizes using weighted averages), and Tajima's D, in this case calculated for combined populations as a relative measure of inter-population divergence [44]. Genome-wide averages of summary statistics were compared among the three populations using JMP [105], and visualized using Circos [106] and JalView Version 2 [107].
Genomic patterns created by genetic structuring, bottlenecks and population expansions can mimic those elicited by selection [25, [108][109][110], thus it was important to assess the significance of outliers against the genomic background of our populations to discriminate selective events targeting individual loci from neutral demographic processes that impact the entire genome [111]. However, the mixed reproductive system and introgression documented in F. graminearum [42,112], coupled with notable trans-species polymorphism [15,16] and horizontal transfer [113][114][115] in Fusarium species violate assumptions and introduce severe bias in coalescent and likelihood-based demographic models used for detecting selection [116,117]. As such, we adapted a nonparametric permutation method for genome-wide SNP data [118] to evaluate the significance of 10 kb sliding-window statistics against the genomic distribution of values for each population. This method has proven effective for identifying population-specific outlier loci in species with complex evolutionary histories, as it utilizes localized patterns of linkage disequilibrium to discern extreme patterns of genetic variation elicited by natural selection from those elicited by neutral demographic processes [111,118]. We used custom shell scripts and BedTools [119] to generate 1000 random permutations of the variant call format (VCF) file, containing the chromosomal coordinates of each SNP and the associated genotypes observed for the 60 sequenced isolates. Each permutation randomized observed genotypes across the SNP coordinates, generating a random set of SNP coordinate/genotype combinations that maintained the joint site-frequency spectrum of the data, but was unconstrained by linkage disequilibrium (i.e. spatially randomized). Sliding-window analyses were repeated on each of the 1000 permuted datasets as described above to generate a null genomewide distribution of summary statistics for each population, and all pairwise combinations of populations. Observed values for a 10 kb window were considered outliers at P < 0.001, i.e. the probability that the observed value was more extreme than values from the null distribution. As such, outliers were identified as those windows that exhibited localized signatures of linkage disequilibrium associated with selective sweeps, and had patterns of variation that were significant in the context of each population's genome-wide diversity. Still, to reduce the probability of identifying false positives, genomic regions of interest were limited to those having significant values of absolute and relative differentiation between population pairs [44] (significant F ST , D xy , and Tajima's D in at least two population comparisons), coupled with significant reductions in nucleotide diversity (π, within the targeted population).
Phylogenetic analyses were performed on genes residing in outlier regions using sequences from F. pseudograminearum as an FGSC outgroup for F. graminearum, F. boothii, F. gerlachii and F. louisianense. DNA sequences were aligned with MUSCLE (Edgar, 2004) as implemented in CLC Genomics Workbench, and then maximum likelihood phylogenies were constructed using IQtree [120], with models of molecular evolution selected on the basis of Bayesian Information Criterion scores calculated with ModelFinder [121]. Support for individual branches was assessed using 1000 ultrafast bootstraps [122]. We also examined scenarios of trans-species evolution by comparing the likelihood of unconstrained trees to constraint trees requiring monophyly of all F. graminearum isolates using Shimodaira-Hasegawa likelihood ratio tests [123].

Quantifying differences in gene content
To compare gene content among isolates we conducted an inventory of all F. graminearum genes, including coding sequences extracted from read mappings and genes predicted on orphan contigs. Annotated PH-1 gene sequences were also included as an internal reference. We first identified orthologous gene sequences using R-functions within the micropan package [124]. All-against-all BLAST comparisons were conducted on predicted protein sequences and then pairwise distance values (D i,j ) were computed from BLAST alignment scores (bitscores). Sequences were allocated into orthologous groups using single linkage clustering with D i,j ! 0.5 set as the cluster inclusion threshold, as this allowed us to resolve 97% of the PH-1 genes as distinct homologs (i.e. a one to one match between annotated genes and ortholog groups). A pan-genome matrix indicating the number of copies of each ortholog per genome was then constructed based on sequence clustering. The presence/absence of orthologous genes in each genome was verified by examining BLAST alignments and gene coverage in individual read mappings. Genes encoding proteins that were present in the reference genome and in all sampled genomes were classified as core genes, whereas genes encoding proteins that were missing in one or more genomes were classified as accessory genes. To confirm that core genes were not misclassified as accessory genes using BLAST-based similarity searches, we also classified orthologs based on Pfam domain content and order, using HMMER version 3.1 (http://hmmer.org) to characterize domain profiles for all accessory proteins.
To identify genes that were differentially conserved across populations, we used a gene enrichment test adapted from den Bakker et al. [125] to compare the relative frequency of each accessory gene in the three populations. Genes were considered differentially conserved when the test statistic (E) exceeded three standard deviations of the population mean, i.e., an empirical P-value < 0.01.
A dendrogram of the F. graminearum pan-genome was computed using the neighbor joining method implemented in MEGA 7 [126] and pairwise distances calculated between genomes based on gene presence/absence polymorphism.

Functional annotation and enrichment analyses
BLAST2GO version 3.2.7 [127,128] was used to assign enzyme codes, InterPro domains and Gene Ontology (GO) terms to proteins predicted from differentially conserved genes. Putative functions and homologues of these proteins were further explored using BLAST [129] and CDD [130,131] database searches. We also identified secreted proteins that lacked a transmembrane domain using SignalP version 4.1 [132]. Blast searches of PHI-base (www.PHIbase.org) were also conducted to examine homology with functionally characterized virulence genes in F. graminearum and other phytopathogenic microbes.
Gene Ontology (GO) enrichment analyses were performed on the candidate genes identified in each population (differentially conserved genes and genes located within outliers) using the R package XGR version 1.0.4 [133]. One-sided hypergeometric tests were performed to compare the relative frequency of GO terms assigned to domains in candidate gene sets to GO terms assigned to domains from all F. graminearum genes, requiring that GO terms have >3 hits in common between candidate and reference lists to account for the smaller number of genes analyzed in population-specific gene sets. In addition, we utilized a stringent criteria for significance by applying the Benjamini-Hochberg method to adjust for multiple tests, and identifying GO terms as significantly overrepresented only when the adjusted P-value for that term was < 0.01. Simulation studies indicated that these analytical conditions result in a low false positive rate (< 0.01) for datasets with sample sizes comparable to those analyzed herein [133].
Supporting information S1 Fig. Maximum likelihood phylogeny of TRI1 gene sequences from the 60 isolates of F. graminearum used for genome sequencing. The phylogeny was inferred using the Kimura 2-parameter model of nucleotide substitution [135] with a Gamma parameter to account for rate heterogeneity. Bootstrap values (%, based on 100 replications) ! 50 are indicated on branches. The tree was rooted at midpoint and drawn to scale, with branch lengths measured in the number of substitutions per site. (DOCX) S2 Fig. Outlier gene phylogenies for F. graminearum, F. gerlachii, F. louisianense, F. boothii and F. pseudograminearum. Maximum likelihood methods were used to construct phylogenies for 75 of the 81 genes residing in genomic regions exhibiting signatures of selection. The remaining six genes were not analyzed because homologs were not detected in one or more of the F. graminearum outgroup species (F. gerlachii, F. louisianense, F. boothii or F. pseudograminearum). Above, outlier genes located on chromosome 1, between 11.30 to 11.31 Mb are shown to exemplify discordance between phylogenies of outlier genes and the genome-wide SNP phylogeny (Fig 1). Numbers on branches indicate support values determined with 1000 ultrafast bootstraps [120,122]. The tree was rooted with F. pseudograminearum, a member of the   [127,128]. Asterisks indicate secreted proteins identified using SignalP 4.1 [132]. PHI-base accession numbers are indicated in parentheses for sequences showing significant homology (BLAST e-value < 10 −5 ) to proteins in the PHI-base 4.1 database. 2 Ortholog had a significant (e-value < 10 −5 ) BLAST hit to another species, or to another ortholog in PH-1. Only the top hits (based on bitscore) are shown. 3 Genes with the same designation were clustered on the same contig, or colocalized within the PH-1 genome.