Maintenance of divergent lineages of the Rice Blast Fungus Pyricularia oryzae through niche separation, loss of sex and post-mating genetic incompatibilities

Many species of fungal plant pathogens coexist as multiple lineages on the same host, but the factors underlying the origin and maintenance of population structure remain largely unknown. The rice blast fungus Pyricularia oryzae is a widespread model plant pathogen displaying population subdivision. However, most studies of natural variation in P. oryzae have been limited in genomic or geographic resolution, and host adaptation is the only factor that has been investigated extensively as a contributor to population subdivision. In an effort to complement previous studies, we analyzed genetic and phenotypic diversity in isolates of the rice blast fungus covering a broad geographical range. Using single-nucleotide polymorphism genotyping data for 886 isolates sampled from 152 sites in 51 countries, we showed that population subdivision of P. oryzae in one recombining and three clonal lineages with broad distributions persisted with deeper sampling. We also extended previous findings by showing further population subdivision of the recombining lineage into one international and three Asian clusters, and by providing evidence that the three clonal lineages of P. oryzae were found in areas with different prevailing environmental conditions, indicating niche separation. Pathogenicity tests and bioinformatic analyses using an extended set of isolates and rice varieties indicated that partial specialization to rice subgroups contributed to niche separation between lineages, and differences in repertoires of putative virulence effectors were consistent with differences in host range. Experimental crosses revealed that female sterility and early post-mating genetic incompatibilities acted as strong additional barriers to gene flow between clonal lineages. Our results demonstrate that the spread of a fungal pathogen across heterogeneous habitats and divergent populations of a crop species can lead to niche separation and reproductive isolation between distinct, widely distributed, lineages.

Introduction Introduced fungal plant pathogens constitute a major threat to ecosystem health and agricultural production [1][2][3]. Research in plant pathology has revealed a seemingly inexhaustible stream of disease outbreaks or significant changes in the geographic location of diseases and host ranges of pathogens [4,5]. In addition to the burden of emerging fungal infections, most major crops or domesticated trees are colonized by introduced pathogens that are so widespread and have been present for so long, that they do not necessarily come to mind when one thinks about invasive fungi [6,7]. If fungal pathogens are to spread successfully over their host species distribution, they must overcome a series of barriers to invasion related to dispersal, host availability, competition with other microbes, and abiotic conditions [8]. An understanding of the factors underlying the colonization success of widespread fungal pathogens is essential to protect world agriculture against future pandemics [9,10].
Introduced fungal plant pathogens display a range of population structures, reflecting their highly diverse life-history strategies and invasion histories [8,11,12]. Demographic events (e.g. population bottlenecks, secondary contact) and natural selection during spread across heterogeneous environments may be associated with intricate changes in population dynamics and life history traits, or may cause such changes. Shifting to a new host is a primary life history trait change in introduced fungal pathogens, and a major route for the emergence of disease. Changes in host range are facilitated by certain features, such as mating within or on their hosts and strong selective pressure on a limited number of genes [13,14]. Reproduction within, or on plant hosts favors assortative mating with respect to host use, leading to a strong association between adaptation to the host and reproductive isolation [15,16] and the differentiation of pathogen populations adapted to different host populations, varieties or species [17][18][19][20]. Such changes in pathogen host range can result from immune-escape mutations. Typically, they occur in a small number of genes encoding small secreted proteins called effectors, which enable microbes to influence the outcome of host-pathogen interactions for their own benefit, and which may be subject to surveillance by the plant immune system [21][22][23]. Another frequent change in the life history traits of introduced fungal pathogens is the loss of sexual reproduction [24,25]. Introduced fungal pathogens may develop a clonal population structure over their entire introduced range or over parts of that range ( [8] and references therein). The immediate causes of the loss of sexual reproduction include a lack of compatible mating partners in the introduced range (i.e. a lack of compatible mating types) [11], hybridization and hybrid sterility [26,27], and a lack of compatible alternate hosts for sexual reproduction [28]. Mating type loss may be driven by the extreme population bottlenecks associated with the colonization of a new host or new area. Shifts to asexual reproduction may also result from selection against reproduction [8], particularly for pathogens of domesticated crops and trees, due to the availability of large, homogeneous host populations, releasing constraints that maintain sexual reproduction in the short term [29]. These conditions may favor the asexual spread of new lineages carrying adaptive allelic combinations requiring shelter from recombination [30]. Finally, changes in life history traits can also result from variations in environmental conditions, including temperature in particular, during the course of pathogen spread [31,32]. Deciphering the complex interplay between changes in population structure and life history strategies during the spread of plant pathogenic fungi requires deep sampling and the integration of genetic, phenotypic, and environmental information throughout the native and introduced ranges.
Pyricularia oryzae (Ascomycota; syn. Magnaporthe oryzae) is a widespread model plant pathogen displaying population subdivision. Pyricularia oryzae is best known as the pathogen causing rice blast, which remains a major disease of Asian rice (Oryza sativa), but it is also a major constraint on the production of wheat (Triticum), finger millet (Eleusine coracana) and Italian millet (Setaria italica), a significant disease of ryegrass (Lolium) and an emerging pathogen of maize (Zea maydis) [33][34][35][36][37][38]. An in-depth genomic analysis of a large worldwide collection of P. oryzae isolates from 12 host plants revealed the existence of 10 distinct P. oryzae lineages, each associated mostly with a single main cereal crop or grass host [39]. Sequence divergence between lineages was low, of the order of about 1% [39], and analyses of gene flow and admixture provided evidence that the different host-adapted lineages were connected by relatively recent genetic exchanges, and therefore corresponded to a single phylogenomic species [40,41]. The reproductive biology of P. oryzae is typical of plant pathogenic Ascomycetes. Pyricularia oryzae has a haplontic life cycle. Thus, the multicellular state is haploid, and the breeding system is heterothallic, with mating occurring only between haploid individuals of opposite mating types. Pyricularia oryzae is hermaphroditic [42], but successful mating also requires that at least one of the partners to be capable of producing female structures (i.e. "female-fertile").
The lineage of P. oryzae that infects rice is the most widely studied at the population level, due to its agricultural importance. Analyses of populations from commercial fields and trap nurseries based on molecular markers and mating assays have consistently shown that non-Asian populations of the rice-infecting lineage harbor a single mating-type and are clonal, with sexual reproduction never observed in the field [42,43]. Populations with female-fertile strains, a coexistence of both mating types and signatures of recombination have been observed exclusively in Asia [42][43][44][45][46]. More recent population genomic studies of the rice-infecting lineage have revealed genetic subdivision into four main lineages, estimated to have diverged approximately 1000 years ago and displaying contrasting modes of reproduction [47][48][49]. Only one of the four lineages that infects rice (lineage 1, prevailing in East and Southeast Asia) displays a genome-wide signal of recombination, a balanced ratio of mating type alleles and a high frequency of fertile females, consistent with sexual reproduction [44]. All the other lineages (lineages 2, 3 and 4, present on several continents) have clonal population structures, highly biased frequencies of mating types and rare fertile females, suggesting that reproduction in these lineages is strictly asexual [44,49]. The recombining lineage has a nucleotide diversity that is four times greater than that of the clonal lineages [49]. In theory, gene flow remains possible between the majority of the lineages, because there are rare fertile strains and some lineages have fixed opposite mating types. This raises questions about the nature of the factors underlying the emergence of different rice-infecting lineages in P. oryzae, and contributing to their maintenance in the face of potential gene flow. Previous studies have helped to cement the status of P. oryzae as a model for studies of the population biology of fungal pathogens. However, most efforts to understand the population structure of the pathogen have been unable to provide a large-scale overview of the distribution of rice-infecting lineages and of the underlying phenotypic differences, because the number of genetic markers was limited [43,44], the number of isolates was relatively small [47,49] and because host-pathogen interaction phenotypes were the only phenotypes that were scored [43,44,47]. The contribution of differences in the geographic and climatic distribution of rice blast lineages and genetic incompatibilities between them, in particular, remains unknown.
To further explore the factors underlying the origin and maintenance of population structure in rice-infecting P. oryzae, we analyzed genomic and phenotypic variation in isolates sampled across all rice-growing areas of the world. Our aims were to re-evaluate population subdivision and the reproductive mode using a large collection of samples, to further explore host specialization using a larger number of isolates and rice varieties, to examine differences in the geographic and climatic distribution of lineages, and to measure their interfertility. For this, we analyzed genetic variation using genotyping data with a high resolution in terms of genomic markers and geographic coverage, and measured phenotypic variation for a representative subset of the sample set. We also characterized the genetic variability and content of repertoires of effector proteins using whole-genome resequencing data for a representative set of isolates.

The rice blast pathogen comprises one recombining, admixed lineage and three clonal, non-admixed lineages
We characterized the global genetic structure of the rice blast pathogen, using an Illumina genotyping beadchip to score 5,657 single nucleotide polymorphisms (SNPs) distributed throughout the genome of 886 P. oryzae isolates collected from cultivated Asian rice in 152 sites in 51 countries (S1 Table). The final dataset included 3,686 SNPs after the removal of positions with missing data, which identify 264 distinct multilocus genotypes in the 886 isolates. The genotyping error rate was below 0.62% (S1 Table). Clustering analyses based on sparse nonnegative matrix factorization algorithms, as implemented in the SNMF program [50], revealed four clusters, hereafter referred to as "lineages" (Fig 1C and S1 Data and S2 Data). The model with K = 4 clusters was identified as the best model on the basis of the cross-entropy criterion. Models with K>4 induced only a small decrease in cross-entropy, suggesting that K = 4 captured the deepest subdivisions in the dataset (S1 Fig). The neighbor phylogenetic network inferred with SPLITSTREE also supported the subdivision into four widely distributed lineages, with long branches separating three peripheral lineages branching off within a central lineage (Fig 1A  and 1B and S1 Data). A comparison with previous findings revealed that the central lineage in the Neighbor-net network corresponds to the combination of recombining lineage 1 with lineages 5 and 6, represented by two and one individuals, respectively, in a previous study [49] (S2 Fig). This central lineage is referred to hereafter as lineage 1. The three peripheral lineages in the Neighbor-net network corresponded to the previously described lineages 2, 3 and 4 [49]. Lineages 2 and 3 are similar to lineages B and C, respectively, identified with microsatellite data in a previous study [44] (S2 Fig). Genetic differentiation between the four lineages was strong (Weir and Cockerham's F ST > 0.56), indicating strong barriers to gene flow between the lineages (S1 Text). All genotypes from lineages 2, 3 and 4 had high individual ancestry coefficients in a single cluster (q > 0.89, q representing the proportion of an individual genome that originate from a given ancestral gene pool), whereas shared ancestry with the other three lineages was detected in lineage 1, with 32% of the genotypes having q > 0.10 in lineages 2, 3 and 4 ( Fig 1C and S1 and S2 Data). Admixture may account for the lower F ST observed in comparisons between lineage 1 and the other lineages.
We detected no signatures of recombination in lineages 2, 3 and 4 with three different tests (pairwise homoplasy index [51], maximum χ 2 [52] and neighbour similarity score [53]) and a null hypothesis of clonality (Table 1), confirming previous findings obtained with smaller datasets [44,[47][48][49]. The lack of linkage disequilibrium decay with increasing physical distance in lineages 2, 3 and 4 (S3 Fig), and their low proportions of homoplastic mutations (Table 1), provided additional evidence for an absence of recombination in these lineages, contrasting with lineage 1. Consistent with patterns of linkage disequilibrium, the clonal fraction (1-

PLOS PATHOGENS
Maintenance of population structure in rice blast number of observed multilocus genotypes over the number of isolates) was lower in lineage 1 (39%) than in lineages 2, 3 and 4 (75, 82 and 68%, respectively; Table 1). Using whole-genome data for isolates from lineages of P. oryzae infecting other grasses and cereals, we showed that our set of Infinium SNPs could be used to identify isolates not belonging to the rice-infecting lineage, although it was unable to differentiate between the different host-specific lineages of P. oryzae (S2 Text). Clustering analysis identified two isolates from West Africa (BN0019 and BF0072, clonal group 18 [S1 Table]) corresponding to a lineage different from the rice lineage and possibly resulting from ancient admixture between other lineages (S2 Text). However, on inclusion of these divergent isolates in our dataset, we detected no fixed differences with respect to the rest of the isolates from lineage 1, and these divergent isolates did not form a separate group in clustering analyses. They are not, therefore, likely to have affected on our main conclusions on population subdivision and mode of reproduction.
Geographic structure and admixture within recombining lineage 1 Lineage 1 was the only lineage for which there was population genetic evidence of recombination (Table 1 and S6-S8 Data). Most of the lineage 1 isolates were collected in Asia (79%), but the lineage was present on all continents in which the pathogen was sampled (Europe: one isolate; North America: 10, Central and South America: 7, Africa: 22) (Fig 2A and 2B and S3 and S4 Data). Clustering analysis on this single lineage with SNMF detected four clusters with different geographic distributions (Fig 2D and S4 Data). Estimates of differentiation were lower between the clusters within lineage 1 (F ST < 0.51) than between the main lineages (F ST > 0.56), consistent with a longer history of restricted gene flow between the main lineages and a more recent history of admixture between clusters within lineage 1 (S1 Text).
Two clusters within lineage 1 (referred to hereafter as Baoshan and Yule) consisted mostly of isolates sampled from two different sites in the Yunnan province of China (Baoshan and Yule, respectively, located about 370 km apart). The Baoshan cluster also included one isolate from Nepal and one isolate from Hunan; the Yule cluster also included two isolates from Vietnam. The third cluster consisted mostly of isolates from Laos (referred to hereafter as Laos), but also included one isolate from Rwanda, three isolates from Yunnan, two isolates from Vietnam, and one isolate from French Guyana. This latter isolate is the widely used laboratory strain GUY11, collected from fields cultivated by the Hmong diaspora. The fourth cluster brought together 95% of the isolates from lineage 1 collected outside Asia (referred to hereafter as International). In Asia, the International cluster was found mostly in the Yunnan province of China, India and Nepal ( Fig 2B and S1 Table). The Yule, Laos and International clusters coexisted in the same field (Yule village), the same year (2009). PHI tests for recombination rejected the null hypothesis of clonality for all four clusters (Table 1 and S6-S8 Data). Shared ancestry was widespread in lineage 1, with most of the isolates (78%) displaying individual ancestry coefficients (q) greater than 0.10 for two or more clusters (Fig 2D and S4 Data). Only five genotypes were detected in multiple countries (genotype ID [number of countries]: 2 [6], 18 [2], 58 [2], 98 [3] and 254 [3], corresponding to 41 isolates in total; Fig 2E and S5 Data). All these clonal genotypes belonged to the International cluster, which included 76 isolates in total. The clonal fraction (i.e. 1-number of genotypes over the sample size) was larger in the International cluster (62%) than in Baoshan, Laos and Yule clusters (15, 35 and 18%, respectively; Table 1).

Reproductive barriers caused by female sterility and postmating genetic incompatibilities
To elucidate how the clonal lineages have emerged from the more ancient recombining populations in lineage 1, we determined the capacity of the different lineages to engage in sexual

PLOS PATHOGENS
Maintenance of population structure in rice blast reproduction. Using in vitro crosses with Mat1.1 and Mat1.2 tester strains, we analyzed the distribution of mating types and determined the ability to produce female structures (i.e. female fertility). We found that lineages 2, 3 and 4 were composed almost exclusively of a single mating type: 97% of lineage 2 and 4 isolates tested carried the Mat1.1 allele, while 97% of lineage 3 isolates carried the Mat1.2 allele. Only a small proportion (0-7%) of isolates from these lineages showed female fertility ( Table 1). The mating type ratio was more balanced in lineage 1 (52% of Mat1.1; Table 1), with Mat1.1/Mat1.2 ratios ranging from 40/60 in the Yule cluster to 69/31 in the Baoshan cluster. Female fertility rates were also high in most of the clusters in lineage 1 (Yule: 79%; Baoshan: 67%; Laos: 37%; Table 1). Only in the International cluster female fertility was as low as those in the clonal lineages, with 4% fertile females (Table 1).

PLOS PATHOGENS
Maintenance of population structure in rice blast These observations suggest that despite the generally low rate of female fertility, sexual reproduction between most lineages is possible, except between lineages 2 and 4, that have the same highly biased mating type ratio.
We further assessed the likelihood of sexual reproduction within and between lineages, by evaluating the formation of sexual structures (perithecia), the production of asci (i.e. meiotic octads within the perithecia) and the germination of ascospores (i.e. meiospores in meiotic octads) in crosses between randomly selected isolates of opposite mating types from all four lineages (Fig 3). This experiment revealed a marked heterogeneity in the rate of perithecium formation across lineages. Clusters in lineage 1 had the highest rates of perithecium formation, with isolates in the Yule cluster, in particular, forming perithecia in 93% of intra-lineage crosses, and in more than 46% of inter-lineage crosses ( Fig 3A and S9 Data). Due to their highly biased mating type ratios, isolates from lineages 2, 3 and 4 could only be crossed with isolates from other lineages. The proportion of these inter-lineage crosses producing perithecia was highly variable and ranged, depending on the lineages involved, from 0% to 83% (Fig 3A  and S9 Data). In inter-lineage crosses involving the International cluster of lineage 1, the rate of perithecium formation was similar as in inter-lineage crosses involving clonal lineages 2, 3 and 4. None of the intra-lineage crosses attempted between isolates from the International cluster led to perithecium formation ( Fig 3A and S9 Data).
Perithecium dissection for a subset of crosses involving some of the most fertile isolates revealed that most inter-lineage crosses produced perithecia that did not contain asci or contained asci with low ascospore germination rates ( Fig 3B and S10 Data). While 100% of intralineage crosses between isolates of the Yule cluster produced numerous germinating ascospores, only 33%, 56% and 7% of Yule x lineage 2, Yule x lineage 3 and Yule x lineage 4 crosses produced germinating ascospores, respectively.
Together, our crossing experiments indicate that the three clonal lineages and the International cluster of lineage 1 are isolated from each other by strong pre-and early-postmating barriers, including breeding system isolation (differences in mating type and female sterility), and a combination of gametic incompatibility, zygotic mortality or hybrid non-viability. However, three of the clusters in lineage 1 had biological features consistent with an ability to reproduce sexually, suggesting possible sexual compatibility with clonal lineages 2, 3 and 4, and the International cluster.

Specialization to temperature conditions and rice subgroups
Given the very broad distribution of P. oryzae and the strong environmental heterogeneity it encounters in terms of the nature of its hosts and the climate in which it thrives, we evaluated the variation of fitness over different types of rice host genotypes and temperature conditions. We measured growth rate and sporulation of 41 representative isolates cultured at different temperatures to test the hypothesis of adaptation to temperature (lineage 1 [yule cluster]: 11; lineage 2: 10; lineage3: 10; lineage 4: 10). For all lineages, mycelial growth rate increased with incubation temperature. This trend was more visible from 10˚C to 15˚C (increased mean mycelial growth of +2.22 mm/day) than from 25˚C to 30˚C (+0.05 mm/day) (growth curves and a full statistical treatment of the data are presented in S6 Text; data are reported in S11 Data).

PLOS PATHOGENS
Maintenance of population structure in rice blast rate of lineage 4 at 10˚C relative to other lineages and a significantly higher growth rate of lineage 1 at 15˚C, 20˚C, 25˚C and 30˚C relative to other lineages. Almost no sporulation was detected at 10˚C after 20 days of culture, with the few spores observed being not completely formed and divided by only one septum instead of two septa in mature conidia (sporulation curves and a full statistical treatment of the data are presented in S7 Text; data are reported in S12 Data). For all lineages, sporulation (weighted by mycelium colony size) increased with

PLOS PATHOGENS
Maintenance of population structure in rice blast temperature from 15 to 25˚C and dropped at 30˚C. Significant lineage effects were observed at 10˚C, 15˚C and 30˚C (Kruskal-Wallis tests; 10˚C: H(3) = 9.27, p = 0.026; 15˚C: H(3) = 17.5, p-value<0.001; 30˚C: H(3) = 9.60, p = 0.022). Pairwise non-parametric comparisons of lineages based on Dunn's test revealed significant differences between lineages 1 and 2 at 10˚C, between lineage 1 and lineages 3 and 4 at 15˚C, and between lineage 1 and 4 at 30˚C. Together, measurements of mycelial growth and sporulation at different temperatures revealed differences in performance between lineages, but no clear pattern of temperature specialization.
We assessed the importance of adaptation to the host, by challenging 45 rice varieties representative of the five main genetic groups of Asian rice (i.e. the temperate japonica, tropical japonica, aus, aromatic and indica subgroups of Oryza sativa) with 70 isolates representative of the four lineages of P. oryzae and the four clusters within lineage 1 (S2 Table and S13 Data). Interaction phenotypes were assessed qualitatively by scoring resistance (from full to partial resistance: scores 1 to 3) or disease symptoms (from weak to full susceptibility: scores 4 to 6), which is standard in rice blast phenotyping and documented by images [54][55][56][57]. Resistance/ susceptibility scores were analyzed by fitting a proportional-odds model and performing an analysis of variance that revealed significant differences between groups of isolates (χ 2 (6) = 100, p-value<0.001), between rice genetic groups (χ 2 (4) = 161, p-value<0.001), and a significant interaction between these two variables (χ 2 (24) = 97, p-value<0.001) (S8 Text). The finding of a significant interaction between groups of isolates (lineages or clusters) and rice types indicates that the effect of the group of isolates on the proportion of compatible interactions differed between rice types, suggesting adaptation to the host. This is also supported by the specific behaviour of certain lineages. Isolates from lineage 2 had much lower symptom scores than the other lineages on all rice types except temperate japonica, and the isolates of the Yule cluster were particularly virulent on indica varieties (Fig 4A and S2 Table). In comparisons of rice genetic groups, significantly higher symptom scores were observed on temperate japonica rice than on the other types of rice, whereas the varieties of the aromatic genetic group were significantly more resistant to rice blast ( Fig 4B). Together, these experiments therefore revealed significant differences in host range between lineages. However, this specialization to the host was not strict, because there was an overlap between host ranges (Fig 4C-4G).

Differences in the number and genetic variability of putative effector genes between P. oryzae lineages
Effectors, defined as secreted proteins that modulate host responses, are the most prominent class of fungal virulence factors. They act by manipulating host cellular pathways and are thus key determinants of the host range and fitness on compatible hosts [58,59]. We therefore determined the differences in effector repertoires between P. oryzae lineages in terms of presence/absence and nucleotide polymorphisms, and analyzed whether this was a particularly dynamic fraction of the gene space. To this end, we used whole-genome sequencing data for 123 isolates, including 29 isolates sequenced in this study and 94 publicly available genomes [39,47,60], 33 of which were also genotyped with our Infinium genotyping beadchip (S3 Table  and S3 Text). Clustering analysis indicated that this dataset covered the four lineages of P. oryzae, and three of the clusters of lineage 1 (Laos, International and Baoshan; S3 Text and S3 Table). Assembly lengths for the 123 sequenced isolates ranged from 36.7 to 39.6 Mb, with a mean value of 38.1 Mb (S3 Table). The number of assembled contigs longer than 500 bp ranged from 1010 to 2438, and the longest contig was 1.1 Mb long (S3 Table).
Gene prediction identified 11,684 to 12,745 genes per genome, including 1,813 to 2,070 genes predicted to encode effector proteins. We found a significant effect of the lineage of origin on both the mean number of putative effector genes (ANOVA, F = 5.54, p = 0.0014), and  [54,55] estimated using pathogenicity tests in controlled conditions. A: Symptom scores as a function of the lineage of origin of isolates, or cluster of origin for isolates from lineage 1; B: Symptom scores as a function of the type of rice; C-G: Symptom scores as a function of the lineage of origin of isolates, for each type of rice. Abbreviations: Y, Yule; I, International; B, Baoshan; L, Laos; 2, lineage 2; 3, lineage 3; 4, lineage 4; Ind, indica; TeJ, temperate japonica; TrJ, tropical japonica; Aro, aromatic. Shared small capitals indicate non-significant differences (pairwise comparisons after computing least-square means, with Tukey adjustment). All interactions between rice varieties and P. oryzae isolates were assessed in three independent experiments, and the median of the three symptom scores was used in calculations. Boxen plots were drawn using function boxenplot() with PYTHON package SEABORN 0.11.1. Starting with the median as centerline, each successive level (i.e., each successive box) outward contains half of the remaining data. Number of boxes was controlled using option k_depth = 'trustworthy'. Black horizontal lines represent the median. https://doi.org/10.1371/journal.ppat.1010687.g004

PLOS PATHOGENS
Maintenance of population structure in rice blast the mean number of non-effector genes (ANOVA, F = 7.95, p-value<0.001). Multiple comparisons revealed that mean numbers of putative effectors and non-effector genes were only significantly different between the genomes of lineage 2 and lineages 3 (Tukey's HSD test; p-value<0.001; Fig 5A and 5B and S14 Data).
To estimate the size of accessory and core genomes in the different lineages, we performed an orthology analysis that identified 14,573 groups of orthologous sequences (i.e. orthogroups; S17 Data), and we applied a rarefaction approach to the table of orthology relationships to estimate the size of accessory and core genomes in lineages while accounting for differences in sample size. For both putative effectors and the remaining of the gene space, we found that the number of core and accessory genes did not reach a plateau. Instead, they displayed an almost linear relationship with the number of genomes resampled. This indicates that the number of core genes was probably much smaller, and the number of accessory genes substantially higher than estimated from 123 genomes (S4 Fig). With pseudosamples of size n = 30 per lineage (i.e. excluding lineage 4, due to its small sample size), the gene content was highly variable within lineages with only 61-71% of all predicted effectors, and 68-73% of the remaining of the gene space being conserved in lineages 1-3 (S5 Fig). Despite extensive variation of the gene content within lineages, the clustering of isolates based on presence/absence variation for both putative effectors and the remaining of the gene space was highly similar to that based on 3,868 SNPs ( Fig 5D and 5E and S15 and S16 Data), indicating that variation in gene content and SNP allelic variation reflected similar genealogical processes.
For the identification of putative effectors potentially involved in the host specialization of rice-infecting lineages of P. oryzae, we first identified orthogroups with different patterns of presence/absence across lineages. Principal component analysis on presence/absence data identified 72 orthogroups of putative effectors accounting for 95% of the variance of the three principal components differentiating the four lineages (Fig 5C, 5D and 5E), and potentially contributing to the differences in host range between lineages.
Host specialization can also involve sequence divergence for effector proteins. We therefore also scanned the corresponding genes for signatures of higher rates of evolution, i.e. an excess of non-synonymous nucleotide diversity (ratio of non-synonymous to synonymous nucleotide diversity π N /π S >1), and signatures of divergent selection (i.e. high relative divergence, F ST ). We identified 185 orthogroups with π N /π S > 1 in at least one lineage, including 164 orthogroups with π N /π S > 1 in only one lineage (lineage 1: 131 orthogroups; lineage 2: 18; lineage 3: 10; lineage 4: 5), and twelve, seven and two orthogroups with π N /π S > 1 in two, three or four lineages, respectively (S4 Table). There was no significant difference in inter-lineage F ST between effector and non-effector genes (Mann-Whitney U-tests, p-value>0.05; S1 Text). We identified 291 orthogroups in the top 10% of inter-lineage of F ST in at least one pair of lineages, including 42 orthogroups with π N /π S > 1 in at least one lineage (S5 Table). None of the orthogroups with π N /π S > 1 or high inter-lineage F ST corresponded to effectors previously characterized as being involved in P. oryzae virulence on rice or other Poaceae hosts [61].

Geographic and climatic differentiation in the distribution of P. oryzae lineages
Our tests of adaptation to host and temperature under controlled conditions showed that specialization was not strict, but it remained possible that fitness differences between lineages would be sufficient under natural conditions to induce separation in different ecological niches, and/or that our experimental conditions did not capture the full extent of the phenotypic differences. Here, we tested the hypothesis that lineages thrive in geographically and/or environmentally different conditions, which would provide indirect evidence for specialization

PLOS PATHOGENS
Maintenance of population structure in rice blast to different habitats. We tested this hypothesis by collecting geographic and climatic data for all isolates, using the GPS position of the nearest city or the geographic center of their region of origin when the exact GPS position of isolates was not available (S1 Table). At a larger scale, clonal lineages 2 and 3 were both widespread, with lineage 2 found on all continents, and lineage 3 on all continents except Europe. Lineage 2 was the only lineage sampled in Europe (with the exception of a single isolate from lineage 1), whereas lineage 3 was more widespread in intertropical regions. Lineage 4 was mostly found in South Asia (India, Bangladesh, Nepal), and in the USA and Africa (Benin, Tanzania). The recombining lineage, lineage 1, was found on all continents, but mostly in Asia (79% of all isolates and 94% of all genotypes). At a smaller scale, two, or even three different lineages were sampled in the same year in the same village or city in 11 different countries from all continents. Such coexistence was observed for 24 of the 65 municipalities x year combinations for which multiple isolates were collected. A Kruskal-Wallis test showed that there was a statistically significant difference in altitudinal distribution between lineages (H = 163.9; p-value<0.0001; Fig 6). On average, lineage 1 was distributed at higher altitudes (mean: 908m), lineage 2 at lower altitudes (mean: 209m) and lineages 3 and 4 were intermediate (means: 521 and 669m, respectively). Mann-Whitney tests for multiple comparisons showed that the altitudinal distribution of lineages 1 and 2, lineages 1 and 3, and lineages 2 and 3 differed significantly at p-value < 0.05. Altitudinal distributions were relatively broad, with all lineages distributed at high altitudes (>1500m), and all lineages except lineage 4 distributed at low altitudes (<250m). Together, these analyses reveal that lineages had different, but partially overlapping latitudinal, longitudinal and altitudinal distributions, including some overlap with the distribution of the sexually reproducing lineage (lineage 1).
We then investigated whether differences in the geographic range of lineages were associated with differences in climatic variables. By plotting sampling locations onto a map of the major climate regions [62] we were able to identify clear differences in the climatic distributions of lineages 2 and 3, with lineage 2 mostly found in warm temperate climates and lineage 3 found in equatorial climates (Fig 7A and 7B and S18 Data and S4 Text). We used the outlying mean index (OMI), which measures the distance between the mean habitat conditions used by a lineage and the mean habitat conditions used by the entire species, to test the hypothesis that different lineages are distributed in regions with different climates. Using information for 19 climatic variables (biomes) from the WorldClim bioclimatic variables database [63] for all sampling locations, we obtained statistically significant results, in a permutation test on OMI values, for lineages 2, 3 and 4 (permutation test: lineage 1: OMI = 2.14, p-value = 0.620; lineage 2: OMI = 6.54, p-value<0.001; lineage 3: OMI = 2.08, p-value<0.001; lineage 4: OMI = 13.73, p-value = 0.017). The OMI analysis, in which the first two axes accounted for 69% and 25% of the variability, respectively, revealed that lineage 2 was more frequent in regions with a high annual temperature range (biome 7) or a high degree of seasonality (biome 4); lineage 4 was associated with regions with high levels of seasonal precipitation (biomes 13, 16 and 18), and lineage 3 was more frequent in regions with high temperatures (biomes 1, 6, 10 and 11) and high levels of isothermality (biome 3), characteristic of tropical climes (Figs 7C, 7D and S6 and S18 Data and S5 Text).

Discussion
This study reports on the worldwide distribution of genetic and phenotypic diversity in P. oryzae, and of the eco-evolutionary factors underlying the maintenance of multiple divergent lineages in this widespread pathogen. We detected three clonal lineages and one recombining lineage with broad and largely overlapping distributions, corresponding to the four groups previously identified on the basis of geographically more restricted samplings (43,44,(47)(48)(49). Our study substantially extends these previous findings by revealing that the lack of recombination and recent admixture in widespread lineages persists with deeper sampling, and by providing evidence for the subdivision of the recombining lineage into one international and three Asian clusters. This deep sampling of the natural genetic variation also reveals that clonal lineages of rice blast are more frequent in areas differing in terms of the prevailing environmental conditions. We also provide additional experimental evidence that lineages are hostspecialized, and new evidence that breeding system isolation and early post-mating genetic incompatibilities act as strong barriers to gene flow between lineages.

Niche separation and barriers to gene flow between clonal lineages
Many fungal plant pathogens consist of multiple divergent lineages coexisting on the same crop species, over large areas [64]. This raises questions about the maintenance of different lineages in the face of potential gene flow. We show that female sterility and intrinsic genetic incompatibilities represented strong barriers to gene flow between the clonal lineages, potentially accounting for their maintenance over time, without fusion, despite the compatibility of mating types. These barriers to gene flow may also have contributed to the establishment of lineage specialization because reproductive isolation facilitates adaptation by limiting the flow of detrimental ancestral alleles in populations adapting to a new habitat.
To test for host specialization, we conducted pathogenicity experiments using a larger set of isolates [49,54], but also a larger set of rice varieties representing all five rice subgroups, in , keeping only isolates for which the sampling position was precisely known (i.e., for which the region, city or GPS position was documented). Background map is from OpenStreetMap (CC-BY-SA) and represents the updated Köppen-Geiger climate classification of main climates at world scale as described in (62). (C, D) Outlying Mean Index analysis, testing the separation of ecological niches among lineages, with (C) canonical weights of the 19 environmental variables included in the analysis, and (D) site coordinates (dots) and realized niches of lineages represented as ellipses. Variable bio13 co-localizes with bio16, and variables bio1 and bio3 co-localize with bio11. The 11 temperate variables included in the analysis are listed in the Materials and Methods section. PC1 and PC2 in panels C and D represent the first and second principal components, respectively, with percentage of variation between parentheses. https://doi.org/10.1371/journal.ppat.1010687.g007

PLOS PATHOGENS
Maintenance of population structure in rice blast order to assess whether certain features of adaptation to host may have been missed by previous studies based on more limited datasets. Our results were consistent with earlier findings in that the host range varied across lineages, but all host ranges overlapped, indicating that host specialization was not strict, or was not fully captured by our experiments. We therefore set out to examine the geographic and climatic distribution of lineages as additional barriers to gene flow, by relying on our dataset of unprecedented geographic density (152 sites in 51 countries, vs. 23 countries in ref. [47][48][49] or 55 sites in 15 countries in ref. [44]). We show that, despite widely overlapping large-scale distributions, the different lineages are essentially found in different regions, with different climatic characteristics, when their distributions are observed at a finer scale. Lineage 1 did not predominate in one particular type of climate but was mostly sampled in Southeast Asia, lineage 4 predominated in regions with high levels of seasonal precipitation (typically the Indian subcontinent), and lineages 2 and 3 had circumglobal distributions but predominated in temperate and tropical climates, respectively. The types of rice grown in these geo-climatic zones tend to be different, with in first approximation indica rice growing in the plains throughout the tropics and sub-tropics, tropical japonica in upland high-rainfall environments, temperate japonica in mid-latitude temperate zones, aromatic rice in the northwest of the Indian subcontinent, and aus in ecologically variable areas of South and Southeast Asia [65][66][67][68][69][70]. We cannot, therefore, rule out that host specialization contributes to the observed geoclimatic differentiation, but it remains unlikely that specialization is the main cause of this differentiation, because when observing at a finer scale the distribution ranges of rice subgroups largely overlap [65][66][67][68][69][70]. Disentangling host specialization from climate adaptation would require the combination of dense sampling and characterization of the genetic make-up of the varieties of origin of isolates.
Despite the finding of lineage separation in different geo-climatic regions, our experiments revealed no strong differences between lineages in terms of sporulation and mycelial growth on synthetic media at different temperatures. This suggests that, if adaptation to temperature occurs in this pathogen, it was not measured by our experiments, either because it involves traits other than sporulation and hyphal growth, or because the in vitro conditions were not suitable to demonstrate differences. These small differences in temperature optima may nonetheless, together with adaptation to the host, contribute to niche separation and thereby reduce gene flow between lineages, via pre-mating barriers, such as immigrant non-viability (i.e., lower rates of contact between potential mates due to the mortality of immigrants; [14,71]), and post-mating barriers, such as ecological hybrid non-viability (i.e., lower survival of illadapted hybrid offspring).

Link between disease spread and pathogen lineage divergence
There is a direct connection between disease emergence and speciation, because the emergence of new pathogens can be facilitated, caused by or associated with strong divergent selection in organisms mating within or on their hosts, and by major changes in demographics and geographic isolation [15,[72][73][74]. There is also a connection between disease emergence and mode of reproduction, because disease emergence should be facilitated by multiple asexual cycles, corresponding to multiple cycles of selection for local adaptation without migration introducing locally deleterious immigrant alleles and recombination breaking down locally advantageous allelic combinations [15]. However, theoretically, in the long term, asexual organisms should accumulate a greater load of deleterious mutations and be less able to fix advantageous mutations, resulting in higher rates of extinction and lower rates of speciation [75,76]. The pattern of population structure revealed in P. oryzae seems to be intermediate between observations for introduced fungi with fully clonal population structures-such as the

PLOS PATHOGENS
Maintenance of population structure in rice blast pathogens causing wheat brown rust [28], boxwood blight [77] or verticillium wilt [78,79]and introduced fungi with barriers maintaining lineages that are permeable and allow hybridization and introgression to occur, such as the pathogens causing Dothistroma needle blight [80], Dutch elm disease [11], and soybean anthracnose Colletotrichum truncatum [81]. The global structure of P. oryzae is more reminiscent of that of the pathogens causing chestnut blight [82,83], wheat yellow rust [84,85], and hop powdery mildew [86], for which clonal lineages coexist with recombining lineages, although the proximal causes for the loss of sexual reproduction may differ between these models and rice blast. Another marked difference from most of the abovementioned pathogens is that the clonal lineages of rice blast are relatively older, having diverged about a thousand years ago [47][48][49], which corresponds to several thousand asexual generations. Recombining lineage 1 may, at some point, become a source of adaptive introgression, reinvigorating the genomes of clonal lineages to reduce the mutational load, introducing beneficial variation, and leading to marked changes in population structure.
Thus, although our work shows that the spread of a pathogen over heterogeneous habitats and over divergent populations of a crop species can drive the emergence of reproductive barriers, we cannot predict whether these lineages will continue to exist in their current form for the foreseeable future.

Loss of female sterility as the proximal cause of shifts to clonality
The loss of sexual reproduction, caused by demographic events or selection against sex, is a frequent change in the life history traits of introduced fungal pathogens [8,24,25,87]. However, the underlying changes and driving factors remain largely unexplored. In P. oryzae, the release of constraints maintaining sex in the short term, such as the need for sexual structures for winter survival or long-distance dispersal, is probably not responsible for the loss of sex, because the likelihood of overwintering on seeds and the rate of human-mediated transportation would not be expected to differ between clonal lineages 2, 3 and 4 and recombining lineage 1. The protection of locally advantageous allelic combinations is a more plausible explanation for the loss of sex in lineages 2, 3 and 4, because, as shown here, these lineages are found in, and adapted to, different hosts and environmental conditions. This leaves open the question of the maintenance of sex in lineage 1 open, but it is possible that the diversity of cultivated and wild rice diversity of rice is higher in the geographical range of this lineage [45]. It is difficult to draw firm conclusions about the evolutionary causes for selection against sexual reproduction, but the finding of an internationally distributed cluster in lineage 1 sheds light on the possible sequence of proximal events underlying the emergence of clonal lineages in this pathogen. The International cluster of lineage 1 displayed genetic signatures of recombination and sexual reproduction (in the form of an excess of homoplastic variants or relatively balanced mating types), but the signal of recombination detected here may be purely historical, because we also found that the clonal fraction and the frequency of sterile females were very high in this cluster. The loss of female fertility may, therefore, be an early and major cause of the shift towards asexual reproduction. Some genotypes assigned to the international cluster of lineage 1 may, in the future, come to resemble other clonal lineages in analyses of genealogical relationships, and form a long branch stemming from the central lineage 1, particularly for genotypes already displaying clonal propagation, such as genotypes 2, 18, 58, 98 and 254. This model could also apply to other fungal pathogens with similar life history strategies.

PLOS PATHOGENS
Maintenance of population structure in rice blast of P. oryzae possess smaller effector repertoires [48] and that the clonal lineage associated with japonica rice (lineage 2) possesses more effectors, and, in particular, Avr-effectors, than other clonal lineages [48,58]. Our analyzes on a larger set of putative effectors (ca. 2000 in our study, vs. 13 Avr-effectors in [58] and 178 known and candidate effectors in [48]) do not completely confirm this trend as we do not find a significantly larger effector complement in lineage 1 compared to other lineages. Lineage 2 associated with temperate japonica has significantly more putative effectors than lineage 3 associated with indica. However, the number of noneffector genes is also greater in lineage 2 than in lineage 3, and thus it remains possible that the larger effector complement of lineage 2 is a mere consequence of a larger number of genes. We also found that patterns of presence / absence variation mirrored patterns of population subdivision based on SNP allelic variation, and, thus, that the differential sorting of both nucleotide polymorphism and gene content across lineages reflects similar genealogical processes. Our multivariate analyses identified 72 effectors making the greatest contribution to the differentiation of the four lineages in terms of presence / absence. It is possible that the frequency differences for these 72 effectors were due to chance events during bottlenecks at the onset of lineage formation, or, alternatively, that their differential loss was important for the initial adaptation of lineages to new rice populations or subgroups. Interestingly, two of these 72 effectors, AvrRmg8 (= OG0011611) and PWL3 (= OG0011928), are host range determinants in wheat-infecting P. oryzae because they trigger immunity in wheat varieties carrying the resistance genes Rmg8 [91] or PWT3 [38,92]. Nucleotide diversity at synonymous and nonsynonymous sites and measures of relative nucleotide divergence (F ST ) also identified effectors with signatures of diversifying or divergent selection, possibly mediated by coevolutionary interactions with host molecules. In the future, it will be interesting to determine the molecular targets of these effectors and to decipher the relationship between their polymorphism and their mode of action.

Lineage 1 is a threat to global rice production
Our results also indicate that lineage 1 may pose a major threat to rice production. Most of the effectors with signatures of diversifying selection were identified in lineage 1, highlighting the greater ability of this lineage to fix advantageous mutations rapidly. Eleven of the 12 most multivirulent isolates (lesion type > 2 on more than 40 varieties tested) belonged to lineage 1, and the Yule cluster, in particular, was highly pathogenic on indica varieties. The propagation of such highly pathogenic genotypes, belonging to a lineage that has both mating types and fertile females, should be monitored closely. This monitoring of lineage 1 is all the more critical because this lineage has an intermediate geoclimatic distribution that overlaps largely with the distributions of the other three lineages, and some of the attempted crosses between clonal lineage 2, 3 and 4 isolates and recombining lineage 1 isolates produced viable progeny. This finding confirms the possibility of gene flow into this lineage, as previously demonstrated on the basis of admixture mapping (48,49). It also raises the question of the threat posed by gene flow from the highly pathogenic lineage 1 to the other lineages.

Concluding remarks
Our study of genetic and phenotypic variation within and between clonal lineages, and within the recombinant lineage of P. oryzae suggests a scenario for the emergence of widespread clonal lineages. The loss of female fertility may be a potent driver of the emergence of asexually reproducing groups of clones. The reproductive isolation generated by the loss of sex and the accumulation of mutations due to the absence of sexual purging would facilitate the specialization of some of these clonal groups, leading to the extinction of the least efficient clonal groups,

PLOS PATHOGENS
Maintenance of population structure in rice blast and, finally, to the propagation of clonal lineages fixed for a single mating type. These results highlight that disease spread over heterogeneous habitats and over divergent populations of a crop species can lead to niche separation and barriers to gene flow between different lineages of a widely distributed fungal plant pathogen.

Biological material
We chose 886 P. oryzae isolates collected on Asian rice between 1954 and 2014 as representative of the global genetic diversity of the fungus. Isolates were selected on the basis of microsatellite data, to maximize the number of multilocus genotypes represented [44], or based on geographic origin in the absence of genotypic data, to maximize the number of countries represented in the dataset.
Seventy-two and seventy isolates were selected for experimental measurements of reproductive success and pathogenicity, respectively (S1 Table). This subset of isolates included 10 isolates from each of the three clonal lineages (lineages 2, 3 and 4), and 40 isolates (42 for mating experiments) from the various clusters within lineage 1 [Baoshan (9 isolates), International [10], Laos [11], and Yule (12 for mating experiments, 10 for pathogenicity experiments)]. A subset of 41 isolates was selected for growth and sporulation experiments at different temperatures (S1 Table) including 10 isolates from each of the three clonal lineages (lineages 2, 3 and 4), and 11 isolates from the Yule cluster of lineage 1. Isolate CH0999 was included in the pathogenicity tests, although not genotyped with the Infinium array. We assigned this isolate to the Yule cluster using a combination of microsatellite data [44] and Infinium SNPs (not shown).
Forty-five varieties were chosen as representative of the five main genetic subgroups of Asian rice [93]:

Genotyping
Pyricularia oryzae isolates were genotyped at 5,657 genomic positions with an Illumina Infinium beadchip microarray carrying SNPs identified in whole genome sequencing data for 29 rice-and barley-infecting P. oryzae isolates [49]. The set of 29 whole genome sequences corresponded to four rice-infecting isolates characterized in ref. [94] and 25 isolates sequenced using paired-end Illumina reads in reference [49]. Read mapping and SNP calling were carried out as in reference [49]. The SNPs were selected to be polymorphic and biallelic, and not to distinguish between the different lineages identified previously. Of the ca. 20,000 SNPs initially identified in 29 genomes, we retained 7,000 SNPs that were biallelic and whose flanking sequences (50 bp) were unique and monomorphic. We then randomly subsampled 5,692 SNPs, and 5,657 could be successfully characterized in our set of 886 isolates. The final dataset included 3,686 biallelic SNPs without missing data.

Whole genome sequencing
We used a set of 123 whole-genome sequences to investigate differences in effector content between clusters and lineages, including 94 publicly available genomes and 29 new genomes that we selected to increase sample size for the International and Laos clusters of lineage 1, and for lineage 4 (S3 Table). The 29 isolates were sequenced using Illumina HiSeq 3000. Isolates were grown on rice flour-agar medium for mycelium regeneration, then in liquid rice flour medium [95]. Genomic DNA extraction was carried out with >100 mg of fresh mycelium from liquid culture. Fresh mycelium dried on Miracloth paper was crushed in liquid nitrogen. Nucleic acids were subsequently extracted using an extraction buffer (2% CTAB-1.4 M NaCl-0.1 M Tris-HCl pH 8-20 mM EDTA pH 8 added before use with a final concentration of 1% Na 2 SO 3 ), then purified with a chloroform:isoamyl alcohol (24:1) treatment, precipitated overnight in isopropanol, and rinsed with 70% ethanol. The extracted nucleic acids were further treated with RNase A (0.2 mg/mL final concentration) and purified with another chloroform: isoamyl alcohol (24:1) treatment followed by overnight precipitation in ethanol. The concentration of extracted genomic DNA was assessed on Qubit using the dsDNA HS Assay Kit. The purity of the extracted DNA was checked by verifying that the 260/280 and 260/230 absorbance ratios measured with a NanoDrop spectrophotometer were between 1.8 and 2.0. Preparation of TruSeq nano library preparation and HiSeq3000 sequencing (150 nucleotide reads, and 500 bp insert size) were performed at GeT-PlaGe (INRAE, Toulouse, France).

Genome assembly, gene prediction, orthology analysis, effector prediction, and summary statistics of nucleotide variation
For the 123 sequenced isolates included in the dataset, low-quality reads were removed with CUTADAPT software [96]. Reads were assembled with ABYSS 2.2.3 [97,98], using different Kmer sizes. For each isolate, we chose the assembled sequence with the highest N50 for further analyses. Genes were predicted with BRAKER 2.1.5 [99,100] using RNAseq data [60] as extrinsic evidence for model refinement. Genes were also predicted with AUGUSTUS 3.4.0 [101] (training set = Magnaporthe grisea) and gene models that did not overlap with the gene models identified with BRAKER were added to the GFF file generated with the latter. Repeated regions were masked with REPEATMASKER 4.1.0 (http://www.repeatmasker.org/). The completeness of genome assemblies and gene predictions was assessed with BUSCO [102], and completeness in BUSCO genes was greater than 93.7% (S3 Table). Putative effector genes were identified as genes encoding proteins predicted to be secreted by at least two of three methods [SIGNALP 4.1 [103], TARGETP [104] and PHOBIUS [105]], with no predicted transmembrane domain based on TMHMM analysis [106], no predicted motif of retention in the endoplasmic reticulum based on PS-SCAN [107], and no CAZy annotation based on DBSCAN V7 [108]. Differences in the numbers of putative effectors and non-effectors between lineages (ANOVA and Tukey's HSD test) were assessed with the SCIPY 1.6.0 package in PYTHON 3.6. Homology relationships between predicted genes were established with ORTHOFINDER v2.4.0 [109]. Principal component analysis of presence/absence variation for putative effectors and non-effector proteins was performed with the R package PRCOMP. We used the get_pca_var() in PRCOMP to extract the results for variables (i.e. effectors) and to identify the effectors making the largest contributions to the principal components, defined as effectors with a contribution to loadings of PC1, PC2 and PC6 greater than 1%. For estimates of the numbers of core and accessory genes, we used a rarefaction approach to account for differences in sample size across lineages. For each pseudo-sample size, we estimated the size of the core and accessory genome in each lineage using a maximum of 2000 pseudosamples. Core and accessory genome sizes were compared using a Kruskal-Wallis test and posthoc Mann-Whitney U-tests as implemented in packages SCIKIT_-POSTHOCS 0.6.6 and SCIPY 1.8.0 in PYTHON 3.6. Sequences for each orthogroup were aligned and cleaned with TRANSLATORX [110], using default parameters. The ratio of non-synonymous to synonymous diversity π N /π S and Weir and Cockerham's F ST [111] were calculated for each orthogroup with EGGLIB 3 (https://www.egglib.org), with minimal sample size of at least four per lineage, excluding sites with more than 30% missing data, and excluding alignments shorter than 100 bp after removal of sites with excess missing data. Distributions of F ST in effector and non-effector genes were compared using Mann-Whitney U-tests as implemented in package SCIPY 1.8.0 in PYTHON 3.6.
All raw sequencing data are deposited under accession codes PRJEB42377 and PRJEB46618. Published data sets with accession codes PRJNA417903, PRJEB41186 and PRJNA354675 were used. Single-nucleotide polymorphisms, genome assemblies, genome annotations, and gene predictions were deposited in Zenodo (doi: 10.5281/zenodo.4561581).

Population subdivision and recombination
The dataset used for population genetic analysis had no missing data. Clones were therefore identified as isolates with identical genotypes at 3,686 sites. Among the 886 P. oryzae isolates genotyped, we identified 264 different multilocus genotypes (i.e., "clones"), which were used for the analysis of population subdivision. We used the SNMF program to infer individual ancestry coefficients in K ancestral populations. This program is optimized for the analysis of large datasets and does not assume Hardy-Weinberg equilibrium. It is, therefore, more appropriate to deal with inbred or clonal lineages [50]. We used SPLITSTREE version 4 [112] to visualize relationships between genotypes in a phylogenetic network, with reticulations to represent the conflicting phylogenetic signals caused by homoplasy. Weir and Cockerham's F ST [111] was calculated using the same method as whole genome sequencing data.
Analyses of recombination were conducted on a clone-corrected dataset (i.e. keeping only one representative of each the genotypes repeated multiple times). We tested the null hypothesis of clonality using the pairwise homoplasy index (PHI; [51]), maximum χ 2 (Max χ 2 , [52]) and neighbor similarity score (NSS; [53]), as implemented in the PHIPACK program (https:// www.maths.otago.ac.nz/~dbryant/software.html). Homoplastic sites have sequence identities that are not inherited from a common ancestor, but instead derived from independent events in different branches, such as recurrent mutations, sequencing errors or recombination. Homoplastic sites were identified by concatenating genotypes at all sites, inferring a genealogy with the GTR+GAMMA model in RAXML v.8.2.12 [113], and mapping mutations onto the genealogy with the 'Trace All Characters' function of MESQUITE under maximum parsimony [114]. In the resulting matrix of ancestral states for all nodes, the number of mutations occurring at each site was counted and sites displaying multiple mutations across the genealogy were considered to be homoplastic. We used POPLDDECAY [115] to measure linkage disequilibrium (r 2 ) as a function of distance between pairs of SNPs. POPLDDECAY was configured with a maximum distance between SNPs of 400 kbp, and a minor allele frequency of 0.005.

Experimental measurement of reproductive success and female fertility
Pyricularia oryzae is a heterothallic fungus with two mating types (Mat1. 1 and Mat1.2). Sexual reproduction between strains of opposite mating type can be observed in laboratory conditions, and results in the production of ascospores within female sexual structures called perithecia [116]. Crosses were carried out on a rice flour agar medium (20 g rice flour, 2.5 g yeast extract, 15 g agar and 1 L water, supplemented with 500 000 IU penicillin G after autoclaving for 20 min at 120˚C), as described by [116]. We assessed reproductive success by determining the production of perithecia produced after three weeks of culture at 20˚C under continuous light. Each cross was performed twice and we determined the mean number of perithecia across repeats. We further assessed the presence of asci and germinating ascospores in perithecia for a subset of crosses, by excising perithecia with a scalpel and counting the number of germinated filaments for each individual ascus after overnight incubation on water agar. The
We measured female fertility for 210 isolates (listed in S1 Table), by monitoring perithecium production in crosses involving the tester strains CH0997 (Mat1.2) and CH0999 (Mat1.1). On synthetic media, perithecia are formed at the zone of contact between parental mycelia. Isolates forming perithecia are described as female-fertile. Perithecia can be formed by the two interacting partners or by one partner only. If the two partners are female-fertile, two parallel lines of perithecia are observed at the contact zone between mycelia. If only one partner is female-fertile, only one line of perithecia is observed, at the edge of the mycelium of the fertile isolate.

Pathogenicity tests
Compatibility between P. oryzae isolates and rice plants from the five main genetic subgroups of rice (indica, temperate japonica, tropical japonica, aus and aromatic) was measured in controlled conditions. We inoculated on 45 varieties (see section Biological Material), with 70 isolates. Inoculations were performed as described by [54]. Conidial suspensions (25 000 conidia. mL -1 ) in 0.5% gelatin were sprayed onto three-week-old rice seedlings (> 6 plants/variety). The inoculated plants were incubated for 16 hours in the dark at 27˚C and 100% humidity and then for seven days with a day/night alternation (13 hours at 27˚C/11 hours at 21˚C), before the scoring of symptoms. Lesion type was rated from 1 to 6 and the symptom type was assessed visually on leaves. Each interaction was assessed in three independent experiments, which represented 9,450 scored interactions in total. Symptoms scores are ordinal-scale variables (i.e. variables that can be ranked but are not evenly spaced). We therefore analyzed the data with a proportional-odds model with the clm() function implemented in ORDINAL version 2018.8-25 in R, including rice type and pathogen lineage as main effects, and the rice type-pathogen lineage interaction. A comprehensive mixed model including all experimental replicates, with isolate as a random effect, was also tested, but did not display convergence. We therefore used the median of the three scores in calculations, as this is what most accurately represented the biological and experimental variation in the ability of an isolate to exploit a given host genotype. We used an analysis of deviance to evaluate whether effect and interaction terms were statistically significant. Pairwise comparisons of significant factors were performed after computing least-squares means with LSMEANS version 2.30-0 in R, with Tukey adjustment. The R commands and their outputs are provided in S8 Text.

Mycelial growth and sporulation at different temperatures
Mycelial growth and numbers of spores were measured for 41 isolates at five different temperatures (10˚C, 15˚C, 20˚C, 25˚C and 30˚C). For each isolate, Petri dishes containing PDA medium were inoculated with mycelial plugs placed at the center, and incubated in a growth chamber at a fixed temperature. All isolates were cultured together at each temperature. Mycelium diameter was estimated as the mean of two measurements made along two perpendicular axes at different time points. At the end of the experiment, conidia were collected by adding 5 mL of water supplemented with 0.01% Tween 20 to the Petri dish and rubbing the surface of the mycelium. Conidia were counted with a hemocytometer. Three or four independent experiments were performed for each isolate, at each temperature. We had initially planned to carry out only three independent experiments. However, a fourth experiment was eventually performed for some temperature conditions because some isolates did not grow in the first experiment, and because some cultures were invaded by mites and had to be discarded in the second and third experiments. At 30˚C, the number of spores was measured after 7 days only, versus 10 days at other temperatures, because by this stage, the mycelium had already reached the edges of the plates.
Mycelium growth was analyzed independently for each temperature with linear mixedeffects models. The variable used in statistical analysis was the square root of the mean mycelial diameter measured along two perpendicular axes on a Petri plate. Post-hoc pairwise comparisons were performed after computing least-squares means with LSMEANS version 2.30-0 in R, with Tukey adjustment. The R commands and their outputs are provided in S6 Text. Data are reported in S11 Data.
Sporulation data were analyzed with a Kruskal-Wallis test with RSTATIX version 0.7.0 in R. The variable used in statistical analysis was the median number of spores weighted by mycelium size, calculated across replicates. Post-hoc pairwise comparisons were performed with Dunn's non-parametric multiple comparison test. The R commands and their outputs are provided in S7 Text. Data are reported in S12 Data.

Statistical analyses of climatic data
The outlying mean index (OMI), or marginality, is used to study niche separation and niche breadth. The OMI approach evaluates whether a particular lineage is mostly associated with some climatic conditions or can be sampled with the same probability throughout the range of the species. The approach is based on a multivariate analysis (here a PCA) of climatic data associated with the locations included in the dataset, regardless of the identity of the lineages sampled or the number of isolates sampled. An index of marginality or specialization (the outlying mean index) is then calculated for each lineage represented in the dataset. This index measures the distance between the average habitat of a given lineage and the average habitat conditions of the area studied (corresponding to the distribution of a hypothetical species uniformly distributed under all conditions). Lineages are then placed in environmental conditions that maximize their OMIs, and permutation tests are used to test whether the distribution of a lineage differs from random. OMI gives the same weight to all sampling locations, whether rich or poor in individuals or lineages, and is particularly suitable in cases in which sampling is not homogeneous. Environmental values, consisting of 19 biome values (WorldClim bioclimatic variables [63]), were retrieved for all sampling locations. The 11 temperature variables included in the analysis were (bio1) annual mean temperature, (bio2) mean diurnal range, (bio3) isothermality [100 � (bio2/bio7)], (bio4) temperature seasonality [standard deviation � 100], (bio5) maximum temperature of the warmest month, (bio6) minimum temperature of the coldest month, (bio7) annual temperature range [bio5-bio6], (bio8) mean temperature of the wettest quarter, (bio9) mean temperature of the driest quarter, (bio10) mean temperature of the warmest quarter, (bio11) mean temperature of the coldest quarter. The eight precipitation variables included in the analysis were (bio12) annual precipitation, (bio13) precipitation of the wettest month, (bio14) precipitation of the driest month, (bio15) precipitation seasonality (coefficient of variation), (bio16) precipitation of the wettest quarter, (bio17) precipitation of the driest quarter, (bio18) precipitation of the warmest quarter, (bio19) precipitation of the coldest quarter. The 11 climatic variables were normalized before analysis. A contingency table was generated to associate the number of isolates from each lineage with each sampling location. Only isolates from our dataset with a precisely known geographic origin (region, city or GPS position) were included in the analysis. A random permutation test with 10,000 permutations was used to assess the statistical significance of marginality for each lineage. Altitudinal data were analyzed using a Kruskal-Wallis test and posthoc Mann-Whitney Utests as implemented in packages SCIKIT_POSTHOCS 0.6.6 and SCIPY 1.8.0 in PYTHON 3.6.

PLOS PATHOGENS
Maintenance of population structure in rice blast Supporting information S1 Data. Multilocus genotypes for neighbor-net phylogenetic inference, principal component analysis and recombination analyses. S6 Data. RAxML genealogy for homoplasy analysis, estimated from S1 Data. Used to generate Table 1