Genomic surveillance uncovers a pandemic clonal lineage of the wheat blast fungus

Wheat, one of the most important food crops, is threatened by a blast disease pandemic. Here, we show that a clonal lineage of the wheat blast fungus recently spread to Asia and Africa following two independent introductions from South America. Through a combination of genome analyses and laboratory experiments, we show that the decade-old blast pandemic lineage can be controlled by the Rmg8 disease resistance gene and is sensitive to strobilurin fungicides. However, we also highlight the potential of the pandemic clone to evolve fungicide-insensitive variants and sexually recombine with African lineages. This underscores the urgent need for genomic surveillance to track and mitigate the spread of wheat blast outside of South America and to guide preemptive wheat breeding for blast resistance.

In light of the reviews, which you will find at the end of this email, we are pleased to offer you the opportunity to address the comments from the reviewers in a revision that we anticipate should not take you very long. We will then assess your revised manuscript and your response to the reviewers' comments with our Academic Editor aiming to avoid further rounds of peer-review, although might need to consult with the reviewers, depending on the nature of the revisions.
IMPORTANT: Please address the following: a) Please could you change the title to something a bit more declarative? We suggest something like: "Genomic surveillance identifies a pandemic clonal lineage of the wheat blast fungus" We have changed the title following your suggestion: "Genomic surveillance uncovers a pandemic clonal lineage of the wheat blast fungus". b) Please attend to the requests from the reviewers.
Below, you can find a point-by-point reply to the reviewers. c) Please ensure that you comply with our Data Policy requests; specifically, we need you to supply the numerical values underlying Figs 1ABC, 2ABC, 3C, 4A, S1, S2, S3, S4, S5, S6, S7AB, S8AB, S9, S10, S11, S12CD, S14 (some of these will be treefiles, I guess, rather than numbers), either as a supplementary data file or as a permanent DOI'd deposition. I note that some of these data may be in your GitHub deposition (https://github.com/Burbano-Lab/wheat-clonal-linage); if so, please clarify, and supply a DOI'd version (e.g. in Zenodo, Figshare, etc.) We You should also cite any additional relevant literature that has been published since the original submission and mention any additional citations in your response.
2. In addition to a clean copy of the manuscript, please also upload a 'track-changes' version of your manuscript that specifies the edits made. This should be uploaded as a "Revised Article with Changes Highlighted " file type. (2016) using the following criteria: (i) they had to be polymorphic among wheat blast strains alone to make the markers useful for diagnostics; (ii) the minor allele was >30%; (iii) SNPs were located on long exonic sequences (> 1500 bp without interrupting intron); (iv) long exons to contain only 2-4 SNPs. The last two criteria were to make sure that the assay will focus on SNPs surrounded by well-conserved stretches among wheat blast strains with an aim to reduce amplification failures due to polymorphism in the primer binding sites. The above criteria reduced the available genomic regions to 102 loci. We designed 102 PCR primer pairs to amplify~200 bp amplicon for each gene containing 100 bp flanking regions on each side of the SNP locus for multiplex amplicon sequencing. Trial sequencing runs with these primer pairs resulted in 84 primer pairs that consistently produced amplicons that can be sequenced to identify the correct nucleotides in those SNP loci in Bangladeshi wheat blast isolates (  We included a sentence in the main text highlighting these new results and the corresponding section in the supplementary materials: "We genotyped 537 M. oryzae samples from different geographical regions and hosts based on multiplex amplicon sequencing (MonsterPlex; see material and methods) (N=237) and publicly available genomes (N=351) (Fig. 1, Fig. S1, Table S1). Using the set of isolates from which we genotyped the 84 SNPs and also sequence their whole genomes, we showed that the set of 84 Monsterplex SNPs accurately reflects the patterns of genome-wide diversity and host specificity of the blast fungus. (Fig. SX)" "To show that the set of 84 SNPs are informative, we compared the genetic (Hamming) distances between each pair of blast isolates using the set of 84 SNPs, and the genome-wide SNPs. Our analysis revealed a correlation coefficient of 0.82 between the two sets of pairwise distances ( Figure S2C). To show that this correlation is robust and much higher than expected by chance, we repeated the calculation of pairwise distances for both datasets ( There are two steps in our analysis, in which we report missing/filtered positions. In both cases, we provided files where the missing/filtered positions can be identified. In the section of "Processing of short reads and variant calling", the missing/filtered positions can be identified in the Variant Call Format (VCF) files, whereas in the section "Phylogenetic analyses, estimation of evolutionary rates and divergence times", they are reported in the XML files that were the input for the BEAST phylogenetic analysis. All the above-mentioned files are available through our Github deposition (https://doi.org/10.5281/zenodo.7590238).
Finally, regarding the temporal analysis using BEAST: whilst HKY is less complex, this might not necessarily be the best fit for your data. A better approach would be to use bModelTest (under different combinations of demographic and clock scenarios) to assess which is the best model to use. To illustrate this point we repeated the BEAST analysis using GTR, a more complex substitution model, in which all exchangeability parameters (every single transition and transversion) and base frequency parameters need to be estimated. We retrieve the same tree topology and extremely similar evolutionary rates (HYK 95CI: 6.21e-7 -7.58e-7; GTR 95CI: 6.23e-7 -7.54e-7).
However, the effective sample sizes (ESS) of the evolutionary rate were lower using the GTR than in our HKY original analysis, 168 and 509, respectively. We have included this new analysis in the material and methods.
Demographic scenarios: as explained in the manuscript, to avoid making and testing multiple demographic history assumptions, we selected an Extended Bayesian Skyline approach (ref. 56). Our approach reduces the demographic assumptions and the computing resources needed for the analysis.
Molecular clock: we now include a new analysis, in which we ran BEAST2 analysis using the HKY substitution model but this time with a random local clock model, which considers whether each branch in the tree needs its own branch rate. We obtained evolutionary rates extremely similar to our original estimates (Strict clock 95CI: 6.21e-7 -7.58e-7; Random local clock: 95CI: 6.22e-7 -7.56e-7). We have included this new analysis in the manuscript.
All in all, our estimated evolutionary rates are robust to the choice of both substitution and clock models and make no assumptions regarding the demographic scenario.
We modified the main text: "We removed regions that disrupt the clonal pattern of inheritance ( Fig. S6-S78) [13] and tested for a correlation between genetic distances and collection years ( Fig. S9). We obtained rates ranging from 2.74e-7 to 7.59e-7 substitutions/site/year (Table S3), which were robust to the choice of both substitution and clock models (Table S4).
We modified the material and methods and include two additional supplementary tables: "To test the robustness of our evolutionary rate estimation to changes in substitution and clock models, we repeated the analysis using GTR in combination with a strict clock model, and HYK in combination with a random local clock model." The dates of emergence need confidence intervals (in the Results section), and the rates seem very small -how do these compare to other rates in fungi?
The confidence intervals for the emergence times and evolutionary rates can be found in the results section: "...we dated the emergence of the Asian and African sub-lineage to similar periods (2009-2012 and 2010-2015, respectively As we have shown in this response, our new BEAST analysis using GTR for the substitution rate shows the robustness of our estimated evolutionary rates. Thus, the combination of four independent MCMC chains in our analysis proves to be sufficient to reach convergence with high ESS values (> 200) for all the estimated parameters, therefore we see no need to increase the MCMC chain length. Additionally, in the manuscript, we used a BEAST-independent approach that estimated very similar evolutionary rates.
All in all -this is a very good piece of research and must represent a huge effort from all involved, and I'd be happy to see this published.
We thank the reviewer again for her comments and suggestions. We thank Reviewer 2 for his positive assessment of our manuscript.
We have now included the suggested reference in the introduction of our resistance, and mating ability. My only concern is about the claim of independence for the two introductions and my concern can be addressed by some straightforward additional analyses involving the current dataset and some additional discussion. I have put my comments below quotes from the manuscript, which are preceded by @.
We thank Reviewer 3 for their positive assessment of our manuscript and their suggestions.
@Wheat, the most important food crop, Some more information is needed. When I Google crops, rice comes up as the number one food crop.
To avoid a discussion about the criteria used to rank staple crops according to their importance (e.g. total cultivated area, human calorie intake, etc.), we have modified the sentence in the abstract: "Wheat, one of the most important food crops, is threatened by a blast disease pandemic."

@ following two independent introductions from South America
It is clear that there were two introductions, one to Bangladesh and one to Zambia, but it is not clear that they were independent. I see these possibilities. have enough data to distinguish among scenarios numbers 1 -3 by using trees constrained to reflect the scenarios and using likelihood ratio tests to see if your data are significantly more likely for one of them. My guess is that you won't be able to exclude both numbers 2 and 3. In which case, it would be best to consider the various scenarios and comment on how other data (historical records, etc.) support or refute each scenario.
The phylogenetic reconstruction we presented in the manuscript showed that Zambian (ZM) and Bangladeshi (BD) isolates are reciprocally monophyletic (with 100% bootstrap support and posterior probability of 1, for the Maximum Likelihood and the Bayesian reconstructions, respectively). This means that the two following statements are true: ZM isolates have a shared common ancestor that is more recent than the most recent common ancestor of any of the ZM isolates with any BD isolate; BD isolates have a shared common ancestor that is more recent than the most recent common ancestor of any of the BD isolates with any ZM isolate. Therefore, we favored scenario 1 in the manuscript: two independent introductions to Zambia and Bangladesh from South America. We thank the reviewer for pointing out scenario 4, which postulates that the introductions could have come from other unsampled location(s). We consider scenario 4 possible but unlikely. However, as the reviewer pointed out, scenario 4 is impossible to disprove. Therefore we have included a sentence in the main text acknowledging the possibility of scenario 4 (see quoted text below).
We think that the reciprocal monophyly of ZM and BD rules out by definition scenarios 2 and 3, introductions from Zambia to Bangladesh or from Bangladesh to Zambia, respectively. However, we have independently tested these two scenarios (2 and 3) against scenario 1 using constrained trees in IQ-TREE, as suggested by the reviewer. As expected, we found that the likelihood ratio test strongly favors scenario 1 over scenarios 2 and 3. We consider that this analysis is not relevant enough to be included in the manuscript.
Main text modification recognizing the possibility of scenario 4: "We conclude that the clonal lineage has spread to Asia and Africa through at least two independent introductions, most probably from South America, although we cannot totally rule out that the source population was located in an unsampled location outside of South America." @cause total crop failure (5) Are there data on the effect of wheat blast on the production of wheat in Bangladesh or Zambia? Reference 5 is about the synchrony of crop loss and not about wheat blast losses.
We have modified the statement about total crop failure and included a new sentence reporting the average yield loss caused by the wheat blast outbreak in, Bangladesh in 2016: "In wheat, yield losses caused by pests and diseases average over 20% (4).
Wheat is currently threatened by the expanding blast pandemic caused by the ascomycete fungus Magnaporthe oryzae (Syn. Pyricularia oryzae), a formidable and persistent menace to major grain cereals that could contribute to total crop failure (5)." "The disease first appeared in 1985 in Brazil but has been reported in Bangladesh and Zambia over the last years, causing, for instance, an average yield loss of 51% in the Bangladesh outbreak in 2016." @twin challenge of climate change and armed conflicts in major agricultural regions.
Is there a reference for climate change or armed conflict and wheat blast? Or, one about the two problems and agricultural production in general?
We have included a reference that predicts how climate change will impact future yield gains by increasing disease risk (Chaloner et al The authors cite reference 11 later in their ms regarding recombined regions of the wheat blast genome and this bioRxiv article also addresses the origin of the Zambian isolates. If there were any way to publish both this ms and that of ref.
11 back-to-back, science would be better served. Note that Latorre et al. goes well beyond reference 11 in testing virulence and drug resistance.
We do not see any conflict since we consider a bioRxiv preprint as a full See above comment about the independence of the introductions. It is also not clear to me that the introductions need to be single. Couldn't several closely related but genetically different strains have been involved in the introductions?
We addressed this point above and have modified the sentence in the main text accordingly. Notice the "at least" in the sentence, that includes now the possibility of multiple introductions from the same clonal lineage.
"We conclude that the clonal lineage has spread to Asia and Africa through at least two independent introductions, most probably from South America, although we cannot totally rule out that the source population was located in an unsampled location outside South America." @The B71 lineage shows reduced genetic diversity in comparison with South American isolates although incipient sub-structuring can be noted between Zambian and Bangladeshi clusters ( Fig. 2A, inset). Indeed, genetic variation in the Zambian clade might be due to structural rearrangements. Accordingly, we explained in the material and methods that "...the SNPs marked as putatively affected by recombination are preferentially located in genomic regions affected by structural variants, e.g.
presence/absence variants. Such variants will generate phylogenetic discordances due to differential reference bias among the B71 isolates." All the calculations of evolutionary rates and divergence times were carried out after removal of the non-clonal variation (e.g. the mini-Chromosome), as it can be seen if Fig. S8. This is explicitly stated multiple times in the Main Text and Material and Methods: Main text: "We leveraged the collection dates of M. oryzae clonal isolates to estimate their evolutionary rate. Before performing the tip-calibration analyses (12), we removed regions that disrupt the clonal pattern of inheritance ( Fig. S6-S8) (13) and tested for a correlation between genetic distances and collection years ( Fig. S9)."

Material and Methods:
We stated multiple times that we used only the recombination corrected tree.
"To test for the existence of a phylogenetic temporal signal (i.e. a positive correlation between sampling dates and genetic divergence) we used the recombination-corrected tree generated by ClonalFrameML (Fig. 2C)" "As input for BactDating, we used the recombination-corrected tree generated by ClonalFrameML." "From the alignment of the concatenated SNPs, we masked those that ClonalFramML marked as putatively recombining and used the masked alignment as input for the BEAST2 analyzes." @The B71 cluster is a clonal lineage.
This claim is solid.
We are glad to see that the reviewer agreed with the conclusion of our analysis.
@The B71 clonal lineage has recently expanded with independent introductions in Zambia and Bangladesh.
See comments above.
We addressed this point above.
@These findings are consistent with the conclusions of a recent independent study (11). The tree presented by Liu et al, which the reviewer considers as a good phylogenetic argument, shows exactly the same topology as we present in Figure 2C. For this same reason we favored the hypothesis of two independent introductions to Zambia and Bangladesh from South America.
We addressed above the comment regarding the other possible scenarios suggested by the reviewer and the likelihood ratio tests. @we removed regions that disrupt the clonal pattern of inheritance (Fig.

S6-S8)
More explanation is needed here. Having just told the reader that the spread is clonal, you need to let the reader know how there can be non-clonal elements in the genome. Figure S7 needs to be described in more detail. This figure also provides an argument in favor of independent introduction because the Zambian isolates and the Bolivian isolate seem to share a mini-chromosome that is absent in the Brazilian and Bangladeshian isolates. This point is also made in the Liu et al. bioRxiv article.
The full explanation is available in the Material and Methods: "Since the LD decay analyses revealed that the B71 pandemic lineage is a non-recombining clonal lineage, we hypothesized that the SNPs marked as putatively affected by recombination are preferentially located in genomic regions affected by structural variants, e.g. presence/absence variants. Such variants will generate phylogenetic discordances due to differential reference bias among the B71 isolates. To test this hypothesis we created full-genome alignments of the B71 and the 70-15 reference genomes using Minimap2 (50) and visualized the output with AliTV (51). Then, we overlapped the visual output with the SNPs putatively affected by recombination that were previously identified by ClonalFrameML (Fig. S9)." We have modified the sentence in the main text to convey the same message presented in the material and methods: "Before performing the tip-calibration analyses (12), we removed regions that disrupt the clonal pattern of inheritance ( Fig. S6-S7) (13). Such regions were preferentially located in genomic regions affected by structural variants (Fig.   S8), which results in phylogenetic discordances, even in clonal lineages, due to differential reference bias between B71 isolates and the reference genome." @We scanned the available genomes and found that AVR-Rmg8 is conserved in all 36 isolates of the B71 clonal lineage even though the other 35 isolates of the Triticum lineage carry four diverse AVR-Rmg8 virulent alleles (eII, eII', eII'', eII''') that fully or partially evade immunity ( Fig.   3A; Fig. S11) (16). B71 lineage isolates also lack the PWT4 effector, which is known to suppress AVR-Rmg8-elicited resistance (17).
This section of the ms is solid.
We are glad to see that the reviewer agreed with the conclusion of our analysis.
@These genome analyses predict that the B71 lineage isolates (AVR-Rmg8 positive, PWT4 negative) cannot infect wheat plants with the matching disease resistance gene Rmg8. To test this, we inoculated 14 B71 lineage isolates from Zambia and Bangladesh on wheat lines with and without the Rmg8 resistance gene ( Fig. 3B; Fig. S12). Unlike a distinct South American isolate, none of these pandemic isolates could infect Rmg8 wheat plants.
This section of the ms is solid.
We are glad to see that the reviewer agreed with the conclusion of our analysis.
@Remarkably, all but one Brazilian isolate (12.1.181) of the 36 B71 lineage genomes carry the G1243C allele and are predicted to be strobilurin sensitive.
We tested this by assaying B71 lineage isolates and found that all tested 30 isolates are strobilurin sensitive ( Fig. 4B-C; Fig.   S13).
From the legend to Figure 4b-c, it seems that only the Zambian isolates were tested for antifungal resistance.
We thank the reviewer for identifying a typo in the figure legend of Figure 4B-C.
The legend should have stated that all Zambian and Bangladeshi isolated were found to be "strobilurin susceptible". See amended figure legend below: "...indicating that all the Zambian and Bangladeshi isolates have the 'strobilurin susceptible' genotype as anticipated by their CYTB sequences." If strain 12.1.181 has the genome of a strobilurin resistant strain, and it is basal to the Bangladesh clade, are all Bangladesh strains resistant?
No. All the Bangladeshi strains we tested are strobilurin susceptible. We have changed the legend of Figure  All data appear to be available to the scientific community.
We are glad to see that the reviewer acknowledges our commitment to open and reproducible science.