Comparative Analysis of the Genomes of Two Field Isolates of the Rice Blast Fungus Magnaporthe oryzae

Rice blast caused by Magnaporthe oryzae is one of the most destructive diseases of rice worldwide. The fungal pathogen is notorious for its ability to overcome host resistance. To better understand its genetic variation in nature, we sequenced the genomes of two field isolates, Y34 and P131. In comparison with the previously sequenced laboratory strain 70-15, both field isolates had a similar genome size but slightly more genes. Sequences from the field isolates were used to improve genome assembly and gene prediction of 70-15. Although the overall genome structure is similar, a number of gene families that are likely involved in plant-fungal interactions are expanded in the field isolates. Genome-wide analysis on asynonymous to synonymous nucleotide substitution rates revealed that many infection-related genes underwent diversifying selection. The field isolates also have hundreds of isolate-specific genes and a number of isolate-specific gene duplication events. Functional characterization of randomly selected isolate-specific genes revealed that they play diverse roles, some of which affect virulence. Furthermore, each genome contains thousands of loci of transposon-like elements, but less than 30% of them are conserved among different isolates, suggesting active transposition events in M. oryzae. A total of approximately 200 genes were disrupted in these three strains by transposable elements. Interestingly, transposon-like elements tend to be associated with isolate-specific or duplicated sequences. Overall, our results indicate that gain or loss of unique genes, DNA duplication, gene family expansion, and frequent translocation of transposon-like elements are important factors in genome variation of the rice blast fungus.


Introduction
Rice blast caused by the heterothallic ascomycete Magnaporthe oryzae (also known as Pyricularia oryzae) is one of the most destructive diseases of rice, which is a staple for over half of the world's population. This pathogen also infects wheat and other small grains, and poses major threats to global food security [1,2]. In the past two decades, rice blast has been developed as a model system to study fungal-plant interactions. M. oryzae was the first plant pathogenic fungus to have its genome sequenced and made available to the public [3].
In most parts of the world, rice blast is controlled mainly with resistant cultivars. However, M. oryzae is notorious for its ability to overcome resistance based on race-specific R genes [4][5][6]. New cultivars often lose their resistance within a few years of introduction. Genetic variations in populations of the pathogen have been well-documented in many parts of the world [7,8]. M. oryzae isolates are also known to lose virulence and female fertility during laboratory manipulations [1] and large chunks of genomic DNA can be lost spontaneously during cultivation on artificial media, such as the deletion of over a 40 kb region containing the BUF1 locus [9]. The laboratory strain 70-15 of M. oryzae was generated by backcrossing a progeny from a cross between a rice isolate and a weeping love grass (Eragrostis curvula) isolate with the rice isolate Guy11 from French Guyana [10,11]. It has been used in many laboratories and was selected for genome sequencing [3]. Although most of the 70-15 genome should be from the rice pathogen after backcrossing with Guy11 several times, some weeping love grass pathogen sequences are likely retained. In comparison with Guy11, 70-15 is reduced in female fertility, conidiation, and virulence [12].
To determine the extent of genetic variation among isolates of M. oryzae, we sequenced two field isolates Y34 and P131. Y34 was isolated from Japonica rice in 1982 in Yunnan province, China, where both Indica and Japonica rice cultivars are cultivated [13,14]. Due to rich genetic diversity in rice cultivars and centuries of rice cultivation, highly diverse rice blast pathogen populations exist in Yunnan [15], and hence Y34 was chosen as a representative from this region for sequencing. The other field isolate, P131, originated from Japan where Japonica rice cultivars are dominant [16,17]. The isolates P131, Y34, and 70-15 differ in some cultural characteristics ( Figure S1). These three isolates also carry different avirulence genes and vary in aggressiveness toward different rice cultivars (Table S1). In comparison with 70-15, both Y34 and P131 have slightly larger genomes. The two Asian field isolates share a higher degree of similarity and contain over 200 genes that are absent in 70-15. Many pathogenesis-related genes showed evidence of exposure to diversifying selection when comparing either field isolate (P131 or Y34) to the laboratory strain . Functional characterization of randomly selected genes specific to the field isolates revealed that they play diverse roles, some of which affect virulence and others important for conidiation and vegetative growth. Furthermore, thousands of loci with transposon-like elements were identified in each genome. Many of them tend to be associated with the distribution of unique sequences and translocation of duplicated genes.

Genome sequencing and assembly
The genomes of P131 and Y34 were sequenced with the Sanger (2-fold) and 454 sequencing technologies (18-fold). The combined sequence reads for P131 and Y34 were 793.94 Mb and 843.92 Mb, representing about 20-and 21-fold genome sequence coverage, respectively ( Table 1). The 454 sequence reads were assembled into contigs and placed into scaffolds by the Newbler assembler with paired-end information from the Sanger reads. The assembled P131 genome consisted of 1,823 scaffolds with a combined length of 37.95 Mb. The N50 and maximum lengths of P131 scaffolds were 65 kb and 459 kb, respectively ( Table 1). The Y34 genome was assembled into 1,198 scaffolds with a combined length of 38.87 Mb. The N50 and maximum length of Y34 scaffolds were 106 kb and 708 kb, respectively (Table 1). Over 95% of the sequence reads were assembled into scaffolds .5 kb in both isolates. Approximately 33% and 51% of P131 and Y34 sequences, respectively, were assembled into scaffolds longer than 100 kb. In addition, the mitochondrial genomes of P131 and Y34 were also assembled ( Table 1). While P131 has an almost identical mitochondrial genome with 70-15, Y34 lacks two short fragments with a combined length shorter than 350 bp ( Figure S2).
Because repetitive sequences comprise approximately 10% of the genome of the laboratory strain 70-15 (version 6), repetitive sequences in the new assemblies were masked out with the RepeatMasker program for comparative analyses. The resulting ATCG bases after masking were 37.6 Mb, 38.2 Mb, and 37.5 Mb, respectively, for P131, Y34, and 70-15 (Table 1), indicating that the core genomes of these three isolates were not significantly different in size. However, because repetitive sequences and singletons smaller than 2 kb were not included in this analysis, it remains possible that the complete genomes of these three isolates vary in abundance of repetitive sequences and actually have greater size differences.
Scaffolds of P131 and Y34 were aligned with the assembled genome of 70-15 ( Figure 1). Overall, most of the 70-15 genome (96%) is also conserved in two field isolates. Only 0.45 Mb of sequence in 70-15 are absent from the two field isolates. In contrast, P131 and Y34 have 1.69 Mb and 2.56 Mb isolatespecific sequences, respectively. In general, isolate-specific sequences were dispersed throughout the genomes. For individual chromosomes, there are regions enriched for isolate-specific sequences ( Figure 1). Blocks of such sequences can be found at both ends of chromosome IV and at single ends of chromosomes I, II, III, V, and VI. In M. oryzae, genetic variation and avirulence genes are known to be enriched near the telomeres [18,19]. Comparative analysis of the genomes of these three M. oryzae isolates revealed that genes responsible for variations in virulence and adaptation to the environment may be concentrated at the chromosomal ends.
To locate and verify isolate-specific sequences in the field isolates, we used clamped homogenous electric fields (CHEF) gel electrophoresis to separate the chromosomes. Chromosome size polymorphisms were observed among these three isolates

Author Summary
Magnaporthe oryzae is the causal agent of rice blast that is mainly controlled with resistance cultivars. However, genetic variations in the pathogen often lead to overcoming R gene-mediated resistance in rice cultivars. In this study we sequenced two field isolates from China and Japan. In comparison with the laboratory strain that was previously sequenced, the field isolates have a similar genome size and overall genome structure. However, they have slightly more genes and contain a number of expanded gene families that are likely involved in plantfungal interactions. Each of the isolates has specific genes, some of which affect virulence and some others are important for asexual development. The three strains differ noticeably in the distribution of transposon-like elements. Many of the transposable elements tend to be associated with isolate-specific or duplicated sequences. This study revealed genetic factors involved in genome variation of the rice blast fungus.
( Figure 2A). Whereas chromosome VII (the smallest chromosome) in 70-15 was estimated to be 4.3 Mb, the smallest chromosomes in Y34 and P131 were approximately 1.8 Mb and 2.5 Mb, respectively. When one P131-specific sequence, P131_scaffold00006_11, which was not mapped on the chromosome alignment was used as the probe, an aggregate band of chromosomes larger than 6.0 Mb was detected in P131 but not in Y34 nor in 70-15 ( Figure 2B). When a similar blot was probed with an Y34-specific sequence, Y34_scaffold00824_1665, only the smallest chromosome of Y34 was hybridized ( Figure 2B). These findings confirm that the field isolates contain isolate-specific DNA.

Improvement of the 70-15 assembly
Because the assembly of P131 or Y34 relied on the alignment with the 70-15 genome, it was not possible to accurately map P131 and Y34 sequences that were absent from the 70-15 genome assembly. However, the P131 and Y34 sequences could be used to fill the sequence gaps ($50 bp) in the 70-15 assembly. We identified the end sequences of the contigs or scaffolds flanking these gaps. After filtering out simple repeats, these sequences were used to search against the assembled P131 and Y34 sequences. If both upstream and downstream flanking sequences of one gap were mapped on the same contig in either P131 or Y34, the in-between sequences were used to fill the gaps of 70-15. A total of 55 gaps were filled with sequences from P131 or Y34 (Table 2). Among them, 35 gaps had the sequences present in both P131 and Y34 ( Table 2). The total gap sequence filled in the 70-15 genome was 25.3 kb. We randomly selected 18 of these filled gaps of the 70-15 genome for verification. All of them were confirmed in 70-15 by PCR ( Figure S3).

Gene pool analysis and improvement of gene prediction in 70-15
The number of predicted genes in the masked genomes of P131, Y34, and 70-15 was 12,714, 12,862, and 12,440 (Table 1), respectively. The average length of predicted proteins was over 400 amino acids. Y34 apparently has the largest genome size and gene content, which may contribute to its adaptation to the   environment or to rice cultivars grown in Yunnan province, China. To identify the gene pool of these three strains, the predicted amino acid sequences of the total gene set from each isolate were used to search against the nucleotide sequences of other two isolates by TBLASTN. The large majority of M. oryzae genes (12,375 from P131, 12,431 from Y34, and 12,214 genes from 70-15) share sequence homology in pair-wise comparisons. Among these genes constituting the 'core' gene set of the M. oryzae genome ( Figure 3A), 11.3% had no orthologous sequences in other organisms. Moreover, approximately 10.1% of these M. oryzaespecific genes were predicted to encode secreted proteins.
To improve gene annotation in 70-15, we identified the genes that were common to all three isolates and had similar sizes (difference less than 1%) between Y34 and P131 but were 50 amino acids or 20% longer or shorter in 70-15. A total of 340 genes meeting these criteria were then manually annotated. Among them, 135 genes in 70-15 had incorrect intron annota-tions. The number of genes with inaccurate start or stop codon predictions was 259 or 15, respectively (Table S2).
The number of genes shared only by the two field isolates (198 from P131 and 220 from Y34) was approximately twice that of those shared by either P131 or Y34 with 70-15 ( Figure 3A), implying that the two Asian field isolates share a higher degree of similarity and with about 200 genes that are absent in 70-15. For isolate-specific genes, we found that 51, 136, and 71 genes were unique to P131, Y34, and 70-15, respectively ( Figure 3A). All the genes randomly selected for verification were confirmed by PCR to be either shared by two isolates or unique to one specific isolate ( Figure S4). As found in 70-15, isolates P131 and Y34 also had various copies of DNA helicase Q genes and LTR elements towards the chromosomal ends [3].
For the genes common in Y34 and P131 but absent in the automated annotation of 70-15, we used their amino acid sequences to search the 70-15 scaffolds. The resulting homologous sequences of 70-15 were then used to search against M. oryzae ESTs deposited in GenBank. A total of 81 candidate genes were identified in the 70-15 genome and ESTs (Table S3). Seventy-six of them encoded hypothetical proteins with no known homologs in GenBank. Some of these M. oryzae specific genes may be important for the virulence or fitness of the pathogen because all three isolates have these genes. The other five genes had orthologous sequences of unknown functions in Sordariomycetes but were absent in lower fungi, such as Zygomycetes and Saccharomycetales.
To further analyze genetic relatedness of these three isolates, the 10,074 clusters containing one protein from one isolate were selected and the resulting individual protein sequences from each isolate were combined for distance analysis with PHYLIP. As shown in Figure 3B, the two field isolates have a closer relationship to each other than with the laboratory strain 70-15.

Isolate-specific genes
Based on analyses of gene content, 51, 136, and 71 genes, respectively, were unique to P131, Y34, and 70-15. Overall, 13% of these isolate-specific genes encoded secreted proteins and 46% of them had no significant homolog in GenBank (Table S4). RT-PCR analyses were performed with 10 and 14 randomly selected P131-and Y34-specific genes, respectively. All the selected genes were confirmed to be expressed in mycelia ( Figure S5). While most of the isolate-specific genes were dispersed through the genome, some were located within clusters ( Figure 1; Table S4). For example, scaffolds 00875 and 01112 of Y34 contained five and eight of the Y34-specific genes, respectively. In P131, there were three isolate-specific genes each on scaffolds P131_scaffold01777 and P131_scaffold01784. Moreover, many of the isolate-specific genes with known chromosomal positions in P131 and Y34 were located near the chromosomal ends (within 500 kb), which is consistent with the distribution tendency of isolate-specific sequences ( Figure 1).
To determine the biological function of these isolate-specific genes, nine Y34-specific genes and three P131-specific genes were selected for functional characterization. For majority of them, the resulting gene deletion mutants had no obvious changes in colony growth, conidiation, or virulence ( Figure S6). Their functions in plant infection may be redundant or too minor to be detected under laboratory conditions. However, deletion of one P131 unique gene, P131_scaffold00208-2, resulted in a reduction in virulence in infection assays with seedlings of a susceptible rice cultivar ( Figure 4A). Deletion of another P131 unique gene, P131_scaffold01777-7, resulted in approximately 10% growth reduction on oatmeal tomato agar plates ( Figure 4B). Proteins encoded by these two P131-unique genes were predicted to be localized in the nucleus. Homologous sequences of these two genes were not found in other sequenced fungal species. Moreover, deletion of one Y34 unique gene, Y34_scaffold00875-3, resulted in approximately 36% reduction in conidiation ( Figure 4C). Interestingly, deletion of one Y34-unique gene encoding a putative G protein-coupled receptor (GPCR)-like integral membrane protein with six transmembrane domains resulted in changes in pathogenicity on a rice cultivar carrying the Pi-7 R gene, suggesting that this Y34-unique gene might be the potential AVR Pi-7 gene (data not shown).
Among the genes shared by both field isolates P131 and Y34 but absent in 70-15, 19% had signal peptides for secretion and 12% had transmembrane domains ( Figure 3; Table S5). About 70% of these genes had no functional annotation. Strain 70-15 may have lost these genes during the initial genetic cross or after generations of cultivation in the laboratory. For example, a gene encoding a CFEM-containing GPCR-like protein [20] and the avirulence gene AVR Pi-a [21] were present in the field isolates P131 and Y34 but not found in 70-15.

Duplicated genomic sequences
Duplication is one of the major mechanisms for evolutionary innovation. The total duplicated genomic DNA fragments (longer than 500 bp and greater than 90% identity) were 289 kb, 385 kb, and 825 kb in P131, Y34, and 70-15, respectively. A total of 16, 20, and 155 predicted genes in P131, Y34, and 70-15, respectively, were located in these duplicated sequences (Table S6).
Although duplicated DNA sequences were detected genomewide in all three isolates, in general chromosomes II, IV, V, and VII had more duplicated DNA sequences than other chromosomes ( Figure 5A). For individual chromosomes, the end regions tend to contain more duplicated DNA sequences than the central region. Comparative analysis indicated that P131, Y34, and 70-15 all contained isolate-specific duplicated regions ( Figure 5A). However, the laboratory strain 70-15 had significantly more duplicated genes, including the AVR gene PWL2 [22] (Table S6). Other duplicated genes with known functions include LPS glycosyltransferases, MFS transporters, sugar transporters, and carboxypeptidases. Both intra-and inter-chromosomal duplications were observed, but more inter-chromosomal duplications were apparent, and only a small portion of duplication events were conserved in all three isolates ( Figure 5A).

Gene families
To identify gene families, the entire set of the predicted proteins from all three isolates were clustered with the OrthoMCL program. A total of 38,016 proteins were grouped into 14,189 clusters with each cluster representing a group of putative orthologs. Among these clusters, 195 gene families were identified with more than one member in at least one isolate ( Figure 5B), suggesting that 1.37% of the M. oryzae genes may have been evolved by gene family expansion. Among 45 clustered loci duplicated equally in each isolate, 38, 6, and 1 gene loci were duplicated between two, three or four times, respectively, per isolate (Table S7). These gene families might have existed before the divergence of the three isolates. The majority of these gene families were predicted to be involved in synthesis and transport of nutrition and secondary metabolites, suggesting that they may be related to plant infection (Table S7). There were 87 clustered loci duplicated at different frequencies in three isolates (Table S8). Most of these gene families (61 out of 87) contained duplicated genes in only one isolate, and 17 gene families contained gene loci duplicated at least three times in one or more isolates (Table  S8), suggesting that they have been expanded or contracted in different strains, possibly during environmental adaptations. For example, one putative calcium P-type ATPase gene was duplicated three times in P131 and Y34, and twice in 70-15. Members of this gene family have been demonstrated to be required for disease development and induction of host resistance [23,24].
For loci duplicated in two isolates but absent in the third one, there were eight in P131 and Y34, five in P131 and 70-15, and twelve in Y34 and 70-15 ( Figure 5B; Table S9). Most of these expanded gene families had unknown functions. To confirm the duplication events that were unique to the two field isolates, three genes were selected by Southern blot analysis. All of them were confirmed to be specifically duplicated in P131 and Y34 but not in 70-15 ( Figure S7). There were seven, thirteen, and eighteen gene families specifically expanded in P131, Y34, and 70-15, respectively ( Figure 5B; Table S10). Most of these isolatespecific gene families contained two or three duplicated members that had unknown functions or no known homologs in GenBank.  Asynonymous to synonymous nucleotide substitution rate (Ka/Ks) analysis To analyze asynonymous and synonymous nucleotide substitutions, we first identified and removed orthologous genes with large deletions or insertions in any of the isolates from the list of common genes. In total, 9,184 highly conserved orthologs were used to identify nucleotide substitution events. Among them, 7,569 genes had neither synonymous nor asynonymous nucleotide substitution in pair-wise comparisons, indicating that most of the genes were well-conserved among different isolates. Only 428 genes had nucleotide substitutions between P131 and Y34, and 1,651 genes had nucleotide substitutions between 70-15 and P131 or Y34, further indicating that the field isolates had closer relationship with each other than with the laboratory strain. Genes with substitutions in the 70-15 versus P131/Y34 comparison could be categorized into four groups: 414 genes only with synonymous nucleotide substitutions, 697 genes only with asynonymous nucleotide substitutions, 124 genes with Ka/Ks,1, and 6 genes with Ka/Ks.1.
Overall, similar numbers of genes identical between Y34 and P131 but with nucleotide variations in 70-15 were thought to have undergone diversifying versus purifying selections. However, several functional categories of genes, such as those involved in cellular responses to stimuli and organophosphate metabolisms, had more members exhibiting diversifying selection in the two field isolates (Table S11). Several of the genes underwent diversifying selection in the 70-15 versus P131/Y34 comparison (Table S12), including ATG4, HEX1, MCK1, MoSNF1, PTH2, and RGS1, which are known virulence factors in M. oryzae [25][26][27][28][29][30]. Three of them encode putative CFEM-domain receptors that may be involved in recognizing different environmental and plant signals (Table S12).

Repetitive sequences and transposable elements
Repetitive sequences were masked by Newbler for assembling 454 sequence data of P131 and Y34. To compare repetitive sequences of these two isolates, we assembled the Sanger reads of P131 and Y34 (approximately 2-fold genome coverage) and found that 10.8%, 10.3%, and 10.6% of the 70-15, P131, and Y34 genomes, respectively, were repetitive sequences, indicating that the abundance of repetitive sequences is similar among these three isolates. Transposable elements (TE) and their insertion sites (flanking sequences) were identified by RepeatMasker. Although the exact copy numbers vary, both field isolates contained all classes of transposable elements identified in 70-15 (Table 3). In general, 70-15 has more members of the LINE, Maggy, and  (Table 3). However, none of them was unique to the field isolates. While clusters 1, 4, 5, 6, and 7 were much more abundant in the field isolates, 70-15 had more copies of the cluster 2 repetitive elements (Table 3).
In comparison with 70-15, the two field isolates were more similar in the distribution pattern of repetitive sequences (Figure 1 and 6A). While chromosomal ends tend to have more repetitive sequences, all three isolates had much reduced numbers of TEs in the gene-rich regions of chromosomes III, V, and VI ( Figure 6A). For the TEs that could be assembled into the genome sequences, approximately 27% of them had the same locations in all three isolates by comparison of their flanking sequences ( Figure 6B). Y34 had more TEs with unique chromosomal positions (1,061) than P131 (830) or 70-15 (976). In addition to the 603 locations of TEs conserved among the three strains, Y34 and P131 also shared 281 TEs with the same chromosomal locations, which was fewer than the 377 between 70-15 and Y34 or the 341 between 70-15 and P131 ( Figure 6B). While over two-thirds of the members of some TEs, including Occan, had conserved genomic locations, TEs such as Retro5 and Maggy differed significantly in their chromosomal positions between Y34 and 70-15. Similar results were obtained with the P131 and 70-15 comparison (Table 3). A total of 41.1% and 46.0% of TEs in 70-15 and P131, respectively, had conserved genomic locations. The Pot3, Maggy, Retro5, and Retro7 elements had the highest variation in chromosomal positions between 70-15 and P131.
We also analyzed the impact of TEs on the genome evolution by comparing two-fold coverage Sanger data of P131 and Y34 with the 70-15 assembly. A total of 35, 38, and 116 genes were disrupted by the insertion of TEs in P131, Y34, and 70-15, respectively (Table S13, S14, S15). Over 50% of the gene disruption events were caused by TEs belonging to MGL, Mg-SINE, or Pot2/Pot4. Strain 70-15 had a number of genes disrupted by cluster 7, cluster 9, Occan, and RETRO5 elements, which were not observed in P131 or Y34 ( Figure 6C). Some of these genes may have been disrupted by transposition events occurring during generations of cultivation under laboratory conditions, and these genes may play roles in plant infection or survival in the field isolates but were not required for the laboratory isolate. In comparison with 70-15, the field isolates P131 and Y34 had more genes disrupted by SINE ( Figure 6C), which may indicate that these SINE elements were more active in these two field isolates.
Among all the genes disrupted by TEs in three isolates, only approximately one third of them have known functions based on their orthologs in GenBank, and most of them are involved in protein metabolism, transportation, transcription, or lipid metabolism. The majority of the TE-disrupted genes encode hypothetical proteins with unknown functions. Interestingly, 23.8% of them contained putative signal peptide sequences, which is significantly higher than the average percentage of predicted extracellular proteins in the genomes of these three strains (Table S13, S14, S15). Some of them may function as effectors involved in fungalplant interactions, such as AVR Pi-ta1 in 70-15 (Table S13). In addition, 14.7%, 14.2% and 15.8% of the TE-disrupted genes in 70-15, P131, and Y34, respectively, encoded proteins with putative nuclear localization sequences.
Intriguingly, the regions containing isolate-specific sequences or duplicated genes families were often near areas with high frequency of TEs ( Figure S8). In 70-15, several TEs were found within 1.0 kb from 23 duplicated genes families, including the avirulence gene PWL2 (Table S16) although many of these duplicated sequences were not closely linked or located on different chromosomes. Taken together, it is likely that the transposition events of TEs might be related to translocation of duplicated DNA fragments and presence of isolate-unique sequences in these three strains.

Discussion
In a number of eukaryotic organisms, comparative analysis of multiple genomes of the same species has been used to improve assembly and annotation and to identify genome variations [31][32][33][34]. The rice blast fungus is well-known for its natural genetic variation [1,2]. In this study, we sequenced two field isolates of M. oryzae from Asia. Genome analysis indicated that these two field isolates are more closely related to each other than to 70-15, which is a laboratory strain derived from three backcrosses of rice pathogen Guy11 with a progeny of a cross involving a weeping love grass pathogen, and maintained for many years under laboratory conditions. The overall genome content and composition are similar among these three isolates, but the genomes of P131 and Y34 with only A/C/T/G and no N's were slightly larger than that of 70-15.
Although the 70-15 genome has been updated several times, it still has many gaps (www.broadinstitute.org/annotation/genome/ magnaporthe_grisea). In this study, a total of 55 gaps of the 70-15 genome (version 6) were filled in with sequences from P131 and Y34, and the results were validated by PCR analyses of 70-15. This number of putative filled gaps with sequences from two isolates may seem low, but because of the short read length, the threshold set may have been too stringent. For 35 gaps, they were filled with consensus sequences found in both field isolates. For the gaps with sequences only available in either Y34 or P131, the filling sequence for 70-15 was less certain, but of high probability because the overall nucleotide sequence identity between 70-15 with P131 or Y34 was over 98%. Besides improving the genome assembly, the sequences of P131 and Y34 were used to improve the annotation of 70-15. We identified 81 genes that were not predicted in the automated annotation of the 70-15 genome sequence, and none of them were related to the sequence gaps. In addition, we identified potential annotation errors in 340 predicted genes of 70-15. Most of them were related to the problems with the prediction of the boundaries of introns and start or stop codons.
Our study revealed that each M. oryzae isolate had some unique genomic DNA sequences. Because genome sequences of P131 and Y34 were aligned with that of 70-15, it was impossible to locate most of the sequences unique to Y34 and P131 onto specific chromosomes or chromosomal regions. However, sequences unique to 70-15 were distributed over all seven chromosomes. Because 70-15 was derived from three backcrosses of rice pathogen Guy11 with a progeny of a cross involving a weeping love grass pathogen, we expected that a small portion of its genome was from the weeping love grass pathogen. The isolates Y34, P131, and 70-15 had 136, 51, and 71 unique genes, respectively. Therefore, less than 1% of the predicted genes were unique to each isolate and these genes play diverse roles, some of which might possibly contribute to the specificity of individual isolates. Some of the isolate-specific genes were clustered, suggesting that isolate-specific DNA fragments might be gained or lost during evolution. The P131-specific gene P131_scaf-fold00208-2 encoded a hypothetical protein without known homologs in other fungi. Deletion of this gene resulted in reduced virulence toward rice plants. Because it might be involved in plant infection, P131_scaffold00208-2 may play an isolate-specific role in suppressing or overcoming plant defense responses. These results suggest that some of the field isolate-specific genes may play important roles in plant infection.
In all three M. oryzae isolates, most of the duplicated genes are functionally unknown. Duplicated sequences are distributed all over seven chromosomes and appear to be enriched in the telomeric regions. For the duplicated genes with known functions, many of them are predicted to be involved in primary and secondary metabolism and interactions with the host (such as cutinases and Avr proteins), which is consistent with earlier observations with 70-15 [3]. Interestingly, several gene families involved in synthesis and transport of nutrients and secondary metabolites were expanded with different frequencies in these three isolates. Some of these duplicated genes may contribute to the adaption of M. oryzae to different environmental conditions. Among the genes that had undergone diversifying selection in Y34 and P131 in comparison with 70-15, a number of them are known to be important for virulence, suggesting that such genes may have been under strong selection pressure in their natural field environments. There were six genes under positive selection in the two field isolates compared to 70-15. Two of them encoded two hypothetical proteins, a serine/threonine protein kinase, an acyltransferase, a putative catalytic domain of diacylglycerol kinase, and an aspartic-type endopeptidase. Three of them are located on chromosome I. In contrast, there were no genes showing positive selection in the comparison between the field isolates.
Because sexual reproduction has not been observed in the field, it is possible that translocations of the repetitive sequences may be one of the major sources for genome variation and rapid adaption to different host and environmental conditions. Consistent with this hypothesis, over 10% of the genome sequences were found to be repetitive sequences. In addition to TEs that have been identified in previous studies [3], nine new clusters of repetitive sequences were identified in all three M. oryzae strains in this study. Most of these TEs have different copy numbers in different isolates. Strikingly, among thousands of TE loci, less than 30% of them were conserved among these isolates, suggesting active transposition of these TEs in M. oryzae. Moreover, approximately 200 genes were totally disrupted by TEs in these three strains, and approximately 40% of them encoded extracellular or nuclear proteins, suggesting that transpositions of TEs may contribute to variations in host-microbe interactions and transcriptional regulation. Interestingly, TEs tended to be found near isolate-specific sequences and duplicated DNA fragments. It is possible that translocation of TEs is important for gain or loss of isolate-specific sequences and gene duplication events.
Overall, our results indicate that gain or loss of unique genes, duplications, gene family expansions, and translocations of TEs can be important factors for genome variation in the rice blast fungus. Among these factors, translocation of TEs may be the most important one because of its association with gene duplication and isolate-specific sequences. There are reports on comparative genomic analyses of plant pathogenic oomycetes and fungi, such as Phytophthora and Fusarium species [35,36]. However, to our knowledge, this study is the first on comparative analysis of the field and laboratory strains of a plant pathogenic fungus, and this can give insights into the genome variations of the fungus under different environments.

Sequencing and genome assembly
For Sanger sequencing, genomic libraries with insertion size of 1.5 kb to 3.5 kb were constructed and sequenced at the Beijing Genomic Institute (BGI, Beijing, China). These two isolates were also sequenced with the GS-FLX and GS-FLX Titanium 454 platforms [37] at BGI that generated reads with an average length of 240-and 380-bp, respectively. Reads from Sanger and 454 sequencing were placed into scaffolds using the Newbler assembler (version 1.1.02.15, Roche).
The M. oryzae 70-15 genome sequence version 6 was downloaded from the Broad Institute (www.broad.mit.edu/annotation/ genome/magnaporthe_grisea). The repetitive sequences in the assembled genomes of laboratory strain 70-15 and the field isolates P131 and Y34 were masked with RepeatMasker (Smit, AFA, Hubley, R & Green, P. RepeatMasker Open-3.0 at http:// repeatmasker.org). Masked genome sequences of the three M. oryzae isolates were compared with the MUMMER package [38] to construct chromosome sequences for P131 and Y34 based on 70-15 data. Genomic sequences with nucleotide identity over 92% were considered to be conserved among different isolates.

Gene prediction and annotation
De novo gene prediction of the P131 and Y34 genome sequences was performed with FGENESH [39], which was trained with 79 gene models of M. oryzae (kindly provided by Prof. Zhen Su at China Agricultural University). The tRNA genes were identified by tRNAscan [40]. Gene functions were predicted by comparison with the NCBI NR protein database (http://www.ncbi.nlm.nih. gov/) and the Pfam database [41]. InterPro [42] was used for gene ontology annotations. Membrane and sub-cellular localization were predicted by TMHMM 2.0 [43], SignalP3.0 [44], and WoLF PSORT [45].

Gene pool and distance analyses
Nucleotide sequences of the predicted genes of P131, Y34, or 70-15 were compared separately with genomic sequences of the other two isolates with TBLASTN [46]. Homologous genes with sequence identities of 100%, 80-100%, and 50-80% were defined as identical, similar, and divergent, respectively, while those below 50% were considered non-homologous. Sequences of genes unique to the field isolates were also queried against the unassembled reads of 70-15. Orthologous proteins were clustered with OrthoMCL [47]. Only the clusters containing one protein from each isolate were selected for distance analysis. Individual protein sequences from three isolates were concatenated and aligned with T-Coffee [48], and a distance matrix was calculated with PROTDIST from the PHYLIP package [49]. Finally, a neighbor-joining tree was constructed with NEIGH-BOR from the PHYLIP package.

Ka/Ks analysis
The coding sequences of orthologous genes conserved in all three isolates were aligned with ClustalW [50] to detect large deletions (.12-bp), frame shifts, and null mutations. Orthologous genes without large deletions, frame shifts, or null mutations in the open read frame were analyzed for Ks and Ka with the YN00 program in the PAML package [51].

Analysis of repetitive sequences and transposable elements
The Sanger reads of P131 and Y34 were assembled with RePS [52] and analyzed for transposable elements with RepeatMasker. New repetitive elements were identified by RepeatScout [53]. For each transposable element (TE) identified in P131 or Y34, its flanking sequences of 30 to 100 bp were extracted and used to search against the 70-15 genome with Standalone BLASTN (evalue,10 25 ). Each TE and its corresponding region in 70-15 genome were aligned with BLAST2seq to assess whether it was conserved. To search for genes disrupted by TEs, unique flanking sequences of TEs in P131 or Y34 were used to search against 70-15 genes (e-value,10 220 ). The search results were removed if more than one hit was found. Similar analyses were performed with P131 and Y34.

Culture conditions and plant infection assays
The wild-type and mutant strains of 70-15, P131, and Y34 were cultured at 25uC on oatmeal tomato agar (OTA) plates and conidiation assessed [17]. Mycelia collected from two-day-old cultures in complete media (CM) shaken at 150 rpm were used for extraction of fungal DNA and protoplasts. Media were supplemented with 250 mg/ml hygromycin B (Roche, USA) or 400 mg/ ml neomycin (Amresco, USA) to select hygromycin-resistant or neomycin-resistant transformants. Four-week-old seedlings of monogenic rice cultivars (Table S1) and eight-day-old seedlings of barley cultivar 'E8' were inoculated as previously described [17,54]. Lesion development was examined 5-7 days after inoculation.

CHEF electrophoresis
Chromosome-size DNA were prepared with protoplasts isolated from vegetative hyphae as previously described [55,56], and separated on 0.65% Megabase agarose (Bio-Rad, USA) gels with a Bio-Rad DR III system with switching intervals of 60 min for 48 h, 55 min for 72 h, 45 min for 72 h, and 35 min for 72 h at 1.5 V/cm. Chromosomal DNA of Schizosaccharomyces pombe and Hansenula Wingei (Bio-Rad, USA) were used as the molecular weight markers.