Characterization of Resistance Gene Analogues (RGAs) in Apple (Malus × domestica Borkh.) and Their Evolutionary History of the Rosaceae Family

The family of resistance gene analogues (RGAs) with a nucleotide-binding site (NBS) domain accounts for the largest number of disease resistance genes and is one of the largest gene families in plants. We have identified 868 RGAs in the genome of the apple (Malus × domestica Borkh.) cultivar ‘Golden Delicious’. This represents 1.51% of the total number of predicted genes for this cultivar. Several evolutionary features are pronounced in M. domestica, including a high fraction (80%) of RGAs occurring in clusters. This suggests frequent tandem duplication and ectopic translocation events. Of the identified RGAs, 56% are located preferentially on six chromosomes (Chr 2, 7, 8, 10, 11, and 15), and 25% are located on Chr 2. TIR-NBS and non-TIR-NBS classes of RGAs are primarily exclusive of different chromosomes, and 99% of non-TIR-NBS RGAs are located on Chr 11. A phylogenetic reconstruction was conducted to study the evolution of RGAs in the Rosaceae family. More than 1400 RGAs were identified in six species based on their NBS domain, and a neighbor-joining analysis was used to reconstruct the phylogenetic relationships among the protein sequences. Specific phylogenetic clades were found for RGAs of Malus, Fragaria, and Rosa, indicating genus-specific evolution of resistance genes. However, strikingly similar RGAs were shared in Malus, Pyrus, and Prunus, indicating high conservation of specific RGAs and suggesting a monophyletic origin of these three genera.


Introduction
When a genome sequence is available, the analysis of large gene families can contribute to the understanding of major events responsible for molecular evolution. This is the case for resistance gene analogues (RGAs) with a nucleotide-binding site (NBS) domain [1][2][3][4][5]. The NBS domain is part of the larger NB-ARC domain that hydrolyses ATP and GTP and functions as a molecular switch for signal transduction after pathogen recognition [6]. Many resistance proteins encoded by RGAs contain a leucine-rich repeat (LRR) domain [7,8], involved in protein-protein interactions and in pathogen recognition [9]. Proteins encoded by RGAs can be further classified according to the presence of the toll/interleukin-1 receptor (TIR) or other N-terminal features, such as coiled-coil (CC) and BED finger (Bed) [3,10,11]. The N-terminal features are involved in downstream specificity and signaling regulation [12]. RGAs evolved for pathogen recognition and frequently match specific pathogen avirulence factors to trigger signal transduction cascades and defense responses [9].
In this study, the cluster organization of RGAs and their distribution across chromosomes were analyzed in relation to the recent duplication of the apple genome. In addition, the phylogeny of RGAs from domesticated and wild Malus species, together with RGAs from other Rosaceae, P. trichocarpa, and V. vinifera, was examined to clarify the evolutionary history of apple and its related species.
The 868 RGAs accounted for 1.51% of M. domestica predicted genes, a percentage slightly higher than that in other plant genomes (Table 1). The density of RGAs per Mb was similar for M. domestica and other genomes with the exception of Z. mays, C. papaya, C. sativus, and S. bicolor.
The mean exon number detected in apple RGAs was 4.51, and the number of exons of the CNL class (3.46) was lower than that of the TNL class (6.41; P < 0.001). Thus, the number of exons in RGAs of M. domestica was consistent with the number in A. thaliana and B. rapa but higher than the number in V. vinifera, P. trichocarpa, and O. sativa (Table 1). Moreover, 23% of CNL RGAs are encoded by a single exon, while all TNL RGAs have at least three exons.

Phylogeny of RGAs in Domesticated and Wild Malus Species
Twenty-four wild Malus species (Table S2) were considered, and PCR fragments were amplified from germplasm. After sequence comparison, unique fragments were translated into amino acid sequences (Table S1), and 115 of them matched NBS sequences of known resistance proteins with an E-value lower than 1E-10. Phylogenetic analysis indicated that RGAs of wild Malus species grouped mainly in clades that included sequences of the domesticated apple (Figure 2). A significant fraction of phylogenetic clades contained only a few RGAs, probably due to the short sequence of the NBS domain used for this analysis. Some clades consisted mainly of sequences from wild species and contained only a few RGAs of the domesticated apple.

Phylogeny of RGAs among Rosaceae Species
A total of 693 Rosaceae RGA sequences at NCBI were downloaded (75 from Rubus, 293 from Prunus, 16 from Fragaria, 125 from Rosa, 34 from Pyrus, and 150 public sequences from Malus species) and compared to the 868 RGAs of M. domestica and the 210 sequences obtained from wild Malus species (Table S1). In the phylogenetic tree of Rosaceae species (Figure 3), 49 clades were specific to the genus Malus, and included sequences from two or more Malus species. Most of the remaining clades were represented by RGAs from two or more Rosaceae genera. In particular, three clades comprised RGAs of Malus, Pyrus, and Prunus, indicating a monophyletic origin of the three genera and strong conservation of some RGA sequences in these plants. Few clades were represented by non-apple RGAs, and clades specific to Fragaria or Rosa were also present.

Figure 1. RGAs assigned to chromosomes (Chr) are represented by dots with colors corresponding to major phylogenetic clades. The size of each chromosome is given in megabase (Mb, on the left side), whereas the markers of the genetic map are shown in black (on the right side). Resistance-related genes different from RGAs are shown in red. Known quantitative trait loci (QTL) for resistance to apple scab (brown), powdery mildew (green), aphids (light blue), fire blight (red) and rust mite (blue) are shown by bars on the left side of chromosomes [67][68][69][70][71][72][73], together with the major resistance genes to apple scab (Vd3 and Rvi genes) [74][75][76], powdery mildew (Pl1) [77], and aphids (Sd-1, Sd-2, Er1, Er2) [78,79]. doi:10.1371/journal.pone.0083844.g001

Table 2. Organization and distribution of resistance gene analogues (RGAs) with a nucleotide-binding site (NBS) domain in the apple (Malus × domestica) chromosomes.

Comparison of RGAs among Malus × domestica, Populus trichocarpa, and Vitis vinifera
RGA sequences can also be compared across different plant families, and a phylogenetic tree of RGAs from M. domestica, wild Malus species, V. vinifera, and P. trichocarpa (Table S1) was obtained (Figure 4). Several clades included sequences from two or three species, and two major clades, named Md1 and Md2, comprised only sequences of M. domestica (Figure 4). However, sequences of the Md1 clade were grouped in three subclades in the phylogenetic tree of RGAs from Rosaceae species (Figure S2). RGAs of subclades Md1 sc2 and Md1 sc3 did not show similarity with any Rosaceae RGAs, whereas sequences of Md1 subclade 1 (Md1 sc1) shared significant similarity with four RGAs of Pyrus (Figure S2). Clade Md2 included one and two RGAs from Rubus and Rosa, respectively. Most of the RGAs of the clade Md2 are located on Chr 2, 3, 7, 11, 12, and 15.

Duplication of RGAs in Malus × domestica
To study the recent duplication of RGAs in the M. domestica genome, Ks values were determined, and results from recent gene duplications were highlighted (Figure S3). Links among different RGAs helped to describe the relationships among the duplicated apple chromosomes [34]. Homologous apple chromosomes had more than 10 links, except for Chr 13 and 16, which hosted only a low number of RGAs. Chr 6 was not included in this analysis because it contains only nine RGAs, six of them derived from the recent WGD. Moreover, the duplicated chromosomes had RGAs belonging to the same phylogenetic clades (Figure S4).
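The link-counting rule used for this analysis (a connection between two chromosomes is accepted when at least ten pairwise comparisons have a Ks value lower than 0.25) can be illustrated with a minimal Python sketch. The function names, the input dictionaries, and the toy identifiers and Ks values are assumptions made for illustration; they are not taken from the original analysis pipeline.

```python
from collections import Counter
from itertools import combinations
from math import comb

KS_THRESHOLD = 0.25  # Ks cut-off stated in the text
MIN_LINKS = 10       # minimum number of low-Ks pairs needed to accept a connection


def chromosome_links(ks_pairs, rga_chrom):
    """Count RGA pairs with Ks below the threshold for every chromosome pair."""
    links = Counter()
    for (rga_a, rga_b), ks in ks_pairs.items():
        if ks < KS_THRESHOLD:
            pair = tuple(sorted((rga_chrom[rga_a], rga_chrom[rga_b])))
            links[pair] += 1
    return links


def accepted_connections(links):
    """Keep chromosome pairs supported by at least MIN_LINKS low-Ks comparisons."""
    return {pair for pair, n in links.items() if n >= MIN_LINKS}


# Toy data (hypothetical identifiers): four RGAs on Chr 3, three on Chr 11,
# two on Chr 5 and two on Chr 10.
rga_chrom = {f"rga3_{i}": 3 for i in range(4)}
rga_chrom.update({f"rga11_{i}": 11 for i in range(3)})
rga_chrom.update({"rga5_0": 5, "rga5_1": 5, "rga10_0": 10, "rga10_1": 10})

# Invented Ks values: low between chromosomes, high within a chromosome.
ks_pairs = {
    (a, b): (0.5 if rga_chrom[a] == rga_chrom[b] else 0.1)
    for a, b in combinations(sorted(rga_chrom), 2)
}

links = chromosome_links(ks_pairs, rga_chrom)
print(accepted_connections(links))

# Number of unordered pairs among 778 sequences.
print(comb(778, 2))
```

In this toy data set only the Chr 3 and Chr 11 pair reaches the ten-link minimum. The last line is a consistency check: the 302253 Ks values reported in the Methods are consistent with all pairwise comparisons among 778 sequences.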

Discussion
To counteract pathogens, plants rely on the innate immunity of their cells and on systemic signals emanating from infection sites [9,41]. Pathogen effectors from very diverse organisms are recognized by resistance proteins encoded by RGAs and activate plant defense responses [6,9]. NBS-mediated disease resistance is effective against obligate biotrophic and hemibiotrophic pathogens but not against necrotrophs, which kill host tissues during colonization [42].
In apple, the abundance of RGAs is only partly related to genome size (750 Mb), which is much smaller than in maize (2300 Mb; [21]) or soybean (1115 Mb; [19]). The TIR-NBS class accounts for the largest group of RGAs in A. thaliana (64%; [11]) and B. rapa (64%; [14]). In P. trichocarpa [26], V. vinifera [2,5,28,29], and C. papaya [16,30], the percentage of the TIR-NBS class is much lower than in the previously mentioned species. The TIR-NBS class is present at a very low frequency in O. sativa (1%; [24]) and S. bicolor (1%; [27]) and is absent in B. distachyon and Z. mays [30], supporting the conclusion that this class is specific for dicotyledons. In apple, 231 RGAs of the TIR-NBS class have been identified, and they are mainly located on Chr 2, 5, 9, 12, 15, 16, and 17. However, the number of RGAs belonging to the non-TIR-NBS class in apple (505) is greater than in all other species considered, and these RGAs are mainly located on Chr 3, 4, 8, 11, 13, and 14. The existence of chromosome-specific RGA classes suggests that groups of chromosomes evolved separately, but further analyses are required to test this hypothesis. In grapevine, the existence of two chromosome groups has been inferred based on RGA cluster similarity, and the two groups seem to have evolved independently [2]. Moreover, the TIR-NBS class is specific for only one of the two components of the V. vinifera genome, suggesting an independent evolution of the RGA classes [2].
In apple, 56% of RGAs (435 of 778 anchored) are located preferentially on six chromosomes, with 25% located on Chr 2. In large gene families, genes are commonly organized in clusters and superclusters [4,5,11,14,16,25,26], as demonstrated here for the apple genome. Of the RGA clusters in apple, 71% (108 of 152) include RGAs from the same phylogenetic clade, and 29% include RGAs from two to three different clades. Clusters frequently consist of tandem duplications of the same gene [5,43]. Heterogeneous clusters, in which sequences belong to different phylogenetic lineages, are also present, most probably as a result of different molecular mechanisms such as ectopic recombination, chromosomal translocation, and gene transposition, as has been recently highlighted for the grapevine genome [2]. This kind of genome evolution could be explained in terms of a positive selection for cluster complexity, which could serve as the basis for the generation of new resistance specificities [4,44]. The role of tandem duplication in the apple genome is supported by low Ks values among RGAs of the same cluster, as is already known for other species [2,5,14,22,43]. Gene duplication in a position different from the original cluster has to be preceded by gene transposition, as predicted for A. thaliana and V. vinifera RGAs [1,2]. Thus, a successful transposition is the starting point for the creation of a new RGA cluster, and the selection for disease resistance could favor the process [5,45]. Moreover, analysis of RGA transposition has indicated that V. vinifera putative component genomes may have evolved independently and later fused and evolved together in the same nucleus [2].
Velasco et al. [34] have shown that a recent WGD has increased the chromosome number in apple from nine in the putative ancestor to the current 17. The recent duplication of RGAs due to a WGD event supports the existence of i) a tetraploid state of the genome in which a pair of chromosomes exists with a second homologous pair; ii) duplications inside chromosomes, particularly for Chr 11 where recent duplications can be observed; and iii) duplications in different chromosomes, suggesting recent events of gene transposition. Eight of the 17 chromosomes (Chr 3 and 11, 5 and 10, 9 and 17, and 13 and 16) represent a direct duplication of four ancestral chromosomes, and each of the extant Chr 4, 6, 12, and 14 derives from translocation between two ancestral chromosomes [34]. More complex events have generated the remaining five chromosomes, which are derived from three ancestral chromosomes. The different clades of RGAs along duplicated chromosomes indicate a similar position of orthologous RGAs along each chromosome doublet (Chr 3 and 11, 5 and 10, 9 and 17, and 13 and 16). These results strongly support the origin of the apple chromosomes as described by Velasco et al. [34] and indicate that RGA distribution might be used to dissect plant genome evolution [2]. As is the case for other species, the process of gene duplication has shaped the apple genome in different ways, including the selective retention of paralogs associated with specific biological processes, the amplification of specific gene families, and an extensive subfunctionalization of paralogs. Both the major WGD event and small-scale duplications could be responsible for the high number of apple RGAs. A remarkable feature of gene duplication in apple is the high proportion of paralogs showing divergent expression patterns [46]. Extensive subfunctionalization could have contributed to the acquisition of new traits specific to apple or to the Pyrinae lineage [47].
Sequences of Eurosid genomes provide evidence of ancient genome duplications that occurred early in evolution, suggesting a polyploid origin for most Eudicots [28,48].
Most of the RGAs of wild Malus species are closely related to RGAs of the domesticated apple. Although RGA sequencing from wild Malus species was partial and could include alleles of the same gene, phylogenetic analysis revealed specific clades of wild Malus species, indicating, as expected, the potential to enlarge the genetic variation of RGAs in domesticated apple. Moreover, the comparison of apple RGAs with those of other Rosaceae indicates the existence of specific clades for apple. In addition, several clades include a mixture of RGAs from Malus, Pyrus, and Prunus, indicating that similar resistance genes are still shared in different genera of the Rosaceae. While these results support the monophyletic origin of the three genera, clades specific for each genus were also found. The existence of genus- or species-specific clades indicates the existence of mechanisms for cluster conservation, as reported by Plocik et al. [49].
Phylogenetic relationships within the Rosaceae inferred from RGAs are consistent with phylogenies based on chloroplast and other nuclear genes [50,51]. The phylogenetic analysis of the RGAs from Malus, Vitis, and Populus shows that Malus contains two large non-TIR-NBS clades that are specific to Malus. This inference should be considered with caution, because the RGA sequences used in our analysis are from only a few species. Several other reasons could explain the variation of RGAs in Rosaceae species, such as the inter-specific variation of the RGA family size observed in dicotyledonous plants. Similar situations were reported for other gene families in the Archaea [52], bacteria [52,53], and mammals [54,55]. The variation of RGA family size between species could be attributed to gene duplication, deletion, pseudogenization, and functional diversification [56][57][58]. The last mechanism is supported by the necessity of a species to adapt to rapidly changing pathogen populations.

Concluding Remarks
This paper analyses the RGAs of Malus spp. and other Rosaceae species to reveal specific evolutionary features of M. domestica. RGAs of M. domestica are mainly located in clusters and are mapped preferentially on six chromosomes. TIR-NBS and non-TIR-NBS classes of RGAs are located in different chromosome groups. Phylogenetic reconstruction in the Rosaceae family revealed specific clades of RGAs for Malus spp., Fragaria spp., and Rosa spp., indicating genus-specific evolution of resistance genes. However, strikingly similar RGAs were shared in different species of Malus, Pyrus, and Prunus, highlighting a monophyletic origin of these three genera and the high conservation of some RGA sequences in these plants.

Identification of RGAs in the Apple Genome
The RGA sequences were identified from the predicted proteins of M. domestica cultivar 'Golden Delicious' [34] based on their NB-ARC domain profile (PF00931 [59]) using HMMER [60]. Putative RGA alleles were identified as predicted genes that have more than 90% sequence similarity and overlap with another RGA along each scaffold of the heterozygous apple genome. Apple RGAs were validated by BLAST-P analysis (more than 90% protein sequence similarity) against known A. thaliana, P. trichocarpa, and V. vinifera genes. RGAs were grouped in different classes based on the presence of the domains TIR, LRR, CC, and BED finger [43]. The motifs were derived from the domain profiles retrieved from PFAM (http://pfam.janelia.org), PANTHER (http://www.pantherdb.org/), and SMART (http://smart.embl-heidelberg.de) databases and from the COILS program; a stringent threshold of 0.9 was used so that CC domains were specifically detected [61]. Resistance-related proteins were also identified based on kinase domains (IPR000719, PF07714, PF00069). Additional putative apple resistance genes were selected using BLAST and Arabidopsis proteins as reference sequences, based on a 60% similarity threshold.
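As an illustration of the profile-based selection step, the following minimal Python sketch extracts candidate RGAs from the tabular per-target output that HMMER can write with the `--tblout` option. The E-value cut-off, the function name, and the protein identifiers in the sample text are hypothetical; the text does not report the threshold or the exact commands that were used.

```python
E_VALUE_CUTOFF = 1e-5  # hypothetical threshold; not stated in the text


def parse_tblout(text, profile_accession="PF00931", max_evalue=E_VALUE_CUTOFF):
    """Return {protein_id: best full-sequence E-value} for hits to one profile.

    Assumes the whitespace-delimited per-target table layout, in which the
    first five columns are target name, target accession, query name,
    query accession, and full-sequence E-value.
    """
    hits = {}
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        fields = line.split()
        target, query_acc, evalue = fields[0], fields[3], float(fields[4])
        # Profile accessions usually carry a version suffix (e.g. PF00931.22).
        if not query_acc.startswith(profile_accession):
            continue
        if evalue <= max_evalue:
            hits[target] = min(evalue, hits.get(target, evalue))
    return hits


# Invented sample output with hypothetical protein identifiers.
SAMPLE = """\
# target name  accession  query name  accession  E-value  score  bias
prot_example_1 -          NB-ARC      PF00931.22 1.2e-60  200.1  0.0  1.5e-60 199.8 0.0 1.0 1 0 0 1 1 1 1 -
prot_example_2 -          NB-ARC      PF00931.22 0.5      10.2   0.1  0.7     9.8   0.1 1.0 1 0 0 1 1 1 0 -
prot_example_3 -          Pkinase     PF00069.25 3.0e-40  140.0  0.0  3.5e-40 139.8 0.0 1.0 1 0 0 1 1 1 1 -
"""

candidates = parse_tblout(SAMPLE)
print(sorted(candidates))
```

Only the first sample protein passes both filters: the second has an E-value above the cut-off, and the third matches a different profile.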

Identification of RGA Clusters in the Apple Genome
The Arabidopsis definition of RGA cluster [4] was adopted: two or more RGAs in a cluster should be located within an average of 250 Kb and should not be interrupted by more than 21 open reading frames different from RGAs, as previously adopted for grapevine RGA clusters [2].
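The cluster definition can be sketched in Python as follows. This is a simplified reading, not the procedure actually used: the "average of 250 Kb" criterion is interpreted here as a maximum distance of 250 kb between the start positions of consecutive RGAs, and the `Gene` record, the function name, and all identifiers are assumptions introduced for illustration.

```python
from dataclasses import dataclass

MAX_DISTANCE_BP = 250_000   # assumed reading of "within an average of 250 Kb"
MAX_INTERVENING_ORFS = 21   # non-RGA open reading frames allowed between two RGAs


@dataclass(frozen=True)
class Gene:
    name: str
    chrom: str
    start: int     # start coordinate in bp
    is_rga: bool


def find_rga_clusters(genes):
    """Group consecutive RGAs on a chromosome into clusters of two or more."""
    by_chrom = {}
    for gene in genes:
        by_chrom.setdefault(gene.chrom, []).append(gene)

    clusters = []
    for chrom in sorted(by_chrom):
        current = []        # RGAs collected for the cluster being built
        gap_orfs = 0        # non-RGA ORFs seen since the last RGA
        for gene in sorted(by_chrom[chrom], key=lambda g: g.start):
            if not gene.is_rga:
                gap_orfs += 1
                continue
            too_far = current and gene.start - current[-1].start > MAX_DISTANCE_BP
            interrupted = current and gap_orfs > MAX_INTERVENING_ORFS
            if too_far or interrupted:
                if len(current) >= 2:
                    clusters.append(current)
                current = []
            current.append(gene)
            gap_orfs = 0
        if len(current) >= 2:
            clusters.append(current)
    return clusters


# Toy annotation with hypothetical gene names and coordinates.
genes = [
    Gene("rga_a", "chr1", 1_000, True),
    Gene("orf_x", "chr1", 5_000, False),
    Gene("rga_b", "chr1", 100_000, True),
    Gene("rga_c", "chr1", 500_000, True),   # 400 kb from rga_b: not clustered
    Gene("rga_d", "chr2", 10, True),        # single RGA: not a cluster
]
cluster_names = [[g.name for g in cluster] for cluster in find_rga_clusters(genes)]
print(cluster_names)
```

Under these assumptions, the toy annotation yields a single cluster formed by the first two RGAs of the first chromosome.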

Isolation of RGAs from Wild Species
Four pairs of degenerate primers targeting the NBS domain [62,63] were used to amplify RGA sequences from 26 different Malus accessions present in the USDA apple germplasm collection at Geneva (NY, USA) (www.ars-grin.gov/npgs/index.html; Table S2). The species represented by the homologous sequences are listed in Table S2. PCR fragments were cloned in pGEM-T Easy (Promega), and two clones for each fragment were sequenced. Sequences were screened, cleaned, and compared with resistance genes previously identified in Rosaceae and in other Angiosperms. BLAST DNA similarity searches were performed against the RGA sequences of the apple genome using a collection of established RGAs. The RGAs were translated using tBLAST-N. Clones were filtered based on hit quality, because most of the RGA clones encoded between 24 and 40 amino acid residues. Queries having only a single hit below 90% identity were removed, and those with multiple smaller hits were annotated manually. RGA sequences from wild Malus species were submitted to the NCBI database (www.ncbi.nlm.nih.gov) under the accession numbers reported in Table S1.

Phylogenetic Analyses
Public RGA sequences from Rosaceae, P. trichocarpa, and V. vinifera Release 2 were downloaded from GenBank (http://www.ncbi.nlm.nih.gov; Table S1). RGA sequences from wild Malus species were also included (Table S1). Protein sequences of the NBS domain of RGAs from M. domestica were aligned together with NBS sequences of wild Malus species, P. trichocarpa, V. vinifera and with the other Rosaceae species using hidden Markov models with the Sequence Alignment and Modeling Software System (SAM-T2K [64]); the sequences were formatted for analysis with the Phylip phylogenetic inference package [65].
The SEQBOOT tool of the Phylip package was used to generate 500 bootstraps of the data set, and the PROTDIST tool was used to construct 500 bootstrapping distance matrices using the Dayhoff PAM matrix [65]. These matrices were jumbled twice and processed with the FITCH tool to create a phylogenetic tree. A neighbor-joining tree of the 500 bootstraps was also constructed (jumbling the sequence input order twice), and a majority-rule consensus tree was assembled.
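The principle behind the bootstrap step, resampling alignment columns with replacement to obtain replicate data sets, can be shown with a short Python sketch. It illustrates the resampling idea only and does not reproduce the Phylip tools or their file formats; the function name, the random seed, and the toy sequences are assumptions.

```python
import random


def bootstrap_alignment(alignment, n_replicates=500, seed=0):
    """Resample alignment columns with replacement.

    `alignment` maps sequence names to aligned sequences of equal length.
    Each replicate keeps the same names and the same alignment length.
    """
    names = list(alignment)
    length = len(alignment[names[0]])
    if any(len(alignment[name]) != length for name in names):
        raise ValueError("all aligned sequences must have the same length")

    rng = random.Random(seed)  # fixed seed so that the example is repeatable
    replicates = []
    for _ in range(n_replicates):
        # Draw column indices with replacement, then rebuild every sequence.
        columns = [rng.randrange(length) for _ in range(length)]
        replicates.append(
            {name: "".join(alignment[name][c] for c in columns) for name in names}
        )
    return replicates


# Toy protein alignment with invented sequences.
toy_alignment = {
    "seq_1": "GMGGVGKT",
    "seq_2": "GMGGIGKT",
    "seq_3": "GLGGVGKS",
}
replicates = bootstrap_alignment(toy_alignment)
print(len(replicates), replicates[0])
```

Each replicate would then be converted into a distance matrix and a tree, and the resulting trees summarized in a consensus, as described above for PROTDIST, FITCH, and the majority-rule consensus.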

Determination of the Ks Value
Based on a CLUSTALW nucleotide alignment of M. domestica RGA sequences, a total of 302253 Ks values were obtained [66]. The connections between chromosomes were defined on the basis of the number of RGAs and Ks values. A connection between two chromosomes was accepted if at least ten RGAs had a Ks value lower than or equal to the first quartile of 0.25 [34].

Joining lines represent connections between two RGAs among duplicated chromosomes [35] (blue, red, pink, green), among non-duplicated chromosomes (yellow), and within the same chromosome (gray). Each line represents a connection between two RGAs with a Ks value lower than 0.25 [35]. A connection between two chromosomes was accepted if at least ten pairwise comparisons had a Ks value lower than 0.25. (TIF)

Figure S4 Distribution of RGAs among chromosome (Chr) doublets derived from the recent whole genome duplication of apple [34]. Colours of major phylogenetic clades (Figure 1A