Convergent Evolution at the Gametophytic Self-Incompatibility System in Malus and Prunus

S-RNase-based gametophytic self-incompatibility (GSI) has evolved once before the split of the Asteridae and Rosidae. This conclusion is based on the phylogenetic history of the S-RNase that determines pistil specificity. In Rosaceae, molecular characterizations of Prunus species, and species from the tribe Pyreae (i.e., Malus, Pyrus, Sorbus) revealed different numbers of genes determining S-pollen specificity. In Prunus only one pistil and pollen gene determine GSI, while in Pyreae there is one pistil but multiple pollen genes, implying different specificity recognition mechanisms. It is thus conceivable that within Rosaceae the genes involved in GSI in the two lineages are not orthologous but possibly paralogous. To address this hypothesis we characterised the S-RNase lineage and S-pollen lineage genes present in the genomes of five Rosaceae species from three genera: M. × domestica (apple, self-incompatible (SI); tribe Pyreae), P. persica (peach, self-compatible (SC); Amygdaleae), P. mume (mei, SI; Amygdaleae), Fragaria vesca (strawberry, SC; Potentilleae), and F. nipponica (mori-ichigo, SI; Potentilleae). Phylogenetic analyses revealed that the Malus and Prunus S-RNase and S-pollen genes belong to distinct gene lineages, and that only Prunus S-RNase and SFB-lineage genes are present in Fragaria. Thus, S-RNase based GSI system of Malus evolved independently from the ancestral system of Rosaceae. Using expression patterns based on RNA-seq data, the ancestral S-RNase lineage gene is inferred to be expressed in pistils only, while the ancestral S-pollen lineage gene is inferred to be expressed in tissues other than pollen.


Introduction
Self-incompatibility (SI), a genetic barrier to self-fertilisation in which the female reproductive cells discriminate between genetically related and non-related pollen, and reject the former [1], has evolved at least 35 times independently in different flowering plant lineages (see for crossing of crosses between SC and a SI Fragaria species, led to the identification of two unlinked RNase loci, one located in chromosome 1 (called S-locus) and the other in chromosome 6 (called T-locus) [49]. In this system a single active allele at either of the two loci is sufficient to confer self-incompatibility. Thus, self-compatibility implies being homozygous for null alleles at both loci [49]. Such inference depends, however, on the existence of the null alleles at each locus. These null alleles are associated with a lack of RNase activity. Nevertheless, because of the methodology used (isoelectric focusing of the stylar native protein extract and staining for ribonuclease activity), the different number of ribonuclease proteins in the different species may reflect differences in other ribonucleases not involved in GSI. Indeed, ribonucleases not determining GSI can also be expressed in pistil tissues [50].
In this work we test the hypothesis of evolution from paralogs for the Prunus and Malus GSI genes, by performing phylogenetic analyses of the Malus, Prunus and Fragaria T2-RNase and SFBB-SFB genes identified in five Rosaceae genomes. We also use macro and micro synteny data to support our findings. T2 RNase and F-box expression patterns were also characterized in order to understand how difficult it is to evolve the restricted expression pattern shown by S-pistil and S-pollen genes. Inferences on the Fragaria and ancestral Rosaceae S-locus are presented.
SFB, SLFL and SFBB-like genes were identified as described above using M.× domestica SFBB3-beta (BAF47180.1) and P. avium SFB3 (AAT72121.1) as queries for M. × domestica, SFBB3-beta, P. avium SLFL1 (BAG12295.1) and SFB3 as queries for P. persica and P. mume, and SFBB3-beta and SFB3 as queries for F. vesca and F. nipponica. Query sequences were trimmed to eliminate the F-box region, and only matches with e<E-12 were considered. For the M. × domestica, P. persica and F. vesca genes here obtained, the genomic location was obtained using GBROWSE at GDR (http://www.rosaceae.org; [43]) to retrieve the sequence of the region and blast two sequences (bl2seq) to obtain the location of the gene sequence used.

Phylogenetic analyses
Phylogenetic analyses utilized Muscle, ClustalW2, and T-Coffee alignment algorithms as implemented in ADOPS [54]. It should be noted that when ADOPS is used, nucleotide sequences are first translated and then aligned using the amino acid alignment as a guide. Only codons with a support value above 2 were used for phylogenetic reconstruction. Bayesian trees were obtained using MrBayes 3.1.2 [55], as implemented in the ADOPS pipeline. The Generalised Time-Reversible (GTR) model of sequence evolution was implemented in the analyses, allowing for among-site rate variation and a proportion of invariable sites. Third codon positions were allowed to have a gamma distribution shape parameter different from that of first and second codon positions. Two independent runs of 2,000,000 generations with four chains each (one cold and three heated) were carried out. The average standard deviation of split frequencies was always~0.01 and the potential scale reduction factor for every parameter was~1.00, showing that convergence was achieved. Trees were sampled every 100th generation and the first 5000 samples were discarded (burn-in). The remaining trees were used to compute the Bayesian posterior probabilities for each clade of the consensus tree.
Expression analyses of Malus S-RNase, S-RNase lineage, SFB, SFBB, and SFBB-like genes To estimate expression of the Malus S-RNase like and SFB-like genes, we use RNA-seq data from M. fusca (a diploid wild apple species; http://vannocke.hrt.msu.edu/DBI-0922447/RNAseq.html; S. van Nocker, manuscript in preparation; S1 Table). Before assembly, adaptor sequences were removed from raw reads. FASTQC reports were then generated and based on this information the resulting reads were trimmed at both ends. Nucleotide positions with a score lower than 20 were masked (replaced by an N). These analyses were performed using the FASTQ tools implemented in the Galaxy platform [56][57][58]. The resulting high-quality reads were used in the subsequent transcriptome assembly using Trinity with default parameters [59]. FPKM values were estimated using the eXpress software [60] as implemented in Trinity. All contigs were used as queries for tblastn searches using local blast [61]. The query sequences used, except for the S-locus genes, were those of M. domestica (RNase S-lineage 1 MDP0000210735A; Malus RNase S-lineage 2 MDP0000682955; Malus SLFL3-like MDC010871; Malus SLFL-like MDP0000266067 and MDP0000302221; and Malus SFB-like MDP0000890198 and MDP0000393954), since divergence values between M. × domestica and M. fusca are low (~0.02 [33,62]). In contrast, divergence values for S-locus genes are generally above 0.2 [37]. Therefore, query sequences for the S-RNase and SFBB genes were first obtained from M. fusca, whereas query sequences for S-RNase like and SFBB like genes were those previously annotated in M. × domestica. For the S-RNase gene, M. fusca genomic DNA and primers SorbusRNaseF (5'..AAGTTGTTTACGGTTCAC..3') and SorbusRNaseR (5'..TATTCTTTTGGCACTT GA..3') were used for PCR with standard amplification conditions [35 cycles of denaturation at 94°C for 30 seconds, primer annealing at 48°C for 30 s, and primer extension at 72°C for 3 min; [23]]. For the SFBB gene, M. fusca genomic DNA and primers SFBBgenF (5'..AAGTC YCTGATGMGRTTC..3') and SFBBgenR (5'..GTCCATTACCCAYRTYTC..3') were used with the same amplification conditions. For each amplification product, four amplicons were sequenced in order to obtain a consensus sequence. The M. fusca S-RNase and SFBB sequences have been submitted to GenBank (accession numbers KP768248 (M. fusca S28-RNase), KP768249 (M. fusca S-RNase), and KP768250 (M. fusca SFBB)).
The phylogenetic relationship of the Rosaceae T2-RNase sequences here identified, together with previously described Malus, Prunus, Solanaceae, and Plantaginaceae S-RNases, and Fabaceae (Medicago truncatula, and Cicer arietinum) S-lineage genes (S-RNase like genes of selfcompatible species, that are surrounded by F-box Malus SFBB/ Prunus SLFL like sequences, and are expressed in tissues other than pistil; unpublished results) is presented in Fig 1 (see also S1 Fig). The phylogeny supports the inferences that sequences presenting amino acid pattern 4, and/or more than two introns, are not S-lineage genes, since they do not cluster with Malus, Prunus, Solanaceae, and Plantaginaceae S-RNases.
The Malus S-RNase lineage genes defined three groups: Pyreae S-RNases, S-RNase lineage 1, and Malus S-RNase lineage 2 (Fig 1; S1 Fig). The two S-RNases present in the sequenced M. × domestica cultivar, according to blastn results, are S2-RNase, and S3-RNase. Although the sequenced scaffold containing the Malus S3-RNase is not anchored, the S2-RNase is located on Malus chromosome (MC) 17 (Table 1; Fig 2), in agreement with the linkage-based results [66]. Malus S-RNases cluster with Fabaceae S-lineage genes, and not with Prunus S-RNases. Thus, we can conclude that Malus and Prunus S-RNase genes represent two different gene lineages. Four out of five Malus S-RNase lineage 1 putative functional genes are located on MC10 (Table 1; Figs 1 and 2). These four genes have been wrongly annotated due to the incorporation of exons from a different gene into a single one (http://www.rosaceae.org/). Indeed, our transcriptional data (below) supports our inference that they represent two genes, namely one S-RNase lineage gene and one F-box gene. Here we use the gene name and the suffix-A for the S-RNase duplicate and-B for the F-box gene. Malus S-RNase lineage 2 genes are located at MC1 and MC15 (Table 1; Fig 2). These regions represent different linkage groups in the putative Rosaceae ancestral genome (Fig 2).
Prunus S-RNase lineage genes also defined three groups: the Prunus S-RNase gene, the S-RNase lineage 1 (that cluster with Malus S-RNase lineage 1 genes), and a Prunus specific group that includes the P. avium PA1 ([67]; Fig 1; S1 Fig). In P. mume, a self-incompatible species, only one of the S-RNase alleles was obtained. By performing blastn, it is clear that this S-RNase sequence has not been previously identified. It should be noted that, chromosomal location is only available for P. persica, a self-compatible (SC) species. SC in Prunus has been achieved by non-sense mutations mainly at the SFB gene, but also at the S-RNase gene [24], mutations producing low S-RNase transcription levels [68], and mutations at loci unlinked to the S-locus [69,70]. In P. persica we identify the S2-RNase that does not have in frame stop codons. According to the location of the P. persica S2-RNase allele, the S-locus region is at scaffold 6, as reported by Dirlewanger et al. [71]. This scaffold shows synteny to Malus MC2/MC15, MC3/MC11, and MC4/MC12, but not to MC17 [65,72]. Indeed, when the 194 Kb region flanking the Prunus S2-RNase is used as query against the M. x domestica genome two hits are observed with M. x domestica chromosome 4 and chromosome 12 (S2 Fig). It should be noted that, in Malus, because of a whole genome duplication, large syntenic regions are found between two or more chromosomes (Fig 2, [65]). Thus, the S-locus regions in Prunus and Malus  represent different linkage groups in the putative Rosaceae ancestral genome (Fig 2). P. avium PA1 [67] in two of the alignment methods clusters with S-RNase lineage 1 genes. Thus, it may represent a duplication of this gene lineage and not a duplication of the Prunus S-RNase gene [67]. Like in Malus, P. persica ppa024151m S-RNase lineage 1 gene has in its vicinity an Fbox gene (P. persica ppa024694m; S3 Table). This gene is located at Prunus scaffold PG8, a linkage group syntenic with Malus MC10 (and also MC5, and MC3/MC11) [65,72] where Malus S-RNase lineage 1 genes are located. These regions represent the putative Rosaceae ancestral linkage group1 (Fig 2).
There are Fragaria S-RNase lineage genes that cluster with Prunus S-RNases (the F. nipponica gi561805796, F. nipponica gi561674690-gi561985884-gi561957436, and the pseudogene F. vesca scf0513144.1), but none of the Fragaria genes cluster with the Malus S-RNase gene (Fig 1;  S1 Fig). It should be noted that F. nipponica is a SI species [73], that shows three bands in zymograms [49], and F. vesca is a SC species that lacks stylar S-RNase activity [49]. In Fragaria two unlinked RNase loci, the S-locus (FC1) and T-locus (FC6) have been described, although a single active allele at either of the two loci is sufficient to confer self-incompatibility [49]. Based on the ribonuclease zymograms, Fragaria T2-RNases can show high IP, as those at the S-locus (located at FC1; [49]), characteristic of S-RNases [3], but they can also be at the base of the cathodal region, as those at the T-locus (located at FC6; [49]), thus presenting a neutral IP. The F. nipponica gi561805796, and F. nipponica gi561674690-gi561985884-gi561957436 sequences, that cluster with the Prunus S-RNase, encode proteins with an isoelectric point above 9 (Table 1) and thus, in principle, can only represent alleles of the Fragaria S-locus S-RNase gene. Moreover, these sequences show two introns at the same positions as the Prunus S-RNase gene [52]. Nevertheless, the presence of F. vesca scf0513144.1 sequence that shows low levels of divergence with F. nipponica gi561674690-gi561985884-gi561957436 (K s in coding regions is 0.039, after Jukes Cantor correction) suggests that this F. nipponica sequence is not an S-RNase allele, but a S-lineage gene. Indeed, using blastn and Fragaria whole genome shotgun contigs (wgs) Database, this gene is highly conserved in Fragaria (nucleotide identity higher than 98% in F. ananassa (dbj|BATT01112757.1|), F. nubicola (dbj|BATW01064019.1|), and F. orientalis (dbj|BATX01305571.1)). Moreover, F. vesca scf0513144.1 is located at linkage group FC2 (Fig  2), and the Fragaria S-locus has been assigned to FC1 [74]. Therefore, only the F. nipponica gi561805796 gene may represent one of the Fragaria S-locus pistil genes, but unfortunately genomic location of this gene is not available in F. nipponica since the contig where it is located is short (4292 bp). Since F. vesca is a self-compatible species and thus the S-locus region may be deleted or non-functional, and the F. nipponica gi561805796 sequence clusters with Prunus S-RNases, we also performed a blastn search using the 194 Kb Prunus S2-RNase flanking region as query against the F. vesca genome. Two regions were identified showing synteny, one at FC1, and another at FC6, the two regions identified by Bošković et al. [49], as the regions harbouring the S-and T-locus, respectively (S3 Fig). The syntenic region located at FC1 is approximately at the middle of chromosome 1 and the region located at FC6 is approximately located at one end of chromosome 6 near the Pgl1 gene, which is broadly compatible with the locations of the S-and T-loci based on the recombination map shown by Bošković et al. [49]. Therefore, levels of diversity for the F. nipponica gi561805796 gene together with segregation analyses and Plantaginaceae, and Fabaceae S-lineage genes). # indicate sequences with more than two introns; (4) the sequences that show in the putative protein sequence amino acid pattern 4, that is absent in all S-lineage genes [4,7]; the + indicate sequences that present stop codons; the {indicate sequences where gaps were introduced to avoid stop codons.
doi:10.1371/journal.pone.0126138.g001 To identify other Fragaria T2-RNase gene candidates we also looked at levels of nucleotide similarity among Fragaria species for the other T2-RNase-lineage genes at FC1 and FC6. Since a single active allele at either of the two loci is sufficient to confer self-incompatibility [49], both loci must be under positive selection and thus, levels of diversity at each of the T2-RNase genes is expected to be large. In FC1, there is only one S-RNase-lineage gene (Fig 2), namely F. vesca 12961 (Table 1). This gene codes for a putative T2-RNase with a neutral IP (Table 1), and basic ribonucleases have been assigned to FC1. The phylogenetic relationship of this sequence depends on the alignment method used (Fig 1, and S1 Fig). Nevertheless, this gene is highly conserved (nucleotide identity higher than 98%) in F. ananassa (dbj|BATT01682710.1|), F. nubicola (dbj|BATW01027551.1|), and F. orientalis (dbj|BATX01257665.1|). Moreover, the F. vesca 12961 gene is not surrounded by F-box genes showing similarity to either Malus or Prunus S-pollen genes (Fig 2), and thus it is unlikely to be the Fragaria S-RNase S-locus gene. Located in FC6 there are three genes, namely F. vesca 00227, F. vesca 00230, and F. vesca scf0513063.1 (Table 1), that could represent T2-RNases of the T-locus. F. vesca 00227 gene is highly conserved (nucleotide identity higher than 98%) in F. ananassa (dbj|BATT01017701.1|), F. nubicola (dbj| BATW01044818.1|), and F. orientalis (dbj|BATX01013415.1|). For F. vesca 00230 gene, high nucleotide homology (more than 97%) is observed in F. iinumae (dbj|BATU01072984.1|), F. ananassa (dbj|BATT01231756.1|), F. nubicola (dbj|BATW01024410.1|), and F. orientalis (dbj| BATX01105279.1|). Therefore, it is unlikely that these genes represent the Fragaria T2-RNase T-locus gene. For F. vesca scf0513063.1 gene, levels of similarity with other Fragaria species is below 96% (F. iinumae (dbj|BATU01063406.1|), and F. ananassa (dbj|BATT01457356.1|)). When the 5´region of F. iinumae dbj|BATU01063406.1contig (427 bp long) is blasted against the F. vesca genome, only one hit is obtained with F. vesca scf0513063, with 94% homology. Similar result is observed with 3' region of F. ananassa dbj|BATT01457356.1| contig. Therefore, these T2-RNases seem to be orthologous. The F. vesca scf0513063.1 gene can thus, represent the T2-RNase at the T-locus. Indeed, this gene has in its vicinity a F-box gene (Fig 2). Nevertheless, this region is not located on the FC6 region identified by Bošković et al. [49] as being the Tlocus region. Indeed, the T-locus region is near the Pgl1 gene and the F. vesca scf0513063.1 gene is located on the other side of the chromosome. It should also be noted that the F. vesca scf0513063.1 gene is not located in the F. vesca region that shows synteny with the Prunus Slocus flanking region and that is near the Pgl1 gene (see above). None of the sequences here identified as the putative Fragaria S-an T-locus share homology with the S-RNase peptide sequences identified by Bošković et al. [49]. Indeed, these peptide sequences show similarity with the protein encoded by F. vesca 17424 (LG2:9,443,884.. 12,126,193) gene that belongs to the Glo_EDI_BRP_like superfamily.
Since for F. nipponica contigs, chromosomal location is not available, we address also if F. nipponica gi561793890, F. nipponica gi561844698, and F. nipponica gi561877040 T2-RNase sequences could represent the T2-RNases of the T-locus. For both F. nipponica gi561793890 and F. nipponica gi561844698 low levels of divergence were observed with F. vesca 26822 and F. vesca 22609, respectively (Fig 1). These F. vesca genes are located at LG 5 (Table 1). Furthermore, F. nipponica gi561793890 is highly conserved (nucleotide identity higher than 97%) in F. nubicola (dbj|BATW01019403.1|), F. orientalis (dbj|BATX01097770.1|), and F. ananassa (dbj| BATS01008420.1|). High conservation is also observed between F. nipponica gi561844698 Evolution of the GSI System in Rosaceae T2-RNase and sequences of F. orientalis (dbj|BATX01070344.1|), and F. ananassa (dbj| BATT01173691.1|). Thus, it is unlikely that these genes are the Fragaria S-RNase T-locus gene. The F. nipponica gi561877040 sequence is 1394 bp long and the T2-RNase gene is 793 bp long. High nucleotide conservation (99%) is observed only with F. ananassa (dbj|BATT01303160.1|) at the entire region. Such high homology is not expected between different S-RNase alleles, but this F. ananassa individual could share the same S-RNase allele with that of F. nipponica. When the F. nipponica gi561877040 T2-RNase region is used as query against the F. vesca genome using blast, 90% homology is observed with F. vesca scf0512945. This region contains the F. vesca 00227 gene, that in the phylogenetic analyses clusters with F. nipponica gi561877040 T2-RNase (Fig 1). When the same approach is used but the flanking regions of F. nipponica gi561877040 T2-RNase are used as query, high homology (96%) is observed with F. vesca scf0513158_5. A higher homology (99%) is observed with this F. vesca contig, using as query the larger F. ananassa contig (5740 bp long). F. vesca scf0513158_5 is located at LG5. Therefore, there is no evidence for this sequence being the T-locus RNase gene in Fragaria.
In summary, given the close relationship of F. nipponica gi561805796 gene with Prunus S-RNase gene, and the presence of Fabaceae S-lineage genes that cluster with Malus S-RNases, we can conclude that Malus and Prunus S-RNase genes represent two different gene lineages, and that Fragaria is most similar to Prunus. Moreover, the regions where Malus and Prunus Slocus are located are not orthologous. Eleven genes cluster with the Malus SFBB reference genes (Fig 3). Seven SFBB-like genes are from MC17 (the location of the Malus S-locus; Fig 2), but only two (MDC005063 and MDP0000294286) seem to be functional. The other genes are likely non-functional since alignment gaps had to be included to put the sequence back into the right frame, or have in frame stop codons (S2 Table). The exact number of Malus SFBBs is unknown, but more than 10 genes have been described at the S-locus [22,77]. Therefore, S-pollen genes are poorly represented in the Malus genome sequence. It should be noted that MDP0000150027 gene located at MC16 codes for a protein that is identical to SFBBX17 (BAJ11965), that shows linkage with the S9-RNase [22], and thus, it must be erroneously located. The other three SFBB-like genes located at MC2 and MC16 (Fig 2) are likely non-functional since alignment gaps had to be included to put the sequence back into the right frame, or have in frame stop codons (S2 Table). The presence of SFBB-like sequences in regions other than the S-locus implies that phylogenetic analyses alone may lead to the incorrect assignment of SFBB-like genes to the S-locus. Therefore, linkage analyses with the S-RNase gene are needed for the identification of SFBB genes. Malus SFBB genes are closely related to Prunus and Fragaria SLFL-like genes (Fig 3), as persica ppa/ppb), P. mume (P. mume scaffold), F. vesca, and F. nipponica F-box SFBB-and SFB-like genes. In grey are the reference sequences (Prunus SFB, Prunus SLFLs, Pyreae SFBBs, and Petunia SLF genes). The tree was rooted with A. thaliana F-box/kelch-repeat gene (NM111499). Numbers below the branches represent posterior credibility values above 60. The + indicate sequences that present stop codons; described before [19,28,[34][35][36]. Malus SFBB genes are, however, a distinct lineage since there are other Malus SLFL genes that are more closely related to Prunus and Fragaria SLFL genes than Malus SFBB genes. Furthermore, Malus SLFL-like lineage genes are not present in the Malus S-locus region (MC17; Fig 2).

Malus, Prunus and Fragaria SFB, SLFL and SFBB-like genes
Malus SLFL-like genes are a very heterogeneous group of genes (Fig 3) and they are located in nine chromossomes (Fig 2; S2 Table). Nevertheless, the two Malus SLFL-like genes, MDC010871 located at MC4, and MDP0000250455 located at MC12 (Figs 2 and 3) that are orthologs of Prunus SLFL genes located in the vicinity of the S-locus region (SLFL3 and P. persica ppa016207m) are located in syntenic regions. This suggests the presence of SLFL-like genes similar to those of the Prunus S-locus in the ancestral Rosaceae group 9.
Here we report Malus SFB-like sequences for the first time. MDP0000890198, MDC024304, and MDP0000393954 genes, located at MC13, cluster with Prunus and Fragaria SFB-like genes. These Malus SFB-like genes represent, however ancient gene duplications that occurred before the separation of Fragaria and the Malus/Prunus lineage. They could represent the SFBlike genes of the ancestral Rosaceae group 1 (Fig 2; [65,72]).
In the P. persica S-locus region there is a mutated S2-SFB gene (P. persica ppa011628m; a 5 bp sequence is inserted in the S2-SFB gene) that has been reported to cause SC in this species [24]. The P. mume scaffold 57.94 and P. mume scaffold241.2 are the two SFB alleles of the cultivar sequenced (Fig 3), and have not been described before. Prunus SFB gene is clearly a distinct lineage of the Malus SFBB genes (Fig 3). Although there is a SFB-like gene closely related to Prunus SFB gene (represented by P. persica ppa021167m, and P. mume scaffold 57.55 sequences; Fig 3), it is easily distinguished from the SFB gene since it presents low sequence divergence. Furthermore, this gene in P. persica is located in PG3 (S3 Table). In the vicinity of the Prunus S-locus, there are seven SLFL genes: three previously reported-SLFL1 (P. persica ppa021716m), SLFL2 (P. persica ppa025849m), and SLFL3 (P. persica ppa026586m; [12,13]), three unreported SLFLs genes-P. persica ppa019333m, P. persica ppa016317m, and P. persica ppa016207, and one unreported putative SLFL pseudogene (ppb020773m+).
In Fragaria, the number of SLFL-like genes is larger than in Prunus and Malus (Fig 3). About 70% of those are located at FC6 (Fig 2) and are the result of two tandem duplications (Fig 2; [65,72]). Only F. vesca 06873 clusters with Prunus SFB, and SFB-like P. persica ppa021167m, and P. mume scaffold 57.55 genes (Fig 3). Although F. vesca 06873 is located at FC6 (that is syntenic to PG6, where the Prunus S-locus is located; Fig 2, and S3 Fig), it shows 98% nucleotide identity with sequences from other Fragaria species (F. nubicola BATW01053320.1; F. iinumae BATU01056017.1). Thus, F. vesca 06873 pseudogene represents a SFB-like gene, and not a non functional S-pollen gene. None of the F. vesca F-box SFBB-SFB-SLFL-like genes located in FC1 region cluster with Prunus or Malus S-pollen genes, as observed for the T2-RNases.
In conclusion, the evidence here presented implies that Prunus SFB gene is from a distinct phylogenetic clade of the Malus SFBB genes. Furthermore Malus SFBB genes are a sister clade of Malus and Prunus SLFL genes. Although no putative Fragaria S-pollen has been identified, the presence of SLFL-like genes orthologous of the Prunus SLFL genes located in the vicinity of the Prunus S-locus, at FC6, suggests a similar organization to that of Prunus S-locus for the Fragaria T-locus. This region is the same that is identified when using the 194 Kb Prunus region flanking the S2-RNase (see above). Furthermore, we found no evidence for Malus SFBB lineage genes in Fragaria. Malus and Prunus expression analyses of the S-RNase, S-RNase lineage, SFB, SFB-like, SFBB, and SLFL-like genes In order to understand how difficult it is to evolve the restricted expression pattern shown by S-pistil and S-pollen genes, inferences must be made on the expression of the ancestral S-RNase lineage and S-pollen lineage genes. We addressed the expression of these lineage genes using RNA-seq expression data. In Malus we analysed 17 tissues derived from developmental transitions of the wild apple M. fusca (S1 Table), and in Prunus, seven tissues from two P. mume cultivars (Material and Methods).
Malus S-RNase gene is most highly represented in whole pistils one week prior to anthesis, in stigmas and styles of flowers at anthesis (Fig 4A), as described before [78], but they also show low expression in entire flower buds. Malus S-RNase lineage 2 gene, shows a similar pattern of expression to the S-RNase gene (maximum expression in pistils one week prior to anthesis, much less in stigma of flowers at anthesis, and entire flowers buds; Fig 4B), but levels of expression are about seven times lower than for the S-RNase. In contrast, S-RNase lineage 1 genes show maximum expression in seeds, and moderate expression in embryos and ovary ( Fig 4B). Prunus S-RNases, S-RNase lineage 1, and PA1genes are also expressed in the pistil and buds (Fig 5A and 5B). Therefore, the ancestral S-RNase gene is inferred to show expression mainly in pistils but also, at lower amounts, in stigmas, styles of flowers at anthesis and flowers buds.
Expression of SFBB genes is reported to be pollen-specific [21,22,26,27,76]. Accordingly, we found expression in anthers at anthesis (Fig 4C). It should be noted that SFBB expression is 5000 times lower than that of the S-RNase gene. SLFL-like genes are a very large group of genes, and thus, they show different patterns of expression. They can show maximum expression in anthers and pollen such as MDC010871 (Fig 4D), and MDP0000266067 (Fig 4E), but can be also expressed in a few other tissues (pistil, style stamen and filaments) such as MDC010871 (Fig 4D), or they can show moderate expression in most of the tissues analysed, such as the MDP0000266067 gene (Fig 4E). They can even show no expression in anthers and pollen such as MDP0000302221 (Fig 4E), but show low expression in most of the tissues analysed. Prunus SLFL genes are expressed in pollen (Fig 5E), as described before [12,13], in flowering buds, but also, very little, in the pistil. Expression in pollen and buds are also observed for other Prunus SLFL-like genes ( Fig 5F). Thus, from this data it is not possible to infer the expression of the ancestral SFBB gene.
Although Prunus SFB gene has been reported as highly expressed in pollen and anthers [12], it shows 70 times less expression than the Prunus S-RNase gene (Fig 5C). This gene is also expressed in buds. Similar expression pattern is observed for the closely related SFB-like P. mume scaffold 57.55. More divergent SFB-like genes have a different expression pattern (see for instance P. mume scaffold 241.9, (Fig 5D) and Malus MDP0000890198 (Fig 4F) genes). We can infer that the ancestral Prunus SFB gene would show a similar expression to that of Prunus SFB.

Discussion
S-RNase based GSI, according to the S-RNase gene has evolved about 120 MY ago, and thus is expected to be shared among distantly related species. Our results show that, in Malus and Prunus the S-RNase and the S-pollen genes have evolved from paralogous genes. The Malus S-RNase and SFBB gene lineages are not present in Prunus, and Fragaria. The presence of the F. nipponica gi561805796 Prunus S-RNase lineage gene suggests that Fragaria and Prunus share a common ancestral S-locus region. The location of this gene is unknown, but given its basic IP, it could represent the Fragaria S-locus. The location of this gene must be however Evolution of the GSI System in Rosaceae confirmed. The presence of SLFL-like genes in Fragaria FC6 that cluster with Prunus SLFLlike genes surrounding the S-locus region, as well as the conservation between the Prunus Slocus flanking regions and this FC6 region, further suggests that the T-locus could show a similar organization to that of Prunus S-locus. This region is located near the Pgl1 gene and thus corresponds to the T-locus identified by Bošković et al. [49]. The F. vesca scf0513063.1 gene identified as a T2-RNase T-locus candidate gene, is, however, not located in the region identified by Bošković et al. [49] as being the T-locus, and thus should be treated with caution.
The hypothesis that Fragaria and Prunus share a common ancestral self-incompatibility locus organization is unexpected because of the similarities between the Petunia and Malus S-RNase based GSI mechanism of pollen recognition by multiple S-pollen genes [19][20][21][22][23]26,28,29,76]. Thus, in Fragaria that is an out group to Prunus and Malus, the S-locus genes were expected to belong to the Malus S-lineage. It could be that in the ancestral Rosaceae, there were three loci determining GSI, the S and T Fragaria loci and the Malus S-locus. Under this scenario, different GSI loci were retained and lost in different Rosaceae SI species.
One alternative hypothesis is that the Malus S-locus region could have evolved de novo from a partial duplication of the ancestral Fragaria/Prunus-like S-locus region. Under this hypothesis the S-RNase duplicate gene is the Malus pistil component and duplicates of the ancestral SLFL-like genes (Malus SFBB genes) are now the S-pollen genes. It is likely that the ancestral SLFL-like genes were expressed in pollen, but also in other tissues and thus, they could have evolved an expression restricted to pollen and anthers. Further genomic data from other SI Rosaceae species, as well as a better assembly for Fragaria SI species, is needed to distinguish the two hypotheses.
The molecular characterization of S-RNase based GSI often starts with a search for S-RNase lineage genes, since they are just a few. Moreover, S-RNase lineage genes can be easily distinguished from other T2-RNases by looking at the sequences of the proteins they encode. Indeed, Vieira et al. [7] have shown that amino acid patterns 1 and 2 are exclusively found in proteins encoded by S-lineage genes, while pattern 4 is not found in any protein encoded by S-lineage genes. Moreover, in contrast to other T2-RNases, S-RNase lineage sequences have just one or two introns [5] and encode proteins with a basic isoelectric point [3]. Nevertheless, phylogenetic analyses of the S-RNase lineage genes alone is not enough to identify the S-RNase gene. For instance, in Coffea (Rubiaceae) there are at least three distinct S-RNase lineage genes [4]. The secondary evolution of GSI from paralogous regions further complicates the identification of the S-RNase gene. Thus, identifying the GSI biochemical components in non characterised eudicot species may be more difficult than anticipated. The expression pattern is not enough either, since, as we here show, there are S-RNase duplicates with an expression pattern identical to that of the S-RNase gene. Therefore, besides phylogenetic and expression analyses, evidence for high polymorphism levels, positively selected amino acid sites, as well as segregation analyses in controlled crosses are needed to identify the S-RNase. It should be noted that in the Pyreae S-locus region there are likely also genes that perform functions unrelated to selfincompatibility [21,27]. Such genes might contribute to other phenotypes of agronomical interest. In Rosaceae species where the S-locus is large, as in Pyreae, variants at these genes will cosegregate with GSI specificities [79]. It is thus, important to determine whether the S-locus structure of most Rosaceae species is similar to that present in Pyreae.
In conclusion, S-RNase based GSI may evolve multiple times from S-locus paralogous regions, as it happens in Rosaceae. In Brassicaceae, a duplication event of the S-locus region and recruitment of the paralogous genes of the ancestral SSI Arabidopsis and Brassica S-locus genes that determine SI has been described also in Leavenworthia [39]. Thus, multiple independent recruitment of SI genes from the same gene families may be an unexpected but common evolutionary process in plant SI systems.