Elucidating the contribution of wild related species on autochthonous pear germplasm: A case study from Mount Etna

The pear (genus Pyrus) is one of the most ancient and widely cultivated tree fruit crops in temperate climates. The Mount Etna area claims a large number of pear varieties, differentiated by a long history of cultivation and by environmental variability, making this area particularly suitable for genetic studies. Ninety-five pear individuals were genotyped using simple sequence repeat (SSR) markers interrogating both nuclear (nDNA) and chloroplast DNA (cpDNA), to combine the maternal inheritance of chloroplast SSRs (cpSSRs) with the high informativity of nuclear SSRs (nSSRs). The germplasm was selected ad hoc to include wild genotypes, local varieties, and national and international cultivated varieties. The objectives of this study were as follows: (i) to estimate the level of differentiation within local varieties; (ii) to elucidate the phylogenetic relationships between the cultivated genotypes and wild accessions; and (iii) to estimate the potential gene flow and the relationships among the germplasms in our analysis. Eight nSSRs detected a total of 136 alleles with an average major allele frequency and observed heterozygosity of 0.29 and 0.65, respectively, whereas cpSSRs allowed identification of eight haplotypes (S4 Table). These results shed light on the genetic relatedness between Italian varieties and wild genotypes. Among the wild species, a few P. pyraster genotypes exhibited higher genetic similarity to local pear varieties than P. amygdaliformis did. Our analysis revealed genetic stratification: a ‘wild’ subpopulation characterizes the genetic makeup of the wild species, whereas the internationally cultivated varieties exhibit a predominance of the ‘cultivated’ subpopulation.


Introduction
The pear (Pyrus spp.) is one of the most cultivated fruit crops in temperate zones. Pyrus species are traditionally divided into two groups based on domestication area and geographic distribution. European pears (P. communis) are cultivated mainly in Europe and the U.S., and Asian long juvenile period and large size of the plants (requiring great time and space investments) as well as the genetic complexity of Pyrus resulting from the self-incompatibility of the genus. The use of molecular markers could have a direct positive implication for the genetic characterization of the germplasm collection, laying a foundation for use of genetic polymorphisms to make predictions of phenotype changes through marker-trait association analysis.
In the present study, nuclear (nSSR) and chloroplast (cpSSR) microsatellites were used (i) to estimate the level of differentiation within the cultivated genotypes and the wild accessions; (ii) to elucidate phylogenetic relationships between the cultivated genotypes and the wild accessions; and (iii) to estimate the potential gene flow and the relationships among local Sicilian pear genotypes, native wild species and international varieties.

Plant material and DNA extraction
Ninety-five pear genotypes were used in this study (Table 1), including 46 local varieties (LV) and 21 wild related species (RS) collected from the Etna district (Italy), 19 nationally cultivated varieties (NCV) and 9 internationally cultivated varieties (ICV) (Fig 1). Genotypes were sampled from different sites as specified in S1 Table. Genomic DNA was extracted from fresh leaves using ISOLATE II Plant DNA Kits (Bioline, Meridian Life Science, Memphis, TN, USA). The quantity and quality of the extracted DNA were determined using a Nanodrop 2000 spectrophotometer (Thermo Scientific, Waltham, MA, USA) and agarose gel electrophoresis. DNA samples were stored at -20˚C.

SSR analysis by capillary electrophoresis
PCR amplification was performed using four chloroplast SSR primer pairs derived from the pear genome and eight nuclear SSR primer pairs derived from the pear and apple genomes (Table 2). PCR reactions were each performed in a 15-μl volume containing 40 ng genomic DNA, 1x PCR buffer II, 2 mM magnesium chloride, 0.2 mM dNTPs, 0.3 μM each primer, 0.13 μM 5'-fluorescently labelled M13F primer (CAC GAC GTT GTA AAA CGA C) tagged with 6-FAM, NED, VIC or PET and 1 U of MyTaq DNA polymerase (Bioline). Amplifications were conducted using a programme with an initial denaturation step at 95˚C for 15 min followed by 35 cycles at 95˚C for 30 sec, 52-55˚C for 30 sec and 72˚C for 45 min with a final cycle of 72˚C for 15 min. A 0.4- to 0.6-μl aliquot of PCR product (depending on the amplification performance of each primer pair) was mixed with 13 μl of formamide and 0.3 μl of LIZ-500 size standard and denatured at 95˚C for 5 min. Up to four PCR products labelled with 6-FAM, PET, VIC or NED were pooled before separation in the ABI 3130 Genetic Analyser (Applied Biosystems, Foster City, CA, USA) and subjected to subsequent analysis using GeneMapper 4.0 software.

Genetic distance and clustering
Genetic distances were estimated from the allelic data using the simple allele-matching dissimilarity index, yielding a genetic dissimilarity matrix. Dendrograms were obtained using Dissimilarity Analysis and Representation for Windows software version 5.0 (DARwin5) by the neighbour-joining method [41]. The robustness of branches was tested using 1,000 bootstraps.
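As an illustration of the dissimilarity step, the following is a minimal sketch of one common formulation of simple allele matching: for each locus, the number of shared alleles is divided by the ploidy, the result is averaged over loci, and the average is subtracted from one. It is an assumption that this mirrors the DARwin5 computation in every detail; the accession names and allele sizes are hypothetical toy values, not data from this study.

```python
from itertools import combinations


def simple_matching_dissimilarity(geno_a, geno_b, ploidy=2):
    """Dissimilarity between two multilocus genotypes.

    Each genotype is a list with one tuple of allele sizes per locus,
    e.g. [(120, 124), (195, 195)]. Assumed formula (not taken from the
    paper): d = 1 - (1/L) * sum over loci of (matching alleles / ploidy).
    """
    if len(geno_a) != len(geno_b):
        raise ValueError("genotypes must cover the same loci")
    total = 0.0
    for alleles_a, alleles_b in zip(geno_a, geno_b):
        remaining = list(alleles_b)
        matches = 0
        for allele in alleles_a:
            # Each allele of b can be matched only once.
            if allele in remaining:
                remaining.remove(allele)
                matches += 1
        total += matches / ploidy
    return 1.0 - total / len(geno_a)


def dissimilarity_matrix(genotypes):
    """Symmetric matrix (nested dict) of pairwise dissimilarities."""
    names = sorted(genotypes)
    matrix = {name: {name: 0.0} for name in names}
    for a, b in combinations(names, 2):
        d = simple_matching_dissimilarity(genotypes[a], genotypes[b])
        matrix[a][b] = matrix[b][a] = d
    return matrix


# Hypothetical toy genotypes at two loci.
toy = {
    "acc_A": [(120, 124), (195, 195)],
    "acc_B": [(120, 124), (195, 195)],
    "acc_C": [(120, 130), (194, 194)],
}
matrix = dissimilarity_matrix(toy)
```

A matrix of this kind is the input that a neighbour-joining routine would take; identical profiles, such as the two first toy accessions, have a dissimilarity of zero.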
The number of genotypes, the number of alleles, the major allele frequency (MAF), the expected heterozygosity (exp-het), the observed heterozygosity (obs-het) and the polymorphism information content (PIC value) for each SSR marker were calculated using PowerMarker [42]. The pairwise fixation index (FST) was calculated using GenePop software [43]. The level of genetic stratification within the germplasms in the analysis was assessed using STRUCTURE v.2.3.1 [44]. This analysis was performed on 73 genotypes, excluding those genotypes for which a third allele was observed for one or more loci. Eight nSSRs were used to compute the posterior probability [Pr(X|K)] given an increasing number of sub-populations (ranging from K = 1 to K = 8). The computation was performed with five independent runs for each K using a 'Length of Burnin Period' and 'Number of MCMC Reps after Burnin' of 1,000,000 under the admixture model. The most likely number of subpopulations (K) was identified with STRUCTURE HARVESTER [45] using the ΔK described by Evanno et al. [46]. Samples were assigned to a sub-population when the assignment probability (qI) was greater than or equal to 0.8 [47-49]. Principal component analysis (PCA) was performed using the 'stats' package in R (R Development Core Team), whereas median-joining network analyses were performed using Network 4.6.1.5 ([50]; http://www.fluxus-engineering.com/sharenet.htm) with default settings.
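The per-marker summary statistics listed above can be sketched as follows for a single diploid locus. This is an illustration under stated assumptions, not the PowerMarker implementation: expected heterozygosity is computed as one minus the sum of squared allele frequencies without any small-sample correction, and PIC follows the widely used formula of Botstein and colleagues. The function name and the toy genotypes are hypothetical.

```python
from collections import Counter


def locus_statistics(genotypes):
    """Summary statistics for one diploid SSR locus.

    genotypes: list of (allele1, allele2) tuples of allele sizes in bp.
    """
    alleles = [allele for genotype in genotypes for allele in genotype]
    counts = Counter(alleles)
    n = len(alleles)
    freqs = [count / n for count in counts.values()]

    sum_sq = sum(p * p for p in freqs)
    # Expected heterozygosity (gene diversity), uncorrected.
    exp_het = 1.0 - sum_sq
    # Observed heterozygosity: share of individuals with two different alleles.
    obs_het = sum(1 for a, b in genotypes if a != b) / len(genotypes)
    # PIC = 1 - sum(p_i^2) - sum_{i<j} 2 * p_i^2 * p_j^2
    cross = 0.0
    for i in range(len(freqs)):
        for j in range(i + 1, len(freqs)):
            cross += 2.0 * freqs[i] ** 2 * freqs[j] ** 2
    pic = exp_het - cross

    return {
        "n_alleles": len(counts),
        "n_genotypes": len({tuple(sorted(g)) for g in genotypes}),
        "maf": max(freqs),  # major allele frequency
        "exp_het": exp_het,
        "obs_het": obs_het,
        "pic": pic,
    }


# Hypothetical toy data: four individuals at one locus.
stats = locus_statistics([(100, 100), (100, 104), (104, 108), (100, 104)])
```

Here MAF denotes the frequency of the most common allele, which is why values above 0.5 are possible for markers with few alleles.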

Results
Capillary electrophoresis analysis produced clear profiles for all four cpSSR and eight nSSR loci for 91 pear genotypes. In contrast, two local ('Savino' and 'Angelico') and two wild (P. pyraster n. 6 and P. amygdaliformis n. 10) genotypes exhibited no PCR amplification and were excluded from further analysis.
Nuclear SSR markers allowed the identification of nineteen individuals exhibiting three alleles in at least one of the nSSRs (data not shown).
The nSSRs detected a total of 136 alleles with sizes ranging from 115 to 256 bp, with average values of 17 and 0.29 for the number of alleles and the MAF, respectively (Table 3). The mean value of the exp-het was 0.82, whereas the obs-het was 0.65. All eight nSSRs were highly polymorphic, with PIC values ranging from 0.42 to 0.92. The most polymorphic markers were TsuENH026, with a total of 21 alleles and 48 genotypes detected, obs-het of 0.74 and PIC of 0.92, and BGT23b, with a total of 25 alleles and 43 genotypes, obs-het of 0.55 and PIC of 0.91. The least informative nSSR was CH04e03, with an obs-het and PIC of 0.18 and 0.42, respectively.
The cpSSR analysis detected a total of 11 alleles with an average value of 2.75 alleles and sizes ranging from 182 to 216 bp (Table 3). The chloroplast marker PCHSSR27 was monomorphic, detecting an allele of 195 bp. In contrast, the highest number of alleles (5) was detected for the PCHSSR3 marker. The MAF ranged from 0.69 (PCHSSR19) to 0.92 (PCHSSR31), and PIC values ranged from 0.13 (PCHSSR31) to 0.39 (PCHSSR3).
Overall, cpSSRs and nSSRs discriminated 81 of the 91 analysed genotypes, detecting a total of 147 alleles with an average value of 12.25 and an average MAF of 0.47. The mean values of exp-het, obs-het and PIC were 0.63, 0.43 and 0.61, respectively. The genetic relationship among analysed genotypes is presented in the neighbour-joining dendrogram constructed using both nSSR and cpSSR data (Fig 2). The cluster analysis identi-
The 73 genotypes exhibiting one or two alleles for each locus were also included in a population stratification analysis. Unlike the results of the neighbour-joining analysis, the population stratification analysis identified two sub-populations (K = 2). This analysis was performed following a plateau criterion [51], a non-parametric Wilcoxon test [52], and the rate of change (ΔK) method proposed by Evanno et al. [46] (S2 Table). STRUCTURE analysis allowed the identification of two groups that will be henceforth named 'wild' and 'cultivated' sub-populations (Fig 3, S3 Table). Forty-two samples were characterized by a predominant (QI > 0.8) 'wild' genetic configuration, whereas twenty-two exhibited a predominance of the 'cultivated' sub-population. The remaining nine samples exhibited QI values less than 0.8 for both subpopulations and were therefore considered 'admixed' (S3 Table).
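The ΔK criterion used to choose K can be illustrated with a short sketch. For each K that has both neighbours, the absolute second-order difference of the mean log-likelihoods, |L(K+1) - 2L(K) + L(K-1)|, is divided by the standard deviation of L(K) across replicate runs. Computing the difference from run means is an assumption about how the summarizing tool behaves, and the log-likelihood values below are hypothetical (three runs per K instead of the five used in this study).

```python
from statistics import mean, stdev


def evanno_delta_k(lnp_by_k):
    """Delta K for each K that has both K-1 and K+1 available.

    lnp_by_k: dict mapping K to a list of Ln P(X|K) values,
    one value per replicate run.
    """
    delta = {}
    for k in sorted(lnp_by_k):
        if k - 1 not in lnp_by_k or k + 1 not in lnp_by_k:
            continue  # second-order difference is undefined at the ends
        sd = stdev(lnp_by_k[k])
        if sd == 0:
            continue  # avoid division by zero when replicate runs are identical
        second_diff = abs(
            mean(lnp_by_k[k + 1]) - 2 * mean(lnp_by_k[k]) + mean(lnp_by_k[k - 1])
        )
        delta[k] = second_diff / sd
    return delta


# Hypothetical log-likelihoods, for illustration only.
toy_runs = {
    1: [-3000, -3001, -2999],
    2: [-2800, -2802, -2798],
    3: [-2790, -2795, -2785],
    4: [-2785, -2790, -2780],
}
delta_k = evanno_delta_k(toy_runs)
best_k = max(delta_k, key=delta_k.get)
```

Because the statistic needs both K - 1 and K + 1, it cannot be evaluated at the smallest or largest K tested, which is one reason it is usually read alongside other criteria such as the plateau of the likelihood.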
Within the four groups of pears (Table 1), individuals exhibited different relative frequencies of the 'wild' and 'cultivated' sub-populations. In particular, RS and ICV are mostly characterized by 'wild' and 'cultivated' genetic configurations, respectively, whereas LV and NCV presented a more balanced presence of both sub-populations (S3 Table). The LV group is characterized by a high relative contribution of the 'wild' genetic configuration. In total, 54% of the cultivars within this group exhibited a predominance of the 'wild' subpopulation, whereas a notable proportion of individuals (32%) exhibited a clear predominance of the 'cultivated' subpopulation. The same pattern was observed in the NCV group, with nine individuals (60%) exhibiting a strong relative contribution of the 'wild' subpopulation, five individuals (33%) exhibiting the opposite trend in favour of the 'cultivated' subpopulation, and the remaining sample, 'Gentile', exhibiting a more balanced admixture between the two sub-populations (S3 Table).
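The assignment rule behind these proportions can be sketched as follows: a sample is attributed to the sub-population with the highest membership coefficient if that coefficient reaches the 0.8 threshold, and is otherwise labelled 'admixed'. The sample names and membership coefficients below are hypothetical, and the helper names are illustrative only.

```python
from collections import Counter, defaultdict


def assign_subpopulation(memberships, threshold=0.8):
    """Return the dominant sub-population label, or 'admixed'.

    memberships: dict mapping sub-population name to membership
    coefficient (the coefficients of one sample sum to 1).
    """
    label, value = max(memberships.items(), key=lambda item: item[1])
    return label if value >= threshold else "admixed"


def tally_by_group(samples):
    """Count assignments within each germplasm group.

    samples: list of (group, memberships) pairs.
    """
    tallies = defaultdict(Counter)
    for group, memberships in samples:
        tallies[group][assign_subpopulation(memberships)] += 1
    return {group: dict(counter) for group, counter in tallies.items()}


# Hypothetical membership coefficients, for illustration only.
toy_samples = [
    ("RS", {"wild": 0.97, "cultivated": 0.03}),
    ("ICV", {"wild": 0.05, "cultivated": 0.95}),
    ("LV", {"wild": 0.90, "cultivated": 0.10}),
    ("LV", {"wild": 0.55, "cultivated": 0.45}),
]
tallies = tally_by_group(toy_samples)
```

Dividing each count by the group size gives percentages of the kind reported for the LV and NCV groups.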
The distinction between 'wild' and 'cultivated' subpopulations was further confirmed by the analysis of the fixation index (FST), a summary statistic quantifying the variation in allelic frequencies between groups. The FST between these two subpopulations was 0.096, whereas the pairwise FST estimates between 'admixed' and 'wild' or 'admixed' and 'cultivated' exhibited considerably reduced values (0.028 and 0.026, respectively).
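To make the meaning of the fixation index concrete, the sketch below computes a simple heterozygosity-based FST, (HT - HS) / HT, for two groups from their allele frequencies. This is a deliberately simplified Nei-style estimate with equal weighting of the two groups; it is not the estimator implemented in GenePop, and the allele names and frequencies are hypothetical.

```python
def expected_heterozygosity(freqs):
    """1 - sum of squared allele frequencies for one locus."""
    return 1.0 - sum(p * p for p in freqs.values())


def nei_fst(pop_a, pop_b):
    """Heterozygosity-based FST between two groups.

    pop_a, pop_b: lists with one dict per locus, each mapping an
    allele to its frequency in that group. Loci must be in the
    same order in both lists.
    """
    hs_sum = 0.0  # mean within-group heterozygosity, summed over loci
    ht_sum = 0.0  # heterozygosity of the pooled frequencies, summed over loci
    for freqs_a, freqs_b in zip(pop_a, pop_b):
        alleles = set(freqs_a) | set(freqs_b)
        pooled = {
            allele: (freqs_a.get(allele, 0.0) + freqs_b.get(allele, 0.0)) / 2
            for allele in alleles
        }
        hs_sum += (
            expected_heterozygosity(freqs_a) + expected_heterozygosity(freqs_b)
        ) / 2
        ht_sum += expected_heterozygosity(pooled)
    if ht_sum == 0:
        raise ValueError("all loci are monomorphic; FST is undefined")
    return (ht_sum - hs_sum) / ht_sum


# Hypothetical allele frequencies at a single locus.
group_1 = [{"a": 0.8, "b": 0.2}]
group_2 = [{"a": 0.4, "b": 0.6}]
fst = nei_fst(group_1, group_2)
```

Values close to zero indicate that the two groups share similar allele frequencies, whereas larger values indicate stronger differentiation.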
To examine the presence of additional genetic stratification, the germplasm collection was divided into two subsets based on the two sub-populations detected ('wild' and 'cultivated'), and an additional round of structure analysis was separately performed on each of the two subsets following the approach presented by Urrestarazu and colleagues (2012) [49].
This nested structure analysis allowed for better characterization of each sub-population. The 'wild' subpopulation (S3 Table) exhibited the highest ΔK for K = 5 (43.1) although a secondary peak was detected for K = 2 (21.2, S1 Fig). For K = 2, all the P. pyraster and P. amygdaliformis accessions were assigned to the same subpopulation. In contrast, at K = 5, the two species were assigned to different sub-populations ('pink' for P. pyraster and 'yellow' for P. amygdaliformis, S2 Fig).
Related wild species influence the pear cultivars origin on Mount Etna
occurring at PCHSSR27, in which one cultivar ('Kaiser') exhibiting an allelic size of 194 bp can be considered the origin of the median-joining network, whereas all other individuals exhibit an allelic length of 195 bp (S1 Table). Hap 1, 3, 5, 6 and 8 were further distinguished by different allelic sizes at PCHSSR3, whereas Hap2 originated from Hap1, exhibiting an additional mutation at PCHSSR19. Hap 4 and 7 contain 2 different mutations at PCHSSR31 compared with Hap 3.
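The way chloroplast haplotypes are defined from cpSSR data can be sketched as follows: accessions sharing the same combination of allele sizes across all cpSSR loci form one haplotype, and two haplotypes can be compared by counting the loci at which they differ. The locus names are those used in this study, whereas the accession names and all allele sizes are hypothetical toy values; counting differing loci is a simplification of the mutational steps inferred by a median-joining network.

```python
from collections import defaultdict

CP_LOCI = ("PCHSSR3", "PCHSSR19", "PCHSSR27", "PCHSSR31")


def group_haplotypes(cp_profiles, loci=CP_LOCI):
    """Group accessions by their combined cpSSR allele sizes.

    cp_profiles: dict mapping accession name to a dict of
    locus -> allele size in bp (one allele per locus, since the
    chloroplast genome is haploid).
    """
    groups = defaultdict(list)
    for name, profile in cp_profiles.items():
        haplotype = tuple(profile[locus] for locus in loci)
        groups[haplotype].append(name)
    return dict(groups)


def differing_loci(hap_a, hap_b, loci=CP_LOCI):
    """Names of the loci at which two haplotypes carry different alleles."""
    return [locus for locus, a, b in zip(loci, hap_a, hap_b) if a != b]


# Hypothetical profiles, for illustration only.
toy_profiles = {
    "acc_1": {"PCHSSR3": 200, "PCHSSR19": 185, "PCHSSR27": 195, "PCHSSR31": 210},
    "acc_2": {"PCHSSR3": 200, "PCHSSR19": 185, "PCHSSR27": 195, "PCHSSR31": 210},
    "acc_3": {"PCHSSR3": 200, "PCHSSR19": 186, "PCHSSR27": 195, "PCHSSR31": 210},
}
haplotypes = group_haplotypes(toy_profiles)
```

In this toy example the first two accessions share one haplotype and the third differs from it at a single locus, analogous to the relationship described between Hap1 and Hap2.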
Both nSSR and cpSSR were employed for PCA. The first two principal components (PC1 and PC2) accounted for 14.5% of total genotypic variability, allowing a distinction of cultivars  Table 1 presented a characteristic pattern on the biplot presented in Fig 5. International cultivars (ICV) were plotted on the positive PC1 quadrant, whereas all wild species (RS) exhibited negative PC1 values with the exception of P. pyraster 10 (plotted on the positive PC1 quadrant). Local cultivars (LV) and national cultivars (NCV) exhibited a similar pattern characterized by a unimodal distribution of the samples centred in the range between -2 and +2 PC1 values. PC2 allowed a distinction within the RS, NCV and LV groups. The RS group is composed of individuals belonging to P. amygdaliformis and P. pyraster species. Extreme negative values indicated P. amygdaliformis individuals, whereas values of PC2 greater than -3.5 indicated the presence of P. pyraster. Similarly, NCV and LV can be further differentiated according to PC2, whereas PC3 can efficiently discriminate within the ICV group (data not shown).

Discussion
The Mount Etna area claims a large number of differentiated pear varieties due to ancient cultivation practices and the variability of soil and climatic conditions. Local varieties display significant variability in many agronomic traits, including fruit size, flowering and ripening periods and harvesting time [53]. Local germplasm could represent an important source of genetic diversity that can be readily used by breeders to develop novel cultivars with enhanced agronomic traits, including fruit quality, adaptability to limiting environmental factors and resistance to biotic stresses. The Pyrus genus includes important cultivated species that have been widely studied by means of different molecular markers, mostly nuclear markers. Polyploidy has occasionally been observed in several Pyrus species [33]. In this work, the phylogenetic relationships among the traditional and broadly cultivated genotypes, wild accessions and related species in Pyrus were inferred by coupling cytoplasmic and nuclear markers. Two different types of SSR markers were adopted to combine the maternal inheritance of cpSSRs with the high informativity of nSSRs [54].
The study presented here employed eight nSSRs (Table 2). Among these nSSRs, three (CH02h11a, CH05d04, CH04e03) were originally developed for apple. The high conservation between the genera Pyrus and Malus is confirmed by the high heterozygosity of CH02h11a and CH05d04, with the latter exhibiting the highest allelic diversity within the nSSR marker set. On the other hand, CH04e03 exhibited the lowest heterozygosity but was still efficiently used in the germplasm characterization analysis [55]. Although SSRs are not the markers of choice for high-throughput analysis, the use of both nuclear and cytoplasmic microsatellites allowed a high level of discrimination of the collection. Our analysis identified four multilocus nSSRs (BGT23b, TsuENH025, TsuENH026, NH015a), which is consistent with data previously reported in other studies on genetic characterization in pear [25,56-58]. The most likely explanation for the high occurrence of multilocus SSRs is the allopolyploid origin of Maloideae, which originated from ancestors belonging to the subfamily Spiraeoideae [26,33,59,60]. The molecular weight of the amplified fragments, including the M13 tail, falls in the same range as that of the amplicons obtained in P. communis for nSSRs [16,17,55] and P. ussuriensis for cpSSRs [38]. Overall, the average number of alleles (17) among the analysed genotypes was higher in this study than in earlier studies performed on several collections of P. communis using different sets of SSRs (ranging from 5.9 to 12.8) [21,25,54,56,57,61-66].
Molecular characterization using SSR markers highlighted high genetic variability among the analysed accessions. It was possible to characterize most of the local varieties and to highlight the genetic relatedness of Italian varieties (NCV + LV) with wild genotypes (Fig 2).
As expected, the international varieties 'Williams' and 'Max Red Bartlett' exhibited the same SSR profile, confirming the origin of 'Max Red Bartlett' as a bud mutation of 'Williams'. Fingerprint analysis confirmed other previously known pedigree records, such as 'Harrow Sweet' ('Williams' x 'Purdue 80-50'), for which the observed SSR profiles were consistent with the known origin. Some local varieties exhibited the same SSR profile ('Virgolese'-'Pergolesi'; 'Faccibedda'-'Pauluzzo'; the small-fruit pears 'Moscatello maiolino'-'Franconello'; the two accessions of 'Bianchetto' n. 1 and 2), indicating possible cases of synonymy or, at the very least, a close genetic relationship. This question could be further investigated through phenotyping and/or deeper molecular analysis.
Interestingly, most of the local varieties clustered together in the B1 and B2 subclusters of the dendrogram. In the structure analysis, most of the local varieties exhibited a high contribution of the 'wild' subpopulation to their genetic makeup.
Among the wild species, a few P. pyraster genotypes were found in every cluster, exhibiting higher genetic similarity to the local pear varieties than P. amygdaliformis. In contrast, P. amygdaliformis appeared to be more conserved.
The use of nSSRs in structure analysis (Fig 3) allowed the definition of two sub-populations: one comprising the wild species, with all the accessions of P. amygdaliformis and P. pyraster, and a second comprising the cultivated pears. The relative contribution of each of the two sub-populations clearly demonstrates the different genetic structure of the ICV and RS groups, characterized by a predominant contribution of the 'cultivated' and 'wild' sub-populations, respectively. In contrast, higher levels of admixture were observed for the NCV and LV groups. These results suggest a greater contribution of the 'wild' subpopulation to the genetic makeup of many Italian varieties, especially local varieties, compared with internationally cultivated varieties. The FST between the two subpopulations was 0.096, revealing greater population differentiation than that reported in other studies on pear germplasm collected in Spain [67] or Bosnia and Herzegovina [68].
The low polymorphism observed within the cpSSRs reflects the high level of conservation of chloroplast DNA, which is consistent with that previously reported in the wild P. ussuriensis population [38]. Although none of the represented cpSSR haplotypes can be unequivocally traced to wild or cultivated pear accessions, Hap2 seems to be more associated with cultivated accessions (89% of samples are cultivated accessions or admixed). In contrast, Hap1 and Hap3 are more associated with wild species (83% and 71%, respectively). Hap4 is exclusively associated with wild or admixed genotypes. However, only six individuals are represented by this haplotype, thus preventing firm conclusions from being drawn (Fig 4). The absence of a haplotype unequivocally associated with wild or cultivated pear is congruent with the high level of admixture within the genus Pyrus. Additionally, the lack of a haplotype unequivocally representing one of the two sub-populations (Fig 3) points to the occurrence of allelic interchange between wild and cultivated accessions.
Within related species, different levels of genetic admixture were observed. For example, P. amygdaliformis accessions were characterized by the absence of any 'cultivated' subpopulation contribution, in contrast to P. pyraster accessions. In particular, the close genetic proximity of P. pyraster to most of the local varieties could be explained by its wide use as a rootstock owing to its hardiness. These findings were further confirmed by our PCA (Fig 5), in which P. pyraster accessions were located in the upper portion of the plot in the same region as several LV and ICV. In contrast, P. amygdaliformis accessions were mostly present in the lower left portion of the plot. PCA results were consistent with the outcome of the nested-structure approach. In the first round of structure analysis, both P. pyraster and P. amygdaliformis exhibited a predominant contribution of the 'wild' subpopulation, which is consistent with the PCA, in which these accessions were characterized by negative PC1 values. P. pyraster and P. amygdaliformis can be fully distinguished through a second round of structure analysis (at K = 5) or by considering the second PC.

Conclusions
For pear, the identification of traits of agronomic importance, including adaptability to different pedoclimatic conditions and resistance to biotic stresses, is crucial for breeding new varieties. Most of these traits can be found in local germplasm that to date has been less exploited for genetic improvement programmes. The Mount Etna area represents an important biodiversity repository for many woody species due to the presence of many climatic and pedological conditions and a wide range of altitudinal levels. In the case of cultivated species, such as pear, this richness has also been increased by the long history of diffusion and cultivation. In the present work, a set of ninety-five individual trees representing local varieties, wild genotypes and nationally and internationally cultivated varieties was genotyped to obtain novel insights into pear genetic structure and to elucidate the influence of wild species on the genetic makeup of local germplasm. The molecular insight presented in this work could also be useful for future association studies at the genomic level, such as Genome Wide Association Studies (GWAS), for which prior knowledge of the genetic diversity of the germplasm collection is an essential prerequisite.
Overall, our results allow a better understanding of the variability of the cultivars from the Etna area. We have provided evidence of allelic interchange between wild and cultivated subpopulations and provided useful information for better management and conservation of germplasm and the direction of pear breeding programmes.