Genetic diversity of Spanish Prunus domestica L. germplasm reveals a complex genetic structure underlying

European plum (Prunus domestica L.) is an ancient domesticated species cultivated in temperate areas worldwide whose genetic structure has been scarcely analyzed to date. In this study, a broad representation of Spanish European plum germplasm collected in Northeastern Spain and a representative set of reference cultivars were compared using nuclear and chloroplast markers. The number of alleles per locus detected with the SSR markers ranged from 8 to 39, with an average of 23.4 alleles, and 8 haplotypes were identified. Bayesian model-based clustering, minimum spanning networks, and the analysis of molecular variance showed the existence of a hierarchical structure. At the first level, two genetic groups were found, one containing ‘Reine Claude’ type reference cultivars together with ca. 25% of local genotypes, and a second one that was much more diverse. This latter group split into two groups, one containing most (ca. 70%) local genotypes and some old Spanish and French reference cultivars, whereas the other included 24 reference cultivars and only six local genotypes. A third partition level allowed a significantly finer delineation into five groups. As a whole, the genetic structure of European plum from Northeastern Spain was shown to be complex and conditioned by a geographical proximity factor. This study not only contributes to genetic conservation and breeding for this species at the national level, but also supports the relevance of undertaking similar tasks of collection and characterization in other unexplored areas. Moreover, this kind of research could lead to future coordinated actions for the examination of the whole European plum diversity, to define conservation strategies, and could be used to better understand the genetic control of traits of horticultural interest through association mapping.


[Table: reference cultivars, with columns for country of origin, group assignment at K = 2, K = 3 and K = 5, and chloroplast haplotype; first entry: 'Albatros', Hungary.]
[...] the CITA and Public University of Navarre collections, were prospected from singular trees that, at the moment of their collection, were either actively cultivated (in backyards or small farms, with the owner's permission to conduct the study on the site) or abandoned old trees for which specific permission was not required, from three provinces of Aragon (Huesca, Zaragoza and Teruel), one from the Basque Country (Alava) and the Occidental Pyrenees in Navarra (S1 Table). The reference set was aimed at including: (i) old Spanish cultivars, (ii) old French cultivars (geographical proximity could have led to important relationships with Spanish germplasm), (iii) a good representation of cultivars of the 'Reine-Claude' group, whose presence in the past was reported as very important in the country [28][29][30], and (iv) a diverse set of other international cultivars, some of them mentioned in the literature as relatively important in the past in the country, at least at the regional level [31]. Young leaves of each accession were collected and immediately stored at -20˚C until DNA extractions were performed. Total genomic DNA of each accession was isolated following the protocol described by Hormaza [34]. The DNA concentration of each sample was quantified in a Nanodrop 1000 (Thermo Fisher Scientific, Wilmington, DE, USA), and working dilutions to a final concentration of 10 ng μl⁻¹ were prepared.

Genetic characterization
The genetic characterization of all the accessions was performed using both nuclear and chloroplast markers. With regard to nuclear markers, a set of 21 SSRs [35][36][37][38][39] was selected according to their location in the reference linkage map for Prunus T×E (almond 'Texas' × peach 'Earlygold'). This SSR set contains three markers per chromosome, except for chromosomes 1, 2 and 4, for which two were included (S2 Table). The 21 SSR markers were amplified using four sets of multiplex PCR reactions, denoted as M01, M02, M03 and M04. Each multiplex was designed by combining the molecular size (bp) of the fragments amplified for each SSR with different fluorescent dyes (S2 Table). The thermal profile for M01, M02, and M03 was as described in Dirlewanger et al. [38]. The four SSR markers included in M04 were amplified separately according to the thermal profiles proposed in their original references and combined after PCR. Fluorescently labeled PCR products were separated using an ABI 3730 sequencer (Applied Biosystems, Foster City, CA, USA), and analyzed and sized with Peak Scanner Software version 1.0 (Applied Biosystems, Foster City, CA, USA). Chloroplast analyses were based on the amplification of six non-coding cpDNA regions using six consensus primer pairs (HK, K1K2, VL, CD, DT and CS) [40,41]; the amplification products were subsequently digested with three restriction enzymes: HinfI, TaqI and AluI (New England Biolabs). Three ng of total DNA were used for PCR amplifications, and 2 μl of PCR product were digested with 1.6 enzyme units in a mixture of 17 μl per sample. The reactions were incubated for 15 min at 65˚C for TaqI, and 15 min at 37˚C for HinfI and AluI, with a final inactivation for 20 min at 80˚C. Restriction fragments were run on 2.5% agarose gels, stained with red dye, and visualized under UV light. Approximate restriction fragment size was estimated with a 100 bp ladder marker (Invitrogen).

Diversity analysis
SSR diversity analysis. Since allele dosage determination in polyploid species is a complicated issue, we compared all multilocus genotypes by scoring and recording the alleles as present/absent. Thus, for instance, AABBCC and AAAABC genotypes were both codified and included in the dataset as ABC. This way of codifying the data is known as the 'allelic phenotype' approach [42,43] and provides information on the presence of alleles, not on allele frequencies [44]. This approach has been shown to provide satisfactory results in recent population genetics works in polyploids [45][46][47]. The multilocus SSR profiles of all the accessions were compared pairwise in order to determine the genetic uniqueness of each accession and to quantify redundancy. The number of allelic phenotypes (A_P), the number of observed alleles per locus (A_O), the mean number of alleles per genotype (A_M), the effective number of alleles (A_E), and the number of rare alleles per locus (A_R) were estimated for each SSR locus. In order to quantify the occurrence of rare alleles, two levels were considered: alleles present in < 5% (A_<5%) and in < 1% (A_<1%) of the genotypes.
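The allelic phenotype coding described above can be illustrated with a minimal Python sketch. The function names, the example genotypes and the allele labels are hypothetical and serve only to show the principle; this is not the software used in the study.

```python
from collections import Counter

def allelic_phenotype(alleles):
    """Collapse a polyploid genotype into its sorted tuple of distinct alleles."""
    return tuple(sorted(set(alleles)))

def mean_alleles_per_genotype(phenotypes):
    """Mean number of distinct alleles per genotype (A_M) at one locus."""
    return sum(len(p) for p in phenotypes) / len(phenotypes)

def rare_alleles(phenotypes, threshold):
    """Alleles present in fewer than `threshold` (a fraction) of the genotypes."""
    counts = Counter(allele for p in phenotypes for allele in p)
    n = len(phenotypes)
    return sorted(a for a, c in counts.items() if c / n < threshold)

# Hypothetical hexaploid genotypes at a single locus (assumed data).
g1 = ["A", "A", "B", "B", "C", "C"]
g2 = ["A", "A", "A", "A", "B", "C"]

# Both genotypes are recorded as the same allelic phenotype, ABC.
p1 = allelic_phenotype(g1)
p2 = allelic_phenotype(g2)

# Hypothetical set of allelic phenotypes used to flag rare alleles.
example = [("A", "B"), ("A", "B"), ("A", "B"), ("A", "C")]
rare = rare_alleles(example, threshold=0.5)
```

Because allele dosage is discarded, the counts obtained this way describe how many genotypes carry an allele, not the allele frequency in the usual sense.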
A pairwise distance matrix between all genotypes was calculated using Bruvo's distance (D_B) [48] through the 'poppr' R package [49] on the R statistics platform [50], and graphically represented in minimum spanning network (MSN) plots. Bruvo's genetic distance takes distances between SSR alleles into account without requiring knowledge of allele copy number or that individuals have the same ploidy [51]. Bruvo's distance ranges from 0, indicating identical genotypes, to 1, indicating maximum dissimilarity.
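To make the logic of Bruvo's distance concrete, the following Python sketch computes it for a single locus in the simplest case, in which both genotypes show the same number of alleles. Alleles are expressed in repeat units, the distance between two alleles is 1 - 2^(-|x|), with x the difference in repeat number, and the genotype distance is the minimum over all pairings of alleles, averaged over the number of alleles. The function names and the example values are assumptions; the 'poppr' implementation used in this study additionally handles genotypes with unequal numbers of alleles, which this sketch does not.

```python
from itertools import permutations

def allele_distance(a, b):
    """Distance between two alleles given in repeat units: 1 - 2^(-|a - b|)."""
    return 1.0 - 2.0 ** (-abs(a - b))

def bruvo_locus(g1, g2):
    """Bruvo's distance at one locus for genotypes with equal allele counts.

    Simplified sketch: every pairing of the alleles of g1 with those of g2
    is tried, and the pairing with the smallest summed distance is kept.
    """
    if len(g1) != len(g2):
        raise ValueError("this sketch requires equal numbers of alleles")
    best = min(
        sum(allele_distance(a, b) for a, b in zip(g1, perm))
        for perm in permutations(g2)
    )
    return best / len(g1)

# Hypothetical genotypes, alleles expressed as numbers of repeats.
d_example = bruvo_locus([20, 23, 24], [20, 24, 26])
d_identical = bruvo_locus([20, 23], [23, 20])
```

A multilocus distance would be obtained by averaging the per-locus values; enumerating all permutations is only practical for the small allele counts found in individual genotypes.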
Chloroplast diversity analysis. As a complement to the SSR diversity analysis, the degree of polymorphism of the 18 possible cpDNA region/enzyme combinations was assessed. The study of chloroplast DNA variation using PCR-RFLP has been shown to be useful and informative for studying chloroplast DNA diversity and phylogenetic relationships among Prunus species [12][52][53][54][55]. For contingency reasons, a preliminary analysis of each primer-enzyme combination was performed in the set of reference cultivars (46 genotypes). In a second analysis, only those combinations showing polymorphic fragments in the reference set were analyzed in the whole set of unique genotypes. The presence or absence of each restriction fragment at each polymorphic site was scored as binary data and used to identify chloroplastic haplotypes.

Genetic structure inference
The software STRUCTURE version 2.3.4 [56] was used to estimate the number of hypothetical subpopulations (K) and to quantify the membership probability of each genotype to the inferred subpopulations. The analysis was performed under the admixture model with correlated allele frequencies, and run using the recessive allele approach [57], codifying the genotypes following the recommendations provided in the software manual for polyploid species, which have been successfully applied in previous studies [16][19][58][59]. The analysis was run for K values ranging from 1 to 10 inferred clusters, with 10 independent runs each, applying a burn-in period of 200,000 followed by 500,000 iterations. Structure Harvester ver. 0.6.93 [60], which implements the ΔK method defined by Evanno et al. [61], was used to estimate the most pertinent K value. Each genotype was assigned to the group for which it had the highest assignment probability (qI). Membership of a genotype in a particular group was considered strong whenever qI ≥ 0.80 [16][17][18][19][21], whereas genotypes with qI < 0.80 were considered admixed. Placement of accessions in the inferred groups was determined using CLUMPP ver. 1.1 [62], and the CLUMPP output was directly used as input for Distruct ver. 1.1 [63] to graphically display the results.
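The ΔK criterion can be sketched in a few lines of Python. In this simplified version, ΔK for a given K is the absolute second-order difference of the mean log-likelihood, |L(K+1) - 2L(K) + L(K-1)|, divided by the standard deviation of L(K) across runs. The function name and the log-likelihood values are hypothetical, and the sketch is not a reimplementation of Structure Harvester.

```python
from statistics import mean, stdev

def delta_k(log_likelihoods):
    """Compute ΔK from a dict mapping K to the list of L(K) values of its runs.

    ΔK is only defined for K values that have both neighbours (K - 1, K + 1).
    A zero standard deviation across runs would raise ZeroDivisionError.
    """
    means = {k: mean(v) for k, v in log_likelihoods.items()}
    result = {}
    for k in sorted(log_likelihoods):
        if k - 1 in means and k + 1 in means:
            second_diff = abs(means[k + 1] - 2 * means[k] + means[k - 1])
            result[k] = second_diff / stdev(log_likelihoods[k])
    return result

# Hypothetical log-likelihoods, two runs per K (assumed values).
runs = {
    1: [-1000, -1002],
    2: [-900, -902],
    3: [-890, -894],
    4: [-889, -891],
}
dk = delta_k(runs)
best_k = max(dk, key=dk.get)
```

In this toy example the likelihood increases sharply from K = 1 to K = 2 and then flattens, so ΔK peaks at K = 2. Secondary peaks at higher K, such as those discussed in this study, would appear as smaller local maxima in the same dictionary.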

Genetic differentiation between and intra-group variability
The degree of differentiation between the genetic groups derived from STRUCTURE was estimated by performing analyses of molecular variance (AMOVA) using GenAlEx 6.5 [64,65]. The statistical significance of the variance components was assessed using 1000 permutations. Moreover, pairwise PhiPT values, an analog of Wright's F_ST for dominant binary data [66], were calculated among groups.
The mean pairwise distance (MPD), i.e., the mean of the pairwise PhiPT values per genetic group, an index of the differentiation of a group relative to the other genetic groups, was calculated for each group. The within-group sum of squares divided by the number of individuals in the group was applied as a normalized intra-group variability index (nSSWG) [66]. Additionally, the pairwise Bruvo genetic distances among all genotypes clustered within each group were represented by heat maps. Allelic intra-group variability measures were provided for each group defined by STRUCTURE, including the number of alleles (N_A), the number of exclusive alleles (N_EA) and the mean number of alleles per genotype (A_M).
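As an illustration of how PhiPT and nSSWG relate to the underlying binary data, the Python sketch below follows the standard one-level AMOVA partition: sums of squares are obtained from squared Euclidean distances between binary profiles, variance components are derived from the mean squares, and PhiPT is the proportion of the total variance found among groups. The function names and the example profiles are assumptions, and the sketch omits the permutation test; it is not the GenAlEx implementation.

```python
def sum_of_squares(profiles):
    """Sum of squared pairwise distances between profiles, divided by n."""
    n = len(profiles)
    total = 0
    for i in range(n):
        for j in range(i + 1, n):
            total += sum((a - b) ** 2 for a, b in zip(profiles[i], profiles[j]))
    return total / n

def nsswg(profiles):
    """Within-group sum of squares divided by the number of individuals."""
    return sum_of_squares(profiles) / len(profiles)

def phipt(groups):
    """PhiPT for a dict mapping a group label to its list of binary profiles."""
    everyone = [p for members in groups.values() for p in members]
    n_total, n_groups = len(everyone), len(groups)
    ss_within = sum(sum_of_squares(m) for m in groups.values())
    ss_among = sum_of_squares(everyone) - ss_within
    ms_among = ss_among / (n_groups - 1)
    ms_within = ss_within / (n_total - n_groups)
    # n0 corrects the among-group component for unequal group sizes.
    n0 = (n_total - sum(len(m) ** 2 for m in groups.values()) / n_total) / (n_groups - 1)
    v_among = (ms_among - ms_within) / n0
    return v_among / (v_among + ms_within)

# Hypothetical presence/absence profiles for two groups (assumed data).
fixed = {"A": [(0, 0), (0, 0)], "B": [(1, 1), (1, 1)]}
partial = {"A": [(0, 0), (0, 1)], "B": [(1, 1), (1, 0)]}
phi_fixed = phipt(fixed)
phi_partial = phipt(partial)
```

The first example, in which the groups share no variation, yields complete differentiation, whereas the second shows an intermediate value. Estimates for very small groups are unstable, which is relevant when interpreting a group of only three genotypes.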

SSR polymorphism and redundancy
The study of amplification products showed that eight out of the 21 SSR markers used were problematic in terms of absence of amplification product, insufficient fluorescence signal, low level of polymorphism or complex scoring pattern and, as a consequence, were excluded from subsequent analyses (S2 Table). Furthermore, two of the remaining markers, BPPCT007 and UDP98-406, amplified two loci each. The amplification ranges for this pair of markers were 118-167 bp and 68-124 bp, respectively, and since it was difficult to delimit the allelic range for each locus, their use was restricted to the identification process and they were excluded from further analyses. Therefore, the high cross-transferability of SSR loci reported between Prunus species [38][67][68] may not hold when a larger set of samples is considered.
Based on the remaining 11 SSR markers, 135 unique genotypes were found within the whole set of accessions evaluated (120 local and 46 reference accessions), which corresponds to an overall duplication degree of 18%. Seventeen groups of SSR duplicates were identified. Two accessions had the same SSR profile as 'Stanley', whereas eight had the same profile as cultivars within the 'Reine Claude' group (i.e. 'Reine Claude Verte', 'Reine Claude Tardive de Chambourcy', 'Reine Claude Diaphane', 'Reine Claude de Bavay' and 'Reine Claude d'Oullins'). The remaining groups of SSR duplicates were composed exclusively of local accessions. It is worth noting that the latter SSR duplicates frequently included germplasm collected either in the Pyrenees or in the Iberian Cordillera, seldom including accessions collected from different areas. This fact may indicate that independent local selection processes could have been carried out by the farmers from these areas in the past, seeking adaptation to specific habitats across the heterogeneous Northeastern mountain landscape.
The presence of slight allelic differences was quantified by the pairwise Bruvo distance coefficient between genotypes, showing that 46 pairs of genotypes, involving 24 genotypes as a whole, differed by less than 0.05, which corresponded to very few allelic mismatches. Slight allelic differences may result from potential genotyping errors, but also from putative somatic mutations, which are relatively frequent in long-lived tree species for which vegetative propagation has been used since ancient times [69][70].
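The identification of duplicates amounts to grouping accessions that share an identical multilocus allelic phenotype. A minimal Python sketch is given below; the accession names, locus names and allele sizes are hypothetical, and the functions are illustrative rather than the procedure actually used.

```python
from collections import defaultdict

def duplicate_groups(profiles):
    """Group accessions sharing the same multilocus allelic phenotype.

    `profiles` maps an accession name to a dict {locus: iterable of alleles}.
    Allele dosage and order are ignored, as in the allelic phenotype coding.
    """
    by_profile = defaultdict(list)
    for accession, loci in profiles.items():
        key = tuple(sorted(
            (locus, tuple(sorted(set(alleles)))) for locus, alleles in loci.items()
        ))
        by_profile[key].append(accession)
    return list(by_profile.values())

def duplication_degree(n_accessions, n_unique):
    """Fraction of accessions that are redundant copies of another profile."""
    return (n_accessions - n_unique) / n_accessions

# Hypothetical accessions scored at two loci (assumed data).
accessions = {
    "acc1": {"locus1": [100, 104, 104], "locus2": [150]},
    "acc2": {"locus1": [104, 100], "locus2": [150, 150]},
    "acc3": {"locus1": [100, 108], "locus2": [150]},
}
groups = duplicate_groups(accessions)

# With the figures reported in the text: 120 + 46 accessions, 135 unique.
degree = duplication_degree(120 + 46, 135)
```

With the accession counts given in the text, this ratio is about 0.187, in line with the duplication degree of roughly 18% reported above.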

Genetic diversity
Assessment of genetic diversity using nuclear SSR markers. The 11 SSR markers proved to be highly diverse in terms of their degree of polymorphism, ranging from 8 (UDP96-008) to 39 (BPPCT025) alleles per locus, leading to a total number of 257 alleles (Table 2). The average number of alleles per locus (23.4) was slightly higher than those reported for Swedish (22.7; [14]) and Croatian germplasm (18.7; [13]), and lower than that reported for French traditional germplasm (29) [12]. The mean number of alleles per genotype (A_M) ranged from 2.74 (UDP98-409) to 4.89 (CPSCT026), with a mean value of 4.01. In spite of the high number of alleles found, the mean number of effective alleles (A_E) was low (9.21), indicating that a considerable number of alleles appeared in the population at very low frequency. Thus, the proportion of rare alleles was high, with a mean of 35.4% and 10.3% of the alleles per locus found in less than 5% and 1% of the genotypes, respectively, and 32 alleles were unique (i.e. identified in only one genotype). This fact highlights the richness and allelic singularity that can still be found in traditional material, which thus represents an important unexplored source of genetic variation.
Genetic diversity was also quantified using the allelic phenotype approach, as this task is much more problematic in polyploids than in diploids, since more than two alleles per individual and locus are transmitted, potentially including multiple copies of a given allele [10]. In this study, the highest number of A_P corresponded, as expected, to those markers richest in detected alleles (Table 2). The frequency of the most common A_P varied among markers, ranging between 6.25% (UDP96-005) and 34.09% (UDP96-008) of the genotypes.
Assessment of genetic diversity using chloroplast DNA (cpDNA) markers. Chloroplast DNA (cpDNA) variation was assessed using PCR-RFLP, and seven out of the 18 primer-restriction enzyme combinations resolved polymorphic fragments in the reference set of cultivars. When analyzed with those seven primer-restriction enzyme combinations, 13 polymorphic sites were detected (S3 Table), and the whole set of unique genotypes considered revealed eight different haplotypes (named from H1 to H8) (Table 3). This number of haplotypes is slightly higher than that found in 80 European plum cultivars from the French National collection, where five were detected using the same methodology [12]. In our study, H1 was by far the most prevalent haplotype, with a frequency of 0.86, whereas H5 was the second in frequency (0.08), and the remaining haplotypes included just two or a single genotype. Reference cultivars exhibited mainly haplotypes H1 and H5, the most frequent ones, although 'Albatros' and 'Tuleu gras' carried haplotype H6 (Table 1).
A low degree of cpDNA variation can be due to historical events (i.e., number and type of the refugia, mode of recolonization of the species, etc.), but also to human-mediated activities [71]. The presence in our study of a prevalent haplotype is consistent with the results reported for P. domestica germplasm preserved at the French National Plum Collection [12]. These authors argued that the presence of one major haplotype may suggest a limited number of founders during the period in which plum was first introduced into Western Europe.
Genetic structure: Major divisions and substructuring of the diversity
Estimate of the number of hypothetical genetic groups. The analysis of the rate of change ΔK over the range of K values evaluated in STRUCTURE revealed that the germplasm could be divided into two and three groups, with a less pronounced peak found at K = 5 (Fig 2), ΔK values being 39.[...] (Table 4). The fact that all the K values resulted in asymmetric divisions may be indicative of the existence of a real population structure rather than of a statistical artifact [56].
In species with a complex genetic background such as European plum, Bayesian clustering methods may detect genetic structures at different levels. It is therefore advisable to examine the results not only for the K value showing the highest ΔK, but also for others showing relatively high values for this parameter, in order to delineate further levels of substructure of the diversity. Therefore, the most likely divisions obtained for K = 2, K = 3 and K = 5 will be examined and compared throughout the subsequent sections. The consistency in the clustering of the genotypes between runs was examined to analyze the robustness of the divisions obtained at the three K values. The assignment of the genotypes was very consistent between runs, with none shifting between genetic groups when K = 2 and K = 3, and just 13 admixed genotypes showing slight discrepancies between runs when K = 5. These three partitioning levels were then compared in terms of the mean assignment probability of genotypes to the inferred groups, as well as the proportion of genotypes strongly assigned (qI ≥ 0.80) [17]. The mean probability of assignment of the genotypes to the inferred groups was very high and almost identical for the three K values (around 0.80). The proportion of genotypes strongly assigned differed between K values (Table 4), with 64%, 74% and 59% of the genotypes being strongly assigned when K values were 2, 3 and 5, respectively. The minimum spanning networks (MSN) based on Bruvo's distance (S2 Fig) were consistent with the results obtained with the Bayesian clustering method at K = 2, K = 3 and K = 5, supporting the existence of the above-mentioned genetic groups.
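The assignment rule based on the membership coefficient can be expressed as a short Python sketch. The function names and the membership values are hypothetical; groups are indexed from zero purely for convenience.

```python
def assign(q, threshold=0.80):
    """Return (index of the group with the highest qI, strongly assigned?).

    `q` is the list of membership coefficients of one genotype, one per group.
    A genotype whose highest coefficient is below the threshold is admixed.
    """
    best = max(range(len(q)), key=lambda k: q[k])
    return best, q[best] >= threshold

def proportion_strong(q_matrix, threshold=0.80):
    """Proportion of genotypes whose highest qI reaches the threshold."""
    strong = sum(1 for q in q_matrix if assign(q, threshold)[1])
    return strong / len(q_matrix)

# Hypothetical membership coefficients for four genotypes at K = 2.
q_matrix = [
    [0.90, 0.10],  # strongly assigned to the first group
    [0.50, 0.50],  # admixed
    [0.20, 0.80],  # strongly assigned to the second group
    [0.70, 0.30],  # admixed, although placed in the first group
]
share = proportion_strong(q_matrix)
```

Admixed genotypes are still placed in the group with the highest coefficient, which is why the proportion of strongly assigned genotypes is reported separately from group membership.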
Placement of genotypes in the genetic groups. For K = 2, ca. 25% of the local genotypes clustered together in G2.1 with all the reference cultivars of the 'Reine Claude' group (Fig 2A), a varietal group whose preponderance in the past at the national level in terms of geographical distribution and production has been broadly documented [28][29][30][31]. The remaining local genotypes (ca. 75%) clustered in G2.2 along with those reference varieties (32) not belonging to the 'Reine Claude' group (Table 4). This major structure remained largely unchanged when the population was examined using K = 3 and K = 5, since the 'Reine Claude' group (G2.1) remained essentially unmodified (corresponding to G3.1 and G5.1), whereas higher K values allowed the internal substructure of G2.2 to be disentangled. The study of Prunus domestica L. germplasm maintained in the French National Plum Collection [12] also reported the structuring of 'Reine Claude' material as a separate group. Although the 'Reine Claude' varietal group was generally thought to have a genetic origin similar to that of other European plum morphological groups, Horvath et al. [12] argued that its genetic origin could be different, and our results support the hypothesis that 'Reine Claude' germplasm could have originated from a distinct genetic background.
As mentioned above, when K = 3 was considered, G2.2 was partitioned into G3.2 and G3.3 (Fig 2B). The former group comprised ca. 70% of the local genotypes (Table 4), which clustered with old Spanish and French reference cultivars such as 'Monsieur Hatif', 'de la Rosa', 'Saint Antonin' or 'Catalonia' [12][31][72][73]. By contrast, G3.3 included 24 reference cultivars, but only six local genotypes, all of them admixed (Table 4; Fig 2B). Although G3.3 was very heterogeneous, it could be hypothesized that 'Stanley' ('D'Ente' × 'Grand Duke') played a significant role in the origin of this group, as some of the reference cultivars within G3.3 [e.g. 'Cacanska lepotica' ('Stanley' × 'Ruth Gerstetter'), 'Cacanska rodna' ('Stanley' × 'Ruth Gerstetter'), 'Valor' ('Imperial Epineuse' × 'Grand Duke'), or 'D'Ente' (unknown parentage)] present different degrees of relatedness to it. Moreover, some of the reference cultivars of this group for which either both parents or at least the maternal progenitor were analyzed share chloroplastic haplotype H5 (Table 1), the second most frequent of the eight identified, which would reinforce the likelihood of this relatedness. Some examples of the latter are 'Stanley', 'Cacanska lepotica', 'Cacanska rodna' or 'Valor'. Taking all the above into consideration, a certain level of genetic relatedness among some of the reference cultivars within this group may be hypothesized, though it cannot be confirmed, as their parentages are frequently unknown.
Table 4. Genetic diversity measures for each of the genetic groups defined with STRUCTURE at K = 2, K = 3 and K = 5. Number of genotypes (n), number of reference cultivars (n_R), number of alleles (N_A), number of exclusive alleles (N_EA), mean pairwise distance (MPD), normalized intra-group variability index (nSSWG) and mean Bruvo distance between the genotypes clustered within each group.
Last, when K = 5 was considered, G3.1 remained mainly unchanged, whereas G3.2 and G3.3 were split into two groups each, these subdivisions corresponding to, respectively, G5.2 and G5.3, and G5.4 and G5.5 (Fig 2C), suggesting that increasing K from 3 to 5 allowed discerning [...] (Table 4). G5.2 and G5.3 are mainly composed of local genotypes, as only 10 out of the 71 genotypes included are reference cultivars, mainly of French and Spanish origin. G5.4 includes most of the genotypes clustered in G3.3, and G5.5 had just three genotypes, very strongly assigned. The latter is noteworthy as, despite the very small size of G5.5, two out of the three genotypes displayed two chloroplastic haplotypes occurring at a low frequency (H2 and H3). All these results suggest that the structure of the plum local germplasm in Spain has been conditioned by a geographical proximity factor. Thus, ca. 95% of the local genotypes clustered in three groups (G5.1, G5.2 and G5.3), where representatives of either the 'Reine Claude' group or old Spanish and French cultivars were found. The geographical proximity and historic connections between Spain and France, and the existence of ancient commercial roads and pilgrims' routes since the Middle Ages, could have favored the exchange of cultivars between regions of both countries. Conversely, just a few local genotypes (ca. 5%) clustered together with other reference cultivars, which could result from the introduction in the late 1960s of cultivars such as 'Stanley', 'President', or 'Ruth Gerstetter' by commercial nurseries [31].
Therefore, finding only a residual presence of local genotypes related to this kind of germplasm is not surprising. By contrast, finding that 'Arandana', a very old and emblematic Spanish cultivar, clustered in the same group as 'Stanley', 'President', or 'Ruth Gerstetter' was unexpected. Therefore, the impact that 'Arandana' would have had in originating new local cultivars seems very limited, in spite of its past relevance in the country [31].
Distribution of the haplotypes into the groups inferred. The analysis of the frequency distribution of the chloroplastic haplotypes into the genetic groups defined by STRUCTURE revealed noticeable differences (S4 Table). When K = 2 was considered, G2.1 mostly comprised genotypes carrying haplotype H1 (ca. 95%), except for two admixed genotypes with haplotype H5, whereas G2.2 contained genotypes showing the eight haplotypes, including all those represented by just one or two genotypes. With regard to the two most frequent haplotypes (H1 and H5), when K = 3 was considered, the frequency of occurrence of H1 was considerably lower in G3.3 (~67%) than in G3.1 and G3.2 (≥ 90%), whereas haplotype H5 was four times more frequent in G3.3 (20%) than in the other two groups (~5%). Similarly, haplotype H5 was not equally distributed between groups when considering K = 5, being mainly concentrated in G5.2 and G5.4. Chi-square tests indicated significant differences in the distribution of the haplotypes between all pairs of groups except for the pairs G2.1-G2.2 (P = 0.120) and G3.1-G3.2 (P = 0.171).
Biparentally and maternally inherited markers such as, respectively, nuclear and cpDNA markers, have different inheritance modes and evolutionary rates. It is widely accepted that cpDNA markers reflect past changes in population variation, such as a population expansion or decline, whereas nuclear markers reflect more recent events in the population [74][75]. Therefore, it is not surprising that some haplotypes appear more frequently in some of the genetic groups inferred from nuclear markers.

Intragroup variability
The study of the intragroup variability revealed that the 'Reine Claude' group contained as a whole a lower level of diversity than the other groups. This fact was supported by a much lower percentage of total alleles captured in comparison to those found in the other groups, and especially by a much lower number of exclusive alleles harbored in this group. This trend was maintained irrespective of the K value considered (Table 4). Moreover, the mean Bruvo distance between the genotypes of the 'Reine Claude' group ranged between 0.30 and 0.39 (depending on the K value considered), below the range observed for the other groups (between 0.49 and 0.58), and its intragroup variability index (nSSWG) was lower than the remaining ones, also mirroring its lower intragroup genetic diversity. The heat maps represented in S3 Fig provide a clear display of this trend, as the parts of the plot that include genotypes of the 'Reine Claude' group are much more homogeneous in color. All these results may indicate that local genotypes strongly clustered within this group are probably the result of conscious or unconscious hybridization processes between representatives of this genetic group.
The other group that presented distinct characteristics in this regard was G5.5, the genetic group made up of just three genotypes. Despite its very small size, this group displayed seven exclusive alleles (Table 4), and its genotypes carried two uncommon chloroplastic haplotypes (Table 3). It also presented the lowest nSSWG value, although this fact results mainly from its extremely small size. These genotypes could therefore be hybrids between P. domestica L. and other Prunus species, since the occurrence of interspecific hybridizations, either natural or human-mediated, is relatively common within the genus Prunus [8,12,76].

Genetic differentiation between groups
The groups obtained by the Bayesian model-based clustering method showed moderate but significant differentiation (P < 0.001) at all three K levels studied. Intergroup variation accounted for 11.4%, 10.8% and 12.7% of the total variation at K = 2, 3 and 5, respectively (Table 5). Regarding MPD estimates, the groups exhibiting the highest differentiation relative to the other groups were G5.5 (0.27) and G5.1 (0.23) (Table 4). Pairwise PhiPT values at K = 3 indicated moderate differentiation for the three pairs of genetic groups (0.10-0.14), whereas very high PhiPT estimates (up to 0.45) were found for some of the pairs of groups identified at K = 5 (Table 6). PhiPT values were examined because they can help to identify unequal differentiation between specific pairs of groups that could remain hidden in the AMOVA and MPD estimates.
Table 5. Analyses of molecular variance (AMOVA) between the genetic groups defined with STRUCTURE at K = 2, K = 3 and K = 5.
Analysing pairwise PhiPT in detail, it is worth highlighting that G5.2 and G5.3, the groups comprising ~70% of the local genotypes in close relationship mostly with old Spanish and French reference cultivars, showed PhiPT values of 0.19 and 0.16, respectively, with the 'Reine Claude' group (G5.1). This high level of differentiation from the 'Reine Claude' group was an unexpected finding, since it is the varietal group reported to be the prevailing one in the past in terms of abundance and geographical distribution in Northeastern Spain. Taking that prevalence into account, it appeared sensible to expect a higher degree of relatedness and, consequently, less differentiation between this group and those containing most of the Spanish traditional germplasm. The contribution of 'Reine Claude' group cultivars to new variability appears to have been limited to generating a relatively closed set of local cultivars, as they cluster with ~25% of local genotypes, but they apparently played a much lesser role in the origin of local genotypes in other groups. The 'Reine Claude' varietal group, having been crucial in European horticulture, with representatives such as 'Reine Claude Verte' grown for more than five centuries [71][77][78], has unquestionably played a decisive role in shaping the genetic diversity of this species in Spain as well as at the European scale, but the intensity of this influence may differ depending on the geographical region, the time of selection and the aptitude sought. Finally, G5.5 showed the highest MPD (0.27) (Table 4) and pairwise PhiPT values (ranging from 0.19 to 0.45) (Table 6). Although these estimates should be regarded with caution, since they could be biased by the small size of G5.5, they indicate a remarkable differentiation of these genotypes, which could be attributable to a putative hybrid status.
This intergroup distinctness, together with the results described in the intragroup analysis, reinforces the hypothesis of a hybrid status for the G5.5 genotypes. However, this hybrid status should be confirmed by using the same SSR markers to characterize a representative set of cultivars of different Prunus species, as well as hybrids between P. domestica and other species.

Concluding remarks
Our work demonstrates that prospecting missions in unexplored areas may still be useful to recover an important source of diversity for this species, as local genotypes have been shown to enrich the genetic diversity held in varieties grown worldwide. Therefore, it would be advisable to perform similar tasks of collection and characterization, since understanding the extent and organization of diversity could promote efficient conservation actions, help recover ancient cultivars of potential interest, and facilitate their use in breeding programs in the near future. Nonetheless, the potential value of most local genotypes for present-day market needs is largely unknown, since they have not been sufficiently characterized from an agronomic and consumer point of view. Hence, comprehensive phenotypic characterization based on standardized methods would be necessary in order to assess the commercial potential of these cultivars. Our approach, based on Bayesian model-based clustering, minimum spanning networks, and the analysis of molecular variance, revealed the existence of a hierarchical structure in European plum germplasm from Northeastern Spain. At the first level, two genetic groups were found, one containing 'Reine Claude' type reference cultivars together with ca. 25% of local genotypes, but the study of genetic structure at further levels evidenced the existence of an additional internal substructure that yielded up to five genetic groups. The inferred groups were clearly differentiated and showed noticeable differences in allelic composition at the group level. Additionally, the fact that ca. 70% of the European plum local genotypes clustered with old Spanish and French reference cultivars indicates that population structure has been deeply conditioned by a geographical proximity factor, and underlines that the genetic background of an important part of the Spanish germplasm may differ from genepools of other origins.
The genetic characterization reported herein not only constitutes the most comprehensive study of European plum population structure from Spain, but also highlights the value of collection and characterization actions. Moreover, this kind of research could lead to future coordinated actions for the examination of the whole European plum diversity, to define conservation strategies, and could be used for defining a European core collection, useful to better understand the genetic control of traits of horticultural interest through association mapping.
S1 Table. Information on the unique genotypes used in this study. Collection information includes name, collection site, specific longitude, latitude, approximate elevation, group placement by STRUCTURE analysis with the mean qI values (when K = 2, K = 3 and K = 5 were considered) and the chloroplastic haplotypes. (XLSX)
S2 [...] Table. Frequency distribution of the chloroplastic haplotypes within the genetic groups defined by STRUCTURE at K = 2 (A), K = 3 (B) and K = 5 (C). (DOCX)