Characterization of Worldwide Olive Germplasm Banks of Marrakech (Morocco) and Córdoba (Spain): Towards management and use of olive germplasm in breeding programs

Olive (Olea europaea L.) is a major fruit crop in the Mediterranean Basin. Ex-situ management of olive germplasm is essential to ensure optimal use of genetic resources in breeding programs. The Worldwide Olive Germplasm Banks of Córdoba (WOGBC), Spain, and Marrakech (WOGBM), Morocco, are currently the largest existing olive germplasm collections. Characterization, identification, comparison and authentication of all accessions in both collections could thus provide useful information for managing olive germplasm, preserving it, exchanging it within the scientific community and using it in breeding programs. Here we applied 20 microsatellite (SSR) markers and 11 endocarp morphological traits to discriminate and authenticate 1091 olive accessions belonging to WOGBM and WOGBC (554 and 537, respectively). Among all the analyzed accessions, 672 distinct SSR profiles, considered as unique genotypes, were identified, but only 130 were present in both collections. Combining SSR markers and endocarp traits led to the identification of 535 cultivars (126 in common) and 120 authenticated cultivars. No significant differences were observed between collections regarding allelic richness and diversity index. We concluded that the level of genetic diversity was comparable between collections despite marked contrasts in their varietal composition, which could be explained by the different conditions under which the collections were established. This highlights the extent of cultivar variability within WOGBs. Moreover, we detected 192 mislabeling errors, 72 of which were found in WOGBM. We also identified 228 genotypes as molecular variants of 74 cultivars, as well as 79 new cases of synonymy and 39 new cases of homonymy. Both collections were combined to define nested core collections of 55, 121 and 150 entries, proposed for further studies.
This study was a preliminary step towards managing and mining the genetic diversity in both collections, while developing collaborations between olive research teams to conduct association mapping studies by exchanging and phenotyping accessions in contrasting environments.

Introduction
Olive (Olea europaea, ssp. europaea, var. europaea [1]) is widely cultivated for oil and canned fruit. It is a commercially important fruit crop in the Mediterranean Basin, where about 95% of the world's olives are produced. More than 3,300,000 t of olive oil are produced annually worldwide [2] on an area of over 10.8 million ha [3]; olive oil ranks 7th among all vegetable oils produced worldwide, while olive ranks 25th among the 160 most cultivated crops in the world [3]. Of the 47 olive-growing countries, Spain, Italy and Greece are the top three, accounting for around 38%, 13% and 10% of total olive oil production, respectively [2]. Cultivated olive was domesticated from the wild form (Olea europaea subsp. europaea var. sylvestris) in the Middle East 6,000 years ago [4,5]. Early domesticated forms were probably disseminated during successive human migrations from east to west and introgressed with local wild olives, in turn giving rise to local cultivated forms through selection by farmers [6][7][8][9][10]. Olive cultivation spread from the Mediterranean area throughout the world, and the crop is now of increasing commercial interest beyond the Mediterranean Basin, in countries such as Australia, Chile and the USA [2].
Over the long history of olive domestication, cultivated forms have been selected, propagated and disseminated by farmers. An international initiative was thus undertaken to pool olive germplasm information in a single database. Its 2008 web-based edition (http://www.oleadb.it/) is currently the largest database, with information extracted from almost 1520 publications [11]. This database nevertheless underestimates the diversity of domesticated olive, since it overlooks many minor local cultivars specific to olive-growing areas such as Morocco, Cyprus and Syria. More than 1,200 cultivars with over 3,000 synonyms are currently reported in 54 countries and maintained in almost 100 separate collections at international, national and regional levels for conservation and evaluation purposes [11][12][13]. These include collections in the Mediterranean Basin, at Cosenza (Italy, 500 cultivars [14]), Izmir (Turkey, 96 cultivars [15]) and Chania (Greece, 47 cultivars [16]), as well as in new olive-growing regions of the world, such as Davis (USA, [17]) and Mendoza (Argentina, [18]). However, few of these cultivars have been fully characterized using molecular markers and morphological traits. Two worldwide olive germplasm banks (WOGBs) exist in the Mediterranean Basin. The first was set up in the 1970s at Córdoba (Spain, WOGBC), with about 500 accessions from 21 countries [19][20][21]. In 2003, in the framework of the ResGen-T96/97 Project funded by the EU and IOOC and involving 16 partners, a second WOGB was created at the INRA Research Station of Tassaoute, Marrakech (Morocco, WOGBM). This collection presently includes almost 560 accessions originating from 14 Mediterranean countries [22][23]. To optimize olive germplasm sampling, local genetic resources had been characterized by the different partners using standardized morphological and/or molecular descriptors.
WOGBM was thus established by including olive genetic resources previously characterized in each Mediterranean country.
Recently, socioeconomic changes in most olive-producing countries have led to significant improvements in olive growing, including the establishment of modern orchards based exclusively on a few high-yielding, low-vigor cultivars such as cv. Arbequina. These trends may lead to the erosion of local olive germplasm because several minor traditional olive dataset for both WOGBs using SSR markers and morphological traits to manage, use and exchange plant material, (iii) to propose a subset of cultivars encompassing all of the genetic diversity in both collections, and (iv) to release information and methodologies that could be used for characterizing other national and international ex-situ olive collections [14,15,16,17,47,49,67].
Aligning SSR alleles of WOGBM and WOGBC databases. To align alleles between the two collections, 47 WOGBC accessions previously analyzed by Trujillo et al. [21] were re-genotyped with the 20 SSR loci under laboratory conditions similar to those used for genotyping and visualizing WOGBM accessions (S2 Table). In the previous study of Trujillo et al. [21], this panel of 47 accessions carried 407 of the 466 alleles (87.3%) detected with 33 SSR markers. Once the 47 accessions were genotyped, the sizes of alleles observed in WOGBM were adjusted to match those recorded by Trujillo et al. [21] in WOGBC (S2 Table).
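The allele-size harmonization described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure actually used: it assumes that the two alleles of each reference accession can be paired in size order between laboratories, that the most frequent pairing at a locus defines the conversion, and that sizes absent from the reference panel are shifted by the most common offset at that locus. The functions `build_size_map` and `convert` and the allele sizes are hypothetical.

```python
from collections import Counter, defaultdict


def build_size_map(ref_panel):
    """Build per-locus allele-size conversion tables from a reference panel.

    ref_panel: iterable of (locus, wogbm_pair, wogbc_pair), one entry per
    reference accession and locus; each pair holds two allele sizes in bp.
    Returns (size_map, offsets): size_map[locus][wogbm_size] -> wogbc_size,
    offsets[locus] -> most common size shift, used for unmapped sizes.
    """
    votes = defaultdict(Counter)
    for locus, new_pair, old_pair in ref_panel:
        # Assumption: lab-to-lab shifts preserve allele order, so the smaller
        # allele in one dataset matches the smaller allele in the other.
        for new_size, old_size in zip(sorted(new_pair), sorted(old_pair)):
            votes[locus][(new_size, old_size)] += 1

    size_map, offsets = {}, {}
    for locus, counter in votes.items():
        mapping = {}
        for (new_size, old_size), _ in counter.most_common():
            mapping.setdefault(new_size, old_size)  # keep the most frequent pairing
        size_map[locus] = mapping
        shifts = Counter(old - new for new, old in mapping.items())
        offsets[locus] = shifts.most_common(1)[0][0]
    return size_map, offsets


def convert(locus, alleles, size_map, offsets):
    """Express a WOGBM genotype at one locus on the WOGBC size scale."""
    mapping = size_map[locus]
    return tuple(mapping.get(a, a + offsets[locus]) for a in alleles)


# Hypothetical reference data for a single locus (sizes are illustrative).
panel = [
    ("DCA09", (170, 184), (172, 186)),
    ("DCA09", (170, 206), (172, 208)),
    ("DCA09", (184, 184), (186, 186)),
]
size_map, offsets = build_size_map(panel)
print(convert("DCA09", (206, 190), size_map, offsets))
```

Here the 190 bp allele is not in the reference panel, so it is shifted by the locus-wide offset of +2 bp; a real workflow would flag such alleles for manual checking.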
Morphological characterization. Morphological characterization was carried out independently by two observers on a representative sample of 40 endocarps per tree for 518 of the 554 olive trees analyzed with SSR markers. Each tree was analyzed twice, over 2015 and 2016. Following the protocol described by Trujillo et al. [21], we used eleven endocarp traits to characterize WOGBM and to compare the morphological datasets of both collections: weight; shape in position A; symmetry in positions A and B; position of the maximum transverse diameter in position B; shape of the apex in position A; shape of the base in position A; surface roughness; number of grooves on the basal end; distribution of the grooves on the basal end; and presence of a mucro.

Data analysis
Distinct genotypes within the whole dataset (1091 olive trees) were identified by pairwise comparison of their SSR profiles using the Excel Microsatellite TOOLKIT [71]. Genetic diversity in each collection separately and in the whole dataset (both collections) was estimated by calculating, for each microsatellite locus and with the same toolkit, the allele size (bp), number of alleles (Na), number of alleles observed only once (Nu), observed (Ho) and expected heterozygosity (He; [72]) and polymorphism information content (PIC; [73]). In addition, pairwise comparisons between samples based on endocarp traits were conducted to identify similar morphological profiles, using a binary matrix of the different morphological states.
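For readers re-implementing these per-locus parameters, a minimal sketch follows. It uses the textbook definitions He = 1 − Σp_i² and PIC = 1 − Σp_i² − Σ_{i<j} 2p_i²p_j², counts Nu as alleles present as a single gene copy, and omits the sample-size correction that some tools apply to He, so values may differ slightly from the Excel Microsatellite TOOLKIT output. The function `locus_stats` and the genotypes are hypothetical.

```python
from collections import Counter


def locus_stats(genotypes):
    """Diversity parameters for one SSR locus.

    genotypes: list of (allele1, allele2) sizes, one tuple per genotype.
    """
    alleles = [a for pair in genotypes for a in pair]
    counts = Counter(alleles)
    n_copies = len(alleles)
    freqs = [c / n_copies for c in counts.values()]

    na = len(counts)                                # number of alleles
    nu = sum(1 for c in counts.values() if c == 1)  # alleles seen once
    ho = sum(a1 != a2 for a1, a2 in genotypes) / len(genotypes)
    sum_p2 = sum(p * p for p in freqs)
    he = 1 - sum_p2                                 # expected heterozygosity (uncorrected)
    pic = he - sum(
        2 * freqs[i] ** 2 * freqs[j] ** 2
        for i in range(na)
        for j in range(i + 1, na)
    )
    return {"Na": na, "Nu": nu, "Ho": ho, "He": he, "PIC": pic}


# Hypothetical genotypes at one locus (allele sizes in bp).
stats = locus_stats([(100, 102), (100, 100), (102, 104), (100, 104)])
print(stats)
```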
Genetic structure in the whole dataset and in each collection was determined using the model-based clustering method implemented in STRUCTURE v.2.3.4 [74]. Bayesian analysis was run under the admixture model with a burn-in period of 200,000 iterations and a post-burn-in simulation length of 1,000,000 iterations, assuming correlated allele frequencies. Analyses were run for K = 1 to 8 clusters, with 10 replicates per K value. The most likely number of clusters was determined using the ad hoc ΔK statistic [75] computed in R [76], whereas the similarity index between the 10 replicates for each K (H′) was calculated with CLUMPP v1.1.2 (Greedy algorithm [77]).
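The ΔK statistic [75] can be computed directly from the replicate ln P(D) values reported by STRUCTURE. The sketch below is not the R script used in the study: it assumes consecutive K values and computes the second-order rate of change of ln P(D) from replicate means (a common convention), divided by the standard deviation of the replicates at each K. The function `delta_k` and the input values are hypothetical.

```python
from statistics import mean, stdev


def delta_k(lnp_by_k):
    """ΔK for each interior K.

    lnp_by_k: {K: [ln P(D) of each replicate run]}, with consecutive K values.
    ΔK(K) = |L(K+1) - 2 L(K) + L(K-1)| / sd(L(K)), using replicate means for L.
    """
    ks = sorted(lnp_by_k)
    mean_l = {k: mean(lnp_by_k[k]) for k in ks}
    result = {}
    for k in ks[1:-1]:  # ΔK is undefined for the smallest and largest K
        second_diff = abs(mean_l[k + 1] - 2 * mean_l[k] + mean_l[k - 1])
        sd = stdev(lnp_by_k[k])
        result[k] = second_diff / sd if sd > 0 else float("inf")
    return result


# Hypothetical ln P(D) values for two replicates per K.
runs = {1: [-100.0, -101.0], 2: [-80.0, -82.0], 3: [-75.0, -76.0], 4: [-74.0, -74.5]}
scores = delta_k(runs)
best_k = max(scores, key=scores.get)
print(best_k, scores)
```

ΔK only ranks candidate K values; as in the text, replicate consistency (H′ from CLUMPP) should be checked separately.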
To describe the distribution of genotypes in genetic space, a principal coordinate analysis (PCoA) was performed with DARWIN v.5.0.137 [78] on SSR data using the simple matching coefficient [79]. For genotypes showing molecular variants in the two collections, SSR data were converted into a binary (0/1) matrix and a dendrogram was generated using the Dice similarity index [80] and the UPGMA method with NTSYS-PC v2.02 [81].
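The binary recoding and UPGMA clustering can be reproduced outside NTSYS-PC, for example with SciPy: `pdist(..., metric="dice")` returns the Dice dissimilarity (1 minus the Dice similarity) on boolean vectors, and `linkage(..., method="average")` is UPGMA. This is a sketch, not the NTSYS-PC workflow; the helper names `to_binary` and `upgma_dice`, the profile layout and the accession names are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, linkage
from scipy.spatial.distance import pdist


def to_binary(profiles):
    """Recode {name: {locus: (a1, a2)}} as a presence/absence matrix.

    Each column is one (locus, allele size) combination.
    """
    names = sorted(profiles)
    columns = sorted({(locus, a) for prof in profiles.values()
                      for locus, pair in prof.items() for a in pair})
    index = {col: j for j, col in enumerate(columns)}
    matrix = np.zeros((len(names), len(columns)), dtype=bool)
    for i, name in enumerate(names):
        for locus, pair in profiles[name].items():
            for a in pair:
                matrix[i, index[(locus, a)]] = True
    return names, matrix


def upgma_dice(profiles):
    """UPGMA tree from Dice dissimilarities; also returns the leaf order."""
    names, matrix = to_binary(profiles)
    dist = pdist(matrix, metric="dice")     # 1 - Dice similarity
    tree = linkage(dist, method="average")  # "average" linkage is UPGMA
    leaves = dendrogram(tree, labels=names, no_plot=True)["ivl"]
    return names, dist, tree, leaves


# Hypothetical single-locus profiles (real data would use all 20 loci).
profiles = {
    "var_A": {"DCA09": (172, 186)},
    "var_A_variant": {"DCA09": (172, 186)},
    "var_B": {"DCA09": (194, 208)},
}
names, dist, tree, leaves = upgma_dice(profiles)
```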
Comparisons between the two collections were based on: (1) accession denomination; (2) the number of genotypes and cultivars in common; (3) genetic parameters such as the number of alleles (Na) and Nei's diversity index (He); (4) the distribution of genetic distances between shared genotypes and those specific to each collection, using the Smouse and Peakall index [82] in GENALEX 6.5 [83]; (5) allelic richness (Ar [84]); and (6) the genetic structure within both collections, using the model-based Bayesian clustering approach implemented in STRUCTURE. Allelic richness (Ar) was computed by generalized rarefaction at a standardized sample size G using the ADZE program [85]. Significant differences in rarefied Ar and He were tested with the Mann-Whitney test (p < 0.05) in PAST [86].
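Rarefied allelic richness rests on the expected number of distinct alleles in a random subsample of g gene copies, Ar(g) = Σ_i [1 − C(N − N_i, g) / C(N, g)], where N is the number of gene copies at a locus and N_i the count of allele i. The sketch below applies this formula per locus and compares two collections with SciPy's `mannwhitneyu`. It omits ADZE's handling of missing data and its averaging across loci; the functions `rarefied_richness` and `compare_collections` are hypothetical helpers.

```python
from math import comb

from scipy.stats import mannwhitneyu


def rarefied_richness(allele_counts, g):
    """Expected number of distinct alleles in a random subsample of g gene copies."""
    n = sum(allele_counts)
    if not 0 < g <= n:
        raise ValueError("g must be between 1 and the number of gene copies")
    total = comb(n, g)
    # comb(n - n_i, g) is 0 when fewer than g copies lack allele i
    return sum(1 - comb(n - n_i, g) / total for n_i in allele_counts)


def compare_collections(counts_a, counts_b, g):
    """Mann-Whitney test on per-locus rarefied Ar of two collections.

    counts_a, counts_b: one list of allele counts per locus for each collection.
    """
    ar_a = [rarefied_richness(c, g) for c in counts_a]
    ar_b = [rarefied_richness(c, g) for c in counts_b]
    p_value = mannwhitneyu(ar_a, ar_b, alternative="two-sided").pvalue
    return ar_a, ar_b, p_value
```

When g equals the full number of gene copies, every allele is retained and Ar reduces to the observed number of alleles, which is a convenient sanity check; Python's exact integer arithmetic keeps `comb` stable even for G = 400.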
Core collections representative of the genetic diversity in the whole dataset were constructed with the two-step method described by El Bakkali et al. [23], combining the MSTRAT [87] and CORE HUNTER [88] programs with the Maximization [89] and 'Sh' strategies, respectively. Fifty final core collections were generated independently, and one of them was arbitrarily selected and described.
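The idea behind the maximization strategy, namely retaining a small set of genotypes that captures all observed alleles, can be illustrated with a simple greedy heuristic that also accepts a kernel of pre-selected entries. This is a swapped-in, simplified technique: neither MSTRAT's iterative search nor CORE HUNTER's 'Sh' (Shannon diversity) objective is reproduced, and the function `greedy_core`, the genotype names and the alleles are hypothetical.

```python
def greedy_core(profiles, kernel=()):
    """Greedy allele-coverage core collection.

    profiles: {genotype: set of alleles, e.g. {(locus, size), ...}}.
    kernel: genotypes forced into the core before the greedy search.
    Returns (core, captured_alleles).
    """
    core = list(kernel)
    captured = set().union(*(profiles[g] for g in core))
    all_alleles = set().union(*profiles.values())
    while captured != all_alleles:
        candidates = [g for g in sorted(profiles) if g not in core]
        # max() keeps the first (alphabetically smallest) genotype on ties
        best = max(candidates, key=lambda g: len(profiles[g] - captured))
        core.append(best)
        captured |= profiles[best]
    return core, captured


# Hypothetical genotypes with alleles coded as integers.
profiles = {"A": {1, 2, 3}, "B": {3, 4}, "C": {5}, "D": {1, 4}}
core, captured = greedy_core(profiles)
core_with_kernel, _ = greedy_core(profiles, kernel=["C"])
```

The loop always terminates: while an allele is uncaptured, some genotype outside the core carries it, so each step adds at least one new allele. Passing a kernel mirrors how a primary core collection can be reused as the starting set of a larger, nested one.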

Characterization of WOGBM
SSR polymorphism. Using the 20 SSR loci, a total of 370 alleles were observed in the WOGBM collection, 49 of which were observed only once. These alleles were checked by re-amplification to confirm the corresponding SSR profiles. The number of alleles ranged from 6 for DCA15 to 35 for DCA10, with a mean of 18.5 alleles per locus (Table 2). Allele frequencies ranged from 0.09% to 70.5%, and 203 alleles (54.8%) had a frequency below 1%.
Cultivar identification using SSR markers and morphological traits. The 20 SSR markers identified 402 unique genotypes, differing by at least one allele, among the 554 WOGBM accessions (Table 1). In line with Trujillo et al. [21], each genotype was coded with an ordinal number (S1 and S3 Tables). The 554 accessions were classified as follows: (i) 323 accessions had unique SSR profiles (not duplicated in WOGBM), and (ii) 231 accessions shared their SSR profiles with other accessions in the collection, corresponding to 79 different SSR profiles. Three SSR markers (DCA09, UDO099-043 and DCA16) were able to discriminate 84% of the accessions, six markers (DCA09, UDO099-043, DCA16, DCA10, DCA04 and GAPU103) discriminated 95%, and eight additional markers (DCA03, DCA08, DCA18, GAPU71B, DCA11, UDO099-011, DCA01 and DCA15) were needed to distinguish all WOGBM accessions.
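The text does not state how these marker subsets were chosen. One plausible way to rank loci by discriminating power is greedy forward selection: at each step, add the locus that leaves the largest fraction of genotypes with a unique multilocus profile. The sketch below implements that assumption only; the functions `resolved_fraction` and `greedy_marker_order`, the loci and the genotypes are hypothetical.

```python
from collections import Counter


def resolved_fraction(genotypes, loci):
    """Fraction of genotypes whose profile over `loci` is unique.

    genotypes: list of {locus: (a1, a2)} dictionaries.
    """
    keys = [tuple(tuple(sorted(g[locus])) for locus in loci) for g in genotypes]
    counts = Counter(keys)
    return sum(1 for k in keys if counts[k] == 1) / len(genotypes)


def greedy_marker_order(genotypes, loci):
    """Add loci one at a time, always picking the most discriminating next one.

    Returns [(locus, cumulative fraction resolved), ...].
    """
    chosen, remaining, history = [], list(loci), []
    while remaining:
        best = max(remaining,
                   key=lambda locus: resolved_fraction(genotypes, chosen + [locus]))
        chosen.append(best)
        remaining.remove(best)
        history.append((best, resolved_fraction(genotypes, chosen)))
        if history[-1][1] == 1.0:
            break  # every genotype is already distinguished
    return history


# Hypothetical genotypes at two loci.
genotypes = [
    {"L1": (1, 1), "L2": (5, 5)},
    {"L1": (1, 1), "L2": (6, 6)},
    {"L1": (2, 2), "L2": (5, 5)},
    {"L1": (3, 3), "L2": (5, 5)},
]
history = greedy_marker_order(genotypes, ["L1", "L2"])
```

Greedy selection is fast but not guaranteed to find the smallest possible subset; an exhaustive search over small combinations of the 20 loci would be feasible if optimality mattered.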
Based on endocarp traits, a total of 251 different morphological profiles were identified and coded with ordinal numbers (S1 and S4 Tables). The accessions were classified as follows: (i) 164 accessions had unique morphological profiles (not duplicated in WOGBM), and (ii) 354 accessions shared morphological profiles with other accessions in the collection, corresponding to 87 different morphological profiles.
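The unique-versus-shared bookkeeping used for both SSR and endocarp profiles amounts to grouping accessions by identical profile. A minimal sketch, assuming each accession's profile is encoded as a hashable tuple (for endocarps, the states of the scored traits); the function `classify_profiles`, the accession codes and the trait states are hypothetical.

```python
from collections import defaultdict


def classify_profiles(profiles):
    """Group accessions that share an identical profile.

    profiles: {accession: hashable profile, e.g. a tuple of trait states}.
    Returns (unique_accessions, shared_groups).
    """
    groups = defaultdict(list)
    for accession, profile in profiles.items():
        groups[profile].append(accession)
    unique = sorted(accs[0] for accs in groups.values() if len(accs) == 1)
    shared = sorted(sorted(accs) for accs in groups.values() if len(accs) > 1)
    return unique, shared


# Hypothetical two-trait endocarp profiles.
unique, shared = classify_profiles({
    "acc1": ("ovoid", "rough"), "acc2": ("ovoid", "rough"),
    "acc3": ("elliptic", "smooth"), "acc4": ("spherical", "rough"),
    "acc5": ("ovoid", "rough"),
})
n_distinct = len(unique) + len(shared)
n_in_shared = sum(len(group) for group in shared)
```

The number of distinct profiles is the number of unique accessions plus the number of shared groups, which is how 164 unique profiles and 87 shared profiles add up to the 251 morphological profiles reported for WOGBM.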

Characterization of WOGBC
SSR polymorphism. A total of 346 alleles were observed in WOGBC, with a mean of 17.3 alleles per locus; 189 of these alleles (54.62%) had a frequency below 1%, and 44 alleles were observed only once (Table 2). The number of alleles ranged from 6 for UDO99-17 to 36 for DCA10.
Cultivar identification using SSR markers and morphological traits. Using 20 SSR markers, the 537 WOGBC profiles were classified into 400 different genotypes, coded with ordinal numbers as reported by Trujillo et al. [21] (S1 and S3 Tables). Only 15 SSR profiles that were distinct with the 33 markers analyzed by Trujillo et al. [21] were grouped with other profiles using the current set of markers (S1 Table). For instance, Trujillo et al. [21] identified 239 different genotypes originating from Spain, whereas the 20 SSR markers revealed only 232 genotypes in the current analysis. Furthermore, Trujillo et al. [21] identified the "Alameño de Marchena" (COR000254) and "Zarza" (COR000038) accessions as different from "Picholine Marocaine" and "Lechín de Sevilla", respectively. These differences were due to dissimilar alleles at loci not included in the current set of markers: two alleles at GAPU82 and UDO-42 for the first accession, and UDO-05 for the second. Among the 400 different genotypes, 320 corresponded to unique SSR profiles (not duplicated), whereas 217 accessions shared SSR profiles, corresponding to 80 different SSR profiles.
Based on SSR markers and morphological traits. In the combined WOGBM and WOGBC dataset, 407 alleles were revealed with the 20 SSR markers, with a mean of 20.35 alleles per locus. Of these 407 alleles, 309 were common to both collections, whereas 61 and 37 were specific to WOGBM and WOGBC, respectively (Table 2). A total of 43 unique alleles (observed only once) were detected in 54 genotypes (27 in WOGBM, 21 in WOGBC and 6 shared by both collections). No significant difference was observed between the two collections in allelic richness, computed at G = 400, or in the diversity index (He; Mann-Whitney test, p > 0.05; Table 2).
The distributions of pairwise dissimilarity between genotypes were similar in both WOGBs (S1 Fig). Only 1054 (0.68%) and 610 (0.42%) pairwise comparisons, in WOGBM and WOGBC respectively, involved closely related genotypes differing by one to four alleles, whereas the remaining pairs differed by 5 to 39 alleles. The highest SSR dissimilarity (39 distinct SSR alleles) was observed for only two genotype pairs in WOGBM and six pairs in WOGBC.
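Counting dissimilar alleles between two diploid SSR profiles can be done locus by locus as 2 minus the number of alleles the two genotypes share (a multiset intersection), summed over loci, so 20 loci give a maximum of 40. This counting rule is our assumption about how mismatches are scored; the functions `mismatched_alleles`, `mismatch_distribution` and `close_pair_fraction` and the profiles are hypothetical.

```python
from collections import Counter
from itertools import combinations


def mismatched_alleles(g1, g2):
    """Dissimilar alleles between two diploid profiles {locus: (a1, a2)}."""
    total = 0
    for locus in g1:
        shared = Counter(g1[locus]) & Counter(g2[locus])  # multiset intersection
        total += 2 - sum(shared.values())
    return total


def mismatch_distribution(profiles):
    """Histogram {number of dissimilar alleles: number of pairs} over all pairs."""
    return Counter(mismatched_alleles(a, b) for a, b in combinations(profiles, 2))


def close_pair_fraction(distribution, low=1, high=4):
    """Fraction of pairs differing by low..high alleles (e.g. 1 to 4)."""
    n_pairs = sum(distribution.values())
    return sum(n for k, n in distribution.items() if low <= k <= high) / n_pairs


# Hypothetical two-locus profiles.
profiles = [
    {"L1": (100, 102), "L2": (150, 150)},
    {"L1": (100, 104), "L2": (150, 150)},
    {"L1": (104, 106), "L2": (152, 154)},
]
dist = mismatch_distribution(profiles)
```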
The analysis of both datasets (1091 olive trees) revealed 672 different SSR profiles. Three SSR markers (DCA09, DCA16 and UDO099-043) were able to distinguish 77% of the identified genotypes, whereas six markers (DCA04, DCA09, DCA10, DCA16, UDO099-043 and GAPU103) discriminated 94%. The 672 genotypes were classified as: (i) 130 SSR profiles common to both collections, corresponding to a total of 436 accessions (213 in WOGBM and 223 in WOGBC), and (ii) 542 genotypes specific to WOGBM (272) or WOGBC (270), corresponding to 655 accessions (341 in WOGBM and 314 in WOGBC; Tables 1 and 3, S1 and S3 Tables). The highest number of shared genotypes was found within the Spanish germplasm (85 genotypes), followed by the Italian germplasm (12 genotypes; Table 1).
A total of 233 alleles were observed within the 130 shared genotypes, with a mean of 11.65 alleles per locus, whereas 358 and 329 alleles were observed within the genotypes specific to WOGBM (272) and WOGBC (270), respectively (Table 3 and S5 Table). A similar distribution of genetic distances was observed within the three groups (shared genotypes and genotypes specific to each collection; S2 Fig). The PCoA plot revealed an even distribution of the shared genotypes along both axes, which together accounted for 12.34% of the total genetic variation, whereas those originating from Spain were clustered (S3 Fig). No significant difference in allelic richness was observed between the shared genotypes and those specific to each collection (Mann-Whitney test; p > 0.05). However, He differed significantly between the shared genotypes and those specific to WOGBM (p < 0.05; Table 3 and S5 Table), whereas no significant difference in He was found between the genotypes specific to each collection.
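A PCoA of the kind summarized above can be sketched with NumPy: compute a distance between genotypes, double-center the squared distance matrix and take its leading eigenvectors. Here the per-locus distance is the proportion of non-shared alleles averaged over loci, which is our assumed reading of a simple matching index for diploid SSR data and may not match DARwin's exact definition; the share of variation per axis is each positive eigenvalue over their sum, one common convention. All function names are hypothetical.

```python
from collections import Counter

import numpy as np


def simple_matching_distance(g1, g2):
    """Mean per-locus proportion of non-shared alleles between diploid profiles."""
    per_locus = []
    for locus in g1:
        shared = sum((Counter(g1[locus]) & Counter(g2[locus])).values())
        per_locus.append(1 - shared / 2)
    return sum(per_locus) / len(per_locus)


def pcoa(dist):
    """Classical PCoA (metric MDS) of a square distance matrix."""
    d2 = np.asarray(dist, dtype=float) ** 2
    n = d2.shape[0]
    center = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * center @ d2 @ center        # Gower's double-centered matrix
    vals, vecs = np.linalg.eigh(b)         # eigenvalues in ascending order
    order = np.argsort(vals)[::-1]
    vals, vecs = vals[order], vecs[:, order]
    keep = vals > 1e-10 * vals.max()       # drop null and negative axes
    coords = vecs[:, keep] * np.sqrt(vals[keep])
    explained = vals[keep] / vals[keep].sum()
    return coords, explained


def pcoa_from_profiles(profiles):
    """PCoA of a list of {locus: (a1, a2)} profiles."""
    n = len(profiles)
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            dist[i, j] = dist[j, i] = simple_matching_distance(profiles[i], profiles[j])
    return pcoa(dist)
```

Non-Euclidean distances can produce negative eigenvalues; the sketch simply discards them, whereas some programs apply a correction instead.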
Based only on endocarp traits, a total of 371 different morphological profiles were identified in both collections (S1 and S4 Tables). Among these accessions: (i) 158 accessions were classified as 120 unique morphological profiles in WOGBC; (ii) 204 accessions as 126 unique
Table 3. Comparison of genotypes shared between the two collections and those specific to each one: number of alleles (Na), allelic richness (Ar) and diversity index (He).
Based on genetic structure. Using the whole dataset (672 genotypes) and the data for each collection (402 genotypes for WOGBM and 400 for WOGBC), the genetic structure was examined under models with K = 2 to 8 clusters. According to ΔK and H′, K = 3 was the most probable genetic structure model for WOGBM, as previously reported by El Bakkali et al. [23] (ΔK = 1169.98 and H′ = 0.998), whereas for WOGBC, K = 2 followed by K = 3 were the most probable models (Fig 1). With all 672 genotypes, the K = 2 Fig 1). Most olive genotypes from Morocco, Portugal and Spain were distinguished from other Mediterranean olives, whereas genotypes from France, Algeria, Albania, Tunisia, Italy, Slovenia, Croatia, Greece and Egypt were mostly assigned to a second cluster (named central Mediterranean), distinct from a third cluster that included genotypes from the eastern Mediterranean region. A clear distinction between the three clusters was observed when the 672 genotypes, with membership probabilities of Q < 0.8, were plotted using PCoA (Fig 2).

Cultivar identification process in both WOGBs
Cultivar identification in WOGBM was carried out using nuclear SSR markers, complemented by morphological traits. For accessions without endocarp data, SSR markers were the only available criterion, so their identification was left pending, as indicated in S1 Table (pending identification). We defined a cultivar as a group of accessions sharing the same molecular and morphological profiles. Accessions were considered molecular variants of an identified cultivar when their SSR profiles differed minimally (1 to 4 mismatched alleles) despite similar morphological profiles. We confirmed the varieties authenticated by Trujillo et al. [21] and propose a new set of authenticated varieties based on their matching molecular and morphological profiles with varieties in WOGBC. Mislabeling errors in WOGBM were recorded when an accession did not match its putative cultivar identified in WOGBC, taking into account the cultivar name and area of cultivation. Synonymy was considered when identified cultivars displayed similar (or close) SSR and morphological profiles and had the same or a nearby area of cultivation, within and/or between collections. Similarly, homonymy was considered when identified cultivars with the same name displayed different SSR and morphological profiles, again taking the area of cultivation into account within and/or between collections.
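The SSR and endocarp criteria above can be summarized, for a pair of accessions, in a small function. This is a hypothetical encoding for illustration only: it ignores cultivar names and areas of cultivation (used for mislabeling, synonymy and homonymy decisions), and the 4-allele threshold is exposed as a parameter named `max_variant_mismatches`.

```python
from typing import Optional


def classify_pair(ssr_mismatches: int, same_endocarp: Optional[bool],
                  max_variant_mismatches: int = 4) -> str:
    """Relate two accessions from their SSR mismatch count and endocarp match.

    same_endocarp is None when endocarp data are missing for either accession.
    """
    if same_endocarp is None:
        return "pending identification"  # SSR data alone are not conclusive
    if not same_endocarp:
        return "different cultivars"     # morphology separates them
    if ssr_mismatches == 0:
        return "same cultivar"
    if ssr_mismatches <= max_variant_mismatches:
        return "molecular variant"       # 1 to 4 mismatches, same endocarp
    return "different cultivars"


print(classify_pair(2, True))
```

Identical SSR profiles with different endocarps (as with "Zarza"/"Lechín de Sevilla") fall under "different cultivars" in this encoding, consistent with defining a cultivar by both profiles.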
Finally, we were able to authenticate a total of 120 cultivars in WOGBM. Of these, 110 had previously been authenticated in WOGBC among the panel of 200 cultivars described by Trujillo et al. [21], including "Picholine marocaine" from Morocco, "Amargoso" from Spain and "Zaity" from Syria (S1 and S4 Tables). The remaining 10 cultivars, not previously authenticated by Trujillo et al. [21], were found to be authentic because they displayed SSR and morphological profiles similar to those in WOGBC. For instance, the "Chemlal de Kabilye" cultivar from Algeria and "Chalkidikis" and "Kolybada" from Greece were authenticated in the current study because they showed similar SSR and morphological profiles in both collections (S1 and S4 Tables).
Plantation mislabeling and molecular variants. The identification process revealed 79 cases of mislabeled plantations in WOGBM, compared with 120 in WOGBC (S1 Table). For instance, the "Chemlali" accessions from Tunisia (MAR00301 and MAR00296) were detected as plantation errors because they showed profiles similar to those of the well-known "Koroneiki" and "Arbequina" cultivars, respectively, and different from the profile of "Chemlali" (COR000744) in WOGBC (S1 Table). Similarly, the "Varudo" accession (MAR00275) differed from its putative cultivar in WOGBC. Moreover, 228 genotypes were considered molecular variants of 74 cultivars (1 to 4 dissimilar alleles) because no endocarp morphological differences were observed between them (Fig 3, S6 Table). The highest number of cultivars with molecular variants was observed among Spanish cultivars (29 cultivars, 92 SSR profiles), followed by Italian (20 cultivars, 66 SSR profiles) and Syrian cultivars (9 cultivars, 25 SSR profiles). These 74 cultivars were classified as follows: (i) 13 cultivars observed only in WOGBM, with 37 SSR profiles; (ii) 9 cultivars observed only in WOGBC, with 21 SSR profiles; and (iii) 52 cultivars shared between the two collections, with a total of 170 SSR profiles (Fig 3, S6 Table).

Core collection sampling
Following the two-step method proposed by El Bakkali et al. [23], 110 individuals (16.3%) were needed to capture the 407 alleles of the whole dataset (672 genotypes) using the MSTRAT program. CORE HUNTER was then run with the 'Sh' strategy at a sample size of 8.2% (half of the initial sample size) to sample a primary core collection (CC55; S9 Table). These 55 entries captured 328 alleles (80.6%); only 7 of them were genotypes shared by both collections, while the others came from either WOGBM (33 genotypes) or WOGBC (17). No genotype assigned to the western gene pool was selected in CC55 (Table 4).
The primary CC55 was used as a kernel in MSTRAT to capture the remaining alleles. A total of 121 entries (CC121; 18%) were sufficient to capture the total diversity, and 50 core collections of 121 entries were generated using MSTRAT (S9 Table, S4 Fig). The Nei diversity index (He) did not differ among the 50 independent runs. In addition to the 55 kernel genotypes, 32 genotypes carrying unique alleles were selected in all 50 runs, while a combination of 34 complementary genotypes, drawn from a panel of 131 genotypes, was needed to capture the total diversity (S9 Table). Moreover, 29 varieties that were not selected in any of the 50 runs, including the most widely cultivated ones, were added to account for variability in morphological traits. These 29 varieties could be added to CC121 as supplementary genotypes to form a final core collection of 150 entries (CC150; S9 Table).
One CC121 core collection, arbitrarily selected among the 50 generated using MSTRAT (S9 Table), was complemented with the 29 added cultivars. The 150 sampled genotypes belonged to 16 of the 23 countries analyzed herein (Table 4). Only 39 of the 150 genotypes of CC150 were shared between the two WOGBs, while the others came from either WOGBM (69 genotypes) or WOGBC (42). Based on the membership probabilities (Q) estimated with STRUCTURE, only 16 genotypes were assigned to the western genetic cluster, compared with 42 and 31 assigned to the central and eastern genetic clusters, respectively. When the 150 selected entries were plotted by genetic cluster assignment using PCoA, the sampled genotypes spanned the entire range of genotypes in the whole dataset across the three genetic clusters (Fig 2).

Discussion
The main purpose of this study was to develop a consensus database for Mediterranean olive cultivars, based jointly on molecular markers and endocarp traits, to serve as an efficient tool for scientists conducting research on breeding and adaptation, as well as for other users of local genetic resources. Germplasm banks are crucial for breeding programs, as they offer variability in target agronomic traits and allelic variation in genes linked to these traits. Developing reference standards and a well-defined nomenclature system that resolves homonymy, synonymy and molecular variants is essential for making effective use of olive genetic resources. Here we discuss our results in terms of: (i) the efficiency of the joint use of molecular markers and endocarp descriptors for olive cultivar characterization; (ii) the relevance of the genetic resources available in worldwide collections; and (iii) the importance of this information, and of establishing a core collection, for the scientific community and other users of olive genetic resources.

Combining SSR markers and endocarp descriptors to efficiently characterize olive cultivars
Many studies have revealed the efficiency of microsatellite markers for characterizing olive cultivars [14,17,21,22,51,52,63,64]. The 20 SSR markers used here were selected for their high polymorphism, clear amplification and reproducible patterns, as reported by many authors [21,22,23,51,52]. Among the 33 SSR loci used by Trujillo et al. [21] for WOGBC genotyping, a subset of only 17 was able to discriminate the 411 identified genotypes, and 14 loci were shared with the 20 loci used in the current study. Moreover, eight of the 20 loci were used by Sarri et al. [51] to discriminate between 118 cultivars, whereas 11 were selected among 37 SSR loci by Baldoni et al. [52] as a consensus list of microsatellites recommended for genotyping 77 olive cultivars. The 20 SSR markers used in the present study discriminated 400 of the 411 genotypes identified in WOGBC with 33 SSR markers. The 11 genotypes not distinguished here corresponded to 15 olive accessions, most of which Trujillo et al. [21] considered to be molecular variants and/or synonyms of well-known cultivars, e.g. "Alameño de Marchena" (COR000254) for "Picholine Marocaine". The only two exceptions involved four cultivars with distinct morphological traits, "Zarza" (COR000038)/"Lechin de Sevilla" (COR000005) and "Pulazeqin" (COR001085)/"Itrana" (COR000068), which appeared identical with 20 instead of 33 loci. Using fewer loci therefore did not substantially diminish discrimination power, as only 15 of the 537 profiles were grouped.
Characterizing large collections with a high number of loci is costly. A subset of SSR markers able to discriminate all cultivars is thus essential to enable olive researchers to compare local olive diversity with known cultivars. The first step would be to eliminate duplicate samples, which could be done using a minimum number of SSR loci. Here we aimed to define a minimum subset of SSR loci by exploring both collections. Among the 20 SSR loci used, 3 and 6 markers discriminated 77% and 94% of the 672 genotypes present in both WOGBs, respectively. By comparison, Trujillo et al. [21] proposed sets of 5 and 10 markers to distinguish 79% and 93% of the cultivars present in WOGBC, respectively. Of the 6 markers proposed here, five (UDO-43, DCA04, DCA09, DCA16 and GAPU103) are shared with the 10 proposed by Trujillo et al. [21], and these five markers alone were sufficient to distinguish 91.5% of the genotypes (615 of 672). This set of five markers has been widely used in molecular characterization, including parentage analysis [53].
Molecular characterization was complemented by 11 morphological descriptors of endocarp traits in both collections. Endocarp traits are still considered the most discriminating and stable olive morphological traits, as they are little influenced by environmental conditions and can be evaluated easily and quickly, even in the field. Moreover, endocarps can be conserved in the long term and used as references. Endocarp descriptors have thus been widely used for olive cultivar identification and, through the study of stones from archeological sites, for clarifying domestication and diversification processes [26,27,90,91,92,93]. However, we used SSR genotyping for the initial classification of genotypes, complemented by endocarp traits, for three reasons: (i) the higher discrimination power of SSRs (672 molecular profiles) compared with endocarp traits (371 morphological profiles); (ii) scoring discrepancies in endocarp traits owing to their qualitative nature, which can lead to confusion and misclassification of cultivars between observers; and (iii) phenotypic changes due to genetic differences may be expressed in organs other than the endocarp (fruit, leaves, etc.). Nevertheless, endocarp traits remain very useful for genetically similar or close cultivars revealed by SSR markers, e.g. "Zarza"/"Lechin de Sevilla", "Azulejo"/"Manzanilla Cacereña", "Fouji vert"/"Besbassi", "Olivastra di Montalcino"/"Mortellino" and "Pulazeqin"/"Itrana".
Molecular variants have been regularly reported in olive cultivars [21,28–31], in relict wild olive trees in the Hoggar Mountains [94] and even in other fruit crops such as grapevine [31,95] and fig [32]. This slight allelic variation has been noted in olive cultivars that are widely grown in several olive growing areas, e.g. "Frantoio" (Italy), "Picholine Marocaine" (Morocco) and "Picual" (Spain). It has also been observed for ancient cultivars grown over several periods of history, e.g. "Picholine Marocaine" [30,96], "Cobrançosa" (Portugal) [29], "Manzanilla de Sevilla" (Spain) [97], "Gemlik" (Turkey) [98] and "Carolea" (Italy) [99]. These cultivars have undergone massive clonal propagation on a spatial or temporal scale, or both. It can thus be assumed that massive clonal propagation can lead to slight allelic variations, especially at SSR loci. Indeed, these markers are considered mutational hotspots because of their short repetitive units [100], and most variations occur at dinucleotide loci with abundant GA repeat units, which are more susceptible to mutation and slippage [101]. Accordingly, 10 of the 17 loci showing intra-varietal molecular variation had abundant GA repeat units, and these variations were found in massively clonally propagated cultivars (Fig 3, S10 Table).
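To make "abundant GA repeat units" concrete, a minimal sketch can measure the longest perfect (GA)n run in a locus sequence. The function name and the example sequence are hypothetical; real repeat annotation would also have to handle imperfect and compound repeats, which this sketch ignores.

```python
import re


def longest_repeat_run(sequence, motif="GA"):
    """Length, in repeat units, of the longest uninterrupted tandem run of `motif`.

    Imperfect or compound repeats are ignored in this sketch.
    """
    motif = motif.upper()
    pattern = re.compile(f"(?:{re.escape(motif)})+")
    runs = [len(m.group(0)) // len(motif) for m in pattern.finditer(sequence.upper())]
    return max(runs, default=0)


# Hypothetical sequence containing a (GA)4 run and a (GA)2 run.
print(longest_repeat_run("TTGAGAGAGACCGAGA"))  # 4
```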
Identifying a single SSR profile as a molecular variant of a known cultivar remains challenging, as no consensus approach has been proposed to clarify this issue. Here we proposed 228 genotypes as molecular variants of 74 cultivars showing slight allelic variations (Fig 3, S6 Table). We considered a threshold of fewer than 4 mismatched alleles over the set of 20 SSR loci to be stringent enough to classify close accessions displaying similar endocarp traits within a single cultivar. As allelic variation could extend to up to six mismatched alleles (S1 Fig), we thereby adopted a conservative but robust approach to identifying these molecular variants. Several observations confirmed the relevance of this approach: (i) pairwise comparisons of genotypes revealed that only 0.68% and 0.42% of pairs in WOGBM and WOGBC, respectively, involved closely related genotypes differing by 1 to 4 alleles, whereas the remaining pairs mostly differed by 8 to 39 alleles (S1 Fig); (ii) all genotypes showing fewer than 4 mismatched alleles had similar endocarp profiles, whereas significant differences appeared at 5 mismatched alleles, e.g. "Morchiaio"/"Razzaio" and "Mantonica"/"Cirujal"; (iii) most genotypes showing fewer than 4 mismatched alleles were identified within the same country (e.g. the "Kato Drys" cultivar from Cyprus) or in neighboring countries (e.g. the "Morchiaio" cultivar from Italy and Slovenia), indicating that massive clonal propagation over large growing areas has led to slight molecular variation; and (iv) Trujillo et al. [21], using 33 SSR markers, considered all accessions with a similarity index of 0.91 to 0.99, corresponding to 1 to 5 mismatched alleles, as molecular variants.
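The variant criterion above can be sketched as follows. This is an illustration under stated assumptions, not the exact scoring procedure used in the study: genotypes are stored as {locus: (allele1, allele2)} with homozygotes written as two identical sizes, mismatches are counted per locus as alleles not shared when the two allele pairs are compared as multisets, only loci scored in both genotypes are compared, and "similar endocarp profiles" is reduced to a boolean supplied by the caller. All names are hypothetical.

```python
from collections import Counter


def mismatched_alleles(g1, g2):
    """Count alleles not shared between two diploid SSR genotypes, summed over common loci.

    Allele pairs are compared as multisets, so (100, 102) vs (100, 104) gives 1
    mismatch and (100, 100) vs (102, 104) gives 2.
    """
    total = 0
    for locus in g1.keys() & g2.keys():
        shared = sum((Counter(g1[locus]) & Counter(g2[locus])).values())
        total += 2 - shared
    return total


def is_molecular_variant(g1, g2, same_endocarp, max_mismatch=3):
    """Distinct profiles differing by fewer than 4 alleles (1 to 3) with similar endocarps."""
    return same_endocarp and 0 < mismatched_alleles(g1, g2) <= max_mismatch


# Hypothetical profiles at three loci.
a = {"L1": (100, 102), "L2": (200, 200), "L3": (150, 156)}
b = {"L1": (100, 104), "L2": (200, 200), "L3": (150, 156)}
c = {"L1": (102, 104), "L2": (202, 204), "L3": (152, 158)}
print(mismatched_alleles(a, b), is_molecular_variant(a, b, True))  # 1 True
print(mismatched_alleles(a, c), is_molecular_variant(a, c, True))  # 5 False
```

Identical profiles (0 mismatches) are deliberately excluded, since they correspond to the same genotype rather than to a variant of it.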

Identifying olive cultivars
With more than 1,200 olive varieties from around the Mediterranean Basin found in almost 100 ex-situ collections [13], two major goals have to be fulfilled: (i) characterization to eliminate mislabeling cases and identify synonymy and homonymy cases, and (ii) authentication of cultivars to ensure that accessions in a collection match true-to-type cultivars. However, most studies have focused on the characterization of ex-situ collections, even though cultivar authentication should be a prerequisite for exchanging material among scientists and nurseries. Trujillo et al. [21] defined authenticated cultivars based on their identity with endocarp control samples from their original growing areas and with DNA control samples. They were thus able to authenticate 200 cultivars by comparing endocarps with those of the same cultivar from the countries of origin (172 cultivars) or with both endocarp and DNA control samples (28 cultivars). Even though the combination of molecular and morphological characterization was applied to only 28 of the 200 cultivars, the authentication approach used by Trujillo et al. [21] can be considered efficient, since it targeted well-known Spanish cultivars, which represented 66% of the authenticated cultivars. However, for local and minor cultivars that are less well known and grown only in limited geographic areas, the authentication process should systematically include both molecular and morphological characterization, as previously proposed for French olive germplasm [47]. Indeed, these authors defined a reference genotype for a cultivar when at least three olive trees grouped under the same denomination, presenting similar morphological traits and originating from different collections, nurseries and/or orchards, displayed the same molecular pattern.
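The reference-genotype rule described for French germplasm [47] can be sketched as a simple filter. The record layout is hypothetical (each tree has a denomination, a source, an SSR profile code and a morphological code), "similar morphological traits" is approximated by an identical morphological code, and "different collections, nurseries and/or orchards" is interpreted as at least three distinct sources; a denomination for which two different profiles qualify is treated as ambiguous and dropped.

```python
from collections import defaultdict


def reference_genotypes(trees, min_sources=3):
    """Return {denomination: ssr_profile} for denominations meeting the rule.

    `trees` is a list of dicts with hypothetical keys 'name', 'source',
    'ssr' (hashable profile code) and 'morpho' (hashable morphological code).
    """
    # Distinct sources per (denomination, molecular profile, morphological profile).
    groups = defaultdict(set)
    for tree in trees:
        groups[(tree["name"], tree["ssr"], tree["morpho"])].add(tree["source"])

    candidates = defaultdict(set)
    for (name, ssr, _morpho), sources in groups.items():
        if len(sources) >= min_sources:
            candidates[name].add(ssr)

    # Keep a denomination only if exactly one profile qualifies.
    return {name: next(iter(p)) for name, p in candidates.items() if len(p) == 1}


trees = [
    {"name": "CultivarX", "source": "collection1", "ssr": "P1", "morpho": "M1"},
    {"name": "CultivarX", "source": "nursery1", "ssr": "P1", "morpho": "M1"},
    {"name": "CultivarX", "source": "orchard1", "ssr": "P1", "morpho": "M1"},
    {"name": "CultivarY", "source": "collection1", "ssr": "P2", "morpho": "M2"},
    {"name": "CultivarY", "source": "collection1", "ssr": "P2", "morpho": "M2"},
    {"name": "CultivarY", "source": "nursery1", "ssr": "P2", "morpho": "M2"},
]
print(reference_genotypes(trees))  # {'CultivarX': 'P1'}
```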
Here, given these limits of the authentication process, we adopted a conservative strategy focusing on the panel of 200 cultivars previously authenticated by Trujillo et al. [21]. We were able to authenticate 120 of the 329 cultivars identified in WOGBM, most of which originated from Spain (81 cultivars; 67.5%). All of these 120 cultivars had previously been authenticated in WOGBC by Trujillo et al. [21], except for 10 cultivars whose molecular and morphological profiles matched between the two WOGBs.
Mislabeling within and among accessions can occur at any step of plant establishment in a collection, as reported by many authors for different fruit species [33,47]. Here we identified 72 cases of plantation mislabeling in WOGBM compared to 120 in WOGBC (S1 Table). Using the same approach as Trujillo et al. [21], we assumed a mislabeling error when the profile of a known cultivar did not match its expected profile. Considering cultivars identified with both molecular markers and morphological traits, Trujillo et al. [21] showed that the "Chemlal de Kabylie" (COR000118-Algeria), "Zaity" (COR000788-Syria), "Adkam" (COR001038-Syria) and "Aggezi Shami" (COR000723-Egypt) accessions were mislabeled "Frantoio" trees. Similarly, for the same cultivar, we found three mislabeled accessions: "Fakhfoukha" (MAR00533-Morocco), "Jlot" (MAR00587-Lebanon) and "Beladi" (MAR00583-Lebanon). This highlights the mislabeling issues that can arise from the broad diffusion of well-known clonally propagated cultivars such as "Frantoio".
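The mislabeling criterion can be sketched as a comparison between each accession's profile and the reference profile of the cultivar named on its label. This is a simplified sketch: accession codes, labels and profile codes below are hypothetical, profiles are compared for exact identity (molecular variants would need a mismatch tolerance), and if two reference cultivars shared a profile the reverse lookup would keep only one of them.

```python
def flag_mislabeling(accessions, reference):
    """Flag accessions whose SSR profile differs from the reference profile of their label.

    `accessions`: list of (accession_code, label, profile);
    `reference`: {cultivar: profile} for identified cultivars.
    Returns (accession_code, label, matching_cultivar_or_None) for each mismatch.
    """
    # Reverse lookup: which known cultivar, if any, the observed profile belongs to.
    by_profile = {profile: cultivar for cultivar, profile in reference.items()}
    flagged = []
    for code, label, profile in accessions:
        expected = reference.get(label)
        # Labels without a reference profile cannot be checked and are skipped.
        if expected is not None and profile != expected:
            flagged.append((code, label, by_profile.get(profile)))
    return flagged


reference = {"CultivarA": "P1", "CultivarB": "P2"}
accessions = [
    ("ACC1", "CultivarA", "P1"),  # consistent with its label
    ("ACC2", "CultivarB", "P1"),  # carries CultivarA's profile
    ("ACC3", "CultivarC", "P9"),  # no reference for this label
    ("ACC4", "CultivarA", "P7"),  # unknown profile
]
print(flag_mislabeling(accessions, reference))
```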
The cultivar identification process was helpful for describing homonymy and synonymy cases in both collections. We identified 39 homonymy cases in WOGBM and 36 in WOGBC, for a total of 60 across the two collections, involving 179 identified cultivars; e.g. the "Lentisca" denomination encompassed three distinct genotypes. Many authors have reported homonymy cases in olive [14,21,22,27,102,103]. Here homonymy cases mostly involved well-known and widely grown cultivars such as "Azeradj" (Algeria), "Cornicabra", "Manzanilla" and "Picual" (Spain) and "Toffahi" (Egypt and Syria). This reflects a farming strategy of appropriating the names of well-known cultivars, whereby distinct genotypes were pooled under a single denomination.
Synonymy, on the other hand, is likely the result of plant material dissemination and traditional olive vegetative propagation practices, which have led to complex relationships among cultivars. We identified 175 synonymy cases among 78 cultivars in both collections (e.g. the "Gordal Sevillana" cultivar from Spain was synonymous with "Santa Caterina" from Italy). We highlighted 88 new synonyms in WOGBM or shared between the two collections, including previously undescribed cultivars such as "Kiti", "Meniko" and "Peristerona" from Cyprus as synonyms of "Kato Drys", whereas 87 had already been described by other authors (e.g. "Olivastra di Montalcino" as a synonym of "Olivastra Seggianese", described by Cimato et al. [104]; S7 Table). Many synonymy cases identified in WOGBM agreed with those reported by Trujillo et al. [21] in WOGBC, as they showed similar morphological and molecular profiles in both collections, e.g. "Sigoise"/"Alameño de Marchena"/"Haouzia" for "Picholine marocaine", thus showcasing the power and efficiency of the tools used to characterize and compare the two collections. The higher number of synonymy cases observed in WOGBM compared to WOGBC could be explained by two factors. First, accessions received from many countries during the establishment of WOGBM were characterized in-situ using only morphological traits (fruit, endocarp, leaves, etc.), which could generate more confusion than characterization combining morphological descriptors and molecular markers. For instance, for Cyprus, Syria and Slovenia, 31, 70 and 10 accessions were analyzed, respectively, but only 4 (13%), 42 (60%) and 3 (30%) distinct genotypes were identified (Table 1). Conversely, the accessions from Spain were all distinct, as they had been analyzed with molecular markers before WOGBM was established [46].
Second, the varietal composition of the two collections contrasts, with at least 50% of the germplasm in WOGBC originating from Spain. This is consistent with the fact that most of the synonymy cases identified by Trujillo et al. [21] concerned Spanish cultivars (17 of 30 cultivars). Here, the highest number of synonyms was found for three cultivars that are widely cultivated throughout the Mediterranean Basin, with 13 cases each: "Beladi", "Frantoio" and "Picholine marocaine" (S7 Table). Most synonymy cases were observed within one country or among neighboring countries.
Identifying synonymy cases is essential for managing ex-situ collections by eliminating denomination redundancy. The synonymy cases identified in the present study should be of interest to the olive research community, although further investigations with larger samples and comparisons against several endocarp and DNA control samples are required.
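Homonymy and synonymy are two views of the same name-to-genotype table: a homonym is one denomination covering several distinct genotypes, and a synonym is one genotype carried by several denominations. A minimal sketch, assuming hypothetical (denomination, genotype_id) records in which spelling variants of names have already been harmonized:

```python
from collections import defaultdict


def homonyms_and_synonyms(records):
    """Detect homonymy and synonymy from (denomination, genotype_id) records.

    Homonymy: one denomination covering several distinct genotypes.
    Synonymy: one genotype carried by several distinct denominations.
    """
    genotypes_by_name = defaultdict(set)
    names_by_genotype = defaultdict(set)
    for name, genotype in records:
        genotypes_by_name[name].add(genotype)
        names_by_genotype[genotype].add(name)
    homonyms = {n: g for n, g in genotypes_by_name.items() if len(g) > 1}
    synonyms = {g: n for g, n in names_by_genotype.items() if len(n) > 1}
    return homonyms, synonyms


# Hypothetical records: Name1 is a homonym, Name2/Name3 are synonyms of G3.
records = [("Name1", "G1"), ("Name1", "G2"), ("Name2", "G3"), ("Name3", "G3"), ("Name4", "G4")]
homonyms, synonyms = homonyms_and_synonyms(records)
print(homonyms)
print(synonyms)
```

Genotype identifiers here are assumed to already group molecular variants under one cultivar; otherwise variants of a single cultivar would be reported as spurious homonyms.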
In perennial clonally propagated fruit species, a cultivar is defined as a group of similar plants selected for one or more characters of interest that are distinct, uniform and stable. It is thus not surprising to observe molecular variants, synonymy and homonymy within cultivars, resulting from vegetative propagation and the long-term diffusion of domesticated olive through farming practices. Here, within the two collections, we identified a total of 535 cultivars among 672 genotypes, 210 of which are considered authenticated cultivars. We also defined a set of six SSR markers as an efficient tool for discriminating almost all of the diversity present in both collections. The database generated and the approach used in our study could be applied to compare and harmonize more collections at national (Italy-Cosenza, [14]; Turkey-Izmir, [15]; Greece-Chania, [16]; France-Porquerolles, [47]) and international (USA-Davis, [17]; Argentina-Mendoza, [18]) levels. Setting up an international consortium for the identification, authentication and cataloguing of more than 1,200 cultivars across the Mediterranean Basin, under a common protocol based on the six SSR loci selected in the present study and the 11 endocarp traits, is a necessary step. The expected results would consolidate cooperation between scientists conducting research on olive breeding and other users of olive genetic resources. Our study will support the establishment of the third worldwide collection in the eastern Mediterranean Basin (Izmir-Turkey) and will help to update and extend the Olea database [11], which is the most comprehensive global olive science portal to date.
The identification and authentication of cultivars in the two largest worldwide olive germplasm collections will ensure sustainable use of local genetic resources. In fact, studies on the resilience of true-to-type cultivars in different environments, using germplasm collections across the IOOC network, will help guarantee the future of the crop under different climate change scenarios. The use of true-to-type and healthy plant material of selected local cultivars by commercial nurseries and farmers will promote the use of authentic cultivars through protected designations of origin (PDO).

The importance of pooling the two collections
The two WOGB collections have distinct histories. WOGBC, the top olive germplasm bank in the world, was established in 1970 and contains 499 accessions from 21 countries, most of which originated from Spain (56%) [21]. WOGBM, the second-ranking olive germplasm bank, was established 33 years later and contains accessions from 14 countries. Unlike WOGBC, WOGBM was set up in a scientific setting with more knowledge available about the plant material, and previously characterized genetic resources were introduced from each Mediterranean country [22]. Therefore, the Marrakech collection has a more balanced representation of accessions from Mediterranean olive growing countries such as Morocco, Algeria, Tunisia, Egypt and Cyprus, whereas the Cordoba collection has accessions from Albania, Turkey and Iran that are not included in the Marrakech collection. Hence, the two collections are complementary rather than redundant, as supported by the following observations. First, only 178 of 713 accession names are shared between the two collections. Only 130 of the 672 genotypes (about 19%) are in common, with a dominance of Spanish olive germplasm. Forty-eight discrepancy cases were identified, including "Meski", "Lentisca" and "Varudo", which could mainly be explained by mislabeling errors during germplasm propagation and/or subsequent planting. Second, a high proportion of accessions, and therefore of genotypes and cultivars, was specific to each collection, owing to their contrasting varietal composition. Third, the genetic structure analysis revealed that the most probable genetic structure models were K = 2 and K = 3 for the Cordoba and Marrakech collections, respectively, whereas with all 672 genotypes pooled, K = 2 was the most relevant model (ΔK = 3020.39 and H' = 0.999).
The high proportion of Spanish germplasm in WOGBC has certainly contributed to the genetic structure pattern observed when pooling the two datasets. However, a high proportion of alleles were shared between the two collections, i.e. 309 alleles (75.9%), despite their contrasting varietal composition. Moreover, genetic diversity in terms of allelic richness (Ar) and diversity index (He) was similar, with no significant differences between the two collections, as previously reported by Trujillo et al. [21] using 11 SSR loci in common. Hence, the level of genetic diversity remained stable despite the markedly contrasting varietal composition of the two collections.
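The two quantities compared above can be sketched as follows, assuming He is Nei's gene diversity (1 minus the sum of squared allele frequencies at a locus, here without the small-sample correction) and assuming the shared-allele percentage is taken over all distinct (locus, allele) pairs detected in either collection. Allelic richness (Ar) additionally requires rarefaction to a common sample size and is not covered by this sketch. Function names and data are hypothetical.

```python
from collections import Counter


def expected_heterozygosity(alleles):
    """Nei's gene diversity at one locus, He = 1 - sum(p_i^2),
    without the small-sample correction n / (n - 1)."""
    counts = Counter(alleles)
    n = sum(counts.values())
    return 1 - sum((c / n) ** 2 for c in counts.values())


def shared_allele_fraction(alleles_a, alleles_b):
    """Fraction of all distinct (locus, allele) pairs found in both collections."""
    a, b = set(alleles_a), set(alleles_b)
    return len(a & b) / len(a | b)


# Hypothetical allele calls at one locus (two per diploid genotype).
print(expected_heterozygosity([100, 100, 102, 104]))  # 0.625
collection_1 = [("L1", 100), ("L1", 102), ("L2", 200)]
collection_2 = [("L1", 100), ("L2", 200), ("L2", 204)]
print(shared_allele_fraction(collection_1, collection_2))  # 0.5
```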
As more than 1,200 olive cultivars have been reported in the Mediterranean Basin [11,37], the cultivars preserved and identified in the two collections combined (535 cultivars) represent almost 45% of the described cultivars. Further studies are required to encompass all olive germplasm from the Mediterranean Basin by including new local accessions from countries that are less represented in the two collections, such as French germplasm (more than 150 cultivars [47]), Cosenza-Italy (500 cultivars [14]), Turkey-Izmir (96 cultivars [15]) and Greece-Chania (47 cultivars [16]).
Here we propose a nested core collection based on the two WOGB collections at three levels, i.e. 55, 121 and 150 sample sizes. Core collections facilitate the exchange and efficient use of germplasm by providing a set of cultivars representing the overall genetic diversity available in the germplasm banks. We adopted the two-step methodology proposed by El Bakkali et al. [23], which was found to be the most efficient approach for selecting subsets suitable for genetic association mapping. Our approach was based exclusively on genetic criteria, but we believe that the diversity sampled in core collections based on SSR markers alone is correlated with other criteria such as morphological and agronomic traits. Compared to the core collection of WOGBM alone reported by El Bakkali et al. [23], more cultivars were needed to capture the total diversity in the two collections (121 vs 94), which could be explained by the contribution of alleles specific to WOGBC (37 alleles). At the 55 sample size, only 7 genotypes (12.7%) were common to the two collections, whereas 33 (60%) came from WOGBM. Similar proportions were observed at the 121 sample size, where 55.3% came from WOGBM and 11.6% were shared between the two collections (Table 4). These observations underline the importance of cultivar variability within WOGBM.
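The two-step method of El Bakkali et al. [23] is not reproduced here. As a simpler illustration of the underlying idea, a greedy allele-coverage heuristic (in the spirit of the maximization strategy) repeatedly adds the genotype contributing the most alleles not yet captured. Because each step extends the previous selection, cores of increasing size are automatically nested, like the 55/121/150 levels above. The function name and toy data are hypothetical.

```python
def greedy_core(genotypes, size):
    """Greedy allele-coverage core collection (a simplified sketch).

    genotypes: {genotype_id: set of (locus, allele) pairs}.
    Repeatedly adds the genotype contributing the most uncovered alleles
    (ties broken by sorted id order). Returns the selected ids and the
    fraction of all alleles they capture.
    """
    all_alleles = set().union(*genotypes.values())
    covered, core = set(), []
    candidates = sorted(genotypes)
    while candidates and len(core) < size:
        best = max(candidates, key=lambda g: len(genotypes[g] - covered))
        core.append(best)
        covered |= genotypes[best]
        candidates.remove(best)
    return core, len(covered) / len(all_alleles)


toy = {
    "G1": {("L1", 100), ("L2", 200)},
    "G2": {("L1", 100), ("L2", 202)},
    "G3": {("L1", 104), ("L2", 204)},
    "G4": {("L1", 100), ("L2", 200)},
}
print(greedy_core(toy, 2))  # (['G1', 'G3'], 0.8)
print(greedy_core(toy, 3))  # (['G1', 'G3', 'G2'], 1.0)
```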
Defining a set of cultivars to serve as a core collection for the two WOGBs, together with its field assessment, will certainly enhance knowledge of the genetic basis of target agronomic traits, which is still at an early stage. Genetic association mapping using unrelated cultivars is a powerful tool and has been successfully used to identify the genetic basis of many complex traits in plants [105,106]. The proposed core collection will boost insight into the genetic basis of most agronomic and adaptive traits in olive by taking advantage of advances in next generation sequencing (NGS), with the development of high-throughput SNP markers through genotyping-by-sequencing (GBS) approaches [107,108], while also exploiting the recently released olive reference genome (http://olivegenome.org/) [59,60].

Conclusion
Intensive olive cropping systems with mechanical harvesting are reducing the number of cultivars used in orchards, thereby increasing the risk of genetic erosion. Genetic resources included in germplasm collections represent a major pillar for the sustainable use of olive genetic resources and for breeding programs geared towards selecting new resilient cultivars that are better adapted to climate change. Cultivar identification and authentication should thus be compulsory before using olive plant material. Here we first harmonized allele sizes between the two worldwide collections for a set of 20 SSR markers and used 11 endocarp traits as complementary tools for the characterization and identification of olive cultivars. We thus conducted the first in-depth joint analysis of the olive cultivar germplasm held in these two collections. The information and database generated in this study will help manage olive cultivars from the Mediterranean Basin through the launch of a consortium under IOOC supervision. They also represent valuable tools for further studies such as genetic association mapping.
Supporting information S1 Table. Characterization of all olive trees in WOGBM and WOGBC (1091). Codes in collections, names of accessions and their origins, number of trees analyzed, number of SSR profiles, SSR code, morphological code, name of the identified cultivar and main cultivation area are specified. Identified and authenticated cultivars are mentioned.