Genetic diversity and accession structure in European Cynara cardunculus collections

Understanding the distribution of genetic variations and accession structures is an important factor for managing genetic resources, but also for using proper germplasm in association map analyses and breeding programs. The globe artichoke is the fourth most important horticultural crop in Europe. Here, we report the results of a molecular analysis of a collection including globe artichoke and leafy cardoon germplasm present in the Italian, French and Spanish gene banks. The aims of this study were to: (i) assess the diversity present in European collections, (ii) determine the population structure, (iii) measure the genetic distance between accessions; (iv) cluster the accessions; (v) properly distinguish accessions present in the different national collections carrying the same name; and (vi) understand the diversity distribution in relation to the gene bank and the geographic origin of the germplasm. A total of 556 individuals grouped into 174 accessions of distinct typologies were analyzed by different types of molecular markers, i.e. dominant (ISSR and AFLP) and co-dominant (SSR). The data of the two crops (globe artichoke and leafy cardoon) were analyzed jointly and separately to compute, among other aims, the gene diversity, heterozygosity (He, Ho), fixation indexes, AMOVA, genetic distance and structure. The findings underline the huge diversity present in the analyzed material, and the existence of alleles that are able to discriminate among accessions. The accessions were clustered not only on the basis of their typology, but also on the basis of the gene bank they come from. Probably, the environmental conditions of the different field gene banks affected germplasm conservation. These outcomes will be useful in plant breeding to select accessions and to fingerprint varieties. 
Moreover, the results highlight the particular attention that should be paid to the method used to conserve the Cynara cardunculus germplasm and suggest that accessions from different gene banks should be preferred when running an association mapping analysis.


Introduction
In order to understand and preserve C. cardunculus genetic diversity it is important to know the propagation system, which differs between globe artichoke and cardoon. They are both allogamous plants, but the first is mainly propagated vegetatively by means of basal shoots or semi-dormant shoots with a limited root system [11], while the second is seed propagated. As a result, the level of heterozygosity in globe artichoke is higher than in cardoon, both wild and cultivated [5,9,12,13]; moreover, the globe artichoke often has a multi-clonal structure. One problem in C. cardunculus conservation and characterization is that the local varieties are often named on the basis of the area where they are cultivated [14], regardless of their real genetic diversity or similarity. As a result, the names of accessions or local varieties are not always univocal and in some cases can generate synonymies or homonymies. In addition, previous studies have demonstrated that the huge diversity within each botanical variety is quite often not related with the corresponding geographic origin (e.g. [15][16][17][18][19]). Considering all this, it is clear that the characterization of C. cardunculus is essential for its correct conservation and utilization.
Moreover, it is sometimes difficult to define globe artichoke varieties since, after cultivation over several decades in various geographical areas, they may have been subjected to divergent selection. Thus, the accessions stored in gene bank field collections need to be rationalized by improving core collections and avoiding duplications [20][21][22].
The association between genotype and phenotype can be achieved either by controlled biparental crosses (linkage mapping) or via association mapping, controlling the linkage disequilibrium, i.e. the non-random association of alleles between loci, regardless of their position across the chromosomes [44]. The assessment of genetic variation and population structure is a prerequisite before performing association mapping. Linkage mapping has a series of limitations, including its high cost, low resolution, the need for polymorphism between the parents used, large segregant population and the distribution of chiasma across the genome. Conversely, an association map, using accessions that are not related by common parents, can detect several alleles at each locus and a higher level of polymorphism. Association mapping is a tool that can be used to investigate elite genes by structuring the natural variation present in a germplasm. Possible errors in association maps may arise due to unequal allele frequency distribution between subgroups, which may lead to spurious associations between molecular markers and the traits of interest [45]. In order to reduce such errors, before performing an association analysis in a population, it is essential to determine the population structure, as we do in this study.
Previous works on C. cardunculus characterization were addressed at specific accession typologies and used limited collections belonging to restricted geographic areas. The only exception is our previous paper, which reported preliminary data using only some of the dominant markers in the European collection [18]. The aims of the present paper are (i) to assess, for the first time, the diversity present in C. cardunculus European collections, using both dominant and co-dominant markers, (ii) to determine the population structure, (iii) to measure the genetic distance between accessions; and (iv) to cluster the accessions according to the molecular data. Moreover, two other questions are addressed: (v) are the accessions present in the different national collections and carrying the same name really the same material? and (vi) how is diversity distributed in relation to the geographic origin of the germplasm?

Plant material
A total of 556 individuals belonging to Italian (264) (CNR-IBBR, Bari; CNR ISAFOM Catania; ARSIAL Rome), French (162) (GEVES, Cavaillon) and Spanish (130) (ITGA, Navarra) collections, and representing 174 accessions were jointly analyzed. The accession list is reported in S1 Table together with their typology and country of conservation. The accessions are divided according to the four typologies described in the introduction and identified by Porceddu et al. [2]: Romanesco (225 individuals), Violet (34 individuals), Catanese (116 individuals) and Spiny (11 individuals), plus the accessions belonging to leafy cardoon (72 individuals). Moreover, two additional globe artichoke categories were added: the Blanca de Tudela typology (39 individuals) due to its importance in the Iberian Peninsula, and OFF (59 individuals), which includes the accessions not univocally classifiable as belonging to the previous typologies.
For all amplifications the forward primers were fluorescently labeled to resolve PCR amplicons on an ABI 3130xl (Applied Biosystems) or a CEQ 8800 (Beckman Coulter) sequencer. The detected bands were checked for reproducibility even if the visualization by sequencer showed high sensitivity and precision. For the co-dominant (SSR) markers, each allele was scored in accordance with its molecular weight in bp, while for the dominant (AFLP and ISSR) markers, 0-1 matrices were obtained, without knowing the allelic relationships. In this case, each possible band was considered as a locus with 2 possible alleles, 0 (absence) or 1 (band presence). In some cases, the 0-1 matrix was considered as haplotype, while in others it was converted into a co-dominant matrix with 1 as dominant homozygote (A/A) and 0 as the other homozygote (a/a).
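The two treatments of the dominant 0-1 matrix described above can be sketched as follows. This is only an illustration: the function names and the band scores are hypothetical and are not taken from the study's scoring pipeline.

```python
# Hypothetical helpers illustrating the two ways a 0/1 band matrix can be used.

def as_haplotype(bands):
    """Treat one individual's 0/1 band scores as a single haplotype string."""
    return "".join(str(b) for b in bands)


def as_codominant(bands):
    """Recode each band as a homozygous genotype: 1 -> A/A, 0 -> a/a."""
    return ["A/A" if b == 1 else "a/a" for b in bands]


# Invented scores for four band positions (each position is treated as a locus).
individual = [1, 0, 0, 1]
print(as_haplotype(individual))
print(as_codominant(individual))
```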

Statistical analyses
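The gene diversity index was calculated for each locus and population according to Nei [49], using the Hardy-Weinberg formula $H_e = 1 - \sum_i p_i^2$, where $p_i$ is the frequency of the i-th allele at the locus. The polymorphism information content (PIC) was computed as $PIC = 1 - \sum_i p_i^2 - \sum_{i<j} 2 p_i^2 p_j^2$ [50].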
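As a minimal sketch of these two marker statistics, the code below assumes the usual definitions: gene diversity as one minus the sum of squared allele frequencies, and PIC as gene diversity minus the sum over allele pairs of 2 p_i^2 p_j^2. The function names and the example frequencies are illustrative assumptions, not values from the study.

```python
def gene_diversity(freqs):
    """Expected heterozygosity: He = 1 - sum(p_i^2)."""
    return 1.0 - sum(p * p for p in freqs)


def pic(freqs):
    """PIC = He - sum over allele pairs i<j of 2 * p_i^2 * p_j^2."""
    cross = 0.0
    for i in range(len(freqs)):
        for j in range(i + 1, len(freqs)):
            cross += 2.0 * freqs[i] ** 2 * freqs[j] ** 2
    return gene_diversity(freqs) - cross


# Invented allele frequencies for one locus with three alleles.
example = [0.6, 0.3, 0.1]
print(gene_diversity(example), pic(example))
```

PIC is never larger than He for the same locus, which agrees with the similar but not identical ranking of markers by the two parameters.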
To compare differences among and within accessions and groups, Wright's fixation indices were used [51]. The F-statistics are based on the expected level of heterozygosity. The measurements were computed for the different levels of the accession structure: the variance of allele frequencies within accessions ($F_{IS}$), the variance of allele frequencies among accessions ($F_{ST}$), the inbreeding coefficient of individuals relative to the total diversity ($F_{IT}$), the variance among accessions within types ($F_{SC}$) and the variance among groups, tested by permuting accessions among groups ($F_{CT}$). All of these are related to the degree of heterozygosity at the various levels of the accession structure. The first three terms are related through the formula $(1 - F_{IT}) = (1 - F_{IS})(1 - F_{ST})$, where I indicates the individual, S the sub-accession and T the total accession; $F_{IT}$ refers to the individual compared with the total, $F_{IS}$ to the individual compared with the sub-accession, and $F_{ST}$ to the sub-accession compared with the total. The total F, indicated by $F_{IT}$, can thus be partitioned into $F_{IS}$ (or f) and $F_{ST}$ (or θ). $F_{ST}$ can be computed using the formula $F_{ST} = (H_T - H_S)/H_T$, where $H_T$ is the proportion of heterozygotes in the full accession and $H_S$ the average proportion of heterozygotes in the sub-accessions. The F statistic was also used in the AMOVA (Analysis of MOlecular VAriance) to measure the partition of variation among typologies, among accessions within typologies, among individuals within accessions, and within individuals.
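The relationships among the fixation indices can be checked numerically. The sketch below assumes Wright's standard multiplicative relation, (1 - F_IT) = (1 - F_IS)(1 - F_ST), and the heterozygosity-based definition of F_ST; the function names are hypothetical.

```python
def fst_from_heterozygosity(h_t, h_s):
    """F_ST = (H_T - H_S) / H_T, with H_T the total and H_S the mean sub-accession value."""
    if h_t <= 0:
        raise ValueError("H_T must be positive")
    return (h_t - h_s) / h_t


def fit_from_fis_fst(f_is, f_st):
    """Solve (1 - F_IT) = (1 - F_IS) * (1 - F_ST) for F_IT."""
    return 1.0 - (1.0 - f_is) * (1.0 - f_st)


# Average F_IS and F_ST reported in this study for the co-dominant markers.
print(fit_from_fis_fst(-0.66, 0.51))
```

With the reported averages the relation returns roughly 0.19, close to the reported average F_IT of 0.16; exact agreement is not expected when each index is averaged over loci separately.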
The genetic diversity (He) and genetic identity (J or Ho) were also used to estimate the genetic distance. If $j_x = \sum_i p_{xi}^2$ is the probability of identity in the x accession and $j_y = \sum_i p_{yi}^2$ is the probability of identity in the y accession, the probability of identity in both accessions is $j_{xy} = \sum_i p_{xi} p_{yi}$, as described by Nei [52]. The normalized identity between the x and y accessions over all loci is $I = J_{xy} / \sqrt{J_x J_y}$, where $J_x$, $J_y$ and $J_{xy}$ are the averages of $j_x$, $j_y$ and $j_{xy}$ across loci, and, in turn, the genetic distance is $D = -\ln I$. The distances between accessions were also computed using the Euclidean distance $d_{xy} = \sqrt{\sum_i (x_i - y_i)^2}$. The obtained distances were then used to cluster the accessions according to different clustering methodologies such as the UPGMA algorithm. The clustering was also performed by K-means, which is a non-hierarchical method of classification that partitions a set of samples into a number of clusters decided in advance [53].
In the STRUCTURE software, the run length was set to a burn-in period of 150,000 iterations followed by 150,000 Markov Chain Monte Carlo (MCMC) replications. As suggested, several analyses were first run using different K values from 2 to 9. Finally, both K = 3 and K = 6 were adopted and are presented below, in accordance with the likelihood, with ΔK [54], which identified two peaks at K = 3 and K = 6 (S1 Fig), with the $F_{ST}$ distribution among groups, and with the fact that the germplasm could be divided into 6 groups on the basis of typology (Romanesco, Violet, Catanese, Spiny, Cardoon, and OFF). Individuals were assigned to subgroups by the "no admixture model". The output reports the posterior probability that individual i is from accession k. The prior probability for each accession is 1/K. This model is appropriate for studying fully discrete accessions. The "admixture model" was also run, but no better resolution was observed with this model (data not shown).
Linkage disequilibrium between loci, which measures the deviation from random association between alleles at different loci [55], and its significance (P values of χ2 with 1000 permutations), was also computed.
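The two distance measures can be sketched as below, assuming Nei's standard genetic distance (D = -ln I, with I the normalized identity over loci) and the ordinary Euclidean distance between score vectors. The data layout (one dictionary of allele frequencies per locus) and the function names are illustrative assumptions, not the format used by the software in this study.

```python
import math


def nei_distance(x, y):
    """Nei's standard distance between two accessions.

    x and y are lists with one dict per locus, mapping allele -> frequency.
    Summing over loci instead of averaging gives the same ratio I.
    """
    jx = jy = jxy = 0.0
    for locus_x, locus_y in zip(x, y):
        alleles = set(locus_x) | set(locus_y)
        jx += sum(locus_x.get(a, 0.0) ** 2 for a in alleles)
        jy += sum(locus_y.get(a, 0.0) ** 2 for a in alleles)
        jxy += sum(locus_x.get(a, 0.0) * locus_y.get(a, 0.0) for a in alleles)
    if jxy == 0.0:
        return math.inf  # no shared alleles at any locus
    identity = jxy / math.sqrt(jx * jy)
    return -math.log(identity)


def euclidean_distance(a, b):
    """Euclidean distance between two equal-length score vectors (e.g. 0/1 bands)."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


# Invented single-locus example: one fixed accession, one with two equally frequent alleles.
fixed = [{"A": 1.0}]
mixed = [{"A": 0.5, "B": 0.5}]
print(nei_distance(fixed, mixed))
print(euclidean_distance([1, 0, 1], [0, 0, 1]))
```

A full pairwise matrix built from either function could then be passed to any UPGMA (average-linkage) implementation.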
In some cases, the co-dominant and dominant data typologies were analyzed separately, since it was not possible to compute some parameters for the dominant markers. In addition, analyses were run both jointly and separately for globe artichoke and cardoon accessions.

Genetic diversity
A total of 147 co-dominant alleles were identified in the 556 individuals analyzed. For each co-dominant marker, the number of alleles ranged from 1 to 15, with an average of 7.4 alleles per marker (Table 1). Nevertheless, most of the alleles (about 62%) were rare, having a frequency lower than 5%; as a result, the major alleles had an overall average frequency of 70%. For the co-dominant markers, the major allele of each marker had a frequency ranging from about 23% to around 87%, excluding the CMAFLP-05 marker, which was monomorphic. Some of the SSR alleles were specific for a single accession (Table 2). In total, 33 private alleles were found in 24 accessions. As expected, the major allele frequency was higher and more constant in the dominant markers, which have only two alleles. In fact, the average frequency of the major allele was 0.816 and 0.805 for AFLP and ISSR, respectively, versus 0.700 for SSR, while it ranged from 0.743 to 0.878 and from 0.232 to 0.869 for dominant and co-dominant markers, respectively.
The gene diversity, computed as expected heterozygosity (He), and the polymorphism information content (PIC) provide information on a marker's ability to detect polymorphism. In the present study, He values for the dominant markers were quite uniform, with an average of around 0.26 and 0.28 for AFLP and ISSR, respectively (Table 1). For SSR, the gene diversity ranged from 0.24 (FA2-GAT and CLIB-02II) to 0.83 (CLIB-02I) (Table 1). Note that the marker CLIB-02 identified 2 loci, here labeled as CLIB-02I and CLIB-02II, so that CLIB-02 altogether had higher values. The PIC values had a similar but not identical ranking among markers compared with the gene diversity parameter. PIC ranged from 0.21 for FA2-GAT to 0.80 for CLIB-02I, followed by CDAT-01 with 0.73 (Table 1). The values were, on average, lower in leafy cardoon than in globe artichoke (S3 and S4 Tables). For the co-dominant markers, it is also possible to compute the observed heterozygosity (Ho) and the Wright fixation indices by considering both He and Ho. The observed heterozygosity ranged from 5% for CLIB-02II to 99% for CMAL-25, with an average of 44% (Table 1). As a consequence, the partition of variation into its components, i.e. within accessions ($F_{IS}$), within individuals ($F_{IT}$) and among accessions ($F_{ST}$), was quite different from one marker to another. The average values for $F_{IS}$, $F_{IT}$ and $F_{ST}$ were -0.66, 0.16 and 0.51, respectively (Table 1). Generally, the values were lower in leafy cardoon than in globe artichoke (S3 and S4 Tables). In general, the loci were in LD with each other (S5 Table) except for CMAL-25, which was not in LD with CMAL11, CMAL117, CMAL24, CsCaCa05, CsEST03 and CsPal02.
The accessions analyzed had quite different levels of diversity, as detected using both dominant and co-dominant markers, as shown in S6 and S7 Tables. The polymorphism ranged from 0 to about 55%, with an average of 20%, for the dominant markers and from 10 to 85%, with an average of 54%, for the co-dominant markers. The expected heterozygosity ranged from 0 to 17%, with an average of 7%, for dominant and from 5 to 42%, with an average of 26%, for co-dominant markers. The observed heterozygosity ranged from 10 to 67%, with an average of 45%. The fixation index was mainly negative, ranging from -1 to 0.24, with an average of -0.74. The accessions had on average more than one allele but, in spite of the high number of alleles identified by the markers, the alleles were generally specific for each accession.
The number of alleles with a frequency higher than 5% in a single accession ranged from 0.6 to 2.3, and the number of locally common alleles, found in less than 25% of the accessions but present at more than 5% in the specific accession, ranged from 0 to 1.2, with an average of 0.2. Some accessions had more than a single private allele, reaching a maximum of five private alleles specific for VertVaulxVelin (Table 2). The Shannon Information Index, which reflects both richness and evenness, ranged from 0 to 0.26 for the dominant and from 0 to 0.66 for the co-dominant markers, with an average of 0.11 and 0.36, respectively (S6 Table). On average, the marker parameters for each accession ranged from 0.403 to 3.286 for the number of alleles; from 1 to 2.55 for the number of effective alleles; from 0 to 0.934 for the Shannon Information Index; from 0 to 0.517 for the expected heterozygosity; from 0 to 0.583 for the unbiased expected heterozygosity; and from 6 to 192 for the number of amplicons of the dominant markers (S7 Table).

Accession structure and genetic relationships
The structure of the 174 accessions was analyzed by means of a Bayesian approach in the STRUCTURE program, considering only the co-dominant loci. According to the Evanno [54] calculation, the most probable K was three (S1 Fig). The results obtained using STRUCTURE with K = 3 identified a first group with the leafy cardoon accessions, a second group with mainly the Catanese and Tudela accessions and a third group with all the other globe artichoke varieties (Fig 1). To better separate the non-Catanese globe artichokes, the structure analysis was repeated with K set equal to 6, which was the second most probable K in the Evanno analysis (S1 Fig). Finally, the sixth sub-group (SG6), labeled in red, included the Tudela accessions plus some Romanesco (Macau, CamusBretagneBH8) and Catanese (VioletProvenceF, VioletProvence41S). In the case of K equal to 3 (Fig 1), the first group corresponded to SG1, including all the leafy cardoon accessions. The second group was similar to SG2 + SG6, while the third group included SG3, SG4 and SG5.
Geographical localization of some accessions was provided by the gene banks that collected them. For the accessions geographically localized with certainty, the average membership proportions (Q values) were mapped (Fig 3). Even when a distribution pattern could be identified, such as the blue mainly in central Italy (Tyrrhenian side), the red mainly in Italy, the green in France, and turquoise in the southern part of each country, several exceptions were evident. It is interesting to note that the accessions from southern France have a multiple classification, belonging to more than a single group in most cases.
Individual distance was computed using Nei genetic distance [52] or Euclidean distance based on all the data (dominant and co-dominant). The triangular matrices were then used for clustering analysis based on the UPGMA method using PowerMarker (Fig 4). The clustering results were not very different for the two distance methods used, and so only the results obtained using Nei distance are presented. The dendrogram grouped the accessions into six clusters. The first group was CL1, which contained all the leafy cardoon accessions which, as expected, were well separated from all the others. The other five clusters included the globe artichoke accessions, which were divided mainly on the basis of the collection. The CL2 group included accessions from the University of Viterbo collection (maintained at ARSIAL), which were mainly Romanesco accessions (36 out of 81), except for the S. Erasmo accession. CL3 contained most of the accessions from the French collection, including all the Violet and the Romanesco accessions, except Petre, while the other accessions from the French collection were close together in CL4 with all the Spanish accessions. The CL5 and CL6 groups were close to each other and included all the accessions from CNR collections, which were mainly Violetto di Sicilia and Violet de Provence accessions, respectively, with a small group of Romanesco types and another of OFF types stemming from the CL5 group. The CL6 group included some Romanesco and one Spiny type.
Principal Coordinates Analysis (PCA) via covariance matrix, based on co-dominant markers only, distributed the 556 individuals into a two-dimensional scatter plot. The first two PCA axes accounted for 32.39 and 23.60% of the genetic variation among accessions, respectively (Fig 5). Also in this case, the leafy cardoons were separate from the globe artichokes and were in the bottom-right quadrant of the graph, with a positive PC1 and negative PC2. In the bottom left, with negative PC1 and PC2, were the Violet and Catanese types together with Tudela, which lay more towards the center. The Romanesco accessions were in the top central part of the plot (Fig 5). A similar distribution of individuals between the 2 PCA axes, accounting for about 48% of the genetic variation among accessions, was obtained using only dominant markers (S2 Fig), but in this case the Romanesco accessions were more spread along the PC1 axis.

Genetic variance analysis
The hierarchical distribution of co-dominant molecular variance was partitioned among three levels: typology, accessions, and individuals. For typologies, seven groups, including the unassigned one, were used. The analysis revealed that only 15.49% of the total variation was among typologies and 19.22% was among accessions, while the greatest part, 65.29%, was within accessions (Table 3). Calculation of Wright's F statistic at all SSR loci revealed that the genetic variation among accessions within the same typology ($F_{SC}$) was 0.227, while among typologies ($F_{CT}$) it was 0.155, and among accessions across the entire collection ($F_{ST}$) it was 0.347. All the values were highly significant even if most of the variation was not among typologies.
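The reported fixation indices follow directly from the variance percentages. The sketch below assumes the usual three-level AMOVA definitions (F_CT as the among-group share of total variance, F_SC as the among-accession share of the within-group variance, F_ST as the combined among-group and among-accession share); the function name is hypothetical.

```python
def amova_f_statistics(among_groups, among_accessions, within_accessions):
    """Derive F_CT, F_SC and F_ST from three AMOVA variance components.

    Components may be given as raw variances or as percentages.
    """
    total = among_groups + among_accessions + within_accessions
    f_ct = among_groups / total
    f_sc = among_accessions / (among_accessions + within_accessions)
    f_st = (among_groups + among_accessions) / total
    return f_ct, f_sc, f_st


# Percentages of variation reported in the text (Table 3).
f_ct, f_sc, f_st = amova_f_statistics(15.49, 19.22, 65.29)
print(round(f_ct, 3), round(f_sc, 3), round(f_st, 3))
```

Under these assumptions the three percentages reproduce the values given in the text (0.155, 0.227 and 0.347).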

Discussion
An assessment of genetic diversity is essential for understanding which germplasm should be conserved and/or what is being lost or could be in danger of extinction. The genetic diversity indicates how to build up a core collection, maximizing the variation among accessions and minimizing accession repetition. This is particularly important considering that cultivation strategies tend to concentrate on few varieties, with a consequent reduction in the cultivation of many local landraces. Moreover, knowledge of the population structure in a collection is a prerequisite for performing association analyses to attribute allelic variation to important agronomic traits. Molecular markers may or may not correlate with the phenotypic expression of a genomic trait, but, whatever the case, they offer several advantages over conventional, phenotype-based alternatives. In particular, they are stable regardless of cultivation practices, environment, phenological phases or tissue. A structured collection analyzed with molecular markers is the basis for association studies involving phenotypic variation and biological function.
The germplasm analyzed here mainly derives from its center of origin [7], so it is not surprising that a huge diversity was detected in terms of polymorphic bands, expected and observed heterozygosity, fixation index, etc. The globe artichoke is mainly propagated vegetatively, and its outcrossing status is confirmed by the high level of heterozygosity observed, which is on average slightly less than 50%. In the present study, the use of sequencers to read markers enabled us to discriminate the bands with precision, while some of the previous studies only used silver staining and visual recording. The sequencers also made it easier to rerun the analysis to confirm the recording; moreover, the results from different laboratories were compared to validate the data used.

Markers and identified diversities
Among the markers used, some are more informative than others, as indicated by the different parameters used. The SSR CLIB-02 should certainly be included in any C. cardunculus evaluation since it was one of the most variable markers (Table 1); also good markers, in increasing order of PIC, were: CsPal02, CMAL-108, CMAL21, CMAFLP-04, CMAFLP-18, CMAL06, CsPal03, CLIB-12, CMAL24, and CDAT-01. By contrast, CMAFLP-05 did not provide useful information. It should be pointed out that CMAL-25, even with a PIC value below the average, was not in LD with any of the others. The PIC value of the dominant markers was much lower than that of the co-dominant markers, ranging from 0.176 for EagcMctt to 0.242 for EacgMctt (average 0.211) in the case of AFLP and from 0.181 for 834 to 0.277 for 857c (average 0.235) in the case of ISSR (Table 1). The present results showed that the differences within the two marker typologies (ISSR and AFLP) were much more consistent than those found by Lanteri et al. [29] and are similar to those found by Pagnotta et al. [30]. The high level of heterozygosity detected was expected since C. cardunculus is a highly outcrossed species. The levels of He and Ho detected here (Table 1) in the SSR of the "CMAL-" and "CsPal-" series are comparable with the values found by Acquadro et al. [47] and Sonnante et al. [39], who tested newly developed SSR markers on a core collection of 27 or 29 globe artichoke accessions, respectively, but also on leafy cardoon accessions. The values of He and Ho and the number of alleles found here were much higher than the corresponding values found by Acquadro et al. [48], who tested their newly identified SSR of the "CDAT-" and "CLIB-" series on a slightly smaller core collection, as well as the value for CsCiCaca05 detected by Sonnante et al. [39]. Conversely, for the SSR of the "CMAFLP-" group, the values of the above parameters were much lower than those detected by Acquadro et al. [40], who mainly tested wild cardoon (24 accessions), and a limited number of leafy cardoon and globe artichoke (2 accessions each). Despite the high number of amplicons detected by the dominant markers (S7 Table), and hence the possibility to explore a wider genomic region, they were less informative (in terms of PIC) than SSRs due to their uncertain allelic phase [23]. Moreover, the dominant markers were considered to have only two possible alleles, i.e. band presence or absence. In this case, the He detected indicates that one of the two alleles is quite common (with a frequency of about 80%) and the other rarer (frequency of about 20%). In fact, an He equal to 0.257 (average level for AFLP) means that the two alleles have a frequency of about 0.152 and 0.848, respectively. Similarly, in the case of ISSR, the average He was equal to 0.284 (Table 1), indicating that the allele frequencies were equal to about 0.171 and 0.829 for the two possible alleles.
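The allele frequencies quoted for the dominant markers can be recovered from He. For a locus with two alleles, He = 2p(1 - p), so p = (1 ± sqrt(1 - 2He))/2. The short sketch below solves this; the function name is hypothetical.

```python
import math


def biallelic_frequencies(he):
    """Return (minor, major) allele frequencies solving He = 2 * p * (1 - p)."""
    if not 0.0 <= he <= 0.5:
        raise ValueError("He of a two-allele locus must lie between 0 and 0.5")
    root = math.sqrt(1.0 - 2.0 * he)
    return (1.0 - root) / 2.0, (1.0 + root) / 2.0


# Average He values reported for AFLP and ISSR in the text.
for he in (0.257, 0.284):
    print(he, biallelic_frequencies(he))
```

For He = 0.257 and He = 0.284 this yields frequencies that agree, to within rounding, with the pairs given in the text (about 0.152/0.848 and 0.171/0.829).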
It is interesting to note that a high number of alleles were identified for the co-dominant markers, with an average of eight alleles (excluding the monomorphic) per marker and a maximum of 15 alleles per marker (Table 1). An average of 18.3 alleles per locus was found by Gatto et al. [8], even though their study also included wild cardoon besides globe artichoke and leafy cardoon, while about six alleles were detected by Portis et al. [60] in wild cardoon and by Scaglione et al. [13] in a core collection with all the three C. cardunculus taxa. A much lower number was found by Crinò and Pagnotta [61] analyzing a collection including only Romanesco artichoke accessions from Latium region (Italy). Conversely, the number of alleles found in our work was similar to that detected by Ben Ammar et al. [19] in wild cardoon. This was also true when the globe artichoke and the leafy cardoon accessions were considered separately, even if, in these cases, the values were lower (S3 and S4 Tables).

Accessions and identified diversities
It is important to point out that despite the many alleles identified in the entire germplasm (Table 1), very few alleles were found in each accession (S7 Table), i.e. the alleles common to many accessions are limited. In addition, 34 alleles (Table 2) were specific for a single accession (private allele), which might be particularly useful for discriminating and fingerprinting the relative accession.
The average values for $F_{IS}$, $F_{IT}$ and $F_{ST}$ were -0.66, 0.16 and 0.51, respectively (Table 1). The variance of allele frequencies among accessions ($F_{ST}$) detected here is higher than the average values found in wild cardoon [19,60] or in C. cardunculus [8]. This is also true if the globe artichoke and the leafy cardoon accessions are considered separately, even if in these cases the values were lower (S3 and S4 Tables). The fixation indexes across accessions were mainly negative, ranging from -1 to 0.24 with an average of -0.74. Since the fixation index is equal to $F = (H_e - H_o)/H_e = 1 - H_o/H_e$, its values range from -1 to +1. Values close to zero are expected with random mating, while substantial positive values indicate inbreeding or undetected null alleles. Negative values, as found here, indicate an excess of heterozygosity due to negative assortative mating or selection for heterozygotes, which is not surprising in a highly outcrossing species like C. cardunculus. This is particularly true for globe artichoke (overall average F = -0.798), while for leafy cardoon (overall average F = -0.032) the values were among the highest, indicating random mating. This difference can be explained by the different propagation systems: clonal for globe artichoke, and by seed for leafy cardoon, as for wild cardoon [12].
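A minimal sketch of the fixation index, assuming F = 1 - Ho/He as stated in the text, is given below. The function name and the example heterozygosity values are illustrative and are not taken from the study's tables.

```python
def fixation_index(he, ho):
    """F = (He - Ho) / He = 1 - Ho / He; undefined when He is zero."""
    if he <= 0:
        raise ValueError("He must be positive for F to be defined")
    return 1.0 - ho / he


# Invented examples: heterozygote excess, random mating, heterozygote deficit.
print(fixation_index(0.5, 1.0))   # every individual heterozygous at a two-allele locus
print(fixation_index(0.5, 0.5))   # observed equals expected
print(fixation_index(0.4, 0.2))   # fewer heterozygotes than expected
```

The first case shows how a clonally propagated, fully heterozygous accession reaches the lower bound of -1 observed in some accessions.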
It is interesting to note that C3 was the accession with the lowest level of observed heterozygosity. The C3 clone has become widely cultivated in Italy in the last 10 years since micro-propagation was started, enabling its multiplication at a high rate [61]. It should be considered that a variable number of individuals per accession were analyzed, and this could have affected the computed parameters, particularly for the accessions with a single individual, such as C3. Nevertheless, it should be pointed out that correlation analyses between the number of individuals and the different computed parameters (data not shown) did not reveal any significant relationship.

Relationships among accessions
All the previously published works analyzed C. cardunculus accessions belonging to single collections. Not surprisingly, they generally revealed good discrimination among the three C. cardunculus taxa, separating the globe artichoke from the leafy cardoon and/or the wild cardoon [8,13,32,34,37,38,39,40,43]. In this study, using a higher number of accessions, a similar result was obtained comparing globe artichoke versus leafy cardoon accessions. The globe artichoke and the leafy cardoon displayed morphological and genetic differences, so it is surprising that, according to UPOV, the two taxa are bulked in the same description protocols. Moreover, the two taxa are differentiated by their reproductive habits [9] and by the commercial parts used, i.e. the leaves for the leafy cardoon and the capitula for the globe artichoke.
In a previous study, the STRUCTURE analysis in C. cardunculus was run only in a collection that mainly included wild cardoon (43 accessions) and a limited number of leafy cardoon (10 accessions) and globe artichoke (16 accessions) [8]. In the present study, the leafy cardoon accessions were separated into one sub-group and the globe artichoke accessions were differentiated in five sub-groups according to their structure. In this case, as in Gatto et al. [8], the Romanesco types were heterogeneously grouped, with the Spiny and Violet types not well distinguishable from some of the Romanesco. Catanese types are in a sub-group close to the Tudela type. Also, as in Gatto et al. [8], only the Catanese groups of the globe artichoke accessions showed consistency. Not surprisingly, the unassigned individuals were spread among four of the five globe artichoke sub-groups. Generally, all the individuals of a single accession were grouped together. Nevertheless, there are some exceptions in which individuals belonging to the same accession are assigned differently. These were the two Brindisi accessions, one blue and the other turquoise; and the Escarot, Pètre, Capitan, Compact, Hydes and MutRomanesco accessions, which shared more colors but inconsistently among individuals (Fig 2).
The resulting geographical distributions obtained by mapping the Q values, only for the globe artichoke accessions with a defined cultivation area, showed no clear pattern, even if in some geographic areas some colors seemed to predominate (Fig 3). Most of the accessions from southern France have a multiple classification and belonged to more than one group, indicating the low uniformity of these accessions.
The dendrogram obtained with the Nei [52] genetic distance (Fig 4) grouped the accessions into six clusters but, surprisingly, except for the leafy cardoon group, the clusters were not consistent with the accession typologies, unlike what has been reported in other publications. Instead, the origin of the collections seemed to be more important. The CL3 and CL4 groups mainly included the accessions from France and Spain, respectively. The CL2 group included mainly Romanesco accessions, most of which were from Central Italy gene banks. CL5 and CL6 included both Catanese and Violet types, which came from Southern Italy gene banks.
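As an illustration of how a genetic distance between two accessions can be derived from allele frequencies, the sketch below computes Nei's standard genetic distance, D = -ln(Jxy / sqrt(Jx Jy)). This is an assumption made for illustration: the text only cites Nei [52], and the exact variant used for the dendrogram may differ. The function name and the toy frequencies are hypothetical.

```python
import math


def nei_standard_distance(freqs_x, freqs_y):
    """Nei's standard genetic distance between two populations.

    freqs_x, freqs_y: lists with one dict per locus, mapping
    allele -> frequency (frequencies at each locus sum to 1).
    """
    jx = jy = jxy = 0.0
    for px, py in zip(freqs_x, freqs_y):
        alleles = set(px) | set(py)
        jx += sum(px.get(a, 0.0) ** 2 for a in alleles)
        jy += sum(py.get(a, 0.0) ** 2 for a in alleles)
        jxy += sum(px.get(a, 0.0) * py.get(a, 0.0) for a in alleles)
    # Averaging over loci cancels out in the ratio, so sums are enough.
    identity = jxy / math.sqrt(jx * jy)
    if identity == 0.0:
        return math.inf  # no shared alleles at any locus
    return -math.log(identity)


# Hypothetical single-locus example.
pop_x = [{"A": 1.0}]
pop_y = [{"A": 0.5, "B": 0.5}]
print(round(nei_standard_distance(pop_x, pop_y), 4))  # 0.3466
```

A matrix of such pairwise distances is the usual input for building a dendrogram; the clustering algorithm itself is not shown here.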
Within the groups, the leafy cardoon accessions (CL1) were clearly sub-grouped into leafy cardoon from France and leafy cardoon from Spain, in agreement with previous works, in which leafy cardoon accessions were placed in distinct groups according to their geographical origin [8,13,37,62]. As for the globe artichoke samples, the French accessions in CL3 were roughly divided into the Romanesco and Violet types. The Spanish accessions in CL4 were clearly subdivided into Tudela and Romanesco types. To the best of our knowledge, this is the first time that a wide collection of Tudela accessions has been analyzed and it is not surprising to find them well distinguished from the others. Previous studies pointed to the admixture structure of Blanca de Tudela lying between Catanese and Romanesco types [8], or allocated it close to Locale di Sibari [42]. The Spanish accessions of the Catanese and Spiny typologies were not markedly divided in CL4 either. Regarding the groups CL5 and CL6, as far as we know, this is the first time that several Violet de Provence accessions have been analyzed, except in Sonnante et al. [39] and Gatto et al. [8], where a single accession from Italy was included and where, in the first case, no clustering was reported. The overlap of Spiny with Romanesco and of Violet with Catanese was also found by Sonnante et al. [38], while Scaglione et al. [13], using an EST database, grouped the Violet and Spiny types separately on one side and the non-Spiny types on the other.
The principal component analysis (Fig 5), restricted to the first two coordinates, which together accounted for about half of the total variation, clearly separated the leafy cardoon accessions, while all the other 484 individuals were spread along the two axes with no clear pattern. This is also because, as underlined by the AMOVA (with co-dominant markers), most of the variation was within the accessions and only 15.5% was due to differences among typologies. This was in line with previous results provided by Lanteri et al. [24], who found that, in Spinoso Sardo accessions, 72% of the variation was due to within-accession variation. Conversely, De Felice et al. [42] attributed most of the variation (86%) to differences among accessions, and in Raccuia et al. [32] the results depended mainly on the groups analyzed. The percentage of variation among and within accessions is strongly related to the breeding system of the species, with outcrossing species like C. cardunculus showing a higher variation within populations (accessions) (see Pagnotta et al. [63] and references therein).
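The among-group versus within-group percentages reported by an AMOVA can be illustrated with a simplified, single-level sketch based on sums of squared pairwise distances. It is not the analysis run in this study: the hierarchy, software and distance measure used here are not restated, and the function name and toy data are hypothetical.

```python
def amova_one_level(sq_dist, groups):
    """Single-level AMOVA-style partition of variation.

    sq_dist: symmetric matrix (list of lists) of squared distances
             between individuals.
    groups:  group label of each individual, in the same order.
    Returns the percentage of variation among and within groups.
    """
    n = len(groups)
    labels = sorted(set(groups))
    k = len(labels)

    # Total sum of squared deviations from all pairwise distances.
    ssd_total = sum(sq_dist[i][j] for i in range(n) for j in range(i + 1, n)) / n

    # Within-group sum of squared deviations, group by group.
    ssd_within = 0.0
    sizes = []
    for label in labels:
        members = [i for i in range(n) if groups[i] == label]
        sizes.append(len(members))
        pair_sum = sum(sq_dist[i][j]
                       for pos, i in enumerate(members)
                       for j in members[pos + 1:])
        ssd_within += pair_sum / len(members)

    var_within = ssd_within / (n - k)
    msd_among = (ssd_total - ssd_within) / (k - 1)
    # Average group size corrected for unequal sample sizes.
    n0 = (n - sum(s * s for s in sizes) / n) / (k - 1)
    var_among = (msd_among - var_within) / n0

    total = var_among + var_within
    return {"among_pct": 100 * var_among / total,
            "within_pct": 100 * var_within / total}


# Hypothetical toy data: four individuals on one axis, two groups.
values = [0.0, 1.0, 2.0, 3.0]
labels = ["A", "A", "B", "B"]
sq = [[(a - b) ** 2 for b in values] for a in values]
result = amova_one_level(sq, labels)
print(round(result["among_pct"], 1), round(result["within_pct"], 1))  # 77.8 22.2
```

With strongly outcrossing material, the within-group component would be expected to dominate, which is the situation described above.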

Conclusion
This is the first study that jointly analyzes C. cardunculus germplasm from different European regions and collections. Due to the high degree of heterozygosity and the vegetative propagation system of the globe artichoke, the structure of the accessions was only partially in agreement with some important morphological traits, which are the basis for the classification of accessions into typologies. In general, the grouping results were mainly typology-based. However, in spite of the great diversity found in the collections, it seems that the method adopted to conserve the germplasm affected the genetic distance among accessions and, indeed, the origin of the collections was reflected in germplasm clustering. It should be said that the origin often coincides with the main typologies. Hence, genetic diversity cannot be unequivocally assessed when studies deal with few accessions or with accessions coming from the same collection.
The present results highlight that particular attention should be paid to the method used to conserve the C. cardunculus germplasm. The globe artichoke is mainly propagated vegetatively and the germplasm is conserved in field gene banks, which are subject to environmental effects. The different environmental conditions in the field gene banks located in France, Spain, Central Italy and Southern Italy may well have created different selective pressures, fixing common alleles regardless of the accession type. Strategies to mitigate the environmental effects on field gene banks should combine several conservation methods, including in vitro conservation and cryopreservation, which are less affected by environmental conditions. In conclusion, accessions sharing the same or a similar name should not be considered similar by default; great care should be taken to identify their origin and conservation methodology.
In the future, the present wide collection, which is structured according to genetic analyses, could serve as a basis for association studies, provided that its morphological characterization is also known. Moreover, to run association maps and to capture higher diversity among accessions, it is advisable to include accessions from different gene banks in the analysis.