Genetic diversity and population structure of Cynara cardunculus L. in southern Portugal

Cynara cardunculus L. is a cardoon species native to the Mediterranean region, which is composed of three botanical taxa, each having distinct biological characteristics. The aim of this study was to examine wild populations of C. cardunculus established in Portugal, in order to determine their genetic diversity, geographic distribution, and population structure. Based on SSR markers, 121 individuals of C. cardunculus from 17 wild populations of the Portuguese Alentejo region were identified and analysed. Ten SSRs were found to be efficient markers in the genetic diversity analysis. The total number of alleles ranged from 9 to 17 per locus. The expected and observed means in heterozygosity, by population analysed, were 0.591 and 0.577, respectively. The wild population exhibited a high level of genetic diversity at the species level. The highest proportion of genetic variation was identified within a geographic group, while variation was lower among groups. Geographic areas having highest genetic diversity were identified in Alvito, Herdade da Abóboda, Herdade da Revilheira and Herdade de São Romão populations. Moreover, significant genetic differentiation existed between wild populations from North-Alentejo geographic locations (Arraiolos, Évora, Monte da Chaminé) and Centro Hortofrutícola, compared with other populations. This study reports genetic diversity among a representative number of wild populations and genotypes of C. cardunculus from Portugal. These results will provide valuable information towards future management of C. cardunculus germplasm.

From each location, seven to nine different genotypes were selected for genetic analysis. In total, samples (leaves) of C. cardunculus included 121 genotypes from 17 populations (Table 1; Fig 1), leaves were air-dried and ground to a powder for DNA isolation.

DNA isolation
Total genomic DNA was extracted from samples using the DNeasy Plant Mini Kit (Qiagen, Germany), following manufacturer protocols. Prior to PCR amplification, DNA concentration and purity were determined spectrophotometrically (NanoVue plus, GE Healthcare Life Sciences, USA), while DNA integrity was assessed electrophoretically on 1% agarose gels, stained with GreenSafe Premium (NZYTech, Portugal) using a GelDoc XR System (BioRad, USA) for image capture and analysis.

Microsatellite analysis
Twenty-three pairs of primers for microsatellite analysis were retrieved from the available literature for artichoke [37][38][39] (S1 Table). Pairs of primers were chosen according to linkagegroup positions and to level of potential polymorphism.
PCR amplifications were performed using a total volume of 20 μL containing 0.1-0.5 ng of genomic DNA, 0.5 μM of forward and reverse primers and 1x Dream Taq PCR Mastermix (Thermo Scientific, USA). Amplification was performed using the following conditions: initially 95˚C-10 min; then 40 cycles at 95˚C-45 s; optimal annealing temperatures respective to primers-60 s ( Table 2); 72˚C-45 s; and a final elongation at 72˚C-10 min. Controls lacking a DNA template were included in each PCR reaction for each primer pair. Only ten of twenty three pairs of primers listed in S1 Table with amplification and specific PCR products were used for the following fragment analysis.
Amplicons of each primer pair were sequenced using the ABI 3730xl platform, to confirm the specificity of PCR products. Thereafter, multiplex PCRs were carried out using NYZProof DNA polymerase 2x Colourless Master Mix (NYZTech, Portugal) according to manufacturer's instructions, using the annealing temperatures listed in Table 2. Each forward primer was fluorescent-labelled with 6-FAM or HEX dyes (STAB VIDA, Portugal), and two loci were

PLOS ONE
Genetic diversity and population structure of Cynara cardunculus L.
amplified in the same reaction, according to the combinations described in S1 Table. PCR was carried out as follows: 95˚C-3 min; followed by 40 cycles at 94˚C-30 s, respective annealing temperature (Table 2) -45 s; 72˚C-45 s; and final extension at 72˚C-10 min. All PCR amplifications were performed using a MyCycler (BioRad, USA) thermocycler. For fragment analysis, PCR products were separated by capillary electrophoresis on an ABI 3130 Genetic Analyser (Applied Biosystems, Foster City, CA, USA) and peaks identified using internal size GeneScan™ 500 LIZ 1 Size Standard (Applied Biosystems) (S1 Fig). DNA fragment lengths were determined using GeneMapper software (Applied Biosystems, USA).

Data analysis
Genetic diversity parameters, such as total number of alleles (N) and polymorphic information content (PIC), were calculated using PowerMarker v.3.25 software [40]. Mean number of alleles (Na), expected heterozygosity (He), observed heterozygosity (Ho), Shannon's diversity index (I) and Fixation Index F (Inbreeding Coefficient) and Wright's F ST used to estimate genetic diversity and population differentiation, were generated by GenALEx v 6.5 software [41]. Analysis of molecular variance (AMOVA) was also performed using GenAlEx v6.5, to evaluate genetic variation among and within populations. Genetic distance was estimated according to Nei parameter [42]. MICROCHECKER [43] was used to test for the possibility of scoring errors, allelic dropout, and null alleles. Principal coordinate analyses (PCoA) [44] were performed, using GenAlEx v6.5 to identify genetic variation patterns among C. cardunculus genotypes. Genetic dissimilarity matrices and neighbour-joining (NJ) cluster analyses were used to construct genetic affiliation trees using Darwin v.6 software [45].
Population structure was performed using the Bayesian model-based clustering approach, using software STRUCTURE v2.3.4 software [46], to elucidate relationships among populations. Initially, geographic populations were assigned to 17 groups ( Table 2). Number of populations (K) was estimated by performing five independent runs for each K (from 1 to 10), using 100 000 MCMC steps and 50 000 burn-in periods, assuming the following parameters: admixture model and correlated allele frequencies model. The optimum number of populations (K) was processed and identified by STRUTURE HARVESTER web v 0.6.94, July2014 by comparing log probabilities of data for each value of K [47,48]. The clustering pattern was visualised using the Structure Plot V2 [49].

Results
Having a thorough comprehension of the genetic diversity and population structure of wild C. cardunculus, in Portugal, is an important step towards using available genetic resources, to develop cultivars useful to agriculture and industry. Until now, available knowledge regarding cardoon germplasm in the Iberian Peninsula was limited. The results of our study significantly enhance that knowledge.
For our study, SSR markers were used to characterize genetic diversity of 121 genotypes of C. cardunculus collected across the Alentejo region (southern Portugal), from 17 Portuguese populations. For this characterization we analyzed several parameters including, number of alleles (N), polymorphic information content (PIC), number of different alleles (Na), number of effective alleles (Ne), Shannon's Information Index (I), observed (Ho) and expected (He) heterozygosity, and Fixation Index (F).

Diversity parameters for Cynara cardunculus individuals
For genetic characterization, initially 23 SSRs were used for microsatellite screening of the cardoon collection. Of these, only 10 showed reproducible and specific PCR amplification, as confirmed by sequencing (S2 Table). A total of 138 alleles were generated from all 121 C. cardunculus individuals under study. The number of alleles per locus showed a range from nine (CyEM-183) to 17 (CELMS-58), with an average of 13.8 (Table 2; S3 Table).
These results demonstrated that the employed SSR markers were effective in providing valid estimates of genetic diversity of the cardoon population, as represented by the average of genetic diversity indices (PIC = 0.741, He = 0.591, I = 1.114).

Genetic diversity of Cynara cardunculus populations
Genetic diversity analysis were performed for all C. cardunculus populations (Table 3)

PLOS ONE
Genetic diversity and population structure of Cynara cardunculus L.
populations were more homozygous than expected (F positive). CH, PG, ALV, TR and AR populations showed negative F values, indicating significantly higher heterozygosity than expected, possibly resulting from negative assortative mating or selection for heterozygotes. MICROCHECKER analysis did not detect evidence for scoring errors due to stuttering, neither for allele dropout, nor for a high frequency of null alleles in any of the tested loci, although still not perturbing Hardy-Weinberg equilibrium of the natural populations.

Genetic differentiation within and among geographic populations
Analysis of molecular variance (AMOVA) for genetic differentiation among and within populations of C. cardunculus showed only an occurrence of 14% of genetic variation occurred among populations (Table 4). Contrastingly, 86% of remaining variability in genetic variation was represented within the population. The results of principal coordinate analysis (PCoA), based on Nei's genetic distance, are presented in Fig 2. The first two coordinates of the analysis account for 36.6% of the total variation. The first coordinate explains 18.88% of the variation and indicates mainly, the degree of separation of AR, MC, CH, EV from the remaining populations. The second coordinate explains an additional 17.71% of the variation (Fig 2 and S4 Table).

PLOS ONE
Genetic diversity and population structure of Cynara cardunculus L.
Pairwise F ST values, a measure of genetic differentiation among populations, showed the most differentiated wild populations in C. cardunculus were EV and AR (F ST = 0.32), while the least differentiated wild populations were PG and SV (F ST = 0.054) (S5 Table). The high pairwise population Fst values (S5 Table) observed between the AR, EV and MC wild populations indicate those populations were probably undergoing differentiation process, concurring with the data presented in Fig 2. The remaining wild populations presented lower pairwise F ST values (S5 Table), indicating a lower genetic differentiation.

Population structure
The 121 individuals of C. cardunculus were further evaluated for population stratification based on the admixture model approach using STRUCTURE software [46]. When SSR data were analysed, the number of subpopulations (K) tested were increased from one to 10. Estimation of ΔK by LnP(D) and Evanno's ΔK method analysis, revealed the highest value for K = 2 (ΔK = 159.78), while K = 3 (ΔK = 3.17) and K = 8 (ΔK = 19.21) presented also the high levels of K (S2 Fig; S6 Table).
Based on the K = 3 model (Fig 3B), genotypes from geographic locations MC and AR, and CH4/CH5 genotypes from CH population were still clustered in the same group (pink), with a q value above 0.80. However, genotypes CH1, CH2 and CH7 showed a q value of approximately 0.5, indicating an admixed ancestry. In the blue group, EV appears jointed with JURA (JURA1-4), individuals from HB (HB2, HB3, HB5, HB6, HB7), HP1 and SAL3, with a high q value (>0.80). The second higher value of ΔK was observed at K = 8. Individuals from TR, AR and EV were maintained as distinct subgroups within the structure of the population (q>0.80), while CH (CH1 to CH6) and MC still joint in the same cluster (pink; Fig 3C). The CH7 genotype showed admixed with a q value of 0.75.
The differentiation of EV, CH, MC and AR locations was observed in PCoA analysis, and is also supported by Fst values and confirmed by populations STRUCTURE analysis. In addition, the admixture model approach shows there is a consistent structure within the Juromenha location, concerning JURA and JURB populations, which is indicated by the K = 2 and K = 3 analyses. Although individuals from JURA and JURB are located within the same geographic area, genotypes of JURA are in fact isolated from JURB by a water barrier.
A phylogenetic tree using neighbour-joining analysis, based on genetic distance, rendered three distinct groups (I, II, III) of C. cardunculus (Fig 4). This analysis, based on a dissimilarity matrix, grouped AR, MC, CH, EV and some individuals from HB, and JURA in cluster I. The second cluster grouped individuals from TR, SAL, SV and HSR among others. The HA population was assigned to an independent cluster, cluster III.

Microsatellite polymorphism and genetic diversity of Cynara cardunculus
Characterization of genetic diversity is fundamental to design and manage strategies for species conservation and breeding programs. However, concerning Portuguese C. cardunculus germplasm there are only few studies supported by molecular markers, across its natural distribution [33,35]. This study encompasses several wild genotypes of C. cardunculus from different geographic locations in Portugal. Our study is comprehensive, and thus an important contribution to improve the ability to assess and manage C. cardunculus germplasm in Portugal. Consequently, it provides a contribution to our understanding of this species germplasm characterization also in the Mediterranean region, supplementing the limited information currently available [35].
Efficiency of molecular markers, for population studies, is largely dependent upon their ability to detect levels of polymorphism. The mean number of alleles per locus observed in our study (N = 13.8) was higher than in prior studies of C. cardunculus [32,34,38,50]. According to Botstein et al. [51] eight SSR loci were found to be highly informative (PIC > 0.5), with two others being moderately informative (0.5 > PIC > 0.25). The mean PIC of SSRs collectively used in our study was higher than those used in previous C. cardunculus genetic diversity  Table 1 for abbreviations of geographic populations). https://doi.org/10.1371/journal.pone.0252792.g004

PLOS ONE
Genetic diversity and population structure of Cynara cardunculus L. studies using nuclear microsatellites [34,38,50]. Our findings indicate that the SSR markers we used were sufficiently informative to suitably evaluate Portuguese C. cardunculus germplasm.
In our study, relatively high allelic diversity and heterozygosity confirm the high level of diversity represented by C. cardunculus populations of southern Portugal found in the wild, perhaps related to the high level of outcrossing. The genetic diversity of Portuguese populations of cardoon in this study was higher (He = 0.591) than the wild cardoon populations from other Iberian populations (He = 0.370), studied by Gatto et al. [33], based on SSR markers. The lower genetic diversity observed by Gatto et al. [33] could be related to lower number of genotypes (n = 5) analyzed. Within the present work, the genotypes from the ALV, HA, HR and HSR populations presented the highest genetic diversity, compared to those from other geographic locations. The mean Ho detected in our study was lower than mean He, which might indicate population isolation. Similar lower levels of Ho were also detected in seeds from leafy cardoon and wild cardoon, indicating that natural cross-breeding occurs in populations in the wild, as also noted by other authors [32,33]. However, some populations have similar values for He and Ho (fixation index F, close to zero) like PG, ALV indicating that the population are under Hardy-Weinberg Equilibrium (HWE), undergoing random mating, without significant natural selection, gene migration, mutation, or genetic drift [52]. The fixation index (F), also referred to as the Inbreeding Coefficient, represents any deviation from HWE, and allows detection of inbreeding, population fragmentation, migration and selection [53]. Most Portuguese populations showed more homozygous genotypes than expected (F positive), with exception of those from the CH, PG, HA, ALV, TR and AR locations. One factor explaining the relatively lower heterozygosity found in some geographic locations might be a result of inbreeding, resulting from small population size. The mean Fixation index from Portuguese germplasm in our study was 0.026, similar to that found for Iberian wild germplasm (F = 0.024), described by Gatto et al. [33]. In this study, wild populations from more easterly locations in Europe, such as those from Italy, Greece, Tunisia and Malta had a slightly higher level of inbreeding. Correspondingly, those populations were more homogeneous when compared to Iberian wild germplasm, from Portugal and Spain.
Level of genetic diversity of a species can be dependent on length of life cycle, reproduction system, geographic range and gene flow [53]. In our study, we found the majority of genetic variation occurred within populations of cardoon from the same geographic location. This finding is in accordance to previous studies of other cardoon populations described by others [33,50,54]. Higher diversity of C. cardunculus within the same geographic area might be related to the high level of outcrossing and degree of wind pollination inherent to cardoon. Such processes would increase gene flow between populations and reduce differentiation among them. For decades, cardoon has been used for multiple purposes having a wide range of applications, such as in traditional cheese production, human nutrition, and more recently for energy purposes [55,56]. Hence, human factors may have also played a role in the higher level of genetic diversity detected within population of C. cardunculus in Portugal, while variation was lower among groups. Although the actual contribution of human activity to the rate of gene flow is unknown, low levels of differentiation among some geographical groups might reflect human activity in different regions contributing to exchange of germplasm. In addition, pollination of cardoon is chiefly performed by insects or mechanical (wind) agitation, according to Harwood and Markarian [57]. Cardoon has been considered as a species that predominantly out-crosses, having a low capacity for self-fertilization [34]. This association has resulted in a high level of within-species genetic diversity in cardoon.

Population structure and molecular phylogeny
The structure of a population affects the degree of its genetic variation and pattern of distribution [58]. Our study included an analysis of population structure based on the sampled populations in order to identify any domestication events, heretofore unknown.
The likely number of different subpopulations we found, K = 2, was estimated using computer-based clustering analysis available in STRUCTURE [46]. That analysis identified the genotypes from AR, EV and MC geographic locations in the same cluster, pink group (Fig 3), corresponding to the northern edge of the Alentejo region. Geographic isolation among C. cardunculus populations of northern Alentejo may have resulted in minimal gene flow among them in surrounding areas. Moreover, in the same pink group, the CH population forms a cluster having a high proportion of similar alleles. As far as we know, genotypes from the CH location descended from Portuguese seeds of unknown origin, introduced along ago, into Alentejo, considering the specific biochemical properties of the cardoon flower for cheese production. In addition, the HR population partly represents a relictual experimental field population, introduced for studies on genetic diversity in the region several decades ago. The seeds for this study originated from different parts of Portugal, namely Beja, Quinta-do-Marquês in Oeiras and Torre Vã in Santiago do Cacém. These different Portuguese sources, might explain the similar ancestry proportion in the pink group, of 3 genotypes (HR1, HR4 and HR6) from HR population.
From the phylogenetic tree using neighbour-joining (NJ) cluster analysis it is also inferred a clear clustering of AR, MC, CH, EV geographic locations (Fig 4). The remaining populations were more complex, suggesting they were genetically admixed. The wild conditions of these populations and high level of outcrossing could explain the admixture of ancestry observed in several genotypes.

Conclusions
Understanding genetic diversity and population structure of C. cardunculus is critical for efficient management of its genetic characteristics when designing suitable cultivars. According to our results, identification of microsatellites using SSR markers, proved to be a reliable method to assess C. cardunculus population genetics. Our study is a significant contribution to the knowledge of cardoon genetics and genotypes of wild populations in southern region of Portugal.
The high level of genetic variability within the wild cardoon populations studied, provides essential information for future germplasm conservation. Moreover, this study showed there is significant genetic differentiation in the gene pools among various cardoon groups, namely those from the northern edge of the Alentejo region. This differentiation provides a robust, independent source of genetic variability and is a valuable resource of genetic traits for breeders. Choosing optimal cardoon reproductive material, based on our genetic diversity findings will help to support the stability of C. cardunculus. Moreover, previous studies showing variability in natural product profile and cynaropicrin content, the major SL presented in C. cardunculus leaves [14][15][16], merge our findings reflecting the high level of genetic variability in populations of Portuguese cardoon.
A molecular database reflecting the variability of C. cardunculus genotypes, with identified morphological and biochemical profiles, will be useful to develop new biotech strategies used in future breeding programs. Such programs could significantly be designed to enhance content of bioactive compounds having pharmaceutical and/or nutraceutical applications. The knowledge here disclosed, greatly contributes to augment the economic value of cardoon at both regional and national levels.