Understanding Genetic Diversity and Population Structure of a Poa pratensis Worldwide Collection through Morphological, Nuclear and Chloroplast Diversity Analysis

Poa pratensis L. is a forage and turf grass species well adapted to a wide range of mesic to moist habitats. Due to its genome complexity little is known regarding evolution, genome composition and intraspecific phylogenetic relationships of this species. In the present study we investigated the morphological and genetic diversity of 33 P. pratensis accessions from 23 different countries using both nuclear and chloroplast molecular markers as well as flow cytometry of somatic tissues. This with the aim of shedding light on the genetic diversity and phylogenetic relationships of the collection that includes both cultivated and wild materials. Morphological characterization showed that the most relevant traits able to distinguish cultivated from wild forms were spring growth habit and leaf colour. The genome size analysis revealed high variability both within and between accessions in both wild and cultivated materials. The sequence analysis of the trnL-F chloroplast region revealed a low polymorphism level that could be the result of the complex mode of reproduction of this species. In addition, a strong reduction of chloroplast SSR variability was detected in cultivated materials, where only two alleles were conserved out of the four present in wild accessions. Contrarily, at nuclear level, high variability exist in the collection where the analysis of 11 SSR loci allowed the detection of a total of 91 different alleles. A Bayesian analysis performed on nuclear SSR data revealed that studied materials belong to two main clusters. While wild materials are equally represented in both clusters, the domesticated forms are mostly belonging to cluster P2 which is characterized by lower genetic diversity compared to the cluster P1. In the Neighbour Joining tree no clear distinction was found between accessions with the exception of those from China and Mongolia that were clearly separated from all the others.


Introduction
Seeds were sown in jiffy pots and, and 30 plants were grown for each accession. Total genomic DNA was extracted from 0.5 to 1 g of fresh leaf tissue per plant using the Genelute Plant genomic DNA miniprep kit (Sigma Aldrich) according to the manufacturer's handbook. DNA concentration was assessed through optical density reading (DU650 spectrophotometer, Beckman) and confirmed by agarose gel electrophoresis.
For morphological characterization, 8 randomly chosen plants per accession were transplanted to the field in a randomized design and evaluated for: growth habit at flowering (scoring plants from 1 = erect to 9 = prostrate), rust resistance/tolerance (1 = r esistant/tolerant-9 = highly susceptible), leaf colour (1 = light green-9 dark green), stem length (mm), width and length of the flag leaf (mm), leaf texture (mm), spring plant regrowth (1 = slow-9 = very fast regrowth) and spring growth habit (1 = erect-9 = prostrate) after cutting. Some of the traits were evaluated according to the UPOV Guidelines [27]. Data were statistically analyzed using univariate one way ANOVA and multivariate discriminant analyses. A stepwise discriminant procedure was carried out in order to look at the most useful traits to use in a phenotypic characterization and for distinguishing populations on the basis of discriminant functions.

Flow cytometry analysis of somatic tissue
Karyotyping was difficult and inaccurate due to the high numbers of chromosomes and variable ploidy levels, hence we estimated genome size through flow cytometric measurements of relative total DNA content in somatic tissues. Fifteen plants from each accession were grown in a greenhouse at the Leibniz Institute of Plant Genetics and Crop Plant Research (IPK, Gatersleben, Germany), and an average of seven plants were randomly selected for flow cytometry. Nuclei were collected from young leaves of single plants by manual chopping them with a razor blade in the following staining buffer: 100 mM Tris-HCl, 5.3 mM MgCl 2 , 86 mM NaCl,  (30-μm mesh) to remove large debris the tubes were stored in the dark on ice for 1 to 4 h before measurement [28]. Fluorescence intensity of DAPI-stained nuclei was measured on a Ploidy Analyser PA (Partec, Münster, Germany). For each sample the standardized 2C DNA content was obtained by using the following equation: position of sample main peak times position of the internal control main peak (a triploid Boechera holboellii genotype) divided by mean position of the peak of the internal control. The repeated measure of the internal control over multiple days allowed to standardize the main peak position of each P. pratensis samples. The final DNA quantification in pictograms was then estimated through a simple linear regression based on four accessions (PI 286210, PI 298096, PI 355956 and PI 601107) characterized by Wieners and collaborators [10] and common in both studies (R 2 = 0.91, P<0.05). Since individual plants within the same accession might show different somatic tissue DNA content, particularly when multiple reproductive pathways occur, standard errors would be misleading and were therefore not calculated [10]. Differences between wild, cultivated and accessions with uncertain improvement status were analysed by one way ANOVA.

trnL-F region amplification and sequencing
A fragment of the chloroplast trnL-F region was PCR amplified from DNA extracted from 70 randomly chosen individuals representing all 33 collected accessions using primers "c" (CGAAATCGGTAGACGCTACG) and "f" (ATTTGAACTGGTGACACGAG) of Taberlet and collaborators [29]. Ten μl of the PCR products were electrophoresed on a 1.5% (w/v) agarose gel and visualized by ethidium bromide staining. Successful amplifications were purified using Qiaquik PCR purification Kit (Qiagen) following manufacture instructions. Sequencing reactions were performed for both DNA strands using the ABI PRISM BigDye Terminator Cycle Sequencing Ready Reaction Kit (Applied Biosystems) on an ABI 3700 genetic analyser according to the manufacturer's instruction. Amplification primers used for PCR were also used for sequencing. All sequence alignments were carried out using MEGA version 4 software [30].

trnL-F region CAPS marker development and analysis
Since only one Single Nucleotide Polymorphism (SNP) was detected in the chloroplast sequenced region (see below) we developed a Cleaved Amplified Polymorphic Sequences (CAPS) marker to test the allelic condition at the locus. PCR primers were developed using PRIMER 3 [31] from multiple alignments of sequences obtained in this study and those retrieved from public databases, and the restriction endonuclease was selected according to the CAPS differential cleavage sites. The allelic condition for the developed CAPS marker was tested on eight individuals randomly chosen from each accession (hereafter population) for a total of 264 individuals. PCR amplifications were carried out using the developed cpCAPfor and cpCAPrew (Table 2) in a 20 μl volume containing 25 ng genomic DNA, 1× PCR buffer (Invitrogen), 200 μM dNTPs, 0.5 μg of Bovine Serum Albumin (BSA), 10 pMol of the cpCAPsf and cpCAPsr primers, and 1 U of Taq DNA polymerase (Invitrogen). All reactions were performed using the following cycling parameters: 94°C for 5 min, followed by 38 cycles of 94°C for 30 s, 51°C for 30 s, and 72°C for 45 s. Aliquots of 11 μl PCR product were digested with 2 units of BsmI in 1× Buffer R for 2 h at 37°C in a total volume of 25 μl. After digestion, 15 μl of the reaction was electrophoresed on a 2% (w/v) agarose gel and visualized by ethidium bromide staining.

Nuclear and chloroplast SSR analyses
A total of 11 nuclear and two chloroplast single sequence repeat (SSR) markers were used to genotype the 264 P. pratensis samples tested for the CAPS marker allelic condition (Table 3).
Concerning microsatellites derived from P. alpina and P. arachnifera, PCR amplification was performed in a total volume of 20 μl containing 30 ng genomic DNA: 1× PCR buffer (New England Biolabs), 200 μM dNTPs, 3,5 pM of the Forward and Reverse primers and 0,75 U of Taq DNA polymerase (New England Biolabs). Since primer were not originally developed for P. pratensis a touchdown PCR method was used to optimize yield of amplified DNA and amplification specificity [32]. The reactions were performed using the same cycling parameters: 94°C for 3 min, followed by 10 cycles of 92°C for 30 s, 58°C for 40 s (decreasing of 0.7°C for each cycle) and 72°C for 60 s and 25 cycles of 92°C for 30 s, 50°C for 30 s and 72°C for 60 s. All others amplifications were performed in a total volume of 12 μl containing 15 ng genomic DNA: 1× PCR buffer (Sigma Aldrich), 1.5 mM MgCl 2 , 200 μM dNTPs, 4 pM of the Forward and Reverse primer and 0,75 U of Taq DNA polymerase (Sigma Aldrich). The following PCR cycling parameters were used: 94°C for 3 min, 35 cycles of 94°C for 30 s, 56°C for 30 s, 72°C for 30 s, and an extension final step of 72°C for 20 min. Amplification products were resolved and analyzed with an ABI 3100 Genetic Analyzer (Applied Biosystems), using Dye Set G5 (ABI) labelled primers. Size calculation and data record of generated amplicons were performed using GeneMapper 3.0 software (Applied Biosystems).

Population structure analysis
Considering variable ploidy in P. pratensis and the fact that it was not possible to infer relative dosage of PCR products, nuclear SSR alleles were scored as dominant markers (i.e. presence/ absence of amplicons). We carried out the analysis by using the software STRUCTURE ver. 2.3.4 [33,34] to investigate the genetic population structure of the 264 P. pratensis individuals from 33 populations. This software has been successfully used to evaluate genetic structure of individuals on different apomictic species, such as for example those belonging to Crataegus, Brachiaria, Panicum, Taraxacum, and Paspalum genera [35][36][37][38][39]. The approach uses a Markov Chain Monte Carlo (MCMC) algorithm to group individuals into populations (clusters) on the basis of multilocus genotype data [33,34]. In this study the procedure that accounts appropriately for the genotypic ambiguity inherent in dominant markers [40] was followed. The number of clusters (K) was estimated by carrying out ten independent runs for each K (from 1 to 10), using 30,000 MCMC repetitions and 30,000 burn-in periods, assuming correlated allele frequencies (Run 1). No a priori population information was used. The percentage of membership (q) of each individual in each of the inferred K clusters was computed by one additional STRUCTURE run for 100,000 MCMC repetitions and 100,000 burn-in periods (Run 2). Two different approaches were used to set the number of clusters (K), the ad hoc statistic ΔK [41] and the change in H' (average symmetric similarity coefficient) from CLUMPP software [42]. The latter indicates the goodness of fit to assign each individual to a given cluster(s), consistently across STRUCTURE independent runs. For each K, the results of Run 2 were compared with those obtained by merging the 10 initial STRUCTURE simulations (Run 1) using the CLUMPP software. CLUMPP was used with the Greedy Algorithm method.
Data on chloroplast DNA variation for both cpSSR (atpB-rbcL) and CAPS marker (trnL-F) were used to classify individuals into different groups on the basis of their relative alleles.

Genetic diversity and phylogeographic analysis
Genetic diversity analysis was carried out using i) the nuclear SSR raw data of the 33 populations and ii) the clusters identified by the software STRUCTURE. For the former analysis, the gene diversity Hj, analogous to the unbiased expected heterozygosity He [43], was computed from allele frequencies estimated following the approach of Lynch & Milligan [44]. The estimates were obtained by AFLPSURV [45], a software using a Bayesian method based on a nonuniform prior distribution of the allele frequencies [46], and able to produce unbiased estimates in dominant markers [47]. For the latter analysis, the F K parameter for each of the K clusters [34] was estimated using the software STRUCTURE. The parameter F K represents the estimated drift from the inferred common ancestor of all populations, thus similar to F ST but specific for each cluster and expected to be proportional to the divergence from a common ancestral population. A low F K value indicates little drift away from the ancestral state. The Tukey-Kramer HSD (Honestly significant difference) procedure was used to test for differences in gene diversity estimates among the 33 populations, while the Wilcoxon signed-ranks nonparametric test for two groups, arranged for paired observations, (i.e. one pair of estimates for each locus) [48], and Bonferroni correction for multiple comparisons was used to test for significant differences among clusters identified by STRUCTURE analyses.
Finally, to further shed light on the relationships among populations, a consensus NJ tree based on F ST estimates between the 33 populations was obtained; 500 matrices of F ST were computed by bootstrapping over loci by using AFLP-SURV [45] on the basis of allele frequencies estimated by the Zhivotovsky [46] approach. The CONSENSE procedure implemented in the PHYLIP software package [49] was used to infer bootstrap values.

Correlation tests
The association between the clusters, as obtained by a genetic diversity analysis at nuclear SSR, and the chloroplast data, for both cpSSR (atpB-rbcL) and CAPS marker (trnL-F), was tested by analysis of contingency tables with the likelihood ratio chi-squared (χ 2 ) test. To evaluate whether the nuclear DNA content was significantly associated with membership coefficients of identified K clusters and alleles of chloroplast markers a bivariate and one way analysis were performed. Finally, a correlation between morphological traits and DNA content was carried out on mean population values. All the test were performed using the JMP 8.0 software (SAS Institute Inc. 2008, Cary, NC, USA).

Morphological characterization
The phenotypic characterization of 32 P. pratensis populations (no plants of XD accession survived transplanting) showed highly significant differences for each of the examined traits (P<0.001; Table 4).
When populations of known origin were separated into cultivated and wild groups, traits such as spring growth habit, leaf colour, leaf texture and spring regrowth, showed significant differences (P<0.05), with cultivated accessions more prostrate, more tolerant to rust, having a darker green colour and slower spring regrowth as compared to wild accessions (Table 4). Stem length, a trait normally used to characterize a variety for registration procedures, was higher in wild compared to cultivated populations. Leaf texture and width of the flag leaf were higher in cultivated accessions, while no statistical significances were found for inflorescence pigmentation and flag leaf length ( Table 4).
The stepwise discriminant procedure included all recorded traits and increased in each step the averaged scored canonical correlation (P<0.001). Out of the ten possible discriminant functions found, the first eight were highly significant, with the first three able to explain over 66% of the total variability (29, 23 and 14%, respectively). The first discriminant function was highly and positively correlated with spring growth habit (0.61), the second negatively correlated with leaf colour (-0.70) and the third positively correlated with pigmentation of the inflorescence (0.64) and spring regrowth (0.60). The centroids of the class means, plotted according to the first three functions, showed that cultivated and wild accessions were clearly separated by the first discriminant function (cultivated accession being more prostrate than wild); no clear cut was observed according to the second discriminant function, although the leaf colour of some cultivars (B, N and V) were generally darker than many wild accessions (E, XC, XI). The third function was able to separate the cultivated from the uncertain origin basically for the presence of inflorescence pigmentation (Fig 2). Accession V, the cultivar Princeton from the USA, was unique for its prostrate growth habit and dark green colour of the leaves. Similar characteristics were shown by accession I and XM, both of uncertain improvement status and from East European countries (Poland and Czeck Republic, respectively). Also landrace M from Hungary clustered apart and was characterized by a light green colour, early spring regrowth and intense inflorescence pigmentation.

Genome size analysis
The total variation in terms of DNA content among 176 P. pratensis genotypes (JG and V were not included) ranged between~4.08 pg (accession B) to~18.66 pg (XO) (Fig 3), while the mean DNA content of each accession ranged from~4.91 (accession M) to~12.69 pg (XO) ( Table 1). The overall data reveal high variability both within and between populations (Fig 3). Uniform DNA content within accessions were found in M, R, S, JD, JE, XA, XD, XG, XL, XM and XP, while all others showed two distinct groups. DNA content of plants that showed contrasting values compared to the majority of plants of the same accession is reported in brackets in Table 1 last column. Considering only the somatic DNA content of the majority of plants present in each accession, no significant differences were detected between cultivated, wild and populations of uncertain improvement status. The distribution of DNA content of the accessions clearly shows a bi-modal distribution, with two major peaks, around 6-7 and 9-10 pg values (S1 Fig). According to the relationship between DNA content and chromosome number, developed for Kentucky bluegrass by Eaton and collaborators [18], the most common chromosome numbers in the collection are between 52 and 60 followed by those between 60 and 68. DNA content was correlated with some morphological traits: in cultivated materials it was negatively correlated with the spring regrowth (-0.66) and with the length and width of the flag leaf (-0.67 and -0.71, respectively), while in wild populations it was negatively correlated with the intensity of the green colour (-0.56) and positively correlated with the rust tolerance (0.54) (all P<0.05).

Molecular characterization
Using two chloroplast SSR primer sets, 5 discernible DNA fragments were obtained: the atpB-rbcL locus was polymorphic with four different alleles (A, B, C and D, respectively) (Fig 4, chloroplast DNA variation part) while rpoC2-rps2 was monomorphic and therefore not included in successive analysis.
The portion of the sequenced trnL-F region yielded sequences ranging from 561 to 566 bp. The alignment of the produced sequences with those available in the NCBI database (AY061951 and AY504641) revealed the presence of only 2 polymorphisms: i) a 5 bp indel (AATAT) at position 328 bp found only in Mongolian (XD) and Indian (XO) populations, and ii) a transversion from adenine to cytosine at position 359 bp (S2 Fig). In the 264 genotypes tested with the CAPS marker developed in this study, adenine type incidence was 51.1% (135 genotypes) while cytosine type incidence was 48.9% (Fig 4, chloroplast DNA variation part). According to SNP allelic state, the 33 studied populations were grouped in: i) adenine containing (13 populations, 39.4%); ii) cytosine containing (10 populations, 30.3%) and iii) adenine and Cytosine containing group (10 populations, 30.3%).
Scorable amplicons were produced for 9 of the 11 nuclear SSR, as loci Poa310 and Poa406 produced unclear electrophoretic profiles (Table 3). Locus SSR AE11 was monomorphic, while the remaining loci were polymorphic with a maximum of 18 alleles (for Poa290 and Poa415) and an average of 10.1 ( Table 3). The number of alleles per genotype varied from a minimum of 1 (as for several genotypes at Poa001, Poa002, Poa290, Poa410, Poa414 and AE11 loci) to 10 different alleles. A total of 91 different alleles were detected.
Population structure, genetic diversity and phylogeographic analysis  other K values were comprised between 0.59 at K = 3 and 0.45 at K = 7, thus not supporting any lower hierarchical structure. The percentages of memberships (q values) at K = 2 were then recalculated by a new run based on 100,000 repetitions and burn-in periods. The q values so obtained were consistent with those of CLUMPP (H' = 0.82), hence we used those from the STRUCTURE's run. A threshold value of q = 0.70 was used to assign genotypes to one of the inferred K clusters (Fig 4).
At the chloroplast level for the locus CAP (trnL-F) we found contrasting results for wild and cultivated materials, with the wild mostly represented by genotypes carrying the 'A' allele (71%) and the cultivated mostly represented by genotypes carrying the 'C' allele (82%). Finally, the DNA content was not significantly correlated with q P1 values and the two chloroplast loci. Cluster P1 was characterized by a significantly higher genetic diversity as compared to cluster P2 (P<0.05, Wilcoxon signed-ranks non-parametric test and Bonferroni correction; Table 5).
Moreover, cluster P1 showed a lower F K estimate (0.09) as compared to cluster P2 (F K = 0.18). A significant F ST value was found between P1 and P2 clusters (F ST = 0.09, P< 0.05). All populations showed similar level of gene diversity (mean Hj = 0.22), with few exceptions (Hj of O population was significantly higher than that of Q, JF, XP and XQ populations; S1 Table).
The NJ tree (Fig 6) revealed the relationships between the 33 populations analysed. No clear distinction was found between them, indeed bootstrap values resulted lower than 50% for almost all the nodes of the tree. However, it is worth noting that accessions from China and Mongolia clustered apart from the other populations (100% of bootstrap value). China (XC) and Mongolia (XD) populations, along with those included in their more closely related clusters (XO, JB, A, XG, R, P, Q, XQ populations) are mostly wild materials, with only populations A (Turkey) and XO (India) being of uncertain biological status. Interestingly, they were assigned to cluster P1 (the only exception is population XQ), they carried the allele 'A' at the locus CAP and showed a high variability at the locus cpSSR with the presence of all the four alleles (Fig 6).

Discussion
The main aim of the present study was to characterize, through both morphological evaluation and genetic diversity analyses, a large sample of worldwide distributed wild and cultivated Poa pratensis encompassing 23 different countries of Europe, Asia, America and Africa. The gained knowledge about their morphology, DNA content, level and structure of genetic diversity and genetic relationships are important both for interpretations of the evolutionary history of P. pratensis, while also providing useful data for breeders.
The analysis of the morphological data showed that the most relevant traits able to distinguish cultivated from wild forms of P. pratensis were spring growth habit and leaf colour (Table 4). This was confirmed both considering these traits independently, as well as considering them together, where both traits resulted correlated with the first two discriminant functions able to explain 52% of total variation. These traits are some of the most important traits that underwent selection during domestication, first, and breeding later [50]. In particular, rust tolerance, prostrate growth habit and dark green colour are among the most important traits selected by breeders for any new turf variety [50]. In fact, the dark green colour and rust tolerance are commonly associated with a healthy turf, while the prostrate growth habit with a better ground cover. Moreover, shorter basal internodes (short stature), higher tiller density, narrow leaf blades, high ground cover, all characteristics related to growth habit, were pursued since domestication [50,51]. Thus, our phenotypic analyses provide evidence that breeding selection for growth habit and leaf colour contributed to the cultivated phenotype. In contrast, traits such as inflorescence pigmentation and flag leaf length were apparently not influenced by anthropic selection pressures, as individuals with extreme values were found almost equally distributed between wild and cultivated forms (Table 4).
In recent years, flow cytometry has become the preferred technique for estimating nuclear DNA content because of its ease of utilization and accuracy [52,53]. The DNA content values found in this study were similar to those reported by other authors. Studying 38 accessions of a USDA core collection of Kentucky bluegrass, Wieners and collaborators [10] observed values ranging from 4.85 to 15.70 pg. In a different study on twenty-two commercial cultivars of this species, values ranging from 5.39 to 17.69 pg were reported [18]. The continue distribution of DNA content (S1 Fig) of the studied collection suggests the presence of several aneuploids in both wild and cultivated P. pratensis accessions and confirm the high genome variability of this species [10,18]. In fact, as suggested in previous studies DNA content can be efficiently used as a means to determining ploidy level in Kentucky bluegrass [18].
Genome size estimation in a polyploid species can be very important since molecular markers alone can fail in detecting the diversity caused by gene dosage effect, which may, in turn, affect the phenotypic diversity between accessions. Accordingly, we identified correlations between genome size and morphological traits related to both plant morphology, growth habit and rust tolerance in both wild and cultivated P. pratensis accessions. This supports previous studies which showed that variation of nuclear DNA content is associated with morphological characters in different plant species, for example negative correlations between nuclear DNA content and mean leaf length and width have been reported in Dasypyrum villosum (species of annual grass in the Poaceae family) [54]. The overall flow cytometry data produced in this study revealed the existence of both intra-and inter-population variation in P. pratensis DNA content. As suggested by Mowforth and Grime [55] intra-population variation in DNA content can arise through opposing forces: i) natural selection acting to reduce DNA amount, ii) the disposition of some genomes to accumulate "selfish" DNA, and iii) the existence of ecological advantages or disadvantages associated with large and small DNA amounts. The same authors postulated that in in P. annua, individuals with large genomes have an advantage under winter and early spring conditions, sufficient to guarantee their survival compared to faster-growing individuals characterized by small genomes. The negative correlation between genome size and spring-regrowth detected in this study may suggests a similar role for the DNA content in the distribution of P. pratensis, with respect of climate and growing season length. On the other hand, the presence of individuals with contrasting genome size in cultivated accessions (Fig 3), subjected to intense breeding selection, as documented in this study may also suggest the natural disposition of Poa genome to accumulate DNA. This tendency is even more clearly evident in wild populations where as many as 10 out of 16 (62.5%) were found to be characterized by at least two ploidy levels. In fact, as early suggested by Bashaw and Funk, the great adaptiveness of this species could be closely related to its variable ploidy and by the presence of asexual reproduction that provides an escape from the typical reproductive problems associated with unbalanced genomes [4]. Chloroplast genes have been used in numerous phylogenetic studies and have proven to be useful in distinguishing different species belonging to the same genus. Patterson and collaborators used chloroplast markers to dissected the relationships among different species of the complex Poa genus [56]. Data produced in this study on both wild and cultivated P. pratensis accessions, showed low polymorphism level in the trnL-F region. This condition was suggested by Stoneberg Holt and collaborators [57] studying three P. pratensis varieties. However, our data (S1 File), originated from a world wild collection, allow to speculate a situation of widespread homogeneity in this species.
The low diversity here observed could be both related to the low nucleotide substitution rate in plastid DNA and to the mode of reproduction of the species. In fact, the existence of two different asexual reproduction mechanisms (i.e. apomixis and rhizomes production) could strongly reduce plastid diversity within accessions that can derive from one or very few maternal individuals clonally propagated. It is noteworthy that wild and cultivated accessions were well separated according to the allelic state of the SNP position identified in the chloroplast sequence suggesting a strong phylogenetic value for polymorphisms in this region.
Differently from the chloroplast analysis, the high level of polymorphism observed at the nuclear level confirms the suitability of SSR markers in genetic studies of Kentucky bluegrass. In fact, the number of alleles detected per SSR locus (from 1 to 18) was similar to that reported by Honig and collaborators (7)(8)(9)(10)(11)(12)(13)(14)(15)(16)(17)(18)(19)(20)(21)(22)(23)(24)(25) [26]. The high level of genome variability here reported, in terms of somatic DNA content and SSR allelic diversity, is not surprising considering the high variable ploidy level repeatedly documented for the species [10,18], the coexistence of apomictic and sexual reproduction in wild populations [7,9] and the pollen recognition system, which confers a strong ability to hybridize and retain alien genomes [58], all factors contributing to the extreme complexity of the genome of this species [59].
Only a modest relationship between genetic divergence and geographical origins was reported in P. pratensis by several authors [22,23]. A general lack of separation between cultivated and wild materials at molecular level was reported by Johnson and collaborators [23], who evaluated a USDA P. pratensis collection using RAPD markers and found a wide overlap of 17 commercial cultivars with clusters formed by wild accessions. Our results are consistent with these findings; indeed both the NJ and population structure analyses indicated that a clear distinction between wild and domesticated was not found (bootstrap values in NJ are almost all lower than 50% and, for STRUCTURE, wild and domesticated were attributed to both the two clusters, even if domesticated were mostly attributed to P2). The main reason of a similar genetic structure could rely on breeding activities used to develop new cultivars of Kentucky bluegrass starting from ecotypes. Finally, the widespread distribution of some ecotypes can generate confusion on the improvement status and on the geographical origin of some accessions that can be wrongly considered native of a given place.
However, the analyses of both chloroplast and nuclear data highlighted some differences between wild and cultivated materials. In particular, considering the chloroplast cpSSR a strong reduction of variability was detected in cultivated materials, where only two alleles were conserved out of the four detected in wild accessions. Concerning the chloroplast locus trnL-F most of the cultivated accessions showed the 'C' allele (82%), while, by contrast a predominance of the 'A' allele (71%) was found for the wild accessions. A higher variability was present at nuclear level for wild forms, where the two different clusters were equally represented (average q P1 = 0.51 and q P2 = 0.49), while the domesticated forms were mostly belonging to cluster P2 (average q P2 = 0.79), which showed also a significantly lower genetic diversity compared to the cluster P1. Thus, in spite of specific characteristics of the species that, as mentioned above, could make difficult a distinction between wild and domesticated forms, effects of domestication and breeding, were detected at molecular level by our analysis, with a certain loss of genetic variability in domesticated forms as expected due to founder effects and selection at the target loci.
The analyses at molecular level showed also interesting results related to the origin of the species. In this regard, we first focused our attention on the NJ tree. It showed the relationships between the different populations, with the main result being a clear and well supported subdivision, due to the 100% bootstrap value between the two wild populations from Asia (Mongolia XD and China, XC) and all the others. These two populations were included in cluster P1, the one with a lower F K estimate compared to cluster P2 [34]. F K parameter is similar to F ST but it is specific for each population and is expected to be proportional to the divergence from a common ancestral population, thus suggesting a more ancient origin of P1 germplasm. Moreover, Mongolian and Chinese populations carried the 'A' allele at the chloroplast locus trnL-F, the most represented allele in wild forms and also among the individuals assigned to cluster P1. All these outcomes seem to indicate China and Mongolia as the most probable geographical areas of origin of the species.
Population XO (India) and JB (USA, NJ) were those most closely related to China and Mongolia in the NJ tree, even if their position was not statistically supported by bootstrap. Although both populations belong to cluster P1 and carry the 'A' allele, JB is a cultivated form and XO is of uncertain biological status, hence it is difficult to clearly understand their role and origin.
Within the limits of resolution of the present study, the molecular evidences here presented suggest Mongolia and China as the most probable center of origin of P. pratensis. However, further studies, including a wider sample of P. pratensis accessions from Asia would be needed to obtain a more detailed picture on the topic.