Population Genetics of the Rubber-Producing Russian Dandelion (Taraxacum kok-saghyz)

The Russian dandelion, Taraxacum kok-saghyz (TKS), is a perennial species native to Central Asia that produces high quality, natural rubber. Despite its potential to help maintain a stable worldwide rubber supply, little is known about genetic variation in this species. To facilitate future germplasm improvement efforts, we developed simple-sequence repeat (SSR) markers from available expressed-sequence tag (EST) data and used them to investigate patterns of population genetic diversity in this nascent crop species. We identified numerous SSRs (1,510 total) in 1,248 unigenes from a larger set of 6,960 unigenes (derived from 16,441 ESTs) and designed PCR primers targeting 767 of these loci. Screening of a subset of 192 of these primer pairs resulted in the identification of 48 pairs that appeared to produce single-locus polymorphisms. We then used the most reliable 17 of these primer pairs to genotype 176 individuals from 17 natural TKS populations. We observed an average of 4.8 alleles per locus with population-level expected heterozygosities ranging from 0.28 to 0.50. An average pairwise FST of 0.11 indicated moderate but statistically significant levels of genetic differentiation, though there was no clear geographic patterning to this differentiation. We also tested these 17 primer pairs in the widespread common dandelion, T. officinale, and a majority successfully produced apparently single-locus amplicons. This result demonstrates the potential utility of these markers for genetic analyses in other species in the genus.


Introduction
Worldwide natural rubber production is largely reliant on the rubber tree, Hevea brasiliensis. Most H. brasiliensis cultivation occurs in equatorial regions of Southeast Asia, though there is some production in the Amazon basin [1,2]. Unfortunately, this economically important tree species is threatened by South American leaf blight (SALB) [3]. Though progress has been made in understanding the genetic basis of SALB resistance in H. brasiliensis [4,5], the development of resistant cultivars remains an ongoing challenge [6]. Moreover, there is concern that the responsible pathogen (Microcyculs ulei), which is spread by human activities as well as natural spore dispersal [7], will eventually reach Asian rubber tree populations. There is thus great interest in developing alternative sources of natural rubber.
Taraxacum kok-saghyz (TKS; Russian dandelion) is a perennial, diploid species that produces large quantities of rubber throughout the plant, especially in its roots. Initial attempts at developing TKS into a viable source of natural rubber were prompted by limited access to rubber plantations during World War II [8]. When the war ended and rubber trees once again became an available source of latex, however, TKS improvement efforts were abandoned. Recent years have seen a renewed interest in the possible development of TKS as a rubber crop, though improvement efforts have been limited in part by a lack of genetic information. While great strides have been made in our understanding of rubber production in TKS and congeners [9][10][11][12][13][14], we still know very little about the population genetics of this species.
Here, we report the development of simple-sequence repeat (SSR) markers derived from available TKS expressed-sequence tags (ESTs) and their use in population genetic analyses of wild-collected TKS accessions. These EST-SSRs represent the first published nuclear sequencetagged site (STS) markers for this fascinating species, and their use provides novel insights into the distribution of genetic variation within and among natural populations of TKS.

EST-SSR discovery and primer design
A total of 16,441 ESTs obtained from the root tissue of young (1 to 3 month) TKS seedlings were downloaded from GenBank (accession numbers DR398435 to DR403165 and GO660574 to GO672283). Sequences were then assembled into contigs using CAP3 [15] with the default settings prior to SSR identification and primer design. The resulting unigene set (i.e., contigs and singletons) was imported into BatchPrimer3 [16] for SSR discovery and primer design. We specifically searched for SSRs using the following repeat criteria: di-nucleotide motif > 5 repeat units; tri-nucleotide motif > 3 repeat units; and quad-, penta-and hexa-nucleotide motifs > 2 repeat units. We next used EST-SCAN [17] and BLAST searches on NCBI to determine whether a given SSR was located in coding sequence (CDS) or an untranslated region (UTR). PCR primer design was performed using the default settings. A subset of 192 primer pairs was ordered for testing (Integrated DNA Technologies, Coralville, IA, USA). Notably, a 19 bp extension (corresponding to the M13 forward primer sequence) was added to the 5' end of every forward primer to facilitate the use of a three-primer protocol for SSR amplification and fluorescent labeling [18,19], as described below.

DNA extraction and quantification
Seeds derived from 17 wild-collected TKS populations in Kazakhstan were obtained from the USDA-ARS Western Regional Plant Introduction Station. Seeds from each population were placed on moist filter paper and stored in darkness at 4C for 48 hours before being allowed to germinate at room temperature. Following germination, seedlings were transplanted into pots in the University of Georgia greenhouses (Athens, GA, USA). Once established, 600 mg of fresh tissue was collected from each individual for DNA extraction using the Qiagen (Valencia, CA, USA) DNeasy Plant Mini Kit (176 individuals total from 17 populations; mean = 10.4 individuals per population, range = 7-16). Following extraction, DNA was quantified using a NanoDrop (Thermo Fisher Scientific, San Diego, CA, USA) and diluted to 2.5 ng/μl in preparation for genotyping. We also collected tissue and extracted DNA (as above) from four individuals of T. officinale collected from Athens, GA to facilitate an investigation of the transferability of our markers to other species within the genus.

EST-SSR genotyping
All genotyping was done using a modification of the three-primer PCR protocol described by Schuelke [18], as adapted by Wills et al. [19]. We pre-screened the full set of 192 primer pairs described above on a test panel of eight TKS individuals, all from different populations, plus four T. officinale individuals. Reactions were performed in a total volume of 15 μL containing 2.5 ng of template DNA, 30 mM Tricine pH 8.4-KOH, 50 mM KCl, 2 mM MgCl 2 , 125 μM of each dNTP, 0.1 μM reverse primer, 0.02 μM forward primer, 0.1 μM fluorescently labeled M13 primer (either FAM, NED or HEX), and 1 unit of Taq polymerase. The PCR conditions were as follows: 3 min at 95C; 10 cycles of 30 s at 94C, 30 s at 65C and 45 s at 72C, annealing temperature decreasing to 55C by 1C per cycle, followed by 30 cycles of 30 s at 94C, 30 s at 55C, 45 s at 72C, followed by 20 min at 72C. PCR products were visualized on 1% agarose gels to identify primers that successfully produced amplicons. If the above cycling conditions were unsuccessful at producing a single-banded amplicon in TKS, the annealing temperature was increased or decreased, as appropriate, to achieve more or less stringent reaction conditions. For primer pairs that produced a single-banded amplicon, the reactions were diluted 1:15 and visualized on an ABI 3730xl (Applied Biosystems, Foster City, CA, USA) with GGF 500 ROX size standard (Georgia Genomics Facility, Athens, GA, USA), which includes standards ranging in size from 88 to 435 bp. Data files were imported into GeneMarker (SoftGenetics, State College, PA, USA; version 1.85) for genotype calling. The trace files were manually inspected to ensure accurate genotyping and 48 primer pairs that produced readily-interpretable, single-locus polymorphisms in the aforementioned test panel of eight TKS individuals were then used to genotype the full panel of TKS individuals following the same general methods. As more individuals were genotyped, we reduced our final dataset from the original 48 primer pairs to a subset of 17 loci that consistently provided easily interpretable, single-locus polymorphisms. The remaining 31 loci were excluded due to missing data or more complex (i.e., multi-locus) banding patterns that emerged when the larger set of samples was considered.

Population genetic analyses
The multi-locus genotype data were analyzed using GenAlEx (version 6.501) [20]. For each population, we calculated the observed heterozygosity, Nei's unbiased expected heterozygosity [21], and Wright's fixation index [22]. Additionally, F ST and G ST were calculated for each locus. An analysis of molecular variation (AMOVA) was also performed in GenAlEx using 999 permutations to determine the amount of genetic variation that is significantly attributable to populations. We performed a principal coordinates analysis (PCoA) by using a standardized genetic distance matrix as calculated in GenAlEx. A Mantel test was performed to test the correspondence between a pairwise geographic distance matrix (and its log transformation) and a pairwise Nei's genetic distance matrix among all 17 populations.

Identification and characterization of EST-SSRs
Assembly of the 16,441 TKS ESTs resulted in a unigene set of 6,960 contigs and singletons. Of these, a total of 1,248 (17.9%) were found to contain at least one SSR, on par with the values from other plant species [23]. Of the 1,248 SSR-containing contigs and singletons, 154 contained 2 SSRs, 40 contained 3 SSRs, 8 contained 4 SSRs, and 1 contained 5 SSRs, for a total of 1,510 SSRs. As has been observed in other species [24][25][26][27][28][29], we found that the majority of EST-SSRs (55.0%) were tri-nucleotide repeats, likely due to the fact that such repeats will not disrupt the reading frame when length variants occur in coding regions [23] (Table 1;  To investigate the utility of these EST-SSRs for population genetic analyses, we tested a subset of 192 pairs on a panel of one Russian dandelion individual derived from each of eight wild populations in Kazakhstan. These 192 primer pairs targeted 70 di-nucleotide repeats, 119 trinucleotide repeats, and 3 quad-nucleotide repeats; all of these primer pairs targeted EST-SSRs with at least 5 repeat units in the source EST. Of these tested primer pairs, 48 initially produced single-locus, polymorphic amplicons from a majority of test samples. However, as mentioned above, a subset of 17 primer pairs were found to be the most reliable and were used across the full collection of individuals. Ultimately, we genotyped 176 TKS individuals from 17 natural populations (average = 9.9 individuals genotyped per population per locus after accounting for missing data) in southeast Kazakhstan near the border with western China (Fig 1). These 17 loci were used in our population genetic analyses (details below), and their primers were also tested on 4 individuals from T. officinale to assess cross-taxon transferability. Twelve of these seventeen primer pairs successfully produced single-banded amplicons in T. officinale (Table 2), demonstrating the potential utility of these markers for analyses in other species within the genus.
doi:10.1371/journal.pone.0146417.t001 showed some general trends with respect to the location of SSRs within ESTs. Notably, SSRs in untranslated regions had an average of 5.3 ± 1.2 alleles while those in coding sequences had an average of 4.5 ± 0.5 alleles, though this difference was not statistically significant (P > 0.05). Similar to our results, Gupta et al. [24] found that SSRs located in UTRs tended to be more polymorphic than those found in coding sequence (CDS) in bread wheat, perhaps due to more stringent selection against length variants in coding regions. In contrast, results from Helianthus suggest that SSRs found in CDS vs. UTRs exhibit similar levels of polymorphism [31]. Averaging across the 17 loci, the observed heterozygosity levels within populations ranged from 0.28 to 0.47 (0.37 ± 0.01), whereas unbiased expected heterozygosity levels ranged from 0.28 to 0.50 (0.43 ± 0.01; Table 4). After correcting for multiple tests, only three population/ locus combinations were significantly different from Hardy-Weinberg expectations (Bonferroni-adjusted P < 0.05). Furthermore, we found only one population-level F IS value that was significantly different from zero (all others P > 0.05). This particular population (35169) had a significantly positive F value overall, though it had negative F IS values (higher observed heterozygosity relative to expected heterozygosity) for six of the sixteen loci with F IS values (one locus was monomorphic in this population). Two loci in population 35169 had exceptionally elevated F IS estimates compared to the average across loci, which may indicate the presence of population specific null alleles, or may be the result of selection. As a whole, these results suggest that there are little or no systemic deviations from Hardy-Weinberg equilibrium within the other 16 populations, though it should be noted that average sampling per population was somewhat limited. It is thus possible that deeper sampling would reveal significant deviations that previously escaped detection. Within populations, the mean percent polymorphic loci was 94.5% ± 1.6% and contained one outlier value of 70.6%. A number of previous studies have used molecular markers (including SSRs) to investigate genetic diversity in wild Taraxacum populations. For example, various marker types (including SSRs) have been used to assess levels of diversity in asexual and sexual dandelion populations in central Europe [32]. Additionally, population genetic markers have been used to describe the relative contributions of mutation and recombination in generating genotypic diversity in T. officinale [33]. In TKS, amplified fragment length polymorphisms (ALFPs) have been used differentiate TKS individuals from another species of dandelion, T. brevicorniculatum [34]. Specifically, the researchers show that within species genetic variation is much higher in TKS compared to the apomitic T. brevicorniculatum [34]. While their within population sample size precluded a direct calculation of population structure for TKS, a principal coordinates analysis coupled with a dendrogram both suggested relatively low levels of population structure in accordance with our results [34].

Population genetic structure
In terms of population structure, the average F ST across all 17 loci, and considering all 17 populations, was 0.11 ± 0.007 (mean G ST was 0.06 ± 0.007). Levels of pairwise F ST in all population comparisons were also relatively low (S2 Table). In addition to the low average level of differentiation amongst these populations, a Mantel test revealed no signature of isolation-by-distance in our dataset (P > 0.05). These results are generally consistent with the expectation for plant species with wind-based seed dispersal. Indeed, Nybom [35] found an average F ST level of 0.13 for such species in a meta-analysis of SSR-based studies compared to species with ingested seeds (F ST = 0.21), attached seeds (F ST = 0.33), and gravity-dispersed seeds (F ST = 0.34) (all F ST estimates from [35]). Additionally, there exists a trend toward long-lived perennials (F ST = 0.19) having less genetic structure relative to both short-lived perennials (F ST = 0.31) and annuals (F ST = 0.40) [35]. However, genetic structuring need not be solely attributed to seed dispersal and life history. Indeed, when comparing average F ST of species with different breeding systems, it has been shown that both selfing species (F ST = 0.42) and those with mixed mating systems (F ST = 0.26) exhibit higher levels of differentiation compared to outcrossing species (F ST = 0.22) [35]. An earlier meta-analysis of population genetic studies using allozymes also revealed that, for outcrossing species, genetic structure was lower for wind-dispersed species (G ST = 0.101) when compared to species with seeds dispersed via animal ingestion (G ST = 0.223) or gravity (G ST = 0.189), but was not statistically different from species with animalattached seeds (G ST = 0.114) [36].
Our PCoA further indicated a general lack of structuring with the first axis explaining 9.3% of the variation and the second axis explaining an additional 7.9% (Fig 2). Moreover, the PCoA did not reveal any population that was particularly well-differentiated from the rest; rather, our results suggest a history of substantial gene flow amongst populations. In general agreement with our F ST and PCoA analyses, an analysis of molecular variance (AMOVA) attributed only 7% (P = 0.001) of the genetic variance to population-level differentiation. This relatively low level of genetic structure could be a byproduct of relatively high levels of seed and/or pollen dispersal. Further insight awaits direct investigation of pollen vs. seed-mediated rates of gene flow (e.g., [37]). Relatively low values of genetic structure suggest that most of the genetic variation in TKS can be found within populations. Given these results, deep sampling of relatively few populations may be sufficient to capture the majority of the genetic diversity present within this species.

Conclusions
This study has resulted in the development of a set of well-characterized, polymorphic genetic markers for use in TKS, along with additional, untested PCR primer pairs for the potential development of additional markers. The majority of these markers were found to produce apparently single-locus amplicons in a related species, indicating their potential utility for investigating patterns of genetic variation in other species within the genus. Moreover, our work has provided valuable insight into patterns of genetic diversity in wild populations sampled from throughout the native range of this prospective industrial crop species. The apparently low level of population genetic structure, which may be attributable to potentially high levels of seed and/or pollen dispersal, suggest that much of the neutral genetic diversity within Russian dandelion resides within populations, and can thus be captured by targeted sampling from a subset of the species range. The extent to which these populations vary in terms of adaptively important genetic variation remains to be seen.