Development and Characterization of Polymorphic EST-SSR and Genomic SSR Markers for Tibetan Annual Wild Barley

Tibetan annual wild barley is rich in genetic variation. This study aimed to exploit new SSRs, identified by data mining, for genetic diversity and phylogenetic analysis of wild barley. We developed 49 novel EST-SSRs and confirmed 20 genomic SSRs in 80 Tibetan annual wild barley and 16 cultivated barley accessions. A total of 213 alleles were generated from 69 loci with an average of 3.14 alleles per locus. Trimeric repeats were the most abundant motifs (40.82%) among the EST-SSRs, while the majority of the genomic SSRs were dinucleotide repeats. The polymorphic information content (PIC) ranged from 0.08 to 0.75 with a mean of 0.46, and the expected heterozygosity (He) ranged from 0.0854 to 0.7842 with an average of 0.5279. Overall, the polymorphism of genomic SSRs was higher than that of EST-SSRs. Furthermore, the number of alleles and the PIC of wild barley were both higher than those of cultivated barley (3.12 vs 2.59 and 0.44 vs 0.37, respectively), indicating that more polymorphism exists in Tibetan wild barley than in cultivated barley. The 96 accessions were divided into eight subpopulations based on the 69 SSR markers, and the cultivated genotypes could be clearly separated from the wild barleys. A total of 47 SSR-containing EST unigenes showed significant similarity to known genes. These EST-SSR markers have potential for application in germplasm appraisal, genetic diversity and population structure analysis, facilitating marker-assisted breeding and crop improvement in barley.


Introduction
Barley (Hordeum vulgare L.) is the fourth most important cereal crop worldwide. With the rapid development of the beer and feed industries, the demand for barley keeps increasing. However, during the long-term domestication of cultivated barley, and especially under modern breeding and intensive cultivation, its genetic variation has declined significantly, resulting in the loss of many genes, including some rare alleles [1]. The narrow genetic background of cultivated barley has become a bottleneck for effective breeding, while the abundant diversity of wild barley can provide a pool of alleles for barley breeding and improvement [2,3]. Morphological, archaeological, cytogenetic and isozyme data revealed that wild barley on the Qinghai-Tibet Plateau is different from the Fertile Crescent wild barley [4]. Research so far has shown even richer genetic diversity in Tibetan wild barley than in Ethiopian barley [5]. Novel germplasm tolerant to drought, salinity and aluminum toxicity has been identified from Tibetan wild barley [6][7][8].
An increasing number of efficient molecular markers would be valuable for diversity analysis, resource conservation and the exploitation of beneficial alleles in wild barley. Comprehensive sets of expressed sequence tag (EST) sequences have been generated in many plants (http://www.ncbi.nlm.nih.gov/dbEST). The growing sequence databases enable the identification of functional genes with similar sequences in related species [9]. EST-based SSR markers (EST-SSRs) have been widely employed as powerful molecular genetic tools in a large number of cereal crop species due to their high level of transferability, close association with genes of known function, codominant inheritance, and low development cost, since they can be derived from public databases [10][11][12]. Jaikishan et al. [13] used 25 EST-SSRs and 25 genomic SSRs to predict grain yield heterosis; multiple EST-SSRs were generated for wheat (Triticum aestivum L.), and these markers showed high transferability between wheat and other crops such as barley, maize, rice, and sorghum [14][15][16]. To date, polymorphic EST-SSRs have been identified to establish evolutionary relationships in Hordeum chilense [17], and new EST-SSRs and genomic SSRs have been added to the published Australian barley genetic maps [18]. However, to our knowledge, little work has been performed to develop EST-SSRs and apply them to population structure analysis in Tibetan wild barley.
In the present study, with the objective of exploiting new SSRs from EST databases and confirming the published genomic SSRs in the Tibetan wild and cultivated barley accessions, 49 EST-SSRs and 20 genomic SSRs were developed and characterized. These markers can be utilized to evaluate the genetic variation and phylogenetic relationships of 96 barley genotypes. Furthermore,

Plant materials
A total of 96 barley accessions were used in this study, including 80 Tibetan annual wild barley accessions from the Qinghai-Tibet Plateau provided by the Huazhong Agricultural University barley germplasm collection, and 16 cultivars from China stored at the Institute of Crop Science, Zhejiang University, Hangzhou, China (Table S1). These accessions were collected on public land, and no specific permits were required for the collection. Seeds were surface sterilized with 3% H2O2 for 30 minutes and thoroughly rinsed with distilled water, followed by germination in nutrient-rich soil in an incubator (22/18 °C, day/night) for 10 days. Total genomic DNA was extracted from barley leaves using the Plant Genomic DNA Kit (TianGen, Beijing, China).

Sequence screening and primer designing
A total of 525,999 barley ESTs were acquired from the EST database of GenBank (up to September 2012) (http://www.ncbi.nlm.nih.gov/Genbank/). Redundant sequences were removed from these ESTs using CD-HIT-EST (http://cd-hit.org) with an identity threshold of 95%. The presence of SSRs was screened using the Simple Sequence Repeat Identification Tool (SSRIT) software (http://www.gramene.org/gramene/searches/ssrtool). The minimum numbers of repeat units for di-, tri-, tetra-, and penta-nucleotide motifs were 10, 7, 5, and 4, respectively. A total of 188 EST-SSRs were randomly selected and primers were designed using Primer5.0 with a length of 18 to 22 bp and product sizes of 100 to 300 bp. The reverse primer of each pair was labeled with 6-FAM or HEX fluorescent dye at the 5′ end. Based on previous studies of barley, 41 genomic SSR markers were selected and SSR primers were designed with the same criteria as mentioned above.
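The repeat-unit thresholds above can be illustrated with a short Python sketch. It is not SSRIT itself: it only looks for perfect repeats with a regular expression, and the function name `find_ssrs` and the example sequence are hypothetical.

```python
import re

# Minimum repeat counts taken from the text: di = 10, tri = 7, tetra = 5, penta = 4.
MIN_REPEATS = {2: 10, 3: 7, 4: 5, 5: 4}

def find_ssrs(seq):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs in seq."""
    seq = seq.upper()
    hits = []
    for unit_len, min_rep in MIN_REPEATS.items():
        # One motif of unit_len bases followed by at least (min_rep - 1) more copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are a shorter unit repeated (e.g. "AA" or "AGAG"),
            # so that a dinucleotide run is not also reported as a tetranucleotide.
            if any(motif == motif[:k] * (unit_len // k)
                   for k in range(1, unit_len) if unit_len % k == 0):
                continue
            hits.append((m.start(), motif, len(m.group(0)) // unit_len))
    return sorted(hits)

# Hypothetical sequence: (AG)10 and (CTG)7 separated by short spacers.
example = "GGC" + "AG" * 10 + "TTC" + "CTG" * 7 + "A"
print(find_ssrs(example))
```

A production screen would also need to handle reverse-complement motifs and compound or imperfect repeats, which this sketch deliberately ignores.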

PCR amplification and sequencing
PCR amplification was performed in a total of 20 μL reaction mixture that contained 1 μL of genomic DNA, 1 U ExTaq DNA polymerase (Takara Inc.), 2 μL of 10× Ex Taq Buffer (Mg2+ Plus), 0.2 mM dNTPs mix, 0.05 μM forward primers, 0.1 μM reverse primers and fluorescent primers (FAM or HEX).
The PCR protocol used was as follows: initial denaturation for 5 min at 94 °C, followed by 5 cycles of denaturation for 30 s at 94 °C, annealing for 30 s at 50 °C, and extension for 30 s at 72 °C, subsequently followed by 32 cycles of denaturation for 30 s at 94 °C, annealing for 30 s at 55 °C, and extension for 30 s at 72 °C, with a final extension for 10 min at 72 °C and a 4 °C holding temperature. PCR products were diluted and analyzed on a MegaBACE 1000 DNA analysis system (Amersham Biosciences, Piscataway, NJ) at the Center of Analysis and Measurement in Zhejiang University. The lengths of PCR fragments were calculated using the ET550-R size standard and Genetic Profiler version 2.2.

Calculation of polymorphism
The EST- and genomic SSR alleles were scored as present (1) or absent (0) for the 96 accessions. Alleles with a frequency of less than 5% (rare alleles) in the population were removed and treated as missing data for the polymorphism calculation and population structure analysis [19]. The genetic diversity was evaluated by the number of alleles (Na), the effective number of alleles (Ne), observed heterozygosity (Ho), and expected heterozygosity (He) using POPGENE v.1.31 [20]. Polymorphism information content (PIC) was calculated using the software PIC_CALC version 0.6.
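For reference, the per-locus statistics can be sketched in Python under common textbook definitions: Ne = 1/Σp_i², He = 1 − Σp_i² (without a sample-size correction), and PIC = 1 − Σp_i² − Σ_{i<j} 2p_i²p_j². Whether POPGENE and PIC_CALC use exactly these forms is an assumption here, and the function name `diversity_stats` is hypothetical.

```python
def diversity_stats(freqs):
    """Return (Ne, He, PIC) for one locus given its allele frequencies or counts."""
    total = sum(freqs)
    p = [f / total for f in freqs]           # normalise so the frequencies sum to 1
    sum_sq = sum(x * x for x in p)           # expected homozygosity
    ne = 1.0 / sum_sq                        # effective number of alleles
    he = 1.0 - sum_sq                        # expected heterozygosity (uncorrected)
    # Cross term of the PIC formula: sum over allele pairs i < j of 2 * p_i^2 * p_j^2.
    cross = sum(2 * p[i] ** 2 * p[j] ** 2
                for i in range(len(p)) for j in range(i + 1, len(p)))
    pic = he - cross
    return ne, he, pic

# Two equally frequent alleles: Ne = 2.0, He = 0.5, PIC = 0.375.
print(diversity_stats([0.5, 0.5]))
```

Because the cross term is never negative, PIC is always at most He, which matches the pattern of the reported ranges (PIC 0.08 to 0.75, He 0.0854 to 0.7842).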

Population structure
Population structure was assessed using the STRUCTURE software v2.3.3 based on the admixture model [21]. Models were tested for numbers of clusters (K) from 1 to 15, each with ten independent runs and 100,000 MCMC (Markov Chain Monte Carlo) iterations. The most likely number of clusters (K) was indicated by ΔK, the rate of change of the estimated log probability of the data (LnP[D]) between successive K values [22].
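As an illustration of how a ΔK curve can be derived from replicate runs, the following Python sketch assumes the widely used definition ΔK = |L(K+1) − 2L(K) + L(K−1)| / sd(L(K)), where L(K) is the mean LnP(D) over runs at K. That this is exactly the statistic of reference [22] is an assumption, the absolute value is taken on the run means as a simplification, and the LnP(D) values below are invented for demonstration only.

```python
from statistics import mean, stdev

def delta_k(lnp_runs):
    """lnp_runs maps K -> list of LnP(D) values from replicate runs.

    Returns {K: deltaK} for every K whose neighbours K-1 and K+1 are present.
    """
    avg = {k: mean(v) for k, v in lnp_runs.items()}
    out = {}
    for k in sorted(lnp_runs):
        if k - 1 in avg and k + 1 in avg:
            sd = stdev(lnp_runs[k])          # spread of LnP(D) across runs at this K
            if sd > 0:
                # Absolute second-order difference of the mean log probability.
                second_diff = abs(avg[k + 1] - 2 * avg[k] + avg[k - 1])
                out[k] = second_diff / sd
    return out

# Hypothetical LnP(D) values, two runs per K (the study used ten runs, K = 1 to 15).
runs = {1: [-100, -102], 2: [-80, -82], 3: [-50, -52], 4: [-48, -50]}
dk = delta_k(runs)
best_k = max(dk, key=dk.get)
print(best_k, dk)
```

Note that ΔK is undefined for the smallest and largest K tested, so the range of K must extend beyond the values of interest.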

Gene function blast
The unigene sequences associated with the polymorphic EST-SSRs were searched against the GenBank non-redundant (nr) protein database using BLASTX (http://www.ncbi.nlm.nih.gov/BLAST) with an expected value (E-value) threshold of 10^-10 to infer their putative functions.

Characterization of polymorphic SSRs
In total, 69 SSR primer pairs, including 49 EST-SSRs (26% of 188) and 20 genomic SSRs (49% of 41) (Tables 1 and 2), showed polymorphism among the 96 accessions. A total of 213 alleles were generated from the 69 loci with an average of 3.14 alleles per locus. The EST-SSR repeat motifs were not equally distributed: di-, tri-, tetra-, and penta-nucleotides accounted for 16.32%, 40.82%, 26.53%, and 16.32%, respectively, whereas most of the selected genomic SSRs were composed of dinucleotide repeats. According to the POPGENE results for the 69 SSRs, the observed number of alleles per locus (Na) ranged from 2 to 6 (mean = 3.14) and the effective number of alleles per locus (Ne) varied from 1.09 to 4.54 (mean = 2.30). The average Na was 3.12 and 2.59 for wild and cultivated barley, respectively (Table 3). Besides this, the polymorphic information content (PIC) ranged from 0.08 to 0.75 with a mean of 0.46, and the expected heterozygosity (He) ranged from 0.0854 to 0.7842 with an average of 0.5279.
BLASTX analysis showed that 47 of the 49 SSR-containing EST unigenes had significant similarity to known proteins (Table 4), for instance, zinc finger protein MAGPIE, transcription factor LAF1, photosystem II reaction center PSB28 protein, xyloglucan endotransglycosylase (XET), and protein kinase APK1B. In addition, most of the annotated proteins were from Triticum urartu (17, 36.2%), while Hordeum vulgare and Aegilops tauschii each accounted for the same share (11, 23.4%).

Population structure and genetic distance
To detect the population structure in the 96 barley genotypes, we ran the STRUCTURE program for Bayesian clustering analysis using the 69 SSR markers, assuming that the number of populations (K) ranged from 1 to 15. The highest ΔK value was observed at K = 8 (Figure 1A), indicating that the most suitable number of subpopulations was eight. The membership frequency of each accession in each subpopulation is shown in Table S1. When the frequency threshold was set at 0.5, only six accessions were defined as admixed, and about 80% of the accessions could still be assigned to a subpopulation when the threshold was raised to 0.7. The output of the structure analysis demonstrated that wild and cultivated barleys were assigned to different subpopulations (Figure 1B). Most of the cultivated barleys were classified into subpopulation 4, except for A74, Tadmor, B1342 and B1031. Fifty percent of the wild barley accessions studied were assigned to subpopulation 1.
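The threshold rule described above can be sketched as follows in Python. The membership vectors are hypothetical (the real values are in Table S1), the function name `assign` is illustrative, and treating a frequency equal to the threshold as assigned is an assumption, since the text does not say whether the comparison is strict.

```python
def assign(q_row, threshold=0.5):
    """Return the 1-based subpopulation index, or None if the accession is admixed.

    q_row holds one accession's membership frequencies across subpopulations.
    """
    best = max(range(len(q_row)), key=lambda i: q_row[i])
    # Assumption: a frequency exactly equal to the threshold counts as assigned.
    return best + 1 if q_row[best] >= threshold else None

# Hypothetical membership frequencies for three accessions and three subpopulations.
q_matrix = {
    "acc1": [0.10, 0.85, 0.05],
    "acc2": [0.40, 0.35, 0.25],
    "acc3": [0.60, 0.30, 0.10],
}
for name, row in q_matrix.items():
    print(name, assign(row, 0.5), assign(row, 0.7))
```

Raising the threshold can only move accessions from an assigned subpopulation to the admixed group, never the reverse, which is why fewer accessions are assigned at 0.7 than at 0.5.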
Based on the genetic distances among the eight subpopulations, a dendrogram showing their genetic relationships was constructed by UPGMA clustering analysis (Figure 2). The dendrogram showed that subpopulation 3 was closest to the cultivated barleys (subpopulation 4), with a genetic distance of 132.188, whereas subpopulation 7 had the largest genetic distance (165.167) from the cultivated subpopulation.
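For readers unfamiliar with UPGMA, a minimal pure-Python sketch of the procedure is given below. It repeatedly merges the two closest clusters and recomputes distances as size-weighted averages. The four labels and their distances are toy values, not the subpopulation distances of this study, and the function name `upgma` is hypothetical.

```python
def upgma(labels, dist):
    """Cluster labels by UPGMA.

    dist maps a pair (a, b) of labels to their distance.
    Returns a list of merges as (cluster_a, cluster_b, distance_at_merge).
    """
    size = {name: 1 for name in labels}              # cluster -> number of leaves
    d = {frozenset(k): v for k, v in dist.items()}   # unordered pair -> distance
    merges = []
    while len(size) > 1:
        pair = min(d, key=d.get)                     # closest pair of clusters
        a, b = sorted(pair, key=str)
        merge_dist = d[pair]
        na, nb = size.pop(a), size.pop(b)
        new = (a, b)
        for c in size:
            # Size-weighted average, so every original leaf counts equally.
            d[frozenset((new, c))] = (
                na * d[frozenset((a, c))] + nb * d[frozenset((b, c))]
            ) / (na + nb)
        # Drop distances that still refer to the two merged clusters.
        d = {p: v for p, v in d.items() if a not in p and b not in p}
        size[new] = na + nb
        merges.append((a, b, merge_dist))
    return merges

# Toy distances between four hypothetical groups.
toy = {("A", "B"): 2, ("A", "C"): 6, ("B", "C"): 6,
       ("A", "D"): 10, ("B", "D"): 10, ("C", "D"): 8}
for step in upgma(["A", "B", "C", "D"], toy):
    print(step)
```

In the toy example, A and B merge first, C joins them next, and D joins last at the averaged distance (2 × 10 + 1 × 8) / 3.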

Discussion
In recent years, different kinds of molecular markers have been widely used in marker-assisted breeding, in the study of genetic relationships between populations, and in screening candidate genes associated with target traits [23]. Simple sequence repeats (SSRs) are increasingly important due to their high polymorphism and technical convenience. Compared with genomic SSRs, however, EST-SSRs have the advantage of being derived from transcribed sequences and of being well suited to cross-species application [24]. In the present study, we developed 49 EST-SSR and 20 genomic SSR markers for wild barley. These novel EST-derived markers will be a valuable resource for tagging and mapping genes related to agronomic and stress-resistance traits of interest. In addition, because they lie within genes, these markers are advantageous for identifying functional diversity in uniquely adapted germplasm.
In many plants, di- and tri-nucleotide repeat motifs are the major types, but the predominant motif differs among species [25,26]. In our research, trimeric repeats were the most abundant motifs (40.82%), followed by tetrameric repeats (26.53%), while dimeric and pentameric repeat motifs occurred at the same frequency (16.32%). The polymorphism of SSRs can be divided into three degrees: high (PIC > 0.5), medium (0.25 < PIC < 0.5) or low (PIC < 0.25) [27]. In our study, the genetic diversity of genomic SSRs was higher than that of the EST-SSRs, with mean PIC values of 0.57 (high) and 0.41 (medium), respectively, resulting in medium polymorphism overall (mean = 0.46). This finding was in line with previous results, and the lower level of polymorphism of EST-SSRs might be due to selection against variation in the conserved regions containing the EST-SSRs [28]. Moreover, the expected heterozygosity at EST-SSRs was also not as high as that at genomic SSRs, ranging from 0.0854 to 0.697 vs 0.3899 to 0.7842. Pompanon et al. [29] attributed heterozygosity deficiency to primer problems, the loss of alleles and the appearance of null alleles at the annealing sites.
Studies of genetic variation in barley have suggested that Tibetan wild barley shows higher polymorphism than cultivated barley [30][31][32]. The results of our study were consistent with these previous studies. The number of alleles and the PIC of wild barley were both higher than those of cultivated barley, being 3.12 vs 2.59 and 0.44 vs 0.37, respectively. The expected heterozygosity (He) showed the same trend, with 0.5098 and 0.4333 for wild and cultivated barley, respectively. The rich genetic diversity in Tibetan wild barley may be a source of novel genes contributing to tolerance of biotic and abiotic stresses, which is important in barley breeding.
BLASTX analysis indicated that 47 (96%) of the 49 unigenes containing EST-SSRs matched at least one protein in the NCBI nr protein database. In further studies, candidate genes of interest can be searched for via association analysis, with reference to the function of the markers in metabolic pathways. Furthermore, these EST-SSR markers can be utilized as markers for comparative studies in related species, for example, Triticum urartu and Aegilops tauschii.
In the present investigation, the population structure analysis demonstrated that the developed EST-SSRs and genomic SSRs could clearly distinguish between the cultivated and wild barley genotypes. The 96 genotypes were divided into eight subpopulations. Subpopulation 3 (XZ161, XZ163, XZ165, XZ168) was most closely related to the cultivated barley (subpopulation 4), whereas subpopulation 7 (XZ120, XZ151, XZ153) and the cultivated barleys were the two most genetically distant populations. The genetic relationships among the subpopulations suggested that subpopulation 3 contained the most domesticated genotypes among the studied wild barley. Furthermore, the other subpopulations of wild barley, especially subpopulation 7, may be important germplasm resources for the improvement of cultivars tolerant of abiotic and biotic stresses. These results were consistent with recent clustering studies of Tibetan wild barley genotypes using DArT markers and SNPs [3]. This indicates that cluster analysis using EST-SSR and genomic SSR markers is an effective way to determine population structure and can provide a solid foundation for studies of genetic variation.

Conclusion
The 49 novel EST-SSRs and 20 genomic SSR markers characterized in 96 barley genotypes were polymorphic and could be employed to examine genetic diversity, evolution, linkage mapping, comparative genomics, and population structure. The Tibetan wild barley showed higher genetic variation than cultivated barley, and the cultivated subpopulation could be clearly separated from the wild barley. In further studies, these markers could be useful for identifying marker-trait associations of interest in marker-assisted breeding programs in barley.