Genome-Wide Analysis of Microsatellite Markers Based on Sequenced Database in Chinese Spring Wheat (Triticum aestivum L.)

Microsatellites, or simple sequence repeats (SSRs), are distributed across both prokaryotic and eukaryotic genomes and have been widely used for genetic studies and molecular marker-assisted breeding in crops. Although an ordered draft sequence of hexaploid bread wheat has been announced, a systematic analysis of SSRs in wheat has not yet been reported. In the present study, we identified 364,347 SSRs among 10,603,760 sequences of the Chinese spring wheat (CSW) genome, corresponding to a density of 36.68 SSRs/Mb. In total, we detected 488 types of motifs ranging from di- to hexanucleotides, among which dinucleotide repeats dominated, accounting for approximately 42.52% of all SSRs. Tri-, tetra-, penta- and hexanucleotide repeats accounted for 24.97%, 4.62%, 3.25% and 24.65%, respectively. AG/CT, AAG/CTT, AGAT/ATCT, AAAAG/CTTTT and AAAATT/AATTTT were the most frequent di-, tri-, tetra-, penta- and hexanucleotide repeats, respectively. Among the 21 chromosomes of CSW, the density of repeats was highest on chromosome 2D and lowest on chromosome 3A. The proportions of di-, tri-, tetra-, penta- and hexanucleotide repeats on each chromosome, and across the whole genome, were almost identical. In addition, 295,267 SSR markers were successfully developed from the 21 chromosomes of CSW, covering the entire genome at a density of 29.73 per Mb. All of the SSR markers were validated by reverse electronic polymerase chain reaction (re-PCR); 70,564 (23.9%) were found to be monomorphic and 224,703 (76.1%) were found to be polymorphic. A total of 45 monomorphic markers were selected randomly for validation; 24 (53.3%) amplified one locus, 8 (17.8%) amplified multiple identical loci, and 13 (28.9%) did not amplify any fragments from the genomic DNA of CSW. A dendrogram was then generated from the 24 single-locus monomorphic SSR markers for 20 wheat cultivars and three species of their diploid ancestors, showing that monomorphic SSR markers represent a promising source for increasing the number of genetic markers available for the wheat genome. The results of this study will be useful for investigating genetic diversity and evolution among wheat and related species. They will also facilitate comparative genomic studies and marker-assisted selection (MAS) in plants.


Introduction
Wheat (Triticum aestivum) is one of the most important cereals worldwide. The consumption of wheat is greater than that of rice, especially in China and India [1]. Moreover, wheat has long served as a major renewable resource, providing both feed and industrial raw materials [2]. However, after experiencing explosive growth over the past 40 years, the annual increase in wheat yield has begun to slow or even stagnate in most countries of the world [3]. The deterioration of environmental conditions, such as drought, heat and flooding, and the increasing world population have increased the demand for wheat [3]. To meet this demand, it will be important to breed new varieties of wheat that can withstand biotic and abiotic stresses (heat, cold, drought, flooding and so on) while maintaining yields and quality under conservation agriculture management practices [2]. In contrast to traditional phenotypic selection, which is expensive and labor-intensive, new genetic and genomic approaches have been adopted to improve germplasm characterization at the molecular level [4]. During the past decades, great efforts have been made to develop molecular markers in wheat to improve breeding strategies [5].
Markers can reveal new alleles, as well as original alleles that were reduced in the wheat gene pool during the process of evolution, thereby offering a deeper understanding of wheat during domestication and selection. Due to the very large size and polyploid complexity of the wheat genome [6], progress in wheat research has been slow. However, numerous molecular markers, including restriction fragment length polymorphisms (RFLPs), random amplified polymorphic DNAs (RAPDs), sequence-tagged sites (STSs), DNA amplification fingerprinting (DAF), amplified fragment length polymorphisms (AFLPs), simple sequence repeats (SSRs)/microsatellites, expressed sequence tags (ESTs) and single nucleotide polymorphisms (SNPs), have been used for molecular development, marker-assisted selection and marker validation in various wheat breeding studies [7].
SSRs have become the best choice among markers used in plant breeding programs, as they are practical, convenient, easy to use and inexpensive. Among all available molecular markers, SSRs are easy to score and have wide genomic distribution, codominant inheritance and a multiallelic nature. In addition, SSRs are superior to SNP markers because SSR markers can reveal more information per locus than biallelic SNP markers [8], which explains why SSR markers remain popular. To date, more than 4,000 SSR markers have been developed and used in genetic mapping studies of wheat. These markers enabled the construction of consensus maps and comparative maps by increasing marker density in specific regions [9]. A number of SSR markers have been identified from individual wheat chromosome arms, such as 1AL and 5DS [10,11]. Lucas et al. identified 362 SSR markers and 6,948 insertion site-based polymorphism (ISBP) markers from the long arm of T. aestivum chromosome 1A. Of 44 putative markers (eight SSRs, 26 ISBPs and ten ISBPs incorporating SSRs) tested for polymorphism, 23 (52.3%) were found to be useful. This work will benefit chromosome mapping and further research on marker-assisted breeding in wheat.
SSRs, which are unevenly distributed in the genomes of prokaryotes and eukaryotes, are tandemly repeated sequences with motifs of 1-6 base pairs (bp) [12]. SSRs derived from expressed sequence tags and genomic libraries are referred to as EST-SSRs and g-SSRs, respectively. To date, numerous SSRs have served as powerful tools to assess genetic diversity, establish core collections, select hybrid parents, study population structures and map or tag functional genes [13]. The polymorphism rate in EST-SSRs is lower than that in g-SSRs [14], so g-SSRs can serve as valuable complements to EST-SSRs. Numerous EST-SSRs have been generated for wheat, and these have revealed high universality between wheat and other cereals, such as barley, maize, rice and sorghum [15,16]. However, few studies have focused on identifying and analyzing g-SSRs in wheat. The availability of the whole draft genome sequence of CSW [6] provides an opportunity to accelerate the process of germplasm evaluation and breeding line identification in wheat breeding programs.
In this study, we identified g-SSRs from the recently sequenced genomic sequence of wheat cv. Chinese spring. The objectives of this study were a) to characterize the density, type and distribution of g-SSR motifs in CSW; b) to develop and analyze Chinese spring genomic SSR markers from a collection of genomic sequences and c) to evaluate the efficiency of these markers in polymorphism identification for application in comparative genomic studies and breeding.

Plant material
The 23 samples used to validate the polymorphic nature of genic-SSR candidate markers included 20 wheat cultivars and three species of its diploid ancestors (the wheat A-, B- and D-genome progenitors Triticum urartu, Aegilops speltoides Tausch and Aegilops tauschii, respectively). The 20 wheat cultivars, Triticum urartu and Aegilops speltoides Tausch were provided by the Institute of Crop Science, Shanxi Academy of Agricultural Sciences, Taiyuan, China. Aegilops tauschii was provided by the Institute of Crop Science, Chinese Academy of Agricultural Sciences, Beijing, China.

Source of genomic sequences
The genome sequence of model wheat (Triticum aestivum cv. Chinese spring) was obtained in FASTA format from URGI (https://urgi.versailles.inra.fr/download/iwgsc/). A total of 10,603,760 sequences were downloaded and studied.

SSR mining and primer design
The identification and localization of g-SSRs were carried out using MIcroSAtellite (MISA, http://www.pgrc.ipk-gatersleben.de/misa), and large-scale primer design was performed using Primer 3.0. The criteria used to search for SSRs with the MISA script were as follows: motifs between two and six nucleotides long, with a minimum of ten repeats for dinucleotides, seven repeats for trinucleotides, five repeats for tetranucleotides and four repeats for penta- and hexanucleotides. The major parameters for primer design were as follows: primer length, 18-22 bp, with 20 bp being optimal; PCR product size, 100-800 bp; an annealing temperature of 50-65°C, with 57°C being optimal; and an optimal GC content of 50%, with 45% being the minimum.
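The repeat-count thresholds above can be illustrated with a minimal Python sketch. It is not the MISA script itself: it is a simplified regular-expression search that assumes perfect (uninterrupted) repeats only and ignores compound SSRs, and the function name find_ssrs is a hypothetical helper introduced here for illustration.

```python
import re

# Minimum repeat counts per motif length, taken from the search criteria above.
MIN_REPEATS = {2: 10, 3: 7, 4: 5, 5: 4, 6: 4}


def find_ssrs(sequence):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs.

    Simplified illustration only: compound SSRs are not detected.
    """
    sequence = sequence.upper()
    hits = []
    for motif_len, min_rep in MIN_REPEATS.items():
        # One motif of motif_len bases followed by at least (min_rep - 1) copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (motif_len, min_rep - 1))
        for match in pattern.finditer(sequence):
            motif = match.group(1)
            # Skip motifs that are a shorter unit repeated (e.g. AGAG, AAA),
            # so each tract is reported once under its shortest motif.
            if any(motif == motif[:k] * (motif_len // k)
                   for k in range(1, motif_len) if motif_len % k == 0):
                continue
            repeats = len(match.group(0)) // motif_len
            hits.append((match.start(), motif, repeats))
    return sorted(hits)


example = "CCC" + "AG" * 12 + "TTT" + "AAG" * 7 + "CCC"
print(find_ssrs(example))
```

In this toy sequence, the (AG)12 and (AAG)7 tracts pass the thresholds, whereas a tract such as (AG)9 would be ignored because it falls below the ten-repeat minimum for dinucleotides.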

Analysis of SSR polymorphism
Analysis of the uniqueness and specificity of the designed SSR markers in the Chinese spring genome was performed using the re-PCR strategy (http://www.ncbi.nlm.nih.gov/tools/epcr/). Re-PCR can be used to map sequence-tagged sites (STSs) or short primers in a sequence database; it is a version of e-PCR that searches for STSs within DNA sequences. The parameters were: re-PCR -S hash-file -n1 -g1 -r +. Subsequently, the corresponding amplicons were analyzed and the previously obtained SSR markers were classified as either polymorphic or monomorphic. Polymorphic SSR markers amplified multiple identical loci in the Chinese spring genome, while monomorphic markers amplified one locus. These data were analyzed and plotted with Microsoft Excel.
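The classification rule described above (one in silico locus gives a monomorphic marker, more than one gives a polymorphic marker) can be sketched as follows. The input format, a list of (marker ID, hit location) pairs, is an assumption made for illustration and does not reproduce the actual re-PCR output layout; classify_markers is a hypothetical helper name.

```python
from collections import Counter


def classify_markers(hits):
    """Classify markers by the number of in silico amplification loci.

    hits: iterable of (marker_id, hit_location) pairs (assumed format).
    Returns a dict mapping marker_id to 'monomorphic' or 'polymorphic'.
    """
    counts = Counter(marker_id for marker_id, _ in hits)
    return {
        marker_id: "monomorphic" if n == 1 else "polymorphic"
        for marker_id, n in counts.items()
    }


# Hypothetical hits: marker names and locations are invented for the example.
example_hits = [
    ("SSR_0001", "1A:1200"),
    ("SSR_0002", "2B:5400"),
    ("SSR_0002", "2D:5100"),
]
print(classify_markers(example_hits))
```

Markers with no hit at all never appear in the input, so they would need to be tracked separately against the full list of designed primers.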

The validation of monomorphic SSR markers in the CSW genome
A total of 45 pairs of g-SSR primers were selected randomly for the validation of the designed monomorphic SSR markers in the CSW genome. Genomic DNA was extracted from fresh, young leaves of CSW using an improved cetyltrimethyl ammonium bromide (CTAB) method [17]. After extraction, the DNA quality and concentration were assessed using an Eppendorf BioPhotometer. Polymerase chain reaction (PCR) was performed in a total volume of 20.0 μl containing 1 μl of 50 ng/μl template DNA, 2 μl of 10× PCR buffer containing 20 mM MgCl2, 0.4 mM dNTPs, 0.3 U of Taq polymerase, 0.8 μl each of 10 μmol/L forward and reverse primers, and sterile distilled water. The reactions were performed using the following conditions: 94°C for 2 min; 35 cycles of 94°C for 40 s, 55°C for 45 s, and 72°C for 60 s; and a final step at 72°C for 7 min. Then, 2 μl of the PCR product and a 600 bp molecular size marker were loaded onto an 8% denaturing polyacrylamide gel (PAGE) in 1× TBE buffer, run at 100 V, and visualized using silver staining. SSR analysis was performed at least twice to confirm primer amplification.
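As a quick check on the reaction set-up, the final concentrations implied by the stated stock concentrations and volumes can be worked out with the dilution formula C_final = C_stock × V_stock / V_total. The sketch below assumes that the 20 mM MgCl2 figure refers to the 10× buffer stock, and it counts only programmed hold times for the cycling duration (ramping between temperatures is ignored).

```python
def final_concentration(stock_conc, stock_volume_ul, total_volume_ul=20.0):
    """Dilution formula: C_final = C_stock * V_stock / V_total."""
    return stock_conc * stock_volume_ul / total_volume_ul


# Values taken from the reaction description above.
template_ng_per_ul = final_concentration(50, 1)    # 50 ng/ul stock, 1 ul added
primer_umol_per_l = final_concentration(10, 0.8)   # 10 umol/L stock, 0.8 ul added
mgcl2_mm = final_concentration(20, 2)              # assumes 20 mM in the 10x buffer, 2 ul added

# Programmed hold time of the thermal profile, in seconds (ramp times excluded).
cycling_seconds = 2 * 60 + 35 * (40 + 45 + 60) + 7 * 60

print(template_ng_per_ul, primer_umol_per_l, mgcl2_mm, cycling_seconds / 60)
```

Under these assumptions the reaction contains 2.5 ng/μl template, 0.4 μmol/L of each primer and 2 mM MgCl2, and the program holds for about 94 min in total.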
Phylogenetic relationships [18] among the 20 wheat cultivars and the three species of their diploid ancestors were estimated from the monomorphic SSR marker data, and a dendrogram based on similarity coefficients was constructed using NTSYS-pc version 2.10.
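The general idea behind a similarity-coefficient dendrogram can be sketched in a few lines of Python. The text does not state which coefficient or clustering method was selected in NTSYS-pc, so the Jaccard coefficient and UPGMA (average linkage) used here are assumptions chosen for illustration, and the band scores are invented.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity between two 0/1 band-score vectors."""
    shared = sum(1 for x, y in zip(a, b) if x and y)
    present = sum(1 for x, y in zip(a, b) if x or y)
    return shared / present if present else 0.0


def average_similarity(profiles, cluster_a, cluster_b):
    """Mean pairwise similarity between members of two clusters (UPGMA)."""
    pairs = [(a, b) for a in cluster_a for b in cluster_b]
    return sum(jaccard(profiles[a], profiles[b]) for a, b in pairs) / len(pairs)


def upgma(profiles):
    """Return the merge order as (cluster_a, cluster_b, similarity) tuples."""
    clusters = [(name,) for name in sorted(profiles)]
    merges = []
    while len(clusters) > 1:
        # Merge the pair of clusters with the highest average similarity.
        best = max(combinations(clusters, 2),
                   key=lambda pair: average_similarity(profiles, *pair))
        merges.append((best[0], best[1], average_similarity(profiles, *best)))
        clusters = [c for c in clusters if c not in best] + [best[0] + best[1]]
    return merges


# Invented presence/absence scores for three hypothetical samples.
scores = {"A": [1, 1, 0, 0], "B": [1, 1, 0, 0], "C": [0, 0, 1, 1]}
for merge in upgma(scores):
    print(merge)
```

Each merge tuple corresponds to one node of the dendrogram, with the similarity value giving the height at which the two clusters join.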

Characterization of SSRs on each chromosome of CSW
We then analyzed the distribution of SSRs on each chromosome of the Chinese spring genome (Table 3). The percentages of di-, tri-, tetra-, penta- and hexanucleotide repeats on every chromosome, and across the whole genome, were nearly identical. The percentage of dinucleotide repeats was the highest, followed by tri-, hexa- and tetranucleotide repeats, while the percentage of pentanucleotide repeats was so low that they could almost be discounted. The ratios of tri- and hexanucleotides were nearly equivalent (Fig 2). In addition, AG/CT and AC/GT were the most abundant dinucleotide repeats on each chromosome, while AAG/CTT, AAC/GTT, AGG/CCT, AGT/ATC and ACT/ATG were the most abundant trinucleotide repeats on each chromosome. The largest proportion of tetranucleotide repeats included AGAT/ATCT, AAAT/ATTT, ACAT/ATGT and ACGT/ATGC, and the most abundant penta- and hexanucleotide repeats were AAAAN and AAAAAN, respectively (S2 Table).

Genome-wide SSR markers development and polymorphism analysis
All SSRs were selected for SSR marker development, and a total of 295,267 SSR markers were successfully designed from the 21 chromosomes of CSW, covering the whole genome at a density of 29.73 per Mb. The densities of SSR markers on each chromosome were similar, ranging from 25.58 to 31.34 per Mb (Table 4). The highest density of SSR markers was found on chromosome 5D, followed by 2D and 7D. Furthermore, all SSR markers were validated and subjected to polymorphism analysis via re-PCR. Markers that amplified prominent PCR products were classified as either polymorphic or monomorphic based on the number of corresponding loci. Of the markers amplified, 70,564 were monomorphic and 224,703 were polymorphic. The monomorphic markers included 2,387 (3.38%) in compound formation and 68,177 (96.62%) in perfect formation; among them, dinucleotide motifs (34.46%) were the most common, followed by hexanucleotide (28.29%), trinucleotide (23.72%), tetranucleotide (7.88%) and pentanucleotide motifs (5.65%) (Fig 3). We also examined the distribution of monomorphic markers on the Chinese spring chromosomes. Chromosome 3B had 5,387 monomorphic markers, more than any other chromosome, followed by chromosomes 2B and 5B, which contained 5,112 and 4,346 monomorphic markers, respectively. Chromosome 1D contained the fewest monomorphic markers (2,082). Chromosome 3B contained the largest number of di-, tetra-, penta- and hexanucleotide motifs, and 2B contained the largest number of trinucleotide motifs. Chromosome 1D had the fewest di- and hexanucleotide motifs, 3D had the fewest tri- and pentanucleotide motifs and 4D had the fewest tetranucleotide motifs (Table 5).
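The headline figures in this section can be cross-checked with simple arithmetic. The sketch below uses only numbers reported in the text (marker counts and per-Mb densities); the sequence length it derives is an implied value, not one stated by the authors.

```python
total_markers = 295_267
monomorphic = 70_564
polymorphic = 224_703

# The two classes should account for every designed marker.
assert monomorphic + polymorphic == total_markers

pct_monomorphic = 100 * monomorphic / total_markers
pct_polymorphic = 100 * polymorphic / total_markers

# Sequence length implied by the reported densities (markers or SSRs per Mb).
implied_mb_from_markers = total_markers / 29.73
implied_mb_from_ssrs = 364_347 / 36.68

print(round(pct_monomorphic, 1), round(pct_polymorphic, 1))
print(round(implied_mb_from_markers), round(implied_mb_from_ssrs))
```

Both densities imply roughly the same analyzed sequence length (about 9,930 Mb), which suggests that the SSR and marker densities were computed against the same total.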

The validation of monomorphic SSR markers in the CSW genome
A subset of 45 monomorphic markers was selected randomly for validation in the CSW genome (Table 6). Of these markers, 24 (53.3%) amplified one locus and could be used for marker-assisted breeding in wheat, 8 (17.8%) amplified multiple identical loci and 13 (28.9%) amplified no fragment from the genome of CSW (Fig 4). These data provide a solid basis for our follow-up study. The 24 monomorphic SSR markers that amplified one locus were used to analyze the genetic relationships among 20 wheat cultivars and three species of their diploid ancestors. The phylogenetic relationships were represented in a dendrogram of similarity coefficients constructed using Numerical Taxonomy System of Multivariate Programs (NTSYS) cluster analysis (Fig 5). At a similarity coefficient of 0.6, the largest group consisted of the hexaploid wheat cultivars and the diploid ancestor of the B genome. The diploid ancestors of the A and D genomes were clustered into subgroups at a similarity value of 0.67. The hexaploid wheat cultivars were also clustered into two subgroups at the same value. Our results indicate that the monomorphic SSR markers can be used to assess molecular diversity and have potential for use in fingerprinting analysis.
[19] and Populus (667.9 SSR/Mb) [20]. These data may reflect the differences in DNA levels between genomes [8]. The distribution and density of SSRs are highly variable, perhaps due to differences in search criteria and database mining tools [21]. Although genome size may also affect the repetitiveness of microsatellites, the density of SSRs was not significantly related to genome size [22,23]. Moreover, the density of SSRs is considerably higher in dicot species than in monocots. We found that dinucleotide repeats were the most frequent motifs on each chromosome and across the whole genome of CSW, which was also reported for the sweet orange genome [24]. Biswas et al. calculated the g-SSR frequencies in 11 plant genomes and found that dinucleotide repeats were predominant in both monocot and dicot genomes [20]. This conclusion is consistent with the current results. Like the genomes of Arabidopsis thaliana and rice, CSW contained the most AG/CT dinucleotide repeats, followed by AC/GT and AT/AT repeats [25,26]. The SSR composition of Chinese spring is biased toward A and T in all types of repeats; for example, CG/CG occurred at the lowest density among dinucleotide repeat motifs. This is also the case for human, Drosophila melanogaster and other eukaryotic genomes [27]. Hong et al. [28] reported that GAA/TCC and AGA/TCT are the most frequent trinucleotide patterns in the Solanaceae, and CCG/CGG are the most abundant trinucleotide patterns in the coccolithophore Emiliania huxleyi [29]. Although the densities of trinucleotide repeat patterns differ among various species, the most abundant patterns in CSW, Arabidopsis and Brassica rapa are identical: the greatest number of trinucleotide repeats comprise AAG/CTT [28]. Among tetra- to hexanucleotide repeats, AAAN, AAAAN and AAAAAN are much more common than other repeat motifs [28], which is also true for other plant genomes.

Comparative characterization of SSRs between CSW and its related species
The occurrence of SSRs in genomes mainly results from mutations during evolution, such as replication slippage and the addition or removal of one or several repeat motifs. Therefore, the particular number and lengths of SSRs can serve as an index of genetic variation during the process of evolution. Chinese spring, an allohexaploid Triticum aestivum cultivar, contains three homoeologous genomes (A, B and D) [30]. The number of SSRs in the A, B and D genomes was [...] These results suggest that the B genome of CSW shows the most variation (Fig 6). This finding may at least partially explain why the draft genomes of the wheat A-genome progenitor Triticum urartu and D-genome progenitor Aegilops tauschii were sequenced, while there is currently no draft sequence for the B-genome progenitor [30,31]. We compared the g-SSR distribution in the A-genome progenitor Triticum urartu, the D-genome progenitor Aegilops tauschii and Chinese spring (Fig 7). In Triticum urartu and CSW, dinucleotide repeat motifs were the predominant type (57.8% and 42.5%, respectively). By contrast, trinucleotide repeats were the most abundant motifs in Aegilops tauschii (39.13%). The predominance of trinucleotide repeats in genic regions reflects selection against frameshift mutations. Expansion or contraction of triplet SSRs does not generate frameshifts, so trinucleotide repeats can escape selective pressure in coding regions, whereas non-triplet SSRs cause frameshift mutations and tend to face greater purifying selection. Therefore, the high percentage of trinucleotide repeats in Aegilops tauschii may be related to its high genic density [23,32,33]. Additionally, these variations in repeat composition may be related to the different parameters used when mining SSRs in different species.

Genome-wide SSR markers development and polymorphism analysis
As Cavagnaro et al. [34] noted, mononucleotide repeats are not suitable for marker development. Therefore, we only developed primers based on di- to hexanucleotide repeats. A total of 295,267 SSR markers were successfully designed, all of which were validated by re-PCR. Among the SSR markers, 70,564 (23.9%) were found to be monomorphic and 224,703 (76.1%) were found to be polymorphic. These monomorphic markers may serve as powerful tools for detecting sequence variation within a population of wheat and related species, examining changes in the level of genetic diversity and performing phylogenetic analyses [35]. Of the monomorphic markers, dinucleotide motifs (34.46%) were the most common, with chromosome 3B containing the most monomorphic markers, suggesting that the dinucleotide motifs on chromosome 3B may serve as better selectable markers for MAS in wheat. In addition, these markers represent an important genomic resource for use in many cereal crops and will benefit numerous genetic and genomic studies involving genetic diversity evaluation, population genetics, cloning of functional genes related to agronomic and quality traits and comparative genomics in plants.
The validation of monomorphic SSR markers in the CSW genome was performed using 45 randomly selected monomorphic SSR markers. Eight amplified multiple loci. This may be a consequence of the fact that the full CSW genome sequence is not yet known and that there are large repetitive homologous sequences in non-homologous chromosomes [6], which can increase the frequency at which monomorphic markers amplify multiple loci. Monomorphic markers that amplify one locus are more useful for phylogenetic analysis of wheat cultivars [36]. Analyses of phylogenetic relationships using SSR markers could provide a better understanding of the genetic background of wheat cultivars and become a foundation for the genetic improvement of wheat. The results of the present study indicate that the monomorphic SSR markers used here might provide useful information for genetic improvement and for germplasm conservation, evaluation and utilization in wheat. Monomorphic markers could play an important role in many aspects of wheat breeding, including the identification of genes responsible for desirable traits and the analysis of genetic relationships and diversity among wheat germplasm collections. Monomorphic markers should therefore assist with improvements in wheat breeding [37].
Supporting Information S1