Development of SSR Markers and Genetic Diversity in White Birch (Betula platyphylla)

To study the genetic diversity of white birch (Betula platyphylla), 544 primer pairs were designed from genome-wide Solexa sequences. Of these, 215 primer pairs showed polymorphism among five genotypes, and 111 primer pairs that produced clear, visible bands were used to genotype 41 white birch plants collected from 6 different geographical regions. A total of 717 alleles were obtained at the 111 loci, with a range of 2 to 12 alleles per locus. Statistical analysis showed that the polymorphic frequency of the alleles ranged from 17% to 100% with a mean of 55.85%; the polymorphism information content (PIC) of the loci ranged from 0.09 to 0.58 with a mean of 0.30; and gene diversity among the tested genotypes ranged from 0.01 to 0.66 with a mean of 0.36. The results also indicated that major allele frequency ranged from 0.39 to 1.00 with a mean of 0.75; expected heterozygosity from 0.22 to 0.54 with a mean of 0.46; observed heterozygosity from 0.02 to 0.95 with a mean of 0.26; Nei's index from 0.21 to 0.54 with a mean of 0.46; and Shannon's Information index from 0.26 to 0.87 with a mean of 0.66. The 41 white birch genotypes showed low to moderate similarity at the 111 selected SSR loci (0.025-0.610), indicating complex genetic diversity among the white birch collections. UPGMA-based clustering of the allelic constitution of the 41 genotypes at the 111 SSR loci suggested that the six geographical regions can be separated into four clusters at a similarity coefficient of 0.22. Genotypes from the Huanren and Liangshui provenances were grouped into Cluster I, genotypes from the Xiaobeihu and Qingyuan provenances into Cluster II, genotypes from the Finland provenance into Cluster III, and genotypes from Maoershan into Cluster IV. The information provided in this study could support genetic improvement and germplasm conservation, evaluation, and utilization in white birch breeding programs.


Introduction
Simple sequence repeats (SSRs), or microsatellite DNA, are short tandem repeats of DNA sequence motifs (1-6 bp long) that are widely distributed in eukaryotic genomes [1][2]. SSR loci are polymorphic because the number of repeat units at a locus differs among species and individuals [3]. SSRs are PCR-based markers that require only small amounts of genomic DNA for amplification. Because SSR markers are generally co-dominant, multi-allelic, reproducible, and highly polymorphic [4][5][6], they have been widely applied in genetic linkage mapping, germplasm resource investigation, phylogenetic analysis, DNA fingerprinting, and other genetic studies [7][8][9]. SSR markers have been shown to be suitable for studying genetic diversity and relationships among plant species, populations, and individuals [10][11].
White birch (Betula platyphylla Suk.), a deciduous broadleaf tree species, is widely distributed in the northeast and northwest of China, where it plays an important role in timber production [12]. Because of its fast growth and easy regeneration, white birch is a typical pioneer tree species of secondary forests in these regions. In addition, white birch trees play an indispensable ecological role in recolonizing forest land after harvesting and in protecting against wildfire damage in north China. They are also valuable to the timber industry because their wood is compact and free of blemishes [13].
As with other trees, developing a new white birch variety by phenotype-based traditional breeding takes a long time because of the species' long life cycle. Because DNA sequence polymorphisms are directly associated with genotypes, a marker-assisted selection (MAS) strategy has been proposed that could be used to directly select progeny with the target genotype. This approach has clear advantages over traditional breeding methods, which infer genotypes from phenotypes.
As noted above, SSR markers are among the most useful markers in plant breeding programs. White birch breeders have therefore used different methods to develop SSR markers from the white birch genome. For example, Wu et al. [14] obtained 13 SSR markers from a genomic DNA library of B. platyphylla by using a PCR method. Ogyu et al. [15] obtained 184 SSR-containing clones from an SSR-enriched DNA library of B. maximowicziana and tested 15 SSR primer pairs, of which 8 successfully amplified polymorphic fragments. Kulju et al. [16] screened 38 SSR-containing clones from 17,300 clones in a genomic DNA library of B. pendula and developed 23 polymorphic SSR markers. Truong et al. [17] obtained 17 SSR-containing clones from 8,000 clones in a genomic DNA library of B. pubescens and found 3 polymorphic SSR markers. More recently, expressed sequence tags (ESTs) have been widely used to develop SSR markers. Wang et al. [18] found 260 EST sequences containing SSR motifs among 2,548 ESTs (10.2%) in B. platyphylla and designed 45 EST-SSR primers that amplified polymorphic fragments in the white birch genome. Lu et al. [19] obtained 331 SSR-containing ESTs from 3,028 EST sequences of B. platyphylla and developed 28 EST-SSR primers that successfully amplified polymorphic fragments.
One application of SSRs is genetic linkage mapping. Using 19 SSR and 145 AFLP markers, Pekkinen et al. [20] built the first genetic linkage map of the B. pendula genome. Jiang et al. [21] constructed high-density genetic linkage maps of B. platyphylla and B. pendula using AFLP and RAPD markers. To date, the number of SSR markers used for linkage mapping in B. platyphylla is limited. A larger set of SSR markers could saturate a high-density genetic map, which is the foundation for cloning genes that control agronomic traits of interest in white birch breeding programs. High-throughput sequencing technologies make it possible to develop a large number of SSR markers based on whole-genome sequence information. To accelerate germplasm evaluation and the identification of cultivars or breeding lines in white birch breeding programs [22], the present study aimed to develop SSR markers based on white birch genome Solexa sequences and to use these markers to genotype 41 white birch plants collected from six geographical regions in north China and Finland.

Materials and Methods
No specific permissions were required for the locations or activities in this study. We confirm that the field studies did not involve endangered or protected species.

Materials
Seeds of 41 white birch genotypes were collected from 6 different geographical regions in Heilongjiang and Liaoning provinces, China, and in Finland (S1 Table) and sown in the greenhouse at the Tree Breeding Base of Northeast Forestry University, Harbin, China. Of these, 36 genotypes were from Huanren (3), Qingyuan (7), Xiaobeihu (7), Maoershan (15), and Liangshui (4) in Heilongjiang and Liaoning provinces, China, and the other five genotypes were imported from Finland. Young leaves were collected from the trees during the growing season and stored at -80°C for DNA extraction.

DNA Extraction
Total genomic DNA was extracted using the Universal Genomic DNA Extraction Kit (TaKaRa, Dalian, China) following the manufacturer's instructions. DNA concentration and quality were assessed using a NanoDrop 2000c Spectrophotometer. The DNA was stored at -20°C for sequencing and PCR analysis.

Solexa Sequences and SSR Primer Design
Sequencing of the white birch genome was performed by BGI (Shenzhen Company Ltd., Shenzhen, China) using Solexa next-generation sequencing technology (Illumina GA). The short sequence reads were cleaned and then assembled using the SOAPdenovo software. The genome of B. platyphylla was estimated at approximately 440 million base pairs across 28 chromosomes. The clean, assembled sequences were searched for SSRs using the SSRIT software [23]. Repeats containing dimer, trimer, tetramer, pentamer, and hexamer motifs that were generally longer than 20 bp were selected for SSR primer design using Primer Premier 5.0 [23] with the following parameters: target amplicon length of 100-500 bp, annealing temperature of 50°C-70°C, GC content of 50%-70%, and primer size of 18-24 bp. The SSR primer pairs were synthesized at Sangon Biotech (Shanghai, China).
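The SSR search step can be illustrated with a minimal Python sketch. It is not the SSRIT implementation used in this study; it only shows how perfect repeats of 2-6 bp motifs spanning at least 20 bp could be located with a regular expression. The function name find_ssrs and the 20 bp cutoff (an assumed reading of the length criterion above) are illustrative.

```python
import re

# Assumed cutoff: an SSR must span at least 20 bp (illustrative, not SSRIT defaults).
MIN_SSR_LENGTH = 20


def find_ssrs(sequence, min_length=MIN_SSR_LENGTH):
    """Return sorted (start, motif, repeat_count) tuples for perfect 2-6 bp SSRs."""
    sequence = sequence.upper()
    hits = []
    for motif_len in range(2, 7):
        # Smallest repeat count whose total span reaches min_length (ceiling division).
        min_repeats = -(-min_length // motif_len)
        pattern = re.compile(
            r"(([ACGT]{%d})\2{%d,})" % (motif_len, min_repeats - 1)
        )
        for match in pattern.finditer(sequence):
            motif = match.group(2)
            # Skip motifs that are repeats of a shorter unit (e.g. ATAT, or AA).
            is_redundant = any(
                motif == motif[:k] * (motif_len // k)
                for k in range(1, motif_len)
                if motif_len % k == 0
            )
            if is_redundant:
                continue
            hits.append((match.start(), motif, len(match.group(1)) // motif_len))
    return sorted(hits)


# Hypothetical test sequence: 12 copies of AT flanked by short non-repetitive ends.
example = "GGC" + "AT" * 12 + "CCG"
print(find_ssrs(example))  # [(3, 'AT', 12)]
```

Imperfect or compound repeats are not handled by this sketch; a dedicated tool is preferable for whole-genome scans.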

PCR Assay and Detection
To detect SSR polymorphism, the PCR conditions were optimized. The 20 μl reaction mixture contained 50 ng DNA, 1.0 μl of 10 μmol forward primer, 1.0 μl of 10 μmol reverse primer, 0.5 μl of 10 mmol dNTP, 2 μl of 10× buffer (100 μmol Tris-HCl, 500 mmol KCl, 0.8% Nonidet P40), 2 μl of 25 mmol MgCl2, and 0.2 μl of Taq polymerase (5 U/μl). PCR amplification was performed in an MJ Research PTC-200 thermocycler (MJ Research, MA, USA), starting with an initial denaturation step at 94°C for 4 min, followed by 35 cycles of denaturation at 94°C for 1 min, annealing at the appropriate temperature (depending on the SSR primers) for 1 min, and extension at 72°C for 30 sec, with a final extension step at 72°C for 10 min. The PCR products were subjected to electrophoresis on 6% denaturing polyacrylamide gels in 1× TBE buffer and visualized by silver staining.
The PCR products were eluted from the gel using the MiniBEST Agarose Gel DNA Extraction Kit Ver.3.0 (TaKaRa, Dalian, China) and cloned into the pMD19-T Vector. The recombinant plasmids were transformed into E. coli by the heat-shock method and sequenced by GENEWIZ (Suzhou, China) using the M13F (-47) and M13R (-48) primers.
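As a quick check of the reaction recipe, the final concentration of each component in the 20 μl mixture can be worked out from its stock concentration and added volume. The sketch below assumes that the stock values are per-litre concentrations (for example, that "10 μmol" means 10 μmol/L); the names are illustrative.

```python
TOTAL_VOLUME_UL = 20.0

# name -> (stock concentration, volume added in microlitres).
# Assumption: stock values are per-litre concentrations (e.g. 10 umol/L primer).
COMPONENTS = {
    "forward primer (umol/L)": (10.0, 1.0),
    "reverse primer (umol/L)": (10.0, 1.0),
    "dNTP (mmol/L)": (10.0, 0.5),
    "MgCl2 (mmol/L)": (25.0, 2.0),
}


def final_concentration(stock, volume_ul, total_ul=TOTAL_VOLUME_UL):
    """Dilution rule: final = stock * added volume / total volume."""
    return stock * volume_ul / total_ul


final = {
    name: final_concentration(stock, volume)
    for name, (stock, volume) in COMPONENTS.items()
}
taq_units = 5.0 * 0.2  # 5 U/ul stock x 0.2 ul added per reaction

for name, value in final.items():
    print(f"{name}: {value:g}")
print(f"Taq polymerase: {taq_units:g} U per reaction")
```

Under these assumptions each primer is at 0.5 μmol/L, dNTP at 0.25 mmol/L, and MgCl2 at 2.5 mmol/L, with 1 U of Taq polymerase per reaction.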

Statistical Analysis
The visible bands of each genotype were recorded as binary data: 1 = band present and 0 = band absent. Statistics including major allele frequency, polymorphism information content (PIC), gene diversity, observed heterozygosity (Ho), expected heterozygosity (He), Nei's index (1973), and Shannon's Information index were computed using the POPGENE (version 1.32) program. To generate a dendrogram showing the genetic relationships among the samples, cluster analysis was performed using the unweighted pair group method with arithmetic mean (UPGMA) in the NTSYS-pc2.11a software [24]. The dendrogram was visualized with TreeView 1.6.6 [25].
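For readers unfamiliar with these statistics, the following Python sketch shows the textbook per-locus definitions of gene diversity, PIC, and Shannon's index as functions of allele frequencies. It is not the POPGENE implementation, and POPGENE's handling of sample-size corrections or dominant (band presence/absence) data may differ. The allele counts in the example are hypothetical.

```python
import math
from itertools import combinations


def allele_frequencies(allele_counts):
    """Convert raw allele counts at one locus into frequencies."""
    total = sum(allele_counts)
    return [count / total for count in allele_counts]


def gene_diversity(freqs):
    """Gene diversity (expected heterozygosity): 1 - sum(p_i^2)."""
    return 1.0 - sum(p * p for p in freqs)


def pic(freqs):
    """PIC: 1 - sum(p_i^2) - sum over i<j of 2 * p_i^2 * p_j^2."""
    homozygosity = sum(p * p for p in freqs)
    cross_term = sum(2.0 * (p * p) * (q * q) for p, q in combinations(freqs, 2))
    return 1.0 - homozygosity - cross_term


def shannon_index(freqs):
    """Shannon's information index: -sum(p_i * ln p_i)."""
    return -sum(p * math.log(p) for p in freqs if p > 0)


# Hypothetical two-allele locus with counts 3 and 1 (frequencies 0.75 and 0.25).
freqs = allele_frequencies([3, 1])
print(max(freqs))                       # major allele frequency: 0.75
print(gene_diversity(freqs))            # 0.375
print(round(pic(freqs), 4))             # 0.3047
print(round(shannon_index(freqs), 4))   # 0.5623
```

A two-allele locus with a major allele frequency of 0.75 therefore has a gene diversity of 0.375 and a PIC of about 0.30, which illustrates how a high major allele frequency keeps both statistics low.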

Results and Discussion
Molecular markers are widely used in plant genetics, breeding, biological diversity analysis, and cultivar identification because they directly reveal genetic differences at the DNA level. SSR motifs are polymorphic, abundant, and randomly distributed in eukaryotic genomes [1]. Compared with other markers, such as RAPDs and AFLPs, SSR markers are stable, co-dominant, and inexpensive. They have therefore been widely used in genetic analysis and genomic linkage mapping.
High-throughput Solexa sequencing technology provides an efficient tool for developing SSR markers. In the present study, 544 SSR primer pairs were designed from the white birch genome (S2 Table) and tested for polymorphism among five white birch genotypes. Of these, 215 showed polymorphism with visible bands, indicating that 39.5% of the SSR loci could be used for white birch genotyping. This also suggests that developing SSR markers from high-throughput whole-genome sequences is more efficient than developing them from genomic DNA libraries or EST sequences, because SSR markers derived from whole-genome sequences are more widely distributed and therefore show higher rates of polymorphism. Of the 215 polymorphic loci, 111 reliable loci were selected to genotype the 41 white birch genotypes (Table 1). As a result, a total of 717 alleles were detected across the 41 genotypes. The number of alleles per locus ranged from 2 (loci BP-016, BP-022, BP-080, BP-121, and BP-301) to 12 (BP-210). The polymorphic rates of the 111 primer pairs across the 41 white birch genotypes ranged from 17% to 100% with an average of 55.85%. Eleven loci (BP-016, BP-019, BP-022, BP-028, BP-044, BP-065, BP-080, BP-097, BP-224, BP-250, and BP-301) presented 100% polymorphism, whereas AF310866 showed the lowest rate of polymorphism (17%). In this study, the 111 selected polymorphic SSR loci amplified an average of 6.46 alleles per locus, which was higher than that reported by Wu et al. [14] (4.69 alleles per locus) and close to that reported by Kulju et al. [16]. The polymorphism information content (PIC) is determined by both the number of alleles and the distribution of allele frequencies and can be used to evaluate the variation of SSR alleles [26]. The 111 loci had low to moderate PIC, ranging from 0.09 (BP-127) to 0.58 (BP-069) with a mean of 0.30 (Table 1). Similarly, these SSR loci showed low to moderate gene diversity, ranging from 0.01 (BP-127) to 0.66 (BP-069) with a mean of 0.36.
The low to moderate mean PIC (0.30) and gene diversity (0.36) indicated that the white birch genotypes from the six geographical locations had relatively low genetic variation. Among the 111 SSR loci, locus BP-069 had the highest PIC (0.58) and gene diversity (0.66), which suggests that this marker can be used to differentiate most white birch genotypes in Betula breeding programs. In contrast, locus BP-127 had the lowest PIC (0.09) and gene diversity (0.01), indicating lower polymorphism and limited utility for Betula cultivar identification. Other statistics in the present study reflected similar observations: major allele frequencies were high, ranging from 0.39 to 1.00 with an average of 0.75; expected heterozygosity (He) ranged from 0.22 to 0.54 with a mean of 0.46; observed heterozygosity (Ho) from 0.02 to 0.95 with a mean of 0.26; Nei's index from 0.21 to 0.54 with a mean of 0.46; and Shannon's Information index from 0.26 to 0.87 with a mean of 0.66.
To verify the genetic basis of the sequence length variation, the PCR products were re-sequenced. The alignment of multiple sequences at the BP-293 locus is illustrated in Fig 2. Sequence lengths ranged from 172 bp to 176 bp, and the number of SSR motif repeats ranged from 8 to 10. These results indicate that the PCR amplified the intended targets containing the expected SSR motif.
Genetic diversity is a result of gene evolution in plant species [27] and is a foundation for the genetic improvement of species. Analysis of genetic diversity using molecular markers can provide a better understanding of the genetic background of white birch cultivars. The results of the present study indicated that the white birch trees from the six geographical locations had low to moderate similarity (0.025-0.610) and could be separated into four clusters at a similarity coefficient of 0.22 (Fig 3). Genotypes from Huanren and Liangshui were closely related and grouped into Cluster I, and genotypes from Xiaobeihu and Qingyuan were grouped into Cluster II. Genotypes from Finland and Maoershan were clearly different from each other and from the other groups, and were grouped into Clusters III and IV, respectively. The clusters agreed well with the provenances of the genotypes, suggesting that the SSR primers used in this study can effectively distinguish white birch germplasm. The genetic relationships among these genotypes might provide useful information for genetic improvement and for germplasm conservation, evaluation, and utilization in white birch breeding programs.
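The clustering logic can be sketched in a few lines of Python. This is only an illustration, not the NTSYS-pc procedure used in the study: the similarity coefficient is not specified in the text, so the Jaccard coefficient is assumed here, and the four band profiles (A to D) are hypothetical. Clusters are merged by average pairwise similarity (UPGMA) until no pair of clusters reaches the chosen cutoff, which is equivalent to cutting the dendrogram at that similarity.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two 0/1 band profiles (assumed coefficient)."""
    shared = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)
    either = sum(1 for x, y in zip(a, b) if x == 1 or y == 1)
    return shared / either if either else 0.0


def upgma_clusters(profiles, threshold):
    """Merge clusters by average similarity until no pair reaches threshold."""
    names = sorted(profiles)
    sim = {
        frozenset(pair): jaccard(profiles[pair[0]], profiles[pair[1]])
        for pair in combinations(names, 2)
    }
    clusters = [(name,) for name in names]

    def average_similarity(c1, c2):
        # UPGMA: unweighted mean over all between-cluster genotype pairs.
        total = sum(sim[frozenset((a, b))] for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        c1, c2 = max(
            combinations(clusters, 2),
            key=lambda pair: average_similarity(*pair),
        )
        if average_similarity(c1, c2) < threshold:
            break
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 + c2]
    return sorted(tuple(sorted(c)) for c in clusters)


# Hypothetical band profiles for four genotypes scored at six bands.
profiles = {
    "A": [1, 1, 0, 0, 1, 0],
    "B": [1, 1, 0, 0, 0, 0],
    "C": [0, 0, 1, 1, 0, 1],
    "D": [0, 0, 1, 1, 0, 0],
}
print(upgma_clusters(profiles, threshold=0.22))  # [('A', 'B'), ('C', 'D')]
```

With the toy data, A and B share two of three bands (similarity 0.67), as do C and D, while the two pairs share no bands, so a cutoff of 0.22 yields two clusters.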
Supporting Information
S1 Table. The tested white birch materials for SSR analysis. The information of tested white birch materials for SSR analysis. (DOCX)
S2 Table. Information of 544 SSR primer pairs. The details of 544 SSR primer pairs included probe accessions, primer ID, repeat motif, forward/reverse primer sequence, annealing temperature (Tm), expected product size, observed fragment sizes in genotypes 1-5, number of alleles obtained. (DOCX)

Author Contributions
Conceived and designed the experiments: TJ BZ. Performed the experiments: WH SW TJ HL. Analyzed the data: WH TJ. Contributed reagents/materials/analysis tools: WH TJ. Wrote the paper: WH TJ XW.