Genetic diversity of Xanthoceras sorbifolium Bunge germplasm using morphological traits and microsatellite molecular markers

Xanthoceras sorbifolium Bunge has great potential for producing biodiesel. To select and evaluate appropriate germplasm for biodiesel production, we analyzed the genetic diversity of Xanthoceras sorbifolium Bunge germplasm based on morphological traits and simple sequence repeats (SSRs). Fifty-six germplasm samples were evaluated using nine morphological traits and 23 SSR loci. Significant differences among germplasms were observed in eight morphological characters. The SSR marker analysis showed high genetic diversity among the germplasms. All SSRs were polymorphic, and we detected 77 alleles in total. The number of alleles at each locus ranged from two to six, averaging 3.35 per marker. The polymorphic information content values ranged from 0.36 to 0.61, averaging 0.49. Expected heterozygosity, observed heterozygosity, and Shannon’s information index calculations detected large genetic variations among germplasms. The high average number of alleles per locus and the allelic diversity observed in the set of genotypes analyzed indicated that the genetic base of this species is relatively wide. Thus, microsatellite markers can be used to efficiently distinguish Xanthoceras sorbifolium Bunge germplasms and assess their genetic diversity. Hundred-grain weight and lateral diameter were positively correlated with monounsaturated fatty acids and depended on genotype. These results suggest that seeds with higher hundred-grain weight and lateral diameter could be more suitable for producing biodiesel. Our data will lay a foundation for selecting appropriate germplasm to produce biodiesel based on seed phenotype and will contribute to the conservation and management of this important plant genetic resource.


Introduction
Xanthoceras sorbifolium Bunge (Family Sapindaceae) is a small tree that produces edible fruit and seeds with high oil content. The plants are long lived (up to 1000 years) and tolerant of drought, low temperature, alkaline soils, and low fertility. Because of its high monounsaturated fatty acid content, the oil from this plant has considerable potential for producing biodiesel [1,2]. However, due to rapid expansion in production, planting of unknown cultivars and use of low-quality planting material have often occurred. In addition, the number of similar or
The fatty acid methyl ester (FAME) composition of the seed kernel oil was measured with GC-MS using an HP-INNOWAX capillary column (30 m × 0.25 mm × 0.25 μm, model 7890A; Agilent Technologies, Santa Clara, CA, USA). The column temperature was held at 160˚C for 1 min, heated to 250˚C at 4˚C/min, and held constant for 5 min. Nitrogen was used as the carrier gas at a flow rate of 25 mL/min. The injector and detector temperatures were set to 220 and 275˚C, respectively. The hydrogen and air flow rates were set to 30 and 400 mL/min, respectively. FAME content was quantified by comparison with an external standard (37 Component FAME Mix, purity 97.8-99.9%; Supelco, Bellefonte, PA, USA). The fatty acid qualitative analysis was performed using the standard peak retention times of fatty acids and the MS library, and the quantitative analysis was conducted by measuring peak area.
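The column temperature program described above fixes the run time of each injection. As a minimal sketch of that arithmetic (the function name is illustrative and not part of the original methods), the single-ramp program can be totaled as follows:

```python
def oven_program_minutes(start_c, end_c, ramp_c_per_min,
                         initial_hold_min, final_hold_min):
    """Total run time of a GC oven program with one linear ramp."""
    ramp_min = (end_c - start_c) / ramp_c_per_min
    return initial_hold_min + ramp_min + final_hold_min


# Values from the text: 160 C for 1 min, 4 C/min to 250 C, hold 5 min.
total_min = oven_program_minutes(160, 250, 4, 1, 5)
print(f"Run time per injection: {total_min:.1f} min")
```

The ramp from 160˚C to 250˚C at 4˚C/min takes 22.5 min, so each run lasts 28.5 min including both holds.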

DNA extraction
Genomic DNA was extracted from young leaf tissues of each germplasm using the Takara MiniBEST Plant Genomic DNA Extraction Kit (Dalian, China). After 0.8% agarose gel electrophoresis, DNA concentration was quantified using a NanoDrop ND-1000 spectrophotometer (NanoDrop Technologies, Wilmington, DE, USA). The DNA concentration was adjusted to 20 ng/μL, and the quality of the product was evaluated by determining that the 260/280 nm and 260/230 nm absorbance ratios were ≥ 1.8 [15].
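Adjusting each extract to 20 ng/μL is a standard C1V1 = C2V2 dilution. The sketch below illustrates the calculation and the absorbance-ratio check. The 50 μL final volume and the function names are assumptions made for illustration; the text does not specify them.

```python
def dilution_volumes(stock_ng_per_ul, target_ng_per_ul=20.0, final_ul=50.0):
    """Return (stock_ul, diluent_ul) needed to reach the target concentration.

    final_ul is a hypothetical working volume, not a value from the paper.
    """
    if stock_ng_per_ul < target_ng_per_ul:
        raise ValueError("stock is already more dilute than the target")
    stock_ul = target_ng_per_ul * final_ul / stock_ng_per_ul  # C1*V1 = C2*V2
    return stock_ul, final_ul - stock_ul


def passes_purity(a260_280, a260_230, threshold=1.8):
    """True when both absorbance ratios meet the quality threshold."""
    return a260_280 >= threshold and a260_230 >= threshold


stock_ul, diluent_ul = dilution_volumes(125.0)
print(f"Mix {stock_ul:.1f} uL stock with {diluent_ul:.1f} uL diluent")
print("Purity OK:", passes_purity(1.85, 2.05))
```

For a 125 ng/μL extract, this gives 8 μL of stock plus 42 μL of diluent.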

SSR markers and polymerase chain reaction (PCR) amplification
Thirty-eight highly polymorphic genomic SSR markers developed by Bi and Guan (2014) [14] were used to assess the genetic diversity in 56 samples. PCR amplifications of all primers were performed in a total volume of 10 μL containing 20 ng DNA, 0.2 μM forward primer, 0.2 μM reverse primer, and 5 μL RR901A mix (Takara Bio). PCR amplifications were performed in a thermal cycler (T100; Bio-Rad, Hercules, CA, USA) using the following sequence: initial step at 94˚C for 3 min, followed by 30 cycles of denaturation at 94˚C for 30 s, annealing at 55˚C for 30 s, and extension at 72˚C for 1 min. The final 10-min extension was performed at 72˚C. The PCR products were checked by 1.5% agarose gel electrophoresis. Subsequently, the products of primers that yielded bands were resolved by non-denaturing polyacrylamide gel electrophoresis and visualized by silver nitrate staining to check the DNA banding patterns. Polymorphic bands were used for the identification step. Reproducibility of the PCR procedures was confirmed by repeating the process three times.

Data analysis
Seed phenotype, kernel oil contents, and FAME composition were examined by analysis of variance (ANOVA). The distance matrices were based on the Gower general similarity coefficient [16]. Cluster analyses were performed using the unweighted pair group method with arithmetic mean (UPGMA) procedure and NTSYS-pc 2.11 software (Exeter Software, Setauket, NY, USA).
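To make the clustering step concrete, the following is a minimal sketch of a Gower-type distance for quantitative traits followed by UPGMA (average linkage). It is not the NTSYS-pc implementation. The trait values are toy numbers, the function names are illustrative, and the range-scaled form of the Gower coefficient is an assumption about how the traits were standardized.

```python
def gower_distance(rows):
    """Mean range-scaled absolute difference across quantitative traits."""
    n, n_traits = len(rows), len(rows[0])
    ranges = []
    for k in range(n_traits):
        col = [r[k] for r in rows]
        ranges.append((max(col) - min(col)) or 1.0)  # guard constant traits
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = sum(abs(rows[i][k] - rows[j][k]) / ranges[k]
                    for k in range(n_traits)) / n_traits
            dist[i][j] = dist[j][i] = d
    return dist


def upgma(dist):
    """Return the merge history as (members_a, members_b, height) tuples."""
    clusters = {i: (i,) for i in range(len(dist))}

    def avg(a, b):
        # UPGMA: unweighted mean of all pairwise distances between members.
        return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))

    merges = []
    while len(clusters) > 1:
        keys = list(clusters)
        height, p, q = min((avg(clusters[p], clusters[q]), p, q)
                           for idx, p in enumerate(keys)
                           for q in keys[idx + 1:])
        merges.append((clusters[p], clusters[q], height))
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]
    return merges


# Toy data: (100-grain weight, lateral diameter) for four hypothetical samples.
traits = [(10.0, 1.0), (11.0, 1.0), (20.0, 2.0), (18.0, 2.0)]
merges = upgma(gower_distance(traits))
for a, b, h in merges:
    print(a, b, round(h, 3))
```

The two light-seeded samples join first, then the two heavy-seeded ones, and the two pairs merge last at the largest height, which is the pattern a dendrogram displays.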
The polymorphic bands in the SSR marker analysis were scored as either present (1) or absent (0). The alleles were coded alphabetically (e.g., A, B, or C for three bands) in order of decreasing size. The number of alleles per locus (Na), the number of effective alleles per locus (Ne), the Shannon index, observed heterozygosity (Ho), and expected heterozygosity (He) were calculated using POPGENE 32 software [17]. Polymorphism information content (PIC) was estimated using Power Stats V12.xls software [18]. A matrix of genetic distances [19] was constructed for the 56 germplasms. A dendrogram cluster analysis was performed with NTSYS-pc 2.11 software using the UPGMA procedure [20]. The Mantel test was performed to examine the relationships between morphological characters and genetic distance among the 56 germplasms.
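The per-locus statistics named above follow standard textbook definitions, sketched below for a single locus with diploid genotypes. This is not the POPGENE or PowerStats code, and those programs may apply sample-size corrections that are omitted here. The genotype list is a toy example and the function name is illustrative.

```python
import math
from collections import Counter
from itertools import combinations


def locus_diversity(genotypes):
    """Diversity statistics for one locus from (allele, allele) genotypes."""
    counts = Counter(allele for g in genotypes for allele in g)
    total = sum(counts.values())
    freqs = [c / total for c in counts.values()]
    homozygosity = sum(p * p for p in freqs)
    # PIC = 1 - sum(p_i^2) - sum_{i<j} 2 * p_i^2 * p_j^2
    pic = 1 - homozygosity - sum(2 * p * p * q * q
                                 for p, q in combinations(freqs, 2))
    return {
        "Na": len(freqs),                      # observed alleles
        "Ne": 1 / homozygosity,                # effective alleles
        "He": 1 - homozygosity,                # expected heterozygosity
        "Ho": sum(a != b for a, b in genotypes) / len(genotypes),
        "I": -sum(p * math.log(p) for p in freqs),  # Shannon index
        "PIC": pic,
    }


# Toy locus: alleles coded alphabetically, as in the text.
stats = locus_diversity([("A", "A"), ("A", "B"), ("B", "B"), ("A", "B")])
for name, value in stats.items():
    print(name, round(value, 4))
```

With two equally frequent alleles, the sketch returns Na = 2, Ne = 2, He = Ho = 0.5, and PIC = 0.375.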
The nine characters were used to evaluate genetic distances among the germplasms and to construct a dendrogram (Fig 1). The 56 germplasms were classified into two main groups according to the nine characters. The second group contained only the no. 11 germplasm, which had the lowest 100-grain weight, longitudinal diameter, and lateral diameter. The first group was divided into four subgroups at a coefficient of 16.82. The first subgroup comprised 15 germplasms, which had higher 100-grain weight, kernel oil content, and monounsaturated fatty acids. The second subgroup included only the no. 56 germplasm, which had the highest 100-grain weight, transverse diameter, longitudinal diameter, and lateral diameter, as well as higher kernel percentage, kernel oil content, and monounsaturated fatty acids. The third subgroup contained 20 germplasms and average levels of these characters. The last subgroup contained the remaining 19 germplasms, which had lower 100-grain weight, kernel oil content, and monounsaturated fatty acids.

Relationship between seed phenotype and oil characteristics
A partial correlation analysis was conducted to study the relationship between seed phenotype and oil characteristics (Table 2). The results showed that 100-grain weight was positively correlated with saturated fatty acids (r = 0.317*) and monounsaturated fatty acids (r = 0.348

SSR characterization
Considerable variation was observed in the amplified fragment patterns using different primers. Of the 38 primers tested, 15 yielded no amplification products; these are not included in our report. The remaining 23 SSR markers (Table 3) were used for the characterization and genetic diversity analyses of the 56 Xanthoceras sorbifolium Bunge germplasms (Table 4). The electropherograms are shown in S1 Folder. Seventy-seven alleles were detected in total. The number of alleles (Na) ranged from two (QXH002) to six (QXH274), averaging 3.35 alleles/locus across the 23 loci. All loci were polymorphic. Polymorphism information content (PIC) values ranged from 0.36 (QXH002 and QXH197) to 0.61 (QXH323), averaging 0.49. The mean expected heterozygosity (He) value was 0.58; values ranged from 0.45 in QBRS192 to 0.68 in QXH323. Observed heterozygosity (Ho) ranged from 0.21 in QXRB116 to 0.96 in QXH274, averaging 0.74. Wright's fixation index (Fst) compares He and Ho, and is a

Cluster analyses
Because 100-grain weight and lateral diameter were positively correlated with monounsaturated fatty acids, we examined whether these two characters were genetically determined. To this end, a dendrogram cluster analysis and a Mantel test were performed. The 56 Xanthoceras sorbifolium Bunge germplasms clustered into two main groups based on 100-grain weight and lateral diameter. The second group contained only the no. 11 germplasm, which had the lowest 100-grain weight and lateral diameter. At a coefficient of 10.11, the first group was divided into four subgroups. The first subgroup contained 13 germplasms, which had higher 100-grain weight and lateral diameter values. The second subgroup included only the no. 56 germplasm, which had the highest 100-grain weight and lateral diameter values. The third subgroup comprised 23 germplasms, which had average 100-grain weight and lateral diameter values. The last subgroup contained the remaining 18 germplasms, which had lower 100-grain weight and lateral diameter values (Fig 2).
Nei's genetic distances were calculated to explore the genetic relationships among the 56 Xanthoceras sorbifolium Bunge germplasms. The genetic distance matrix was subjected to a UPGMA cluster analysis (Fig 3). The 56 germplasms were classified into two main groups, in which the second group contained only the no. 11 germplasm. At a coefficient of 0.50, the first group was divided into four subgroups. The first subgroup contained 10 germplasms. The second subgroup comprised only the no. 56 germplasm. The third subgroup comprised 26 germplasms. The last subgroup contained the remaining 18 germplasms.
The Mantel test results showed that the genetic and phenotypic distances of the 56 germplasms were significantly positively correlated (r = 0.92, P < 0.01) (Fig 4).
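For readers unfamiliar with the Mantel test, the sketch below shows the general idea. The off-diagonal entries of two distance matrices are correlated, and significance is estimated by jointly permuting the rows and columns of one matrix. This is a one-sided permutation version written for illustration, not the software used in the study. The matrices are toy data and all names are hypothetical.

```python
import math
import random


def upper(matrix, order):
    """Upper-triangle entries after relabeling rows/columns by `order`."""
    n = len(order)
    return [matrix[order[i]][order[j]]
            for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mantel(m1, m2, permutations=999, seed=0):
    """Return (r, p) with a one-sided permutation p-value."""
    rng = random.Random(seed)
    ids = list(range(len(m1)))
    x = upper(m1, ids)
    r_obs = pearson(x, upper(m2, ids))
    hits = 0
    for _ in range(permutations):
        perm = ids[:]
        rng.shuffle(perm)  # permute rows and columns of m2 together
        if pearson(x, upper(m2, perm)) >= r_obs:
            hits += 1
    return r_obs, (hits + 1) / (permutations + 1)


# Toy example: the "genetic" distances are exactly twice the "phenotypic" ones.
pos = [0, 1, 2, 5, 9, 14]
phen = [[abs(a - b) for b in pos] for a in pos]
gen = [[2 * v for v in row] for row in phen]
r, p = mantel(phen, gen)
print(f"r = {r:.3f}, p = {p:.3f}")
```

Permuting whole rows and columns, rather than individual cells, preserves the dependence among distances that share a sample, which is why the Mantel test is preferred over an ordinary correlation test here.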

Morphological variations
Variations in phenotypic traits are based on the variation and interactions at the genotypic level of the plant as well as the environmental pressure on the plant [21]. Although the 56 germplasms were grown in the same environment, they differed significantly in eight of the selected characters. This result illustrates that wild germplasm carries an important degree of genetic variation, which is vital for improving modern cultivars whose genetic backgrounds have been narrowed by domestication and breeding [22]. It was demonstrated here that seed quality includes important morphological traits, such as 100-grain weight, oil content, and monounsaturated fatty acids, among others, which can be used to improve the quality of biodiesel as reported previously [23]. Our results show that seeds with higher 100-grain weight and lateral diameter values had higher monounsaturated fatty acid contents.

SSR markers
Data on the relationships between genotypes help solve problems in breeding programs and germplasm resource management [24]. Many types of molecular markers, particularly SSR markers, have been used successfully to assess genetic diversity and characterize crop resources [25][26][27]. Molecular techniques based on DNA markers, such as RAPD and ISSR, have been used to characterize genetic diversity in Xanthoceras sorbifolium Bunge [11][12][13], but SSR markers have not yet come into general use in studies of this species. An SSR analysis was used to investigate genetic diversity in 56 Xanthoceras sorbifolium Bunge germplasms, and 23 polymorphic SSR markers were highly informative. The proportion of polymorphic loci obtained in this study (100.00%) exceeded the proportions in previous RAPD and ISSR studies [11][12][13]. The mean number of alleles per locus was 3.35, and the average PIC value was 0.49. As demonstrated previously, the SSR assay approach is appropriate for studies of genetic relationships [10,28] and has proven to be an efficient tool for assessing genetic diversity of Xanthoceras sorbifolium Bunge and identifying its germplasm.
Table 2. Partial correlation analysis between seed phenotype and oil characteristics of the 56 Xanthoceras sorbifolium Bunge germplasms.

Germplasm selection for producing biodiesel
Our Mantel test analysis detected a significantly positive correlation between phenotypic and genetic distances among the 56 germplasms (Fig 4), suggesting that these selected characters (100-grain weight and lateral diameter) depended on genotype. Because the two characters were positively correlated with monounsaturated fatty acids (Table 2), it was hypothesized that seeds with higher 100-grain weight and lateral diameter values would be more suitable for producing biodiesel. Among the 56 germplasms, no. 56 had the highest 100-grain weight and lateral diameter values. Thus, it was inferred that no. 56 would be the best germplasm to produce biodiesel.
Our results help clarify the relationships between germplasm characters and genotype and will support the improvement of Xanthoceras sorbifolium Bunge germplasm to achieve higher production of higher quality biodiesel. Our data will lay the foundation for selecting excellent germplasm to produce biodiesel based on seed phenotype, regardless of the environment.

Conclusion
Our data showed significant variations in the morphological traits and microsatellite DNA polymorphisms among 56 Xanthoceras sorbifolium Bunge germplasms. The large average number of alleles per locus and allelic diversity in the set of genotypes analyzed indicate that the genetic spectrum was relatively wide. Our results show that SSR markers are a useful tool to explore Xanthoceras sorbifolium Bunge diversity. Hundred-grain weight and lateral diameter were positively correlated with monounsaturated fatty acids, and were dependent on genotype. These results suggest that seeds with higher 100-grain weight and lateral diameter values could be more suitable to produce biodiesel.
Supporting information S1