Genetic diversity and population structure analysis in cultivated soybean (Glycine max [L.] Merr.) using SSR and EST-SSR markers

  • Reena Rani,

    Roles Conceptualization, Data curation, Investigation, Methodology, Writing – original draft

    Affiliations National Institute for Biotechnology and Genetic Engineering (NIBGE), Faisalabad, Pakistan, Constituent College Pakistan Institute of Engineering and Applied Sciences, Faisalabad, Pakistan

  • Ghulam Raza,

    Roles Conceptualization, Data curation, Investigation, Project administration, Resources

    Affiliations National Institute for Biotechnology and Genetic Engineering (NIBGE), Faisalabad, Pakistan, Constituent College Pakistan Institute of Engineering and Applied Sciences, Faisalabad, Pakistan

  • Muhammad Haseeb Tung,

    Roles Data curation, Formal analysis, Methodology

    Affiliations National Institute for Biotechnology and Genetic Engineering (NIBGE), Faisalabad, Pakistan, Constituent College Pakistan Institute of Engineering and Applied Sciences, Faisalabad, Pakistan

  • Muhammad Rizwan,

    Roles Formal analysis, Investigation, Resources, Writing – review & editing

    Affiliation Plant Breeding and Genetics Division, Nuclear Institute of Agriculture (NIA), Tandojam, Pakistan

  • Hamza Ashfaq,

    Roles Formal analysis, Investigation, Methodology

    Affiliations National Institute for Biotechnology and Genetic Engineering (NIBGE), Faisalabad, Pakistan, Constituent College Pakistan Institute of Engineering and Applied Sciences, Faisalabad, Pakistan

  • Hussein Shimelis,

    Roles Funding acquisition, Resources, Validation, Writing – review & editing (HS); (MA)

    Affiliation School of Agricultural, Earth and Environmental Sciences, African Centre for Crop Improvement, University of KwaZulu-Natal, Pietermaritzburg, South Africa

  • Muhammad Khuram Razzaq,

    Roles Investigation, Methodology, Software

    Affiliation Soybean Research Institute, National Center for Soybean Improvement, Nanjing Agricultural University, Nanjing, China

  • Muhammad Arif

    Roles Conceptualization, Project administration, Resources, Supervision, Visualization (HS); (MA)

    Affiliations National Institute for Biotechnology and Genetic Engineering (NIBGE), Faisalabad, Pakistan, Constituent College Pakistan Institute of Engineering and Applied Sciences, Faisalabad, Pakistan


Soybean (Glycine max) is an important legume that supplies protein and oil to a large share of the world's population. The USDA germplasm resources hold a large number of soybean accessions. Finding and understanding genetically diverse germplasm is a top priority for crop improvement programs. The current study used 20 functional EST-SSR and 80 SSR markers to characterize 96 soybean accessions from diverse geographic backgrounds. Ninety-six of the 100 markers were polymorphic, with 262 alleles (average 2.79 per locus). The molecular markers had an average polymorphic information content (PIC) value of 0.44, with 28 markers ≥ 0.50. The average major allele frequency was 0.57. The observed heterozygosity of the population ranged from 0 to 0.184 (average 0.02), while the expected heterozygosity ranged from 0.20 to 0.73 (average 0.51). The lower observed than expected heterozygosity suggests the likelihood of population structure in the germplasm. The phylogenetic analysis and principal coordinate analysis (PCoA) divided the total population into two major groups (G1 and G2), with G1 comprising most of the USA lines and the Australian and Brazilian lines. Furthermore, the phylogenetic analysis and PCoA divided the USA lines into three major clusters without any specific differentiation, supported by the model-based STRUCTURE analysis. Analysis of molecular variance (AMOVA) showed that 94% of the variation in the total population occurred among individuals, with 2% among populations. For the USA lines, 93% of the variation occurred among individuals, with only 2% among lines from different US states. Pairwise population distance indicated more similarity between the lines from continental America and Australia (189.371) than Asia (199.518). Overall, the 96 soybean lines had a high degree of genetic diversity.


Soybean is the world’s fourth most widely grown crop. Its high-quality protein (40%) and vegetable oil (20%) [1, 2], compared to other crops, make it highly desirable for human and animal consumption and as a biofuel [3]. In addition, soybean plays a vital role in nitrogen fixation during crop rotation [4]. At present, Brazil leads all other soybean-growing nations in production and productivity. Productivity in other major soybean-growing countries has also increased in the last few decades, whereas Pakistan remains behind, mainly due to stagnant yields. Although soybean is grown on more than 120.48 million hectares worldwide, the area under soybean cultivation in Pakistan is negligible. The agro-ecological conditions of the country are favorable for soybean cultivation, but the crop has not yet attained a suitable position in the current cropping pattern. The country spends about two billion US$ on imports of soybean commodities to meet local requirements. Apart from human food products, soybean meal is the main and preferred source of protein for all types of poultry because of the good quality of its protein and amino acids, and it is frequently used in feed by Pakistan’s poultry industry. Despite these favorable conditions, low genetic diversity has hindered the development of new varieties [5–8]. Several studies based on molecular markers and inbreeding coefficient analysis have revealed genetic uniformity in Brazilian soybean cultivars [9, 10]. This limited genetic diversity in elite soybean germplasm indicates that the genes present in current cultivars derive from a small number of accessions. A more varied genetic background is desirable to protect against unexpected pest and disease outbreaks [11, 12].

For plant breeders, diverse genetic resources increase the chance of developing new and improved cultivars with desired traits [13]. At present, given the large number of genes predicted to be involved in the control of agronomic traits, the main focus in developing modern cultivars is to locate the best alleles linked to these traits. Presumably, during soybean domestication and introduction in producing regions, a large number of advantageous alleles were lost as a result of genetic bottlenecks. The accessions chosen for a breeding programme must contain and transmit advantageous rare alleles that are lacking in elite germplasm. As a result, understanding the origins of these alleles is crucial. Accessions that are very different from elite genotypes are likely to offer novel alleles for the desired trait. The difficult part is choosing which accessions from the available germplasm to use in breeding operations. Therefore, knowledge of the genetic diversity of soybean genotypes would help breeders and geneticists understand the structure of the germplasm, choose parents with greater genetic diversity, and accelerate the expansion of the genetic resources [14]. Morphological characterization, biochemical markers, and molecular marker techniques are frequently used to assess genetic diversity within and among populations [15]. Morphological and biochemical markers are less reliable than DNA markers due to significant environmental effects [16]. Developing DNA markers is important for understanding the genetic diversity between and within different crop species [17, 18], as they reveal variations in the nucleotide sequence between individuals and are unaffected by environmental variables [19].

Molecular markers such as random amplified polymorphic DNA (RAPD), simple sequence repeats (SSR), expressed sequence tag-derived SSRs (EST-SSR), amplified fragment length polymorphism (AFLP), and single nucleotide polymorphism (SNP) have been used to identify genetic diversity in soybean germplasm [20–26]. SSR markers have been the most widely used for characterizing genes, analyzing genetic diversity, and mapping genetic linkages. SSR markers are very useful for genotype differentiation, pedigree analysis, assessing genetic distances among genotypes, and variety identification because they are short tandem repeats dispersed uniformly across the entire genome with high polymorphic information content (PIC) and reproducibility [27–29]. Functional molecular markers, such as those developed from expressed sequence tags (EST), give direct access to the population diversity of genes important for agriculture, making it easier to link genotype to phenotype. SNPs are considered the most important DNA markers because their low levels of recurrent mutation make them stable in evolutionary terms. Therefore, they are the best markers for dissecting the genetic basis of complex characters and for analyzing genomic evolution processes [30]. However, although SNPs can be used to assess genetic diversity in agricultural species, they are less preferred than SSRs for this purpose due to their limited information content, biallelic nature, and high cost [19]. A comparative genetic diversity study on sugar beet cultivars using DArT, SNPs, and SSRs showed that SSR markers had the highest success rate due to their highly polymorphic characteristics [31, 32]. Other studies have shown that SSR markers were extremely effective in estimating genetic diversity and association among soybean accessions [16, 28–32].

Since soybeans are a relatively new crop in Pakistan, local breeding programs focus on creating new, competitive cultivars with excellent production and quality, with limited attention to understanding the extent of diversity in their working germplasm. Therefore, it is important to assess the genetic diversity of the soybean germplasm from USDA to develop new and improved soybean cultivars for Pakistan. Hence, this study analyzed 96 accessions of cultivated soybean from various geographical regions using 80 genomic SSR and 20 functional EST-SSR markers to evaluate the geographic and genetic differences.

Materials and methods

Plant material

Ninety-six soybean accessions were collected from various geographical regions and grouped based on origin (Fig 1 and S1 Table). Of these, 59 accessions came from the USA, followed by China (12), Pakistan (8), Brazil (7), India (4), Australia (2), Afghanistan (2), Iran (1), and Japan (1). All accessions were grown in pots in a controlled environment to collect leaf samples.

Fig 1. World map representing geographic regions of the 96 accessions used in the study.

Created using map chart (


Fresh young leaves were used for genomic DNA extraction following the cetyltrimethylammonium bromide (CTAB) method described by Doyle and Doyle [33]. Eighty genomic SSR markers and 20 functional EST-SSR markers distributed uniformly across the soybean genome were selected from the literature (S2 Table) and used to check genetic diversity among the soybean accessions. A PCR reaction mixture (15 μl) was prepared, comprising 1.5 mM of 10× buffer, 3.5 mM MgCl2, 600 μM dNTPs, 0.6 μM of each forward and reverse primer, and 1 U Taq polymerase with 50–100 ng DNA. The reaction began with initial denaturation at 94°C for 5 min, followed by cycles of denaturation at 95°C for 30 sec, annealing at 48–55°C for 1 min, and extension at 72°C for 1 min, and ended with a final extension at 72°C for 10 min. A gradient thermal cycler (Kyratec Super Cycler) was used to perform the PCR reaction. The PCR products were fractionated by electrophoresis on 2.5% agarose gels containing ethidium bromide for staining bands and visualized using a UV Analyzer, with band sizes judged from their migration distance relative to a GeneRuler 50 bp DNA ladder (Thermo Scientific, 10416014).

Statistical analysis

Genotypic data obtained from the SSR and EST-SSR markers were scored as 1 or 0 based on the presence or absence of a DNA band in the gel (S1 Fig). The expected heterozygosity (He), observed heterozygosity (Ho), genetic distance between accessions (GD), and Shannon's information index (I) were estimated using POPGENE (v.1.32) software [34]. The PIC, gene diversity, and allele frequency of markers were calculated using PowerMarker v.3 [35].
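
The per-locus statistics named above can be illustrated with a minimal Python sketch. It uses the standard textbook definitions (Nei's gene diversity for He, the Botstein formula for PIC, and Shannon's index with natural logarithms); POPGENE and PowerMarker may apply sample-size corrections that are not reproduced here. The function names and the example allele frequencies are hypothetical and are not taken from the study data.

```python
import math


def expected_heterozygosity(freqs):
    """Nei's gene diversity for one locus: He = 1 - sum(p_i^2)."""
    return 1.0 - sum(p * p for p in freqs)


def pic(freqs):
    """Polymorphic information content:
    PIC = 1 - sum(p_i^2) - sum over i<j of 2 * p_i^2 * p_j^2."""
    homozygosity = sum(p * p for p in freqs)
    cross_term = sum(
        2.0 * freqs[i] ** 2 * freqs[j] ** 2
        for i in range(len(freqs))
        for j in range(i + 1, len(freqs))
    )
    return 1.0 - homozygosity - cross_term


def shannon_index(freqs):
    """Shannon's information index: I = -sum(p_i * ln p_i)."""
    return -sum(p * math.log(p) for p in freqs if p > 0)


# Hypothetical allele frequencies for a single three-allele locus
# (illustrative only, not from the study).
example_freqs = [0.6, 0.3, 0.1]
print("He  =", round(expected_heterozygosity(example_freqs), 3))
print("PIC =", round(pic(example_freqs), 3))
print("I   =", round(shannon_index(example_freqs), 3))
```

Because PIC subtracts an extra term from He, it is always slightly lower than He at the same locus, and both increase as allele frequencies become more even.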

Diversity analysis and population structure

Phylogenetic analysis was conducted using genotypic data to evaluate the dissimilarity among accessions using the Unweighted Pair Group Method with Arithmetic Mean (UPGMA) in DARwin software. The PHYLIP file obtained from DARwin was used to construct the phylogenetic tree in MEGA6 software [36]. Principal coordinate analysis (PCoA) was conducted using PAST 4.0 software to identify the degree of differentiation between accessions [37]. The model-based software STRUCTURE v.2.3.4 was used to analyze population structure under the admixture ancestry model with correlated allele frequencies [38]. The burn-in period and the number of Markov chain Monte Carlo (MCMC) iterations were set at 10,000. The online platform STRUCTURE HARVESTER was used to obtain the optimum K value by Evanno’s method [39]. Analysis of molecular variance (AMOVA) was used to assess genotypic variation in the population, with accessions from each country considered a single population. Since each population must contain at least two individuals, the accessions from Japan and Iran were considered one population. AMOVA was performed using GenAlEx 6.5 software [40].
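
As a rough illustration of how an optimum K is picked by Evanno's method, the sketch below computes ΔK from replicate STRUCTURE log-likelihoods. It is a simplified version that takes the second difference of the mean LnP(D) at each K and divides by the standard deviation across replicates; it is not the STRUCTURE HARVESTER code. The function name, the number of replicates, and all likelihood values are hypothetical, since the source does not report them.

```python
from statistics import mean, stdev


def evanno_delta_k(lnp_by_k):
    """Return {K: delta K} from {K: [LnP(D) of each replicate run]}.

    Simplified form: |mean L(K+1) - 2 * mean L(K) + mean L(K-1)| / sd(L(K)).
    """
    means = {k: mean(runs) for k, runs in lnp_by_k.items()}
    delta = {}
    for k in sorted(lnp_by_k):
        if k - 1 not in means or k + 1 not in means:
            continue  # undefined at the smallest and largest K tested
        spread = stdev(lnp_by_k[k])
        if spread == 0:
            continue  # avoid division by zero when replicates are identical
        second_diff = abs(means[k + 1] - 2 * means[k] + means[k - 1])
        delta[k] = second_diff / spread
    return delta


# Hypothetical replicate log-likelihoods (three runs per K), for illustration only.
example_runs = {
    1: [-5000, -5002, -4998],
    2: [-4500, -4501, -4499],
    3: [-4450, -4460, -4440],
    4: [-4430, -4450, -4410],
}
delta_k = evanno_delta_k(example_runs)
best_k = max(delta_k, key=delta_k.get)
print(delta_k, "best K:", best_k)
```

The K with the largest ΔK is taken as the most likely number of clusters; smaller secondary peaks can point to finer substructure.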


Marker informativeness and heterozygosity

Of the 100 markers used in this study, 17 EST-SSRs and 79 SSRs were polymorphic, with 262 amplified alleles, ranging from 2 to 5 alleles per locus (average 2.79). Of the 96 polymorphic markers, the five most polymorphic markers produced five alleles, followed by seven, 41, and 43 markers that produced four, three, and two alleles, respectively (Table 1). Sat_304 had the maximum PIC value (0.72), and GMES6776 had the minimum (0.16), with an average of 0.44 for the 96 polymorphic markers. Twenty-eight markers had PIC values ≥0.50, 64 had PIC values ranging from 0.30–0.49, while four had PIC values <0.30. GMES6776 had the highest major allele frequency (0.89), and Sat_304 had the lowest (0.31), with 54 markers ranging from 0.4–0.6 (average 0.57). The average expected heterozygosity (He) and observed heterozygosity (Ho) were 0.5 and 0.02, respectively. Sat_304 and Satt373 had the maximum heterozygosity of 0.72. The maximum Shannon’s information index was 1.362, while the minimum was 0.356, with an average of 0.794 (Table 1).

Table 1. Genetic diversity parameters for 96 SSR and EST-SSR markers.

Genetic relationship among 96 soybean accessions based on origin

The 96 tested soybean accessions were grouped into eight populations based on origin. As Japan and Iran only had a single accession each, they were grouped in a single population (Pop-8). The mean alleles per locus (Na) and effective alleles (Ne) of all populations were 2.023 and 1.786, respectively, with Pop-1 having the highest values (2.594 and 2.077, respectively). Shannon’s information index ranged from 0.337–0.771 (average 0.569). The observed heterozygosity ranged from 0.005–0.027 (average 0.013) and expected heterozygosity ranged from 0.242–0.492 (average 0.378) (Table 2).

Table 2. Genetic diversity parameters for the eight soybean populations analyzed using SSR and EST-SSR markers.

Genetic distance (Nei’s measure) analysis

The genetic distance among the 96 soybean accessions from nine regions ranged from 0.079–1.232 (S3 Table). PI612157 from Georgia, USA, and PI462312 from India had the greatest genetic distance of 1.232, followed by PI269518C from Pakistan and PI462312 from India with 1.17. These accessions had the highest degree of genetic differentiation based on genetic distance. PI644047 and PI644054, both from Georgia, USA, had the smallest genetic distance of 0.079.

Diversity analysis

The genetic diversity of the 96 soybean accessions was assessed through the following analyses.

Phylogenetic analysis.

The phylogenetic analysis identified two major groups and several subgroups (Fig 2). Group-1 (G1) comprised 53 accessions, most of which were from the USA (37) and Brazil (7), while Group-2 (G2) comprised 43 accessions, including accessions from China, Pakistan, India, and Afghanistan. Furthermore, G2 contained the Pakistani check cv. Faisal, while G1 contained cv. Ajmeri. The phylogenetic analysis grouped the 59 USA accessions into nine groups (G1–G9), indicating a high degree of genetic diversity (Fig 3).

Fig 2. Phylogenetic tree of 96 soybean accessions using the UPGMA method.

Accessions in group 1 (G1) are represented with red color while accessions in group 2 (G2) are represented with green color.

Fig 3. Phylogenetic tree of 59 USA accessions using the UPGMA method.

Principal coordinate analysis.

The PCoA showed that all accessions were distributed across the plot, with 45.97% of the total variation explained by the first six coordinates (Fig 4). Based on their grouping, many USA accessions were similar to Brazilian and Australian accessions, while the accessions from India, China, Pakistan, and Afghanistan were similar to one another, with six Chinese and two Pakistani accessions clustered together. A second PCoA of the USA accessions revealed that the accessions were scattered across the plot without any significant clustering. The first six principal coordinates explained 51.5% of the total variation (Fig 5).

Fig 4. Principal coordinate analysis of 96 soybean accessions using 96 markers to identify variation among accessions based on their country of origin: USA (red), China (brown), Brazil (light green), Pakistan (pink), India (dark green), Iran (black), Afghanistan (blue), Australia (violet), and Japan (gray).

Fig 5. Principal coordinate analysis showing variation between soybean accessions from different states in the USA.

Population STRUCTURE.

Population STRUCTURE was used to 1) identify distinct genetic populations, 2) identify migrants and admixed individuals, and 3) assign individuals to populations [41]. The highest ΔK peak (176.6) occurred at K = 2, indicating that the tested population could be divided into two groups. Two minor peaks, at K = 3 (ΔK = 20.71) and K = 8 (ΔK = 8.38), also occurred (Fig 6A). The accessions with a membership proportion (Q) of 80% or more were considered pure, with the remaining accessions classified as admixture (Fig 6B). At a threshold value of 80%, 42 pure and 54 admixture lines were observed. Of the 42 pure lines, 17 were present in Group-1 (G1 = red), and 25 were present in Group-2 (G2 = green). G1 and G2 contained eight and 19 USA lines, respectively, corresponding to 47% of the pure lines in G1 and 76% of those in G2. At a threshold value of 70%, the number of pure lines increased from 17 to 24 in G1 and 25 to 30 in G2, with the USA lines increasing from 8 to 11 in G1 and 19 to 23 in G2. Three of the 12 Chinese accessions were in G1, with the rest in G2. Five of the eight Pakistani accessions were in G2, with the rest in G1. G1 and G2 each contained two Indian accessions. A similar structure analysis was undertaken for the USA accessions, with an optimum K value of K = 9 obtained (Fig 7A). The results were consistent with the phylogenetic analysis, indicating a high level of differentiation in the USA soybean lines (Fig 7B).
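
The pure versus admixture rule described above can be sketched in a few lines of Python. This is only an illustration of the thresholding step applied to a STRUCTURE Q-matrix; the function name, accession labels, and membership values are hypothetical and do not come from the study.

```python
def classify_by_membership(q_matrix, threshold=0.80):
    """Label each accession by its dominant cluster if Q >= threshold, else 'admixed'.

    q_matrix maps an accession name to its list of membership proportions,
    one per cluster (they should sum to about 1).
    """
    labels = {}
    for accession, q_values in q_matrix.items():
        best = max(range(len(q_values)), key=lambda i: q_values[i])
        if q_values[best] >= threshold:
            labels[accession] = f"G{best + 1}"
        else:
            labels[accession] = "admixed"
    return labels


# Hypothetical membership proportions for K = 2 (illustrative only).
example_q = {
    "acc_A": [0.93, 0.07],
    "acc_B": [0.25, 0.75],
    "acc_C": [0.55, 0.45],
}
labels_80 = classify_by_membership(example_q, threshold=0.80)
labels_70 = classify_by_membership(example_q, threshold=0.70)
print(labels_80)
print(labels_70)
```

Lowering the threshold from 80% to 70% can only move accessions from the admixture class into a group, which is why the counts of pure lines rise in both G1 and G2.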

Fig 6.

(A) Graph of estimated membership fraction for K = 2. (B) Graphical representation of 96 soybean accessions using 96 markers for K = 2.

Fig 7.

(A) Graph of estimated K value for 59 USA soybean accessions with K = 9. (B) Population structure of 59 USA soybean accessions using SSR and EST-SSR markers.


The molecular variance observed among individuals within populations was 94%, while that among populations was only 2% (Table 3). Wright’s F-statistics for the tested markers were Fis (0.961) and Fit (0.962). The 96 polymorphic markers had a mean fixation index (Fst) of 0.025, indicating low genetic differentiation among populations. The rate of gene flow (Nm) was 9.814, indicating a high rate of gene exchange among populations.
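
The relationship between these statistics can be checked with a short Python sketch. It assumes Wright's identity (1 − Fit) = (1 − Fis)(1 − Fst) and the common island-model approximation Nm = (1/Fst − 1)/4; whether the software used here applies exactly this form is an assumption, and the function names are hypothetical.

```python
def fit_from_fis_fst(fis, fst):
    """Wright's identity: (1 - Fit) = (1 - Fis) * (1 - Fst)."""
    return 1.0 - (1.0 - fis) * (1.0 - fst)


def gene_flow_nm(fst):
    """Island-model estimate of migrants per generation: Nm = (1/Fst - 1) / 4."""
    if not 0.0 < fst < 1.0:
        raise ValueError("Fst must lie strictly between 0 and 1")
    return (1.0 / fst - 1.0) / 4.0


# Rounded values reported in the text for the total population.
fis, fst = 0.961, 0.025
print("Fit =", round(fit_from_fis_fst(fis, fst), 3))
print("Nm  =", round(gene_flow_nm(fst), 2))
```

With the rounded Fst of 0.025 the sketch gives an Nm close to, but not exactly, the reported 9.814; the small gap is what one would expect if Nm was computed from an unrounded Fst. A very high Fis alongside a low Fst is the typical pattern for a self-pollinating crop whose populations are weakly differentiated.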

Table 3. AMOVA of 96 soybean accessions using SSR and EST-SSR markers.

The 59 USA accessions were further grouped by state and analyzed by AMOVA. The percentage of variation among individuals within populations was 93%, while that among populations was 2%. Wright’s F-statistics for the tested SSR markers were Fis (0.952) and Fit (0.953). The SSR markers had a mean fixation index (Fst) of 0.024, indicating a very low degree of differentiation among populations. The rate of gene flow (Nm) was 10.36, indicating a very high rate of gene exchange among populations (Table 4). A pairwise population matrix was computed to check the population distance among populations in three continents: America, Asia, and Australia (Table 5). The results indicated greater genetic diversity among the lines from Asia than continental America and Australia.

Table 4. AMOVA of USA accessions from different states using SSR and EST-SSR markers.

Table 5. Pairwise population matrix of genetic distance measured across the genotypes from different continents.


Characterizing germplasm and understanding its genetic diversity are prerequisite steps for developing improved crop cultivars [42]. Plant breeders can broaden the genetic base of locally adapted cultivars by using their knowledge of genetic diversity. Consequently, genetic diversity estimation has become an important method for locating genetically different parents that possess desirable features [42]. Identifying the genetic diversity and genetic structure of the evaluated soybean germplasm using molecular data supports the selection of possible parents based on morpho-biochemical properties, which facilitates long-term breeding and selection operations. To reduce the genetic instability of segregating populations, diverse parents are preferable in soybean crossbreeding [43]. Many studies have been conducted to assess the genetic diversity in legume crops using molecular markers [44, 45]. The polymorphism observed in these molecular analyses was very high, very likely due to the polymorphic nature of SSRs [46, 47]. Thus, molecular markers are a reliable means of distinguishing soybean populations.

In the present study, 100 uniformly distributed genomic SSR and functional EST-SSR markers were used to explore the genetic diversity among 96 soybean accessions. The results of the phylogenetic analyses, PCoA, population STRUCTURE, and AMOVA indicated high genetic variation among the accessions (Figs 2, 4, and 6; Table 2), with a slightly higher average number of alleles per locus (2.79) than an earlier study by Bisen, Khare [48], who reported 2.21 average alleles per locus for 50 SSR markers in 38 soybean accessions. The difference in allele numbers may be due to the different sample sizes, number of markers, and genotypes used [49]. A PIC value of 0.5 or more indicates a highly informative marker [50]. Here, PIC values ranged from 0.18–0.72 (average 0.44), lower than the average 0.47 reported elsewhere [51]. Markers with high PIC values can be used to distinguish soybean accessions. Ho and He indicate the extent of genetic variability in the population [5]. This study had a much lower average Ho (0.019) than average He (0.51), which may be due to the highly self-pollinating nature of soybean [52, 53]. The average Shannon’s index per locus was 0.79, slightly higher than the average of 0.69 per marker reported in soybean by Ullah et al. [54], suggesting that the diversity observed in the present study is slightly higher than in that earlier work.

The study population was divided into two major groups, G1 and G2 (Fig 2), with most USA lines in G1 and those from other countries in G2. Žulj Mihaljević [43] tested 42 SSR markers on 97 European soybean accessions, which likewise separated into two sub-groups based on geographic origin, supporting the present findings. Geographic distances and genetic variation are highly correlated, which is more likely the result of long-term selection and ecological diversity [55]. These groupings were further supported by PCoA and population STRUCTURE (Figs 2 and 4), suggesting that the USA lines are genetically somewhat distinct from lines from other countries and continents. The similar results obtained by STRUCTURE and PCoA indicate that two separate gene pools were the primary source of the two sub-populations [45]. However, some Brazilian lines were also present in G1, which may be due to their close geographical locations or free movement of the germplasm in this region. In addition, most Pakistani lines clustered with the USA lines in G1, possibly due to their similar origins. In an earlier study, Iqbal, Naeem [56] also reported that accessions from USA and Pakistan clustered together. STRUCTURE analysis showed that the number of pure USA lines increased in both groups when the threshold decreased from 80% to 70%, in line with the findings of other studies [51, 54, 56]. AMOVA showed a high percentage of variance among individuals within populations (94%), supporting the high diversity argument. Similarly, Wen et al. [57] reported 2.70% variation among the population and 97.30% within a subpopulation. The UPGMA, PCoA, population structure, and AMOVA of the 59 USA lines identified a high level of genetic diversity among these accessions (Figs 3, 5, and 7B; Table 4) but also showed genetic similarity among some lines. Other studies support the high degree of variation among the USA lines [58, 59].

The local commercial varieties Faisal soybean and Ajmeri were present in different clusters, indicating that these genotypes were introduced from various origins and assessed throughout the selection process before being made available for commercial cultivation [60]. Appiah-Kubi et al. [61] assessed the genetic diversity among soybean genotypes using 20 SSR markers and found a close link between genotypes and their geographic origin, which is consistent with these findings. The weak structure of the germplasm may reflect gene flow between the subpopulations, resulting from the substantial exchange of genetic resources among farming communities [62]. Thus, molecular markers will make it easier for farmers to identify suitable genotypes for crossing, which may lead to the development of new varieties.


Frequent use of closely related cultivars reduces the genetic diversity in germplasm and hinders the breeding of new cultivars with improved traits. The soybean accessions investigated in this study are highly diverse, with a medium to high level of genetic variation across geographic regions. Genetic markers are a useful tool for assessing genetic diversity that is not easy to capture through phenotypic diversity, and they could therefore be valuable for future breeding programs by adding new varieties to the existing gene pool.

Supporting information

S1 Fig.

Gel images of PCR product on 2.5% agarose gel (A) Satt150, (B) Satt173, (C) Satt316, (D) Satt373, (E) Satt565, (F) Satt636, (G) Satt706 and (H) Sct189.


S1 Table. List of germplasm used in the study.


S3 Table. Genetic distance found between 96 accessions investigated in this study.



  1. 1. Pembele Ibanda A, Karungi J, Malinga GM, Adjumati Tanzito G, Ocan D, Badji A, et al. Influence of environment on soybean [Glycine max (L.) Merr.] resistance to groundnut leaf miner, Aproaerema modicella (Deventer) in Uganda. Journal of Plant Breeding and Crop Science. 2018;10(12):336–46.
  2. 2. Kumar SJ, Kumar A, Ramesh K, Singh C, Agarwal DK, Pal G, et al. Wall bound phenolics and total antioxidants in stored seeds of soybean (Glycine max) genotypes. Indian Journal of Agricultural Sciences. 2020;90:118–222.
  3. 3. Hartman GL, Hill CB. 13 Diseases of Soybean and Their Management. The soybean: botany, production and uses: CABI Publishing Cambridge, USA; 2010.
  4. 4. Singh G. The soybean: botany, production and uses: CABI; 2010.
  5. 5. Gupta S, Manjaya J. Genetic diversity and population structure of Indian soybean [Glycine max (L.) Merr.] revealed by simple sequence repeat markers. Journal of Crop Science and Biotechnology. 2017;20(3):221–31.
  6. 6. Liu Z, Li H, Wen Z, Fan X, Li Y, Guan R, et al. Comparison of genetic diversity between Chinese and American soybean (Glycine max (L.)) accessions revealed by high-density SNPs. Frontiers in Plant Science. 2017;8:2014. pmid:29250088
  7. 7. Torres AR, Grunvald AK, Martins TB, Santos MAd, Lemos NG, Silva LAS, et al. Genetic structure and diversity of a soybean germplasm considering biological nitrogen fixation and protein content. Scientia Agricola. 2015;72:47–52.
  8. 8. dos Santos JVM, Valliyodan B, Joshi T, Khan SM, Liu Y, Wang J, et al. Evaluation of genetic variation among Brazilian soybean cultivars through genome resequencing. BMC Genomics. 2016;17(1):1–18.
  9. 9. Mulato BM, Möller M, Zucchi MI, Quecini V, Pinheiro JB. Genetic diversity in soybean germplasm identified by SSR and EST-SSR markers. Pesquisa Agropecuária Brasileira. 2010;45:276–83.
  10. 10. Bonato ALV, Calvo ES, Geraldi IO, Arias CAA. Genetic similarity among soybean (Glycine max (L) Merrill) cultivars released in Brazil using AFLP markers. Genetics Molecular Biology. 2006;29:692–704.
  11. 11. Ssendege G, Obua T, Kawuki R, Maphosa M, Tukamuhabwa T. Soybean genetic diversity and resistance to soybean rust disease in Uganda. Agricultural Journal. 2015;10(3–6):17–23.
  12. 12. Varshney RK, Kudapa H. Legume biology: the basis for crop improvement. Functional Plant Biology. 2013;40(12):v–viii. pmid:32481187
  13. 13. Susmita C, Kumar S, Chintagunta AD, Agarwal DK. Apomixis: a foresight from genetic mechanisms to molecular perspectives. The Botanical Review. 2021:1–37.
  14. 14. Kumar A, Ramesh K, Chandusingh A, Sripathy K, Dinesh K, Pal G, et al. Bio-prospecting nutraceuticals from selected soybean skins and cotyledons. Indian Journal of Agricultural Sciences. 2019;89(12):2064–8.
  15. 15. Singh C, Kumar SJ, KV S. Characterization and identification of rice germplasm accessions using chemical tests. Seed Research. 2017.
  16. 16. Wang L, Guan R, Zhangxiong L, Chang R, Qiu L. Genetic diversity of Chinese cultivated soybean revealed by SSR markers. Crop Science. 2006;46(3):1032–8.
  17. 17. Wang L, Guan Y, Guan R, Li Y, Ma Y, Dong Z, et al. Establishment of Chinese soybean Glycine max core collections with agronomic traits and SSR markers. Euphytica. 2006;151(2):215–23.
  18. 18. Kumar S, Susmita C, Agarwal DK, Pal G, Rai AK, Simal-Gandara J. Assessment of genetic purity in rice using polymorphic SSR markers and its economic analysis with grow-out-test. Food Analytical Methods. 2021;14(5):856–64.
  19. 19. Mondini L, Noorani A, Pagnotta MA. Assessing plant genetic diversity by molecular tools. Diversity. 2009;1(1):19–35.
  20. 20. Chauhan DK, Bhat J, Thakur A, Kumari S, Hussain Z, Satyawathi C. Molecular characterization and genetic diversity assessment in soybean [Glycine max (L.) Merr.] varieties using SSR markers. Physiology and Molecular Biology of Plants. 2015;21:101–7.
  21. 21. Chen W, Hou L, Zhang Z, Pang X, Li Y. Genetic diversity, population structure, and linkage disequilibrium of a core collection of Ziziphus jujuba assessed with genome-wide SNPs developed by genotyping-by-sequencing and SSR markers. Frontiers in Plant Science. 2017;8:575.
  22. 22. Doldi ML, Vollmann J, Lelley T. Genetic diversity in soybean as determined by RAPD and microsatellite analysis. Plant Breeding. 1997;116(4):331–5.
  23. 23. Ren J, Sun D, Chen L, You FM, Wang J, Peng Y, et al. Genetic diversity revealed by single nucleotide polymorphism markers in a worldwide germplasm collection of durum wheat. International Journal of Molecular Sciences. 2013;14(4):7061–88. pmid:23538839
  24. Singh N, Choudhury DR, Singh AK, Kumar S, Srinivasan K, Tyagi R, et al. Comparison of SSR and SNP markers in estimation of genetic diversity and population structure of Indian rice varieties. PLoS ONE. 2013;8(12):e84136. pmid:24367635
  25. Tantasawat P, Trongchuen J, Prajongjai T, Jenweerawat S, Chaowiset W. SSR analysis of soybean (Glycine max (L.) Merr.) genetic relationship and variety identification in Thailand. Australian Journal of Crop Science. 2011;5(3):283–90.
  26. Lee S-H, Ha B-K, Bae J-S, Moon J-K. Evaluation of genetic diversity among soybean genotypes using SSR and SNP. Korean Journal of Crop Science. 2001;46(4):334–40.
  27. Vignal A, Milan D, SanCristobal M, Eggen A. A review on SNP and other types of molecular markers and their use in animal genetics. Genetics Selection Evolution. 2002;34(3):275–305. pmid:12081799
  28. Guan R, Chang R, Li Y, Wang L, Liu Z, Qiu L. Genetic diversity comparison between Chinese and Japanese soybeans (Glycine max (L.) Merr.) revealed by nuclear SSRs. Genetic Resources and Crop Evolution. 2010;57(2):229–42.
  29. Min W, Run-zhi L, Wan-ming Y, Wei-jun D. Assessing the genetic diversity of cultivars and wild soybeans using SSR markers. African Journal of Biotechnology. 2010;9(31):4857–66.
  30. Alemu A, Feyissa T, Letta T, Abeyo B. Genetic diversity and population structure analysis based on the high density SNP markers in Ethiopian durum wheat (Triticum turgidum ssp. durum). BMC Genetics. 2020;21(1):1–12.
  31. Simko I, Eujayl I, van Hintum TJ. Empirical evaluation of DArT, SNP, and SSR marker-systems for genotyping, clustering, and assigning sugar beet hybrid varieties into populations. Plant Science. 2012;184:54–62. pmid:22284710
  32. Zhang C, Peng B, Zhang W, Wang S, Sun H, Dong Y, et al. Application of SSR markers for purity testing of commercial hybrid soybean (Glycine max L.). Journal of Agricultural Science and Technology. 2014;16(6):1389–96.
  33. Doyle JJ, Doyle JL. A rapid DNA isolation procedure for small quantities of fresh leaf tissue. 1987.
  34. Yeh F, Yang R, Boyle T, Ye Z, Xiyan J. PopGene32, Microsoft Windows-based freeware for population genetic analysis, version 1.32. Molecular Biology and Biotechnology Centre, University of Alberta, Edmonton, Alberta, Canada. 2000.
  35. Liu K, Muse S. PowerMarker: an integrated analysis environment for genetic marker analysis. Bioinformatics. 2005;21(9):2128–9. pmid:15705655
  36. Kumar S, Stecher G, Tamura K. MEGA7: molecular evolutionary genetics analysis version 7.0 for bigger datasets. Molecular Biology and Evolution. 2016;33(7):1870–4.
  37. Hammer Ø, Harper DA, Ryan PD. PAST: Paleontological statistics software package for education and data analysis. Palaeontologia Electronica. 2001;4(1):9.
  38. Pritchard JK, Wen W, Falush D. Documentation for STRUCTURE software: Version 2. University of Chicago, Chicago, IL. 2010.
  39. Earl DA, VonHoldt BM. STRUCTURE HARVESTER: a website and program for visualizing STRUCTURE output and implementing the Evanno method. Conservation Genetics Resources. 2012;4(2):359–61.
  40. Peakall R, Smouse P. GenAlEx 6: genetic analysis in Excel. Population genetic software for teaching and research. Molecular Ecology Notes. 2006;6:288–95.
  41. Pritchard JK, Stephens M, Donnelly P. Inference of population structure using multilocus genotype data. Genetics. 2000;155(2):945–59. pmid:10835412
  42. Oliveira M, Sousa L, Reis M, Junior ES, Cardoso D, Hamawaki O, et al. Evaluation of genetic diversity among soybean (Glycine max) genotypes using univariate and multivariate analysis. Genetics and Molecular Research. 2017;16(2). pmid:28613377
  43. Žulj Mihaljević M, Šarčević H, Lovrić A, Andrijanić Z, Sudarić A, Jukić G, et al. Genetic diversity of European commercial soybean [Glycine max (L.) Merr.] germplasm revealed by SSR markers. Genetic Resources and Crop Evolution. 2020;67:1587–600.
  44. Tripathi N, Khare D. Molecular approaches for genetic improvement of seed quality and characterization of genetic diversity in soybean: a critical review. Biotechnology Letters. 2016;38(10):1645–54. pmid:27334709
  45. Hwang T-Y, Gwak BS, Sung J, Kim H-S. Genetic diversity patterns and discrimination of 172 Korean soybean (Glycine max (L.) Merrill) varieties based on SSR analysis. Agriculture. 2020;10(3):77.
  46. Hipparagi Y, Singh R, Choudhury DR, Gupta V. Genetic diversity and population structure analysis of Kala bhat (Glycine max (L.) Merrill) genotypes using SSR markers. Hereditas. 2017;154:1–11.
  47. Dong D, Fu X, Yuan F, Chen P, Zhu S, Li B, et al. Genetic diversity and population structure of vegetable soybean (Glycine max (L.) Merr.) in China as revealed by SSR markers. Genetic Resources and Crop Evolution. 2014;61:173–83.
  48. Bisen A, Khare D, Nair P, Tripathi N. SSR analysis of 38 genotypes of soybean (Glycine max (L.) Merr.) genetic diversity in India. Physiology and Molecular Biology of Plants. 2015;21(1):109–15.
  49. Emanuelli F, Lorenzi S, Grzeskowiak L, Catalano V, Stefanini M, Troggio M, et al. Genetic diversity and population structure assessed by SSR and SNP markers in a large germplasm collection of grape. BMC Plant Biology. 2013;13(1):1–17.
  50. Zhang Z, Deng Y, Tan J, Hu S, Yu J, Xue Q. A genome-wide microsatellite polymorphism database for the indica and japonica rice. DNA Research. 2007;14(1):37–45. pmid:17452422
  51. Kumawat G, Singh G, Gireesh C, Shivakumar M, Arya M, Agarwal DK, et al. Molecular characterization and genetic diversity analysis of soybean (Glycine max (L.) Merr.) germplasm accessions in India. Physiology and Molecular Biology of Plants. 2015;21(1):101–7.
  52. Sharma R, Kumar B, Arora R, Ahlawat S, Mishra A, Tantia M. Genetic diversity estimates point to immediate efforts for conserving the endangered Tibetan sheep of India. Meta Gene. 2016;8:14–20. pmid:27014586
  53. Liu YL, Li YH, Zhou GA, Uzokwe N, Chang RZ, Chen SY, et al. Development of soybean EST-SSR markers and their use to assess genetic diversity in the Subgenus soja. Agricultural Sciences in China. 2010;9(10):1423–9.
  54. Ullah A, Akram Z, Malik SI, Khan KSU. Assessment of phenotypic and molecular diversity in soybean [Glycine max (L.) Merr.] germplasm using morpho-biochemical attributes and SSR markers. Genetic Resources and Crop Evolution. 2021;68:2827–47.
  55. Shi A, Chen P, Zhang B, Hou A. Genetic diversity and association analysis of protein and oil content in food‐grade soybeans from Asia and the United States. Plant Breeding. 2010;129(3):250–6.
  56. Iqbal Z, Naeem R, Ashraf M, Arshad M, Afzal A, Shah AH, et al. Genetic diversity of soybean accessions using seed storage proteins. Pakistan Journal of Botany. 2015;47(1):203–9.
  57. Wen Z, Ding Y, Zhao T, Gai J. Genetic diversity and peculiarity of annual wild soybean (G. soja Sieb. et Zucc.) from various eco-regions in China. Theoretical and Applied Genetics. 2009;119(2):371–81.
  58. Bandillo N, Jarquin D, Song Q, Nelson RL, Cregan P, Specht J, et al. A population structure and genome-wide association analysis on the USDA soybean germplasm collection. Plant Genome. 2015. pmid:33228276
  59. Hyten DL, Song Q, Zhu Y, Choi I-Y, Nelson RL, Costa JM, et al. Impacts of genetic bottlenecks on soybean genome diversity. Proceedings of the National Academy of Sciences. 2006;103(45):16666–71. pmid:17068128
  60. Khurshid H, Baig D, Jan S, Arshad M, Khan M. Miracle crop: the present and future of soybean production in Pakistan. MOJ Biology and Medicine. 2017;2(1):189–91.
  61. Appiah-Kubi D, Asibuo J, Quain M, Oppong A, Akromah R. Diversity studies on soybean accessions from three countries. Biocatalysis and Agricultural Biotechnology. 2014;3(2):198–206.
  62. Zigene ZD, Asfaw BT, Bitima TD. Analysis of genetic diversity in rosemary (Salvia rosemarinus Schleid.) using SSR molecular marker for its management and sustainable use in Ethiopian genebank. Genetic Resources and Crop Evolution. 2021;68:279–93.