Population structure and genetic diversity characterization of soybean for seed longevity

  • Naflath T. V.,

    Roles Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Resources, Validation, Writing – original draft

    Affiliation Department of Seed Science and Technology, College of Agriculture, UAS, GKVK, Bangalore, Karnataka, India

  • Rajendra Prasad S.,

    Roles Conceptualization, Funding acquisition, Writing – review & editing

    Affiliation Department of Seed Science and Technology, College of Agriculture, UAS, GKVK, Bangalore, Karnataka, India

  • Ravikumar R. L.

    Roles Conceptualization, Methodology, Supervision, Writing – review & editing

    rlravikumar@rediffmail.com

    Affiliation Department of Plant Biotechnology, College of Agriculture, UAS, GKVK, Bangalore, Karnataka, India

Abstract

Seed longevity is an important trait for germplasm conservation and the economics of seed production. Populations with a high level of genetic variability for seed longevity and associated traits are a valuable source of superior alleles for seed longevity. In this study, a genotyping-by-sequencing (GBS) single nucleotide polymorphism (SNP) approach, simple sequence repeat (SSR) markers and agro-morphological traits were used to investigate the diversity and population structure of an assembled set of 96 genotypes. GBS performed on the 96 genotypes of soybean (Glycine max (L.) Merrill) yielded 37,897 SNPs on sequences aligned to the reference genome. The average genome coverage was 6.81X, with a mapping rate of 99.56% covering the entire genome. In total, 29,955 high-quality SNPs were retained after stringent filtering, and most of them were located in non-coding regions. The 96 genotypes were phenotyped for eight quantitative and ten qualitative traits in the field using an augmented design. STRUCTURE (a Bayesian model-based algorithm), UPGMA (Unweighted Pair Group Method with Arithmetic mean) and principal component analysis (PCA), applied to SSR, SNP, quantitative and qualitative data, revealed the population structure and diversity of the assembled population. STRUCTURE analysis using SNP markers effectively identified clusters with higher seed longevity associated with seed coat colour and seed size, and these were subsequently validated by UPGMA and PCA based on SSR and agro-morphological traits. The results of STRUCTURE, PCA and UPGMA cluster analysis showed a high degree of similarity and provided complementary data that helped to identify genotypes with higher longevity. Six black-seeded genotypes, viz., Local black soybean, Kalitur, ACC Nos. 39, 109, 101 and 37, showed higher seed longevity during accelerated ageing. The high coefficients of variation observed for plant height, number of pods per plant, seed yield per plant, 100 seed weight and seed longevity confirm the diversity of the assembled population and its suitability for quantitative trait loci (QTL) mapping.

Introduction

Seed longevity is of exceptional importance in the conservation of genetic resources, since the majority of the 7.4 million gene bank accessions of plant species are stored as seeds [1]. Long-lived seed maintains a species without the loss of genetic integrity that can occur during regeneration of plants, and so underpins global agriculture. Longevity in storage varies widely between species, between varieties and even between seed lots of the same variety [2–5]. Soybean (Glycine max (L.) Merrill) is an important crop that provides a sustainable source of vegetable oil and high-quality protein for the world population and for livestock [6]. In 2021, soybean was the largest global source of protein for livestock, and global soybean production has more than doubled since 2000 [7]. More than ninety countries grow soybean [8], and modern soybean varieties have been developed for high yield and wide adaptability but have low seed longevity [9, 10]. Large genetic variation in seed longevity has been reported in soybean [11–13], and black seed coat colour [12, 14–16] and small seed size [17] have been found to be associated with seed longevity. A thin, delicate seed coat and high seed oil content make soybean more susceptible to deterioration under dry storage [18–21]. Soybean with black seed coat colour has been used as a traditional ingredient in medical treatments in many countries and has many health benefits [6, 22, 23]. However, black soybeans, mostly landraces with higher longevity, have declined in popularity and been replaced by improved varieties. Therefore, identifying populations with a high level of genetic variability for seed longevity and associated traits will provide a valuable resource of superior alleles for seed longevity. To understand the importance of genetic diversity for seed longevity in soybean, genotypes differing in seed coat colour and seed size, including germplasm and advanced breeding lines with abundant morphological diversity, were assembled. Understanding the population structure and genetic diversity of this population is important because it paves the way for genome-wide association studies and for functional gene investigation.

Analysing genetic diversity and population genetic structure is important for broadening the base population and allelic diversity in crop improvement. Numerous techniques can be used to determine genetic variation. Among them, morphological and quantitative phenotypic traits have been used extensively to determine the genetic diversity of crop plants [24–26], including soybean [27, 28]. However, these traits are affected by many external factors and are limited in number [29–31], so morphological markers are less effective for understanding genetic diversity and population structure [32, 33]. Hence, molecular markers have become a powerful tool for genetic studies of soybean populations. These include random amplified polymorphic DNA (RAPD) [34–36], simple sequence repeats (SSR) [37, 38], amplified fragment length polymorphism (AFLP) [39, 40], inter simple sequence repeats (ISSR) [35, 41] and single nucleotide polymorphism (SNP) [42–44]. Among these, microsatellite (SSR) and SNP markers are the most commonly used for estimating the genetic diversity and population structure of different species, either singly [45–49] or in combination [50–53]. Both marker systems are distributed throughout the genome, while the frequency and basis of polymorphism differ [54]. The abundance of SNPs relative to SSRs in a genome, together with low sequencing cost and easy genotyping techniques, has made the SNP marker system an overwhelmingly useful method. Genotyping-by-sequencing (GBS) is a next generation sequencing (NGS) technique that sequences a representative fraction of the genome and genotypes the samples from the identified SNPs in a single step [45, 55]. It is widely accepted in genomic studies [56–60] and widely used in studies of population structure and genetic diversity [45, 61, 62].

Therefore, this research was conducted to perform a genetic diversity and population structure analysis from morphological, SSR and SNP markers on a soybean population of 96 genotypes differing in seed coat colour, seed size, growth habit, seed longevity and agro-morphological traits. The black seed coat colour and small seed size land races have been a special attraction for seed longevity. Our findings will facilitate future genome wide association mapping and functional aspects of seed longevity.

Material and methods

Plant material

A set of 96 soybean genotypes comprising 11 black, three green and 82 yellow seed coat colour types (landraces, germplasm and advanced breeding lines) was included in this study (S1 Table). The genotypes were obtained from the All India Co-ordinated Research Project (AICRP) on Soybean, University of Agricultural Sciences (UAS), Bangalore (Indian Council of Agricultural Research, Ministry of Agriculture, India), from seed material held by the first author, and from the AICRP on Soybean, UAS Dharwad. The field activities were conducted in accordance with Indian laws and regulations at the Department of Plant Biotechnology, UAS Bangalore. We confirm that the field studies conducted in the current study did not involve endangered, indigenous or protected species.

Phenotyping under field conditions

The genotypes were phenotyped for eight quantitative and ten qualitative traits by growing them in the field during summer 2021 at the Department of Plant Biotechnology, UAS Bangalore, Karnataka, India. Each of the 96 genotypes was grown in a single 2.0 m row without replication in an augmented block design, with the genotypes randomly assigned to three blocks. Each block contained 32 genotypes and 4 checks (DSB 32, JS 95–60, Hardee and DSB 33) repeated twice. All the agronomic practices were followed to raise a good crop. Five plants were randomly selected from each genotype and the following quantitative traits were recorded. Days to flowering (DFF) was recorded as the number of days from sowing until 50 percent of the plants bloomed. Plant height (PH) in centimetres was measured as the distance from the base of the plant (soil surface) to the tip of the primary branch at maturity. Number of pods per plant (NPP) was counted at maturity, including only pods with at least one developed seed. Five pods per plant were randomly selected to count the number of seeds per pod (NSP), and pod length (PL) in centimetres was measured from the base to the tip of the mature pod. Seed yield per plant (SY) in grams was recorded after sun drying the seeds to a uniform moisture content (9%), and 100 seed weight (100SW, in grams) was recorded by weighing 100 filled seeds. The qualitative traits, viz., hypocotyl anthocyanin pigmentation (HP), growth habit (GH), growth type (GT), flower colour (FC), leaf shape (LS), leaf greenness (LG), stem pubescence (SP), pod pubescence (PP), pod colour (PC) and seed coat colour (SCC), were recorded by visual observation as per the distinctness, uniformity and stability (DUS) guidelines on soybean [63].

Seed longevity (G%) was measured as the germination percentage after accelerated ageing. Freshly harvested seeds, sun dried to nine percent moisture content (measured with a digital moisture meter, Kett Seed & Grain Moisture Tester PM-600), were subjected to accelerated ageing treatment [64] at the National Seed Project, UAS, Bangalore, Karnataka, India. Forty-two grams of seed from each genotype were placed on a wire mesh screen in an ageing box filled with 40 ml distilled water and held at 41±0.3°C and more than 95 percent relative humidity in an ageing chamber for 72 hours. DSB 32, JS 95–60, Hardee and DSB 33 were used as check varieties. As a control, another set of seeds of all the genotypes was placed in separate ageing boxes and kept at ambient conditions without accelerated ageing treatment. After the ageing treatment, one hundred seeds in four replicates from the treated and control boxes were germinated using the between-paper method as per International Seed Testing Association (ISTA) guidelines [65] at a temperature of 30°C and a relative humidity of 90±2 percent for 5 days. The germination count was recorded and expressed as a percentage.

SNP genotyping

Two sets of genomic DNA from each genotype were isolated from fresh leaves of 10-day-old seedlings using the CTAB method [66]. Genotyping by sequencing [55] was outsourced to Nucleome Informatics Pvt. Ltd., Hyderabad, India (https://www.nucleomeinfo.com). Genomic DNA (0.3–0.6 μg) from each genotype was digested using NlaIII_HaeIII and MseI restriction enzymes, chosen on the basis of in silico evaluation results, and the resulting fragments were ligated with two barcoded adapters as well as the P5 and P7 universal sequences, followed by polymerase chain reaction (PCR) enrichment. The library was constructed by pooling the DNA fragments of the desired sizes after gel electrophoresis. High-throughput paired-end sequencing of the library was performed with a read length of 144 base pairs (bp) at each end using the Illumina® HiSeq™ 2500 platform. Base calling was performed using CASAVA v1.8 software. The Illumina sequencing quality (Qphred) was calculated using the equation Qphred = −10 log10(e) [67], where e is the sequencing error rate reported by CASAVA v1.8. The base quality was examined along the full length of the reads to detect positions with an elevated error rate. The error rate tends to increase as a read is extended, because of the consumption of chemical reagents, laser irradiation damage to the DNA and errors accumulated over the sequencing cycles, and it is also high for the first several bases because of reading errors. GC content distribution analysis was done to detect AT or GC separation in the paired-end sequencing data. The raw GBS data were filtered by removing reads with adapter contamination, reads with more than 10 percent uncertain nucleotides and reads with more than 50 percent low-quality nucleotides.
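
The Phred relation and the read-content filters described above can be illustrated with a short Python sketch. This is only an illustration under stated assumptions, not the pipeline used by the sequencing provider: the function names are hypothetical, the low-quality cutoff `low_q` is an assumed value that the text does not specify, and adapter screening is not modelled.

```python
import math
from typing import Sequence


def phred_from_error(e: float) -> float:
    """Phred quality score for a base-call error probability e (0 < e <= 1)."""
    return -10.0 * math.log10(e)


def error_from_phred(q: float) -> float:
    """Base-call error probability implied by Phred score q."""
    return 10.0 ** (-q / 10.0)


def keep_read(seq: str, quals: Sequence[int], low_q: int = 5) -> bool:
    """Keep a read unless it has more than 10% uncertain (N) bases or more
    than 50% low-quality bases. low_q is an ASSUMED cutoff, not from the text."""
    n = len(seq)
    if n == 0:
        return False
    frac_n = seq.upper().count("N") / n
    frac_low = sum(q <= low_q for q in quals) / n
    return frac_n <= 0.10 and frac_low <= 0.50


if __name__ == "__main__":
    for q in (20, 30):
        e = error_from_phred(q)
        print(f"Q{q}: error rate {e:.4f}, accuracy {100 * (1 - e):.2f}%")
```

Under this relation, Q20 corresponds to one error in 100 bases and Q30 to one error in 1000 bases.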

After the quality check and data filtering, the GBS reads were aligned to the reference genome (https://www.ncbi.nlm.nih.gov/assembly/GCF_000004515.6/) using the Burrows-Wheeler Aligner (BWA) [68]. SNP calling was performed using the TASSEL (v. 5) GBS v2 pipeline [69]. A summary of the filtered data was computed using VCFtools [70]. Variant annotation and effect prediction for the detected SNPs were performed using SnpEff v5.1 [71]. SnpEff annotates variants based on their genomic locations and predicts coding effects. Annotated genomic locations include intronic, untranslated, upstream, downstream, splice site and intergenic regions. SNP filtering was performed in TASSEL 5 using the FilterSiteBuilderPlugin, with the filtering criteria set to a site minor allele frequency of at least 0.05, a maximum heterozygous proportion of 0.20 and less than 10.00 percent missing data.
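
To make the three site-level criteria concrete, the following Python sketch applies them to a single biallelic site. It is not the TASSEL implementation: the genotype encoding (alternate-allele counts 0/1/2, `None` for missing), the function names and the choice to compute the heterozygous proportion over called genotypes only are assumptions made for illustration.

```python
from typing import Optional, Sequence, Tuple


def site_stats(genos: Sequence[Optional[int]]) -> Tuple[float, float, float]:
    """Return (minor allele frequency, heterozygous proportion, missing fraction)
    for one biallelic site. Genotypes are alternate-allele counts (0, 1, 2);
    None marks a missing call."""
    called = [g for g in genos if g is not None]
    missing = 1.0 - len(called) / len(genos)
    if not called:
        return 0.0, 0.0, missing
    alt_freq = sum(called) / (2 * len(called))
    maf = min(alt_freq, 1.0 - alt_freq)
    het = sum(g == 1 for g in called) / len(called)
    return maf, het, missing


def passes_filter(genos: Sequence[Optional[int]],
                  min_maf: float = 0.05,
                  max_het: float = 0.20,
                  max_missing: float = 0.10) -> bool:
    """Thresholds follow the text: MAF >= 0.05, heterozygous proportion <= 0.20,
    missing data < 10 percent."""
    maf, het, missing = site_stats(genos)
    return maf >= min_maf and het <= max_het and missing < max_missing


if __name__ == "__main__":
    # Invented site across 96 samples: mostly reference homozygotes
    site = [0] * 80 + [2] * 10 + [1] * 2 + [None] * 4
    print(site_stats(site), passes_filter(site))
```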

The second set of DNA isolated from the 96 genotypes was used for SSR genotyping with fifteen microsatellite markers; the primer details and standardized annealing temperatures are given in S2 Table. The reaction volume was 10 μl and 38 reaction cycles were run. The PCR products were separated on 3.5 percent agarose gel, stained with ethidium bromide and visualized in a gel documentation system. Markers were scored manually as a 0/1 binary matrix, with the presence of an allele denoted as ‘1’ and its absence as ‘0’, and the sizes of the amplified fragments were recorded by comparison with a 1000 bp DNA ladder.

Statistical data analysis

Morphological diversity, principal component analysis and clustering.

The analysis of variance (ANOVA) for the quantitative observations was done using an augmented randomized complete block design. The angular transformed values of germination percentage were analysed using a completely randomized design (CRD) in SPSS software [72]. The mean, range and percentage coefficient of variation (CV) of the traits were computed in Microsoft Excel. The violin plots of the quantitative traits were drawn in PAST04 software [73]. The frequency distributions of genotypes for the qualitative observations were plotted in Microsoft Excel. The principal component analysis (PCA) and clustering of eight quantitative (DFF, PH, NPP, NSP, PL, SY, 100SW and G%) and ten qualitative traits (HP, GH, GT, FC, LS, LG, SP, PP, PC and SCC) were carried out in PAST04 software. A scatter plot was developed to visually assess the dissimilarity between genotypes from their positions in the plot, which depend on seed longevity and the other plant morphological traits. UPGMA (Unweighted Pair Group Method with Arithmetic mean) clustering of the genotypes was done using the Euclidean distance-similarity index, taking into account all the phenotypic traits including seed longevity.
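
Two of the computations named above, the angular (arcsine square-root) transformation of germination percentage and the coefficient of variation, are simple enough to sketch in Python. The function names are illustrative, the example values are invented, and the use of the sample standard deviation for the CV is an assumption, since the text does not say which form was used in Excel.

```python
import math
import statistics
from typing import Sequence


def angular_transform(percent: float) -> float:
    """Arcsine square-root transform of a percentage, returned in degrees."""
    return math.degrees(math.asin(math.sqrt(percent / 100.0)))


def coefficient_of_variation(values: Sequence[float]) -> float:
    """CV (%) = standard deviation / mean * 100.
    Uses the sample standard deviation (an assumption)."""
    return statistics.stdev(values) / statistics.mean(values) * 100.0


if __name__ == "__main__":
    germination = [17.0, 35.5, 53.7, 57.7]  # invented G% values
    print([round(angular_transform(g), 2) for g in germination])
    print(round(coefficient_of_variation(germination), 2))
```

The transform stabilises the variance of percentage data before ANOVA; 50 percent maps to 45 degrees and 100 percent to 90 degrees.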

Population structure and clustering using SNP.

The population structure was analyzed in STRUCTURE v.2.3.4 [74] for an assumed number of sub-populations (K) ranging from 1 to 10. Ten independent runs were performed for each K value, each with a burn-in period of 50,000 iterations followed by 50,000 Markov chain Monte Carlo (MCMC) replications. The optimum K value was determined in Structure Harvester v 0.6.94 [75], which creates a plot of mean likelihood values per K. The actual number of sub-populations was selected based on the maximum value of delta K [74, 76].
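
The delta K criterion can be sketched as follows. This is a simplified illustration and not the Structure Harvester code: it takes the second-order difference of the mean log-likelihoods across replicate runs and divides by the standard deviation at each K. The function name and the log-likelihood values are invented for the example.

```python
import statistics
from typing import Dict, List


def delta_k(lnp_runs: Dict[int, List[float]]) -> Dict[int, float]:
    """lnp_runs maps K to the ln P(D) values of its replicate runs.
    Returns delta K for every K that has both neighbours K-1 and K+1."""
    mean = {k: statistics.mean(v) for k, v in lnp_runs.items()}
    sd = {k: statistics.stdev(v) for k, v in lnp_runs.items()}
    result = {}
    for k in sorted(lnp_runs):
        if k - 1 in mean and k + 1 in mean and sd[k] > 0:
            second_diff = abs(mean[k + 1] - 2 * mean[k] + mean[k - 1])
            result[k] = second_diff / sd[k]
    return result


if __name__ == "__main__":
    # Invented log-likelihoods, two runs per K for brevity
    runs = {
        1: [-1000.0, -1002.0],
        2: [-800.0, -802.0],
        3: [-790.0, -792.0],
        4: [-785.0, -787.0],
    }
    dk = delta_k(runs)
    best_k = max(dk, key=dk.get)
    print(dk, best_k)
```

Delta K is undefined for the smallest and largest K tested, so with K from 1 to 10 it can only be evaluated for K = 2 to 9.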

The distance-based population structure analysis was performed using SNP markers in TASSEL (v5) software. Hierarchical clustering was carried out using UPGMA by taking pair wise dissimilarity matrix between the genotypes [77]. Nei coefficient was used to estimate the genetic distance between the genotypes [78] with 1000 bootstraps.
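
For readers unfamiliar with UPGMA, the agglomeration step can be sketched in plain Python. This is a minimal illustration and not the TASSEL implementation: it assumes a precomputed symmetric distance matrix (the Nei distance calculation and the bootstrapping are not shown), and the function name and toy matrix are hypothetical.

```python
from itertools import combinations
from typing import List, Sequence, Tuple


def upgma(labels: Sequence[str], d: Sequence[Sequence[float]]
          ) -> List[Tuple[tuple, tuple, float]]:
    """Cluster by repeatedly merging the closest pair of clusters.
    Returns the merges in order as (members_a, members_b, distance)."""
    clusters = {i: (labels[i],) for i in range(len(labels))}
    dist = {frozenset((i, j)): d[i][j]
            for i, j in combinations(range(len(labels)), 2)}
    merges = []
    next_id = len(labels)
    while len(clusters) > 1:
        pair = min(dist, key=dist.get)
        a, b = sorted(pair)
        height = dist.pop(pair)
        members_a, members_b = clusters.pop(a), clusters.pop(b)
        na, nb = len(members_a), len(members_b)
        for c in clusters:
            da = dist.pop(frozenset((a, c)))
            db = dist.pop(frozenset((b, c)))
            # Size-weighted average equals the arithmetic mean over all member pairs
            dist[frozenset((next_id, c))] = (na * da + nb * db) / (na + nb)
        clusters[next_id] = members_a + members_b
        merges.append((members_a, members_b, height))
        next_id += 1
    return merges


if __name__ == "__main__":
    names = ["A", "B", "C", "D"]
    matrix = [[0, 2, 6, 10],
              [2, 0, 6, 10],
              [6, 6, 0, 10],
              [10, 10, 10, 0]]
    for step in upgma(names, matrix):
        print(step)
```

Cutting the resulting tree at a chosen distance gives the clusters; the choice of cut-off is left to the analyst.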

Genotype clustering using SSR markers.

The UPGMA based population structure analysis was performed using fifteen SSR markers in DARwin software [79] by taking the pairwise dissimilarity index of SSR markers between the genotypes.

Results

SNP genotyping, population structure and clustering

Genotyping by sequencing was performed on 96 diverse soybean genotypes using the Illumina® HiSeq™ 2500 platform. In total, 28.088 Gb of raw data were generated for the 96 samples, and 28.0 Gb of clean data remained after filtering out low-quality data. The raw data production for each sample ranged from 154.12 to 381.48 million reads, indicating a sufficient amount of data. The sequencing quality for Q20 and Q30 reached 94.44% and 85.00%, respectively. The GC content ranged from 35.25% to 38.13%, fulfilling the quality standard for SNP analysis (S3 Table). For the 989,192,843 bp reference genome, the mapping rate of each sample ranged from 97.77% to 99.69% with an average of 99.56%. The average depth on the reference genome ranged from 4.60X to 8.76X with a mean of 6.81X, while coverage at a depth of at least 1X exceeded 2.87% (S4 Table). In total, 37,897 SNPs were identified during SNP calling using the TASSEL (v. 5) GBS v2 pipeline. The variant rate was one variant in every 25,299 bases. The substitutions observed in the GBS analysis included more transitions (22,584) than transversions (15,273), with a ratio of 1.47 (Fig 1). In total, 39 insertions and deletions were identified. The SnpEff classification of the identified SNPs by impact and functional class showed that 97.25% of SNPs had modifier-type impact, in which mutations occur in introns, intergenic, intragenic and other non-coding regions (Table 1). Only 1.67% and 1.02% of variants had low impact (variants that do not change the encoded protein) and moderate impact (variants that change protein function without affecting protein structure), respectively. In total, 38 (0.05%) mutations were detected that cause significant changes (high impact) such as frame shifts, deletion of a large part of a chromosome or of an entire exon, and formation of a stop codon. By functional class, missense variants, which change the encoded amino acid, were the most frequent (59.60%). Silent mutations, which do not cause a significant change in the protein, accounted for 38.94%. Only 1.46% of SNPs had nonsense effects, which result in the formation of stop codons and subsequently produce truncated, incomplete or non-functional proteins. Various kinds of SNPs were detected, including gain and loss of stop codons and start codons (Table 1). SNPs were also detected in splice regions (0.27%), intergenic regions (41.96%) and intragenic regions (1.01%). The possible amino acid changes estimated using SnpEff indicate that the detected SNPs have resulted in numerous substitutions in the amino acid sequences of proteins (Fig 2). The most frequent change was the replacement of Alanine with Valine (30), followed by Arginine with Histidine (20) and Serine with Glycine (20).
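
The transition/transversion classification behind these counts can be sketched in Python. The function names and the example SNP list are invented for illustration and are not taken from the study's data or from SnpEff.

```python
from typing import Iterable, Tuple

PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")


def classify(ref: str, alt: str) -> str:
    """Classify a single-base substitution as a transition or a transversion."""
    ref, alt = ref.upper(), alt.upper()
    pair = {ref, alt}
    if ref == alt or not pair <= (PURINES | PYRIMIDINES):
        raise ValueError(f"not a single-base substitution: {ref}>{alt}")
    # Purine<->purine or pyrimidine<->pyrimidine changes are transitions
    if pair <= PURINES or pair <= PYRIMIDINES:
        return "transition"
    return "transversion"


def ts_tv_ratio(snps: Iterable[Tuple[str, str]]) -> float:
    """Ratio of transitions to transversions; raises ZeroDivisionError
    if the input contains no transversions."""
    counts = {"transition": 0, "transversion": 0}
    for ref, alt in snps:
        counts[classify(ref, alt)] += 1
    return counts["transition"] / counts["transversion"]


if __name__ == "__main__":
    example = [("A", "G"), ("C", "T"), ("G", "A"), ("A", "C"), ("G", "T")]
    print(ts_tv_ratio(example))
```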

Fig 1. Number of transitions and transversion of nucleotide in genotyping by sequencing analysis.

https://doi.org/10.1371/journal.pone.0278631.g001

Fig 2. Amino acid changes of the variants in SnpEff analysis of the GBS derived SNPs.

(Rows are reference amino acids and columns are changed amino acids. For example: 2 Alanine have been replaced by Aspartic acid. The increase in intensity of green colour indicate the increase in number of amino acid changes and the diagonals are indicated in grey background colour). The amino acid codes- A: Alanine; C: Cysteine; D: Aspartic acid; E: Glutamic acid; F: Phenylalanine; G: Glycine; H: Histidine; I: Isoleucine; K: Lysine; L: Leucine; M: Methionine; N: Asparagine; P: Proline; Q: Glutamine; R: Arginine; S: Serine; T: Threonine; V: Valine; W: Tryptophan; Y: Tyrosine.

https://doi.org/10.1371/journal.pone.0278631.g002

Table 1. SnpEff summary of variant effects by impact and functional class of GBS derived SNPs.

https://doi.org/10.1371/journal.pone.0278631.t001

The detected SNPs were then subjected to filtering by removing SNPs with less than 5 percent minor allele frequency, greater than 20 percent heterozygous proportion and greater than 10 percent missing data (Table 2). A total of 29,955 high quality SNP markers were obtained after filtering and further used for population structure analysis. The proportion of heterozygosity and average minor allele frequency of the filtered SNPs were 0.02 and 0.25, respectively. Among 29,955 SNPs identified, 26,769 SNPs were aligned to 20 chromosomes and 3,186 SNPs were found in contigs. Higher number of SNPs were aligned to chromosome-18 (10.33%) followed by chromosome-15 (8.74%) and chromosome-14 (7.46%) and least was in chromosome-10 (2.81%) (S5 Table).

The structure analysis of the 96 soybean genotypes was performed using 29,955 SNPs, testing K values from 1 to 10. The optimum number of clusters was estimated using the ΔK method and the plateau criterion in STRUCTURE. The maximum likelihood based ad hoc method gave the highest value of ΔK at K = 8, indicating that the 96 genotypes can be grouped into eight distinct populations (Fig 3A). The eight populations derived from the structure analysis consisted of 11, 18, 29, 3, 11, 10, 8 and 6 genotypes, respectively (Fig 3B). The genotypes were placed in sub-populations based on their membership fractions, i.e., inferred ancestry. Sub-population C7 consisted of seven black-seeded genotypes along with one yellow-seeded genotype, which had 31% and 28% membership in C5 and C2, respectively. Another black genotype, ACC No. 369, had 49% membership in C7 although it was placed in C5 (50%). All the genotypes in the C2, C4 and C8 sub-populations showed admixture from the other sub-populations. In addition, a ΔK peak was also observed at K = 6, which indicates an additional informative level of population structure (S1 Fig; S6 Table).

Fig 3. SNP marker based population structure.

a) Delta K plot at different K-value based on the maximum likelihood method b) Barplot of K = 8 with genotypes in each sub-population depending on their inferred ancestry.

https://doi.org/10.1371/journal.pone.0278631.g003

Remarkable genetic divergence among the sub-populations derived from the population structure analysis was evident from the average genetic distance (expected heterozygosity, He). Highest value of He was detected in C4 and least was noted in C3 (Table 3). The fixation index (Fst) resulted from structure analysis indicate genetic variance of each sub-population to the total population variance. Highest Fst was found in C8 (0.81), followed by C3 (0.80) and least was in C4 (0.01). The sub-population-wise average seed longevity (G%) was highest for C7 (53.74%) followed by C4 (51.30%) with 100SW of 10.86 and 10.10 g, respectively. The PH was highest in C7 (73.08) followed by C4 (57.63) and least PH belonged to C2 (39.15). Seven out of eight genotypes in C7 were black seed coat colour genotypes and had smaller seed size with higher plant height.

Table 3. Fixation index, expected heterozygosity and number of genotypes in eight sub groups of soybean genotypes derived from SNP population structure analysis.

https://doi.org/10.1371/journal.pone.0278631.t003

In this study, UPGMA hierarchical clustering using 29,955 SNP loci was also carried out to analyse the population structure. The UPGMA divided the 96 genotypes to six distinct clusters consisting of 13, 11, 38, 5, 12 and 17 genotypes, respectively (Fig 4). The results are complementary to the STRUCTURE analysis. Out of 13 genotypes in cluster-I, seven were from C7 sub-population of STRUCTURE. Five out of seven black seed coat colour genotypes in this cluster formed a sub-cluster and BNS-5 (S20) (green) formed a solitary cluster. Interestingly, all the 11 genotypes in cluster-II were from C1 sub-population of STRUCTURE. Cluster-III consists of all the genotypes in C3 sub-population along with genotypes from other sub-populations and contain 36 yellow and 2 black (Pune 14 (S13) and Pune 30 (S35)) seed coat colour genotypes. Cluster-IV consists of yellow seed coat colour genotypes from C2 and C8. Nine out of 12 genotypes in cluster-V were the same as C2.

Fig 4. UPGMA based hierarchical clustering of genotypes using SNP markers.

https://doi.org/10.1371/journal.pone.0278631.g004

Molecular clustering using SSR marker.

UPGMA hierarchical clustering of the genotypes was also performed using 15 SSR markers (Fig 5). The majority of the genotypes (50) formed solitary clusters with only one genotype per cluster. The remaining 46 genotypes formed 14 clusters. However, the seven black seed coat colour genotypes formed one cluster in SSR genotyping along with 6 genotypes from the C7 sub-population of STRUCTURE. Otherwise, the clustering based on SSR genotyping did not agree with the clustering based on SNP genotyping.

Fig 5. UPGMA cluster of 96 soybean genotypes using SSR markers.

https://doi.org/10.1371/journal.pone.0278631.g005

Phenotypic variation, principal component analysis and clustering.

The genotypes were evaluated for eight quantitative traits and the analysis of variance suggested significant variation for all the traits including seed longevity (Table 4). The phenotypic variation in all the eight traits (DFF, PH, NPP, NSP, PL, SY, 100SW and G%) among genotypes is depicted by range, mean, standard error and coefficient of variation (Table 5). The violin plot of genotypes for different traits indicated normal distribution for the traits with high density of the kernel density plot towards median, except for PH and DFF (Fig 6A and 6B). In DFF, a high-density peak of the genotypes was observed at lower quartile and a slight peak at upper quartile with a bimodal distribution. The kernel density of genotypes for PH were distributed throughout the inter quartile range without a remarkable peak. The G% also showed normal distribution.

Fig 6. Violin plot of quantitative traits.

a) Plot of Days to flowering (DFF), Plant height (PH), Number of pods per plant (NPP), Seed yield (SY), 100 Seed weight (100SW) and Seed longevity (G%) b) Plot of Pod length (PL) and Number of seeds per pod (NSP).

https://doi.org/10.1371/journal.pone.0278631.g006

Table 4. Mean sum of square of genotypes and error for eight quantitative traits.

https://doi.org/10.1371/journal.pone.0278631.t004

Table 5. Mean, range and coefficient of variation (CV %) of genotypes for eight quantitative traits.

https://doi.org/10.1371/journal.pone.0278631.t005

Variation for ten qualitative traits (HP, GH, GT, FC, LS, LG, SP, PP, PC and SCC) was recorded. The frequency distribution of these traits among the 96 soybean genotypes is presented in Fig 7. Most of the genotypes showed yellow seed colour (85.42%) and absence of hypocotyl pigmentation (70.83%). The majority of the genotypes showed determinate growth type (58.33%), pod pubescence (52.08%), pointed ovate leaf shape (51.04%), dark green leaves (53.13%) and yellow pod colour (53.13%). Flower colour ranged from white (34.38%) to pink (8.33%), with purple the most common (43.75%). For stem pubescence, the largest class was absence of pubescence (44.79%), followed by very strong pubescence (22.92%).

Fig 7. Number of genotypes for each class in different qualitative traits.

(HP: Hypocotyl anthocyanin pigmentation (0: Absent; 1: Present), GH: Growth habit (0: Erect; 1: Semi-erect), SP: Stem pubescence (0: Absent; 1: Weak; 2: Medium; 3: Strong; 4: Very strong), GT: Growth type (0: Determinate; 1: Indeterminate), LS: Leaf shape (0: Lanceolate; 1: Pointed ovate; 2: Round ovate), LG: Leaf greenness (0: Light green; 1: Medium green; 2: Dark green), FC: Flower colour (0: White; 1: Light purple; 2: Purple; 3: Pink), PC: Pod colour (0: Yellow; 1: Brown; 2: Black), PP: Pod pubescence (0: Absent; 1: Present) and SCC: Seed coat colour (0: Yellow; 1: Green; 2: Black)).

https://doi.org/10.1371/journal.pone.0278631.g007

The PCA was performed using eight quantitative and ten qualitative traits to estimate eigenvalues and principal components based on trait loadings, which are used as covariates for the population structure. Seven components had eigenvalues greater than one and together explained 99.595 percent of the variance (Table 6). Principal component 1 (PC 1) and principal component 2 (PC 2) had the highest eigenvalues (Fig 8), with a cumulative variance of 86.68 percent (Table 6). PC 1 was highly correlated with PH (91.20%), followed by NPP (36.30%) and SY (14.73%) (Fig 9(A)), while PC 2 was highly associated with G% (97.83%) and NPP (17.14%) (Fig 9(B)). Hence these traits, i.e., PH, G%, NPP and SY, can be used to determine the dissimilarity between genotypes and to group them. Interestingly, PC 2, which was associated with seed longevity, was negatively associated with 100SW.
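
The percentage and cumulative variance reported for the components follow directly from the eigenvalues. The short Python sketch below shows the arithmetic and the eigenvalue-greater-than-one retention rule; the function name and the eigenvalues are invented and do not reproduce Table 6.

```python
from typing import List, Sequence, Tuple


def explained_variance(eigenvalues: Sequence[float]
                       ) -> Tuple[List[float], List[float]]:
    """Return (percent variance, cumulative percent variance) per component."""
    total = sum(eigenvalues)
    percents = [100.0 * ev / total for ev in eigenvalues]
    cumulative, running = [], 0.0
    for p in percents:
        running += p
        cumulative.append(running)
    return percents, cumulative


if __name__ == "__main__":
    eigs = [6.0, 3.0, 1.0]  # invented eigenvalues in decreasing order
    pct, cum = explained_variance(eigs)
    retained = [ev for ev in eigs if ev > 1.0]  # eigenvalue-greater-than-one rule
    print(pct, cum, len(retained))
```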

Fig 8. Scree plot of different principal components in PCA using quantitative and qualitative traits.

https://doi.org/10.1371/journal.pone.0278631.g008

Fig 9. Contribution of quantitative and qualitative traits towards PC 1 (a) and PC 2 (b).

DFF: Days to flowering; PH: Plant height; NPP: Number of pods per plant; NSP: Number of seeds per pod; PL: Pod length; SY: Seed yield; 100SW: 100 Seed weight; G%: Seed longevity; SCC: Seed coat colour; FC: Flower colour; LS: Leaf shape; LG: Leaf greenness; SP: Stem pubescence; HP: Hypocotyl anthocyanin pigmentation; GH: Growth habit; GT: Growth type; PP: Pod pubescence; PC: Pod colour.

https://doi.org/10.1371/journal.pone.0278631.g009

Table 6. Principal components with its eigenvalues and percentage variances towards the total population variance in PCA using quantitative and qualitative traits.

https://doi.org/10.1371/journal.pone.0278631.t006

Based on the scores of the genotypes and the loadings of the traits on PC 1 and PC 2, a scatter plot was drawn that showed an unambiguous pattern of genotype groupings on the factor plane (Fig 10). The genotypes with higher G%, smaller seed size, tall PH and high NPP were placed in the first quadrant, which included the majority of the black seed coat colour genotypes (ACC Nos. 37, 39 & 109, Local black soybean, LB-5 and Kalitur) from the C7 sub-population of STRUCTURE using SNP markers. The second quadrant contained a close cluster of 29 genotypes, including three black seed coat colour genotypes, with high G%, small seed size and short PH. Nineteen genotypes with low G%, large seed size and short PH were placed in the third quadrant of the scatter plot. The fourth quadrant contained 16 genotypes with tall PH and low G%.

Fig 10. Scatter plot of genotypes based on eigenvalues of components in PC 1 and PC 2.

(First quadrant: genotypes with high seed longevity, tall plant height and small seed size (circle consists of black seed coat colour genotypes from C7 sub-population of STRUCTURE analysis using SNP markers); Second quadrant: genotypes with high seed longevity, short plant height and small seed size; Third quadrant: genotypes with low seed longevity, short plant height and bigger seed size; Fourth quadrant: genotypes with low seed longevity, tall plant height and bigger seed size).

https://doi.org/10.1371/journal.pone.0278631.g010
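The quadrant grouping described for Fig 10 can be sketched as a simple sign rule on the PC 1 and PC 2 scores. This assumes the usual Cartesian numbering with PC 1 on the horizontal axis and PC 2 on the vertical axis; the genotype names and scores below are hypothetical, and placing a zero score on the positive side is an arbitrary choice.

```python
def quadrant(pc1_score, pc2_score):
    """Return the quadrant (1 to 4) of a genotype on the PC 1 x PC 2 plane."""
    if pc1_score >= 0 and pc2_score >= 0:
        return 1  # high PC 1, high PC 2
    if pc1_score < 0 and pc2_score >= 0:
        return 2  # low PC 1, high PC 2
    if pc1_score < 0 and pc2_score < 0:
        return 3  # low PC 1, low PC 2
    return 4      # high PC 1, low PC 2

# Hypothetical genotype scores (PC 1, PC 2); not taken from the study.
pc_scores = {
    "G1": (1.8, 2.1),
    "G2": (-0.7, 1.2),
    "G3": (-1.1, -0.9),
    "G4": (0.9, -1.5),
}

by_quadrant = {}
for name, (pc1, pc2) in pc_scores.items():
    by_quadrant.setdefault(quadrant(pc1, pc2), []).append(name)

print(by_quadrant)
```

If PC 1 mainly tracks plant height and PC 2 mainly tracks G%, as the loadings suggest, this rule reproduces the interpretation of the four quadrants given in the caption.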

The UPGMA-based hierarchical clustering of the genotypes using quantitative and qualitative traits divided the genotypes into two major clusters (Fig 11). Cluster-I consisted of 89 genotypes and cluster-II consisted of five genotypes. Cluster-I was further divided into five subclusters. Subcluster-5 had the genotypes with the highest G% (57.66%) and the smallest seed size (100SW of 9.78 g) (Table 7). Four of the six genotypes in this subcluster had black seed coat colour. In contrast, subcluster-1 of cluster-I, with 11 genotypes, had the lowest G% (17.14%) and the highest seed weight (16.76 g). None of the genotypes in this subcluster had black seed coat colour.

Fig 11. UPGMA based hierarchical clustering of genotypes using quantitative and qualitative traits.

https://doi.org/10.1371/journal.pone.0278631.g011

Table 7. Number of genotypes, seed longevity (G%) and 100 seed weight (100SW) of clusters formed in hierarchical clustering using phenotypic traits.

https://doi.org/10.1371/journal.pone.0278631.t007
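The UPGMA (average-linkage) procedure behind the clustering in Fig 11 can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the software workflow used in the study: the distance is Euclidean on raw values, and the five (G%, 100SW) pairs are hypothetical. In practice traits would normally be standardized first.

```python
import math
from itertools import combinations

def euclidean(a, b):
    """Euclidean distance between two trait vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def upgma(points):
    """Average-linkage agglomeration; returns a list of (merged_members, height)."""
    n = len(points)
    # Pairwise distances between the original observations.
    dist = {frozenset((i, j)): euclidean(points[i], points[j])
            for i, j in combinations(range(n), 2)}

    def avg_dist(a, b):
        # UPGMA distance: unweighted mean over all between-cluster pairs.
        return sum(dist[frozenset((i, j))] for i in a for j in b) / (len(a) * len(b))

    clusters = {i: (i,) for i in range(n)}
    merges = []
    next_id = n
    while len(clusters) > 1:
        # Pick the pair of current clusters with the smallest average distance.
        ka, kb = min(combinations(clusters, 2),
                     key=lambda pair: avg_dist(clusters[pair[0]], clusters[pair[1]]))
        height = avg_dist(clusters[ka], clusters[kb])
        merged = tuple(sorted(clusters.pop(ka) + clusters.pop(kb)))
        merges.append((merged, height))
        clusters[next_id] = merged
        next_id += 1
    return merges

# Hypothetical (G%, 100SW in g) values for five genotypes; illustration only.
toy_traits = [(58.0, 9.8), (60.0, 10.1), (17.0, 16.8), (19.0, 16.0), (40.0, 13.0)]
merges = upgma(toy_traits)
for members, height in merges:
    print(members, round(height, 2))
```

In this toy example the two high-longevity, small-seeded entries join first and the two low-longevity, large-seeded entries join next, which mirrors the contrast between subcluster-5 and subcluster-1. Libraries such as SciPy offer the same method through `scipy.cluster.hierarchy.linkage` with `method="average"`.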

Discussion

This study focuses on the diversity and population structure analysis of 96 soybean genotypes assembled to evaluate seed longevity and its association with seed size and seed coat colour. To the best of our knowledge, this is the first attempt to assemble such a population and to characterize its genetic structure for the study of seed longevity in soybean. Seed longevity in soybean is a pressing concern because poor longevity affects seed yield and quality and puts the economic security of seed production at high risk [9]. Efforts have been made to study the genetic basis of seed longevity and to identify genotypes with high seed longevity in soybean [15, 80, 81]. In this study, 29,955 polymorphic SNPs, generated by GBS analysis of 96 genotypes, were used for population diversity and structure analysis. NGS technologies are in high demand because of their efficiency in generating thousands or millions of sequences simultaneously at low cost [82]. Genotyping by sequencing is a popular NGS technique that facilitates marker discovery and high-density genotyping at low cost [55, 83]. This technique makes it possible to study population structure and identify QTLs for complex traits in large numbers of genotypes of various crops [83–86]. The high percentage of bases meeting the Phred quality thresholds (Q20 and Q30) and the GC content of the sequence data indicated sufficient sequencing quality for variant calling and genotyping. Q20 and Q30 scores correspond to base-call accuracies of 99% and 99.9%, i.e., error rates of 1 in 100 and 1 in 1000 bases, respectively. The high GC content of the sequence reads reflects less linear read coverage and hence wider genome coverage [87]. The high sequencing depth and mapping rate obtained in the study indicate that the Illumina HiSeq™ 2500 sequencing covered almost the entire genome.
The TASSEL-GBS (v2) pipeline detected one variant in every 25,299 bases on average and found 37,897 polymorphic SNPs on sequences aligned to the reference genome as well as in contigs. The higher number of transitions than transversions indicates that transition mutations are better tolerated during natural selection in soybean, which is common to other plant species [88, 89]. Despite the narrowing of genomic variation in soybean due to selection pressure during domestication [90, 91] and the high level of repetitive DNA in soybean [92], we identified 29,955 high-quality SNPs using stringent filtering criteria, a higher number of polymorphic SNPs than in previous studies [84, 93, 94]. The variant annotation performed using SnpEff [71] showed that the majority of the detected SNPs were modifiers in non-coding regions and that only a few SNPs in coding regions resulted in significant changes such as frameshifts, deletion of a large part of a chromosome or an entire exon, and formation of stop codons. Non-coding SNPs play a significant role in gene regulation and have attracted great interest from geneticists in the past decade [95]. The functional classification of the SNPs showed that the majority of the variants result in amino acid changes and that 19 SNPs result in the formation of stop codons and subsequently produce truncated, incomplete or non-functional proteins. The missense to silent ratio was 1.53, which is higher than the ratio reported earlier in soybean [91, 96]. The 29,955 SNPs were distributed across all the chromosomes; the highest number of SNPs was found on chromosome 18 and the lowest on chromosome 10.
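The Phred quality thresholds mentioned in this discussion follow the standard relation Q = -10 log10(P), where P is the probability that a base call is wrong. A minimal sketch of the conversion is given below; the function names are illustrative and do not come from the sequencing pipeline.

```python
import math

def phred_to_error_probability(q):
    """Probability that a base call with Phred score q is wrong."""
    return 10 ** (-q / 10)

def error_probability_to_phred(p):
    """Phred score corresponding to a base-call error probability p."""
    return -10 * math.log10(p)

for q in (20, 30):
    p = phred_to_error_probability(q)
    print(f"Q{q}: error probability {p:g}, accuracy {100 * (1 - p):.1f}%")
```

Under this relation Q20 corresponds to one expected error in 100 bases and Q30 to one expected error in 1000 bases.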

In this study we used SNP markers, SSR markers and agro-morphological traits to analyse the population structure and genetic diversity of 96 soybean genotypes. The Bayesian model-based structuring using SNP markers effectively identified clusters with higher seed longevity associated with seed coat colour and seed size, which were subsequently validated by UPGMA and PCA based on qualitative and quantitative traits. This model has been used extensively to determine the optimum number of sub-populations and to analyse population structure in various crop species [52, 97–99]. The Bayesian model-based population structure analysis yielded eight distinct sub-populations. The average genetic distance and fixation index of the eight sub-populations revealed large genetic divergence among the genotypes used in the study. The C7 sub-population, with the majority of the black seed coat colour genotypes (7 out of 8), had the highest seed longevity and plant height and a smaller seed size than the other sub-populations. The black genotypes with small seed size in sub-population C7 are a valuable resource for seed longevity, supporting earlier results [16, 17, 26].

Both STRUCTURE and UPGMA clustering using SNPs showed that black soybean genotypes with small seed size were separated from the other soybean genotypes. The phylogenetic tree based on UPGMA clustering showed a clear grouping of genotypes similar to that of the STRUCTURE analysis. All the genotypes in cluster-II of the UPGMA tree were from C1, and all 29 genotypes from C3 were grouped in cluster-III. A similar grouping was obtained in the SSR-based UPGMA clustering, with the same seven black seed coat colour genotypes in a separate cluster as in the SNP-based UPGMA clustering. Non-parametric population structure analysis using a pairwise dissimilarity index with SSR and SNP markers has been widely used in population genetic studies of cowpea, pigeon pea and soybean [52, 99, 100]. Our study aimed to characterize the level of polymorphism in a population assembled for seed longevity using both SSR and SNP markers, and it showed that both marker systems were informative and provided complementary data that helped to describe the population in terms of seed longevity. The SSR and SNP marker systems differ in their genomic distribution and basis of polymorphism as well as in the clustering of genotypes in soybean [54]. Singh et al. [101] and Courtois et al. [102] observed different clustering patterns with SSR and SNP markers in rice genotypes and reported that dissimilar grouping of genotypes is to be expected irrespective of the markers under investigation. In the present study, we used only a limited number of SSR markers for clear differentiation of genotypes. Comparatively, SNPs are the markers of choice for population structure analysis and plant breeding [103].

Genotypic or phenotypic information alone may not be sufficient to capture the genetic diversity and structure of a population. Therefore, we used both genotypic and phenotypic information for clustering the genotypes in the assembled population. To reveal the phenotypic variation, the 96 genotypes were characterized under field conditions for eight quantitative and ten qualitative traits. The main focus of the assembled population is genetic variation for seed longevity. The genotypes under investigation were highly diverse for seed longevity tested under accelerated ageing treatment, with a high coefficient of variability. Seed longevity assessment using the accelerated ageing method is advantageous compared with natural ageing because it gives a quick and reliable estimate of longevity in soybean [15, 81, 104] and in other crops such as chickpea, black gram and common bean [105–107]. We observed wide variability among the genotypes for seed longevity. ACC No. 39, EC 1720617, MACS 1410, Local black soybean, ACC No. 369, LB-5, Kalitur, EC 8705 and PS 1618 showed significantly higher seed longevity after the ageing treatment. Five of the 11 black seed coat colour genotypes had more than 70 percent germination after accelerated ageing treatment, indicating higher seed longevity of the black-seeded genotypes. All the genotypes with lower seed longevity had yellow seed coat colour. The seed size (100 seed weight) of the genotypes ranged from 6.60 to 20.80 g and, in general, the black-seeded genotypes had smaller seeds, except VLS-1. This suggests that substantial variation exists in the assembled population for mapping of QTLs responsible for seed longevity. The other quantitative traits, viz., SY, PH, NPP and 100SW, showed large genetic variation with high coefficients of variability, suggesting heterogeneity in our assembled population.
A high level of variability has been reported for plant height [108], seed size [109] and other traits [6, 110, 111] in soybean. These morphological traits are of vital importance as they are directly linked to yield [112]. The qualitative traits, recorded following DUS guidelines [63], grouped the genotypes into various categories based on their plant growth and seed characteristics. There was significant variation in all the qualitative traits, as also reported in earlier studies [28, 113].
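The coefficient of variability referred to in this paragraph can be illustrated with a minimal sketch. It assumes the simple phenotypic definition, i.e., the sample standard deviation expressed as a percentage of the mean; the text does not state whether genotypic or phenotypic coefficients derived from variance components were used, and the values below are hypothetical.

```python
from statistics import mean, stdev

def coefficient_of_variation(values):
    """Sample standard deviation as a percentage of the mean."""
    return 100 * stdev(values) / mean(values)

# Hypothetical germination percentages after accelerated ageing; illustration only.
toy_germination = [2, 4, 4, 4, 5, 5, 7, 9]
cv = coefficient_of_variation(toy_germination)
print(f"CV = {cv:.2f}%")
```

A larger value indicates that the genotypes differ more widely for the trait relative to its mean, which is the sense in which a high coefficient of variability points to heterogeneity in the assembled population.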

The PCA is a powerful approach for understanding the structure of a plant population and makes it possible to identify important variables among the assembled genotypes. The Euclidean biplot showed that the accessions were distributed across all four quadrants. The scatter plot of PC 1 and PC 2 for the 96 genotypes gave several close clusters on the factor plane, with the majority of the black seed coat colour genotypes, viz., Local black soybean, Kalitur, ACC Nos. 39, 109, 101 and 37, in the first quadrant, which harbours the genotypes with higher seed longevity, taller plants and smaller seed size. Interestingly, these six black seed coat colour genotypes were the same genotypes that clustered together in the molecular clustering using SSR and SNP markers as well as in the SNP-based population structure analysis. These genotypes can therefore be identified as genotypes with high seed longevity that group together in both molecular and morphological trait-based analyses. To date, PCA using morphological and molecular information has been successfully employed to demonstrate the diversity and phylogenetic relationships of genotypes and to select genotypes for breeding [114]. The PCA based on quantitative and qualitative traits revealed two main principal components, which together explained 86.68 percent of the variance. Seed longevity was the most influential trait for PC 2. These results indicate that PH, seed longevity and NPP could be suitable candidate traits to investigate morphogenic variation related to seed longevity in future studies. We also found that seed longevity in the black seed coat colour genotypes showed a positive correlation with plant height and number of pods, and a negative correlation with seed size. The UPGMA hierarchical clustering using morphological traits revealed two major clusters, with 91 genotypes in cluster-I and the remaining 5 genotypes in cluster-II. Identifying genotypes with higher longevity was an important objective of this study.
The cluster analysis of morphological traits illustrates that subcluster-5 in cluster-I had 6 genotypes with high seed longevity after accelerated ageing treatment. Concomitantly, this group recorded the lowest 100 seed weight, indicating the small seed size of these genotypes. Subcluster-1 in cluster-I had 11 genotypes with the lowest seed longevity and the largest seed size. The relationships observed between seed coat colour, seed longevity and seed size are in line with the findings of Hosamani et al. [17]. The ability of a seed to sustain its viability during dry storage varies greatly with the genetic makeup of the seed [115, 116].

The results of STRUCTURE, PCA and UPGMA cluster analysis showed a high degree of similarity and provided complementary data that helped to identify genotypes with high seed longevity. We observed that six black seed coat colour genotypes, viz., Local black soybean, Kalitur, ACC Nos. 39, 109, 101 and 37, with high seed longevity and small seed size, clustered together in both molecular and morphological distance-based grouping and in model-based structuring. The negative correlation between seed longevity in black seed coat colour genotypes and seed size was in agreement with previous results [12, 14, 16, 17, 104]. A high rate of lipid peroxidation and the production of reactive oxygen species are the most commonly reported reasons for low seed longevity in soybean, owing to its high oil content [20, 117, 118] along with its seed coat composition [119]. The seed coat of black-seeded soybean genotypes has a high hemicellulose content [120], which enhances seed impermeability [121]. A high concentration of calcium in the seed coat [122], a smaller gap between testa and cotyledon, fewer pores on the seed coat surface, a higher lignin content [12] and a greater amount of secondary metabolites such as anthocyanins and polyphenols [123, 124] are helpful in scavenging the reactive oxygen species created during deterioration [125]. Together with the health benefits of black seed coat colour genotypes [6, 123], the higher longevity of their seeds during dry storage gives the black seed coat colour genotypes enormous utility in soybean crop improvement programmes.

Supporting information

S1 Fig. Bar plot of K = 6 using SNP markers.

https://doi.org/10.1371/journal.pone.0278631.s001

(DOCX)

S1 Table. List of soybean genotypes used for the present study and their codes in SNP-UPGMA cluster.

https://doi.org/10.1371/journal.pone.0278631.s002

(DOCX)

S2 Table. List of SSR primers used for the study, their sequence with annealing temperature.

https://doi.org/10.1371/journal.pone.0278631.s003

(DOCX)

S3 Table. Statistics of the GBS-sequencing data.

https://doi.org/10.1371/journal.pone.0278631.s004

(DOCX)

S4 Table. Statistics of mapping rate, depth and coverage.

https://doi.org/10.1371/journal.pone.0278631.s005

(DOCX)

S5 Table. Number and percentage of discovered SNPs in each chromosome out of 26769 chromosome aligned SNPs.

https://doi.org/10.1371/journal.pone.0278631.s006

(DOCX)

S6 Table. Fixation index, expected heterozygosity and number of genotypes in six sub groups of soybean genotypes derived from SNP population structure analysis.

https://doi.org/10.1371/journal.pone.0278631.s007

(DOCX)

Acknowledgments

The authors thank the All India Co-ordinated Research Project (AICRP) on Soybean, ICAR-Indian Institute of Soybean Research, Indore, Madhya Pradesh; the University of Agricultural Sciences, Bangalore; and the University of Agricultural Sciences, Dharwad for providing soybean germplasm, and the AICRP-National Seed Project and the DST-FIST funded Department of Plant Biotechnology, University of Agricultural Sciences, Bangalore, Karnataka, India, for extending the facilities to carry out the work.

References

  1. 1. Pritchard HW. Diversity in seed longevity amongst biodiverse seeds. Seed Sci Res. 2020; 30: 75–80. doi: https://doi.org/10.1017/S0960258520000306
  2. 2. Walters L, Wheeler LM, Grotenhuis JM. Longevity of seeds stored in a gene bank: species characteristics. Seed Sci Res. 2005; 15: 1–20.
  3. 3. Khanam M, Yeasa MA, Rahman MD, Mahbub AA, Gomosta AR. Effects of different factors on the growth efficiency of rice seedlings. Bangl J Bot. 2007; 36 (2): 171–176.
  4. 4. Tubic SB, Tatic M, Dordevic V, Nikolic Z, Subic J, Dukic V. Changes in soybean seeds as affected by accelerated and natural aging. Rom Biotechnol Lett. 2011; 16 (6): 6740–6747.
  5. 5. Lee JS, Velasco-Punzalan M, Pacleb M, Valdez R, Kretzschmar T, McNally KL, et al. Variation in seed longevity among diverse Indica rice varieties. Annals of Botany. 2019; 124: 447–460. pmid:31180503
  6. 6. Jo H, Lee JY, Cho H, Choi HJ, Son CK, Bae JS, et al. Genetic diversity of soybeans (Glycine max (L.) Merr.) with black seed coats and green cotyledons in Korean germplasm. Agronomy. 2021; 11: 581. doi: https://doi.org/10.3390/agronomy11030581
  7. 7. Anonymous, The state of soybean: 2021. Health/Nutrition. Pig progress. 2021. Available from: https://www.pigprogress.net/health-nutrition/the-state-of-soybeans-2021-and-beyond
  8. 8. FAOSTAT. Countries—Select All; Regions—World + (Total); Elements—Production Quantity; Items—Soybeans; Years– 2018. Retrieved 20 May 2020.
  9. 9. Delouche JC. Soybean seed storage beyond one year. Proc. 7th Soybean Res Conf ASTA. 1977; 60–73.
  10. 10. Walters C, Hill LM, Wheeler LJ. Dying while dry: kinetics and mechanisms of deterioration in desiccated organisms. Integr Comp Biol. 2005; 45: 751–758. pmid:21676826
  11. 11. Bhatia VS. Seed longevity as affected by field weathering and its association with seed coat and pod characters in soybean. Seed Res. 1996; 24: 82–87.
  12. 12. Kuchlan MK, Dadlani M, Samuel VK. Seeds coat properties and longevity of soybean seeds. J New Seeds. 2010; 11: 239–249. https://doi.org/10.1080/1522886X.2010.497960
  13. 13. Sooganna S, Jain SK, Bhat KV, Amrit L, Lal SK. Characterization of soybean (Glycine max) genotypes for seed longevity using SSR markers. Indian J Agric Sci. 2016; 86(5):605–10.
  14. 14. Pawar PV, Naik RM, Deshmukh MP, Satbhai RD, Mohite SG. Biochemical and molecular marker-based screening of seed longevity in soybean (Glycine max (L.) Merrill). Legume Res. 2017; 1–10.
  15. 15. Adsul AT, Chimote VP, Deshmukh MP. Inheritance of seed longevity and its association with other seed-related traits in soybean (Glycine max). Agric Res. 2018; 1–8.
  16. 16. Chandra S, Talukdar A, Taak Y, Yadav RR, Saini M, Sipani NS. Seed longevity studies in wild type, cultivated and inter-specific recombination inbred lines (RILs) of soybean. Genet Resour Crop Evol. 2021. https://doi.org/10.1007/s10722-021-01240-2
  17. 17. Hosamani J, Dadlani M, Santha IM, Kumar MBA, Jacob SR. Biochemical phenotyping of soybean (Glycine max (L.) Merrill) genotypes to establish the role of lipid peroxidation and antioxidant enzymes in seed longevity. Agric Res. 2013; 2 (2):119–126.
  18. 18. Simic B, Popovic R, Sudaric A, Rozman V, Kalinovic I, Cosic J. Influence of storage condition on seed oil content of maize, soybean and sunflower. Agric Conspectus Sci. 2007; 72 (3): 211–213.
  19. 19. Balesevic TS, Tatic MDV, Nikolic Z, Dukic V. Seed viability of oil crops depending on storage conditions. Helia, 2010; 33 (52): 153–160.
  20. 20. Banuprakash K, Yogeesha HS, Arun MN. Physiological and biochemical changes in relation to seed quality in ageing bell pepper (Capsicum annuum) seeds. Indian J Agric Sci. 2010; 80 (9): 777–780.
  21. 21. Ghassemi-Golezani K, Bakhshy J, Raey Y, Hossainzadeh-Mahootchy A. Seed vigor and field performance of winter oilseed rape (Brassica napus L.) cultivars. Not Bot Hort Agrobot Cluj. 2010; 38 (3): 146–150. https://doi.org/10.15835/nbha3834977
  22. 22. Jhan JK, Chung YC, Chen GH, Chang CH, Lu YC, Hsu CK. Anthocyanin contents in the seed coat of black soya bean and their anti-human tyrosinase activity and antioxidative activity. Int. J. Cosmet. Sci. 2016; 38: 319–324. pmid:26663436
  23. 23. Ganesan K, Xu B. A critical review on polyphenols and health benefits of black soybeans. Nutrients. 2017; 9: 455. pmid:28471393
  24. 24. Cericola F, Portis E, Toppino L, Barchi L, Acciarri N, Ciriaci T, et al. The population structure and diversity of eggplant from Asia and the Mediterranean basin. PLoS ONE. 2013; 8(9): e73702. pmid:24040032
  25. 25. Ganesan SK, Singh R, Choudhury DR, Bharadwaj J, Gupta V, Singode A. Genetic diversity and population structure study of drumstick (Moringa oleifera Lam.) using morphological and SSR markers. Industrial Crops and Products. 2014; 60: 316–325. http://dx.doi.org/10.1016/j.indcrop.2014.06.033
  26. 26. Nkhata W, Shimelis H, Melis R, Chirwa R, Mzengeza T, Mathew I, et al. Population structure and genetic diversity analyses of common bean germplasm collections of East and Southern Africa using morphological traits and high-density SNP markers. PLoS ONE. 2020; 15(12): e0243238. pmid:33338076
  27. 27. Kumar A, Pandey A, Aochen C, Pattanayak A. Evaluation of genetic diversity and interrelationships of agro-morphological characters in soybean (Glycine max) genotypes. Proc Natl Acad Sci India Sect B Biol Sci. 2015; 85(2): 397–405.
  28. 28. Kachare S, Tiwari S, Tripathi N, Thakur VV. Assessment of Genetic Diversity of Soybean (Glycine max) Genotypes Using Qualitative Traits and Microsatellite Markers. Agric Res 2020; 9(1): 23–34. https://doi.org/10.1007/s40003-019-00412-y
  29. 29. Bhandari HR, Bhanu AN, Srivastava K, Singh MN, Shreya I. Assessment of genetic diversity in crop plants—An overview. Adv Plants Agric Res. 2017; 7:279–286.
  30. 30. Darkwa K, Agre P, Olasanmi B, Iseki K, Matsumoto R, Powell A, et al. Comparative assessment of genetic diversity matrices and clustering methods in white Guinea yam (Dioscorea rotundata) based on morphological and molecular markers. Sci Rep. 2020; 10: 13191. pmid:32764649
  31. 31. Luo Y, Zhang X, Xu J, Zheng Y, Pu S, Duan Z. Phenotypic and molecular marker analysis uncovers the genetic diversity of the grass Stenotaphrum secundatum. BMC Genetics. 2020; 21: 86. https://doi.org/10.1186/s12863-020-00892-w
  32. 32. Last L, Lüscher G, Widmer F, Boller B, Kölliker R. Indicators for genetic and phenotypic diversity of Dactylis glomerata in Swiss permanent grassland. Ecological Indicators. 2014; 38:181–191.
  33. 33. Zhang Q-d, Jia R-Z, Meng C, Ti C-W, Wang Y-L. Diversity and population structure of a dominant deciduous tree based on morphological and genetic data. AoB PLANTS. 2015; 7: plv103. pmid:26311734
  34. 34. Malik MFA, Tariq K, Qureshi AS, Khan MR, Ashraf M, Gul-Naz Ali A. Analysis of genetic diversity of soybean germplasm from five different origins using RAPD markers. Acta Agriculturae Scandinavica, Section B-Soil & Plant Science. 2016; 67:2, 148–154.
  35. 35. Cui C, Li Y, Liu Y, Li X, Luo S, Zhang Z, Wu R. Determination of genetic diversity among Saccharina germplasm using ISSR and RAPD markers. C. R. Biologies. 2017; 340: 76–86. http://dx.doi.org/10.1016/j.crvi.2016.11.005
  36. 36. Moghadam FAM, Qaderi A, Sharifi-Sirchi G. Evaluation of Genetic Diversity of 17 Populations (Lepidium sativum L.) Plant Collected from Different Regions of Iran by RAPD Marker. ACS Agric Sci Technol. 2021; 1 (6): 684–690. https://doi.org/10.1021/acsagscitech.1c00182
  37. 37. Rohini MR, Sankaran M, Rajkumar S, Prakash K, Gaikwad A, Chaudhury R, et al. Morphological characterization and analysis of genetic diversity and population structure in Citrus x jambhiri Lush. using SSR markers. Genet Resour Crop Evol. 2020; 67: 1259–1275. https://doi.org/10.1007/s10722-020-00909-4
  38. 38. Kumar P, Nimbal S, Badhlakoti N, Singh V, Sangwan RS. Genetic diversity and population structure analysis of morphological traits in upland cotton (Gossypium hirsutum L.). J Appl Genet. 2022; 63 (1): 87–101.
  39. 39. El-Esawi MA, Germaine K, Bourke P, Malone R. AFLP analysis of genetic diversity and phylogenetic relationships of Brassica oleracea in Ireland. C. R. Biologies. 2016; 339: 163–170. pmid:27156498
  40. 40. Bhattacharyya P, Ghosh S, Mandi SS, Kumaria S, Tandon P. Genetic variability and association of AFLP markers with some important biochemical traits in Dendrobium thyrsiflorum, a threatened medicinal orchid. South African J Bot. 2017; 109: 214–222. http://dx.doi.org/10.1016/j.sajb.2016.12.012
  41. 41. Yuan CY, Wang P, Chen PP, Xiao WJ, Zhang C, Hu S. Genetic diversity revealed by morphological traits and ISSR markers in 48 Okras (Abelmoschus esculentus L.). Physiol Mol Biol Plants. 2015; 21(3):359–364. pmid:26261400
  42. 42. Carvalho M, Muñoz-Amatriaín M, Castro I, Lino-Neto T, Matos M, Egea-Cortines M, et al. Genetic diversity and structure of Iberian Peninsula cowpeas compared to world- wide cowpea accessions using high density SNP markers. BMC Genomics. 2017; 18:891. pmid:29162034
  43. 43. Feng J, Zhao S, Li M, Zhang C, Qu H, Li Q, et al. Genome-wide genetic diversity detection and population structure analysis in sweetpotato (Ipomoea batatas) using RAD-seq. Genomics. 2020; 112: 1978–1987. pmid:31756427
  44. 44. Yu Z, Agyeman RF, Hwang SF, Strelkov SE. Molecular genetic diversity and population structure analyses of rutabaga accessions from Nordic countries as revealed by single nucleotide polymorphism markers. BMC Genomics. 2021; 22: 442. pmid:34118867
  45. 45. Larsen B, Gardner K, Pedersen C, Ørgaard M, Migicovsky Z, Myles S, et al. Population structure, relatedness and ploidy levels in an apple gene bank revealed through genotyping-by-sequencing. PLoS ONE. 2018; 13(8): e0201889. pmid:30110387
  46. 46. Adu BG, Badu-Apraku B, Akromah R, Garcia-Oliveira AL, Awuku FJ, Gedil M. Genetic diversity and population structure of early-maturing tropical maize inbred lines using SNP markers. PLoS ONE. 2019; 14(4): e0214810. pmid:30964890
  47. 47. Ali A, Pan YB, Wang QN, Wang JD, Chen JL, Gao SJ. Genetic diversity and population structure analysis of Saccharum and Erianthus genera using microsatellite (SSR) markers. Sci Rep. 2019; 9: 395. pmid:30674931
  48. 48. Bianchi D, Brancadoro L, Lorenzis GD. Genetic Diversity and Population Structure in a Vitis spp. Core Collection Investigated by SNP Markers. Diversity. 2020; 12: 103.
  49. 49. Ghione CE, Lombardo LA, Vicentin IG, Heinz RA. Association mapping to identify molecular markers associated with resistance genes to stink bugs in soybean. Euphytica. 2021; 217: 46. https://doi.org/10.1007/s10681-021-02768-1
  50. 50. Inghelandt DV, Melchinger AE, Lebreton C, Stich B. Population structure and genetic diversity in a commercial maize breeding program assessed with SSR and SNP markers. Theor Appl Genet. 2010; 120: 1289–1299. pmid:20063144
  51. 51. Wurschum T, Langer SM, Longin CFH, Korzun V, Akhunov E, Ebmeyer E, et al. Population structure, genetic diversity and linkage disequilibrium in elite winter wheat assessed with SNP and SSR markers. Theor Appl Genet. 2013; 126: 1477–1486. pmid:23429904
  52. 52. Zavinon F, Adoukonou-Sagbadja H, Keilwagen J, Lehnert H, Ordon F, Perovic D. Genetic diversity and population structure in Beninese pigeon pea [Cajanus cajan (L.) Huth] landraces collection revealed by SSR and genome wide SNP markers. Genet Resour Crop Evol. 2020; 67: 191–208. https://doi.org/10.1007/s10722-019-00864-9
  53. 53. Zurn JD, Nyberg A, Montanari S, Postman J, Neale D, Bassil N. A new SSR fingerprinting set and its comparison to existing SSR- and SNP-based genotyping platforms to manage Pyrus germplasm resources. Tree Genet Genomes. 2020; 16: 72. https://doi.org/10.1007/s11295-020-01467-7
  54. 54. Tanya P, Srinives T, Toojinda A, Ha B-K, Bae J-S, Moon J-K, et al. Evaluation of genetic diversity among soybean genotypes using SSR and SNP. Korean J Crop Sci. 2001; 46(4): 334–340.
  55. 55. Elshire RJ, Glaubitz JC, Sun Q, Poland JA, Kawamoto K, Buckler ES, et al. A robust, simple genotyping-by-sequencing (GBS) approach for high diversity species. PLoS ONE. 2011; 6(5): e19379. pmid:21573248
  56. 56. Poland JA, Brown PJ, Sorrells ME, Jannink JL. Development of high-density genetic maps for barley and wheat using a novel two-enzyme genotyping-by-sequencing approach. PLoS ONE. 2012; 7 (2): e32253. pmid:22389690
  57. 57. Narum SR, Buerkle CA, Davey JW, Miller MR, Hohenlohe PA. Genotyping by sequencing in ecological and conservation genomics. Mol. Ecol. 2013; 22 (11): 2841–2847. pmid:23711105
  58. 58. Kim C, Guo H, Kong W, Chandnani R, Shuang L-S, Paterson AH. Application of genotyping by sequencing technology to a variety of crop breeding programs. Plant Sci. 2016; 242: 14–22. pmid:26566821
  59. 59. Yao X, Wu K, Yao Y, Bai Y, Ye J, Chi D. Construction of a high-density genetic map: genotyping by sequencing (GBS) to map purple seed coat color (Psc) in hulless barley. Hereditas. 2018; 155, 37. pmid:30473656
  60. 60. Guajardo V, Simon S, Almada R, Saski C, Gasic K, Moreno MA. Genome-wide SNP identification in Prunus rootstocks germplasm collections using Genotyping-by-Sequencing: phylogenetic analysis, distribution of SNPs and prediction of their effect on gene function. Sci Rep. 2020; 10: 1467. https://doi.org/10.1038/s41598-020-58271-5
  61. 61. Chen H, Semagn K, Igbal M, Moakhar NP, Haile T, N’Diaye A, et al. Genome-wide association mapping of genomic regions associated with phenotypic traits in Canadian western spring wheat. Mol Breeding. 2017; 37: 141. https://doi.org/10.1007/s11032-017-0741-6
  62. 62. Luo T, Zhang Y, Zhang C, Nelson MN, Yuan J, Guo L. et al. Genome-wide association mapping unravels the genetic control of seed vigor under low-temperature conditions in rapeseed (Brassica napus L.). Plants. 2021; 10: 426. pmid:33668258
  63. 63. PPV&FRA. Guidelines for the conduct of test for distinctiveness, uniformity and stability on soybean (Glycine max (L.) Merrill). Protection of Plant Varieties and Farmer’s Rights Authority, Government of India. 2009.
  64. 64. ISTA. Accelerated ageing method for soybean. International seed testing rules, Published by International Seed Testing Association, Zurich, Switzerland. 2010.
  65. 65. Anonymous, International Rules for Seed Testing (ISTA). Seed Sci Technol. 1996; 24.
  66. 66. Doyle JJ, Doyle JL. Isolation of plant DNA from fresh tissue. Focus. 1990; 12: 13–15.
  67. 67. Ewing B, Green P. Base-calling of automated sequencer traces using phred. II. Error probabilities. Genome Research. 1998; 8(3): 186–194. pmid:9521922
  68. 68. Li H, Durbin R. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics. 2009; 25(14): 1754–1760. pmid:19451168
  69. 69. Glaubitz JC, Casstevens TM, Lu F, Harriman J, Elshire RJ, Sun Q. et al. TASSEL-GBS: a high capacity genotyping by sequencing analysis pipeline. PLoS One. 2014; 9: e90346. pmid:24587335
  70. 70. Danecek P, Auton A, Abecasis G, Albers CA, Banks E, DePristo MA. The variant call format and VCFtools. Bioinformatics (Oxford, England). 2011; 27: 2156–2158. pmid:21653522
  71. 71. Cingolani P, Platts A, Wang LL, Coon M, Nguyen T, Wang L. et al. A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w1118; iso-2; iso-3. Fly. 2012; 6 (2): 80–92. pmid:22728672
  72. 72. IBM Corp. Released 2015. IBM SPSS Statistics for Windows, Version 23.0. Armonk, NY: IBM Corp.
  73. 73. Hammer Ø, Harper DAT, Ryan PD. PAST: paleontological statistics software package for education and data analysis. Palaeontologia Electronica. 2001; 4(1): 4–9. http://palaeo-electronica.org/2001_1/past/issue1_01.htm.
  74. Pritchard JK, Stevens M, Donnelly P. Inference of population structure using multi-locus genotype data. Genetics. 2000; 155: 945–959. pmid:10835412
  75. Earl DA, vonHoldt BM. STRUCTURE HARVESTER: a website and program for visualizing STRUCTURE output and implementing the Evanno method. Conservation Genetics Resources. 2012; 4(2): 359–361.
  76. Evanno G, Regnaut S, Goudet J. Detecting the number of clusters of individuals using the software STRUCTURE: a simulation study. Mol Ecol. 2005; 14: 2611–2620. pmid:15969739
  77. Sokal RR, Michener CD. A statistical method for evaluating systematic relationships. Univ. Kansas Sci. Bulletin. 1958; 38: 1409–1438.
  78. Nei M. Genetic distance between populations. Am Nat. 1972; 106: 283–292. http://dx.doi.org/10.1086/282771
  79. Perrier X, Jacquemoud-Collet JP. DARwin software. 2006; https://darwin.cirad.fr/
  80. Singh RK, Raipuria RK, Bhatia VS, Rani A, Pushpendra, Husain SM, et al. SSR markers associated with seed longevity in soybean. Seed Sci Technol. 2008; 36: 162–167. https://doi.org/10.15258/sst.2008.36.1.17
  81. Zhang X, Hina A, Song S, Kong J, Bhat JA, Zhao T. Whole-genome mapping identified novel “QTL hotspots regions” for seed storability in soybean (Glycine max L.). BMC Genomics. 2019; 20: 499. pmid:31208334
  82. He J, Zhao X, Laroche A, Lu Z-X, Liu HK, Li Z. Genotyping-by-sequencing (GBS), an ultimate marker-assisted selection (MAS) tool to accelerate plant breeding. Front Plant Sci. 2014; 5: 484. pmid:25324846
  83. Campa A, Ferreira JJ. Genetic diversity assessed by genotyping by sequencing (GBS) and for phenological traits in blueberry cultivars. PLoS ONE. 2018; 13(10): e0206361. pmid:30352107
  84. Sonah H, O’Donoughue L, Cober E, Rajcan I, Belzile F. Identification of loci governing eight agronomic traits using a GBS-GWAS approach and validation by QTL mapping in soya bean. Plant Biotechnol J. 2015; 13: 211–221. pmid:25213593
  85. Xiong H, Shi A, Mou B, Qin J, Motes D, Lu W, et al. Genetic Diversity and Population Structure of Cowpea (Vigna unguiculata L. Walp). PLoS ONE. 2016; 11(8): e0160941. pmid:27509049
  86. Luo Z, Brock J, Dyer JM, Kutchan T, Schachtman D, Augustin M, et al. Genetic Diversity and Population Structure of a Camelina sativa Spring Panel. Front Plant Sci. 2019; 10: 184.
  87. Benjamini Y, Speed T. Summarizing and correcting the GC content bias in high-throughput sequencing. Nucleic Acids Res. 2012; 40(10): e72.
  88. Kujur A, Bajaj D, Upadhyaya HD, Das S, Ranjan R, Shree T, et al. Employing genome-wide SNP discovery and genotyping strategy to extrapolate the natural allelic diversity and domestication patterns in chickpea. Front Plant Sci. 2015; 6: 162. pmid:25873920
  89. Mantello CC, Cardoso-Silva CB, da Silva CC, de Souza LM, Scaloppi Junior EJ, et al. De Novo Assembly and Transcriptome Analysis of the Rubber Tree (Hevea brasiliensis) and SNP Markers Development for Rubber Biosynthesis Pathways. PLOS ONE. 2014; 9(7): e102665. pmid:25048025
  90. Li YH, Reif JC, Jackson SA, Ma YS, Chang RZ, Qiu LJ. Detecting SNPs underlying domestication-related traits in soybean. BMC Plant Bio. 2014; 14: 251. pmid:25258093
  91. Lambirth KC, Whaley AM, Schlueter JA, Piller KJ, Bost KL. Transcript polymorphism rates in soybean seed tissues are increased in a single transformant of Glycine max. Int J Plant Genom. 2016; 1–12. https://doi.org/10.1155/2016/1562041
  92. Wu X, Ren C, Joshi T, Vuong T, Xu D, Nguyen HT. SNP discovery by high-throughput sequencing in soybean. BMC Genom. 2010; 11: 469. pmid:20701770
  93. Jarquin D, Kocak K, Posada L, Hyma K, Jedicka J, Graef G, et al. Genotyping by sequencing for genomic prediction in a soybean breeding population. BMC Genom. 2014; 15: 740. http://www.biomedcentral.com/1471-2164/15/740
  94. Fu Y-B, Cober ER, Morrison MJ, Marsolais F, Peterson GW, Horbach C. Patterns of Genetic Variation in a Soybean Germplasm Collection as Characterized with Genotyping-by-Sequencing. Plants. 2021; 10: 1611. pmid:34451656
  95. Giral H, Landmesser U, Kratzer A. Into the wild: GWAS exploration of non-coding RNAs. Front Cardiovasc Med. 2018; 5: 181. pmid:30619888
  96. Goettel W, Xia E, Upchurch R, Wang M-L, Chen P, An Y-QC. Identification and characterization of transcript polymorphisms in soybean lines varying in oil composition and content. BMC Genom. 2014; 15: 299. pmid:24755115
  97. Filippi CV, Aguirre N, Rivas JG, Zubrzycki J, Puebla A, Cordes D, et al. Population structure and genetic diversity characterization of a sunflower association mapping population using SSR and SNP markers. BMC Plant Bio. 2015; 15: 52. pmid:25848813
  98. Nachimuthu VV, Muthurajan R, Duraialaguraja S, Sivakami R, Pandian BA, Ponniah G, et al. Analysis of Population Structure and Genetic Diversity in Rice Germplasm Using SSR Markers: An Initiative Towards Association Mapping of Agronomic Traits in Oryza sativa. Rice. 2015; 8: 30.
  99. Chen H, Chen H, Hu L, Wang L, Wang S, Wang ML, et al. Genetic diversity and a population structure analysis of accessions in the Chinese cowpea [Vigna unguiculata (L.) Walp.] germplasm collection. Crop J. 2017; 5: 363–372. http://dx.doi.org/10.1016/j.cj.2017.04.002
  100. Chander S, Garcia-Oliveira AL, Gedil M, Shah T, Otusanya GO, Asiedu R, et al. Genetic diversity and population structure of soybean lines adapted to sub-Saharan Africa using single nucleotide polymorphism (SNP) markers. Agronomy. 2021; 11(3): 604. https://doi.org/10.3390/agronomy11030604
  101. Singh N, Choudhury DR, Singh AK, Kumar S, Srinivasan K, et al. Comparison of SSR and SNP Markers in Estimation of Genetic Diversity and Population Structure of Indian Rice Varieties. PLoS ONE. 2013; 8(12): e84136. pmid:24367635
  102. Courtois B, Frouin J, Greco R, Bruschi G, Droc G, Hamelin C, et al. Genetic Diversity and Population Structure in a European Collection of Rice. Crop Sci. 2012; 52: 1663–1675.
  103. Bohra A, Pandey MK, Jha UC, Singh B, Singh IP, Datta D, et al. Genomics-assisted breeding in four major pulse crops of developing countries: present status and prospects. Theor Appl Genet. 2014; 127: 1263–1291. pmid:24710822
  104. Hosamani J, Kumar MB, Talukdar A, Lal SK, Dadlani M. Molecular characterization and identification of candidate markers for seed longevity in soybean [Glycine max (L.) Merrill]. Indian J Genet. 2013; 73(1): 64–71.
  105. Gangwar CB, Singh, Selvam CP, Poonam S, Arti K. Physiological changes in accelerated ageing seeds of chickpea (Cicer arietinum L.). J Food Legumes. 2016; 29(2): 132–135.
  106. Vanniarajan C, Sanjeev S, Nepolean T. Accelerated ageing in rice fallow blackgram varieties. Legume Res- An Int J. 2004; 27(2): 119–122.
  107. Meyer MRM, Rojas A, Santanen A, Stoddard FL. Content of zinc, iron and their absorption inhibitors in Nicaraguan common beans (Phaseolus vulgaris L.). Food Chemistry. 2013; 136(1): 87–93. https://doi.org/10.1016/j.foodchem.2012.07.105
  108. Yang Q, Lin G, Lv H, Wang C, Yang Y, Liao H. Environmental and genetic regulation of plant height in soybean. BMC Plant Biology. 2021; 21: 63. pmid:33494700
  109. He Q, Xiang S, Yang H, Wang W, Shu Y, Li Z, et al. A genome-wide association study of seed size, protein content, and oil content using a natural population of Sichuan and Chongqing soybean. Euphytica. 2021; 217: 198. https://doi.org/10.1007/s10681-021-02931-8
  110. Valliyodan B, Qiu Q, Patil G, Zeng P, Huang J, Dai L, et al. Landscape of genomic diversity and trait discovery in soybean. Sci Rep. 2016; 6: 23598. pmid:27029319
  111. Torres AR, Grunvald AK, Martins TB, Santos MAD, Lemos NG, Silva LAS. Genetic structure and diversity of a soybean germplasm considering biological nitrogen fixation and protein content. Sci Agric. 2015; 72(1): 47–52. http://dx.doi.org/10.1590/0103-9016-2014-0039
  112. Li M, Liu Y, Wang C, Yang X, Li D, Zhang X, et al. Identification of traits contributing to high and stable yields in different soybean varieties across three Chinese latitudes. Front Plant Sci. 2020; 10: 1642. pmid:32038668
  113. Shilpashree N, Devi SN, Manjunathagowda DC, Muddappa A, Abdelmohsen SAM, et al. Morphological characterization, variability and diversity among vegetable soybean (Glycine max L.) Genotypes. Plants. 2021; 10: 671. pmid:33807322
  114. Ma J, Amos CI. Principal Components Analysis of Population Admixture. PLoS ONE. 2012; 7(7): e40115. pmid:22808102
  115. Liu J, Qin WT, Wu HJ, Yang CQ, Deng JC, Iqbal N, et al. Metabolism variation and better storability of dark-versus light-coloured soybean (Glycine max L. Merr.) seeds. Food Chemistry. 2017; 223: 104–113. pmid:28069115
  116. Renard J, Ninoles R, Almonacid IM, Gayubas B, Fernandez RM, Bissoli G, et al. Identification of novel seed longevity genes related to oxidative stress and seed coat by genome-wide association studies and reverse genetics. Plant Cell Environ. 2020; 43: 2523–2539. pmid:32519347
  117. Sharma S, Kaur A, Bansal A, Gill BS. Positional effects of soybean seed composition during storage. J Food Sci Technol. 2013; 50: 353–359.
  118. Indiarto R, Qonit MAH. A review of soybean oil lipid oxidation and its prevention techniques. Int J Advanced Sci Technol. 2020; 29(6): 5030–5037.
  119. Zhou S, Sekizaki H, Yang Z, Sawa S, Pan J. Phenolics in the seed coat of wild soybean (Glycine soja) and their significance for seed hardness and seed germination. J Agric Food Chem. 2010; 58(20): 10972–10978. pmid:20879792
  120. Shen M, Weihao W, Cao L. Soluble dietary fibers from black soybean hulls: Physical and enzymatic modification, structure, physical properties, and cholesterol binding capacity. J Food Sci. 2020. https://doi.org/10.1111/1750-3841.15133 pmid:32458493
  121. Mullin WJ, Xu W. Study of soybean seed coat components and their relationship to water absorption. J Agric Food Chem. 2001; 49: 5331–5335. pmid:11714324
  122. Sun L, Miao Z, Cai C, Zhang D, Zhao M, Wu Y, et al. GmHs1‐1, encoding a calcineurin‐like protein, controls hard‐seededness in soybean. Nature Genetics. 2015; 47: 939–943. pmid:26098868
  123. Astadi IR, Astuti M, Santoso U, Nugraheni PS. In vitro antioxidant activity of anthocyanins of black soybean seed coat in human low density lipoprotein (LDL). Food Chemistry. 2009; 112: 659–663.
  124. Zhou W, Chen F, Zhao S, Yang C, Meng Y, Shuai H, et al. DA‐6 promotes germination and seedling establishment from aged soybean seeds by mediating fatty acid metabolism and glycol-metabolism. J Exptl Bot. 2019b; 70: 101–114. https://doi.org/10.1093/jxb/ery247
  125. Debeaujon I, Léon-Kloosterziel KM, Koornneef M. Influence of the testa on seed dormancy, germination, and longevity in Arabidopsis. Plant Physiology. 2000; 122: 403–414. pmid:10677433