Identification of quantitative trait loci controlling soybean seed protein and oil content

  • Elizabeth M. Clevinger,

    Roles Formal analysis, Investigation, Writing – original draft

    Affiliation School of Plant and Environmental Sciences, Virginia Tech, Blacksburg, Virginia, United States of America

  • Ruslan Biyashev,

    Roles Formal analysis, Investigation, Writing – review & editing

    Affiliation School of Plant and Environmental Sciences, Virginia Tech, Blacksburg, Virginia, United States of America

  • David Haak,

    Roles Investigation, Writing – review & editing

    Affiliation School of Plant and Environmental Sciences, Virginia Tech, Blacksburg, Virginia, United States of America

  • Qijian Song,

    Roles Investigation, Writing – review & editing

    Affiliation Soybean Genomics and Improvement Lab, United States Department of Agriculture-Agricultural Research Service, Beltsville, Maryland, United States of America

  • Guillaume Pilot,

    Roles Writing – review & editing

    Affiliation School of Plant and Environmental Sciences, Virginia Tech, Blacksburg, Virginia, United States of America

  • M. A. Saghai Maroof

    Roles Conceptualization, Funding acquisition, Project administration, Supervision, Writing – review & editing

    smaroof@vt.edu

    Affiliation School of Plant and Environmental Sciences, Virginia Tech, Blacksburg, Virginia, United States of America

Abstract

Soybean is a major source of seed protein and oil globally with an average composition of 40% protein and 20% oil in the seed. The goal of this study was to identify quantitative trait loci (QTL) conferring seed protein and oil content utilizing a population constructed by crossing an above average protein content line, PI 399084 to another line that had a low protein content value, PI 507429, both from the USDA soybean germplasm collection. The recombinant inbred line (RIL) population, PI 507429 x PI 399084, was evaluated in two replications over four years (2018–2021); the seeds were analyzed for seed protein and oil content using near-infrared reflectance spectroscopy. The recombinant inbred lines and the two parents were re-sequenced using genotyping by sequencing. A total of 12,761 molecular markers, which came from genotyping by sequencing, the SoySNP6k BeadChip and selected simple sequence repeat (SSR) markers from known protein QTL chromosomal regions were used for mapping. One QTL was identified on chromosome 2 explaining up to 56.8% of the variation for seed protein content and up to 43% for seed oil content. Another QTL identified on chromosome 15 explained up to 27.2% of the variation for seed protein and up to 41% of the variation for seed oil content. The protein and oil QTLs of this study and their associated molecular markers will be useful in breeding to improve nutritional quality in soybean.

Introduction

Soybean (Glycine max (L.) Merr.) is one of the major sources of seed protein and oil in the United States and around the world with an average composition of 40% protein and 20% oil [1]. It is also a source of essential amino acids and metabolizable energy for both human and animal consumption [2]. One of the most important uses of soybean is protein-rich soybean meal for poultry and swine feed, since it has the highest level of crude protein among plant-based protein sources [3, 4]. Conventional cultivars of soybean generally have protein values between 38–42% on a dry weight basis in the seed. The U.S. used 33.5 million metric tons of soybean meal in 2021 for livestock feed with the majority going to poultry. The U.S. also consumed 10.6 million metric tons of soybean oil in 2021 (http://soystats.com). Increasing protein and oil content in soybean seeds would enhance the economic value for growers and processors [5]. Soybean seed protein and oil content are complex, quantitatively inherited polygenic traits [6–8]. Increasing protein content is problematic because it is negatively correlated with oil content, carbohydrates and seed yield and is affected by environmental conditions [9–14]. The negative correlation between oil and protein has been thoroughly documented; typically, a 1% reduction in total oil content leads to a 2% increase in total protein content [14–16]. It has also been reported that soybean protein content is higher and oil content lower in the Southeast United States compared to the Midwest [15].

Quantitative trait loci (QTL) controlling protein and oil concentrations have been mapped to all 20 soybean chromosomes. Over 250 protein QTL from bi-parental population studies are currently listed in Soybase (http://soybase.org, 2021). QTLs for protein and oil content have been repeatedly found and mapped on chromosomes 15 and 20. Diers et al. [17] first mapped protein and oil QTL to chromosomes 15 and 20 using a cross developed from an experimental G. max line and a G. soja accession. Chung et al. [13], using 76 recombinant inbred lines (RILs) derived from a high protein by high yield cross, determined that the same chromosome 20 QTL detected in other populations was also segregating in their population. Wang et al. [18] identified and validated QTLs associated with seed yield, protein and oil content in two different RIL populations, finding QTLs for protein and oil content on chromosomes 2, 3, 4, 9, 11, 17 and 20. Bandillo et al. [19] conducted genome-wide association studies (GWAS) using 12,000 accessions from the USDA germplasm collection to detect seed protein and oil QTL. This study identified significant single nucleotide polymorphisms (SNPs) associated with both seed protein and oil content on both chromosomes 15 and 20. The identified chromosome 20 region coincided with the location reported in previous studies and was further narrowed. Warrington et al. [20] identified seed protein QTL on chromosomes 14, 15, 17 and 20; the QTL on chromosome 20 explained 55% of the phenotypic variation in their population. Phansak et al. [1] used selective genotyping-based QTL analysis to survey 48 populations for both previously known and unknown protein QTL. Lee et al. [21] conducted a genome-wide association study utilizing data from five environments for over six hundred accessions from multiple maturity groups. They identified QTL for seed protein, oil and amino acid content; significant SNPs for seed protein and oil content were found on chromosomes 15, 19 and 20. Zhang et al. [22] used linkage analysis and GWAS on over 300 RILs and 200 soybean accessions to identify 15 QTLs affecting protein and/or oil content. QTL regions for seed protein and oil content have now been very well researched, especially the meta-QTL on chromosomes 15 and 20 [23]. As these authors note, these loci may be used for improvement, but for just one trait at a time, since there is such a strong negative correlation between protein and oil content associated with these loci.

Recently, progress has been made on discovery of the genes that may be responsible for protein content on chromosomes 15 and 20. Patil et al. [24] identified 52 putative SWEET genes, that play various roles within plants. For example, Glyma.15G049200 is a sucrose efflux transporter gene and is highly expressed in soybean seeds and leaves [25]. Using a combination of RNA-seq data and qRT-PCR comparing two soybean accessions differing in oil content they showed that the Glyma.15G049200 gene was positively correlated with seed oil content and has been selected to increase the oil content in soybean breeding. Zhang et al. [26] using association analysis identified an insertion/deletion (—/CC) within Glyma.15G049200, which was associated with both protein and oil content. The CC deletion caused the deletion of 19 amino acids, a premature stop codon and six amino acid changes from the C-terminus of Glyma.15G049200 in Williams82. Three amino acids were also changed in the cytoplasmic C-terminal tail that appear to be highly conserved in legumes. It is undetermined though how these changes affect the activity of Glyma.15G049200 and how this impacts oil and protein content [26]. They observed that the accessions in this study with the presence of CC (CC+) contained 9.5% less oil and 4.6% more protein than those with the CC deletion (CC-). Their findings suggest that the two alleles CC+ and CC- may be used for developing high protein lines or for oil improvement, respectively [26]. Fliege et al. [27] were able to fine map the cqSeed protein-003 QTL on chromosome 20 that has the greatest additive effect for seed protein content. Through fine mapping and positional cloning, they identified an insertion/deletion polymorphism in Glyma.20G085100 that controlled seed protein. Goettel et al. [28] identified a transposable element insertion within Glyma.20G085100 that causes significant increases in seed oil content and weight with a decrease in protein content.

Identification of unique QTLs responsible for high protein content is a critical step in the development of high protein content soybean cultivars. The USDA soybean germplasm collection has been reported to have phenotypic variation for protein content ranging from 34.1% to 56.8% [29]. The goal of this study was to identify QTL associated with seed protein and oil content. For this purpose, a population was constructed by crossing a high protein content line with a low protein content line, both from the USDA soybean germplasm collection. This segregating population allowed the identification of large effect QTLs for protein and oil content on chromosome 2 and chromosome 15.

Materials and methods

Genetic material

The population used in this study consisted of 96 recombinant inbred lines (RILs) from the cross PI 507429 x PI 399084. Plant Introduction (PI) 507429 (Tousan 89) was chosen for its low protein content as recorded in the USDA Germplasm Resources Information Network (GRIN). PI 399084 (Chungchong Namdo) was chosen for its higher than average protein content according to the GRIN database. These two plant introduction lines were crossed in the summer of 2010 and advanced annually in the field in Blacksburg, Virginia (37°11ʹ53.15˝N, 80°34ʹ24.77˝W), via the single seed descent method. This RIL population (F7-F10) was planted over four years in 2018–2021, in two replications each year. Averaged over these four years, seed protein and oil content were 39.7% and 14.2% for PI 399084 and 30.9% and 18.9% for PI 507429, respectively.

Young first or second trifoliate leaves of greenhouse-grown F7-9 plants were collected for DNA extraction. DNA from parental lines and at least 10 bulked plants from each individual RIL was isolated from lyophilized tissues using the CTAB method as described in Saghai Maroof et al. [30] with minor modifications. DNA concentration was measured with a DyNA Quanta2000 Fluorometer (Hoefer®Scientific, San Francisco, CA).

Phenotypic analysis

The seeds were analyzed for protein and oil content using near-infrared reflectance spectroscopy (NIRS) utilizing a DA 7250 NIR Analyzer (Perten Instruments, Springfield, IL). Briefly, around 42 g of seed from each line were put in a clear 2 oz. plastic cup and were analyzed for protein and oil content on a dry basis percentage and converted to a 13% moisture basis. Samples were measured 3 times and then averaged for each sample. Protein content was also independently determined by combustion on replications 1 and 2 from 2020 to confirm results obtained from NIRS analysis of those same replications/year (Eurofins Scientific Inc., Des Moines, Iowa, [AOAC 992.15, AOAC 990.03 and AOCS Ba4e-93]).
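The averaging and moisture-basis conversion described above can be sketched as follows. This is a minimal illustration, assuming the usual relation that a value on an as-is basis equals the dry-basis value multiplied by (1 - moisture fraction); the function names and the three readings are hypothetical and are not taken from the study.

```python
from statistics import mean

def dry_to_moisture_basis(pct_dry, moisture_pct=13.0):
    """Convert a dry-basis percentage to a basis with the given moisture content.

    Assumes the standard relation: as-is % = dry-basis % * (1 - moisture fraction).
    """
    return pct_dry * (100.0 - moisture_pct) / 100.0

def sample_value(readings_dry, moisture_pct=13.0):
    """Average repeated readings for one sample, then convert the mean."""
    return dry_to_moisture_basis(mean(readings_dry), moisture_pct)

# Hypothetical dry-basis protein readings (%) from three scans of one sample.
readings = [45.4, 45.7, 45.5]
protein_13 = sample_value(readings)
print(f"protein at 13% moisture: {protein_13:.2f}%")
```

Because the conversion is linear, averaging before or after converting gives the same result.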

Statistical analysis

R version 3.6.1 was used to perform single-factor ANOVA and variance component analysis using the protein and oil data from the RIL population. The estimated variance components were used to compute the broad sense heritability ($H^2$) for seed protein and oil based on the data from the 2 replications of the 2018–2021 plantings. Broad sense heritability was calculated as

$$H^2 = \frac{\sigma_g^2}{\sigma_g^2 + \dfrac{\sigma_{ge}^2}{e} + \dfrac{\sigma_{\varepsilon}^2}{re}}$$

in which $\sigma_g^2$ and $\sigma_{ge}^2$ refer to genotypic variance and genotype x environment variance, respectively, and $\sigma_{\varepsilon}^2$ is the residual variance. Coefficients e and r refer to the number of environments and replications within environments [31].
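As an illustration of how variance components translate into an entry-mean heritability, the sketch below assumes the commonly used form H² = σ²g / (σ²g + σ²ge/e + σ²ε/(re)). The residual variance term, the function name, and the numerical variance components are assumptions chosen for the example, not estimates from this population.

```python
def broad_sense_heritability(var_g, var_ge, var_err, n_env, n_rep):
    """Entry-mean broad sense heritability from variance components.

    var_g   : genotypic variance
    var_ge  : genotype x environment variance
    var_err : residual variance (assumed term, see lead-in)
    n_env   : number of environments (e)
    n_rep   : replications within each environment (r)
    """
    phenotypic = var_g + var_ge / n_env + var_err / (n_env * n_rep)
    return var_g / phenotypic

# Hypothetical variance components with 4 environments and 2 replications.
h2 = broad_sense_heritability(var_g=6.0, var_ge=1.2, var_err=2.4, n_env=4, n_rep=2)
print(f"H2 = {h2:.3f}")
```

Adding environments or replications shrinks the non-genetic terms in the denominator, which is why multi-year replicated trials tend to give higher entry-mean heritability.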

Correlations between protein and oil content for the two replications from each year from 2018 to 2021 and the averages of those replications for each year were calculated using R version 3.6.1.
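For readers who want to see the calculation itself, a Pearson correlation between two trait vectors can be computed as sketched below. The study used R; this Python version and its toy values are only an illustration and do not reproduce any data from the population.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical line means (%): protein rises as oil falls.
protein = [34.1, 36.5, 38.2, 40.0, 42.3]
oil = [19.2, 18.5, 17.1, 16.8, 15.0]
r = pearson_r(protein, oil)
print(f"r = {r:.2f}")
```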

Genotyping by sequencing

A total of 96 recombinant inbred lines (RILs) and two parents were sequenced using genotyping by sequencing (GBS). The parents were also subjected to high-depth resequencing (>50X) to generate a high-density panel of markers. Genomic DNA was extracted from leaf tissue using the CTAB method as described in Saghai Maroof et al. [30] with minor modifications. These DNA samples were digested and prepared as libraries for genotyping by sequencing as described in the protocol by [32]. Briefly, genomic DNA templates from 102 individuals were digested with ApeKI 6-base cutter (GCWGC) to reduce the genome complexity. Two 51-plex GBS libraries comprising 102 total DNA samples from 96 RILs, 2 parents replicated 2 times and 2 bulk samples, were prepared by ligating the digested DNA to unique barcode nucleotide adapters, followed by standard Polymerase Chain Reaction (PCR). The resulting 51-plex libraries were sequenced in two lanes of Illumina NovaSeq 6000 2 x 150bp. In addition, standard DNAseq libraries were constructed for each of the PI parents (N = 2) and sequenced using a single lane of Illumina NovaSeq 6000.

Sequencing data analysis and SNP calling

Raw sequence data were aligned to the soybean reference genome (Glycine max Wm82.a4.v1) [33] using bwa with the MEM option (version 0.7.17) [34]. Alignment files were sorted and PCR duplicates removed using samtools (version 1.2.1). Sorted bam files were simultaneously analyzed using freebayes (version 1.0) [35] with the following settings: --strict-vcf, --genotype-qualities, --pooled-continuous, -F 0.1 -C 5. This produced a single vcf file with raw SNPs, which were then filtered following the dDocent guidelines [36]. In short, using vcftools [37], variants were filtered for depth > 5, quality > Q30, and initially 50% missingness. This file was used to screen samples for high levels of missingness (all were <30%). The final SNP set was filtered to allow a maximum of 15% missing values and to exclude variants with a minor allele frequency < 0.05, and 26 samples were removed due to missingness. These SNP markers were designated as GBS followed by the assigned chromosome number and physical position (bp value) based on the Wm82.a4.v1 assembly.
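The final missingness and minor allele frequency filters can be illustrated with a small sketch. It assumes genotypes are coded as alternate-allele counts (0, 1 or 2, with None for a missing call) and reads the thresholds above as "keep sites with at most 15% missing calls and a minor allele frequency of at least 0.05". The study applied these filters with vcftools; the function names and toy genotypes here are hypothetical.

```python
def missing_rate(genotypes):
    """Fraction of samples with no genotype call (None) at one site."""
    return sum(g is None for g in genotypes) / len(genotypes)

def minor_allele_freq(genotypes):
    """Minor allele frequency from alternate-allele counts (0, 1 or 2)."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return 0.0
    alt_freq = sum(called) / (2 * len(called))
    return min(alt_freq, 1.0 - alt_freq)

def keep_snp(genotypes, max_missing=0.15, min_maf=0.05):
    """Apply both site-level filters; thresholds are interpreted as stated above."""
    return (missing_rate(genotypes) <= max_missing
            and minor_allele_freq(genotypes) >= min_maf)

# Toy sites across 10 samples.
segregating = [0, 2, 0, 2, 2, 0, 0, 2, None, 0]   # 10% missing, both alleles common
monomorphic = [0] * 10                             # no minor allele
sparse = [0, 2, None, None, 0, 2, 0, 2, 0, 2]      # 20% missing
print(keep_snp(segregating), keep_snp(monomorphic), keep_snp(sparse))
```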

Molecular marker assay

To identify chromosomal regions associated with protein and oil content, the segregating population was genotyped using the Illumina Infinium SoySNP6K BeadChip [38] in addition to the SNPs identified through GBS. SNPs in the 6k BeadChip were selected from the SoySNP50K [39]. The 6K marker data set was processed using GenomeStudio software (version 3.2.23). SNP markers that were monomorphic between parents of the RIL population and those which had more than 20% missing data were not used for linkage map construction. To relate our results to previous studies, a set of 26 Simple Sequence Repeat (SSR) markers from the known protein QTL chromosomal regions on chromosomes 2 and 15 were also mapped in the RIL population. Several GBS SNPs from the protein and oil QTL regions of this study were converted to KASP markers (GBS based SNPs designated with “K” at the end of the corresponding GBS loci). Primer sequences of KASP SNP markers are presented in the S1 Table. SSR markers were amplified by PCR with dye labeled forward primers [40] and analyzed by capillary electrophoresis using an Applied Biosystems 3130xl Genetic Analyzer unit (Carlsbad, CA, USA).

Map construction and quantitative trait locus analysis

SNP data from the GBS and SoySNP6K of the RIL population were used for linkage map construction. JoinMap 4.0 [41] with a LOD threshold of 4.0 and a maximum recombination frequency of 50% for the original grouping was employed. Marker order and their positions within each linkage group were determined using the maximum likelihood algorithm and Kosambi mapping function; those unassigned to any linkage group (LG) were excluded.
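The Kosambi mapping function mentioned above converts a recombination frequency r between two markers into a map distance, d = 25 ln((1 + 2r) / (1 - 2r)) in centimorgans. The sketch below shows the function and its inverse; the function names are illustrative and JoinMap's internal implementation is not shown here.

```python
from math import log, tanh

def kosambi_cm(r):
    """Map distance in cM for a recombination frequency 0 <= r < 0.5 (Kosambi)."""
    if not 0.0 <= r < 0.5:
        raise ValueError("recombination frequency must be in [0, 0.5)")
    return 25.0 * log((1.0 + 2.0 * r) / (1.0 - 2.0 * r))

def kosambi_rf(d_cm):
    """Inverse: recombination frequency for a map distance in cM."""
    return 0.5 * tanh(2.0 * d_cm / 100.0)

d = kosambi_cm(0.10)
print(f"r = 0.10 corresponds to about {d:.2f} cM")
print(f"{d:.2f} cM corresponds to r = {kosambi_rf(d):.2f}")
```

For small r the distance is close to 100r, and it grows without bound as r approaches 0.5, which is consistent with treating 50% recombination as the limit for linkage.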

MapQTL 5.0 software [42] was used for the identification of oil and protein QTL. Each round of QTL analysis was performed in two stages: Interval Mapping (IM) to detect critical chromosomal regions followed by more detailed QTL mapping provided by the enhanced power of multiple-QTL (MQM) method with the walking speed set to 1.0 cM.

In order to identify levels of LOD significance thresholds on both genome-wide and individual chromosome basis, permutation tests were conducted over 1000 iterations. A genome-wide LOD threshold was calculated at 3.5 and was used as a base line for QTL justification.
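The idea behind a permutation-based genome-wide threshold can be sketched as follows: phenotypes are shuffled relative to genotypes, the maximum LOD across all markers is recorded for each shuffle, and an upper percentile of those maxima is used as the threshold. This sketch substitutes simple single-marker regression for the interval mapping performed by MapQTL, uses simulated data, and assumes a 5% genome-wide significance level; all names and values are hypothetical.

```python
import random
from math import log10

def marker_lod(geno, pheno):
    """LOD for one marker: two genotype-class means versus a single overall mean."""
    n = len(pheno)
    grand = sum(pheno) / n
    rss0 = sum((y - grand) ** 2 for y in pheno)
    rss1 = 0.0
    for allele in (0, 1):
        group = [y for g, y in zip(geno, pheno) if g == allele]
        if group:
            m = sum(group) / len(group)
            rss1 += sum((y - m) ** 2 for y in group)
    if rss0 == 0.0:
        return 0.0
    if rss1 == 0.0:
        return float("inf")
    return (n / 2.0) * log10(rss0 / rss1)

def permutation_threshold(markers, pheno, n_perm=1000, alpha=0.05, seed=0):
    """Upper (1 - alpha) quantile of the genome-wide maximum LOD under permutation."""
    rng = random.Random(seed)
    shuffled = list(pheno)
    max_lods = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # breaks any genotype-phenotype association
        max_lods.append(max(marker_lod(g, shuffled) for g in markers))
    max_lods.sort()
    idx = max(0, min(n_perm - 1, round((1.0 - alpha) * n_perm) - 1))
    return max_lods[idx]

# Simulated RIL data: 96 lines, 10 unlinked markers coded 0/1, QTL at marker 0.
sim = random.Random(1)
n_lines, n_markers = 96, 10
markers = [[sim.randint(0, 1) for _ in range(n_lines)] for _ in range(n_markers)]
pheno = [35.0 + 4.0 * markers[0][i] + sim.gauss(0.0, 1.0) for i in range(n_lines)]

threshold = permutation_threshold(markers, pheno)
qtl_lod = marker_lod(markers[0], pheno)
print(f"threshold LOD = {threshold:.2f}, LOD at simulated QTL = {qtl_lod:.2f}")
```

Taking the maximum over all markers in each permutation is what makes the threshold genome-wide rather than per-marker.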

CC-/CC+ sequencing

Using the genomic sequence of Williams 82 Glyma.15G049200, a pair of primers was designed to span the CC indel in exon 6 of the gene, which is associated with seed protein/oil content as reported by Zhang et al. [26]. The primer design software within the DNASTAR Lasergene 17 package was used to select appropriate oligo pairs. The designed primer sequences TGGGGCTAGTTCAGATGGT (Forward) and AATTGATACTCCATTGAGGTAGT (Reverse) were used for PCR amplification and Sanger sequencing. PCR product sizes were 340 bp with the CC deletion and 342 bp with the insertion. Sequencing was conducted by Eton Biosciences Inc. (Research Triangle Park, NC). The parental lines, PI 507429 and PI 399084, were found to be polymorphic for this indel; therefore, all RILs were sequence-genotyped for it.

Results

Phenotypic and statistical analyses

The population was planted over four years (2018–2021), with two replications each year. The data from each replication and the mean of those two replications resulted in twelve data sets using NIRS for protein and oil assay. In addition to NIRS, seeds from both replications in 2020 were assayed via protein combustion by Eurofins (Des Moines, IA); these two replications and their mean resulted in three data sets. QTL mapping was conducted using a total of 15 protein data sets: 12 based on the NIRS assay and three based on the protein combustion assay. The protein content values across all 15 data sets ranged from a low of 26.84% in 2021 Rep 2 to a high of 46.59% in protein combustion 2020 Rep 1 (Table 1). The skewness for all protein content data sets was negative, with distributions ranging from moderately skewed to approximately symmetric (Table 1). Oil content across the 12 data sets was quite variable within the population, from a low of 11.09% in the second replication of 2019 to a high of 20.74% in the first replication of 2020 (Table 2). The oil content distributions were approximately symmetric for all data sets.

Table 1. Descriptive statistics of the 15 protein content data sets used to identify QTL on chromosomes 2 and 15 in the RIL population of PI 507429 x PI 399084.

https://doi.org/10.1371/journal.pone.0286329.t001

Table 2. Descriptive statistics of the 12 oil content data sets for PI 507429 x PI 399084 population.

https://doi.org/10.1371/journal.pone.0286329.t002

The broad sense heritability estimates for protein and oil traits in this population were 0.90 and 0.88, respectively, which are very similar to those reported earlier by Hyten et al. [43], Eskandari et al. [11], Mao et al. [44], Wang et al. [18], and Zhang et al. [45]. Pearson’s correlations between protein and oil content for both replications in each of the years 2018–2021, and for the average of those replications in each year, were strongly negative (Table 3). This very strong negative correlation, up to -0.87, is expected, since the relationship between these two important soybean seed traits has been widely studied in the search for a positive correlation [46].

Table 3. Correlations between the protein and oil content traits (%) within the 2018–2021 replications of the RIL population.

https://doi.org/10.1371/journal.pone.0286329.t003

Chromosomal map construction and quantitative trait locus identification

The JoinMap 4.0 [41] software package was used for map construction. A total of 12,761 markers including 10,526 SNP loci from GBS, 2,217 SNP markers from SoySNP6k Illumina Beadchip, 26 SSR and one indel type marker were mapped on 20 linkage groups using DNA from F7 RILs (S2 Table). MapQTL 5.0 [42] was used for QTL identification, and resulted in the identification of two QTLs. A large effect QTL was identified on chromosome 2 and another on chromosome 15.

The maximum LOD scores for the major seed protein QTL on chromosome 2 using NIRS data ranged from a low of 4.59 for replication 2 in 2018 to a high of 17.48 for replication 1 of the 2020 protein combustion data (Table 4, column 3). The level of phenotypic variation of the major QTL on chromosome 2, as shown in Table 4 (column 4), ranged from a minimum of 20.1% to a maximum of 56.8%. The QTL region between markers flanking the max LOD points was variable. However, depending on the data set, the map position at the maximum LOD varied only from 279 to 281 cM except for the 2019 data where replication 2 had a position of 274.1 cM (Table 4, column 2). Markers at or closest to the max LOD points are shown in Table 4, column 5. Two markers BARCSOYSSR_02_1645 and BARCSOYSSR_02_1685 flank the maximum LOD (Table 4, column 3) regions resulting from each of the 15 data sets. The regional map for the QTL along chromosome 2 can be seen in Fig 1.
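The relationship between a LOD score and the proportion of phenotypic variation explained can be illustrated with a widely used single-QTL approximation, R² = 1 - 10^(-2 LOD / n), where n is the number of phenotyped lines. Whether the values in Table 4 were computed exactly this way is an assumption, as is the use of n = 96 (the full population size, ignoring any missing data); the function name is hypothetical.

```python
def pve_from_lod(lod, n_lines):
    """Approximate proportion of variance explained by a single QTL.

    Uses R^2 = 1 - 10^(-2 * LOD / n), a common approximation for
    single-QTL models; multiple-QTL models may report different values.
    """
    return 1.0 - 10.0 ** (-2.0 * lod / n_lines)

# Highest chromosome 2 protein LOD reported above, with an assumed n of 96.
pve = pve_from_lod(17.48, 96)
print(f"approximate variation explained: {100 * pve:.1f}%")
```

With these inputs the approximation lands close to the maximum explained variation reported for the chromosome 2 protein QTL; small differences for other data sets would be expected if the number of lines with data varied.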

Fig 1. A regional map of chromosome 2 of the PI 507429 x PI 399084 RIL population showing the maximum LOD region of the major seed protein and oil QTL detected between 265.4–283.1 cM.

The set LOD threshold was 3.5. (Markers ending in K are GBS SNP markers converted to a KASP marker).

https://doi.org/10.1371/journal.pone.0286329.g001

Table 4. Quantitative trait locus detected on chromosome 2 using composite interval mapping in the RIL population PI 507429 x PI 399084 using NIRS data sets from 2018–2021 and protein combustion using seeds from the 2020 harvest.

https://doi.org/10.1371/journal.pone.0286329.t004

The seed protein QTL region on chromosome 2 was also associated with the QTL for seed oil content with a LOD score varying from 4.1 to 11.2 (Table 4, column 3). The map positions at the maximum LOD ranged from 275.6 cM for the 2018 data set of seed oil content to 280.1 cM for the 2021 data (Table 4, column 2). These maximum LOD map positions for seed oil content fall within the seed protein content maximum LOD map locations (Table 4 column 2). The level of explained phenotypic variation for oil content had a minimum of 17.9% based on the 2018 data set and a maximum of 43% in the 2019 data set (Table 4, column 4). The regional map for the chromosome 2 seed protein and oil QTL region can be seen in Fig 1.

A QTL for seed protein content was also identified on chromosome 15. The minimum and maximum explained phenotypic variation for these data sets (2018–2021) were 16% and 27.2%, respectively (Table 5, column 4). Depending on the data set, map positions at max LOD varied from 37.5 to 39.9 cM (Table 5, column 2). The chromosomal region between markers flanking the max LOD map positions extended from 32.4 cM (Gm15_3468596_G_T) to 43.5 cM (BARCSOYSSR_15_0200) (Table 5, last column, Fig 2). This protein QTL region also appears to be associated with seed oil content. The genetic map positions corresponding to maximum LOD scores for the seed oil content trait were identified within the range of 32.4 to 49.2 cM (Table 5, last column) designated by SNPs Gm15_3468596_G_T and Gm15_4522374_C_A (Fig 2). The minimum and maximum percent explained phenotypic variation of the seed oil data sets were 21% and 41%, respectively.

Fig 2. A regional map of chromosome 15 of the PI 507429 x PI 399084 RIL population showing the maximum LOD region of the seed protein and oil QTL detected between 32.4 to 49.2 cM.

SSR marker Sat_289, a marker used in previous studies, was monomorphic in our population. Therefore, its expected location is shown in parentheses. The set LOD threshold was 3.5. (Those markers ending in K are GBS SNP markers converted to a KASP marker).

https://doi.org/10.1371/journal.pone.0286329.g002

Table 5. Quantitative trait locus detected on chromosome 15 using composite interval mapping in the RIL population, PI 507429 x PI 399084 using NIRS data sets from 2018–2021 and protein combustion using seeds from the 2020 harvest.

https://doi.org/10.1371/journal.pone.0286329.t005

Effect of CC indel on protein/oil content

Zhang et al. [26] identified a CC deletion (CC-) in the coding sequence of Glyma.15G049200 (GmSWEET39, a sugar transporter), that caused a reading frameshift. This gene located on soybean chromosome 15 was associated with both oil and protein content in the accessions they studied. Since the parents of our population were polymorphic for this indel (PI 507429 CC-; PI 399084 CC+), it was incorporated into the chromosome 15 genetic map and used for protein/oil content QTL mapping in this study. The genetic map position of the CC indel locus as well as its orientation relative to surrounding SNP loci were in good agreement with the reference physical map of Williams 82. Our mapping confirmed a close association of the CC indel marker with the detected protein and oil content QTLs on chromosome 15 in this study (Fig 2, Table 5 column 5).

Zhang et al. [26] proposed that the two distinct alleles of GmSWEET39 have followed two separate avenues of improvement: the CC- allele has been inadvertently selected during breeding for oil content improvement and the CC+ allele for protein content improvement. In their comparison of wild, landrace and cultivar accessions, landrace lines carrying CC- had on average 2.94% more oil and 1.15% less protein. The two parents of this population, PI 507429 and PI 399084, are both landrace lines.

Discussion

Soybean protein and oil content traits are quantitatively inherited. In the current study, the RIL population of PI 507429 x PI 399084 was evaluated in two replications in each of the planting years 2018 to 2021 and seed protein and oil content data were collected. Quantitative trait locus mapping was used and large effect QTLs were detected on chromosomes 2 and 15. The large effect QTL on chromosome 2 explained a high level of phenotypic variation in each replication/year for seed protein up to 56.8% and oil content up to 43%. Similarly, the QTL on chromosome 15 accounted for up to 27.2% of the variation for protein content and up to 41% for oil content.

Protein and oil content QTL have previously been reported on all twenty soybean chromosomes (Soybase.org). The QTL on chromosomes 15 and 20 are frequently identified as having significant effects on protein content. The QTL on chromosome 15 identified in this study continued this trend, explaining between 16% and 27.2% of the phenotypic variation for seed protein content and between 21% and 41% for oil content. The QTL detected on chromosome 15 for both traits mapped to a region between flanking markers Gm15_3468596_G_T and Gm15_4522374_C_A (Table 5, Fig 2). This is the same region in which other reports have identified protein and oil QTLs on chromosome 15 [20, 26, 47–51]. The present study also confirmed a close association between the CC indel marker (Gm15g49200_CC), developed from the GmSWEET39 gene sequence and earlier identified by Zhang et al. [26], and the detected protein and oil content QTLs on chromosome 15 (Table 5, Fig 2).

The maximum LOD QTL region identified for the protein and oil traits on chromosome 2 was between 265.4 and 283.1 cM including the Satt459 SSR locus (Fig 1). Hyten et al. [43] were the first to report a protein QTL on chromosome 2, which was associated with the Satt459 marker. The same region was later reported by Qi et al. [52] to contain an oil QTL. Wang et al. [18] also mapped a protein QTL between SSR markers Satt274 and Sat_289, which encompasses the Satt459 region (Fig 1). The seed oil QTL was also mapped with markers Satt274 and Satt459 on chromosome 2 by others [52–54]. In these reports the physical position of the protein and oil QTL falls into an interval of 45.3 to 47.0 Mbp (Glyma 2.0) between SSR loci Satt274 and Sat_289, which almost coincides with the starting point of the QTL interval identified in our study but extends about 1.3 Mbp further down to the Sat_289 region. Our results confirmed the seed protein and oil QTL location on chromosome 2. In this study, we were able to narrow down the protein and oil QTL region on chromosome 2 from about 1.7 to 0.8 Mbp. This calculation is based on a comparison of the physical distance between the previously reported QTL markers Satt274 and Sat_289 (45,267,222 to 47,042,650 bp) with that between the markers identified in this study, BARCSOYSSR_02_1645 and BARCSOYSSR_02_1685 (44,939,870 to 45,728,856 bp). This was made possible by incorporating numerous SNP markers for genetic map construction. Also, the max LOD scores and the corresponding levels of explained phenotypic variation (Table 4) allowed us to claim high significance status for the protein and oil QTL on chromosome 2 identified in this study.
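The interval sizes quoted above follow directly from the marker coordinates given in the text, as the short calculation below shows. Only the helper function name is introduced here; the base-pair positions are those stated in the paragraph.

```python
def interval_mbp(start_bp, end_bp):
    """Length of a physical interval in megabase pairs."""
    return (end_bp - start_bp) / 1_000_000

# Previously reported interval: Satt274 to Sat_289.
previous = interval_mbp(45_267_222, 47_042_650)
# Interval from this study: BARCSOYSSR_02_1645 to BARCSOYSSR_02_1685.
current = interval_mbp(44_939_870, 45_728_856)

print(f"previous interval: {previous:.2f} Mbp")
print(f"current interval:  {current:.2f} Mbp")
print(f"distance from current end to Sat_289: {interval_mbp(45_728_856, 47_042_650):.2f} Mbp")
```

The last line corresponds to the roughly 1.3 Mbp by which the earlier interval extends beyond the one identified here.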

Previous studies have identified minor effect seed protein and oil QTLs in other regions of the chromosome 2 as in the report by Gillenwater et al. [55] who identified four minor effect QTL: one for protein, one for oil and two for both seed protein and oil content each explaining less than 10% of the phenotypic variation. Kabelka et al. [56] identified a QTL on LG D1b (chromosome 2) that explained 14% of the phenotypic variation for protein content. Chen et al. [49] identified a minor QTL on LG D1b explaining 5.16% for protein located near Sat_135 and Satt537. Mao et al. [44] observed a QTL on chromosome 2 that explained 29% of the phenotypic variation for protein content. This group also observed two other QTL on chromosome 2 but all had minor effect. Qi et al. [57] also observed this same QTL that Mao et al. [44] identified but in their study, it explained 10.92% of the phenotypic variation and in only one environment. Wang et al. [18] identified a protein content QTL on chromosome 2 that explained between 12.3% and 16.4% of the phenotypic variation. The present study detected a major effect QTL on chromosome 2, explaining up to 56.8% and 43% of the variation for protein and oil content, respectively.

Variability from year to year in seed protein and oil content in this population may partially be attributed to the environmental conditions during seed filling. The RIL population described herein was grown only in one location but was observed over four years with 2 replications each and the temperature and water availability conditions varied each year. Previous studies have observed mixed results of environmental factors such as temperature and water availability [15, 58, 59]. In our study, in years where the temperature was above average during seed fill and water was limited such as in 2019, we noticed a reduction in protein content whereas, in 2018, where the opposite was observed, the protein content was noticeably higher for some lines.

We observed a high heritability of protein and oil traits in this population that was consistent with the values reported by others [11, 18, 43, 45, 60]. The high heritability of these two traits would indicate that a considerable amount of the variation within the population is genetic. We also observed a negative correlation between seed protein and oil content traits for the years 2018–2021 for both replications and the average of those replications. This negative correlation between protein and oil content is what would be expected since these traits are known to be negatively correlated and reported in other mapping populations [13, 18, 43, 50, 61].

In this study, the two QTL identified on chromosomes 2 and 15 appear to be strongly associated with both seed protein and oil contents. Earlier studies such as Pathan et al. [61] identified two QTL for protein and oil on chromosomes 5 and 6. Hwang et al. [62] identified three QTL regions marked by seven SNP loci on chromosomes 8, 9 and 20 associated with both seed protein and oil content. Bandillo et al. [19] identified multiple SNPs on chromosomes 15 and 20 through GWAS that were associated with both oil and protein contents. Seo et al. [63] identified four QTL for both traits in a selected breeding population. Zhu et al. [64] detected QTL controlling both seed protein and oil content on chromosomes 8, 15 and 20.

Considering the high broad-sense heritability found within this population, selection for seed protein and oil content appears feasible. This population contains lines with protein content values higher than that of PI 399084 and other lines with oil content values higher than that of PI 507429. These lines and the molecular markers identified in this study may be useful in a breeding program when selecting for increased seed protein or oil content. Fifteen of the SNPs mapping to chromosomes 2 and 15 were converted to KASP markers, facilitating their use in breeding programs.
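For readers less familiar with the term, broad-sense heritability in a single-location, multi-year trial is often expressed on an entry-mean basis as the genotypic variance divided by the phenotypic variance of line means. The sketch below illustrates that common formulation under stated assumptions: it may not match the exact model used in this study, the function name is hypothetical, and the variance components are invented for illustration. Only the design (four years, two replications) comes from the text.

```python
def entry_mean_heritability(var_g, var_gy, var_e, n_years, n_reps):
    """Broad-sense heritability on an entry-mean basis.

    H^2 = var_g / (var_g + var_gy / y + var_e / (r * y))

    var_g  : genotypic variance among lines
    var_gy : genotype-by-year interaction variance
    var_e  : residual (error) variance
    """
    if n_years < 1 or n_reps < 1:
        raise ValueError("n_years and n_reps must be at least 1")
    phenotypic = var_g + var_gy / n_years + var_e / (n_reps * n_years)
    if phenotypic <= 0:
        raise ValueError("phenotypic variance must be positive")
    return var_g / phenotypic


# Hypothetical variance components; NOT estimates from this study.
# The design values (4 years, 2 replications) follow the text.
h2 = entry_mean_heritability(var_g=4.0, var_gy=1.0, var_e=2.0,
                             n_years=4, n_reps=2)
print(f"H^2 = {h2:.2f}")
```

The formula makes the practical point explicit: adding years or replications shrinks the interaction and error terms in the denominator, so line means from a multi-year trial reflect genotype more closely than single-plot values do.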

Conclusion

In this study, using a RIL population, we constructed a high-resolution map, analyzed protein and oil content data from four years of field testing and identified large-effect QTLs on chromosomes 2 and 15 for these traits. Protein content QTL mapping results based on the NIRS assay were confirmed using protein combustion data. The protein and oil QTLs identified in this study were compared to those previously detected on these same chromosomes. The genetic materials of this study, derived from two plant introductions with widely differing protein and oil contents, resulted in the identification of a QTL on chromosome 2 accounting for up to 56.8% of the variation for protein and 43% for oil content, effects that are larger than those reported in previous studies. Furthermore, the QTL region was narrowed down from 1.7 to 0.8 Mbp. The QTL on chromosome 15 was identified in the same region as previously verified QTLs on this chromosome; it accounted for up to 27.2% of the variation for protein and up to 41% of the variation for oil content.

Supporting information

S1 Table. Primer sequences of GBS SNP markers converted to KASP.

https://doi.org/10.1371/journal.pone.0286329.s001

(DOCX)

S2 Table. The type and number of markers per chromosome used for QTL mapping in the RIL population of PI 507429 x PI 399084.

https://doi.org/10.1371/journal.pone.0286329.s002

(DOCX)

S3 Table. 2018–2021 Seed protein and oil data.

https://doi.org/10.1371/journal.pone.0286329.s003

(XLSX)

S4 Table. Genotypic data for chromosomes 2 and 15.

https://doi.org/10.1371/journal.pone.0286329.s004

(XLSX)

References

  1. Phansak P, Soonsuwon W, Hyten DL, Song Q, Cregan PB, Graef GL, et al. Multi-population selective genotyping to identify soybean [Glycine max (L.) Merr.] seed protein and oil QTLs. G3 (Bethesda). 2016;6(6): 1635–48. Epub 2016/05/14. pmid:27172185
  2. Patil G, Mian R, Vuong T, Pantalone V, Song Q, Chen P, et al. Molecular mapping and genomics of soybean seed protein: a review and perspective for the future. Theor Appl Genet. 2017;130(10): 1975–91. pmid:28801731
  3. Brzostowski LF, Diers BW. Agronomic evaluation of a high protein allele from PI407788A on chromosome 15 across two soybean backgrounds. Crop Sci. 2017;57(6): 2972–8.
  4. Cromwell GL. Soybean meal: An exceptional protein source. Ankeny, IA: Soybean Meal InfoCenter. 2017. Available from: https://www.soymeal.org/soy-meal-articles/soybean-meal-an-exceptional-protein-source/
  5. Patil G, Vuong TD, Kale S, Valliyodan B, Deshmukh R, Zhu C, et al. Dissecting genomic hotspots underlying seed protein, oil, and sucrose content in an interspecific mapping population of soybean using high-density linkage mapping. Plant Biotechnol J. 2018;16(11): 1939–53. pmid:29618164
  6. Wehrmann VK, Fehr WR, Cianzio SR, Cavins JF. Transfer of high seed protein to high-yielding soybean cultivars. Crop Sci. 1987;27: 927–31.
  7. Wilcox JR. Increasing seed protein in soybean with eight cycles of recurrent selection. Crop Sci. 1998;38: 1536–40.
  8. Cober ER, Voldeng HD. Developing high-protein, high-yield soybean populations and lines. Crop Sci. 2000;40(1): 39–42.
  9. Leffel RC, Rhodes WK. Agronomic performance and economic value of high-seed-protein soybean. J Prod Agric. 1993;6(3): 365–8.
  10. Hartwig EE, Kuo TM, Kenty MM. Seed protein and its relationship to soluble sugars in soybean. Crop Sci. 1997;37(3): 770–3.
  11. Eskandari M, Cober ER, Rajcan I. Genetic control of soybean seed oil: II. QTL and genes that increase oil concentration without decreasing protein or with increased seed yield. Theor Appl Genet. 2013;126(6): 1677–87. Epub 2013/03/29. pmid:23536049
  12. Wilcox JR, Cavins JF. Backcrossing high seed protein to a soybean cultivar. Crop Sci. 1995;35: 1036–41.
  13. Chung J, Babka HL, Graef GL, Staswick PE, Lee DJ, Cregan PB, Shoemaker RC, Specht JE. The seed protein, oil, and yield QTL on soybean linkage group I. Crop Sci. 2003;43: 1053–67.
  14. Hymowitz T, Collins FI, Panczner J, Walker WM. Relationship between the content of oil, protein, and sugar in soybean seed. Agron J. 1972;64: 613–5.
  15. Piper EL, Boote KJ. Temperature and cultivar effects on soybean seed oil and protein concentrations. J Am Oil Chem Soc. 1999;76(10): 1233–41.
  16. Clemente TE, Cahoon EB. Soybean oil: genetic approaches for modification of functionality and total content. Plant Physiol. 2009;151(3): 1030–40. pmid:19783644
  17. Diers BW, Keim P, Fehr WR, Shoemaker RC. RFLP analysis of soybean seed protein and oil content. Theor Appl Genet. 1992;83(5): 608–12. Epub 1992/03/01. pmid:24202678
  18. Wang X, Jiang GL, Green M, Scott RA, Song Q, Hyten DL, et al. Identification and validation of quantitative trait loci for seed yield, oil and protein contents in two recombinant inbred line populations of soybean. Mol Genet Genomics. 2014;289(5): 935–49. pmid:24861102
  19. Bandillo N, Jarquin D, Song Q, Nelson R, Cregan P, Specht J, et al. A population structure and genome-wide association analysis on the USDA soybean germplasm collection. Plant Genome. 2015;8(3). pmid:33228276
  20. Warrington CV, Abdel-Haleem H, Hyten DL, Cregan PB, Orf JH, Killam AS, et al. QTL for seed protein and amino acids in the Benning x Danbaekkong soybean population. Theor Appl Genet. 2015;128(5): 839–50.
  21. Lee S, Van K, Sung M, Nelson R, LaMantia J, McHale LK, et al. Genome-wide association study of seed protein, oil and amino acid contents in soybean from maturity groups I to IV. Theor Appl Genet. 2019;132(6): 1639–59. pmid:30806741
  22. Zhang T, Wu T, Wang L, Jiang B, Zhen C, Yuan S, et al. A combined linkage and GWAS analysis identifies QTLs linked to soybean seed protein and oil content. Int J Mol Sci. 2019;20(23). pmid:31775326
  23. Kumar V, Vats S, Kumawat S, Bisht A, Bhatt V, Shivaraj SM, et al. Omics advances and integrative approaches for the simultaneous improvement of seed oil and protein content in soybean (Glycine max L.). CRC Crit Rev Plant Sci. 2021;40(5): 398–421.
  24. Patil G, Valliyodan B, Deshmukh R, Prince S, Nicander B, Zhao M, et al. Soybean (Glycine max) SWEET gene family: insights through comparative genomics, transcriptome profiling and whole genome re-sequence analysis. BMC Genom. 2015;16: 520. pmid:26162601
  25. Miao L, Yang S, Zhang K, He J, Wu C, Ren Y, et al. Natural variation and selection in GmSWEET39 affect soybean seed oil content. New Phytol. 2020;225(4): 1651–66. pmid:31596499
  26. Zhang H, Goettel W, Song Q, Jiang H, Hu Z, Wang ML, et al. Selection of GmSWEET39 for oil and protein improvement in soybean. PLoS Genet. 2020;16(11): e1009114. pmid:33175845
  27. Fliege CE, Ward RA, Vogel P, Nguyen H, Quach T, Guo M, et al. Fine mapping and cloning of the major seed protein quantitative trait loci on soybean chromosome 20. Plant J. 2022;110(1): 114–28. pmid:34978122
  28. Goettel W, Zhang H, Li Y, Qiao Z, Jiang H, Hou D, et al. POWR1 is a domestication gene pleiotropically regulating seed quality and yield in soybean. Nat Commun. 2022;13(1): 3051. pmid:35650185
  29. Wilson RF. Seed Composition. In: Boerma HR, Specht JE, editors. Soybeans: Improvement, Production, and Uses, 3rd Edition. Madison: ASA, CSSA and SSSA; 2004. pp. 621–77.
  30. Saghai-Maroof MA, Soliman KM, Jorgensen RA, Allard RW. Ribosomal DNA spacer-length polymorphisms in barley: mendelian inheritance, chromosomal location, and population dynamics. Proc Natl Acad Sci U S A. 1984;81(24): 8014–8. pmid:6096873
  31. Nyquist WE. Estimation of heritability and prediction of selection response in plant-populations. CRC Crit Rev Plant Sci. 1991;10(3): 235–322.
  32. Elshire RJ, Glaubitz JC, Sun Q, Poland JA, Kawamoto K, Buckler ES, et al. A robust, simple genotyping-by-sequencing (GBS) approach for high diversity species. PLoS One. 2011;6(5): e19379. pmid:21573248
  33. Valliyodan B, Cannon SB, Bayer PE, Shu S, Brown AV, Ren L, et al. Construction and comparison of three reference-quality genome assemblies for soybean. Plant J. 2019;100(5): 1066–82. Epub 2019/08/23. pmid:31433882
  34. Li H, Durbin R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics. 2009;25(14): 1754–60. Epub 2009/05/20. pmid:19451168
  35. Garrison E, Marth GT. Haplotype-based variant detection from short-read sequencing (2012). arXiv:1207.3907.
  36. Puritz JB, Hollenbeck CM, Gold JR. dDocent: a RADseq, variant-calling pipeline designed for population genomics of non-model organisms. PeerJ. 2014;2: e431. pmid:24949246
  37. Danecek P, Auton A, Abecasis G, Albers CA, Banks E, DePristo MA, et al. The variant call format and VCFtools. Bioinformatics. 2011;27(15): 2156–8. Epub 2011/06/10. pmid:21653522
  38. Song Q, Yan L, Quigley C, Fickus E, Wei H, Chen L, et al. Soybean BARCSoySNP6K: An assay for soybean genetics and breeding research. Plant J. 2020;104(3): 800–11. Epub 2020/08/11. pmid:32772442
  39. Song Q, Hyten DL, Jia G, Quigley CV, Fickus EW, Nelson RL, et al. Development and evaluation of SoySNP50K, a high-density genotyping array for soybean. PLoS One. 2013;8(1): e54985. Epub 2013/02/02. pmid:23372807
  40. Diwan N, Cregan PB. Automated sizing of fluorescent-labeled simple sequence repeat (SSR) markers to assay genetic variation in soybean. Theor Appl Genet. 1997;95: 723–33.
  41. van Ooijen JW. JoinMap 4.0: Software for the Calculation of Genetic Linkage Maps in Experimental Populations. Wageningen, Netherlands: Kyasma B.V.; 2006.
  42. van Ooijen JW. MapQTL 5: Software for the mapping of quantitative trait loci in experimental populations. Wageningen, Netherlands: Kyasma B.V.; 2004.
  43. Hyten DL, Pantalone VR, Sams CE, Saxton AM, Landau-Ellis D, Stefaniak TR, et al. Seed quality QTL in a prominent soybean population. Theor Appl Genet. 2004;109(3): 552–61. pmid:15221142
  44. Mao T, Jiang Z, Han Y, Teng W, Zhao X, Li W, et al. Identification of quantitative trait loci underlying seed protein and oil contents of soybean across multi-genetic backgrounds and environments. Plant Breed. 2013;132(6): 630–41.
  45. Zhang YH, Liu MF, He JB, Wang YF, Xing GN, Li Y, et al. Marker-assisted breeding for transgressive seed protein content in soybean [Glycine max (L.) Merr]. Theor Appl Genet. 2015;128(6): 1061–72. pmid:25754423
  46. Grassini P, Cafaro La Menza N, Rattalino Edreira JI, Monzón JP, Tenorio FA, Specht JE. Soybean. In: Sadras VO, Calderini DF, editors. Crop Physiology Case Histories for Major Crops: Academic Press; 2021. pp. 282–319.
  47. Kim M, Schultz S, Nelson RL, Diers BW. Identification and fine mapping of a soybean seed protein QTL from PI 407788A on chromosome 15. Crop Sci. 2016;56(1): 219–25.
  48. Lee SH, Bailey MA, Mian MA, Carter TE Jr., Shipe ER, Ashley DA, et al. RFLP loci associated with soybean seed protein and oil content across populations and locations. Theor Appl Genet. 1996;93(5–6): 649–57. Epub 1996/10/01. pmid:24162390
  49. Chen QS, Zhang ZC, Liu CY, Xin DW, Qiu HM, Shan DP, et al. QTL analysis of major agronomic traits in soybean. Agr Sci China. 2007;6(4): 399–405.
  50. Brummer EC, Graef GL, Orf J, Wilcox JR, Shoemaker RC. Mapping QTL for seed protein and oil content in eight soybean populations. Crop Sci. 1997;37(2): 370–8.
  51. Tajuddin T, Watanabe S, Yamanaka N, Harada K. Analysis of quantitative trait loci for protein and lipid contents in soybean seeds using recombinant inbred lines. Breed Sci. 2003;53(2): 133–40.
  52. Shi A, Chen P, Zhang B, Hou A. Genetic diversity and association analysis of protein and oil content in food-grade soybeans from Asia and the United States. Plant Breed. 2010;129(3): 250–6.
  53. Panthee DR, Pantalone VR, West DR, Saxton AM, Sams CE. Quantitative trait loci for seed protein and oil concentration, and seed size in soybean. Crop Sci. 2005;45(5): 2015–22.
  54. Qi ZM, Wu Q, Han X, Sun Y, Du XY, Liu CY, et al. Soybean oil content QTL mapping and integrating with meta-analysis method for mining genes. Euphytica. 2011;179(3): 499–514.
  55. Gillenwater JH, McNeece BT, Taliercio E, Mian MAR. QTL mapping of seed protein and oil traits in two recombinant inbred line soybean populations. J Crop Improv. 2022;36(4): 539–54.
  56. Kabelka EA, Diers BW, Fehr WR, LeRoy AR, Baianu IC, You T, et al. Putative alleles for increased yield from soybean plant introductions. Crop Sci. 2004;44(3): 784–91.
  57. Qi ZM, Hou M, Han X, Lu CY, Jiang HW, Xin DW, et al. Identification of quantitative trait loci (QTLs) for seed protein concentration in soybean and analysis for additive effects and epistatic effects of QTLs under multiple environments. Plant Breed. 2014;133(4): 499–507.
  58. Wolf RB, Cavins JF, Kleiman R, Black LT. Effect of temperature on soybean seed constituents: oil, protein, moisture, fatty-acids, amino-acids and sugars. J Am Oil Chem Soc. 1982;59(5): 230–2.
  59. Dornbos DL, Mullen RE. Soybean seed protein and oil contents and fatty-acid composition adjustments by drought and temperature. J Am Oil Chem Soc. 1992;69(3): 228–31.
  60. Mao TT, Jiang ZF, Han YP, Teng WL, Zhao X, Li WB. Identification of quantitative trait loci underlying seed protein and oil contents of soybean across multi-genetic backgrounds and environments. Plant Breed. 2013;132(6): 630–41.
  61. Pathan SM, Vuong T, Clark K, Lee JD, Shannon JG, Roberts CA, et al. Genetic mapping and confirmation of quantitative trait loci for seed protein and oil contents and seed weight in soybean. Crop Sci. 2013;53(3): 765–74.
  62. Hwang EY, Song Q, Jia G, Specht JE, Hyten DL, Costa J, et al. A genome-wide association study of seed protein and oil content in soybean. BMC Genom. 2014;15: 1. Epub 2014/01/03. pmid:24382143
  63. Seo JH, Kim KS, Ko JM, Choi MS, Kang BK, Kwon SW, et al. Quantitative trait locus analysis for soybean (Glycine max) seed protein and oil concentrations using selected breeding populations. Plant Breed. 2019;138(1): 95–104.
  64. Zhu XT, Leiser WL, Hahn V, Wurschum T. Identification of seed protein and oil related QTL in 944 RILs from a diallel of early-maturing European soybean. Crop J. 2021;9(1): 238–47.