Marco Antônio Rott de Oliveira and Wilson Higashi are employees of COODETEC, and Ivan Schuster is an employee of Dow Agrosciences. There are no patents, products in development, or marketed products to declare in relation to this study. Commercial affiliation does not alter our adherence to PLOS ONE's policies on sharing data and materials.
Mapping quantitative trait loci through the use of linkage disequilibrium (LD) in populations of unrelated individuals provides a valuable approach for dissecting the genetic basis of complex traits in soybean (
One of the most important crops for global production of vegetable protein and oil is Soybean (
SNP markers have proven to be a promising tool for crop improvement in soybean breeding programs [
From an analytical point of view, LD is best described using the haplotype-block approach [
GWAS may benefit from using haplotype information when making marker-phenotype associations [
The association panel consisted of 169 genotypes that represent the core cultivars used by Brazilian farmers from 1990 to 2010, and some of these were key progenitors in soybean breeding programs of Brazil. The cultivars were field-evaluated in four sites of southern Brazil: Cascavel (24°52'55"S 53°32'30"W), Palotina (24°21'07"S 53°45'25"W), Primavera do Leste (15°34'38"S 54°20'42"W) and Rio Verde (17°45'49"S 51°01'49"W) (Table A in
The cultivars were genotyped with 6,000 single nucleotide polymorphisms (SNP) using the Illumina BARCSoySNP6K BeadChip, which corresponds to a subset of SNPs from the SoySNP50K BeadChip [
941 haplotype blocks (characterized from the 3,780 SNPs) were used in this genome-wide association study (Table B in
A Bayesian model-based method implemented in the program InStruct [
The following agronomic traits were measured and field-evaluated in the growing season 2012/2013: Seed yield (SY), 100-Seed Weight (SW) and Plant Height (PH). A mixed linear model was employed for phenotypic data analysis using the MIXED procedure in SAS (SAS Institute, Inc., Cary, NC). The model that represents the combined data analysis was the following:
Correlations among traits were determined following the method described by Holland et al. [
AEM values were used to perform single-SNP analysis and then haplotype-based genome-wide association for the traits under consideration. In order to take into account the effects of population structure and genetic relatedness among the cultivars, the following unified mixed-model [
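For reference, the unified mixed model is commonly written in the GWAS literature in the form sketched below. The symbols follow that general convention and are an assumption here; the authors' exact notation and terms may differ.

```latex
\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{S}\boldsymbol{\alpha} + \mathbf{Q}\mathbf{v} + \mathbf{Z}\mathbf{u} + \mathbf{e}
```

In this convention, y is the vector of phenotypic values (the AEM values in this study), β holds fixed effects other than marker and structure effects, α is the marker (SNP or haplotype) effect, v holds the population structure effects derived from the Q matrix, u holds the random polygenic background effects whose covariance is proportional to the kinship matrix K, e is the residual vector, and X, S, Q and Z are the corresponding incidence matrices.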
Additionally, the genomic regions (or SNPs in haplotypes blocks) identified in this study were compared to the genomic locations of QTLs previously reported for the traits under study. Genes, QTLs and markers annotated in Glyma1.01 and NCBI RefSeq gene models in SoyBase (
Analysis of variance indicated that the effects of genotype (G), environment (E) and their interaction (G × E) were statistically significant (p < 0.01) for all three traits under study (SY, SW and PH). This result is in agreement with the mixed model analysis, in which the 169 cultivars presented significant differences at P < 0.01 in all traits. The statistical results of fixed effects for the complex traits are summarized in
Data are presented as phenotypic means with standard deviations in parentheses.
Trait | Cascavel | Palotina | Primavera | Rio Verde | Mean square (E) | Mean square (G×E) | Mean square (G)
---|---|---|---|---|---|---|---
SY | 2322 (779) | 1037 (381) | 1890 (735) | 2535 (839) | 219490 | 220491 | 52737
PH | 104 (18) | 89 (21) | 49 (12) | 57 (14) | 32.6 | 75.4 | 158.3
SW | 12 (1.9) | 11 (1.2) | 13 (1.8) | 12 (1.4) | 0.78 | 0.69 | 1.36
**Significant at the 0.01 probability level according to type III tests of fixed effects; G, genotype; E, environment; G×E, genotype-by-environment interaction.
Estimates of correlation coefficients among traits are shown in
Environment | Trait | SY | SW
---|---|---|---
Cascavel | SW | 0.47 |
Cascavel | PH | -0.39 | -0.18ns
Palotina | SW | 0.37 |
Palotina | PH | -0.02ns | -0.03ns
Primavera do Leste | SW | 0.29 |
Primavera do Leste | PH | 0.51 | -0.20ns
Rio Verde | SW | 0.07ns |
Rio Verde | PH | 0.54 | -0.49ns
** Significant at the 0.01 probability level; ns, not significant.
In the present study, population structure of a soybean association panel consisting of 169 cultivars was investigated using a Bayesian clustering approach and a core set of SNP markers. According to the average log (likelihood) and the deviance information criterion (from the posterior Bayesian clustering analysis), the most probable number of subpopulations is nine (Fig A in
For model fit evaluation of mixed linear models with Q (structure) and K (kinship) matrices, the results based on the Bayesian information criterion consistently showed a better fit for the (Q + K) model than for models that consider either Q or K alone (Table B in
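The model comparison described above can be illustrated with a minimal sketch. It assumes the common smaller-is-better form of the Bayesian information criterion, BIC = k ln(n) - 2 ln(L); the log-likelihoods and parameter counts below are hypothetical placeholders, not values from this study, and only the panel size (169 cultivars) comes from the text.

```python
import math

def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion in its smaller-is-better form."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood

# Hypothetical (log-likelihood, number of parameters) pairs for the three models.
fits = {
    "Q": (-520.0, 10),
    "K": (-515.0, 2),
    "Q+K": (-480.0, 11),
}
n_obs = 169  # number of cultivars in the association panel

scores = {name: bic(ll, k, n_obs) for name, (ll, k) in fits.items()}
best = min(scores, key=scores.get)  # model with the lowest BIC
print(best, round(scores[best], 1))
```

With these placeholder numbers the Q + K model is selected because its gain in likelihood outweighs the penalty for its extra parameters.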
Six SNPs were significantly associated with SY on three chromosomes across two locations (
Manhattan plots of GWAS for seed yield (SY) evaluated in a soybean association mapping panel across the following environments of southern Brazil, A) Cascavel, B) Palotina, C) Primavera do Leste and D) Rio Verde. Negative log10-transformed P-values of SNPs from a genome-wide scan for SY using a mixed linear model that includes both kinship and population structure are plotted against positions on each of the 20 chromosomes. The significant SNPs associated with the trait (P < 3.0 × 10⁻³) are distinguished by the threshold line.
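The position of the threshold line follows directly from the negative log10 transformation. The sketch below assumes that the cutoff of 3.0 × 10⁻³ is an upper bound on P, so that significant SNPs plot above the line; the function names are illustrative only.

```python
import math

THRESHOLD_P = 3.0e-3  # significance cutoff quoted in the figure captions

def neg_log10(p):
    """Return -log10(p), the y-axis value used in a Manhattan plot."""
    if not 0.0 < p <= 1.0:
        raise ValueError("P-value must lie in (0, 1]")
    return -math.log10(p)

def is_significant(p, threshold_p=THRESHOLD_P):
    """Assumption: a SNP is significant when its P-value is below the cutoff."""
    return p < threshold_p

threshold_line = neg_log10(THRESHOLD_P)  # height of the horizontal threshold line
print(round(threshold_line, 2))
```

A SNP with P = 1.0 × 10⁻⁴, for example, has a transformed value of 4 and therefore lies above the line.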
In Cascavel, the significant SNP ss715613203 (SY) was located in the same linkage disequilibrium block, Gm12_Hap12, as the SNPs ss715613192, ss715613207 and ss715613219. This SNP is therefore in linkage disequilibrium with the genes and proteins associated with this LD block, i.e., the uncharacterized gene LOC102667945 and the putative gene glyma12g075700, annotated as a double-stranded RNA-binding protein 2-like, which encodes a ribonuclease III protein (
Gm12_Hap12 lies in the same region as gene glyma12g075700, annotated as a double-stranded RNA-binding protein 2-like, which encodes a ribonuclease III protein (BT097697). Glyma12g075600, another gene near this LD block (Gm12_Hap12), encodes a senescence regulator protein in soybean. SSR markers have been linked to seed protein synthesis (Liang
Environment | Chr | Start (bp) | End (bp) | Hap ID | HapA | HF | R2 (%) | SY | Other nearby QTLs and genes
---|---|---|---|---|---|---|---|---|---
Cascavel | 12 | 5610868 | 6023395 | Gm12_Hap42a | TAAT | 42 | 12.1 | 2566.5 a | Ribonuclease III; satt568; satt442 and satt192
Cascavel | 12 | | | Gm12_Hap42b | TAAC | 62 | | 2380.3 a |
Cascavel | 12 | | | Gm12_Hap42c | CGGT | 36 | | 1929.4 b |
Cascavel | 13 | 28918187 | 28957669 | Gm13_Hap36a | CT | 34 | 3.5 | 2436.5 a | Putative germinal-center associated nuclear protein-like
Cascavel | 13 | | | Gm13_Hap36b | AT | 74 | | 2418.8 a |
Cascavel | 13 | | | Gm13_Hap36c | AC | 18 | | 2136.4 ab |
Cascavel | 13 | | | Gm13_Hap36d | CC | 13 | | 1725.9 ab |
Rio Verde | 6 | 15115808 | 15242800 | Gm6_Hap29a | CC | 2 | 21.0 | 3508.0 a | -
Rio Verde | 6 | | | Gm6_Hap29b | TC | 25 | | 3305.6 a | -
Rio Verde | 6 | | | Gm6_Hap29c | CT | 16 | | 2761.6 a | -
Rio Verde | 6 | | | Gm6_Hap29d | TT | 104 | | 2446.4 b | -
* Hap ID = Haplotype identification; HapA = haplotype alleles.
a HF = Haplotype frequency: the number of cultivars with the respective haplotype.
b The average over the frequency of cultivars for each environment and the statistical difference.
** satt568 and satt442 from Liang et al. [
¶ Genes near the haplotype block.
Environment | Chr | Start (bp) | End (bp) | Hap ID | HapA | HF | R2 (%) | SW | Other nearby QTLs and genes
---|---|---|---|---|---|---|---|---|---
Cascavel | 5 | 9012813 | 9097414 | Gm5_Hap10a | AA | 19 | 13.8 | 12.5 a | glyma05g09390
Cascavel | 5 | | | Gm5_Hap10b | GG | 135 | | 11.7 a |
Palotina | 12 | 5610878 | 6023395 | Gm12_Hap42b | TAAC | 62 | 31.2 | 11.5 a | Ribonuclease III
Palotina | 12 | | | Gm12_Hap42a | TAAT | 42 | | 11.4 a |
Palotina | 12 | | | Gm12_Hap42c | CGGT | 36 | | 10.5 b |
Primavera do Leste | 11 | 5065170 | 5238788 | Gm11_Hap13a | AA | 76 | 13.2 | 11.8 a | -
Primavera do Leste | 11 | | | Gm11_Hap13b | GA | 22 | | 12.4 a | -
Primavera do Leste | 7 | 6604493 | 7096376 | Gm7_Hap13a | GGCGAGG | 20 | 14.8 | 13.3 a | Glyma07g076800
Primavera do Leste | 7 | | | Gm7_Hap13b | GGCAAAT | 2 | | 12.7 a |
Primavera do Leste | 7 | | | Gm7_Hap13c | GGCAGAG | 2 | | 12.6 a |
Primavera do Leste | 7 | | | Gm7_Hap13d | AATAGAG | 15 | | 12.2 a |
Primavera do Leste | 7 | | | Gm7_Hap13e | AATAAAT | 66 | | 12.2 a |
Primavera do Leste | 7 | | | Gm7_Hap13f | GACAGAG | 9 | | 12.0 ab |
Primavera do Leste | 7 | | | Gm7_Hap13g | GGCAAGG | 19 | | 11.8 abc |
Primavera do Leste | 12 | 5610878 | 6023395 | Gm12_Hap42b | TAAC | 62 | 21.8 | 12.8 a | -
Primavera do Leste | 12 | | | Gm12_Hap42a | TAAT | 42 | | 12.3 a | -
Primavera do Leste | 12 | | | Gm12_Hap42c | CGGT | 36 | | 11.9 a | -
* Hap ID = Haplotype identification; HapA = haplotype alleles.
a HF = Haplotype frequency: the number of cultivars with the respective haplotype.
b The average over the frequency of cultivars for each environment and the statistical difference.
** satt568 and satt442 from Liang et al. [
¶ Genes near the haplotype block.
Seven SNPs were significantly associated with SW on chromosomes 5, 7, 11 and 12 across the locations under study (
Manhattan plots of GWAS for 100-seed weight (SW) evaluated in a soybean association mapping panel across the following environments of southern Brazil, A) Cascavel, B) Palotina, C) Primavera do Leste and D) Rio Verde. Negative log10-transformed P-values of SNPs from a genome-wide scan for SW using a mixed linear model that includes both kinship and population structure are plotted against positions on each of the 20 chromosomes. The significant associations (P < 3.0 × 10⁻³) are distinguished by the threshold line.
The SNPs of Gm12_Hap12 were associated with SW in Palotina and Primavera do Leste (Tables
One-hundred-seed weight (SW) is one of the major yield components, with a direct effect on the final seed yield. For this trait, the proportion of phenotypic variance explained by a single genomic region found in this study was 9.92% in Cascavel (SNPs ss715592623 and ss715592632). In Palotina, the phenotypic variation explained ranged from 12.33% (ss715613104) to 13.31% (ss715613203). In Primavera do Leste, marker-SW associations explained from 8.92% (ss715613203) to 10.08% (ss715610817) of the phenotypic variation (Table D in
Twenty-eight SNPs were significantly associated with PH across the four locations (Table E in
Manhattan plots of GWAS for plant height (PH) evaluated in a soybean association mapping panel across the following environments of southern Brazil, A) Cascavel, B) Palotina, C) Primavera do Leste and D) Rio Verde. Negative log10-transformed P-values of SNPs from a genome-wide scan for PH using a mixed linear model that includes both kinship and population structure are plotted against positions on each of the 20 chromosomes. The significant associations (P < 3.0 × 10⁻³) are distinguished by the threshold line.
In Palotina, the SNP markers ss715635224 and ss715603983, located on chromosomes 19 and 9, respectively, had no entries for genes and/or molecular markers related to PH in soybean [
In Primavera do Leste, the SNP markers ss715619979, ss715637964 and ss715637991 were located in intergenic regions, with no annotated genes related to plant height [
The haplotype block 42, associated with PH on Chr19 (Gm19_Hap42), is a region containing the
Gm19_Hap42 was associated with PH, SY and SCN in soybean. The QTLs are in the same genomic region as gene Glyma19g37890 (Dt1 or GmFLT1), which is involved in the stem growth habit in soybean. Gene Glyma19g194500 encodes an abscisic acid-insensitive protein; Glyma19g38160 encodes a beta-fructofuranosidase isoenzyme and Glyma19g196000 encodes a spindly-related enzyme. The bottom panel depicts the haplotype regions of 494 kb (Gm19_Hap42) and 163 kb (Gm19_Hap43) associated with the mentioned traits (red color intensity indicates the magnitude of r², i.e., more intense color means higher r²).
Environment | Chr | Start (bp) | End (bp) | Hap ID | HapA | HF | R2 (%) | PH | Other nearby QTLs and genes
---|---|---|---|---|---|---|---|---|---
Cascavel | 19 | 44761515 | 45255796 | Gm19_Hap42a | AATxAA | 34 | 91.4 | 111.62 a | Sd yld 11–6
Cascavel | 19 | | | Gm19_Hap42b | GCCGGG | 110 | | 101.18 b |
Cascavel | 19 | | | Gm19_Hap42c | ACCGGG | 2 | | 83.75 b |
Cascavel | 19 | 45361938 | 45525374 | Gm19_Hap43a | GTA | 2 | 44.1 | 121.25 a | -
Cascavel | 19 | | | Gm19_Hap43b | ATA | 34 | | 112.28 a |
Cascavel | 19 | | | Gm19_Hap43c | GCG | 111 | | 100.98 ab |
Cascavel | 19 | | | Gm19_Hap43d | ACG | 2 | | 90.00 ab |
Cascavel | 19 | 32194361 | 32318695 | Gm19_Hap20a | CG | 57 | 17.3 | 107.68 a | LOC100789162
Cascavel | 19 | | | Gm19_Hap20b | TA | 87 | | 105.53 a |
Cascavel | 18 | 61175038 | 61450878 | Gm18_Hap71a | ATGG | 7 | 22.2 | 115.36 a | LOC100787543
Cascavel | 18 | | | Gm18_Hap71b | ATAT | 76 | | 109.13 ab |
Cascavel | 18 | | | Gm18_Hap71c | ATAG | 15 | | 108.67 abc |
Cascavel | 18 | | | Gm18_Hap71d | GCGG | 31 | | 99.44 abc |
Cascavel | 18 | | | Gm18_Hap71e | GTGG | 9 | | 94.70 bc |
Cascavel | 19 | 39686084 | 40143590 | Gm19_Hap34a | TGAT | 13 | 9.1 | 108.65 a | LOC100786140
Cascavel | 19 | | | Gm19_Hap34b | TGGC | 3 | | 107.50 a |
Cascavel | 19 | | | Gm19_Hap34c | CGGC | 23 | | 107.28 a |
Cascavel | 19 | | | Gm19_Hap34d | TTAT | 25 | | 101.40 a |
Cascavel | 19 | | | Gm19_Hap34e | TTGC | 70 | | 100.38 a |
Cascavel | 15 | 48653554 | 48727813 | Gm15_Hap45a | CC | 81 | 18.5 | 109.33 a | LOC100804065
Cascavel | 15 | | | Gm15_Hap45b | AC | 6 | | 105.00 a |
Cascavel | 15 | | | Gm15_Hap45c | AT | 64 | | 100.08 ab |
Cascavel | 15 | | | Gm15_Hap45d | CT | 2 | | 90.00 ab |
Cascavel | 3 | 38761991 | 38976026 | Gm3_Hap32a | TAAT | 51 | 33.2 | 108.87 a | -
Cascavel | 3 | | | Gm3_Hap32b | GGCT | 29 | | 105.26 a |
Cascavel | 3 | | | Gm3_Hap32c | GGCC | 49 | | 104.92 a |
Cascavel | 3 | | | Gm3_Hap32d | GGAT | 4 | | 100.63 a |
Palotina | 19 | 44761515 | 45255796 | Gm19_Hap42a | AATxAA | 34 | 96.0 | 107.03 a | -
Palotina | 19 | | | Gm19_Hap42b | ACCGGG | 2 | | 85.00 ab |
Palotina | 19 | | | Gm19_Hap42c | GCCGGG | 110 | | 78.33 b |
Palotina | 19 | 45361938 | 45525374 | Gm19_Hap43b | ATA | 34 | 52.8 | 106.88 a | -
Palotina | 19 | | | Gm19_Hap43a | GTA | 2 | | 105.00 ab |
Palotina | 19 | | | Gm19_Hap43d | ACG | 2 | | 80.00 ab |
Palotina | 19 | | | Gm19_Hap43c | GCG | 111 | | 78.75 abc |
Palotina | 19 | 42812863 | 43117852 | Gm19_Hap38a | TA | 29 | 17.4 | 106.34 a | LOC100777767
Palotina | 19 | | | Gm19_Hap38b | TC | 7 | | 83.00 b |
Palotina | 19 | | | Gm19_Hap38c | CC | 113 | | 82.30 bc |
Palotina | 9 | 38013391 | 38454149 | Gm9_Hap24a | AA | 59 | 12.0 | 94.87 a | -
Palotina | 9 | | | Gm9_Hap24b | GG | 53 | | 89.89 a |
Palotina | 9 | | | Gm9_Hap24c | GA | 24 | | 87.50 a |
Primavera do Leste | 14 | 8027761 | 8527621 | Gm14_Hap21a | CGGGTA | 4 | 47.6 | 63.75 a | LOC100804944
Primavera do Leste | 14 | | | Gm14_Hap21b | CGGGGA | 37 | | 55.39 a |
Primavera do Leste | 14 | | | Gm14_Hap21c | CGTATA | 8 | | 52.25 a |
Primavera do Leste | 14 | | | Gm14_Hap21d | TTTAGA | 19 | | 51.15 ab |
Primavera do Leste | 14 | | | Gm14_Hap21e | TTTATA | 47 | | 48.00 ab |
Primavera do Leste | 14 | | | Gm14_Hap21f | CGTAGA | 2 | | 46.25 abc |
Primavera do Leste | 14 | | | Gm14_Hap21g | TTTAGG | 14 | | 41.46 bc |
Primavera do Leste | 20 | 37857633 | 38195568 | Gm20_Hap24a | GGxTG | 16 | 27.6 | 66.56 a | LOC100810047
Primavera do Leste | 20 | | | Gm20_Hap24b | AATTG | 2 | | 57.50 a |
Primavera do Leste | 20 | | | Gm20_Hap24c | AATTA | 78 | | 47.25 ab |
Primavera do Leste | 20 | | | Gm20_Hap24d | AATCG | 2 | | 44.75 b |
Primavera do Leste | 20 | 37211061 | 37410040 | Gm20_Hap23a | GC | 14 | 19.3 | 67.44 a | -
Primavera do Leste | 20 | | | Gm20_Hap23b | AT | 140 | | 48.04 b |
Primavera do Leste | 20 | | | Gm20_Hap23c | GT | 2 | | 44.75 c |
Rio Verde | 5 | 41481303 | 41866018 | Gm5_Hap40a | TCCCG | 3 | 55.3 | 70.00 a | LOC100788304
Rio Verde | 5 | | | Gm5_Hap40b | CCCCG | 45 | | 69.48 ab |
Rio Verde | 5 | | | Gm5_Hap40c | TTTTG | 47 | | 56.25 b |
Rio Verde | 5 | | | Gm5_Hap40d | TTTTA | 33 | | 52.86 b |
Rio Verde | 5 | | | Gm5_Hap40e | CTTTG | 2 | | . |
* HapID = Haplotype identification; HapA = haplotype alleles.
a HF = Haplotype frequency; the number of cultivars with the respective haplotype.
b The average over the frequency of cultivars for each environment and its statistical difference.
** Also associated in Palotina; Pl ht 13–8 and Pl ht 4–2 from Lee et al. [
¶ Genes near the haplotype block.
The genome-wide haplotype association analysis (941 haplotypes) identified eleven, seventeen and fifty-nine SNP-based haplotypes significantly associated with SY, SW and PH, respectively. As expected, both the size (kb) and the number of SNPs per LD block were highly variable (Tables
For SY in Cascavel, the haplotypes TAAT (Gm12_Hap12a) and TAAC (Gm12_Hap12b) differed significantly from the haplotype CGGT (Gm12_Hap12c). Gm12_Hap12a and Gm12_Hap12b had mean values of 2567 kg ha⁻¹ and 2380 kg ha⁻¹, respectively, while the haplotype Gm12_Hap12c yielded a mean of 1929 kg ha⁻¹, a yield 25% and 19% lower than the haplotypes Gm12_Hap12a and Gm12_Hap12b, respectively (
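The relative yield differences between haplotypes can be checked with a few lines of arithmetic. The sketch below uses the Cascavel SY means reported in the haplotype table, keyed by haplotype alleles; the helper name `percent_lower` is illustrative only.

```python
def percent_lower(reference, value):
    """Percentage by which `value` falls below `reference`."""
    return 100.0 * (reference - value) / reference

# Mean seed yield (kg/ha) per haplotype in Cascavel, from the SY haplotype table.
mean_yield = {"TAAT": 2566.5, "TAAC": 2380.3, "CGGT": 1929.4}

vs_taat = percent_lower(mean_yield["TAAT"], mean_yield["CGGT"])
vs_taac = percent_lower(mean_yield["TAAC"], mean_yield["CGGT"])
print(round(vs_taat), round(vs_taac))
```

The CGGT haplotype comes out roughly 25% below TAAT and 19% below TAAC.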
A discriminant haplotype was identified at a low frequency for PH in this association mapping panel, i.e., the haplotype Gm19_Hap42b, in which the plants had a mean height of 83.8 and 85.0 cm in Cascavel and Palotina, respectively. In both environments, this haplotype differed statistically from the haplotype responsible for producing the tallest plants (Gm19_Hap42a). Together, these haplotypes explained a phenotypic variation of 91.4% and 96% in Cascavel and Palotina, respectively (
This study was undertaken to identify genomic regions associated with key complex traits in soybean, using a genome-wide association approach. An advantage of using a genetically broad panel is the opportunity to explore alleles that could potentially be used in a marker-assisted selection context to improve agronomic traits in soybean. In fact, this GWAS approach, which employed the optimal mixed model, identified valuable SNPs that were significantly associated with SY, SW and PH. In addition, to refine the associations with SNP markers, a haplotype-based analysis was performed to determine whether these genomic regions were located in the same haplotype blocks and on the Williams 82 physical map. The soybean whole-genome sequence of SoyBase [
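The haplotype frequency (HF) and per-haplotype phenotypic means reported in the haplotype tables can be obtained by grouping cultivars on their haplotype string. The sketch below is a minimal illustration under that assumption; the records are hypothetical toy data, not measurements from this study, and it omits the statistical comparison of means.

```python
from collections import defaultdict
from statistics import mean

def summarize_by_haplotype(records):
    """Group (haplotype, phenotype) pairs and return {haplotype: (count, mean)}."""
    groups = defaultdict(list)
    for haplotype, value in records:
        groups[haplotype].append(value)
    return {hap: (len(vals), mean(vals)) for hap, vals in groups.items()}

# Hypothetical cultivars: haplotype alleles within one LD block and a phenotype value.
records = [
    ("TAAT", 2600.0),
    ("TAAT", 2500.0),
    ("TAAC", 2400.0),
    ("CGGT", 1900.0),
    ("CGGT", 2000.0),
]

summary = summarize_by_haplotype(records)
for hap, (count, avg) in sorted(summary.items()):
    print(hap, count, avg)
```

Here the count plays the role of HF (number of cultivars carrying the haplotype) and the mean corresponds to the phenotype column of the tables.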
Genetic relatedness (or kinship) and population structure are known as the major confounding factors that may lead to spurious associations in GWAS [
SY had a positive and significant correlation with SW, which is in agreement with previous reports in soybean [
Many studies have demonstrated the power of GWAS to detect significant QTL in soybean populations. In this study, we highlight the importance of having haplotype maps of tropical soybean cultivars for marker-assisted selection (MAS). Moreover, according to Lorenz et al. [
Yield QTLs identified on chromosome 12 are of particular interest because they showed consistent effects across locations (Palotina, Primavera do Leste and Cascavel). Zhang et al. [
The SNP at 45 Mb on Chr19 associated with PH has been previously reported by Lee et al. [
Near the Dt1 gene, in the same haplotype block (Gm19_Hap42), lies the SPINDLY gene (SPY) (Glyma19g196000), which is considered to be a negative regulator of gibberellin (GA) signaling in
The significant G × E interaction explains the relatively low stability (or consistency) of the identified loci. Moreover, this result is important because it clearly justifies the inclusion of different environments (locations) in the GWAS. In fact, to identify true QTLs with genetic stability and a high proportion of phenotypic variation explained, evaluation of the same material in different environments, QTL mapping, and QTL-by-location interactions should be used and explored [
Only three SNPs (ss715613203, ss715613104 and ss715613207) and one haplotype (Gm12_Hap12) were found to be stable for SY and SW, with high correlation between these two traits, in the four environments under consideration. This is because agronomic traits result from the combined actions of multiple genes and environmental factors, with gene expression varying across environments [
In conclusion, with the aid of the haplotype block map constructed by Song et al. [
Some haplotypes contain SNP markers that were not detected in the single-marker analysis (i.e., SY: Gm13_Hap36; SW: Gm7_Hap13 and Gm12_Hap12; PH: Gm14_Hap21). This is attributed to the nature of the haplotype-based method, which can better detect functional haplotypes such as
SNPs associated with quantitative trait loci can be further used, for example under the allelic combination approach, for efficient marker-assisted selection of complex traits [
Although SNP chips with higher density and next-generation sequencing may provide new data [
Table A. Detailed information of SNPs used in the study (SNP name, chromosome position, and polymorphic alleles in the respective tag sequence, according to soybean reference genome V1.1;
(XLSX)
RICS thanks the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES) of Brazil and the Programa de Pós-graduação em Genética e Melhoramento of the State University of Maringá for the research resources and scholarship.