Fig 1.
Distribution of adjusted tolerance index among the 234 soybean accessions.
Table 1.
ANOVA table for tolerance index based on biomass reduction under SCN infestation among 234 soybean genotypes.
Table 2.
SNP profiling used for GWAS for tolerance to SCN infection based on biomass reduction.
Fig 2.
Manhattan plots and QQ-plots for tolerance indexes based on biomass reduction under SCN infestation.
The x-axis of each Manhattan plot represents the chromosome number, and the y-axis shows the LOD (-log10(p-value)). Colors on the Manhattan plot distinguish chromosomes. The x-axis of each QQ-plot represents the expected -log10(p-value), and the y-axis shows the observed -log10(p-value). A: Manhattan plot and QQ-plot from the single marker regression model (SMR). B: Manhattan plot and QQ-plot from the generalized linear model (GLM(PCA)). C: Manhattan plot and QQ-plot from the mixed linear model (MLM(PCA+K)).
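As an illustration of the quantities on these axes, the following minimal sketch converts p-values to LOD scores (-log10(p-value)) and pairs them with expected values for a QQ-plot. The function names and the (i + 0.5)/n plotting positions are assumptions made for this example and are not taken from the study, which may have used different software and conventions.

```python
import math


def lod_scores(p_values):
    """Convert p-values to LOD scores, defined here as -log10(p-value)."""
    return [-math.log10(p) for p in p_values]


def qq_points(p_values):
    """Return (expected, observed) -log10(p) pairs, largest values first.

    Expected values assume uniformly distributed p-values under the null,
    using (i + 0.5) / n plotting positions (an assumption of this sketch).
    """
    n = len(p_values)
    observed = sorted(lod_scores(p_values), reverse=True)
    expected = [-math.log10((i + 0.5) / n) for i in range(n)]
    return list(zip(expected, observed))


def is_significant(p_value, lod_threshold=3.0):
    """Flag a SNP whose LOD exceeds the threshold (LOD > 3.00 in the tables)."""
    return -math.log10(p_value) > lod_threshold
```

Points lying well above the diagonal (observed greater than expected) in such a plot suggest either true associations or inflation from uncorrected population structure, which is why the three models are compared side by side.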
Table 3.
Significant SNPs (LOD>3.00) associated with tolerance to biomass reduction under SCN infestation using the Single Marker Regression (SMR), Generalized Linear Model_PCA (GLM_(PCA)), and Mixed Linear Model_PCA_K (MLM_(PCA+K)) models.
Table 4.
Overlapping significant SNP markers (LOD>3.00) between the Single Marker Regression (SMR), Generalized Linear Model_PCA (GLM_(PCA)), and Mixed Linear Model_PCA_K (MLM_(PCA+K)) models.
Table 5.
Genotypic counts for the 78 soybean accessions with the highest tolerance index and the 78 soybean accessions with the lowest tolerance index under SCN infestation, together with the selection accuracy and selection efficiency of the SNPs associated with tolerance index based on biomass reduction under SCN infestation.
Fig 3.
Boxplots showing genomic selection accuracy for SCN tolerance index for biomass reduction under SCN infestation using 5 statistical models: Bayesian Lasso regression (BLR), genomic best linear unbiased predictor (gBLUP), random forest (RF), ridge regression best linear unbiased predictor (rrBLUP), and support vector machines (SVMs).
For each model, cross-validation was conducted at different levels (2-fold, 3-fold, 4-fold, 5-fold, 6-fold, and 7-fold) to assess the effect of training population size on genomic selection accuracy. At each level of cross-validation, genomic selection was conducted using SNP sets consisting of all SNPs and of SNPs with an LOD greater than 2 in the GWAS analysis. SMR_SNPs denotes the SNPs from the single marker regression model, GLM_PCA_SNPs the SNPs from the generalized linear model, and MLM_PCA_K_SNPs the SNPs from the mixed linear model in GWAS. Box plot colors distinguish the SNP sets. Genomic selection was conducted using a total of 100 replications; open circles denote outliers.
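To make the link between the cross-validation level and training population size concrete, the sketch below builds replicated k-fold splits for 234 accessions and reports the resulting training set sizes. It is only an illustration: the function names, the seeding scheme, and the way folds are assigned are assumptions of this example, not the procedure used in the study.

```python
import random


def kfold_indices(n, k, seed):
    """Shuffle indices 0..n-1 and deal them into k folds of near-equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def repeated_kfold(n, k, replications):
    """Yield (replication, train_indices, test_indices) for every fold.

    Each replication uses a different shuffle (seeded by its number, an
    assumption of this sketch) so that accuracy can be averaged over splits.
    """
    for rep in range(replications):
        for test in kfold_indices(n, k, seed=rep):
            held_out = set(test)
            train = [i for i in range(n) if i not in held_out]
            yield rep, train, test


def training_sizes(n, ks):
    """Return {k: (smallest, largest) training set size} for each k."""
    sizes = {}
    for k in ks:
        fold_sizes = [len(f) for f in kfold_indices(n, k, seed=0)]
        sizes[k] = (n - max(fold_sizes), n - min(fold_sizes))
    return sizes


if __name__ == "__main__":
    # 234 accessions, cross-validation levels from 2-fold to 7-fold.
    for k, (low, high) in training_sizes(234, range(2, 8)).items():
        print(f"{k}-fold: training size {low} to {high}")
```

With 234 accessions, the training set grows from 117 accessions under 2-fold cross-validation to 200 or 201 under 7-fold cross-validation, so higher fold counts correspond to larger training populations.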
Table 6.
Genomic selection accuracy of tolerance index based on biomass reduction under SCN infestation using 5 statistical models (rrBLUP: ridge regression best linear unbiased predictor, gBLUP: genomic best linear unbiased predictor, BLR: Bayesian Lasso regression, RF: random forest, and SVMs: support vector machines), four SNP sets (all SNPs, SMR_SNPs, GLM_PCA_SNPs, and MLM_PCA_K_SNPs), and different levels of cross-validation (2-fold, 3-fold, 4-fold, 5-fold, 6-fold, and 7-fold), with a total of 100 replications each.