Genome-wide association study and genomic selection for tolerance of soybean biomass to soybean cyst nematode infestation

Soybean cyst nematode (SCN), Heterodera glycines Ichinohe, is one of the most devastating pathogens affecting soybean production in the U.S. and worldwide. The use of SCN-resistant soybean cultivars is one of the most affordable strategies to cope with SCN infestation. Because of the limited sources of SCN resistance and changes in SCN virulence phenotypes, host resistance in current cultivars has increasingly been overcome by the pathogen. Host tolerance has been recognized as an additional tool to manage the SCN. The objectives of this study were to conduct a genome-wide association study (GWAS), to identify single nucleotide polymorphism (SNP) markers, and to perform a genomic selection (GS) study for SCN tolerance in soybean based on reduction in biomass. A total of 234 soybean genotypes (lines) were evaluated for their tolerance to SCN in greenhouse using four replicates. The tolerance index (TI = 100 × Biomass of a line in SCN infested / Biomass of the line without SCN) was used as phenotypic data of SCN tolerance. GWAS was conducted using a total of 3,782 high quality SNPs. GS was performed based upon the whole set of SNPs and the GWAS-derived SNPs, respectively. Results showed that (1) a large variation in soybean TI to SCN infection among the soybean genotypes was identified; (2) a total of 35, 21, and 6 SNPs were found to be associated with SCN tolerance using the models SMR, GLM (PCA), and MLM (PCA+K) with 6 SNPs overlapping between models; (3) GS accuracy was SNP set-, model-, and training population size-dependent; and (4) genes around Glyma.06G134900, Glyma.15G097500.1, Glyma.15G100900.3, Glyma.15G105400, Glyma.15G107200, and Glyma.19G121200.1 (Table 4). Glyma.06G134900, Glyma.15G097500.1, Glyma.15G100900.3, Glyma.15G105400, and Glyma.19G121200.1 are best candidates. To the best of our knowledge, this is the first report highlighting SNP markers associated with tolerance index based on biomass reduction under SCN infestation in soybean. This research opens a new approach to use SCN tolerance in soybean breeding and the SNP markers will provide a tool for breeders to select for SCN tolerance.

Introduction Soybean [Glycine max (L.) Merr.] is a widely grown legume with high oil and protein contents. The wild type Glycine soja Sieb. & Zucc. was used for soybean domestication [1]. Soybean is one of the most economically important cultivated legumes worldwide. The value of biofuel made from soybean was reported to exceed $35 billion in the United States (www.soystats. com). Increase in need for soybean production has been significant [2]. This requires the use of high-yielding soybean cultivars and the expansion of croplands for soybean cultivation. However, soybean production has been constrained by various factors. Soybean cyst nematode, Heterodera glycines Ichinohe, has been one of the most devastating biotic stresses affecting soybean production worldwide. Costs associated with soybean production loss due to SCN have exceeded $1.5 billion in the U.S. alone [3].
The soybean cyst nematode (SCN) is an obligate parasite and has been found in the most soybean-producing areas in the U.S. [4]. SCN feeds on soybean roots and uses soybean plants as a carbon source. This will cause a decrease in soybean biomass and suppress soybean yield [5]. SCN is difficult to control upon establishment in fields. One of the most effective ways to manage SCN is the use of SCN-resistant soybean cultivars and a non-host plant during crop rotation [6]. Therefore, developing new SCN-resistant soybean cultivars through breeding is of great importance.
Breeding for new SCN-resistant soybean cultivars requires a better understanding of the genetic mechanism conferring SCN resistance. A total of 216 QTLs in soybean have been identified to confer resistance to SCN (www.soybase.org). Of which, two loci were extensively investigated. These two loci consisted of rhg1 and Rhg4, which were mapped on chromosomes 18 and 8, respectively [7]. The soybean cultivar 'Forest' harbored both SCN-resistant QTLs, and Rhg4 is dominant [8]. This resistance comes from the Peking accession. The second type of resistance only requires rhg1 and this type of resistance comes from PI 88788 [9]. The Rhg4 locus contained a gene encoding for a serine hydroxymethyltransferase [10], whereas the genes within the rhg1 locus encoded for an amino acid transporter, an α-soluble N-ethylmaleimide-sensitive factor attachment protein (α-SNAP), and a wound-inducible domain protein (WI12) [11].
The development of cultivars enhanced with disease resistance has been time-efficiently achieved thanks to the use of molecular markers via marker-assisted selection (MAS) [12]. With the recent development of high-throughput sequencing technology, tools such as genome-wide association study (GWAS) and genomic selection (GS) have been proven to be powerful for investigating the genetic architecture of complex traits [13]. Previous studies demonstrated that GWAS can be used to efficiently identify single nucleotide polymorphism (SNP) markers associated with SCN resistance in soybean. A total of 7 SNP markers were reported to be associated with resistance to SCN HG type 0 using GWAS [14]. In addition, two new candidate genes, FGAM1 and Glyma18g46201, were identified to be associated with SCN resistance [14]. In another GWAS, a total of 440 soybean accessions were phenotyped for resistance to SCN HG type 0 and HG type 1.2.3.5.7, and 19 SNP markers were shown to be associated with SCN resistance [15]. In addition, a GWAS conducted on a panel consisting of 553 soybean accessions identified a total of 8 new loci contributing to resistance to SCN [16]. Predictive breeding using genomic selection has brought considerable attention over the past few years. Genomic selection (GS) was reported to be more efficient over marker-assisted selection (MAS) for SCN resistance in soybean [14]. The earliest GS study for SCN resistance suggested an accuracy in the range of 0.59 to 0.67 for predicting SCN resistance in soybean [14].
The current commercial U.S. soybean cultivars have a narrow genetic background [1]. Due to this narrow genetic background, the current soybean germplasm would be vulnerable to nematode infestation [7,17]. This could be addressed by diversifying the source of resistance to nematodes and by investigating potential new loci conferring SCN resistance. Evaluating tolerance index based on biomass reduction under SCN infestation and characterizing loci affecting such trait could lead to a new approach for breeding SCN tolerance in soybean. Therefore, the objectives of this study were (i) to conduct GWAS for tolerance of soybean biomass to SCN infection, (ii) to identify SNP markers associated with SCN tolerance based on decrease in plant biomass, and (iii) to perform GS for soybean tolerance to SCN. The trait of SCN tolerance is different from the trait of SCN resistance in that a tolerant soybean can support good SCN reproduction but suffer little damage from the SCN infection, while SCN-resistant soybean does not support SCN reproduction.

Plant materials and phenotyping
The association panel investigated in this study consisted of 234 soybean accessions (S1 Table) from the panel of 288 lines used for a previous GWAS of SCN resistance [14]. A large number of these accessions were selection from the University of Minnesota soybean breeding program and 9 were Plant Introductions (PIs). Ten lines of the panel were resistant and 6 lines were moderately resistant to SCN HG Type 0 (race 3) with the resistance from PI 88788 (rhg1). 'MN0095' was used as a susceptible check and a few lines derived from PI 88788 harboring SCN-resistant genes were used as resistant checks [14,18].
SCN phenotyping was carried out in the greenhouse of the University of Minnesota St. Paul campus. Soil without SCN infestation collected from a soybean field was mixed with sand at a 2:1 ratio, and 1.5 kg of the soil-sand mixture was placed in 1-galon plastic bags. The natural field soil rather than the sterilized soil was used because the data from the natural field soil would be better for extrapolating the results to the field setting. Particularly, we considered the importance of rhizobium for the soybean growth, and the natural field soil can support sufficient rhizobium development. The soil from each bag was used in one 16-cm-diam clay pot. Soybean cyst nematode HG Type 1.2.3.5.6.7 (race 4), which can reproduce well on the lines containing resistance genes from PI 88788, was used. The SCN eggs at a density of 10,000 eggs/100 cm 3 of soil and diluted into 10 ml water were added into the soil in each SCN pot. Ten soybean seeds were placed on the surface each pot and the seeds were covered with the remaining soil. Four replicated pots were included for each soybean accession in both SCN and no-SCN treatments. The two pots (SCN and no-SCN) of the same soybean line were placed together to minimize the environmental difference between the SCN and no-SCN treatments within a genotype. Due to the large number of lines and limitation of the space of the greenhouse, this experiment was conducted at four different times with approximately 60 lines per time in the same greenhouse. Although lines of each replicate were arranged in a randomized block (S1 and S2 Figs), the experiment was considered complete randomized design because the lines were evaluated in four groups at four different times. The distances between each two pots were about 10 cm (S2 Fig).
After 5 days, the plants were thinned to provide five plants per pot. At 65 days after planting, the average total dry shoot biomass of the five plants in each pot under non-SCN infestation or with SCN infestation was measured. Tolerance index for biomass was computed using the following formula [19].
Tolerance Index (TI) = 100 × (Biomass under SCN infestation/Biomass without SCN infestation) TI values were adjusted to that of 'MN0095', used as susceptible controls, in order to minimize environmental effects within and between runs. In each run, there were two sets of 'MN0095' pots with total of 8 pairs of pots inoculated with SCN or not inoculated. TI_adjusted = TI × (Average TI for 'MN0095' between runs/Average TI for 'MN0095'  within each run) with average TI for 'MN0095' between runs/average TI for 'MN0095' within  each run being the adjustment coefficient. There were a total of 936 TI data points (234 soybean lines x 4 replicates). ANOVA on TI_adjusted values was performed using PROC MIXED of SAS v. 9.4 (SAS Institute Inc., Cary, NC, USA). Mean separation was conducted using a protected LSD procedure at a significance level α = 0.05. The statistical model for ANOVA analysis was described as following.
where Y ij denoted the observation on the j th genotype from the i th run, R i represented the effect of the i th replication (random effect), G j was the effect of the j th soybean accession (fixed effect), and ε ij was the random error associated with the ij th observation. Broad sense heritability was calculated using H 2 = 100 × (σ 2 g /σ 2 p ) = 100 × σ 2 g /[σ 2 g + (σ 2 e /r)] [20] where σ 2 g was the genotypic variance (σ 2 g = MSGenotype-MSError), σ 2 p denoted the total phenotypic variance, σ 2 e was the variance associated with the random error, and r represented the number of replications. Graph showing the data distribution was drawn using JMP1 oGenomics 9 (SAS Institute, Cary, NC).

Genotyping and quality control
The soybean panel was genotyped using the Soy6K SNP Infinium Chips (https://www.soybase. org/snps/download.php). DNA was extracted from young leaves of each accession using DNeasy 96 Plant Kit (QIAGEN, Valencia, CA). A total of 4,251 SNPs were obtained. Of the 4,251 SNPs, a total of 3,782 SNPs were maintained after SNP filtering (missing data<15%, heterozygosity<20%, minor allele frequency>5%). Those high-quality SNPs were used for further analysis.

Genome-wide association study (GWAS) and candidate gene discovery
Genome-wide association study was conducted using TASSEL 5 [21]. A total of 3 GWAS statistical models were used. These models consisted of single marker regression (SMR), generalized linear model using principal component (PCA) as additional covariate (GLM_(PCA)), and mixed linear model using principal component (PCA) and Kinship (K) as covariates (MLM_(PCA+K)). The LOD threshold for declaring a significant SNP was 3 [22]. The 50-kb genomic region containing the significant SNP was used for the candidate gene(s) search. Functional annotation related to the candidate gene(s) was investigated using Soybase (www. soybase.org). Candidate genes related to plant defense mechanism were more considered.

SNP selection accuracy and efficiency
SNP selection accuracy and efficiency were computed using the formulas established by Shi et al. [23] as shown below.
• Selection accuracy = 100×[(Number of genotypes having high tolerance index with the favorable SNP allele)/ (Number of genotypes having high tolerance index with the favorable SNP allele + Number of genotypes having low tolerance index with the favorable SNP allele)].
• Selection efficiency = 100×[(Number of genotypes having high tolerance index with the favorable SNP allele)/(Total number of genotypes having the favorable SNP allele)].
The top 78 SCN-resistant soybean genotypes (one-third of the whole panel) were the genotypes having high tolerance index, whereas the 78 least performing genotypes (one-third of the whole panel) had low tolerance index.

Genomic estimated breeding values (GEBVs) and genomic selection accuracy assessment
Genomic estimated breeding values were computed under 5 different genomic selection models: ridge regression best linear unbiased predictor (rrBLUP) [24], genomic best linear unbiased predictor (gBLUP) [25], Bayesian least absolute shrinkage and selection operator (Bayesian LASSO) [26], random forest [27], and support vector machines (SVMs) [28]. The packages 'rrBLUP' [29], GAPIT [30], 'BGLR' [31], 'randomForest' [32], and 'kernlab' [33] were used and run in R to perform the genomic selection models rrBLUP, gBLUP, Bayesian LASSO, random forest, and SVMs, respectively. The posterior distribution of the parameter in the Bayesian LASSO model was Double Exponential and the prior distributions were Uniform and Inverse Chi-Square for the hypoparameters λ and σ 2 e . Markov Chain Monte Carlo (MCMC) iteration and the burin were set to 5,000 and 1,000, respectively, when running the Bayesian LASSO model [34]. Random forest was achieved using a total of 500 trees and 4 branches for each tree as previously described [14]. The SVMs model was done using a Gaussian kernel function [33].
In order to assess the effect of population training size on genomic selection accuracy of tolerance index based on biomass reduction under SCN infestation, cross-validation was conducted at different levels. In this study, we have performed a 2-fold, 3-fold, 4-fold, 5-fold, 6-fold, and 7-fold cross-validation corresponding to a training population size of 117, 156, 176, 187, 195, and 201 individuals, respectively. A total of 100 replications were conducted at each level of cross-validation. Genomic selection was conducted using all filtered SNPs and the selected SNPs from GWAS under the single marker model (SMR_SNPs), the generalized linear model (GLM_PCA_SNPs), and the mixed linear model (MLM_PCA_K_SNPs), respectively. In order to better fit the genomic selection model when the GWAS-derived SNPs were used, the number of covariates (SNPs) was increased by choosing the SNPs with LOD greater than 2 instead of 3. Fewer SNPs incorporated into the models would result in poorly fitted genomic selection models. Genomic selection accuracy was estimated by evaluating the Pearson's correlation coefficient between the GEBVs and the observed phenotypes in the testing set [35].

Phenotyping
Adjusted tolerance index for biomass reduction for the 234 soybean accessions was approximately normally distributed (Fig 1). The tolerance index was used to measure the level of tolerance of soybean to the SCN infection. The higher the index was, the more tolerant to SCN infection the genotype was.
Adjusted tolerance index varied from 22.87 to 118.16, with an average of 63.80 and a standard deviation of 17.03 (S1 Table). The analysis of variance (ANOVA) indicated that adjusted tolerance index was statistically significantly different among the soybean accessions (F value = 3.35, p-value<0.0001) ( Table 1). Broad sense heritability was estimated using the variance components from ANOVA ( Table 1). Estimate of broad sense heritability for adjusted tolerance index was high (89.3).
The lowest adjusted tolerance index was recorded for MN0082SP (22. Table), indicating that SCN infestation did not significantly reduce biomass for those accessions, hence they were SCN-tolerant.

SNP profiling
After SNP filtering, a total of 3,782 high quality SNPs were used for GWAS. Average number of SNPs per chromosome was 189, with chromosome 18 having the highest number of SNPs (256) and chromosome 11 harboring the lowest number of SNPs (127) ( Table 2). Average distance between SNP was 255 kb. SNP density was the highest on chromosome 13 with an inter-SNP distance of 181 kb ( Table 2). SNPs were most scattered on chromosome 1 with an average distance of 374 kb between SNPs (Table 2). Kinship analysis revealed that the population was structured into 3 subgroups (S3 Fig), of which, two subgroups had similar size.
Generalized linear model (GLM_PCA). The GLM_PCA model incorporated the principal component (PCA) covariate in its equation. In this study, this model provided a total of 21 significant SNPs (LOD>3) associated with tolerance index based on biomass reduction (Table 3)     A total of 7 significant SNPs were mapped on a 3.5-Mb region of chromosome 15 (Table 3) (Fig 2B). .08%, 6.53%, 6.99%, 6.83%, and 9.13%, respectively (Table 3) (Fig 2C). Chromosome 15 harbored a total of 4 significant SNPs out of these 6 SNPs. The 4 SNPs were located on an 840-Kb region of chromosome 15, which overlapped with the significant loci indicated by the SMR and GLM_PCA models (Fig 2C). Since this region on chromosome 15 was suggested by all 3 statistical models, the likelihood of having QTL(s) affecting is high. Interestingly, no SNPs having an LOD greater than 3 were found on chromosome 18, unlike the SMR and the GLM_PCA models. If the threshold was 2.60, one SNP marker located on chromosome 18 would be significant.

Overlapping significant SNP markers between models and candidate genes
The linear regression models upon which the SMR, GLM_PCA, and GLM_PCA_K models were built had different covariates. Despite this discrepancy between models, the results indicated three consistent genomic regions significantly associated with tolerance index based on biomass reduction under SCN infestation in soybean (Fig 2). The SNPs Gm06_11098210_C_T  (Table 4). These results suggested that the regions harboring these SNPs, especially the 840-Kb region of chromosome 15, had a strong probability of containing QLT(s) affecting tolerance index based on reduction in biomass under SCN infestation in soybean.

Selection accuracy and efficiency
Selection accuracy and efficiency for the SNPs overlapping between models were calculated. The average selection accuracy for the selected SNPs was 43.00% and ranged between 40.82% and 48.67% ( Table 5). The SNP with the highest selection accuracy was Gm06_11098210_C_T, whereas the one with the lowest accuracy among the selected SNP was Gm19_37932358_C_T. Selection efficiency varied from 25.33% to 32.74%, with an average of 28.12% and a standard deviation of 2.58% ( Table 5). The SNP with the highest selection efficiency was Gm06_11098210_C_T, and the one with the lowest selection efficiency was Gm15_7574118_T_C (Table 5).

Genomic selection
Genomic selection for tolerance index based on biomass reduction under SCN infestation was conducted using statistical models consisting of ridge regression best linear unbiased predictor  (rrBLUP), genomic best linear unbiased predictor (gBLUP), Bayesian lasso regression (BLR), random forest (RF), and support vector machines (SVMs) ( Table 6). Marker effects were evaluated from all SNPs and the SNPs obtained from GWAS under single marker regression (SMR), generalized linear model (GLM_(PCA)), and mixed linear model (MLM_(PCA+K)) models, respectively. The effect of training population size on genomic selection accuracy was also investigated by conducting cross-validation at different levels with 100 replications for each cross-validation fold. Regardless of the genomic selection model and the size of the training population, the accuracy of genomic selection was higher when the SNPs obtained from GWAS were used (Table 6) (Fig 3). Interestingly, the genomic selection accuracy using the significant SNPs from SMR was not as high as the one obtained from the other GWAS models such as GLM (PCA) and MLM (PCA + K) when the gBLUP model was used for conducting genomic selection (Fig 3). There was not a significant variation in genomic selection accuracy between the SNP set consisting of SMR_SNPs, GLM_PCA_SNPs, and MLM_PCA_K_SNPs for the genomic selection models involving BLR, RF, rrBLUP, and SVMs (Fig 3). Overall, genomic selection accuracy slightly increased when the training population was bigger and plateaued out at a 6-fold cross-validation (Table 6) (Fig 3), corresponding to a training population size of 195 individuals. For rrBLUP, genomic selection accuracy increased to almost 2-fold at each level of crossvalidation when the GWAS-derived SNPs were incorporated into the genomic selection model. The highest increase was found at 2-fold cross validation where the genomic selection accuracy was 0.22 when all SNPs were used and was equal to 0.50, 0.50, and 0.51 when the SNP set SMR_SNP, GLM_PCA_SNP, and MLM_PCA_K_SNP were used, respectively Table 5

. Genotypic count for the top 78 soybean accessions with the highest tolerance index under SCN infestation, top 78 soybean accessions having the lowest tolerance index under SCN infestation, and selection accuracy and efficiency for the SNPs associated with tolerance index based on biomass reduction under SCN infestation.
High_tolerance_index Low_tolerance_index   SNP  AA  CC  GG  TT  H a  Missing b  Total  AA  CC  GG  TT  H  Missing  Total   Gm06_11098210_C_T  0  55  0  7  14  2  78  0  58  0  16  1  3  78   Gm15_7574118_T_C  0  38  0  18  17  5  78  0  55  0  15  5  3  78   Gm15_7864348_G_T  0  0  49  8  14  7  78  0  0  66  8  2  2  78   Gm15_8263547_G_T  0  0  49  9  17  3  78  0  0  64  10  2  2 (Table 6). At 3-fold and 4-fold cross-validation, genomic selection accuracy was the highest when the SNP set GLM_PCA was used (Table 6). Genomic accuracy was equally high for the SNP set SMR_SNP and MLM_PCA_K_SNP at 5-, 6-, and 7-fold cross validation. For gBLUP, using GWAS-derived SNPs increased genomic selection to almost 3-fold expect for the SMR SNP set (Table 6) (Fig 3). Under the gBLUP model, genomic selection accuracy was the highest when the GLM_PCA SNP set was used. Genomic selection was 0.43, 0.44, 0.46, 0.48, 0.45, and 0.45 at 2-, 3-, 4-, 5-, 6-, and 7-fold cross-validation, respectively, for the GLM_PCA SNP set (Table 6), whereas the accuracy was 0.14, 0.17, 0.16, 0.16, 0.19, 0.18 at 2-, 3-, 4-, 5-, 6-, and 7-fold cross-validation, respectively, when all SNPs were used to perform genomic selection (Table 6). Genomic selection accuracy performed better under the BLR model than the gBLUP model. Genomic selection accuracy was more than 0.50 when the GWAS-derived SNPs were used in the BLR model (Table 6). Unlike the results suggested by gBLUP, genomic selection under the BLR model showed a higher and a more stable result if the SNPs were derived from a GWAS analysis (Fig 3). The SVMs model resulted in a lower genomic selection accuracy than BLR and rrBLUP, but was comparable to gBLUP except for the SNPs derived the SMR model in GWAS (Fig 3). In addition, the genomic selection based on the SVMs model was special in a way that the accuracy was the best when the SNPs from the SMR GWAS model were used, which was not the case for the other genomic selection models (Table 6) (Fig 3). Among the 5 genomic selection models used for predicting tolerance index based on biomass reduction under SCN infestation, the RF model displayed the best accuracy when all SNPs were used (Table 6). However, the selection accuracy under the RF model was lower than rrBLUP and BLR when the GWAS-derived SNPs were used (Table 6) (Fig 3). These results suggested that genomic selection for tolerance index based on biomass reduction under SCN infestation was model-, SNP set-, and training population size-dependent.

Discussion
A large variation in tolerance index based on biomass reduction due to SCN infection was found among the soybean panel in this study. Biomass of the genotypes M97251029, ALTONA, MN1804CN, M97305077, M97304052, M97205096, M98332108, MN1806SP, and ALPHA was not affected by SCN infection, indicating that these genotypes were tolerant to SCN infection. Among the 9 lines, MN1804CN and ALPHA (PI 564524) contained resistance from PI 88788 and were resistant to HG Type 0 (race 3) [14]; the SCN tolerance in these 2 lines can be due to or partially due to the SCN resistance trait. All other 7 lines were susceptible to race 3 and the SCN population used in this study; the tolerance in these 7 lines must be due to some other traits rather than the traits resistant to SCN development and reproduction. The results suggest that the SCN tolerance traits based on biomass phenotyping is useful in diversifying soybean cultivars with SCN tolerance traits (minimizing SCN damage) in additional to SCN resistance traits (suppressing SCN reproduction). Genome wide association study (GWAS) has been a powerful tool to identify SCN-resistance loci in soybean [14][15][16]36]. A total of 3,782 high quality SNPs were used to conduct GWAS for tolerance index based on biomass reduction under SCN infestation in this study. A total of 35, 21, and 6 SNPs were identified to be associated with tolerance index based on biomass reduction under SCN infestation using the models SMR, GLM (PCA), and MLM (PCA +K). Of which, 6 SNPs overlapped between the 3 models. The discrepancy in terms of the number of significant SNPs found from each model was attributed to the different covariates used upon which each GWAS model was built. PCA accounted for population stratification within the soybean panel investigated in this study. The SMR model does not account for population structure, GLM (PCA) has the ability to reduce false discovery due to population stratification, and MLM (PCA +K) further decreases false discovery rate by incorporation the genetic relatedness between soybean lines, which was denoted as Kinship (K). Both SMR and GLM (PCA) models identified a cluster of significant SNPs found on chromosome 18, which harbored the resistant locus rhg1 [9]. A few lines involved in the soybean panel used in this study were derived from PI 88788 [14], which has the rhg1 locus resistant to SCN. The SCN resistant traits can contribute to soybean tolerance under the SCN infection, but the rgh1 genes have little resistance to the SCN population of HG Type 1.2.3.5.6.7 (race 4) used in this study. Interestingly, the highest GWAS signals were located on an 840-Kb region of chromosome 15. Of the 6 SNPs overlapping between the 3 models, 4 were mapped on chromosome 15. To the best of our knowledge, to date, no SCN-resistant loci have been reported in the vicinity of these SNPs. There were only 16 lines potentially containing SCN resistance of rhg1 on chromosome 18 from PI 88788. Since the FI of the SCN population on PI 88788 was 28.2 and a soybean derived from PI 88788 generally have higher FI than the source, probably none of these 16 soybean lines could have FI less than 30 and be classified as a soybean resistant to the SCN population used in this study. Therefore, this QTL on this chromosome 15 is probably a sole tolerance trait that promotes soybean growth to minimize biomass reduction by SCN infection, but is unable to suppress SCN reproduction.
One of the most surprising findings reported in this study was the involvement of small heat-shock protein (HSP20) for SCN tolerance based on biomass reduction. In addition to conferring resilience to abiotic stresses such as drought and hot temperatures, small-heat shock proteins (HSP20) have been also suggested to help with plant defense mechanism to pathogen infection [37]. These proteins act as molecular chaperones by assisting with protein folding and other post-translational modifications during pathogen invasion. Park and Seo [37] showed that HSP20 is highly involved in controlling R proteins during plant pathogen attack. Therefore, there could be a link between HSP20 and its involvement in limiting damage of SCN infection in soybean. However, this finding requires additional validation. Our results also indicated a mago nashi family protein to be a good candidate for SCN tolerance. Mago nashi protein is a key component of the exon junction complex (EJC) [38]. The involvement of mago nashi protein in plant defense pathways against pathogens remains poorly understood. However, Gong et al. [38] reported that one of the genes found in the EJC complex is involved in plant growth and development. In this study, we reported loci affecting tolerance index based on biomass reduction under SCN infestation in soybean. We could suggest that the EJC complex might promote plant growth under SCN infestation, resulting in a greater shoot biomass. A phosphatase 2C family protein has been also found to be associated with SCN tolerance in this study. This protein belongs to the class of enzymes requiring Mg 2+ and Mn 2+ to be functional. These proteins are highly involved in plant signaling pathways during pathogen infection [39]. One of the pathways involving these proteins that are relevant to biotic stress is the signaling of the hormone abscisic acid (ABA) regulation. Fuchs et al. [39] described that the regulation of ABA upon plant biotic stress contributes to maintaining vegetative growth and modulating plant transpiration. This could explain the fact that some soybean lines used in this study were able to grow and develop under SCN infestation. A predicted 3-ketosphinganine reductase has also been identified as a potential candidate gene for tolerance to SCN infection in this study. This protein has been shown to play key roles in plant response to biotic stress [40]. This protein is an essential element of cell wall and acts as a hormone signaling molecule [40]. Wang et al. [41] demonstrated that the gene encoding for 3-ketosphinganine reductase is upregulated during powdery mildew infection in Arabidopsis thaliana. This molecule is significantly involved in the salicylic-acid pathway [41]. Even though no reports have described the direct involvement of 3-ketosphinganine reductase in SCN defense mechanism, we could still speculate that its mechanism in soybean might be similar to the one described during the powdery mildew infection in Arabidopsis thaliana. An annotated gene, Glyma.19G121200.1, has also been found in the vicinity of the significant SNPs associated with tolerance index based on biomass reduction under SCN infestation. This gene encodes for FAR1-related sequence 3-like isoform X1. However, to date, there is no report describing the role of this protein in plant defense mechanism against pathogens. Therefore, further analyses are required to elucidate the unknown functions of the proteins transcribed by the annotated genes found in the vicinity of the SNP markers. In addition, further studies are needed for validating the candidate genes prior to their deployment into a marker-assisted selection aiming at improving SCN tolerance in soybean.
Genomic selection has become more and more popular in modern plant breeding [42,43]. Genomic selection has been proven to be successful in improving genetic gain per unit of time in other studies [35,[44][45][46][47]. However, few reports have focused on the potential establishment of genomic selection to unravel the genetic architecture of SCN tolerance in soybean. In this investigation, we performed genomic selection based on 5 statistical models (BLR, gBLUP, rrBLUP, RF, and SVMs) and 4 SNP sets (all SNPs, SMR_SNPs, GLM_PCA_SNPs, and MLM_PCA_K_SNPs). In addition, we have also investigated the effect of training population size on genomic selection accuracy by conducting genomic selection at different levels of cross-validation. Results indicated that genomic selection accuracy was model, SNP set-, and training population size-dependent. This implies that model selection criteria, type of SNPs, population training size are critical components of interest when conducting a genomic selection study. Using GWAS-derived SNPs enhanced the accuracy to almost two-fold. A study conducted by Bao et al. [14] on both association mapping and genomic selection on a total of 282 soybean genotypes for SCN resistance indicated that the use of significant SNPs for conducting genomic selection significantly improved the accuracy of prediction, which was in agreement with the data reported in this current investigation.
In this study we conducted a genome-wide association study (GWAS) to identify SNP markers, and to perform a genomic selection (GS) study for soybean tolerance to SCN infection based on biomass reduction. GWAS has been shown to be successful in studies investigating the genetics of SCN in soybean [48]. Li et al. [49] mapped a 6-Mb region of chromosome 15 to be associated with SCN. In this report, this region has been narrowed to a less than 2-Mb region. In a previous report, we presented the genomic study of soybean tolerance to the SCN infection based on chlorophyll content. Soybean tolerance to SCN may be best characterized by soybean yield response to the SCN infection [48]. However, due to a large number of soybean lines that are needed for the GWAS, it is difficult to conduct a field scale experiment to phenotype the soybean yield response of all lines to SCN in the same field under similar environmental conditions including the SCN infestation level. The greenhouse experiment is the first feasible step to conduct genomic study of SCN tolerance. Biomass in the greenhouse can probably be used to predict yield potential in field, but cautiousness must be taken to extrapolate the greenhouse results to the field setting. We realize that there were some limitations in this greenhouse study. For example, we were unable to conduct the experiment to include all lines at the same time, so it is possible that there were unknown extraneous environmental factors that interfered with our data interpretation. Furthermore, the plants within and between pots appeared to be too crowded for the soybean plants to grow for two months in the greenhouse (S2 Fig), and the plant growth of a line might have affected the plant growth of lines in the neighbor pots. We took consideration of the density of plants per pot and variability of biomass measurements among replicates; if number of plants/pot were reduced, it would probably increase the variability among replicates. It appeared that the plant density of 5 plants/pot was an appropriate design for this study. However, if there is sufficient greenhouse space, it would be better to increase the distance between pots and number of replicates. Nevertheless, to our knowledge, this is the first study of tolerance QTLs to SCN in soybean, perhaps first study of tolerance QTLs to any plant-parasitic nematode based on biomass reduction under nematode infestation. Soybean cyst nematode is mainly managed by using SCN-resistant soybean cultivars, which limit SCN reproduction. SCN-tolerant QTLs may include the QTLs resistant to SCN reproduction, but some QTLs can only affect plant growth, not SCN development and reproduction. This study opened a novel approach to diversify genes that promote plant growth and level off the damage caused by SCN infection. With the advances in genome sequencing and genetic analyses, genomic selection of SCN-tolerant traits is a promising approach to breed SCN-tolerant soybean for enhancing SCN management.

Conclusions
This study reported the variation in tolerance index based on biomass reduction of a total of 234 soybean genotypes. To the best of our knowledge, this is one of the a few reports investigating SCN tolerance in soybean. In addition to confirming previously reported loci, we identified new loci for SCN tolerance using tolerance index based on biomass reduction. Moreover, we have showed that genomic selection accuracy for SCN tolerance depends on various factors such as statistical models, SNP sets, and training population size.
Supporting information S1 Table. List of soybean genotypes evaluated for resistance/tolerance to SCN, average adjusted tolerance index based on biomass reduction under SCN infestation, and standard deviation.