SNP-SNP Interaction Analysis on Soybean Oil Content under Multi-Environments

Soybean oil content is one of main quality traits. In this study, we used the multifactor dimensionality reduction (MDR) method and a soybean high-density genetic map including 5,308 markers to identify stable single nucleotide polymorphism (SNP)—SNP interactions controlling oil content in soybean across 23 environments. In total, 36,442,756 SNP-SNP interaction pairs were detected, 1865 of all interaction pairs associated with soybean oil content were identified under multiple environments by the Bonferroni correction with p <3.55×10−11. Two and 1863 SNP-SNP interaction pairs detected stable across 12 and 11 environments, respectively, which account around 50% of total environments. Epistasis values and contribution rates of stable interaction (the SNP interaction pairs were detected in more than 2 environments) pairs were detected by the two way ANOVA test, the available interaction pairs were ranged 0.01 to 0.89 and from 0.01 to 0.85, respectively. Some of one side of the interaction pairs were identified with previously research as a major QTL without epistasis effects. The results of this study provide insights into the genetic architecture of soybean oil content and can serve as a basis for marker-assisted selection breeding.


Introduction
Soybean originates from China which is not only grain crop but also commercial crop. The content of protein accounts for 40% of soybean seeds, and the content of oil occupy 20%. Soybean is the largest oil seed crop in the world, which accounts for 57% of world oilseed production in 2015 (http://www.soystats.com) [1]. Oil content of soybean seeds is a complex quantitative trait governed by a number of genes mostly with small effects and was affected by the environment [2]. Till now, 188 QTL mapping research on soybean oil content were listed in the USDA Soybean Genome Database (http://www.soybase.org). Epistasis interaction is common and can cause cryptic genetic variation for quantitative traits [3]. Deriving genetic interaction networks from epistatic interactions between loci will enhance our understanding of biological systems that give rise to variation in quantitative traits [4]. Many recent studies have been performed regarding to epistatic interaction analysis, including investigations of agronomic and nitrogen acquisition traits in rice [5][6][7][8][9], and protein content, seed and pot traits, grain weight per plant in soybean [10][11][12], and proteins, oils, starches and fatty acids in maize [13][14], and grain protein, yield, and plant heights traits in wheat [15][16][17]. These studies, however, were based on the interactions of QTLs, which encompass more than a single marker. However, quantitative variation in phenotypes must result in multiple factor, therefore, interaction study designs have concentrated on estimating additive effects of single loci [18]. Because analyses of single loci can only partially investigate significant loci and may miss interactions between them, interaction analysis of single nucleotide polymorphisms (SNPs) is a better approach to elucidate genetic mechanisms [19], Goodman et al. [20] has explored the association between colon cancer and 94 SNPs, by the method, polymorphism interaction analysis in 2006. Dinu et al. [21] had discovered the Crohn's disease genetics' SNP-SNP interactions by logic regression. Lin  which examines all possible SNP combination from a set of given SNPs and choose the combination that the best predicts risk by minimizing the classification error of cases and controls [26]. In addition, MDR analysis incorporates a cross-validation/permutation procedure to minimize the rate of false positive findings that may otherwise result from tests involving multiple variables or comparisons [25][26][27]. Although some recent studies on SNP interactions have exploited the MDR method [25][26][27][28][29][30][31], most of them has been focused on human and animal complex diseases and related traits. To date, no investigations of SNP interactions involving soybean-related traits have been conducted. This study is on the strength of a high-density genetic map for soybean which is based on specific length amplified fragment sequencing. We set the experiment in 23 environments in order to explore the stable locus across the multiple environments. MDR method was employed to analyze epistatic interactions between SNPs for oil content in soybean, the results on SNP interactions underlying genetic mechanisms of quantitative traits and may help improve soybean quality traits.

Plant Materials
An F 2:10 -F 2:20 population of 147 RILs was advanced by single-seed descent from crosses between two soybean cultivars: 'Charleston' (♀), (American cultivator) [32], and 'Dong- and 2008-2012, respectively. The timing and frequency of cultural management procedures used for the trials were in accordance with normal production practices for the respective environments.

Measurement of Soybean Oil Content
Oil was analyzed from seeds obtained from five randomly harvested plants of each line per plot. The seed oil phenotypic data was analyzed using the FOSS Infratec TM1241 Grain Analyzer (FOSS Tecator AB, Hoeganaes, Sweden). The oil content of each RIL line was averaged from the data of two or three plot replicates. Two hundreds grams seeds of each replicate were used to analysis at a 13% moisture basis.

Genotyping and Genetic Map Construction
SNP genotype data for the RIL population was detected based on the specific-locus amplified fragment sequencing (SLAF-seq) method following Qi et al. [33] and used to construct a highdensity genetic map of 5,308 markers assigned to 20 linkage groups. The map spanned 2294.433 cM, with an average inter-marker distance of 0.43 cM. The percentage of gaps in which the distance between adjacent markers and markers is smaller than 5 cM (gap 5) is 99.84%.

Interaction Analysis
To identify SNP × SNP effects in this study, we used the MDR method [25-31], The Pearson chi-squared module was used to assess significance (p < 0.001), with the selected optimization model based on the maximum Pearson chi-squared statistic combined with cross-validation [30].
The chi-square statistic measures the association between genotype (high-risk and low-risk group) and affection status (case and control group) in a two-way table. It is calculated as sum of the square of the difference between the observed and expected frequency in each combination, divided by the expected value, across all the combinations: The calculation was based on the value of chi-square. Average Pearson chi-square of 10 time 10-fold cross-validation were used to select optimization model [30]. The p value was calculated referred the Bonferroni correction [34][35]. The epistatic interaction values and their contribution to genetic values and variance were calculated following the method proposed by Cheverud et al. [36]. Using contribution rate represent the phenotypic variation of the SNP-SNP interaction pair.

Phenotypic Variation in the RIL Population under Multiple Environments
Oil content datas of the RIL population in 23 environments are shown in Table 1. The differences between the two parents were significant in parts of the 23 environments and the oil content of Charleston and Dongnong 594 ranged from 19.03 to 21.89% and from 18.96 to 21.83%, respectively. Oil content displayed a wide range of variation across the 23 environments, but the range of variation was narrow between different locations in the same year. The mean values of the RIL population ranged from 19.07 to 20.99%. The coefficient of variation ranged from 0.01 to 0.06, with a standard deviation of 0.36 to 1.39. Overall, the oil content of the RIL population was continuously distributed and consistent in all environments (Table 1).

Pairwise Interaction Analysis
Analyze the oil content and genotype of 23 environments by the method of MDR. The selection criteria of SNP interaction pairs was the p <3.55×10 −11 which was calculated based on the Bonferroni correction (Table 2). In total, the available interaction pairs have been detected in 12 of all environments, 36 (Table 2).
There were two and 1863 SNP interaction pairs were identified under 12 and 11 environments which have been detected the available interaction pairs (Fig 1). Among the 20 linkage groups, most of stable interaction pairs were from Gm01 to others, including Gm01, Gm05, Gm06, Gm07, Gm08, Gm09, Gm10, Gm13, Gm18 and Gm20. Two SNP interaction pairs detected across 12 environments showed the interaction effect between Gm01 (Mark673223) and Gm05 (Mark310931, Mark312588), furthermore, 1863 SNP interaction pairs detected across 11 environments showed the strong interaction region at Gm01, Gm05, Gm06, Gm07, Gm09, Gm10, Gm13 and Gm18 (Fig 1). These 1865 SNP interaction pairs were detected stable across environments which almost 50% of all, so we saw them as the stable interaction pairs and used for next step calculation.

Epistatic Value and Contribution Rate Analysis
The epistatic interaction values and their contribution to genetic values and variance were screened by the two way ANOVA test for each environment at significant level of p<0.01. In total, there were 560 of 1865 interaction pairs were screened, 4, 23, 99 and 344 SNP interaction pairs were detected as stable and effective interaction pairs across 5, 4, 3 and 2 environments respectively (Fig 2), which similar as the environment-universal QTLs, defined as QTLs associated with traits of interest in at least one macro-environment (i.e., year-location combination) [37,38], here, it could be the environment-universal SNP interaction pairs, and there were no significance for the other SNP interaction pairs. Epistatic value and contribution rate data of stable interaction pairs across all environments are shown in S1 Table. Epistasis value was ranged from 0.01 to 0.89 at the p<0.01 level. Contribution rates of epistatic pairs were ranged from 0.01 to 0.85 at the p<0.01 level (S1 Table). A compairson of the environmentally stable SNP interaction pairs uncovered in this study with those detected in previous research revealed none that matched completely. In a few cases, however, a member of an interaction pair matched a main effect QTL not previously  [46]. On Gm20, 24.6Mb and 36.5Mb have been detected by Shibata et al. [53] and Reinprecht et al. [49] respectively.

Discussion
Soybean oil content is an important quantitative trait under complex genetic control. Understanding the oil content locus interaction mechanisms could help us effectively increase the oil content in soybean. In this study, MDR method was employed to identify the stable loci controlling oil content in soybean that is based on a high-density genetic map, which has been constructed by specific-locus amplified fragment sequencing (SLAF-seq) with 5308 markers on 20 linkage groups and the map was 2294.43 cM in length, with an average distance of 0.43 cM between adjacent markers [33]. Epistasis is a biochemically plausible feature of the genetic architecture of quantitative traits. So, epistasis causes hidden quantitative genetic variation in natural populations and could be responsible for the small additive effects [4]. In quantitative genetics, epistasis refers to any statistical interaction between genotypes at two or more loci [56][57]. The role of epistasis in the genetic architecture of quantitative traits has been controversial since early formulations of quantitative genetic theory [58][59]. Numerous methods to detect epistatic interactions between SNPs have been published, such as exhaustive algorithms [60] and MDR [25][26][27][28][29][30][31][32][33], regression [61][62][63], heuristic [64][65] and mutual information [66], among others [60,[67][68][69][70].
Currently, many studies are based on multifactor dimensionality reduction (MDR) focus on the complex human diseases and animal-related traits. Guia et al. [71] reveals interaction of important gene variants involved in allergy by MDR. Su et al. [72] applied the multifactor dimension reduction method to analyze gene-gene and gene-environmental interactions of childhood asthma. Gui et al. [73] detected gene-gene interactions with application to the genetic analysis of bladder cancer susceptibility with multifactor dimensionality reduction method. Cho et al. [74] using MDR method find out that a two-locus interaction genes among 23 loci in the candidate genes of Type 2 diabetes. Julia et al. [75] suggest that 13 genes is associated with synovial fibroblast response to rheumatoid arthritis proinflammatory stimulus by MDR. Many studies using the MDR method have focused on complex human and animal disease-related traits to identify the epistatic interactions [25][26][27][28][29][30][31][71][72][73][74][75], to our knowledge, no study have applied MDR to analyze soybean quantitative traits. In this study, we identified 1865 stable SNP interactions pairs associations with soybean oil content under multi-environments, successfully applied MDR to the plant research. Among 1865 SNP interactions pairs, 2 pairs were detected stable across 12 environments, 1863 pairs were detected stable across 11 environments. MDR method can improve the analysis efficiency as an analysis method based on haplotype, which is a good choice for genome-wide interaction analysis. Nevertheless, this method also has some restrictions. First, it does not have a way to adjust for covariate effects such as age, gender and smoking status, an often necessary step to obtain an unconfounded SNP interaction outcome association [76]. Second, if there are too many covariates, this approach may over fit the data and sometimes fail due to limited sample size [77].
The present study has opened the door for further research on soybean quantitative trait integration analysis. On the one hand, the SNP-SNP interaction of hot region which were identical with some published major QTLs, however, those major QTLs were not shown the interaction each other previously. On the other hand, the candidate major region could be found as only one site of the pair was aligned with the published QTL. Based on this research, the markers of those hot region could be developed, and used for breeding selection process, furthermore, the candidate genes of hot region could be identified and to analysis the interaction of genes in next step. Future efforts based on the uncovered epistatic interaction values and their contribution to genetic values can be applied for the identification of key loci with higher phenotypic contributions, the prediction of phenotype and genotype design modules with epistatic values, and to provide direction for high-quality soybean breeding.
Supporting Information S1 Table. Epistatic value and contribution rate data of stable interaction pairs across 11 environments. NS means no signification value in that environments, epistatic value indicates the epistatic effects value of the SNP-SNP pairs, contribution rates indicates the phenotypic variation of the SNP-SNP interaction pairs.