Genome-wide association and epistatic interactions of flowering time in soybean cultivar

Genome-wide association studies (GWAS) have enabled the discovery of candidate markers that play significant roles in various complex traits in plants. Recently, with increased interest in the search for candidate markers, studies on epistatic interactions between single nucleotide polymorphism (SNP) markers have also increased, thus enabling the identification of more candidate markers along with GWAS on single-variant-additive-effect. Here, we focused on the identification of candidate markers associated with flowering time in soybean (Glycine max). A large population of 2,662 cultivated soybean accessions was genotyped using the 180k Axiom® SoyaSNP array, and the genomic architecture of these accessions was investigated to confirm the population structure. Then, GWAS was conducted to evaluate the association between SNP markers and flowering time. A total of 93 significant SNP markers were detected within 59 significant genes, including E1 and E3, which are the main determinants of flowering time. Based on the GWAS results, multilocus epistatic interactions were examined between the significant and non-significant SNP markers. Two significant and 16 non-significant SNP markers were discovered as candidate markers affecting flowering time via interactions with each other. These 18 candidate SNP markers mapped to 18 candidate genes including E1 and E3, and the 18 candidate genes were involved in six major flowering pathways. Although further biological validation is needed, our results provide additional information on the existing flowering time markers and present another option to marker-assisted breeding programs for regulating flowering time of soybean.


Introduction
A genome-wide association study (GWAS) is one of the promising approaches for the identification of genomic variants responsible for specific phenotypes [1]. With the introduction of PLOS  Korea (96), the USA (60), China (59), and Japan (32), and comprised 335 improved cultivars (ICs), 2,175 landraces (LRs), and 152 unknown cultivars (UCs) (S1 Dataset). These accessions belonged to the following seven maturity groups: I (2), II (31), III (60), IV (187), V (88), VI (16), and VII (3); however, the maturity group of most of the Korean accessions (2,275) was not reported. After genotyping, 180,961 SNP markers were detected, and haplotype phasing and imputation were conducted using BEAGLE version 3.3.1 [46]. Then, 78,427 SNP markers with a minor allele frequency (MAF) > 0.05 were obtained. The non-MAF filtered SNPs were used to examine the genomic structure and relationship, and the MAF filtered SNPs were used to perform the GWAS. The genotype data of all 2,662 accessions, which were generated in collaboration with the RDA, is available at http://k-crop.kr and https://github.com/ kyounghyoun/Soybean_epistasis [47]. Phenotypic evaluation was conducted in the experimental field of the National Institute of Crop Science (NICS, Jeonju, Korea) (35˚50'26.7" N, 1270 2'42.7" E), and the DTF trait was measured from June to October in 2014. Because of an agreement on limited disclosure with our phenotypic evaluation team, the raw DTF values are provided in the distribution plot and categorical data (Fig 1f, S1 Dataset).

Analysis of the genomic structure of and relationship among soybean accessions
Phylogenetic analysis was conducted based on the identity-by-state coefficient matrix calculated using Plink v1.90b [48]. The phylogenetic tree was reconstructed using the BIO-neighbor-joining algorithm [49] and FigTree v1.4.3 (http://tree.bio.ed.ac.uk/software/figtree/) (Fig  1a). Structure analysis was conducted using fastSTRUCTURE v1.0 [50], based on a variational Bayesian framework (Fig 1b). The number of genetic clusters (K) was set at six, considering the error values of 10-fold cross-validation (S1 Fig). The error value was the lowest at K = 6. Principal component analysis (PCA) was performed by applying singular value decomposition to the distant matrix calculated using the Kimura two-parameter model [51], and then displayed using the PC axes 1, 2, and 3 (Fig 1c and 1d). The linkage disequilibrium (LD) pattern was computed using PopLDdecay v3. 2 [52]. The mean value of LD was calculated within 100 and 500 kb regions (S1 Table), and the degree of LD up to 500 kb is displayed in Fig 1e. Nucleotide diversity (π) was calculated using a 10 kb slide size with a 100 kb window size using VCFtools v4. 2 [53], and inbreeding coefficient (F) was estimated using the method of moments in VCFtools (S1 Table). Additionally, using the same software, fixation index value (Fst) was calculated using a 10 kb slide size with a 100 kb window size.

GWAS
An association study between genomic regions and flowering time was conducted using a compressed mixed linear model within Genomic Association and Prediction Integrated Tool [54]. The MAF-filtered 78,427 SNP markers (MAF > 0.05) was used to consider common markers (S2 Dataset), and a statistically significant cut-off value was adopted to −log (p-value) = 7. To minimize false-positive results, the cut-off was set slightly higher than the Bonferonni-corrected −log(p-value) of 6.2. The extent of model fitting was confirmed using a quantile-quantile (Q-Q) plot for the expected and obtained p-values. After GWAS based on SNP markers, gene annotations were performed using SnpEff v4. 3 [55]. All results are summarized in S3 Dataset, and genome-wide plots including the Q-Q plot are shown in Fig 2.

SNP marker set for epistatic analysis
To construct an SNP marker-set for multilocus epistatic interaction analysis, a biological filtering approach was conducted. First, genes affecting DTF and maturity in Arabidopsis were   , and a total of 5,534 SNP markers (excluding 6 overlapping SNP markers) was constructed as the final SNP marker-set for the multilocus epistatic analysis. All results, including the number of SNP markers per gene, are summarized in S4 Dataset, and a brief summary is provided in Table 1. . The threshold for the posterior probabilities on the dependency for multilocus and phenotype was set at 0.5. A Markov Chain Monte Carlo approach was then used to search potential epistatic marker sets, and a Bayesian computational approach was utilized to detect final high-order epistatic market sets. All results of multilocus epistatic interactions are summarized in Table 2, and displayed in Circular-Manhattan plots in Fig 3. To more complement the multilocus epistasis, two-locus epistatic interactions were investigated to the BHIT results. The two-locus epistatic test was conducted using a fixed linear regression model of the Plink v1.90b [48], and the resulting p-values are summarized in Table 2.

Genomic structure of cultivated soybeans
A large data set of 2,662 cultivated soybean accessions, originated from Korea (2,511), the USA (60), China (59), and Japan (32), were used in this study. These accessions included 335 ICs, 2,175 LRs, and 152 UCs (S1 Dataset). Korean accessions accounted for approximately 94.33% of the entire soybean collection, and covered with most of the soybean accessions in Korea. These 2,662 accessions were genotyped using the Axiom 1 180k SoyaSNP array developed by our team in 2016 [45], and a total of 180,961 SNP markers were identified. These SNP markers represented 39,674 soybean genes, with an average number of 2.80 SNP markers per gene. Prior to GWAS, the genomic structure of 2,510 accessions (2,175 LRs and 335 ICs, excluding 152 UCs) was examined using the SNP markers (Fig 1). These 2,510 accessions represented a single large population with considerable genomic diversity. A phylogenetic tree was constructed to radial forms rather than branched forms with distinct clades, and LRs and ICs exhibited a mixed aspect with each other (Fig 1a). Structure analysis revealed similar genomic composition and proportion between LRs and ICs at K = 2 (Fig 1b; green and blue). With the increase in the value of K, the entire IC group showed a unique genomic composition, which slightly distinguished from the LR group; however, each accession in the IC group exhibited an aspect of inclusion in the LR group, with varying levels of genomic composition. The results of PCA showed that the LR and IC groups formed a single group with large diversity, and the IC group overlapped with the LR group, consistent with the results of structure analysis (Fig 1c  and 1d). Average values of LD, π, and DTF showed differences between the IC and LR groups. Due to the impact of conscious selection for improvement, the IC group showed slightly higher average values of LD (0.121583) (Fig 1e, S1 Table), and lower π (2.22E-5) and DTF (42.16) (Fig 1f, S1 Dataset) than the LR group (0.112253, 2.39E-5, and 45.86, respectively). Values of inbreeding coefficient were all high (IC, F = 0.9710; LR, F = 0.9731) because of the characteristics of inbreeding and stringent cleistogamy (S1 Table). Also, a numeric value Fst, supported the results of various population analyses. The Fst between the IC and LR groups was 0.0581 and was small enough to be regarded as a single group. Based on these results, a single dataset consisting of all 2,662 accessions was constructed, and used for subsequent GWAS and epistatic analysis. The results of PCA indicated that 132 UCs overlapped between the LR and IC groups; these accessions were included in GWAS to reflect as much genomic diversity of cultivated accessions as possible.

GWAS of DTF
Of the 180,961 SNP markers identified in 2,662 accessions, 78,427 SNP markers with MAF > 0.05 were selected as common SNP markers for GWAS. These SNP markers were distributed on all 20 soybean chromosomes and represented 26,162 genes, with an average number of 2.07 SNP markers per gene (S2 Dataset). Then, the association between these SNP markers and DTF was examined using a compressed mixed linear model that adjusts population structure and kinship (Fig 2, S3 Dataset). Statistically significant cut-off was set at −log(p-  value) = 7, which was higher than the Bonferroni-adjusted p-value = 6.2, to minimize falsepositive results. A total of 132 significant SNP markers including 93 genic SNP markers were detected, and all mapped to chromosomes 6 and 19 (Fig 2a). Among these 93 significant genic SNP markers, 78 SNP markers mapped to 51 soybean genes on chromosome 6 (Fig 2b), and 15 SNP markers mapped to eight genes on chromosome 19 (Fig 2c).
The two major DTF-related genes, E1 and E3, were identified from the 59 significant genes. The E1 gene has the largest influence on the determination of DTF [37,63] and affects vegetative development by regulating multiple genes related to plant growth [64]. The E1 gene was located at the 20,207,322 bp position on chromosome 6 (Glyma.06G207800), and showed a considerably significant −log(p-value) of 14.23. The E3 gene regulates DTF under short-day and long-day conditions, and affects plant maturity [65]. This gene was positioned at 47,638,344 bp on chromosome 19 (Glyma.19G224200) and exhibited a significant −log(pvalue) of 9.75. Two additional major genes affecting DTF, E2 and E4 [66,67], were not detected as significant genes in this study. The E2 gene was located at nucleotide positions 45,295,453, 45,295,508, 45,296,750, and 45,300,271 bp on chromosome 10 (Glyma.10G221500), but all showed non-significant p-values of 2.00, 2.43, 2.31, and 2.43, respectively (Fig 2a). Also the E4 gene was positioned at 33,236,286, 33,236,286, and 33,241,589 bp on chromosome 20 (Glyma.20G090000), but all were excluded from the GWAS because of low MAF (0.000187, 0.000909, and 0.002441, respectively).

Epistatic interaction among DTF-related genes
Interaction effects on DTF among significant SNP and non-significant SNP markers were examined to complement GWAS single-variant-additive-effect. Significant SNP markers consisted of 93 genic SNP markers belonging to 59 soybean genes derived from the GWAS, as described above. To focus on interactions among DTF-related genes, 5,441 non-significant SNP markers in 2,184 soybean genes (Table 1) were selected from a set of 356 DTF-related Arabidopsis genes on the Axiom 1 180k SoyaSNP array (S4 Dataset) (see Materials and methods for details). In the 356 Arabidopsis genes, a total of 2,188 homologous soybean genes were identified, but four genes (Glyma.06G205700, Glyma.06G207800-E1, Glyma.06G221000, and Glyma.19G224200-E3) with six SNP markers were excluded from the non-significant SNP marker-set since these markers were already included in the significant marker-set in GWAS. Also, when detecting homologous genes, the non-MAF filtered SNPs were used in order to consider various DTF-related genes which were excluded from the GWAS due to MAF < 0.05, such as E4 gene (Glyma.20G090000). Finally, a total of 5,534 SNP markers belonging to 2,243 soybean genes were selected and used for analyzing multilocus epistatic interactions. The results revealed five interaction groups (G1-G5) containing two significant and nine non-significant genes (Fig 3a) and two interaction groups (G6, G7) containing only eight non-significant genes (Fig 3b, Table 2).

Pathways of epistatic genes related to DTF
To identify the degree of involvement in DTF, two significant and 16 non-significant genes in all interaction groups were examined on DTF-related pathways. These 18 candidate genes were involved in the following six major DTF-related pathways [56]: ambient temperature (AM), autonomous (AU), flowering pathway integrator (FPI), photoperiod (P), floral meristem identity (FMI), and vernalization (V) (Table 2, Fig 4). The V and P pathways control the overall flowering process [89] and the AM pathway regulates flowering time [90,91], as external factors. The AU pathway promotes induction of flowering as an internal factor [89]. The FPI pathway regulates the other pathways and triggers the induction of FMI [89], and the FMI pathway induces flowering. The P pathway contained eight candidate genes including the E1 and E3 genes; A and AM pathways contained two candidate genes each; AU and FPI pathways contained four candidate genes each; and the FMI pathway contained three candidate genes. These results confirmed that all 18 candidate genes are involved in major DTF-related pathways, and presented that they have a wide effect on DTF while interacting with each other.

Discussion
The Axiom 1 180k SoyaSNP array was developed by our team mainly based on Korean soybean accessions, with the aim to perform GWAS using a large number of high-density markers [45]. Of the 2,662 soybean accessions used in this study (S1 Dataset), 94% were of Korean origin and therefore suitable for genotyping with the Axiom 1 180k SoyaSNP high-density array, generating 180,961 SNP markers. These 2,662 soybean accessions mainly comprised two subgroups, LRs and ICs, and have a tendency to form a single large population with considerable genomic diversity (Fig 1). The Fst value, a numeric value indicating population structure, was also sufficiently small (0.0581) for the two subgroups to be considered as a single group (S1 Table). Based on the results of population structure analyses, we conducted GWAS on all 2,662 soybean accessions and identified 59 soybean genes, including E1 and E3, as candidates associated with DTF (Fig 2, S2 and S3 Datasets).
Genes E1-E10 [31][32][33][34][35][36][37][38][39][40][41], J [42], and Dt1 [43,44] are known as major DTF-related genes in soybean. Among these, genes E1-E4 act as the major determinants of DTF, as these genes account for approximately 62-66% of variation in DTF [40,57]. The E1 gene has the largest effect on DTF and photoperiod sensitivity [37,63] through regulations of GmFT2a, GmFT5a, and GmFT1a genes [92,93]. The E2 gene enhances photoperiod response and contributes to early flowering by regulating GmFT2a [66,94]. Genes E3 and E4 are involved in the regulation of E1 and affect DTF through the regulation of the far-red light ratio [65,67]. Above all, E1 functions as a key regulator of DTF in soybean [95]. In our GWAS, E1 and E3 genes showed significant p-values, whereas E2 and E4 genes were not detected (Fig 2). The E2 gene showed a non-significant p-value and the E4 gene was excluded from the GWAS because of considerably low MAF of SNPs. According to previous studies, the E1-E4 genes in soybean are considerably affected by habitat ecology and latitude [57, [96][97][98]. The soybean population used in this study consists mostly of Korean accessions and therefore reflects the Korean environment. Korean soybeans are also reported to possess high genomic diversity, unlike soybean accessions from other regions [99][100][101]. Considering the genomic characteristic of our population, the GWAS result suggests that E1 and E3 genes play a significant role in the regulation of DTF trait in Korean soybean population.
In addition to GWAS results, the interaction effects between 59 significant genes and the remaining non-significant genes on DTF were examined ( Table 1). Identification of epistatic interactions could help in the detection of a greater number of candidate genes, since the  Table 2).
https://doi.org/10.1371/journal.pone.0228114.g004 complex DTF trait is affected by interactions among multiple genes [29,102]. In our epistatic analysis, non-significant genes were utilized as 2,188 soybean genes which are homologous genes to 356 DTF-related Arabidopsis genes (S4 Dataset). This approach of marker reduction, called biological filtering, has been reported as one of the effective approaches for epistatic analysis [103,104], as it allows the overcoming of computational limitations caused by a considerable number of marker combinations [105]. Therefore, we selected 356 Arabidopsis genes based on previous reports on DTF and maturity (S4 Dataset), and used to analyze the interaction effects of various DTF-related genes on DTF. The results of epistatic analysis revealed seven interaction groups and 18 candidate genes with significant influence on DTF (Table 2); E1 and E3 genes, which were detected in GWAS, also played a major role in the interaction groups (Fig 3).
DTF is regulated by complex networks of biological processes [106,107], and is characterized to six representative pathways from vegetative to flowering stages in Arabidopsis [108,109]: P (response to day length and light), V (cold exposure), AU (flowering promotion), FPI (floral transition), AM (time of floral transition), and FMI (floral development) (Fig 4). To better understand the genetic effects of candidate genes on DTF, it is necessary to identify the extent to which these candidate genes are involved in the six major pathways. Among the 18 candidate genes identified in this study, eight genes including E1 (interaction groups G1-G3) are involved in P, FPI, FMI, and V pathways; four genes including E3 (G4 and G5) are associated with P, AU, and FPI pathways; and eight genes (G6 and G7) are related to P, V, AU, AM, FMI, and FPI pathways (Table 2, Fig 4). Thus, all of these 18 candidate genes were evenly distributed in the six major pathways and affected DTF while interacting with each other in the corresponding interaction groups. Overall, our results of GWAS and epistatic analysis suggest that these 18 candidate genes play a significant role in the regulation of DTF in soybean, at least in the Korean population.
However, our findings have several limitations. First, the DTF-related candidate markers identified in this study do not represent the worldwide soybean population. It is possible that these candidate markers are specific to Korean soybean accessions because 94% of the soybean population was of Korean origin. Second, the epistatic analysis was restricted to only 356 DTFrelated Arabidopsis genes. The 356 genes were used to reduce computational overload; however, markers excluded in the epistatic analysis could also have a significant effect on DTF while interacting with each other. Finally, our candidate markers have not been validated by biological experiments. To minimize this limitation, we examined the genomic characteristics of the soybean population before GWAS and used a conservative cut-off in GWAS, which was higher than the Bonferroni-adjusted p-value. Despite these efforts, our candidate markers need further experimental validation, but they are expected to present valuable information to soybean breeding programs aimed at improving the DTF trait.

Conclusion
DTF is an important agronomic trait relevant to plant growth, development, and productivity in soybean. This trait is regulated by complex biological processes and is affected by endogenous genetic factors as well as environmental cues. Many researches have demonstrated the effect of genotype × environment interaction on DTF; however, studies on epistatic interactions among genetic factors have rarely been reported. Although our study is restricted by the origin of soybean accessions and also requires further biological validation, candidate markers identified in this study provides not only additional information for understanding the DTF trait in soybean but also a valuable genetic basis for soybean breeding programs aimed at improving DTF.   (2)