Whole-Genome Quantitative Trait Locus Mapping Reveals Major Role of Epistasis on Yield of Rice

Although rice yield has been doubled in most parts of the world since 1960s, thanks to the advancements in breeding technologies, the biological mechanisms controlling yield are largely unknown. To understand the genetic basis of rice yield, a number of quantitative trait locus (QTL) mapping studies have been carried out, but whole-genome QTL mapping incorporating all interaction effects is still lacking. In this paper, we exploited whole-genome markers of an immortalized F2 population derived from an elite rice hybrid to perform QTL mapping for rice yield characterized by yield per plant and three yield component traits. Our QTL model includes additive and dominance main effects of 1,619 markers and all pair-wise interactions, with a total of more than 5 million possible effects. The QTL mapping identified 54, 5, 28 and 4 significant effects involving 103, 9, 52 and 7 QTLs for the four traits, namely the number of panicles per plant, the number of grains per panicle, grain weight, and yield per plant. Most identified QTLs are involved in digenic interactions. An extensive literature survey of experimentally characterized genes related to crop yield shows that 19 of 54 effects, 4 of 5 effects, 12 of 28 effects and 2 of 4 effects for the four traits, respectively, involve at least one QTL that locates within 2 cM distance to at least one yield-related gene. This study not only reveals the major role of epistasis influencing rice yield, but also provides a set of candidate genetic loci for further experimental investigation.


Introduction
Given the paramount importance in sustaining food demanding, great efforts have been made in large scale genetic research and extensive breeding programs in almost all rice (Oryza sativa L.) producing countries [1,2]. Gains in rice yield in recent decades are mainly owed to advancements in breeding technologies including selection of cultivars with higher productivity and significant increase of agricultural inputs such as fertilizers and insecticides [3]. While global environmental degradation has limited further yield increase through more agricultural inputs, studying the underlying biological processes of rice yield, and transferring the knowledge gains into improvement in breeding and agronomic productivity have become the key for further increase of food production [4].
Rice yield is determined by several factors including the number of panicles per plant, the number of grains per panicle and grain weight. These component traits and the overall yield per plant exhibit continuous variation since they are influenced by multiple genetic factors named quantitative trait loci (QTLs) and other environmental factors. Genetic markers such as restriction fragment length polymorphisms (RFLPs) [5] and simple sequence repeats (SSRs) [6] have been utilized to identify QTLs for understanding genetic basis controlling rice yield [7][8][9][10][11]. A recent study on QTL mapping for rice yield derived a high density single nucleotide polymorphism (SNP) bin map from genomic sequences obtained using deep sequencing technology, and demonstrated that such high density SNP bin map enabled to identify more QTLs with higher location precision than the traditional approach based on RFLP and SSR markers [12]. However, these studies attempted to identify QTLs individually via single interval mapping [5] or composite interval mapping with a small scan window [13], which had limited power of detection, given that many agronomic traits are controlled simultaneously by multiple QTLs and influenced by environmental factors [14,15].
Whole-genome marker QTL mapping employs a multiple QTL model that includes all available markers and evaluates effects of these markers simultaneously [16][17][18]. Such approach overcomes the limitations of the traditional single marker-based QTL mapping methods [16]. However, when genetic interactions are considered, a multiple QTL model can have a huge number of variables, which makes model inference very challenging. Early methods for multiple QTL mapping usually rely on Markov chain Monte Carlo (MCMC) simulation to fit a Bayesian model [16][17][18][19][20], which is computationally intensive and unpractical when a large number of markers are considered. Recently, more efficient and accurate methods have been developed [21,22], which make whole-genome marker QTL mapping feasible. With wholegenome marker QTL mapping considering main effects and interactions of all additive and dominance effects simultaneously, contributions of numerous genetic effects to rice yield can be assessed.
In this study, we applied our empirical Bayesian least absolute shrinkage and selection operator (EBlasso) method [21,22] to whole-genome QTL mapping for an elite indica rice hybrid, Shanyou 63 [7,23]. Our EBlasso model includes additive and dominance main effects of 1,619 markers, all additive 6 additive interactions, additive 6 dominance interactions, dominance 6 additive interactions, and dominance 6 dominance interactions, with a total of more than 5 million possible effects. The quantitative traits considered in this study include yield per plant and three yield component traits, namely the number of panicles per plant, the number of grains per panicle and grain weight. We will demonstrate that our EBlasso identifies a number of QTLs, most of which are involved in digenic interactions, and coincide with or are close to experimentally investigated genes related to yield.

Results
Four quantitative traits including three rice yield component traits (the number of panicles per plant, the number of grains per panicle and grain weight) and overall yield per plant were analyzed using the EBlasso method. The full QTL model includes main additive and dominance effects of 1,619 markers and all their pair-wise interactions, with a total of k = 5,242,322 variables (see the Materials and Methods section for the genetic map). To understand the performance gain of the full model, we also performed QTL mapping for the four traits with a QTL model including k = 3,238 main effects, which is referred to as the main effect model.
We estimated the phenotypic variance explained by a particular  variance of the coefficient of QTL j and the total phenotypic variance s 2 y was estimated from the data. To estimate the total variance explained by all identified QTLs, we refitted the data to an ordinary linear regression model that includes variables corresponding to the identified QTLs. The phenotypic values were predicted from the linear regression model asŷ y, and the total phenotypic variance explained by all identified QTLs was calculated as h 2~v ar(ŷ y) var(y)~v .

QTL mapping for the number of panicles per plant
The three-step cross validation (CV) procedure (detailed in the Materials and Methods section) for the full model identified the optimal pair of parameters as (a, b) = (0.5, 0.5) (Table S1 in File S1). Using the optimal values of (a, b), the EBlasso algorithm shrunk most of k variables to zero and yielded a QTL model with 111 nonzero effects. The statistical test, described in the Materials and Methods section, for each nonzero effect identified 54 significant effects at a p-value #0.01 (Table 1). Among them, one was main additive effect, 18 were additive 6 additive interaction, 32 were additive 6dominance interaction, and three were dominance 6dominance interaction. The 54 effects involved 103 QTLs and explained 94.05% of the total phenotypic variance.  We did a literature survey and found 99 genes with known genomic locations that had experimental evidence showing that they were related to rice yield and yield component traits. For each of the 103 QTLs, we identified genes from 99 experimentally investigated genes that were within 20 centi-Morgan (cM) distance and associated such genes with the QTL. In total, we found 58 genes for 103 QTLs. For the ease of presentation, we organized QTLs within 20 cM distance into a group, which resulted in 51 groups for 103 QTLs. These 51 QTL groups and associated genes are listed in Table S2 in File S1. It is seen that 36 groups of QTLs have at least one associated gene and the distances between QTLs and their associated genes are relatively small (median distance 5.37 cM). Moreover, 21 QTLs involved in 19 of 54 effects locate within 2 cM distance to at least one gene influencing rice yield. The interaction network of the 103 QTLs and their associated genes are visualized in Figure 1.
The three-step CV for the main effect model identified the optimal pair of parameters as (a, b) = ( 20.01, 0.5) ( Table S1 in File S1), with which eight additive and two dominance effects involving ten QTLs were identified with a p-value #0.01 ( Table 2). The ten effects totally explained 39.76% of the phenotypic variance, and nine of them had genes related to yield within 20 cM distance (median distance 9.29 cM) ( Table S3 in File S1). Seven QTLs were identical to QTLs or within the QTL group identified from the full model (Bins 228, 353, 757, 861, 908, 994, 1363), and the other three (Bins 3, 461 and 818) were close to QTLs identified by  Table 3. Estimated QTL effects from the full model for the number of grains per panicle. QTL mapping for the number of grains per panicle The CV analysis identified the optimal pair of parameters (a, b) = (0.05, 0.1) for the full QTL model for the number of grains per panicle (Table S4 in File S1), with which EBlasso identified five nonzero effects. All of these nonzero effects were significant at a pvalue #0.01 (Table 3), including one main additive effect and four additive 6 dominance interactions. The five effects involved nine QTLs, and explained 46.51% of the overall phenotypic variance. Eight of the nine QTLs have experimentally verified genes related to rice yield within 20 cM distance (median distance 4.86 cM) ( Table S5 in File S1). Moreover, four of these QTLs involved in four effects locate within 2 cM distance to at least one yield-related gene. The interaction network of the nine QTLs and their associated genes are depicted in Figure 2.
The same three-step CV for the main effect model identified the optimal pair of parameters (a, b) = ( 20.4, 0.5) ( Table S4 in File S1), with which five additive effects were identified, all having a pvalue #0.01 (Table 4). The five QTLs (Bins 43, 436, 877, 1006, 1057) totally explained 41.48% of the phenotypic variance, and all had molecularly characterized genes related to rice yield within 19 cM distance (median distance 1.59 cM) ( Table S6 in File S1). All five QTLs were identical or very close to the QTLs identified from the full model. Specifically, Bin436 and Bin1057 were identified in both models; Bin43 is 3.40 cM away from Bin50 identified from the full model; Bin877 is 0.47 cM away from Bin875 identified from the full model; and Bin1006 is 0.72 cM away from Bin1004 identified from the full model. Comparing the results obtained from the two models, we observed that although both models identified five effects, the full model identified four more QTLs and explained a slightly larger percentage of phenotypic variance. Moreover, the main effect model identified five additive effects, but the full model identified QTLs with both additive and dominance effects.

QTL mapping for grain weight
The CV analysis determined the optimal (a, b) = (1, 1) (Table S7 in File S1) for the full QTL model for grain weights. Using the optimal a and b, EBlasso yields a QTL model including 89 nonzero effects, among which 28 effects were identified as significant at a p-value #0.01 (Table 5). Among them, one was a main additive effect, 10 were additive 6 additive, 15 were additive 6dominance, and two were dominance 6 dominance interactions. The 28 effects involved 52 QTLs, and explained 93.79% of the phenotypic variance. QTLs with a distance # 20 cM were placed into a group, resulting in 32 groups, and 26 of the 32 QTL groups had at least one gene within 20 cM distance (median distance 5.06 cM) ( Table S8 in File S1). Moreover, 15 QTLs involved in 12 of 28 effects locate within 2 cM distance to at least one yield-related gene. The interaction network of the 52 QTLs and their associated genes are shown in Figure 3.
The CV analysis for the main effect model identified the optimal pair of parameters (a, b) = (1, 1) ( Table S7 in File S1), with which 26 QTLs (19 additive and 7 dominance effects) were identified with a p-value #0.01 ( Table 6). The 26 QTLs totally explained 84.24% of the overall phenotypic variance, and 23 of them had molecularly characterized genes related to rice yield within 16 cM distance (median distance 4.08 cM) ( Table S9 in File S1). Twenty three of the 26 QTLs were identical to or within a QTL group identified from the full model, but three QTLs (Bins 228, 843, and 894) do not correspond to any QTLs identified from the full model within 20 cM distance. Again, the full model identified more QTLs than the main effect model and the QTLs detected by the full model explained more phenotypic variance than those detected by the main effect model.

QTL mapping for yield per plant
The CV analysis determined the optimal pair of parameters (a, b) = (1, 1) for the full QTL model for rice yield (Table S10 in File S1). Using the optimal values of (a, b), EBlasso yielded four nonzero effects, all were significant at a p-value #0.01: one main additive effect, one additive 6 additive interaction, one additive 6dominance interaction, and one dominance 6 dominance interaction (see Table 7). The four effects involved seven QTLs and explained 34.01% of the overall phenotypic variance. Five out of the seven QTLs have an experimentally verified gene within 15 cM distance (median distance 2.21 cM) (Table S11 in File S1). Moreover, two QTLs involved in two of four effects locate within 2 cM distance to at least one yield-related gene. The interaction network of the seven QTLs and their associated genes are described in Figure 4.
The optimal pair of parameters determined by the CV analysis for the main effect model was (a, b) = (20.5, 0.1) (Table S10 in File S1), with which four QTLs with a p-value #0.01 were identified ( Table 8). The four QTL effects explained 23.79% of the phenotypic variance, and all had at least one gene within 17 cM distance (median distance 7.82 cM) (Table S12 in File S1). Two of the four QTLs (Bin1014, Bin1057) were identical to the QTLs identified from the full model, but the other two QTLs do not correspond to any QTL identified from the full model within 20 cM distance. Overall, although the full model did not detect all QTLs identified by the main effect model, it still detected more QTLs and explained more phenotypic variance. Effect types and pleiotropic genes Among the five types of effects (main additive, main dominance effects, additive 6 additive, additive 6dominance, and dominance 6 dominance interactions) considered in the EBlasso full models for four traits, no main dominance effects was detected, but several dominance 6 dominance interactions (one for rice yield, three for the number of panicles per plant, and two for grain weight) were identified. Many additive 6dominance interaction effects were identified, including one for rice yield, 32 for the number of panicles per plant, four for the number of grains per panicle, and 15 for grain weight. Phenotypic variance explained by a single effect is relatively small for all traits (Tables 1, 3, 5 and 7). For example, the largest effect hasĥ h 2 = 7.82% (908 _dominance 6994 _additive ) for the number of panicles per plant, 8.53% (595 _dominance 61004 _additive ) for the number of grains per panicle, 15.48% (729 _additive ) for grain weight, and 6.08% (1057 _additive 6 1144 _dominance ) for yield per plant. Each main effect detected by the main effect model also explained a small percentage of the total phenotypic variance.
Many molecularly characterized genes related to yield are known to play pleiotropic roles in regulating grain productivity [31]. Without surprise, a number of such genes coincide with or close to the QTLs that were identified by our EBlasso for multiple traits, although they did not necessarily have pleiotropic effects. For example, gene Ghd7, OsNRAMP5 and DEP2 are close to several QTLs common for the four phenotypes, qSW5/GW5, OsEF3 and LOG are near the QTLs for three phenotypes except the number of grains per panicle, and Gn1a, OsJAG, GS3, OsJMT1, OsSPL14, GW8/OsSPL16, SGL1 are associated with QTLs for three phenotypes except yield per plant. Besides Ghd7, OsNRAMP5 and DEP2, gene FZP, OsSDR, and OsFAD8 was near QTLs for both yield per plant and the number of grains per panicle; 14 genes were close to QTLs for both the number of panicles per plant and the number of grains per panicle; and 62 other genes were associated with QTLs for both the number of grains per panicle and grain weight. While the pleiotropic effect of some genes have been reported [32], our QTL mapping results identified a number of genes associated with multiple phenotypes, implying their possible pleiotropic role worthy of further experimental investigation. Moreover, it is also possible that the QTLs we detected may be closely linked to unknown genes, which, if identified, will yield more insight into the molecular basis of phenotypes [1].

Discussion
Due to its small genome and close relatedness with other grass crops, rice has served as a model plant for investigating genetic factors underlying crop productivity [33,34]. To date, more than 600 rice genes have been experimentally cloned with related traits including yield, biotic and abiotic stresses, grain quality, plant architecture, fertility, etc. [2]. However, there is still a knowledge gap regarding the molecular basis of yield-related biological processes [1], suggesting the importance of systematic tools that can enable to understand functional role of genes [2,35]. In this study, we employed a multiple QTL model that included all additive and dominance main effects of 1,619 markers, and all their pair-wise interactions with a total of more than 5 million possible effects, and then applied our EBlasso algorithm to identify QTLs for four agronomic related traits of rice, including yield, the number of panicles per plant, the number of grains per panicle and grain weight. Our QTL mapping revealed a number of QTLs for four traits, most of which are involved in digenic interactions. Moreover, most of these QTLs have at least one experimentally cloned gene within 20 cM distance.
The same set of markers in the recombinant inbred line (RIL) population where the ''immortalized F 2 '' (IMF 2 ) was derived from were used for QTL mapping, via a composite interval mapping method with a scan window size of five markers [12]. Upon development of the IMF 2 population, this dataset was obtained and the ANOVA method was applied to each pair of markers to identify both main and digenic interaction effects from 5,242,322 possible effects [24]. The composite interval mapping identified zero, three (Bin40, Bin446 and Bin1006), seven (Bins 49, 171, 439, 729, 928, 1008, and 1266), and one (Bin1007) QTLs for the  [24]. In contrast, our EBlasso method identified a reasonable number of effects and QTLs for each trait, and 35%-80% of identified effects for four traits involve at least one QTL that locates within 2 cM distance to at least one gene related to crop yield, which corroborates the reliability of the identified effects. The list of genes associated with the identified QTLs provides insight into rice yield with respect to yield component traits. First, the number of panicles depends on plant's ability of producing tillers, which is under genetic, developmental and environmental influence. While previous composite interval mapping did not identify any significant effect with the same set of markers in an RIL population [12], we have identified a set of QTLs that have nearby genes known to regulate plant tillering. For example, among genes in Table S2 in File S1, MOC1/SPA is the first gene characterized for rice tillering; it initiates axillary buds that grow into lateral braches [36]. OsTB1/FC1 has been identified as an important gene that negatively regulates lateral branching in rice [37]. OsSPL14 is a highly expressed gene in the shoot apex and primordial of primary and secondary branches, which promotes panicle branching while reducing tiller number [38]. Through gene mutations, D3, D10, D14, D17/HTD1, and D27 were found to affect tiller initiation and/or outgrowth [37]. Secondly, the number of grains per panicle is another important trait determining crop yield. While composite interval mapping identified three QTLs (Bin40, Bin446 and Bin1006) close to genes Gn1a, GS3, OsNRAMP5 and Ghd7, our EBlasso also identified these genes in addition to other 13 genes. Among them, FZP is known to control spikelet meristem identity [39], Ghd7 is a pleiotropic gene affecting grain number, plant height and heading date [40], GW8/ OsSPL16, PGL, and DEP2 all are known to be essential in regulating cell proliferation or elongation [41][42][43]. Thirdly, composite interval mapping detected seven QTLs (Bins 49, 171, 439, 729, 928, 1008, and 1266) for grain weight, with nearby genes Gn1a, LAX1, GS3, GS5, qSW5/GW5, OsJMT1, OsIAA23, Ghd7, OsNRAMP5, TAC1, LGD1 and SG1. In addition to these genes, our EBlasso identified many other genes with known effects in controlling grain weight. For example, GIF1 is a gene encoding a cell-wall invertase required for carbon partitioning during early grain filling, and overexpression of GIF1 leads to larger and heavier grain weight [44]. Genes SRS3 and SRS5 have been found to regulate seed cell elongation [45,46]. Over-expression of LRK1 gene results in enhanced cellular proliferation and increased grain weight [47]. Finally, yield per plant is the most complex trait and a small number of effects were identified compared with its component traits. While composite interval mapping identified only one QTL (Bin1007) with nearby gene Ghd7 and OsNRAMP5, our EBlasso identified this QTL and six other QTLs, four of which have cloned gene within 15 cM distance (Table S11 in File S1).
In conclusion, taking advantage of the powerful EBlasso model for simultaneously accounting for more than 5 million possible effects, we identified a number of QTLs for four traits of the elite rice hybrid Shanyou 63, a vast majority of which are involved in digenic interactions. This set of QTLs not only shed light on the genetic basis of the yield of the rice hybrid, but also provide candidate loci for identification of new genes that may be involved in crop yield.

Plant materials and QTLs
The genotype and phenotype data used in this study were obtained from previous studies [12,24]. The mapping plants were created by first crossing between indica rice Zhensha 97 and Minghui 63 [7] to produce the elite rice hybrid Shanyou 63 that was the most widely cultivated in China in 1980s -1990s [24]. Then a population of 240 F 9 RILs was derived from single-seed descent of Shanyou 63. Next, an ''immortalized F 2 '' (IMF 2 ) population consisted of 278 crosses was created by intercrossing RILs for QTL mapping study [7,23]. The crossed population was field tested on the experimental farm of Huazhong Agricultural University in Wuhan, China, in 1999, for traits including yield per plant, the number of panicles per plant, the number of grains per panicle and grain weight.
The RILs were genomic sequenced with an Illumina Genome Analyzer II using the bar-coded multiplexed sequencing approach as described in [25], and 270,820 high quality SNPs were identified. Bin maps were constructed by lumping consecutive SNPs with the same genotype into blocks, masking blocks with less than 250 kb to avoid false double recombinations, and merging recombination bins less than 5 kb, resulting in a map consisting of 1,619 bins without missing data [12]. Genotypes of the IMF 2 crosses were deduced according to genotypes of their RIL parents [24]. The three genotypes in each bin were coded as A and B for each parental homozygote genotype and H for the heterozygote. Using the recombinant bins as QTLs, a 1,625.5 cM genetic linkage map was constructed with about 1.0 cM (230 kb) in length per bin (Figure 1).

Bayesian Lasso linear regression model for multiple QTLs
We employed a Bayesian Lasso (BLasso) multiple linear regression model to infer genotypes and quantitative trait associations. The regression model includes main additive and dominance effects of 1,619 SNP bins and all their pair-wise interactions. Let y i be the phenotypic value of a quantitative trait of the ith individual in a mapping population. In this study we observed y i , i = 1, ???, n, of n = 278 individuals and collected them into a vector y = [y 1 , y 2 , ???, y n ] T . In these n individuals, let where m is the population mean, vectors b A and b D represent the main additive and dominance effects of all markers, respectively, and vectors b AA , b AD , b DA and b DD capture the additive 6additive, additive 6 dominance, dominance 6 additive, and dominance 6 dominance interactions, respectively. Matrices X A~½ x A1 ,x A2 , Á Á Á , x DAn T , and X DD~½ x DD1 ,x DD2 , Á Á Á ,x DDn T are the corresponding design matrices of different effects, and 1e is the residual error that follows a normal distribution with zero mean and variance s 2 0 I.
Given m markers, the size of matrix X A or X D is n6m, and the size of X AA , X AD , X DA , or X DD is n6q, where q = m(m21)/ 2 = 1,309,771. Defining b~½b T A ,b T D ,b T AA ,b T AD ,b T DA ,b T DD T , and X~X A ,X D ,X AA ,X AD ,X DA ,X DD ½ , we can write (2) in a more compact form: The size of matrix X is n6k, where k = 2m+4q = 5,242,322, and we apparently have k&n. However, we would expect that most elements of b are zeros and thus we have a sparse linear model. The Blasso model employs a three-level hierarchical prior distribution to model the sparsity. At the first level, letb j , j~1,2, Á Á Á ,k, follows an independent normal distribution with mean zero and unknown variance s 2 j : b jÑ N(0,s 2 j ). At the second level, let s 2 j , j = 1, 2, ???, k, follows an independent exponential distribution with a common parameter l: p(s 2 j )~lexp({ls 2 j ). At the third level, we assign a conjugate Gamma prior Gamma(a, b) with a shape parameter a and an inverse scale parameter b to the parameter l. Finally, we assign noninformative uniform priors to m and s 2 0 . The three-level hierarchical model has two hyperparameters (a, b) for adjusting the degree of shrinkage, and cross validation (CV) can be applied to choose appropriate values of these parameters. The QTL model (2) or equivalently (3) includes all main effects and digenic interactions. We refer to this model as the full model throughout the paper. We also performed QTL mapping with the model y~mzX A b A zX D b D ze, which is referred to as the main effect model, since it includes only the main effects.

Model inference and cross validation
The Blasso model can be inferred efficiently with the empirical Blasso (EBlasso) algorithm [21]. The EBLasso algorithm employs a coordinate ascent method to findŝ s 2 j , the estimate of s 2 j , j = 0 …, k, that maximizes the likelihood function of s 2 j , j = 0, …, k. In the iterative process, many s 2 j or equivalent b j are shrunk to zero. The coordinate ascent method along with other algorithmic techniques makes the EBlasso algorithm very efficient. Our previous studies demonstrated that EBlasso outperformed several other multiple QTL mapping methods including the empirical Bayes method [26], the Bayesian hierarchical generalized linear models (BhGLM) [27], HyperLasso [28], and Lasso [29]. Detailed description of the EBlasso algorithm can be found in [21,22] and an efficient C program with the R interface [30] implementing the EBlasso algorithm is available.
The optimal values of two hyperparameters (a, b) of the EBLasso algorithm were obtained with five-fold CV in three steps to minimize the prediction error (PE) calculated from PE~1 n P n i~1 (y i {ŷ y i ) 2 , whereŷ y i ,i~1, Á Á Á ,n, is the estimated phenotypic value. In the first step, a = b = 0.001, 0.01, 0.1, 1 were examined and a pair (a 1 , b 1 ) corresponding to the smallest PE was obtained. In the second step, b was fixed at b 1 and a was chosen from the set [20.9, 20.8, 20.7, 20.6, 20.5, 20.4, 20.3, 20.2, 20.1, 20.01, 0.01, 0.05, 0.1, 0.5, 1], which yielded a value a 2 corresponding to the smallest PE. In the third step, a = a 2 was fixed and b varied from 0.01 to 10 with a step size of one for b.1 and a step size of one on the logarithmic scale for b,1. Note that when fixing one of the two parameters, the degree of shrinkage is a monotonic function of the other parameter [21,22]. Therefore, in the second and third steps, the selection did not go through the full path but stopped if the current PE was one standard error larger than the minimum PE in previous steps.

Statistical significance test
One advantage of the EBLasso algorithm relative to Lasso [29] is that it not only outputs a k'|1 (k'%k) vectorb bas an estimate of nonzero elements of b, but also gives an estimate of the covariance ofb b,Ŝ S. LettingŜ S jj be the jth diagonal element ofŜ S, we can use the t-statisticsb b j =Ŝ S jj 1=2 to test ifb b j =0 at a certain significance level.

Supporting Information
File S1 Tables S1-S12. Table S1. Cross-validation for determining hyperparameters (a, b) used in QTL mapping for the number of panicles per plant. Table S2. Experimentally investigated genes near QTLs for the number of panicles per plant identified with the full model. Table S3. Experimentally investigated genes near QTLs for the number of panicles per plant identified with the main effect model. Table S4. Crossvalidation for determining hyperparameters (a, b) used in QTL mapping for the number of grains per panicle. Table S5.
Experimentally investigated genes near QTLs for the number of grains per panicle identified with the full model. Table S6.
Experimentally investigated genes near QTLs for the number of grains per panicle identified with the main effect model. Table  S7. Cross-validation for determining hyperparameters (a, b) used in QTL mapping for grain weight. Table S8. Experimentally investigated genes near QTLs for grain weight identified with the full model. Table S9. Experimentally investigated genes near QTLs for grain weight identified with the main effect model. Table S10. Cross-validation for determining hyperparameters (a, b) used in QTL mapping for yield per plant. Table S11.
Experimentally investigated genes near QTLs for yield per plant identified with the full model.