## Abstract

Although rice yield has doubled in most parts of the world since the 1960s, thanks to advances in breeding technologies, the biological mechanisms controlling yield remain largely unknown. To understand the genetic basis of rice yield, a number of quantitative trait locus (QTL) mapping studies have been carried out, but whole-genome QTL mapping incorporating all interaction effects is still lacking. In this paper, we exploited whole-genome markers of an immortalized F_{2} population derived from an elite rice hybrid to perform QTL mapping for rice yield, characterized by yield per plant and three yield component traits. Our QTL model includes additive and dominance main effects of 1,619 markers and all pair-wise interactions, with a total of more than 5 million possible effects. The QTL mapping identified 54, 5, 28 and 4 significant effects involving 103, 9, 52 and 7 QTLs for the four traits, namely the number of panicles per plant, the number of grains per panicle, grain weight, and yield per plant. Most identified QTLs are involved in digenic interactions. An extensive literature survey of experimentally characterized genes related to crop yield shows that 19 of 54 effects, 4 of 5 effects, 12 of 28 effects and 2 of 4 effects for the four traits, respectively, involve at least one QTL located within 2 cM of at least one yield-related gene. This study not only reveals the major role of epistasis in rice yield, but also provides a set of candidate genetic loci for further experimental investigation.

**Citation: **Huang A, Xu S, Cai X (2014) Whole-Genome Quantitative Trait Locus Mapping Reveals Major Role of Epistasis on Yield of Rice. PLoS ONE 9(1): e87330. https://doi.org/10.1371/journal.pone.0087330

**Editor: **Tongming Yin, Nanjing Forestry University, China

**Received: **September 23, 2013; **Accepted: **December 19, 2013; **Published: ** January 29, 2014

**Copyright: ** © 2014 Huang et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

**Funding: **This work was supported by the National Science Foundation (NSF) under NSF CAREER Award no. 0746882 to XC and by the Agriculture and Food Research Initiative (AFRI) of the USDA National Institute of Food and Agriculture under the Plant Genome, Genetics and Breeding Program 2007-35300-18285 to SX. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

**Competing interests: ** The authors have declared that no competing interests exist.

## Introduction

Given the paramount importance of rice in meeting food demand, great efforts have been made in large-scale genetic research and extensive breeding programs in almost all rice (*Oryza sativa* L.) producing countries [1], [2]. Gains in rice yield in recent decades are mainly owed to advances in breeding technologies, including selection of cultivars with higher productivity, and to a significant increase in agricultural inputs such as fertilizers and insecticides [3]. Because global environmental degradation limits further yield increases through more agricultural inputs, studying the biological processes underlying rice yield and translating that knowledge into improved breeding and agronomic productivity have become the key to further increases in food production [4].

Rice yield is determined by several factors including the number of panicles per plant, the number of grains per panicle and grain weight. These component traits and the overall yield per plant exhibit continuous variation because they are influenced by multiple genetic factors, named quantitative trait loci (QTLs), and by environmental factors. Genetic markers such as restriction fragment length polymorphisms (RFLPs) [5] and simple sequence repeats (SSRs) [6] have been used to identify QTLs in order to understand the genetic basis of rice yield [7]–[11]. A recent study on QTL mapping for rice yield derived a high-density single nucleotide polymorphism (SNP) bin map from genomic sequences obtained using deep sequencing technology, and demonstrated that such a high-density SNP bin map enabled the identification of more QTLs with higher positional precision than the traditional approach based on RFLP and SSR markers [12]. However, these studies attempted to identify QTLs individually via single interval mapping [5] or composite interval mapping with a small scan window [13], which had limited detection power, given that many agronomic traits are controlled simultaneously by multiple QTLs and influenced by environmental factors [14], [15].

Whole-genome marker QTL mapping employs a multiple QTL model that includes all available markers and evaluates the effects of these markers simultaneously [16]–[18]. Such an approach overcomes the limitations of the traditional single marker-based QTL mapping methods [16]. However, when genetic interactions are considered, a multiple QTL model can have a huge number of variables, which makes model inference very challenging. Early methods for multiple QTL mapping usually relied on Markov chain Monte Carlo (MCMC) simulation to fit a Bayesian model [16]–[20], which is computationally intensive and impractical when a large number of markers are considered. Recently, more efficient and accurate methods have been developed [21], [22], which make whole-genome marker QTL mapping feasible. With whole-genome marker QTL mapping that considers additive and dominance main effects and all their interactions simultaneously, the contributions of numerous genetic effects to rice yield can be assessed.

In this study, we applied our empirical Bayesian least absolute shrinkage and selection operator (EBlasso) method [21], [22] to whole-genome QTL mapping for an elite *indica* rice hybrid, Shanyou 63 [7], [23]. Our EBlasso model includes additive and dominance main effects of 1,619 markers, all additive × additive interactions, additive × dominance interactions, dominance × additive interactions, and dominance × dominance interactions, with a total of more than 5 million possible effects. The quantitative traits considered in this study include yield per plant and three yield component traits, namely the number of panicles per plant, the number of grains per panicle and grain weight. We will demonstrate that our EBlasso identifies a number of QTLs, most of which are involved in digenic interactions, and coincide with or are close to experimentally investigated genes related to yield.

## Results

Four quantitative traits including three rice yield component traits (the number of panicles per plant, the number of grains per panicle and grain weight) and overall yield per plant were analyzed using the EBlasso method. The full QTL model includes main additive and dominance effects of 1,619 markers and all their pair-wise interactions, with a total of *k* = 5,242,322 variables (see the Materials and Methods section for the genetic map). To understand the performance gain of the full model, we also performed QTL mapping for the four traits with a QTL model including *k* = 3,238 main effects, which is referred to as the main effect model.
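The variable counts quoted above follow from simple combinatorics: each marker contributes one additive and one dominance main effect, and each unordered pair of markers contributes four interaction terms (additive × additive, additive × dominance, dominance × additive, dominance × dominance). A minimal Python sketch of this arithmetic is given below; the function name `count_effects` is ours and purely illustrative.

```python
from math import comb


def count_effects(m: int) -> dict:
    """Count the variables in the main effect model and the full model for m markers."""
    main = 2 * m                  # one additive + one dominance effect per marker
    pairs = comb(m, 2)            # unordered marker pairs, m(m-1)/2
    interactions = 4 * pairs      # AxA, AxD, DxA and DxD terms per pair
    return {"main": main, "pairs": pairs, "full": main + interactions}


counts = count_effects(1619)
print(counts)
```

With 1,619 markers this reproduces the 3,238 variables of the main effect model and the 5,242,322 variables of the full model.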

We estimated the phenotypic variance explained by a particular identified QTL effect *j* as $h_j^2 = \hat{\beta}_j^2\,\mathrm{var}(x_j)/\mathrm{var}(y)$, where $\hat{\beta}_j$ is the estimated effect, $\mathrm{var}(x_j)$ is the variance of the genotype variable of QTL *j*, and the total phenotypic variance $\mathrm{var}(y)$ was estimated from the data. To estimate the total variance explained by all identified QTLs, we refitted the data to an ordinary linear regression model that includes the variables corresponding to the identified QTLs. The phenotypic values were predicted from the linear regression model as $\hat{y}_i$, and the total phenotypic variance explained by all identified QTLs was calculated as

$$h^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}. \tag{1}$$
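The two variance-explained quantities described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the code used in the study: the per-effect value is taken as the squared coefficient times the genotype variance divided by the phenotypic variance, the total is taken as one minus the ratio of residual to total sum of squares, and the toy data and function names are hypothetical.

```python
from statistics import pvariance


def variance_explained_single(beta_hat, x_col, y):
    """Fraction of phenotypic variance attributed to one effect: beta^2 * var(x) / var(y)."""
    return beta_hat ** 2 * pvariance(x_col) / pvariance(y)


def variance_explained_total(y, y_pred):
    """Fraction of phenotypic variance explained by the refitted model: 1 - RSS / TSS."""
    y_mean = sum(y) / len(y)
    rss = sum((yi - pi) ** 2 for yi, pi in zip(y, y_pred))
    tss = sum((yi - y_mean) ** 2 for yi in y)
    return 1.0 - rss / tss


# Toy data (hypothetical): one additive genotype column coded +1 / 0 / -1.
x = [1.0, 0.0, -1.0, 1.0, 0.0, -1.0]
y = [2.1, 1.0, -0.1, 1.9, 1.1, 0.0]
y_pred = [1.0 + 1.0 * xi for xi in x]   # predictions from a hypothetical refit (mu = 1, beta = 1)

single = variance_explained_single(1.0, x, y)
total = variance_explained_total(y, y_pred)
print(round(single, 4), round(total, 4))
```

Because the per-effect values ignore correlations between genotype variables, they need not sum to the total obtained from the refitted model.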

### QTL mapping for the number of panicles per plant

The three-step cross validation (CV) procedure (detailed in the Materials and Methods section) for the full model identified the optimal pair of parameters as (*a*, *b*) = (0.5, 0.5) (Table S1 in File S1). Using the optimal values of (*a*, *b*), the EBlasso algorithm shrank most of the *k* variables to zero and yielded a QTL model with 111 nonzero effects. The statistical test described in the Materials and Methods section identified 54 of these nonzero effects as significant at a *p*-value ≤0.01 (Table 1). Among them, one was a main additive effect, 18 were additive × additive interactions, 32 were additive × dominance interactions, and three were dominance × dominance interactions. The 54 effects involved 103 QTLs and explained 94.05% of the total phenotypic variance. We conducted a literature survey and found 99 genes with known genomic locations for which experimental evidence shows a relation to rice yield or yield component traits. For each of the 103 QTLs, we identified those of the 99 experimentally investigated genes that lay within 20 centimorgans (cM) and associated them with the QTL. In total, we found 58 genes for the 103 QTLs. For ease of presentation, we organized QTLs within 20 cM of each other into groups, which resulted in 51 groups for the 103 QTLs. These 51 QTL groups and their associated genes are listed in Table S2 in File S1. Of these, 36 groups have at least one associated gene, and the distances between QTLs and their associated genes are relatively small (median distance 5.37 cM). Moreover, 21 QTLs involved in 19 of the 54 effects are located within 2 cM of at least one gene influencing rice yield. The interaction network of the 103 QTLs and their associated genes is visualized in Figure 1.

Figure 1. The circle shows the bin map and columns indicate the positions of the markers (ticks in million base pairs). The thickness of a link is proportional to the strength of the interaction effect. A short straight line indicates a main effect. Molecularly characterized genes related to yield are also labeled at the appropriate positions of the genome.
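The grouping of QTLs and their association with nearby genes, as described for the number of panicles per plant, can be sketched as follows. The exact grouping rule is not spelled out in the text, so this sketch assumes a QTL joins the current group when it lies on the same chromosome and within 20 cM of the group's first member; it also assumes gene positions are already expressed in cM on the same map. All names and toy positions are hypothetical.

```python
def group_qtls(qtls, max_dist_cm=20.0):
    """Group QTLs on the same chromosome lying within max_dist_cm of the group's first member."""
    groups = []
    for chrom, pos in sorted(qtls):
        last = groups[-1] if groups else None
        if last and last[0][0] == chrom and pos - last[0][1] <= max_dist_cm:
            last.append((chrom, pos))
        else:
            groups.append([(chrom, pos)])
    return groups


def nearby_genes(qtl, genes, max_dist_cm=20.0):
    """Return (gene name, distance) pairs on the QTL's chromosome within max_dist_cm, nearest first."""
    chrom, pos = qtl
    hits = [(name, abs(gpos - pos)) for name, (gchrom, gpos) in genes.items()
            if gchrom == chrom and abs(gpos - pos) <= max_dist_cm]
    return sorted(hits, key=lambda h: h[1])


# Toy input: (chromosome, position in cM) for QTLs, and hypothetical gene positions.
qtls = [(1, 5.0), (1, 18.0), (1, 60.0), (2, 10.0)]
genes = {"geneA": (1, 6.5), "geneB": (1, 90.0), "geneC": (2, 12.0)}

groups = group_qtls(qtls)
hits = nearby_genes((1, 5.0), genes)
print(groups, hits)
```

Tightening `max_dist_cm` to 2.0 in `nearby_genes` gives the stricter criterion used for the "within 2 cM" counts.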

The three-step CV for the main effect model identified the optimal pair of parameters as (*a*, *b*) = (−0.01, 0.5) (Table S1 in File S1), with which eight additive and two dominance effects involving ten QTLs were identified with a *p*-value ≤0.01 (Table 2). The ten effects together explained 39.76% of the phenotypic variance, and nine of them had genes related to yield within 20 cM (median distance 9.29 cM) (Table S3 in File S1). Seven QTLs were identical to QTLs identified by the full model or fell within one of its QTL groups (Bins 228, 353, 757, 861, 908, 994, 1363), and the other three (Bins 3, 461 and 818) were close to QTLs identified by the full model. Specifically, Bin3 was 3.97 cM away from Bin7, Bin461 was 8.29 cM away from Bin456, and Bin818 was 6.15 cM away from Bin810, all three of which were identified by the full model. Comparing the results obtained from the two models, we see that the full model identified more QTLs, which included or lay close to all those identified by the main effect model, and explained a much larger percentage of the phenotypic variance.

### QTL mapping for the number of grains per panicle

The CV analysis identified the optimal pair of parameters (*a*, *b*) = (0.05, 0.1) for the full QTL model for the number of grains per panicle (Table S4 in File S1), with which EBlasso identified five nonzero effects. All of these nonzero effects were significant at a *p*-value ≤0.01 (Table 3), including one main additive effect and four additive × dominance interactions. The five effects involved nine QTLs and explained 46.51% of the overall phenotypic variance. Eight of the nine QTLs have experimentally verified genes related to rice yield within 20 cM (median distance 4.86 cM) (Table S5 in File S1). Moreover, four of these QTLs, involved in four effects, are located within 2 cM of at least one yield-related gene. The interaction network of the nine QTLs and their associated genes is depicted in Figure 2.

Figure 2. The circle shows the bin map and columns indicate the positions of the markers (ticks in million base pairs). The thickness of a link is proportional to the strength of the interaction effect. A short straight line indicates a main effect. Molecularly characterized genes related to yield are also labeled at the appropriate positions of the genome.

The same three-step CV for the main effect model identified the optimal pair of parameters (*a*, *b*) = (−0.4, 0.5) (Table S4 in File S1), with which five additive effects were identified, all having a *p*-value ≤0.01 (Table 4). The five QTLs (Bins 43, 436, 877, 1006, 1057) together explained 41.48% of the phenotypic variance, and all had molecularly characterized genes related to rice yield within 19 cM (median distance 1.59 cM) (Table S6 in File S1). All five QTLs were identical or very close to QTLs identified by the full model. Specifically, Bin436 and Bin1057 were identified in both models; Bin43 was 3.40 cM away from Bin50, Bin877 was 0.47 cM away from Bin875, and Bin1006 was 0.72 cM away from Bin1004, the latter bin of each pair being identified by the full model. Comparing the results obtained from the two models, we observed that although both models identified five effects, the full model identified four more QTLs and explained a slightly larger percentage of the phenotypic variance. Moreover, the main effect model identified only additive effects, whereas the full model identified QTLs with both additive and dominance effects.

### QTL mapping for grain weight

The CV analysis determined the optimal (*a*, *b*) = (1, 1) (Table S7 in File S1) for the full QTL model for grain weight. Using the optimal *a* and *b*, EBlasso yielded a QTL model including 89 nonzero effects, among which 28 effects were identified as significant at a *p*-value ≤0.01 (Table 5). Among them, one was a main additive effect, 10 were additive × additive, 15 were additive × dominance, and two were dominance × dominance interactions. The 28 effects involved 52 QTLs and explained 93.79% of the phenotypic variance. QTLs within 20 cM of each other were placed into a group, resulting in 32 groups, and 26 of the 32 QTL groups had at least one gene within 20 cM (median distance 5.06 cM) (Table S8 in File S1). Moreover, 15 QTLs involved in 12 of the 28 effects are located within 2 cM of at least one yield-related gene. The interaction network of the 52 QTLs and their associated genes is shown in Figure 3.

Figure 3. The circle shows the bin map and columns indicate the positions of the markers (ticks in million base pairs). The thickness of a link is proportional to the strength of the interaction effect. A short straight line indicates a main effect. Molecularly characterized genes related to yield are also labeled at the appropriate positions of the genome.

The CV analysis for the main effect model identified the optimal pair of parameters (*a*, *b*) = (1, 1) (Table S7 in File S1), with which 26 QTLs (19 additive and 7 dominance effects) were identified with a *p*-value ≤0.01 (Table 6). The 26 QTLs together explained 84.24% of the overall phenotypic variance, and 23 of them had molecularly characterized genes related to rice yield within 16 cM (median distance 4.08 cM) (Table S9 in File S1). Twenty-three of the 26 QTLs were identical to or within a QTL group identified by the full model, but three QTLs (Bins 228, 843, and 894) did not correspond to any QTL identified by the full model within 20 cM. Again, the full model identified more QTLs than the main effect model, and the QTLs detected by the full model explained more phenotypic variance than those detected by the main effect model.

### QTL mapping for yield per plant

The CV analysis determined the optimal pair of parameters (*a*, *b*) = (1, 1) for the full QTL model for rice yield (Table S10 in File S1). Using the optimal values of (*a*, *b*), EBlasso yielded four nonzero effects, all of which were significant at a *p*-value ≤0.01: one main additive effect, one additive × additive interaction, one additive × dominance interaction, and one dominance × dominance interaction (see Table 7). The four effects involved seven QTLs and explained 34.01% of the overall phenotypic variance. Five of the seven QTLs have an experimentally verified gene within 15 cM (median distance 2.21 cM) (Table S11 in File S1). Moreover, two QTLs involved in two of the four effects are located within 2 cM of at least one yield-related gene. The interaction network of the seven QTLs and their associated genes is shown in Figure 4.

The optimal pair of parameters determined by the CV analysis for the main effect model was (*a*, *b*) = (−0.5, 0.1) (Table S10 in File S1), with which four QTLs with a *p*-value ≤0.01 were identified (Table 8). The four QTL effects explained 23.79% of the phenotypic variance, and all had at least one gene within 17 cM distance (median distance 7.82 cM) (Table S12 in File S1). Two of the four QTLs (Bin1014, Bin1057) were identical to the QTLs identified from the full model, but the other two QTLs do not correspond to any QTL identified from the full model within 20 cM distance. Overall, although the full model did not detect all QTLs identified by the main effect model, it still detected more QTLs and explained more phenotypic variance.

### Effect types and pleiotropic genes

Among the five types of effects (main additive, main dominance, additive × additive, additive × dominance, and dominance × dominance interactions) considered in the EBlasso full models for the four traits, no main dominance effect was detected, but several dominance × dominance interactions (one for rice yield, three for the number of panicles per plant, and two for grain weight) were identified. Many additive × dominance interaction effects were identified, including one for rice yield, 32 for the number of panicles per plant, four for the number of grains per panicle, and 15 for grain weight. The phenotypic variance explained by a single effect is relatively small for all traits (Tables 1, 3, 5 and 7). For example, the largest single effect explains 7.82% of the phenotypic variance (908_{dominance} × 994_{additive}) for the number of panicles per plant, 8.53% (595_{dominance} × 1004_{additive}) for the number of grains per panicle, 15.48% (729_{additive}) for grain weight, and 6.08% (1057_{additive} × 1144_{dominance}) for yield per plant. Each main effect detected by the main effect model also explained a small percentage of the total phenotypic variance.

Many molecularly characterized genes related to yield are known to play pleiotropic roles in regulating grain productivity [31]. Not surprisingly, a number of such genes coincide with or are close to QTLs that were identified by our EBlasso for multiple traits, although they do not necessarily have pleiotropic effects. For example, the genes *Ghd7*, *OsNRAMP5* and *DEP2* are close to several QTLs common to the four phenotypes; *qSW5/GW5*, *OsEF3* and *LOG* are near QTLs for three phenotypes (all except the number of grains per panicle); and *Gn1a*, *OsJAG*, *GS3*, *OsJMT1*, *OsSPL14*, *GW8/OsSPL16* and *SGL1* are associated with QTLs for three phenotypes (all except yield per plant). Besides *Ghd7*, *OsNRAMP5* and *DEP2*, the genes *FZP*, *OsSDR*, and *OsFAD8* were near QTLs for both yield per plant and the number of grains per panicle; 14 genes were close to QTLs for both the number of panicles per plant and the number of grains per panicle; and 62 other genes were associated with QTLs for both the number of grains per panicle and grain weight. While the pleiotropic effects of some genes have been reported [32], our QTL mapping results identified a number of genes associated with multiple phenotypes, implying a possible pleiotropic role worthy of further experimental investigation. Moreover, the QTLs we detected may be closely linked to unknown genes, which, if identified, will yield more insight into the molecular basis of the phenotypes [1].

## Discussion

Due to its small genome and close relatedness to other grass crops, rice has served as a model plant for investigating genetic factors underlying crop productivity [33], [34]. To date, more than 600 rice genes related to traits including yield, biotic and abiotic stresses, grain quality, plant architecture and fertility have been experimentally cloned [2]. However, there is still a knowledge gap regarding the molecular basis of yield-related biological processes [1], which underlines the importance of systematic tools for understanding the functional roles of genes [2], [35]. In this study, we employed a multiple QTL model that included all additive and dominance main effects of 1,619 markers and all their pair-wise interactions, with a total of more than 5 million possible effects, and then applied our EBlasso algorithm to identify QTLs for four agronomic traits of rice: yield per plant, the number of panicles per plant, the number of grains per panicle and grain weight. Our QTL mapping revealed a number of QTLs for the four traits, most of which are involved in digenic interactions. Moreover, most of these QTLs have at least one experimentally cloned gene within 20 cM.

The same set of markers was previously used for QTL mapping in the recombinant inbred line (RIL) population from which the “immortalized F_{2}” (IMF_{2}) population was derived, via composite interval mapping with a scan window of five markers [12]. After the IMF_{2} population was developed, the ANOVA method was applied to each pair of markers in this dataset to identify both main and digenic interaction effects from 5,242,322 possible effects [24]. The composite interval mapping identified zero, three (Bin40, Bin446 and Bin1006), seven (Bins 49, 171, 439, 729, 928, 1008, and 1266), and one (Bin1007) QTLs for the number of panicles per plant, the number of grains per panicle, grain weight and yield per plant, respectively. The ANOVA method detected thousands (1432, 2696, 3524 and 2251) of digenic interactions between two bins with a *p*-value ≤0.001; after digenic interactions involving adjacent bins were merged, 115, 189, 238, and 204 effects were reported, respectively [24]. In contrast, our EBlasso method identified a moderate number of effects and QTLs for each trait, and 35%–80% of the identified effects for the four traits involve at least one QTL located within 2 cM of at least one gene related to crop yield, which corroborates the reliability of the identified effects.

The list of genes associated with the identified QTLs provides insight into rice yield with respect to yield component traits. First, the number of panicles depends on the plant's ability to produce tillers, which is under genetic, developmental and environmental influence. While previous composite interval mapping did not identify any significant effect with the same set of markers in an RIL population [12], we identified a set of QTLs that have nearby genes known to regulate plant tillering. For example, among the genes in Table S2 in File S1, *MOC1/SPA* is the first gene characterized for rice tillering; it initiates axillary buds that grow into lateral branches [36]. *OsTB1/FC1* has been identified as an important gene that negatively regulates lateral branching in rice [37]. *OsSPL14* is highly expressed in the shoot apex and in the primordia of primary and secondary branches, and promotes panicle branching while reducing tiller number [38]. Through gene mutations, *D3*, *D10*, *D14*, *D17/HTD1*, and *D27* were found to affect tiller initiation and/or outgrowth [37]. Second, the number of grains per panicle is another important trait determining crop yield. While composite interval mapping identified three QTLs (Bin40, Bin446 and Bin1006) close to the genes *Gn1a*, *GS3*, *OsNRAMP5* and *Ghd7*, our EBlasso identified QTLs near these genes as well as 13 other genes. Among them, *FZP* is known to control spikelet meristem identity [39], *Ghd7* is a pleiotropic gene affecting grain number, plant height and heading date [40], and *GW8/OsSPL16*, *PGL*, and *DEP2* are all known to be essential in regulating cell proliferation or elongation [41]–[43]. Third, composite interval mapping detected seven QTLs (Bins 49, 171, 439, 729, 928, 1008, and 1266) for grain weight, with nearby genes *Gn1a*, *LAX1*, *GS3*, *GS5*, *qSW5/GW5*, *OsJMT1*, *OsIAA23*, *Ghd7*, *OsNRAMP5*, *TAC1*, *LGD1* and *SG1*. In addition to these genes, our EBlasso identified many other genes with known effects on grain weight. For example, *GIF1* encodes a cell-wall invertase required for carbon partitioning during early grain filling, and overexpression of *GIF1* leads to larger and heavier grains [44]. The genes *SRS3* and *SRS5* have been found to regulate seed cell elongation [45], [46]. Overexpression of *LRK1* results in enhanced cellular proliferation and increased grain weight [47]. Finally, yield per plant is the most complex trait, and only a small number of effects were identified for it compared with its component traits. While composite interval mapping identified only one QTL (Bin1007), with nearby genes *Ghd7* and *OsNRAMP5*, our EBlasso identified this QTL and six other QTLs, four of which have a cloned gene within 15 cM (Table S11 in File S1).

In conclusion, taking advantage of the powerful EBlasso model, which simultaneously accounts for more than 5 million possible effects, we identified a number of QTLs for four traits of the elite rice hybrid Shanyou 63, the vast majority of which are involved in digenic interactions. This set of QTLs not only sheds light on the genetic basis of the yield of the rice hybrid, but also provides candidate loci for the identification of new genes that may be involved in crop yield.

## Materials and Methods

### Plant materials and QTLs

The genotype and phenotype data used in this study were obtained from previous studies [12], [24]. The mapping plants were created by first crossing the *indica* rice lines Zhenshan 97 and Minghui 63 [7] to produce the elite rice hybrid Shanyou 63, which was the most widely cultivated hybrid in China in the 1980s–1990s [24]. Then a population of 240 F_{9} RILs was derived by single-seed descent from Shanyou 63. Next, an “immortalized F_{2}” (IMF_{2}) population consisting of 278 crosses was created by intercrossing the RILs for the QTL mapping study [7], [23]. The crossed population was field tested on the experimental farm of Huazhong Agricultural University in Wuhan, China, in 1999, for traits including yield per plant, the number of panicles per plant, the number of grains per panicle and grain weight.

The RILs were sequenced with an Illumina Genome Analyzer II using the bar-coded multiplexed sequencing approach described in [25], and 270,820 high-quality SNPs were identified. Bin maps were constructed by lumping consecutive SNPs with the same genotype into blocks, masking blocks shorter than 250 kb to avoid false double recombinations, and merging recombination bins shorter than 5 kb, resulting in a map consisting of 1,619 bins without missing data [12]. Genotypes of the IMF_{2} crosses were deduced from the genotypes of their RIL parents [24]. The three genotypes in each bin were coded as A and B for the two parental homozygous genotypes and H for the heterozygote. Using the recombinant bins as QTLs, a 1,625.5 cM genetic linkage map was constructed, with about 1.0 cM (230 kb) in length per bin (Figure 1).
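The first step of the bin-map construction, lumping consecutive SNPs with the same genotype into blocks, can be illustrated with a short Python sketch for a single line on a single chromosome. The subsequent masking of short blocks and merging of small bins across lines are not shown, and the function name and SNP calls are hypothetical.

```python
from itertools import groupby


def lump_into_blocks(snps):
    """Collapse consecutive SNPs with the same genotype call into (start, end, genotype) blocks.

    snps: list of (position_bp, genotype) tuples sorted by position on one chromosome.
    """
    blocks = []
    for genotype, run in groupby(snps, key=lambda s: s[1]):
        run = list(run)
        blocks.append((run[0][0], run[-1][0], genotype))
    return blocks


# Hypothetical SNP calls for one RIL on one chromosome.
snps = [(1_000, "A"), (5_000, "A"), (9_000, "A"),
        (400_000, "B"), (650_000, "B"), (900_000, "A")]
blocks = lump_into_blocks(snps)
print(blocks)
```

Each boundary between adjacent blocks marks a candidate recombination breakpoint for that line.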

### Bayesian Lasso linear regression model for multiple QTLs

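We employed a Bayesian Lasso (BLasso) multiple linear regression model to infer genotypes and quantitative trait associations. The regression model includes main additive and dominance effects of 1,619 SNP bins and all their pair-wise interactions. Let $y_i$ be the phenotypic value of a quantitative trait of the $i$th individual in a mapping population. In this study we observed $y_i$, $i = 1, \cdots, n$, of $n = 278$ individuals and collected them into a vector $\mathbf{y} = [y_1, y_2, \cdots, y_n]^T$. In these $n$ individuals, let $m = 1{,}619$ denote the number of genetic markers genotyped whose main effects include additive and dominance effects. Let the additive and dominance genotypes of marker $j$ of individual $i$ be $x_{Aij}$ and $x_{Dij}$, respectively, where $x_{Aij}$ takes on values +1, 0 and −1, and $x_{Dij}$ takes on values 0, +1 and 0, corresponding to genotypes A, H and B, respectively. Let us define $\mathbf{x}_{Ai} = [x_{Ai1}, x_{Ai2}, \cdots, x_{Aim}]^T$ and $\mathbf{x}_{Di} = [x_{Di1}, x_{Di2}, \cdots, x_{Dim}]^T$. The interactions between any two effects are modeled as the element-wise product of the corresponding main effects. Let $\mathbf{x}_{AAi}$, $\mathbf{x}_{ADi}$, $\mathbf{x}_{DAi}$, and $\mathbf{x}_{DDi}$ be vectors containing $x_{Aij}x_{Ail}$, $x_{Aij}x_{Dil}$, $x_{Dij}x_{Ail}$, and $x_{Dij}x_{Dil}$, respectively, where $j = 1, \cdots, m-1$ and $l = j+1, \cdots, m$. Then we have the following linear regression model for $\mathbf{y}$:

$$\mathbf{y} = \mu + \mathbf{X}_A\boldsymbol{\beta}_A + \mathbf{X}_D\boldsymbol{\beta}_D + \mathbf{X}_{AA}\boldsymbol{\beta}_{AA} + \mathbf{X}_{AD}\boldsymbol{\beta}_{AD} + \mathbf{X}_{DA}\boldsymbol{\beta}_{DA} + \mathbf{X}_{DD}\boldsymbol{\beta}_{DD} + \mathbf{e} \tag{2}$$

where $\mu$ is the population mean, vectors $\boldsymbol{\beta}_A$ and $\boldsymbol{\beta}_D$ represent the main additive and dominance effects of all markers, respectively, and vectors $\boldsymbol{\beta}_{AA}$, $\boldsymbol{\beta}_{AD}$, $\boldsymbol{\beta}_{DA}$ and $\boldsymbol{\beta}_{DD}$ capture the additive × additive, additive × dominance, dominance × additive, and dominance × dominance interactions, respectively. Matrices $\mathbf{X}_A$, $\mathbf{X}_D$, $\mathbf{X}_{AA}$, $\mathbf{X}_{AD}$, $\mathbf{X}_{DA}$, and $\mathbf{X}_{DD}$ are the corresponding design matrices of the different effects, and $\mathbf{e}$ is the residual error that follows a normal distribution with zero mean and variance $\sigma_0^2\mathbf{I}$.

Given $m$ markers, the size of matrix $\mathbf{X}_A$ or $\mathbf{X}_D$ is $n \times m$, and the size of $\mathbf{X}_{AA}$, $\mathbf{X}_{AD}$, $\mathbf{X}_{DA}$, or $\mathbf{X}_{DD}$ is $n \times q$, where $q = m(m-1)/2 = 1{,}309{,}771$. Defining $\boldsymbol{\beta} = [\boldsymbol{\beta}_A^T, \boldsymbol{\beta}_D^T, \boldsymbol{\beta}_{AA}^T, \boldsymbol{\beta}_{AD}^T, \boldsymbol{\beta}_{DA}^T, \boldsymbol{\beta}_{DD}^T]^T$ and $\mathbf{X} = [\mathbf{X}_A, \mathbf{X}_D, \mathbf{X}_{AA}, \mathbf{X}_{AD}, \mathbf{X}_{DA}, \mathbf{X}_{DD}]$, we can write (2) in a more compact form:

$$\mathbf{y} = \mu + \mathbf{X}\boldsymbol{\beta} + \mathbf{e} \tag{3}$$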
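To make the genotype coding and the element-wise interaction terms concrete, the following Python sketch builds one row of the full design matrix for a single individual. It is a toy illustration under the coding stated above (A, H, B mapped to +1, 0, −1 for the additive variable and 0, +1, 0 for the dominance variable); the function and constant names are ours, and pure-Python lists are used only for readability, not efficiency.

```python
from itertools import combinations

ADDITIVE = {"A": 1, "H": 0, "B": -1}    # additive genotype coding
DOMINANCE = {"A": 0, "H": 1, "B": 0}    # dominance genotype coding


def design_row(genotypes):
    """Build one row of X = [X_A, X_D, X_AA, X_AD, X_DA, X_DD] for a single individual."""
    xa = [ADDITIVE[g] for g in genotypes]
    xd = [DOMINANCE[g] for g in genotypes]
    pairs = list(combinations(range(len(genotypes)), 2))   # marker pairs with j < l
    xaa = [xa[j] * xa[l] for j, l in pairs]   # additive x additive
    xad = [xa[j] * xd[l] for j, l in pairs]   # additive x dominance
    xda = [xd[j] * xa[l] for j, l in pairs]   # dominance x additive
    xdd = [xd[j] * xd[l] for j, l in pairs]   # dominance x dominance
    return xa + xd + xaa + xad + xda + xdd


row = design_row(["A", "H", "B"])   # toy individual with m = 3 markers
print(len(row), row)
```

For m = 3 markers the row has 2·3 + 4·3 = 18 entries; the same construction with m = 1,619 gives the 5,242,322 columns of the full model.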

The size of matrix $\mathbf{X}$ is $n \times k$, where $k = 2m + 4q = 5{,}242{,}322$, and we clearly have $k \gg n$. However, we expect that most elements of $\boldsymbol{\beta}$ are zero, and thus we have a sparse linear model. The BLasso model employs a three-level hierarchical prior distribution to model the sparsity. At the first level, each $\beta_j$, $j = 1, 2, \cdots, k$, follows an independent normal distribution with mean zero and unknown variance $\sigma_j^2$. At the second level, each $\sigma_j^2$, $j = 1, 2, \cdots, k$, follows an independent exponential distribution with a common parameter $\lambda$: $p(\sigma_j^2 \mid \lambda) = \lambda \exp(-\lambda\sigma_j^2)$. At the third level, we assign a conjugate Gamma prior *Gamma*(*a*, *b*) with a shape parameter *a* and an inverse scale parameter *b* to the parameter $\lambda$. Finally, we assign non-informative uniform priors to $\mu$ and $\sigma_0^2$. The three-level hierarchical model has two hyperparameters (*a*, *b*) for adjusting the degree of shrinkage, and cross validation (CV) can be applied to choose appropriate values of these parameters.
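The generative structure of the three-level prior can be illustrated by drawing effects from it. The sketch below is only an illustration of the hierarchy, not part of the inference procedure, and it assumes a standard shape/rate Gamma distribution, which requires *a* > 0 and *b* > 0; it therefore does not cover the negative values of *a* that appear in the CV grid. The function name and seed are hypothetical.

```python
import random


def sample_prior(k, a, b, seed=0):
    """Draw k effects from the three-level prior (this sketch requires a > 0 and b > 0)."""
    rng = random.Random(seed)
    lam = rng.gammavariate(a, 1.0 / b)                     # lambda ~ Gamma(shape a, rate b)
    sigma2 = [rng.expovariate(lam) for _ in range(k)]      # sigma_j^2 ~ Exponential(rate lambda)
    beta = [rng.gauss(0.0, s2 ** 0.5) for s2 in sigma2]    # beta_j ~ N(0, sigma_j^2)
    return lam, sigma2, beta


lam, sigma2, beta = sample_prior(k=1000, a=1.0, b=1.0)
print(lam, min(sigma2), max(abs(v) for v in beta))
```

Larger values of $\lambda$ concentrate the variances, and hence the effects, closer to zero, which is how the hyperparameters (*a*, *b*) influence the degree of shrinkage.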

The QTL model (2), or equivalently (3), includes all main effects and digenic interactions. We refer to this model as the full model throughout the paper. We also performed QTL mapping with the model $\mathbf{y} = \mu + \mathbf{X}_A\boldsymbol{\beta}_A + \mathbf{X}_D\boldsymbol{\beta}_D + \mathbf{e}$, which is referred to as the main effect model, since it includes only the main effects.

### Model inference and cross validation

The BLasso model can be inferred efficiently with the empirical Bayesian Lasso (EBlasso) algorithm [21]. The EBlasso algorithm employs a coordinate ascent method to find $\hat{\sigma}_j^2$, the estimate of $\sigma_j^2$, $j = 0, \ldots, k$, that maximizes the likelihood function of $\sigma_j^2$, $j = 0, \ldots, k$. In the iterative process, many $\hat{\sigma}_j^2$, or equivalently $\hat{\beta}_j$, are shrunk to zero. The coordinate ascent method, along with other algorithmic techniques, makes the EBlasso algorithm very efficient. Our previous studies demonstrated that EBlasso outperformed several other multiple QTL mapping methods including the empirical Bayes method [26], the Bayesian hierarchical generalized linear models (BhGLM) [27], HyperLasso [28], and Lasso [29]. A detailed description of the EBlasso algorithm can be found in [21], [22], and an efficient C program with an R interface [30] implementing the EBlasso algorithm is available.

The optimal values of the two hyperparameters (*a*, *b*) of the EBlasso algorithm were obtained with five-fold CV in three steps to minimize the prediction error (*PE*) calculated from $PE = \frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2$, where $\hat{y}_i$ is the estimated phenotypic value. In the first step, *a* = *b* = 0.001, 0.01, 0.1, 1 were examined and a pair (*a*_{1}, *b*_{1}) corresponding to the smallest *PE* was obtained. In the second step, *b* was fixed at *b*_{1} and *a* was chosen from the set [−0.9, −0.8, −0.7, −0.6, −0.5, −0.4, −0.3, −0.2, −0.1, −0.01, 0.01, 0.05, 0.1, 0.5, 1], which yielded a value *a*_{2} corresponding to the smallest *PE*. In the third step, *a* = *a*_{2} was fixed and *b* varied from 0.01 to 10 with a step size of one for *b*>1 and a step size of one on the logarithmic scale for *b*<1. Note that when one of the two parameters is fixed, the degree of shrinkage is a monotonic function of the other parameter [21], [22]. Therefore, in the second and third steps, the selection did not go through the full path but stopped if the current *PE* was one standard error larger than the minimum *PE* in previous steps.
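The three-step search over (*a*, *b*) can be sketched as a small Python routine. This is a simplified illustration under stated assumptions: `pe` stands for any function returning the cross-validated prediction error for a given pair (here a hypothetical toy surface, not an EBlasso fit), the prediction error is taken to be a mean squared error, and the early-stopping rule based on one standard error is omitted.

```python
def three_step_search(pe):
    """Search (a, b) in three steps; pe(a, b) returns the cross-validated prediction error."""
    # Step 1: a = b on a coarse grid.
    diag = [0.001, 0.01, 0.1, 1.0]
    b1 = min(diag, key=lambda v: pe(v, v))
    # Step 2: fix b = b1 and scan a.
    a_grid = [-0.9, -0.8, -0.7, -0.6, -0.5, -0.4, -0.3, -0.2, -0.1,
              -0.01, 0.01, 0.05, 0.1, 0.5, 1.0]
    a2 = min(a_grid, key=lambda a: pe(a, b1))
    # Step 3: fix a = a2 and scan b (log-scale steps below 1, unit steps from 1 to 10).
    b_grid = [0.01, 0.1] + [float(v) for v in range(1, 11)]
    b3 = min(b_grid, key=lambda b: pe(a2, b))
    return a2, b3


def mean_squared_pe(y, y_hat):
    """Prediction error as the mean squared difference between observed and predicted values."""
    return sum((yi - hi) ** 2 for yi, hi in zip(y, y_hat)) / len(y)


# Hypothetical stand-in for the cross-validated PE surface, used only to exercise the search.
def toy_pe(a, b):
    return (a - 0.5) ** 2 + (b - 0.5) ** 2


best = three_step_search(toy_pe)
print(best)
```

In an actual analysis, `pe` would wrap a five-fold CV loop that fits the model on four folds and evaluates `mean_squared_pe` on the held-out fold.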

### Statistical significance test

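One advantage of the EBlasso algorithm relative to Lasso [29] is that it not only outputs a vector $\hat{\boldsymbol{\beta}}$ as an estimate of the nonzero elements of $\boldsymbol{\beta}$, but also gives an estimate of the covariance of $\hat{\boldsymbol{\beta}}$, $\hat{\boldsymbol{\Sigma}}$. Letting $\hat{\Sigma}_{jj}$ be the *j*th diagonal element of $\hat{\boldsymbol{\Sigma}}$, we can use the *t*-statistic $t_j = \hat{\beta}_j / \sqrt{\hat{\Sigma}_{jj}}$ to test whether $\beta_j \neq 0$ at a certain significance level.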
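
A minimal Python sketch of such a test is given below, taking each *t*-statistic as the effect estimate divided by the square root of the matching diagonal element of the estimated covariance. Several pieces are assumptions made for illustration only: the numbers are made up, the threshold of 0.05 is arbitrary, and the *p*-value uses a standard normal reference because the degrees of freedom are not stated in this section.

```python
import math
from typing import List, Sequence, Tuple


def t_statistics(beta_hat: Sequence[float],
                 cov: Sequence[Sequence[float]]) -> List[float]:
    """Return beta_hat[j] / sqrt(cov[j][j]) for every estimated effect."""
    return [b / math.sqrt(cov[j][j]) for j, b in enumerate(beta_hat)]


def two_sided_p_normal(t: float) -> float:
    """Two-sided p-value under a standard normal reference.

    Assumption: a large-sample normal approximation replaces the
    t distribution, whose degrees of freedom are not given here.
    """
    return math.erfc(abs(t) / math.sqrt(2.0))


def significant_effects(beta_hat: Sequence[float],
                        cov: Sequence[Sequence[float]],
                        alpha: float = 0.05) -> List[Tuple[int, float, float]]:
    """List (index, t, p) for effects whose p-value falls below alpha."""
    hits = []
    for j, t in enumerate(t_statistics(beta_hat, cov)):
        p = two_sided_p_normal(t)
        if p < alpha:
            hits.append((j, t, p))
    return hits


# Made-up estimates for three nonzero effects and their covariance matrix.
beta_hat = [0.8, -0.1, 0.45]
cov = [[0.04, 0.0, 0.0],
       [0.0, 0.04, 0.0],
       [0.0, 0.0, 0.09]]

hits = significant_effects(beta_hat, cov)
for j, t, p in hits:
    print(f"effect {j}: t = {t:.2f}, p = {p:.2e}")
```

In this toy example only the first effect passes the threshold; the other two have *t*-statistics of about −0.5 and 1.5.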

## Supporting Information

### File S1.

**Tables S1–S12. Table S1.** Cross-validation for determining hyperparameters (*a*, *b*) used in QTL mapping for the number of panicles per plant. **Table S2.** Experimentally investigated genes near QTLs for the number of panicles per plant identified with the full model. **Table S3.** Experimentally investigated genes near QTLs for the number of panicles per plant identified with the main effect model. **Table S4.** Cross-validation for determining hyperparameters (*a*, *b*) used in QTL mapping for the number of grains per panicle. **Table S5.** Experimentally investigated genes near QTLs for the number of grains per panicle identified with the full model. **Table S6.** Experimentally investigated genes near QTLs for the number of grains per panicle identified with the main effect model. **Table S7.** Cross-validation for determining hyperparameters (*a*, *b*) used in QTL mapping for grain weight. **Table S8.** Experimentally investigated genes near QTLs for grain weight identified with the full model. **Table S9.** Experimentally investigated genes near QTLs for grain weight identified with the main effect model. **Table S10.** Cross-validation for determining hyperparameters (*a*, *b*) used in QTL mapping for yield per plant. **Table S11.** Experimentally investigated genes near QTLs for yield per plant identified with the full model. **Table S12.** Experimentally investigated genes near QTLs for yield per plant identified with the main effect model.

https://doi.org/10.1371/journal.pone.0087330.s001

(DOC)

## Author Contributions

Conceived and designed the experiments: XC SX. Performed the experiments: AH. Analyzed the data: AH XC SX. Wrote the paper: AH XC SX.

## References

- 1. Xing Y, Zhang Q (2010) Genetic and molecular bases of rice yield. Annu Rev Plant Biol 61: 421–442.
- 2. Jiang Y, Cai Z, Xie W, Long T, Yu H, et al. (2012) Rice functional genomics research: Progress and implications for crop genetic improvement. Biotechnol Adv 30: 1059–1070.
- 3. Tester M, Langridge P (2010) Breeding technologies to increase crop production in a changing world. Science 327: 818–822.
- 4. Ikeda M, Miura K, Aya K, Kitano H, Matsuoka M (2013) Genes offering the potential for designing yield-related traits in rice. Curr Opin Plant Biol 16: 213–220.
- 5. Lander ES, Botstein D (1989) Mapping mendelian factors underlying quantitative traits using RFLP linkage maps. Genetics 121: 185–199.
- 6. Zietkiewicz E, Rafalski A, Labuda D (1994) Genome fingerprinting by simple sequence repeat (SSR)-anchored polymerase chain reaction amplification. Genomics 20: 176–183.
- 7. Hua JP, Xing YZ, Xu CG, Sun XL, Yu SB, et al. (2002) Genetic dissection of an elite rice hybrid revealed that heterozygotes are not always advantageous for performance. Genetics 162: 1885–1895.
- 8. Li J, Yu S, Xu C, Tan Y, Gao Y, et al. (2000) Analyzing quantitative trait loci for yield using a vegetatively replicated F2 population from a cross between the parents of an elite rice hybrid. Theor Appl Genet 101: 248–254.
- 9. Xing Y, Tan Y, Hua J, Sun X, Xu C, et al. (2002) Characterization of the main effects, epistatic effects and their environmental interactions of QTLs on the genetic basis of yield traits in rice. Theor Appl Genet 105: 248–257.
- 10. Tan Y-F, Xing Y-Z, Li J-X, Yu S-B, Xu C-G, et al. (2000) Genetic bases of appearance quality of rice grains in Shanyou 63, an elite rice hybrid. Theor Appl Genet 101: 823–829.
- 11. Lian X, Xing Y, Yan H, Xu C, Li X, et al. (2005) QTLs for low nitrogen tolerance at seedling stage identified using a recombinant inbred line population derived from an elite rice hybrid. Theor Appl Genet 112: 85–96.
- 12. Yu H, Xie W, Wang J, Xing Y, Xu C, et al. (2011) Gains in QTL detection using an ultra-high density SNP map based on population sequencing relative to traditional RFLP/SSR markers. PLoS ONE 6: e17595.
- 13. Zeng ZB (1994) Precision mapping of quantitative trait loci. Genetics 136: 1457–1468.
- 14. Song X-J, Ashikari M (2008) Toward an optimum return from crop plants. Rice 1: 135–143.
- 15. Bernardo R (2008) Molecular markers and selection for complex traits in plants: learning from the last 20 years. Crop Sci 48: 1649–1664.
- 16. Xu S (2003) Estimating polygenic effects using markers of the entire genome. Genetics 163: 789–801.
- 17. Meuwissen THE, Hayes BJ, Goddard ME (2001) Prediction of total genetic value using genome-wide dense marker maps. Genetics 157: 1819–1829.
- 18. Zhou X, Carbonetto P, Stephens M (2013) Polygenic modeling with Bayesian sparse linear mixed models. PLoS Genet 9: e1003264.
- 19. Iwata H, Ebana K, Uga Y, Hayashi T, Jannink J-L (2010) Genome-wide association study of grain shape variation among Oryza sativa L. germplasms based on elliptic Fourier analysis. Mol Breed 25: 203–215.
- 20. de los Campos G, Pérez P, Vazquez AI, Crossa J (2013) Genome-enabled prediction using the BLR (Bayesian Linear Regression) R-package. Genome-Wide Association Studies and Genomic Prediction. New York: Springer. pp. 299–320.
- 21. Cai X, Huang A, Xu S (2011) Fast empirical Bayesian LASSO for multiple quantitative trait locus mapping. BMC Bioinformatics 12: 211.
- 22. Huang A, Xu S, Cai X (2013) Empirical Bayesian LASSO-logistic regression for multiple binary trait locus mapping. BMC Genet 14: 5.
- 23. Hua J, Xing Y, Wu W, Xu C, Sun X, et al. (2003) Single-locus heterotic effects and dominance by dominance interactions can adequately explain the genetic basis of heterosis in an elite rice hybrid. Proc Natl Acad Sci USA 100: 2574–2579.
- 24. Zhou G, Chen Y, Yao W, Zhang C, Xie W, et al. (2012) Genetic composition of yield heterosis in an elite rice hybrid. Proc Natl Acad Sci USA 109: 15847–15852.
- 25. Xie W, Feng Q, Yu H, Huang X, Zhao Q, et al. (2010) Parent-independent genotyping for constructing an ultrahigh-density linkage map based on population sequencing. Proc Natl Acad Sci USA 107: 10578–10583.
- 26. Xu S (2007) An empirical Bayes method for estimating epistatic effects of quantitative trait loci. Biometrics 63: 513–521.
- 27. Yi N, Banerjee S (2009) Hierarchical generalized linear models for multiple quantitative trait locus mapping. Genetics 181: 1101–1133.
- 28. Hoggart CJ, Whittaker JC, De Iorio M, Balding DJ (2008) Simultaneous analysis of all SNPs in genome-wide and re-sequencing association studies. PLoS Genet 4: e1000130.
- 29. Tibshirani R (1996) Regression shrinkage and selection via the lasso. J Roy Stat Soc B Met 58: 267–288.
- 30. R Development Core Team (2012) R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing.
- 31. Miura K, Ashikari M, Matsuoka M (2011) The role of QTLs in the breeding of high-yielding rice. Trends Plant Sci 16: 319–326.
- 32. Yan W-H, Wang P, Chen H-X, Zhou H-J, Li Q-P, et al. (2011) A major QTL, *Ghd8*, plays pleiotropic roles in regulating grain productivity, plant height, and heading date in rice. Mol Plant 4: 319–330.
- 33. Yu J, Hu S, Wang J, Wong GK-S, Li S, et al. (2002) A draft sequence of the rice genome (*Oryza sativa* L. ssp. *indica*). Science 296: 79–92.
- 34. Goff SA, Ricke D, Lan T-H, Presting G, Wang R, et al. (2002) A draft sequence of the rice genome (*Oryza sativa* L. ssp. *japonica*). Science 296: 92–100.
- 35. Zhang Q, Li J, Xue Y, Han B, Deng XW (2008) Rice 2020: A call for an international coordinated effort in rice functional genomics. Mol Plant 1: 715–719.
- 36. Li X, Qian Q, Fu Z, Wang Y, Xiong G, et al. (2003) Control of tillering in rice. Nature 422: 618–621.
- 37. Minakuchi K, Kameoka H, Yasuno N, Umehara M, Luo L, et al. (2010) *FINE CULM1* (*FC1*) works downstream of strigolactones to inhibit the outgrowth of axillary buds in rice. Plant Cell Physiol 51: 1127–1135.
- 38. Jiao Y, Wang Y, Xue D, Wang J, Yan M, et al. (2010) Regulation of *OsSPL14* by *OsmiR156* defines ideal plant architecture in rice. Nat Genet 42: 541–544.
- 39. Chuck G, Muszynski M, Kellogg E, Hake S, Schmidt RJ (2002) The control of spikelet meristem identity by the branched *silkless1* gene in maize. Science 298: 1238–1241.
- 40. Xue W, Xing Y, Weng X, Zhao Y, Tang W, et al. (2008) Natural variation in *Ghd7* is an important regulator of heading date and yield potential in rice. Nat Genet 40: 761–767.
- 41. Li F, Liu W, Tang J, Chen J, Tong H, et al. (2010) Rice *DENSE AND ERECT PANICLE 2* is essential for determining panicle outgrowth and elongation. Cell Res 20: 838–849.
- 42. Wang S, Wu K, Yuan Q, Liu X, Liu Z, et al. (2012) Control of grain size, shape and quality by *OsSPL16* in rice. Nat Genet 44: 950–954.
- 43. Heang D, Sassa H (2012) Antagonistic actions of HLH/bHLH proteins are involved in grain length and weight in rice. PLoS ONE 7: e31325.
- 44. Wang E, Xu X, Zhang L, Zhang H, Lin L, et al. (2010) Duplication and independent selection of cell-wall invertase genes *GIF1* and *OsCIN1* during rice evolution and domestication. BMC Evol Biol 10: 108.
- 45. Kitagawa K, Kurinami S, Oki K, Abe Y, Ando T, et al. (2010) A novel kinesin 13 protein regulating rice seed length. Plant Cell Physiol 51: 1315–1329.
- 46. Segami S, Kono I, Ando T, Yano M, Kitano H, et al. (2012) *Small and round seed 5* gene encodes alpha-tubulin regulating seed cell elongation in rice. Rice 5: 1–10.
- 47. Zha X, Luo X, Qian X, He G, Yang M, et al. (2009) Over-expression of the rice *LRK1* gene improves quantitative yield components. Plant Biotechnol J 7: 611–620.