Composite Interval Mapping Based on Lattice Design for Error Control May Increase Power of Quantitative Trait Locus Detection

Experimental error control is very important in quantitative trait locus (QTL) mapping. Although numerous statistical methods have been developed for QTL mapping, a QTL detection model based on an appropriate experimental design that emphasizes error control has not been developed. Lattice design is very suitable for experiments with large sample sizes, which is usually required for accurate mapping of quantitative traits. However, the lack of a QTL mapping method based on lattice design dictates that the arithmetic mean or adjusted mean of each line of observations in the lattice design had to be used as a response variable, resulting in low QTL detection power. As an improvement, we developed a QTL mapping method termed composite interval mapping based on lattice design (CIMLD). In the lattice design, experimental errors are decomposed into random errors and block-within-replication errors. Four levels of block-within-replication errors were simulated to show the power of QTL detection under different error controls. The simulation results showed that the arithmetic mean method, which is equivalent to a method under random complete block design (RCBD), was very sensitive to the size of the block variance and with the increase of block variance, the power of QTL detection decreased from 51.3% to 9.4%. In contrast to the RCBD method, the power of CIMLD and the adjusted mean method did not change for different block variances. The CIMLD method showed 1.2- to 7.6-fold higher power of QTL detection than the arithmetic or adjusted mean methods. Our proposed method was applied to real soybean (Glycine max) data as an example and 10 QTLs for biomass were identified that explained 65.87% of the phenotypic variation, while only three and two QTLs were identified by arithmetic and adjusted mean methods, respectively.


Introduction
A quantitative trait is usually regarded as complex because of its inheritance mechanism [1]. In the past two decades, unraveling the genetic basis of quantitative traits has become an attractive and challenging research field. Great efforts have been made in quantitative trait locus (QTL) mapping, based on molecular markers, to identify the genetic architecture underlying quantitative phenotypic variation [2][3][4][5][6]. Generally, to effectively map the QTLs of a trait, a proper statistical method, a genetic population, and an efficient experimental design are required both for powerful and accurate QTL mapping.
Numerous statistical methods have been proposed for QTL mapping [7], among which single marker regression is the simplest method, which identifies QTLs by testing the difference between marker group means on the phenotype, using methods such as analysis of variance (ANOVA). The single marker regression approach can only detect QTLs at marker positions, thus requiring an ultra-high density of markers to obtain accurate estimates of QTL locations [8]. Interval mapping (IM) was proposed to map genome-wide QTLs based on linkage maps [9]. The IM method performs a statistical test for a QTL at each genome position between a pair of markers by conditioning on the genotypes of the two flanking markers. However, two or more linked QTLs may affect the mapping in IM, leading to biased estimates of locations and effects of QTLs [10][11][12]. Based on IM, composite interval mapping (CIM) was proposed to reduce the impact of linkage on QTL under testing and to improve the precision of QTL mapping [12]. More recently, a further refinement was made to reduce the impact of covariate marker selection on CIM, which was designated as inclusive composite interval mapping (ICIM) [13]. Furthermore, various multi-locus model methods based on Bayesian statistical frameworks have also been developed for simultaneously modeling multiple genome-wide QTLs [14][15][16][17][18]. Although Bayesian methods have a number of advantages for QTL mapping, they are usually computationally intensive and rarely easy to use. With the user-friendly computer program Windows QTL Cartographer [19], CIM is currently the most widely used method for QTL mapping in segregating populations derived from bi-parental crosses.
Various types of genetic segregating populations used for QTL mapping may be classified into tentative mapping populations, such as F 2 and backcross (BC), and permanent mapping populations, such as recombinant inbred lines (RILs) and doubled haploid lines (DH). Genetic experiments with tentative mapping populations may not repeat between years or locations. On the contrary, genetic experiments with permanent mapping population can be repeated and can evaluate genotypes by environment interaction. Thus, permanent mapping populations are currently preferred for QTL mapping as the phenotypes of mapping population individuals can be evaluated in multiple environments with multiple replications.
Both accurate genotype and phenotype data are required for high-resolution QTL mapping. The current genotyping technologies are much more reliable; therefore, obtaining high-quality phenotypic data becomes more important in QTL studies [20,21]. Appropriate experimental design is critical to reduce the experimental error and phenotype evaluation [22]. For tentative mapping populations, such as F 2 s and BCs, a completely randomized design (CRD) is usually implemented in a phenotyping experiment. However, non-uniform field conditions and lack of replication can result in large experimental errors in CRD, which further reduce the accuracy of QTL mapping. For permanent mapping populations, such as RILs and DHs, randomized complete block design (RCBD) is the most widely used experimental design in QTL mapping. RCBD may reduce the experimental error by increasing replications. However, RCBD is not efficient for local control. Lattice design [22], which has efficient local control, is an excellent experimental design for phenotype evaluation of a genetic population, especially for large sample sizes [23]. In recent years, lattice design has been applied increasingly in QTL mapping [24][25][26][27][28].
Previous mapping methods or software could not handle the original data from lattice design; therefore, the adjusted mean or arithmetic mean of each line were used typically as response variables in the mapping procedure [27,[29][30][31]. In the present study, a QTL mapping method termed composite interval mapping based on lattice design (CIMLD), which is based on lattice design and the most popularly used CIM method, is proposed. Simulations demonstrated that our model improved the power of QTL detection. Real soybean (Glycine max) data analyses were performed as a practical example of the proposed method.

Materials and Methods CIMLD
Following the CIM method [12], the linear model for testing a QTL on a marker interval (v, v +1) in an RIL population with an experiment incorporating lattice design can be written as: where y ijk is the phenotypic observation of the k-th (k = 1, 2,. . ., t) line in the j-th (j = 1, 2, . . ., s) block within the i-th (i = 1, 2,. . ., r) replication, μ is the overall mean, α i is the effect of i-th replication, γ j(i) is the effect of j-th block within i-th replication, b Ã is the effect of the QTL and x k Ã is a dummy variable taking a value of 0 or 1 and denotes the genotype of k-th line at the QTL, b l is the effect of l-th (l = 1, 2, . . ., p) covariate marker representing the background effect and x kl denotes the genotype of k-th line at l-th covariate marker, taking a value of 0 or 1. ε ijk is the random error effect. All effects were regarded as fixed except γ j(i) and ε ijk , g jðiÞ e Nð0; s 2 b Þ, ε ijk~N (0, σ 2 ), where s 2 b is the variance of the block effect within the replication and σ 2 is the error variance.
The model (1) can be rewritten in matrix notation as where y denotes the n×1 vector of y ijk , n is the total number of observations and equals t×r, 1 n is an n×1 vector of 1s, X R is an n×r design matrix whose i-th column is the dummy variable (taking a value 0 or 1) of α i , b R is r×1 vector of α i , X Q is an n×1 design matrix of QTL, b R is the same as b Ã , X M is an n×p design matrix of covariate markers, b M is a p×1 vector of b l , Z is an n×s design matrix of block, and u is a s×1 vector of γ j(i) . The variance of y is where I n is an n×n identity matrix, d ¼ s 2 b =s 2 , δ may be used to measure the variance of a block within the replication; the covariance structure is Σ = ZZ'δ+ I.
Then the phenotype is adjusted as The variance of y Ã is Σ -1/2 Var(Zu + e)Σ -1/2 = Iσ 2 . Thus, when Σ is known, model (4) is just a simple linear model and can be used to test a QTL under the framework of CIM. Here, we make use of the error variance and block-within-replication variance parameter estimated from ANOVA of the lattice design to provide an estimate of δ. Then, the estimate of Σ can be obtained according to eq (3). The QTL genotype is generally unknown in QTL mapping. In the standard IM method, conditional probabilities for the QTL genotypes were calculated according the genotype of two flanking markers and incorporated into a mixture of normal distributions [9]. In the present study, multiple versions of the complete QTL genotype were imputed based on the hidden Markov model technology in the R/qtl package [32,33]. Thus, the model (4) can be solved using standard statistical methods for simple linear regression [34]. Although an F-test can be used to test a QTL, we used the traditional logarithm of odds (LOD) score as a convenient statistical test [32]: where RSS is the residual sum of squares; the subscript 1 indicates a linear model including the QTL effects and 0 indicates the null model without the QTL effect. There are multiple versions of the QTL genotype; therefore, the LOD scores at a particular position are averaged to obtain an appropriate combined LOD score for QTL detection [35].

Simulated data
The genotype data of a RIL population comprising 196 lines were simulated using the R/qtl package [32]. The genome consisted of five chromosomes, each of 150 cM in length and with 16 evenly distributed markers. Ten QTLs (represented by Q1-Q10) were simulated with the same positions and effects as used by Zeng [12]. Three QTLs were located on each of the first three chromosomes, one QTL on the fourth chromosome and no QTLs on chromosome five. Simulated phenotype data for a 14×14 simple lattice design experiment with two replications were generated. The replication effect was set to 2. The random error effect was drawn from a normal distribution with a mean of zero and variance of Var(g)×(1/h 2 − 1), where Var (g) is the total genetic variance and h 2 is total heritability, and was fixed as 0.7. The blockwithin-replication effect was drawn from a normal distribution with a mean of zero and variance of Var(e)×δ, where Var(e) is the error variance and δ is the same as defined in model (3), and was set to 0.5, 1, 5, and 10, respectively. One hundred replications were performed for each scenario.

Performance analysis
Our proposed CIMLD method was compared with two other mapping strategies: the standard CIM method with the arithmetic mean as the response (referred as RCBD hereinafter) and the standard CIM method with the adjusted mean as the response (referred as AMLD hereinafter). The threshold for the LOD score was determined by a permutation test [36] with 1000 replications or a predefined empirical value of 2.5 [19]. The false discovery rate (FDR) and the power of QTL detection were calculated and used for performance comparisons. A detected QTL was considered a false positive if none of the predefined QTLs were found within a 10 cM window, centered at the detected QTL. A predefined QTL was considered as detected if at least one QTL was found within a 10-cM window, centered at the true QTL. Standard CIM analysis was performed using the cim function of R/qtl [32].

Soybean data set
A RIL population of soybean comprising 184 lines derived from the cross between cultivars Kefeng No.1 and Nannong1138-2 was planted in a 14×14 simple lattice design, with two replications, for 2 years, in 2005 and 2006, in the National Center of Soybean Experiment, Jiangsu, Nanjing Agricultural University, China [37,38]. The biomass of each RIL was obtained by measuring the dry weight of all plants in each plot. The linkage map of the population was initially constructed from 452 markers and was refined recently [38,39]. The new map contains 834 molecular makers covering 2308 cM in 24 linkage groups.

Statistical analysis
A random effects model was used for ANOVA of biomass and the phenotypic value for the ith environment, j-th replication, and l-th genotype was expressed as y ijkl = μ + e i + α j(i) + γ k(ij) + g l + (ge) il + ε ijkl , where y ijkl is the phenotypic observation of the l-th line in the k-th block within the i-th environment and j-th replication, μ is the overall mean, e i is the effect of i-th environment, α j is the effect of j-th replication, γ k(ij) is the effect of k-th block within i-th environment and j-th replication, g l is the effect of l-th line, (ge) il is genotype by environment interaction effect, and ε ijkl is random error. All effects were regarded as fixed except γ k(ij) and ε ijkl , γ k(ij)~N (0,s 2 b ), ε ijkl~N (0, σ 2 ), where s 2 b is the variance of the block effect within the replication and σ 2 is the error variance.
The computation for ANOVA was performed by using the GLM procedure in the SAS/STAT software [40]. Trait heritability of biomass was estimated as h 2 ¼ s 2 g =½s 2 g þ s 2 ge =s þ s 2 =ðs Á rÞ, where s 2 g , s 2 ge and σ 2 are genotype, genotype by environment interaction and error variance estimated from expected mean squares in ANOVA, respectively, and s is the number of environments and r is the number of replications [41].

Results Simulation
Researchers have realized that it is necessary to adopt a lattice design to control experimental errors. However, the lack of methods to analyze the full data from a lattice design for QTL mapping caused some previous reports to use the arithmetic means of genotypes as response variables to map QTLs with the CIM method, although the experimental design was lattice. In that case, the experimental design was actually equivalent to RCBD. In other reports, adjusted means of genotypes of the lattice design were used as response variables for QTL mapping with the CIM method. Here, we have described a model (CIMLD) for QTL mapping based on a lattice design. To compare the three strategies, we simulated four error conditions of block-within-replication relative to error variance of the model.
The genome-wide average LOD score profiles under the four block variance conditions (Fig 1) showed that as the block variance increased, the LOD score of the RCBD method decreased rapidly, especially when the block variance was large, e.g. δ = 10, where the genomewide LOD score decreased to almost zero, indicating a rapid loss in power. There is no direct correspondence between the size of the LOD score and power; therefore, we calculated the frequency that a LOD score was greater than the threshold from a permutation test among the 100 replications at each genome position. The pointwise power analysis (Fig 2) confirmed that the RCBD method suffered a high false-negative rate with large block variance. In contrast, the LOD scores of AMLD and CIMLD methods were not sensitive to block variance and stayed stable across the four block conditions, as expected. However, the CIMLD method was likely to be more powerful than AMLD, because the LOD score was generally greater than that of AMLD at the predefined QTL locations in all block conditions. The pointwise power analysis  Composite Interval Mapping Based on Lattice Design for Error Control further showed that CIMLD method had a higher power peak at the predefined QTL locations than the AMLD method, especially for small effect QTL locations, such as Q1, Q3, and Q7. Furthermore, clear peaks were observed only around most of the predefined QTL in all the three methods under all block conditions, indicating good accuracy for all three mapping methods for the different block variance conditions.
The power of each QTL detection (Fig 3) showed that with increasing block variance, the power of RCBD method decreased rapidly, while the power of AMLD and CIMLD method stayed stable. For all block variances, the power of the RCBD method was generally less than AMLD and CIMLD, especially for large block variances: a large power gap was observed. This indicated that the lack of error control of the RCBD method resulted in a bad performance for QTL mapping. The CIMLD method showed a higher power than the AMLD method for most of the QTL, except for Q5, Q6, and Q8, where the AMLD method had a comparable power to the CIMLD method. For those QTLs with small effects, especially at Q3, the power of the CIMLD method was approximately 40% greater than that for AMLD. The results indicated that the CIMLD method is much more powerful than AMLD, especially for small-effect QTLs.
The FDR and overall power of the three methods were investigated. The FDR for each method (Table 1) was in a reasonable range and was generally less than 1% either for the permutation test or an empirical threshold of 2.5. With increasing = block variance, the FDR of each method remained relatively stable. This indicated that large block variances would not cause an increase in FDR for all three methods.
The overall power of QTL detection ( Table 2) showed that the RCBD method was very sensitive to the size of block variance. With the increase of block variance, the power of the RCBD method decreased from 51.3% to 9.4%. The power of CIMLD and AMLD methods were both higher than the RCBD method under all four block conditions. The power of the three methods may be ordered as CIMLD > AMLD > RCBD. In contrast to the RCBD method, the power of CIMLD and AMLD did not change for different block variances.
The power of RCBD and AMLD methods under the permutation test was generally less than the empirical threshold of 2.5; however, the CIMLD method was more stable for the twothreshold strategy. The power of the CIMLD method was approximately 14% higher than AMLD in case of the permutation test and approximately 6% higher under an empirical threshold of 2.5. This was because the LOD score of the CIMLD method was much greater than those of the RCBD and AMLD methods (Fig 1), and the threshold from the permutation test was generally greater than 2.5.
Thus, the CIMLD method has a higher power of QTL detection because an appropriate model is implemented and the experimental error is effectively controlled.

Application to a real soybean data set
Soybean biomass data of a RIL population were used as an example of our proposed CIMLD method (S1 Fig). ANOVA for biomass showed that that the genotype variance was significant δ represents the size of the block variance and is defined as d ¼ s 2 b =s 2 , where σ 2 is the error variance and s 2 b is the variance of a block within the replication. For the purpose of computing the detection power and FDR, a predefined QTL was counted as detected if at least one QTL was found within a 10 cM window centered at the true QTL, and a detected QTL was considered as a false positive if none of the predefined QTLs were found within a 10-cM window centered at the detected QTL. A permutation test was performed with 1000 replications.  δ represents the size of the block variance and is defined as d ¼ s 2 b =s 2 , where σ 2 is the error variance and s 2 b is the variance of a block within the replication. For the purpose of computing the detection power, a predefined QTL was counted as detected if at least one QTL was found within a 10-cM window centered at the true QTL. A permutation test was performed with 1000 replications.
doi:10.1371/journal.pone.0130125.t002 and the line-by-environment interaction was also significant (S1 Table); however, the F value of the statistical test for the interaction was much less than that of main line effects, thus the phenotypic values of the lines agreed well across environments relative to the variation of the genotypic effect. The estimated heritability of biomass was approximately 82.1%.
To analyze the 2-year soybean data for QTL mapping, a modification of model (1) was needed to include the QTL by environment interaction effect. Although our previous model for QTL mapping did not explicitly take environment factors into account, such environment factors can be treated as fixed effects and modeled as covariates without any further assumptions. Accordingly, the resulting modified version of model (2) can be expressed as y = 1 n μ + where X E is the design matrix of the environment effect b E and X QE is the design matrix of a QTL by the environment interaction effect b QE . Thus, the CIMLD method with genotype by environment interaction was employed to map QTLs for biomass.
The LOD profile of biomass (Fig 4) showed clear peaks on linkage groups B1, C2, D2, E, and O. The genome-wide thresholds were 4.09, 4.02, and 4.36 for the RCBD, AMLD, and CIMLD methods at a genome-wide significance level of 0.05 by permutation test with 1000 replications, respectively. The RCBD method identified three QTLs located on B1 at 63 cM, C2 at 35.2 cM, and O at 118 cM, respectively. The AMLD method identified two QTLs located on C2 at 35 cM and O at 117 cM.
By contrast, the CIMLD method identified 10 additive QTLs for biomass distributed on eight linkage groups with a total narrow-sense heritability of 65.8%, which are summarized in Table 3. The heritability of the single QTLs ranged from 2.0% to 16.7%, and four large effect QTLs, located on linkage group B1 at 28 cM and 62 cM, C2 at 35.2 cM, and O at 117 cM, were observed with a total heritability of 46.1%. Although no additive-environment QTLs were detected, the additive environment effects of QTLs located on linkage group D2 at 34 cM and K at 46 cM were relatively large, showing a weak environment interaction signal. As predicted by the simulation results, the CIMLD method had a higher power than the RCBD and AMLD method; we identified all the three QTLs detected by RCBD and AMLD (the positions were the same or within 1 cM). The three common QTLs all belonged to the four large-effect QTLs. The effects of newly identified QTLs are relatively small, except for a QTL on B1 at 28 cM, whose heritability was 11.3%.

Discussion
Large numbers of permanent population lines are generally required in QTL mapping, and experimental fields or space need to be enlarged accordingly, resulting in larger experimental errors with the increase of lines or genotypes. As such, adopting an experimental design suitable for large numbers of genotypes is critical to control the experimental error. However, most researchers who used the RCBD method for QTL mapping ignored error control. As shown by the simulation results, the RCBD method used in most QTL mappings with permanent populations has a low power of QTL detection because of its lack of local control. Thus, it should be emphasized that error control in QTL mapping is very important for increasing the power of QTL detection and the accuracy of QTL effect estimation. Furthermore, we believe that error control is also important in association mapping, because the error not only affects the estimation of a QTL's position, but also its effects.
In a lattice design, environmental errors can be decomposed into block-within-replication error and random error, which is the residual of the lattice design model. The CIMLD method is beneficial to QTL studies because of its better error control. The previous QTL mapping based on lattice design mis-specified the model. One mis-specified model, which was actually an RCBD method in this study, has a larger error variance than the CIMLD method, which led to a lower detection power. Another mis-specified model, redefined as the AMLD method, takes block effect into account, and uses adjusted means of genotypes as response variables. AMLD obviously improved the QTL detection power; however, it ignored partial dependence of phenotypic values, caused by random block effect in the lattice design. It is this dependence that violated the hypothesis of independence of observations in the CIM approach, and thus lowered the QTL detection power relatively to the CIMLD method. Our proposed CIMLD method directly modeled the full phenotypic data of the lattice design for QTL mapping, and the simulation showed it to have a good performance in terms of power and FDR.
Our CIMLD implementation is based on CIM and a multiple imputation framework, but could be extended easily to other, more sophisticated statistical methods for QTL mapping, such as ICIM [13], Bayesian LASSO [16] and two-dimensional genome scan to identify epistasis. Furthermore, the concept can be applied to other widely used experimental designs for efficient error control, such as the block-in-replication design, replication-in-block design, and other incomplete block designs [42]. Our method is also applicable to association mapping LG, linkage group; a, additive effect in kg hm -2 ; ae, additive-environment interaction effect in kg hm -2 ; h 2 , narrow-sense heritability. The support interval of a QTL is the interval in which the LOD score is within 1.5 units of its maximum.

Conclusions
Error control is very important for accurate mapping of quantitative traits. Even if the experimental design is implemented for efficient error control, an appropriate linear model is also needed to achieve powerful QTL mapping. Our composite interval mapping method, based on lattice design, was demonstrated to be more powerful for QTL mapping because of its better error control. The concept presented in the current study may be extended to other statistical methods or experimental designs for QTL mapping. An R language implementation of our proposed method has been made publicly available at https://github.com/hjbreg/cimld.