The Power of QTL Mapping with RILs

QTL (quantitative trait loci) mapping is commonly used to identify genetic regions responsible for important phenotypic variation. A common strategy of QTL mapping is to use recombinant inbred lines (RILs), which are usually established by several generations of inbreeding starting from an F1 (usually up to the F6 or F7 generation). Because this inbreeding process involves a large amount of labor, we are particularly interested in the effect of the number of inbreeding generations on the power of QTL mapping; part of the labor could be saved if fewer generations of inbreeding provide sufficient power. Using simulations, we investigated the performance of QTL mapping with RILs. As expected, we found that the power of an F4 population could be almost comparable to that of F6 and F7 populations. A potential problem in using an F4 population is that a substantial proportion of RILs are still heterozygous at any given marker. Here we introduce a new method that partly relaxes this problem. The performance of this method was verified by simulations over a wide range of parameters, including the size of the segregating population, recombination rate, genome size and marker density. We found that our method works better than the commonly used standard method, especially when there are many heterozygous markers. Our results imply that in most cases QTL mapping does not necessarily require RILs at the F6 or F7 generation; rather, F4 (or even F3) populations would be almost as useful as F6 or F7 populations. Because the cost of establishing a large number of RILs over many generations is enormous, this finding should help reduce the cost of QTL mapping, thereby accelerating gene mapping in many species.


Introduction
Mapping quantitative trait loci (QTL) plays crucial roles in a number of research fields in biology. QTL mapping basically relies on detecting correlations between genetic markers and phenotypic traits in a segregating population [1][2][3][4]. The development of the interval mapping method [5,6] made it possible to infer the positions of QTL with a limited number of markers. Since then, QTL mapping has been applied to various crop and vegetable species, including an early application to genome-wide QTL analysis of tomato species [7]. With the advent of molecular biology techniques such as sequencing, DNA microarray and primer extension assay [8][9][10], it became feasible to distribute a large number of markers across the genome and genotype those markers for a large sample of individuals. This revolutionary change in molecular biology further facilitated QTL mapping in many species.
Efficient fine-scale QTL mapping requires a large segregating population (bi-parental mapping population), such as an F2 population or recombinant inbred lines (RILs). An F1 is first generated by crossing a pair of homozygous parents (usually denoted by P1 and P2), and selfing or sibling mating of the F1 then generates an F2 population. Commonly, each F2-derived line is further selfed or sib-mated for several more generations to produce RILs, and F6 or F7 populations are frequently used for QTL analyses.
The advantages of using RILs for QTL analysis are obvious. First, repeated selfing increases the number of recombination events captured in each line [11], which results in finer mapping of QTLs. More importantly, once RILs are established, with the genotypes of all lines fixed as homozygotes, these lines can be used repeatedly to investigate QTLs for various phenotypes under different environments. Thus, establishing a comprehensive set of RILs is a substantial contribution to QTL mapping in a species.
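To put a number on the gain from repeated selfing, the classical Haldane-Waddington result for RILs produced by selfing to complete homozygosity can be used: if two loci recombine with probability $r$ per meiosis, the probability that a RIL carries a recombinant haplotype is $R = 2r/(1+2r)$, roughly double $r$ for tightly linked loci. This is a standard textbook result rather than a calculation from this study, and it applies to selfing only; the sketch below simply evaluates the formula, and the function name is hypothetical.

```python
def ril_recombination_fraction(r: float) -> float:
    """Haldane-Waddington: recombinant-haplotype probability in selfed RILs,
    R = 2r / (1 + 2r), where r is the per-meiosis recombination fraction."""
    if not 0.0 <= r <= 0.5:
        raise ValueError("r must lie in [0, 0.5]")
    return 2.0 * r / (1.0 + 2.0 * r)


for r in (0.01, 0.05, 0.1, 0.25, 0.5):
    print(f"r = {r:.2f} -> R = {ril_recombination_fraction(r):.3f}")
```

For small $r$ the ratio $R/r$ approaches 2, which is the sense in which multiple selfing "increases the number of recombination events" captured per line.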
On the other hand, QTL mapping is frequently applied to species for which substantial resources are not available at the single-lab level. In this case, it is not reasonable to establish a comprehensive set of RILs; rather, it makes more sense to map the rough location of a QTL with a limited amount of effort. There is an obvious tradeoff between the performance of QTL mapping and its cost, which depends on the sample size and the number of generations of selfing or sibling mating. The heaviest labor is maintaining a large number of RILs for multiple generations, so a simple idea is to use an earlier generation with a limited number of lines. In the most aggressive setting, a number of QTL mapping studies have simply used an F2 population.
For such small-scale QTL mapping, it would be very useful to understand the relationship between performance (statistical power) and cost (the number of selfing or sibling mating generations, sample size, and marker density), which would greatly help to optimize the design of a QTL mapping experiment. This problem has been extensively investigated in simple theoretical models [12][13][14]. Here, we provide the results of extensive simulations in more realistic situations, in which a large number of partially linked markers are distributed across the genome. With these results, we discuss how the cost can be reduced while minimizing the loss of performance.

Model and simulation
To simulate the QTL mapping process with RILs, we consider a diploid species. It is assumed that the genome consists of $L$ chromosomes of equal length, that the genome size is $G$ Mb, and that the recombination rate is $R$ cM/Mb (so that the total map length is $GR$ cM). It is also assumed that $M$ markers are evenly distributed across the genome. We place a single QTL in the simulated genome and ask whether significant phenotype-genotype correlations can be found for markers near the QTL. To assess the performance of QTL mapping, we simulate the process of creating a large number of RILs from a single pair of parental lines, P1 and P2, both of which are assumed to be completely homozygous. Their hybrid progeny, F1, is created, and then $N$ F2 progeny are produced by selfing the F1. It is assumed that each F2 individual is successfully maintained by the single-seed-descent method through six generations (F2 to F7). Throughout this process, recombination occurs randomly at rate $R$, following the four-strand model [15]. It is also assumed that at least one chiasma forms on each chromosome in each meiosis (obligate chiasma [16][17][18]), but for simplicity we assume no crossover interference. At each generation from F2 to F7, a simple QTL mapping method (see below) is applied.
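The RIL construction above can be sketched as a single-seed-descent simulation on one chromosome. This is a simplified stand-in, not the authors' simulator: the number of crossovers per gamete is drawn from a Poisson distribution with mean equal to the map length in Morgans and crossovers are placed uniformly (Haldane model, no interference), so the four-strand bookkeeping and the obligate chiasma assumed in the text are not reproduced. Genotypes are coded 0 (P1 homozygote), 1 (heterozygote) and 2 (P2 homozygote), and the function names are hypothetical.

```python
import numpy as np


def make_gamete(hap0, hap1, pos_cM, chrom_len_cM, rng):
    """One meiosis under a simplified Haldane model (no interference, no obligate chiasma)."""
    n_co = rng.poisson(chrom_len_cM / 100.0)            # expected crossovers = length in Morgans
    breaks = np.sort(rng.uniform(0.0, chrom_len_cM, size=n_co))
    n_left = np.searchsorted(breaks, pos_cM)             # crossovers to the left of each marker
    start = rng.integers(2)                              # homolog the gamete starts on
    from_hap1 = (start + n_left) % 2 == 1                # each crossover switches homolog
    return np.where(from_hap1, hap1, hap0)


def simulate_ssd(n_lines, pos_cM, chrom_len_cM, last_gen=7, seed=1):
    """Self the F1 to obtain n_lines F2 plants, then advance each line by single-seed descent.
    Returns {generation: genotype matrix (lines x markers)} with codes 0, 1, 2."""
    rng = np.random.default_rng(seed)
    p1 = np.zeros(len(pos_cM), dtype=np.int8)            # P1 allele coded 0
    p2 = np.ones(len(pos_cM), dtype=np.int8)             # P2 allele coded 1
    lines = [(p1, p2)] * n_lines                         # every line starts from the same F1
    genotypes = {}
    for gen in range(2, last_gen + 1):
        # selfing: both gametes come from the same parent plant
        lines = [(make_gamete(h0, h1, pos_cM, chrom_len_cM, rng),
                  make_gamete(h0, h1, pos_cM, chrom_len_cM, rng)) for h0, h1 in lines]
        genotypes[gen] = np.array([h0 + h1 for h0, h1 in lines])
    return genotypes


# One 30 Mb chromosome at 4 cM/Mb (120 cM) with 100 evenly spaced markers
pos = (np.arange(100) + 0.5) * 1.2
geno = simulate_ssd(n_lines=500, pos_cM=pos, chrom_len_cM=120.0)
for gen, g in geno.items():
    print(f"F{gen}: heterozygous fraction = {np.mean(g == 1):.3f}")
```

The heterozygous fraction should fall by about half per generation (about 0.5 at F2 and 0.125 at F4), which is the source of the heterozygote problem in early generations.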
In the QTL mapping process, it is assumed that all markers are genotyped for all individuals and that the phenotype of each individual is determined by a simple model in which a particular locus partially contributes to the quantitative trait of interest [1][2][3][4][5]. Let $Q_1$ and $Q_2$ be the two alternative alleles at this QTL, inherited from the two parental lines (P1 and P2). The phenotype of each diploid individual in the segregating population then depends on its genotype at this locus. There are three possible genotypes, $Q_1Q_1$, $Q_1Q_2$ and $Q_2Q_2$, denoted genotypes 1, 2 and 3, respectively. The numbers of individuals with the three genotypes are denoted by $n_1$, $n_2$ and $n_3$, and $N = n_1 + n_2 + n_3$ is the total number of individuals.
Let $y_{ij}$ be the quantitative value of the focal phenotype of the $j$th individual with the $i$th genotype ($i = 1, 2, 3$ and $j = 1, 2, \ldots, n_i$). In a simple model with no genotype-by-environment interaction, $y_{ij}$ can be written as

$$y_{1j} = m + a + e_{1j}, \tag{1}$$
$$y_{2j} = m + d + e_{2j}, \tag{2}$$
$$y_{3j} = m - a + e_{3j}, \tag{3}$$

where $m$ is the mid-parental value, $a$ is the additive genetic effect and $d$ is the dominance effect. All other factors are represented by $e_{ij}$, including the environmental variance and the residual genotypic variance due to other unlinked QTLs; $e_{ij}$ is assumed to follow a normal distribution with mean 0 and variance $\sigma^2$. We assume that $e_{ij}$ is drawn independently at each generation. In other words, $m$, $a$ and $d$ are the only parameters describing genetic factors inherited across generations, and $e_{ij}$ does not depend on the phenotype or genotype in the previous generation. Simulations of RILs under this simple model are used to investigate the performance of QTL mapping. Assuming that a large number of markers are available across the genome, we simply test, for each marker, the null hypothesis of no association between the phenotype and the marker genotype. The interval mapping method is not needed because of the availability of a large number of markers (this condition will be relaxed later). We use two likelihood ratio tests to examine whether there is a significant phenotype-genotype correlation.
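Equations (1)-(3) translate directly into a phenotype generator. The sketch below assumes QTL genotypes are coded 1, 2, 3 as in the text ($Q_1Q_1$, $Q_1Q_2$, $Q_2Q_2$) and draws a fresh $e_{ij} \sim N(0, \sigma^2)$ on every call, mirroring the assumption that the environmental term is added independently at each generation; the function name is hypothetical.

```python
import numpy as np


def simulate_phenotypes(qtl_geno, m=0.0, a=1.0, d=0.0, sigma2=2.0, rng=None):
    """Phenotypes under equations (1)-(3): code 1 -> m + a, 2 -> m + d, 3 -> m - a,
    plus an independent normal error with mean 0 and variance sigma2."""
    rng = np.random.default_rng() if rng is None else rng
    qtl_geno = np.asarray(qtl_geno)
    if not np.isin(qtl_geno, (1, 2, 3)).all():
        raise ValueError("genotype codes must be 1, 2 or 3")
    genetic_value = np.array([np.nan, m + a, m + d, m - a])[qtl_geno]  # index 0 unused
    e = rng.normal(0.0, np.sqrt(sigma2), size=qtl_geno.shape)
    return genetic_value + e


rng = np.random.default_rng(42)
g = rng.choice([1, 2, 3], size=200, p=[0.25, 0.5, 0.25])   # F2 genotype frequencies
y = simulate_phenotypes(g, d=0.0, sigma2=2.0, rng=rng)
print([round(y[g == k].mean(), 2) for k in (1, 2, 3)])      # expected class means: 1, 0, -1
```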
In the first method (Method I), with B and b denoting the marker alleles from P1 and P2, respectively, the null model assumes equal average phenotypes for the three marker genotypes, $\bar{y}_{BB} = \bar{y}_{Bb} = \bar{y}_{bb}$. Under the alternative, if the marker and the QTL are completely linked, we expect $E(\bar{y}_{BB}) = m + a$, $E(\bar{y}_{Bb}) = m + d$ and $E(\bar{y}_{bb}) = m - a$. Method I requires the likelihoods of the observations ($\bar{y}_{BB}$, $\bar{y}_{Bb}$, $\bar{y}_{bb}$) under these two extreme cases (null and alternative). Note that this very commonly used method requires estimation of the dominance effect (e.g., [6,19,20]). The second method, which we propose here, is a simplified version (Method II) in which only individuals homozygous at the marker (BB and bb) are considered (heterozygotes, Bb, are excluded), and the null hypothesis $\bar{y}_{BB} = \bar{y}_{bb}$ is tested. We propose this simple method because, by excluding heterozygotes from the analysis, it does not have to involve the dominance parameter. Estimation of the dominance parameter has to rely on a relatively small number of heterozygotes, which will likely cause considerable uncertainty in the estimate. We suspected that misestimation of the dominance parameter due to such uncertainty might reduce the power. Obviously, the two methods become identical as the number of selfing generations increases and all RILs become homozygous across the entire genome. The two methods are described below in detail.
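Method I. This method involves computing the maximum likelihoods of the observations ($\bar{y}_{BB}$, $\bar{y}_{Bb}$, $\bar{y}_{bb}$) under the null and alternative models. Below, $\bar{y}_1$, $\bar{y}_2$ and $\bar{y}_3$ denote the mean phenotypes of individuals with marker genotypes BB, Bb and bb, respectively, $n_1$, $n_2$ and $n_3$ their numbers, and $y_{ij}$ the phenotype of the $j$th individual in class $i$. The alternative model involves maximum likelihood estimation of four unknown parameters, $m$, $a$, $d$ and $\sigma^2_{I,Alt}$, which are given by

$$\hat{m} = \frac{\bar{y}_1 + \bar{y}_3}{2}, \tag{4}$$
$$\hat{a} = \frac{\bar{y}_1 - \bar{y}_3}{2}, \tag{5}$$
$$\hat{d} = \bar{y}_2 - \hat{m}, \tag{6}$$
$$\hat{\sigma}^2_{I,Alt} = \frac{1}{N}\sum_{i=1}^{3}\sum_{j=1}^{n_i}\left(y_{ij} - \bar{y}_i\right)^2. \tag{7}$$

The maximum likelihood of the observations under the alternative scenario is obtained with these estimates, for which the fitted means $\hat{m} + \hat{a}$, $\hat{m} + \hat{d}$ and $\hat{m} - \hat{a}$ equal $\bar{y}_1$, $\bar{y}_2$ and $\bar{y}_3$. That is, the maximum log-likelihood is

$$LL_{I,Alt} = \sum_{i=1}^{3}\sum_{j=1}^{n_i}\ln\left[\frac{1}{\sqrt{2\pi\hat{\sigma}^2_{I,Alt}}}\exp\left(-\frac{(y_{ij}-\bar{y}_i)^2}{2\hat{\sigma}^2_{I,Alt}}\right)\right] = -\frac{N}{2}\left[\ln\left(2\pi\hat{\sigma}^2_{I,Alt}\right) + 1\right]. \tag{8}$$

In the null model, in which only two parameters ($m$ and $\sigma^2_{I,Null}$) are involved, $\hat{m}$ is the overall mean $\bar{y}$ and the maximum log-likelihood of the data is

$$LL_{I,Null} = \sum_{j=1}^{N}\ln\left[\frac{1}{\sqrt{2\pi\hat{\sigma}^2_{I,Null}}}\exp\left(-\frac{(y_j-\bar{y})^2}{2\hat{\sigma}^2_{I,Null}}\right)\right] = -\frac{N}{2}\left[\ln\left(2\pi\hat{\sigma}^2_{I,Null}\right) + 1\right], \tag{9}$$

where $\hat{\sigma}^2_{I,Null}$ is simply given by

$$\hat{\sigma}^2_{I,Null} = \frac{1}{N}\sum_{j=1}^{N}\left(y_j - \bar{y}\right)^2, \tag{10}$$

and $y_j$ represents the phenotypic value of the $j$th individual (with no specification of genotype, so that $j = 1, 2, \ldots, N$). Thus, the maximum log-likelihoods under the null and alternative models are computed by equations (9) and (8), respectively, from which the LOD score is obtained as $(LL_{I,Alt} - LL_{I,Null})/\ln(10)$. For each replication of the simulations, we set a cut-off value of the LOD score by 1,000 replications of a permutation test [21], so that the false positive rate is set at $\alpha = 0.05$ after correcting for multiple testing by multiplying the P-value by the number of markers (i.e., Bonferroni correction). Note that because a permutation test is performed for each data set, the false positive rate is always 5% for any parameter set in all generations. This allows a fair comparison of the performance of different models with different parameters.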
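Because the fitted means under the alternative are just the three marker-class means, the Method I LOD score reduces to $\frac{N}{2}\log_{10}\left(\hat{\sigma}^2_{I,Null}/\hat{\sigma}^2_{I,Alt}\right)$. The sketch below computes it for a single marker from the two maximum log-likelihoods; marker genotypes are assumed to be coded 1 (BB), 2 (Bb), 3 (bb), the function names are hypothetical, and the permutation-based cut-off is not reproduced.

```python
import numpy as np


def max_loglik(residuals):
    """Maximum normal log-likelihood given residuals from the fitted means (ML variance)."""
    n = residuals.size
    s2 = np.sum(residuals ** 2) / n                 # ML estimate: divide by n, not n - 1
    return -0.5 * n * (np.log(2.0 * np.pi * s2) + 1.0)


def lod_method1(y, g):
    """Method I LOD for one marker. y: phenotypes; g: marker genotypes 1 (BB), 2 (Bb), 3 (bb)."""
    y, g = np.asarray(y, dtype=float), np.asarray(g)
    # Alternative: one mean per marker class (m, a, d free) and a common variance
    resid_alt = np.concatenate([y[g == k] - y[g == k].mean()
                                for k in (1, 2, 3) if np.any(g == k)])
    # Null: a single common mean
    resid_null = y - y.mean()
    return (max_loglik(resid_alt) - max_loglik(resid_null)) / np.log(10.0)


y = np.array([1.0, 1.2, 0.1, -0.1, -1.0, -1.2])
g = np.array([1, 1, 2, 2, 3, 3])
print(round(lod_method1(y, g), 3))
```

A marker class with no individuals is simply skipped, in which case $d$ cannot be estimated and the alternative model has one fewer free mean.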
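Method II. This method is a simplified version of Method I in which individuals heterozygous at the marker are excluded, so that the dominance parameter does not need to be estimated. In the alternative model of Method II, $m$ and $a$ are estimated from the average phenotypes $\bar{y}_1$ and $\bar{y}_3$ of the marker genotypes BB and bb:

$$\hat{m} = \frac{\bar{y}_1 + \bar{y}_3}{2}, \tag{11}$$
$$\hat{a} = \frac{\bar{y}_1 - \bar{y}_3}{2}, \tag{12}$$

and $\sigma^2_{II,Alt}$ is estimated by

$$\hat{\sigma}^2_{II,Alt} = \frac{1}{n_1 + n_3}\sum_{i\in\{1,3\}}\sum_{j=1}^{n_i}\left(y_{ij} - \bar{y}_i\right)^2. \tag{13}$$

This process is basically identical to that for Method I. The maximum log-likelihood of the observations under the alternative scenario is then given by these estimates:

$$LL_{II,Alt} = -\frac{n_1 + n_3}{2}\left[\ln\left(2\pi\hat{\sigma}^2_{II,Alt}\right) + 1\right]. \tag{14}$$

In the null model, only two parameters ($m$ and $\sigma^2_{II,Null}$) are involved, as in Method I, and the maximum log-likelihood of the data is

$$LL_{II,Null} = -\frac{n_1 + n_3}{2}\left[\ln\left(2\pi\hat{\sigma}^2_{II,Null}\right) + 1\right], \tag{15}$$

where $\hat{\sigma}^2_{II,Null}$ is simply given by

$$\hat{\sigma}^2_{II,Null} = \frac{1}{n_1 + n_3}\sum_{i\in\{1,3\}}\sum_{j=1}^{n_i}\left(y_{ij} - \bar{y}_{13}\right)^2, \tag{16}$$

with $\bar{y}_{13}$ the mean phenotype of all marker-homozygous individuals. Then, from equations (14) and (15), the LOD score can be computed as $(LL_{II,Alt} - LL_{II,Null})/\ln(10)$.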
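Method II differs from Method I only in which individuals enter the likelihoods. The sketch below (hypothetical function name, genotype codes 1/2/3 as before) drops marker heterozygotes and uses the closed form $\frac{n_1+n_3}{2}\log_{10}\left(\hat{\sigma}^2_{II,Null}/\hat{\sigma}^2_{II,Alt}\right)$ implied by equations (14) and (15).

```python
import numpy as np


def lod_method2(y, g):
    """Method II LOD for one marker: drop marker heterozygotes (code 2) and compare
    BB (code 1) with bb (code 3) using ML variances on the retained n1 + n3 individuals."""
    y, g = np.asarray(y, dtype=float), np.asarray(g)
    keep = (g == 1) | (g == 3)
    y, g = y[keep], g[keep]
    n = y.size
    rss_alt = sum(np.sum((y[g == k] - y[g == k].mean()) ** 2) for k in (1, 3) if np.any(g == k))
    rss_null = np.sum((y - y.mean()) ** 2)
    # LL_alt - LL_null = (n/2) * ln(rss_null / rss_alt); dividing by ln(10) gives a LOD score
    return 0.5 * n * np.log10(rss_null / rss_alt)


y = np.array([1.0, 1.2, 0.1, -0.1, -1.0, -1.2])
g = np.array([1, 1, 2, 2, 3, 3])
y_changed = y.copy()
y_changed[g == 2] = 100.0          # alter only the heterozygotes
print(lod_method2(y, g) == lod_method2(y_changed, g))
```

Changing the phenotypes of marker heterozygotes leaves the score untouched, which is exactly the property that lets Method II sidestep the estimate of $d$.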

Simulation results
We designed simulations to quantitatively evaluate the effect of the number of generations on the performance of QTL mapping. Throughout this article, we fix $m = 0$ and $a = 1$. We assume a simple model in which the simulated genome consists of $L = 12$ chromosomes, each 30 Mb long, so that the genome size ($G = 360$ Mb) is similar to that of rice, a species to which QTL mapping is frequently applied. In total, $M = 1{,}200$ codominant DNA markers are evenly distributed across the genome, so that the interval between adjacent markers is 300 kb (100 markers per chromosome). The recombination rate is assumed to be $R = 4$ cM/Mb, which is roughly consistent with estimates for rice [22]. Some of these simulation conditions will be relaxed later.
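The default layout can be checked with a few lines of arithmetic. The sketch assumes (the text only says "evenly distributed") that markers sit at the centers of 100 equal 300 kb bins on each chromosome; with that placement the central QTL lies exactly midway between two markers, 150 kb (0.6 cM at 4 cM/Mb) from each.

```python
import numpy as np

N_CHROM = 12
CHROM_MB = 30.0
MARKERS_PER_CHROM = 100
CM_PER_MB = 4.0

spacing_mb = CHROM_MB / MARKERS_PER_CHROM                 # 0.3 Mb = 300 kb between markers
# assumption: markers sit at the centres of equal-width bins
marker_mb = (np.arange(MARKERS_PER_CHROM) + 0.5) * spacing_mb
qtl_mb = CHROM_MB / 2.0                                    # QTL at the chromosome centre
nearest_mb = np.abs(marker_mb - qtl_mb).min()

print("genome size (Mb):", N_CHROM * CHROM_MB, " markers:", N_CHROM * MARKERS_PER_CHROM)
print("total map length (cM):", N_CHROM * CHROM_MB * CM_PER_MB)
print(f"QTL to nearest marker: {nearest_mb * 1000:.0f} kb = {nearest_mb * CM_PER_MB:.2f} cM")
```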
We are interested in the power of QTL mapping to detect a particular QTL that has a significant genetic contribution. It is assumed that this QTL is located at the center of one chromosome, which is also the midpoint between two adjacent markers; therefore, the distance to the closest marker is 150 kb. Although the model does not include other specific QTLs, their effects are incorporated in the environmental factor, $e$, in equations (1)-(3). For each parameter setting, we performed 10,000 independent replications of the simulation from F1 to F7, and at each generation (except F1) LOD scores were computed for all markers.
A typical pattern of the results is shown in Figure 1, in which $N = 200$, $\sigma^2 = 2$ and no dominance ($d = 0$) were assumed. The expected heritability in the F2 population, where genotypes 1, 2 and 3 occur at frequencies 1/4, 1/2 and 1/4, is given by

$$h^2_{F_2} = \frac{a^2/2 + d^2/4}{a^2/2 + d^2/4 + \sigma^2}.$$

Therefore, with this parameter set, the expected heritability is 20% (note that the heritability changes in the subsequent F3, F4, … generations). On the chromosome with the QTL (left panel in Figure 1A), both Methods I and II give the highest LOD score around the QTL, creating a sharp peak, whereas the LOD scores on all other chromosomes are low (one representative chromosome is shown in the right panel in Figure 1A). We confirmed that similar patterns hold for all simulated parameter sets unless $\sigma^2$ is very large.
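The F2 heritability follows from the genotypic values $a$, $d$, $-a$ at frequencies 1/4, 1/2, 1/4: their mean is $d/2$ and their variance is $a^2/2 + d^2/4$. The sketch below (hypothetical function name) evaluates it for the $\sigma^2$ values used in the power simulations with $a = 1$, $d = 0$, reproducing the 20%, 10%, 5%, 2.5%, 1% and 0.5% series.

```python
def f2_heritability(a, d, sigma2):
    """Expected F2 heritability: V_G / (V_G + sigma2), with V_G = a^2/2 + d^2/4
    (genotypic values a, d, -a at frequencies 1/4, 1/2, 1/4)."""
    vg = a ** 2 / 2.0 + d ** 2 / 4.0
    return vg / (vg + sigma2)


for s2 in (2, 4.5, 9.5, 19.5, 49.5, 99.5):
    print(f"sigma^2 = {s2:5}: h2(F2) = {100 * f2_heritability(1.0, 0.0, s2):.1f}%")
```

With complete dominance ($d = 1$) the same $\sigma^2$ values give somewhat higher heritabilities, because the dominance term adds $d^2/4$ to the genetic variance.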
There are at least two notable observations in Figure 1. (i) The distributions of LOD scores do not change much across generations, suggesting that substantial power to detect a QTL may be expected even in early generations. If so, QTL mapping does not necessarily require many generations of inbreeding, and a huge amount of time and cost could be saved. (ii) Method II outperforms Method I in many cases, especially in early generations. Method II is a simplified method that does not use individuals heterozygous at the marker, whereas Method I uses all samples. This suggests that the simpler method that does not consider the dominance effect (Method II) may be more efficient despite the obvious drawback of a reduced sample size. Together, these observations suggest that F3 and F4 populations could have reasonable power for QTL mapping and that Method II would perform better in such early generations.
To evaluate these hypotheses quantitatively, we investigated the power of QTL mapping. The right panel of Figure 1 summarizes the results of 10,000 replications of the simulations with the same parameters as those used for the left panel. The power was computed for each marker and is defined as the proportion of replications in which the LOD score is significant at the 5% level after correcting for multiple testing. The spatial distributions of the power support both hypotheses: the performance of Method II (blue line) generally exceeds that of Method I (red line), and the power at F4 is almost comparable to that at F7.
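The power definition (the proportion of replicates with a significant LOD score) can be illustrated with a deliberately stripped-down sketch: one marker completely linked to the QTL, lines drawn independently with single-locus F_t genotype frequencies under selfing, and a fixed, hypothetical LOD threshold of 3 in place of the per-data-set permutation cut-off used in the study. Its numbers are therefore not comparable to Figures 1 and 2; it only shows how the two tests are scored replicate by replicate. All names are hypothetical.

```python
import numpy as np


def lod(y, g, classes):
    """ML-based LOD (one-way normal model) using only individuals whose marker class is in `classes`."""
    keep = np.isin(g, classes)
    y, g = y[keep], g[keep]
    rss_alt = sum(np.sum((y[g == k] - y[g == k].mean()) ** 2) for k in classes if np.any(g == k))
    rss_null = np.sum((y - y.mean()) ** 2)
    return 0.5 * y.size * np.log10(rss_null / rss_alt)


def power_at_generation(t, n_lines=200, a=1.0, d=0.0, sigma2=2.0,
                        lod_threshold=3.0, n_rep=500, seed=0):
    """Fraction of replicates in which a marker completely linked to the QTL exceeds a
    fixed (hypothetical) LOD threshold, for Method I and Method II, in generation F_t."""
    rng = np.random.default_rng(seed)
    h = 0.5 ** (t - 1)                             # single-locus heterozygosity in F_t
    probs = [(1.0 - h) / 2.0, h, (1.0 - h) / 2.0]
    value = np.array([0.0, a, d, -a])              # genotypic values for codes 1, 2, 3 (m = 0)
    hits = np.zeros(2)
    for _ in range(n_rep):
        g = rng.choice([1, 2, 3], size=n_lines, p=probs)
        y = value[g] + rng.normal(0.0, np.sqrt(sigma2), size=n_lines)
        hits[0] += lod(y, g, (1, 2, 3)) >= lod_threshold   # Method I: all three classes
        hits[1] += lod(y, g, (1, 3)) >= lod_threshold      # Method II: homozygotes only
    return hits / n_rep


for t in range(2, 8):
    p1, p2 = power_at_generation(t, sigma2=19.5)
    print(f"F{t}: Method I {p1:.2f}   Method II {p2:.2f}")
```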
Further simulations over wide ranges of parameters were carried out to confirm whether this holds. The results are summarized in Figure 2. In this figure, we mainly focus on how the environmental variance ($\sigma^2$) affects the power for two sample sizes, $N = 200$ and $1{,}000$. We also considered two cases: no dominance ($d = 0$) and complete dominance ($d = 1$). We used a wide range of values, $\sigma^2 \in \{2, 4.5, 9.5, 19.5, 49.5, 99.5\}$ (the corresponding heritabilities at the F2 generation, $h^2_{F_2}$, are 20%, 10%, 5%, 2.5%, 1% and 0.5%, respectively, for $d = 0$), and Figure 2 shows the subset of values for which the power at F7 ranges roughly from 0.1 to 1. Here the power is defined as the proportion of simulation replications in which the LOD scores of both of the two markers closest to the QTL are significant at the 5% level (after correcting for multiple testing). Because the power is much higher overall when $N = 1{,}000$, the QTL can be detected with probability ≈1 when $\sigma^2$ is smaller than 9.5 (that is, when heritability is larger; Figure 2B), while the QTL with $\sigma^2 = 9.5$ … These simulations support our two hypotheses across these wide ranges of parameters. For all parameter sets, Method II outperforms Method I, especially in early generations, and the power of Method II at F4 is almost comparable to that at F7. This appears to hold regardless of the degree of dominance. As mentioned earlier, the power is measured with a permutation test applied to each data set, so the false positive rate is always controlled at 5% for all parameter sets; the comparison of power is therefore statistically fair.
In Figure 3, we investigated the effects of other parameters, including the recombination rate, genome size and marker density. Overall, the effects of these parameters are small. In Figure 3A, the recombination rate is varied from $R = 1$ to $R = 8$ cM/Mb, while all other parameters remain the same as those used for Figures 1 and 2A. The panel in the broken square is identical to Figure 2A. In Figure 3B, the effect of genome size is investigated. Because our initial setting corresponds to species with small genomes such as Arabidopsis and rice, the genome size is increased up to 4 Gb, which is almost as large as maize and wheat. In Figure 3C, the marker density is reduced by up to 10-fold. The overall patterns are similar to one another, although the power becomes relatively weak when marker density is low (the leftmost panel in Figure 3C; see also the leftmost panel of Figure 3B). There also seems to be a weak negative correlation between the power and the recombination rate (Figure 3A). Thus, our conclusions appear robust to these parameters.
These results are for relatively standard settings with an additive phenotypic effect at the focal QTL. However, there are cases where this clearly does not hold. One example is overdominance, in which the expected phenotypic value of heterozygotes at the focal QTL is larger than those of both homozygotes. Such a situation can be realized by setting $d > 1$, so that the expected phenotypic value of $Q_1Q_2$ heterozygotes exceeds that of $Q_1Q_1$ homozygotes ($Q_2Q_2$ homozygotes always have the smallest value; see equations (1)-(3)). To investigate the power of the two methods under this setting, we repeated the same power simulations assuming $d = 1.5$ and 2, with $m = 0$, $a = 1$ and $\sigma^2$ from 2 to 99.5 as above. With these settings, because the phenotypes of heterozygotes are very informative for identifying the QTL, the overall performance of Method I is quite good (Fig. 4A). This is especially pronounced in earlier generations, but as the number of generations increases the situation becomes similar to that for a QTL with an additive effect, because almost all individuals become homozygous. The pattern is most striking in the extreme case of symmetric overdominance, where $a = 0$, so that the expected phenotypic values of $Q_1Q_1$ and $Q_2Q_2$ homozygotes are identical and heterozygotes exceed homozygotes by $d$ (Fig. 4B). In earlier generations, Method I works fairly well, but the power is almost zero in F6 and F7 because almost all individuals are homozygous, either $Q_1Q_1$ or $Q_2Q_2$, and there is no phenotypic difference between these two genotypes.
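The collapse of power under symmetric overdominance follows from the genotypic variance the QTL contributes in each generation. With heterozygote frequency $h = (1/2)^{t-1}$ in F_t under selfing, the variance of the genotypic values $a$, $d$, $-a$ is $(1-h)a^2 + h(1-h)d^2$; with $a = 0$ only the second term remains, and it vanishes as $h \to 0$. The short calculation below is an illustration rather than part of the original analysis, and the function name is hypothetical.

```python
def genotypic_variance(t, a, d):
    """Variance of genotypic values at one QTL in F_t under selfing (m = 0).
    Values: Q1Q1 -> a, Q1Q2 -> d, Q2Q2 -> -a; heterozygote frequency h = (1/2)**(t - 1)."""
    h = 0.5 ** (t - 1)
    hom = (1.0 - h) / 2.0                      # frequency of each homozygote
    mean = hom * a + h * d + hom * (-a)        # = h * d
    second_moment = hom * a ** 2 + h * d ** 2 + hom * a ** 2
    return second_moment - mean ** 2           # = (1 - h) a^2 + h (1 - h) d^2


for t in range(2, 8):
    print(f"F{t}: additive (a=1, d=0) {genotypic_variance(t, 1.0, 0.0):.3f}   "
          f"symmetric overdominance (a=0, d=2) {genotypic_variance(t, 0.0, 2.0):.3f}")
```

The additive variance grows towards $a^2$ as lines become homozygous, while the overdominant variance shrinks towards zero, matching the contrast between Figs. 4A and 4B.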

Linked QTLs
We also consider a more complicated model in which other QTLs are linked to the focal QTL. Note that our basic model described above already accounts for the effects of multiple QTLs, which are included in the third term on the right-hand side of equations (1)-(3). The assumption there was that those QTLs are not linked to the focal QTL. Here we investigate the effect of QTLs linked to the focal QTL.
We use a simple two-locus model. The alleles from P1 at the two loci are denoted by $Q_{I,1}$ and $Q_{II,1}$, and those from P2 by $Q_{I,2}$ and $Q_{II,2}$. The parameters $m$, $a$ and $\sigma^2$ were set such that the F2 heritabilities ($h^2_{F_2}$) of the two QTLs are 20% and 10%, respectively, in the codominance case. Other parameters follow those used in the earlier simulations for Figure 2A. The two QTLs are linked, and four different distances between them were considered (30, 21, 12 and 3 Mb). No epistasis between the QTLs was assumed.
We first consider the case of coupling phenotypic effects, that is, both alleles from P1 ($Q_{I,1}$ and $Q_{II,1}$) have positive effects on the phenotype. The results are summarized in Figure 5A, which shows the power to detect each QTL in the codominance and dominance cases. The overall patterns are quite similar to each other. When the distance is short (12 and 3 Mb), we observe very high power because the two QTLs behave almost as a single QTL with a relative contribution of ~30%. As the distance increases, the power decreases because of recombination. If the distance is sufficiently long (i.e., ≳30 Mb), the two QTLs behave almost independently, so that the power to detect them should become comparable to that shown in Figure 2A. The performance of Method II is better than that of Method I in all cases. Figure 5B shows the power when the phenotypic effects of the two QTLs are in repulsion, that is, $Q_{I,1}$ and $Q_{II,2}$ have positive effects on the phenotype. We first consider the codominance case. Because alleles with positive and negative effects initially lie on the same parental haplotype, the power is much lower than in the case of coupling (Figure 5A). Recombination between the two QTLs creates coupling haplotypes, $Q_{I,1}Q_{II,2}$ and $Q_{I,2}Q_{II,1}$, thereby increasing the power. Indeed, the power increases with the number of generations and with the distance between the QTLs. The performance of Method II is overall better than that of Method I.
The pattern is more complicated in the dominance case. With few recombination events (i.e., in early generations with a short distance), heterozygous individuals have the largest phenotypic values, so they are very informative. This is why Method I performs better here. When the distance is 3 Mb, the power of Method I in F2 is almost one because of the striking difference between homozygotes and heterozygotes. As more recombination events occur, the advantage of Method I diminishes, and the pattern becomes similar to that in the codominance case.
Thus, when there are multiple QTLs, especially with dominance effects and/or epistatic interactions, the relationship between the phenotypic parameters (such as $\sigma^2$) and the power is complicated. In such cases, it is quite common to observe a single peak of high LOD scores encompassing both QTLs. In practice, the problem is that it is very difficult to know whether a single LOD peak involves one QTL or multiple QTLs. Distinguishing these cases would require further breeding (see, e.g., refs. [23,24]).

Discussion
QTL mapping plays a significant role in identifying genetic regions responsible for important phenotypic variation. One common strategy of QTL mapping uses a large number of RILs, which are established through at least several generations of inbreeding (typically up to F6 or F7). Here we used simulations to quantitatively evaluate the performance of QTL mapping using RILs. Under the simple model with one focal QTL, the performance of QTL mapping with an F4 population was found to be almost comparable to that with F6 or F7 populations (Figures 2 and 3). We also found that Method II has more power than Method I, especially in earlier generations. Method II is a simplified version of Method I that does not involve estimating the dominance parameter, $d$. An obvious drawback of Method II is the reduction in sample size, because it discards marker-heterozygous samples; for example, roughly 25% and 12.5% of RILs are excluded at F3 and F4, respectively. Nevertheless, Method II outperforms Method I, suggesting that the uncertainty in $d$ might reduce the power of Method I. Thus, our results imply that QTL mapping does not necessarily require RILs at the F6 or F7 generation; rather, F4 (or even F3) populations would be almost as useful. Although we stopped the simulations at F7, the results for later generations can be understood intuitively: because the power is almost saturated at F6 and F7 for many parameter sets, the power beyond F7 cannot be much larger than that at F7. Only when the power is still increasing at F7 is more power expected beyond F7, and even then it would saturate within a few generations. Soller and Beckmann [12] suggested that increasing the number of inbreeding generations yields relatively little gain in power when heritability is large, based on their theoretical analysis under a two-locus model (i.e., QTL vs. marker).
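The 25% and 12.5% figures follow from the halving of heterozygosity with each generation of selfing: a given marker is heterozygous in a fraction $(1/2)^{t-1}$ of F_t lines. The sketch below evaluates this and the corresponding number of lines Method II retains at a marker for the sample size of 200 used in most simulations; it ignores sampling variation, applies to selfing only, and the function name is hypothetical.

```python
def het_fraction_selfing(t):
    """Expected fraction of lines heterozygous at a given marker in F_t under selfing."""
    return 0.5 ** (t - 1)


N = 200
for t in range(2, 8):
    h = het_fraction_selfing(t)
    print(f"F{t}: {100 * h:5.2f}% heterozygous -> Method II keeps about {N * (1 - h):.0f} of {N} lines")
```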
Our simulations support their conclusion in more practical situations, with many markers and a wide range of $\sigma^2$. Although we only simulated RILs produced by selfing, these conclusions should also hold for RILs produced by sibling mating, as confirmed by a limited set of additional simulations. The only effect of sibling mating we found is that the loss of heterozygosity is slightly slower (data not shown).
Further simulations under various conditions were performed (Figure 3) to investigate the effects of the parameters that were fixed in the basic simulations for Figures 1 and 2. The investigated parameters are the recombination rate, genome size and marker density, while the sample size was fixed at 200. These factors have relatively minor effects on the results, indicating that our conclusions should hold over wide ranges of the parameters. It was surprising that the power did not decrease much when there were only 10 markers on a 30 Mb (120 cM) chromosome. One implication is that, to reduce the cost, a reasonable level of power could be expected with markers placed roughly every 10 cM.
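The marker-spacing guideline can be translated into recombination fractions. The sketch assumes Haldane's map function, which matches the no-interference assumption of the simulations (the text does not state which map function was used), and takes the worst case of a QTL midway between two markers on a 120 cM chromosome. It covers a single meiosis only; the map expansion in RILs makes the effective recombination fractions larger.

```python
import math


def haldane_r(d_cM):
    """Haldane map function (no interference): r = (1 - exp(-2d)) / 2, with d in Morgans."""
    return 0.5 * (1.0 - math.exp(-2.0 * d_cM / 100.0))


chrom_cM = 120.0
for n_markers in (100, 10):                 # default density vs. the sparsest setting in Figure 3C
    spacing = chrom_cM / n_markers
    worst = spacing / 2.0                   # QTL midway between two adjacent markers
    print(f"{n_markers:3d} markers: spacing {spacing:.1f} cM, "
          f"worst-case distance {worst:.1f} cM, r = {haldane_r(worst):.4f}")
```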
In contrast, the effect of sample size appears much larger than those of the factors explored in Figure 3. As shown in Figure 2, QTLs with much larger $\sigma^2$ can be detected when $N = 1{,}000$ than when $N = 200$. Increasing the sample size is costly, perhaps as costly as extending the inbreeding generations, but our results imply that the former may be more efficient than the latter. We therefore suggest that increasing the sample size is one of the best strategies for improving performance, rather than continuing inbreeding for many generations. Because the cost of establishing a large number of RILs over many generations is enormous, it is important to understand the relationship between cost and output. Our results provide several ideas for obtaining better performance at limited cost, thereby accelerating gene mapping in many species.
In summary, we demonstrated that our idea of ignoring heterozygotes (incorporated in Method II) works quite well in relatively simple situations. The major difference between the two methods is that Method I has an additional parameter ($d$) that has to be estimated from the data. Our demonstration might indicate that simpler methods that avoid estimating the dominance parameter work well. In this sense, one might think that a linear regression analysis would also work well [13,14]. However, although this analysis does not involve estimating the dominance parameter, it assumes a certain level of dominance (most commonly no dominance). Therefore, when the true dominance parameter differs from the assumed one, the power might be reduced; in other words, it still involves uncertainty about the dominance parameter. As expected, we confirmed that the performance of linear regression analysis did not exceed that of Method II over the entire parameter range (data not shown). Method II provides a general framework for evaluating likelihoods while ignoring heterozygotes. It can be readily incorporated into the interval mapping method [1][2][3][4][5][6] or into more recently developed, computationally sophisticated QTL mapping algorithms, such as Bayesian shrinkage methods (e.g., [25,26]) and penalized maximum likelihood (e.g., [27]).
We obtained these conclusions mainly under a simple model with one focal QTL, but they apply to broader cases because the model does not necessarily assume that there is only one QTL in the genome. We simply focused on a single QTL whose phenotypic effect is specified by the parameter $\sigma^2$ (the effects of other QTLs are included in the environmental factor, $e$, in equations (1)-(3)). Therefore, as long as the focal QTL is not linked to other QTLs, our conclusions should hold. We confirmed this with additional simulations of a model allowing multiple QTLs with various quantitative effects, although this is expected on theoretical grounds.
Note that there are some cases where Method I outperforms Method II, as demonstrated in Figures 4 and 5. What these cases have in common is that the phenotype of heterozygotes is informative. One is overdominance, where Method I performs much better in earlier generations because there are many heterozygotes. The situation is similar when two linked QTLs with complete dominance have phenotypic effects in repulsion. In this case, too, double heterozygotes have the highest phenotypic value, so Method I performs well, particularly in earlier generations. It should be kept in mind that our major conclusions may not hold in these cases, although such cases may not be very common.

Conclusions
QTL mapping plays a significant role in identifying genetic regions responsible for important phenotypic variation. One common strategy of QTL mapping uses a large number of RILs, which are established through at least several generations of inbreeding (typically up to F6 or F7). Here we used simulations to quantitatively evaluate the performance of QTL mapping using RILs. The performance of QTL mapping with an F4 population was found to be almost comparable to that with F6 or F7 populations (Figs. 2 and 3). We also found that Method II has more power than Method I, especially in earlier generations. Method II is a simplified version of Method I that does not involve estimating the dominance parameter, $d$. An obvious drawback of Method II is the reduction in sample size, because it discards marker-heterozygous samples; for example, roughly 25% and 12.5% of RILs are excluded at F3 and F4, respectively. Nevertheless, Method II outperforms Method I, suggesting that the uncertainty in $\hat{d}$ might reduce the power of Method I. Thus, our results imply that in most cases QTL mapping may not necessarily require RILs at the F6 or F7 generation; rather, F4 (or even F3) populations would be almost as useful. Because the cost of establishing a large number of RILs over many generations is enormous, this finding should help reduce the cost of QTL mapping, thereby accelerating gene mapping in many species.