Inclusive Composite Interval Mapping of QTL by Environment Interactions in Biparental Populations

Identification of environment-specific QTL and stable QTL having consistent genetic effects across a wide range of environments is of great importance in plant breeding. Inclusive Composite Interval Mapping (ICIM) has been proposed for additive, dominant and epistatic QTL mapping in biparental populations for single environment. In this study, ICIM was extended to QTL by environment interaction (QEI) mapping for multi-environmental trials, where the QTL average effect and QEI effects could be properly estimated. Stepwise regression was firstly applied in each environment to identify the most significant marker variables which were then used to adjust the phenotypic values. One-dimensional scanning was then conducted on the adjusted phenotypic values across the environments in order to detect QTL with either average effect or QEI effects, or both average effect and QEI effects. In this way, the genetic background could be well controlled while the conventional interval mapping was applied. An empirical method to determine the threshold of logarithm of odds was developed, and the efficiency of the ICIM QEI mapping was demonstrated in simulated populations under different genetic models. One actual recombinant inbred line population was used to compare mapping results between QEI mapping and single-environment analysis.


Introduction
QTL by environment interaction (QEI) widely exists in crops and other organisms. Studies on QEI contribute to the efficient use of marker-assisted selection (MAS) in breeding, and better understanding of genetic architecture of important quantitative traits and genotype by environment interactions [1,2,3]. As a consequence, many theoretical and applied studies have been conducted on QEI analysis in multi-environmental trials.
Analysis of variance (ANOVA), the simplest method, tested one marker at a time and had no background control, which gave rise to many false positive QTL [4]. Composite interval mapping was applied to detect QEI when multiple environments were regarded as multiple traits [5,6], but the effect of QTL at the current interval may be absorbed by the background

Materials and Methods
Genetic and linear regression models in QEI mapping DH population was used to illustrate the ICIM QEI mapping. For simplicity, it is supposed that two inbred lines P 1 and P 2 differ at m QTL, being located in m intervals defined by m+1 markers on one chromosome. If no QTL is located in a marker interval, the average and interaction effects of the QTL are treated as zero. QTL genotypes of P 1 and P 2 are assumed to be Q 1 Q 1 Q 2 Q 2 . . .Q m Q m and q 1 q 1 q 2 q 2 . . .q m q m , respectively. Suppose that a DH population is derived from the F 1 hybrids of P 1 and P 2 and phenotyped in e environments. For each individual, X = (x 1 , x 2 ,. . ., x m , x m+1 ) represents the marker variables equal to 1 or -1, standing for two marker types (i.e. P 1 marker type and P 2 marker type). G = (g 1 , g 2 ,. . ., g m ) represents QTL variables equal to 1 or -1, standing for two QTL genotypes (i.e. P 1 QTL type and P 2 QTL type). Let a 1h , a 2h ,. . ., a mh represent the additive effects of the m QTL in the h th environment, respectively. Under the assumption of additivity of QTL effects, G h , the genotypic value of an individual in the h th environment under the additive genetic model, can be written in Eq (1).
The expected frequency of the j th QTL genotype depends on its position at the chromosomal interval flanked by the j th and (j+1) th markers and the length of this interval [7,23,32,33,], i.e., where λ j and ρ j are functions of the three recombination fractions between the j th marker and j th QTL, between the j th QTL and (j+1) th marker, and between the j th and (j+1) th markers. The expectation of genotypic value G h conditional on marker type X can be denoted as, where b 0h = μ h , b 1h = λ 1 a 1h , and b jh = ρ j-1 a (j-1)h +λ j a jh (j = 2,. . .,m). The coefficient of the j th marker in the h th environment, i.e. b jh , is only affected by QTL in the (j-1) th and j th marker intervals. Therefore, when QTL are isolated by at least one blank interval, b jh and b (j+1)h contain all the position and additive effect information of QTL in the j th interval. These statistical properties provide the theoretical basis of QEI mapping. Suppose a DH mapping population has observations on a quantitative trait of interest and genotyping information on m+1 ordered markers. The following linear regression model can be used in the additive QEI mapping, i.e., where y ih is the phenotypic value of the i th individual in the h th environment; b 0h is the overall mean of linear model in the h th environment; x ij is the indicating variable for the j th marker's genotype of the i th individual, which is equal to 1 or -1 standing for P 1 type or P 2 type respectively; b jh is the partial regression coefficient of phenotype on the j th marker in the h th environment; and ε ih is the residual random error in the h th environment that is assumed to be normally distributed. Stepwise regression can be therefore conducted for phenotypic value in each environment to select significant markers, similar to additive mapping in single environment [7].

One-dimensional scanning of QEI mapping
For a testing position in the k th marker interval, the phenotypic value of the i th individual in the h th environment was adjusted by whereb jh is the estimate of b jh of significant markers selected by stepwise regression in model (4). The phenotypic value Δy ih contains QTL information in the current interval and does not change until the testing position moves to the next interval. Traditional interval mapping was conducted on the adjusted phenotypic values given by Eq (5). For a testing position in an interval, individuals of DH population can be classified into four groups based on the types of the two flanking markers. If there is one QTL (with the two alleles denoted as Q and q) at the current testing position, each marker group has both QTL genotypes QQ and qq, and hence follows a mixture distribution of two components Nðm 1h ; s 2 εh Þ and Nðm 2h ; s 2 εh Þ. In each marker group, frequencies of the two QTL genotypes depend on the recombination fractions between the putative QTL and two flanking markers, and they are different for the four marker groups. Existence of QTL at the current scanning position can be tested by the following hypotheses: where S l (l = 1, 2, 3, and 4) denotes the l th marker type group; π l1 and π l2 are the proportions of two QTL genotypes QQ and qq in the l th group, respectively; f ðÁ ; m kh ; s 2 εh Þ represents the density of the k th normal distribution in the h th environment.
The expectation and conditional maximization (ECM) algorithm [34] was used to estimate the two means and one variance in the h th environment in Eq (6). Their initial values can be determined by: Dy ih ; and where n 1:3 is the summation of n 1 to n 3 . In the E-step, the posterior probability of the i th individual (i = 1, . . ., n h ; h = 1, . . ., e) belonging to the k th QTL genotype (k = 1, 2) was calculated as, , The EM algorithm continued until the difference in the likelihood between two consecutive iterations reached a pre-assigned precision, say 10 −6 . The ML estimates thus obtained were represented bym 1h ,m 2h andŝ 2 εh . Then the additive effect under the h th environment was calculated as a h ¼ 1 2 ðm 1h Àm 2h Þ. The log-likelihood function under the alternative hypothesis H 2 is, where λ is the Lagrange multiplier.
In the EM algorithm for L 2 , the calculation of posterior probability was the same as previous one. In the M-step, the three parameters were updated as follows, where l ¼ 2e Under the null hypothesis H 0 , the Δy ih follow the normal distribution of Nðm 0h ; s 2 0h Þ. The mean and variance of this distribution can be estimated as, ðDy ih Àm 0h Þ 2 : The log-likelihood function under the null hypothesis H 0 is, log½f ðDy ih ; m 0h ; s 2 0h Þ: The LOD score (denoted by LOD A ) calculated by L 1 -L 2 indicates whether there is significant average effect at the testing position. The LOD score (denoted by LOD AE ) calculated by L 2 -L 0 indicates whether there are significant QEI effects. Sum of LOD A and LOD AE (denoted by LOD) gives the overall test statistic indicating the significance of both average effect and QEI effects.

Phenotypic variation explained (PVE) by the identified QTL
In the QTL IciMapping software, PVE of additive and QEI effects at each scanning position were estimated by posterior probability w ik and two QTL genotypic meansm kh ðk ¼ 1; and 2Þ, which have been estimated by EM algorithm previously. For illustration, we assume there is one QTL at the current scanning position, and the expected frequency of the k th QTL genotype (QQ or qq) in the h th environment is f kh , (k = 1 and 2, and h = 1,. . ., e). The marginal frequency of QTL genotype is defined as the sum of QTL genotype frequencies in all environments, i.e., f kh , which can be estimated byf kÁ ¼ 1 n X n i¼1 w ik . QTL genotype is independent of environment, therefore f kh can be estimated byf kh ¼ 1 ef kÁ ,. Genetic variance and QEI variance can be calculated from Table 1, a two-way table of two QTL genotypic means and a set of environments. The deviation between QTL genotypic mean and the grand mean can be decomposed into three components, i.e., m kh Àm ÁÁ ¼ ðm kÁ Àm ÁÁ Þ þ ðm Áh Àm ÁÁ Þ þ ðm kh Àm kÁ Àm Áh þm ÁÁ Þ: The three components stood for QTL average effect, environmental effect and QEI effect, respectively. In statistics, it can be proved the decomposition is orthogonal. In DH population, genetic variance is equal to the additive variance, which can be calculated as, QEI effect and variance can be calculated as, Then, PVE A and PVE AE can be calculated from PVE A ¼ V A V P and PVE AE ¼ V AE V P , respectively, where V P was the average of phenotypic variances in the e environments. Take DH population and four environments as an example. Assume the expected marginal frequencies of QTL genotypes QQ and qq are 0.4 and 0.6, which indicated the frequencies of the two genotypes in each environment were 0.1 and 0.15, respectively. Phenotypic variances in the four environments were set at 30, 20, 10 and 40 respectively, and the values ofm kh were given in Table 2. PVE A and PVE AE can be calculated as follows,

Empirical formula of LOD threshold in QEI mapping
In single-environment analysis, when the null hypothesis is true, likelihood ratio test (LRT) at each scanning position follows the χ 2 distribution with the degree of freedom (df) equal to the number of genetic parameters be estimated in the genetic population [32,35]. Sun et al. [36] found that the number of independent tests (denoted as M eff ) was proportional to the genome length in one-dimensional QTL scanning, and the proportion varied with marker density and population type. So M eff can be estimated as the product of proportion efficient and the genome length. Let α g be the genome-wide type-I error, the type-I error at each testing position should be a p ¼ a g M eff based on the Bonferroni correction. Therefore, the empirical LOD threshold can be determined by formula LOD ¼ w 2 a p ðlÞ=2lnð10Þ, where w 2 a p ðlÞ is the inverse χ 2 distribution that returns the critical value of a right-tailed probability α p for the degree of freedom λ. In QTL mapping, λ is equal to the number of genetic parameters to be estimated [36].
This formula can also be used to determine the LOD threshold in QEI mapping by considering the difference in degree of freedom. In QEI mapping, each QTL genotype has its own distribution in each environment, and the number of independent genetic parameters to be estimated is the sum of parameters in each environment. In other words, λ is equal to e for BC 1 (or DH and RIL) population and 2e for F 2 population. For validation, LOD threshold was determined in simulated BC 1 and F 2 populations under null genetic model (S1 and S2 Files). The genomic information and mapping parameters were the same as unlinked and linked QTL models (to be described in the next section), except that there was no QTL located on the genome. LOD thresholds for α g = 0.05 and α g = 0.01 were estimated at the 95 th and 99 th percentiles of the 1000 maximum LOD scores out of 1000 runs.

Putative genetic models in simulation studies
QTL IciMapping is integrated software for linkage map construction and QTL detection. QEI mapping has been implemented in version 4.0 of the software as the MET functionality [37]. In this study, unlinked and linked QTL models were both considered to evaluate the efficiency of QEI mapping. The genome consisted of six chromosomes, each of 150 cM in length with 16 evenly distributed markers. Two environments were considered with equal heritability in the broad sense in both models. In the unlinked QTL model, five QTL were located on five chromosomes, and the broad sense heritability was 0.5 for both environments. QTL additive effects in the two environments were given in Table 3, representing three QEI levels, i.e., strong interaction (Q2), environment-specific interaction (Q3 and Q4) and no interaction (Q1 and Q5).
Eight QTL effect scenarios were considered for two linked QTL (Table 4), i.e., Q1 and Q2, located at 25 and 55 cM on chromosome 1. These scenarios represented different QEI levels and linkage phases (coupling or repulsion phases). For example, Q1 and Q2 had strong QEI, and they were linked in the coupling phase in model L3, and in the repulsion phase in model L7. Three levels of heritability were considered, i.e., H 2 = 0.1, H 2 = 0.5 and H 2 = 0.8. One thousand DH populations, each of a size of 200, were generated for unlinked model (S3 File) and for each effect scenario of the two linked QTL under each heritability level (S4-S6 Files). The LOD threshold was set at 3.11 by empirical formula to ensure the genome-wide Type-I error rate (α g ) to be less than 0.05. The scanning step was set at 1 cM. The two probabilities for entering and removing variables in stepwise regression were set at 0.001 and 0.002.
Detection power and FDR were used to evaluate the efficiency of QEI mapping. Each predefined QTL was assigned to a support interval of 10 cM centered at the predefined location. Power of each QTL was calculated as the percentage of the simulation runs having significant peaks higher than the LOD threshold in its support interval. QTL identified but out of this interval were treated as false positives. FDR was calculated as the ratio of the number of false positives to the total number of significant discovery [26,38]. For each genetic model, estimated positions and effects were calculated as the average values of all detected QTL.

One RIL population in maize
The actual maize population used in this study was derived from a cross between inbred parents CML444 and SC-Malawi, consisting of 236 RILs [39]. A subset of 160 markers with an average missing rate of 5.12% was used to build the linkage map. Total genome length was 2105.6 cM by Haldane mapping function, and the average marker density was 13.  [39]. To compare with the empirical LOD threshold values, permutation tests were conducted on the maize population for 1000 times [40].  populations (i.e. BC 1 and F 2 populations). Thus, the empirical formula from individual environments [36] is also suitable for QEI mapping in multi-environments, considering the change in degree of freedom. As for the RIL population, accumulated recombination frequency (represented by R) is much larger than the one-meiosis recombination frequency (represented by r) due to the continuous self-pollinations. Their relationship is well known as R ¼ 2r 1þ2r , indicating R is approximately two times of r, when r is small. The larger recombination frequency estimated in RIL population is equivalent to a longer genome. When map distance is 1 cM in DH population, r and R are equal to 0.0099 and 0.0194 under the Haldane mapping function. The corresponding map distance is 1.98 cM for the accumulated recombination frequency in RIL population. Therefore 1 cM in DH population is expanded by 1.98 times in RIL population. Similarly, 5 cM and 10 cM are expanded by 1.91 and 1.83 times. In most genetic populations, marker density is around 5 to 10 cM, and the genome size is expanded by about 1.9 times. Therefore, the genome length should be multiplied by 1.9 before applying the LOD threshold empirical formula to RIL population.

Empirical LOD threshold in QEI mapping
When marker density is 10 cM, the number of independent tests is about 0.072 (α g = 0.05) or 0.084 (α g = 0.01) times the genome size [36]. Empirical LOD thresholds for common genome lengths were given in Table 5, by which empirical LOD thresholds can be found and applied. For example, one population is planted in four environments, the genome length is 1000 cM, and the average marker density is 10 cM. So df is equal to 4 for BC 1 , DH or RIL populations, and 8 for F 2 population. According to Table 5, LOD threshold should be 4.19 for BC 1 and DH populations, 5.87 for F 2 population, and around 4.50 for RIL population (referring to length of 1900 cM which is between 1800 cM and 2000 cM) and α g = 0.05. For the actual maize population, "a QTL was considered to be significant (comparison-wise Type-I error rate α c = 0.001, experiment-wise error rate α e = 0.02) when the LOD exceeded the appropriate threshold 5.67 (joint QTL, seven experiments)" in Messmer et al. [39]. LOD thresholds 6.08 and 6.89 were achieved under α g = 0.05 and α g = 0.01 by 1000 permutation tests. According to the   Detection powers of the predefined QTL were higher than 80%, and FDR was 14.11% ( Table 6). The larger PVE was, the higher detection power would be. Q1, Q2 and Q5 had the same PVE = 12.5%, and their detection powers were 91.2, 98.3 and 89.5% respectively; the detection powers were 85.4 and 80.7% for Q3 and Q4, respectively, having the same PVE of 6.25%.

Power analysis for the unlinked QTL model
When detection powers were calculated by marker intervals along the six chromosomes, it can be seen that most detected QTL were distributed around the marker intervals where the QTL were located (Fig 3). In other words, the ICIM QEI mapping was less likely to locate a QTL in chromosome regions far from the predefined QTL or in other chromosomes where no QTL were located. For example, Q1 was located at 16 cM on chromosome 1. Power at the marker interval where Q1 was located was 83.4%, and powers at its nearest left and right intervals were 5.1% and 14.0%, respectively. Powers at other intervals on chromosome 1 were close to 0.

Estimation of QTL positions and effects for the unlinked model
Estimated QTL positions and effects from the 1000 simulated populations given in Table 6 showed their unbiasedness. Estimated positions of Q1 to Q5 were 16 (Fig 4B and 4C). The trend that higher PVE resulted in larger LOD score was also observed in linked QTL model. For example, PVE of Q1 and Q2 in models L3 and L6 were both 16.14% under H 2 = 0.5, and similar LOD scores at peaks around predefined positions were obtained, i.e., 20.84 and 20.60 in model L3, and 20.32 and 20.85 in model L6, respectively. PVE of Q1 and Q2 in model L2 were both 25% under H 2 = 0.5, and the LOD scores at peaks around QTL were 32.46 and 32.22, respectively, which were much higher than those in models L3 and L6 (Fig 4B).
For the same QTL effect model, higher heritability results in larger PVE and consequently increases the LOD score. Compared with H 2 = 0.1, LOD scores at QTL peaks of H 2 = 0.5 were larger for all effect models. LOD scores at QTL peaks of H 2 = 0.8 were the largest for the three heritability levels (Fig 4). For instance, LOD scores at peaks around Q1 and Q2 in model L8 were 5.20 and 1.67 for H 2 = 0.1, 48.99 and 22.01 for H 2 = 0.5, and 117.96 and 51.75 for H 2 = 0.8. Linked QTL with larger PVE were also easier to be separated. For example for H 2 = 0.1, PVE of Q1 and Q2 in model L1 were 4.88 and 2.44%, and only one peak appeared in the average LOD profile (Fig 4A). PVE of Q1 and Q2 in model L7 were both 11.08% for H 2 = 0.1, and two clear peaks appeared around the two predefined QTL positions.  When detection powers were calculated by marker intervals along chromosome 1, most detected QTL were distributed around the marker intervals where the two QTL were located especially for H 2 = 0.5 and H 2 = 0.8 (Fig 6). For example, Q1 and Q2 in model L3 were located at 25 and 55 cM on chromosome 1. Powers at their two marker intervals were 96.4 and 96.9%, respectively for H 2 = 0.5. Powers were 3.8 and 3.7% at the nearest left and right intervals of Q1, and 4.4 and 4.1% at the nearest left and right intervals of Q2. Powers were rather low at other intervals. It could be found that even for the linked QTL, ICIM QEI mapping was less likely to locate a QTL in chromosome regions far away from the predefined QTL or in other chromosomes where no QTL were located.

QEI mapping in the maize RIL population
Through stepwise regression, 4, 5, 2, 3, 4, 2 and 4 markers were selected for environments WSM1, WSM2, WSZ1, WSZ2, WWM1, WWM2 and WWZ2 respectively. In QEI mapping, profiles of LOD, LOD A and LOD AE along the maize genome were shown in Fig 7A. Under the LOD threshold 5.67, a total of 13 QTL affecting MFLW were identified across the seven environments: one each on chromosomes 8 and 10, two each on chromosomes 2, 3, 4, and 6, and three on chromosome 1 (Table 7). Although QEI was observed in some chromosomal regions, e.g., chromosomes 2, 3 and 10 (Fig 7A), most identified QTL were relatively stable with large LOD A and small LOD AE (Table 7). Five QTL had positive average effects (Table 7). qMFLW-2-2 had the highest LOD = 24.45, LOD A = 18.45 and LOD AE = 5.99, and had the largest average and QEI effects as well. Average additive effect was -0.43, indicating the allele from CML444 at this locus would reduce MFLW by 0.43 days on the basis of population mean. qMFLW-1-1 was      Fig  7B. Under the LOD threshold 3.0, a total of 19 QTL were identified, four of which were coincident in more than two environments. Respectively, 2, 5, 2, 3, 4, 1 and 2 QTL were identified in environments WSM1, WSM2, WSZ1, WSZ2, WWM1, WWM2 and WWZ2 (Table 8). Ten QTL detected by QEI mapping, i.e., qMFLW-1-2, qMFLW-2-1, qMFLW-2-2, qMFLW-3-1, qMFLW-3-2, qMFLW-4-1, qMFLW-4-2, qMFLW-6-1, qMFLW-6-2 and qMFLW-10, were also detected by single-environment analysis, while other QTL were detected only by single-environment analysis (Tables 7 and 8). Take a common QTL around 220 cM on chromosome 1 as an example, the estimated positions were 217 and 220 cM by single-environment analysis in WSM1 and WSZ1 respectively, which was close to qMFLW-1-2 detected by QEI mapping. Compared with single-environment analysis, QEI mapping has the following properties. Single-environment analysis is subject to the errors from different environments probably resulting in different positions and effects of the same QTL. It is not inconvenient to evaluate QTL stability and QEI effect by directly comparing the effects estimated by single-environment analysis. Some studies conduct QTL mapping using the mean phenotypic value across multienvironments or the genotypic value predicted by best linear unbiased prediction [41,42]. But this approach can only detect QTL with significant major effects. When dealing with multienvironment phenotypic data, the estimated positions and effects by QEI mapping are more reliable than single-environment analysis as the data across all environments are used simultaneously.
The two-step strategy used in ICIM simplifies the mapping procedure by separating the cofactor selection from interval mapping [7]. This study demonstrated that the superiority of ICIM has been maintained when extended to QEI mapping. Stepwise regression was conducted only once, based on which the phenotype was adjusted during the interval mapping. This strategy avoids the repeated interval mapping in Boer et al. [1], and requires much less computing time. Another feature distinguishing ICIM from other methods is that major effect and QEI effect of QTL are estimated based on genotypic value of two QTL genotypes, QQ and qq, across multi-environments through the orthogonal decomposition. QTL stability and QEI level can be directly evaluated from the mapping results, including three LODs, three PVEs, major effects and QEI effects.
Using similar algorithms described in this study, ICIM has been extended to dominant QEI mapping, and epistatic QEI mapping as well. In the case of epistatic QEI mapping, the first step was to use stepwise regression to select significant markers and significant marker pairs in each environment. The second step was to apply the two-dimensional interval mapping on the adjusted phenotypic values. Both major epistasis effect and epistasis by environment interaction effect can be estimated. QEI mapping of additive, additive and dominant, and epistatic effects in most biparental populations have been well implemented in the QTL IciMapping software [37].
LOD threshold is used to control false positive in QTL mapping. Use of a suitable threshold is an important issue, as it determines the number of identified QTL and control the genomewide error rate. For convenience, we used an empirical formula to select the LOD threshold in this study. The LOD threshold to define a stable QTL can also be obtained from the empirical formula. For LOD A , df is equal to 1 for DH, BC 1 and RIL populations, and 2 for F 2 population when one-dimension scanning is conducted. Similarly for LOD AE , df is equal to e-1 for DH, BC 1 and RIL populations, and 2(e-1) for F 2 populations to obtain LOD threshold for significant QEI.
LOD threshold in QEI mapping is much higher than that in single-environment analysis, due to the increased degree of freedom. Some QTL detected in single environment may result in peaks lower than the threshold, which will not be reported in QEI mapping. But, most QTL detected in more than one environment and QTL with higher LOD score in single-environment analysis are more likely to be detected in QEI mapping. In the maize RIL population, most QTL detected in WS condition had higher LOD than that in WW condition. For example, 12 QTL were detected in WS, 8 of which had LOD scores over 4. In WW condition, 7 QTL were detected, 3 of which had LOD scores over 4. In addition, more QTL in WS were detected in more than one environment. For example, QTL around 217 cM on chromosome 1 was detected in WSM1 and WSZ1; QTL around 120 cM on chromosome 2 was detected in WSM2 and WSZ2. In comparison, more QTL in WW were detected in only one environment, especially those not identified by QEI mapping.
Genetic architecture of quantitative traits could be more complicated than additive, dominance and digenic epsitatsis discussed in this study. The environments where the traits are phenotyped can be equally complicated. Though we see benefits to apply QEI analysis for multienvironmental trials, we cannot exclude the use of QTL mapping by each environment, and then summarize the mapping results across the environments. Neither can we exclude the use of the estimated breeding values in QTL mapping where the target is to locate the highlyadapted and stable genes.