The genetic dissection of complex traits plays a crucial role in crop breeding. However, genetic analysis and crop breeding have heretofore been performed separately. In this study, we designed a new approach that integrates epistatic association analysis in crop cultivars with breeding by design. First, we proposed an epistatic association mapping (EAM) approach in homozygous crop cultivars. The phenotypic values of complex traits, along with molecular marker information, were used to perform EAM. In our EAM, all the main-effect quantitative trait loci (QTLs), environmental effects, QTL-by-environment interactions and QTL-by-QTL interactions were included in a full model and estimated by an empirical Bayes approach. A series of Monte Carlo simulations was performed to confirm the reliability of the new method. Next, the information from all detected QTLs was used to mine novel alleles for each locus and to design elite cross combinations. Finally, the new approach was adopted to dissect the genetic basis of seed length in 215 soybean cultivars obtained, by stratified random sampling, from 6 geographic ecotypes in China. As a result, 19 main-effect QTLs and 3 epistatic QTLs were identified, more than 10 novel alleles were mined and 3 elite parental combinations, such as Daqingdou and Zhengzhou790034, were predicted.
Citation: Lü H-Y, Liu X-F, Wei S-P, Zhang Y-M (2011) Epistatic Association Mapping in Homozygous Crop Cultivars. PLoS ONE 6(3): e17773. https://doi.org/10.1371/journal.pone.0017773
Editor: Samuel Hazen, University of Massachusetts Amherst, United States of America
Received: September 27, 2010; Accepted: February 14, 2011; Published: March 15, 2011
Copyright: © 2011 Lü et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Funding: This work was supported by grant 2011CB109300 from the National Basic Research Program of China, grants 30971848 and 30671333 from the National Natural Science Foundation of China, grant KYT201002 from the Fundamental Research Funds for the Central Universities, grant B08025 from the 111 Project, and grant 20100097110035 from Specialized Research Fund for the Doctoral Program of Higher Education. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
Germplasm resources play crucial roles in genetics, evolution and breeding by forming the physical foundation of the study of genetic diversity, fueling much evolutionary research, and providing the raw material for breeders to produce new cultivars or to further improve existing ones, owing to the many valuable genes present in genetic resources. The identification of valuable genes and markers associated with traits of interest will greatly increase the efficiency of plant breeding programs. However, these beneficial genes are largely unexplored due to the lack of appropriate statistical techniques. Meanwhile, as the complexity of the trait increases, breeding problems increase; for example, favorable alleles in exotic genetic resources are in unadapted genetic backgrounds and linked to other unfavorable alleles. This means that methods to utilize these favorable alleles in crop breeding also need to be further addressed. Accordingly, there is a critical need for in-depth study of methodologies for mining elite alleles in germplasm resources and for the utilization of these elite alleles in crop breeding.
During the past several decades, many attempts have been made to mine elite alleles for traits of interest. In early studies, many genes for qualitative traits in crop breeding were studied with morphological and biochemical approaches, and those for complex diseases in human genetics were identified by both sibling pair analysis and pedigree analysis. The introduction of molecular markers has facilitated the genetic association analysis of complex diseases in humans, animals and plants. Single-marker association analysis and, later, genome-wide association studies (GWAS) have been widely used in human genetics. There has been substantial research on two aspects of GWAS: population structure and mixed genetic models. However, only one QTL was analyzed at a time in the above models. Likewise, although epistasis association analysis has been utilized in human genetics, the main genetic effects and gene interaction effects have not all been included simultaneously in one genetic model. A full genetic model, including all the main and epistatic effects, could improve the power of QTL detection. Several parameter estimation approaches, such as LASSO, empirical Bayes and penalized maximum likelihood, make this full genetic model possible. Therefore, epistasis association analysis with a full genetic model is feasible in crop germplasm resources.
In the past, most crop breeding methods were based on selection for observable phenotypes, and breeding efficiency without markers is simply a function of heritability and choice of parental material. To date, molecular markers have improved the efficiency of selection largely for traits under simple genetic control and in specific conditions where marker selection is easier or cheaper than phenotypic selection. However, this approach is only feasible for the improvement of one or several independent genes. If there are interactions among the target genes, the breeding strategy must incorporate epistasis. Carlborg and Haley showed that epistasis is a common response to selection in breeding programs. Therefore, genetic interaction should be considered in crop breeding strategies.
One purpose of the genetic analysis of quantitative traits is to design a suitable breeding strategy, called breeding by design. However, genetic analysis and crop breeding have traditionally been performed separately; for example, most genetic analyses exclusively use biparental crosses, but these are rarely used alone in commercial breeding. Therefore, the results of these biparental cross experiments have limited roles in breeding practice. In contrast, direct mapping of QTLs in natural populations, such as crop cultivars, is both economical and practical because the population being mapped is readily available, and the identified QTLs are directly applicable.
The purpose of this study was to develop an epistatic association mapping (EAM) approach in homozygous crop cultivars. We described detailed genetic and statistical models of epistasis association analysis in crop cultivars. All the parameters were estimated using the empirical Bayes approach. Our methods were confirmed by real data analysis in soybean and by a series of Monte Carlo simulation experiments.
We measured seed length in 215 soybean cultivars. The minimum, maximum, average, median, standard deviation, coefficient of variation (%), skewness and kurtosis values were 5.30, 11.85, 7.94, 7.86, 0.99, 12.43, 0.61 and 0.91, respectively. Results from ANOVA showed a significant difference among cultivars ($P < 10^{-4}$) but no significant differences between years (P = 0.192) or for cultivar × year interactions (P = 0.328). This means that in the cultivar population, there is a large amount of genetic variation, which exhibits a continuous, approximately normal distribution (Fig. 1).
Epistasis association mapping
Two years of phenotypic observations, along with information on 134 SSR molecular markers, were used to dissect the genetic basis of seed length in soybean. In the full model, 9,180 effects needed to be estimated, more than 40 times the sample size. We therefore adopted a two-stage method. Nineteen main-effect QTLs and 3 epistatic QTLs for seed length in soybean were detected by EAM (Table 1). These QTLs were nearly evenly distributed along the soybean genome, except for chromosomes H, J and L. The proportion of the total phenotypic variance explained ranged from 0.25% to 10.44% for main-effect QTLs and from 5.08% to 7.38% for epistatic QTLs, and each of 12 QTLs contributed more than 5.0% of the variance. In addition, five loci were involved in epistatic interactions, and only one of these five (sat_342) had a significant main effect. This lack of main effects may create difficulties in detecting epistasis with other methods.
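The size of the full model can be checked with a short count. The sketch below assumes a breakdown that is inferred from the totals rather than stated in the text: one environmental contrast for the two years, one main-effect variable per marker, one QTL-by-environment term per marker, and one term for every unordered pair of markers. The function name `count_effects` is hypothetical.

```python
from math import comb


def count_effects(n_markers: int, n_environments: int) -> dict:
    """Count effects in a full model with main, QTL-by-environment and pairwise epistatic terms.

    Assumed breakdown (inferred, not stated in the text): environments contribute
    n_environments - 1 contrasts and each marker carries a single main-effect variable.
    """
    env = n_environments - 1                 # environmental contrasts
    main = n_markers                         # one variable per marker
    qe = n_markers * (n_environments - 1)    # marker-by-environment terms
    qq = comb(n_markers, 2)                  # all unordered marker pairs
    return {"environment": env, "main": main, "qtl_by_env": qe,
            "qtl_by_qtl": qq, "total": env + main + qe + qq}


counts = count_effects(n_markers=134, n_environments=2)
print(counts["total"])                   # 9180
print(round(counts["total"] / 215, 1))   # 42.7 effects per cultivar
```

Under this assumed breakdown the pairwise interactions (8,911 of the 9,180 terms) dominate the model, which is why a shrinkage estimator or a screening stage is needed when only 215 cultivars are available.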
To compare the proposed approach with a regular genome-wide association study (GWAS), we also analyzed the above dataset with GWAS. Three main-effect QTLs, linked with markers satt382, sat_254 and satt441, respectively, were detected (Fig. 2a), and no significant environmental or epistatic interactions were identified (Fig. 2). These results are similar to those of the proposed approach in two respects. First, the three main-effect QTLs detected by the GWAS were also identified by the proposed method. Second, no significant environmental interaction was detected by either approach. However, there are some differences as well. The main difference is that the new approach detected more main-effect and epistatic QTLs than the GWAS.
Mining elite alleles
The allelic effects of the cultivars were evaluated for all the identified loci for soybean seed length. The reduced model, which includes the total mean, the population structure, all the identified loci and the residual error, was a mixed model equation. In the reduced model, the allelic effects at each locus were estimated by a maximum likelihood approach. If we want to increase the trait value, we should take the allele with the largest positive effect at each main-effect QTL as the novel allele. If decreasing the trait value is our selection objective, we should take the allele with the largest negative effect at each main-effect QTL as the novel allele. The same is true for the allele combinations of epistatic QTLs. The summary statistics for the novel alleles and allele combinations are given in Table 2. These results show that there is one novel allele for each main-effect locus and one novel allele combination for each epistatic QTL. For example, for the locus linked to marker satt656, all the allelic effects are shown in Fig. 3, and the novel allele is the allele with an effect of 2.63. Similarly, for the interaction between markers sat_342 and AW277661, the novel allele combination is the one with an effect of 1.29. The novel allele and allele combination were found in the Zhengzhou790034 and Guangxibayuehuang cultivars, respectively.
Predictions for elite cross combination
The elite cross combinations could be predicted from all the detected loci and their effects by using the method described below. In a hypothetical cross between two cultivars, all types of RILs would be produced. In these RILs, seed length could be predicted by the combined effects of all the detected loci. The best RIL, with maximum seed length in one cross, would represent that cross. The best cross, with maximum seed length among all the crosses, could then be selected by comparing all the crosses. In this study, the best three crosses were Daqingdou × Zhengzhou790034, Zhenghezhibanzi × Zhengzhou790034, and Liyangdawuhuangdou × Zhengzhou790034. The presence of Zhengzhou790034 in the three best crosses indicated that it contained the best allele or allele combination.
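The prediction procedure can be sketched as an enumeration over the RILs of each hypothetical cross. This is a minimal illustration under stated assumptions, not the authors' implementation: only additive allelic effects and allele-combination (epistatic) effects are used, every detected locus is treated as freely recombining, and all names (`best_ril`, `rank_crosses`, loci `m1` to `m3`, cultivars `P1` to `P3`) and effect values are hypothetical.

```python
from itertools import combinations, product


def best_ril(parent_a, parent_b, main_effects, epi_effects):
    """Return (value, genotype) of the best homozygous RIL from one cross.

    parent_a, parent_b: dict locus -> allele carried by each homozygous parent.
    main_effects: dict (locus, allele) -> effect.
    epi_effects: dict ((locus1, allele1), (locus2, allele2)) -> effect.
    """
    loci = sorted(parent_a)
    # At each locus a RIL carries the allele of one parent or the other.
    choices = [sorted({parent_a[l], parent_b[l]}) for l in loci]
    best_value, best_genotype = float("-inf"), None
    for alleles in product(*choices):
        genotype = dict(zip(loci, alleles))
        value = sum(main_effects.get((l, a), 0.0) for l, a in genotype.items())
        for ((l1, a1), (l2, a2)), effect in epi_effects.items():
            if genotype.get(l1) == a1 and genotype.get(l2) == a2:
                value += effect
        if value > best_value:
            best_value, best_genotype = value, genotype
    return best_value, best_genotype


def rank_crosses(cultivars, main_effects, epi_effects):
    """Score every pairwise cross by its best RIL, best cross first."""
    scored = []
    for (name1, g1), (name2, g2) in combinations(cultivars.items(), 2):
        value, _ = best_ril(g1, g2, main_effects, epi_effects)
        scored.append((value, name1, name2))
    return sorted(scored, reverse=True)


# Hypothetical toy data: three loci, three cultivars.
parent_a = {"m1": "A", "m2": "B", "m3": "A"}
parent_b = {"m1": "B", "m2": "B", "m3": "C"}
main_effects = {("m1", "A"): 0.5, ("m1", "B"): -0.2, ("m2", "B"): 0.1,
                ("m3", "A"): -0.1, ("m3", "C"): 0.3}
epi_effects = {(("m1", "B"), ("m3", "C")): 0.9}
cultivars = {"P1": parent_a, "P2": parent_b, "P3": dict(parent_a)}

print(best_ril(parent_a, parent_b, main_effects, epi_effects))
print(rank_crosses(cultivars, main_effects, epi_effects))
```

In this toy example the best RIL carries allele B at `m1` even though its main effect is negative, because the allele combination with `m3` more than compensates. Full enumeration grows as 2 to the power of the number of segregating loci, so it is only practical for a modest number of detected QTLs.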
Monte Carlo simulation studies
Evaluation of the performance of the proposed approach.
The first simulation experiment was designed to investigate the effect of QTL heritability on QTL mapping in crop cultivars. The results show that the precision and power of the detection of QTLs increase with increasing QTL heritability, and that the false positive rate (FPR) is only 0.0244% (Table S2).
In the second simulation experiment, we investigated the effect of sample size by randomly sampling 100, 200, or 300 non-founder lines. The other parameters were the same as those in the first simulation experiment. As expected, the precision and power increased with increasing sample size (Table S3). A sample size of 300 yielded much better results than a sample size of 200; we therefore recommend a sample size of 300 for future studies.
The third simulation experiment compared the effect of the number of alleles on QTL mapping in crop cultivars. We set the number of alleles at 2, 3 and 4; the other parameters were the same as those in the first simulation experiment. The results showed that precision and power decrease as the number of alleles increases (Table S4). The results also imply that biallelic markers, such as SNPs or indels, are preferable to markers with more alleles.
In the fourth simulation experiment, the effect of allelic frequency on QTL mapping was assessed by setting the frequency ratio of the two alleles as 1∶1 (uniform distribution), 1∶2 (skewed distribution) or 1∶3 (skewed distribution). The other parameters were the same as those in the first simulation experiment. The results showed that skewed distribution decreased the statistical power (Table S5), indicating that rare alleles should be preferentially studied in association analyses.
The detection of QTL-by-environment interaction.
To investigate whether environmental effects could be detected, all the cultivars were evaluated in multiple environments. In the fifth simulation experiment, two environments, ten main-effect QTLs and five QTL-by-environment interactions were simulated. The new method has greater power for detecting QTL-by-environment interactions than for detecting main-effect QTLs, and the FPR is lower than 0.06% (Table 3). To further demonstrate the performance of the new method, in the sixth simulation experiment, we designed a large genome with high-density markers. In total, 510 markers were simulated on ten chromosome segments with a total length of 1,000 cM and an average marker interval of 2 cM. The other parameters were the same as those in the fifth simulation experiment. The same trend as in the fifth experiment was observed (Table 4), indicating that our method works in large genomes with a high marker density.
The identification of QTL-by-QTL interaction.
To demonstrate whether QTL-by-QTL interactions could be detected, all epistatic effects between two main-effect QTLs were included in the full model. In the final simulation experiment, 50 markers were evenly distributed in five linkage groups with a total length of 450 cM. Five main-effect QTLs, 3 QTL-by-environment interactions and 5 QTL-by-QTL interactions were simulated. The results (Table 5) show that the estimates for the positions and variances of simulated QTLs are close to their true values, and that the power of QTL detection is high (e.g., over 80% for QTLs with a heritability over 2%), especially for QTL-by-QTL interactions.
The approach proposed in this work has several advantages over the approaches of previous association analysis studies. First, main effects, environmental effects, QTL-by-environment interactions and QTL-by-QTL interactions were simultaneously considered in our full genetic model, improving the statistical power. Although multi-locus genetic models have been proposed in plant genetics, they have difficulty combining both QTL-by-environment and QTL-by-QTL interactions. Epistasis association mapping has been developed in human genetics, but there the epistasis was identified by a two-dimensional scan, and significant effects in the two-dimensional scan were further tested in one genetic model. Second, this is the first time that epistasis association analysis has been integrated with crop breeding by design. In the past, the results from QTL mapping have had limited utility in breeding practice, due to the use of a simple cross population or the neglect of epistasis in the detection of QTLs. We designed an elite cross combination to take these two issues into account. Third, it is easy to extend the proposed approach to nested association analysis. The commonality is that all the individuals in the mapping populations are inbred lines. The difference is that the pedigree is general in the present study and relatively simple in nested association analysis. Therefore, the new method is suitable for nested association analysis and human genetics. Fourth, the FPR is kept low in the new method. A shrinkage estimation method, empirical Bayes (eBayes), was adopted to estimate all types of effects in the full model so that the FPR was less than 0.06%.
At present, the most widely used genome-wide association study (GWAS) methods are analysis of variance or mixed model approaches with control of the false discovery rate. In theory, GWAS is similar to single-marker analysis for main-effect QTLs and to two-marker analysis for epistatic QTLs; the difference is that GWAS requires a significance threshold to be set at the genome-wide level. However, it does not overcome the shortcomings of marker analysis. If a trait of interest is controlled by multiple QTLs, whether the QTL under consideration can be detected depends on the proportions of phenotypic variance explained by both this QTL and the background QTLs. If the proportion explained by background QTLs is large, the large residual variance will decrease the power of detecting the current QTL, and sometimes the QTL cannot be identified. In the new approach, this issue can be avoided, because a full model that includes all kinds of QTLs in one genetic model results in a small residual variance. This explains why some main-effect QTLs and all the epistatic QTLs could not be mapped in the soybean genome-wide association study.
Prediction of elite cross combinations is based on the assumption that dominance and dominance-type epistasis effects are absent. If the breeding objective is the development of inbred lines or cultivars, as is often the case in self-pollinated crops, the prediction may be useful. If these non-additive effects are important, then the prediction would not be reliable. This issue needs to be addressed in the future.
Xu described a linear model in which the dimensions of the genotypic value vector and its incidence matrix depend on the number of genotypes for the locus. In theory, this model matches the situation under study. However, the model dimensions will increase rapidly. Therefore, it is preferable to gather more samples or to reduce the number of effects considered, thereby reducing the dimensions of the model. In this study, we designed a special incidence matrix such that there is one variable for each main-effect QTL. Simulation studies show that this approach works well. If the number of markers is large, the number of effects in the model is enormous. In this case, the two-stage method of He and Zhang is recommended. We adopted this approach in our analysis of real data, and the results were consistent with those of He and Zhang and He et al. The new approach works well if the marker interval length is approximately 5 cM. However, one must delete some closely linked markers if the interval length is less than 5 cM.
We compared the QTLs for seed length in soybean with the QTLs reported in previous studies. Although few markers were shared between the earlier datasets and ours, some loci that we detected were also detected in previous studies. Seven QTLs linked to markers sat_342, satt534, satt514, sat_365, sat_254, sat_419 and sat_274 in this study were detected by Xu et al.; four QTLs associated with markers satt411, satt329, satt022 and AW277661 in this paper were identified by Salas et al.; one QTL close to marker sat_256 was confirmed by Li et al.; and one QTL next to marker satt514 was mapped by Liang et al. The above results further support the feasibility of the approach proposed in this study.
Materials and Methods
We recently assembled a soybean association panel with 215 cultivars provided by the National Center for Soybean Improvement, China. All the cultivars were obtained by stratified random sampling from six geographic ecotypes in China, planted in three-row plots in a completely randomized design and evaluated at the Jiangpu experimental station at Nanjing Agricultural University in 2008 and 2009. The plots were 1.5 m wide and 2 m long. Five individuals and 20 seeds in the middle row of each plot were randomly picked to measure seed length with a digital vernier caliper. The measurements were averaged over the 20 seeds, and the mean was used in this study.
Approximately 0.3 g of fresh leaves obtained in 2008 from each cultivar was used to extract genomic DNA using the cetyltrimethylammonium bromide method as described by Lipp et al. To screen for polymorphisms among all the cultivars, PCR was performed with 134 simple sequence repeat (SSR) primer pairs. The primer sequences were obtained from the soybean database Soybase (http://www.ncbi.nlm.nih.gov). PCR was performed as described by Xu et al.
For the soybean data, the STRUCTURE program was used to investigate the population structure of all selected cultivars. The number of subpopulations (K) was set from 2 to 10. In the Markov chain Monte Carlo (MCMC) Bayesian analysis for each K, the length of a Markov chain consisted of 110,000 sweeps. The first 10,000 sweeps (the burn-in period) were deleted, and thereafter, the chain was used to calculate the mean of the log-likelihood. This process was repeated 20 times, and the overall average of the mean log-likelihood at fixed K was used. STRUCTURE analysis with 134 SSR molecular markers showed that the log-likelihood increased with the increase of the model parameter K, so a suitable value of K could not be determined directly. In this situation, using the ad hoc statistic $\Delta K$, based on the rate of change in the log-probability of data between successive K values, STRUCTURE accurately detected the uppermost hierarchical level of structure. Here, the $\Delta K$ value was much higher for K = 4 than for other values of K. By combining this high value with knowledge of the breeding history of these cultivars, we chose a value of 4 for K. The Q matrix was calculated based on SSR markers and incorporated into the mixed model of epistasis association analysis.
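The ad hoc statistic based on the rate of change in log-probability between successive K values is commonly called ΔK. A minimal sketch of one common way to compute it follows: the absolute second difference of the mean log-likelihood across runs, divided by the standard deviation of the runs at that K. The function name `delta_k` and the run values are hypothetical toy numbers (three runs per K rather than 20), not the soybean results.

```python
from statistics import mean, stdev


def delta_k(log_likelihoods):
    """Compute Delta-K for every K that has both neighbours K - 1 and K + 1.

    log_likelihoods: dict K -> list of log-likelihood values, one per replicate run.
    """
    ks = sorted(log_likelihoods)
    avg = {k: mean(log_likelihoods[k]) for k in ks}
    sd = {k: stdev(log_likelihoods[k]) for k in ks}
    result = {}
    for k in ks:
        if k - 1 in avg and k + 1 in avg and sd[k] > 0:
            second_difference = avg[k + 1] - 2 * avg[k] + avg[k - 1]
            result[k] = abs(second_difference) / sd[k]
    return result


# Hypothetical toy runs: the log-likelihood keeps rising with K but flattens after K = 4.
toy_runs = {
    2: [-9010, -9000, -8990],
    3: [-8710, -8700, -8690],
    4: [-8305, -8300, -8295],
    5: [-8260, -8250, -8240],
    6: [-8210, -8200, -8190],
}
dk = delta_k(toy_runs)
print(dk)
print(max(dk, key=dk.get))   # 4
```

The toy data mimic the situation described above: the log-likelihood increases monotonically, so its maximum is uninformative, whereas ΔK peaks where the rate of increase changes most sharply. The statistic is undefined for the smallest and largest K tested.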
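The phenotypic value of a quantitative trait for the ith cultivar in the jth environment ($i = 1, \ldots, n$; $j = 1, \ldots, s$), $y_{ij}$, may be described by the following mixed model:

$$\mathbf{y} = \mathbf{1}\mu + \mathbf{Q}\mathbf{v} + \mathbf{X}_E\boldsymbol{\beta}_E + \mathbf{X}_G\boldsymbol{\beta}_G + \mathbf{X}_{GE}\boldsymbol{\beta}_{GE} + \mathbf{X}_{GG}\boldsymbol{\beta}_{GG} + \boldsymbol{\varepsilon} \qquad (1)$$

where $\mathbf{y} = (y_{11}, \ldots, y_{ns})^{T}$; $\mathbf{Q}$ is the Q matrix for population structure, with coefficient vector $\mathbf{v}$; $\mathbf{X}_E$, $\mathbf{X}_G$, $\mathbf{X}_{GE}$ and $\mathbf{X}_{GG}$ are the design matrices of the environment effect, main effect, QTL-by-environment interaction effect and QTL-by-QTL interaction effect, respectively; $\boldsymbol{\beta}_E$, $\boldsymbol{\beta}_G$, $\boldsymbol{\beta}_{GE}$ and $\boldsymbol{\beta}_{GG}$ are the corresponding effects; $\mu$ is the total average; and $\boldsymbol{\varepsilon}$ is the residual error. The first three terms were viewed as fixed effects and the following three terms were considered random effects; therefore, model (1) was rewritten as

$$\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{Z}\boldsymbol{\gamma} + \boldsymbol{\varepsilon} \qquad (2)$$

where $\mathbf{X}\boldsymbol{\beta} = \mathbf{1}\mu + \mathbf{Q}\mathbf{v} + \mathbf{X}_E\boldsymbol{\beta}_E$, $\mathbf{Z} = (\mathbf{X}_G, \mathbf{X}_{GE}, \mathbf{X}_{GG})$, and $\boldsymbol{\gamma} = (\boldsymbol{\beta}_G^{T}, \boldsymbol{\beta}_{GE}^{T}, \boldsymbol{\beta}_{GG}^{T})^{T}$.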
Several methods exist to simultaneously estimate the parameters in model (2); for example, eBayes. Here, we adopted eBayes. Briefly, the parameter vector in model (2) is . The priors and the likelihood are not described in detail here. The iteration process is given below.
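As general background rather than the authors' exact iteration, empirical Bayes shrinkage of this kind is commonly built on a normal prior for each random effect with its own variance. All notation below is assumed for illustration: $\mathbf{y}$ is the phenotype vector, $\mathbf{X}\boldsymbol{\beta}$ the fixed part of the model, $\gamma_k$ the kth random effect with design column $\mathbf{z}_k$ and prior variance $\sigma_k^2$, and $\sigma^2$ the residual variance.

```latex
\gamma_k \sim N(0, \sigma_k^2), \qquad
\mathbf{V} = \sum_{k} \mathbf{z}_k \mathbf{z}_k^{\mathsf{T}} \sigma_k^2 + \mathbf{I}\sigma^2, \qquad
\hat{\gamma}_k = \mathrm{E}(\gamma_k \mid \mathbf{y})
  = \hat{\sigma}_k^2 \, \mathbf{z}_k^{\mathsf{T}} \hat{\mathbf{V}}^{-1}
    \left( \mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}} \right)
```

Under such a scheme the variances are estimated first from the marginal likelihood, and an effect whose estimated variance is close to zero is shrunk toward zero. This is what allows a model with far more effects than cultivars to be estimated at all.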
Likelihood ratio test
The traditional likelihood ratio test (LRT), as described by Zhang and Xu, could not be performed in this study, due to an oversaturated epistatic genetic model. We proposed the following two-stage selection process to screen all the effects. In the first stage, all the effects that meet the selection criterion are picked up. In the second stage, the full model is modified so that only the effects that passed the first round of selection are included. Due to the smaller dimensionality of the reduced model, we can use the maximum likelihood method to reanalyze the data and perform the LRT. The procedure for the LRT is given below.
The overall null hypothesis is no effect of the QTL at the locus of interest, denoted by $H_0: a_t = 0$ for all $t$, where $a_t$ is the effect of the tth allele. If we solve the maximum likelihood estimation of the parameters under the restriction of $H_0$ and calculate the log-likelihood value using the solutions with this restriction, we obtain $L_0$. We can also evaluate the log-likelihood value of the solutions without restrictions and obtain $L_1$. Therefore, the LR test statistic is

$$\mathrm{LR} = -2\,(L_0 - L_1) \qquad (9)$$
Other test statistics can be used in similar ways. The significance threshold of the LOD score was set at 2.5 for our real data analysis, where $\mathrm{LOD} = \mathrm{LR}/(2\ln 10) \approx \mathrm{LR}/4.61$.
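The likelihood ratio statistic and the LOD score are related by the standard conversion LOD = LR / (2 ln 10). A minimal sketch, in which the function names and the two log-likelihood values are hypothetical:

```python
import math


def lr_statistic(loglik_null: float, loglik_full: float) -> float:
    """Likelihood ratio statistic from restricted (null) and unrestricted log-likelihoods."""
    return -2.0 * (loglik_null - loglik_full)


def lod_from_lr(lr: float) -> float:
    """Convert a likelihood ratio statistic to a LOD score (base-10 log odds)."""
    return lr / (2.0 * math.log(10.0))


# Hypothetical log-likelihoods for one locus in the reduced model.
lr = lr_statistic(loglik_null=-312.4, loglik_full=-305.1)
lod = lod_from_lr(lr)
print(round(lr, 2), round(lod, 2))   # 14.6 3.17
print(lod > 2.5)                     # True: passes the threshold used for the real data
```

On this scale, the LOD threshold of 2.5 corresponds to an LR of about 11.5, and the threshold of 3.0 used in the simulations corresponds to an LR of about 13.8.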
Genome-wide association study
First, phenotypic values for seed length in 215 soybean cultivars were corrected using the population structure obtained with the STRUCTURE software. Then, the corrected phenotypes, along with SSR marker information, were used to carry out genome-wide association studies for main-effect QTLs, environmental interactions and QTL-by-QTL interactions by ANOVA. Finally, critical values at the 0.05 level of significance were determined by 1,000 permutation experiments, and significant QTLs were identified accordingly.
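The permutation step can be sketched as follows. This is an illustration under assumptions, not the authors' code: the text does not say whether the critical value is per marker or genome-wide, so the sketch takes the maximum F statistic over all markers in each permutation, which gives a genome-wide threshold. It covers main effects only, omits the correction for population structure, and uses hypothetical toy data and function names.

```python
import random
from collections import defaultdict


def anova_f(phenotypes, genotypes):
    """One-way ANOVA F statistic of phenotype on marker genotype class."""
    groups = defaultdict(list)
    for y, g in zip(phenotypes, genotypes):
        groups[g].append(y)
    n, k = len(phenotypes), len(groups)
    if k < 2 or n <= k:
        return 0.0  # monomorphic marker or no residual degrees of freedom
    grand_mean = sum(phenotypes) / n
    means = {g: sum(v) / len(v) for g, v in groups.items()}
    ss_between = sum(len(v) * (means[g] - grand_mean) ** 2 for g, v in groups.items())
    ss_within = sum((y - means[g]) ** 2 for g, v in groups.items() for y in v)
    if ss_within == 0.0:
        return float("inf")
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def permutation_threshold(phenotypes, markers, n_perm=1000, alpha=0.05, seed=0):
    """Genome-wide critical F value from the permutation distribution of the maximum F."""
    rng = random.Random(seed)
    shuffled = list(phenotypes)
    max_stats = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any phenotype-genotype association
        max_stats.append(max(anova_f(shuffled, m) for m in markers))
    max_stats.sort()
    index = min(n_perm - 1, round((1 - alpha) * n_perm))
    return max_stats[index]


# Hypothetical toy data: 40 lines, one causal marker and four unrelated markers.
toy_rng = random.Random(1)
causal = ["A"] * 20 + ["B"] * 20
phenotypes = [10.0 + (2.0 if g == "A" else 0.0) + toy_rng.gauss(0.0, 0.5) for g in causal]
markers = [causal] + [[toy_rng.choice("AB") for _ in range(40)] for _ in range(4)]

threshold = permutation_threshold(phenotypes, markers, n_perm=1000)
significant = [i for i, m in enumerate(markers) if anova_f(phenotypes, m) > threshold]
print(threshold, significant)
```

Taking the maximum over markers controls the genome-wide error rate without assuming that the markers are independent. A pairwise scan for QTL-by-QTL interactions would need its own permutation distribution over marker pairs.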
We performed seven simulation experiments in this study. In the first, the simulated pedigree was the maize pedigree described by Zhang et al. The number of inbred lines within the maize pedigree was 404. Of these, n0 = 103 were base (founder) lines, which were in linkage equilibrium so that the genotypes for markers and QTLs with two alleles could be simulated. Non-founders (n1 = 301) were bred via repeated self-pollination of a hybrid between two inbred lines. Thus, each non-founder line represents a recombinant inbred line (RIL) with respect to a known pair of parents. The genotypes of all the non-founders could be generated from the genotypes of their parents, analogous to simulating the genotypes of RILs from their parents. All of the non-founder lines could be used to detect QTLs. To mimic actual linkage maps, which do not have equally spaced markers, 153 markers were simulated on ten chromosome segments with a total length of ∼2258.70 cM and an average marker interval of 14.86 cM. A total of 20 QTLs, all of which overlapped with the markers, were simulated; the sizes and locations of the QTLs are listed in Table 3. The allelic effects were calculated by relating the genetic variance of the QTL to both the allelic frequencies and the number of alleles. The phenotypic value of each line was the sum of the corresponding QTL genotypic values and the residual error, with an assumed normal distribution. Each simulation run consisted of 200 replicates. For each simulated QTL, we counted the samples in which the LOD statistic surpassed 3.0. The ratio of the number of such samples (m) to the total number of replicates (200) represented the empirical power of this QTL. The false-positive rate was calculated as the ratio of the number of false-positive effects to the total number of zero effects considered in the full model. The other simulation experiments were performed similarly. All simulated parameters are given in Table S1.
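The building blocks of such a simulation can be sketched as below. Several choices are assumptions made for illustration and are not given in the text: the Haldane map function, the recombinant fraction R = 2r / (1 + 2r) between adjacent loci for RILs derived by selfing, and a biallelic QTL coded +a or −a in homozygous lines, so that its variance is 4p(1 − p)a². All function names are hypothetical.

```python
import math
import random


def haldane_r(distance_cm: float) -> float:
    """Recombination fraction for a map distance in cM (Haldane map function)."""
    return 0.5 * (1.0 - math.exp(-2.0 * distance_cm / 100.0))


def simulate_ril(parent_a, parent_b, distances_cm, rng):
    """Simulate one homozygous RIL on a single chromosome from two inbred parents.

    Adjacent loci switch parental origin with probability R = 2r / (1 + 2r),
    the recombinant fraction assumed here for RILs obtained by selfing.
    """
    from_a = rng.random() < 0.5
    ril = [parent_a[0] if from_a else parent_b[0]]
    for i, d in enumerate(distances_cm, start=1):
        r = haldane_r(d)
        if rng.random() < 2.0 * r / (1.0 + 2.0 * r):
            from_a = not from_a
        ril.append(parent_a[i] if from_a else parent_b[i])
    return ril


def allelic_effect(qtl_variance: float, p: float) -> float:
    """Effect a of a biallelic QTL coded +a / -a in inbred lines with allele frequency p."""
    return math.sqrt(qtl_variance / (4.0 * p * (1.0 - p)))


def empirical_power(lod_scores, threshold=3.0) -> float:
    """Share of replicates in which the LOD score of a simulated QTL exceeds the threshold."""
    return sum(lod > threshold for lod in lod_scores) / len(lod_scores)


def false_positive_rate(n_false_positives: int, n_zero_effects: int) -> float:
    """False positives divided by the number of truly zero effects in the full model."""
    return n_false_positives / n_zero_effects


rng = random.Random(0)
print(simulate_ril(list("AAAA"), list("BBBB"), [15.0, 15.0, 15.0], rng))
print(allelic_effect(qtl_variance=1.0, p=0.5))   # 1.0
print(empirical_power([3.5, 2.0, 4.1, 2.9]))     # 0.5
```

For the same QTL variance, a skewed allele frequency requires a larger allelic effect: at p = 0.25 the effect is about 1.15 times the effect at p = 0.5.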
Table S1. Simulated parameters in all the simulation experiments.
Table S2. Multi-QTL detection under various QTL heritabilities in the first simulation experiment (200 replicates).
Table S3. Effect of sample size on multi-QTL mapping in the second simulation experiment (200 replicates).
Table S4. Effect of the number of alleles on multi-QTL mapping in the third simulation experiment (200 replicates).
We are grateful to three anonymous referees for their constructive comments and suggestions that significantly improved the presentation of the manuscript.
Conceived and designed the experiments: YMZ. Performed the experiments: XFL HYL SPW. Analyzed the data: HYL. Contributed reagents/materials/analysis tools: HYL XFL SPW. Wrote the paper: YMZ HYL.
- 1. Abdalla AM, Reddy OUK, El-Zik KM, Pepper AE (2001) Genetic diversity and relationships of diploid and tetraploid cottons revealed using AFLP. Theor Appl Genet 102: 222–229.
- 2. Dong YS, Zhuang BC, Zhao LM, Sun H, He MY (2001) The genetic diversity of annual wild soybeans grown in China. Theor Appl Genet 103: 98–103.
- 3. Reif JC, Hamrit S, Heckenberger M, Schipprack W, Maurer HP, et al. (2005) Genetic structure and diversity of European flint maize populations determined with SSR analyses of individuals and bulks. Theor Appl Genet 111: 906–913.
- 4. Milne RI, Abbott RJ (2000) Origin and evolution of invasive naturalized material of Rhododendron ponticum L. in the British isles. Mol Ecol 9(5): 541–556.
- 5. Dillon SL, Shapter FM, Henry RJ, Cordeiro G, Izquierdo L, Lee LS (2007) Domestication to crop improvement: Genetic resources for Sorghum and Saccharum (Andropogoneae). Annals of Botany 100: 975–989.
- 6. Friesen ML, von Wettberg EJ (2010) Adapting genomics to study the evolution and ecology of agricultural systems. Current Opinion in Plant Biology 13: 119–125.
- 7. Ellis RP, Forster BP, Robinson D, Handley LL, Gordon DC, et al. (2000) Wild barley: a source of genes for crop improvement in the 21st century? J Exp Bot 51: 9–17.
- 8. Upadhyaya HD, Ortiz R (2001) A mini core subset for capturing diversity and promoting utilization of chickpea genetic resources in crop improvement. Theor Appl Genet 102: 1292–1298.
- 9. Warburton ML, Crossa J, Franco J, Kazi M, Trethowan R, et al. (2006) Bringing wild relatives back into the family: recovering genetic diversity in CIMMYT improved wheat germplasm. Euphytica 149: 289–301.
- 10. Sasaki A, Ashikari M, Ueguchi-Tanaka M, Itoh H, Nishimura A, et al. (2002) Green revolution: a mutant gibberellin-synthesis gene in rice. Nature 416(6882): 701–702.
- 11. Zhang X-Q (2002) Three lines hybrid rice. In: Shi Y-C, editor. Chinese Academic canon in the 20th century. Fuzhou: Fujian Education Press. pp. 25–27.
- 12. Stuber CW (1995) Mapping and manipulating quantitative trait in maize. Trends in Genetics 11: 477–481.
- 13. Tanksley SD, Nelson JC (1996) Advanced backcross QTL analysis: a method for the simultaneous discovery and transfer of valuable QTLs from unadapted germplasm into elite breeding lines. Theor Appl Genet 92: 191–203.
- 14. Haseman JK, Elston RC (1972) The investigation of linkage between a quantitative trait and a marker locus. Behav Genet 2: 3–19.
- 15. Wright FA (1997) The phenotypic difference discards sib-pair QTL linkage information. Am J Hum Genet 60: 740–742.
- 16. Drigalenko E (1998) How sib pairs reveal linkage. Am J Hum Genet 63: 1242–1245.
- 17. Forrest W (2001) Weighting improves the “new Haseman-Elston” method. Hum Hered 52: 47–54.
- 18. Sham PC, Purcell S (2001) Equivalence between Haseman-Elston and variance components linkage analyses for sib pairs. Am J Hum Genet 68: 1527–1532.
- 19. Sham PC, Purcell S, Cherny SS, Abecasis GR (2002) Powerful regression-based quantitative trait linkage analysis of general pedigrees. Am J Hum Genet 71(2): 238–253.
- 20. Chen WM, Broman KW, Liang KY (2004) Quantitative trait linkage analysis by generalized estimating equations: unification of variance components and Haseman-Elston regression. Genet Epidemiol 26(4): 265–272.
- 21. Wang T, Elston RC (2005) Two-level Haseman-Elston regression for general pedigree data analysis. Genet Epidemiol 29(1): 12–22.
- 22. Sax K (1923) The association of size difference with seed-coat pattern and pigmentation in Phaseolus vulgaris. Genetics 8: 552–560.
- 23. Risch N, Merikangas K (1996) The future of genetic studies of complex human diseases. Science 273: 1516–1517.
- 24. Smouse PE, Waples RS, Tworek JA (1990) A genetic mixture analysis for use with incomplete source population-data. Can J Fish Aquat Sci 47: 620–634.
- 25. Balding DJ, Nichols RA (1995) A method for quantifying differentiation between populations at multi-allelic loci and its implications for investigating identity and paternity. Genetica 96: 3–12.
- 26. Pritchard JK, Stephens M, Donnelly P (2000) Inference of population structure using multilocus genotype data. Genetics 155: 945–959.
- 27. Marchini J, Cardon LR, Phillips MS, Donnelly P (2004) The effects of human population structure on large genetic association studies. Nat Genet 36: 512–517.
- 28. Zhu CS, Yu JM (2009) Nonmetric multidimensional scaling corrects for population structure in association mapping with different sample types. Genetics 182: 875–888.
- 29. Li MY, Reilly MP, Rader DJ, Wang LS (2010) Correcting population stratification in genetic association studies using a phylogenetic approach. Bioinformatics 26(6): 798–806.
- 30. Diao G, Lin DY (2005) A powerful and robust method for mapping quantitative trait loci in general pedigrees. Am J Hum Genet 77: 97–111.
- 31. Zhang Y-M, Mao YC, Xie C, Smith H, Luo L, et al. (2005) Mapping QTL using naturally occurring genetic variance among commercial inbred lines of maize (Zea mays L.). Genetics 169: 2267–2275.
- 32. Yu J, Pressoir G, Briggs WH, Vroh Bi I, Yamasaki M, et al. (2006) A unified mixed-model method for association mapping that accounts for multiple levels of relatedness. Nat Genet 38(2): 203–208.
- 33. Chatterjee N, Kalaylioglu Z, Moslehi R, Peters U, Wacholder S (2006) Powerful multilocus tests of genetic association in the presence of gene-gene and gene-environment interactions. Am J Hum Genet 79(6): 1002–1016.
- 34. Chen X, Liu CT, Zhang MZ, Zhang HP (2007) A forest-based approach to identifying gene and gene-gene interactions. Proc Natl Acad Sci USA 104: 19199–19203.
- 35. Zhang Y, Liu JS (2007) Bayesian inference of epistatic interactions in case-control studies. Nat Genet 39: 1167–1173.
- 36. Phillips P (2008) Epistasis — the essential role of gene interactions in the structure and evolution of genetic systems. Nat Rev Genet 9: 855–867.
- 37. Wan X, Yang C, Yang Q, Xue H, Tang NLS, et al. (2009) MegaSNPHunter: a learning approach to detect disease predisposition SNPs and high level interactions in genome wide association study. BMC Bioinformatics 10: 13.
- 38. Zhang Y-M, Xu S (2005) A penalized maximum likelihood method for estimating epistatic effects of QTL. Heredity 95: 96–104.
- 39. Xu S, Jia Z (2007) Genome-wide analysis of epistatic effects for quantitative traits in barley. Genetics 175: 1955–1963.
- 40. Hoggart CJ, Whittaker JC, De Iorio M, Balding DJ (2008) Simultaneous analysis of all SNPs in genome-wide and re-sequencing association studies. PLoS Genet 4(7): e1000130.
- 41. Xu S (2010) An expectation–maximization algorithm for the Lasso estimation of quantitative trait locus effects. Heredity 105: 483–494.
- 42. Tibshirani R (1996) Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society Series B 58: 267–288.
- 43. Xu S (2007) An empirical Bayes method for estimating epistatic effects of quantitative trait loci. Biometrics 63: 513–521.
- 44. Bernardo R (2008) Molecular markers and selection for complex traits in plants: learning from the last 20 years. Crop Sci 48: 1649–1664.
- 45. Holland JB (2004) Implementation of molecular markers for quantitative traits in breeding programs - challenges and opportunities. In New directions for a diverse planet, Proceedings of the 4th International Crop Science Congress, 26 Sep – 1 Oct 2004, Brisbane, Australia. Published on CDROM. Web site http://www.cropscience.org.au/.
- 46. Michelmore RW, Paran I, Kesseli RV (1991) Identification of markers linked to disease resistance genes by bulked-segregant analysis: a rapid method to detect markers in specific genomic regions by using segregating populations. Proc Natl Acad Sci USA 88: 9828–9832.
- 47. Cho YG, Eun MY, McCouch SR, Chae YA (1994) The semidwarf gene, sd-1, of rice (Oryza sativa L.). II. Molecular mapping and marker-assisted selection. Theor Appl Genet 89: 54–59.
- 48. Mohan M, Nair S, Bhagwat A, Krishna TG, Yano M, et al. (1997) Genome mapping, molecular markers and marker-assisted selection in crop plants. Molecular Breeding 3: 87–103.
- 49. Ribaut JM, Betrán J (1999) Single large-scale marker-assisted selection (SLS-MAS). Molecular Breeding 5: 531–541.
- 50. Zhang TZ, Yuan Y, Yu J, Guo WZ, Kohel RJ (2003) Molecular tagging of a major QTL for fiber strength in upland cotton and its marker-assisted selection. Theor Appl Genet 106: 262–268.
- 51. Jahufer MZZ, Cooper M, Ayres JF, Bray RA (2002) Identification of research to improve the efficiency of breeding strategies for white clover in Australia: A review. Australian Journal of Agricultural Research 53(3): 239–257.
- 52. Dwivedi SL, Crouch JH, Mackill DJ, Xu YB, Blair MW, et al. (2007) The molecularization of public sector crop breeding: Progress, problems, and prospects. Advances in Agronomy 95: 163–318.
- 53. Carlborg Ö, Haley CS (2004) Epistasis: too often neglected in complex trait studies? Nat Rev Genet 5: 618–625.
- 54. Peleman JD, van der Voort JR (2003) Breeding by design. Trends in Plant Sci 8: 330–334.
- 55. Liu YF, Zeng ZB (2000) A general mixture model approach for mapping quantitative trait loci from diverse cross designs involving multiple inbred lines. Genet Res 75: 345–355.
- 56. Blanc G, Charcosset A, Mangin B, Gallais A, Moreau L (2006) Connected populations for detecting quantitative trait loci and testing for epistasis: an application in maize. Theor Appl Genet 113: 206–224.
- 57. Verhoeven KJF, Jannink JL, McIntyre LM (2006) Using mating designs to uncover QTL and the genetic architecture of complex traits. Heredity 96: 139–149.
- 58. He XH, Zhang Y-M (2008) Mapping epistatic quantitative trait loci underlying endosperm traits using all markers on the entire genome in a random hybridization design. Heredity 101: 39–47.
- 59. Iwata H, Uga Y, Yoshioka Y, Ebana K, Hayashi T (2007) Bayesian association mapping of multiple quantitative trait loci and its application to the analysis of genetic variation among Oryza sativa L. germplasm. Theor Appl Genet 114: 1437–1449.
- 60. Iwata H, Ebana K, Fukuoka S, Jannink J-L, Hayashi T (2009) Bayesian multilocus association mapping on ordinal and censored traits and its application to the analysis of genetic variation among Oryza sativa L. germplasm. Theor Appl Genet 118: 865–880.
- 61. Zhang Y-M, Lü H-Y, Yao L-L (2008) Multiple quantitative trait loci Haseman-Elston regression using all markers on the entire genome. Theor Appl Genet 117: 683–690.
- 62. Lü H-Y, Li M, Li G-J, Yao L-L, Zhang Y-M (2009) Multiple loci in silico mapping in inbred lines. Heredity 103: 346–354.
- 63. Hoti F, Sillanpää MJ (2006) Bayesian mapping of genotype×expression interaction in quantitative and qualitative traits. Heredity 97: 4–18.
- 64. He X-H, Qin H, Hu Z, Zhang T, Zhang Y-M (2011) Mapping of epistatic quantitative trait loci in four-way crosses. Theor Appl Genet 122: 33–48.
- 65. Xu Y, Li HN, Li GJ, Wang X, Cheng LG, et al. (2011) Mapping quantitative trait loci for seed size traits in soybean (Glycine max L. Merr.). Theor Appl Genet 122: 581–594.
- 66. Salas P, Oyarzo-Llaipen JC, Wang D, Chase K, Mansur L (2006) Genetic mapping of seed shape in three populations of recombinant inbred lines of soybean (Glycine max L. Merr.). Theor Appl Genet 113: 1459–1466.
- 67. Li CD, Jiang HW, Zhang WB, Qiu PC, Liu CY, et al. (2008) QTL analysis of seed and pod traits in soybean. Molecular Plant Breeding 6: 1091–1100.
- 68. Liang HZ, Wang SF, Yu YL, Wang TF, Gong PT, et al. (2008) Mapping quantitative trait loci for six seed shape traits in soybean. Henan Agricultural Science 45: 54–60.
- 69. Wang YS, Gai JY (2002) Study on the ecological regions of soybean in China II. Ecological environment and representative varieties. Chinese Journal of Applied Ecology 13: 71–75.
- 70. Lipp M, Brodmann P, Pietsch K, Pauwels J, Anklam E, et al. (1999) IUPAC collaborative trial study of a method to detect genetically modified soybeans and maize in dried powder. Journal of AOAC International 82: 923–928.
- 71. Evanno G, Regnaut S, Goudet J (2005) Detecting the number of clusters of individuals using the software STRUCTURE: a simulation study. Molecular Ecology 14: 2611–2620.