An Efficient Hierarchical Generalized Linear Mixed Model for Mapping QTL of Ordinal Traits in Crop Cultivars

Jian-Ying Feng; Jin Zhang; Wen-Jie Zhang; Shi-Bo Wang; Shi-Feng Han; Yuan-Ming Zhang

doi:10.1371/journal.pone.0059541

Abstract

Many important phenotypic traits in plants are ordinal. However, relatively little is known about methodologies for association studies of ordinal traits. In this study, we proposed a hierarchical generalized linear mixed model for mapping quantitative trait loci (QTL) of ordinal traits in crop cultivars. In this model, all main-effect QTL and QTL-by-environment interactions were treated as random, while the population mean, environmental effect and population structure were treated as fixed. Parameters were estimated using the pseudo-data normal approximation of the likelihood function together with an empirical Bayes approach. A series of Monte Carlo simulation experiments was performed to confirm the reliability of the new method. The results showed that the new method works well, with satisfactory statistical power and precision. The new method was also used to dissect the genetic basis of soybean alkaline-salt tolerance in 257 soybean cultivars obtained, by stratified random sampling, from 6 geographic ecotypes in China. As a result, 6 main-effect QTL and 3 QTL-by-environment interactions were identified.

Citation: Feng J-Y, Zhang J, Zhang W-J, Wang S-B, Han S-F, Zhang Y-M (2013) An Efficient Hierarchical Generalized Linear Mixed Model for Mapping QTL of Ordinal Traits in Crop Cultivars. PLoS ONE 8(4): e59541. https://doi.org/10.1371/journal.pone.0059541

Editor: Rongling Wu, Pennsylvania State University, United States of America

Received: December 19, 2012; Accepted: February 15, 2013; Published: April 2, 2013

Copyright: © 2013 Feng et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Funding: This work was supported by the National Basic Research Program of China grant 2011CB109306; the National Natural Science Foundation of China grant 30971848; the Fundamental Research Funds for the Central Universities grants KYT201002 and KJ2011003; a Project Funded by the Priority Academic Program Development of Jiangsu Higher Education Institutions and the Specialized Research Fund for the Doctoral Program of Higher Education grants 20100097110035 and 20120097110023. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests: The authors have declared that no competing interests exist.

Introduction

Many characters of biological interest and economic importance, e.g., disease resistance and stress tolerance, vary in an ordinal form but are not inherited in a simple Mendelian fashion. More importantly, the underlying diseases and stresses cause substantial yield loss. Developing resistant cultivars is the most economical and effective way to reduce this loss. Therefore, there is a critical need for in-depth study of methodology for mining elite alleles for ordinal traits.

During the past several decades, many attempts have been made to mine elite alleles for binary and ordinal traits. The methodologies for mapping quantitative trait loci (QTL) for discrete traits have been well established within the framework of the threshold model. In the early stage, almost all approaches were based on a single-QTL genetic model [1]–[7]. Later, several methods were proposed to simultaneously identify multiple QTL for ordinal traits [8], [9]. Recently, Bayesian methodology has been used to map multiple QTL and epistatic QTL for binary and ordinal traits [10]–[14]. However, all the above approaches are based on bi-parental segregating populations.

Many commercial inbred lines are available in crops, and a large number of elite alleles have been preserved among these lines. Mining these elite alleles is a prerequisite for integrating genetic analysis with crop breeding. Up to now, some approaches for mining elite alleles in crop cultivars have been developed [15]–[20]. With these approaches, all kinds of QTL can be effectively identified, elite alleles can be easily mined and novel parental combinations can be effectively predicted [18]. However, these approaches are for quantitative traits, not for discrete traits. For discrete traits, the phenotypic description is seemingly simple, whereas the underlying biological model may be complicated and the population structure is unknown. Accordingly, genetic analyses may be more challenging for discrete traits than for continuous traits. If pedigree information among these lines is known, Bayesian linkage analysis [21] and a variance-components approach [22] are available. If pedigree information is not known, relatively few methods exist, apart from those of Iwata et al. [23] and Hoggart et al. [24]. Although Iwata et al. [23] developed a Bayesian multilocus association analysis, the method is implemented via Markov chain Monte Carlo, and computing time becomes a major concern. Although Hoggart et al. [24] proposed simultaneous analysis of all SNPs in genome-wide association studies, the method is for case-control datasets.

Multi-QTL mapping for discrete and quantitative traits is now the state-of-the-art approach [18]–[20], [24], [25]. However, it is difficult to implement under the maximum-likelihood framework. At present, the Bayesian method implemented via the expectation-maximization algorithm [26] is specialized to handle complicated models, and thus it is the ideal tool for mapping multiple QTL for ordinal traits in crop cultivars. Accordingly, in this study the empirical Bayes approach of Xu [26] and the computational algorithm of Yi et al. [27] were incorporated into the hierarchical generalized linear model of Yi et al. [12] to map main-effect QTL (M-QTL) and QTL-by-environment (QE) interactions for ordinal traits in crop cultivars. The new method was validated by a series of Monte Carlo simulation experiments and by real data analysis in soybean.

Results

Phenotypic variation for soybean alkaline-salt tolerance

We measured the length of the main root (LR) of 257 soybean cultivars under control (CK), 100 mM NaCl and 10 mM Na₂CO₃ conditions. These original trait observations were converted into an alkaline tolerance index (ATI) and a salt tolerance index (STI). To measure the degree of salt-alkaline tolerance, these indexes were partitioned into five grades: high tolerance, tolerance, middle tolerance, sensitivity, and high sensitivity. In other words, these data are ordinal. The phenotypic distributions are shown in Fig. 1 and Table S1. Both discrete indexes exhibited approximately skewed distributions, indicating the existence of genetic variation. Results from the χ² test showed a significant relationship between tolerance and environment (χ² = 44.83 and P<1e-4 for ATI, and χ² = 13.29 and P = 0.004 for STI), indicating the existence of environmental interaction.
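As a minimal sketch of how a test of independence between tolerance grade and environment can be computed, the following Python snippet evaluates the Pearson χ² statistic for a contingency table. The function name and the counts are hypothetical illustrations, not the data or code of this study.

```python
def chi_square_independence(table):
    """Return (Pearson chi-square statistic, degrees of freedom) for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


# Hypothetical counts (NOT the paper's data): rows = two years, columns = five grades.
counts = [
    [20, 40, 90, 70, 37],
    [45, 60, 80, 50, 22],
]
stat, dof = chi_square_independence(counts)
print(f"chi-square = {stat:.2f} with {dof} degrees of freedom")
```

The statistic would then be compared with a χ² distribution with the returned degrees of freedom to obtain a P value.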

Figure 1. Frequency distribution for soybean alkaline-salt tolerance grade in 2009 (left) and 2010 (right).

https://doi.org/10.1371/journal.pone.0059541.g001

Mapping M-QTL and QE interaction for ATI and STI

A total of 6 M-QTL (3 for ATI and 3 for STI) and 3 QE interactions (1 for ATI and 2 for STI) for soybean alkaline-salt tolerance were detected by the new method and mapped to chromosomes A1, B2, I, L, N and O. Among them, one QTL, associated with marker sat_274, is responsible for both traits; seven QTL are consistent with those detected for continuous ATI and STI using the enriched compression mixed linear model (ECMLM) [28] and epistatic association mapping (EAM) [18] methods, and the other two were confirmed by the test of independence (χ² test); and one M-QTL and one QE interaction are both associated with marker satt270. A summary of all detected QTL is shown in Table 1.

Table 1. Association mapping for ordinal alkaline-salt tolerance in 257 soybean cultivars.

https://doi.org/10.1371/journal.pone.0059541.t001

Four ATI QTL, with the proportion of phenotypic variance explained by a single QTL (PVE) ranging from 3.29 to 11.04%, were detected and mapped to chromosomes A1, B2 and O. Of these QTL, three are M-QTL (jointly 18.96%) and one is a QE interaction (11.04%); three of the QTL were further identified by ECMLM (or EAM) and the χ² test. It should be noted that the PVE of qAT10-2 and qATI5e, associated respectively with sat_274 and sat_344, are greater than 10%.

Five STI QTL, with PVE of 4.21–9.17%, were detected and mapped to chromosomes I, L, N and O. Of these QTL, three are M-QTL (jointly 21.06%) and two are QE interactions (jointly 13.48%); all of the QTL, except for qSTI10, were further identified by ECMLM (or EAM) and the χ² test. It should be noted that the PVE of every QTL is less than 10%.

Mining elite alleles

The elite alleles and their representative carriers are summarized in Table 1. For qATI14, associated with Sat_342, there are 12 alleles and one unknown allele. The effects of all these alleles can be estimated by the maximum likelihood method. Of these effects, that of the 260 bp allele is the smallest (−0.73), making it an elite allele; it is found in the soybean cultivar Zunyizongzidou. Similarly, for qSTI3e, associated with satt270, the 223 bp allele shows the smallest effect in 2010, so the elite allele combination is the 223 bp allele×2010, with an effect of −0.90.

Predicting novel parental combination

In a hypothetical cross of two cultivars, all possible recombinant inbred lines (RILs) from the cross may be produced. The trait values of these RILs can be predicted from the effects of all the detected loci. The best RIL, i.e., the one with the minimum predicted value, represents the cross, so the best cross can be selected from among all the crosses. We found that no cultivar pair pyramids all the elite alleles of the detected QTL. However, some four-cultivar combinations might pyramid all the elite alleles for salt-alkaline tolerance in this study. For example, the best two combinations were Zunyizongzidou×Hunanqiudou 1×Ludou 1×Qi 588-8 and Zunyizongzidou×Hunanqiudou 1×Ludou 2×Qi 588-8, which could be used to improve the two traits simultaneously.
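The selection of the best cross can be sketched in a few lines of Python. This is an illustration under simplifying assumptions that are not taken from the paper's software: effects are additive, loci are unlinked so that any combination of parental alleles is attainable in some RIL, and a smaller predicted value is better. The cultivar names and allele effects are hypothetical.

```python
from itertools import combinations


def best_ril_value(parent_a, parent_b):
    """Predicted value of the best RIL: at each locus keep the smaller parental allele effect."""
    return sum(min(a, b) for a, b in zip(parent_a, parent_b))


def rank_crosses(allele_effects):
    """Return (value, parent 1, parent 2) for every cultivar pair, best (smallest) first."""
    pairs = combinations(sorted(allele_effects), 2)
    return sorted(
        (best_ril_value(allele_effects[p], allele_effects[q]), p, q) for p, q in pairs
    )


# Hypothetical cultivars and allele effects at three detected loci (illustration only).
effects = {
    "C1": [-0.7, 0.2, 0.1],
    "C2": [0.3, -0.5, 0.0],
    "C3": [0.1, 0.1, -0.9],
}
ranking = rank_crosses(effects)
best_value, best_p, best_q = ranking[0]
print(f"best cross: {best_p} x {best_q}, predicted value {best_value:.2f}")
```

In this toy example no pair carries the smallest allele at all three loci, which mirrors the observation that more than two parents may be needed to pyramid every elite allele.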

Prediction for potential candidate genes

The potential candidate genes for alkaline-salt tolerance in soybean are summarized in Table 2. A total of 7 soybean genes homologous to Arabidopsis genes are linked to 7 markers detected in this study, with physical distances of 206.21–129132.42 kb; one gene (Glyma03g38040) is closely linked to an associated marker (satt022), within 210 kb in physical distance.

Table 2. Prediction for potential candidate genes that are homologous to alkaline-salt tolerance genes in Arabidopsis thaliana.

https://doi.org/10.1371/journal.pone.0059541.t002

Monte Carlo simulation studies

Comparison of new method with both single-QTL method and test of independence.

In the first simulation experiment, each simulated sample was analyzed by three methods: the multi-QTL-based method of this study (the new method), the new method under a single-QTL model, and the test of independence. All the results are shown in Fig. 2. Among the three methods, the new method has the highest statistical power and the lowest false positive rate (FPR). The estimates of QTL effects and threshold values from the new method are closer to the corresponding true values than those from the single-QTL method, although all the estimates are slightly biased. Relatively small variations were observed with the new method for the estimates of QTL position and effects as well as the threshold values. Therefore, the new method works relatively well.

Figure 2. Comparison of new method with single-QTL-based method and Chi-squared test.

https://doi.org/10.1371/journal.pone.0059541.g002

Effect of phenotypic distribution on QTL mapping.

In the second simulation experiment, the effect of the shape of the phenotypic distribution on the new method was assessed by setting the frequencies of the five ordinal categories to 1∶1∶1∶1∶1 (uniform distribution), 1∶2∶4∶2∶1 (symmetrical distribution) and 8∶5∶3∶1∶1 (skewed distribution). Other parameters were the same as those in the first simulation experiment. The results are given in Fig. 3. We found that a skewed distribution decreases the statistical power, and that the highest power occurs when the phenotypic distribution is bell-shaped. Relatively small variations were also observed in the three situations for the estimates of QTL position and effects as well as the threshold values.

Figure 3. Effect of phenotypic distribution on association mapping for ordinal traits.

https://doi.org/10.1371/journal.pone.0059541.g003

Effect of the number of categories on QTL mapping.

In the third simulation experiment, we evaluated the effect of the number of categories on the new method. The design of the simulation was similar to that of the first simulation experiment, except for the number of phenotypic categories. We simulated three levels for the number of categories: 2, 6 and 9. The corresponding phenotypic distributions were 1∶1, 1∶3∶6∶6∶3∶1 and 1∶2∶4∶6∶9∶6∶4∶2∶1, respectively. The results are given in Fig. 4, which shows that the estimate of QTL position is very close to its true value in the three cases, and that the power of QTL detection increases with the number of categories. The reason is that increasing the number of categories increases the information available for predicting the liability from the observed categorical phenotype. In addition, relatively small variations were also observed in the three situations for the estimates of QTL effects and the threshold values.

Figure 4. Effect of the number of categories on association mapping for ordinal traits.

https://doi.org/10.1371/journal.pone.0059541.g004

Effect of sample size on QTL mapping.

In the fourth simulation experiment, we assumed the pedigree to have 50 founders and 100, 200, 300 or 500 non-founders. One hundred and one equally spaced markers, each with three alleles, were placed on each of three 1000 cM chromosome segments; and eighteen QTL, each with three alleles, were simulated with heritabilities of 0.01–0.15. Other parameters are given in Table 3. The results for five QTL are shown in Fig. 5. As expected, the QTL detection power increases, and the variation in the estimates of QTL parameters and threshold values decreases, as sample size or QTL heritability increases.

Figure 5. Effect of sample size on association mapping for ordinal traits.

https://doi.org/10.1371/journal.pone.0059541.g005

Table 3. Simulated parameters in all simulated experiments (3 alleles for marker and QTL, and 3 chromosomes).

https://doi.org/10.1371/journal.pone.0059541.t003

Effect of the number of founders on QTL mapping.

In the last simulation experiment, we assumed the pedigree to have 200 non-founders and 25, 50 or 75 founders. Other parameters were the same as those in the fourth simulation experiment. The results for five QTL are shown in Fig. 6. As expected, the QTL detection power increases as the number of founders increases, and relatively small variation and bias were observed for the estimates of QTL parameters and threshold values.

Figure 6. Effect of the number of founders on association mapping for ordinal traits.

https://doi.org/10.1371/journal.pone.0059541.g006

Discussion

In this study, the probability of the observed ordinal data is approximated by a normal distribution, so that the empirical Bayes approach can be adopted to estimate genetic effects in the hierarchical generalized linear model for ordinal trait association studies. As a result, M-QTL and QE interactions for ordinal traits in crop cultivars can be identified, elite alleles can be mined and novel parental combinations can be predicted. Clearly, the method integrates genetic analyses with crop breeding design. More importantly, the mapping results in this study are reliable because they have been validated in four respects. First, seven QTL detected by the new method are consistent with those detected by at least one of three approaches: ECMLM, EAM and single marker analysis (Table 1). Second, a total of 7 potential candidate genes homologous to Arabidopsis genes are found around 7 associated markers (Table 2). Third, some QTL were simultaneously identified for the alkaline-salt tolerance index, the original traits and the ordinal traits; for example, Sat_342 and Satt348 were associated with alkaline tolerance, and Satt270 was associated with salt tolerance. Finally, the results from the Monte Carlo simulation studies show that the new method improves statistical power and precision and reduces the FPR.

The major contribution of this study is the pseudo-data normal approximation of the likelihood function for ordinal trait association studies. The normal likelihood approximation was first developed by Wolfinger and O'Connell [29] and continued by Gelman et al. [30]. McGilchrist [31] used a different approach to the same problem, one that is much easier to understand. Although the method has been explored for binary and binomial traits in linkage studies [32], this study is the first report of the pseudo-data approximation for ordinal trait association studies.

We compared the new method with that of Lü et al. [18]. The two approaches have some features in common. For example, similar effects of phenotypic distribution (the number of categories, sample size and heritability) on QTL mapping in homozygous cultivars are observed. However, differences exist as well. For example, the trait is quantitative in Lü et al. [18] and ordinal in this study; and the power of QTL detection is lower in this study than in Lü et al. [18], because ordinal traits carry limited information. As the number of categories increases, it becomes preferable to use the hierarchical linear mixed model for normally distributed traits. Note that the main benefit of the present method arises when the number of categories is small. Although the methods of Iwata et al. [23] and Hoggart et al. [24] are for ordinal traits, in this study main-effect QTL, environmental effects and QTL-by-environment interactions were considered simultaneously in the full genetic model, improving statistical power and estimation precision.

In contrast to the genome-wide association studies of Yu et al. [16] and Zhang et al. [17], the kinship matrix was not considered in this study. This term is related to background control, which is similar to the use of co-variable markers in composite interval mapping. Because all the main-effect QTL and QTL-by-environment interactions are included in the full genetic model of this study, it is unnecessary to consider this term here. In addition, in the real data analysis we also considered the effect of population structure on association studies. A slightly different result is observed when the Q matrix is removed from the above full model.

Epistasis, the interaction between QTL, plays an important role in the dissection of genetic architecture for complex traits. To date, several approaches for detecting epistasis have been developed, including multiple interval mapping, Bayesian approaches and the penalized maximum likelihood method. Most of these methods are for quantitative traits in bi-parental segregating populations. Mapping epistasis in homozygous cultivars is relatively difficult. Because of its complexity, it will be investigated separately in a future project.

Materials and Methods

Soybean samples

The 257 soybean cultivars used in this study were mainly provided by the National Center for Soybean Improvement, China. All the cultivars were obtained by stratified random sampling from six geographic ecotypes in China, planted in three-row plots in a completely randomized design and evaluated at the Jiangpu experimental station at Nanjing Agricultural University in 2009 and 2010. The plots were 1.5 m wide and 2 m long. Twelve seeds of each cultivar were sown in a 30×20×15 cm plastic container holding sand to a depth of 3.5 cm and then treated with control (CK), 100 mM NaCl or 10 mM Na₂CO₃, each with two replications. The seedlings were grown in a growth chamber under white fluorescent light (600 µmol m⁻² s⁻¹; 14 h light/10 h dark) at 25±1°C. The length of the main root (LR, in centimetres) of healthy seedlings was measured on 5 plants 7 days after sowing. To measure the degree of salt-alkaline tolerance, the original trait observations were converted into salt and alkaline tolerance indexes for each trait from the phenotypic values in the control, saline and alkaline treatments, as described in [33]. The tolerance grades 1 to 5 used in this study correspond to index values of 0–20%, 20–40%, 40–60%, 60–80% and 80–100%, respectively.
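The mapping from a tolerance index to one of the five grades can be sketched as follows. The function name is hypothetical, and the handling of values that fall exactly on a class boundary (assigned here to the higher grade, with 100% kept in grade 5) is an assumption, since the text does not specify it.

```python
def tolerance_grade(index_percent):
    """Map a tolerance index in percent (0-100) to an ordinal grade 1-5 using 20% bins."""
    if not 0.0 <= index_percent <= 100.0:
        raise ValueError("tolerance index must lie between 0 and 100 percent")
    # Assumption: boundary values go to the higher grade; 100% stays in grade 5.
    return min(int(index_percent // 20) + 1, 5)


# Hypothetical index values (illustration only).
grades = [tolerance_grade(v) for v in (5.0, 20.0, 59.9, 75.0, 100.0)]
print(grades)
```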

Approximately 0.3 g of fresh leaves obtained in 2008 from each cultivar was used to extract genomic DNA using the cetyltrimethylammonium bromide method as described by Lipp et al. [34]. To screen for polymorphisms among all the cultivars, polymerase chain reaction (PCR) was performed with 135 simple sequence repeat (SSR) primer pairs. The primer sequences were obtained from the soybean database Soybase (http://www.ncbi.nlm.nih.gov). PCR was performed as described by Xu et al. [35] and Wei et al. [36].

Population structure

For the soybean sample, the STRUCTURE software [37] was used to investigate the population structures of all selected cultivars. The number of subpopulations (K) was set from 2 to 10. In the Markov chain Monte Carlo (MCMC) Bayesian analysis for each K, the length of a Markov chain consisted of 110,000 sweeps. The first 10,000 sweeps (the burn-in period) were deleted, and thereafter, the chain was used to calculate the mean of log-likelihood. This process was repeated 20 times, and the total average for mean log-likelihood at fixed K was used. STRUCTURE analysis with 135 SSR molecular markers showed that the log-likelihood increased with the increase of the model parameter K, so a suitable number of K could not be determined. In this situation, using the ad hoc statistic ΔK, based on the rate of change in the log-probability of data between successive K values [38], STRUCTURE accurately detected the uppermost hierarchical level of structure. Here, the ΔK value was much higher for the model parameter K = 4 than for other values of K [33]. By combining this high ΔK value with knowledge of the breeding history of these cultivars, we chose a value of 4 for K. The Q matrix was calculated based on SSR markers and incorporated into the hierarchical generalized linear mixed model in this study.
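As a rough illustration of the ad hoc statistic ΔK, the sketch below computes the absolute second-order change in the mean log-likelihood at each interior K, divided by the standard deviation of the log-likelihood over replicate runs at that K. This is one common reading of the statistic in [38]; the function name and the log-likelihood values are hypothetical and are not output from STRUCTURE or from this study.

```python
from statistics import mean, stdev


def delta_k(loglik_runs):
    """Map each interior K to |L(K+1) - 2 L(K) + L(K-1)| / sd[L(K)].

    loglik_runs: dict mapping K to the list of log-likelihoods from replicate runs.
    L(K) is the mean over runs; the standard deviation is taken over the runs at K.
    """
    means = {k: mean(values) for k, values in loglik_runs.items()}
    result = {}
    for k in sorted(loglik_runs):
        if k - 1 in means and k + 1 in means:
            curvature = abs(means[k + 1] - 2 * means[k] + means[k - 1])
            result[k] = curvature / stdev(loglik_runs[k])
    return result


# Hypothetical log-likelihoods (three runs per K), not values from this study.
runs = {
    2: [-5000.0, -5010.0, -4990.0],
    3: [-4800.0, -4805.0, -4795.0],
    4: [-4500.0, -4502.0, -4498.0],
    5: [-4450.0, -4460.0, -4440.0],
    6: [-4420.0, -4430.0, -4410.0],
}
dk = delta_k(runs)
best_k = max(dk, key=dk.get)
print(best_k, dk)
```

The statistic is undefined at the smallest and largest K tested, and the sketch assumes the replicate runs at each K do not all return the same log-likelihood.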

Generalized linear mixed model

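Let $y_j$ ($j = 1, \ldots, n$) be the underlying latent variable, or liability, of cultivar j. For the jth cultivar, it is postulated that

$$y_j = X_j\beta + \sum_{k=1}^{q} Z_{jk}\gamma_k + \varepsilon_j \qquad (1)$$

where $\beta$ is the vector of non-genetic effects, i.e., population mean and environmental effect; $\gamma_k = (\gamma_{k1}, \ldots, \gamma_{kn_k})^T$ is the vector of allelic effects when the kth term is a main-effect locus and of allele-by-environment interaction effects when the kth term is a QE interaction, and $n_k$ is the number of alleles for the locus; $X_j$ and $Z_{jk}$ are dummy variables of $\beta$ and $\gamma_k$ for cultivar j, respectively; and $\varepsilon_j$ is the random residual error with an $N(0, \sigma^2)$ distribution. $\sigma^2 = 1$ will be adopted here because the liabilities are unobservable.

Methods of estimating allelic effects and allele-by-environment interaction effects are the same. For the sake of clarity of notation, we redefine the design matrix and the regression coefficients as follows. Let $Z_j = (Z_{j1}, \ldots, Z_{jq})$ and $\gamma = (\gamma_1^T, \ldots, \gamma_q^T)^T$. The above model is now rewritten as

$$y_j = X_j\beta + Z_j\gamma + \varepsilon_j \qquad (2)$$

where $\varepsilon_j \sim N(0, 1)$.

Let $z = (z_1, \ldots, z_n)^T$ denote the vector of observed ordinal data. Here each $z_j$ represents an assignment into one of C ordinal categories. These classes result from the hypothetical existence of thresholds ($t_0 < t_1 < \cdots < t_C$, with $t_0 = -\infty$ and $t_C = +\infty$) on the latent scale. The relationship between $z_j$ and $y_j$ is indicated by

$$z_j = c \quad \text{if } t_{c-1} < y_j \le t_c, \qquad c = 1, \ldots, C \qquad (3)$$

The conditional probability that $z_j$ falls in category c, given $\beta$, $\gamma$ and $t = (t_1, \ldots, t_{C-1})$, is given by

$$p_{jc} = P(z_j = c \mid \beta, \gamma, t) = \Phi(t_c - X_j\beta - Z_j\gamma) - \Phi(t_{c-1} - X_j\beta - Z_j\gamma) \qquad (4)$$

where $\Phi(\cdot)$ is the cumulative distribution function of the standard normal distribution. The data are conditionally independent, given $\beta$, $\gamma$ and $t$. Therefore, the log-likelihood function can be written as

$$L(\theta) = \sum_{j=1}^{n} \sum_{c=1}^{C} I(z_j = c) \log p_{jc} \qquad (5)$$

where $\theta = (\beta, \gamma, t)$; and $I(z_j = c)$ is an indicator function taking the value 1 if $z_j = c$ and 0 otherwise.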
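The category probabilities and the log-likelihood of a threshold (cumulative probit) model of this kind can be sketched as below. This is a generic illustration with the residual variance fixed at 1, not the authors' SAS program; the function names are hypothetical, and `eta` stands for the linear predictor (fixed plus genetic effects) of one cultivar.

```python
import math


def normal_cdf(x):
    """Cumulative distribution function of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def category_probabilities(eta, thresholds):
    """P(z = c | eta) for c = 1..C, given the C-1 finite cut points in increasing order."""
    cuts = [-math.inf] + list(thresholds) + [math.inf]
    return [
        normal_cdf(cuts[c] - eta) - normal_cdf(cuts[c - 1] - eta)
        for c in range(1, len(cuts))
    ]


def log_likelihood(categories, etas, thresholds):
    """Sum over cultivars of the log probability of the observed category (coded 1..C)."""
    total = 0.0
    for z, eta in zip(categories, etas):
        total += math.log(category_probabilities(eta, thresholds)[z - 1])
    return total


# Thresholds quoted in the simulation design; linear predictor set to 0 for illustration.
cut_points = [-1.2816, -0.5244, 0.5244, 1.2816]
probs = category_probabilities(0.0, cut_points)
print([round(p, 3) for p in probs])
```

With a zero linear predictor these cut points give category probabilities close to the 1∶2∶4∶2∶1 ratio used in the simulations.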

Prior distribution and joint posterior density

The parameters and are treated as fixed and random effects, respectively. The number of random effects in the above genetic model is very large so that the model is oversaturated. Therefore, the hierarchical generalized linear mixed model is adopted in this study. It is assumed that each genetic effect has a different variance . The following prior distributions are chosen for building the hierarchical modelwhere and are the constants given in advance. When , the method works well. The joint posterior distribution has a form of(6)where

Parameter estimation

Genetic effect.

As shown in Wolfinger and O'Connell [29], is an approximate normal distribution , where pseudo-data , ; pseudo-mean ; pseudo-variance ; , ; ; where is the probability density function of standard normal distribution. The conditional log-posterior distribution, related to , is indicated by

Using expectation-maximization empirical Bayes approach of Xu [26], the expectation of the quadratic term required in the maximization step is expressed as(7)where , , , , and . Once a certain criterion of convergence is satisfied, the converged is the estimate for .

Genetic effect variances and related hyperparameters.

According to joint posterior density in equation (6), conditional posterior distribution is for , for and for . Here the mode is used to estimate the corresponding parameter, such as,(8)

Non-genetic effect β.

Formula for the fixed effect follows the standard procedure of mixed model methodology, we have(9)

Thresholds.

Using the Newton–Raphson method, the threshold are estimated by(10)where is the estimate of parameter at the sth iteration,

Summary of iterations

1. Let and , and provide initial values for , for example, let be an uniform random number, , be the quantile of the standard normal distribution based on the phenotypic distribution of , be a gamma random number. and can be obtained by equation (8).

2. Update , and using equation (8);

3. Update using the estimate of ;

4. Update using equation (10);

5. Update using equation (9);

6. Repeat step 2 to step 5 until predetermined criterion of convergence is satisfied.
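Step 1 initializes the thresholds from standard normal quantiles of the observed phenotypic distribution. A minimal sketch of that initialization is given below; the function name is hypothetical, and the sketch assumes that the first and last categories are both observed at least once, so that every cumulative proportion lies strictly between 0 and 1.

```python
from itertools import accumulate
from statistics import NormalDist


def initial_thresholds(category_counts):
    """Standard normal quantiles of the cumulative category proportions (C-1 values)."""
    total = sum(category_counts)
    cumulative = list(accumulate(category_counts))[:-1]  # drop the final 100%
    return [NormalDist().inv_cdf(count / total) for count in cumulative]


# Category counts in the ratio 1:2:4:2:1, as in the simulation design.
start = initial_thresholds([1, 2, 4, 2, 1])
print([round(t, 4) for t in start])
```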

Statistical test

A two-stage selection process described in Lü et al. [18] was used to conduct the likelihood ratio test (LRT) for all the QTL. In the first stage, all the markers were included in the model. If the absolute value of an estimated allelic effect (or environmental interaction effect) at the kth locus exceeded a preset threshold, the kth locus was picked up. In the second stage, the full model was modified to contain only the effects that passed the first round of selection. With this reduced model, the maximum likelihood method can be used to perform the LRT.

The overall null hypothesis is that the kth QTL (or interacting QTL) has no effect, denoted by $H_0: \gamma_{k1} = \gamma_{k2} = \cdots = 0$, where $\gamma_{ka}$ is the ath effect for the QTL. Solving for the maximum likelihood estimates of the parameters under the restriction of $H_0$ and calculating the log-likelihood value with these restricted solutions gives $L_0$. Evaluating the log-likelihood value of the solutions without restrictions gives $L_1$. Therefore, the LRT statistic is $\mathrm{LRT} = -2(L_0 - L_1)$, and the significance threshold of the LOD score was set at 2.0.
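The relationship between the likelihood ratio statistic and the LOD score can be illustrated with the standard conversion LOD = LRT / (2 ln 10), i.e., LRT divided by about 4.605. The sketch below assumes this conventional definition; the function names and the log-likelihood values are hypothetical.

```python
import math


def lrt_statistic(loglik_null, loglik_full):
    """Likelihood ratio statistic from restricted (null) and unrestricted log-likelihoods."""
    return -2.0 * (loglik_null - loglik_full)


def lod_score(lrt):
    """Convert an LRT statistic (natural-log scale) to a LOD score (base-10 scale)."""
    return lrt / (2.0 * math.log(10.0))


# Hypothetical log-likelihood values (illustration only).
lrt = lrt_statistic(-105.0, -100.0)
lod = lod_score(lrt)
print(f"LRT = {lrt:.2f}, LOD = {lod:.2f}, significant at LOD 2.0: {lod > 2.0}")
```

Under this conversion, a LOD threshold of 2.0 corresponds to an LRT statistic of about 9.21.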

Simulation design

We performed six simulation experiments in this study. In the first, the simulated pedigree was similar to the maize pedigree described by Zhang et al. [15]. In the current pedigree, the numbers of founders and non-founders were 100 and 200, respectively. The founder lines were in linkage equilibrium, so that the genotypes for markers and QTL with three alleles could be simulated; in other words, the three alleles at each locus were assigned to the founders in equal proportions. Non-founders were bred via repeated self-pollination of a hybrid between two inbred lines. Thus, each non-founder line represents a RIL with respect to a known pair of parents, and the genotypes of all the non-founders could be generated from the genotypes of their parents, analogous to simulating the genotypes of RILs from their parents. All of the non-founder lines could be used to detect QTL. Thirty-three equally spaced markers were simulated on three chromosome segments 300 cM long. A total of 3 QTL, all of which overlapped with markers, were placed at 50 cM of each chromosome; the QTL sizes, i.e., the proportions of total phenotypic variance explained by the QTL, were 0.05, 0.10 and 0.15, respectively. The allelic effects were calculated by relating the genetic variance of the QTL to the allelic frequencies and effects. The phenotypic value of each line was the sum of the corresponding QTL genotypic values and a residual error with an assumed normal distribution. These phenotypic values were converted into five ordinal categories using four threshold values: −1.2816, −0.5244, 0.5244 and 1.2816. Therefore, the frequencies of the five ordinal categories among all the inbred lines have a ratio of 1∶2∶4∶2∶1. Each simulation run consisted of 100 replicates. For each simulated QTL, we counted the samples in which the LOD statistic surpassed 2.0. The ratio of the number of such samples (m) to the total number of replicates (100) represented the empirical power for this QTL. The FPR was calculated as the ratio of the number of false positive effects to the total number of zero effects considered in the full model. The other simulation experiments were performed similarly. All simulated parameters are given in Table 3.
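The conversion of simulated liabilities into ordinal categories and the computation of empirical power can be sketched as follows. This is an illustration rather than the authors' simulation code: liabilities are drawn directly from a standard normal distribution instead of being built from a pedigree and QTL effects, and the function names, random seed and LOD values are hypothetical.

```python
import random
from bisect import bisect_left

THRESHOLDS = [-1.2816, -0.5244, 0.5244, 1.2816]


def to_category(liability, thresholds=THRESHOLDS):
    """Return category c (1..C) such that t_(c-1) < liability <= t_c."""
    # bisect_left counts the thresholds strictly below the liability.
    return bisect_left(thresholds, liability) + 1


def empirical_power(lod_scores, critical=2.0):
    """Fraction of replicates in which the LOD score surpasses the critical value."""
    return sum(lod > critical for lod in lod_scores) / len(lod_scores)


rng = random.Random(2013)  # fixed seed so that the illustration is reproducible
liabilities = [rng.gauss(0.0, 1.0) for _ in range(20000)]
categories = [to_category(y) for y in liabilities]
freqs = [categories.count(c) / len(categories) for c in range(1, 6)]
print([round(f, 3) for f in freqs])

# Hypothetical LOD scores from four replicates (illustration only).
power = empirical_power([2.5, 1.0, 3.0, 2.0])
print(power)
```

The simulated category frequencies should be close to 0.1, 0.2, 0.4, 0.2 and 0.1, i.e., the 1∶2∶4∶2∶1 ratio stated above.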

A SAS program is available from the authors on request.

Supporting Information

Table S1.

Phenotypic values of ATI and STI in 257 soybean cultivars under study.

https://doi.org/10.1371/journal.pone.0059541.s001

(DOC)

Author Contributions

Conceived and designed the experiments: YMZ. Performed the experiments: JYF JZ WJZ SBW SFH. Analyzed the data: JYF JZ. Contributed reagents/materials/analysis tools: JYF. Wrote the paper: YMZ JYF.

References

1. Hackett CA, Weller JI (1995) Genetic mapping of quantitative trait loci for traits with ordinal distributions. Biometrics 51: 1252–1263.
2. Xu S, Atchley WR (1996) Mapping quantitative trait loci for complex binary diseases using line crosses. Genetics 143: 1417–1424.
3. Rao SQ, Xu S (1998) Mapping quantitative trait loci for categorical traits in four-way crosses. Heredity 81: 214–224.
4. Rao SQ, Li X (2000) Strategies for genetic mapping of categorical traits. Genetica 109: 183–197.
5. Xu S, Yi N, Burke D, Galecki A, Miller RA (2003) An EM algorithm for mapping binary disease loci: application to fibrosarcoma in a four-way cross mouse family. Genet Res 82: 127–138.
6. Xu C, Zhang YM, Xu S (2005) An EM algorithm for mapping quantitative resistance loci. Heredity 94: 119–128.
7. Ramalingam J, Sevi A (2010) Mapping and tagging of qualitative traits in crop plants. In: Singh RK, Singh R, Ye GY, et al., editors. Molecular Plant Breeding: Principle, Method and Application. Houston: Studium Press LLC. pp. 135–159.
8. Coffman CJ, Doerge RW, Simonsen KL, Nichols KM, Duarte CK, et al. (2005) Model selection in binary trait locus mapping. Genetics 170: 1281–1297.
9. Li J, Wang S, Zeng Z-B (2006) Multiple interval mapping for ordinal traits. Genetics 173: 1649–1663.
10. Yi N, Xu S (2000) Bayesian mapping of quantitative trait loci for complex binary traits. Genetics 155: 1391–1403.
11. Yi N, Xu S, George V, Allison DB (2004) Mapping multiple quantitative trait loci for complex ordinal traits. Behav Genet 34: 3–15.
- View Article
- Google Scholar
12. Yi N, Banerjee S, Pomp D, Yandell BS (2007) Bayesian mapping of genomewide interacting quantitative trait loci for ordinal traits. Genetics 176: 1855–1864.
- View Article
- Google Scholar
13. Wu XL, Gianola D, Weigel K (2009) Bayesian joint mapping of quantitative trait loci for Gaussian and categorical characters in line crosses. Genetica 135: 367–377.
- View Article
- Google Scholar
14. Gonzalez-Recio O, Forni S (2011) Genome-wide prediction of discrete traits using Bayesian regressions and machine learning. Genetics Selection Evolution 43(1): 7.
- View Article
- Google Scholar
15. Zhang Y-M, Mao Y C, Xie C Q, Smith H, Luo L, et al. (2005) Mapping QTL using naturally occurring genetic variance among commercial inbred lines of maize (Zea mays L.). Genetics 169: 2267–2275.
- View Article
- Google Scholar
16. Yu J, Pressoir G, Briggs WH, Bi IV, Yamasaki M, et al. (2006) A unified mixed-model method for association mapping that accounts for multiple levels of relatedness. Nat Genet 38: 203–208.
- View Article
- Google Scholar
17. Zhang Z, Ersoz E, Lai CQ, Todhunter RJ, Tiwari HK, et al. (2010) Mixed linear model approach adapted for genome-wide association studies. Nat Genet 42: 355–360.
- View Article
- Google Scholar
18. Lü HY, Liu XF, Wei SP, Zhang YM (2011) Epistatic association mapping in homozygous crop cultivars. PLoS ONE 6(3): e17773.
- View Article
- Google Scholar
19. Zhou X, Stephens M (2012) Genome-wide efficient mixed-model analysis for association studies. Nat Genet 44: 821–824.
- View Article
- Google Scholar
20. Segura V, Vilhjálmsson BJ, Platt A, Korte A, Seren Ü, et al. (2012) An efficient multi-locus mixed-model approach for genome-wide association studies in structured populations. Nat Genet 44: 825–830.
- View Article
- Google Scholar
21. Brisbin A, Weissman MM, Fyer AJ, Hamilton SP, Knowles JA, et al. (2010) Bayesian linkage analysis of categorical traits for arbitrary pedigree designs. PLoS ONE 5: e12307.
- View Article
- Google Scholar
22. Diao G, Lin DY (2010) Variance-components methods for linkage and association analysis of ordinal traits in general pedigrees. Genetic Epidemiology 34: 232–237.
- View Article
- Google Scholar
23. Iwata H, Ebana K, Fukuoka S, Jannink JL, Hayashi T (2009) Bayesian multilocus association mapping on ordinal and censored traits and its application to the analysis of genetic variation among Oryza sativa L. germplasms. Theor Appl Genet 118: 865–880.
- View Article
- Google Scholar
24. Hoggart CJ, Whittaker JC, De Iorio M, Balding DJ (2008) Simultaneous analysis of all SNPs in genome-wide and re-sequencing association studies. PLoS Genet 4: e1000130.
- View Article
- Google Scholar
25. Zhang YM, Xu S (2005) A penalized maximum likelihood method for estimating epistatic effects of QTL. Heredity 95: 96–104.
- View Article
- Google Scholar
26. Xu S (2010) An expectation–maximization algorithm for the Lasso estimation of quantitative trait locus effects. Heredity 105: 483–494.
- View Article
- Google Scholar
27. Yi N, Liu N, Zhi D, Li J (2011) Hierarchical generalized linear models for multiple groups of rare and common variants: jointly estimating group and individual-variant effects. PLoS Genet 7: e1002382.
- View Article
- Google Scholar
28. Li M (2011) Methodologies for functional mapping of quantitative trait loci and genome-wide association study (Ph D dissertation). Nanjing Agricultural University.
29. Wolfinger R, O'Connell M (1993) Generalized linear mixed models a pseudo-likelihood approach. Journal of Statistical Computation and Simulation 48: 233–243.
- View Article
- Google Scholar
30. Gelman A, Carlin JB, Stern HS, Rubin DB (2004) Bayesian data analysis. London: Chapman and Hall/CRC, New York.
31. Mcgilchrist CA (1994) Estimation in generalized mixed models. Journal of Royal Statistical Society, Series B 56: 61–69.
- View Article
- Google Scholar
32. Che X, Xu S (2012) Generalized linear mixed models for mapping multiple quantitative trait loci. Heredity 109: 41–49.
- View Article
- Google Scholar
33. Zhang WJ (2012) Evaluation and association mapping for soybean salt-alkaline tolerance at seeding stage (Master of Science Dissertation). Nanjing Agricultural University.
34. Lipp M, Brodmann P, Pietsch K, Pauwels J, Anklam E, et al. (1999) IUPAC collaborative trail study of a method to detect genetically modified soybeans and maize in dried powder. Journal of AOAC International 82: 923–928.
- View Article
- Google Scholar
35. Xu Y, Li HN, Li GJ, Wang X, Cheng LG, et al. (2011) Mapping quantitative trait loci for seed size traits in soybean (Glycine max L. Merr.). Theor Appl Genet 122: 581–594.
- View Article
- Google Scholar
36. Wei S-P, Liu X-F, Yang S-X, Lü HY, Niu Y, et al. (2011) Comparison of various clustering methods for population structure in Chinese cultivated soybean (Glycine max L. Merr.). Journal of Nanjing Agricultural University 34(2): 13–17.
- View Article
- Google Scholar
37. Pritchard JK, Stephens M, Donnelly P (2000) Inference of population structure using multilocus genotype data. Genetics 155(2): 945–959.
- View Article
- Google Scholar
38. Evanno G, Regnaut S, Goudet J (2005) Detecting the number of clusters of individuals using the software structure: a simulation study. Molecular Ecology 14: 2611–2620.
- View Article
- Google Scholar

