Conceived and designed the experiments: ZZ JL XD DJdK QZ. Performed the experiments: ZZ JL XD. Analyzed the data: ZZ JL PB DJdK QZ. Contributed reagents/materials/analysis tools: ZZ PB DJdK QZ. Wrote the paper: ZZ PB DJdK QZ.
The authors have declared that no competing interests exist.
With the availability of high-density whole-genome single nucleotide polymorphism (SNP) chips, genomic selection has become a promising method for estimating genetic merit with potentially high accuracy in animal, plant and aquaculture species of economic importance. With markers covering the entire genome, the genetic merit of genotyped individuals can be predicted directly within the framework of mixed model equations, using a matrix of relationships among individuals that is derived from the markers. Here we extend that approach by deriving a marker-based relationship matrix specifically for the trait of interest.
In the framework of mixed model equations, a new best linear unbiased prediction (BLUP) method including a trait-specific relationship matrix (TA) was presented and termed TABLUP. The TA matrix was constructed on the basis of marker genotypes and their weights in relation to the trait of interest. A simulation study with 1,000 individuals as the training population and five successive generations as the candidate population was carried out to validate the proposed method. The proposed TABLUP method outperformed ridge regression BLUP (RRBLUP) and BLUP with a realized relationship matrix (GBLUP). In the standard scenario, TABLUP reached an accuracy of 0.79, slightly lower than that of BayesB.
The proposed TABLUP method is an improvement on the RRBLUP and GBLUP methods. It might be equivalent to the BayesB method, but it has additional benefits, such as the calculation of accuracies for individual breeding values. The results also showed that the TA matrix has better predictive ability than the classical numerator relationship matrix and the realized relationship matrix, which are derived solely from pedigree or markers without regard to the trait. This is because the TA matrix not only accounts for the Mendelian sampling term but also puts greater emphasis on the markers that explain more of the genetic variance in the trait.
With the advances in molecular biotechnology, genome-wide high-density single nucleotide polymorphism (SNP) marker data are becoming available for many farm animal and plant species. These data combined with phenotypic data can be used to estimate genetic merit
In the framework of genomic selection, many statistical methods have been proposed to estimate the marker effects in the training population. Based on the assumptions about the statistical distribution of the marker effects, these methods can be classified into two groups. The first group assumes that all markers have some effect on the trait of interest and that the variance of each marker effect is equal. A typical method using this assumption is ridge regression best linear unbiased prediction (RRBLUP)
An alternative to estimating GEBVs by summing up all the marker effects is to estimate GEBVs directly within the framework of mixed model equations (MME). Conventional ‘animal model’ BLUP has been routinely applied in animal, tree and plant breeding for many decades. For individuals without phenotypic records, the predictive ability of this method depends on the structure of the variance-covariance matrix of the random effects. In the classical MME, a numerator relationship matrix (NRM) based on the pedigree
Current implementations of the RRM are based on the ‘infinitesimal model’
Here we introduce a two-step BLUP method, named ‘best linear unbiased prediction with trait-specific marker-derived relationship matrix’ (TABLUP), to estimate GEBVs using trait-specific marker information. A simulation study was performed to investigate the benefit of the presented method for the accuracy of estimated breeding values. Rules for constructing the TA matrix were derived. Genomic selection using TABLUP was compared with RRBLUP, BayesB and GBLUP in a range of scenarios. Factors affecting the TABLUP method and its features are discussed.
Our method involves two steps. First, the SNP effects are estimated, using one of the methods mentioned above, in the training population, in which all individuals have both genotypic and phenotypic data. Second, a trait-specific relationship matrix (TA) is derived from all the marker genotypes and the weights obtained in the first step, and the GEBVs of all genotyped individuals, including the phenotyped individuals and young non-phenotyped individuals, are estimated from the MME with the TA matrix.
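For illustration, the first step can be sketched with ridge regression, the closed form behind RRBLUP; BayesB would replace this function with an MCMC sampler. The sketch assumes genotypes coded 0/1/2 and centred on training-population means, a single known shrinkage ratio λ = σ²_e/σ²_g (σ²_g being the assumed common variance of one marker effect), and hypothetical function names. The second helper computes GEBVs the RRBLUP/BayesB way, as the sum of estimated marker effects given an individual's genotypes.

```python
import numpy as np

def rrblup_marker_effects(G_train, y, var_e, var_marker):
    """Ridge-regression (RRBLUP-style) estimates of marker effects.

    G_train: (n, m) genotypes coded 0/1/2; y: (n,) phenotypes.
    var_marker: assumed common prior variance of every marker effect.
    Returns the effect estimates and the column means used for centring.
    """
    centre = G_train.mean(axis=0)            # 2 x allele frequency per marker
    X = G_train - centre                     # centred genotypes
    lam = var_e / var_marker                 # the same shrinkage for every marker
    lhs = X.T @ X + lam * np.eye(X.shape[1])
    g_hat = np.linalg.solve(lhs, X.T @ (y - y.mean()))  # mean absorbed by centring
    return g_hat, centre

def gebv_from_markers(G, g_hat, centre):
    """GEBV as the sum of estimated marker effects given each genotype."""
    return (G - centre) @ g_hat

# Usage on toy data (sizes and variances are arbitrary)
rng = np.random.default_rng(0)
G_train = rng.integers(0, 3, size=(200, 50))
y = G_train @ rng.normal(0.0, 0.3, size=50) + rng.normal(size=200)
g_hat, centre = rrblup_marker_effects(G_train, y, var_e=1.0, var_marker=0.09)
G_candidates = rng.integers(0, 3, size=(10, 50))       # non-phenotyped individuals
gebv_candidates = gebv_from_markers(G_candidates, g_hat, centre)
```

Centring with the training-population means keeps the overall mean out of the penalised part of the model, so only the marker effects are shrunk.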
Any method that has been proposed in the framework of genomic selection can be used to estimate marker effects in the training population. In our study, RRBLUP and BayesB were used with the following statistical model:
In the RRBLUP method, the simulated variance components were used as the true variance in the analyses. The
In the BayesB method, the exact ratio of the number of simulated QTL to the total number of markers was used as the prior value of 1−
A relationship matrix constructed from all markers without trait-specific weighting is equivalent to the G matrix in GBLUP, the so-called realized relationship matrix, and is identical for all traits. The trait-specific relationship matrix, in contrast, should specify the genetic covariance between two individuals for the trait of interest. The contribution of each locus to this covariance consists of two components: the identity by descent (IBD) between the two individuals at that locus, which is reflected by their marker genotypes, and the contribution of the locus to the genetic variance in the trait. Thus elements of the TA-matrix were obtained as
To simplify the arithmetic, we first obtained the full identity by state (IBS) at each locus. Subsequently we calculated the weighted average IBS over all loci, and finally corrected for the population average IBS to obtain a mean relatedness equal to zero. The full IBS at locus
Acronym | Estimation | Weighted |
TAP | RRBLUP | Yes |
TAB | BayesB | Yes |
GBLUP | – | No |
The method used to estimate the marker effects.
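The construction of the TA matrix described above (full IBS per locus, a weighted average over loci, then a correction for the population average IBS) can be sketched as follows. Several details are assumptions rather than the paper's exact formulas: the per-locus score counts matching alleles among the four allele comparisons (1 for identical homozygotes, 0 for opposite homozygotes, 0.5 otherwise), which is similar to the Eding and Meuwissen scoring rule cited in the discussion; each locus is weighted by 2p(1 − p)ĝ², an estimate of its contribution to the additive variance; and the correction (S − S̄)/(1 − S̄) uses the mean over all pairs. Function names are hypothetical.

```python
import numpy as np

def weighted_ibs(G, w):
    """Weighted average full-IBS similarity between all pairs of individuals.

    G: (n, m) genotypes coded 0/1/2 (copies of one allele); w: (m,) weights >= 0.
    Per locus, the score is the share of the four allele comparisons that match:
    1 for identical homozygotes, 0 for opposite homozygotes, 0.5 otherwise.
    """
    A = G.astype(float)                        # copies of the counted allele
    B = 2.0 - A                                # copies of the other allele
    matches = (A * w) @ A.T + (B * w) @ B.T    # weighted count of matching pairs
    return matches / (4.0 * w.sum())

def ta_matrix(G, g_hat):
    """Trait-specific relationship matrix (weighting and scaling are assumptions)."""
    p = G.mean(axis=0) / 2.0                   # allele frequency per marker
    w = 2.0 * p * (1.0 - p) * g_hat ** 2       # assumed variance contribution per locus
    S = weighted_ibs(G, w)
    s_bar = S.mean()                           # population average IBS
    return (S - s_bar) / (1.0 - s_bar)         # average relatedness becomes zero

# Usage on toy data: 20 individuals, 30 markers, arbitrary marker-effect estimates
rng = np.random.default_rng(1)
G = rng.integers(0, 3, size=(20, 30))
g_hat = rng.normal(size=30)
TA = ta_matrix(G, g_hat)
```

A marker whose estimated effect is zero gets zero weight and therefore does not influence the matrix at all, which is what makes it trait-specific. A constant rescaling of the matrix is absorbed into the additive variance in the MME, and since subtracting S̄ may leave the matrix not positive definite, a small constant on the diagonal may be needed before inversion.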
For TABLUP, the GEBVs of all genotyped individuals are predicted by solving the MME, which include the TA matrix. The statistical model was
For RRBLUP and BayesB, the GEBV of a genotyped individual was calculated as the sum of all estimated marker effects according to its marker genotypes
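The TABLUP prediction step, solving the MME with the TA matrix as the covariance structure of the breeding values, can be sketched with Henderson's equations for a model that contains only an overall mean as fixed effect. The fixed-effect structure, the function name and the assumption that the relationship matrix is invertible are illustrative choices, not the paper's exact implementation.

```python
import numpy as np

def solve_mme(y, pheno_idx, K, var_a, var_e):
    """Henderson MME for y = 1*mu + Z a + e with a ~ N(0, K var_a).

    K: (n, n) relationship matrix (e.g. TA) for all genotyped individuals,
    assumed positive definite; pheno_idx[k] is the individual with record y[k].
    """
    n, q = K.shape[0], len(y)
    Z = np.zeros((q, n))
    Z[np.arange(q), pheno_idx] = 1.0            # incidence of records on individuals
    X = np.ones((q, 1))                         # overall mean
    lam = var_e / var_a
    lhs = np.block([[X.T @ X, X.T @ Z],
                    [Z.T @ X, Z.T @ Z + lam * np.linalg.inv(K)]])
    rhs = np.concatenate([X.T @ y, Z.T @ y])
    sol = np.linalg.solve(lhs, rhs)
    return sol[0], sol[1:]                      # mu_hat, a_hat for all n individuals

# Usage on toy data: 12 genotyped individuals, records on the first 8 only
rng = np.random.default_rng(2)
M = rng.normal(size=(12, 40))
K = M @ M.T / 40 + 0.1 * np.eye(12)             # stand-in for a TA matrix
pheno_idx = np.arange(8)
y = 3.0 + rng.normal(size=8)
mu_hat, a_hat = solve_mme(y, pheno_idx, K, var_a=1.0, var_e=2.0)
```

Individuals without records have zero columns in Z, so their predictions come entirely through their covariances with phenotyped individuals in K.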
The simulation started with a base population of 100 individuals, followed by 1,000 non-overlapping generations with the same population size, denoted as generation −999 to generation 0 to indicate historical generations. In the base population and each historical generation, 50 males were randomly mated with 50 females and each mating produced two offspring (one male and one female). After the 1,000 historical generations, six additional generations, numbered 1 to 6, were simulated. In generation 1, the population size was expanded from 100 to 1,000 by randomly mating 50 males with 50 females from generation 0, where each female produced 20 progeny (10 males and 10 females). From generation 1 to 5, 50 males were randomly selected from the 500 male individuals to be the sires of the next generation, and all 500 females were used as dams without selection. The population size of 1,000 for generation 2 to 6 was obtained by randomly mating each male with 10 females and each female produced two offspring. This resulted in a half sib family structure as depicted in
The simulated genome consisted of five chromosomes with a total length of 5 Morgans (1 Morgan per chromosome). On each chromosome, 1,000 marker loci were randomly located and each segment between two adjacent markers was considered to harbor a potential QTL, giving 5,000 markers and 4,995 potential QTL in total. Based on the distance between two adjacent loci, Haldane's mapping function was used to calculate the probability of a recombination between adjacent loci on the same chromosome.
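The genome layout and Haldane's mapping function can be sketched as below. Haldane's function r = ½(1 − e^(−2d)) converts a map distance d in Morgans into a recombination probability. Placing each potential QTL at the midpoint of its marker interval is a hypothetical choice made only for this illustration; the text states only that each interval harbors one potential QTL.

```python
import numpy as np

def haldane_recombination(d_morgan):
    """Haldane's mapping function: distance in Morgans -> recombination probability."""
    d = np.asarray(d_morgan, dtype=float)
    return 0.5 * (1.0 - np.exp(-2.0 * d))

rng = np.random.default_rng(4)
n_chrom, n_markers, chrom_length = 5, 1000, 1.0    # 5 chromosomes of 1 Morgan each
chromosomes = []
for _ in range(n_chrom):
    markers = np.sort(rng.uniform(0.0, chrom_length, size=n_markers))
    qtl = 0.5 * (markers[:-1] + markers[1:])       # hypothetical: QTL at interval midpoints
    loci = np.sort(np.concatenate([markers, qtl]))
    r = haldane_recombination(np.diff(loci))       # between adjacent loci
    chromosomes.append({"markers": markers, "qtl": qtl, "r": r})

n_qtl = sum(len(c["qtl"]) for c in chromosomes)                # 5 * 999 = 4,995
n_markers_total = sum(len(c["markers"]) for c in chromosomes)  # 5,000
```

For the short intervals between adjacent loci the recombination probability is close to the distance itself, while it approaches 0.5 only for loci far apart.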
The mutation-drift equilibrium model was used to create polymorphic markers and QTL. In the base population, all markers and QTL had both alleles coded as 1. Mutations were allowed in all historical generations for all loci with a mutation rate of 1.25 × 10⁻³ per locus, per generation, and per animal. Under the mutation-drift equilibrium model, the expected heterozygosity when the population reaches equilibrium is
For each individual from generation 1 to 6, a true breeding value (TBV) was simulated by summing up all true QTL genotypic values, i.e.,
The total genetic variance was computed as the sum of variances across all QTL with the assumption of no correlation between QTL. The simulated additive genetic variance of each QTL was calculated as
Only the 1,000 individuals in generation 1 were assigned a phenotypic record. The phenotypic value
To investigate the effect of the number of QTL and of heritability on the accuracy of GEBVs, two groups of alternative scenarios were simulated in addition to the standard scenario described above. In the first group, four different levels of heritability were simulated: 0.05, 0.1, 0.3 and 0.9. In the second group, different numbers of QTL were simulated: 100, 200, 500 and 1,000. For all these alternatives, only the intended parameter was altered from the standard scenario. For all scenarios, 10 replicates were simulated.
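The generation of true breeding values and phenotypes, and the way heritability enters, can be sketched as below. The exact formulas are not preserved in the text above, so the following are assumptions based on standard additive quantitative genetics: QTL genotypes coded as allele counts 0/1/2 with allelic effects α, the variance of QTL j taken as 2p_j(1 − p_j)α_j² under Hardy-Weinberg proportions, the total genetic variance as their sum (no correlation between QTL, as stated), and a residual variance σ²_e = σ²_g(1 − h²)/h². Names are hypothetical.

```python
import numpy as np

def true_breeding_values(Q, alpha):
    """TBV = sum over QTL of genotype (allele count 0/1/2) times allelic effect."""
    return Q @ alpha

def genetic_variance(Q, alpha):
    """Sum of per-QTL additive variances 2p(1-p)alpha^2, ignoring QTL covariances."""
    p = Q.mean(axis=0) / 2.0
    return float(np.sum(2.0 * p * (1.0 - p) * alpha ** 2))

def residual_variance(var_g, h2):
    """Residual variance that gives heritability h2 for genetic variance var_g."""
    return var_g * (1.0 - h2) / h2

def simulate_phenotypes(tbv, var_e, rng):
    """Phenotype = TBV + normally distributed residual with variance var_e."""
    return tbv + rng.normal(0.0, np.sqrt(var_e), size=tbv.shape)

# Residual variances implied by the heritabilities studied, for var_g = 1
residuals = {h2: residual_variance(1.0, h2) for h2 in (0.05, 0.1, 0.3, 0.5, 0.9)}
```

Changing only h² in this scheme rescales the residual variance while leaving the true breeding values untouched, which mirrors the one-parameter-at-a-time design of the alternative scenarios.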
The simulated (true) QTL effects and the marker effects estimated from RRBLUP and BayesB from one random replicate of the standard scenario are shown in
Panel A shows the absolute values of the simulated true QTL effects across the simulated genome. Panel B shows the absolute values of the marker effects estimated across the genome using the BayesB approach. Panel C shows the absolute values of the marker effects estimated across the genome using the RRBLUP approach. There were 50 true QTL and 5,000 markers. Note the different scale in panel C.
Method | Correlation | Rank correlation | Regression |
BayesB | 0.809±0.009 | 0.798±0.010 | 0.998±0.014 |
RRBLUP | 0.724±0.011 | 0.710±0.011 | 1.064±0.015 |
TAP | 0.748±0.010 | 0.736±0.010 | 0.949±0.015 |
TAB | 0.790±0.008 | 0.778±0.009 | 0.899±0.016 |
GBLUP | 0.726±0.012 | 0.712±0.011 | 0.997±0.015 |
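The three statistics in the table above can be computed as sketched here: the Pearson correlation between GEBVs and TBVs, the rank (Spearman) correlation, and the regression coefficient of TBVs on GEBVs. A coefficient of 1 indicates no scale bias; a value below 1 means the GEBVs are too dispersed, and a value above 1 that they are shrunk too much. The simple ranking helper assumes there are no tied values; with ties, average ranks (for example via `scipy.stats.spearmanr`) would be needed. Function names are hypothetical.

```python
import numpy as np

def ranks(x):
    """0-based ranks of x (assumes no ties)."""
    r = np.empty(len(x))
    r[np.argsort(x)] = np.arange(len(x))
    return r

def accuracy_summary(gebv, tbv):
    """Correlation, rank correlation and regression of TBV on GEBV."""
    gebv = np.asarray(gebv, dtype=float)
    tbv = np.asarray(tbv, dtype=float)
    pearson = np.corrcoef(gebv, tbv)[0, 1]
    spearman = np.corrcoef(ranks(gebv), ranks(tbv))[0, 1]
    slope = np.cov(tbv, gebv)[0, 1] / np.var(gebv, ddof=1)  # regression of TBV on GEBV
    return {"correlation": pearson, "rank_correlation": spearman, "regression": slope}
```

For example, GEBVs equal to half the TBVs are perfectly correlated with them but give a regression coefficient of 2, because they are too strongly shrunk.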
In breeding practice, rank correlation is more important than Pearson correlation, especially in truncation selection. On average, the rank correlation is 0.013 lower than the Pearson correlation. The ranking of the methods and the trend of both correlations were the same (
The regression coefficient of TBVs on GEBVs was used to measure the biases of GEBVs from different methods (
It is notable that GBLUP and RRBLUP performed equally in terms of correlations, which is consistent with the theoretical equivalence of the two methods. However, the regression coefficient differed slightly between these two methods (
The decline in accuracy of GEBVs over generations can serve as a measure of the persistence of the predictive ability of the different methods. As shown in
The graph shows the correlation between estimated and true breeding values in generations 2–6 for GEBVs derived by a variable selection approach (BayesB), an approach using the infinitesimal model (RRBLUP), BLUP with the trait-specific matrix using BayesB weights (TAB), BLUP with the trait-specific matrix using infinitesimal model weights (TAP), and BLUP with the average genomic relationship matrix under the infinitesimal model (GBLUP).
With the increase in the number of simulated QTL from 50 to 1,000, the accuracy of GEBVs in generation 2 decreased for BayesB, increased for RRBLUP and GBLUP (except for the case of 200 QTL), and, for both TABLUP methods, first decreased (from 50 to 200 QTL) and then increased or levelled off (from 200 to 1,000 QTL), as shown in
Number of QTL | BayesB | RRBLUP | GBLUP | TAB | TAP |
50 | 0.809±0.009 | 0.724±0.011 | 0.726±0.012 | 0.790±0.008 | 0.748±0.010 |
100 | 0.786±0.012 | 0.740±0.017 | 0.739±0.017 | 0.770±0.013 | 0.744±0.015 |
200 | 0.763±0.011 | 0.734±0.012 | 0.735±0.011 | 0.749±0.010 | 0.724±0.010 |
500 | 0.763±0.009 | 0.745±0.009 | 0.748±0.010 | 0.756±0.010 | 0.732±0.009 |
1000 | 0.760±0.010 | 0.756±0.012 | 0.756±0.012 | 0.755±0.012 | 0.736±0.012 |
Heritability | BayesB | RRBLUP | GBLUP | TAB | TAP |
0.05 | 0.407±0.020 | 0.376±0.021 | 0.374±0.020 | 0.394±0.018 | 0.354±0.019 |
0.1 | 0.542±0.023 | 0.472±0.017 | 0.472±0.018 | 0.518±0.015 | 0.464±0.017 |
0.3 | 0.735±0.015 | 0.638±0.014 | 0.641±0.014 | 0.708±0.011 | 0.656±0.013 |
0.5 | 0.809±0.009 | 0.724±0.011 | 0.726±0.012 | 0.790±0.008 | 0.748±0.010 |
0.9 | 0.908±0.004 | 0.861±0.006 | 0.862±0.006 | 0.910±0.004 | 0.886±0.005 |
The main aim of this study was to present the two-step TABLUP method, which uses a trait-specific relationship matrix (TA) in the mixed model equations (MME), for estimating genomic breeding values in the framework of genomic selection. Rules to construct the TA matrix were derived and implemented. The performance of TABLUP was compared by simulation with that of several other popular approaches under different scenarios.
The trait-specific relationship matrix TA is tailored to the trait of interest because it includes information on both marker genotypes and marker effect variances. In terms of predictive ability, the proposed TA matrix is an improvement upon the classical numerator relationship matrix (NRM) and the realized relationship matrix (RRM). In the framework of MME, conventional BLUP, GBLUP and TABLUP use the NRM, RRM and TA matrix, respectively, as the variance-covariance matrix of the random genetic effects. The advantage of RRM over NRM has been investigated previously
The comparable performance of TABLUP and BayesB, especially of TAB and BayesB, suggests that TABLUP might be a model equivalent to BayesB. The equivalence between GBLUP and RRBLUP has been proven under the assumption that all markers contribute equally to the trait of interest
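The equivalence of GBLUP and RRBLUP mentioned above can be checked numerically. Assuming centred genotypes X, no fixed effects, and independent marker effects with a common variance σ²_g, RRBLUP gives Xĝ = X(X′X + λI)⁻¹X′y, and GBLUP with covariance XX′σ²_g gives â = XX′(XX′ + λI)⁻¹y, with λ = σ²_e/σ²_g. The two agree by the push-through identity (X′X + λI)⁻¹X′ = X′(XX′ + λI)⁻¹. The usual scaling of G by Σ2p(1 − p) only rescales the variance component and is omitted in this sketch; all values are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(6)
n, m = 30, 200                      # fewer individuals than markers, as is typical
X = rng.integers(0, 3, size=(n, m)).astype(float)
X -= X.mean(axis=0)                 # centre genotypes
y = rng.normal(size=n)
y -= y.mean()                       # no fixed effects in this sketch
lam = 2.0                           # sigma_e^2 / sigma_g^2 (arbitrary value)

# RRBLUP: estimate m marker effects, then sum them per individual
g_hat = np.linalg.solve(X.T @ X + lam * np.eye(m), X.T @ y)
gebv_rrblup = X @ g_hat

# GBLUP: predict n breeding values directly with covariance proportional to X X'
G = X @ X.T
gebv_gblup = G @ np.linalg.solve(G + lam * np.eye(n), y)
```

Because GBLUP works with an n × n system and RRBLUP with an m × m system, GBLUP is the cheaper route whenever individuals are fewer than markers. The equivalence relies on equal variances for all markers, which is exactly the assumption that TABLUP relaxes through its locus weights.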
TABLUP and GBLUP have some features that other genomic selection methods based on model (1) lack. The most important feature is that the reliability of an individual's GEBV can be calculated. The reliabilities of GEBVs for single individuals are important for breeders to make selection decisions. The calculation of reliabilities using TABLUP is identical to that outlined for GBLUP by VanRaden
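Reliabilities can be obtained from the inverse of the MME coefficient matrix. The sketch assumes the same single-mean model as a plain Henderson system, a prediction error variance PEV_i = σ²_e C^{aa}_{ii}, and a reliability of 1 − PEV_i/(K_ii σ²_a), where K is the relationship matrix (for example TA) and C^{aa} is the block of the inverse that belongs to the breeding values. The function name and the toy values are hypothetical.

```python
import numpy as np

def mme_reliabilities(pheno_idx, K, var_a, var_e):
    """Reliability of each individual's predicted breeding value from the inverse MME."""
    n, q = K.shape[0], len(pheno_idx)
    Z = np.zeros((q, n))
    Z[np.arange(q), pheno_idx] = 1.0             # incidence of records on individuals
    X = np.ones((q, 1))                          # overall mean
    lam = var_e / var_a
    lhs = np.block([[X.T @ X, X.T @ Z],
                    [Z.T @ X, Z.T @ Z + lam * np.linalg.inv(K)]])
    C_aa = np.linalg.inv(lhs)[1:, 1:]            # breeding-value block of the inverse
    pev = var_e * np.diag(C_aa)                  # prediction error variances
    return 1.0 - pev / (np.diag(K) * var_a)

# Toy example: four unrelated individuals, only the first two with records, h2 = 0.5
rel = mme_reliabilities(np.array([0, 1]), np.eye(4), var_a=1.0, var_e=1.0)
```

In this toy case an unrelated individual without a record gets a reliability of zero; with a TA matrix, non-phenotyped candidates would gain reliability through their covariances with phenotyped individuals.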
Choosing the right genomic selection method to apply in practical breeding is a challenge for breeders. In simulation studies, BayesB is nearly always better than RRBLUP
In our study, the IBS scoring rule proposed by Eding and Meuwissen
Heritability | GBLUP | TAB | TAP | TAB (abs) | TAP (abs) |
0.05 | 0.374±0.020 | 0.394±0.018 | 0.354±0.019 | 0.404±0.021 | 0.388±0.019 |
0.1 | 0.472±0.018 | 0.518±0.015 | 0.464±0.017 | 0.540±0.023 | 0.497±0.016 |
0.3 | 0.641±0.014 | 0.708±0.011 | 0.656±0.013 | 0.734±0.014 | 0.677±0.014 |
0.5 | 0.726±0.012 | 0.790±0.008 | 0.748±0.010 | 0.808±0.009 | 0.764±0.010 |
0.9 | 0.862±0.006 | 0.910±0.004 | 0.886±0.005 | 0.910±0.004 | 0.884±0.005 |
TAB (abs): TABLUP with the TA-matrix weighted by the absolute values of estimated marker effects from BayesB and without the correction of the mean IBS.
TAP (abs): TABLUP with the TA-matrix weighted by the absolute values of estimated marker effects from RRBLUP and without the correction of the mean IBS.
The weighting rule used to construct the TA matrix was based on the expected covariance between individuals given the estimated marker effects. Because this follows the theoretical basis of the relationship matrix, this type of weighting should in theory be optimal. However, we cannot exclude that for certain scenarios, other ad-hoc approaches may give a higher accuracy. For example, in
In conclusion, this article introduced the TABLUP approach as a flexible alternative intermediate between BayesB and GBLUP. For the scenarios studied, the proposed TABLUP method showed an advantage over GBLUP and RRBLUP, and performed nearly as well as BayesB in terms of accuracy of GEBVs. The TA matrix models both the Mendelian sampling term and the genetic architecture underlying the trait of interest. Therefore, the application of TABLUP in genomic selection merits further exploration.
We are grateful to Chris S. Haley, John A. Woolliams, Ricardo Pong-Wong and Hans D. Daetwyler (The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, UK) for their helpful and constructive comments. We would like to thank two reviewers for their valuable comments.