Including Dominance Effects in the Genomic BLUP Method for Genomic Evaluation

We evaluated the performance of GBLUP including dominance genetic effect (GBLUP-D) by estimating variances and predicting genetic merits in a computer simulation and 2 actual traits (T4 and T5) in pigs. In simulation data, GBLUP-D explained more than 50% of dominance genetic variance. Moreover, GBLUP-D yielded estimated total genetic effects over 1.2% more accurate than those yielded by GBLUP. In particular, when the dominance genetic variance was large, the accuracy could be substantially improved by increasing the number of markers. The dominance genetic variances in T4 and T5 accounted for 9.6% and 6.3% of the phenotypic variances, respectively. Estimates of such small dominance genetic variances contributed little to the improvement of the accuracies of estimated total genetic effects. In both simulation and pig data, there were nearly no differences in the estimates of additive genetic effects or their variance between GBLUP-D and GBLUP. Therefore, we conclude GBLUP-D is a feasible approach to improve genetic performance in crossbred populations with large dominance genetic variation and identify mating systems with good combining ability.


Introduction
Genomic selection refers to the use of genome-wide dense single nucleotide polymorphism (SNP) markers to predict breeding values and subsequently select individuals [1]. Several approaches of genomic prediction have been presented. One of them is the genomic best linear unbiased prediction (GBLUP), which uses genomic information in the form of a genomic relationship matrix that defines the additive genetic covariance between individuals [2,3]. The genomic relationship coefficients are estimated with higher accuracy than when using pedigree information because genomic information can capture of Mendelian sampling across the genome. GBLUP has become popular approach in genomic selection of dairy cattle [4,5] because it is simple and has low computational requirements [6,7].
Most published models only include additive genetic effects [8], and little research has been performed to expand these models to predict genetic merits to account for dominance genetic effects. It can be argued that such expansion is difficult because calculation becomes complicated and de-regressed estimated breeding values are used as phenotypes in most applications of genomic selection [9]. However, dominance genetic effect is of theoretical and practical important because it is heavily used in crosses of animal breeds. In fact, assortative mating and mate allocation boost the field performances of livestock [10]. Genomic selection has, therefore, renewed the interest in the prediction of dominance genetic effects. For example, the dominance genetic variance accounted for 5.6% of the phenotypic variance by GBLUP including dominance genetic effect [11]. More recently, GBLUP method including dominance genetic effect was suggested and the software (GVCBLUP) are already available online (http:// animalgene.umn.edu/) [12][13][14].
The present study aimed to evaluate the performance of GBLUP including dominance genetic effect by estimating variance components and predicting genetic effects for both simulation and actual pig data.

Stochastic Simulation
A historical population was simulated to establish mutation drift equilibrium. The simulated genome comprised one chromosome 1 Morgan long that contained 6,000 SNP markers and 300 randomly spaced biallelic quantitative trait loci (QTL). In the first generation of the historical population, the initial allele frequencies of all markers and QTL were assumed to be 0.5. The recurrent mutation process was applied, and the mutation rate of markers and QTL was 5.0610 24 per locus per generation. Recombinations were sampled from a Poisson distribution with a mean of 1 per Morgan and were then randomly placed along the chromosome. The historical population evolved over 2,000 generations of random mating and random selection with a population size of 100 (50 males and 50 females) to reach mutation-drift balance.
After 2,000 historical generations, a base population (G0) and the subsequent 6 generations (G1 to G6) were generated as a recent population. The population size of G0 increased to 300 (150 males and 150 females). In G1 to G6, 30 sires were randomly selected and mated to 150 dams in each generation. Each dam had 1 son and 1 daughter; thus, each sire had 5 sons and 5 daughters.
In G0, 1,000 markers and 50 QTL were randomly selected among the segregating markers and QTL with minor allele frequencies .0.05. Let Q 1j and Q 2j be 2 alleles at the jth QTL. The genetic values are then given by a j , d j , and {a j for genotypes Q 1j Q 1j , Q 1j Q 2j , and Q 2j Q 2j , respectively. The value of a j was drawn from a gamma distribution with a shape parameter of 0.42; its sign was drawn at random with equal chance. According to the previous simulation study including dominance genetic effects [15], the value of d j was determined as the product of the absolute of a j and the degree of dominance, which was drawn from a normal distribution N(0,t 2 ). The total genetic effect (g i ) of the jth animal was calculated by summing all QTL genotypic values, and its variance (s 2 g ) was calculated as the sum of additive and dominance genetic variances (s 2 a and s 2 d ) [16], which were calculated as follows: where q j is the frequency of Q 1j . The broad-sense heritability (h 2 ) of the trait was 0.3. To obtain phenotypic values, an environmental effect was added to the total genetic effect, which was sampled from a normal distribution N(0,(1{h 2 )s 2 g =h 2 ). The phenotypes and genotypes of SNP markers were available for 1,500 and 1,800 individuals from G1 to G5 and G1 to G6, respectively. Thus, the reference population with both phenotypes and genotypes comprised 1,500 individuals from G1 to G5, and the test population with only genotypes comprised 300 individuals in G6.
In a standard simulation scenario, t and the number of markers in G0 (N m ) were set to 0.5 and 1,000, respectively. To investigate the effects of t and N m on the performance of the present method, 2 alternative scenarios were simulated in addition to the standard scenario. Three different values of t (0.25, 0.5, and 1.0) and N m (200, 1,000, and 5,000) were simulated in the first and seconds groups, respectively. For all of these alternatives, only the intended parameter differed from the standard scenario. Twenty replicates were simulated for each scenario.

PIC Pig Data
Publicly available data including pedigree, genotypic, and phenotypic information on a single Pig Improvement Company (PIC) nucleus pig line were used (http://www.g3journal.org/ content/2/4/429/Suppl/DC1). The total number of individuals was 3,512, and all have phenotypes for 2 traits (T4 and T5) and genotypes available from the PorcineSNP60 chip (N = 64,233). These phenotypes were already adjusted for environmental fixed effects such as sex, farm, and year of birth [17]. Then, 1,800 individuals were randomly selected from all individuals whose accuracies from the full PIC dataset exceeded 0.8. From these 1,800 individuals, 1,500 were selected from old generations and defined as the reference population while the other 300 individuals were defined as the test population. Genotypes were filtered for minor allele frequencies less than 0.05. The pig data are summarized in Table 1.

Genomic BLUP Model
The GBLUP including additive and dominance genetic effects termed GBLUP-D. The statistical model of GBLUP-D can be expressed as: where y is the vector of phenotypes; b is the vector of fixed effects; a and d are the vector of additive and dominance genetic effects of animals; X and Z are incident matrices for the fixed effects, additive, and dominance genetic effect, respectively; and e is the vector of residuals. Additive and dominance genetic effects were assumed to follow normal distributions: a*N(0,Gs 2 a ) and d*N(0,Ds 2 d ), where G and D are additive and dominance genomic relationship matrices, respectively. These matrices describe the relationships between genotyped individuals and can be constructed from the information on genome-wide SNP markers. Let A 1j and A 2j be 2 alleles at the jth marker locus and p j be the frequency of A 2j . The G matrix is created as follows [3]: where M a is the n|N m matrix (n is a number of individuals) and the element of M a for the ith individual at the jth marker is calculated as follows: Similarly, M d is assumed to be the n|N m matrix and the element of M d for the ith individual at the jth marker can be calculated as: This element describes the coefficients of d j in dominance deviations. Therefore, d and its variance can be derived as follows: where d is the N m dimensional vector of the jth element, which is d j . This dominance formula was also used in GVCBLUP [14]. Assuming the dominance genetic effects at different marker loci are identically and independently distributed normal variables, the variance of the genome-wide dominance effect is calculated as follows: Consequently, D can be calculated using M d : Variance components were estimated with average information restricted maximum likelihood (REML) [18]. The dataset of reference population were used to predict genetic effects of the  genotyped individuals in test population. The GBLUP-D model solutions yielded the estimates of additive (â a) and dominance genetic effects (d d). The estimates of the total genetic effect (ĝ g) were calculated by the sum ofâ a andd d. In GBLUP,ĝ g equalsâ a, because the dominance genetic effect is not considered. The predictive ability of the model was evaluated from accuracy and unbiasedness of estimates in the test population. The accuracies of estimated additive, dominance, and total genetic effects (râ a,a , rd d,d , and rĝ g,g , respectively) were measured as the correlations between the estimates and true values. Unbiasedness (bâ a,a , bd d,d , and bĝ g,g ) was measured using the regressions of estimates on true values. A regression coefficient of one denotes unbiasedness. Since the true values are unknown in pig data, estimated breeding values from the full PIC dataset (a Ã ) and phenotypes (y) were used instead of true additive and total genetic effects, respectively. In real data, the predictive ability of dominance genetic effect cannot be calculated and was inferred from that of total genetic effect. Tables 2 and 3 show the estimates of variance components and heritability in simulation data with various values of t and N m . For all values of t and N m , there were nearly no differences in estimates of additive genetic variance and narrow-sense heritability between GBLUP and GBLUP-D. The ratios of dominant genetic variance estimated by GBLUP-D were 90.6%, 61.1%, and 54.4% with a t of 0.25, 0.5, and 1.0 and 46.6%, 61.1%, and 84.0% with an N m of 200, 1,000, and 5,000, respectively. Tables 4 and 5 show the accuracies and unbiasedness of estimated genetic values calculated by GBLUP and GBLUP-D in simulation data. For all values of t and N m , râ a,a and bâ a,a were almost equal between GBLUP-D and GBLUP. In GBLUP-D, rd d,d and bd d,d increased with increasing t and N m . Meanwhile, rĝ g,g values in GBLUP-D exceeded those in GBLUP by 1.2%, 7.8%, and 24.7% with a t of 0.25, 0.5, and 1.0 and by 1.9%, 7.8%, and 4.2% with an N m of 200, 1,000, and 5,000, respectively. In GBLUP-D, larger values of t resulted in bĝ g,g being closer to 1.

PIC Pig Data
In T4 and T5 from the pig data, dominance genetic variance accounted for 9.6% and 6.3% of the phenotypic variance, respectively ( Table 6). The estimated additive genetic variances and residual variances calculated by GBLUP-D were smaller than those calculated by GBLUP. Thus, GBLUP-D consequently yielded lower narrow-sense heritability and higher broad-sense heritability than GBLUP.
In T4 and T5 from the pig data, there were nearly no differences between GBLUP and GBLUP-D with respect to râ a,a Ã (Table 7). In T5, rĝ g,y and bĝ g,y were slightly higher in GBLUP-D than GBLUP, whereas rĝ g,y and bĝ g,y were not higher in GBLUP-D in T4.

Stochastic Simulation
The GBLUP-D method captured the substantial ratios of the dominance genetic variances and estimated the individual dominance genetic effects although there were nearly no differences in estimates of additive genetic variance and narrowsense heritability between GBLUP and GBLUP-D. Our result indicates that GBLUP-D is expected to improve performance of the crossbreds, in particular when degree of dominance is large.
In the present study, the simulated genome comprised one chromosome 1 Morgan long. However, the whole genome sizes of livestock are larger. Here, another simulation data were constructed to evaluate the effect of the genome size on predictive ability. This simulation data comprised five chromosomes of 1 Morgan. The numbers of markers and QTL set to be 5,000 and 250 to obtain the same distances between markers and QTL as the initial simulation. Table 8 shows the accuracies and unbiasedness of estimated genetic values in this simulation data. The dominance genetic effects could be captured in this data, but the accuracies of additive, dominance and total genetic effects decreased in comparison with the genome of 1 chromosome. Hence, large size of reference population would be required when genome size is large.

PIC Pig Data
The degrees of estimated dominance genetic variance in T4 and T5 were nearly equal to those in simulation data with a t of 0.25. When t was 0.25, rĝ g,g and bĝ g,g were 1.2% and 0.3% higher in GBLUP-D than GBLUP. These results indicate the predictive ability of GBLUP-D in T4 and T5 only improved slightly because the degrees of dominance genetic variance were too small. In fact, rĝ g,y and bĝ g,y in GBLUP-D were little improved in comparison with GBLUP. In general, the degree of dominance genetic variance is expected to be much larger in crossbred populations than purebred ones. Since the present study was based on data from purebred PIC pig data, the degrees of dominance genetic variance in T4 and T5 might have been small. Table 4. Accuracies of estimates (râ a,a , rd d,d , and rĝ g,g ) and regression coefficients of estimates on their true values (bâ a,a , bd d,d , and bĝ g,g ) in the test population for simulation data with 3 dominance degrees (0.25, 0.5, and 1.0).  Table 5. Accuracies of estimates (râ a,a , rd d,d , and rĝ g,g ) and regression coefficients of estimates on their true values (bâ a,a , bd d,d , and bĝ g,g ) in the test population for simulation data with 200, 1,000, and 5,000 markers. Practical Use of GBLUP-D GBLUP-D has two practical uses. First, selection on the basis of GBLUP-D in crossbreds is useful for commercial production. In swine and poultry, crossbreds are the end product. The marker information from a purebred and its crossbred relatives enables the selection of candidate purebreds for the performance of their crossbred offspring [11]. Second, GBLUP-D could allow mating allocation to exploit dominance. An extra response will be obtained when an appropriate design of future matings using mating allocation techniques is implemented [10,19].

Additive and Dominance Relationship Matrix
In the presence of dominance genetic effects, the breeding values of A 1j A 1j , A 1j A 2j , and A 2j A 2j at the jth locus are {2p j fa j z(1{2p j )d j g, (1{2p j )fa j z(1{2p j )d j g,and(2{2p j ) fa j z(1{2p j )d j g, respectively. The additive genomic relationship matrix in GBLUP-D should be constructed considering d j . If all QTL are of complete dominance, then d j is 1. However, in practice, the value of d j cannot be determined because the degree of dominance is unknown. Therefore, in the present study, the additive genomic relationship matrix was the same in GBLUP-D and GBLUP. Although there are nearly no differences in the estimates of additive genetic effects or their variance between GBLUP and GBLUP-D, the additive genomic relationship matrix including d may yield good estimates of them.
A GBLUP method including dominance genetic effects was also proposed in a previous study [11]. In that model, the additive genomic relationship matrix is same as that in traditional GBLUP. However, the dominance genomic relationship matrix (DÃ) in the previous study differs from that in the present study and is defined as follows: where the element of H for the ith individual at the jth marker is calculated as This element corresponds to the heterozygosity coefficient but not the dominance genetic effect. To compare the performance of GBLUP-D in the present study and the model in the previous study [11], predictive ability was calculated in the standard simulation scenario (Table 9). GBLUP-D yielded higher accuracies of additive, dominance, and total genetic effects than the previous model [11]. This might be because the heterozygosity coefficient includes part of the additive genetic effect. In fact, the previous study [12] reported that dominance genetic variance calculated from the previous model [11] was larger than that from GBLUP-D.
Assuming linkage equilibrium and uncorrelated marker effects, the dominance genetic variances in the present study (s 2 d ) and the previous study [11] (s 2 dÃ ) are calculated as follows:  Table 7. Aces of estimates (râ a,a Ã and rĝ g,y ) and regression coefficients (bâ a,a Ã and bĝ g,y ) ofâ a on full PIC dataset (a Ã ) andĝ g on phenotypic value (y) in the test population for the PIC pig dataset.  2p j (1{p j )f1{2p j (1{p j )gs 2 d : In addition, the estimated dominance genetic effects in the present study (d d) and the previous study [11] (d d Ã ) are calculated as follows:d where V is the variance of y. If the distribution of the allelic frequencies is available,d d Ã can be transformed tod d.

Epistasis
Increasing knowledge about biological pathways and gene networks highlights the importance of gene-gene interactions, i.e., epistasis; some authors argue that much of the genetic variance in a population is due to such interactions [20][21][22]. When considering second-order epistasis in GBLUP, the epistatic genomic relationship matrix can be approximately calculated from the Hadamard product of the genomic relationship matrix. For example, additive by additive and additive by dominance interactions are represented as G#G and G#D, respectively. The present study tried to use linear mixed models including G#G and G#D for T4 and T5 in pig data. However, the variance components of these epistasis could not be estimated because they were outside parameter space (data not shown). The previous study [11] also included epistatic genomic relationship matrices in GBLUP. In that model, the estimated epistatic variances were almost 0 for daily gain in Danish Duroc pigs. The estimated epistatic variances in the Bayesian model for the percentage of CD8 + cells in publicly available mouse data were almost 0 [23]. Although these studies could not detect epistastic effect, in recent, the marker-generated kinship matrices were suggested in a new mixed model method [24] and nonparametric approaches and machine-learning techniques were recommended to model more complex gene interaction patterns [25][26][27].
Author Contributions Table 9. Accuracies of estimates (râ a,a , rd d,d , and rĝ g,g ) and regression coefficients of estimates on their true values (bâ a,a , bd d,d , and bĝ g,g ) in the test population for simulation data (t~0:5,N m~1 ,000).