Accounting for Genetic Architecture Improves Sequence Based Genomic Prediction for a Drosophila Fitness Trait

The ability to predict quantitative trait phenotypes from molecular polymorphism data will revolutionize evolutionary biology, medicine and human biology, and animal and plant breeding. Efforts to map quantitative trait loci have yielded novel insights into the biology of quantitative traits, but the combination of individually significant quantitative trait loci typically has low predictive ability. Utilizing all segregating variants can give good predictive ability in plant and animal breeding populations, but gives little insight into trait biology. Here, we used the Drosophila Genetic Reference Panel to perform both a genome wide association analysis and genomic prediction for the fitness-related trait chill coma recovery time. We found substantial total genetic variation for chill coma recovery time, with a genetic architecture that differs between males and females, a small number of molecular variants with large main effects, and evidence for epistasis. Although the top additive variants explained 36% (17%) of the genetic variance among lines in females (males), the predictive ability using genomic best linear unbiased prediction and a relationship matrix using all common segregating variants was very low for females and zero for males. We hypothesized that the low predictive ability was due to the mismatch between the infinitesimal genetic architecture assumed by the genomic best linear unbiased prediction model and the true genetic architecture of chill coma recovery time. Indeed, we found that the predictive ability of the genomic best linear unbiased prediction model is markedly improved when we combine quantitative trait locus mapping with genomic prediction by only including the top variants associated with main and epistatic effects in the relationship matrix. This trait-associated prediction approach has the advantage that it yields biologically interpretable prediction models.

The ability to predict quantitative trait phenotypes from molecular polymorphism data will revolutionize evolutionary biology, medicine and human biology, and animal and plant breeding. Efforts to map quantitative trait loci have yielded novel insights into the biology of quantitative traits, but the combination of individually significant quantitative trait loci typically has low predictive ability. Utilizing all segregating variants can give good predictive ability in plant and animal breeding populations, but gives little insight into trait biology. Here, we used the Drosophila Genetic Reference Panel to perform both a genome wide association analysis and genomic prediction for the fitness-related trait chill coma recovery time. We found substantial total genetic variation for chill coma recovery time, with a genetic architecture that differs between males and females, a small number of molecular variants with large main effects, and evidence for epistasis. Although the top additive variants explained 36% (17%) of the genetic variance among lines in females (males), the predictive ability using genomic best linear unbiased prediction and a relationship matrix using all common segregating variants was very low for females and zero for males. We hypothesized that the low predictive ability was due to the mismatch between the infinitesimal genetic architecture assumed by the genomic best linear unbiased prediction model and the true genetic architecture of chill coma recovery time. Indeed, we found that the predictive ability of the genomic best linear unbiased prediction model is markedly improved when we combine quantitative trait locus mapping with genomic prediction by only including the top variants associated with main and epistatic effects in the relationship matrix. This trait-associated prediction approach has the advantage that it yields biologically interpretable prediction models.

Introduction
The ability to accurately predict phenotypes for quantitative traits from genome wide polymorphism data will revolutionize evolutionary biology, medicine and human biology, as well as breeding of agriculturally important plant and animal species. The most commonly used experimental approach to dissect the genotype-phenotype map has been to identify individual quantitative trait loci (QTLs) by linkage to, or association with, segregating molecular markers in mapping populations [1]. These studies evaluate the null hypothesis that variants are not associated with differences in trait means, and rejection of the null hypothesis gives biological insight into genes and genetic networks affecting naturally segregating quantitative variation. Genome wide association (GWA) mapping studies for quantitative traits and complex diseases in humans have identified over 2,000 novel variants [2]. However, the effects of individual variants are small, and collectively they explain only a small fraction of the total genetic and phenotypic variation for each trait, a phenomenon termed 'missing heritability' [3].
A second approach computes genome-based relationship matrices among individuals in a population based on all genotyped segregating variants, and uses this to estimate the fraction of additive genetic variance explained by the variants [4,5]. When applied to human quantitative traits and complex diseases, this method explains a much greater fraction of the total heritability than single marker analyses [6,7]. Importantly, when a statistical model is developed in a population for which individuals have been both genotyped for molecular markers and phenotyped for a quantitative trait, the model can be used to predict phenotypes for an independent sample of individuals from the same population with genotype information only (genomic prediction). Genomic prediction methods have been extended to predict genomic breeding values for ranking selection candidates in animal and plant breeding programs (genomic selection) [8][9][10][11][12]. Genomic selection can massively increase genetic progress and is currently widely utilized in applied breeding programs.
With the advent of next generation sequencing technologies, we can now perform genome wide association mapping and genomic prediction on populations of individuals with complete genome sequences. This scenario differs from GWA and genomic prediction analyses in which only a subset of segregating variants are genotyped in that the causal variants are themselves included in the list of polymorphic variants. If the true genetic architecture of a trait differs from the additive, highly polygenic model typically assumed in genomic prediction (GBLUP, genomic best linear unbiased prediction) [13,14], combining QTL mapping with genomic prediction may improve prediction accuracy and yield biologically relevant prediction models. Indeed, genomic prediction methods that incorporate epistatic or non-additive effects typically outperform their counterparts with only additive effects [15,16].
The Drosophila melanogaster Genetic Reference Panel (DGRP) consists of 205 sequenced inbred lines derived from the Raleigh, NC population [17,18]. Here, we report GWA and genomic prediction analyses for time to recover from a chill-induced coma, a component of fitness in Drosophila and other insects [19,20]. We find substantial total genetic variation for chill coma recovery time, with a genetic architecture that differs between males and females and includes alleles with large additive effects as well evidence for epistasis [21]. Genomic predictive ability for chill coma recovery time is very low when based on a genomic relationship matrix including all markers, but is markedly improved when the relationship matrix only includes variants associated with main and epistatic effects on the trait.

Quantitative genetics of chill coma recovery time
We assessed time to recovery from a chill-induced coma for 176 of the 205 DGRP lines (S1 Table) with Illumina sequence data [17,18]. We find significant genetic variation (P = 1.07 x 10 −52 for the between line variance and P = 1.40 x 10 −6 for the sex by line interaction variance) for chill coma recovery time (Fig 1), with a broad sense heritability (± SE) ofĤ 2 = 0.35 (± 0.04) (S2 Table), similar to that previously reported for 157 DGRP lines [17]. The genetic correlation between chill coma recovery time in males and females is high (r MF ¼ 0:93). However, given the significant sex by line interaction variance (S2 Table), we considered males and females separately in subsequent analyses.

Genetic architecture of chill coma recovery time
Our previous GWA analysis of chill coma recovery time used single nucleotide polymorphisms (SNPs) for 157 Freeze 1.0 DGRP lines [17]. Sequences of these lines were obtained using both 454 and Illumina technology. Illumina sequences are now available for all 205 Freeze 2.0 DGRP lines, which have been genotyped for SNPs as well as small and large insertion/deletion (indel) variants and other non-SNP variants [18]. We performed a GWA analysis for chill coma recovery time based on line means, using a mixed model to account for relatedness for 176 DGRP lines with Freeze 2 genotypes of 1,868,905 common (minor allele frequency ! 0.05) bi-allelic variants meeting quality control metrics. The broad sense heritabilities of line means (± SE) areĤ 2 ¼ 0:92 (± 0.10) in females andĤ 2 ¼ 0:84 (± 0.10) in males (S2 Table). The increase over individual-based heritability estimates is because the line means are estimated with much greater precision given that we measured approximately 100 individuals per sex and line.
At a nominal P < 10 −5 threshold, we found 68 variants in or near 44 genes associated with female chill coma recovery time and 68 variants in or near 42 genes associated with male chill coma recovery time (S3 Table). A total of 26 genes were male-specific, 28 were female-specific, and 16 were common to both sexes (although the variants associated with males and females for the common genes were not necessarily the same). Three SNPs in females (2L_3753356_SNP in the 3'UTR of CG10019, 2L_4588764_SNP in the first intron of dumpy (dp), and X_18394766_SNP, a non-synonymous polymorphism in Rad51D) and one SNP in males (2L_4513330_SNP in the first intron of dp) had large effects and were genome-wide significant at a Bonferroni-corrected 5% significance threshold (P < 2.68 x 10 −8 ; S3 Table). These few significant additive SNPs explained 36% and 17% of the genetic variation among lines in females and males, respectively. Previously, we inferred widespread epistasis for female chill coma recovery time from the failure of associations with common SNPs to replicate between the Freeze 1.0 DGRP lines and an extreme-QTL GWA analysis of a large advanced intercross population derived from 40 DGRP lines, despite adequate power [21]. We therefore infer that the genetic architecture of chill coma recovery time is sex-specific, with a small number of SNPs with large main effects and variants with smaller additive effects and epistatic interactions that account for the remaining genetic variance in each sex.

Genomic prediction for chill coma recovery time using genome wide variants
We constructed genomic relationship matrices [5] from all of the 3,742,106 SNPs and 437,096 indels that were polymorphic in the 176 DGRP lines, as well as for the subset of 1,868,905 common variants (MAF ! 0.05). We used genomic best linear unbiased prediction (GBLUP) to predict mean chill coma recovery phenotypes for these lines, as described previously for starvation resistance and startle response [13].
We used 100 replicates of 5-fold cross-validation (CV) to assess the average correlation (r) between predicted genetic values using all common variants and observed phenotypes, for each sex separately. Surprisingly, we found that genomic-based predictive ability was very low (r = 0.08) in females, and in males, the estimate of additive genetic variance from the genomic relationship matrix was zero, leading to r = 0. Similar results (r = 0.08 in females and r = 0 in males) were obtained using all variants, suggesting that the low predictive ability of genomic prediction was not due to the omission of rare variants (Fig 2). The distributions of chill coma recovery times are not normal, and have a pronounced minor peak for longer recovery time in both sexes (Fig 1). The low predictive ability of genomic prediction is not, however, attributable to the non-normal distributions, as Box-Cox transformed data show the same pattern of low predictive abilities (r = 0.10 in females and r = 0 in males using common or all variants, Fig 2). This result is in contrast to previous analyses, where predictive ability for starvation resistance and startle response was 0.24 and 0.23, respectively [13].
The low predictive ability of genomic prediction could be due to low additive genetic variance for chill coma recovery time, despite appreciable total genetic variance. To assess this, we estimated the additive genetic variance for chill coma recovery using the genomic relationship matrix derived from common variants, both for individual data and for line means. Indeed, the estimate of additive genetic variance is zero for males and not significantly different from zero in females in both analyses (S4 Table). In fact, the likelihood profile for the additive genetic variance in males was flat near the origin and thus the maximum likelihood estimate was zero.
The observation that estimates of additive genetic variance using the genome-wide relationship matrix are not significantly different from zero is puzzling given the results from the GWA analyses. As noted above, we estimated that the top GWA hits accounted for 36% and Trait-Associated Genomic Prediction in Drosophila 17% of the variance among lines in females and males, respectively. There are several possible technical explanations for the discrepancy between the additive genetic variance expected to be contributed by variants detected in our GWA analysis and the non-significant or zero estimates of additive variance from the genome wide relationship matrix. First, the sample size of 176 lines is not large and may produce unstable estimates. However, the sample size was even smaller for starvation resistance and startle response in our earlier study in which the GBLUP estimates of additive genetic variance were reasonably accurate from a strictly additive model [13]. Second, the GBLUP model assumes a highly polygenic genetic architecture such that the effects of all variants are strictly additive, and normally distributed with equal variance. Departures of the true genetic architecture from these model assumptions, such as variants with large effects, non-additive genetic variance, or both, as inferred for chill coma recovery time, could thus lead to low GBLUP estimates of additive genetic variance. Further, the genomic variance estimated from the relationship matrix is not necessarily identical to the true genetic variance of the trait [22], although differences should be minor if the genomic relationship is constructed from sequence data assumed to harbor all causal variants, as in the present case. We assessed the effects of genetic architecture on estimates of additive genetic variance from the genomic relationship matrix by simulating phenotypes for the DGRP genotypes with different genetic architectures. We considered an additive model consisting of 100 QTLs each explaining 1% of the total genetic variance, a major gene model where one QTL explains all of the genetic variance, and an epistatic model where each of 50 pairs of interactions explains 2% of the total genetic variance. We simulated the phenotypic data such that the broad sense heritability is 37% (i.e., the same as for chill coma) and estimated the additive genetic variance using a mixed model. We performed 100 replicate under each scenario. While the additive model occasionally led to low estimates of additive genetic variance, the major gene model did so slightly more frequently and the epistatic model substantially more frequently (Fig 3). Therefore, we conclude that departures from the genetic architecture assumed by the GBLUP model can cause a substantial underestimation of the additive genetic variance and hence reduce predictive ability of the model.

Incorporating genetic architecture improves genomic prediction
Because the underlying genetic architecture affects the amount of estimated additive genetic variance, which is the variance component accessible by GBLUP prediction, we assessed whether genomic prediction could be improved by incorporating additive and/or epistatic trait-associated variants. We used leave-one-out cross-validation (LOOCV) in these analyses to maximize sample size in the training set. In each of the 176 LOOCV iterations, one line was left out and the remaining 175 lines were used to carry out a GWAS for single variants and pair-wise interactions between variants. We then selected the top trait-associated additive variants and/or epistatic pairs with P <10 -X to construct the genomic relationship matrix and predict the phenotype of the remaining line. We computed the predictive ability as the correlation between the vector of estimated genetic values and the vector of observed line means, and varied X to arrive at an optimal threshold (Fig 4). Using all common variants, we again found that GBLUP had low predictive ability in females (r = 0.07) and none in males (Fig 5). However, the maximum predictive ability increased to 0.40 in males and 0.43 in females when only the top SNPs were used to construct the relationship matrix (Fig 4 and Fig 5). Using top epistatic variants to build a pairwise epistatic genomic relationship matrix based on a modified approach of Astle and Balding [23] also resulted in an improvement in predictive ability, to 0.35 in males and 0.32 in females (Fig 5). Finally, incorporating both top additive variants and top epistatic variants (MAF > 0.15) improved the predictive ability in males to 0.48, but did not improve predictive ability in females (0.43) beyond that achieved by the additive model alone (Fig 5). In this combined model, we used the top epistatic variants leading to the best predictive ability in the epistatic LOOCV and varied the P-value threshold for the additive variants to optimize the predictive ability. Interestingly, in the additive only model, the highest predictive ability in males was achieved when a single variant was included (Fig 4), which coincided with the top association signal in males (2L_4513330, S3 Table).

Discussion
The sequenced inbred lines of the DGRP provide a unique opportunity to dissect the genetic architecture of Drosophila quantitative traits. Here, we performed GWA analyses for a component of fitness, time to recover from a chill-induced coma. The broad sense heritability of chill coma recovery time in the DGRP is moderately high (~0.40), and sex-specific. In contrast to results from most GWA analyses in human populations, in which the top GWA variants have small effects and collectively explain only a small fraction of the total heritability [2,3,24], we found that a small number of sex-specific SNPs with large marginal effects explain a substantial fraction of the variation in chill coma recovery time in the DGRP. These variants are common and are not in long-range linkage disequilibrium with other genomic variants, which can occur for variants with MAF < 0.05 due to the small size of the DGRP. In addition to large additive effects, our previous study implicated substantial epistasis for chill coma recovery time [21].
The complicated genetic architecture of chill coma recovery time is not unexpected for a fitness-related trait. Fisher's fundamental theorem of natural selection predicts reduced levels of additive genetic variation for such traits, with residual genetic variance attributable to additive and dominance variance from deleterious alleles maintained at low frequencies by mutationselection balance and balancing selection of alleles at intermediate frequencies, and epistatic Trait-Associated Genomic Prediction in Drosophila variance from gene-gene interactions [25]. Our GWA analysis did not evaluate the effects of low frequency alleles (MAF < 0.05). However, the additive effects were predominantly such that the carriers of rarer alleles took longer to recover from a chill-induced coma (S3 Table). To the extent that allele frequency is a proxy for fitness, this suggests that these alleles are deleterious with respect to chill coma recovery, and that they may be under a form of balancing selection. Possibilities include overdominance for chill coma recovery time or a beneficial effect on another fitness trait.
We computed genetic relationship matrices for all common variants and used GBLUP and cross-validation to assess predictive ability for chill coma recovery time. GBLUP is only one of many methods utilized in genomic prediction [26][27][28][29]. Simulation studies show that methods tailored to specific known underlying genetic architectures, in particular Bayes B [4], give improved predictive abilities over GBLUP when the genetic architecture consists of a few, additive loci with large effects [26][27][28]. However, Bayesian methods are very sensitive to the assumed prior distribution of effects, which are unknown a priori [29]. Alternatively, a recently proposed generalized ridge regression method estimated variant-specific shrinkage parameters from the data without the need for prior distributions [30]. However, this method made predictions that were comparable to GBLUP (S1 Fig). GBLUP is not only easy to implement but gives equivalent (and high) prediction accuracies when compared to other methods when applied to plant and animal breeding populations [14,[26][27][28], in which the relationship between individuals in the training and test populations is high. In human populations, however, in which the individuals in the training and test populations are not related, GBLUP Trait-Associated Genomic Prediction in Drosophila prediction accuracy is not high [14,31], suggesting that models incorporating genetic architecture could improve predictive ability.
When genetic architecture was not included in the model, the predictive ability we obtained for chill coma recovery time using GBLUP was very low for females and zero for males, due to estimates of narrow sense heritability that are not significantly different from zero in females and equal to zero in males. We hypothesized that this low predictive ability is due to violations of the additive infinitesimal model assumed by GBLUP that is inconsistent with the true genetic architecture of chill coma recovery time. This idea was supported by our simulation study (Fig 3). Our GWA analyses and previous results suggest that the true architecture of chill coma recovery includes a few large effect additive loci coupled with additional epistatic interactions. The low predictive ability from the mismatch between the true and assumed genetic architecture is further exacerbated by the low average genomic relatedness of the DGRP lines [13,18] and rapid decline of LD with physical distance in the DGRP [17,8]. Whether the violation of the infinitesimal model was due to major genes, non-additivity or both cannot be definitively answered with the available data.
The GBLUP model utilizes the average genomic relationship over all loci between individuals (or lines, in the case of the DGRP), and is the same for all traits measured in the population [32]. Furthermore, although epistasis for chill coma appears to be widespread [21], inclusion of a second variance component accounting for genome-wide pair-wise interactions among all considered variants did not improve prediction (S1 Fig). This is expected since most genetic variance is additive even if the gene action is epistatic, unless allele frequencies are intermediate [33,34]. However, if only a few loci or interactions between them affect the trait, using the entire genomic relationship matrix and/or genome-wide interactions essentially adds noise unless the average genomic relatedness matches the genomic relatedness at causal loci, leading to reduced prediction accuracy [14,32]. Indeed, our implementation of GBLUP that takes account of trait-associated QTLs [32], including epistasis and major additive effects, dramatically improved predictive ability.
The genomic prediction was greatly improved when adding SNPs or SNP combinations selected based on GWA analyses in each fold of the LOOCVs. Although this study was underpowered to detect epistatic interactions globally due to its small sample size, this does not mean that there is no epistasis among the top hits in the pair-wise epistasis tests. The improvement of prediction accuracy is likely a result of enriching for true causal variants among the list of variants used to construct the genetic covariance matrix. In addition to improved predictive ability, the trait-associated GBLUP approach affords the opportunity to evaluate the stability of particular additive variants and pairwise interactions by enumerating the number of times each enters the training model in the different folds of the LOOCV (S5 Table and S6 Table). Not surprisingly, the top additive variants in the GWA analysis were recapitulated among the variants entering the model in all or most of the folds of the LOOCV (S5 Table). However, there were relatively few pairwise interactions repeatedly entering the models (S6 Table); these interactions have the highest priority for future functional tests.
Epistasis is a major feature of the genetic architecture of time to recover from a chill-induced coma in Drosophila [21]. To what extent does this phenomenon apply to other quantitative traits in Drosophila, as well as to quantitative traits in other species? There is substantial evidence for epistatic interactions between QTLs in Drosophila and mice [1,21,24,34], chickens [35], Arabidopsis [36] and yeast [37,38]. Epistatic effects can be as large as main QTL effects, and can occur in opposite directions between different pairs of interacting loci and between loci without significant main effects on the trait. Epistasis can also occur between closely linked QTLs [36][37][38] and even between polymorphisms in a single gene [39]. Therefore, epistasis appears to be a common feature of the genetic architecture of a wide range of quantitative traits in genetic model organisms, and hence, by extension, in other less genetically tractable species as well.
The implications of pervasive epistasis are profound. Complex networks of genetic interactions provide the genetic basis of canalization (hidden genetic variation) [40]. Widespread and complex epistasis could lead to population-specific genetic architecture, since the effect of a given allele will be conditional on the presence of interacting alleles, which typically vary in frequency between populations [21,34,41]. This leads to non-replication of estimates of main effects in different populations and to population specific responses to artificial selection. Epistasis may be in part responsible for the phenomenon of missing heritability [3] in human complex traits and diseases, since estimates of effects of single markers are biased when epistasis exists but is not accounted for [34,42]. In this regard epistasis provides a reason for the apparent additive and close to infinitesimal genetic architecture of many quantitative traits: additivity is an emergent property of underlying epistatic gene action [34]. We know that developmental, neural, transcriptional, metabolic and biochemical networks are highly dynamic, interconnected and nonlinear [43]. Identifying epistatic genetic interaction networks will greatly inform our understanding of how molecular interactions affect variation in organismal quantitative traits [34,44], as well as improve genomic prediction.

Drosophila phenotypic data
We quantified chill coma recovery by transferring (without anesthesia) three to seven day-old flies to empty vials, and placing them on ice for three hours. We transferred the flies to room temperature, and recorded the time it took for each individual to right itself and stand on its legs [45]. We obtained two replicate measurements per sex and line, with 50 flies per replicate, for each of 177 DGRP lines. One line, DGRP_879, had a chill coma recovery phenotype two standard deviations from the mean, and was excluded from all analyses.

Heritability, genetic correlation, and variance components
To estimate broad sense heritability, we fitted the mixed model Y = μ + S + L + S×L + R(S×L) + ε to the individual level data. Y is phenotype, μ is the overall mean, S is the fixed effect of sex, L is the random effect of line, R is the random effect of replicate, and ε is the residual. We also fitted reduced models separately for males and females. We estimated the broad sense heritability for the full model asĤ 2 ¼ŝ for males and females, respectively, whereŝ 2 R is the between replicate variance component. To partition total genetic variance into additive and non-additive components, we fitted a mixed model y = Wμ + Zg + Zg' + e to individual level data in females and males separately. The random effects vector was symbolically separated into two in the model to help model specification despite the same incidence matrix Z with rows of unit vectors where one component is 1 and all others are 0 indicating the respective line the individual was from. Furthermore, W = (1,. . .,1) T ; μ is the overall mean; the additive genetic value g~N(0, σ G 2 s 2 G G) is assumed to be multivariate normal where G is the genomic relationship matrix of the n lines (also see below); the non-additive genetic value g'~N(0, σ G 2 s 2 G I) where I is the identity matrix; and the error term e~N(0, σ E 2 I). All variance components were estimated by REML using the Proc Mixed procedure in SAS software (Version 9.2 for Linux, [46]).

Genome wide association analysis
We performed genome wide association analysis on line means. Associations between single variants and line means for chill coma recovery time were tested by a mixed effect model accounting for relatedness among lines (including relatedness due to shared inversion karytopes) using FastLMM [18,47]. We used models of form Y = μ + V + a + ε, where V is the fixed effect of the polymorphic variant and a is a polygenic term whose covariance is specified by the genomic relationship matrix to evaluate the effects of markers for males and females separately. We performed these analyses for 1,865,879 biallelic variants for which the Phred scaled variant quality (10log 10 P where P is the probability value from a likelihood ratio test testing the existence of a variant) was greater than 500, the minor allele frequency was ! 0.05, and genotype call rate was ! 0.8 among the DGRP lines [18]. We considered only homozygous genotypes whose JGIL [48] quality scores were greater than 20. We estimated marginal allelic effects of each variant as one-half the difference in trait mean between the variant classes (polarized by allele frequency, such that the effect is the difference between the major and minor alleles) [25].

Bioinformatics analyses
We used SnpEff [49] for functional annotation of DNA variants based on the 5.49 release of the FlyBase [50] annotation.

Genomic Best Linear Unbiased Prediction (GBLUP)
We used the underlying statistical model y = Wμ + Zg + e (Model 1) to perform genomic prediction. The i th component of the q-vector y is the phenotypic value of the i th line that is used for prediction (i = 1,. . .,n), W = (1,. . .,1) T ; μ is the overall mean; g~N(0, σ G When including two different random components, we used the statistical model y = Wμ + Z 1 g 1 + Z 2 g 2 + e (Model 2) where g 1~N (0, σ G1 2 G 1 ) and g 2~N (0, σ G2 2 G 2 ) are assumed to be independent multivariate normal vectors where G 1 and G 2 are two different genomic relationship matrices, and Z 1 and Z 2 are the corresponding incidence matrices. The BLUP for this model can be obtained by solving the mixed model equations We estimated variance components for Model 1 and Model 2 using maximum likelihood (ML) as implemented by the R package"RandomFields", version 2.0.46, and the "fitvario" function (http://CRAN.R-project.org/package=RandomFields) [51].

Genomic relationship matrices
We constructed genomic relationship matrices from the 3,742,106 SNPs and 437,096 indels that were polymorphic in the 176 DGRP lines, as well as for the subset of 1,868,905 common variants (MAF ! 0.05). We only used variants with a call rate > 0.8 in the 176 lines. Missing genotypes were assigned the allele frequencies based on the set of 176 lines.
The additive genomic relationship matrix, G, is defined as G ¼ ðMÀPÞðMÀPÞ T 2 P s j¼1 p j ð1Àp j Þ [5], where M is the (n×s) matrix of genotype vectors for the n lines, with the s variants coded as -1, 1; and the j th column of P is (2(p j -0.5),. . .,2(p j -0.5)) T , where p j is the frequency of the second allele at locus j.
We calculated pairwise epistatic genomic relationship matrices by modifying an approach according to Astle and Balding [23]. For the j-th pair (SNP j1 , SNP j2 ) of interacting SNPs we built two relationship matrices, K j1 and K j2 (one for each of the two SNPs), according to where x ji is the genotype-vector and p ji is the allele frequency of SNP ji for i = 1, 2. We then calculated the Hadamard-product of K j1 and K j2 to obtain a matrix reflecting the interaction of SNP 1 and SNP 2 . This calculation was repeated for all pairs of interacting SNPs and finally averaged over all Hadamard-products, to obtain one final epistatic relationship matrix to be used as the pairwise epistatic genomic relationship matrix in the GBLUP models. In the additive case, the resulting covariance matrix is an unbiased and positive semi-definite estimator for the relationship matrix [23], and analogously K is an unbiased and positive semi-definite estimator for the additive x additive relationship matrix for the respective subset of m SNP pairs. Calculating the epistatic genomic relationship matrix as the Hadamard product of the additive genomic relationship matrices constructed from all SNPs involved in epistatic interactions [13] accounts for all pairwise interactions of the involved SNPs. Thus, for n epistatic SNP pairs comprising 2n SNPs there are 4n 2 SNP by SNP interactions. Our approach accounts for only the n pairwise interactions, i.e., for a SNP interacting with one other SNP, just this interaction is modeled and the interactions with the SNPs in the other n − 1 epistatic pairs are disregarded. In more general terms it is always possible to split the set of all pairwise SNP interactions in two subsets, where one subset reflects all interactions with a significant effect, and the complementary subset comprises all other pairwise interactions. The approach we used is equivalent to accounting for the significant interactions and ignoring the other interactions.

5-fold cross-validation using GBLUP
We first used 5-fold cross-validation (CV) [52][53][54] to assess prediction accuracy. In a 5-fold CV, the lines are randomly divided into five groups. Four of the five groups comprise the training set, and the remaining group constitutes the validation set, giving rise to five possible divisions of training and validation sets. For each of these divisions ("folds"), total genetic values for the lines of the validation set are predicted and the corresponding predictive ability defined as the correlation between predicted genetic values and observed phenotypic values is calculated. The five predictive abilities are then averaged to obtain one average correlation per CV replicate. These analyses were performed separately for males and females, with 100 replicates for each CV.

Leave-One-Out CV
We also performed a leave-one-out cross-validation (LOOCV) for trait-associated GBLUP. With n lines, the LOOCV consists of n folds. In each fold, n-1 observations are used as the training set and the phenotype of the remaining single line is predicted. This is repeated n times such that each line is predicted once. In each of the 176 folds, a GWA analysis for single variants and/or pair-wise interactions between variants was performed on the 175 lines of the training set only, i.e., all single marker and epistatic GWA analyses were repeated 176 times. The single marker GWA analysis was performed as described above while the epistatic GWA was performed as follows. We first pruned the genotype data for LD using the LD pruning utility in PLINK [55] such that no pair of variants has r 2 > 0.8 within a window of 100 variants, and constrained the analysis to interactions between variants of MAF > 0.15 and at least two lines for each of the four possible genotypes. Raw phenotypic data were adjusted for the effects of inversions and major principal components of the genotypic matrix for common variants by fitting them as fixed effects and taking residuals from the fitted model [18]. Finally we performed a full genome-wide screen for pairwise interactions, fitting models of form Y = μ + V A + V B + V A ×V B + ε, using FastEpistasis [56]. After the single marker and epistatic GWA analyses, we selected the top trait-associated additive variants and/or epistatic pairs with p < 10 -X in the respective training set to construct genomic relationship matrices and predicted the phenotype of the remaining line based on Model 1 and Model 2, incorporating additive and/or epistatic genomic relationship matrices as described above. We computed the predictive ability as the correlation between the vector of estimated genetic values (from all 176 folds) and the vector of observed line phenotypes, and varied the P-value threshold X to arrive at an optimal value. The same LOOCV approach was also applied using the bigRR package, which implements a variant ridge regression with variant specific shrinkage parameters [30]. For comparison with the trait-associated GBLUP, we also performed LOOCV with two variance components, the additive genomic relationship matrix, G, and the Hadamard product of G, G#G, with the latter of the two representing genome-wide pair-wise interactions among all variants. All cross-validation procedures and the GBLUP approach were implemented using R software [51].

Simulations
To investigate whether the genetic architecture of a quantitative trait could account for the difficulty of the additive genomic relationship to explain phenotypic variation, we performed simulations under three distinct genetic architectures: (1) a major gene explaining 37% of the phenotypic variation; (2) 100 loci additively explaining 37% of the phenotypic variation; and (3) 50 pairs of interacting loci explaining a total of 37% of the phenotypic variation. For the major gene and polygenic models, we randomly selected QTL sites from the genome and assigned their allelic effects such that each locus explained an equal amount of variance. For each pair of the randomly chosen QTLs in the epistatic model, the genotypic effect was assigned by the formula b(m 1 -p 1 )(m 2 -p 2 ), where m 1 and m 2 were the {-1, 1} coded genotypes, p 1 and p 2 were the allele frequencies of the two interacting loci, and b was the epistatic effect. To achieve equal variance for each interacting pair, we first calculated the sample variance of (m 1 -p 1 )(m 2 -p 2 ) and determined b accordingly.  Table. Variance components from models partitioning total genetic and additive genetic variance using the genomic relationship matrix.