
Average semivariance yields accurate estimates of the fraction of marker-associated genetic variance and heritability in complex trait analyses

  • Mitchell J. Feldmann,

    Roles Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Resources, Software, Supervision, Validation, Visualization, Writing – original draft, Writing – review & editing

    Affiliation Department of Plant Sciences, University of California, Davis, California, United States of America

  • Hans-Peter Piepho,

    Roles Conceptualization, Formal analysis, Funding acquisition, Investigation, Methodology, Validation, Visualization, Writing – original draft, Writing – review & editing

    Affiliation Biostatistics Unit, Institute of Crop Science, University of Hohenheim, Stuttgart, Germany

  • William C. Bridges,

    Roles Conceptualization, Formal analysis, Methodology, Validation, Writing – review & editing

    Affiliation Department of Mathematical Sciences, Clemson University, Clemson, South Carolina, United States of America

  • Steven J. Knapp

    Roles Conceptualization, Formal analysis, Funding acquisition, Investigation, Methodology, Project administration, Resources, Supervision, Validation, Visualization, Writing – original draft, Writing – review & editing

    sjknapp@ucdavis.edu

    Affiliation Department of Plant Sciences, University of California, Davis, California, United States of America

Abstract

The development of genome-informed methods for identifying quantitative trait loci (QTL) and studying the genetic basis of quantitative variation in natural and experimental populations has been driven by advances in high-throughput genotyping. For many complex traits, the underlying genetic variation is caused by the segregation of one or more ‘large-effect’ loci, in addition to an unknown number of loci with effects below the threshold of statistical detection. The large-effect loci segregating in populations are often necessary but not sufficient for predicting quantitative phenotypes. They are, nevertheless, important enough to warrant deeper study and direct modelling in genomic prediction problems. We explored the accuracy of statistical methods for estimating the fraction of marker-associated genetic variance ($p$) and heritability ($H^2_M$) for large-effect loci underlying complex phenotypes. We found that commonly used statistical methods overestimate $p$ and $H^2_M$. The source of the upward bias was traced to inequalities between the expected values of variance components in the numerators and denominators of these parameters. We found algebraic solutions for bias-correcting estimates of $p$ and $H^2_M$ that depend only on the degrees of freedom and are constant for a given study design. We discovered that average semivariance methods, which have heretofore not been used in complex trait analyses, yielded unbiased estimates of $p$ and $H^2_M$, in addition to best linear unbiased predictors of the additive and dominance effects of the underlying loci. The cryptic bias problem described here is unrelated to selection bias, although both cause the overestimation of $p$ and $H^2_M$. The solutions we described are predicted to more accurately describe the contributions of large-effect loci to the genetic variation underlying complex traits of medical, biological, and agricultural importance.

Author summary

Estimating the contributions of individual genes to the phenotypic variation observed for genetically complex traits has been an ongoing and important challenge in biology, medicine, and agriculture. While many genes have statistically undetectable effects, those with large effects often warrant in-depth study and can be important predictors of complex phenotypes such as disease risk in humans or disease resistance in domesticated plants and animals. The genes identified through associations with genetic markers in complex trait analyses typically account for a fraction of the heritable variation, a genetic parameter we called ‘marker heritability’. We discovered that textbook statistical methods systematically overestimate marker heritability and thus overestimate the contributions of specific genes to the phenotypic variation observed for complex traits in natural and experimental populations. We describe the source of the upward bias, validate our findings through computer simulation, describe methods for bias-correcting estimates of marker heritability, and illustrate their application through empirical examples. The statistical methods we describe supply investigators with more accurate estimates of the contributions of specific genes or networks of interacting genes to the heritable variation observed in complex trait studies.

Introduction

The genetic variation observed in nature is frequently caused by genes with quantitative effects [1–7]. Their discovery and characterization have been a dominant feature of quantitative genetic studies in biology, evolution, agriculture, and medicine since the introduction of methods for genotyping DNA variants genome-wide [8–11], and the parallel development of statistical methods for finding associations between DNA variants and the underlying genes or quantitative trait loci (QTL) [2, 4, 5, 7, 12–16]. A significant breakthrough was achieved when Lander and Botstein [12] introduced ‘interval mapping’ and showed that genomes could be systematically searched to identify QTL in populations genotyped with a genome-wide framework of genetically mapped DNA markers. As genotyping technologies advanced and marker densities increased, genome-wide association study (GWAS) methods emerged to search genomes for genotype-to-phenotype associations by exploiting the historical recombination in populations [14, 15, 17–19]. The concept of genomic prediction emerged as a counterpart to GWAS, initially for estimating genomic-estimated breeding values (GEBVs) in domesticated plants and animals and later for estimating polygenic risk scores (PRSs) in humans and model organisms [20–23]. These technical advances precipitated a consequential shift in the study of quantitative traits from analyses of phenotypic variation limited and informed by pedigree or family data to genome-wide analyses of genotype-to-phenotype associations and genomic prediction informed by genotypic data [6, 7, 13, 16, 20, 24–31].

The phenotypic variation observed in a population is customarily partitioned into genetic and non-genetic components to estimate heritability, repeatability, and reliability of the quantitative traits under study [24, 25, 30, 32]. The genetic component can be caused by any number of genes with quantitative effects, even a single gene, but more often by multiple genes with a range of effects [31, 33–43]. For most quantitative traits, that number is unknown but presumed to be large and undiscoverable [3, 6, 7, 22, 32, 34]. Because genes with small effects are challenging to identify and validate, the ‘many genes with small effects’ hypothesis has been difficult to conclusively falsify [21, 22, 32]. Despite the uncertainty surrounding the identity, number, effects, and interactions of genes in the undiscovered fraction [6], three decades of complex trait analyses in humans, domesticated plants and animals, Drosophila, Arabidopsis, yeast, mice, zebrafish, and other organisms have shown that the ‘discovered’ genes are typically small in number, large in effect, and collectively only explain a fraction of the genetic variance ($\sigma^2_G$) [13, 16, 28, 32, 34–36, 44, 45]. The unexplained fraction has been called ‘missing heritability’ [46–48].

The discovered genes in polygenic systems of genes are often necessary but not sufficient for predicting quantitative phenotypes, e.g., disease risk in humans or yield in domesticated plants and animals [3, 21, 34, 42, 44, 49]. There is a large body of evidence that the QTL effects for many quantitative traits are gamma family distributed, where the discovered genes are found in the upper or thin tail of the distribution above the threshold of statistical significance [34]. The presumption is that the lower or heavy tail of the gamma family distribution is caused by many genes with small effects, the chief tenet of the infinitesimal model of quantitative genetics [6, 26, 32, 50]. Genes with large effects often dominate the ‘non-missing heritability’, mask or obscure the effects of other quantitatively acting genes, and pleiotropically affect multiple quantitative phenotypes [16, 35, 39, 51], e.g., mutations in the BRCA2 gene can have large effects, are incompletely penetrant, interact with other genes, and are necessary but not sufficient for predicting breast, ovarian, and other cancer risks in women [52]. The large-effect QTL BTA19 pleiotropically affects milk yield, protein yield, and productive life in Guernsey cattle (Bos taurus) [43], and branching and pigment genes (BR, PHY, and HYP) have large effects, interact, and pleiotropically affect several genetically correlated seed biomass traits in sunflower (Helianthus annuus) [53]. Despite decades of directional selection, loci with large effects often segregate (have not been fixed) in domesticated plant and animal populations [33, 34, 37, 38, 40, 54, 55]. The fractions of the genetic variances explained by BRCA2, BTA19, BR, PHY, and HYP were not reported in those studies. What fraction of the heritability for breast cancer risk, for example, can be explained by the known mutations in BRCA2? Our study explored the accuracy of methods for estimating that parameter.

Our surveys and others substantiate that the missing and non-missing fractions of the genetic variance are commonly either not estimated or inaccurately estimated in GWAS and other gene finding studies, e.g., the statistical significance of individual marker loci from sequential regression analyses are typically reported without correcting for the effects of other discovered marker loci through multilocus partial regression analyses or Type III ANOVA [17, 19, 22, 34, 56]. Such analyses are necessary for accurately assessing the statistical importance of the underlying gene and gene-gene interaction effects in a multilocus system, e.g., when multiple loci are identified by GWAS (sequential analyses of individual loci), their effects are more accurately estimated by simultaneous analysis using partial regression analysis approaches and even then can be upwardly biased [51]. The estimation problem we studied is intertwined with the broader problem of accurately describing multilocus systems of genes with large effects. We show that the discovered fraction of the genetic variance can be grossly overestimated and that the cause of the problem is a mathematical artifact in the expected values of variance components and their ratios. We revisited the problem of estimating the non-missing and missing fractions of heritability in candidate gene and other complex trait analyses, in part because of the systematic upward bias we discovered, in addition to inconsistencies in the methods commonly applied to the problem. The solutions to the problem presented here are straightforward and primarily applicable to the study of genes with large effects, especially those affecting the accuracy of genomic predictions for disease risk or breeding value [21, 43]. 
The optimum approaches for weighting or correcting for loci with large effects in genomic prediction are not completely clear; however, in artificial selection settings where the favorable alleles for discovered loci are unequivocally known, those alleles can be directly selected via marker-assisted selection (MAS) with genomic selection exerting pressure on unknown loci underlying the additive genetic variance not explained by the segregation of known large effect loci [54, 57–61].

Lande and Thompson [62] proposed the parameter $p = \sigma^2_M/\sigma^2_G$ to estimate the discovered or non-missing fraction of the genetic variance, where $\sigma^2_M$ is the portion of the genetic variance ($\sigma^2_G$) associated with statistically significant markers in linkage disequilibrium (LD) with genes or QTL affecting the trait under study (here QTL refers to a chromosome segment predicted to harbor a gene or genes affecting a quantitative trait). Similarly, marker heritability ($H^2_M = \sigma^2_M/\sigma^2_P$) estimates the non-missing fraction of the phenotypic variance ($\sigma^2_P$) associated with statistically significant markers in LD with causal genes or QTL. Here a distinction needs to be made between $H^2_M$ and genomic heritability, a parameter estimated by summing the effects of a dense genome-wide sample of markers, only some of which are predicted to be in LD with the underlying causal genes or QTL [27, 30, 63]. We are not proposing marker heritability as a replacement or substitute for genomic heritability but as a parameter for parsing out the non-missing fraction of heritability associated with discovered loci, especially loci like BRCA2 and BTA19 [43, 52]. The genetic variance component ($\sigma^2_G$) in these ratios can be estimated from pedigree or family information (as shown in our examples) or genomic information (as reviewed by [30] and [63]). For either, $\sigma^2_M$ is simply the variance explained by marker loci with effects large enough to be statistically detected and important enough to be specifically studied and modeled, perhaps as fixed effects [22, 39, 40, 51, 61]. Despite a direct and logical connection to heritability, estimates of $p$ and $H^2_M$ are seldom reported in complex trait studies, whereas genomic heritability estimates are commonly reported in genomic prediction studies [30, 34, 62].

Here we show that $p$ and $H^2_M$ are often overestimated in complex trait analyses. The problem we discovered is unrelated to selection bias, the phenomenon in which the effects of discovered QTL are inflated by biased sampling from truncated distributions with small sample sizes [64–69], and unrelated to the upward biases known to arise in GWAS [70]. While selection bias is a well known and widely cited problem in complex trait analyses, we describe a previously unreported and cryptic source of bias in estimates of $p$ and $H^2_M$. To identify the source of the bias and explore the problem in greater depth, we compared the accuracy of average marginal variance (AMV) [71, 72] and average semivariance (ASV) [73] methods for estimating $p$ and $H^2_M$. AMV is the acronym applied throughout this paper for the ANOVA and REML variance component estimation methods commonly described in textbooks and implemented in statistical software for the analysis of generalized linear mixed models (GLMMs), e.g., the ‘lme4’ R package and the SAS packages ‘GLM’ and ‘GLIMMIX’ [24, 25, 72, 74–79]. We introduced the average marginal variance terminology here to facilitate comparisons of the differences between AMV and ASV methods for estimating variance component ratios. The ASV methods we applied to the problem are extensions of those described by Piepho [73] for estimating the total variance and coefficient of determination ($R^2$) in GLMM analyses. For the AMV and ASV analyses shown throughout this paper, REML was used to estimate the variance components [56, 72, 75, 79]. The source of the bias was discovered, however, through algebraic analyses of the expected mean squares (EMSs) from ANOVA. We describe that source and approaches for bias-correcting ANOVA or REML estimates of $p$ and $H^2_M$ from the commonly applied AMV methods. We show that ASV methods directly yield unbiased estimates of $p$ and $H^2_M$ that are identical to bias-corrected AMV estimates.
Finally, we discuss the connection of these random effects methods to the fixed effect methods commonly applied in QTL mapping and genome-wide association studies [51, 80, 81].

Results and discussion

Overestimation of the genetic variance explained by markers in linkage disequilibrium with causative genes or QTL

The overestimation problem described here was originally discovered in a reanalysis of data from genetic studies in plants where REML estimates of $H^2_M$ exceeded REML estimates of broad-sense heritability ($H^2$) and REML estimates of $p$ and $H^2_M$ exceeded 1.0, the theoretical upper limit for these parameters (Table 1). We initially suspected that selection bias might be the culprit [68, 69, 82–84] but concluded that selection bias alone could not explain estimates of $p$ greater than 1.0 or estimates of $H^2_M$ greater than $H^2$. Although proof was lacking and the bias was non-obvious, we hypothesized that many estimates in the theoretical range ($0.0 \le p \le 1.0$) must also be upwardly biased. The proof was found through algebraic analyses of the ANOVA estimators of $\sigma^2_G$, $\sigma^2_M$, and $\sigma^2_{G:M}$ for balanced and unbalanced data (S1, S2 and S4 Texts). Although variance components are commonly estimated using REML, as was done in the analyses shown throughout this paper, algebraic analyses of ANOVA expected mean squares (EMSs) identified the source of the bias and yielded explicit algebraic solutions for bias correcting ANOVA and REML estimates of $p$ and $H^2_M$.

Table 1. REML estimates of marker-associated variance ($\sigma^2_M$), the fraction of the genetic variance explained by markers ($p$), and marker heritability ($H^2_M$) from random marker effects analyses and coefficients of determination ($R^2$) from Type II and Type III fixed marker effects analyses for large effect loci identified in cattle, sunflower, and strawberry studies.

https://doi.org/10.1371/journal.pgen.1009762.t001

The source of the bias was identified by expressing the estimator of $p$ as a function of the ANOVA estimators of $\sigma^2_M$ and $\sigma^2_G$ for balanced data and algebraically simplifying the equations. The linear mixed models (LMMs) and ANOVA estimators of the variance components needed to show this are described here. We start with the analysis of a single marker locus in an experiment where entries (e.g., individuals, families, or strains) are replicated, $\sigma^2_G$ can be estimated, and the data for entries and markers are balanced. Extensions for one to three marker loci with unbalanced data are shown in S1, S2 and S3 Texts. Two LMMs are needed for estimating $\sigma^2_G$, $\sigma^2_M$, $p$, and $H^2_M$. Consider a study where $n_G$ entries are phenotyped for a normally distributed quantitative trait using a balanced completely randomized study design with $r_G$ replications/entry, $n_M$ marker genotypes/locus, and $r_M$ replications/marker genotype. The LMM needed for estimating $\sigma^2_G$ (the between-entry variance component) is:

$$y_{jk} = \mu + G_j + \epsilon_{jk} \tag{1}$$

where $y_{jk}$ is the $jk$th phenotypic observation, $\mu$ is the population mean, $G_j$ is the random effect of the $j$th entry, $\epsilon_{jk}$ is the random effect of the $jk$th residual, $G_j \sim N(0, \sigma^2_G)$, $\epsilon_{jk} \sim N(0, \sigma^2_\epsilon)$, $j = 1, 2, \ldots, n_G$, and $k = 1, 2, \ldots, r_G$. Suppose entries are genotyped for a single marker locus (M) in linkage disequilibrium with a gene or QTL affecting the quantitative phenotype ($y_{jk}$). The between-entry source of variation from LMM (1) can be partitioned into marker (M) and entry nested in marker (G : M) sources of variation; the latter is the residual genetic variation among entries not explained by markers in the model. The LMM for estimating $\sigma^2_M$ and $\sigma^2_{G:M}$ is:

$$y_{ijk} = \mu + M_i + G{:}M_{i(j)} + \epsilon_{ijk} \tag{2}$$

where $y_{ijk}$ is the $ijk$th phenotypic observation, $M_i$ is the random effect of the $i$th marker genotype at locus M, $G{:}M_{i(j)}$ is the random effect of the $j$th entry nested in the $i$th marker genotype, $\epsilon_{ijk}$ is the random effect of the $ijk$th residual, $i = 1, 2, 3$, $M_i \sim N(0, \sigma^2_M)$, $G{:}M_{i(j)} \sim N(0, \sigma^2_{G:M})$, and $\epsilon_{ijk} \sim N(0, \sigma^2_\epsilon)$.

The ANOVA estimator of the between-entry variance component ($\sigma^2_G$) from LMM (1) with balanced data is:

$$\hat\sigma^2_G = \frac{MS_G - MS_\epsilon}{r_G} \tag{3}$$

where $MS_G = SS_G/df_G$ is the between-entry mean square, $SS_G$ is the between-entry sum of squares, $df_G = n_G - 1$ is the between-entry degrees of freedom, $MS_\epsilon = SS_\epsilon/df_\epsilon$ is the residual mean square, $SS_\epsilon$ is the residual sum of squares, $df_\epsilon = n_G(r_G - 1)$ is the residual degrees of freedom, $\hat\sigma^2_\epsilon = MS_\epsilon$ is the residual variance component, and $r_G$ is the number of replications per entry [74]. The between-entry variance component has a theoretical genetic interpretation when entries are progeny with genetic relationships known from pedigrees, e.g., monozygotic twins, full-sib families, or recombinant inbred lines [24, 25, 30]. ANOVA estimators of the marker locus M and entry nested in M variance components from LMM (2) with balanced data are:

$$\hat\sigma^2_M = \frac{MS_M - MS_{G:M}}{r_M} \tag{4}$$

and

$$\hat\sigma^2_{G:M} = \frac{MS_{G:M} - MS_\epsilon}{r_G} \tag{5}$$

respectively, where $n_{G:M}$ is the number of entries nested in each marker genotype, $r_M = n_{G:M}\,r_G$, $n_G = n_M\,n_{G:M}$, $MS_{G:M}$ is the entry nested in M mean square, and $MS_M$ is the mean square for marker locus M. The residuals in LMMs (1) and (2) are identical when the data are balanced (both models yield the same $\hat\sigma^2_\epsilon$). Hence, for a single marker locus with balanced data, the ANOVA estimator of $p$ is:

$$\hat p = \frac{\hat\sigma^2_M}{\hat\sigma^2_G} \tag{6}$$

and the ANOVA estimator of broad-sense marker heritability on an entry-mean basis is:

$$\hat H^2_M = \frac{\hat\sigma^2_M}{\hat\sigma^2_{\bar P}} \tag{7}$$

where $\hat\sigma^2_{\bar P} = \hat\sigma^2_G + \hat\sigma^2_\epsilon/r_G$ is the phenotypic variance on an entry-mean basis [25, 76].
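The estimators in (3) to (7) can be traced numerically with a few lines of Python. The sketch below is illustrative only: the function name `anova_components`, the nested-list data layout, and the toy phenotypes are assumptions introduced here, and the analyses in this paper used REML rather than this hand-rolled ANOVA. The toy values were chosen so that the uncorrected estimate of $p$ exceeds 1.

```python
from statistics import mean

def anova_components(data):
    """ANOVA variance component estimates for one marker locus, balanced data.

    data[i][j] is the list of r_G replicate phenotypes for entry j nested in
    marker genotype i (every genotype has the same number of entries and
    every entry has the same number of replicates).
    """
    n_M, n_GM, r_G = len(data), len(data[0]), len(data[0][0])
    n_G, r_M = n_M * n_GM, n_GM * r_G

    entry_means = [[mean(entry) for entry in geno] for geno in data]
    geno_means = [mean(means) for means in entry_means]  # valid because balanced
    grand = mean(geno_means)

    ss_M = r_M * sum((m - grand) ** 2 for m in geno_means)
    ss_GM = r_G * sum((e - geno_means[i]) ** 2
                      for i, means in enumerate(entry_means) for e in means)
    ss_e = sum((y - entry_means[i][j]) ** 2
               for i, geno in enumerate(data)
               for j, entry in enumerate(geno) for y in entry)

    df_M, df_G = n_M - 1, n_G - 1
    df_GM, df_e = df_G - df_M, n_G * (r_G - 1)
    ms_M, ms_GM = ss_M / df_M, ss_GM / df_GM
    ms_G, ms_e = (ss_M + ss_GM) / df_G, ss_e / df_e

    var_G = (ms_G - ms_e) / r_G     # Eq (3), from LMM (1)
    var_M = (ms_M - ms_GM) / r_M    # Eq (4), from LMM (2)
    var_GM = (ms_GM - ms_e) / r_G   # Eq (5), from LMM (2)
    var_P = var_G + ms_e / r_G      # phenotypic variance, entry-mean basis
    return {"var_G": var_G, "var_M": var_M, "var_GM": var_GM,
            "p": var_M / var_G,     # Eq (6)
            "H2_M": var_M / var_P}  # Eq (7)

# Hypothetical phenotypes: 3 marker genotypes x 2 entries x 2 replicates.
toy = [[[10, 11], [13, 14]],
       [[15, 16], [18, 19]],
       [[21, 22], [24, 25]]]
print(anova_components(toy))
```

Negative variance component estimates are not truncated at zero in this sketch; a production analysis would need to handle them.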

The overestimation of $p$ and $H^2_M$ was not obvious from inspection of ANOVA estimators (6) and (7). The source of the bias was discovered by substituting $SS_M + SS_{G:M}$ for $SS_G$ in the ANOVA estimator of $\sigma^2_G$ from (3) and simplifying with $SS_M = df_M\,MS_M$, $SS_{G:M} = df_{G:M}\,MS_{G:M}$, and $df_M + df_{G:M} = df_G$:

$$\hat\sigma^2_G = \frac{1}{r_G}\left[\frac{SS_M + SS_{G:M}}{df_G} - MS_\epsilon\right] = \frac{df_M\,r_M}{df_G\,r_G}\,\hat\sigma^2_M + \hat\sigma^2_{G:M} = k_M\,\hat\sigma^2_M + \hat\sigma^2_{G:M} \tag{8}$$

where the fraction $k_M = df_M\,r_M/(df_G\,r_G)$ is the source of the bias, $0 < k_M < 1$, $r_M$ is the number of replications per marker genotype, $n_{G:M}$ is the number of entries nested in each marker genotype, $SS_M$ is the marker sum of squares, $df_M$ is the marker degrees of freedom, $SS_{G:M}$ is the entry nested in marker sum of squares, and $df_{G:M}$ is the entry nested in marker degrees of freedom. The term $k_M$ in (8) depends on degrees of freedom and $n_{G:M}$ (through $r_M = n_{G:M}\,r_G$) and is hereafter referred to as the $k_M$ bias coefficient, where the subscript M indexes the intralocus and interlocus effects of marker loci.
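The identity behind (8) can be checked numerically. The sketch below assumes balanced data and takes the coefficient to be $k_M = df_M r_M/(df_G r_G)$; the function names and the sums of squares are hypothetical values chosen for illustration, not results from this study.

```python
def k_m(n_M, n_GM, r_G):
    """Bias coefficient k_M = df_M r_M / (df_G r_G) for balanced data."""
    df_M, df_G = n_M - 1, n_M * n_GM - 1
    r_M = n_GM * r_G
    return (df_M * r_M) / (df_G * r_G)

def both_sides_of_eq8(ss_M, ss_GM, ms_e, n_M, n_GM, r_G):
    """Return (var_G from Eq 3, k_M * var_M + var_GM) for given sums of squares."""
    df_M, df_G = n_M - 1, n_M * n_GM - 1
    df_GM = df_G - df_M
    r_M = n_GM * r_G
    var_G = ((ss_M + ss_GM) / df_G - ms_e) / r_G   # Eq (3) with SS_G = SS_M + SS_G:M
    var_M = (ss_M / df_M - ss_GM / df_GM) / r_M    # Eq (4)
    var_GM = (ss_GM / df_GM - ms_e) / r_G          # Eq (5)
    return var_G, k_m(n_M, n_GM, r_G) * var_M + var_GM

# Hypothetical sums of squares: 3 genotypes, 10 entries each, 4 replicates.
lhs, rhs = both_sides_of_eq8(ss_M=500.0, ss_GM=300.0, ms_e=2.0,
                             n_M=3, n_GM=10, r_G=4)
print(k_m(3, 10, 4), lhs, rhs)
```

The two returned values agree for any choice of sums of squares, which is the point of the identity: the coefficient depends only on the design.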

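Eq (8) shows that, whenever $\hat\sigma^2_M > 0$, the sum of ANOVA estimates of $\sigma^2_M$ and $\sigma^2_{G:M}$ from LMM (2) is greater than the ANOVA estimate of $\sigma^2_G$ from LMM (1):

$$\hat\sigma^2_M + \hat\sigma^2_{G:M} > k_M\,\hat\sigma^2_M + \hat\sigma^2_{G:M} = \hat\sigma^2_G \tag{9}$$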

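Although the SS for sources of variation in these LMMs are additive ($SS_M + SS_{G:M} = SS_G$), the mean squares are not ($MS_M + MS_{G:M} \ne MS_G$). Because $k_M < 1$, the sum $\hat\sigma^2_M + \hat\sigma^2_{G:M}$ from LMM (2) overestimates $\hat\sigma^2_G$ by $(1 - k_M)\,\hat\sigma^2_M$. The ANOVA estimators of $p$ and $H^2_M$ from analyses of LMMs (1) and (2) are upwardly biased because $\hat\sigma^2_M$ is multiplied by the fraction $k_M$ in their denominators, and not the numerators:

$$\hat p = \frac{\hat\sigma^2_M}{\hat\sigma^2_G} = \frac{\hat\sigma^2_M}{k_M\,\hat\sigma^2_M + \hat\sigma^2_{G:M}} \tag{10}$$

and

$$\hat H^2_M = \frac{\hat\sigma^2_M}{\hat\sigma^2_{\bar P}} = \frac{\hat\sigma^2_M}{k_M\,\hat\sigma^2_M + \hat\sigma^2_{G:M} + \hat\sigma^2_\epsilon/r_G} \tag{11}$$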

Substituting $\hat\sigma^2_M + \hat\sigma^2_{G:M}$ for $\hat\sigma^2_G$ in the denominators of $\hat p$ and $\hat H^2_M$ decreases but does not eliminate the bias, because $\hat\sigma^2_M$ still enters the numerator without the $k_M$ multiplier that scales it in the genetic variance (S1 Fig). For a single marker with balanced data, we found that:

$$k_M = \frac{df_M\,r_M}{df_G\,r_G} = \frac{df_M\,n_{G:M}}{df_G} \tag{12}$$

and (13) where $0 < k_M < 1$. Hence, the bias is caused by the $k_M$ multiplier in the expected values of the ANOVA estimators of $p$ and $H^2_M$. As shown later, simulation analyses confirmed that (9) and (13) accurately predict the upward bias caused by $k_M$. Moreover, we concluded that the bias could be corrected by multiplying ANOVA or REML estimates of $\sigma^2_M$ by $k_M$ in the numerators of $p$ and $H^2_M$ estimates.
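The correction described above amounts to a single multiplication. The sketch below applies it to hypothetical variance component estimates, not values from Table 1; the function name `corrected_ratios` and its arguments are assumptions made for illustration, and the variance components could equally come from ANOVA or REML.

```python
def corrected_ratios(var_M, var_G, var_e, r_G, k_M):
    """Uncorrected (AMV) ratios and their k_M-corrected counterparts, single locus."""
    var_P = var_G + var_e / r_G   # phenotypic variance on an entry-mean basis
    p_amv = var_M / var_G
    h2m_amv = var_M / var_P
    return {"p_amv": p_amv, "p_corrected": k_M * p_amv,
            "H2M_amv": h2m_amv, "H2M_corrected": k_M * h2m_amv}

# Hypothetical estimates for a large-effect locus: 3 genotypes, 10 entries each.
out = corrected_ratios(var_M=6.0, var_G=5.5, var_e=2.0, r_G=4, k_M=20 / 29)
for name, value in out.items():
    print(f"{name}: {value:.3f}")
```

In this made-up case the uncorrected estimate of $p$ lies above 1.0 and the corrected estimate lies below it, which mirrors how the bias was first noticed.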

Genetic models with unbalanced genotypic data

We started with the special case of balanced data, which seldom arises in practice, but develop results here for the general case of unbalanced data. Following the same approach as that shown above for a single locus with balanced data, we found $k_M$ coefficients for bias-correcting ANOVA and REML estimates of $p$ and $H^2_M$ for analyses of one to three marker loci with unbalanced genotypic data (S1, S2 and S3 Texts). For a single marker locus with unbalanced genotypic data, we found:

$$k_M = \frac{1}{df_G}\left(n_G - \frac{1}{n_G}\sum_{h=1}^{n_M} n_{G:M_h}^2\right) \tag{14}$$

where $n_G$ is the number of entries, $df_G$ are the degrees of freedom for entries, and $n_{G:M_h}$ is the number of entries nested in the $h$th marker genotype (S1 Text). With balanced genotypic data, $n_{G:M_h} = n_{G:M}$ for every $h$, the term in parentheses becomes $n_G - n_{G:M} = df_M\,n_{G:M}$, and (14) simplifies to (12) for a single marker locus.
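A short sketch of the unbalanced coefficient follows. It assumes that (14) takes the form $(n_G - \sum_h n_{G:M_h}^2/n_G)/df_G$, which reduces to the balanced coefficient when all marker genotype classes are the same size; the function name and the genotype counts are hypothetical.

```python
def k_m_unbalanced(counts):
    """k_M for one marker locus given the number of entries per marker genotype."""
    n_G = sum(counts)
    df_G = n_G - 1
    return (n_G - sum(n * n for n in counts) / n_G) / df_G

balanced = k_m_unbalanced([2, 2, 2])           # equals df_M n_G:M / df_G = 2 * 2 / 5
f2_like = k_m_unbalanced([25, 50, 25])         # 1 AA : 2 Aa : 1 aa
near_balanced = k_m_unbalanced([34, 33, 33])   # same n_G, nearly equal classes
print(balanced, f2_like, near_balanced)
```

For a fixed number of entries, the coefficient is smaller (and the bias larger) under the 1:2:1 ratio than under nearly equal class sizes.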

The $k_M$ coefficients become slightly more complicated as the number of marker loci increases but nevertheless follow a predictable algebraic pattern, e.g., for a two locus genetic model, see equations (S10)-(S12) in S2 Text. Similarly, for a three locus genetic model, see equations (S19)-(S25) in S3 Text. $k_M$ is greater ($k_M$ bias is proportionally smaller) for interaction than main effects, e.g., for two marker loci, $k_{M_1} < k_{M_1M_2} < 1$ and $k_{M_2} < k_{M_1M_2} < 1$, where $k_{M_1}$ is the coefficient for M1, $k_{M_2}$ is the coefficient for M2, and $k_{M_1M_2}$ is the coefficient for the M1 × M2 epistatic interaction (S2 Text). $k_M$ for the two-locus interaction ($k_{M_1M_2}$) is larger than $k_M$ for the individual marker loci ($k_{M_1}$ and $k_{M_2}$) because the denominator ($df_G\,r_G$) is constant, whereas the numerators increase and approach the denominator as the degrees of freedom for marker effects increase. Therefore, the upward bias is proportionally smaller for the M1 × M2 variance component than the M1 or M2 variance components for a two locus genetic model. Similarly, for a three locus genetic model, the upward bias is proportionally smaller for the M1 × M2 × M3 interaction variance component than the two-way interaction variance components (M1 × M2, M1 × M3, and M2 × M3). These results naturally extend to genetic models with more than three loci. Algebraic results are only shown for three marker loci because we found that the $k_M$ bias problem can be directly solved using average semivariance estimation methods when analyzing more complex genetic models (see below). Although certainly not limited to three marker loci, the methods described herein are primarily designed to study the effects of one to a few genes with large effects, e.g., BRCA2 [52], BTA19 [43], and the examples shown in Tables 1 and 2, and not to replace GWAS or QTL mapping.
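The ordering of the two-locus coefficients can be illustrated with a small sketch. It rests on an assumption that is not taken from S2 Text: that each coefficient has the same form as the single-locus case, with groups defined by the levels of the factor (M1 genotypes, M2 genotypes, or M1 × M2 genotype combinations). The design, the genotype labels, and the function name are hypothetical.

```python
from collections import Counter
from itertools import product

def k_coefficient(labels):
    """(n_G - sum_h n_h^2 / n_G) / df_G for the groups defined by `labels`."""
    n_G = len(labels)
    counts = Counter(labels).values()
    return (n_G - sum(n * n for n in counts) / n_G) / (n_G - 1)

# Hypothetical balanced design: 3 x 3 two-locus genotypes, 4 entries per cell.
entries = [(m1, m2) for m1, m2 in product("ABC", "DEF") for _ in range(4)]

k_m1 = k_coefficient([m1 for m1, _ in entries])   # main effect of M1
k_m2 = k_coefficient([m2 for _, m2 in entries])   # main effect of M2
k_m1m2 = k_coefficient(entries)                   # M1 x M2 combinations
print(k_m1, k_m2, k_m1m2)
```

Under this assumption the interaction coefficient is closer to 1 than either main-effect coefficient, consistent with the ordering stated above.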

Table 2. Type I, II, and III sums of squares for fixed effect analyses of markers associated with QTL identified in GWAS and QTL mapping experiments in cattle and sunflower.

https://doi.org/10.1371/journal.pgen.1009762.t002

Study designs without replications or repeated measures of individuals or families

LMMs (1) and (2) arise in study designs where entries (individuals, families, or strains) are replicated, e.g., in studies with domesticated plants, biological replicates of half-sib or full-sib families, doubled haploid or recombinant inbred lines, or testcross hybrids are commonly phenotyped [24, 25, 31, 76, 87, 88] (see the sunflower example in Table 1). These same LMMs apply to study designs for monozygotic twins in humans and other mammals and clonally replicated individuals in asexually propagated plants, e.g., cassava (Manihot esculenta), strawberry (Fragaria × ananassa), and apple (Malus × domestica) (see the strawberry examples in Table 1). The extension of the proposed $k_M$ bias correction solutions to LMMs with repeated measures is straightforward and should have applications in studies where large effect loci are important determinants of the genetic variation underlying quantitative traits in both replicable and unreplicable organisms or populations [88–94].

When entries are unreplicated, the random error or residual source of variation in LMM (2) disappears (G : M becomes the residual) and $\sigma^2_G$, $\sigma^2_{G:M}$, and $p$ cannot be estimated; however, the marker heritability $H^2_M = \sigma^2_M/\sigma^2_P$ can be estimated using the phenotypic variance among unreplicated individuals ($\sigma^2_P$). As before, this variance component ratio is upwardly biased by the factor $k_M$ (see the cattle example in Table 1). Without the insights gained from the algebra shown in equations (10), (S3), (S9), and (S18), and S1, S2 and S3 Texts, the bias would not be obvious unless one or more estimates of marker heritability exceeded 1, which only happens when the loci under study have very large effects. That was how we originally discovered the bias problem (Table 1). The bias is systematic and ubiquitous but not immediately obvious when estimates fall within the expected range ($0 \le \hat H^2_M \le 1$). The same bias correction solutions we proposed for study designs with replications of entries can be applied in study designs where entries are unreplicated. When unreplicated entries are genotyped with a dense genome-wide set of markers, $\sigma^2_G$ can be estimated using a genomic or pedigree relationship matrix [92, 95–97], which yields an estimate of $p$.

Average semivariance estimation directly solves the bias problem

The AMV methods proposed above for bias correcting ANOVA or REML estimates of $p$ and $H^2_M$ are straightforward to apply in practice because they are the methods widely described in textbooks and implemented in popular statistical software packages, e.g., the R package ‘lme4’ and SAS package ‘GLIMMIX’ [78, 98]. Here we show that the bias problem can be directly solved by applying average semivariance (ASV) estimation methods [73]. As before, we start by showing results for a single marker locus with balanced genotypic data. AMV notation and estimators are reformulated in matrix notation here to build the foundation for describing ASV notation and estimators. The input for both is the set of adjusted entry-level means ($\bar y_{ij}$) from LMM (1), stored in an $n_G$-element vector. These are the best linear unbiased estimates (BLUEs) for entries [73, 99]. The LMM equivalent to (2) for the entry-level means analysis of the effect of a single marker locus (M) is:

$$\bar y_{ij} = \mu + M_i + G{:}M_{i(j)} + \bar\epsilon_{ij} \tag{15}$$

where $\bar y_{ij}$ is the phenotypic mean for the $ij$th entry, $\mu$ is the population mean, $M_i$ is the random effect of the $i$th marker genotype, $M_i \sim N(0, \sigma^2_M)$, $G{:}M_{i(j)}$ is the random effect of entries nested in M, $G{:}M_{i(j)} \sim N(0, \sigma^2_{G:M})$, $\bar\epsilon_{ij}$ is the residual error, and the vector of residual errors is distributed $N(0, R)$. The residual variance-covariance matrix ($R$) is estimated in the first stage of a two-stage analysis [99–101]. The between-entry variance can be partitioned into $\sigma^2_M$ and $\sigma^2_{G:M}$ with individual variance-covariance matrices $G_c$ defined by the genetic model, e.g., different main and interaction effects among marker loci.

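The AMV estimator of the phenotypic (total) variance among observations for LMM (15) is:

$$\hat\theta^{AMV}_{\bar y} = n_G^{-1}\operatorname{tr}(V) = n_G^{-1}\sum_c \operatorname{tr}(Z_c G_c Z_c^\top) + n_G^{-1}\operatorname{tr}(R) = \sum_c \hat\theta^{AMV}_c + \hat\theta^{AMV}_\epsilon \tag{16}$$

where $V$ is the variance-covariance matrix of the phenotypic observations, $n_G$ is the number of entries, $\operatorname{tr}(V)$ is the trace of $V$, $\hat\theta^{AMV}_c$ is the marginal variance explained by the $c$th genetic factor in the model (e.g., M and G : M), $Z_c$ are design matrices for the $c$ genetic factors, $\hat\theta^{AMV}_\epsilon$ is the AMV estimator of the residual variance, and $R$ is the residual variance-covariance matrix. The AMV estimator of the genetic variance among entries (G) is:

$$\hat\theta^{AMV}_G = n_G^{-1}\operatorname{tr}(\hat\sigma^2_G I_{n_G}) = \hat\sigma^2_G \tag{17}$$

where $I_{n_G}$ is an $n_G \times n_G$ identity matrix. From LMM (15), the AMV estimator of the variance associated with a single marker locus with balanced data is:

$$\hat\theta^{AMV}_M = n_G^{-1}\operatorname{tr}(Z_M G_M Z_M^\top) = \hat\sigma^2_M \tag{18}$$

where $Z_M = I_{n_M} \otimes 1_{n_{G:M}}$, $G_M = \hat\sigma^2_M I_{n_M}$, $I_{n_M}$ is an $n_M \times n_M$ identity matrix, $1_{n_{G:M}}$ is an $n_{G:M}$-element unit vector, and $u_M$ is a vector of random effects for M with variance-covariance matrix $G_M$. The AMV estimator of the variance associated with the residual genetic variation among entries nested in M is:

$$\hat\theta^{AMV}_{G:M} = n_G^{-1}\operatorname{tr}(Z_{G:M} G_{G:M} Z_{G:M}^\top) = \hat\sigma^2_{G:M} \tag{19}$$

where $u_{G:M}$ is a vector of random entry nested in M effects with variance-covariance matrix $G_{G:M} = \hat\sigma^2_{G:M} I_{n_G}$, $Z_{G:M} = I_{n_G}$, and $I_{n_G}$ is an $n_G \times n_G$ identity matrix. Hence, the AMV estimators of $\sigma^2_M$ and $\sigma^2_{G:M}$ are identical to ANOVA estimators (4) and (5), respectively, with entry means as input for the former and original observations as input for the latter.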

ASV, or the average variance of differences among observations, leads to a definition of the total variance that provides a natural way to account for the heterogeneity of variance and covariance among observations [73, 102]. ASV can be defined for any variance-covariance structure in a generalized LMM and allows for missing and unbalanced data [73]. The ASV estimator of total variance is half the average variance of pairwise differences among entries and can be partitioned into independent sources of variance, e.g., genetic and non-genetic or residual:

$$\hat\theta^{ASV}_{\bar y} = (n_G - 1)^{-1}\operatorname{tr}(VP) = (n_G - 1)^{-1}\sum_c \operatorname{tr}(Z_c G_c Z_c^\top P) + (n_G - 1)^{-1}\operatorname{tr}(RP) = \sum_c \hat\theta^{ASV}_c + \hat\theta^{ASV}_\epsilon \tag{20}$$

where $P = I_{n_G} - n_G^{-1} J_{n_G}$ is the idempotent matrix used for column-wise mean-centering, $I_{n_G}$ is an $n_G \times n_G$ identity matrix, and $J_{n_G}$ is an $n_G \times n_G$ unit matrix [73]. $\operatorname{tr}(VP)$ accounts for both the variances and the covariances of the phenotypic observations. From (20), $\hat\theta^{ASV}_c$ is the variance explained by the $c$th genetic factor ($u_c$), where $c$ indexes genetic factors, the genetic factors are marker locus effects and entries nested in marker locus effects, and $\hat\theta^{ASV}_\epsilon$ is the residual variance. The variance explained by the $c$th genetic factor is $(n_G - 1)^{-1}\operatorname{tr}(Z_c G_c Z_c^\top P)$, e.g., for a single marker locus M, $\hat\theta^{ASV}_M = (n_G - 1)^{-1}\operatorname{tr}(Z_M G_M Z_M^\top P)$. The ASV estimators of the variance components and the biases of these ASV estimators are defined in S4 Text.
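The verbal definition of ASV (half the average variance of pairwise differences) and its trace form can be compared directly. The sketch below assumes the trace form $(n - 1)^{-1}\operatorname{tr}(VP)$ with $P = I - J/n$, following the definition in [73]; the covariance matrix is hypothetical and the function names are introduced here for illustration.

```python
from itertools import combinations

def asv_pairwise(V):
    """Half the average variance of all pairwise differences y_i - y_j."""
    n = len(V)
    pairs = list(combinations(range(n), 2))
    half_var_diff = [0.5 * (V[i][i] + V[j][j] - 2 * V[i][j]) for i, j in pairs]
    return sum(half_var_diff) / len(pairs)

def asv_trace(V):
    """(n - 1)^-1 tr(V P), using tr(V P) = tr(V) - sum(V) / n for P = I - J / n."""
    n = len(V)
    trace = sum(V[i][i] for i in range(n))
    total = sum(map(sum, V))
    return (trace - total / n) / (n - 1)

# Hypothetical variance-covariance matrix for 4 entries in 2 marker classes,
# with unequal variances on the diagonal.
V = [[3.0, 2.0, 0.0, 0.0],
     [2.0, 3.0, 0.0, 0.0],
     [0.0, 0.0, 3.5, 2.0],
     [0.0, 0.0, 2.0, 4.0]]
print(asv_pairwise(V), asv_trace(V))
```

The two functions return the same value for any symmetric matrix, so the trace form can be used without enumerating pairs.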

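The ASV estimator of the genetic variance among entries (G) is:

$$\hat\theta^{ASV}_G = (n_G - 1)^{-1}\operatorname{tr}(\hat\sigma^2_G I_{n_G} P) = \hat\sigma^2_G \tag{21}$$

where $I_{n_G}$ is an $n_G \times n_G$ identity matrix and $P = I_{n_G} - n_G^{-1} J_{n_G}$ is the mean-centering matrix, for which $\operatorname{tr}(P) = n_G - 1$. Hence, from Eqs (8), (17) and (21), AMV and ASV estimators of the between-entry variance component ($\sigma^2_G$) are equivalent ($\hat\theta^{AMV}_G = \hat\theta^{ASV}_G = \hat\sigma^2_G$). The ASV estimator of the variance associated with M is:

$$\hat\theta^{ASV}_M = (n_G - 1)^{-1}\operatorname{tr}(Z_M G_M Z_M^\top P) = \frac{df_M\,n_{G:M}}{df_G}\,\hat\sigma^2_M = k_M\,\hat\sigma^2_M \tag{22}$$

where $k_M = df_M\,n_{G:M}/df_G$ is the bias correction coefficient, $n_{G:M} = n_G/n_M$, $df_G = n_G - 1$, $df_M = n_M - 1$, and $df_{G:M} = df_G - df_M$. This definition of the $k_M$-bias coefficient is identical to the earlier definition with $r_G$ factored out (see Eq 12). Eq (22) shows that the ASV estimator of $\sigma^2_M$ is corrected by the fraction $k_M$, which correctly scales the estimate of $\sigma^2_M$ to the genetic variance and yields unbiased estimates of $p$ and $H^2_M$. From Eqs (9) and (22), we found that the AMV estimate of the marker variance exceeds the ASV estimate, with the two differing by the factor $k_M$. The ASV estimator of the variance associated with G : M is:

$$\hat\theta^{ASV}_{G:M} = (n_G - 1)^{-1}\operatorname{tr}(\hat\sigma^2_{G:M} I_{n_G} P) = \hat\sigma^2_{G:M} \tag{23}$$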
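The equality between the ASV scaling of the marker variance and the $k_M$ coefficient can be checked by building the marker incidence structure explicitly. The sketch below assumes the ASV marker coefficient is $(n_G - 1)^{-1}\operatorname{tr}(Z_M Z_M^\top P)$ with $P = I - J/n_G$; the design sizes and the function name are hypothetical.

```python
def asv_marker_coefficient(counts):
    """(n_G - 1)^-1 tr(Z_M Z_M' P) for marker genotype class sizes `counts`."""
    n_G = sum(counts)
    # Genotype index of every entry, e.g. [0, 0, 1, 1, 2, 2].
    genotype = [h for h, n in enumerate(counts) for _ in range(n)]
    # Z_M Z_M' has a 1 wherever two entries share a marker genotype.
    ZZt = [[1.0 if gi == gj else 0.0 for gj in genotype] for gi in genotype]
    trace = sum(ZZt[i][i] for i in range(n_G))
    total = sum(map(sum, ZZt))
    # tr(A P) = tr(A) - sum(A) / n_G when P = I - J / n_G
    return (trace - total / n_G) / (n_G - 1)

n_M, n_GM = 3, 10                              # hypothetical balanced design
k_M = (n_M - 1) * n_GM / (n_M * n_GM - 1)      # df_M n_G:M / df_G
asv_coef = asv_marker_coefficient([n_GM] * n_M)
print(k_M, asv_coef)
```

Because the function only needs class sizes, the same call also covers unbalanced genotype counts.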

The ASV estimator of p for a single marker locus (M) is: (24)

Similarly, the ASV estimator of for a single marker locus is: (25) where is the phenotypic variance on an entry-mean basis [25]. From these results, we found that: (26) and showed that ASV estimators of p and are unbiased (automatically corrected for kM).

Computer simulations confirmed that ASV-REML estimates of p and marker heritability are unbiased

Computer simulations confirmed that AMV-REML estimates of p (6) and marker heritability (7) are upwardly biased by the factor kM and that ASV-REML estimates of these parameters from (24) and (25) are unbiased (Figs 1 and 2). The means of AMV-REML estimates of p and marker heritability from 21 different simulation study designs (S1 Table) were identical to those predicted by the kM coefficients shown in S1, S2 and S3 Texts. Several insights arose from the simulation analyses. First, the bias caused by kM increased as the true marker heritability increased but was proportionally constant across its range (Fig 1). These results show that the overestimation of p and marker heritability is greatest for genes and gene-gene interactions with large effects (Fig 1). Their effects could be inflated by selection bias over and above kM bias [67, 68, 82, 83, 103]; hence, we concluded that kM-bias and selection bias could operate in combination to inflate estimates of the contribution of a locus to the heritable variation in a population (S1, S2 and S3 Texts). Moreover, because the bias increases as the effect of the locus increases, we concluded that the overestimation problem is worst for large-effect QTL (Fig 1). Second, kM bias was greater for unbalanced than balanced data (Fig 1D and 1E). The effect of unbalanced data was more extreme for the F2 simulation (Fig 1D), where the expected genotypic ratio was 1 AA : 2 Aa : 1 aa, than for simulations where 10 or 33% of the observations were randomly missing for markers with roughly equal numbers of replicates/marker genotype (Fig 1E and 1F). Third, the F2 and missing data simulations further showed that the precision of estimates of these parameters decreased as the genotypic data imbalance increased. Even though bias-corrected AMV and ASV estimates of these parameters are unbiased, the sampling variances among the simulated F2 samples were larger than observed for the 10 and 33% missing data samples and yielded a small percentage of estimates slightly greater than 1.0 (Fig 1D). For the other simulation study designs (Fig 1), none of the ASV estimates exceeded 1.0. The sampling variances of estimates of p and marker heritability can be estimated using data resampling methods, e.g., bootstrapping [104], or the estimators we developed using the delta method (S5 Text) [25, 105, 106]. Equations (S44) and (S45) in S5 Text show that ASV estimates are more precise than AMV estimates by a factor of . These predictions perfectly aligned with the empirical bootstrap estimates. Fourth, the relative biases were not affected by the number of replications of entries or the number of entries, although the precision of estimates increased as nG and increased (Fig 2). Predictably, the number of entries (nG) dramatically affected the precision of estimates of marker heritability (Fig 2C and 2D). The relative biases were not affected by rG or the true marker heritability; however, the sampling variances were strongly affected by the true marker heritability and decreased as it increased (Fig 2E and 2F and S2 Fig).
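For orientation, the generic first-order delta-method approximation for the variance of a ratio of two estimators can be sketched as follows. The symbols a and b are placeholders for a numerator and denominator variance component and c is a known constant; the specific expressions in S5 Text are not reproduced here.

```latex
\operatorname{Var}\!\left(\frac{\hat a}{\hat b}\right) \approx
\left(\frac{a}{b}\right)^{2}
\left[
\frac{\operatorname{Var}(\hat a)}{a^{2}}
+ \frac{\operatorname{Var}(\hat b)}{b^{2}}
- \frac{2\,\operatorname{Cov}(\hat a,\hat b)}{a\,b}
\right],
\qquad
\operatorname{Var}\!\left(c\,\frac{\hat a}{\hat b}\right)
= c^{2}\,\operatorname{Var}\!\left(\frac{\hat a}{\hat b}\right).
```

The second identity is the general reason that rescaling a ratio estimate by a constant smaller than one also shrinks its sampling variance.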

Fig 1. Accuracy of AMV and ASV estimators of marker heritability.

AMV and ASV estimates of marker heritability are shown for 1,000 segregating populations simulated for different numbers of entries (nG individuals, families, or strains), five replications/entry (rG = 5), true marker heritability ranging from 0 to 1, and one to three marker loci with three genotypes/marker locus (nM1 = 3). AMV estimates of marker heritability (red highlighted observations) and ASV estimates of marker heritability (blue highlighted observations) are shown for: (A) one locus with balanced data for nG = 540 entries (study design 1); (B) two marker loci with interaction (M1, M2, and M1 × M2) and balanced data for nG = 540 (study design 2); (C) three marker loci with interactions (M1, M2, M3, M1 × M2, M1 × M3, M2 × M3, and M1 × M2 × M3) and balanced data for nG = 540 (study design 3); (D) an F2 population segregating 1:2:1 for one marker locus with rG:M = 135 entries for both homozygotes and rG:M = 270 heterozygous entries, and nG = 540 (study design 4); (E) one locus with 10% randomly missing data among 540 entries (study design 5); and (F) one locus with 33% randomly missing data among 540 entries (study design 6). Study design details are shown in S1 Table.

https://doi.org/10.1371/journal.pgen.1009762.g001

Fig 2. Effect of rG, nG, and true marker heritability on the relative bias of AMV and ASV estimators of marker heritability.

(A and B) Phenotypic observations were simulated for 1,000 populations segregating for a single marker locus with three genotypes (nM = 3), nG = 900 progeny, and rG = 1, 2, 5, 10, or 20 (study designs 7–11). The marker locus was assumed to be in complete linkage disequilibrium with a single QTL that explains 50% of the phenotypic variance. (A) Distribution of the relative biases of AMV estimates of marker heritability for different rG. The relative bias was identical for different rG. (B) Distribution of the relative biases of ASV estimates of marker heritability for different rG. The relative bias was identical for different rG. (C and D) Phenotypic observations were simulated for 1,000 populations segregating for a single marker locus with three genotypes (nM = 3), five replications/entry (rG = 5), and nG = 450, 900, 1,800, 3,600, or 7,200 entries/population (study designs 12–16). The marker locus was assumed to be in complete linkage disequilibrium with a single QTL that explains 50% of the phenotypic variance. (C) Distribution of the relative biases of AMV estimates of marker heritability for different nG. The relative bias was identical across the variables tested. (D) Distribution of the relative biases of ASV estimates of marker heritability for different nG. The relative bias was identical across the variables tested. (E and F) Phenotypic observations were simulated for 1,000 populations segregating for a single marker locus with three genotypes (nM = 3), five replications/entry (rG = 5), and nG = 450 entries/population. The marker locus was assumed to be in complete linkage disequilibrium with a single QTL that explains 5–95% of the phenotypic variance (study designs 17–21). (E) Distribution of the relative biases of AMV estimates of marker heritability for different true marker heritabilities. The relative bias was identical across the variables tested. (F) Distribution of the relative biases of ASV estimates of marker heritability for different true marker heritabilities. The relative bias was identical across the variables tested.

https://doi.org/10.1371/journal.pgen.1009762.g002

GWAS example: A single marker locus with highly unbalanced genotypic data

The bias-correction methods described above are illustrated here for highly unbalanced genotypic data from a GWAS experiment. Variance components were estimated for two SNP markers (AX493 and AX396) in LD with a gene (FW1) conferring resistance to Fusarium wilt in a strawberry (Fragaria × ananassa) GWAS population (nG = 564) genotyped with a genome-wide framework of SNP markers [86]. Both SNP markers had highly significant GWAS effects, with association p-values of 6.61 × 10^−31 for AX493 and 2.95 × 10^−222 for AX396. Genotype frequencies were highly unbalanced for both markers, with a scarcity of AA homozygotes (2.8%) for AX396 (16 AA : 177 Aa : 371 aa) and a 1 : 2 : 1 ratio for AX493 (141 AA : 282 Aa : 141 aa). For both loci, the minor allele frequency was >0.05. The kM for these data (kAX493 = 0.62 and kAX396 = 0.47) were calculated as shown in S1 Text. The AMV-REML estimate of marker heritability for AX396 exceeded 1.0, a telltale sign of kM-bias (Table 1). AMV-REML estimates of and for both SNP markers were double or nearly double their bias-corrected ASV-REML estimates (Table 1). The bias-corrected estimate of marker heritability for AX396 was 0.62, versus 1.33 for the uncorrected estimate. Even with bias-correction, the sum of ASV-REML estimates of and for AX493 was slightly greater than the ASV-REML estimate of . This result was consistent with findings for highly unbalanced marker genotypic data in our simulation studies, where a certain fraction of bias-corrected estimates exceeded the theoretical limit for heritability because of decreased precision (Fig 1). The kM-bias problem would not necessarily have been detected in the analysis of AX493 because the estimates of p and marker heritability fell within the expected range (Table 1). Although both SNP markers were closely associated with FW1, they accounted for dramatically different fractions of genetic variance because of historic recombination and because neither is a causal DNA variant or in complete LD with causal DNA variants [17, 19, 86, 107].
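The reported coefficients can be approximated directly from the genotype counts. The sketch below assumes the coefficient takes the form k = (n − Σ n_j²/n)/(n − 1), which follows from applying the average semivariance to a single marker factor with n_j entries per genotype. Whether this is exactly the expression used in S1 Text is an assumption, and the function name is hypothetical. The uncorrected heritability of 1.33 for AX396 is taken from the text above.

```python
def k_from_counts(counts):
    """Bias coefficient for one marker from entries-per-genotype counts.

    Assumed form: k = (n - sum(n_j^2) / n) / (n - 1), with n = sum(n_j).
    For balanced data this reduces to (n_M - 1) * n_GM / (n_G - 1).
    """
    n = sum(counts)
    return (n - sum(c * c for c in counts) / n) / (n - 1)


# Genotype counts (AA, Aa, aa) reported for the strawberry example.
k_ax493 = k_from_counts([141, 282, 141])   # about 0.626
k_ax396 = k_from_counts([16, 177, 371])    # about 0.469

# Rescale the uncorrected (AMV) marker heritability reported for AX396.
uncorrected_ax396 = 1.33
corrected_ax396 = k_ax396 * uncorrected_ax396   # about 0.62

# Balanced reference case: 540 entries, three genotypes with 180 entries each.
k_balanced = k_from_counts([180, 180, 180])     # equals 2 * 180 / 539
```

The computed coefficients are close to the reported values (0.62 and 0.47), and the rescaled AX396 estimate is close to the reported bias-corrected value of 0.62. Small discrepancies may reflect rounding or differences from the formula in S1 Text.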

QTL mapping example: Three marker loci with slightly unbalanced genotypic data

Statistics are shown here for an analysis of three marker loci (BR, PHY, and HYP) affecting seed oil content in a sunflower (Helianthus annuus) RIL population using LMM (27) [53]. The genotypic data were only slightly unbalanced and the three marker loci were identified by QTL mapping. The kM needed for bias-correcting AMV-REML estimates of p and marker heritability are shown in S3 Text (Table 1). The AMV-REML estimates of p and marker heritability were nearly double the bias-corrected ASV-REML estimates, e.g., the AMV-REML estimate of marker heritability for the three-locus genetic model (0.79) was nearly two-fold greater than the ASV-REML estimate (0.41) (Table 1). Similarly, the AMV-REML estimate of p for the BR locus (0.54) was slightly more than double the bias-corrected (ASV-REML) estimate (0.26). Hence, the uncorrected REML estimates of p and marker heritability grossly inflated the predicted contributions of the three marker loci to genetic variation for seed oil content (Table 1).

GWAS example: Three marker loci with unbalanced genotypic data and unreplicated entries

The application of bias-correction is illustrated here for a genetic model with three marker loci, highly unbalanced genotypic data, and a single phenotypic observation per individual; p could not be estimated for this example because individuals were unreplicated. Variance components were estimated for three SNP markers (rs10, rs45, and rs20) on chromosomes 2, 6, and 22, respectively, affecting white spotting (%) in a Holstein–Friesian cattle (Bos taurus) population (nG = 2,973) [85]. These SNP markers had the largest effects among those predicted to be in LD with genes affecting white spotting. The genotypic frequencies were 50 AA : 586 Aa : 2,337 aa for rs10, 78 AA : 736 Aa : 2,159 aa for rs45, and 237 AA : 976 Aa : 1,760 aa for rs20. The kM for these data (krs10 = 0.35, krs45 = 0.41, and krs20 = 0.54) were calculated as shown in S3 Text. The uncorrected AMV-REML estimate of marker heritability for the three-locus genetic model (0.76) was substantially greater than the bias-corrected ASV-REML estimate (0.37) (Table 1). Similar differences were observed for the three marker loci.

Candidate gene analysis: Fixed or random, BLUE or BLUP?

Our study was partly motivated by inconsistencies in the statistical approaches applied in candidate gene and other complex trait analyses when testing hypotheses and fitting genetic models for multiple large-effect loci. With the high densities of genome-wide markers commonly assayed in gene finding studies, investigators often identify markers tightly linked to candidate or known causal genes, as exemplified by diverse real-world examples [17, 19, 33, 34, 37, 38, 40, 42, 43, 52, 54, 108]. The candidate marker loci are nearly always initially identified by genome-wide searches using sequential (marker-by-marker) approaches [56, 72, 75, 79, 109, 110]. Complicated and often misunderstood problems arise in the estimation and interpretation of statistics from sequential fixed effect analyses when the data are unbalanced [79, 111, 112]. Most importantly, there are multiple model fitting and analysis options (Type I, II, and III ANOVA), and the reduction in error sums of squares (SSE), test statistics, and parameter estimates differ among them, a problem that disappears when the data are balanced or when single large-effect loci are discovered [79, 111–113]. Our review of the literature uncovered substantial variation and inconsistencies in the statistical approaches applied to the problem of fitting multilocus genetic models, testing multilocus genetic hypotheses, and calculating best linear unbiased estimates (BLUEs) from a fixed effects analysis of marker loci.

The problems that arise in fixed effect analyses of unbalanced data profoundly affect parameter estimates and statistical inferences but have not been universally recognized or addressed in complex trait analyses [79, 112]. We reanalyzed the cattle and sunflower examples with markers as fixed effects (Table 2) to show this, illustrate the challenges and nuances of fixed effects analyses of unbalanced data, and facilitate comparisons between random and fixed effects analyses of marker loci [56, 75, 79, 109–112]. Following the discovery of statistically significant marker-trait associations from a marker-by-marker genome-wide scan, the natural progression would be to analyze multilocus genetic models where the effects of the discovered loci are simultaneously corrected for the effects of other discovered loci [79, 112], as shown in our multilocus analysis examples (Tables 1 and 2). This is straightforward when the genotypic data are balanced or nearly balanced (as in the sunflower example) but more complicated and convoluted when the genotypic data are unbalanced (as in the cattle example) [75, 79, 111, 112]. Although methods for fixed effect analyses of factorial treatment designs (multilocus genetic models) with unbalanced data are well known [56, 79, 109, 110, 112], there are several model fitting and parameter estimation variations that can lead to dramatically different parameter estimates and statistical inferences. This is illustrated by the cattle example, where the coefficients of determination (analogous but not identical to marker heritability) from Type I, II, and III analyses were substantially different from each other and from estimates from the random effects analysis (Tables 1 and 2). The differences and ambiguities among the different fixed effects approaches disappear when the random effects approach is applied to the problem.

The analysis of markers as random effects in multilocus analyses of known or candidate genes with large effects with ASV, although historically uncommon, simultaneously yields unbiased estimates of the variance component ratios investigated in the present study (p and ) and best linear unbiased predictors (BLUPs) of the additive and dominance effects of the causative loci identified by marker associations, in addition to solving the often ambiguous problems that arise in fixed effects analyses of unbalanced data [32, 75, 77, 79, 112, 113]. As discussed in depth below and illustrated through a reanalysis of the cattle and sunflower examples (Table 2), the random effects approach we described (ASV with REML estimation of the variance components) yields accurate estimates of the underlying genetic parameters (variance component ratios and BLUPs of marker effects) from a single unambiguous generalized linear mixed model analysis, whereas wildly different parameter estimates can arise among the multitude of fixed effects analyses that investigators might elect to apply in practice when the underlying genotypic and phenotypic data are unbalanced (Tables 1 and 2).

As substantiated by our simulation analyses (Figs 1 and 2), ASV with REML estimation of the underlying variance components yields accurate estimates of p and marker heritability for marker loci and interactions between marker loci, both individually and collectively, and BLUPs of the additive and dominance effects of marker loci [76, 113–115]. When the genotypic data are unbalanced, the order in which marker and marker × marker effects enter the genetic model profoundly affects parameter estimates and statistical inferences in fixed effect analyses [56, 72, 74, 116]. To illustrate this, the main effects of marker loci A, B, and C were estimated for the six possible Type I ANOVA orders of the three loci (ABC, ACB, BAC, BCA, CAB, and CBA) (Table 2). Predictably, the reduction in the error sums of squares for a particular locus differed for each Type I order in the cattle example: the Type I SS ranged from 591.4 to 3,552.3 for rs10, 4,880.5 to 9,504.4 for rs20, and 4,259.6 to 8,384.8 for rs45. The R2, or PVE, estimates for marker loci were radically different among the six Type I ANOVA and Type II and III analyses. The Type I SS were, in addition, significantly greater than the Type III SS for nearly every factor. Although Type III statistics are commonly estimated and reported in analyses of factorial treatment designs with unbalanced data, there are compelling arguments for estimating Type II statistics [109, 110]; nevertheless, as we have argued, the fixed effects approach is unnecessary.

Broadly speaking, the large-effect loci segregating in a population are typically necessary but not sufficient for predicting genetic merit or disease risks but are often important enough to warrant deeper study and, in animal and plant breeding, direct selection via MAS or direct modelling in genomic selection applications [21, 32, 57]. The BLUP (random marker effects) approach we applied was designed to align the study of loci with large and highly predictive effects with the BLUP approaches commonly applied to genomic prediction problems that are agnostic or indifferent to the effects of individual loci, the so-called “black box” of genomic prediction [6, 7, 20, 21, 88, 117–121]. The predictive markers associated with large-effect marker loci can be integrated into the genome-wide framework of marker loci applied in genomic prediction or incorporated as fixed effects when estimating GEBVs or PRSs [21, 54, 57–61]. One of the greatest strengths of the random effects (BLUP) approach is that the genetic parameters can be estimated from a single REML analysis free of the challenges and uncertainty associated with the fixed effects model building process [79, 109, 110, 112]. Finally, if our conclusions are correct, the complex trait analysis literature is riddled with overestimates of the genotypic and phenotypic variances explained by specific genes or QTL.

Materials and methods

Simulation studies

We used computer simulation to estimate the bias and assess the accuracy of uncorrected and bias-corrected REML estimates of p and marker heritability for 21 study designs, with 1,000 replicates per study design (S1 Table and S4 Text). Phenotypic observations (yijk) for LMMs (1) and (2) were simulated for nM = 3 genotypes/marker locus and 21 combinations of study design variables (nG, rG, rM, and H2) with balanced or unbalanced data (S1 Table). The phenotypic observations for each sample were obtained by generating random normal variables for entries, markers, and residuals using the R function rnorm() with known means and variances [122] as described by [123, 124]. The simulated random effects of entries, markers, and replications in LMMs (1) and (2) were summed to obtain n = nG rG phenotypic observations for each study design. Variance components for the random effects in LMMs (1) and (2) were estimated by REML using the lmer() function of the lme4 package v1.1-21 [78] in R v4.0.2 [122] and used to assess the accuracy of AMV and ASV estimators of p and marker heritability. For study designs 1–6, the true marker heritability randomly varied from 0 to 1. Study designs 1–6 demonstrate how different numbers of marker loci (m) and unbalanced data affect estimates of p and marker heritability (Fig 1; S1 Table). For study designs 5 and 6, we randomly deleted 10 and 33% of the phenotypic observations, respectively, to create unbalanced data. For study designs 7–21, the true variances of the independent variables were fixed for all samples, which allowed us to estimate the bias and relative bias associated with the different estimators (the biases are shown in S4 Text). Study designs 7–21 illustrate how rG, nG, and true marker heritability affected the biases and relative biases of estimates of p and marker heritability (Fig 2; S1 Table). We estimated the sample variances of AMV and ASV estimates of p for each study design (S1 Table). Finally, we developed estimators of the sampling variances of estimates of p and marker heritability using the delta method [25, 106], as shown in S5 Text.
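The core intuition behind the kM coefficient can be reproduced with a much smaller simulation than the one described above. The sketch below is not the authors' R code and does not fit a mixed model. It only draws random marker-genotype effects with a known variance and measures how much variance those effects produce among entries. The function names, seed, and replicate count are illustrative assumptions. The entry counts mimic a balanced design (180 : 180 : 180) and an F2-like design (135 : 270 : 135) with nG = 540.

```python
import random


def entry_level_marker_variance(counts, sigma2_m, rng):
    """Draw one normal effect per marker genotype and return the sample
    variance (divisor n - 1) of those effects across all entries."""
    n = sum(counts)
    effects = [rng.gauss(0.0, sigma2_m ** 0.5) for _ in counts]
    mean = sum(c * u for c, u in zip(counts, effects)) / n
    return sum(c * (u - mean) ** 2 for c, u in zip(counts, effects)) / (n - 1)


def simulate_mean_variance(counts, sigma2_m=1.0, reps=20000, seed=1):
    """Average the entry-level marker variance over many simulated populations."""
    rng = random.Random(seed)
    draws = [entry_level_marker_variance(counts, sigma2_m, rng)
             for _ in range(reps)]
    return sum(draws) / reps


def expected_k(counts):
    """Expected ratio of entry-level marker variance to sigma2_m."""
    n = sum(counts)
    return (n - sum(c * c for c in counts) / n) / (n - 1)


balanced = [180, 180, 180]   # three genotypes, equal numbers of entries
f2_like = [135, 270, 135]    # 1 : 2 : 1 segregation

mean_balanced = simulate_mean_variance(balanced)
mean_f2 = simulate_mean_variance(f2_like)
```

With a true marker variance of 1.0, the simulated averages settle near 0.67 (balanced) and 0.63 (F2-like) rather than 1.0. This is the sense in which the marker variance component overstates the variance among entries attributable to the marker unless it is rescaled by kM.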

Estimation examples

To illustrate the application of bias-correction methods and the differences between AMV and bias-corrected AMV estimates of p and marker heritability, we reanalyzed data from a GWAS study in cattle (Bos taurus) [85], a QTL mapping study in oilseed sunflower (Helianthus annuus L.) [53], and a GWAS study of Fusarium wilt resistance in strawberry (Fragaria × ananassa Duchesne ex Rozier) [86]. For the sunflower study, two replications (rG = 2) of nG = 146 recombinant inbred lines (RILs) were phenotyped for seed oil concentration (g/kg) and genotyped for three marker loci (BR, PHY, and HYP) with two homozygous marker genotypes/locus [53]. For the cattle study, unreplicated entries (rG = 1; nG = 2,973) were phenotyped for white spotting (%) and genotyped for three marker loci (rs10, rs45, rs20) with three marker genotypes per locus [85]. LMM (2) expanded to three marker loci with all possible interactions among marker loci is:
$$y_{hijkl} = \mu + BR_h + PHY_i + HYP_j + (BR \times PHY)_{hi} + (BR \times HYP)_{hj} + (PHY \times HYP)_{ij} + (BR \times PHY \times HYP)_{hij} + G{:}(BR \times PHY \times HYP)_{hij(k)} + \epsilon_{hijkl} \quad (27)$$
where BRh is the hth effect of the BR locus, PHYi is the ith effect of the PHY locus, HYPj is the jth effect of the HYP locus, G : (BR × PHY × HYP)hij(k) is the kth effect of entries nested in the hijth BR × PHY × HYP interaction, and ϵhijkl is the hijklth residual effect. The data for RILs were balanced, whereas the data for marker genotypes were slightly unbalanced. Each of the eight BR × PHY × HYP homozygotes was observed in the RIL population; however, the number of entries nested in each marker genotype (nG:M) varied: nG:BR = 81 : 65, nG:PHY = 60 : 86, and nG:HYP = 70 : 76. Variance components for LMMs (1) and (27) were estimated using the REML method in lme4::lmer() [78]. The marker-associated genetic variances for individual marker loci and two- and three-way interactions among marker loci were bias-corrected using the formulas described in S1, S2 and S3 Texts.

For the strawberry study, four replications (rG = 4) of 565 entries (nG = 565) from a genome-wide association study (GWAS) were phenotyped for resistance to Fusarium wilt and genotyped for single nucleotide polymorphism (SNP) markers in LD with FW1, a dominant gene conferring resistance to Fusarium oxysporum f.sp. fragariae, the causal pathogen [86]. The replications were asexually propagated clones of individuals; hence, the expected causal variance among individuals was equal to the total genetic variation in the population, analogous to monozygotic twins [25]. Genetic parameters were estimated for two SNP markers (AX493 and AX396) that were tightly linked to FW1 [86]. The genotypic data for both markers were highly unbalanced. Genotype numbers were 141 AA : 282 Aa : 141 aa for AX493 and 16 AA : 177 Aa : 371 aa for AX396, where A and a are alternate SNP alleles. The variance components were estimated for LMMs (1) and (2) using the REML method implemented in the lmer() function of the R package lme4 [78]. REML estimates of the marker-associated genetic variances for both marker loci were bias-corrected using the approach described in S1 Text.

For the cattle study, we used a model similar to (27) for the analysis. However, because entries are unreplicated in this experiment, we cannot include the entries nested in the three-way marker interaction (G : M) term because it has the same levels as the residual. The LMM for this case study is:
$$y_{hijk} = \mu + rs10_h + rs45_i + rs20_j + (rs10 \times rs45)_{hi} + (rs10 \times rs20)_{hj} + (rs45 \times rs20)_{ij} + (rs10 \times rs45 \times rs20)_{hij} + G{:}(rs10 \times rs45 \times rs20)_{hij(k)} \quad (28)$$
where rs10h is the hth effect of the peak SNP (rs109979909) on chromosome 2, rs45i is the ith effect of the peak SNP on chromosome 6 (rs451683615), rs20j is the jth effect of the peak SNP on chromosome 22 (rs209784468), and G : (rs10×rs45×rs20)hij(k) is the hij(k)th residual effect comprising residual genetic effects G : M and residual error. In this experiment, there were k entries and k observations, and because of this we cannot fit LMM (1) without incorporating pedigree or genomic relatedness. In this single case, we estimate from the log-transformed data to use in the denominator of .

We used the SAS package PROC GLM [77] for Type I and III analyses of the sunflower and cattle data with marker loci as fixed effects. Type I analyses were done for the six possible orders of main effects (ABC, ACB, BAC, BCA, CAB, and CBA) and a single order for marker × marker interactions (A × B, A × C, B × C, and A × B × C), where A, B, and C are the three marker loci (factors). For the ABC order, the reduction SS for the main effects were R(A | μ), R(B |μ, A), and R(C | μ, A, B), where μ is the population mean. Similarly, for the ACB order, the reduction SS for the main effects were R(A | μ), R(C |μ, A), and R(B | μ, A, C), and so on for the other four orders (BAC, BCA, CAB, and CBA). C, A × B, A × C, B × C, A × B × C), the reduction SS for the main effect of B was R(B | B, C, A × B, A × C, B × C, A × B × C). For comparison, the Type III reduction SS for the main effects were R(A | μ, B, C, A × B, A × C, B × C, and A × B × C), R(B | μ, A, C, A × B, A × C, B × C, and A × B × C), and R(C | μ, A, B, A × B, A × C, B × C, and A × B × C).
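The order dependence of Type I (sequential) reduction sums of squares can be demonstrated with a toy example. The sketch below is not the SAS analysis described above. It uses two hypothetical 0/1-coded factors A and B rather than three marker loci, made-up noiseless responses y = 2A + 3B, and a small hand-written least-squares solver so that it runs without external libraries. All function names are illustrative.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def rss(columns, y):
    """Residual sum of squares after least-squares regression of y on columns."""
    xtx = [[sum(p * q for p, q in zip(ci, cj)) for cj in columns] for ci in columns]
    xty = [sum(p * v for p, v in zip(ci, y)) for ci in columns]
    beta = solve(xtx, xty)
    fitted = [sum(bk * col[i] for bk, col in zip(beta, columns))
              for i in range(len(y))]
    return sum((v - f) ** 2 for v, f in zip(y, fitted))


def reduction_ss(a, b, y):
    """Reduction SS for A and B when each is fitted first or second."""
    one = [1.0] * len(y)
    r_mu, r_a, r_b = rss([one], y), rss([one, a], y), rss([one, b], y)
    r_ab = rss([one, a, b], y)
    return {"A|mu": r_mu - r_a, "A|mu,B": r_b - r_ab,
            "B|mu": r_mu - r_b, "B|mu,A": r_a - r_ab}


def make_data(cell_counts):
    """cell_counts maps (a, b) to a number of observations; y = 2a + 3b."""
    a, b, y = [], [], []
    for (ai, bi), n in cell_counts.items():
        a += [float(ai)] * n
        b += [float(bi)] * n
        y += [2.0 * ai + 3.0 * bi] * n
    return a, b, y


balanced = reduction_ss(*make_data({(0, 0): 3, (0, 1): 3, (1, 0): 3, (1, 1): 3}))
unbalanced = reduction_ss(*make_data({(0, 0): 6, (0, 1): 1, (1, 0): 1, (1, 1): 6}))
```

With equal cell counts, R(A | μ) and R(A | μ, B) coincide (both equal 12 here), so the fitting order is irrelevant. With the unbalanced cell counts, R(A | μ) is about 60.1 whereas R(A | μ, B) is about 6.9, even though the underlying effects are unchanged.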

Supporting information

S1 Table. Simulation study designs and variables.

Normally distributed phenotypic observations were simulated for 21 study designs and associated linear mixed models by varying the number of observations (n = nG × rG), the number of entries (nG), the number of replications/entry (rG), the number of marker loci (m), nM = 3 genotypes/marker locus, the number of entries/marker genotype (nG:M), and marker heritability. One thousand samples of size n were simulated for each study design. The segregation of a single marker locus in an F2 population was simulated in study design 4. The number of entries nested in marker genotypes for study design 4 was equivalent to the expected number for the segregation of a co-dominant DNA marker in a population segregating 1 AA : 2 Aa : 1 aa for a single marker locus. In this example, there are 135 entries nested in AA, 270 entries nested in Aa, and 135 entries nested in aa, and each is replicated five times.

https://doi.org/10.1371/journal.pgen.1009762.s001

(PDF)

S1 Fig. Accuracy of AMV and ASV estimators of marker heritability when the phenotypic variance is estimated by pooling marker and residual genetic sources of variation ().

AMV and ASV estimates of when from LMM (1) is replaced with for AMV from LMM (2) or for ASV. Estimates are shown for 1,000 segregating populations simulated for different numbers of entries (nG individuals, families, or strains), five replications/entry (rG = 5), true marker heritability () ranging from 0 to 1, and one to three marker loci with three genotypes/marker locus (nM1 = 3). The AMV estimates (shown in red) equal , whereas the ASV estimates (shown in blue) equal . AMV estimates of marker heritability (; red highlighted observations) and ASV estimates of marker heritability (; blue highlighted observations) are shown for: (A) one locus with balanced data for nG = 540 entries (study design 1); (B) two marker loci with interaction (M1, M2, and M1 × M2) and balanced data for nG = 540 (study design 2); (C) three marker loci with interactions (M1, M2, M3, M1 × M2, M1 × M3, M2 × M3, and M1 × M2 × M3) and balanced data for nG = 540 (study design 3); (D) a population segregating 1:2:1 for a single marker locus with rG:M = 135 entries for both homozygotes and rG:M = 270 heterozygous entries, and nG = 540 (study design 4); (E) one locus with 10% randomly missing data among 540 entries (study design 5); and (F) one locus with 33% randomly missing data among 540 entries (study design 6). Study design details are shown in S1 Table.

https://doi.org/10.1371/journal.pgen.1009762.s002

(TIFF)

S2 Fig. Relative bias of AMV and ASV estimators of marker heritability.

Relative biases of AMV and ASV estimates of marker heritability are shown for 1,000 segregating populations simulated for different numbers of entries (nG individuals, families, or strains), five replications/entry (rG = 5), true marker heritability ranging from 0 to 1, and one to three marker loci with three genotypes/marker locus (nM1 = 3). AMV estimates of marker heritability (red highlighted observations) and ASV estimates of marker heritability (blue highlighted observations) are shown for: (A) one locus with balanced data for nG = 540 entries (study design 1); (B) two marker loci with interaction (M1, M2, and M1 × M2) and balanced data for nG = 540 (study design 2); (C) three marker loci with interactions (M1, M2, M3, M1 × M2, M1 × M3, M2 × M3, and M1 × M2 × M3) and balanced data for nG = 540 (study design 3); (D) an F2 population segregating 1:2:1 for one marker locus with rG:M = 135 entries for both homozygotes and rG:M = 270 heterozygous entries, and nG = 540 (study design 4); (E) one locus with 10% randomly missing data among 540 entries (study design 5); and (F) one locus with 33% randomly missing data among 540 entries (study design 6). Study design details are shown in S1 Table.

https://doi.org/10.1371/journal.pgen.1009762.s003

(TIFF)

S1 Text. ASV estimator of the fraction of the genetic variance associated with a single marker locus for unbalanced data.

https://doi.org/10.1371/journal.pgen.1009762.s004

(PDF)

S2 Text. ASV estimator of the fraction of the genetic variance associated with two marker loci for unbalanced data.

https://doi.org/10.1371/journal.pgen.1009762.s005

(PDF)

S3 Text. ASV estimator of the fraction of the genetic variance associated with three marker loci for unbalanced data.

https://doi.org/10.1371/journal.pgen.1009762.s006

(PDF)

S4 Text. Biases of AMV and ASV estimators of marker-associated variance.

https://doi.org/10.1371/journal.pgen.1009762.s007

(PDF)

S5 Text. Sample variances for AMV and ASV estimators of p and marker heritability.

https://doi.org/10.1371/journal.pgen.1009762.s008

(PDF)

References

  1. Lander ES, Schork NJ. Genetic dissection of complex traits. Science. 1994;265(5181):2037–2048. pmid:8091226
  2. Risch NJ. Searching for genetic determinants in the new millennium. Nature. 2000;405(6788):847–856. pmid:10866211
  3. Glazier AM, Nadeau JH, Aitman TJ. Finding genes that underlie complex traits. Science. 2002;298(5602):2345–2349. pmid:12493905
  4. Consortium CT, et al. The nature and identification of quantitative trait loci: a community’s view. Nature Reviews Genetics. 2003;4(11):911.
  5. Hirschhorn JN, Daly MJ. Genome-wide association studies for common diseases and complex traits. Nature Reviews Genetics. 2005;6(2):95–108. pmid:15716906
  6. Hill WG. Understanding and using quantitative genetic variation. Philos Trans R Soc London, Ser B. 2010;365(1537):73–85. pmid:20008387
  7. Hill WG. Quantitative genetics in the genomics era. Curr Genomics. 2012;13(3):196–206. pmid:23115521
  8. Botstein D, White RL, Skolnick M, Davis RW. Construction of a genetic linkage map in man using restriction fragment length polymorphisms. American Journal of Human Genetics. 1980;32(3):314. pmid:6247908
  9. Sachidanandam R, Weissman D, Schmidt SC, Kakol JM, Stein LD, Marth G, et al. A map of human genome sequence variation containing 1.42 million single nucleotide polymorphisms. Nature. 2001;409(6822):928–934. pmid:11237013
  10. Huang X, Feng Q, Qian Q, Zhao Q, Wang L, Wang A, et al. High-throughput genotyping by whole-genome resequencing. Genome Research. 2009;19(6):1068–1076. pmid:19420380
  11. Li R, Li Y, Fang X, Yang H, Wang J, Kristiansen K, et al. SNP detection for massively parallel whole-genome resequencing. Genome Research. 2009;19(6):1124–1132. pmid:19420381
  12. Lander ES, Botstein D. Mapping mendelian factors underlying quantitative traits using RFLP linkage maps. Genetics. 1989;121(1):185–199. pmid:2563713
  13. Mackay TFC. The genetic architecture of quantitative traits. Annu Rev Genet. 2001;35(1):303–339. pmid:11700286
  14. Klein RJ, Zeiss C, Chew EY, Tsai JY, Sackler RS, Haynes C, et al. Complement factor H polymorphism in age-related macular degeneration. Science. 2005;308(5720):385–389. pmid:15761122
  15. Consortium WTCC, et al. Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls. Nature. 2007;447(7145):661.
  16. Mackay TFC, Stone EA, Ayroles JF. The genetics of quantitative traits: challenges and prospects. Nat Rev Genet. 2009;10(8):565. pmid:19584810
  17. Visscher PM, Brown MA, McCarthy MI, Yang J. Five years of GWAS discovery. Am J Hum Genet. 2012;90(1):7–24. pmid:22243964
  18. Pasaniuc B, Price AL. Dissecting the genetics of complex traits using summary association statistics. Nat Rev Genet. 2017;18(2):117. pmid:27840428
  19. Visscher PM, Wray NR, Zhang Q, Sklar P, McCarthy MI, Brown MA, et al. 10 years of GWAS discovery: biology, function, and translation. Am J Hum Genet. 2017;101(1):5–22. pmid:28686856
  20. Meuwissen T, Hayes B, Goddard M. Prediction of total genetic value using genome-wide dense marker maps. Genetics. 2001;157(4):1819–1829. pmid:11290733
  21. Wray NR, Kemper KE, Hayes BJ, Goddard ME, Visscher PM. Complex trait prediction from genome data: contrasting EBV in livestock to PRS in humans: genomic prediction. Genetics. 2019;211(4):1131–1141. pmid:30967442
  22. Crouch DJ, Bodmer WF. Polygenic inheritance, GWAS, polygenic risk scores, and the search for functional variants. Proceedings of the National Academy of Sciences. 2020;117(32):18924–18933. pmid:32753378
  23. Wray NR, Lin T, Austin J, McGrath JJ, Hickie IB, Murray GK, et al. From basic science to clinical application of polygenic risk scores: a primer. JAMA Psychiatry. 2021;78(1):101–109. pmid:32997097
  24. 24. Falconer D, Mackay T. Introduction to Quantitative Genetics. Harlow, Essex, UK. Longmans Green; 1996.
  25. 25. Lynch M, Walsh B. Genetics and analysis of quantitative traits. vol. 1. Sinauer Sunderland, MA; 1998.
  26. 26. Walsh B. Quantitative genetics in the age of genomics. Theoretical Population Biology. 2001;59(3):175–184. pmid:11444958
  27. 27. Visscher PM, Medland SE, Ferreira MA, Morley KI, Zhu G, Cornes BK, et al. Assumption-free estimation of heritability from genome-wide identity-by-descent sharing between full siblings. PLoS Genet. 2006;2(3). pmid:16565746
  28. 28. Roff DA. A centennial celebration for quantitative genetics. Evolution. 2007;61(5):1017–1032. pmid:17492957
  29. 29. Hill WG, Goddard ME, Visscher PM. Data and theory point to mainly additive genetic variance for complex traits. PLoS Genet. 2008;4(2):e1000008.
  30. 30. Visscher PM, Hill WG, Wray NR. Heritability in the genomics era—concepts and misconceptions. Nat Rev Genet. 2008;9(4):255–266. pmid:18319743
  31. 31. Roff DA. Evolutionary quantitative genetics. Springer Science & Business Media; 2012.
  32. 32. Bernardo R. Reinventing quantitative genetics for plant breeding: something old, something new, something borrowed, something BLUE. Heredity. 2020;125(6):375–385. pmid:32296132
  33. 33. Andersson L. Genetic dissection of phenotypic diversity in farm animals. Nature Reviews Genetics. 2001;2(2):130–138. pmid:11253052
  34. 34. Hayes B, Goddard ME. The distribution of the effects of genes affecting quantitative traits in livestock. Genetics Selection Evolution. 2001;33(3):1–21. pmid:11403745
  35. 35. Mackay TF. Quantitative trait loci in Drosophila. Nature reviews genetics. 2001;2(1):11–20. pmid:11253063
  36. 36. Andersson L, Georges M. Domestic-animal genomics: deciphering the genetics of complex traits. Nature Reviews Genetics. 2004;5(3):202–212. pmid:14970822
  37. 37. Anderson JA, Chao S, Liu S. Molecular breeding using a major QTL for Fusarium head blight resistance in wheat. Crop Science. 2007;47:S–112.
  38. 38. Septiningsih EM, Pamplona AM, Sanchez DL, Neeraja CN, Vergara GV, Heuer S, et al. Development of submergence-tolerant rice cultivars: the Sub1 locus and beyond. Annals of Botany. 2009;103(2):151–160. pmid:18974101
  39. 39. Lorenz K, Cohen BA. Small-and large-effect quantitative trait locus interactions underlie variation in yeast sporulation efficiency. Genetics. 2012;192(3):1123–1132. pmid:22942125
  40. 40. Saatchi M, Schnabel RD, Taylor JF, Garrick DJ. Large-effect pleiotropic or closely linked QTL segregate within and across ten US cattle breeds. BMC Genomics. 2014;15(1):1–17.
  41. 41. Huang H, Cao J, Hanif Q, Wang Y, Yu Y, Zhang S, et al. Genome-wide association study identifies energy metabolism genes for resistance to ketosis in Chinese Holstein cattle. Animal genetics. 2019;50(4):376–380. pmid:31179571
  42. 42. Freebern E, Santos DJ, Fang L, Jiang J, Gaddis KLP, Liu GE, et al. GWAS and fine-mapping of livability and six disease traits in Holstein cattle. BMC genomics. 2020;21(1):1–11. pmid:31931710
  43. 43. Li B, VanRaden P, Null D, O’Connell J, Cole J. Major quantitative trait loci influencing milk production and conformation traits in Guernsey dairy cattle detected on Bos taurus autosome 19. Journal of Dairy Science. 2021;104(1):550–560. pmid:33189290
  44. 44. Bernardo R. Molecular markers and selection for complex traits in plants: learning from the last 20 years. Crop Sci. 2008;48(5):1649–1664.
  45. 45. Bernardo R. Bandwagons I, too, have known. Theor Appl Genet. 2016;129(12):2323–2332. pmid:27681088
  46. 46. Manolio TA, Collins FS, Cox NJ, Goldstein DB, Hindorff LA, Hunter DJ, et al. Finding the missing heritability of complex diseases. Nature. 2009;461(7265):747. pmid:19812666
  47. 47. Eichler EE, Flint J, Gibson G, Kong A, Leal SM, Moore JH, et al. Missing heritability and strategies for finding the underlying causes of complex disease. Nat Rev Genet. 2010;11(6):446–450. pmid:20479774
  48. 48. Young AI. Solving the missing heritability problem. PLoS Genet. 2019;15(6):e1008222. pmid:31233496
  49. 49. Bernardo R. What if we knew all the genes for a quantitative trait in hybrid crops? Crop Sci. 2001;41(1):1–4.
  50. 50. Hill WG. Applications of population genetics to animal breeding, from Wright, Fisher and Lush to genomic prediction. Genetics. 2014;196(1):1–16. pmid:24395822
  51. 51. De Villemereuil P, Morrissey MB, Nakagawa S, Schielzeth H. Fixed-effect variance and the estimation of repeatabilities and heritabilities: issues and solutions. Journal of Evolutionary Biology. 2018;31(4):621–632. pmid:29285829
  52. 52. Gaudet MM, Kirchhoff T, Green T, Vijai J, Korn JM, Guiducci C, et al. Common genetic variants and modification of penetrance of BRCA2-associated breast cancer. PLoS Genet. 2010;6(10):e1001183. pmid:21060860
  53. 53. Tang S, Leon A, Bridges WC, Knapp SJ. Quantitative trait loci for genetically correlated seed traits are tightly linked to branching and pericarp pigment loci in sunflower. Crop Sci. 2006;46(2):721–734.
  54. 54. Hayes BJ, Pryce J, Chamberlain AJ, Bowman PJ, Goddard ME. Genetic architecture of complex traits and accuracy of genomic prediction: coat colour, milk-fat percentage, and type in Holstein cattle as contrasting model traits. PLoS genetics. 2010;6(9):e1001139. pmid:20927186
  55. 55. Seabury CM, Oldeschulte DL, Saatchi M, Beever JE, Decker JE, Halley YA, et al. Genome-wide association study for feed efficiency and growth traits in US beef cattle. BMC genomics. 2017;18(1):1–25. pmid:28521758
  56. 56. Littell RC. Analysis of unbalanced mixed model data: a case study comparison of ANOVA versus REML/GLS. Journal of Agricultural, Biological, and Environmental Statistics. 2002;7(4):472.
  57. 57. Daetwyler HD, Calus MP, Pong-Wong R, de Los Campos G, Hickey JM. Genomic prediction in animals and plants: simulation of data, validation, reporting, and benchmarking. Genetics. 2013;193(2):347–365. pmid:23222650
  58. 58. Gianola D. Priors in whole-genome regression: the Bayesian alphabet returns. Genetics. 2013;194(3):573–596. pmid:23636739
  59. 59. Moore JK, Manmathan HK, Anderson VA, Poland JA, Morris CF, Haley SD. Improving Genomic Prediction for Pre-Harvest Sprouting Tolerance in Wheat by Weighting Large-Effect Quantitative Trait Loci. Crop Science. 2017;57(3):1315–1324.
  60. 60. Rice B, Lipka AE. Evaluation of RR-BLUP Genomic Selection Models that Incorporate Peak Genome-Wide Association Study Signals in Maize and Sorghum. The Plant Genome. 2019;12(1). pmid:30951091
  61. 61. Spindel J, Begum H, Akdemir D, Collard B, Redoña E, Jannink J, et al. Genome-wide prediction models that incorporate de novo GWAS are a powerful new tool for tropical rice improvement. Heredity. 2016;116(4):395–408. pmid:26860200
  62. 62. Lande R, Thompson R. Efficiency of marker-assisted selection in the improvement of quantitative traits. Genetics. 1990;124(3):743–756. pmid:1968875
  63. 63. de los Campos G, Sorensen D, Gianola D. Genomic heritability: what is it? PLoS Genet. 2015;11(5):e1005048. pmid:25942577
  64. 64. Beavis WD. QTL analyses: power, precision, and accuracy. In Patterson AH (ed) Molecular Dissection of Complex Traits. 1998; p. 145–162.
  65. 65. Melchinger AE, Utz HF, Schön CC. Quantitative trait locus (QTL) mapping using different testers and independent population samples in maize reveals low power of QTL detection and large bias in estimates of QTL effects. Genetics. 1998;149(1):383–403. pmid:9584111
  66. 66. Utz HF, Melchinger AE, Schön CC. Bias and sampling error of the estimated proportion of genotypic variance explained by quantitative trait loci determined from experimental data in maize using cross validation and validation with independent samples. Genetics. 2000;154(4):1839–1849. pmid:10866652
  67. 67. Allison DB, Fernandez JR, Heo M, Zhu S, Etzel C, Beasley TM, et al. Bias in estimates of quantitative-trait–locus effect in genome scans: demonstration of the phenomenon and a method-of-moments procedure for reducing bias. Am J Hum Genet. 2002;70(3):575–585. pmid:11836648
  68. 68. Xu S. Theoretical basis of the Beavis effect. Genetics. 2003;165(4):2259–2268. pmid:14704201
  69. 69. Bernardo R. What proportion of declared QTL in plants are false? Theor Appl Genet. 2004;109(2):419–424. pmid:15085262
  70. 70. Göring HH, Terwilliger JD, Blangero J. Large upward bias in estimation of locus-specific effects from genomewide scans. Am J Hum Genet. 2001;69(6):1357–1369. pmid:11593451
  71. 71. Henderson CR. Estimation of variance and covariance components. Biometrics. 1953;9(2):226–252.
  72. 72. Searle SR, Gruber MH. Linear models. Wiley Online Library; 1971.
  73. 73. Piepho HP. A coefficient of determination (R2) for generalized linear mixed models. Biom J. 2019;61:860–872. pmid:30957911
  74. 74. Searle SR. Linear models for unbalanced data. Wiley; 1987.
  75. 75. Littell RC, Milliken GA, Stroup WW, Wolfinger RD, Schabenberger O. SAS system for mixed models. vol. 633. SAS institute Cary, NC; 1996.
  76. 76. Bernardo R. Breeding for quantitative traits in plants. vol. 1. Stemma press Woodbury, MN; 2002.
  77. 77. Inc SI. SAS/STAT 13.1 User’s Guide: Chapter 43—The GLIMMIX Procedure. Author Cary, NC; 2013. Available from: https://support.sas.com/documentation/onlinedoc/stat/131/glimmix.pdf.
  78. 78. Bates D, Mächler M, Bolker B, Walker S. Fitting Linear Mixed-Effects Models Using lme4. J Stat Softw. 2015;67(1):1–48.
  79. 79. Gbur EE, Stroup WW, McCarter KS, Durham S, Young LJ, Christman M, et al. Analysis of generalized linear mixed models in the agricultural and natural resources sciences. vol. 156. John Wiley & Sons; 2020.
  80. 80. Broman KW, Sen S. A Guide to QTL Mapping with R/QTL. vol. 46. Springer; 2009.
  81. 81. Cantor RM, Lange K, Sinsheimer JS. Prioritizing GWAS results: a review of statistical methods and recommendations for their application. The American Journal of Human Genetics. 2010;86(1):6–22. pmid:20074509
  82. 82. Beavis W, Smith O, Grant D, Fincher R. Identification of quantitative trait loci using a small sample of topcrossed and F4 progeny from maize. Crop Sci. 1994;34(4):882–896.
  83. 83. Luo L, Mao Y, Xu S. Correcting the bias in estimation of genetic variances contributed by individual QTL. Genetica. 2003;119(2):107–114. pmid:14620950
  84. 84. Zhang J, Yue C, Zhang Y. Bias correction for estimated QTL effects using the penalized maximum likelihood method. Heredity. 2012;108(4):396–402. pmid:21934700
  85. 85. Jivanji S, Worth G, Lopdell TJ, Yeates A, Couldrey C, Reynolds E, et al. Genome-wide association analysis reveals QTL and candidate mutations involved in white spotting in cattle. Genet Sel Evol. 2019;51(1):62. pmid:31703548
  86. 86. Pincot DD, Poorten TJ, Hardigan MA, Harshman JM, Acharya CB, Cole GS, et al. Genome-wide association mapping uncovers Fw1, a dominant gene conferring resistance to Fusarium wilt in strawberry. G3: Genes, Genomes, Genet. 2018;8(5):1817–1828. pmid:29602808
  87. 87. Conner JK, Hartl DL, et al. A primer of ecological genetics. Sinauer Associates Incorporated; 2004.
  88. 88. Mrode RA. Linear models for the prediction of animal breeding values. Cabi; 2014.
  89. 89. Choy Y, Brinks J, Bourdon R. Repeated-measure animal models to estimate genetic components of mature weight, hip height, and body condition score. Journal of animal science. 2002;80(8):2071–2077. pmid:12211374
  90. 90. Dekkers JC. Commercial application of marker-and gene-assisted selection in livestock: strategies and lessons. J Anim Sci. 2004;82(suppl_13):E313–E328. pmid:15471812
  91. 91. Dekkers J. Prediction of response to marker-assisted and genomic selection using selection index theory. J Anim Breed Genet. 2007;124(6):331–341. pmid:18076470
  92. 92. VanRaden PM. Efficient methods to compute genomic predictions. J Dairy Sci. 2008;91(11):4414–4423. pmid:18946147
  93. 93. Sun X, Habier D, Fernando RL, Garrick DJ, Dekkers JC. Genomic breeding value prediction and QTL mapping of QTLMAS2010 data using Bayesian methods. In: BMC proceedings. BioMed Central; 2011. p. 1–8.
  94. 94. Isik F, Holland J, Maltecca C. Genetic data analysis for plant and animal breeding. Springer; 2017.
  95. 95. Astle W, Balding DJ. Population structure and cryptic relatedness in genetic association studies. Statistical Science. 2009;24(4):451–471.
  96. 96. Yang J, Benyamin B, McEvoy BP, Gordon S, Henders AK, Nyholt DR, et al. Common SNPs explain a large proportion of the heritability for human height. Nat Genet. 2010;42(7):565. pmid:20562875
  97. 97. Endelman JB, Jannink JL. Shrinkage estimation of the realized relationship matrix. G3: Genes, Genomes, Genet. 2012;2(11):1405–1413. pmid:23173092
  98. 98. Cary N. SAS/STAT 13.1 User’s Guide; 2013.
  99. 99. Piepho HP, Moehring J, Schulz-Streeck T, Ogutu JO. A stage-wise approach for the analysis of multi-environment trials. Biom J. 2012;54(6):844–860.
  100. 100. Damesa TM, Möhring J, Worku M, Piepho HP. One step at a time: stage-wise analysis of a series of experiments. Agron J. 2017;109(3):845–857.
  101. 101. Damesa TM, Hartung J, Gowda M, Beyene Y, Das B, Semagn K, et al. Comparison of Weighted and Unweighted Stage-Wise Analysis for Genome-Wide Association Studies and Genomic Selection. Crop Sci. 2019;59:2572–2584.
  102. 102. Schmidt P, Hartung J, Bennewitz J, Piepho HP. Heritability in Plant Breeding on a Genotype-Difference Basis. Genetics. 2019;212(4):991–1008. pmid:31248886
  103. 103. Estaghvirou SBO, Ogutu JO, Schulz-Streeck T, Knaak C, Ouzunova M, Gordillo A, et al. Evaluation of approaches for estimating the accuracy of genomic prediction in plant breeding. BMC Genomics. 2013;14(1):860.
  104. 104. Efron B, Tibshirani R. The bootstrap method for assessing statistical accuracy. Behaviormetrika. 1985;12(17):1–35.
  105. 105. Oehlert GW. A note on the delta method. The American Statistician. 1992;46(1):27–29.
  106. 106. Johnson NL, Kemp AW, Kotz S. Univariate discrete distributions. vol. 444. John Wiley & Sons; 2005.
  107. 107. Korte A, Farlow A. The advantages and limitations of trait analysis with GWAS: a review. Plant Methods. 2013;9(1):29. pmid:23876160
  108. 108. Jensen J, Su G, Madsen P. Partitioning additive genetic variance into genomic and remaining polygenic components for complex traits in Dairy cattle. BMC Genet. 2012;13(1):44. pmid:22694746
  109. 109. Herr DG. On the history of ANOVA in unbalanced, factorial designs: The first 30 years. The American Statistician. 1986;40(4):265–270.
  110. 110. Langsrud Ø. ANOVA for unbalanced data: Use Type II instead of Type III sums of squares. Statistics and Computing. 2003;13(2):163–167.
  111. 111. Stroup W, Littelll R. Impact of variance component estimates on fixed effect inference in unbalanced linear mixed models. Proceedings of the Kansas State University Conference on Applied Statistics in Agriculture. 2002;14:32–48.
  112. 112. Stroup WW, Milliken GA, Claassen EA, Wolfinger RD. SAS for mixed models: introduction and basic applications. SAS Institute; 2018.
  113. 113. Piepho H, Möhring J, Melchinger A, Büchse A. BLUP for phenotypic selection in plant breeding and variety testing. Euphytica. 2008;161(1):209–228.
  114. 114. Searle SR, Casella G, McCulloch C. Variance components. John Wiley & Sons; 1992.
  115. 115. Molenaar H, Boehm R, Piepho HP. Phenotypic selection in ornamental breeding: it’s better to have the BLUPs than to have the BLUEs. Frontiers in plant science. 2018;9:1511. pmid:30455707
  116. 116. Hector A, Von Felten S, Schmid B. Analysis of variance with unbalanced data: an update for ecology & evolution. Journal of animal ecology. 2010;79(2):308–316. pmid:20002862
  117. 117. Gianola D, Perez-Enciso M, Toro MA. On marker-assisted prediction of genetic value: beyond the ridge. Genetics. 2003;163(1):347–365. pmid:12586721
  118. 118. Goddard M, Hayes B. Genomic selection. J Anim Breed Genet. 2007;124(6):323–330. pmid:18076469
  119. 119. Habier D, Fernando RL, Garrick DJ. Genomic BLUP decoded: a look into the black box of genomic prediction. Genetics. 2013;194(3):597–607. pmid:23640517
  120. 120. Meuwissen T, Hayes B, Goddard M. Genomic selection: A paradigm shift in animal breeding. Animal Frontiers. 2016;6(1):6–14.
  121. 121. Crossa J, Pérez-Rodríguez P, Cuevas J, Montesinos-López O, Jarquín D, de los Campos G, et al. Genomic selection in plant breeding: methods, models, and perspectives. Trends Plant Sci. 2017;22(11):961–975. pmid:28965742
  122. 122. R Core Team. R: A Language and Environment for Statistical Computing; 2020. Available from: https://www.R-project.org/.
  123. 123. Burton A, Altman DG, Royston P, Holder RL. The design of simulation studies in medical statistics. Stat Med. 2006;25(24):4279–4292. pmid:16947139
  124. 124. Morris TP, White IR, Crowther MJ. Using simulation studies to evaluate statistical methods. Stat Med. 2019;38(11):2074–2102. pmid:30652356