
Disentangling group specific QTL allele effects from genetic background epistasis using admixed individuals in GWAS: An application to maize flowering

  • Simon Rio,

    Roles Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Resources, Software, Validation, Visualization, Writing – original draft, Writing – review & editing

    Affiliation Université Paris-Saclay, INRAE, CNRS, AgroParisTech, GQE - Le Moulon, 91190, Gif-sur-Yvette, France

  • Tristan Mary-Huard,

    Roles Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Resources, Software, Validation, Visualization, Writing – original draft, Writing – review & editing

    Affiliations Université Paris-Saclay, INRAE, CNRS, AgroParisTech, GQE - Le Moulon, 91190, Gif-sur-Yvette, France, MIA, INRAE, AgroParisTech, Université Paris-Saclay, 75005, Paris, France

  • Laurence Moreau,

    Roles Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Resources, Software, Validation, Visualization, Writing – original draft, Writing – review & editing

    Affiliation Université Paris-Saclay, INRAE, CNRS, AgroParisTech, GQE - Le Moulon, 91190, Gif-sur-Yvette, France

  • Cyril Bauland,

    Roles Data curation, Investigation, Resources

    Affiliation Université Paris-Saclay, INRAE, CNRS, AgroParisTech, GQE - Le Moulon, 91190, Gif-sur-Yvette, France

  • Carine Palaffre,

    Roles Data curation, Investigation, Resources

    Affiliation UE 0394 SMH, INRAE, 2297 Route de l’INRA, 40390, Saint-Martin-de-Hinx, France

  • Delphine Madur,

    Roles Data curation, Investigation, Resources

    Affiliation Université Paris-Saclay, INRAE, CNRS, AgroParisTech, GQE - Le Moulon, 91190, Gif-sur-Yvette, France

  • Valérie Combes,

    Roles Data curation, Investigation, Resources

    Affiliation Université Paris-Saclay, INRAE, CNRS, AgroParisTech, GQE - Le Moulon, 91190, Gif-sur-Yvette, France

  • Alain Charcosset

    Roles Conceptualization, Data curation, Formal analysis, Funding acquisition, Investigation, Methodology, Project administration, Resources, Software, Supervision, Validation, Visualization, Writing – original draft, Writing – review & editing

    alain.charcosset@inrae.fr

    Affiliation Université Paris-Saclay, INRAE, CNRS, AgroParisTech, GQE - Le Moulon, 91190, Gif-sur-Yvette, France

PLOS

Abstract

When handling a structured population in association mapping, group-specific allele effects may be observed at quantitative trait loci (QTLs) for several reasons: (i) differences in linkage disequilibrium (LD) between SNPs and QTLs across groups, (ii) group-specific genetic mutations in QTL regions, and/or (iii) epistatic interactions between QTLs and other loci that have differentiated allele frequencies between groups. We present a new genome-wide association study (GWAS) approach to identify QTLs exhibiting such group-specific allele effects. We developed genetic material including admixed progeny of different genetic groups, with known ancestry along the genome (local admixture). A dedicated statistical methodology was developed to analyze pure and admixed individuals jointly, which allows one to disentangle the factors causing the heterogeneity of allele effects across groups. This approach was applied to maize by developing an inbred “Flint-Dent” panel, including admixed individuals, that was evaluated for flowering time. Several associations were detected, revealing a wide range of configurations of allele effects, both at known flowering QTLs (Vgt1, Vgt2 and Vgt3) and at new loci. We found several QTLs whose effect depended on the group ancestry of alleles, while others interacted with the genetic background. Our GWAS approach provides useful information on the stability of QTL effects across genetic groups and can be applied to a wide range of species.

Author summary

Identification of genomic regions involved in the genetic architecture of traits has become commonplace in quantitative genetics studies. Genetic structure is a common feature in human, animal and plant species, and most current methods target genomic regions whose effects on traits are conserved between genetic groups. However, allele effects may be heterogeneous across groups due to different factors: a group-specific correlation between the alleles of the tagged marker and those of the causal variant, a group-specific mutation at the causal variant, or an epistatic interaction between the causal variant and the genetic background. We propose a new method adapted to structured populations including admixed individuals, which aims to identify these genomic regions and to disentangle these factors. The method was applied to a maize inbred diversity panel including lines from the dent and flint genetic groups, as well as admixed lines, evaluated for flowering time. Several genomic regions were detected with various configurations of allele effects, with evidence of epistatic interactions between some of the loci and the genetic background.

Introduction

Quantitative traits are genetically determined by numerous regions of the genome, also known as quantitative trait loci (QTLs). The advent of high density genotyping of single nucleotide polymorphisms (SNPs) has opened the way to the identification of QTLs in diversity panels. These studies, referred to as genome-wide association studies (GWAS), use the linkage disequilibrium (LD) between the SNPs and causal variants at QTLs underlying the traits of interest. The panels evaluated in GWAS often include sets of individuals with complex pedigrees or genetic structure [1]. The latter is a common feature in human, animal and plant species and arises when groups of individuals cease to mate with each other and start to be subjected to different evolutionary forces, such as drift or selection [2].

Applying GWAS in a diversity panel including individuals from different groups raises the issue of spurious associations. The stratification of a population into genetic groups generates LD between loci that are differentiated between groups but not necessarily genetically linked. When a given trait is characterized by contrasted group-specific means, all these SNPs will correlate to it and may be detected as false positives. An efficient control of these spurious associations can be done by taking structure and kinship into account in the statistical model [1, 3]. This procedure will however limit the statistical power at differentiated SNPs, making them difficult to detect in multi-group GWAS, especially in case of rare alleles [4].

In a structured population, group-specific allele effects can be observed at SNPs, and testing an overall effect using a standard GWAS model may not be effective if the QTL effect is of opposite sign in the different groups. Such effects can result from differences in LD between SNPs and QTLs across genetic groups. A different LD extent or linkage phase between linked loci can be explained by specific population size dynamics, such as bottlenecks or expansions [5, 6]. Such LD patterns were identified in numerous species including human [7, 8], dairy and beef cattle [9, 10], pig [11], wheat [12] and maize [13–16]. A genetic mutation appearing in a QTL region may also lead to group-specific allele effects if it occurred in a founder specific to the genetic group. Several Mendelian obesity syndromes were shown to result from mutations specific to certain ethnic groups in human [17]. Another possibility is that QTLs interact with other loci that have differentiated allele frequencies between groups (i.e. that they interact with the genetic background). In human, this possibility was discussed for a candidate gene associated with a higher risk of myocardial infarction in African American than in European populations [18, 19]. Another example is a SNP in the promoter region of the HNF4A gene, which was associated with a higher risk of developing type 2 diabetes in Ashkenazi than in United Kingdom populations [20]. This locus was later shown to interact with another gene in the Ashkenazi population [21]. In maize, evidence of QTLs with group-specific allele effects can also be found, even though the cause of these differences remains unclear. Allelic series have been demonstrated for QTLs associated with flowering time, including Vgt1 [22]. A QTL with group-specific allele effects was also identified in a maize diversity panel for a phenology trait [23].
More generally, studying the stability of QTL allele effects across genetic backgrounds is an important issue. In human, it determines the ability of a genetic marker to predict the predisposition of an individual to develop a genetic disease across ethnic groups. In plant or animal breeding, it conditions the success of introgressing a favorable allele coming from a source of diversity into an elite genetic material.

Different GWAS strategies were adopted to address this issue depending on the species. In human, GWAS mostly focused on a specific genetic group, and these group-specific studies were later compared through meta-analyses [24, 25]. Some of these meta-analyses revealed highly conserved effects between populations [26, 27], while others revealed larger differences [28]. In dairy cattle, the first GWAS focused on a specific breed [29–31]. More recently, multi-breed GWAS were conducted to refine QTL locations by taking advantage of the low LD extent observed in such composite populations [32–34]. In maize, the possibility of using seeds from different origins and generations led geneticists to assemble GWAS panels with a broad range of genetic materials [35–37]. These panels often include a limited proportion of admixed individuals derived from crosses between individuals from different genetic groups. The genomes of these admixed individuals consist of mosaics of fragments with different ancestries. Admixture events are a common feature in living species and can contribute to the successful colonization of new environments [38, 39]. In plants, innovative admixed genetic materials, such as nested association mapping (NAM) [40] or multi-parent advanced generation inter-cross (MAGIC) [41] populations, were created to combine high statistical power for QTL detection with a wide spectrum of genetic diversity. Both NAM and MAGIC populations are of great interest to study the stability of QTL effects in a wide range of genetic backgrounds. However, they generally include a limited number of founders and do not address the stability of QTL allele effects across genetic groups.

This study aimed at evaluating the value of producing admixed individuals, derived from a large set of parents, to decipher the genetic architecture of a trait using innovative GWAS models. The objectives were (i) to demonstrate the value of multi-group analyses to identify new QTLs, (ii) to highlight the value of multi-group GWAS models to identify group-specific allele effects at QTLs, and (iii) to show how admixed individuals can help disentangle the factors causing the heterogeneity of allele effects across groups: local genomic differences or epistatic interactions between QTLs and the genetic background. To our knowledge, no method has been proposed in the literature to address the last objective. This method was applied to a maize inbred population evaluated for flowering traits, including dent, flint and admixed lines. Maize flowering time is an interesting trait to analyze in quantitative genetics studies: it is considered a major adaptive trait because it tailors the vegetative and reproductive growth phases to local environmental conditions.

Materials and methods

Genetic material and genotypic data

The genetic material consisted of a panel of 970 maize inbred lines assembled within the “Amaizing” project. It gathered 300 dent lines, 304 flint lines and 366 admixed doubled haploids, hereafter referred to as admixed lines. The dent lines were those included in the “Amaizing Dent” panel [42] and the flint lines were those included in the “CF-Flint” panel [16]. The dent and flint lines aimed at representing the diversity of their respective heterotic groups used in European breeding and included several breeding generations. The admixed lines were derived from 206 hybrids between flint and dent lines, mated according to a sparse factorial design (Fig 1), followed by in situ gynogenesis [43] to produce fixed admixed inbred lines. Each dent or flint line was involved in 0 to 11 hybrids (1.21 on average), each hybrid leading to 1 to 4 admixed lines (1.77 on average). In total, 171 dent lines and 172 flint lines were involved as parents of admixed lines.

Fig 1. Diagram of admixed lines production from hybrids obtained by mating dent and flint lines according to a sparse factorial design.

https://doi.org/10.1371/journal.pgen.1008241.g001

All the flint and dent lines were genotyped using the 600K Affymetrix Maize Genotyping Array [44]. Residual heterozygous data were treated as missing, and all missing values were imputed independently within each group using Beagle v.3.3.2 with default parameters [45]. The few heterozygous genotypic datapoints imputed by Beagle (0.00084% of all datapoints) were randomly assigned to homozygous genotypes. The admixed lines were genotyped with a 15K chip provided by the private company Limagrain, which included a reduced set of SNPs from the 50K Illumina MaizeSNP50 BeadChip [46]. Eight check lines were genotyped with both the 600K and 15K technologies to standardize the reference alleles (0/1) on the set of SNPs shared between the 600K and 15K datasets (9,015 SNPs). Admixed lines were then imputed to 600K SNPs using the following procedure, illustrated in S1 Fig. The positions of recombination breakpoints and the parental origins of the alleles in admixed lines were determined with the set of 9,015 shared SNPs. SNPs for which the two parental lines carry different alleles allowed us to identify the parental line that transmitted its allele to its admixed progeny. For a given admixed line, changes in the parental origin of alleles along a chromosome indicated the location of recombination breakpoints. The parental origins of alleles were smoothed for the few SNPs giving information discordant with the chromosome block in which they were located; in this case, the underlying genotypic datapoint was considered missing. Parental origins of alleles in admixed lines were imputed up to 600K using adjacent SNP information. If a set of SNPs to be imputed was located within a recombination interval, the breakpoint was placed at the middle of that set, ordered according to the physical position of the SNPs along the chromosome (the average proportion of SNPs located within such intervals was 0.93% per admixed individual).
Alleles at SNPs were then imputed based on their origin using parental genotypic data. The MITE associated with the flowering QTL Vgt1 [47, 48] was also genotyped for all the individuals (0: absence, 1: presence). There was a total of 482,013 polymorphic SNPs in this dataset, for which we had information for each individual concerning the SNP allele (0/1), its ancestry (dent/flint) and the genetic background (dent/flint/admixed) in which it was observed.
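The imputation of admixed lines from parental genotypes can be sketched as follows. This is a minimal Python/NumPy illustration under stated assumptions, not the authors' pipeline: the function names `parental_origin` and `impute_admixed` are hypothetical, the smoothing of discordant SNPs is assumed to have been done beforehand, and uninformative shared SNPs (where the two parents carry the same allele) are assumed to have been dropped before calling `impute_admixed`. Breakpoints inside a recombination interval are placed at the middle of the ordered set of SNPs to impute, as described above.

```python
import numpy as np


def parental_origin(child, dent_parent, flint_parent):
    """Parental origin ('D', 'F', or '' if uninformative) at each shared SNP."""
    child, dent_parent, flint_parent = map(np.asarray, (child, dent_parent, flint_parent))
    origin = np.full(child.shape, "", dtype="<U1")
    informative = dent_parent != flint_parent          # parents carry different alleles
    origin[informative & (child == dent_parent)] = "D"
    origin[informative & (child == flint_parent)] = "F"
    return origin


def impute_admixed(shared_pos, origin, dense_pos, dent_dense, flint_dense):
    """Impute ancestry and 0/1 alleles of one admixed line on one chromosome.

    shared_pos : sorted positions of informative shared SNPs (already smoothed)
    origin     : 'D' or 'F' at each of these SNPs
    dense_pos  : sorted positions of the SNPs to impute (e.g. the 600K set)
    dent_dense, flint_dense : 0/1 genotypes of the two parents at dense_pos
    """
    shared_pos, dense_pos = np.asarray(shared_pos), np.asarray(dense_pos)
    origin = np.asarray(origin, dtype="<U1")
    last = len(shared_pos) - 1
    # Flanking shared SNPs: left = last one <= pos, right = first one >= pos.
    left = np.clip(np.searchsorted(shared_pos, dense_pos, side="right") - 1, 0, last)
    right = np.clip(np.searchsorted(shared_pos, dense_pos, side="left"), 0, last)
    ancestry = np.where(origin[left] == origin[right], origin[left], "")
    # SNPs whose flanks disagree lie in a recombination interval: the breakpoint
    # is placed at the middle of the ordered set of SNPs within that interval.
    for k in np.unique(left[ancestry == ""]):
        idx = np.flatnonzero((left == k) & (ancestry == ""))
        half = len(idx) // 2
        ancestry[idx[:half]] = origin[k]
        ancestry[idx[half:]] = origin[k + 1]
    # Alleles are copied from the parent that transmitted the segment.
    allele = np.where(ancestry == "D", np.asarray(dent_dense), np.asarray(flint_dense))
    return ancestry, allele
```

When an interval contains an odd number of SNPs, this sketch assigns the extra SNP to the right-hand parent; how such ties were handled is not stated in the text.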

The dent genome proportion of the admixed lines ranged from 0.16 to 0.86, with a mean of 0.51 (S2 Fig). Possible selection biases were studied along the genome by comparing the observed allele frequencies with the allele frequencies expected given the pedigree. No major pattern was observed, suggesting no or minor selection biases among the admixed lines (S3 Fig). A PCoA was performed on genetic distances computed as $D_{l,l'} = 1 - K_{l,l'}$, with $K_{l,l'}$ the kinship coefficient between lines $l$ and $l'$ computed following Eq (2) (see below), assuming a common genetic background for all individuals, i.e. using an average frequency of allele 1 at each locus. The flint and dent lines are clearly distinguished on the first two principal coordinates, with a small overlapping region in the center of the graph, while the admixed lines fill the genetic space between the two groups (Fig 2). The same PCoA calculated using the 9,015 SNPs shared between the 600K and 15K datasets showed a very similar structure on the first two axes (S4 Fig).
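A minimal sketch of this PCoA is given below. It assumes a VanRaden-type kinship for 0/1-coded inbred lines computed with the pooled allele frequency of all lines (one possible reading of "an average frequency of allele 1"), and classical Gower principal coordinates computed on the squared distances. Self-distances are set to zero before the decomposition, which is an assumption of this sketch since $1 - K_{l,l}$ is generally not zero. The genotypes are random toy data, and `common_kinship` and `pcoa` are hypothetical helper names.

```python
import numpy as np


def common_kinship(W):
    """Kinship of 0/1-coded inbred lines using one allele frequency per SNP.

    Assumption: pooled frequency over all lines and a VanRaden-type scaling.
    """
    W = np.asarray(W, dtype=float)
    f = W.mean(axis=0)
    Z = W - f
    return Z @ Z.T / np.sum(f * (1.0 - f))


def pcoa(D, n_axes=2):
    """Classical (Gower) principal coordinate analysis of a dissimilarity matrix."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n          # centring matrix
    B = -0.5 * J @ (D ** 2) @ J                   # double-centred matrix
    eigval, eigvec = np.linalg.eigh(B)            # eigenvalues in ascending order
    top = np.argsort(eigval)[::-1][:n_axes]       # keep the largest ones
    return eigvec[:, top] * np.sqrt(np.clip(eigval[top], 0.0, None))


rng = np.random.default_rng(1)
W = rng.integers(0, 2, size=(20, 500))            # toy 0/1 genotypes (20 lines)
K = common_kinship(W)
D = 1.0 - K                                       # D_{l,l'} = 1 - K_{l,l'}
np.fill_diagonal(D, 0.0)                          # self-dissimilarity set to 0 (assumption)
coords = pcoa(D)                                  # 20 x 2 principal coordinates
```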

Fig 2. PCoA on genetic distances with coloration of individuals depending on their genetic background: dent, flint or admixed.

https://doi.org/10.1371/journal.pgen.1008241.g002

LD between pairs of loci was estimated separately in the dent and the flint datasets using the squared correlation r² between loci. We only considered SNPs for which at least ten individuals carried the minor allele in both the dent and flint datasets. For each group, LD was calculated and averaged over sets of loci pairs with similar physical distances, ranging from 0 to 2 Mbp, using a sliding window of 1 Kbp. The inter-group comparison revealed a higher LD extent in the dent than in the flint genetic group (S5 Fig), which is consistent with previous studies [13–16]. As suggested by [9], the persistence of LD linkage phases across the flint and dent genetic groups was evaluated by computing the correlation between the r estimated in each group, along the same sliding window of 1 Kbp. We also studied the consistency of LD linkage phases between groups by computing the correlation between the signs of r in the two groups, coded “0” for a negative and “1” for a positive r. LD phases were very consistent over short physical distances but began to diverge sharply when loci were more than 100–200 Kbp apart (S6 Fig).
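The LD summaries above can be sketched as follows, assuming 0/1 genotype matrices (lines × SNPs) for each group, restricted to SNPs that pass the minor-allele filter, and physical positions in bp. For simplicity, this sketch uses non-overlapping distance bins of `bin_width` bp rather than a sliding window, and the helper names are hypothetical. Phase persistence is the correlation of r between groups, and sign persistence the correlation of the 0/1 sign indicators, as described in the text.

```python
import numpy as np


def _safe_corr(x, y):
    """Pearson correlation, or NaN when undefined (too few pairs or constant input)."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    if x.size < 3 or x.std() == 0 or y.std() == 0:
        return np.nan
    return float(np.corrcoef(x, y)[0, 1])


def ld_by_distance(W_dent, W_flint, pos, bin_width=1_000, max_dist=2_000_000):
    """Per distance bin: mean r2 in each group and persistence of LD phase."""
    pos = np.asarray(pos)
    r_dent = np.corrcoef(W_dent, rowvar=False)       # SNP-by-SNP r in dent lines
    r_flint = np.corrcoef(W_flint, rowvar=False)     # SNP-by-SNP r in flint lines
    i, j = np.triu_indices(len(pos), k=1)             # all SNP pairs
    dist = np.abs(pos[j] - pos[i])
    keep = dist <= max_dist
    i, j, bins = i[keep], j[keep], dist[keep] // bin_width
    out = {}
    for b in np.unique(bins):
        sel = bins == b
        rd, rf = r_dent[i[sel], j[sel]], r_flint[i[sel], j[sel]]
        out[int(b) * bin_width] = {                   # key: lower bound of the bin (bp)
            "mean_r2_dent": float(np.mean(rd ** 2)),
            "mean_r2_flint": float(np.mean(rf ** 2)),
            "r_persistence": _safe_corr(rd, rf),              # correlation of r
            "sign_persistence": _safe_corr(rd > 0, rf > 0),   # 1 = positive r, 0 = negative
        }
    return out
```

As a sanity check, passing identical genotypes for both groups should give a phase persistence of 1 in every bin.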

Phenotypic data

All the lines were evaluated per se at Saint-Martin-de-Hinx (France) in 2015 and 2016 for male flowering (MF) and female flowering (FF), in calendar days after sowing. Each trial was a latinized alpha design in which every line was evaluated twice on average. Field trials were divided into two blocks of 33 sub-blocks, each comprising 36 plots. To avoid competition between genetic backgrounds, dent, flint and admixed lines were sown in different sub-blocks. Three check lines (B73, F353 and UH007) were repeated in all sub-blocks. Each plot consisted of a row of 25 plants. MF and FF were measured as a median value over the whole plot.

The contribution of Genotype x Environment (GxE) interactions to the phenotypic variance and the level of broad-sense heritability were investigated using the following model:

$$Y_{jklrc} = \mu + \beta_j + \alpha_k + G_{kl} + (G \times \beta)_{jkl} + X_{jr} + Z_{jc} + E_{jklrc}$$

where $Y_{jklrc}$ is the phenotype, $\mu$ is the intercept, $\beta_j$ is the fixed effect of trial $j$, $\alpha_k$ is the fixed effect of genetic background $k$ (dent, flint, admixed, or one of the checks B73, F353 and UH007), $G_{kl} \sim \mathcal{N}(0, \sigma^2_{G_k})$ is the random genotype effect of line $l$ in genetic background $k$ (not for checks), with $\sigma^2_{G_k}$ the genotypic variance in genetic background $k$, $(G \times \beta)_{jkl} \sim \mathcal{N}(0, \sigma^2_{G \times \beta, jk})$ is the random GxE interaction of line $l$ in genetic background $k$ for trial $j$, with $\sigma^2_{G \times \beta, jk}$ the GxE variance in genetic background $k$ for trial $j$, $E_{jklrc} \sim \mathcal{N}(0, \sigma^2_{E_j})$ is the error, with $\sigma^2_{E_j}$ the error variance for trial $j$, and $X_{jr}$ and $Z_{jc}$ are the row and column random effects in trial $j$, respectively, as defined by the field design. All random effects are independent of each other. The row and column effects were modeled either as independent or using a first-order autoregressive model (AR1), as determined by the AIC criterion (S1 Table). Least squares means, hereafter referred to as phenotypes ($Y_{kl}$), were computed over the whole design using the same model with genotypes as fixed effects, i.e. with $G_{kl}$ replaced by $\gamma_{kl}$, the fixed genotype effect of line $l$ in genetic background $k$. Model parameters were estimated by restricted maximum likelihood (ReML) using ASReml-R [49].

General polygenic model

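In this study, the following general polygenic model was considered:

$$Y_{kl} = \mu + \alpha_k + G_{kl} + E_{kl} \qquad (1)$$

where $Y_{kl}$ is the phenotype (least squares mean) of line $l$ in genetic background $k$ among the $N$ individuals of the sample, $\mu$ is the intercept, $\alpha_k$ is the genetic background effect with $k \in \{D, F, A\}$ for the dent, flint and admixed genetic backgrounds, respectively, and $G_{kl}$ is the random genetic value of the line. The concatenated vector of genetic values in each genetic background, $\mathbf{G} = (\mathbf{G}_D^\top, \mathbf{G}_F^\top, \mathbf{G}_A^\top)^\top$, follows $\mathbf{G} \sim \mathcal{N}(\mathbf{0}, \boldsymbol{\Sigma}_G)$, where the block of $\boldsymbol{\Sigma}_G$ corresponding to backgrounds $k$ and $k'$ is $\sigma^2_{G_k}\mathbf{K}_{k,k}$ if $k = k'$ and $\sigma_{G_k G_{k'}}\mathbf{K}_{k,k'}$ otherwise. Here $\mathbf{K}_{k,k'}$ is the kinship matrix between individuals from genetic backgrounds $k$ and $k'$ computed following Eq (2), $\sigma^2_{G_k}$ is the genetic variance in genetic background $k$, and $\sigma_{G_k G_{k'}}$ is the genetic covariance between genetic backgrounds $k$ and $k'$. $E_{kl}$ is the error associated with line $l$ in genetic background $k$, with $E_{kl} \sim \mathcal{N}(0, \sigma^2_E)$ independent and identically distributed, and $\sigma^2_E$ is the error variance.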

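The kinship between line $l$ from genetic background $k$ and line $l'$ from genetic background $k'$, $K_{kl,k'l'}$, was computed following [50]:

$$K_{kl,k'l'} = \frac{\sum_{m=1}^{M} (W_{lm} - f_{mk})(W_{l'm} - f_{mk'})}{\sqrt{\sum_{m=1}^{M} f_{mk}(1 - f_{mk})}\,\sqrt{\sum_{m=1}^{M} f_{mk'}(1 - f_{mk'})}} \qquad (2)$$

where $W_{lm}$ is the genotype of line $l$ at locus $m$ coded 0/1 and $f_{mk}$ is the frequency of allele 1 at locus $m$ in genetic background $k$. Note that Eq (2) simplifies to the kinship estimator proposed by [51] when $l$ and $l'$ belong to the same genetic background.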
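The kinship between backgrounds and the block covariance of the genetic values in the polygenic model can be sketched with NumPy as below. This assumes that Eq (2) takes the VanRaden-type form written in the docstring (centring by background-specific allele frequencies, scaling by the square root of the summed heterozygosities of each background), with frequencies estimated from the lines of each background in the sample. `kinship_eq2` and `genetic_covariance` are hypothetical names, and `Sigma` stands for the matrix of genetic variances (diagonal) and covariances (off-diagonal) between backgrounds.

```python
import numpy as np


def kinship_eq2(W, groups):
    """Kinship between 0/1-coded lines with background-specific allele frequencies.

    K[l, l'] = sum_m (W_lm - f_mk)(W_l'm - f_mk') /
               sqrt(sum_m f_mk (1 - f_mk) * sum_m f_mk' (1 - f_mk'))
    where k and k' are the backgrounds of lines l and l'.
    """
    W = np.asarray(W, dtype=float)
    groups = np.asarray(groups)
    Z = np.empty_like(W)
    scale = np.empty(W.shape[0])
    for g in np.unique(groups):
        rows = groups == g
        f = W[rows].mean(axis=0)                     # f_mk for background g
        Z[rows] = W[rows] - f                         # centre with own-background frequency
        scale[rows] = np.sqrt(np.sum(f * (1.0 - f)))
    return (Z @ Z.T) / np.outer(scale, scale)


def genetic_covariance(K, groups, Sigma, labels):
    """Covariance of the genetic values: Sigma[k, k'] * K[l, l'] for l in k, l' in k'."""
    g = np.array([labels.index(x) for x in groups])  # background index of each line
    return np.asarray(Sigma)[np.ix_(g, g)] * K
```

Restricted to lines of a single background, `kinship_eq2` returns the same matrix as the common-background estimator, which mirrors the simplification noted above.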

GWAS models

In this study, three GWAS models were applied to different population samples (Table 1). The GWAS strategies were (i) to analyze dent and flint lines separately using a standard GWAS model M1, (ii) to analyze dent and flint lines jointly using a GWAS model M2 accounting for allele ancestry (confounded with the genetic background) and (iii) to analyze dent, flint and admixed lines using a GWAS model M3 accounting for both allele ancestry and the genetic background of the individuals. All models aimed at detecting a SNP effect, defined as a contrast effect between alleles 0 and 1 at a given SNP.

Table 1. Population sample to which each GWAS model was applied with the corresponding number of SNPs conserved for the analysis (at least 10 individuals carrying the minor allelic state).

https://doi.org/10.1371/journal.pgen.1008241.t001

Standard GWAS model M1.

The first GWAS model, M1 [1], was applied separately to the dent and flint datasets. For each SNP $m$ among the $M$ loci, one has:

$$Y_{kl} = \mu + \theta^m_{i} + G_{kl} + E_{kl}$$

where $\theta^m_{i}$ is the effect of the SNP allele $i \in \{0, 1\}$ carried by line $l$ at locus $m$ (Table 2). All other terms are identical to those appearing in Eq (1), and the kinship was computed following Eq (2), which here simplifies to the kinship estimator proposed by [51]. The existence of a SNP effect was tested using the hypothesis $H_0: \theta^m_0 = \theta^m_1$.

Table 2. Allelic states observed in each GWAS model, resulting from a combination of SNP alleles, their ancestry and the genetic background in which they are observed.

https://doi.org/10.1371/journal.pgen.1008241.t002

Multi-group GWAS model M2.

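We applied a multi-group GWAS model, M2, jointly to the flint and dent datasets, specifying the allele ancestry (confounded with the genetic background). For a given SNP $m$, one has:

$$Y_{kl} = \mu + \alpha_k + \theta^m_{ij} + G_{kl} + E_{kl}$$

where $\theta^m_{ij}$ is the effect of the SNP allele $i$ with ancestry $j \in \{D, F\}$ at locus $m$, as defined in Table 2. All other terms are identical to those appearing in Eq (1). At a given SNP, the following hypotheses were tested:

$$H^{D}_0: \theta^m_{1D} - \theta^m_{0D} = 0 \qquad H^{F}_0: \theta^m_{1F} - \theta^m_{0F} = 0$$
$$H^{D+F}_0: (\theta^m_{1D} - \theta^m_{0D}) + (\theta^m_{1F} - \theta^m_{0F}) = 0 \qquad H^{D \neq F}_0: (\theta^m_{1D} - \theta^m_{0D}) - (\theta^m_{1F} - \theta^m_{0F}) = 0$$

Hypotheses $H^{D}_0$ and $H^{F}_0$ test the existence of a dent and a flint SNP effect, respectively. Hypothesis $H^{D+F}_0$ tests for a general SNP effect, while $H^{D \neq F}_0$ tests for a divergent SNP effect between the dent and flint ancestries.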

Multi-group GWAS model M3.

We applied a multi-group GWAS model, M3, jointly to the flint, dent and admixed datasets, specifying both the allele ancestry and the genetic background of the individual. For a given SNP $m$, one has:

$$Y_{kl} = \mu + \alpha_k + \theta^m_{ijk} + G_{kl} + E_{kl}$$

where $\theta^m_{ijk}$ is the effect of the SNP allele $i$ with ancestry $j$ at locus $m$ in genetic background $k$, as defined in Table 2. All other terms are identical to those appearing in Eq (1). At a given SNP, 16 hypotheses were tested (Table 3). Four hypotheses, referred to as “simple”, were tested to identify QTLs with a significant SNP effect for each combination of ancestry and genetic background. For instance, one of them tests whether a dent SNP effect (the difference between alleles 0 and 1 of dent ancestry) is significant in the admixed genetic background, $H_0: \theta^m_{1DA} - \theta^m_{0DA} = 0$. Hypotheses referred to as “general” were used to identify QTLs with a mean SNP effect over ancestries and genetic backgrounds; for instance, one tests for a general flint SNP effect over the flint and admixed genetic backgrounds, and another tests for a general SNP effect over all ancestries and genetic backgrounds. Hypotheses referred to as “divergent” were tested to identify QTLs with contrasting SNP effects between ancestries and/or genetic backgrounds. For instance, one tests for a divergent dent SNP effect between the dent and the admixed genetic backgrounds, $H_0: (\theta^m_{1DD} - \theta^m_{0DD}) - (\theta^m_{1DA} - \theta^m_{0DA}) = 0$, which amounts to testing an epistatic interaction between the SNP and the genetic background (see S1 Appendix for details).
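The allelic states of M3 and the example hypotheses quoted above can be encoded as an indicator design matrix and contrast vectors, as sketched below. The state labels follow the allele/ancestry/background convention of Fig 3 (for instance "1DA" for allele 1 of dent ancestry in the admixed background), but their exact spelling, their ordering and the scaling of the "general" contrasts (means rather than sums, which does not change a Wald test) are assumptions of this sketch. Only the four hypotheses quoted in the text are encoded; the full list of 16 is given in Table 3.

```python
import numpy as np

# Allelic states: SNP allele (0/1), ancestry (D/F), background (D/A/F).
STATES = ["0DD", "1DD", "0DA", "1DA", "0FA", "1FA", "0FF", "1FF"]


def snp_effect(anc_bg):
    """Contrast (allele 1 minus allele 0) for one ancestry/background pair, e.g. 'DA'."""
    c = np.zeros(len(STATES))
    c[STATES.index("1" + anc_bg)] = 1.0
    c[STATES.index("0" + anc_bg)] = -1.0
    return c


# The four example hypotheses quoted in the text, as contrast vectors.
EXAMPLES = {
    "dent SNP effect in admixed background": snp_effect("DA"),
    "general flint SNP effect (F and A backgrounds)":
        (snp_effect("FF") + snp_effect("FA")) / 2,
    "general SNP effect (all ancestries and backgrounds)":
        sum(snp_effect(s) for s in ("DD", "DA", "FA", "FF")) / 4,
    "divergent dent SNP effect (D vs A backgrounds)":
        snp_effect("DD") - snp_effect("DA"),
}


def state_design(alleles, ancestries, backgrounds):
    """Indicator matrix assigning each line to its allelic state at one SNP."""
    X = np.zeros((len(alleles), len(STATES)))
    for l, (a, j, k) in enumerate(zip(alleles, ancestries, backgrounds)):
        X[l, STATES.index(f"{a}{j}{k}")] = 1.0
    return X
```

For example, if the allele effect is 1 in the dent background and 3 in the admixed background for dent-ancestry alleles, the divergent contrast evaluates to 1 - 3 = -2.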

Table 3. Linear combinations tested with M3 compared to hypotheses tested using other GWAS models (M1 and M2).

https://doi.org/10.1371/journal.pgen.1008241.t003

From a biological standpoint, a QTL with contrasting SNP effects between groups can be caused by (i) a local genomic difference, due to a group-specific genetic mutation in all or part of the lines and/or to group differences in LD, or (ii) an interaction with the genetic background. Under the first hypothesis, the effect of a SNP is expected to depend on its ancestry but not on the genetic background (admixed or pure, see Fig 3a). Under the second hypothesis, the SNP effect for a given ancestry is expected to vary depending on the genetic background. One example would be a QTL with a strong SNP effect in the dent genetic background but none in the flint genetic background, while SNP effects would be of intermediate size for alleles of both ancestries in the admixed genetic background (see Fig 3b). Other, more complex configurations are possible, which justifies including all tests in the analysis.

Fig 3. Schematic of allele effects when divergent SNP effects are observed between groups, depending on the biological hypothesis: (a) local genomic difference between groups (LD or mutation) and (b) allele effects interacting with the genetic background.

The labels of the allelic states on the x-axis indicate the SNP allele (0/1), its ancestry (D/F) and the genetic background in which it is observed (D/A/F), as presented in Table 2.

https://doi.org/10.1371/journal.pgen.1008241.g003

For the three GWAS models, a SNP was discarded if its minor allelic state, as defined in Table 2, was carried by fewer than 10 individuals, or if it carried redundant genetic information (i.e. genetic information identical to that of another SNP already included in the dataset). To avoid prohibitive computational times, a two-step strategy was adopted for the inference of models M2 and M3. In the first step, the parameters of the “null” model of Eq (1) were estimated. The variance parameters were then plugged into their respective covariance matrices to derive a genetic covariance matrix G and an error covariance matrix R. In the second step, a model was fitted that included SNP fixed effects, as defined in M2 (or M3), and two random effects (one genetic effect and one error effect) with covariance matrices G and R, respectively. This strategy corresponds to fitting M2 (or M3) while keeping some variance ratios fixed at the values obtained in the “null” model.
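The second step can be sketched as a generalized least squares (GLS) fit with the covariance structure held fixed, followed by a 1-df Wald test. This NumPy illustration is not the MM4LMM implementation: it assumes that `V0 = G + R` has already been assembled from the null-model estimates, re-estimates only the overall scale by its ReML value given `V0`, and tests a single linear combination `c` of the fixed effects using the chi-square(1) upper tail, computed with `math.erfc`. The function name `gls_wald_test` is hypothetical.

```python
import math

import numpy as np


def gls_wald_test(y, X, V0, c):
    """Wald test of H0: c @ beta = 0 for y = X beta + e, Var(e) = s2 * V0.

    V0 is held fixed (variance ratios from the null model); only the overall
    scale s2 is re-estimated, by its ReML estimate given V0.
    """
    y, X, c = np.asarray(y, float), np.asarray(X, float), np.asarray(c, float)
    L = np.linalg.cholesky(V0)                     # V0 = L L^T
    ys = np.linalg.solve(L, y)                     # whitened response
    Xs = np.linalg.solve(L, X)                     # whitened design
    A = np.linalg.pinv(Xs.T @ Xs)                  # (X^T V0^-1 X)^-, X may be rank-deficient
    beta = A @ Xs.T @ ys                           # GLS estimate of the fixed effects
    resid = ys - Xs @ beta
    s2 = resid @ resid / (len(y) - np.linalg.matrix_rank(Xs))
    est = float(c @ beta)                          # estimated linear combination
    wald = est ** 2 / float(s2 * (c @ A @ c))
    pval = math.erfc(math.sqrt(wald / 2.0))        # upper tail of chi-square(1)
    return est, wald, pval


# Toy usage: one SNP with a true allele contrast of 2, independent errors.
rng = np.random.default_rng(4)
n = 100
allele = rng.integers(0, 2, size=n)
X = np.column_stack([allele == 0, allele == 1]).astype(float)
y = 2.0 * allele + rng.normal(size=n)
est, wald, p = gls_wald_test(y, X, np.eye(n), [-1.0, 1.0])
```

With `V0` equal to the identity, the test reduces to an ordinary least squares F-test with one numerator degree of freedom, evaluated here with the asymptotic chi-square(1) distribution. In a genome scan, the Cholesky factor of `V0` would be computed once and reused for every SNP, which is what makes a two-step strategy of this kind computationally cheap.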

Model parameters were estimated using ReML, and linear combinations of fixed effects were tested using Wald tests, both implemented in the R package MM4LMM [52]. P-values were computed using the asymptotic null distribution of the Wald statistic, as presented in [4]. The false discovery rate (FDR) was controlled by applying the procedure of [53] jointly to the whole set of tests defined by each GWAS strategy, separately for each trait. All GWAS strategies were evaluated for their ability to control type I error and for their statistical power using simulated phenotypes (S2 Appendix). In general, all models correctly controlled false positives, and multi-group models showed higher power, notably due to their ability to identify QTLs with complex configurations of effects.
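Assuming that the procedure of [53] is the Benjamini–Hochberg step-up procedure (the reference list is not part of this section), FDR control over the pooled tests of one GWAS strategy can be sketched as follows. Following the text, all p-values of a strategy (all SNPs and all hypotheses) would be pooled into one vector per trait before calling the hypothetical `bh_reject` function.

```python
import numpy as np


def bh_reject(pvals, fdr=0.05):
    """Benjamini-Hochberg step-up procedure: boolean mask of rejected tests."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    below = p[order] <= fdr * np.arange(1, m + 1) / m     # p_(i) <= i * q / m
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.flatnonzero(below).max()                    # largest i meeting the bound
        reject[order[: k + 1]] = True                      # reject p_(1), ..., p_(k)
    return reject


p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
mask_5 = bh_reject(p, fdr=0.05)    # first two tests rejected
mask_20 = bh_reject(p, fdr=0.20)   # first seven tests rejected
```

Raising the FDR level from 5% to 20% enlarges the set of rejected tests, which is why two levels are reported in the Results.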

For a given hypothesis, significant SNPs located within a physical window of 3 Mbp were clustered into the same QTL; with this window, LD between markers of different QTLs was below 0.05.
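A minimal sketch of this clustering step is given below, assuming a single-linkage reading of the rule: significant SNPs of a given hypothesis that lie on the same chromosome and are no more than `window` bp from the previous significant SNP are merged into one QTL. The text does not specify whether the window applies to consecutive SNPs or to the whole span of a QTL, so this is an interpretation, and `cluster_qtls` is a hypothetical name.

```python
def cluster_qtls(chrom, pos, window=3_000_000):
    """Assign a QTL index to each significant SNP.

    SNPs are sorted by chromosome and position; a new QTL starts whenever the
    chromosome changes or the gap to the previous SNP exceeds `window` bp.
    """
    order = sorted(range(len(pos)), key=lambda i: (chrom[i], pos[i]))
    qtl = [0] * len(pos)
    current, prev = -1, None
    for i in order:
        if prev is None or chrom[i] != chrom[prev] or pos[i] - pos[prev] > window:
            current += 1                      # open a new QTL
        qtl[i] = current
        prev = i
    return qtl


labels = cluster_qtls([1, 1, 1, 2, 1],
                      [1_000_000, 2_500_000, 9_000_000, 1_000_000, 3_900_000])
```

Under this reading, a chain of closely spaced SNPs can yield a QTL spanning more than 3 Mbp; a fixed-span rule would split it instead.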

Results

Associations detected and comparison of GWAS strategies

We observed substantial phenotypic variability within the dent, flint and admixed genetic backgrounds for both traits. The variance components estimated in the phenotypic analysis are summarized in S1 Table. GxE variances were limited and broad-sense heritabilities were high for each genetic background, ranging from 0.88 in the admixed lines to 0.96 in the dent and flint lines for both MF and FF. The model parameters estimated with the general polygenic model of Eq (1) are presented in S2 Table and show a larger genetic variance in the dent than in the flint and admixed genetic backgrounds.

For each GWAS model, two FDR levels, 5% and 20%, were used to declare a SNP significantly associated. The numbers of significant SNPs and corresponding QTLs are summarized in Table 4 for both traits. The locations of QTLs detected with an FDR of 20% are shown along the genome in Fig 4 for MF and in S7 Fig for FF. All associations are listed in S3 and S4 Tables. Note that some SNPs were declared significant by one model (e.g. M1) but were discarded by another (e.g. M3) because of the filtering on the frequency of each allelic state.

Table 4. Number of SNPs associated with each trait, depending on the GWAS strategy, using a FDR of 5% and 20%.

The number of corresponding QTLs is also indicated.

https://doi.org/10.1371/journal.pgen.1008241.t004

Fig 4. Position of QTLs detected with (a) M1, (b) M2 and (c) M3 for MF using a FDR of 20%.

The size of the grey dots is proportional to the -log10(pval) of the test at the most significant SNP of the region. Red vertical lines correspond to the location of the QTLs presented in section “Highlighted QTLs”. Note that major QTLs detected by a model may be discarded with another model because of filtering on allele frequencies.

https://doi.org/10.1371/journal.pgen.1008241.g004

First, a standard GWAS model M1 was applied separately to the dent and the flint datasets. Based on a 20% FDR, 35 SNPs were associated with MF in the dent dataset while 21 SNPs were associated in the flint dataset. These SNPs can be clustered into 12 QTLs in the dent dataset and into 13 QTLs in the flint dataset. Interestingly, none of these SNPs were detected in both datasets and they only pointed to one common QTL between datasets, which was located in the vicinity of Vgt2 on chromosome 8 [15].

Secondly, the dent and flint datasets were analyzed jointly using model M2, which takes into account the dent or flint ancestry of the allele. Note that the allele ancestry is confounded with the genetic background in this model. Based on a 20% FDR, 10 SNPs were associated with MF; they were significant for three of the four hypotheses tested (with 5, 4 and 3 SNPs). Some SNPs were significant for more than one test, which explains why these counts sum to more than 10. These SNPs can be clustered into 5 QTLs that were significant for these hypotheses (4, 2 and 2 QTLs). Some QTLs were already detected using M1, such as the QTL located in the vicinity of Vgt3 on chromosome 3 [54, 55] detected in the dent dataset. Other QTLs were specific to M2, such as the QTL located on chromosome 1 detected for FF, or specific to M1, such as the QTL located on chromosome 2 detected in the flint dataset. Based on a 20% FDR, more QTLs were detected with M1 than with M2 for both traits.

Finally, the dent, flint and admixed lines were analyzed jointly using model M3, which distinguishes the allele ancestry from the genetic background. The existence of a dent SNP effect was tested in the dent () and in the admixed genetic backgrounds (), and similarly for the flint SNP effect ( and ). Several hypotheses on general and divergent SNP effects were also tested between ancestries and genetic backgrounds (Table 3). Based on a 20% FDR, 56 SNPs were associated with MF and were significant for (19 SNPs), (2 SNPs), (4 SNPs) and others. These SNPs can be clustered into 17 QTLs that were significant for (5 QTLs), (2 QTLs), (4 QTLs) and others. Some of the QTLs had already been detected using M1 and M2, such as the QTL located in the vicinity of Vgt3 on chromosome 3, while several QTLs were specific to M3, such as the QTL detected on chromosome 2 using . Several QTLs were detected as showing a divergent SNP effect, including for hypotheses testing an interaction with the genetic background. Based on a 20% FDR, a similar number of QTLs was detected using M3 and M1 for MF, and M3 was intermediate between M1 and M2 for FF.
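General and divergent hypotheses of this kind are linear contrasts among estimated SNP effects, and the supporting tables report a contrast estimate, a Wald statistic and a -log10(pval) for each test. The sketch below is a generic Wald test of H0: Lβ = 0. It assumes a hypothetical ordering of four M3 effects (dent allele in the dent background, dent allele in the admixed background, flint allele in the admixed background, flint allele in the flint background). The contrast matrices and all numbers are illustrative, not the paper's exact definitions or estimates.

```python
import numpy as np
from scipy.stats import chi2

def wald_test(beta, cov, L):
    """Wald test of H0: L @ beta = 0.

    beta: estimated effects (length q); cov: their (q x q) covariance matrix;
    L: one contrast per row. Returns (contrast, Wald statistic, -log10 p).
    """
    L = np.atleast_2d(np.asarray(L, dtype=float))
    contrast = L @ beta
    W = float(contrast @ np.linalg.solve(L @ cov @ L.T, contrast))
    pval = chi2.sf(W, df=L.shape[0])  # chi-square with one df per contrast row
    return contrast, W, -np.log10(pval)

# Hypothetical effect order: dent/D, dent/A, flint/A, flint/F
GENERAL = [[0.25, 0.25, 0.25, 0.25]]         # average effect over the four
DIVERGENT_ANCESTRY_IN_A = [[0, 1, -1, 0]]    # dent vs flint allele, admixed background
DIVERGENT_DENT_BACKGROUND = [[1, -1, 0, 0]]  # dent allele, dent vs admixed background

beta = np.array([0.8, 0.5, -0.4, 0.1])  # made-up estimates
cov = 0.04 * np.eye(4)                   # made-up, uncorrelated sampling errors
for name, L in [("general", GENERAL),
                ("divergent ancestry (A)", DIVERGENT_ANCESTRY_IN_A),
                ("divergent dent background", DIVERGENT_DENT_BACKGROUND)]:
    c, W, mlogp = wald_test(beta, cov, L)
    print(f"{name}: contrast={c[0]:.3f} W={W:.3f} -log10(p)={mlogp:.2f}")
```

Stacking several rows in `L` gives a joint test with as many degrees of freedom as rows.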

Highlighted QTLs

Among the 17 QTLs detected for MF with M3, six QTLs were selected and studied in further detail. These QTLs had (i) at least one significant test among M3 hypotheses at an FDR of 20%, and (ii) a high frequency of each allele, with at least 23 lines carrying the minor allelic state (the minimum, observed for Vgt1). Three of them were located in the vicinity of known maize flowering QTLs: Vgt1 [22, 47, 48], Vgt2 [15] and Vgt3 [54, 55]. For all QTLs, their physical position along the genome, the frequency of each allelic state and their -log10(pval) for each test are summarized in Table 5. The distribution of the phenotypes, adjusted for the variation due to the polygenic background, is illustrated for each allele in Fig 5, and the location of these QTLs along the genome is indicated by red vertical lines in Fig 4.

Fig 5. Boxplots of phenotypes adjusted for polygenic background variation using relatedness (MF K corrected) for the different alleles of the six highlighted QTLs: (a) Vgt1, (b) Vgt2, (c) Vgt3, (d) QTL4.1, (e) QTL2.1 and (f) QTL7.2 using M3.

The denomination of the allelic states on the x-axis includes the SNP allele (0/1), its ancestry (D/F) and the genetic background in which it was observed (D/A/F), as presented in Table 2.

https://doi.org/10.1371/journal.pgen.1008241.g005

Table 5. Information regarding the six highlighted QTLs.

The -log10(pval) values of M2 and M3 were obtained by fitting the complete GWAS models, with all the genetic components presented in Eq (1), to the six SNPs previously detected using the approximate model.

https://doi.org/10.1371/journal.pgen.1008241.t005

The SNP matching the Vgt1 region on chromosome 8 was detected as associated with MF (20% FDR) using (-log10(pval) = 5.96) in M3. This QTL showed contrasting effects between alleles of different ancestries, with an apparent inversion of effects (Fig 5a). This observation was supported by a high -log10(pval) for the tests related to a divergent SNP effect between ancestries: (3.83), (3.90), (4.13) and (5.96). Conversely, a low -log10(pval) was obtained for tests and , which would otherwise have suggested an interaction with the genetic background. These results support the existence of a local genomic difference at Vgt1 between the dent and the flint genetic groups for MF, but no interaction with the genetic background.

The SNP matching the Vgt2 region on chromosome 8 was detected as associated with MF (20% FDR) using (-log10(pval) = 6.68) in M3. This QTL showed a conserved effect across ancestries and genetic backgrounds (Fig 5b). This observation was supported by a high -log10(pval) for tests related to a general SNP effect: (6.04), (6.30), (5.23), (3.65) and (6.68), and a low -log10(pval) for tests related to divergent SNP effects (all below 1).

The SNP matching the Vgt3 region on chromosome 3 was detected as associated with MF (5% FDR) using (-log10(pval) = 8.69) in M3. This QTL showed a large effect in the dent genetic background, a medium effect in the admixed genetic background regardless of the allele ancestry, and a small effect in the flint genetic background (Fig 5c). This observation was supported by a high -log10(pval) for the tests related to the dent SNP effect in the dent genetic background: Δm in M1 applied to the dent dataset (10.99), (9.42) and (8.69), and a low -log10(pval) for the tests related to the flint SNP effect in the flint genetic background. As for Vgt2, a high -log10(pval) was detected for tests related to a general SNP effect: (7.81), (7.11), (6.09) and (6.81), but unlike Vgt2, a high -log10(pval) was also detected for the test related to a divergent SNP effect between the dent and the flint genetic backgrounds: (3.47). There was also a high -log10(pval) for a divergent dent SNP effect between genetic backgrounds: (2.28). All these results support the existence of a QTL effect that tends to be larger as the dent genome proportion increases within individuals. This suggests that Vgt3 interacts with the genetic background for MF.

The SNP matching a region hereafter referred to as QTL4.1 on chromosome 4 was detected as associated with MF (20% FDR) using (-log10(pval) = 6.59) in M3. This QTL is very similar to Vgt1, as it showed contrasting effects between alleles of different ancestries with an apparent inversion of effects (Fig 5d). This observation was supported by a high -log10(pval) for the tests related to a divergent SNP effect between ancestries: (5.54), (6.59) and (5.38). These results support the existence of a local genomic difference at QTL4.1 between the dent and the flint genetic groups for MF, but no interaction with the genetic background.

The SNP matching a region hereafter referred to as QTL2.1 on chromosome 2 was detected as associated with MF (5% FDR) using (-log10(pval) = 8.99) in M3. This QTL showed a flint effect in the admixed genetic background (Fig 5e), which was supported by a high -log10(pval) for the test (8.99). Although there was a high -log10(pval) for a general flint SNP effect across genetic backgrounds: (6.42), a high -log10(pval) was also observed for a divergent SNP effect between those same alleles: (3.98). A high -log10(pval) was also observed for a divergent SNP effect between ancestries in the admixed genetic background: (5.44). All these results support the existence of a QTL effect only for alleles of flint ancestry in the admixed genetic background. This suggests that QTL2.1 is specific to flint ancestry and interacts with the genetic background for MF.

The SNP matching a region hereafter referred to as QTL7.2 on chromosome 7 was detected as associated with MF (20% FDR) using (-log10(pval) = 6.20) in M3. This QTL showed contrasting dent effects between the dent and the admixed genetic backgrounds (Fig 5f). This observation was supported by a high -log10(pval) for the test related to a divergent dent SNP effect between genetic backgrounds: (5.43). A high -log10(pval) was also observed for the hypothesis testing the equality between the divergent dent SNP effect and the divergent flint SNP effect: (6.20). All these results support the existence of a QTL with opposite effects between the dent and the admixed genetic backgrounds. This suggests that QTL7.2 interacts with the genetic background for MF.

Discussion

Accounting for genetic groups in GWAS

The stratification of the population sample into distinct genetic groups is a common feature in GWAS that challenges QTL detection methods. A simple way to deal with genetic groups is to analyze them separately. In our study, a standard GWAS model M1 was applied separately to the dent and the flint datasets. Among the QTLs detected for MF, only one was detected in both the dent and flint datasets, and not at the same SNPs, while none were detected in common for FF. One may question whether such differences between datasets indicate group-specific allele effects, or simply group differences in statistical power due to differences in allele frequency. This question often arises when GWAS is applied separately to genetic groups, as in maize [16, 56] or dairy cattle [57, 58], and is very difficult to answer except in obvious configurations, such as associations at SNPs segregating in only one group.

Another way to handle genetic groups is to analyze them jointly. One possibility is to apply model M1 while specifying the genetic structure as a global fixed effect, in order to prevent the detection of spurious associations. In dairy cattle, this strategy generally improved the precision of QTL locations by taking advantage of the low LD extent observed in multi-group datasets. However, while [34] and [33] observed a gain in statistical power due to a larger population size, [32] detected fewer QTLs by combining breeds than in separate analyses. They attributed this finding to the limited number of QTLs segregating within both the Holstein and Jersey breeds, but also reported that QTLs detected in both breeds showed only small to medium correlations between within-breed estimates of SNP effects (e.g. 0.082 for milk yield). Obviously, applying M1 jointly to genetic groups does not directly address the question of whether QTL effects are conserved between genetic groups.

A model specifying group-specific allele effects was referred to as M2 in this study. As with M1, the existence of a SNP effect can be tested for each group, but M2 also allows testing for general and divergent SNP effects between groups. In our study, this model allowed us to test for a dent () and a flint () SNP effect, along with general () and divergent () SNP effects between flint and dent ancestries. Note that testing is similar, although not strictly equivalent, to testing a SNP effect by applying M1 to a multi-group dataset. Using in M2, the same weights are given to allelic contrasts in the two groups. Applying M1 to a multi-group dataset would only be equivalent to applying M2 when considering markers with identical allele frequencies in the two groups. Using the hypotheses specifically tested in M2 ( and ), it was possible to detect new QTLs that were not detected with M1. In particular, a QTL detected on chromosome 1 for FF had a divergent SNP effect between the dent and flint genetic groups, suggesting the existence of group-specific QTL effects in this dataset. Some QTLs were detected in common with M1, but each strategy allowed the detection of specific QTLs, demonstrating the complementarity between the models. In conclusion, M2 was efficient at identifying QTLs with either conserved or specific allele effects between ancestries, but observing group-specific allele effects provided little insight into the cause of this specificity. Admixed individuals helped to tackle this issue.
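The weighting argument behind the non-equivalence of pooled M1 and the general test of M2 can be made concrete with a simplified fixed-effect analogue. This is an assumption, not the paper's models: plain OLS with one intercept per group stands in for M1 with genetic structure as a fixed effect (no polygenic term), and an equal-weight average of within-group allelic contrasts stands in for the general test of M2. In this analogue the pooled slope is exactly a weighted average of within-group contrasts with weights n_g p_g (1 - p_g), so it matches the equal-weight average only when those weights are equal. All data below are simulated and hypothetical.

```python
import numpy as np

def allelic_contrast(x, y):
    """Mean phenotype of allele-1 lines minus that of allele-0 lines."""
    return y[x == 1].mean() - y[x == 0].mean()

def pooled_slope(x, y, group):
    """OLS slope of y on x with a fixed intercept per group: demeaning within
    groups removes the intercepts (Frisch-Waugh-Lovell)."""
    xc, yc = x.astype(float), y.astype(float)  # astype returns copies
    for g in np.unique(group):
        idx = group == g
        xc[idx] -= xc[idx].mean()
        yc[idx] -= yc[idx].mean()
    return float(xc @ yc / (xc @ xc))

# Hypothetical groups: equal sizes, different allele frequencies, opposite effects
rng = np.random.default_rng(0)
parts = []
for name, n, freq, effect, mean in [("dent", 200, 0.5, 1.0, 0.0),
                                    ("flint", 200, 0.1, -1.0, 3.0)]:
    k = int(round(n * freq))
    x_g = np.r_[np.ones(k, dtype=int), np.zeros(n - k, dtype=int)]
    y_g = mean + effect * x_g + rng.normal(0.0, 0.5, n)
    parts.append((np.full(n, name), x_g, y_g))
group, x, y = (np.concatenate(arrs) for arrs in zip(*parts))

names = np.unique(group)  # sorted: "dent", "flint"
deltas = np.array([allelic_contrast(x[group == g], y[group == g]) for g in names])
weights = np.array([(group == g).sum() * x[group == g].mean() * (1 - x[group == g].mean())
                    for g in names])

pooled = pooled_slope(x, y, group)
weighted = float(weights @ deltas / weights.sum())  # n_g p_g (1 - p_g) weights
equal = float(deltas.mean())                        # equal weight per group
print(f"pooled={pooled:.3f} weighted={weighted:.3f} equal={equal:.3f}")
```

Here the dent group (p = 0.5) receives weight 50 and the flint group (p = 0.1) only 18, so the pooled estimate leans toward the dent contrast, while the equal-weight average stays near zero. In this OLS analogue, equal group sizes are also needed for the two to coincide.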

Benefits from admixed individuals

Admixed individuals were generated for this study by mating pure individuals of each group according to a sparse factorial design. Integrating these admixed individuals in GWAS can be done by simply analyzing the joint multi-group dataset using M1 or M2, which may lead to a gain in statistical power, due to an increase in population size. More interestingly, admixed individuals can be used to disentangle the factors causing the heterogeneity of allele effects across groups.

We developed model M3 to distinguish the allele ancestry (dent/flint) from the genetic background (dent/flint/admixed). As shown using simulations (S2 Appendix), applying M3 should result in a gain in statistical power by (i) testing an overall SNP effect for SNPs with conserved effects across ancestries and/or genetic backgrounds, and (ii) testing hypotheses for complex configurations of allele effects. When applied to MF, M3 detected 17 QTLs (20% FDR). While many of these QTLs were previously detected using M1 and M2, the new hypotheses tested allowed us to discover new regions of interest.

For equivalent tests in M1, M2 and M3 (e.g. Δm (Dent) in M1, in M2 and in M3), the lower number of associations detected with M2 and M3 compared to M1 for real traits can be attributed to differences in the filtering on allele frequencies, to the use of an approximate model for M2 and M3, and to the randomness associated with a particular experiment. Regarding false positive control, the QQ-plots of the test p-values of M1, M2 and M3 did not reveal any particular problem, as presented for MF in S8, S9 and S10 Figs and for FF in S11, S12 and S13 Figs.
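A QQ-plot compares the sorted -log10(pval) of the tests with the quantiles expected if all p-values were uniform under the null. The sketch below computes the plot coordinates. It uses the (i - 0.5)/m plotting position, one common choice that is assumed here rather than taken from the paper. `qq_coordinates` is a hypothetical helper.

```python
import numpy as np

def qq_coordinates(pvals):
    """Expected and observed -log10(p) for a QQ-plot under a uniform null.
    Expected quantiles use the (i - 0.5) / m plotting position."""
    p = np.sort(np.asarray(pvals, dtype=float))
    m = p.size
    expected = -np.log10((np.arange(1, m + 1) - 0.5) / m)
    observed = -np.log10(p)
    return expected, observed

rng = np.random.default_rng(1)
exp_q, obs_q = qq_coordinates(rng.uniform(size=10_000))  # simulated null p-values
# Under the null, the points (exp_q, obs_q) should lie close to the diagonal;
# inflation across the bulk of the distribution would suggest poor error control.
```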

The idea of exploiting admixed individuals has been proposed in the creation of NAM [40] and MAGIC [41] populations. Compared to our approach, such experimental populations include a limited number of founders, generally selected in different genetic groups. This is beneficial to increase the power of detection for alleles that were rare in the parental groups. However, these populations cannot address the question of epistatic interactions with the genetic background of the original groups. Our approach and the NAM and MAGIC designs are therefore expected to have complementary properties.

Heterogeneity of maize flowering QTL allele effects

From a global perspective, a high number of QTLs has been detected in previous maize studies [16, 22, 37, 59, 60]. When evaluating the American and European NAMs, [22] and [61] showed that flowering time is controlled by a large number of QTLs, many of which display variable effects across individual recombinant populations. Consistently, our study highlights a high number of QTLs and confirms a large variation in allele effects. It provides further insight into the origin of this variation, by identifying QTLs affected by local genomic differences, epistasis with the genetic background, or both.

When doing GWAS in a multi-group population, geneticists generally assume that QTL effects are conserved between groups. Such QTLs were detected in our study, for example the SNP associated with MF in the vicinity of Vgt2 [15] and its candidate gene, the flowering activator ZCN8 [62–64], on chromosome 8. At this SNP, all hypotheses testing a general SNP effect had a high -log10(pval), whereas all hypotheses testing a divergent SNP effect had a low one. When all tests are interpreted simultaneously, Vgt2 appears to have an effect that is conserved between genetic groups. Such a QTL can easily be detected in a multi-group population sample using a standard GWAS model [1]. However, many QTLs showed more complex patterns.

When group-specific allele effects are only due to group differences in LD or to group-specific mutations at the QTL, the difference in allele effects should be conserved between the pure and the admixed genetic backgrounds. A first QTL matching this situation is Vgt1 [22, 47, 48] (candidate gene: ZmRap2.7), which was detected by a SNP located on chromosome 8. High -log10(pval) values were observed when testing for a divergent SNP effect between ancestries (), suggesting a local genomic difference. It remains difficult to disentangle the effect of LD from that of a genetic mutation without complementary analyses. LD was shown to differ between groups, with a larger LD extent in the dent group (S5 Fig), while LD phases appeared well conserved at short distances (S6 Fig). However, a strong overall conservation of LD phases at short distances does not exclude a specific configuration for a given SNP-QTL pair. Note that Vgt1 was surprisingly not detected using the MITE located 548 kbp before the detected SNP. [48] already showed that other genetic variants in the vicinity of Vgt1, such as CGindel587, were more strongly associated with maize flowering than the MITE. Another QTL (QTL4.1) was detected by a SNP located on chromosome 4 and had a profile very similar to that of Vgt1. Its position is close (< 700 kbp) to GRMZM2G126253, a candidate gene for maize flowering time proposed by [60]. To validate the hypothesis of a local genomic difference at these QTLs, one could produce near isogenic lines with the two alleles from both ancestries introgressed into both dent and flint genetic backgrounds. A phenotypic evaluation of these individuals would provide definitive proof of a local genomic difference.

Group-specific allele effects may also be due to an interaction with the genetic background. A first QTL matching this profile was detected by a SNP in the vicinity of Vgt3 on chromosome 3 [54, 55] and its candidate gene ZmMADS69 [65]. This QTL showed an effect that varied according to the genetic background: large in the dent, intermediate in the admixed and small in the flint. High -log10(pval) values were observed for tests supporting this hypothesis: a dent SNP effect in the dent genetic background () and a divergent dent SNP effect between genetic backgrounds (). If this interaction with the background involves numerous loci, introgressing alleles from a dent into a flint genetic background may lead to disappointing results, as the effect would probably vanish with repeated backcross generations. If the interaction mostly involves a single locus, the effect at Vgt3 is conditioned by the allele at this other locus, so that a simultaneous introgression may be necessary to reach the desired effect. Using near isogenic lines that combined an early mutation at Vgt1 [66] and the early allele at Vgt3, the effect of Vgt3 was shown to vanish in the presence of the early allele of Vgt1 (A. Charcosset pers. comm.), which supports the hypothesis of Vgt3 interacting with the genetic background. Recently, [65] showed that ZmMADS69, the candidate gene of Vgt3, acts as an activator of the ZmRap2.7-ZCN8 regulatory module, whose genes are the candidates for Vgt1 and Vgt2, respectively. The existence of such interactions is consistent with flowering time being controlled by a network of interacting loci, as now well established in the model species Arabidopsis [67].

Other examples of QTLs interacting with the genetic background were identified. Two of them featured a similar profile, in the sense that they mainly exhibited a QTL effect in the admixed genetic background. One was located on chromosome 2 (QTL2.1) and showed a flint effect in the admixed genetic background, while the other was located on chromosome 7 (QTL7.2) and showed opposite dent effects between the dent and the admixed genetic backgrounds. Such QTLs are interesting as they are mainly revealed when creating admixed genetic material. They also suggest complex epistatic interactions between QTLs for these traits. The position of QTL2.1 is close (< 1.4 Mbp) to ereb197 and the position of QTL7.2 is close (< 100 kbp) to dof47. Both are candidate genes for maize flowering time proposed by [60].

The existence of epistatic interactions was also evaluated globally by decomposing the genetic variance into an additive and an epistatic component, as suggested by [68]. This confirmed the existence of epistatic interactions between pairs of loci for FF and MF (S5 Table) and supported the possibility of QTLs interacting with the genetic background through epistatic interactions with loci whose allele frequencies differ between groups. It would be interesting to test for epistatic interactions between each pair of loci. However, filtering on the frequencies of allele combinations at pairs of loci would discard most SNPs from the analysis. Another possibility would be to test the epistatic variance of each SNP against the polygenic background, as proposed by [69–71].

Conclusion

In this study, we proposed an innovative multi-group GWAS method that accounts for, and tests, the heterogeneity of QTL allele effects between groups. The addition of admixed individuals to the dataset was useful to disentangle the factors causing this heterogeneity, being either local genomic differences or epistatic interactions with the genetic background. Only homozygous inbred lines were considered in this study, but the method may be generalized to heterozygous individuals. Recently, many studies have focused on genomic prediction across genetic groups [42, 72–75]. In such scenarios, the stability of QTL effects across genetic backgrounds is an important factor affecting prediction accuracy. It is also an important factor for the relevance of any marker-based diagnostic in complex or structured populations. Our approach opens new perspectives to investigate this stability in a wide range of species.

Supporting information

S1 Fig. Imputation diagram of admixed lines.

Diagram illustrating the procedure applied to impute admixed DH lines from 15K to 600K SNPs using the parental origin of alleles.

https://doi.org/10.1371/journal.pgen.1008241.s001

(TIF)

S2 Fig. Histogram of dent genome proportion among admixed lines.

https://doi.org/10.1371/journal.pgen.1008241.s002

(TIF)

S3 Fig. Genome-wide selection biases among admixed lines.

Absolute difference between the observed frequency of the reference allele, $f_o$, estimated on the admixed lines, and its expected value $f_e$, along each chromosome ($|f_o - f_e|$). The expected allele frequencies were computed as the mean of flint and dent allele frequencies estimated on the parental lines, taking into account the contribution of each parent. A cubic smoothing spline was fitted using the R function “smooth.spline” and plotted in red.
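The expected frequency described for S3 Fig can be sketched as follows. The sketch assumes that each admixed DH line comes from one dent × flint cross, with each parent contributing half of the genome in expectation, so a parent used in several crosses counts once per offspring. It also assumes that the reference allele is coded 1. Function names are hypothetical, and the smoothing-spline step (R's `smooth.spline` in the source) is not reproduced.

```python
import numpy as np

def expected_admixed_frequency(dent_geno, flint_geno, crosses):
    """Expected frequency of allele 1 among admixed lines at each locus.

    dent_geno, flint_geno: (parental lines x loci) 0/1 matrices.
    crosses: one (dent_index, flint_index) pair per admixed line.
    Assumes each parent contributes half of its offspring's genome.
    """
    d_idx = np.array([d for d, _ in crosses])
    f_idx = np.array([f for _, f in crosses])
    # Parents appear once per offspring, which weights them by their contribution
    return 0.5 * (dent_geno[d_idx].mean(axis=0) + flint_geno[f_idx].mean(axis=0))

def selection_bias(admixed_geno, expected):
    """|f_o - f_e| per locus, f_o being the observed frequency among admixed lines."""
    return np.abs(admixed_geno.mean(axis=0) - expected)

dent = np.array([[1, 0], [1, 1]])
flint = np.array([[0, 0], [0, 1]])
fe = expected_admixed_frequency(dent, flint, [(0, 0), (1, 1), (1, 0)])
admixed = np.array([[1, 0], [0, 1], [1, 1]])
print(fe, selection_bias(admixed, fe))
```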

https://doi.org/10.1371/journal.pgen.1008241.s003

(TIF)

S4 Fig. PCoA on genetic distances using the set of 9,015 shared SNPs between the 600K and 15K datasets.

Individuals were colored depending on their genetic background: dent, flint or admixed.

https://doi.org/10.1371/journal.pgen.1008241.s004

(TIF)

S5 Fig. LD extent.

LD extent estimated separately in the dent and flint genetic groups using the standard r². LD was calculated and averaged for locus pairs characterized by a similar physical distance ranging from 0 to 2 Mbp, considering a sliding window of 1 kbp. A cubic smoothing spline was fitted for each group using the R function “smooth.spline”.
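The LD-extent computation for S5 Fig can be sketched as follows. It computes r² for every locus pair of one group on one chromosome and averages r² within distance bins. For simplicity it uses non-overlapping 1 kbp bins instead of the sliding window described. `ld_decay` is a hypothetical helper, and loci are assumed to be coded 0/1 in homozygous lines.

```python
import numpy as np

def ld_decay(geno, positions, max_dist=2_000_000, window=1_000):
    """Mean r^2 between locus pairs, grouped into distance bins.

    geno: (lines x loci) 0/1 matrix for one genetic group on one chromosome.
    positions: physical positions in bp, one per column of geno.
    Returns (bin start in bp, mean r^2) for every non-empty bin below max_dist.
    """
    positions = np.asarray(positions, dtype=np.int64)
    with np.errstate(invalid="ignore", divide="ignore"):
        r = np.corrcoef(geno, rowvar=False)  # monomorphic loci give NaN
    i, j = np.triu_indices(positions.size, k=1)
    dist = np.abs(positions[j] - positions[i])
    r_pairs = r[i, j]
    keep = (dist < max_dist) & np.isfinite(r_pairs)
    bins = dist[keep] // window
    sums = np.bincount(bins, weights=r_pairs[keep] ** 2)
    counts = np.bincount(bins)
    nonempty = counts > 0
    return np.flatnonzero(nonempty) * window, sums[nonempty] / counts[nonempty]

geno = np.array([[0, 0, 0], [0, 0, 1], [1, 1, 0], [1, 1, 1]])
starts, mean_r2 = ld_decay(geno, [0, 500, 1500])
print(starts, mean_r2)
```

Running the function separately on the dent and flint genotype matrices gives the two decay curves to compare.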

https://doi.org/10.1371/journal.pgen.1008241.s005

(TIF)

S6 Fig. Conservation of LD phases.

Conservation of LD phases estimated using the correlation (a) between the r values of the dent and flint groups, and (b) between the signs of r in the dent and flint groups. LD was calculated and averaged for locus pairs characterized by a similar physical distance ranging from 0 to 2 Mbp, considering a sliding window of 1 kbp. A cubic smoothing spline was fitted for each method using the R function “smooth.spline”.
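The two phase-conservation measures of S6 Fig can be sketched for all locus pairs at once. The binning by physical distance is omitted here. The sketch assumes both groups are genotyped at the same loci with the same 0/1 allele coding, since recoding one locus flips the sign of its r values. `ld_phase_conservation` is a hypothetical helper.

```python
import numpy as np

def ld_phase_conservation(geno_dent, geno_flint):
    """Compare signed LD between two groups genotyped at the same loci.

    Returns (a) the correlation between dent and flint r values and
    (b) the correlation between their signs, over all locus pairs.
    """
    with np.errstate(invalid="ignore", divide="ignore"):
        r_d = np.corrcoef(geno_dent, rowvar=False)
        r_f = np.corrcoef(geno_flint, rowvar=False)
    i, j = np.triu_indices(r_d.shape[0], k=1)
    a, b = r_d[i, j], r_f[i, j]
    ok = np.isfinite(a) & np.isfinite(b)  # drop pairs involving a monomorphic locus
    a, b = a[ok], b[ok]
    corr_r = np.corrcoef(a, b)[0, 1]
    corr_sign = np.corrcoef(np.sign(a), np.sign(b))[0, 1]
    return float(corr_r), float(corr_sign)

rng = np.random.default_rng(0)
dent = rng.integers(0, 2, size=(40, 12))
same = ld_phase_conservation(dent, dent.copy())  # identical groups: both equal 1
flint = dent.copy()
flint[:, 0] = 1 - flint[:, 0]                     # recode one locus: phases flip
flipped = ld_phase_conservation(dent, flint)
```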

https://doi.org/10.1371/journal.pgen.1008241.s006

(TIF)

S7 Fig. Position of QTLs detected for FF.

Position of QTLs detected for FF with an FDR of 20% using (a) M1, (b) M2 and (c) M3. The size of the grey dots is proportional to the -log10(pval) of the test at the most significant SNP of the region.

https://doi.org/10.1371/journal.pgen.1008241.s007

(TIF)

S1 Table. Parameters estimated in the phenotypic analysis.

The lines “Row-Column” refer to the modeling of rows and columns as defined by the experimental design. AR1 refers to the autoregressive model AR1, while IID refers to the modeling of rows and columns as being independent and identically distributed among rows and among columns for a given trial. For more information, see the ASReml-R reference manual by [49]. The mean of each trial j (with j ∈ {2015, 2016}) was computed following: where Nk is the number of individuals (genotypes) in genetic background k (with k ∈ {D, A, F}) and N is the total number of individuals. The mean of each genetic background was computed following: . The genetic variance of each genetic background k and the GxE variance of each genetic background k in each trial j were also reported. The heritabilities of each genetic background k were computed as: where is the mean number of genotype replicates in trial j.

https://doi.org/10.1371/journal.pgen.1008241.s014

(XLSX)

S2 Table. Parameters estimated using the general polygenic model.

The parameters included the mean μk and the genetic variance of each genetic background k, the genetic covariance between genetic backgrounds k and k′, and the error variance, with k ∈ {D, A, F}. The genetic correlations rkk′ between genetic backgrounds were also reported, computed as the genetic covariance between k and k′ divided by the square root of the product of their genetic variances.
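A genetic correlation between two backgrounds is the ratio of their genetic covariance to the product of their genetic standard deviations. The sketch below applies this to a full covariance matrix. The numbers are made up for illustration and are not values from S2 Table. `genetic_correlations` is a hypothetical helper.

```python
import numpy as np

def genetic_correlations(G):
    """Genetic correlations r_kk' = cov_kk' / sqrt(var_k * var_k') from a
    (backgrounds x backgrounds) genetic covariance matrix."""
    G = np.asarray(G, dtype=float)
    sd = np.sqrt(np.diag(G))
    return G / np.outer(sd, sd)

# Made-up (co)variances for backgrounds ordered D, A, F (not values from S2 Table)
G = [[4.0, 1.2, 0.6],
     [1.2, 2.25, 0.9],
     [0.6, 0.9, 1.0]]
print(np.round(genetic_correlations(G), 2))
```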

https://doi.org/10.1371/journal.pgen.1008241.s015

(XLSX)

S3 Table. Information regarding significant SNPs for MF.

Information regarding significant SNPs for MF using all GWAS strategies: the name of the SNP, the chromosome on which it is located, its position in bp along the chromosome, the frequency of the allelic state observed in the dataset in which it was tested, the GWAS model applied, the hypothesis tested, the estimated values of the contrast (Delta), the Wald statistics and the -log10(pval) of the test (obtained from the approximate model for M2 and M3), and the FDR for which it was declared significant.

https://doi.org/10.1371/journal.pgen.1008241.s016

(XLSX)

S4 Table. Information regarding significant SNPs for FF.

Information regarding significant SNPs for FF using all GWAS strategies: the name of the SNP, the chromosome on which it is located, its position in bp along the chromosome, the frequency of the allelic state observed in the dataset in which it was tested, the GWAS model applied, the hypothesis tested, the estimated values of the contrast (Delta), the Wald statistics and the -log10(pval) of the test (obtained from the approximate model for M2 and M3), and the FDR for which it was declared significant.

https://doi.org/10.1371/journal.pgen.1008241.s017

(XLSX)

S5 Table. Additive, epistatic and residual variance components for each trait, with the p-value (pval) of the epistatic component from a likelihood-ratio (LR) test.

The existence of epistasis can be investigated using a test based on variance components. The epistatic variance component between pairs of loci was estimated on the joint dent, flint and admixed dataset using a model neglecting genetic structure: $\mathbf{y} = \mathbf{1}\mu + \mathbf{g} + \mathbf{g}_e + \mathbf{e}$, where $\mathbf{y}$ is the vector of phenotypes, $\mathbf{1}$ is a vector of ones, $\mu$ is the global intercept, $\mathbf{g}$ is the vector of additive genetic values with $\mathbf{g} \sim \mathcal{N}(\mathbf{0}, \sigma^2_g \mathbf{K})$, $\mathbf{K}$ is the kinship matrix computed following Eq (2) and assuming a common genetic background for all individuals (i.e. using the average frequency of allele 1 at each locus), $\sigma^2_g$ is the global genetic variance, $\mathbf{g}_e$ is the vector of global epistatic deviations with $\mathbf{g}_e \sim \mathcal{N}(\mathbf{0}, \sigma^2_{ge} \mathbf{K} \circ \mathbf{K})$, $\sigma^2_{ge}$ is the epistatic genetic variance between pairs of loci, $\mathbf{e}$ is the vector of errors with $\mathbf{e} \sim \mathcal{N}(\mathbf{0}, \sigma^2_e \mathbf{I})$, $\mathbf{I}$ is the identity matrix, and $\sigma^2_e$ is the error variance. Note that $\mathbf{K} \circ \mathbf{K}$ is the Hadamard product of the kinship matrix with itself. This model can be seen as a simplified version of the one proposed by [68], as purely homozygous lines were used. The epistatic variance component was tested using an LR test between this model and the same model without the term $\mathbf{g}_e$.
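The variance-component LR test described for S5 Table can be sketched as follows. Several assumptions apply. A generic centered kinship stands in for Eq (2), which is not reproduced in this section. Parameters are fitted by maximum likelihood, not REML, with a bounded optimizer on the log-variance scale. The p-value uses a 50:50 mixture of χ²₀ and χ²₁, a common correction for testing a variance on the boundary that is not stated in the source. All function names are hypothetical.

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve
from scipy.optimize import minimize
from scipy.stats import chi2

def kinship(Z):
    """Centered kinship for homozygous lines coded 0/1 (lines x loci), using the
    average frequency of allele 1 at each locus. Generic stand-in for Eq (2)."""
    p = Z.mean(axis=0)
    poly = (p > 0) & (p < 1)  # drop monomorphic loci
    W = Z[:, poly] - p[poly]
    return W @ W.T / np.sum(p[poly] * (1 - p[poly]))

def neg_loglik(log_vars, y, kernels):
    """-log L for y ~ N(1 mu, sum_k s_k K_k + s_e I), with mu profiled out by GLS.
    log_vars holds log(s_k) for each kernel, then log(s_e)."""
    n = y.size
    V = np.exp(log_vars[-1]) * np.eye(n)
    for log_s, K in zip(log_vars[:-1], kernels):
        V = V + np.exp(log_s) * K
    cf = cho_factor(V)
    ones = np.ones(n)
    mu = (ones @ cho_solve(cf, y)) / (ones @ cho_solve(cf, ones))
    r = y - mu
    logdet = 2.0 * np.sum(np.log(np.diag(cf[0])))
    return 0.5 * (logdet + r @ cho_solve(cf, r) + n * np.log(2.0 * np.pi))

def min_neg_loglik(y, kernels):
    """Minimize -log L over the variance components (log scale, bounded)."""
    k = len(kernels) + 1
    start = np.full(k, np.log(np.var(y) / k))
    res = minimize(neg_loglik, start, args=(y, kernels),
                   method="L-BFGS-B", bounds=[(-10.0, 10.0)] * k)
    return float(res.fun)

def epistasis_lr_test(y, Z):
    """LR test of the epistatic term: full model (K, K o K) vs null model (K)."""
    K = kinship(Z)
    KK = K * K  # Hadamard product K o K
    lr = max(0.0, 2.0 * (min_neg_loglik(y, [K]) - min_neg_loglik(y, [K, KK])))
    # Boundary-aware p-value: 50:50 mixture of chi2_0 and chi2_1 (assumption)
    pval = 1.0 if lr == 0.0 else 0.5 * chi2.sf(lr, df=1)
    return lr, pval

rng = np.random.default_rng(42)
Z = rng.integers(0, 2, size=(60, 200))  # simulated homozygous genotypes
y = rng.normal(size=60)                 # simulated phenotypes, no true epistasis
lr, p = epistasis_lr_test(y, Z)
```

Production analyses would typically rely on dedicated mixed-model software with REML. This sketch only illustrates the structure of the comparison.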

https://doi.org/10.1371/journal.pgen.1008241.s018

(XLSX)

S1 Appendix. Interpretation of the test .

This appendix shows that tests for an epistatic interaction between the SNP and the genetic background.

https://doi.org/10.1371/journal.pgen.1008241.s019

(PDF)

S2 Appendix. False discovery rate and statistical power of GWAS models.

In this appendix, the properties of the new GWAS models were evaluated in terms of false discovery rate and statistical power of the tests.

https://doi.org/10.1371/journal.pgen.1008241.s020

(PDF)

Acknowledgments

We thank Stéphane Nicolas (GQE—Le Moulon) for his contribution to genotypic data assembly. We thank Bernard Lagardère and Jean-René Loustalot (INRA Saint-Martin de Hinx) for their contribution to the panel assembly and the coordination of seed production, all the breeding company partners of the Amaizing project for the production of admixed lines, and the company Limagrain for the genotyping of admixed lines. We are grateful to partners of the CornFed project: Univ. Hohenheim (Germany), CSIC (Spain), CRAG (Spain), CRB Maize (France) and to MTA ATK (Hungary), NCRPIS (USA), CRA-MAC (Italy) who contributed to genetic material.

References

1. Yu J, Pressoir G, Briggs WH, Vroh Bi I, Yamasaki M, Doebley JF, et al. A unified mixed-model method for association mapping that accounts for multiple levels of relatedness. Nature Genetics. 2006;38:203–208. pmid:16380716
2. Wright S. Evolution in Mendelian populations. Genetics. 1931;16:97–159. pmid:17246615
3. Price AL, Patterson NJ, Plenge RM, Weinblatt ME, Shadick NA, Reich D. Principal components analysis corrects for stratification in genome-wide association studies. Nature Genetics. 2006;38:904–909. pmid:16862161
4. Rincent R, Moreau L, Monod H, Kuhn E, Melchinger AE, Malvar RA, et al. Recovering Power in Association Mapping Panels with Variable Levels of Linkage Disequilibrium. Genetics. 2014;197(1):375–387. pmid:24532779
5. Pritchard JK, Przeworski M. Linkage Disequilibrium in Humans: Models and Data. The American Journal of Human Genetics. 2001;69(1):1–14. pmid:11410837
6. Rogers AR. How Population Growth Affects Linkage Disequilibrium. Genetics. 2014;197(4):1329–1341. pmid:24907258
7. Sawyer SL, Mukherjee N, Pakstis AJ, Feuk L, Kidd JR, Brookes AJ, et al. Linkage disequilibrium patterns vary substantially among populations. European Journal Of Human Genetics. 2005;13:677–686. pmid:15657612
8. Evans DM, Cardon LR. A comparison of linkage disequilibrium patterns and estimated population recombination rates across multiple populations. The American Journal of Human Genetics. 2005;76:681–687. pmid:15719321
9. de Roos APWM, Hayes BJ, Spelman RJ, Goddard ME. Linkage disequilibrium and persistence of phase in Holstein-Friesian, Jersey and Angus cattle. Genetics. 2008;179:1503–1512. pmid:18622038
10. Porto-Neto LR, Kijas JW, Reverter A. The extent of linkage disequilibrium in beef cattle breeds using high-density SNP genotypes. Genetics Selection Evolution. 2014;46(1):22.
11. Badke YM, Bates RO, Ernst CW, Schwab C, Steibel JP. Estimation of linkage disequilibrium in four US pig breeds. BMC Genomics. 2012;13(1):24. pmid:22252454
12. Hao C, Wang L, Ge H, Dong Y, Zhang X. Genetic Diversity and Linkage Disequilibrium in Chinese Bread Wheat (Triticum aestivum L.) Revealed by SSR Markers. PLOS ONE. 2011;6(2):1–13.
13. Van Inghelandt D, Reif JC, Dhillon BS, Flament P, Melchinger AE. Extent and genome-wide distribution of linkage disequilibrium in commercial maize germplasm. Theoretical and Applied Genetics. 2011;123(1):11–20. pmid:21404061
14. Technow F, Riedelsheimer C, Schrag TA, Melchinger AE. Genomic prediction of hybrid performance in maize with models incorporating dominance and population specific marker effects. Theoretical and Applied Genetics. 2012;125(6):1181–1194. pmid:22733443
15. Bouchet S, Servin B, Bertin P, Madur D, Combes V, Dumas F, et al. Adaptation of Maize to Temperate Climates: Mid-Density Genome-Wide Association Genetics and Diversity Patterns Reveal Key Genomic Regions, with a Major Contribution of the Vgt2 (ZCN8) Locus. PLOS ONE. 2013;8(8):1–17.
16. Rincent R, Nicolas S, Bouchet S, Altmann T, Brunel D, Revilla P, et al. Dent and Flint maize diversity panels reveal important genetic potential for increasing biomass production. Theoretical and Applied Genetics. 2014;127(11):2313–2331. pmid:25301321
17. Stryjecki C, Alyass A, Meyre D. Ethnic and population differences in the genetic predisposition to human obesity. Obesity Reviews. 2018;19(1):62–80. pmid:29024387
18. Tang H. Confronting ethnicity-specific disease risk. Nature Genetics. 2006;38(1):12–15.
19. Helgadottir A, Manolescu A, Helgason A, Thorleifsson G, Thorsteinsdottir U, Gudbjartsson DF, et al. A variant of the gene encoding leukotriene A4 hydrolase confers ethnicity-specific risk of myocardial infarction. Nature Genetics. 2006;38(1):68–74. pmid:16282974
20. Barroso I, Luan J, Wheeler E, Whittaker P, Wasson J, Zeggini E, et al. Population-Specific Risk of Type 2 Diabetes Conferred by HNF4A P2 Promoter Variants. Diabetes. 2008;57(11):3161–3165. pmid:18728231
21. Neuman RJ, Wasson J, Atzmon G, Wainstein J, Yerushalmi Y, Cohen J, et al. Gene-Gene Interactions Lead to Higher Risk for Development of Type 2 Diabetes in an Ashkenazi Jewish Population. PLOS ONE. 2010;5(3):1–6.
22. Buckler ES, Holland JB, Bradbury PJ, Acharya CB, Brown PJ, Browne C, et al. The Genetic Architecture of Maize Flowering Time. Science. 2009;325(5941):714–718. pmid:19661422
23. Durand E, Bouchet S, Bertin P, Ressayre A, Jamin P, Charcosset A, et al. Flowering Time in Maize: Linkage and Epistasis at a Major Effect Locus. Genetics. 2012;190(4):1547–1562. pmid:22298708
  24. Evangelou E, Ioannidis JPA. Meta-analysis methods for genome-wide association studies and beyond. Nature Reviews Genetics. 2013;14:379–389.
  25. Li YR, Keating BJ. Trans-ethnic genome-wide association studies: advantages and challenges of mapping in diverse populations. Genome Medicine. 2014;6(10):91. pmid:25473427
  26. Ioannidis JPA, Ntzani EE, Trikalinos TA. ‘Racial’ differences in genetic effects for complex diseases. Nature Genetics. 2004;36(12):1312–1318. pmid:15543147
  27. Marigorta UM, Navarro A. High Trans-ethnic Replicability of GWAS Results Implies Common Causal Variants. PLOS Genetics. 2013;9(6):1–13.
  28. Ntzani EE, Liberopoulos G, Manolio TA, Ioannidis JPA. Consistency of genome-wide associations across major ancestral groups. Human Genetics. 2012;131(7):1057–1071. pmid:22183176
  29. Cole JB, VanRaden PM, O’Connell JR, Van Tassell CP, Sonstegard TS, Schnabel RD, et al. Distribution and location of genetic effects for dairy traits. Journal of Dairy Science. 2009;92(6):2931–2946. pmid:19448026
  30. Hayes BJ, Pryce J, Chamberlain AJ, Bowman PJ, Goddard ME. Genetic Architecture of Complex Traits and Accuracy of Genomic Prediction: Coat Colour, Milk-Fat Percentage, and Type in Holstein Cattle as Contrasting Model Traits. PLOS Genetics. 2010;6(9):1–11.
  31. Cole JB, Wiggans GR, Ma L, Sonstegard TS, Lawlor TJ, Crooker BA, et al. Genome-wide association analysis of thirty one production, health, reproduction and body conformation traits in contemporary U.S. Holstein cows. BMC Genomics. 2011;12(1):408. pmid:21831322
  32. Raven LA, Cocks BG, Hayes BJ. Multibreed genome wide association can improve precision of mapping causative variants underlying milk production in dairy cattle. BMC Genomics. 2014;15(1):62. pmid:24456127
  33. van den Berg I, Boichard D, Lund MS. Comparing power and precision of within-breed and multibreed genome-wide association studies of production traits using whole-genome sequence data for 5 French and Danish dairy cattle breeds. Journal of Dairy Science. 2016;99(11):8932–8945. pmid:27568046
  34. Sanchez MP, Govignon-Gion A, Croiseau P, Fritz S, Hozé C, Miranda G, et al. Within-breed and multi-breed GWAS on imputed whole-genome sequence variants reveal candidate mutations affecting milk protein composition in dairy cattle. Genetics Selection Evolution. 2017;49(1):68.
  35. Flint-Garcia SA, Thuillet AC, Yu J, Pressoir G, Romero SM, Mitchell SE, et al. Maize association population: a high-resolution platform for quantitative trait locus dissection. The Plant Journal. 2005;44(6):1054–1064. pmid:16359397
  36. Camus-Kulandaivelu L, Veyrieras JB, Madur D, Combes V, Fourmann M, Barraud S, et al. Maize Adaptation to Temperate Climate: Relationship Between Population Structure and Polymorphism in the Dwarf8 Gene. Genetics. 2006;172(4):2449–2463. pmid:16415370
  37. Romay MC, Millard MJ, Glaubitz JC, Peiffer JA, Swarts KL, Casstevens TM, et al. Comprehensive genotyping of the USA national maize inbred seed bank. Genome Biology. 2013;14(6):R55. pmid:23759205
  38. Rius M, Darling JA. How important is intraspecific genetic admixture to the success of colonising populations? Trends in Ecology & Evolution. 2014;29(4):233–242. https://doi.org/10.1016/j.tree.2014.02.003.
  39. Brandenburg JT, Mary-Huard T, Rigaill G, Hearne SJ, Corti H, Joets J, et al. Independent introductions and admixtures have contributed to adaptation of European maize and its American counterparts. PLOS Genetics. 2017;13(3):1–30.
  40. McMullen MD, Kresovich S, Villeda HS, Bradbury P, Li H, Sun Q, et al. Genetic Properties of the Maize Nested Association Mapping Population. Science. 2009;325(5941):737–740. pmid:19661427
  41. Cavanagh C, Morell M, Mackay I, Powell W. From mutations to MAGIC: resources for gene discovery, validation and delivery in crop plants. Current Opinion in Plant Biology. 2008;11(2):215–221. https://doi.org/10.1016/j.pbi.2008.01.002. pmid:18295532
  42. Rio S, Mary-Huard T, Moreau L, Charcosset A. Genomic selection efficiency and a priori estimation of accuracy in a structured dent maize panel. Theoretical and Applied Genetics. 2019;132(1):81–96. pmid:30288553
  43. Bordes J, Dumas de Vaulx R, Lapierre A, Pollacsek M. Haplodiploidization of maize (Zea mays L.) through induced gynogenesis assisted by glossy markers and its use in breeding. Agronomie. 1997;17:291–297.
  44. Unterseer S, Bauer E, Haberer G, Seidel M, Knaak C, Ouzunova M, et al. A powerful tool for genome analysis in maize: development and evaluation of the high density 600 k SNP genotyping array. BMC Genomics. 2014;15(1):823. pmid:25266061
  45. Browning BL, Browning SR. A unified approach to genotype imputation and haplotype-phase inference for large data sets of trios and unrelated individuals. The American Journal of Human Genetics. 2009;84(2):210–223. pmid:19200528
  46. Ganal MW, Durstewitz G, Polley A, Bérard A, Buckler ES, Charcosset A, et al. A Large Maize (Zea mays L.) SNP Genotyping Array: Development and Germplasm Genotyping, and Genetic Mapping to Compare with the B73 Reference Genome. PLOS ONE. 2011;6(12):1–15.
  47. Salvi S, Sponza G, Morgante M, Tomes D, Niu X, Fengler KA, et al. Conserved noncoding genomic sequences associated with a flowering-time quantitative trait locus in maize. Proceedings of the National Academy of Sciences of the United States of America. 2007;104(27):11376–11381. pmid:17595297
  48. Ducrocq S, Madur D, Veyrieras JB, Camus-Kulandaivelu L, Kloiber-Maitz M, Presterl T, et al. Key Impact of Vgt1 on Flowering Time Adaptation in Maize: Evidence From Association Mapping and Ecogeographical Information. Genetics. 2008;178(4):2433–2437. pmid:18430961
  49. Butler DG, Cullis BR, Gilmour AR, Gogel BJ, Thompson R. ASReml-R Reference Manual Version 4; 2009. VSN International Ltd, Hemel Hempstead, HP1 1ES, UK.
  50. Wientjes YCJ, Bijma P, Vandenplas J, Calus MPL. Multi-population Genomic Relationships for Estimating Current Genetic Variances Within and Genetic Correlations Between Populations. Genetics. 2017;207(2):503–515. pmid:28821589
  51. VanRaden PM. Efficient Methods to Compute Genomic Predictions. Journal of Dairy Science. 2008;91(11):4414–4423. pmid:18946147
  52. Laporte F, Charcosset A, Mary-Huard T. Efficient ReML inference in Variance Component Mixed Models using Min-Max algorithms. 2019. Forthcoming.
  53. Benjamini Y, Hochberg Y. Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing. Journal of the Royal Statistical Society: Series B (Methodological). 1995;57(1):289–300.
  54. Salvi S, Corneti S, Bellotti M, Carraro N, Sanguineti MC, Castelletti S, et al. Genetic dissection of maize phenology using an intraspecific introgression library. BMC Plant Biology. 2011;11:4. pmid:21211047
  55. Salvi S, Emanuelli F, Soriano JM, Zamariola L, Giuliani S, Bovina R, et al. Cloning of Vgt3, a major QTL for flowering time in maize. In: 59th Annual Maize Genetics Conference; 2017.
  56. Revilla P, Rodríguez VM, Ordás A, Rincent R, Charcosset A, Giauffret C, et al. Association mapping for cold tolerance in two large maize inbred panels. BMC Plant Biology. 2016;16(1):127. pmid:27267760
  57. Buitenhuis B, Janss LL, Poulsen NA, Larsen LB, Larsen MK, Sørensen P. Genome-wide association and biological pathway analysis for milk-fat composition in Danish Holstein and Danish Jersey cattle. BMC Genomics. 2014;15(1):1112. pmid:25511820
  58. Buitenhuis B, Poulsen NA, Larsen LB, Sehested J. Estimation of genetic parameters and detection of quantitative trait loci for minerals in Danish Holstein and Danish Jersey milk. BMC Genetics. 2015;16(1):52. pmid:25989905
  59. Chardon F, Virlon B, Moreau L, Falque M, Joets J, Decousset L, et al. Genetic architecture of flowering time in maize as inferred from quantitative trait loci meta-analysis and synteny conservation with the rice genome. Genetics. 2004;168(4):2169–2185. pmid:15611184
  60. Li Yx, Li C, Bradbury PJ, Liu X, Lu F, Romay CM, et al. Identification of genetic variants associated with maize flowering time using an extremely large multi-genetic background population. The Plant Journal. 2016;86(5):391–402. pmid:27012534
  61. Giraud H, Lehermeier C, Bauer E, Falque M, Segura V, Bauland C, et al. Linkage Disequilibrium with Linkage Analysis of Multiline Crosses Reveals Different Multiallelic QTL for Hybrid Performance in the Flint and Dent Heterotic Groups of Maize. Genetics. 2014;198(4):1717–1734. pmid:25271305
  62. Meng X, Muszynski MG, Danilevskaya ON. The FT-Like ZCN8 Gene Functions as a Floral Activator and Is Involved in Photoperiod Sensitivity in Maize. The Plant Cell. 2011;23(3):942–960. pmid:21441432
  63. Lazakis CM, Coneva V, Colasanti J. ZCN8 encodes a potential orthologue of Arabidopsis FT florigen that integrates both endogenous and photoperiod flowering signals in maize. Journal of Experimental Botany. 2011;62(14):4833–4842. pmid:21730358
  64. Guo L, Wang X, Zhao M, Huang C, Li C, Li D, et al. Stepwise cis-Regulatory Changes in ZCN8 Contribute to Maize Flowering-Time Adaptation. Current Biology. 2018;28(18):3005–3015. pmid:30220503
  65. Liang Y, Liu Q, Wang X, Huang C, Xu G, Hey S, et al. ZmMADS69 functions as a flowering activator through the ZmRap2.7-ZCN8 regulatory module and contributes to maize flowering time adaptation. New Phytologist. 2019;221(4):2335–2347. pmid:30288760
  66. Chardon F, Hourcade D, Combes V, Charcosset A. Mapping of a spontaneous mutation for early flowering time in maize highlights contrasting allelic series at two-linked QTL on chromosome 8. Theoretical and Applied Genetics. 2005;112(1):1–11. pmid:16244856
  67. Bouché F, Lobet G, Tocquin P, Périlleux C. FLOR-ID: an interactive database of flowering-time gene networks in Arabidopsis thaliana. Nucleic Acids Research. 2015;44(D1):D1167–D1171. pmid:26476447
  68. Vitezica ZG, Legarra A, Toro MA, Varona L. Orthogonal Estimates of Variances for Additive, Dominance, and Epistatic Effects in Populations. Genetics. 2017;206(3):1297–1307. pmid:28522540
  69. Jannink JL. Identifying Quantitative Trait Locus by Genetic Background Interactions in Association Studies. Genetics. 2007;176(1):553–561. pmid:17179077
  70. Crawford L, Zeng P, Mukherjee S, Zhou X. Detecting epistasis with the marginal epistasis test in genetic mapping studies of quantitative traits. PLOS Genetics. 2017;13(7):1–37.
  71. Legarra A, Vitezica ZG, Naval-Sánchez M, Henshall J, Raidan F, Li Y, et al. Association analysis of loci implied in “buffering” epistasis. bioRxiv. 2019;637579.
  72. de Roos APW, Hayes BJ, Goddard ME. Reliability of Genomic Predictions Across Multiple Populations. Genetics. 2009;183(4):1545–1553. pmid:19822733
  73. Chen L, Schenkel F, Vinsky M, Crews DH, Li C. Accuracy of predicting genomic breeding values for residual feed intake in Angus and Charolais beef cattle. Journal of Animal Science. 2013;91(10):4669–4678. pmid:24078618
  74. Guo Z, Tucker DM, Basten CJ, Gandhi H, Ersoz E, Guo B, et al. The impact of population structure on genomic prediction in stratified populations. Theoretical and Applied Genetics. 2014;127(3):749–762. pmid:24452438
  75. Duhnen A, Gras A, Teyssèdre S, Romestant M, Claustres B, Dayde J, et al. Genomic Selection for Yield and Seed Protein Content in Soybean: A Study of Breeding Program Data and Assessment of Prediction Accuracy. Crop Science. 2017;57(3):1325–1337.