The genetic architecture of the maize progenitor, teosinte, and how it was altered during maize domestication

The genetics of domestication has been extensively studied ever since the rediscovery of Mendel’s law of inheritance and much has been learned about the genetic control of trait differences between crops and their ancestors. Here, we ask how domestication has altered genetic architecture by comparing the genetic architecture of 18 domestication traits in maize and its ancestor teosinte using matched populations. We observed a strongly reduced number of QTL for domestication traits in maize relative to teosinte, which is consistent with the previously reported depletion of additive variance by selection during domestication. We also observed more dominance in maize than teosinte, likely a consequence of selective removal of additive variants. We observed that large effect QTL have low minor allele frequency (MAF) in both maize and teosinte. Regions of the genome that are strongly differentiated between teosinte and maize (high FST) explain less quantitative variation in maize than teosinte, suggesting that, in these regions, allelic variants were brought to (or near) fixation during domestication. We also observed that genomic regions of high recombination explain a disproportionately large proportion of heritable variance both before and after domestication. Finally, we observed that about 75% of the additive variance in both teosinte and maize is “missing” in the sense that it cannot be ascribed to detectable QTL and only 25% of variance maps to specific QTL. This latter result suggests that morphological evolution during domestication is largely attributable to very large numbers of QTL of very small effect.

Introduction Ever since Charles Darwin employed domestication as a model for natural evolution [1], crop domestication has been extensively studied by evolutionary biologists [2][3][4][5][6][7]. Part of the power of domestication as a model for evolution is that the crop ancestors are extant and cross-compatible with crop species, allowing genetic dissection of the inheritance of domestication traits. Domestication also allows us to discern the impact of selection on genetic architecture through comparison of pre-and post-domestication populations. Since the genetic architecture of the ancestral species influences the possible outcomes when selection is applied to move a population to a new optimum, such comparisons can reveal how genetic architecture can both facilitate and constrain domestication [8].
Maize and teosinte offer an excellent system for genetic analysis of domestication. Both phylogenetic and archaeological evidence revealed that maize was domesticated from Balsas teosinte (Z. mays ssp. parviglumis) by a single domestication event in southern Mexico about 9,000 years ago [9,10]. As outcrossing species, teosinte and maize conform to many of the assumptions underlying standard population genetic models, empowering the application of diverse evolutionary analyses [11]. Maize has undergone dramatic morphological change from its wild ancestor teosinte during domestication. A typical teosinte plant has multiple long lateral branches, each tipped with a tassel, whereas a typical maize plant has one or two short branches, each tipped with a single ear. A teosinte plant produces many two-ranked ears, each with only a few fruitcase-enveloped kernels, easily shattering into single-seed units at maturity. In contrast, a maize plant produces only one or two multiple-ranked ears, each with hundreds of naked grains, remaining intact on the cob at maturity. Finally, maize has a large number of genomic and genetic resources available (https://maizegdb.org/), including high-quality genomes of inbred lines [12][13][14][15], high-density genotypic data [16], and diverse mapping populations [17,18].
In this paper, we report a comprehensive comparison of genetic architecture in 18 domestication traits for matched populations of a maize landrace and teosinte. Our overarching goal is to better define genetic architecture in maize and teosinte and ask how it changed as a result of domestication. From our analyses, we observed that genetic architecture has been reshaped during domestication with a substantial reduction in the number of segregating QTL affecting domestication traits. We observed that regions of the genome that are strongly divergent between teosinte and maize (high F ST ) explain less variation in maize than teosinte. We also observed that genomic regions of high recombination explain a disproportionately large proportion of heritable variance both before and after domestication. Finally, we observed that only 25% of the additive variance in both teosinte and maize can be ascribed to specific QTL. Overall, our work suggests that trait evolution during maize domestication is largely attributable to very large numbers of QTL of very small effects.

QTL and their effects
Our maize landrace population shows a strongly reduced number of QTL for domestication traits relative to our teosinte population, indicating that the domestication bottleneck and/or selection caused the loss or fixation of many functional alleles. Using a stepwise regression approach, we performed genome-wide QTL scans for 18 domestication traits scored in both our teosinte and maize landrace populations (Table 1). We detected a total of 451 QTL in teosinte (ranging from 3 to 52 per trait) but only 213 QTL in maize landrace (ranging from 0 to 27) (Fig 1; S1 Table). We grouped the 18 traits into three previously defined groups: vegetative/flowering time, reproductive, and environmental response traits [8]. Among the three predefined trait groups, reproductive traits showed the greatest reduction in the number of QTL, and vegetative traits a strong, but more modest reduction. Environmental response traits showed slightly more QTL in maize landrace than in teosinte, but the overall trend is a substantial reduction in QTL numbers after domestication. The most striking difference was observed for GW, for which we detected 52 QTL in teosinte but only 6 QTL in maize landrace (S1 Table). Large effect QTL were observed in both populations but appear more common in maize landrace. Here, we define large effect as a QTL with a standardized additive effect greater than 1 phenotypic standard deviation, where the additive effect was estimated by stepwise regression. Seven of the 451 QTL in teosinte have a large effect and 12 of the 213 QTL in maize landrace have a large effect (Fig 2; S1 Table). The frequency of 5.6% in maize landrace is significantly larger than 1.6% in teosinte (P = 0.0052, two-sided Fisher's exact test). Thus, large effect QTL occur in both populations but they are more common (both in frequency and total number) in maize landrace. One example of a large effect QTL is a GW QTL on chromosome 4 in teosinte (T/A, S4_184126058, P = 6.38E-50) (Fig 3A), where this QTL has an additive effect of 8.7 mg in GW, which corresponds to a 34% increase from the population mean of 25.6 mg or 1.61 phenotypic standard deviations. Another example is a major effect QTL for PLHT on chromosome 7 in maize landrace (C/T, S7_3086083, P = 4.10E-28) (Fig 3B), where this QTL has an additive effect of 44.3 cm in PLHT, which corresponds to a 19% decrease from the population mean of 230.5 cm or 1.34 phenotypic standard deviations.
Overall, large effect QTL have low MAF for both teosinte and maize landrace, suggesting mutation-selection balance may play an important role in modulating genetic variation in both taxa. By plotting the distribution of standardized additive effect against MAF for all QTL (Fig 2), we observed that the largest effect QTL have low MAF in both teosinte and maize landrace. The rarest alleles (parent MAF < 0.05) have effects that are in a range from 0.01 to 4.05 phenotypic standard deviations, while the effects of the most common alleles (parent MAF > 0.3) never exceed 0.3 phenotypic standard deviations. This is true of both teosinte and maize landrace. For example, the largest effect QTL for GW in teosinte (T/A, S4_184126058, P = 6.38E-50) has a MAF of 0.012 and the minor allele comes from a single one of our 49 teosinte parents (PC_N57_ID2) (Fig 3A). Similarly, the largest effect QTL for PLHT in maize landrace (C/T, S7_3086083, P = 4.10E-28) has a MAF of 0.032 and the minor allele  comes from a single one of our 40 maize landrace parents (164_1) (Fig 3B). These observations suggest that most segregating large effect alleles are likely the product of relatively recent mutation and selection has not yet had sufficient time to either remove them from the population or bring them to fixation.
A concern with this analysis is that effect estimates for rare alleles may be biased upwards because of the relatively small sample size for rare alleles which may cause these QTL to only be detected when their effects are overestimated. To address this concern, we plotted the distribution of standardized additive effect for SNPs whether the SNP is significant or not by each trait, where the additive effects were estimated from our GLM model. Also, to minimize error in the estimation of the effects of rare allele, we required a minimum of 10 homozygous plants for the rare allele. Examination of the plot of MAF by effect size this way still indicates that rare alleles have larger effects for all 18 traits (S1 Fig). This result indicates that bias in the estimate of rare allele effect sizes cannot fully explain the observed relationship between effect size and MAF.
Balancing selection may maintain some genetic variation in both teosinte and maize landrace. Examination of the plots of MAF by effect size shows a few QTL with high frequency and moderately strong effects (Fig 2). Such alleles could be maintained in the population by balancing selection. An interesting example of this is a QTL on chromosome 3 for DTA in teosinte (C/T, S3_160586402, P = 9.64E-16), where the MAF is 0.48 and the additive effect is 0.19 phenotypic standard deviations (Fig 4A). This pattern fits the expectation of balancing selection. Similarly, we observed a QTL on chromosome 1 for GW in maize landrace (A/G, S1_44817082, P = 1.75E-16), where the MAF is 0.45 and the additive effect is 0.25 phenotypic standard deviations (Fig 4B), suggesting this QTL may be under balancing selection.
There is more dominance in maize landrace than teosinte, which may be related to the depletion of additive variance by selection during domestication. Previously, Yang et al. [8] reported that there is more dominance genetic variance than additive variance in maize landrace relative to teosinte, and these authors suggested that this pattern may be the result of selection during domestication having depleted the additive variance. Here, by plotting the ratio of dominance to additive effects (D/A) (Fig 5A), we observed a small shift to greater dominance in maize landrace than in teosinte. The mean absolute value of the D/A ratio is 0.76 in teosinte (ranging from 0.001 to 11.21), while the mean absolute value of the D/A ratio is 1.11 in maize landrace (ranging from 0.01 to 12.77). Moreover, the cumulative distribution plot of D/A shows a significant difference in the two distributions (Fig 5B; P = 1.12E-05, Kolmogorov-Smirnov test). The increase of the D/A ratio in maize landrace relative to teosinte suggests that additive variants were more likely fixed or lost during domestication and thus aligns with the observation of Yang et al. [8].

Candidate genes for QTL in teosinte and maize landrace
Our analyses identify multiple strong candidate genes in both teosinte and maize landrace. Although our population design and stepwise regression analysis provide only limited precision for mapping QTL to the underlying genes, we saw multiple well-defined peaks in the Manhattan plots from our GLM scan of the genome, allowing us to identify candidate genes associated with the most significant QTL peaks (P<1E-20) ( Table 2). There were 9 such QTL in teosinte and 7 QTL in maize landrace. In teosinte, potential candidate genes were defined for DTS, PLHT, LFWD, PROL, ED and GW. For example, the largest effect QTL for DTS is located on chromosome 3 (G/A, S3_161034847, P = 4.87E-40) upstream of SBP-transcription factor 22 and a known flowering time gene MADS-transcription factor 69 (ZmMADS69, Zm00001d042315). Interestingly, the largest QTL for DTA (C/T, S3_160586402, P = 9.64E-16) is nearby, located downstream of ZmMADS69. For GW, the largest effect QTL is located on chromosome 4 (T/A, S4_184126058, P = 6.38E-50) upstream of AP2-EREBP-transcription factor 17. A second interesting candidate is the next gene downstream, trehalose-6-phosphate   [19]. For LFWD, the largest effect QTL is located on chromosome 6 (A/C, S6_170102149, P = 1.73E-25) in a gene encoding 2-oxoglutarate-dependent dioxygenase AOP1, but there is another gene nearby, tassel sheath1 (tsh1, Zm00001d039113), known to effect leaf size. For PLHT, the largest effect QTL is located on chromosome 1 (C/T, S1_156526333, P = 6.45E-28) upstream of a gene encoding Ubox domain-containing protein 13. There is another gene nearby, rough sheath2 (rs2, Zm00001d030737), which encodes a MYB-domain protein and is expressed in lateral organ primordia and their initials. This gene has been reported to affect plant architecture including plant height [20].
In maize landrace, such large effect QTL were detected for PLHT, TILN, EILN and TGWP. For example, the largest effect QTL for PLHT is located on chromosome 7 (C/T, S7_3086083, P = 4.10E-28) just upstream of a gene encoding phosphoprotein phosphatase inhibitors. As was also true for the major PLHT QTL in teosinte, there is a rough sheath1 gene (rs1, Zm00001d018742) nearby [21]. For EILN, the largest effect QTL is located on chromosome 8 (A/C, S8_160608498, P = 1.84E-25) upstream of a gene encoding SUPPRESSOR OF ABI3-5. For TILN, the QTL on chromosome 2 (C/T, S2_5692647, P = 1.17E-23) is located upstream of a gene encoding a zinc finger domain-containing protein.

Teosinte and maize landrace have a similar level of missing heritability
The variation in our populations can be attributed to both QTL of effect sizes that are large enough to be detected by our QTL scan, as well as very small effect QTL that escape detection but still contribute to heritable variation. The latter control the "missing heritability". By fitting linear models with QTL and/or covariates, we obtained the R 2 values for the QTL alone, family effects (Principal Components, PCs) alone, and R 2 values for variance explained by neither QTL nor family effects. We defined "missing heritability" as the portion of the additive variation not explained by the QTL. We compared these values to the additive genetic variance of each trait reported by Yang et al. [8].
Most heritable variation is "missing" in both teosinte and maize landrace, suggesting that undetectable small effect QTL predominate in the genetic architecture of these populations. We observed that detectable QTL could explain only 26% and 21% of additive genetic variance for teosinte and maize landrace, respectively, signifying that the remaining variation (74% and 79%, respectively) is missing (Fig 6A; S2 Table). This result indicates that QTL of very small effect explain most of the variance in both teosinte and maize landrace and that domestication did not alter the amount of missing heritability appreciably. Part of the missing heritability is captured by family effects (PCs in our stepwise regression model), corresponding to 46% for teosinte and 51% for maize landrace. These percentages may represent functional alleles that are unique to individual parents (rare alleles). Finally, about 17% and 26% of additive genetic variance in teosinte and maize landrace, respectively, cannot be associated with either QTL or family effects and may represent very small effect common alleles that occur in multiple families in our populations.
If alleles of very small effects distributed evenly throughout the genome are predominant in controlling heritable variation, then we should observe a correlation between chromosome length and the proportion of the variance explained. We performed variance component analysis (VCA) for 18 traits by chromosomes. The results show that the proportion of variance explained by chromosome is highly correlated with the length of chromosome for both teosinte (r = 0.84, P = 0.0024) and maize landrace (r = 0.79, P = 0.0069) (Fig 6B), a result consistent with an important role for small effect alleles explaining most heritable variance.

Partitioning of the genetic variance by genomic features
To estimate the proportion of trait genetic variance associated with different features of the genome, we performed variance component analysis for 18 traits in both teosinte and maize landrace. This analysis involved classifying SNPs into subsets depending upon different genomic features, generating kinship matrices for each subset, and fitting these kinship matrices along with the phenotypic data to partition the genetic variance (heritability, h 2 ).
High recombination regions explain most heritable variance in both teosinte and maize landrace. To assess how recombination would affect the heritable variance, we calculated the recombination rate in windows of 10 kb, defined quintiles of the 10 kb recombination rates after accounting for gene density, and used SNPs in each quintile to calculate kinship matrices in teosinte and maize landrace, respectively. By comparing the relative proportion of additive genetic variance across 18 traits, we observed that regions with the highest recombination rate (top quintile) explain a very large proportion of the additive genetic variance on average in both teosinte and maize landrace although they represent only 20% of the genome (Fig 7A), suggesting that high recombination regions harbored most of the variation upon which selection could act during domestication. While variance in these regions was somewhat depleted by domestication, it remains high in maize landrace.
Regions of the genome that are most divergent between teosinte and maize landrace explain more heritable variance in teosinte than in maize landrace. To assess the divergence between teosinte and maize landrace, we calculated F ST [22] using 32.5 million SNPs that were either segregating in teosinte or maize landrace and obtained an average F ST value in windows of 50 SNPs. The 50-SNP F ST value ranges from 0 to 0.77 and has an average of 0.07. For VCA, we defined quintiles of F ST and used SNPs in each quintile to calculate kinship matrices in teosinte and maize landrace, respectively. By comparing the relative proportion of additive genetic variance across 18 traits, we observed that the most divergent bin explains 8% of the additive genetic variance on average in teosinte, which is more than~3% in maize landrace (Fig 7B), suggesting that divergence is associated with depletion of additive genetic variants in the maize landrace.

The inheritance of other traits in teosinte
Beyond the QTL analysis for 18 common core domestication traits in teosinte and maize landrace, we also identified QTL for seven additional traits that were scored in teosinte but not in maize landrace. These traits include BRAN, CULM, FCLN, FCLW, FCTR, SDDM and STAM (S3 Table). We detected a total of 222 QTL, ranging from 6 to 56 per trait (S1 Table). For these traits, we also observed that large effect QTL have low MAF (S2 Fig). For example, we detected a total of 19 QTL for CULM, which is a domestication trait related to vegetative gigantism. There were two large effect QTL for CULM which have a standardized additive effect greater than 1 phenotypic standard deviation (S1 Table). The largest effect QTL was detected on chromosome 1 (G/A, S1_178995091, P = 1.14E-27) but there are no obvious candidate genes nearby. Interestingly, we detected a QTL on chromosome 1 (G/A, S1_75448410, P = 3.06E-09) which is located within the inversion on chromosome 1 (Inv1n, B73 RefGen v4: 65.9-116.7 Mb) where a CULM QTL had previously been mapped [23].
A major change during maize domestication is the loss of seed dormancy from teosinte to maize. We detected six QTL for seed dormancy (SDDM). Among the QTL, there were three large effect QTL which have a standardized additive effect greater than 1 phenotypic standard deviation (S1 Table). The largest effect QTL was detected on chromosome 6 (G/T, S6_151838921, P = 1.10E-22) but there are no obvious candidate genes nearby. Interestingly, none of the six QTL exceed a MAF of 0.03, suggesting mutation-selection balance may play an important role in modulating genetic variation for SDDM. The common allele at five of the six QTL is associated with rapid germination.

Discussion
Our results showed that the maize landrace population has far fewer QTL for domestication traits as compared to our teosinte population. Previous reports suggest that most domesticated crops have experienced a "domestication bottleneck" which reduced their nucleotide diversity relative to their wild ancestors [24]. For maize, there has been on average a reduction in nucleotide diversity of 30% as compared to teosinte and this reduction becomes much more severe (>65%) for genes under selection [25,26]. Given that the domestication bottleneck and/or selection would cause the loss or fixation of many functional alleles at QTL, it was expected to find a reduced number of QTL in our maize landrace relative to teosinte.
Our observation of more and therefore novel QTL in teosinte compared to maize implies that teosinte should possess many useful alleles absent in maize. This represents an opportunity for breeding, as crop wild relatives have been extremely valuable resource to identify allelic variation by genome-wide association studies (GWAS) or quantitative trait locus (QTL) mapping approaches, and these alleles, if favorable, can be introduced into a domesticated background by marker-assisted selection [27,28]. For maize-teosinte, a BC 2 S 3 RIL (recombinant inbred line) population [29] and TeoNAM (teosinte nested association mapping) population [17] have been developed to comprehensively map QTL from teosinte. Using this BC 2 S 3 population, a recent study demonstrated that fine-mapping QTL for maize leaf angle and then transferring the favorable teosinte allele into modern hybrids can enhance high-density maize grain yield [30]. Therefore, the favorable alleles of QTL that we detected in our teosinte population could be introduced into maize cultivars in the future.
Our observation that large effect alleles in both teosinte and maize landrace are rare fits expectations under mutation-selection balance [11]. Rare alleles of large effect would be rapidly removed if detrimental or driven to fixation if beneficial due to strong selection, and thus not be observed at middling frequencies. In contrast, alleles of very small effect experience weaker selection such that they move more slowly to fixation and can exhibit middling frequencies [31,32]. Height in humans is an example from the literature of a trait apparently under mutation-selection balance, showing an inverse relationship between effect size and allele frequency [33]. Such an inverse relationship is also observed from a QTL study of inflorescence traits in maize, in which the number of large-effect QTL is very few and caused by low-frequency SNPs [34]. In our study, one example of a rare allele of large effect is the QTL for GW in teosinte (T/A, S4_184126058, P = 6.38E-50), where the MAF is 0.012 among the progeny and the minor allele comes from a single one of our 49 teosinte parents (PC_N57_ID2) (Fig 3A). Overall, the negative correlation between effect size and MAF in our data suggest an important role for mutation-selection balance in teosinte and maize.
While mutation-selection may be the principal force operating on QTL, we observed some hints that balancing selection may be present as well. In contrast to the expectation that large effect alleles are rare under mutation-selection balance, allele frequencies are expected to be more intermediate if balancing selection is operating [11]. An example of possible balancing selection in our data is the largest effect QTL for DTA in teosinte on chromosome 3 near ZmMADS69 with an MAF of 0.48 (Fig 4A). Flowering time is a trait for which selection pressure may vary from year to year due to annual variation in rainfall and temperature [35]. In this situation, allelic fitness will oscillate from year to year, preventing segregating functional alleles at the locus from either being lost or fixed. However, we note that stabilizing selection on a phenotype which has recently shifted its optimum can also explain high frequency large effect alleles without invoking balancing selection [36].
For seed dormancy in teosinte, we expected to observe evidence for balancing selection in the form of functional alleles maintained at a high frequency since adaptation should favor a mix of phenotypes with seeds that germinate over a broad temporal range. Seed dormancy desynchronizes germination such that not all seed germinates the following season when environmental conditions for plant growth may be unsuitable in some years. Thus, selection should favor allelic diversity in wild species such that at least some seeds will be programed for delayed germination until future growth seasons [37]. Contrary to this expectation, we observed all six QTL detected for seed dormancy do not exceed a MAF of 0.03 which better fits a mutation-selection balance mechanism. In domesticated cultivars, it is desirable to have uniform germination to ensure synchronous maturity and therefore efficient harvesting by farmers [38,39]. Accordingly, there is no seed dormancy in maize, and seeds, once planted, germinate readily.
Although large-effect QTL are rare, we observed the largest allele effects in maize (Fig 2). This reflects the differences in demography and selective forces acting on maize compared to teosinte. Teosinte traits are under selection because of their genetic correlations with fitness. Human selection in maize includes some similar fitness components (such as viability and fecundity), but the genetic correlations among maize traits are generally lower than among teosinte traits [8], and humans also select strongly for ear and kernel type, for example, possibly at the expense of natural fitness traits [40,41]. Therefore, selection coefficients on fitnessrelated traits in maize are likely to be lower than in teosinte, reducing the effectiveness of selection on fixing or purging large-effect alleles. In combination with reduced selection, maize populations are managed by humans, who tend to plant seed from relatively few female parents [42], leading to greater genetic drift; but also migration is likely much more important in maize landraces than in teosinte because of active seed exchange among Mexican farmers [43,44]. Recent migration could introduce novel alleles that were already fixed in other maize landraces, resulting in their segregation in the study population.
Another possibility is that the effective population size of our maize landrace is smaller than that of our teosinte population, so that selection would be more effective in teosinte. According to the Watterson estimator θ = 4N e μ, where N e is the effective population size and μ is the pergeneration mutation rate of the population [45], we calculated θ T for the teosinte population and θ M for the maize population using the common set of 4.2 million SNPs. The results show the mean Watterson's θ in maize is 89.8% smaller than in teosinte (θ M = 3.89E-04; θ T = 4.33E-04). If we use the estimates of mutation rate in maize, which is around 3×10 −8 per site per generation [46,47], then N e in our maize landrace is smaller than N e in our teosinte population, thus selection would be less effective in maize landrace and cannot bring large effect alleles into loss or fixation.
We observed more dominance in maize landrace than teosinte as shown by an increase in D/A ratio (Fig 5), suggesting that the additive genetic variants were depleted at a faster rate than the dominance genetic variants during domestication. This is consistent with the observation of Yang et al. [8] that selection during domestication depleted the additive variance relative to the dominance variance. Population genetic theory has shown that traits that were subject to directional selection have significantly higher dominance estimates than traits that are not subject to strong selection [48,49]. This theory aligns with our observation that maize landrace, which has been subject to strong selection, has an excess of dominant variants relative to teosinte for domestication traits.
Genomic scans can help to identify candidate genes for follow-up analyses. Our stepwise regression approach detected several strong candidate genes in both teosinte and maize landrace. (1) We detected a major effect QTL on chromosome 3 for flowering time in teosinte near ZmMADS69, which is under balancing selection. Several recent studies using maize-teosinte mapping populations have mapped a flowering time QTL containing this gene [17,50,51]. Another recent study further fine-mapped ZmMADS69, which functions as a flowering activator through the ZmRap2.7-ZCN8 regulatory network and contributes to both long-day and short-day adaptation [52]. Notably, these studies investigated ZmMADS69 in maize-teosinte crosses and they did not identify the causative site affecting flowering time. Therefore, our results could be useful for future investigation to discover novel alleles in teosinte alone, and determine the functional variant of ZmMADS69. (2) We also detected a major effect QTL on chromosome 4 for seed weight in teosinte. For this QTL, a candidate gene TPP4 is near the QTL peak. TPP4 is a fully redundant paralogue of RAMOSA3 (RA3) and loss of both RA3 and TPP4 in maize leads to reduced meristem determinacy and more inflorescence branching [53]. (3) Finally, we detected two rough sheath genes as candidate QTL of plant height: rough sheath1 in landrace and rough sheath2 in teosinte. Both genes have been reported to affect plant architecture including plant height [20,21]. It may be worth exploring natural variation in these genes and how domestication has acted on them.
We observed that a large proportion (~75%) of the additive heritability is missing in the sense that it cannot be ascribed to the detected QTL. This is true for both our teosinte and maize landrace populations across ten domestication traits (Fig 6). Missing heritability has been widely investigated in human studies and a well-characterized example is height [54]. The heritability of height has been estimated to be~0.8 by twin studies, while the~50 detected variants (QTL) first associated with height explained only~5% of the phenotypic variance [55]. Subsequent larger studies identified 180 height genes that accounted for only 10% of the phenotypic variation [56] and 697 genes that explained only 20% of the heritable variation [57]. There are three main hypotheses regarding missing heritability [58,59]: (1) missing heritability is largely due to rare variants of large effect, (2) the majority of missing heritability is attributable to common variants of very small effect, and (3) heritability estimates from classic twin studies are biased upward so there actually is no missing heritability.
For our study, large effect but rare variants should be detected because only 49 or 40 parents contributed to the 4,455 or 4,398 progeny. By our design, variants of large effect that are rare in nature become relatively common in our mapping population. Despite this design, we detected relatively few rare QTL of large effect and all QTL together explain only about 25% of the additive variance. Thus, we can exclude the hypothesis that rare alleles of large effect explain an appreciable fraction of the missing heritability. Similarly, we used a robust design and common-garden protocol to estimate trait heritabilities [8], thus it seems unlikely that our heritabilities are biased upward. Moreover, our heritability estimates and QTL detection were performed on precisely the same populations. Thus, our data argue strongly that most heritable variation in both teosinte and landrace is controlled by a mix of rare and common alleles of very small effect sizes below the threshold for detection as individual QTL-that is the infinitesimal model [60,61].
Our variance component analysis shows that SNPs in regions of high recombination explain most heritable variance in both teosinte and maize landrace, suggesting recombination facilitates the maintenance of additive variation in our populations. It has been proposed that there is an interaction between natural selection and the recombination rate, known as the Hill-Robertson effect, such that there is a reduction in the effectiveness of natural selection in regions of low recombination [62,63]. With low recombination, selection on one locus will drag variants at linked loci to fixation thereby reducing variation in the region [64]. A well-characterized example is reduced codon bias in regions of low recombination in Drosophila, which supports that natural selection is less effective against slightly disadvantageous mutations in regions lacking recombination [65]. Our observation of greater additive variance being associated with SNPs in regions of high recombination can be explained by the Hill-Robertson effect although this effect may be confounded with the Bulmer effect [11], as we observed the lower recombination regions have a bit more LD (S4 Table).
We also show that SNPs in regions of high divergence (high F ST ) between maize and teosinte explain more variance in teosinte than maize landrace, suggesting variants in these regions were either removed or brought to fixation during domestication, congruent with the results of Xue et al. [66]. A common feature of crop genomes is regions of reduced nucleotide diversity in the crop relative to its wild progenitor [5,25,28]. One example is a selective sweep of~60-90 kb 5' to the maize domestication gene tb1 that has an extended region of low nucleotide diversity [67,68]. Another example is a large region on chromosome 10 in maize that was the target of strong selection during domestication, producing a 1.1 Mb segment with >15 genes that lost essentially all genetic diversity [69]. Such regions of high diversity in teosinte and low diversity in maize result in higher nucleotide divergence between maize and teosinte (S3 Fig). Thus, the observation that SNPs in regions of high divergence explain more genetic variance in teosinte accords with the expectation that many variants should have been either removed or fixed during and after domestication bottleneck.
The genetic dissection of crop domestication has been a focus of agronomists and evolutionary biologists ever since the rediscovery of Mendel's law of inheritance. In this enterprise, diverse approaches have been utilized including classic Mendelian genetics, phylogenetics, population genetics, genomic scans for selection, analysis of differential gene expression, QTL mapping, association mapping, and cloning/charactering major genes controlling domestication traits [4,5,7]. In this paper, we take a novel approach of comparing the genetics architecture of 18 domestication traits in maize and its ancestor teosinte using matched populations. From our analyses, we infer that there has been a substantial depletion of quantitative genetic variants in maize relative to teosinte in terms of the numbers of segregating QTL for domestication traits. At the same time, the degree of dominance of segregating QTL has increased in maize, likely a consequence of selective removal of additive variants. We observed that regions of the genome that are strongly differentiated between teosinte and maize (high F ST ) explain less quantitative variation in maize than teosinte, suggesting that, in these regions, allelic variants were brought to (or near) fixation during domestication. Finally, we observed that about 75% of the additive variance in both teosinte and maize is missing in the sense that it cannot be ascribed to detectable QTL and only 25% of variance maps to specific QTL. This latter result suggests that morphological evolution during domestication is largely attributable to very large numbers of QTL of very small effect.

Population, genotype and phenotype
Data for our analyses were previously published, and so details concerning population construction, plant growth, phenotyping, and SNP genotyping by Genotype-by-Sequence (GBS) technology for both our teosinte and landrace populations can be found in Yang et al. [8]. Briefly, a population of 70 teosinte plants from the near the town of Palmar Chico in Balsas river drainage of Mexico and a population of 55 maize landrace (Tuxpeño) plants from a nearby location were sampled. DNA from all 125 plants was used for wholegenome-sequencing (WGS). Of the 70 teosinte plants, 49 were used as parents and selfed and intermated to produce a total of 4,455 teosinte progeny. Similarly, of the 55 landrace plants, 40 were used as parents and selfed and intermated to produce a total of 4,398 maize landrace progeny. The parentage of progeny was determined using the GBS data of the parents and progeny.
The teosinte and landrace progeny were grown in neighboring fields near Homestead, Florida over during two winter seasons (2013-14 and 2014-15). Eighteen domestication traits were scored on both the teosinte and landrace progeny and these were the focus of the work of Yang et al. [8] (Table 1). Seven additional traits were scored in teosinte alone and analyzed in this paper (S3 Table). For GBS, a total of 34,899 SNPs was scored for teosinte and 40,255 SNPs for maize landrace. Yang et al. [8] estimated a variety of quantitative genetic parameters for these populations including: additive genetic variance, dominance genetic variance, phenotypic variance, genetic-by-environmental variance, selection intensity, genetic correlation matrix, and genetic variance-covariance matrix. All phenotype and genotype data from Yang et al. [8] are available at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1820997116/-/ DCSupplemental and https://doi.org/10.6084/m9.figshare.7655588.
In this paper, we added to this dataset by determining the WGS for all 125 teosinte and landrace parent plants. We extracted a total of 18 million and 21 million SNPs from the WGS data for teosinte and maize landrace, respectively, after removing sites with missing rate above 10% and heterozygosity rate above 60% in the parents. Using skim-WGS of selected progeny of both teosinte and landrace, we phased the SNPs in the 49 teosinte parents and 40 landrace parents that contributed to the 4455 teosinte and 4398 landrace progenies. Yang et al. [8] reported the recombination breakpoints on all chromosomes for all progeny as defined by the GBS SNPs. Using these breakpoint locations and phased WGS SNPs of the parents, we were able to project the WGS SNPs of the parents onto all progeny. This process resulted in a total of 32.5 million SNPs with 17.8 million segregating SNPs in teosinte and 18.9 million segregating SNPs in maize landrace, of which 4.2 million are shared in both populations. These SNPs are available via Cyverse Data Commons: https://doi.org/10.25739/sa4n-b482. Details on how this was accomplished are presented in S1 Appendix.

Genome-Wide Association Study (GWAS)
GWAS was performed by a scan of the genome using stepwise regression analysis. We first fit a general linear model (GLM) to reduce SNP number. The model included field variables, the inbreeding coefficient, and the first 50 principal components (PCs) based on the GBS SNPs as covariates. We then selected the SNP with the lowest P value within each 200-SNP bin along each chromosome, and used the reduced set of SNPs for stepwise regression to map quantitative trait loci (QTLs) for each trait by fitting an additive plus dominance model with the same covariates as used for GLM for stepwise regression. Both GLM and stepwise regression were implemented with TASSEL5 software [70]. Details on the scanning model are presented in S1 Appendix.

Variance Component Analysis (VCA)
To estimate the proportion of trait additive genetic variation associated with different classes of SNPs, we used a procedure to estimate variance components associated with different subsets of the SNPs [66,[71][72][73]. In brief, variance component analysis (VCA) was done by (1) classifying SNPs into subsets based on a hypothesis of interest, (2) generating kinship matrices for each subset using TASSEL5 [70], and (3) fitting these kinship matrices along with phenotypic data into LDAK5 (http://dougspeed.com/ldak/), using a generalized restricted maximum likelihood (REML) solver to partition the genetic variance (heritability, h 2 ) into the proportion accounted for by each SNP subset [72,73]. Details on how we performed VCA in this study are presented in S1 Appendix.