
Elucidating the patterns of pleiotropy and its biological relevance in maize

  • Merritt Khaipho-Burch,

    Roles Conceptualization, Data curation, Formal analysis, Funding acquisition, Investigation, Methodology, Project administration, Resources, Software, Supervision, Validation, Visualization, Writing – original draft, Writing – review & editing

    mbb262@cornell.edu

    Affiliation Section of Plant Breeding and Genetics, Cornell University, Ithaca, New York

  • Taylor Ferebee,

    Roles Formal analysis, Investigation, Methodology, Validation, Writing – review & editing

    Affiliation Department of Computational Biology, Cornell University, Ithaca, New York

  • Anju Giri,

    Roles Data curation, Investigation, Writing – review & editing

    Affiliation Institute for Genomic Diversity, Cornell University, Ithaca, New York

  • Guillaume Ramstein,

    Roles Formal analysis, Investigation, Methodology, Software, Writing – review & editing

    Affiliations Institute for Genomic Diversity, Cornell University, Ithaca, New York, Center for Quantitative Genetics and Genomics, Aarhus University, Aarhus, Denmark

  • Brandon Monier,

    Roles Formal analysis, Investigation, Methodology, Software, Writing – review & editing

    Affiliation Institute for Genomic Diversity, Cornell University, Ithaca, New York

  • Emily Yi,

    Roles Investigation, Visualization, Writing – review & editing

    Affiliation Institute for Genomic Diversity, Cornell University, Ithaca, New York

  • M. Cinta Romay,

    Roles Data curation, Resources, Writing – review & editing

    Affiliation Institute for Genomic Diversity, Cornell University, Ithaca, New York

  • Edward S. Buckler

    Roles Conceptualization, Project administration, Supervision, Writing – review & editing

    Affiliations Section of Plant Breeding and Genetics, Cornell University, Ithaca, New York, Institute for Genomic Diversity, Cornell University, Ithaca, New York, USDA-ARS, Ithaca, New York, United States of America

Abstract

Pleiotropy—when a single gene controls two or more seemingly unrelated traits—has been shown to impact genes with effects on flowering time, leaf architecture, and inflorescence morphology in maize. However, the genome-wide impact of biological pleiotropy across all maize phenotypes is largely unknown. Here, we investigate the extent to which biological pleiotropy impacts phenotypes within maize using GWAS summary statistics reanalyzed from previously published metabolite, field, and expression phenotypes across the Nested Association Mapping population and Goodman Association Panel. Through phenotypic saturation of 120,597 traits, we obtain over 480 million significant quantitative trait nucleotides. We estimate that only 1.56–32.3% of intervals show some degree of pleiotropy. We then assess the relationship between pleiotropy and various biological features such as gene expression, chromatin accessibility, sequence conservation, and enrichment for gene ontology terms. We find very little relationship between pleiotropy and these variables when compared to permuted pleiotropy. We hypothesize that biological pleiotropy of common alleles is not widespread in maize and is highly impacted by nuisance terms such as population structure and linkage disequilibrium. Natural selection on large standing natural variation in maize populations may target wide and large effect variants, leaving the prevalence of detectable pleiotropy relatively low.

Author summary

The genetic basis of complex traits has been thought to exhibit pleiotropy, which is the notion that a single locus can control two or more unrelated traits. Widespread reports in the human disease literature show genomic signatures of pleiotropic loci across many traits. However, little is known about the prevalence and behavior of pleiotropy in maize across a large number of phenotypes. Using association mapping of common alleles in over one hundred thousand traits, we determine how pleiotropic each region is and use these quantitative scores to functionally characterize each region of the genome. Our results show little evidence that pleiotropy is a common phenomenon in maize. We observe that maize does not exhibit the same pleiotropic characteristics as human diseases in terms of prevalence, gene expression, chromatin accessibility, or sequence conservation. Rather than pervasive pleiotropy, we hypothesize that strong selection on large and wide effect loci and the need for trait independence at the gene level keep the prevalence of pleiotropy low, thus allowing for the adaptation of maize varieties to novel environments and conditions.

Introduction

Pleiotropy, a term defined by German geneticist Ludwig Plate in 1910, refers to the phenomenon where a single locus can affect two or more seemingly unrelated traits [1]. Biological pleiotropy (also known as horizontal pleiotropy) describes situations in which a single causal variant affects two or more traits, different causal variants colocalize in the same gene and are tagged by the same genetic variant, or different causal variants colocalize in the same gene and are tagged by different genetic variants [2]. In contrast, mediated pleiotropy (also known as vertical pleiotropy) occurs when one variant impacts a phenotype that is then causal for another phenotype. Instances of mediated pleiotropy are difficult to ascertain and require a deeper understanding of how different biological pathways interact. Both horizontal and vertical pleiotropy can be biased by spurious pleiotropy arising from misclassifying relationships between traits or high linkage disequilibrium tagging nearby non-causal variants [2].

Some examples of single-locus pleiotropy have been observed in maize. The Vegetative to generative transition 1 (Vgt1) regulatory sequence controls leaf number and flowering time by regulating an APETALA2-like gene, ZmRap2.7, through what may be mediated pleiotropy [3–6]. Polymorphisms within Vgt1 cause an additive genetic effect of roughly 2–4.5 days to pollen shed and an increase in the number of internodes. True pleiotropy of this locus was confirmed through introgression of the Vgt1 locus into transgenic plants [7,8]. The teosinte branched 1 (tb1) locus is an example of biological pleiotropy, which impacts the number and length of internodes in lateral branches and inflorescences due to the upregulation of gene expression in maize compared to teosinte. This upregulation results from the insertion of a Hopscotch transposable element 64 kb upstream of tb1 [9–12]. Fine-mapping of tb1 confirmed this biological pleiotropy [10,13,14]. Although other examples of biological pleiotropy at single genes in maize have been described, many have not been confirmed through fine-mapping to distinguish between tight linkage and true pleiotropy [15].

With joint linkage studies and genome-wide association studies (GWA or GWAS) in plants, the search for pleiotropy became more promising. However, few maize genes have shown true biological or mediated pleiotropy. Some pleiotropy is present between phenotypes of the same trait type, like the architecture of tassels, ears, leaves, or flowering time; yet, very few single nucleotide polymorphisms (SNPs) are shared between different trait types [16,17]. Some evidence of pleiotropy between root, leaf, and flowering traits has been shown but requires additional testing and validation [17]. A similar study showed evidence of pleiotropy but within inflorescence traits, such as kernel row number with thousand kernel weight and tassel standing, tassel length, and tassel spike length [18]. The only robust examples of pleiotropy validated in maize are between flowering traits and leaf number [16,18]. The overlap of only a few key genes between flowering and leaf number agrees with prior literature in that both traits share a developmental pathway and that the transition from halting leaf growth to the initiation of tassel development is highly correlated [3,19–21].

A lack of pervasive pleiotropy was also observed in sorghum, where a large set of 234 plant architecture and agronomic traits was investigated within the Sorghum Association Panel [22]. These results may be partially due to the lack of marker saturation within GWAS models or relying on Bayesian SNP selection models to consistently select the same pleiotropic SNP across multiple independent traits. However, even with these methods, some large effect pleiotropic loci were found along with novel pleiotropic sites between traits [22]. The lack of additional shared genes between different trait types within prior studies suggests that pleiotropy is not a common phenomenon in maize or sorghum. Many traits that do show this apparent pleiotropy may be under similar regulation due to the traits sharing the same developmental phytomers. These developmentally correlated traits, which share high morphogenic relatedness, may falsely suggest that pleiotropy is much more widespread than reality.

Some studies have calculated principal components over their traits to create uncorrelated variables used to determine the degree of pleiotropy [18]. Using uncorrelated traits addresses the concern that all phenotypes are inherently related and not truly distinct if they share the same evolutionary or developmental origin [23]. However, many of these studies may miss large, causal loci impacting their phenotypes of interest in principal component based traits as opposed to observed trait values, biasing estimates of pleiotropy [18]. Additionally, GWAS-based pleiotropy studies have been restricted to finding common alleles with modest to large effects, while rare variants with small effects are difficult to find [24–27]. With a narrow range of allelic detection, the ability to discern all putatively pleiotropic effects is limited. This may partially explain the minimal pleiotropy described in crop plants [18,22].

Pleiotropy has been described much more frequently in human disease literature. Some studies report that nearly half of the disease-associated genes within the human GWAS catalogue are pleiotropic [27]. Similar pleiotropy is found among other human traits and disease studies [28–30]. To characterize pleiotropic loci, a study investigating 321 vertebrate genes showed that highly pleiotropic genes showed higher gene expression [31]. In human studies, pleiotropic genes are strongly associated with higher expression [28], are expressed in more tissues [29], and are enriched in active chromatin states compared to genome-wide loci [29]. Additionally, major pleiotropic loci in vertebrate genes evolve at slower rates than less pleiotropic genes, suggesting that the degree of pleiotropy may constrain adaptive evolution [31].

While human studies are more likely to look for pleiotropy within diseases, the extent of pleiotropy among common adaptive variants in maize remains elusive. The primary strengths of using maize for investigating pleiotropy are much higher levels of diversity and phenotypic measurements on replicated clones. However, a key drawback is that the number of recent recombination events is much lower than in some powerful human studies (e.g., UK Biobank), limiting genomic resolution. These strengths and drawbacks have become apparent in the literature, with many maize pleiotropy studies confined to large effect domestication loci [7,10,14], limited in the number of traits they investigate [18,32,33], restricted to low mapping resolution of quantitative trait loci (QTL) [3,16,34,35], or reliant on variable methods of distinguishing and selecting a true biologically pleiotropic locus [36]. Additionally, a major caveat to pleiotropy studies in maize is investigator bias toward phenotyping relatively common, easier-to-measure field-based traits, which may behave differently from molecular phenotypes such as gene expression or metabolites. These field versus molecular phenotypes may show distinct patterns of pleiotropy that have been relatively unexplored in the literature. Although developmentally, some of these bulked traits may share the same generating phytomers, they still offer an alternative view of pleiotropy.

In this study, we investigated the pervasiveness of pleiotropy in maize using a large set of 120,597 previously published phenotypes in the US Nested Association Mapping population and the Goodman Association Panel (hereafter NAM and GAP, respectively) that generated 480 million significant quantitative trait nucleotides. We used these association results to calculate a quantitative estimate of how pleiotropic each genic and intergenic region is within the genome. Then, we functionally characterized these loci through gene expression, chromatin openness, and sequence conservation. We aimed to investigate instances of true biological pleiotropy where variants directly impact multiple traits while trying to limit the amount of spurious pleiotropy occurring from linkage disequilibrium.

Results

Genome-wide association of traits across NAM and GAP

To determine the extent of pleiotropy within the maize genome, we performed GWAS on 120,597 traits across NAM (n = 5,186) and GAP (n = 271). These traits are outlined in further detail in Supplemental Tables 1–3. These traits include 121 NAM field traits, 222 GAP field traits, 3,873 GAP leaf metabolite mass features [37], and 116,381 GAP expression values across seven diverse tissues [38]. The traits were categorized into four categories based on the population and type of trait measured: 1) NAM field, 2) GAP field, 3) GAP mass features, and 4) GAP expression traits (S1 Table). The NAM and GAP field traits were further categorized into eight different trait types to investigate inner-category pleiotropy. These trait types include tassel, ear, flowering, disease, height, kernel, leaf, and vegetative traits. A schematic of our methods is shown in Fig 1. Genotype data came from HapMap 3.2.1 SNPs and were imputed using Beagle 5.1 with the USDA-ARS NCRPIS collection SNPs as the reference. After filtering, roughly 15 million NAM SNPs and 17 million GAP SNPs remained for association mapping.

Fig 1. Schematic of the methods used within this study.

(A) A total of 120,597 traits were gathered across four population-trait categories: 1) NAM field, 2) GAP field, 3) GAP mass features, and 4) GAP expression. Trait values were randomly shuffled (i.e., permuted) five or ten times to create null distributions of no pleiotropy. (B) The observed and permuted data were then mapped with the Fast Association method in rTASSEL on 15 million NAM SNPs and 17 million GAP SNPs using a combination of global and window principal components and PEER factors to control population structure and experimental design. All results with p-value < 0.00001 were kept for both the permuted and observed data. (C) GWAS results were then split by trait type and permutation and were then aggregated into genic and intergenic intervals based on the significant B73 v4 trait-SNP association coordinate. (D) A matrix of the number of unique traits mapping to each interval was used to estimate pleiotropy independently for each population-trait category.

https://doi.org/10.1371/journal.pgen.1010664.g001
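
As a rough illustration of panels A and B, the sketch below shuffles a trait vector to build permuted null traits and then filters association records at the p-value < 0.00001 cutoff. It is a minimal sketch under stated assumptions, not the study's pipeline: the function names, the toy trait values, and the (trait, SNP, p-value) record layout are hypothetical, and the adjustment of phenotypes by principal components and PEER factors is omitted.

```python
import random


def permute_trait(values, n_perm, seed=0):
    """Return n_perm shuffled copies of a trait vector.

    Shuffling breaks the link between genotype and phenotype while
    keeping the distribution of the trait itself unchanged.
    """
    rng = random.Random(seed)
    permuted = []
    for _ in range(n_perm):
        shuffled = list(values)  # copy so the observed data stay untouched
        rng.shuffle(shuffled)
        permuted.append(shuffled)
    return permuted


def keep_significant(hits, threshold=1e-5):
    """Keep (trait, snp, p_value) records whose p-value is below the threshold."""
    return [hit for hit in hits if hit[2] < threshold]


# Toy data (hypothetical values and SNP names).
days_to_silk = [71.0, 68.5, 74.2, 69.9, 72.3]
null_sets = permute_trait(days_to_silk, n_perm=10)

hits = [("days_to_silk", "S1_100", 3e-7), ("days_to_silk", "S1_200", 4e-3)]
significant = keep_significant(hits)
```

The same filter would be applied to both the observed and the permuted association results so that the two sets of counts stay comparable.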

The prevalence of pleiotropy is low but shows variation by trait type and population

Each population-trait category was investigated independently due to the uniqueness of the genetic architecture of each population, the variable trait sample size, and to investigate the differences in the patterns of pleiotropy between trait types. For each population-trait category, the number of unique traits mapping to each genic and intergenic interval was counted for both the observed and permuted association results to gain a quantitative metric on how pleiotropic each interval was within the genome. These pleiotropy estimates include biological, mediated, and spurious pleiotropy, as distinguishing between the different forms is difficult without a deeper understanding of how all individual traits are regulated. A total of 75,490 intervals with a mean length of 27,902 bp were used. Additionally, to gain a SNP-level estimate of pleiotropy, we randomly subsampled one million HapMap 3.2.1 SNPs and counted all unique traits associated with each SNP. Counts for the null distributions of no pleiotropy originated from ten permutations of the field and mass features, while the expression phenotypes were permuted five times. We adjusted all our permuted phenotypes before GWA by the same principal components and PEER factors used within the GWA models so as not to inflate p-values due to the underlying population structure held within principal components.
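
The interval-level count described above can be sketched as follows. This is a minimal sketch under stated assumptions: the interval coordinates, labels, trait names, and the `count_unique_traits` helper are hypothetical, intervals are assumed to be sorted and non-overlapping on a single chromosome, and a trait is counted once per interval no matter how many significant SNPs it has there.

```python
from bisect import bisect_right
from collections import defaultdict

# Hypothetical intervals on one chromosome: (start, end, label).
intervals = [
    (1, 1000, "intergenic_1"),
    (1001, 3000, "gene_A"),
    (3001, 9000, "intergenic_2"),
]
starts = [start for start, _, _ in intervals]


def interval_for(position):
    """Return the label of the interval containing position, or None."""
    i = bisect_right(starts, position) - 1
    if i >= 0 and intervals[i][0] <= position <= intervals[i][1]:
        return intervals[i][2]
    return None


def count_unique_traits(hits):
    """Count distinct traits with a significant hit in each interval.

    hits is an iterable of (trait, position) pairs.
    """
    traits_by_interval = defaultdict(set)
    for trait, position in hits:
        label = interval_for(position)
        if label is not None:
            traits_by_interval[label].add(trait)
    counts = {label: 0 for _, _, label in intervals}
    for label, traits in traits_by_interval.items():
        counts[label] = len(traits)
    return counts


hits = [
    ("tassel_length", 1500),
    ("tassel_length", 1600),  # same trait, same interval: counted once
    ("ear_height", 2500),
    ("ear_height", 5000),
]
counts = count_unique_traits(hits)
```

Running the same count on each permuted result set would give the null counts that the observed counts are compared against.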

We compared our distribution of observed significant traits per interval to the permuted trait counts. Across populations and trait types, 67.7–98.4% of intervals have 0–1 significant traits associated with them (Fig 2). For our SNP level pleiotropy estimates, 57.3–98.6% of SNPs have 0–1 significant traits associated with them (S1 Fig). Most intervals in the observed data controlled more traits than the permutations. This does not completely imply pervasive pleiotropy but rather that most traits are polygenic.

Fig 2. Distribution of QTL by interval.

Of the intervals showing a five-fold higher proportion of pleiotropy over their permutations (right of the vertical dashed line), intergenic ranges tend to exhibit higher pleiotropy than genic ranges. Each value along the x-axis was calculated from the natural log of the number of observed traits mapping to each interval divided by the mean count of traits in the permuted data with a pseudo-count of plus one in the numerator and denominator. Values left of the vertical dashed line indicate higher pleiotropy in the permuted data versus the observed data suggesting the prevalence of high noise or no trait-SNP associations in either the observed or permuted data (peak at zero). Distributions are split into genic (gray) and intergenic (yellow) intervals for (a) NAM field, (b) GAP field, (c) GAP mass features, and (d) GAP expression traits.

https://doi.org/10.1371/journal.pgen.1010664.g002

For intervals exhibiting five-fold higher evidence in the observed versus the mean permuted data, NAM field phenotypes display some pleiotropy at intervals putatively associated with 2–19 traits (median = 5 traits) in 1.56% of intervals. GAP field traits show pleiotropy associated with 2–65 traits (median = 9) in 32.3% of intervals genome-wide. GAP mass features show much less pleiotropy, with 3.03% of intervals associating with 2–695 traits (median = 63). Lastly, GAP expression data display pleiotropy in 31% of intervals, which associate with 2–6264 traits (median = 391). In relation to the total number of expression traits (n = 116,381), a median of 0.33% of all expression traits impacted these highly pleiotropic intervals. For GAP mass features (n = 3,873), a median of 1.62% of all mass features contributed to these high pleiotropy estimates.
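
The ratio described in the Fig 2 caption, and the five-fold cutoff used here, can be written out as a short sketch. It assumes that "five-fold" means the pseudo-counted ratio of observed to mean permuted counts exceeds 5 (that is, the natural log exceeds ln 5); the function names and the strict inequality are illustrative choices, not taken from the study's code.

```python
import math


def log_pleiotropy_ratio(observed, permuted_counts):
    """ln((observed + 1) / (mean permuted count + 1)) for one interval."""
    mean_permuted = sum(permuted_counts) / len(permuted_counts)
    return math.log((observed + 1) / (mean_permuted + 1))


# Assumed position of the dashed line: a five-fold ratio on the log scale.
FIVE_FOLD = math.log(5)


def exceeds_five_fold(observed, permuted_counts):
    """True when the observed count is more than five-fold the permuted mean."""
    return log_pleiotropy_ratio(observed, permuted_counts) > FIVE_FOLD


# Toy intervals (hypothetical counts).
no_signal = log_pleiotropy_ratio(0, [0, 0, 0])   # nothing maps in either data set
strong = exceeds_five_fold(9, [1, 0, 1, 0])      # (9 + 1) / (0.5 + 1) is about 6.7
weak = exceeds_five_fold(2, [1, 1])              # (2 + 1) / (1 + 1) = 1.5
```

The pseudo-count keeps the ratio defined when no trait maps to an interval, which is why intervals with no associations in either data set sit at zero.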

For SNPs exhibiting five-fold higher evidence in the observed versus the mean permuted data, NAM field phenotypes show some pleiotropy putatively associated with 2–9 traits (median = 2 traits) in 1.45% of SNPs. GAP field traits show pleiotropy associated with 2–35 traits (median = 2) in 1.49% of SNPs. GAP mass features show pleiotropy associated with 2–188 traits (median = 11) in 6.85% of SNPs, and the GAP expression data display pleiotropy associated with 2–879 traits (median = 21) in 42.7% of SNPs. Overall this suggests that there may be a set of intervals and SNPs with high pleiotropy, but as we will show below, this is mostly a product of factors impacting mapping.

These pleiotropy estimates for GAP field and expression data may be higher than NAM field and GAP mass feature estimates. For GAP field data, these pleiotropy estimates might be slightly inflated due to residual population structure that was not well controlled during the association mapping step [39]; the traits were nonetheless mapped this way for consistency across traits. Additionally, for all the population and trait types, retaining all trait-SNP associations with p-value < 0.00001 may have introduced noise depending on the genetic architecture of the trait being mapped that could, in turn, overrepresent the amount of pleiotropy present.

To understand how different trait types may impact the distribution of pleiotropy, rather than all traits in aggregate, NAM and GAP field traits were split into eight different types to investigate inner-category pleiotropy. These trait types included tassel, ear, flowering, disease, height, kernel, leaf, and vegetative traits. While the distribution of QTL by interval varies by the specific trait type, all individual trait types closely resemble the distribution of QTL across all traits (S2 and S3 Figs, S8 Table).

Mapping features explain most pleiotropy

We analyzed various biological and nuisance (GWA mapping) features to understand their importance in explaining this apparent pleiotropy. We evaluated two hypotheses: (1) that this apparent pleiotropy was due to characteristics of genetic mapping; or (2) that it was due to various aspects of biological regulation and mutation-selection balance. Modeled biological features consisted of the maximum gene and protein expression values in B73 across twenty-three diverse tissues [40], the mean Genomic Evolutionary Rate Profiling (GERP) scores at sites overlapping with significant GWA hits [41], and the number of ATAC-seq peaks [42]. Nuisance features, which have to do with genetic mapping and resolution, were similarly modeled and included the average R2 linkage disequilibrium within the NAM and GAP, the total number of SNPs per interval used for the GWA analysis, and interval size. We also included an adjusted GAP expression pleiotropy estimate for NAM field, GAP field, and GAP mass feature models to see if pleiotropy at an intermediate molecular step had any effect on field or metabolite regulation.

All pleiotropy scores are highly correlated with nuisance terms (Fig 3). The number of input SNPs, the interval size, and average linkage disequilibrium all show strong, positive correlations with pleiotropy values. Additionally, pleiotropy values between populations and trait types are highly correlated with each other. For the biological features, there is a weak correlation between pleiotropy and GERP, max protein expression, max RNA expression, and the number of ATAC-seq peaks. A full correlation matrix including gene ontology (GO) terms is available in S4 Fig.

Fig 3. Correlation matrix of key biological, nuisance (mapping), and pleiotropy values.

Values indicate the Pearson correlation between values colored by the strength of their correlation.

https://doi.org/10.1371/journal.pgen.1010664.g003
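
For reference, each cell of such a matrix is a Pearson correlation between two per-interval vectors. The sketch below computes one cell by hand; the `pearson` helper and the toy interval sizes and pleiotropy counts are hypothetical and only illustrate the statistic.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariance / (spread_x * spread_y)


# Toy per-interval values (hypothetical): a nuisance term and a pleiotropy count.
interval_size = [1000, 5000, 20000, 40000]
pleiotropy = [0, 1, 4, 9]
r = pearson(interval_size, pleiotropy)
```

A value of r near 1 for a nuisance term such as interval size is the pattern that motivates modeling mapping features explicitly.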

With these biological, nuisance, and adjusted pleiotropy terms we ran four models (Models 1–4) that varied in their input features and model architectures. Each model type was run eight separate times for each population-trait category (NAM field, GAP field, GAP mass features, and GAP expression) and permutation type (observed versus permuted data). All models were trained using an iterative leave-one-chromosome out approach. Model 1 was a random forest model that included only nuisance, or mapping terms. Model accuracy varied by the population and trait category (R2 = 0.29–0.87, S5 Table). Prediction on held-out chromosome data showed a strong relationship between predicted and observed pleiotropy (R2 = 0.85–0.96, Fig 4), suggesting that pleiotropy within these populations can be predominantly described by mapping terms.
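
The leave-one-chromosome-out scheme can be sketched as below. To stay dependency-free, a one-feature least-squares line stands in for the random forest, so this shows the cross-validation structure only and not the models used in the study; the record layout, the field names, and the toy values are hypothetical.

```python
def leave_one_chromosome_out(records):
    """Yield (held_out_chromosome, train, test) once per chromosome."""
    chromosomes = sorted({r["chrom"] for r in records})
    for held_out in chromosomes:
        train = [r for r in records if r["chrom"] != held_out]
        test = [r for r in records if r["chrom"] == held_out]
        yield held_out, train, test


def fit_line(train, feature, target):
    """Least-squares fit of target = a + b * feature (stand-in model)."""
    xs = [r[feature] for r in train]
    ys = [r[target] for r in train]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    return mean_y - slope * mean_x, slope


# Toy intervals (hypothetical): pleiotropy depends only on the SNP count.
records = [
    {"chrom": 1, "n_snps": 10, "pleiotropy": 1.0},
    {"chrom": 1, "n_snps": 30, "pleiotropy": 3.0},
    {"chrom": 2, "n_snps": 20, "pleiotropy": 2.0},
    {"chrom": 2, "n_snps": 50, "pleiotropy": 5.0},
    {"chrom": 3, "n_snps": 40, "pleiotropy": 4.0},
    {"chrom": 3, "n_snps": 60, "pleiotropy": 6.0},
]

predictions = {}
for chrom, train, test in leave_one_chromosome_out(records):
    intercept, slope = fit_line(train, "n_snps", "pleiotropy")
    predictions[chrom] = [intercept + slope * r["n_snps"] for r in test]
```

Holding out whole chromosomes keeps intervals in tight linkage with each other from appearing in both the training and test sets.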

Fig 4. Predicted versus observed pleiotropy values for Model 1 (random forest with only nuisance terms) from four individual random forest models trained on interval-based observed pleiotropy values.

The dashed line represents the 1–1 identity line, while the solid line represents fitted values. Panels show the results for the (a) NAM field, (b) GAP field, (c) GAP mass features, and (d) GAP expression data.

https://doi.org/10.1371/journal.pgen.1010664.g004

Model 2 was trained on pleiotropy scores using all biological and nuisance terms described previously using a random forest model. Overall, the models performed well, with R2 values ranging from 0.35 to 0.87 (S5 Table) and showed good performance when predicting on the left-out chromosome across models (S6 Fig). Across population-trait categories in the observed and permuted data, the most important terms in predicting pleiotropy were nuisance variables that included the number of input SNPs, average R2 linkage disequilibrium, and interval size (Fig 5). All biological features had lower importance for predicting pleiotropy than the nuisance variables. The mean GERP score showed moderate importance across models, and RNA and protein expression showed the least importance of these biological features. All other biological features were only slightly important. The ranking of features between the observed and permuted data models remained largely the same with very little deviation (S5 Fig). For the eight individual NAM and GAP field trait types, there was some variability in the ordering of variables compared to the counts across all traits; however, nuisance terms remained the most important across variables (S7, S8, S9 and S10 Figs).

Fig 5. Relative importance scores from random forest models suggest high importance for nuisance terms and low importance for biologically relevant terms in explaining interval-based pleiotropy.

The random forest results shown are from Model 2 (with no gene-ontology terms). Across all four population-trait categories, nuisance variables such as the number of input SNPs, interval size, and average linkage-disequilibrium (LD) showed higher relative importance over biological features. The plots show the observed pleiotropy values for (a) NAM field, (b) GAP field, (c) GAP mass features, and (d) GAP expression data. Model 2 results for the permuted data are available in S5 Fig. The bar charts depict the mean relative importance and standard error of each variable from a leave-one-chromosome-out model.

https://doi.org/10.1371/journal.pgen.1010664.g005

Model 3 was also a random forest model trained using the same biological and nuisance terms as Model 2 but included fourteen gene ontology terms thought to be important in explaining highly pleiotropic loci. Model 4 was a gradient boosting model with only the terms described in Model 2 (no gene ontology terms). Models 2, 3, and 4 performed similarly, with the nuisance variables showing the highest relative importance (Fig 5, S5, S11 and S13 Figs). All models had comparable model performance, with the gradient boosting method (Model 4) having slightly higher prediction accuracy (Model 4 R2 0.46–0.92) over most random forest models (Model 3 R2 0.33–0.98; S6, S12 and S14 Figs, and S5 Table). For Model 3, which included gene ontology terms, all gene ontology terms showed lower relative importance than the biological and nuisance variables (S11 Fig).

For the SNP-level pleiotropy estimates, we ran a gradient boosting model similar to Model 4. This model included the following biological and nuisance terms: the maximum gene and protein expression of the closest gene within 5 kb of the focal SNP, the GERP score at that site, whether the SNP was in an ATAC-seq peak, the location of the SNP (genic or intergenic), and the average R2 linkage disequilibrium. These models did not perform as well (R2 = 0.003–0.281, S5 Table), potentially due to the sparsity of trait-SNP associations at these subsampled SNP sites. The SNP-level models showed high importance for the average R2 linkage disequilibrium and very little importance for any biological terms (S15 and S16 Figs).

Due to the overwhelmingly large effect of mapping features in describing pleiotropy, an adjusted interval-based pleiotropy statistic was created. This adjusted statistic was created from Model 1 by subtracting the predicted pleiotropy from the observed pleiotropy score and was used to look at different biological relationships in more depth. This adjusted pleiotropy statistic was meant to curb, but not completely eliminate, the amount of statistical pleiotropy within the models while trying to retain some information about biological and mediated pleiotropy.
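
In other words, the adjusted statistic is the residual of the nuisance-only model. A minimal sketch, assuming per-interval observed scores and Model 1 predictions are held in two aligned lists (the function name and toy values are hypothetical):

```python
def adjusted_pleiotropy(observed, predicted):
    """Residual pleiotropy: observed score minus the nuisance-model prediction."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    return [obs - pred for obs, pred in zip(observed, predicted)]


# Toy intervals (hypothetical values).
observed = [5.0, 1.0, 0.0]
predicted = [3.5, 1.0, 0.5]  # what mapping features alone would predict
adjusted = adjusted_pleiotropy(observed, predicted)
```

Positive values mark intervals with more apparent pleiotropy than mapping features alone predict, and negative values mark intervals with less.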

The relationship between adjusted pleiotropy and observed pleiotropy is available in S17 Fig. We regressed these adjusted pleiotropy values against the max RNA expression across 23 diverse B73 tissues and the mean GERP score. We found no difference in the relationship between adjusted pleiotropy with GERP or the max RNA expression value for NAM field, GAP field, or GAP mass features compared to the permuted data. However, we did find a change in the relationship between RNA expression and GERP with GAP expression pleiotropy data in the observed versus the permuted data, where more pleiotropic intervals showed lower GERP sequence conservation and higher expression values (S18 and S19 Figs). These relationships of observed GAP expression pleiotropy with max RNA expression (adj. R2 = -2.41e-05, p-value = 0.767) and GERP (adj. R2 = -6.68e-06, p-value = 0.4016) were not significant.

With these adjusted pleiotropy scores, we then investigated the most pleiotropic genes within each population and trait type. The top twenty most pleiotropic genes by population and trait type are presented in S7 Table. Genes that were shared across populations and trait types included Zm00001d035776, which is a paired amphipathic helix protein shown to be involved in regulating flowering time and internode number in Arabidopsis [43], pleiotropic in both GAP field and GAP expression data. Zm00001d040920 is a tRNA methyltransferase pleiotropic within NAM field and GAP expression data. Lastly, we searched for overlap in our top twenty highly pleiotropic genes within a prior large-scale maize pleiotropy study [17]. We found one shared sphingomyelin synthetase family protein (Zm00001d034600), in GAP field traits that impacted two tassel traits, and a G2-like1 gene (Zm00001d044785) in GAP mass features that impacted six root traits [22].

Gene ontology suggests differential enrichment for highly and lowly pleiotropic intervals

To determine if there was any differential enrichment in gene ontology terms among our putatively pleiotropic sites, we categorized our genic intervals into highly and lowly pleiotropic sets based on their adjusted pleiotropy values. Genic intervals were classified as lowly pleiotropic if their adjusted pleiotropy value fell below the 25th percentile, and as highly pleiotropic if it fell above the 75th percentile, within each population-trait category. We then queried how enriched the lowly and highly pleiotropic gene sets were for molecular function (MF), biological process (BP), and cellular component terms. No cellular component terms were significant.
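
The percentile split can be sketched with the standard library. The gene labels and values below are hypothetical, and both the quantile method ("inclusive") and the treatment of values exactly at a cut point are assumptions, since the text does not specify them.

```python
import statistics


def classify_by_quartile(adjusted):
    """Label each gene 'low', 'high', or 'middle' by adjusted pleiotropy.

    adjusted maps gene identifiers to adjusted pleiotropy values for one
    population-trait category.
    """
    q1, _, q3 = statistics.quantiles(
        list(adjusted.values()), n=4, method="inclusive"
    )
    labels = {}
    for gene, value in adjusted.items():
        if value <= q1:
            labels[gene] = "low"
        elif value >= q3:
            labels[gene] = "high"
        else:
            labels[gene] = "middle"
    return labels


# Toy adjusted pleiotropy values (hypothetical gene labels).
adjusted = {
    "g1": -3.0, "g2": -1.0, "g3": 0.0, "g4": 0.5,
    "g5": 1.0, "g6": 2.0, "g7": 4.0, "g8": 9.0,
}
labels = classify_by_quartile(adjusted)
high_set = sorted(g for g, lab in labels.items() if lab == "high")
low_set = sorted(g for g, lab in labels.items() if lab == "low")
```

The two resulting gene sets would then each be passed to an enrichment test against the full set of genic intervals.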

For molecular function, we found an enrichment of highly pleiotropic terms shared between populations involved in binding including ribonucleotide, nucleotide, nucleoside phosphate, protein, ATP, and carbohydrate derivative binding (Fig 6). Additionally, terms involving potassium ion binding, kinase activity, and ligase activity were shown to be significant in highly pleiotropic intervals in some populations. For intervals with low pleiotropy, terms associated with serine-type peptidase activity, peroxidase activity, heme binding, copper-ion binding, catalytic activity, and antioxidant activity were significant for NAM field or GAP expression data. No terms for the lowly pleiotropic GAP field and GAP mass features data were significant.

Fig 6. Gene ontology enrichment for highly and lowly pleiotropic intervals for the four population-trait categories.

Highly and lowly pleiotropic values fell within the top 75th percentile and bottom 25th percentile of adjusted pleiotropy values for each population-trait category. All molecular function, biological process, and cellular component terms were tested. Terms with false discovery rate (FDR) corrected p-values < 0.05 were retained. The x-axis represents highly or lowly pleiotropic intervals split by the population and trait type, and the y-axis shows the top significant gene ontology terms. The value of the FDR significance level is colored in blue and the size of the dots represents the proportion of genes found in that GO category.

https://doi.org/10.1371/journal.pgen.1010664.g006

For biological process terms, we found an enrichment in the response to oxidative stress for lowly pleiotropic intervals in the NAM field traits. Highly pleiotropic biological processes included terms involved with signaling, signal transduction, and cell communication in the GAP expression data and phosphorylation in the GAP field data.

Additionally, we performed GO enrichment for the top 90th and bottom 10th percentile, and top 99th and bottom 1st percentile of adjusted pleiotropy values. For the top 90th and bottom 10th percentile of adjusted pleiotropy values, numerous terms were significant for molecular function and biological process, including many terms for DNA and RNA binding for highly pleiotropic sites (S20 Fig). For the top 99th and bottom 1st percentile of adjusted pleiotropy values, only three biological process terms and five molecular function terms were significant for the highly pleiotropic sites (S21 Fig). Significant terms included helicase activity between the GAP mass features and the GAP expression data, adenyl ribonucleotide and nucleotide binding for the GAP field data, and protein localization terms for the NAM field data. No terms were significant for the lowly pleiotropic sites or cellular component terms in either enrichment analysis.

Discussion

Our results show modest but pervasive pleiotropy in the Nested Association Mapping and Goodman Association Panel across a wide range of field, metabolite, and expression phenotypes that could almost always be highly explained by mapping resolution characteristics. Across 120,597 maize traits, we have shown that some apparent pleiotropy is present across populations; however, much of this pleiotropy is highly correlated and explained by association mapping parameters in multiple models. A total of 67.7–98.4% of intervals are associated with zero to one traits. These distributions are reminiscent of the L-shaped pleiotropy distributions derived from GWA- and QTL-based studies in other species like mouse, yeast, nematode, and stickleback fish, where the large majority of loci have little to no effect on traits while a minimal subset of sites show pleiotropic effects [44,45]. Within these distributions, intergenic intervals have higher pleiotropy values confirming prior results in maize where most GWA hits fall within larger, intergenic regions rather than genic regions [46], which would lead to higher overall pleiotropy scores. One caveat to this study is that we cannot detect pleiotropy in fixed mutations because our estimates are based on the prevalence and frequency of association results which are values dependent on allele frequencies within populations. We are thus more likely to find common variants with only modest effects on phenotypes than rare variants with small effects [24–26].

Given our distribution of pleiotropic loci derived from common alleles, we investigated the top twenty intervals with the largest adjusted pleiotropy values. A paired amphipathic helix protein (Zm00001d035776), shown to be involved in regulating flowering time and internode number in Arabidopsis [43], and a tRNA methyltransferase (Zm00001d040920) were found across two populations. All other genes, within or next to the most pleiotropic intervals, were only present in one population. In GAP field traits, xylanase/glycosyl hydrolase10 (xyn10, Zm00001d039958) has been shown to impact mesocotyl length after deep-seeding [47], with the larger family of glycoside hydrolases being involved in plant cell enlargement and composition [48]. A highly pleiotropic locus in GAP mass features included empty pericarp2 (Zm00001d005675), which is involved in embryo morphogenesis, shoot development, and the negative regulation of the heat shock response [49–51]. GAP expression data showed ZIM-transcription factor 8 (zim8, Zm00001d004277), a candidate gene for Striga damage [52], kernel composition, and popping expansion [53].

To understand the genetic architecture behind these pleiotropic loci we ran multiple random forest and gradient tree boosting models investigating the relationship between biological and nuisance (mapping) variables to characterize the patterns of pleiotropy. Nuisance variables, including the interval size, average linkage disequilibrium, and the number of input SNPs, showed the highest relative importance across models. Of much lower relative importance were biological features, including maximum gene expression across multiple B73 tissues [40], the amount of accessible chromatin [42], and the average evolutionary conservation [41]. Even after adjusting pleiotropy scores using the nuisance variables, we found no strong or significant relationship with maximum gene expression or the average GERP score in any population-trait category.

In contrast to our results in maize, pleiotropy is pervasive in human and mouse literature [28–30,44,54]. It has been shown that highly pleiotropic loci are associated with changes in gene expression in mice [31] and humans [28], broad expression across many tissues [26,29], and an enrichment in accessible chromatin [26]. These enrichment patterns may not be present in maize due to constant selective pressures on novel pleiotropic loci originating after its domestication from teosinte roughly 9,000 years ago [55,56] and constant selection in modern breeding programs, making pervasive wide- and large-effect pleiotropic loci big targets for selection. In a population with large standing natural variation, mutations within wide- and large-effect loci would not be advantageous. This agrees with Fisher and Orr models where mutations causing larger phenotypic effects are less likely to be beneficial over small effect mutations and thus more likely to affect complex organisms [57,58]. Additionally, genetic independence among traits is essential so that favorable combinations of alleles can aid in the adaptation of novel varieties suited to new environments and conditions [59]. Thus, our results of modest pleiotropy agree with the notion that many independent loci control a single trait, and these loci originate from sites with varying chromatin accessibilities, RNA expression values, and conservation levels.

Trait selection heavily influences pleiotropy results. Small subsets of measured traits originating from the same phytomer or highly correlated traits such as amino acid content [60] or grain quality [61] may already share similar regulatory pathways leading to enrichment in pleiotropic loci [23]. This may lead to unrealistic expectations of the pervasiveness of pleiotropy genome-wide. Because pleiotropy estimates rely heavily on allele frequencies, allele effect sizes, the traits under investigation, mapping resolution, and the selection of candidate markers, we argue that many claims of pleiotropy in the plant literature are misleading. Additionally, pleiotropic calls on QTL intervals in prior studies in wheat [62], rice [61,63], maize [34,35], and soybean [64] lack the genomic resolution to distinguish between tight linkage and true pleiotropy due to limited markers and large linkage groups.

Although many pleiotropy claims are misleading, this does not mean that pleiotropy does not exist in plants. These pleiotropic relationships may be more confined to fixed, large-effect loci such as teosinte branched 1 (tb1), Vegetative to generative transition 1 (Vgt1), or closely related traits that share similar developmental pathways. These closely related traits shown to exhibit some pleiotropy include key regulators of flowering time between days to silking, days to anthesis, and the anthesis-silking interval [3], some inflorescence architecture traits [16], and co-regulation between flowering time and leaf number [16,19,20].

In this study, we achieved phenotypic and marker saturation to assess the prevalence of pleiotropy within the maize genome. There is minimal pleiotropy present that, in aggregate, shows little to no relationship with gene expression, sequence conservation, or DNA accessibility and is highly impacted by mapping resolution terms including linkage disequilibrium and genomic interval size. Pleiotropy can still exist within the maize genome but is highly constrained by selection and tight linkage of multiple causal loci. We suggest that in future work of the plant genetics community, caution should be taken before claiming pleiotropy is the causal agent controlling multiple phenotypes. We recommend that: 1) mapping and marker resolution is sufficient to distinguish between tight linkage and pleiotropy, 2) there be robust testing of a null hypothesis of no pleiotropy through statistical methods or null distributions [36,65], 3) there be enough unrelated traits to create pleiotropy estimates, and 4) there be functional characterization or validation of pleiotropic claims through the identification of shared biological pathways or fine-mapping. Additionally, there should be more specificity in the types of pleiotropy under investigation by adhering to previously described nomenclature [2,66].

Materials and methods

Collection of a wide range of phenotypes in NAM and GAP

To determine the degree of pleiotropy in the maize genome, we curated traits measured across two maize mapping populations: the US Nested Association Mapping population (US-NAM, or NAM) [67] and the Goodman Association Panel (GAP) [68]. NAM is a biparental population of 26 diverse inbred lines crossed to a common parent, B73, with 5,186 progeny, while the Goodman Association Panel consists of 271 diverse inbred lines. Data were collected from 3,873 leaf mass features [37], 116,381 RNA expression values from seven tissues [38], and 343 field traits. Field traits included flowering time [3,69], plant architecture [21,46,70–75], disease and insect resistance [76–83], and kernel composition [84–90]. Many of these traits are replicated across environments, so a single representative version of that trait was kept. These traits, their publication of origin, and trait values are available in S1–S3 Tables. Due to their size, GAP mass features and expression data are available in their original publications [37,38].

As shown in the visual methods, we split all phenotypes into four categories based on population and trait type (Fig 1). These population-trait categories were: 1) NAM field, 2) GAP field, 3) GAP mass features, and 4) GAP expression traits. Some traits within GAP field traits included kernel carotenoids and metabolites [84,86,88–90], which could be classified into the GAP mass features category because they are also molecular phenotypes. However, only the mass features measured within Zhou et al. 2019 were classified as GAP mass features [37]. The field traits were further classified into eight trait types to determine if there was any pleiotropic pattern within and between different trait types. These trait types included flowering, disease, tassel, ear, leaf, height, vegetative, and kernel traits (S1 Table).

To create an estimate of statistical pleiotropy to compare to our estimates of biological pleiotropy, we permuted each phenotype ten times for NAM field, GAP field, and GAP mass features. Due to the large number of trait observations, we permuted GAP expression values five times. Phenotypes were permuted by fitting a linear model with their respective principal components or PEER (Probabilistic Estimation of Expression Residuals) factors [91] as described in the association mapping section, and we gathered their fitted and residual values. Residual values were independently permuted five or ten times before adding back in the effects of the fitted values. This permutation step ensured that covariances between traits were broken and that the effect of the principal components and PEER factors did not inflate GWA p-values due to their underlying population structure. The genotype matrix was not permuted to maintain the LD structure between SNPs and QTL.
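The residual-permutation step can be illustrated with a minimal sketch. For brevity it assumes a single covariate (one principal component) fitted by ordinary least squares, whereas the study used several principal components or PEER factors; all function and variable names are hypothetical.

```python
import random
from statistics import mean

def fit_simple_ols(x, y):
    """Return (intercept, slope) of the least-squares fit y ~ 1 + x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

def permute_phenotype(covariate, phenotype, n_perm, seed=0):
    """Shuffle residuals n_perm times, adding the fitted values back each time."""
    rng = random.Random(seed)
    intercept, slope = fit_simple_ols(covariate, phenotype)
    fitted = [intercept + slope * xi for xi in covariate]
    residuals = [yi - fi for yi, fi in zip(phenotype, fitted)]
    permuted = []
    for _ in range(n_perm):
        shuffled = residuals[:]          # copy so each permutation is independent
        rng.shuffle(shuffled)
        # The covariate effect stays attached to the original line;
        # only the residual part of the phenotype is reassigned.
        permuted.append([fi + ri for fi, ri in zip(fitted, shuffled)])
    return permuted
```

Because only the residuals move between lines, the structure captured by the covariate is preserved while any residual trait-genotype relationship is broken.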

Association mapping of a wide range of phenotypes

We used the previously generated NAM and GAP Hapmap 3.2.1 SNPs imputed using Beagle 5.1 [92] with the USDA-ARS NCRPIS collection SNPs as the reference [93]. After filtering based on a minimum allele frequency of 0.01, roughly 17 million and 15 million SNPs remained in the GAP and NAM populations, respectively.

GAP field and mass feature traits were mapped with the following model to account for population structure: y = 1 + 3 global PCs + e. The three global principal components (PCs) were calculated on a subset of 66,527 SNPs from 3,545 diverse inbred lines in the USDA-ARS NCRPIS collection, and then GAP SNPs were rotated to such PCs. The 66,527 SNPs were chosen because they represented sites with no missing data shared between the three maize populations. This method offered similar control for population structure and kinship over PCs calculated solely within GAP and was consistent when mapping traits across multiple populations [94]. This also saves on computational time compared to using mixed linear models. For GAP expression traits, twenty-five tissue-specific PEER factors were added to the model to control for experimental design and sampling variability as described previously [38].

Due to the population and family structure arising from NAM’s half-sib design, more stringent control of population structure was needed to reduce the false-positive rate. The following two models were used in NAM: y = 1 + 3 global PCs + main window PCs + e and y = 1 + 3 global PCs + mid window PCs + e, where the global PCs were calculated similarly to GAP but using NAM SNPs. Main window PCs were calculated only for NAM by breaking the genome into 360 gene windows based on B73 AGPv4 gene coordinates, and PCs were calculated within each window. A total of 360 genes were chosen per window because they offered a reasonable number of degrees of freedom when mapping within NAM. For each window, enough PCs were taken to explain 15% of the total variance. To combat potential edge effects arising from these ‘main’ windows, we slid the windows over by 180 genes, recalculated window PCs (named ‘mid’ window PCs), and then used both collections of PCs in the association analysis. This resulted in 469 ‘main’ window PCs and 512 ‘mid’ window PCs. The union of significant sites between these two models was used for further analyses. In cases where both models had a significant association for a single SNP, the most significant association (with the smallest p-value) was taken. All PCs used to map traits in NAM and GAP are available in S4 Table. All observed data and permuted phenotypes were then mapped using the rTASSEL package in R [95] with the Fast Association method [96], keeping all trait-SNP associations with p-value < 0.00001.
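Combining the 'main' and 'mid' window models can be sketched as below. This is an illustrative assumption about the bookkeeping, not the rTASSEL workflow itself: results are represented as dictionaries keyed by a hypothetical (trait, SNP) pair, and the threshold matches the p-value cutoff stated above.

```python
P_THRESHOLD = 1e-5  # keep associations with p-value < 0.00001

def merge_model_hits(main_hits, mid_hits, threshold=P_THRESHOLD):
    """Union of significant trait-SNP associations from two models.

    main_hits, mid_hits: dicts mapping (trait, snp) -> p-value.
    When both models report the same pair, the smaller p-value is kept.
    """
    merged = {}
    for hits in (main_hits, mid_hits):
        for key, p_value in hits.items():
            if p_value >= threshold:
                continue                      # not significant in this model
            if key not in merged or p_value < merged[key]:
                merged[key] = p_value         # keep the most significant call
    return merged
```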

Calculating a quantitative pleiotropy score

Due to high linkage disequilibrium and the difficulty in ascertaining the true causal SNP(s) from a list of significant associations, we chose to investigate pleiotropy within genic and intergenic intervals. We did this by breaking the genome into an equal number of genic and intergenic intervals based on gene annotations in B73 AGPv4 with the GenomicRanges package in R [97]. Any overlapping genic ranges were merged into one large interval. This resulted in n = 75,490 intervals with a mean length of 27,902 bp. For SNP-level pleiotropy, one million Hapmap 3.2.1 SNPs were randomly subsampled using the subsetSites TASSEL plugin. Using a custom TASSEL plugin (version 5.2.79, CountAssociationsPlugin) [98], we counted the number of trait-associated SNPs for each interval or each SNP for each population-trait category separately for both the observed and permuted results. The matrices of intervals by trait counts were then used to calculate the number of unique (non-redundant) traits mapping to each interval or each SNP. For the NAM and GAP field data this resulted in 80 and 167 non-redundant (unique) traits, respectively. We chose to calculate the number of traits mapping to each interval or SNP over using other previously established pleiotropy methods because it provided a quantitative score on the variability of trait-SNP associations, made no assumptions on which single SNP was the ‘most’ pleiotropic, and did not require effect sizes of SNPs. Additionally, we did not perform fine-mapping or colocalization analyses due to the nature of recombination between maize versus humans, where many of these classification techniques are optimized. Human GWAS studies have larger populations with more recent recombination events than maize with smaller populations with ancestral recombination blocks. 
As most recombination in maize occurs near the 5’ end of genes, with very little recombination in the intergenic blocks, interval-based approaches are likely to deliver results at the best possible resolution from our association analyses. Investigation of pleiotropy enrichment compared to the permuted data was performed by taking the log of the observed divided by the mean permuted pleiotropy count for each interval plus a pseudo-count of one. The pseudo-count of one in the numerator and denominator ensured that all intervals were retained in the plot.
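The counting and enrichment calculations can be sketched as follows. The natural log is assumed, in line with the supplementary figure captions; the function names and the (interval, trait) input layout are hypothetical.

```python
import math
from statistics import mean

def count_unique_traits(associations):
    """Count non-redundant traits per interval.

    associations: iterable of (interval, trait) pairs, one per
    significant trait-SNP association falling in that interval.
    """
    traits_by_interval = {}
    for interval, trait in associations:
        traits_by_interval.setdefault(interval, set()).add(trait)
    return {k: len(v) for k, v in traits_by_interval.items()}

def log_enrichment(observed_count, permuted_counts):
    """ln((observed + 1) / (mean permuted + 1)) for one interval."""
    return math.log((observed_count + 1) / (mean(permuted_counts) + 1))
```

With the pseudo-count, an interval with no associations in either the observed or permuted data scores ln(1/1) = 0 instead of being undefined, and a five-fold enrichment over permutations corresponds to a score of ln(5).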

Functionally characterizing pleiotropy using linear, random forest, and gradient boosting models

To determine what factors may impact pleiotropy in maize, we investigated which annotations were the most important in determining the degree of pleiotropy within our traits and populations. We trained three random forest models using the R ranger package [99] and one gradient boosting model using the R XGBoost package [100], using features that are known to statistically interfere with GWA analyses and features that may biologically impact pleiotropy.

Model 1 was a random forest model that only included nuisance (mapping) features. All terms were calculated for each interval and included average R2 linkage disequilibrium within NAM and GAP calculated with the TASSEL MeanR2FromLDPlugin function, the total number of SNPs per interval used for the GWA analysis before p-value filtering, and the interval size. All features were calculated for each population separately.

Models 2 (random forest) and 4 (gradient boosting) included both nuisance and biological features. Biological features included the mean Genomic Evolutionary Rate Profiling (GERP) scores of SNPs that overlapped exactly (based on their physical position) with significant GWA SNPs averaged across an interval, the number of ATAC-seq sites previously analyzed [42], RNA and protein expression values in B73 across 23 diverse tissues [40], the interval type (i.e., genic or intergenic), and adjusted GAP expression pleiotropy values. The GAP expression pleiotropy values were included because we hypothesized that any pleiotropy in the expression data, being an intermediate trait in the central dogma occurring before metabolites and field phenotypes, would potentially explain how these other classes of traits were being regulated. Model 2 was used for the interval-based pleiotropy data across all trait categories and for the eight separate trait types.

For the SNP-level pleiotropy counts, a gradient boosting model was trained using the XGBoost package, similar to Model 4 using a subset of features. These features were linkage disequilibrium, GERP, the max RNA and protein expression of the nearest gene within 5 Kb, if the SNP was in an ATAC-seq site, and the location of the SNP (genic or intergenic).

For Model 3, we included all nuisance and biological variables as Models 2 and 4; however, we also added fourteen biological process or molecular function gene ontology terms to the model. These terms describe the presence or absence of that ontology term to that interval. These gene ontology terms were scored for each interval where 1 denoted that genic interval contained a gene belonging to that gene ontology term, while 0 meant that genic and all intergenic intervals did not harbor a gene belonging to that ontology term. These terms included transcription DNA-templated (BP, GO:0006351), translation (BP, GO:0006412), regulation of transcription DNA-templated (BP, GO:0006355), regulation of translation (BP, GO:0006417), leaf development (BP, GO:0048366), regulation of leaf development (BP, GO:2000024), flower development (BP, GO:0009908), regulation of flower development (BP, GO:0009909), regulation of catalytic activity (BP, GO:0050790), signal transduction (BP, GO:0007165), nucleotide binding (MF, GO:0000166), DNA binding (MF, GO:0003677), RNA binding (MF, GO:0003723), and DNA-binding transcription factor activity (MF, GO:0003700).
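The presence/absence scoring of gene ontology terms can be illustrated with a short sketch. The data structures and gene identifiers are hypothetical; intergenic intervals are represented as intervals with an empty gene list, so they score 0 for every term.

```python
def encode_go_features(interval_genes, gene_go_terms, model_terms):
    """Build 0/1 gene ontology features for each interval.

    interval_genes: dict interval -> list of gene IDs in that interval
                    (empty list for intergenic intervals).
    gene_go_terms:  dict gene ID -> set of GO accessions.
    model_terms:    the GO accessions used as model features.
    """
    features = {}
    for interval, genes in interval_genes.items():
        terms_here = set()
        for gene in genes:
            terms_here |= gene_go_terms.get(gene, set())
        # 1 if any gene in the interval carries the term, otherwise 0.
        features[interval] = {t: int(t in terms_here) for t in model_terms}
    return features
```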

All models were trained and tested using a leave-one-chromosome-out approach to avoid overfitting. This was done by cycling through all ten maize chromosomes, training the model using data from nine chromosomes, and testing on the left-out chromosome. The random forest models (Models 1–3) trained using R/ranger were built with default parameters, using the impurity variable importance mode and 500 trees. For the gradient tree boosting model (Model 4), we performed a grid search over the hyperparameter space of eta and max.depth and chose the values eta = 0.5, max.depth = 6 for each population, as these values attained approximately the lowest test error in each population. All other XGBoost parameters were set to their default values. Each model’s relative importance scores were calculated by dividing all importance scores by the maximum importance score. A correlation matrix of all model terms is available in S4 Fig.
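The leave-one-chromosome-out splits and the relative importance scaling can be sketched as below. Model fitting itself is omitted, and the function names are hypothetical; in practice the index lists would be passed to whichever learner is being trained.

```python
def leave_one_chromosome_out(chromosomes):
    """Yield (held_out, train_idx, test_idx) for each distinct chromosome.

    chromosomes: list giving the chromosome of each interval (row).
    """
    for held_out in sorted(set(chromosomes)):
        train_idx = [i for i, c in enumerate(chromosomes) if c != held_out]
        test_idx = [i for i, c in enumerate(chromosomes) if c == held_out]
        yield held_out, train_idx, test_idx

def relative_importance(importance):
    """Scale importance scores so the most important feature equals 1."""
    top = max(importance.values())
    return {name: score / top for name, score in importance.items()}
```

Holding out whole chromosomes keeps linked intervals from appearing in both the training and test sets, which a random row-wise split would not guarantee.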

Due to the high importance of linkage disequilibrium, the number of input SNPs, and interval size in the random forest models, an adjusted interval-based pleiotropy metric was created. Adjusted pleiotropy was calculated by subtracting the predicted values of the following random forest model: pleiotropy ~ averageLinkageDisequilibrium + numberInputSNPs + intervalSize (Model 1) from the observed pleiotropy values for each population-trait category. The relationship between adjusted pleiotropy and observed pleiotropy at the interval level is available in S8 Fig. Associations with adjusted pleiotropy were then assessed for the max B73 expression across numerous tissues and the average GERP scores using linear models in R. These adjusted pleiotropy values aimed to control for spurious pleiotropy; however, we still cannot distinguish between instances of true biological, mediated, or spurious pleiotropy. All raw and adjusted pleiotropy values and biological annotations used for analyses are available in S6 and S9 Tables.
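The adjustment amounts to taking residuals from the nuisance-only model. A minimal sketch follows, in which a generic callable stands in for the trained random forest (Model 1); the field names and the toy predictor in the usage example are hypothetical.

```python
def adjusted_pleiotropy(intervals, predict_from_nuisance):
    """Observed pleiotropy minus the value predicted from nuisance terms.

    intervals: dict interval -> dict with keys 'observed', 'mean_ld',
               'n_input_snps', and 'interval_size'.
    predict_from_nuisance: callable(mean_ld, n_input_snps, interval_size)
               returning the expected pleiotropy count; a stand-in for
               the fitted nuisance-only model.
    """
    adjusted = {}
    for name, row in intervals.items():
        expected = predict_from_nuisance(
            row["mean_ld"], row["n_input_snps"], row["interval_size"]
        )
        adjusted[name] = row["observed"] - expected
    return adjusted

# Usage with a toy predictor that depends only on the number of input SNPs.
example = {
    "a": {"observed": 5, "mean_ld": 0.8, "n_input_snps": 100, "interval_size": 2000},
    "b": {"observed": 0, "mean_ld": 0.2, "n_input_snps": 50, "interval_size": 900},
}
example_adjusted = adjusted_pleiotropy(example, lambda ld, n, size: 0.01 * n)
```

Positive values mark intervals with more trait associations than the mapping characteristics alone would predict, and negative values mark intervals with fewer.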

Enrichment of adjusted pleiotropy scores with gene ontology terms

To investigate which gene ontology terms were enriched in highly and lowly pleiotropic regions, we subsetted our adjusted pleiotropy scores to genic regions, uplifted our B73 gene models from AGPv4 to RefGen_v5, and connected the RefGen_v5 genes with GO terms using Zm-B73-REFERENCE-NAM-5.0_Zm00001eb.1.interproscan.tsv obtained from MaizeGDB [101]. We then gathered the highest 75th percentile, 90th percentile, and 99th percentile of adjusted pleiotropy values in each population-trait category and labeled these loci as highly pleiotropic. Lowly pleiotropic values were the lowest 25th, 10th, and 1st percentile of adjusted pleiotropy values in each population-trait category. We then used the intervals in these percentiles as inputs into topGO [102]. We investigated the genes associated with molecular function, biological processes, and cellular components, retaining all terms that were significant after false discovery rate (FDR) correction of p-values from a Fisher’s Exact test (FDR corrected p-value < 0.05). For population-trait categories with many significant GO results, only the top thirteen terms with the lowest FDR corrected p-values within each GO category were plotted. Significant GO terms and all other plots in this manuscript were visualized using ggplot2 [103].
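The statistical core of such an enrichment test can be sketched independently of topGO. The example below assumes a one-sided Fisher's Exact test for over-representation and the Benjamini-Hochberg procedure for FDR correction; both choices, and all names, are assumptions made for illustration rather than a description of topGO's internals.

```python
from math import comb

def fisher_exact_enrichment(k, n, K, N):
    """One-sided p-value P(X >= k) under the hypergeometric distribution.

    N: genes in the background, K: background genes annotated with the term,
    n: genes in the tested set, k: tested genes annotated with the term.
    """
    total = comb(N, n)
    upper = min(n, K)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total

def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted
```

Terms whose adjusted p-value falls below 0.05 would then be retained, and the thirteen smallest per GO category plotted.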

Supporting information

S1 Table. Publications and traits used in this study.

https://doi.org/10.1371/journal.pgen.1010664.s001

(CSV)

S2 Table. NAM phenotypic data used in the analysis.

https://doi.org/10.1371/journal.pgen.1010664.s002

(CSV)

S3 Table. GAP phenotypic data used in the analysis.

https://doi.org/10.1371/journal.pgen.1010664.s003

(CSV)

S4 Table. Global, main window, and midway window principal components used to map NAM traits and global principal components used to map phenotypes in GAP.

https://doi.org/10.1371/journal.pgen.1010664.s004

(XLSX)

S5 Table. Random forest and gradient boosting performance and prediction accuracy for all observed and permuted data models.

R2 prediction accuracy is calculated using an iterative leave-one-chromosome out approach. Prediction accuracy for Model 1 (random forest, only nuisance terms), 2 (random forest), 3 (random forest with gene ontology terms), and 4 (gradient boosting) are provided for the SNP- and interval-based models.

https://doi.org/10.1371/journal.pgen.1010664.s005

(CSV)

S6 Table. Interval-level aggregated raw pleiotropy counts, adjusted pleiotropy counts, and values for all features used for analyses.

https://doi.org/10.1371/journal.pgen.1010664.s006

(CSV)

S7 Table. Top twenty intervals and their associated genes by population and trait type.

https://doi.org/10.1371/journal.pgen.1010664.s007

(CSV)

S8 Table. Interval-level pleiotropy counts for the NAM and GAP field traits split into eight trait types.

https://doi.org/10.1371/journal.pgen.1010664.s008

(CSV)

S9 Table. SNP-level pleiotropy counts and values for all biological and nuisance features used for analyses.

https://doi.org/10.1371/journal.pgen.1010664.s009

(CSV)

S1 Fig. Distribution of QTL by SNP.

Of the SNPs showing a five-fold higher proportion of pleiotropy over their permutations (right of the vertical dashed line). Each value along the x-axis was calculated from the natural log of the number of observed traits mapping to each SNP divided by the mean count of traits in the permuted data with a pseudo-count of plus one in the numerator and denominator. Values left of the vertical dashed line indicate higher pleiotropy in the permuted data versus the observed data suggesting the prevalence of high noise or no trait-SNP associations in either the observed or permuted data (peak at zero). Distributions are split into genic (gray) and intergenic (yellow) SNPs for (a) NAM field, (b) GAP field, (c) GAP mass features, and (d) GAP expression traits.

https://doi.org/10.1371/journal.pgen.1010664.s010

(TIF)

S2 Fig. Distribution of QTL by interval for the eight different NAM field trait types.

Very few intervals show a five-fold higher proportion of pleiotropy over their permutations (right of the vertical dashed line) in the NAM field traits. Each value along the x-axis was calculated within a trait category from the natural log of the number of observed traits mapping to each interval divided by the mean count of traits in the permuted data with a pseudo-count of plus one in the numerator and denominator. Values left of the vertical dashed line indicate higher pleiotropy in the permuted data versus the observed data suggesting the prevalence of high noise or no trait-SNP associations in either the observed or permuted data (peak at zero).

https://doi.org/10.1371/journal.pgen.1010664.s011

(TIF)

S3 Fig. Distribution of QTL by interval for the eight different GAP field trait types.

Very few intervals show a five-fold higher proportion of pleiotropy over their permutations (right of the vertical dashed line) in the GAP field traits. Each value along the x-axis was calculated within a trait category from the natural log of the number of observed traits mapping to each interval divided by the mean count of traits in the permuted data with a pseudo-count of plus one in the numerator and denominator. Values left of the vertical dashed line indicate higher pleiotropy in the permuted data versus the observed data suggesting the prevalence of high noise or no trait-SNP associations in either the observed or permuted data (peak at zero).

https://doi.org/10.1371/journal.pgen.1010664.s012

(TIF)

S4 Fig. Full correlation matrix between terms used within the interval-based random forest and gradient boosting models.

Only the first permuted value from each of the four population-trait categories was included in the correlation matrix for simplicity.

https://doi.org/10.1371/journal.pgen.1010664.s013

(TIF)

S5 Fig. Relative importance scores from random forest models of biological and nuisance features in explaining interval-based permuted pleiotropy values for Model 2 (random forest without gene ontology terms).

Across all four population-trait categories, nuisance variables showed higher relative importance over biological features. The plots show the permuted data for the (a) NAM field, (b) GAP field, (c) GAP mass features, and (d) GAP expression data. The bar charts depict the mean relative importance and standard error of each variable from a leave-one-chromosome-out model.

https://doi.org/10.1371/journal.pgen.1010664.s014

(TIF)

S6 Fig. Predicted versus observed pleiotropy values for Model 2 (random forest without gene ontology terms) from eight individual random forest models trained on interval-based pleiotropy values.

The dashed line represents the 1–1 identity line, while the solid line represents fitted values. Panels (a), (c), (e), and (g) show the observed results while panels (b), (d), (f), and (h) show the permuted results. Panels (a) and (b) show NAM field, (c) and (d) GAP field, (e) and (f) GAP mass features, and (g) and (h) GAP expression. The plots show the observed and predicted values across all held-out chromosomes from the leave-one-chromosome-out model.

https://doi.org/10.1371/journal.pgen.1010664.s015

(TIF)

S7 Fig. NAM field relative importance scores from random forest models of biological and nuisance features in explaining interval-based observed pleiotropy values for Model 2 (random forest without gene ontology terms) for the eight individual trait types.

Across the eight trait types, nuisance variables showed higher relative importance over biological features. The plots show the observed data for the NAM field results (a) across all traits, (b) flowering, (c) tassel, (d) ear, (e) height, (f) disease, (g) vegetative, (h) leaf, and (i) kernel traits. The bar charts depict the mean relative importance and standard error of each variable from a leave-one-chromosome-out model.

https://doi.org/10.1371/journal.pgen.1010664.s016

(TIF)

S8 Fig. NAM field relative importance scores from random forest models of biological and nuisance features in explaining interval-based permuted pleiotropy values for Model 2 (random forest without gene ontology terms) for the eight individual trait types.

Across the eight trait types, nuisance variables showed higher relative importance over biological features. The plots show the permuted data for the NAM field results (a) across all traits, (b) flowering, (c) tassel, (d) ear, (e) height, (f) disease, (g) vegetative, (h) leaf, and (i) kernel traits. The bar charts depict the mean relative importance and standard error of each variable from a leave-one-chromosome-out model.

https://doi.org/10.1371/journal.pgen.1010664.s017

(TIF)

S9 Fig. GAP field relative importance scores from random forest models of biological and nuisance features in explaining interval-based observed pleiotropy values for Model 2 (random forest without gene ontology terms) for the eight individual trait types.

Across the eight trait types, nuisance variables showed higher relative importance over biological features. The plots show the observed data for the GAP field results (a) across all traits, (b) flowering, (c) tassel, (d) ear, (e) height, (f) disease, (g) vegetative, (h) leaf, and (i) kernel traits. The bar charts depict the mean relative importance and standard error of each variable from a leave-one-chromosome-out model.

https://doi.org/10.1371/journal.pgen.1010664.s018

(TIF)

S10 Fig. GAP field relative importance scores from random forest models of biological and nuisance features in explaining interval-based permuted pleiotropy values for Model 2 (random forest without gene ontology terms) for the eight individual trait types.

Across the eight trait types, nuisance variables showed higher relative importance than biological features. The plots show the permuted data for the GAP field results (a) across all traits, (b) flowering, (c) tassel, (d) ear, (e) height, (f) disease, (g) vegetative, (h) leaf, and (i) kernel traits. The bar charts depict the mean relative importance and standard error of each variable from a leave-one-chromosome-out model.

https://doi.org/10.1371/journal.pgen.1010664.s019

(TIF)

S11 Fig. Relative importance scores from random forest models of biological, nuisance, and gene ontology features in explaining interval-based observed and permuted pleiotropy values for Model 3 (random forest with gene ontology terms).

Across all four population-trait categories, nuisance variables showed higher relative importance than biological features. The plots show the observed data in panels (a), (c), (e), and (g) and the permuted data in panels (b), (d), (f), and (h). Panels (a) and (b) show NAM field results, (c) and (d) GAP field, (e) and (f) GAP mass features, and (g) and (h) GAP expression data. The bar charts depict the mean relative importance and standard error of each variable from a leave-one-chromosome-out model.

https://doi.org/10.1371/journal.pgen.1010664.s020

(TIF)

S12 Fig. Predicted versus observed pleiotropy values for Model 3 (random forest with gene ontology terms) from eight individual random forest models trained on interval-based pleiotropy values.

The dashed line represents the 1–1 identity line, while the solid line represents fitted values. Panels (a), (c), (e), and (g) show the observed results while panels (b), (d), (f), and (h) show the permuted results. Panels (a) and (b) show NAM field, (c) and (d) GAP field, (e) and (f) GAP mass features, and (g) and (h) GAP expression. The plots show the observed and predicted values across all held-out chromosomes from the leave-one-chromosome-out model.

https://doi.org/10.1371/journal.pgen.1010664.s021

(TIF)

S13 Fig. Relative importance scores from random forest models of biological, nuisance, and gene ontology features in explaining interval-based observed and permuted pleiotropy values for Model 4 (gradient boosting).

Across all four population-trait categories, nuisance variables showed higher relative importance than biological features. The plots show the observed data in panels (a), (c), (e), and (g) and the permuted data in panels (b), (d), (f), and (h). Panels (a) and (b) show NAM field results, (c) and (d) GAP field, (e) and (f) GAP mass features, and (g) and (h) GAP expression data. The bar charts depict the mean relative importance and standard error of each variable from a leave-one-chromosome-out model.

https://doi.org/10.1371/journal.pgen.1010664.s022

(TIF)

S14 Fig. Predicted versus observed pleiotropy values for Model 4 (gradient boosting) from eight individual random forest models trained on interval-based pleiotropy.

The dashed line represents the 1–1 identity line, while the solid line represents fitted values. Panels (a), (c), (e), and (g) show the observed results while panels (b), (d), (f), and (h) show the permuted results. Panels (a) and (b) show NAM field, (c) and (d) GAP field, (e) and (f) GAP mass features, and (g) and (h) GAP expression. The plots show the observed and predicted values across all held-out chromosomes from the leave-one-chromosome-out model.

https://doi.org/10.1371/journal.pgen.1010664.s023

(TIF)

S15 Fig. Relative importance scores from random forest models of biological, nuisance, and gene ontology features in explaining SNP-based observed and permuted pleiotropy values for a reduced version of Model 4 (gradient boosting).

Across all four population-trait categories, nuisance variables showed higher relative importance than biological features. The plots show the observed data in panels (a), (c), (e), and (g) and the permuted data in panels (b), (d), (f), and (h). Panels (a) and (b) show NAM field results, (c) and (d) GAP field, (e) and (f) GAP mass features, and (g) and (h) GAP expression data. The bar charts depict the mean relative importance and standard error of each variable from a leave-one-chromosome-out model.

https://doi.org/10.1371/journal.pgen.1010664.s024

(TIF)

S16 Fig. Predicted versus observed pleiotropy values for a reduced version of Model 4 (gradient boosting) from eight individual random forest models trained on SNP-based pleiotropy.

The dashed line represents the 1–1 identity line, while the solid line represents fitted values. Panels (a), (c), (e), and (g) show the observed results while panels (b), (d), (f), and (h) show the permuted results. Panels (a) and (b) show NAM field, (c) and (d) GAP field, (e) and (f) GAP mass features, and (g) and (h) GAP expression. The plots show the observed and predicted values across all held-out chromosomes from the leave-one-chromosome-out model.

https://doi.org/10.1371/journal.pgen.1010664.s025

(TIF)

S17 Fig. There is a small, negative relationship between interval-based adjusted pleiotropy and observed pleiotropy in the four population-trait categories.

The scatter plots show the relationship between the adjusted and observed pleiotropy values for (a) NAM field, (b) GAP field, (c) GAP mass features, and (d) GAP expression traits. Values in the top right of each plot show the R² values from correlating the adjusted versus unadjusted pleiotropy data.

https://doi.org/10.1371/journal.pgen.1010664.s026

(TIF)

S18 Fig. Interval-based adjusted pleiotropy is not strongly correlated with the max B73 RNA expression across 23 diverse tissues.

The line plots of adjusted pleiotropy for only genic ranges against the max RNA expression show the observed (gray lines) and permuted data (yellow lines). Panels show adjusted pleiotropy scores for (a) NAM field, (b) GAP field, (c) GAP mass features, and (d) GAP expression traits. Values in the top right of each plot show the R² values from correlating the observed genic and permuted genic adjusted pleiotropy data separately against the max RNA expression value across 23 B73 tissues. Shading around lines shows the 95% confidence interval.

https://doi.org/10.1371/journal.pgen.1010664.s027

(TIF)

S19 Fig. Interval-based adjusted pleiotropy is not correlated with the mean GERP score.

Line plots show adjusted pleiotropy against the mean GERP score for all genic and intergenic intervals. Mean GERP was calculated by averaging overlapping observed GWA SNPs with GERP SNPs for each population-trait category. Results for the observed data are in gray, while permuted results are in yellow. Panels show adjusted pleiotropy scores for (a) NAM field, (b) GAP field, (c) GAP mass features, and (d) GAP expression traits. Values in the top right of each plot show the R² values from correlating the observed and permuted adjusted pleiotropy data against the mean GERP score. Shading around lines shows the 95% confidence interval.

https://doi.org/10.1371/journal.pgen.1010664.s028

(TIF)

S20 Fig. Gene ontology enrichment for highly and lowly pleiotropic intervals for the four population-trait categories falling within the top 90th percentile and bottom 10th percentile of adjusted pleiotropy values for each population-trait category.

All molecular function, biological process, and cellular component terms were tested. Terms with FDR-corrected p-values below 0.05 were retained. The x-axis represents highly or lowly pleiotropic intervals split by the population and trait type, and the y-axis shows the top significant gene ontology terms. The shade of blue indicates the FDR significance level, and the size of the dots represents the proportion of genes found in that GO category.

https://doi.org/10.1371/journal.pgen.1010664.s029

(TIF)

S21 Fig. Gene ontology enrichment for highly and lowly pleiotropic intervals for the four population-trait categories falling within the top 99th percentile and bottom 1st percentile of adjusted pleiotropy values for each population-trait category.

All molecular function, biological process, and cellular component terms were tested. Terms with FDR-corrected p-values below 0.05 were retained. The x-axis represents highly or lowly pleiotropic intervals split by the population and trait type, and the y-axis shows the top significant gene ontology terms. The shade of blue indicates the FDR significance level, and the size of the dots represents the proportion of genes found in that GO category.

https://doi.org/10.1371/journal.pgen.1010664.s030

(TIF)

Acknowledgments

We thank Sara Miller for copy editing and Michelle Stitzer for their suggestions in the preparation of this manuscript.

References

  1. 1. Stearns FW. One hundred years of pleiotropy: a retrospective. Genetics. 2010;186: 767–773. pmid:21062962
  2. 2. Solovieff N, Cotsapas C, Lee PH, Purcell SM, Smoller JW. Pleiotropy in complex traits: challenges and strategies. Nat Rev Genet. 2013;14: 483–495. pmid:23752797
  3. 3. Buckler ES, Holland JB, Bradbury PJ, Acharya CB, Brown PJ, Browne C, et al. The genetic architecture of maize flowering time. Science. 2009;325: 714–718. pmid:19661422
  4. 4. Phillips RL, Kim TS, Kaeppler SM, Parentoni SN, Shaver L, Stucker RE, et al. Genetic dissection of maturity using RFLPs. Embrapa Milho e Sorgo-Artigo em anais de congresso (ALICE). 1992. pp. 135–150.
  5. 5. Vlăduţu C, McLaughlin J, Phillips RL. Fine mapping and characterization of linked quantitative trait loci involved in the transition of the maize apical meristem from vegetative to generative structures. Genetics. 1999;153: 993–1007. pmid:10511573
  6. 6. Salvi S, Tuberosa R, Chiapparino E, Maccaferri M, Veillet S, van Beuningen L, et al. Toward positional cloning of Vgt1, a QTL controlling the transition from the vegetative to the reproductive phase in maize. Plant Mol Biol. 2002;48: 601–613. pmid:11999837
  7. 7. Salvi S, Sponza G, Morgante M, Tomes D, Niu X, Fengler KA, et al. Conserved noncoding genomic sequences associated with a flowering-time quantitative trait locus in maize. Proc Natl Acad Sci U S A. 2007;104: 11376–11381. pmid:17595297
  8. 8. Ducrocq S, Madur D, Veyrieras J-B, Camus-Kulandaivelu L, Kloiber-Maitz M, Presterl T, et al. Key impact of Vgt1 on flowering time adaptation in maize: evidence from association mapping and ecogeographical information. Genetics. 2008;178: 2433–2437. pmid:18430961
  9. 9. Studer A, Zhao Q, Ross-Ibarra J, Doebley J. Identification of a functional transposon insertion in the maize domestication gene tb1. Nat Genet. 2011;43: 1160–1163. pmid:21946354
  10. 10. Doebley J, Stec A, Hubbard L. The evolution of apical dominance in maize. Nature. 1997;386: 485–488. pmid:9087405
  11. 11. Weber A, Clark RM, Vaughn L, Sánchez-Gonzalez J de J, Yu J, Yandell BS, et al. Major regulatory genes in maize contribute to standing variation in teosinte (Zea mays ssp. parviglumis). Genetics. 2007;177: 2349–2359. pmid:17947410
  12. 12. Doebley J, Stec A, Gustus C. teosinte branched1 and the origin of maize: evidence for epistasis and the evolution of dominance. Genetics. 1995;141: 333–346. Available: https://www.ncbi.nlm.nih.gov/pubmed/8536981 pmid:8536981
  13. 13. Clark RM, Wagler TN, Quijada P, Doebley J. A distant upstream enhancer at the maize domestication gene tb1 has pleiotropic effects on plant and inflorescent architecture. Nat Genet. 2006;38: 594–597. pmid:16642024
  14. 14. Studer AJ, Doebley JF. Do large effect QTL fractionate? A case study at the maize domestication QTL teosinte branched1. Genetics. 2011;188: 673–681. pmid:21515578
  15. 15. Chen Y, Lübberstedt T. Molecular basis of trait correlations. Trends Plant Sci. 2010;15: 454–461. pmid:20542719
  16. 16. Brown PJ, Upadyayula N, Mahone GS, Tian F, Bradbury PJ, Myles S, et al. Distinct genetic architectures for male and female inflorescence traits of maize. PLoS Genet. 2011;7: e1002383. pmid:22125498
  17. 17. Mural RV, Sun G, Grzybowski M, Tross MC, Jin H, Smith C, et al. Association mapping across a multitude of traits collected in diverse environments in maize. Gigascience. 2022;11. pmid:35997208
  18. 18. Bouchet S, Bertin P, Presterl T, Jamin P, Coubriche D, Gouesnard B, et al. Association mapping for phenology and plant architecture in maize shows higher power for developmental traits compared with growth influenced traits. Heredity. 2017;118: 249–259. pmid:27876803
  19. 19. Colasanti J, Muszynski M. The Maize Floral Transition. In: Bennetzen JL, Hake SC, editors. Handbook of Maize: Its Biology. New York, NY: Springer; 2009. pp. 41–55. https://link.springer.com/chapter/10.1007/978-0-387-79418-1_3
  20. 20. Li D, Wang X, Zhang X, Chen Q, Xu G, Xu D, et al. The genetic architecture of leaf number and its genetic relationship to flowering time in maize. New Phytol. 2016;210: 256–268. pmid:26593156
  21. 21. Tian F, Bradbury PJ, Brown PJ, Hung H, Sun Q, Flint-Garcia S, et al. Genome-wide association study of leaf architecture in the maize nested association mapping population. Nat Genet. 2011;43: 159–162. pmid:21217756
  22. 22. Mural RV, Grzybowski M, Miao C, Damke A, Sapkota S, Boyles RE, et al. Meta-Analysis Identifies Pleiotropic Loci Controlling Phenotypic Trade-offs in Sorghum. Genetics. 2021. pmid:34100945
  23. 23. Stitzer MC, Ross-Ibarra J. Maize domestication and gene interaction. New Phytol. 2018;220: 395–408. pmid:30035321
  24. 24. Altshuler D, Daly MJ, Lander ES. Genetic mapping in human disease. Science. 2008;322: 881–888. pmid:18988837
  25. 25. Tam V, Patel N, Turcotte M, Bossé Y, Paré G, Meyre D. Benefits and limitations of genome-wide association studies. Nat Rev Genet. 2019. pmid:31068683
  26. 26. Shikov AE, Skitchenko RK, Predeus AV, Barbitoff YA. Phenome-wide functional dissection of pleiotropic effects highlights key molecular pathways for human complex traits. Sci Rep. 2020;10: 1037. pmid:31974475
  27. 27. Chesmore K, Bartlett J, Williams SM. The ubiquity of pleiotropy in human disease. Hum Genet. 2018;137: 39–44. pmid:29164333
  28. 28. Jordan DM, Verbanck M, Do R. HOPS: a quantitative score reveals pervasive horizontal pleiotropy in human genetic variation is driven by extreme polygenicity of human traits and diseases. Genome Biol. 2019;20: 222. pmid:31653226
  29. 29. Watanabe K, Stringer S, Frei O, Umićević Mirkov M, de Leeuw C, Polderman TJC, et al. A global overview of pleiotropy and genetic architecture in complex traits. Nat Genet. 2019. pmid:31427789
  30. 30. Sivakumaran S, Agakov F, Theodoratou E, Prendergast JG, Zgaga L, Manolio T, et al. Abundant pleiotropy in human complex diseases and traits. Am J Hum Genet. 2011;89: 607–618. pmid:22077970
  31. 31. Su Z, Zeng Y, Gu X. A preliminary analysis of gene pleiotropy estimated from protein sequences. J Exp Zool B Mol Dev Evol. 2010;314: 115–122. pmid:19637279
  32. 32. Bomblies K, Doebley JF. Pleiotropic effects of the duplicate maize FLORICAULA/LEAFY genes zfl1 and zfl2 on traits under selection during maize domestication. Genetics. 2006;172: 519–531. pmid:16204211
  33. 33. Foster T, Hay A, Johnston R, Hake S. The establishment of axial patterning in the maize leaf. Development. 2004;131: 3921–3929. pmid:15253937
  34. 34. Pan Q, Xu Y, Li K, Peng Y, Zhan W, Li W, et al. The Genetic Basis of Plant Architecture in 10 Maize Recombinant Inbred Line Populations. Plant Physiol. 2017;175: 858–873. pmid:28838954
  35. 35. Yang J, Liu Z, Chen Q, Qu Y, Tang J, Lübberstedt T, et al. Mapping of QTL for Grain Yield Components Based on a DH Population in Maize. Sci Rep. 2020;10: 7086. pmid:32341398
  36. 36. Rice BR, Fernandes SB, Lipka AE. Multi-Trait Genome-wide Association Studies Reveal Loci Associated with Maize Inflorescence and Leaf Architecture. Plant Cell Physiol. 2020. pmid:32186727
  37. 37. Zhou S, Kremling KA, Bandillo N, Richter A, Zhang YK, Ahern KR, et al. Metabolome-Scale Genome-Wide Association Studies Reveal Chemical Diversity and Genetic Control of Maize Specialized Metabolites. Plant Cell. 2019;31: 937–955. pmid:30923231
  38. 38. Kremling KAG, Chen S-Y, Su M-H, Lepak NK, Romay MC, Swarts KL, et al. Dysregulation of expression correlates with rare-allele burden and fitness loss in maize. Nature. 2018;555: 520–523. pmid:29539638
  39. 39. Yu J, Pressoir G, Briggs WH, Vroh Bi I, Yamasaki M, Doebley JF, et al. A unified mixed-model method for association mapping that accounts for multiple levels of relatedness. Nat Genet. 2006;38: 203–208. pmid:16380716
  40. 40. Walley JW, Sartor RC, Shen Z, Schmitz RJ, Wu KJ, Urich MA, et al. Integration of omic networks in a developmental atlas of maize. Science. 2016;353: 814–818. pmid:27540173
  41. 41. Kistler L, Yoshi Maezumi S, de Souza JG, Przelomska NAS, Costa FM, Smith O, et al. Multi-proxy evidence highlights a complex evolutionary legacy of maize in South America. Dryad; 2018. pmid:30545889
  42. 42. Lu Z, Marand AP, Ricci WA, Ethridge CL, Zhang X, Schmitz RJ. The prevalence, evolution and chromatin signatures of plant regulatory elements. Nat Plants. 2019;5: 1250–1259. pmid:31740772
  43. 43. Lang JD, Ray S, Ray A. Sin1, a Mutation Affecting Female Fertility in Arabidopsis, Interacts with Mod1, Its Recessive Modifier. Genetics. 1994;137: 1101–1110. pmid:7982564
  44. 44. Wang Z, Liao B-Y, Zhang J. Genomic patterns of pleiotropy and the evolution of complexity. Proc Natl Acad Sci U S A. 2010;107: 18034–18039. pmid:20876104
  45. 45. Albert AYK, Sawaya S, Vines TH, Knecht AK, Miller CT, Summers BR, et al. The genetics of adaptive shape shift in stickleback: pleiotropy and effect size. Evolution. 2008;62: 76–85. pmid:18005154
  46. 46. Wallace JG, Bradbury PJ, Zhang N, Gibon Y, Stitt M, Buckler ES. Association mapping across numerous traits reveals patterns of functional variation in maize. PLoS Genet. 2014;10: e1004845. pmid:25474422
  47. 47. Zhao X, Niu Y. The Combination of Conventional QTL Analysis, Bulked-Segregant Analysis, and RNA-Sequencing Provide New Genetic Insights into Maize Mesocotyl Elongation under Multiple Deep-Seeding Environments. Int J Mol Sci. 2022;23. pmid:35457037
  48. 48. Nazipova A, Gorshkov O, Eneyskaya E, Petrova N, Kulminskaya A, Gorshkova T, et al. Forgotten Actors: Glycoside Hydrolases During Elongation Growth of Maize Primary Root. Front Plant Sci. 2021;12: 802424. pmid:35222452
  49. 49. Fu S, Scanlon MJ. Clonal mosaic analysis of EMPTY PERICARP2 reveals nonredundant functions of the duplicated HEAT SHOCK FACTOR BINDING PROTEINs during maize shoot development. Genetics. 2004;167: 1381–1394. pmid:15280250
  50. 50. Fu S, Meeley R, Scanlon MJ. Empty pericarp2 encodes a negative regulator of the heat shock response and is required for maize embryogenesis. Plant Cell. 2002;14: 3119–3132. pmid:12468731
  51. 51. Fu S, Rogowsky P, Nover L, Scanlon MJ. The maize heat shock factor-binding protein paralogs EMP2 and HSBP2 interact non-redundantly with specific heat shock factors. Planta. 2006;224: 42–52. pmid:16331466
  52. 52. Stanley AE, Menkir A, Ifie B, Paterne AA, Unachukwu NN, Meseka S, et al. Association analysis for resistance to Striga hermonthica in diverse tropical maize inbred lines. Sci Rep. 2021;11: 24193. pmid:34921181
  53. 53. Li J, Li D, Espinosa CZ, Pastor VT, Rasheed A, Rojas NP, et al. Genome-wide analyses reveal footprints of divergent selection and popping-related traits in CIMMYT’s maize inbred lines. Journal of Experimental Botany. 2021. pp. 1307–1320. pmid:33070191
  54. 54. Pickrell JK, Berisa T, Liu JZ, Ségurel L, Tung JY, Hinds DA. Detection and interpretation of shared genetic influences on 42 human traits. Nat Genet. 2016;48: 709–717. pmid:27182965
  55. 55. Matsuoka Y, Vigouroux Y, Goodman MM, Sanchez G J, Buckler E, Doebley J. A single domestication for maize shown by multilocus microsatellite genotyping. Proceedings of the National Academy of Sciences. 2002;99: 6080–6084. pmid:11983901
  56. 56. Piperno DR, Ranere AJ, Holst I, Iriarte J, Dickau R. Starch grain and phytolith evidence for early ninth millennium B.P. maize from the Central Balsas River Valley, Mexico. Proc Natl Acad Sci U S A. 2009;106: 5019–5024. pmid:19307570
  57. 57. Fisher RA. The genetical theory of natural selection. The Clarendon Press; 1958.
  58. 58. Orr HA. Adaptation and the cost of complexity. Evolution. 2000;54: 13–20. pmid:10937178
  59. 59. Wallace JG, Larsson SJ, Buckler ES. Entering the second century of maize quantitative genetics. Heredity. 2014;112: 30–38. pmid:23462502
  60. 60. Xie L-H, Zhu Y-J, Tang S-Q, Wei X-J, Sheng Z-H, Jiao G-A, et al. Pleiotropic Effects of Rice Florigen Gene RFT1 on the Amino Acid Content of Unmilled Rice. Front Genet. 2020;11: 13. pmid:32076435
  61. 61. Ponce KS, Ye G, Zhao X. QTL Identification for Cooking and Eating Quality in indica Rice Using Multi-Parent Advanced Generation Intercross (MAGIC) Population. Front Plant Sci. 2018;9: 868. pmid:30042770
  62. 62. Fan X, Cui F, Ji J, Zhang W, Zhao X, Liu J, et al. Dissection of Pleiotropic QTL Regions Controlling Wheat Spike Characteristics Under Different Nitrogen Treatments Using Traditional and Conditional QTL Mapping. Front Plant Sci. 2019;10: 187. pmid:30863417
  63. 63. Vishnukiran T, Neeraja CN, Jaldhani V, Vijayalakshmi P, Raghuveer Rao P, Subrahmanyam D, et al. A major pleiotropic QTL identified for yield components and nitrogen content in rice (Oryza sativa L.) under differential nitrogen field conditions. PLoS One. 2020;15: e0240854. pmid:33079957
  64. 64. Li X, Tian R, Kamala S, Du H, Li W, Kong Y, et al. Identification and verification of pleiotropic QTL controlling multiple amino acid contents in soybean seed. Euphytica. 2018;214: 93.
  65. 65. Boehm FJ, Chesler EJ, Yandell BS, Broman KW. Testing Pleiotropy vs. Separate QTL in Multiparental Populations. G3. 2019;9: 2317–2324. pmid:31092608
  66. 66. Paaby AB, Rockman MV. The many faces of pleiotropy. Trends Genet. 2013;29: 66–73. pmid:23140989
  67. 67. McMullen MD, Kresovich S, Villeda HS, Bradbury P, Li H, Sun Q, et al. Genetic properties of the maize nested association mapping population. Science. 2009;325: 737–740. pmid:19661427
  68. 68. Flint-Garcia SA, Thuillet A-C, Yu J, Pressoir G, Romero SM, Mitchell SE, et al. Maize association population: a high-resolution platform for quantitative trait locus dissection. Plant J. 2005;44: 1054–1064. pmid:16359397
  69. 69. Hung H-Y, Shannon LM, Tian F, Bradbury PJ, Chen C, Flint-Garcia SA, et al. ZmCCT and the genetic basis of day-length adaptation underlying the postdomestication spread of maize. Proc Natl Acad Sci U S A. 2012;109: E1913–21. pmid:22711828
  70. 70. Hung H-Y, Browne C, Guill K, Coles N, Eller M, Garcia A, et al. The relationship between parental genetic or phenotypic divergence and progeny variation in the maize nested association mapping population. Heredity. 2012;108: 490–499. pmid:22027895
  71. 71. Peiffer JA, Flint-Garcia SA, De Leon N, McMullen MD, Kaeppler SM, Buckler ES. The genetic architecture of maize stalk strength. PLoS One. 2013;8: e67066. pmid:23840585
  72. 72. Peiffer JA, Romay MC, Gore MA, Flint-Garcia SA, Zhang Z, Millard MJ, et al. The genetic architecture of maize height. Genetics. 2014;196: 1337–1356. pmid:24514905
  73. 73. Foerster JM, Beissinger T, de Leon N, Kaeppler S. Large effect QTL explain natural phenotypic variation for the developmental timing of vegetative phase change in maize (Zea mays L.). Theor Appl Genet. 2015;128: 529–538. pmid:25575839
  74. 74. Leiboff S, Li X, Hu H-C, Todt N, Yang J, Li X, et al. Genetic control of morphometric diversity in the maize shoot apical meristem. Nat Commun. 2015;6: 8974. pmid:26584889
  75. 75. Krill AM, Kirst M, Kochian LV, Buckler ES, Hoekenga OA. Association and linkage analysis of aluminum tolerance genes in maize. PLoS One. 2010;5: e9958. pmid:20376361
  76. 76. Poland JA, Bradbury PJ, Buckler ES, Nelson RJ. Genome-wide nested association mapping of quantitative resistance to northern leaf blight in maize. Proc Natl Acad Sci U S A. 2011;108: 6893–6898. pmid:21482771
  77. 77. Bian Y, Yang Q, Balint-Kurti PJ, Wisser RJ, Holland JB. Limits on the reproducibility of marker associations with southern leaf blight resistance in the maize nested association mapping population. BMC Genomics. 2014;15: 1068. pmid:25475173
  78. 78. Olukolu BA, Wang G-F, Vontimitta V, Venkata BP, Marla S, Ji J, et al. A genome-wide association study of the maize hypersensitive defense response identifies genes that cluster in related pathways. PLoS Genet. 2014;10: e1004562. pmid:25166276
  79. 79. Benson JM, Poland JA, Benson BM, Stromberg EL, Nelson RJ. Resistance to gray leaf spot of maize: genetic architecture and mechanisms elucidated through nested association mapping and near-isogenic line analysis. PLoS Genet. 2015;11: e1005045. pmid:25764179
  80. 80. Olukolu BA, Bian Y, De Vries B, Tracy WF, Wisser RJ, Holland JB, et al. The Genetics of Leaf Flecking in Maize and Its Relationship to Plant Defense and Disease Resistance. Plant Physiol. 2016;172: 1787–1803. pmid:27670817
  81. 81. Olukolu BA, Negeri A, Dhawan R, Venkata BP, Sharma P, Garg A, et al. A connected set of genes associated with programmed cell death implicated in controlling the hypersensitive response in maize. Genetics. 2013;193: 609–620. pmid:23222653
  82. 82. Samayoa LF, Malvar RA, Olukolu BA, Holland JB, Butrón A. Genome-wide association study reveals a set of genes associated with resistance to the Mediterranean corn borer (Sesamia nonagrioides L.) in a maize diversity panel. BMC Plant Biol. 2015;15: 35. pmid:25652257
  83. 83. Hu Y, Ren J, Peng Z, Umana AA, Le H, Danilova T, et al. Analysis of Extreme Phenotype Bulk Copy Number Variation (XP-CNV) Identified the Association of rp1 with Resistance to Goss’s Wilt of Maize. Front Plant Sci. 2018;9: 110. pmid:29479358
  84. 84. Cook JP, McMullen MD, Holland JB, Tian F, Bradbury P, Ross-Ibarra J, et al. Genetic architecture of maize kernel composition in the nested association mapping and inbred association panels. Plant Physiol. 2012;158: 824–834. pmid:22135431
  85. 85. Diepenbrock CH, Kandianis CB, Lipka AE, Magallanes-Lundback M, Vaillancourt B, Góngora-Castillo E, et al. Novel Loci Underlie Natural Variation in Vitamin E Levels in Maize Grain. Plant Cell. 2017;29: 2374–2392. pmid:28970338
  86. 86. Harjes CE, Rocheford TR, Bai L, Brutnell TP, Kandianis CB, Sowinski SG, et al. Natural genetic variation in lycopene epsilon cyclase tapped for maize biofortification. Science. 2008;319: 330–333. pmid:18202289
  87. 87. Romay MC, Millard MJ, Glaubitz JC, Peiffer JA, Swarts KL, Casstevens TM, et al. Comprehensive genotyping of the USA national maize inbred seed bank. Genome Biol. 2013;14: R55. pmid:23759205
  88. 88. Lipka AE, Gore MA, Magallanes-Lundback M, Mesberg A, Lin H, Tiede T, et al. Genome-wide association study and pathway-level analysis of tocochromanol levels in maize grain. G3. 2013;3: 1287–1299. pmid:23733887
  89. 89. Owens BF, Lipka AE, Magallanes-Lundback M, Tiede T, Diepenbrock CH, Kandianis CB, et al. A foundation for provitamin A biofortification of maize: genome-wide association and genomic prediction models of carotenoid levels. Genetics. 2014;198: 1699–1716. pmid:25258377
  90. 90. Shrestha V, Yobi A, Slaten ML, Chan YO, Holden S, Gyawali A, et al. Multiomics approach reveals a role of translational machinery in shaping maize kernel amino acid composition. Plant Physiol. 2022;188: 111–133. pmid:34618082
  91. 91. Stegle O, Parts L, Piipari M, Winn J, Durbin R. Using probabilistic estimation of expression residuals (PEER) to obtain increased power and interpretability of gene expression analyses. Nat Protoc. 2012;7: 500–507. pmid:22343431
  92. 92. Browning BL, Zhou Y, Browning SR. A One-Penny Imputed Genome from Next-Generation Reference Panels. Am J Hum Genet. 2018;103: 338–348. pmid:30100085
  93. 93. Ramstein GP, Larsson SJ, Cook JP, Edwards JW, Ersoz ES, Flint-Garcia S, et al. Dominance Effects and Functional Enrichments Improve Prediction of Agronomic Traits in Hybrid Maize. Genetics. 2020;215: 215–230. pmid:32152047
  94. 94. Privé F, Luu K, Blum MGB, McGrath JJ, Vilhjálmsson BJ. Efficient toolkit implementing best practices for principal component analysis of population genetic data. Bioinformatics. 2020. pmid:32415959
  95. 95. Monier B, Casstevens TM, Bradbury PJ, Buckler ES. rTASSEL: An R interface to TASSEL for analyzing genomic diversity. J Open Source Softw. 2022;7: 4530.
  96. 96. Shabalin AA. Matrix eQTL: ultra fast eQTL analysis via large matrix operations. Bioinformatics. 2012;28: 1353–1358. pmid:22492648
  97. 97. Lawrence M, Huber W, Pagès H, Aboyoun P, Carlson M, Gentleman R, et al. Software for computing and annotating genomic ranges. PLoS Comput Biol. 2013;9: e1003118. pmid:23950696
  98. 98. Bradbury PJ, Zhang Z, Kroon DE, Casstevens TM, Ramdoss Y, Buckler ES. TASSEL: software for association mapping of complex traits in diverse samples. Bioinformatics. 2007;23: 2633–2635. pmid:17586829
  99. 99. Wright MN, Ziegler A. ranger: A Fast Implementation of Random Forests for High Dimensional Data in C++ and R. Journal of Statistical Software, Articles. 2017;77: 1–17.
  100. 100. Chen T, Guestrin C. XGBoost: A Scalable Tree Boosting System. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York, NY, USA: Association for Computing Machinery; 2016. pp. 785–794. 10.1145/2939672.2939785
  101. 101. Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, et al. Gene Ontology: tool for the unification of biology. Nat Genet. 2000;25: 25–29. pmid:10802651
  102. 102. Alexa A, Rahnenfuhrer J. topGO: Enrichment Analysis for Gene Ontology. In: Bioconductor [Internet]. [cited 15 Apr 2022]. Available: https://bioconductor.org/packages/release/bioc/html/topGO.html
  103. 103. Wickham H. ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York; 2016. https://ggplot2.tidyverse.org