Integrating transcriptomic network reconstruction and eQTL analyses reveals mechanistic connections between genomic architecture and Brassica rapa development

Robert L. Baker; Wen Fung Leong; Marcus T. Brock; Matthew J. Rubin; R. J. Cody Markelz; Stephen Welch; Julin N. Maloof; Cynthia Weinig

doi:10.1371/journal.pgen.1008367

Abstract

Plant developmental dynamics can be heritable, genetically correlated with fitness and yield, and undergo selection. Therefore, characterizing the mechanistic connections between the genetic architecture governing plant development and the resulting ontogenetic dynamics of plants in field settings is critically important for agricultural production and evolutionary ecology. We use hierarchical Bayesian Function-Valued Trait (FVT) models to estimate Brassica rapa growth curves throughout ontogeny, across two treatments, and in two growing seasons. We find genetic variation for plasticity of growth rates and final sizes, but not the inflection point (transition from accelerating to decelerating growth) of growth curves. There are trade-offs between growth rate and duration, indicating that selection for maximum yields at early harvest dates may come at the expense of late harvest yields and vice versa. We generate eigengene modules and determine which are co-expressed with FVT traits using a Weighted Gene Co-expression Analysis. Independently, we seed a Mutual Rank co-expression network model with FVT traits to identify specific genes and gene networks related to FVT. GO-analyses of eigengene modules indicate roles for actin/cytoskeletal genes, herbivore resistance/wounding responses, and cell division, while MR networks demonstrate a close association between metabolic regulation and plant growth. We determine that combining FVT Quantitative Trait Loci (QTL) and MR genes/WGCNA eigengene expression profiles better characterizes phenotypic variation than any single data type (i.e. QTL, gene, or eigengene alone). Our network analysis allows us to employ a targeted eQTL analysis, which we use to identify regulatory hotspots for FVT. We examine cis vs. trans eQTL that mechanistically link FVT QTL with structural trait variation. Colocalization of FVT, gene, and eigengene eQTL provide strong evidence for candidate genes influencing plant height. The study is the first to explore eQTL for FVT, and specifically do so in agroecologically relevant field settings.

Author summary

We estimate the developmental dynamics of plant growth using mathematical functions to fit continuous functions to discrete plant height data collected throughout growth, and we use the parameters defining these mathematical functions as data. We identify genomic regions controlling plant growth and filter a novel transcriptomic data set using network reconstruction models to identify the genes and eigengenes associated with plant height. We combine these genomic and transcriptomic data to predict variation in plant height, and we use quantitative genetics to mechanistically connect plant genetics, transcriptomics, and development. Our approach demonstrates two powerful methods for the type of data reduction (FVT modeling and gene expression network reconstruction for targeted eQTL analyses) and data integration that will be necessary for driving forward the field of genetics in the post-genomic era. To the best of our knowledge, we are the first to apply these techniques to continuous models of plant development, and the first to do so in agroecologically relevant field settings.

Citation: Baker RL, Leong WF, Brock MT, Rubin MJ, Markelz RJC, Welch S, et al. (2019) Integrating transcriptomic network reconstruction and eQTL analyses reveals mechanistic connections between genomic architecture and Brassica rapa development. PLoS Genet 15(9): e1008367. https://doi.org/10.1371/journal.pgen.1008367

Editor: Kirsten Bomblies, Eidgenossische Technische Hochschule Zurich, SWITZERLAND

Received: March 1, 2019; Accepted: August 13, 2019; Published: September 12, 2019

Copyright: © 2019 Baker et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: The linkage map used for QTL and eQTL analyses is available in [58]. Replicate level FVT parameters are presented in S7 Table; RIL-specific gene expression values are available via FigShare at https://doi.org/10.6084/m9.figshare.7661093. All other data are included in supplemental files associated with the manuscript.

Funding: This work is supported by National Science Foundation (https://nsf.gov) Grants IOS-1306574 to RLB and IOS-0923752 to CW, SW, and JNM. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests: The authors have declared that no competing interests exist.

Introduction

Plant developmental dynamics are correlated with fitness and yield [1,2]. Therefore, characterizing the mechanistic connections between the genetic architecture governing plant development and the resulting ontogenetic dynamics of plants in field settings is critically important to improving agricultural production and understanding evolutionary fitness. Forward genetic approaches such as quantitative trait mapping are an attractive method of characterizing genetic architecture because they do not require a priori information such as candidate loci and can be used to describe additive effects as well as pleiotropic and epistatic loci [3–5]. Transcriptomic co-expression analyses and expression QTL (eQTL) have also been used to identify the underlying genetic architecture responsible for phenotypic variation [e.g. 6]. Recently, combining information from genomic association studies and transcriptomic expression analyses has been used to pinpoint candidate genes [7–10]. However, co-expression network analyses can also provide insight into the mechanistic connections between QTL genotypes and phenotypes. Here, we ask whether QTL, co-expression analyses, or a combination thereof best predict phenotypic variation. In combination with a targeted eQTL analyses in agroecologically relevant field settings, we characterize the mechanistic connections between the genomic architecture, transcriptomic expression networks, and phenotypic variation throughout plant development.

Development rarely occurs in discrete steps, yet developmental data are typically collected at multiple distinct but inter-dependent time points. Function-Valued Trait (FVT) modeling is one method of estimating the underlying continuous nature of development and avoiding complicated repeated measures statistics, which often compromise statistical power in downstream analyses [11,12]. One approach to FVT modeling involves fitting mathematical functions to discrete data to estimate continuous curves that represent the change of a trait or character as a function, typically of time [13–15]. Although there are multiple approaches to modeling continuous growth, one particular advantage of FVT modeling is that parameters describing developmental growth curves can be extracted from the FVT models and used as biologically interpretable and inter-relatable traits such as the relationship between growth rates, durations, inflection points denoting cessation of growth, and final sizes. This ‘parameters as data’ approach enables a broad array of analyses at both genetic and phenotypic levels [2,16]. In the current study, we employ a Bayesian hierarchical approach to FVT modeling that utilizes global information from the entire dataset as well as each genotype to estimate replicate-level parameters describing growth curves that underlie the developmental dynamics of plant height.

One inherent but seldom addressed complication in studying developmental genetics is that development of a given trait rarely occurs independently of organism-level attributes. For instance, in plants the pool of available carbon can severely limit and alter organismal-level development including aspects of determinate structures such as leaves [17,18] and indeterminant growth such as plant height [19]. Further, including physiological parameters such as carbon assimilation in plant breeding models is predicted to accelerate and improve yield gains [20]. One solution is using a hierarchical Bayesian approach to FVT modeling that incorporates genotype-specific values for organism-level physiological conditions such as carbon availability (estimated, for instance, using maximum photosynthetic capacity, A_max) to statistically factor out variation caused by resource availability. Accounting for carbon availability in FVT parameter estimation can increase estimates of heritability and improve QTL mapping results [21,22].

QTL mapping provides a well-tested method of uncovering the genetic architecture of complex Function-Valued Traits (FVT). FVT variation may arise from structural or regulatory genes that differ among sampled genotypes. Examining gene expression can therefore provide insight into the mechanistic connections between genomic architecture and developmental dynamics of phenotypes [23–26]. We use Mutual Rank (MR) and Weighted Gene Co-expression Network Analyses (WGCNA) to identify expression networks associated with FVT trait variation. These networks are then used to focus our analysis to specific sub-sets of biologically relevant expression traits for eQTL mapping [27,28]. Interestingly, the genomic architecture of eQTL appears to depart from that of other phenotypic QTL such as FVT QTL in two important respects: first, gene expression traits tend to have only one or a few eQTL whereas morphological phenotypic traits are often highly polygenic [29]. Second, eQTL from multiple expression traits in diverse taxa from yeast to Brassica can be highly colocalized into eQTL “hotspots”. These hotspots may indicate a regulatory gene or switch that has a disproportionate impact on downstream gene expression [30–32]. In contrast, QTL for morphological traits may colocalize, but typically they do not do so to the same extent [31,33]. Whether general eQTL trends hold for targeted expression traits in agroecologically relevant field settings remains unknown. Further, to the best of our knowledge eQTL mapping has not been used to examine the mechanistic basis of developmental morphology captured via function-valued trait modeling.

Here, we estimate continuous developmental growth curves of plant height, a trait that when selected upon can lead to more effective increases in yield than directly selecting on yield itself [34], in a set of Brassica rapa Recombinant Inbred Lines (RILs) while mathematically factoring out the effects of carbon availability. We examine the patterns of genetic correlations among parameters describing change in height over time such as growth duration and final plant size, and we ask whether these developmental parameters correlate with yields. Using QTL mapping, we outline the genetic architecture of plant height development. Next, we use MR and WGCNA to identify genes and gene network modules whose expression patterns correlate with FVT parameters. We compare the predictive capacity of QTL and co-expression approaches in two ways: first, we test the relative effectiveness of QTL vs. MR genes vs. WGCNA module eigengenes (and combinations thereof) in explaining genetic variation of developmental traits. Second, we test whether QTL for FVT traits are enriched for genes identified via co-expression approaches. To explore the mechanistic basis of FVT QTL, we perform eQTL mapping on our MR genes and WGCNA module eigengenes. For eQTL and FVT QTL that colocalize, we explore the relative proportion cis- vs. trans-eQTL and their effect sizes. We ask whether eQTL colocalize to regulatory hotspots and if so how these compare to FVT QTL. Our eQTL analysis offers an additional line of inference for candidate gene identification as well as a potential mechanistic explanation for the regulation of yield-related FVT QTL.

Results

Function-Valued Traits (FVTs)

Brassica rapa Recombinant Inbred Lines from the IMB11xR500 cross were grown in the field in 2011 and 2012. In each year, there were two treatments: crowded and uncrowded. Multiple replicates of the full RIL set were planted out for each combination of year and treatment. FVT modeling was conducted based on replicate-level data, the data were sufficient to support all aspects of the growth curves modeled, and the models fit the data well (Fig 1 for example model fits). Plots for all FVT models can be found in S1 Fig and a conceptual overview of the analyses performed is presented in S2 Fig.

Download:

Fig 1.

Representative genotypes (A, IMB211; B, R500) of Bayesian FVT trait estimation approaches for uncrowded plants from the 2012 season. Within each panel, dots represent observed data. Colors indicate replicates within each genotype and indicate that each replicate was measured multiple times throughout the growing season. The black line is the Bayesian estimate of logistic growth curve that best represents each genotype. The yellow envelope is a 95% credible envelope for the observed data; the green envelope is a 95% credible envelope for where new data is predicted to occur for a specific genotype and environment combination.

https://doi.org/10.1371/journal.pgen.1008367.g001

Phenotypic plasticity and heritability

To assess the effects of the environment on plastic growth responses, we analyzed raw replicate level data. Although there were main effects of Block (nested within treatment) and genotype (RIL ID) for all traits, there were no significant main effects of crowding (treatment; Table 1). However, there was genetic variation for a plastic response to crowding for all traits except iD (inflection time, in Degree Days; treatment-by-genotype interaction; Table 1).

Download:

Table 1. Phenotypic plasticity and heritabilities of FVT parameters.

Block is nested within the Treatment effect. Treatment corresponds to the crowded and uncrowded treatments in 2012 and Genotype indicates RIL id. Significant effects are emphasized by bold text.

https://doi.org/10.1371/journal.pgen.1008367.t001

Genetic correlations

To explore the genetic relationships among the height FVT parameters and previously published estimates of plant phenology and fitness, we conducted a correlation analysis on BLUPs of each trait. In general, the pattern of genetic correlations within years and treatments was similar. Uncrowded (UN) r from 2012 was correlated with all traits except Hmax (Fig 2). In contrast, Crowded (CR) r in 2012 was negatively correlated with other all other 2012 CR FVT traits, with all CR phenology traits (except the bolting-to-flowering interval) and CR fitness traits (S3 Fig). UNr in 2012 was negatively correlated with UNd and iD but not Hmax. UNr 2012 was also negatively correlated with phenology and fitness. These patterns of genetic correlations are largely consistent across both years and treatments (S3 Fig); a representative subset of correlations from 2012 is presented in Fig 2.

Download:

Fig 2.

Genetic correlations among UN 2012 FVT height, phenology, and fitness traits. Each point is a genotypic mean (BLUP). Bonferroni corrections for multiple tests (n = 7) have been applied. Non-significant correlations are in gray. All time is expressed in Degree Days. * p<0.05, ** p<0.01, *** p<0.001, **** p<0.0001, NS p ≥ 0.05.

https://doi.org/10.1371/journal.pgen.1008367.g002

QTL mapping

To further explore the genetic architecture of the height FVT parameters, we conducted QTL mapping analyses of the height FVT traits. In total we mapped 32 individual QTL from 2012 (2011 FVT QTL are presented in S1 Table); however, an alternative interpretation is that we mapped as few as 9 highly pleiotropic QTL. QTL were observed throughout the genome, except on chromosomes 2, 4, and 8. Most QTL localized to chromosome 3, 9 and 10. Across all traits, each QTL explained 29% of trait variation on average. The minimum explained variance was 9.5% and the maximum was 73% of variance (Fig 3 & S1 Table).

Download:

Fig 3. A map of all QTL identified in 2012.

Horizontal lines on chromosomes indicate the position of RNAseq markers used to genetic map construction. Each QTL is indicated with a vertical arrow under the trait name. Horizontal hatches indicate QTL position, the arrow length indicates 1.5 LOD support limits. Arrow heads and color (up, red = positive; down, blue = negative) indicate QTL direction relative to the R500 parent. Exact locations, markers, and LOD scores for all QTL can be found in S1 (Table).

https://doi.org/10.1371/journal.pgen.1008367.g003

Genes under FVT QTL

To determine positional candidates within mapped FVT QTL, we compared our FVT QTL to the B. rapa genome and identified genes under the QTL. We restricted our search to QTL with LOD > 9 (Table 2). All 9 of these QTL were on either chromosome 3 or 10. Because several of the QTL co-localized (had overlapping 1.5 LOD confidence intervals), we often found the same genes under multiple QTL. After removing duplicate entries, we found 490 unique genes underlying the 9 QTL investigated (S2 Table).

Download:

Table 2. Fishers exact tests for enrichment of FVT QTL for MR-identified genes.

https://doi.org/10.1371/journal.pgen.1008367.t002

RNAseq

We used RNA sequencing (RNAseq) to understand the transcriptomic mechanisms underlying FVT QTL and as an alternative approach for examining the genetic architecture of our FVT traits without a priori knowledge. 21,147 genes of 28,668 genes with detectable expression in UN treatment were differentially expressed among RILs (FDR < 0.01). The 10,000 genes with the most variable expression among RILs were used for downstream network analysis.

Mutual Rank (MR) network analysis

MR Network Analysis is an alternative method that is independent of QTL analysis for identifying genes that may contribute to phenotypic variation. Genes identified as members of MR networks therefore likely contribute to and predict phenotypic variation. To find gene co-expression networks relevant to the FVT model parameters, we built MR networks nucleated on each FVT model parameter and performed permutation analyses to determine the statistical significance of our networks. Ninety-five or more of 100 permutations had zero connections between FVT parameters and gene expression. Therefore, our MR networks are enriched for bona fide connections at a variety of MR threshold cutoffs (The MR30 network is shown in Fig 4; larger networks become difficult to visualize and are presented in S4 Fig). Complete gene membership for all MR-thresholds annotated with the best hit obtained by blastn against the predicted A. thaliana proteome are presented in supplemental materials S3 Table.

Download:

Fig 4. A scale-free diagram of the Mutual Rank network nucleated around FVT traits from 2012 with a cutoff of 30.

Network nodes consist of either FVT traits or co-expressed genes. FVT traits are shown in red circles and genes are indicated in blue circles. Network edges indicate significant correlations. Purple lines indicate positive correlation values while green lines indicate negative correlation values and line thickness corresponds to strength of the correlation. UN, uncrowded; r, growth rate; d, duration of growth; iD, time in degree days when the growth curve reached its inflection point; Hmax, estimated maximum height based on FVT modeling. Additional network cutoffs, 2011, and 2012 crowded networks are in S4 Fig; gene names and annotations are in S3 Table.

https://doi.org/10.1371/journal.pgen.1008367.g004

We used Fisher’s exact test to determine whether FVT QTL were enriched for MR-identified genes. We found no evidence for enrichment for MR10 networks (p = 1.0) but significant evidence for enrichment for MR20, MR30, and MR50 networks (p<5E-09; Table 2). In theory, MR10 networks should contain only those genes whose expression values are most highly correlated with FVT phenotypes. The non-significant results for MR10 may be caused by low power due to the single gene identified.

To visualize the relationship between FVTs and genes in the MR networks, we made scatter plots of each FVT against each directly connected MR gene (S5 Fig). For a minority of these genes (21 of 71) the samples group into two clusters on the scatter plot, corresponding to RILs with low/high expression of the MR gene. This pattern suggests that expression of the MR gene and the FVT value are either controlled by the same gene or by two closely linked genes. However, the majority (50 of 71) of the FVT / MR gene pairs show a relatively continuous linear relationship. This pattern is unlikely to arise simply by linkage and suggests that expression of these MR genes is indeed biologically related to FVT values.

Single Nucleotide Polymorphism (SNP) identification in for MR genes

To identify candidate SNPs that could alter protein function of MR gene products, we compared the sequence data from the RIL population for all MR50 genes and identified segregating SNPs that are predicted to cause an amino acid changes (see methods). We identified a total of 53 SNP variants across 24 MR50 genes (S4 Table). Most of these were mis-sense mutations predicted to have moderate effects on gene function (including three mis-sense mutations in Br025497, a BEL1 homolog). However, one frame-shift mutation predicted to have a high impact on gene function was identified in unannotated gene Bra08635.

Weighted Gene Co-expression Network Analysis (WGCNA)

In a second, statistically independent approach to identifying gene expression networks related to estimates of FVT trait parameters, we used a Weighted Gene Co-expression Network Analysis (WGCNA) to identify 50 gene co-expression network modules consisting of a median of 90 genes each. We calculated eigengene values for gene expression within each module. Modules of interest were identified as those with a significant correlation between the eigengene expression values and FVT model parameters across the RILs (Fig 5). Gene Ontology (GO) enrichment analysis was performed to examine the potential function of correlated modules (S5 Table); below we discuss correlations with module eigengenes that had at least one GO term enriched. There are positive correlations between 2012 BLUPs for maximum height (Hmax), growth duration (d), and the time that the growth curve reached its inflection point (iD) and the “cyan (34)” module (related to protein translation), the “midnight blue (35)” module (related to wounding/herbivore defense responses as well as some abiotic stress responses), and the “blue (30)” module (enriched for genes related to cell division and development). This suggests that plants that have a longer duration of growth and reach a higher maximum height produce more protein, undergo more rounds of cell division, and have increased defense signaling. These three parameters also showed negative correlations with the “brown (9)” module (enriched for actin cytoskeleton and protein dephosphorylation terms). Hmax is negatively correlated with “yellow (45)” (enriched for terms related to photosynthesis). This correlation could be caused by a difference in cellular maturation rates: plants with more rapid cellular differentiation would be expected to show an upregulation of chloroplast genes and reduced growth due to earlier differentiation and consequently relative lack of cell elongation.

Download:

Fig 5. Correlations among WGCNA identified eigengenes and UN 2012 FVT traits.

Significant correlations are denoted with an asterisk. r, growth rate; d, duration of growth; iD, time in degree days when the growth curve reached its inflection point; Hmax, estimated maximum height based on FVT modeling. Modules are named using a numerical system (above) and color scheme (below).

https://doi.org/10.1371/journal.pgen.1008367.g005

Comparisons of QTL and network modeling for phenotypic prediction

To test the effectiveness of QTL, MR genes, and WGCNA in explaining the variation in FVT trait estimates, we compared a series of additive linear models based on QTL, MR genes, or WGCNA eigengenes both singly and in combination. For UNr (in 2012), models containing only QTL outperformed models containing either MR30 identified gene expression or WGCNA-identified eigengene expression (Table 3). For two-data type models, models with only QTL outperformed those containing multiple data types. For Hmax, MR gene expression outperformed both QTL and WGCNA-identified eigengene expression as well as combinations of two data types. For both traits, the full model (with all three data types for r, but which reduced to WGCNA and MR gene expression values for Hmax) were the best models for explaining phenotypic variation (r: F_(5,110) = 25.31, p<0.0001; Hmax: F_(9,106) = 33.16, p<0.0001). Similarly, the best two-data type models were a significantly better fit to the data than the best single-data type models (r: F_(5,114) = 40.182, p<0.0001; Hmax: F(4,113) = 80.398, p<0.0001). For all comparisons, the significantly better model according to ANOVA also had lower AIC scores (Table 3). Taken together, these results indicate that although each approach has significant predictive capacity, combining multiple approaches improves estimation of trait variation.

Download:

Table 3. Comparison of additive linear models using genetic and transcriptomic data to explain 2012 uncrowded phenotypic data.

https://doi.org/10.1371/journal.pgen.1008367.t003

eQTL analyses and colocalization of eQTL with FVT QTL

Because including MR and WGCNA results both improved upon linear models for FVT traits that contained just QTL (Table 3) and because all models that included MR and WGCNA gene/eigengene expression values were significant and predicted FVT trait variation, we used eQTL analyses to assess the mechanistic relationship between MR gene/WGCNA eigengene expression and FVT QTL. For the 56 MR50-identified genes, 40 genes had a total of 41 significant eQTL, 22 of which were cis-eQTL (S6 Table). The 41 eQTL were distributed on chromosomes 1, 3, 4, 6, 9, and 10. In congruence with FVT QTL mapping results, there were eQTL with particularly high LOD scores on chromosomes 3 and 10 (LOD >75; Fig 6). There was significant overlap among 2012 FVT QTL confidence intervals and MR eQTL confidence intervals based on permutation tests (n = 1000, p = 0.003). One explanation for co-localization of FVT QTL and MR eQTL is pleiotropy; i.e. the same genetic change is causing changes in MR gene expression and in the FVT trait. An alternative interpretation is that causal loci are in linkage disequilibrium. These two interpretations are not mutually exclusive; it is likely that pleiotropy explains the association for some traits and linkage explains the association for others.

Download:

Fig 6. Expression trait QTL (eQTL) identified using Composite Interval Mapping (CIM) for MR50-identified genes where MR networks were nucleated around UN FVT traits.

Note the eQTL hotspots on chromosomes 3 and 10.

https://doi.org/10.1371/journal.pgen.1008367.g006

Of the 40 MR50 genes that had eQTL, a total of 37 genes had a total of 38 eQTL that overlapped with FVT QTL. Eighteen of the 37 MR50 genes with eQTL that colocalized with FVT QTL had cis-eQTL (Table 4). The co-occurrence of these loci as MR-identified cis-eQTL and FVT QTL identifies a list of strong candidate genes for regulating the FVT traits. One MR gene (Bra 012899) had multiple eQTL that overlapped with FVT QTL; both of these were on chromosomes 3 and 10 and both were trans-eQTL.

Download:

Table 4. MR-identified genes with cis-eQTL that co-localize with UN 2012 FVT QTL.

Note that because multiple FVT QTL overlap, a single MR cis-eQTL may colocalize with FVT QTL for multiple traits. Genes with missense mutations (S4 Table) are indicated in bold.

https://doi.org/10.1371/journal.pgen.1008367.t004

Next we performed eQTL analyses (S6 Fig) for the 11 WGCNA-identified eigengene modules that were significantly correlated with UN 2012 FVT (see Fig 5). Six of these 11 eigengenes had eQTL: Chromosome 3 harbored strong eigengene eQTL for “darkslateblue (13)”, “steelblue (33)”, and “yellowgreen (22)” (all with no go enrichment; nge). Chromosome 6 had eQTL for “midnightblue (35)” (herbivore/wounding). Chromosome 10 had eQTL in two locations, one for “brown (9)” (actin cytoskeleton) and “lightgreen (29)” (nge), the other for “midnightblue (35)” (herbivore/wounding). Four eigengenes had eQTL that colocalized with FVT QTL, indicating a potential causative connection between eigengenes and FVT for r, iD, and Hmax (Table 5). However, each eigengene had only one eQTL that colocalized with an FVT QTL.

Download:

Table 5. Eigengene eQTL and FVT QTL colocalization.

https://doi.org/10.1371/journal.pgen.1008367.t005

The second chromosome 10 location (“midnightblue (35)”) overlaps with the FVT QTL9 and the eigengene has significant correlations with d and iD FVTs indicating a possible causative connection. We then performed permutation tests and determined that FVT QTL were enriched for WGCNA eQTL (n = 1000, p = 0.002).

Discussion

Plant height is often correlated with fitness and yield. Height is a complex and dynamic trait that changes over the course of development, and variation in plant height is necessarily generated through variation in developmental dynamics. However, similar heights can be achieved through multiple different growth curves. Quantifying the underlying genetic architecture and mechanistic basis of growth dynamics may result in improved estimations of final plant height, fitness, and yield. Here, we use Bayesian hierarchical modeling to estimate Function-Valued Trait (FVT) parameters describing continuous plant growth and explore their correlations with phenology and fitness. We test whether mapped QTL, genes identified through Mutual Rank (MR) co-expression, eigengenes identified through Weighted Gene Network Co-expression Analyses (WGCNA), or combinations of these information types best explain variation in agroecologically relevant FVT traits in the field. Further, we employ eQTL analyses to explore the regulatory mechanisms that connect FVT QTL with phenotypic variation.

Although development typically occurs in a continuous fashion, most studies quantifying development necessarily collect data at discrete timepoints. We take a “parameters as data” approach to FVT modeling to estimate the continuous nature of plant development [2,16]. Much as floral development or leaf development has well defined core molecular genetic pathways that govern organ formation, elaboration, or elongation [reviewed in 35], there is likely a core genetic architecture that contributes to plant height. However, exogenous and endogenous factors can influence the outputs of these developmental programs. For instance, crowding may trigger a shade avoidance response and lead to rapid increases in height [e.g. 36]. Similarly, plant carbon status can affect the developmental morphology and final size of organs such as leaves [17,18,21]. We took two approaches to examining the core developmental genetics of plant height. First, we grew plants across multiple growing seasons and in crowded and uncrowded conditions. Second, we included a genotype-specific co-factor in our FVT models that accounts for variation in photosynthetic rates (approximated through A_max), thereby statistically factoring out variation due to carbon availability and allowing us to more directly interrogate the developmental genetic architecture and molecular mechanisms contributing to plant height [21,22]. In our study, all FVT traits had relatively high broad sense heritabilities (>70%), and all had significant main effects of genotype. Although there were no significant main effects of treatment (i.e. treatment means did not differ), all FVT trait estimates (except iD) exhibited genetic variation for assimilation-independent phenotypic plasticity via a genotype by environment (GE) interaction, likely because of rank-order differences across treatments at the genotypic level (Table 1).

Morphological phenotypes, such as components of yield and height, can be highly integrated throughout development [reviewed in 37]. Final height is often used as a proxy for yield or fitness, yet plant growth dynamics throughout ontogeny may also be correlated with aspects of yield such as fruit and seed set [38,39]. In our experimental set of Brassica rapa Recombinant Inbred Lines (RILs), plant developmental dynamics including duration of growth (d), the inflection point in the growth curve that represents the change from exponentially accelerating to decelerating growth (iD), and estimates of final plant height (Hmax) were all significantly and positively genetically correlated (Fig 2). Interestingly, growth rates (r) were negatively correlated with d and iD, but were not correlated with Hmax, indicating that while there is a trade-off between growth rates and durations, duration of growth may be more important for final plant height than growth rate. All of our estimates of plant growth and final size were significantly genetically correlated with phenology and yield traits, demonstrating that the developmental dynamics of growth can be related to crop yields and plant fitness through mechanisms that are at least partially independent of final size. Because final size is positively correlated with yields while growth rates are negatively correlated with yields, selection for maximum yields at early harvest dates may come at the expense of late harvest yields and vice versa.

To examine the genetic architecture underlying the FVT estimates of growth rates, durations, and final sizes, we mapped QTL for FVT parameters. Of particular note, when QTL for r colocalized with d, the QTL were of opposite sign, confirming our negative genetic correlations between growth rates and durations, and indicating potentially pleiotropic loci contributing to both traits. On average, FVT QTL explained 24% of trait variation and the number of genes under each QTL ranged into the hundreds. To narrow down the list of candidate genes and understand the mechanistic regulation of FVT via QTL, we took two additional transcriptomic co-expression approaches to exploring regulation of FVT traits: First, we seeded a Mutual Rank (MR) co-expression network with FVT traits and asked which gene expression values correlated with variation in FVT traits. Second, we constructed 50 eigengenes based on a Weighted Gene Co-expression Network Analysis (WGCNA) and asked which eigengenes were correlated with individual FVT trait. We found that FVT QTL were significantly enriched for MR genes, indicating that these two approaches were identifying some common drivers of FVT traits. To compare the effectiveness of all three approaches, we asked whether QTL, MR genes, or eigengenes best explained variance in FVT traits. Although QTL outperformed both co-expression network modeling approaches for r, combining data from multiple approaches yielded improvements in our models, indicating that even though QTL, MR genes, and eigengenes often physically co-localize within the genome, they are not interchangeable with one another (Table 3).

To understand the potential function of genes related to growth WGCNA and MR networks, we examined gene annotations of homologous Arabidopsis thaliana genes. Although about half of the eigengenes that correlated with FVT BLUPs had no gene ontology enrichment, three eigengenes with eQTL on chromosome 9 were enriched for actin/cytoskeleton, herbivore/wounding and cell division. The MR30 genes include a homolog of the homeodomain gene BEL1 [40]; BEL1 homologs have been implicated in regulation of the shoot apical meristem [41] and thus could be related to plant growth. The BEL1 homolog had three missense mutations predicted to have moderate impact on genet function. An additional gene was identified with homology to the COBRA family gene COBL4/IRX6 (negatively correlated with iD), involved in secondary cell wall biosynthesis. The MR30 network also contains a number of genes involved in metabolic homeostasis. Four of these genes are localized to the plastid and negatively correlated with d and iD, including three orthologs of the plastidic lipid phosphate phosphatase epsilon 2 gene (LPPε2), which is potentially involved in synthesis of diacylglycerol, a precursor to essential photosynthetic membrane components [42]. Another plastid-localized MR30 network gene is ENHANCER OF SOS3-1 (ENH1); ENH1 functions to mitigate the effects of reactive oxygen species [43]. Thus, plants with longer growing periods appear to put less resources into photosynthesis. The MR30 network also includes a homolog of the A. thaliana LATERAL ORGAN BOUNDARY DOMAIN37 (LBD37) gene, which is an important regulator of nitrogen response in both A. thaliana and Oryza sativa [44,45]. LDB37 is negatively correlated with Hmax and had two moderate missense mutations. Two genes involved in amino acid synthesis or homeostasis are present in the MR30 network and show positive correlations with d and iD: first, a homolog of ASPARTATE KINASE1 (AK1), which is required for regulation of aspartate, lysine, and methionine was recovered [46]. The B. rapa AK1 homolog had a single moderate mis-sense mutation. Second, AROMATIC ALDEHYDE SYNTHASE (AAS), which converts phenylalanine into phenylacetaldehyde [47] was also present. Overall the MR30 network results point to a close connection between metabolic regulation and growth.

Transcriptomic data allowed us to further explore the regulatory control of the FVT using eQTL mapping of WGCNA eigengenes and MR genes. eQTL mapping treats gene expression levels as quantitative traits. When combined with QTL studies of morphological phenotypes, the ultimate goal of eQTL mapping is to identify the molecular genetic changes in gene expression that lead to structural phenotypic variation, thus providing mechanistic explanations for the associations between genotype and phenotype [48]. In humans, such studies demonstrate that eQTL can be used in a cell-type specific fashion to annotate GWAS associations [49]. Based on the 56 MR50 genes in our study, we identified 41 significant eQTL, 40 of which colocalized with FVT QTL. Six of the 11 WGCNA eigengenes that correlated with FVT also had eQTL, and four of these eQTL colocalized with FVT QTL. These data demonstrate that the relationship between genomic loci (FVT QTL) and phenotypic variation in FVT traits is likely mediated by gene expression, specifically the expression of the genes and eigengenes we identified via MR and WGCNA.

Our eQTL results qualitatively departed from common morphological trait QTL analyses in two ways. First, MR-identified gene expression traits mapped to all chromosomes except chromosome 2, but two locations had multiple eQTL with very high LOD scores (>75): the top of chromosome 3 and the middle of chromosome 10. Virtually all genes had eQTL that mapped to one of these two locations, a common result potentially indicating an eQTL ‘hotspot’ [50]. A previous study of the effects of soil phosphorous using the same B. rapa RILs also identified eQTL hotspots [30], but on different chromosomes. The colocalization of eQTL hotspots and FVT QTL may indicate novel regions involved in pleiotropic co-regulation of several downstream genes in the regulatory network contributing to change in plant height [29].

Although the presence of eQTL hotspots indicates pleiotropic gene regulation, our eQTL analyses also qualitatively departed from the FVT QTL analysis in that most of the gene expression traits we mapped were not polygenic. Of the 56 MR gene expression traits mapped, only 1 had multiple eQTL that colocalized with FVT QTL. eQTL studies commonly find a relative paucity of polygenic regulation compared to structural QTL studies, and our results support the general consensus that expression traits and structural phenotypes have distinctly different genetic architectures [but see 32 for a counter-example]. However, most eQTL are of relatively large effect, meaning that many small effect eQTL could remain undetected and contribute to polygenic regulation of gene expression traits [29], and these eQTL may or may not occur in regulatory hotspots.

To further understand the regulation of expression traits and FVT QTL, we divided MR eQTL into two classes: putative cis- and trans-eQTL where cis-eQTL likely correspond to cis-regulatory elements influencing gene expression [51]. In contrast, trans-eQTL do not contain the gene whose expression pattern is mapped and likely correspond to trans-acting factors such as transcription factors that influence the MR gene expression [52]. In our study, 53% of all eQTL identified were cis-eQTL. Of the 40 MR genes with eQTL that colocalized with FVT QTL, 18 were cis (45%) and the remaining 22 were in trans, which is a much higher than the proportion of cis-eQTL than identified in an intraspecific maize cross [53]. Because our B. rapa RILs are also generated from an intraspecific cross, theoretical and experimental work suggesting that trans gene regulation should be more prevalent than cis regulation at the intraspecific level [54,55, but see 56 for an exception]. Our targeted eQTL mapping conducted in an agroecologically relevant field setting deviates from these expectations, indicating that our network construction may act as an strong filter for biologically relevant candidate genes with cis-eQTL.

Our study demonstrates the importance of examining not just final plant height, but the developmental dynamics that contribute to height growth curves in agroecologically relevant field settings. We fit function-valued trait models to our data and, while statistically factoring out aspects of physiology such as carbon assimilation rates, demonstrate that parameters describing continuous developmental growth curves are correlated with plant fitness and yield. There is genetic variation for plasticity of growth rates and final sizes, but not the inflection point (transition from accelerating to decelerating growth) of growth curves. Changes in the sign of bivariate correlations indicate a trade-off between yields at given final size vs. yields at early developmental times. We map FVT QTL to multiple chromosomes and utilize a guided eQTL mapping approach to investigate the regulatory mechanisms connecting genotype to FVT phenotype. Specifically, we use WGCNA to identify eigengenes for actin/cytoskeleton and cell division processes whose expression values that correlate with FVT traits. FVT trait seeded MR co-expression networks had an overall association with metabolic regulation and growth processes. We demonstrate that combining multiple approaches yields the best explanation of phenotypic variance. We identify more cis-eQTL than expected, and these eQTL are highly colocalized at regulatory hotspots, likely including transcription factors that influence downstream gene regulation. Because our cis- and trans-eQTL hotspots colocalize with FVT QTL, these expression traits are likely components of the molecular regulatory mechanisms mediating the generation of FVT phenotypic variation from genomic variation (Fig 7).

Download:

Fig 7. Integrating data from multiple biological levels and analyses reveals the mechanistic regulatory connections between genomic architecture and Brassica rapa developmental phenotypes.

Function-Valued Trait QTL (2012 uncrowded data, Fig 3), Weighted Gene Co-expression Network Analysis (WGCNA) identified eigengene eQTL (S4 Table), and genes identified via Mutual Rank (MR 30) co-expression (S3 Table) occur at regulatory hotspots on chromosomes 10 and 3, indicating that these MR genes are candidate master regulators that integrate information to generate developmental trait variation. MR gene cis-eQTL (pink links) on chr10 and 3 lend further credence to this relationship. MR genes with trans-eQTL (green links) that map to these hotspots are putative upstream genes feeding in to the FVT regulatory network (Fig 6). From exterior to center: chromosomes in black, linkage map in grey, FVT QTL in red, eigengene eQTL in blue, MR genes in cyan, MR trans-eQTL in light green and MR cis-eQTL in pink.

https://doi.org/10.1371/journal.pgen.1008367.g007

Materials and methods

Species description

Brassica rapa (Brasssicaceae) is an herbaceous crop species first domesticated in Eurasia. This study was conducted on Recombinant Inbred Lines (RILs) derived from crossing R500, a yellow sarson oil seed variety, with IMB211, which is a rapid cycling line derived from the Wisconsin Fast Plant line (WFP). All RILs are expected to be >99% homozygous [57–60]. In comparison with IMB211, R500 flowers later, attains a larger size and greater biomass, and allocates more resources to seed production. This experiment includes 120 RILs as well as R500 and representative IMB211 genotypes.

Experimental design and data collection

In 2011, and 2012, the IMB211 × R500 RILs were germinated in the University of Wyoming greenhouse in fertilized field soil, and transplanted into the field at two planting densities, as previously described [1]. Briefly, crowded (CR) plants consisted of 5 plants of the same genotype per 4” peat pot with the central plant designated as a focal individual. The uncrowded (UN) treatment consisted of a single plant per pot. When the cotyledons were expanded, plants were transplanted to the field into randomly located blocks that consisted of either UN or CR plants. Each block contained a full RIL set (and representatives of the RIL parental genotypes), and RIL locations were randomized within blocks with 25cm between each focal plant. For phenotypic data collection 6 UN blocks were transplanted into the field in 2011 and in 2012 8 CR and 8 UN blocks were transplanted. In 2011, an additional 5 UN blocks were transplanted into the field for RNAseq. Plants were watered daily to field capacity and treated with pesticides as needed following Baker et al. [1]. Each year, we collected data on the timing of germination, bolting, and flowering by surveying plants 5–7×/week. We recorded temperature data every 5s in the greenhouse and field using a series of Onset Hobo data loggers (Bourne, MA, USA) and a Campbell Scientific (Logan, UT, USA) CR23X data logger equipped with a Vaisala (Helsinki, Finland) HMP-50 sensor. Temperature data were used to produce hourly and daily means, as well as hourly and daily minimums and maximums, for Degree Day (DD) calculations, which used a B. rapa-specific base value of 0.96°C [61].

Morphological data

Plant height was recorded for all plants starting at leaf emergence. In 2011, height was measured 6 times during the growing season, and these measurements captured final heights. In 2012, height was measured 2–3 times per week until senescence. Perhaps because of the increased frequency of data collection for 2012 FVT trait estimates, our RNAseq data corresponds more closely to 2012 plant-level phenotypic data compared to 2011, and we focus our analyses on 2012 plant-level phenotypic data. We present 2012 results for all FVT data; full results of FVT traits and FVT QTL including 2011 data can be found in supplemental materials. Flowering phenology and performance were estimated based on 2012 fruit and seed numbers, as described in Baker et al. [1].

Function-Valued Trait (FVT) modeling and data analysis

Height data were visually inspected for erroneous data points on a replicate level following Baker et al [1]. FVT modeling for trait estimation used Bayesian approaches that fit logistic growth curves to longitudinal height data [Eq 1; adapted from 21]. Height for each individual replicate plant is represented by a minimum of 5 and maximum of 13 sequential measurements. Briefly, we utilized a three-level hierarchical Bayesian model that retains the measurement data structure to account for information across all plants and genetic lines within the population, including replicate plants within each line.

1

Replicate-level parameters were extracted from the fitted logistic growth curves and treated as trait data [13,14,21,62,63]. These parameters include the growth rate (r, cm/DD), and an estimate of the maximum height based on the asymptote of the logistic growth curve (Hmax, in cm). Additional parameters were algebraically extracted from the growth curve and include the duration of growth (d, in DD) and the inflection point of the growth curve in Degree Days (iD, in DD). The parameter d was defined as the time in DD when 95% of the final size (Hmax) was achieved. The parameter iD reflects the transition from exponentially accelerating to decelerating growth rates.

The hierarchical Bayesian model was implemented using PyMC, a Bayesian Statistical Modeling Python module. The model parameters were estimated via MCMC using the Metropolis-Hastings algorithm [64,65]. The MCMC estimations were performed using a single chain to sample 500,000 iterations, which includes the first discarded 440,000 burn-in iterations; the remaining 60,000 iterations were retained. By thinning to 1 iteration in 20, the retained iterations were reduced to 3,000 samples for every FVT parameter from which the posterior distributions were tabulated. All parameters’ trace and auto-correlation plots were examined to ensure that the MCMC chain had adequate mixing and had reached convergence. All observed data for each genotype were plotted with two 95% credible interval envelopes. The inner, yellow envelope represents the credible intervals for the model based on the observed data, and the green envelope (Fig 1., S1 Fig) is the 95% credible interval where future observations from the same environment are expected [22,66].

Phenotypic plasticity

To detect environmental factors that might affect the correspondence between genotype and phenotype, we analyzed replicate level phenotypic datasets from 2012. We tested for the main effects of genotype and treatment and all possible interactions using the lme4 and pbkrtest packages in the R statistical environment [67–69]. In these tests, all effects were considered random and block was nested within the treatment effect. Significant main effects of environment (treatment) were considered evidence of phenotypic plasticity, and interactions of treatment × genotype was considered evidence for genetic variation in phenotypic plasticity.

Best Linear Unbiased Predictions (BLUPs)

BLUPs were calculated independently for UN and CR treatments in R using the lmer function in the lme4 package while controlling for block effects [69,70]. Broad sense heritability (H²) was calculated as the genotypic variance divided by the sum of genotypic, block, and residual variances.