Conceived and designed the experiments: RCM TA LW. Performed the experiments: JL RCM. Analyzed the data: TG. Contributed reagents/materials/analysis tools: MS. Wrote the paper: TG. Supervised the whole project: JS. Wrote the computer programmes: TG. Provided support in the analysis: JL SA MS. Proofread the manuscript: TA JL SA MS. Provided support in interpreting the results: TA LW RCM. Revised the manuscript: LW RCM JS.
The authors have declared that no competing interests exist.
Hybrids represent a cornerstone in the success story of breeding programs. The fundamental principle underlying this success is the phenomenon of hybrid vigour, or heterosis. It describes an advantage of the offspring as compared to the two parental lines with respect to parameters such as growth and resistance against abiotic or biotic stress. Dominance, overdominance or epistasis based models are commonly used explanations.
The heterosis level is clearly a function of the combination of the parents used for offspring production. This results in a major challenge for plant breeders, as usually several thousand combinations of parents have to be tested for identifying the best combinations. Thus, any approach to reliably predict heterosis levels based on properties of the parental lines would be highly beneficial for plant breeding.
Recently, genetic data have been used to predict heterosis. Here we show that a combination of parental genetic and metabolic markers, identified via feature selection and minimum-description-length based regression methods, significantly improves the prediction of biomass heterosis in resulting offspring. These findings will help furthering our understanding of the molecular basis of heterosis, revealing, for instance, the presence of nonlinear genotype-phenotype relationships. In addition, we describe a possible approach for accelerated selection in plant breeding.
The introduction of the concept of hybrids probably represents the most important single innovation in plant breeding. Central to the concept of hybrid breeding is the phenomenon of heterosis or hybrid vigour
Despite its central importance for the hybrid breeding concept, the molecular basis responsible for the heterosis effects is far from being understood. From a genetics perspective, dominance, overdominance or epistasis based models are commonly used as explanations for heterosis
One central observation in hybrid breeding programs is that the extent of heterosis is strongly dependent on the two parental lines. Thus, with respect to yield, for instance, heterosis can vary strongly depending upon the specific combination. In addition to its puzzling biological complexity, this represents a severe challenge to plant breeding, as usually the best combination of two parental lines can only be determined by trial and error. Therefore many thousands of testcrosses are required in order to find the optimal parental combination for the trait of interest. Numerous attempts have been followed with the goal to reduce the level of uncertainty concerning suitable parental combinations and to achieve some level of predictability. Not surprisingly mostly genetic markers of the two parents were used in these approaches
Despite the fact that genetic markers have proven to be highly useful in plant breeding for marker-assisted selection of simple traits that are difficult to assay
We set out to test whether heterosis prediction can be improved by using more complex parameters of the parental line than genetic markers. To this end, we decided to investigate parental metabolic markers, i.e. relative levels of particular metabolic compounds, with regard to their predictive power for biomass heterosis in the plant model organism
As shown in the Results section, the predictive power of genetic data for heterosis is significantly improved by combining it with metabolic measurements of one parent, suggesting complex mechanisms underlying heterosis. Finally, when analyzing the minimal set of metabolic and genetic markers needed for heterosis prediction in the two different testcross populations used, three classes can be identified. Firstly, there are markers highly predictive in both kinds of testcross combinations, secondly, we find markers specifically predictive for one kind of testcross combination, and there are also markers negligible in any of our prediction models.
As described in the Introduction, heterosis levels are a function of the two parental lines used to produce the offspring. Classical heterosis experiments are performed between two completely different parental lines, which have been inbred to a high degree and are essentially homozygous, resulting in a highly heterozygous F1 offspring. We here deliberately chose a different and genetically simpler experimental set-up as a first step towards investigation of the information carried by metabolic markers in comparison to, or in combination with, molecular genetic markers. To this end, a recombinant inbred line (RIL) population consisting of 359 lines derived from a cross between the two
The analysis of a full or half diallele involves crosses among numerous lines with different degrees of relatedness, and in each cross only a subset of the genetic factors responsible for heterosis might be shared with any other cross. In contrast to that, in our much simpler experimental design one parent was kept constant (C24 and Col-0 respectively, cf.
The experimental set-up comprised the two
To determine the degree of heterosis displayed in the two different testcross set-ups, biomass was measured in the testcross-progeny and compared to the mean of the biomasses of the two corresponding parental lines (cf.
For training and evaluating the prediction models, we applied a partial least squares (PLS) approach
In a first approach we compared the whole set of available genetic versus metabolic markers according to their suitability for heterosis prediction. Applying PLS-modelling on all 110 genetic markers, the Pearson correlation between observed and in cross-validation predicted heterosis was 0.39 for the C24 crosses and 0.35 for the Col-0 crosses. Using, on the other hand, all available 181 metabolic markers as predictor set, the corresponding numbers were 0.27 for C24 crosses and 0.24 for Col-0 testcrosses.
These results show that some prediction has been achieved. But the level of predictive power is very low, above all in the case of the 181 metabolic markers. We believe this to be a problem of overfitting, indicated by the fact that the larger predictor set yields the smaller prediction success in cross-validation.
In order to rank the available variables according to their contribution to the response, we utilized the variable importance in the projection (VIP), which calculates the contribution of each predictor variable to the response in the respective PLS-model
Applying this feature selection approach to our four PLS models (i.e. heterosis according to the Col-0 or C24 parent using either metabolic or genetic markers) led to improvements of the predictive power to values of about 0.40 (cf.
The diagrams shown here demonstrate the trade-off between overfitting and loss of information in the different models. The x-axis represents the number of predictors used to train the respective model. Red dots display the predictive power of the particular model in leave-one-out validation (LOOV). Panel A, C and E correspond to C24-heterosis, the remaining ones to Col-heterosis. The number of predictors, which maximizes the predictive power, is referred to as Opt, and it differs in the various models. In each case, the predictive power decreases by incorporating too many predictors in the corresponding model. This effect is due to overfitting. On the other hand, loss of information occurs, if too few predictors are selected in the model. Min refers to the minimal number of predictors that does not yet imply a significant loss of predictive power. The corresponding predictive power is still within the estimated confidence interval (gray lines) of the maximal predictive power. Black dots demonstrate the estimation of these confidence intervals. They represent the predictive power of the optimal predictor set when using jackknife resampled data (cf.
Response variable | C24-heterosis | Col-heterosis |
0.42 | 0.41 | |
0.40 | 0.38 | |
0.50 | 0.48 |
We wanted to answer the question of whether a combination of genetic and metabolic markers improves the predictability of heterosis as compared to the use of genetic markers alone. Since our variable selection algorithm allowed to overcome the problem of overfitting when dealing with a high number of predictors, we were now in a position to apply this algorithm on the combined set of metabolic and genetic markers (i.e. a total of 291 predictors) and test their suitability for heterosis prediction using PLS regression (cf.
As shown in
One advantage of the experimental design used here is the identification of variables that are important for heterosis prediction in both testcross populations as compared to variables that are strongly relevant in the prediction models for only one kind of combination, i.e. either with C24 or with Col-0. However, in order to perform this analysis, the number of variables in the respective models had to be reduced further to a minimal set of variables whose predictive power does not differ significantly from the optimal predictive power. To achieve this goal we developed a MDL based strategy (cf.
When comparing the relative importance of the 110 genetic markers as judged by their contribution to heterosis prediction with the genetic analysis of heterotic biomass QTL, a substantial overlap is observed (cf.
Each of the 110 genetic markers and its position on a Chromosome (Chr) is represented by two circles. The circle size indicates high (large circles), medium and low (small circles) VIP in the genetic model for Col-heterosis (circle left of chromosome) and C24-heterosis prediction (circle right of chromosome). We refer to the VIP as high and medium, if the corresponding marker is contained in the minimal genetic model and in the optimal genetic model, respectively. Markers specific for Col-heterosis prediction and those specific for C24-heterosis prediction are coloured in dark blue and in light blue, respectively. To allow positional comparison, support intervals of biomass QTL and three kinds of heterosis related QTL detected by Meyer et al. (2008, submitted) are plotted as coloured boxes along the chromosomes (grey: biomass, green: Z2
Eight genetic markers turned out to be highly predictive for heterosis in both testcross populations. This was indicated by their presence in the minimal genetic model and in the minimal combined genetic-metabolic model for both, C24- and Col-heterosis prediction. Five of those highly predictive markers are located at the top of chromosome 4, a region containing important QTL for heterosis and biomass
Fifteen genetic markers are specific for predicting heterosis in either the Col-0 or the C24 crosses. They are contained in the minimal genetic model and in the minimal combined genetic-metabolic model for one testcross population, but in none of the corresponding models for the other testcross population. Two of these markers are specific for Col-heterosis prediction. They are both located at the bottom of chromosome 3, while the thirteen markers specific for C24-heterosis prediction are distributed to all five chromosomes. Also in the case of the specific genetic markers there is some overlap with QTL (cf.
Applying the MDL based feature selection method to the metabolic markers, we identified the most important ones with respect to heterosis prediction (cf.
Highly pred in both TC set-ups | Specific for C24-het prediction | Specific for Col-het prediction |
Alanine | Fumaric acid | Gluconic acid |
Cellobiose | Galactose | Glycerol-3-phosphate |
Glycine | Glucose-6-phosphate | Fructose-6-phosphate |
Oxalic acid | Maleic acid | 2,4-Hydroxy butanoic acid |
Propanoic acid | Maltose | α-Tocopherol |
Urea | Putrescine | 5 Unknowns |
7 Unknowns | Raffinose | |
Salicylic acid | ||
Tyrosine | ||
7 Unknowns |
On the other hand, fourteen metabolic markers were identified as important for heterosis prediction irrespective of the parent (cf.
When comparing different metabolic markers concerning the distribution of their levels among the 359 RILs, it becomes obvious that the highly predictive markers tend to deviate from normal distributions (cf.
Here, unit area histograms are presented, i.e. the particular curve shows proportions rather than absolute numbers. Thus it constitutes a simple density estimate. The x-axis demonstrates normalized metabolite levels and is divided into equidistant intervals. The y-axis represents the relative frequency per interval. The panels A and B show the two metabolic markers with the highest VIP in each investigated model, i.e. Unknown 31 (using a functional group prediction service offered by the Golm Metabolome Database
As shown in the Results section, the combined use of selected genetic and metabolic variables of the RIL population leads to a significant improvement of the predictive power for the heterosis effect observed in the progeny of both test parents, i.e. the C24 and the Col-0 testcrosses.
The cause of this can at least partially be rationalised. It can be assumed that heterosis is primarily, if not exclusively, dependent on the match of the two alleles (or sets of alleles) derived from both parents, suggesting that complete genetic information should be sufficient for predicting heterosis. The genetic markers used here and in most other studies however are far from representing a complete coverage of the entire genome. On the other hand, in our study, individual genetic markers contain much information on the neighbouring regions, since we investigated a RIL population. Each line carries on average 1.86 recombinations per chromosome, and the average length of segments of a certain parental origin is 32.06 cM
In our analysis, metabolic markers turned out to carry information on heterosis that is complementary to the genetic markers' information. Therefore, one should assume that epistatic gene effects are actually involved in heterosis. This is consistent with recent analyses which indicate a strong role for epistasis in the manifestation of heterosis in
Concerning an effective genetic heterosis prediction model, the complex trait thus would require to be modelled via complex nonlinear interactions of the different genetic markers. Since we know little concerning gene-gene interactions in relation to their influence on the trait of our interest, too many possibilities exist for modelling such a nonlinear phenomenon. Thus, nonlinear gene interactions should be integrated indirectly by using metabolic markers. This is very simple to model, since this approach essentially follows a linear combination of genetic and metabolic markers (cf.
The chain of causality from genes to phenotype is displayed here. Since genes are at the starting point of the causal chain, one established way to model a phenotype Y is to use genetic markers X and combine them linearely (genetic model in red). Using instead predictor variables Z, which are close to the phenotype, such as metabolites, presents another promising way to predict a complex phenotype, since those variables integrate already parts of the complex gene interactions (represented by products Xi*Xj). The advantage is that we do not need to know, which genes actually interact in which way, and that the model can stay simple. It just linearely combines metabolite variables, thus integrating non-linear interactions indirectly (metabolite model in blue). In the end, one might use a combination of both approaches, i.e. combining different levels of the causal chain, to explain as much as possible of the complex phenotype. Hence one integrates linear relationships concerning the response as well as non-linear gene interactions, while sticking to the simple model ansatz of a linear combination of genetic predictors X and metabolite variables Z (combined model in violet).
Biomass is an integrated variable reflecting the action (and thus presence) of many, if not all, genes of the respective genotype. We therefore expected, following the argument used for metabolic markers, that the RILs' biomass as additional predictor is even more valuable.
Actually, just like metabolic markers (cf.
The combination of the RILs' metabolic markers and the biomass variable as predictors yields a predictive power of 0.51 and 0.47 for C24- and for Col-heterosis respectively. These numbers correspond approximately to the predictive power of the combined metabolite-genetic models (cf.
On the other hand, we added the genetic markers as predictors to the pure biomass model. In that way, the predictive power has been increased in both testcross populations to an almost equal amount, from 0.44 and 0.35 up to 0.60 and 0.50 respectively, for the C24 and the Col-0 testcrosses, independently of the presence or absence of metabolic markers as predictors. In both testcross populations, these numbers represent a significant improvement, as compared to all considered models (cf.
Some of the predictors, such as parental biomass, are significantly correlated to the heterosis effect in at least one of the testcross populations. Accordingly, their contribution in the respective multivariate models is high. Such variables show an interesting difference in their influence on heterosis prediction in the two testcross populations.
In contrast to the biomass variable, which correlates stronger with C24-heterosis (Pearson correlation: −0.45) than with Col-heterosis (Pearson correlation: −0.36), the remaining highly predictive markers are more important for Col- than for C24-heterosis. The most important single effect is that of the first genetic marker at top of chromosome 4. Its correlation with Col-heterosis is 0.40, as compared to 0.17 with C24-heterosis. The four subsequent markers on top of chromosome 4 also have a higher single effect on Col-heterosis (correlations between 0.31 and 0.35) than any marker has on C24-heterosis, where the strongest correlation (apart from the biomass variable) is −0.20 with a specific marker in the middle of chromosome 3.
Since the most important single effects are stronger on Col-heterosis than on C24-heterosis, the contribution of the corresponding markers in the multivariate genetic model is also higher in the case of Col-heterosis (cf.
Naturally, this affects also the combined genetic-metabolic models (cf.
These observations about the substantial influence of top of chromosome 4 for Col-heterosis (VIP-ranking 1 in the minimal genetic model) in connection with its minor influence on C24-heterosis (VIP-ranking 22 in the minimal genetic model) can be explained in the following way. It is crucial for strong Col-heterosis that the respective testcross is heterozygous at top of chromosome 4, i.e. that the corresponding parental RIL carries C24 alleles at this locus, indicated by a positive coefficient of the marker in the model (cf.
We found an opposite trend for the genetic marker in the middle of chromosome 3, which is only important for C24-heterosis (VIP-ranking 5 in the minimal genetic model). Indicated by the negative sign of the coefficient, the respective RIL parent is required to carry Col-0 alleles at this locus to yield high heterosis in C24 testcrosses, whereas it does not matter for Col-0 testcrosses whether they are homozygous or heterozygous at this locus. Consequently Col-0 is assumed to be dominant at the middle of chromosome 3. The same rational is applicable for some other accession-specific markers at other loci. Nevertheless, as mentioned above, we have to take into account also epistatic effects, in consequence of which the single dominance effects are often not as clearly revealed.
Concerning the metabolic markers important for C24- and Col-heterosis prediction, there are some of them, whose model coefficients are positive in the Col-0 model while negative in the corresponding C24-model (e.g. glycine) and vice versa. This means, for instance in the case of glycine, that high levels in the parental RIL are associated with high Col-heterosis but with low heterosis in the C24 testcross. On the other hand, we find in the Col-0 parent a low level of glycine, whereas it is high in the C24 parent (and moderate in the F1 progeny). Consequently, the fact that a Col-0 testcross shows stronger heterosis when the corresponding RIL parent has a high glycine level, and a C24 testcross performs better when the corresponding RIL parent has a low glycine level, suggests that heterosis is dependent on an optimally balanced level of certain metabolites.
That is in agreement with the established metabolic balance concept of heterosis, which requires the coordination of all reactions and systems for efficient growth under a given environment
This idea is also supported by some of our specific metabolic markers (cf.
As described in the Results section, combined genetic-metabolic models yield a correlation of approximately 0.5 between observed and in LOOV predicted response (cf.
However, an enrichment of crosses displaying a higher likelihood for strong heterosis would probably be helpful. To evaluate whether this approach might be feasible, we trained a linear discriminant model for C24-heterosis. This model discriminates the 359 RILs into lines with high and low heterotic potential using our minimal genetic-metabolic predictor set. In LOOV 233 RILs (i.e., 65%) were classified correctly by that model. Incorporating all 181 metabolic markers and all 110 genetic markers resulted in a discriminant model that classified only 49% of the RILs correctly, which corresponds to a random classifier. Thus the two-step approach described here, i.e. identifying in a quantitative model the most important set of genetic and metabolic markers and subsequently applying those in a qualitative model, might be an interesting path also with respect to practical applications
In addition to the incorporation of gene interactions by metabolic markers, as shown in this study, we will also consider nonlinear relationships of the most important metabolic markers identified in this work. This seems to be a promising step, since on the one hand, the highly predictive metabolic markers show distributions deviating from normal ones. On the other hand, for linear models normal distributions of the predictors are required. Consequently, the consideration of some nonlinear terms in the discrimination model, which account for the special distributions of the highly predictive markers, will probably improve the classification success.
Data was derived from a RIL population
We investigated two sets of predictor variables, both, independently and in combination. The first set was the metabolic profile of the RILs consisting of 181 metabolic markers. Their levels were measured via gas-chromatography/mass-spectrometry at 15 days after sowing, and the data was normalized as described elsewhere
Our response variables of interest were C24-heterosis and Col-heterosis. These were determined by the shoot biomass of the particular RIL, the shoot biomass of the corresponding testcross C24 x RIL (or Col-0 x RIL) and that of the background line C24 (or Col-0). Shoot biomass was measured as dry weight at 15 days after sowing. The determination of the shoot biomass values (cf.
The absolute mid-parent heterosis is defined as the difference between the hybrid's shoot biomass and the mean shoot biomass of its parents. Dividing the absolute mid-parent heterosis by the mean shoot biomass of the parents, results in the relative mid-parent heterosis, which we focus on in this analysis (cf.
Our computations were conducted on the free software environment
For training the particular prediction models, we used the PLS approach
The rational of using PLS instead of ordinary least squares regression is the potential existence of latent variables, on which the particular response depends, but which are not available, because they cannot be measured. Fortunately, one latent variable often regulates not only the response of interest but also several other variables, which are in fact measured in experiments. Since they are all influenced by the one latent variable, many of them correlate strongly. Therefore we can down-weight some of the measured variables. The remaining variables could be considered as biomarkers for the particular response. That means there is not necessarily a causal dependence, but the biomarkers act as an indicator for the response. Since we consider very complex response effects, such as heterosis, probably several latent variables are involved. Thus, it is possible that two different latent variables influence the same measurable variable. For example, one of them up- and the other one down-regulates the measurable variable. Then, knowing only the value of the measured variable, it is not possible to reconstruct both latent variables correctly. We therefore have to weight the measured variables and to select the optimal number of latent variables. We use the PLS-algorithm for this purpose. Low weights in important latent variables give a hint for measured variables that do not improve the prediction.
The PLS-approach is able to handle a high number of measured predictor variables relative to the number of samples. But certain response effects are very sensitive with regard to overfitting. Down-weighting the measured variables with a low contribution to the response, is not enough in this case. We therefore go one step further and suppress variables that contribute the least to the response of interest in the PLS-model.
The contribution of a predictor variable to the response is measured by the VIP suggested previously
The computational costs of optimizing the number of latent variables increase with the number of original variables in the model. We therefore do not undertake those efforts for every possible number of original variables. The increment of our outer loop is ten, as a start, to get a general idea (cf.
To exclude the possibility that the predictive power is insignificant (which means that the in LOOV predicted response does not correlate significantly with the observed response), we perform a randomisation test. We build 1000 random permutations of the true response. Then the Pearson correlation between the in LOOV predicted response and each permutation is calculated. The 1000 random correlations are compared to the predictive power, i.e. the correlation between the predicted response and the real response.
To judge the difference in predictive power between two distinct predictor sets, confidence intervals for the predictive power of optimized models are estimated using jackknife resampling. Given a predictor set and a response variable, we randomly select 358 out of our 359 samples and recompute the predictive power of the optimal variable selection on the resampled jackknife data. The resampling and the recomputation of predictive power are repeated 200 times. This results in a specific range of values for the predictive power of the regarded optimal variable selection for the particular response. We consider the central 99% of those 200 values as an estimation for the confidence interval. It stretches from the mean of the two smallest values to the mean of the two largest ones. If the estimated confidence intervals for different predictor sets or for different response variables are disjoint, we consider the discrepancy of the corresponding predictive powers significant.
In addition, the estimation of a confidence interval for the optimal predictive power enables us to follow Occam's razor
The predictor variables, which are always contained in the minimal variable selection, can be considered as biomarkers. Comparing the biomarkers for the two different testcross settings, we learn about common and specific effects, We refer to a certain metabolic marker as specific for heterosis prediction in one testcross population, if and only if it is contained in both, the minimal metabolic variable selection and the minimal variable selection of the combined genetic-metabolic approach for one testcross setting, but in none of the corresponding minimal variable selections for the other testcross setting. On the other hand, we consider a certain metabolic marker as highly predictive for heterosis in both testcross populations, if it is contained in the minimal metabolic and also in the minimal metabolic-genetic variable selection for both testcross settings (cf.
If we want to consider the order of importance of the biomarkers (cf.
Additional histograms of metabolic markers over the 359 RILs, in comparison to a random generation. The panels A-F show the six metabolic markers most important according to their VIP in the minimal metabolic and in the minimal combined genetic-metabolic models. Their deviation from a single normal distribution seems to abate with decreasing importance. Panels G-K show the 5 metabolic markers with the lowest mean VIP in the complete metabolic and the complete combined genetic-metabolic model for C24- and Col-heterosis, in comparison to a random generation of 359 numbers following a normal distribution with expectation zero and a standard deviation of one third (L).
(0.05 MB PDF)
VIP of genetic markers in the complete (A) and in the minimal (B) genetic models. The figure shows the contribution of the 110 genetic markers, which have been arranged horizontally, to the respective response in the different models. The darker the bar, the higher the VIP of the corresponding marker in the particular model. Panel A shows that, when training on the whole set of markers, the contribution to C24-heterosis prediction is more diffused on many different markers, whereas it is strongly concentrated on top of chromosome 4 in the case of Col-heterosis. Panel B shows the VIP when trained on those selected genetic markers, which turned out to be essential predictors in the respective models. Each of the five chromosomes is represented by at least one marker in both minimal models, which overlap on the chromosome 1, 3 and 4. The two markers at the bottom of chromosome 3 only selected as predictors in the minimal Col-heterosis model were confirmed to be Col-heterosis specific markers by the combined genetic-metabolic model (cf.
(0.11 MB TIF)
Histograms of allele frequencies. The figure shows how the allele frequencies are distributed in the RIL population. The panels A and B deal with the frequency of C24 alleles and Col-0 alleles, respectively. For each RIL there are 110 genetic markers featuring C24, Col-0 or heterozygosity. The x-axis represents the number of the corresponding alleles per RIL. The y-axis presents how many of the 359 RILs show the corresponding allele frequency. The mean frequency of C24 alleles and Col-0 alleles per RIL is 48% and 50%, respectively. In average, 2% of the markers are heterozygous. The minimal frequencies for C24 and Col-0 alleles are 11% and 10%, respectively, while the maximal frequencies are 89% and 85%. The standard deviation was about 15.4 and 15.1, respectively. Using a Mantel test (P<0.001) to estimate association between marker matrices, we did not find significant differences in marker distribution between the two sub-populations
(0.00 MB PDF)
Feature selection process in metabolite model for C24-heterosis. Panel A shows the crude optimization of predictive power according to the number of predictors in the model with increment 10. It determines the breakpoint for the refined version (panel B) to reduce the computational costs (cf.
(0.03 MB PDF)
Estimated confidence intervals of the optimal predictive power in leave-one-out validation.
(0.03 MB DOC)
Number of variables in the optimal and minimal predictor sets, in comparison to the number of all available predictors in the corresponding set. Abbreviations: Het, heterosis; Gen, genetic marker set; Met, metabolic marker set; Bio, biomass marker; Gen-Met, combined genetic-metabolic marker set etc.
(0.04 MB DOC)
List of metabolic markers that turned out to be relevant neither for C24- nor for Col-heterosis prediction in our analysis, i.e. omission of those markers did not significantly deplete the predictive power in the metabolic models or in the combined genetic-metabolic models.
(0.03 MB DOC)
List of the 40 metabolic markers highly ranked in C24-heterosis prediction, sorted according to the sum of their VIP in the minimal metabolic (met) and in the minimal combined genetic-metabolic (gen-met) model. Interestingly, the three highest ranked metabolic markers are distinct from those most important for biomass per se prediction
(0.02 MB XLS)
List of the 40 metabolic markers highly ranked in Col-heterosis prediction, sorted according to the sum of their VIP in the minimal metabolic (met) and in the minimal combined genetic-metabolic (gen-met) model. Interestingly, again the three highest ranked metabolic markers are distinct from those most important for biomass per se prediction
(0.03 MB XLS)
Dry weight (DW) and rMPH values for the 359 investigated RILs and the corresponding testcrosses (TC-C24 and TC-Col), including the standard deviation (sd) and the arithmetic mean (mean) for each column.
(0.10 MB XLS)
We thank Hannelore Liero and Henning Läuter for their suggestions and advice concerning mathematical statistics and for helpful discussions and encouragement. We thank Detlef Groth, Jan Hummel and Manuela Hische for support with databases. We thank Stefanie Hartmann for proof-reading the manuscript. We also thank all partners within the DFG Priority Programme SPP 1149 “Heterosis in Plants”.