Improving seed oil yield and quality is a central target in rapeseed (Brassica napus) breeding. The primary goal of our study was to examine and compare the potential and the limits of marker-assisted selection and genome-wide prediction of six important seed quality traits of B. napus. Our study is based on a bi-parental population comprising 202 doubled haploid lines and a diverse validation set including 117 B. napus inbred lines derived from interspecific crosses between B. rapa and B. carinata. We used phenotypic data for seed oil, protein, erucic acid, linolenic acid, stearic acid, and glucosinolate content. All lines were genotyped with a 60k SNP array. We performed five-fold cross-validations in combination with linkage mapping and four genome-wide prediction approaches in the bi-parental population. Quantitative trait loci (QTL) with large effects were detected for erucic acid, stearic acid, and glucosinolate content, blazing the trail for marker-assisted selection. Despite substantial differences in the complexity of the genetic architecture of the six traits, the choice of genome-wide prediction model had only a minor impact on the prediction accuracies. We evaluated the effects of training population size, marker density, and phenotyping intensity on the prediction accuracy. The prediction accuracy in the independent and genetically very distinct validation set still amounted to 0.14 for protein content and 0.17 for oil content, reflecting the utility of the developed calibration models even in very diverse backgrounds.
Citation: Zou J, Zhao Y, Liu P, Shi L, Wang X, Wang M, et al. (2016) Seed Quality Traits Can Be Predicted with High Accuracy in Brassica napus Using Genomic Data. PLoS ONE 11(11): e0166624. https://doi.org/10.1371/journal.pone.0166624
Editor: Harsh Raman, New South Wales Department of Primary Industries, AUSTRALIA
Received: March 29, 2016; Accepted: November 1, 2016; Published: November 23, 2016
Copyright: © 2016 Zou et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: All relevant data are within the paper and its Supporting Information files.
Funding: This work was supported by the National Basic Research Program of China (Grant No. 2015CB150200), the National Key Research and Development Program of China (No.2016YFD0101300), and the Natural Science Foundation of Hubei Province Key Program 2014CFA008. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
Abbreviations: QTL, quantitative trait loci; DH, doubled haploid; BLUEs, best linear unbiased estimates; LD, linkage disequilibrium; FDR, false-discovery rate; PVE, phenotypic variance explained by all QTL; GBLUP, genomic best linear unbiased prediction; RR-BLUP, ridge regression best linear unbiased prediction; EG-BLUP, extended GBLUP model
Rapeseed (Brassica napus L.) is one of the most important oilseed crops worldwide. The breeding goal for rapeseed is high oil yield coupled with excellent oil quality [2–4]. The latter is mainly driven by the composition of the fatty acid components of erucic acid (C22:1), stearic acid (C18:0), oleic acid (C18:1), linoleic acid (C18:2), and linolenic acid (C18:3) [2, 3, 5]. Moreover, protein and glucosinolate content determine to a large extent the quality of the rapeseed meal [6–8]. All of these seed traits are influenced by the environment [9–11], and their precise estimation requires phenotyping in replicated multi-environmental field trials. Moreover, measuring quality traits in rapeseed is often labor-intensive. Therefore, quality traits are interesting targets for genomic-assisted crop improvement.
Genomic-assisted crop improvement can be based either on marker-assisted selection or on genome-wide predictions [12, 13]. In marker-assisted selection, the performance of individuals is predicted using a few diagnostic markers associated with the traits under consideration. In contrast, genome-wide prediction exploits many markers without performing marker-specific significance tests. The accuracy of marker-assisted selection and genome-wide predictions depends on the genetic architecture underlying the traits under consideration. Marker-assisted selection is most effective if the trait is controlled by a few genes with large effects. If the genetic architecture is complex, quantitative trait loci (QTL) detection is not reliable and genome-wide prediction is more powerful.
The presence of QTL underlying quality traits in rapeseed has been investigated in linkage and linkage disequilibrium mapping studies [1, 3, 9–11, 17–28]. QTL for seed quality traits such as seed fatty acid composition have also been identified in other Brassica species, such as B. oleracea and B. juncea [29, 30], which provide a reference for comparisons between species. However, linkage and linkage disequilibrium mapping are often afflicted by upward-biased estimates of the proportion of genotypic variance explained by QTL. Therefore, cross-validation or independent validation has been suggested to obtain unbiased estimates of QTL effects, but has been applied in only a limited number of studies in rapeseed [31, 32].
The potential and limits of genome-wide predictions have been examined for several major crops, such as barley, wheat [15, 34–36], maize [37–42], rice, sunflower, forage plants, sugar beet [46, 47], and soybean [48, 49]. The results underlined the potential of genome-wide prediction as a powerful tool to accelerate selection gain in plant breeding. Recent studies in rapeseed also highlighted the potential of genome-wide prediction of flowering time [31, 50, 51], plant height, protein content, oil content, glucosinolate content, and grain yield [31, 51]. Nevertheless, the benefits of genome-wide prediction compared to marker-assisted selection have not been examined in rapeseed. Moreover, the potential to exploit epistasis to predict seed quality traits has not been investigated, although previous studies suggested that epistatic interactions were important for fatty acid metabolism.
This study is based on a published dataset from the bi-parental TN DH population comprising 202 DH lines. The population has been intensively used to study the genetic architecture of important agronomic traits [9–11, 22, 23] and was genotyped with an Infinium 60K-SNP array, which is extensively used in Brassica [24, 53, 54]. The two parents of the TN DH mapping population originated from the European and Chinese genepools and have been used widely for rapeseed breeding programs in both target regions. Our objectives were to (i) test for the presence of QTL exhibiting reliable and large effects using five-fold cross-validations, (ii) investigate the effect of the genetic architecture on the superiority of different genome-wide prediction models, (iii) examine the potential to improve the prediction accuracy by modeling digenic epistatic effects, (iv) validate the prediction accuracy in a genetically independent population, and (v) discuss the consequences for implementing genome-wide predictions in applied rapeseed breeding programs.
Materials and Methods
Plant materials and field trials
A bi-parental DH population of B. napus denoted as TN DH has been developed, comprising 202 unique lines. The DH lines were derived from a microspore culture based on the F1 cross between Tapidor and Ningyou7. The parent Tapidor is a European winter cultivar with low erucic acid and glucosinolate content in the seeds. The parent Ningyou7 is a Chinese semi-winter cultivar with high erucic acid and glucosinolate content in seeds. The TN DH mapping population along with its two parents was grown in 11 winter and semi-winter ecotype environments (S1 Table). The phenotypic data were generated and used in previous linkage mapping studies, which were based on a limited set of markers [9–11, 22, 23]. The experimental design was a randomized complete block design with 3 replications. Every plot comprised three rows with a total plot size of 3.0 to 4.0 m².
Phenotypic data were collected for six important seed quality traits for each DH line and parent: seed oil content (%) and protein content (%), defined as the percentages of oil and protein in the total seed dry weight, respectively; three important fatty acid components of the seed oil, namely erucic acid content (%), linolenic acid content (%), and stearic acid content (%); and the content of glucosinolates in the total seed dry weight (µmol/g). The quality traits were determined by near infrared reflectance spectroscopy, measuring three technical and three biological replicates. The phenotyping is described in detail in previous studies [10, 11, 22].
A total of 117 genetically independent B. napus inbred lines were used in this study for validating the prediction accuracy based on the TN DH population. The validation population was developed based on hundreds of crosses between B. rapa and B. carinata accessions [55, 56]. The validation population was grown in one semi-winter environment (Wuhan, China) in 2013–2014 in a trial with three replicates. Every plot comprised two rows with a total plot size of 2.0 to 3.0 m². Seed oil content and protein content were measured using the same method as that used for the TN DH population.
Phenotypic data analyses
The best linear unbiased estimates (BLUEs) of phenotypic values and variance components were estimated by the following linear mixed model using ASREML-R software :
To obtain the BLUEs, the genotype effects were treated as fixed effects and the other effects were treated as random. To estimate variance components, all effects were treated as random. Broad-sense heritability was calculated as the ratio of genotypic to phenotypic variance:

$$h^2 = \frac{\sigma_G^2}{\sigma_G^2 + \dfrac{\sigma_{G \times E}^2}{N_E} + \dfrac{\sigma_e^2}{N_E N_R}}$$

where $N_E$ refers to the number of environments, $N_R$ is the average number of replications per location, $\sigma_G^2$ is the genotypic variance, $\sigma_{G \times E}^2$ is the variance of genotype times environment interaction, and $\sigma_e^2$ refers to the error variance.
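As a minimal sketch of the heritability calculation, assuming the usual entry-mean form in which the interaction variance is divided by the number of environments and the error variance by the number of environments times replications, the ratio can be computed as follows. The function name and the variance component values are hypothetical and chosen only for illustration; they are not estimates from this study.

```python
def broad_sense_heritability(var_g, var_gxe, var_e, n_env, n_rep):
    """Entry-mean heritability: var_g / (var_g + var_gxe/n_env + var_e/(n_env*n_rep)).

    var_g   -- genotypic variance
    var_gxe -- genotype-by-environment interaction variance
    var_e   -- error variance
    n_env   -- number of environments
    n_rep   -- average number of replications per environment
    """
    if n_env <= 0 or n_rep <= 0:
        raise ValueError("n_env and n_rep must be positive")
    phenotypic = var_g + var_gxe / n_env + var_e / (n_env * n_rep)
    return var_g / phenotypic


# Hypothetical variance components (illustrative only, not from the paper).
h2 = broad_sense_heritability(var_g=4.0, var_gxe=2.0, var_e=6.0, n_env=11, n_rep=3)
print(round(h2, 3))
```

With many environments and replications, the interaction and error terms shrink, which is why intensive phenotyping pushes the entry-mean heritability towards 1.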
Genotypic data analyses
The 202 DH lines of the TN DH population and the two parents were previously fingerprinted using a 60k SNP array based on an Illumina Infinium assay. During quality control, markers were removed if they were monomorphic or had a missing rate >5%, a minor allele frequency <5%, or a degree of heterozygosity >5% in the DH population. After this quality check, 180 DH lines with 13,678 high-quality SNP markers remained. By aligning the marker sequence of the 13,678 SNPs to the reference “Darmor-bzh” genome of B. napus version 4.1 via BLAST analysis, 9,628 SNP markers could be assigned a unique physical position in the genome with the parameters of 100% alignment, E value <10^-20 and mismatch <2 (S2 Table). After removing redundant SNPs in full linkage disequilibrium (LD), 1,527 markers representing recombination loci (referred to as representative markers) remained (S2 Table). The 1,527 representative markers included 1,052 representative markers from 1,052 genetic bins and 475 single markers. From each of the genetic bins, the marker with the lowest missing rate and the best available physical alignment position was selected as the representative marker. In this way, a total of 1,527 representative markers were obtained and used for the subsequent analysis. Pairwise LD between markers was calculated as the squared Pearson moment correlation coefficient using the R package genetics. The 117 lines of the validation population were genotyped using the same SNP array, and the 1,527 representative markers selected in the TN DH population were used for prediction.
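The marker quality-control thresholds described above can be sketched as a simple filter. This is an illustrative sketch only: it assumes a lines-by-markers matrix coded 0/1/2 (1 = heterozygous) with NaN for missing calls, and the function name `qc_filter` is hypothetical rather than part of the original pipeline.

```python
import numpy as np


def qc_filter(geno, max_missing=0.05, min_maf=0.05, max_het=0.05):
    """Return a boolean mask of markers that pass quality control.

    geno: array of shape (n_lines, n_markers), coded 0/1/2, NaN = missing.
    A marker is dropped if it is monomorphic, has too many missing calls,
    a low minor allele frequency, or too many heterozygous calls.
    """
    missing_rate = np.isnan(geno).mean(axis=0)
    n_called = np.maximum((~np.isnan(geno)).sum(axis=0), 1)  # avoid division by zero
    allele_freq = np.nansum(geno, axis=0) / (2.0 * n_called)
    maf = np.minimum(allele_freq, 1.0 - allele_freq)
    het_rate = (geno == 1).sum(axis=0) / n_called
    return (maf > 0) & (missing_rate <= max_missing) & (maf >= min_maf) & (het_rate <= max_het)


# Toy data: 20 lines, 4 markers (illustrative only).
geno = np.zeros((20, 4))
geno[10:, 0] = 2          # marker 0: balanced alleles, passes
                          # marker 1: all zeros, monomorphic
geno[10:18, 2] = 2
geno[18:, 2] = np.nan     # marker 2: 10% missing
geno[8:16, 3] = 2
geno[16:, 3] = 1          # marker 3: 20% heterozygous
mask = qc_filter(geno)
print(mask)
```

Only the first toy marker survives; the other three each violate one of the criteria.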
QTL mapping and genome-wide prediction
For the QTL mapping, the SNP markers were coded according to the F∞ metric. The genome-wide QTL mapping method is based on the inclusion of cofactors obtained by stepwise multiple linear regressions using the Bayesian information criterion. The genome-wide scan was conducted by comparing the full model comprising the SNP and all cofactors with a reduced model including only the cofactors. We used a false-discovery rate (FDR) of P<0.1 to test for significance. The proportion of the phenotypic variance explained (PVE) by all QTLs was estimated using the adjusted R² values from a multiple regression.
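The text does not state which FDR procedure was applied. As an assumption for illustration only, the sketch below uses the widely used Benjamini-Hochberg adjustment and flags markers whose adjusted P-value falls below 0.1; the function name `bh_adjust` and the P-values are hypothetical.

```python
import numpy as np


def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted P-values (assumed procedure, not confirmed by the source)."""
    p = np.asarray(pvalues, dtype=float)
    n = p.size
    order = np.argsort(p)
    scaled = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity, working down from the largest P-value.
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.minimum(scaled, 1.0)
    return adjusted


# Hypothetical marker P-values from a genome-wide scan.
adj = bh_adjust([0.001, 0.02, 0.03, 0.5])
significant = adj < 0.1
print(adj, significant)
```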
We performed a five-fold cross-validation of the QTL mapping, in which the total population of 180 DH lines was randomly divided into two groups at a ratio of 4:1 (one group with 144 lines and the other with 36 lines); this sampling was repeated 100 times. In each run, the 144 lines were used as the training set and the remaining 36 as the test set. QTL mapping was performed in each training set, and the estimated QTL effects were used to predict the genetic values of the lines of the test set. The prediction accuracy was defined as the correlation between the predicted and observed phenotypic values, standardized with the square root of the heritability.
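The split and the accuracy definition can be sketched as follows. This is a minimal illustration under stated assumptions: the helper names are hypothetical, the split is a single random 4:1 partition of 180 lines, and the toy predictions are made up.

```python
import numpy as np


def random_split(n_lines, test_fraction=0.2, rng=None):
    """Randomly partition line indices into a training and a test set."""
    rng = np.random.default_rng() if rng is None else rng
    perm = rng.permutation(n_lines)
    n_test = int(round(n_lines * test_fraction))
    return perm[n_test:], perm[:n_test]


def standardized_accuracy(predicted, observed, h2):
    """Pearson correlation between predicted and observed values, divided by sqrt(h2)."""
    r = np.corrcoef(predicted, observed)[0, 1]
    return r / np.sqrt(h2)


rng = np.random.default_rng(0)
train_idx, test_idx = random_split(180, rng=rng)   # 144 training, 36 test lines

# Toy values: perfectly correlated predictions and a heritability of 0.81.
acc = standardized_accuracy([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0], h2=0.81)
print(len(train_idx), len(test_idx), acc)
```

Because the correlation is divided by the square root of the heritability, the standardized accuracy can exceed 1 when the raw correlation is close to 1, as in the toy example.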
For the genome-wide prediction, four different models were used in this study. We implemented three methods exploiting the additive marker effect: genomic best linear unbiased prediction (GBLUP), ridge regression best linear unbiased prediction (RR-BLUP), and BayesCπ. To accelerate computation speed and eliminate the impact of LD on the prediction accuracy of BayesCπ, we removed SNPs with r² > 0.95. For BayesCπ, the Gibbs sampling ran 20,000 times, and the first 6,000 cycles were used as burn-in. We also implemented an extended GBLUP model denoted as EG-BLUP, which models digenic epistatic effects as well as additive effects. The accuracies of all these genome-wide prediction methods were determined based on the adjusted entry means for the 180 genotypes applying five-fold cross-validation. Details of the implementation of the models have been described elsewhere [41, 42, 65]. We performed 100 cross-validation runs and estimated the accuracy as the Pearson correlation coefficient between predicted and observed values standardized with the square root of the heritability.
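To make the equal-shrinkage idea behind RR-BLUP concrete, the sketch below fits all marker effects jointly by ridge regression. It is not the implementation used in the study: the shrinkage parameter `lam` is passed in by hand here, whereas in RR-BLUP it would normally be derived from estimated variance components, and the function names and toy data are hypothetical.

```python
import numpy as np


def rr_blup_fit(X, y, lam):
    """Ridge estimate of marker effects with an unpenalized intercept.

    X: (n_lines, n_markers) marker matrix; y: phenotypes; lam: shrinkage parameter
    (assumed given; typically the ratio of error to marker-effect variance).
    """
    x_mean = X.mean(axis=0)
    mu = y.mean()
    Xc = X - x_mean
    n_markers = X.shape[1]
    effects = np.linalg.solve(Xc.T @ Xc + lam * np.eye(n_markers), Xc.T @ (y - mu))
    return mu, x_mean, effects


def rr_blup_predict(X_new, mu, x_mean, effects):
    """Predict genetic values for new lines from the fitted marker effects."""
    return mu + (X_new - x_mean) @ effects


# Toy example: 50 lines, 3 markers coded 0/2, noise-free phenotypes.
rng = np.random.default_rng(42)
X = rng.choice([0.0, 2.0], size=(50, 3))
true_effects = np.array([1.5, -0.5, 0.0])
y = 10.0 + X @ true_effects

mu, x_mean, effects = rr_blup_fit(X, y, lam=1e-8)         # almost no shrinkage
_, _, shrunk_effects = rr_blup_fit(X, y, lam=1e6)         # very strong shrinkage
pred = rr_blup_predict(X, mu, x_mean, effects)
```

With negligible shrinkage the toy effects are recovered almost exactly, while a very large `lam` pulls every effect towards zero by the same mechanism, which is the behaviour that marker-specific shrinkage in Bayesian models is meant to relax.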
To evaluate the dependence of prediction accuracy on training set size, we applied cross-validation with randomly selected subsets of n (n = 48, 80, 112, 144) lines from the full data to form the training set and used the remaining lines as the test set. To evaluate the dependence of prediction accuracy on marker density, we selected subsets of m (m = 100, 1,000, 5,000, 13,678) evenly distributed markers from the full dataset and applied five-fold cross-validations using all 180 lines. The sampling procedure was randomly repeated 100 times for each scheme, and the prediction accuracies were averaged across the 100 cross-validation runs. In these analyses of marker subsets and training set sizes, we focused on seed oil content and protein content. These traits were selected because oil content was evaluated in all 11 environments and protein content exhibited a high heritability.
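The two sampling schemes can be sketched as below. This is only one plausible reading of the procedure: it assumes the markers are already sorted by genomic position so that equally spaced indices give an even distribution, and both helper names are hypothetical.

```python
import numpy as np


def evenly_spaced_subset(n_markers, m):
    """Indices of m markers spread evenly over n_markers position-sorted markers."""
    if not 1 <= m <= n_markers:
        raise ValueError("m must be between 1 and n_markers")
    return np.unique(np.linspace(0, n_markers - 1, num=m).round().astype(int))


def sample_training_set(n_lines, n_train, rng):
    """Draw n_train lines at random for training; the remaining lines form the test set."""
    perm = rng.permutation(n_lines)
    return perm[:n_train], perm[n_train:]


rng = np.random.default_rng(0)
for n_train in (48, 80, 112, 144):
    train_idx, test_idx = sample_training_set(180, n_train, rng)
    # A prediction model would be fitted on train_idx and evaluated on test_idx here.

marker_idx = evenly_spaced_subset(13678, 1000)
print(len(marker_idx), marker_idx[:3])
```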
We also evaluated the prediction accuracy using an independent validation population. The marker effects were estimated based on RR-BLUP and the TN DH population. Marker effects were used to predict the performance of the 117 individuals of the validation population. The prediction accuracy was again estimated as the Pearson correlation coefficient between predicted and observed values standardized with the square root of the heritability. Heritability was estimated using the variance components estimated for the TN DH population.
Intensive field evaluation of the TN DH population resulted in high-quality phenotypic data
We combined the information on seed protein content with previously published data for other seed quality traits of the TN DH population. We observed a wide variation of BLUEs approximating a normal distribution for most traits, except for erucic acid content (Fig 1, S3 Table). The analyses across environments revealed significant (P<0.001) variances for genotypes, environments, and interactions between genotypes and environments (Table 1). Broad-sense heritability estimates were high for the six traits, ranging from 0.81 for protein content to 0.98 for erucic acid content. Consequently, the intensive phenotyping resulted in high-quality data representing an excellent source for dissecting the genetic basis of the six traits.
All correlations passed significance tests with P-values less than 0.001 except for the correlation between protein content and erucic acid, glucosinolates, and stearic acid content.
In total, 80% of the pairwise trait comparisons were significantly (P<0.001) associated with Pearson moment correlation coefficients ranging from -0.84 between erucic acid content and stearic acid content to 0.66 between erucic acid content and glucosinolate content (Fig 1). Interestingly, protein content was only poorly associated with erucic acid, glucosinolate, and stearic acid content. This lack of associations points to independent biochemical pathways and genes controlling the two classes of traits.
Large differences in the complexity of the genetic architecture of the six seed quality traits
Altogether, 151 SNP markers passed the FDR significance level of P<0.1 in the genome-wide QTL mapping scan (Figs 2 and S1). The QTL numbers for the six traits ranged from 8 to 59 and were distributed across 19 chromosomes of B. napus. Phenotypic variance explained by a single putative QTL exceeded 5% for 27 SNPs and reached 45% for a QTL located on chromosome C03 controlling erucic acid content (Table 2). A second major QTL was detected on chromosome A08 for erucic acid content, explaining 31% of the phenotypic variance. However, the majority of the QTLs, especially those influencing oil and protein content, exhibited only minor effects. Among the detected QTLs, seven were putative pleiotropic QTLs influencing two traits. For instance, the marker “Bn-scaff_15794_1-p347392”, which was physically aligned to C03 and detected as a putative pleiotropic QTL, explained 26% and 45% of the phenotypic variance for stearic acid and erucic acid concentration, respectively.
The x-axis represents the physical position of each of the 13,678 SNPs across the genome from chromosome A01 to A10 and C01 to C09. Markers without unique alignment to the reference genome were arranged on the axis section labeled “not assigned”. The y-axis represents the false-discovery rate (FDR) of each QTL, indicating the significance for QTL calling. The PVE, i.e. the proportion of the phenotypic variance explained by each QTL, is listed in Table 2.
We used five-fold cross-validation to reliably estimate the potential of marker-assisted selection (MAS). The average accuracy of MAS ranged from 0.47 for protein content to 0.81 for erucic acid content (Table 3). These values were substantially lower compared to the non-cross-validated results (Table 2), underlining the need to validate findings of linkage mapping.
Accuracies of genome-wide prediction in the TN DH population
We used four different models to investigate the efficiency of genome-wide prediction for the six seed quality traits. Genomic selection showed significantly higher prediction accuracies than MAS for all traits, with the most pronounced differences observed for linolenic acid, oil, and protein content (Table 3). The average prediction accuracy of RR-BLUP was the highest, while BayesCπ performed best for erucic acid and glucosinolate content. The most complex model comprising main and epistatic effects, EG-BLUP, performed best for linolenic acid content. In general, traits with high heritability could be predicted with higher accuracy compared to traits with low heritability.
As expected for a bi-parental mapping population, a large number of markers were in tight LD and could thus be grouped into genetic bins because of the absence of recombination events. We reduced the co-linearity among markers and removed redundant markers in full linkage disequilibrium, resulting in a subset of 1,527 SNP markers (S2 Table, S2 Fig). Prediction accuracy increased on average by 3% using the reduced 1,527 representative marker set compared to genomic selection based on all SNPs (Table 3).
Effects of marker density, training population size, and number of environments on prediction accuracy
Genome-wide prediction based on RR-BLUP performed best on average and, in addition, was computationally efficient. Therefore, we conducted comprehensive analyses on the factors driving the accuracy in genome-wide prediction exclusively based on RR-BLUP. We varied the training population size and marker density and examined the accuracy of genome-wide predictions in our study. The accuracy remained in the range of 0.44 to 0.67 for all traits using only 48 lines as the training set (Fig 3). Interestingly, prediction accuracy reached a peak with 1,000 randomly selected markers and decreased only marginally for a subset of 100 markers. The prediction accuracy increased by ~4% for all six traits when using a representative set of markers compared to the 1,527 random evenly distributed markers (Table 2). Thus, our results indicated that to improve the accuracy of genome-wide prediction in a bi-parental population, the population size is more important than the density of markers.
Average prediction accuracy of genomic selection applying RR-BLUP based on (a) varying training population sizes and (b) number of markers.
We further studied the effects of the number of environments and training population size on the accuracy of genomic selection by focusing on oil and protein content. These traits were selected because oil content was evaluated in all 11 environments and protein content exhibited a high heritability. We randomly selected training sets comprising n = 48, 80, 112, and 144 lines evaluated in subsets of environments (k = 2, 3,…, 11 for oil content; k = 2, 3, 4, 5 for protein content). The accuracy was estimated as the Pearson moment correlation coefficient between predicted genotypic values and the adjusted entry means of all remaining lines evaluated across all environments. This type of cross-validation allows for the study of the prediction accuracy assuming reduced phenotyping intensity. As the test set was not evaluated in any of the environments, its performance could not be estimated by phenotypic correlations between environments. The prediction accuracies based on phenotypic data from only two environments were 0.73 for oil content and 0.60 for protein content (Fig 4). Compared to the accuracy evaluated with the full dataset, the accuracy decreased only in the range of 3% to 6%. The accuracy remained at 0.55 for oil content when only 48 lines and 2 environments were used.
Accuracies of genome-wide prediction for seed oil content and protein content validated in a diverse population of 117 B. napus lines
A panel of 117 diverse lines was genotyped and phenotyped in one environment in order to validate the prediction accuracies of seed oil content and protein content. A total of 1,148 genetic bin markers across the AC genome that were common to the two populations were identified. Since we observed the highest accuracies for RR-BLUP in the TN DH population, we also used this method for prediction. The prediction accuracy amounted to 0.14 for protein content and 0.17 for oil content based on the genetic bin markers.
Erucic acid, stearic acid, and glucosinolate content are promising targets for marker-assisted selection
Understanding the genetic basis of seed oil yield and quality is important for efficient rapeseed breeding. Previous studies revealed differences in the complexity of the genetic architecture of the six quality traits examined in our study [1, 3, 9–11, 17, 20, 22, 72], which were further substantiated using five-fold cross-validations (Table 2; S3 Table). Oil, protein, and linolenic acid content are characterized by the absence of a reliable large-effect QTL, while erucic acid, stearic acid, and glucosinolate content are to a large degree controlled by a few QTL exhibiting large effects. For instance, the major QTL located on A08 and C03 (Table 2) together explained 76.66% of the phenotypic variance for erucic acid, and these QTL have been widely identified previously in the TN DH population and other mapping populations of B. napus [21, 22, 24, 73]. The major QTL located on C03 and explaining 16.44% of the phenotypic variance for oil content was identified in both the TN DH population and the KN DH population. The QTL with large genetic effects for total seed glucosinolates, located on A08, C01, and C03, were also identified previously in this and other mapping populations [9, 19, 75]. These QTLs are interesting targets for marker-assisted selection, which can be applied in rapeseed breeding in combination with the enrichment of target alleles in F2 populations prior to producing DH populations. Besides the consistently identified QTL, we also detected several new minor-effect QTLs for these seed quality traits in the TN DH population compared to previous QTL studies in this population [10, 11, 22]. This is possibly due to the improved detection power of the high-density SNP markers compared to the relatively low-density markers used previously.
For example, the QTL “Bn-Scaffold000217-p20168” on C05 and “Bn-scaff_16130_1-p1013445” and “Bn-scaff_16130_1-p1039452” on C07 were newly identified for seed oil content in this population compared to those detected by Jiang et al. (2014). It is important to note that, due to the absence of a physical position for 2,828 SNPs without alignment to the reference genome, we could not compare QTL lacking a unique physical position with previous studies.
Genetic architecture marginally impacts the choice of the genome-wide prediction model
Previous simulation studies revealed that equal shrinkage of marker effects as applied in RR-BLUP can be inappropriate for traits influenced by QTLs exhibiting large effects [13, 76]. In these cases, Bayesian models such as BayesB or BayesCπ, which allow specific shrinkage of every marker, are expected to outperform RR-BLUP. The superiority of BayesB over RR-BLUP has been reported for glucosinolate content in a previous genome-wide prediction study based on a diverse panel of 391 rapeseed lines derived from nine families. Superiority of Bayes models versus RR-BLUP has also been observed for flowering time in the TN DH population. In accordance with this observation, prediction accuracies for erucic acid and glucosinolate content were maximized when applying BayesCπ, with improvements of 8–10% compared to RR-BLUP (Table 2). In contrast, for stearic acid, RR-BLUP outperformed BayesCπ despite the presence of large-effect QTL. This is most likely due to two reasons. First, the ratio between the phenotypic variance explained by the two large-effect QTL versus that explained by the remaining small-effect QTL is approximately 1 to 1 for stearic acid content, while this ratio is 5 to 1 for erucic acid and glucosinolate content. Second, one large-effect QTL controlling stearic acid is reflected in several marker-trait associations with SNPs being in tight linkage disequilibrium (r² > 0.8), while the QTLs are reflected by only a limited number of SNPs for erucic acid and glucosinolate content.
Epistasis, the interaction between genes, is an additional potential force influencing the choice of the biometrical model for genome-wide prediction. Previous linkage and linkage disequilibrium mapping studies in rapeseed indicated that epistatic effects are involved in fatty acid metabolism [11, 47]. Consequently, we implemented EG-BLUP for genome-wide prediction, which explicitly considers digenic additive by additive epistatic effects. We observed, however, higher prediction accuracies of EG-BLUP compared to the other genome-wide prediction models only for linolenic acid content (Table 2). Moreover, the gains in prediction accuracy were only marginal. These negligible benefits are in contrast to the non-cross-validated results of previous linkage and linkage disequilibrium studies [11, 47] and point to the strong need to validate the role of epistatic effects. In summary, the accuracy of genomic selection does not crucially depend on the choice of a suitable genome-wide prediction model and is an attractive alternative to marker-assisted selection.
Implementation of genome-wide prediction in rapeseed breeding
The successful implementation of genome-wide prediction in rapeseed breeding requires that a certain threshold of prediction accuracy is realized [40, 79]. Previous model studies in wheat and maize suggested a threshold for the prediction accuracy of 0.5 [80, 81]. We chose two important traits, oil content and protein content, to illustrate the size of the training population, the number of environments, and the marker density required to reach a prediction accuracy of 0.5 for the bi-parental population.
In accordance with previous studies based on bi-parental populations [82–85], approximately one thousand markers were required before the prediction accuracy plateaued (Fig 3). Increasing the number of markers introduced problems due to collinearities. Prediction accuracies were higher for a reduced set of 1,527 SNPs, which represented recombination loci in the population, than for the full 13,678-marker set (Table 3). Thus, to improve the accuracy of genome-wide prediction in a bi-parental population, the population size, which determines the number of recombination events captured, is more important than the density of markers.
The number of lines has a greater impact on the prediction accuracy than the number of environments (Figs 3 and 4). The prediction accuracy is already stagnating at three environments, and thus it is more efficient to invest in training population size. For protein content, approximately 144 lines evaluated in two environments were needed to reach an accuracy of 0.6. For oil content, prediction accuracy amounted to 0.6 when the training population was decreased to 80 lines and the number of environments reduced to two. These results suggest that genome-wide prediction can be successfully implemented in bi-parental populations even with small training population sizes and is an attractive complement to phenotypic selection to improve seed quality traits.
The prediction accuracy within bi-parental populations is of central importance when examining the potential to implement genome-wide prediction in breeding programs exploiting the doubled haploid technology. Moreover, it is of interest to study the potential to use the prediction model in unrelated populations as well. We examined an extreme validation scenario for the prediction of seed oil and protein content using a genetically diverse sample of 117 lines derived from crosses between B. rapa and B. carinata accessions [55, 56]. The prediction accuracy in this independent and genetically very distinct validation population still amounted to 0.14 for protein content and 0.17 for oil content. When interpreting the prediction accuracies, it has to be considered that the validation population carries genome segments from B. rapa and B. carinata. However, the Brassica 60K-SNP array used here was developed based on the AC genome sequences of B. rapa, B. oleracea, and B. napus. Thus, the lack of unique polymorphisms of B. carinata is expected to impair the prediction accuracies. Taking this into consideration, our independent validation reflects the high quality of the developed calibration models even in very diverse backgrounds, highlighting the prospects of genome-wide prediction for routine rapeseed breeding programs.
S1 Fig. Quantile-quantile plots of association mapping for six traits using different methods.
The green lines show the -log10 P-values of the linear regression method. The red lines show the -log10 P-values of the stepwise multiple linear regression method. The -log10 P-values expected under the null hypothesis, that is, for uniformly distributed P-values, are indicated by the blue diagonal line.
S2 Fig. Decay of linkage disequilibrium with physical distance.
Within each physical distance class, marker pairs are clustered into five groups with varying r² values.
S1 Table. Locations, years and environments for the field experiment.
S2 Table. The physical alignment information of the SNPs of the TN DH population to the reference "Darmor-bzh" genome of B. napus.
The authors gratefully acknowledge the former colleagues who contributed to collecting the phenotypes and genotypes of the TN DH population (Dr. Dan Qiu, Dr. Congcong Jiang, Dr. Yan Long, Dr. Ruiyuan Li, Dr. Ji Feng, Dr. Jiaqin Shi and others). We also acknowledge Dr. Bin Yi for his technical help with the SNP chip analysis.
- Conceptualization: JZ JCR.
- Data curation: JZ MW JLM.
- Formal analysis: YSZ JZ MW.
- Funding acquisition: JZ.
- Methodology: YSZ JZ PFL JCR.
- Project administration: JZ JCR.
- Resources: JLM JZ LS XHW.
- Software: YSZ JCR JZ.
- Supervision: JZ JCR.
- Validation: YSZ MW PFL JZ.
- Writing – original draft: JZ YSZ.
- Writing – review & editing: JCR JLM.
- 1. Delourme R, Falentin C, Huteau V, Clouet V, Horvais R, Gandon B, et al. Genetic control of oil content in oilseed rape (Brassica napus L.). Theoretical and Applied Genetics. 2006;113(7):1331–45. pmid:WOS:000241261000014.
- 2. Möllers C. Potential and future prospects for rapeseed oil. In: Gunstone FD (ed) Rapeseed and canola oil—production, processing, properties and uses. Oxford, UK: Blackwell Publishing; 2004.
- 3. Zhao JY, Dimov Z, Becker HC, Ecke WG, Mollers C. Mapping QTL controlling fatty acid composition in a doubled haploid rapeseed population segregating for oil content. Mol Breeding. 2008;21(1):115–25. pmid:WOS:000251321400010.
- 4. Abbadi A, Leckband G. Rapeseed breeding for oil content, quality, and sustainability. Eur J Lipid Sci Tech. 2011;113(10):1198–206. pmid:WOS:000297012300004.
- 5. Velasco L, Becker H. Estimating the fatty acid composition of the oil in intact-seed rapeseed (Brassica napus L.) by near-infrared reflectance spectroscopy. Euphytica. 1998;101(2):221–30.
- 6. Bell JM. Nutrients and toxicants in rapeseed meal: a review. Journal of animal science. 1984;58(4):996–1010. pmid:6202670.
- 7. Liu Z, Hirani AH, McVetty PBE, Daayf F, Quiros CF, Li GY. Reducing progoitrin and enriching glucoraphanin in Brassica napus seeds through silencing of the GSL-ALK gene family. Plant Mol Biol. 2012;79(1–2):179–89. pmid:WOS:000303410400014.
- 8. Vageeshbabu HS, Chopra VL. Genetic and biotechnological approaches for reducing glucosinolates from rapeseed-mustard meal. Plant Biochemistry and Biotechnology. 1997;6(2):53–62.
- 9. Feng J, Long Y, Shi L, Shi JQ, Barker G, Meng JL. Characterization of metabolite quantitative trait loci and metabolic networks that control glucosinolate concentration in the seeds and leaves of Brassica napus. New Phytol. 2012;193(1):96–108. pmid:WOS:000298300800014.
- 10. Jiang CC, Shi JQ, Li RY, Long Y, Wang H, Li DR, et al. Quantitative trait loci that control the oil content variation of rapeseed (Brassica napus L.). Theoretical and Applied Genetics. 2014;127(4):957–68. pmid:WOS:000333353400016.
- 11. Wang XD, Long Y, Yin YT, Zhang CY, Gan L, Liu LZ, et al. New insights into the genetic networks affecting seed fatty acid concentrations in Brassica napus. Bmc Plant Biol. 2015;15. pmid:WOS:000351903900001.
- 12. Lande R, Thompson R. Efficiency of marker-assisted selection in the improvement of quantitative traits. Genetics. 1990;124(3):743–56. pmid:1968875; PubMed Central PMCID: PMC1203965.
- 13. Meuwissen TH, Hayes BJ, Goddard ME. Prediction of total genetic value using genome-wide dense marker maps. Genetics. 2001;157(4):1819–29. pmid:11290733; PubMed Central PMCID: PMC1461589.
- 14. Bernardo R. Molecular markers and selection for complex traits in plants: Learning from the last 20 years. Crop Sci. 2008;48(5):1649–64. pmid:WOS:000259792100001.
- 15. Zhao Y, Mette MF, Gowda M, Longin CFH, Reif JC. Bridging the gap between marker-assisted and genomic selection of heading time and plant height in hybrid wheat. Heredity. 2014;112(6):638–45. pmid:WOS:000336501000009.
- 16. Heslot N, Jannink JL. An alternative covariance estimator to investigate genetic heterogeneity in populations. Genet Sel Evol. 2015;47:93. pmid:26612537; PubMed Central PMCID: PMC4661961.
- 17. Burns MJ, Barnes SR, Bowman JG, Clarke MHE, Werner CP, Kearsey MJ. QTL analysis of an intervarietal set of substitution lines in Brassica napus: (i) Seed oil content and fatty acid composition. Heredity. 2003;90(1):39–48. pmid:WOS:000181165800008.
- 18. Chen YB, Qi L, Zhang XY, Huang JX, Wang JB, Chen HC, et al. Characterization of the quantitative trait locus OilA1 for oil content in Brassica napus. Theoretical and Applied Genetics. 2013;126(10):2499–509. pmid:WOS:000324873400006.
- 19. Gajardo HA, Wittkop B, Soto-Cerda B, Higgins EE, Parkin IAP, Snowdon RJ, et al. Association mapping of seed quality traits in Brassica napus L. using GWAS and candidate QTL approaches. Mol Breeding. 2015;35(6). pmid:WOS:000356310400017.
- 20. Hu XY, Sullivan-Gilbert M, Gupta M, Thompson SA. Mapping of the loci controlling oleic and linolenic acid contents and development of fad2 and fad3 allele-specific markers in canola (Brassica napus L.). Theoretical and Applied Genetics. 2006;113(3):497–507. pmid:WOS:000239002300012.
- 21. Lu GY, Harper AL, Trick M, Morgan C, Fraser F, O'Neill C, et al. Associative Transcriptomics Study Dissects the Genetic Architecture of Seed Glucosinolate Content in Brassica napus. DNA Res. 2014;21(6):613–25. pmid:WOS:000347101100004.
- 22. Qiu D, Morgan C, Shi J, Long Y, Liu J, Li R, et al. A comparative linkage map of oilseed rape and its use for QTL analysis of seed oil and erucic acid content. Theoretical and Applied Genetics. 2006;114(1):67–80. pmid:WOS:000242154000008.
- 23. Zou J, Jiang CC, Cao ZY, Li RY, Long Y, Chen S, et al. Association mapping of seed oil content in Brassica napus and comparison with quantitative trait loci identified from linkage mapping. Genome. 2010;53(11):908–16. pmid:WOS:000285555000007.
- 24. Li F, Chen BY, Xu K, Wu JF, Song WL, Bancroft I, et al. Genome-Wide Association Study Dissects the Genetic Architecture of Seed Weight and Seed Quality in Rapeseed (Brassica napus L.). DNA Res. 2014;21(4):355–67. pmid:WOS:000344243900002.
- 25. Downey RK, Craig BM. Genetic control of fatty acid biosynthesis in rapeseed (Brassica napus L). J Am Oil Chem Soc. 1964;41:475–478.
- 26. Fourmann M, Barret P, Renard M, Pelletier G, Delourme R, Brunel D. The two genes homologous to Arabidopsis FAE1 co-segregate with the two loci governing erucic acid content in Brassica napus. Theor Appl Genet. 1998;96:852–858.
- 27. Harper AL, Trick M, Higgins J, Fraser F, Clissold L, Wells R, et al. Associative transcriptomics of traits in the polyploid crop species Brassica napus. Nat Biotechnol. 2012;30(8):798–802. pmid:22820317.
- 28. Wu G, Wu Y, Xiao L, Li X, Lu C. Zero erucic acid trait of rapeseed (Brassica napus L.) results from a deletion of four base pairs in the fatty acid elongase 1 gene. Theor Appl Genet. 2008;116(4):491–9. pmid:18075728.
- 29. Barker GC, Larson TR, Graham IA, Lynn JR, King GJ. Novel insights into seed fatty acid synthesis and modification pathways from genetic diversity and quantitative trait loci analysis of the Brassica C genome. Plant Physiol. 2007;144(4):1827–42. pmid:17573542; PubMed Central PMCID: PMC1949901.
- 30. Gupta V, Mukhopadhyay A, Arumugam N, Sodhi YS, Pental D, Pradhan AK. Molecular tagging of erucic acid trait in oilseed mustard (Brassica juncea) by QTL mapping and single nucleotide polymorphisms in FAE1 gene. Theor Appl Genet. 2004;108(4):743–9. pmid:14564400.
- 31. Wurschum T, Abel S, Zhao YS. Potential of genomic selection in rapeseed (Brassica napus L.) breeding. Plant Breeding. 2014;133(1):45–51. pmid:WOS:000330800600006.
- 32. Raman H, Raman R, Coombes N, Song J, Prangnell R, Bandaranayake C, et al. Genome-wide association analyses reveal complex genetic architecture underlying natural variation for flowering time in canola. Plant Cell Environ. 2016;39(6):1228–39. pmid:26428711.
- 33. Zhong S, Dekkers JC, Fernando RL, Jannink JL. Factors affecting accuracy from genomic selection in populations derived from multiple inbred lines: a Barley case study. Genetics. 2009;182(1):355–64. pmid:19299342; PubMed Central PMCID: PMC2674832.
- 34. Rutkoski JE, Heffner EL, Sorrells ME. Genomic selection for durable stem rust resistance in wheat. Euphytica. 2011;179(1):161–73. pmid:WOS:000289305300015.
- 35. Crossa J, de los Campos G, Perez P, Gianola D, Burgueno J, Araus JL, et al. Prediction of Genetic Values of Quantitative Traits in Plant Breeding Using Pedigree and Molecular Markers. Genetics. 2010;186(2):713–U406. pmid:WOS:000282807400022.
- 36. Zhao YS, Li Z, Liu GZ, Jiang Y, Maurer HP, Wurschum T, et al. Genome-based establishment of a high-yielding heterotic pattern for hybrid wheat breeding. P Natl Acad Sci USA. 2015;112(51):15624–9. pmid:WOS:000366916000042.
- 37. Albrecht T, Wimmer V, Auinger HJ, Erbe M, Knaak C, Ouzunova M, et al. Genome-based prediction of testcross values in maize. Theoretical and Applied Genetics. 2011;123(2):339–50. pmid:WOS:000291600800012.
- 38. Bernardo R. Genomewide markers as cofactors for precision mapping of quantitative trait loci. Theoretical and Applied Genetics. 2013;126(4):999–1009. pmid:WOS:000316766000011.
- 39. Bernardo R. Genomewide Selection of Parental Inbreds: Classes of Loci and Virtual Biparental Populations. Crop Sci. 2014;54(6):2586–95. pmid:WOS:000346571900020.
- 40. Riedelsheimer C, Czedik-Eysenberg A, Grieder C, Lisec J, Technow F, Sulpice R, et al. Genomic and metabolic prediction of complex heterotic traits in hybrid maize. Nature genetics. 2012;44(2):217–20. pmid:22246502.
- 41. Zhao YS, Zeng J, Fernando R, Reif JC. Genomic Prediction of Hybrid Wheat Performance. Crop Sci. 2013;53(3):802–10. pmid:WOS:000319527000008.
- 42. Zhao YS, Gowda M, Wurschum T, Longin CFH, Korzun V, Kollers S, et al. Dissecting the genetic architecture of frost tolerance in Central European winter wheat. J Exp Bot. 2013;64(14):4453–60. pmid:WOS:000326724700025.
- 43. Spindel J, Begum H, Akdemir D, Virk P, Collard B, Redona E, et al. Genomic Selection and Association Mapping in Rice (Oryza sativa): Effect of Trait Genetic Architecture, Training Population Composition, Marker Number and Statistical Model on Accuracy of Rice Genomic Selection in Elite, Tropical Rice Breeding Lines. PLOS Genet. 2015;11(2). ARTN e1004982 pmid:WOS:000352081800038.
- 44. Reif JC, Zhao YS, Wurschum T, Gowda M, Hahn V. Genomic prediction of sunflower hybrid performance. Plant Breeding. 2013;132(1):107–14. pmid:WOS:000313893100014.
- 45. Hayes BJ, Cogan NOI, Pembleton LW, Goddard ME, Wang JP, Spangenberg GC, et al. Prospects for genomic selection in forage plant species. Plant Breeding. 2013;132(2):133–43. pmid:WOS:000317422300001.
- 46. Hofheinz N, Borchardt D, Weissleder K, Frisch M. Genome-based prediction of test cross performance in two subsequent breeding cycles. Theoretical and Applied Genetics. 2012;125(8):1639–45. pmid:WOS:000310952400004.
- 47. Wurschum T, Reif JC, Kraft T, Janssen G, Zhao Y. Genomic selection in sugar beet breeding populations. BMC genetics. 2013;14:85. pmid:24047500; PubMed Central PMCID: PMC3848454.
- 48. Bao Y, Vuong T, Meinhardt C, Tiffin P, Denny R, Chen SY, et al. Potential of Association Mapping and Genomic Selection to Explore PI 88788 Derived Soybean Cyst Nematode Resistance. Plant Genome-Us. 2014;7(3). pmid:WOS:000345157300003.
- 49. Shu YJ, Yu DS, Wang D, Bai X, Zhu YM, Guo CH. Genomic selection of seed weight based on low-density SCAR markers in soybean. Genet Mol Res. 2013;12(3):2178–88. pmid:WOS:000331717400002.
- 50. Li L, Long Y, Zhang LB, Dalton-Morgan J, Batley J, Yu LJ, et al. Genome Wide Analysis of Flowering Time Trait in Multiple Environments via High-Throughput Genotyping Technique in Brassica napus L. PLOS One. 2015;10(3). pmid:WOS:000351425400084.
- 51. Jan HU, Abbadi A, Lucke S, Nichols RA, Snowdon RJ. Genomic Prediction of Testcross Performance in Canola (Brassica napus). PLOS One. 2016;11(1). ARTN e0147769 pmid:WOS:000369528600057.
- 52. Zhang Y, Thomas CL, Xiang JX, Long Y, Wang XH, Zou J, et al. Construction of a high-density SNP-based genetic linkage map in Brassica napus and QTL meta-analysis of root traits under contrasting phosphorus supply in two growth systems. 2016:under review.
- 53. Liu LZ, Qu CM, Wittkop B, Yi B, Xiao Y, He YJ, et al. A High-Density SNP Map for Accurate Mapping of Seed Fibre QTL in Brassica napus L. PLOS One. 2013;8(12). pmid:WOS:000329116700026.
- 54. Qian LW, Qian W, Snowdon RJ. Sub-genomic selection patterns as a signature of breeding in the allopolyploid Brassica napus genome. Bmc Genomics. 2014;15. pmid:WOS:000347731500002.
- 55. Xiao Y, Chen L, Zou J, Tian E, Xia W, Meng J. Development of a population for substantial new type Brassica napus diversified at both A/C genomes. Theor Appl Genet. 2010;121:1141–1150. pmid:WOS:000281794100012.
- 56. Zou J, Zhu J, Huang S, Tian E, Xiao Y, Fu D, et al. Broadening the avenue of intersubgenomic heterosis in oilseed Brassica. Theor Appl Genet. 2010;120:283–290. pmid:WOS:00027280370000951.
- 57. Butler D, Cullis B, Gilmour A, Gogel B. ASReml-R reference manual, version 3. Brisbane: Queensland Department of Primary Industries and Fisheries; 2009.
- 58. Chalhoub B, Denoeud F, Liu SY, Parkin IAP, Tang HB, Wang XY, et al. Early allopolyploid evolution in the post-Neolithic Brassica napus oilseed genome. Science. 2014;345(6199):950–3. pmid:WOS:000340524700051.
- 59. Warnes GR. “The Genetics Package,” R News; 2003.
- 60. Falconer D, Mackay T. Introduction to Quantitative Genetics. 4th ed. Harlow: Longman; 1996.
- 61. Schwarz G. Estimating the Dimension of a Model. Ann Statist. 1978;6:461–4.
- 62. Utz HF, Melchinger AE, Schon CC. Bias and Sampling Error of the Estimated Proportion of Genotypic Variance Explained by Quantitative Trait Loci Determined From Experimental Data in Maize Using Cross Validation and Validation With Independent Samples. Genetics. 2000;154(3):1839–49. pmid:10866652; PubMed Central PMCID: PMC1461020.
- 63. Whittaker JC, Thompson R, Denham MC. Marker-assisted selection using ridge regression. Genet Res. 2000;75(2):249–52. pmid:10816982.
- 64. Habier D, Fernando RL, Kizilkaya K, Garrick DJ. Extension of the bayesian alphabet for genomic selection. BMC bioinformatics. 2011;12:186. pmid:21605355; PubMed Central PMCID: PMC3144464.
- 65. Jiang Y, Reif JC. Modeling Epistasis in Genomic Selection. Genetics. 2015;201(2):759–+. pmid:WOS:000362838500030.
- 66. Zhao J, Huang J, Chen F, Xu F, Ni X, Xu H, et al. Molecular mapping of Arabidopsis thaliana lipid-related orthologous genes in Brassica napus. Theor Appl Genet. 2012;124(2):407–21. pmid:21993634.
- 67. Sun M, Hua W, Liu J, Huang S, Wang X, Liu G, et al. Design of new genome- and gene-sourced primers and identification of QTL for seed oil content in a specially high-oil Brassica napus cultivar. PLOS One. 2012;7(10):e47037. 10.1371. pmid:23077542 PMCID: PMC3470593.
- 68. Körber N, Bus A, Li J, Parkin IA, Wittkop B, Snowdon RJ, et al. Agronomic and Seed Quality Traits Dissected by Genome-Wide Association Mapping in Brassica napus. Front Plant Sci. 2016;7:386. 10.3389. pmid:27066036 PMCID: PMC4814720.
- 69. Xu JF, Long Y, Wu JG, Xu HM, Wen J, Meng J, et al. QTL mapping and analysis of the embryo and maternal plant for three limiting amino acids in rapeseed meal. Eur Food Res Technol. 2015;240:147–158.
- 70. Huang XQ, Huang T, Hou GZ, Li L, Hou Y, Lu YH. Identification of QTLs for seed quality traits in rapeseed (Brassica napus L.) using recombinant inbred lines (RILs). Euphytica. 2016;210:1–16.
- 71. Wen J, Xu JF, Long Y, Wu JG, Xu HM, Meng JL, et al. QTL mapping based on the embryo and maternal genetic systems for non-essential amino acids in rapeseed (Brassica napus L.) meal. J Sci Food Agric. 2016;96(2):465–73. doi: 10.1002 pmid:25645377.
- 72. Zhao JY, Becker HC, Zhang DQ, Zhang YF, Ecke W. Oil content in a European x Chinese rapeseed population: QTL with additive and epistatic effects and their genotype-environment interactions. Crop Sci. 2005;45(1):51–9. pmid:WOS:000226435300007.
- 73. Smooker AM, Wells R, Morgan C, Beaudoin F, Cho K, Fraser F, et al. The identification and mapping of candidate genes and QTL involved in the fatty acid desaturation pathway in Brassica napus. Theoretical and Applied Genetics. 2011;122(6):1075–90. pmid:21184048.
- 74. Wang XD, Wang H, Long Y, Li D, Yin Y, Tian J, et al. Identification of QTLs associated with oil content in a high-oil Brassica napus cultivar and construction of a high-density consensus map for QTLs comparison in B. napus. PLOS One. 2013;8(12):e80569. pmid:24312482; PubMed Central PMCID: PMC3846612.
- 75. Fu Y, Lu K, Qian LW, Mei JQ, Wei DY, Peng XH, et al. Development of genic cleavage markers in association with seed glucosinolate content in canola. Theoretical and Applied Genetics. 2015;128(6):1029–37. pmid:WOS:000354633800003.
- 76. Xu R. Measuring explained variation in linear mixed effects models. Statistics in medicine. 2003;22(22):3527–41. pmid:14601017.
- 77. Gianola D, de los Campos G, Hill WG, Manfredi E, Fernando R. Additive genetic variability and the Bayesian alphabet. Genetics. 2009;183(1):347–63. pmid:19620397; PubMed Central PMCID: PMC2746159.
- 78. Park YC. Theory for the number of genes affecting quantitative characters: II. Biases from drift, dominance, inequality of gene effects, linkage disequilibrium and epistasis. Theoretical and Applied Genetics. 1977;50(4):163–72. pmid:24407765.
- 79. Albrecht T, Auinger HJ, Wimmer V, Ogutu JO, Knaak C, Ouzunova M, et al. Genome-based prediction of maize hybrid performance across genetic groups, testers, locations, and years. Theoretical and Applied Genetics. 2014;127(6):1375–86. pmid:WOS:000336756200009.
- 80. Heffner EL, Lorenz AJ, Jannink JL, Sorrells ME. Plant Breeding with Genomic Selection: Gain per Unit Time and Cost. Crop Sci. 2010;50(5):1681–90. pmid:WOS:000281060300010.
- 81. Longin CFH, Mi XF, Wurschum T. Genomic selection in wheat: optimum allocation of test resources and comparison of breeding strategies for line and hybrid breeding. Theoretical and Applied Genetics. 2015;128(7):1297–306. pmid:WOS:000356148000006.
- 82. Lorenzana RE, Bernardo R. Accuracy of genotypic value predictions for marker-based selection in biparental plant populations. Theoretical and Applied Genetics. 2009;120(1):151–61. pmid:WOS:000271939900013.
- 83. Heffner EL, Jannink JL, Iwata H, Souza E, Sorrells ME. Genomic Selection Accuracy for Grain Quality Traits in Biparental Wheat Populations. Crop Sci. 2011;51(6):2597–606. pmid:WOS:000295839200031.
- 84. Hickey JM, Dreisigacker S, Crossa J, Hearne S, Babu R, Prasanna BM, et al. Evaluation of Genomic Selection Training Population Designs and Genotyping Strategies in Plant Breeding Programs Using Simulation. Crop Sci. 2014;54(4):1476–88. pmid:WOS:000338773100018.
- 85. Zhang X, Perez-Rodriguez P, Semagn K, Beyene Y, Babu R, Lopez-Cruz MA, et al. Genomic prediction in biparental tropical maize populations in water-stressed and well-watered environments using low-density and GBS SNPs. Heredity. 2015;114(3):291–9. pmid:WOS:000349671000006.