Unraveling the Genetic Basis of Seed Tocopherol Content and Composition in Rapeseed (Brassica napus L.)

Background: Tocopherols are important antioxidants in vegetable oils; as vitamin E, they are an essential nutrient for humans and livestock. Rapeseed (Brassica napus L., AACC, 2n = 38) is one of the most important oil crops and a major source of tocopherols. Although the tocopherol biosynthetic pathway has been well elucidated in the model photosynthetic organisms Arabidopsis thaliana and Synechocystis sp. PCC6803, knowledge about the genetic basis of tocopherol biosynthesis in seeds of rapeseed is scant. This project was carried out to dissect the genetic basis of seed tocopherol content and composition in rapeseed through quantitative trait loci (QTL) detection, genome-wide association analysis, and homologous gene mapping.

Methodology/Principal Findings: We used a segregating Tapidor × Ningyou7 doubled haploid (TNDH) population, its reconstructed F2 (RC-F2) population, and a panel of 142 rapeseed accessions (association panel). Genetic effects mainly contributed to phenotypic variations in tocopherol content and composition; environmental effects were also identified. Thirty-three unique QTL were detected for tocopherol content and composition in TNDH and RC-F2 populations. Of these, seven QTL co-localized with candidate sequences associated with tocopherol biosynthesis through in silico and linkage mapping. Several near-isogenic lines carrying introgressions from the parent with higher tocopherol content showed highly increased tocopherol content compared with the recurrent parent. Genome-wide association analysis was performed with 142 B. napus accessions. Sixty-one loci were significantly associated with tocopherol content and composition, 11 of which were localized within the confidence intervals of tocopherol QTL.

Conclusions/Significance: This joint QTL, candidate gene, and association mapping study sheds light on the genetic basis of seed tocopherol biosynthesis in rapeseed. The sequences presented here may be used for marker-assisted selection of oilseed rape lines with superior tocopherol content and composition.


Introduction
Vitamin E is an essential micronutrient for humans and other mammals, which have no ability to synthesize it. Vitamin E only accumulates in photosynthetic organisms, in which it consists of tocopherols and tocotrienols, a group of amphipathic molecules composed of a polar chromanol head group derived from the shikimate (SK) pathway and a polyprenyl lipophilic side chain from the methylerythritol phosphate (MEP) pathway and chlorophyll degradation; these amphipathic molecules differ in the degree of saturation of their aliphatic tails. Within the tocopherols and tocotrienols, four forms (α, β, γ, δ) vary in the number and position of methyl groups on the chromanol ring. The α-tocopherol form is considered to have the highest nutritional value for humans and livestock [1-4]. Tocopherols are the major form of vitamin E in the seeds of dicots, but tocotrienols exist widely in the seeds of monocots [5,6]. Besides its nutritional value, vitamin E is also a major natural antioxidant in seed oils, making it critical for polyunsaturated fatty acid stability [5]. Due to its benefits for health and oil quality, improving the content or composition of vitamin E in staple crops has been a major aim of crop breeding [7].
The vitamin E biosynthetic pathway has been well elucidated in the model species Arabidopsis thaliana and Synechocystis sp. PCC6803 [1,6,8-14]. Most of the genes encoding the key enzymes of the core biosynthetic pathway have been identified and functionally characterized, and select genes have been transformed and overexpressed individually or collectively in various plants. For example, the γ-tocopherol methyltransferase gene VTE4 has been overexpressed in the seed of A. thaliana to elevate α-tocopherol content (aTC), and co-overexpression of the 4-hydroxyphenylpyruvate dioxygenase gene PDS1 and the homogentisate phytyltransferase gene VTE2 has been carried out in rapeseed (Brassica napus L.) seeds to increase the total tocopherol content (TTC) [15,16]. However, the results were inconsistent between experiments. It is possible to alter the tocopherol composition (TCO) by seed-specific overexpression of VTE4, resulting in nearly complete conversion of γ-tocopherol to α-tocopherol in the seeds of A. thaliana. In contrast, it is relatively difficult to significantly increase the TTC in rapeseed. The most successful attempt at increasing TTC in rapeseed achieved a nearly two-fold enhancement after co-transformation with the A. thaliana genes HPPD and VTE2 [11,15-18]. These observations indicate that the tocopherol biosynthetic pathway and its regulation are more complex than supposed to date.
Edible oils are major dietary sources of vitamin E [19]. Rapeseed is one of the most important oil crops, and is grown mostly in temperate climates worldwide. The most abundant types of vitamin E in rapeseed oil are α- and γ-tocopherol, as well as a small proportion of δ-tocopherol [20-22]. Seeds of rapeseed vary widely in terms of tocopherol content and composition. In a study by Goffman and Becker (2002) in a germplasm collection of 87 rapeseed accessions, the TTC in seeds of rapeseed ranged from 182 to 367 ppm [19]. Fritsche et al. (2012) reported an even broader range of variation (197.5 to 460.1 ppm) for TTC in one of the investigated germplasm collections [23]. These variations provide an incentive for breeding varieties with superior tocopherol content. The genetic mechanism for this large variation remains unclear.
Quantitative genetic approaches, which map quantitative trait loci (QTL) onto linkage maps or which detect associations between markers and phenotypes, are powerful methods to dissect complex metabolic traits [24]. As an example, Wentzell et al. (2007) found that all glucosinolate expression QTL coincided with glucosinolate metabolic QTL in A. thaliana, indicating that metabolic QTL regions may encompass candidate genes for the respective metabolic pathway [25]. Gilliland et al. (2006) detected Arabidopsis QTL associated with tocopherol content and composition, and identified 14 QTL affecting tocopherol content and composition in seeds. Of these 14 QTL, five contained tocopherol biosynthesis candidate genes, implying that QTL mapping has the power to uncover genetic variations that previously had not been well characterized, with the exception of variations caused by mutations of known genes [26]. To date, a single study of QTL mapping associated with seed tocopherol content and composition has been carried out in rapeseed. Marwede et al. (2005) used a segregating doubled haploid population of rapeseed to identify eight QTL distributed on six linkage groups. Furthermore, Marwede et al. demonstrated that seed tocopherol content and composition were significantly affected by genotype, environment, and strong genotype × environment interactions. The authors reasoned that only a small number of genes are involved in tocopherol biosynthesis. However, the use of only one doubled haploid population with quite low phenotypic variation was a limitation of that investigation [27].
Association analysis based on linkage disequilibrium (LD) is another strategy for dissecting quantitative inheritance. There are two association analysis approaches: candidate-gene association analysis and genome-wide association study (GWAS). Candidate-gene association analysis, which is applicable to relatively simple or well-dissected biosynthesis pathways, is based on the notion that sequence variations within candidate genes cause phenotypic variation. In contrast, GWASs rely on a very high marker density to tag any region of the genome [28]. This method has been applied in many crops due to the benefit of detecting phenotype-associated polymorphisms without constructing mapping populations [29,30]. One limitation of association analysis is that collections of genotypes, especially large collections with abundant genetic diversity, may have different population structures caused mainly by local adaptation or a variety of selections and familial relatedness from recent co-ancestry, which can give rise to spurious associations [31]. Nevertheless, a series of statistical programs and methods, such as STRUCTURE, SPAGeDi, the unified mixed model, and principal component analysis (PCA), have been developed to overcome this limitation [32-34]. Rare sequence variants are another problem. The basic premise of GWAS is that common genetic variations explain quantitative trait variations; rare alleles will therefore decrease the detection power of GWAS [35]. The extent of LD determines the resolution of an association analysis. Slow LD decay (extensive LD) implies low mapping resolution, whereas rapid LD decay means that a higher marker density is needed [36,37]. Two GWASs previously investigated population structures and LD to demonstrate that rapeseed populations are highly structured and that LD decays rapidly [38,39].
Rapeseed and A. thaliana share a common ancestor that existed 14-20 million years ago [40]. Comparative alignment analysis between A. thaliana and rapeseed and in silico mapping of A. thaliana genes onto the rapeseed linkage map previously enabled efficient QTL mapping in rapeseed [41]. In contrast to A. thaliana, rapeseed is a polyploid species with a genome that is ten times larger [42]. On average, 2-6 copies of each A. thaliana gene occur in the rapeseed genome [43,44]. Correspondingly, the number of tocopherol biosynthesis genes in rapeseed is expected to be much higher than that in A. thaliana.
In this report, we describe the genetic architecture of seed tocopherol content and composition in rapeseed using bi-parental populations and a worldwide panel of rapeseed accessions. This combination of QTL and association mapping analysis provides a detailed picture of the network of tocopherol biosynthesis genes in rapeseed.

Phenotypic Diversity and Trait Correlation
In a first experiment, a Tapidor × Ningyou7 doubled haploid population (TNDH), its reconstructed F2 (RC-F2) population, and both parents were grown under four environments over two growing seasons. Moderate differences were observed between Tapidor and Ningyou7 for aTC, γ-tocopherol content (cTC), TTC, and TCO. The winter-type parent Tapidor had a consistently higher TTC than the semi-winter-type parent Ningyou7: the TTC ranged from 349 ppm to 355 ppm in Tapidor but from 330 ppm to 331 ppm in Ningyou7 across the different environments (Figure S1). Broad variations occurred in tocopherol content and composition, with normal or near-normal distributions, and extreme values at both ends of the distributions exceeded the extreme values of both parental distributions, indicating transgressive segregation (Figure S2A-P). Analysis of variance (ANOVA) revealed highly significant (P < 0.0001) genetic and environmental effects for tocopherol content and composition. Heritabilities were high, ranging between 0.65 and 0.78 for tocopherol traits (Table 1). Furthermore, we found high genetic correlations between cTC and TTC (0.91) as well as TCO (0.78; Table 2).
In a second experiment, 142 rapeseed accessions from all over the world were grown under two environments over two growing seasons. Large variations in tocopherol content and composition were observed. Most remarkably, the TTC varied from 181 ppm to 460 ppm (Figure S2Q-X). As in the TNDH population, genetic variances were highly significant. The broad-sense heritabilities were even higher, ranging between 0.65 and 0.85 (Table 1). Interestingly, the genetic correlations differed considerably from those in the TNDH population. The highest correlation was detected between aTC and TCO (0.93), while the correlation between cTC and TTC was only 0.43 (Table 2).

QTL Mapping for Tocopherol Content and Composition in the TNDH and RC-F 2 Populations
We calculated QTL for aTC, cTC, TTC, and TCO. Phenotypic data were taken from the TNDH population (three environments) and from the RC-F2 population (one environment). A total of 57 QTL were detected, with 53 QTL in the TNDH population and four QTL in the RC-F2 population. These 57 QTL explained 5.0-20.3% of the phenotypic variation in tocopherol content and composition, and 70% of the QTL exerted modest effects, with R² < 10%. For most of the QTL (63%), Tapidor alleles caused an elevation in tocopherol contents, which was in accordance with the higher tocopherol contents of this parent (Table S1). All QTL were distributed across ten linkage groups (A2, A3, A5, A7, A9, A10, C2, C3, C8, and C9), with four main QTL clusters on A3, A7, A9, and C3 (Figure S3). QTL clusters on linkage groups A7 and A9 (qTOC.A7 and qTOC.A9), which were mainly associated with aTC and TTC, were detected in both populations.
Doubled haploid plants carrying the positive Tapidor QTL alleles (qTOC.A7 and qTOC.A9) were selected for backcrossing with the recurrent parent Ningyou7, which carries the negative alleles in these QTL regions. We investigated 63 BC4F2 lines and 265 BC4F3 lines via molecular markers located in or near qTOC.A7 and qTOC.A9. We confirmed that these lines carried the positive QTL alleles from Tapidor in qTOC.A7 and qTOC.A9. The lines were grown in the field and the greenhouse, and tocopherol contents were measured. As a result, 14 near-isogenic lines (NILs) containing the Tapidor QTL alleles had significantly higher aTC or TTC (Table 3 and Figure S4).
The meta-analysis, which was used to integrate QTL for various traits in different environments into unique QTL, was carried out in two steps. First, QTL detected for the same trait in different environments were integrated into nonredundant QTL. In the second step, QTL for different traits were merged into unique QTL. We identified 47 nonredundant QTL, of which 15, 11, 12, and nine were associated with aTC, cTC, TTC, and TCO, respectively (Table S2). Further analysis of the nonredundant QTL revealed 33 unique QTL in the TNDH and RC-F2 populations. Of these, 16 unique QTL were found to exert pleiotropic effects, as each of them was associated with two or more tocopherol traits (Table 4). Investigation of epistatic interactions revealed six significantly interacting pairs of loci controlling tocopherol content and composition in the TNDH and RC-F2 populations in various environments. Six QTL were involved in these interactions, including one QTL/QTL interaction, four QTL/non-QTL interactions, and one non-QTL/non-QTL interaction. Interactions were mainly additive × additive. TTC and cTC were significantly affected by additive × additive epistatic effects, which explained 9.5-20.8% of the genotypic variance (Table S3).
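The overlap criterion behind the two-step integration can be illustrated with a minimal sketch. It assumes each QTL is reduced to a linkage group and a confidence interval in cM, and it merges intervals that overlap on the same linkage group. This is only the overlap rule, not the model-based meta-analysis that a dedicated program performs; the function name and the example intervals are hypothetical.

```python
def merge_overlapping_qtl(qtl):
    """Merge QTL whose confidence intervals overlap on the same linkage group.

    qtl: list of (linkage_group, ci_start_cM, ci_end_cM) tuples.
    Returns a list of merged (linkage_group, start, end) tuples.
    """
    merged = []
    # Sorting groups the QTL by linkage group and orders them by interval start.
    for group, start, end in sorted(qtl):
        if merged and merged[-1][0] == group and start <= merged[-1][2]:
            # Overlaps the previous interval: extend it instead of adding a new QTL.
            prev_group, prev_start, prev_end = merged[-1]
            merged[-1] = (prev_group, prev_start, max(prev_end, end))
        else:
            merged.append((group, start, end))
    return merged


# Hypothetical intervals, for illustration only.
example = [("A7", 10.0, 20.0), ("A7", 18.0, 25.0), ("A9", 5.0, 8.0), ("A7", 40.0, 45.0)]
unique = merge_overlapping_qtl(example)
```

Applying the same function first within a trait across environments and then across traits mirrors the two steps described above.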

Genetic and in silico Mapping of Candidate Genes Associated with Tocopherol Biosynthesis
We selected five B. napus genes with high similarity to the A. thaliana genes VTE2, VTE3, VTE4, VTE5, and PDS1 as candidate genes for genetic linkage mapping. These genes were mapped to five linkage groups using the existing B. napus map (TNDH population). Interestingly, three of these genes co-localized with previously mapped tocopherol QTL (Figure S5). Comparative alignment of B. napus and A. thaliana was implemented based on 375 markers with sequence information that can be aligned with A. thaliana sequences in The Arabidopsis Information Resource (TAIR, http://www.arabidopsis.org/; Table S4). Subsequently, we searched the TAIR database for genes underlying the SK, MEP, chlorophyll degradation, and tocopherol core biosynthesis pathways in A. thaliana. These A. thaliana genes were aligned to the TNDH linkage map based on the comparative alignment of B. napus and A. thaliana, and 14 A. thaliana genes were located in seven unique QTL. Twelve of these genes were responsible for the biosynthesis of homogentisate and phytyl diphosphate, two precursors of tocopherol biosynthesis (Table 4 and Table S4).

Whole-genome Association Analysis
Considerable phenotypic variation in tocopherol content and composition was observed in the rapeseed association-mapping panel of 142 accessions in two environments (Table 1 and Figure S2Q-X). Forty simple sequence repeat markers, evenly distributed across the TNDH linkage groups and yielding 102 polymorphic loci, were used to infer population structure (Table S5). We determined the population structure by using the program STRUCTURE and by PCA [31,33]. STRUCTURE revealed three subgroups, with K = 3 identified as the best turning point with the highest ΔK in the association panel; most B. napus accessions were assigned to subgroup 1, 36 accessions were in subgroup 2, and 28 accessions were assigned to subgroup 3 (Figure 1A and Figure 1B). Subgroup 1 consisted of nearly all winter-type accessions, including one parent of the TNDH population (Tapidor). In contrast to subgroup 1, most accessions in subgroup 3 were annuals (spring type). Subgroup 2 contained both winter- and spring-type accessions, as well as semi-winter-type accessions such as the other parent of the TNDH population (Ningyou7; Table S6). The population substructure identified by STRUCTURE was confirmed by PCA (Figure 1C).
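The text does not spell out how ΔK was computed. A common choice is the statistic of Evanno et al., which divides the absolute second-order change in the mean log-likelihood L(K) by the standard deviation of L(K) across replicate runs. The sketch below assumes that definition (in the variant that takes the second difference of the run means); the log-likelihood values are hypothetical and are not results from this study.

```python
import statistics


def delta_k(log_likelihoods):
    """Compute an Evanno-style delta-K for each interior K.

    log_likelihoods: dict mapping K to a list of L(K) values from replicate runs.
    Returns a dict mapping K to delta-K for every K that has both neighbours.
    """
    means = {k: statistics.mean(v) for k, v in log_likelihoods.items()}
    result = {}
    for k in sorted(log_likelihoods):
        if k - 1 in means and k + 1 in means:
            # Absolute second difference of the mean log-likelihood around K.
            second_diff = abs(means[k + 1] - 2 * means[k] + means[k - 1])
            # Sample standard deviation of L(K) across replicate runs.
            sd = statistics.stdev(log_likelihoods[k])
            result[k] = second_diff / sd
    return result


# Hypothetical replicate log-likelihoods for K = 1..5.
runs = {
    1: [-5000.0, -5010.0],
    2: [-4600.0, -4610.0],
    3: [-4300.0, -4305.0],
    4: [-4250.0, -4270.0],
    5: [-4220.0, -4260.0],
}
delta = delta_k(runs)
best_k = max(delta, key=delta.get)
```

With these made-up inputs the largest ΔK falls at K = 3, which is the kind of peak the text refers to as the best turning point.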
Next, we calculated the familial relatedness (kinship) between accessions by utilizing the same set of markers as in the subsequent association analysis. Relatedness was calculated with 101 markers with 224 polymorphic loci. More than 65% of the pairwise kinship values ranged between 0 and 0.05, indicating a low level of relatedness between varieties (Figure 2). When we tested the effects of population structure (Q) and kinship (K) on phenotypic variation, we found that both parameters exerted a significant effect on phenotypic variation; for example, the Q effect and the K effect explained 39.7% and 47.1% of the aTC variation, respectively (2009-2010 growing season in Jingzhou; Table 5).
The extent of LD decay in the association panel was evaluated using pairwise combinations of 81 markers derived from the TNDH linkage map (Table S5). LD decayed within 2 cM over the whole genome and within 1 cM on chromosome A9 (Figure 3 and Figure S6).
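The text does not state which LD statistic was used. As an illustration, the widely used squared allelic correlation r² between two biallelic markers can be sketched as follows, assuming homozygous (inbred) accessions scored 0 or 1 at each marker. The function name and example scores are hypothetical.

```python
def ld_r2(marker_a, marker_b):
    """Squared allelic correlation r^2 between two biallelic markers.

    marker_a, marker_b: equal-length lists of 0/1 allele scores, one per accession.
    """
    n = len(marker_a)
    if n == 0 or n != len(marker_b):
        raise ValueError("markers must be non-empty and of equal length")
    p_a = sum(marker_a) / n                                    # frequency of allele 1 at marker A
    p_b = sum(marker_b) / n                                    # frequency of allele 1 at marker B
    p_ab = sum(a * b for a, b in zip(marker_a, marker_b)) / n  # frequency of the 1-1 haplotype
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined for a monomorphic marker")
    d = p_ab - p_a * p_b                                       # disequilibrium coefficient D
    return d * d / denom


r2_linked = ld_r2([0, 1, 0, 1], [0, 1, 0, 1])       # identical markers: complete LD
r2_unlinked = ld_r2([0, 0, 1, 1], [0, 1, 0, 1])     # independent markers: no LD
```

Plotting such pairwise values against the map distance between markers gives the decay curve from which a distance such as 2 cM is read.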

Tocopherol Variation in B. napus Seeds
Plant oil-derived tocopherols are an important source of vitamin E, which is a necessary micronutrient for human health. In this study, tocopherol content and composition were tested in a doubled haploid population, its derived RC-F2 population, and a panel of 142 B. napus accessions. A broad range of phenotypic variation was observed, which is largely in accordance with previous results [45,46]. For all four traits measured, the phenotypic variation of the doubled haploid population largely exceeded that of both parents. Thus, the transgressive variation for tocopherol traits in B. napus has great potential for breeding rapeseed varieties with improved tocopherol characters.
Here we present a refined genetic analysis of tocopherol characters in B. napus. Previously, the heritabilities of tocopherol characters were reported to be low. Marwede et al. (2005) calculated rather low broad-sense heritabilities ranging from 0.23 for aTC to 0.5 for cTC, with 0.41 and 0.42 for TTC and TCO, respectively, resulting from significant genotype × environment interactions [20,27,46]. In our study, high broad-sense heritabilities were calculated for all four traits with the doubled haploid population as well as with the association panel, although strong genotype × environment interactions and environmental effects were detected. Thus, our observations imply that genetic variation was the main contributor to tocopherol variability, and that these traits are suitable for application in B. napus breeding.
As expected, significant genetic correlation coefficients were calculated between tocopherols and tocopherol-associated traits in the doubled haploid population and the association panel.  [20,27]. However, a significant genetic correlation (0.51) was detected between aTC and cTC in the doubled haploid population that had not been reported before. The genetic correlation in the association panel was not in accord with the correlation in the TNDH population; for example, the correlation between aTC and cTC was positive in the TNDH population but negative in the association panel, indicating that the complex population structure of the association panel may affect the evaluation of genetic effects. The prominent genetic correlations between cTC and oil content as well as between TTC and oil content indicate that an increase in oil content can result in elevated TTC in B. napus seeds and vice versa.

Genome-wide QTL Detection and Homologous Gene Mapping
Fifty-seven QTL distributed on ten linkage groups were detected in the populations under study. Furthermore, meta-analysis revealed 33 unique QTL, of which 16 were pleiotropic. Thus, we detected considerably more QTL than a previous study that reported eight QTL related to tocopherol content and composition on six linkage groups in a doubled haploid population [27]. A direct comparison of QTL regions between these two populations was not possible due to the lack of anchor markers. By comparing the QTL distributions of the two populations at the level of linkage groups, however, we found that four linkage groups (A3, A7, C3, and C9) carried QTL in both populations, with two linkage groups (A7 and C3) containing QTL associated with the same traits (Table S8). This observation suggests that QTL mapping results in bi-parental populations are subject to the variation between the two parents. Four QTL clusters were detected; qTOC.A3 was associated with aTC and TCO, qTOC.A7 with aTC and TTC, and qTOC.A9 was especially associated with cTC and TTC. These results are consistent with our genetic correlation analysis, in which high genetic correlations were detected between aTC and TCO (0.53), aTC and TTC (0.78), and cTC and TTC (0.91).
Epistasis is another main genetic factor underlying complex traits [47]. In this study, cTC, TTC, and TCO each had two interaction pairs. Significant additive × additive epistatic effects were detected for cTC and TTC in both the doubled haploid population and its derived RC-F2 population. These results imply the importance of epistatic effects in the genetic basis of cTC and TTC.
Rapeseed oil with high TTC is expected to have good oil stability due to the antioxidant function of tocopherols, and oils with high aTC have good nutritional value [48]. It is noteworthy that two QTL clusters, qTOC.A7 associated with aTC and TTC and qTOC.A9 with cTC and TTC, are the most applicable for improving tocopherol content and composition in seed of B. napus. Several NILs carrying introgressed segments of the higher-tocopherol parent (Tapidor) at these two QTL clusters showed significant elevation of aTC and TTC compared with the recurrent parent Ningyou7, indicating that genetic variations underlying these QTL influenced tocopherol content. Further dissection of the genetic basis of these QTL will benefit the breeding of B. napus varieties with high aTC or TTC.
Thanks to the rapid development of genomic research in plants, the tocopherol-related biosynthesis pathway has been well characterized in recent decades. Many genes encoding the key enzymes of these pathways have been cloned and used for the genetic engineering of biofortified staple crops [3,7,9,11,15,18,49-56]. A candidate/homologous gene approach based on characterized genes in metabolic biosynthesis pathways is a strategy for dissecting complex traits that can also assist in identifying the genes responsible for QTL [57,58]. Here, 14 A. thaliana genes associated with tocopherol biosynthesis were mapped onto the TNDH linkage map and co-localized with seven unique QTL by in silico mapping. Subsequently, five B. napus genes homologous to A. thaliana genes in the core pathway of tocopherol biosynthesis were mapped onto the TNDH linkage map by genetic mapping, with three co-localized in QTL regions. Homologous genes located in QTL regions may provide information regarding the genetic variations underlying the QTL. On the other hand, we did not identify known genes in many of the detected QTL, such as qTOC.A9, a major QTL cluster related to cTC and TTC. This suggests that novel genetic loci affect tocopherol contents in B. napus seeds, a hypothesis in accord with the observation that just five of 14 QTL contained known genes related to tocopherol biosynthesis in an investigation of tocopherol content and composition in two Arabidopsis recombinant inbred lines [26]. Furthermore, an alternative reason may be the polyploid nature of B. napus due to the fact that the B. napus genome is ten times larger

Genome-wide Association Analysis and Comparison with QTL Mapping
Association mapping based on LD analysis is a widely applied method for dissecting complex traits such as quantitative traits [59]. Although this method enjoys many advantages, such as the lack of a need to construct mapping populations and a large range of variations, several limiting factors exist [29].
Population structure is one of these limiting factors. Many statistical methods have been developed to resolve this problem, including the transmission disequilibrium test and the quantitative transmission disequilibrium test, which are used for family-based samples. Genomic control and structured association are applied to germplasm-based samples. Genomic control employs a large number of random markers to evaluate the effect of population structure, and assumes that this effect is fixed for all markers in the association analysis. However, genomic control may result in the loss of power for markers with unusual allele frequencies across ancestral populations [33,60-63]. PCA, which describes the variation detected by all markers in terms of a few main component variables, has become a popular tool in population genetics [33,64].
In this investigation, population structure was examined with PCA and the structure association-based program STRUCTURE. Both analyses assigned the association panel into three subgroups. Furthermore, we demonstrated that population structure contributed to phenotypic variation, with the exception of TTC. Although we detected a low level of pairwise relatedness, kinship significantly contributed to the phenotypic variation for all traits. This observation was consistent with previous studies. Atwell et al.  [65,66].
The extent of LD decay is an important factor in association analysis. In this study, the LD decayed rapidly over the whole genome as well as in individual linkage group A9. Our results support previous reports that the LD extended only for about 2 cM in canola-quality winter rapeseed and for 1 cM in a species-wide germplasm set of B. napus. Thus, high-resolution mapping can be obtained through association mapping in B. napus with a high density of markers [38,39].
Dozens of statistical models have been developed for association analysis [29,67-70]. Six widely used models were tested in this investigation, with the result that different traits fitted different models. In this study, models that included 'K' performed better than models that only contained 'Q'. This observation was consistent with the report that the observed P values from the GLM model greatly deviated from the expected P values, followed by the 'Q' model, while the observed P values from the 'K' model and the 'K+Q' model were close to the expected P values for TTC, plant height, and kernel length in a maize association panel [71]. After comparing various models, Stich et al. suggested that the 'K+Q' model was not only appropriate for association mapping in humans, maize, and Arabidopsis, but also for rapeseed, potato, and sugar beet, indicating that the 'K+Q' model can be applied widely to various species [72].
After correcting for population structure effects, 61 loci were significantly associated with tocopherol content and composition, and 17 loci were detected in the field experiments from both growing years. This observation implies that the association panel had abundant genetic variation for tocopherol content and composition, and that some genetic loci were stable across different environments, which should be useful in marker-assisted selection. Interestingly, 11 of the associated loci were co-localized with QTL regions, demonstrating the complementarity of association analysis and QTL mapping. The combination of these two approaches allowed us to exploit the abundant recombination events and mutations accumulated in the association samples over a long history as well as the statistical power of QTL mapping to detect loci with rare alleles. Therefore, the joint use of linkage mapping and association mapping is a good strategy for detecting genetic variations [28,68-70,72,73].

Conclusions
Our results demonstrate that the wide variations in tocopherol content and composition, the high levels of broad-sense heritabilities, and the complex but significant genetic correlations among tocopherol characters occurred not only in the bi-parental populations but also in the association panel. These observations suggest that there is tremendous genetic potential for improving the tocopherol content of B. napus. In addition, dozens of unique QTL and associated loci were detected in the bi-parental populations and the association panel for tocopherol content and composition from multiple environments, indicating that tocopherol content variation was caused by variations in many genetic loci. We used recombinant backcross lines to dissect the QTL regions of qTOC.A7 and qTOC.A9. We discovered that lines with the introgressed segments of the Tapidor parent exhibited elevated aTC or TTC, showing that genetic variations underlying the QTL confidence intervals explained the variations in tocopherol content. Further analysis of these QTL will enable us to fully uncover the genetic basis for the variation in tocopherol content in B. napus seeds. Furthermore, approximately one-quarter of the unique QTL confidence intervals from in silico and genetic mapping identified homologous genes associated with tocopherol biosynthesis from A. thaliana, which provides information for QTL dissections. Finally, 17 significantly associated loci were identified in the data from both growing years; 11 of these loci were located in the QTL confidence intervals, which will be useful for breeding superior rapeseed varieties with high tocopherol content by marker-assisted selection. Taken together, QTL mapping, association analysis, and homologous gene mapping and alignment revealed a complex genetic network for tocopherol biosynthesis.

Plant Materials and Field Experiments
A segregating F1-derived doubled haploid population of 202 lines had been previously developed from a cross between a European winter cultivar, Tapidor, and a Chinese semi-winter cultivar, Ningyou7 [74]. Crosses had previously been made among doubled haploid lines to obtain an RC-F2 population with 436 lines [75,76]. The TNDH population and its parents were planted in three natural environments at three locations in China (Wuhan, TNDH lines singled out by marker-assisted selection were used to develop NILs for qTOC.A7 and qTOC.A9. A BC4F2 population was constructed for qTOC.A7 in 2006 in the field of Wuhan, while a BC4F3 population was developed for qTOC.A9 in 2009 in the greenhouse of Kiel. Ningyou7 was the recurrent parent for these two populations.
The plants grown in the field and greenhouse belonged to Huazhong Agricultural University and were grown only for DNA and RNA extraction and phenotypic evaluation. These field studies did not involve endangered or protected species.

Tocopherol Content Measurement
A homogeneous mixture of 30-50 mg of mature B. napus seeds was ground in a swing-mill (Geno/Grinder, Germany) with two 5-mm metal beads in the presence of 1000 μl n-heptane. Samples were incubated for 24 h in the dark at -20°C, and then centrifuged at 4°C for 15 min at 16,000 × g; 50 μl of the clear supernatant was collected for high-performance liquid chromatography as previously described [18,77]. Tocopherols were identified by comparison of retention times, and concentrations were calculated by comparing the peak areas with those of external tocopherol standards (Merck, Germany). TTC was defined as the sum of aTC and cTC in air-dried seeds. TCO was the ratio of aTC to cTC.
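The two derived traits follow directly from the measured contents. A minimal sketch of the definitions given above, with a hypothetical function name and made-up input values:

```python
def tocopherol_traits(atc_ppm, ctc_ppm):
    """Derive TTC and TCO from alpha- and gamma-tocopherol contents (ppm)."""
    if ctc_ppm <= 0:
        raise ValueError("cTC must be positive to form the aTC/cTC ratio")
    return {
        "TTC": atc_ppm + ctc_ppm,  # total tocopherol content: sum of aTC and cTC
        "TCO": atc_ppm / ctc_ppm,  # tocopherol composition: ratio of aTC to cTC
    }


# Hypothetical contents, for illustration only.
example = tocopherol_traits(120.0, 240.0)
```

Because TTC and TCO are both functions of aTC and cTC, correlations among the four traits are partly built into their definitions.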

Statistical Analysis of Phenotypic Variance
Statistical analysis was conducted with SAS 8.0 [78]. Genotype, environment, and genotype × environment interaction variances in the TNDH population, the RC-F2 population, and the association panel were analyzed by ANOVA with the GLM procedure. The broad-sense heritability was calculated with the formula $h^2 = \sigma_g^2 / (\sigma_g^2 + \sigma_{ge}^2/n + \sigma_e^2/(nr))$, where $\sigma_g^2$, $\sigma_{ge}^2$, $\sigma_e^2$, $n$, and $r$ represent the genetic variance, the genotype × environment interaction variance, the error variance, the number of environments, and the number of replications, respectively. Genetic correlation was calculated with the formula $r_G = \mathrm{cov}_{Gxy} / (\sigma_{Gx}^2 \times \sigma_{Gy}^2)^{1/2}$, where $\mathrm{cov}_{Gxy}$, $\sigma_{Gx}^2$, and $\sigma_{Gy}^2$ are the genetic covariance and the genetic variances of the pair-wise traits, respectively. The significance of each genetic correlation was determined using a t-test of the correlation coefficients [76]. The mean value of each trait for all populations was used in subsequent QTL and association analyses.
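The two formulas above can be illustrated with a minimal Python sketch. The variance components below are hypothetical values chosen for illustration only; in the study they would come from the SAS ANOVA, and the function names are assumptions rather than part of any cited software.

```python
import math


def broad_sense_heritability(var_g, var_ge, var_e, n_env, n_rep):
    """h^2 = var_g / (var_g + var_ge / n + var_e / (n * r))."""
    return var_g / (var_g + var_ge / n_env + var_e / (n_env * n_rep))


def genetic_correlation(cov_gxy, var_gx, var_gy):
    """r_G = cov_Gxy / sqrt(var_Gx * var_Gy)."""
    return cov_gxy / math.sqrt(var_gx * var_gy)


# Hypothetical variance components (not taken from the study).
h2 = broad_sense_heritability(var_g=4.0, var_ge=2.0, var_e=6.0, n_env=2, n_rep=3)
r_g = genetic_correlation(cov_gxy=1.5, var_gx=4.0, var_gy=2.25)

print(f"h2 = {h2:.3f}, rG = {r_g:.3f}")
```

With these illustrative inputs the denominator of the heritability is 4 + 2/2 + 6/6 = 6, so the interaction and error terms shrink as the number of environments and replications grows.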

Linkage Map Construction and QTL Detection
A linkage map was developed with 344 molecular markers derived from the TNDH population [74]. Many molecular markers, including simple sequence repeats, restriction fragment length polymorphisms, sequence-related amplified polymorphisms, and sequence-tagged sites, had previously been added to this core linkage map [75,76]. In this report, a new linkage map spanning 2190 cM with 790 molecular markers was constructed with JoinMap 3.0 (http://www.kyazma.nl/index.php/mc.JoinMap) and used in subsequent QTL analysis (Table S4). The program Windows QTL Cartographer 2.5 was used with the composite interval method for QTL mapping [79]. To define the QTL thresholds, a permutation test was carried out by randomly shuffling the trait values 1000 times at P = 0.05 [80]. LOD values of 2.47-3.26 for TNDH and 3.96-4.47 for RC-F2 were adopted to identify significant QTL. QTL detected in different environments were integrated into unique QTL in two steps with BioMercator 2.1 when their confidence intervals overlapped [81,82]. QTL for the same trait in different environments were first integrated into non-redundant QTL; non-redundant QTL for different traits were then integrated into unique QTL. The genetic effects of tocopherol, including single-locus and two-locus effects, in different environments were detected with QTLmapper 2.0 [83].
The QTL nomenclature in this report generally follows the description of Long et al. [75,76]. The identified QTL were designated with the initial letter "q," followed by an abbreviation for tocopherol (TOC), the linkage group name, and an abbreviation representing the various forms of tocopherol. The non-redundant QTL were named with the initial designation "nq," followed by TOC and the linkage group. If more than one non-redundant QTL was detected in the same linkage group, the QTL name included an alphabetical letter. The unique QTL were designated with the initial letters "uq," followed by TOC, the linkage group name, and the serial number of the QTL in the linkage group. QTL in the same linkage group were considered to be a QTL cluster, which was designated with the initial letter "q," followed by TOC and the linkage group name.

Table 6. Associated markers identified in both years' genome-wide association studies of tocopherol content and composition with 101 markers. (Recoverable column headers: Marker, LG.)
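The permutation-based significance threshold used for QTL detection can be sketched as follows. This is a simplified illustration, not the composite interval mapping performed by Windows QTL Cartographer: it assumes a single-marker regression LOD score, simulated 0/1 genotypes, and a simulated trait, and all function names are hypothetical. The idea is the same: shuffle the trait values, record the genome-wide maximum LOD for each shuffle, and take the 95th percentile as the threshold.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation; returns 0.0 if either vector has no variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def single_marker_lod(genotypes, trait):
    """LOD of a single-marker regression: -(n / 2) * log10(1 - r^2)."""
    r2 = min(pearson_r(genotypes, trait) ** 2, 1.0 - 1e-12)
    return -(len(trait) / 2.0) * math.log10(1.0 - r2)


def permutation_threshold(markers, trait, n_perm=1000, alpha=0.05, seed=0):
    """(1 - alpha) quantile of the genome-wide maximum LOD under shuffling."""
    rng = random.Random(seed)
    shuffled = list(trait)
    max_lods = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # breaks any marker-trait association
        max_lods.append(max(single_marker_lod(m, shuffled) for m in markers))
    max_lods.sort()
    index = min(n_perm - 1, math.ceil((1.0 - alpha) * n_perm) - 1)
    return max_lods[index]


# Simulated data for illustration only: 20 markers scored in 80 lines.
rng = random.Random(42)
markers = [[rng.choice((0, 1)) for _ in range(80)] for _ in range(20)]
trait = [rng.gauss(0.0, 1.0) for _ in range(80)]
threshold = permutation_threshold(markers, trait, n_perm=200)
print(f"Permutation LOD threshold: {threshold:.2f}")
```

Because the threshold is taken from the distribution of the maximum LOD across all markers, it controls the genome-wide rather than the per-marker error rate, which is why thresholds differ between populations and traits.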

Comparative Alignment and in silico Mapping between B. napus and A. thaliana
Comparative alignment between B. napus and A. thaliana was based on the 375 molecular markers with sequence information (Table S4 and Figure S7). Homologous genes in the MEP, SK, and chlorophyll degradation pathways were taken from the description of Almeida et al. [84]. The subsequent steps of comparative alignment and in silico mapping proceeded as in previous reports [41,75].

Population Structure and Kinship Evaluation in the Association Panel
The association panel was genotyped with 101 molecular markers, which yielded 327 polymorphic loci. Polymorphic loci with frequencies below 10% were excluded to avoid the effect of rare alleles; 224 polymorphic loci were retained for subsequent analysis.
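The rare-allele filter can be sketched in a few lines of Python. This sketch assumes that each polymorphic locus is scored as presence (1) or absence (0) across accessions, with `None` for missing calls; the scoring scheme, the locus names, and the function names are assumptions made for illustration.

```python
def band_frequency(calls):
    """Fraction of scored accessions (non-missing) that carry the band."""
    scored = [c for c in calls if c is not None]
    return sum(scored) / len(scored) if scored else 0.0


def filter_rare_loci(loci, min_freq=0.10):
    """Keep loci whose band frequency is at least min_freq."""
    return {name: calls for name, calls in loci.items()
            if band_frequency(calls) >= min_freq}


# Hypothetical scores for three loci (not data from the study).
loci = {
    "locus_a": [1, 0, 1, 1, 0, 1, 0, 1, 1, 0],          # frequency 0.60
    "locus_b": [0] * 19 + [1],                          # frequency 0.05
    "locus_c": [1, None, 0, 0, 0, 0, 0, 0, 0, 0, 0],    # frequency 0.10
}
kept = filter_rare_loci(loci)
print(sorted(kept))
```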
One hundred and two polymorphic loci, derived from 40 molecular markers evenly distributed on 19 TNDH linkage groups, were used to evaluate population structure (Table S5). These markers were used to test 142 accessions based on the method described by Chen et al. [87]. STRUCTURE 2.2, which is based on Bayesian clustering, was used to assign the natural accessions to subpopulations [32]. We tested numbers of subpopulations ranging from k = 1 to k = 10. Five runs were processed for each k value with a burn-in length of 100,000 and 100,000 iterations. The results (Q matrix) of replicate STRUCTURE runs were integrated with the CLUMPP software [88]. Subsequently, the number of subpopulations was determined by the ΔK method [89]. PCA was carried out on data from the same markers with the software NTSYSpc [90]. A covariance matrix exported from NTSYSpc was used for subsequent association analysis. The effects of population structure for all traits were evaluated by SAS PROC GLM. With k = 3, the model included two of the three Q-matrix columns, chosen at random.
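The choice of the number of subpopulations can be illustrated with a short sketch of one common formulation of the ΔK statistic: the absolute second-order difference of the mean log-likelihood across replicate runs, divided by the standard deviation of the log-likelihood at that K. The log-likelihood values below are hypothetical, and the function name is an assumption; STRUCTURE itself does not provide this calculation.

```python
from statistics import mean, stdev


def delta_k(lnp_by_k):
    """Delta K = |mean L(K+1) - 2 * mean L(K) + mean L(K-1)| / sd L(K).

    lnp_by_k maps each K to the list of log-likelihoods from replicate runs.
    Delta K is undefined for the smallest and largest K tested.
    """
    result = {}
    for k in sorted(lnp_by_k):
        if k - 1 not in lnp_by_k or k + 1 not in lnp_by_k:
            continue
        second_diff = abs(mean(lnp_by_k[k + 1])
                          - 2 * mean(lnp_by_k[k])
                          + mean(lnp_by_k[k - 1]))
        result[k] = second_diff / stdev(lnp_by_k[k])
    return result


# Hypothetical log-likelihoods from five replicate runs per K.
lnp = {
    1: [-5000, -5001, -4999, -5000, -5000],
    2: [-4700, -4702, -4698, -4701, -4699],
    3: [-4500, -4501, -4499, -4500, -4500],
    4: [-4480, -4490, -4470, -4485, -4475],
    5: [-4470, -4500, -4440, -4480, -4460],
}
dk = delta_k(lnp)
best_k = max(dk, key=dk.get)
print(f"Delta K by K: {dk}; selected K = {best_k}")
```

In this illustrative data set the likelihood rises steeply up to K = 3 and then plateaus with growing run-to-run variance, so ΔK peaks at K = 3.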
Kinship was estimated with the software SPAGeDi based on data from all 101 markers [38]. All negative kinship coefficients were set to zero, and the coefficients were then multiplied by two prior to association analysis [91]. The effects of kinship for all traits were tested with the MLM in TASSEL V3.0 and calculated as $h^2 = \sigma_a^2 / (\sigma_a^2 + \sigma_e^2)$, where $\sigma_a^2$ and $\sigma_e^2$ are the genetic variance and the error variance, respectively [92].
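The adjustment of the kinship matrix before association analysis is simple enough to show directly. The sketch below assumes the matrix is held as a list of rows and that the second step scales every coefficient by two; the matrix values and the function name are hypothetical.

```python
def prepare_kinship(kinship):
    """Set negative coefficients to zero, then multiply every entry by two."""
    return [[2.0 * max(value, 0.0) for value in row] for row in kinship]


# Hypothetical pairwise kinship coefficients for three accessions.
raw = [
    [0.50, -0.10, 0.25],
    [-0.10, 0.50, 0.00],
    [0.25, 0.00, 0.50],
]
adjusted = prepare_kinship(raw)
for row in adjusted:
    print(row)
```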

Linkage Disequilibrium Evaluation
LD between markers was evaluated with TASSEL V3.0 by calculating $r^2$ between markers. Loci on the same linkage group were used to evaluate LD decay. The threshold of significant LD for these linked loci was defined as the 95% quantile of the $r^2$ values among unlinked locus pairs. LD decay with genetic distance was tested by nonlinear regression of $r^2$ values [39,93].
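The threshold definition can be made concrete with a small sketch: compute $r^2$ for every pair of loci, take the 95% quantile among pairs on different linkage groups as the background level, and flag linked pairs that exceed it. The genotype scores, locus names, and helper functions below are hypothetical, and $r^2$ is computed here as the squared correlation of 0/1 scores rather than with TASSEL.

```python
import math
from itertools import combinations


def r_squared(a, b):
    """Squared Pearson correlation between two 0/1 score vectors."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov * cov / (va * vb)


def quantile(values, q):
    """Empirical quantile: smallest value with at least q of the data at or below it."""
    ordered = sorted(values)
    index = min(len(ordered) - 1, max(0, math.ceil(q * len(ordered)) - 1))
    return ordered[index]


# Hypothetical loci: name, linkage group (lg), and scores across 8 accessions.
loci = [
    {"name": "m1", "lg": "A1", "calls": [1, 1, 0, 0, 1, 0, 1, 0]},
    {"name": "m2", "lg": "A1", "calls": [1, 1, 0, 0, 1, 0, 1, 0]},
    {"name": "m3", "lg": "C3", "calls": [1, 0, 1, 0, 1, 0, 1, 0]},
    {"name": "m4", "lg": "C5", "calls": [0, 0, 1, 1, 0, 1, 1, 0]},
]

unlinked = [r_squared(a["calls"], b["calls"])
            for a, b in combinations(loci, 2) if a["lg"] != b["lg"]]
threshold = quantile(unlinked, 0.95)

significant = [(a["name"], b["name"])
               for a, b in combinations(loci, 2)
               if a["lg"] == b["lg"] and r_squared(a["calls"], b["calls"]) > threshold]
print(f"threshold = {threshold}, linked pairs in significant LD: {significant}")
```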

Model Comparisons and Association Analysis
Six models were compared to choose the most suitable model for each trait. The first model, ANOVA, did not consider population structure (Q), PCA, or kinship (K) effects. The second model (Q) considered Q effects, while the third model (PCA) considered the population structure effects derived from PCA. The fourth model (K) considered kinship effects, the fifth model (PCA+K) considered both PCA and K effects, and the last model (Q+K) considered both Q and K effects. The ANOVA, Q, and PCA models were calculated by GLM in TASSEL V3.0, while the K, PCA+K, and Q+K models were evaluated by MLM in TASSEL V3.0 [92]. Quantile-quantile plots of $-\log_{10}(p)$ were constructed from the observed p values of the marker-phenotype associations and the p values expected under the assumption that no associations exist between markers and traits [94]. Finally, association analysis was carried out with the most suitable model for each trait in TASSEL V3.0.
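The coordinates of such a quantile-quantile plot can be sketched as follows. The sketch assumes that, under no association, the i-th smallest of n p values is expected at i/(n + 1); this plotting position is one common convention and is an assumption here, as are the example p values and the function name.

```python
import math


def qq_points(p_values):
    """Return (expected, observed) -log10(p) pairs, smallest p value first."""
    observed = sorted(p_values)
    n = len(observed)
    return [(-math.log10((i + 1) / (n + 1)), -math.log10(p))
            for i, p in enumerate(observed)]


# Hypothetical p values from four marker-trait tests.
points = qq_points([0.5, 0.01, 0.2, 0.9])
for expected, obs in points:
    print(f"expected {expected:.3f}  observed {obs:.3f}")
```

A model that controls population structure and kinship well yields points close to the diagonal for most markers, with only the strongest associations rising above it; systematic inflation above the diagonal suggests false positives.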