Regulatory Network of Secondary Metabolism in Brassica rapa: Insight into the Glucosinolate Pathway

Brassica rapa studies towards metabolic variation have largely been focused on the profiling of the diversity of metabolic compounds in specific crop types or regional varieties, but none aimed to identify genes with regulatory function in metabolite composition. Here we followed a genetical genomics approach to identify regulatory genes for six biosynthetic pathways of health-related phytochemicals, i.e carotenoids, tocopherols, folates, glucosinolates, flavonoids and phenylpropanoids. Leaves from six weeks-old plants of a Brassica rapa doubled haploid population, consisting of 92 genotypes, were profiled for their secondary metabolite composition, using both targeted and LC-MS-based untargeted metabolomics approaches. Furthermore, the same population was profiled for transcript variation using a microarray containing EST sequences mainly derived from three Brassica species: B. napus, B. rapa and B. oleracea. The biochemical pathway analysis was based on the network analyses of both metabolite QTLs (mQTLs) and transcript QTLs (eQTLs). Co-localization of mQTLs and eQTLs lead to the identification of candidate regulatory genes involved in the biosynthesis of carotenoids, tocopherols and glucosinolates. We subsequently focused on the well-characterized glucosinolate pathway and revealed two hotspots of co-localization of eQTLs with mQTLs in linkage groups A03 and A09. Our results indicate that such a large-scale genetical genomics approach combining transcriptomics and metabolomics data can provide new insights into the genetic regulation of metabolite composition of Brassica vegetables.


Introduction
Brassica crops are important food sources (oil, vegetables and condiments), and are considered to have beneficial nutritional properties such as antitumoral activities [1], [2], [3]. The presumed healthy components such as phenylpropanoids, phenolics, flavonoids and glucosinolates have been widely characterized in Brassica [4], [5], [6]. For example, Brassica leaves have been found to accumulate conjugated forms of both flavonols (quercetin, kaempferol and isorhamnetin) and flavones (apigenin and luteolin) [5]. In recent years, breeding for nutritional quality became an important research topic and in this context metabolomics approaches, using NMR, LC-MS and or GC-MS, have enabled the parallel assessment of the relative levels of a broad range of metabolites [7], [8], [9], [10]. In Arabidopsis, a Brassica rapa relative, the metabolite variation was found to be abundant and its genetic regulation complex, plausible candidate regulators could be identified after LC-MS mass peaks were assigned to genomic loci [11].
The components of the metabolome can be regarded as the end products of expressed genes and define the biochemical phenotype of a cell or tissue. Several complementary analytical methods can be applied in order to enable profiling, either relatively or quantitatively, of the various chemical classes of metabolites present in an organism [12], [13].
Transcriptomics measures the variation in mRNA transcript abundance and expression profiles can be treated as heritable traits to genetically map expression quantitative trait loci (eQTL); this type of analysis has been denominated as genetical genomics [14].
Thus, the integration of metabolomics with transcriptomic and genomic platforms has frequently been used as a strategy to identify candidate genes involved in the regulation of the levels of specific metabolites in plant systems [11] [15], [16], [17], [18].
The investigation of selected biochemical pathways of pre-defined metabolites showed that the connections between gene expression and metabolite variation are complex [19], [20].
The variation in B. rapa morphology is huge (oil, turnip, pak choi, Chinese cabbage and several Asian leafy morphotypes) and the variation in metabolite composition is similarly large [21], [22], [23]. This variation has increased the interest of plant breeders to breed for improved phytonutrient quality in Brassica. Several studies in A. thaliana, B. rapa and B. napus aimed to identify QTL for phytonutrients. The most studied secondary metabolites are the glucosinolates and the detected QTL not only confirmed the quantitative nature of this trait, but also allowed the identification of key biosynthetic and regulatory genes (Hasan et al. 2008,Lou et al. 2008, Wentzel et al. 2007, Feng et al. 2012.
The triplicated genome of B. rapa has a well described synteny with Arabidopsis [24], [25], [26]. As a consequence of the evolutionary gene triplication event, many genes have paralogues. The triplicated nature of the Brassica genome represents a challenge to unravel the genetics of metabolic traits using genetical genomics approaches.
In the present study we profiled both metabolite and transcript abundance in the leaves of six weeks-old plants from a Doubled Haploid (DH) population developed from an F1 cross between a yellow sarson (R500) and a pak choi (PC175) plant. We applied both an untargeted metabolomics approach for semi-polar secondary metabolites, including glucosinolates, flavonoids and phenylpropanoids, using liquid chromatography-mass spectrometry (LC-MS), and targeted analytical approaches to profile lipidsoluble isoprenoids (carotenoids and tocopherols) and folates. Additionally, the whole genome transcript level was performed on leaves of six week old plants from all DH lines using a distant pair design with a 60-mer oligo microarray assembled using EST sequences mainly from three species: B. napus, B. rapa and B. oleracea [27]. To prioritize the number of candidate genes, we subsequently focused on six known biochemical pathways of health-related phytonutrients, with the aim to identify regulatory genes. Furthermore, for the construction of a transcriptional regulatory network we used the very well described glucosinolate pathway [28], [29]. These data are an important reference for breeding purposes and a step to gain insight in the genetic factors responsible for the metabolite variation in B. rapa.

Materials and Methods
Parental materials to develop a doubled haploid (DH) population A B. rapa DH population was developed from a cross between pak choi PC-175 (cv.Nai Bai Cai; accession number VO2B0226), as the male parent, and yellow sarson YS-143 (accession number FIL500). The parental accessions were selected based on their differences in phenotypic characteristics and genetic distance [30], [31]. Actually, this population is a reciprocal cross of the previously developed population DH38 as described in [31].
The DH population was created using the microspore culture protocol described in [31], [32] [33] and [34]. The progeny of the DH plants from three F1 plants were used for the phenotyping and genotyping. The resulting population was named DH68 and consisted of 92 DH lines and for each line the corresponding F1 parent was known.

Plant growth conditions
The plants of the DH lines were grown in the greenhouse under the following conditions: 16 hrs light and temperature between 18 and 21uC. After one week, germinated seedlings were transplanted and randomly distributed over three different blocks. Five weeks after transplanting, the 3 rd and 4 th leaves of each replicate were collected and immediately frozen in liquid nitrogen to be grinded into fine powder and stored at 270uC. Each replicate was grinded individually and a pool of equally weighed amount of each of the three replicates was used for metabolic and transcriptomic profiling, as well as for DNA marker profiling to construct a linkage map and perform QTL analysis.

Construction of a genetic linkage map
Leaf material for DNA extraction was collected from seedlings and then ground using a Retsch 300 shaker (Retsch BV, Ochten, the Netherlands). DNA was isolated based on modified CTAB methods [35]. The AFLP analysis was performed according to Vos et al. (1995) [36].The AFLP and marker primer combinations used for the mapping were those combinations used for map construction in [37]. Linkage analysis and map construction were carried out using JoinMap 4.0 (http://www.kyazma.nl) for each population. Linked loci were grouped on the basis of pairwise LOD values between 4 and 7. The Kosambi mapping function was used to convert recombination data to map distances.

LC-MS metabolic profiling
Leaf samples were analyzed for variation in semi-polar metabolite composition using LC-QTOF-MS, essentially as described in [10]. In short, 0.5 g FW of frozen leaf powder of each DH line and the parents was extracted with 1.5 ml of methanol containing 0.1% formic acid. Samples were sonicated and then filtered (Captiva 0.45 mm PTFE filter plate, Ansys Technologies) into 96-well plates with 700 ml glass inserts (Waters) using a TECAN Genesis Workstation equipped with a 4-channel pipetting robot and a TeVacS 96-wells filtration unit. Samples were injected (5 ml) using an Alliance 2795 HT instrument (Waters), separated on a Phenomenex Luna C18 (2) column (2.06150 mm, 3 mm particle size) using a 5-35% acetonitrile gradient in water (acidified with 0.1% formic acid) and then detected on-line firstly by a Waters photodiode array detector (wavelength 220-600 nm (Waters) and secondly by a Water-Micromass QTOF Ultima MS with negative electrospray ionization (m/z 80-1500). Leucine enkephalin was used for online mass correction.
Metalign software (www.metalign.nl) was used to automatically extract and align all relevant mass signals (signal to local noise ratio .3) from the raw data files. The total of 6,673 mass peaks was filtered for signals being present in at least 15 samples and having amplitudes of at least 100 ion counts per scan (about 6 times the noise value) in at least one of the samples. Then, mass signals originating from the same metabolites were clustered based on their similar retention times and variation over samples, using MSClust software [13]. This retained 228 so-called centrotypes, or reconstructed metabolites, of which the relative abundance was represented by the total ion counts of the clustered signals.

Targeted analyses of folates and lipid-soluble isoprenoids
Lipid soluble isoprenoids, including carotenoids, tocopherols and chlorophylls, were extracted using Tris-buffer/methanol/ chloroform and analyzed by HPLC-PDA-fluorescence as described before [21]. Folates were extracted in a Na-acetate buffer pH 4.7 containing 20 mM DTT, and total folate levels were determined using a Lactobacillus casei-based microbiological assay, after enzymatic deconjugation by c-glutamyl hydrolase [21].

RNA isolation
Total RNA was extracted using the TRIZOL reagent (Invitrogen) starting with approximately 300 mg of frozen leaf material. RNA concentration and purity were quantified with Nanodrop measurements and the quality of the total RNA was checked on a 1% RNase free agarose gel.
Total RNA (5 ul) was treated with the DNase I Amplification Grade kit (Invitrogen) for digestion of single and double stranded DNA according to manufacturer's intructions.
Total RNA was cleaned using the RNeasy Mini Kit (Qiagen) starting with the 100 ml of DNase I treated RNA.The concentration of the cleaned RNA was measured and the samples were diluted with nuclease free water (Qiagen) to 400 ng/ml in a total volume of 10 ml.

Microarray design
The distant pair design proposed for two colour microarrays experiments by 35 was followed and implemented in the R package designGG (http://gbic.biol.rug.nl/designGG/). This design uses genetic marker information to identify pairs of individuals with maximum dissimilarity across the mapping population and improves the efficiency of eQTL studies. In our study we used information obtained from 48 pairs of DH lines and the information on parental lines was additionally hybridized in two microarrays with dye swap of Cy3 and Cy5. The probes were classified in correspondence to the EST sequence origin: (1) B.rapa, (2) B.napus, (3) B.oleracea and (4) other Brassica species [27].

QTL mapping analyses
QTL analysis was performed using the basic single marker regression procedure present in R/qtl [38]. This was done for both the expression ratio values and the metabolite datasets in a similar fashion, leading to results that could be easily combined in the end. A total of 78,688 expression probes together with the 228 reconstructed metabolites and the values of targeted metabolites (carotenoids,tocopherols and folic acid) were mapped back to the genetic map of B. rapa (Figures S1, S2, S3, S4) using the basic model. The expressions were measured using two-color array technology and for the mapping we used the ratio's between two genotypes In this model the genetic effect was annotated for the expression ratio's as described in [39]; b is the effect of the different allele (1 for A.B 0 for A = = B and 21 A,B). This model was evaluated at each marker to get an estimate of the allelic effect on the expression probes. This results in a P-value, which was transformed into a LOD score. These LOD scores where then visualized in different ways to show underlying genetic architecture, by using QTL profile plots and heat maps.
Six single-copy genes with eQTLs detected using the microarray profiles were selected to compare results obtained from the microarray and real-time PCR (BrARR3_A09, BrFRL2_A09, and BrCAM1_A07, BrCYCLIND1:1_A02, BrKRP2_A03 and BrDRL1_A08). Transcripts of these genes were profiled with two technical replications using the RNA samples of the 92 DH68 lines that were previously used for microarray analysis, as described in [40], [37]. The eQTLs for these six single-copy genes identified using the microarray were confirmed by RT-qPCR, with higher LOD scores for RT-qPCR data.
We constructed the glucosinolates mQTL profile based on the metabolites detected by the untargeted LC-MS analysis. LOD scores for each marker were calculated using an MQM procedure [38]. The output of the transcriptional data of the candidate regulatory genes (eQTLs) was filtered for a LOD threshold value of .3.5. The data was analyzed using the MetaNetwork computational protocol as described in [41] to obtain second order correlation values. The generated files were plotted using Cytoscape [42].

Identification of known regulatory genes for network analysis
To identify eQTL involved in the regulation of the biosynthesis of common Brassica phytochemicals, we first screened the probes represented on the microarray against a compiled list of genes that are known to be involved in the regulation of six biosynthetic pathways leading to the production of flavonoids, phenylpropanoids, glucosinolates, carotenoids, tocopherols and folate [28], [29], [43], [44], [45] (Table S1). The nucleotide sequences of the microarray probes were compared with the gene sequences of the current version of the B. rapa genome sequence available in the Brassica database (BRAD) (http://brassicadb.org/brad/) [46]. To identify cis and trans effects, the paralogues information of the selected candidate genes were also obtained from the BRAD website. The glucosinolate reference pathway was drawn in Cytoscape [42]. The reference KEGG pathway was imported after the implementation of the kgml reader plugin in Cytoscape.

Genetic linkage map of DH68
A genetic linkage map was constructed for population DH 68. A total of 456 markers were mapped in the DH population ( Figure  S1). The total map length was 1233.221 cM and consisted of 10 linkage groups, corresponding to the 10 chromosomes of Brassica rapa. The largest linkage group was A03 with a size of 192.22 cM and the smallest linkage group was A04 with a size of 63.646 cM. Each of the linkage groups had at least one SSR marker, which allowed the identification of the corresponding chromosome and the comparison with previously published maps [31]. In addition to the SSR markers, gene targeted markers related to the glucosinolate pathway were mapped in this population as well. A total of 21 markers related to the glucosinolate (GLS) biosynthetic pathway were mapped in all the linkage groups except for A08, A05,A04 and A10. The linkage group with most GLS genes mapped was A03 with eight genes mapped. The mapping of this particular group of markers together with the SSRs was of aid for the identification of the map orientation and for further syntenic comparison with Arabidopsis thaliana in the search of candidate genes for metabolic pathways and cis/trans effects.

Whole genome QTL analysis of metabolic content
Making use of targeted metabolic extraction and analyses procedures, including quantification using reference compounds, it was possible to measure the variation in absolute levels of selected health-related phytochemicals, i.e. tocopherols, carotenoids and folic acid, in the leaves of B. rapa. The parental line YS-143 produced significantly more of c-tocopherol, a-tocopherol, lutein, b-carotene and folate than the parental line PC-175. Variation of carotenoids, tocopherols and folate within the DH population are presented in Table 1. A transgressive segregation for the quantity of content of these targeted compounds was observed: most of the levels were transgressive in a positive and negative direction. QTL analysis of these targeted compounds revealed significant mQTL regions LOD .3 distributed across the whole genome table S2. Overlapping QTL regions, with a possible regulatory function for more than one carotenoid, were located on linkage groups A03, A05 and A10. No significant results were found for d-tocopherol. (Figures S2, S3, S4).
The LC-MS untargeted profiling resulted in the identification of 228 reconstructed metabolites; from this set 41 were identified as corresponding to the flavonoids pathway (Table S3).
Significant mQTLs were detected for 166 out of the 228 reconstructed metabolites (i.e. 73%). These mQTL were not equally distributed over the B. rapa linkage groups, as coldspots and hotspots for the genetic regulation of metabolite content could be identified (Figure 1). The most important regulatory region was located on linkage group A07, where mQTLs were detected for 47% of all LC-MS metabolites.
Whole genome QTL analysis of transcriptomics data and candidate gene filtering We followed a regression analysis of the transcript abundance represented by 78,278 informative probes on the microarray against 456 mapped markers. In total, 44,358 probes were detected as significant against a genetic marker with a LOD score .3.5.The whole genome profile of the number of eQTL versus chromosome position indicates that there is no evidence of eQTL clustered as hotspots. Instead, the eQTL were distributed randomly across the genome with a higher number than average on linkage groups A02, A03 and A09. (Figure 2).
In general, to identify regulatory genes we searched for genomic regions in which metabolite levels (mQTL) seem to co-localize with significant LOD values of annotated probes (eQTL) corresponding to candidate genes of their biochemical pathway. While the region in linkage group A07 was detected as a hotspot for mQTLs, no eQTLs were observed among the known biosynthetic genes. However, when the complete set of probes with a significant LOD score of more than 3.5 in linkage group A07 was filtered for GO terms related to flavonoid regulation, we could identify three probes representing genes involved in positive regulation of flavonoid biosynthesis. (Table S4) For most metabolites we could identify at least one eQTL overlapping with one mQTL. Only for a-tocopherol, having a QTL on linkage group A02, none of the annotated probes showed an eQTL at the same genomic region, and also for folic acid we were unable to identify probes that co-localize with its mQTL in A05 and A09, even when the selected LOD value of the annotated probes was lowered to LOD = 3. In general we were able to identify cis and trans eQTLs for the target metabolites in several linkage groups. The eQTL results corresponding to each pathway are summarized in table S5 and figures S2, S3, S4.
Additionally, correlation analysis was performed between all the significant probes, annotated as known regulatory genes or transcription factors, with eQTL LOD scores .3.5 and a correlation value .0.3. The correlation analysis indicated a high within and between pathway correlation (Figure 3). The highest correlation values .0.8 were found between eQTLs results from different paralogs and between different genes belonging to the flavonoid pathway.

Glucosinolate mQTLs Analysis
To further analyze the regulatory network of a specific phytochemical pathway we focused on the very well characterized glucosinolate biosynthetic pathway (Figure 4).
The parental lines yellow sarson (R500) and a pak choi (PC175) are known to differ in their metabolic and genetic compositions [21], [22]. We detected 20 glucosinolates in the parental lines and within the DH population (Table S6). Significant mQTLs were detected for 13 out of the 20 glucosinolates, mostly representing aliphatic glucosinolates, while mQTLs were not detected for the aromatic glucosinolate and for two out of the three indolic glucosinolates.
The mQTLs representing 12 aliphatic glucosinolates showed co-localization mostly in the genomic regions on linkage groups A03 and A09 ( Figure 5). This co-localization indicates a high genetic correlation between glucosinolates. In the case of linkage group A03, mQTLs were found for the long chain aliphatic glucosinolates and in A09 the mQTLs were mostly detected for the short-chain (C3 to C5) aliphatic glucosinolates and their modified forms. In the case of the indolic glucosinolates we could identify one mQTL for neoglucobrassicin, which was located in linkage group A02. This mQTL had a lower LOD score value in comparison to the LOD scores for aliphatic glucosinolates.

Transcriptional regulation of the glucosinolate pathway
In an effort to enrich the genetic map with informative markers, we mapped genetic markers related to the glucosinolate pathway in the different linkage groups. Although we found significant gene expression QTLs (eQTLs) for the regulation of glucosinolate metabolism in all the chromosomes, two hot spots were identified in chromosome A03 and A09 (Figure 4, 6). Along linkage group A03 three positions were identified as important because of the colocalization of significant eQTLs. At top of chromosome A03 (38.4 cM) we mapped a genetic marker for the GLS transcription factor Myb29, while a putative eQTL representing Myb 29 was found at 22.4 cM (with LOD of 2.1, which is lower than the significance threshold). At the Myb29 eQTL position at 22.4 cM, we found co-localizing eQTLs representing GLS biosynthesis genes: 2CYP79F1A06, 1-2CYP83A1A04, 2BCAT4A05, 2GSL-OHA03(2) and 2UGT74C1A05. We mapped a marker for GSL-OH at the middle of chromosome A03 (70.4cM), while the eQTL for the probe representing GSL-OH mapped in an interval spanning the same location. The GSL-OH eQTL interval colocated with the genetic map position of 3IPMISSU1A03 and 1MAM1A03. At the lower bottom position of A03, eQTL corresponding to 3IPMISSU1A03 and 1-2MAM1A03 colocalized at the position where a genetic marker for BCAT4 was mapped   1 cM). Finally, also in linkage A03, cis-regulation was found for the MAM gene, as the 1-2MAMA03 probe showed an eQTL at the position of the MAM genetic marker (126.0 cM). In addition, the same MAM probes had a significant eQTL at the position of a Myb28 marker (mapped at 127.2 cM on A03), although we did not find a Myb28-eQTL at this position.
Along linkage group A09 several regions were identified as eQTL hotspots. Many regions in linkage group A09 showed either colocalization of eQTLs or cis-effect of genes, which have been mapped at the corresponding positions in this population.

Glucosinolate Biosynthesis in Brassica rapa
The comparison of the mQTL and eQTL data revealed several GLS biosynthesis genes and transcription factors involved in the regulation of glucosinolate biosynthesis in B. rapa ( Figure 6). In linkage group A03, for example, we were able to dissect the genetic function of the paralog of GSL-OH in A03 in the biosynthesis of progoitrin: its mQTL colocalized with both the eQTL and the genetic marker for 2GSL-OHA03 (2). In linkage group A03 at the large interval containing the eQTLs for 3IPMISSU1A03 and 1-2MAM1A03, the mQTLs for gluconapin, glucoraphanin, heptyl (I, II, III) and hexyl III glucosinolates colocalized within the region where genetic markers for the MAM gene and for the BrBCAT4 gene mapped. In linkage group A09 overlapping mQTLs were found for glucobrassicanapin, glucoerucin, gluconapin, hexyl-GLS III, methylsulphonylbutyl-GLS and progoitrin. These mQTLs co-localized at the interval containing several eQTLs and the genetic marker for UGT74B1.  In summary, for the eQTL analysis a total of 94 probes were selected as representatives of candidate genes for the GLS pathway. Forty-two from these 94 probes, representing 25 candidate genes, showed at least one eQTL. These genes had both cis and trans eQTL in most hotspots in the genomic regions at A03 and A09, with most of the genes showing trans-eQTL effects at the position of Myb29 in A03, and both UGT74B1 and Myb28 in A09.

Discussion
A genetical genomics approach combining metabolic and expression data obtained from a DH population was chosen to gain insight into the genetic regulation of the metabolite composition in B. rapa leaves. To illustrate the strength of this approach, we selected six biosynthetic pathways involved in the production of health-related phytochemicals: carotenoids, tocopherols, folates, glucosinolates, flavonoids and phenylpropanoids. These pathways have very well been studied in Arabidopsis thaliana and most of the genes involved in these pathways have been characterized [28,29], [43], [44] [47], [48], [49].
Traditionally, the synteny between Brassica and Arabidopsis and the recent availability of the B. rapa genome sequence have assisted in the prediction of candidate genes in genomic regions where a phenotypic QTL has been detected [25], [26], [31]. In the present study the use of large scale metabolomics approaches using both targeted and untargeted analytical platforms, combined with transcriptome analysis of a selection of annotated probes, allowed us to further analyze the QTL results and predict a group of candidate genes. The QTLs detected for many metabolites detected by untargeted LC-MS clustered within a genomic region at linkage group A7, indicating that these metabolites possibly shared a common genetic regulator.
To identify influential genes and gene products, the genetical genomics approach emerged as a tool to combine expression profiling with molecular marker analysis through the use of quantitative trait loci (QTL) analysis in a segregating population [14]. For our study we profiled the transcript abundance of the 92 DH lines with a microarray assembled from B. napus, B. rapa and B. oleracea EST sequences [27]. The direct comparison of metabolite mQTL and eQTL maps has shown the predictive capacity of eQTL to detect candidate genes for phenotypic differences in Arabidopsis and B. rapa [19]. This comparison between mQTL and eQTL in our study revealed co-localization of both in many cases. The predictive value of the QTL comparison through co-localization was very successful in the case of the isoprenoids and the glucosinolate pathway, but not in the case of both total folate and the phenylpropanoid and flavonoids with mQTL on A07. However, in the case of the flavonoids further inspection of significant eQTL (LOD.3.5) in linkage group A07 helped us to identify three genes with predicted function in flavonoids regulation in Arabidopsis.
To further analyze the regulatory network of a phytochemical pathway, we focused on the very well characterized pathway leading to the biosynthesis of glucosinolates. The variation and genetic regulation of the glucosinolate content has been widely studied [50], [51], [52], [53], [54], [55]) at different developmental stages and organs and with different approaches [49]. In the Ler vs Cvi RIL population of Arabidopsis, two major loci were found through QTL analyses to explain the observed variation for most of the aliphatic glucosinolates [11], [52], [53]. The MAM locus is responsible for the variation in chain length [56] and the AOP locus is responsible for the variation in side chain modification [53]. In a previous QTL study for glucosinolate variation in B. rapa leaves, a major QTL for the content of a number of aliphatic glucosinolates was identified on linkage group A03 in a DH population of a cross between a yellow sarson and a pak choi accession [31].
With the B. rapa sequence information available several candidate genes for glucosinolate regulation have been identified [28], [45]. However, these candidate genes have never been identified as loci underlying QTL and their functions have not been validated in B. rapa. In our study we were able to identify a metabolic network that clearly suggests a differential regulation of different glucosinolate classes. In the case of linkage group A03, mQTLs were detected for the long-chain aliphatic glucosinolates while at A09 mQTL were mostly detected for short-chain (C3 to C5) aliphatic glucosinolates and their modified forms. In these hotspot regions in A03 and A09 cis and trans effects were observed and most of the genes showed trans effects at the position of Myb29 in A03 and Myb28 -UGT74B1 in A09.
The analysis of expression differences, within a DH population, of a selected group of candidate genes for these selected pathways from the Arabidopsis research helped us to focus on well-known and characterized genes. Although these candidate genes are potential regulators of important pathways leading to metabolites of interest, the results indicated that the relevance of these genes is Secondary Metabolite Regulation in Brassica rapa PLOS ONE | www.plosone.org different in B. rapa. For example in our population the hotspot Myb28 -UGT74B1 in A09 is more relevant in B. rapa and also in B. napus seeds [57] than the AOP locus in A. thaliana. One difference that can have an effect in the QTL analysis is the triplication of the genomes in Brassica with the presence of paralogues. Our results with the presented genetical genomics approach indicate a possible sub-functionalization of the paralogues in B. rapa. For example, for the gene SUR1 we found two microarray probes annotated as corresponding to SUR1 copies mapping to linkage groups A07 and A09, with one copy showing significant eQTL values only at A07 (therefor called SUR1A07) and the other copy showing cis and trans eQTLS effects at A09 (thus called SUR1A09).
Significant genetic contribution to the regulation of the level of secondary metabolites in B. rapa opens the possibility for application of metabolic engineering in Brassica crops. However, it is important to realize that correlation between different pathways can exist, preventing the identification of a hotspot for regulation of specific metabolites. Our results of eQTL analysis indicated cross talk between several pathways, which means that they are interdependent, most likely as these pathways use common substrates ( Figure 3). The data generated in our study is valuable for the selection of B. rapa accessions with higher or lower content of specific metabolites, which can help in markerassisted selection for breeding purposes based on the identified candidate regulatory genes. shows QTL expression results of microarray probes representing candidate genes, names are listed on the right. turquoise (logp = 3), dark turquoise(logp = 3-4), darkcyan (4)(5), red (logp = 5-7), darkred (logp = 7-10), white(logp = .10). (TIF)