Exploring the basis of 2-propenyl and 3-butenyl glucosinolate synthesis by QTL mapping and RNA-sequencing in Brassica juncea

Brassica juncea is used as a condiment, as vegetables and as an oilseed crop, especially in semiarid areas. In the present study, we constructed a genetic map using one recombinant inbred line (RIL) of B. juncea. A total of 304 ILP (intron length polymorphism) markers were mapped to 18 linkage groups designated LG01-LG18 in B. juncea. The constructed map covered a total genetic length of 1671.13 cM with an average marker interval of 5.50 cM. The QTLs for 2-propenyl glucosinolates (GSLs) colocalized with the QTLs for 3-butenyl GSLs between At1g26180 and BnapPIP1580 on LG08 in the field experiments of 2016 and 2017. These QTLs accounted for an average of 42.3% and 42.6% phenotypic variation for 2-propenyl and 3-butenyl GSLs, respectively. Furthermore, the Illumina RNA-sequencing technique was used to excavate the genes responsible for the synthesis of GSLs in the siliques of the parental lines of the RIL mapping population, because the bulk of the seed GSLs might originate from the siliques. Comparative analysis and annotation by gene ontology (GO) and kyoto encyclopedia of genes and genomes (KEGG) revealed that 324 genes were involved in GSL metabolism, among which only 24 transcripts were differentially expressed genes (DEGs). Among those DEGs, 15 genes were involved in the biosynthesis and transport of aliphatic GSLs, and their expression patterns were further validated by qRT-PCR analysis. Joint QTL mapping and RNA-sequencing analyses reveal one candidate gene of IIL1 (LOC106416451) for GSL metabolism in B. juncea. These results will be helpful for further fine mapping, gene cloning and genetic mechanisms of 2-propenyl and 3-butenyl GSLs in B. juncea.


Plant materials and field trial
Parents G266 and G302 are DH lines. The seeds of the G266 line had low 2-propenyl and high 3-butenyl GSL contents, while G302 had high 2-propenyl and low 3-butenyl GSL contents. The RILs produced by G266×G302 displayed great variation in agronomic traits, such as flowering time and number of seeds per silique as presented in our earlier study [33]. Three replicates for each of the parental lines G266 and G302, their F 1 and 167 F 6 RILs were planted on the farm of Guizhou University, Guiyang, China in 2016 and 2017. The design of the trial was a randomized complete block. Each plot consisted of two rows with one size of 3.66 m 2 (3 m×1.22 m). Five grams of seeds from three plants in each plot were harvested at maturity and analyzed for 2-propenyl and low 3-butenyl GSL contents. The average 2-propenyl and low 3-butenyl GSL contents of the three replicates were used for QTL analysis.

DNA extraction and polymerase chain reaction (PCR)
Genomic DNA was extracted from young leaves of the parental lines, F 1 and 167 F 6 RILs using the modified sodium dodecyl sulfate method [34]. PCR of the ILP markers was carried out according to our previous studies [11,35]. Each PCR (20 μl) contained 1× standard PCR buffer (NEB), 1 U of Taq polymerase (NEB), 0.25 μM forward primer, 0.25 μM reverse primer, 100 μM each dNTP and 50 ng of genomic DNA in a total volume of 20 μL. The PCR amplification consisted of an initial denaturation at 94˚C for 5 min; 35 cycles consisting of 94˚C (45 sec); 55˚C (45 sec) and 72˚C (1 min); followed by termination at 72˚C for 7 min. All PCR products were analyzed by electrophoresis in 2% agarose gels in 1× tri-acetate-ethylene diaminetetra acetic acid buffer. Gels were visualized by staining in ethidium bromide and photographed on a digital gel documentation system.

Construction of genetic linkage map and QTL analysis
The genetic linkage map of B. juncea was constructed by using JoinMap 4.0 software at LOD�4.0 [36]. Recombination frequencies were converted to map distances in cm using the Kosambi mapping function and the genetic map was drawn with MapChart software [37]. QTL analysis of 2-propenyl and 3-butenyl GSL contents was performed using the interval mapping method of MapQTL 6.0 software [38]. A permutation test (1,000 replications) was used to determine the significance level for LOD with a genome-wide probability of p<0.05.

Glucosinolate component analysis
The 2-propenyl and 3-butenyl GSL contents of the mature seeds from each plot were analyzed following published methods [39,40] with minor modifications. Each seed sample was crushed and 200 mg of each sample was extracted twice with 2 ml boiling 70% methanol. The concentration of GSLs in the seeds was determined by high-performance liquid chromatography (Waters 2487/600/717) using the ISO9167-1 (1992) standard method.

RNA extraction, preparation, sequencing and data analysis
Total RNA (2 μg) was extracted from fresh seed coats of 20 DAP (days after pollination) siliques of three independent plants for each of the parental lines G266 and G302 using the TRIzol kit (Invitrogen, Carlsbad, CA), according to the manufacturer's instructions. The RNA purity was checked using the Kaiao K55001Spectrophotometer (Kaiao, Beijing, China), and the RNA integrity and concentration were assessed using the RNA Nano 6000 Assay Kit for the Bioanalyzer 2100 system (Agilent Technologies, CA, USA). Then, the six RNA samples were sent to the ANOROAD GENOME company (http://www.genome.cn/) for the construction of cDNA libraries and Illumina deep sequencing according to the paper of Wang et al. [41]. The raw RNA-sequencing data were filtered by a Perl script, following the steps of Wu et al. [42].

Identification and annotation of differentially expressed genes (DEGs)
DESeq2 v1.6.3 was designed for differential gene expression analysis between two samples with three biological replicates under the theoretical basis obeys the hypothesis of negative binomial distribution for the value of count. The p-value was corrected by the BH method. Genes with q�0.05 and |log2_ratio|�1 were identified as differentially expressed genes (DEGs) [43]. The DEGs obtained were further annotated with Gene Ontology (GO, http:// geneontology.org/) and analyzed by KEGG (Kyoto Encyclopedia of Genes and Genomes, http://www.kegg.jp/) [44,45]. The GO enrichment of DEGs was implemented by the hypergeometric test, in which the p-value is calculated and adjusted to produce the q-value, and the data background is the genes in the whole genome. GO terms with q<0.05 were considered to be significantly enriched. GO enrichment analysis was used to determine the biological functions of the DEGs. KEGG is a database resource containing a collection of manually drawn pathway maps representing our knowledge of molecular interaction and reaction networks. The KEGG enrichment of the DEGs was determined by the hypergeometric test, in which pvalue was adjusted by multiple comparisons to produce the q-value. KEGG terms with q<0.05 were considered to be significantly enriched.

Quantitative real time-PCR (qRT-PCR) analysis
Quantitative real-time PCR (qRT-PCR) was used to verify the transcript levels of the RNA-Seq results. Total RNA was extracted using the TRIzol kit (Invitrogen), according to the manufacturer's instructions. Then, the cDNA was synthesized by reverse transcription using Prime-Script RT reagent kits with gDNA Eraser (Takara, Dalian, China) according to the manufacturer's instructions. Sixteen gene-specific primers for qRT-PCR were designed based on reference unigene sequences randomly chosen from the DEGs using Primer Premier 5.0. Real-time PCR was conducted using SsoAdvanced TM Universal SYBRGreen Supermix (Hercules, CA) in a typical 20 μl PCR mixture. The 20 μl mixture contained 10 μl SYBR Green Supermix (2×), 0.4 μl reverse and forward primers (10 μM), 2 μl (100 ng) template cDNA, and 7.2 μl ddH 2 O. The qRT-PCR conditions were 95˚C for 2 min, followed by 40 cycles of 95˚C for 10 s (denaturation), followed by 60˚C for 20 s (annealing and extension). The 2 -ΔΔCt algorithm was used to calculate the relative level of gene expression. The β-actin gene was used as the internal control, and the T399 samples served as the control. All qRT-PCR were performed with three biological replicates, and run on a Bio-Rad CFX96 Real Time System (Bio-Rad, Hercules, CA, USA).

Construction of one genetic linkage map
A total of 304 polymorphic loci of the 359 polymorphic markers (84.7%) were mapped on 18 linkage groups and covered a genetic length of 1671.13 centiMorgans (cM) with an average marker interval of 5.50 cm (Table 1

QTL mapping of 2-propenyl and 3-butenyl glucosinolate contents
The 167 RILs, their parental lines G266 and G302, and the F 1 , were grown in the field with three replications. These lines were grown at Guiyang and distributed normally in 2016 and 2017 (Fig 3). No significant difference across the two years for 2-propenyl and 3-butenyl GSL contents was detected (p = 0.594 and p = 0.888, respectively). The 2-propenyl GSL contents was significantly negatively correlated with the 3-butenyl GSL contents in 2016 (r = -0.920, p = 0.000) and 2017 (r = -0.914, p = 0.000), respectively. The parents of the population differed in 2-propenyl GSL contents, with mean values of 17.29 μmol/g and 180.90 μmol/g for G266 and G302, respectively ( Table 2). The mean 2-propenyl GSL contents of F 1 was 67.34 μmol/g, which was closer to that of the female parent and slightly higher than the mean value of the RIL mapping population (63.45 μmol/g) ( Table 2). The range of 2-propenyl GSL contents in the RILs was 10.70~214.36 μmol/g in 2016 and 9.59~215.60 μmol/g in 2017 ( Table 2). The The genetic basis of 2-propenyl and 3-butenyl glucosinolate synthesis parents of the population differed in 3-butenyl GSL contents, with mean values of 86.98 μmol/ g and 15.80 μmol/g for G266 and G302, respectively ( Table 2). The mean 3-butenyl GSL contents of F 1 was 63.80 μmol/g, more similar to that of the female parent and slightly lower than the mean value in the RIL mapping population (75.98 μmol/g) ( Table 2). The range of 3-butenyl GSL contents of the RILs was 8.71~170.63 u mol/g in 2016, and 5.82~166.04 μmol/g in 2017 (Table 2) (Table 3 and Fig 4). Another two QTLs for 3-butenyl GSL contents were detected and colocalized with GNA-2016 and GNA-2017 between At1g26180 and BnapPIP1580 on LG08 (Table 3 and Fig 4). The two QTLs for 3-butenyl GSL contents explained 31% and 38.4% of the total variation in 2016 and 2017 respectively ( Table 3). All these QTLs were mapped to a region adjacent to the ILP marker At4g20150-1 on LG08 (Table 3 and Fig 4).

Synteny relationships between LG08 and A08 of B. juncea and B. rapa
LG08 contained a total of 35 ILP markers. Among these, 13 (37.1%) were developed from the single-copy genes of A. thaliana, and 17 and 5 were developed from the unique  LG08 showed synteny between LG08 and A08 of B. juncea (Fig 4). Furthermore, the sequences of the unique transcript fragments for developing the 22 B. napus and B. rapa markers were used to blast against the Brassica rapa genome (Brassica rapa cultivar Chiifu, Brapa_1.0) in NCBI (https://www.ncbi.nlm.nih.gov/), and 18 (81.8%) were mapped to the A08 chromosome (S1 Table). The synteny analysis indicated that LG08 was exactly chromosome A08 of the A genome.
Transcriptome analysis of the parental siliques as potential GSL source for seeds. To construct a de novo transcriptome database, three RNA libraries were generated for each of G266 and G302 lines through Illumina sequencing. A total of 139,197,152 and 138,957,982 raw reads were generated from the G266 and G302 libraries, respectively (S2 Table). After removing low quality reads, adapter polluted reads and higher N contents (>5%) reads, a total of 134,953,284 (96.95%, T399) and 134,832,452 (97.03%, T085) clean reads were obtained (S2 Table). After filtering out the genes that contained only one exon or encoded short peptide chains (<50 amino acid residues), a total of 81,826 transcripts were revealed by blasting the reference genome using DESeq2 (v1.6.3). To functionally annotate those transcripts, the 81,826 transcripts were blasted in search of Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG). Finally, 8,1694 transcripts (92.64%) were successfully annotated by GO and KEGG, and 324 ones of these transcripts were involved in GSL metabolism (S3 Table).
To identify differentially expressed genes (DEGs) involved in the GSL metabolism of siliques between G266 and G302. A rigorous algorithm with a threshold of "FDR�0.05 and jLog 2 Ratioj�1" was developed and used as thresholds to judge the significance of differences in transcript abundance. It was found that only 24 transcripts were DEGs related to GSL metabolism. Among those DEGs, 15 genes are involved in the biosynthesis and transport of aliphatic GSLs (Table 4), and 5 and 4 genes are involved in the biosynthesis of indolic GSLs and the breakdown of GSLs, respectively (S4 Table).
To verify the expression of these transcripts detected by RNA-Seq, the 15 candidate genes involved in the aliphatic GSL metabolism pathway were randomly chosen for validation by qRT-PCR. Detailed information about the chosen DEGs and primers is listed in S5 Table. The data obtained by qRT-PCR were consistent with the RNA-Seq results (Fig 5), suggesting the reliability of the transcriptome database.

Joint QTL mapping and RNA-sequencing analyses reveal the candidate genes for GSL metabolism in B. juncea
To integrate the results of QTL mapping and RNA-sequencing, we performs an alignment analysis between the 24 DEGs related to GSL metabolism and the reference genome of B. juncea [16] by the BLAST-like alignment tool [54]. Only two DEGs of LOC106429668 encoding MYB28/MYB29/MYB76 (physical position: 23,048,306) and LOC106416451 (physical position: 5,833,626) encoding IIL1 were located on A08 of B. juncea genome. In addition, the DNA sequence designed for the PIP markers on A08 also blast the B. juncea reference genome [16]. The QTLs for 2-propenyl and 3-Butenyl GSL is located between the physical position of 18,549,777 (BnapPIP592) and the start point of the chromosome, the region of which was overlapped with the position of LOC106416451.

Discussion
Recombinant inbred lines (RILs) are an important resource in the genetic mapping of complex traits in many species. The RIL mapping population was successfully produced in our laboratory, allowing us to construct a genetic linkage map by utilizing ILP markers in B. juncea. The ILP primers were designed on the conserved exons flanking the target intron of cDNA/EST sequences to exploit its polymorphism. Each ILP marker locus might represent one gene copy in the studied genome. Taking the polymorphic and monomorphic loci together, approximately 46.8% of the 306 polymorphic ILP primers in the present study revealed more than one locus, indicating a very close ratio revealed by ILP and RFLP primers in the earlier studies [55,56]. The multiple loci revealed by ILP primers confirmed the polyploidy of B. juncea. The higher polymorphism ratio of 24.1% in the ILP primers between the parental lines G266 and G302 not only revealed the hypervariability of ILP primers but also suggested a high degree of variation between the parental genomes. This high genetic difference between the parental synteny between LG08 and A08 of B.rapa through balsting analysis with Brapa_1.0 of Brassica rapa cultivar Chiifu in NCBI (https://www.ncbi.nlm.nih.gov/).
https://doi.org/10.1371/journal.pone.0220597.g004 The genetic basis of 2-propenyl and 3-butenyl glucosinolate synthesis lines would be convenient to increase the density of genetic markers in different linkage groups in B. juncea. The 2-propenyl and 3-butenyl GSLs are the major glucosinolates found in B. juncea [57]. The colocated QTL regions of 2-propenyl and 3-butenyl GSLs on A08 of B. juncea represented one novel QTL first detected on A08 in Brassiceae [57][58][59] that explained average phenotypic variations of 42.5% and 42.6%, respectively. The first reason might be that fewer studies focused on 2-propenyl and 3-butenyl GSLs of the germplasm originating in China as the important center of origin in B. juncea. Second, the exact order of linkage groups was difficult to obtain before the B. juncea genome was sequenced [16]. Another reason might be recombination events between the chromosomes of B. juncea in different regions. Although QTL mapping of 2-propenyl and 3-butenyl GSLs has been performed previously in B. juncea [57][58][59], no convenient and reliable primers could be used for gel detection in marker-assisted breeding. In the present study, all 306 primers mapped on B. juncea were ILP-type, providing a convenient, specific and rapid detection method for agarose gel electrophoresis in marker-assisted breeding. The ILP marker At4g20150-1 on A08 had the nearest distance of 1.37-3.37 cM to the peak. The ILP primer At4g20150 produced six fragments between 400 bp and 700 bp in the present study and mostly represented six copies of one gene, among which two copies were polymorphic and mapped to LG01 (At4g20150-2) and LG08 (At4g20150-1), respectively. In another genetic map of B. juncea, the primer amplified three copies from A3, B7 and B8 [32]. The specific marker of At4g20150 was codominant, clear and simple to score by agarose gel, which resulted in one 700 bp fragment. The novel QTL and the linked marker of At4g20150-1 can be helpful in exploiting the metabolic mechanism of 2-propenyl and 3-butenyl GSLs. The marker tightly linked to QTLs can also be used for marker-assisted selection (MAS). For example, Xu et al. (2018) transferred a thermostable β-amylase from wild barley into a commercial variety, and identified several elite lines with MAS [60], and a major QTL for resistance to Fusarium head blight was transferred from Thinopyrum elongatum onto durum wheat 7AL chromosome arm by MAS [61]. The genetic basis of 2-propenyl and 3-butenyl glucosinolate synthesis GSL synthesis in seeds is nearly nonexistent, so these compounds are mainly imported from other tissues [17, [62][63][64]. As the closest GSL source and the only organ with similar types of GSLs, the siliques might be the source of most seed GSLs in Brassicaceae. In the present study, the RNA-Seq technique was used to screen the key genes for aliphatic GSL synthesis in the siliques of the parental lines G266 and G302. In the siliques of G302, which has high aliphatic GSL contents, 9 DEGs associated with CYP83A1, IIL1, IPMI2, IPMI SSU1, AOP3, MYB28, MYB29 and MYB76 were upregulated, resulting in its high GSL contents. Surprisingly, SUR1 and SOT17-18 were downregulated for unknown reasons. Furthermore, GTR2 is upregulated in the siliques of G302, and thus, it might play a more important role in GSL transport from siliques to seeds than the downregulated gene GTR1. The process of GSL metabolism is complicated, and few advances have been achieved. The main reasons might be the transport of GSLs among different organs. The combined method of QTL mapping and RNA-Seq should be helpful for the future fine mapping and gene cloning of 2-propenyl and 3-butenyl GSLs.
The joint QTL mapping and RNA-sequencing analyses reveal one candidate gene of LOC106416451 encoding IIL1. IIL1 is mainly responsible for the isomerization of 2-alkylmalic acid to form 3-alkyl-malic acid [47,50,65,66]. In the present study, IIL1 is significantly highly (Log 2 Fold Change = 9.71, p = 3.58E-15) expressed in the siliques of G302 with high aliphatic GSLs than that in G266 with low aliphatic GSLs. The primary work validates that the IIL1 might be the key gene for GSL regulation in the present RIL mapping population. However, more work is needed to narrow the QTL region and validate the candidate gene of LOC106416451 in our future study.
In addition, the mapping population used in the present study displayed great variation in agronomic and quality traits in this study and our earlier study [33]. The constructed genetic map would be useful in QTL mapping, gene cloning and marker-based precision breeding of more important traits in B. juncea. Because the number of traditional genetic markers is limited, we would sequence the RILs to develop more SNPs (single nucleotide polymorphism) and create a unified, saturated genetic map of B. juncea in the future.
Supporting information S1