Fine mapping of the major QTL for seed coat color in Brassica rapa var. Yellow Sarson by use of NIL populations and transcriptome sequencing for identification of the candidate genes

Yellow seed is a desirable trait in Brassica oilseed crops. The B. rapa var. Yellow Sarson carry unique yellow seed color genes which are not only important for the development of yellow-seeded oilseed B. rapa cultivars but this variant can also be used to develop yellow-seeded B. napus. In this study, we developed near-isogenic lines (NILs) of Yellow Sarson for the major seed coat color QTL SCA9-2 of the chromosome A9 and used the NILs to fine map this QTL region and to identify the candidate genes through linkage mapping and transcriptome sequencing of the developing seeds. From the 18.4 to 22.79 Mb region of SCA9-2, six SSR markers showing 0.63 to 5.65% recombination were developed through linkage analysis and physical mapping. A total of 55 differentially expressed genes (DEGs) were identified in the SCA9-2 region through transcriptome analysis; these included three transcription factors, Bra028039 (NAC), Bra023223 (C2H2 type zinc finger), Bra032362 (TIFY), and several other genes which encode unknown or nucleic acid binding protein; these genes might be the candidates and involved in the regulation of seed coat color in the materials used in this study. Several biosynthetic pathways, including the flavonoid, phenylpropanoid and suberin biosynthetic pathways were significantly enriched through GO and KEGG enrichment analysis of the DEGs. This is the first comprehensive study to understand the yellow seed trait of Yellow Sarson through employing linkage mapping and global transcriptome analysis approaches.

juncea are amphidiploid species, each carrying two genomes and these genomes are evolved from a common prototype through polyploidization and re-arrangements [49]; this can introduce complexity in research and identification of a gene. Therefore, important knowledge can be gained through working with a simpler genome species, like B. rapa. For example, Kebede et al. [31] detected a second QTL for seed color on A9, Kebede and Rahman [50] detected a QTL for silique length on A5, and Rahman et al. [51] detected a major QTL for seed glucosinolate content on A2 while working with the diploid species B. rapa; these QTLs, however, could not be found in literature from studies with the amphidiploid species. The objective of this study was to fine map the major seed color QTL in B. rapa by use of a set of near-isogenic lines (NILs) differing for seed color and to identify the candidate genes involved in the control of this trait through transcriptome sequencing of the developing seeds of the NILs. Knowledge gained from this research can be used in marker-assisted selection for the development of yellow-seeded Brassica oilseed crops, as well as for the development of gene-based molecular markers.

Plant materials
Near-isogenic BC 4 S 1 and BC 4 S 2 (four backcross followed by one and two time self-pollination) populations were developed from a cross between two self-compatible B. rapa lines, Sampad (Yellow Sarson, yellow seed) and 3-0026.27 (yellowish brown seed), and using Sampad as the recurrent parent. For this, marker-assisted backcrossing technique was applied where the plants heterozygous for seed color were identified using the co-dominant SCA9-2 QTL markers of A9 reported by Kebede et al. [31]. From this backcross program, the BC 4 plants heterozygous for seed color were self-pollinated for BC 4 S 1 seeds; seed color of the BC 4 S 1 seed families was brown. Fifty five plants of a single BC 4 S 1 family were grown in a growth chamber (20˚C, 16 h light, 8 h dark) and were self-pollinated by bag isolation for BC 4 S 2 seeds; all these plants were also genotyped with simple sequence repeat (SSR) markers from the two QTL regions, SCA9-1 and SCA9-2, of A9 [31]. Segregation for SSR marker CB10022A from the QTL region of SCA9-2 was found in this population where the plants carrying Sampad allele in homozygous condition produced uniform yellow seeds, while the plants homozygous for 3-0026.27 allele or heterozygous for Sampad and 3-0026.27 alleles produced brown seed (S1 Fig). In contrast, all 55 BC 4 S 1 plants were homozygous for Sampad allele for the SSR marker CB10103A from the QTL region SCA9-1; this suggests that only the allele of the major QTL SCA9-2 were segregating in this population, as could be expected in a NIL population.
Based on genotypic data of the BC 4 S 1 plants and phenotypic data of the harvested BC 4 S 2 seeds, a total of 582 BC 4 S 2 plants from five yellow-, five homozygous brown-and five heterozygous brown-seeded BC 4 S 2 families were grown in a greenhouse. All these plants were genotyped with SSR markers and phenotyped of seed color for fine mapping of the seed color gene.
Descriptive statistics of the BC 4 S 1 and BC 4 S 2 populations, such as mean and standard error, were calculated in Microsoft Excel 2003, Pearson's correlation coefficient and one-way analysis of variance (ANOVA) were calculated using the software program Statistical Product and Service Solutions (SPSS) [52].
Seeds of the BC 4 S 1 plants were analyzed for seed oil and protein contents by Near Infrared (NIR) spectroscopy method and reported as percent of whole seed at 0% moisture basis. designed from the SCA9-2 QTL region and by use of Brassica genome sequence information (http://brassicadb.org/brad/datasets/pub/Genomes/Brassica_rapa/V1.0/V1.5/) [54], were tested on the two parents for polymorphism; the polymorphic markers, as well as the published QTL markers of SCA9-2 [31] were used to genotype the BC 4 S 2 plants.

RNA extraction and cDNA library construction
Self-pollinated seeds at the age of 10, 20, 30, 40 and 50 days after pollination (DAP) were collected from all above-mentioned 55 BC 4 S 1 plants. However, after confirmation of their genotype through SSR marker analysis of the BC 4 S 1 and BC 4 S 2 plants, and progeny test result of the BC 4 S 1 plants (i.e. seed color of the BC 4 S 2 plants), developing seeds from four yellow-seeded and four homozygous brown-seeded BC 4 S 1 plants were selected and mixed to constitute the yellow-and brown-seeded bulks for RNA-seq analysis. Thus, the total number of bulks was 2 seed color type × 5 growth stages × 2 replicates = 20. RNA extraction was done using a RNeasy mini kit (QIAGEN, Canada), and the first-strand cDNA synthesis was done using a Super-Script II Reverse Transcriptase kit (Invitrogen, USA).

Transcriptome sequencing and differentially expressed genes (DEGs) calling
Twelve of the above-mentioned 20 bulk RNA samples of the BC 4 S 1 plants, collected at 20, 30 and 40 DAP, were used for RNA sequencing. Preparation of the cDNA libraries and sequencing (paired-end 150 bp) was done by Novogene (https://en.novogene.com/) using the Illumina Hiseq platform. After quality control of the raw reads (Q20 score � 95%, Q30 score � 88%), the clean reads were mapped to the B. rapa reference genome (ftp://ftp.ensemblGenomes.org/ pub/plants/release-37/fasta/brassica_rapa/dna/) with the gene annotation files (ftp://ftp. ensemblGenomes.org/pub/release-23/plants/gtf/brassica_rapa/) by using the default parameters in TopHat2 v.2.1.1 [55]. Mapping information from all samples was combined and assembled using the Cufflinks assembler to identify the novel genes and optimize the known transcripts [56]. Normalized gene expression level was measured by Fragments Per Kilobase of transcript sequence per Millions base pairs sequenced (FPKM) method [57] using the HTSeq software. DESeq software [58,59] was used to screen the DEGs, and the filtered DEGs (q-value<0.05) were used for clustering and Venn Diagram analysis. The sequence data was submitted to the Sequence Read Archive (SRA) database in NCBI (Accession number: PRJNA512955).
Fine mapping and transcriptome sequencing -Brassica seed color

Quantitative real-time PCR analysis (qRT-PCR) of selected genes
Based on RNA-seq data analysis and the target gene mapping result, 13 genes from the SCA9-2 QTL region and 19 genes from different biosynthetic pathways, such as phenylpropanoid, flavonoid, lignin and fatty acid biosynthetic pathways, were selected for further validation by qRT-PCR method. All 20 bulks of RNA of the yellow-and brown-seeded plants, collected at five different stages after pollination (10,20,30,40 and 50 DAP), were used for this analysis. The total reaction volume was 20 μL comprising of 10 μL 2× Promega GoTaq qPCR Master Mix, 1 μL 20 μM forward primer, 1 μL 20 μM reverse primer, 100 ng first-strand cDNA, and nuclear-free water was added to make 20 μL. PCR reaction was performed in StepOnePlus Real-Time PCR Systems (Applied Biosystems) with the PCR profile: 2 min at 94˚C for hotstart activation followed by 40 cycles of 3 s at 94˚C for denaturation, 30 s at 60˚C for annealing and extension, and the dissociation curves were generated after the final amplification cycle by increasing the temperature from 60 to 95˚C. The 2 -ΔΔCt method [60] was adapted to calculate the expression of the selected genes, and Actin7 was used as internal reference gene.

Hierarchical cluster (HCL), SNP and Indel, transcription factor analysis of the DEGs, and development of CAPS markers
Heatmap and HCL analysis of the DEGs based on FPKM vales of the yellow and brown seeds were performed using Multi Experiment Viewer (Mev) software v.4.9.0 [61]. Genome Analysis Toolkit (GATK3) was used for SNP and INDEL calling. CAPS (cleaved amplified polymorphic sequence) markers, based on the SNPs identified in this study, were developed from the target region of SCA9-2 of A9 using the software program dCAPS Finder 2.0 (http://helix.wustl.edu/ dcaps/dcaps.html) to detect mutated restriction endonuclease cleavage sites. Details of PCR amplification of the markers and restriction site analysis of the parents and segregating BC 1 S 2 population is described elsewhere [62].
Multiple sequence alignment analysis of the mutant (yellow seed) and reference genes' protein sequences was performed using the online Clustal Omega software (https://www.ebi.ac. uk/Tools/msa/clustalo/). Transcription factor and Protein Kinase Identifier and Classifier (iTAK) was used to perform transcription factor analysis [63].

Gene ontology (GO) and kyoto encyclopedia of genes and genomes (KEGG) enrichment, protein-protein interaction (PPI) analysis
Plant GO term enrichment analysis was done by GOseq [64], and the KEGG enrichment analysis was done using KOBAS 2.0 software [65] and A. thaliana as the reference species. The PPI analysis of DEGs was conducted in STRING (search tool for recurring instances of neighbouring genes) database (https://string-db.org/) using B. rapa as the reference organism [66].

Phenotype of the NIL populations
The near-isogenic BC 4 S 1 plants differing for yellow and brown seed color were compared for different seed quality traits (Table A in S1 File). The yellow seed had significantly greater oil as well as sum of oil and protein contents as compared to the brown seeds. Oil and sum of oil and protein content showed significant negative correlation with seed coat color (Table A in S1 File) confirming the importance of the yellow seed for these two seed quality traits.
No seed coat pigment synthesis was observed in the developing seeds up to 30 DAP. At 40 DAP, the brown seeds had significant amount of color, and embryos were still green; and at 50 DAP, the brown seeds turned to completely brown, and the yellow seeds turned to yellow due to the change in embryo color to yellow (Fig 1).

Fine mapping of the major QTL of A9
Based on sequence information of the SSR markers used by Kebede et al. [31] (Fig 2D), a physical map of the chromosome A9 was constructed by aligning the marker sequences to the genome sequence of B. rapa cv. Chiifu-401 ( Fig 2C). Of the seven markers from 0.0 to 44.6 cM region (included the QTL SCA9-1) of this linkage map, four could be positioned on the physical map (27.02~32.45 Mb) of A9. On the other hand, of the 14 markers from 63.0 to 120.0 cm region (included the QTL SCA9-2), nine could be positioned on the physical map of which eight mapped at 21.08~25.18 Mb of A9 ( Fig 2C); however, the marker CB10022A which showed co-segregation with seed color in the BC 4 S 1 population could not be positioned on this map.
Based on linkage and physical position of the above-mentioned markers around SCA9-2 ( Fig 2C and 2D), a 3.5 Mb region between the markers nrc793 and nrc780 was chosen to increase the density of marker; this region included the marker CB10022A of the genetic map constructed by Kebede et al. [31]. A total of 40 new SSR markers were designed based on B. rapa genome sequence information, where only six showed polymorphism between the yellow-and brown-seeded plants; these markers positioned at 20.10 to 24.13 Mb in the B. rapa genome (Table 1; Fig 2B). Of the six new markers, two (SC2-SSR01 and SC2-SSR31) behaved as co-dominant markers while four behaved as dominant marker. The BrTT1 gene could be positioned at 18.77 Mb position of this map.
The newly designed six SSR markers and nrc793 were used to genotype the BC 4 S 2 population derived from five heterozygous BC 4 S 1 plants to study their linkage association with seed color as well as to construct a genetic linkage map (Fig 2A). Of the seven markers, the co-dominant marker SC2-SSR01 was found to be the closest to nrc793 on the physical map; this marker was capable of distinguishing the yellow-, homozygous brown-and heterozygous brown-seeded plants in the same manner as CB10022A, however, showed 3.13% recombination (3.13 cM away from SCA9-2) in the BC 4 S 2 segregating population. In contrast, all yellow-seeded BC 4 S 2 families derived from five yellow-seeded BC 4 S 1 plants carried the Yellow Sarson allele of SC2-SSR01 in homozygous condition ( Table 2). The dominant marker SC2-SSR07 and the co-  Table 2). The markers SC2-SSR07 (0.63 cM from SCA9-2, 21.83 Mb), SC2-SSR15 (3.13 cM from SCA9-2, 22.79 Mb) and SC2-SSR26 (5.65 cM from SCA9-2, 24.13 Mb) are located on one  [31]; the position of the QTL SCA9-2 is shown. Marker position (cM) indicates the distance from SCA9-2. (b) A physical map of the QTL region SCA9-2 of A9 constructed based on one SSR marker used by Kebede et al. [31] and the six newly-developed markers co-segregating with seed color; the position of the BrTT1 gene is also shown. Different colors indicate different scaffolds (purple = scaffold000059, blue = scaffold000040, green = scaffold000081, yellow = scaffold000084, brown = scaffold000045)(c) A physical map of A9 constructed based on sequence information of the markers from the major and minor QTL regions of the linkage group A9 published by Kebede et al. [31]; the physical map was constructed through aligning the marker sequences to the B. rapa reference genome in BRAD database. (d) The genetic linkage map of A9 published by Kebede et al. [31]; the major seed coat color QTL SCA9-2 and the minor QTL SCA9-1 are shown.
https://doi.org/10.1371/journal.pone.0209982.g002 Table 1. Primer sequence of the simple sequence repeat (SSR) markers, their physical position and genetic distance from the major seed coat color QTL SCA9-2 of A9 of Brassica rapa. side of SCA9-2 and distributed on the Scaffold000081 and Scaffold000045; their location on the genetic linkage map linearly followed the position on the physical map (Fig 2A and 2B). On the other hand, the markers SC2-SSR01 (3.13 cM from SCA9-2, 21.12 Mb), nrc793 (3.13 cM from SCA9-2, 21.08 Mb), SC2-SSR35 (2.5 cM from SCA9-2, 20.99 Mb) and SC2-SSR31 (1.25 cM from SCA9-2, 20.10 Mb) are located on the other side of SCA9-2 and gathered on the Scaf-fold000040; their order on the genetic (cM) and physical (Mb) was not linear. The gene BrTT1 was found to be located on the Scaffold000059 (Fig 2B). Based on the results from this study, it can be assumed that the chromosomal region, including the Scaffold000081, Scaffold000040 and Scaffold000059, may carry the candidate gene for seed coat color.  Fig and S3B Fig).   In total, 515 DEGs were detected between the yellow and brown seeds at these three developmental stages, and these DEGs distributed on 10 chromosomes and 17 scaffolds of B. rapa. Fifty-five of these DEGs were detected in all three developmental stages (Fig 3, Table C in S1 File). Of the 55 DEGs, 30 (>50%) were mapped to the chromosome A9, providing additional evidence to the existence of the major QTL SCA9-2 on this chromosome. These 30 genes included 5 novel and 25 known genes and located at 8,110,483 to 28,778,767 nt position of A9; 10 of the DEGs, in fact, are located in the SCA9-2 QTL region of 18,744,634 to 22,759,730 nt (Table C in S1 File). In addition to the above-mentioned 30 DEGs, 165 DEGs from A9 also showed significant differences at one or two stages of seed development; 45 of which located in the SCA9-2 region (Table D in S1 File). Thus, a total of 55 (10 + 45) DEGs from the SCA9-2 QTL region constituted the candidate gene pool for seed coat color, and this included 32 down-regulated, 21 up-regulated and two time-varying DEGs (Table E in S1 File).

Expression profiling analysis of selected genes by qRT-PCR
qRT-PCR analysis was done to study the reliability of the RNA-seq data. For this, a total of 27 single-copy genes, comprising of 12 genes from the SCA9-2 QTL region and 15 genes from the known biosynthetic pathways, were used. Highly significant correlation (R 2 = 0.94) between the FPKM and 2 -ΔΔCt values of these 27 single-copy genes (Fig 4, Table F in S1 File) demonstrated the reliability of the RNA-seq data in our study. Some of the genes used for qRT-PCR analysis, such as BrTT4, BrTT6, BrTT19, BrAHA10, BrTT1, BrTT2 and BrTT16 showed higher level of expression at 10 DAP. Hierarchical cluster analysis of the 19 genes (gene families) from the flavonoid and phenylpropanoid biosynthetic pathways and 13 genes (gene families) from the SCA9-2 QTL region (included 8 up-regulated and 5 down-regulated DEGs, which showed significant fold-change and to be potential candidate genes for seed coat color) grouped these genes into four sub-clusters (Fig 5). Almost all structural genes from the flavonoid biosynthetic pathway and one transcription factor Bra028039 (NAC family protein) were included in one sub-cluster.

HCL analysis of 515 DEGs
HCL analysis was done to find the genes from the SCA9-2 QTL region co-expressing with the flavonoid pathway genes (might function together with the flavonoid pathway genes) to be considered as the potential candidate genes. Based on the HCL analysis of the FPKM values from the above-mentioned 515 DEGs in yellow and brown seeds, nine sub-clusters were obtained (Pearson's correlation value = 0.4) (S6 Fig). Nearly all of the DEGs from the flavonoid biosynthetic pathway were included in the sub-cluster 2; this cluster included greater number of DEGs (172 DEGs, 33.4% of all DEGs) as compared to the other sub-clusters and also included 16 DEGs from the major QTL region (SCA9-2) of the chromosome A9 (Table E in S1 File). The expression value of the DEGs from this sub-cluster showed significant differences between yellow and brown seeds at 20 DAP.

SNP, indel and transcription factor (TF) analysis
SNPs and Indel analysis was done on the above-mentioned 55 (10 + 45) DEGs from the SCA9-2 QTL region where 38 found to carry SNP and one carried both SNP and Indel (Table E in S1 File); these 39 DEGs were considered to be the putative candidate genes. Beside this, seven SNPs were detected in the transcript of BrTT1 gene (Bra028067). Alignment of the deduced amino acid sequence of the BrTT1 gene of the yellow-seeded NIL with the reference amino acid sequence in BRAD database and the homologous amino acid sequence of Dahuang (B. rapa) [34] identified two silent and two sense mutations which were same in both yellow- Fine mapping and transcriptome sequencing -Brassica seed color seeded NIL and Dahuang; however, this analysis also identified three novel sense mutations in the yellow-seeded NIL, i.e. in Yellow Sarson (S7 Fig). A total of seven CAPS markers designed (Table G in S1 File) of which five yielded predicted PCR products; however, restriction endonuclease cleavage sites could not be detected by these markers. The CAPS marker 206TaqI was able to distinguish the yellow and brown-seeded parents based on PCR results; however, it was not able to detect the restriction endonuclease cleavage site. On the other hand, the CAPS marker 191BamHI clearly distinguished the parents and different genotypes of the NIL population based on restriction endonuclease cleavage site results (Fig 6, Table G in S1 File); this further strengthen the reliability of the mapping results reported above (Fig 2).
Transcription factor analysis was done on all the above-mentioned 515 DEGs. A total of 41 TFs were detected among these DEGs; three of these, Bra028039 (NAC family protein), Bra023223 (C2H2 type zinc finger family protein) and Bra032362 (TIFY family protein), were located in the SCA9-2 QTL region (Table E in S1 File).

GO and KEGG enrichment analysis of the DEGs between the yellow-and brown-seeded samples
GO term enrichment analysis was done to functionally profile (molecular function, biological process and cellular component) the DEGs, and the KEGG enrichment analysis was done to identify the biosynthetic/metabolic pathway in which the DEG's are involved. GO terms enrichment analysis of the above-mentioned 515 DEGs showed that 15 GO terms were significantly enriched (corrected p-value < 0.05) at 20 DAP; this included eleven biological-process terms, three molecular-function terms and one cellular-component term (S8 Fig, Table H in S1 File). Ten of the 11 biological-process terms were found to be related to phenylpropanoid, flavonoid or phenol-containing compound biosynthesis/metabolism (S8 and S9 Figs, Table H in S1 File), while the single term GO:0010345 was found to be involved in suberin biosynthesis. Suberin biosynthesis is known to be related not only to lipid biosynthesis but also to phenylpropanoid biosynthesis as the building blocks of suberin biosynthesis are the products of these two pathways [67]. No significantly enriched GO terms were detected at 30 and 40 DAP (Table H in S1 File).
KEGG pathway enrichment analysis indicated the involvement of flavonoid biosynthetic pathway (ath00941, down-regulated) in all stages of seed development (Table I in S1 File). Other pathways like cutin, suberin and wax biosynthesis (ath00073), fatty acid degradation (ath00071) and fatty acid biosynthesis (ath00061) were found to be up-regulated only at 20 DAP (Table I in S1 File; Fig 7). Thus, based on the GO and KEGG pathway enrichment analysis, it is evident that phenylpropanoid, flavonoid, suberin and fatty acids biosynthetic pathways are involved in the development of seed coat color (Tables H and I in S1 File), and a total of 139 DEGs were identified based on these analyses (Table J in S1 File).

PPI analysis of DEGs
Based on available information of the pathways involved in phenylpropanoid, lignin, flavonoid, and the cutin and suberin biosynthesis [20,[67][68][69] and the pathway maps available in KEGG database (ath00940: Phenylpropanoid biosynthesis; ath00941: Flavonoid biosynthesis; ath00073: Cutin, suberin and wax biosynthesis; ath00061: Fatty acid biosynthesis; ath00062: Fatty acid elongation; ath00071: Fatty acid degradation) (http://www.genome.jp/kegg/ pathway.html), a simplified flow diagram was developed (Fig 8) to understand the involvement of the DEGs identified in this study in the development of seed coat color. The expression and annotation of 242 genes involved in these pathways were collected and summarized in Table K in S1 File.
PPI analysis was performed between the above-mentioned 55 (10 + 45) DEGs from the SCA9-2 QTL region (Table E in S1 File) and the 242 genes (Table K in S1 File) from different enriched and related biosynthetic pathways. Results showed that 41 genes (25 genes from the flavonoid biosynthetic pathway, four genes from the phenylpropanoid biosynthetic pathway, six genes from the suberin biosynthetic pathway, three genes from fatty acid degradation pathway, two genes from fatty acid biosynthetic pathway and one gene from phenoproponoid biosynthetic pathway) interacted with seven DEGs from the SCA9-2 QTL region (S10 Fig). Six of these DEGs (Bra028065, Bra027990, Bra028006, Bra027988, Bra023223, Bra032367) interacted directly or indirectly with the BrTT1 (Bra028067), BrTT2 (Bra035532), BrTT16 (Bra026507, Thus, based on annotation of the function of the genes, TF analysis, SNP and Indel analysis, PPI analysis, GO and KEGG enrichment and HCL analysis, a total of 11 DEGs (Table 3, Table E in S1 File) from the SCA9-2 QTL region were identified to be the putative candidate genes involved controlling seed coat color in Yellow Sarson.

Seed coat color genes/QTL of the A9 chromosome of Brassica
To date, several researchers, such as Lou et al. [30], Rahman et al. [70,71], Zhang et al. [72], Li et al. [17], Xiao et al. [33], Bagheri et al. [32], Padmaja et al. [24], Stein et al. [73], Qu et al. [74,75], Hong et al. [48], Wang et al. [76] and Wang et al. [23,34] have reported a major seed coat color gene or QTL on the chromosome A9 of B. rapa, B. napus and B. juncea. By use of the information reported by the above-mentioned researchers, the physical position of the QTL/ genes and the markers for seed coat color were aligned to the reference B. napus and B. rapa genome (http://brassicadb.org/brad/blastPage.php) and presented in Fig 9. In addition to this, the homologous TT genes of A9 of B. napus and B. rapa reported by Qu et al. [77] were also included in the Fig 9. Based on this, it was found that the major seed coat color gene or QTL of Fine mapping and transcriptome sequencing -Brassica seed color the chromosome A9 of B. napus position either on the middle or at one end of the chromosome, while the major QTL of B. rapa and B. juncea in all cases position at the middle of this chromosome. The two homologous TT genes, BrTT1 and BrTT8 (BjTT8), which were reported by several researchers [17,23,24,34] to be involved in the control of seed coat color in B. rapa and B. juncea, were located at the middle of A9 (Fig 9).

Discussion
Several studies on seed coat color indicated that a major locus is capable of regulating the formation of seed coat pigments in B. rapa [17,22,28,33,35,78]. In the present study, genetic analysis and molecular mapping of seed coat color by use of NILs further supported this, and confirmed the major QTL SCA9-2 located on the chromosome A9. Kebede et al. [31] also detected a minor QTL SCA9-1 on this chromosome; this QTL shows interaction with the QTL SCA9-2 and has also been found to affect the seed coat color. However, recurrent backcrossing of the F 1 to the Yellow Sarson parent Sampad with selection for the SCA9-2 QTL alleles resulted in the production of the NILs which carried only the Sampad allele in the SCA9-1, as expected. Table 3. Potential candidate genes for seed coat color identified in the major QTL region SCA9-2 of the chromosome A9 of Brassica rapa. Fine mapping and transcriptome sequencing -Brassica seed color SSR markers have been used most commonly due to several advantages, such as high abundance and occurrence throughout the genome, codominant nature, and can be detected through PCR-based assay [79,80]; therefore, we developed tightly linked SSR markers from the SCA9-2 QTL region by taking the advantage of the B. rapa genome sequence. Four (SC2-SSR01, nrc793, SC2-SSR35, SC2-SSR31) of the seven linked markers were found to be Fine mapping and transcriptome sequencing -Brassica seed color located at one side of the QTL region (20.10 to 21.12 Mb) of the Scaffold000040. Based on linkage association of these four markers and their physical position in the reference B. rapa genome, it is apparent that a chromosome inversion might have occurred in the Scaf-fold000040 of the NILs used in our study. The Yellow Sarson B. rapa used in this study is genetically distinct from the other variants of this species including the Chinese cabbage [80], therefore, slight structural variation between the chromosomes of Yellow Sarson and the reference genome of Chinese cabbage cv. Chiifu was not unexpected. Cytologically, one of the chromosomes of Yellow Sarson was found to be slightly different from the chromosome of the other oilseed type B. rapa at pachytene stage [81]. Another possibility could be an error during genome assembly of the Scaffold000040 of A9; however, more markers need to be developed from this genome region to confirm this.

Gene ID Location Expression in yellow seeds
Transparent testa 1 (TT1) and Transparent testa 8 (TT8) are known to be the two important regulatory factors involved in the flavonoid biosynthetic pathway (for review, see [68]). The TT1 encodes a C2H2 zinc finger protein, which interacts with R2R3-MYB factors and is also known to regulate the expression of several key structural genes, such as TT4, TT3, TT18, BAN, and TT12, involved in the flavonoid biosynthetic pathway [82,83]; while the TT8 encodes a Basic Helix-Loop-Helix (bHLH) protein, can form a ternary complex with TT2 and TTG1 and regulates the expression of BAN gene; the TT8 is also under the control of other regulators [84][85][86][87]. According to Wang et al. [23,34], the BrTT1 is involved in the control of seed coat color in the B. rapa landrace Dahuang. However, we were not able to detect the BrTT1 in the SCA9-2 QTL region using FPKM value through RNA-Seq analysis, but qRT-PCR analysis showed that the expression of BrTT1 was significantly down-regulated at 10 and 30 DAP, not at 20 DAP, in the Yellow Sarson used in this study (S11 Fig). Wang et al. [34] reported significant difference in the level of expression of this gene from 7 to 42 DAP in Dahuang. When comparing the expression level of the other genes from the flavonoid pathway, we found that BrTT4, BrTT5, BrTT6, BrTT3, BrTT18, BrBAN, BrTT8 and BrTTG2 were strongly inhibited while a certain level of expression was found for BrTTG1, BrTT2 and BrTT16 in the yellow seeds; similar level of expression of these genes was also found in Dahuang. However, the pattern of expression of most of the BrTT genes was different in these two studied materialsthe highest level of expression was observed in our study at the early stage of seed development (10 and 20 DAP) (Fig 5, S6 and S11 Figs), while the highest level of expression was found at the middle stage (35 DAP) in Dahuang [23]. Lian et al. [88] also provided evidence that TT1 gene is involved in the control of seed color in B. napus. Based on these evidences, it is apparent that the mechanism of the regulation of seed coat color in Yellow Sarson is different from the mechanism involved in Dahuang. However, PPI analysis provided evidence for the association of several down-regulated genes from the SCA9-2 QTL region with BrTT1 (S10 Fig). Thus, based on qRT-PCR analysis and previous reports [23,34], it appears that BrTT1 might be involved in the regulation of seed coat color in Yellow Sarsonwhether this gene plays a critical role in this would need further functional investigation.
RNA-Seq analysis as well as qRT-PCR results showed that another important gene BrTT8 was significantly down-regulated in the yellow seeds at all stages of seed development (S11 Fig), however, this gene could not be detected in the SCA9-2 QTL region. Therefore, BrTT8 might not be a plausible candidate and this gene might be regulated by other upstream regulatory genes.
In this study, through RNA-Seq and qRT-PCR analysis, we have identified several DEGs in the SCA9-2 QTL region which have not been reported previously; it is highly likely that some of these genes play a major role in the control of seed color. The seed color of the B. rapa variant Yellow Sarson used in this study is bright yellow and is very unique. Seeds of this variant also lack pigment in the hilum which often can be found in other yellow-seeded lines, and the genetic control of hilum color is independent of the genetic control of seed color [89]. Therefore, it is not unlikely that some other candidate genes are involved in the control of seed color in Yellow Sarson.
RNA-Seq and qRT-PCR analysis also showed that, nearly all of the important structural genes (BrPAL,BrC4H,BrTT4,BrTT5,BrTT6,BrTT7,BrTT3,BrTT18,BrBAN,BrTT12,BrTT19,BrAHA10) and three important TFs (BrTT8, BrTTG2, BrTT1) from the phenylpropanoid and flavonoid biosynthetic pathways were determined to be down-regulated in the yellow seeds (Table K in S1 File; S11 Fig); therefore, it is possible that the candidate gene involved in the regulation of seed coat color in our materials might be an upstream regulatory gene. Indeed, the 55 DEGs detected in the SCA9-2 QTL region included three TFs, Bra028039 (NAC), Bra023223 (C2H2 type zinc finger) and Bra032362 (TIFY); all these contained SNPs or Indel and were down-regulated in the yellow seed, and Bra023223 and Bra032362 clustered with the BrTT genes (Table E in S1 File, Table 3). Furthermore, Bra023223 encodes the same type of protein as BrTT1 (C2H2 zinc finger protein) and was found to interact with the BrTT2 and BrTT16 (S10 Fig), therefore, this could be an important candidate gene for seed coat color in our material. Some of the other genes, such as Bra032367 and Bra032295 which showed interaction with the important genes from the flavonoid and fatty acid biosynthesis and degradation pathways, were down-regulated in the yellow seeds; however, none of these genes seems to have a regulatory function based on KEGG annotation and from the description of the homologous genes in A. thaliana. Several DEGs that encode unknown or nucleic acid binding proteins [90][91][92][93], such as Bra023269 (unknown protein) and Bra023172 (argonaute 2, nucleic acid binding protein), etc. (Table 3), which we identified through RNA-Seq analysis, could also be potential candidate genes for seed coat color formation. Therefore, further bioinformatics and functional analysis of the putative candidate genes, would be needed to conclude the candidate gene involved in the blockage of seed coat color formation in Yellow Sarson.
Comparative transcriptome analysis of yellow and brown seeds (seed coat) has been done in B. napus by Wang et al. [44] and Hong et al. [48] and in B. juncea by Liu et al. [18]. According to Wang et al. [44], the yellow seed mutant trait in B. napus is controlled by a single Mendelian gene; transcriptome analysis of the developing seeds of brown-and yellow-seeded NILs showed no significant difference in the level of expression of the TT gene orthologues between these two types of seeds; KEGG pathway enrichment analysis by this research group showed that the phenylpropanoid biosynthetic pathway could be enriched. Working with a different set of yellow-and brown-seeded B. napus NILs, Hong et al. [48] identified few novel genes to be responsible for the formation of seed coat color; however, in this case, expression of the TT gene orthologues, including all structural genes (except BnTT7) and important regulatory genes (BnTT8, BnTT1 and BnTT16) from the flavonoid biosynthetic pathway, were significantly down-regulated in the yellow seeds as compared to the brown seeds. In addition to this, the level of expression of most of these genes was found to reach at the peak at 28, 35 or 42 DAP with the greatest expression at 28 DAP; and the genes (BnPAL1, BnPAL2, BnC4H, Bn4CL1, Bn4CL3) from the phenylpropanoid biosynthetic pathway were found to be downregulated in the yellow seeds. In our study, GO and KEGG analysis showed enrichment for the flavonoid (down-regulated), phenylpropanoid (down-regulated), cutin, suberin and wax (upregulated) biosynthetic pathways and the ABC transporters (up-regulated), as has been reported in B. napus [48]. However, the other significantly enriched pathways, such as fatty acid degradation (up-regulated) and glycerolipid metabolism (up-regulated), which we identified in this study has not been reported by Hong et al. [48]; in contrast, they reported significant enrichment of plant-pathogen interaction (up-regulated) and plant hormone signal transduction (up-regulated). As found in our study, Hong et al. [48] also reported that the lignin biosynthetic pathway was not enriched in B. napus. In the case of B. juncea, transcriptome analysis of yellow-and brown-seeded NILs showed that genes from the flavonoid biosynthetic pathway, such as BjTT3, BjTT18 and BjBAN, as well as the genes involved in the phenylpropanoid, phenylalanine, tyrosine and tryptophan biosynthetic pathways are involved in the formation of seed coat color in this species [18].
Thus, it is apparent that, the genes involved in the phenylpropanoid and flavonoid biosynthetic pathways are generally involved in the formation of seed coat color in Brassica. However, some other enriched pathways as well as the expression profile of the flavonoid pathway genes could be different in different materials; this might result from the difference in the major seed coat color genes in these materials. Additionally, the cutin, suberin and wax biosynthetic pathway were also enriched and the genes from this pathway were up-regulated in the Yellow Sarson NILs used in this study, as has also been reported by Hong et al. [48] in B. napus NILs. Similarly, the tryptophan biosynthetic pathway was found to be enriched in our study as has been reported by Liu et al. [18] in B. juncea NILs; these pathways are known to have relation with lipid and protein biosynthesis. Thus, our findings provide valuable information on the genetic control and the complex transcriptome dynamics involved in the modulation of seed coat color.