Comparison of the Transcriptome between Two Cotton Lines of Different Fiber Color and Quality

To understand the mechanism of fiber development and pigmentation formation, the mRNAs of two cotton lines were sequenced: line Z128 (light brown fiber) was a selected mutant from line Z263 (dark brown fiber). The primary walls of the fiber cell in both Z263 and Z128 contain pigments; more pigments were laid in the lumen of the fiber cell in Z263 compared with that in Z128. However, Z263 contained less cellulose than Z128. A total of 71,895 unigenes were generated: 13,278 (20.26%) unigenes were defined as differentially expressed genes (DEGs) by comparing the library of Z128 with that of Z263; 5,345 (8.16%) unigenes were up-regulated and 7,933 (12.10%) unigenes were down-regulated. qRT-PCR and comparative transcriptional analysis demonstrated that the pigmentation formation in brown cotton fiber was possibly the consequence of an interaction between oxidized tannins and glycosylated anthocyanins. Furthermore, our results showed the pigmentation related genes not only regulated the fiber color but also influenced the fiber quality at the fiber elongation stage (10 DPA). The highly expressed flavonoid gene in the fiber elongation stage could be related to the fiber quality. DEGs analyses also revealed that transcript levels of some fiber development genes (Ca2+/CaM, reactive oxygen, ethylene and sucrose phosphate synthase) varied dramatically between these two cotton lines.


Introduction
Upland cotton (Gossypium hirsutum L.) is the largest natural fiber producer of the plants. In recent years, interest in naturally colored cotton has grown because it may reduce pollution, making it preferable to white fiber which requires a dyeing process [1][2][3]. However, its commercial application is very limited due to the lack of fiber color diversity and low fiber quality [2]. Limited brown (different color depth) and green fiber lines, among other varieties, have been used in the textile industry. A previous study demonstrated that there was a significant negative correlation between the degree of fiber color and lint percentage and fiber quality traits in cotton [4]. Therefore, subsequent studies should focus on improving the fiber quality and revealing the underlying mechanisms for pigmentation formation in naturally colored cotton.
Early genetic analysis suggested that the brown color of cotton fiber was controlled by one incompletely dominant major gene [5]. Furthermore, gene expression analysis and dimethylaminocinnaldehyde staining showed that tannins could be the key chemical responsible for the brown color in cotton fiber [6]. Subsequent chemical research indicated that the brown pigment in cotton fiber might be the chinone compound oxidated from condensed tannins, and the accumulation period of condensed tannins was from 10 DPA-25 DPA [7,8]. However, the molecular mechanism that underlies pigmentation in colored cotton fiber is still unknown.
With the development of next generation sequencing technology, RNA-seq provides a powerful tool to rebuild our knowledge of transcriptomics. By directly sequencing and assembling the mRNA, the whole transcriptome could be de novo reconstructed precisely and efficiently [9], aligned with public databases for function annotation and the critical genes could be assessed using pathway classification. In addition, gene expression can be measured and the number of transcripts can be obtained if the appropriate level of sequencing was performed. The application of RNA-seq technology has been used successfully for various species [10][11][12][13][14].
Lines Z263 and Z128, two brown fiber inbred lines with dark and light brown fiber respectively, both derived from a cross between white and brown fiber cotton. To reveal the whole transcriptome landscape of the natural colored cotton fiber, and to understand the molecular mechanism of pigment formation, the transcriptomes of these two lines at the whole early developmental stage (0 dpa-20 dpa) were sequenced using RNA-seq technology and analyzed.

Plant material
Line Z263 (Gossypium hirsutum L.), with dark brown fiber, was selected from a cross between white cotton (Zhong 6331) and brown fiber cotton (Crd). Line Z128 (G. hirsutum L.) was a selected mutant from Z263. They were planted in an experimental field at the Institute of Cotton Research, Chinese Academy of Agricultural Sciences (ICR, CAAS) under normal agronomic management conditions. The bolls at the day of anthesis (0 day post anthesis/DPA), 5 DPA, 15 DPA and 20 DPA were harvested and stored in an ice box, then the ovules of 0 DPA and fibers of other stages were dissected and separated on ice as fast as possible, and then stored at 280uC immediately.

Fiber quality measurement, fiber microstructure detection and cellulose test
To test the fiber quality of Z263 and Z128, the following measurements were included: upper half mean length (mm), uniformity index (%), micronaire, fiber elongation (%), fiber strength, 15 g lint samples of each line were analyzed using USTER HVI 1000 (USTER Technologies, Inc., Uster, Switzerland). The fiber microstructure detection and cellulose test were performed according to Ru et al [15].

RNA extraction and cDNA library construction
Total RNAs were extracted from each sample using the CTAB method described by Wan and Wilkins [16], with minor modifications to increase the yield. All RNA quality and quantity were measured by 1.0% agarose gel and an ultraviolet spectrophotometer. Four stages of RNAs (0 DPA, 5 DPA, 15 DPA and 20 DPA) with the same concentration and quality from each line were mixed together as one mixed library. The two libraries of Z263 and Z128 were constructed using the method described by Xia et al. [17].

RNA-seq and sequence de novo assembly
The sequencing of two libraries of Z263 and Z128 were performed on HiSeq 2000 (illumina) by the Beijing Genomics Institute (BGI) (Shenzhen, Guangdong, China). The raw reads, transformed from images, were first processed by removing adaptors and redundant fragments to generate clean reads. The clean reads de novo assembly was carried out using the short reads assembling program-SOAPdenovo [18]. Clean reads with a certain length of overlap were first combined to form ''contigs''. Then the clean reads were mapped back to the contigs; it was possible to detect contigs from the same transcript, as well as the distances between them, by paired-end reads. Next, SOAPdenovo connected the contigs into ''scaffolds''; ''N'' was used to present unknown sequences. Paired-end reads were used again to fill the gaps between scaffolds, longer sequences which could not extend at either end were assembled and defined as ''unigenes''. In this study, the unigenes from the two libraries were further processed by sequence splicing and redundancy removal with the sequence cluster program-TGICL [19] to acquire non-redundant unigenes that were as long as possible.

Differential expressed genes (DEGs) identification and enrichment analysis
Referring to the method described by Audic and Claverie [22], the Beijing Genomics Institute (BGI) developed a rigorous l is the real transcripts of the gene ð Þ The total clean tag number of sample 1 is N 1 , and total clean tag number of sample 2 is N 2 ; gene A holds x tags in sample 1 and y tags in sample 2. The probability of gene A expressed equally between two samples can be calculated with the following formula: The P value corresponds to the differential gene expression test. The false discovery rate (FDR) is a method used to determine the threshold of the P value in multiple tests. In our study, the reads per kilobase of exon model per million mapped reads (RPKM) value, was calculated referring to the formula described by Mortazavi et al. [23] and was used to quantify the transcript level of Z128 versus Z263. FDR#10 23 and the absolute value of the log2Ratio (Z128_RPKM/Z263_RPKM) $1 as the threshold, were used to judge significant differences in gene expression. DEGs were then subjected to GO functional enrichment analysis and KEGG pathway enrichment analysis.
GO functional enrichment analysis provides GO terms, which significantly enrich in DEGs compared to the genome background, indicating that the DEGs are connected to interesting biological functions. All DEGs are firstly mapped to GO terms in the database, calculating gene numbers for every term, then the ultra-geometric test is used to find significantly enriched GO terms in DEGs compared to the genome background. The formula is: Where N is the number of genes with GO annotation; n is the number of DEGs in N; M is the number of genes that are annotated according to the GO terms; m is the number of DEGs in M. The calculated P value#0.05 was taken as a threshold. GO terms fulfilling this condition are defined as significantly enriched GO terms in DEGs. This analysis recognizes the main biological functions that DEGs participate in.
Pathway enrichment analysis identifies significantly enriched metabolic pathways or signal transduction pathways in DEGs compared with the whole genome background. The formula is the same as that in GO analysis. Here, N is the number of genes with KEGG annotation, n is the number of DEGs in N, M is the number of genes related to specific pathways, and m is number of DEGs in M (Qvalue#0.05). All DEGs are further mapped on to each pathway; up-regulated genes are marked with red borders while down-regulated genes are marked with green borders.

qRT-PCR analysis for genes related to flavonoid synthesis
The genes and primers used for the gene expression analysis related to flavonoid synthesis were listed in Table 1. qRT-PCR was performed in a total volume of 20 mL with 10 mL SYBR Premix Ex Taq(26) (Takara, Japan), PCR forward primer (10 mM) 0.4 mL,PCR reversed primer 0.4 mL (10 mM),ROX Reference Dye II (506) 0.4 mL,cDNA template 2.0 mL and ddH 2 O 6.8 ml on a 7500 real-time PCR machine (Applied Biosystems) according to the manufacturer's instructions. PCR amplification employed a 10 s denaturing step at 95uC, followed by 5 s at 95uC and 40 s at 60uC with 40 cycles. Relative mRNA levels were calculated by the 2 2DDCT method with Gh18S (accession number: L24145) as an internal control.

Statistical analysis
All of the experiments concerning data comparisons were performed three times. Statistical analyses were performed using the S-N-K method of independent-samples t-test at 95% confidence with IBM SPSS Statistics 11.0 (SPSS Inc., Chicago, USA). Values with different lowercases represent a significant difference at P,0.05.

The differences in fiber color and quality between Z263 and Z128
Line Z128 was a selected mutant from Z263 and both lines have similar genetic backgrounds. However, their fibers were different. As shown in Fig. 1, the fiber of Z263 was dark brown while that of Z128 was light. Though the primary walls of the fiber cell in both Z263 and Z128 contain pigments (Fig. 1), more pigments were laid in the lumen of the fiber cell in Z263 compared with that Z128. This resulted in a darker color in the Z263 fiber. However, the fiber yield and quality of Z128 was better than that of Z263. The lint percentages (%), upper half mean length (mm), micronaire and fiber strength in Z128 were also better than those in Z263 (Fig. 1). Furthermore, Z128 contained more cellulose (98.5%) than Z263 (94.5%).

The de novo assembled transcriptome of the fibers in Z263 and Z128
By removing useless sequences, a total of 38,114,054 (2,858,554,050 nucleotides) and 39,355,642 (2,951,673,150 nucleotides) 75 bp-length clean reads were obtained from the Z128 and Z263 mRNA libraries, respectively. A total of 170,201 and 182,404 contigs, 143,588 and 151,478 scaffolds and 59,926 and 71,895 unigenes were assembled in the Z128 and Z263 library, respectively. Because both of the samples for library construction were collected from the same tissue (ovule and fiber), two unigene libraries were taken forward for sequence clustering and redundancy removal to generate a new unigene library (All-unigene library) to make the non-redundant unigenes as long as possible. The All-unigene library contained 71,895 unigenes with an average length of 533 bp, which was obviously greater and longer than the other two libraries. The length of most of the unigenes in the three libraries was in the range of 100-1,000 bp, accounting for 93.08%, 92.44% and 88.21% in Z128-, Z263-and all-unigene library, respectively.
Since the all-unigene library contained the most complete and longest sequences, it was used to run batch alignment with a cutoff E-value of 10 25 on online public databases. A total of 49,941 (69.46%) and 31,714 (44.11%) unigenes received annotations from the NCBI non-redundant (nr) and Swiss-Prot databases, respectively. KEGG, KOGs and GO similarity analyses indicated that 20,241 (28.15%), 14,333 (19.94%) and 7,757 (10.79%) unigenes matched these databases, respectively. To investigate the genomic similarity between Gossypium and other species, we estimated all annotations of unigenes from nr. The result showed that the most abundant unigenes were annotated as ''Vitis vinifera'', ''Ricinus communis'' and ''Populus trichocarpa'', which accounted for 29.71%, 29.59% and 24.63%, respectively. Furthermore, we annotated 68.2% and 80.2% unigenes on the recently r-eleased A and D genome of diploid cotton (which were considered to be two donor g-enomes of the tetroploid cotton subgenome), respectively.
The all-unigene library was aligned with the KEGG pathway database, and the result showed a total of 38,645 unigenes which were classified into six categories, mostly concentrated in three of them: metabolism (11,371; 29.42%), protein families (10,638; 27.53%) and cellular processes (6,959; 18.01%). Furthermore, 1,315 unigenes were clustered in the ''biosynthesis of secondary metabolites'' category (Table S1).

Differentially expressed genes (DEGs) analysis
The unigene expression level in the Z128 and Z263 libraries were compared. A total of 13,278 (20.26%) unigenes were significantly differentially expressed when these two non-redundant libraries were compared (FDR#0.001, |log 2 Ratio|$1). A total of 5,345 (8.16%) of them were up-regulated, while 7,933 (12.10%) of them were down-regulated; the others were not DEGs (Fig. 4).
The GO analysis results showed that when corrected (P-value# 1), seven enriched terms belonged to three ontologies, one of them was categorized in ''biological process'', three were in ''cellular component'' and three were in ''molecular function'' ( Table 2). Table 3 showed the top 10 DEGs enriched pathways, seven of which were categorized as a ''metabolism'' pathway, two were categorized as ''genetic information processing'' pathways and the other one was categorized as ''environmental information processing''. Further identification indicated that five of the ''metabolism'' terms belonged to the ''biosynthesis of secondary metabolites'' sub-category.

Expression of related genes for color and fiber development in two cotton lines of different fiber color and quality
The pigmentation deposits in the brown colored cotton fiber were closely related to the flavonoid and proanthocyanidins biosynthesis. In this study, the ''anthocyanin biosynthesis'' and ''flavone and flavonol biosynthesis'' appeared on the top 10 list of the KEGG enrichment analysis (Table 3). Further analysis using the KEGG database indicated that a total of 14 unigenes were involved in the ''flavone and flavonol biosynthesis'' pathway (Table 4), which could be further classified as three orthology categories. ''K05280'' contained one up-regulated unigene which encoded flavonoid 39-hydroxylase (F39H). In addition, 10 downregulated unigenes that encoded flavonol 3-O-methyltransferase (OMT) belonged to ''K05279''; in ''K10757'', all unigenes encoded flavonol 3-O-glucosyltransferase (FOGT), two of them were up-regulated, and one was down-regulated. A total of four unigenes were mapped in the ''anthocyanin biosynthesis'' pathway, all of them were up-regulated. All four unigenes were involved in anthocyanin 5-O-glucosyltransferase (5GT) encoding.
Another important pigmentation pathway in plants is the ''flavonoid biosynthesis pathway'', which is also considered to be a key pathway for pigment formation in brown fiber cotton. Although it was not shown in Table 3, eight important genes were involved in this pathway. Chalcone synthase (CHS), chalcone isomerase (CHI), leucoanthocyanidin reductase (LAR), anthocyanidin reductase (ANR) and anthocyanidin synthase (ANS) were down-regulated in this pathway, while flavonoid 39-hydroxylase (F39H), flavanone 3-hydroxylase (F3H) and flavonol synthase (FLS) were all up-regulated. According to the distribution of DEGs in the entire pathway, most of the down-regulated DEGs (CHS, CHI, LAR, ANR, ANS) were enriched in upstream and downstream pathways, and the up-regulated DEGs were in the middle of the flavonoid biosynthetic pathway (F3H, F39H, FLS). However, another gene in the middle of the pathway, the dihydroflavonol 4-reductase (DFR), was unchanged (Fig. 5). Furthermore, a gene that encodes 5-O-glucosyltransferase in the anthocyanin biosynthesis pathway was up-regulated. The downregulated genes in the flavonoid biosynthesis pathway suggested that there were less pigments in Z128 compared with that in Z263. This was confirmed by the lighter brown fiber color in Z128 compared to the dark brown fiber in Z263.
To test the reliability of comparative transcriptional data, qRT-PCR analysis was performed for CHS, CHI, LAR, DFR, F3H, F39H, ANR and ANS. Samples were selected from the flavonoid biosynthesis pathway across five developmental time points from 0 DPA to 20 DPA. Overall, the results of the qRT-PCR analysis were consistent with the results from the transcriptome for the mixed mRNAs of five developmental time points of the eight selected genes. However, when Z128 was compared to Z263 at 10 DPA, the selected eight genes had significantly higher transcript levels in Z128 (Fig. 5). This suggests that the genes involved in the flavonoid biosynthesis pathway at 10 DPA are related to better fiber quality formation. Moreover, the genes involved in flavonoid biosynthesis affected the fiber color and fiber quality of the brown fiber cotton. Some other genes such as the ethylene related factors also regulated fiber development [24]. According to Table 5, all of the ethylene related factors had higher expression levels in Z128 than in Z263. The DEGs analysis also revealed that transcript levels of different fiber development related factors varied dramatically in cotton fibers, such as reactive oxygen [25], ethylene [24], Ca 2+ /CaM [26], and sucrose phosphate synthase [27] (Table 5).

Discussion
The RNA-seq approach based on next generation sequencing technology provided us with a new method to study the transcriptome of developing cotton fibers. It was not dependent on existing genome information and was an efficient way to quantify the expression level of a single gene without high background noise [28]. In recent years, this technology has been successfully applied in transcriptome studies for many non-model organisms [13,17,[29][30].
In this study, we mixed the RNA from four important fibers at various developmental stages (5, 10, 15 and 20 DPA) to de novo assemble the transcriptome of developmental colored cotton fiber. A total of 125,014 unigenes were generated in two sequencing libraries, which were further assembled into 71,895 all-unigenes with an average length of 533 bp, 69.46% of which could be matched in the NCBI database (E-value#10 25 ). This volume of data was greater than that reported in previous studies on other species [17,30]. Approximately 20,000 unigenes identified from G. hirsutum were recorded in the NCBI database (Nov 2011). In this study, approximately 50,000 unigenes (EST) were matched with nr database records. Therefore, we believe that our unigene library contained almost all of the known unigenes from G. hirsutum. In conclusion, we acquired a high quality and wellassembled transcriptome library for developing colored cotton fibers.
In all nr-annotated unigenes, only 1,537 (3.08%) were directly annotated with the field of ''Gossypium hirsutum''; most other unigenes were assigned to other species, which mainly included ''Vitis vinifera'', ''Ricinus communis'' and ''Populus trichocarpa''. This implies that the genome of cotton may be very similar to these species and this could be a reference for a prospective cotton sequencing project.
Z263 was the offspring which derived from a cross between white fiber cotton and dark brown cotton, and Z128 was an inbred line selected from Z263 with a lighter color and better fiber quality. Therefore, the similar genetic backgrounds provided a fine model with which to study the mechanism underlying the formation of brown color in cotton fiber. We compared the whole transcriptome for each and, unexpectedly, an abundance of unigenes (20.26%) revealed significant differential expression between Z263 and Z128 (FDR#0.001, |log 2 Ratio|$1). This evidence demonstrates that all DEGs are relatively evenly distributed in most of the relevant metabolism pathways. Namely, the divergence of cotton fiber color and quality was the result of complex processes generated by multiple metabolic processes.
There is limited information in the literature on the molecular mechanism that underlies the formation of fiber color in cotton. Xiao et al [6] cloned five flavonoid structure genes from brown cotton fiber and found that all the cloned genes could be involved in pigmentation metabolism for brown fiber. Several studies focused on chemicals also suggested that proanthocyanidins (condensed tannins) might be the precursor of pigmentation in natural colored cotton fiber [7][8]. Here, almost of all the unigenes which encoded the key enzymes (CHS, CHI, LAR, ANS and ANR) of the flavonoid biosynthesis pathway were down-regulated in Z128 compared with that in Z263, implying that accumulation of proanthocyanidins in Z263 might be more than that in Z128. Zhan et al. [7][8] suggested that the depth of brown color might be closely related to the accumulated quantity of condensed tannins. Another unexpected finding in this study was related to the ''anthocyanin biosynthesis'' pathway. As a downstream metabolic pathway, it is one of the most important elements of pigment biosynthesis in plants [31][32][33]. We found that all unigenes involved in this pathway were significantly up-regulated (read in Z263 = 0) in Z128 and homologous with anthocyanin 5-O-glucosyltransferase (5GT), which could make anthocyanin more stable by modification [34]. Another recent study demonstrated that the lack of glucose at the 5 position of anthocyanin could lead to color variation in carnations [35]. This result implied that the depth of brown cotton fiber color variation might be the consequence of an interaction between oxidized tannins and glycosylated anthocyanin.
F3H has been thoroughly studied for decades and it is predominantly expressed during the fiber elongation stage in G. barbadense, a process that has no relation to pigment formation [36]. Furthermore, when the F3H-RNAi segment was transferred into brown fiber plants, they yielded more stunted fibers. Transgenic analysis showed that the suppression of F3H not only inhibited fiber elongation but also retarded fiber maturation [37]. F3H was suppressed in Z263 but up-regulated in Z128. This evidence suggests that F3H is important in fiber development. Like F3H, F39H and FLS were very important in the pigments synthesis. Therefore, it is possible that the genes in the middle were Fiber Development and Pigmentation Formation in Brown Cotton up-regulated. According to Xiao et al [6] and Zhan et al [7], tannins could be the key chemical responsible for the brown color in cotton fiber. As shown in Fig. 5, LAR, ANR, and ANS were the key enzymes to accumulate the tannins. The down-regulated CHS, CHI, LAR, ANR, and ANS genes in the main pathway of pigment formation should be related to lighter fiber color of Z128 than that of Z263.
The pigmentation related genes not only regulated the fiber color but also influenced the fiber quality. The flavonoids are abundant and widely distributed plant secondary metabolites, and known to be an active participant in fiber development [38]. Previous studies showed that in fiber cells, the flavonoid genes were dominantly expressed in the fiber elongation stage [39][40][41]. In our study, CHS, CHI, LAR, DFR, F3H, F39H, ANR and ANS showed higher expression levels at 10 DPA in Z128, thus highlighting that flavonoid metabolism represents a novel pathway with the potential for cotton fiber improvement. Our GWAS analysis of SNP in cotton germplasm indicated that the genes involved in flavonoid biosynthesis were also associated with fiber quality traits (unpublished data). Therefore, the highly expressed flavonoid gene in the fiber elongation stage in Z128 should be related to better fiber quality.
Cotton fibers are single-celled trichomes that differentiate from the ovule epidermis, including fiber initiation, elongation, secondary cell wall biosynthesis and maturation, leading to mature fibers. Ca 2+ and ROS are two important factors involved in fiber cell growth [42]. Ca 2+ /Calmodulin (CaM) is involved in plant growth and development through interaction with ROS signaling [43]. Based on gene expression profile analysis, Ca 2+ /CaM is implicated in cotton fiber elongation. However, currently, there remains little direct evidence of the mechanism of Ca 2+ /CaM on cotton fiber development. In our study, Ca 2+ related genes were either downregulated or up-regulated in Z128 compared with that in Z263.
Recent literature indicates that ethylene acts as a positive regulator of root hair, apical hook, and hypocotyl development [44][45][46]. Furthermore, Shi et al. [24] found that ethylene biosynthesis was one of the most significantly up-regulated biochemical pathways during fiber elongation. Exogenously applied ethylene promoted robust fiber cell expansion, whereas its biosynthetic inhibitor L-(2-aminoeth oxyvinyl)-glycine (AVG) specifically suppressed fiber growth. The ethylene biosynthesis pathways in our data were not shown in Table 3 as the ''top 10 DEGs enriched pathways in KEGG analysis'', however, a number of ethylene related genes were up-regulated in Z128 compared with that in Z263, such as Unigene61435_All, Unigene40770_All, and Unigene62869_All. This suggests that ethylene related genes may contribute to better fiber quality in Z128.

Supporting Information
Table S1 Classification of all-unigenes by KEGG analysis.