Transcriptome Analysis of Flower Sex Differentiation in Jatropha curcas L. Using RNA Sequencing

Background Jatropha curcas is thought to be a promising biofuel material, but its yield is restricted by a low ratio of instaminate / staminate flowers (1/10-1/30). Furthermore, valuable information about flower sex differentiation in this plant is scarce. To explore the mechanism of this process in J. curcas, transcriptome profiling of flower development was carried out, and certain genes related with sex differentiation were obtained through digital gene expression analysis of flower buds from different phases of floral development. Results After Illumina sequencing and clustering, 57,962 unigenes were identified. A total of 47,423 unigenes were annotated, with 85 being related to carpel and stamen differentiation, 126 involved in carpel and stamen development, and 592 functioning in the later development stage for the maturation of staminate or instaminate flowers. Annotation of these genes provided comprehensive information regarding the sex differentiation of flowers, including the signaling system, hormone biosynthesis and regulation, transcription regulation and ubiquitin-mediated proteolysis. A further expression pattern analysis of 15 sex-related genes using quantitative real-time PCR revealed that gibberellin-regulated protein 4-like protein and AMP-activated protein kinase are associated with stamen differentiation, whereas auxin response factor 6-like protein, AGAMOUS-like 20 protein, CLAVATA1, RING-H2 finger protein ATL3J, auxin-induced protein 22D, and r2r3-myb transcription factor contribute to embryo sac development in the instaminate flower. Cytokinin oxidase, Unigene28, auxin repressed-like protein ARP1, gibberellin receptor protein GID1 and auxin-induced protein X10A are involved in both stages mentioned above. In addition to its function in the differentiation and development of the stamens, the gibberellin signaling pathway also functions in embryo sac development for the instaminate flower. The auxin signaling pathway also participates in both stamen development and embryo sac development. Conclusions Our transcriptome data provide a comprehensive gene expression profile for flower sex differentiation in Jatropha curcas, as well as new clues and information for further study in this field.


Results
After Illumina sequencing and clustering, 57,962 unigenes were identified. A total of 47,423 unigenes were annotated, with 85 being related to carpel and stamen differentiation, 126 involved in carpel and stamen development, and 592 functioning in the later development stage for the maturation of staminate or instaminate flowers. Annotation of these genes provided comprehensive information regarding the sex differentiation of flowers, including the signaling system, hormone biosynthesis and regulation, transcription regulation and ubiquitin-mediated proteolysis. A further expression pattern analysis of 15 sex-related genes using quantitative real-time PCR revealed that gibberellin-regulated protein 4-like protein and AMP-activated protein kinase are associated with stamen differentiation, whereas auxin response factor 6-like protein, AGAMOUS-like 20 protein, CLAVATA1, RING-H2 finger protein ATL3J, auxin-induced protein 22D, and r2r3-myb transcription factor contribute to embryo sac development in the instaminate flower. Cytokinin oxidase, Unigene28, auxin repressed-like protein ARP1, gibberellin receptor protein GID1 and auxin-induced protein X10A are involved in both stages mentioned above. In addition to its function in the differentiation and development of the stamens, the gibberellin signaling pathway also functions in Introduction Jatropha curcas L., which belongs to the family Euphorbiaceae, is endemic to tropical America and is now prevalent in Africa and Asia. Being composed of 75% unsaturated fatty acids, the oil from J. curcas seeds is suitable for producing biodiesel. In addition to biodiesel, J. curcas has potential use in the production of medicinal compounds. Moreover, due to its low nutrient requirement and drought tolerance, J. curcas can grow on wastelands, degraded lands and mine-contaminated lands, and afforestation with this plant has the potential to meet the increasing demand for oil seed crops without sacrificing agricultural land. However, in many locations, the planting of J. curcas brings little economic benefit due to the low seed yield of this plant, which greatly prevents the exploitation and expansion of this energy crop. Indeed, the yield in semi-arid areas or wasteland is 2-3 ton ha -1 yr -1 [1][2][3]. Even under conditions that are suitable for its growth (good soil and an average annual rainfall of 900-1200 mm), the dry seed yield under optimal management practice is only 5 ton ha -1 yr -1 [2][3][4]. As the small ratio of instaminate to staminate flowers (1/10-1/30) is one of the critically limited factors for the low seed yield of J. curcas [5], exploring the mechanism of floral sex differentiation in this plant is urgent; this valuable information would provide clues for improving seed yield.
As a new biofuel plant, J. curcas has attracted much attention of researchers, and studies on its genetics and genomics have been conducted in recent years. The genome size of J. curcas is estimated at 410 Mb [6]. Natarajan et al. [7] obtained 12,084 expressed sequence tags (ESTs) from developing seeds, and the contig assembly of these ESTs resulted in 6,361 unigenes. In the study of Costa et al. [8], 13,249 ESTs were obtained from developing and germinating endosperm and were assembled into 4,622 unigenes. Wang et al. [9] reported the gene expression profiles of the flower bud at the metaphase stage and of the developing seed from the early to metaphase stages after pollination; a total of 9,289 ESTs were obtained and then assembled into 4,502 unique sequences. Ban et al (2014) analyzed the transcriptome of the inflorescence meristems of Jatropha curcas treated with cytokinin, obtaining 81,736 unigenes, providing comprehensive information of cytokinin-response genes [10]. A total of 40,929 complete and partial sequences of protein-encoding genes were obtained by Sato et al. [11] through genome sequence analysis. Several flower development-related genes have also been identified, including a flowering time regulator (SVP), three floral meristem identity genes (APETALA2, APE-TALA3, and PISTILLATA), and five flowering regulators (CONSTANS, FLOWERING LOCUS D and F, LEAFY, and SUPPRESSOR OF OVEREXPRESSION OF CONSTANS1). However, molecular information about the florescence processes of J. curcas, including the gene expression profiles of different flower development stages, the genes involved in flower development, and sex-related genes, is still very limited.
In this research, Illumina short-read technology and a digital gene expression (DGE) system were used, and de novo transcriptome assembly and sequence annotation were performed. DGE libraries from flower buds at different stages for sex differentiation and floral development were compared with each other, which would help in identifying differentially expressed genes (DEGs) that are involved in the sex differentiation of flowers and in gaining insight into the mechanism of flower sex differentiation in J. curcas.

Morphological observation of developing flowers
Sex differentiation occurred after the initiation of five petal primordia; unlike staminate flower, the instaminate flower experienced a hermaphrodite period. During the development of staminate flowers, the five outer whorls of stamen primordia were initiated first, followed by the five inner whorls of stamen primordia. The instaminate flowers experienced a period of bisexual flowers. When the three carpel primordia were initiated, ten stamen primordia were also initiated and developed along with the development of the carpels. As the three carpels began to fuse, the development of the stamens was arrested and gradually degenerated as the ovary bulged (Fig 1). The sex differentiation process of J. curcas flowers could be divided into six stages: stage 1: sex differentiation is not initiated; stage 2: from stamen primordia beginning to differentiate to ten stamen primordia formed; stage 3: from ten stamen primordia formed to mature staminate flowers; stage 4: from carpel primordia beginning to differentiate to three distinct carpels formed; stage 5: carpels continue developing; stage 6: from carpels beginning to fuse to mature instaminate flowers (Fig 1).
Using an anatomical lens, the external morphology of flower organs, especially the stamens and the carpels, were observed to identify the development stage of flower buds. The flower bud samples were then divided into six groups according to their development stages: sample 1 (S1), the flower buds of stage 1; sample 2 (S2), the flower buds of stage 2; sample 3 (S3), the flower buds of stage 3; sample 4 (S4), the flower buds of stage 4; sample 5 (S5), the flower buds of stage 5; and sample 6 (S6), the flower buds of stage 6 (Fig 1).

Illumina transcriptome sequencing and assembly of clean reads
To analyze the transcriptome of J. curcas, mRNA was isolated from total RNA and then fragmented; these mRNA fragments were used as templates to synthesize cDNA. The obtained cDNA was sequenced using the Illumina sequencing platform. After filtering the adaptors and low-quality sequences, 54,996,594 clean reads with a mean length of 90 bp were obtained (Table 1). These clean reads were then assembled using short reads assembling software (Trinity). As a result, a total of 57,962 unigenes were obtained, including 26,689 clusters and 31,273 singletons (Table 2), with a mean length of 988 bp and an N50 of 1474 (i.e., 50% of the assembled bases were assembled into a unigene with a length of 1474 bp or longer) ( Table 2, Fig 2). The E-value and similarity distribution of the best hits against the nr database were shown in Fig 3A and 3B, respectively.  (Table 3). With regard to species distribution, the highest percentage of unique sequences matched to the genes of Ricinus communis (66.3%), followed by Populus trichocarpa (16.9%), Vitis vinifera (8.0%), Glycine max (1.8%), J. curcas (1.6%), Medicago truncatula (0.7%), Hevea brasiliensis (0.5%) Stamens and carpels continue to develop. q. Carpels close gradually, and stamens stop developing. r. In the instaminate flower, the ovary bulges, and the stamens become smaller and degenerate. s. The stamens had degenerated, and the stigma was formed. B, Bracts and bract primordia; C, carpels and carpel primordia; F, floral primordia; I, inflorescence primordia; P, petals and petal primordia; S1, the outer whorl of stamens; S2, the inner whorl of stamens; Se1~Se5, the first sepal, the second sepal, . . ., the fifth sepal (according to the order of initiation), respectively; St: stigma. Stage 1 (S1): from a to e; stage 2 (S2): from f to h; stage 3 (S3): from i to j; stage 4 (S4): k-l; stage 5 (S5): from m to p; stage 6 (S6): from q to s (S6-1: q, the carpels began to fuse; S6-2: r, the carpels were fused; S6-3: s, the stigma was formed.) (Fig 1s is  and other unclear proteins (4.2%) (Fig 3C). It is possible that only a few unigenes matched the EST database because the final J. curcas genome data are not yet available in NCBI.

Sequencing of DGE libraries and alignment with a reference database
Eighteen DGE libraries for the samples at different flower development phases were sequenced. The analysis for the sequencing saturation, homogenization and randomness suggested that these results could reflect the actual expression of genes; the data were thus suitable for further analysis (S1

Predicted genes involved in floral development and sex differentiation
Regarding the DEGs related to the transition of the meristem from vegetative to reproductive phase, transcript abundance for the majority of genes was reduced in the maturation stage of stamen (S1-S2-S3) or in the early development stage of instaminate flower (S1-S4), but enhanced in S6 of instaminate flower development (S3 Fig). Transcripts for the S-locus-specific-glycoprotein-S6-precursor (unigene26458_S7), F-box and WD40-domain protein (Unigene25463_S7), S-adenosylmethionine_decarboxylase (CL2711.Contig3_S7), and flowering-related B-class MADS-box protein (Unigene20469_S7) genes were increased in the mature stage in both staminate (S1-S2-S3) and instaminate organs (S1-S4-S5-S6). In contrast, the transcripts of certain gene were reduced during the development of staminate or instaminate organs, such as the genes for protein-unusual-floral organs (Unigene 13103_S7), Apetala2 (Unigene19958_S7), JHL07K02.12 (CL7553.Contig1_S7), RAD54B (CL3359.Contig1_S7), bel1-homeotic-protein (Unigene23982_S7) and Protein-BABY-BOOM (Unigene14170_S7). Other genes exhibited the highest transcript level in the stamen or carpel developing stage of staminate or instaminate organs (stages S2, S4-S5), including Sepallata-1-like-protein (CL3654.Contig1_S7). Accordingly, higher transcripts in both the S1 and S6 stages were observed for more than one-fourth of the transporter DEGs that showed changes in transcript levels during the S1-S6 stages (S4 Fig). Specifically, some genes were up-regulated in the staminate (S1-S2-S3) and instaminate (S1-S4-S5-S6) organ maturation stages, including metal-nicotianamine transporter (transporter of boron, copper, zinc/iron), ATP-binding cassette (transporter of Nramp, hexose, sugar, aromatic and neutral amino acid), cationtransporting ATPase [transporter of aquaporin (PIP and NIP family member)], outward rectifying K+ exchanger, and vacuolar cation/proton exchanger. Members of several large gene families, such as the nitrate transporter family, amino acid family and multidrug resistance protein family, exhibited various changes during floral organogenesis. Transcripts for genes related to phytohormone synthesis or responsive proteins also varied in different floral stages (S5 Fig). Transcripts for the majority of genes related to the synthesis of auxin, ethylene, and jasmonic acid (referred to as lipoxygenase) or response to the stimulation of these three phytohormone were increased at the staminate (S1-S2-S3) and instaminate (S1-S4-S5-S6) organ maturation stages, whereas transcripts of cytochrome (referred to the synthesis of abscisic acid) were reduced at these two stages. Based on flower sex differentiation traits, sex-related genes were screened out and divided into three groups: group 1, genes differently expressed between staminate and instaminate flower at the stamen and carpel differentiation stages (genes showed difference not only in the expression trend between the stages from S1 to S2 in staminate flower and stages from S1 to S4 in instaminate flower but also in the expression level between S2 and S4); group 2, genes differently expressed between staminate and instaminate flower at the stamen and carpel development stage (genes showed difference not only in the expression trend between the stages from S2 to S3 in staminate flower and stages from S4 to S5 in instaminate flower but also in the expression level between S3 and S5); and group 3, genes differently expressed between staminate and instaminate flower at their later development stage (genes showed difference not only in the expression trend between the stages from S2 to S3 in staminate flower and stages from S5 to S6 in instaminate flower but also in the expression level between S3 and S6); Sex-related genes were screened out from the DEGs, including 85 genes in group 1, 126 in group 2 and 592 in group 3, and then analyzed in the GO and KEGG databases. The molecular functions that only distributed in group 3 included three biological process subcategories (biological adhesion, viral reproduction and pigmentation), three cellular component subcategories (extracellular matrix, extracellular matrix part and membrane-enclosed lumen) and one molecular function subcategory (protein binding transcription factor activity), indicating that these molecular functions are related to the sex differentiation of flowers, such as carpel fusion and stamen degeneration. Locomotion was distributed in groups 1 and 3, suggesting that this process is closely related with stages 2 and 4 of sex differentiation. However, valine, leucine and isoleucine degradation and protein export were only found in group 1, suggesting that these two pathways are involved in carpel and stamen differentiation ( Fig 5). Furthermore, being only found in group 3, several pathways (such as brassinosteroid biosynthesis; RNA polymerase; ubiquitin-mediated proteolysis; purine metabolism; pyrimidine metabolism; arginine, proline, cysteine, methionine and tryptophan metabolism; terpenoid biosynthesis; phosphatidylinositol signaling system; and glycan degradation) appear to participate in sex differentiation (such as the development of stamens and flower mature in staminate; carpel fusion and flower mature in instaminate) (Fig 6). On the other hand, the genes showed both different expression levels and opposite expression trends between S2 and S4 or S6 and S3 were screened out as important sex-related genes, obtaining 14 genes ( Table 4). The annotation and expression level of these genes were showed in Table 4.

Gene expression patterns of selected genes
To validate the expression profiles obtained by RNA-Seq, quantitative real-time PCR (qRT-PCR) was performed on twelve genes selected at random with high or low expression levels. Expression comparisons were performed between S1 and S2, S1 and S3, S1 and S4, S1 and S5, and S1 and S6. For all genes, the trend in qRT-PCR expression agreed with the RNA-Seq data except for that of Unigene17081_S7 and Unigene17829_S7 (Fig 7). The expression patterns of 15 selected genes were shown in Fig 6. Unigene11972 _S7, Uni-gene3816_S7, Unigene4053_S7, CL8098.Contig1_S7, Unigene22184_S7, Unigene2885_S7 and Unigene8983_S7 exhibited a small degree of up-regulation in S2 and S3, but a large extent of up-regulation was observed at embryo sac development stage of instaminate (S6-1, S6-2 and S6-3; S6 stage for instaminate flower development is divided by S6-1, S6-2 stage, S6-3 stage, as mentioned in Figs 1 and 8). The expressions of Unigene3816_S7 and Unigene8983_S7 changed slightly in S4 and S5, while that of Unigene22184_S7, Unigene2885_S7 and Unigene11972_S7 showed small increases in S4 and/or S5. Unigene4053_S7 and CL8098.Contig1_S7 were slightly down-regulated in S4 and S5, and CL3449.Contig2_S7 and Unigene25207_S7 were greatly down-regulated after S1 in both instaminate and staminate flowers. Unigene5429_S7, Unigene28_S7 and Unigene11739_S7 were markedly up-regulated at S2, followed by a downregulation at S3 in staminate flowers; but were not significantly changed at S4 and S5, followed by up-regulation at S6-1 and S6-2, and down-regulation at S6-3 in instaminate flowers. CL1944.Contig 2_S7 was down-regulated at S2, and then shown a small up-regulation at S3 in staminate flowers; and firstly down-regulated at S4, and greatly up-regulated at S6-2, followed by a moderate down-regulation at S6-3 in instaminate flowers. The expression of Uni-gene5534_S7 was not significantly affected in S2 or S4 and showed only a small increase at S3 but a greater increase at S5; after a small decrease at S6-1, this expression greatly increased at S6-2 followed by a small decrease at S6-3. Unigene19315_S7 was only significantly up-regulated at S6-2, followed by a down-regulation at S6-3 in instaminate flowers.

Discussion
In total, 54,996,594 clean reads were obtained by transcriptome sequence analysis, and 57,962 unigenes were obtained via read assembly. Of these unigenes, 47,423 were successfully Expression comparisons were performed between S1 and S2, S1 and S3, S1 and S4, S1 and S5, and S1 and S6.   (Table 3). To investigate the molecular mechanism of flower sex differentiation, we constructed eighteen DGE libraries from flower buds at different developmental stages to analyze gene expression patterns. The qualities of the DGE libraries were further confirmed by qRT-PCR (Fig 7). In our study, the majority of DEGs for the meristem transition from vegetative to reproductive stage were induced prior to the S1 stage when sex differentiation was not initiated yet, and down-regulated in the reproductive stage (after S2 stage) due to the sensitive cellular regulation system (S3 Fig). A previous study showed that ethylene and auxin were involved in floral organogenesis [12]; similarly, the majority of DEGs related to ethylene and auxin were up-regulated in the male and female organ mature stages (S3 Fig). Certain stress hormone-related genes were also up-regulated in this process, which is consistent with the studies of Liu et al. [13] and Huang et al. [14,15]. Metal absorption by floral organs would be accelerated during the reproductive stage; thus, boron, copper, and zinc/iron transporters, and a metal-chelating transporter (metal-nicotianamine transporter) were up-regulated (S2 Fig). Based on the DGE analysis, sex-related genes were screened out, including 85 genes in group 1, 126 genes in group 2 and 592 genes in group 3. As sex differentiation progressed, the number of sex-related genes increased (Figs 5 and 6). Several molecular functions and biological pathways were unique to group 3 genes, suggesting that they contribute to the late period of sex differentiation, such as the extracellular matrix, biological adhesion, protein binding transcription factor activity, active base and amino acid metabolism, glycan degradation, protein biosynthesis, brassinosteroid biosynthesis, terpenoid biosynthesis, phosphatidylinositol signaling system and ubiquitin-mediated proteolysis (Figs 5 and 6).
Application of exogenous 6-benzyladenine (6-BA) (a synthetic cytokinin) on inflorescence buds with a diameter about 0.5 cm was reported to increase both the total number of flowers per inflorescence and the instaminate-to-staminate flower ratio in J. curcas [10]. A further gene expression profiles analysis showed that 6-BA treatment not only up-regulated JcCKX5b (a cytokinin oxidases/dehydrogenases gene), but also down-regulated JcLOG9 (a gene involves in the final step of bioactive cytokinin synthesis), whereas hardly affected other genes that function in cytokinin oxidation or biosynthesis [10], which suggested that exogenous cytokinin application perhaps negatively affected the biosynthesis of endogenous cytokine. Similarly, in present research, a total of five cytokinin oxidases/dehydrogenases genes (Unigene11972_S7, Unigene2716_S7, Unigene13960_S7, Unigene13959_S7, Unigene9593_S7) were identified, and only Unigene11972_S7 was differently expressed between staminate and instaminate flowers, showing a down-regulation at the stages from S2 to S3 in staminate and a up-regulation at the stages from S4 to S6 in instaminate flower. Different from the transcriptome data, the expression level of Unigene11972_S7 determined by qRT-PCR exhibited a significant increase at stamen development stage in staminate flower, and a sharper and larger increase after carpel differentiation stage in instaminate flower (Fig 8J). Pan et al (2014) and our study suggested that endogenous cytokinin would be decreased during sex differentiation and development of J. curcas. To elucidate the role of dogenous cytokinin during sex differentiation and development of J. curcas, it is necessary to study the endogenous cytokinin level and the expression pattern of the genes involved in cytokinin biosynthesis, metabolism and signaling of flower bud at different stage of sex differentiation. Gibberellic acid (GA) was expected to play important roles in flower sex differentiation in J. curcas. AGAMOUS-like 20 (AGL20), a gene that is downstream of FLC, is either positively Fig 7. Expression pattern of randomly selected genes. The fold changes of the genes were calculated as the log 2 value of the S2/S1, S3/S1, S4/S1, S5/ S1, and S6/S1 comparisons and are shown on the y-axis ("Fold expression" indicates the fold changes of DGEs, and "Normalized fold expression" indicates the fold changes of the genes by qRT-PCR analysis). regulated by CO and GA or negatively regulated by FLC [16], and acts as an important integrator of four pathways (vernalization, autonomous, photoperiod and GA-dependent pathways) controlling flowering in Arabidopsis. GID1 positively regulates the GA-signaling cascade in Arabidopsis. GID1 forms a complex with GA (GID1-GA) and then directly interacts with DELLA protein (a repressor of GA signaling), promoting interaction between DELLA and SLY1 (a component of the SCF SLY1 E3 ubiquitin ligase) and triggering the degradation of DELLA [17,18]. In addition, GASA4 can be up-regulated by GA in meristematic regions and is involved in floral meristem identification, early flower development, and style and stamen filament maturation [19,20]. In J. curcas, AGL20, GID1 and GASA4 perhaps participate in the embryo sac development of instaminate flowers (the stages from carpel begain to fuse to flower mature, such as S6-1, S6-2 and S6-3), while GASA4 also functions in stamen differentiation. AGL20 (CL3449.Contig2_S7) exhibited a marked down-regulation after S1, a slight up-regulation trend at S5, and a notable up-regulation at S6-2 ( Fig 8A). GID1 (Unigene5534_S7) was greatly up-regulated after the three carpels became distinct in instaminate flowers (S4) ( Fig  8K); and GASA4 (Unigene5429_S7) was up-regulated at S2 and S6-2 stage (Fig 8I). Furthermore, a transcription factor (Unigene28_S7) that negatively regulates the GA signaling pathway was up-regulated at the stage of stamen differentiation (S2) (Fig 8L). In addition, the RING-H2 finger protein ATL3J (CL8098.Contig1_S7) and r2r3-myb transcription factor (Uni-gene3816_S7) were expected to function in the embryo sac development stage of instaminate flowers in J. curcas, and both genes were found to indeed be up-regulated at these stages (S6-1, S6-2 and S6-3) (Fig 8N and 8O). However, the slight up-regulation of r2r3-myb at S2 and S3 suggested that it also functioned in the differentiation and development of stamens (Fig 8O). RING finger protein, a subfamily of ubiquitin protein ligase E3, plays key roles in regulating growth/developmental processes and might be directly involved in the regulation of hormone signaling pathways, including that of GA [21][22][23][24]. R2R3-MYB genes, members of a large family, are involved in the signal transduction pathways of salicylic acid [25], abscisic acid [26], GA [27,28], and jasmonic acid [16], and play roles in sperm cell specification [29], stamen and pollen maturation [30], and anther tapetum and stigma papillae development.
Pentatricopeptide repeat-containing (PPR) proteins mediate specific RNA-processing events, including RNA editing [51], transcript processing [52], and translation initiation [53], and are involved in the restoration of cytoplasmic male sterility in the mitochondria [54][55][56] as well as the first mitotic division during male or female gametogenesis [57]. A PPR protein (Unigene2885_S7) might be involved in the differentiation of the stamen and carpel, and embryo sac development at its later stage, as suggested by a small up-regulation at stages S2 and S4 and a large up-regulation at embryo sac development stage of instaminate flower (S6-1, S6-2 and S6-3) (Fig 8B). CLAVATA1 (CLV1) can restrict WUSCHEL gene expression to maintain shoot and floral meristems [58][59][60][61][62] and control the number of carpels in the floral meristem [63]. In J. curcas, two homologous CLV1 genes (Unigene19315_S7 and Unigene4053_S7) appear to be involved in the embryo sac development (S6-2, S6-3 and/or S6-1). However, the up-regulation of Unigene4053_S7 in S2 and S3 suggested that this gene perhaps also functioned in stamen development (Fig 8C and 8D). On the other hand, an AMP-activated protein kinase (Unigene11739_S7) is expected to involve in the development of staminate flowers (S2, 3) and the embryo sac of instaminate flower (S6-1, S6-2 and S6-3) (Fig 8M), as suggested by its up-regulation at these stages.
The genes with different expression levels and opposite expression trends between S2 and S4 or S6 and S3 were screened out as important sex-related genes (Table 4). Of the 14 genes obtained, 2 genes (CL4566.Contig1_S7 and Unigene6830_S7) could not be annotated, and 7 genes had functional annotation based on sequence similarity, including CL8092.Contig2_S7 (inorganic phosphate transporter), unigene17854_S7 (ubiquitin carboxyl-terminal hydrolase), Unigene17053_S7 (CRABS CLAW), Unigene16429_S7 (ATP-binding protein), Uni-gene22673_S7 (cytochrome C oxidase subunit 1), CL778.Contig3_S7 (chlorophyll A/B-binding protein) and Unigene4888 (transcription factor). Of these seven genes, only CRABS CLAW has been reported to function in the sex differentiation of flowers, such as in carpel morphogenesis [64,65] and in the development of ovules and stamens, and is expected to have similar functions in J. curcas sex differentiation. Of the remaining six genes, chlorophyll A/B-binding protein and Unigene4888_S7 contribute to carpel differentiation; ubiquitin carboxyl-terminal hydrolase and inorganic phosphate transporter contribute to the embryo sac development; Unigene16429_S7 is expected to promote stamen degeneration at later development stages of instaminate flower; and cytochrome C oxidase subunit 1 contributes to the development of the stamen.

Conclusions
A total of 57,962 assembled unigenes were obtained using Illumina sequencing technology, and 47,423 unigenes were annotated. DGE library analysis provided comprehensive, valuable information regarding the sex differentiation of flowers in J. curcas. The expression pattern analysis using real-time qPCR suggested the following: GASA4, CLV1 (Unigene4053_S7) and AMP-activated protein kinase contribute to stamen differentiation; ARF6-like protein, AGL20, RING-H2 finger protein ATL3J, auxin-induced protein 22D, CLV1 (Unigene19315_S7) and r2r3-myb transcription factor contribute to development of mbryo sac in instaminate flower; and cytokinin oxidase, a transcription factor, ARP1-like protein, GID1 and X10A, function in both stamen differentiation and embryo sac development. GA signaling pathways contribute not only to the differentiation and development of stamens, but also to the development of embryo sac. In addition, auxin signaling pathways were thought to involve in stamen development, and also were important for the embryo sac development. Overall, our data provide valuable new clues and information about the sex differentiation of flowers in J. curcas, and further study on the functions of sex-related genes will be helpful in eluciding the sex differentiation mechanism of J. curcas.

Flower sample collection
Mixed flower buds or inflorescences at different developmental stages of J. curcas were collected from thirty trees in Zhenfeng, Guizhou Province, China (36°14 0 50.2@N, 87°51 0 47.8@E) (J. curcas is not an endangered or protected species, and no specific permission is required for its collection.). The flower buds used for microscopic observation were fixed immediately in formaldehyde-acetic acid-50% alcohol mixtures (4: 6: 90, v/v), and those used for transcriptome analysis were dipped immediately in RNAlocker (Tiandz, Inc, Beijing China) on ice.

Observation of flower buds at different developmental stages
For microscopic observation, after having been fixed in formaldehyde-acetic acid-50% alcohol mixtures for 27 h, the buds or inflorescences then were transferred to 70% ethanol for storage; thereafter, buds were anatomized in 95% ethanol under an optical anatomical lens. For SEM (scanning electron microscope) study, the buds selected under optical anatomical lens were desiccated through a graded series of alcohol-isoamyl acetate, coated with palladium using a Hitachi E-1010 at 15 mA. Flower organs were then observed with a Hitachi S-4800 SEM at 10.0 kV.

Flower sample collection and total RNA extraction
The flower buds stored in RNAlocker were then sorted according to development phases (stage 1: sex differentiation is not initiated; stage 2: from stamen primordia beginning to differentiate to ten stamen primordia formed; stage 3: from ten stamen primordia formed to mature staminate flowers; stage 4: from carpel primordia beginning to differentiate to three distinct carpels formed; stage 5: carpels continue developing; stage 6: from carpels beginning to fuse to mature instaminate flowers.) using an optical anatomical lens (Fig 1). For total RNA extraction, flower buds in the six different developmental phases were ground in liquid nitrogen, and total RNA was isolated using E.A.N.A Plant RNA Kit (Omega Bio-Tek, Inc, Norcross GA USA) according to the manufacturer's manual.
Library preparation of developing flower for transcriptome analysis RNA samples (named S7) from flower buds in six different developmental phases (as described above) were pooled for transcriptome analysis to obtain comprehensive gene expression information during the process of flower sex differentiation. After DNase I treatment, magnetic beads with oligo (dT) were used to isolate mRNA. The mRNA was then fragmented into short fragments in fragmentation buffer, and cDNA was synthesized using the mRNA fragments as templates. The resulting short cDNA fragments were purified with a QIAquick PCR extraction kit and resolved in EB buffer. After the fragment ends were repaired and poly (A) tailed, the short fragments were ligated to sequencing adapters. Suitable fragments were selected as templates for PCR amplification. During the QC steps, an Agilent 2100 Bioanalyzer and an ABI Step OnePlus Real-Time PCR System were used for the quantification and qualification of the sample library. Sequencing of the library was performed using an Illumina HiSeq™ 2000 at Beijing Genome Institute (Shenzhen, China).

Illumina sequencing, de novo assembly of reads, and unigene annotation
The raw reads produced from the sequencing machines were filtered by removing adaptor sequences, low-quality reads, and reads with a percentage of unknown nucleotides of more than 5%. The de novo assembly of clean reads into unigene was carried out using a short reads assembling program: Trinity [66]. The resulting sequences were called unigenes. Gene family clustering then was performed, and the unigenes were divided into two classes. One class included clusters with the prefix CL followed by the cluster id. There were several unigenes in one cluster, and the similarity between them was more than 70%. The other class included singletons with the prefix unigene. A Blastx alignment (E-value < 1.0E -5 ) was carried out between the unigenes and protein databases (NR, Swiss-Prot, KEGG and COG), and the best alignment results were used for the sequence direction determination of the unigenes. When the results of different databases conflicted with each other, a priority order of NR, Swiss-Prot, KEGG and COG was followed to decide the sequence direction of the unigenes. When a unigene was not aligned with any of the above databases, its sequence direction was determined using ESTScan software.

DGE library preparation and sequencing
Total RNA from flower bud samples in six different development stages (S1, S2, S3, S4, S5, and S6) was first treated with DNase I to degrade any possible contaminating DNA, and then the mRNA was enriched using oligo (dT) magnetic beads. The obtained mRNA was fragmented into short fragments (approximately 200 bp) in fragmentation buffer. First-strand cDNA was synthesized using random-hexamer primers, and the second strand was synthesized using the first strand as template. The double-stranded cDNA was purified with magnetic beads. End reparation and 3'-end single nucleotide A (adenine) addition were then performed. Finally, sequencing adaptors were ligated to the fragments, and the fragments were enriched by PCR amplification. During the QC step, an Agilent 2100 Bioanalyzer and an ABI StepOnePlus Real-Time PCR System were used to qualify and quantify the sample library. The library products were then ready for sequencing via Illumina HiSeq TM 2000.

DGE analysis
By base calling, the original image data produced by sequencing were transferred into sequences (raw reads). To obtain clean reads for further analysis, the raw reads were filtered by removing adaptor sequences, low-quality reads, and reads with a percentage of unknown bases (N) of more than 10%. The clean reads were then mapped to the transcriptome of a developing flower reference database using SOAPaligner/SOAP2, and no more than 2 mismatches were allowed in the alignment. The expression level of each gene was determined by the numbers of reads that were uniquely mapped to the specific gene and the total number of uniquely mapped reads in the sample. The gene expression level was calculated using RPKM method (reads per kb per million reads) [67]. If there was more than one transcript for a gene, the longest transcript was used to calculate its expression level and coverage.
A false discovery rate (FDR) 0.001 and an absolute value of log 2 Ratio ! 1 were used as the threshold to determine the DEGs. The DGEs were then subjected to GO and KEGG Ontology (KO) enrichment analysis using hypergeometric testing. The enriched P-values were calculated as follows: P ¼ 1À

Quantitative real-time PCR (qRT-PCR) analysis
To confirm the validation of DGEs, total RNA was isolated from the same samples used for DGE establishment. Moreover, total RNA was extracted from flower buds at eight different development phases (S1, S2, S3, S4, S5, S6 stage, and S6 stage are further divided into S6-1, S6-2 and S6-3 stage), and used for expression pattern analysis of 15 selected sex-related genes ( Fig  1). The total RNA (1 μg) of each sample was used for first-strand cDNA synthesis using AMV RNA PCR Kit 3.0 (Takara). These cDNA samples were then used for real-time PCR using an ABI StepOnePlus Real-Time PCR System (Applied Biosystems, Inc. USA) with 2× SYBR green PCR mix (QIAGEN, Shanghai, China). The cycling procedure was 95°C for 3 min, followed by 40 cycles of 94°C for 10 s, 59°C for 10 s and 72°C for 40 s. The β-tubulin gene was chosen as the endogenous reference gene for the qPCR analysis. The sequences of the applied primers are given in supporting information S1 File (Tables A and B). Three replicates were analyzed for each gene. The average threshold cycle (Ct) was calculated, and the relative expression level of each gene was then calculated according to the 2 -⊿⊿Ct method. A one-way ANOVA analysis of the gene expression level of the samples at different development stages was performed using the software Statistical Package for the Social Science (SPSS) version 11.5 (SPSS Inc., Chicago, IL, USA). The data were subjected to log-transformation if necessary. The individual treatment means were compared using the LSD (least significance difference) test.
Supporting Information S1 File. Primers for the genes and reference gene used for the validation of expression profiles (Table A). Primers for the genes and reference gene used for the expression pattern analysis (Table B)