Comparative Transcriptome Analysis Reveals Sex-Biased Gene Expression in Juvenile Chinese Mitten Crab Eriocheir sinensis

Sex-biased genes are considered to account for most of phenotypic differences between males and females. In order to explore the sex-biased gene expression in crab, we performed the whole-body transcriptome analysis in male and female juveniles of the Chinese mitten crab Eriocheir sinensis using next-generation sequencing technology. Of the 23,349 annotated unigenes, 148 were identified as sex-related genes. A total of 29 candidate genes involved in primary sex determination pathways were detected, indicating the sex determination cascade of the mitten crab might be more complex than previously supposed. Differential expression analysis showed 448 differentially expressed genes (DEGs) between the two transcriptomes. Most of DEGs were involved in processes such as metabolism and immunity, and not associated with obvious sexual function. The pathway predominantly enriched for DEGs were related to lysosome, which might reflect the differences in metabolism between males and females. Of the immune DGEs, 18 up-regulated genes in females were humoral immune factors, and eight up-regulated genes in males were pattern recognition receptors, suggesting sex differences of immune defense might exist in the mitten crab. In addition, two reproduction-related genes, vitellogenin and insulin-like androgenic gland factor, were identified to express in both sexes but with significantly higher level in males. Our research provides the first whole-body RNA sequencing of sex-specific transcriptomes for juvenile E. sinensis and will facilitate further studies on molecular mechanisms of crab sexual dimorphism.


Introduction
Sexual dimorphism, which differentiates males and females in morphological, physiological and behavioral characteristics, is a common phenomenon in the animal kingdom. Based on the nearly identical genomes, the phenotypic differences between the sexes are thought to largely result from sex differences in gene expression [1,2]. So far, sex-biased gene expression purified from total RNA using Dynabeads mRNA Direct Micro kit (Ambion, USA). The final concentration was determined using a NanoDrop2000 spectrophotometer (Thermo Scientific, USA).

cDNA library construction and sequencing
Two libraries for male and female juveniles of E. sinensis were constructed in this study. For library preparation, equal amounts of purified mRNA samples from two biological replicates were pooled together for the cDNA synthesis. The cDNA libraries were prepared using an Ion Total RNA-seq kit v2 (Life Technologies, USA), with 100 ng fragmented mRNA. Adapter ligation, size selection, nick repair and amplification (12 cycles) were performed as described in the Ion Proton protocol associated with the kit. The resulting cDNA libraries were purified by AMPure beads (Beckman Coulter, USA), and their concentrations and sizes were determined by Agilent2100 (Agilent, USA). Emulsion PCR and enrichment steps were performed using an Ion PI Template OT2 200 kit v3 (Life Technologies) according to the manufacturer's instructions. PI Chips were loaded according to the spinning protocol and sequencing was performed on Ion Proton Sequencer using the Proton 200 sequencing kit (Life Technologies). Base calls were collected with Torrent Suite using v4.0.2 software.

Sequence assembly
The raw data for each pool of samples were separately trimmed to remove adaptors and low quality regions (< Q20). Reads with a length of less than 50 bp were also discarded. The remaining reads without ambiguous bases (N) were de novo assembled in a unique file by Trinity (http://trinityrnaseq.sourceforge.net/) with k-mer length of 25 referring to the strategy of Grabherr et al. [24]. Trinity combining three independent software modules: Inchworm, Chrysalis and Butterfly, applied sequentially to process large volumes of RNA-seq reads into contigs, de Bruijn graphs and full-length transcripts.

Gene annotation
Gene functional annotation was performed by sequence comparison with public databases. All assembled transcripts were searched against the NCBI non-redundant (nr) protein database (http://www.ncbi.nlm.nih.gov/) using BlastX algorithm with an E-value cutoff of 1E-05. The unigenes were obtained after clustering the top hit results. Gene Ontology (GO) annotations were determined using Blast2GO to obtain a functional classification of the unigenes [25]. Egg-NOG (evolutionary genealogy of genes: Non-supervised Orthologous Groups) was performed to predict and classify potential functions of the unigenes based on known orthologous gene products [26]. EC (Enzyme Commission) terms and biochemical pathway information were generated by Kyoto Encyclopedia of Genes and Genomes (KEGG) database [27]. In addition, the unigenes associated with sex determination and differentiation were identified manually according to the annotation in consulting published literature and public datasets. and 6/5. Compound microsatellites were defined as repeats interrupted by a non-repetitive sequence of a maximum of 100 nucleotides.

Differential expression analysis
The reads for a specific transcript were counted by mapping reads to assembled unigene sequences. The unigene expression was calculated using the reads per kb per million reads (RPKM) method [31]. Differentially expressed genes (DEGs) were identified by the DESeq program [32]. The fold change values > 2 and false discovery rates (FDR) adjusted significance values < 0.05 (-log10 (0.05) = 1.3) were considered as the threshold to judge the significance of unigene expression.
GO, eggNOG, KEGG Orthology (KO) and KEGG pathway enrichment analyses were used to categorize DEGs and evaluate DEGs in the potential biological pathways. Processes, functions or components in the GO and KEGG pathway enrichment analyses with p-values less than 0.05 (-log10 (0.05) = 1.3) were considered to be significantly different in the DEGs. Based on public databases and the published literatures, the crucial DEGs related to metabolism, immunity and reproduction were manually checked.
The DEG encoding insulin-like androgenic gland factor (IAG) was selected for further sequence and phylogenetic analysis. Multiple amino acid sequence alignment was performed using the Clustal X with the default settings [33]. Neighbor-joining tree with bootstrap values were constructed for phylogenetic analysis using MEGA 4.0 [34]. All the reference sequences for phylogenetic analysis were derived from GenBank.

Quantitative Real-time PCR (qRT-PCR) validation
To validate RNA-seq data and expression profiles obtained from DESeq analysis, mitten crabs at the same developmental stage, the independent samples of RNA-seq, were used for real-time PCR analyses. Three biological replicates were prepared for male and female juveniles. Approximately 2.5 μg of total RNA of each sample obtained as previously described were treated with DNase I (Ambion, USA) at 37°C for 1 h, reverse-transcribed using M-MLV reverse transcriptase (Promega, China) and amplified by qRT-PCR.
The SYBR Green I RT-PCR assay was carried out in an ABI PRISM 7300 Sequence Detection System (Applied Biosystems, USA). The gene-specific primers designed for the 12 genes are listed in S1 Table. These 12 genes all exhibited large significant differences in expression between male and female libraries. The β-actin from E. sinensis was chosen as reference gene for internal standardization. The PCR was carried out in a total volume of 10 μl, containing 5 μl of 2×SYBR Premix Ex Taq (TaKaRa, China), 0.2 μl 50×ROX Reference Dye, 2 μl of the diluted cDNA mix, 0.2 μl of each primer (10 μM), and 2.4 μl of sterile distilled H 2 O. The PCR program was 95°C for 5 min, followed by 40 cycles of 95°C for 5 s and 60°C for 31 s. Each sample was run in triplicate along with the internal control gene. To confirm that only one PCR product was amplified and detected, dissociation curve analysis of amplification products was performed at the end of each PCR reaction. After the PCR program, data were analyzed with ABI7300 SDS software (Applied Biosystems). Fold change for the gene expression relative to controls was determined by the 2 -ΔΔCt method [35]. All data were given in terms of relative mRNA expression as mean ±S.E. The results were subjected to one-way analysis of variance (one way ANOVA) using SPSS 13.0, and the P values less than 0.05 and 0.01 were considered statistically significant.

Transcriptome sequencing and assembly
Two cDNA libraries were generated with pooled mRNAs from the whole bodies of female and male juveniles of E. sinensis. After quality trimming and the removal of adapters, sequencing runs performed on Ion Proton platform produced a total of 42,979,050 reads for the female and 47,560,370 reads for the male. All data were deposited in NCBI Short Read Archive database with accession numbers SRX554564 and SR554562. Clustering and assembly of these reads resulted in 282,954 contigs with an average length of 274 bp. Further assembly analysis showed all contigs contributed to 151,128 transcripts with an average length of 614 bp (S2 Table).

Functional annotation
BlastX searches of all assembled transcripts against the NCBI nr database revealed 23,349 unigenes with significant matches to existing protein sequences above the preset cutoff value. The average length of these annotated unigenes was 988 bp with N50 length of 1,375 bp (S2 Table).

SNP and SSR discovery
Using SAMtools/VarScan software, 48,753 SNPs and 15,271 indels were identified from E. sinensis unigenes. The overall frequency of all types of SNPs, including indels, was one per 360 bp ( Fig 1A). Transition occurred 2.7 times more frequently than transversion. A/G was the most abundant transition (28.3%), and A/T was the most abundant transversion (5.3%). Indels were less frequent than transitions, with a frequency of one per 1,511 bp and a total proportion of around 23.9%. Of the identified SNPs and indels, 55,230 were found in both female and male E. sinensis unigenes. 3,411 SNPs and indels were found exclusively in female unigenes, while 5,383 were just present in male unigenes ( Fig 1B).
In addition, using MISA, 3,928 perfect and 262 compound SSRs were detected from E. sinensis unigenes (S3 Table). Trinucleotide repeats were the most frequent type, counting a total number of 2,286 (54.6%) with AGG/CCT as a major motif accounted for 31.4% of all trinucleotide repeats. Dinucleotides repeats were the second most frequent type accounting for 37.3% of all SSRs with the abundance of AC/GT motif. And the other two distinguished types, compound and tetranucleotide repeats, accounted for less than 10% of all SSRs.

Genes related to sex determination and differentiation
Of the total unigenes, 148 were identified as sex-related genes based on sequence annotation and the published literature (S4 Table). These genes could be categorized into 36 groups and  putatively involved in sex determination and differentiation, oogenesis, spermatogenesis and gonad development. Among these unigenes, 40 were classified as testis or sperm-specific genes, and 24 were ovary or oocyte-specific genes. Eleven unigenes annotated as SOX (SRY related HMG box) gene family were found, including SOX2, SOX5, SOX8, SOX14, SOXB1, SRY, SRYbox containing protein 8 and B2b2.
A total of 29 genes with similarity to those involved in primary sex determination pathway were identified (Table 1). Eight genes, including two fruitless (Fru), two transformer-2 (Tra-2) and four feminization-1 (Fem-1) genes, were associated with the sex determination in Drosophila melanogaster and Caenorhabditis elegans. Fourteen genes involved in mammalian female sexual development were detected: five genes belonging to the WNT signaling pathway (RSPO-1, WNT4 and β-catenin), a transcription factor (FOXL2), three estradiol receptors (ER), and five activin-binding proteins (FST). Seven genes involved in mammalian male sexual development were examined: a doublesex and mab-3 related transcription factor 2 (Dmrt2), three prostaglandin D synthases (PGDS), and three genes belonging SOX transcription factors (SRY and SOX8).
The primary sex determination pathway related genes were expressed in both female and male E. sinensis transcriptomes (S3 Fig). EsPGDS was the most highly expressed transcript (RPKM value > 800), followed by EsER3 (RPKM value > 50) and EsHPGDS (RPKM value > 40). EsDmrt2, EsPGDS and EsSRY presented higher expression pattern in males (fold change values > 1.5), while EsFST5, EsRSPO-1p and EsFST5p showed preferential expression in females (fold change values > 1.5). The other 23 genes were similar expressed in females and males.
From the 148 unigenes annotated for sex-related genes, 18 SNPs were identified in female individuals and 23 in male individuals (S5 Table). Most unigenes had only one SNP, except for vitelline membrane outer layer protein I, SOX14, oocyte zinc finger protein, male reproductive-related microfibril-associated protein spermatogenesis-associated protein 5 and 20. In addition, five compound and 18 perfect SSRs were detected from the sex-related unigenes (S6 Table). The size of SSRs ranged from 12 to 242, with trinucleotide repeats as the most frequent type. Three sex-related genes, including SRY-box containing gene 8, SRY interacting protein 1 and zinc finger protein 76, were found to contain both SNPs and SSRs.

Differential expression between the sexes
The DESeq method identified 448 differentially expressed unigenes between the two transcriptomes, including 188 up-regulated in females and 260 up-regulated in males (S7 Table). The distribution of the significant changes detected was illustrated in a volcano plot, where the statistical significance of each unigene was plotted against fold change (S4 Fig). Sequences with the highest average differences between the sexes also had the smallest false discovery rate (FDR) values.
KEGG analysis revealed that 139 DEGs with KO terms were involved in 29 predicated biological pathways (S5 Fig). The most abundant pathways included 'carbohydrate metabolism' (36 DGEs), 'glycan biosynthesis and metabolism' (24 DGEs), 'transport and catabolism' (24 DGEs) and 'lipid metabolism' (16 DGEs). The significantly distinct categories were 'carbohydrate metabolism', 'glycan biosynthesis and metabolism', 'metabolism of cofactors and vitamins', 'transport and catabolism', 'digestive system, 'lipid metabolism', 'xenobiotics biodegradation and metabolism' and 'amino acid metabolism' (p < 0.05, S5 Fig). Except for metabolic pathway, the pathway predominantly enriched for DEGs were related to lysosome (ko04142) of 'transport and catabolism' (Fig 3). There were 19 DEGs with ten KO terms in lysosomal pathway, including 18 up-regulated in males and one up-regulated in females. In addition, one important pathway identified as enriched for the up-regulated genes in males were associated with sphingolipid metabolism (ko00600) of 'lipid metabolism' (S6 Fig). Candidate DEGs involved in metabolism, immunity and reproduction By sequence annotation and functional classification, we identified 61 differentially expressed unigenes related to metabolism, immunity and reproduction. The metabolism-related DGEs could be categorized into four classes, including five female-biased genes in amino acid metabolism, ten male-biased genes in lipid metabolism, six male-biased genes in glycan metabolism, and three male-biased genes in cytochrome P450 (CYP450) superfamily (Table 2). A total of 30 immunity-related DGEs were found, including 18 up-regulated humoral immune factors in females, eight up-regulated pattern recognition receptors (PRRs) (CLEC and LGBP) and four other up-regulated immune factors in males ( Table 3). The female-biased immune DGEs were largely involved in antimicrobial peptide synthesis (Lyz, Crustin and Carcinin), prophenoloxidase (proPO) system (SP, PPAF and Kazal) and antioxidant system (SOD and Trx). Of the reproduction-related DGEs, two genes involved in ovarian development showed female-biased expression, while five genes, including two Vg homologs, VMO1, IAG and NPC2, showed male-biased expression (Table 4). Sex-Specific Transcriptomics in Juvenile E. sinensis A partial sequence of 741 bp (comp61312_c0_seq3), without an ATG codon but with a TAA stop codon at position 526 bp, was identified to encode an IAG ortholog (EsIAG, S7 Fig). The deduced amino acid sequence was 132 aa in length and contained B chain and A chain, with three disulfide bridges (between C B9 and C A12 , C B20 and C A25 , C A11 and C A16 , respectively). Multiple sequence alignment of EsIAG and six decapod IAGs showed the conserved cysteine residues shared by all sequences (S7A Fig). EsIAG displayed 36.3% amino acid identity with Callinectes sapidus IAG1, 32.5% with C. sapidus IAG2 and 33.1% with Scylla paramamosain IAG. The phylogenetic tree showed that the IAGs formed two major clades: one with three isopods and the other containing five subclades from the decapods (S7B Fig). The crab IAGs were separated in two gruops. EsIAG was clustered with IAGs from the Atlantic blue crab C. sapidus and the mud crab S. paramamosain to form one group. Another group included IAG from the blue swimmer crab Portunus pelagicus, which had a closer relationship with two IAGs from crayfish.

qRT-PCR validation of RNA-seq data
To verify the result of RNA-seq analysis, 12 DGEs based on RNA-seq were selected for qRT-PCR to further investigate the expression profiles. The expression patterns from qRT-PCR showed general agreement with those from the RNA-seq (Fig 4). Five genes including AspRS, NIT2, Lyz, Crustin and PPAF were up-regulated in females, while Vg, ARSA, Se-Gpx, CYP3A4, CLEC, LGBP and IAG were up-regulated in males. Among these genes, NIT2 showed the largest up-regulation in females and CYP3A4 manifested the largest up-regulation in males, which was consistent with RNA-seq results. The consistent expression between qRT-PCR and RNA-seq analyses confirmed the accuracy of Proton results.

Whole-body reference transcriptome of male and female Eriocheir sinensis
Compared with the available data from database and the previously reported transcriptomes from male gonads, the present study, providing more than 10 Gb clean data, is the first wholebody RNA sequencing of sex-specific transcriptomes for juvenile Chinese mitten crab E. sinensis. This resource expands the limited amount of sex-related sequence data of E. sinensis, which will facilitate further molecular investigation on sexual dimorphism of crab. Similar to recent studies in E. sinensis [36][37][38], only about 15% of all assembled transcripts are successfully matched in the nr database, probably due to the limited number of crustacean sequences in public database. However, this percentage is higher than that in gonad transcriptomes of S. paramamosain [17], supposedly because the whole bodies of juvenile are used and more extensive sequencing depth are applied in our transcriptomes. From our sequencing effort, a large number of SNPs and SSRs are detected for future genetics studies. This database, especially the sex-related markers, will play important roles in the exploration and utilization of sex-related genes, and may provide powerful tools for early gender identification and breeding in this commercially important species.  Sex-Specific Transcriptomics in Juvenile E. sinensis

Candidate genes involved in sex determination and differentiation
Though several sex-related genes were obtained previously [39][40][41], little information about sex determination cascades is available in the mitten crab. GO analysis and orthology prediction supply gene function classification labels and an overall framework for the sex-specific transcriptomes. Two Tra-2 and two Fru genes are identified in our transcriptome database. The Tra-2 and Fru genes, by promoting female sexual development or directing male sexual behavior, have been proved to participate in the Drosophila sex determination cascade [42,43]. Together with the previously cloned sex-lethal (Sxl) [41] and our identified double sex (dsx) [44], most of ortholog genes in 'Sxl-Tra/Tra-2-Dsx/Fru' pathway are detected in the mitten crab. As suggested in the penaeid shrimp [16,45], the present data provides a hint that the mitten crab might adopt a similar sex determination pathway with that in Drosophila. However, the absence of Tra and lack of a sex specificity of Sxl in the mitten crab, also reported in the silkworm Bombyx mori [46][47][48], imply that genetic regulation for sex determination potentially does not initiate from Sxl.
We discover four homologs of Fem-1 that have not been identified previously in E. sinensis. In C. elegans, Fem-1, encoding an ankyrin repeat protein Fem-1, is a component of the signal transduction pathway controlling sex determination [49]. In our high density linkage map of E. sinensis, another ankyrin repeat-containing gene is found to be located on the putative sex chromosome, suggesting its possible role in sex determination [44]. However, in the present study, the four homologs of Fem-1 are expressed similarly in both males and females. Whether Fem-1 participates in the process of sex differentiation in the crab has yet to be established.
Besides the above genes, many genes involved in mammalian sex determination cascade are reported in E. sinensis, further indicating the complex sex determination system of the mitten crab. In mammals, the presence of the male-determining SRY gene directs the undifferentiated gonad to develop into testes by promoting the expression of SOX9 [50]. Early ovarian development has long been considered to be a default pathway switched on passively by the absence of SRY gene. However, recent reports have revealed that FOXL2-leading pathway and RSPO-1activating signaling pathway act independently and complementary to each other to promote ovarian development [51][52][53]. In invertebrates, orthologs of FOXL2 have been characterized, but without a good understanding of their role in reproduction [54][55][56][57]. Though the orthologs genes in 'RSPO-1/WNT4/β-catenin' signaling pathway are detected, further investigations are needed to determine whether this pathway exists in the mitten crab.

Patterns of gene expression between the sexes
Differential expression analysis reveals 188 and 260 significantly expressed unigenes in female and male transcriptomes, respectively. These results show a slight imbalance in favor of the male sequences. This tendency has also been reported in adult transcriptomes of Caligus rogercresseyi [58] and gonad transcriptomes of the green mud crab [17] and other species such as Acipenser fulvescens [59] and Haliotis rufescens [60]. Coinciding with that in male and female rainbow trout embryos [20], most DEGs in this study are not related to sexual function. Further GO and KEGG analyses reveal these DEGs are largely involved in biological processes, such as lipid metabolism, glycan metabolism, transport and catabolism, and immune system process. This suggests that there are inherent and broad differences in the transcriptomes of male and female mitten crab, and that these differences are present before sexual maturation.
Lysosomes are membrane-enclosed organelles that contain an array of enzymes capable of breaking down all types of biological polymers-proteins, nucleic acids, carbohydrates and lipids [61]. Of DEGs involved in lysosomal pathway, one gene encoding protease legumain is up-regulated in females, while other genes encoding glycosidases, sulfatases, sphingomyelinase and membrane proteins are up-regulated in males. Together with the identified DEGs related to metabolism, this differential expression pattern might due to the higher level of amino acid metabolism in females as well as higher levels of glycan and lipid metabolisms in males. Three metabolism-related CYP450 genes, CYP2B19, CYP3A4 and CYP9Z7, were identified as malebiased DGEs. Enzymes in CYP2 and CYP3 families, especially CYP3A4, have important roles in steroid biosynthesis and metabolism in human [62]. The higher transcripts of CYP450 enzymes in males might indicate male mitten crabs require a large amount of steroid during early juvenile stages.
Sex differences in the immune defense, where females show greater immunity or resistance to infection, have been demonstrated for several arthropods [63][64][65][66][67]. In many cases, females have higher levels of hemocytes or PO activity than males in the absence of infection. Most of these studies have focused on insects such as butterflies, crickets, dragonflies and scorpionflies, with relatively few studies on crustaceans. Here, we first report some genes that potentially contribute to the sex differences in the immune system of the mitten crab.
Humoral immune responses that mainly occur in hemolymph include proPO system, clotting cascade and a wide array of antimicrobial peptides [68]. The identified female-biased humoral immune factors indicate females have greater lysozyme and PO activities in hemocytes, which is in agreement with those studies in insects [64][65][66]. PRRs, as a set of germlineencoded receptors, can interact with pathogen associated molecular pattern (PAMP) and activate innate immune response [69]. The higher transcripts of PRRs in males might suggest male mitten crabs could trigger quick and effective defense responses in the presence of pathogens infection. Sex differences in immunity appear to be related to differential reproductive strategies and the resulting resource trade-offs in life history. The identified immune DEGs provide a framework for future research to unravel the mechanism of six-biased immune regulation in crab.
Among the reproduction-related DGEs, two special genes vitellogenin (Vg) and insulin-like androgenic gland factor (IAG) were identified. Vg, usually considered as a female specific protein, could serve as energy resource for embryonic development [70]. Apart from its nutritional function, Vg has been shown to play important roles in innate immunity by acting as a multivalent pattern recognition receptor, a bactericidal molecule or an acute phase protein [71]. The Vg gene is normally silent in males, but can be activated by estrogen exposure [72,73]. Interestingly, we detected the expression of Vg in the normal physiological conditions of male E. sinensis, which is consistent with the finding in European honey bee Apis mellifera [74,75] but contrary to most studies in crustaceans [76][77][78]. This male-biased expression suggests that Vg might have functions in addition to its roles in oocyte maturation and energy supply for embryogenesis. IAG, a key regulator of male sex differentiation in crustaceans [79], is first discovered from E. sinensis transcriptomes. EsIAG shows remarkable structural similarity and sequence homology with other IAGs, suggesting that it is a member of the insulin/insulin-like growth factor family. By qRT-PCR validation, EsIAG is expressed in both sexes with a significantly higher level in males. It suggests that EsIAG might be not expressed exclusively in the male AG, which is also reported in S. paramamosain [79], C. sapidus [80] and Fenneropenaeus chinensis [81].
In conclusion, this is the first whole-body, sex-specific transcriptomes of juvenile E. sinensis using RNA-seq technology. More than 90 million clean reads were obtained, and some candidate genes in sex determination and differentiation were found. A large number of differentially expressed genes between the sexes were identified, and most of them had no obvious sexual function. Many potential SNPs and SSRs were detected that could be used for further gender identification and genetic breeding studies. This study will not only provide valuable genetic resources for the understanding of sexual dimorphism in E. sinensis, but also facilitate further investigations of functional genomics for this species and other closely related species.