Developmental Transcriptomic Features of the Carcinogenic Liver Fluke, Clonorchis sinensis

Clonorchis sinensis is the causative agent of the life-threatening disease endemic to China, Korea, and Vietnam. It is estimated that about 15 million people are infected with this fluke. C. sinensis provokes inflammation, epithelial hyperplasia, and periductal fibrosis in bile ducts, and may cause cholangiocarcinoma in chronically infected individuals. Accumulation of a large amount of biological information about the adult stage of this liver fluke in recent years has advanced our understanding of the pathological interplay between this parasite and its hosts. However, no developmental gene expression profiles of C. sinensis have been published. In this study, we generated gene expression profiles of three developmental stages of C. sinensis by analyzing expressed sequence tags (ESTs). Complementary DNA libraries were constructed from the adult, metacercaria, and egg developmental stages of C. sinensis. A total of 52,745 ESTs were generated and assembled into 12,830 C. sinensis assembled EST sequences, and then these assemblies were further categorized into groups according to biological functions and developmental stages. Most of the genes that were differentially expressed in the different stages were consistent with the biological and physical features of the particular developmental stage; high energy metabolism, motility and reproduction genes were differentially expressed in adults, minimal metabolism and final host adaptation genes were differentially expressed in metacercariae, and embryonic genes were differentially expressed in eggs. The higher expression of glucose transporters, proteases, and antioxidant enzymes in the adults accounts for active uptake of nutrients and defense against host immune attacks. The types of ion channels present in C. sinensis are consistent with its parasitic nature and phylogenetic placement in the tree of life. 
We anticipate that the transcriptomic information on essential regulators of development, bile chemotaxis, and physico-metabolic pathways in C. sinensis that is presented in this study will guide further studies to identify novel drug targets and diagnostic antigens.


Introduction
Clonorchis sinensis causes clonorchiasis, which is endemic to Korea, China, Taiwan and Vietnam; approximately 15 million people are estimated to be infected [1-3]. C. sinensis is a significant pathogen from both an epidemiological and a clinical perspective, as people who develop clonorchiasis are debilitated, which negatively impacts socio-economic activities. In C. sinensis endemic areas, inhabitants become infected by eating raw or inadequately cooked freshwater fish caught from water bodies near their villages [4]. Freshwater fish are the hosts of C. sinensis metacercariae, the stage that is infective to humans.
Once C. sinensis eggs reach a fresh water body, they develop into miracidia. When an egg is ingested by a freshwater snail, the miracidium escapes from the egg and transforms into a sporocyst, and then into rediae within several weeks. The cercariae emerge into fresh water and swim in search of freshwater fish, the second intermediate host. The cercaria penetrates the skin of a freshwater fish and its body becomes encysted by a cyst wall, followed by transformation into a metacercaria. Almost all freshwater fish can serve as second intermediate hosts, with the highest infection rate and metacercarial burden found in the topmouth gudgeon, Pseudorasbora parva. When ingested by humans and other mammals, the metacercariae are retained for a while in the stomach and then passed down to the duodenum. The metacercariae excyst there and the hatched, juvenile C. sinensis migrate up into the bile duct. The juvenile flukes grow to adults that produce eggs in the biliary passages of the mammalian host.
C. sinensis flukes in the biliary tracts lacerate and apply pressure to the epithelia, and excrete waste products from their excretory bladder and regurgitate residual digests from their intestinal ceca. In addition, ovigerous C. sinensis adults excrete uterine fluid with high protein content when they ovulate. These various excretory and secretory products act as chemical irritants that provoke inflammation, epithelial hyperplasia, and periductal fibrosis in the biliary tracts. In human clonorchiasis patients, frequent symptoms are epigastric discomfort and dull pain, mild fever, loss of appetite, diarrhea, and jaundice [5]. Moreover, clonorchiasis has epidemiologically been reported to be associated with cholangiocarcinoma [6-8]. Furthermore, experimental studies have shown that C. sinensis infection induces the differentiation of liver oval cells into a bile duct cell lineage and promotes the development of cholangiocarcinoma in golden hamsters [9,10]. Recently, C. sinensis was officially classified along with Opisthorchis viverrini as a Group 1 biological carcinogen by the World Health Organization [11]. Among the excretory-secretory products produced by liver flukes, granulin was identified as a mitogenic agent capable of stimulating cell proliferation and epithelial hyperplasia [12]. By binding to Toll-like receptors, the excretory-secretory products of adult flukes activate the NF-κB pathway, resulting in increased expression of the pro-inflammatory cytokine IL-6 [13], which in turn leads to the production of reactive oxygen radicals. The endogenous reactive radicals damage DNA and could initiate carcinogenesis [14].
Expressed sequence tags (ESTs) generated from cDNA libraries cover a large proportion of functional mRNAs and can be assembled into overlapping contigs coding for almost complete open reading frames [15]. Schistosoma mansoni and S. japonicum were the first flukes for which transcriptome data were published [16,17]; these studies stimulated the generation of ESTs and functional cataloging of these ESTs from the human-infecting liver flukes, C. sinensis, O. viverrini, and Fasciola hepatica [18-22]. The recent widespread availability of next-generation sequencing technology has also stimulated high-throughput analyses of the transcriptomes of liver flukes with a focus on the pathobiological characteristics of these adult liver fluke transcriptomes [23,24]. Given that C. sinensis adults are considered carcinogenic agents, they are predicted to express genes encoding proteins that are also known to be involved in cancer development [25].
To further elucidate the pathogenesis and carcinogenesis provoked by C. sinensis infection, comprehensive molecular and genetic information covering the different developmental stages of this parasite is required. In this study, we generated and sequenced transcriptome-scale ESTs from three developmental stages of C. sinensis and investigated the biological properties, growth, host adaptations, and pathogenic features of these different developmental stages.

Ethics statement
Rabbits were handled in an accredited Korea Food and Drug Administration animal facility in accordance with the AAALAC International Animal Care policies (Accredited Unit, Korea FDA; Unit Number 000996). Approval for animal experiments was obtained from Korea FDA animal facility (NIH-06-15, NIH-07-16 and NIH-08-19).
Parasite resources, culture and RNA extraction
C. sinensis metacercariae were collected from naturally infected P. parva caught in Jinju, Korea, and Shenyang, China. The fish were ground and digested artificially in gastric juice for 1 hr at 37°C. Particulate material was filtered out using a sieve with 0.15 mm mesh and washed several times with 0.85% saline. C. sinensis metacercariae were identified and collected under a dissecting microscope. Male New Zealand White rabbits, 1.5-3.0 kg (Samtaco Inc., Korea) were infected with 500 metacercariae each, and adult flukes were recovered from the bile ducts of these experimental rabbits 2 months after the infection. Bile juice collected from C. sinensis-infected rabbits was centrifuged at 2,000 × g for 10 min and C. sinensis eggs were collected from the sediment. To extract total RNA, adult flukes, metacercariae, and eggs of C. sinensis were put into liquid nitrogen in a pre-chilled mortar on dry ice and pulverized using a Mixer Mill MM301 (Retsch GmbH, Haan, Germany). Total RNA was extracted from the ground tissues using TRI reagent (MRC, Inc., Cincinnati, OH, USA). Poly(A+) mRNA was selected from the total RNA using the Absolutely mRNA Purification Kit (Stratagene, La Jolla, CA, USA) according to the manufacturer's instructions. The amounts of total RNA and mRNA were determined by measuring the absorbance at 260 nm, and the degree of protein contamination was assessed by calculating the ratio of the absorbance at 260 nm to that at 280 nm. RNA integrity was assessed by examining ribosomal RNA bands on 1% RNA agarose gels stained with ethidium bromide.

Construction of cDNA libraries
Using the poly(A+) mRNAs generated as described above, cDNA libraries of the three developmental stages of C. sinensis (adult, metacercaria, and egg) were constructed using the directional λ ZAP cDNA synthesis/Gigapack III Gold cloning kit (Stratagene, La Jolla, CA, USA). First strand cDNAs were synthesized from mRNAs primed at the poly-A tail using reverse transcriptase and an oligo-dT linker-primer containing an XhoI restriction enzyme site. Following second strand synthesis, an EcoRI linker was ligated to the 5′-termini followed by digestion with the restriction enzyme XhoI. These synthesized and assembled double strand cDNAs were size-fractionated using Sepharose CL-2B gel filtration column chromatography. cDNA fractions longer than 500 bp were ligated into the ZAP Express vector pBK-CMV and the ligation products were packaged in vitro into cDNA libraries

Author Summary
Clonorchis sinensis is a significant pathogen that causes clonorchiasis, which is endemic to East Asian countries. This fluke provokes acute inflammation and chronic hyperplastic changes in the biliary tracts. C. sinensis promotes cholangiocarcinoma, and has been classified as a Group 1 biological carcinogen, alongside Opisthorchis viverrini, by the World Health Organization. Recently, transcriptomes for adult liver flukes have been reported, with molecular functionalities predicted on the basis of their transcriptomic data sets. We generated the developmental C. sinensis transcriptome for three different developmental stages, revealing that most functional genes were differentially expressed in each developmental stage; only a small proportion of the expressed genes were shared between the three stages. The developmental transcriptome describes the gene expression landscapes of C. sinensis adults, metacercariae, and eggs, and provides insight into how this fluke adapts to the distinctly different environments provided by its various hosts. We anticipate that the transcriptome will contribute significantly to the identification of intervention points along the developmental stages and allow the exploitation of novel potential targets for diagnostic, drug, and vaccine development purposes.
Developmental Transcriptome of Clonorchis sinensis www.plosntds.org
using the ZAP Express cDNA Gigapack III Gold cloning Kit (Stratagene). cDNAs were directionally cloned into the pBK-CMV vector, which allows both prokaryotic and eukaryotic expression of large sequences and in vivo excision into a phagemid vector. The adult, metacercaria, and egg cDNA libraries were plated onto LB-kanamycin plates, 23.5 cm × 23.5 cm, coated with X-gal/IPTG for blue/white selection. White colonies were randomly picked and inoculated into each well of a 384-well plate (Corning Co., Cortland, NY, USA) containing 40 µl Terrific Broth/kanamycin, followed by incubation for 16 hr at 37°C. For storage, the culture media in the 384-well plates were mixed with an equal volume of glycerol solution (65% glycerin, 0.1 M MgSO4, 0.025 M Tris-HCl, pH 8.0) and stored at −80°C. To assess cDNA quality, additional cDNA libraries were constructed in the pBluescript SK(+) vector for the adult and in the pAD-GAL4-2.1 vector for the metacercaria.

cDNA sequencing
A total of 60,768 colonies were picked: 30,144 from the adult cDNA library, 20,256 from the metacercaria cDNA library, and 10,368 from the egg cDNA library. Single plasmid colonies were transferred into 540 µl of Terrific Broth medium supplemented with 50 µg/ml kanamycin in a 96-deep well plate and incubated at 37°C overnight with gentle rotation (550 rpm). Plasmids containing C. sinensis cDNA were extracted using an alkaline lysis method [26,27]. The sequences of the cloned C. sinensis cDNAs were determined using the BigDye Terminator Cycle Sequencing Kit, ver. 3.1 (Applied Biosystems, Foster City, CA, USA). Sequencing reactions were performed in a 3-µl volume containing 250 ng plasmid DNA, 0.5 pmole universal primer, 0.87 µl of 5X Sequencing buffer, and 1.38 µl of distilled water. The cycling profile consisted of 35 cycles of denaturation at 96°C for 10 seconds, annealing at 50°C for 5 seconds, and extension at 60°C for 4 min. Sequencing products were purified via ethanol precipitation and read on an ABI 3730XL DNA Analyzer (Applied Biosystems, Foster City, CA). The T3 forward primer and T7 reverse primer were used as sequencing primers.

DNA sequence trimming and assembly
Nucleotide sequences of the 60,768 clones were read once from the 5′-end. Vector and adapter sequences were trimmed off all reads, as were nucleotide stretches with a Phred score of 20 or less and poly A/T stretches [28,29]. Reads shorter than 100 bp were then filtered out of the analyses. A total of 52,745 reads that survived these quality control filters were assembled into clusters using the TGICL and CAP3 programs with the following parameters: a minimum overlap of 40 bp, 95% minimum identity, and a maximum mismatched overhang of 30 bp [30,31]. Nucleotide sequences of the reads reported in this paper are registered in the DDBJ/EMBL/GenBank databases under the accession numbers FS126466-FS179210.
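The read-cleaning steps described above can be illustrated with a minimal Python sketch. It is not the pipeline used in this study: it assumes that low-quality bases are trimmed from the read ends only, that a poly A/T stretch means a terminal run of at least 10 bases (the `min_run` value is hypothetical), and that vector and adapter sequences have already been removed. Only the Phred threshold (20 or less) and the 100 bp length filter come from the text.

```python
def trim_low_quality(seq, quals, max_bad_q=20):
    """Trim bases with Phred score <= max_bad_q from both ends (assumed end-trimming)."""
    start, end = 0, len(seq)
    while start < end and quals[start] <= max_bad_q:
        start += 1
    while end > start and quals[end - 1] <= max_bad_q:
        end -= 1
    return seq[start:end]


def trim_poly_tails(seq, min_run=10):
    """Remove a trailing poly-A run or a leading poly-T run of at least min_run bases.

    min_run is a hypothetical threshold chosen for illustration.
    """
    without_a = seq.rstrip("A")
    if len(seq) - len(without_a) >= min_run:
        seq = without_a
    without_t = seq.lstrip("T")
    if len(seq) - len(without_t) >= min_run:
        seq = without_t
    return seq


def clean_read(seq, quals, min_len=100):
    """Return the cleaned read, or None if it is shorter than min_len after trimming."""
    seq = trim_poly_tails(trim_low_quality(seq, quals))
    return seq if len(seq) >= min_len else None


# Toy read: 2 low-quality bases, 120 good bases, a 15-base poly-A tail, 2 low-quality bases.
toy_seq = "NN" + "ACGT" * 30 + "A" * 15 + "NN"
toy_quals = [5, 5] + [30] * 135 + [5, 5]
cleaned = clean_read(toy_seq, toy_quals)
```

In this toy case the low-quality ends and the poly-A tail are removed, leaving a 120-base read that passes the length filter.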

Bioinformatic analyses of ESTs
To annotate the assembled EST clusters, the nucleotide sequences of the clusters were translated into putative polypeptide sequences and searched against the NCBI non-redundant nucleotide and protein databases using BLASTX, with thresholds of more than 30 matched amino acids, an identity greater than 25%, and an E-value less than 1e-4. The domain structures of the translated polypeptides were predicted using InterProScan (data version v14.0) with an E-value of less than 1e-4 [32], and their potential functions were assessed by gene ontology (GO) analysis. Moreover, we manually curated the gene descriptions in the databases by selecting known genes with a significant E-value to prevent incorrect assignment of annotated genes. To further enhance the reliability of the data and provide more accurate gene predictions for C. sinensis, we chose the species closest to C. sinensis for which data were available.
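The three BLASTX thresholds can be applied as a simple filter over search results. The sketch below assumes the hits are available in the 12-column tabular layout produced by BLAST+ (`-outfmt 6`), which may not be the format used in this study, and it interprets "more than 30 matched amino acids" as an alignment length above 30, which is also an assumption. The CsAE and subject identifiers are made up for illustration.

```python
def significant_hits(lines, min_len=30, min_ident=25.0, max_evalue=1e-4):
    """Keep hits with alignment length > min_len, identity > min_ident (%), E-value < max_evalue.

    Expects BLAST+ tabular columns: qseqid sseqid pident length mismatch gapopen
    qstart qend sstart send evalue bitscore.
    """
    kept = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.rstrip("\n").split("\t")
        pident = float(fields[2])
        length = int(fields[3])
        evalue = float(fields[10])
        if length > min_len and pident > min_ident and evalue < max_evalue:
            kept.append((fields[0], fields[1], evalue))
    return kept


# Hypothetical example rows (identifiers and values are invented).
example = [
    "CsAE_0001\tsubj_A\t62.5\t120\t45\t0\t1\t360\t10\t129\t2e-30\t118",  # passes all three
    "CsAE_0002\tsubj_B\t24.0\t200\t152\t0\t1\t600\t5\t204\t1e-10\t60",   # identity too low
    "CsAE_0003\tsubj_C\t80.0\t25\t5\t0\t1\t75\t1\t25\t1e-6\t40",         # alignment too short
    "CsAE_0004\tsubj_D\t40.0\t90\t54\t0\t1\t270\t3\t92\t0.5\t30",        # E-value too high
]
hits = significant_hits(example)
```

Only the first row satisfies all three criteria, so `hits` contains a single entry for `CsAE_0001`.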
To generate developmental gene expression profiles of C. sinensis, the number of ESTs in each contig was counted according to developmental stage and analyzed by Fisher's exact test at a significance level of P < 0.01 using IDEG6 [38] (http://telethon.bio.unipd.it/bioinfo/IDEG6_form/). Putative SNPs in the EST sequences were determined using the AutoSNP program [39].
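For a single contig and a pair of libraries, the comparison of EST counts reduces to a 2 × 2 table: ESTs belonging to the contig versus all other ESTs, in library A versus library B. The following sketch computes a two-sided Fisher's exact P-value for such a table using only the Python standard library. It illustrates the test itself and does not reproduce IDEG6, which may apply additional corrections; the function name and the example counts are hypothetical.

```python
import math


def _log_choose(n, k):
    """Natural log of the binomial coefficient C(n, k)."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def fisher_exact_two_sided(x, n_a, y, n_b):
    """Two-sided Fisher's exact test for the table [[x, n_a - x], [y, n_b - y]].

    x, y     : ESTs of one contig observed in libraries A and B
    n_a, n_b : total high-quality ESTs in libraries A and B
    """
    total = n_a + n_b          # all ESTs in both libraries
    contig_total = x + y       # ESTs of this contig in both libraries
    lo = max(0, n_a - (total - contig_total))
    hi = min(contig_total, n_a)

    def log_p(k):
        # Hypergeometric probability of seeing k contig ESTs in library A.
        return (_log_choose(contig_total, k)
                + _log_choose(total - contig_total, n_a - k)
                - _log_choose(total, n_a))

    p_obs = math.exp(log_p(x))
    # Sum the probabilities of all tables at least as unlikely as the observed one.
    p_value = sum(
        p for p in (math.exp(log_p(k)) for k in range(lo, hi + 1))
        if p <= p_obs * (1 + 1e-7)
    )
    return min(1.0, p_value)


# Hypothetical contig: 40 ESTs among the 27,070 adult reads, 2 among the 15,872 metacercaria reads.
p_example = fisher_exact_two_sided(40, 27070, 2, 15872)
is_differential = p_example < 0.01
```

With the library sizes reported in this paper, a contig with such a skewed count falls below the P < 0.01 threshold and would be called differentially expressed between the two stages.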
To gain insight into the evolutionary history of C. sinensis and to investigate parasitism-related genes, the global similarity of C. sinensis whole ESTs at the amino acid sequence level was compared to that of other parasites and free-living platyhelminths using the SimiTri program (cut-off score: 50) [40]. Whole EST sequences of comparator organisms were retrieved from the NCBI protein database. The relative similarities of the gene sequences of C. sinensis to those of other species were analyzed using TBLASTX and the SimiTri viewer. Bulk sequences of the comparator species were downloaded from the GenBank EST databases. Sequences with BLAST scores (bit score values) higher than 50 were collected from each large dataset. Nucleotide versus nucleotide comparisons of the dataset of interest to the different databases were performed using the TBLASTX algorithm. The primary data consisted of the similarity values of each sequence from the chosen database. The primary data were transformed into input for the SimiTri viewer. A gene is indicated as a square tile, and these tiles are colored by similarity scores to other datasets. Genes that were similar to genes in only one other database are not shown. Genes that showed similarity to genes in only two databases are shown as lines joining the two databases.
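One way to picture how a gene is placed in a three-database similarity triangle is as a weighted average of the triangle's corners, with the bit scores as weights. The sketch below takes that barycentric reading, which is an assumption made for illustration and may differ from the actual SimiTri implementation. The cut-off of 50 and the rules for genes with hits in only one or two databases follow the description above; the function name and triangle coordinates are hypothetical.

```python
import math

# Corners of an equilateral triangle, one per comparator database (arbitrary layout).
VERTICES = ((0.0, 0.0), (1.0, 0.0), (0.5, math.sqrt(3) / 2))


def triangle_position(scores, cutoff=50.0):
    """Place a gene inside the triangle from its best bit scores against three databases.

    Scores not above the cutoff count as no hit. Genes with a hit in fewer than
    two databases are not drawn and yield None. Genes with hits in exactly two
    databases fall on the edge joining those two corners.
    """
    weights = [s if s > cutoff else 0.0 for s in scores]
    if sum(1 for w in weights if w > 0) < 2:
        return None
    total = sum(weights)
    x = sum(w * vx for w, (vx, _) in zip(weights, VERTICES)) / total
    y = sum(w * vy for w, (_, vy) in zip(weights, VERTICES)) / total
    return (x, y)


center = triangle_position((100.0, 100.0, 100.0))   # equally similar to all three
on_edge = triangle_position((120.0, 120.0, 0.0))    # similar to two databases only
hidden = triangle_position((200.0, 40.0, 30.0))     # similar to one database only
```

Under this reading, a gene that is much more similar to one comparator than to the other two is drawn close to that comparator's corner, which matches how the plots are interpreted in the text.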

URL
More detailed information and raw data are accessible at http://grc.kribb.re.kr/pipeline2/.

Results and Discussion
The C. sinensis transcriptome
To generate transcriptomes and evaluate developmental gene expression in C. sinensis, 60,768 clones were selected randomly from cDNA libraries of the adult, metacercaria, and egg developmental stages. After stringent quality filtering through base calling, vector sequence trimming, repeat masking, and contaminant screening, 52,745 high quality reads remained (success index of 86.8%), consisting of 27,070 reads from the adult stage, 15,872 reads from the metacercaria stage, and 9,803 reads from the egg stage (Table 1). The high quality reads were assembled and clustered into 12,830 C. sinensis assembled EST sequences (CsAEs) comprising 7,184 contigs and 5,646 singletons [31]. The length of the CsAEs ranged from 100 to 3,328 bp with an average length of 724 bp. Over 50% of the CsAEs were between 500 and 799 bp. More than 70% of the CsAEs consisted of fewer than 30 EST members, while the largest single CsAE had 643 EST members.
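The read counts above can be cross-checked with a few lines of arithmetic. The sketch below uses only numbers stated in this paper (clones picked per library and high quality reads per stage) and takes the success index to be the fraction of picked clones that yielded a high quality read, which is how the 86.8% figure appears to be defined. The variable and function names are illustrative.

```python
# Clones picked per library and high quality reads retained, as reported in the text.
picked = {"adult": 30144, "metacercaria": 20256, "egg": 10368}
high_quality = {"adult": 27070, "metacercaria": 15872, "egg": 9803}


def success_index(passed, total):
    """Percentage of picked clones that yielded a high quality read."""
    return 100.0 * passed / total


total_picked = sum(picked.values())
total_high_quality = sum(high_quality.values())
overall = success_index(total_high_quality, total_picked)

# Per-stage rates follow from the same counts.
per_stage = {stage: success_index(high_quality[stage], picked[stage]) for stage in picked}

# Contigs plus singletons give the number of assembled EST sequences (CsAEs).
total_csaes = 7184 + 5646
```

The totals reproduce the 60,768 clones, 52,745 reads, and 12,830 CsAEs given in the text, and `overall` rounds to the reported 86.8%.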

Developmental gene expression
A total of 52,745 reads collected from the adult, metacercaria, and egg stages were assembled into 7,184 contigs. Of these contigs, 1,887 (26.3%) were shared by two developmental stages: 648 contigs between the adult and egg stages, 974 between the adult and metacercaria stages, and 261 between the metacercaria and egg stages. A small portion of the transcriptome (564 contigs; 7.9%) occurred in all three developmental stages, suggesting that some of these are housekeeping genes expressed constitutively across the life stages of C. sinensis. A large number of the contigs (4,733; 65.9%) occurred in only one of the three developmental stages (Figure S1). This finding suggests that genes associated with growth can be identified from the C. sinensis developmental transcriptome. The expression levels of these genes, as determined by the number of ESTs contained within a cluster, were compared to investigate stage-specific patterns of transcription based on an arbitrary cut-off as well as the statistical significance of the expression differences. In C. sinensis, 119 contigs in the adult stage, 48 in the metacercaria stage, and 134 in the egg stage were significantly differentially expressed. The majority of CsAEs obtained from each developmental stage were non-annotated or hypothetical transcripts. The unknown CsAEs were broadly distributed at each stage, indicating that C. sinensis may have a unique developmental mechanism compared to other parasites.
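The partition of contigs into stage-specific, two-stage, and three-stage classes can be derived from the stage of origin of each member EST. The sketch below shows the idea on a toy input and then recomputes the percentages from the class counts reported above. The function names and contig identifiers are hypothetical; only the three class counts come from the text.

```python
from collections import Counter


def classify_contigs(contig_stages):
    """Count contigs by the number of developmental stages contributing ESTs to them.

    contig_stages maps a contig identifier to the set of stages seen among its ESTs.
    """
    labels = {1: "one stage", 2: "two stages", 3: "three stages"}
    counts = Counter()
    for stages in contig_stages.values():
        counts[labels[len(stages)]] += 1
    return counts


def as_percentages(counts):
    """Express each class as a percentage of all contigs, rounded to one decimal."""
    total = sum(counts.values())
    return {label: round(100.0 * n / total, 1) for label, n in counts.items()}


# Toy input with invented contig identifiers.
toy = {
    "contig_1": {"adult"},
    "contig_2": {"adult", "egg"},
    "contig_3": {"adult", "metacercaria", "egg"},
    "contig_4": {"egg"},
}
toy_counts = classify_contigs(toy)

# Class counts reported in the text for the 7,184 contigs.
reported = Counter({"one stage": 4733, "two stages": 1887, "three stages": 564})
reported_pct = as_percentages(reported)
```

The three reported classes add up to 7,184 contigs, and the recomputed shares agree with the percentages quoted in the text.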

Functional annotation
Our manual curation of the search reports generated by BLASTX and InterPro increased the annotation rate and the accuracy of the functional annotations of the CsAEs. Of the 12,830 CsAEs, 7,132 (55.6%) were found to have significant sequence similarities to sequences in the NCBI NR database and/or in InterPro [41], while the remaining 5,698 (44.4%) CsAEs had no homolog (Figure S2). One-half of the proteins translated from the CsAEs were annotated by BLASTX and found to be similar to proteins found in Schistosoma japonicum. Among them, the sequences of hsp90, RNA binding protein, and actin were highly conserved between C. sinensis and S. japonicum. Another 44% of the CsAEs were unique gene transcripts with no homolog in the databases searched.

Gene ontology
To predict the functions of the C. sinensis genes, we independently classified a total of 12,830 CsAEs into functional categories by analyzing automated gene ontology assignments [42]. The most accurate method to identify new members of known gene function among gene transcripts is to retrieve a sequence-based homology of the translated transcripts using domains extracted from a multiple alignment of gene members with known functions [43]. To functionally categorize CsAEs using domain homology searches, we translated the 12,830 CsAEs in six reading frames and submitted them to InterProScan [32], which aligned 6,106 CsAEs to InterPro entries (E-value ≤ 1e-4). Among these, 2,889 CsAEs were assigned to 23,178 GO accession numbers. The 23,178 accession numbers further generated 2,349 distinct GO mappings in two major ontologies, molecular function and biological process (Figure 1). The most abundant groups represented under the molecular function category were assigned to the GO categories of nucleotide binding (9.1%), nucleic acid binding (6.2%), ion binding (5.7%), transferase activity (6.0%), and hydrolase activity (7.4%). The most abundant groups represented under the biological process category corresponded to the GO categories of cellular component organization and biogenesis (2.3%), transport (2.9%), localization (3.0%), biosynthetic processes (2.4%), metabolic processes of nucleobases, nucleosides, nucleotides and nucleic acids (2.7%), and protein metabolism (5.2%). The apparent discrepancies between these values may be due to the fact that one InterPro entry can be assigned to more than one GO accession number, and one GO accession number can be mapped to multiple parental categories and CsAEs [33].

Putative single nucleotide polymorphisms (SNPs)
SNPs are the most common and abundant type of genetic variation between individuals and are used in population and evolutionary biology studies [44]. In addition, SNPs can be used as markers for physical and genetic mapping. SNP discovery based on large-scale EST datasets has proven an efficient technique for identifying large numbers of SNPs. Among the 7,184 CsAE contigs, SNPs were discovered in 2,896 contigs (40%). A large majority (77%) of the SNPs were detected from contigs consisting of 2-4 ESTs, while the remainder of the SNPs were discovered in contigs consisting of more than 5 ESTs (Table 2). A total of 9,077 SNPs were detected from the 7,184 contigs, which is an average of [21]. In eukaryotic organisms, SNPs occur every 500 to 1,000 bp, at frequencies higher than those found in non-eukaryotic organisms [45]. Almost all of the predicted CsAEs with >0.19 SNPs/100 bp were no-match genes, and only several were homologous to S. mansoni genes of unknown function. In trematodes, SNPs in the first codon position can result in an amino acid substitution, which may lead to structural changes in the respective proteins and affect the formation of functional domains, antigenic epitopes, or drug binding sites [44].

Abundantly expressed gene transcripts in the different developmental stages
The 30 genes most abundantly expressed were investigated further (Table S1). Genes with significant p-values were analyzed by one-to-one comparison of the number of reads for the same gene in each stage. Eighteen genes in the adult stage and 12 genes each in the egg and metacercaria stages were annotated. Highly abundant genes would be expected to be conserved, on the reasoning that they are major constituents of the transcriptome of the organism and are therefore likely to be functional, but the highly abundant genes found in C. sinensis had few homologs, indicating that this organism is developmentally quite different from other organisms for which gene abundance information is available. Genes expressed abundantly across all three stages coded for ubiquitin family proteins, elongation factor 1-alpha, and fructose bisphosphate aldolase, which are all proteins involved in basal and energy metabolism. Genes highly expressed in the adult compared to the metacercaria and egg encoded structural and reproduction-associated proteins (beta-tubulin, ferritin), detoxification proteins (glutathione S-transferase), transport proteins (clonorporin 1, sodium/glucose co-transporter), energy production proteins (GAPDH, mitochondrial malate dehydrogenase), and enzymes (cysteine protease, PHGPx isoform 1). In particular, cysteine proteases have previously been shown to be the most abundantly expressed proteins in C. sinensis adults [19,20,24]. In the egg stage, the gene encoding acyl-CoA synthetase long chain family member 5 (ACSL5) was highly expressed among the known genes. ACSL5 is a member of the highly conserved ACSL family. In protozoan parasites, ACSL5 is thought to catalyze the conversion of long-chain fatty acids to CoA derivatives to enable parasite growth [46]. The majority of highly expressed CsAEs were unknown or hypothetical genes and therefore deserve further study.

Evolutionary and functionality conservation
To investigate the evolutionary and functional conservation of the transcriptome of C. sinensis, we estimated gene numbers and the degree of conservation among C. sinensis genes and the genomes of diverse eukaryotic organisms. We performed pair-wise sequence comparisons of all C. sinensis transcripts using BLASTX with E-value cut-offs from 1e-10 to 1e-200. The C. sinensis transcriptome shared 22.9% of its genes with Homo sapiens, 23.0% with Mus musculus, 20.5% with Drosophila melanogaster, 17.8% with Caenorhabditis elegans, and 26.6% with Schistosoma japonicum at an E-value ≤ 1e-20. The CsAEs showed a moderate degree of sequence homology to the genes of the comparator organisms and a higher degree of sequence homology to S. japonicum genes (Table 3). Genes highly conserved between the CsAEs and S. japonicum may be important for parasite survival. These genes are discussed in more detail in the next section. A small fraction of genes (37 genes; 0.3%) were highly conserved (E-value ≤ 1e-200) across all the animals compared in this study (Table 4). These genes encode proteins such as actin, tubulin, translation elongation factor 1, valosin-containing protein, glycogen phosphorylase, and heat shock protein 70.

Clonorchis sinensis and parasitism
SimiTri graphically displays the relative similarity of one organism to others using bulk datasets from the respective organisms. The degree of similarity among helminth sequences was determined using two-dimensional plots [40]. We used SimiTri to analyze the global relative similarity of CsAEs to sequences from other parasitic or free-living trematodes, cestodes, and nematodes (Figure 2). C. sinensis (12,830 CsAEs) was more similar to the two parasitic flukes, S. japonicum and O. viverrini, than to the free-living helminths Schmidtea mediterranea (73,650 ESTs) and C. elegans (474,350 ESTs). When compared to both Opisthorchis viverrini (4,194 ESTs) and S. japonicum (99,069 ESTs), the closest neighbor of C. sinensis was O. viverrini, consistent with the general taxonomic classification of these trematodes and the recent molecular phylogeny of Digenean trematodes based on morphological characters and the sequence of the nuclear ribosomal small subunit (18S) [47].
Based on the results of the SimiTri analyses, 23 CsAEs were considered to contain parasitism-related genes of C. sinensis (Table S2). The genes encoded by these 23 CsAEs showed higher sequence similarity to the parasitic platyhelminths, O. viverrini, S. japonicum, and Taenia solium, than to the free-living ones, S. mediterranea and C. elegans (Figures 2A and B). Of these, 12 CsAEs had an unknown function, whereas 11 CsAEs had cell communication, ion transport, metabolic processes, nucleotide and protein binding, or oxidoreduction functional annotations.

Membrane proteins, channels and transporters
Channels. In the adult C. sinensis cDNA library, ESTs encoding K+ channels were abundant but those encoding Ca2+ channels were rare (Table 5). No Na+ channel EST was found in any of the three developmental stages. From an evolutionary point of view, K+ channels are ancient, occurring in all three domains of life, and are abundant in invertebrate animals. K+ channels are ubiquitous and maintain cell homeostasis in organisms. Ca2+ channels emerged later in evolutionary time. In more basal animals, Ca2+ channels are activated in response to action potentials of nervous systems and provoke slow actions and movements. Na+ channels are more elaborate and are often multimeric, and are generally rare in invertebrates compared to higher vertebrate animals [48]. The frequency distribution of these cation channels is consistent with the phylogenetic placement of C. sinensis in the tree of life. Specific structural motifs in the interacting domains of the beta subunit of Ca2+ channels render flukes susceptible to praziquantel (PZQ). These specific structural motifs were found in the beta subunits of C. sinensis adults, which explains why they are vulnerable to PZQ.
Transporters. A large number of glucose transporters were found in both the adult and egg stages (Table 5). Glucose/sodium co-transporters and Na+/K+-transporting ATPases were abundant only in adult C. sinensis, not in the metacercaria. This fluke species consumes large amounts of glucose to generate energy and metabolic intermediates for physiologic regulation. In adult C. sinensis, glucose appears to be imported actively from the environment through glucose/sodium co-transporters using ATP as an energy source. Glucose molecules may move passively through the glucose transporters between cells in fluke tissues. Anion channel proteins including chloride channels were 4-fold more frequent than cationic channels, which could compensate for the large amount of Na+ ions co-imported with exogenous glucose. These anionic channels may be suitable targets for vaccine or drug development.
CsAEs coding for bile acid beta-glucosidase and a sodium-bile acid cotransporter, a component of the bile acid transportation pathway, were present in the C. sinensis EST pool (Table 5). Bile acid beta-glucosidase converts bile acid to a soluble conjugated form to facilitate its secretion. The sodium-bile acid cotransporter, which imports bile salts in a Na+-dependent manner, was abundantly expressed in the metacercaria stage. This type of transporter is responsible for the influx of conjugated bile acids into hepatocytes, ileal enterocytes, and cholangiocytes in mammals [49,50]. The presence of these transporters in C. sinensis suggests that C. sinensis thrives in bile juice by utilizing bile acid and its derivatives for normal physiologic pathways. C. sinensis is expected to have a bile acid exporting system comprising bile salt export pumps, organic solute transporters, and/or multidrug resistance protein 2 to maintain cellular homeostasis [49,50].
Neurotransmitters & receptors. CsAEs encoding proteins involved in neurotransmission such as serotonin receptors, tryptophan hydroxylase, aromatic amino acid decarboxylase, glutamate receptors, glutaminase, GABA receptor-associated proteins, acetylcholine esterase, and DOPA-decarboxylase were found as rare transcripts (Table S3). The presence of these neurotransmission-related proteins implies that C. sinensis has a web of serotoninergic, glutamatergic, and cholinergic neurons, which is consistent with the previous observation that the nervous systems of trematodes are highly conserved [51].

Proteases and protease inhibitors
Proteases of parasitic origin are known to be important virulence factors based on genomic and proteomic analysis of several major global helminth species [52,53,54]. Secretory proteases of parasites are ubiquitous enzymes that have been implicated in several diverse physiological and adaptive mechanisms, such as tissue penetration, larval migration, immunoevasion, digestion, and excystation [55]. Because these proteases are indispensable for parasite viability and growth, they have been suggested as potential targets for vaccines or chemotherapeutic agents [56,57]. We classified the CsAE proteases into four functional groups based on the catalytic type, namely serine, threonine, aspartate, and metallo-or cysteine proteases ( Figure 3A). Cysteine proteases showed the highest expression (68.8%) levels among the four types of proteases during all three developmental stages. Cysteine proteases of C. sinensis are developmentally controlled and essential for survival because they are involved in processes such as nutrient uptake, tissue invasion, and evasion from host immune attacks [58]. Of the C. sinensis cysteine proteases, the cathepsin F-like isoenzyme (CsCF-6) was expressed across all developmental stages, and we observed that transcript levels of this protein increased according to the developmental stage of the parasite [59]. Metalloproteases were detected in all three developmental stages of C. sinensis ( Figure 3A). Metalloproteases are crucial proteases for invasion and immune evasion of flukes in addition to the general roles they play in catabolic reactions and protein processing [57].
Parasites utilize protease inhibitors to survive in their hosts; protease inhibitors can prevent damage by mature proteases prior to their secretion from the parasite and protect the parasite against the digestive proteases of the host [60]. Diverse proteins function as protease inhibitors, but they have a common biochemical mechanism and are characterized by rapid evolution of their sequences [61]. In our study, cysteine protease inhibitor expression was highest in the adult stage (Figure 3B). Cysteine protease inhibitors (cystatins) regulate cysteine proteases and modulate host immune responses [60]. Cystatins of C. elegans have been shown to inhibit cathepsin B, while filarial cystatins have been shown to inhibit the proliferation of murine and human T cells. Because C. sinensis cysteine proteases are expressed most abundantly in the intestinal epithelium for uptake of nutrients [59], cystatins could be expressed in the intestinal epithelium to fine-tune the intracellular activities of cysteine proteases. Expression of serine protease inhibitors remained stable in all stages, with slightly greater expression levels observed in the egg stage (Figure 3B). This finding is consistent with a previous study demonstrating that serine protease inhibitors were present mainly in the eggs of C. sinensis [62]. Transcripts encoding metalloprotease inhibitors were found only in the metacercaria library, and inhibitors of aspartic and threonine proteases were not identified in any of the developmental stages.

Antioxidant enzymes
Several antioxidant enzymes constituting the oxidoreduction system were encoded by the CsAEs, with the expression levels of the various enzymes varying according to developmental stage (Figure 3C). Antioxidant enzymes catalyze reactions that neutralize endogenous and exogenous reactive oxygen species (ROS) that are produced either by aerobic cellular metabolism or by host immune responses [63]. Regulation of the expression levels of these enzymes during each developmental stage is therefore important to cope with host-produced ROS [64]. In the C. sinensis transcriptome, glutathione-S-transferases (GSTs) were the most highly expressed antioxidant enzymes, especially in the adult and egg stages. In Fasciola hepatica, GSTs are expressed at much lower levels in juvenile worms than in adult worms living in the bile duct, implying that adult worms require more protection against host immune responses [63]. In C. sinensis, two GST isoenzymes have been identified: a 26 kDa GST and a 28 kDa GST [65]. Glutathione peroxidase (GPx) and thioredoxins (TRX) were expressed differentially at high levels in the adult stage (Figure 3C). C. sinensis GPx has been reported to be specifically localized in the vitellocytes of vitelline glands and in the premature eggs [66]. GPx defends against ROS and repairs ROS-induced damage in trematodes that do not have catalases [53]. TRX was highly expressed (Figure 3). In Schistosoma mansoni, TRX is secreted from eggs and plays a crucial role in protecting eggs from host-induced ROS production [63]. All antioxidant enzymes were expressed at low levels in the metacercaria stage (Figure 3), which can be explained by the fact that this is a dormant state with a depressed metabolism that is protected by a cyst wall from exogenous oxidative stresses.

Stress responsive genes
Heat shock proteins (HSPs) function as molecular chaperones and play an important role in the stress response to a variety of biological stresses such as heat shock, hypoxia, mechanical stimuli, lack of glucose, and UV exposure by assisting in the refolding of denatured proteins into active forms or targeting them for degradation [67]. We identified six HSPs from the CsAEs: HSP110, HSP90, HSP70, HSP60, HSP DnaJ, and HSP20. Most HSPs were expressed at higher levels in the metacercaria and egg stages than in adult worms. The higher molecular weight HSPs such as HSP110, HSP90, HSP70 and HSP60 were more highly expressed in the metacercaria stage, while the lower molecular weight HSPs, HSP DnaJ and HSP20, were expressed at higher levels in the egg stage. In C. sinensis adults, HSPs were expressed at low levels relative to the metacercaria and egg stages (Figure 3D).
In the life cycle of C. sinensis, the metacercariae experience a large thermal change when they move from the environment (ambient temperature) to the stomach of the mammalian host (37°C). Furthermore, as the metacercariae pass down and excyst in the duodenum, they face the osmotic stress of intestinal secretions, and then when they migrate into the bile duct, they are exposed to bile juices. The higher molecular weight C. sinensis HSPs are likely to respond to these thermally and environmentally induced stresses [68,69]. The C. sinensis eggs that are released in the bile duct are carried down the intestine and passed out in the feces of the mammalian host into the environment, thereby experiencing cold shock. In the C. sinensis eggs, the low molecular weight HSPs may function to protect the eggs against cold thermal shock [70], while the high molecular weight HSPs might contribute to recovery from the cold shock [71].

Cell proliferation and cholangiocarcinoma-related genes
The C. sinensis transcriptome had several contigs encoding proteins associated with cell proliferation and apoptosis such as granulin, epidermal growth factor (EGF), transforming growth factor (TGF) interacting protein, and inhibitors and regulators of apoptosis. Among these proteins, granulin, encoded by 67 reads, was most abundantly expressed, with an 8.6-fold higher expression level in adults than in metacercariae. Similarly, the EGF gene was expressed at 2.5-fold higher levels in the adult stage than in the metacercaria stage (Table 6). Transcriptomic datasets of C. sinensis and O. viverrini were previously analyzed for proteins common to carcinogenesis, and a large number of the amino acid sequences of these trematodes were inferred to have homologs to genes involved in human cancer development [25]. The C. sinensis genes associated with apoptosis, cell proliferation, and cancer development encoded laminins, c-Jun N-terminal kinase, catenins, cyclin-dependent kinases, histone deacetylases, MFS transporters, serine/threonine kinases, and transcription factors (Table 6). C. sinensis infection provokes both acute and chronic pathological changes such as proliferation of the bile duct epithelium and periductal fibrosis that is disseminated over the biliary tree from proximal to remote biliary capillaries [5]. The mitogen-like proteins secreted or excreted from C. sinensis are likely to be provocative agents that cause biliary epithelial alterations, as has been documented for the granulin-like growth factor of O. viverrini [72,73]. Proliferating cholangiocytes could be vulnerable to DNA damage from endogenous and exogenous carcinogens, bioreactive free radicals, and nitroso compounds. The apoptosis inhibitor and regulatory proteins identified in the transcriptome of C. sinensis in this study (Table 6) may prevent the death of DNA-damaged cells and possibly facilitate their transformation into cancerous cells [73].
Epidemiologically, the incidence of cholangiocarcinoma is significantly higher in clonorchiasis endemic areas than in nonendemic areas [74]. Experimentally, infection with C. sinensis and the ingestion of dimethylnitrosamine by Syrian golden hamsters resulted in cholangiocarcinoma [9,75].

Drug target candidates
Praziquantel is currently used to treat clonorchiasis. However, the efficacy of praziquantel against clonorchiasis has been reported to be poor in northern Vietnam [76]. Tribendimidine has recently emerged as a promising alternative to praziquantel for the treatment of human opisthorchiasis [77]. Transcriptomic datasets facilitate the search for new drug targets: by mining them with bioinformatic tools that employ statistical and network analyses, parasite-specific physico-metabolic pathways and developmental regulation can be found. Membrane proteins including channels, transporters, and permeases, which play important roles in host-parasite interactions, were some of the first novel targets identified using transcriptome data [78,79]. A total of 435 CsAEs that had more than two transmembrane domains were screened using the TMHMM algorithm [36], and the localization of the predicted proteins was determined using PSORTb based on homology to proteins of known localization [35]. CsAEs homologous to proteins of the free-living flatworm S. mediterranea or of humans were selected using BLASTX with a cut-off E-value ≤ 1e-10. Five genes not present in vertebrates were identified as putative drug targets using the screening strategy described above (Table S4). Tetraspanins, four-transmembrane-domain proteins, are present on the outer tegument of trematodes and function as receptors for host molecules [80]. Tetraspanins are a recognized vaccine target for S. mansoni [78]. ADP ribosylation-like factor-6 (ARL6) interacting protein, which interacts with the ARL6 protein [81], is involved in hematopoietic maturation processes such as protein transport, membrane trafficking, and cell signaling [82]. Myelin proteolipid protein is a transmembrane protein that has been suggested to serve as a structural component of myelin, contributing to both the stability and compact lamellar structure of myelin [83].
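The screening logic described above can be illustrated with a minimal sketch. It assumes one reading of the screen: a CsAE is retained when it has more than two predicted transmembrane helices and no human BLASTX hit at or below the E-value cut-off. The record type, field names, and example entries are hypothetical and are not taken from the study's data; in practice the values would be parsed from TMHMM and BLASTX output.

```python
from dataclasses import dataclass
from typing import Optional

E_VALUE_CUTOFF = 1e-10   # BLASTX cut-off quoted in the text
MIN_TM_HELICES = 3       # "more than two transmembrane domains"


@dataclass
class CsAE:
    """Hypothetical per-sequence summary of prediction and search results."""
    name: str
    tm_helices: int                    # predicted TM helix count (e.g. from TMHMM)
    human_evalue: Optional[float]      # best BLASTX E-value vs human proteins, None if no hit
    planarian_evalue: Optional[float]  # best BLASTX E-value vs S. mediterranea, None if no hit


def has_homolog(evalue: Optional[float]) -> bool:
    """A hit counts as a homolog only at or below the E-value cut-off."""
    return evalue is not None and evalue <= E_VALUE_CUTOFF


def is_drug_target_candidate(rec: CsAE) -> bool:
    """Assumed rule: membrane protein with no human homolog."""
    return rec.tm_helices >= MIN_TM_HELICES and not has_homolog(rec.human_evalue)


# Hypothetical example records, for illustration only.
records = [
    CsAE("CsAE_hypothetical_1", 4, None, 1e-30),   # membrane protein, no human hit
    CsAE("CsAE_hypothetical_2", 7, 1e-50, 1e-40),  # strong human homolog, excluded
    CsAE("CsAE_hypothetical_3", 2, None, None),    # too few TM helices, excluded
    CsAE("CsAE_hypothetical_4", 3, 1e-5, None),    # human hit weaker than cut-off, kept
]

candidates = [r.name for r in records if is_drug_target_candidate(r)]
print(candidates)
```

The S. mediterranea E-value is carried in the record but not used as a filter here, because the text does not state how the flatworm comparison entered the final selection.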

Diagnostic antigen candidates
Serodiagnostic methods are used to screen for patients infected with C. sinensis and as supportive diagnostic tools in individual patients. Antigen proteins have been identified from the excretory-secretory products of C. sinensis and have been purified from crude extracts [84]. As antigenic preparations, crude extracts of C. sinensis show high sensitivity but low specificity toward the sera of clonorchiasis patients. In contrast, some of the recombinant antigenic proteins used for screening have high specificity but low sensitivity [84]. Potent serodiagnostic antigens for clonorchiasis are therefore still lacking. As the first step to find antigen candidates, the C. sinensis transcriptome data were filtered for secretory signal peptides using the SignalP 3.0 server and a neural network/hidden Markov model [34], while proteins secreted into the extracellular space were searched for using PSORTb [35]. After removing proteins with transmembrane domains using TMHMM [36], 411 CsAEs were designated 'secretory'. To determine which of these proteins are potentially antigenic, the secretory candidates were filtered using the ABCpred server [85] with default parameters, and 43 CsAEs with two or more B-cell epitopes and a score of 0.82 or higher were obtained. By removing proteins homologous to mammalian and nuclear proteins, a total of 19 CsAEs were identified as putative antigen candidates. Examples of these candidates include a male sterility protein, cathepsin L-like cysteine proteinase A, protein disulfide isomerase-related protein P5 precursor, cryptosporidial mucin, and TGF-beta receptor interacting protein 1 (Table 7). Recombinant antigenic proteins encoded by the selected CsAEs could be synthesized in vitro using a high-throughput cell-free system [86], and their antigenicity for the serodiagnosis of clonorchiasis could be determined using various methods such as a protein chip [87].

Figure S1 Contigs and singlets of the assembled C. sinensis EST pool according to developmental stage. (TIFF)

Figure S2 The annotation of CsAEs using InterPro and BLASTX. All 12,830 CsAEs were searched for homologs in the NCBI NR database and the InterPro database, and the retrieved homologs were manually curated. A total of 7,132 CsAEs were annotated with homologs, but the remaining 5,698 (orange) were found to have no homolog. Of the annotated CsAEs, 5,387 (violet) matched homologs in both databases, 719 (green) in InterPro only, and 1,026 (blue) in BLASTX only. (TIFF)
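The antigen-candidate screen described under "Diagnostic antigen candidates" can be summarized as a chain of filters. The sketch below is illustrative only and rests on stated assumptions: a sequence counts as secretory if it has either a predicted signal peptide or a predicted extracellular localization and has no transmembrane helix, and "two or more B-cell epitopes and a score of 0.82 or higher" is read as at least two epitopes each scoring 0.82 or higher. The record type, field names, and example entries are hypothetical; real values would come from SignalP, PSORTb, TMHMM, ABCpred, and BLAST output.

```python
from dataclasses import dataclass
from typing import Tuple

MIN_EPITOPES = 2          # "two or more B-cell epitopes"
MIN_EPITOPE_SCORE = 0.82  # score threshold quoted in the text


@dataclass
class Candidate:
    """Hypothetical per-sequence summary of the prediction results."""
    name: str
    has_signal_peptide: bool            # e.g. SignalP call
    extracellular: bool                 # e.g. PSORTb localization
    tm_helices: int                     # e.g. TMHMM helix count
    epitope_scores: Tuple[float, ...]   # e.g. ABCpred score per predicted epitope
    mammalian_homolog: bool
    nuclear_homolog: bool


def is_secretory(c: Candidate) -> bool:
    """Assumed rule: secreted by either predictor, and no TM helix."""
    return (c.has_signal_peptide or c.extracellular) and c.tm_helices == 0


def is_antigenic(c: Candidate) -> bool:
    """Assumed rule: at least MIN_EPITOPES epitopes at or above the score threshold."""
    strong = [s for s in c.epitope_scores if s >= MIN_EPITOPE_SCORE]
    return len(strong) >= MIN_EPITOPES


def is_antigen_candidate(c: Candidate) -> bool:
    """Secretory, antigenic, and without mammalian or nuclear homologs."""
    return (is_secretory(c) and is_antigenic(c)
            and not (c.mammalian_homolog or c.nuclear_homolog))


# Hypothetical example records, for illustration only.
pool = [
    Candidate("seq_a", True, False, 0, (0.90, 0.85, 0.60), False, False),  # passes all filters
    Candidate("seq_b", True, True, 1, (0.95, 0.91), False, False),         # has a TM helix
    Candidate("seq_c", False, True, 0, (0.95, 0.70), False, False),        # one strong epitope only
    Candidate("seq_d", True, True, 0, (0.83, 0.82), True, False),          # mammalian homolog
]

selected = [c.name for c in pool if is_antigen_candidate(c)]
print(selected)
```

Keeping each filter as its own function makes it straightforward to report how many sequences survive each step, in the same way the text reports 411, 43, and 19 CsAEs.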