Comparative Transcriptomic Exploration Reveals Unique Molecular Adaptations of Neuropathogenic Trichobilharzia to Invade and Parasitize Its Avian Definitive Host

To date, most molecular investigations of schistosomatids have focused principally on blood flukes (schistosomes) of humans. Despite the clinical importance of cercarial dermatitis in humans caused by Trichobilharzia regenti and the serious neuropathologic disease that this parasite causes in its permissive avian hosts and accidental mammalian hosts, almost nothing is known about the molecular aspects of how this fluke invades its hosts, migrates in host tissues and how it interacts with its hosts’ immune system. Here, we explored selected aspects using a transcriptomic-bioinformatic approach. To do this, we sequenced, assembled and annotated the transcriptome representing two consecutive life stages (cercariae and schistosomula) of T. regenti involved in the first phases of infection of the avian host. We identified key biological and metabolic pathways specific to each of these two developmental stages and also undertook comparative analyses using data available for taxonomically related blood flukes of the genus Schistosoma. Detailed comparative analyses revealed the unique involvement of carbohydrate metabolism, translation and amino acid metabolism, and calcium in T. regenti cercariae during their invasion and in growth and development, as well as the roles of cell adhesion molecules, microaerobic metabolism (citrate cycle and oxidative phosphorylation), peptidases (cathepsins) and other histolytic and lysozomal proteins in schistosomula during their particular migration in neural tissues of the avian host. In conclusion, the present transcriptomic exploration provides new and significant insights into the molecular biology of T. regenti, which should underpin future genomic and proteomic investigations of T. regenti and, importantly, provides a useful starting point for a range of comparative studies of schistosomatids and other trematodes.


Introduction
The bird fluke Trichobilharzia regenti is a member of the Schistosomatidae (= blood flukes; Class Trematoda), a family of parasitic flatworms of medical and veterinary importance [1,2]. T. regenti is widely distributed geographically and is highly prevalent, for instance, in parts of Europe (including Russia), New Zealand and Iran [3][4][5][6]. Like blood flukes of the genus Schistosoma, T. regenti is dioecious, has a two-host life cycle (including a lymnaeid snail of the genus Radix) and has an invasive furcocercarial stage that actively penetrates the skin of a definitive vertebrate host. Unlike members of the genus Schistosoma, T. regenti invades and migrates through skin and nerves to then establish within the nasal mucosa [7][8][9]. During its aquatic phase, T. regenti can accidently penetrate human skin and cause cercarial dermatitis. Cercarial dermatitis, caused by avian schistosomes, is regarded as an emerging disease [10][11][12] although global economic losses are not known, it is accepted that this condition can have a considerable impact on local, tourism-based economies, and may also represent a debilitating occupational disease of rice farmers (see Horák et al., 2015 for review [12]). As avian (including T. regenti) and human schistosomes can occur in the same water reservoirs, there are at least two issues of relevance in relation to the differential diagnosis of disease: (a) Based on clinical signs, cercarial dermatitis caused by avian schistosomes can be confused with that caused by human schistosomes [13]. (b) Prevalence surveys of hepatointestinal or urogenital schistosomiasis of humans might be influenced/affected by serological cross-reactivity resulting from exposure to cercariae of avian schistosomes [14].
To better understand T. regenti and the diseases that this parasite causes, considerable research has focused on exploring its life cycle. Once shed from the intermediate aquatic snail host, the cercariae survive only for a limited time in water (1 to 1.5 days in related avian schistosomes [15], consuming their glycogen reserves acquired from the intermediate host [16]. Upon contact with the skin of the definitive host, the cercariae release secretions containing proteolytic enzymes (peptidases) from their circumacetabular and postacetabular penetration glands [17], which enable tissue penetration [18][19][20]. During penetration, the cercariae transform to schistosomula within~12 h [9,21]; for schistosomes, this process is accompanied by a loss of their tail, formation of a double (heptalaminar) membrane covering the tegument and a reduction of surface glycocalyx [21,22] as well as a switch from aerobic to anaerobic metabolism, depending on the amount of accessible glucose [23,24] and the activation of metabolic processes in the parasite's gut [25]. In contrast to human schistosomes, T. regenti schistosomula do not migrate directly to blood vessels, but rather enter peripheral nerves, and migrate to the spinal cord and brain of the host, during which they feed on neural tissue [8,26]. Having reached the pre-adult stage in the meninges, the schistosomula start to feed on blood and then migrate into the nasal cavity, likely via an intravascular route [27]. The significant damage to nerve tissue caused by migrating schistosomula can lead to behavioural changes, disorientation, paralysis or even death in some hosts [7,28].
Despite the importance of cercarial dermatitis in humans caused by T. regenti and the unique neuropathogenic effects of this parasite on its permissive avian hosts as well as experimental rodent hosts, little is known about the molecular mechanisms underlying tissue penetration, transformation of cercariae to schistosomula, tissue invasion and parasite-host interactions. Here, we propose that exploring the developmental transcriptomes of cercaria and schistosomulum of T. regenti will provide vital insights into the fundamental molecular biology of this parasite, and identify essential pathways and protein classes linked to early tissue invasion. An analysis of the developmental transcriptome of T. regenti should also fill gaps in our knowledge of the parasite's biology, as, to date, major molecular investigations have focused mainly on human blood flukes. Therefore, in the present study, we (i) assembled and annotated the transcriptome of two consecutive life stages of T. regenti involved in the first phases of infection of the avian host, (ii) identified key biological and metabolic pathways specific to each of these two developmental stages, and (iii) undertook comparative analyses using data available for taxonomically related blood flukes of the genus Schistosoma.

Parasite materials
Trichobilharzia regenti was maintained in snail intermediate (Radix lagotis) and definitive (Anas platyrhynchos f. domestica; breed-Cherry Valley strain) hosts in the Laboratory of Helminthology, Faculty of Science, Charles University in Prague, using an established protocol [1]. Four separate groups of infected snails (n = 20 snails per group), each representing a distinct biological replicate, were established to obtain four independent biological replicates of pooled cercarial and schistosomula samples (Fig 1).
Schistosomulum replicates: Upon light stimulation for 2 h, cercariae (mixed-gender) shed from each snail group were collected and used to infect seven-day old ducks (n = 4 replicates; 2,500 cercariae per duck). Pooled schistosomula were collected from the spinal cord of each infected duckling seven days following inoculation using established methods [7,28]. Briefly, the spinal cord was carefully prised apart manually (using dissection needles) in phosphatebuffered saline (PBS) and exposed to bright light for 1 h. Schistosomula (n = 4 replicates; 150 to 400 individuals per duckling) were collected, washed extensively in PBS and stored in TRIzol reagent (Invitrogen) at -80°C until further processing. For each biological replicate, a sample of cercariae (shed from the same snail group used to infect the ducks) was also collected for RNA isolation. Cercaria replicates: Cercariae were collected every second day for one week, with each batch washed twice in tap water, centrifuged at 2,500 ×g at 4°C, and then stored in TRIzol reagent (Invitrogen) at -80°C.

RNA isolation, library preparation and sequencing
Total RNA was purified from individual biological replicates (four for both cercariae and schistosomula) using TRIzol, and residual genomic DNA removed (DNA-free kit, Invitrogen) following the manufacturer's instructions. The integrity and quality of total RNA were determined using a Bioanalyzer 2100 (Agilent) and Qubit RNA BR assay kit (Invitrogen). Messenger RNA (mRNA) was purified, and short-insert (330 bp) complementary DNA (cDNA) libraries constructed and barcoded according to the manufacturer's instructions (TruSeq RNA Sample Preparation v.2, Illumina). All cDNA library was paired-end sequenced (2 × 211 base reads) on a single line using the HiSeq 2500 platform (Illumina).  Assembly of the transcriptome Sequencing adaptors and nucleotides with a Phred quality score of < 20 were removed using Trimmomatic v.0.3 [29]. The quality of filtered paired-end read data was manually assessed using FastQC [30] For each RNA-seq dataset, reads were corrected using Spades v.3.1.0 [31] and normalized digitally using the program khmer v.1.1 [32]. For each of the cercarial and schistosomula RNA-seq data sets, replicate datasets were pooled and used to assemble nonredundant transcriptomes using Oases v.0.2.8 [33], employing coverage cut-offs of 13 and 14, and k-mer values of 47 and 49, respectively. A final, merged transcriptome was generated by concatenating cercarial and schistosomula transcriptomes, and removing redundancy using CD-HIT-EST [34] using a nucleotide sequence identity threshold of 85%. Finally, coding domains were predicted using Transdecoder [35], and only transcripts encoding proteins of 30 amino acids were retained. The proportion of genome annotation represented by the non-redundant larval transcriptome was assessed using the program CEGMA [36].

Annotation of the transcriptome
Transcripts homologous (BLASTn; E-value cut-off: < 10 −5 ) to avian, bacterial or viral nucleotide sequences in the NCBI non-redundant nucleotide database [37] and translated proteins homologous (BLASTp; E-value cut-off: 10 −5 ) to transposable elements in the RepBase database [38] were quarantined as well as sequences shorter than 50 amino acids (aa) with no homology in public databases. The non-redundant, merged transcriptome was then annotated using an established pipeline [39]. Briefly, protein sequences inferred from transcripts were annotated using their closest homologues (BLASTp; E-value cut-off: 10 −5 ) in the following databases: NCBI non-redundant protein [37]; SwissProt [40]; MEROPS peptidase and peptidase inhibitor [41]-with predicted peptidases of unknown catalytic type and inhibitor homologues of unassigned peptidases being excluded; Kyoto Encyclopedia of Genes and Genomes (KEGG) [42], excluding KEGG "Human Diseases" and "Organismal Systems" categories. Conserved domains and their associated gene ontology (GO) annotations were predicted using the program InterProScan [43]. The server REVIGO was used to summarize GO terms and define a representative subset of terms using a simple clustering algorithm that relies on semantic similarity measures [44]. Excretory/secretory (E/S) proteins were predicted based on the presence of a signal peptide and the lack of a transmembrane domain; in addition, proteins were subjected to analysis using the program MultiLoc2 [45] to predict their sub-cellular location. Only proteins predicted to be extracellular or lysosomal (score: > 0.5) were included from the final set of predicted E/S proteins.

Analysis of differential transcription
For each biological replicate, paired, trimmed and corrected reads were mapped to the final transcriptome using RSEM [46]. The expected counts predicted were rounded to the highest whole number, and used as counts per transcript for differential gene transcription analysis using edgeR v.3.6.7 [47] and R v.3.10 [48] software packages, with read counts being normalized to account for any GC bias [49] and using the trimmed mean of M-values (TMM) [50]. Transcripts with more than a log 2 fold-change in transcription between the two developmental stages (i.e. cercaria and schistosomulum) and with a false discovery rate (FDR) of 0.01 were recorded as differentially transcribed. Enriched GO terms for transcripts recorded to be differentially transcribed in either developmental stage were tested using topGO [51] and the Fisher's exact test (p 0.05); representative enriched GO terms were inferred using the program REVIGO [44]. In addition, transcripts specific to either the cercaria or the schistosomulum were those with RSEM expected counts of > 2 in at least one of four replicates representing one but not the other developmental stage.

Results
Characterisation and annotation of the non-redundant transcriptome of T. regenti commenced with a start codon, and 11,434 terminated with a stop codon. Despite sequencing only two developmental stages, the T. regenti transcriptome includes 89.5% of the 358 conserved eukaryotic genes [36] (Table 1), and thus represents a substantial proportion of the gene set. On average, 61.5% and 74.4% of the sequence reads representing the cercaria and schistosomulum, respectively, mapped to the non-redundant T. regenti transcriptome (S1 Table). The final transcriptome and RNA-seq read data are available for download via the NCBI transcript reads archive and sequence read archive (SRA), respectively (BioProject ID: PRJNA292737). Totals of 88 and 243 potential contaminants (avian, viral and bacterial) were removed from the cercaria and schistosomulum transcriptomes, respectively. Following the removal of these sequences, the larval transcriptome was shown to encode 10,900 (77.5%) and 8,347 (57.8%) predicted proteins that were homologous (BLASTp; E-value 1e -05 ) to those in the NCBI (non-redundant proteins) and SwissProt databases, respectively (Table 1). In addition, 5,935 predicted proteins were assigned 41 unique KEGG BRITE protein families (Fig 2A; Tables 1 and S2), and 3,611 predicted proteins were assigned 172 biological KEGG pathways (Tables 1  and S2). Using the MEROPS peptidase and inhibitors database, 318 peptidases, including key molecules recognized to be involved in cercarial penetration and schistosomulum migration, nutritional uptake and/or immune evasion [19] were identified (S3 Table). Transcripts encoding metallopeptidases (n = 129) were abundant, and included ubiquinol-cytochrome c reductase proteins (n = 23), kell blood-group proteins (n = 8) and leucine aminopeptidases (n = 4). Transcripts encoding cysteine peptidases (n = 106) including cathepsin B (n = 11), cathepsin L (n = 6) cathepsin C (= dipeptidyl-peptidase; n = 3), ubiquitin-specific peptidases (n = 10) and legumains/aspariginyl endopeptidases (n = 2) and a ubiquitin-specific peptidase, were also well represented. Transcripts encoding serine peptidases (n = 62) represented indeterminate peptidases (n = 43), cathepsin A (= carboxypeptidase A; n = 3) and mitochondrial inner membrane peptidase 2 (n = 2). Transcripts encoding threonine peptidases (n = 15) were represented by proteasome subunits (n = 4). Also identified was a small number of transcripts encoding aspartic proteases (n = 6), including one cathepsin D. In total, transcripts representing 260 inhibitors of peptidases were identified in the (non-redundant) transcriptome of T. regenti, most of which represented inhibitors of metallo-(n = 110), serine (n = 77) and cysteine peptidases (n = 41) (S3 Table).
A relatively large number of proteins predicted were homologous to those encoded in the genomes of other trematodes, including schistosomes (10,909 proteins; 85.6% being similar to Schistosoma mansoni, S. japonicum and/or S. haematobium) ( Fig 2B) and Asian liver flukes (10,080 proteins; 79.3% being similar to Clonorchis sinensis and/or Opisthorchis viverrini) ( Fig  2C). Of note were 1,722 transcripts that had no homology to other trematode species; only 12 of them (including collagen alpha-3(VI) chain-like, 40S ribosomal protein S7-like, ReO_6, membrane magnesium transporter 1-B-like, VWFA and cache domain containing protein 1 and UPF0729 protein C18orf32 homolog) shared homology to proteins in the NCBI database. In addition, collagen, type VI, alpha and small subunit ribosomal protein S7e were predicted to be involved in five biological pathways (S4 Table). In addition, the larval transcriptome encoded 135 ES proteins, including various peptidases and their inhibitors, phosphatases, kinases, transferases and ribonucleases (S5 Table).
Differentially transcribed genes, and biological pathways enriched in either cercariae or schistosomula A total of 11,058 transcripts were shared by cercariae and schistosomula. Mapping results revealed 270 and 951 transcripts to be specific to the cercaria and schistosomulum stages, respectively (Fig 3A; S6 and S7 Tables). In total, 1,301 transcripts were up-regulated in cercariae and 1,876 in schistosomula (S8 and S9 Tables). The top twenty most differentially transcribed genes in cercariae encoded venom allergen-like protein 8, tegumental protein, calcium-binding proteins, glutamine synthetase and some uncharacterized proteins (with no homology to any sequences in public databases). The top twenty most differentially transcribed genes in schistosomula encoded peptidases (e.g. peptidase M26, cathepsin B1) and beta galactosidase, some uncharacterized proteins with homology to those of S. mansoni, two of which (Treg_015087, Treg_015334) were homologous to a saposin-like protein and beta hexosaminidase B based on InterPro classification respectively ( Table 2). A number of up-regulated transcripts encoded proteins with conserved functional domains (1,178 and 1,699 for cercariae and schistosomula, respectively) and/or similarity to proteins in the KEGG database [605 and 786 (BRITE) (Fig 3B), and 389 and 487 (pathway) for cercariae and schistosomula, respectively]. Using these transcripts, we were able to identify enriched KEGG BRITE protein families, biological pathways and GO terms (biological process) in each of the two larval stages (Tables 3, 4, S10 and S11).
In cercariae, 162 up-regulated transcripts represented five enriched KEGG BRITE protein families (Tables 3 and S10), with most transcripts encoding exosome-related proteins, including calmodulin (n = 10), long-chain acyl-CoA synthetase (n = 4), creatine kinase (n = 4) and tropomyosin 1 (n = 3). Other protein families were associated with: (a) metabolic processes involving lipid biosynthesis proteins (n = 9) and amino acid-related enzymes (n = 14), and (b) cellular processes linked to protein translation, ribosomes (n = 26) and ribosome biogenesis (n = 38). In addition, 202 transcripts up-regulated in cercariae represented 21 enriched KEGG (biological) pathways (Tables 3 and S10), with almost one quarter of them (n = 50) linked to    Tables 3 and S10). In addition to these enriched protein families and biological pathways, 161 transcripts up-regulated in the cercariae were linked to 36 GO terms for 'biological process' that were enriched in the cercarial stage (Tables 3 and S10). These GO terms were represented by eight representative biological processes, including cellular respiration (115 transcripts linked to 17 GO terms, including ATP metabolic process, glycerol-3-phosphate metabolic process, nucleotide biosynthetic process, organophosphate biosynthetic process and oxidationreduction process), glucose metabolism (29 transcripts representing five GO terms, including carbohydrate catabolic process, cellular carbohydrate metabolic process, glucose metabolic process and organic substance catabolic process), acto-myosin structure organization (23 transcripts representing five GO terms, including actomyosin structural organization, cell wall macromolecule metabolic process, cell wall organization or biogenesis, energy-coupled proton transport, down electrochemical gradient, and ribosome biogenesis) and coenzyme metabolism (22 transcripts representing four GO terms, including acetyl-CoA metabolic process, coenzyme catabolic process, coenzyme metabolic process and cofactor catabolic process). Other biological processes were represented by single GO terms and included carbohydrate metabolism (40 transcripts), cofactor metabolism (n = 22), generation of precursor metabolites and energy (n = 23) and pyridine-containing compound metabolism (n = 9). In schistosomula, 145 up-regulated transcripts represented five enriched KEGG BRITE protein families (Tables 4 and S11), with most encoding peptidases (61 transcripts) including cathepsin B (n = 11), cathepsin D (n = 8), separase (n = 4), leucyl amino peptidase (n = 3) and legumain (n = 3). Other protein families enriched were associated with: (a) metabolic processes involving heparan sulfate/heparin binding proteins (n = 22) and proteoglycans (n = 6), (b) genetic information processing involving chaperones and folding catalysts (n = 54) and (c) environmental information processes represented by cell adhesion molecules and their ligands (n = 42). In addition, 128 up-regulated transcripts represented 10 enriched KEGG biological pathways (Tables 4 and S11). Metabolic pathways involved carbohydrate, amino sugar or nucleotide sugar metabolism (n = 32); glycan biosynthesis and metabolism, including glycosaminoglycan degradation (n = 32), and glycosphingolipid biosynthesis (n = 33). In each metabolic pathway, 30 different transcripts encoding hexosaminidase were inferred to be involved. Pathways associated with signalling were also significantly enriched in the schistosomula, and represented by cell adhesion molecules (n = 6) and extracellular matrix-receptor interaction molecules (n = 10). Other enriched pathways in the schistosomula were associated with cellular processes, such as cell cycle (n = 21) and processes linked to lysosome function (78 transcripts, including 30 encoding hexosaminidases). In addition to the enriched protein families and biological pathways, 332 transcripts were linked to 28 GO ('biological process') terms enriched in the schistosomulum stage (Tables 4 and S11). These GO terms represented five core processes: cell adhesion (n = 105 transcripts; 18 associated GO terms), proteolysis (n = 215; 7 GO terms), biological adhesion (n = 37; one GO term), multicellular organismal process (n = 15; one GO term) and developmental processes (n = 20; one GO term).

Discussion
To date, genomic and transcriptomic studies of schistosomatids have focused on S. mansoni, S. japonicum and S. haematobium [52][53][54]. In the present study, we characterized the first transcriptome of any avian fluke, T. regenti, undertook comparative studies with other schistosomatids and elucidated differences between two key developmental stages responsible for host invasion (cercaria) and migration through the neural tissues (schistosomulum) at the molecular level.

Comparison of T. regenti with other schistosomatids
Based on similarity searches, the proteins predicted from the T. regenti transcriptome have 85.9% sequence homology to those of human schistosomes (BLASTx, E-value 10 −5 ), although divergent molecules (14.2%) are proposed to relate to considerable differences in biology of bird and human schistosomes. Trichobilharzia regenti shares 110 transcripts exclusively with S. japonicum, with no homologues in either S. mansoni or S. haematobium. This finding is consistent with S. japonicum being the most similar of the three human schistosomes to T. regenti in terms of chemical tools serving for skin penetration. In particular, a shared feature of T. regenti and S. japonicum is the absence, at both the mRNA and protein levels, of cercarial elastase [18,[55][56][57]. By contrast, S. mansoni and S. haematobium both use elastase for cercarial invasion of the definitive host [58,59]. While humans represent exclusive/dominant hosts of S. haematobium and S. mansoni, S. japonicum has an ability to infect a broader range of mammals (including pigs, water buffalo and water rats; [60]), but not birds as natural final hosts. In this case, therefore, the absence of elastase in both S. japonicum and T. regenti seems to be an interesting example of convergence. In other words, the data here indicate that T. regenti shares more unique transcripts (n = 110) with S. japonicum than either S. mansoni (n = 52) or S. haematobium (n = 57) (Fig 2B); the apparent absence of cercarial elastase from the two former species might reflect their distinct host affiliations/broader spectra compared with the latter two. Interestingly, 1,722 (13.6%) of predicted proteins of T. regenti had no homology to those of any other trematode species, for which molecular data are available (none of which are parasites of birds) (Fig 2B). Of these 1,722 predicted proteins, only 151 had homologues in public databases. Remaining 1,571 transcripts represented orphans (genes with no homology to known domains or proteins). However, some of these orphans were highly transcribed in cercariae and/or schistosomula. Interestingly, 10 and 8 of 20 genes exhibiting the highest transcription in cercariae and schistosomula, respectively, encoded orphans. Such molecules might have unique particular biological functions or processes in T. regenti associated with the penetration of avian skin (cercaria), specific migration through neural tissue (schistosomulum) and an adaptation to the avian definitive host (both stages). This statement is supported by biological and clinical evidence [26], showing that T. regenti is not able to effectively establish infection in the mammalian hosts under natural conditions.

Molecular aspects unique to the cercarial stage of T. regenti
In digenetic trematodes, interestingly, the cercaria are considered to be less transcriptionally active than other developmental stages [61][62][63], although the high transcription of genes encoding proteins participating in particular metabolic pathways described here likely reflects the specific biological requirements of this larval stage.
Carbohydrate metabolism. After leaving their snail intermediate host, schistosome cercariae must find and penetrate the epidermis of their definitive host within hours [16], and thus rely exclusively on stored energy reserves (including glycogen) [16]. In T. regenti cercariae, the enriched carbohydrate metabolism, including glycolysis (linked to glucose-6-phosphatase, fructose-1,6-bisphosphatase, phosphoenolpyruvate carboxykinase and pyruvate carboxylase), pyruvate, citrate cycle and oxidative phosphorylation pathways indicate the importance of aerobic degradation of glucose from glycogen sources, consistent with observations in other freeliving trematode stages [64][65][66][67][68]. However, interestingly, other pathways for the degradation of fructose, mannose, pentose phosphate, starch, sucrose, galactose, glyoxylate and dicarboxylate were also enriched in T. regenti cercariae and appear to be functional; the present results indicate that some enzymes likely participate in multiple metabolic pathways. For instance, citrate synthase is likely involved in glyoxylate and dicarboxylate metabolism as well as in the citrate cycle, the latter of which is regarded as a major metabolic pathway in T. regenti cercariae, corresponding to evidence for S. mansoni [69].
Translation and amino acid metabolism. The present findings show that T. regenti cercariae are less transcriptionally active than schistosomula, although the pathways linked to translational machinery, such as ribosome biogenesis, were significantly enriched, including 24 distinct large ribosomal subunit proteins. We speculate that cercariae synthesize ribosomes for immediate protein synthesis following their invasion of the definitive host, in which the energy and protein sources would appear to be unlimited. The limited pool of amino acids in schistosome cercariae might originate from the snail intermediate host and/or from biosynthesis [70]; in T. regenti, the transcripts encoding enzymes involved in the metabolism of amino acids, such as alanine (alanine transaminase), asparagine (aspartate-ammonia ligase), glutamine (glutamine synthetase) and cysteine (cystathionine gamma-lyase), were identified. All of these amino acids are derived from the citrate cycle or glycolysis, considered the main metabolic pathways in the cercarial stage. Although the synthesis of amino acids by parasitic stages of helminths is well recognized [71], this aspect has been seldom studied in free-living stages of schistosomes [62,72].
Key role for calcium during invasion. In T. regenti cercariae, calcium signalling may also be important for enriched cellular processes, as has been observed for S. japonicum [62]. For example, transcripts encoding calcium-binding protein (CaBP) represented the fifth most differentially transcribed gene (log-fold change: 16.5) between cercariae and schistosomula of T. regenti (cf. Table 2), which likely relates to CaBP being essential during the very active invasion process of larvae in the definitive host [73]. In other schistosomes, such as S. mansoni, there is a down regulation of CaBP in cercariae following epidermal penetration [73]. The preacetabular (circumacetabular) penetration glands of schistosome cercariae contain a high concentration of calcium [18,74,75]. This calcium has been suggested to regulate the activity of S. mansoni cercarial elastase [76,77] and of glycocalyx shedding by cross-linking endogenous postacetabular mucopolysaccharides [18,76]. Although there is no evidence here of elastase in the T. regenti cercaria, the highest transcription of any peptidase in this stage was exhibited by the calpain gene, which is strictly regulated by calcium ions. Calpain may regulate the surface membrane synthesis process, as it does in S. mansoni [78], but this proposals needs to be tested.

Molecular processes unique to the schistosomulum stage of T. regenti
During their invasion of the definitive host, the transformation of cercariae to schistosomula is associated with rapid and major morphological, biochemical and molecular changes. Unlike most schistosomes studied to date, the schistosomula of T. regenti do not enter the bloodstream, but rather seek out and migrate in nerves and consume neural tissue (as nutrition) during migration [26]. Compared with the circulatory system, the nervous system represents an immunologically and physiologically distinct environment (in terms of nutrients and immune responses). The present study investigated the schistosomula of T. regenti seven days after infection of the avian host and explored the molecular adaptations required for the parasite to establish and survive in this unique niche-the neural system.
Growth and development, and cell adhesion molecules (CAMs). In addition to playing critical roles in regulating normal cell integrity and cell-cell interactions, cell adhesion molecules (CAMs) can also regulate host-parasite interactions. For example, in protozoan parasites, CAMs have been reported to mediate the attachment of pathogens to host cells [79]. Whilst CAMs of T. regenti might participate in host-parasite interactions, it seems more likely that they regulate cell adhesion to maintain their multicellular structure [80]. Highly up-regulated transcription associated with CAMs in the schistosomula of T. regenti might indicate rapid growth and development of different organ structures within the definitive host. For instance, abundant transcription linked to neuroligin, netrin receptor and semaphorin might relate to the development of the nervous system in this developmental stage, based on evidence from other organisms [81,82].
Metabolism. Contrary to the situation in the cercariae of T. regenti, transcription of genes encoding enzymes involved in aerobic metabolic processes in the schistosomula is very low, and limited to the citrate cycle and oxidative phosphorylation. This finding suggests that the schistosomulum possesses a microaerobic metabolism seven days after infection of the definitive host, which is similar to observations made for human schistosomes [25]. However, in the schistosomulum stage of S. mansoni, for example, most energy is reported to originate via anaerobic glycolysis, with lactate as an end product [24,25,83], contrasting the situation in the T. regenti schistosomulum.
Proteolysis and histolysis during migration. The schistosomula of T. regenti need to migrate through the avian neural tissues to reach the nasal mucosa, where they mature to adults, mate and produce eggs [1]. The most abundant transcripts specifically in the schistosomulum stage encode proteolytic enzymes, including several cysteine peptidases, in particular, transcripts encoding cathepsin B1.5 (Treg_006320) (RSEM-expected counts: 30,779), followed by cathepsin L (Treg_006337) (12,349), cathepsin B1.6 (Treg_004279) (8,216), cathepsin L-like peptidase (Treg_006792) (5,514), cathepsin C (Treg_014726) (5,145) and leucine amino peptidase (4,966) (Treg_015017) (S9 Table). While cercariae store and express proteolytic enzymes in penetration glands for invasion via the skin, schistosomula degrade various host tissues during migration [84], including neural tissue, specifically in the case of T. regenti [26], and evade or block host immune attack [75]. The present study indicates that T. regenti schistosomula employ a considerably broader arsenal of proteolytic enzymes than cercariae do, with more than twice as many peptidases (n = 31) predicted to be upregulated in the former than in the latter stages, respectively.
According to a previous study [85], the digestive peptidase cathepsin B1 of T. regenti schistosomula (TrCB1) represents at least 6 distinct isoforms (TrCB1.1 to TrCB1.6). The isoforms TrCB1.5 and TrCB1.6 have the catalytic cysteine substituted by glycin and are thus inactive, and were previously reported to be the least abundant members of all six known isoforms [85]. Surprisingly, the present transcriptomic data indicate that the two genes encoding TrCB1.5 and TrCB1.6 have the highest transcription of any cathepsin B detected here in T. regenti. In metazoan organisms, the existence of inactive enzyme forms (also of peptidases) is quite common, and the percentage of expression of inactive/active isoforms usually ranges from 10% to 15%; inactive paralogs of peptidases are speculated to have a regulatory function, achieved by a competition with active peptidase forms for their substrate(s) [86]. Also present among the highly transcribed peptidases was cathepsin B2 (TrCB2; RSEM-expected counts: 4,386). Cathepsins B2 as well as B1 were previously identified as dominant peptidases produced by T. regenti schistosomula, with a major capacity to degrade myelin basic protein, one of the principal components of spinal cord tissues, whereas haemoglobinolytic activity was negligible [20,85,87]. These findings may explain the ability of schistosomula to migrate through the white matter of the spinal cord, composed predominantly of neuronal axons, and cause axonal damage (cf. [26]).
Abundant transcription (RSEM-expected counts: 12,349) in T. regenti was also linked to cathepsin L. This enzyme has broad specificity in various parasitic flatworms, and can assume various functions. Importantly, it can cleave a range of substrates, including collagen, laminin, fibronectin, haemoglobin and immunoglobulins [88][89][90], and is likely intimately involved in the migration of T. regenti schistosomula and their ability to modulate or evade host immune attack. The immunomodulatory effect of proteolytic enzymes has been investigated in other schistosomes, and the aspartic peptidase, cathepsin D, is recognized as a key representative that cleaves host IgG from the tegumental surface of adult worms [91]. Besides this immunomodulatory role, cathepsin D is known to participate in the primary cleavage of haemoglobin [92]. In the present study, cathepsin D was represented by nine highly up-regulated transcripts in the schistosomula of T. regenti. As schistosomula of this species do not feed on blood, this enzyme in this developmental stage might assume an imunomodulatory role, but this hypothesis needs to be tested. In summary, the high level of transcription associated with cathepsins in the schistosomulum of T. regenti provides further support for the significance of this group of enzymes in parasitic flatworms. In addition to cathepsins, schistosomula also transcribe genes encoding a broad array of proteolytic enzymes that participate in numerous biological processes and likely have key roles during the parasite's invasion, survival and longevity in the definitive host.
Lysosomal proteins. In addition to the cysteine peptidases discussed, a variety of other proteins linked to lipid processing, including lysophospholipase III, Niemann-Pick C2 protein, palmitoyl-protein thioesterase and saposin-like proteins (SAPs), were encoded in the transcriptome representing the schistosomulum stage of T. regenti. Palmitoyl-protein thioesterase is also encoded in the genomes of human schistosomes and C. sinensis [52][53][54]93] and is likely involved in the degradation of lipoproteins [94]. In relation to lipid trafficking, one transcript encoding a lipid-binding Niemann-Pick C2 protein, associated with cholesterol trafficking from the lysosome [95], was transcribed exclusively in T. regenti schistosomula. Genes encoding homologous proteins have been identified in the genomes of other trematodes, such as S. mansoni, S. japonicum, S. haematobium, C. sinensis and O. viverrini [52][53][54]93,96]. In addition, 10 different transcripts encoding SAPs were identified; SAPs can play diverse functional roles often through their interaction with lipids, lipid-degrading enzymes and lipid-presenting molecules [97], which may involve the activation of lipid-degrading enzymes, such as sphingolipid glycohydrolase [98]. Alternatively, SAPs might directly distrupt lipid membranes via the formation of pore-forming structures [99]. These proteins have been reported for trematodes including S. mansoni and Fasciola spp. and are presumed to facilitate the degradation of ingested host cells [100,101]. The high transcription of genes coding for proteins that interact with or bind to lipids might associate with the molecular machinery that T. regenti uses to degrade and then digest lipid-rich neural tissue, likely being critical for parasite migration and nutrition.
Interestingly, 38 different transcripts were predicted to encode hexosaminidase B, 30 were up-regulated in the schistosomula and 20 were uniquely transcribed in this stage. The analysis of enriched protein classes and pathways encoded by genes in schistosomula was biased by these different transcripts, which represent multiple KEGG classes, such that our interpretation is guarded. Although the function of hexosaminidase B in T. regenti is presently enigmatic, this enzyme is encoded in the genomes of human schistosomes [52][53][54] and O. viverrini [96]. We suspect that the high number of transcripts encoding this enzyme in T. regenti schistosomula might relate to multiple functions in this developmental stage. Based on independent evidence from humans, hexosaminidase can degrade sphingolipids (gangliosides) in neural tissue [102], and mutation of the hexosaminidase genes can lead to lethal neurodegenerative disorders (Tay-Sachs and Sandhoff or lysosomal storage disease) which are caused by an accumulation of gangliosides in the nervous system [103,104]. We suggest that an overexpression of hexosaminidase in T. regenti schistosomula leads to a degradation of neural tissue during their migration. In humans, only the alpha-subunit of hexosaminidase B is known to be involved in lipid degradation. As a corresponding alpha-subunit appears not to be encoded in T. regenti, the role of the hexosaminidase B should be explored. Although very little is known about hexosaminidases of helminths, homologs appear to have one or more roles in parasite-host interactions, for example, in the parasitic nematode Trichinella spiralis due to its specific sugarbinding property (lectin-like activity) or its specific glycohydrolase catalytic activity [105].
In conclusion, we describe here the first molecular exploration of neuropathogenic Trichobilharzia of birds and identify key biological pathways and proteins central to the invasion of the avian host and migration and development within neural tissues. Of particular significance is that we have been able to: (i) dissect the molecular differences between the cercarial stage (during cutaneous penetration) and the schistosomula stage (during neural migration and establishment within the avian host) and (ii) establish some biological distinctions between neuropathogenic T. regenti of birds and related blood flukes (schistosomes) of humans. The present annotated transcriptome for T. regenti provides a useful resource for comparative studies of schistosomatids and other trematodes, and should also underpin future genomic and proteomic investigations of T. regenti.
Supporting Information S1 Table. Summary of raw reads, trimmed reads, trimmed and normalized reads for the transcriptomic data of cercariae and schistosomula of Trichobilharzia regenti. Results of mapping of paired, trimmed, corrected, non-normalized reads from cercariae and schistosomula to the final non-redundant transcriptome of Trichobilharzia regenti. P1-pair read 1; P2-pair read 2; STD-Standard deviation (XLSX) S2 Table. Summary of the classification of predicted proteins from transcriptome of cercariae and schistosomula of Trichobilharzia regenti based on homology (BLASTp; Evalue 10 −5 ) to annotated proteins in the Kyoto Encyclopedia of Genes and Genomes (KEGG) BRITE functional hierarchies database, and pathway maps for cellular and organismal functions.  Table. Summary of transcripts upregulated in the schistosomulum stage of Trichobilharzia regenti based on read mapping to (consensus) transcriptome, using National Center for Biotechnology Information (NCBI) annotation. Sorted in descending order based on log2-fold change of differential transcription.