Massively Parallel Sequencing and Analysis of the Necator americanus Transcriptome

Background
The blood-feeding hookworm Necator americanus infects hundreds of millions of people worldwide. To elucidate fundamental molecular biological aspects of this hookworm, the transcriptome of the adult stage of N. americanus was explored using next-generation sequencing and bioinformatic analyses.

Methodology/Principal Findings
A total of 19,997 contigs were assembled from the sequence data; 6,771 of these contigs had known orthologues in the free-living nematode Caenorhabditis elegans, and the most abundant of them encoded proteins with WD40 repeats (10.6%), proteinase inhibitors (7.8%) or calcium-binding EF-hand proteins (6.7%). Bioinformatic analyses inferred that the C. elegans homologues are involved mainly in biological pathways linked to ribosome biogenesis (70%), oxidative phosphorylation (63%) and/or proteases (60%); most of these molecules were predicted to be involved in more than one biological pathway. Comparative analyses of the transcriptomes of N. americanus and the canine hookworm, Ancylostoma caninum, revealed qualitative and quantitative differences. For instance, proteinase inhibitors were inferred to be highly represented in the former species, whereas SCP/Tpx-1/Ag5/PR-1/Sc7 proteins (= SCP/TAPS or Ancylostoma-secreted proteins) were predominant in the latter. In N. americanus, essential molecules were predicted using a combination of orthology mapping and functional data available for C. elegans. Further analyses allowed the prioritization of 18 predicted drug targets which did not have homologues in the human host. These candidate targets were inferred to be linked to mitochondrial metabolism (e.g., processing proteins) or amino acid metabolism (e.g., asparaginyl-tRNA synthetase).

Conclusions
This study has provided detailed insights into the transcriptome of the adult stage of N. americanus and has examined similarities and differences between this species and A. caninum. Future efforts should focus on comparative transcriptomic and proteomic investigations of the other predominant human hookworm, A. duodenale, for both fundamental and applied purposes, including the prevalidation of anti-hookworm drug targets.


Introduction
Soil-transmitted helminths (= geohelminths) are responsible for neglected tropical diseases (NTDs), mostly in developing countries [1]. In particular, the blood-feeding hookworms Necator americanus and Ancylostoma duodenale (Nematoda) infect ~740 million people in rural areas of the tropics and subtropics [2], causing an estimated disease burden of 22 million disability-adjusted life years (DALYs) [3]. N. americanus is the most widely distributed hookworm of humans globally [4]. The life cycle is direct, with thin-shelled eggs passed in the faeces from the infected host. Under suitable environmental conditions (e.g., 26°C and 100% humidity; [5]), the eggs hatch and develop through two free-living larval stages to the infective, third-stage (L3; filariform) larvae. The latter larvae penetrate human skin and migrate via the circulatory system and lung to finally reside as adults, usually in the duodenum. The adult stages attach by their buccal capsule to the intestinal mucosa, rupture capillaries and feed on blood. The pathogenesis of hookworm disease is mainly a consequence of the blood loss which occurs during attachment and feeding. The disease (= necatoriasis) is commonly characterized by iron-deficiency anaemia, which can cause physical and mental retardation and sometimes death in children, as well as adverse maternal-foetal outcomes [6][7]; in chronically infected individuals, it can also result in a significant alteration of the immune response to helminths [8].
Traditionally, the control of hookworm disease has relied mostly on the treatment of infected individuals with anthelmintics, such as albendazole, mebendazole, pyrantel pamoate and/or levamisole. With mass treatment strategies now in place in a number of countries [9][10], there is an increased potential for hookworms to develop genetic resistance against the compounds administered, if they are used excessively and at suboptimal dosages. Thus, given the experience with drug resistance in parasitic nematodes of livestock [11], it is prudent to maintain a continual focus on the discovery of novel drugs against hookworms of humans. Such a discovery effort could be underpinned by an integrated genomic-bioinformatic approach, using functional genomic and phenomic information available for the free-living nematode Caenorhabditis elegans (see WormBase; www.wormbase.org). This nematode, which is the best characterized metazoan organism [12][13], is considered to be relatively closely related to nematodes of the order Strongylida (to which hookworms belong) [14]. Current evidence indicates that ~60% of genes in strongylids (or hookworms) have orthologues/homologues in C. elegans [15][16], and that a range of biological pathways is conserved between strongylid nematodes/hookworms and this free-living nematode [17][18][19][20]. Therefore, comparative explorations of molecular data sets between these nematodes should identify nematode-specific biological pathways which, if essential for development and survival, could provide new targets for nematocidal drugs.
Next-generation sequencing technologies, such as ABI-SOLiD, Illumina/Solexa (www.illumina.com; [21]), Helicos (www.helicosbio.com; [22]) and 454/Roche (www.454.com; [23]), together with the recent progress in bioinformatics, are providing unique opportunities for the high-throughput transcriptomic and genomic exploration of nematodes in far more detail than previously possible [24] and at a substantially lower cost than using conventional (Sanger) sequencing. To date, genomic and molecular studies of hookworms have mainly involved the canine hookworm, Ancylostoma caninum [19][25][26][27], because of its use as a model for human hookworms [27][28]. In contrast, genomic datasets for N. americanus are scant, representing a major constraint to progress in molecular research of this nematode [4]. In the present study, we (i) conducted a detailed exploration and functional annotation of the transcriptome of the adult stage of N. americanus by 454 sequencing coupled to semi-automated bioinformatic analyses, (ii) compared the transcriptome of N. americanus to currently available transcriptomic data for A. caninum, and (iii) inferred the essentiality of key genes and gene products in order to predict putative drug targets.

Accession numbers
The nucleotide sequence data produced for this study are available in the GenBank database under accession SRA012052. The contigs assembled from these data can be requested from the primary author or are available at www.nematode.net.

Parasite material
The "Shanghai strain" of N. americanus (kindly provided by Drs Bin Zhan and Peter Hotez) was produced in golden hamsters (Mesocricetus auratus; infected for 94 days) at the Universidade Federal de Minas Gerais, Brazil. The infection experiment was conducted according to the animal ethics guidelines of the Universidade Federal de Minas Gerais.

RNA isolation, cDNA synthesis and 454 sequencing
Total RNA from 30 adult worms was prepared using TRIzol Reagent (GibcoBRL, Life Technologies, USA) following the manufacturer's instructions and then treated with Ambion Turbo DNase (Ambion/Applied Biosystems, Austin, TX). The integrity of the RNA was verified using the Bioanalyzer 2100 (Agilent Technologies, USA), and the yield determined using the NanoDrop ND-1000 UV-VIS spectrophotometer v.3.2.1 (NanoDrop Technologies, Wilmington, DE). The cDNA library was constructed using the SMART™ kit (Clontech/Takara Bio, CA) from ~100 ng of total RNA. An optimized PCR cycling protocol (over 20 cycles) was used to amplify full-length cDNAs, employing primers complementary to the SMART IIA-Probe and custom oligo(dT), and the Advantage-HF 2 polymerase mix (Clontech/Takara). The cDNA was normalized by denaturation-reassociation, treated with duplex-specific nuclease (Trimmer kit, Evrogen, CA) and amplified over 11 cycles. Subsequently, the 5'- and 3'-adaptors were removed by digestion with the exonuclease Mme1 and streptavidin-coated paramagnetic beads [29]. The normalized cDNA (500-700 bases) was then amplified using 9 cycles of Long and Accurate (LA)-PCR [30] and then sequenced in a Genome Sequencer™ (GS) Titanium FLX instrument (Roche Diagnostics) employing a standard protocol [23].

Bioinformatic analyses
Expressed sequence tags (ESTs) generated from the normalized cDNA library for N. americanus were assembled and annotated using a standard bioinformatic pipeline [31]. Briefly, sequences were aligned and assembled using the Contig Assembly Program v.3 (CAP3; [32]), employing a minimum sequence overlap length of 50 nucleotides and an identity threshold of 95%. ESTs (n = 2,200; www.ncbi.nlm.nih.gov) from adult N. americanus available from previous studies [4,16,33,34] were included for comparative analysis. Following the pre-processing of the ESTs, contigs and singletons from the present dataset were subjected to analysis by BLASTx (NCBI, www.ncbi.nlm.nih.gov) and BLASTn

Author Summary
The blood-feeding hookworm Necator americanus infects hundreds of millions of people. To elucidate fundamental molecular biological aspects of this hookworm, the transcriptome of adult N. americanus was studied using next-generation sequencing and in silico analyses. Contigs (n = 19,997) were assembled from the sequence data; 6,771 of them had known orthologues in the free-living nematode Caenorhabditis elegans, and the most abundant encoded proteins with WD40 repeats (10.6%), proteinase inhibitors (7.8%) or calcium-binding EF-hand proteins (6.7%). Bioinformatic analyses inferred that the C. elegans homologues are involved mainly in biological pathways linked to ribosome biogenesis (70%), oxidative phosphorylation (63%) and/or proteases (60%). Comparative analyses of the transcriptomes of N. americanus and the canine hookworm, Ancylostoma caninum, revealed qualitative and quantitative differences. Essential molecules were predicted using a combination of orthology mapping and functional data available for C. elegans. Further analyses allowed the prioritization of 18 predicted drug targets which did not have human homologues. These candidate targets were inferred to be linked to mitochondrial metabolism or amino acid synthesis. This investigation provides detailed insights into the transcriptome of the adult stage of N. americanus.

Prediction of essentiality and drug targets
All protein sequences predicted from contigs for N. americanus were compared with protein sequences available in the OrthoMCL 2.0 database (www.OrthoMCL.org) by BLASTp (e-value cut-off of <1e-05). A subset of C. elegans protein homologues was then selected based on: (i) an association with a lethal RNAi phenotype; (ii) the presence/absence of gene paralogues (based on OrthoMCL orthology grouping); and (iii) GO annotation to terms linked to enzyme or G protein-coupled receptor (GPCR) activity (i.e., GO:0003824 or GO:0004930, or a sub-term thereof). The following information was then obtained for each homologue: (i) network connectivity score (cf. http://www.functionalnet.org/wormnet/Wormnet_v1_index.html; see [45]); (ii) presence of mammalian orthologues (based on OrthoMCL orthology grouping); and (iii) essentiality information (i.e. association with non-wild-type RNAi phenotypes) in other model organisms (including Saccharomyces cerevisiae, Mus musculus and Drosophila melanogaster) based on OrthoMCL groups. Each predicted drug target was selected based on (i) the presence of orthologues linked to non-wild-type RNAi or mutant phenotypes in S. cerevisiae, M. musculus and D. melanogaster, (ii) the absence of orthologues/homologues from the human host and (iii) its network connectivity score [45].
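The selection criteria above can be sketched as a simple filter. This is a minimal illustration rather than the pipeline used in the study: the `Candidate` record, its field names and the `prioritise` function are hypothetical, and sorting by descending network connectivity is an assumption, since the text does not state how the connectivity score was weighed.

```python
from dataclasses import dataclass

# e-value cut-off of 1e-05 for the BLASTp comparison, as given in the text
E_VALUE_CUTOFF = 1e-05


@dataclass(frozen=True)
class Candidate:
    """Hypothetical record for one C. elegans homologue of an N. americanus contig."""
    gene: str
    e_value: float             # best BLASTp e-value against the OrthoMCL proteins
    lethal_rnai: bool          # lethal RNAi phenotype recorded for C. elegans
    essential_in_models: bool  # non-wild-type phenotype for yeast, mouse and fly orthologues
    human_orthologue: bool     # orthologue/homologue present in the human host
    connectivity: float        # network connectivity score


def prioritise(candidates):
    """Keep candidates meeting every criterion, then sort by connectivity.

    Assumption: a higher connectivity score is preferred.
    """
    kept = [
        c for c in candidates
        if c.e_value < E_VALUE_CUTOFF
        and c.lethal_rnai
        and c.essential_in_models
        and not c.human_orthologue
    ]
    return sorted(kept, key=lambda c: c.connectivity, reverse=True)
```

Applying the criteria as a conjunction means that a candidate failing any single test (for instance, one with a human orthologue) is excluded regardless of how well connected it is.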
To predict the potential of selected C. elegans orthologues of N. americanus contigs as drug targets (= "druggability"), the InterPro domains inferred from the predicted proteins were compared with those linked to known small-molecule drugs which follow Lipinski's 'rule of 5' regarding bioavailability [46,47]. Similarly, GO terms inferred from the predicted proteins were mapped to Enzyme Commission (EC) numbers, and a list of enzyme-targeting drugs was compiled based on data available in the BRENDA database (www.brenda-enzymes.info; [48,49]). The C. elegans orthologues included in the list were ranked according to the 'severity' of the non-wild-type RNAi phenotypes (i.e. adult lethal, embryonic and/or larval lethal, sterile and other defects) in C. elegans (cf. www.wormbase.org) defined in previous studies [50,51].
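The druggability check and the severity ranking described above can be sketched as follows. This is an illustrative outline under stated assumptions, not the authors' implementation: the function names and identifiers are hypothetical, and the severity order simply follows the order in which the phenotype classes are listed in the text (the cited studies [50,51] define the actual scheme).

```python
# Assumed order, most to least severe, taken from the listing order in the text.
SEVERITY = ("adult lethal", "embryonic or larval lethal", "sterile", "other defect")


def is_druggable(interpro_ids, ec_numbers, druggable_domains, drug_targeted_ecs):
    """True if any InterPro domain or EC number appears in a reference set.

    druggable_domains: domains linked to known small-molecule drugs.
    drug_targeted_ecs: EC numbers of enzymes with known drugs.
    """
    return bool(set(interpro_ids) & set(druggable_domains)) or bool(
        set(ec_numbers) & set(drug_targeted_ecs)
    )


def severity_rank(phenotypes):
    """Index of the most severe phenotype observed; unknown labels rank last."""
    ranks = [SEVERITY.index(p) for p in phenotypes if p in SEVERITY]
    return min(ranks) if ranks else len(SEVERITY)


def rank_by_severity(phenotypes_by_gene):
    """Order gene names from most to least severe phenotype (ties by name)."""
    return sorted(
        phenotypes_by_gene,
        key=lambda gene: (severity_rank(phenotypes_by_gene[gene]), gene),
    )
```

A gene with several recorded phenotypes is ranked by its most severe one, so a gene annotated as both "adult lethal" and "other defect" sorts ahead of one annotated only as "sterile".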

Discussion
Next-generation sequencing and integrated bioinformatic analyses have provided detailed and biologically relevant insights into the transcriptome of the adult stage of N. americanus. A total of 12,799 ORFs were inferred from the present EST dataset, thus increasing the number of predicted proteins currently available (for this stage/species) in public databases by approximately 27-fold [4]. Amongst the InterPro domains identified, 'WD40', 'proteinase inhibitors' and 'EF-hand' motifs were the most abundant, followed by 'proteases' and 'protein kinases'. WD40 repeats (also known as WD or beta-transducin repeats) are short (~40 amino acid) motifs found in the proteomes of all eukaryotes and implicated in a variety of functions, ranging from signal transduction and transcription regulation to cell-cycle control and apoptosis [52,53]. WD40 motifs act as sites for protein-protein interactions; proteins containing WD40 repeats are known to serve as platforms for the assembly of protein complexes or as mediators of a transient interplay with other proteins, such as the ubiquitin ligases involved in the onset of anaphase during mitosis [54]. Similarly, proteins containing 'EF-hand' domains are involved in a number of protein-protein interactions regulated by various specialized systems (e.g., Golgi system, voltage-dependent calcium channels and calcium transporters) for the uptake and release of calcium, which acts as a second messenger for their activation [55]. In C. elegans, both EF-hand and WD40 proteins are known to be required for the maturation of the nervous system and the formation of ciliated sensory neurons, in particular of the chemoreceptors located in the amphids [8,56]. Besides their chemoreceptive activity, the amphids of parasitic nematodes are also known to play a role as secretory organs, primarily to provide an appropriate substrate for the transmission of neuronal potentials [57]. However, in N. americanus, a group of specialized amphidial neuronal cells (= amphidial glands; [57]) expresses a group of aspartic proteases (i.e. cathepsin D-like Na-APR-1 and Na-APR-2) which are proposed to degrade host haemoglobin and serum proteins in the buccal capsule of adult worms [58]. In the dog hookworm, A. caninum, the amphidial glands have also been shown to produce a proteinase inhibitor (called 'ancylostomatin') that acts as an anticoagulant to promote the flow of host blood and tissue fluids into the buccal capsule and the intestine of the parasite [59].

Although proteinase inhibitors, such as the 'Kunitz-type' molecules, were significantly more abundant in the transcriptome of adult N. americanus [4,33] than in Ancylostoma spp., they have been better characterized in the latter parasites [60][61][62][63], for which both single and multiple Kunitz-domain proteins have been described [61]. For instance, a cDNA encoding a single Kunitz-domain proteinase inhibitor (named AceKI-1) was isolated from A. ceylanicum. The corresponding recombinant protein has been shown to act as a tight-binding inhibitor of the serine proteases chymotrypsin, pancreatic elastase, neutrophil elastase and trypsin [60] and to confer partial protection against hookworm-associated growth delay in hamsters [62]. Recently, a Kunitz-type cDNA was shown to be enriched in the adult male of A. braziliense [63]. Although their precise biological function remains to be determined, Kunitz-type proteinase inhibitors of hookworms appear to play pivotal roles in preventing haemostasis and inhibiting host proteases (e.g., pancreatic and intestinal enzymes; [60,64]).

Proteases were also highly represented in the transcriptome of N. americanus (6.1%) as well as that of A. caninum (4.6%) (see Table 2). These proteases included cysteine, aspartic and metalloproteases, which are known to function in multi-enzyme cascades to digest haemoglobin and other serum proteins [65,66]. In N. americanus, cysteine proteases with high sequence homology to cathepsin B were localized to the gut of adult worms, and the corresponding mRNAs were shown to be upregulated in the adult stage compared with the infective L3 stage, thus strongly suggesting that these enzymes are involved in blood-feeding [67]. In A. caninum, a cysteine protease (Ac-CP-1) with 86% amino acid sequence identity to those characterized in N. americanus was shown to be expressed in the cephalic and excretory glands [68] and was detected in the excretory/secretory products (ES) [69] of adult worms; thus, it has been proposed that Ac-CP-1 functions as an extracorporeal digestive enzyme at the site of attachment [67]. Another cysteine protease (Ac-CP-2) was localized to the brush border membrane of the intestine and demonstrated to be involved in the digestion of haemoglobin [65]. The N. americanus homologue of Ac-CP-2 (i.e. Na-CP-2) digests haemoglobin [66] and, when expressed as a recombinant protein in Escherichia coli and injected subcutaneously into experimental hamsters, has been shown to induce a significant reduction in adult worm burden following challenge infection with L3s of N. americanus [28], suggesting that the immune response directed against this protein severely impairs the digestion of host proteins by the adult worms. However, a cathepsin-like cysteine protease has recently been isolated and characterized in the human filarial nematode Brugia malayi and shown by double-stranded RNAi to play an essential role in the early development and maturation of embryos of this nematode [70]. Therefore, it is possible that the abundant transcripts encoding proteases in both adult N. americanus and A. caninum also reflect a key role of these enzymes in embryogenesis.

Proteases have also been isolated from larval stages of both A. caninum and N. americanus [71,72]. For instance, a metalloprotease in ES of the activated third-stage larvae (L3) of A. caninum has been characterized and demonstrated to be released specifically in response to stimuli that induce feeding [73]. The corresponding cDNA, isolated from an L3 expression library, encoded a zinc-metalloprotease (Ac-MTP-1) of the astacin family that has been proposed to (i) regulate developmental changes associated with the transition from the free-living to the parasitic L3 and the subsequent moult to the fourth-stage larva (L4) [72]; (ii) activate host TGF-β during the infection, which, in turn, could stimulate parasite development directly, determine tissue predilection sites [74] and/or inhibit neutrophil infiltration at the site of penetration [75]; and (iii) facilitate skin penetration or tissue migration by the invading L3 [72,76] and/or degrade the cuticular proteins of the sheath surrounding the infective, free-living L3 [77]. In N. americanus, serine proteases have been isolated from ES of the L3 stage and proposed to play a central role in the evasion of the host immune response [71]. Interestingly, a significant number (n = 135, 30%) of the proteases and protease inhibitors of N. americanus were not predicted to possess signal peptides indicative of secretion (cf. Tables 1 and 2). The likely explanation for this result is technical and would appear to relate to a 3'-bias in sequence reads [78], which affects the prediction of ORFs as well as the identification of signal peptide sequences at the 5'-ends.

Other groups of molecules, such as Ancylostoma-secreted proteins (ASPs), have been proposed to have an immunomodulatory function during the invasion of the host, the migration through tissues, attachment to the intestinal wall and blood-feeding [79]. In the present study, ASPs were amongst the ten most abundant groups of molecules in the N. americanus dataset and were the most abundant group in A. caninum (cf. Table 2). ASPs belong to a large group of proteins, the 'sperm-coating protein (SCP)-like extracellular proteins', also called SCP/Tpx-1/Ag5/PR-1/Sc7 (SCP/TAPS; Pfam accession no. PF00188), characterized by the presence of a single or double 'SCP-like extracellular domain' (InterPro: IPR014044). In A. caninum, double and single SCP-domain ASPs, called Ac-ASP-1 and Ac-ASP-2, respectively, were identified as major components of ES from serum-activated, infective L3s and proposed to be secreted in response to one or more host-specific signals during the infection process [80,81], as also hypothesized in a transcriptomic analysis of serum-activated L3s [19]. In N. americanus, homologues of Ac-ASP-1 and Ac-ASP-2 (i.e. Na-ASP-1 and Na-ASP-2, respectively) have been identified in the L3 stage [82][83][84]. Results from crystallography [85], combined with the observation that Na-ASP-2 induces neutrophil and monocyte migration [86], suggest that this molecule has a role as an antagonistic ligand of complement receptor 3 (CR3) and alters the immune cascade by preventing the binding of chemotaxin [85]. Because of its immunogenic properties, Na-ASP-2 is under investigation as a vaccine candidate against necatoriasis [7,28,81,87]. In adult A. caninum, at least four other ASPs have been identified to date and named Ac-ASP-3, Ac-ASP-4, Ac-ASP-5 and Ac-ASP-6 [72]. Another SCP/TAPS molecule, designated neutrophil inhibitor factor (NIF), has been isolated and shown to play an immunomodulatory role by blocking the adhesion of activated neutrophils to vascular endothelial cells and the subsequent release of H2O2 from activated neutrophils [88], and by interfering with the function of integrin receptors located on the cell surface, which results in the inhibition of platelet aggregation and adhesion [89]. Subsequently, NIF was shown to be transcribed abundantly in the intestines of both A. caninum and N. americanus [34].
The present study revealed that, although highly represented in the transcriptome of adult N. americanus, ASPs were much more abundant in A. caninum (cf. Results section). One possible explanation for this finding is that, whereas the A. caninum dataset was generated from adult worms recovered from their natural host (i.e. dog), the specimens of N. americanus were recovered from a Chinese strain of the golden hamster (M. auratus), which is not a natural host for this parasite [90,91]. Indeed, adults of N. americanus recovered from hamsters with patent infections are smaller and less fecund than those from the human host [91]. These phenetic differences in this parasite might be associated with variation in transcriptional profiles. However, the difference in prevalence of particular transcripts, such as those of asps, between A. caninum and N. americanus might reflect their distinct roles in the modulation of the host immune response between the two hookworms, a hypothesis that requires testing.
A benefit of investigating the transcriptome of parasitic nematodes using predictive algorithms is that potential drug targets can be inferred and/or prioritized. The present study identified a subset of 278 'druggable' proteins, of which 18 did not match any human homologues (cf. Results section). Of these 18 molecules, mitochondrial-associated proteins were significantly represented (i.e. encoded by the C. elegans orthologues W01a8.4, ucr-1, F26E4.6 and Y71H2aM.4; cf. Table 4). Mitochondria are essential organelles with central roles in diverse cellular processes, such as apoptosis, energy production via oxidative phosphorylation, ion homeostasis, and the synthesis of haeme, lipid, amino acids, and iron-sulfur ions [92]. In C. elegans, defects in the mitochondrial respiratory chain lead to or are associated with a wide variety of abnormalities, including embryonic, larval and adult lethality, sterility and embryonic defects [92]. Despite the essential roles of mitochondria in numerous fundamental biological processes, knowledge of mitochondrial genes and proteins in parasitic nematodes has been utilized mainly to study their systematics, population genetics and ecology [93][94][95]. However, the prediction that some mitochondrial-associated proteins are essential in N. americanus and differ significantly from their human homologues provides a context for the discovery of new drug targets in mitochondrial pathways and of chemical compounds that disrupt these pathways [95,96]. Amongst the other N. americanus orthologues of essential C. elegans genes, nrs-2 encodes an asparaginyl-tRNA synthetase (AsnRS), a class II aminoacyl-tRNA synthetase that catalyzes the attachment of asparagine to its cognate tRNA and is required for protein biosynthesis [97]; loss of nrs-2 function via RNAi has been shown to result in a number of phenotypes, including adult and larval lethality and/or larval arrest [97].
In parasitic nematodes, information on amino acid biosynthesis is limited [98]. Although a number of parasitic helminths, including the nematode Heligmosomoides polygyrus [sic. H. bakeri] and the trematode Fasciola hepatica, have been reported to excrete asparagine during in vitro incubation [99,100], the role of asparagine synthetases in essential biological processes is currently unknown. However, in a study investigating the molecular mechanisms of induced cell differentiation in human pro-myelocytic leukemia, asparagine synthetase transcription was reported to be significantly reduced in maturing monocytes/macrophages [101]; therefore, an active role of asparagine synthetases in the development and growth of cancer cells has been suggested, which led to the hypothesis that the induction of a down-regulation of asparagine synthetases might be a new strategy for the treatment of blast cell leukaemia [102]. This finding raises questions about the role(s) of asparagine synthetases in cell differentiation and maturation in parasitic nematodes and the potential of inhibitors of these enzymes as anti-hookworm drugs.
The present study has provided new insights into the transcriptome of N. americanus, elucidated similarities and differences between the transcriptomes of N. americanus and the related canine hookworm, A. caninum, and predicted a panel of novel drug targets and nematocides. All except one of the essential 'druggable' proteins (n = 18) inferred for N. americanus were present in A. caninum (and C. elegans) but not in the mammalian hosts, suggesting relative sequence conservation for these targets among nematodes. The prediction of such targets is particularly important, considering the risk of emerging drug resistance in parasitic nematodes [102,103]. Clearly, transcriptomic and genomic studies, such as that carried out here, can facilitate and expedite the prevalidation of targets for nematocidal drugs, although the lack of genomic and transcriptomic data for many nematodes, including the human hookworm A. duodenale, impairs the comparative exploration of essential biological pathways in parasitic nematodes of major public health significance [6]. Furthermore, the present analysis has inferred qualitative and quantitative differences in the transcriptome between N. americanus and A. caninum, raising questions as to the suitability of the latter species as a model for the former. Although these differences require experimental validation, there is a need to define the transcriptome of A. duodenale as a foundation for comparative investigations aimed at identifying new, hookworm-specific drug targets.