Systems Biology Studies of Adult Paragonimus Lung Flukes Facilitate the Identification of Immunodominant Parasite Antigens

Background: Paragonimiasis is a food-borne trematode infection acquired by eating raw or undercooked crustaceans. It is a major public health problem in the Far East, but it also occurs in South Asia, Africa, and the Americas. Paragonimus worms cause chronic lung disease with cough, fever and hemoptysis that can be confused with tuberculosis or other non-parasitic diseases. Treatment is straightforward, but diagnosis is often delayed due to a lack of reliable parasitological or serodiagnostic tests. Hence, the purpose of this study was to use a systems biology approach to identify key parasite proteins that may be useful for development of improved diagnostic tests.

Methodology/Principal Findings: The transcriptome of adult Paragonimus kellicotti was sequenced with Illumina technology. Raw reads were pre-processed and assembled into 78,674 unique transcripts derived from 54,622 genetic loci, and 77,123 unique protein translations were predicted. A total of 2,555 predicted proteins (from 1,863 genetic loci) were verified by mass spectrometric analysis of total worm homogenate, including 63 proteins lacking homology to previously characterized sequences. Parasite proteins encoded by 321 transcripts (227 genetic loci) were reactive with antibodies from infected patients, as demonstrated by immunoaffinity purification and high-resolution liquid chromatography-mass spectrometry. Serodiagnostic candidates were prioritized based on several criteria, especially low conservation with proteins in other trematodes. Cysteine proteases, MF6p proteins and myoglobins were abundant among the immunoreactive proteins, and these warrant further study as diagnostic candidates.

Conclusions: The transcriptome, proteome and immunolome of adult P. kellicotti represent a major advance in the study of Paragonimus species. These data provide a powerful foundation for translational research to develop improved diagnostic tests. Similar integrated approaches may be useful for identifying novel targets for drugs and vaccines in the future.


Introduction
Paragonimiasis is an important food-borne trematode infection (and a "neglected tropical disease") that is caused by lung flukes in the genus Paragonimus [1][2][3]. More than 50 Paragonimus species have been described, and nine species are known to infect humans. Human infections are most frequent in Asia (P. westermani, P. skrjabini, P. heterotremus, P. siamensis, P. miyazakii), but they also occur in sub-Saharan Africa (P. uterobilateralis, P. africanus) and in the Americas (P. kellicotti, P. mexicanus) [1]. Approximately 21 million people are infected with Paragonimus worms [2], and some 293 million live in endemic areas where they are at risk of contracting the infection [3].
Paragonimus metacercariae enter the human host upon ingestion of raw or undercooked crustaceans. Metacercariae excyst, migrate out of the intestine, cross the diaphragm into the pleural space, and eventually invade the lungs, where they mature and live for years in pulmonary cysts [1]. This results in a range of clinical symptoms, including cough, fever, weight loss, pleural effusion, chest pain, and bloody sputum [4]. These symptoms can be very similar to those seen in patients with tuberculosis, bacterial pneumonia, fungal infections, or lung cancer, so misdiagnosis is common [5][6][7]. For example, one study in the Philippines found P. westermani eggs rather than acid-fast bacilli in sputum samples from 26 of 160 (16%) patients with suspected tuberculosis [5]. Even in the US, the median time between onset of symptoms and diagnosis of recent P. kellicotti infections was approximately 12 weeks (range 3-38 weeks), and all of the patients were subjected to multiple, unnecessary medical interventions tailored to unrelated diseases [8]. Once a proper diagnosis is made, parasites are easily cleared by a short course of the anthelmintic drug praziquantel, but infections can be fatal if left untreated [9].
Paragonimus infections are most often diagnosed by identification of parasite eggs in the stool or sputum (reviewed in [1]). Unfortunately, migrating parasites are capable of causing disease weeks or months before egg production commences. Egg detection is also insensitive because egg output is inconsistent over time, and it requires knowledge and expertise that are not readily available in many clinical settings. Serological tests for P. westermani and P. kellicotti using native parasite antigens have been described, but these tests are impractical for widespread use because they require continued access to adult parasites [8,10,11]. Thus far, efforts to develop and implement practical, standardized molecular diagnostic tools have been hindered by a lack of information on the basic biology and genomics of Paragonimus species.
According to the study outline presented in Figure 1, we sequenced and annotated the transcriptome of adult P. kellicotti to better understand this parasite at a molecular level and to facilitate proteomic analyses of both the total worm homogenate and of immunogenic proteins purified using IgG from P. kellicotti patient sera. The resulting sequence data led to the identification of proteins that are promising candidates for the development of novel (and much needed) serodiagnostic tests for paragonimiasis. In addition, the annotated transcriptome of adult P. kellicotti provides a valuable resource for molecular biological and translational research on paragonimiasis and related food-borne trematode infections.

Parasite material
Wild crayfish (genus Orconectes) >3 cm in length were collected from small rivers in southern Missouri, USA. P. kellicotti metacercariae, identified by morphological examination, were isolated from the hearts of infected crayfish and introduced to Mongolian gerbils (Meriones unguiculatus) by intraperitoneal injection as previously described [12]. Gerbils were sacrificed 35-49 days post-infection, and egg-producing adult flukes were removed from lung cysts, rinsed in 1× phosphate buffered saline (PBS), and stored at -80°C prior to use in experiments.

RNA isolation and sequencing
Total RNA was isolated from two mature adult flukes using the PureLink RNA Mini Kit according to the manufacturer's microcentrifuge pestle protocol for animal tissues (Ambion, Austin, TX), and DNase treated using the TURBO DNA-free Kit (Ambion). cDNA was synthesized and sequenced as previously described [13]. Briefly, poly(A) RNA was selected from total RNA using the MicroPoly(A) Purist Kit (Ambion) and reverse transcribed using the Ovation RNA Amplification System V2 (NuGEN Technologies, Inc., San Carlos, CA). Paired-end, small fragment, Illumina libraries with insert sizes ranging from 180-380 bp were constructed and sequenced on an Illumina HiSeq2000 version 3 flow cell according to the manufacturer's recommended protocol (Illumina Inc., San Diego, CA). Raw reads were deposited in the NCBI sequence read archive under accession number SRX530756 (NCBI BioProject Accession: PRJNA179523).

Author Summary
Paragonimiasis is a food-borne trematode infection that people acquire when they eat raw or undercooked crustaceans. Disease symptoms (including cough, fever, blood in sputum, etc.) can be similar to those observed in patients with tuberculosis or bacterial pneumonia, frequently resulting in misdiagnosis. Although the infection is relatively easy to treat, diagnosis is complicated. Available diagnostic assays rely on total parasite homogenate to facilitate the detection of Paragonimus-specific antibodies in patients. Though these blot-based assays have shown high sensitivity and specificity, they are inconvenient because total parasite homogenate is not readily available. This study used next generation genomic and proteomic methods to identify transcripts and proteins expressed in adult Paragonimus flukes. We then used sera from patients infected with P. kellicotti to isolate immunoreactive proteins, and these were analyzed by mass spectrometry. The annotated transcriptome and the associated proteome of the antibody immune response represent a significant advance in research on Paragonimus. This information will be a valuable resource for further research on Paragonimus and paragonimiasis. Thus this project illustrates the potential power of employing systems biology for translational research in parasitology.

RNAseq read processing and assembly
Raw reads were converted from bam to fastq format using Picard Tools' SamToFastq script (http://picard.sourceforge.net). cDNA synthesis and Illumina sequencing adapters were trimmed using Flexbar [14] and Trimmomatic [15], respectively. Trimmomatic was also used to perform sliding window quality trimming (5 bp window, average quality ≥20) and to remove reads with fewer than 60 consecutive high-quality bases and reads containing ambiguous base calls [15]. Reads with an average DUST score less than seven were removed using the filter_by_complexity script from the seq_crumbs package (http://bioinf.comav.upv.es/seq_crumbs/). Remaining reads were mapped against ribosomal RNA [16,17] and bacterial sequence databases [18] with Bowtie2 (version 2.1.0, default parameters, [19]) and against the human genome (hs37) and GenBank rodent database (gbrod, downloaded April 24, 2013) with Tophat2 (version 2.0.8, default parameters, [20]); all matching reads and their mates were excluded from further analysis. The remaining high-quality P. kellicotti-derived reads were assembled using the Trinity de novo RNAseq assembler [21] with default parameters. Modules within the Trinity software package were used to estimate transcript abundance and remove transcripts representing <1% of the per-component expression level and <1 transcript per million [21,22]. The RNAseq reads used for the assembly were re-mapped to the high-confidence transcripts with Bowtie2 (version 2.1.0, default parameters, [19]), and transcript breadth of coverage (defined as the percent of covered bases over the length of the reference transcript) was assessed using RefCov (http://gmt.genome.wustl.edu/genomeshipit/gmt-refcov/current/). Transcripts with <99% breadth of coverage with RNAseq reads were removed, resulting in the final transcript set. Assembly statistics at each phase of filtering are given in Table S1. The de novo assembly was expected to overestimate the number of transcripts and loci, so in-house Perl scripts were used to estimate fragmentation based on WU-BLAST alignments to protein coding sequences from closely related species as previously described [23]. Assembly fragmentation was calculated as the percentage of reference genes associated with multiple, non-overlapping BLAST hits.
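
The fragmentation measure defined above can be illustrated with a short sketch. This is not the authors' in-house code; the input layout (reference gene, assembled locus, and alignment start and end on the reference), the function names, and the example values are all hypothetical. It also assumes that the denominator is the set of reference genes with at least one hit and that "multiple hits" means hits from different assembled loci, neither of which is stated explicitly in the text.

```python
from collections import defaultdict


def _has_disjoint_pair(gene_hits):
    """True if two hits from different loci cover non-overlapping reference intervals."""
    for i, (locus_a, start_a, end_a) in enumerate(gene_hits):
        for locus_b, start_b, end_b in gene_hits[i + 1:]:
            if locus_a != locus_b and (end_a < start_b or end_b < start_a):
                return True
    return False


def fragmentation_rate(hits):
    """Percent of reference genes hit by multiple, non-overlapping assembled loci.

    hits: iterable of (ref_gene, locus, ref_start, ref_end) tuples (assumed layout).
    """
    by_gene = defaultdict(list)
    for ref_gene, locus, start, end in hits:
        by_gene[ref_gene].append((locus, start, end))
    if not by_gene:
        return 0.0
    fragmented = sum(1 for gene_hits in by_gene.values() if _has_disjoint_pair(gene_hits))
    return 100.0 * fragmented / len(by_gene)


# Invented example: only gene g1 is split across two loci with disjoint alignments.
example_hits = [
    ("g1", "locus1", 1, 100),
    ("g1", "locus2", 150, 300),   # different locus, no overlap: fragmented
    ("g2", "locus3", 1, 200),
    ("g2", "locus3", 50, 250),    # same locus: not counted as fragmentation
    ("g3", "locus4", 1, 100),
    ("g3", "locus5", 50, 200),    # different loci but overlapping: not counted
]
rate = fragmentation_rate(example_hits)
```

In this toy input one of three reference genes is fragmented, so the estimate is one third of the reference genes. As the text notes for the real data, such an estimate depends on how well the reference species is conserved relative to the species being assembled.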

Transcriptome annotation
All assembled transcript isoforms were compared to known protein sequences by BLASTX [24] against the GenBank Non-Redundant protein database (NR, downloaded April 15, 2014). Results were parsed to consider only top matches to non-overlapping regions of the query with an e-value less than 1e-05. Putative protein translations from the transcripts were predicted using Prot4EST [25]. Transmembrane domains and secretion peptides were predicted using Phobius [26,27]. Proteins were assigned to KEGG orthologous groups, biochemical pathways and pathway modules using KEGGscan [28] with KEGG release 68. Associations with known InterPro domains and Gene Ontology (GO) classifications were inferred from predicted protein sequences using InterProScan [29][30][31]. Functional enrichment of GO terms was calculated using FUNC with an adjusted p-value cutoff of 0.01 [32]. For FUNC analysis, the target list included the longest isoform of a given locus that contained the feature of interest, tested against a background of the longest isoforms of all loci (including those in the target list). All transcripts, predicted proteins, and associated annotations are available at Trematode.net (trematode.net/Paragonimus_kellicotti.html).
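
The hit-parsing rule described above (keep only top matches to non-overlapping regions of the query with an e-value below 1e-05) might be sketched as follows. This is an illustration rather than the parser used in the study. It assumes the standard 12-column BLAST tabular layout (query, subject, percent identity, alignment length, mismatches, gap opens, query start, query end, subject start, subject end, e-value, bit score) and a greedy selection by bit score; the function name and the example rows are hypothetical.

```python
import csv
import io
from collections import defaultdict


def top_nonoverlapping_hits(tabular_text, max_evalue=1e-5):
    """Return {query: [(subject, q_lo, q_hi), ...]} keeping best non-overlapping hits."""
    per_query = defaultdict(list)
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue
        q_start, q_end = int(row[6]), int(row[7])
        evalue, bitscore = float(row[10]), float(row[11])
        if evalue < max_evalue:
            # Reverse-frame BLASTX hits report q_start > q_end, so normalise the interval.
            per_query[row[0]].append((bitscore, min(q_start, q_end), max(q_start, q_end), row[1]))

    kept = {}
    for query, hits in per_query.items():
        accepted = []
        for bitscore, lo, hi, subject in sorted(hits, reverse=True):  # best score first
            if all(hi < a_lo or lo > a_hi for _, a_lo, a_hi, _ in accepted):
                accepted.append((bitscore, lo, hi, subject))
        kept[query] = [(subject, lo, hi) for _, lo, hi, subject in accepted]
    return kept


# Invented example rows for a single query transcript.
rows = [
    ("q1", "A", "90.0", "100", "0", "0", "1", "300", "1", "100", "1e-50", "200"),
    ("q1", "B", "80.0", "77", "0", "0", "50", "280", "1", "77", "1e-30", "150"),   # overlaps A
    ("q1", "C", "70.0", "67", "0", "0", "600", "400", "1", "67", "1e-8", "80"),    # reverse frame
    ("q1", "D", "60.0", "33", "0", "0", "700", "800", "1", "33", "0.01", "30"),    # fails e-value
]
example_text = "\n".join("\t".join(r) for r in rows)
kept_hits = top_nonoverlapping_hits(example_text)
```

Here hit B is dropped because it overlaps the higher-scoring hit A on the query, and hit D is dropped by the e-value filter, leaving A and C.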
Immunoprecipitation and purification of P. kellicotti

Patients exhibited symptoms consistent with paragonimiasis, tested positive for Paragonimus exposure using existing serological or parasitological diagnostic assays, and had no recent history of international travel. In all cases, sera were collected prior to treatment.
Patient sera were tested for reactivity against adult P. kellicotti and P. westermani antigen by Western blot as previously described [10]. Serum samples from five strongly-reactive patients were pooled (total volume 3 mL), and total IgG was precipitated.
Two mL Pierce NHS-active agarose slurry (Thermo Fisher Scientific) was added to a 2.0 mL spin column (Thermo Fisher Scientific) and rinsed with 2.0 mL water followed by 2.0 mL 1× PBS. Two mL of IgG precipitated from the paragonimiasis serum pool was added to the column and mixed for 2 hours at room temperature to couple IgG to the agarose. The column was washed once with 1× PBS, blocked with 1.0 M ethanolamine pH 7.4 for 20 minutes at room temperature, and washed again with 1× PBS.
Approximately 720 µg of adult P. kellicotti total antigen was added to the column and incubated overnight at 4°C. The column was washed with 1× PBS, and immune complexes were eluted with Pierce IgG elution buffer (Thermo Scientific) in eight 1 mL fractions. Fractions were neutralized with 50 µL 1.0 M Tris, pH 9.0, and 10 µL aliquots of each fraction were analyzed by Western blot as previously described using the pooled patient sera as the primary antibody [10]. The fraction with the highest concentration was precipitated using the 2D clean-up kit (GE Healthcare, Buckinghamshire, UK), and the pellet was solubilized in 20 µL 100 mM Tris-HCl, pH 8.5 with 8 M urea to prepare peptides for mass spectrometry.

Digestion of proteins for mass spectrometry
The proteins that were eluted and denatured from the antibody-coupled beads or from the GELFrEE protein fractions were reduced with 1 mM TCEP (Pierce) for 30 min and alkylated with 20 mM iodoacetamide (Sigma) at room temperature in the dark for 30 min. The reaction was quenched with 10 mM DTT (Sigma) for 15 min. Endoprotease Lys-C (Sigma) (5 µg) was added and the samples were digested in a barocycler (Pressure Biosciences) [36] for 30 min at 37°C, followed by dilution to 2 M urea with the Tris buffer, addition of trypsin (Sigma) and barocycler digestion for 30 min at 37°C. The digest was acidified to 5% formic acid and peptides were desalted in parallel on Glygen Nutips containing C4 and graphite carbon solid phase on a Beckman Biomek (Biomek NXP), as previously described [37]. The eluted peptides were dried in a SpeedVac, dissolved in water/acetonitrile/formic acid (99%/1%/1%), and transferred to autosampler vials (SUNSRI Cat No. 200-046) for storage at -80°C or LC-MS analysis.
Peptides for LC-MS from the GELFrEE fractionation were prepared as described above with the following modification. The endoprotease digests were acidified to 1% TFA and filtered through a 30K MWCO filter (Sartorius VIVACON 500). Peptides were desalted on a SepPak cartridge (50 mg/1cc) (Waters), dried in a SpeedVac and transferred into the autosampler vials for LC-MS analysis.
Data acquisition was performed with a TripleTOF 5600+ mass spectrometer (AB SCIEX, Concord, ON) fitted with a Picoview Nanospray source (PV400) (New Objectives, Woburn, MA) and a 10 µm Silica PicoTip emitter (New Objectives, Woburn, MA). Data were acquired using an ion spray voltage of 2.9 kV, curtain gas of 20 psi, nebulizer gas of 25 psi, and an interface heater temperature of 175°C. The MS was operated with a resolution of greater than or equal to 25,000 fwhm for TOFMS scans. For data dependent acquisition, survey scans were acquired in 250 ms, from which 100 product ion scans were selected for MS2 acquisition with a dwell time of 20 ms. Precursor charge state selection was set at +2 to +5. The survey scan threshold was set to 100 counts per second. The total cycle time was fixed at 2.25 seconds. Four time bins were summed for each scan at a pulser frequency value of 15.4 kHz through monitoring of the 40 GHz multichannel TDC detector with four-anode/channel detection. A rolling collision energy was applied to all precursor ions for collision-induced dissociation using the equation CE = slope × (m/z) + intercept, where the slope for all charges above 2+ is 0.0625 and the intercept is -3, -5 and -6 for 2+, 3+, and 4+, respectively.

Figure 3. Total IgG was purified and used to precipitate immunogenic proteins from total P. kellicotti homogenate. P. kellicotti proteins were eluted from the purification column in eight fractions, which were tested by Western blot using an aliquot of the same IgG used in the immunoprecipitation. Fraction 2 had the greatest protein concentration and was used in our mass spectrometry analysis. doi:10.1371/journal.pntd.0003242.g003

The raw LC-MS data (*.wiff) were converted to *.mzML format utilizing the AB SCIEX MS Data Converter v 1.3 (AB SCIEX, Foster City, CA) within PEAKS STUDIO 7.0 (Bioinformatics Solutions Inc., Waterloo, Canada). The resulting files were used for database searching by the PEAKS software using protein translations from the P. kellicotti transcriptome. The Ensembl Human protein database (Homo_sapiens.GRCh37.72) was used to identify human background proteins in the sample matrix. The searches were conducted with trypsin cleavage specificity, allowing 3 missed cleavages, with oxidation of Met and carbamidomethylation of Cys as variable and constant modifications, respectively. A parent ion tolerance of 25 ppm and a fragment ion tolerance of 100 millimass units were used. The MS2-based peptide identifications were validated within PEAKS software using a modified target decoy approach, decoy fusion, to estimate the FDR [38]. A 1% FDR for peptide spectral matches was used as the quality filter to identify peptides and associated proteins. MS data are available from Trematode.net (trematode.net/Paragonimus_kellicotti.html) and PeptideAtlas (identifier PASS00555).
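
As a worked illustration of the rolling collision energy rule given in this section (a linear function of precursor m/z with a charge-dependent intercept), the sketch below evaluates it for two precursors. It reads the three intercepts as -3, -5 and -6 for the 2+, 3+ and 4+ charge states, which is an assumption because the sign is ambiguous in the extracted text. The function name and the example m/z values are hypothetical, and the text gives no intercept for 5+ precursors, so the sketch declines to guess one.

```python
SLOPE = 0.0625                               # slope stated in the text
INTERCEPTS = {2: -3.0, 3: -5.0, 4: -6.0}     # assumed signs; see the lead-in above


def rolling_collision_energy(mz, charge):
    """Collision energy from the linear rule CE = slope * (m/z) + intercept."""
    if charge not in INTERCEPTS:
        raise ValueError(f"no intercept given in the text for charge {charge}+")
    return SLOPE * mz + INTERCEPTS[charge]


# Invented precursors: a 2+ ion at m/z 800 and a 3+ ion at m/z 600.
ce_2plus = rolling_collision_energy(800.0, 2)   # 0.0625 * 800 - 3
ce_3plus = rolling_collision_energy(600.0, 3)   # 0.0625 * 600 - 5
```

With these inputs the rule gives 47.0 for the 2+ precursor and 32.5 for the 3+ precursor, showing that heavier ions receive more energy while higher charge states receive slightly less at a given m/z.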

Results/Discussion
Characterizing the adult transcriptome of P. kellicotti
Prior to this study, a total of 911 GenBank sequences were available from the genus Paragonimus, only seven of which were from P. kellicotti. Therefore, it was necessary to sequence, assemble and analyze the transcriptome of P. kellicotti to enable further study (Table 1). Approximately 70 million paired-end reads were generated from an adult P. kellicotti cDNA library on the Illumina HiSeq platform. Following removal of low quality and contaminant reads, 40 million read pairs and 18 million unpaired orphan reads were assembled into 78,674 high-confidence transcript isoforms with an average length of 560 bp. These were further clustered into 54,622 distinct genetic loci, 21.5% of which are associated with more than one transcript isoform (mean 3.0 transcript isoforms per alternatively spliced locus). We assume that the P. kellicotti genome contains a similar number of protein coding genes as other recently sequenced trematode genomes, which currently ranges from 10,852 in Schistosoma mansoni to 16,258 in Clonorchis sinensis [39][40][41][42][43]. The discordance between the number of detected genetic loci and the expected number of genes is likely due to assembly fragmentation resulting in overestimation of the number of genes, a common problem seen in de novo transcriptome assemblies of short read data [44][45][46]. We calculated the fragmentation rate of our assembly at 25.8% using S. mansoni genes as a reference and at 31.4% using C. sinensis genes as a reference. The fragmentation rate is an estimate and it depends on the level of sequence conservation between the species of interest and species with available genome data; however, it is likely that at least 25.8-31.4% of all P. kellicotti genes represented in our assembly are split into two or more non-overlapping genetic loci.
Assembled transcripts were compared to known proteins originating from other species. A total of 32,201 transcript isoforms from 20,102 loci shared sequence similarity with a known protein at an e-value better than 1e-05 (Table S2). A majority of the matches were to sequences from C. sinensis followed by Schistosoma species. This is not surprising, as these were the only trematodes with sequenced genomes at the time this study was conducted. P. kellicotti sequences shared an average 61.3% sequence identity with corresponding C. sinensis sequences at the protein level. There were just 165 P. westermani sequences included in GenBank-NR at the time of this study, so only 125 transcripts from 67 genetic loci had a top BLASTX hit to a P. westermani protein. The sequence identity shared between P. kellicotti and P. westermani high-scoring segment pairs was 79.8% at the protein level. P. kellicotti and P. westermani are not considered to be close relatives within the genus Paragonimus [47]; however, the identified high level of sequence conservation may help facilitate the design of pan-Paragonimus serological assays.
A total of 77,123 unique protein sequences were predicted from 54,616 of the detected genetic loci. Detailed annotations are available in Table S2. Predicted proteins from 11,116 genetic loci were associated with a total of 4,407 unique InterPro protein domains and 1,234 unique GO terms. The number of genetic loci associated with each molecular function term was tallied, and the most abundantly represented terms were related to protein, ATP and nucleic acid binding. Similarly, the biological processes with the highest representation were protein phosphorylation, metabolic process, and oxidation-reduction process. In a comparison between three trematode species, a total of 312 conserved domains were unique to P. kellicotti, while 305 and 218 were unique to C. sinensis and S. mansoni, respectively (Figure 2A). A majority of the domains present in each species were shared between all three species.
Predicted proteins from 18,028 transcripts/11,599 genetic loci were associated with 6,854 unique KEGG orthologous groups. These were further binned into 336 unique biochemical pathways and 284 pathway modules. The KEGG orthologous groups represented in the adult transcriptome of P. kellicotti were compared to those represented in the draft genomes of C. sinensis and S. mansoni (Figure 2B). Altogether, 620 P. kellicotti KEGG orthologous groups (KOs) were absent from the other trematodes; these were binned into 255 pathways and 97 modules, most of which were very sparsely populated with the P. kellicotti-specific KOs. A careful analysis failed to identify any complete or nearly complete pathways present in P. kellicotti but absent in the other trematodes. The coverage of specific KEGG pathways can be visualized and compared to other trematodes using the TremaPath tool available at Trematode.net (http://trematode.net/TN_frontpage.cgi?navbar_selection=comparative_genomics&subnav_selection=tremapath).
Secreted proteins have an important role in the life cycle of tissue-migrating parasite species like P. kellicotti, facilitating interactions with the host. These proteins are of practical interest as diagnostic, vaccine, or drug targets. Proteins related to 1,610 genetic loci were annotated as potentially secreted based on the presence of a classical signal peptide for secretion and absence of a predicted transmembrane domain (Table S2). Seven GO terms were found to be enriched among predicted secreted proteins, with the most highly enriched term being related to cysteine protease activity (Table 2). Proteases tend to be prevalent among trematode excretory-secretory products [48][49][50][51], and various reports have described their role in migration through host tissues, nutrient uptake, and immune evasion [52][53][54][55].
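
The classification rule stated above (a classical signal peptide is present and no transmembrane domain is predicted) can be written as a simple filter. The sketch below is illustrative only: the record type, field names, and protein identifiers are hypothetical and do not reflect the output format of any particular prediction program.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    """Hypothetical per-protein summary of signal peptide and transmembrane calls."""
    protein_id: str
    has_signal_peptide: bool
    tm_domains: int


def is_putatively_secreted(pred):
    """Apply the rule from the text: signal peptide present, no transmembrane domain."""
    return pred.has_signal_peptide and pred.tm_domains == 0


# Invented example predictions.
predictions = [
    Prediction("protA", True, 0),    # signal peptide, no TM domain: secreted
    Prediction("protB", True, 2),    # signal peptide but membrane-anchored
    Prediction("protC", False, 0),   # no signal peptide
]
secreted_ids = [p.protein_id for p in predictions if is_putatively_secreted(p)]
```

Only the first record passes the filter. A rule of this kind identifies classically secreted candidates and would miss proteins exported by non-classical routes.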
Characterizing the adult worm proteome of P. kellicotti
Total parasite antigen was subjected to analysis by mass spectrometry to survey the worm proteome and subsequently to validate a subset of our assembled transcripts. A total of 244,048 spectra were matched to 25,405 database protein predictions that corresponded to 2,555 transcripts from 1,863 genetic loci (Table S2). The verified proteins encompass 1,626 InterPro protein domains, 586 GO terms, and 1,925 KEGG orthologous groups from 307 pathways and 198 pathway modules. Furthermore, 63 transcripts from 48 genetic loci with no annotation (i.e., no significant BLAST hit in NR or KEGG, conserved protein domain, GO term, etc.) were confirmed by the proteomic data. These sequences, thus far unique to P. kellicotti, might have otherwise been dismissed as low confidence transcripts due to the draft nature of the transcriptome assembly. However, proteomic evidence verified that these species-specific nucleotide sequences are translated and that they may have important biological functions in P. kellicotti.
In order to obtain an estimate of abundance, identified proteins were ranked according to associated spectral counts. Given the draft nature of the transcriptome and the known issue of fragmentation, attempts were not made to correct for protein size, so follow up experiments would be required to assess abundance in a more robust and quantitative manner. The 25 proteins with highest spectral counts (Table 3) included actins, myoglobins, chaperone proteins, and yolk ferritins, and these proteins may be abundant in the parasites. Oxygen binding proteins such as myoglobin are vital to parasite survival, as an exceptionally high affinity for their substrate allows the parasite to scavenge oxygen from host blood and tissues [56]. The high abundance of myoglobin proteins in our analysis may serve as an indication of the importance of aerobic respiration in P. kellicotti.
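
The ranking step described above amounts to sorting proteins by their raw spectral counts and taking the top of the list. A minimal sketch follows; the function name, identifiers, and counts are invented for illustration, and, as in the text, no correction for protein length is applied.

```python
def rank_by_spectral_count(counts, top_n=25):
    """Return the top_n (protein_id, count) pairs, highest count first.

    Ties are broken alphabetically by identifier so the order is reproducible.
    """
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))[:top_n]


# Invented spectral counts for four hypothetical proteins.
example_counts = {"protA": 120, "protB": 450, "protC": 120, "protD": 7}
top_three = rank_by_spectral_count(example_counts, top_n=3)
```

Because raw counts favour long proteins, which yield more peptides, a ranking like this is only a rough proxy for abundance, which is the caveat the text raises.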
Identification of potential serodiagnostic antigens using antibodies from patient sera
Serodiagnostic assays based on worm homogenate have been shown to sensitively and specifically detect an immune response to P. westermani and P. kellicotti [8,10,11]. In these assays, total parasite protein antigens are separated by SDS-PAGE, transferred to a membrane, and exposed to patient serum. Doublet bands appearing at 21/23 kDa and a more diffuse band at 34 kDa are indicative of exposure to Paragonimus species (Figure 3 and [10]). However, the identity of these proteins was not known.
An unusual cluster of cases of paragonimiasis (caused by P. kellicotti) occurred in recent years in the state of Missouri [8,57,58]. Since helminth infections are uncommon in Missouri, sera from these patients contain antibodies to Paragonimus antigens, but they are unlikely to contain antibodies to antigens of other helminths. These sera represented an excellent resource for our study. P. kellicotti proteins recognized by total IgG from some of these patients were enriched by immunoprecipitation using affinity beads. Eluate fractions were assessed by Western blot (Figure 3), and the strongest fraction was analyzed by mass spectrometry. A total of 2,406 spectra were matched to 1,443 proteins predicted from the transcriptome assembly that corresponded to 321 transcripts from 227 genetic loci (Table S2). Some 212 of these 227 loci were also detected in our analysis of the total worm proteome. Thus, the whole parasite proteome provided useful supplementary information to the immunoprecipitated proteins. The 25 most abundant proteins bound by patient IgG (as approximated by spectral counts) are listed in Table 4. Most of the translations predicted from the transcriptome represent a fraction of the full length of the deduced protein. Therefore, it is challenging to determine with certainty which of these might represent the antigen present in the 21/23 kDa or 34 kDa bands. Nonetheless, several of the proteins on this list are of interest as potential serodiagnostic antigens.
Five of the highly abundant immunoreactive proteins (Table 4), Pk00394_txpt2, Pk45107_txpt2, Pk48549_txpt1, Pk24571_txpt1, and Pk42039_txpt2, are putative cysteine proteases. Translations from three of these transcripts (Pk00394_txpt2, Pk48549_txpt1, and Pk42039_txpt2) are predicted to have molecular weights in the range of 35-36 kDa, close in size to the diffuse ~34 kDa antigen detected by serodiagnostic Western blots with total native parasite antigen (Figure 3). The predictions of 35-36 kDa are only estimates and may not represent the full length of the protein. However, the predicted molecular weights of top BLASTX hits of these proteins are in the same size range (36-37 kDa), and this indicates that the P. kellicotti sequences we have are complete or nearly so. Recombinant cysteine proteases have shown promise as serodiagnostic antigens for trematode infections [59][60][61][62][63], and a previous study reported that partially purified cysteine proteases from P. westermani excretory-secretory products were superior for antibody diagnosis compared to whole worm antigen extracts [64]. Two of the most abundant proteins identified in the mass spectrometry analysis of our P. kellicotti immunoprecipitate, Pk00394_txpt2 and Pk48549_txpt1, share 86% sequence identity at the amino acid level. These proteins are similar to cysteine proteases from P. westermani and, to a lesser extent, to those from helminths of other genera. By selecting a specific region from these cysteine proteases, it may be possible to develop an assay that discriminates between Paragonimus species and other helminths. A recombinant cysteine protease from P. westermani, rPwCP2, has already shown promise as a diagnostic antigen [62], but this sequence (gi:42516556) has no homolog in our P. kellicotti transcriptome. Thus, the cysteine proteases identified in our study may be more useful as a pan-Paragonimus diagnostic reagent than those previously described.
Other proteins on our top-25 list (Table 4), such as the MF6p proteins and myoglobins, have not been considered as serodiagnostic antigens, but they are abundant excretory-secretory products of trematodes and merit further exploration. For example, Pk39524_txpt1 is annotated as a putative MF6p protein. Its top BLAST hit was recently characterized as a heme-binding protein and is a major antigen secreted by F. hepatica [65]. The P. kellicotti ortholog only shares 57% sequence identity with the F. hepatica protein, so cross-reactivity with antibodies in patients with fascioliasis should not be a major problem. Orthologs from other Paragonimus species have not yet been reported, so it is not possible to assess the potential utility of this protein as a pan-genus diagnostic reagent at this time. However, Pk34178_txpt1, a putative myoglobin 1, shares 90% sequence identity with an ortholog in P. westermani, but significantly less similarity with orthologs from other trematode species (Figure 4), strongly indicating that this candidate is worth further attention to examine its diagnostic utility.

Conclusions
We undertook a systems biology approach to comprehensively study the adult transcriptome and proteome of P. kellicotti to improve understanding of the protein composition of the adult parasite and potential interactions between the parasite and its mammalian host. The transcriptome of adult P. kellicotti represents a major advance in the study of Paragonimus species. Transcriptomes provide powerful foundations for translational research in parasitology to develop improved diagnostic tests, treatments, and vaccines. In this study, transcriptome data were used together with immunoaffinity chromatography and mass spectrometry to efficiently identify candidate diagnostic antigens. Similar integrated approaches may be useful for identifying novel targets for drugs and vaccines. Finally, the data generated in this study (transcriptome, proteome, and immunolome) represent a valuable resource for the research community, and they will be especially helpful for annotating genomes of Paragonimus spp. as they become available.

Supporting Information
Table S1 Transcriptome assembly statistics at various stages of filtering. Information is provided on the content and completeness of the P. kellicotti transcriptome assembly after each filtering step. (DOCX)
Table S2 Annotation of the P. kellicotti transcriptome assembly. Information on the annotation of assembled transcripts is provided here. This includes the top NR BLASTX hit, InterPro protein domains, gene ontology terms, KEGG orthologous groups, biochemical pathways, pathway modules, transmembrane domains and secretion signals. The numbers of MS peptides and spectral counts associated with each transcript are also provided. (DOCX)