A comparative ‘omics’ approach for prediction of candidate Strongyloides stercoralis diagnostic coproantigens

Human infection with the intestinal nematode Strongyloides stercoralis is persistent unless effectively treated, and potentially fatal in immunosuppressed individuals. Epidemiological data are lacking, partially due to inadequate diagnosis. A rapid antigen detection test is a priority for population surveillance, validating cure after treatment, and for screening prior to immunosuppression. We used a targeted analysis of open access ‘omics’ data sets and online predictors to identify S. stercoralis proteins that are predicted to be present in infected stool, Strongyloides-specific, and antigenic. Transcriptomic data from gut and non-gut dwelling life cycle stages of S. stercoralis revealed 328 proteins that are differentially expressed. Strongyloides ratti proteomic data for excreted and secreted (E/S) proteins were matched to S. stercoralis, giving 1,057 orthologues. Five parasitism-associated protein families (SCP/TAPS, prolyl oligopeptidase, transthyretin-like, aspartic peptidase, acetylcholinesterase) were compared phylogenetically between S. stercoralis and outgroups, and proteins with least homology to the outgroups were selected. Proteins that overlapped between the transcriptomic and proteomic datasets were analysed by multiple sequence alignment, epitope prediction and 3D structure modelling to reveal S. stercoralis candidate peptide/protein coproantigens. We describe 22 candidates from seven genes, across all five protein families for further investigation as potential S. stercoralis diagnostic coproantigens, identified using open access data and freely-available protein analysis tools. This powerful approach can be applied to many parasitic infections with ‘omic’ data to accelerate development of specific diagnostic assays for laboratory or point-of-care field application.


Introduction
The intestinal nematode Strongyloides stercoralis is a soil transmitted helminth (STH) prevalent in faecally-contaminated, humid soils in tropical and sub-tropical regions. Strongyloidiasis is estimated to affect up to 40% of people in many endemic regions [1,2]. Infection occurs when infective third stage (L3) larvae penetrate the skin. The parasitic adult female resides in the epithelium of the duodenum where it feeds on host tissue. Although clinical signs may be mild or non-specific, long term infection by Strongyloides can have significant impact on quality of life and child development and progress to severe and fatal disease [3][4][5].
Strongyloides stercoralis is unusual among human parasitic nematodes in that it can complete its life cycle within the host, thus sustaining infection for decades if untreated [6][7][8]. During reduced immune competence due to immunosuppression, such as corticosteroid treatment of co-morbidities, or HTLV-1 co-infection, very large numbers of larvae may undergo this autoinfective cycle, causing hyperinfection or disseminated strongyloidiasis, both with a high fatality rate [9][10][11]. This autoinfective lifecycle means that infections will fully reestablish if worms are not completely cleared from the host during treatment. Diagnosis of strongyloidiasis and validation of cure after treatment are therefore imperative.
Treatment with the first-line drug ivermectin has a reported efficacy of between 57% and 100%. However, accurate determination of cure depends on follow-up and the diagnostic method used [8,10,12]. Albendazole and mebendazole, used to treat infection with other STH, are less effective or ineffective against Strongyloides [13][14][15]. Moxidectin has shown effectiveness against S. stercoralis that is equivalent to ivermectin, and trials continue to evaluate it further [16,17].
There is no single gold-standard diagnostic method for strongyloidiasis. Options include microscopy of cultured or incubated fresh stool samples, such as Koga agar plate culture [18] or Baermann funnel. qPCR on extracted stool DNA is used in research and highly resourced laboratories, and provides a reliable confirmation test but may not improve sensitivity over larval detection methods if used for screening [19]. Sensitivity of PCR may be affected by DNA extraction method and choice of PCR reagents [20], as well as irregular larval excretion [12]. Serology, detecting antibodies to either whole worm or recombinant antigens NIE and SsIR, has the highest sensitivity of the diagnostic options, ranging from 72 to 98%, with 90 to 99% specificity, when compared with microscopic detection as gold standard. Specific antigen assays achieve accuracy at the upper end of the ranges and continue to be improved upon [21][22][23]. While serology can indicate cure, this requires re-sampling several months after effective treatment [24][25][26]. Therefore, there is a need for a rapid diagnostic test (RDT) that can be used for screening as well as for timely confirmation of cure. A coproantigen-based assay would fulfil this need, because it may achieve high sensitivity and specificity, and test for active infection. Pilots of such assays have shown proof of principle under research conditions using antibodies against Strongyloides ratti [27,28], Strongyloides venezuelensis [29,30] and S. stercoralis somatic, excretory/secretory (E/S) or faeces-derived antigens [31], as reviewed by Balachandra et al. (2021) [32]. However, identification of specific Strongyloides protein antigens would enable production of standardised diagnostic tests on a large scale.
The wealth of 'omic' data now available in the public domain, coupled with online protein analysis tools, enables a computational approach to antigen discovery. This concept was termed reverse vaccinology when used in vaccine candidate discovery [33]. The approach begins with genomic analysis, as opposed to biochemical or serological methods and it also has significant potential in diagnostic antigen discovery. It has the advantages of not requiring culture of the organism and of revealing therapeutically or diagnostically relevant antigens that may be less abundant or difficult to purify in vitro, such as those from host-dwelling life stages of pathogens. Incorporation of transcriptomic data can inform candidate gene/protein selection in parasites with multiple life stages, such as S. stercoralis [33].
Our approach was facilitated by the publication of the genomes of S. stercoralis and three related Strongyloides species [34]. Here, we have applied a series of computational analyses to open access transcriptomic, genomic and proteomic data from Strongyloides species and other helminths. We have used common bioinformatic tools to identify Strongyloides protein antigens that may be diagnostic targets detectable in human stool using a coproantigen capture RDT.

Data sources
This study used data obtained from public databases, including Wormbase ParaSite, a resource for parasitic worm genomics curated by the Sanger Institute and the EMBL European Bioinformatics Institute [35,36]. Full details of data sources are listed in Table 1. Strongyloides stercoralis transcript sequences identified by the prefix 'SSTP' can be obtained via UniProtKB (www.uniprot.org) or WormBase ParaSite (WBPS: www.parasite.wormbase.org).

Overview of method
Three criteria were applied for candidate antigen selection: predicted presence in infected stool; specificity to Strongyloides and/or S. stercoralis; predicted antigenicity, to facilitate raising sensitive antibodies. Datasets and computational analyses, all open-access, used to make this selection are detailed in Fig 1.

Evidence for the presence of candidate coproantigens in stool
Evidence for the presence of candidate coproantigens in stool was considered to be from proteins that are associated with parasitism and gut-dwelling life stages of Strongyloides and therefore with maximal chances of being released into the host gut lumen. For this we used open-access transcriptomic and proteomic data (Fig 1, boxes A and B).
Analysis of the Strongyloides genomes by Hunt et al. (2016) [38] was used to identify protein families associated with parasitism. Five parasitism-associated protein families ('priority protein families'), of suitable size to be put through our pipeline, were our focus for identifying coproantigens: sperm-coating-proteins/Tpx-1/Ag5/PR-1/Sc7 (SCP/TAPS), transthyretin-like (TTL), acetylcholinesterase (AChE), prolyl oligopeptidase (POP) and aspartic peptidases. The large astacin, and small proteinase inhibitor, families were excluded from our analysis in this study, because they were not amenable to the manual sections of the pipeline, either by being too many or too few sequences for comparison.
Transcriptomics. To reveal candidate coproantigens, we used transcriptomic data from Stoltzfus et al. (2012) [37], which analysed transcriptomes of 7 life stages of S. stercoralis (Figs 2 and 1, box A). RNA data were downloaded from the National Center for Biotechnology Information (NCBI) Sequence Read Archive (SRA), with triplicate reads from each stage. We grouped these data by life stage and by presence or absence within the host gut, thus representing stages excreting or secreting antigens into human stool (Fig 2).
We calculated relative abundance of transcripts using RSEM [39] and Bowtie2 [40] and subsequently separated RNA data from the 4 non-gut-dwelling stages and 3 gut-dwelling stages into two groups. Differential gene expression between the two groups was analysed using EBSeq in RSEM (Fig 1, box C). We selected genes that were differentially expressed with a 100% credibility interval (CI), in either direction, between gut-dwelling and non-gut-dwelling life stages.
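As a rough illustration of this selection step, the sketch below filters a differential-expression table for genes whose posterior probability of differential expression meets a cut-off, then splits them by direction. The column names (gene_id, PPDE, PostFC), the tab-separated layout, the gene identifiers and the convention that PostFC is the gut/non-gut ratio are all assumptions made for the example and do not reproduce the exact output of the tools used in this study. Reading the 100% CI as PPDE = 1 is likewise an interpretation.

```python
import csv
import io

# Hypothetical table; real column names and layout may differ.
EXAMPLE_TABLE = (
    "gene_id\tPPDE\tPostFC\n"
    "geneA\t1.0\t8.2\n"
    "geneB\t0.97\t3.1\n"
    "geneC\t1.0\t0.2\n"
)


def select_de_genes(table_text, ppde_cutoff=1.0):
    """Return (gut_up, non_gut_up) gene lists passing the PPDE cut-off."""
    gut_up, non_gut_up = [], []
    reader = csv.DictReader(io.StringIO(table_text), delimiter="\t")
    for row in reader:
        if float(row["PPDE"]) < ppde_cutoff:
            continue  # not confidently differentially expressed
        # Assumed convention: PostFC > 1 means higher in gut-dwelling stages.
        if float(row["PostFC"]) > 1.0:
            gut_up.append(row["gene_id"])
        else:
            non_gut_up.append(row["gene_id"])
    return gut_up, non_gut_up


gut_up, non_gut_up = select_de_genes(EXAMPLE_TABLE)
print(gut_up, non_gut_up)
```

Keeping both directions in the output mirrors the text, where genes differentially expressed "in either direction" are retained before the gut-stage subset is prioritised.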
Differentially expressed protein family identification. ClustalW [41] was used to perform multiple alignment of the differentially expressed (DE) genes and to produce a phylogenetic tree, which was annotated with iTOL [42]. The tree, labelled only with S. stercoralis gene accession numbers, facilitated grouping of the DE genes into protein/gene families but protein identities remained unknown at this stage (Fig 1, box D).

Excretory/Secretory proteomics
Source of proteomic data. E/S data was used in conjunction with gene expression data, to build evidence for the candidate coproantigens (Fig 1, box B). Soblik et al. (2011) [48] submitted excretory/secretory (E/S) material of S. ratti parasitic females to mass spectrometry and identified the constituent proteins. In their presentation of the S. ratti genome, Hunt et al. (2016) [34] re-analysed the spectral data and obtained protein identities from corresponding genomic data of S. ratti. We acquired the list of parasitic female E/S proteins, with S. ratti genome accession numbers and protein identities, from Supplementary

S. stercoralis orthologues to the S. ratti excretory/secretory proteome
At the time of this study, there were no E/S proteomic data available for S. stercoralis. Therefore, we obtained S. stercoralis orthologues of the S. ratti E/S proteins by searching the S. ratti E/S proteins against a custom blast+ database consisting of the S. stercoralis protein file (WBPS v8), using blastp with word size 2 and an e-value threshold of 1E-50 (Fig 1, box F). S. stercoralis hits, in the form of accession numbers, were extracted from the resulting table and duplicate hits were removed. Corresponding S. stercoralis amino acid sequences were extracted from the S. stercoralis protein file using samtools. VENNY 2.1 [50] was used to reveal the S. stercoralis accession numbers that occurred in both the DE proteins and the E/S orthologues. All the E/S orthologues were submitted to BlastKOALA as before, to obtain protein family identities, as well as matching them with the protein identities reported for the original S. ratti E/S proteins, by Hunt et al. (2016) [34] (Fig 1, box G). Separately, differential gene expression data from analysis of the Stoltzfus et al. (2012) [37] dataset were extracted for the E/S orthologues that occurred in both datasets (Fig 1, box H).
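The hit-extraction and de-duplication step can be sketched as follows. The example assumes the search results were written in the standard 12-column tabular BLAST format (query ID, subject ID, ..., e-value, bit score). The output format actually used in the study is not stated, and the identifiers below are invented for illustration.

```python
def unique_subject_hits(blast_lines, max_evalue=1e-50):
    """Return subject IDs passing the e-value cut-off, duplicates removed.

    Assumes tab-separated lines in the 12-column tabular BLAST layout,
    where column 2 is the subject ID and column 11 is the e-value.
    """
    seen = set()
    hits = []
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment headers
        fields = line.split("\t")
        subject_id = fields[1]
        evalue = float(fields[10])
        if evalue <= max_evalue and subject_id not in seen:
            seen.add(subject_id)
            hits.append(subject_id)
    return hits


# Hypothetical hits: two queries matching the same subject count once.
EXAMPLE_HITS = [
    "ratti_1\tsterc_A\t85.0\t300\t45\t0\t1\t300\t1\t300\t1e-120\t420",
    "ratti_1\tsterc_B\t40.0\t120\t72\t3\t5\t124\t9\t128\t2e-12\t65.1",
    "ratti_2\tsterc_A\t78.2\t280\t61\t1\t1\t280\t3\t282\t3e-98\t350",
    "ratti_2\tsterc_C\t70.1\t250\t74\t2\t1\t250\t1\t250\t5e-60\t210",
]

print(unique_subject_hits(EXAMPLE_HITS))
```

Because one S. ratti protein can match several S. stercoralis proteins and vice versa, the number of unique subject IDs need not equal the number of queries.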
Signal peptide prediction for evidence that a protein is secreted was performed on the S. stercoralis DE proteins and E/S orthologues using SignalP 4.1 [51] (Fig 1, box I).

Specificity of candidate coproantigens to Strongyloides
We used phylogenetic comparison to indicate S. stercoralis proteins with least homology to those of other relevant species, followed by multiple sequence alignment to identify exact regions of specificity (Fig 1, boxes J, K, L). A custom blast+ database was created from the genome-derived proteomes of selected outgroup species (Table 2 and Fig 1, box J). The outgroups were selected to represent parasitic nematodes, including those that commonly co-infect with S. stercoralis, non-parasitic nematodes, as well as trematodes, cestodes and human, for a broad analysis of homology. Human protein/coding sequence (CDS) data were provided by the Human Genome Project at the Wellcome Trust Sanger Institute and obtained from Ensembl.
Specificity of differentially expressed proteins to S. stercoralis. Specificity of DE proteins to Strongyloides was assessed by their clustering in a phylogenetic tree after alignment with orthologues from the outgroups (Fig 1, box L). The S. stercoralis DE proteins that resulted from the transcriptomic comparison were searched as separate protein families against the custom outgroup database with blast+ criteria of word size 2 and an e-value threshold of 1E-5 or 1E-10, as appropriate, to obtain about 100 to 1,000 hits (S1 File). In cases where there were very few DE proteins in a particular family, the DE protein(s) were also searched against a custom database consisting of only S. stercoralis genome-derived proteins (S1 File). This was intended to increase the number of S. stercoralis proteins to enable species-specific clusters to be revealed on phylogenetic trees.
Protein hits from each outgroup, and the additional S. stercoralis hits where appropriate, were aligned with their respective original blast queries by multiple sequence alignment (MSA) using ClustalW, and phylogenetic trees were constructed. Trees were annotated using iTOL [42] to show proteins from each outgroup species in a different colour. S. stercoralis proteins which formed a distinct cluster, or clusters, on each phylogenetic tree were viewed in MSA along with the most and least similar proteins from each of the outgroups on that tree. These ClustalW alignments were analysed by eye for Strongyloides- and S. stercoralis-specific regions, which were then submitted to BLASTP search against the NCBI non-redundant (nr) and Nematoda (taxid: 6231) databases, as relevant, to validate their specificity (Fig 1, box N).
BepiPred 1.0 and Bcepred [55] were used to predict epitopes within the DE proteins and E/S proteome orthologues (Fig 1, box M). A BepiPred threshold of 1.3 (range -4 to 4) was selected for maximum specificity of 96%, with corresponding 13% sensitivity, of predicted epitopes in order to minimise the chance of false positive predictions. Minimum length was 9 amino acids with no maximum. In the differentially expressed proteins, longer sequences with an overall very high epitope score were allowed to contain small regions scoring below 1.3.
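The threshold and minimum-length criteria for BepiPred 1.0 can be illustrated with a short sketch that scans per-residue scores for contiguous runs at or above the cut-off. The function name and the example scores are invented, and treating the threshold as inclusive is an assumption. The allowance for short sub-threshold stretches inside long high-scoring regions, which was applied manually, is not implemented here.

```python
def find_epitope_regions(scores, threshold=1.3, min_length=9):
    """Return (start, end) residue positions, 1-based and inclusive, of
    contiguous runs scoring >= threshold that span at least min_length."""
    regions = []
    start = None  # 0-based index where the current run began
    for i, score in enumerate(scores):
        if score >= threshold:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_length:
                regions.append((start + 1, i))
            start = None
    # Close a run that extends to the end of the sequence.
    if start is not None and len(scores) - start >= min_length:
        regions.append((start + 1, len(scores)))
    return regions


# Hypothetical per-residue scores: a 9-residue run and a 5-residue run.
example_scores = [0.0] * 3 + [1.5] * 9 + [0.2] * 2 + [1.4] * 5
print(find_epitope_regions(example_scores))
```

With the default criteria only the 9-residue run is reported; the 5-residue run is discarded for being too short.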
Bcepred criteria were based on the reported highest accuracy of 58.7% which was achieved using a threshold of 2.38 for the average score of four amino acid properties: hydrophobicity, flexibility, polarity and exposed surface. In addition to BepiPred 1.0 and Bcepred, we also used BepiPred version 2.0 [56] for certain candidate antigens that had already been identified by the pipeline. This version became available only after the majority of the analysis and offered improved prediction of conformational epitopes. BepiPred 2.0 was used with the same epitope length criteria and an epitope score threshold of 0.55 (range 0 to 1) which provided specificity of 81.7% and sensitivity of 29.2% on epitope predictions.
Outputs from the two prediction tools were compared, initially for proteins present in both outputs. The predicted epitope regions of these proteins were then examined for sequence overlap. Prior to selection as candidate antigens, predicted epitopes were assessed for their specificity to Strongyloides (Fig 1, box N). Sequences were searched using BLASTP against the NCBI nr database. The "expect threshold" in BLASTP was increased if no results were obtained with default parameters. BLASTP output was examined by eye for the sequence identity and biological relevance, i.e. likelihood of presence in a human stool sample.
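Checking whether epitope regions predicted by two tools overlap on the same protein reduces to an interval-intersection test. The sketch below is one minimal way to do this, assuming each tool's output has already been converted to 1-based inclusive (start, end) residue ranges. The function name and example coordinates are illustrative only.

```python
def overlapping_regions(regions_a, regions_b):
    """Return the intersections of two lists of inclusive (start, end) ranges."""
    overlaps = []
    for a_start, a_end in regions_a:
        for b_start, b_end in regions_b:
            start = max(a_start, b_start)
            end = min(a_end, b_end)
            if start <= end:  # the ranges share at least one residue
                overlaps.append((start, end))
    return overlaps


# Hypothetical predictions for one protein from two tools.
tool_a = [(4, 12), (40, 60)]
tool_b = [(10, 20), (70, 80)]
print(overlapping_regions(tool_a, tool_b))
```

The nested loop is adequate for the handful of regions predicted per protein; sorted sweeps would only matter for much larger inputs.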
3D modelling. Selected proteins of interest containing predicted epitopes, Strongyloides-specific regions, and in a priority protein family, were submitted to Phyre2 [57] for 3D structure modelling against known crystal structures, using the intensive mode (Fig 1, box O). UCSF Chimera [58] was used to visualise and annotate 3D models to highlight specific sequences of interest on the model.
Glycosylation prediction. N-linked glycosylation was predicted with NetNGlyc [59] to account for the potential of a glycan to obscure protein antigen regions, or conversely to contribute to antigenicity (Fig 1, box P). The prediction tool identified asparagine (N) residues with a high probability of being glycosylated via their amide nitrogen. Prediction was based on the motif N-X-S/T, where X is any residue except proline (P); together with the presence of a signal peptide or trans-membrane domain on that protein, this indicates that potential glycosylation sites are likely to be glycosylated. Intracellular, intramembrane regions, or signal peptides of a protein are unlikely to be glycosylated. If present, a glycosylation site close to a candidate antigen region on the 3D protein could indicate that the protein is less likely to be accessible to antibodies in a capture assay and therefore a lower priority candidate, pending in vitro screening.
Hunt et al. (2016) [38] identified seven protein families associated with Strongyloides parasitism. For our coproantigen search, we focused on 5 of these that were of amenable size to be put through our pipeline, namely: sperm coating protein/Tpx-1/Ag5/PR-1/Sc7 (SCP/TAPS), transthyretin-like (TTL), acetylcholinesterase (AChE), prolyl oligopeptidase (POP) and aspartic peptidases.
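The N-X-S/T motif underlying the glycosylation prediction described above can be located with a simple pattern search, sketched here. This finds candidate sequons only. It does not reproduce the probability scoring of the prediction tool, and the function name and test sequence are invented for illustration.

```python
import re

# N, then any residue except P, then S or T. The lookahead allows
# overlapping sequons (e.g. "NNSS") to be reported separately.
SEQUON = re.compile(r"(?=N[^P][ST])")


def find_sequons(sequence):
    """Return 1-based positions of asparagines in N-X-S/T motifs (X != P)."""
    return [match.start() + 1 for match in SEQUON.finditer(sequence.upper())]


# Hypothetical sequence: NGS and NKT qualify, NPT is excluded by the proline.
print(find_sequons("MNGSANPTNKT"))
```

Positions returned this way could then be compared with candidate antigen regions to flag sites that might be masked by a glycan.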

Differential gene expression in gut life stages
Of a total of 13,098 S. stercoralis genes identified in RNA-seq data by Stoltzfus et al. (2012) [37] we found 328 which were differentially expressed with a 100% CI between gut-dwelling and non-gut-dwelling life stages according to our groupings (Figs 3 and 1, box C). Of these, 198 (60.4%) contained a signal peptide (S2 File and Fig 1, box I). Of the 328 DE genes, 203 were more highly expressed in gut-dwelling life stages than non-gut and included proteins representing each of the 5 priority protein families analysed here (S2 File, columns B and C). These were therefore the focus of our coproantigen search (Fig 3 and S2 File). Twenty-eight protein families were identified among the differentially expressed (DE) proteins, accounting for 193 (58.8%) of the proteins, with the remainder either not identified (22%) or given a disorder prediction (19.2%), indicating that they do not have a fixed conformation and are difficult to assign to a particular function or family (Figs 3 and 1, boxes D,E).

Excretory/Secretory proteome
In the absence of a S. stercoralis excretory/secretory (E/S) proteome, we identified 1,057 S. stercoralis proteins that had high homology to the 584 proteins in the published E/S proteome of S. ratti [34], of which 325 (30.7%) contained signal peptides (Fig 1, boxes F and I). Original S. ratti E/S proteins were given as 582 accession numbers, however two were found to have alternative isoforms which are also included here. Multiple sequence alignment indicated that 550 (94.2%) of the 584 S. ratti E/S proteins had a S. stercoralis orthologue at the selected similarity level (e-value 1E-50) and that 284 (51.6%) of these had multiple homologues in S. stercoralis (S3 File).
To identify possible S. stercoralis E/S proteins among those differentially expressed in the host gut, we compared the 1,057 S. stercoralis orthologues with the 328 proteins identified by transcriptomic data as DE between gut-dwelling and non-gut-dwelling life stages of S. stercoralis. Seventy-seven (23.5%) of the DE proteins were shared between both data sets, of which 58 were among the 203 gut-stage DE proteins (28.6% of the 203) (Fig 1, box H).
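The comparison of accession lists is a set intersection, and the reported percentages follow directly from the set sizes. The sketch below shows the calculation on toy identifiers; the function name and the identifiers are hypothetical, and this is not the Venn diagram tool used in the study.

```python
def overlap_summary(de_all, de_gut_up, es_orthologues):
    """Count accessions shared with the E/S orthologues and express each
    count as a percentage of the corresponding DE set."""
    shared_all = de_all & es_orthologues
    shared_gut = de_gut_up & es_orthologues
    return {
        "shared_all": len(shared_all),
        "pct_of_de": round(100 * len(shared_all) / len(de_all), 1),
        "shared_gut": len(shared_gut),
        "pct_of_gut_up": round(100 * len(shared_gut) / len(de_gut_up), 1),
    }


# Toy accession sets, not real data.
de_all = {"g1", "g2", "g3", "g4"}
de_gut_up = {"g1", "g2"}
es_orthologues = {"g2", "g3", "g9"}
print(overlap_summary(de_all, de_gut_up, es_orthologues))
```

Applied to the counts in the text, 77 of 328 gives 23.5% and 58 of 203 gives 28.6%.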
To investigate gene expression of the E/S orthologues, we extracted data for the 1,057 S. stercoralis proteins from RSEM analysis of the entire 13,098 active S. stercoralis genes (Fig 1, box H). Protein families were assigned by a combination of BlastKOALA, which identified 537 (50.8%) of the E/S orthologues, and the S. ratti protein identifications provided by Hunt et al. (2016) [34] (S4 File and Fig 1, box G). Relevant proteins were assigned to the five 'priority protein families' (Fig 3).

Predicted epitopes
Within the 328 confidently identified DE S. stercoralis proteins, BepiPred and Bcepred jointly predicted epitopes in 104 proteins (78 and 62 proteins respectively), with 36 proteins containing epitopes predicted by both tools (S2 File, Figs 3, and 1, box M). Within the 78 proteins, BepiPred predicted 125 epitopes, and within the 62 proteins Bcepred predicted 108 epitopes (S5 File). Fifty-six epitopes contained overlap or identity between the two prediction tools (S5 File). Predicted epitopes ranged from 9 residues to entire proteins of up to 651 amino acids (aa) with BepiPred, and 8 to 66 aa with Bcepred. These regions were given greater scrutiny in the context of species specificity and antibody accessibility (Fig 1, box N).
S. stercoralis orthologues (n = 1,057) of the S. ratti E/S proteome were also submitted to BepiPred and Bcepred, which predicted 747 and 62 epitope regions respectively, in a total of 324 of the proteins. These ranged from 9 to 99 residues and contained 49 epitope sequences that overlapped, originating from 40 proteins (S5 File and Fig 1, box M).

S. stercoralis-specific candidates identified by phylogenetic comparisons
The five S. stercoralis protein families linked to parasitism and among the DE proteins were analysed for S. stercoralis genus or species-specificity. Separate BLAST searches of each protein family against 15 outgroups revealed the S. stercoralis proteins most likely to contain specific regions (Figs 4 and 1, box K).
S. stercoralis proteins from the clusters identified in the phylogenetic trees were examined in alignment with outgroup representative homologues with the most and least similarity (Fig 1, box L). From the alignments, the S. stercoralis regions with least homology to outgroups were selected as candidate antigens. In some cases these included the entire protein. Results of this analysis are detailed below by protein family.
SCP/TAPS coproantigen candidates. Twenty-one differentially expressed proteins were identified as definite (n = 19) or possible (n = 2) members of the SCP/TAPS protein family (Fig 3 and S2 File). The two 'possible' SCP/TAPS proteins clustered separately among CAP-domain proteins (Fig 3) and were therefore not included as definite members of this protein family. In phylogenetic analysis following alignment against outgroup genomes, seven of the 19 SCP/TAPS proteins formed a cluster of higher S. stercoralis specificity (Fig 4A, arrowed) (SSTP_0001008500, 8600, 8700, 8900, 0000511800, 512000, 513400). Multiple sequence alignment (MSA) and BLAST searching showed that these SCP/TAPS proteins contained several regions of apparent S. stercoralis species specificity; however, most were more highly expressed in non-gut life stages, such as tissue-migrating larvae. In total, 8 of the 19 were expressed more in gut-dwelling life stages. Therefore, SCP/TAPS from outside the cluster (in Fig 4A) were also viewed in MSA. A species-specific region was identified in gut-stage DE protein, SSTP_0000990000, consisting of a large 381 aa sequence of this protein which is upregulated in the parasitic female life stage (Table 3).
Transthyretin-like coproantigen candidates. Fourteen TTL proteins were differentially expressed between gut and non-gut-dwelling life stages with 100% CI, all of which were expressed more in gut-dwelling life stages, almost exclusively in parasitic female worms (Fig 3 and S2 File). Twelve of these proteins grouped as a cluster while the other 2 were more similar to other protein families or features (Fig 3). When aligned against homologous proteins from the outgroup species, 11 TTL proteins clustered together (Fig 4B, arrowed). All 14 TTL proteins were inspected visually in sequence alignment with selected outgroup proteins and five (SSTP_0000700800, 700900, 1222000, 0485800, 1133200) showed greater specificity to S. stercoralis and low sequence similarity to any of the outgroup homologues. However, when many of the possible Strongyloides-specific regions were BLAST searched separately, they were not sufficiently specific to be used as coproantigens. In particular, the search revealed that the amino acid sequence VTCDGKPL in protein SSTP_0000485800 is conserved across many nematode genera and should therefore be avoided in any candidate antigen. Two of the TTL proteins (SSTP_0001222000 and SSTP_0000700800) did contain one or more regions of Strongyloides species or genus specificity, giving a total of 6 TTL candidate coproantigens (Table 3).
In addition to high Strongyloides-specificity across its whole sequence of 177 aa, TTL protein SSTP_0000700800 also contained predicted epitopes. To investigate the position of these, the protein was 3D modelled to a template generated from TTR-52 protein of C. elegans (PDB accession: 3UAF) (Figs 5 and 1, box O). This model aligned to 89 residues (aa 2-90; 50% of the sequence) with 99.9% confidence. This protein contained a predicted glycosylation site at position N165 and, although not predicted to have a signal peptide, it was indicated as an 'extracellular or secreted' protein in UniProt [60] and belongs to a known E/S protein family; therefore, it is more likely to be glycosylated (method Fig 1, box P). The whole sequence of 177 amino acids was selected as a candidate coproantigen due to its Strongyloides specificity, high expression by parasitic females and the presence of several predicted epitopes (Table 3).
Acetylcholinesterase (AChE) coproantigen candidates. Nineteen AChE proteins were in the DE dataset, of which 18 were expressed more highly in gut-dwelling stages, particularly in parasitic female worms (Fig 3). The other AChE protein was highly expressed in infectious and tissue-migrating larvae (S2 File). All 19 proteins formed a cluster when compared with other DE protein families, indicating greater sequence similarity. The 19 proteins were then aligned with BLAST hits from outgroup species and most grouped into two distinct clusters in the corresponding phylogenetic tree, one cluster of 4 S. stercoralis proteins, the other of 10 (Fig 4C, arrowed), all of which were DE in gut-dwelling life stages. The cluster of 4 consisted of SSTP_0000274700, 671000, 638700 and 670800, all of which contained several regions of potential S. stercoralis specificity by visual inspection of the alignment. Equally, the 10 proteins within the other cluster contained multiple regions of possible Strongyloides specificity which were then analysed by BLAST to confirm this specificity. For AChE, 10 candidate antigens originated from 2 proteins: SSTP_0000274700 and 509400 (Table 3).
From the larger cluster in Fig 4C, one of the sources of candidate antigens, AChE protein SSTP_0000509400, was modelled with 100% confidence to 6 templates that jointly covered aa 17-551 (95%) with about 30% sequence identity. Peptides with potential S. stercoralis species-specific sequences were annotated on the model to view their surface exposure (Fig 6). BepiPred 1.0 and Bcepred both failed to identify epitopes in this protein. However, BepiPred 2.0 did predict epitope regions, with moderate specificity and surface exposure. This AChE protein contained multiple potential glycosylation sites, three of which, at positions 38, 89 and 319, were predicted with high confidence on this known glycoprotein (Fig 6).
Aspartic peptidase coproantigen candidates. Only 4 aspartic peptidases appeared in the DE proteins, all of which were expressed more highly in gut-dwelling life stages. Two of these were less reliably identified in this protein family so the other two (SSTP_0000164500 and 164700), with very high expression levels in parasitic female worms and both orthologues of E/S proteins, were analysed for S. stercoralis specificity. An additional 9 homologous proteins from S. stercoralis itself were included and all 11 proteins were then compared with the outgroups (Fig 4D, arrowed). The two DE proteins had low sequence similarity with each other. However, BLAST searching indicated that most of the possible species-specific regions lacked Strongyloides specificity and only one suitable peptide was identified from SSTP_0000164500 (Table 3). 3D modelling of this protein could not be achieved with any reliability and there were no predicted epitopes in either of the DE aspartic peptidases. The only candidate coproantigen from this protein family was selected for its Strongyloides specificity, high expression and likely presence in E/S material.
Prolyl oligopeptidase (POP) coproantigen candidates. The 5 differentially expressed POP proteins, all very highly expressed in parasitic females, were combined with a further 10 S. stercoralis homologues. In the phylogenetic tree, all 5 clustered with other Strongyloides proteins and away from non-Strongyloides outgroups (Fig 4E). Four of the DE proteins (SSTP_0000289100, 1108800, 1108500, 1019400) had higher species-specificity, clustering with other S. stercoralis homologues (Fig 4E, arrowed). One of these 4 proteins, SSTP_0000289100, contained two BepiPred-predicted epitopes and was very highly expressed in parasitic females, as well as being in the E/S orthologues (Fig 3 and S2 File). However, the epitope regions had moderate sequence identity to other nematodes, a Staphylococcus species and to Plasmodium vivax, suggesting widespread conservation. Therefore, these peptides were not considered sufficiently specific to Strongyloides.
Analysis of the POP proteins in multiple sequence alignment and subsequent search for similar proteins using BLASTP against NCBI nr and Nematoda databases revealed several regions of high specificity to S. stercoralis in one of the DE proteins mentioned above (SSTP_0001108800). Therefore, although it did not contain predicted epitopes, this protein was orthologous to E/S proteins and Strongyloides-specific peptides were selected as candidate coproantigens (Table 3). Amino acid motifs conserved across genera, and therefore not Strongyloides specific, were removed from the sequences originally selected from the MSA; these included DKLEN, KTDSK, RNAH and DIFAFI.
Table 3 presents selected candidate coproantigen protein regions that satisfy the criteria of being present in stool, specific to Strongyloides or S. stercoralis, and being antigenic. The full amino acid sequences of all candidate antigens are given in FASTA format in S6 File.

Discussion
The paucity of data on S. stercoralis infection prevalence and its low profile compared with the other STH species are largely due to inadequate diagnostics and lack of a single gold standard. While serology has the highest sensitivity for active disease, it is unsuitable for monitoring treatment outcome or defining cure in a timely manner [23,24]. Incomplete cure and reinfection post-treatment may occur [8,61,62]. Therefore, we aimed to identify specific diagnostic targets from the nematode that could be captured by a rapid antigen detection test on stool samples. Such coproantigen assays are commercially available for Giardia and Cryptosporidium and have been developed for a wide range of human and animal parasites including Ascaris [63], Fasciola [64], Echinococcus [65], S. ratti [28], S. venezuelensis [29,30], Opisthorchis [66], Toxocara [67] and Entamoeba histolytica [68], among others. These assays employ either somatic, E/S material, or known antigens as targets.
We used open access data sources, published literature and freely-accessible online protein analysis tools to shortlist candidate antigens, based on three criteria: presence in infected stool, Strongyloides or S. stercoralis specificity, and antigenicity. A similar study by Culma (2021) investigated cellular location, antigenicity and allergenicity, among other features, in the S. stercoralis proteome to identify potential vaccine and diagnostic targets [69]. Recently, Dishnica et al. (2023) [70] published the somatic proteome of S. stercoralis iL3 isolated from a clinical case and applied a similar computational pipeline to discover possible novel serological antigens. While their main candidates do not overlap with those from the present study, three appear in the DE dataset (S2 File) because they were more highly expressed in non-gut-dwelling life stages, predominantly iL3, as would be expected (Q9UA16/SSTP_0001008900, A0A0K0E2F4/SSTP_0000367400, A0A0K0DTP5/SSTP_0000060800), and three proteins were present in the E/S orthologues (A0A0K0E6J0/SSTP_0000511900, Q9UA16/SSTP_0001008900, A0A0K0DTP5/SSTP_0000060800). This suggests that there may be overlap between somatic and E/S proteomes, which is worth investigating in future studies. Here, we focused on proteins that were differentially expressed between gut-dwelling and non-gut-dwelling life stages of S. stercoralis, according to RNA-seq data [37]. Seven protein families have been identified by other studies as expanded in the genomes of parasitic nematodes, and upregulated in parasitic life stages [38]. Studies of the Ancylostoma hookworm E/S proteome and S. venezuelensis and S. stercoralis somatic larval proteomes detected some of the same families (SCP/TAPS, proteases and TTL), indicating possible presence in stool [70][71][72]. However, specific proteins within these families must still be selected as candidate antigens because not every protein is expressed simultaneously.
To identify coproantigen diagnostic targets, we have performed a detailed analysis of 5 of the 7 parasitism-associated protein families. We also identified S. stercoralis orthologues of the S. ratti E/S proteome [34,48].
We found an overlap of 77 proteins (5.9% of the total) between DE proteins and E/S orthologues. This limited overlap may reflect post-transcriptional control of expression [34]. Thus, grouping together transcriptomic data of life stages with different gene expression profiles may have limited our resolution of stage-specific coproantigens. E/S proteomics are an alternative starting point for coproantigen discovery. However, when we viewed gene expression data for all the E/S orthologues, no single life stage accounted for all the parasitic female E/S proteome. Although the S. stercoralis E/S orthologues broadly represented the S. ratti E/S proteome dataset, there were 1,057 orthologues compared with 550 proteins in the original S. ratti dataset. Therefore, differential gene expression between species is also likely to be a complicating factor and, ultimately, the E/S proteome of the species of interest would be most suitable for coproantigen discovery. However, shared epitopes between these closely related species still warrant the use of all available data [52].
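The reported overlap can be reproduced with simple set arithmetic. The sketch below assumes that "the total" refers to the union of the 328 DE proteins and the 1,057 E/S orthologues given in the abstract; this reading is an assumption, although it does yield the reported 5.9%. The identifiers are placeholders sized to match the published counts, not real accessions, and `overlap_summary` is an illustrative helper name.

```python
def overlap_summary(de_ids, es_ids):
    """Count proteins shared by two datasets and express the overlap
    as a percentage of all unique proteins (the union)."""
    de, es = set(de_ids), set(es_ids)
    shared = de & es
    union = de | es
    percent = round(100 * len(shared) / len(union), 1) if union else 0.0
    return {"shared": len(shared), "union": len(union), "percent_of_union": percent}


# Placeholder identifiers: 328 DE proteins, 1,057 E/S orthologues, 77 in common.
de_ids = [f"P{i}" for i in range(328)]
es_ids = [f"P{i}" for i in range(328 - 77, 328 - 77 + 1057)]
print(overlap_summary(de_ids, es_ids))
# {'shared': 77, 'union': 1308, 'percent_of_union': 5.9}
```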

Priority protein families and species specificity
Phylogenetic trees of S. stercoralis DE proteins and their homologues from a selection of outgroups focused our analysis on individual proteins from the priority protein families. These proteins were more likely to contain species-or genus-specific regions.
SCP/TAPS. The first of the priority protein families analysed here, SCP/TAPS, is among the CAP domain-containing proteins and is proposed to have a role in modulating the host immune response [38]. Here, we identified 21 SCP/TAPS proteins and 6 additional CAP domain-containing proteins among the DE proteins. Many of the species-specific proteins or regions in this protein family were highly expressed in tissue-migrating larvae rather than gut-dwelling life stages; therefore, only one protein expressed in adult female worms was the source of candidate antigens. This protein family has been studied in other parasitic nematodes, particularly the hookworms Ancylostoma caninum and Necator americanus, in which it has numerous members [73,74]. Several of these SCP/TAPS proteins, which include the NIE antigen, have been demonstrated to be antigenic and recognised by anti-E/S antisera, showing that they do possess at least two of the characteristics predicted by this pathway [73].
Transthyretin-like proteins. Transthyretin-like proteins have been identified in the E/S material of nematodes from 5 major clades, including parasitic and non-parasitic species [75]. Two TTL proteins were sources of candidate coproantigens in the present study, both of which were expressed highly in the parasitic female worm and very little in other life stages. Neither of these DE proteins was among the E/S orthologues; however, other TTL proteins were, and these were expressed highly across multiple life stages. In addition, a TTL protein has been identified as a potential vaccine or diagnostic candidate through a similar bioinformatic pipeline, adding to the evidence for this protein family as a source of diagnostic targets [69]. Various functions have been identified for TTL proteins including involvement in larval development, innate immunity of the nematode itself, apoptosis, and subduing the immune response to favour worm survival [76][77][78].
Acetylcholinesterase. Secreted AChE has a possible role in enabling certain parasitic helminths to evade host expulsion mechanisms from mucosal surfaces [38,79,80]. The transcriptomic and E/S proteomic data strongly supported this, with AChE family proteins in the E/S orthologues being expressed almost exclusively in the parasitic female life stage (S4 File). Secreted AChE differs from the neuromuscular protein in structure, gene family and substrate, being less specific to acetylcholine [79]. We found moderate homology between the predicted epitope regions of an S. stercoralis AChE and other nematode species. We identified candidate antigens in two AChE proteins. One of the candidate antigen proteins (SSTP_0000274700) contained 8 regions of Strongyloides specificity once less specific regions were excluded, comprising over 50% of the full-length 607 aa protein.
Aspartic peptidase. The aspartic peptidase family of enzymes is named for the creation of the active site from aspartic acid residues. Very few aspartic peptidases were DE, but those that were had high expression in parasitic females and were homologous to S. ratti E/S proteins. Aspartic peptidases play numerous roles in parasitic helminths including digestion, immune evasion and tissue invasion, among others linked to feeding within the host gut epithelium where the adult worm resides [81]. We identified a single candidate antigen among the few aspartic peptidases examined. It had considerable homology to aspartic proteases from other Strongyloides species but little to any other relevant species, whereas the full protein had high homology to multiple relevant nematode species, including hookworm.
Prolyl oligopeptidase. Five POP family proteins were differentially expressed between gut and non-gut dwelling life stages. All 5 were very highly expressed in parasitic females, where they may be involved in defending against the host immune and parasite expulsion response. In the trematode Schistosoma mansoni, POP enzymes have been found to cleave peptide hormones and neuropeptides [82]. Inhibiting POP activity in S. ratti led to immobility in a concentration-dependent manner, even in in vitro conditions, indicating that this protein family is also vital to worm survival [48]. Only one of the 5 POP proteins contained regions of sufficient Strongyloides specificity for consideration as a coproantigen.
In addition to the shorter, specific sequences listed here, there is broader potential to express whole recombinant proteins such as those indicating higher Strongyloides specificity, in order to raise polyclonal antibodies for screening. Alternatively, specific monoclonal antibodies could be developed against these whole proteins and again, screened for Strongyloides specificity, as conducted by Abduhaleem et al. (2019) with a monoclonal antibody raised against S. ratti somatic antigens [83].

Epitope prediction
We performed epitope prediction on the DE proteins and E/S proteome homologues using two open access online tools, which yielded many predicted epitope peptides. An alternative to this would be to scan the entire genome for epitopes, a method applied to vaccine candidate discovery [84]. The challenge faced by this approach is the complexity of conformational epitopes compared with linear peptide epitopes. Antibodies frequently bind to conformational epitopes formed by the 3D structure of the antigen, which therefore cannot easily be detected by sequence analysis alone [56].
The availability of 3D protein models can assist with selecting conformational epitopes by modelling a sequence onto the structure of a homologous protein and revealing adjacent amino acids on the surface of the protein [85]. Models do not necessarily have high sequence identity to the query sequence but this does not decrease the confidence in the model. Confidence in 3D models of >90% indicates that the protein adopts the overall folds of the model but may differ from the native protein in surface loops [57], thus this method is not guaranteed, but provides a good indication for selecting candidate antigens. Ab initio-modelled regions, where the sequence was not covered by the model, have very poor accuracy and should therefore be interpreted with caution and not used as the sole basis for selecting conformational epitopes. The field of 3D structure modelling has been progressed in leaps by the use of machine learning [86,87], with AlphaFold recently being applied to Strongyloides proteomic data [70].
We saw differences in predicted epitopes between the computational and 'by eye' approaches to selecting epitope regions. The DE protein dataset contained longer predicted epitopes because of the decision, where relevant, to extend predicted epitopes across a short region of lower epitope score, whereas the computational selection applied only the exact score threshold and would not join two adjacent high-scoring regions.
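The difference between the two selection approaches can be sketched in Python. This is an illustration under stated assumptions rather than the pipeline used in the study: the per-residue scores, the threshold and the `max_gap` parameter are all hypothetical. With `max_gap=0` the function reproduces a strict threshold-only selection; a positive `max_gap` mimics the 'by eye' decision to bridge a short run of lower-scoring residues.

```python
def epitope_regions(scores, threshold, max_gap=0, min_length=1):
    """Return (start, end) pairs (0-based, end exclusive) of residues scoring
    at or above threshold, joining regions separated by at most max_gap
    below-threshold residues."""
    regions = []
    start = None
    for i, score in enumerate(scores):
        if score >= threshold:
            if start is None:
                start = i
        elif start is not None:
            regions.append([start, i])
            start = None
    if start is not None:
        regions.append([start, len(scores)])

    # Merge neighbouring regions whose separating gap is short enough.
    merged = []
    for region in regions:
        if merged and region[0] - merged[-1][1] <= max_gap:
            merged[-1][1] = region[1]
        else:
            merged.append(list(region))
    return [(a, b) for a, b in merged if b - a >= min_length]


# Hypothetical per-residue epitope scores.
scores = [0.1, 0.6, 0.7, 0.2, 0.8, 0.9, 0.1]
print(epitope_regions(scores, threshold=0.5))             # strict threshold
print(epitope_regions(scores, threshold=0.5, max_gap=1))  # bridge one residue
```

In this example the strict selection yields two short regions, while allowing a one-residue gap yields a single longer region, which mirrors the longer epitopes seen in the DE protein dataset.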
Glycoprotein antigens were not considered in this study, apart from the presence of potential N-linked glycosylation sites on candidate protein antigens. Glycans form existing species-specific, highly antigenic diagnostic antigens, including CCA and CAA of Schistosoma mansoni and Schistosoma genus trematodes respectively [88], and LAM of Mycobacterium tuberculosis [89]. They have also been implicated in lysate seroantigen of S. stercoralis [90]. In helminths, glycan structures may not only be species-specific, but also life-stage specific [91]. Glycans may obscure some of the protein epitopes predicted here, particularly in the secreted AChE which is highly glycosylated. This is to be expected as glycans form many of the host-parasite interactions [91]. In addition, secreted candidate antigens may also contain O-linked glycans, via oxygen atoms of serine or threonine, which are not easily predicted. Although we have excluded potential glycan epitopes, they could be accounted for to some extent by expressing antigens of interest, ideally in a closely-related system, potentially C. elegans or even Strongyloides itself [92]. As an alternative, glycans could be excluded altogether by synthesising peptides or expressing recombinant proteins in bacteria, thus focusing the antigen search purely on proteins, as we have done here.
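Potential N-linked glycosylation sites are conventionally flagged by the sequon N-X-S/T, where X is any residue except proline. The Python sketch below scans a sequence for this pattern; it is a simple illustration and not necessarily the tool used in this study. A sequon match indicates only a potential site, since actual occupancy depends on structural context and the expression system. The example sequence is invented.

```python
import re

# N-X-[S/T] sequon with X != P. The lookahead leaves the following residues
# unconsumed, so overlapping sequons (e.g. "NNTS") are all reported.
SEQUON = re.compile(r"N(?=[^P][ST])")


def n_glyc_sites(sequence):
    """Return 1-based positions of asparagines in N-X-[S/T] sequons."""
    return [m.start() + 1 for m in SEQUON.finditer(sequence.upper())]


# Hypothetical sequence for illustration only.
print(n_glyc_sites("MKNGSANPTLNNTS"))  # [3, 11, 12]
```

Candidate peptides overlapping a flagged position could then be deprioritised or tested in both glycosylated and non-glycosylated form.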

Limitations and future work
We have described a methodological approach to discovery of diagnostic antigens. The process is predictive, based on computational analyses of 'omic' data. Predictions rely on the integrity of data sources, efficacy of software, and correct use of cut-off values at multiple stages of analysis. WormBase ParaSite, the source of most of the sequences used, is a curated database, regularly updated with high quality additional data and regarded as predominantly reliable. Thus, large-scale sequencing errors that might affect our use of software, such as antigen prediction or BLAST analysis, are rare. Single nucleotide and small-scale sequencing errors are unlikely to affect results of the predictive pathways. With the E value cut-off used to select matching proteins by BLAST analysis, some proteins returned multiple hits. This may be due to paralogs, or may reflect an E value that is too lenient. Even in the latter case, over-inclusion of proteins at this stage, pared down by downstream analysis, is preferable to rejection of candidates. Proof of candidate antigen validity requires follow-up experimental laboratory research, which can reveal whether constraints and cut-off values have been too strict or too lenient.
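The effect of the E value cut-off on multiple hits can be explored with a short Python sketch. It assumes BLAST tabular output with the default 12 columns (`-outfmt 6`), in which the E value is the 11th field. The cut-off values and sequence identifiers below are hypothetical and are not the ones used in this study.

```python
import csv
import io
from collections import defaultdict


def hits_by_query(blast_tsv, max_evalue):
    """Group distinct subject IDs by query for hits with E value <= max_evalue.
    Expects default BLAST tabular columns (evalue is the 11th field)."""
    hits = defaultdict(list)
    for row in csv.reader(io.StringIO(blast_tsv), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank and comment lines
        query, subject, evalue = row[0], row[1], float(row[10])
        if evalue <= max_evalue and subject not in hits[query]:
            hits[query].append(subject)
    return dict(hits)


# Hypothetical tabular BLAST output (identifiers are placeholders).
example = (
    "query1\tsubjectA\t90.0\t100\t10\t0\t1\t100\t1\t100\t1e-50\t200\n"
    "query1\tsubjectB\t60.0\t100\t40\t0\t1\t100\t1\t100\t1e-12\t80\n"
    "query2\tsubjectC\t40.0\t50\t30\t0\t1\t50\t1\t50\t0.5\t20\n"
)

lenient = hits_by_query(example, max_evalue=1e-10)
strict = hits_by_query(example, max_evalue=1e-30)
multi_hit = {q: s for q, s in lenient.items() if len(s) > 1}
print(lenient, strict, multi_hit)
```

Comparing the multi-hit queries at two cut-offs gives a quick indication of whether multiple hits persist under a stricter threshold, as might be expected for paralogs, or disappear, which would suggest the original threshold was lenient.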
Another limitation is prediction of antigens that are present in the stool. In our pipeline, a protein being excreted or secreted by a gut-dwelling stage of the parasite has been used as a proxy for presence in the stool. This rests on the assumption that the protein passes through the gut, is present in the stool in sufficient quantity and with antigenic properties unchanged by digestion or denaturation. As S. stercoralis establishes and matures in the small intestine, E/S proteins will not be subject to the denaturation and enzymatic degradation in the stomach but may be affected by enzymes in the small intestine. However, post-parasitic larvae, which hatch from eggs laid in the intestine, migrate through the small and large intestine, and can be found in faeces, so any E/S proteins produced by these stages, especially lower down in the intestinal tract, are less likely to be affected by host processes. The validity of these assumptions can be investigated by analysis of infected stool and an appropriately sensitive diagnostic developed based on laboratory-confirmed antigens and validation in the field.
A further consideration is the focus of the approach on differentially expressed proteins and certain parasitism-associated protein families. There is a rationale for targeting proteins with these characteristics, because they are proven to be expressed at higher levels in the relevant life cycle stages, are likely to be excreted or secreted, and are likely to be species- or genus-specific due to adaptive radiation and protein family expansion. However, the focus on these groups limits the scope of an 'omic' approach, because proteins that do not belong to one of these two groups might still have the characteristics of a candidate coproantigen. The approach presented here can therefore be applied to other helminth infections, but may be complemented by an expanded approach that uses whole genomic or proteomic data without focussing on parasitism-associated protein families or life stages, with the results of the two methods being used in conjunction.
Other protein families with potential to be sources of diagnostic coproantigens include: enolase, common to Schistosoma japonicum [93], Echinostoma and Fasciola [94], Onchocerca [95] and Trichuris [96], as well as among S. stercoralis E/S orthologous proteins reported here and with constitutive high expression across all life stages (S4 File); astacins and proteinase inhibitors, as other parasitism-associated protein families [38]; protein 14-3-3 from E/S and somatic extract of Strongyloides [70,97,98], Ascaris [99], Schistosoma and Ancylostoma [72]. A further candidate is collagen, which forms some 80% of the outer cuticle of the nematode and was prominent among the DE proteins [100]. As an antigen, collagen would have to be carefully analysed for species specificity and the influence of glycosylation on its availability to capture antibodies [101].
Possible data sources for further antigen searches continue to grow and include: an analysis of E/S proteomes by Tritten et al. (2021) [75]; analysis of S. venezuelensis somatic proteins by Fonseca et al. (2019) [71], and S. venezuelensis E/S proteins by Maeda et al. (2019) [102]; also S. stercoralis iL3 somatic proteins by Dishnica et al. (2023) [70]. Somatic protein datasets would include tegument antigens that might be present in stool if whole or parts of larvae were present, or if the proteins were secreted via non-classical pathways, as proposed for some S. venezuelensis proteins [102]. Vaccine candidates for S. stercoralis, namely sodium potassium ATPase (SsEAT), tropomyosin, and a galectin (LEC-5) [103,104], may also generate effective antibodies for antigen capture if the relevant antigens are detectable in stool, as could the serological antigens SsIR [105] and NIE [106], the latter of which has been detected in serum [107].
The most immediate work arising from this research is to validate the 'omic' pathway via wet laboratory analyses, specifically proteomic methods to detect and assess the antigenicity of proteins present in stool. Production of predicted candidates followed by antigenic testing could also be used to investigate the results of the 'omic' pathway, as well as to develop a lateral flow design for successful peptides. The findings could be used to inform and refine results from the 'omic' pipeline. However, well-characterised stool sample collections are required to pursue diagnostic development.
Antigenic diversity must be considered, due to geographic differences between nematode strains [108]. This has impacted on diagnosis and vaccination for other parasitic infections [109,110]. Diversity can be readily investigated by amplifying and sequencing the genes of interest from a wide geographic range of samples, or investigating differences between actual protein sequences from clinical isolates [70] and their predicted counterpart from genomes. For S. stercoralis, this is especially required, because the reference genome strain PV001 originates from a dog infection with UPD (University of Pennsylvania dog) strain [111].

Conclusion
We have presented a detailed analysis of S. stercoralis proteins, leading to the selection of diagnostic coproantigens. We have identified multiple S. stercoralis candidate protein antigen sequences with evidence for their specificity to Strongyloides or S. stercoralis from phylogenetic and sequence comparison with relevant other species. Evidence supporting their presence in infected stool was assessed by membership of parasitism-associated protein families, upregulation in gut-dwelling life stages, presence in E/S material of other helminths, and being among S. stercoralis orthologues of the S. ratti E/S proteome. Antigenicity was predicted using epitope prediction tools and 3D structure modelling. Peptides or whole proteins analysed and presented here form a selection of promising candidates for raising antibodies against and capturing S. stercoralis antigen in stool, with potential adaptation to prototype point-of-care rapid diagnostic tests. Access to coproantigen tests for Strongyloides will speed up efforts to control this neglected public health problem.

Overlap is indicated with proteins which were also differentially expressed (DE) with 100% certainty between gut-dwelling and non-gut-dwelling life stages, as determined for this study. Proteins containing predicted epitopes according to two tools, BepiPred 1.0 and bcepred, are indicated, as are proteins predicted by SignalP to contain a signal peptide. Accession numbers starting ERR refer to RNA-seq data in NCBI SRA and those starting SSTP are S. stercoralis gene transcripts which may be found in UniProt or WormBase ParaSite. Life-stage abbreviations are explained in the manuscript text.