Skip to main content
Advertisement
  • Loading metrics

Transformer-based deep learning enables improved B-cell epitope prediction in parasitic pathogens: A proof-of-concept study on Fasciola hepatica

  • Rui-Si Hu,

    Roles Data curation, Formal analysis, Funding acquisition, Methodology, Software, Writing – original draft

    Affiliations School of Health and Wellness Industry & School of Medicine, Sichuan University of Arts and Science, Dazhou, Sichuan Province, People’s Republic of China, Key Laboratory of Intelligent Medicine and Health Data Science, Sichuan University of Arts and Science, Dazhou, Sichuan Province, People’s Republic of China

  • Kui Gu ,

    Roles Conceptualization, Funding acquisition, Methodology, Writing – review & editing

    gukui0404@cqut.edu.cn (KG) chunrenwang@sohu.com (CRW)

    Affiliation School of Pharmacy and Bioengineering, Chongqing University of Technology, Banan, Chongqing, People’s Republic of China

  • Muhammad Ehsan,

    Roles Data curation, Formal analysis, Writing – review & editing

    Affiliation Department of Parasitology, Faculty of Veterinary and Animal Sciences, The Islamia University of Bahawalpur, Punjab, Pakistan

  • Sayed Haidar Abbas Raza,

    Roles Methodology, Software, Writing – review & editing

    Affiliation State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-Sen University, Guangzhou, Guangdong Province, People’s Republic of China

  • Chun-Ren Wang

    Roles Conceptualization, Funding acquisition, Methodology, Writing – review & editing

    gukui0404@cqut.edu.cn (KG) chunrenwang@sohu.com (CRW)

    Affiliation College of Animal Science and Veterinary Medicine, Heilongjiang Bayi Agricultural University, Daqing, Heilongjiang Province, People’s Republic of China

Abstract

Background

The identification of B-cell epitopes (BCEs) is fundamental to advancing epitope-based vaccine design, therapeutic antibody development, and diagnostics, such as in neglected tropical diseases caused by parasitic pathogens. However, the structural complexity of parasite antigens and the high cost of experimental validation present certain challenges. Advances in Artificial Intelligence (AI)-driven protein engineering, particularly through machine learning and deep learning, offer efficient solutions to enhance prediction accuracy and reduce experimental costs.

Methodology/Principal findings

Here, we present deepBCE-Parasite, a Transformer-based deep learning model designed to predict linear BCEs from peptide sequences. By leveraging a state-of-the-art self-attention mechanism, the model achieved remarkable predictive performance, achieving an accuracy of approximately 81% and an AUC of 0.90 in both 10-fold cross-validation and independent testing. Comparative analyses against 12 handcrafted features and four conventional machine learning algorithms (GNB, SVM, RF, and LGBM) highlighted the superior predictive power of the model. As a case study, deepBCE-Parasite predicted eight BCEs from the leucine aminopeptidase (LAP) protein in Fasciola hepatica proteomic data. Dot-blot immunoassays confirmed the specific binding of seven synthetic peptides to positive sera, validating their IgG reactivity and demonstrating the model’s efficacy in BCE prediction.

Conclusions/Significance

deepBCE-Parasite demonstrates excellent performance in predicting BCEs across diverse parasitic pathogens, offering a valuable tool for advancing the design of epitope-based vaccines, antibodies, and diagnostic applications in parasitology.

Author Summary

Antigen-antibody interactions are critical events in the humoral immune response, facilitating the recognition and neutralization of invasive parasites. BCEs, defined as surface-exposed clusters of amino acids recognized by B-cell receptors or antibodies, play a critical role in initiating a humoral immune response. This study focuses on the identification of parasite BCEs, which serve as promising targets for the development of vaccines, therapeutic antibodies, and diagnostic tools. To this end, we developed a deep learning model, termed deepBCE-Parasite, which was rigorously benchmarked against and integrated with traditional machine learning models. By leveraging state-of-the-art AI techniques, these models enable rapid and precise BCE identification directly from amino acid sequences, rendering it particularly suitable for large-scale epitope screening. As a proof-of-concept, we applied these AI-driven models to predict BCEs in F. hepatica, a globally distributed parasite responsible for fascioliasis, a neglected tropical disease. Utilizing available proteomics data of this trematode species, we identified peptides exhibiting high specificity for antibody binding. This work highlights the potential of AI in advancing epitope prediction within parasitology, providing a rapid, scalable, and cost-effective strategy for discovering immune targets.

Introduction

Parasitic diseases, caused by both established and emerging infections, remain a significant global health challenge, particularly neglected tropical diseases (NTDs) associated with foodborne parasites [1]. These parasites often cause opportunistic infections in humans and livestock, imposing not only significant public health burdens on humans but also considerable and often underestimated economic losses in the livestock industry [2]. Notable examples of such NTDs include fascioliasis, caused by Fasciola hepatica and Fasciola gigantica, which pose significant economic and public health challenges globally [3,4]. Although conventional broad-spectrum antiparasitic drugs are effective in controlling or eradicating these parasites, several limitations and challenges hinder their long-term efficacy and safety. Among these, the growing emergence of drug resistance represents as a major hurdle, driven by factors such as genomic evolution, improper drug administration, environmental factors, and limited availability of novel therapeutics [5,6]. Consequently, antibody-based therapies and vaccine development are increasingly being explored as promising alternatives for the prevention and control of parasitic infections.

Upon host invasion, parasite and its molecular components act as antigens that trigger host immune response. Some specific antibodies recognize BCEs, which are short amino acids sequences on antigenic proteins, thereby mediating humoral immunity against pathogens [7]. Nevertheless, the identification of BCE remains challenging due to the complex interactions between antibody variable regions and epitope sequences [8]. BCEs can be categorized as either conformational or linear epitopes [9]. Conformational epitopes comprise discontinuous amino acid residues that are spatially adjacent through protein folding, requiring structural modeling for precise prediction. In contrast, linear epitopes consist of continuous amino acid sequences that maintain their antigenicity even under denatured or partially unfolded conditions. Traditional in silico approaches with respect to the prediction of parasite BCEs predominantly depend on bioinformatics screening coupled with experimental immunogenicity validation [1013]. Although computational methodologies have advanced significantly, the identification of BCEs using large-scale datasets has not been thoroughly investigated in parasitic immunology.

Machine learning and deep learning approaches have been extensively applied in the prediction of BCEs. For conformational epitopes, various algorithms, including Support Vector Machine (SVM), Adaptive Boosting (AdaBoost), Logistic Regression (LR), eXtreme Gradient Boosting (XGBoost), and pre-trained protein structure models (e.g., ESM-2), have demonstrated exceptional predictive accuracy [1420]. Regarding linear BCEs, sequence-based approaches employing Recurrent Neural Networks (RNN), Language Models (LM), Extremely Randomized Tree (ERT), Gradient Boosting (GB), Random Forest (RF), and SVM have been extensively implemented [2127]. In the field of parasitology, the post-genomic era has witnessed two significant developments: the completion of genomic sequencing projects for diverse protozoan and helminthic parasites [28,29], and considerable advancements in multi-omics technologies, particularly transcriptomics and proteomics [3032]. These progressions have notably expanded parasite datasets, providing unprecedented opportunities to integrate AI models with multi-omics data, thereby enhancing prediction accuracy while reducing experimental costs.

In this study, we developed a Transformer-based deep learning model to predict linear BCEs in parasites. Trained on experimentally validated datasets from diverse parasite species, the model achieved an accuracy of 80.97% and an AUC of 0.9 on an independent test set. To benchmark its performance, we compared it with traditional machine learning approaches using 12 feature extraction techniques and four classifiers. Furthermore, we applied our models to proteomic data from the liver fluke Fasciola hepatica, predicting eight peptide sequences derived from the leucine aminopeptidase (LAP) protein. These peptides were synthesized and experimentally validated via dot-blot immunoassays, revealing that seven exhibited specific binding to antibodies in F. hepatica-positive ovine sera. Our findings demonstrate the efficacy of Transformer-based and machine learning models in predicting BCEs in parasites, providing valuable insights for the development of antibody-based therapeutics, peptide vaccines, and immunodiagnostic tools.

Methods

Dataset preparation of parasites

Although numerous prediction tools for BCEs have been developed (Table 1), none have been specifically trained on parasitic datasets, thus limiting their applicability to parasite-specific antigens. Moreover, the effective AI model training necessitates a well-curated dataset comprising both positive and negative samples. In this study, positive and negative samples of parasite BCEs were systematically collected from the Immune Epitope Database (IEDB) [33] and Bcipep database [34] (retrieved in April 2024). Initially, 8,128 positive BCE sequences were obtained. To mitigate redundancy and enhance dataset diversity for robust model training, sequence clustering was performed using CD-HIT v4.8.1 [35] with a similarity threshold of 0.8, yielding a refined dataset of 5,752 unique positive peptide sequences. The IEDB provides an experimentally validated negative dataset of parasite BCEs, which were extracted, deduplicated using the CD-HIT software, and numerically balanced to match the positive sample set. The final dataset was divided into training and test sets in an 80:20 ratio, ensuring rigorous model training and evaluation.

thumbnail
Table 1. A summary of bioinformatics tools for conformational and linear BCE prediction.

https://doi.org/10.1371/journal.pntd.0012985.t001

Deep learning architecture

Embedding.

The model input consists of amino acid sequences, where each amino acid is represented as a dense vector through an embedding vector. This layer comprises two components: amino acid embedding and positional encoding.

  1. (1). Amino acid embedding:

Each amino acid residue is mapped to a low-dimensional vector using a learned embedding matrix , where =24 (the vocabulary size, representing the 20 standard amino acids plus 4 special tokens) and =256 (the embedding dimension). The embedding matrix is optimized during training via backpropagation to capture semantic relationships between amino acids.

  1. (2). Positional encoding:

To encode sequence order information, positional encoding is added to the amino acid embeddings. The encoding for position is computed as:

Here, denotes the position of the amino acid in the sequence, and , represents even and odd dimensions of the positional encoding vector. The final embedding for each residue is the sum of its amino acid embedding and positional encoding:

Encoder-decoder architecture.

  1. (1). Multi-head attention mechanism:

An encoder-decoder architecture is built around a multi-head attention mechanism, which enables the model to focus on different parts of the input sequence simultaneously. This mechanism is implemented in both the encoder and decoder layers to capture complex dependencies within the sequences. For each attention head, queries (), keys (), and values () are computed through learned linear transformations:

Here, represents the input embeddings, and , , and are learned weight matrices.

The attention scores are computed as:

where =32 is the dimensionality of the keys. The scaling factor ensures stable gradients during training. The multi-head attention mechanism applies this process across 8 parallel attention heads, allowing the model to capture diverse aspects of the input sequence. The outputs from all heads are concatenated and projected back into the model’s dimensional space (=256) using a learned linear transformation.

  1. (2). Feed-forward network (FFN):

Both the encoder and decoder layers contain a FFN, which consists of two linear transformations with a ReLU activation function in between:

Here, and are learned weight matrices, and , are bias terms. The FFN expands the hidden dimension to =2048 to enhance the model’s capacity for nonlinear representation.

  1. (3). Feature optimization block (FOB):

The FOB refines learned features using a 1D convolutional layer (kernel size 3, padding 1), followed by ReLU activation and dropout (=0.3). The convolutional layer extracts local features, while dropout reduces overfitting. The output is projected back to the model’s dimensional space (=256) via a fully connected layer.

  1. (4). Encoder layer:

The encoder layer consists of 2 sub-layers, namely multi-head attention and FFN. The input is first processed by the multi-head attention mechanism, followed by the FNN. Each sub-layer is accompanied by a residual connection and layer normalization:

In addition, we incorporate a FOB to further refine the learned features using convolutional layers and dropout.

  1. (5). Decoder layer:

The decoder layer has a structure similar to the encoder but includes an additional encoder-decoder attention mechanism. This mechanism enables the decoder to focus on the encoder’s output while processing the target sequence, integrating information from both the input and target sequences to generate predictions.

Model output.

The final output from the decoder is passed through an adaptive average pooling layer, followed by a fully connected (FC) layer and a ReLU activation. Dropout is applied to prevent overfitting:

Design of traditional machine learning models

Comparing the performance of different methods in the presence of a large number of handcrafted features is a complex task. To tackle this, we selected twenty representative statistical features as the foundation for our prediction and analysis. These features were derived from classical protein sequence encoding methods, including Amino Acid Composition (AAC), Adaptive Skip Dipeptide Composition (ASDC), Composition of k-Spaced Amino Acid Group Pairs (CKSAAGP), Composition of k-Spaced Amino Acid Pairs (CKSAAP), Dipeptide Deviation from Expected Mean (DDE), Di-Peptide Composition (DPC), Grouped Amino Acid Composition (GAAC), Grouped Di-Peptide Composition (GDPC), Grouped Tri-Peptide Composition (GTPC), Pseudo-Amino Acid Composition (PAAC), Quasi Sequence Order (QSO), and Sequence Order Coupling Number (SOCN). These features were extracted using the iLearnPlus tool [36].

To reduce noise in the feature matrix and negative effects on model performance, we employed the Light Gradient-Boosting Machine (LGBM) algorithm to rank all extracted features based on their importance. The proven capability of LGBM in feature selection makes it an ideal choice for enhancing model performance in high-dimensional tasks [37,38]. From this ranking, the top 200 features were selected. Subsequently, Recursive Feature Elimination (RFE) was applied to further fine-tune the selection, thereby improving both model performance and robustness.

For the classification task, we utilized four classic classification algorithms that are widely adopted in biological sequence analysis and prediction: SVM, RF, LGBM, and Gaussian Naïve Bayes (GNB). These algorithms were chosen due to their demonstrated effectiveness to handle complex, high-dimensional datasets, such as protein sequences, while maintaining robustness and generalizability. To implement and compare the performance of these features and classifiers, we used the scikit-learn (https://scikit-learn.org/stable/) library to execute machine learning algorithms.

Model evaluation metrics

We assessed the performance of all models using four standard metrics: accuracy (ACC), sensitivity (SE), specificity (SP), and the Matthews correlation coefficient (MCC) [3941].

In the aforementioned formulas, “True Positive” (TP) denotes the number of instances correctly classified as positive (i.e., accurately predicted BCEs), while “True Negative” (TN) represents the number of instances correctly classified as negative (i.e., accurately predicted non-BCEs). Conversely, “False Positive” (FP) indicates the number of negative instances erroneously classified as positive (i.e., non-BCEs misclassified as BCEs), and “False Negative” (FN) refers to the number of positive instances erroneously classified as negative (i.e., BCEs misclassified as non-BCEs). ACC indicates the proportion of correctly predicted instances, including both true positives and true negatives. SE measures the proportion of correctly predicted positives among all positive instances, while SP reflects the proportion of correctly predicted negatives among all negative instances. MCC provides a comprehensive assessment of model performance by considering all four prediction categories (TP, FP, TN, FN), offering a balanced assessment. Additionally, the Area Under the Curve (AUC) refers to the area under the Receiver Operating Characteristic (ROC) curve, which illustrates the model’s ability to distinguish between positive and negative samples, with values closer to 1 indicating better classification performance.

A case study for wet-lab evaluation

Peptide identification in the liver fluke Fasciola hepatica.

To validate and evaluate the applicability of our models in parasitic research, we used F. hepatica as a case study. The raw proteomic sequencing data of F. hepatica were retrieved from the iProX database (https://www.iprox.cn/; Project ID: IPX0002165000). This dataset comprises mass spectrometry (MS)-based proteomic data from four developmental stages of F. hepatica: (1) Metacercaria, obtained from Galba pervia snails cultured under suitable conditions for 30–45 days, during which cercariae emerged and underwent cyst formation; (2) Juvenile fluke, isolated from the livers of sheep artificially infected with metacercariae for 28 days; (3) Immature fluke, recovered from the livers of sheep infected for 59 days; and (4) Adult fluke, extracted from the livers of sheep infected for 118 days [42]. The protein sequences of F. hepatica were downloaded from the UniProt database on October 12, 2024, using the keyword “Fasciola hepatica”. Analysis of raw data was conducted using MaxQuant v2.6.5.0 [43], extracting key parameters from the proteinGroups file, including peptide counts, label-free quantification (LFQ) intensity values, sequence coverage, molecular weight, and FASTA headers. To facilitate cross-sample comparisons, LFQ intensity values were normalized using Z-score standardization.

Model prediction of potential BCEs.

Based on MS-based proteomics data, we applied our models to predict potential linear BCEs of F. hepatica. To enhance prediction robustness, we conducted an intersection analysis across multiple models, retaining only epitopes consistently identified by different approaches. Furthermore, to elucidate the biological relevance of the predicted BCEs, we performed subcellular localization analysis of their source proteins using DeepLoc 2.0 [44] and assessed protein-protein interaction networks for key proteins via the STRING database [45].

Synthesis, antigenicity, and experimental validation of BCEs via dot-blot immunoassays.

To experimentally validate the predicted BCEs, eight peptides derived from the LAP protein (Uniport ID: A0A345G0S4) were synthesized, alongside a human peptide sequence (RSRTPSLPTPPTREP) used as a control. All peptides were chemically synthesized by Sangon Biotech Co., Ltd. (Shanghai, China), with the purity of the synthesized peptides confirmed to exceed 95% through high-performance liquid chromatography (HPLC) and MS analyses. Dot-blot immunoassays were performed on each nitrocellulose membrane (purchased from Beyotime Biotech Co., Ltd., Shanghai, China), with three peptides (10 μg of each) spotted per membrane. These membranes were air-dried, blocked with PBS containing 2% non-fat milk, and incubated overnight at 4°C with F. hepatica-positive and negative (control) ovine serum antibodies (diluted 1:400). Following washing, the membranes were incubated at room temperature for 1 hour with horseradish peroxidase (HRP)-conjugated rabbit anti-ovine IgG (H+L) (AS029, diluted 1:3000; ABclonal Biotech Co., Ltd., Wuhan, China). The binding signals were detected using the SuperPico ECL Master Mix (Vazyme Biotech Co., Ltd., Nanjing, China) and visualized with an enhanced chemiluminescence (ECL) detection system. Signal images were captured using a digital imaging platform. All peptides were tested in three independent experimental repetitions, and only peptides that consistently exhibited positive dot signals across all three replicates were confirmed as true BCEs specific to IgG antibodies.

Results

Design of the deep learning pipeline

We proposed a deep learning pipeline, named deepBCE-parasite, based on a custom Transformer architecture for predicting BCEs in parasite proteins. This model was trained on a balanced dataset containing 5,752 positive and 5,752 negative BCEs, with amino acid features encoded using amino acid embedding and positional encoding. The architecture employs a binary classification framework using an encoder-decoder structure (Fig 1A). The innovation is the integration of a multi-head attention mechanism within this encoder-decoder framework, which enables efficient feature extraction and context-aware sequence modeling (Fig 1B). Model hyperparameters include a vocabulary size of 24, an embedding dimension of 256, two encoder layers, eight attention heads, and a feedforward hidden layer dimension of 2,048. Detailed parameter settings are provided in S1 Table. This architecture offers a scalable and robust solution for epitope identification in complex parasite protein sequences.

thumbnail
Fig 1. Schematic representation of the study workflow.

(A) Overview of the proposed framework, highlighting a comparative architecture between the Transformer-based model (upper section) and the traditional machine learning model (lower section). (B) A detailed depiction of the Transformer-based model architecture, emphasizing its structure comprising 2 encoder layers and 8 attention heads. The model processes input amino acid sequences via an embedding layer, positional encoding, multi-head self-attention, and feature optimization modules, before classifying the sequences through a fully connected layer.

https://doi.org/10.1371/journal.pntd.0012985.g001

The model was trained with a batch size of 512, utilizing 80% of the data for training and 20% for testing in each fold. Before training, as shown in S1 Fig, we conducted a comprehensive analysis of amino acid frequencies and peptide length distributions in both the training and test datasets. The results indicated that positive samples exhibited elevated frequencies of amino acids D, E, I, K, N, Q, and Y compared to negative samples, while amino acids A, G, H, L, P, R, S, T, and V occurred less frequently in positive samples. Furthermore, analysis of length distribution demonstrated that positive samples predominantly contained peptides ranging from 10 to 15 amino acids in length, whereas negative samples were primarily 15 to 20 amino acids long. Upon completion of training, deepBCE-parasite achieved an average performance on the training set, with an ACC of 81.91%, an AUC of 0.89, a SE of 71.74%, a SP of 82.09%, and a MCC of 0.64. Rigorous evaluation via 10-fold cross-validation on the balanced BCE dataset yielded an average performance on the independent test set, with an ACC of 80.97%, an AUC of 0.90, a SE of 69.64%, a SP of 89.26%, and a MCC of 0.61. These results are summarized in Table 2.

thumbnail
Table 2. Performance of the Transformer-based deep learning model for predicting BCEs in parasites across training and independent test datasets.

https://doi.org/10.1371/journal.pntd.0012985.t002

Selection of conventional machine learning models based on handcrafted features

Following the model establishment of the deep learning pipeline, we benchmarked its performance against traditional machine learning models to identify the most appropriate approaches for subsequent BCE screening. As shown in Fig 1A, this study also incorporated a traditional machine learning pipeline, which evaluated commonly used algorithms, including SVM, RF, LGBM, and GNB. To ensure a fair comparison, the same training and testing datasets were used across both deep learning and machine learning. A total of twelve handcrafted features were utilized in this study, including AAC, ASDC, CKSAAGP, CKSAAP, DDE, DPC, GAAC, GDPC, GTPC, PAAC, QSO, and SOCN, which were then applied to the four machine learning algorithms. Hyperparameters for the classifiers were optimized via Grid Search, with the detailed settings provided in S1 Table.

The performance results showed that features derived from the deep learning model consistently outperform those handcrafted features across all four algorithms, yielding a superior ROC curve (AUC=0.90) and highlighting the enhanced evaluation performance of deep learning over machine learning methods (S2 Fig). In addition to AUC, we evaluated several performance metrics, including SP, SE, and MCC. Notably, the RF classifier consistently achieved the highest classification performance compared to GNB, LGBM, and SVM, particularly when employing the CKSAAP, DDE, and DPC feature descriptors (S3 Fig). Specifically, for the independent test set, RF attained SP values of 81.67%, 84.71%, and 83.23% for CKSAAP, DDE, and DPC, respectively (Fig 2A), with corresponding SE values of 74.28%, 75.76%, and 75.07% (Fig 2B), and MCC values of 0.56, 0.61, and 0.58 (Fig 2C). Further analysis of the AUC values across the four classifiers showed the following results: for CKSAAP, the AUC values for GNB, LGBM, RF, and SVM were 0.71, 0.75, 0.78, and 0.73, respectively (Fig 2D); for DDE, the AUC values were 0.70, 0.76, 0.80, and 0.73 (Fig 2E); and for DPC, the AUC values were 0.69, 0.74, 0.79, and 0.73 (Fig 2F). Additional performance metrics for machine learning methods are compiled in S2 Table. These findings further validate the superior performance of the RF classifier, confirming its selection as the optimal predictive models.

thumbnail
Fig 2. Comparison of deep learning feature (deepFeature) with eight top traditional handcrafted features (AAC, ASDC, CKSAAGP, CKSAAP, DDE, DPC, GDPC, and GTPC).

(A-C) Radar plots showing SP, SE, and MCC values for each feature. (D-F) ROC curve comparing the deep learning model (deepBCE) with four traditional machine learning algorithms (GNB, LGBM, RF, and SVM) on CKSAAP, DDE, and DPC features. The x-axis represents False Positive Rate (FPR), and the y-axis represents True Positive Rate (TPR).

https://doi.org/10.1371/journal.pntd.0012985.g002

Benchmarking custom and existing state-of-the-art models on dual test sets

To ensure a fair comparison between our models and existing linear BCE prediction models, we randomly partitioned two independent test sets, namely test1 and test2. As shown in Table 3 and Fig 3, deepBCE and BCE_RF are specialized models trained exclusively on experimentally validated datasets derived from the human and veterinary parasites, whereas EpiDope and BepiPred3 are general-purpose models trained on a wide range of BCE datasets. In the benchmark evaluation, BepiPred3 achieved remarkable SE with values of 99.46% and 98.31%, but its SP was notably low, reaching only 2.01% and 2.64%. This suggests that BepiPred3 is highly effective in identifying positive BCEs but performs inadequately in distinguishing negative BCEs. In contrast, although EpiDope surpassed BCE_RF in terms of AUC, it underperformed in other critical metrics, including SE and SP. Among our proposed models, deepBCE exhibited superior performance in discriminating positive samples and overall classification accuracy, while BCE_RF demonstrated slightly better capability in identifying negative samples and overall predictive performance. Overall, when evaluated in various metrics, the deepBCE and BCE_RF models, trained on parasite-specific BCE datasets, consistently surpassed EpiDope and BepiPred3, thereby highlighting their superior adaptability and effectiveness for parasite-related prediction tasks.

thumbnail
Table 3. Performance comparison of our models against existing state-of-the-art models.

https://doi.org/10.1371/journal.pntd.0012985.t003

thumbnail
Fig 3. Performance comparison between our models and existing models on benchmark datasets.

(A) Results on Test1 dataset. (B) Results on Test2 dataset.

https://doi.org/10.1371/journal.pntd.0012985.g003

Proteomics-based bioinformatics analysis and prediction of BCEs in the liver fluke Fasciola hepatica

We evaluated the accuracy and utility of our predictive models for identifying BCEs using F. hepatica as a case study. Our models were applied to predict BCEs within F. hepatica proteins using proteomic data and peptide fragments. The complex life cycle of F. hepatica involves an intermediate snail host and a definitive mammalian host, typically cattle or sheep, although humans also act as definitive hosts (Fig 4A) [46]. Leveraging raw proteomic data, we characterized the proteome and expression profiles of F. hepatica across four developmental stages: metacercariae, juvenile fluke (28 days post-infection [dpi] in sheep), immature fluke (59 dpi), and adult fluke (118 dpi). Detailed protein identifications and corresponding expression profiles are provided in S3 Table. Based on these results, the following observations can be drawn.

thumbnail
Fig 4. Proteomic and bioinformatics analysis of F. hepatica across four developmental stages.

(A) The lifecycle of Fasciola spp., demonstrating the developmental stages within the intermediate snail host, environmental phases, and definitive mammalian/human host. The parasite image was adapted from Servier Medical Art (https://smart.servier.com/). (B) Principal component analysis (PCA) of protein expression profiles in metacercaria, juvenile fluke (28 dpi), immature fluke (58 dpi), and adult fluke (118 dpi). (C) Distribution of subcellular localization for proteins capable of generating B-cell epitopes (BCEs), highlighting their potential immunogenic properties. (D) Heatmap depicting protein expression dynamics across the four stages, with color-coded subcellular localization categories shown on the right. (E) Protein-protein interaction network of the leucine aminopeptidase (LAP) protein, generated using the STRING database, emphasizing key interactions and its potential as a vaccine candidate.

https://doi.org/10.1371/journal.pntd.0012985.g004

The most recent F. hepatica protein reference from the UniProt database was utilized to guide our MS analysis, resulting the identification of 2,165 proteins with a Q-value ≤ 0.01. PCA revealed pronounced differences in expression profiles across developmental stages, with the most substantial divergence observed between the metacercarial stage and the mammalian-host stages (Fig 4B). Additionally, the clustering of biological replicates across stages highlighted the high quality and reproducibility of the sequencing data, establishing a robust foundation for investigating the distribution of BCEs within parasitic proteins.

The deep learning model, combined with RF-based machine learning models, was employed to predict BCEs of F. hepatica. This integrated approach facilitates the identification of high-confidence BCEs in subsequent bioinformatics analysis, with the results summarized in S4 Table. To investigate the pre-cleavage subcellular localization of these source proteins, we systematically mapped their distribution across ten subcellular regions: nucleus, cytoplasm, extracellular space, mitochondria, cell membrane, endoplasmic reticulum, chloroplast, Golgi apparatus, lysosome/vacuole, and peroxisome. Our results indicate that the majority of BCE-source proteins were localized to the cytoplasm (61.24%), followed by mitochondria (10.90%), extracellular space (8.11%), nucleus (7.24%), cell membrane (5.88%), and endoplasmic reticulum (4.83%), with the remaining regions collectively accounting for 1.80% (Fig 4C). Moreover, heatmap clustering demonstrated that proteins carrying BCEs showed the highest expression levels during the metacercariae (MC) stage, while significantly lower expression levels were observed in the mammalian stages (JF, IF, and AF) (Fig 4D).

Here, we selected four proteins: glutathione S-transferase (GST), leucine aminopeptidase (LAP), annexin (ANX), and β-actin, and conducted three-dimensional structure prediction using AlphaFold3 [47], followed by domain analysis using the InterPro web service [48]. As shown in Fig 5, the GST protein contains the GST_NTER and GST_C_Mu domains, yielding seven BCEs; the LAP protein harbors a Peptidase_M17 domain, resulting in eight BCEs; the ANX protein includes four ANNEXIN_2 domains, producing ten BCEs; and the β-actin protein contains a single domain, generating four BCEs. Notably, in prior animal studies, the LAP protein demonstrated an 89% vaccine efficacy in sheep [49]. Protein interaction predictions further revealed that the LAP protein interacts with 27 high-confidence proteins, each of which also carries BCEs (Fig 4E), suggesting promising pathways for the exploration of F. hepatica immune targets.

thumbnail
Fig 5. Predicted 3D structural models of potential F. hepatica vaccine molecules, highlighting the spatial distribution of putative BCEs.

(A–D) Structural models for Glutathione transferase, leucine aminopeptidase, annexin, and β-actin, respectively. The predicted template modelling (pTM) score from AlphaFold3, which reflects prediction quality, is provided in the legend (higher value indicates greater model reliability). Predicted functional domains are color-coded, while non-domain regions are shown in gray.

https://doi.org/10.1371/journal.pntd.0012985.g005

Dot-blot immunoassays

Structural domain prediction of the LAP protein identified the presence of a Peptidase_M17 domain, confirming its classification as a canonical member of the M17 family (Fig 5B). To assess its evolutionary conservation, a multiple sequence alignment was conducted using MAFFT [50], incorporating LAP protein sequences from diverse trematode species, including Fasciola gigantica, Clonorchis sinensis, Opisthorchis felineus, Schistosoma japonicum, and Schistosoma mansoni. The alignment indicated a high degree of conservation in predicted BCEs between F. hepatica and F. gigantica. In contrast, comparisons with other trematode species showed that only the sequence “DIGGSDPER” was fully conserved, while other predicted BCEs exhibited significant amino acid variability (Fig 6A).

thumbnail
Fig 6. Sequence alignment and experimental validation of the LAP protein and its putative BCEs.

(A) Multiple sequence alignment of F. hepatica LAP with five other trematode species. Predicted BCEs identified by our AI models are highlighted with red boxes. (B) Dot-blot immunoassay validation of predicted BCEs using serum from F. hepatica-positive animal, with F. hepatica-negative serum and a synthetic human peptide as controls.

https://doi.org/10.1371/journal.pntd.0012985.g006

To experimentally validate the predicted BCEs, eight peptides derived from the LAP protein was chemically synthesized and subjected to dot-blot immunoassays. Immunoreactivity analysis revealed that seven of the predicted BCEs (P2, P3, P4, P5, P6, P7, and P8) produced distinct dot signals in F. hepatica-positive ovine serum, while no dot signals were observed in synthetic human peptide controls and F. hepatica-negative ovine serum (Fig 6B and Table 4). In contrast, P1 failed to generate a detectable signal, suggesting a false-positive prediction by the model. Among the validated BCEs, P2, P4, P5, P6, and P8 were located within the predicted structural domain, whereas P3 and P7 were situated outside the domain. These findings demonstrate that the LAP protein exhibits multi-site antigenicity and has the potential to elicit host immune responses through distinct structural regions.

thumbnail
Table 4. Experimental validation of predicted BCEs derived from the LAP protein using dot-blot immunoassay.

https://doi.org/10.1371/journal.pntd.0012985.t004

Discussion

The identification of eukaryotic antigenic peptides derived from parasitic proteins previously relied on machine learning approaches that utilize handcrafted feature extraction, such as LGBM and stack-based models [5153]. Although these approaches have demonstrated good prediction accuracy, their dependence on manual feature engineering often limits their ability to fully capture the complexity of antigenic peptide biology. In contrast, Transformer-based deep learning approaches have quickly emerged as a powerful alternative for protein sequence analysis, offering significant advantages over other deep learning methods, such as RNN-based models, which are constrained by sequential processing and struggle to model long-range dependencies [5456]. By leveraging the Transformer architecture, the developed model effectively extracts hierarchical features that integrate amino acid properties and positional information through self-attention mechanisms. This approach not only captures both local and long-range residue interactions but also enables efficient parallel processing of large, variable-length sequences without the need for extensive preprocessing.

In this study, by integrating Transformer-based models with traditional machine learning models and leveraging publicly available F. hepatica proteomic data, potential BCEs were predicted and experimentally validated for those targeted by the LAP protein. Notably, seven of the eight synthesized peptides were confirmed as immunoreactive BCEs, demonstrating the high predictive accuracy and computational efficiency of our models in screening epitope candidates from large-scale MS-based datasets. The LAP protein was selected for its well-documented potential as a vaccine candidate that has been supported by extensive experimental evidence [5761]. Our findings reveal the sequence characteristics and evolutionary relationships of LAP-driven BCEs, providing a bioinformatics foundation for peptide vaccine and diagnostic tool development. Additionally, proteins such as GST, ANX, and β-actin were identified as potential vaccine candidates based on their immunoprotective effects [62]. However, further experimental validation is required to confirm the antibody-binding efficacy of the BCEs derived from these candidate proteins.

Importantly, our model is not restricted to the liver fluke F. hepatica but can be broadly applied to predict BCEs across a wide range of human and veterinary parasites, including both protozoan and helminth species. For example, in veterinary parasitology, this approach enables high-throughput screening of potential BCEs, allowing synthesized peptides to be combined with adjuvants and evaluated for their immunoprotective potential against pathogens [63]. Furthermore, integrating BCEs into diagnostic assays offers the potential to enhance the accuracy of infection detection, allowing for the differentiation between past and active infections, as well as the evaluation of vaccine efficacy. Although the current models demonstrate considerable predictive ability, further advancements are needed to improve the prediction of BCEs and to better capture complex structural interactions.

Our study only focuses on the development of linear BCE prediction models that leverage the linear sequence features of parasitic peptides to identify potential BCEs. Linear epitopes consist of contiguous amino acid sequences, which offer an advantage in model development due to their direct reliance on sequence features [64]. As a result, these epitopes are relatively straightforward to classify and experimentally validate. Consequently, linear epitope predictions are often prioritized in vaccine development and diagnostic design, particularly for rapid antigenic peptide identification [6569]. However, this study does not address conformational epitope prediction, thus limiting its applicability in natural biological systems where BCEs typically adopt three-dimensional (3D) conformational structures. Conformational epitopes, formed by non-contiguous amino acid residues, depend on the overall protein folding and spatial exposure [70,71]. This dependence introduces additional challenges to epitope prediction, as it requires modeling complex structural interactions.

With advancements in AI-facilitated protein structure prediction, future studies on parasitic infections could benefit from protein models that integrate both linear and conformational epitope recognition. Open-source tools like AlphaFold, combined with structure determination techniques like X-ray crystallography and cryo-electron microscopy, would provide high-accuracy structures [72]. These advancements enable the identification of conformational epitopes, which are critical for simulating real-world antigen-antibody interactions. Future research should explore the development of multimodal predictive frameworks that integrate 3D structural information with sequence-based approach to improve the prediction of parasite-specific BCEs. Additionally, state-of-the-art deep learning architectures, such as 3D convolutional neural networks or graph neural networks [73,74], should be explored in the future to identify critical contact sites and surface-exposed features within protein structures.

Conclusions

In this study, we developed a Transformer-based deep learning model alongside several traditional machine learning models for predicting BCEs in parasitic protein. Performance comparisons demonstrated that the Transformer-based model achieved superior predictive capability, with an independent test performance characterized by an ACC of 80.97%, an AUC of 0.90, a SE of 69.64%, a SP of 89.26%, and a MCC of 0.61. Among the traditional machine learning models, the RF algorithm performed best, particularly when using the DDE feature descriptor, yielding an ACC of 80.23%, an AUC of 0.88, a SE of 75.76%, a SP of 84.71%, and a MCC of 0.61. In addition, this study leveraged proteomic data from F. hepatica, identifying seven BCEs through dot-blot immunoassay targeting the LAP protein, which specifically bind to F. hepatica-positive ovine serum antibodies, underscoring their potential as vaccine antigen candidates. The developed models show versatility, suitable for predicting BCEs in various parasitic pathogens, including protozoa and helminths. To enhance predictive accuracy and robustness in experimental validation, we recommend combining multiple models for BCE prediction in parasitic peptides.

Supporting information

S1 Fig. Amino acid frequency distribution and length statistics of peptides.

(A-B) Amino acid frequency in positive and negative samples from the training and testing datasets. (C-D) Peptide length distribution in positive and negative samples from the training and testing datasets.

https://doi.org/10.1371/journal.pntd.0012985.s001

(TIF)

S2 Fig. Comparison of ROC curve between deep learning and 12 handcrafted features using four machine learning algorithms.

https://doi.org/10.1371/journal.pntd.0012985.s002

(TIF)

S3 Fig. Radar chart comparing the performance of deep learning features (deepFeature) with 12 handcrafted features across four machine learning algorithms.

The chart highlights their respective performance in SP, SE, and MCC metrics.

https://doi.org/10.1371/journal.pntd.0012985.s003

(TIF)

S1 Table. Parameter settings for deep learning and machine learning models used in this study.

https://doi.org/10.1371/journal.pntd.0012985.s004

(XLSX)

S2 Table. Performance of conventional machine learning models with handcrafted features for B-cell epitope prediction in parasites.

https://doi.org/10.1371/journal.pntd.0012985.s005

(XLSX)

S3 Table. Bioinformatic analysis of proteomic features and expression profiles of F. hepatica across four developmental stages.

https://doi.org/10.1371/journal.pntd.0012985.s006

(XLSX)

S4 Table. The prediction results of B-cell epitope in F. hepatica using deep learning and machine learning models.

https://doi.org/10.1371/journal.pntd.0012985.s007

(XLSX)

References

  1. 1. Torgerson PR, Devleesschauwer B, Praet N, Speybroeck N, Willingham AL, Kasuga F, et al. World Health Organization estimates of the global and regional disease burden of 11 foodborne parasitic diseases, 2010: a data synthesis. PLoS Med. 2015;12(12):e1001920. pmid:26633705
  2. 2. Morgan ER, Aziz N-AA, Blanchard A, Charlier J, Charvet C, Claerebout E, et al. 100 questions in livestock helminthology research. Trends Parasitol. 2019;35(1):52–71. pmid:30477758
  3. 3. Cwiklinski K, O’Neill SM, Donnelly S, Dalton JP. A prospective view of animal and human Fasciolosis. Parasite Immunol. 2016;38(9):558–68. pmid:27314903
  4. 4. Mas-Coma S. Epidemiology of fascioliasis in human endemic areas. J Helminthol. 2005;79(3):207–16. pmid:16153314
  5. 5. Hodgkinson JE, Kaplan RM, Kenyon F, Morgan ER, Park AW, Paterson S, et al. Refugia and anthelmintic resistance: Concepts and challenges. Int J Parasitol Drugs Drug Resist. 2019;10:51–7. pmid:31125837
  6. 6. Pramanik PK, Alam MN, Roy Chowdhury D, Chakraborti T. Drug resistance in protozoan parasites: an incessant wrestle for survival. J Glob Antimicrob Resist. 2019;18:1–11. pmid:30685461
  7. 7. Getzoff ED, Tainer JA, Lerner RA, Geysen HM. The chemistry and mechanism of antibody binding to protein antigens. Adv Immunol. 1988;43:1–98. pmid:3055852
  8. 8. Potocnakova L, Bhide M, Pulzova LB. An Introduction to B-cell epitope mapping and in silico epitope prediction. J Immunol Res. 2016;2016:6760830. pmid:28127568
  9. 9. Van Regenmortel MHV. What is a B-cell epitope?. Methods Mol Biol. 2009;524:3–20. pmid:19377933
  10. 10. Bastos LM, Macêdo AG Jr, Silva MV, Santiago FM, Ramos ELP, Santos FAA, et al. Toxoplasma gondii-derived synthetic peptides containing B- and T-cell epitopes from GRA2 protein are able to enhance mice survival in a model of experimental toxoplasmosis. Front Cell Infect Microbiol. 2016;6:59. pmid:27313992
  11. 11. Buffoni L, Garza-Cuartero L, Pérez-Caballero R, Zafra R, Javier Martínez-Moreno F, Molina-Hernández V, et al. Identification of protective peptides of Fasciola hepatica-derived cathepsin L1 (FhCL1) in vaccinated sheep by a linear B-cell epitope mapping approach. Parasit Vectors. 2020;13(1):390. pmid:32736582
  12. 12. Garza-Cuartero L, Geurden T, Mahan SM, Hardham JM, Dalton JP, Mulcahy G. Antibody recognition of cathepsin L1-derived peptides in Fasciola hepatica-infected and/or vaccinated cattle and identification of protective linear B-cell epitopes. Vaccine. 2018;36(7):958–68. pmid:29373193
  13. 13. Mu Y, Gordon CA, Olveda RM, Ross AG, Olveda DU, Marsh JM, et al. Identification of a linear B-cell epitope on the Schistosoma japonicum saposin protein, SjSAP4: Potential as a component of a multi-epitope diagnostic assay. PLoS Negl Trop Dis. 2022;16(7):e0010619. pmid:35816547
  14. 14. Ansari HR, Raghava GP. Identification of conformational B-cell epitopes in an antigen from its primary sequence. Immunome Res. 2010;6:6. pmid:20961417
  15. 15. da Silva BM, Myung Y, Ascher DB, Pires DEV. epitope3D: a machine learning method for conformational B-cell epitope prediction. Brief Bioinform. 2022;23(1):bbab423. pmid:34676398
  16. 16. Høie MH, Gade FS, Johansen JM, Würtzen C, Winther O, Nielsen M, et al. DiscoTope-3.0: improved B-cell epitope prediction using inverse folding latent representations. Front Immunol. 2024;15:1322712. pmid:38390326
  17. 17. Ivanisenko NV, Shashkova TI, Shevtsov A, Sindeeva M, Umerenkov D, Kardymon O. SEMA 2.0: web-platform for B-cell conformational epitopes prediction using artificial intelligence. Nucleic Acids Res. 2024;52(W1):W533–9. pmid:38742639
  18. 18. Liang S, Zheng D, Standley DM, Yao B, Zacharias M, Zhang C. EPSVR and EPMeta: prediction of antigenic epitopes using support vector regression and multiple server results. BMC Bioinformatics. 2010;11:381. pmid:20637083
  19. 19. Ponomarenko J, Bui H-H, Li W, Fusseder N, Bourne PE, Sette A, et al. ElliPro: a new structure-based tool for the prediction of antibody epitopes. BMC Bioinformatics. 2008;9:514. pmid:19055730
  20. 20. Zhou C, Chen Z, Zhang L, Yan D, Mao T, Tang K, et al. SEPPA 3.0-enhanced spatial epitope prediction enabling glycoprotein antigens. Nucleic Acids Res. 2019;47(W1):W388–94. pmid:31114919
  21. 21. Alghamdi W, Attique M, Alzahrani E, Ullah MZ, Khan YD. LBCEPred: a machine learning model to predict linear B-cell epitopes. Brief Bioinform. 2022;23(3):bbac035. pmid:35262658
  22. 22. Clifford JN, Høie MH, Deleuran S, Peters B, Nielsen M, Marcatili P. BepiPred-3.0: Improved B-cell epitope prediction using protein language models. Protein Sci. 2022;31(12):e4497. pmid:36366745
  23. 23. Collatz M, Mock F, Barth E, Hölzer M, Sachse K, Marz M. EpiDope: a deep neural network for linear B-cell epitope prediction. Bioinformatics. 2021;37(4):448–55. pmid:32915967
  24. 24. Manavalan B, Govindaraj RG, Shin TH, Kim MO, Lee G. iBCE-EL: a new ensemble learning framework for improved linear B-cell epitope prediction. Front Immunol. 2018;9:1695. pmid:30100904
  25. 25. Qi Y, Zheng P, Huang G. DeepLBCEPred: A Bi-LSTM and multi-scale CNN-based deep learning method for predicting linear B-cell epitopes. Front Microbiol. 2023;14:1117027. pmid:36910218
  26. 26. Saha S, Raghava GPS. Prediction of continuous B-cell epitopes in an antigen using recurrent neural network. Proteins. 2006;65(1):40–8. pmid:16894596
  27. 27. Yao B, Zhang L, Liang S, Zhang C. SVMTriP: a method to predict antigenic epitopes using support vector machine to integrate tri-peptide similarity and propensity. PLoS One. 2012;7(9):e45152. pmid:22984622
  28. 28. Howe KL, Bolt BJ, Shafie M, Kersey P, Berriman M. WormBase ParaSite - a comprehensive resource for helminth genomics. Mol Biochem Parasitol. 2017;215:2–10. pmid:27899279
  29. 29. Warrenfeltz S, Basenko EY, Crouch K, Harb OS, Kissinger JC, Roos DS, et al. EuPathDB: the eukaryotic pathogen genomics database resource. Methods Mol Biol. 2018;1757:69–113. pmid:29761457
  30. 30. Cwiklinski K, Dalton JP. Advances in Fasciola hepatica research using “omics” technologies. Int J Parasitol. 2018;48(5):321–31. pmid:29476869
  31. 31. Hu R-S, Zhang F-K, Elsheikha HM, Ma Q-N, Ehsan M, Zhao Q, et al. Proteomic profiling of the liver, hepatic lymph nodes, and spleen of buffaloes infected with Fasciola gigantica. Pathogens. 2020;9(12):982. pmid:33255373
  32. 32. Hu R-S, Zhang F-K, Ma Q-N, Ehsan M, Zhao Q, Zhu X-Q. Transcriptomic landscape of hepatic lymph nodes, peripheral blood lymphocytes and spleen of swamp buffaloes infected with the tropical liver fluke Fasciola gigantica. PLoS Negl Trop Dis. 2022;16(3):e0010286. pmid:35320269
  33. 33. Vita R, Mahajan S, Overton JA, Dhanda SK, Martini S, Cantrell JR, et al. The Immune Epitope Database (IEDB): 2018 update. Nucleic Acids Res. 2019;47(D1):D339–43. pmid:30357391
  34. 34. Saha S, Bhasin M, Raghava GPS. Bcipep: a database of B-cell epitopes. BMC Genomics. 2005;6:79. pmid:15921533
  35. 35. Fu L, Niu B, Zhu Z, Wu S, Li W. CD-HIT: accelerated for clustering the next-generation sequencing data. Bioinformatics. 2012;28(23):3150–2. pmid:23060610
  36. 36. Chen Z, Zhao P, Li C, Li F, Xiang D, Chen Y-Z, et al. iLearnPlus: a comprehensive and automated machine-learning platform for nucleic acid and protein sequence analysis, prediction and visualization. Nucleic Acids Res. 2021;49(10):e60. pmid:33660783
  37. 37. Bolón-Canedo V, Sánchez-Maroño N, Alonso-Betanzos A. Feature selection for high-dimensional data. Prog Artif Intell. 2016;5(2):65–75.
  38. 38. Ke G, Meng Q, Finley T, Wang T, Chen W, Ma W. Lightgbm: A highly efficient gradient boosting decision tree. Advance neural information processing systems. n.d.;30:3146–54.
  39. 39. Jiao S, Ye X, Ao C, Sakurai T, Zou Q, Xu L. Adaptive learning embedding features to improve the predictive performance of SARS-CoV-2 phosphorylation sites. Bioinformatics. 2023;39(11):btad627. pmid:37847658
  40. 40. Jiao S, Ye X, Sakurai T, Zou Q, Liu R. Integrated convolution and self-attention for improving peptide toxicity prediction. Bioinformatics. 2024;40(5):btae297. pmid:38696758
  41. 41. Yan K, Lv H, Guo Y, Peng W, Liu B. sAMPpred-GAT: prediction of antimicrobial peptide by graph attention network and predicted peptide structure. Bioinformatics. 2023;39(1):btac715. pmid:36342186
  42. 42. Xu J, Wu L, Sun Y, Wei Y, Zheng L, Zhang J, et al. Proteomics and bioinformatics analysis of Fasciola hepatica somatic proteome in different growth phases. Parasitol Res. 2020;119(9):2837–50. pmid:32757109
  43. 43. Cox J, Mann M. MaxQuant enables high peptide identification rates, individualized p.p.b.-range mass accuracies and proteome-wide protein quantification. Nat Biotechnol. 2008;26(12):1367–72. pmid:19029910
  44. 44. Thumuluri V, Almagro Armenteros JJ, Johansen AR, Nielsen H, Winther O. DeepLoc 2.0: multi-label subcellular localization prediction using protein language models. Nucleic Acids Res. 2022;50(W1):W228–34. pmid:35489069
  45. 45. Szklarczyk D, Kirsch R, Koutrouli M, Nastou K, Mehryary F, Hachilif R, et al. The STRING database in 2023: protein-protein association networks and functional enrichment analyses for any sequenced genome of interest. Nucleic Acids Res. 2023;51(D1):D638–46. pmid:36370105
  46. 46. Hu R-S, Zhang X-X, Ma Q-N, Elsheikha HM, Ehsan M, Zhao Q, et al. Differential expression of microRNAs and tRNA fragments mediate the adaptation of the liver fluke Fasciola gigantica to its intermediate snail and definitive mammalian hosts. Int J Parasitol. 2021;51(5):405–14. pmid:33513403
  47. 47. Abramson J, Adler J, Dunger J, Evans R, Green T, Pritzel A, et al. Accurate structure prediction of biomolecular interactions with AlphaFold 3. Nature. 2024;630(8016):493–500. pmid:38718835
  48. 48. Blum M, Andreeva A, Florentino LC, Chuguransky SR, Grego T, Hobbs E, et al. InterPro: the protein sequence classification resource in 2025. Nucleic Acids Res. 2025;53(D1):D444–56. pmid:39565202
  49. 49. Piacenza L, Acosta D, Basmadjian I, Dalton JP, Carmona C. Vaccination with cathepsin L proteinases and with leucine aminopeptidase induces high levels of protection against fascioliasis in sheep. Infect Immun. 1999;67(4):1954–61. pmid:10085042
  50. 50. Katoh K, Standley DM. MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol Biol Evol. 2013;30(4):772–80. pmid:23329690
  51. 51. Charoenkwan P, Schaduangrat N, Pham NT, Manavalan B, Shoombuatong W. Pretoria: An effective computational approach for accurate and high-throughput identification of CD8+ t-cell epitopes of eukaryotic pathogens. Int J Biol Macromol. 2023;238:124228. pmid:36996953
  52. 52. Hu R-S, Hesham AE-L, Zou Q. Machine learning and its applications for protozoal pathogens and protozoal infectious diseases. Front Cell Infect Microbiol. 2022;12:882995. pmid:35573796
  53. 53. Hu R-S, Wu J, Zhang L, Zhou X, Zhang Y. CD8TCEI-EukPath: a novel predictor to rapidly identify CD8+ T-cell epitopes of eukaryotic pathogens using a hybrid feature selection approach. Front Genet. 2022;13:935989. pmid:35937988
  54. 54. Chandra A, Tünnermann L, Löfstedt T, Gratz R. Transformer-based deep learning for predicting protein properties in the life sciences. Elife. 2023;12:e82819. pmid:36651724
  55. 55. Le NQK. Leveraging transformers-based language models in proteome bioinformatics. Proteomics. 2023;23(23–24):e2300011. pmid:37381841
  56. 56. Mswahili ME, Jeong Y-S. Transformer-based models for chemical SMILES representation: A comprehensive literature review. Heliyon. 2024;10(20):e39038. pmid:39640612
  57. 57. Acosta D, Cancela M, Piacenza L, Roche L, Carmona C, Tort JF. Fasciola hepatica leucine aminopeptidase, a promising candidate for vaccination against ruminant fasciolosis. Mol Biochem Parasitol. 2008;158(1):52–64. pmid:18178266
  58. 58. Checa J, Salazar C, Goyeche A, Rivera M, Silveira F, Maggioli G. A promising new target to control fasciolosis: Fasciola hepatica leucine aminopeptidase 2. Vet Parasitol. 2023;320:109959. pmid:37329826
  59. 59. Hernández-Guzmán K, Sahagún-Ruiz A, Vallecillo AJ, Cruz-Mendoza I, Quiroz-Romero H. Construction and evaluation of a chimeric protein made from Fasciola hepatica leucine aminopeptidase and cathepsin L1. J Helminthol. 2016;90(1):7–13. pmid:25274570
  60. 60. Ortega-Vargas S, Espitia C, Sahagún-Ruiz A, Parada C, Balderas-Loaeza A, Villa-Mancera A, et al. Moderate protection is induced by a chimeric protein composed of leucine aminopeptidase and cathepsin L1 against Fasciola hepatica challenge in sheep. Vaccine. 2019;37(24):3234–40. pmid:31036453
  61. 61. Salazar C, Tort JF, Carmona C. Design of a peptide-carrier vaccine based on the highly immunogenic Fasciola hepatica leucine aminopeptidase. Methods Mol Biol. 2020;2137:191–204. pmid:32399930
  62. 62. Toet H, Piedrafita DM, Spithill TW. Liver fluke vaccines in ruminants: strategies, progress and future opportunities. Int J Parasitol. 2014;44(12):915–27. pmid:25200351
  63. 63. Ehsan M, Hu R-S, Liang Q-L, Hou J-L, Song X, Yan R, et al. Advances in the development of anti-Haemonchus contortus vaccines: challenges, opportunities, and perspectives. Vaccines (Basel). 2020;8(3):555. pmid:32971770
  64. 64. Bahrami AA, Payandeh Z, Khalili S, Zakeri A, Bandehpour M. Immunoinformatics: in silico approaches and computational design of a multi-epitope, immunogenic protein. Int Rev Immunol. 2019;38(6):307–22. pmid:31478759
  65. 65. Baptista B de O, Souza ABL de, Oliveira LS de, Souza HADS de, Barros JP de, Queiroz LT de, et al. B-cell epitope mapping of the Plasmodium falciparum malaria vaccine candidate GMZ2.6c in a naturally exposed population of the brazilian amazon. Vaccines (Basel). 2023;11(2):446. pmid:36851323
  66. 66. Durante IM, La Spina PE, Carmona SJ, Agüero F, Buscaglia CA. High-resolution profiling of linear B-cell epitopes from mucin-associated surface proteins (MASPs) of Trypanosoma cruzi during human infections. PLoS Negl Trop Dis. 2017;11(9):e0005986. pmid:28961244
  67. 67. Guedes RLM, Rodrigues CMF, Coatnoan N, Cosson A, Cadioli FA, Garcia HA, et al. A comparative in silico linear B-cell epitope prediction and characterization for South American and African Trypanosoma vivax strains. Genomics. 2019;111(3):407–17. pmid:29499243
  68. 68. Javadi Mamaghani A, Fathollahi A, Spotin A, Ranjbar MM, Barati M, Aghamolaie S, et al. Candidate antigenic epitopes for vaccination and diagnosis strategies of Toxoplasma gondii infection: A review. Microb Pathog. 2019;137:103788. pmid:31605758
  69. 69. Mendes TA de O, Reis Cunha JL, de Almeida Lourdes R, Rodrigues Luiz GF, Lemos LD, dos Santos ARR, et al. Identification of strain-specific B-cell epitopes in Trypanosoma cruzi using genome-scale epitope prediction and high-throughput immunoscreening with peptide arrays. PLoS Negl Trop Dis. 2013;7(10):e2524. pmid:24205430
  70. 70. Cretich M, Gori A, D’Annessa I, Chiari M, Colombo G. Peptides for infectious diseases: from probe design to diagnostic microarrays. Antibodies (Basel). 2019;8(1):23. pmid:31544829
  71. 71. Durairaj R, Pageat P, Bienboire-Frosini C. Impact of semiochemicals binding to Fel d 1 on Its 3D conformation and predicted B-cell epitopes using computational approaches. Int J Mol Sci. 2023;24(14):11685. pmid:37511444
  72. 72. Sela-Culang I, Kunik V, Ofran Y. The structural basis of antibody-antigen recognition. Front Immunol. 2013;4:302. pmid:24115948
  73. 73. Ramakrishnan G, Baakman C, Heijl S, Vroling B, van Horck R, Hiraki J, et al. Understanding structure-guided variant effect predictions using 3D convolutional neural networks. Front Mol Biosci. 2023;10:1204157. pmid:37475887
  74. 74. Strokach A, Becerra D, Corbi-Verge C, Perez-Riba A, Kim PM. Fast and flexible protein design using deep graph neural networks. Cell Syst. 2020;11(4):402–411.e4. pmid:32971019