Fig 1.
Performance of PepLM-GNN against other baseline methods on the benchmark dataset via five-fold cross-validation.
The benchmark dataset (17,244 samples in total) is derived from peptide-protein complex structures deposited in the RCSB PDB database before October 2022. After data filtering, it contains 8,622 positive samples (interacting peptide-protein pairs) and 8,622 negative samples (non-interacting pairs). All reported performance metrics (e.g., ACC, AUC) are averages over the five folds of five-fold cross-validation on the benchmark dataset.
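A minimal sketch of how per-fold metrics can be averaged over five-fold cross-validation. It assumes each held-out fold yields 0/1 labels and predicted binding probabilities, and that ACC uses a 0.5 decision threshold. The caption states neither assumption. The helper names are hypothetical. AUC is computed with the pairwise (Mann-Whitney) formulation, which is quadratic and only suitable for illustration. At the scale of 17,244 samples, a library routine such as scikit-learn's `roc_auc_score` would normally be used instead.

```python
from statistics import mean

def accuracy(labels, probs, threshold=0.5):
    """Fraction of pairs whose thresholded probability matches the 0/1 label (threshold assumed)."""
    return mean(int((p >= threshold) == bool(y)) for p, y in zip(probs, labels))

def roc_auc(labels, probs):
    """AUC as P(random positive scores above random negative); ties count 1/2. O(n^2)."""
    pos = [p for p, y in zip(probs, labels) if y == 1]
    neg = [p for p, y in zip(probs, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def cross_validation_summary(folds):
    """folds: one (labels, probs) tuple per held-out fold; returns fold-averaged ACC and AUC."""
    return {
        "ACC": mean(accuracy(y, p) for y, p in folds),
        "AUC": mean(roc_auc(y, p) for y, p in folds),
    }
```

Each fold contributes equally to the reported value, regardless of its size. This matches the caption's "average values obtained from the five-fold cross-validation".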
Table 1.
Statistical t-test p-values for ACC of the comparative methods versus PepLM-GNN. This table reports the p-values of statistical t-tests between PepLM-GNN and the other baseline methods, based on the mean ACC values from five-fold cross-validation on the benchmark dataset (17,244 samples: 8,622 positive and 8,622 negative pairs). A p-value < 0.05 indicates that the ACC of PepLM-GNN is statistically significantly higher than that of the comparative method.
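The significance test can be sketched as a one-sided paired t-test on the five per-fold ACC values. The caption does not say whether the test is paired or unpaired, or one- or two-sided. The one-sided paired form used here is an assumption suggested by the wording "statistically significantly higher". With five folds, the test has 4 degrees of freedom, and for that case Student's t CDF has a closed form. The sketch therefore needs only the standard library. `scipy.stats.ttest_rel(a, b, alternative="greater")` performs the same test for any number of folds. The function names are hypothetical.

```python
import math
from statistics import mean, stdev

def t4_cdf(t):
    """Exact CDF of Student's t distribution with 4 degrees of freedom."""
    s = t / math.sqrt(1.0 + t * t / 4.0)
    return 0.5 + 0.375 * s * (1.0 - s * s / 12.0)

def paired_ttest_greater(acc_model, acc_baseline):
    """One-sided paired t-test of H1: mean per-fold ACC difference > 0 (five folds, df = 4)."""
    if len(acc_model) != 5 or len(acc_baseline) != 5:
        raise ValueError("this sketch assumes exactly five folds")
    diffs = [a - b for a, b in zip(acc_model, acc_baseline)]
    sd = stdev(diffs)
    if sd == 0:
        raise ValueError("all fold differences are equal; t is undefined")
    t = mean(diffs) / (sd / math.sqrt(len(diffs)))
    return t, 1.0 - t4_cdf(t)  # upper-tail p-value
```

Pairing by fold is the natural choice when both methods are evaluated on identical splits. Under that design, fold-to-fold variation in difficulty cancels out of the differences.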
Fig 2.
Performance of PepLM-GNN against other baseline methods on four independent test datasets.
The four test datasets are Test1440 (1,440 positive and 1,440 negative peptide-protein pairs, sourced from the RCSB PDB database, January 2023-July 2024), LEADS-PEP (52 positive and 52 negative pairs; a classic benchmark for evaluating peptide-protein docking performance), Test251 (249 positive and 249 negative pairs), and Test167 (255 positive and 255 negative pairs, derived from the RCSB PDB database, October-December 2022). PepLM-GNN’, DeepGNHV, and Deep-GNN-esm are applied only to the test subsets for which structural data are available. All performance metrics are the mean values over the five models obtained from five-fold cross-validation, evaluated on each independent test set.
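"Mean values over the five models" can be read as evaluating each fold model on the test set and then averaging the resulting metrics. That is different from ensembling the five models' probabilities before thresholding. The sketch below implements the metric-averaging reading and, for contrast only, the ensemble alternative. It assumes each fold model is a callable returning a binding probability for a (peptide, protein) pair, and it assumes a 0.5 threshold. All names are hypothetical.

```python
from statistics import mean

def accuracy(labels, probs, threshold=0.5):
    """ACC with an assumed 0.5 decision threshold."""
    return mean(int((p >= threshold) == bool(y)) for p, y in zip(probs, labels))

def metric_average(fold_models, pairs, labels):
    """Caption's reading: score each fold model separately, then average the five ACCs."""
    return mean(accuracy(labels, [m(pep, prot) for pep, prot in pairs]) for m in fold_models)

def ensemble_accuracy(fold_models, pairs, labels):
    """Contrast only: average the models' probabilities per pair first, then compute one ACC."""
    probs = [mean(m(pep, prot) for m in fold_models) for pep, prot in pairs]
    return accuracy(labels, probs)
```

The two can disagree. Suppose each of two models is right on only one of two pairs. The metric average is then 0.5, whereas the averaged probabilities may classify both pairs correctly.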
Table 2.
Statistical t-test p-values for ACC between the comparative methods and PepLM-GNN on the four combined test sets. This table reports the p-values of statistical t-tests between PepLM-GNN and the other comparative methods. The test data are the union of the four independent test datasets, and the t-tests are based on the mean ACC values of the five models from five-fold cross-validation on this combined dataset. A p-value < 0.05 indicates that the ACC of PepLM-GNN is statistically significantly higher than that of the comparative method.
Fig 3.
Comparison of PepPI predictions based on cluster-split datasets for predicting novel peptides, proteins, and peptide-protein pairs.
Values are the mean ± standard deviation over the five folds of five-fold cross-validation, with the standard deviation shown as error bars. The cluster-split (cold-start) datasets are constructed with the CD-HIT clustering algorithm at four sequence-identity thresholds (0.6, 0.7, 0.8, 0.9). They follow the CAMP strategy: no entities from the same cluster appear in both the training and test sets. This yields three sub-datasets: “novel peptides”, “novel proteins”, and “novel binding pairs”.
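The cold-start construction can be sketched as a cluster-grouped split. The sketch assumes the CD-HIT cluster id of every peptide and protein has already been parsed at the chosen threshold, for example from CD-HIT's `.clstr` output. Whole clusters are assigned to folds, so no cluster appears on both sides. For "novel binding pairs", a pair is discarded when its peptide and protein fall on different sides. This mirrors the CAMP-style constraint described in the caption, but it is not necessarily the exact procedure used. The function names are hypothetical.

```python
import random

def assign_cluster_folds(cluster_of, n_folds=5, seed=0):
    """Map each cluster id to a fold so that a cluster never spans two folds."""
    clusters = sorted(set(cluster_of.values()))
    random.Random(seed).shuffle(clusters)
    return {c: i % n_folds for i, c in enumerate(clusters)}

def cold_start_split(pairs, pep_cluster, prot_cluster, k, setting, n_folds=5, seed=0):
    """Split (peptide, protein, label) triples into train/test for fold k.

    setting: "novel_peptide", "novel_protein" or "novel_pair".
    """
    pep_fold = assign_cluster_folds(pep_cluster, n_folds, seed)
    prot_fold = assign_cluster_folds(prot_cluster, n_folds, seed)
    train, test = [], []
    for pep, prot, label in pairs:
        pep_test = pep_fold[pep_cluster[pep]] == k
        prot_test = prot_fold[prot_cluster[prot]] == k
        if setting == "novel_peptide":
            (test if pep_test else train).append((pep, prot, label))
        elif setting == "novel_protein":
            (test if prot_test else train).append((pep, prot, label))
        elif setting == "novel_pair":
            # Both partners must be unseen; mixed pairs are dropped to avoid leakage.
            if pep_test and prot_test:
                test.append((pep, prot, label))
            elif not pep_test and not prot_test:
                train.append((pep, prot, label))
        else:
            raise ValueError(f"unknown setting: {setting}")
    return train, test
```

Running the split once per CD-HIT threshold yields the four cold-start variants. Lower thresholds merge more sequences into each cluster and make the held-out entities less similar to the training data.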
Table 3.
Performance comparison between PepLM-GNN and variants using other pre-trained language models on the benchmark dataset in terms of ACC, F1, AUC, and AUPR. The benchmark dataset contains 17,244 samples (8,622 positive and 8,622 negative pairs). All metrics are the mean ± standard deviation over the five folds of five-fold cross-validation.
Table 4.
P-values of statistical t-tests for ACC: PepLM-GNN with different pre-trained models on the five-fold cross-validation set. This table reports the p-values of statistical t-tests between PepLM-GNN (using ProtT5) and variants using other pre-trained language models. T-tests are based on the mean ACC values from five-fold cross-validation on the benchmark dataset (17,244 samples: 8,622 positive + 8,622 negative pairs). A p-value < 0.05 indicates that the original PepLM-GNN has a statistically significantly higher ACC than the variant.
Table 5.
Ablation results of PepLM-GNN on the benchmark dataset using five-fold cross-validation. The ablation variants are A (pre-trained module + classification module), B (pre-trained module + GAT + classification module), C (pre-trained module + GCN + classification module), and D (pre-trained module + GIN + classification module). They are compared with the complete PepLM-GNN model (pre-trained module + GCN + GIN + classification module). All metrics (ACC, AUC) are the mean ± standard deviation over the five folds of five-fold cross-validation.
Table 6.
P-values of statistical t-tests for ACC: ablation experiments of PepLM-GNN on the five-fold cross-validation set. This table reports the p-values of t-tests comparing the complete PepLM-GNN model with each of its ablation variants: A (pre-trained module + classification module), B (pre-trained module + GAT + classification module), C (pre-trained module + GCN + classification module), and D (pre-trained module + GIN + classification module). T-tests are based on the mean ACC values from five-fold cross-validation on the benchmark dataset (17,244 samples: 8,622 positive + 8,622 negative pairs). A p-value < 0.05 indicates that the modules by which the variant differs from the complete model make a statistically significant contribution to the ACC of PepLM-GNN.
Fig 4.
Application of PepLM-GNN in virtual peptide drug screening.
(A) Relative changes in the binding probability predicted by PepLM-GNN for RBBp4 with alanine mutants of the SALL4 peptide, compared with experimentally measured changes in binding free energy. (B) Protein subgraphs interacting with the FFW peptide, extracted with the GNNExplainer tool. (C) Functional enrichment analysis of proteins interacting with the FFW peptide.
Fig 5.
The framework of PepLM-GNN, comprising four modules: ProtT5, graph convolution, graph isomorphism, and classification.
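The GCN-then-GIN stack named in the framework and ablation study can be sketched in NumPy. The captions do not specify how the graph is built. This sketch therefore assumes an undirected graph whose nodes carry ProtT5 embeddings, such as peptide and protein nodes. It also assumes a hypothetical pair classifier that concatenates the two nodes' final embeddings and applies a logistic output. The layer rules are the standard GCN propagation, ReLU(D^-1/2 (A + I) D^-1/2 H W), and the standard GIN update, MLP((1 + ε) h_v + Σ_u h_u). Layer sizes, activations, and the readout are assumptions rather than the published configuration.

```python
import numpy as np

def gcn_layer(A, H, W):
    """GCN propagation: ReLU(D^-1/2 (A + I) D^-1/2 H W) for a symmetric adjacency A."""
    A_hat = A + np.eye(A.shape[0])                 # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))  # degrees include the self-loop, so never zero
    A_norm = d_inv_sqrt[:, None] * A_hat * d_inv_sqrt[None, :]
    return np.maximum(A_norm @ H @ W, 0.0)

def gin_layer(A, H, W1, W2, eps=0.0):
    """GIN update: a two-layer MLP applied to (1 + eps) * h_v + sum of neighbour features."""
    agg = (1.0 + eps) * H + A @ H
    return np.maximum(agg @ W1, 0.0) @ W2

def predict_pair(A, X, i, j, params):
    """Hypothetical classification head: probability that nodes i and j interact."""
    H = gcn_layer(A, X, params["W_gcn"])
    H = gin_layer(A, H, params["W_gin1"], params["W_gin2"])
    z = np.concatenate([H[i], H[j]]) @ params["w_out"]
    return float(1.0 / (1.0 + np.exp(-z)))
```

In this sketch, the GCN layer smooths each node's ProtT5 features over its degree-normalized neighbourhood. The GIN layer then sums neighbour features without normalization, which is what gives GIN its injective, Weisfeiler-Lehman-style aggregation. Dropping either function reproduces the layer composition of ablation variants C and D.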