Fig 1.
Overview of the SimPep framework, a seven-step process for osteogenic peptide detection (OPD).
(A) Input sets: (positive training set),
(negative training set) and
(test set), (B) Peptide representation based on biological features (Z-scale and T-scale) and embeddings obtained from protein language models (ProtBERT and ESM-2); ProtBERT is selected as the optimal representation, (C) Balanced dataset construction (
) for osteogenic peptide similarity (OPS) classification problem where
: the pairs of known OPs share the same osteogenic properties,
: the pairs of non-OPs also share the same osteogenic properties,
: the pairs of OPs and non-OPs exhibit varying osteogenicity;
and
= randomly oversampled
where
, (D) SimPep-Net: a siamese model architecture for OPS classification prediction, (E) Iterative training: if accuracy is unstable after 5 epochs, a new balanced dataset is generated (repeat C) for retraining, (F) SimPep-Net evaluation for OPS classification prediction, (G) OPD prediction: unknown peptides in
are paired with known peptides to infer osteogenicity using SimPep-Net.
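The balanced pair construction in step (C) can be sketched as follows. This is a minimal illustration, not the authors' implementation. The function name `build_balanced_pairs`, the label convention (1 for pairs that share the same osteogenic status, 0 for OP/non-OP pairs), the rule of oversampling whichever class is smaller, and the placeholder sequences are all assumptions introduced here.

```python
import itertools
import random


def build_balanced_pairs(ops, non_ops, seed=0):
    """Build labelled peptide pairs and balance the two classes.

    Assumed label convention (not taken from the paper):
      1 = pair shares the same osteogenic status (OP/OP or non-OP/non-OP)
      0 = pair mixes an OP with a non-OP
    """
    rng = random.Random(seed)

    # Pairs within the same set are treated as "similar".
    similar = [(a, b, 1) for a, b in itertools.combinations(ops, 2)]
    similar += [(a, b, 1) for a, b in itertools.combinations(non_ops, 2)]

    # Cross pairs (OP, non-OP) are treated as "dissimilar".
    dissimilar = [(a, b, 0) for a, b in itertools.product(ops, non_ops)]

    # Assumption: randomly oversample the smaller class (with replacement)
    # until both classes have the same number of pairs.
    small, large = sorted([similar, dissimilar], key=len)
    if not small:
        raise ValueError("need enough peptides to form both kinds of pairs")
    extra = [rng.choice(small) for _ in range(len(large) - len(small))]

    pairs = similar + dissimilar + extra
    rng.shuffle(pairs)
    return pairs
```

With three placeholder OPs and two placeholder non-OPs there are 4 similar pairs and 6 dissimilar pairs, so 2 similar pairs are resampled to give 6 of each. Calling the function again with a different `seed` yields a different balanced sample, which is one way the regeneration in step (E) could be realized.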
Fig 2.
The architecture of the SimPep-Net model.
(A) A pair of peptides is provided as input to SimPep-Net, with each peptide encoded to a 1024-dimensional vector using the pre-trained ProtBERT model (
and
, (B) Each vector is mapped individually to a 32-dimensional (
and
latent space via a non-linear function
, (C) The absolute difference between the two latent vectors is computed and passed through a fully connected layer with 16 neurons followed by a sigmoid activation to predict peptide similarity.
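The forward pass described in panels (A) to (C) can be sketched in plain Python. This is an untrained illustration under stated assumptions, not the published model: the non-linear mapping is assumed to be a single dense layer with ReLU, the 16-neuron layer is assumed to use ReLU and to feed a single sigmoid output unit, and the class name `SiameseSketch`, the random initialization, and the random stand-in embeddings are hypothetical. Real inputs would be 1024-dimensional ProtBERT embeddings.

```python
import math
import random


def make_dense(n_in, n_out, rng):
    """Random weights and zero biases for a dense layer (illustrative init)."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


def dense(x, layer):
    """Apply a dense layer: one dot product plus bias per output unit."""
    weights, biases = layer
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, biases)]


def relu(values):
    return [max(0.0, v) for v in values]


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


class SiameseSketch:
    """Shared encoder, absolute difference, 16-unit layer, sigmoid output."""

    def __init__(self, embed_dim=1024, latent_dim=32, hidden_dim=16, seed=0):
        rng = random.Random(seed)
        self.encoder = make_dense(embed_dim, latent_dim, rng)  # shared branch
        self.hidden = make_dense(latent_dim, hidden_dim, rng)
        self.out = make_dense(hidden_dim, 1, rng)  # assumed single output unit

    def encode(self, x):
        # Both peptides pass through the same weights (siamese property).
        return relu(dense(x, self.encoder))

    def similarity(self, x1, x2):
        z1, z2 = self.encode(x1), self.encode(x2)
        diff = [abs(a - b) for a, b in zip(z1, z2)]
        h = relu(dense(diff, self.hidden))
        return sigmoid(dense(h, self.out)[0])
```

Because both branches share weights and the merge uses an absolute difference, the predicted similarity does not depend on the order of the two peptides in a pair.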
Fig 3.
The validation of the SimPep-Net model for OPS prediction and the SimPep framework for OPD prediction under different dropout settings.
Fig 4.
The validation of the SimPep-Net model for OPS prediction and the SimPep framework for OPD prediction under different learning rate settings.
Fig 5.
The validation of the SimPep-Net model for OPS prediction and the SimPep framework for OPD prediction under different optimizer configurations.
Table 1.
The validation of the SimPep-Net model for the OPS classification problem based on peptide representation by ProtBERT and ESM-2 embeddings, and Z-Scale and T-Scale biological features.
Table 2.
The validation of the SimPep framework for OPD based on peptide representation by ProtBERT and ESM-2 embeddings, and Z-Scale and T-Scale biological features.
Table 3.
The SimPep-Net performance for the OPS classification problem in each fold of five-fold cross-validation.
Table 4.
The SimPep performance in OPD prediction in each fold of five-fold cross-validation.
Table 5.
The SimPep-Net performance using as the non-OP set for the OPS classification problem.
Table 6.
The SimPep framework performance using as the non-OP set for OPD prediction.
Table 7.
criterion for comparing the performance of the framework for two different non-OP sets.
Fig 6.
Benchmarking SimPep against RF, SVM, and XGBoost (XGB).
Table 8.
The OPD prediction score ( in the seventh step of the SimPep framework) on external osteogenic peptides published between 2022 and 2024. Bioactivity score (
) is computed by PeptideRanker [39].
Table 9.
The OPD prediction ( in the seventh step of the SimPep framework) on external non-osteogenic peptides published in [22]. Bioactivity score (
) is computed by PeptideRanker [39].
Table 10.
The list of potential osteogenic peptides derived from casein types using the SimPep framework where (
in the seventh step of the framework).
shows the number of runs of the framework, out of 10, where
.
Table 11.
The list of key receptors that are relevant to osteogenesis.
Fig 7.
The distribution of docking scores.
(A) Based on each peptide within 13 receptors, (B) Based on 11 peptides using 13 receptors.
Fig 8.
The docking of peptide P1 and the intended receptors.
(A) The predicted structure of peptide P1 using AAT Bioquest, (B) The docking of peptide P1 and the Frizzled-2 receptor, (C) The docking of peptide P1 and the Frizzled-4 receptor.