Fig 1.
Flowchart of ENCAP.
Table 1.
Evaluation results of cross validation on DS1-CV for all the ML models.
Boldface indicates the highest value among all the methods.
Table 2.
Evaluation results of cross validation on DS2-CV for all the ML models.
Boldface indicates the highest value among all the methods.
Table 3.
Evaluation results of independent test on DS1-IND for all the ML models.
Boldface indicates the highest value among all the methods.
Table 4.
Evaluation results of independent test on DS2-IND for all the ML models.
Boldface indicates the highest value among all the methods.
Fig 2.
t-SNE distributions of (A) DS1-CV using 4349 features, (B) DS1-CV using 150 selected features, (C) DS2-CV using 4349 features, and (D) DS2-CV using 210 selected features.
Table 5.
Numbers and sizes of the selected features for the five most frequent feature types (excluding motifs) used for machine learning on DS1 and DS2.
Fig 3.
Beeswarm plots of SHAP values for the top 20 features based on (A) DS1 and (B) DS2.
Fig 4.
Analysis of prediction performance, measured by the Matthews correlation coefficient (MCC), with respect to three peptide properties.
Panels A, B, and C correspond to the ratios of hydrophobic, hydrophilic, and charged amino acids, respectively, for peptides from DS1-IND. Panels D, E, and F correspond to the ratios of hydrophobic, hydrophilic, and charged amino acids, respectively, for peptides from DS2-IND.