TastepepAI: An artificial intelligence platform for taste peptide de novo design
Fig 4
Development and optimization of SpepToxPred.
(A) Systematic framework for feature engineering and model optimization. Upper panel: Integration of 20 sequence encoding descriptors (light yellow box) and 9 machine learning algorithms (light blue box). Middle panel: Performance of each individual algorithm with its optimal feature combination under 10-fold cross-validation, ranked by Matthews Correlation Coefficient (MCC). Lower panel: Weight optimization results for the ensemble models, showing the top 5 configurations with different algorithm combinations. SpepToxPred (Model 1) achieved the best performance with weights distributed across RF (0.3), LGBM (0.1), XGB (0.2), KNN (0.2), and LR (0.2). The full names of the feature and algorithm abbreviations are listed in Section 4.2.4. (B) Comprehensive performance comparison of SpepToxPred with 17 existing toxicity prediction tools on the independent test set. The evaluation metrics are true positives (TP), false positives (FP), true negatives (TN), false negatives (FN), accuracy, recall (sensitivity), precision, specificity, F1 score, and MCC. SpepToxPred and Models 2-5 are the top five ensemble configurations from the optimization framework.
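The weighting in panel A and the metrics in panel B can be illustrated with a minimal Python sketch. It assumes the ensemble weights are applied to each model's predicted toxicity probability (soft voting), which the caption does not state explicitly. The function names `weighted_probability` and `confusion_metrics` are hypothetical and are not taken from the TastepepAI code. The metric formulas are the standard definitions based on TP, FP, TN, and FN.

```python
import math

# Ensemble weights reported in the caption for SpepToxPred (Model 1).
WEIGHTS = {"RF": 0.3, "LGBM": 0.1, "XGB": 0.2, "KNN": 0.2, "LR": 0.2}


def weighted_probability(probs, weights=WEIGHTS):
    """Weighted average of per-model probabilities.

    Assumption: soft voting over predicted probabilities; the caption only
    gives the weights, not how they are combined.
    """
    total = sum(weights.values())
    return sum(weights[name] * probs[name] for name in weights) / total


def _safe_div(num, den):
    """Return num / den, or 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def confusion_metrics(tp, fp, tn, fn):
    """Standard binary classification metrics from confusion-matrix counts."""
    recall = _safe_div(tp, tp + fn)        # sensitivity
    precision = _safe_div(tp, tp + fp)
    specificity = _safe_div(tn, tn + fp)
    accuracy = _safe_div(tp + tn, tp + fp + tn + fn)
    f1 = _safe_div(2 * precision * recall, precision + recall)
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = _safe_div(tp * tn - fp * fn, mcc_den)
    return {
        "accuracy": accuracy,
        "recall": recall,
        "precision": precision,
        "specificity": specificity,
        "f1": f1,
        "mcc": mcc,
    }


if __name__ == "__main__":
    # Illustrative inputs only; these are not values from the paper.
    example_probs = {"RF": 0.9, "LGBM": 0.8, "XGB": 0.7, "KNN": 0.6, "LR": 0.5}
    print(weighted_probability(example_probs))
    print(confusion_metrics(tp=40, fp=10, tn=45, fn=5))
```

Dividing by the sum of the weights keeps the combined score in the range 0 to 1 even if a configuration's weights do not sum exactly to 1. MCC uses all four confusion-matrix counts, which makes it a reasonable ranking criterion when the toxic and non-toxic classes are imbalanced.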