Fig 1.
Overview of the TastePepAI platform for taste peptide design.
TastePepAI is a fully automated, integrated computational platform. The platform first analyzes users’ requirements for specific taste characteristics, followed by four main steps: (1) Target taste peptide generation (light blue panel): utilizing LA-VAE to generate sequences with desired taste properties while suppressing unwanted tastes. (2) Clustering analysis of generated sequences to select representative peptides (light red panel). (3) Toxicity prediction using SpepToxPred (yellow-green panel). (4) Comprehensive physicochemical analysis of candidate peptides, including properties such as hydrophobicity, solubility, charge, stability, charge density, isoelectric point, aromaticity, and aliphatic index (light yellow panel).
Fig 2.
Sequence characteristics and taste property distribution of the curated taste peptide dataset.
(A) Length distribution of taste peptides. (B) Distribution of peptides across five basic taste categories: sour (201), sweet (162), bitter (541), salty (141), and umami (575) peptides. (C) Amino acid composition analysis across different taste categories. (D) Non-redundant classification analysis of 1131 taste peptides revealing the distribution of single and multiple taste properties. Colored circles represent different tastes (sour: light green, sweet: light red, bitter: light gray, salty: light blue, umami: light yellow), with multiple circles indicating peptides possessing multiple taste properties.
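The composition analysis in panel (C) amounts to counting amino acid frequencies within each taste category. A minimal sketch, with hypothetical peptide sequences standing in for the curated dataset:

```python
# Sketch of the per-category amino acid composition analysis (Fig 2C).
# The peptide sequences below are hypothetical placeholders, not dataset entries.
from collections import Counter

def aa_composition(peptides):
    """Relative frequency of each amino acid across a set of peptide sequences."""
    counts = Counter()
    for seq in peptides:
        counts.update(seq)
    total = sum(counts.values())
    return {aa: n / total for aa, n in counts.items()}

umami_peptides = ["EDE", "DES", "EE"]  # hypothetical examples
comp = aa_composition(umami_peptides)  # e.g. comp["E"] == 0.625
```

Running this per category and stacking the resulting frequency vectors yields the comparison shown in panel (C).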
Fig 3.
Architecture and workflow of LA-VAE.
(A) Schematic illustration of the loss-supervised adaptive data generation framework. The training process is divided into three phases: (1) Initial exploration phase (first half of total epochs, blue) monitors and records the global minimum loss while maintaining the model’s generative capability; (2) Convergence optimization phase (second half of total epochs, purple) generates sequences and terminates upon discovering a lower loss value; otherwise, training continues; (3) Extension phase (additional epochs, dark purple) is activated when no new optimal loss is found during phase (2), enabling further optimization. The lower panel shows the core components of the variational autoencoder architecture, including the encoder for latent space mapping, the latent space sampler, and the decoder for sequence reconstruction. Yellow and purple dots represent generated and training data points, respectively, illustrating the progressive refinement of the model’s generative distribution. (B) Contrastive learning-based taste property control mechanism. Left panel: Workflow of selective taste removal, where user-specified taste peptides are split into positive training and negative sets, each processed through variational autoencoders to establish contrasting latent spaces. Middle panel (Step 1): Visualization of latent space distribution displaying positive training data (pink), negative data (green), and generated data points (orange). Right panel (Step 2): Quality assessment of generated peptides based on Euclidean distances to k-nearest neighbors (k = 5). Upper plots show high-quality generated peptides (GP 1-3) with significant distance differences between positive and negative samples (*p < 0.05, **p < 0.01), while lower plots show low-quality peptides (GP 4-6) with non-significant differences (ns).
Scatter plots illustrate the spatial distribution of high-quality (upper) and low-quality (lower) generated peptides (gray) relative to positive training data (yellow) and negative data (green) in the latent space.
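The Step-2 quality assessment can be sketched as follows. For a generated peptide's latent vector, its Euclidean distances to the k = 5 nearest positive and negative training points are collected and compared; the paper applies a significance test to the two distance sets, which this sketch replaces with a simple mean comparison. All latent vectors here are hypothetical.

```python
# Sketch of the Fig 3B Step-2 check: a high-quality generated peptide should
# lie markedly closer to the positive latent cluster than to the negative one.
# Vectors are hypothetical; the paper uses a significance test (p < 0.05),
# approximated here by comparing mean k-NN distances.
import math

def knn_distances(query, points, k=5):
    """Euclidean distances from `query` to its k nearest neighbors in `points`."""
    dists = sorted(math.dist(query, p) for p in points)
    return dists[:k]

def looks_high_quality(query, positives, negatives, k=5):
    """Heuristic stand-in for the paper's significance test."""
    pos = knn_distances(query, positives, k)
    neg = knn_distances(query, negatives, k)
    return sum(pos) / len(pos) < sum(neg) / len(neg)

positives = [(0.1 * i, 0.1 * i) for i in range(10)]          # cluster near origin
negatives = [(5 + 0.1 * i, 5 + 0.1 * i) for i in range(10)]  # distant cluster
print(looks_high_quality((0.2, 0.3), positives, negatives))  # → True
```

A peptide failing this check (closer to, or equidistant from, the negative set) corresponds to the low-quality cases GP 4-6 in the lower plots.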
Fig 4.
Development and optimization of SpepToxPred.
(A) Systematic framework for feature engineering and model optimization. Upper panel: Integration of 20 sequence encoding descriptors (light yellow box) and 9 machine learning algorithms (light blue box). Middle panel: Performance evaluation of individual algorithms with their optimal feature combinations through 10-fold cross-validation, ranked by Matthews Correlation Coefficient (MCC). Lower panel: Weight optimization results for ensemble models, showing the top 5 configurations with different algorithm combinations. SpepToxPred (Model 1) achieved optimal performance with weights distributed across RF (0.3), LGBM (0.1), XGB (0.2), KNN (0.2), and LR (0.2). The full names of the feature and algorithm abbreviations are listed in Section 4.2.4. (B) Comprehensive performance comparison of SpepToxPred with 17 existing toxicity prediction tools on the independent test set. The evaluation metrics include true positives (TP), false positives (FP), true negatives (TN), false negatives (FN), accuracy, recall (sensitivity), precision, specificity, F1 score, and MCC. SpepToxPred and Models 2-5 represent the top five ensemble configurations from the optimization framework.
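The weighted combination behind SpepToxPred (Model 1) is a soft-voting scheme over the five base models, using the weights reported in the lower panel. A minimal sketch; the per-model probabilities below are hypothetical, not outputs of the trained models:

```python
# Sketch of SpepToxPred's weighted soft voting (Fig 4A, lower panel).
# Weights are taken from the figure; the base-model probabilities for the
# example peptide are hypothetical.
WEIGHTS = {"RF": 0.3, "LGBM": 0.1, "XGB": 0.2, "KNN": 0.2, "LR": 0.2}

def ensemble_toxicity_prob(member_probs):
    """Weighted average of per-model P(toxic) for one peptide."""
    return sum(WEIGHTS[name] * p for name, p in member_probs.items())

probs = {"RF": 0.9, "LGBM": 0.4, "XGB": 0.8, "KNN": 0.7, "LR": 0.6}
score = ensemble_toxicity_prob(probs)            # → 0.73
label = "toxic" if score >= 0.5 else "non-toxic"
```

Models 2-5 in panel (B) correspond to the same scheme with the other top-ranked weight configurations.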
Fig 5.
Electronic tongue analysis reveals concentration-dependent taste profiles of TastePepAI-generated peptides.
Taste characteristics of 73 peptides at two concentrations (0.1 mg/mL and 1 mg/mL). The intensity of each taste modality (sour, sweet, bitter, salty, and umami) is represented by colored dots, where the size reflects the quartile distribution of positive taste scores: large filled dots (75th-100th percentile), medium filled dots (50th-75th percentile), small filled dots (25th-50th percentile), and tiny dots (0-25th percentile). Large empty circles indicate undetected taste responses. At 0.1 mg/mL concentration, all peptides exhibited sweet and umami characteristics, whereas at 1 mg/mL concentration, a universal salty response was observed across all samples.
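The dot-size encoding can be sketched as quartile binning of the positive taste scores, with undetected responses mapped to the empty-circle category. The score values here are hypothetical; this is only an illustration of the binning rule, not the measurement pipeline:

```python
# Sketch of the quartile binning used for dot sizes in Fig 5.
# Scores are hypothetical; None stands for an undetected taste response,
# which is drawn as a large empty circle in the figure.
def quartile_bins(scores):
    """Map each score to its quartile bin among the positive scores."""
    positive = sorted(s for s in scores if s is not None)
    def bin_of(s):
        if s is None:
            return "undetected"
        rank = sum(p <= s for p in positive) / len(positive)  # percentile rank
        if rank <= 0.25:
            return "tiny"      # 0-25th percentile
        if rank <= 0.50:
            return "small"     # 25th-50th percentile
        if rank <= 0.75:
            return "medium"    # 50th-75th percentile
        return "large"         # 75th-100th percentile
    return [bin_of(s) for s in scores]
```

For example, `quartile_bins([1, 2, 3, 4, None])` assigns the four positive scores to the four size bins in order and labels the missing response "undetected".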
Fig 6.
Development and deployment of integrated open-access platforms for taste peptide research.
(A) Logo and landing page of TastePepMap, a comprehensive database for taste peptides. (B) User interface of TastePepMap. (C) Logo and entry page of TastePepAI. (D) User interface of TastePepAI. (E) Logo and entry page of SpepToxPred, a tool for AI-driven peptide toxicity prediction. (F) User interface of SpepToxPred.