Abstract
Tyrosinase plays a central role in melanin biosynthesis, and its dysregulation has been implicated in the pathogenesis of various pigmentation disorders. The precise identification of tyrosinase inhibitory peptides (TIPs) is critical, as these bioactive molecules hold significant potential for therapeutic and cosmetic applications, including the treatment of hyperpigmentation and the development of skin-whitening agents. To date, computational methods have received significant attention as a complement to experimental methods for the in silico identification of TIPs, reducing the need for extensive material resources and labor-intensive processes. In this study, we propose an innovative computational approach, BLSAM-TIP, which combines a bidirectional long short-term memory (BiLSTM) network and a self-attention mechanism (SAM) for accurate and large-scale identification of TIPs. In BLSAM-TIP, we first employed various multi-source feature embeddings, including conventional feature encodings, natural language processing-based encodings, and protein language model-based encodings, to encode comprehensive information about TIPs. Secondly, we integrated these feature embeddings to enhance feature representation, while a feature selection method was applied to optimize the hybrid features. Thirdly, the BiLSTM-SAM architecture was specially developed to highlight the crucial features. Finally, the features from BiLSTM-SAM were fed to a deep neural network (DNN) to identify TIPs. Experimental results on an independent test dataset demonstrate that BLSAM-TIP attains superior predictive performance compared to existing methods, with a balanced accuracy of 0.936, MCC of 0.922, and AUC of 0.988. These results indicate that this new method is an accurate and efficient tool for identifying TIPs. Our proposed method is available at https://github.com/saeed344/BLSAM-TIP for TIP identification and reproducibility purposes.
Citation: Ahmed S, Schaduangrat N, Chumnanpuen P, Mahmud SMH, Goh KOM, Shoombuatong W (2025) BLSAM-TIP: Improved and robust identification of tyrosinase inhibitory peptides by integrating bidirectional LSTM with self-attention mechanism. PLoS One 20(10): e0333614. https://doi.org/10.1371/journal.pone.0333614
Editor: Yunhe Wang, Hebei University of Technology, CHINA
Received: January 15, 2025; Accepted: September 15, 2025; Published: October 8, 2025
Copyright: © 2025 Ahmed et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: The source code of BLSAM-TIP and all the data used in this study are freely available at https://figshare.com/s/e0ddad96bb9a366b373d and https://github.com/saeed344/BLSAM-TIP.
Funding: This project is funded by the National Research Council of Thailand and Mahidol University (N42A660380), and Mahidol University Partnering Initiative under the MU-KMUTT Biomedical Engineering & Biomaterials Research Consortium. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors declare that they have no competing interests.
1. Introduction
Tyrosinase is a crucial enzyme involved in the biosynthesis of melanin, catalyzing the initial steps of melanogenesis in mammals and contributing to enzymatic browning in fruits and vegetables [1–3]. This copper-containing oxidase facilitates the oxidation of phenolic compounds, leading to the formation of melanin, which plays an essential role in pigmentation and protection against UV radiation [1]. However, excessive melanin production can result in hyperpigmentation disorders, and the browning of fruits during storage can lead to economic losses in the food industry. Consequently, there is a significant interest in developing tyrosinase inhibitors as therapeutic agents for skin conditions and as preservatives in food products [4,5]. The overproduction of melanin can lead to hyperpigmentation disorders, such as melasma and age spots, which pose both cosmetic and medical concerns [6,7]. As a result, there has been growing interest in identifying effective inhibitors of tyrosinase to mitigate these conditions [8,9]. The search for effective tyrosinase inhibitors has led to the identification of various natural and synthetic compounds. These inhibitors can be classified into different categories based on their mechanisms of action and chemical structures. Some inhibitors act as competitive or non-competitive agents, while others may irreversibly bind to the enzyme, effectively inactivating it during catalysis. For instance, kojic acid is one of the most well-studied tyrosinase inhibitors and serves as a benchmark for evaluating the efficacy of new compounds [10,11].
Tyrosinase inhibitory peptides (TIPs) have emerged as promising candidates for reducing melanin production. These peptides, typically composed of 3–20 amino acids, can effectively inhibit tyrosinase activity. Recent studies have demonstrated that various bioactive peptides derived from natural sources exhibit strong tyrosinase inhibitory properties, offering a safer alternative to traditional chemical inhibitors such as hydroquinone and kojic acid, which may cause adverse side effects [12,13]. The mechanisms by which TIPs exert their inhibitory effects are multifaceted. These peptides can bind to the active site of tyrosinase, leading to competitive inhibition or, in some cases, irreversible inhibition. Additionally, TIPs may modulate signaling pathways involved in melanogenesis, further enhancing their effectiveness in treating hyperpigmentation [9,14].
The methodologies employed for identifying TIPs can be broadly categorized into in vitro and in silico approaches, each with distinct advantages and limitations. Current experimental methods face significant challenges in high-throughput screening due to their labor-intensive and expensive nature [8,15]. Recently, advancements in computational methods, such as machine learning (ML) algorithms, have facilitated the prediction and identification of novel TIPs. These approaches allow researchers to screen thousands of peptides based on their structural properties and predicted anti-tyrosinase activities, demonstrating high accuracy rates [8,12,15]. By integrating bioinformatics with peptide research, these advancements are driving more precise and efficient strategies for tyrosinase inhibition. Notable examples include ML-based methods such as TIP-KNN and TIP-RF [15], as well as TIPred [8]. Comprehensive details on these cutting-edge techniques are provided in earlier studies [8]. Despite ongoing improvements in the predictive performance of these advanced methods [8,15], their practical effectiveness in real-world applications remains inadequate. Key challenges include the limited availability of known TIPs and issues related to imbalanced learning.
Although the existing methods facilitate the identification of TIPs, several challenges remain to be addressed. First, relying on single feature descriptor is insufficient for capturing the comprehensive information of TIPs [16–20]. Second, protein language models (PLMs), inspired by natural language models (LMs), have recently shown effectiveness in generating peptide sequence representations [19,21,22]. Since PLMs are pre-trained on extensive protein databases such as BFD [23,24], UniRef [25], and Pfam [26], which collectively contain over a billion protein sequences, they can extract comprehensive and valuable information. Regrettably, no studies have yet employed PLMs to generate feature representations for TIPs. Third, the imbalance between TIPs and non-TIPs in datasets can adversely affect the prediction performance of the models. Finally, the overall prediction accuracy and robustness of existing methods remains inadequate, highlighting the need for further improvements.
To address these deficiencies, a novel computational approach, termed BLSAM-TIP, leveraging a combination of bidirectional long short-term memory (BiLSTM) and a self-attention mechanism (SAM), is proposed for the accurate and large-scale identification of TIPs (Fig 1). The major contributions of the proposed model can be summarized in the following four aspects. First, to capture multi-view and comprehensive information about TIPs, various feature encoding schemes were employed, encompassing sequential information, graphical information, statistical information, contextual information, and protein semantic information. Second, the synthetic minority oversampling technique (SMOTE) was utilized to address the impact of data imbalance on the model’s performance. Additionally, the least absolute shrinkage and selection operator (LASSO) method was applied to optimize the combined features, potentially enhancing the model performance. Third, the BiLSTM-SAM-DNN architecture was specially constructed to reduce interference from irrelevant information and subsequently employed to identify TIPs. Fourth, benchmark experiments on the independent test set illustrated that BLSAM-TIP significantly outperformed existing state-of-the-art methods, achieving a balanced accuracy (BACC) of 0.936, Matthews correlation coefficient (MCC) of 0.870, and an area under the receiver operating characteristic curve (AUC) of 0.988.
(A) Data construction. (B) Overall framework of BLSAM-TIP. (C) Performance evaluation and ablation experiments.
2. Materials and methods
2.1 Data collection and curation
The existing predictors for TIPs were developed and fine-tuned using a limited dataset of TIPs and non-TIPs, as detailed in S1 Table. Developing a high-accuracy predictive model necessitates a larger sample size [27–29]. To construct an updated and high-quality dataset, specific filtering criteria were applied to the initial TIPs and non-TIPs: (i) peptide sequences containing unusual letters such as ‘B’, ‘U’, ‘X’, or ‘Z’ were eliminated; and (ii) duplicate peptide sequences were removed. After applying these filters, a refined dataset comprising 206 TIPs and 502 non-TIPs was compiled. The TIPs were sourced from our previous research [8] and six recently published studies [12,15,30–34], while 401 non-TIPs were taken from our earlier work [8] and Qin et al. [35]. Since TIPs are typically shorter than 20 amino acid residues [9], both TIP and non-TIP datasets were restricted to peptide lengths ranging from 2 to 20 amino acids. For the establishment of training and independent test datasets, we adhered to the criteria set forth by Charoenkwan et al. [8]. The training dataset consisted of 164 TIPs and 401 non-TIPs, while the independent test dataset included 42 TIPs and 101 non-TIPs. Additional details regarding the composition of the training and independent test datasets can be found in S1 Table.
2.2 Feature encoding scheme
To capture comprehensive information about TIPs, we employed six feature encoding methods from different perspectives, including conventional feature encodings, NLP-based encodings, and protein language model-based encodings. For conventional feature encoding, we applied a novel feature extraction method called FEGS, which is capable of capturing graphical and statistical information [36]. FEGS integrates two interpretable feature descriptors (i.e., amino acid composition (AAC) and dipeptide composition (DPC)) with the physicochemical properties of amino acids. Based on FEGS, any peptide sequence is encoded as a 578-D feature vector. In recent years, embedding methods inspired by NLP techniques have gained attention in the field of bioinformatics and computational biology [37–40]. These methods provide contextual information for peptide sequences [37,38,41]. Among them, FastText is a powerful embedding method that leverages morphological information to address the issue of out-of-vocabulary words [42,43], thereby improving performance in downstream tasks. Herein, FastText generates a 120-D feature vector for peptide sequences. With advancements in NLP techniques and the availability of millions of protein sequences, PLMs have been increasingly employed as embedding extractors. In this study, we utilized four well-known PLMs, namely bidirectional encoder representations from transformers (BERT) [44], ProtT5-U50, ProtT5-BFD, and ESM-2, to encode peptide sequences into feature embeddings (i.e., distributed vector representations). The text-to-text transfer transformer (T5) [45] architecture was used to develop both ProtT5-BFD and ProtT5-U50. ProtT5-U50 was trained on Uniref50 [25], which contains 45 million protein sequences, while ProtT5-BFD was trained on BFD [46], a database comprising 2.1 billion protein sequences.
To account for the relatively small size of the training dataset, we used ESM-2, which was trained on the UR50/D 2021_04 dataset (called esm2_t6_8M_UR50D). Additionally, we used the esm2_t33_650M_UR50D model [47], which is based on the BERT architecture and was trained on Uniref50. Using these PLMs, peptide sequences were encoded as feature vectors of varying dimensions: 768-D, 1024-D, 1024-D, and 320-D for BERT, ProtT5-U50, ProtT5-BFD, and ESM-2, respectively.
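The full FEGS transform involves graphical representations beyond the scope of a short example, but its two interpretable components, AAC and DPC, are straightforward to compute. The sketch below is an illustrative implementation in pure Python, not the authors' code; the function names are ours.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def aac(seq: str) -> list:
    """Amino acid composition: 20-D vector of residue frequencies."""
    return [seq.count(a) / len(seq) for a in AMINO_ACIDS]

def dpc(seq: str) -> list:
    """Dipeptide composition: 400-D vector of dipeptide frequencies."""
    counts = {"".join(p): 0 for p in product(AMINO_ACIDS, repeat=2)}
    for i in range(len(seq) - 1):
        counts[seq[i:i + 2]] += 1
    total = max(len(seq) - 1, 1)
    return [counts[p] / total for p in counts]

# A peptide maps to a 420-D interpretable descriptor:
vec = aac("ACDKA") + dpc("ACDKA")
```

Both vectors are frequency distributions, so each sums to 1 for any non-empty peptide.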
2.3 Feature selection method
In the field of bioinformatics, feature selection plays an important role in enhancing model efficiency and addressing overfitting [48–50]. Robert Tibshirani introduced a well-regarded feature selection method called LASSO, which performs feature selection and regularization simultaneously and has proven effective in identifying beneficial features from high-dimensional data [51]. Given $X \in \mathbb{R}^{n \times p}$ and $y \in \mathbb{R}^{n}$ as the features and classes, the linear regression model is defined as follows:

$$y = X\beta + \varepsilon$$

where $X = (x_1, x_2, \ldots, x_p)$, $\beta = (\beta_1, \beta_2, \ldots, \beta_p)^{T}$, and $\varepsilon$ is the error term. In the LASSO method, the goal is to determine the optimal value of $\beta$ under a special penalty constraint. The LASSO estimate is defined as follows:

$$\hat{\beta} = \underset{\beta}{\arg\min}\; \lVert y - X\beta \rVert_{2}^{2} + \lambda \lVert \beta \rVert_{1}$$

where $\lVert \cdot \rVert_{2}$ represents the Euclidean norm and $\lambda \ge 0$ controls the strength of the penalty.
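To make the estimator concrete, the following is a minimal illustrative sketch (not the implementation used in BLSAM-TIP, which relies on standard libraries): the soft-thresholding operator at the heart of LASSO, plus a basic coordinate-descent loop that assumes the feature columns are scaled to unit norm.

```python
import numpy as np

def soft_threshold(b, lam):
    """Soft-thresholding operator: shrinks coefficients toward zero,
    setting those with magnitude below lam exactly to zero."""
    return np.sign(b) * np.maximum(np.abs(b) - lam, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    """Coordinate-descent LASSO, assuming unit-norm columns of X."""
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(n_iter):
        for j in range(p):
            # partial residual excluding feature j's current contribution
            r = y - X @ beta + X[:, j] * beta[j]
            beta[j] = soft_threshold(X[:, j] @ r, lam)
    return beta
```

The exact zeros produced by soft-thresholding are what make LASSO a feature selection method: features whose coefficients shrink to zero are discarded.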
2.4 Bidirectional long short-term memory and self-attention mechanism
Long short-term memory (LSTM) networks can learn long-term sequential features without requiring a large number of handcrafted features, unlike traditional ML models, which often depend on additional features to improve performance. The LSTM was developed to address the vanishing gradient problem [52,53], a challenge encountered in recurrent neural networks (RNNs) [54]. LSTMs use memory cells to decide which information to retain and which to discard, enabling them to capture long-range contextual information effectively. The structure of an LSTM typically contains three gates, namely the input gate ($i_t$), the forget gate ($f_t$), and the output gate ($o_t$), together with the memory cell ($c_t$). At time $t$, the formulations of the LSTM structure can be defined as follows:

$$i_t = \sigma(W_i[h_{t-1}, x_t] + b_i)$$
$$f_t = \sigma(W_f[h_{t-1}, x_t] + b_f)$$
$$o_t = \sigma(W_o[h_{t-1}, x_t] + b_o)$$
$$\tilde{c}_t = \tanh(W_c[h_{t-1}, x_t] + b_c)$$
$$c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t$$
$$h_t = o_t \odot \tanh(c_t)$$

where $W_i$, $W_f$, and $W_o$ represent the weights of $i_t$, $f_t$, and $o_t$, respectively, while $b_i$, $b_f$, and $b_o$ are the biases of the input gate, forget gate, and output gate, respectively; $\sigma$ denotes the sigmoid function and $\odot$ element-wise multiplication. $c_t$ is the updated cell state, generated from the previous cell state $c_{t-1}$ and the candidate state $\tilde{c}_t$, and $h_t$ is the hidden state. Rather than using a unidirectional LSTM, we applied BiLSTM, which consists of two LSTM layers, one processing the sequence in the forward direction and the other in the backward direction. This design enables BiLSTM to capture both past and future contexts, thereby achieving better prediction performance than standard LSTMs.
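The gate equations and the bidirectional design can be sketched in a few lines of NumPy. This is an illustrative toy with random weights, not the trained BiLSTM used in BLSAM-TIP; for compactness, the four gate pre-activations share one stacked weight matrix W.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM step. W maps [h_prev; x_t] to the four stacked gate
    pre-activations (input, forget, output, candidate)."""
    d = h_prev.shape[0]
    z = W @ np.concatenate([h_prev, x_t]) + b
    i, f, o = sigmoid(z[:d]), sigmoid(z[d:2*d]), sigmoid(z[2*d:3*d])
    c_tilde = np.tanh(z[3*d:])
    c_t = f * c_prev + i * c_tilde      # updated cell state
    h_t = o * np.tanh(c_t)              # hidden state
    return h_t, c_t

def bilstm(seq, d, rng=np.random.default_rng(0)):
    """Toy BiLSTM: run one LSTM forward and one backward over the
    sequence and concatenate the hidden states at each position."""
    k = seq[0].shape[0]
    Wf, bf = rng.normal(size=(4 * d, d + k)), np.zeros(4 * d)
    Wb, bb = rng.normal(size=(4 * d, d + k)), np.zeros(4 * d)
    hf = hb = np.zeros(d)
    cf = cb = np.zeros(d)
    fwd, bwd = [], []
    for x in seq:
        hf, cf = lstm_step(x, hf, cf, Wf, bf)
        fwd.append(hf)
    for x in reversed(seq):
        hb, cb = lstm_step(x, hb, cb, Wb, bb)
        bwd.append(hb)
    return [np.concatenate([f, b]) for f, b in zip(fwd, reversed(bwd))]
```

Each output position thus carries a 2d-dimensional representation combining past (forward pass) and future (backward pass) context.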
2.5 Self-attention mechanism
To date, the attention mechanism has effectively helped models highlight important parts of sentences in several NLP tasks, such as aspect-level sentiment classification. Specifically, the attention mechanism can automatically extract significant word embeddings from text sequences during model training [55,56]. Several previous studies have demonstrated its successful application in enhancing model performance in bioinformatics and computational biology [48,57,58]. Thus, after obtaining the features generated from BiLSTM, we employed the SAM to strengthen specific BiLSTM-based feature representations. In the SAM structure, given an input $X$, three standard matrices Query ($Q$), Key ($K$), and Value ($V$) are generated, and the attention output is calculated as follows:

$$Q = XW_Q, \quad K = XW_K, \quad V = XW_V$$
$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V$$

where $Q, K \in \mathbb{R}^{n \times d_k}$ and $V \in \mathbb{R}^{n \times d_v}$. Here, $d_k$ represents the dimensionality of $Q$ and $K$, while $W_Q$, $W_K$, and $W_V$ are the weight matrices used to compute $Q$, $K$, and $V$, respectively.
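The scaled dot-product formulation admits a direct NumPy transcription. This is an illustrative sketch only; the projection matrices here are arbitrary arguments, whereas in the actual model they are learned weights.

```python
import numpy as np

def self_attention(X, W_q, W_k, W_v):
    """Scaled dot-product self-attention: softmax(QK^T / sqrt(d_k)) V."""
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
    A = np.exp(scores)
    A /= A.sum(axis=-1, keepdims=True)             # attention weights per row
    return A @ V, A
```

Each row of A is a probability distribution over the input positions, so the output at every position is a weighted mixture of the value vectors, with the weights emphasizing the most relevant features.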
2.6 The overall framework and performance of BLSAM-TIP
Fig 1 illustrates the overall framework and performance of the proposed BLSAM-TIP model for identifying TIPs. As shown in Fig 1, BLSAM-TIP is a DL-based prediction model where the input is a query peptide sequence, and the output is the confidence score for TIP identification. The BLSAM-TIP framework consists of two main procedures: (i) multi-view feature extraction and optimization, and (ii) TIP identification using the BiLSTM-SAM-DNN architecture.
Procedure I: Multi-view feature extraction and optimization.
The input peptide sequence is processed using various feature encoding methods, including FastText, BERT, ProtT5-U50, ProtT5-BFD, ESM-2, and FEGS. These methods generate feature vectors of dimensions 120-D, 768-D, 1024-D, 1024-D, 320-D, and 578-D, respectively. These diverse feature representations capture different types of information, such as sequential information, graphical information, contextual information, and protein semantic information. To comprehensively represent multi-view information beneficial for TIP identification, we combined the above-mentioned feature representations. Given the imbalance between TIP and non-TIP samples (i.e., 164 TIPs and 401 non-TIPs), the learning accuracy and model performance might be impaired [59,60]. Thus, to address data imbalance, the SMOTE method was employed to oversample TIPs [61]. As a result, we obtained a hybrid feature vector containing 3848 features. To eliminate noise and irrelevant information, several well-known feature selection methods were applied, generating various feature subsets. Finally, the optimal feature subset was selected based on the best-performing cross-validation MCC value.
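The SMOTE oversampling step can be sketched as follows, assuming a NumPy array of minority-class (TIP) feature vectors; this mirrors the idea of the method in [61] rather than the exact library implementation used in practice.

```python
import numpy as np

def smote(X_min, n_new, k=3, rng=np.random.default_rng(1)):
    """Minimal SMOTE: each synthetic sample is a random interpolation
    between a minority sample and one of its k nearest minority neighbours."""
    synthetic = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))
        d = np.linalg.norm(X_min - X_min[i], axis=1)
        neighbours = np.argsort(d)[1:k + 1]   # skip the sample itself
        j = rng.choice(neighbours)
        gap = rng.random()                    # interpolation factor in [0, 1)
        synthetic.append(X_min[i] + gap * (X_min[j] - X_min[i]))
    return np.array(synthetic)
```

Because every synthetic point lies on a segment between two real minority samples, the new samples stay inside the minority-class region of the feature space rather than being mere duplicates.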
Procedure II: TIP identification using the BiLSTM-SAM-DNN architecture.
The BiLSTM-SAM architecture, which combines BiLSTM and SAM, was specifically designed to mitigate interference from irrelevant information and enhance prediction performance. Finally, the BiLSTM-SAM-based feature representations were input into deep neural networks (DNN) for the identification of TIPs [48,62]. The performance of BLSAM-TIP and related prediction models was evaluated using several metrics, including BACC, AUC, MCC, F1, area under the precision-recall curve (AUPR), sensitivity (SN), and specificity (SP) [63–68]. Additional details about these seven performance metrics are provided in the Supplementary information.
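For reference, the threshold-based metrics among these can be computed directly from the confusion matrix; the helper below is an illustrative sketch (AUC and AUPR additionally require ranking the confidence scores, which is omitted here).

```python
import math

def metrics(tp, fp, tn, fn):
    """Threshold-based classification metrics from confusion-matrix counts."""
    sn = tp / (tp + fn)            # sensitivity (recall)
    sp = tn / (tn + fp)            # specificity
    bacc = (sn + sp) / 2           # balanced accuracy
    prec = tp / (tp + fp)          # precision
    f1 = 2 * prec * sn / (prec + sn)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {"SN": sn, "SP": sp, "BACC": bacc, "F1": f1, "MCC": mcc}
```

BACC and MCC are the headline metrics here because they remain informative on imbalanced test sets, where plain accuracy can be inflated by the majority (non-TIP) class.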
3. Results and discussion
3.1 Performance evaluation of different feature representations
Here, we selected six feature extraction methods (i.e., FastText, BERT, ProtT5-U50, ProtT5-BFD, ESM-2, and FEGS) to capture critical information about TIPs from multiple perspectives, including sequential, graphical, contextual, and protein semantic information. Specifically, the feature representations derived from these methods were processed using the SMOTE method to address the class-imbalance problem in the training dataset (164 TIPs and 401 non-TIPs) [59,60]. Finally, the processed feature representations were fed into the BiLSTM-SAM architecture. To evaluate the representational capability of these feature extraction methods, we assessed their performance in terms of ACC, AUC, AUPR, SN, SP, MCC, and F1 scores through a five-fold cross-validation test, as detailed in Table 1. As shown in Table 1, the MCC values of FastText, BERT, ProtT5-U50, ProtT5-BFD, ESM-2, and FEGS are 0.904, 0.963, 0.981, 0.964, 0.968, and 0.743, respectively. Interestingly, the top-three feature representations were obtained from PLMs. Among these, ProtT5-U50 provided the best feature representation, achieving ACC, SN, SP, F1, AUC, and AUPR values of 0.990, 0.985, 0.995, 0.990, 0.998, and 0.998, respectively.
3.2 Determination of optimal feature subsets
Rather than employing only the best-performing feature (ProtT5-U50) to develop the final model, we combined six feature representations into a single hybrid feature vector (named Hybrid) to enrich the feature space and capture more comprehensive information about TIPs. However, the performance of the Hybrid was lower than that of ProtT5-U50 (as shown in Table 1). The likely reason for this decline is that the Hybrid significantly increased data dimensionality, introducing noise and potentially degrading the model performance. To address this challenge, we applied eight distinct feature selection methods [50,69–71] to the training dataset. These methods included LASSO, mRMR, random projection (RP), truncated singular value decomposition (TSVD), elastic net (EN), graph autoencoders (GAE), principal component analysis (PCA), and spectral embedding (SE), generating eight different feature subsets. For ease of discussion, these feature subsets are referred to as LASSO_FS, mRMR_FS, RP_FS, TSVD_FS, EN_FS, GAE_FS, PCA_FS, and SE_FS, respectively. Specifically, we trained nine individual BiLSTM-SAM-based models, each using one of the feature subsets. Prediction results were evaluated via cross-validation on the training dataset and via the independent test on the independent test dataset. The feature dimensions of the subsets were 259, 400, 230, 700, 700, 301, 37, and 500 for EN_FS, GAE_FS, LASSO_FS, mRMR_FS, PCA_FS, RP_FS, SE_FS, and TSVD_FS, respectively. The optimal feature subset, determined by the highest cross-validation MCC, was considered the most effective for TIP identification.
Fig 2 and Table 2 summarize the prediction results of the selected feature selection methods and the control, where the control refers to the hybrid feature vector. The performance of all the feature subsets exceeded that of the control, with the sole exception of the SE-based feature subset. This indicates the effectiveness of the feature selection methods. As seen in Table 2, five feature selection methods, encompassing EN_FS, LASSO_FS, RP_FS, PCA_FS, and mRMR_FS, achieved cross-validation MCC values greater than 0.980. To evaluate their generalization ability, the performance of these methods was further assessed on the independent test dataset. The corresponding MCC values were 0.841, 0.922, 0.697, 0.651, and 0.712, respectively. Overall, LASSO_FS exhibited optimal performance across both validation strategies. Notably, the MCC values of LASSO_FS were 9.02% and 33.44% higher than the control in terms of the cross-validation and independent tests, respectively. Moreover, on the independent test, the ACC, SN, SP, MCC, F1, AUC, and AUPR values of LASSO_FS were 16.20, 14.58, 17.02, 33.44, 21.26, 14.32, and 20.07% higher than the control, respectively. Thus, we utilized the LASSO_FS subset, comprising 230 selected features, to optimize the proposed BLSAM-TIP model herein.
Comparisons of the ROC curve, AUC value, PR curve, and AUPR value on the training (A, B) and independent test (C, D) datasets.
3.3 Analysis of the contribution of our multi-view features
As mentioned above, our proposed feature subset (LASSO_FS), which combines multi-view information, is a 230-D feature vector derived from 5 FastText-based, 65 BERT-based, 41 ProtT5-U50-based, 46 ProtT5-BFD-based, 12 ESM-2-based, and 61 FEGS-based features (as shown in Fig 3). To investigate the effectiveness of LASSO_FS, we compared its performance with six baseline feature descriptors. The performance results for LASSO_FS and the compared feature descriptors in both cross-validation and independent tests are summarized in Fig 4 and Table 3. As observed in Table 3, LASSO_FS achieved the best overall predictive performance across both cross-validation and independent tests. Compared to the best-performing baseline feature descriptor (ProtT5-U50) in the independent test, LASSO_FS achieved ACC, MCC, F1, AUC, and AUPR values of 0.965, 0.922, 0.948, 0.988, and 0.982, respectively, representing improvements of 10.56, 24.17, 16.58, 7.00, and 11.42%.
The number (A) and proportion (B) of each type of feature embedding selected from the optimal feature set.
Comparisons of the ROC curve, AUC value, PR curve, and AUPR value on the training (A, B) and independent test (C, D) datasets.
To further illustrate the effectiveness of LASSO_FS in distinguishing TIPs from non-TIPs, we visualized the distribution of TIPs and non-TIPs using the t-distributed stochastic neighbor embedding (t-SNE) method, which reduces the original feature space to a two-dimensional space [72]. Herein, seven t-SNE plots were created, as shown in Fig 5. It is evident that LASSO_FS (Fig 5G) forms two clear clusters of TIPs and non-TIPs, whereas unclear clusters were observed in the feature spaces of FastText, BERT, ProtT5-U50, ProtT5-BFD, ESM-2, and FEGS. Overall, these analysis results are sufficient to indicate that LASSO_FS effectively captures essential and sufficient information about TIPs. This explains why the proposed BLSAM-TIP model trained with LASSO_FS can precisely classify TIPs with great prediction performance.
3.4 Performance comparison between BiLSTM-TIP and several conventional ML and DL models
To elucidate the effectiveness and robustness of the proposed BLSAM-TIP model, we compared its performance with several ML and DL classifiers using the same training and independent test datasets to ensure a fair comparison. In addition, all the compared ML and DL classifiers were constructed using the LASSO_FS feature subset and their optimal parameters, with the grid search space for each ML and DL classifier summarized in S2 and S3 Tables, respectively. Herein, we selected 12 ML methods (i.e., NB, DT, RF, KNN, ADA, LGBM, GBDT, XGB, MLP, ET, LR, and SVM) and 7 DL methods (i.e., CNN-BiLSTM, BiLSTM, DNN, BiGRU, CNN, GRU, and LSTM) to conduct the comparative experiments [18–20,73]. To date, these ML and DL methods have been widely and successfully applied to address numerous research questions in bioinformatics [20,38,48,49,74]. From Fig 6, Tables 4, 5, and S4 and S5 Tables, several key observations can be drawn as follows: (i) Among the top-five classifiers, almost all were based on DL methods (i.e., BiGRU, CNN, GRU, and LSTM), with the sole exception of SVM. The MCC values for BiGRU, CNN, SVM, GRU, and LSTM were 0.946, 0.949, 0.953, 0.968, and 0.978, respectively; (ii) On the independent test dataset, CNN and SVM still outperformed other classifiers, achieving MCC values of 0.841 and 0.861, respectively. This observation suggests that DL methods are particularly effective in leveraging information from large-scale datasets to attain impressive performance [38,74]; and (iii) When comparing BLSAM-TIP with CNN and SVM, BLSAM-TIP demonstrated slightly better performance. Specifically, BLSAM-TIP outperformed the compared models by 2.82–3.52% in ACC, 6.07–8.07% in MCC, 5.19–5.71% in F1, and 10.42–14.58% in SN. In summary, the proposed BLSAM-TIP model is more effective than several conventional ML and DL models in the identification of TIPs, especially in terms of performance on the independent test.
These results indicate the excellent generalization ability and robustness of BLSAM-TIP.
(A-B) Performance comparison of BLSAM-TIP with conventional ML models. (C-D) Performance comparison of BLSAM-TIP with conventional DL models.
3.5 Performance comparison between BLSAM-TIP and the existing methods
To reveal the excellent performance of the proposed BLSAM-TIP model, we compared it with existing methods, including TIP-KNN [15], TIP-RF [15], and TIPred [8], using the independent test, as summarized in Fig 7. Since TIP-KNN and TIP-RF are not available as online web servers for TIP identification, we implemented KNN-based and RF-based classifiers in conjunction with the selected feature encodings (i.e., AAC, DPC, and PCP). For TIPred, we evaluated its web server using the default threshold. As can be seen from Fig 7, BLSAM-TIP significantly outperformed all existing methods across nearly all performance metrics, including ACC, BACC, SN, MCC, AUC, and AUPR. To be specific, compared to the runner-up TIPred, BLSAM-TIP attained improvements of 7.53, 17.26, 9.18, 3.46, and 4.83% in BACC, SN, MCC, AUC, and AUPR values, respectively. Interestingly, the outstanding SN value of BLSAM-TIP underscores its ability to effectively minimize false negatives. Taken together, these results confirm that BLSAM-TIP delivers more stable and superior performance than the existing methods.
3.6 Ablation study
In this section, we performed ablation experiments to assess the contribution of each component of our proposed computational approach to the accurate identification of TIPs. Specifically, we compared BLSAM-TIP with two modified versions: (i) BLSAM-TIP (-SAM), a version of BLSAM-TIP trained without the use of SAM, and (ii) BLSAM-TIP (-LASSO_FS), a version of BLSAM-TIP trained using the original hybrid feature vector containing 3848 features instead of the optimized LASSO_FS subset. As can be noticed from Fig 8, BLSAM-TIP outperformed its modified versions in terms of ACC, SN, SP, MCC, and F1 scores in both cross-validation and independent tests. While BLSAM-TIP achieved comparable AUC and AUPR values to BLSAM-TIP (-SAM) in the cross-validation test, it demonstrated superior performance in the independent test. Specifically, BLSAM-TIP achieved ACC, SN, MCC, and F1 scores that were 4.23, 8.33, 9.57, and 6.42% higher than those of BLSAM-TIP (-SAM), respectively. These results confirm that the proposed computational approach benefits from the inclusion of individual components, such as SAM and the LASSO_FS feature subset, enabling it to attain more accurate and robust TIP identification.
3.7 Case study
As can be seen in the above experiments, BLSAM-TIP consistently achieved stable and superior performance in TIP identification. In this section, we conducted case studies to investigate how effectively BLSAM-TIP can identify novel TIPs in unknown samples. Initially, we collected 11 experimentally validated TIPs from previous studies [75–82], while 67 new non-TIPs were peptides that had been experimentally validated as low or non-active, ranging from 5 to 20 amino acids in length [12,83] (S6 Table). Notably, these new peptides were not included in the training or independent test datasets, ensuring an unbiased assessment of our model’s generalization ability. The detailed prediction results of BLSAM-TIP and the existing methods (i.e., TIP-KNN, TIP-RF, and TIPred) in terms of the case studies are recorded in S7 Table. As can be seen from S7 Table, BLSAM-TIP outperformed all the existing methods. Specifically, when compared with TIP-KNN and TIP-RF, BLSAM-TIP (ACC of 0.833) significantly outperformed these compared methods (ACC ranging from 0.295 to 0.654). This capability is important for prioritizing and ranking novel TIPs among large sets of uncharacterized peptides. Altogether, BLSAM-TIP shows clear superiority over the compared methods and holds promise as a computational tool for TIP identification.
4. Conclusions
This study presents BLSAM-TIP, a novel computational approach for the accurate and large-scale identification of TIPs by combining BiLSTM with SAM. Both cross-validation and independent tests confirm that BLSAM-TIP is an accurate and robust computational tool. In terms of the independent test, BLSAM-TIP significantly outperformed state-of-the-art methods for TIP identification, achieving a BACC of 0.936, MCC of 0.870, and AUC of 0.988. The excellent performance of BLSAM-TIP can be attributed to four major reasons: (i) Several feature encoding schemes from several perspectives are employed to capture multi-view and sufficient information about TIPs, including sequential, graphical, statistical, contextual, and protein semantic information; (ii) The SMOTE method is applied to handle the issue of class imbalance effectively; (iii) The LASSO-based feature subset contains excellent discriminative information, which contributes to significant performance improvements; and (iv) The BiLSTM-SAM-DNN architecture can effectively leverage the strengths of individual components to attain more accurate and stable TIP identification. Although BLSAM-TIP has greatly enhanced and facilitated TIP identification, there is still ample room for further improvement. One possible extension is to incorporate interpretable feature representations (such as physicochemical properties (PCPs) or amino acid and dipeptide propensities) into the current feature subset. Another potential enhancement is to implement the BLSAM-TIP web server to facilitate the in-silico identification of peptides with tyrosinase inhibitory properties.
Supporting information
S1 Table. A number of TIPs and non-TIPs used for developing three TIP predictors.
https://doi.org/10.1371/journal.pone.0333614.s001
(DOCX)
S2 Table. Information of parameter settings for 12 ML methods used in this study.
https://doi.org/10.1371/journal.pone.0333614.s002
(DOCX)
S3 Table. Information of parameter settings for five DL methods used in this study.
https://doi.org/10.1371/journal.pone.0333614.s003
(DOCX)
S4 Table. Comparison of the prediction results of BLSAM-TIP and conventional ML methods over the cross-validation and independent tests.
https://doi.org/10.1371/journal.pone.0333614.s004
(DOCX)
S5 Table. Comparison of the prediction results of BLSAM-TIP and conventional DL methods over the cross-validation and independent tests.
https://doi.org/10.1371/journal.pone.0333614.s005
(DOCX)
S6 Table. Detailed information of new TIPs and non-TIPs in the case studies.
https://doi.org/10.1371/journal.pone.0333614.s006
(DOCX)
S7 Table. Detailed prediction results of TIP-KNN, TIP-RF, TIPred, and BLSAM-TIP on case studies.
https://doi.org/10.1371/journal.pone.0333614.s007
(DOCX)
References
- 1. Pathak MA, Jimbow K, Szabo G, Fitzpatrick TB. Sunlight and melanin pigmentation. Photochem Photobiol Rev. 1976;1:211–39.
- 2. Chai W, Wei W, Hu X, Bai Q, Guo Y, Zhang M, et al. Inhibitory effect and molecular mechanism on tyrosinase and browning of fresh-cut apple by longan shell tannins. Int J Biol Macromol. 2024;274(Pt 2):133326. pmid:38925198
- 3. Gandía-Herrero F, Jiménez M, Cabanes J, García-Carmona F, Escribano J. Tyrosinase inhibitory activity of cucumber compounds: enzymes responsible for browning in cucumber. J Agric Food Chem. 2003;51(26):7764–9.
- 4. Baber MA, Crist CM, Devolve NL, Patrone JD. Tyrosinase inhibitors: a perspective. Molecules. 2023;28(15):5762.
- 5. Chang T-S. An updated review of tyrosinase inhibitors. Int J Mol Sci. 2009;10(6):2440–75. pmid:19582213
- 6. Tayier N, Qin N-Y, Zhao L-N, Zeng Y, Wang Y, Hu G, et al. Theoretical exploring of a molecular mechanism for melanin inhibitory activity of calycosin in Zebrafish. Molecules. 2021;26(22):6998. pmid:34834088
- 7. Ando H, Kondoh H, Ichihashi M, Hearing VJ. Approaches to identify inhibitors of melanin biosynthesis via the quality control of tyrosinase. J Invest Dermatol. 2007;127(4):751–61. pmid:17218941
- 8. Charoenkwan P, Kongsompong S, Schaduangrat N, Chumnanpuen P, Shoombuatong W. TIPred: a novel stacked ensemble approach for the accelerated discovery of tyrosinase inhibitory peptides. BMC Bioinform. 2023;24(1):356. pmid:37735626
- 9. Song Y, Chen S, Li L, Zeng Y, Hu X. The hypopigmentation mechanism of tyrosinase inhibitory peptides derived from food proteins: an overview. Molecules. 2022;27(9):2710. pmid:35566061
- 10. Lee SY, Baek N, Nam T. Natural, semisynthetic and synthetic tyrosinase inhibitors. J Enzyme Inhib Med Chem. 2016;31(1):1–13. pmid:25683082
- 11. Deri B, et al. The unravelling of the complex pattern of tyrosinase inhibition. Sci Rep. 2016;6(1):1–10.
- 12. Kongsompong S, E-Kobon T, Taengphan W, Sangkhawasi M, Khongkow M, Chumnanpuen P. Computer-aided virtual screening and in vitro validation of biomimetic tyrosinase inhibitory peptides from abalone peptidome. Int J Mol Sci. 2023;24(4):3154. pmid:36834568
- 13. Hassan M, Shahzadi S, Kloczkowski A. Tyrosinase inhibitors naturally present in plants and synthetic modifications of these natural products as anti-melanogenic agents: a review. Molecules. 2023;28(1):378.
- 14. Wang W, Lin H, Shen W, Qin X, Gao J, Cao W, et al. Optimization of a novel tyrosinase inhibitory peptide from atrina pectinata mantle and its molecular inhibitory mechanism. Foods. 2023;12(21):3884. pmid:37959003
- 15. Kongsompong S, E-Kobon T, Chumnanpuen P. K-nearest neighbor and random forest-based prediction of putative tyrosinase inhibitory peptides of abalone haliotis diversicolor. Molecules. 2021;26(12):3671. pmid:34208619
- 16. Charoenkwan P, Ahmed S, Nantasenamat C, Quinn JMW, Moni MA, Lio’ P, et al. AMYPred-FRL is a novel approach for accurate prediction of amyloid proteins by using feature representation learning. Sci Rep. 2022;12(1):7697. pmid:35546347
- 17. Charoenkwan P, Nantasenamat C, Hasan MM, Moni MA, Lio’ P, Manavalan B, et al. StackDPPIV: a novel computational approach for accurate prediction of dipeptidyl peptidase IV (DPP-IV) inhibitory peptides. Methods. 2022;204:189–98. pmid:34883239
- 18. Charoenkwan P, Schaduangrat N, Manavalan B, Shoombuatong W. M3S-ALG: Improved and robust prediction of allergenicity of chemical compounds by using a novel multi-step stacking strategy. Future Gener Comp Syst. 2025;162:107455.
- 19. Shoombuatong W, Homdee N, Schaduangrat N, Chumnanpuen P. Leveraging a meta-learning approach to advance the accuracy of Nav blocking peptides prediction. Sci Rep. 2024;14(1):4463. pmid:38396246
- 20. Shoombuatong W, Meewan I, Mookdarsanit L, Schaduangrat N. Stack-HDAC3i: a high-precision identification of HDAC3 inhibitors by exploiting a stacked ensemble-learning framework. Methods. 2024;230:147–57. pmid:39191338
- 21. Pham NT, Zhang Y, Rakkiyappan R, Manavalan B. HOTGpred: enhancing human O-linked threonine glycosylation prediction using integrated pretrained protein language model-based features and multi-stage feature selection approach. Comput Biol Med. 2024;179:108859. pmid:39029431
- 22. Zhu Y-H, Liu Z, Liu Y, Ji Z, Yu D-J. ULDNA: integrating unsupervised multi-source language models with LSTM-attention network for high-accuracy protein-DNA binding site prediction. Brief Bioinform. 2024;25(2):bbae040. pmid:38349057
- 23. Steinegger M, Söding J. Clustering huge protein sequence sets in linear time. Nat Commun. 2018;9(1):2542. pmid:29959318
- 24. Steinegger M, Mirdita M, Söding J. Protein-level assembly increases protein sequence recovery from metagenomic samples manyfold. Nat Methods. 2019;16(7):603–6. pmid:31235882
- 25. Suzek BE, Huang H, McGarvey P, Mazumder R, Wu CH. UniRef: comprehensive and non-redundant UniProt reference clusters. Bioinformatics. 2007;23(10):1282–8. pmid:17379688
- 26. Bateman A, et al. The Pfam protein families database. Nucleic Acids Res. 2004;32(suppl_1):D138–41.
- 27. Wei L, Zhou C, Chen H, Song J, Su R. ACPred-FL: a sequence-based predictor using effective feature representation to improve the prediction of anti-cancer peptides. Bioinformatics. 2018;34(23):4007–16. pmid:29868903
- 28. Rao B, Zhou C, Zhang G, Su R, Wei L. ACPred-Fuse: fusing multi-view information improves the prediction of anticancer peptides. Brief Bioinform. 2020;21(5):1846–55. pmid:31729528
- 29. Charoenkwan P, Chiangjong W, Nantasenamat C, Hasan MM, Manavalan B, Shoombuatong W. StackIL6: a stacking ensemble model for improving the prediction of IL-6 inducing peptides. Brief Bioinform. 2021;22(6):bbab172. pmid:33963832
- 30. Ledwoń P, Goldeman W, Hałdys K, Jewgiński M, Calamai G, Rossowska J, et al. Tripeptides conjugated with thiosemicarbazones: new inhibitors of tyrosinase for cosmeceutical use. J Enzyme Inhib Med Chem. 2023;38(1):2193676. pmid:37146256
- 31. Le NQK, Li W, Cao Y. Sequence-based prediction model of protein crystallization propensity using machine learning and two-level feature selection. Brief Bioinform. 2023;24(5):bbad319. pmid:37649385
- 32. Song Y, Li J, Tian H, Xiang H, Chen S, Li L, et al. Copper chelating peptides derived from tilapia (Oreochromis niloticus) skin as tyrosinase inhibitor: biological evaluation, in silico investigation and in vivo effects. Food Res Int. 2023;163:112307. pmid:36596203
- 33. Liu Y, Liu Y, Wang GA, Cheng Y, Bi S, Zhu X. BERT-Kgly: a bidirectional encoder representations from transformers (BERT)-based model for predicting lysine glycation site for homo sapiens. Front Bioinform. 2022;2.
- 34. Xue W, Liu X, Zhao W, Yu Z. Identification and molecular mechanism of novel tyrosinase inhibitory peptides from collagen. J Food Sci. 2022;87(6):2744–56. pmid:35603815
- 35. Qin D, Jiao L, Wang R, Zhao Y, Hao Y, Liang G. Prediction of antioxidant peptides using a quantitative structure-activity relationship predictor (AnOxPP) based on bidirectional long short-term memory neural network and interpretable amino acid descriptors. Comput Biol Med. 2023;154:106591. pmid:36701965
- 36. Mu Z, Yu T, Liu X, Zheng H, Wei L, Liu J. FEGS: a novel feature extraction model for protein sequences and its applications. BMC Bioinform. 2021;22:1–15.
- 37. Raza A, Uddin J, Almuhaimeed A, Akbar S, Zou Q, Ahmad A. AIPs-SnTCN: Predicting anti-inflammatory peptides using fastText and transformer encoder-based hybrid word embedding with self-normalized temporal convolutional networks. J Chem Inf Model. 2023;63(21):6537–54.
- 38. Charoenkwan P, Nantasenamat C, Hasan MM, Manavalan B, Shoombuatong W. BERT4Bitter: a bidirectional encoder representations from transformers (BERT)-based model for improving the prediction of bitter peptides. Bioinformatics. 2021;37(17):2556–62. pmid:33638635
- 39. Zulfiqar H, Sun Z-J, Huang Q-L, Yuan S-S, Lv H, Dao F-Y, et al. Deep-4mCW2V: a sequence-based predictor to identify N4-methylcytosine sites in Escherichia coli. Methods. 2022;203:558–63. pmid:34352373
- 40. Le NQK. Leveraging transformers‐based language models in proteome bioinformatics. Proteomics. 2023;23(23–24):2300011.
- 41. Do DT, Le TQT, Le NQK. Using deep neural networks and biological subwords to detect protein S-sulfenylation sites. Brief Bioinform. 2021;22(3):bbaa128. pmid:32613242
- 42. Joulin A, Grave E, Bojanowski P, Mikolov T. Bag of tricks for efficient text classification. arXiv preprint arXiv:1607.01759. 2016.
- 43. Asgari E, Mofrad MRK. Continuous distributed representation of biological sequences for deep proteomics and genomics. PLoS One. 2015;10(11):e0141287. pmid:26555596
- 44. Devlin J, Chang M-W, Lee K, Toutanova K. BERT: pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805. 2018.
- 45. Raffel C, et al. Exploring the limits of transfer learning with a unified text-to-text transformer. J Mach Learn Res. 2020;21(140):1–67.
- 46. Jumper J, Evans R, Pritzel A, Green T, Figurnov M, Ronneberger O, et al. Highly accurate protein structure prediction with AlphaFold. Nature. 2021;596(7873):583–9. pmid:34265844
- 47. Lin Z, Akin H, Rao R, Hie B, Zhu Z, Lu W, et al. Evolutionary-scale prediction of atomic-level protein structure with a language model. Science. 2023;379(6637):1123–30. pmid:36927031
- 48. Zhang T, Jia J, Chen C, Zhang Y, Yu B. BiGRUD-SA: protein S-sulfenylation sites prediction based on BiGRU and self-attention. Comput Biol Med. 2023;163:107145. pmid:37336062
- 49. Zhang X, Wang Y, Wei Q, He S, Salhi A, Yu B. DRBPPred-GAT: Accurate prediction of DNA-binding proteins and RNA-binding proteins based on graph multi-head attention network. Knowledge-Based Syst. 2024;285:111354.
- 50. Zou Q, Zeng J, Cao L, Ji R. A novel features ranking metric with application to scalable visual and bioinformatics data classification. Neurocomputing. 2016;173:346–54.
- 51. Li Y, Chen Z, Wang Q, Lv X, Cheng Z, Wu Y, et al. Identification of hub proteins in cerebrospinal fluid as potential biomarkers of Alzheimer’s disease by integrated bioinformatics. J Neurol. 2023;270(3):1487–500. pmid:36396814
- 52. Hochreiter S, Schmidhuber J. LSTM can solve hard long time lag problems. Adv Neural Inform Process Syst. 1996;9.
- 53. Hochreiter S. The vanishing gradient problem during learning recurrent neural nets and problem solutions. Int J Uncertain Fuzziness Knowl-Based Syst. 1998;6(2):107–16.
- 54. Medsker LR, Jain LC, editors. Recurrent neural networks: design and applications. CRC Press; 2001.
- 55. Gibbons FX. Self-attention and behavior: a review and theoretical update. Adv Exp Soc Psychol. 1990;23:249–303.
- 56. Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, et al. Attention is all you need. Adv Neural Inform Process Syst. 2017;30.
- 57. Arif R, Kanwal S, Ahmed S, Kabir M. A computational predictor for accurate identification of tumor homing peptides by integrating sequential and deep BiLSTM features. Interdiscip Sci. 2024;16(2):503–18. pmid:38733473
- 58. Peng D, Zhang D, Liu C, Lu J. BG-SAC: entity relationship classification model based on self-attention supported capsule networks. Appl Soft Comput. 2020;91:106186.
- 59. Kabir M, Nantasenamat C, Kanthawong S, Charoenkwan P, Shoombuatong W. Large-scale comparative review and assessment of computational methods for phage virion proteins identification. EXCLI J. 2022;21:11–29. pmid:35145365
- 60. Schaduangrat N, Nantasenamat C, Prachayasittikul V, Shoombuatong W. ACPred: a computational tool for the prediction and analysis of anticancer peptides. Molecules. 2019;24(10):1973. pmid:31121946
- 61. Chawla NV, Bowyer KW, Hall LO, Kegelmeyer WP. SMOTE: synthetic minority over-sampling technique. J Artif Intell Res. 2002;16:321–57.
- 62. Yan L, Wang M, Zhou H, Liu Y, Yu B. AntiCVP-Deep: identify anti-coronavirus peptides between different negative datasets based on self-attention and deep learning. Biomed Signal Process Control. 2024;90:105909.
- 63. Mandrekar JN. Receiver operating characteristic curve in diagnostic test assessment. J Thorac Oncol. 2010;5(9):1315–6. pmid:20736804
- 64. Ge F, Arif M, Yan Z, Alahmadi H, Worachartcheewan A, Yu D-J, et al. MMPatho: leveraging multilevel consensus and evolutionary information for enhanced missense mutation pathogenic prediction. J Chem Inf Model. 2023;63(22):7239–57. pmid:37947586
- 65. Azadpour M, McKay CM, Smith RL. Estimating confidence intervals for information transfer analysis of confusion matrices. J Acoust Soc Am. 2014;135(3):EL140–6.
- 66. Akbar S, Ullah M, Raza A, Zou Q, Alghamdi W. DeepAIPs-Pred: predicting anti-inflammatory peptides using local evolutionary transformation images and structural embedding-based optimal descriptors with self-normalized BiTCNs. J Chem Inf Model. 2024;64(24):9609–25. pmid:39625463
- 67. Akbar S, Zou Q, Raza A, Alarfaj FK. iAFPs-Mv-BiTCN: predicting antifungal peptides using self-attention transformer embedding and transform evolutionary based multi-view features with bidirectional temporal convolutional networks. Artif Intell Med. 2024;151:102860. pmid:38552379
- 68. Akbar S, Raza A, Zou Q. Deepstacked-AVPs: predicting antiviral peptides using tri-segment evolutionary profile and word embedding based multi-perspective features with deep stacking model. BMC Bioinform. 2024;25(1):102. pmid:38454333
- 69. Sun X, Jin T, Chen C, Cui X, Ma Q, Yu B. RBPro-RF: use Chou’s 5-steps rule to predict RNA-binding proteins via random forest with elastic net. Chemometrics Intelligent Lab Syst. 2020;197:103919.
- 70. Tibshirani R. Regression shrinkage and selection via the lasso. J R Stat Soc Series B Stat Methodol. 1996;58(1):267–88.
- 71. Bingham E, Mannila H. Random projection in dimensionality reduction: applications to image and text data. In: Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining. 2001. 245–50.
- 72. Van der Maaten L, Hinton G. Visualizing data using t-SNE. J Mach Learn Res. 2008;9:2579–605.
- 73. Charoenkwan P, Schaduangrat N, Shoombuatong W. StackTTCA: a stacking ensemble learning-based framework for accurate and high-throughput identification of tumor T cell antigens. BMC Bioinform. 2023;24(1):301. pmid:37507654
- 74. Xie R, Li J, Wang J, Dai W, Leier A, Marquez-Lago TT, et al. DeepVF: a deep learning-based hybrid framework for identifying virulence factors using the stacking strategy. Brief Bioinform. 2021;22(3):bbaa125. pmid:32599617
- 75. Zhao Y, Zhang T, Ning Y, Wang D, Li F, Fan Y, et al. Identification and molecular mechanism of novel tyrosinase inhibitory peptides from the hydrolysate of “Fengdan” peony (Paeonia ostii) seed meal proteins: peptidomics and in silico analysis. LWT. 2023;180:114695.
- 76. Kubglomsong S, Theerakulkait C, Reed RL, Yang L, Maier CS, Stevens JF. Isolation and identification of tyrosinase-inhibitory and copper-chelating peptides from hydrolyzed rice-bran-derived albumin. J Agric Food Chem. 2018;66(31):8346–54. pmid:30016586
- 77. Chen H, Yao Y, Xie T, Guo H, Chen S, Zhang Y, et al. Identification of tyrosinase inhibitory peptides from sea cucumber (Apostichopus japonicus) collagen by in silico methods and study of their molecular mechanism. Curr Protein Pept Sci. 2023;24(9):758–66. pmid:37350006
- 78. Joompang A, Anwised P, Klaynongsruang S, Taemaitree L, Wanthong A, Choowongkomon K, et al. Rational design of an N-terminal cysteine-containing tetrapeptide that inhibits tyrosinase and evaluation of its mechanism of action. Curr Res Food Sci. 2023;7:100598. pmid:37790858
- 79. Putri SA, Maharani R, Maksum IP, Siahaan TJ. Peptide design for enhanced anti-melanogenesis: optimizing molecular weight, polarity, and cyclization. Drug Des Devel Ther. 2025:645–70.
- 80. Yu Z, Lv H, Zhou M, Fu P, Zhao W. Identification and molecular docking of tyrosinase inhibitory peptides from allophycocyanin in Spirulina platensis. J Sci Food Agric. 2024;104(6):3648–53. pmid:38224494
- 81. Song EC, Park C, Shin Y, Kim WK, Kim SB, Cho S. Neurog1-derived peptides RMNE1 and DualPep-Shine penetrate the skin and inhibit melanin synthesis by regulating MITF transcription. Int J Mol Sci. 2023;24(7):6158. pmid:37047130
- 82. Li J, Yin S, Wei Z, Xiao Z, Kang Z, Wu Y, et al. Newly identified peptide Nigrocin-OA27 inhibits UVB induced melanin production via the MITF/TYR pathway. Peptides. 2024;177:171215. pmid:38608837
- 83. Olsen TH, Yesiltas B, Marin FI, Pertseva M, García-Moreno PJ, Gregersen S, et al. AnOxPePred: using deep learning for the prediction of antioxidative properties of peptides. Sci Rep. 2020;10(1):21471. pmid:33293615