Abstract
Drug-target interactions (DTIs) play a critical role in drug discovery and repurposing. Deep learning-based methods for predicting drug-target interactions are more efficient than wet-lab experiments. The extraction of original and substructural features from drugs and proteins plays a key role in enhancing the accuracy of DTI predictions, while the integration of multi-feature information and effective representation of interaction data also impact the precision of DTI forecasts. Consequently, we propose a drug-target interaction prediction model, SSCPA-DTI, based on substructural subsequences and a cross co-attention mechanism. We use drug SMILES sequences and protein sequences as inputs for the model, employing a Multi-feature information mining module (MIMM) to extract original and substructural features of DTIs. Substructural information provides detailed insights into molecular local structures, while original features enhance the model’s understanding of the overall molecular architecture. Subsequently, a Cross-public attention module (CPA) is utilized to first integrate the extracted original and substructural features, then to extract interaction information between the protein and drug, addressing issues such as insufficient accuracy and weak interpretability arising from mere concatenation without interactive integration of feature information. We conducted experiments on three public datasets and demonstrated superior performance compared to baseline models.
Citation: Shi H, Hu J, Zhang X, Jin S, Xu X (2025) Prediction of drug-target interactions based on substructure subsequences and cross-public attention mechanism. PLoS One 20(5): e0324146. https://doi.org/10.1371/journal.pone.0324146
Editor: Claudio Zandron, University of Milano-Bicocca, ITALY
Received: August 13, 2024; Accepted: April 22, 2025; Published: May 30, 2025
Copyright: © 2025 Shi et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: All relevant data are within the manuscript and its Supporting Information files.
Funding: The author(s) received no specific funding for this work.
Competing interests: The authors have declared that no competing interests exist.
Introduction
Accurately predicting drug-target interactions (DTI) is essential for drug discovery and repurposing. While conventional experimental techniques in the laboratory are still extremely reliable, they are also notably time-consuming and require significant manual effort. Researchers must conduct extensive chemical and biomedical experiments in the lab, screening from a large pool of drugs, while also facing issues such as limited data acquisition and poor scalability. Meanwhile, researchers have begun to apply various machine learning methods to DTI prediction and have made significant progress, for example with Support Vector Machines (SVM) [1] and Random Forests (RF) [2,3]. However, traditional machine learning models have relatively limited performance when dealing with complex nonlinear relationships. The interactions between biomolecules are often highly complex nonlinear processes, which may prevent traditional machine learning models from capturing this complexity, thus limiting the accuracy and adaptability of DTI predictions.
Deep learning models generally exhibit better performance than traditional machine learning [4–6] models because they can learn complex nonlinear relationships. They are particularly suited for describing intricate interactions between biomolecules, making them highly adaptable to high-throughput data in the biomedical field.
Similar to using textCNN for semantic learning of word sequences, Huang et al. [7] proposed a model, MolTrans, which takes drug and protein sequences as inputs. By incorporating a transformer encoder, the model captures the interaction information between drugs and proteins in greater detail, making the interaction maps produced by the model more interpretable. However, when considering drug-target interactions, the model only uses substructural information and overlooks the original features of DTIs. The more global information contained in the original features is not fully utilized, thus limiting the model's understanding of the overall molecular structure. Öztürk et al. [8] introduced DeepDTA, which utilizes two distinct convolutional neural network (CNN) blocks to separately process SMILES strings and protein sequences. This approach aids in capturing local features within protein and drug sequences, enhancing the model's ability to model complex relationships. Bai et al. [9] developed DrugBAN, which leverages graph convolutional networks (GCNs) and one-dimensional convolutional neural networks (1D-CNNs) to extract substructural features from drug molecular graphs and protein sequences, respectively. Subsequently, a bilinear attention network module explicitly learns the local interaction relationships between drug-target pairs. Lee et al. [10] developed DeepConv-DTI, a deep learning model employing a 1D-CNN. This model outperforms earlier machine learning models by effectively extracting local residue features from protein sequences using CNN, capturing key local features of proteins more effectively than other protein descriptors. Subsequently, the model employs the extended-connectivity fingerprint (ECFP) [11] algorithm to extract feature information from drugs.
However, it does not account for the interaction mechanisms between drugs and proteins, resulting in an inability to capture the interaction patterns between drugs and targets, which impacts the model’s prediction accuracy.
In recent years, researchers have increasingly incorporated various novel attention mechanisms [12] into DTI models to more effectively mine the association information between drugs and proteins. Zhao et al. [13] proposed a specifically designed attention mechanism called HyperAttention, which integrates convolutional neural networks (CNNs) with attention mechanisms to visualize attention scores across spatial and channel dimensions. This approach enables a more comprehensive capture of interaction information between drugs and targets. Huang et al. [14] introduced CoaDTI-pro, a novel interaction feature extraction mechanism. This model consists of stacked cross-attention modules and an encoder-decoder structure, forming a multimodal feature extractor. Although CoaDTI-pro effectively extracts interaction features between drugs and proteins from multimodal data, it exhibits high computational complexity. Shin et al. [15] introduced a pretrained molecular Transformer encoder for drug feature extraction, enabling the model to learn representations of drug molecular structures from extensive molecular data. While this improves the model's ability to comprehend and represent molecular internal data, the computational complexity inherent in the Transformer architecture increases costs during the training and inference phases. Wang et al. [16] employed a heterogeneous graph-based algorithmic framework, autonomously extracting useful meta-paths for DTI prediction from heterogeneous graphs. This overcomes the dependency on manually defined meta-paths in traditional methods and enhances the adaptability of the algorithm. Although graph neural networks excel in relation detection [17,18], they are less flexible and less efficient when processing large-scale data, making them difficult to scale to large networks. Gong et al. [19] proposed HS-DTI, which utilizes a stacked multi-layer graph neural network to identify and capture specific functional group information in drug molecules, and a CNN module to obtain first- and second-order sequence information of proteins. The features of proteins and drug molecules are subsequently concatenated to perform predictive tasks. However, this simple cascading operation overlooks cross-modal complementarity and fails to determine which specific portions of the drug molecule contribute most significantly to the interaction with the target protein.
To address these challenges, we have considered the domain knowledge of substructures, modeling of the molecular overall features, and representation of drug-target interaction relationships. Our proposed model, termed SSCPA-DTI, is a drug-target interaction prediction approach that incorporates substructure subsequences and a cross co-attention mechanism. Through the Substructure Information Mining Module (MIMM), the model extracts substructural features of drugs and proteins, enhancing the granular understanding of critical structural information. Simultaneously, the MIMM algorithm preserves the original features of the drugs and targets, fully utilizing the more global information contained in these original features. Subsequently, a CNN module is used for further feature extraction of the substructural and original features of the drugs/targets. The extracted features pertaining to drugs, proteins, and substructures are subsequently integrated via the cross-co-attention module, followed by the extraction of interaction information. This approach differs from previous models that mechanically concatenate multiple features of drugs/targets, improving the model’s accuracy and interpretability. We compared SSCPA-DTI with other advanced baseline models. Results indicate that SSCPA-DTI performs excellently on three commonly used drug-target datasets.
Methods
Fig 1(A) depicts the architectural design of our proposed model, which comprises five distinct components: the Multi-Information Mining Module (MIMM), an embedding layer, a CNN block, a Cross-Co-Attention Module (CPA), and a Fully Connected Network prediction module (FCN).
The core of the model comprises the MIMM and CPA modules. The MIMM module filters substructural features of the drug (or target) while preserving original features, the CNN module further refines features initially extracted by the multi-feature extraction module, and the CPA module is used for feature integration and the extraction of interaction information.
Multi-feature information mining module
SSCPA-DTI first decomposes the drug sequences and protein sequences, in order, into substructure sequences while preserving their corresponding original sequences. In the domain of natural language processing, the application of subword units [20] has already achieved significant results. We apply these ideas to the mining of substructure information from drug sequences and protein sequences.
Inspired by the BPE (Byte Pair Encoding) algorithm in the field of natural language processing and the PrefixSpan algorithm employed in bioinformatics, we propose a multi-feature information mining module (MIMM) to discover recurrent subsequences in drug and protein databases. MIMM hierarchically decomposes each protein/drug sequence into subsequences, smaller subsequences, and individual atoms or amino-acid symbols, while preserving the corresponding original sequences. We decompose each sequence into a series of frequent subsequences discovered in order. This process is crucial because these subsequences not only decompose the original sequence but also meet two important conditions: firstly, the union of these subsequences can completely reconstruct each element in the original sequence; secondly, the subsequences are mutually non-overlapping. The MIMM module is summarized in Algorithm 1.
First, MIMM initializes a set of tokens, denoted as L, for tokenizing protein amino-acid/SMILES string characters (the drugs and proteins each have their own token sets, L_p and L_d). Then, with the given token set L, the entire corpus E is tokenized to obtain a tokenized set R, where E can be protein sequences or SMILES sequences from datasets such as Human, C.elegans, and KIBA. Next, MIMM iterates through R and identifies the most frequent pair of consecutive tokens (P, Q). It then replaces each occurrence of (P, Q) in the tokenized set R with the new token (PQ) and adds the new token to the token set L. This scanning, identifying, and updating process is repeated until no token pair occurs more frequently than the threshold d or the size of the vocabulary set L reaches the predefined maximum value ф. Through this process, frequent subsequences are merged into single tokens, while infrequent subsequences are decomposed into collections of smaller tokens. MIMM generates a sequence C = (c_1, c_2, …, c_k), where each c_i is a substructure of the drug or target protein, k is the number of substructures, B is the original sequence of the drug/protein, and each c_i comes from the set L. Through MIMM, input drug and protein sequences are thus transformed into sequences of explicit substructures C_p and C_d, as well as original sequences B_p and B_d.
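As a concrete illustration, the merge loop described above can be sketched in Python. This is a minimal BPE-style reimplementation under assumed names (`mimm_tokenize` and the toy SMILES corpus are illustrative, not the authors' code):

```python
from collections import Counter

def mimm_tokenize(corpus, threshold=2, max_vocab=50):
    """BPE-style frequent-subsequence mining sketch of the MIMM loop.

    corpus: list of strings (SMILES or amino-acid sequences).
    Returns the learned token set L and the tokenized corpus R.
    """
    # Initialize: every sequence starts as a tuple of single characters.
    R = [tuple(seq) for seq in corpus]
    L = {ch for seq in R for ch in seq}

    while len(L) < max_vocab:
        # Count every pair of consecutive tokens (P, Q) across the corpus.
        pairs = Counter()
        for seq in R:
            for a, b in zip(seq, seq[1:]):
                pairs[(a, b)] += 1
        if not pairs:
            break
        (p, q), freq = pairs.most_common(1)[0]
        if freq < threshold:          # stop: no pair above threshold d
            break
        merged = p + q                # new token (PQ)
        L.add(merged)
        # Replace every occurrence of (P, Q) with the merged token.
        new_R = []
        for seq in R:
            out, i = [], 0
            while i < len(seq):
                if i + 1 < len(seq) and seq[i] == p and seq[i + 1] == q:
                    out.append(merged)
                    i += 2
                else:
                    out.append(seq[i])
                    i += 1
            new_R.append(tuple(out))
        R = new_R
    return L, R
```

On a toy corpus such as `["CCO", "CCN", "CCO"]`, the frequent pair ("C", "C") is merged first, after which ("CC", "O") is merged, so frequent substructures become single tokens while rare ones remain decomposed.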
To enable efficient batch training, we investigated the distribution of protein sequence lengths within the dataset, as depicted in Fig 2. We then established a maximum permissible length (MaxL) and implemented either truncation or zero-padding techniques on the respective word embedding matrices.
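The truncation/zero-padding step can be sketched as follows (the function name and the pad id of 0 are illustrative assumptions):

```python
import numpy as np

def pad_or_truncate(tokens, max_len, pad_id=0):
    """Fix a token-id sequence to length max_len (MaxL) for batch training."""
    tokens = list(tokens[:max_len])               # truncate if too long
    tokens += [pad_id] * (max_len - len(tokens))  # zero-pad if too short
    return np.array(tokens, dtype=np.int64)
```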
Embedding layer
The embedding layer consists of four distinct embedding modules (Fig 1(B)). For every drug-target pair provided as input, we convert the corresponding substructure sequences C_p and C_d and the original sequences B_p and B_d into four matrices: the substructure matrices X_p ∈ R^(k×M_p) and X_d ∈ R^(l×M_d), and the original-sequence matrices O_p and O_d.
For the substructural matrices X_p and X_d, k/l represents the total size of the protein/drug substructure vocabulary, i.e., the cardinality of the vocabulary set L from the MIMM algorithm, while M_p and M_d are the maximum lengths of the protein and drug substructure sequences, respectively. Each column x_i^p and x_j^d in the matrices is a one-hot vector corresponding to the i-th substructure of the protein sequence and the j-th substructure of the drug sequence, respectively.
This representation allows the model to effectively capture and distinguish different substructural features, which is crucial for improving prediction accuracy in drug-target interactions.
The content embeddings for proteins and drugs, E_p^c and E_d^c, are generated via learnable dictionary lookup matrices W_p ∈ R^(ϑ×k) and W_d ∈ R^(ϑ×l):

e_i^p = W_p x_i^p,   e_j^d = W_d x_j^d

where ϑ denotes the dimension of the latent embedding vector corresponding to each substructure. By using learnable matrices, the model can adaptively learn the importance of different substructures, enhancing its ability to model complex interactions between drugs and targets.
Since MIMM uses sequential substructures, we also include position embeddings P_p and P_d [21]. These are generated by querying the dictionary matrices D_p ∈ R^(ϑ×M_p) and D_d ∈ R^(ϑ×M_d):

p_i^p = D_p o_i,   p_j^d = D_d o_j

where o_i / o_j is a one-hot vector with the i-th/j-th position set to 1. This step ensures effective modeling of the positions of elements in the substructure sequence, which is crucial in biological sequences where the relative positioning of features can significantly impact functionality.
The sum of the content and position embedding matrices produces the final substructure embedding matrices E_p^sub and E_d^sub:

E_p^sub = E_p^c + P_p,   E_d^sub = E_d^c + P_d
This combined representation not only encapsulates the structural information of the substructures but also retains positional context, allowing for enhanced understanding in downstream tasks.
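The content-plus-position lookup described above can be illustrated with a small NumPy sketch (the vocabulary size, sequence length, and embedding dimension are toy values, not the paper's settings):

```python
import numpy as np

rng = np.random.default_rng(0)
k, max_len, dim = 100, 16, 8                      # vocab size, MaxL, embedding dim ϑ (toy)

W_content = rng.standard_normal((k, dim))         # learnable content dictionary lookup
W_position = rng.standard_normal((max_len, dim))  # learnable position dictionary

def embed(sub_ids):
    """Content embedding + position embedding for a substructure-id sequence."""
    positions = np.arange(len(sub_ids))
    return W_content[sub_ids] + W_position[positions]

E = embed(np.array([3, 41, 7, 3]))                # (sequence length, dim) matrix
```

Note that the two occurrences of id 3 receive different final embeddings because their position embeddings differ.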
Similarly, for the protein and drug original-sequence matrices O_p and O_d, z/g represents the lengths of the protein/drug sequences, while c_p and c_d are the channel dimensions of the embedding vectors for proteins and drugs, respectively.
The content embeddings B_p^c and B_d^c for proteins and drugs are generated by querying learnable dictionary lookup matrices U_p and U_d, respectively:

b_i^p = U_p o_i^p,   b_j^d = U_d o_j^d

Here δ represents the size of the latent embeddings for proteins/drugs. This process allows the model to capture rich representations of the original sequences, facilitating better feature extraction for complex biological interactions. Position embeddings P_p^o and P_d^o [21] are generated via lookup in the dictionary matrices V_p and V_d:

p_i^p = V_p s_i,   p_j^d = V_d s_j

where s_i and s_j are one-hot vectors in which the i-th/j-th position is set to 1. This step reinforces the importance of element positions in the final embeddings, particularly in sequential data where the order of elements is significant.
The sum of the content and position embedding matrices produces the final original-sequence embedding matrices E_p and E_d:

E_p = B_p^c + P_p^o,   E_d = B_d^c + P_d^o
This final step integrates both content and positional information, ensuring that the model has a comprehensive representation of each sequence, crucial for effectively modeling the interactions between drugs and targets.
Since proteins, drugs, and their substructure information belong to different feature spaces, our approach employs four independent CNN blocks (Fig 1(C)) focusing on processing drugs, proteins, drug substructure subsequences, and protein substructure subsequences respectively. Each CNN block consists of three consecutive 1D-CNNs, a design efficient in extracting sequence semantic information [22].
For drug and protein sequences, the kernel sizes for each of the three convolutional layers vary, reflecting the distinct structural patterns in proteins and drugs. Specifically, the drug convolutional layers use kernel sizes of 4, 6, and 8, while the protein layers utilize kernel sizes of 4, 8, and 12. This difference in kernel sizes allows the model to capture varying levels of local sequence patterns unique to each molecular type.
The CNN module transforms the input protein embedding matrix E_p, drug embedding matrix E_d, protein substructure embedding matrix E_p^sub, and drug substructure embedding matrix E_d^sub into the feature matrices F_p, F_d, F_p^sub, and F_d^sub, respectively.
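To make the stacked 1D-CNN concrete, here is a pure-NumPy sketch of a drug CNN block using the kernel sizes 4, 6, and 8 quoted above ("valid" convolutions with random weights; the channel count, ReLU choice, and helper names are illustrative assumptions, not the trained model):

```python
import numpy as np

def conv1d(x, kernel_size, out_channels, rng):
    """Valid 1D convolution over x of shape (channels, length), followed by ReLU."""
    in_ch, length = x.shape
    W = rng.standard_normal((out_channels, in_ch, kernel_size)) * 0.1
    out_len = length - kernel_size + 1
    y = np.empty((out_channels, out_len))
    for t in range(out_len):
        window = x[:, t:t + kernel_size]                  # (in_ch, kernel_size)
        y[:, t] = np.tensordot(W, window, axes=([1, 2], [0, 1]))
    return np.maximum(y, 0.0)                             # ReLU

def drug_cnn_block(x, rng):
    """Three stacked 1D convolutions with the drug kernel sizes 4, 6, 8."""
    for ks in (4, 6, 8):
        x = conv1d(x, ks, out_channels=32, rng=rng)
    return x
```

Each successive kernel shortens the sequence dimension (length L becomes L − k + 1), so wider kernels aggregate progressively larger local patterns.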
Cross co-attention module
Through the preceding modules, we have successfully extracted the original and substructural features of drugs and proteins. Next, we initially integrate the original feature information and substructural features of the drug (or protein), and then extract the representation of interactions between drug targets. Inspired by prior work [23], we constructed a Cross-Co-Attention Module (as shown in Fig 3) in a cascading manner. Its core is composed of stacked modules: DA (Drug Self-Attention), PA (Protein Self-Attention), PDA (Protein-Drug Attention), and DPA (Drug-Protein Attention).
For the input drug or protein features, the DA and PA modules (as shown in Fig 4) are inspired by sequence-to-sequence models, enabling a more intuitive fusion of drug (or protein) original features with substructure features. This approach allows for enhanced flexibility in capturing various interactions, which is crucial in modeling complex biological systems where interactions can vary significantly.
The DA module (Fig 4(A)) first concatenates the input matrices F_d^sub and F_d to obtain the drug matrix F_D. Here, F_d^sub and F_d are the substructure feature matrix and the original feature matrix of the drug, respectively. Then, the DA module feeds F_D into a self-attention mechanism to fuse the original and substructure features of the drug. The query, key, and value inputs for drug self-attention are computed using the following formulas:

Q_d = F_D W^Q,   K_d = F_D W^K,   V_d = F_D W^V
This linear transformation allows the model to focus on the most relevant features, facilitating the attention mechanism’s ability to highlight critical interactions among the elements of the drug’s representation.
Next, the drug feature fusion matrix is calculated as follows using the softmax function:

Attention(Q_d, K_d, V_d) = softmax(Q_d K_d^T / √d_k) V_d

Here, the scaling factor √d_k is used to transform the dot-product scores toward a standard normal distribution. This normalization step is essential for maintaining numerical stability during training, ensuring that the gradients do not explode or vanish.
Our Cross Co-Attention Module incorporates a multi-head attention mechanism composed of h parallel attention heads. Each head generates a corresponding set of output values, which are concatenated and then projected, ultimately yielding the drug self-attention matrix Y_dA:

Y_dA = Concat(head_1, …, head_h) W^O,   head_i = Attention(Q_d W_i^Q, K_d W_i^K, V_d W_i^V)

where W_i^Q, W_i^K, and W_i^V are the projection matrices for the i-th attention head, and d_h = d/h is the output dimension of each attention head. Similarly, the input of the PA module (Fig 4(B)) consists of F_p^sub and F_p, which are concatenated into the protein feature matrix F_P, but the goal is to fuse the original and substructure features of the protein. The query, key, and value inputs for protein self-attention are all calculated from F_P, and the final protein self-attention matrix is denoted as Y_pA.
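A compact NumPy sketch of the DA-style multi-head self-attention over a concatenated feature matrix follows (head count, dimensions, and random initialization are illustrative assumptions, not trained parameters):

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(F, h, rng):
    """Multi-head self-attention over F (n tokens × d dims); d must divide by h."""
    n, d = F.shape
    dh = d // h                                      # per-head output dimension d_h = d/h
    heads = []
    for _ in range(h):
        Wq, Wk, Wv = (rng.standard_normal((d, dh)) * 0.1 for _ in range(3))
        Q, K, V = F @ Wq, F @ Wk, F @ Wv
        A = softmax(Q @ K.T / np.sqrt(dh))           # scaled dot-product weights
        heads.append(A @ V)
    Wo = rng.standard_normal((d, d)) * 0.1           # output projection W^O
    return np.concatenate(heads, axis=1) @ Wo        # concat heads, then project

rng = np.random.default_rng(0)
# Toy F_D: concatenation of substructure features (3 tokens) and original features (2 tokens).
F_D = np.vstack([np.ones((3, 16)), np.zeros((2, 16))])
Y_dA = multi_head_self_attention(F_D, h=4, rng=rng)
```

Because the attention operates on the concatenated matrix, every substructure token can attend to every original-feature token and vice versa, which is what fuses the two feature types.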
The PDA and DPA modules model the spatial and channel dimensions and process the feature matrices through attention mechanisms. They compute cross-attention between drugs and targets to capture their complex interaction relationships, thereby enhancing their feature representation capabilities. This cross-attention mechanism is particularly beneficial in understanding how specific components of a drug influence the behavior of target proteins, allowing for more informed predictions in drug discovery.
The Protein-Drug Attention (PDA) module, illustrated in Fig 4(C), is designed to compute the influence exerted by different components of a drug molecule on the target protein. Specifically, PDA receives two key feature inputs, Y_pA and Y_dA. Here, Y_pA represents the protein feature matrix, while Y_dA denotes the drug feature matrix. The keys and values are derived from Y_dA, whereas the queries are computed from Y_pA. By introducing the multi-head attention mechanism, PDA can learn the complex pairwise relationships between Y_pA and Y_dA, and outputs high-dimensional protein vectors based on the cross-modal similarity of all atomic features between Y_dA and Y_pA.
The role of DPA (Fig 4(D)) is similar: it measures the effects of different parts of the target protein on the drug. DPA receives Y_pA and Y_dA as inputs, generates keys and values using Y_pA, and calculates queries from Y_dA. This reciprocal attention mechanism facilitates a deeper understanding of how protein features affect drug interactions, which is vital for predicting drug efficacy and side effects.
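The PDA/DPA asymmetry, i.e., which side supplies the queries versus the keys and values, can be sketched as follows (single-head for brevity; shapes, names, and random weights are illustrative assumptions):

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(query_side, key_value_side, d_k, rng):
    """Cross-attention: queries from one modality, keys/values from the other."""
    Wq = rng.standard_normal((query_side.shape[1], d_k)) * 0.1
    Wk = rng.standard_normal((key_value_side.shape[1], d_k)) * 0.1
    Wv = rng.standard_normal((key_value_side.shape[1], d_k)) * 0.1
    Q = query_side @ Wq
    K = key_value_side @ Wk
    V = key_value_side @ Wv
    A = softmax(Q @ K.T / np.sqrt(d_k))   # one attention row per query token
    return A @ V

rng = np.random.default_rng(1)
Y_pA = rng.standard_normal((12, 16))      # toy protein self-attention features
Y_dA = rng.standard_normal((9, 16))       # toy drug self-attention features
pda_out = cross_attention(Y_pA, Y_dA, d_k=16, rng=rng)   # PDA: queries from protein
dpa_out = cross_attention(Y_dA, Y_pA, d_k=16, rng=rng)   # DPA: queries from drug
```

The output keeps the query side's token count: PDA yields one updated vector per protein token (each a drug-weighted mixture), and DPA the reverse.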
The feature representations Y_update, obtained as output from all attention units, are subsequently input into a feedforward layer followed by a dropout layer. Moreover, to bolster the model’s robustness, we incorporated residual connections and normalization techniques.
This incorporation of dropout layers aids in preventing overfitting, especially in complex models where the risk of memorizing training data is high.
Ultimately, the final attention feature matrices (Y_pA and Y_dA) and the original feature matrices undergo integration via residual connections, yielding the final feature matrix:
This final integration allows the model to balance information derived from attention mechanisms and the original feature matrices, providing a comprehensive representation that enhances predictive performance.
Prediction module
The forecasting component incorporates two maximum pooling layers spanning the entire input, a concatenation layer, and one FCN. In this design, global max pooling is applied to the protein feature map Y_p and the drug feature map Y_d, resulting in 1D feature vectors y_p and y_d, both with dimension d:

y_p = MaxPool(Y_p),   y_d = MaxPool(Y_d)

The downsampled drug and protein feature vectors are then concatenated to form f (dimension 2d):

f = [y_p ; y_d]
Ultimately, the concatenated feature vector f is fed into the FCN for DTI prediction. In this module, we use the Leaky Rectified Linear Unit (Leaky ReLU) [24] as the activation function to enhance the model's ability to express nonlinearity. To effectively address overfitting, we introduce a Dropout layer after each FCN layer. The final layer of this output module represents the likelihood of interaction, outputting a probability value. Considering our task involves binary classification, we have chosen the binary cross-entropy loss function for training the model. The mathematical expression for this loss function is:

L = −[y log(ŷ) + (1 − y) log(1 − ŷ)]

Here, y represents the true label and ŷ the predicted interaction probability.
Through such a design, we not only preserve the flexibility and non-linearity of the network but also introduce effective mechanisms to prevent overfitting, thus reliably accomplishing the task of binary classification.
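The prediction head described above can be sketched in a few lines (a minimal single-hidden-layer version with toy weights; the function names and sizes are illustrative assumptions, and Dropout is omitted for brevity):

```python
import numpy as np

def leaky_relu(x, alpha=0.01):
    """Leaky ReLU activation: passes positives, scales negatives by alpha."""
    return np.where(x > 0, x, alpha * x)

def predict_proba(Y_p, Y_d, W1, b1, W2, b2):
    """Global max pooling -> concatenation -> FCN -> interaction probability."""
    y_p = Y_p.max(axis=0)                 # global max pool over protein tokens
    y_d = Y_d.max(axis=0)                 # global max pool over drug tokens
    f = np.concatenate([y_p, y_d])        # concatenated vector of dimension 2d
    h = leaky_relu(f @ W1 + b1)
    logit = float(h @ W2 + b2)
    return 1.0 / (1.0 + np.exp(-logit))   # sigmoid probability

def bce_loss(y_true, y_prob, eps=1e-12):
    """Binary cross-entropy: -[y log p + (1 - y) log(1 - p)]."""
    y_prob = np.clip(y_prob, eps, 1 - eps)
    return float(-(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob)))
```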
Experiments
During the training process, we followed an 80:20 split to divide the dataset into training and testing sets. Subsequently, the training set was further partitioned into five subsets, with four subsets utilized as training data to train the model, while the remaining subset served as validation data to assess the model’s performance. When the performance of the model on the validation set no longer showed improvement, we proceeded to evaluate its performance on the testing set and retained the corresponding experimental results.
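The split protocol above can be sketched index-based with NumPy (the seed and helper name are illustrative assumptions):

```python
import numpy as np

def split_80_20_with_5fold(n_samples, seed=42):
    """80:20 train/test split, then five folds over the training portion.

    Each fold serves once as validation while the other four train the model.
    """
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    n_train = int(0.8 * n_samples)
    train_idx, test_idx = idx[:n_train], idx[n_train:]
    folds = np.array_split(train_idx, 5)
    return folds, test_idx
```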
Datasets
We evaluated our proposed model using three publicly accessible datasets: Human, C.elegans, and KIBA.
The Human and C.elegans datasets were developed by Liu et al. [6]. For these datasets, we employed the construction methodology from CoaDTI [14], ensuring a balanced dataset. The Human dataset includes 3369 positive interactions among 1052 compounds and 852 proteins. The C.elegans dataset comprises 4000 positive interactions involving 1434 compounds and 2504 proteins.
As for the KIBA [25] dataset, it covers information related to kinase inhibitor bioactivity. We applied the dataset construction method from HyperAttentionDTI [13] to create an imbalanced dataset. This KIBA dataset consists of 22,154 positive and 94,196 negative interactions derived from 2068 drugs and 225 proteins.
Evaluation indicators
To ensure a fair and reasonable comparison with baseline models on the Human and C.elegans datasets, we selected the Area Under the ROC Curve (AUC) as our primary evaluation metric. Additionally, we considered Precision and Recall [14,25,26]. The AUC measures the area under the ROC curve, enclosed by the coordinate axes; a value closer to 1 indicates higher model validity. In the formulas for calculating Precision and Recall, TP are the correctly predicted positive samples, representing the number of drug targets with interactions. FP are positive samples incorrectly predicted. TN are correctly predicted negative samples, representing drug targets without interactions, while FN are negative samples incorrectly predicted.
Additionally, on the KIBA dataset, we employed accuracy (Acc), precision, recall, AUC, and AUPR as metrics to assess the model performance, where AUPR is the area under the precision-recall curve, with a larger area indicating better model performance. The optimal results for each metric will be highlighted in bold to present the model performance on different datasets more clearly.
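For reference, these metrics can be computed directly from model outputs. The sketch below implements precision, recall, accuracy, and a rank-based AUC (equal to the probability that a random positive scores above a random negative); the threshold of 0.5 is an illustrative assumption:

```python
import numpy as np

def precision_recall_accuracy(y_true, y_prob, thresh=0.5):
    """Compute Precision, Recall, and Acc from TP/FP/TN/FN counts."""
    y_pred = (y_prob >= thresh).astype(int)
    tp = int(((y_pred == 1) & (y_true == 1)).sum())  # interacting pairs found
    fp = int(((y_pred == 1) & (y_true == 0)).sum())
    tn = int(((y_pred == 0) & (y_true == 0)).sum())
    fn = int(((y_pred == 0) & (y_true == 1)).sum())
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = (tp + tn) / len(y_true)
    return precision, recall, accuracy

def auc_score(y_true, y_prob):
    """AUC as the probability a random positive outranks a random negative."""
    pos = y_prob[y_true == 1]
    neg = y_prob[y_true == 0]
    wins = (pos[:, None] > neg[None, :]).sum() + 0.5 * (pos[:, None] == neg[None, :]).sum()
    return wins / (len(pos) * len(neg))
```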
Results
Performance on the C.elegans and Human datasets
The cross-validation results across five folds for our model, applied to the C.elegans and Human datasets, are depicted in Fig 5(A) to account for potential chance fluctuations. The final outcomes utilize the mean values.
On the C.elegans dataset, as presented in Table 1, we contrast our approach against baseline machine learning models and sophisticated deep learning techniques (including Random Forest (RF), GCN, CPI-GNN [26], MHSADTI [27], TransformerCPI [28], CoaDTI-pro [14], and Wang’s Methodology [29]). Our method exhibits leading performance across the evaluation metrics AUC, Precision, and Recall. Compared to the top-performing baseline models, our approach enhances AUC by 0.4%, Precision by 1.4%, and Recall by 0.2%.
Furthermore, on the Human dataset, as presented in Table 2, our method achieves comparable or superior performance relative to the baseline models, including TransformerCPI, CPI-GNN, GanDTI [30], IIFDTI [31], CoaDTI-pro, and Wang’s Method. Particularly, in terms of AUC and Precision, our method surpasses the optimal performances of all baseline models, showing a 0.6% improvement in AUC and a 1.4% increase in Precision. The slightly lower Recall compared to the baseline models is due to our more cautious approach in predicting samples as positives during model training, thereby reducing false positives (FP).
Performance on the KIBA datasets
Ultimately, we applied and evaluated our proposed methodology on the KIBA dataset, conducting comparative trials against baseline models.
Table 3 outlines the results in detail. The KIBA dataset exhibits a pronounced category imbalance, presenting substantial hurdles that commonly impede the performance of deep neural networks. Nonetheless, our approach surpassed the peak performance of the baseline models across metrics like AUC, AUPR, ACC, and Precision.
Ablation experiments
To further validate the efficacy of our proposed methodology, we performed ablation experiments on the C.elegans dataset. These experiments targeted both the multi-feature information extraction module and the cross co-attention module. Initially, we removed the multi-feature information mining module from SSCPA-DTI, created a variant model called Without-MIMM, and compared it with SSCPA-DTI to verify the effectiveness of the multi-feature information mining module. As shown in Fig 6, the multi-feature information mining module improved accuracy by 0.04 (4.29%), AUC by 0.037 (3.87%), and Recall by 0.027 (2.90%). This indicates the effectiveness of the multi-feature information mining module in significantly enhancing model performance. Secondly, we removed the cross co-attention module from SSCPA-DTI, inputted the extracted drug and target features into the prediction module to form another model named Without-CPA, and compared it with SSCPA-DTI to validate the importance of the cross co-attention module. As illustrated in Fig 6, the cross co-attention module increased accuracy by 0.003 (0.31%), AUC by 0.003 (0.3%), and Precision by 0.008 (0.83%). These results further emphasize the crucial role of the cross co-attention module in enhancing model performance.
Case study
We randomly selected a drug and its interacting target protein from the DrugBank dataset, and then used a pre-trained model to predict their interactions. Glutathione (DB00143) is the drug we randomly selected. Glutathione plays an important role in detoxification processes, as it can bind to some toxic substances or metabolites to help cells eliminate them. It can interact with targets such as glutathione S-transferase (Q04760). Table 4 shows the interaction results between glutathione and Q04760, as well as several other targets. For glutathione (DB00143), we had one incorrect prediction out of 10 positive samples and only two incorrect predictions out of 10 negative samples, resulting in an accuracy of 85%.
As shown in Fig 6, we retrieved glutathione S-transferase from the PDB database. We conducted molecular docking of DB00143 and Q04760 using PyMOL, AutoDockTools, and AutoDock Vina. The docking results were visualized using Discovery Studio and PyMOL. The docking free energy score between Q04760 and DB00143 is -4.9 kcal/mol. The small molecule forms hydrogen bonds with HIS126 and GLU172, and electrostatic interactions with LYS150, GLU99, and GLU172.
The aforementioned experimental findings highlight that our proposed approach exhibits strong predictive capabilities and generalization competence when forecasting drug-target interactions.
Conclusion
We propose a DTI prediction model based on substructure subsequences and a cross co-attention mechanism, integrating multi-feature information to predict DTIs. It extracts features from drug and protein sequences, including substructural features of drug-target interactions and original features; substructural information provides detailed insights into the local molecular structures, while original features encompass more global molecular characteristics. The cross co-attention module first merges the extracted original and substructural feature information, then captures the interactive data between the proteins and drugs. Integrating both original and substructural feature information enhances the model's understanding of the overall molecular structure and enables it to better differentiate molecules with similar global structures but distinct substructures. Across all experimental configurations, the outcomes show that our model exhibits remarkable proficiency regarding metrics such as AUC, Precision, and other evaluation criteria.
References
- 1. Faulon J, Misra M, Martin S, Sale K, Sapra R. Genome scale enzyme–metabolite and drug–target interaction predictions using the signature molecular descriptor. Bioinformatics. 2008;24(2):225–33.
- 2. Breiman L. Random forests. Mach Learn 2001; 45(1): 5–32.
- 3. Wang X-R, Cao T-T, Jia CM, Tian X-M, Wang Y. Quantitative prediction model for affinity of drug-target interactions based on molecular vibrations and overall system of ligand-receptor. BMC Bioinformatics. 2021;22(1):497. pmid:34649499
- 4. Ballester PJ, Mitchell JBO. A machine learning approach to predicting protein–ligand binding affinity with applications to molecular docking. Bioinformatics. 2010;26(9):1169–75.
- 5. Bleakley K, Yamanishi Y. Supervised prediction of drug–target interactions using bipartite local models. Bioinformatics. 2009;25(18):2397–403.
- 6. Liu H, Sun J, Guan J, Zheng J, Zhou S. Improving compound–protein interaction prediction by building up highly credible negative samples. Bioinformatics. 2015;31(12):i221–9.
- 7. Huang K, Xiao C, Glass LM, Sun J. MolTrans: molecular interaction transformer for drug–target interaction prediction. Bioinformatics. 2021;37(6):830–6.
- 8. Öztürk H, Özgür A, Ozkirimli E. DeepDTA: deep drug–target binding affinity prediction. Bioinformatics. 2018;34(17):i821-9.
- 9. Bai P, Miljković F, John B, Lu H. Interpretable bilinear attention network with domain adaptation improves drug–target prediction. Nat Mach Intell. 2023;5(2):126–36.
- 10. Lee I, Keum J, Nam H. DeepConv-DTI: Prediction of drug-target interactions via deep learning with convolution on protein sequences. PLoS Comput Biol. 2019;15(6):e1007129. pmid:31199797
- 11. Rogers D, Hahn M. Extended-connectivity fingerprints. J Chem Inf Model. 2010;50(5):742–54. pmid:20426451
- 12. Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, et al. Attention is all you need. In: Adv Neural Inf Process Syst. 2017.
- 13. Zhao Q, Zhao H, Zheng K, Wang J. HyperAttentionDTI: improving drug–protein interaction prediction by sequence-based deep learning with attention mechanism. Bioinformatics. 2022;38(3):655–62.
- 14. Huang L, Lin J, Liu R, Zheng Z, Meng L, Chen X, et al. CoaDTI: multi-modal co-attention based framework for drug-target interaction annotation. Brief Bioinform. 2022;23(6):bbac446. pmid:36274236
- 15. Shin B, Park S, Kang K, Ho JC. Self-attention based molecule representation for predicting drug-target interaction. arXiv preprint arXiv:1908.06760. 2019.
- 16. Wang H, Huang F, Xiong Z, Zhang W. A heterogeneous network-based method with attentive meta-path extraction for predicting drug-target interactions. Brief Bioinform. 2022;23(4):bbac184. pmid:35641162
- 17. Tsubaki M, Tomii K, Sese J. Compound-protein interaction prediction with end-to-end learning of neural networks for graphs and sequences. Bioinformatics. 2019;35(2):309–18. pmid:29982330
- 18. Zhao T, Hu Y, Valsdottir LR, Zang T, Peng J. Identifying drug-target interactions based on graph convolutional network and deep neural network. Brief Bioinform. 2021;22(2):2141–50. pmid:32367110
- 19. Gong X, Liu M, Sun H, Li M, Liu Q. HS-DTI: drug-target interaction prediction based on hierarchical networks and multi-order sequence effect. In: 2022 IEEE International Conference on Bioinformatics and Biomedicine (BIBM). 2022. p. 322–7.
- 20. Sennrich R, Haddow B, Birch A. Neural machine translation of rare words with subword units. arXiv preprint arXiv:1508.07909. 2015.
- 21. Gage P. A new algorithm for data compression. C Users J. 1994;12(2):23–38.
- 22. Chen Y. Convolutional neural network for sentence classification [Master's thesis]. University of Waterloo.
- 23. Yu Z, Yu J, Cui Y, Tao D, Tian Q. Deep modular co-attention networks for visual question answering. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. IEEE; 2019. p. 6281–90.
- 24. Maas AL, Hannun AY, Ng AY. Rectifier nonlinearities improve neural network acoustic models. In: Proc Int Conf Mach Learn. 2013. p. 3.
- 25. Tang J, Szwajda A, Shakyawar S, Xu T, Hintsanen P, Wennerberg K, et al. Making sense of large-scale kinase inhibitor bioactivity data sets: a comparative and integrative analysis. J Chem Inf Model. 2014;54(3):735–43. pmid:24521231
- 26. Tsubaki M, Tomii K, Sese J. Compound-protein interaction prediction with end-to-end learning of neural networks for graphs and sequences. Bioinformatics. 2019;35(2):309–18. pmid:29982330
- 27. Cheng Z, Yan C, Wu FX, Wang J. Drug-target interaction prediction using multi-head self-attention and graph attention network. IEEE/ACM Trans Comput Biol Bioinform. 2021;19(4):2208–18.
- 28. Chen L, Tan X, Wang D, Zhong F, Liu X, Yang T, et al. Transformercpi: improving compound–protein interaction prediction by sequence-based deep learning with self-attention mechanism and label reversal experiments. Bioinformatics. 2020;36(16):4406–14.
- 29. Wang K, Hu J, Zhang X. Identifying drug–target interactions through a combined graph attention mechanism and self-attention sequence embedding model. In: Springer Nature Singapore. p. 246–57.
- 30. Wang S, Shan P, Zhao Y, Zuo L. GanDTI: a multi-task neural network for drug-target interaction prediction. Comput Biol Chem. 2021;92:107476.
- 31. Cheng Z, Zhao Q, Li Y, Wang J. IIFDTI: predicting drug–target interactions through interactive and independent features based on attention mechanism. Bioinformatics. 2022;38(17):4153–61.