Predicting drug–Protein interaction with deep learning framework for molecular graphs and sequences: Potential candidates against SAR-CoV-2

Weian Du; Liang Zhao; Rong Wu; Boning Huang; Si Liu; Yufeng Liu; Huaiqiu Huang; Ge Shi

doi:10.1371/journal.pone.0299696

Abstract

The severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) caused the COVID-19 disease, which represents a new life-threatening disaster. Regarding viral infection, many therapeutics have been investigated to alleviate the epidemiology such as vaccines and receptor decoys. However, the continuous mutating coronavirus, especially the variants of Delta and Omicron, are tended to invalidate the therapeutic biological product. Thus, it is necessary to develop molecular entities as broad-spectrum antiviral drugs. Coronavirus replication is controlled by the viral 3-chymotrypsin-like cysteine protease (3CLpro) enzyme, which is required for the virus’s life cycle. In the cases of severe acute respiratory syndrome coronavirus (SARS-CoV) and middle east respiratory syndrome coronavirus (MERS-CoV), 3CLpro has been shown to be a promising therapeutic development target. Here we proposed an attention-based deep learning framework for molecular graphs and sequences, training from the BindingDB 3CLpro dataset (114,555 compounds). After construction of such model, we conducted large-scale screening the in vivo/vitro dataset (276,003 compounds) from Zinc Database and visualize the candidate compounds with attention score. geometric-based affinity prediction was employed for validation. Finally, we established a 3CLpro-specific deep learning framework, namely GraphDPI-3CL (AUROC: 0.958) achieved superior performance beyond the existing state of the art model and discovered 10 molecules with a high binding affinity of 3CLpro and superior binding mode.

Citation: Du W, Zhao L, Wu R, Huang B, Liu S, Liu Y, et al. (2024) Predicting drug–Protein interaction with deep learning framework for molecular graphs and sequences: Potential candidates against SAR-CoV-2. PLoS ONE 19(5): e0299696. https://doi.org/10.1371/journal.pone.0299696

Editor: Gurudeeban Selvaraj, Concordia University, CANADA

Received: December 4, 2023; Accepted: February 14, 2024; Published: May 10, 2024

Copyright: © 2024 Du et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: The original data can be assessed at Figshare with link: https://figshare.com/articles/dataset/GraphDPI-3CL_dataset/25139291 The source code can be assessed at https://github.com/duwa2/GraphDPI-3CL.

Funding: This work was supported in part by the National Natural Science Foundation of China (81974300), China Postdoctoral Science Foundation COVID-19 Prevention Special Project (Grant No.: 2020T130151ZX), the Youth Program of National Natural Science Foundation of China (Grant No. 81902012). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests: The authors have declared that no competing interests exist.

1. Introduction

The outbreak of the coronavirus that causes severe acute respiratory syndrome in 2019 has resulted in a global pandemic of severe pneumonia-like disease and known as COVID-19. During the infection process, the coronavirus genome utilized the host ribosomes and translated into a large polypeptide (PP) chain, which is then cleaved by two proteases, papain like proteases (PLpro) and 3-chyomotrypsin like protease (3CLpro), to generate several non-structural proteins (NSPs) required for viral replication [1]. PLpro and 3CLpro (M^pro) cleave the PP chain into 16 NSPs, and 11 NSPs are generated by the 3CLpro [2,3]. M^pro mediate viral replication by cleaving viral polyprotein precursors at specific sites [4] and inhibition of M^pro can block the synthesis of viral proteins, suggesting that 3CLpro can be one of the target candidates for developing anti-COVID-19 drugs [5,6].The 3CLpro protein sequences for SARS-CoV-2 are publicly available on the Protein Data Bank and BindingDB 3CLpro dataset [4]. Herein, we have selected 3CLpro of SARS-CoV-2 as a target candidate to identify potential inhibitors since 3CLpro is indispensable for viral replication. Furthermore, numerous isoflavone glycosides were isolated from soybean seed hypocotyls, demonstrating the capacity to prevent coronavirus-induced cytopathological effects [7].

Issahaku et al. elucidate the binding mechanism of MRTX1133, a potential KRASG12D inhibitor, offering insights that could inform the targeting of analogous protein interaction domains within SARS-CoV-2 [8]. Similarly, the exploration of dual COX and LOX inhibition by compounds in Indian spices suggests an innovative therapeutic approach to viral mitigation by modulating the inflammatory response, which may be applicable to SARS-CoV-2 [9]. Furthermore, the molecular interactions of bioactive compounds as potential SARS-CoV-2 protease inhibitors highlight a promising direction for multi-targeted antiviral strategies [10]. This is complemented by research into arylhydrazone derivatives, indicating a new frontier for antiviral drug development relevant to SARS-CoV-2 [11]. Additionally, the molecular structure elucidation of bioactive compounds provides a foundational framework for structural-based drug discovery efforts aimed at combating SARS-CoV-2 [12]. Finally, a comprehensive review underscores the potential of nutraceuticals in enhancing immunity against respiratory viruses, including SARS-CoV-2 [13].

Recently, there has been a surge of AI in drug discovery over the past decade, which is still gaining popularity. AI has the potential to revolutionize medication research, such as the proteome of Alphafold DB [14,15]. Through technical advancements such as the use of neural networks to design compounds and the application of knowledge graphs to comprehend target biology, AI-enabled drug development [16,17] has progressed significantly over the last few years. Several AI-native drug discovery businesses have advanced compounds into clinical trials, indicating considerably quicker schedules and lower costs in certain cases, setting high expectations in the R&D sector [18,19]. In addition, several well-known pharmaceutical corporations have created discovery agreements with AI firms in order to investigate the technology. Despite this advancement, AI in drug discovery is still in its infancy, with many unanswered issues concerning its effect and future possibilities.

In our research, we initially introduced an innovative deep learning framework based on an attention mechanism, specifically designed for molecular graphs and sequences, which was trained using the BindingDB 3CLpro dataset, consisting of 114,555 compounds. Following the development of this model, we performed extensive screening on a combined in vivo and in vitro dataset comprising 276,003 compounds from the Zinc Database, and we subsequently visualized the potential candidate compounds, highlighting them with their respective attention scores. To validate our findings, we applied a geometric-based approach for predicting binding affinity. Culminating our efforts, we established a specialized deep learning framework tailored for 3CLpro, denoted as GraphDPI-3CL, which yielded an impressive AUROC of 0.958. This framework enabled us to identify 10 molecules that demonstrated high binding affinity toward 3CLpro, along with an exceptional binding mode.

2. Materials and methods

2.1 Data preprocessing

The Lipinski parameters for molecules were used for druggability evaluation, which were calculated using RDKit [20]. The molecules in Zinc database were filtered by Lipinski rules including: 1) Molecular weight is less than 500 Da; 2) Number of H bond donors (including hydroxyl, amino, etc.) does not exceed 5; 3) Number of H bond acceptors in the compound does not exceed 10; 4) logarithm of the fat-water partition coefficient (logP) is between -2 and 5; 5) Number of rotatable bonds in the compound does not exceed 10. Detecting problematic candidates at early stage can significantly reduce wasted time and resources, and thus the ADME-related properties were calculated and filtered by using the QikProp [21] program in normal mode. The overall 46 related descriptors were chosen to filter our natural products. Some important descriptors include: 1) the CNS parameter: predicted central nervous system activity on a −2 (inactive) to +2 (active) scale. 2) QPlogS: the logarithm of aqueous solubility (range for 95% of drugs: -6.0 to 0.5); logS is the concentration of the solute in a saturated solution that is in equilibrium with the crystalline solid. 3) QPlogKHSA: the logarithm of predicted binding constant to human serum albumin (range for 95% of drugs: -1.5 to 1.2). 4) Human Oral Absorption: predicted qualitative human oral absorption: 1, 2, or 3 indicating for low, medium, or high; the assessment uses a knowledge-based set of rules, including checking for suitable values of Percent Human Oral Absorption, number of metabolites, number of rotatable bonds, logP, solubility, and cell permeability [22].

2.2 Data encoding

The graph representation and descriptor of the filtered molecules was generated by using RDKit [20]. The molecule with N_a atoms can be represented by a graph G = {V, E}, where the node v_i∈V, i = 1, 2, …, N_a, is corresponded to the i-th non-hydrogen atom in the molecule candidates, and each edge e_i,i∈E, i,j ∈{1, 2, …, N_a}, is corresponded to a chemical bond between the i-th and the j-th atoms [23]. The descriptor [24] of each atom was represented as a 34-dimension vector, indicating atom type, degree of atom, formal charge, number of radical electrons, hybridization type, aromaticity, number of hydrogen linking atoms, chirality and configuration in molecular feature.

2.3 Model architecture of GraphDPI-3CL

The proposed model is consisting of two major parts, message-passing neural network (MPNN) module [25] and self-attentive bidirectional long short-term memory (BiLSTM_Atten) [26]. The MONN is designed for the semi-supervised node classification problem such as molecular graph representation. The molecule was denoted as graph representation G = {V, E}, each node V represented as a n-dimensional feature vector and each edge E is the set of covalent bonds as an adjacency matrix E ∈ R^{m × m}. The MONN module aggregates all atoms and bonds information at molecule level with the following equation. where H^(l) ∈ R^{n × m}, is the output the l-th hidden layer, W^(l) ∈ R^{n × n} is the weight matrix for the l-th neural network layer, D∈ R^{m × m} is diagonal matrix and () is rectified linear unit (ReLU) activation function, is the identity matrix. In our proposed model, n = 34 and the final output of MONN, O ∈ R^{n × s}, where s is the length of the molecule sequence. After processing of MONN, the output was fed into the module of BiLSTM_Atten with following equation. where is the forward-directed hidden layer output and is the backward-directed hidden layer output, thus is able to collect the bidirectional information dependency between adjacent tokens in a molecule. To visualize the atom contribution for the interaction prediction, we introduce multi-head self-attention mechanism into BiLSTM. The attention block takes the whole LSTM hidden layer H output as input following the equation as where hidden layer number of MLP is 512 in this study and we obtained the weighted sums by multiplying the annotation matrix A^m and LSTM hidden states H. After the molecule feature extraction, we fed the concatenation of last hidden layer of H and the sums of attention score into 2 fully connected layer for classification and employed cross entropy loss function and Adam optimizer for the training. where Θ is the set of all weight matrices and bias vectors, N is the number of molecules in training dataset, λ is L2 regularization for training stride punishment.

GraphDPI-3CL is composed of three encoder layers and three decoder layers (Fig 1), which are instrumental in capturing the complex relationships within the protein-drug interaction data. Each atom is represented in a 64-dimensional vector space, allowing the model to effectively learn the nuances of molecular structures. The model utilizes a multi-head attention mechanism with eight attention heads, facilitating the simultaneous processing of different representation subspaces and enhancing the model’s ability to focus on various parts of the input graph. For the training process, we have chosen a learning rate of 1e-4 and a weight decay of 1e-4 to ensure stable convergence while preventing overfitting. The batch size is set to 8, balancing the computational efficiency with the ability to generalize across the dataset. To further aid in the model’s generalization, we have implemented a dropout rate of 0.2, which helps mitigate the risk of co-adaptation of neurons.

Download:

Fig 1. The framework of machine learning screening and geometric-based affinity prediction.

https://doi.org/10.1371/journal.pone.0299696.g001

2.4 Model comparison

The baseline model has been proposed for performance comparison including deep learning-based model such as DrugVQA [27], TransformerCPI [28] and machine learning-based model such as XGBoost [29], Gradient Boosted Trees. For fair comparison, we adopted simplified version of DrugVQA without protein structure information (DrugVQA-seq) provided by author. TransformerCPI was implemented and pretrained by configuration of author. The machine learning-based methods including XGBoost, Gradient Boosted Trees was performed by RDKit and Knime platform. We employed 5-fold cross-validation strategy for the evaluation of performance. The predictive performance of the model is assessed with the area under receiver operating characteristic (AUROC), Accuracy (Acc), Specificity (Spec), Recall and Matthews correlation coefficient (MCC).

2.5 Interaction visualization

MaSIF [30] is a deep learning framework to capture fingerprints that are important for geometric biomolecular interactions, which is a compound-protein interaction prediction software with high precision. After the large-scale screening, the premilarily positive molecule will be input to MaSIF to visualize the compound-protein interaction and interaction analysis and the MaSIF tool can be assessed at https://github.com/LPDI-EPFL/masif.

3. Results

3.1. Machine learning model

The efficacy of the proposed computational models is delineated in Table 1. Among the models evaluated, GraphDPI-3CL exhibited a superior performance, as evidenced by its attainment of the highest values in key metrics, including the area under the receiver operating characteristic curve (AUROC), specificity (Spec), recall, and Matthew’s correlation coefficient (MCC). In comparison, TransformerCPI was notable for achieving the highest accuracy.

Download:

Table 1. The performance of model.

https://doi.org/10.1371/journal.pone.0299696.t001

GraphDPI-3CL outperformed both deep learning-based models (Fig 2), such as DrugVQA and TransformerCPI, and machine learning-based models, such as XGBoost and Gradient Boosted Trees, across several evaluation metrics. This superior performance of GraphDPI-3CL is particularly evident in its predictive precision and ability to correctly classify both positive and negative instances, as reflected by its high specificity and recall. Leveraging the strengths of GraphDPI-3CL, we conducted large-scale virtual screening using the Zinc Database, incorporating a druggability assessment to refine our search. This process culminated in the identification of the top 15 candidate molecules. Subsequent geometric-based affinity validation and interaction analysis were performed on these candidates, leading to the discovery of four molecules characterized by a high binding affinity for 3CLpro and an optimal binding mode. These results underscore the potential of GraphDPI-3CL as a robust tool in the identification of promising therapeutic compounds. The integration of GraphDPI-3CL into the drug discovery pipeline could significantly enhance the efficiency and accuracy of virtual screening processes.

Download:

Fig 2. The performance metric of the proposed 3CLpro-specific affinity prediction model.

https://doi.org/10.1371/journal.pone.0299696.g002

3.2. Compound-protein interaction visualization

The reliability of our docking methodology was established through a re-docking assessment using the co-crystallized ligands of Mpro. Employing Glide software (v2019.2), we compared the re-docked conformations with their respective co-crystallized counterparts. The root-mean-square deviation (RMSD) values obtained from this comparison were 2.5 Å and 2.8 Å, providing evidence of the docking protocol’s accuracy and reliability [31]. Further validation of our computational approach is evident in the docking results detailed in Table 2. Out of the screened molecular candidates, 15 exhibited strong binding to Mpro, demonstrating notable binding energies. Impressively, 10 of these candidates surpassed the binding affinity of the co-crystallized ligand, N-acetylglucosamine (NAG), suggesting that these molecules possess the potential to act as biologically active ligands against Mpro. This is a significant finding, as it not only validates our docking procedures but also identifies promising candidates for further experimental investigation.

Download:

Table 2. Docking results of top 15 molecule candidates toward M^pro.

https://doi.org/10.1371/journal.pone.0299696.t002

The screening process revealed that the molecular candidates bound to 3CLpro exhibited free binding energy values spanning from -53.096 to -25.819 kcal/mol, which were benchmarked against the co-crystallized ligand NAG, with a known binding energy of -36.592 kcal/mol. Notably, molecules 1 and 4, as depicted in Fig 3, demonstrated particularly favorable binding energies of -52.096 kcal/mol and -49.342 kcal/mol respectively. These values suggest a higher binding affinity to 3CLpro, potentially indicating stronger inhibitory action on the enzyme compared to NAG. The precise binding interactions of these candidates were further elucidated using Glide software’s 2D interaction analysis. This analysis highlighted two critical hydrogen bonds formed with the GLY496 residue, which may significantly contribute to the observed high binding affinity for 3CLpro of SARS-CoV-2. The importance of these interactions is supported by the attention weights from the self-attention layer of our computational model, which showed alignment with the Glide 2D interaction findings. This consistency underscores the model’s ability to capture essential mutual information between the molecule candidates and the target protein, which could offer valuable insights and direction for the optimization of lead compounds.

Download:

Fig 3. Model interpretation analysis reveals GraphDPI-3CL can capture binding affinity weight on ligand and providing insightful guidance for downstream quantitative structure-activity relationship analysis.

https://doi.org/10.1371/journal.pone.0299696.g003

4. Conclusion

The ongoing COVID-19 pandemic continues to exert a profound global impact, underscoring the urgent need for effective vaccines and therapeutic agents. This urgency has galvanized efforts to mine the vast chemical space for safe and efficacious treatments. In our research, we have introduced an innovative attention-based deep learning framework tailored for the analysis of molecular graphs and sequences. Trained on a comprehensive BindingDB dataset specific to 3CLpro that includes 114,555 compounds, our model is designed to effectively discern high-affinity drug candidates. Upon establishing this model, we embarked on a large-scale screening of an extensive in vivo/in vitro dataset, comprising 276,003 compounds from the Zinc Database. To ensure a focused approach, we applied attention mechanisms to rank and visualize the candidate compounds based on their predicted binding affinity, highlighted by their respective attention scores. To validate the predictive outcomes, we employed geometric-based affinity prediction techniques. The culmination of our efforts is the creation of a deep learning framework specialized for 3CLpro, which we have named GraphDPI-3CL. This model boasts a remarkable AUROC score of 0.958, underscoring its predictive prowess. Despite the promising performance of GraphDPI-3CL, the framework is not without its limitations. One significant drawback is its potential to miss potential inhibitors that could be detected through methods that consider the three-dimensional structure of proteins. The model’s current design does not explicitly incorporate chemical bond information, which could lead to less detailed molecular representations. The absence of direct modeling of chemical bond types and configurations within the molecules means that the framework might overlook subtle but critical aspects of molecular interactions and activity, which are essential for a comprehensive understanding of the intricacies involved in molecular binding processes. Moreover, to effectively adapt our model to new protein targets, it is essential to compile a comprehensive dataset of protein-ligand interactions specific to each target, which will be used to retrain the model, leveraging its existing architecture. This retraining enables the model to learn the unique interaction patterns of the new protein. Following this, it is crucial to validate and test the model’s predictions against known data and, where possible, through experimental assays to ensure its predictive accuracy and reliability for the new target.

Our framework has facilitated the identification of 10 molecular candidates that not only possess high binding affinity to 3CLpro but also exhibit optimal binding modes, which are critical for effective inhibition of the viral enzyme. While several universal deep learning-based models have been developed for drug-protein interaction prediction, GraphDPI-3CL sets itself apart by being a bespoke model for 3CLpro, outperforming existing state-of-the-art models in this domain. Consequently, GraphDPI-3CL represents a powerful tool for large-scale screening of molecule candidates targeting 3CLpro, with the potential to expedite the drug discovery journey in the fight against SARS-CoV-2.

References

1. Mody V., et al., Identification of 3-chymotrypsin like protease (3CLPro) inhibitors as potential anti-SARS-CoV-2 agents. Communications biology, 2021. 4(1): p. 1–10.
- View Article
- Google Scholar
2. Jin Z., et al., Structure of Mpro from SARS-CoV-2 and discovery of its inhibitors. Nature, 2020. 582(7811): p. 289–293. pmid:32272481
- View Article
- PubMed/NCBI
- Google Scholar
3. Patra S., et al., Emerging molecular prospective of SARS-CoV-2: Feasible nanotechnology based detection and inhibition. Frontiers in Microbiology, 2020: p. 2098. pmid:33193115
- View Article
- PubMed/NCBI
- Google Scholar
4. Kurt Yilmaz N., Swanstrom R., and Schiffer C.A., Improving Viral Protease Inhibitors to Counter Drug Resistance. Trends in microbiology, 2016. 24(7): p. 547–557. pmid:27090931
- View Article
- PubMed/NCBI
- Google Scholar
5. Rathnayake A.D., et al., 3C-like protease inhibitors block coronavirus replication in vitro and improve survival in MERS-CoV–infected mice. Science Translational Medicine, 2020. 12(557): p. eabc5332. pmid:32747425
- View Article
- PubMed/NCBI
- Google Scholar
6. Jin Z., et al., Structure of Mpro from SARS-CoV-2 and discovery of its inhibitors. Nature, 2020. 582(7811): p. 289–293. pmid:32272481
- View Article
- PubMed/NCBI
- Google Scholar
7. Alesawy M.S., et al., In silico studies of some isoflavonoids as potential candidates against COVID-19 targeting human ACE2 (hACE2) and viral main protease (Mpro). Molecules, 2021. 26(9): p. 2806. pmid:34068579
- View Article
- PubMed/NCBI
- Google Scholar
8. Issahaku A.R., et al., Characterization of the binding of MRTX1133 as an avenue for the discovery of potential KRASG12D inhibitors for cancer therapy. Scientific Reports, 2022. 12(1): p. 17796. pmid:36273239
- View Article
- PubMed/NCBI
- Google Scholar
9. Rudrapal M., et al., Dual synergistic inhibition of COX and LOX by potential chemicals from Indian daily spices investigated through detailed computational studies. Scientific Reports, 2023. 13(1): p. 8656. pmid:37244921
- View Article
- PubMed/NCBI
- Google Scholar
10. Issahaku A.R., et al., Discovery of Potential KRAS‐SOS1 Inhibitors from South African Natural Compounds: An In silico Approach. ChemistrySelect, 2023. 8(24): p. e202300277.
- View Article
- Google Scholar
11. Ezugwu J.A., et al., Design, Synthesis, Molecular Docking, Molecular Dynamics and In Vivo Antimalarial Activity of New Dipeptide‐Sulfonamides. ChemistrySelect, 2022. 7(5): p. e202103908.
- View Article
- Google Scholar
12. Othman I.M., et al., Toward a treatment of antibacterial and antifungal infections: Design, synthesis and in vitro activity of novel arylhydrazothiazolylsulfonamides analogues and their insight of DFT, docking and molecular dynamic simulations. Journal of Molecular Structure, 2021. 1243: p. 130862.
- View Article
- Google Scholar
13. Baru Venkata R., et al., Utilizing Andrographis paniculata leaves and roots by effective usage of the bioactive andrographolide and its nanodelivery: investigation of antikindling and antioxidant activities through in silico and in vivo studies. Frontiers in Nutrition, 2023. 10: p. 1185236. pmid:37324729
- View Article
- PubMed/NCBI
- Google Scholar
14. Varadi M., et al., AlphaFold Protein Structure Database: Massively expanding the structural coverage of protein-sequence space with high-accuracy models. Nucleic acids research, 2022. 50(D1): p. D439–D444. pmid:34791371
- View Article
- PubMed/NCBI
- Google Scholar
15. Jumper J., et al., Highly accurate protein structure prediction with AlphaFold. Nature, 2021. 596(7873): p. 583–589. pmid:34265844
- View Article
- PubMed/NCBI
- Google Scholar
16. Li F., et al., Effective drug-target interaction prediction with mutual interaction neural network. Bioinformatics, 2022. pmid:35652721
- View Article
- PubMed/NCBI
- Google Scholar
17. Yazdani-Jahromi M., et al., AttentionSiteDTI: an interpretable graph-based model for drug-target interaction prediction using NLP sentence-level relation classification. Briefings in Bioinformatics, 2022. 23(4): p. bbac272. pmid:35817396
- View Article
- PubMed/NCBI
- Google Scholar
18. Paul D., et al., Artificial intelligence in drug discovery and development. Drug Discovery Today, 2021. 26(1): p. 80. pmid:33099022
- View Article
- PubMed/NCBI
- Google Scholar
19. Reddy S., Fox J., and Purohit M.P., Artificial intelligence-enabled healthcare delivery. Journal of the Royal Society of Medicine, 2019. 112(1): p. 22–28. pmid:30507284
- View Article
- PubMed/NCBI
- Google Scholar
20. Bento A.P., et al., An open source chemical structure curation pipeline using RDKit. Journal of Cheminformatics, 2020. 12(1): p. 51. pmid:33431044
- View Article
- PubMed/NCBI
- Google Scholar
21. Greenwood J.R., et al., Towards the comprehensive, rapid, and accurate prediction of the favorable tautomeric states of drug-like molecules in aqueous solution. Journal of Computer-Aided Molecular Design, 2010. 24(6): p. 591–604. pmid:20354892
- View Article
- PubMed/NCBI
- Google Scholar
22. Shelley J.C., et al., Epik: a software program for pKaprediction and protonation state generation for drug-like molecules. Journal of Computer-Aided Molecular Design, 2007. 21(12): p. 681–691. pmid:17899391
- View Article
- PubMed/NCBI
- Google Scholar
23. Deng D., et al., XGraphBoost: extracting graph neural network-based features for a better prediction of molecular properties. Journal of chemical information and modeling, 2021. 61(6): p. 2697–2705. pmid:34009965
- View Article
- PubMed/NCBI
- Google Scholar
24. Winter R., et al., Learning continuous and data-driven molecular descriptors by translating equivalent chemical representations. Chemical science, 2019. 10(6): p. 1692–1701. pmid:30842833
- View Article
- PubMed/NCBI
- Google Scholar
25. Gilmer J., et al. Neural message passing for quantum chemistry. in International conference on machine learning. 2017. PMLR.
- View Article
- Google Scholar
26. Vaswani A., et al., Attention is all you need. Advances in neural information processing systems, 2017. 30.
- View Article
- Google Scholar
27. Zheng S., et al., Predicting drug–protein interaction using quasi-visual question answering system. Nature Machine Intelligence, 2020. 2(2): p. 134–140.
- View Article
- Google Scholar
28. Chen L., et al., TransformerCPI: improving compound–protein interaction prediction by sequence-based deep learning with self-attention mechanism and label reversal experiments. Bioinformatics, 2020. 36(16): p. 4406–4414. pmid:32428219
- View Article
- PubMed/NCBI
- Google Scholar
29. Chen T., et al., Xgboost: extreme gradient boosting. R package version 0.4–2, 2015. 1(4): p. 1–4.
- View Article
- Google Scholar
30. Gainza P., et al., Deciphering interaction fingerprints from protein molecular surfaces using geometric deep learning. Nature Methods, 2020. 17(2): p. 184–192. pmid:31819266
- View Article
- PubMed/NCBI
- Google Scholar
31. Kufareva I. and Abagyan R., Methods of protein structure comparison. Methods in molecular biology (Clifton, N.J.), 2012. 857: p. 231–257. pmid:22323224
- View Article
- PubMed/NCBI
- Google Scholar

[ref1] 1. Mody V., et al., Identification of 3-chymotrypsin like protease (3CLPro) inhibitors as potential anti-SARS-CoV-2 agents. Communications biology, 2021. 4(1): p. 1–10.
View Article
Google Scholar

[2] View Article

[3] Google Scholar

[ref2] 2. Jin Z., et al., Structure of Mpro from SARS-CoV-2 and discovery of its inhibitors. Nature, 2020. 582(7811): p. 289–293. pmid:32272481
View Article
PubMed/NCBI
Google Scholar

[5] View Article

[6] PubMed/NCBI

[7] Google Scholar

[ref3] 3. Patra S., et al., Emerging molecular prospective of SARS-CoV-2: Feasible nanotechnology based detection and inhibition. Frontiers in Microbiology, 2020: p. 2098. pmid:33193115
View Article
PubMed/NCBI
Google Scholar

[9] View Article

[10] PubMed/NCBI

[11] Google Scholar

[ref4] 4. Kurt Yilmaz N., Swanstrom R., and Schiffer C.A., Improving Viral Protease Inhibitors to Counter Drug Resistance. Trends in microbiology, 2016. 24(7): p. 547–557. pmid:27090931
View Article
PubMed/NCBI
Google Scholar

[13] View Article

[14] PubMed/NCBI

[15] Google Scholar

[ref5] 5. Rathnayake A.D., et al., 3C-like protease inhibitors block coronavirus replication in vitro and improve survival in MERS-CoV–infected mice. Science Translational Medicine, 2020. 12(557): p. eabc5332. pmid:32747425
View Article
PubMed/NCBI
Google Scholar

[17] View Article

[18] PubMed/NCBI

[19] Google Scholar

[ref6] 6. Jin Z., et al., Structure of Mpro from SARS-CoV-2 and discovery of its inhibitors. Nature, 2020. 582(7811): p. 289–293. pmid:32272481
View Article
PubMed/NCBI
Google Scholar

[21] View Article

[22] PubMed/NCBI

[23] Google Scholar

[ref7] 7. Alesawy M.S., et al., In silico studies of some isoflavonoids as potential candidates against COVID-19 targeting human ACE2 (hACE2) and viral main protease (Mpro). Molecules, 2021. 26(9): p. 2806. pmid:34068579
View Article
PubMed/NCBI
Google Scholar

[25] View Article

[26] PubMed/NCBI

[27] Google Scholar

[ref8] 8. Issahaku A.R., et al., Characterization of the binding of MRTX1133 as an avenue for the discovery of potential KRASG12D inhibitors for cancer therapy. Scientific Reports, 2022. 12(1): p. 17796. pmid:36273239
View Article
PubMed/NCBI
Google Scholar

[29] View Article

[30] PubMed/NCBI

[31] Google Scholar

[ref9] 9. Rudrapal M., et al., Dual synergistic inhibition of COX and LOX by potential chemicals from Indian daily spices investigated through detailed computational studies. Scientific Reports, 2023. 13(1): p. 8656. pmid:37244921
View Article
PubMed/NCBI
Google Scholar

[33] View Article

[34] PubMed/NCBI

[35] Google Scholar

[ref10] 10. Issahaku A.R., et al., Discovery of Potential KRAS‐SOS1 Inhibitors from South African Natural Compounds: An In silico Approach. ChemistrySelect, 2023. 8(24): p. e202300277.
View Article
Google Scholar

[37] View Article

[38] Google Scholar

[ref11] 11. Ezugwu J.A., et al., Design, Synthesis, Molecular Docking, Molecular Dynamics and In Vivo Antimalarial Activity of New Dipeptide‐Sulfonamides. ChemistrySelect, 2022. 7(5): p. e202103908.
View Article
Google Scholar

[40] View Article

[41] Google Scholar

[ref12] 12. Othman I.M., et al., Toward a treatment of antibacterial and antifungal infections: Design, synthesis and in vitro activity of novel arylhydrazothiazolylsulfonamides analogues and their insight of DFT, docking and molecular dynamic simulations. Journal of Molecular Structure, 2021. 1243: p. 130862.
View Article
Google Scholar

[43] View Article

[44] Google Scholar

[ref13] 13. Baru Venkata R., et al., Utilizing Andrographis paniculata leaves and roots by effective usage of the bioactive andrographolide and its nanodelivery: investigation of antikindling and antioxidant activities through in silico and in vivo studies. Frontiers in Nutrition, 2023. 10: p. 1185236. pmid:37324729
View Article
PubMed/NCBI
Google Scholar

[46] View Article

[47] PubMed/NCBI

[48] Google Scholar

[ref14] 14. Varadi M., et al., AlphaFold Protein Structure Database: Massively expanding the structural coverage of protein-sequence space with high-accuracy models. Nucleic acids research, 2022. 50(D1): p. D439–D444. pmid:34791371
View Article
PubMed/NCBI
Google Scholar

[50] View Article

[51] PubMed/NCBI

[52] Google Scholar

[ref15] 15. Jumper J., et al., Highly accurate protein structure prediction with AlphaFold. Nature, 2021. 596(7873): p. 583–589. pmid:34265844
View Article
PubMed/NCBI
Google Scholar

[54] View Article

[55] PubMed/NCBI

[56] Google Scholar

[ref16] 16. Li F., et al., Effective drug-target interaction prediction with mutual interaction neural network. Bioinformatics, 2022. pmid:35652721
View Article
PubMed/NCBI
Google Scholar

[58] View Article

[59] PubMed/NCBI

[60] Google Scholar

[ref17] 17. Yazdani-Jahromi M., et al., AttentionSiteDTI: an interpretable graph-based model for drug-target interaction prediction using NLP sentence-level relation classification. Briefings in Bioinformatics, 2022. 23(4): p. bbac272. pmid:35817396
View Article
PubMed/NCBI
Google Scholar

[62] View Article

[63] PubMed/NCBI

[64] Google Scholar

[ref18] 18. Paul D., et al., Artificial intelligence in drug discovery and development. Drug Discovery Today, 2021. 26(1): p. 80. pmid:33099022
View Article
PubMed/NCBI
Google Scholar

[66] View Article

[67] PubMed/NCBI

[68] Google Scholar

[ref19] 19. Reddy S., Fox J., and Purohit M.P., Artificial intelligence-enabled healthcare delivery. Journal of the Royal Society of Medicine, 2019. 112(1): p. 22–28. pmid:30507284
View Article
PubMed/NCBI
Google Scholar

[70] View Article

[71] PubMed/NCBI

[72] Google Scholar

[ref20] 20. Bento A.P., et al., An open source chemical structure curation pipeline using RDKit. Journal of Cheminformatics, 2020. 12(1): p. 51. pmid:33431044
View Article
PubMed/NCBI
Google Scholar

[74] View Article

[75] PubMed/NCBI

[76] Google Scholar

[ref21] 21. Greenwood J.R., et al., Towards the comprehensive, rapid, and accurate prediction of the favorable tautomeric states of drug-like molecules in aqueous solution. Journal of Computer-Aided Molecular Design, 2010. 24(6): p. 591–604. pmid:20354892
View Article
PubMed/NCBI
Google Scholar

[78] View Article

[79] PubMed/NCBI

[80] Google Scholar

[ref22] 22. Shelley J.C., et al., Epik: a software program for pKaprediction and protonation state generation for drug-like molecules. Journal of Computer-Aided Molecular Design, 2007. 21(12): p. 681–691. pmid:17899391
View Article
PubMed/NCBI
Google Scholar

[82] View Article

[83] PubMed/NCBI

[84] Google Scholar

[ref23] 23. Deng D., et al., XGraphBoost: extracting graph neural network-based features for a better prediction of molecular properties. Journal of chemical information and modeling, 2021. 61(6): p. 2697–2705. pmid:34009965
View Article
PubMed/NCBI
Google Scholar

[86] View Article

[87] PubMed/NCBI

[88] Google Scholar

[ref24] 24. Winter R., et al., Learning continuous and data-driven molecular descriptors by translating equivalent chemical representations. Chemical science, 2019. 10(6): p. 1692–1701. pmid:30842833
View Article
PubMed/NCBI
Google Scholar

[90] View Article

[91] PubMed/NCBI

[92] Google Scholar

[ref25] 25. Gilmer J., et al. Neural message passing for quantum chemistry. in International conference on machine learning. 2017. PMLR.
View Article
Google Scholar

[94] View Article

[95] Google Scholar

[ref26] 26. Vaswani A., et al., Attention is all you need. Advances in neural information processing systems, 2017. 30.
View Article
Google Scholar

[97] View Article

[98] Google Scholar

[ref27] 27. Zheng S., et al., Predicting drug–protein interaction using quasi-visual question answering system. Nature Machine Intelligence, 2020. 2(2): p. 134–140.
View Article
Google Scholar

[100] View Article

[101] Google Scholar

[ref28] 28. Chen L., et al., TransformerCPI: improving compound–protein interaction prediction by sequence-based deep learning with self-attention mechanism and label reversal experiments. Bioinformatics, 2020. 36(16): p. 4406–4414. pmid:32428219
View Article
PubMed/NCBI
Google Scholar

[103] View Article

[104] PubMed/NCBI

[105] Google Scholar

[ref29] 29. Chen T., et al., Xgboost: extreme gradient boosting. R package version 0.4–2, 2015. 1(4): p. 1–4.
View Article
Google Scholar

[107] View Article

[108] Google Scholar

[ref30] 30. Gainza P., et al., Deciphering interaction fingerprints from protein molecular surfaces using geometric deep learning. Nature Methods, 2020. 17(2): p. 184–192. pmid:31819266
View Article
PubMed/NCBI
Google Scholar

[110] View Article

[111] PubMed/NCBI

[112] Google Scholar

[ref31] 31. Kufareva I. and Abagyan R., Methods of protein structure comparison. Methods in molecular biology (Clifton, N.J.), 2012. 857: p. 231–257. pmid:22323224
View Article
PubMed/NCBI
Google Scholar

[114] View Article

[115] PubMed/NCBI

[116] Google Scholar

Figures

Abstract

1. Introduction

2. Materials and methods

2.1 Data preprocessing

2.2 Data encoding

2.3 Model architecture of GraphDPI-3CL

2.4 Model comparison

2.5 Interaction visualization

3. Results

3.1. Machine learning model

3.2. Compound-protein interaction visualization

4. Conclusion

References