
Protein ligand binding site prediction using graph transformer neural network

  • Ryuichiro Ishitani,

    Roles Conceptualization, Funding acquisition, Methodology, Project administration, Software, Supervision, Writing – original draft, Writing – review & editing

    r.ishitani@tmd.ac.jp

    Affiliations Division of Computational Drug Discovery and Design, Medical Research Institute, Tokyo Medical and Dental University, Bunkyo-ku, Tokyo, Japan, Department of Biological Sciences, Graduate School of Science, The University of Tokyo, Bunkyo-ku, Tokyo, Japan, Preferred Networks, Inc., Chiyoda-ku, Tokyo, Japan

  • Mizuki Takemoto,

    Roles Investigation, Writing – review & editing

    Affiliation Division of Computational Drug Discovery and Design, Medical Research Institute, Tokyo Medical and Dental University, Bunkyo-ku, Tokyo, Japan

  • Kentaro Tomii

    Roles Conceptualization, Funding acquisition, Methodology, Validation, Writing – review & editing

    Affiliation Artificial Intelligence Research Center (AIRC), National Institute of Advanced Industrial Science and Technology (AIST), Koto-ku, Tokyo, Japan

Abstract

Ligand binding site prediction is a crucial initial step in structure-based drug discovery. Although several methods have been proposed previously, including geometry-based and machine learning techniques, their accuracy is still considered insufficient. In this study, we introduce an approach that leverages a graph transformer neural network to rank the results of a geometry-based pocket detection method. We also created a larger training dataset than the conventionally used sc-PDB and investigated the correlation between dataset size and prediction performance. Our findings indicate that utilizing a graph transformer-based method alongside a larger training dataset could enhance the performance of ligand binding site prediction.

Introduction

The identification of the compound binding site of the target protein is the first step in structure-based drug design. In this context, “binding site prediction” is the task of predicting the binding site of a ligand or other compound on the target protein surface [1–4]. This is especially important when no experimental information is available about the binding site of the substrate or ligand. Binding site prediction is also important when targeting sites other than the original substrate binding site; discovering allosteric binding sites that modulate the activity of the target protein is an important task in drug discovery [5]. In those cases, binding site prediction is performed first, and then compounds that bind to the predicted binding sites are designed in a structure-based manner.

Due to the importance of the binding site prediction task, many studies have been carried out over several decades, and various methods have been investigated [1]. They can be roughly divided into five categories [6]: geometric [7, 8], energetic [9], conservation-based [10, 11], template-based [12, 13], and machine learning (ML)/knowledge-based methods [6, 14–17]. Recently, ML-based methods have been studied extensively. They can be further divided into two types. The first type comprises methods that predict a pocket directly from the target protein structure: descriptor features are computed for points near the target protein or on the protein surface, and these features are fed to an ML model that directly predicts whether the points can form a binding site. Examples include P2Rank [6] and DeepSite [14]. The second type comprises hybrids of rule-based and ML-based methods, in which candidate binding sites are first predicted by a rule-based method and then ranked by an ML score to select the true positive sites. This approach is based on the observation that the binding sites of ligands or small compounds are often located on concave surfaces of the protein, and it has been shown that rule-based methods using protein surface geometry can detect candidate binding sites with a high recall score [15]. However, because these candidates usually include many false positives, predicting the true positive sites among them is not an easy task. The program DeepPocket is an example of this type of method [15].

Alternatively, the ML-based methods can be divided into two types depending on the ML algorithm used, i.e., neural network (NN)-based and non-NN-based methods. The latter includes the programs P2Rank (and PRank), which use a random forest as the ML algorithm. NN-based methods, especially deep learning (DL)-based methods, have been studied extensively in recent years. For example, the programs DeepSite [14], Kalasanty [17], and DeepPocket [15] exploit a 3D convolutional neural network (CNN) to predict binding sites using a voxelized protein structure as input.

Recently, methods based on graph convolutional NNs (GCNs) with the property of roto-translation invariance (or equivariance), which can be generalized as transformer-based models [18–22], have been successfully exploited for protein structure prediction and design tasks [23–25], attracting attention in the field of computational structural biology. In this study, we applied such a roto-translation invariant NN to the binding site prediction problem. A graph transformer-based model was used in the ML part of the hybrid method, and its performance was compared with that of previous studies, including 3D CNN-based models. In addition, we measured the effect of the training dataset size by constructing a larger dataset than sc-PDB [26], which has been commonly used in previous studies.

Methods

Data preprocessing

To apply machine learning (ML) methods to the binding site detection problem, there are two possible settings: one is to predict the sites directly from the protein structure, and the other is to predict candidates using a rule-based method and then rank these candidates with an ML-based method. In this study, we adopted the latter, i.e., a hybrid of rule-based and ML methods. For the rule-based method, we used the program Fpocket (version 4.0), which detects concave surfaces using the geometry and physicochemical properties of the protein surface [7]. As shown in a previous study [15], Fpocket can predict possible ligand binding sites with a high recall score, i.e., its output usually contains the correct binding site, but it also contains many false-positive pockets. In this study, we aimed to predict the true positive pockets from the output of Fpocket with high accuracy.
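
As a concrete illustration, the following is a minimal sketch of this first stage, assuming the fpocket binary is available on the PATH (the helper name and paths are illustrative, not from the paper):

```python
import subprocess
from pathlib import Path

def run_fpocket(pdb_path: str) -> Path:
    """Run Fpocket on a cleaned PDB file (waters and bound ligands removed).

    Fpocket writes its results to a '<name>_out' directory next to the
    input file, including per-pocket atom and vertex (alpha-sphere) files.
    """
    subprocess.run(["fpocket", "-f", pdb_path], check=True)
    return Path(pdb_path).with_name(Path(pdb_path).stem + "_out")
```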

We preprocessed the protein structures to create the input data for the neural network (NN) as follows. First, to calculate possible pockets, Fpocket was run on the protein structures after the removal of water molecules, bound ligands, etc. Then, we selected the protein amino acid residues within 10 Å of any pocket vertex (alpha sphere) calculated by Fpocket as the pocket residues. The input graph $G = (V, E)$ was constructed from the pocket residues as follows: a node $v_i \in V$ is defined for each Cα atom $a_i$, where $i$ is the index of the pocket residue, and an edge $e_{ij} \in E$ is added between nodes $v_i$ and $v_j$ when the distance between atoms $a_i$ and $a_j$ ($i \neq j$) is within 25 Å. For the training dataset, ground truth labels were assigned using a commonly used criterion, DCA [27], which is defined as the minimum distance between the pocket barycenter and any ligand atom of the experimental structure. Specifically, pockets with DCA values of less than 4 Å were labeled as true samples, and the remainder as false samples. Note that all structures in the training dataset were selected to contain at least one bound ligand, to ensure that labels can be assigned using these bound ligands.
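
A minimal sketch of this graph construction and DCA labeling, assuming the Cα coordinates, residue types, pocket vertices, and ligand coordinates have already been parsed into NumPy arrays (all function and variable names are illustrative):

```python
import numpy as np

def build_pocket_graph(ca_coords, aa_types, vertex_coords,
                       residue_cutoff=10.0, edge_cutoff=25.0):
    """Nodes: Calpha atoms of residues within residue_cutoff of any pocket
    vertex (alpha sphere); edges: node pairs closer than edge_cutoff."""
    # Distances from every Calpha atom to every pocket vertex
    d_rv = np.linalg.norm(ca_coords[:, None] - vertex_coords[None, :], axis=-1)
    node_idx = np.where(d_rv.min(axis=1) < residue_cutoff)[0]

    coords = ca_coords[node_idx]
    # Pairwise Calpha-Calpha distances among the selected pocket residues
    d_nn = np.linalg.norm(coords[:, None] - coords[None, :], axis=-1)
    src, dst = np.where((d_nn < edge_cutoff) & (d_nn > 0.0))
    return aa_types[node_idx], coords, np.stack([src, dst], axis=1)

def dca_label(vertex_coords, ligand_coords, cutoff=4.0):
    """DCA: distance from the pocket barycenter to the closest ligand atom;
    the pocket is labeled as a true sample if DCA < 4 Angstroms."""
    center = vertex_coords.mean(axis=0)
    return np.linalg.norm(ligand_coords - center, axis=-1).min() < cutoff
```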

Model

Using the pocket residue data defined above, we constructed an NN model with a hidden dimension of $d$ to classify Fpocket’s predictions as true or false positive pockets. The node feature for node $v_i$ is calculated as follows:

$$h_i^{(0)} = E_{aa}^\top\, \mathrm{aa}(v_i),$$

where $E_{aa} \in \mathbb{R}^{21 \times d}$ is a weight matrix, and $\mathrm{aa}(\cdot)$ is a function that returns the one-hot representation ($\{0,1\}^{21}$) of the amino acid residue of the given node, covering the 20 naturally occurring amino acids and an unknown token. For the model using the SASA (solvent-accessible surface area) feature of the node, $h_i^{(0)}$ is calculated as follows:

$$h_{i,k}^{(0)} = \left[E_{aa}^\top\, \mathrm{aa}(v_i)\right]_k + \exp\!\left(-\gamma \left(s_i - \mu_k\right)^2\right),$$

where $k = [0, d)$ denotes the index of the feature vector, and $s_i$ is the SASA value of the residue corresponding to the node $v_i$. The hyperparameters of the SASA embedding ($\gamma$ and $\mu_k$) were determined so that the centers of the radial basis functions are equally spaced in the range of 0 to 350 Å². The upper limit of the SASA embedding (350 Å²) was determined based on the SASA distribution of the amino acid residues in the dataset (S1 Fig). The edge feature is calculated as follows:

$$e_{ij,k} = \exp\!\left(-\gamma \left(\lVert r_i - r_j \rVert - \mu_k\right)^2\right),$$

where $r_i$ is the Cartesian coordinates of the atom corresponding to the node $v_i$. The hyperparameters of the distance embedding ($\gamma$ and $\mu_k$) were determined so that the centers of the radial basis functions are equally spaced in the range of 3–25 Å.
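
The following PyTorch sketch illustrates this featurization; the RBF widths ($\gamma$) and the additive combination of the amino acid embedding with the SASA embedding are assumptions of this sketch, not values reported in the paper:

```python
import torch
import torch.nn as nn

class PocketFeatures(nn.Module):
    """Amino-acid embedding plus RBF embeddings of SASA and Calpha distances."""
    def __init__(self, d: int = 128):
        super().__init__()
        self.aa_embed = nn.Embedding(21, d)  # E_aa: 20 amino acids + unknown
        # RBF centers mu_k equally spaced over 0-350 A^2 (SASA), 3-25 A (distance)
        self.register_buffer("mu_sasa", torch.linspace(0.0, 350.0, d))
        self.register_buffer("mu_dist", torch.linspace(3.0, 25.0, d))
        # Widths chosen from the center spacing (an assumed convention)
        self.gamma_sasa = 1.0 / (2 * (350.0 / (d - 1)) ** 2)
        self.gamma_dist = 1.0 / (2 * (22.0 / (d - 1)) ** 2)

    def node_features(self, aa_idx, sasa):
        # h_i^(0) = E_aa^T aa(v_i) + exp(-gamma (s_i - mu_k)^2)
        rbf = torch.exp(-self.gamma_sasa * (sasa[:, None] - self.mu_sasa) ** 2)
        return self.aa_embed(aa_idx) + rbf

    def edge_features(self, coords, edges):
        # e_ij,k = exp(-gamma (||r_i - r_j|| - mu_k)^2)
        d_ij = (coords[edges[:, 0]] - coords[edges[:, 1]]).norm(dim=-1)
        return torch.exp(-self.gamma_dist * (d_ij[:, None] - self.mu_dist) ** 2)
```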

Using these input feature vectors ($h_i^{(0)}$ and $e_{ij}$), the node feature vectors ($h_i^{(l)}$, where $l = [0, L)$) were updated using the graph transformer [28] with a hidden dimension of $d$, defined as follows (Fig 1).

Fig 1. The ligand binding site prediction model proposed in this paper.

A) Schematic diagram of the prediction procedure. Pocket vertices are calculated by processing the input protein structure with Fpocket. The graph is generated from the Cα atoms within 10 Å of the pocket vertices and is then used as input to the NN to calculate the ligand-binding site score. B, C, D) Schematic diagrams of the graph transformer neural network. Green boxes represent the weight tensors for the query (Q), key (K), value (V), edge (E), and output (O) projections. Magenta boxes represent the graph transformer (GT) and multi-head attention (MHA) modules, respectively. The details of the GT and MHA modules are depicted in panels (C) and (D), respectively.


The query, key, and value projections of the node feature vectors were calculated using the weight matrices $W_Q^{(h)}, W_K^{(h)}, W_V^{(h)} \in \mathbb{R}^{d_h \times d}$, respectively, as in the original transformer paper [29] (Fig 1C):

$$q_i^{(h)} = W_Q^{(h)} h_i^{(l)}, \qquad k_i^{(h)} = W_K^{(h)} h_i^{(l)}, \qquad v_i^{(h)} = W_V^{(h)} h_i^{(l)}.$$

Note that $d_h$ is the dimension of a head, satisfying the relation $d_h H = d$, where $H$ is the number of heads.

The edge projection of the edge feature vector $e_{ij}$ was also calculated using the weight matrix $W_E^{(h)} \in \mathbb{R}^{d_h \times d}$ (Fig 1C):

$$\hat{e}_{ij}^{(h)} = W_E^{(h)} e_{ij}.$$

Using these key, query, and edge projections ($q_i^{(h)}$, $k_j^{(h)}$, and $\hat{e}_{ij}^{(h)}$), the attention was calculated as follows:

$$\alpha_{ij}^{(h)} = \mathrm{softmax}_j\!\left(\mathrm{clamp}\!\left(\frac{1}{\sqrt{d_h}} \sum_{k} q_{i,k}^{(h)}\, k_{j,k}^{(h)}\, \hat{e}_{ij,k}^{(h)},\ -5,\ 5\right)\right).$$

The argument of the above softmax function (the attention logit) was clamped between −5 and 5 for numerical stability. The attention was calculated only for the node pairs $(i, j)$ directly connected by an edge. Then, the output of the multi-head attention was calculated using the value projections and the attentions calculated above:

$$\hat{h}_i^{(h)} = \sum_{j \in \mathcal{N}(i)} \alpha_{ij}^{(h)}\, v_j^{(h)},$$

where $\mathcal{N}(i)$ denotes the set of nodes adjacent to node $v_i$.

The outputs of the heads were concatenated and passed to the feed-forward network and normalization layers, including skip connections, to calculate the final output of the graph transformer layer, $h_i^{(l+1)}$:

$$\hat{h}_i = \mathrm{Norm}\!\left(h_i^{(l)} + \mathrm{Drop}\!\left(\mathrm{Linear}\!\left(\mathrm{concat}_h\!\left(\hat{h}_i^{(h)}\right)\right)\right)\right),$$
$$h_i^{(l+1)} = \mathrm{Norm}\!\left(\hat{h}_i + \mathrm{Drop}\!\left(\mathrm{Linear}\!\left(\alpha\!\left(\mathrm{Linear}(\hat{h}_i)\right)\right)\right)\right),$$

where $h = [0, H)$ denotes the index of the heads, $\mathrm{Linear}(\cdot)$ is a fully connected linear layer, $\mathrm{softmax}_j(\cdot)$ the softmax function over the index $j$, $\mathrm{concat}_h(\cdot)$ the concatenation of vectors over the index $h$, $\mathrm{Drop}(\cdot)$ a dropout layer [30], $\alpha(\cdot)$ an activation function, and $\mathrm{Norm}(\cdot)$ a layer normalization layer [31]. We used the rectified linear unit (ReLU) as the activation function, i.e., $\alpha(x) = \max(x, 0)$. While the node features $h_i^{(l)}$ were updated by each layer as above, the edge features were not updated, since almost no performance gain was observed when they were. The graph transformer layers were stacked $L$ times to calculate the final feature vectors $h_i^{(L)}$ (Fig 1B). Then, the hidden vector $h$ for the pocket graph $G$ was calculated by pooling the hidden vectors $h_i^{(L)}$. We tried the pooling methods suggested in previous studies [28, 32, 33] and found that sum pooling, $h = \sum_i h_i^{(L)}$, yielded the best results in our current study.
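
A compact PyTorch sketch of one such layer, following the reconstruction above (the attention is computed densely and restricted to graph edges via a boolean mask; the implementation details are ours, not the authors’ code):

```python
import torch
import torch.nn as nn

class GraphTransformerLayer(nn.Module):
    """One graph transformer layer (after Dwivedi & Bresson [28])."""
    def __init__(self, d: int = 128, n_heads: int = 8, dropout: float = 0.1):
        super().__init__()
        assert d % n_heads == 0
        self.H, self.dh = n_heads, d // n_heads
        self.Wq = nn.Linear(d, d, bias=False)
        self.Wk = nn.Linear(d, d, bias=False)
        self.Wv = nn.Linear(d, d, bias=False)
        self.We = nn.Linear(d, d, bias=False)   # edge projection W_E
        self.Wo = nn.Linear(d, d)               # output projection
        self.ffn = nn.Sequential(nn.Linear(d, 2 * d), nn.ReLU(),
                                 nn.Linear(2 * d, d))
        self.norm1, self.norm2 = nn.LayerNorm(d), nn.LayerNorm(d)
        self.drop = nn.Dropout(dropout)

    def forward(self, x, e, adj):
        # x: (N, d) nodes; e: (N, N, d) edge features; adj: (N, N) bool mask.
        # Assumes every node has at least one neighbor (else softmax is NaN).
        N = x.size(0)
        q = self.Wq(x).view(N, self.H, self.dh)
        k = self.Wk(x).view(N, self.H, self.dh)
        v = self.Wv(x).view(N, self.H, self.dh)
        eh = self.We(e).view(N, N, self.H, self.dh)

        # Edge-modulated attention logits, clamped to [-5, 5] before softmax
        logits = torch.einsum("ihd,jhd,ijhd->ijh", q, k, eh) / self.dh ** 0.5
        logits = logits.clamp(-5.0, 5.0).masked_fill(~adj[..., None],
                                                     float("-inf"))
        att = logits.softmax(dim=1)                    # softmax over index j

        out = torch.einsum("ijh,jhd->ihd", att, v).reshape(N, -1)
        x = self.norm1(x + self.drop(self.Wo(out)))    # MHA + skip + norm
        return self.norm2(x + self.drop(self.ffn(x)))  # FFN + skip + norm
```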

Finally, the loss function $\mathcal{L}$ for the binary classification task is calculated as the binary cross-entropy:

$$\mathcal{L} = -\left[\, y^* \log \sigma(\phi(h)) + (1 - y^*) \log\!\left(1 - \sigma(\phi(h))\right) \right],$$

where $\phi$ is a multi-layer perceptron (MLP) with ReLU activation functions, $\sigma$ is the sigmoid function, and $y^* \in \{0, 1\}$ is the ground truth label of the corresponding pocket. Overall, the loss function $\mathcal{L}$ is E(3)-invariant, i.e., unchanged under translations and rotations of the input atom coordinates.
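
A sketch of the readout and loss on top of the stacked layers (the layer count and MLP width shown here are illustrative, not the paper’s values):

```python
import torch.nn.functional as F

class BindingSiteScorer(nn.Module):
    """Stacked graph transformer layers, sum pooling, and an MLP head."""
    def __init__(self, d: int = 128, L: int = 4, n_heads: int = 8):
        super().__init__()
        self.layers = nn.ModuleList(
            [GraphTransformerLayer(d, n_heads) for _ in range(L)])
        self.phi = nn.Sequential(nn.Linear(d, d), nn.ReLU(), nn.Linear(d, 1))

    def forward(self, x, e, adj):
        for layer in self.layers:
            x = layer(x, e, adj)
        h = x.sum(dim=0)           # sum pooling over the pocket graph nodes
        return self.phi(h)         # logit; sigma(logit) is the pocket score

# Binary cross-entropy on the sigmoid output, in numerically stable form:
# loss = F.binary_cross_entropy_with_logits(scorer(x, e, adj), y_true)
```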

Training

The loss function $\mathcal{L}$ defined above was minimized over the training dataset using the Adam optimizer [34]. Cosine annealing with a 25-epoch warm-up and a maximum learning rate of 2×10−4 was used to schedule the learning rate of the optimizer. The batch size was 128, and 300 epochs of training were performed in total. The model with the highest PR-AUC (area under the precision-recall curve) value on the validation dataset was saved as the best model during training.
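
A sketch of this training schedule, interpreting the 25-epoch warm-up as a linear ramp followed by cosine decay (this interpretation, and the helper names model, train_loader, compute_loss, and evaluate_pr_auc, are our assumptions):

```python
import math
import torch

def lr_lambda(epoch: int, warmup: int = 25, total: int = 300) -> float:
    """Linear warm-up over the first 25 epochs, then cosine decay to zero."""
    if epoch < warmup:
        return (epoch + 1) / warmup
    progress = (epoch - warmup) / (total - warmup)
    return 0.5 * (1.0 + math.cos(math.pi * progress))

optimizer = torch.optim.Adam(model.parameters(), lr=2e-4)  # max learning rate
scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)

best_pr_auc = 0.0
for epoch in range(300):
    for batch in train_loader:                 # batch size 128
        optimizer.zero_grad()
        compute_loss(model, batch).backward()  # the BCE loss defined above
        optimizer.step()
    scheduler.step()
    pr_auc = evaluate_pr_auc(model, valid_loader)
    if pr_auc > best_pr_auc:                   # keep the best checkpoint
        best_pr_auc = pr_auc
        torch.save(model.state_dict(), "best_model.pt")
```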

Data augmentation

Although the candidate pockets generated by the program Fpocket contain true positives, the majority of them are false positives, making the dataset highly imbalanced. If such a dataset is used as is for training, it is difficult to obtain models with high recall and precision. To mitigate this label imbalance and obtain a model with good performance, the true-label samples were oversampled to the number of false-label samples. However, simply repeating the true-label samples during training could result in overfitting. We therefore tried to suppress this overfitting by adding noise to the training data in the following ways (a code sketch follows the list):

  1. Adding normally distributed noise to the Cartesian coordinates of the atom positions ($r_i$) according to the following formula: $\tilde{r}_i = r_i + \epsilon$, where $\epsilon \sim \mathcal{N}(0, \sigma_{\mathrm{pos}}^2 I)$ and $\sigma_{\mathrm{pos}}$ is a hyperparameter that controls the strength of the positional noise.
  2. Randomly dropping/duplicating nodes by sampling $(1 - \sigma_{\mathrm{node}}) \cdot N_{\mathrm{orig}}$ nodes with replacement from the original $N_{\mathrm{orig}}$ nodes of the graph (node dropping), where $N_{\mathrm{orig}}$ is the number of nodes in the original graph and $\sigma_{\mathrm{node}}$ is a hyperparameter that controls the node dropping.
  3. Adding normally distributed noise to the SASA values according to the following formula: $\tilde{s}_i = s_i + \epsilon$, where $\epsilon \sim \mathcal{N}(0, \sigma_{\mathrm{SASA}}^2)$ and $\sigma_{\mathrm{SASA}}$ is a hyperparameter that controls the strength of the SASA noise.
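
A sketch of these three augmentations applied to one oversampled true-positive pocket (whether the SASA noise scale is absolute or relative is not stated in the text, so an absolute scale is assumed here):

```python
import numpy as np

rng = np.random.default_rng(0)

def augment_pocket(coords, sasa, aa_idx,
                   sigma_pos=0.5, sigma_node=0.03, sigma_sasa=0.3):
    """Apply the three noise augmentations to one pocket graph."""
    n = len(coords)
    # (2) Node dropping/duplication: resample (1 - sigma_node) * n node
    #     indices with replacement from the original n nodes.
    idx = rng.choice(n, size=max(1, round((1.0 - sigma_node) * n)),
                     replace=True)
    coords, sasa, aa_idx = coords[idx], sasa[idx], aa_idx[idx]
    # (1) Isotropic Gaussian noise on the Cartesian atom coordinates.
    coords = coords + rng.normal(0.0, sigma_pos, size=coords.shape)
    # (3) Gaussian noise on the SASA values (absolute scale assumed).
    sasa = sasa + rng.normal(0.0, sigma_sasa, size=sasa.shape)
    return coords, sasa, aa_idx
```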

Datasets

In this study, we created two datasets of different sizes to compare performance with respect to dataset size. The first is based on the sc-PDB v.2017 database [26], containing 16,247 PDB entries. Next, a test set was created to evaluate model performance. We used the union of the coach420 and holo4k datasets [6] as the test set, which has been commonly used in previous studies [6, 15]. To prevent leakage between the training/validation and test sets, proteins whose amino acid sequences have more than 50% sequence identity to those of the test set proteins were excluded from the training/validation set, following previous studies [15]. As a result, 7,710 PDB entries were removed from the original dataset, leaving 8,537 PDB entries in the training/validation set. This dataset was processed with the program Fpocket [7], resulting in 276,531 pocket candidates with about 7.1% positives. The samples were randomly split into five parts, and 5-fold cross-validation was performed.

The second dataset is based on the PoSSuM database [35]. Among the known ligand-binding sites deposited in the PoSSuM database, we extracted the entries that bind ligands appearing in the sc-PDB dataset. The resulting dataset contains 37,067 PDB entries. As in the case of the sc-PDB dataset, we used the union of the coach420 and holo4k datasets as the test set [6]. Entries with sequence identity greater than 50% to the test set proteins were removed from the training/validation set, leaving 22,599 PDB entries. This dataset was processed using Fpocket, resulting in 729,853 pocket candidates with about 6.3% positives. The samples were randomly split into five parts, and 5-fold cross-validation was performed.

Results

To evaluate performance on the imbalanced datasets, we used the PR-AUC value as the primary evaluation metric. We also computed ROC-AUC and the “Top-(n + i) success rate” for comparison with previous methods; the definition of the Top-(n + i) success rate follows the previous study [15]. For the evaluation of overall performance, we calculated the above metrics on the test set (i.e., the coach420 and/or holo4k datasets) to avoid leakage from the training dataset. The ensemble average of the outputs of the five models from the 5-fold cross-validation was used as the prediction.
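
For reference, a sketch of the PR-AUC computation and the Top-(n + i) success rate under the DCA-based definition of ref. [15] (the data layout and the per-ligand counting are our assumptions):

```python
import numpy as np
from sklearn.metrics import auc, precision_recall_curve

def pr_auc(y_true, y_score):
    """Area under the precision-recall curve (ROC-AUC is available as
    sklearn.metrics.roc_auc_score with the same arguments)."""
    precision, recall, _ = precision_recall_curve(y_true, y_score)
    return auc(recall, precision)

def top_n_plus_i_success(proteins, i=0, dca_cutoff=4.0):
    """Top-(n + i) success rate: for each protein with n ligands, a ligand
    counts as identified if at least one of the n + i top-scoring pockets
    lies within dca_cutoff (DCA) of it."""
    hits = total = 0
    for scores, dca in proteins:      # scores: (P,); dca: (P, n) DCA matrix
        n = dca.shape[1]
        top = np.argsort(scores)[::-1][: n + i]
        hits += int((dca[top] < dca_cutoff).any(axis=0).sum())
        total += n
    return hits / total
```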

First, we trained the model with baseline hyperparameters (Table 1) on the sc-PDB dataset and evaluated its performance. The resulting model has ~1.28 M parameters. As expected, training without data balancing, i.e., using the dataset as is (Unbal; Table 1), resulted in a model with poor performance. In particular, the loss on the validation set decreased only for the first few epochs and increased significantly thereafter (Fig 2A). Although the ROC-AUC reached a high value of about 0.93, the PR-AUC plateaued at around 0.6 (Fig 2B), indicating that the model’s ability to predict positive binding sites is not high. We then evaluated the trained model on the test dataset. As with the validation set, the PR-AUC value is not high, indicating poor performance in predicting positive binding sites (Table 2, Fig 3).

Fig 2. The learning curves for the models trained in this study.

A) Loss, B) PR-AUC, and C) ROC-AUC values for the validation dataset. The mean of the five models resulting from the cross-validation is plotted as a solid line, and the values for each model as transparent lines. Please refer to Table 1 for the abbreviations used in the figure legends.


Fig 3. Plots of precision-recall (PR) and receiver-operating-characteristic (ROC) curves for the test datasets (A: coach420, B: holo4k, and C: holo4k+coach420).


Table 1. Summary of the experiments performed in this work.


Table 2. Summary of the PR-AUC and ROC-AUC metrics for the test datasets.


Next, the model with the same hyperparameters was trained with label balancing (Bal; Table 1). The PR-AUC values for the validation dataset improved slightly during the training epochs, and the best PR-AUC value was slightly better (~0.62) than in the case without balancing (Fig 2B). However, similar to the case without balancing, the loss on the validation set tended to increase after the first few epochs (Fig 2A). Inference on the test dataset showed almost the same performance as without label balancing in terms of PR-AUC and success rate (Figs 3 and 4). The negative effect of overfitting to the repeated true-positive samples may outweigh the benefit of label balancing.

Fig 4. Plots of success rates (i = 0–6) for the test datasets (A: coach420, B: holo4k, and C: holo4k+coach420).


Next, the model with the same hyperparameters was trained with positional noise addition and node dropping in addition to label balancing (Bal+aug; Table 1). The hyperparameters controlling the noise amounts (σpos and σnode) were tuned, and we found that σpos = 0.5 and σnode = 0.03 gave the best results. The loss on the validation set gradually decreased over 300 epochs (Fig 2A), and the PR-AUC value increased to over 0.75. Accordingly, the PR-AUC value on the test dataset increased significantly (Fig 3, Table 2). The success rate also improved compared with those of other NN-based methods (Table 3). Data augmentation by noise addition appears to be effective in suppressing overfitting to the true-positive samples.

Table 3. Summary of the success rates for the test dataset including the results from previous studies.

The success rate values of the previous studies are taken from ref. [15].


In addition, the SASA features were added to the training dataset (Bal+aug+SA; Table 1), with noise also added to the SASA features as described in the Methods section. The noise amount for the SASA values was also tuned, and we found that σSASA = 0.3 gave the best results. The training results showed that the PR-AUC value for the validation dataset was significantly improved, reaching 0.8 (Fig 2B). Inference on the test dataset also showed improved performance in both PR-AUC and success rate (Figs 3 and 4, Table 2). These results seem reasonable, because previous studies have shown that SASA is an important feature for ligand binding site prediction [6].

Next, the PoSSuM dataset, which is about 2.6 times larger than the sc-PDB dataset, was used for training under the same conditions, including the model size, label balancing, noise addition, and SASA features (PoSSuM/M; Table 1). As a result, improvements were observed in the PR-AUC values for both the validation and test datasets (Figs 2B and 3) as well as in the success rate (Fig 4). We also trained a larger model with ~7.34 M parameters on the PoSSuM dataset (PoSSuM/L; Table 1). A slight improvement in PR-AUC for the validation and test datasets was observed (Figs 2B and 3); however, the increase in the success rate was marginal (Fig 4). The capacity of the baseline model is likely sufficient even for the PoSSuM dataset, so increasing the model size did not significantly improve performance.

Finally, when compared in terms of success rate, the best model outperformed the previous methods, including other NN-based methods such as DeepPocket (Table 3). In contrast, the ROC-AUC values (Table 2) did not improve significantly compared with DeepPocket (0.951 for holo4k); however, it should be noted that the ROC-AUC value itself is not an appropriate criterion for evaluating models trained on an imbalanced dataset. Examples of predictions by the best model (PoSSuM/L) and DeepPocket are shown in Fig 5. In these examples, DeepPocket failed to predict the correct site, whereas the PoSSuM/L model successfully predicted the correct site with a clear score contrast.

Fig 5. Examples of the ligand binding site prediction.

The prediction results for the protein structures of (A) SET7/9 lysine methyltransferase (PDB ID: 1N6A) and (B) influenza virus neuraminidase (PDB ID: 1IVE) in the test dataset by the model in this study (PoSSuM/L) and DeepPocket [15] are shown in the left and center panels, respectively. The pocket vertices calculated by Fpocket are shown as spheres colored from white to red according to the output values (0–1) of the NN model. The actual positions of the ligands in the crystal structures are shown in the right panels.


Discussion

In this study, we built a model for ligand binding site prediction using a combination of a rule-based method and a graph transformer-based NN model. We found that data augmentation of the graph structure, including the addition of noise to the atom positions and the random removal and/or duplication of graph nodes, is crucial for avoiding overfitting to the highly imbalanced training datasets. We also created a dataset based on the PoSSuM database [35] and examined the effects of increasing the model size and dataset size. The best model showed improved performance in terms of success rate compared with the methods in the previous literature, including other NN-based and rule-based methods (Table 3).

GCNs, including the graph transformer used in this study, have several advantages over the 3D CNNs used in previous research. First, 3D CNNs have several hyperparameters for the input features, such as the extent of the voxelized region and the voxel resolution, whereas GCNs do not require such hyperparameters because they take protein atoms directly as point clouds. Furthermore, the use of an E(3)- or SE(3)-invariant GCN eliminates the need for rotational and translational augmentation of the input data, thereby allowing a more efficient training process. Second, GCNs can utilize a variety of atom and residue features, including discrete and continuous values, as node properties. For example, it would even be possible to use the high-dimensional output vectors of protein language models [36] as the node features of specific amino acid residues. In 3D CNN-based models, input features (e.g., atom types) are encoded as different channels, so it would be difficult to efficiently use continuous or multi-dimensional values as input features. On the other hand, 3D CNN-based models also have advantages. For example, DeepPocket, one of the 3D CNN-based methods, contains a model that predicts the shape of the binding sites [15], in which the binding site shape is obtained by segmentation of the volume data using a U-Net-like model [37]. Although recent works have applied GCN-based models to image segmentation tasks [38], it may not be straightforward or performance-effective to apply GCN-based models to 3D voxel segmentation around the binding sites.

Concerning the dataset size, larger datasets were shown to contribute to better prediction performance in the current problem setting (Fig 3). The available databases such as sc-PDB [26] and PDBbind [39] include only curated PDB entries. The inclusion of a wide range of uncurated PDB entries bound to small molecules may enable the construction of larger datasets and contribute to improved prediction performance. Recently, the latest version of PoSSuM has provided a pocket database that includes predicted structures [40]. Data augmentation through the addition of such non-experimental structures may further improve prediction performance.

References

  1. Zhao J, Cao Y, Zhang L. Exploring the computational methods for protein-ligand binding site prediction. Comput Struct Biotechnol J. 2020;18: 417–426. pmid:32140203
  2. Konc J, Janežič D. Binding site comparison for function prediction and pharmaceutical discovery. Curr Opin Struct Biol. 2014;25: 34–39. pmid:24878342
  3. Zheng X, Gan L, Wang E, Wang J. Pocket-based drug design: exploring pocket space. AAPS J. 2013;15: 228–241. pmid:23180158
  4. Pérot S, Sperandio O, Miteva MA, Camproux A-C, Villoutreix BO. Druggable pockets and binding site centric chemical space: a paradigm shift in drug discovery. Drug Discov Today. 2010;15: 656–667. pmid:20685398
  5. Sheik Amamuddy O, Veldman W, Manyumwa C, Khairallah A, Agajanian S, Oluyemi O, et al. Integrated Computational Approaches and Tools for Allosteric Drug Discovery. Int J Mol Sci. 2020;21. pmid:32013012
  6. Krivák R, Hoksza D. P2Rank: machine learning based tool for rapid and accurate prediction of ligand binding sites from protein structure. J Cheminform. 2018;10: 39. pmid:30109435
  7. Le Guilloux V, Schmidtke P, Tuffery P. Fpocket: an open source platform for ligand pocket detection. BMC Bioinformatics. 2009;10: 168. pmid:19486540
  8. Kawabata T. Detection of multiscale pockets on protein surfaces using mathematical morphology. Proteins. 2010;78: 1195–1211. pmid:19938154
  9. Ngan C-H, Hall DR, Zerbe B, Grove LE, Kozakov D, Vajda S. FTSite: high accuracy detection of ligand binding sites on unbound protein structures. Bioinformatics. 2012;28: 286–287. pmid:22113084
  10. Tsujikawa H, Sato K, Wei C, Saad G, Sumikoshi K, Nakamura S, et al. Development of a protein-ligand-binding site prediction method based on interaction energy and sequence conservation. J Struct Funct Genomics. 2016;17: 39–49. pmid:27400687
  11. Capra JA, Laskowski RA, Thornton JM, Singh M, Funkhouser TA. Predicting protein ligand binding sites by combining evolutionary sequence conservation and 3D structure. PLoS Comput Biol. 2009;5: e1000585. pmid:19997483
  12. Gao J, Zhang Q, Liu M, Zhu L, Wu D, Cao Z, et al. bSiteFinder, an improved protein-binding sites prediction server based on structural alignment: more accurate and less time-consuming. J Cheminform. 2016;8: 38. pmid:27403208
  13. Hung LV, Caprari S, Bizai M, Toti D, Polticelli F. LIBRA: LIgand Binding site Recognition Application. Bioinformatics. 2015;31: 4020–4022. pmid:26315904
  14. Jiménez J, Doerr S, Martínez-Rosell G, Rose AS, De Fabritiis G. DeepSite: protein-binding site predictor using 3D-convolutional neural networks. Bioinformatics. 2017;33: 3036–3042. pmid:28575181
  15. Aggarwal R, Gupta A, Chelur V, Jawahar CV, Priyakumar UD. DeepPocket: Ligand Binding Site Detection and Segmentation using 3D Convolutional Neural Networks. J Chem Inf Model. 2022;62: 5069–5079. pmid:34374539
  16. Cui Y, Dong Q, Hong D, Wang X. Predicting protein-ligand binding residues with deep convolutional neural networks. BMC Bioinformatics. 2019;20: 93. pmid:30808287
  17. Stepniewska-Dziubinska MM, Zielenkiewicz P, Siedlecki P. Improving detection of protein-ligand binding sites with 3D segmentation. Sci Rep. 2020;10: 5035. pmid:32193447
  18. Khemani B, Patil S, Kotecha K, Tanwar S. A review of graph neural networks: concepts, architectures, techniques, challenges, datasets, applications, and future directions. Journal of Big Data. 2024;11: 1–43.
  19. Zhang S, Tong H, Xu J, Maciejewski R. Graph convolutional networks: a comprehensive review. Comput Soc Netw. 2019;6: 11. pmid:37915858
  20. Xu X, Zhao X, Wei M, Li Z. A comprehensive review of graph convolutional networks: approaches and applications. Electronic Research Archive. 2023;31: 4185–4215.
  21. Wu Z, Pan S, Chen F, Long G, Zhang C, Yu PS. A Comprehensive Survey on Graph Neural Networks. IEEE Trans Neural Netw Learn Syst. 2021;32: 4–24. pmid:32217482
  22. Min E, Chen R, Bian Y, Xu T, Zhao K, Huang W, et al. Transformer for Graphs: An Overview from Architecture Perspective. arXiv [cs.LG]. 2022. Available: http://arxiv.org/abs/2202.08455
  23. Jumper J, Evans R, Pritzel A, Green T, Figurnov M, Ronneberger O, et al. Highly accurate protein structure prediction with AlphaFold. Nature. 2021;596: 583–589. pmid:34265844
  24. Watson JL, Juergens D, Bennett NR, Trippe BL, Yim J, Eisenach HE, et al. De novo design of protein structure and function with RFdiffusion. Nature. 2023;620: 1089–1100. pmid:37433327
  25. Baek M, DiMaio F, Anishchenko I, Dauparas J, Ovchinnikov S, Lee GR, et al. Accurate prediction of protein structures and interactions using a three-track neural network. Science. 2021;373: 871–876. pmid:34282049
  26. Desaphy J, Bret G, Rognan D, Kellenberger E. sc-PDB: a 3D-database of ligandable binding sites—10 years on. Nucleic Acids Res. 2015;43: D399–404. pmid:25300483
  27. Chen K, Mizianty MJ, Gao J, Kurgan L. A critical comparative assessment of predictions of protein-binding sites for biologically relevant organic compounds. Structure. 2011;19: 613–621. pmid:21565696
  28. Dwivedi VP, Bresson X. A generalization of transformer networks to graphs. arXiv [cs.LG]. 2020. Available: https://github.com/graphdeeplearning/graphtransformer
  29. Vaswani A, Shazeer NM, Parmar N, Uszkoreit J, Jones L, Gomez AN, et al. Attention is All you Need. Adv Neural Inf Process Syst. 2017; 5998–6008.
  30. Hinton GE, Srivastava N, Krizhevsky A, Sutskever I, Salakhutdinov RR. Improving neural networks by preventing co-adaptation of feature detectors. arXiv [cs.NE]. 2012. Available: http://arxiv.org/abs/1207.0580
  31. Ba JL, Kiros JR, Hinton GE. Layer Normalization. arXiv [stat.ML]. 2016. Available: http://arxiv.org/abs/1607.06450
  32. Gilmer J, Schoenholz SS, Riley PF, Vinyals O, Dahl GE. Neural Message Passing for Quantum Chemistry. arXiv [cs.LG]. 2017. Available: http://arxiv.org/abs/1704.01212
  33. Duvenaud D, Maclaurin D, Aguilera-Iparraguirre J, Gómez-Bombarelli R, Hirzel T, Aspuru-Guzik A, et al. Convolutional Networks on Graphs for Learning Molecular Fingerprints. arXiv [cs.LG]. 2015. Available: http://arxiv.org/abs/1509.09292
  34. Kingma DP, Ba J. Adam: A Method for Stochastic Optimization. arXiv [cs.LG]. 2014. Available: http://arxiv.org/abs/1412.6980
  35. Ito J-I, Ikeda K, Yamada K, Mizuguchi K, Tomii K. PoSSuM v.2.0: data update and a new function for investigating ligand analogs and target proteins of small-molecule drugs. Nucleic Acids Res. 2015;43: D392–8. pmid:25404129
  36. Rives A, Meier J, Sercu T, Goyal S, Lin Z, Liu J, et al. Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences. Proc Natl Acad Sci U S A. 2021;118. pmid:33876751
  37. Ronneberger O, Fischer P, Brox T. U-Net: Convolutional Networks for Biomedical Image Segmentation. arXiv [cs.CV]. 2015. Available: http://arxiv.org/abs/1505.04597
  38. Aflalo A, Bagon S, Kashti T, Eldar Y. DeepCut: Unsupervised Segmentation using Graph Neural Networks Clustering. arXiv [cs.CV]. 2022. Available: http://arxiv.org/abs/2212.05853
  39. Wang R, Fang X, Lu Y, Yang C-Y, Wang S. The PDBbind database: methodologies and updates. J Med Chem. 2005;48: 4111–4119. pmid:15943484
  40. Tsuchiya Y, Yonezawa T, Yamamori Y, Inoura H, Osawa M, Ikeda K, et al. PoSSuM v.3: A Major Expansion of the PoSSuM Database for Finding Similar Binding Sites of Proteins. J Chem Inf Model. 2023;63: 7578–7587. pmid:38016694