Fig 1.
Voxelization of ligand-binding pockets.
(A) Starting from a ligand-protein complex, a sphere centered on the geometric center of the ligand is created and filled with a 3D grid. Grid points that (B) overlap with the protein, (C) are too far away from the protein, or (D) are disconnected from the main grid structure are removed; these points are shown in red in B-D. Subsequently, (E) statistical potentials for ligand-protein interactions are calculated at each grid point, and (F) the principal axes of the pocket are aligned with the Cartesian axes. (G) The resulting voxel representation of the ligand-binding pocket is used as the input for deep learning.
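Steps A-D and F can be sketched in NumPy as follows. This is a minimal sketch, not the DeepDrug3D implementation: the sphere radius, grid spacing, and both distance cutoffs are hypothetical values; the "main grid structure" is assumed to be the largest 6-connected component; principal-axis alignment is done here with an SVD of the centered coordinates (a standard PCA choice, assumed rather than taken from the source); and step E (statistical potentials) is omitted.

```python
from collections import deque

import numpy as np


def nearest_atom_distance(points, atoms, chunk=1024):
    """Distance from each grid point to its closest protein atom (chunked to bound memory)."""
    out = np.empty(len(points))
    for s in range(0, len(points), chunk):
        diff = points[s:s + chunk, None, :] - atoms[None, :, :]
        out[s:s + chunk] = np.sqrt((diff ** 2).sum(axis=-1)).min(axis=1)
    return out


def largest_component(idx):
    """Row indices of the largest 6-connected component of integer grid coordinates."""
    lookup = {tuple(int(v) for v in p): i for i, p in enumerate(idx)}
    steps = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    seen = np.zeros(len(idx), dtype=bool)
    best = []
    for start in range(len(idx)):
        if seen[start]:
            continue
        seen[start] = True
        comp, queue = [], deque([start])
        while queue:  # breadth-first flood fill over face-adjacent grid points
            i = queue.popleft()
            comp.append(i)
            x, y, z = (int(v) for v in idx[i])
            for dx, dy, dz in steps:
                j = lookup.get((x + dx, y + dy, z + dz))
                if j is not None and not seen[j]:
                    seen[j] = True
                    queue.append(j)
        if len(comp) > len(best):
            best = comp
    return np.array(sorted(best), dtype=int)


def build_pocket_grid(ligand_xyz, protein_xyz, radius=15.0, spacing=1.0,
                      min_dist=2.0, max_dist=8.0):
    """Hypothetical parameters; returns pocket grid points in their principal-axis frame."""
    center = ligand_xyz.mean(axis=0)                       # (A) geometric center of the ligand
    n = int(radius // spacing)
    ticks = np.arange(-n, n + 1)
    idx = np.stack(np.meshgrid(ticks, ticks, ticks, indexing="ij"), axis=-1).reshape(-1, 3)
    idx = idx[(idx ** 2).sum(axis=1) * spacing ** 2 <= radius ** 2]  # clip cube to sphere
    d = nearest_atom_distance(center + idx * spacing, protein_xyz)
    idx = idx[(d >= min_dist) & (d <= max_dist)]           # (B) overlapping, (C) too distant
    idx = idx[largest_component(idx)]                      # (D) disconnected fragments
    centered = center + idx * spacing
    centered = centered - centered.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt.T                                 # (F) principal axes -> x, y, z
```

After step F, the returned coordinates have zero mean and uncorrelated axes ordered by decreasing variance; they would still need to be binned into voxels and annotated with potentials before being passed to a network.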
Fig 2.
Structure of a convolutional neural network in DeepDrug3D.
The network consists of (A) an input voxel grid, followed by (B) two convolutional layers with leaky ReLU activation functions and (C) a series of dropout, pooling, fully connected, and softmax layers.
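To make the data flow through layers A-C concrete, here is a NumPy-only forward pass of a network with this shape. It is a sketch under stated assumptions, not the trained DeepDrug3D model: the channel count, filter count, kernel size, grid size, leaky ReLU slope, pooling size, and number of classes are all hypothetical toy values, weights are random, and dropout is omitted because it acts as the identity at inference time. The convolution is written with `sliding_window_view` and `einsum` for clarity rather than speed.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def leaky_relu(x, alpha=0.1):  # alpha is a hypothetical slope
    return np.where(x > 0, x, alpha * x)


def conv3d_valid(x, w, b):
    """x: (C_in, D, H, W); w: (C_out, C_in, k, k, k); b: (C_out,). Stride 1, no padding."""
    k = w.shape[-1]
    win = sliding_window_view(x, (k, k, k), axis=(1, 2, 3))  # (C_in, D', H', W', k, k, k)
    return np.einsum("cxyzijk,ocijk->oxyz", win, w) + b[:, None, None, None]


def max_pool3d(x, p=2):
    """Non-overlapping p*p*p max pooling; trailing cells that do not fill a window are cropped."""
    c, d, h, w = x.shape
    x = x[:, : d - d % p, : h - h % p, : w - w % p]
    return x.reshape(c, d // p, p, h // p, p, w // p, p).max(axis=(2, 4, 6))


def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()


def init_params(rng, c_in=4, n_filters=8, k=3, grid=10, n_classes=2):
    """Random weights for a toy network; every size here is an assumption."""
    s = (grid - 2 * (k - 1)) // 2  # spatial size after two valid convs and one pool
    return {
        "conv1": (rng.normal(0, 0.1, (n_filters, c_in, k, k, k)), np.zeros(n_filters)),
        "conv2": (rng.normal(0, 0.1, (n_filters, n_filters, k, k, k)), np.zeros(n_filters)),
        "fc_w": rng.normal(0, 0.1, (n_classes, n_filters * s ** 3)),
        "fc_b": np.zeros(n_classes),
    }


def forward(voxel, params):
    h = leaky_relu(conv3d_valid(voxel, *params["conv1"]))  # (B) first convolutional layer
    h = leaky_relu(conv3d_valid(h, *params["conv2"]))      # (B) second convolutional layer
    h = max_pool3d(h).ravel()                              # (C) pooling, then flatten
    return softmax(params["fc_w"] @ h + params["fc_b"])    # (C) fully connected + softmax


rng = np.random.default_rng(0)
probs = forward(rng.normal(size=(4, 10, 10, 10)), init_params(rng))
```

The output `probs` is a vector of class probabilities that sums to one; in a trained model the largest entry would be the predicted pocket class.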
Fig 3.
An example of class-activation map (CAM) grids within ligand-binding sites.
ATP bound to the human TRPV4 ankyrin repeat domain is shown as sticks colored by atom type (C–green, O–red, N–blue, and P–orange). Selected grid points are represented by spheres whose size and color depend on the assigned CAM value according to the scale in the top left corner. Two residues are shown: R248 (orange), which forms a hydrogen bond with the ribose moiety of ATP, and the more distant residue F272 (purple). The dotted black line represents the hydrogen bond, whereas dotted red lines mark the distances between the protein residues and the highest-scoring grid point.
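The generic class-activation map technique weights each feature map of the last convolutional layer by its connection weight to the class of interest and sums them, $M_c(x,y,z) = \sum_k w_k^c f_k(x,y,z)$. The sketch below applies that formula, rescales the map to [0, 1], and measures the distances from residues to the highest-scoring grid point, as drawn with red lines in the figure. It assumes the CAM grid and the coordinate grid already share the same shape and that each residue is represented by a single point; whether DeepDrug3D uses exactly this CAM variant is not stated here, so treat it as an illustration of the standard method.

```python
import numpy as np


def class_activation_map(features, class_weights):
    """features: (K, X, Y, Z) last-conv activations; class_weights: (K,) weights of one class."""
    cam = np.tensordot(class_weights, features, axes=(0, 0))  # weighted sum -> (X, Y, Z)
    cam = cam - cam.min()
    peak = cam.max()
    return cam / peak if peak > 0 else cam                    # rescale to [0, 1]


def distance_to_peak(cam, grid_xyz, residue_xyz):
    """grid_xyz: (X, Y, Z, 3) coordinates of CAM cells; residue_xyz: (R, 3) one point per residue."""
    peak_xyz = grid_xyz[np.unravel_index(np.argmax(cam), cam.shape)]
    return np.linalg.norm(residue_xyz - peak_xyz, axis=1)


feats = np.zeros((2, 3, 3, 3))
feats[0, 2, 1, 0] = 1.0
feats[1, 0, 0, 0] = 0.5
cam = class_activation_map(feats, np.array([2.0, 1.0]))
grid = np.stack(np.meshgrid(*(np.arange(3.0),) * 3, indexing="ij"), axis=-1)
dists = distance_to_peak(cam, grid, np.array([[2.0, 1.0, 0.0], [2.0, 1.0, 3.0]]))
```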
Fig 4.
ROC plots evaluating the performance of various algorithms to classify ligand-binding sites.
DeepDrug3D is compared to volume- and shape-based approaches, as well as to a classifier employing the histogram of oriented gradients with principal component analysis (HOG/PCA), for (A) nucleotide- and (B) heme-binding pockets. The x-axis shows the false positive rate (FPR) and the y-axis shows the true positive rate (TPR). The gray area represents a random prediction.
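An ROC curve like those in the panels is obtained by sweeping a decision threshold over the classifier scores from highest to lowest and recording (FPR, TPR) at each distinct score; the area under it (AUC) is then the trapezoidal integral, and a random predictor gives the diagonal with AUC = 0.5. The following minimal sketch in NumPy groups tied scores into a single threshold; it is a generic implementation, not code from the paper.

```python
import numpy as np


def roc_curve(labels, scores):
    """Return FPR and TPR arrays obtained by lowering the threshold one distinct score at a time."""
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(-scores, kind="mergesort")
    labels, scores = labels[order], scores[order]
    # last position of each run of tied scores = one threshold per distinct score
    distinct = np.r_[np.nonzero(np.diff(scores))[0], len(scores) - 1]
    tps = np.cumsum(labels)[distinct]
    fps = (distinct + 1) - tps
    tpr = np.r_[0.0, tps / labels.sum()]
    fpr = np.r_[0.0, fps / (~labels).sum()]
    return fpr, tpr


def auc(fpr, tpr):
    """Trapezoidal area under the ROC curve."""
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))


fpr, tpr = roc_curve([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1])
```

For this toy input the curve passes through (0, 0.5), (0.5, 0.5), and (0.5, 1), giving an AUC of 0.75.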
Table 1.
Performance of various algorithms to classify nucleotide-binding sites.
DeepDrug3D is compared to volume- and shape-based approaches, a classifier employing the histogram of oriented gradients with principal component analysis (HOG/PCA), pocket matching with G-LoSA, molecular docking with Vina, and sequence signature detection with ScanProsite. Performance is assessed using the accuracy (ACC), precision (PPV), sensitivity (TPR), specificity (TNR), and the area under the ROC curve (AUC).
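The four threshold-dependent metrics in the table follow directly from the confusion-matrix counts of true/false positives and negatives. The pure-Python sketch below uses the standard definitions; the function name and the choice to return NaN when a denominator is zero are assumptions for illustration.

```python
def classification_metrics(y_true, y_pred):
    """Standard binary metrics from true and predicted labels (truthy = positive class)."""
    pairs = list(zip(map(bool, y_true), map(bool, y_pred)))
    tp = sum(t and p for t, p in pairs)
    tn = sum(not t and not p for t, p in pairs)
    fp = sum(p and not t for t, p in pairs)
    fn = sum(t and not p for t, p in pairs)

    def ratio(num, den):
        return num / den if den else float("nan")

    return {
        "ACC": ratio(tp + tn, len(pairs)),  # accuracy
        "PPV": ratio(tp, tp + fp),          # precision
        "TPR": ratio(tp, tp + fn),          # sensitivity / recall
        "TNR": ratio(tn, tn + fp),          # specificity
    }


m = classification_metrics([1, 1, 1, 0, 0, 0, 0, 0], [1, 1, 0, 1, 0, 0, 0, 0])
```

With 2 true positives, 1 false negative, 1 false positive, and 4 true negatives, this gives ACC = 0.75, PPV = TPR = 2/3, and TNR = 0.8.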
Table 2.
Performance of various algorithms to classify heme-binding sites.
DeepDrug3D is compared to volume- and shape-based approaches, a classifier employing the histogram of oriented gradients with principal component analysis (HOG/PCA), pocket matching with G-LoSA, molecular docking with Vina, and sequence signature detection with ScanProsite. Performance is assessed using the accuracy (ACC), precision (PPV), sensitivity (TPR), specificity (TNR), and the area under the ROC curve (AUC).
Fig 5.
ROC plots evaluating the performance of DeepDrug3D and other methods to classify ligand-binding sites.
DeepDrug3D is compared to pocket matching with G-LoSA, molecular docking with Vina, and sequence signature detection with ScanProsite for (A) nucleotide- and (B) heme-binding pockets. The x-axis shows the false positive rate (FPR) and the y-axis shows the true positive rate (TPR). The gray area represents a random prediction.
Fig 6.
ROC plots evaluating the performance of DeepDrug3D on the peptidase dataset.
The performance is assessed individually for five groups of enzymes: serine endopeptidases (EC 3.4.21), cysteine endopeptidases (EC 3.4.22), aspartic endopeptidases (EC 3.4.23), metalloendopeptidases (EC 3.4.24), and threonine endopeptidases (EC 3.4.25). The x-axis shows the false positive rate (FPR) and the y-axis shows the true positive rate (TPR). The gray area represents a random prediction.
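One common way to evaluate each enzyme group separately is one-vs-rest: for each group, pockets of that group are positives and all other pockets are negatives. The sketch below assumes that setup and a hypothetical per-group score matrix; the caption does not specify how the per-group curves were produced. It computes AUC through its rank interpretation (the Mann-Whitney statistic): the probability that a randomly chosen positive scores higher than a randomly chosen negative, with ties counted as one half.

```python
import numpy as np


def rank_auc(is_positive, scores):
    """AUC = P(score of random positive > score of random negative); ties count 0.5."""
    is_positive = np.asarray(is_positive, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[is_positive], scores[~is_positive]
    if len(pos) == 0 or len(neg) == 0:
        return float("nan")
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return float((greater + 0.5 * ties) / (len(pos) * len(neg)))


def one_vs_rest_auc(true_groups, group_scores, groups):
    """true_groups: (N,) labels such as 'EC 3.4.21'; group_scores: (N, G) hypothetical scores."""
    true_groups = np.asarray(true_groups)
    group_scores = np.asarray(group_scores, dtype=float)
    return {g: rank_auc(true_groups == g, group_scores[:, j]) for j, g in enumerate(groups)}


aucs = one_vs_rest_auc(
    ["EC 3.4.21", "EC 3.4.21", "EC 3.4.22"],
    [[0.9, 0.1], [0.6, 0.4], [0.2, 0.8]],
    ["EC 3.4.21", "EC 3.4.22"],
)
```

The rank form gives the same value as integrating the threshold-sweep ROC curve, which makes it a convenient cross-check when plotting curves per group.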
Fig 7.
Distribution of class-activation map (CAM) scores for specific ligand-protein interactions.
Three interaction types reported by the LPC program, namely hydrogen bonds, aromatic contacts, and hydrophobic contacts, are analyzed for (A) nucleotide- and (B) heme-binding pockets. For each interaction type, grid points are divided into two groups: points in close proximity to residues forming that contact, and the remaining points, which are closer to residues not forming it. The last pair of violins shows grid points randomly assigned to two groups, irrespective of any ligand-protein interactions. Horizontal blue lines represent median values, and whiskers extend to the most extreme non-outlier data points.
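The grouping behind each pair of violins can be sketched as a nearest-residue assignment: each grid point joins the "contact" group if its closest residue forms the interaction of interest, and the "other" group otherwise. In this sketch each residue is reduced to one representative point, the per-residue contact flags stand in for LPC output, and the random control keeps the same group sizes as the real split; all three choices are assumptions made for illustration.

```python
import numpy as np


def split_by_contact(grid_xyz, cam, residue_xyz, forms_contact, rng=None):
    """grid_xyz: (N, 3); cam: (N,) CAM scores; residue_xyz: (R, 3) one point per residue;
    forms_contact: (R,) bool, True if the residue forms the contact type (e.g. from LPC)."""
    cam = np.asarray(cam, dtype=float)
    d = np.linalg.norm(grid_xyz[:, None, :] - residue_xyz[None, :, :], axis=-1)
    near_contact = np.asarray(forms_contact, dtype=bool)[d.argmin(axis=1)]  # nearest residue
    groups = {"contact": cam[near_contact], "other": cam[~near_contact]}
    if rng is not None:
        # control: random membership, same group sizes as the real split
        perm = rng.permutation(len(cam))
        k = int(near_contact.sum())
        groups["random_a"], groups["random_b"] = cam[perm[:k]], cam[perm[k:]]
    return groups


groups = split_by_contact(
    np.array([[0.0, 0.0, 0.0], [0.5, 0.0, 0.0], [10.0, 0.0, 0.0]]),
    np.array([0.9, 0.8, 0.1]),
    np.array([[0.0, 0.0, 0.0], [10.0, 0.0, 0.0]]),
    np.array([True, False]),
    rng=np.random.default_rng(0),
)
medians = {name: float(np.median(values)) for name, values in groups.items()}
```

Comparing the medians of the "contact" and "other" groups against those of the random pair indicates whether high CAM scores concentrate near residues forming a given interaction type.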
Table 3.
Percentage of binding residues forming specific interactions with ligands.
Binding residues are identified by the class-activation map analysis as part of the highly discriminative regions of nucleotide- and heme-binding pockets. Three types of interactions reported by the LPC program are considered: hydrogen bonds, aromatic contacts, and hydrophobic contacts.
Fig 8.
Two examples of accurately classified ligand-binding pockets.
(A and B) A GDP-binding protein, the signal recognition particle receptor FtsY from E. coli, and (C and D) a heme-binding protein, the C-terminal domain of the S. enterica PduO protein. (A and C) Experimental complex structures and (B and D) close-ups of the binding sites with high-scoring class-activation map (CAM) grid points. GDP and heme are shown as sticks colored by atom type (C–green, O–red, N–blue, P and Fe–orange), whereas grid points are represented by spheres whose size and color depend on CAM values according to the scale shown in Fig 3. Hydrogen bonds are indicated by dashed black lines. Selected binding residues are labeled and colored by interaction type (hydrogen bond–orange, aromatic–gray, hydrophobic–blue, aromatic and hydrophobic–cyan, and hydrogen bond, aromatic, and hydrophobic–magenta).