Figure 1.
The concept of multiple binding sites on a single protein, visualized schematically (A) and in Protein Data Bank (PDB) structure 1JSU (B).
The ATP binding site, commonly referred to as the orthosteric binding site, was shown in green on cyclin-dependent kinase 2 (CDK2; grey). One allosteric binding site (type V inhibitors), located close to the orthosteric site, was shown in red. Also shown were a non-allosteric inhibitor (green, projected from PDB 1HCK) and an allosteric inhibitor (red, projected from PDB 3PY1). Finally, cyclin-A (light grey) was shown paired with CDK2 together with the natural inhibitor CDKN1B (dark green), to illustrate the potential of allosteric inhibitors to disrupt the CDK2/cyclin-A protein-protein interaction.
Table 1.
Data set composition.
Figure 2.
Distribution of retrieved allosteric and non-allosteric publications sorted per year.
Overall, allosteric records made up a small fraction of the total records in ChEMBL-14; however, a slight upward trend over the years was observed. Note that the y-axis is logarithmic.
Figure 3.
L2 target class distribution of both the allosteric (A) and non-allosteric data (B) sets.
The distribution of target classes differed between the two sets, suggesting that targets that are easy to hit with non-allosteric inhibitors are not necessarily easy to hit with allosteric modulators, and vice versa. Abbreviations: 7TM1 - Class A GPCRs, 7TM2 - Class B GPCRs, 7TM3 - Class C GPCRs, IP3 - Inositol trisphosphate receptors, KIR - Killer-cell Immunoglobulin-like Receptors, LGIC - Ligand Gated Ion Channels, RYR - Ryanodine Receptors, SUR - Sulfonylurea Receptors, TRP - Transient receptor potential channels, VGC - Voltage Gated Ion Channels.
Figure 4.
(A) Scatter plots showing the molecular weight (x), LogD (y) and adherence to the rule of 5 (color) of allosteric and non-allosteric compounds.
The allosteric compounds occupied a subset of the property space of the non-allosteric ligands; this pattern held for most target classes. (B) Scatter plots showing the molecular polar surface area fraction (x), solubility (y) and activity (color; pKi, pKd, pIC50, pEC50). The region of high activity was narrower in the allosteric set than in the non-allosteric set: non-allosteric compounds displayed high affinity across a broader range of both properties.
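Panel A colors compounds by adherence to the rule of 5. As an illustrative sketch only (not the authors' pipeline), the check below counts violations of Lipinski's four published criteria (molecular weight > 500 Da, logP > 5, more than 5 hydrogen bond donors, more than 10 acceptors) on precomputed descriptors. The `Compound` container and its field names are hypothetical; the caption plots LogD, while Lipinski's original criterion uses logP; and treating "at most one violation" as adherence is an assumption, since the caption does not say how the coloring was defined.

```python
from dataclasses import dataclass


@dataclass
class Compound:
    """Hypothetical container for precomputed descriptors of one ligand."""
    name: str
    mol_weight: float  # molecular weight in Da
    logp: float        # calculated octanol/water logP (not LogD)
    h_donors: int      # number of hydrogen bond donors
    h_acceptors: int   # number of hydrogen bond acceptors


def ro5_violations(c: Compound) -> int:
    """Count how many of Lipinski's four rule-of-five criteria are violated."""
    return sum([
        c.mol_weight > 500,
        c.logp > 5,
        c.h_donors > 5,
        c.h_acceptors > 10,
    ])


def passes_ro5(c: Compound, max_violations: int = 1) -> bool:
    """Assumed definition of adherence: at most `max_violations` violations."""
    return ro5_violations(c) <= max_violations


if __name__ == "__main__":
    ligands = [
        Compound("small_ligand", 180.2, 1.2, 1, 4),   # hypothetical values
        Compound("large_ligand", 650.0, 6.1, 3, 9),   # hypothetical values
    ]
    for lig in ligands:
        print(lig.name, ro5_violations(lig), passes_ro5(lig))
```

With these made-up descriptors the first ligand has no violations and the second has two, so only the first counts as compliant.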
Figure 5.
Mean value (and standard deviation) of several physicochemical properties calculated for both allosteric and non-allosteric ligands of Class B GPCRs.
To plot all properties within one order of magnitude, some properties were scaled by dividing the mean value by 10 (e.g. logP) or by 1000 (e.g. molecular weight). Differences occurred mainly for size-related properties (e.g. molecular weight, number of chains, number of hydrogen bond acceptors), whereas properties not correlated with size showed smaller differences (e.g. fraction of carbon). Note that the allosteric compounds were more rigid (higher fraction of sp2-hybridized carbons, higher fraction of aromatic bonds, higher rigidity index). For the full figure, see supporting Figure S3.
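The scaling described in this caption can be sketched as follows, assuming per-property lists of values. The divisor table only mirrors the two examples given (logP by 10, molecular weight by 1000); the property names are hypothetical, and dividing the standard deviation by the same factor as the mean is an assumption made so that error bars stay consistent with the scaled bars.

```python
import statistics

# Hypothetical divisors: the caption only states that some properties were
# divided by 10 (e.g. logP) or by 1000 (e.g. molecular weight).
DIVISORS = {"logP": 10.0, "mol_weight": 1000.0}


def scaled_summary(values_by_property, divisors=DIVISORS):
    """Return {property: (mean, SD)}, both divided by the property's divisor.

    Properties without an entry in `divisors` are left unscaled (divisor 1).
    Scaling the SD by the same factor as the mean is an assumption.
    """
    summary = {}
    for prop, values in values_by_property.items():
        d = divisors.get(prop, 1.0)
        mean = statistics.mean(values)
        # Sample standard deviation needs at least two values.
        sd = statistics.stdev(values) if len(values) > 1 else 0.0
        summary[prop] = (mean / d, sd / d)
    return summary


if __name__ == "__main__":
    allosteric = {  # hypothetical descriptor values
        "mol_weight": [400.0, 500.0, 600.0],
        "logP": [2.0, 3.0, 4.0],
        "frac_sp2_carbon": [0.5, 0.6, 0.7],
    }
    for prop, (m, s) in scaled_summary(allosteric).items():
        print(f"{prop}: {m:.2f} +/- {s:.2f}")
```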
Table 2.
Bioactivity measurements for allosteric and non-allosteric compounds.
Figure 6.
(A) Receiver operating characteristic (ROC) curve for out-of-bag validation of the allosteric classifier trained on 70% of the combined allosteric and balanced orthosteric set, demonstrating good performance.
(B) External validation on the remaining 30% of the data set confirmed good predictive performance.
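Out-of-bag validation is specific to bagging ensembles such as random forests, but the caption does not name the learner. The sketch below therefore takes classifier scores as given (hypothetical values) and illustrates only two steps: a 70/30 split, made stratified per class as an assumption, and the construction of the ROC curve and its area by the trapezoidal rule.

```python
import random


def stratified_split(labels, train_frac=0.7, seed=0):
    """Return (train_idx, test_idx), splitting each class separately."""
    rng = random.Random(seed)
    train, test = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        cut = round(train_frac * len(idx))
        train += idx[:cut]
        test += idx[cut:]
    return sorted(train), sorted(test)


def roc_points(labels, scores):
    """(FPR, TPR) points obtained by lowering the threshold over the scores.

    labels: 1 = allosteric (positive class), 0 = non-allosteric (negative).
    Tied scores are processed together, producing a diagonal segment.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("ROC needs both positive and negative examples")
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    tp = fp = 0
    points = [(0.0, 0.0)]
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve using the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


if __name__ == "__main__":
    labels = [1, 1, 0, 1, 0, 0]
    scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]  # hypothetical classifier outputs
    print(round(auc(roc_points(labels, scores)), 3))  # prints 0.889
```

Here 8 of the 9 positive/negative pairs are ranked correctly, which gives an AUC of 8/9.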
Table 3.
Examples of allosteric models for balanced data sets at the L0, L1 and L2 target-class levels.
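The caption does not say how the data sets were balanced. One common approach, shown here purely as an assumption, is random undersampling of the larger class within each target class at a chosen level. The record fields `allosteric` and `L0`/`L1`/`L2` are hypothetical names.

```python
import random
from collections import defaultdict


def undersample(pos, neg, rng):
    """Randomly shrink the larger of two lists to the size of the smaller one."""
    n = min(len(pos), len(neg))
    pos_b = rng.sample(pos, n) if len(pos) > n else list(pos)
    neg_b = rng.sample(neg, n) if len(neg) > n else list(neg)
    return pos_b, neg_b


def balance_by_level(records, level, seed=42):
    """Balance allosteric vs non-allosteric records within each target class.

    records: dicts with a boolean 'allosteric' key and a target-class key
    such as 'L0', 'L1' or 'L2' (hypothetical field names).
    """
    rng = random.Random(seed)
    groups = defaultdict(lambda: ([], []))  # class -> (allosteric, non-allosteric)
    for rec in records:
        pos, neg = groups[rec[level]]
        (pos if rec["allosteric"] else neg).append(rec)
    return {cls: undersample(pos, neg, rng) for cls, (pos, neg) in groups.items()}


if __name__ == "__main__":
    data = (
        [{"allosteric": True, "L2": "Kinase"}] * 2
        + [{"allosteric": False, "L2": "Kinase"}] * 5
        + [{"allosteric": True, "L2": "7TM1"}, {"allosteric": False, "L2": "7TM1"}]
    )
    for cls, (pos, neg) in balance_by_level(data, "L2").items():
        print(cls, len(pos), len(neg))
```

Undersampling discards part of the majority class. Oversampling or class weighting are alternatives, and the sketch does not imply which one the models in this table used.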
Figure 7.
ROC curves for out-of-bag validation of the allosteric classifier models trained in case studies 2–4.
(A) ROC curve for the HIV-RT classifier. (B) ROC curve for the adenosine receptors classifier. (C) ROC curve for the Protein Kinase B classifier (note that here a ternary model was used as opposed to a binary model).
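A ROC curve is defined for two classes, so for the ternary Protein Kinase B model one usual option is a one-vs-rest evaluation: each class in turn is treated as positive and the remaining two as negative. The sketch below assumes the model outputs one probability per class. The class names are hypothetical, because the caption does not state what the three classes were. The AUC is computed as the probability that a positive outranks a negative, with ties counted as one half.

```python
def pairwise_auc(pos_scores, neg_scores):
    """AUC as P(positive outranks negative), counting ties as one half."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def one_vs_rest_auc(labels, probs, classes):
    """Per-class AUC for a multi-class (e.g. ternary) classifier.

    probs[i][k] is the predicted probability that sample i belongs to classes[k].
    Each class in turn is treated as positive and all others as negative.
    """
    result = {}
    for k, cls in enumerate(classes):
        pos = [p[k] for p, y in zip(probs, labels) if y == cls]
        neg = [p[k] for p, y in zip(probs, labels) if y != cls]
        result[cls] = pairwise_auc(pos, neg)
    return result


if __name__ == "__main__":
    classes = ["class_a", "class_b", "class_c"]  # hypothetical class names
    labels = ["class_a", "class_b", "class_c", "class_a"]
    probs = [[0.8, 0.1, 0.1], [0.2, 0.7, 0.1], [0.1, 0.2, 0.7], [0.2, 0.8, 0.0]]
    print(one_vs_rest_auc(labels, probs, classes))
```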
Table 4.
Overview of allosteric models used in the case studies.
Table 5.
Binary classification confusion matrix.
Table 6.
Ternary classification confusion matrix.
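Tables 5 and 6 give the layout of the binary and ternary confusion matrices. The sketch below, an illustration and not the authors' code, tabulates a confusion matrix for any number of classes and derives accuracy and per-class recall and precision from it. The convention of rows for actual classes and columns for predicted classes is an assumption; the tables may use the transpose. The class labels in the example are hypothetical.

```python
def confusion_matrix(actual, predicted, classes):
    """Square count matrix: rows = actual class, columns = predicted class."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return matrix


def accuracy(matrix):
    """Fraction of samples on the diagonal (correctly classified)."""
    correct = sum(matrix[k][k] for k in range(len(matrix)))
    return correct / sum(sum(row) for row in matrix)


def recall_precision(matrix):
    """Per-class (recall, precision); NaN where a denominator is zero."""
    n = len(matrix)
    out = []
    for k in range(n):
        tp = matrix[k][k]
        actual_k = sum(matrix[k])                        # row total
        predicted_k = sum(matrix[i][k] for i in range(n))  # column total
        recall = tp / actual_k if actual_k else float("nan")
        precision = tp / predicted_k if predicted_k else float("nan")
        out.append((recall, precision))
    return out


if __name__ == "__main__":
    classes = ["allosteric", "non-allosteric"]
    actual = ["allosteric", "allosteric", "non-allosteric", "non-allosteric", "non-allosteric"]
    predicted = ["allosteric", "non-allosteric", "non-allosteric", "non-allosteric", "allosteric"]
    m = confusion_matrix(actual, predicted, classes)
    print(m)            # prints [[1, 1], [1, 2]]
    print(accuracy(m))  # prints 0.6
    print(recall_precision(m))
```

The same functions handle the ternary case: passing a list of three class labels yields a 3 × 3 matrix.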