Table 1.
Comparison of IC50 and ΔGbind values for CYP 1A2 as determined in-house (inhouse) and gathered from literature sources (lit), expressed in μM and kJ mol−1, respectively.
ΔΔGbind refers to the difference in ΔGbind between literature and in-house determined values.
Table 2.
Comparison between predicted inhibition mechanisms and experimentally determined inhibition mechanisms reported in literature.
Fig 1.
Correlation between calculated (∆GbindCalc) and observed (∆GbindObs) binding free energies obtained for the CYP 1A2 LIE model (Eq 4, α = 0.587 and β = 0.267).
The solid line indicates ideal correlation between ∆GbindObs and ∆GbindCalc, and dashed lines represent deviations between calculated and experimental values of ±5 kJ mol−1 (corresponding to an error well within 1.0 pKi units). Compounds from the training set are represented in black. Test-set compounds that were identified as outliers in 0, 1, 2, and 3 analyses are represented in green, yellow, orange, and red, respectively.
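As a check on the statement that ±5 kJ mol−1 lies within 1.0 pKi units, the conversion can be sketched as follows. It assumes the standard relation ΔG = RT ln Ki and a temperature of 298.15 K; the temperature and the function name `dg_error_to_pki` are assumptions made for illustration and are not stated in the caption.

```python
import math

R_KJ = 8.314462618e-3  # gas constant in kJ mol^-1 K^-1


def dg_error_to_pki(delta_g_kj: float, temperature_k: float = 298.15) -> float:
    """Convert a free-energy deviation (kJ mol^-1) into pKi units.

    Assumes dG = RT ln(Ki), so one pKi unit equals RT ln(10).
    The default temperature is an assumption, not a value from the source.
    """
    return abs(delta_g_kj) / (R_KJ * temperature_k * math.log(10))


# A 5 kJ mol^-1 deviation expressed in pKi units
print(round(dg_error_to_pki(5.0), 2))  # → 0.88
```

At this temperature one pKi unit corresponds to roughly 5.7 kJ mol−1, so the dashed lines sit slightly below a one-log-unit error.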
Table 3.
Calculated (ΔGbindCalc) and observed (ΔGbindObs) free energies of binding, and corresponding residuals (ΔGbindObs − ΔGbindCalc) for the training-set compounds (kJ mol−1).
Table 4.
Calculated (ΔGbindCalc) and observed (ΔGbindObs) free energies of binding, and corresponding residuals (ΔGbindObs − ΔGbindCalc) for the test-set compounds (kJ mol−1).
Results from the reliability analyses are given as well, where a score of 1 in columns (A)-(D) indicates that the compound was identified as an outlier in the corresponding analysis: (A) Chemical similarity analysis; (B) Average interaction energy distribution analysis; (C) Ligand-residue electrostatic interaction analysis; (D) Ligand-residue van der Waals interaction analysis. The last column (Total) reports the number of analyses in which a compound is identified as an outlier.
Fig 2.
Similarity matrix of the data set.
Heat map of the compounds included in the training and test set, colored according to percent similarity expressed in terms of Tanimoto scores (TSs) between pairs of structural fingerprints (white = 100% similarity (TS = 1.00); black = 0% similarity (TS = 0.00)).
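The Tanimoto score underlying such a similarity matrix can be sketched as below. Fingerprints are modeled here as sets of "on" bit indices, which is an illustrative simplification; the fingerprint type used by the authors is not given in this caption, and the names `tanimoto` and `similarity_matrix` are hypothetical.

```python
from typing import List, Set


def tanimoto(fp_a: Set[int], fp_b: Set[int]) -> float:
    """Tanimoto score: shared on-bits divided by the union of on-bits."""
    if not fp_a and not fp_b:
        return 1.0  # convention chosen here for two empty fingerprints
    return len(fp_a & fp_b) / len(fp_a | fp_b)


def similarity_matrix(fps: List[Set[int]]) -> List[List[float]]:
    """Pairwise Tanimoto scores, symmetric with 1.0 on the diagonal."""
    return [[tanimoto(a, b) for b in fps] for a in fps]


# Toy fingerprints (made-up bit indices, not data from the paper)
fps = [{1, 2, 3, 4}, {3, 4, 5, 6}, {7, 8}]
for row in similarity_matrix(fps):
    print([round(ts, 2) for ts in row])
```

In the heat map convention of the caption, a score of 1.00 would be drawn white and a score of 0.00 black.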
Fig 3.
Distribution of ΔVEle and ΔVVdW values (Eq 2) for training-set (black circles) and test-set (white squares) MD simulations.
The dashed line represents the 95th-percentile confidence boundary of the training-set distribution. Test-set simulations that fall outside this boundary are labeled with the corresponding compound ID.
Fig 4.
Per-residue decomposition analysis of the electrostatic interaction energies between the ligand and its surroundings in the protein-ligand simulations.
(A) PCA loading plot for training-set electrostatic interaction energies; (B) Active site of CYP 1A2 from the crystallographic structure; heme group (purple carbon atoms), co-crystallized ligand α-naphthoflavone (yellow carbon atoms), and amino acids with high loading on the first two PCs (in red) are explicitly represented. (C) PCA score plot for the training-set (black circles) and test-set (white squares) compounds for the first two PCs. (D) Orthogonal distance (OD) of the compounds of the training set (black circles) and test set (white squares) from the model with 2 PCs. The dashed horizontal line represents the critical orthogonal distance, calculated for the training-set distribution.
Fig 5.
Per-residue decomposition analysis of the van der Waals interaction energies between the ligand and its surroundings in the protein-ligand simulations.
(A) PCA loading plot for training-set van der Waals interaction energies; (B) Active site of CYP 1A2 from the crystallographic structure; heme group (purple carbon atoms), the co-crystallized ligand α-naphthoflavone (yellow carbon atoms), and amino acids with high loadings in the PCA are explicitly represented. Residues with high positive loadings on the first PC are depicted in green; residues with high loadings on the second PC are also represented, both for positive (blue) and negative (red) values. (C) PCA score plot for the training-set (black circles) and test-set (white squares) compounds for the first two PCs. (D) Orthogonal distance (OD) of the compounds of the training set (black circles) and test set (white squares) from the model with 4 PCs. The dashed horizontal line represents the critical orthogonal distance, calculated for the training-set distribution.
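The score and orthogonal-distance quantities shown in panels (C) and (D) of Figs 4 and 5 can be illustrated with a minimal NumPy sketch. It fits an ordinary (non-robust) PCA on training rows of per-residue energies and measures how far each compound lies from the retained subspace. The function names, the random toy data, and the percentile-based cutoff are all assumptions; the caption does not say how the critical orthogonal distance was derived.

```python
import numpy as np


def fit_pca(train: np.ndarray, n_components: int):
    """Return the training mean and loadings of shape (n_features, n_components)."""
    mean = train.mean(axis=0)
    # Rows of vt are the principal directions of the centered training data
    _, _, vt = np.linalg.svd(train - mean, full_matrices=False)
    return mean, vt[:n_components].T


def scores_and_od(x: np.ndarray, mean: np.ndarray, loadings: np.ndarray):
    """Project compounds onto the PCs and compute their orthogonal distances."""
    centered = x - mean
    scores = centered @ loadings
    residual = centered - scores @ loadings.T  # part not explained by the PCs
    return scores, np.linalg.norm(residual, axis=1)


# Toy data: rows are compounds, columns are per-residue energies (made up)
rng = np.random.default_rng(0)
train = rng.normal(size=(30, 6))
test = rng.normal(size=(5, 6))

mean, loadings = fit_pca(train, n_components=2)
_, od_train = scores_and_od(train, mean, loadings)
_, od_test = scores_and_od(test, mean, loadings)

# ASSUMPTION: an empirical percentile stands in for the critical OD
od_crit = np.percentile(od_train, 97.5)
print(od_test > od_crit)  # True marks compounds outside the model plane
```

A compound with a large orthogonal distance has an interaction pattern that the training-set PCs do not describe, which is the sense in which it is flagged as an outlier.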
Fig 6.
Prediction errors obtained for the external test set compounds.
The compounds were grouped into categories according to the number of analyses (A)-(D) in Table 4 in which they were identified as outliers. Horizontal lines represent the standard error (SDEP) for a given category, while the boxes represent the standard deviation around this average.
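The grouping described here can be sketched as follows, summing the (A)-(D) flags of Table 4 into a total and computing an error measure per category. The sketch assumes SDEP is the root-mean-square of the prediction residuals, which is a common definition but is not spelled out in the caption. The function names and the example records are hypothetical and are not values from the paper.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def sdep(errors: List[float]) -> float:
    """Root-mean-square of prediction errors (assumed definition of SDEP)."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def sdep_by_outlier_count(
    records: Iterable[Tuple[Tuple[int, int, int, int], float]]
) -> Dict[int, float]:
    """Group residuals by the number of analyses flagging a compound.

    Each record is ((A, B, C, D) flags as 0/1, residual in kJ mol^-1).
    """
    groups = defaultdict(list)
    for flags, residual in records:
        groups[sum(flags)].append(residual)  # 'Total' column of Table 4
    return {total: sdep(errs) for total, errs in sorted(groups.items())}


# Made-up example records, for illustration only
records = [
    ((0, 0, 0, 0), 3.0),
    ((0, 0, 0, 0), -4.0),
    ((1, 0, 1, 0), 6.0),
    ((1, 0, 1, 0), -8.0),
]
for total, value in sdep_by_outlier_count(records).items():
    print(total, round(value, 2))
```

With real data, a rising SDEP across categories would indicate that the outlier count tracks prediction reliability.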