Hidden bias in the DUD-E dataset leads to misleading performance of deep learning in structure-based virtual screening

doi:10.1371/journal.pone.0220113

Fig 1.

Data preparation for model training and testing.

The training and test sets for (A) the single-target CNN model and (B) the multi-target CNN model. Blue denotes actives and yellow denotes decoys.

Fig 2.

The architecture of the CNN model.

Each unit consists of three layers: pooling, convolutional, and ReLU. The yellow bar labeled FL is the fully connected layer. Further details about the CNN model hyperparameters can be found in reference [41].
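As a rough illustration of how the three units shrink the input grid, the sketch below traces tensor shapes through them. Only the 35 input channels (atom types 0 to 34, per Fig 6) and the 32 first-layer filters come from this document. The 48 × 48 × 48 voxel grid, 2 × 2 × 2 max pooling, same-padded 3 × 3 × 3 convolutions, and the 64 and 128 filters of the later units are assumptions standing in for the hyperparameters given in reference [41].

```python
# Hypothetical shape trace through three pooling -> convolution -> ReLU units.
# ASSUMED (not stated in the caption): 48^3 input grid, 2x2x2 max pooling with
# stride 2, "same"-padded 3x3x3 convolutions, and 64/128 filters in units 2-3.
# From the document: 35 atom-type channels and 32 filters in the first unit.

def unit_output_shape(shape, n_filters, pool=2):
    """Return (depth, height, width, channels) after one pool-conv-ReLU unit."""
    d, h, w, _ = shape
    # Max pooling with stride equal to the pool size shrinks each spatial axis.
    d, h, w = d // pool, h // pool, w // pool
    # A "same"-padded convolution keeps the spatial size and outputs n_filters
    # channels; ReLU is elementwise and does not change the shape.
    return (d, h, w, n_filters)

def trace_shapes(input_shape=(48, 48, 48, 35), filters=(32, 64, 128)):
    """Return the shape after each unit and the flattened size fed to FL."""
    shapes = [input_shape]
    for n in filters:
        shapes.append(unit_output_shape(shapes[-1], n))
    d, h, w, c = shapes[-1]
    return shapes, d * h * w * c

if __name__ == "__main__":
    shapes, flat = trace_shapes()
    for s in shapes:
        print(s)
    print("inputs to the fully connected layer:", flat)
```

Under these assumptions the pooling layers, not the convolutions, are what reduce the spatial resolution, so the fully connected layer sees a flattened 6 × 6 × 6 × 128 feature map.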

Fig 3.

Performance of the receptor-ligand and receptor-ligand-water CNN models on 10 DUD-E targets.

Fig 4.

Correlation between the performance of the receptor-ligand CNN model and ligand-only CNN model.

The receptor-ligand CNN model was trained on receptor-ligand 3D binding poses, and the ligand-only CNN model was trained on ligand binding poses alone. Each blue dot is a target from DUD-E; there are 102 targets in total.

Fig 5.

Performance of the receptor-ligand model for the same ligand test sets with and without receptor information.

For each target, red dots indicate performance when the receptor structure was provided in the test set, while blue triangles indicate performance when the receptor structure was replaced by a single dummy atom. The x-axis lists the DUD-E targets in the order in which they appear in the DUD-E database (http://dude.docking.org/targets); targets with even indices are not labeled because of space limitations.

Fig 6.

The weights and predicted ligand scores of the AA2AR receptor-ligand CNN model.

(A) The average weight put on each atom type in the 32 filters from the first convolutional layer of the AA2AR receptor-ligand CNN model; atom types 0–15 are from the receptor, and atom types 16–34 are from the ligand. (B) Correlation between scores predicted by the AA2AR receptor-ligand CNN model on ligands with vs. without receptor information provided. R² = 0.998.
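The R² in panel B can be read as the squared Pearson correlation between the scores predicted with and without receptor information. A minimal sketch, assuming that definition (the caption does not say how R² was computed); the function name and example scores are illustrative.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two lists of the same length (at least 2)")
    mx, my = sum(x) / n, sum(y) / n
    # Covariance and variance terms (the 1/n factors cancel in the ratio).
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant list has no defined correlation")
    return sxy * sxy / (sxx * syy)

if __name__ == "__main__":
    # Illustrative scores for the same ligands, with and without the receptor.
    with_receptor = [0.91, 0.15, 0.78, 0.05, 0.62]
    without_receptor = [0.90, 0.17, 0.77, 0.06, 0.60]
    print(r_squared(with_receptor, without_receptor))
```

An R² close to 1, as reported for AA2AR, means the model's output barely changes when the receptor atoms are removed.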

Fig 7.

Performance of ligand-trained KNN and CNN models for 102 DUD-E targets.

Table 1.

Distribution of the best K values for the 102 ligand-trained KNN models.
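The captions do not describe how the KNN models represent ligands or how the best K is chosen, so the following is only a hypothetical sketch. It assumes ligands are sets of fingerprint bits compared by Tanimoto similarity, a ligand's score is the fraction of actives among its K nearest training ligands, and the best K is the one with the highest ROC AUC on a validation set. All function names are illustrative.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two sets of fingerprint bits."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def knn_score(query, train, k):
    """Fraction of actives among the k training ligands most similar to query.
    train is a list of (fingerprint_bits, label) pairs; label 1 = active, 0 = decoy."""
    ranked = sorted(train, key=lambda item: tanimoto(query, item[0]), reverse=True)
    top = ranked[:k]
    return sum(label for _, label in top) / len(top)

def roc_auc(scores, labels):
    """Probability that a random active outscores a random decoy (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def best_k(train, valid, candidate_ks):
    """Return (k, auc) for the candidate k with the highest validation AUC."""
    labels = [y for _, y in valid]
    results = []
    for k in candidate_ks:
        scores = [knn_score(fp, train, k) for fp, _ in valid]
        results.append((roc_auc(scores, labels), k))
    # Highest AUC wins; among equal AUCs, prefer the smaller k.
    auc, k = max(results, key=lambda r: (r[0], -r[1]))
    return k, auc
```

Because this score uses ligand features only, a high AUC from such a model would indicate that actives and decoys are separable without any receptor information.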

Fig 8.

Actives and decoys are generally distinguishable for DUD-E targets.

(A) Prediction scores of actives and decoys for AA2AR, shown as a representative example; (B) performance of ligand-trained CNN models trained on small sets of five actives and five decoys. Dots represent mean values, and bars represent standard deviations.

Fig 9.

Inter-target prediction performance of ligand-only CNN models.

(A) Ligand-only CNN models tested on test sets composed of actives and default decoys; (B) ligand-only CNN models tested on test sets composed of actives and AD decoys. The target order is the same as in DUD-E.

Fig 10.

Total number of inter-target models that achieved AUC>0.9 for each target in DUD-E.

Targets are partitioned into subsets based on biological family. Targets from a different subset (or non-isoform targets in the “other” subset) are labeled “distinct targets” (blue). Targets in the same subset (excluding the “other” subset unless an isoform exists) are labeled “similar targets” (orange). Targets with no inter-target high AUC (>0.9) are not shown.
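The counting behind this figure can be sketched as follows, assuming the inter-target results are stored in a dictionary keyed by (training target, test target) pairs. The similar/distinct rule follows the caption; the data layout, function names, and the family labels and target names in the test are hypothetical.

```python
from collections import Counter

def classify_pair(a, b, family, isoform_pairs):
    """Label two targets 'similar' or 'distinct' using the caption's rule."""
    if family[a] != family[b]:
        return "distinct"
    if family[a] == "other":
        # Within the "other" subset, only isoforms count as similar.
        return "similar" if frozenset((a, b)) in isoform_pairs else "distinct"
    return "similar"

def count_high_auc(auc, family, isoform_pairs, threshold=0.9):
    """For each test target, count models trained on a different target whose
    AUC exceeds threshold, split into 'similar' and 'distinct' training targets.
    Test targets with no such model are absent from the result."""
    counts = {}
    for (train_t, test_t), value in auc.items():
        if train_t == test_t or value <= threshold:
            continue  # skip intra-target results and models at or below threshold
        kind = classify_pair(train_t, test_t, family, isoform_pairs)
        counts.setdefault(test_t, Counter())[kind] += 1
    return counts
```

Keeping the isoform exception as an explicit set of pairs makes the one special case in the caption's rule easy to inspect.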

Table 2.

Ligand-only CNN models that achieved high AUC (greater than 0.9) for COMT.

Fig 11.

AUC distributions for the ligand-only CNN models (102 × 102 predictions) across three groups of targets.

(A) Results for models tested using the DUD-E dataset; (B) Results for models tested using the AD dataset. The distributions are normalized such that the area under each distribution curve equals 1.
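One simple way to normalize a distribution so that its area equals 1 is a density histogram, in which each bar height is count / (total × bin width). This sketch assumes a histogram over AUC values in [0, 1]; the figure may instead use a smoothed curve, and the number of bins is an arbitrary choice.

```python
def density_histogram(values, bins=10, lo=0.0, hi=1.0):
    """Return (bin_edges, heights) such that sum(height * bin_width) == 1."""
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        if not lo <= v <= hi:
            continue  # ignore values outside [lo, hi]
        idx = min(int((v - lo) / width), bins - 1)  # v == hi goes in the last bin
        counts[idx] += 1
    total = sum(counts)
    if total == 0:
        raise ValueError("no values fall inside [lo, hi]")
    # Dividing by total * width turns counts into a probability density.
    heights = [c / (total * width) for c in counts]
    edges = [lo + i * width for i in range(bins + 1)]
    return edges, heights
```

Normalizing this way lets target groups of very different sizes be compared on the same axes.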

Table 3.

The mean and SD of the AUC values across three target groups.

Fig 12.

Comparison of the performance of multi-target trained receptor-ligand CNN model and ligand-only CNN model.

The two models were trained on 10 targets and tested on the remaining 92 targets. The receptor-ligand CNN model was trained on receptor-ligand 3D binding poses, and the ligand-only model was trained on ligand binding poses alone. Each black dot represents a target.
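The key property of this evaluation is that no target contributes data to both the training and the test set. A minimal sketch of such a target-level split, assuming the 10 training targets are drawn at random (the caption does not say how they were chosen); `split_targets` and its seed are illustrative.

```python
import random

def split_targets(targets, n_train=10, seed=0):
    """Randomly choose n_train targets for training; all others form the test set.
    Splitting by target (not by ligand) keeps every target on one side only."""
    if n_train > len(targets):
        raise ValueError("n_train exceeds the number of targets")
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = list(targets)
    rng.shuffle(shuffled)
    return sorted(shuffled[:n_train]), sorted(shuffled[n_train:])
```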

Fig 13.

AUC distributions of Vina, Gnina, and Pafnucy on all 102 DUD-E targets.

Each black dot represents a DUD-E target.
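The AUC plotted here is the area under the ROC curve, which equals the probability that a randomly chosen active is scored above a randomly chosen decoy. A sketch of the rank-based (Mann-Whitney) computation, assuming higher scores mean "more likely active" and giving tied scores their average rank; the function name is illustrative.

```python
def roc_auc_ranks(scores, labels):
    """ROC AUC from the Mann-Whitney U statistic.
    labels: 1 for actives, 0 for decoys; tied scores share their average rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of a run of tied scores.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)
```

Vina reports a predicted binding free energy in kcal/mol, where more negative means stronger binding, so its scores would need to be negated before being passed to a function like this.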

Table 4.

Summary of Vina, Gnina and Pafnucy performance on DUD-E targets.

Fig 14.

Pose sensitivity of Vina, Gnina and Pafnucy.

The three models were tested on 100 re-docked poses of ligand XLC (A) and ligand XLD (B). The red asterisk at RMSD = 0 marks the experimental affinity. Vina predicts the binding free energy (ΔG) in kcal/mol; here, we estimated Ki at 25 °C (298.15 K) using ΔG = RT ln Ki, where R is the gas constant (8.31 J/(K·mol)).
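Rearranging the relation in the caption gives Ki = exp(ΔG / RT). The sketch below converts a Vina-style ΔG in kcal/mol to Ki in mol/L, assuming T = 298.15 K, 1 kcal = 4184 J, and a 1 M standard state; the function names are illustrative.

```python
import math

R = 8.314          # gas constant, J/(K*mol); the caption rounds this to 8.31
T = 298.15         # 25 degrees Celsius in kelvin
J_PER_KCAL = 4184  # joules per (thermochemical) kilocalorie

def ki_from_dg(dg_kcal):
    """Ki in mol/L from a binding free energy in kcal/mol, via dG = RT ln Ki."""
    return math.exp(dg_kcal * J_PER_KCAL / (R * T))

def dg_from_ki(ki_molar):
    """Inverse conversion: binding free energy in kcal/mol from Ki in mol/L."""
    return R * T * math.log(ki_molar) / J_PER_KCAL

if __name__ == "__main__":
    ki = ki_from_dg(-10.0)
    print(f"Ki = {ki:.3e} M, pKi = {-math.log10(ki):.2f}")
```

Because the mapping is exponential, each additional 1.36 kcal/mol or so of predicted binding energy lowers the estimated Ki roughly tenfold at this temperature.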
