Representability of algebraic topology for biomolecules in machine learning based scoring and virtual screening

doi:10.1371/journal.pcbi.1005929

Fig 6

Assessment of performance of the model on samples with elements that are rare in the data sets.

For each of the four data sets PDBBind v2007, v2013, v2015, and v2016 [99], and for each element, the test set is the subset of the original core set containing only the ligands that have atoms of that element type. The features used are those with ID = 7 in Table 10. The reported RMSE is the average over the four data sets. Experiment 1: the training set is the original training set, and all features are used. Experiment 2: the training set is the original training set, and only features that do not involve the element are used. Experiment 3: the training set is the original training set excluding the samples that contain atoms of the element, and all features are used. For most elements, experiment 1 achieves the best result and experiment 3 yields the worst performance.
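The three experimental setups can be sketched as row and column masks over a feature matrix. The following is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the boolean arrays `contains_elem_train`, `contains_elem_test`, and `feature_uses_elem`, and the ordinary least-squares regressor used as a stand-in model are all hypothetical.

```python
import numpy as np

def split_for_element(contains_elem_train, contains_elem_test, feature_uses_elem):
    """Build (train row mask, feature column mask, test row mask) per experiment.

    All three inputs are hypothetical boolean arrays:
      contains_elem_train[i] -- training ligand i contains the element
      contains_elem_test[j]  -- core-set ligand j contains the element
      feature_uses_elem[k]   -- feature k involves the element
    """
    all_rows = np.ones(contains_elem_train.shape[0], dtype=bool)
    all_feats = np.ones(feature_uses_elem.shape[0], dtype=bool)
    test_rows = contains_elem_test  # test only on ligands with the element
    return {
        1: (all_rows, all_feats, test_rows),              # full training set, all features
        2: (all_rows, ~feature_uses_elem, test_rows),     # drop element-specific features
        3: (~contains_elem_train, all_feats, test_rows),  # drop samples with the element
    }

def rmse(y_true, y_pred):
    """Root-mean-square error between two 1-D arrays."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def fit_predict_lstsq(X_train, y_train, X_test):
    """Stand-in regressor (an assumption): least squares with an intercept."""
    A = np.column_stack([X_train, np.ones(len(X_train))])
    coef, *_ = np.linalg.lstsq(A, y_train, rcond=None)
    return np.column_stack([X_test, np.ones(len(X_test))]) @ coef

def run_experiments(X_train, y_train, X_test, y_test,
                    contains_elem_train, contains_elem_test, feature_uses_elem):
    """Return {experiment number: RMSE on the element-specific test subset}."""
    masks = split_for_element(contains_elem_train, contains_elem_test, feature_uses_elem)
    results = {}
    for exp, (rows, cols, test_rows) in masks.items():
        pred = fit_predict_lstsq(X_train[rows][:, cols], y_train[rows],
                                 X_test[test_rows][:, cols])
        results[exp] = rmse(y_test[test_rows], pred)
    return results
```

Running `run_experiments` once per data set and averaging each experiment's RMSE over the four data sets would mirror the averaging described in the caption.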

doi: https://doi.org/10.1371/journal.pcbi.1005929.g006