Fig 1.
IBR (Informer-Based Ranking) for compound prioritization on a novel target.
From a complete bioactivity data matrix (blue grid), a subset of informer compounds (green stars) is identified from the broader set of compounds (stars) that have been tested against a large set of targets (pink circles). A previously uncharacterized target (red circle) is assayed with just the informer compounds, and the new bioactivity data are used to reveal the new target’s relationship to other targets. The combined data enable activity predictions (purple) on the remaining, non-informer compounds.
Table 1.
Retrieval counts by the various methods on new kinase targets (a) PknB, (b) BGLF4, and (c) ROP18 using PKIS1 or PKIS2 matrices.
The total numbers of experimentally determined active compounds and distinct active scaffolds are given in the total column. The values below each of the IBR methods indicate the number of active informers identified, the number of experimentally determined active compounds that were ranked in the top 10% of predicted active compounds by each method, and the number of unique active scaffolds identified in that top 10%. For a given target, this top 10% consists of the active informers plus the top-ranking non-informers, which together comprise 10% of the set of all compounds after the inactive informers are removed.
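The composition of the top 10% described above can be illustrated with a short sketch. The function name, the toy compound labels, and the nearest-integer rounding of the 10% budget are assumptions made for illustration; the caption does not specify how fractional counts are handled.

```python
def select_top_fraction(active_informers, ranked_non_informers, fraction=0.10):
    """Return the compounds counted as the 'top 10%' for one target.

    active_informers: informers that tested active on the new target.
    ranked_non_informers: non-informer compounds, best-ranked first.
    Inactive informers are assumed to have been removed before this call.
    """
    # Pool = all compounds minus the inactive informers.
    pool_size = len(active_informers) + len(ranked_non_informers)
    budget = round(fraction * pool_size)  # assumption: nearest-integer rounding
    # Active informers are always kept; the remaining slots go to the
    # best-ranked non-informers. If active informers alone exceed the
    # budget, no non-informers are added (assumption).
    n_fill = max(0, budget - len(active_informers))
    return list(active_informers) + list(ranked_non_informers[:n_fill])


# Hypothetical toy target: 1 active informer and 19 ranked non-informers.
non_informers = [f"c{i}" for i in range(19)]
selected = select_top_fraction(["inf1"], non_informers)
print(selected)  # ['inf1', 'c0']
```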
Table 2.
(a) ROCAUC, (b) NEF10, and (c) FASR10 in Leave-One-Target-Out Cross Validation on PKIS1.
IBR methods were evaluated on 224 PKIS1 targets using standard VS metrics that reflect active retrieval: ROCAUC and NEF10. FASR10 was also evaluated to reflect the chemical diversity of the actives retrieved. All baseline outcomes are shown in S4 Table, and p-values from pairwise comparisons are given in S5 Table. *The only non-baseline IBR that fails to demonstrate statistical improvement (p < 0.0085) over all baselines is CS when using the ROCAUC metric. Note: a Šidák multiple comparison correction was applied for the 6 baseline comparisons made against each non-baseline IBR, lowering the α threshold from 0.05 to 0.0085.
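The corrected threshold quoted in the note follows from the standard Šidák formula, α' = 1 − (1 − α)^(1/m), with α = 0.05 and m = 6 comparisons. A minimal check (the function name is illustrative):

```python
def sidak_alpha(family_alpha: float, n_comparisons: int) -> float:
    """Per-comparison significance threshold under the Sidak correction."""
    return 1.0 - (1.0 - family_alpha) ** (1.0 / n_comparisons)


# Six baselines are compared against each non-baseline IBR method.
threshold = sidak_alpha(0.05, 6)
print(f"{threshold:.4f}")  # 0.0085
```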
Fig 2.
A comparison of models with respect to compound ranking performance as assessed by ROCAUC values.
Each model was evaluated on 224 targets through PKIS1 leave-one-target-out validation. ROCAUC of 0.5 indicates a random ranking of compounds on a given target; ROCAUC of 1.0 represents ideal ranking with all active compounds prioritized above the inactives. The individual target evaluations are shown as light grey dots with median and interquartile ranges displayed as a white circle and black bars, respectively.
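The two reference values of ROCAUC can be reproduced with a small sketch that computes it as the fraction of (active, inactive) compound pairs in which the active receives the higher score, with ties counted as one half. This pairwise formulation is a standard equivalent of the area under the ROC curve; the function name and toy scores are illustrative and not taken from the study.

```python
def roc_auc(scores, is_active):
    """ROC AUC as the probability that a random active outranks a random inactive."""
    actives = [s for s, a in zip(scores, is_active) if a]
    inactives = [s for s, a in zip(scores, is_active) if not a]
    if not actives or not inactives:
        raise ValueError("ROC AUC needs at least one active and one inactive")
    wins = 0.0
    for a in actives:
        for i in inactives:
            if a > i:
                wins += 1.0   # active correctly ranked above inactive
            elif a == i:
                wins += 0.5   # tie counts as half a win
    return wins / (len(actives) * len(inactives))


# Ideal ranking: every active scores above every inactive.
ideal = roc_auc([0.9, 0.8, 0.3, 0.1], [True, True, False, False])
# Uninformative scores: all compounds tied.
tied = roc_auc([0.5, 0.5, 0.5, 0.5], [True, False, True, False])
print(ideal, tied)  # 1.0 0.5
```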
Fig 3.
A comparison of models with respect to compound ranking performance as assessed by active enrichment in the top 10% of ranked compounds.
Each model was evaluated on 224 targets through PKIS1 leave-one-target-out validation. NEF10 is the fold-enrichment of actives in the top 10% of ranked compounds, relative to random, normalized by the maximum theoretical fold-enrichment that could be achieved at the 10% threshold for the target of interest.
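As a rough illustration of this normalization, the sketch below takes the simplest reading of the caption: the enrichment factor at 10% divided by its maximum attainable value. The caption does not give the formula, so the exact normalization used in the study (for example, any offset for the random expectation) may differ. The function name, the rounding of the 10% cutoff, and the toy data are assumptions.

```python
def nef10(scores, is_active, fraction=0.10):
    """Enrichment factor in the top fraction, divided by its theoretical maximum."""
    n = len(scores)
    n_top = max(1, round(fraction * n))  # assumption: nearest-integer cutoff
    n_actives = sum(1 for a in is_active if a)
    if n_actives == 0:
        raise ValueError("target has no active compounds")
    ranked = sorted(zip(scores, is_active), key=lambda p: p[0], reverse=True)
    hits = sum(1 for _, a in ranked[:n_top] if a)
    random_rate = n_actives / n                    # hit rate of a random pick
    ef = (hits / n_top) / random_rate              # observed fold-enrichment
    ef_max = (min(n_top, n_actives) / n_top) / random_rate  # best possible
    return ef / ef_max


# Toy target: 20 compounds scored 20..1, actives at ranks 1, 5, 6, and 7.
toy_scores = list(range(20, 0, -1))
toy_active = [rank in (1, 5, 6, 7) for rank in range(1, 21)]
value = nef10(toy_scores, toy_active)  # 1 of 2 top slots is active
```

In this toy case one of the two top-ranked compounds is active, so the observed enrichment is half of the maximum attainable at the 10% cutoff.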
Fig 4.
A comparison of models with respect to the structural diversity of the active compounds retrieved.
Each model was assessed by FASR10 evaluations on 224 targets through PKIS1 leave-one-target-out validation. The FASR10 metric is the fraction of all active-molecule scaffolds identified for the target of interest that were retrieved in the top 10% of the ranked compounds on that target. Compounds are grouped by the generic (all-carbon skeleton) representation of their Bemis-Murcko scaffolds.
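The FASR10 definition can be sketched as follows, assuming each compound already carries a label for its generic Bemis-Murcko scaffold (computing the scaffolds themselves is outside this sketch). The function name, the nearest-integer rounding of the 10% cutoff, and the toy labels are illustrative assumptions.

```python
def fasr10(scores, is_active, scaffolds, fraction=0.10):
    """Fraction of a target's active scaffolds found among the top-ranked compounds."""
    n = len(scores)
    n_top = max(1, round(fraction * n))  # assumption: nearest-integer cutoff
    all_active_scaffolds = {s for s, a in zip(scaffolds, is_active) if a}
    if not all_active_scaffolds:
        raise ValueError("target has no active compounds")
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    # Only actives in the top slice contribute; each scaffold counts once.
    top_scaffolds = {scaffolds[i] for i in order[:n_top] if is_active[i]}
    return len(top_scaffolds) / len(all_active_scaffolds)


# Toy target: 20 compounds scored 20..1; actives at positions 0, 1, 4, 5
# carry hypothetical scaffold labels A, A, B, C.
toy_scores = list(range(20, 0, -1))
toy_active = [i in (0, 1, 4, 5) for i in range(20)]
toy_scaffolds = ["A", "A", "X", "X", "B", "C"] + ["X"] * 14
diversity = fasr10(toy_scores, toy_active, toy_scaffolds)  # 1 of 3 scaffolds
```

Both top-ranked compounds are active but share scaffold A, so only one of the three active scaffolds is retrieved. This is the behavior that distinguishes FASR10 from a count of retrieved actives.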