Classification and analysis of a large collection of in vivo bioassay descriptions
Fig 8
Confusion matrix and per-class performance measures calculated for one of the random forest classifiers.
The figure shows performance measures calculated for a multiclass random forest classifier that assigns each assay to one of the five most common ATC code combinations, a proxy for the most common disease areas in ChEMBL. The model was built with the data visualized in Fig 7; a strict partitioning method based on a random document split was used to partition the dataset into cross-validation subsets. The model achieved an overall prediction accuracy of 0.87. (A) Per-class confusion matrix. (B) Per-class classification report.
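For readers unfamiliar with the two panels, the following minimal sketch shows how a per-class confusion matrix and a per-class report (precision, recall, F1, support) can be derived from true and predicted labels. It is an illustration only, not the authors' code: the function names, the class labels "A", "B", "C", and the toy predictions are hypothetical and do not correspond to the ATC code combinations or the results in the figure.

```python
def confusion_matrix(y_true, y_pred, labels):
    """Count predictions per class; rows are true classes, columns are predicted."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for true_label, pred_label in zip(y_true, y_pred):
        matrix[index[true_label]][index[pred_label]] += 1
    return matrix


def per_class_report(matrix, labels):
    """Derive precision, recall, F1 and support for each class from the matrix."""
    report = {}
    for i, label in enumerate(labels):
        true_positives = matrix[i][i]
        support = sum(matrix[i])                    # items that truly belong to the class
        predicted = sum(row[i] for row in matrix)   # items assigned to the class
        precision = true_positives / predicted if predicted else 0.0
        recall = true_positives / support if support else 0.0
        denominator = precision + recall
        f1 = 2 * precision * recall / denominator if denominator else 0.0
        report[label] = {
            "precision": precision,
            "recall": recall,
            "f1": f1,
            "support": support,
        }
    return report


def overall_accuracy(matrix):
    """Fraction of items on the diagonal of the confusion matrix."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0


# Hypothetical toy data: three placeholder classes, eight assays.
labels = ["A", "B", "C"]
y_true = ["A", "A", "A", "B", "B", "C", "C", "C"]
y_pred = ["A", "A", "B", "B", "B", "C", "A", "C"]

matrix = confusion_matrix(y_true, y_pred, labels)
report = per_class_report(matrix, labels)
accuracy = overall_accuracy(matrix)
```

In this sketch, panel (A) corresponds to `matrix` and panel (B) to `report`; the overall accuracy quoted in the caption is the diagonal sum of the matrix divided by the total number of assays.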