Enabling interpretable machine learning for biological data with reliability scores
Fig 2
The SRS is lower when an instance’s class is excluded from training.
SWIF(r) was trained using each combination of two classes in the wheat morphology dataset [50] (see Methods). The trained model was then tested on instances from all three classes. A) Distribution of the training and testing data across all seven attributes: Perimeter, Area, Kernel Length, Compactness, Kernel Width, Asymmetry and Length of Kernel Groove. Note that wheat 1 (blue circles) generally has trait values that are intermediate between those of the other two classes. See S1 Fig for additional views into the raw dataset. B) Histograms show the distribution of the SRS for instances of each class. In each case, the SRS distribution of the unknown class is shifted toward more negative values relative to the two known classes. This is true even for wheat 1 (blue), whose attribute values are intermediate between those of the other classes (see S1 Table for p-values). C) Data graphed by two attributes (Kernel Length and Compactness). Left: full dataset colored by true labels. Right: colored by SWIF(r) probability (top) and SRS (bottom). SWIF(r) probabilities were generally greater than 90% for one of the two trained classes, even for instances from the unknown class, with just a handful of points receiving intermediate values from SWIF(r) (black diamonds). In contrast, coloring by SRS shows that points associated with the unknown class (larger dots) tend to have lower SRS, while points associated with known classes (smaller dots) received higher SRS.
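The hold-one-class-out design described above can be illustrated with a minimal sketch. This is not SWIF(r) and does not compute the SRS: it assumes a simple Gaussian classifier with independent attributes and equal priors, uses hypothetical synthetic data with two attributes in place of the wheat dataset, and takes the highest class log-likelihood as a stand-in reliability score. All names (`CENTERS`, `fit`, `log_likelihood`, `score`) are illustrative.

```python
import itertools
import math
import random
import statistics

random.seed(0)

# Hypothetical synthetic data: three classes, two attributes each.
# Class "b" is intermediate between "a" and "c", loosely mimicking wheat 1.
CENTERS = {"a": (0.0, 0.0), "b": (3.0, 3.0), "c": (6.0, 6.0)}
data = {
    label: [(random.gauss(cx, 1.0), random.gauss(cy, 1.0)) for _ in range(50)]
    for label, (cx, cy) in CENTERS.items()
}

def fit(points):
    """Per-attribute mean and standard deviation for one class."""
    columns = list(zip(*points))
    return [(statistics.mean(col), statistics.stdev(col)) for col in columns]

def log_likelihood(point, params):
    """Sum of per-attribute Gaussian log-densities (independence assumed)."""
    total = 0.0
    for x, (mu, sd) in zip(point, params):
        total += -0.5 * math.log(2 * math.pi * sd * sd) - (x - mu) ** 2 / (2 * sd * sd)
    return total

def score(point, model):
    """Return (max posterior over trained classes, stand-in reliability score)."""
    lls = [log_likelihood(point, params) for params in model.values()]
    top = max(lls)
    weights = [math.exp(ll - top) for ll in lls]  # equal priors assumed
    return max(weights) / sum(weights), top

results = {}
for trained in itertools.combinations(data, 2):
    # Train on two classes only, then score instances from all three.
    model = {label: fit(data[label]) for label in trained}
    held_out = next(label for label in data if label not in trained)
    mean_rel = {
        label: statistics.mean(score(p, model)[1] for p in pts)
        for label, pts in data.items()
    }
    mean_post = statistics.mean(score(p, model)[0] for p in data[held_out])
    results[held_out] = (mean_rel, mean_post)
    print(held_out, {k: round(v, 2) for k, v in mean_rel.items()}, round(mean_post, 3))
```

In this toy setting the classifier can still assign a confident posterior to instances of the held-out class, because the posterior only compares the trained classes with each other, whereas the likelihood-based stand-in score is lower for those instances because it measures how well any trained class explains them.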