Deep neural network based histological scoring of lung fibrosis and inflammation in the mouse model system
Fig 4
Analyzing annotator agreement and inherent partial ambiguity of image data.
A. Confusion matrix of the predicted labels of the validation data (columns) compared to the ground truth (rows). The entries are classification probabilities, normalized to a row sum of one. Note that the highest values lie either on the diagonal (agreement of ground truth and prediction) or in an element next to the diagonal (a deviation by one neighboring class). The ignore class can, to some extent, be confused with all other classes and vice versa. The overall accuracy was A = 79.5%. B. Confusion matrix of the agreement of two human experts (annotators 1 and 2) on 400 randomly selected image tiles. The overall pattern was similar; however, the inter-annotator agreement of the human experts in terms of accuracy was A = 64.5%, lower than the agreement of the CNN with the ground truth on the unseen validation data. The exact value of the agreement between two human experts will, however, depend on the type and amount of their training. C. Visualization of the internal representation of the image data in the CNN. Here, the last hidden CNN layer representation of the image data was projected into two dimensions with t-SNE, a method for visualizing high-dimensional data. Each dot represents one tile out of ~2000 validation images. Insets show example images, along with the predicted label and their approximate locations in the cluster. Most classes are separated, but they are interconnected, and there is ambiguity especially in the transition areas. Note the smaller region of class 0 tiles close to the “ignore” class (left), which already shows properties of the “ignore” class (e.g. an only partially covered tile).
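For illustration, the quantities shown in the figure (the row-normalized confusion matrices of panels A and B, the overall accuracy A, and the t-SNE projection of the last hidden layer in panel C) could be computed along the following lines. This is a minimal sketch with toy stand-in data; all variable names, array shapes, and t-SNE parameters are assumptions for illustration and are not taken from the paper's actual code.

    import numpy as np
    from sklearn.metrics import confusion_matrix, accuracy_score
    from sklearn.manifold import TSNE

    def row_normalized_confusion(y_true, y_pred, labels):
        # Confusion matrix with the ground truth in rows, normalized to a row sum of one.
        cm = confusion_matrix(y_true, y_pred, labels=labels).astype(float)
        return cm / cm.sum(axis=1, keepdims=True)

    # Toy stand-ins for per-tile labels and penultimate-layer features
    # (the class count and feature dimension here are arbitrary).
    rng = np.random.default_rng(0)
    n_tiles, n_classes, n_hidden = 2000, 9, 128
    y_true = rng.integers(0, n_classes, n_tiles)      # ground-truth class per tile
    y_pred = rng.integers(0, n_classes, n_tiles)      # CNN prediction (or annotator 2 for panel B)
    features = rng.normal(size=(n_tiles, n_hidden))   # last hidden CNN layer activations

    # Panels A/B: row-normalized confusion matrix and overall accuracy A.
    cm_norm = row_normalized_confusion(y_true, y_pred, labels=list(range(n_classes)))
    overall_acc = accuracy_score(y_true, y_pred)

    # Panel C: 2-D t-SNE embedding of the hidden-layer representation for plotting.
    embedding = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(features)

Panel B uses the same computation with annotator 1 in the role of the ground truth and annotator 2 in the role of the prediction.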