Table 1.

Python frameworks supporting post-hoc attribution for XAI of DNNs.

Framework gives the name of the framework. Backend shows the Python library on which the framework is based. Propagation Attribution lists the propagation-based attribution methods supported by the framework. Propagation Rule-map describes the framework's support for mapping different rules to layers or parts of a model. Other Attribution (Notable) lists the notable (i.e., non-trivial) attribution approaches supported by the framework that are not propagation-based. Documentation/Tests summarizes the state of the framework's documentation and tests (with continuous integration, CI).

Fig 1.

Relation between our software frameworks Zennit, CoRelAy, and ViRelAy.

Given a dataset, Zennit computes feature attribution scores (e.g., with Layer-wise Relevance Propagation, LRP), which CoRelAy then uses to conduct analyses such as a Spectral Relevance Analysis (SpRAy). ViRelAy uses the dataset and the outputs of both other frameworks to present an interactive visualization.

Fig 2.

An overview of how LRP operates.

First, inference is performed on the input sample x via a forward pass through the model, resulting in a prediction y. The prediction score is then propagated backwards through the layers of the model until the input layer is reached. The resulting attribution R reaches all latent and input components of the model and can then be visualized as a heatmap.
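
The backward redistribution described above can be illustrated with a small NumPy sketch of the LRP-epsilon rule on a toy dense, ReLU, dense network. The network, its random weights, and the helper `lrp_epsilon_dense` are hypothetical illustrations, not Zennit's implementation, which works through PyTorch's autograd instead. ReLU layers pass relevance through unchanged, so only the dense layers need a rule here.

```python
import numpy as np

def lrp_epsilon_dense(a, W, b, R_out, eps=1e-6):
    """Hypothetical helper: redistribute the relevance R_out of z = W @ a + b onto
    the inputs a with the LRP-epsilon rule R_i = a_i * sum_j W_ji R_j / (z_j +/- eps)."""
    z = W @ a + b
    z_stab = z + eps * np.where(z >= 0, 1.0, -1.0)  # push the denominator away from zero
    return a * (W.T @ (R_out / z_stab))

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(5, 4)), np.zeros(5)
W2, b2 = rng.normal(size=(3, 5)), np.zeros(3)
x = rng.normal(size=4)

# forward pass: dense -> ReLU -> dense
h = np.maximum(W1 @ x + b1, 0.0)
y = W2 @ h + b2

# explain only the predicted class: its score is the initial relevance
c = int(np.argmax(y))
R_y = np.zeros_like(y)
R_y[c] = y[c]

# backward pass, from the output layer towards the input
R_h = lrp_epsilon_dense(h, W2, b2, R_y)
R_x = lrp_epsilon_dense(x, W1, b1, R_h)
```

With zero biases and a small epsilon, `R_h` and `R_x` each sum approximately to the explained score `y[c]`. This is the conservation property that LRP is built around.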

Fig 3.

The workflow of SpRAy as implemented within our software frameworks.

(1) First, the model performs inference on all samples of a dataset. (2) The classification decisions are then explained with a suitable LRP variant using Zennit, yielding attributions and heatmaps for all samples and classes of interest. (3) Next, CoRelAy performs an eigenvalue-based spectral cluster analysis to identify different prediction strategies in the attribution data. (4) The resulting embeddings (e.g., the raw spectral embedding, or a t-SNE or UMAP embedding computed from it) and clusterings (e.g., k-means) can then be visualized in ViRelAy to identify characteristic prediction strategies or Clever Hans (CH) behavior of the model. This information can be used to improve the model or the dataset.
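
Step (3) can be sketched in plain NumPy: build a k-nearest-neighbor graph over flattened attribution maps, take the eigenvectors of its normalized Laplacian, and cluster the row-normalized embedding with k-means. The neighbor count, binary affinities, row normalization, and the tiny k-means below are assumptions made for illustration. This is not CoRelAy's pipeline. The number of near-zero eigenvalues counts the disconnected groups of samples, which is the kind of signal an eigenvalue-based analysis inspects.

```python
import numpy as np

def spectral_embedding(A, n_neighbors=8, n_components=2):
    """Spectral embedding of the rows of A (one flattened attribution map per sample)."""
    n = A.shape[0]
    d2 = ((A[:, None, :] - A[None, :, :]) ** 2).sum(-1)   # pairwise squared distances
    nn = np.argsort(d2, axis=1)[:, 1:n_neighbors + 1]     # nearest neighbors, skipping self
    W = np.zeros((n, n))
    W[np.repeat(np.arange(n), n_neighbors), nn.ravel()] = 1.0
    W = np.maximum(W, W.T)                                # symmetric kNN affinity graph
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    L = np.eye(n) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]  # normalized Laplacian
    eigvals, eigvecs = np.linalg.eigh(L)                  # eigenvalues in ascending order
    emb = eigvecs[:, :n_components]
    emb = emb / np.linalg.norm(emb, axis=1, keepdims=True)  # row-normalize the embedding
    return eigvals, emb

def kmeans(X, k, n_iter=20):
    """Plain k-means with deterministic farthest-point initialization."""
    centers = [X[0]]
    for _ in range(1, k):
        dist = np.min([((X - c) ** 2).sum(1) for c in centers], axis=0)
        centers.append(X[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(n_iter):
        labels = np.argmin(((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1), axis=1)
        centers = np.array([X[labels == j].mean(0) if np.any(labels == j) else centers[j]
                            for j in range(k)])
    return labels

# toy "attribution maps" for two strategies: relevance on the left vs. the right half
rng = np.random.default_rng(0)
left = np.r_[np.ones(8), np.zeros(8)]
right = np.r_[np.zeros(8), np.ones(8)]
A = np.vstack([left + 0.05 * rng.normal(size=(20, 16)),
               right + 0.05 * rng.normal(size=(20, 16))])

eigvals, emb = spectral_embedding(A)
labels = kmeans(emb, k=2)
```

The two toy strategies produce two near-zero eigenvalues. k-means then recovers them as two clusters that a tool like ViRelAy could display.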

Fig 4.

A schematic representation of a PyTorch computation graph and its corresponding gradient computation graph.

The example shows a linear layer followed by a ReLU activation function.
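
For a linear layer followed by a ReLU, the gradient graph simply applies the chain rule in reverse order. ReLU comes first, then the linear layer. The sketch below writes both passes by hand and checks the input gradient against central finite differences. The functions `forward` and `backward` are hypothetical stand-ins for what autograd records. They are not PyTorch code.

```python
import numpy as np

def forward(x, W, b):
    z = W @ x + b            # linear layer
    y = np.maximum(z, 0.0)   # ReLU
    return z, y

def backward(x, W, z, grad_y):
    """Chain rule in reverse order of the forward graph: ReLU first, then linear."""
    grad_z = grad_y * (z > 0)      # ReLU lets the gradient through only where z > 0
    grad_x = W.T @ grad_z          # linear layer: dz/dx = W
    grad_W = np.outer(grad_z, x)   # linear layer: gradient w.r.t. the weights
    grad_b = grad_z                # linear layer: gradient w.r.t. the bias
    return grad_x, grad_W, grad_b

rng = np.random.default_rng(1)
W, b, x = rng.normal(size=(3, 4)), rng.normal(size=3), rng.normal(size=4)
z, y = forward(x, W, b)
grad_y = np.ones_like(y)           # gradient of sum(y)
grad_x, grad_W, grad_b = backward(x, W, z, grad_y)

# central finite differences of sum(y) w.r.t. each input component
step = 1e-6
numeric = np.array([(forward(x + step * e, W, b)[1].sum()
                     - forward(x - step * e, W, b)[1].sum()) / (2 * step)
                    for e in np.eye(4)])
```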

Fig 5.

A computation graph modified by Zennit.

The gradient computation graph of PyTorch is utilized to implement decomposition-based methods such as LRP.
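
One way to picture this: keep the backward traversal and swap what each layer does during it. In the NumPy sketch below, a single reverse traversal computes either the plain gradient or LRP relevance (LRP-gamma for dense layers, pass-through for ReLU), depending on which backward functions are plugged in. All names are hypothetical. Zennit itself attaches hooks to PyTorch modules, and this sketch does not reproduce its API.

```python
import numpy as np

def gradient_dense(a, W, g):
    """Plain backward of z = W @ a with respect to a."""
    return W.T @ g

def gradient_relu(z, g):
    """Plain backward of ReLU: block the signal where z <= 0."""
    return g * (z > 0)

def gamma_dense(gamma, eps=1e-9):
    """Hypothetical replacement backward for z = W @ a implementing LRP-gamma:
    the incoming signal is relevance, the returned value is input relevance."""
    def rule(a, W, R):
        Wg = W + gamma * np.maximum(W, 0.0)           # boost positive weights
        zg = Wg @ a
        zg = zg + eps * np.where(zg >= 0, 1.0, -1.0)  # stabilize the division
        return a * (Wg.T @ (R / zg))
    return rule

def pass_relu(z, R):
    """Replacement backward for ReLU in LRP: relevance passes through unchanged."""
    return R

def run_backward(x, z1, h, W1, W2, signal, dense_bw, relu_bw):
    """Traverse x -> W1 -> ReLU -> W2 in reverse with pluggable backward functions."""
    signal = dense_bw(h, W2, signal)
    signal = relu_bw(z1, signal)
    return dense_bw(x, W1, signal)

rng = np.random.default_rng(4)
W1, W2 = rng.normal(size=(6, 4)), rng.normal(size=(3, 6))
x = rng.uniform(size=4)            # non-negative input, e.g. pixel intensities
z1 = W1 @ x
h = np.maximum(z1, 0.0)
y = W2 @ h
c = int(np.argmax(y))
onehot = np.eye(3)[c]

grad = run_backward(x, z1, h, W1, W2, onehot, gradient_dense, gradient_relu)
relevance = run_backward(x, z1, h, W1, W2, onehot * y[c], gamma_dense(0.25), pass_relu)
# with gamma = 0 this bias-free ReLU network gives exactly gradient x input
lrp0 = run_backward(x, z1, h, W1, W2, onehot * y[c], gamma_dense(0.0), pass_relu)
```

The `lrp0` line shows why gradient machinery is a natural host for such rules. With gamma set to zero, the swapped-in functions reproduce gradient times input for this bias-free ReLU network.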

Fig 6.

A list of pre-made rule composites provided by Zennit.
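
A composite can be thought of as a mapping from layer types to rules, optionally with a dedicated rule for the first parameterized layer. As we understand Zennit's documentation, EpsilonGammaBox follows this pattern: a bounded-input (ZBox) rule for the first layer, Gamma for the remaining convolutions, and Epsilon for dense layers. The pure-Python sketch below only illustrates that mapping logic. `Layer`, `assign_rules`, the layer names, and the rule strings are hypothetical and are not Zennit classes.

```python
from dataclasses import dataclass

@dataclass
class Layer:
    name: str
    kind: str  # e.g. "conv", "dense", "relu"

def assign_rules(layers, type_map, first_rule=None, param_kinds=("conv", "dense")):
    """Return {layer name: rule name}; the first layer with parameters may get first_rule."""
    rules = {}
    seen_param_layer = False
    for layer in layers:
        if layer.kind in param_kinds and not seen_param_layer:
            seen_param_layer = True
            if first_rule is not None:
                rules[layer.name] = first_rule
                continue
        if layer.kind in type_map:
            rules[layer.name] = type_map[layer.kind]
    return rules

# a hypothetical VGG-like layer list
layers = [Layer("features.0", "conv"), Layer("features.1", "relu"),
          Layer("features.2", "conv"), Layer("features.3", "relu"),
          Layer("classifier.0", "dense")]
rules = assign_rules(layers,
                     type_map={"conv": "gamma", "dense": "epsilon", "relu": "pass"},
                     first_rule="zbox")
```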

Fig 7.

An example of how the SequentialMergeBatchNorm canonizer is used.

The canonizer merges batch normalization layers with the linear layers preceding them in a configuration of layers commonly used in models of the VGG family.
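
In inference mode, batch normalization is an affine map, so it can be folded into the preceding linear layer. This leaves a single linear layer to which LRP rules can be applied. The NumPy sketch below shows only the arithmetic for a dense layer (for a convolution, the same per-output-channel scaling applies to the kernel). The function `merge_batchnorm` is hypothetical, and `eps=1e-5` is chosen to match PyTorch's BatchNorm default.

```python
import numpy as np

def merge_batchnorm(W, b, gamma, beta, mean, var, eps=1e-5):
    """Fold y = gamma * (W @ x + b - mean) / sqrt(var + eps) + beta into W', b'."""
    scale = gamma / np.sqrt(var + eps)
    W_merged = W * scale[:, None]          # scale each output row of the weights
    b_merged = (b - mean) * scale + beta
    return W_merged, b_merged

rng = np.random.default_rng(2)
W, b = rng.normal(size=(3, 4)), rng.normal(size=3)
gamma, beta = rng.uniform(0.5, 1.5, size=3), rng.normal(size=3)
mean, var = rng.normal(size=3), rng.uniform(0.5, 2.0, size=3)
x = rng.normal(size=4)

# reference: linear layer followed by batch normalization in inference mode
reference = gamma * (W @ x + b - mean) / np.sqrt(var + 1e-5) + beta
W_m, b_m = merge_batchnorm(W, b, gamma, beta, mean, var)
merged = W_m @ x + b_m
```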

Fig 8.

Heatmaps of attributions of lighthouses, using the pre-trained VGG-16 network provided by Torchvision.

The composite EpsilonGammaBox was used to compute the attributions. Below the input images at the top, each row shows a different color map natively supported in Zennit. The meaning of each color value is shown to the left of each row; e.g., for coldnhot, negative relevance is light blue or blue, irrelevant pixels are black, and positive relevance is red to yellow. The Custom color map showcases the color-map specification language, which also supports discrete color maps. The specification of the custom color map is '70:0d0,70:ddd,90:ddd,90:d0d'.
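
To show how a specification like '70:0d0,70:ddd,90:ddd,90:d0d' can encode a discrete map, here is a small parser and lookup. This is not Zennit's parser. It assumes that the part before each colon is a hexadecimal position from 00 to ff on the color-map scale, that the part after it is a 3-digit hex RGB color, and that nodes are listed in ascending order, with two nodes at the same position producing a hard step. Under these assumptions, the custom map is green below 0x70, light gray between 0x70 and 0x90, and magenta above 0x90. If zero relevance sits at the middle of the scale, near-zero relevance is therefore shown in gray.

```python
def parse_cmap(spec):
    """Parse 'IDX:RGB,...' into (index, (r, g, b)) nodes (assumed syntax, see above)."""
    nodes = []
    for item in spec.split(","):
        idx, color = item.split(":")
        rgb = tuple(int(ch * 2, 16) for ch in color)  # 3-digit hex: 'd' -> 0xdd
        nodes.append((int(idx, 16), rgb))
    return nodes

def lookup(nodes, v):
    """Map v in 0..255 to a color by linear interpolation between neighboring nodes;
    two nodes sharing an index produce a hard step."""
    if v <= nodes[0][0]:
        return nodes[0][1]
    if v >= nodes[-1][0]:
        return nodes[-1][1]
    for (i0, c0), (i1, c1) in zip(nodes, nodes[1:]):
        if i0 <= v <= i1 and i0 != i1:
            t = (v - i0) / (i1 - i0)
            return tuple(round(a + t * (b - a)) for a, b in zip(c0, c1))
    return nodes[-1][1]

nodes = parse_cmap('70:0d0,70:ddd,90:ddd,90:d0d')
```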

Fig 9.

The ViRelAy user interface.

Highlighted points are: (1) Project selection, (2) analysis setup and category selection, (3) color map selection, (4) data/attribution visualization mode selection, (5) image sampling mode selection, (6) import/export/share current selection, (7) 2D visual embedding canvas, (8) auxiliary score plot, (9) cluster point selection, and (10) data/attribution visualization.

Fig 10.

Comparison of t-SNE embeddings of classes “bird” (left) and “horse” (right).

Each data point represents a spectral embedding of an attribution of a sample from the dataset that was projected into 2-dimensional space using t-SNE. The colors indicate the different clusters identified by the currently selected clustering method.

Fig 11.

Input images and their respective feature attributions.

Top: input images of the samples in an outlier cluster of the class horse. Bottom: gray-scale versions of the same images with the attribution heatmap superimposed (sample viewer in overlay display mode). Increasing positive relevance is shown from red to yellow to white; increasing negative relevance from blue to cyan. Reprinted from http://www.pferdefotoarchiv.de as part of the PASCAL VOC 2007 dataset under a CC BY license, with permission from Lothar Lenz, original copyright Lothar Lenz 2007.
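
An overlay of this kind can be sketched as follows. Convert the image to gray scale, normalize the relevance symmetrically to [-1, 1], color positive values on a red, yellow, white ramp and negative values on a blue to cyan ramp, and blend in proportion to the relevance magnitude. The luma weights, ramps, and `alpha` are assumptions chosen to match the caption's description. They are not ViRelAy's renderer, and `overlay` is a hypothetical name.

```python
import numpy as np

def overlay(image, relevance, alpha=0.7):
    """Blend a signed relevance map (H, W) onto a gray-scale copy of an RGB image (H, W, 3) in [0, 1]."""
    gray = image @ np.array([0.299, 0.587, 0.114])        # ITU-R BT.601 luma weights
    base = np.repeat(gray[..., None], 3, axis=-1)
    r = relevance / (np.abs(relevance).max() + 1e-12)     # scale to [-1, 1], keep the sign
    p, n = np.clip(r, 0.0, 1.0), np.clip(-r, 0.0, 1.0)    # positive / negative parts
    # positive ramp: red -> yellow -> white
    pos_rgb = np.stack([np.clip(3 * p, 0, 1), np.clip(3 * p - 1, 0, 1),
                        np.clip(3 * p - 2, 0, 1)], axis=-1)
    # negative ramp: blue -> cyan
    neg_rgb = np.stack([np.zeros_like(n), np.clip(2 * n - 1, 0, 1),
                        np.clip(2 * n, 0, 1)], axis=-1)
    heat = pos_rgb + neg_rgb                              # at most one part is nonzero per pixel
    weight = alpha * np.abs(r)[..., None]                 # irrelevant pixels keep the gray image
    return (1 - weight) * base + weight * heat

rng = np.random.default_rng(3)
image = rng.uniform(size=(8, 8, 3))
relevance = np.zeros((8, 8))
relevance[2:4, 2:4] = 1.0     # a positive blob
relevance[5, 5] = -0.5        # a weaker negative spot
out = overlay(image, relevance)
```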

Fig 12.

Test accuracies and tau scores on CIFAR-10.

Left: test accuracy of models trained on the poisoned training set (vertical axis) vs. the clean training set (horizontal axis). Right: tau score [21] of HDBSCAN-SpRAy on models trained on poisoned data (vertical axis) vs. models trained on clean data (horizontal axis). The small, light dots are individual trials (models), and the larger, thick dots are the empirical means over the trials for each color. Each color denotes the class that was poisoned in that setting, where each setting is a 2-class classification against the preceding class (e.g., class 0 means the task was class 9 versus class 0, with class 0 poisoned). The dashed line marks the points at which the values for poisoned and clean training would be equal.

Fig 13.

Heatmaps of attributions of lighthouses for VGG-16.

The attribution scores were computed for the pre-trained VGG-16 network with BatchNorm provided by Torchvision. The model correctly predicted all images as class “lighthouse”. The attributions were visualized with the color map coldnhot (negative relevance is light blue or blue, irrelevant pixels are black, positive relevance is red to yellow).

Fig 14.

Heatmaps of attributions of lighthouses for ResNet50.

The attribution scores were computed for the pre-trained ResNet50 network provided by Torchvision. The model correctly predicted all images as class “lighthouse”. The attributions were visualized with the color map coldnhot (negative relevance is light blue or blue, irrelevant pixels are black, positive relevance is red to yellow).
