On Pixel-Wise Explanations for Non-Linear Classifier Decisions by Layer-Wise Relevance Propagation
The blue dots are labeled negatively, the green dots positively. Left: Local gradient of the classification function at the prediction point. Right: Taylor approximation relative to a root point on the decision boundary. This figure depicts the intuition that the gradient at a prediction point x—here indicated by a square—does not necessarily point to a nearby point on the decision boundary. Instead it may point to a local optimum or to a far-away point on the decision boundary. In this example the explanation vector from the local gradient at the prediction point x contributes too strongly in an irrelevant direction, while the closest neighbors of the other class lie at a very different angle. Thus, the local gradient at the prediction point x may not be a good explanation of the contributions of single dimensions to the function value f(x). Local gradients at the prediction point in the left image and at the Taylor root point in the right image are indicated by black arrows. The nearest root point x0 is shown as a triangle on the decision boundary. The red arrow in the right image visualizes the approximation of f(x) by Taylor expansion around the nearest root point x0. The approximation is given as a vector representing the dimension-wise product between Df(x0) (the black arrow in the right panel) and x − x0 (the dashed red line in the right panel), which is equivalent to the diagonal of the outer product between Df(x0) and x − x0.
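The dimension-wise product described in the caption can be sketched numerically. The following toy example (not the paper's implementation; the classifier f, the reference point, and the bisection search are all illustrative assumptions) finds a root point x0 with f(x0) = 0 on the segment between a positively classified point and a negatively classified one, then assigns each input dimension the relevance R_d = [Df(x0)]_d · (x_d − x0_d), whose sum approximates f(x) up to a second-order remainder.

```python
def f(x):
    # Hypothetical non-linear classifier: f(x) > 0 means positive class,
    # f(x) = 0 is the decision boundary.
    return x[0] ** 2 + x[1] - 1.0


def grad(fun, x, eps=1e-6):
    # Central-difference numerical gradient Df(x).
    g = []
    for d in range(len(x)):
        xp, xm = list(x), list(x)
        xp[d] += eps
        xm[d] -= eps
        g.append((fun(xp) - fun(xm)) / (2 * eps))
    return g


def root_on_segment(fun, a, b, iters=80):
    # Bisection along the segment a->b (with fun(a) and fun(b) of opposite
    # sign) to locate a root point x0 on the decision boundary.
    fa = fun(a)
    for _ in range(iters):
        m = [(ai + bi) / 2 for ai, bi in zip(a, b)]
        if fun(m) * fa > 0:
            a = m
        else:
            b = m
    return [(ai + bi) / 2 for ai, bi in zip(a, b)]


# Prediction point (positive class) and an assumed negative reference point.
x = [1.0, 1.0]
x_ref = [0.0, 0.0]

x0 = root_on_segment(f, x, x_ref)      # nearest root point along the segment
g = grad(f, x0)                        # Df(x0), the black arrow
# Dimension-wise product Df(x0) * (x - x0): the diagonal of the outer product.
R = [gd * (xd - x0d) for gd, xd, x0d in zip(g, x, x0)]

print("f(x0) ~ 0:", f(x0))
print("relevances R:", R)
print("sum(R):", sum(R), "vs f(x):", f(x))
```

Because f is quadratic in x[0], the first-order Taylor sum matches f(x) only up to a second-order error term; for a linear classifier the decomposition would be exact.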