Using slisemap to interpret physical data

Manifold visualisation techniques are commonly used to visualise high-dimensional datasets in physical sciences. In this paper, we apply a recently introduced manifold visualisation method, slisemap, on datasets from physics and chemistry. slisemap combines manifold visualisation with explainable artificial intelligence. Explainable artificial intelligence investigates the decision processes of black box machine learning models and complex simulators. With slisemap, we find an embedding such that data items with similar local explanations are grouped together. Hence, slisemap gives us an overview of the different behaviours of a black box model, where the patterns in the embedding reflect a target property. In this paper, we show how slisemap can be used and evaluated on physical data and that it is helpful in finding meaningful information on classification and regression models trained on these datasets.


Introduction
Many real-world datasets, including those in physics, are tabular; rows correspond to data items (points) and columns to features of each data item.One problem when dealing with such datasets is making meaningful summaries.Unsupervised machine learning, e.g., clustering or visualisation, can be used to tackle the problem of understanding datasets with many features.In manifold visualisation the objective is to find an embedding that "compresses" the high-dimensional data into, typically, two dimensions.Manifold visualisation is central to exploring and understanding complex scientific datasets [1][2][3][4].
Manifold visualisation methods, such as [5][6][7], find embeddings such that similar data items in the original (high-dimensional) feature space end up nearby in the embedding.A challenge when applying these visualisation methods is that the resulting embedding depends on how we measure similarities between the data points.
On the other hand, supervised learning is increasingly applied in physical sciences [8].Physics simulations and powerful supervised learning modes are often "black boxes": it is not apparent by which criteria the model outputs are related to properties of, e.g., simulated molecules.This paper shows examples of how quantum chemical simulations are used to estimate molecular properties and how regression models, such as random forests, can classify particle jets from high-energy physics.Using black box models can be problematic if the aim is to understand the underlying phenomena.Additionally, to trust the predictions, it is helpful to understand by which criteria the predictions are made.
Explainable Artificial Intelligence (XAI) is a branch of machine learning that tries to open up these black boxes; see, e.g., [9,10] for reviews.Explanation methods can be split into model-specific and model-agnostic methods.The latter work on any supervised learning model and is what we consider here.Explanations can further be divided into global and local explanations.Global explanations try to explain the whole black-box model, but here, we are interested in local explanations for individual predictions.
Many model-agnostic local explanation methods find simpler, interpretable models that locally approximate the black-box supervised learning model in the neighbourhood of the points of interest [11][12][13].A common choice is to use linear models for the approximation.Naturally, for linear models to be interpretable, the number of coefficients cannot be too large [14], and the features in the data also need to be understandable [13,15].
Many of these methods require sampling of new data points.If care is not taken, the sampled data points are often unrealistic and unphysical.Evaluating black box models on data unlike the training data leads to unreliable predictions [16] and nonsensical explanations [13].
slisemap [17] is a recently introduced manifold visualisation method that finds local explanations for all data items and an embedding such that nearby data items have similar explanations.slisemap satisfies physical constraints since no sampling of-possibly unphysical-new data points is required.slisemap produces a visualisation where items with similar explanations form visual patterns, such as clusters.
While slisemap is introduced in [17], the focus of this paper is to demonstrate its application to physical datasets.When analysing physical data, an analyst usually has two goals: (i) to understand the structure of the high-dimensional data and (ii) how this data relates to a target property, such as a class label or saturation vapour pressure in the molecular example.Often, a model of some kind (a simulation, a regression model, a neural network, etc.) is used to elucidate the connection between the features and the target.As mentioned above, many model classes do not readily offer insight into how their estimate is formed.slisemap offers both explanations for the target property in the form of local simple models while also providing a visualisation where data points with similar explanations form patterns.
The objectives of this paper are the demonstration of (i) how slisemap can be applied to describe physical datasets (Sect.3.2), (ii) how the resulting visualisations can be evaluated (Sect.3.2), and (iii) that the explanations carry meaningful information for the domain expert (Sect.4).

Related work
slisemap [17] is a method for manifold visualisation and explainable artificial intelligence (XAI).This section briefly reviews related work on these topics and how they are used in science.
Manifold visualisation is commonly applied to explore and understand complex data in many fields of science, from genetics [1,2] and chemoinformatics [3] to astronomy [4] and linguistics [18].Generally, manifold visualisation is used to find structures, such as clusters, in large and high-dimensional datasets.
Manifold visualisation aims to present high-dimensional data as a low-dimensional (usually 2D) embedding while preserving the maximum amount of information based on preset criteria.In science, standard manifold visualisation methods include linear projections [5], such as principal component analysis (PCA) [19], and non-linear methods, such as t-SNE [6], and UMAP [7].These are examples of unsupervised October 25, 2023 2/17 methods; they can be utilized to discover structure in data but cannot offer insight into how the features affect the predictions.For a survey on manifold visualisation techniques, see, e.g., [20].There is also a supervised variant of UMAP [7], and several other supervised manifold visualisation techniques have been proposed, such as linear low-rank projections (LOL) [21].
Results from manifold visualisation methods are often analysed via visual inspection.Hence, we need to verify that the emerging patterns reflect some true quality of the data and are not just artefacts produced by the algorithm in question.A wide range of visualisation evaluation metrics has been proposed, from summary statistics such as stress [22] to visualisations such as Shepard diagrams [23].However, many evaluation metrics rely on the notion of distance both in the input data and the visualisation.In science, data is often tabular, and features can combine class variables, vectors and scalar values with different units.Defining suitable distance measures is, therefore, challenging.Furthermore, not all manifold visualisation methods produce linear or even metric embeddings.
Evaluation metrics inspired by information retrieval have been proposed for non-linear embedding methods, such as precision and recall [24].Incidentally, the definition of precision, or how neighbouring points in the input space are neighbours in the embedding space, serves as a basis for the neighbourhood quality metric we introduce in Sect.3.2.For a thorough quantitative survey concerning the evaluation of visualisations, see [25].
Explainable AI most often seeks to enhance the interpretability of black-box models.XAI methods can be classified along two axes: model-specific vs. model-agnostic and local vs.global.Model-specific methods can only be applied to single (or class of) models, while model-agnostic methods can be used for any model.Global methods aim to summarise the entire black-box model, while local methods explain limited regions of the data manifold, e.g., near a selected data item.
As black-box models can be very complex, summarising them holistically and interpretably is challenging.As such, many model-agnostic explanation methods provide only local explanations.A common approach is to use a interpretable surrogate model to locally approximate the black-box model [9].
These methods include lime [11], shap [12] and slise [13].lime provides sparse linear models as explanations for classifiers, with a penalty function for local model complexity to ensure interpretability.The method learns these sparse local models by sampling (generating) new data points around existing ones and utilizes the complex model to predict the target property for these new points.shap uses additive feature attributions (linear models on binary variables) where the feature contributions are estimated using ideas from cooperative games in game theory.Like lime, shap also relies on sampling new data items near the input points.Sampling-based methods, however, have a weakness when analysing datasets with underlying constraints.Such constraints can incorporate, e.g., physical conservation laws such as energy conservation.
slise [13,26] is an XAI method explicitly designed to not rely on sampling new data.The method is based on robust regression.slise splits the data into a set of potential outliers and a set of non-outliers.This translates to explanations if we consider items with different local explanations as outliers (with respect to a selected data item).Non-outlier points then specify a linear model, while outlier points are ignored.
For a broader review of XAI methods, we refer to, e.g., [9,10] for surveys of XAI methods and [15] for XAI in chemistry.
In this section, we introduce the methods for this paper.We begin, in Sect.3.1, with a brief overview of slisemap [17].and in Sect.3.2, we describe how we use slisemap and define several evaluation criteria.

SLISEMAP
slisemap [17] is a tool for visualising all local explanations in a dataset.Specifically, explanations are interpretable models that locally approximate the complex model, similar to [11][12][13].slisemap finds low-dimensional embedding and local models (explanations) such that items with similar local models end up next to each other.In slisemap, the local models are not limited to any single class; however, in this paper, we only consider simple linear regression and classification models.Formally, slisemap solves the following problem: Problem 1 Assume you are given a dataset of n data items (x 1 , y 1 ), . . ., (x n , y n ), where x i ∈ R m are the vectors of features and y i ∈ R o are the targets, and a radius r ∈ R >0 .For every data item i ∈ {1, . . ., n}, find the embedding where D(•) is the Euclidean distance and l(•, •) is a loss function for the local models under the constraint that The rows of B ∈ R n×p contain the parameters for the local models f i , where p is the number of parameters in the local white-box models, and λ lasso ≥ 0 and λ ridge ≥ 0 are the parameters for Lasso and Ridge regularisation, respectively.
Here we consider 2-dimensional embeddings (d = 2) with radius r = 3.5, as recommended by [17].We use linear regression for f i and mean squared error as l for regression problems.With classification problems, we use logistic regression for f i and Hellinger loss as l; see [17] for details.We optimise Eq. (1) using lbfgs [27] combined with a greedy heuristic for escaping local optima by using the slisemap library [28]. 1e have used χiplot visualisation platform for data exploration [29].

Workflow and Performance Measures
We start by normalising the data into zero mean and unit variance.This lets us use the built-in L 1 and L 2 regularisations for the local models to avoid overfitting.L 1 regularisation typically yields sparse solutions [30], making interpretation easier [9].
slisemap produces an embedding and a local explanation for each data item.To verify that the embedding and local models are reliable, we introduce three performance measures to test for various aspects of the solution: (i) permutation loss (a sanity check), (ii) local model stability (whether the found local models are stable), and (iii) neighbourhood stability (whether points have a stable and unique local model).We use these measures in Sect.4.4.

Permutation Loss
The permutation loss measures whether the slisemap solution captures a connection between the features and the target variable.More specifically, denote by L permuted the loss of Eq. ( 1) on a slisemap solution optimised after the target variable y has been randomly permuted, and by L the loss on the original unpermuted data.We define permutation loss as We expect M permutation < 1; otherwise, the slisemap has not been able to capture any relations between the features and the target variable.The permutation loss acts as a sanity check for the solution.

Local Model Stability
The local model stability measures the stability of the found local models with respect to the resampling of the data, i.e., can we trust the local models?Assume we have obtained two datasets of the same size, e.g., by resampling n data points without replacement from the original data, resulting in two sets of local models {f 1 , . . ., f n } and {f ′ 1 , . . ., f ′ n }.To compare the similarity of the local model populations, we use the Hungarian algorithm to find a permutation π such that the sum of distances is minimised: where • || is the Euclidean distance between the coefficients of the local models and

Neighbourhood Stability
The neighbourhood stability measures the stability of the neighbourhoods with respect to the resampling of the data, i.e., is the relative location of individual data points stable under resampling?To compute neighbourhood stability, we sample two datasets of size n, which share 50 % of the data items.Denote by S the set of points shared by these two datasets, and by {z 1 , . . ., z n } and {z ′ 1 , . . ., z ′ n } the points in the two embeddings.We define the neighbourhoods of point i in the first embedding by in the second embedding.The neighbourhood stability is defined as an average Jaccard similarity between the neighbourhoods:

Explanation quality
In addition to the metrics described above, we want to measure the quality of the explanations produced by slisemap.To measure explanation quality, we use three metrics: local loss, local loss with nearest neighbours and coverage with nearest neighbours.These metrics are described in [17], and we have included their definitions in the Supporting Information (Sect.S3).Some metrics require choosing additional parameters; in such cases, we use values found in [13].
Local loss measures how well the local models fit their targets (or, in case of nearest neighbours local loss, those of their neighbours).Coverage measures how well the local models generalise to their neighbours.For comparison, we also calculate these values for other widely used manifold visualisation methods, namely PCA, t-SNE and UMAP.To produce explanations for these alternative methods, we first calculate an embedding October 25, 2023 5/17 with each one and then train local models (similar to slisemap) based on that embedding.The results from the comparison can be found in Tab. 1.

Analysis
slisemap is foremost designed as a tool for investigating and understanding black box models.Therefore, the main way to analyse the solutions is through plots [29].We demonstrate how to analyse slisemap solutions for multiple datasets in Sect. 4 and how domain expertise can help interpret and validate the solutions.Analysing potentially thousands of local models is not feasible.Hence, we often cluster the local models using k-means clustering in the local model parameter space, i.e., the rows of the matrix B. If a suitable number of clusters is chosen, the cluster centroids are a good proxy for most local models.However, data items in lower-density areas of the embedding might have very different local models from the cluster centroids.Since they also have small neighbourhoods, the local models are at risk of overfitting, and care should be taken when interpreting these points.Notice that we use k-means clustering after slisemap only to help visualise and interpret the results.

Use Cases
This section demonstrates how slisemap can be applied to physical datasets and how we analyse and verify the solutions using domain knowledge and the metrics from above. 2 In Sect.4.1, we study a dataset from atmospheric science; in Sect.4.2 a dataset from high energy physics, and in Sect.4.3 a dataset about organic chemistry.Finally, in Sect.4.4, we evaluate the solutions using the performance measures from Sect.3.2.

Atmospheric relevant organic molecules: GeckoQ
We applied slisemap to 31,637 atmospheric relevant organic compounds contained in the GeckoQ dataset [31], collected by one of the authors.Each molecule has a variety of properties, such as the number of specific functional groups, saturation vapour pressure (p Sat ) and topological fingerprint (TopFP) descriptor [32,33].
The p Sat is a measure of the affinity of a molecule to condense into the liquid phase or to remain in the gas phase.This property is relevant in atmospheric science because low-volatile organic compounds are known to be driving factors for particle formation in the atmosphere [34].The GeckoQ p Sat have been calculated with the Conductor-like Screening Model for real solvents (COSMO-RS) [35,36], a quantum chemistry-based method, and will be the (log-transformed) target variables in following slisemap embeddings.A subset of molecular properties was chosen as interpretable features to construct the slisemap embeddings.A list of all explainable features can be found in the Supporting Information (Sect.S1).
Fig. 1 shows a slisemap embedding of the GeckoQ data.The local model coefficients generally match chemical expectations, meaning functional groups (FGs) that are especially known to lead to a low p Sat have the most negative coefficients in their local models: hydroxyl, hydroperoxide, carboxylic acid, and carbonylperoxyacids.These FGs can form hydrogen bonds, the strongest type of inter-molecular dipole-dipole interactions, and thus, the molecules are particularly strongly bound within the liquid phase, which leads to a low p Sat .The incidence of each FG by cluster can be found in the Supporting Information (Fig. S1).To further substantiate the slisemap embedding's agreement with the chemical expectancies, we chose cluster 1 (orange) and cluster 6 (pink), two visually distinct clusters for detailed analysis.In cluster 1, the median target (non-normalised) is 2.15 • 10 −9 mbar (1686 molecules), whereas cluster 6 contains molecules with a higher median p Sat , 5.75 • 10 −6 mbar (1862 molecules).The median p Sat for all the data is 1.55 • 10 −6 mbar.Fig. 2a depicts the fraction of molecules in the chosen clusters and the overall data containing FGs forming hydrogen bonds.Cluster 1 contains considerably more carboxylic acid groups than cluster 6 and the overall data, which can be directly linked to its particularly low median p Sat .Additionally, cluster 1 shows a higher incidence of hydroxyl groups, while cluster 6 has a higher fraction of carbonylperoxyacid groups.Both clusters have a similar fraction of hydroperoxide groups.
Differences between clusters also become apparent through binning the data points in the slisemap embedding and analysis of the median (normalised) target values in the bins (cf.Fig. 2b).Generally, regions of low and high p Sat emerge.These regions roughly correspond to the clusters determined by slisemap in Fig. 1, which is the most distinct in the points corresponding to clusters 1 and 6.The existence of these regions means that slisemap can capture this target-feature relationship.For comparison, Fig. 2c depicts the binned logarithm of the median target for a t-SNE embedding, an alternative way of visualising data projected into a two-dimensional space that is popular in the field of chemoinformatics.t-SNE embeds the data compactly with a tail in the negative and positive x-direction.Unlike the slisemap embedding, the t-SNE does not display distinct regions of low and high p Sat .This is further reflected in the ranges of the median values: t-SNE has a smaller range (−3.49 to 2.86) than slisemap (−4.56 to 3.60).
Furthermore, the explanation quality metrics (see Tab. 1) show that slisemap scores higher in all three explanation quality metrics than t-SNE.This reinforces how slisemap is able to prioritise patterns related to the target when building the embedding, whereas t-SNE, due to its unsupervised nature, cannot group molecules with similar behaviour together.

Elementary Particle Jets
This dataset contains simulated LHC proton-proton collisions [37].The collisions create elementary particles such as quark s and gluons.Quarks and gluons decay into cascades of stable particles, called jets, before being detected.The classification task is to determine if the particle that created a jet was a quark or a gluon [38].
To get probabilities for the jets, we train a random forest classifier [39] with 100 trees and 50 leaves per tree.When applying slisemap [17] on this dataset, we use logistic regression as the local models f i in Eq. ( 1).
The resulting solution can be seen in Fig. 3.The local models match the underlying quantum chromodynamics [38].Wider jets (high jetGirth and QG axis2) and jets with more particles (QG mult) are more gluon-like.Splitting the multiplicity based on charge should not offer any more information.The total momentum (jetPt) usually does not matter, except for higher energies where quarks are more likely.How much of the total momentum parallels the jet (QG ptD) is also helpful in finding quarks.
Even though all local models match the theory, there are some differences in magnitudes.E.g., the blue cluster 0, in Fig. 3, contains most of the highest energy jets.Since multiplicity and, indirectly, the spread is also dependent on the energy; these variables are less helpful in this subset.In contrast, the orange cluster 1 contains many jets easily classified as gluons based on QG ptD alone.Summarising, the SLISEMAP explanations are consistent with the physical knowledge (gluon jets are generally wider) and provided additional insights into the features used by the random forest classifier.Finally, we applied slisemap to the molecules in the QM9-dataset [40,41], a data set of 133,814 small organic molecules.We utilised HOMO energies obtained from [42,43] as targets and created interpretable features with the Mordred molecular descriptor calculator [44].Here, we will focus on qualitative analysis based on the slisemap embedding and omit a more technical chemical analysis.Fig. 4 displays a similar embedding structure to the GeckoQ embedding: the bulk of the data points group Table 1.Comparison of explanation measures for slisemap, PCA, t-SNE and UMAP for the datasets described in this paper; computed as described in [17].Bold values indicate the best performance, and the error bounds the standard deviation with respect to the resampling of the data.

Model
Local around a few centres, and a few data points are spread out.However, unlike GeckoQ, the clusters are much more distinct.Setting the number of clusters to four, we can see a central cluster flanked by three others.For the central cluster, the number of triple bonds (nBondsT) has relatively higher importance than for the other clusters, as can be seen in the right panel in Fig. 4. The orange and red clusters place high importance on the number of hydrogen bond donors (nHBDon).They are distinguished from one another by the number of atoms (nAtoms), which is of high importance to the red cluster while nearly inconsequential for the orange one, and the number of sulphur bonds (nBondsS), which sets the orange cluster apart from the others.

Evaluation of the solutions
It is essential to evaluate whether the resulting visualisations contain useful information or whether we are just visualising random noise.The three performance measures described in Sect.3.2 are designed to evaluate various aspects of visualisation.We depict the performance measures as a function of the sample size in Fig. 5.For each measure and subsampled size, we train ten models and average over them to get the final performance measure value.To provide a baseline for the stability measures, we also calculate the measure between slisemaps trained on actual data and those trained on data with randomly permuted target variables.
The permutation loss for each dataset remains less than one (the permuted baseline), assuring that slisemap has learned from the data.The local model stability shows similar behaviour, rapidly falling below the baseline as sample size increases, especially for the particle jets dataset.The measure also suggests that the sample sizes (10,000, 10,000 and 31,637 for the QM9, jets and Gecko datasets, respectively) are reasonable.
The neighbourhood stability is reasonable for the particle jets data set but worse for the molecular datasets.But, as noted in [17] equally viable explanations.We argue that the presence of these alternative explanations is inherent to local explanation methods, including slisemap.When we use linear models as slisemap explanations, data items near the intersection of the local model hyperplanes could be explained almost equally well by both intersecting models.As a practical consequence, one should be careful when, e.g., studying individual molecules in the slisemap clusters (as the molecule could potentially also be allocated to another cluster).However, even for the molecular dataset, the set of local models is stable.
It follows that all slisemap embeddings are non-random (permutation loss is less than one), and the set of local models (local model stability) is informative.However, for both molecular datasets, a single molecule can have several local explanations (neighbourhood stability), a known feature of local explanations, which SLISEMAP makes visible (see [17], Sect.4.5, for further discussion).
As for explanation quality, Tab. 1 shows that slisemap produces best explanations for each of the three datasets based on all of the chosen quality metrics.

Discussion and Conclusions
In this paper, we apply slisemap [17], a recent XAI and visualisation method, on three scientific datasets.Using domain knowledge, we verify that the explanations are consistent with the chemical and physical theories.
slisemap captures the chemical expectations set for the GeckoQ data.The local models match the different natures of the functional groups.Further, slisemap can find underlying structures in the atmospheric data and reflect the relationship between the target and the features, which are generally known but hardly quantifiable.The local models in the Jets dataset also adhere to (known) physics.However, the subgroups with slightly different models were new but, after analysis, justifiable.For the QM9 dataset, structure emerges in the slisemap embedding with distinctively behaving clusters of molecules with different local explanations for each of the clusters.
In Sect.3.2, we introduce stability measures for the slisemap embedding.Our stability measures reveal that the explanations for individual data items are often not unique, a fact that slisemap (unlike most other local explanation methods) makes apparent [17].However, in Sect.4.4, we show that the collection of explanations (local models) found by the slisemap are informative and stable for all our use cases.Moreover, Tab. 1 shows that slisemap captures patterns for predicting the target value October 25, 2023 11/17 better than other manifold visualisation methods widely used in these domains, as measured through the local models.As mentioned in Sect.4.4, there is ambiguity in the selection of local explanations.Nonetheless, it would be interesting to, in the future, analyse the alternative local explanations in depth.Furhtermore, our results suggest that while explanations for individual points are not guaranteed stability, the overall distribution of local models should be.Constructing these sets of plausible local models and, by extension, approximating the overall distribution could allow for uncertainty quantification.
In conclusion, we have demonstrated a workflow for applying slisemap and shown how slisemap can provide physically sound insight for analysing complex physical datasets.

Fig 1 .
Fig 1.The slisemap embedding of the GeckoQ data in the left panel.The number of clusters (seven) was chosen via visual inspection.The right panel includes the average local coefficients of each cluster.

Fig 2 .
Fig 2. (a) Fraction of molecules that contain at least one FG of hydroxyl, hydroperoxide, carboxylic acid, carbonylperoxyacid grouped by clusters 1, 6 and all the clusters.(b) slisemap and (c) t-SNE embedding, where the data points are binned, and the colour map corresponds to the median normalised target of the bins.Clusters 1 and 6 are encircled.

Fig 3 .
Fig 3. slisemap solution for 10 000 jets from the particle jets dataset.The data items have been clustered according to the local models.The coefficients for all local models follow physical theory but with varying magnitudes.

Fig 4 .
Fig 4. slisemap embedding for the QM9 data set (10,000 molecules), clustered with 4 clusters.The right panel shows the ten most influential features for predicting the target (HOMO energy).

Fig 5 .
Fig 5. Permutation loss, local model stability and neighbourhood stability (Sect.3.2) for the use cases as a function of (resampled) dataset size.Smaller values are better.Dashed lines indicate reference values for baseline datasets where the target variables have been permuted randomly.