Spec2Vec: Improved mass spectral similarity scoring through learning of structural relationships

Fig 6

Comparison of spectra clustering using modified cosine (left) or Spec2Vec (right) across a range of similarity score cutoffs.

Cluster quality is assessed by measuring the average structural similarity across all linked pairs within each cluster. Setting a structural similarity threshold of 0.5 (see Fig 3A) makes it possible to compare the fraction of spectra that end up in chemically homogeneous clusters (red) with the fraction in more heterogeneous clusters (green) and the fraction of spectra that are not clustered at all (those with no links above the chosen similarity cutoff). Clustering is done here by creating edges between spectra (= nodes) for similarities above a certain cutoff (adding max. 10 links per node). To make the resulting clustering more robust and more comparable across different scores, we used the Louvain algorithm to break up the large clusters. Dashed squares mark regions of relatively high retrieval (high fraction of clusters with high structural similarity) and high accuracy (large discrepancy between the fraction of high structural similarity clusters and the fraction of low structural similarity clusters). Overall, Spec2Vec clusters a higher fraction of spectra into high structural similarity clusters (> 35% of all spectra are in high similarity clusters for a Spec2Vec similarity threshold of 0.7).
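The procedure described in this caption can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several details are assumptions made for illustration: edges are chosen greedily from the strongest pair downwards so that no node exceeds `max_links`, clusters are taken as plain connected components (the Louvain step used in the figure to split large clusters is omitted here), a cluster counts as "high" when its mean structural similarity over linked pairs is at least 0.5, and all function names and the toy matrices are hypothetical.

```python
from collections import defaultdict


def build_edges(sim, cutoff, max_links=10):
    """Link spectra whose similarity exceeds `cutoff`.

    Assumption: pairs are added strongest first, and a pair is skipped
    if either spectrum already has `max_links` links.
    """
    n = len(sim)
    pairs = [(sim[i][j], i, j)
             for i in range(n) for j in range(i + 1, n)
             if sim[i][j] > cutoff]
    pairs.sort(reverse=True)
    degree = [0] * n
    edges = []
    for _, i, j in pairs:
        if degree[i] < max_links and degree[j] < max_links:
            edges.append((i, j))
            degree[i] += 1
            degree[j] += 1
    return edges


def find_clusters(edges):
    """Connected components of the graph (stand-in for the Louvain step)."""
    adjacency = defaultdict(set)
    for i, j in edges:
        adjacency[i].add(j)
        adjacency[j].add(i)
    seen, clusters = set(), []
    for start in sorted(adjacency):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node not in component:
                component.add(node)
                stack.extend(adjacency[node] - component)
        seen |= component
        clusters.append(component)
    return clusters


def cluster_fractions(n, edges, clusters, struct_sim, threshold=0.5):
    """Fractions of spectra in high/low similarity clusters or unclustered."""
    counts = {"high": 0, "low": 0}
    for component in clusters:
        links = [(i, j) for i, j in edges if i in component and j in component]
        mean_sim = sum(struct_sim[i][j] for i, j in links) / len(links)
        counts["high" if mean_sim >= threshold else "low"] += len(component)
    counts["unclustered"] = n - counts["high"] - counts["low"]
    return {label: count / n for label, count in counts.items()}


# Toy data (hypothetical): 5 spectra, two linked pairs, one isolated spectrum.
spectral_sim = [[0.1] * 5 for _ in range(5)]
structural_sim = [[0.0] * 5 for _ in range(5)]
spectral_sim[0][1] = spectral_sim[1][0] = 0.9
spectral_sim[2][3] = spectral_sim[3][2] = 0.8
structural_sim[0][1] = structural_sim[1][0] = 0.8  # chemically similar pair
structural_sim[2][3] = structural_sim[3][2] = 0.2  # chemically dissimilar pair

edges = build_edges(spectral_sim, cutoff=0.7)
clusters = find_clusters(edges)
fractions = cluster_fractions(5, edges, clusters, structural_sim)
print(fractions)
```

Repeating the last three calls over a range of `cutoff` values would give the three fractions that the figure plots per similarity score cutoff.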

doi: https://doi.org/10.1371/journal.pcbi.1008724.g006