Spec2Vec: Improved mass spectral similarity scoring through learning of structural relationships

Fig 4

Spec2Vec similarity scores deliver improved true-to-false-positive ratios during library matching.

1000 randomly selected spectra, each with an InChIKey that occurs at least twice in the entire dataset, were removed from the AllPositive dataset and then matched against the remaining spectra. Matching was done by pre-selecting spectra with the same precursor-m/z (tolerance = 1 ppm) and then choosing the candidate with the highest spectral similarity score, provided this score was larger than a set threshold. The left plot shows the true-vs-false positive rate when using Spec2Vec (red) or cosine scores (black). Because a precursor-m/z match is required, the modified cosine scores are virtually identical to the cosine scores here and are hence not shown. Labels near the first and final dots report the similarity score thresholds used. The inset in the left plot displays, for the 1000 query spectra, how many spectra with an identical InChIKey are part of the library. The plot on the right displays the resulting accuracy and retrieval rates for the same parameters. Using Spec2Vec, library matching could be done with notably higher accuracy across all tested retrieval rates. Please note: retrieval rates for the cosine score do not fully reach the level of the Spec2Vec-based matching because of the set min_match parameter, which in the presented case assigns a score of 0.0 to each pair with fewer than six matching peaks. Lowering the min_match parameter will increase the retrieval rate but also lower the accuracy (see also Fig A in S3 Text).
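The matching procedure described in the caption can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names (`within_ppm`, `library_match`, `evaluate`) are hypothetical, the similarity scores are assumed to be precomputed by any scoring method, and accuracy and retrieval rate are assumed here to mean true positives divided by all returned matches and returned matches divided by all queries, respectively.

```python
from typing import List, Optional, Sequence, Tuple


def within_ppm(query_mz: float, candidate_mz: float, tolerance_ppm: float = 1.0) -> bool:
    """Check whether two precursor m/z values agree within a ppm tolerance."""
    return abs(candidate_mz - query_mz) / query_mz * 1e6 <= tolerance_ppm


def library_match(
    query_mz: float,
    library_mz: Sequence[float],
    scores: Sequence[float],
    threshold: float,
    tolerance_ppm: float = 1.0,
) -> Optional[int]:
    """Return the index of the best library candidate, or None if nothing qualifies.

    `scores[i]` is the (precomputed) similarity between the query and library
    spectrum i. How the scores are obtained is outside this sketch.
    """
    # Step 1: pre-select candidates by precursor m/z.
    candidates = [
        i for i, mz in enumerate(library_mz) if within_ppm(query_mz, mz, tolerance_ppm)
    ]
    if not candidates:
        return None
    # Step 2: pick the highest-scoring candidate.
    best = max(candidates, key=lambda i: scores[i])
    # Step 3: accept it only if the score is larger than the threshold.
    return best if scores[best] > threshold else None


def evaluate(
    matches: Sequence[Optional[int]],
    query_keys: Sequence[str],
    library_keys: Sequence[str],
) -> Tuple[int, int, float, float]:
    """Count true/false positives and derive accuracy and retrieval rate.

    A match is a true positive when the matched library entry carries the
    same InChIKey string as the query (an assumption of this sketch).
    """
    true_pos = false_pos = 0
    for match, key in zip(matches, query_keys):
        if match is None:
            continue  # no match returned for this query
        if library_keys[match] == key:
            true_pos += 1
        else:
            false_pos += 1
    retrieved = true_pos + false_pos
    accuracy = true_pos / retrieved if retrieved else 0.0
    retrieval_rate = retrieved / len(matches) if matches else 0.0
    return true_pos, false_pos, accuracy, retrieval_rate
```

Sweeping `threshold` over a range of values and calling `evaluate` at each step would produce the kind of true-vs-false positive and accuracy-vs-retrieval curves shown in the two plots.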

doi: https://doi.org/10.1371/journal.pcbi.1008724.g004