rasbhari: Optimizing Spaced Seeds for Database Searching, Read Mapping and Alignment-Free Sequence Comparison

Accuracy of phylogenetic distance estimates based on different pattern sets.

Nine sets of DNA sequence pairs were simulated with distances d between 0.1 and 0.9 substitutions per position. For each pair, the distance was estimated from the number N of spaced-word matches between the two sequences, using the alignment-free method published in [34]. Two types of underlying pattern sets were used: (a) pattern sets generated with rasbhari, which minimize the variance of N, and (b) randomly generated pattern sets. The root mean square error of the estimated distances is plotted against the ‘real’ distances d.
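The two quantities named in the caption, the match count N and the root mean square error, can be illustrated with a minimal Python sketch. It assumes patterns are written as strings of '1' (match position) and '0' (don't-care position) and that N counts all position pairs, summed over the pattern set, at which the two sequences agree on every match position. The function names count_spaced_word_matches and rmse are hypothetical. The sketch does not reproduce the distance estimator of [34], which converts N into a distance.

```python
import math


def count_spaced_word_matches(seq_a, seq_b, patterns):
    """Return N, the number of spaced-word matches summed over all patterns.

    Assumption: a pattern is a string of '1' (match) and '0' (don't care),
    and a match is a pair of start positions (i, j) at which seq_a and seq_b
    carry identical characters on every '1' position of the pattern.
    """
    total = 0
    for pattern in patterns:
        match_positions = [k for k, c in enumerate(pattern) if c == "1"]
        length = len(pattern)

        # Index the spaced words of seq_a by their match-position characters.
        counts = {}
        for i in range(len(seq_a) - length + 1):
            word = "".join(seq_a[i + k] for k in match_positions)
            counts[word] = counts.get(word, 0) + 1

        # Every identical spaced word in seq_b adds one match per occurrence in seq_a.
        for j in range(len(seq_b) - length + 1):
            word = "".join(seq_b[j + k] for k in match_positions)
            total += counts.get(word, 0)
    return total


def rmse(estimates, true_distance):
    """Root mean square error of the distance estimates against the simulated distance d."""
    squared_errors = [(e - true_distance) ** 2 for e in estimates]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


if __name__ == "__main__":
    # Toy input only; these are not data from the paper.
    print(count_spaced_word_matches("ACGTACGT", "ACGTTCGT", ["101", "1101"]))
    print(rmse([0.18, 0.22, 0.21], 0.2))
```

In this reading, one RMSE value would be computed per simulated distance d and per pattern set, which yields the points plotted for (a) and (b).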

