
Scalable phylogenetic profiling using MinHash uncovers likely eukaryotic sexual reproduction genes

Fig 6

To illustrate the advantageous scaling properties of MinHash data structures, synthetic profiles were generated as binary vectors of length 100, with 0 and 1 equally probable at each position.
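A minimal Python sketch of generating such synthetic profiles follows. It assumes Python's standard `random` module as the random source. The function name `synthetic_profiles` and its `seed` parameter are hypothetical and are not taken from the original benchmarking code.

```python
import random


def synthetic_profiles(n_profiles: int, length: int = 100, seed: int = 0) -> list[list[int]]:
    """Random binary profiles in which each entry is 0 or 1 with probability 0.5."""
    rng = random.Random(seed)  # seeded so benchmark inputs are reproducible (an assumption)
    return [[rng.randint(0, 1) for _ in range(length)] for _ in range(n_profiles)]


# Usage: 1,000 profiles of length 100, e.g. one point on a benchmark size sweep.
profiles = synthetic_profiles(1000)
```

Varying `n_profiles` across orders of magnitude while keeping `length` fixed at 100 would reproduce the kind of input sweep the figure describes.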

Profiles were then processed in one of three ways: clustered using explicitly calculated Jaccard distances; reduced to 5 dimensions with truncated SVD, normalized, and explicitly clustered using Euclidean distance, as in SVD-Phy [18]; or transformed into MinHash signatures and inserted into an LSH Forest object, as in our method. The x-axis spans orders of magnitude representative of typical use cases for profiling pipelines. Curves were fitted to each set of timepoints to empirically determine the time complexity of each approach.
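The core comparison can be sketched in Python: the exact Jaccard distance between the sets of positions set to 1, versus an estimate from MinHash signatures, where the fraction of hash functions whose minimum values agree estimates the Jaccard similarity. This is a minimal sketch, not the authors' implementation. The helper names, the use of seeded BLAKE2b hashes in place of random permutations, `num_perm=256`, and the conventions for all-zero profiles are all assumptions. The LSH Forest index itself is omitted; in Python it could be provided by, for example, the `MinHashLSHForest` class of the `datasketch` package. The hypothetical `empirical_exponent` helper shows one simple way to estimate time complexity from timings, the least-squares slope of log(time) against log(input size), assumed here rather than taken from the paper's curve fitting.

```python
import hashlib
import math
import random


def to_set(profile):
    """Indices of the positions where a binary profile is 1."""
    return {i for i, bit in enumerate(profile) if bit}


def jaccard_distance(a, b):
    """Exact Jaccard distance 1 - |A ∩ B| / |A ∪ B| between two index sets."""
    union = a | b
    if not union:
        return 0.0  # assumed convention: two all-zero profiles are identical
    return 1.0 - len(a & b) / len(union)


def _hash(seed, item):
    """64-bit BLAKE2b hash of `item`; each seed acts as a distinct hash function."""
    digest = hashlib.blake2b(f"{seed}:{item}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def minhash_signature(items, num_perm=256):
    """Minimum hash value of the set under each of `num_perm` hash functions."""
    sentinel = 1 << 64  # exceeds every 64-bit hash; used when the set is empty
    return [min((_hash(seed, x) for x in items), default=sentinel)
            for seed in range(num_perm)]


def estimated_jaccard_distance(sig_a, sig_b):
    """1 minus the fraction of hash functions whose minima agree."""
    matches = sum(x == y for x, y in zip(sig_a, sig_b))
    return 1.0 - matches / len(sig_a)


def empirical_exponent(sizes, times):
    """Least-squares slope of log(time) vs log(size): about 1 if linear, 2 if quadratic."""
    xs = [math.log(s) for s in sizes]
    ys = [math.log(t) for t in times]
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


# Usage: compare exact and estimated distances for two random length-100 profiles.
rng = random.Random(1)
p1 = [rng.randint(0, 1) for _ in range(100)]
p2 = [rng.randint(0, 1) for _ in range(100)]
a, b = to_set(p1), to_set(p2)
exact = jaccard_distance(a, b)
approx = estimated_jaccard_distance(minhash_signature(a), minhash_signature(b))
print(f"exact={exact:.3f} minhash={approx:.3f}")
```

Comparing two signatures costs `num_perm` operations regardless of profile length, and an LSH index such as an LSH Forest can retrieve likely neighbours without computing every pairwise distance. This is the general reason MinHash-based search can scale better than all-pairs distance computation.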

doi: https://doi.org/10.1371/journal.pcbi.1007553.g006