Scalable phylogenetic profiling using MinHash uncovers likely eukaryotic sexual reproduction genes
Fig 1
Diagram summarizing the steps of the pipeline that generates the LSH Forest and a hash signature for each hierarchical orthologous group (HOG).
The labelled phylogenetic trees generated by pyHam are converted into phylogenetic profiles, and each profile is used to generate a weighted MinHash signature with Datasketch. The hash signatures are inserted into the LSH Forest and stored in an HDF5 file.
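As a rough illustration of the profile step, the sketch below flattens per-taxon event labels into a weighted vector and computes the exact weighted Jaccard similarity, which is the quantity a weighted MinHash signature approximates. The taxon list, the three event labels (presence, loss, duplication), the uniform weights, and the helper names `profile_vector` and `weighted_jaccard` are all assumptions made for this example; they are not taken from the pipeline's code.

```python
from typing import Dict, List, Set

# Hypothetical taxa and event labels, chosen only for illustration.
# In the pipeline these would come from the labelled pyHam trees.
TAXA: List[str] = [
    "Homo sapiens",
    "Mus musculus",
    "Danio rerio",
    "Saccharomyces cerevisiae",
]
EVENTS: List[str] = ["presence", "loss", "duplication"]


def profile_vector(labels: Dict[str, Set[str]],
                   weights: Dict[str, float]) -> List[float]:
    """Flatten per-taxon event labels into one weighted vector.

    Position i * len(EVENTS) + j holds the weight of event j at taxon i,
    or 0.0 when that event is not labelled for that taxon.
    """
    vec: List[float] = []
    for taxon in TAXA:
        seen = labels.get(taxon, set())
        for event in EVENTS:
            vec.append(weights[event] if event in seen else 0.0)
    return vec


def weighted_jaccard(a: List[float], b: List[float]) -> float:
    """Exact weighted Jaccard: sum of minima divided by sum of maxima."""
    num = sum(min(x, y) for x, y in zip(a, b))
    den = sum(max(x, y) for x, y in zip(a, b))
    return num / den if den else 0.0


# Assumed uniform weights; a real run could weight taxa or events differently.
weights = {"presence": 1.0, "loss": 1.0, "duplication": 1.0}

# Two made-up HOG profiles.
hog_a = {
    "Homo sapiens": {"presence"},
    "Mus musculus": {"presence", "duplication"},
    "Danio rerio": {"loss"},
}
hog_b = {
    "Homo sapiens": {"presence"},
    "Mus musculus": {"presence"},
    "Danio rerio": {"presence"},
}

vec_a = profile_vector(hog_a, weights)
vec_b = profile_vector(hog_b, weights)
similarity = weighted_jaccard(vec_a, vec_b)
print(f"weighted Jaccard similarity: {similarity:.2f}")
```

In a fuller sketch, each vector could be passed to Datasketch's `WeightedMinHashGenerator.minhash` to obtain a fixed-size signature, and the signatures added to a `MinHashLSHForest` for approximate nearest-neighbour queries, so that similar profiles can be retrieved without comparing every pair of HOGs.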