DPCfam: Unsupervised protein family classification by Density Peak Clustering of large sequence datasets

Fig 5

Comparison between areas of sequence space covered by MCs in fully redundant pairs.

MC pairs in parent-child relationships are excluded, leaving 25,980 pairs overall. For each MC, we generate the list of IDs of all proteins that map to at least one MC member (using the profile-HMM-based definition of MC membership, see Methods). Then, for each pair, we calculate the fraction of protein IDs shared between the two MCs, using the length of the shorter of the two protein ID lists as the denominator.
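The shared fraction described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the `mc_to_ids` mapping, and the `excluded_pairs` argument are hypothetical, and each protein ID list is assumed to be treated as a set of unique IDs. Under that assumption, the measure is the size of the intersection divided by the size of the smaller set (the overlap coefficient).

```python
from itertools import combinations


def shared_fraction(ids_a, ids_b):
    """Fraction of protein IDs shared by two MCs, relative to the shorter ID list.

    Assumes each list is treated as a set of unique protein IDs.
    """
    a, b = set(ids_a), set(ids_b)
    smaller = min(len(a), len(b))
    if smaller == 0:
        # Convention chosen for this sketch: an empty MC shares nothing.
        return 0.0
    return len(a & b) / smaller


def pairwise_shared_fractions(mc_to_ids, excluded_pairs=frozenset()):
    """Compute the shared fraction for every MC pair not in excluded_pairs.

    mc_to_ids: hypothetical dict mapping an MC name to its protein IDs.
    excluded_pairs: hypothetical set of frozenset({mc_a, mc_b}) to skip,
    for example pairs in a parent-child relationship.
    """
    fractions = {}
    for mc_a, mc_b in combinations(sorted(mc_to_ids), 2):
        if frozenset((mc_a, mc_b)) in excluded_pairs:
            continue
        fractions[(mc_a, mc_b)] = shared_fraction(mc_to_ids[mc_a], mc_to_ids[mc_b])
    return fractions
```

Normalizing by the shorter list means a small MC fully contained in the protein set of a larger one scores 1.0, regardless of how many additional proteins the larger MC covers.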

doi: https://doi.org/10.1371/journal.pcbi.1010610.g005