Characterized representatives of FUnkFams and omissions from PFam

Posted by morgannprice on 11 Jan 2019 at 21:42 GMT

I compared FUnkFams to a database of 113,300 characterized proteins (from the curated part of PaperBLAST's database) and found that over 1,000 of the FUnkFams have characterized representatives. I didn't expect there to be so many characterized proteins that are not in PFam or the Conserved Domain Database (CDD).

I also did a systematic analysis of characterized proteins that are at least 40 amino acids long, have homologs, and are not in any PFam, and found almost 4,000 such proteins. Some of these proteins have low-complexity sequences and might be entirely unstructured, some have hits against CDD and so would not appear in FUnkFams, and some are diverged (yet functional) proteins that have escaped curation.
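
To make the three criteria concrete, here is a minimal sketch of such a filter in Python. It assumes each protein record already carries its sequence, its list of PFam hits, and a homolog count. The record layout and all names below are hypothetical illustrations, not the code used for the analysis.

```python
from dataclasses import dataclass
from typing import List

MIN_LENGTH = 40  # minimum length in amino acids, as stated in the text


@dataclass
class Protein:
    # Hypothetical record layout, for illustration only.
    protein_id: str
    sequence: str
    pfam_hits: List[str]  # PFam accessions found in the protein (empty if none)
    n_homologs: int       # number of homologs found by some prior search


def characterized_without_pfam(proteins, min_length=MIN_LENGTH):
    """Keep proteins with no PFam hit, sufficient length, and at least one homolog."""
    return [
        p for p in proteins
        if not p.pfam_hits
        and len(p.sequence) >= min_length
        and p.n_homologs > 0
    ]


# Toy data: only the last record passes all three criteria.
proteins = [
    Protein("short", "M" * 30, [], 5),
    Protein("has_pfam", "M" * 200, ["PF00001"], 5),
    Protein("orphan", "M" * 200, [], 0),
    Protein("candidate", "M" * 200, [], 12),
]
kept = characterized_without_pfam(proteins)
print([p.protein_id for p in kept])
```

On the toy data, only the record named "candidate" is kept. The low-complexity and CDD caveats mentioned above are not handled by this sketch and would need separate checks.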

For a listing of the characterized proteins that are not in PFam, see

