Figure 1.
Coverage of genomic and metagenomic datasets with protein families.
Sequence sets include Human Gut Related (A), Human Gut Unrelated (B), and Metagenomic sequences (C). The unassigned proteins (green) consist of singletons and small sequence clusters (see text for details).
Figure 2.
Size distribution of protein families in human gut metagenomic data: PfamA protein families (red) and new families found in this work (blue).
Figure 3.
The distribution of “essentiality coefficients” for protein families.
Pfam families [5] are shown in the left panel and the new families introduced in this manuscript in the right panel.
Table 1.
The 10 most overrepresented (Ov) PfamA families in the human gut microbiome.
Table 2.
The 10 most expanded (Ex) PfamA families in the human gut microbiome.
Table 3.
The 10 most essential (Es) PfamA families in the human gut microbiome.
Table 4.
The 10 most overrepresented (Ov) new families, from the set of over 180 curated novel families identified in this work.
Table 5.
The 10 most expanded (Ex) new families, from the set of over 180 curated novel families identified in this work.
Table 6.
The 10 most essential (Es) new families, from the set of over 180 curated novel families identified in this work.