Identification of Low-Complexity Domains by Compositional Signatures Reveals Class-Specific Frequencies and Functions Across the Domains of Life
Fig 3
Protein set overlap among keratin-associated proteins identified within C-rich LCD classes in model eukaryotes.
(A) Top-ranking enriched GO terms associated with CX LCD classes, sorted by the percentage of eukaryotic organisms with significant GO-term enrichment for the LCD class/GO term pair. Bar color corresponds to LCD class. GO terms on the x-axis are colored according to GO-term category: Biological Process (BP) in red, Cellular Component (CC) in green, and Molecular Function (MF) in blue. Only GO terms that were significantly enriched (Šidák-corrected p < 0.05) and had a minimum depth of 4 in the gene ontology are shown. (B-G) UpSet plots (analogous to Venn diagrams) depict the co-occurrence of CX LCDs among keratin-associated proteins for humans (B), cows (C), dogs (D), mice (E), rats (F), and pigs (G). Keratin-associated proteins with C-rich LCDs were parsed into secondary LCD classes and evaluated for co-occurrence (i.e., two LCD types appearing in the same protein) across LCD classes. Each pair of reciprocal LCD classes (e.g., CS and SC) was grouped into a single representative category. The bar graph on the left of each panel indicates the number of keratin-associated proteins containing each secondary class of C-rich LCDs. The bar graph at the top of each panel indicates the number of proteins with each combination of C-rich secondary LCD classes; each combination is indicated by the green dots and connecting lines below the bar graph. For example, in humans, three keratin-associated proteins contain CR LCDs: two of these proteins also contain CT/TC and CS/SC LCDs, while one contains CT/TC, CP/PC, and CS/SC LCDs.
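The counting behind panels B-G can be illustrated with a minimal Python sketch. This is not the authors' code: the function names (`canonical_class`, `upset_counts`), the protein identifiers, and the toy input are hypothetical, and the sketch assumes each protein's LCDs have already been assigned two-letter classes. It collapses reciprocal classes into one category and then tallies, per protein, the exact combination of categories present, which is the quantity an UpSet plot displays in its top bar graph.

```python
from collections import Counter


def canonical_class(lcd_class: str) -> str:
    """Collapse reciprocal two-letter classes (e.g. 'CS' and 'SC') into one label."""
    if len(lcd_class) != 2 or lcd_class[0] == lcd_class[1]:
        return lcd_class  # leave anything that is not a two-residue pair unchanged
    a, b = sorted(lcd_class)
    return f"{a}{b}/{b}{a}"


def upset_counts(protein_to_classes):
    """Return (per-category protein counts, per-combination protein counts)."""
    # Each protein maps to the set of grouped categories it contains.
    membership = {
        protein: frozenset(canonical_class(c) for c in classes)
        for protein, classes in protein_to_classes.items()
    }
    # Side bar graph: number of proteins containing each category.
    set_sizes = Counter(label for labels in membership.values() for label in labels)
    # Top bar graph: number of proteins with each exact combination of categories.
    combinations = Counter(membership.values())
    return set_sizes, combinations


# Hypothetical toy input shaped like the human example in the caption;
# protein names and the orientation of each class are made up for illustration.
toy = {
    "protein_1": ["CR", "CT", "SC"],
    "protein_2": ["RC", "TC", "CS"],
    "protein_3": ["CR", "CT", "CP", "CS"],
}
set_sizes, combinations = upset_counts(toy)
for combo, n in combinations.items():
    print(sorted(combo), n)
```

In this sketch a protein contributes to exactly one combination (its full set of categories), so the combination counts sum to the number of proteins, while the per-category counts can overlap.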