Fig 1.

Organism-level LCD frequencies for primary and secondary LCD classes among each domain of life.

(A) Percentage of organisms with ≥1 LCD instance for each of the 20 classes of primary LCDs across the four major domains of life. (B) Heatmaps indicating the percentage of organisms from each domain of life with ≥1 instance of each secondary LCD class. In each plot, the x-axis represents the primary amino acid enriched in the LCD (≥40% composition) and the y-axis represents the secondary amino acid enriched in the LCD (≥20% composition). Values for primary LCD classes are indicated in the diagonals to facilitate relative comparisons. Amino acids in panel B were sorted by average whole-proteome frequency rank.
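
The composition thresholds in this caption can be illustrated with a minimal sketch. It only checks whether a single sequence window meets the stated cutoffs (primary residue ≥40%, secondary residue ≥20%); it is not the LCD search procedure used in the article, and the function name `lcd_classes` and the class labels it returns are hypothetical.

```python
from collections import Counter

PRIMARY_MIN_PERCENT = 40    # primary residue cutoff, taken from the caption
SECONDARY_MIN_PERCENT = 20  # secondary residue cutoff, taken from the caption


def lcd_classes(window: str) -> set:
    """Return every class label whose composition cutoffs the window meets.

    A one-letter label (e.g. 'Q') is a primary class; a two-letter label
    (e.g. 'QH') is a secondary class with the primary residue first.
    Hypothetical helper for illustration only.
    """
    n = len(window)
    if n == 0:
        return set()
    counts = Counter(window)
    labels = set()
    for primary, p_count in counts.items():
        # Integer arithmetic avoids floating-point edge cases at the cutoff.
        if p_count * 100 < PRIMARY_MIN_PERCENT * n:
            continue
        labels.add(primary)
        for secondary, s_count in counts.items():
            if secondary != primary and s_count * 100 >= SECONDARY_MIN_PERCENT * n:
                labels.add(primary + secondary)
    return labels


print(lcd_classes("QQQQQHHHAS"))  # Q is 50%, H is 30%
```

Under these assumptions a window can satisfy more than one class at once (for example, 40% G and 40% R would meet both the GR and RG cutoffs), which is consistent with the caption treating reciprocal classes as separate cells in the heatmap.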

Fig 2.

LCD classes most frequently associated with the same function(s) across organisms for each domain of life.

(A) For all GO terms significantly associated with an LCD class in at least one organism, the percentage of organisms sharing significant enrichment for that LCD class/GO term pair was calculated separately for each domain of life. Each dot represents a single LCD class/GO term pair. (B) Top 50 LCD class/GO term pairs with respect to the percentage of organisms sharing significant enrichment for that pair in eukaryotes. (C,D) Top 25 LCD class/GO term pairs with respect to the percentage of organisms sharing significant enrichment for that pair in archaea and bacteria, respectively. For panels B-D, bar color corresponds to LCD class, with reciprocal classes (e.g., GR and RG LCDs) assigned the same color for simplicity. GO terms on the x-axis are colored according to the GO-term category with Biological Process (BP) in red, Cellular Component (CC) in green, and Molecular Function (MF) in blue. Top-ranking LCD class/GO term pairs for viruses are shown in S9 Fig. For simplicity, only GO terms that were significantly enriched (Šidák-corrected p < 0.05) and had a minimum depth of 4 in the gene ontology are shown.
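
The quantity plotted in panel A can be sketched as follows. This is a rough illustration under stated assumptions: the tuple layout of `results`, the use of the total organism count per domain as the denominator, and the number of tests `m` passed to the Šidák formula are all hypothetical, since the caption does not specify them.

```python
from collections import defaultdict


def sidak(p: float, m: int) -> float:
    """Sidak-adjusted p-value for one of m independent tests."""
    return 1.0 - (1.0 - p) ** m


def percent_sharing(results, organisms_per_domain, alpha=0.05):
    """Percentage of organisms per domain sharing each significant pair.

    results: iterable of (domain, organism, lcd_class, go_term, corrected_p).
    organisms_per_domain: {domain: number of organisms}, assumed denominator.
    """
    sharing = defaultdict(set)
    for domain, organism, lcd_class, go_term, corrected_p in results:
        if corrected_p < alpha:
            sharing[(domain, lcd_class, go_term)].add(organism)
    percentages = {}
    for key, organisms in sharing.items():
        domain = key[0]
        percentages[key] = 100.0 * len(organisms) / organisms_per_domain[domain]
    return percentages


# Toy data, not taken from the article.
toy = [
    ("eukaryota", "org1", "GR", "RNA binding", 0.001),
    ("eukaryota", "org2", "GR", "RNA binding", 0.010),
    ("eukaryota", "org3", "GR", "RNA binding", 0.200),
]
print(percent_sharing(toy, {"eukaryota": 4}))
```

Each key of the returned dictionary corresponds to one dot in panel A.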

Fig 3.

Protein set overlap among keratin-associated proteins identified within C-rich LCD classes in model eukaryotes.

(A) Top-ranking enriched GO terms associated with CX LCD classes, sorted by the percentage of eukaryotic organisms with significant GO-term enrichment for the LCD class/GO term pair. Bar color corresponds to LCD class. GO terms on the x-axis are colored according to the GO-term category with Biological Process (BP) in red, Cellular Component (CC) in green, and Molecular Function (MF) in blue. Only GO terms that were significantly enriched (Šidák-corrected p < 0.05) and had a minimum depth of 4 in the gene ontology are shown. (B-G) UpSet plots (analogous to a Venn diagram) depict the co-occurrence of CX LCDs among keratin-associated proteins for humans (B), cows (C), dogs (D), mice (E), rats (F), and pigs (G). Keratin-associated proteins with C-rich LCDs were parsed into secondary LCD classes and evaluated for co-occurrence (i.e., two LCD types appearing in the same protein) across LCD classes. Each pair of reciprocal LCD classes (e.g., CS and SC) was grouped into a single representative category. The graphs on the left of each panel indicate the number of keratin proteins containing each secondary class of C-rich LCDs. The bar graph at the top of each panel indicates the number of proteins with each combination of C-rich secondary LCD classes (which are indicated by green dots and connecting lines below the bar graph). For example, in humans, three keratin proteins contain CR LCDs: two of these proteins also contain CT/TC and CS/SC LCDs, while one contains CT/TC, CP/PC, and CS/SC LCDs.
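
The counts behind an UpSet plot like those in panels B-G can be sketched in a few lines. The protein identifiers below are made up and only mirror the human example in the caption; the merged label format (e.g. `CS/SC`) and the function names are assumptions for illustration.

```python
from collections import Counter


def merge_reciprocal(lcd_class: str) -> str:
    """Map a secondary class and its reciprocal to one label, e.g. 'SC' -> 'CS/SC'."""
    first, second = sorted(lcd_class)
    return f"{first}{second}/{second}{first}"


def combination_counts(protein_to_classes):
    """Number of proteins carrying exactly each combination of merged classes
    (the bar graph at the top of each panel)."""
    counts = Counter()
    for classes in protein_to_classes.values():
        combo = frozenset(merge_reciprocal(c) for c in classes)
        if combo:
            counts[combo] += 1
    return counts


def set_sizes(protein_to_classes):
    """Number of proteins containing each merged class (the graph on the left)."""
    sizes = Counter()
    for classes in protein_to_classes.values():
        for label in {merge_reciprocal(c) for c in classes}:
            sizes[label] += 1
    return sizes


# Hypothetical proteins shaped like the human example in the caption.
toy = {
    "protein_1": ["CR", "CT", "SC"],
    "protein_2": ["CR", "TC", "CS"],
    "protein_3": ["CR", "CT", "CP", "CS"],
}
print(combination_counts(toy))
print(set_sizes(toy))
```

Using a `frozenset` as the key means each protein is counted once, under the exact combination of classes it contains, which is how the top bars of an UpSet plot are usually read.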

Fig 4.

Frequencies of top eukaryote-specific LCD classes.

Bar plot indicating the number of organisms with ≥10 LCD-containing proteins for each eukaryote-specific LCD class.

Fig 5.

Functions consistently enriched for proteins containing H-rich LCDs in eukaryotes.

(A) For all HX and XH LCD classes (where X represents any amino acid except histidine), the percentage of eukaryotic organisms with significant enrichment for each LCD class/GO term pair was calculated. The bar plot shows the 50 LCD class/GO term pairs with the highest percentage of eukaryotes exhibiting significant enrichment among the HX or XH LCD classes. Bars are colored according to LCD class (with the reciprocal classes QH and HQ assigned the same color), whereas GO terms are colored according to the GO-term category with Biological Process (BP) in red, Cellular Component (CC) in green, and Molecular Function (MF) in blue. Only GO terms that were significantly enriched (Šidák-corrected p < 0.05) and had a minimum depth of 4 in the gene ontology are shown. (B) Frequency of significant enrichment across organisms for each GO term associated with proteins containing HQ LCDs. GO-term analyses were also performed on the same set of proteins after removing those that also contained a spatially distinct H-rich LCD (primary class), Q-rich LCD (primary class), or QX, XQ, HX, or XH LCD (where X is any residue other than Q or H). For simplicity, only GO terms that were significantly enriched for ≥30 organisms and had a minimum depth of 4 in the gene ontology are shown.

Fig 6.

Percentage of organisms with significantly enriched LCDs after accounting for amino acid frequencies.

The percentage of organisms with significantly enriched LCD-containing proteins (relative to a scrambled version of each proteome) is depicted for each LCD class in archaea (A), bacteria (B), eukaryota (C), and viruses (D). LCD frequencies in the original and scrambled proteomes were compared using Fisher’s exact test for each LCD class in each organism. Within each organism, p-values for all represented LCD classes (i.e., those with at least one LCD instance in the original or scrambled proteomes) were corrected using the Holm–Šidák correction method to account for multiple hypothesis testing. Significant enrichment is defined as p < 0.05 after multiple-test correction.
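
The per-organism testing procedure described above can be sketched with the standard library alone. Several details are assumptions not stated in the caption: the 2×2 table layout (proteins with and without an LCD in the original versus the scrambled proteome), the one-sided "enrichment" alternative, and the function names. The Holm–Šidák step follows the usual step-down definition.

```python
from math import comb


def fisher_exact_greater(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher's exact p-value for the table [[a, b], [c, d]].

    Assumed layout: row 1 = original proteome, row 2 = scrambled proteome;
    column 1 = proteins with an LCD of the class, column 2 = proteins without.
    Returns P(X >= a) under the hypergeometric null with fixed margins.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    upper = min(row1, col1)
    tail = sum(comb(row1, k) * comb(n - row1, col1 - k) for k in range(a, upper + 1))
    return tail / comb(n, col1)


def holm_sidak(pvalues):
    """Holm-Sidak step-down adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # The smallest p-value is corrected for m tests, the next for m - 1, ...
        candidate = 1.0 - (1.0 - pvalues[i]) ** (m - rank)
        running_max = max(running_max, candidate)  # enforce monotonicity
        adjusted[i] = min(running_max, 1.0)
    return adjusted


# Toy counts for three LCD classes in one hypothetical organism.
tables = {"Q": (30, 970, 5, 995), "N": (12, 988, 10, 990), "S": (40, 960, 20, 980)}
raw = {cls: fisher_exact_greater(*t) for cls, t in tables.items()}
corrected = dict(zip(raw, holm_sidak(list(raw.values()))))
significant = [cls for cls, p in corrected.items() if p < 0.05]
print(corrected, significant)
```

Repeating this for every organism in a domain and counting how often each class lands in `significant` would give the percentages shown in panels A-D, under the assumptions above.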

Fig 7.

Statistical LCD enrichment by LCD class in the malarial and human proteomes.

(A) Heatmap depicting the degree of LCD enrichment (expressed as the natural log of the odds ratio, lnOR) for each LCD class in the P. falciparum proteome (UniProt ID: UP000001450_36329). For LCD classes in which the number of LCDs in either the original or the scrambled proteome was 0, a value of 1 was added to all cells in the contingency table to calculate a biased lnOR (see Methods). (B) Binary classification of LCD categories for which enrichment was statistically significant (red squares) or not significant (black squares) after multiple-test correction. Grey squares indicate LCD categories that were excluded from statistical analysis because no LCDs were found in either the original or the scrambled proteome. (C) Degree of LCD enrichment in the human proteome (UniProt ID: UP000005640_9606). (D) Statistical significance of LCD enrichment in the human proteome. For all panels, the diagonals represent corresponding values for each primary LCD class. The data underlying these heatmaps can be found in the supplementary data available at [33].
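
The lnOR calculation, including the add-one adjustment for zero counts, can be sketched as below. The contingency-table layout (LCD-containing versus remaining proteins in the original and scrambled proteomes) and the function name are assumptions; the Methods of the article define the actual table. Returning `None` when both counts are zero is also an assumption, chosen to mirror the excluded (grey) categories.

```python
from math import log


def ln_odds_ratio(lcd_orig: int, total_orig: int, lcd_scr: int, total_scr: int):
    """Natural log of the odds ratio, original vs. scrambled proteome.

    Assumes each LCD count is smaller than the corresponding total.
    """
    if lcd_orig == 0 and lcd_scr == 0:
        return None  # nothing to compare (assumed handling)
    a, b = lcd_orig, total_orig - lcd_orig
    c, d = lcd_scr, total_scr - lcd_scr
    if lcd_orig == 0 or lcd_scr == 0:
        # Add 1 to every cell so the ratio is defined; this biases the
        # estimate, as the caption notes.
        a, b, c, d = a + 1, b + 1, c + 1, d + 1
    return log((a * d) / (b * c))


print(ln_odds_ratio(30, 100, 10, 100))  # positive: enriched in the original
print(ln_odds_ratio(5, 100, 0, 100))    # biased estimate, zero in scrambled
```

A positive value indicates more LCDs in the original proteome than in its scrambled counterpart, and a negative value indicates fewer.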

Fig 8.

Domains of life with the highest mean per-residue occupancy for each LCD class.

Mean per-residue LCD occupancy was calculated for each LCD class within each domain of life. For each LCD class, mean per-residue LCD occupancy values were compared across the four domains of life to determine the domain with the highest per-residue occupancy. The color of each square in the heatmap indicates the domain of life with the highest mean per-residue LCD occupancy. LCD classes on the diagonal represent the primary LCD classes.

Fig 9.

Per-residue occupancy for the top-ranking organisms from each domain of life for the primary LCD classes.

Per-residue occupancy was calculated separately for each organism as the percentage of total residues in the proteome that were occupied by LCDs from each primary LCD class for archaea (A), bacteria (B), eukaryota (C), and viruses (D). Values above each bar represent the per-residue occupancy value (as a percentage), followed by the total number of proteins in the corresponding organism. For the average among each domain of life (red bars), the mean and standard deviation of the number of proteins per organism are shown above the bar for the “A” LCD class only, since these values are independent of LCD class.
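
Per-residue occupancy as defined above can be sketched as a simple ratio. The input layout (protein lengths plus half-open, 0-based LCD coordinates) and the merging of overlapping LCDs so that no residue is counted twice are assumptions made for this illustration.

```python
def merge_intervals(intervals):
    """Merge overlapping or adjacent half-open (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def per_residue_occupancy(protein_lengths, lcd_intervals):
    """Percentage of all residues in the proteome covered by LCDs of one class.

    protein_lengths: {protein_id: length in residues}
    lcd_intervals:   {protein_id: [(start, end), ...]} for a single LCD class
    """
    total = sum(protein_lengths.values())
    covered = sum(
        end - start
        for intervals in lcd_intervals.values()
        for start, end in merge_intervals(intervals)
    )
    return 100.0 * covered / total


# Toy proteome: 400 residues in total, 40 of them inside LCDs.
lengths = {"p1": 100, "p2": 300}
lcds = {"p1": [(0, 20), (10, 30)], "p2": [(50, 60)]}
print(per_residue_occupancy(lengths, lcds))
```

Because the denominator is the whole proteome, this measure accounts for proteome size, unlike a raw count of LCD-containing proteins.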

Fig 10.

Maximum per-residue occupancy for each LCD class by domain of life.

Per-residue occupancy was calculated for each LCD class and each organism. Maximum per-residue occupancy is depicted separately for each LCD class in archaea (A), bacteria (B), eukaryotes (C), and viruses (D).

Fig 11.

Number of LCD classes assigned to each eukaryotic organism contributing a maximum per-residue occupancy for at least one LCD class.

(A) Pie chart indicating the assignment of LCD classes (400 total) to the eukaryotic organism achieving the highest per-residue LCD occupancy. Each wedge represents a single organism associated with the overall highest per-residue occupancy observed among eukaryotes. Wedge size indicates the number of LCD classes for which the single organism corresponding to that wedge achieved the highest per-residue occupancy. The top five eukaryotic organisms are indicated in the legend. Out of necessity, the color palette was repeated in the pie chart, though each color cycle represents a different set of organisms. (B) Linkage maps indicate the types of LCD classes for which the organism contributed the maximum per-residue occupancy value for eukaryotes. The first row of amino acids in each linkage map indicates the primary amino acid comprising the LCD class, and lines connected to the second row of amino acids indicate the secondary amino acid comprising the LCD class. Lines connecting identical amino acids (e.g., W connected to W) indicate that the organism contributed the maximum per-residue occupancy value for the primary LCD class as a whole (e.g., the W-rich primary LCD class). LCD classes without connecting lines are those for which the organism did not contribute the maximum per-residue occupancy value. Similar analysis for archaea, bacteria, and viruses can be found in S21 Fig. Although it achieved high per-residue occupancies for multiple LCD classes, the Spodoptera litura proteome was manually identified as an exceptionally incomplete reference proteome and excluded from analyses.

Fig 12.

Comparison of secondary LCD distributions among eukaryotes.

(A) Distributions of secondary LCDs within primary LCD categories for all secondary LCDs identified in eukaryotic organisms. (B) Distributions of human secondary LCDs among primary LCD categories. (C) Distributions of yeast secondary LCDs among primary LCD categories. For all panels, secondary LCDs were grouped into a primary LCD category based on the predominant amino acid used in the LCD search. Then, within each primary LCD category, the percentage of total LCDs in that category was calculated for all possible secondary LCD categories and depicted as a stacked bar plot. For all secondary amino acid classes, the primary amino acid is represented on the x-axis, the bar color specifies the secondary amino acid, and the size of the bar indicates the percentage. Secondary amino acids are loosely grouped and colored according to physicochemical properties and appear in the same order (from bottom to top) as in the figure legends. Total LCD sample sizes are indicated above each bar.

Fig 13.

Comparison of organisms based on whole-proteome per-residue occupancy of LCDs.

(A) Percentile rank for each of the primary LCD classes in humans (blue) and yeast (orange) relative to all eukaryotes. The primary LCD classes (x-axis) are sorted by difference in percentile from largest to smallest. (B) Raw difference in per-residue occupancy for each of the primary LCD classes in humans compared to yeast. (C) Raw difference in per-residue occupancy between humans and yeast for each of the secondary LCD classes. (D) Percentile rank for each of the secondary LCD classes in humans relative to all eukaryotes. (E) Raw difference in per-residue occupancy between humans and the corresponding average value among eukaryotes for each of the secondary LCD classes.
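
The percentile-rank comparison in panel A can be sketched as follows. The caption does not give the exact percentile definition or the direction of the difference, so both are assumptions here: a value's percentile rank is taken as the percentage of eukaryotes with an occupancy at or below it, and classes are sorted by human percentile minus yeast percentile. All names and numbers are hypothetical.

```python
def percentile_rank(value, population):
    """Percentage of population values that are <= value (assumed definition)."""
    if not population:
        raise ValueError("population must not be empty")
    at_or_below = sum(1 for v in population if v <= value)
    return 100.0 * at_or_below / len(population)


def sort_by_percentile_gap(human, yeast, eukaryotes):
    """Order LCD classes by (human percentile - yeast percentile), largest first.

    human, yeast: {lcd_class: per-residue occupancy}
    eukaryotes:   {lcd_class: [occupancy for every eukaryote]}
    """
    gaps = {
        lcd_class: percentile_rank(human[lcd_class], population)
        - percentile_rank(yeast[lcd_class], population)
        for lcd_class, population in eukaryotes.items()
    }
    return sorted(gaps, key=gaps.get, reverse=True)


# Toy occupancies, not taken from the article.
eukaryotes = {"Q": [0.1, 0.2, 0.3, 0.4], "K": [1.0, 2.0, 3.0, 4.0]}
human = {"Q": 0.15, "K": 4.0}
yeast = {"Q": 0.40, "K": 1.0}
print(sort_by_percentile_gap(human, yeast, eukaryotes))
```

The raw differences in panels B, C, and E are plain subtractions of occupancy values and need no ranking step.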
