Known allosteric proteins have central roles in genetic disease

doi:10.1371/journal.pcbi.1009806

Fig 1.

Enrichment of allosteric proteins in disease ontologies and gene ontologies.

A) Summary of the disease ontology terms in which allosteric proteins are significantly enriched. The terms were filtered for redundancy and plotted using their semantic similarities; the highlighted terms were selected manually from the filtered list. Allostery is overrepresented in diverse diseases, including many of the major disease types such as cardiovascular diseases, metabolic diseases (e.g. hypercholesterolemia, diabetes), central nervous system diseases, and cancers. B) Graph of the disease ontology terms in which allosteric proteins are most significantly enriched. The intensity of red corresponds to significance; only terms with p < 0.005 (Bonferroni-corrected) are plotted. Allosteric proteins are most significantly associated with diseases of the hematopoietic system, cancers, and vascular disease. C) GO analysis of significantly enriched Molecular Function terms. Related terms were grouped and visualized with REVIGO using their semantic similarity. The list of proteins in each term and the p-values are provided in S2 Table. The analysis shows that allosteric proteins are involved in diverse functions and that, unlike the disease ontology terms, the enriched GO terms are not dominated by a few categories.
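
The enrichment filter described in panel B can be illustrated with a small sketch. It assumes a one-sided hypergeometric test per ontology term followed by a Bonferroni correction; the caption does not state which test was used, so the test choice, the function names, and the toy gene sets are all assumptions made for illustration.

```python
from math import comb


def hypergeom_upper_tail(k, K, n, N):
    """P(X >= k) when drawing n items from a pool of N that holds K successes."""
    total = comb(N, n)
    upper = min(K, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total


def enriched_terms(term_to_genes, allosteric, background, alpha=0.005):
    """Return {term: Bonferroni-corrected p} for terms below the alpha threshold."""
    allosteric = allosteric & background
    n_tests = len(term_to_genes)  # one test per ontology term
    hits = {}
    for term, genes in term_to_genes.items():
        genes = genes & background
        k = len(genes & allosteric)  # allosteric proteins annotated with the term
        p = hypergeom_upper_tail(k, len(genes), len(allosteric), len(background))
        p_adj = min(1.0, p * n_tests)  # Bonferroni correction
        if p_adj < alpha:
            hits[term] = p_adj
    return hits


# Hypothetical toy data: 20 background genes, 5 of them allosteric.
background = {f"g{i}" for i in range(20)}
allosteric = {"g0", "g1", "g2", "g3", "g4"}
term_to_genes = {
    "term_a": {"g0", "g1", "g2", "g3"},      # every member is allosteric
    "term_b": {"g10", "g11", "g12", "g13"},  # no member is allosteric
}
hits = enriched_terms(term_to_genes, allosteric, background)
```

In this toy example only `term_a` passes the corrected threshold. A real analysis would use the full ontology, where the number of tests, and therefore the correction factor, is far larger.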

Fig 2.

Allosteric proteins are overrepresented in disease, and are central in disease-protein networks.

A) Allosteric proteins are significantly more frequently involved in disease than non-allosteric proteins, irrespective of their quaternary structure. * In the ‘All’ category, the number of non-allosteric proteins was defined as human SwissProt proteins minus all human allosteric proteins. Note that the ‘All’ category also includes proteins whose quaternary structure is not known. B) Allosteric proteins (except homomers) are involved in significantly more diseases than non-allosteric ones. C) Allosteric proteins, particularly those forming heteromeric complexes, have significantly higher betweenness centrality in the disease network than non-allosteric ones. D-E) Drug targets generally have higher betweenness centralities than non-drug-target proteins; however, allosteric proteins have significantly higher betweenness than non-allosteric proteins in both groups. F) The disease-protein network. Allosteric proteins are represented by red nodes and non-allosteric ones by blue nodes; node size was calculated as 1 + Log(number of diseases). The largest connected component was visualized with the OpenOrd algorithm of the Gephi platform.
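
As a rough illustration of the quantities in panels C and F, the sketch below computes unnormalized betweenness centrality with Brandes' algorithm and the node size 1 + Log(number of diseases). It assumes that the disease-protein network is an unweighted, undirected bipartite graph of protein-disease associations and that the logarithm is base 10; neither detail is stated in the caption, and all function names and identifiers are hypothetical.

```python
from collections import defaultdict, deque
from math import log10


def build_network(associations):
    """Build {node: set(neighbours)} from (protein, disease) pairs."""
    adj = defaultdict(set)
    for protein, disease in associations:
        adj[protein].add(disease)
        adj[disease].add(protein)
    return dict(adj)


def betweenness(adj):
    """Unnormalized betweenness (Brandes) for an unweighted, undirected graph."""
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        stack, dist = [], {s: 0}
        preds = {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)  # number of shortest paths from s
        sigma[s] = 1
        queue = deque([s])
        while queue:  # breadth-first search from s
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        while stack:  # accumulate dependencies in reverse BFS order
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return {v: b / 2 for v, b in bc.items()}  # each pair was counted twice


def node_sizes(associations):
    """Node size per protein: 1 + log10(number of distinct diseases)."""
    diseases = defaultdict(set)
    for protein, disease in associations:
        diseases[protein].add(disease)
    return {p: 1 + log10(len(d)) for p, d in diseases.items()}


# Hypothetical associations forming the chain P1 - D1 - P2 - D2 - P3.
associations = [("P1", "D1"), ("P2", "D1"), ("P2", "D2"), ("P3", "D2")]
bc = betweenness(build_network(associations))
sizes = node_sizes(associations)
```

In the toy chain, `P2` links two diseases and therefore has the highest betweenness, while the terminal proteins have zero. A protein associated with a single disease gets node size 1.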

Fig 3.

Human orthologs and paralogs of known allosteric proteins that are not themselves present in the allosteric dataset have different disease properties.

Orthology and paralogy have different consequences for functional conservation: orthologs usually have similar functions in different species, whereas paralogs, being released from selective pressure after duplication, evolve and acquire new functions rapidly. A-B) Like known human allosteric proteins, human orthologs of non-human allosteric proteins are enriched in disease and have high betweenness centralities in the disease network. C-D) Paralogs of human allosteric proteins show no or only weak importance in disease. E-F) Young duplicated human allosteric proteins are less important in disease than older duplicates. In all panels, the analyses were performed for separate phylogenetic datasets of the eggNOG5 database, excluding orthogroups that contain only members of the lower taxonomic group (i.e. orthogroups containing only mammalian proteins were not used in the vertebrate or metazoan sets).

Fig 4.

Variants of cancer GWAS are enriched near allosteric proteins.

A-B) Relationship between the number of cancer cases and the number of significant variants (p < 10⁻⁸) in cancer GWA studies in which at least one significant variant is located near a protein-coding gene. The majority of GWAS have a moderate number of cases (and moderate power), 10³−10⁴, and identify fewer than 10 significant variants. Enrichment near different protein types was calculated in two datasets: one using all studies irrespective of the number of significant variants (“All”), and one using only GWA studies reporting fewer than 10 significant variants. The latter is influenced less by the largest studies of the most common cancer types and has less variability in statistical power to detect significant associations. C-F) In the analysis that uses only the studies with the highest number of cancer cases, allosteric and (mostly Mendelian) disease-associated proteins show a 2-fold and a 1.5-fold enrichment, respectively, compared to other (i.e. neither allosteric nor disease-associated) proteins (C). Variants with the highest significance in each GWAS (the 20% with the lowest p-values) show an even more pronounced, 3-fold enrichment near allosteric proteins (D). In studies reporting fewer than 10 variants, the enrichment near allosteric proteins is also 3-fold, comparable to the pattern seen with the most significant variants of the full dataset (E). The distribution of main cancer types in the variants of the two GWAS sets (F). G-J) The analysis that uses a single nonredundant list of genes compiled from all studies of the same cancer type (mapped trait of GWAS Catalog) shows a similar degree of enrichment of allosteric and disease proteins among the proteins identified by GWAS.
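
A fold enrichment of the kind reported in panels C-E can be sketched as follows. The caption does not give the exact formula, so this sketch assumes that enrichment is the fraction of genes in a class with at least one nearby significant variant, divided by the same fraction for the "other" class. The record layout, class labels, and function names are hypothetical.

```python
from collections import defaultdict


def top_fraction(variants, fraction=0.2):
    """Keep the given fraction of variants with the lowest p-values (at least one)."""
    ranked = sorted(variants, key=lambda v: v["p"])
    keep = max(1, round(len(ranked) * fraction))
    return ranked[:keep]


def fold_enrichment(variants, gene_class, class_sizes, reference="other"):
    """Hit rate of each class divided by the hit rate of the reference class.

    The hit rate is (distinct genes with a nearby variant) / (class size).
    Raises ZeroDivisionError if the reference class has no hits.
    """
    hit_genes = defaultdict(set)
    for v in variants:
        hit_genes[gene_class[v["gene"]]].add(v["gene"])
    rate = {c: len(hit_genes[c]) / class_sizes[c] for c in class_sizes}
    return {c: rate[c] / rate[reference] for c in class_sizes}


# Hypothetical toy data: 10 significant variants, each mapped to its nearest gene.
genes = ["A1", "A2", "D1", "D2", "D3", "O1", "O2", "O3", "O4", "O5"]
variants = [{"gene": g, "p": 10.0 ** -(9 + i)} for i, g in enumerate(genes)]
gene_class = {
    g: {"A": "allosteric", "D": "disease", "O": "other"}[g[0]] for g in genes
}
class_sizes = {"allosteric": 10, "disease": 20, "other": 100}

folds = fold_enrichment(variants, gene_class, class_sizes)
strongest = top_fraction(variants)  # the 20% with the lowest p-values
```

Passing the output of `top_fraction` to `fold_enrichment` would restrict the comparison to the most significant variants, in the spirit of panel D. With very few variants the reference class may have no hits, so a real analysis would need to handle that case.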

Fig 5.

Conservation, mutation density, and disease associations of allosteric and non-allosteric proteins involved in disease.

Kinases and drug targets are shown separately. A-C) Allosteric proteins show somewhat higher conservation than non-allosteric proteins. D-F) Allosteric proteins have significantly higher numbers of pathogenic mutations than non-allosteric ones. G-I) Although allosteric proteins have more pathogenic mutations, the relationship between the number of diseases and the number of pathogenic mutations is qualitatively similar for allosteric and non-allosteric proteins.

Fig 6.

Structural and dynamical characteristics of pathogenic mutations in allosteric and non-allosteric proteins.

A-B) Residue interaction matrices of Pfam domains of allosteric (A) and non-allosteric (B) proteins. C-D) The difference between the two matrices (panel C) shows that allosteric proteins have significantly fewer long-range interactions in their Pfam domains than non-allosteric proteins (panel D), and are likely to be more flexible. E-F) Disease-associated mutations are significantly enriched in community interfaces, i.e. residues that interact with members of other communities, in both allosteric and non-allosteric proteins. The enrichment is more pronounced for stronger interactions such as H-bonds (panel E). G) The distribution of disease mutations across communities (horizontal bar) differs from the random expectation (violin plot) more in allosteric proteins than in non-allosteric ones, indicating that pathogenic mutations have a stronger effect in allosteric proteins. See also S9 Fig. H) Disease-associated mutations are significantly enriched in the protein-protein interfaces of both allosteric and non-allosteric heteromers. I) A much less pronounced, but still significant, enrichment is present in the interfaces of homomers. J) The structure of Mitogen-activated protein kinase 8 (MAPK8, PDB ID: 4qtd). K) Community structure of MAPK8. Each community is represented by a different colour; residue-residue interactions between different communities (community interfaces) are shown in black.

Fig 7.

Allosteric proteins have higher betweenness both in PPI and disease networks than non-allosteric proteins, even though the overall correlation between the centralities of the two network types is weak.

A-B) Correlations between PPI and disease-protein network centralities. Proteins with 0 betweenness centrality in either of the two networks were excluded. C-D) Density plots of allosteric proteins. 43.5% (IntAct) and 39.9% (BioGrid) of proteins have betweenness centrality higher than 1000 in both networks (red rectangle). E-F) Density plots of non-allosteric proteins. 23.6% (IntAct) and 23.3% (BioGrid) of proteins have betweenness centrality higher than 1000 in both networks (red rectangle). Data ellipses on panels A and B were drawn with the stat_ellipse() function of ggplot2 (R) with default settings. ANCOVA was performed on log-transformed data.
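
The percentages quoted for the red rectangles can be reproduced in principle with a calculation like the one below. It assumes that the exclusion of proteins with 0 betweenness in either network, stated for panels A-B, also applies to the density plots, and that centralities are available as plain dictionaries; the function name and the toy values are hypothetical.

```python
def fraction_high_in_both(ppi_bc, disease_bc, threshold=1000):
    """Fraction of proteins whose betweenness exceeds the threshold in both networks.

    Only proteins present in both networks with non-zero betweenness in each
    are counted in the denominator (assumed from the caption).
    """
    shared = [
        p for p in ppi_bc.keys() & disease_bc.keys()
        if ppi_bc[p] > 0 and disease_bc[p] > 0
    ]
    if not shared:
        return float("nan")
    high = sum(
        1 for p in shared
        if ppi_bc[p] > threshold and disease_bc[p] > threshold
    )
    return high / len(shared)


# Hypothetical betweenness values for five proteins.
ppi_bc = {"a": 2000.0, "b": 1500.0, "c": 500.0, "d": 0.0}
disease_bc = {"a": 3000.0, "b": 10.0, "c": 2000.0, "d": 5000.0, "e": 100.0}
fraction = fraction_high_in_both(ppi_bc, disease_bc)
```

Here protein `d` is dropped for having zero betweenness in the PPI network and `e` for being absent from it, so the fraction is computed over the remaining three proteins. Running the function separately on allosteric and non-allosteric proteins would give values comparable to those in panels C-D and E-F.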
