Fig 1.
Proteostasis proteins are closely associated with disease.
The relative disease association of proteostasis network (PN) proteins was quantified and benchmarked against 3 control groups: kinases, transcription factors, and ion channels. Disease association was determined by the relative over-representation of a protein group within disease gene sets, assessed with the hypergeometric test, which measures the statistical significance of the group's prevalence within each disease gene set. P-values are plotted on a -log(p-value) scale, with higher values representing stronger significance. Based on this quantification, proteostasis proteins are significantly over-represented in all the disease groups studied. They are more strongly disease-associated than transcription factors and, in some cases, than kinases.
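As a rough illustration of the over-representation test described in this legend, the sketch below computes an upper-tail hypergeometric p-value and its negative logarithm. The function names, the base-10 logarithm, the background size, and the example counts are all assumptions made for illustration; none of them is taken from the study.

```python
from math import comb, log10


def hypergeom_enrichment_p(universe: int, group: int, disease_set: int, overlap: int) -> float:
    """Upper-tail hypergeometric p-value, P(X >= overlap).

    universe    -- number of background genes (assumed value, not from the study)
    group       -- number of genes in the protein group (e.g. proteostasis proteins)
    disease_set -- number of genes in the disease gene set
    overlap     -- number of genes shared by the group and the disease gene set
    """
    max_overlap = min(group, disease_set)
    total = comb(universe, disease_set)
    # Sum the probability mass of every overlap at least as large as the observed one.
    tail = sum(
        comb(group, k) * comb(universe - group, disease_set - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total


def neg_log10(p: float) -> float:
    """Transform a p-value to the -log10 scale (infinite if p underflows to 0)."""
    return float("inf") if p == 0.0 else -log10(p)


# Hypothetical counts: 20,000 background genes, a 2,000-gene protein group,
# a 100-gene disease set, and 30 shared genes.
p_value = hypergeom_enrichment_p(universe=20000, group=2000, disease_set=100, overlap=30)
print(p_value, neg_log10(p_value))
```

Running the same function for each control group (kinases, transcription factors, ion channels) against the same disease gene set would give the values to compare on the -log scale.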
Fig 2.
Proteostasis profiles of diseases.
For each of the 32 diseases included in this study, we computed the fraction of proteostasis proteins within each disease gene set (brown bars). Across the 32 diseases, this fraction ranged from 20% to 36%. We further decomposed the disease genes involved in proteostasis into their relevant proteostasis pathways and functional classes. This was done by identifying which pathways and functional classes were over-represented among the proteostasis proteins corresponding to the genes of each disease gene set. The statistical significance of over-representation of a pathway or functional class was determined using the hypergeometric test; each significant result (p-value < 0.01) is represented as a coloured spot within the figure. The colour-coding of the spots reflects their involvement in the autophagy-lysosome pathway (brown), the ubiquitin-proteasome system (green) and the anabolic system (blue).
Fig 3.
Characteristic proteostasis perturbation states and patterns in disease.
(A) Three generalised proteostasis perturbation states are capable of discriminating disease types: (i) ALP+ UPS+ ER- (cancers), (ii) ALP+ UPS+ ER+ (neurodegenerative diseases), and (iii) ALP+ UPS- ER+ (other disease types analysed in this study). (B) Distinct patterns in enriched proteostasis network pathways (red spider plots) and functional classes (blue spider plots) reflect disease-relevant trends. Notably, cancers and neurodegenerative diseases have distinct enrichment patterns, whereas cardiovascular, autoimmune, reproductive, respiratory, and endocrine diseases have fairly similar patterns. The spider plots depict trends of over-representation of the relevant proteostasis network pathways and functional classes across all 7 disease types. Over-representation was determined using the hypergeometric test (p-value < 0.01).
Fig 4.
Proteostasis signatures of disease.
(A) Unsupervised clustering of cancers, neurodegenerative diseases, autoimmune diseases, and cardiovascular disease resulted in 4 clusters. The clusters largely followed disease type, with the notable exception of pancreatic cancer and kidney cancer, which clustered with autoimmune diseases. (B) Proteostasis signature trends reveal that cancers and autoimmune diseases share a large proportion of genes perturbed in similar patterns. In contrast, these genes are perturbed in the opposite direction in neurodegenerative diseases. Each bar represents a gene from the relevant proteostasis network pathway. (C) Functional implications of the proteostasis signatures. The top enriched pathways (up to 10 each) for upregulated (red bars) and downregulated (blue bars) genes are shown for each cluster type.
Fig 5.
Temporal progression of proteostasis perturbations across disease types.
Patient samples from each disease were compared against healthy controls. Differential gene expression analysis reveals that perturbation of the proteostasis network (PN) occurred progressively in neurodegenerative diseases but early in cancers. Each point represents a gene significantly perturbed in disease compared to controls, coloured by the direction and magnitude of its change in disease conditions.
Fig 6.
Temporal patterns of ALP and UPS perturbations in the progression of cancer and neurodegenerative diseases.
The proportion of ALP and UPS genes affected at each stage of disease is shown. While ALP and UPS perturbations are indicative of disease states in both cancers and neurodegenerative diseases, a large proportion of ALP and UPS genes are affected only in the later stages of neurodegenerative diseases, whereas these genes are already implicated in the early stages of cancers.
Fig 7.
Central role of mid-stage regulatory proteins in the proteostasis network perturbations in AD. The functional interaction network of AD-associated proteostasis network genes (from all disease stages) is shown.
Genes affected in the early stage are depicted as blue nodes, those affected in the mid stage as beige nodes, and those affected in the late stage as red nodes. A stage-wise quantification of the degree and betweenness centralities of the AD-associated proteostasis network genes within this network reveals that proteostasis proteins perturbed in mid-stage AD (Braak 2/3) are the most central in the AD proteostasis network. The degree centrality measures the number of connections of a protein within the network, and the betweenness centrality measures the extent to which a protein lies on the shortest paths between protein pairs within the network. Proteins with high degree and betweenness are likely to play key regulatory roles in their functional networks. The upper and lower bounds of the boxes represent the interquartile range of degree/betweenness for genes associated with each disease stage. The line within each box represents the 50th percentile of degree/betweenness for genes associated with each disease stage. Whiskers represent the non-outlying extreme points, while data points beyond the whiskers are plotted individually.
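To make the two centrality measures in this legend concrete, the sketch below computes raw degree and unnormalised betweenness on a toy undirected network using Brandes' algorithm. The node names, the toy edges, and the choice to leave both measures unnormalised are assumptions for illustration; the legend does not state which software or normalisation was used in the study.

```python
from collections import deque


def build_adjacency(edges):
    """Build a symmetric adjacency map from a list of undirected edges."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def degree_centrality(adj):
    """Number of neighbours of each node (raw degree, not normalised)."""
    return {node: len(neighbours) for node, neighbours in adj.items()}


def betweenness_centrality(adj):
    """Unnormalised betweenness for an undirected, unweighted graph (Brandes)."""
    score = {node: 0.0 for node in adj}
    for source in adj:
        # Breadth-first search that counts the shortest paths leaving `source`.
        order = []
        preds = {node: [] for node in adj}
        n_paths = {node: 0 for node in adj}
        dist = {node: -1 for node in adj}
        n_paths[source] = 1
        dist[source] = 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    n_paths[w] += n_paths[v]
                    preds[w].append(v)
        # Accumulate pair dependencies, starting from the farthest nodes.
        dependency = {node: 0.0 for node in adj}
        for w in reversed(order):
            for v in preds[w]:
                dependency[v] += n_paths[v] / n_paths[w] * (1.0 + dependency[w])
            if w != source:
                score[w] += dependency[w]
    # Each undirected pair is visited from both endpoints, so halve the totals.
    return {node: value / 2.0 for node, value in score.items()}


# Toy star network with placeholder node names (not genes from the study).
toy_network = build_adjacency([("hub", "n1"), ("hub", "n2"), ("hub", "n3")])
print(degree_centrality(toy_network))
print(betweenness_centrality(toy_network))
```

In the toy star, the hub has the highest degree and lies on every shortest path between the other nodes, which is the kind of position the legend associates with key regulatory roles.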
Fig 8.
Proteostasis perturbations due to smoking are indicative of disease risk.
(A) The proteostasis perturbations of smokers are more similar to those of at-risk diseases than to those of reduced-risk diseases. Similarity computed on proteostasis proteins is more indicative of disease risk than similarity computed on smoking-impacted kinases, transcription factors, or a random sample of differentially expressed genes. (B) The perturbed proteostasis proteins of at-risk diseases have a higher directional similarity with those of smoking. In contrast, reduced-risk diseases have a large proportion of perturbed proteostasis proteins that are deregulated in the opposite direction. (C) Genes encoding proteostasis proteins that are perturbed similarly in smokers and patients with COPD. Genes similarly perturbed in smoking and COPD are likely to contribute to increased COPD risk and onset. (D) Proteostasis proteins corresponding to genes perturbed in smokers and patients with PD. Proteostasis proteins oppositely perturbed in smoking and PD are likely to be protective against PD.
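The legend does not define the directional similarity measure, so the following is only one plausible reading: the fraction of genes perturbed in both conditions whose change has the same sign. The function name, the use of signed fold changes as input, and the gene labels and values are all hypothetical.

```python
from typing import Mapping


def directional_similarity(smoking: Mapping[str, float], disease: Mapping[str, float]) -> float:
    """Fraction of shared perturbed genes that change in the same direction.

    Both inputs map a gene label to a signed change (positive = upregulated,
    negative = downregulated). This definition is an assumption, not the
    study's stated metric.
    """
    shared = [gene for gene in smoking if gene in disease]
    if not shared:
        raise ValueError("no genes are perturbed in both conditions")
    # A positive product means both changes have the same sign.
    same_direction = sum(1 for gene in shared if smoking[gene] * disease[gene] > 0)
    return same_direction / len(shared)


# Hypothetical signed changes; the labels are placeholders, not real genes.
smoking_changes = {"gene_a": 1.2, "gene_b": -0.8, "gene_c": 0.5, "gene_d": -1.1}
at_risk_changes = {"gene_a": 0.9, "gene_b": -0.4, "gene_c": 0.7}
reduced_risk_changes = {"gene_a": -1.0, "gene_b": 0.6, "gene_d": -0.3}

print(directional_similarity(smoking_changes, at_risk_changes))
print(directional_similarity(smoking_changes, reduced_risk_changes))
```

Under this reading, a value near 1 corresponds to the pattern described for at-risk diseases, and a low value corresponds to the opposite-direction pattern described for reduced-risk diseases.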