Table 1.
Influence of the alignment threshold on disease enrichment in correlated positions.
Figure 1.
Disease mutations occur significantly more often in correlated positions than expected.
Black: empirical background distribution obtained from 1000 permutations (random expectation). Grey: bootstrap distribution of the observed LOD. The dotted vertical line indicates the observed LOD obtained with an alignment cutoff of .
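The permutation background and the bootstrap distribution can be illustrated with a minimal Python sketch. The captions do not define the LOD, so the sketch assumes it is the natural log of the observed over the expected number of positions that are both correlated and disease-associated. It also assumes that the random expectation comes from shuffling disease labels across positions. All function names are hypothetical.

```python
import math
import random


def lod(correlated, disease):
    """Assumed LOD: ln(observed / expected) count of correlated disease positions.

    Returns None if no position is correlated or none is disease-associated,
    and -inf if no position is both.
    """
    n = len(correlated)
    both = sum(1 for c, d in zip(correlated, disease) if c and d)
    n_corr = sum(1 for c in correlated if c)
    n_dis = sum(1 for d in disease if d)
    if n_corr == 0 or n_dis == 0:
        return None
    if both == 0:
        return -math.inf
    return math.log(both * n / (n_corr * n_dis))


def permutation_null(correlated, disease, n_perm=1000, seed=0):
    """Background distribution: shuffle disease labels across positions."""
    rng = random.Random(seed)
    labels = list(disease)
    null = []
    for _ in range(n_perm):
        rng.shuffle(labels)
        score = lod(correlated, labels)
        if score is not None:
            null.append(score)
    return null


def bootstrap_lod(correlated, disease, n_boot=1000, seed=0):
    """Bootstrap distribution: resample positions with replacement."""
    rng = random.Random(seed)
    n = len(correlated)
    boot = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        score = lod([correlated[i] for i in idx], [disease[i] for i in idx])
        if score is not None:
            boot.append(score)
    return boot


def empirical_p_value(observed, null):
    """One-sided p-value with the usual +1 correction."""
    return (1 + sum(1 for s in null if s >= observed)) / (1 + len(null))


# Toy data: 10 correlated and 10 non-correlated positions.
correlated = [True] * 10 + [False] * 10
disease = [True] * 6 + [False] * 4 + [True] * 2 + [False] * 8
observed = lod(correlated, disease)  # ln(6 * 20 / (10 * 8)) = ln(1.5)
p = empirical_p_value(observed, permutation_null(correlated, disease))
```

Shuffling the labels keeps the number of disease positions fixed, so every permuted table has a defined score. Resampling positions in the bootstrap can occasionally yield no correlated or no disease positions; such draws are skipped.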
Figure 2.
Disease mutations are overrepresented in correlated positions.
Distribution of log odds (LOD) scores for individual proteins. Proteins for which no score could be obtained were excluded. A: all proteins; B: proteins with disease mutations. The bars at -Inf represent cases where no position was both correlated and associated with disease, resulting in an LOD of −∞, as discussed in the main text.
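A minimal sketch of the per-protein scores shows how a protein with no position that is both correlated and disease-associated ends up at −∞, and how proteins without a defined score are dropped. It uses an assumed definition of the LOD (log of observed over expected count). The row format and function names are hypothetical.

```python
import math
from collections import defaultdict


def lod_from_counts(both, n_corr, n_dis, n):
    """Assumed LOD: ln(observed / expected) for correlated, disease-associated positions."""
    if n == 0 or n_corr == 0 or n_dis == 0:
        return None  # no score can be obtained
    if both == 0:
        return -math.inf  # no position is both correlated and disease-associated
    return math.log(both * n / (n_corr * n_dis))


def per_protein_lods(rows):
    """rows: iterable of (protein_id, is_correlated, is_disease), one per position."""
    counts = defaultdict(lambda: [0, 0, 0, 0])  # [n, both, n_corr, n_dis]
    for protein, corr, dis in rows:
        t = counts[protein]
        t[0] += 1
        t[1] += int(bool(corr) and bool(dis))
        t[2] += int(bool(corr))
        t[3] += int(bool(dis))
    scores = {}
    for protein, (n, both, n_corr, n_dis) in counts.items():
        score = lod_from_counts(both, n_corr, n_dis, n)
        if score is not None:  # proteins without a score are excluded
            scores[protein] = score
    return scores


rows = [
    ("P1", True, True), ("P1", True, False), ("P1", False, True), ("P1", False, False),
    ("P2", True, False), ("P2", False, True), ("P2", False, False),
    ("P3", True, False), ("P3", False, False),
]
scores = per_protein_lods(rows)
n_neg_inf = sum(1 for s in scores.values() if s == -math.inf)
```

In this toy input, P2 has a correlated position and a disease position but never both at once, so its score is −∞. P3 has no disease position and therefore receives no score.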
Figure 3.
Interdependence between correlation and sequence conservation.
LOD distribution for different conservation thresholds: (A, B) LOD for correlated residues at different levels of conservation. (A) BLOSUM conservation score, (B) fractional identity. Each dot represents the LOD score achieved using a specific conservation cutoff. A cutoff of 0.4 indicates that for the calculation of the global LOD score only the residues which have a conservation score were taken into account. (C, D) LOD scores for sequence conservation irrespective of residue correlation. Here, a cutoff of 0.4 indicates that the global LOD represents all positions with a conservation score
. The LOD remains largely stable over wide ranges of sequence conservation (A,B). Residue conservation yields LODs similar to correlation for intermediate levels of conservation and performs better for very high conservation.
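The two kinds of conservation sweep in panels A/B and C/D can be sketched as follows. The first restricts the LOD of correlated positions to residues passing a conservation cutoff. The second uses conservation alone as the predictor. The sketch assumes the same observed-over-expected LOD. Whether a residue must lie above or below the cutoff is also an assumption here, so the direction is a parameter, `at_least`. All names are hypothetical.

```python
import math


def lod(flags, disease):
    """Assumed LOD: ln(observed / expected) count of flagged disease positions."""
    n = len(flags)
    both = sum(1 for f, d in zip(flags, disease) if f and d)
    n_flag = sum(1 for f in flags if f)
    n_dis = sum(1 for d in disease if d)
    if n == 0 or n_flag == 0 or n_dis == 0:
        return None
    if both == 0:
        return -math.inf
    return math.log(both * n / (n_flag * n_dis))


def passes(score, cutoff, at_least):
    # Direction of the comparison is an assumption, hence the flag.
    return score >= cutoff if at_least else score <= cutoff


def sweep_correlation_lod(conservation, correlated, disease, cutoffs, at_least=False):
    """Panels A/B analogue: LOD of correlated positions among residues passing the cutoff."""
    out = {}
    for t in cutoffs:
        keep = [i for i, s in enumerate(conservation) if passes(s, t, at_least)]
        out[t] = lod([correlated[i] for i in keep], [disease[i] for i in keep])
    return out


def sweep_conservation_lod(conservation, disease, cutoffs, at_least=True):
    """Panels C/D analogue: conservation alone is the predictor, ignoring correlation."""
    return {t: lod([passes(s, t, at_least) for s in conservation], disease) for t in cutoffs}


conservation = [0.1, 0.3, 0.5, 0.9, 0.95, 0.2]
correlated = [True, False, True, True, False, False]
disease = [True, False, False, True, True, False]
corr_sweep = sweep_correlation_lod(conservation, correlated, disease, [0.0, 0.4])
cons_sweep = sweep_conservation_lod(conservation, disease, [0.9])
```

A cutoff that leaves no residues yields no score (`None`), mirroring the exclusion of undefined scores elsewhere in the figures.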
Table 2.
Summary of performance.
Figure 4.
Venn diagrams of SIFT, PolyPhen and correlated mutations in (A) disease and (B) non-disease positions.
For SIFT and PolyPhen, predictions of "maybe damaging" and "possibly damaging" were treated as non-damaging.
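The counting behind each Venn diagram can be sketched as below. Each position is binarized for SIFT, PolyPhen and correlation, and positions are tallied per region. The label strings in `SIFT_DAMAGING` and `POLYPHEN_DAMAGING` are hypothetical stand-ins. Any other label, including "maybe damaging" or "possibly damaging", counts as non-damaging, following the caption.

```python
from collections import Counter

# Hypothetical label sets: only these calls count as damaging.
# Everything else (e.g. "maybe damaging", "possibly damaging") is non-damaging.
SIFT_DAMAGING = {"damaging"}
POLYPHEN_DAMAGING = {"probably damaging"}


def venn_regions(positions):
    """positions: iterable of (sift_label, polyphen_label, is_correlated).

    Returns a Counter keyed by (sift_damaging, polyphen_damaging, correlated);
    the key (False, False, False) is the area outside all three circles.
    """
    regions = Counter()
    for sift, pph, corr in positions:
        key = (sift in SIFT_DAMAGING, pph in POLYPHEN_DAMAGING, bool(corr))
        regions[key] += 1
    return regions


disease_positions = [
    ("damaging", "probably damaging", True),
    ("maybe damaging", "possibly damaging", True),
    ("tolerated", "benign", False),
    ("damaging", "possibly damaging", False),
]
regions = venn_regions(disease_positions)
```

Running the same function separately on disease and non-disease positions gives the counts for panels A and B.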
Figure 5.
Correlation patterns in the human proline peptidase PEPD.
(A) Circular representation of the linear protein sequence; arcs indicate residue correlations. (B) Structure view with selected correlations shown as dashed lines; metal-binding residues are shown with side chains (PDB: 2IW2). (A, B) Disease-associated positions are marked in red, functional sites in blue.
Figure 6.
Correlation patterns in the adenylate kinase isoenzyme 1.
(A) Circular representation of the linear protein sequence; arcs indicate residue correlations. (B) Structure view with selected correlations shown as dashed lines (PDB: 2C95). (A, B) Disease-associated positions are marked in red, functional sites in blue.
Figure 7.
(A) The degree (number of correlation partners) of a position in the correlation network is positively correlated with the log odds of causing disease. Because very large degrees are rare, the estimates become noisy at degrees above . For lower degrees the association is clearly visible. Overall, the association is substantial and significant (, ). (B) Degree distribution of multiple correlations.
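Computing degrees and a per-degree LOD can be sketched as follows, assuming the correlation network is given as a list of undirected position pairs, each listed once. The per-degree LOD is taken, as an assumption, to be ln(P(disease | degree k) / P(disease)). The optional `min_count` (hypothetical) drops degrees represented by too few positions, which is where the caption notes the noise.

```python
import math
from collections import Counter


def degrees(n_positions, pairs):
    """Degree of each position in an undirected correlation network (each pair listed once)."""
    deg = [0] * n_positions
    for i, j in pairs:
        deg[i] += 1
        deg[j] += 1
    return deg


def lod_by_degree(deg, disease, min_count=1):
    """Assumed LOD per degree k: ln(P(disease | degree k) / P(disease)).

    Degrees with fewer than `min_count` positions are skipped, since rare
    large degrees give noisy estimates.
    """
    p_dis = sum(1 for d in disease if d) / len(disease)
    if p_dis == 0:
        return {}
    by_k = {}
    for k in sorted(set(deg)):
        members = [d for g, d in zip(deg, disease) if g == k]
        if len(members) < min_count:
            continue
        p_k = sum(1 for d in members if d) / len(members)
        by_k[k] = math.log(p_k / p_dis) if p_k > 0 else -math.inf
    return by_k


pairs = [(0, 1), (0, 2), (0, 3), (1, 2)]
deg = degrees(5, pairs)
disease = [True, True, False, False, False]
by_k = lod_by_degree(deg, disease)
degree_distribution = Counter(deg)  # panel B analogue
```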
Figure 8.
Influence of the correlation cutoff .
Commonly, (i.e. ) is used. The LOD first increases substantially with increasing values of . Once reaches values of , the steep increase gives way to a much more moderate slope.
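A sweep over the correlation cutoff can be sketched as below. The sketch assumes that each residue pair carries a correlation score where higher means stronger. It treats a position as correlated if any of its pairs scores at or above the cutoff. The LOD uses the same assumed observed-over-expected definition. The names `cutoff`, `scored_pairs` and the helper functions are hypothetical and do not correspond to the symbol used in the paper.

```python
import math


def lod(flags, disease):
    """Assumed LOD: ln(observed / expected) count of flagged disease positions."""
    n = len(flags)
    both = sum(1 for f, d in zip(flags, disease) if f and d)
    n_flag = sum(1 for f in flags if f)
    n_dis = sum(1 for d in disease if d)
    if n == 0 or n_flag == 0 or n_dis == 0:
        return None
    if both == 0:
        return -math.inf
    return math.log(both * n / (n_flag * n_dis))


def correlated_positions(n, scored_pairs, cutoff):
    """Flag a position if any pair (i, j, score) involving it reaches the cutoff."""
    flags = [False] * n
    for i, j, score in scored_pairs:
        if score >= cutoff:
            flags[i] = flags[j] = True
    return flags


def lod_vs_cutoff(n, scored_pairs, disease, cutoffs):
    """LOD for each cutoff; stricter cutoffs flag fewer positions."""
    return {t: lod(correlated_positions(n, scored_pairs, t), disease) for t in cutoffs}


scored_pairs = [(0, 1, 0.9), (2, 3, 0.3)]
disease = [True, False, False, False]
curve = lod_vs_cutoff(4, scored_pairs, disease, [0.2, 0.5, 0.95])
```

Raising the cutoff trades the number of flagged positions for specificity. Once no pair passes the cutoff, the score is undefined (`None`).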