Figure 1.
Illustration of the PRINCE algorithm.
A query disease, denoted Q, has varying degrees of phenotypic similarity with other diseases, denoted d1–d5 (marked with maroon lines, where thicker lines represent higher similarity). Known causal genes for these similar diseases are connected by dashed blue lines and are used as the prior information. Proteins p1–p11 form the protein set of a protein-protein interaction network, where interactions are marked with black lines and thicker lines denote edges with higher confidence. A scoring function that is smooth over the network is computed using an iterative network propagation method. At every iteration of the algorithm, each protein pumps flow to its neighbors and receives flow from them. Protein colors correspond to the flow received in a specific iteration: the darker the color, the higher the flow. (A) The flow after the first iteration, representing the prior information. Only proteins p2, p4 and p9, which are directly associated with similar diseases, have a positive incoming flow. (B) After several iterations, the amount of flow to each node converges, and the resulting flow, which is used to score the proteins, is smooth over the network. p5 emerges as the best causal gene candidate for disease Q, as it interacts with both p2 and p4.
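The iterative flow described in this caption can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes the commonly used update rule in which each node's new score is a weighted sum of its neighbors' scores (with edge weights normalized by the square root of both endpoint degrees) mixed with its prior. The function name `propagate`, the mixing parameter `alpha`, and the five-protein toy network are all hypothetical and do not reproduce the network drawn in the figure.

```python
import math

def propagate(edges, prior, alpha=0.9, tol=1e-9, max_iter=1000):
    """Iterative network propagation (assumed update rule, see lead-in).

    edges: dict mapping (u, v) pairs to a positive confidence weight.
    prior: dict mapping every node to its prior score (0 if none).
    Returns a dict mapping every node to its converged flow.
    """
    nodes = list(prior)
    # Weighted degree of each node.
    degree = {n: 0.0 for n in nodes}
    for (u, v), w in edges.items():
        degree[u] += w
        degree[v] += w
    # Symmetrically normalized edge weights: w / sqrt(deg(u) * deg(v)).
    neighbors = {n: [] for n in nodes}
    for (u, v), w in edges.items():
        norm_w = w / math.sqrt(degree[u] * degree[v])
        neighbors[u].append((v, norm_w))
        neighbors[v].append((u, norm_w))
    flow = dict(prior)  # first iteration: the prior information only
    for _ in range(max_iter):
        # Each node receives flow from its neighbors and retains part of its prior.
        new_flow = {
            n: alpha * sum(w * flow[m] for m, w in neighbors[n])
            + (1 - alpha) * prior[n]
            for n in nodes
        }
        delta = max(abs(new_flow[n] - flow[n]) for n in nodes)
        flow = new_flow
        if delta < tol:  # flow has converged
            break
    return flow

# Hypothetical toy network: p5 interacts with both seed proteins p2 and p4.
edges = {("p2", "p5"): 1.0, ("p4", "p5"): 1.0, ("p5", "p6"): 1.0, ("p6", "p7"): 1.0}
prior = {"p2": 1.0, "p4": 1.0, "p5": 0.0, "p6": 0.0, "p7": 0.0}
scores = propagate(edges, prior)
best_candidate = max((n for n in prior if prior[n] == 0.0), key=scores.get)
```

In this toy example, the protein without prior evidence that accumulates the most flow is the one adjacent to both seed proteins, mirroring the role of p5 in the figure. The score falls off with distance from the seeds, which is the sense in which the result is smooth over the network.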
Figure 2.
A comparison of prioritization algorithms.
Performance comparison of PRINCE, Random Walk and CIPHER in a leave-one-out cross-validation test over 1,369 diseases with a known causal gene. The figure shows recall versus precision when considering the top-ranked proteins at various rank cutoffs.
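As a rough illustration of how such a precision-recall curve can be obtained from leave-one-out results, the sketch below computes one point per rank cutoff. It rests on assumptions not stated in the caption: each trial hides exactly one known causal gene, recall is the fraction of trials in which that gene appears within the cutoff, and precision is the number of such recoveries divided by the total number of predictions made. The function name and the example ranks are hypothetical.

```python
def precision_recall_at_cutoff(held_out_ranks, cutoff):
    """Precision and recall when the top `cutoff` proteins are predicted per trial.

    held_out_ranks: 1-based rank of the hidden causal gene in each
    leave-one-out trial (one entry per trial).
    """
    trials = len(held_out_ranks)
    hits = sum(1 for rank in held_out_ranks if rank <= cutoff)
    recall = hits / trials                # fraction of hidden genes recovered
    precision = hits / (cutoff * trials)  # recovered genes per prediction made
    return precision, recall

# Hypothetical ranks of the hidden gene in five trials.
example_ranks = [1, 3, 12, 2, 50]
curve = {c: precision_recall_at_cutoff(example_ranks, c) for c in (1, 3, 10)}
```

Raising the cutoff can only increase recall, while precision typically falls because more predictions are made per trial. Sweeping the cutoff therefore traces out the trade-off curve.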
Table 1.
Coherency comparison of different protein complex collections.
Figure 3.
Case studies of inferred complexes.
Examples of inferred protein complexes and their associated diseases. Circular nodes represent proteins and their connecting edges represent protein-protein interactions. Diseases are denoted by square nodes, connected by phenotypic similarity edges. Green dashed edges represent known gene-disease associations; red edges connect a disease to a gene that lies within its associated genomic interval. The complexes were generated for the query diseases (A) Ataxia-Telangiectasia, (B) Hereditary Prostate Cancer type 8 and (C) MOPD-I.