Network-based protein-protein interaction prediction method maps perturbations of cancer interactome

doi:10.1371/journal.pgen.1009869

Fig 1.

Illustration of the perturbation of the protein relationship network and NECARE algorithm.

Panels A-C introduce the concept of protein network perturbation. (A) Each node represents a protein. Mutations such as nonsense mutations can render a node completely inactive or absent (red), so that all edges connected to it are lost (gray dashed edges). (B) Each node represents a protein. Mutations such as missense mutations can cause the gain or loss of specific edges (purple edges: interactions gained due to the mutations; gray dashed edges: lost interactions), while the central node is not totally inactive. (C) An example of the perturbation of the protein relationship network in cancer, based on the KEGG database (6). Gray dashed edges are interactions that are lost in cancer, and purple edges are new interactions that genes gain in cancer. Panel D is a simple example of how NECARE represents a gene (red node) with R-GCN. Nodes a-e and the red node represent different genes, and the red node is the target gene. Nodes a-e are all connected to the red node, and different colored edges represent different types of interactions. First, each node is represented by a feature vector that contains three parts (tan: OPA2Vec; salmon: TCGA-based expression feature; taupe: TCGA-based mutation feature). Then, to represent the red node, the feature vectors of its neighbors are gathered and transformed for each relation type individually (for both in- and out-edges; a self-loop is also included). The resulting representations (vertical rectangles with different colors for different relation types) are summed and passed through an activation function (ReLU).
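The update described for Panel D can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the NECARE implementation: the function names, the toy 3-dimensional features (standing in for the OPA2Vec, expression, and mutation parts), the weight matrices, and the per-relation averaging (borrowed from the standard R-GCN formulation) are all hypothetical choices.

```python
def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def rgcn_update(target, features, edges, weights, self_weight):
    """One R-GCN-style update for a single target node.

    edges: list of (source, relation, destination) triples.
    weights: one matrix per (relation, direction) pair.
    self_weight: matrix applied to the target's own features (self-loop).
    """
    # Gather neighbors separately for each relation type and edge direction.
    groups = {}
    for src, rel, dst in edges:
        if dst == target:
            groups.setdefault((rel, "in"), []).append(src)
        if src == target:
            groups.setdefault((rel, "out"), []).append(dst)

    # Start from the self-loop term.
    total = matvec(self_weight, features[target])

    # Transform each group with its own matrix and sum the results.
    # Averaging within a group is an assumption (standard R-GCN normalization).
    for key, neighbors in groups.items():
        for n in neighbors:
            msg = matvec(weights[key], features[n])
            total = [t + m / len(neighbors) for t, m in zip(total, msg)]

    # ReLU activation.
    return [max(0.0, t) for t in total]


# Hypothetical toy data: target gene "t" with three neighbors.
features = {
    "t": [1.0, 0.0, 2.0],
    "a": [1.0, 1.0, 0.0],
    "b": [0.0, 2.0, 1.0],
    "c": [2.0, 0.0, 0.0],
}
edges = [
    ("a", "activation", "t"),
    ("b", "activation", "t"),
    ("t", "inhibition", "c"),
]
weights = {
    ("activation", "in"): [[1, 0, 0], [0, 1, 0]],
    ("inhibition", "out"): [[-1, 0, 0], [0, 0, 1]],
}
self_weight = [[1, 0, 0], [0, 0, 1]]

embedding = rgcn_update("t", features, edges, weights, self_weight)
print(embedding)
```

In a trained model the weight matrices would be learned; here they are fixed by hand so that the effect of each relation type on the summed representation is easy to follow.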


Fig 2.

Workflow of this study.

The figure describes the dataset we used and the full research pipeline, from data collection and NECARE model training to the subsequent network analysis with NECARE.


Fig 3.

Network-based cancer gene relationship (NECARE) prediction.

(A) All machine learning solutions reflect the strength of a prediction, even for binary classifications. This graph relates the prediction strength to the performance. The x-axis gives the prediction strength as the RI (from -100: very reliable noninteraction to 100: very reliable interaction). The y-axis shows the precision percentage (red line, Eq 3) and recall percentage (blue line, Eq 2). Precision increases with prediction strength, i.e., predictions with a higher RI are, on average, better than predictions with a lower RI. For example, of all the gene relationship predictions with RI>80 (black dashed line), approximately 96% are correct. (B) This graph relates prediction strength to performance for negative predictions (noninteractions). For example, of all the negative gene relationship predictions with RI<-80 (black dashed line), approximately 92% are correct. (C) The MCC (Eq 3) was used to compare different methods on the test set; our method NECARE obtains the highest MCC: 0.84. (D) ROC curve comparison of different methods on the test set. NECARE has the largest AUC: 0.97.
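As a rough illustration of how precision and recall can be read off at a given RI cutoff, and how an MCC is computed from the same counts, the sketch below uses the standard textbook definitions. Whether these match the paper's numbered equations exactly is an assumption, and the RI scores and labels are invented toy values, not NECARE output.

```python
import math


def confusion(ri_scores, labels, threshold=0):
    """Count (TP, FP, TN, FN), calling RI > threshold an interaction."""
    tp = fp = tn = fn = 0
    for ri, is_interaction in zip(ri_scores, labels):
        predicted = ri > threshold
        if predicted and is_interaction:
            tp += 1
        elif predicted:
            fp += 1
        elif is_interaction:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def precision(tp, fp):
    return tp / (tp + fp) if tp + fp else 0.0


def recall(tp, fn):
    return tp / (tp + fn) if tp + fn else 0.0


def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient (standard definition)."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical toy predictions: RI in [-100, 100], label 1 = true interaction.
ri_scores = [95, 85, 60, 20, -10, -50, -85, -95]
labels = [1, 1, 1, 0, 1, 0, 0, 0]

for cutoff in (0, 80):
    tp, fp, tn, fn = confusion(ri_scores, labels, cutoff)
    print(cutoff, precision(tp, fp), recall(tp, fn))

print(mcc(*confusion(ri_scores, labels, 0)))
```

Raising the cutoff in this toy set trades recall for precision, which is the qualitative pattern the panel describes for the real predictions.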


Fig 4.

Cancer hub genes of the cancer gene relationship network.

Type 1: hub genes enriched for only gained links; Type 2: hub genes enriched for only lost links; Type 3: hub genes enriched for both gained and lost links. (A) The number of cancer hub genes of each of the three types. (B) The distribution of cancer hub genes among chromosomes. The links inside the circle are the top 1000 links between cancer hub genes based on the NECARE output scores. Blue links are intra-chromosome interactions. (C) The eigenvector centrality of cancer hub genes. The x-axis is the centrality in the normal network, and the y-axis is the centrality in the cancer network.
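For readers unfamiliar with the measure in Panel C, eigenvector centrality can be approximated by power iteration. The sketch below is a generic illustration, not the procedure used in the study: the two toy edge lists, the gene names G1-G4, and the choice to iterate with the adjacency matrix plus the identity (which avoids oscillation on bipartite graphs) are assumptions made for the example.

```python
import math


def eigenvector_centrality(edges, iterations=200, tol=1e-10):
    """Approximate eigenvector centrality of an undirected graph."""
    # Build an adjacency map from the edge list.
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    x = {node: 1.0 for node in adj}
    for _ in range(iterations):
        # Iterate with (A + I): keep the node's own score and add its neighbors'.
        new = {n: x[n] + sum(x[m] for m in adj[n]) for n in adj}
        norm = math.sqrt(sum(v * v for v in new.values()))
        new = {n: v / norm for n, v in new.items()}
        if sum(abs(new[n] - x[n]) for n in adj) < tol:
            return new
        x = new
    return x


# Hypothetical toy networks: G1 is peripheral in "normal" and a hub in "cancer".
normal_edges = [("G1", "G2"), ("G2", "G3"), ("G3", "G4"), ("G2", "G4")]
cancer_edges = [("G1", "G2"), ("G1", "G3"), ("G1", "G4"), ("G2", "G3")]

normal = eigenvector_centrality(normal_edges)
cancer = eigenvector_centrality(cancer_edges)

# One point of a normal-vs-cancer scatter plot per gene.
for gene in sorted(normal):
    print(gene, round(normal[gene], 3), round(cancer[gene], 3))
```

A gene that gains links to well-connected partners moves above the diagonal of such a plot, and one that loses them moves below it.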


Fig 5.

The prognostic landscape of hub genes.

Kaplan–Meier plots for patients with 32 different types of cancer from TCGA, divided into high- and low-MS groups (Materials and Methods). P-values were calculated with the log-rank test.
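The curves in such plots come from the Kaplan–Meier product-limit estimator, which can be sketched as follows. This is a minimal illustration of the standard estimator only; the follow-up times, event flags, and group variable names are invented, and the log-rank test itself is not implemented here.

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each observed event time.

    times: follow-up time per patient.
    events: 1 if the event (e.g. death) was observed, 0 if censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Process all patients sharing the same follow-up time together.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            # Product-limit step: multiply by the fraction surviving time t.
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Both deaths and censored patients leave the risk set.
        at_risk -= removed
    return curve


# Hypothetical follow-up data (arbitrary time units) for two patient groups.
high_group = kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1])
low_group = kaplan_meier([2, 5, 6, 8], [0, 1, 0, 1])

print(high_group)
print(low_group)
```

Each group yields one step curve; comparing the two curves statistically is what the log-rank test in the caption is for.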


Fig 6.

Experimental validation of the NECARE predictions.

Panel A shows the genes that cross-talk with WNT3 and SHC2 in each pathway. Different colored edges represent different types of interactions: red edges indicate activation; blue edges indicate inhibition; green edges are KEGG-annotated binding; gray edges are NECARE-predicted binding. The yellow group on the left shows the genes interacting with WNT3 in the Wnt signaling pathway. The cyan group on the right shows the genes interacting with SHC2 in the Ras signaling pathway. The 10 genes in the middle with gray edges are predicted by NECARE to bind WNT3 and SHC2 with a high RI (> 90). Panels B and C show the co-IP experiments used to validate the interactions of the 10 predicted genes with WNT3 and SHC2 in LN229 cells. The interactions were determined by immunoblotting. An asterisk (“*”) marks a negative result of the co-IP validation experiment. Panel B: LN229 cells were co-transfected with the indicated HA-tagged constructs of the 10 predicted genes and FLAG-tagged WNT3. Panel C: LN229 cells were co-transfected with the indicated HA-tagged constructs of the 10 predicted genes and FLAG-tagged SHC2.
