Pathway Relevance Ranking for Tumor Samples through Network-Based Data Integration

doi:10.1371/journal.pone.0133503

Fig 1.

Global network construction.

(a) Conversion of binary data to a network representation. All continuous data are mapped to a binary representation with ‘1’ (colored squares) corresponding to a gene with a value deviating from normal for a particular sample. Each ‘1’ in the binary datasets is converted to an undirected link (solid line) between a gene node and a sample node. Prior knowledge, derived from public gene interaction repositories, is available in the form of undirected links (dashed grey line) between genes. Characters a-g correspond to gene IDs, S₁-S₃ represent sample IDs. (b) Construction of the global network. The network representations of the binary datasets and the prior knowledge network are merged to constitute a single comprehensive network representation. Gene nodes originating from the input datasets are connected to the corresponding gene in the prior knowledge interaction network (dashed yellow lines). (c) The resulting adjacency matrix representation of the undirected global network. For clarity, individual gene and sample identifiers are omitted. NET (grey) = genes from the prior knowledge interaction network, S (dark blue) = samples, EXP (green) = genes from the gene expression dataset, CNV (pink) = genes from the copy number dataset, MUT (light blue) = mutated genes, MET (orange) = methylated genes. (d) The similarity matrix derived from the adjacency matrix, indicating the parts of the similarity matrix that are relevant for the pathway ranking task.
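The construction in panels (a) to (c) can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `build_global_network`, the nested-dictionary input layout, and the toy data are hypothetical. It also assumes that a dataset gene node is created only when the gene has at least one '1', and that it is linked to the prior knowledge network only when the same gene identifier occurs there.

```python
def build_global_network(binary_datasets, prior_edges):
    """Merge binary datasets and a prior gene network into one adjacency matrix.

    binary_datasets: {dataset_name: {gene: {sample: 0 or 1}}}  (assumed layout)
    prior_edges: iterable of (gene, gene) pairs from the prior knowledge network
    """
    index = {}  # (node kind, identifier) -> row/column position

    def node(kind, name):
        # Assign the next free position the first time a node is seen.
        return index.setdefault((kind, name), len(index))

    edges = set()
    # Prior knowledge: undirected gene-gene links.
    for g1, g2 in prior_edges:
        edges.add((node("NET", g1), node("NET", g2)))

    for dataset, table in binary_datasets.items():
        for gene, flags in table.items():
            for sample, flag in flags.items():
                if flag != 1:
                    continue
                # Each '1' becomes a link between a gene node and a sample node.
                gene_node = node(dataset, gene)
                edges.add((gene_node, node("S", sample)))
                # Connect the dataset gene to the same gene in the prior network.
                if ("NET", gene) in index:
                    edges.add((gene_node, index[("NET", gene)]))

    n = len(index)
    adjacency = [[0] * n for _ in range(n)]
    for i, j in edges:
        adjacency[i][j] = adjacency[j][i] = 1  # undirected, hence symmetric
    return index, adjacency


# Toy input (made up for illustration only).
binary = {
    "EXP": {"a": {"S1": 1, "S2": 0}, "c": {"S1": 0, "S2": 1}},
    "MUT": {"b": {"S1": 1, "S2": 0}},
}
prior = [("a", "b"), ("b", "d")]
index, adjacency = build_global_network(binary, prior)
```

Because every link is undirected, the resulting matrix is symmetric, which matches the block layout (NET, S, EXP, CNV, MUT, MET) described for panel (c) up to the ordering of rows and columns.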

Fig 2.

Pathway relevance scoring.

Given a subset of the global similarity matrix (S_exp, S_cnv, S_mut, S_met; see Fig 1) and a set of genes (a, b, d) constituting a pathway P, a score for each input dataset is calculated by first removing the genes that do not belong to the pathway from S_exp, S_cnv, S_mut and S_met, and then taking the average of all remaining values in each of these matrices. This process is repeated for n randomly generated gene sets (each with the same number of genes as the pathway P), yielding n random scores for each input dataset. For each dataset, the random scores are used to calculate a p-value: the probability of obtaining the pathway score purely by chance. The resulting p-values are multiplied to give a single aggregated pathway score.
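The scoring procedure can be sketched as follows. This is an illustration under stated assumptions, not the published implementation: the function names, the representation of each similarity sub-matrix as a `{gene: [similarities to samples]}` dictionary, the choice to draw random gene sets from the genes present in each sub-matrix, the one-sided comparison, and the `+1` correction in the empirical p-value are all hypothetical choices. The similarity values are made up.

```python
import random
from math import prod
from statistics import mean


def pathway_score(sim, genes):
    """Average of all similarity values that belong to the given genes."""
    values = [v for g in genes if g in sim for v in sim[g]]
    return mean(values) if values else 0.0


def empirical_p(sim, pathway, n, rng):
    """Fraction of random gene sets scoring at least as high as the pathway."""
    present = [g for g in pathway if g in sim]
    if not present:
        return 1.0  # assumption: no evidence in this dataset
    observed = pathway_score(sim, present)
    universe = sorted(sim)
    hits = sum(
        pathway_score(sim, rng.sample(universe, len(present))) >= observed
        for _ in range(n)
    )
    # The +1 correction (an assumption) keeps the p-value strictly positive.
    return (hits + 1) / (n + 1)


def aggregate_pathway_score(sims, pathway, n=200, seed=0):
    """Per-dataset p-values and their product."""
    rng = random.Random(seed)
    p_values = {name: empirical_p(sim, pathway, n, rng) for name, sim in sims.items()}
    return p_values, prod(p_values.values())


# Toy similarity values for genes a-g and two samples (illustrative only).
exp_sim = {
    "a": [0.9, 0.8], "b": [0.9, 0.7], "d": [0.8, 0.9],
    "c": [0.1, 0.1], "e": [0.0, 0.1], "f": [0.1, 0.0], "g": [0.0, 0.0],
}
sims = {"EXP": exp_sim, "MUT": exp_sim}
p_values, aggregated = aggregate_pathway_score(sims, ["a", "b", "d"])
p_low, _ = aggregate_pathway_score(sims, ["c", "e", "g"])
```

In this sketch a smaller product indicates a pathway whose scores are less likely to arise by chance; how the product is transformed for ranking or plotting is not specified in the caption and is left open here.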

Fig 3.

The 20 highest ranking pathways for each of the four breast cancer subtypes.

The aggregate score assigned to each pathway can be decomposed into four probabilistic components. The contribution of each component to the total score is shown as a differently colored bar: mRNA expression (dark blue), copy number (light blue), mutation (green) and methylation (yellow).

Fig 4.

Pathway scores compared across breast cancer subtypes for a selection of pathways.

Dark blue = Basal-like, light blue = HER2, green = Luminal A and yellow = Luminal B.

Fig 5.

The 20 highest ranking pathways for the two most extreme ovarian cancer survival-based subtypes.

The contribution of each component to the total score is shown as a differently colored bar: mRNA expression (dark blue), copy number (light blue), mutation (green) and methylation (yellow).

Fig 6.

Ratio of bad-outcome pathway scores and the corresponding good-outcome scores.

A ratio of ‘1’ indicates that the pathway scores equally high for patients in the bad-outcome group and patients in the good-outcome group. Values larger than 1 indicate higher pathway importance or activity for the bad-outcome group. Only the 20 highest-scoring pathways in the bad-outcome group are shown.
