PRER: A patient representation with pairwise relative expression of proteins on biological networks

doi:10.1371/journal.pcbi.1008998

Fig 1.

Illustration to show how the PRER representation is obtained for a single source node, node B.

The nodes in the graph are proteins, and an edge exists between two proteins if they interact in the PPI network. First, several random walks that start at node B are generated as in [17]. These random walks are stored in W_B and used to define the neighborhood of B, N_B. Only the most frequently visited nodes are included in the set of neighbors of B. Then, pairwise comparisons of the expression quantities of the neighborhood proteins are used to form a representation of the patient for node B and its neighborhood. The figure shows the features generated for a single protein. This procedure is repeated for all source proteins, and the resulting vectors are concatenated.
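The procedure in this caption can be illustrated with a minimal sketch. It assumes uniform random walks (the walk strategy of [17] may differ), a binary "is A expressed higher than B" encoding, and comparisons among the neighbors only; the function names `neighborhood` and `prer_features` and all parameter defaults are hypothetical and not taken from the paper.

```python
import random
from collections import Counter
from itertools import combinations

def neighborhood(graph, source, n_walks=50, walk_len=5, top_k=3, seed=0):
    """Return up to top_k nodes most often visited by random walks from source.

    graph maps each node to a list of its neighbors (assumed adjacency format).
    """
    rng = random.Random(seed)
    visits = Counter()
    for _ in range(n_walks):
        node = source
        for _ in range(walk_len):
            nbrs = graph[node]
            if not nbrs:
                break  # dead end: stop this walk early
            node = rng.choice(nbrs)  # uniform step (an assumption)
            if node != source:
                visits[node] += 1
    return [n for n, _ in visits.most_common(top_k)]

def prer_features(expression, neighbors):
    """Map each neighbor pair (a, b) to 1 if a is expressed higher than b, else 0."""
    return {
        (a, b): int(expression[a] > expression[b])
        for a, b in combinations(sorted(neighbors), 2)
    }

# Toy PPI graph and one patient's expression values (made-up numbers).
graph = {"A": ["B"], "B": ["A", "C", "D"], "C": ["B", "D"], "D": ["B", "C"]}
patient = {"A": 1.0, "B": 2.5, "C": 3.0, "D": 2.0}
features_for_B = prer_features(patient, neighborhood(graph, "B"))
```

Repeating this for every source protein and concatenating the resulting feature dictionaries would give the full patient vector described above.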


Fig 2.

The pipeline for survival prediction.

The step that involves generating PRER is skipped when the experiment is run with the alternative method of individual expression values.


Fig 3.

Comparison of the performance of random survival forest (RSF) models trained with individual protein expression values and with pairwise ranking representations for different cancer types.

Each distribution is over 100 trained models, each with a different random train/test split. In each case, the performance of models that use individual expression values as features (Individual) is compared with that of models that use the PRER representation as features (PRER).


Table 1.

Win/Tie/Loss counts of PRER against competing methods.

PRER is compared against each competing method over 100 trained models, where each model is trained on a different train/test split. The comparisons are based on the one-sided Wilcoxon signed-rank test with Benjamini-Hochberg (BH) correction for multiple hypothesis testing at a significance level of 0.05. The Hofree et al. method is the network propagation algorithm [7]. RRSF stands for the reweighted random survival forest by Wang and Liu [13].
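As a rough illustration of how win/tie/loss counts could be derived from such tests, the sketch below applies a BH adjustment to precomputed one-sided p-values and counts outcomes at the 0.05 level. It assumes the p-values from both test directions are corrected together, which the caption does not specify; `bh_adjust` and `win_tie_loss` are hypothetical helper names, and the p-values in the example are made up.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def win_tie_loss(p_better, p_worse, alpha=0.05):
    """Count wins, ties and losses across cancer types.

    p_better[i]: one-sided p-value that PRER outperforms the competitor.
    p_worse[i]:  one-sided p-value for the opposite direction.
    """
    n = len(p_better)
    adj = bh_adjust(list(p_better) + list(p_worse))  # pooled correction (assumption)
    wins = sum(a < alpha for a in adj[:n])
    losses = sum(a < alpha for a in adj[n:])
    return wins, n - wins - losses, losses

# Three hypothetical cancer types.
counts = win_tie_loss([0.001, 0.9, 0.4], [0.999, 0.002, 0.6])
```

The one-sided p-values themselves would come from a Wilcoxon signed-rank test over the paired performance scores of the 100 splits; that step is omitted here.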


Fig 4.

The variable importance of significant pairwise ranking representations for ovarian cancer.


Fig 5.

PRER Network for ovarian adenocarcinoma.

Nodes represent proteins that appear in the top 50 pairwise ranking representations for ovarian cancer; each edge indicates that two proteins participate together in a pairwise rank order feature. Where the expression value pertains to the protein's phosphorylated state, the ID includes the phosphosite's amino acid type and residue position.


Table 2.

The top PRER feature in each cancer type.

The relative expression level of this feature is found to be important in the RSF model. The gene symbols of the corresponding genes are listed. The letter P after a gene symbol indicates the phosphorylated version of the protein, in which case the type of phosphosite and its residue number are provided.


Fig 6.

Age- and sex-adjusted Kaplan-Meier plots for a) KIRC and b) BLCA based on overall survival.

Number at risk denotes the number of patients at risk at a given time, and the p-value is calculated with the log-rank test.
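To make "number at risk" concrete, here is a minimal sketch of the plain (unadjusted) Kaplan-Meier estimator. It does not perform the age and sex adjustment or the log-rank test mentioned in the caption; the function name `kaplan_meier` and the toy data are illustrative assumptions.

```python
def kaplan_meier(times, events):
    """Return [(time, n_at_risk, survival)] at each distinct event time.

    times:  follow-up time per patient.
    events: 1 if death was observed at that time, 0 if censored.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group all patients whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / n_at_risk
            curve.append((t, n_at_risk, surv))
        # Deaths and censored patients both leave the risk set after t.
        n_at_risk -= removed
    return curve

# Five hypothetical patients: three deaths, two censored.
curve = kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 0, 1])
```

In this toy example the number at risk drops from 5 to 4 to 1 at the three event times, because censored patients leave the risk set without changing the survival estimate.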


Table 3.

Top-10 rank differentiated features in each cancer with PRER.
