Fig 1.
Multiview iterative random forests (MiRF) model.
The iRF algorithm has 3 main components: first, iteratively reweighted random forests (RF) run for K iterations; second, generalized random intersection trees (RIT), which take projected binary features from the last feature-weighted RF as input; and third, bagged stability scores, which aggregate interactions that are prevalent across B bootstrapped samples. The 3 bootstrapping steps are indicated in blue. The model has 2 main outputs: important features, defined as those with the highest Gini importance that were returned in at least 50% of bootstrapped replicates; and interactions between important features with stability scores > 0.5 across bootstrapped replicates. The MiRF model is composed of six models: three for overall survival (OS), built on gene expression, proteomics, and integrated omics data, and three for Karnofsky Performance Status (KPS), built on the same three data types.
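As an illustration of the third component, the stability score of an interaction can be read as the fraction of the B bootstrapped replicates in which that interaction was recovered. The sketch below is a minimal stand-alone version under that reading, not the authors' implementation; the function name `stability_scores`, the feature names, and the replicate lists are hypothetical.

```python
from collections import Counter


def stability_scores(bootstrap_interactions):
    """Return the fraction of bootstrap replicates that recovered each interaction.

    bootstrap_interactions: one list per replicate, each holding the
    interactions (tuples of feature names) recovered in that replicate.
    """
    n_replicates = len(bootstrap_interactions)
    if n_replicates == 0:
        return {}
    counts = Counter()
    for recovered in bootstrap_interactions:
        # frozenset ignores feature order; the enclosing set counts an
        # interaction at most once per replicate.
        counts.update({frozenset(interaction) for interaction in recovered})
    return {interaction: n / n_replicates for interaction, n in counts.items()}


# Hypothetical replicates (B = 4), for illustration only.
replicates = [
    [("geneA", "geneB"), ("geneA", "protC")],
    [("geneB", "geneA")],
    [("geneA", "geneB"), ("protC", "protD")],
    [("protC", "geneA")],
]
scores = stability_scores(replicates)

# Keep interactions above the 0.5 threshold used in the caption.
stable = {k: v for k, v in scores.items() if v > 0.5}
```

With the strict threshold, an interaction found in exactly half of the replicates is not retained.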
Fig 2.
MiRF model performance in feature prediction and interaction recovery.
a 95% CIs of mean AUPR curves for gene expression, proteomics, and integrated omics data used to predict important features in patients who lived for more than 2 years. b 95% CIs of mean AUROC curves for gene expression, proteomics, and integrated omics data used to recover interactions in patients who lived for more than 2 years. c 95% CIs of mean AUPR curves for gene expression, proteomics, and integrated omics data used to predict important features in patients who had KPS ≥ 80. d 95% CIs of mean AUROC curves for gene expression, proteomics, and integrated omics data used to recover interactions in patients who had KPS ≥ 80.
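For readers unfamiliar with how such intervals are summarized, the sketch below computes a scalar AUROC per replicate and a 95% CI of the mean across replicates. It is a simplified illustration: the caption does not state how the CIs were derived, so the normal approximation (mean ± 1.96 × standard error) is an assumption, the panels show full curves where this sketch uses a single summary value, and all labels, scores, and replicate values are hypothetical.

```python
import math
import statistics


def auroc(labels, scores):
    """AUROC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as 0.5.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def mean_ci95(values):
    """Mean and normal-approximation 95% CI (assumed method, see lead-in)."""
    m = statistics.mean(values)
    se = statistics.stdev(values) / math.sqrt(len(values))
    return m, m - 1.96 * se, m + 1.96 * se


# Hypothetical single replicate: 1 = true interaction, 0 = not.
example_auroc = auroc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1])

# Hypothetical AUROC values from three replicates.
mean_auc, ci_low, ci_high = mean_ci95([0.70, 0.80, 0.90])
```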
Table 1.
List of important features obtained by the MiRF model.
Fig 3.
Stability scores of recovered interactions from the MiRF model.
a Stability scores of recovered interactions for OS ≥ 2 years. b Stability scores of recovered interactions for QOL ≥ 80 KPS. Colors indicate interactions from different datasets. Blue indicates interactions between proteomics data, orange indicates interactions between gene expression data, and red indicates interactions between integrated omics.
Fig 4.
Functional annotation and interaction analysis.
a Clustered heatmaps illustrate the functional annotation obtained from the DAVID database, grouped by the list of molecular features obtained from MiRF for OS ≥ 2 years and QOL ≥ 80 KPS. Each cell reports the fold enrichment of each gene toward a specific annotation. Darker coloring indicates larger values. Cluster colors correspond to the annotation source: red represents KEGG pathways and blue represents GO-BP annotations. b Pie chart represents the distribution of proteins categorized by functions relevant to the overall survival or quality of life genes. c Protein–protein interactions from the STRING database for OS ≥ 2 years and QOL ≥ 80 KPS, respectively. Colored nodes represent query proteins and the first shell of interactors; filled nodes indicate proteins with a known or predicted 3D structure. Interactions include known interactions from curated databases and those that were experimentally determined; interactions predicted from gene neighborhood, gene fusion, and gene co-occurrence; and others derived from text mining, co-expression, and protein homology. The interaction types are distinguished by color.
Table 2.
a. Important features for OS ≥ 2 years.
b. Important features for QOL ≥ 80 KPS.
Fig 5.
Regularized CPH regression model.
a1 and b1 Plots of the 10-fold cross-validated error rates show the optimal lambda value, which gives the highest Harrell C-index and the minimum error, for the OS and overall signature models, respectively. a2 and b2 Dot charts show the β value of each feature. Features with β values closer to zero are less important, and those with β values farther from zero are more important.
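Harrell's C-index, used above to select lambda, measures how often the model ranks a pair of patients correctly: among comparable pairs, the patient with the earlier observed event should have the higher risk score (in a Cox model, the linear predictor built from the β values). The sketch below is a minimal plain-Python version for illustration, not the authors' pipeline; the function name and example data are hypothetical, and pairs with tied times are skipped for simplicity.

```python
def harrell_c_index(times, events, risk_scores):
    """Harrell's concordance index for right-censored survival data.

    times: follow-up times; events: 1 if the event was observed, 0 if censored;
    risk_scores: higher values mean higher predicted risk (shorter survival).
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when subject i had an observed event
            # strictly before subject j's follow-up time.
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied predictions get half credit
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical data: times in months, third patient censored.
c_index = harrell_c_index(
    times=[5, 10, 15, 20],
    events=[1, 1, 0, 1],
    risk_scores=[0.9, 0.3, 0.6, 0.1],
)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking.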
Table 3.
a. Univariate and Multivariate Cox Regression Analysis of OS ≥ 2 years.
b. Univariate and Multivariate Cox Regression Analysis of Overall signatures.
Fig 6.
a1 and a2 3D PCA scatter plots, with their explained variance plots, for the OS and QOL signatures show two clusters for each model: blue for OS ≥ 2 years and green for OS ≤ 6 months (a1), and blue for KPS ≥ 80 and green for KPS ≤ 60 (a2). b1 and b2 Heatmaps represent internal validation on the list of molecular features obtained from MiRF for OS ≥ 2 years and QOL ≥ 80 KPS. Each cell reports the average expression across all samples for each signature. Blue shades indicate high expression and red shades indicate low expression. c1 and c2 t-SNE plots show the effect of the OS and overall signatures, respectively, in distinguishing the OS ≥ 2 years and OS ≤ 6 months groups. Each plot shows two clusters with distinct expression profiles of the OS and overall signatures. Red indicates OS ≥ 2 years and green indicates OS ≤ 6 months. d1 and d2 Boxplots represent the mean expression cutoffs for OS ≥ 2 years and OS ≤ 6 months for the OS and overall signatures, respectively. Red indicates OS ≥ 2 years and green indicates OS ≤ 6 months. A p-value < 0.05 was considered significant.
Table 4.
Mean expression cutoffs and external validation on the CPTAC database.