Molecular signature to predict quality of life and survival with glioblastoma using Multiview omics model

doi:10.1371/journal.pone.0287448

Fig 1.

Multiview iterative random forests (MiRF) model.

The iRF algorithm has 3 main components: first, an iteratively reweighted RF run for K iterations; second, a generalized random intersection trees (RIT) step that takes the projected binary features from the last feature-weighted RF as input; and third, bagged stability scores that aggregate interactions prevalent across B bootstrapped samples. The 3 bootstrapping steps are indicated in blue. The model has 2 main outputs: important features, defined as those with the highest Gini importance returned in at least 50% of bootstrapped replicates, and interactions between important features with stability scores > 0.5 across bootstrapped replicates. The MiRF model is composed of six models: three for overall survival (OS), using gene expression, proteomics, and integrated omics data; and three for Karnofsky performance status (KPS), using the same three data types.
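The two selection rules in this caption (features kept when returned in at least 50% of bootstrapped replicates, interactions kept when their stability score exceeds 0.5) can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function names, the input layout (one list of selected items per bootstrap replicate), and the placeholder feature names are all assumptions.

```python
from collections import Counter


def stability_scores(replicate_selections):
    """Return the fraction of bootstrap replicates in which each item appears."""
    n_replicates = len(replicate_selections)
    if n_replicates == 0:
        raise ValueError("need at least one bootstrap replicate")
    counts = Counter()
    for selected in replicate_selections:
        counts.update(set(selected))  # count each item at most once per replicate
    return {item: count / n_replicates for item, count in counts.items()}


def stable_features(replicate_features, min_fraction=0.5):
    """Keep features returned in at least `min_fraction` of replicates."""
    scores = stability_scores(replicate_features)
    return {f: s for f, s in scores.items() if s >= min_fraction}


def stable_interactions(replicate_interactions, threshold=0.5):
    """Keep interactions whose stability score is strictly above `threshold`."""
    # Interactions are unordered sets of features, so (a, b) equals (b, a).
    normalized = [[frozenset(i) for i in rep] for rep in replicate_interactions]
    scores = stability_scores(normalized)
    return {i: s for i, s in scores.items() if s > threshold}


# Hypothetical example with B = 4 bootstrap replicates and placeholder names.
features = [["g1", "g2"], ["g1"], ["g1", "g3"], ["g2", "g1"]]
interactions = [[("g1", "g2")], [("g2", "g1")], [("g1", "g2"), ("g1", "g3")], []]
print(stable_features(features))
print(stable_interactions(interactions))
```

Storing each interaction as a `frozenset` makes the score independent of the order in which the interacting features are listed.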

Fig 2.

MiRF model performance in feature prediction and interaction recovery.

a 95% CIs of mean AUPR curves for gene expression, proteomics, and integrated omics data used to predict important features in patients who lived for more than 2 years. b 95% CIs of mean AUROC curves for gene expression, proteomics, and integrated omics data used to recover interactions in patients who lived for more than 2 years. c 95% CIs of mean AUPR curves for gene expression, proteomics, and integrated omics data used to predict important features in patients who had KPS ≥ 80. d 95% CIs of mean AUROC curves for gene expression, proteomics, and integrated omics data used to recover interactions in patients who had KPS ≥ 80.
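As a rough illustration of the quantities plotted here, the sketch below computes an AUROC from labels and scores and then a 95% CI for the mean of several per-replicate values. The caption does not state how the intervals were derived, so the normal approximation (mean ± 1.96 standard errors) is an assumption, as are the function names and the example numbers.

```python
import math
import statistics


def auroc(labels, scores):
    """AUROC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def mean_ci95(values):
    """Mean and normal-approximation 95% CI of the mean (assumed method)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values to estimate a standard error")
    mean = statistics.fmean(values)
    half_width = 1.96 * statistics.stdev(values) / math.sqrt(n)
    return mean, mean - half_width, mean + half_width


# Hypothetical per-replicate AUROC values, not taken from the paper.
replicate_aucs = [0.7, 0.8, 0.9]
print(auroc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))
print(mean_ci95(replicate_aucs))
```

The pairwise AUROC is quadratic in sample size, which is acceptable for a sketch; a rank-based formula would be preferable for large datasets.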

Table 1.

List of important features obtained by the MiRF model.

Fig 3.

Stability scores of recovered interactions from the MiRF model.

a Stability scores of recovered interactions for OS ≥ 2 years. b Stability scores of recovered interactions for QOL ≥ 80 KPS. Colors indicate the dataset each interaction comes from: blue for interactions among proteomics features, orange for interactions among gene expression features, and red for interactions in the integrated omics data.

Fig 4.

Functional annotation and interaction analysis.

a Clustered heatmaps illustrate the functional annotation obtained from the DAVID database, grouped by the list of molecular features obtained from MiRF for OS ≥ 2 years and QOL ≥ 80 KPS. Each cell reports the fold enrichment of each gene toward a specific annotation. Darker coloring is associated with larger values and vice versa. The coloring of clusters corresponds to the following annotations: red represents KEGG pathways and blue represents GO-BP annotations. b Pie chart represents the distribution of proteins categorized by functions relevant to overall survival or quality of life genes. c Protein–protein interactions from the STRING database for OS ≥ 2 years and QOL ≥ 80 KPS, respectively. Colored nodes represent query proteins and the first shell of interactors; filled nodes indicate proteins with a known or predicted 3D structure. Interactions include known interactions from curated databases and those that were experimentally determined; interactions predicted by gene neighborhood, gene fusion, and gene co-occurrence; and others derived from text mining, co-expression, and protein homology. Each interaction type is shown in a different color.
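For readers unfamiliar with the fold enrichment reported in panel a, DAVID defines it as the proportion of the input list carrying an annotation divided by the proportion of the background carrying it. The sketch below shows only that ratio; the function name and the counts in the example are hypothetical.

```python
def fold_enrichment(list_hits, list_total, pop_hits, pop_total):
    """Ratio of the annotated proportion in the input list to that in the background."""
    if list_total <= 0 or pop_hits <= 0 or pop_total <= 0:
        raise ValueError("totals and background hits must be positive")
    return (list_hits / list_total) / (pop_hits / pop_total)


# Hypothetical counts: 5 of 50 list genes carry the term,
# versus 200 of 20000 genes in the background.
print(fold_enrichment(5, 50, 200, 20000))
```

A value above 1 means the annotation is over-represented in the input list relative to the background.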

Table 2.

a. Important features for OS ≥ 2 years.

b. Important features for QOL ≥ 80 KPS.

Fig 5.

Regularized CPH regression model.

a1 and b1 Plots of the 10-fold cross-validated error rates show the optimal lambda value, chosen as the one with the highest Harrell C-index and minimum error, for the OS and overall signature models, respectively. a2 and b2 Dot charts show the β value of each feature. Features with β values closer to zero are less important, and vice versa.
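The Harrell C-index used to pick lambda measures how often the model assigns a higher risk to the subject who experiences the event earlier. A minimal sketch of the standard definition follows; it is not the authors' code, and the function name and example data are illustrative assumptions.

```python
def harrell_c_index(times, events, risk_scores):
    """Fraction of comparable pairs in which the higher risk score has the shorter time.

    A pair (i, j) is comparable when subject i has an observed event
    (events[i] is truthy) strictly before subject j's follow-up time.
    Tied risk scores contribute 0.5.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot be the earlier member of a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical data: follow-up times, event indicators (1 = event, 0 = censored),
# and model risk scores (higher means worse predicted survival).
times = [5, 10, 15, 20]
events = [1, 1, 0, 1]
risk = [0.9, 0.4, 0.6, 0.1]
print(harrell_c_index(times, events, risk))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, so in cross-validation the lambda with the highest mean C-index across folds would be preferred.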

Table 3.

a. Univariate and Multivariate Cox Regression Analysis of OS ≥ 2 years.

b. Univariate and Multivariate Cox Regression Analysis of Overall signatures.

Fig 6.

MiRF signatures validation.

a1 and a2 3D PCA scatter plots, with their explained variance plots, for the OS and QOL signatures, respectively. Each model shows two clusters: for OS, blue marks OS ≥ 2 years and green marks OS ≤ 6 months; for QOL, blue marks KPS ≥ 80 and green marks KPS ≤ 60. b1 and b2 Heatmaps represent internal validation of the list of molecular features obtained from MiRF for OS ≥ 2 years and QOL ≥ 80 KPS. Each cell reports the average expression across all samples for each signature. Blue shades indicate high expression and red shades indicate low expression. c1 and c2 t-SNE plots show the effect of the OS and overall signatures, respectively, in distinguishing the OS ≥ 2 years and OS ≤ 6 months groups. Each plot has two clusters with distinct expression profiles of the OS and overall signatures. Red for OS ≥ 2 years, green for OS ≤ 6 months. d1 and d2 Boxplots represent the expression mean cutoffs for OS ≥ 2 years and OS ≤ 6 months for the OS and overall signatures, respectively. Red for OS ≥ 2 years, green for OS ≤ 6 months. A p-value of < 0.05 was considered significant.

Table 4.

Expression mean cutoffs and external validation on the CPTAC database.
