Identification of an Efficient Gene Expression Panel for Glioblastoma Classification

doi:10.1371/journal.pone.0164649

Table 1.

Datasets used in this experiment.

Fig 1.

Random Model-Based Panel Size Determination.

Panels show the average accuracy of models built from random gene subsets of between 2 and 60 genes. Each data point in A and C represents the average comparative accuracy of the 1000 random models on all 803 samples. The models were built from the randomly selected genes and trained with only the samples used in the Verhaak et al. study. B and D show smoothed curves produced by fitting the random-model data with local regression. Smoothing the curves allows a better estimate of how much information larger gene subsets are likely to add. For the purposes of this study, we set the cutoff at the point where an additional gene adds less than 0.001 percent to average random model accuracy. The final values selected using this approach were 32 genes for RNA-seq data (A and B) and 48 genes (C and D) for the reduced Verhaak et al. classification. Both cutoffs are marked with a dotted line connecting to the x axis.
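
The cutoff rule in this caption can be illustrated with a minimal sketch. It assumes a simple tricube-weighted local linear smoother as a stand-in for the local regression used in the study, and a synthetic accuracy curve in place of the real random-model averages. The function names, the `span` value, and the `min_gain` threshold are illustrative assumptions, not values taken from the paper.

```python
import math

def local_linear_smooth(xs, ys, span=0.3):
    """Tricube-weighted local linear regression (a simple LOESS-style smoother)."""
    n = len(xs)
    k = max(3, int(math.ceil(span * n)))  # number of neighbours per local fit
    smoothed = []
    for x0 in xs:
        # Bandwidth: distance to the k-th nearest x value.
        dists = sorted(abs(x - x0) for x in xs)
        h = dists[k - 1] or 1.0
        w = [(1 - min(abs(x - x0) / h, 1.0) ** 3) ** 3 for x in xs]
        sw = sum(w)
        mx = sum(wi * x for wi, x in zip(w, xs)) / sw
        my = sum(wi * y for wi, y in zip(w, ys)) / sw
        sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, xs))
        sxy = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, xs, ys))
        slope = sxy / sxx if sxx > 0 else 0.0
        smoothed.append(my + slope * (x0 - mx))
    return smoothed

def find_cutoff(sizes, smoothed, min_gain):
    """Return the first panel size whose extra gene adds less than min_gain."""
    for i in range(1, len(sizes)):
        if smoothed[i] - smoothed[i - 1] < min_gain:
            return sizes[i]
    return None

# Synthetic (hypothetical) curve standing in for the averaged random-model accuracy.
sizes = list(range(2, 61))
accuracy = [0.5 + 0.4 * (1 - math.exp(-k / 6)) + 0.001 * math.sin(k) for k in sizes]

smoothed = local_linear_smooth(sizes, accuracy)
cutoff = find_cutoff(sizes, smoothed, min_gain=0.001)  # illustrative threshold
print("suggested panel size:", cutoff)
```

Smoothing before differencing matters here because the raw averages fluctuate from one subset size to the next, so a per-gene gain computed on the unsmoothed curve could fall below the threshold by chance.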

Fig 2.

Genetic Algorithm/Random Forest Flow Chart.

Overview of the Genetic Algorithm/Random Forest approach used to select the best 48-gene classifier. A starting gene pool is refined by repeatedly removing the least "fit" genes until the remaining subset represents a local maximum, which depends on the starting subsets of genes selected.
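
The elimination loop described in this caption might look roughly like the sketch below. This is not the authors' implementation: the scoring function `toy_score` is a hypothetical stand-in for random forest accuracy, and the gene names, `drop_fraction`, and `n_subsets` are assumptions chosen only to make the example run.

```python
import random

def gene_fitness(pool, score_subset, n_subsets, subset_size, rng):
    """Estimate each gene's fitness as the mean score of random subsets containing it."""
    totals = {g: 0.0 for g in pool}
    counts = {g: 0 for g in pool}
    for _ in range(n_subsets):
        subset = rng.sample(pool, subset_size)
        score = score_subset(subset)
        for g in subset:
            totals[g] += score
            counts[g] += 1
    return {g: (totals[g] / counts[g] if counts[g] else 0.0) for g in pool}

def refine_pool(pool, score_subset, target_size,
                drop_fraction=0.2, n_subsets=200, seed=0):
    """Repeatedly drop the least fit genes until target_size genes remain."""
    rng = random.Random(seed)
    pool = list(pool)  # work on a copy
    while len(pool) > target_size:
        fit = gene_fitness(pool, score_subset, n_subsets, target_size, rng)
        n_drop = max(1, int(drop_fraction * len(pool)))
        n_drop = min(n_drop, len(pool) - target_size)  # never overshoot the target
        pool.sort(key=lambda g: fit[g], reverse=True)
        pool = pool[:len(pool) - n_drop]
    return pool

# Hypothetical toy problem: five "informative" genes hidden in a pool of 30.
INFORMATIVE = {"g0", "g1", "g2", "g3", "g4"}

def toy_score(subset):
    # Stand-in for classifier accuracy: fraction of informative genes in the subset.
    return sum(g in INFORMATIVE for g in subset) / len(subset)

start_pool = [f"g{i}" for i in range(30)]
panel = refine_pool(start_pool, toy_score, target_size=8)
print(sorted(panel))
```

Because fitness is estimated from random subsets, the surviving panel depends on the starting pool and the random seed, which is consistent with the caption's description of the result as a local maximum.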

Fig 3.

Kaplan-Meier Survival Curves for our Combined Cohort.

Kaplan-Meier survival curves for 537 patients from the Rembrandt and TCGA datasets, classified using the original Verhaak et al. ClaNC-based classification (left) and the reduced random forest classification based on the GBM48 panel described in this paper (right). The y-axis represents the proportion of surviving patients. Both classifications show a statistically significant difference between Proneural and the other subtypes according to the log-rank test (p-values at the bottom of the figures). The GBM48 panel shows more significant differences in expected clinical outcome.
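
For readers unfamiliar with the log-rank test mentioned in this caption, the following is a minimal two-group sketch. The survival times and censoring flags are invented for illustration and are not data from the Rembrandt or TCGA cohorts; the function name `logrank_test` is likewise an assumption.

```python
import math

def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square statistic, p-value).

    events_* hold 1 for an observed death and 0 for a censored observation.
    """
    event_times = sorted(
        {t for t, e in zip(times_a, events_a) if e}
        | {t for t, e in zip(times_b, events_b) if e}
    )
    obs_minus_exp = 0.0
    variance = 0.0
    for t in event_times:
        n_a = sum(1 for x in times_a if x >= t)  # at risk in group A
        n_b = sum(1 for x in times_b if x >= t)  # at risk in group B
        d_a = sum(1 for x, e in zip(times_a, events_a) if e and x == t)
        d_b = sum(1 for x, e in zip(times_b, events_b) if e and x == t)
        n, d = n_a + n_b, d_a + d_b
        obs_minus_exp += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0, 1.0
    chi2 = obs_minus_exp ** 2 / variance
    # Survival function of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical survival times in months (1 = death observed, 0 = censored).
long_t = [6, 13, 21, 30, 37, 38, 49, 50, 63, 79]
long_e = [1, 1, 1, 1, 0, 1, 1, 0, 1, 0]
short_t = [3, 4, 5, 7, 8, 9, 10, 12, 14, 16]
short_e = [1] * 10

chi2, p_value = logrank_test(long_t, long_e, short_t, short_e)
print(f"chi2 = {chi2:.2f}, p = {p_value:.4g}")
```

The test compares observed deaths in one group with the number expected if both groups shared the same hazard, so a small p-value indicates that the survival curves differ.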

Table 2.

GBM48 Average Expression by Classification.

Fig 4.

Heatmap Comparing GBM48 and Verhaak et al. Classifier.

Heatmap and hierarchical bi-clustering of all 840 Verhaak et al. genes (left) and the GBM48 panel genes (right). GBM48 gene names are listed on the right.

Fig 5.

Multidimensional Scaling Plot Comparing GBM48 and Verhaak et al. Classifier.

Multidimensional scaling of 173 core TCGA samples based on 840 Verhaak et al. genes (left) and the GBM48 panel genes (right).
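
As a rough illustration of how a multidimensional scaling plot is produced, the sketch below implements classical (Torgerson) MDS with power iteration. Whether the study used classical or another MDS variant is not stated here, so this is an assumption, and the five 2-D points are hypothetical stand-ins for sample-to-sample distances computed from expression data.

```python
import math

def classical_mds(dist, n_components=2, n_iter=200):
    """Classical MDS on a symmetric distance matrix (list of lists)."""
    n = len(dist)
    # Double-centre the squared distances: B = -1/2 * J * D^2 * J.
    sq = [[d * d for d in row] for row in dist]
    row_mean = [sum(r) / n for r in sq]
    total_mean = sum(row_mean) / n
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + total_mean)
          for j in range(n)] for i in range(n)]
    coords = [[0.0] * n_components for _ in range(n)]
    for c in range(n_components):
        # Power iteration for the leading eigenvector of b.
        v = [math.sin(i + 1.0 + c) for i in range(n)]  # deterministic start vector
        for _ in range(n_iter):
            w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w)) or 1.0
            v = [x / norm for x in w]
        # Rayleigh quotient gives the matching eigenvalue.
        lam = sum(v[i] * sum(b[i][j] * v[j] for j in range(n)) for i in range(n))
        scale = math.sqrt(max(lam, 0.0))
        for i in range(n):
            coords[i][c] = scale * v[i]
        # Deflate so the next pass finds the next eigenvector.
        for i in range(n):
            for j in range(n):
                b[i][j] -= lam * v[i] * v[j]
    return coords

# Hypothetical "samples": distances between five points in the plane.
points = [(0, 0), (4, 0), (0, 2), (4, 2), (1, 1)]
dist = [[math.dist(p, q) for q in points] for p in points]
coords = classical_mds(dist)

max_err = max(
    abs(math.dist(coords[i], coords[j]) - dist[i][j])
    for i in range(len(points)) for j in range(len(points))
)
print("largest distance reconstruction error:", max_err)
```

The embedding is only defined up to rotation and reflection, so the check compares pairwise distances rather than coordinates. With real expression data the distances are not exactly two-dimensional, and the plot shows the best two-dimensional approximation.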

Fig 6.

Experimentally Validated Gene Network in the GBM48 Panel.

Experimentally associated gene network from string-db, with statistically significant enriched gene sets from G:profiler and GSEA. Two of the four Verhaak et al. subclassifications (Classical and Proneural) were over-represented in the experimentally associated gene set from our GBM48 panel. Five of the 12 genes in the network were biomarkers for the Classical Verhaak et al. subset and are shown in green; four were biomarkers for Proneural and are shown in gold. Genes linked to the significantly enriched kinase binding and activity pathways are shown in grey (seven genes, described in the G:profiler database) and black (five genes, described in the GSEA database).
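
Over-representation of a gene set is commonly assessed with a hypergeometric test, sketched below. The observed counts (5 Classical markers among 12 network genes) come from the caption, but the background size of 20,000 genes and the 210 Classical markers in that background are hypothetical values chosen only for illustration. The enrichment tools named above also apply their own multiple-testing corrections, which this sketch omits.

```python
from math import comb

def hypergeom_sf(k, background, annotated, drawn):
    """P(X >= k) when `drawn` genes are sampled from `background` genes,
    of which `annotated` belong to the gene set of interest."""
    upper = min(annotated, drawn)
    favourable = sum(
        comb(annotated, i) * comb(background - annotated, drawn - i)
        for i in range(k, upper + 1)
    )
    return favourable / comb(background, drawn)

# 5 of 12 network genes are Classical markers (from the caption);
# the background numbers below are assumptions for illustration only.
p_classical = hypergeom_sf(k=5, background=20000, annotated=210, drawn=12)
print(f"over-representation p-value (hypothetical background): {p_classical:.3g}")
```

The p-value depends strongly on the assumed background, so the printed number should not be read as a result from the study.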
