Discovering novel driver mutations from pan-cancer analysis of mutational and gene expression profiles

doi:10.1371/journal.pone.0242780

Fig 1.

Summary of approach.

We identified novel driver mutations by intersecting mutational and gene expression data, and then validated the candidate driver mutations using literature mining and pathway analysis. The study pooled mutational and gene expression data from TCGA datasets for three cancer types (breast, ovarian and prostate) to demonstrate an unbiased approach to cancer-driver gene selection. a) Mutation and gene expression data are processed into mutation and expression matrices for integrative data analysis. b) Pre-selection of genes excludes non-pathogenic variants and then intersects the remaining mutated genes across the three cancer types (TCGA datasets). c) The pre-selected genes are investigated for their effect on gene expression (as a measure of functionality) by differential gene expression analysis. d) The final genes are subjected to gene ontology and pathway enrichment analysis for validation, and the same analysis is performed on patients.
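
The pre-selection in panel b can be illustrated with a minimal sketch. It is not the authors' pipeline: the variant records, the pathogenicity flag, the placeholder gene `GENE_X`, and the helper names `mutated_genes` and `preselect` are all hypothetical, and the toy data are invented purely to show the filter-then-intersect logic.

```python
# Toy sketch of panel b: drop non-pathogenic variants, then keep only genes
# that remain mutated in every cohort. All records below are invented.

# Hypothetical per-cohort variant calls: (gene, predicted_pathogenic)
variants = {
    "BRCA-US": [("TP53", True), ("PIK3CA", True), ("TTN", False), ("GENE_X", True)],
    "OV-US": [("TP53", True), ("GENE_X", True), ("TTN", True)],
    "PRAD-US": [("TP53", True), ("GENE_X", True), ("PIK3CA", False)],
}


def mutated_genes(calls):
    """Return the genes that carry at least one variant flagged as pathogenic."""
    return {gene for gene, pathogenic in calls if pathogenic}


def preselect(variants_by_cohort):
    """Intersect the retained mutated-gene sets across all cohorts."""
    gene_sets = [mutated_genes(calls) for calls in variants_by_cohort.values()]
    return set.intersection(*gene_sets)


preselected = sorted(preselect(variants))
print(preselected)  # ['GENE_X', 'TP53']
```

Filtering before intersecting matters here: a gene such as `TTN` in the toy data is mutated in two cohorts but drops out because one cohort contains only a non-pathogenic call, and it has no call at all in the third.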


Fig 2.

Mutated genes of interest.

Circos plots showing the distribution, across the human genome, of the 3700 pre-selected genes (inner circle) commonly mutated in the BRCA-US, OV-US, and PRAD-US cancer data sets, including COSMIC (orange) and non-COSMIC (green) genes (red). The second circle from the middle shows the 1537 cancer-causing candidate genes, with non-COSMIC genes in blue and COSMIC genes in red labeled with their gene names.


Fig 3.

Gene set enrichment & sequences analysis.

a) KEGG pathway enrichment for candidate genes, showing the number of genes with specific enrichment for the most enriched pathways. b) Disease signature enrichment, showing gene enrichment in cancer-related conditions. c) Gene and protein length comparison between the candidate genes, COSMIC genes and non-COSMIC genes (gene-length K-S test p-values: candidate vs. non-cancer genes = 0.0, COSMIC vs. non-cancer genes < 0.001; protein-length K-S test p-values: candidate vs. non-cancer genes = 0.0, COSMIC vs. non-cancer genes < 0.001). d) Percentage of oncogenes (blue) and tumor suppressors (red), as defined by the 20/20 rule [3], in the different gene groups within each cancer type (Chi-square tests for candidate genes vs. non-COSMIC genes and for COSMIC genes vs. non-COSMIC genes: p-values < 0.001 in all cancer types for both the oncogene and tumor-suppressor classifications).
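
The two tests named in panels c and d can be sketched in plain Python. This is an illustrative stand-in rather than the software the authors used: the functions `ks_statistic` and `chi_square_2x2` are hypothetical names, the lengths and counts are invented, and the chi-square is the uncorrected Pearson statistic for a 2x2 table, whose p-value for one degree of freedom equals erfc(sqrt(x/2)).

```python
import math
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample K-S statistic: largest gap between the two empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    d = 0.0
    for x in a + b:  # the maximum gap is always attained at an observed value
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d


def chi_square_2x2(table):
    """Pearson chi-square without continuity correction; returns (stat, p)."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    stat = 0.0
    for i, observed_row in enumerate(table):
        for j, observed in enumerate(observed_row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    # Survival function of chi-square with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Invented gene lengths (kb): the two groups do not overlap, so D is maximal.
candidate_lengths = [5, 6, 7, 8]
non_cancer_lengths = [1, 2, 3, 4]
d_stat = ks_statistic(candidate_lengths, non_cancer_lengths)

# Invented counts: rows = gene group, columns = (oncogene, not oncogene).
stat, p_value = chi_square_2x2([[30, 70], [10, 90]])
print(d_stat, stat, p_value)
```

A reported p-value of 0.0 typically reflects a value below the numerical precision of the software, which is easy to reproduce with well-separated samples like the toy lengths above.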


Fig 4.

Driver gene discovery in patients with no alterations in COSMIC genes.

a-c) Oncoplots for the four significant driver genes discovered in patients with no alterations in COSMIC genes. The oncoplots are shown for each gene in our complete datasets (all patients). d) Proportion of genes whose expression levels change when each of the four specified genes is mutated in each of the three cancer types, showing both under-expression and over-expression effects. e) Classification of the four genes as oncogene or tumor suppressor in each of our three cancer types.
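
Assuming the classification in panel e follows the same 20/20 rule cited for Fig 3, a simplified sketch of that rule looks as follows. In its commonly stated form, a gene is called an oncogene when more than 20% of its recorded mutations are missense changes at recurrent positions, and a tumor suppressor when more than 20% are inactivating. The function `classify_20_20`, the consequence labels, the "ambiguous" fallback, and the toy mutation lists are all assumptions made for illustration, not the authors' implementation.

```python
from collections import Counter

# Assumed set of consequence labels treated as inactivating.
INACTIVATING = {"nonsense", "frameshift", "splice_site"}


def classify_20_20(mutations, threshold=0.20):
    """Classify one gene from a list of (protein_position, consequence) pairs."""
    total = len(mutations)
    if total == 0:
        return "unclassified"
    # A position counts as recurrent when it carries two or more missense hits.
    missense_by_position = Counter(
        pos for pos, consequence in mutations if consequence == "missense"
    )
    recurrent_missense = sum(n for n in missense_by_position.values() if n >= 2)
    inactivating = sum(1 for _, consequence in mutations if consequence in INACTIVATING)
    oncogene_score = recurrent_missense / total
    suppressor_score = inactivating / total
    if oncogene_score > threshold and suppressor_score > threshold:
        return "ambiguous"  # simplification: no tie-breaking rule is applied
    if oncogene_score > threshold:
        return "oncogene"
    if suppressor_score > threshold:
        return "tumor suppressor"
    return "unclassified"


# Invented mutation lists for two hypothetical genes.
hotspot_gene = [(600, "missense")] * 6 + [
    (12, "missense"), (50, "nonsense"), (77, "silent"), (90, "missense"),
]
truncated_gene = [
    (10, "nonsense"), (55, "frameshift"), (80, "frameshift"),
    (120, "missense"), (200, "missense"),
]
print(classify_20_20(hotspot_gene), classify_20_20(truncated_gene))
```

Because the scores are computed per gene from that cohort's own mutations, the same gene can receive different labels in different cancer types, which is consistent with panel e reporting a classification for each of the three cancer types.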
