Ancient genes establish stress-induced mutation as a hallmark of cancer

doi:10.1371/journal.pone.0176258

Fig 1.

Younger genes are mutated more frequently in both normal and cancer.

The Enrichment Ratio is the observed mutation rate of a gene (in mutations per base pair) divided by the value expected under the null hypothesis of uniform random mutations. We categorized genes into three main age groups, corresponding to post-metazoan (less than 500 MY), metazoan (between 500 and 1000 MY) and pre-metazoan (more than 1000 MY) ages, and produced the distribution of the Enrichment Ratio for each group. Genes younger than 500 MY are mutated significantly more frequently in both normal (A) and cancer (B) samples, and the frequency of mutation declines as gene age increases. The p-value reported in each case is the larger of the p-value from Tukey's range test across the three groups and the p-value from a pairwise t-test comparison.
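
As an illustration of the Enrichment Ratio described in this caption, the following minimal Python sketch divides each gene's per-base-pair mutation rate by the rate expected under a uniform null. It assumes the expected rate is the total number of mutations divided by the total length of the genes considered; the caption does not spell this out. The function name, gene labels and numbers are hypothetical.

```python
def enrichment_ratios(mutation_counts, gene_lengths):
    """Observed / expected mutation rate per gene under a uniform null.

    mutation_counts: dict mapping gene -> number of observed mutations
    gene_lengths:    dict mapping gene -> gene length in base pairs
    Assumption: the expected rate is the pooled rate over all listed genes.
    """
    total_mutations = sum(mutation_counts.values())
    total_length = sum(gene_lengths[g] for g in mutation_counts)
    expected_rate = total_mutations / total_length  # mutations per bp under the null
    return {
        gene: (count / gene_lengths[gene]) / expected_rate
        for gene, count in mutation_counts.items()
    }


# Hypothetical toy data: two genes with equal counts but different lengths.
counts = {"geneA": 10, "geneB": 10}
lengths = {"geneA": 1000, "geneB": 3000}
ratios = enrichment_ratios(counts, lengths)
```

In this toy case the shorter gene carries the same number of mutations in a third of the sequence, so its ratio is above 1 and the longer gene's ratio is below 1.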

Fig 2.

Cancer displays a distinct mutational pattern relative to normal based on the evolutionary age of genes.

For each human gene, the expected number of mutations is derived from the normal mutation pattern: the frequency of normal mutations in the gene multiplied by the total number of cancer mutations recorded in the data set. The Enrichment Ratio (ER) is then calculated as the ratio of the observed number of cancer mutations to the expected number of mutations in the gene. Over-mutated genes have ER > 1.5; under-mutated genes have ER < -1.5. Numbers in the legend indicate the size of each gene set. Cross marks (X) on bar tips indicate that the enrichment in that category is statistically significant at p < 0.01 according to a bootstrap test taking random samples from the set of all human genes (BSQ < 1%, see Table 1). Boxplots in the lower panel show distribution quartiles; black vertical lines are medians, yellow diamonds are means and black dots are outliers.
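
The calculation in this caption can be sketched as follows. This is a minimal illustration, assuming that the "frequency of normal mutations" of a gene is its share of all normal mutations, and that genes with no normal mutations are skipped because their expected count would be zero. Both choices, the function name and the toy data are assumptions, not taken from the paper.

```python
def cancer_enrichment(normal_counts, cancer_counts):
    """Observed cancer mutations divided by the count expected from the normal pattern.

    normal_counts: dict mapping gene -> mutations observed in normal samples
    cancer_counts: dict mapping gene -> mutations observed in cancer samples
    """
    total_normal = sum(normal_counts.values())
    total_cancer = sum(cancer_counts.values())
    ratios = {}
    for gene, observed in cancer_counts.items():
        normal = normal_counts.get(gene, 0)
        if normal == 0:
            continue  # assumption: ratio is undefined when the expectation is zero
        expected = normal / total_normal * total_cancer
        ratios[gene] = observed / expected
    return ratios


# Hypothetical toy data.
normal = {"geneA": 5, "geneB": 15}
cancer = {"geneA": 20, "geneB": 15, "geneC": 5}
er = cancer_enrichment(normal, cancer)
```

Here geneA receives a quarter of the normal mutations, so 10 of the 40 cancer mutations are expected in it; observing 20 gives a ratio of 2.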

Table 1.

Enrichment score for gene age bins of 500 million years for both COSMIC and OMIM genes.

Fig 3.

Genes causally implicated in cancer are under-represented among young (<500 MY) genes.

(A) Age distribution of dominant (green) and recessive (orange) genes from the COSMIC Cancer Gene Census. Grey bars represent the age distribution of all human genes in ENSEMBL, and blue bars the age distribution of all COSMIC genes. Numbers in the legend are the sizes of each gene set. Cross marks (X) on bar tips indicate that the enrichment in that category is statistically significant according to the Gene Enrichment Score method and a bootstrap test (BSQ < 1%, see Table 1). The second hypothesis predicts that the blue bars should skew to the left, with enrichment in both <500 MY and 500–1000 MY. This is not observed. The under-representation of very young genes (less than 500 MY) and the over-representation of dominant genes between 500 and 1500 MY are statistically significant. The distinct right skew of the recessive set is also statistically significant (t-test for the difference of the mean with all other sets has p < 0.01). This implies that recessive genes are older than expected from random sampling. (B) Similar age distributions for single-gene Mendelian disorders, from the Online Mendelian Inheritance in Man database (OMIM). The general pattern in gene age distributions between dominant and recessive phenotypes observed in cancer, particularly the skew of recessive genes towards old ages and the under-representation of very young genes, is replicated in these gene sets. No notable over-representation of dominant genes at moderate ages is detected in this case. The enrichment of cancer genes in this age range is likely associated with the breakdown of regulatory functions that evolved during the emergence of multicellularity.
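
The bootstrap test referred to in this caption can be illustrated with a short Python sketch. It draws random gene sets of the same size from the ages of all genes and records how often a random set is at least as enriched in a given age bin as the set of interest. The exact definition of BSQ is not given in the captions, so treating it as this empirical fraction is an assumption, as are sampling without replacement, the function name and the toy ages.

```python
import random


def bootstrap_bin_quantile(all_ages, set_ages, lo, hi, n_boot=2000, seed=0):
    """Return (observed share in [lo, hi), fraction of random sets with share >= observed).

    all_ages: list of ages (MY) for every gene in the background
    set_ages: list of ages (MY) for the gene set being tested
    """
    rng = random.Random(seed)

    def share(ages):
        return sum(lo <= age < hi for age in ages) / len(ages)

    observed = share(set_ages)
    hits = 0
    for _ in range(n_boot):
        sample = rng.sample(all_ages, len(set_ages))  # random set of equal size
        if share(sample) >= observed:
            hits += 1
    return observed, hits / n_boot


# Hypothetical background: half the genes are 100 MY old, half are 800 MY old.
background = [100] * 500 + [800] * 500
# Hypothetical gene set strongly enriched in the 500-1000 MY bin.
gene_set = [800] * 40 + [100] * 10
observed_share, quantile = bootstrap_bin_quantile(background, gene_set, 500, 1000)
```

A small returned fraction means that random gene sets rarely reach the observed share, which is the sense in which a bin is called over-represented; testing under-representation would count random sets with a share at or below the observed one instead.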

Fig 4.

Functional enrichment network of recessive COSMIC cancer genes highlights DNA repair and cell cycle control.

Each node in this network represents a group of functionally related genes as returned by DAVID (gene ontology, orthology, functional annotations, etc.). The size of a node represents the number of genes in it. Links between nodes represent gene overlaps between groups, with the link width representing the number of shared genes. Node colors indicate the general functional categories defined in the legend, revealing an additional layer of clustering of gene groups. The number in each node is the group label given in S2 Table, which also provides further details of the enrichments for each node. For convenience, only nodes with p < 0.002 and FDR < 0.05 are plotted.
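
The construction of such a network can be sketched in a few lines of Python: node sizes are group sizes and link widths are the numbers of shared genes. This is only an illustration of the description in the caption; the group labels and gene names below are placeholders, not entries from S2 Table, and the function name is hypothetical.

```python
from itertools import combinations


def overlap_network(groups):
    """Build node sizes and overlap-weighted links from gene groups.

    groups: dict mapping group label -> set of gene names
    Returns (nodes, edges), where nodes maps label -> number of genes and
    edges maps (label_a, label_b) -> number of shared genes (only if > 0).
    """
    nodes = {label: len(genes) for label, genes in groups.items()}
    edges = {}
    for a, b in combinations(sorted(groups), 2):
        shared = len(groups[a] & groups[b])
        if shared:
            edges[(a, b)] = shared
    return nodes, edges


# Placeholder groups for illustration only.
toy_groups = {
    "1": {"geneA", "geneB", "geneC"},
    "2": {"geneA", "geneD"},
    "3": {"geneE"},
}
nodes, edges = overlap_network(toy_groups)
```

In this toy case groups 1 and 2 share one gene and are linked, while group 3 stays unconnected.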

Fig 5.

The genomic distribution of SNV clustering differs between normal and cancer.

Circos plot showing the distribution of SNV clustering for chromosomes 1, 3, 13 and 17. Tracks from the inside out are: blue, evolutionarily re-used breakpoint regions (EBR); green, amniote homologous synteny regions (mHSB); orange, hot spots of CM clusters in normal; and red, hot spots of CM clusters in cancer. The outer text track shows symbols for COSMIC genes at their corresponding genomic locations. Dominant genes are in black font and recessive genes are in red font.

Table 2.

Association of gene age and COSMIC gene status with evolutionarily important regions for genome rearrangement.

Table 3.

Co-localization of cluster hotspots with evolutionarily important regions for genome rearrangement in normal peripheral blood.

Table 4.

Overlap of COSMIC genes with cluster hotspots (i.e. clustering of clusters) in both normal peripheral blood and tumors based on clusters that overlap genes.

Table 5.

Co-localization of cluster hotspots with evolutionarily important regions for genome rearrangement in cancer genomes.

Fig 6.

Mutational pattern in young genes is characterized by hot-spotting.

(A) Age distribution of all genes mutated in the normal sample data (blue), genes with a neutral level of mutation, as expected from a uniform random distribution (green), and genes in hotspots (orange). Grey bars represent the age distribution of all human genes. Numbers in the legend are the sizes of each gene set. Cross marks (X) indicate that the enrichment in that category is statistically significant according to a bootstrap test (BSQ < 1%, see Table 1). Boxplots in the lower panel show distribution quartiles; black vertical lines are medians, yellow diamonds are means and black dots are outliers. (B) Equivalent plots for cancer data (ICGC release 19). In both plots, a very large proportion of the genes involved in hotspots (orange) are very young (less than 500 MY). This suggests that the mutational activity that produces hot-spotting in the genome preferentially hits younger genes, even though these genes are generally under-represented in the sets of all observed mutations.
