Fig 1.
Illustration of the multi-hit model.
A short genome with three (k = 3) possible 2-hit (h = 2) combinations of carcinogenic mutations. The 2-hit combinations are shown with yellow, green, and purple shaded backgrounds. Somatic mutations are outlined in red. One of the three combinations (purple) occurs among the five (m = 5) somatic mutations, resulting in carcinogenesis.
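The condition illustrated in Fig 1 can be sketched in a few lines of Python: carcinogenesis is taken to occur when every gene of at least one h-hit combination is among the somatic mutations. This is a minimal sketch only; the gene labels, the helper name `complete_combinations`, and the specific positions are hypothetical and chosen just to mirror k = 3, h = 2, and m = 5.

```python
def complete_combinations(mutated_genes, combinations):
    """Return the combinations whose genes are all among the mutated genes."""
    mutated = set(mutated_genes)
    return [combo for combo in combinations if set(combo) <= mutated]


# Hypothetical labels: k = 3 combinations, each with h = 2 hits.
combinations = [("g1", "g4"), ("g6", "g9"), ("g11", "g14")]

# m = 5 somatic mutations. "g6" hits only half of the second combination,
# while both genes of the third combination are mutated.
somatic_mutations = ["g2", "g6", "g11", "g12", "g14"]

complete = complete_combinations(somatic_mutations, combinations)
carcinogenic = len(complete) > 0
print(complete, carcinogenic)
```

In this toy case a partially hit combination (only "g6" mutated) does not count; only the fully hit third combination does.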
Fig 2.
Somatic mutations at diagnosis are weakly correlated with age at cancer diagnosis.
Pearson’s linear correlation coefficient between the number of somatic mutations and age at diagnosis ranges from -0.2 to +0.2 for all cancer types except kidney chromophobe, for which there were only nine matched tumor and blood-derived normal samples.
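For reference, Pearson's linear correlation coefficient between two paired samples can be computed as below. This is a minimal sketch; the function name `pearson_r` and the age and mutation-count values are hypothetical and are not data from the study.

```python
import math


def pearson_r(xs, ys):
    """Pearson's linear correlation coefficient for two paired samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two paired samples with at least two points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations about the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical values for one cancer type (illustration only).
age_at_diagnosis = [45, 52, 58, 61, 67, 70, 74]
somatic_mutations = [38, 120, 41, 95, 33, 150, 60]

r = pearson_r(age_at_diagnosis, somatic_mutations)
print(f"r = {r:.2f}")
```

With very few samples, as for kidney chromophobe, a coefficient computed this way is dominated by individual points, which is why that cancer type is treated as an exception.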
Fig 3.
Number of hits estimated by the multi-combination multi-hit model depends on the distribution of somatic mutations.
(a)-(c) Examples of three cancer types exhibiting distinct distributions and the predicted probability distribution for the optimal model, showing a corresponding difference in the number of hits. S1–S3 Figs show the distributions for the 17 cancer types with at least 200 samples.
Fig 4.
Graphical summary of estimated number of hits by cancer type.
Derived from the public domain image by M Haggstrom (2014).
Table 1.
The number of hits estimated by the multi-combination multi-hit model ranges from two to eight, depending on cancer type.
For the 17 cancer types with at least 200 samples, the RMSD between the distribution of somatic mutations and the probability distribution for the optimal model is less than 2.2% (top section of the table). The number of hits estimated by this somatic-mutation-based model is in the same range as those estimated by previous models based on incidence by age (middle section). Calculation of the 95% confidence interval is described in the SI. These averages may consist of a mix of different numbers of hits (h_i and h_j), as illustrated in the bottom section of the table.
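The RMSD quoted above can be illustrated with a short sketch. It assumes the RMSD is taken bin by bin between the observed fraction of samples and the model probability at each mutation count, then expressed as a percentage; the binning and the exact definition used in the paper are not given in this caption, and the values below are hypothetical.

```python
import math


def rmsd(observed, predicted):
    """Root-mean-square deviation between two equally binned distributions."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("distributions must be non-empty and equally binned")
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical fractions of samples per mutation-count bin (each sums to 1).
observed = [0.10, 0.30, 0.40, 0.20]
predicted = [0.12, 0.28, 0.38, 0.22]

print(f"RMSD = {100 * rmsd(observed, predicted):.1f}%")  # prints "RMSD = 2.0%"
```

Under this assumed definition, the model with the smallest RMSD across candidate numbers of hits would be the one described as optimal.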
Fig 5.
Estimated number of hits is moderately correlated with lifetime stem cell divisions.
Pearson’s linear correlation coefficient = 0.522, suggesting that the number of hits may depend on the cellular growth characteristics of individual tissues. However, the 95% confidence interval is -0.29 to 0.90, which includes zero, indicating that the relationship may be coincidental. Estimates for lifetime stem cell divisions were from S1 Table of Ref. (29).
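One common way to obtain a confidence interval for a Pearson coefficient is the Fisher z-transformation, sketched below. The caption states neither the method nor the number of tissues used, so both the method and the sample size n = 8 are assumptions made for illustration; the function name `pearson_confidence_interval` is likewise hypothetical.

```python
import math
from statistics import NormalDist


def pearson_confidence_interval(r, n, level=0.95):
    """Confidence interval for Pearson's r via the Fisher z-transformation."""
    if n <= 3:
        raise ValueError("the Fisher approximation needs more than 3 samples")
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    z = math.atanh(r)                     # Fisher transform of r
    standard_error = 1.0 / math.sqrt(n - 3)
    critical = NormalDist().inv_cdf(0.5 + level / 2.0)  # about 1.96 for 95%
    low = math.tanh(z - critical * standard_error)
    high = math.tanh(z + critical * standard_error)
    return low, high


# r is taken from the caption; n = 8 is an assumed sample size.
low, high = pearson_confidence_interval(0.522, n=8)
print(f"95% CI: {low:.2f} to {high:.2f}")
```

With the assumed n = 8 the sketch gives an interval of about -0.29 to 0.90, which is consistent with the reported values but does not establish that this was the procedure used. The width of the interval reflects how little a coefficient of this size constrains the relationship when only a handful of tissues are compared.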
Fig 6.
Distribution of somatic mutations for breast invasive carcinoma (BRCA) is similar across subtypes and stages.
The estimated number of hits is identical for subsets of BRCA samples by (a) subtype and (b) stage.