
KWK, VEV, and BV designed and performed experiments that formed the basis for the evaluation; NB, TA, DD, AT, and MAN developed the mathematical model; and NB, TA, DD, BV, and MAN wrote the paper.

¤ Current address: Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland

The authors have declared that no competing interests exist.

Cancer results from genetic alterations that disturb the normal cooperative behavior of cells. Recent high-throughput genomic studies of cancer cells have shown that the mutational landscape of cancer is complex and that individual cancers may evolve through mutations in as many as 20 different cancer-associated genes. We use data published by Sjöblom et al. (2006) to develop a new mathematical model for the somatic evolution of colorectal cancers. We employ the Wright-Fisher process to explore the basic parameters of this evolutionary process and derive an analytical approximation for the expected waiting time to the cancer phenotype. Our results highlight the relative importance of selection over both the size of the cell population at risk and the mutation rate. The model predicts that the observed genetic diversity of cancer genomes can arise under a normal mutation rate if the average selective advantage per mutation is on the order of 1%. Increased mutation rates due to genetic instability would allow even smaller selective advantages during tumorigenesis. The complexity of cancer progression can be understood as the result of multiple sequential mutations, each of which has a relatively small but positive effect on net cell growth.

The current view of cancer is that tumorigenesis is due to the accumulation of mutations in oncogenes, tumor suppressor genes, and genetic instability genes [

Tumors arise from a process of replication, mutation, and selection through which a single cell acquires driver mutations that provide a fitness advantage by virtue of enhanced replication or resistance to apoptosis [

Genetic mutations can arise either due to errors during DNA replication or from exposure to genotoxic agents. The normal mutation rate due to replication errors is in the range of 10^{−10} to 10^{−9} per nucleotide per cell per division [

Mathematical modeling of carcinogenesis has had a rich history since its introduction more than 50 years ago [

The tumor data collected by Sjöblom et al. [… 10^{6} cells (… 10^{9} cells. Whether the whole population of cells is at risk for clonal expansion, or whether a fraction of cells akin to stem cells drives growth of the adenoma, is currently a subject of debate. The distinction matters because cancer stem cells, as well as other factors such as geometric constraints on the architecture of the adenoma, may substantially reduce the effective population size and thereby affect the waiting time to cancer [

The adenoma grows from a population of 10^{6} to 10^{9} cells that accumulate mutations driving the phenotypic changes seen in cancer cells. Blue circles symbolize adenoma cells before they accumulate the additional mutations that are the subject of modeling, green indicates cells that have acquired additional mutations but too few for malignancy, and red indicates cells with the number of mutations required for the cancer phenotype.

We use the Wright-Fisher process [

The mutation data are represented in a binary matrix of size 35 × 78, whose rows correspond to 35 tumor samples and whose columns correspond to the 78 candidate cancer genes identified by Sjöblom et al. [

Matrix rows are indexed by tumors, columns are indexed by cancer-associated genes as identified by Sjöblom et al. (2006). Dark spots indicate mutated genes. Both tumors and genes have been sorted by an increasing number of mutations. The three genes mutated most often are
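Sorting the tumor-by-gene matrix by mutation counts, as done for the displayed matrix, takes only a few lines with NumPy. The sketch below uses a randomly generated toy matrix of the same 35 × 78 shape, not the Sjöblom et al. data; `sort_mutation_matrix` is a hypothetical helper name.

```python
import numpy as np


def sort_mutation_matrix(M):
    """Reorder a 0/1 tumor-by-gene matrix so that tumors (rows) and genes
    (columns) appear in order of increasing number of mutations."""
    M = np.asarray(M, dtype=int)
    row_order = np.argsort(M.sum(axis=1), kind="stable")  # mutated genes per tumor
    col_order = np.argsort(M.sum(axis=0), kind="stable")  # mutated tumors per gene
    return M[np.ix_(row_order, col_order)], row_order, col_order


# Hypothetical toy data with the study's dimensions (35 tumors x 78 genes).
rng = np.random.default_rng(1)
toy = (rng.random((35, 78)) < 0.1).astype(int)
sorted_M, rows, cols = sort_mutation_matrix(toy)
print(sorted_M.shape)  # (35, 78)
```

A stable sort keeps the original order among tumors or genes with equal counts, which makes the resulting picture reproducible.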

For the purpose of mathematical modeling of tumorigenesis, we assume that an adenoma is already present. Adenoma formation probably requires mutations in one or a few genes (in particular, … 10^{9} cells, we can efficiently compute estimates of the time to the first appearance of any

The distribution of cells among the error classes 0, …, 20 (cells carrying 0 to 20 mutations) is displayed for a single simulation over a period of 12 years, after which the first cell harboring 20 mutations appears. The total population size (dashed line) grows exponentially from 10^{6} to 10^{9} cells over this period. Each cell has 100 susceptible genes, all of which are wild type initially. We further assume a mutation rate of 10^{−7} per gene, a 1% selective advantage per mutation, and a turnover of 1 cell division per cell per day. After a short initial phase, each error class has an approximately Gaussian distribution, but the introduction of each new mutant type is subject to stochastic fluctuations.

Within our model, the probability of developing cancer is equated with the probability of generating at least one

Simulation results are displayed for three different population sizes (10^{9}, solid lines; 10^{7}, dashed lines; 10^{5}, dotted lines), three different selection coefficients (10%, red lines; 1%, green lines; 0.1%, blue lines), and two different mutation rates (10^{−7}, top; 10^{−5}, bottom).

The simulations suggest that in a time frame of 5 to 15 years, cancer might develop in an adenoma of 10^{7} to 10^{9} cells with a normal mutation rate of 10^{−7} per gene per cell division and a 1% selective advantage per mutation (…). A mutation rate of 10^{−5} per gene per cell division would enable a smaller population of at-risk cells (10^{5} to 10^{7}) and a smaller selective advantage (0.1%) to reach the required number of mutations in the same time interval (

Each curve connects points in parameter space (… 10^{−7} (solid lines) and 10^{−5} (dashed lines), respectively. Curves are labeled with the number

Based on the simulation results, we have derived an analytical approximation for the expected time to cancer. The key observation is that the distribution of error types follows a Gaussian (_{init} and _{fin} are the initial and final population sizes of the polyp, respectively (see Materials and Methods). The approximation is linear in _{k}

Research over the past three decades has shown that cancer is an acquired genetic disorder [

In our model, we assume that each subsequent mutation has the same incremental effect on the fitness of the cell. In general, however, the impact of a specific mutation on the phenotype of the cell will depend on the genetic background. Gene interactions, or epistasis, can be positive or negative, and they can impose constraints on the order in which mutations accumulate [

We have seen that a fitness increase of 1% per mutation may be enough for the Wright-Fisher model to generate dynamics that are consistent with the observed time scale of evolution from adenoma to carcinoma. However, genetic alterations associated with the initiation of colon cancer (such as those in

In another simplifying abstraction, we have defined the tumor cell by the accumulation of

These abstractions are important because all lesions begin with a small number of neoplastic cells. The simulations in

The large population size of 10^{9} cells would suggest that a purely deterministic approximation to the Wright-Fisher process is reasonable. It turns out, however, that the stochasticity associated with generating mutants of each new type has a strong impact on the evolutionary dynamics (see

Tumors derived from the same tissue exhibit considerable variability in their spectrum of mutations (

Most tissues in metazoans undergo turnover and are maintained by a population of tissue-specific stem cells that generally replicate at a slow rate and exhibit properties such as asymmetric division and immortal DNA strand co-segregation [… 10^{7} crypts, each one maintained by a small number of stem cells [

Our model permits investigation of the impact of the relevant parameters of tumor evolution on a global scale. These parameters include the size of the population at risk, the mutation rate, and the fitness advantage conferred by specific mutations (

Finally, this model helps answer several questions about colorectal tumorigenesis that have long perplexed researchers and clinicians. Why is there so much heterogeneity in the times required for tumor progression among different patients? Why is there so much heterogeneity in the sizes and development times of tumors even within individual patients, such as those with familial adenomatous polyposis, if they all have the same initiating

In our view, there is no reason to think that this model, or the data on which it was based, will be applicable only to colorectal cancers. Indeed, Sjöblom et al. [

After completion and submission of the manuscript, we learned of related work recently published or in press [

The collection of tumor data has been described in [

To test for dependencies between mutated genes, we calculated all 3,003 pairwise partial correlations between the 78 genes that were considered candidate drivers. Because the number of observed tumors is much smaller than the number of genes, we used the shrinkage method introduced in [
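The cited shrinkage estimator is not reproduced here. As a hedged illustration of why shrinkage is needed when there are fewer tumors (35) than genes (78), the sketch below applies the simplest linear shrinkage of the sample correlation matrix toward the identity, with a fixed and hypothetical intensity `lam`, and converts the inverse into partial correlations. With 35 samples the unshrunk 78 × 78 correlation matrix has rank at most 34 and cannot be inverted; any `lam` > 0 restores invertibility. The data are a random toy matrix, not the study's.

```python
from math import comb

import numpy as np


def shrunk_partial_correlations(X, lam):
    """Partial correlations between the columns (genes) of a 0/1 matrix X,
    computed from a correlation matrix shrunk toward the identity by lam."""
    X = np.asarray(X, dtype=float)
    if np.any(X.std(axis=0) == 0):
        raise ValueError("every gene must be mutated in some, but not all, tumors")
    R = np.corrcoef(X, rowvar=False)                       # gene-by-gene correlations
    R_shrunk = (1.0 - lam) * R + lam * np.eye(R.shape[0])  # positive definite for lam > 0
    P = np.linalg.inv(R_shrunk)                            # inverse correlation matrix
    d = np.sqrt(np.diag(P))
    pcor = -P / np.outer(d, d)                             # standardize and flip sign
    np.fill_diagonal(pcor, 1.0)
    return pcor


# Hypothetical toy matrix with the study's dimensions (35 tumors x 78 genes).
rng = np.random.default_rng(0)
X = (rng.random((35, 78)) < 0.2).astype(int)
X[0, :], X[1, :] = 1, 0  # guarantee that no gene is constant across tumors
pcor = shrunk_partial_correlations(X, lam=0.2)
i, j = np.triu_indices(78, k=1)  # one entry per unordered gene pair
print(len(i), comb(78, 2))  # 3003 3003
```

The pair count confirms the 3,003 pairwise partial correlations quoted above (78 · 77 / 2). In practice the shrinkage intensity would be estimated from the data rather than fixed by hand.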

We initially consider a colonic adenoma composed of 10^{6} cells (∼1 mm^{3}) that grows exponentially to a size of 10^{9} cells (∼1 cm^{3}). Serial radiological observations show that the growth of unresected colonic adenomas is well approximated by an exponential function [… 10^{6} to 10^{9} cells. We consider an evolving cell population of size … 10^{6}. Although 〈
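Once a duration is chosen, exponential growth from 10^{6} to 10^{9} cells fixes the growth rate. A worked example, assuming (as in the simulation figure caption) that the expansion takes 12 years of daily divisions; `exponential_growth` is a hypothetical helper.

```python
import math


def exponential_growth(n_init, n_fin, t_total):
    """Growth rate r (per time unit) and doubling time for
    N(t) = n_init * exp(r * t) with N(t_total) = n_fin."""
    r = math.log(n_fin / n_init) / t_total
    return r, math.log(2) / r


# Assumption: 10^6 -> 10^9 cells over 12 years, time measured in days.
r, t_double = exponential_growth(1e6, 1e9, 12 * 365)
print(f"r = {r:.5f} per day, doubling time = {t_double:.0f} days")
```

For 12 years this gives r ≈ 1.6 × 10^{−3} per day and a doubling time of roughly 440 days. A thousandfold expansion corresponds to about 10 doublings (log2 1000 ≈ 9.97), whatever the duration.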

Each cell is represented by its genotype, which is a binary string of length ^{9} cells, it is not feasible to track the fate of each of the possible 2^{100} mutants in computer simulations. However, we are interested in the first appearance of any _{j}_{i}_{i} / N
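The model assumes that every mutation has the same effect, so only the number of mutated genes matters for the first appearance of a cell with k mutations. This allows the 2^{100} genotypes to be lumped into 101 error classes. A minimal sketch of the resulting per-division transition probabilities follows. It assumes that each of the d − j wild-type genes of a cell with j mutations is hit independently with probability u and that mutations are never reverted; the name `mutation_matrix` is hypothetical.

```python
import math

import numpy as np


def mutation_matrix(d, u):
    """M[k, j] = probability that a cell with j mutated genes produces an
    offspring with k mutated genes (k >= j): binomial number of new hits
    among the d - j wild-type genes, no back mutation."""
    M = np.zeros((d + 1, d + 1))
    for j in range(d + 1):
        for k in range(j, d + 1):
            new_hits = k - j
            M[k, j] = math.comb(d - j, new_hits) * u**new_hits * (1 - u) ** (d - k)
    return M


d, u = 100, 1e-7
M = mutation_matrix(d, u)
print(M.shape)                          # (101, 101)
print(np.allclose(M.sum(axis=0), 1.0))  # True
```

Each column is a probability distribution over the offspring's error class, and the matrix is lower triangular because mutations only accumulate.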

We use the discrete Wright-Fisher process rather than the continuous Moran process [
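The following sketch shows one way to simulate a discrete Wright-Fisher process on error classes. It goes beyond what this section states in several assumptions: a cell with j mutations has relative fitness (1 + s)^j; mutations hit each wild-type gene independently with probability u per division and are never reverted; and the population size N is held constant. The text justifies a constant N through the geometric-mean argument for growing populations. All function names are hypothetical.

```python
import math

import numpy as np


def mutation_matrix(d, u):
    """M[k, j]: probability that a parent with j mutated genes has an
    offspring with k mutated genes (independent hits, no back mutation)."""
    M = np.zeros((d + 1, d + 1))
    for j in range(d + 1):
        for k in range(j, d + 1):
            M[k, j] = math.comb(d - j, k - j) * u ** (k - j) * (1 - u) ** (d - k)
    return M


def waiting_time(k, d, u, s, n_cells, max_generations, seed=None):
    """Generations until the first cell with >= k mutations appears in a
    constant population of n_cells, or None if none appears in time."""
    rng = np.random.default_rng(seed)
    M = mutation_matrix(d, u)
    fitness = (1.0 + s) ** np.arange(d + 1)       # assumed fitness (1 + s)**j
    counts = np.zeros(d + 1, dtype=np.int64)
    counts[0] = n_cells                           # all genes wild type at t = 0
    for t in range(1, max_generations + 1):
        theta = M @ (fitness * counts)            # selection, then mutation
        theta /= theta.sum()                      # offspring class probabilities
        counts = rng.multinomial(n_cells, theta)  # Wright-Fisher resampling
        if counts[k:].any():
            return t
    return None


# Toy parameters chosen so the example finishes quickly (not the paper's values).
t = waiting_time(k=3, d=100, u=1e-3, s=0.1, n_cells=10_000, max_generations=1_000, seed=0)
print(t)
```

With the values from the simulation figure caption (d = 100, u = 10^{−7}, s = 0.01, one generation per day), one would call `waiting_time(20, 100, 1e-7, 0.01, n_cells, 12 * 365)` with `n_cells` set, for example, to the geometric mean of 10^{6} and 10^{9} rounded to an integer. Each call is a single stochastic realization, so estimating an expected waiting time requires averaging over many seeds.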

The large cell population size might suggest that one could consider a replicator equation in the limit as _{j}^{st}
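In the limit N → ∞, the multinomial resampling averages out, and the class frequencies follow a deterministic selection-mutation recursion. The sketch below uses the same assumptions as the stochastic sketch: fitness (1 + s)^j, independent hits with probability u, and no back mutation. It makes one difficulty visible. A single division can introduce several mutations, so after the first generation every error class already has a strictly positive, if astronomically small, frequency. A "first appearance" time therefore has to be imposed by a threshold such as N · x ≥ 1, which is a heuristic of this sketch. Function names are hypothetical.

```python
import math

import numpy as np


def mutation_matrix(d, u):
    """M[k, j]: probability that a parent with j mutated genes has an
    offspring with k mutated genes (independent hits, no back mutation)."""
    M = np.zeros((d + 1, d + 1))
    for j in range(d + 1):
        for k in range(j, d + 1):
            M[k, j] = math.comb(d - j, k - j) * u ** (k - j) * (1 - u) ** (d - k)
    return M


def deterministic_waiting_time(k, d, u, s, n_cells, max_generations):
    """Return (first generation with a positive frequency of >= k mutations,
    first generation where the expected count n_cells * frequency reaches 1)."""
    M = mutation_matrix(d, u)
    fitness = (1.0 + s) ** np.arange(d + 1)
    x = np.zeros(d + 1)
    x[0] = 1.0                                   # all cells wild type
    first_positive = first_expected_cell = None
    for t in range(1, max_generations + 1):
        x = M @ (fitness * x)
        x /= x.sum()                             # same update as before, no sampling
        tail = x[k:].sum()
        if first_positive is None and tail > 0:
            first_positive = t
        if n_cells * tail >= 1:
            first_expected_cell = t
            break
    return first_positive, first_expected_cell


print(deterministic_waiting_time(k=20, d=100, u=1e-7, s=0.01, n_cells=10**9, max_generations=20 * 365))
```

The deterministic model captures the average flow of cells between classes, but it cannot represent the chance that a new mutant type appears early or late. That chance is exactly the stochasticity that the text identifies as important.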

To account for the stochastic fluctuations in the accumulation of ^{2}, and travels with velocity ^{2} (_{init} = _{fin} = _{k}_{init} to _{fin}. We often restrict our attention to constant population sizes, because the waiting time in a growing population is equivalent to that in a constant population whose effective size equals the geometric mean of the initial and final population sizes.
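Computing the effective size used for constant-population runs takes one line. For the adenoma growing from 10^{6} to 10^{9} cells, it is √(10^{15}) ≈ 3.16 × 10^{7} cells. `effective_population_size` is a hypothetical helper name.

```python
import math


def effective_population_size(n_init, n_fin):
    """Geometric mean of the initial and final population sizes."""
    return math.sqrt(n_init * n_fin)


n_eff = effective_population_size(1e6, 1e9)
print(f"{n_eff:.3g}")  # 3.16e+07
```

The geometric mean lies much closer to the initial size than the arithmetic mean (about 5 × 10^{8}) would. This reflects the long time the population spends at small sizes during exponential growth.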

Correlation coefficients have been computed from the 0/1 matrix displayed in


The waiting time … 10^{−7}, right panels to an increased mutation rate of 10^{−5}. Population sizes of 10^{5} (top panels), 10^{7} (middle panels), and 10^{9} (bottom panels) are considered. The selective advantage per mutation varies among 0.1 (red lines), 0.01 (green), 0.001 (cyan), and 0 (purple).



We are grateful to Tobias Sjöblom, Sian Jones, Jimmy Lin, Laura Wood, and Yoh Iwasa for helpful discussions.

adenomatous polyposis coli gene

Kirsten rat sarcoma 2 viral oncogene homolog

tumor protein 53