Sequential mutations in exponentially growing populations

Michael D. Nicholson; David Cheek; Tibor Antal

doi:10.1371/journal.pcbi.1011289

Abstract

Stochastic models of sequential mutation acquisition are widely used to quantify cancer and bacterial evolution. Across manifold scenarios, recurrent research questions are: how many cells are there with n alterations, and how long will it take for these cells to appear. For exponentially growing populations, these questions have been tackled only in special cases so far. Here, within a multitype branching process framework, we consider a general mutational path where mutations may be advantageous, neutral or deleterious. In the biologically relevant limiting regimes of large times and small mutation rates, we derive probability distributions for the number, and arrival time, of cells with n mutations. Surprisingly, the two quantities respectively follow Mittag-Leffler and logistic distributions regardless of n or the mutations’ selective effects. Our results provide a rapid method to assess how altering the fundamental division, death, and mutation rates impacts the arrival time, and number, of mutant cells. We highlight consequences for mutation rate inference in fluctuation assays.

Author summary

In settings such as bacterial infections and cancer, cellular populations grow exponentially. DNA mutations acquired during this growth can have profound effects, e.g. conferring drug resistance or faster tumour growth. In mathematical models of this fundamental process, considerable effort, spanning many decades, has been invested in understanding the factors that control two key aspects: how many cells carry a given set of mutations, and how long it takes for these cells to appear. In this paper, we consider these two aspects in a general mathematical framework. Surprisingly, for both quantities, we find universal probability distributions which are valid regardless of how many mutations we focus on, and what effect these mutations might have on the cells. The distributions are elegant and easy to work with, providing a computationally efficient alternative to intensive simulation-based approaches. We demonstrate the usefulness of our mathematical results by illustrating their consequences for bacterial experiments and cancer evolution.

Citation: Nicholson MD, Cheek D, Antal T (2023) Sequential mutations in exponentially growing populations. PLoS Comput Biol 19(7): e1011289. https://doi.org/10.1371/journal.pcbi.1011289

Editor: Jasmine Foo, University of Minnesota, UNITED STATES

Received: September 9, 2022; Accepted: June 21, 2023; Published: July 10, 2023

Copyright: © 2023 Nicholson et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: Code and data relating to this study can be found at www.github.com/MichaelDNicholson/accumulate_nmutations.

Funding: M.D.N. is a cross-disciplinary post-doctoral fellow supported by funding from CRUK Brain Tumour Centre of Excellence Award (C157/A27589). The funders played no role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests: No competing interests to declare.

Introduction

To quantitatively characterise diseases such as cancer and bacterial and viral infections, a concerted effort has been made to study evolutionary dynamics in exponentially expanding populations. Understanding the timescale of evolution is a key aspect of this research program, which has proven useful in a diverse range of areas such as measuring mutation rates [1], assessing the likelihood of therapy resistance developing [2–4], inferring the selective advantage of cancer driver events [5–7], and exploring the necessary steps in the metastatic process [8, 9]. The common theme within these works is that they use information about when a particular cell type arises within the population of interest. For a concrete example, with roots in the celebrated work of Luria and Delbrück [1], consider a growing colony of bacteria: we might wish to know how quickly a bacterium will arise carrying a specific mutation that confers resistance to an antibiotic therapy.

The time until a cell type emerges, and expands to a detectable population size, depends on a variety of factors. The most obvious are the relevant mutation rates; however, selection also plays an important role. For instance, if we start an experiment with an unmutated cell and wait for a cell with 2 mutations, a low division rate of cells with one mutation slows down this process. In the scenario of the sequential acquisition of driver alterations in cancer, with each mutation providing a selective advantage, Durrett and Moseley characterised the time to acquire n driver mutations [10]. We recently examined the setting of drug-resistance-conferring mutations, which often have a deleterious effect, so that the original cell type grows the fastest [11]. However, in general, the effects of mutation and selection on evolutionary timescales within exponentially growing populations remain unclear.

In this study we build upon the mathematical machinery developed in Refs. [10, 11] to investigate this question. We focus on the biologically relevant settings of large times and small mutation rates. Broad-ranging features of the cell number, and arrival time, of type n cells are highlighted—including universal simple distributions—and explicit expressions make the impact of mutation and selection clear.

Model

We consider a population of cells, where each cell can be associated with a given ‘type’ (for example ‘type 3’ might be cells with 3 particular mutations). Cells of type n divide, die, and mutate to a cell of type n + 1, at rates α_n, β_n and ν_n, with all cells behaving independently of each other. With (n) representing a type n cell and ⌀ symbolising a dead cell, our cell level dynamics can be represented as (see also Fig 1A):

$$(n) \xrightarrow{\ \alpha_n\ } (n)\,(n), \qquad (n) \xrightarrow{\ \beta_n\ } \varnothing, \qquad (n) \xrightarrow{\ \nu_n\ } (n)\,(n+1).$$

In other words, after a random, exponentially distributed waiting time with parameter α_n + β_n + ν_n, a type n cell is replaced by one of the three listed options with probability proportional to its corresponding rate. The process starts with a single cell of type 1 at time t = 0, and we assume that the type 1 population is supercritical (α₁ > β₁) and that it survives forever (does not undergo stochastic extinction).
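These dynamics can be simulated exactly with a standard Gillespie (stochastic simulation) algorithm. The sketch below is our own illustration, not the authors' code; it assumes that a mutation event leaves the parent cell in place and adds one type n + 1 cell, treats the last listed type as non-mutating, and does not condition on survival of the type 1 population, so some runs go extinct. The function name `simulate` and its arguments are hypothetical.

```python
import random


def simulate(alpha, beta, nu, t_max, seed=0):
    """Simulate one realisation of the sequential-mutation model up to t_max.

    alpha, beta, nu are per-type lists of division, death and mutation rates
    (type 1 is index 0). The last type is treated as non-mutating. Returns
    the list of cell counts of each type at time t_max.
    """
    rng = random.Random(seed)
    k = len(alpha)
    counts = [1] + [0] * (k - 1)  # a single type 1 cell at t = 0
    t = 0.0
    while True:
        # three possible events per type: division, death, mutation
        rates = []
        for i in range(k):
            mutation = nu[i] if i + 1 < k else 0.0
            rates += [counts[i] * alpha[i], counts[i] * beta[i], counts[i] * mutation]
        total = sum(rates)
        if total == 0.0:
            return counts  # every lineage has died out
        t += rng.expovariate(total)
        if t > t_max:
            return counts
        # pick an event with probability proportional to its rate
        idx = rng.choices(range(len(rates)), weights=rates)[0]
        i, event = divmod(idx, 3)
        if event == 0:
            counts[i] += 1      # (i) -> (i)(i)
        elif event == 1:
            counts[i] -= 1      # (i) -> death
        else:
            counts[i + 1] += 1  # (i) -> (i)(i + 1)


# Fig 1 parameters; runs in which type 1 dies out are not discarded here
print(simulate([1.1, 1.0, 1.1], [0.8, 0.9, 0.5], [0.01, 0.01, 0.0], 20.0, seed=1))
```

Repeating such runs and keeping those in which type 1 persists gives trajectories of the kind shown in Fig 1C.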


Fig 1. Model schematic.

A: We consider a multitype branching process in which cells can divide, die, or mutate to a new type. B: We study the waiting time until a cell of the nth type exists, τ_n, starting with a single cell of type 1. C: Stochastic simulation of the number of cells over time, with dashed lines indicating the large-time trajectories given by Eq (1). The grey horizontal line marks the inverse of the mutation rate, while the grey vertical lines indicate the times at which the type n population size reaches the inverse of the mutation rate, which gives the arrival time of the type n + 1 cells to leading order. Parameters: α₁ = α₃ = 1.1, α₂ = 1, β₁ = 0.8, β₂ = 0.9, β₃ = 0.5, ν₁ = ν₂ = 0.01. Thus, the net growth rates are λ₁ = 0.3, λ₂ = 0.1, λ₃ = 0.6, and the running-max fitnesses are δ₁ = δ₂ = λ₁ and δ₃ = λ₃.

https://doi.org/10.1371/journal.pcbi.1011289.g001

We focus on two quantities: the number of cells of type n at time t, denoted Z_n(t), and the arrival time of the first type n cell, termed τ_n (see Fig 1B and 1C). To describe the growth of the cellular populations, let the net growth rate of the type n cells be λ_n = α_n − β_n. We denote by δ_n the ‘running-max’ fitness, which is the largest growth rate among cell types 1, …, n, that is $\delta_n = \max_{i=1,\dots,n} \lambda_i$. Further, we introduce r_n as the number of times the running-max has been attained over the cell types up to n, that is $r_n = \#\{i \in \{1,\dots,n\} : \lambda_i = \delta_n\}$. A summary of the key notation used in this article is provided in Table 1.
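The quantities λ_n, δ_n and r_n are simple running statistics of the rate sequence. A minimal sketch using the Fig 1 parameters (the helper name `running_max_stats` is hypothetical); ties for r_n are detected with a small tolerance, since differences of floating-point rates are rarely exactly equal.

```python
import math


def running_max_stats(alpha, beta, tol=1e-9):
    """Net growth rates lambda_n, running-max fitness delta_n and counts r_n."""
    lam = [a - b for a, b in zip(alpha, beta)]
    delta, r = [], []
    for n in range(1, len(lam) + 1):
        d = max(lam[:n])
        delta.append(d)
        # number of types i <= n whose growth rate equals the running max
        r.append(sum(math.isclose(x, d, rel_tol=0.0, abs_tol=tol) for x in lam[:n]))
    return lam, delta, r


lam, delta, r = running_max_stats([1.1, 1.0, 1.1], [0.8, 0.9, 0.5])
print([round(x, 6) for x in lam])    # [0.3, 0.1, 0.6]
print([round(x, 6) for x in delta])  # [0.3, 0.3, 0.6]
print(r)                             # [1, 1, 1]
```

For a neutral path (all λ_n equal), the same helper returns r_n = n.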


Table 1. Key notation used throughout this article.

https://doi.org/10.1371/journal.pcbi.1011289.t001

Motivation

Our model considers a linear evolutionary path of cells sequentially mutating from type 1 to 2 to 3, and so on (see Figs 1A and 2). We briefly highlight scenarios for which our model is relevant, drawing on examples from cancer evolution (although similar statements can be made for other exponentially growing populations).


Fig 2. Comparison with prior work and motivating examples.

A. Previous work has considered special cases of growth rate sequences; here we consider general sequences as long as λ₁ > 0. B. Two biological scenarios in which the growth rate sequences covered in this paper are relevant: the acquisition of driver mutations in the canonical carcinogenesis pathway of colorectal cancer, and the accumulation of neoantigens by cancer cells, which results in increased cell death due to immune system surveillance.

https://doi.org/10.1371/journal.pcbi.1011289.g002

Cancer cells accumulate mutations with a variety of phenotypic effects during the cancer’s expansion. Oncogenic driver mutations are thought to increase the population’s net growth rate, either by increasing the proliferation rate or decreasing the death rate. A linear path is relevant when considering cancers that follow a specified evolutionary trajectory. For example, the canonical mutational path [12, 13] in colorectal cancer is loss of APC (type 1 cells), followed by a KRAS mutation (type 2 cells have mutations in both genes), then loss of TP53 (type 3 cells with mutations in all 3 genes); see Fig 2B.

When the cancer evolutionary trajectory is not specified, but driver mutations are assumed to arise at a constant rate, with each new mutation conferring a constant (1 + s_d)-fold increase in the proliferation rate, this model also falls within our framework. Bozic et al. [5] applied this model to cancer genetic data, thereby inferring the selective effect s_d of driver mutations. In contrast to oncogenic drivers, neoantigen-creating mutations, which stimulate the immune system to attack cancer cells, have been modelled as increasing the death rate of the mutated cells by a factor of 1 + s_n [14] (Fig 2B). Lakatos et al. [14] used this model to examine the conditions under which a population of neoantigen-presenting cancer cells would be large enough to be observed in sequencing data, in order to explore the limits of detecting immune-mediated negative selection. How the distribution of the number of cells with k neoantigens varies as a function of s_n and the neoantigen mutation rate can be rapidly assessed with the results below.
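To make the two models concrete, the sketch below builds rate sequences under the assumptions stated above: each driver multiplies the division rate by 1 + s_d, and each neoantigen multiplies the death rate by 1 + s_n. The helper names and the baseline values are hypothetical and are not taken from [5] or [14].

```python
def driver_rates(alpha1, beta1, s_d, n_types):
    """Hypothetical helper: each extra driver multiplies the division rate by 1 + s_d."""
    alpha = [alpha1 * (1.0 + s_d) ** k for k in range(n_types)]
    beta = [beta1] * n_types
    return alpha, beta


def neoantigen_rates(alpha1, beta1, s_n, n_types):
    """Hypothetical helper: each extra neoantigen multiplies the death rate by 1 + s_n."""
    alpha = [alpha1] * n_types
    beta = [beta1 * (1.0 + s_n) ** k for k in range(n_types)]
    return alpha, beta


def net_growth(alpha, beta):
    """Net growth rates lambda_n = alpha_n - beta_n."""
    return [a - b for a, b in zip(alpha, beta)]


print(net_growth(*driver_rates(1.0, 0.5, 0.1, 4)))      # increasing sequence
print(net_growth(*neoantigen_rates(1.0, 0.5, 0.1, 4)))  # decreasing sequence
```

In the driver case every type sets a new running max (δ_n = λ_n, r_n = 1); in the neoantigen case the running max stays at λ₁ throughout.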

For a more general model that describes a population with the potential to traverse multiple evolutionary paths, genotype space can be represented as a directed graph. When the original cell type has the largest net growth rate, we recently derived simple formulas for the arrival time and cell number through the directed graph of genotypes [11]. The results presented below, where the cell type with the largest net growth rate is unconstrained, hold only for a linear path through a genotype space. While in this work we cannot compare arbitrary sets of paths to a target evolutionary genotype, one may focus on each evolutionary path to the target type separately as a single linear path and then compare the median time to traverse each evolutionary path using the results presented below. For example, two sets of driver mutations might be considered: mini-drivers which have a high mutation rate, but low selective advantage, and major-drivers which have a low mutation rate but large selective advantage [15]. We would then compare the median times of the evolutionary paths ‘Driver 1 → Mini-driver → Driver 3’ and ‘Driver 1 → Major-driver → Driver 3’ to determine which path is most likely to produce the first cell with three driver mutations.

The cancer evolution examples discussed above all assume that the type 1 cell has a driver mutation. In other settings, it may be more natural to consider the type 1 cells as wild type, for example when considering the emergence of drug resistance. We emphasise that in this paper the type 1 cells are always supercritical, that is, they grow exponentially on average.

Results

Our results are broken into three sections. We first give an overview of our main mathematical results, stratified by whether they relate to the number of type n cells or to their arrival time. We then highlight the main properties of the results as well as providing intuitive arguments for why these properties emerge. Finally, we compare our results to previously known special cases.

Results overview

Population sizes.

Understanding the distribution of the number of cells of type n at a fixed time t (e.g. the probability that 5 cells of type 2 exist at time 2) can be complex [16]; however, a surprising level of simplicity emerges at large times with small mutation rates. The number of cells of type n can be decomposed into the product of a time-independent random variable and a simple time-dependent deterministic function controlled by the running-max fitness δ_n and the number of times r_n it has been attained up to type n:

$$Z_n(t) \approx V_n\, t^{r_n-1} e^{\delta_n t}. \tag{1}$$

The random variable V_n has a Mittag-Leffler distribution with tail parameter λ₁/δ_n and scale parameter ω_n. Its density has a particularly simple Laplace transform,

$$\mathbb{E}\left[e^{-\theta V_n}\right] = \frac{1}{1 + (\omega_n \theta)^{\lambda_1/\delta_n}}.$$

The parameter ω_n may be computed recursively: setting ω₁ = α₁/λ₁, Eq (2) expresses ω_{n+1}, for n ≥ 1, in terms of ω_n, the mutation rate ν_n and a constant c_n. Notably, when type 1 has the maximal growth rate of all types up to type n, that is δ_n = λ₁, the Mittag-Leffler distribution collapses to an exponential distribution with mean ω_n. Stochastic simulations of the scaled number of type n cells at large times, $Z_n(t)/(t^{r_n-1}e^{\delta_n t})$, which according to Eq (1) is approximately Mittag-Leffler distributed, are compared with theory in Fig 3.
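In simulations, V_n can be estimated from a count observed at a large time by dividing out the deterministic part of Eq (1), t^{r_n−1}e^{δ_n t}; this is the scaled quantity plotted in Fig 3. A minimal sketch with hypothetical function names and an illustrative, made-up observation:

```python
import math


def deterministic_factor(t, delta_n, r_n):
    """Time-dependent part of Eq (1): t**(r_n - 1) * exp(delta_n * t)."""
    return t ** (r_n - 1) * math.exp(delta_n * t)


def scaled_count(z_n, t, delta_n, r_n):
    """Rough estimate of V_n from a type n count z_n observed at a large time t."""
    return z_n / deterministic_factor(t, delta_n, r_n)


# hypothetical observation: 5000 type 2 cells at t = 12 in a neutral two-type case
print(scaled_count(5000, 12.0, delta_n=0.3, r_n=2))
```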


Fig 3. Comparison of limiting Mittag-Leffler distribution for the number of type n cells with stochastic simulations.

Eq (1) states that for large times and small mutation rates, the scaled number of type n cells, $Z_n(t)/(t^{r_n-1}e^{\delta_n t})$, is approximately Mittag-Leffler distributed with scale ω_n and tail λ₁/δ_n. Here, we compare simulations of the scaled number of type n cells divided by ω_n to the density of V_n/ω_n, which is Mittag-Leffler with scale parameter 1 and tail parameter λ₁/δ_n ∈ (0, 1]. We chose three tail parameter values, λ₁/δ_n = 0.25, 0.5, 1.0, and these curves are depicted with solid lines. The simulation parameters were always α₁ = 1.2, β₁ = 0.2, ν₁ = 0.01, β₂ = 0.3, and for n = 2 types: sim 1: α₂ = 4.3, t = 5; sim 2: α₂ = 2.3, t = 7; sim 3: α₂ = 1.0, t = 12. Then for n = 3 types, sim 4: as in sim 3 plus α₃ = 2.4, β₃ = 0.4, ν₃ = 0.001, t = 12. Density lines were created in Mathematica using x^(γ−1) MittagLefflerE[γ, γ, −x^γ].

https://doi.org/10.1371/journal.pcbi.1011289.g003

The variable V_n/ω_n is a single-parameter Mittag-Leffler random variable with scale parameter one and tail parameter γ = λ₁/δ_n. For γ = 1 its density is simply e^{−x}, and hence V_n/ω_n has mean 1, while for γ < 1 the density has an x^{γ−1} singularity at the origin and an x^{−γ−1} tail, so V_n/ω_n has infinite mean. A further property is that, when the running-max fitness does not increase between n and n + 1, the random variables V_n and V_{n+1} are equal up to a constant factor (perfectly correlated), i.e. with probability 1

$$V_{n+1} = \frac{\omega_{n+1}}{\omega_n}\, V_n. \tag{3}$$

However, in the case δ_n < λ_{n+1}, such simple rules do not apply.
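For Monte Carlo work it is convenient to sample V_n directly. A minimal sketch, assuming the parametrisation in which V_n/ω_n has Laplace transform 1/(1 + θ^γ), which is the transform of the density x^{γ−1}E_{γ,γ}(−x^γ) used in Fig 3. We use a standard mixture representation, V = ω E^{1/γ} S with E exponential and S a positive γ-stable variable drawn by Kanter's method; this sampling technique is our choice, not one taken from the paper, and the function name is hypothetical.

```python
import math
import random


def sample_ml(gamma, omega, rng):
    """One draw of a Mittag-Leffler variable with tail gamma in (0, 1] and scale omega.

    Uses V = omega * E**(1/gamma) * S with E ~ Exp(1) and S positive
    gamma-stable with E[exp(-theta S)] = exp(-theta**gamma), generated by
    Kanter's representation. Conditioning on E gives
    E[exp(-theta V)] = 1 / (1 + (omega * theta)**gamma).
    """
    e = rng.expovariate(1.0)
    if gamma == 1.0:
        return omega * e  # exponential with mean omega
    u = math.pi * (1.0 - rng.random())  # uniform on (0, pi]
    w = rng.expovariate(1.0)
    s = (math.sin(gamma * u) / math.sin(u) ** (1.0 / gamma)
         * (math.sin((1.0 - gamma) * u) / w) ** ((1.0 - gamma) / gamma))
    return omega * e ** (1.0 / gamma) * s


rng = random.Random(0)
draws = [sample_ml(0.5, 1.0, rng) for _ in range(100_000)]
# Monte Carlo Laplace transform at theta = 1; theory gives 1 / (1 + 1)
print(sum(math.exp(-v) for v in draws) / len(draws))
```

Heavy tails show up directly: for γ < 1 the sample mean of such draws does not settle down as the sample grows, in line with the infinite mean noted above.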

In general, the equation for asymptotic growth (1), together with the formulas for ω_n in (2), enables us to easily answer questions about the population sizes of different cell types. One might ask, for example, whether the number of cells of type n is greater than a given size k and how the growth rates and mutation rates in the system influence this; this problem can be approached using

$$\mathbb{P}\big(Z_n(t) > k\big) \approx \mathbb{P}\big(V_n > k\, t^{1-r_n} e^{-\delta_n t}\big).$$

Numerically evaluating the resulting distribution function is standard in scientific software (e.g. using the Mittag-Leffler package in R [17]).
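When δ_n = λ₁, V_n is exponential with mean ω_n and the exceedance probability above has a closed form. The sketch below (hypothetical names) covers only that case; for λ₁ < δ_n one would instead evaluate the Mittag-Leffler distribution function numerically. As a check we take n = 1, where r₁ = 1, δ₁ = λ₁ and ω₁ = α₁/λ₁, in a pure-birth setting: a Yule process started from one cell is geometrically distributed, so the exact answer is (1 − e^{−α₁t})^k.

```python
import math


def prob_exceeds_exponential(k, t, delta_n, r_n, omega_n):
    """Approximate P(Z_n(t) > k) from Eq (1) when V_n is exponential (delta_n == lambda_1)."""
    # P(Z_n(t) > k) ~ P(V_n > k * t**(1 - r_n) * exp(-delta_n * t))
    threshold = k * t ** (1 - r_n) * math.exp(-delta_n * t)
    return math.exp(-threshold / omega_n)


alpha1, t, k = 1.0, 5.0, 100
approx = prob_exceeds_exponential(k, t, delta_n=alpha1, r_n=1, omega_n=1.0)
exact = (1.0 - math.exp(-alpha1 * t)) ** k  # Yule process, no death
print(approx, exact)
```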

Arrival times.

Similarly to the population sizes, the exact distribution of the arrival time is analytically intractable outside of the simplest settings. For example, the exact probability that type 3 cells arrive by time t is given in Ref. [18] and requires the evaluation of 4 hypergeometric functions. However, when the mutation rates are small, simplicity again emerges: the time until the appearance of the first type n + 1 cell, τ_{n+1}, approximately follows a logistic distribution,

$$\mathbb{P}(\tau_{n+1} \le t) \approx \frac{1}{1 + e^{-\lambda_1 (t - m_{n+1})}}, \tag{4}$$

with scale 1/λ₁ and median m_{n+1} given by Eq (5), an explicit function of the rates that involves the scale parameter ω_n defined in (2). Comparisons of the limiting logistic distribution with simulations are shown in Fig 4, with further simulations provided in the supplementary figure S1 Fig. The population initiated by the first cell of type n + 1 could go extinct, and so we might instead wish to consider the waiting time until the first type n + 1 cell whose lineage survives. All lineages of type n + 1 eventually go extinct unless λ_{n+1} > 0. If λ_{n+1} > 0, then the results given above also hold for the arrival time of the first surviving lineage if we replace ν_n by ν_nλ_{n+1}/α_{n+1}.
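The logistic law is straightforward to evaluate. A minimal sketch with hypothetical names, assuming the form P(τ_{n+1} ≤ t) ≈ 1/(1 + e^{−λ₁(t − m)}) with median m supplied by the caller, together with the substitution ν_n → ν_nλ_{n+1}/α_{n+1} for surviving lineages:

```python
import math


def arrival_cdf(t, median, lam1):
    """Logistic approximation to P(tau_{n+1} <= t) with the given median and scale 1/lam1."""
    x = lam1 * (t - median)
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)  # rearranged form avoids overflow for very early times
    return e / (1.0 + e)


def surviving_lineage_rate(nu_n, alpha_next, beta_next):
    """Effective rate nu_n * lambda_{n+1} / alpha_{n+1} for surviving type n+1 lineages."""
    lam_next = alpha_next - beta_next
    if lam_next <= 0:
        raise ValueError("every type n+1 lineage eventually dies out")
    return nu_n * lam_next / alpha_next


print(arrival_cdf(15.0, median=12.0, lam1=0.3))
print(surviving_lineage_rate(0.01, alpha_next=1.0, beta_next=0.5))
```

Because the median depends on ν_n, the surviving-lineage median has to be recomputed with the effective rate rather than reused.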


Fig 4. Comparison of limiting logistic distribution for arrival times with stochastic simulations.

Normalized histogram of the arrival times of types 1–3 obtained from 1000 simulations of the exact model, versus the probability density of the logistic distribution in Eq (4). Note that the shape of the distribution is the same for each type. Parameters: α₁ = α₃ = 1, α₂ = 1.4, ν₁ = ν₂ = ν₃ = 0.01, β₁ = β₂ = 0.3, β₃ = 1.5.

https://doi.org/10.1371/journal.pcbi.1011289.g004

For the case where each running-max fitness is attained by only one type (r_i = 1 for each i), the medians m_n of τ_n satisfy a recursion. The base case is

$$m_2 = \frac{1}{\lambda_1}\log\frac{\lambda_1^2}{\alpha_1 \nu_1}, \tag{6}$$

and, for n ≥ 2, Eq (7) gives m_{n+1} in terms of m_n and the model rates, through the constants c_n defined immediately after Eq (2). If the running-max fitness may be attained multiple times, a more detailed recursion also exists, given as Lemma 6 in Methods. Note that since the distribution in Eq (4) is symmetric, the median and the mean coincide.
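The base case can be checked by simulation. The sketch below (hypothetical names) draws the time at which the first type 2 cell appears, starting from one type 1 cell, and compares the empirical median with (1/λ₁)log(λ₁²/(α₁ν₁)), which we take as the base-case median. We use a pure-birth example (β₁ = 0) so the type 1 population cannot die out; for β₁ > 0 the code restarts runs that go extinct before mutating, which only approximates conditioning on survival.

```python
import math
import random
import statistics


def first_type2_time(alpha1, beta1, nu1, rng):
    """Time at which the first type 2 cell appears, starting from one type 1 cell."""
    per_cell = alpha1 + beta1 + nu1
    while True:
        t, z = 0.0, 1
        while z > 0:
            t += rng.expovariate(z * per_cell)
            u = rng.random() * per_cell
            if u < nu1:
                return t  # mutation: first type 2 cell (parent stays type 1)
            if u < nu1 + alpha1:
                z += 1    # division
            else:
                z -= 1    # death
        # type 1 died out before mutating: restart (approximate conditioning)


def base_case_median(alpha1, beta1, nu1):
    lam1 = alpha1 - beta1
    return math.log(lam1 ** 2 / (alpha1 * nu1)) / lam1


rng = random.Random(1)
alpha1, beta1, nu1 = 1.0, 0.0, 0.01
samples = [first_type2_time(alpha1, beta1, nu1, rng) for _ in range(2000)]
print(statistics.median(samples), base_case_median(alpha1, beta1, nu1))
```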

Properties of the results

Population sizes.

From Eq (1), we see that on a logarithmic scale (as in Fig 1C), at large times the number of cells approximately follows a straight line whose gradient increases only when the running-max fitness increases. When the running-max fitness does increase (δ_{n−1} < λ_n), the type n cell number grows exponentially with rate λ_n. Conversely, if the type n cells have net growth rate smaller than the running-max fitness (δ_{n−1} > λ_n), then at large times the type n cell number grows exponentially with rate δ_{n−1} = δ_n, because the flux of mutations from the type n − 1 population eventually drives its growth. One can observe this behaviour in Fig 1C: although the type 2 cells have lower fitness than type 1, both populations eventually grow at the same rate λ₁. However, the type 3 cells have the largest fitness so far, hence their number grows at their own rate λ₃. When the type n cells have net growth rate equal to the running-max fitness (δ_{n−1} = λ_n), as in a neutral mutation scenario, exponential growth at rate δ_n occurs but with an additional polynomial factor t^{r_n−1}. The origin of this factor is best understood by considering the mean growth for n = 2 with λ₁ = λ₂ = λ [19]. In this case mutations occur at a rate proportional to e^{λs}, and the average number of descendants by time t of a mutation occurring at time s is e^{λ(t−s)}. Hence, at time t, the mean number of mutants is proportional to $\int_0^t e^{\lambda s} e^{\lambda (t-s)}\,ds = t\,e^{\lambda t}$, which contains the same factor t = t^{r₂−1} that appears in the limit result Eq (1). Extending this argument to type n explains the factor t^{r_n−1}.
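The mean argument extends to type n in the neutral case. Writing the mean number of type n cells as a constant times g_n(t)e^{λt}, the factors e^{λs}e^{λ(t−s)} combine into e^{λt}, leaving g₁ = 1 and g_{k+1}(t) = ∫₀^t g_k(s) ds, so that g_n(t) = t^{n−1}/(n−1)!. The sketch below (hypothetical names) iterates this recursion with the cumulative trapezoid rule and compares it with the closed form; the constant 1/(n−1)! only rescales the amplitude, while t^{n−1} = t^{r_n−1} is the polynomial factor.

```python
import math


def neutral_mean_factor(n, t, steps=2000):
    """Numerical g_n(t) from g_1 = 1 and g_{k+1}(t) = integral of g_k over [0, t]."""
    h = t / steps
    g = [1.0] * (steps + 1)  # g_1 on the grid s_j = j * h
    for _ in range(n - 1):
        nxt = [0.0] * (steps + 1)
        for j in range(1, steps + 1):
            nxt[j] = nxt[j - 1] + 0.5 * h * (g[j - 1] + g[j])  # cumulative trapezoid
        g = nxt
    return g[-1]


t = 5.0
for n in (1, 2, 3, 4):
    print(n, neutral_mean_factor(n, t), t ** (n - 1) / math.factorial(n - 1))
```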

The random amplitude of the deterministic growth, V_n, has a Mittag-Leffler distribution with infinite mean if λ₁ < δ_n, a consequence of the power-law decay of its distribution. Intuition for the tails can be gleaned from the case n = 2 [19]. In the λ₁ < λ₂ case, the power-law tail arises due to rare, early mutations from the type 1 cells. The descendants of these early mutations make a considerable contribution to the total number of type 2 cells even at large times (see the discussion of Theorem 3.2 in [19]). However, for λ₁ ≥ λ₂, the type 2 descendants of any given mutation eventually make up a vanishing proportion of the type 2 population. Instead, the sheer number of new mutations from the type 1 cells drives the growth of the type 2 population, and in this case the tail decays exponentially. To move to type n, from Eq (3) we see that if δ_n ≥ λ_{n+1} then the randomness in the cell number is inherited by type n + 1 from type n. Thus if the running-max fitness never exceeds the growth rate of the type 1 population, that is if δ₁ = δ_n, then an exponential distribution is propagated, i.e. V₁, …, V_n all follow exponential distributions. However, if the running-max fitness does increase, then for the first i such that δ_i < λ_{i+1}, a power-law tail emerges for V_{i+1}. For types after the emergence of the power law, that is for j > i + 1, if the running-max fitness does not increase then the power law with tail exponent λ₁/λ_{i+1} is propagated, again due to the inheritance property of Eq (3). If instead the running-max fitness increases again, i.e. there is j > i + 1 such that λ_{i+1} < λ_j, then the power-law tail remains but with the exponent decreased to λ₁/λ_j. Thus, if the running-max fitness ever rises above δ₁, the tail of the random amplitude has a power-law decay with a monotonically decreasing exponent λ₁/δ_n.
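The tail behaviour can be read off directly from the growth-rate sequence. A small sketch (hypothetical names): V_n has an exponential tail when the tail parameter λ₁/δ_n equals 1, and a power-law tail with exponent λ₁/δ_n otherwise. With the Fig 1 rates this gives exponential tails for types 1 and 2 and exponent 1/2 for type 3.

```python
def tail_parameters(lam):
    """Tail parameter lambda_1 / delta_n of V_n for n = 1, ..., len(lam)."""
    params, running_max = [], float("-inf")
    for lam_n in lam:
        running_max = max(running_max, lam_n)  # delta_n
        params.append(lam[0] / running_max)
    return params


def tail_kind(gamma):
    """Exponential tail when gamma == 1, otherwise power law with exponent gamma."""
    return "exponential" if gamma == 1.0 else f"power law, exponent {gamma:g}"


for n, gamma in enumerate(tail_parameters([0.3, 0.1, 0.6]), start=1):
    print(f"V_{n}: {tail_kind(gamma)}")
```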

Our approximation (1) for the number of type n cells is valid for large times. Additionally, small mutation rates are required when the running-max fitness increases, so that λ₁ < δ_n. Heuristically, we expect the approximation to be valid at times large enough that the type n cells have been seeded with high probability, that is for t well beyond the median arrival time of type n. Around the arrival time of the type n cells, fluctuations in the cell number can be greater, which can be seen even in the two-type setting. In the two-type neutral case (λ₁ = λ₂), Eq (1) gives $Z_2(t) \approx V_2\, t\, e^{\lambda_1 t}$ at such large times, where V₂ is exponentially distributed, and therefore Z₂(t) has an exponentially decaying tail. However, for times comparable to the arrival time of type 2 (equivalently, when ν₁e^{λ₁t} is of order one), it is known that Z₂(t) has a heavy-tailed distribution, commonly known as the Luria-Delbrück distribution [19–21]. On the other hand, for λ₁ < λ₂, we found that V₂ does have a power-law heavy tail, as for the Luria-Delbrück distribution. Therefore, at times around the arrival time of type n cells, the fluctuations in cell number may exceed the characterisation given in Eq (1), but at larger times they are described by the Mittag-Leffler random variable V_n. We also note that, in the scale parameter recursion of Eq (2), when mutations are mildly deleterious (0 < δ_n − λ_{n+1} ≪ 1), the scale parameter can take large values. Therefore, caution should be adopted when using our approximation in this case.

Arrival times.

The arrival time density has the same general shape for every n, centred at the median arrival time m_n of τ_n (Fig 4). As expected, the median arrival time increases with n and as the mutation rates decrease, and the recursion of Eq (7) explicitly details how these parameters interact. In contrast, the variance of the arrival time is always $\pi^2/(3\lambda_1^2)$. Moreover, the entire shape of the distribution, which is centred around m_n, is determined only by λ₁. Thus, due to the constant variance, for $\lambda_1 m_n \gg 1$, modellers may safely ignore the stochastic nature of waiting times and treat the arrival time of the type n cells as deterministic. However, our result raises questions of statistical identifiability: distinguishing between models on the basis of fluctuations (e.g. does a phenotype of interest require 2 or 3 mutations?) may be difficult due to the common logistic distribution.
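Because the logistic law has scale 1/λ₁, every spread statistic is the same for each n; only the median moves. A minimal sketch (hypothetical names) giving quantiles, the variance π²/(3λ₁²) and the interquartile range 2 log(3)/λ₁; the three medians used below are illustrative values, not results from the paper.

```python
import math


def arrival_quantile(p, median, lam1):
    """p-quantile of the logistic arrival-time law with the given median and scale 1/lam1."""
    return median + math.log(p / (1.0 - p)) / lam1


def arrival_spread(lam1):
    """Variance and interquartile range; neither depends on the median, hence not on n."""
    variance = math.pi ** 2 / (3.0 * lam1 ** 2)
    iqr = 2.0 * math.log(3.0) / lam1
    return variance, iqr


for median in (10.0, 25.0, 40.0):  # illustrative medians for three successive types
    print(arrival_quantile(0.25, median, 0.3), arrival_quantile(0.75, median, 0.3))
print(arrival_spread(0.3))
```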

The formulas for the arrival times (7) are valid for small mutation rates, and, writing m_n for the median of τ_n, to leading order the increase in the median arrival time for each new type (i.e. m_{n+1} − m_n) is $\frac{1}{\delta_n}\log\frac{1}{\nu_n}$. An intuitive understanding can be gained by assuming that: (i) the type n + 1 cells approximately arrive when the type n population size reaches 1/ν_n, and (ii) we can ignore fluctuations in population size, so that the type n population grows exponentially as in the deterministic factor of Eq (1). Then, for the case n = 1, we simply find m₂ as the time it takes an exponentially growing population to grow from one cell to 1/ν₁, that is, we solve $e^{\lambda_1 t} = 1/\nu_1$ to obtain $t = \frac{1}{\lambda_1}\log\frac{1}{\nu_1}$, which reproduces the leading order of Eq (6) as ν₁ → 0. Similarly, for the arrival time of type n + 1, suppose we start an exponential function at m_n with net growth rate δ_n; this growth will take time $\frac{1}{\delta_n}\log\frac{1}{\nu_n}$ to reach the threshold of 1/ν_n from one cell. To leading order in small mutation rates, this reproduces the recursion of Eq (7).
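The heuristic in (i) and (ii) yields a leading-order recursion that is easy to compute: the median arrival time of type n + 1 is roughly that of type n plus (1/δ_n)log(1/ν_n), starting from (1/λ₁)log(1/ν₁). This sketch (hypothetical names) drops the order-one corrections of Eqs (6) and (7), so it should be read as a rough guide only. With a constant mutation rate and increasing λ_n, the increments shrink, which is the acceleration discussed in the comparison with prior special cases.

```python
import math


def leading_order_medians(lam, nu):
    """Heuristic medians of tau_2, ..., tau_{K+1}.

    lam[i] and nu[i] are the net growth and mutation rates of type i + 1.
    Each step lets a population growing at the running-max fitness delta_n
    go from one cell to 1 / nu_n, ignoring fluctuations and O(1) corrections.
    """
    medians, t, running_max = [], 0.0, float("-inf")
    for lam_n, nu_n in zip(lam, nu):
        running_max = max(running_max, lam_n)  # delta_n
        t += math.log(1.0 / nu_n) / running_max
        medians.append(t)
    return medians


# constant driver mutation rate with increasing growth rates
m = leading_order_medians([0.1, 0.2, 0.3], [1e-3] * 3)
print([b - a for a, b in zip([0.0] + m, m)])  # increments shrink with n
```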

Comparison with prior special cases.

Special cases of our results have been obtained previously. Durrett and Moseley [10] obtained the formulas for the arrival time in the special case λ₁ < λ₂ < ⋯ < λ_n in the context of the accumulation of driver mutations in cancer, and the leading order was also derived in [5]. A key conclusion of [5, 10] follows directly from the representation of the difference in median arrival times given in Eq (7): assuming a constant driver mutation rate (ν₁ = … = ν_n = ν), the median waiting time between the nth and (n + 1)th driver mutations is approximately $\frac{1}{\lambda_n}\log\frac{1}{\nu}$, which decreases as a function of n because the growth rates λ_n increase. Hence, under this model, tumour evolution accelerates during its growth [5, 10]. For a comparison with the formulas of [10], note that in this case the running-max fitness for type j is always λ_j, that is δ_j = λ_j, and so r_j = 1 for all j. Further, the cell types in [10] are numbered from zero. Then the corresponding quantity, as defined in this paper, agrees with c_{θ,n}μ_n of [10] (the formulas in [10] contain some misprints, but they are corrected in [22]). Durrett and Moseley [10] also pointed out that the shapes of the distributions of both the arrival time and the population size were independent of n. These distributions were also observed for the special case λ₁ > λ_i for 1 < i ≤ n in [11]; this case was studied under the motivation of mutations that confer drug resistance but at a fitness cost. In the present paper we have found that even for a general sequence of net growth rates the distribution shapes remain independent of n, and their dependence on the rate parameters can be written in relatively simple terms.

An application: n-mutation fluctuation assays

Pairing mathematical models for the emergence of drug resistance during exponential population growth with experimental fluctuation assays enables the inference of mutation rates [1, 23]. In the classic fluctuation assay, replicates are initiated by a small number of drug sensitive cells, which are then grown for either a fixed time period or until the total population reaches a given size. The cells are then exposed to the drug, killing non-resistant cells, which allows the number of replicates without resistance, and the mutant number in those replicates with resistance, to be measured. These experimental quantities are then combined with an appropriate statistical model to infer the mutation rate of acquiring resistance [24]. Originally, only wild type and mutated cells were considered in fluctuation assays. However, including multiple types is required when assessing multidrug resistance, investigating resistant intermediates such as persister cells [25], or if multiple gene amplifications are needed for therapy resistance. Gene amplifications are a prevalent resistance mechanism in cancer [26], and amplification rates have been previously reported using fluctuation assays [27] under the standard assumption of a single mutational transition to resistance. However, the modelling assumption of a single mutation imbuing therapy tolerance may be invalid if multiple amplifications are required for resistance. For example, the drug resistant WB₂₀ rat epithelial cell line in Tlsty et al. [27] contained 4 gene copies, compared to the wild type having only 1 copy of the resistance gene. In such settings, to meaningfully infer amplification rates, an inference framework that describes sequential mutation acquisition is needed. With our results such a modified inference scheme can be constructed.

For simplicity, and as is typical for mutation rate inference, assume that mutations are neutral (λ₁ = λ₂ = …) and occur at a common rate ν (ν = ν₁ = ν₂ = …). Suppose k replicates of a fluctuation assay are performed and the number of replicates without resistance and/or the distribution of mutant numbers over replicates is recorded (Fig 5A). If the mutation rate ν is known, the number of replicates without resistance is binomially distributed with k trials and success probability P(τ_{n+1} > t), obtained from the logistic distribution of Eq (4) (further details on the inference methodology are given in the supplementary material S1 Text). In this setting, the median arrival time of the (n + 1)th type follows from Eq (5) as an explicit, decreasing function of ν. Hence, given the number of replicates without resistance, the unknown mutation rate ν may be inferred by maximum likelihood (the p₀ method). Similarly, the mutant count distribution over replicates is characterised by Eq (1), which in this setting takes the simple form

$$Z_n(t) \approx V_n\, t^{n-1} e^{\lambda_1 t},$$

with V_n an exponential random variable with mean

$$\omega_n = \frac{\alpha_1}{\lambda_1}\,\frac{\nu^{n-1}}{(n-1)!}.$$

Maximum likelihood for the mutant counts under this distribution provides a second approach to infer ν.
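The mutant-count approach admits a closed-form estimator under the exponential approximation. A sketch, assuming neutral mutations at a common rate ν, known α₁ and λ₁, that every replicate contains resistant cells, and that the resistant (type n) count satisfies Z_n(t) ≈ V_n t^{n−1}e^{λ₁t} with V_n exponential of mean (α₁/λ₁)ν^{n−1}/(n−1)!. Since the maximum-likelihood estimate of an exponential mean is the sample mean, and that mean increases with ν, the estimate of ν follows by inversion. The function name and the synthetic counts (mimicking the Fig 5B setting: 2 mutations needed, α = 1, no death, t = 10) are hypothetical.

```python
import math
import random
import statistics


def nu_mle_from_counts(counts, n, t, alpha1, lam1):
    """Hypothetical estimator of nu from counts of type n (resistant) cells, n >= 2.

    Assumed model: Z_n(t) ~ V_n * t**(n - 1) * exp(lam1 * t), with V_n
    exponential of mean (alpha1 / lam1) * nu**(n - 1) / (n - 1)!.
    """
    mean_count = statistics.fmean(counts)  # MLE of the exponential mean
    scale = (alpha1 / lam1) * t ** (n - 1) * math.exp(lam1 * t) / math.factorial(n - 1)
    return (mean_count / scale) ** (1.0 / (n - 1))


# synthetic data: type 3 resistant (two mutations), alpha = 1, no death, t = 10
rng = random.Random(0)
n, t, alpha1, lam1, nu_true = 3, 10.0, 1.0, 1.0, 1e-2
mean = (alpha1 / lam1) * nu_true ** (n - 1) * t ** (n - 1) * math.exp(lam1 * t) / math.factorial(n - 1)
counts = [rng.expovariate(1.0 / mean) for _ in range(100)]
print(nu_mle_from_counts(counts, n, t, alpha1, lam1))
```

Real assay counts are integers and, as discussed around Fig 5B, the exponential approximation is only reliable when t is well beyond the median arrival time of resistant cells.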


Fig 5. Statistical inference for an n-mutation fluctuation assay.

A. Schematic of a fluctuation assay for the measurement of mutation rates when n mutations are required for resistance. Drug sensitive cells are initially cultured, and after growth for a given time t, the cells are exposed to a selective medium. Non-resistant cells are killed, revealing the number of mutants. This experiment is repeated over replicates, and the number of replicates without resistance and the mutant numbers are recorded. B. Likelihood inference on a simulated fluctuation assay assuming: 2 mutations are required for resistance, 100 replicates, no death, α_i = 1 for each i, t = 10, and the mutation rate ν stated on the x-axis. Wide error bars are expected when using the p₀ method for large mutation rates, as only a small number of replicates have no resistant cells; in such a setting using the mutant counts (right panel) provides superior inference. Conversely, the mutant-count approach is unreliable when the approximation of Eq (1) is not appropriate, which explains the inaccurate inference for log₁₀(ν) = −3 when using the mutant counts; the p₀ method provides improved inference in this scenario.

https://doi.org/10.1371/journal.pcbi.1011289.g005

Fig 5B shows likelihood inference for the mutation rate using both approaches, assuming 100 simulated replicates and that 2 mutations (e.g. amplifications) confer resistance. The two inference approaches have strengths and weaknesses depending on the underlying mutation rate and the time t for which the cells are grown before being exposed to the drug. If t is too large, most or all replicates will have resistant cells, and hence the number of replicates without resistance carries limited information on the mutation rate (e.g. the wide error bars for log₁₀(ν) = −1.5 in the left plot of Fig 5B). Instead, the long-time limit approximation of the mutant count distribution, Eq (1), is appropriate, and here our simulated inference for the mutation rate closely matches the true parameter value (Fig 5B). However, if t is not large enough (that is, not well beyond the median arrival time of resistant cells), then Eq (1) poorly characterises the distribution of resistant cells (e.g. the incorrect inference for log₁₀(ν) = −3 in the right plot of Fig 5B); instead, the p₀ method enables accurate inference of the mutation rate. Hence, similar to the advice for the classic fluctuation assay [24], if only some replicates show resistance the p₀ method is preferred, whereas if all replicates have sizeable mutant numbers, inference using the mutant counts is advisable. Note that our inference here has assumed known division rates and no death. These rates could be measured by standard experimental protocols, for example using growth curve assays. Kimmel and Axelrod [28] also gave statistical consideration to a fluctuation assay where two mutations are needed. However, in principle (neglecting experimental complexities), our results hold for any n, include death, and allow for varied growth rates between the cell types, extending the work of Ref. [28].

Discussion

Due to their simplicity and ability to model fundamental biology such as cell division, death, and mutation, multitype branching processes have become a standard tool for quantitative researchers investigating evolutionary dynamics in exponentially growing populations. Further, these models are able to link detailed microscopic molecular processes to explain macroscopic experimental, clinical, and epidemiological data [29, 30]. Despite the importance of this framework, even simple questions are often challenging to examine. Whilst numerical and simulation-based methods have proven powerful for both model exploration and statistical inference, the computational expense of simulating to plausible scales can be prohibitive; e.g. simulations may only reach tumour sizes orders of magnitude smaller than reality, which hinders the biological interpretation of inferred parameters. Moreover, it is often unclear how to summarise precisely how a large number of parameters interact to influence quantities of interest, such as the time until a triply resistant cell emerges. In this study, we analysed the regimes of large times and small mutation rates in order to develop limiting formulas that can be used to quickly gain intuition or for approximate statistical inference.

We have focused on the number, and arrival time, of cells with n mutations. While this problem dates back at least to the work of Luria and Delbrück (where a mutation resulted in phage-resistant bacteria), specific instances of the problem are commonly used to study a variety of biological phenomena [3–5, 8, 9, 14, 24, 31–34]. The arrival time of the first mutant is well known; however, the arrival time of cells with n alterations was previously unclear outside specific fitness landscapes [10, 11]. Here, we developed approximations for the cell number and arrival time regardless of whether mutations increase, decrease, or have no effect on the growth rate of the cells carrying the alterations. We showed that, within relevant limiting regimes, the number of type n cells can be decoupled into the product of a deterministic time-dependent function and a time-independent Mittag-Leffler random variable; meanwhile, the arrival time of type n cells follows a logistic distribution with a shape that depends only on the net growth of the type 1 cells. The features of these distributions, such as the median arrival time, can be mapped exactly to the underlying model parameters, that is, the division, death, and mutation rates. These results illuminate the effects of mutation and selection, and can be readily evaluated numerically to explore particular biological hypotheses. We highlighted the utility of our results for mutation rate inference in fluctuation assays.

As the biological processes studied become increasingly complex, so too will the mathematical models constructed to describe such processes. We hope that the results of the present paper will enable researchers to find simplicity in an arbitrarily complex parameter landscape for a fundamental class of mathematical models.

Methods

In this section we provide detailed results and proofs in their general form.

Branching process: Population growth

We first look to understand the number of cells of type n at time t, that is Z_n(t), at large times.

Proposition 1. Assume non-extinction of the type 1 population, that is, Z₁(t) > 0 for all t ≥ 0. Then, for each n ≥ 1, there exists a (0, ∞)-valued random variable V_n such that

$$\lim_{t\to\infty}\frac{Z_n(t)}{t^{r_n-1}e^{\delta_n t}} = V_n \quad\text{almost surely.}$$

As our branching process is reducible, this result does not follow from the classical theory [35]. Heuristically, the result says that for large t, $Z_n(t) \approx V_n\,t^{r_n-1}e^{\delta_n t}$, and so at large times all the stochasticity of Z_n(t) is bundled into the variable V_n.

Towards proving Proposition 1, we first consider a model of a deterministically growing population which seeds mutants as a Poisson process, the mutants growing as a branching process. The next result defines the model and describes the large-time number of mutants, generalising a result of [36].

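Lemma 1. Let (f(t))_{t≥0} be a non-negative càdlàg function, x, δ > 0, and r ≥ 0, with

$$\lim_{t\to\infty} t^{-r}e^{-\delta t}f(t) = x.$$

Suppose that ξ₁, ξ₂, … come from a Poisson process on [0, ∞) with intensity f(⋅). Suppose that (Y_i(t))_{t≥0}, i ≥ 1, are i.i.d. birth-death branching processes initiating from a single cell, that is Y_i(0) = 1, with birth and death rates α and β. Let λ = α − β. Define

$$Z(t) = \sum_{i:\,\xi_i\le t} Y_i(t-\xi_i).$$

Then, almost surely,

$$\begin{cases}
\lim_{t\to\infty} t^{-r}e^{-\delta t}Z(t) = \dfrac{x}{\delta-\lambda}, & \delta>\lambda,\\[4pt]
\lim_{t\to\infty} t^{-r-1}e^{-\delta t}Z(t) = \dfrac{x}{r+1}, & \delta=\lambda,\\[4pt]
\lim_{t\to\infty} e^{-\lambda t}Z(t) = V, & \delta<\lambda.
\end{cases}$$

Here V is some positive random variable with mean $\int_0^\infty e^{-\lambda s}f(s)\,ds$.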
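The three regimes in Lemma 1 can be anticipated from the first moment, E[Z(t)] = ∫₀ᵗ f(s)e^{λ(t−s)} ds, since a founder arriving at time s has mean e^{λ(t−s)} descendants at time t. The sketch below evaluates this integral numerically for the illustrative intensity f(s) = x sʳe^{δs} (an assumption made only for the example; the lemma needs this behaviour only asymptotically) and compares the rescaled mean with constants we computed by hand from the same integral: x/(δ − λ) when δ > λ, x/(r + 1) when δ = λ, and x Γ(r + 1)/(λ − δ)^{r+1} = ∫₀^∞ e^{−λs}f(s) ds when δ < λ. The function names are hypothetical, and agreement of means is of course weaker than the almost-sure statement.

```python
import math


def mean_Z(t, x, delta, r, lam, steps=50_000):
    """E[Z(t)] = int_0^t f(s) e^{lam (t - s)} ds for the illustrative
    intensity f(s) = x s^r e^{delta s}, by the composite trapezoidal rule."""
    h = t / steps
    total = 0.0
    for k in range(steps + 1):
        s = k * h
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * x * s**r * math.exp(delta * s + lam * (t - s))
    return total * h


def rescaled_mean(t, x, delta, r, lam):
    """Divide E[Z(t)] by the normalisation used in each regime of Lemma 1.
    (Exact float comparison of delta and lam is acceptable for an illustration.)"""
    m = mean_Z(t, x, delta, r, lam)
    if delta > lam:
        return m / (t**r * math.exp(delta * t))
    if delta == lam:
        return m / (t ** (r + 1) * math.exp(delta * t))
    return m / math.exp(lam * t)


def predicted_constant(x, delta, r, lam):
    """Large-t value of rescaled_mean, computed by hand from the same integral."""
    if delta > lam:
        return x / (delta - lam)
    if delta == lam:
        return x / (r + 1)
    return x * math.gamma(r + 1) / (lam - delta) ** (r + 1)
```

For example, `rescaled_mean(30.0, 2.0, 1.0, 0, 0.5)` and `predicted_constant(2.0, 1.0, 0, 0.5)` compare the δ > λ regime; the convergence in t is exponentially fast for r = 0 but only of order 1/t when r > 0 and δ > λ.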

Proof. We first give the argument assuming λ ≠ 0, and provide a comment at the end of the proof indicating modifications needed for the λ = 0 case.

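First we claim that

$$M(t) = e^{-\lambda t}Z(t) - \int_0^t e^{-\lambda s}f(s)\,ds$$

is a martingale with respect to the natural filtration (𝓕_t). Indeed, for s ≤ t, each cell present at time s has mean e^{λ(t−s)} descendants at time t, and founders arriving during (s, t] contribute mean ∫_s^t f(u)e^{λ(t−u)} du, so

$$\mathbb{E}[M(t)\mid\mathcal{F}_s] = e^{-\lambda s}Z(s) + \int_s^t e^{-\lambda u}f(u)\,du - \int_0^t e^{-\lambda u}f(u)\,du = M(s),$$

as required.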

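Next we look to bound the second moment of M(t). To this end, observe that Z(t) has a compound Poisson distribution, being a Poisson sum of i.i.d. random variables distributed as Y₁(t − ξ), where ξ is a [0, t]-valued random variable with density proportional to f (see, e.g., Section 2 of [36]). Using the already-known second moment for a birth-death branching process [37] (see Theorem 6.1 on page 103), we have that, for λ ≠ 0,

$$\mathbb{E}[Y_1(t)^2] = e^{2\lambda t} + \frac{\alpha+\beta}{\lambda}\left(e^{2\lambda t}-e^{\lambda t}\right).$$

It follows that

$$\mathbb{E}[M(t)^2] = \operatorname{Var}\left(e^{-\lambda t}Z(t)\right) = e^{-2\lambda t}\int_0^t f(s)\,\mathbb{E}[Y_1(t-s)^2]\,ds,$$

and since , we find that Therefore (8) where C, D and E are positive constants.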

To conclude the proof, we will separately consider the three cases listed in the Lemma’s statement: δ < λ, δ = λ, and δ > λ.

We begin with the case δ < λ. Here the martingale M(t) has a bounded second moment. By the martingale convergence theorem, M(t) converges almost surely to some random variable V′ with mean zero. Rearranging the limit of M(t),

$$V := \lim_{t\to\infty} e^{-\lambda t}Z(t) = V' + \int_0^\infty e^{-\lambda s}f(s)\,ds$$

almost surely, where the integral converges because the integrand has an exponentially decaying tail. The positivity of V can be seen by Fatou's lemma:

$$V \ge \sum_{i\ge 1} e^{-\lambda \xi_i}W_i, \qquad W_i := \lim_{t\to\infty} e^{-\lambda t}Y_i(t),$$

where the W_i are i.i.d. random variables on [0, ∞) that are each non-zero with positive probability [22, 35] (recall that this case assumes λ > δ > 0, so that each Y_i(⋅) is supercritical). Since f grows exponentially, there are almost surely infinitely many arrivals ξ_i; hence, with probability one, at least one of the W_i is positive. This gives the result for δ < λ.

The second case is δ = λ. Here the second moment of M(t) is still bounded, so we can again apply the martingale convergence theorem to see that M(t) converges almost surely. It follows that t^{−r−1}M(t) converges to zero almost surely. Thus, using dominated convergence,

$$\lim_{t\to\infty} t^{-r-1}\int_0^t e^{-\delta s}f(s)\,ds = \frac{x}{r+1}$$

is the almost sure limit of t^{−r−1}e^{−δt}Z(t).

The third and final case is δ > λ. This case requires a new perspective because the second moment of M(t) may not be bounded, which rules out the martingale convergence theorem. Instead we appeal to Borel-Cantelli. For ϵ > 0 and n ∈ ℕ, consider the events Then by Doob's martingale inequality and then Eq (8); here G and γ are positive numbers which do not depend on n. By Borel-Cantelli, with probability one only finitely many of these events occur. Equivalently, t^{−r}e^{(λ−δ)t}M(t) converges to zero almost surely. Thus, using dominated convergence,

$$\lim_{t\to\infty} t^{-r}e^{(\lambda-\delta)t}\int_0^t e^{-\lambda s}f(s)\,ds = \frac{x}{\delta-\lambda}$$

is the almost sure limit of t^{−r}e^{−δt}Z(t).

For the case λ = 0, minor modifications are required. Firstly, the second moment takes the form E[Y₁(t)²] = 1 + 2αt, and hence a bound analogous to Eq (8) holds, involving a positive constant C′. Since δ > 0 = λ, this bound should be used in the Borel-Cantelli argument above, which leads to the same result.
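The second-moment estimates in this proof rest on the classical mean and variance of a linear birth-death process started from one cell: E[Y(t)] = e^{λt} and Var Y(t) = ((α + β)/λ)e^{λt}(e^{λt} − 1) for λ ≠ 0, reducing to (α + β)t = 2αt when λ = 0. The sketch below compares these formulas with an exact Gillespie simulation; the parameter values and function names are illustrative, and agreement is only up to Monte Carlo error.

```python
import math
import random


def bd_moments(alpha, beta, t):
    """Mean and second moment of a linear birth-death process with Y(0) = 1."""
    lam = alpha - beta
    mean = math.exp(lam * t)
    if lam == 0:
        var = (alpha + beta) * t          # equals 2 * alpha * t
    else:
        var = (alpha + beta) / lam * mean * (mean - 1.0)
    return mean, var + mean**2


def simulate_bd(alpha, beta, t, rng):
    """Exact (Gillespie) draw of Y(t) for a birth-death process with Y(0) = 1."""
    y, s = 1, 0.0
    while y > 0:
        s += rng.expovariate((alpha + beta) * y)
        if s > t:
            break
        # Each event is a birth with probability alpha / (alpha + beta).
        y += 1 if rng.random() < alpha / (alpha + beta) else -1
    return y
```

Both a supercritical case and the critical case λ = 0 can be checked by averaging `simulate_bd` draws and comparing with `bd_moments`.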

We can now give the proof of Proposition 1 on the convergence of cell numbers.

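Proof of Proposition 1. We prove the result by induction. Clearly it is true for n = 1. Now suppose that $t^{-(r_n-1)}e^{-\delta_n t}Z_n(t) \to V_n$ almost surely. Condition on the trajectory of Z_n(⋅), and apply Lemma 1 with f = ν_nZ_n, x = ν_nV_n, δ = δ_n and r = r_n − 1 (and birth and death rates α_n+1, β_n+1) to see that $t^{-(r_{n+1}-1)}e^{-\delta_{n+1}t}Z_{n+1}(t) \to V_{n+1}$ almost surely.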

Having proven that the cell numbers grow asymptotically as a deterministic function of time multiplied by a time-independent random amplitude V_n, our next aim is to determine the distribution of this random amplitude. We shall proceed via induction. To establish the base case we restate a classic result [22, 35]:

Lemma 2. The random variable V₁ from Proposition 1 has exponential distribution with parameter λ₁/α₁ = 1 − β₁/α₁.
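Lemma 2 can be checked without simulation using the classical explicit law of a linear birth-death process started from one cell: conditioned on Z₁(t) > 0, Z₁(t) is geometric on {1, 2, …} with P(Z₁(t) > k | Z₁(t) > 0) = qᵏ, where q = 1 − λ₁/(α₁e^{λ₁t} − β₁). The sketch below compares the resulting tail of e^{−λ₁t}Z₁(t) with that of the exponential law with parameter λ₁/α₁. It conditions on survival up to time t rather than on survival forever, a distinction that disappears as t grows; the function names are our own.

```python
import math


def conditional_tail(w, alpha, beta, t):
    """P(e^{-lam t} Z1(t) > w | Z1(t) > 0) for a linear birth-death process
    started from one cell (lam = alpha - beta > 0), using its geometric law."""
    lam = alpha - beta
    growth = math.exp(lam * t)
    q = 1.0 - lam / (alpha * growth - beta)   # P(Z1 > k + 1 | Z1 > k)
    k = math.floor(w * growth)               # integer Z1 exceeds w*growth iff Z1 > k
    return q**k


def exponential_tail(w, alpha, beta):
    """Tail P(V1 > w) of the exponential law with parameter lam/alpha (Lemma 2)."""
    return math.exp(-w * (alpha - beta) / alpha)
```

At moderate times the two tails already agree closely, e.g. `conditional_tail(1.0, 1.0, 0.3, 20.0)` versus `exponential_tail(1.0, 1.0, 0.3)`.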

Since the type n population seeds the type n + 1 population, one might expect that the random amplitudes V_n and V_n+1 of the two populations are related. The next result says that this is indeed the case for a part of parameter space—when the type n + 1 fitness is no greater than the fitnesses of previous types.

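Corollary 1. Let n ≥ 1. If δ_n > λ_n+1, then

$$V_{n+1} = \frac{\nu_n V_n}{\delta_n-\lambda_{n+1}},$$

while for δ_n = λ_n+1,

$$V_{n+1} = \frac{\nu_n V_n}{r_n}.$$

Proof. Immediate from Lemma 1.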

Corollary 1 focuses on the case that the fitness of type n + 1 does not dominate the fitnesses of types 1 to n; here it says that the random amplitude V_n+1 is simply a constant multiple of V_n, meaning that the large-time stochasticity of the type n + 1 population size is perfectly inherited from the type n population. A special example is that type 1 has a larger fitness than all subsequent types, in which case V_n is a constant multiple of V₁ and thus all random amplitudes are exponentially distributed, recovering a result of [11]. Corollary 1 is also a generalisation of Theorem 3.2 parts 1 and 2 of [19] which provided the distribution of V₂ in terms of V₁.
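In the special case where no type grows faster than type 1 (λ_i ≤ λ₁ for all i, so δ_n = λ₁ throughout), iterating Corollary 1 gives V_n = c_nV₁ with deterministic constants c_n. Reading Corollary 1 through Lemma 1 (with x = ν_nV_n and r = r_n − 1), each step multiplies by ν_n/(δ_n − λ_n+1) when λ_n+1 < δ_n and by ν_n/r_n when λ_n+1 = δ_n, after which the power of t increases by one. The sketch below implements this recursion under that reading; the function name and 0-based list conventions are hypothetical. Combined with Lemma 2, each V_n is then exponential with mean c_nα₁/λ₁.

```python
def amplitude_multipliers(lams, nus):
    """Constants c_1, ..., c_N with V_n = c_n * V_1 along a path in which no
    type grows faster than type 1, obtained by iterating Corollary 1.

    lams[i] is the net growth rate of type i + 1 and nus[i] the mutation rate
    from type i + 1 to type i + 2 (hypothetical 0-based conventions).
    """
    if any(lam > lams[0] for lam in lams):
        raise ValueError("a fitness increase occurs; Corollary 1 does not cover it")
    delta = lams[0]        # running-max growth rate, constant on such paths
    r = 1                  # Z_n(t) grows like t^(r - 1) * exp(delta * t)
    c = [1.0]
    for n in range(len(lams) - 1):
        if lams[n + 1] < delta:
            c.append(c[-1] * nus[n] / (delta - lams[n + 1]))
        else:              # tie with the running max: divide by r, gain a power of t
            c.append(c[-1] * nus[n] / r)
            r += 1
    return c
```

For instance, `amplitude_multipliers([1.0, 0.5, 1.0, 1.0], [0.1, 0.2, 0.3])` applies one strict-decrease step followed by two ties.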

We next focus on the remaining region of parameter space, where a new type may have a fitness greater than the fitnesses of all previous types. Here, in contrast with the region considered in Corollary 1, the random amplitudes appear to be rather complex. The distribution of V₂ takes an intricate form, which is calculated in [16] (Eq. 56); we do not restate it here for brevity. The distributions of V_n for n > 2 appear to be unknown. We aim to find simple approximations for the V_n by introducing an approximate model.

Approximate model introduction

The exact distribution of the random amplitude V_n for a generic sequence of birth and death rates appears to be analytically intractable. Thus we look to approximate V_n in the limit of small mutation rates. Towards such an approximation, we follow a method inspired by Durrett and Moseley [10], which simplifies calculations by introducing an approximate model. The approximate model is motivated by the following heuristic argument: mutations creating cells of type (n + 1) occur at rate ν_nZ_n(t); when the mutation rates are small, it will take some time for the first cell of type (n + 1) to appear; at large times $Z_n(t) \approx V_n t^{r_n-1}e^{\delta_n t}$ (Proposition 1); therefore, for small mutation rates, mutations creating cells of type (n + 1) should occur at rate $\nu_n V_n t^{r_n-1}e^{\delta_n t}$. We define the approximate model carefully below, but briefly it arises by assuming that type (n + 1) cells arrive at this rate for all t ≥ 0, and then letting the type (n + 1) cells follow the dynamics we have already been assuming.

Formally, we define the approximate model iteratively. We let be the size of the type n population at time t, set for t ≥ 0, and fix . Then, given , let be the times from a Poisson process with rate Then, we set (9) where the Y_n,i(⋅) are independent birth-death processes initiated from a single cell with birth and death rates α_n and β_n, and (10) We hypothesise but do not prove that the distribution of the random amplitudes and V_n for the approximate and original models respectively coincide in the limit of small mutation rates; this is known to be true in the two-type setting (Section 4.4 of [16]).
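In the approximate model, founders of type n + 1 arrive as a Poisson process whose rate grows exponentially in time. For the simplest instance, a rate μe^{δt} with μ > 0 fixed (μ is a hypothetical stand-in for the random prefactor once it is conditioned on, and the polynomial factor is dropped, i.e. r_n = 1), arrival times can be generated exactly by mapping the points Γ₁ < Γ₂ < … of a unit-rate Poisson process through the inverse of the cumulative intensity Λ(t) = μ(e^{δt} − 1)/δ. This time-change construction is a standard technique, not necessarily the one used for the simulations in this paper.

```python
import math
import random


def seeding_times(mu, delta, t_max, rng):
    """Arrival times on [0, t_max] of a Poisson process with rate mu * exp(delta * t)."""
    times = []
    gamma = 0.0  # successive points of a unit-rate Poisson process
    while True:
        gamma += rng.expovariate(1.0)
        # Invert Lambda(t) = mu * (exp(delta * t) - 1) / delta at gamma.
        t = math.log1p(delta * gamma / mu) / delta
        if t > t_max:
            return times
        times.append(t)
```

The first arrival then satisfies P(first arrival > t) = exp(−Λ(t)); averaging this over a random prefactor is the mechanism behind the arrival-time results later in this section.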

Approximate model: Population growth

First we have the counterpart to Proposition 1, clarifying that the approximate model is well defined.

Proposition 2. For n ≥ 1, there exists a (0, ∞)-valued random variable such that almost surely.

Proof. Identical to the proof of Proposition 1.

Analogously to Corollary 1, we can relate the random amplitude of type n + 1 to that of type n for the approximate process, now also including the case where type n + 1 grows faster than all previous types (δ_n < λ_n+1). We give the results at the level of the Laplace transform, as it turns out that this function dictates the distribution of the arrival times.

Corollary 2. Let n ≥ 1. Then where h_n(θ) is defined by where Φ is the Lerch transcendent function (see 25.14.1 in [38]).

Proof. For the cases of δ_n > λ_n+1 or δ_n = λ_n+1 we can appeal directly to Corollary 1.

For δ_n < λ_n+1, we expand upon the argument of Durrett and Moseley [10], who considered λ₁ < λ₂ < …. Let which is the Laplace transform for a linear birth-death process initiated with a single cell, at time t with division and death rates α_n, β_n. Note that when δ_n < λ_n+1, necessarily λ_n+1 > 0 as δ_n ≥ δ₁ > 0, due to the type 1 population being assumed supercritical. If we fix , then the arrivals to the type n + 1 population occur as a Poisson process, so by the definition of given in Eq (9), is a compound Poisson random variable. Generally, if S = X₁ + ⋯ + X_N is a compound Poisson variable, the sum of N ∼ Poisson(μ) i.i.d. random variables X_i independent of N, then its Laplace transform is

$$\mathbb{E}\left[e^{-\theta S}\right] = \exp\left(\mu\left(\mathbb{E}\left[e^{-\theta X_1}\right]-1\right)\right).$$

In our case, with fixed, is a Poisson sum of i.i.d. random variables distributed as Y₁(t − ξ), where ξ is a [0, t]-valued random variable with density proportional to (see, e.g., Section 2 of [36]). Applying this to we have (11) To obtain the limit of the integrand we use the well-known result (see Ref. [10] Section 2) that if Y(⋅) is a linear birth-death process with division and death rates α_n+1, β_n+1, initiated from a single cell (Y(0) = 1), and with ϕ_n+1 = λ_n+1/α_n+1, then, as t → ∞,

$$e^{-\lambda_{n+1}t}\,Y(t) \to B\cdot E,$$

where B ∼ Bernoulli(ϕ_n+1) and E ∼ Expo(ϕ_n+1) are independent. Hence its Laplace transform converges to

$$\mathbb{E}\left[e^{-\theta BE}\right] = 1-\phi_{n+1}+\frac{\phi_{n+1}^2}{\phi_{n+1}+\theta} = 1-\frac{\phi_{n+1}\theta}{\phi_{n+1}+\theta}.$$

Then as t → ∞. Using this and taking the t → ∞ limit over Eq 11 results in Let γ_n = δ_n/δ_n+1 and recall that the Lerch transcendent has the integral representation

$$\Phi(z,s,a) = \frac{1}{\Gamma(s)}\int_0^\infty \frac{x^{s-1}e^{-ax}}{1-ze^{-x}}\,dx, \qquad \operatorname{Re}s>0,\ \operatorname{Re}a>0,\ z\in\mathbb{C}\setminus[1,\infty)$$

(see 25.14.5 in [38]). Upon the substitution t = λ_n+1s we see
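Two Laplace-transform facts drive this proof: for a compound Poisson sum S = X₁ + ⋯ + X_N with N ∼ Poisson(μ), E[e^{−θS}] = exp(μ(E[e^{−θX₁}] − 1)); and the limit B·E, with B ∼ Bernoulli(ϕ) and E ∼ Expo(ϕ) independent, has transform 1 − ϕθ/(ϕ + θ). Here Expo(ϕ) is read as rate ϕ, which makes E[B·E] = 1, as it must be since E[Y(t)] = e^{λt}. The Monte Carlo sketch below combines the two facts; the parameter values and names are arbitrary, and agreement is only up to sampling error.

```python
import math
import random


def lt_bernoulli_exponential(theta, phi):
    """Laplace transform of B*E, B ~ Bernoulli(phi), E ~ Expo(rate phi), independent."""
    return 1.0 - phi * theta / (phi + theta)


def lt_compound_poisson(theta, mu, phi):
    """Laplace transform of a Poisson(mu) sum of i.i.d. copies of B*E."""
    return math.exp(mu * (lt_bernoulli_exponential(theta, phi) - 1.0))


def sample_compound(mu, phi, rng):
    """One draw of the compound Poisson sum."""
    # Poisson(mu) count: number of unit-rate arrivals in [0, mu].
    n, arrival = 0, rng.expovariate(1.0)
    while arrival <= mu:
        n += 1
        arrival += rng.expovariate(1.0)
    # Each summand is Expo(phi) with probability phi, and 0 otherwise.
    return sum(rng.expovariate(phi) if rng.random() < phi else 0.0
               for _ in range(n))
```

In the proof, the summands are not identically distributed as B·E but as Y₁(t − ξ) with a random age; the t → ∞ limit of the integrand is what reduces them to this form.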

Corollary 2 implies that (12) which means that the distribution of the random amplitude can be evaluated numerically. Such numerical computation for the approximate model is already a step beyond what we could do for the original model.

Recall that we heuristically argued that the random amplitudes of the approximate and original models coincide in the limit of small mutation rates. Therefore our interest lies less in the exact distribution in (12) than in its limit for small mutation rates. Our task for the remainder of this section is thus to take the small mutation rate limit of (12).

To state the limit we now introduce some notation.

Let (13) Then, writing ν = (ν₁, ν₂, …), we define (14) This function satisfies (15) Further, let γ_n = δ_n/δ_n+1, and (16) Note that c_n as defined under Eq (2) is κ_n when δ_n < λ_n+1. Then, for small mutation rates, the distribution of may be related to :

Proposition 3.

Before proving this proposition, we give two lemmas needed to understand the limiting behaviour of the function h_n(θ) (defined in Corollary 2). Recall that the Lerch transcendent appears in the definition of h_n(θ), which motivates the following lemma.

Lemma 3. With Φ as the Lerch transcendent function with 0 < a < 1 and positive integer s, as z → −∞ Proof. We first rewrite Φ in terms of the generalised hypergeometric function (see 16.2.1 in [38]) for positive integer s This identity can be readily verified from the definitions of these special functions. Then we use its integral representation (Eq. 16.5.1 in [38]) The integrand has poles at −a (where 0 < a < 1) and at all integers, due to the Gamma functions. The contour of integration separates the poles at −a and 0. From the residue theorem, for z < 0 we can rewrite the integral as the sum of the residues coming from all poles to the left of the contour. The first term on the right-hand side is the contribution from the pole at −a, while the sum goes over the contributions from all other poles at −n = −1, −2, …. The leading-order term comes from the residue of the pole closest to the origin, at x = −a, which can be written as a finite sum of terms including powers of log(−z). The leading order of these terms is
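Lemma 3 concerns Φ(z, s, a) = Σ_{k≥0} zᵏ/(k + a)ˢ (DLMF 25.14.1, convergent for |z| < 1) far outside the disc of convergence, where one must work with the integral representation Φ(z, s, a) = Γ(s)⁻¹∫₀^∞ x^{s−1}e^{−ax}/(1 − ze^{−x}) dx, valid for Re s > 0, Re a > 0 and z ∉ [1, ∞) (DLMF 25.14.5). As a plain-Python sanity check, the sketch below confirms that the two expressions agree where both apply and evaluates the integral at large negative z; the quadrature settings are arbitrary choices, and the asymptotic formula of Lemma 3 itself is not reproduced.

```python
import math


def lerch_series(z, s, a, terms=200):
    """Partial sum of the defining series of Phi(z, s, a); needs |z| < 1."""
    return sum(z**k / (k + a) ** s for k in range(terms))


def lerch_integral(z, s, a, upper=80.0, steps=100_000):
    """Midpoint-rule evaluation of the integral representation of Phi(z, s, a).

    Used here only for real s >= 1, a > 0 and z < 1, where the integrand is
    bounded near x = 0; the tail beyond `upper` is neglected.
    """
    h = upper / steps
    total = 0.0
    for j in range(steps):
        x = (j + 0.5) * h
        total += x ** (s - 1) * math.exp(-a * x) / (1.0 - z * math.exp(-x))
    return total * h / math.gamma(s)
```

For z → −∞ only `lerch_integral` is usable, since the series diverges once |z| > 1; its values decrease as |z| grows, consistent with the decay described by Lemma 3.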

Before giving the next lemma we recall h_n for convenience Then the following lemma will be of use.

Lemma 4. With f_n as in Eq (13) and κ_n as in Eq (16), which implies that for δ_n > λ_n+1 for λ_n+1 = δ_n, while for δ_n < λ_n+1 Proof. Recall γ_n = δ_n/δ_n+1, ϕ_n+1 = λ_n+1/α_n+1. The lemma is clearly true by the definition of h_n(θ) for δ_n > λ_n+1 and δ_n = λ_n+1.

We turn to the case of δ_n < λ_n+1. For ease of notation we drop the 'n' subscripts and introduce l_ν = log(ν⁻¹). From the definition of h(θ) in this case, we see that we need the behaviour of the Lerch transcendent for large negative first argument, which is given in Lemma 3. Further, observe that for a ∈ [0, 1], sin aπ = sin(1 − a)π. Hence, as ν → 0, and so The ν factors outside of the logarithms immediately cancel, leaving the logarithmic factors. Collecting the logarithmic factors together, and recalling that Γ(r_n) = (r_n − 1)!, we have Notice that Hence This leaves as required.

We can now give the proof of Proposition 3:

Proof of Proposition 3. The base case is clear; we now argue by induction. We recall that Hence where the relation between and given in Eq (15) was used. Thus Using Lemma 4, we have Using the induction hypothesis

We remark that when λ_i+1 ≤ δ_i (no fitness increase occurs), we need not take the limit above in ν_i; that is, the statement of Proposition 3 holds without applying these limits.

Summarising thus far, we see has a Mittag-Leffler distribution with tail parameter δ₁/δ_n+1 and scale parameter Separating into a time-dependent component this implies that (17) with being Mittag-Leffler with tail parameter δ₁/δ_n+1 and scale parameter (18) If we consider the family of random variables then the scale parameters ω_n+1 satisfy the following recursion

Lemma 5. Set ω₁ = α₁/λ₁, then for n≥1, (19) where κ_n is defined in Eq (16).

Proof. By Eq (18), (20) We now demonstrate that multiplying ω_n as given above, by the factors stated in Lemma 5 results in ω_n+1 as expressed in Eq (18).

For the case of δ_n ≥ λ_n+1, κ_n is either (δ_n − λ_n+1)⁻¹ for δ_n > λ_n+1 or r_n⁻¹ for δ_n = λ_n+1 (see the definition of κ_n in Eq (16)). Hence, covering both the cases δ_n > λ_n+1 and δ_n = λ_n+1, we need to show that ν_nκ_nω_n = ω_n+1. Using Eq (20) (21)

For δ_n ≥ λ_n+1, δ_n = δ_n+1. Moreover, (Eq (13)) and from Eq 15 Thus, taking Eq (21), replacing each δ_n with δ_n+1, and using the representation of , Recognising that leads us to the desired form of ω_n+1 as in Eq (18).

In the case of δ_n < λ_n+1 = δ_n+1, we aim to demonstrate that matches the expression for ω_n+1 given in Eq (18). Again, using Eq (20), (22) For δ_n < λ_n+1, (Eq (13)) and from Eq 15, which combined with Eq (22) brings us to the desired form of ω_n+1 as in Eq (18).

We summarise this approximate form of as a theorem, to emphasise that it is the culmination of the results in this section.

Theorem 1 For t large, and all ν_i small where is Mittag-Leffler distributed with tail parameter δ₁/δ_n+1 and scale parameter ω_n+1 which satisfies the recurrence of Lemma 5.
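Using Theorem 1 in practice requires sampling Mittag-Leffler variables. A minimal sampler, assuming the convention that a Mittag-Leffler variable X with tail parameter a ∈ (0, 1] and scale σ has Laplace transform E[e^{−θX}] = 1/(1 + (σθ)ᵃ) (so that a = 1 gives the exponential law of Lemma 2 with mean σ, and scaling X by a constant scales σ linearly, as in Lemma 5), uses the representation X = σE^{1/a}S. Here E ∼ Expo(1) and S is a positive a-stable variable with E[e^{−θS}] = e^{−θᵃ}, drawn with Kanter's formula. The R package MittagLeffleR [17] offers a ready-made alternative; the Python names below are our own.

```python
import math
import random


def positive_stable(a, rng):
    """Draw S > 0 with E[exp(-theta * S)] = exp(-theta**a), 0 < a < 1 (Kanter's formula)."""
    u = math.pi * (1.0 - rng.random())       # uniform on (0, pi]
    w = rng.expovariate(1.0)
    return (math.sin(a * u) / math.sin(u) ** (1.0 / a)
            * (math.sin((1.0 - a) * u) / w) ** ((1.0 - a) / a))


def mittag_leffler(a, scale, rng):
    """Draw X with E[exp(-theta * X)] = 1 / (1 + (scale * theta)**a), 0 < a <= 1."""
    e = rng.expovariate(1.0)
    if a == 1.0:
        return scale * e                      # exponential law with mean `scale`
    # Conditioning on e gives E[exp(-theta X) | e] = exp(-(scale * theta)**a * e).
    return scale * e ** (1.0 / a) * positive_stable(a, rng)
```

Sampling through E^{1/a}S avoids evaluating the Mittag-Leffler function itself; draws with tail parameter δ₁/δ_n+1 and scale ω_n+1 could then be multiplied by the deterministic time factor of Theorem 1.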

Arrival times

We now turn to the time at which the type n population arrives. Our limit results concerning this question are identical for the original and approximate models, with only the parameters in the limit expressions changing. To avoid repeating results we introduce the superscript ∘, such that statements involving variables with a ∘ superscript are true for both models. Here, the arrival time of the first type n + 1 cell is

$$\tau^{\circ}_{n+1} = \inf\{t \ge 0 : Z^{\circ}_{n+1}(t) > 0\}.$$

It turns out can be appropriately centered using the following variables (23) such that its distribution simplifies for small final seeding rates.

Proposition 4. As ν_n → 0, Proof of Proposition 4. We introduce so that m_n = σ_n − ρ_n. First let’s condition on Observe that can be expressed as As ν_n → 0 the first factor above converges to . The second factor may be expressed as which converges to as ν_n → 0. Hence .

Propositions 1 and 2 imply that for any realisation we may find small enough x such that for ν_n ≤ x which is integrable over (−∞, t]. Using dominated convergence we have the claimed result.

We know that with δ_n = λ₁, has an exponential distribution, and so the limit distribution for may be immediately obtained [11]. If there are fitness increases, we turn to our small mutation results for the approximate model.

For the remainder of this section we discuss only results for the approximate model. The results below also hold for the original branching processes if the running-max fitness does not increase, i.e. δ_n = λ₁.

Thus with as in Eq 14, and using Proposition 3, we see that:

Corollary 3. Proof. From Proposition 4 While from Proposition 3,

This implies that for small mutation rates Recall that and that by the definition of m_n, Hence Defining we see that has a logistic distribution with scale parameter and median (24) The median times satisfy the following recurrence:

Lemma 6. Set Then for n ≥ 2 (25) Proof. We start with λ_n < δ_n−1, in which case , and δ_n−1 = δ_n, r_n = r_n−1, thus For the case of λ_n = δ_n−1, then ω_n = ν_n−1ω_n−1/r_n−1 and δ_n = δ_n−1, r_n = r_n−1 + 1, thus Turning to the case of λ_n > δ_n−1, we have , or alternatively and we also have δ_n = λ_n and r_n = r_n−1. Similarly to before

We summarise this approximate distribution of as a theorem, to emphasise that it is the culmination of the results in this section.

Theorem 2. For t ≥ 0 and all ν_i small where the median times which satisfies the recurrence of Lemma 6.
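Theorem 2 makes the arrival time approximately logistic. Assuming that its scale parameter is 1/λ₁ (our reading, consistent with the statement that the shape depends only on the net growth of type 1 cells), the distribution function and all quantiles follow from the median t_{1/2} alone: P(τ ≤ t) ≈ 1/(1 + e^{−λ₁(t − t_{1/2})}) and t_q = t_{1/2} + λ₁⁻¹ log(q/(1 − q)). The sketch below encodes these two formulas; the median is an input here, not computed from Eq (24), and the function names are hypothetical.

```python
import math


def arrival_cdf(t, median, lam1):
    """Approximate P(arrival time <= t): logistic with the given median and
    scale 1/lam1 (assumed reading of Theorem 2)."""
    z = lam1 * (t - median)
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)            # this branch avoids overflow for very early times
    return e / (1.0 + e)


def arrival_quantile(q, median, lam1):
    """Inverse of arrival_cdf for 0 < q < 1."""
    return median + math.log(q / (1.0 - q)) / lam1
```

Under this reading the central 90% of arrival times spans 2 log(19)/λ₁ ≈ 5.9/λ₁ time units whatever n and the selective effects, with only the median moving.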

Remark 1 In the above results we take the ordered limit for two technical reasons:

(i) In the proof of Proposition 4 we used the almost sure convergence of the scaled type n cell number, that is, Proposition 2. As the growth of the type n population is unaffected by the value of ν_n, no issues arise. However, the growth of type n is affected by ν₁, …, ν_n−1, and so almost sure convergence of cell numbers would not hold when simultaneously sending these mutation rates to 0, which would invalidate our proof strategy.

(ii) We build our understanding of the limit random variable from the distribution of , as seen in Corollary 2. Small mutation rate limits were required to circumvent the complexity introduced by the Lerch transcendent in h_n(θ), and then ultimately in the composite function—composing all h_i—in Eq (12). In the composite function of Eq (12), the function h_i+1 is applied before h_i, hence the mutation rate ordering.

This specific ordering may have consequences for higher-order details; for example, in Eq (24), the final mutation rate ν_n is privileged, appearing in the term. In other limits, e.g. when all mutation rates are equal, this term may change. On the other hand, when considering τ_n+1, we wait for the first mutation of type n + 1, whereas multiple mutations may occur from type i → i + 1 for i = …, n − 1; so the might remain in alternative limit orders. However, for practical scenarios we do not expect this feature to considerably impact results; this may be seen by considering the median time , where it is clear that the privileged term acts as a higher-order log log correction to the leading behaviour.

Supporting information

S1 Fig. Comparison of limiting logistic distribution for hitting times with stochastic simulations.

Empirical cumulative distribution of the arrival times of types 1–3 obtained from simulations of the exact model versus the cumulative distribution function corresponding to the logistic distribution of Eq 4. Birth/death parameters: A (net growth rate decreases then increases): α₁ = α₂ = 1, α₃ = 1.4, β₁ = β₃ = 0.3, β₂ = 1.5; B, D (net growth rate increases then decreases): α₁ = α₃ = 1, α₂ = 1.4, β₁ = β₂ = 0.3, β₃ = 1.5; C (neutral): α₁ = α₂ = α₃ = 1, β₁ = β₂ = β₃ = 0.3. Mutation rates: A, B, C: ν₁ = ν₂ = ν₃ = 0.01; D: ν₁ = ν₂ = ν₃ = 0.001. Number of simulations: A, B, C: 1000; D: 100.

https://doi.org/10.1371/journal.pcbi.1011289.s001

(TIF)

S1 Text. Statistical methods for n-mutation fluctuation assay.

https://doi.org/10.1371/journal.pcbi.1011289.s002

(PDF)

Acknowledgments

We are grateful to Adri B. Olde Daalhuis for his help with Lemma 3, and to Martin Reijns for discussions on fluctuation assay experiments.

References

1. Luria SE, Delbrück M. Mutations of bacteria from virus sensitivity to virus resistance. Genetics. 1943;28(6):491–511. pmid:17247100
2. Komarova NL, Wodarz D. Drug resistance in cancer: Principles of emergence and prevention. Proceedings of the National Academy of Sciences. 2005;102(27):9714–9719.
3. Leder K, Foo J, Skaggs B, Gorre M, Sawyers CL, Michor F. Fitness Conferred by BCR-ABL Kinase Domain Mutations Determines the Risk of Pre-Existing Resistance in Chronic Myeloid Leukemia. PLOS ONE. 2011;6(11):1–11. pmid:22140458
4. Bozic I, Reiter JG, Allen B, Antal T, Chatterjee K, Shah P, et al. Evolutionary dynamics of cancer in response to targeted combination therapy. eLife. 2013;2:e00747. pmid:23805382
5. Bozic I, Antal T, Ohtsuki H, Carter H, Kim D, Chen S, et al. Accumulation of driver and passenger mutations during tumor progression. Proceedings of the National Academy of Sciences. 2010;107(43):18545–18550. pmid:20876136
6. Williams MJ, Werner B, Heide T, Curtis C, Barnes CP, Sottoriva A, et al. Quantification of subclonal selection in cancer from bulk sequencing data. Nature Genetics. 2018;50(6):895–903. pmid:29808029
7. Lahouel K, Younes L, Danilova L, Giardiello FM, Hruban RH, Groopman J, et al. Revisiting the tumorigenesis timeline with a data-driven generative model. Proceedings of the National Academy of Sciences. 2020;117(2):857–864. pmid:31882448
8. Haeno H, Gonen M, Davis M, Herman J, Iacobuzio-Donahue C, Michor F. Computational Modeling of Pancreatic Cancer Reveals Kinetics of Metastasis Suggesting Optimum Treatment Strategies. Cell. 2012;148(1):362–375. pmid:22265421
9. Reiter JG, Makohon-Moore AP, Gerold JM, Heyde A, Attiyeh MA, Kohutek ZA, et al. Minimal functional driver gene heterogeneity among untreated metastases. Science. 2018;361(6406):1033–1037. pmid:30190408
10. Durrett R, Moseley S. Evolution of resistance and progression to disease during clonal expansion of cancer. Theoretical Population Biology. 2010;77(1):42–48. pmid:19896491
11. Nicholson MD, Antal T. Competing evolutionary paths in growing populations with applications to multidrug resistance. PLOS Computational Biology. 2019;15(4):1–25. pmid:30986219
12. Fearon ER, Vogelstein B. A genetic model for colorectal tumorigenesis. Cell. 1990;61(5):759–767. pmid:2188735
13. Nguyen LH, Goel A, Chung DC. Pathways of Colorectal Carcinogenesis. Gastroenterology. 2020;158(2):291–302. pmid:31622622
14. Lakatos E, Williams MJ, Schenck RO, Cross WCH, Househam J, Zapata L, et al. Evolutionary dynamics of neoantigens in growing tumors. Nature Genetics. 2020;52(10):1057–1066. pmid:32929288
15. Castro-Giner F, Ratcliffe P, Tomlinson I. The mini-driver model of polygenic cancer evolution. Nature Reviews Cancer. 2015;15(11):680–685. pmid:26456849
16. Antal T, Krapivsky PL. Exact solution of a two-type branching process: models of tumor progression. Journal of Statistical Mechanics: Theory and Experiment. 2011;2011(08):P08018.
17. Gill G, Straka P. MittagLeffleR: Using the Mittag-Leffler distributions in R; 2018. Available from: https://strakaps.github.io/MittagLeffleR/.
18. Denes J, Krewski D. An Exact Representation for the Generating Function for the Moolgavkar-Venzon-Knudson Two-Stage Model of Carcinogenesis with Stochastic Stem Cell Growth. Mathematical Biosciences. 1996;131(2):185–204. pmid:8589544
19. Cheek D, Antal T. Mutation frequencies in a birth–death branching process. Ann Appl Probab. 2018;28(6):3922–3947.
20. Zheng Q. Progress of a half century in the study of the Luria-Delbrück distribution. Mathematical Biosciences. 1999;162(1–2):1–32. pmid:10616278
21. Kessler DA, Levine H. Large population solution of the stochastic Luria-Delbrück evolution model. Proceedings of the National Academy of Science USA. 2013;110(29):11682–11687. pmid:23818583
22. Durrett R. Branching Process Models of Cancer. Stochastics in Biological Systems. Springer; 2015.
23. Ma WT, Sandri GV, Sarkar S. Analysis of the Luria-Delbrück Distribution using discrete convolution powers. Journal of Applied Probability. 1992;29(2):255–267.
24. Rosche WA, Foster PL. Determining mutation rates in bacterial populations. Methods. 2000;20:4–17. pmid:10610800
25. Russo M, Pompei S, Sogari A, Corigliano M, Crisafulli G, Puliafito A, et al. A modified fluctuation-test framework characterizes the population dynamics and mutation rate of colorectal cancer persister cells. Nature Genetics. 2022;54(7):976–984. pmid:35817983
26. Borst P. Genetic Mechanisms of Drug Resistance: A Review. Acta Oncologica. 1991;30(1):87–105. pmid:2009189
27. Tlsty TD, Margolin BH, Lum K. Differences in the rates of gene amplification in nontumorigenic and tumorigenic cell lines as measured by Luria-Delbruck fluctuation analysis. Proc Natl Acad Sci U S A. 1989;86(23):9441–9445. pmid:2687881
28. Kimmel M, Axelrod DE. Fluctuation test for two-stage mutations: application to gene amplification. Mutation Research. 1994;306:45–60. pmid:7512202
29. Altrock PM, Liu LL, Michor F. The mathematics of cancer: Integrating quantitative models. Nature Reviews Cancer. 2015;15(12):730–745. pmid:26597528
30. Bozic I, Wu CJ. Delineating the evolutionary dynamics of cancer from theory to reality. Nature Cancer. 2020;1(6):580–588. pmid:35121980
31. Ford CB, Shah RR, Maeda MK, Gagneux S, Murray MB, Cohen T, et al. Mycobacterium tuberculosis mutation rate estimates from different lineages predict substantial differences in the emergence of drug-resistant tuberculosis. Nature Genetics. 2013;45(7):784–790. pmid:23749189
32. Fu F, Nowak MA, Bonhoeffer S. Spatial Heterogeneity in Drug Concentrations Can Facilitate the Emergence of Resistance to Cancer Therapy. PLoS Computational Biology. 2015;11(3). pmid:25789469
33. Avanzini S, Antal T. Cancer recurrence times from a branching process model. PLOS Computational Biology. 2019;15(11):1–30. pmid:31751332
34. Zhang R, Ukogu OA, Bozic I. Waiting times in a branching process model of colorectal cancer initiation. bioRxiv. 2022;
35. Athreya KB, Ney PE. Branching Processes. Dover Publications; 2004.
36. Keller P, Antal T. Mutant number distribution in an exponentially growing population. J Stat Mech P01011. 2015;(1).
37. Harris TE. The theory of branching processes. Die Grundlehren der mathematischen Wissenschaften. Berlin, Göttingen, Heidelberg: Springer; 1963. Available from: http://opac.inria.fr/record=b1081408.
38. NIST Digital Library of Mathematical Functions; 2016. http://dlmf.nist.gov/, Release 1.0.7 of 2014-03-21. Available from: http://dlmf.nist.gov/.

Supporting information

S1 Fig. Comparison of limiting logistic distribution for hitting times with stochastic simulations.

S1 Text. Statistical methods for n-mutation fluctuation assay.
