
## Abstract

Successful prediction of the likely paths of tumor progression is valuable for diagnostic, prognostic, and treatment purposes. Cancer progression models (CPMs) use cross-sectional samples to identify restrictions in the order of accumulation of driver mutations and thus CPMs encode the paths of tumor progression. Here we analyze the performance of four CPMs to examine whether they can be used to predict the true distribution of paths of tumor progression and to estimate evolutionary unpredictability. Employing simulations we show that if fitness landscapes are single peaked (have a single fitness maximum) there is good agreement between true and predicted distributions of paths of tumor progression when sample sizes are large, but performance is poor with the currently common much smaller sample sizes. Under multi-peaked fitness landscapes (i.e., those with multiple fitness maxima), performance is poor and improves only slightly with sample size. In all cases, detection regime (when tumors are sampled) is a key determinant of performance. Estimates of evolutionary unpredictability from the best performing CPM, among the four examined, tend to overestimate the true unpredictability and the bias is affected by detection regime; CPMs could be useful for estimating upper bounds to the true evolutionary unpredictability. Analysis of twenty-two cancer data sets shows low evolutionary unpredictability for several of the data sets. But most of the predictions of paths of tumor progression are very unreliable, and unreliability increases with the number of features analyzed. Our results indicate that CPMs could be valuable tools for predicting cancer progression but that, currently, obtaining useful predictions of paths of tumor progression from CPMs is dubious, and emphasize the need for methodological work that can account for the probably multi-peaked fitness landscapes in cancer.

## Author summary

Knowing the likely paths of tumor progression is instrumental for cancer precision medicine as it would allow us to identify genetic targets that block disease progression and to improve therapeutic decisions. Direct information about paths of tumor progression is scarce, but cancer progression models (CPMs), which use as input cross-sectional data on genetic alterations, can be used to predict these paths. CPMs, however, make assumptions about fitness landscapes (genotype-fitness maps) that might not be met in cancer. We examine if four CPMs can be used to predict successfully the distribution of tumor progression paths; we find that some CPMs work well when sample sizes are large and fitness landscapes have a single fitness maximum, but in fitness landscapes with multiple fitness maxima prediction is poor. However, the best performing CPM in our study could be used to estimate evolutionary unpredictability. When we apply the best performing CPM in our study to twenty-two cancer data sets we find that predictions are generally unreliable but that some cancer data sets show low unpredictability. Our results highlight that CPMs could be valuable tools for predicting disease progression, but emphasize the need for methodological work to account for multi-peaked fitness landscapes.

**Citation:** Diaz-Uriarte R, Vasallo C (2019) Every which way? On predicting tumor evolution using cancer progression models. PLoS Comput Biol 15(8): e1007246. https://doi.org/10.1371/journal.pcbi.1007246

**Editor:** Arne Traulsen, Max-Planck-Institute for Evolutionary Biology, GERMANY

**Received:** November 28, 2018; **Accepted:** July 5, 2019; **Published:** August 2, 2019

**Copyright:** © 2019 Diaz-Uriarte, Vasallo. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

**Data Availability:** All relevant data are within the manuscript and its Supporting Information files.

**Funding:** Work partially supported by BFU2015-67302-R (MINECO/FEDER, EU) to RDU. CV supported by PEJD-2016-BMD-2116 from Comunidad de Madrid to RDU. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

**Competing interests:** The authors have declared that no competing interests exist.

## Introduction

Improving our ability to predict the paths of tumor progression is helpful for diagnostic, prognostic, and treatment purposes as, for example, it would allow us to identify genes that block the most common paths of disease progression [1–4]. This interest in predicting paths of progression is not exclusive to cancer (see reviews in [5, 6]). For example, in some cases antibiotic resistance shows parallel evolution with mutations being acquired in a similar order [7], and here “Even a modest predictive power might improve therapeutic outcomes by informing the selection of drugs, the preference between monotherapy or combination therapy and the temporal dosing regimen (…)” [8, p. 243i]. But detailed information about the paths of tumor evolution and their distribution, obtained from multiple within-patient samples with timing information, is not available.

Cancer progression models (CPMs), such as conjunctive Bayesian networks (CBN) [9–11], oncogenetic trees (OT) [12, 13], CAncer PRogression Inference (CAPRI) [14, 15], or CAncer PRogression Extraction with Single Edges (CAPRESE) [16], can be used to predict paths of tumor progression. CPMs were originally developed to identify restrictions in the order of accumulation of mutations during tumor progression from cross-sectional data [17, 18]. But CPMs also encode all the possible mutational paths or trajectories of tumor progression, from the initial genotype to the genotype with all driver genes mutated (see Fig 1); in fact, mutational pathways and evolutionary trajectories are already mentioned in the papers that describe CBN [10], CAPRI [14, 15] and in general overviews of CPMs [18]. Thus, CPMs could improve our ability to predict disease progression by leveraging the available, and growing, number of cross-sectional data sets.

**Fig 1.** (A) Representable; (B) Local maxima; (C) Rough Mount Fuji (RMF). Top row: partial fitness landscapes (representation based on [23], but only accessible genotypes are shown); genotypes are shown as sequences of 0s and 1s, where “1010” means a genotype with the first and third driver genes mutated; the vertical position of a genotype is its fitness; its horizontal position on the x-axis is given by its Hamming distance to the “0000” genotype. Green segments connect mutational neighbors of increasing fitness. The inset in the first row shows the DAG of restrictions in the order of accumulation of mutations that applies to (A) and (B). A DAG of restrictions shows genes (not genotypes) in the nodes; an arrow from gene *i* to gene *j* means that a mutation in *i* must occur before a mutation in *j* can occur, i.e., a mutation in gene *j* has a direct dependency on a mutation in gene *i*. In the figure, a mutation in the fourth gene can be observed only if both the second and third genes are mutated. Note that among the models considered in this paper, CAPRESE and OT can only represent trees (so they cannot account for the fourth gene having two, or more, incoming arrows). The absence of an arrow between two genes means that there are no direct dependencies between the two genes. Bottom row: fitness graphs or graphs of mutational paths; in fitness graphs nodes are genotypes and arrows point toward mutational neighbors of higher fitness (i.e., two genotypes connected by an arrow differ in one mutation that increases fitness [24–26]). These fitness graphs show all the paths of tumor progression: the set of accessible mutational paths and adaptive walks that, under the restriction that there can be no back mutations, start from the “0000” genotype and end in a fitness maximum. When evolution runs until fixation in a fitness maximum, each path from “0000” to a fitness maximum corresponds to a different Line of Descent (LOD).
For (B) and (C), gray edges and nodes in the fitness graphs show edges and nodes that are present in (A) but missing in (B) or (C). Under CPMs, since each new driver mutation with its dependencies satisfied increases fitness, all accessible genotypes that differ by exactly one mutation are connected in the fitness graph, as shown in (A), and the genotype with all driver genes mutated is the single fitness maximum (see also section 4 of S2 Text for local maxima from unconditionally deleterious mutations). For (B), the fitness landscape (and its fitness graph) has the same accessible genotypes as the fitness landscape in (A). But the fitness landscape in (B) has three maxima and, compared to (A), there are fewer paths to the genotype with all genes mutated, “1111”, and several paths end in the other two maxima (“1100”, “1010”). Thus, the fitness graph of (B) does not fulfill the assumptions of CPMs. Compared to the fitness graph of (A), in the fitness graph of (B) not all accessible genotypes that differ by one mutation are connected (e.g., genotypes “1100” and “1110”). In terms of acquisition of mutations, in (B), and in contrast to (A), we cannot reach genotype “1110” from genotype “1100”, even though a mutation in the third gene does not depend on any previous mutation according to the DAG of restrictions. So if we go from “1000” to “1100”, the acquisition of the second mutation precludes acquiring the third mutation (a violation of an assumption of CPMs), and this creates a local fitness maximum. The fitness landscape in (C) cannot be represented by any DAG of restrictions: no DAG of restrictions can account at the same time for the presence of genotypes “1000”, “0100”, “0010”, and the absence of every double mutant with the first gene mutated. Relative to (A), the fitness graph in (C) is missing both paths and genotypes (relative to the fitness graph from any possible DAG of restrictions, it could be missing genotypes and paths, have additional ones, or both).

The first question we address in this study is whether we can predict the paths of tumor progression using CPMs. To answer this question we will examine how close to the truth are the predictions made by four CPMs (CBN, OT, CAPRI, and CAPRESE) about the distribution of paths of tumor progression. We have adapted the output from the CPMs (the restrictions in the order of accumulation of mutations) to predict mutational pathways, and assessed the quality of these predictions. When addressing this question we need to take into account possible deviations from the models assumed by CPMs. In particular, most CPMs assume that the acquisition of a mutation in a driver gene, when all its possible dependencies on other genes are satisfied, does not decrease the probability of gaining a mutation in another driver gene [19]. In other words, acquiring driver mutations (when their dependencies on other genes are satisfied) cannot decrease fitness. This implies that the fitness landscapes assumed by CPMs, with respect to the driver genes, only have a single fitness maximum, the genotype with all drivers mutated (see Fig 1; note that there can be multiple fitness maxima if we include genes that are unconditionally deleterious —see section 4 of S2 Text). But it is likely that many cancer fitness landscapes have several local fitness maxima (i.e., they are rugged, multi-peaked landscapes): this can happen if there are many combinations of a small number of drivers, out of a larger pool of drivers [20], that result in the escape genotypes; moreover, synthetic lethality is common in both cancer cells [21, 22] and the human genome [27], and it can lead to local fitness maxima when it affects mutations that individually increase fitness—see also [28]. Thus, to examine if CPMs can be used to predict paths of tumor progression we will need to assess how the quality of the predictions is affected by multi-peaked fitness landscapes.

The second question addressed in this paper is whether we can use CPMs to estimate evolutionary unpredictability, regardless of the performance when predicting the actual paths of tumor progression. A model could be useful if it suggests few paths are possible, even if its actual predictions about the distribution of paths are not trustworthy. Conversely, predicting correctly the distribution of paths of tumor progression might be of little importance in scenarios where the true evolutionary unpredictability itself is very large (where disease progression follows a very large number of possible paths); for practical purposes, forecasting here would be useless.

To address the above questions (can we predict the paths of tumor progression using CPMs? can we estimate evolutionary unpredictability using CPMs?) we use evolutionary simulations on 1260 fitness landscapes that range from no deviations to severe deviations from the assumptions that CPMs make about the structure of fitness landscapes, and we analyze the data with four different CPMs, whose predictions about restrictions in the order of accumulation of mutations we have adapted to provide probabilities of paths of tumor progression. This paper does not attempt to understand the determinants of evolutionary (un)predictability (see, e.g., [5, 6, 25, 29, 30]); instead, we focus on the effects of evolutionary unpredictability on CPMs. This is why we vary key determinants of evolutionary unpredictability (e.g., population sizes and mutation rates), but these factors are only used to generate variability in unpredictability and are not themselves the focus of the study. To better assess the quality of predictions, we use sample sizes that range from those commonly used to sizes much larger than those currently available. We also include variation in the cancer detection process or detection regime (when cancer samples are taken, or when patients are sampled), since previous studies have shown that it affects the quality of inferences from CPMs [31].

We have shown before [31] that the performance of two CPMs (CBN and CAPRI) for predicting accessible genotypes degrades considerably when the fitness landscapes contain reciprocal sign epistasis. That study focused on predicting accessible genotypes and its results cannot answer the questions about predicting paths of tumor progression and estimating evolutionary unpredictability. We are extending our previous study to answer whether CPMs can be used to predict paths of progression and to estimate evolutionary unpredictability. To address these questions we need to look directly at the prediction of paths (not genotypes), and compare them with the true paths of progression, as we do in the current work. Thus, the two studies differ in objectives, methods (here we use a larger number of CPMs, we follow evolution until fixation, and we develop procedures to compare predicted with true paths of tumor progression), and scenarios considered (the types of fitness landscapes used and the extent of evolutionary unpredictability); see details in S1 Text.

Here we find that the agreement between the predicted and true distributions of paths is generally poor, unless sample sizes are very large and fitness landscapes conform to the assumptions of CPMs. Both detection regime and evolutionary unpredictability itself have major effects on performance. But in spite of the unreliability of the predictions of paths of tumor progression, we find that CPMs can be useful for estimating upper bounds to the true evolutionary unpredictability.

What are the implications of our results for the analysis and interpretation of the use of CPMs with cancer data sets? We analyze twenty-two real cancer data sets with H-CBN, the best performing CPM in the simulations. We cannot examine how close predictions are to the truth, since the truth is unknown; thus, we use bootstrap samples to examine the reliability of the inferences. Many of the cancer data sets reflect conditions where useful predictions could be possible, based on the estimates of evolutionary unpredictability from H-CBN. But for most data sets these results are thwarted by the unreliability of the predictions themselves, which increases with the number of features analyzed. Our results call into question the uncritical use of CPMs for predicting paths of tumor progression, and suggest the need for methodological work that can account for the probably multi-peaked fitness landscapes in cancer.

## Materials and methods

This paper involves both a simulation study, where results from four CPMs (CBN, with variants H-CBN and MCCBN; OT; CAPRI, with variants CAPRI_AIC and CAPRI_BIC; and CAPRESE) are compared to the known truth from the simulations, and the analysis of twenty-two cancer data sets using the best performing of the above CPMs (H-CBN). Section Cancer progression models used and paths of tumor progression describes the CPMs used and how predicted paths of tumor progression are obtained from them. Section Overview of the simulation study provides an overview of the simulation study. Sections Fitness landscapes to Detection regimes and obtaining data sets from the simulations provide details on how the simulations were conducted. How the performance of CPMs was assessed is explained in section Measures of performance and predictability. Section Characteristics of the simulated fitness landscapes and genotypes summarizes the main features of the simulated landscapes and data sets used to evaluate the performance of the CPMs. The cancer data sets and the methods used to analyze them are described in section Cancer data sets.

### Cancer progression models used and paths of tumor progression

We have compared four distinct CPMs: CBN, OT, CAPRI, and CAPRESE. Two of the models used, CBN and CAPRI, have been used in two variants (H-CBN and MCCBN for CBN, CAPRI_AIC and CAPRI_BIC for CAPRI), yielding a total of six different procedures for obtaining CPMs. Only a brief overview of these CPMs is provided here; detailed descriptions can be found in the original references for each model: H-CBN [9, 10], MCCBN [11], OT [12, 13], CAPRI [14, 15], and CAPRESE [16]. (Other CPMs that we are aware of include DiP [32], bcbn [33] and RESIC [34, 35]; we do not consider these methods here because they are too slow for routine work, have no software available, or have dependencies on non-open source external libraries —see S4 Text).

The CPMs considered try to identify restrictions in the order of accumulation of mutations from cross-sectional data. CPMs assume that the different observations in the cross-sectional data set constitute independent realizations of evolutionary processes where the same constraints hold for all tumors [10, 17, 18]. Thus, a data set can be regarded as a set of replicate evolutionary experiments where all individuals are under the same genetic constraints. For the four CPMs considered in this paper, the cross-sectional data is a matrix of subjects (or individuals) by driver alteration events, where each entry in the matrix is binary coded as mutated or not-mutated (or, equivalently, altered or non-altered). CPMs assume there are no back mutations in these events —i.e., once gained, an alteration is not lost. CPMs further assume that the driver genes are known. For the simulations, we will refer to these driver alteration events as “genes”, but they can be individual genes, parts or states of genes, or modules or pathways made from several genes (e.g., [10, 15]). When we analyze the twenty-two cancer data sets (see section Cancer data sets) we will use the generic term “features” as some of those data sets use genes whereas others use pathway or module information. CPMs assume that all tumors start cancer progression without any of the mutations considered in the study (the above matrix of subjects by driver alterations), but other mutations could be present that have caused the initial tumor growth. All these other mutations are absorbed in the root node from which cancer is initiated [35]; note that the way the data are simulated to generate cross-sectional observations (see section Overview of the simulation study) is consistent with this assumption.
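As a minimal illustration of this input format, assuming genotypes are written as 0/1 strings as in Fig 1 (gene 1 leftmost), the sketch below builds the binary subjects-by-genes matrix and the marginal frequency of each alteration. The sample and the helper names (`to_matrix`, `mutation_frequencies`) are hypothetical and are not taken from any CPM software.

```python
from typing import List


def to_matrix(genotypes: List[str]) -> List[List[int]]:
    """Convert genotype strings such as "1010" into rows of a binary
    subjects-by-genes matrix (1 = mutated/altered, 0 = not mutated)."""
    n_genes = len(genotypes[0])
    matrix = []
    for g in genotypes:
        if len(g) != n_genes or set(g) - {"0", "1"}:
            raise ValueError(f"invalid genotype: {g!r}")
        matrix.append([int(c) for c in g])
    return matrix


def mutation_frequencies(matrix: List[List[int]]) -> List[float]:
    """Proportion of subjects carrying each alteration (column means)."""
    n_subjects = len(matrix)
    return [sum(col) / n_subjects for col in zip(*matrix)]


# Hypothetical cross-sectional sample: five tumors, four driver genes.
sample = ["0000", "1000", "1010", "0110", "0111"]
X = to_matrix(sample)
print(mutation_frequencies(X))  # [0.4, 0.4, 0.6, 0.2]
```

Each row is treated as an independent realization of the same constrained evolutionary process, which is the assumption the CPMs make about cross-sectional data.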

The above assumptions are common to the CPMs considered. The models examined here differ, however, in the types of restrictions they can represent and in their model fitting procedures. Both OT [12, 13] and CAPRESE [16] describe the accumulation of mutations with order constraints that can be represented as trees. Thus, among the “representable” fitness landscapes used in this paper (section Evolutionary simulations), OT and CAPRESE can only faithfully model the subset that are trees, those where a gene mutation has a direct dependency on only one other gene’s mutation. A key difference between OT and CAPRESE is that CAPRESE reconstructs these models using a probability raising notion of causation in the framework of Suppes’ probabilistic causation, whereas in OT weights along edges can be directly interpreted as probabilities of transition along the edges by the time of observation [12, p. 4]. In contrast to OT and CAPRESE, both CAPRI [14, 15] and CBN [9–11] allow modeling the dependence of an event on more than one previous event: the outputs of the models are directed acyclic graphs (DAGs) where some nodes have multiple parents, instead of a single parent (as in trees). CAPRI tries to identify events (alterations) that constitute “selective advantage relationships”, again using probability raising in the framework of Suppes’ probabilistic causation. We have used two versions of CAPRI, which we will call CAPRI_AIC and CAPRI_BIC, that differ in the penalization used in the maximum likelihood fit: the Akaike Information Criterion (AIC) or the Bayesian Information Criterion (BIC), respectively. For CBN we have also used two variants: H-CBN, described in [9, 10], which uses simulated annealing with a nested expectation-maximization (EM) algorithm for estimation, and MCCBN, described in [11], which uses a Monte-Carlo EM algorithm.
Thus, these six procedures can be divided into three groups: models that return trees (OT and CAPRESE) and two families of models that return DAGs, CBN (H-CBN and MCCBN) and CAPRI (CAPRI_AIC and CAPRI_BIC).
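The two Suppes conditions that CAPRESE and CAPRI build on can be sketched in a simplified form: temporal priority, approximated from cross-sectional data as P(i) > P(j), and probability raising, P(j | i) > P(j | not i). The sketch below only applies these two empirical checks to a binary subjects-by-genes matrix; it is not CAPRESE or CAPRI (it omits the statistical testing and model-selection steps of the actual methods), and the function name is hypothetical.

```python
from typing import List, Tuple


def suppes_candidate_edges(X: List[List[int]]) -> List[Tuple[int, int]]:
    """Pairs (i, j) of 0-based gene indices that satisfy, in the data,
    temporal priority P(i) > P(j) and probability raising
    P(j | i) > P(j | not i)."""
    n_subjects, n_genes = len(X), len(X[0])
    freq = [sum(row[k] for row in X) / n_subjects for k in range(n_genes)]
    edges = []
    for i in range(n_genes):
        with_i = [row for row in X if row[i] == 1]
        without_i = [row for row in X if row[i] == 0]
        if not with_i or not without_i:
            continue  # P(j | i) or P(j | not i) is undefined
        for j in range(n_genes):
            if i == j or freq[i] <= freq[j]:
                continue  # temporal priority fails
            p_j_given_i = sum(row[j] for row in with_i) / len(with_i)
            p_j_given_not_i = sum(row[j] for row in without_i) / len(without_i)
            if p_j_given_i > p_j_given_not_i:  # probability raising
                edges.append((i, j))
    return edges


# Hypothetical data: gene 0 is common; gene 1 only appears together with gene 0.
X_toy = [[1, 1], [1, 0], [1, 1], [0, 0], [1, 0]]
print(suppes_candidate_edges(X_toy))  # [(0, 1)]
```

Both checks yield single parent-to-child candidate relations; how candidates are combined and pruned into a tree (CAPRESE) or a DAG (CAPRI) is method-specific and not shown.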

Because (the transitive reduction of) a DAG of restrictions determines a fitness graph (see Fig 1 and [31]), the set of paths to the maximum encoded by the output from a CPM can be obtained from the fitness graph. We did this for all models. From H-CBN and MCCBN we can also obtain the estimated probability of each path of tumor progression to the fitness maximum, since both H-CBN and MCCBN return the parameters of the waiting time to occurrence of each mutation (once its restrictions are satisfied; e.g., [11, p. i729], [9, section 2.2], or [36]; details and example in section 3 of S4 Text). A similar operation can be performed with the output of OT, using the edge weights from the fits of OT to obtain the probabilities of transition to each descendant genotype and, from them, the probabilities of the different paths to the single fitness maximum. Note, however, that these probabilities are not really returned by the model, since the OTs used are untimed oncogenetic trees [12, 13]. We will refer to paths with probabilities assigned in the above way as **probability-weighted paths**. For CAPRESE and CAPRI, it is not possible to map the output to different probabilities of paths of progression (see section 3 of S4 Text), so in all computations that required probabilities of paths we assigned the same probability to each path.
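The path enumeration and CBN-style path probabilities can be sketched as follows. This assumes the DAG is encoded as a mapping from each gene to its set of parent genes, and assumes independent exponential waiting times with hypothetical rate parameters, so that at each step the mutations whose restrictions are satisfied compete and each wins with probability proportional to its rate. It is an illustration, not the H-CBN or MCCBN code; OT path probabilities are not covered, and the function names are hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple

Parents = Dict[int, FrozenSet[int]]


def accessible_paths(parents: Parents) -> List[Tuple[int, ...]]:
    """All orders in which genes can be mutated, from no mutations to all
    genes mutated, such that each gene is added only after all its parents."""
    genes = sorted(parents)
    paths: List[Tuple[int, ...]] = []

    def extend(mutated: FrozenSet[int], order: List[int]) -> None:
        if len(order) == len(genes):
            paths.append(tuple(order))
            return
        for g in genes:
            if g not in mutated and parents[g] <= mutated:
                extend(mutated | {g}, order + [g])

    extend(frozenset(), [])
    return paths


def path_probability(path: Tuple[int, ...], parents: Parents,
                     rates: Dict[int, float]) -> float:
    """Probability of a path under competing exponential waiting times:
    gene g is the next mutation with probability rates[g] divided by the
    summed rates of all genes whose parents are already mutated."""
    mutated: FrozenSet[int] = frozenset()
    prob = 1.0
    for g in path:
        competing = [h for h in parents
                     if h not in mutated and parents[h] <= mutated]
        prob *= rates[g] / sum(rates[h] for h in competing)
        mutated = mutated | {g}
    return prob


# DAG of Fig 1A: gene 4 requires genes 2 and 3; genes 1-3 have no parents.
dag = {1: frozenset(), 2: frozenset(), 3: frozenset(), 4: frozenset({2, 3})}
paths = accessible_paths(dag)
print(len(paths))  # 8

rates = {1: 1.0, 2: 2.0, 3: 0.5, 4: 1.0}  # hypothetical rate parameters
probs = {p: path_probability(p, dag, rates) for p in paths}
uniform = {p: 1 / len(paths) for p in paths}  # equal weights, as for CAPRESE and CAPRI
```

Because the transition probabilities out of every genotype sum to one, the probability-weighted paths always sum to one; with equal weights, each of the eight paths of this DAG gets probability 1/8.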

### Overview of the simulation study

We have used simulations of tumor evolution on fitness landscapes of three different types (see Fig 1), for landscapes of 7 and 10 genes, under different initial population sizes and mutation rates. We have used a total of 1260 fitness landscapes (2 numbers of genes × 3 types of fitness landscapes × 3 initial population sizes × 2 mutation regimes × 35 random fitness landscapes for each combination of conditions). For each one of the 1260 fitness landscapes, we simulated 20000 independent evolutionary processes (with the specified parameters for initial population size and mutation rate) using a logistic-like growth model; each simulated evolutionary process was run until one of the genotypes at the local fitness maxima (or the single fitness maximum) reached fixation. Each set of 20000 simulated evolutionary processes was then sampled under three detection regimes, so that each fitness landscape generated three sets of 20000 simulated genotypes. From each of these sets, we obtained five different splits of the genotypes for each of three sample sizes (50, 200, 4000); thus, a total of 56700 data sets (1260 fitness landscapes × 3 detection regimes × 3 sample sizes × 5 splits) were produced. Each of these 56700 data sets was analyzed with every one of the CPMs compared (H-CBN, MCCBN, OT, CAPRI_AIC, CAPRI_BIC, and CAPRESE) to obtain predicted paths of tumor progression. These predictions were then compared with the true, recorded, paths of tumor progression from the simulations (see section Measures of performance and predictability). A schematic view of the simulation study is provided in Fig 2.
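The counts of fitness landscapes and data sets follow from the factorial design. A short check, where the factor labels are hypothetical shorthand for the conditions listed above:

```python
from itertools import product

# Factors that define a fitness landscape (labels are illustrative shorthand).
n_genes = (7, 10)
landscape_types = ("representable", "local maxima", "RMF")
initial_sizes = (2000, 50000, 10**6)
mutation_regimes = ("common rate", "variable rates")
replicates = range(35)

landscapes = list(product(n_genes, landscape_types, initial_sizes,
                          mutation_regimes, replicates))
print(len(landscapes))  # 1260

# Factors applied to each landscape's 20000 simulated processes.
detection_regimes = ("large", "small", "uniform log size")
sample_sizes = (50, 200, 4000)
splits = range(5)

data_sets = list(product(landscapes, detection_regimes, sample_sizes, splits))
print(len(data_sets))  # 56700

# Five non-overlapping splits of the largest size use every simulated genotype once.
assert 5 * max(sample_sizes) == 20000
```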

**Fig 2.** (A) On each of the 1260 fitness landscapes, (B) 20000 evolutionary processes were simulated. The set of 20000 evolutionary processes from a fitness landscape was then (C) sampled under three detection regimes, obtaining one observation per simulation under each detection regime, (D) leading to 20000 observations enriched in large-sized tumors (large detection regime), 20000 observations enriched in small-sized tumors (small detection regime), and 20000 observations with a uniform distribution with respect to the logarithm of tumor size; the large and small detection regimes emulate cases where cancer tends to be detected at late or early stages, respectively (see text). (E) From each of the individual observations, we obtained the genotype of the most common clone; therefore, (F) each fitness landscape provides 3 sets of 20000 genotypes, one for each detection regime. (G) These sets were split into 5 non-overlapping sets of 50 observations, 5 non-overlapping sets of 200 observations, and 5 non-overlapping sets of 4000 observations. Each of these data sets was analyzed with each of the CPMs considered to obtain the predicted paths of tumor progression.
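Step (G) can be sketched as below. This is a minimal sketch under two assumptions not stated in the text: observations are assigned to splits by random shuffling, and the splits for different sample sizes are drawn independently from the same 20000 observations. Observations are represented by hypothetical integer IDs, and the function name is illustrative.

```python
import random
from typing import Dict, List, Sequence


def non_overlapping_splits(observations: Sequence[int], sizes: Sequence[int],
                           n_splits: int = 5,
                           seed: int = 0) -> Dict[int, List[List[int]]]:
    """For each sample size, return n_splits disjoint subsets of that size.
    Subsets are disjoint within a sample size; each sample size is drawn
    independently from the full set of observations."""
    rng = random.Random(seed)
    out: Dict[int, List[List[int]]] = {}
    for size in sizes:
        if size * n_splits > len(observations):
            raise ValueError("not enough observations for disjoint splits")
        shuffled = list(observations)
        rng.shuffle(shuffled)
        out[size] = [shuffled[k * size:(k + 1) * size] for k in range(n_splits)]
    return out


observations = list(range(20000))  # one observation per simulated process
splits = non_overlapping_splits(observations, (50, 200, 4000))
```

With a sample size of 4000, the five splits together contain all 20000 observations, so overlap between sample sizes is unavoidable.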

### Fitness landscapes

We have used three different kinds of random fitness landscapes (see Fig 1). **Representable** fitness landscapes are fitness landscapes for which a DAG of restrictions exists with the same accessible genotypes and accessible mutational paths. (Accessible mutational path: a trajectory through a collection of genotypes, where each genotype is separated from the preceding genotype by one mutation, along which fitness increases monotonically [26]; accessible genotypes: genotypes along accessible mutational paths). An example of a representable fitness landscape with its corresponding DAG of restrictions and fitness graph is shown in Fig 1A. A defining characteristic of representable fitness landscapes is that all accessible genotypes that differ by exactly one mutation are connected in the fitness graph; thus, with respect to the driver genes, there is a single fitness maximum, the genotype with all driver genes mutated, and all accessible mutational paths in the fitness graph end in that single maximum. For representable fitness landscapes there is a one-to-one correspondence between DAGs of restrictions and fitness graphs.
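For representable landscapes, the accessible genotypes and the fitness graph follow directly from the DAG of restrictions. A minimal sketch using the DAG of Fig 1A, assuming a hypothetical encoding of the DAG as a mapping from each gene (1-based) to its set of parents; the function names are illustrative.

```python
from itertools import product
from typing import Dict, FrozenSet, List, Tuple


def accessible_genotypes(parents: Dict[int, FrozenSet[int]],
                         n_genes: int) -> List[str]:
    """Genotypes (0/1 strings, gene 1 leftmost) in which every mutated gene
    has all of its parents mutated, i.e., those allowed by the DAG."""
    result = []
    for bits in product("01", repeat=n_genes):
        mutated = {k + 1 for k, b in enumerate(bits) if b == "1"}
        if all(parents[g] <= mutated for g in mutated):
            result.append("".join(bits))
    return result


def fitness_graph_edges(genotypes: List[str]) -> List[Tuple[str, str]]:
    """Representable case: connect every pair of accessible genotypes that
    differ by exactly one added mutation (each such step increases fitness)."""
    acc = set(genotypes)
    edges = []
    for g in genotypes:
        for k, b in enumerate(g):
            if b == "0":
                h = g[:k] + "1" + g[k + 1:]
                if h in acc:
                    edges.append((g, h))
    return edges


dag = {1: frozenset(), 2: frozenset(), 3: frozenset(), 4: frozenset({2, 3})}
acc = accessible_genotypes(dag, 4)
print(len(acc), "1001" in acc)           # 10 False
print(len(fitness_graph_edges(acc)))     # 15
```

In a local maxima landscape built from the same DAG, the accessible genotypes would be the same ten, but some of these fifteen edges would be missing.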

In **local maxima** fitness landscapes (Fig 1B) the set of accessible genotypes can be represented by a DAG of restrictions, but there are local fitness maxima and the fitness graph has missing paths; the genotype with all genes mutated might or might not be the genotype with largest fitness. In other words, in the local maxima fitness landscapes, the DAG of restrictions and the fitness landscape agree on which genotypes should be accessible and which genotypes should not be accessible; what the local maxima landscapes are missing are mutational paths to the genotype with all genes mutated, because we have introduced local fitness maxima. Once we introduce local maxima there is no longer a one-to-one correspondence between DAGs of restrictions and fitness graphs (thus, there is no longer a one-to-one correspondence between DAGs of restrictions and sets of tumor progression paths). These local maxima landscapes should not be as challenging to CPMs as the DAG-derived non-representable fitness landscapes used in [31], as those also missed some genotypes that should exist under the DAG of restrictions. This is by design: here we want to isolate the effect of multi-peaked landscapes or local maxima (or, equivalently, missing paths), without the additional burden, for the CPMs, of missing genotypes.

The third type of fitness landscapes used are Rough Mount Fuji (**RMF**) fitness landscapes. The RMF model [25, 37] combines a random House of Cards model (where fitness is assigned to genotypes by independently sampling from a fixed probability distribution) and an additive fitness landscape (where the reference genotype, or the genotype with largest fitness, need not be the one with all genes mutated). The RMF model is a very flexible one, where the ruggedness of the landscape can be modified by changing the ratio of the additive component (the change in fitness per unit increase in Hamming distance from the reference genotype) relative to the variance of the random fitness component (the House of Cards component). The RMF model has been useful to model empirical fitness landscapes [25, 26, 37]. RMF fitness landscapes generally have multiple local fitness maxima and considerable reciprocal sign epistasis and thus not even the set of accessible genotypes can be represented by a DAG of restrictions (see [31], and Fig 1).
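A minimal sketch of an RMF landscape, assuming the common parameterization in which fitness decreases linearly by `c` per unit Hamming distance from a reference genotype and the House of Cards component is Gaussian with standard deviation `sigma`. Fitness here is on an arbitrary scale (the mapping to birth rates used in this paper is described in S2 Text and is not reproduced), and the function names are hypothetical.

```python
import random
from itertools import product
from typing import Dict, List, Tuple

Genotype = Tuple[int, ...]


def rmf_landscape(n_genes: int, c: float, sigma: float,
                  seed: int = 0) -> Dict[Genotype, float]:
    """Rough Mount Fuji fitness: an additive term that decreases by c per unit
    Hamming distance from a randomly chosen reference genotype, plus an
    independent Gaussian House of Cards term with standard deviation sigma."""
    rng = random.Random(seed)
    reference = tuple(rng.randint(0, 1) for _ in range(n_genes))
    fitness = {}
    for g in product((0, 1), repeat=n_genes):
        distance = sum(a != b for a, b in zip(g, reference))
        fitness[g] = -c * distance + rng.gauss(0.0, sigma)
    return fitness


def local_maxima(fitness: Dict[Genotype, float]) -> List[Genotype]:
    """Genotypes whose fitness exceeds that of every single-mutation neighbor."""
    maxima = []
    for g, f in fitness.items():
        neighbors = (g[:k] + (1 - g[k],) + g[k + 1:] for k in range(len(g)))
        if all(f > fitness[h] for h in neighbors):
            maxima.append(g)
    return maxima


# Ruggedness depends on c relative to sigma.
for c in (2.0, 0.5, 0.0):
    landscape = rmf_landscape(n_genes=7, c=c, sigma=1.0, seed=1)
    print(c, len(local_maxima(landscape)))
```

With `sigma = 0` the landscape is purely additive and has a single maximum at the reference genotype; lowering `c` relative to `sigma` moves toward a House of Cards landscape and typically increases the number of local maxima.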

We generated the DAG-derived **representable** fitness landscapes by generating a random DAG of restrictions and, from it, the fitness graph. We then assigned birth rates to genotypes using an iterative procedure on the fitness graph: starting from the genotype without any driver mutation, with a birth rate of 1, the birth rate of each descendant genotype was set equal to the maximum fitness of its parent genotypes times a random uniform variate between 1.01 and 1.19 (*U*(1.01, 1.19)), yielding, therefore, an average multiplicative increase in fitness of 10% (a mean factor of 1.1, which is within values previously used: [4, 38, 39]). The birth rate of genotypes that were not accessible according to the DAG of restrictions was set to 0. For example, if the DAG of restrictions was the one shown in Fig 1A, a cell with genotype “1001” would have a birth rate of 0, since the dependencies of the DAG of restrictions are not satisfied: mutations in genes 2 and 3 must occur before a mutation in gene 4. Therefore, this simulation scheme strictly adheres to the assumptions about accessible and non-accessible genotypes under the CPM model. (For the growth model used here, described below, birth rates determine fitness at any population size, as death rates are identical for all genotypes and depend only on population size. Genotypes with a birth rate of 0 are never added to the population and, thus, they cannot mutate before dying.)

We generated the DAG-derived **local maxima** fitness landscapes by first generating a random DAG and, from it, the fitness graph, exactly as for the representable fitness landscapes. In contrast to representable fitness landscapes, however, before assigning fitness to genotypes we removed a random selection of edges of the fitness graph, so that all accessible genotypes remained accessible but now from a possibly much smaller set of parents. Birth rate was then assigned as for the representable fitness landscapes (using the iterative procedure on the fitness graph, where birth rate of descendant genotype = max(birth rate of parent genotypes) × *U*(1.01, 1.19), and with all non-accessible genotypes having a birth rate of 0). For each DAG we repeated this procedure 50 times and kept the landscape that introduced the largest number of local maxima. Creating local maxima almost always resulted in creating reciprocal sign epistasis (see also section Characteristics of the simulated fitness landscapes and genotypes and S2 Text).

We generated the **RMF** fitness landscapes by randomly choosing the reference genotype (i.e., the genotype with the largest fitness) and the decrease in birth rate of a genotype per unit increase in Hamming distance from the reference genotype (which affects the ruggedness of the landscape); see details in S2 Text.
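The iterative birth-rate assignment for representable landscapes can be sketched as follows. This is an illustration under stated assumptions, not the simulation code used in the study: it assumes CBN-style AND semantics (a gene can be mutated only if all of its parents in the DAG are mutated), uses 0-based gene indices, and uses a hypothetical DAG that encodes only the restriction mentioned in the text (genes 2 and 3 before gene 4, i.e. indices 1 and 2 before index 3); Fig 1A may contain further restrictions. In a representable landscape every accessible genotype one mutation away is a parent in the fitness graph, so the sketch takes the maximum over all of them; for local maxima landscapes the parent set would instead be limited to the edges kept after pruning.

```python
import itertools
import random


def accessible(genotype, restrictions):
    """True if every mutated gene has all of its DAG parents mutated (AND semantics)."""
    return all(
        all(genotype[p] for p in restrictions.get(gene, ()))
        for gene, mutated in enumerate(genotype)
        if mutated
    )


def assign_birth_rates(n_genes, restrictions, seed=None):
    """Birth rate = max(parent birth rates) * U(1.01, 1.19); inaccessible -> 0."""
    rng = random.Random(seed)
    # Sorting by number of mutations guarantees parents are processed first
    genotypes = sorted(itertools.product((0, 1), repeat=n_genes), key=sum)
    birth = {}
    for g in genotypes:
        if not accessible(g, restrictions):
            birth[g] = 0.0
        elif sum(g) == 0:
            birth[g] = 1.0  # genotype without driver mutations
        else:
            # Parents: remove one mutation; inaccessible parents have birth rate 0
            parents = [g[:k] + (0,) + g[k + 1:] for k in range(n_genes) if g[k]]
            birth[g] = max(birth[p] for p in parents) * rng.uniform(1.01, 1.19)
    return birth


# Hypothetical DAG: gene index 3 requires gene indices 1 and 2
rates = assign_birth_rates(4, {3: (1, 2)}, seed=0)
```

With this DAG, `rates[(1, 0, 0, 1)]` (genotype “1001”) is 0, matching the example in the text.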

### Evolutionary simulations

Once a fitness landscape had been generated, we simulated 20000 evolutionary processes (step B in Fig 2). We used the continuous-time, logistic-like model of [38], in which death rate depends on total population size, as implemented in OncoSimulR [40], with the specified parameters of initial population size and mutation rate (below). Each individual evolutionary process was run until one of the genotypes at the local fitness maxima (or the single fitness maximum) reached fixation (see details in S3 Text). We also verified that all 7 or 10 genes had appeared in at least some genotypes, i.e., were part of the paths of tumor progression. If this condition was not fulfilled, a new fitness landscape was generated and the processes started again. This procedure is independent of the detection process that returns the genotypes analyzed by the CPMs (section Detection regimes and obtaining data sets from the simulations).

We used three **initial population sizes**, 2000, 50000, and 1 × 10^{6} cells, for the simulations; these cover a range of population sizes at tumor initiation that have previously been used in the literature (e.g., [10, 38, 41, 42]). We also used two **mutation regimes**. In the first one, all genes had a common mutation rate of 1 × 10^{−5}; in the second, genes had different mutation rates, uniformly distributed on the log scale between (1/5) × 10^{−5} = 2 × 10^{−6} and 5 × 10^{−5} (i.e., the largest possible ratio between the largest and smallest mutation rates was 25), so that the arithmetic mean of mutation rates was 1.5 × 10^{−5} and the geometric mean 1 × 10^{−5}. These mutation rates are within ranges previously used in the literature [38, 39, 43], with a bias towards larger values (since we use only 7 or 10 genes relevant for population growth and we could be modeling pathways, not individual genes). Initial population size and mutation rates are not of intrinsic interest here (our focus is not on the determinants of evolutionary predictability *per se*), but are used to generate variability in evolutionary predictability and to allow for deviations from the strong-selection-weak-mutation (SSWM) regime [25]; see section Characteristics of the simulated fitness landscapes and genotypes.
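The stated means follow from closed-form expressions for a log-uniform distribution on [*a*, *b*]: the arithmetic mean is (*b* − *a*)/ln(*b*/*a*) and the geometric mean is √(*ab*). A short check, with illustrative function and constant names, using Python's standard `random` module to draw per-gene rates:

```python
import math
import random

LOW, HIGH = 2e-6, 5e-5  # (1/5) x 1e-5 and 5 x 1e-5


def log_uniform_rates(n_genes, low=LOW, high=HIGH, seed=None):
    """Per-gene mutation rates drawn uniformly on the log scale between low and high."""
    rng = random.Random(seed)
    return [math.exp(rng.uniform(math.log(low), math.log(high))) for _ in range(n_genes)]


# Closed-form means of a log-uniform distribution on [LOW, HIGH]
arithmetic_mean = (HIGH - LOW) / math.log(HIGH / LOW)  # about 1.49e-05
geometric_mean = math.sqrt(LOW * HIGH)  # 1e-05
```

The arithmetic mean exceeds the geometric mean because the log-uniform distribution is right-skewed on the natural scale.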

Only in the representable fitness landscapes are simulations restricted to move uphill in the fitness landscape. In all three types of fitness landscapes, mutations can lead to either increases or decreases in fitness. In the representable and local maxima fitness landscapes, as explained above, mutation events that do not fulfill the restrictions in the order of accumulation of mutations lead to a birth rate (and, thus, fitness) of 0. Therefore, in the simulations in representable and local maxima fitness landscapes, no path from the “0000” genotype to a fitness maximum can ever go through a non-accessible genotype. This is by design, so that these fitness landscapes strictly adhere to the assumption of CPMs about restrictions in the accumulation of mutations. But in both RMF and local maxima fitness landscapes it is possible to move through a fitness valley (i.e., to make moves from ancestor to descendant that are not always monotonically increasing in fitness), phenomena that become more frequent as we deviate from the SSWM assumption ([25]; commented example in section 5 of S3 Text). (Note that this is possible in local maxima fitness landscapes, even though non-accessible genotypes can never be part of evolutionary paths, because, with no back mutations, an accessible genotype can be along an uphill path when coming from one ancestor but in a valley when coming from another ancestor; no such genotypes can exist in the representable fitness landscapes, as in these landscapes all accessible genotypes that differ by exactly one mutation are connected in the fitness graph.) In addition, in the RMF fitness landscapes we can move through fitness valleys of non-accessible genotypes, as non-accessible genotypes need not have a birth rate of 0 in the RMF (see section 5 of S3 Text).

### Detection regimes and obtaining data sets from the simulations

To obtain the genotypes that were analyzed by the CPMs, we first sampled the simulated evolutionary processes, obtaining one observation per evolutionary process, using three different **detection regimes** (Fig 2C and 2D); then, we obtained the genotype corresponding to each observation (Fig 2E), which led to matrices of 20000 genotypes for each detection regime (Fig 2F); finally, we split these matrices into non-overlapping subsets to be analyzed with the CPMs (Fig 2G).

The three detection regimes differ in the distribution of sizes of the sampled tumors (Fig 2D). Under the **large** detection regime a large fraction of the samples correspond to large tumors. In contrast, under the **small** detection regime a large fraction of the samples correspond to small tumors. Finally, under the **uniform** detection regime the distribution of sizes of the sampled tumors is approximately uniform. Thus, the large detection regime would emulate scenarios where cancer tends to be detected at late, advanced stages, and the small detection regime would emulate scenarios where cancer tends to be detected at early stages.

To implement these detection regimes, we drew random deviates from beta distributions with parameters B(1, 1), B(5, 3), and B(3, 5) (for uniform, large, and small, respectively), rescaled them to the range of the log-transformed distribution of observed tumor sizes (log of number of cells), and obtained the observation with population size closest to the target (see details in section 2 in S3 Text). (We used the log-scale of tumor size because in the model of [38] tumor population size increases logarithmically with number of driver mutations; thus, distributions of sampled tumors that are biased towards large sizes in the log scale will mimic sampling of late-stage tumors —tumors with a large number of drivers—, and distributions of sampled tumors that are biased towards small sizes in the log scale will mimic sampling of early-stage tumors, as intended).
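The detection step can be sketched for a single simulated process. The Beta parameters are those given in the text (B(5, 3) has mean 0.625 and B(3, 5) has mean 0.375, which is what biases the large and small regimes). The rest is an assumption for illustration: `sizes` is a hypothetical record of population sizes at successive sampling times of one trajectory, and the rescaling range is taken over that record; the exact procedure is described in section 2 of S3 Text.

```python
import math
import random

# Beta parameters for each detection regime, as stated in the text
DETECTION_BETA = {"uniform": (1, 1), "large": (5, 3), "small": (3, 5)}


def sample_observation(sizes, regime, rng):
    """Index of the recorded population size closest, on the log scale, to a
    Beta-distributed target rescaled to the range of the log sizes."""
    alpha, beta = DETECTION_BETA[regime]
    log_sizes = [math.log(s) for s in sizes]
    low, high = min(log_sizes), max(log_sizes)
    target = low + rng.betavariate(alpha, beta) * (high - low)
    return min(range(len(sizes)), key=lambda i: abs(log_sizes[i] - target))
```

Applied to a growing trajectory, the large regime tends to return later (larger) time points and the small regime earlier ones; the genotype of the most abundant clone at the returned time point would then be the observation.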

For each observation, the genotype returned was the genotype of the most abundant clone (Fig 2E). Finally, for each of the three **sample sizes** of 50, 200, and 4000, the set of 20000 genotypes (Fig 2F) was split into five non-overlapping data sets (Fig 2G). These are the data sets that were analyzed with the CPMs.

### Measures of performance and predictability

We have characterized evolutionary unpredictability using the diversity of **Lines of Descent (LODs)**. LODs were introduced by [30] and “(…) represent the lineages that arrive at the most populated genotype at the final time” (p. 572). In other words, in our simulations a LOD is a sequence of parent-child genotypes from the initial genotype to a local maximum: a LOD is the path that a tumor has taken until fixation. The final genotype in a LOD is a local fitness maximum, but there is no guarantee that any intermediate genotype in the LOD will have been the most common genotype at any time point (especially under deviations from SSWM such as clonal interference and stochastic tunneling [25, 30, 44]). As in [30], we can use the entropy of these paths to measure the indeterminism of the paths of evolution, or evolutionary unpredictability, and we define *S*_{p} = −∑*p*_{i} ln *p*_{i}, where *p*_{i} is the observed probability of each LOD (each path) computed from the 20000 simulations, and the sum is over all paths or LODs. Evolutionary unpredictability, as estimated by the CPMs, is analogously defined as *S*_{c} = −∑*q*_{j} ln *q*_{j}, where *q*_{j} is the probability of each path to the maximum according to the cancer progression model considered, and the sum is over all paths predicted by the CPMs. ([36] normalizes predictability by dividing by the maximum entropy, similar to dividing by the prior entropy in the “information gain” statistic in [5]; but the maximum entropy is a constant for each number of genes, i.e., ln(7!) or ln(10!) for our simulations, the entropy when all 7! or 10! orders of mutation accumulation are equally likely.)
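The entropies above can be computed directly from path frequencies. A minimal sketch, assuming each LOD is stored as a tuple of genotype strings and that the observed LODs are collected in a list with one entry per simulation (function names are illustrative). It also shows the maximum entropy ln(*K*!) reached when all *K*! orders of *K* genes are equally likely.

```python
import math
from collections import Counter


def path_entropy(probabilities):
    """Shannon entropy (natural log) of a distribution over paths: S_p or S_c."""
    return -sum(p * math.log(p) for p in probabilities if p > 0)


def lod_entropy(lods):
    """S_p from the observed LODs, one per simulation; each LOD is a tuple of genotypes."""
    counts = Counter(lods)
    n = len(lods)
    return path_entropy(c / n for c in counts.values())


# Maximum possible entropy: all K! orderings of K genes equally likely
max_entropy_7 = math.log(math.factorial(7))    # about 8.53
max_entropy_10 = math.log(math.factorial(10))  # about 15.10
```

For example, if three simulations follow “0000” → “1000” → “1100” and one follows “0000” → “0100” → “1100”, then *S*_{p} = −(0.75 ln 0.75 + 0.25 ln 0.25) ≈ 0.56.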

To measure how well CPMs predict tumor progression, we used three different statistics. To compare the overall similarity of the distribution of paths predicted by CPMs with the true observed one (i.e., the distribution of LODs) we used the Jensen-Shannon divergence (**JS**) [45, 46], scaled between 0 and 1 (equivalent to using the logarithm of base 2). JS is a symmetrized Kullback-Leibler divergence between two distributions and is defined even if the two distributions do not have the same support, i.e., even if *P*(*i*) ≠ 0 and *Q*(*i*) = 0 (or *Q*(*i*) ≠ 0 and *P*(*i*) = 0), as can often be the case for our data. A JS value of 0 means that the distributions are identical, and a value of 1 that they do not overlap. Therefore, predictions of CPMs are closer to the truth the smaller the value of JS. The sum of the probabilities of the paths in the LODs that are not among the paths allowed by the CPMs, *P*(¬*DAG*|*LOD*), is equivalent to **1-recall**. Larger values of 1-recall mean that the CPM is not capturing a large fraction of the evolutionary paths to the maximum (or maxima). The sum of the predicted probabilities of paths according to the CPMs that are not used by evolution (i.e., that are not LODs), *P*(¬*LOD*|*DAG*), is equivalent to **1-precision**. Larger values of 1-precision mean that the CPMs assign more probability to paths to the maximum that are not used by evolution. In S6 Text we also use as statistic the **probability of recovering the most common LOD**; we rarely refer to this statistic in the main paper since it follows a pattern very similar to recall (section 2 in S6 Text). Statistics 1-recall and 1-precision can overestimate performance: they could both have a value of 0 even when JS is very close to 1 (e.g., when the CPM and the LODs contain the same paths but assign them very different probabilities; see example in section 4 in S4 Text). Thus, the main overall performance measure will be JS.
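The three statistics can be sketched for the simple case in which LOD and CPM paths have equal length. Distributions are represented as dictionaries mapping a path to its probability; a path missing from a dictionary has probability 0. This is an illustrative implementation of the definitions above, not the code used in the study; `scipy.spatial.distance.jensenshannon` with `base=2` returns the square root of this divergence, not the divergence itself.

```python
import math


def js_divergence(p, q):
    """Jensen-Shannon divergence with base-2 logarithms, so 0 <= JS <= 1."""
    support = set(p) | set(q)
    m = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in support}

    def kl_to_m(a):
        # Kullback-Leibler divergence KL(a || m); zero-probability terms contribute 0
        return sum(v * math.log2(v / m[k]) for k, v in a.items() if v > 0)

    return 0.5 * kl_to_m(p) + 0.5 * kl_to_m(q)


def one_minus_recall(lod, cpm):
    """P(not DAG | LOD): probability of true LODs that the CPM does not allow."""
    return sum(v for k, v in lod.items() if cpm.get(k, 0.0) == 0.0)


def one_minus_precision(lod, cpm):
    """P(not LOD | DAG): predicted probability of paths never used by evolution."""
    return sum(v for k, v in cpm.items() if lod.get(k, 0.0) == 0.0)
```

With LOD probabilities {A: 0.999, B: 0.001} and CPM probabilities {A: 0.001, B: 0.999}, both 1-recall and 1-precision are 0 while JS is about 0.99, which is the situation in which those two statistics overestimate performance.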

#### Comparing paths from CPMs with LODs of different lengths.

When all paths from the CPM and the LOD have equal length (they end in a genotype with the same number of genes mutated, *K*), computing the above statistics is straightforward. But paths could differ in length. In fitness landscapes with local maxima, LODs can differ in length; some LODs could have a length (the number of mutations of the genotype that reached fixation), *K*_{i}, shorter than the length of the paths from the CPM, *K*_{C} (all paths from a CPM have the same number of mutations, since all arrive at the genotype with all *K*_{C} genes mutated). It is also possible that some or all *K*_{i} > *K*_{C}, i.e., that some or all LODs are longer than the paths from the CPM. This will happen if the CPM has been built from a data set that contains fewer genes than the number of genes in the landscape (e.g., because one or more genes were absent; see section 2 in S4 Text); if, in a representable fitness landscape, the sampled data set has fewer genes than the landscape, then all *K*_{i} > *K*_{C} (as *K*_{i} will be equal to either 7 or 10).

To compute JS, 1-recall, and 1-precision that will cover all those cases we used the following procedure (that reduces to the simpler procedure in the previous section when all *K*_{i} = *K*_{C}). Let *i* and *j* denote two paths, one from the LOD and the other from the CPM, with corresponding probabilities *p*_{i} and *q*_{j}; in contrast to the previous section, and to minimize notation, *i*, *j* (and *p*_{i}, *q*_{j}) could refer to a path from the LOD and a path from the CPM or, alternatively, a path from the CPM and a path from a LOD. Let *K*_{i}, *K*_{j} denote the length of paths *i* and *j*, respectively. At least one set of either *K*_{i}s or *K*_{j}s has all elements identical (e.g., if *j* refers to indices of the paths from the CPM, it is necessarily the case that *K*_{1} = *K*_{2} = … = *K*_{m} = *K*_{C}, with *m* the total number of different paths from the CPM).

Now if *K*_{i} > *K*_{j} and the path *i* up to *K*_{j} mutations (i.e., from the “0000” genotype to the genotype with *K*_{j} mutations) is identical to *j*, then path *j* is included in path *i*: all of *q*_{j} is accounted for by *i*. This also means that path *i* is partially included in (or accounted for by) path *j*, but a fraction of it, (*K*_{i} − *K*_{j})/*K*_{i}, is missing or unaccounted for. The above applies directly to calculations of 1-recall and 1-precision. For computing JS, there will be two entries in the vectors with the probability distributions that will be compared: *P* = [*p*_{i}*K*_{j}/*K*_{i}, *p*_{i}(*K*_{i} − *K*_{j})/*K*_{i}], *Q* = [*q*_{j}, 0]. This procedure can be applied to all elements *i*, *j*, summing all unmatched entries: ∑*p*_{i}(*K*_{i} − *K*_{j})/*K*_{i} is the total flow in the set of paths *i* that cannot be matched by the *j*s because they are shorter. To simplify computations, that unmatched term can also include ∑*p*_{u}, where *u* denotes those paths *i* that do not match any *j*. Conversely, all paths *i* with *K*_{i} > *K*_{j} such that the paths become indistinguishable up to *K*_{j} can be summed in a single entry, so that we obtain ∑*p*_{i}*K*_{j}/*K*_{i} and ∑*p*_{i}(*K*_{i} − *K*_{j})/*K*_{i}, with both sums over those paths *i*, for the matched and unmatched fractions, respectively. All computations have their corresponding counterparts for elements *i*, *j* when *K*_{i} < *K*_{j}. The above procedure is applied at all distinct *k*, the number of mutations of the final genotypes of the true LODs. The final JS (and 1-precision and 1-recall) is the weighted sum of each of those JS (and 1-precision and 1-recall), weighted by *w*_{k}, the frequency of all paths from the LOD that end at *k* mutations. This procedure results in a unique JS (remember that the *K* are all the same for at least one of the sets of paths) as well as unique 1-precision and 1-recall, and it reduces to the procedure described above when all *K*_{i} are equal and equal to all *K*_{j}. A commented example and further details are provided in the Supporting Information (section 5.1 in S4 Text).
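One possible reading of this matching procedure for JS is sketched below; it is not the authors' implementation, and the exact bookkeeping is in section 5.1 of S4 Text. The sketch assumes that: paths are tuples of genes in order of acquisition; a longer path *i* matching a shorter path *j* contributes *p*_{i}*K*_{j}/*K*_{i} to the matched entry and *p*_{i}(*K*_{i} − *K*_{j})/*K*_{i} to a single lumped unmatched entry (paired with 0 on the other side); LOD probabilities are renormalized within each group of LODs of length *k* before computing JS; and the group results are weighted by *w*_{k}. Lumping entries whose counterpart is 0 does not change JS, because each such entry contributes half its mass to the divergence regardless of how it is split.

```python
import math


def _js(a, b):
    """Base-2 Jensen-Shannon divergence between two aligned probability vectors."""
    m = [(x + y) / 2 for x, y in zip(a, b)]

    def kl(v):
        return sum(x * math.log2(x / mi) for x, mi in zip(v, m) if x > 0)

    return 0.5 * kl(a) + 0.5 * kl(b)


def _align(longer, shorter):
    """Aligned vectors for paths of length >= ks (longer) and of length ks (shorter)."""
    ks = len(next(iter(shorter)))
    long_vec, short_vec, unmatched = [], [], 0.0
    for j, q in shorter.items():
        matched = 0.0
        for i, p in longer.items():
            if i[:ks] == j:
                matched += p * ks / len(i)               # fraction K_j / K_i of p_i
                unmatched += p * (len(i) - ks) / len(i)  # fraction (K_i - K_j) / K_i
        long_vec.append(matched)
        short_vec.append(q)
    # Longer paths that match no shorter path at all (the p_u terms)
    unmatched += sum(p for i, p in longer.items() if i[:ks] not in shorter)
    long_vec.append(unmatched)
    short_vec.append(0.0)
    return long_vec, short_vec


def js_mixed_lengths(lod, cpm):
    """JS between LODs (possibly of several lengths) and CPM paths (one length)."""
    k_cpm = len(next(iter(cpm)))
    groups = {}
    for path, p in lod.items():
        groups.setdefault(len(path), {})[path] = p
    total = 0.0
    for k, group in groups.items():
        w_k = sum(group.values())  # frequency of LODs ending at k mutations
        group = {path: p / w_k for path, p in group.items()}
        if k >= k_cpm:
            a, b = _align(group, cpm)
        else:
            a, b = _align(cpm, group)
        total += w_k * _js(a, b)
    return total
```

For instance, a single LOD (1, 2, 3) compared with a single CPM path (1, 2) gives *P* = [2/3, 1/3] and *Q* = [1, 0], so JS is about 0.19 rather than 0, reflecting the unaccounted third mutation.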

#### Statistical modeling of performance.

We have used generalized linear mixed-effects models, with a beta model for the dependent variable [47–49], to model how JS, 1-recall, and 1-precision are affected by *S*_{p}, detection regime, sample size, number of genes, type of fitness landscape, and CPM. In all models, the response variable was the average from the five split replicates of each fitness landscape by sample size by detection regime combination, and fitness landscape id (not type) was a random effect. When the dependent variable had values exactly equal to 0 or 1, we used the transformation suggested in [49]. Models were fitted using sum-to-zero contrasts [50] and all regressors were used as discrete regressors, except *S*_{p}, which has been scaled (mean 0, variance 1) for easier interpretation; the coefficients of the main effect terms of the discrete regressors are the deviations from the average (see further details in section 6 in S4 Text). We have used the glmmTMB [51] and car [52] packages for R [53] for statistical model fitting and analysis.
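Two pieces of this modeling setup can be illustrated in a few lines. Beta regression needs responses strictly inside (0, 1); a widely used transformation for values of exactly 0 or 1 is (*y*(*n* − 1) + 0.5)/*n*, with *n* the number of observations, proposed by Smithson and Verkuilen. We assume here, for illustration only, that this is the transformation cited as [49]. The second function builds sum-to-zero coding of the kind produced by R's `contr.sum`, in which the last level is coded −1 in every column; with a two-level factor such as number of genes, this is why one coefficient is simply minus the other.

```python
def squeeze(y, n):
    """Move y in [0, 1] into the open interval (0, 1); n = number of observations.
    Assumed to correspond to the transformation cited in the text."""
    return (y * (n - 1) + 0.5) / n


def sum_to_zero_contrasts(levels):
    """Sum-to-zero coding (as in R's contr.sum): level r < k-1 gets an indicator
    column and the last level is coded -1 in every column, so effects sum to 0."""
    k = len(levels)
    return {
        level: [-1 if r == k - 1 else int(c == r) for c in range(k - 1)]
        for r, level in enumerate(levels)
    }
```

For example, `sum_to_zero_contrasts(["a", "b", "c"])` codes the three levels as [1, 0], [0, 1], and [−1, −1], so the effect of the last level is minus the sum of the other two.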

### Characteristics of the simulated fitness landscapes and genotypes

All the fitness landscapes used are shown in S1 Fig. We provide here a brief description of the main features of the three different fitness landscapes and the simulated data sets. The three types of fitness landscapes had comparable numbers of accessible genotypes but differed in the number of local fitness maxima and reciprocal sign epistasis, as shown in Figures A to C in S2 Fig (representable fitness landscapes had a single fitness maximum with no reciprocal sign epistasis, whereas RMF landscapes had the largest of both, and local maxima landscapes were intermediate).

Simulations resulted in varied amounts of clonal interference, as measured by the average frequency of the most common genotype (Figures D and E in S2 Fig); scenarios where clonal sweeps dominated (i.e., those characterized by the smallest clonal interference) corresponded to initial population sizes of 2000, with clonal interference being much larger at the other population sizes (Figure D in S2 Fig).

Simulations resulted in a wide range of numbers of paths to the maximum (number of distinct LODs: Figure F in S2 Fig). LOD diversities (*S*_{p}) ranged from 0.3 to 8.7 (Figure G in S2 Fig), with RMF models showing smaller *S*_{p}; RMF landscapes had the largest number and diversity of observed local fitness maxima (Figures H and I in S2 Fig), and *S*_{p} was strongly associated with the number of accessible genotypes (Figure J in S2 Fig). As designed, the number of mutations of the fitness maxima was 7 or 10 in the representable landscapes; the mean number of mutations was smaller in the fitness maxima of the local maxima and RMF landscapes (Figure K in S2 Fig).

The number of different sampled genotypes was comparable between detection regimes (Figure L in S2 Fig), but diversity differed (Figure M in S2 Fig), with the uniform detection regime showing generally larger sampled diversity. The mean and median number of mutations of sampled genotypes (Figures N and O in S2 Fig) differed between detection regimes, being largest in the large detection regime, and smallest in the small detection regime; the standard deviation and coefficient of variation in the number of mutations (Figures P and Q in S2 Fig) were largest in the uniform detection regime (thus, the uniform detection regime showed both the largest variation in number of mutations of genotypes and the largest diversity of genotypes). Sample characteristics and the difference in sample characteristics between detection regimes were affected by type of fitness landscape (Figures M and P in S2 Fig).

### Cancer data sets

We have used twenty-two cancer data sets (including six different cancer types: breast, glioblastoma, lung, ovarian, colorectal, and pancreatic cancer). All of these data sets, except for the breast cancer data sets BRCA_ba_s and BRCA_he_s (from [54]), have been used previously as input for some CPM algorithms in [10, 14–16, 19, 34, 35], with the original sources of the data being [55–65]. Details on sources, names, and how the data were obtained and processed are provided in S5 Text.

These data sets vary in sample size (27 to 594 samples), number of features (from 7 to over 100), data types (nonsynonymous somatic mutations, copy number aberrations, or both), levels of analysis (genes, modules and pathways, exclusivity groups), patterns of number of mutations per subject and frequency of mutations analyzed, procedures for driver selection, and restriction to patient subtypes. The data sets, therefore, are a large, representative ensemble of data sets to which researchers have previously applied, or might apply, CPMs.

We have run the CPM analyses three times per data set, limiting the number of features analyzed to the 7, 10, and 12 most common ones, so as to examine how our assessments depend on the number of features analyzed; the first two thresholds use the same number of features as the simulations. (For data sets with 7 or fewer features, there are no differences in the data sets used under the 7, 10, and 12 thresholds; likewise for data sets with 8 to 10 features with respect to thresholds 10 and 12).

We do not know the true paths of tumor progression, but we can use the bootstrap to assess the robustness or reliability of the inferences. To do so, we repeated the process above with 100 bootstrap samples (section 1.2 in S5 Text). We computed *JS*_{o,b}, the average JS between the distribution of paths to the maximum from the original data set and each of the bootstrapped samples. Large differences in the distribution of paths between the analyses with the bootstrap samples and the analysis with the original sample (i.e., large *JS*_{o,b}) would suggest that the inferences are unreliable and cannot be trusted (but small differences do not indicate that the inferred paths match the distribution of the true ones).
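The bootstrap assessment of reliability can be sketched as a loop over resamples of patients. In this sketch `infer_paths` is a hypothetical stand-in for fitting H-CBN and extracting the distribution of paths to the maximum (a dictionary mapping path to probability), and `divergence` would be a base-2 Jensen-Shannon divergence on such dictionaries; neither is an actual CPM API. Resampling is done over rows (patients) with replacement, keeping the original sample size.

```python
import random


def bootstrap_unreliability(rows, infer_paths, divergence, n_boot=100, seed=None):
    """JS_{o,b}: mean divergence between the path distribution inferred from the
    original data and those inferred from bootstrap resamples of its rows.

    rows: the samples (patients) of the data set.
    infer_paths: hypothetical callable standing in for a CPM fit; returns {path: prob}.
    divergence: e.g. a base-2 Jensen-Shannon divergence between such dictionaries.
    """
    rng = random.Random(seed)
    original = infer_paths(rows)
    total = 0.0
    for _ in range(n_boot):
        # Resample patients with replacement, keeping the original sample size
        resample = rng.choices(rows, k=len(rows))
        total += divergence(original, infer_paths(resample))
    return total / n_boot
```

As the text notes, a small value only indicates that the inference is stable under resampling; it says nothing about whether the inferred paths match the true ones.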

## Results

### Predicting paths of evolution with CPMs

The CPMs used (four, two with two variants, yielding a total of six different procedures for obtaining CPMs: H-CBN, MCCBN, OT, CAPRI_AIC, CAPRI_BIC, and CAPRESE) can be divided into three groups: models that return trees (OT and CAPRESE) and two families of models that return DAGs, CBN (H-CBN and MCCBN) and CAPRI (CAPRI_AIC and CAPRI_BIC). Comparing within groups with respect to JS, one member of each pair consistently outperformed the other (see Figure A in S6 Text). OT (using probability-weighted paths, see below) was significantly better than CAPRESE (paired *t*-test over all non-missing 56595 pairs of results: *t*_{56594} = −161.1, *P* < 0.0001), H-CBN was significantly better than MCCBN (*t*_{56593} = −42.6, *P* < 0.0001), and CAPRI_AIC was significantly better than CAPRI_BIC (*t*_{56594} = −41.9, *P* < 0.0001). In what follows, therefore, and for the sake of brevity, we will focus on OT, H-CBN, and CAPRI_AIC, since the overall performance of their alternatives is worse.
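The degrees of freedom reported for these tests follow from the paired design: the test is a one-sample *t*-test on the *n* within-pair differences, with *n* − 1 degrees of freedom (hence 56594 for 56595 pairs). A minimal sketch with the standard library; if differences are taken as first procedure minus second, a negative *t* means the first procedure has smaller, i.e. better, JS.

```python
import math
import statistics


def paired_t(x, y):
    """Paired t statistic and degrees of freedom for matched results x[k], y[k]."""
    d = [a - b for a, b in zip(x, y)]  # within-pair differences
    n = len(d)
    t = statistics.mean(d) / (statistics.stdev(d) / math.sqrt(n))
    return t, n - 1
```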

Fig 3 shows how the performance measures for OT, H-CBN, and CAPRI_AIC change with sample size for all combinations of type of landscape, detection regime, and number of genes (see Figure B in S6 Text for the probability of recovering the most common LOD). The measures of JS and 1-precision for OT and H-CBN (and MCCBN) use probability-weighted paths computed as explained in Measures of performance and predictability, because there was strong evidence for all three models that the probability-weighted paths led to better results (JS, paired *t*-test over all pairs: OT, *t*_{56594} = −195.8, *P* < 0.0001; H-CBN: *t*_{56594} = −222.3, *P* < 0.0001; MCCBN: *t*_{56593} = −149.0, *P* < 0.0001; 1-precision: OT: *t*_{56594} = −187.6, *P* < 0.0001; H-CBN: *t*_{56594} = −217.6, *P* < 0.0001; MCCBN: *t*_{56593} = −130.3, *P* < 0.0001). (See also Figures D to F in S6 Text). Overall, H-CBN was the model with the best performance (*P* < 0.0001 from all pairwise comparisons between the six procedures with Tukey’s contrasts and single-step multiple testing p-value adjustment [66] on linear mixed-effects models with landscape by split replicate as random effect). It must be noted, however, that all CPMs can show large variability in performance (Figure G in S6 Text).

(A) Jensen-Shannon divergence (JS); (B) 1-recall; (C) 1-precision. For all measures, smaller is better. For OT and H-CBN, JS (panel A) and 1-precision (panel C) use probability-weighted paths (see text). Each point represented is the average of 210 points (35 replicates of each one of the six combinations of 3 initial sizes by 2 mutation rate regimes; see Overview of the simulation study); we are thus marginalizing over mutation rate by initial simulation size combinations. Each one of the 210 points is, itself, the average of five runs on different partitions of the simulated data. See Figure A in S6 Text for results for all six procedures used (four CPMs, two of them with two variants: H-CBN, MCCBN, OT, CAPRI_AIC, CAPRI_BIC, and CAPRESE).

JS differed between types of landscape, numbers of genes, detection regimes, and sample sizes, but the magnitude and even direction of effects differed between combinations of those factors, as seen in Figs 3 and 4. Generalized linear mixed-effects models fitted to the complete data set and to the different combinations of CPM and type of landscape (see section 11 in S6 Text) also showed highly significant (*P* < 0.0001) two-, three-, and four-way interactions between most of the factors, in particular those involving type of landscape and detection regime. Type of landscape and detection regime also had very strong effects on the variability of the estimates, with relative variabilities that could reach 20% with small sample sizes (Figure G in S6 Text).

Coefficients are from models with sum-to-zero contrasts (see text and Section 6 in S4 Text). Within each panel, coefficients have been ordered from left to right according to decreasing absolute value of coefficient. The dotted horizontal gray line indicates 0 (i.e., no effect). Coefficients with a large positive value indicate factors that lead to a large decrease in performance (increase in JS). Only coefficients that correspond to a term with a P-value <0.05 in Type II Wald chi-square tests are shown. The coefficient that corresponds to Number of genes 7 is not shown, as with sum-to-zero contrasts it is minus the coefficient for 10 genes. “N_Genes”: number of genes; “S_Size”: sample size; “Detect”: detection regime; “Sp”: LOD diversity (*S*_{p}).

Under representable fitness landscapes, performance improved with increasing sample size and with the uniform detection regime. Performance was worse in fitness landscapes of 10 genes (Figs 3A and 4 top row); the decrease in performance with increasing number of genes is related to CPMs both missing evolutionary paths (Fig 3B) and allowing paths that are not used by evolution (Fig 3C). With CAPRI_AIC the effect of sample size was much weaker, and increases in sample size could lead to decreases in performance, especially under the uniform detection regime (highly significant, *P* < 0.0001, interactions of detection and sample size; section 11 in S6 Text). This is attributable to CAPRI_AIC excluding many paths taken during evolution (Fig 3B), which in turn was caused by CAPRI_AIC sometimes allowing only a few or even just one path to the maximum (Figure H in S6 Text); see also next section.

Under the RMF landscape overall performance was worse. Increasing sample size for OT and H-CBN led to minor decreases in performance (Figs 3 and 4 bottom row). CPMs failed to capture about 50% of the evolutionary paths (or fractions of paths) to the local maxima (Fig 3B) and included more than 75% of paths (or fractions of paths) that were never taken by evolution (Fig 3C). The behavior under local maxima was similar to that of representable fitness landscapes in terms of the direction of most effects, but effects were generally weaker, with the exception of evolutionary unpredictability.

Evolutionary unpredictability itself had a strong effect on performance. There were highly significant interactions (*P* < 0.0001) between evolutionary unpredictability (as measured with *S*_{p}), detection regime, and sample size, within representable and local maxima landscapes, as well as in the overall models (section 11 in S6 Text). In most scenarios, performance was worse with larger unpredictability (larger *S*_{p}) as seen by the positive slopes of JS on *S*_{p} (Fig 5). But under representable landscapes, in the large detection regime and for sample sizes 50 and 200, larger evolutionary unpredictability was associated with better performance. Under RMF fitness landscapes, large evolutionary unpredictability was associated with poorer performance over all sample sizes. Under local maxima, the effect of evolutionary unpredictability depended strongly on sample size and detection regime, with reversal of effects from sample size of 50 compared to 4000 under the large detection regime, similar to the ones in representable landscapes.

A beta regression was fitted to each subset of data. Each regression was fitted to 210 points, each of which is itself the average of five replicates, one for each of the five runs on different partitions of the simulated data.

### Inferring evolutionary unpredictability from CPMs

Fig 6A shows the ratio of inferred to true evolutionary unpredictability, *S*_{c}/*S*_{p}. Under representable fitness landscapes, for H-CBN this ratio remained close to 1 over all combinations of detection regime, number of genes, and sample size; the values were closest to one with sample size 4000 and under the uniform detection regime. This is in spite of large differences in the ratio of estimated number of paths to the maximum over true number of paths to the maximum (Fig 6B). This good performance is a consequence both of H-CBN (and OT) using probability-weighted paths and of changes in scale (diversities use logarithms). Patterns for CAPRI_AIC seemed dominated by the tendency of CAPRI_AIC to allow only very few paths as the sample size grows large (see also Figure H in S6 Text) and were also a consequence of CAPRI_AIC’s inability to produce probability-weighted paths. For all CPMs, type of landscape affected the quality of estimates: under local maxima and especially RMF landscapes, the number and diversity of paths tended to be overestimated, sometimes by large factors. In summary, and regardless of fitness landscape, the estimates of evolutionary unpredictability from H-CBN (*S*_{c}) could be used to obtain an upper bound of the true evolutionary unpredictability.

(A) Average of the ratio of diversity of paths to the maximum inferred by the CPMs (*S*_{c}) relative to the true LOD diversity (*S*_{p}), for all combinations of type of landscape by detection regime by number of genes by sample size. (B) Like (A), but for number of paths to the maximum from the CPMs relative to the observed number of distinct LODs. As in Fig 3, each point is the average of 210 points. (C) Slope of the regression of *S*_{c} on *S*_{p}; each point is thus a slope from a regression of 210 points, each of which is itself the average of 5 replicates (see Fig 5). (A) shows whether evolutionary unpredictability (*S*_{p}) tends to be over- or under-estimated by *S*_{c}; (C) shows how *S*_{c} changes with *S*_{p} —see section 13 in S6 Text for an example of positive ratios with negative slopes.

And how does the estimated evolutionary unpredictability change with the true evolutionary unpredictability? Fig 6C shows that the slopes of regressions of estimated unpredictability from CPMs (*S*_{c}) on true unpredictability (*S*_{p}) changed depending on fitness landscape, detection regime, and sample size, including slopes over and under 1, and even inversion of signs.

### Cancer data sets

We have used H-CBN (the best performing model in the simulations) on twenty-two cancer data sets to examine the estimated evolutionary unpredictability and to assess the reliability of the estimates. The results are shown in Fig 7 (see Figure A in S5 Text for ranges of bootstrap runs). Unreliability (*JS*_{o,b}; see section Cancer data sets) was large for most data sets, and very large for some of them. These results would be expected even if the true fitness landscapes were representable ones, as most of the data sets have small sample sizes (less than 1000), and we have seen that performance is poor (large JS) for that range of sample sizes (Fig 3A). For these data sets there was no relationship between *JS*_{o,b} and sample size (Fig 7A), and when the same data set was analyzed using pathways/modules and genes, performance was generally better using pathways or modules (Pan_pa vs. Pan_ge, Col_pa vs. Col_g, GBM_pa vs. GBM_ge, GBM_mo vs. GBM_CNA). Within data sets, and for all data sets, as the number of features analyzed increased, performance either decreased or stayed the same (i.e., for data sets with more than 7 features, unreliability at the 10-feature threshold was larger than or equal to unreliability at the 7-feature threshold; for data sets with more than 10 features, unreliability at the 12-feature threshold was larger than or equal to unreliability at the 10-feature threshold: Figure A in S5 Text).

(A) *JS*_{o,b} vs. sample size of the data sets. (B) *JS*_{o,b} vs. number of features analyzed for each data set. (C) *JS*_{o,b} vs. estimated evolutionary unpredictability; on the bottom x-axis, *S*_{c} is shown in terms of number of equiprobable paths; orange and salmon vertical lines indicate 20 and 100 equiprobable paths, respectively. All results shown are from analyses with up to 12 features. Values shown for *JS*_{o,b} are the average of the 100 bootstrap runs; values for unpredictability (*S*_{c} or equiprobable paths) are from the analysis of the original, non-bootstrapped data.

There were mild trends for smaller *JS*_{o,b} to be associated with fewer features and with smaller *S*_{c} (Fig 7B and 7C), with notable exceptions: the Pancreas Pathways (Pan_pa) data set had very small *JS*_{o,b} even with a moderate number of features, and the All Pathways (all_pa) data set had a relatively small *JS*_{o,b} even though it used 12 features and had a large *S*_{c}; the GBM CNA modules (GBM_mo) data set also showed moderate *JS*_{o,b} despite having nine features and a relatively large *S*_{c}. Conversely, some data sets with small *S*_{c} had extremely unreliable path predictions (e.g., BRCA_ba_s, Col_mss_co, Col_msi_co, GBM_ge).

Values of *S*_{c} were well within the ranges of *S*_{c} estimated by H-CBN for the simulated data (Figure K in S6 Text). As expected, *S*_{c} increased with the number of features analyzed (see also Figure D in S5 Text). Given the results from section Inferring evolutionary unpredictability from CPMs, where generally *S*_{p} < *S*_{c}, this suggests that the true evolutionary unpredictability (when analyzing up to 12 features) for 13 of the data sets should be less than that corresponding to about 100 equiprobable paths to the maximum, but only eight are below the much more manageable, and useful, 20 equiprobable paths (Fig 7D). The Pan_pa, GBM_coo, and BRCA_he_s data sets show striking patterns in Fig 7. Examination of the output from H-CBN revealed that, for Pan_pa, a single path had an estimated probability > 0.97 and, for GBM_coo, two paths to the maximum of about equal probability together accounted for > 0.95 of the probability. BRCA_he_s had only four features, but mutations in SRPRA and PIK3R1 were each present in only four individuals (different individuals for the two mutations). Repeated runs of H-CBN inferred different sets of restrictions; because there are few paths to the maximum, and some of them had large probabilities (> 0.5), this resulted in large differences in the JS statistic between runs.
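Why concentrated path distributions such as those of Pan_pa and GBM_coo yield small *S*_{c} can be seen with a short Python sketch. It assumes that *S* is the Shannon diversity of the path probabilities computed with natural logarithms, which matches the conversion of 3.2 and 6.0 into about 25 and 400 equiprobable paths given in the Discussion. The distributions are hypothetical, loosely mimicking the cases described above, and are not H-CBN output.

```python
from math import exp, log


def path_diversity(probs):
    """Shannon diversity (natural logarithm) of a distribution of paths."""
    return -sum(p * log(p) for p in probs if p > 0)


def equiprobable_paths(s):
    """Number of equally likely paths that would yield diversity s."""
    return exp(s)


# Hypothetical distributions of paths to the maximum.
one_dominant = [0.97, 0.01, 0.01, 0.01]  # a single path carries > 0.97
two_dominant = [0.48, 0.48, 0.02, 0.02]  # two paths together carry > 0.95
uniform_20 = [1 / 20] * 20               # 20 equally likely paths

for name, dist in [("one", one_dominant), ("two", two_dominant), ("uniform", uniform_20)]:
    s = path_diversity(dist)
    print(name, round(s, 2), round(equiprobable_paths(s), 1))
```

The first two distributions behave like roughly one and two equiprobable paths, respectively, even though four paths have nonzero probability.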

## Discussion

Can we predict the likely course of tumor progression using CPMs? We have examined the performance of six different procedures for obtaining CPMs (four CPMs, two of them with two variants: H-CBN and MCCBN, OT, CAPRI_AIC and CAPRI_BIC, and CAPRESE). H-CBN was the best performing CPM in our study. Under representable fitness landscapes (the scenarios that agree with CPMs’ assumptions), H-CBN returned estimates of the probability of paths of tumor evolution that were not far from the true distribution of paths of evolution when sample size was very large (Fig 3A). But we found that, even under representable fitness landscapes, performance with moderate (and more realistic) sample sizes was considerably worse and was affected by detection regime. The analysis of the twenty-two cancer data sets revealed that performance (as measured by *JS*_{o,b}, an indicator of unreliability of inferences) was poor or very poor for most data sets. Even data sets with few features and small estimated diversity of paths to the maximum, *S*_{c}, showed very unreliable predictions.

Which factors affect performance, and how? Under representable fitness landscapes, performance on simulated data was affected by the number of features, which sets the dimension of the fitness landscape: JS was worse with 10 than with 7 genes (Figs 3 and 4). Increasing sample size improved performance (Fig 3). Detection regime and evolutionary unpredictability, as measured by LOD diversity (*S*_{p}), affected all performance measures, both individually and jointly (Figs 3 and 5). Increased evolutionary unpredictability was detrimental to performance under most conditions (Fig 5). Detection regime was a key determinant of performance, as already found in previous work [31, 67]; performance was better under the uniform detection regime and, more importantly, detection regime modulated how the other factors (evolutionary unpredictability, sample size, and number of features) affected performance (Figs 3 to 5).

The analysis of the twenty-two cancer data sets also pointed to the number of features as a major determinant of performance. Across data sets, unreliability of inferences (*JS*_{o,b}) increased with the number of features (Fig 7). Within data sets, unreliability also increased as the number of features increased (Figure A in S5 Text; note that increasing the number of features analyzed also increases the number of features with low-frequency events). Interestingly, the driver-selected data sets (Col_mss, Col_msi, BRCA_he_s, BRCA_ba_s) did not perform much better than data sets with a simple frequency-based selection of features (e.g., Lu and Ov; or compare Ov with Ov_drv). Even data sets with a very careful, manually curated selection of drivers and “exclusivity groups”, and where variability due to subtypes has been minimized (Col_msi, Col_msi_co, Col_mss, Col_mss_co, ACML_co, BRCA_he_s, and BRCA_ba_s), showed very large *JS*_{o,b}. And BRCA_he_s, with only four features, showed much larger *JS*_{o,b} than GBM_coo and Pan_pa (with 3 and 7 features, respectively), due to the presence of two low-frequency alterations.

These results bring forth the problem of selecting the relevant features for analysis [10, 15, 68] and of whether sample size is large enough relative to the number and frequency of the features considered. We have previously shown that feature selection can have a very detrimental impact on the performance of CPMs [67]. Using pathways instead of genes in the analyses (e.g., [68, 69]) can alleviate some of the problems of feature selection. Coding data sets as pathways or modules generally reduced the presence of low-frequency alterations (Figures B and C in S5 Text). Pathways can also improve predictability, and bring the estimates of path distributions closer to the truth, because they are more similar to heritable phenotypes, which often have smoother phenotype-fitness maps and tend to show more repeatable evolution ([5]; see also [70], but also [71, 72]). Selecting the appropriate level of analysis is also relevant given non-genetic heterogeneity and phenotypic variance, which can speed up the evolutionary process and adaptation to novel selective pressures [73–75], including the evolutionary processes of cancer [76]. One of the consequences of phenotypic variance is the smoothing of the fitness landscape, leading to “lower peaks and shallower valleys” [77, p. 2311]. It is possible, thus, that in the presence of phenotypic variance, analysis closer to the phenotype could better fulfil the assumptions of CPMs. Gerstung et al. [10] found that analysis using pathways gave stronger evidence for order constraints than analysis using genes, and we also see in Fig 7 that both *S*_{c} and *JS*_{o,b} tend to decrease if we use pathways or modules (Pan_pa vs. Pan_ge, Col_pa vs. Col_g, GBM_pa vs. GBM_ge, GBM_mo vs. GBM_CNA). Using so-called “exclusivity groups” (*sensu* [15]) to identify “fitness equivalent alterations” is a similar, though not identical, procedure that in this paper showed only modest improvements in *JS*_{o,b} (Col_mss_co vs. Col_mss, Col_msi_co vs. Col_msi, ACML_co vs. ACML). This could be due to particularities of these data sets (e.g., a large number of features relative to the number of subjects) or to the intrinsic difficulty of identifying true fitness equivalent groups via “hard/soft exclusivities”. However, although analysis using pathways/modules/exclusivity groups might lead to more reliable results from the predictability point of view, the identification of paths at the gene level is still the ultimate goal for therapeutic interventions (see [78]). Regardless of the details of the procedure for collapsing and reducing features, our results suggest that further work on feature selection should treat reducing the variability of estimates of evolutionary paths as a key component.
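The reason pathway coding reduces low-frequency alterations can be shown with a minimal Python sketch. It assumes the simplest recoding, in which a pathway counts as altered in a sample if any of its genes is altered; the genes G1 to G4, pathways P1 and P2, and samples are hypothetical, and the actual pathway/module definitions of the data sets are described in S5 Text.

```python
from collections import Counter


def collapse_to_pathways(samples, gene_to_pathway):
    """Recode each sample's set of altered genes as a set of altered pathways:
    a pathway is altered if any of its genes is. Genes without a pathway
    assignment are dropped; several hits in one pathway count once."""
    return [{gene_to_pathway[g] for g in s if g in gene_to_pathway} for s in samples]


def feature_frequencies(samples):
    """Fraction of samples in which each feature is altered."""
    counts = Counter(f for s in samples for f in s)
    return {f: c / len(samples) for f, c in counts.items()}


# Hypothetical genes G1-G4 and pathways P1-P2 (illustrative only).
gene_to_pathway = {"G1": "P1", "G2": "P1", "G3": "P2", "G4": "P2"}
samples = [{"G1"}, {"G2"}, {"G3"}, {"G1", "G4"}]

genes = feature_frequencies(samples)
pathways = feature_frequencies(collapse_to_pathways(samples, gene_to_pathway))
print(min(genes.values()), min(pathways.values()))  # 0.25 0.5
```

The least frequent feature doubles in frequency after collapsing, at the cost of losing gene-level resolution, which is the trade-off noted above for therapeutic interventions.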

Hosseini et al. [36] reanalyzed the DAG-derived representable fitness landscapes in [31] and a subset of the DAG-derived non-representable ones (those where the fully mutated genotype has the largest fitness). They found good agreement between the distributions of paths to the maximum from H-CBN and the fitness landscape-based probability distribution of paths to the maximum computed assuming SSWM. Our results for H-CBN under the best conditions are not as optimistic. Two differences between the studies explain this discrepancy. First, [36] computes the fitness landscape-based probability of paths assuming an SSWM regime and restricts the analysis to fitness landscapes where the fully mutated genotype has the largest fitness, whereas our analyses directly examine the distribution of the paths to the maximum in each simulation (LODs), without restricting the evolutionary regime or the fitness landscapes; second, [36] uses H-CBN with the very large sample size of 20000 (the full data sets in [31]), whereas we use a more realistic range of sample sizes.
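To make the idea of a fitness landscape-based distribution of paths concrete, the Python sketch below enumerates path probabilities in a tiny landscape under one common SSWM approximation: mutations only accumulate, and from each genotype the population moves to a fitter single-mutant neighbour with probability proportional to its selection coefficient. This formulation, the function name `sswm_path_probs`, and the toy landscapes are our assumptions, not necessarily what [36] implemented. Unlike the analyses in [36], the sketch does not require the fully mutated genotype to be the global maximum, so paths can end at a local peak.

```python
def sswm_path_probs(fitness, genes):
    """Probabilities of mutational paths from the wild type under a simple
    strong-selection weak-mutation (SSWM) approximation.

    fitness: {frozenset of mutated genes: fitness}, covering all genotypes.
    From each genotype the population moves to a fitter single-mutant
    neighbour with probability proportional to s = w_new / w_old - 1, and
    stops at a genotype with no fitter neighbour (a local or global peak).
    """
    paths = {}

    def walk(genotype, path, prob):
        moves = []
        for g in genes:
            if g not in genotype:
                nxt = genotype | {g}
                s = fitness[nxt] / fitness[genotype] - 1.0
                if s > 0:
                    moves.append((g, nxt, s))
        if not moves:
            # No fitter neighbour: record the path ending at this peak.
            key = tuple(path)
            paths[key] = paths.get(key, 0.0) + prob
            return
        total = sum(s for _, _, s in moves)
        for g, nxt, s in moves:
            walk(nxt, path + [g], prob * s / total)

    walk(frozenset(), [], 1.0)
    return paths


# Hypothetical two-gene landscapes.
single_peak = {frozenset(): 1.0, frozenset("A"): 1.2,
               frozenset("B"): 1.1, frozenset("AB"): 1.5}
two_peaks = {frozenset(): 1.0, frozenset("A"): 1.2,
             frozenset("B"): 1.1, frozenset("AB"): 1.0}

for landscape in (single_peak, two_peaks):
    probs = sswm_path_probs(landscape, ["A", "B"])
    print({k: round(v, 3) for k, v in probs.items()})
# {('A', 'B'): 0.667, ('B', 'A'): 0.333}
# {('A',): 0.667, ('B',): 0.333}
```

In the second landscape every path stops after one mutation, which is the situation where a CPM that assumes progression to the fully mutated genotype would predict paths extending beyond the true end point.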

Even very good performance, though, needs to be interpreted with care. Very good performance simply tells us that the true and estimated probability distributions of the paths to the maximum agree closely. If the true evolutionary unpredictability is large, then for practical purposes our capacity to predict what will happen (in the sense of providing a small set of likely outcomes) is very limited. Diversities ranging from 3.2 to 6.0, equivalent to about 25 to 400 equiprobable paths, were common in the simulated data (Figure K in S6 Text) and are comparable to the ranges in most cancer data sets with 7 and 10 feature thresholds (Figure A in S5 Text). The inability to narrow down the likely paths to a small set of paths in these cases is not a limitation of the CPMs, but a problem inherent to the unpredictability of the evolutionary process in many scenarios, which could severely limit the usefulness of even perfect predictions.
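The conversion between diversity and equiprobable paths can be checked directly, assuming *S* is a Shannon diversity computed with natural logarithms, so that a distribution with diversity *S* is as diverse as $N_{\mathrm{eq}} = e^{S}$ equally likely paths. The last line gives the diversities corresponding to the 20 and 100 equiprobable path thresholds marked in Fig 7:

$$
\begin{aligned}
N_{\mathrm{eq}} &= e^{S}, \\
e^{3.2} &\approx 24.5, \qquad e^{6.0} \approx 403.4, \\
\ln 20 &\approx 3.00, \qquad \ln 100 \approx 4.61.
\end{aligned}
$$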

The discussion above has centered on representable fitness landscapes. As argued before, fitness landscapes with local fitness maxima are probably common in cancer. With local fitness maxima, achieving good recall involves the comparatively easy task of correctly predicting the first part of short paths to the maximum. But good recall was more than offset by low precision, and overall predictability was very poor. In fitness landscapes with local maxima, CPMs fit models with paths of tumor progression that extend beyond the true end point of the progression. In RMF fitness landscapes, in addition to local peaks, not even the set of accessible genotypes can be represented by DAGs of restrictions (see [31], and Fig 1). The violations of assumptions in RMF and local maxima fitness landscapes explain why sampling regime becomes less relevant and why increasing sample size has negligible (or even detrimental) effects in these fitness landscapes (Figs 3 and 4). Remarkably, regardless of type of fitness landscape (i.e., even under violation of assumptions), and for both tasks considered (prediction of paths and estimation of unpredictability), the performance of CPMs that can return probability-weighted paths (H-CBN, MCCBN, OT) was better when using probability-weighted paths; thus, further improvement in these CPMs, even under violations of assumptions, might be possible by recalibrating their output.

CPMs have been developed for cross-sectional data sets with a single observation per subject. If we instead had access to data sets composed of many subjects, each sampled many times over the course of tumor development, we could directly compute the probabilities of the paths of tumor progression without using CPMs (and, thus, without relying on CPMs’ assumptions about fitness landscapes). Note, though, that using CPMs on multiple data points from a single subject, such as multi-region or single-cell data, would not provide estimates of the probabilities of paths of tumor progression, since a single subject constitutes a single realization of the evolutionary process. Caravagna et al. [79] have recently proposed a method to infer repeated evolution across patients from data sets with moderate numbers of multiple observations per patient. Further development, evaluation, and comparison of these approaches with those using only cross-sectional data will become important as data sets with moderate numbers of multiple data points per individual become more common.
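If each subject were sampled often enough to reveal the full order in which alterations accumulated (a strong, hypothetical assumption), path probabilities could be estimated by simple counting, as in the Python sketch below. The data and the function name are invented for illustration. Each subject contributes exactly one path, which is why many subjects are needed and why many samples from a single subject add only one realization.

```python
from collections import Counter


def empirical_path_probs(subject_paths):
    """Relative frequency of each path across subjects; every subject
    contributes exactly one realization of the evolutionary process."""
    counts = Counter(tuple(p) for p in subject_paths)
    n = sum(counts.values())
    return {path: c / n for path, c in counts.items()}


# Hypothetical longitudinal data: for each subject, the order in which
# alterations in genes A, B, and C were observed to accumulate.
subjects = [
    ["A", "B", "C"],
    ["A", "B", "C"],
    ["B", "A", "C"],
    ["A", "C", "B"],
]
print(empirical_path_probs(subjects))
# {('A', 'B', 'C'): 0.5, ('B', 'A', 'C'): 0.25, ('A', 'C', 'B'): 0.25}
```

In practice sampling times are discrete, so the order of alterations acquired between two samples would remain ambiguous; the sketch ignores that complication.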

Finally, returning to our second question: even if achieving good performance in predicting the paths of tumor progression is unlikely, inferring evolutionary unpredictability could be an easier task. Can we use inferences of evolutionary unpredictability from CPMs as estimates of the true evolutionary unpredictability? Under representable fitness landscapes, H-CBN, the best performing model also for this task (Fig 6B), returned values of *S*_{c} very similar to *S*_{p}, the evolutionary unpredictability estimated from the diversity of paths, and this held across detection regimes and sample sizes. Hosseini et al. [36] also found that the estimates of predictability from H-CBN correlate well with the fitness landscape-based evolutionary predictability (estimated assuming SSWM in fitness landscapes where the fully mutated genotype has the largest fitness), with slopes of the regression of CPM-based on landscape-based predictability generally slightly below 1, similar to our findings (Fig 6C). These good results do not hold under the other two types of fitness landscapes that we analyzed: evolutionary unpredictability was overestimated, increasing sample size made the problem worse, and different evolutionary scenarios, sample sizes, and detection regimes showed different relationships between estimated and true unpredictability (Fig 6C). But our results indicate that we can use H-CBN to set upper bounds on the true *S*_{p}; obtaining tighter estimates is a goal for further research. And here our analysis of twenty-two cancer data sets suggests that the true evolutionary unpredictability of at least some cancer scenarios might be reasonably small, especially if *S*_{c} is overestimating the true unpredictability.

### Conclusion

The answer to the question “can we predict the likely course of tumor progression using CPMs?” is, unfortunately, at least for the models examined, “only with moderate success and only under representable fitness landscapes and with very large sample sizes; but even perfect predictions might be of little use if evolutionary unpredictability is large”. Estimating upper bounds to evolutionary unpredictability is a more modest, though more likely to succeed, use of CPMs. Promisingly, several cancer data sets showed low evolutionary unpredictability. There are three key difficulties for successful prediction: the sheer size of the problem even for moderate numbers of genes, the intrinsic evolutionary unpredictability in many scenarios, and the deviations from the assumptions of CPMs that are likely in most cancer data. Further methodological work to allow CPMs to deal with rugged, multi-peaked fitness landscapes could improve their usefulness for predicting tumor evolution. In addition to the caveat about using these models under scenarios where performance is very poor, this paper raises the general question of what we can really predict about likely paths of tumor progression from cross-sectional data, for instance to guide therapeutic interventions. At a minimum, measures such as *JS*_{o,b} and *S*_{c} with CPMs that return probability-weighted paths should probably become routine as ways of providing a sense of the reliability of predictions and for assessing whether the predictions could be of any practical use.

## Supporting information

### S1 Fig. Plots of simulated fitness landscapes and fitness graphs.

Plots of the 1260 fitness landscapes (and corresponding fitness graphs) used.

https://doi.org/10.1371/journal.pcbi.1007246.s001

(PDF)

### S2 Fig. Simulated fitness landscapes and fitness graphs: Characteristics, evolutionary unpredictability, clonal interference, and sampled genotypes.

https://doi.org/10.1371/journal.pcbi.1007246.s002

(PDF)

### S1 Text. Differences in fitness landscapes, simulations, methods, and objectives, with Diaz-Uriarte, 2018 [31].

https://doi.org/10.1371/journal.pcbi.1007246.s003

(PDF)

### S2 Text. Generating random fitness landscapes.

https://doi.org/10.1371/journal.pcbi.1007246.s004

(PDF)

### S3 Text. Evolutionary simulations.

Runs until fixation; detection regimes and sampling; other parameters of the simulations; number of genes used; LODs through non-accessible genotypes, LODs that go beyond a local maximum, and moving through fitness valleys.

https://doi.org/10.1371/journal.pcbi.1007246.s005

(PDF)

### S4 Text. CPMs: Software, probabilities of paths, statistics of performance, linear models.

https://doi.org/10.1371/journal.pcbi.1007246.s006

(PDF)

### S5 Text. Cancer data sets: Sources, characteristics, additional results.

https://doi.org/10.1371/journal.pcbi.1007246.s007

(PDF)

### S1 Dataset. Compressed file with data and code.

This is the first of a two-part zip file (made up of files S1_Dataset.zip and S2_Dataset.z01). See instructions in S7 Text (briefly: rename S2_Dataset.z01 to S1_Dataset.z01 and uncompress the split archive).

https://doi.org/10.1371/journal.pcbi.1007246.s010

(ZIP)

### S2 Dataset. Compressed file with data and code.

This is the second of a two-part zip file (made up of files S1_Dataset.zip and S2_Dataset.z01). See instructions in S7 Text (briefly: rename S2_Dataset.z01 to S1_Dataset.z01 and uncompress the split archive).

https://doi.org/10.1371/journal.pcbi.1007246.s011

(Z01)

## Acknowledgments

We thank N. Beerenwinkel, S. Posada-Céspedes, and G. Caravagna for discussions about progression models or software; S.-R. Hosseini for providing a preprint of his MSc thesis and for comments that helped us clarify our methods; and C. Lázaro-Perea for comments on the manuscript.

## References

- 1. McPherson AW, Chan FC, Shah SP. Observing Clonal Dynamics across Spatiotemporal Axes: A Prelude to Quantitative Fitness Models for Cancer. Cold Spring Harb Perspect Med. 2018;8(2):a029603. pmid:28630229
- 2. Greaves M. Evolutionary Determinants of Cancer. Cancer Discovery. 2015;5(8):806–820. pmid:26193902
- 3. Lipinski KA, Barber LJ, Davies MN, Ashenden M, Sottoriva A, Gerlinger M. Cancer Evolution and the Limits of Predictability in Precision Cancer Medicine. Trends in Cancer. 2016;2(1):49–63. pmid:26949746
- 4. Williams MJ, Werner B, Heide T, Curtis C, Barnes CP, Sottoriva A, et al. Quantification of Subclonal Selection in Cancer from Bulk Sequencing Data. Nature Genetics. 2018;50(6):895–903. pmid:29808029
- 5. Lässig M, Mustonen V, Walczak AM. Predicting Evolution. Nature Ecology & Evolution. 2017;1(3):s41559–017–0077–017.
- 6. Losos JB. Improbable Destinies: Fate, Chance, and the Future of Evolution. S.l.: Riverhead Books; 2018.
- 7. Toprak E, Veres A, Michel JB, Chait R, Hartl DL, Kishony R. Evolutionary Paths to Antibiotic Resistance under Dynamically Sustained Drug Selection. Nature Genetics. 2012;44(1):101–105.
- 8. Palmer AC, Kishony R. Understanding, Predicting and Manipulating the Genotypic Evolution of Antibiotic Resistance. Nature Reviews Genetics. 2013;14(4):243–248. pmid:23419278
- 9. Gerstung M, Baudis M, Moch H, Beerenwinkel N. Quantifying Cancer Progression with Conjunctive Bayesian Networks. Bioinformatics. 2009;25(21):2809–2815. pmid:19692554
- 10. Gerstung M, Eriksson N, Lin J, Vogelstein B, Beerenwinkel N. The Temporal Order of Genetic and Pathway Alterations in Tumorigenesis. PLoS ONE. 2011;6(11):e27136. pmid:22069497
- 11. Montazeri H, Kuipers J, Kouyos R, Böni J, Yerly S, Klimkait T, et al. Large-Scale Inference of Conjunctive Bayesian Networks. Bioinformatics. 2016;32(17):i727–i735. pmid:27587695
- 12. Szabo A, Boucher KM. Oncogenetic Trees. In: Tan WY, Hanin L, editors. Handbook of Cancer Models with Applications. World Scientific; 2008. p. 1–24. Available from: http://www.worldscibooks.com/lifesci/6677.html.
- 13. Desper R, Jiang F, Kallioniemi OP, Moch H, Papadimitriou CH, Schäffer AA. Inferring Tree Models for Oncogenesis from Comparative Genome Hybridization Data. J Comput Biol. 1999;6(1):37–51. pmid:10223663
- 14. Ramazzotti D, Caravagna G, Olde Loohuis L, Graudenzi A, Korsunsky I, Mauri G, et al. CAPRI: Efficient Inference of Cancer Progression Models from Cross-Sectional Data. Bioinformatics. 2015;31(18):3016–3026. pmid:25971740
- 15. Caravagna G, Graudenzi A, Ramazzotti D, Sanz-Pamplona R, Sano LD, Mauri G, et al. Algorithmic Methods to Infer the Evolutionary Trajectories in Cancer Progression. PNAS. 2016;113(28):E4025–E4034. pmid:27357673
- 16. Olde Loohuis L, Caravagna G, Graudenzi A, Ramazzotti D, Mauri G, Antoniotti M, et al. Inferring Tree Causal Models of Cancer Progression with Probability Raising. PLOS ONE. 2014;9(10):e108358. pmid:25299648
- 17. Beerenwinkel N, Schwarz RF, Gerstung M, Markowetz F. Cancer Evolution: Mathematical Models and Computational Inference. Systematic Biology. 2015;64(1):e1–e25. pmid:25293804
- 18. Beerenwinkel N, Greenman CD, Lagergren J. Computational Cancer Biology: An Evolutionary Perspective. PLoS Comput Biol. 2016;12(2):e1004717. pmid:26845763
- 19. Misra N, Szczurek E, Vingron M. Inferring the Paths of Somatic Evolution in Cancer. Bioinformatics (Oxford, England). 2014;30(17):2456–2463.
- 20. Tomasetti C, Marchionni L, Nowak MA, Parmigiani G, Vogelstein B. Only Three Driver Gene Mutations Are Required for the Development of Lung and Colorectal Cancers. PNAS. 2015;112(1):118–123. pmid:25535351
- 21. Beijersbergen RL, Wessels LFA, Bernards R. Synthetic Lethality in Cancer Therapeutics. Annual Review of Cancer Biology. 2017;1(1):141–161.
- 22. O’Neil NJ, Bailey ML, Hieter P. Synthetic Lethality and Cancer. Nat Rev Genet. 2017;18(10):613–623. pmid:28649135
- 23. Brouillet S, Annoni H, Ferretti L, Achaz G. MAGELLAN: A Tool to Explore Small Fitness Landscapes. bioRxiv. 2015; p. 031583.
- 24. Crona K, Greene D, Barlow M. The Peaks and Geometry of Fitness Landscapes. Journal of Theoretical Biology. 2013;317:1–10. pmid:23036916
- 25. de Visser JAGM, Krug J. Empirical Fitness Landscapes and the Predictability of Evolution. Nat Rev Genet. 2014;15(7):480–490. pmid:24913663
- 26. Franke J, Klözer A, de Visser JAGM, Krug J. Evolutionary Accessibility of Mutational Pathways. PLoS Comput Biol. 2011;7(8):e1002134. pmid:21876664
- 27. Blomen VA, Májek P, Jae LT, Bigenzahn JW, Nieuwenhuis J, Staring J, et al. Gene Essentiality and Synthetic Lethality in Haploid Human Cells. Science. 2015;350(6264):1092–1096. pmid:26472760
- 28. Chiotti KE, Kvitek DJ, Schmidt KH, Koniges G, Schwartz K, Donckels EA, et al. The Valley-of-Death: Reciprocal Sign Epistasis Constrains Adaptive Trajectories in a Constant, Nutrient Limiting Environment. Genomics. 2014;104(6, Part A):431–437. pmid:25449178
- 29. Bank C, Matuszewski S, Hietpas RT, Jensen JD. On the (Un)Predictability of a Large Intragenic Fitness Landscape. PNAS. 2016;113(49):14085–14090. pmid:27864516
- 30. Szendro IG, Franke J, de Visser JAGM, Krug J. Predictability of Evolution Depends Nonmonotonically on Population Size. PNAS. 2013;110(2):571–576. pmid:23267075
- 31. Diaz-Uriarte R. Cancer Progression Models and Fitness Landscapes: A Many-to-Many Relationship. Bioinformatics. 2018;34(5):836–844. pmid:29048486
- 32. Farahani HS, Lagergren J. Learning Oncogenetic Networks by Reducing to Mixed Integer Linear Programming. PloS ONE. 2013;8(6):e65773.
- 33. Sakoparnig T, Beerenwinkel N. Efficient Sampling for Bayesian Inference of Conjunctive Bayesian Networks. Bioinformatics (Oxford, England). 2012;28(18):2318–24.
- 34. Cheng YK, Beroukhim R, Levine RL, Mellinghoff IK, Holland EC, Michor F. A Mathematical Methodology for Determining the Temporal Order of Pathway Alterations Arising during Gliomagenesis. PLoS computational biology. 2012;8(1):e1002337. pmid:22241976
- 35. Attolini C, Cheng Y, Beroukhim R, Getz G, Abdel-Wahab O, Levine RL, et al. A Mathematical Framework to Determine the Temporal Sequence of Somatic Genetic Events in Cancer. Proceedings of the National Academy of Sciences. 2010;107(41):17604–17609.
- 36. Hosseini SR, Diaz-Uriarte R, Markowetz F, Beerenwinkel N. Estimating the Predictability of Cancer Evolution. Bioinformatics. 2019;35(14):i389–i397.
- 37. Neidhart J, Szendro IG, Krug J. Adaptation in Tunably Rugged Fitness Landscapes: The Rough Mount Fuji Model. Genetics. 2014;198(2):699–721. pmid:25123507
- 38. McFarland CD, Korolev KS, Kryukov GV, Sunyaev SR, Mirny LA. Impact of Deleterious Passenger Mutations on Cancer Progression. Proceedings of the National Academy of Sciences of the United States of America. 2013;110(8):2910–5. pmid:23388632
- 39. Bozic I, Antal T, Ohtsuki H, Carter H, Kim D, Chen S, et al. Accumulation of Driver and Passenger Mutations during Tumor Progression. Proceedings of the National Academy of Sciences of the United States of America. 2010;107:18545–18550. pmid:20876136
- 40. Diaz-Uriarte R. OncoSimulR: Genetic Simulation with Arbitrary Epistasis and Mutator Genes in Asexual Populations. Bioinformatics. 2017;33(12):1898–1899. pmid:28186227
- 41. Beerenwinkel N, Antal T, Dingli D, Traulsen A, Kinzler KW, Velculescu VE, et al. Genetic Progression and the Waiting Time to Cancer. PLoS computational biology. 2007;3(11):e225. pmid:17997597
- 42. Wodarz D, Komarova NL. Dynamics of Cancer: Mathematical Foundations of Oncology; 2014.
- 43. Nowak MA, Michor F, Komarova NL, Iwasa Y. Evolutionary Dynamics of Tumor Suppressor Gene Inactivation. PNAS. 2004;101(29):10635–10638. pmid:15252197
- 44. Sniegowski PD, Gerrish PJ. Beneficial Mutations and the Dynamics of Adaptation in Asexual Populations. Philosophical Transactions of the Royal Society of London B: Biological Sciences. 2010;365(1544):1255–1263. pmid:20308101
- 45. Crooks GE. On Measures of Entropy and Information; 2017. Available from: http://threeplusone.com/on_information.pdf.
- 46. Lin J. Divergence Measures Based on the Shannon Entropy. IEEE Transactions on Information theory. 1991;37(1):145–151.
- 47. Ferrari S, Cribari-Neto F. Beta Regression for Modelling Rates and Proportions. Journal of Applied Statistics. 2004;31(7):799–815.
- 48. Grün B, Kosmidis I, Zeileis A. Extended Beta Regression in *R*: Shaken, Stirred, Mixed, and Partitioned. Journal of Statistical Software. 2012;48(11).
- 49. Smithson M, Verkuilen J. A Better Lemon Squeezer? Maximum-Likelihood Regression with Beta-Distributed Dependent Variables. Psychological methods. 2006;11(1):54–71. pmid:16594767
- 50. McCullagh P, Nelder JA. Generalized Linear Models, 2nd Ed. London: Chapman and Hall/CRC; 1989.
- 51. Brooks ME, Kristensen K, van Benthem KJ, Magnusson A, Berg CW, Nielsen A, et al. glmmTMB Balances Speed and Flexibility Among Packages for Zero-Inflated Generalized Linear Mixed Modeling. The R Journal. 2017;9(2):378–400.
- 52. Fox J, Weisberg S. An R Companion to Applied Regression, 2nd Ed. Thousand Oaks, CA: Sage; 2011.
- 53. R Core Team. R: A Language and Environment for Statistical Computing. Vienna, Austria; 2018. Available from: https://www.R-project.org/.
- 54. Cancer Genome Atlas Research Network. Comprehensive Molecular Portraits of Human Breast Tumours. Nature. 2012;490(7418):61–70. pmid:23000897
- 55. Bamford S, Dawson E, Forbes S, Clements J, Pettett R, Dogan A, et al. The COSMIC (Catalogue of Somatic Mutations in Cancer) Database and Website. Br J Cancer. 2004;91(2):355–358. pmid:15188009
- 56. Cancer Genome Atlas Research Network. Comprehensive Genomic Characterization Defines Human Glioblastoma Genes and Core Pathways. Nature. 2008;455(7216):1061–1068. pmid:18772890
- 57. Jones S, Zhang X, Parsons DW, Lin JCH, Leary RJ, Angenendt P, et al. Core Signaling Pathways in Human Pancreatic Cancers Revealed by Global Genomic Analyses. Science (New York, NY). 2008;321(5897):1801–6.
- 58. Parsons DW, Jones S, Zhang X, Lin JCH, Leary RJ, Angenendt P, et al. An Integrated Genomic Analysis of Human Glioblastoma Multiforme. Science. 2008;321(5897):1807–1812. pmid:18772396
- 59. Wood LD, Parsons DW, Jones S, Lin J, Sjoblom T, Leary RJ, et al. The Genomic Landscapes of Human Breast and Colorectal Cancers. Science. 2007;318(5853):1108–1113. pmid:17932254
- 60. Brennan CW, Verhaak RGW, McKenna A, Campos B, Noushmehr H, Salama SR, et al. The Somatic Genomic Landscape of Glioblastoma. Cell. 2013;155(2):462–477. pmid:24120142
- 61. Ding L, Getz G, Wheeler DA, Mardis ER, McLellan MD, Cibulskis K, et al. Somatic Mutations Affect Key Pathways in Lung Adenocarcinoma. Nature. 2008;455(7216):1069–1075. pmid:18948947
- 62. Cancer Genome Atlas Research Network. Integrated Genomic Analyses of Ovarian Carcinoma. Nature. 2011;474(7353):609–615. pmid:21720365
- 63. Knutsen T, Gobu V, Knaus R, Padilla-Nash H, Augustus M, Strausberg RL, et al. The Interactive Online SKY/M-FISH & CGH Database and the Entrez Cancer Chromosomes Search Database: Linkage of Chromosomal Aberrations with the Genome Sequence. Genes, Chromosomes and Cancer. 2005;44(1):52–64. pmid:15934046
- 64. Piazza R, Valletta S, Winkelmann N, Redaelli S, Spinelli R, Pirola A, et al. Recurrent SETBP1 Mutations in Atypical Chronic Myeloid Leukemia. Nature Genetics. 2013;45(1):18–24. pmid:23222956
- 65. Cancer Genome Atlas Research Network. Comprehensive Molecular Characterization of Human Colon and Rectal Cancer. Nature. 2012;487(7407):330–337. pmid:22810696
- 66. Hothorn T, Bretz F, Westfall P. Simultaneous Inference in General Parametric Models. Biom J. 2008;50(3):346–363. pmid:18481363
- 67. Diaz-Uriarte R. Identifying Restrictions in the Order of Accumulation of Mutations during Tumor Progression: Effects of Passengers, Evolutionary Models, and Sampling. BMC Bioinformatics. 2015;16(41). pmid:25879190
- 68. Cristea S, Kuipers J, Beerenwinkel N. pathTiMEx: Joint Inference of Mutually Exclusive Cancer Pathways and Their Progression Dynamics. Journal of Computational Biology. 2016. pmid:27936934
- 69. Raphael BJ, Vandin F. Simultaneous Inference of Cancer Pathways and Tumor Progression from Cross-Sectional Mutation Data. Journal of Computational Biology. 2015;22(00):250–264.
- 70. Wang E, Zaman N, Mcgee S, Milanese JS, Masoudi-Nejad A, O’Connor-McCourt M. Predictive Genomics: A Cancer Hallmark Network Framework for Predicting Tumor Clinical Phenotypes Using Genome Sequencing Data. Seminars in Cancer Biology. 2015;30:4–12. pmid:24747696
- 71. Chebib J, Guillaume F. What Affects the Predictability of Evolutionary Constraints Using a G-matrix? The Relative Effects of Modular Pleiotropy and Mutational Correlation. Evolution. 2017;71(10):2298–2312. pmid:28755417
- 72. Sailer ZR, Harms MJ. Molecular Ensembles Make Evolution Unpredictable. PNAS. 2017;114(45):11938–11943. pmid:29078365
- 73. Bódi Z, Farkas Z, Nevozhay D, Kalapis D, Lázár V, Csörgő B, et al. Phenotypic Heterogeneity Promotes Adaptive Evolution. PLOS Biology. 2017;15(5):e2000644. pmid:28486496
- 74. Payne JL, Wagner A. The Causes of Evolvability and Their Evolution. Nat Rev Genet. 2019;20(1):24–38. pmid:30385867
- 75. Aguirre J, Catalán P, Manrubia S, Cuesta JA. On the Networked Architecture of Genotype Spaces and Its Critical Effects on Molecular Evolution. Open Biology. 2018;8(7):180069. pmid:29973397
- 76. Frank SA, Rosner MR. Nonheritable Cellular Variability Accelerates the Evolutionary Processes of Cancer. PLoS Biol. 2012;10(4):e1001296. pmid:22509130
- 77. Frank SA. Natural Selection. II. Developmental Variability and Evolutionary Rate. Journal of Evolutionary Biology. 2011;24(11):2310–2320. pmid:21939464
- 78. Ashworth A, Lord CJ, Reis-Filho JS. Genetic Interactions in Cancer Progression and Treatment. Cell. 2011;145(1):30–38. pmid:21458666
- 79. Caravagna G, Giarratano Y, Ramazzotti D, Tomlinson I, Graham TA, Sanguinetti G, et al. Detecting Repeated Cancer Evolution from Multi-Region Tumor Sequencing Data. Nature Methods. 2018;15(9):707. pmid:30171232