Supplementary Information for “Identiﬁcation of constrained cancer driver genes based on mutation timing”

The cancer-driving effect of mutations can depend on the mutation of other genes or more generally on the mutational, transcriptional, or environmental context. Epistatic interactions (among genes or between genes and other factors) introduce statistical dependencies and they can result in order constraints on the mutation accumulation during tumorigenesis. In order to study the dynamics of these order-constrained mutations and to motivate our mutation timing approach, we will consider Conjunctive Bayesian networks (CBNs), which model dependencies in binary mutation data. The continuous-time CBN (CT-CBN) model is defined by a partial order, ≺, on a set of n mutations and parameters λj , for all j ∈ [n] = {1, . . . , n}. The distribution of the waiting time Tj of mutation j ∈ [n] is defined recursively as Tj ∼ Exp(λj) + max `∈pa(j) T` (1)


CBN model
The cancer-driving effect of mutations can depend on the mutation of other genes or more generally on the mutational, transcriptional, or environmental context. Epistatic interactions (among genes or between genes and other factors) introduce statistical dependencies and they can result in order constraints on the mutation accumulation during tumorigenesis. In order to study the dynamics of these order-constrained mutations and to motivate our mutation timing approach, we will consider Conjunctive Bayesian networks (CBNs), which model dependencies in binary mutation data. The continuous-time CBN (CT-CBN) model is defined by a partial order, ≺, on a set of n mutations and parameters λ j , for all j ∈ [n] = {1, . . . , n}. The distribution of the waiting time T j of mutation j ∈ [n] is defined recursively as where Exp(λ j ) denotes the exponential distribution and pa(j) = pa ≺ (j) denotes the parents of j in the partially ordered set (poset) ([n], ≺). The parents of j are all mutations such that ≺ j is a cover relation, where ≺ j is a cover relation if no event j ∈ [n] exists with j = , j and ≺ j ≺ j, i.e., cover relations indicate direct predecessors. Eq. 1 states that mutation j can occur only after all its predecessor mutations have occurred. Thus, the order relations encoded in the mutation poset ([n], ≺) impose constraints on the occurrence of mutations (2). A mutational pathway is a linear extension of ([n], ≺), i.e., a sequence of mutations which is not violating any of the relations. The set of all mutational pathways of length k starting with a minimal element of the poset is denoted C k and a typical element is an ordered k-tuple (j 1 , . . . , j k ) of mutations j i ∈ [n] with j 1 minimal in the poset and j i ≺ j i+1 , for all i = 1, . . . k − 1. The set C k is in one-to-one correspondence with the set of all chains of length k + 1 in the distributive lattice G of order ideals of ([n], ≺) via the mapping (j 1 , . . . , j k ) → (∅, {j 1 }, {j 1 , j 2 }, . . . , {j 1 , . . . , j k }). We call G the genotype lattice. It consists of all valid mutation combinations (genotypes) and all possible evolutionary paths connecting them (1).
The probability of a mutational pathway C ∈ C k is given by the product of competing exponentials, Here, for any genotype g ∈ G, the exit set Exit(g) = Exit G (g) = {j ∈ [n] | j ∈ g and g ∪ {j} ∈ G} consists of all mutations that have not yet occurred in g, but could happen next. Two examples of mutation posets and their induced genotype lattices are displayed in Fig. S8 and S7. Fig. S8 shows an empty poset, where the n mutations are independent. Consequently, the genotype lattice is the n-dimensional hypercube; all mutational pathways are possible. Neutral passenger mutations are expected to accumulate in this manner. On the other hand, Fig. S7 shows a linear poset. This dependency structure is the strictest among all posets and requires mutations to occur in a single total order. Its genotype lattice consists only of this one mutational pathway. In the following, we will analyze the empty and the linear poset in more detail. They allow for deriving closed-form expressions of the probabilities of interest. The empty poset defines our null model of independence, whereas the linear poset is an example of deviation from the null (dependency structure).

The number of preceding mutations
We fix a mutation m ∈ [n]. Let C k (m) = {(j 1 , . . . , j k ) ∈ C k | j k = m} be the set of mutational pathways ending with mutation m in position k. We are interested in the number of mutations that occur before mutation m. This number is a random variable, because, in general, some mutations will be ≺-predecessors of m but some will be (conditionally) independent of m, such that their time of occurrence relative to m is stochastic. This number of preceding mutations of m is where I is the indicator function. Mutation m has exactly k − 1 predecessors in a specific realization of the oncogenetic process if the realization is a mutational pathway C that accumulates m as the k-th mutation. Hence, the distribution of A m is given by where C n (k, m) is the set of maximal pathways (j 1 , . . . , j n ) of length n with j k = m. These pathways are defined as all mutation orderings which are compatible with the poset. In the case of the empty poset, every possible order is compatible and the set of all possible orders can be enumerated by permuting the k elements. To compute this probability, it is enough to consider pathways of length k, because C∈Cn(k,m) For the empty poset ( The cumulative probability of the number of preceding mutations is For the empty poset with identical waiting time distributions, Pr(A m ≤ k − 1) = k m−1 1/n = k/n, which is linear in k. For the linear poset, Pr(A ji ≤ k−1) = 1 if i ≤ k, and 0 otherwise, such that, for fixed m = j i , this probability is a step function in k. This deviating behavior (linear versus step function) is the basis for our mutation timing ranking method. However, since the data is such that the mutation accumulation process is latent and only cross-sectional observations are available, we need to explicitly model the observation (or stopping, or sampling) process and calculate mutation observation probabilities.

The number of observed preceding mutations
The mutation accumulation process we have introduced in the above sections only provides a distribution over the order of mutations and their waiting times. However, in real data these orders and waiting times are hidden and we can only observe the mutations statuses at one specific time point, the diagnosis time point.
For the observation process, we model the observation time, or sampling time, as T s ∼ Exp(λ s ) and consider the extended poset [n] ∪ {s} with no relation between the additional sampling event s and any of the mutations 1, . . . , n. The probability of observing the genotype g ⊂ [n] is the probability of observing all mutation in g and none of its complement [n] \ g, before s occurred, Let X m be the binary random variable indicating observation of mutation m ∈ [n], and denote by τ := n i=1 X i the total number of observed mutations. We are interested in Pr(X m | τ ≤ k), the cumulative observation probability of mutation m when a total of k or less mutations have occurred. For the joint probability of X m and τ , we have and Finally, we are interested in the probability that among n mutations occurring according to a given CBN model, mutation m has occurred given that a total of k − 1 or less mutations are observed. This conditional probability will be used for gene ranking.

Empty poset
Let E k (s, m) be the set of all pathways of length k ending in s and containing m, and E k (s, \m) the set of all pathways of length k ending in s and not containing m. For the empty poset (Fig. S8), we first compute the joint probability of X m = 0 and τ = k − 1, If we again assume identical evolutionary rates, then all pathways of the same length have the same probability. We calculate the cardinalities of the following sets: For the probability of a single pathway C, we find In order to obtain the joint probability of X m and τ for identical evolutionary rates, we multiply the probability of a single pathway (Eq. 15) with the cardinality of the set E k (s, m) (Eq. 14), The marginal probability of τ is and hence we find the conditional probability The joint probability of X m = 1 and τ ≤ k − 1 as well as the cumulative marginal probability of τ can be derived by marginalizing over k. We obtain and By multiplying the above joint and marginal probabilities, we finally obtain the conditional probability P m (k) that is used for gene ranking, For an empty poset, P m (k) is the probability that among n independent mutations a specific mutation has occured when k − 1 or less mutations are observed in total. It is a linear function in k.

Linear poset
For ease of notation we set λ jn+1 = 0. For the linear poset (Fig. S7), we find if m ∈ {j 1 , . . . , j k−1 }, and zero otherwise. For identical evolutionary rates, λ 1 = . . . = λ n = λ s , this probability becomes 2 −k . The marginal probability of observing k − 1 mutations is For the conditional probability, we find the step function if m ∈ {j 1 , . . . , j k−1 }, and zero otherwise. Let p be the position of m in the poset, i.e., m = j p . The cumulative marginal probability of τ is and for the conditional probability, we find for p < k, and zero otherwise. Assuming identical evolutionary rates, this probability simplifies to Similarly, and for identical evoluionary rates, for p < k, and one otherwise. Hence, for linear posets, the conditional probability of interest, P m (k), that is used for ranking genes is This probability, as a function of k, is zero until k = p, then rises sharply, and then levels off (Fig. S9). In the linear poset case, P m (k) is the probability of observing the p-th mutation in a linear chain when a total of k − 1 or less mutations are observed.

The steepest slope of P m (k)
For gene ranking, we consider the conditional probability P m (k) = Pr(X m = 1 | τ ≤ k − 1) (Eq. 11) for different posets. We continue to assume identical rates of evolution for all mutations in this section, λ = λ 1 = · · · = λ n . We show that after rescaling, for any poset with a relation involving m, P m (k) has a higher maximal slope than for the empty poset, which has constant slope (Eq. 21). In other words, mutational dependencies on or of m can be detected by considering the steepest slope of P m (k). In order to compare highest slopes among posets, we first rescale P m (k) to be 1 at k = n, and define the mutation accumulation function such that its largest value, i.e., the overall marginal frequency of mutation m, is Q m (n) = 1. For n independent mutations (empty poset), P ind m (k) = (k − 1)/(2n) is linear in k, with constant slope 1/(2n) (Eq. 21) and marginal mutation frequency P ind m (n) = (n − 1)/(2n) for any mutation m ∈ [n]. Hence, has constant slope 1/(n − 1). We first compare this slope to the steepest slope of Q for a linear poset. If the mutations are totally ordered, and mutation m = j p ∈ [n] occurs at position p of the linear chain, then P lin m (k) is zero for k < p and for k ≥ p, it increases in a non-linear fashion as (2 −p − 2 −k )/(1 − 2 −k ) (Eq. 30). The marginal frequency of mutation m is (2 −p − 2 −n )/(1 − 2 −n ). Hence, if k ≥ p, and zero otherwise. The highest slope of Q lin m is at k = p, namely When comparing the highest slopes between the empty and the linear poset, we find that for all m, if and only if 1 This inequality is fulfilled for all 1 ≤ p ≤ n. Thus, the mutation accumulation function Q increases much steeper for totally ordered mutations than for independent mutations. Next, we relax the condition of total mutation order and generalize this observation to the situation where a mutation m has only at least one upstream or downstream dependency. For any given poset with this property, we denote by P dep m (k) and Q dep m (k) the corresponding conditional and scaled conditional probability, respectively.
that is, such that its scaled mutation accumulation function has steepest slope larger than in the case of independent mutations occurring at the same evolutionary rate.
Proof. We proceed by considering the two alternative (but non-exclusive) assumptions separately. In the first case, m is dependent on at least one event (i.e., has an upstream dependencies). The mutation accumulation function Q dep m (k) is monotonically increasing in k, because mutations are not reversible and therefore the (scaled) conditional probability can only stay constant or increase with k. Furthermore, Q dep m (k) is 0 at least until k = 2. This is because m can not occur before all its upstream dependent events have occurred. For the empty poset, Q ind m (k) can be written as (k − 1) · h where h ind = 1/(n − 1) is the constant step size for the increase of the accumulation function (Eq. 32). Furthermore, Q dep m (k) can be written as k t=1 h t where h t is the increase of the accumulation function at time t. We know that h 1 = 0 and h t ≥ 0 for all t. Hence, there is an index ∈ {2, . . . , n} such that h > h ind , i.e., the largest increase of Q dep m (k) is higher than that of Q ind m (k). Let us assume now that mutation m has only downstream dependencies. We first analyze the situation where m has n − 2 direct downstream dependencies and no further dependencies. If P dep m (k) was a linear function in k, then it would hold that P dep . Since m has only direct downstream dependencies, we can explicitly calculate the three conditional probabilities. First, because if only one event is observed, then it could only have been the diagnosis event. Next, where the numerator is the probability that mutation m occurred right before the diagnosis event, and the denominator is the probability that two or less events happened. Similarly, follows the same structure as P dep m (2). With these expression, the above relation yields, after simplifying, n = n + 1, a contradiction. Thus, P dep m (k) is not linear and there must be a point between k = 1 and k = n where the slope of Q dep m (k) is larger then the one of Q ind m . Finally, let us assume that mutation m has downstream dependencies that have further downstream dependencies. Following the case above, we know that Q dep m (2) = (1/n)/P m (n). This is because, as compared to the situation above, only additional downstream nodes are added and this does not influence P dep m (2). For the empty poset, we have Q ind m (2) = 1/(n−1), such that Q dep m (2) > Q ind m (2) if and only if (n−1)/n > P dep m (n), which is true for all n > 2, because P dep m (n) is bounded from above by P ind m (n) = (n − 1)/(2n) < 1/2. Thus, at k = 2, we find a steeper slope than for the empty poset. This concludes the proof.

Approximating with sigmoid functions and ranking by slope
In practice, we want a robust estimate of the steepest slope of Q m (k). For this purpose, we approximate Q m (k) by a sigmoid function f m (k), and the steepest slope (Eq. 34) is an upper bound for the expected slope of the sigmoidal curve. To obtain a better estimate, we consider the average slope of Q m (k) after the initial increase. Specifically, we analyze at which point this probability crosses the 90% threshold. For large n, Q m (n) ≈ 2 −p . Thus, we consider This equation has the solution k = log 2 (10(−0.9 + 2 p )) (38) which can be approximated by k = p + 3 (Fig. S10), i.e., after k > p + 3 the conditional probability P linear m (k) is larger than 90% of its asymptote. If we linearly interpolate the rise from 0 at k = p to the 90% of the asymptote at k = p + 3, then we find an average slope of 0.9/3 = 0.3 for the linear poset. This compares to the (scaled) exact value of log(2)/(1 − 2 −p ), or approximately log(2) ≈ 0.69 for large p.
For the empty poset, 90% of its increase will be achieved after k passed 90% of n + 1 which is the maximum value for k. Thus, the (scaled) average slope is 0.9/n. Comparing the independent and linear cases, we find a steeper average slope in the linear case, if and only if 0.3 > 0.9/n or n > 3.   Figure S9: Conditional mutation probabilities for the linear and the empty poset. The curves depict the conditional probability P m (k) = Pr(X m = 1 | τ ≤ k − 1) of gene m being mutated given that less than k mutations have occurred. For the linear poset, mutation m = j p at position p = 2 (light blue) and p = 5 (dark blue) of the linear chain are shown. The asymptote of P m (k), i.e., the overall marginal frequency of mutation m, which is 2 −p for the linear poset, is indicated to the right. For independent mutations, P m (k) is shown in orange. q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q  Figure S10: Approximate slope for the linear poset. Shown is the number of additional mutations (to p) needed to reach 90% of the cumulative limit probability for a linear poset. Circles denote k = log 2 (10(−0.9 + 2 p )) and the red line is k = p + 3.