Fig 1.
Tumour evolution and clonal exclusivity.
During tumour evolution, different mutations may arise, leading to heterogeneous subclones with distinct genotypes. From single-cell sequencing we may reconstruct the mutational history of each tumour, encoding the ordering of mutations and their phylogenetic relationships. The clones are numbered in each tree and inherit the mutations of their ancestral clones. For example, in patient 1, clone 2 exhibits mutations in genes A, D and H (red, orange, purple), as displayed in the clone-genotype matrix on the right. Clonally exclusive mutations appear in different branches of the trees and exhibit mutually exclusive patterns in the clone-genotype matrices, as seen, for example, for the two rightmost mutations G and H (blue and purple) in both patients.
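The inheritance rule and the notion of clonal exclusivity described in this caption can be illustrated with a minimal Python sketch. The tree below is hypothetical and is not the tree drawn in the figure; it is chosen only so that clone 2 carries A, D and H as in the caption's example. The function names `clone_genotypes` and `clonally_exclusive` are illustrative and are not part of GeneAccord.

```python
def clone_genotypes(parent, new_mutations):
    """Return the full mutation set of every clone, including inherited mutations."""
    genotypes = {}

    def resolve(clone):
        # A clone's genotype is its parent's genotype plus its own new mutations.
        if clone not in genotypes:
            up = parent[clone]
            inherited = resolve(up) if up is not None else set()
            genotypes[clone] = inherited | set(new_mutations.get(clone, ()))
        return genotypes[clone]

    for clone in parent:
        resolve(clone)
    return genotypes


def clonally_exclusive(genotypes, gene_a, gene_b):
    """True if both genes are mutated in the tumour but never in the same clone."""
    has_a = any(gene_a in g for g in genotypes.values())
    has_b = any(gene_b in g for g in genotypes.values())
    together = any(gene_a in g and gene_b in g for g in genotypes.values())
    return has_a and has_b and not together


# Hypothetical tree (an assumption): clone 1 is the root, clones 2 and 3 branch from it.
parent = {1: None, 2: 1, 3: 1}
new_mutations = {1: ["A", "D"], 2: ["H"], 3: ["G"]}
genotypes = clone_genotypes(parent, new_mutations)

print(sorted(genotypes[2]))
print(clonally_exclusive(genotypes, "G", "H"))
print(clonally_exclusive(genotypes, "A", "H"))
```

In this toy tree, G and H sit on different branches and so never share a clone, whereas A is ancestral to H and the two co-occur in clone 2.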
Fig 2.
Calibration of the chi-squared approximation for the gene pair placement test.
For gene pairs simulated to occur in n patient samples, the chi-squared approximation for the gene pair placement test yields overly liberal, poorly calibrated p-values at smaller n. For larger n, the approximation becomes more appropriate.
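As a rough illustration of what calibration means here, a calibrated test under the null rejects at level α in about a fraction α of simulations, while a liberal test rejects more often. The sketch below uses artificial p-values on a grid, not the paper's simulations, and `empirical_rejection_rate` is a hypothetical helper name.

```python
def empirical_rejection_rate(p_values, alpha):
    """Fraction of p-values at or below the significance level alpha."""
    return sum(p <= alpha for p in p_values) / len(p_values)


# Artificial p-values on an evenly spaced grid (uniform by construction).
uniform_p = [(i + 0.5) / 1000 for i in range(1000)]
# Artificial liberal p-values: squaring pushes them toward zero.
liberal_p = [p ** 2 for p in uniform_p]

for alpha in (0.01, 0.05, 0.1):
    print(
        alpha,
        empirical_rejection_rate(uniform_p, alpha),
        empirical_rejection_rate(liberal_p, alpha),
    )
```

For the uniform grid the rejection rate tracks α, whereas the squared p-values reject far more often than the nominal level, which is the pattern a liberal test produces.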
Fig 3.
Calibration of the exact gene pair placement test.
For the exact test on the simulated data of Fig 2, we observe conservative p-values at lower significance levels and very pronounced discreteness effects at larger values.
Fig 4.
Calibration of the exact gene pair placement test with Monte Carlo smoothing.
Smoothing the exact gene pair placement test of Fig 3 by adding noise to the rates yields good calibration and the desired uniform distribution of p-values under the null for larger n.
Fig 5.
Power of the exact gene pair placement test.
Power of the exact placement test for different sample sizes as the effect size Δ, the change in the clonal exclusivity rate on the logit scale, increases.
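One way to read "change on the logit scale" is that Δ is added to the logit of a baseline rate, which keeps the shifted rate inside (0, 1). The sketch below assumes this reading. The direction of the shift and the helper name `shift_rate` are assumptions and are not taken from the GeneAccord implementation.

```python
import math


def logit(p):
    """Log-odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))


def expit(x):
    """Inverse of the logit: maps a real number back to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def shift_rate(rate, delta):
    """Shift a baseline rate by delta on the logit scale (assumed convention)."""
    return expit(logit(rate) + delta)


print(shift_rate(0.5, 0.0))
print(shift_rate(0.5, math.log(3)))
```

A shift of Δ = 0 leaves the rate unchanged, and Δ = log 3 multiplies the odds by 3, moving a baseline rate of 0.5 to 0.75.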
Fig 6.
Comparison to standard exclusivity tests.
Standard independence tests (top row: Fisher’s exact test, G-test and log odds ratio (OR) test, from left to right) are heavily miscalibrated for data generated from placing mutations on lineage trees under the null. GeneAccord’s gene pair placement test (bottom row: chi-squared approximation, exact test and with Monte Carlo smoothing, from left to right), apart from some enrichment with the chi-squared approximation, shows proper calibration.
Table 1.
GeneAccord combined results for the AML cohort [12].
Ranked list of the gene pairs tested with the GeneAccord exact combined test on the cohort of 123 AML patient samples. For each gene pair, the column nt is the total number of patients exhibiting both gene mutations, n the number of those patients whose trees are neither linear nor star-shaped, and ncx the number of times the genes are clonally exclusive within those n trees. The columns nl and ns contain the numbers of linear and star-shaped trees, respectively. Δ is the clonal exclusivity score indicating enrichment of clonal co-occurrence (positive) or clonal exclusivity (negative), with ±∞ corresponding to hitting the numerical optimisation bounds. LLR is the log-likelihood ratio statistic, p is the p-value and q the adjusted p-value after Benjamini-Hochberg correction. Only gene pairs with n > 3 are considered.
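The q column follows from the p column by the Benjamini-Hochberg step-up adjustment. A minimal sketch of the standard procedure is given below; the input p-values are made up for illustration and are not values from Table 1, and the function name is hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest to the smallest p-value so that each adjusted
    # value is the minimum of p * m / rank over this and all larger ranks.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q_values[i] = running_min
    return q_values


# Made-up p-values for four hypothetical gene pairs.
example_p = [0.01, 0.04, 0.03, 0.005]
example_q = benjamini_hochberg(example_p)
print(example_q)
```

The running minimum makes the adjusted values monotone in the ranking, so a gene pair with a smaller p-value never receives a larger q-value.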
Fig 7.
Significance of clonal co-occurrence or exclusivity of the three GeneAccord tests for the AML cohort.
For the AML cohort, we plot the significance of clonal co-occurrence or exclusivity by computing the log10 p-value and attaching the sign of the effect (positive indicating clonal co-occurrence, negative indicating clonal exclusivity) for the three GeneAccord tests: the x-axis depicts the placement test, the y-axis the occurrence test and the colouring the combined test (red indicating clonal co-occurrence, blue indicating clonal exclusivity). The labels of the gene pairs that are significant in the combined test after Benjamini-Hochberg correction (Table 1) are also coloured.
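The signed significance plotted on each axis can be sketched as follows. This assumes the plotted quantity is -log10(p) multiplied by the sign of Δ, which is one plausible reading of the caption and is not confirmed by it; `signed_significance` is a hypothetical name.

```python
import math


def signed_significance(p_value, delta):
    """-log10(p) carrying the sign of the effect delta (assumed convention)."""
    magnitude = -math.log10(p_value)
    if delta > 0:
        return magnitude   # clonal co-occurrence
    if delta < 0:
        return -magnitude  # clonal exclusivity
    return 0.0


print(signed_significance(0.001, 1.2))
print(signed_significance(0.01, -0.5))
print(signed_significance(0.01, float("-inf")))
```

Because only the sign of Δ is used, the sketch also handles effect sizes of ±∞ without special treatment.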