Inferring ongoing cancer evolution from single tumour biopsies using synthetic supervised learning
Fig 2
(A) In a cohort of 2.8 million synthetic tumours, TumE outperformed all commonly used population-genetic [20,21] and cancer-evolution-specific [7,12] summary statistics at differentiating positive selection from neutral evolution, as measured by AUROC (two-sided Wilcoxon test). (B) For predicting the true frequency of selected subclones, TumE performs comparably to or better than MOBSTER [16], the current state-of-the-art mixture model that explicitly accounts for neutral dynamics in tumour populations. The panel shows the correlation between true and predicted subclone frequency in 80,000 synthetic tumours sequenced at 150x mean sequencing depth. (C) In an orthogonal dataset of 150 synthetic tumours [16] with either 0 or 1 detectable subclones, TumE estimated the number of subclones significantly faster (inference time per sample; two-sided Wilcoxon test) than the existing mixture-model-based methods sciClone [24] and MOBSTER [16]. In addition, only TumE and MOBSTER consistently identified the correct number of subclones, as both methods directly account for the neutral dynamics observed in tumour populations. (D) TumE estimates for a synthetic tumour sequenced at 120x mean sequencing depth and containing a subclone at 54% cellular fraction.
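
For readers unfamiliar with the AUROC metric reported in panel A, the following is a minimal sketch of how it can be computed for any scalar score (a summary statistic or a classifier output) against binary selection/neutral labels. It uses the standard rank-sum identity, AUROC = U / (n_pos * n_neg), with average ranks for ties. The function name `auroc` and the toy scores and labels are hypothetical illustrations; this is not the authors' evaluation code and does not reproduce any value in the figure.

```python
from typing import Sequence


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUROC via the rank-sum (Mann-Whitney U) identity.

    labels: 1 = positive selection, 0 = neutral (assumed encoding).
    Tied scores receive the average of the ranks they span.
    """
    n = len(scores)
    if n != len(labels):
        raise ValueError("scores and labels must have the same length")

    # Sort indices by score, then assign 1-based average ranks.
    order = sorted(range(n), key=lambda i: scores[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    n_pos = sum(1 for y in labels if y == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present")

    # U statistic for the positive class, normalised to [0, 1].
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)


# Toy data only: four hypothetical tumours, two neutral and two selected.
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_labels = [0, 0, 1, 1]
toy_auroc = auroc(toy_scores, toy_labels)
```

An AUROC of 0.5 corresponds to a score that cannot separate the two classes, and 1.0 to perfect separation, which is the scale on which the methods in panel A are compared.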