Extending differential gene expression testing to handle genome aneuploidy in cancer

doi:10.1371/journal.pcbi.1014134


Fig 2

DeConveil benchmarking on simulated gene expression data.

(A) Schematic overview of the simulation framework. Gene expression counts are generated for two biological conditions (e.g., healthy vs. tumor) with CN alterations present in one condition. Expression differences in each gene-dosage class (DSGs, DCGs, DIGs, non-DEGs) reflect changes in the expected mean expression µ_g, as defined by the generative model (see S1 Text), which jointly depends on biological condition and CN. Ground-truth DEGs are defined by the simulation and compared against detected DEGs. (B) Evaluation of DE detection performance under CN confounding. Precision, recall, F1-score, and Matthews correlation coefficient (MCC) are shown as a function of sample size per condition (10, 20, 40, and 60), comparing DeConveil (CN-aware) and PyDESeq2 (CN-naive). (C) Assessment of DeConveil’s accuracy in effect size estimation. Top: mean squared error (MSE) between estimated and true log₂FC. Bottom: Pearson correlation between estimated and true log₂FC. Results compare DeConveil and PyDESeq2. (D) Gene dosage classification performance of DeConveil. Precision, recall, F1-score, and MCC are shown for distinguishing DCGs from DSGs and DIGs, under weak and strong CN signal conditions, as a function of sample size.

doi: https://doi.org/10.1371/journal.pcbi.1014134.g002
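
The metrics named in panels B and C can be illustrated with a minimal sketch. It is not the authors' evaluation code: the function names `classification_metrics` and `effect_size_metrics`, the gene labels, and the toy values are hypothetical, and the sketch assumes that ground-truth and detected DEGs are both subsets of one shared list of tested genes.

```python
import math


def classification_metrics(truth, detected, all_genes):
    """Precision, recall, F1 and MCC for detected vs. ground-truth DEGs.

    Assumes `truth` and `detected` are subsets of `all_genes`.
    """
    truth, detected = set(truth), set(detected)
    tp = len(truth & detected)   # true DEGs that were detected
    fp = len(detected - truth)   # detected genes that are not true DEGs
    fn = len(truth - detected)   # true DEGs that were missed
    tn = len(set(all_genes)) - tp - fp - fn

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "mcc": mcc}


def effect_size_metrics(true_lfc, est_lfc):
    """Mean squared error and Pearson correlation between two log2FC lists."""
    n = len(true_lfc)
    mse = sum((e - t) ** 2 for t, e in zip(true_lfc, est_lfc)) / n
    mean_t = sum(true_lfc) / n
    mean_e = sum(est_lfc) / n
    cov = sum((t - mean_t) * (e - mean_e) for t, e in zip(true_lfc, est_lfc))
    var_t = sum((t - mean_t) ** 2 for t in true_lfc)
    var_e = sum((e - mean_e) ** 2 for e in est_lfc)
    # Pearson r is undefined when either input has zero variance.
    r = cov / math.sqrt(var_t * var_e) if var_t and var_e else float("nan")
    return {"mse": mse, "pearson_r": r}


# Toy example with made-up gene labels (not data from the paper).
genes = [f"g{i}" for i in range(1, 11)]
scores = classification_metrics(
    truth=["g1", "g2", "g3", "g4"],
    detected=["g1", "g2", "g3", "g5"],
    all_genes=genes,
)
effects = effect_size_metrics([1.0, 2.0, 3.0], [1.5, 2.5, 3.5])
```

In this sketch a constant offset between estimated and true log₂FC leaves the Pearson correlation at 1 while the MSE is nonzero, which shows why the two quantities in panel C carry different information.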