PICDGI: A framework for predicting cancer driver genes through dynamic gene-gene interaction modeling of single-cell data

doi:10.1371/journal.pcbi.1014143

Fig 1.

From environmental mutations to the emergence of cellular heterogeneity in cancer progression.

(A) Schematic representation of how environmental factors contribute to mutations that drive cancer development. Mutations in proto-oncogenes (Proto-OGs) and/or tumor suppressor genes (TSGs) impair their normal protective roles, leading to the emergence of cancerous cells. Mutations induced by factors such as UV radiation and smoking can activate OGs (upper pathway) or inactivate TSGs (lower pathway). These mutations disrupt normal cellular regulation, leading to uncontrolled cell proliferation and tumor formation, which in turn cause widespread changes in gene expression. (B) Overview of single-cell gene expression heterogeneity. scRNA-seq data are collected from cancer patients at different stages of progression, for example Early, Mid, and Late. The processed expression matrices are visualized using a nonlinear dimensionality reduction method to denoise the data, reduce complexity, and improve cluster interpretability for cell type identification. Clustering and annotation are used to reveal distinct cell populations, including immune cells, cancer cells, and other cell types. For each identified cluster (Cluster A, Cluster B, Cluster C), time-series gene expression vectors are derived from the three stages, representing dynamic changes in expression during cancer progression.

Fig 2.

PICDGI framework.

(A) Representation of gene-gene interaction effects (GIE) in cancer progression. Nodes denote genes and edges denote regulatory interactions, with statistical variability in interactions contributing to genetic heterogeneity. Five categories of genes are considered, with their interaction effects differing by type. (B) Illustration of GIE strength: oncogenes (OGs) and tumor suppressor genes (TSGs) are expected to exert stronger effects on network dynamics compared with other gene classes. (C) Computational formulation of PICDGI. The model links observed temporal gene expression data to hidden variables at two levels: (i) local hidden variables (e.g., gene-specific mutations and expression fluctuations) and (ii) global hidden variables capturing the overall GIE structure across the network. (D) Inference procedure. The effect of a gene on driving mutations in other genes is quantified through the highest density interval (HDI) of the posterior distribution over gene expression dynamics, integrating both temporal patterns and estimated gene–gene interactions.
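
The highest density interval mentioned in panel D can be illustrated with a minimal sketch, assuming the posterior is available as a list of samples. The function name `hdi` and the `mass` parameter are hypothetical, and the caption does not specify how PICDGI turns the interval into a gene-level score, so this shows only the interval computation itself.

```python
import math


def hdi(samples, mass=0.95):
    """Return the narrowest interval containing `mass` of the samples.

    Hypothetical helper: a sample-based highest density interval,
    appropriate for unimodal posteriors.
    """
    if not 0 < mass <= 1:
        raise ValueError("mass must be in (0, 1]")
    xs = sorted(samples)
    n = len(xs)
    if n == 0:
        raise ValueError("samples must not be empty")
    # Number of samples that the interval has to cover.
    k = max(1, math.ceil(mass * n))
    # Slide a window of k consecutive sorted samples and keep the narrowest.
    start = min(range(n - k + 1), key=lambda i: xs[i + k - 1] - xs[i])
    return xs[start], xs[start + k - 1]


posterior_samples = [0, 10, 11, 12, 13, 14, 30, 50, 70, 100]
interval = hdi(posterior_samples, mass=0.5)
```

Unlike an equal-tailed interval, this window is free to sit wherever the samples are densest, which matters for skewed posteriors.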

Fig 3.

Heatmap visualization of covariance structures across Hurst exponents.

Heatmaps of covariance matrices for the innovation (error-generating) process, illustrating the influence of the Hurst exponent H on long-range dependence over time. For the two lowest values of H, covariance is highly localized along the diagonal, with weak long-range dependence. At the intermediate value of H, the covariance matrix is more uniform, balancing local and global dependence. As H increases to the two highest values, covariance spreads further, indicating stronger long-range dependence. The optimal H is the value that minimizes the error between the estimated and observed covariance matrices, ensuring the best alignment with the observed covariance structure.
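
The selection of H described above can be sketched as follows. Two things here are assumptions, not details taken from the caption: the innovation process is modeled as fractional Gaussian noise, whose autocovariance at lag k is 0.5·σ²·(|k+1|^{2H} − 2|k|^{2H} + |k−1|^{2H}), and the error between covariance matrices is measured with the Frobenius norm. The function names are hypothetical.

```python
import math


def fgn_cov(n, hurst, sigma2=1.0):
    """Covariance matrix of n steps of fractional Gaussian noise (assumed model)."""
    def gamma(lag):
        k = abs(lag)
        p = 2.0 * hurst
        return 0.5 * sigma2 * ((k + 1) ** p - 2 * k ** p + abs(k - 1) ** p)
    return [[gamma(i - j) for j in range(n)] for i in range(n)]


def frobenius_error(a, b):
    """Frobenius norm of the difference between two equally sized matrices."""
    return math.sqrt(sum((x - y) ** 2
                         for row_a, row_b in zip(a, b)
                         for x, y in zip(row_a, row_b)))


def best_hurst(observed_cov, candidates):
    """Pick the candidate H whose model covariance is closest to the observed one."""
    n = len(observed_cov)
    return min(candidates,
               key=lambda h: frobenius_error(fgn_cov(n, h), observed_cov))


# Toy check: recover the H that generated a synthetic "observed" covariance.
observed = fgn_cov(6, 0.7)
chosen = best_hurst(observed, [0.1, 0.3, 0.5, 0.7, 0.9])
```

Under this assumed model, H = 0.5 gives a purely diagonal matrix (uncorrelated innovations), and off-diagonal covariance grows as H rises above 0.5.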

Fig 4.

Overview of single cells from the lung tissues of three patients.

(A) t-SNE plots showing profiles of single cells from each tissue origin for three patients. In the first row (patient 1), 42,996, 45,150, and 29,061 cells are shown, respectively. In the second row (patient 2), 3,871, 4,362, and 3,301 cells are shown, respectively. In the third row (patient 3), 3,381, 3,766, and 5,731 cells are shown, respectively. Plots are color-coded by major cell lineages and gene expression counts. (B) Fractions of cells originating from tumor versus non-malignant lung tissues across cell types. Tumor-origin cell fractions vary by cell type and LUAD stage across patients, with epithelial cells consistently exhibiting the highest tumor fractions, increasing with LUAD progression.
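
The per-cell-type tumor fractions in panel B amount to a simple count ratio. A minimal sketch, assuming each cell carries a cell-type label and a tissue-origin label; the label strings and the function name `tumor_fractions` are hypothetical, and the toy cells are not data from the study.

```python
from collections import Counter


def tumor_fractions(cells):
    """Fraction of tumor-origin cells per cell type.

    `cells` is an iterable of (cell_type, origin) pairs, where origin is
    "tumor" or "normal" (hypothetical labels).
    """
    totals = Counter()
    tumor = Counter()
    for cell_type, origin in cells:
        totals[cell_type] += 1
        if origin == "tumor":
            tumor[cell_type] += 1
    return {cell_type: tumor[cell_type] / totals[cell_type]
            for cell_type in totals}


toy_cells = ([("epithelial", "tumor")] * 3 + [("epithelial", "normal")]
             + [("T cell", "tumor")] + [("T cell", "normal")] * 3)
fractions = tumor_fractions(toy_cells)
```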

Table 1.

Pearson’s correlation coefficient (ρ) and coefficient of determination (R²).

Fig 5.

Predicted vs. observed gene expression levels in epithelial cells.

(A-C) Scatterplots illustrating the performance of the PICDGI framework in predicting epithelial cell gene expression across the Early, Mid, and Late stages of LUAD progression for Patients 1, 2, and 3, respectively. Each plot shows the relationship between true gene expression (TGE) and predicted gene expression (PGE), with Pearson’s correlation coefficient (ρ), coefficient of determination (R²), and the corresponding p-value computed using a two-sided t-test. (D) Summary of predictive accuracy across stages. Barplots display the mean Pearson correlation coefficient (ρ) ± standard error of the mean (SEM) for the comparison between TGE and PGE at each of the three time points (Early, Mid, Late) for each patient. These summary statistics complement the scatterplots by providing an aggregated view of model performance across genes. From top to bottom, the panels correspond to Patient 1, Patient 2, and Patient 3.
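
The three summary statistics named in this caption can be computed as in the sketch below. The caption does not say which definition of R² was used, so the form 1 − SS_res/SS_tot is an assumption (it differs from ρ² when predictions are biased or rescaled). Function names and toy values are hypothetical, and the t-test p-value is not computed here.

```python
import math
import statistics


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def r_squared(true, pred):
    """Coefficient of determination, assumed here to be 1 - SS_res / SS_tot."""
    mean_true = statistics.fmean(true)
    ss_res = sum((t - p) ** 2 for t, p in zip(true, pred))
    ss_tot = sum((t - mean_true) ** 2 for t in true)
    return 1.0 - ss_res / ss_tot


def sem(values):
    """Standard error of the mean, using the sample standard deviation."""
    return statistics.stdev(values) / math.sqrt(len(values))


tge = [1.0, 2.0, 3.0, 4.0]   # toy true expression values
pge = [2.0, 4.0, 6.0, 8.0]   # toy predictions: perfectly correlated but rescaled
rho = pearson(tge, pge)
r2 = r_squared(tge, pge)
```

In this toy case ρ is 1 while R² is strongly negative, which shows why the two numbers are reported side by side: correlation ignores scale and offset errors that R² penalizes.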

Fig 6.

Cancer driver genes with the highest driver coefficient.

(A) Barplot showing the driver coefficients (DrCoef) of epithelial cell genes derived from patient 1 gene expression data using the PICDGI framework. Data are presented as mean ± SEM (standard error of the mean). Black cross marks indicate genes previously identified as oncogenes (OGs) or tumor suppressor genes (TSGs). (B) Heatmap showing PICDGI-derived DrCoef values for the top 30 epithelial driver genes (selected based on panel A), recalculated independently within each annotated immune cell type from patient 1 single-cell data. DrCoef values in this panel are computed using cell-type-specific models, enabling assessment of the regulatory influence of epithelial-identified driver genes across immune compartments. (C) Boxplots comparing transcription factor (TF) expression and TF activity between normal epithelial and cancer cells for two representative TFs showing discordance between differential activity and differential expression. P-values for differential TF activity and expression were calculated using a t-test and a Wilcoxon rank-sum test, respectively. Boxplot elements indicate the median (horizontal line), interquartile range (box), and whiskers extending to 1.5 × the interquartile range.

Fig 7.

Comparison of the PICDGI framework with the existing Moran’s I test algorithm for inferring driver genes in immune cells.

The driver genes identified through Moran’s I test show a lower average expression level than the driver genes identified by the PICDGI computational framework. The genes are ranked from the highest to the lowest immune-suppressive role (1 to 10): (A) Single-cell atlas mapping the trajectory and time values of cell progression; (B) Mast cell; (C) Natural killer cell; (D) T cell; (E) B cell; (F) Dendritic cell.
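
For readers unfamiliar with the baseline, Moran’s I measures whether a gene’s expression is similar between neighboring cells. The sketch below is the textbook statistic, I = (n / W) · Σ_ij w_ij (x_i − x̄)(x_j − x̄) / Σ_i (x_i − x̄)², with W the sum of all weights. How the compared method builds the cell neighbor graph and tests significance is not stated in the caption, so the chain-shaped weight matrix here is purely illustrative and the function name is hypothetical.

```python
def morans_i(values, weights):
    """Moran's I for `values` given a square neighbor weight matrix `weights`."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    total_weight = sum(sum(row) for row in weights)
    # Weighted cross-products of deviations between neighboring cells.
    cross = sum(weights[i][j] * dev[i] * dev[j]
                for i in range(n) for j in range(n))
    variance_sum = sum(d * d for d in dev)
    return (n / total_weight) * (cross / variance_sum)


# Illustrative graph: four cells in a chain, each linked to its neighbors.
chain = [[1 if abs(i - j) == 1 else 0 for j in range(4)] for i in range(4)]

smooth = morans_i([1, 2, 3, 4], chain)        # expression varies gradually
alternating = morans_i([1, -1, 1, -1], chain)  # neighbors are anti-correlated
```

Values near +1 indicate expression that varies smoothly along the graph, values near −1 indicate neighboring cells with opposite expression, and values near 0 indicate no spatial structure.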
