Integrative GWAS and RNA-Seq analysis for target identification and virtual drug screening in colorectal cancer

Qinghui Liu; Yiyang Lei; Zixuan Liu; Jiale Han

doi:10.1371/journal.pone.0333179

Abstract

Background

Colorectal cancer (CRC) is a leading cause of global cancer-related mortality, necessitating the identification of novel therapeutic targets. Integrating genetic and transcriptomic data may reveal key molecular drivers of CRC progression and treatment opportunities.

Methods

We performed a multiomics analysis combining genome-wide association study (GWAS) data (p < 1e-6) and RNA-seq data from TCGA. Differential expression analysis (Limma) identified 24 consistently dysregulated genes (17 mRNAs, 7 lncRNAs) in CRC. Survival analysis was used to evaluate their prognostic impact on overall survival (OS), relapse-free survival (RFS), and post-progression survival (PPS). Drug‒gene interactions were explored via Enrichr, and virtual screening of PubChem compounds prioritized high-affinity candidates that target PYGL, a metabolic regulator.

Results

Integration of GWAS and RNA-seq revealed that 24 CRC-associated genes, including PYGL, SMAD7, and TCF7L2, are involved in tumor metabolism and Wnt/TCF signaling. Survival analysis revealed that five genes (CDKN2B, BOC, METRNL, etc.) were significantly correlated with OS, RFS, and PPS. Ten small-molecule candidates targeting PYGL exhibited high binding affinity, suggesting their therapeutic potential.

Conclusion

This study identified CRC-linked genes through GWASs and transcriptomics, highlighting their prognostic and druggable relevance. Computational drug repurposing pinpoints PYGL inhibitors as promising candidates, offering a translational framework for CRC therapy development.

Citation: Liu Q, Lei Y, Liu Z, Han J (2025) Integrative GWAS and RNA-Seq analysis for target identification and virtual drug screening in colorectal cancer. PLoS One 20(10): e0333179. https://doi.org/10.1371/journal.pone.0333179

Editor: Zhengrui Li, Shanghai Jiao Tong University, CHINA

Received: April 11, 2025; Accepted: September 10, 2025; Published: October 27, 2025

Copyright: © 2025 Liu et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: The gene expression data utilized in this study were obtained from the Genomic Data Commons (GDC) Data Portal (https://portal.gdc.cancer.gov/). Specifically, RNA-sequencing (RNA-seq) gene expression counts from the TCGA-COAD project were downloaded, with open access.

Funding: The author(s) received no specific funding for this work.

Competing interests: The authors have declared that no competing interests exist.

Introduction

Colorectal cancer (CRC) accounts for approximately 10% of global cancer cases and is the second leading cause of cancer-related mortality [1,2]. In China, the annual incidence rate of CRC is 9.2%, ranking fourth among all cancers. In terms of both incidence and mortality, colorectal cancer rates were significantly higher in males than in females and greater in rural areas than in urban areas. The incidence increases significantly with age, especially after 40 or 45 years of age [3].

Single nucleotide polymorphisms (SNPs) are among the most common forms of genetic variation and involve single nucleotide alterations at specific genomic positions. These variations differ among individuals and can influence various phenotypic traits. SNPs can affect gene function, making them key determinants of susceptibility, progression, and prognosis in diseases such as CRC. The functional impact of SNPs depends on their genomic location and mutation type, with those occurring in coding or regulatory regions exerting more substantial effects on gene function. The TCGA database provides RNA-seq data across various cancer types, including gene expression profiles from tumor, adjacent, and normal tissues. Analyzing these datasets allows the identification of differentially expressed genes and their encoded proteins, many of which have emerged as key therapeutic targets in disease research, such as SPAG5, MAGEA3 [4], and TOP2A [5]. Furthermore, a drug and disease prediction platform was developed on the basis of TCGA data [6,7].

However, analyses that rely on differential gene expression alone have limitations: they focus primarily on highly differentially expressed genes and may overlook those with smaller expression differences, thereby reducing the likelihood of identifying novel therapeutic targets or drug candidates [8]. To address this, the present study integrates GWAS and RNA-seq data to identify novel target genes and proteins for CRC treatment [9]. First, GWAS data from European and East Asian populations were analyzed to identify significant genetic variants associated with CRC. Next, RNA-seq analysis identified 17 mRNAs and 7 long noncoding RNAs (lncRNAs) with significant differential expression in CRC patients. Finally, molecular docking analysis was performed to screen potential drug candidates targeting these differentially expressed genes. By integrating multiomics approaches, this study aims to identify novel biomarkers and potential therapeutic targets for CRC, providing a theoretical foundation for future precision medicine strategies.

1. Materials and methods

1.1 Study design

The study design process is illustrated in Fig 1.

Fig 1. Schematic representation of the study design.

https://doi.org/10.1371/journal.pone.0333179.g001

1.2 Data acquisition

GWAS data related to CRC in European and East Asian populations were obtained from the GWAS Catalog database. Datasets were filtered on the basis of case‒control status and sample size, and those with small sample sizes (e.g., ncases < 5,000) were excluded to ensure statistical robustness. The GCST90018808 dataset was selected (Table 1), encompassing a total of 637,693 participants, including 14,886 CRC patients and 622,807 controls. The study population was exclusively European and East Asian, and both male and female participants were included to minimize allele frequency biases caused by population stratification and linkage disequilibrium (LD). Since this study utilized publicly available summary-level data, no additional ethical approval or informed consent was needed.

Table 1. Summary of GWAS datasets for colorectal cancer analysis.

https://doi.org/10.1371/journal.pone.0333179.t001

1.3 GWAS analysis

GWAS data analysis was conducted via the TwoSampleMR and ieugwasr packages in R (version 4.4.2). Chen et al. [11] employed a conditional p value threshold of p < 1e-6 to identify independent association signals and discovered several potential CRC susceptibility genes that had not been reported previously. SNPs with a significance threshold of p < 1e-6 were retained for further analysis, while weakly associated SNPs were filtered out. The functional annotation of the SNPs was performed via the FastTraitR package to assess the relationships among the SNPs, genes and phenotypes, facilitating the identification of CRC-associated genes. CMplot was used to construct a Manhattan plot, providing a visual representation of the positional distribution of significantly associated SNPs.
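The p value filtering step described above can be illustrated with a minimal Python sketch. The study itself used R packages, so this is only an illustration of the thresholding logic; the column names (`variant_id`, `p_value`) and the toy rows are assumptions, not values from the dataset.

```python
import csv
import io

P_THRESHOLD = 1e-6  # screening threshold stated in the text


def filter_snps(rows, p_threshold=P_THRESHOLD):
    """Keep rows whose p value is strictly below the threshold."""
    return [r for r in rows if float(r["p_value"]) < p_threshold]


# Toy summary statistics; identifiers and p values are hypothetical.
TOY = """variant_id,p_value
snp_a,3.0e-9
snp_b,4.0e-7
snp_c,2.0e-5
"""

rows = list(csv.DictReader(io.StringIO(TOY)))
kept = filter_snps(rows)
kept_ids = [r["variant_id"] for r in kept]
print(kept_ids)
```

In this toy input, the first two variants pass the p < 1e-6 filter and the third is discarded as weakly associated.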

During linkage disequilibrium (LD)-based SNP screening, linked SNPs are typically removed, retaining only a single representative SNP. However, this selected SNP may have limited prior research or lack associated phenotypic data, potentially leading to the exclusion of SNPs with known phenotypic associations. To address this, we first conducted phenotypic analysis to identify SNPs with available phenotypic data, followed by LD analysis and subsequent investigation of the selected SNP loci. LD-based instrumental variable selection was carried out via the ieugwasr package, with the parameters set to r² < 0.1 and a window size of 100,000 kb. To minimize bias from weak instrumental variables, SNPs with an F statistic < 10 were excluded [12].
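The weak-instrument filter can be sketched as follows. The text does not state how the F statistic was computed, so this sketch assumes the common single-SNP approximation F = (beta / SE)²; the field names and toy effect sizes are hypothetical.

```python
def f_statistic(beta, se):
    """Approximate per-SNP F statistic as (beta / se) ** 2 (an assumption)."""
    if se <= 0:
        raise ValueError("standard error must be positive")
    return (beta / se) ** 2


def drop_weak_instruments(snps, min_f=10.0):
    """Keep SNPs whose approximate F statistic is at least min_f."""
    return [s for s in snps if f_statistic(s["beta"], s["se"]) >= min_f]


# Toy effect estimates; values are illustrative only.
toy = [
    {"snp": "snp_1", "beta": 0.10, "se": 0.02},  # F is about 25
    {"snp": "snp_2", "beta": 0.03, "se": 0.02},  # F is about 2.25
]
strong = drop_weak_instruments(toy)
print([s["snp"] for s in strong])
```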

1.4 RNA-seq differential expression analysis

RNA-seq data for colon cancer were retrieved and analyzed via Sangerbox (http://sangerbox.com/tool.html) [13]. Gene identifiers (ENSG_ID) were converted to gene symbols. First, the expression matrix was obtained, and genes and samples with more than 50% missing (NA) values were removed. Next, missing values were imputed via the impute.knn function from the R package impute, with the number of neighbors set to 10,000, and the data were normalized via log2(X + 1) transformation. The expression profiles of genes identified through GWAS were extracted, and differential expression analysis was conducted via the Limma package [14]. The analysis criteria were a fold-change threshold of 2.0 and a significance threshold of FDR < 0.01 (|logFC| > 1.0 and -log10(FDR) ≥ 2.0) [15]. Volcano plots were constructed to visualize the results, highlighting genes with significantly altered expression in CRC.
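The normalization and thresholding rules above can be written out as a short Python sketch. It does not reproduce the Limma model fit; it only shows how a gene with a given log2 fold change and FDR would be labeled under the stated cutoffs. The function names are illustrative.

```python
import math


def log2_plus_one(x):
    """log2(X + 1) transform applied to an expression value."""
    return math.log2(x + 1.0)


def classify(log_fc, fdr, lfc_cut=1.0, fdr_cut=0.01):
    """Label a gene 'up', 'down', or 'ns' (not significant).

    A fold change of 2.0 corresponds to |log2FC| = 1.0.
    """
    if fdr < fdr_cut and abs(log_fc) > lfc_cut:
        return "up" if log_fc > 0 else "down"
    return "ns"


print(log2_plus_one(3))        # log2(4)
print(classify(1.5, 0.001))    # passes both cutoffs, positive change
print(classify(-2.0, 0.005))   # passes both cutoffs, negative change
print(classify(0.8, 0.0001))   # fold change too small
```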

1.5 Survival analysis across all genes in colon cancer

Survival analysis was performed via the Kaplan‒Meier plotter platform (https://kmplot.com/analysis/) [16]. Colon cancer was selected as the screening target, and either the gene symbols or Affy ids of the test genes were input into the designated fields. Patients were stratified via the “auto select best cutoff” option, which evaluates all possible cutoff values between the lower and upper quartiles and selects the best-performing threshold as the cutoff value. The generated p value does not include correction for multiple hypothesis testing. All other parameters were maintained at the default settings of the database. Separate survival analyses were conducted for OS, RFS, and PPS.
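The idea behind a quartile-bounded cutoff scan can be sketched in Python. This is not the platform's implementation: the sketch assumes a two-group log-rank chi-square as the performance criterion and a simple index-based definition of the quartiles, and the toy data are invented.

```python
def logrank_chi2(times, events, in_high):
    """Two-group log-rank chi-square statistic (1 degree of freedom)."""
    obs_minus_exp = 0.0
    variance = 0.0
    for t in sorted({ti for ti, e in zip(times, events) if e}):
        n = sum(1 for ti in times if ti >= t)                       # at risk
        n1 = sum(1 for ti, g in zip(times, in_high) if ti >= t and g)
        d = sum(1 for ti, e in zip(times, events) if ti == t and e)  # events
        d1 = sum(1 for ti, e, g in zip(times, events, in_high)
                 if ti == t and e and g)
        obs_minus_exp += d1 - d * n1 / n
        if n > 1:
            variance += n1 * (n - n1) * d * (n - d) / (n * n * (n - 1))
    return obs_minus_exp ** 2 / variance if variance > 0 else 0.0


def best_cutoff(expr, times, events):
    """Scan expression values between the lower and upper quartiles."""
    ranked = sorted(expr)
    lo = ranked[len(ranked) // 4]
    hi = ranked[(3 * len(ranked)) // 4]
    candidates = sorted({x for x in expr if lo <= x <= hi})
    scored = [(logrank_chi2(times, events, [x > c for x in expr]), c)
              for c in candidates]
    chi2, cutoff = max(scored)
    return cutoff, chi2


# Toy cohort: higher expression goes with shorter survival; all events observed.
expr = [1, 2, 3, 4, 5, 6, 7, 8]
times = [10, 9, 8, 7, 3, 2, 2, 1]
events = [1] * 8
cutoff, chi2 = best_cutoff(expr, times, events)
print(cutoff, round(chi2, 2))
```

Because the cutoff is chosen to maximize the separation, the resulting p value is optimistic, which is consistent with the caveat above that no multiple-testing correction is applied.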

1.6 Enrichment analysis and functional annotation

Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analyses were conducted to examine biological process variations via the GO biological process 2025 library and the KEGG 2021 human database on the Enrichr platform (https://maayanlab.cloud/Enrichr/) [17]. The results were filtered to display the top 10 enriched GO terms and KEGG pathways, with a focus on the biological process category.
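Over-representation analyses of this kind are commonly based on a hypergeometric (one-sided Fisher's exact) test. The following Python sketch illustrates that calculation under this assumption; it is not Enrichr's code, and the numbers in the example are illustrative rather than taken from the study.

```python
from math import comb


def hypergeom_sf(k, background, annotated, query):
    """P(X >= k): chance of at least k annotated genes in the query list.

    background: number of genes in the background set
    annotated:  background genes that belong to the term or pathway
    query:      size of the query gene list
    """
    upper = min(annotated, query)
    total = comb(background, query)
    hits = sum(comb(annotated, i) * comb(background - annotated, query - i)
               for i in range(k, upper + 1))
    return hits / total


# Illustrative example: 2 of 2 query genes fall in a 3-gene term, background of 10.
p = hypergeom_sf(2, background=10, annotated=3, query=2)
print(p)
```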

1.7 Online drug screening

The Enrichr database (https://maayanlab.cloud/Enrichr/), specifically the DSigDB dataset, which comprises 4,026 small-molecule compounds and 19,513 genes, was used for online drug screening [17]. Significantly differentially expressed genes in CRC were queried in Enrichr to identify candidate drugs and their mechanisms of action, facilitating the selection of potential therapeutic agents for CRC.

1.8 Virtual screening

Protein structures and interaction networks of the significantly differentially expressed genes were retrieved from STRING version 12.0 (https://cn.string-db.org/) [18]. Structural data were obtained from the AlphaFold Protein Structure Database and the Protein Data Bank (PDB), with priority given to PDB entries. The selection criteria included the scientific name of the source organism (Homo sapiens), the experimental method (X-ray diffraction or electron microscopy), and a refinement resolution of ≤2.5 Å. For proteins lacking small-molecule ligands, binding sites were predicted via computational tools. If ligands were present, the ligand-binding region was defined as the active site for drug screening.

The protein encoded by the PYGL gene (PDB ID: 3DDS) was selected as the receptor, with its original ligand, 26B (molecular weight: 505.60 g/mol), used for docking. The 3DDS structure was determined by X-ray diffraction at a resolution of 1.80 Å. Only protein structures with well-defined active sites were retained for further analysis.

This study retrieved PYGL-targeting compounds from the BindingDB database (https://www.bindingdb.org/rwd/bind/index.jsp) [19]. Compounds with IC50 values of zero or nonspecific numerical data were removed. After the elimination of structurally identical compounds, 499 unique small-molecule compounds were obtained, all of which had experimentally determined IC₅₀ values against PYGL.

A total of 231,187 small-molecule crystal structures were downloaded from the PubChem database. Lipinski’s rule of five was applied to assess drug likeness [20] with the following criteria: molecular weight (200–500 Da), hydrogen bond donors (≤5), hydrogen bond acceptors (≤10), lipophilicity (LogP: 1–5), and rotatable bonds (1–10). After filtering, 62,964 small-molecule crystal structures were retained, hydrogenated, and optimized as ligands for virtual screening.
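The drug-likeness filter above can be expressed as a simple range check. The Python sketch below assumes the molecular descriptors have already been computed by a cheminformatics toolkit; the descriptor key names and the two toy compounds are hypothetical.

```python
# Inclusive ranges taken from the criteria listed in the text.
CRITERIA = {
    "mol_weight": (200.0, 500.0),   # Da
    "h_donors": (0, 5),
    "h_acceptors": (0, 10),
    "logp": (1.0, 5.0),
    "rotatable_bonds": (1, 10),
}


def passes_filter(descriptors):
    """Return True if every descriptor lies within its allowed range."""
    return all(lo <= descriptors[key] <= hi
               for key, (lo, hi) in CRITERIA.items())


# Toy descriptor records; values are illustrative only.
compounds = [
    {"name": "cmpd_1", "mol_weight": 350.4, "h_donors": 2, "h_acceptors": 5,
     "logp": 3.1, "rotatable_bonds": 4},
    {"name": "cmpd_2", "mol_weight": 612.7, "h_donors": 6, "h_acceptors": 12,
     "logp": 5.8, "rotatable_bonds": 14},
]
retained = [c["name"] for c in compounds if passes_filter(c)]
print(retained)
```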

AutoDock Vina software was used for virtual screening [21], with protein crystal structures serving as receptors and the original ligand-binding pocket defined as the active site. Potential drug candidates were identified on the basis of binding energy and docking conformations. The molecular docking results were visualized and analyzed via PyMOL (version 3.1.0).

2. Results

2.1 Identification of SNPs significantly associated with phenotypes through GWAS data analysis

A significance threshold of p < 1e-6 was applied to filter out weakly associated SNPs from the CRC GWAS dataset derived from European and East Asian populations. Functional annotation was performed, and gene names were extracted from the MAPPED_GENE field across the datasets, resulting in the identification of 67 genes significantly associated with various phenotypes. These phenotypes include tumor-related traits, hematologic disorders, diabetes, and others. Most genes contained between 1 and 5 significant SNPs, although certain loci presented 10 or more. For example, the SMAD7 gene (chromosome 18: 48,922,435–48,927,678 bp) has more than 10 associated SNPs, including rs11874392, rs12953717, and rs12956924. The CDKN2B-AS1 locus (chromosome 9: 22,003,368–22,125,504 bp) contains more than 60 SNPs, all of which are strongly associated with gene activity.

LD-based instrumental variable selection was performed via the ieugwasr package. Integration of the GWAS datasets led to the identification of 55 SNPs and 67 candidate genes in total (S1 Table). A Manhattan plot and a QQ plot for the GWAS dataset were generated via the CMplot package. The results indicated that significant SNP loci were distributed across multiple chromosomes (Fig 2).

Fig 2. Manhattan plot and QQ plot for GWAS ID: GCST90018808.

A: Manhattan plot; B: QQ plot. The significance threshold was set at 1e-6, with SNPs exhibiting significant associations with colorectal cancer (CRC) appropriately annotated. Multiple SNPs reached genome-wide significance (p < 5e-8), and their proximal genes exhibited significant differential expression in the RNA-seq data.

https://doi.org/10.1371/journal.pone.0333179.g002

2.2 Identification of genes significantly associated with CRC via both GWAS and RNA-seq data analyses

RNA-seq data for colon adenocarcinoma (COAD) were retrieved from TCGA via Sangerbox. The data were normalized, with colon cancer patients serving as the comparison group and adjacent or normal tissues serving as the control group. Differential expression analysis was performed via the Limma package. To ensure data integrity, we removed genes with zero expression values in >50% of samples, yielding 64 phenotype-associated genes.

By applying a fold-change (FC) threshold of 2.0 and a false discovery rate (FDR) threshold of 0.01, a total of 7 upregulated and 17 downregulated genes were identified (Table 2). A volcano plot was generated to visualize the RNA-seq analysis results for colon cancer, highlighting some genes that exhibited significant differential expression in both the GWAS and RNA-seq datasets for CRC. Notably, TMEM220-AS1, TMEM238L, SOX6, SMAD7, TCF7L2, and PYGL were significantly downregulated in colon cancer, whereas LINC02257, PCAT1, CASC8, and POU5F1B were significantly upregulated in colon cancer.

Table 2. Information on selected SNPs and associated genes.

https://doi.org/10.1371/journal.pone.0333179.t002

In addition, 7 lncRNAs, such as CDKN2B-AS1, TMEM220-AS1, PCAT1, and CASC8, were identified. Although these lncRNAs do not encode proteins, they play a regulatory role in gene expression through various molecular mechanisms. CDKN2B-AS1 and TMEM220-AS1 were significantly downregulated in colon cancer, whereas LINC02257, PCAT1, and CASC8 were significantly upregulated in colorectal cancer (Fig 3). Meng et al. [22] developed mPEG-DSPE liposomes encapsulating siRNA-targeting lncRNAs, which effectively inhibited CRC progression.

Fig 3. Volcano plot of differentially expressed genes in colon cancer RNA-seq data.

https://doi.org/10.1371/journal.pone.0333179.g003

2.3 Survival analysis

The Kaplan‒Meier plotter background database is manually curated, incorporating gene expression data along with relapse-free and overall survival information sourced from GEO, EGA, and TCGA. In this study, survival analysis was performed on 18 screened genes (Table 3 and Fig 4). Among these genes, five genes, CDKN2B, BOC, SCG5, PYGL, and METRNL, demonstrated statistical significance (p value < 0.05) in terms of OS, RFS, and PPS, with hazard ratios (HRs) > 1, indicating that high expression of these genes increases mortality risk and reduces patient survival rates. These genes may represent promising targets for future pharmacological development. SMAD7 expression was significantly different (p value < 0.05) across all three survival metrics, with an HR > 1 for OS and RFS and an HR < 1 for PPS, suggesting a complex relationship between its expression levels and colorectal cancer patient outcomes that warrants further investigation.

Table 3. Genes whose expression is higher in CRC tumors and is correlated with OS, RFS, and PPS.

https://doi.org/10.1371/journal.pone.0333179.t003

Fig 4. Most significant druggable genes associated with Overall Survival.

HR = hazard ratio. CDKN2B (A), BOC (B), SCG5 (C), LRP1 (D), ZCWPW2 (E), PYGL (F).

https://doi.org/10.1371/journal.pone.0333179.g004

2.4 Enrichment analysis and functional annotation

The adjusted p values were calculated via the Benjamini‒Hochberg method to correct for multiple hypothesis testing. All genes in the human genome were used as background. Only the top 10 significant results are presented in Table 4. GO enrichment analysis revealed that the Wnt signaling pathway, fat cell differentiation and regulation of epithelial-to-mesenchymal transition were the most highly enriched biological processes. The Wnt signaling pathway is a highly conserved signaling cascade that plays crucial roles in cell fate determination, tissue development, and tumorigenesis. In the canonical Wnt pathway, β-catenin degradation is inhibited, allowing it to accumulate and interact with TCF/LEF transcription factors (e.g., TCF7L2) to activate the transcription of downstream target genes.
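For readers unfamiliar with the Benjamini‒Hochberg adjustment mentioned above, a minimal Python sketch of the standard step-up procedure follows. The input p values are illustrative and are not taken from Table 4.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


raw = [0.01, 0.04, 0.03, 0.005]  # illustrative raw p values
adj = benjamini_hochberg(raw)
print([round(a, 4) for a in adj])
```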

Table 4. Top 10 significantly enriched GO terms.

https://doi.org/10.1371/journal.pone.0333179.t004

KEGG pathway analysis revealed several significant pathways, including the TGF-beta signaling pathway, gastric cancer and Cushing syndrome (Table 5), suggesting a potential role of these genes in cancer-related mechanisms and metabolic processes. The TGF-beta signaling pathway plays vital roles in regulating cell growth, differentiation, and immune function. The glucocorticoid receptor signaling pathway, which is implicated in Cushing syndrome, modulates a wide range of physiological processes, including glucose metabolism and immune responses. Starch and sucrose metabolism is central to maintaining energy homeostasis, regulating blood glucose levels, and supporting overall metabolic health. Notably, genes such as TCF7L2, CDKN2B, and PYGL are involved in multiple glucose metabolism-related pathways. We hypothesize that their downregulation in colorectal cancer tissues may disrupt glucose metabolism, thereby contributing to tumorigenesis.

Table 5. Top 10 significantly enriched KEGG pathways.

https://doi.org/10.1371/journal.pone.0333179.t005

2.5 Online drug screening for CRC-associated genes

The 13 identified genes were queried in the Enrichr database, and potential therapeutic compounds were screened via DSigDB (Table 6). These compounds exert their effects by targeting specific genes or their encoded proteins, thereby influencing key signaling pathways involved in CRC progression. As a result, these compounds and their derivatives hold potential as therapeutic agents for CRC treatment [23–25].

Table 6. Potential therapeutic compounds identified through online screening.

https://doi.org/10.1371/journal.pone.0333179.t006

2.6 Virtual screening via molecular docking analysis of the receptor–ligand complex

A total of 17 mRNAs were identified, with their encoded protein structures available in the PDB or the AlphaFold protein structure database, among which only 3 possessed PDB IDs and contained small-molecule ligands. Based on the properties and size of the active sites, PYGL (PDB ID: 3DDS) was ultimately selected for further investigation [26], with the original ligand 26B serving as the docking ligand. Molecular docking was performed via AutoDock Vina, with the docking box centered at (x = 80.27, y = −97.68, z = 124.02). The calculated binding energy between the receptor and the original ligand was −12.3 kcal/mol, indicating a strong interaction. The docking conformation of the ligand exhibited a high degree of structural overlap with the original ligand (Fig 5). Both the crystal conformation and the docking conformation of the original ligand formed hydrogen bonds with key amino acid residues (Val40, Gln71, and Arg310), confirming the accuracy and reliability of the molecular docking method in predicting the optimal binding conformation of small-molecule compounds within the receptor’s active site.
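The docking setup can be captured in an AutoDock Vina configuration file. The Python sketch below writes such a file using the box center reported above. The box dimensions, the exhaustiveness value, and the file names are assumptions, since they are not reported in the text.

```python
def vina_config(receptor, ligand, center, size=(20.0, 20.0, 20.0),
                exhaustiveness=8):
    """Build the text of an AutoDock Vina configuration file.

    size and exhaustiveness are placeholder assumptions, not reported values.
    """
    cx, cy, cz = center
    sx, sy, sz = size
    lines = [
        f"receptor = {receptor}",
        f"ligand = {ligand}",
        f"center_x = {cx}",
        f"center_y = {cy}",
        f"center_z = {cz}",
        f"size_x = {sx}",
        f"size_y = {sy}",
        f"size_z = {sz}",
        f"exhaustiveness = {exhaustiveness}",
    ]
    return "\n".join(lines) + "\n"


# File names are hypothetical; the center is the one given in the text.
config_text = vina_config("3dds_receptor.pdbqt", "ligand_26B.pdbqt",
                          center=(80.27, -97.68, 124.02))
print(config_text)
```

The resulting text could be saved to a file and passed to Vina through its `--config` option.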

Fig 5. Docking conformation of the original ligand at the active site.

The red structure represents the docked ligand conformation, with magenta bonds indicating hydrogen bonding interactions. The green structure denotes the crystal conformation of the protein-bound ligand, with yellow bonds highlighting hydrogen bonding interactions.

https://doi.org/10.1371/journal.pone.0333179.g005

A total of 499 small-molecule compounds associated with the PYGL gene were retrieved from the BindingDB database, with most exhibiting nanomolar-range IC₅₀ values against the PYGL protein. Molecular docking analysis via AutoDock Vina identified 155 compounds with binding energies ≤ −10 kcal/mol, whereas 497 compounds (99% of all analyzed compounds) had binding energies ≤ −7 kcal/mol. Only two compounds exhibited binding energies > −7 kcal/mol, both of which had molecular weights below 150 g/mol (S2 Table), suggesting that low molecular weights may have affected docking accuracy. As shown in Fig 6, a positive correlation was observed between the binding energy and the Log(IC₅₀) value (p = 8.2e-3). These findings suggest that the screening platform based on AutoDock Vina can identify potential active compounds that target the PYGL protein.
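The relationship between docking score and potency can be checked with a correlation coefficient. The text does not specify which correlation method was used, so the Python sketch below assumes a Pearson correlation between binding energy and log10(IC₅₀); the four data points are invented for illustration and do not come from S2 Table.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Invented example: more negative energies paired with lower IC50 values.
binding_energy = [-11.0, -10.0, -9.0, -8.0]   # kcal/mol
ic50_nm = [10.0, 100.0, 1000.0, 10000.0]      # nM
log_ic50 = [math.log10(v) for v in ic50_nm]
r = pearson_r(binding_energy, log_ic50)
print(round(r, 3))
```

A positive coefficient here means that compounds with more favorable (more negative) predicted binding energies tend to have lower measured IC₅₀ values.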

Fig 6. Correlation analysis of the binding energies and IC₅₀ values of compounds retrieved from the BindingDB.

https://doi.org/10.1371/journal.pone.0333179.g006

Small molecule compounds were retrieved from the PubChem database and filtered on the basis of Lipinski’s Rule of Five to assess drug likeness. A total of 62,964 small-molecule compounds were selected, hydrogenated, and optimized for use as ligands in virtual screening. Using the PYGL protein structure (PDB: 3DDS) as the receptor and the original ligand binding site as the target, virtual screening was conducted with AutoDock Vina. This process identified 10 high-affinity compounds (Table 7), which are potential candidates for CRC treatment.

Table 7. Top 10 high-affinity compounds identified through virtual screening.

https://doi.org/10.1371/journal.pone.0333179.t007

3. Discussion

This study provides a comprehensive molecular characterization of colorectal cancer (CRC) through integrated analysis of GWAS and transcriptomic data combined with computational drug discovery approaches. Our systematic investigation identified 24 CRC-associated genes, including key regulators such as SMAD7 (a negative modulator of TGF-β signaling) [27,28], TCF7L2 (a critical component of the Wnt/β-catenin pathway) [29], and PYGL (a metabolic enzyme involved in glycogenolysis) [30,31]. These findings not only reinforce established CRC pathways but also reveal novel molecular vulnerabilities, particularly in tumor metabolic reprogramming.

The computational drug screening pipeline identified ten high-affinity small molecules that target PYGL, suggesting potential for therapeutic repurposing. This metabolic target represents an innovative approach to disrupt cancer energy homeostasis, which is distinct from conventional kinase inhibitors. Furthermore, our analysis revealed that differentially expressed lncRNAs (e.g., CDKN2B-AS1 and TMEM220-AS1) may function as epigenetic regulators in CRC pathogenesis, opening new avenues for RNA-targeted therapies [32].

While these computational predictions require experimental validation, the robustness of our findings is supported by the following:

Stringent statistical thresholds (p < 1e-6 in GWAS) [11].
Cross-platform consistency between the genomic and transcriptomic data.
Comprehensive molecular docking analyses.
Utilization of established drug‒gene interaction databases.

Future directions should focus on the following:

Functional validation via CRISPR-based gene editing and organoid models [33,34].
Preclinical evaluation of PYGL inhibitors in relevant CRC models.
Development of LNP-encapsulated CRISPR systems for lncRNA modulation [35–37].
Investigation of combinatorial approaches targeting both metabolic and signaling pathways.

This study establishes a framework for translating multiomics discoveries into therapeutic opportunities, highlighting the value of integrative computational biology in accelerating cancer drug discovery. The identified targets and compounds provide a foundation for developing next-generation CRC therapies that address the current limitations of targeted agents, particularly in overcoming drug resistance and improving treatment personalization.

Conclusion

In summary, this study employed p < 1e-6 as the screening threshold for the colorectal cancer GWAS, combined with RNA-seq, GO, KEGG, and survival analyses. Survival analysis revealed that five genes (CDKN2B, BOC, SCG5, PYGL, and METRNL) were significantly correlated with overall survival (OS), relapse-free survival (RFS), and post-progression survival (PPS). Among these, CDKN2B, BOC, and METRNL showed p-values greater than 5e-8 in the colorectal cancer GWAS data. Subsequently, computer-aided drug design was utilized to screen multiple compounds exhibiting low binding energy with the PYGL protein, providing new therapeutic targets and directions for colorectal cancer treatment.

Supporting information

S1 Table. Phenotype-associated SNPs and genes identified in the GCST90018808 dataset.

https://doi.org/10.1371/journal.pone.0333179.s001

(XLSX)

S2 Table. Binding energies and IC₅₀ values of compounds derived from BindingDB that target the PYGL protein.

The R scripts used in the study are available via the following DOI: 10.5281/zenodo.15803098, v1.0.2.

https://doi.org/10.1371/journal.pone.0333179.s002

(XLSX)

Acknowledgments

We would like to thank Professor Baoen Shan for kindly proofreading the manuscript. We would also like to express gratitude to EditSprings (https://www.editsprings.cn) for the expert linguistic services provided.

References

1. Zhou J, Zheng R, Zhang S, Zeng H, Wang S, Chen R, et al. Colorectal cancer burden and trends: comparison between China and major burden countries in the world. Chin J Cancer Res. 2021;33(1):1–10. pmid:33707923
2. Bray F, Ferlay J, Soerjomataram I, Siegel RL, Torre LA, Jemal A. Global cancer statistics 2018: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J Clin. 2018;68(6):394–424. pmid:30207593
3. Liu S, Zheng R, Zhang M, Zhang S, Sun X, Chen W. Incidence and mortality of colorectal cancer in China, 2011. Chin J Cancer Res. 2015;27(1):22–8. pmid:25717222
4. Li B, Severson E, Pignon J-C, Zhao H, Li T, Novak J, et al. Comprehensive analyses of tumor immunity: implications for cancer immunotherapy. Genome Biol. 2016;17(1):174. pmid:27549193
5. Yu Y, Li L, Luo B, Chen D, Yin C, Jian C, et al. Predicting potential therapeutic targets and small molecule drugs for early-stage lung adenocarcinoma. Biomed Pharmacother. 2024;174:116528. pmid:38555814
6. Stathias V, Jermakowicz AM, Maloof ME, Forlin M, Walters W, Suter RK, et al. Drug and disease signature integration identifies synergistic combinations in glioblastoma. Nat Commun. 2018;9(1):5315. pmid:30552330
7. Mariano JA, Shen Y, Federico MG. Network-based inference of protein activity helps functionalize the genetic landscape of cancer. Nat Genet. 2016;48(8):838–47.
8. Wang W, Liu Y, Wang Z, Tan X, Jian X, Zhang Z. Exploring and validating the necroptotic gene regulation and related lncRNA mechanisms in colon adenocarcinoma based on multi-dimensional data. Sci Rep. 2024;14(1):22251. pmid:39333335
9. Zhan J, Sun S, Chen Y, Xu C, Chen Q, Li M, et al. MiR-3130-5p is an intermediate modulator of 2q33 and influences the invasiveness of lung adenocarcinoma by targeting NDUFS1. Cancer Med. 2021;10(11):3700–14. pmid:33978320
10. Sakaue S, Kanai M, Tanigawa Y, Karjalainen J, Kurki M, Koshiba S, et al. A cross-population atlas of genetic associations for 220 human phenotypes. Nat Genet. 2021;53(10):1415–24. pmid:34594039
11. Chen Z, Guo X, Tao R, Huyghe JR, Law PJ, Fernandez-Rozadilla C, et al. Fine-mapping analysis including over 254,000 East Asian and European descendants identifies 136 putative colorectal cancer susceptibility genes. Nat Commun. 2024;15:3557.
12. Burgess S, Thompson SG, CRP CHD Genetics Collaboration. Avoiding bias from weak instruments in Mendelian randomization studies. Int J Epidemiol. 2011;40(3):755–64.
13. Shen W, Song Z, Zhong X, Huang M, Shen D, Gao P, et al. Sangerbox: a comprehensive, interaction-friendly clinical bioinformatics analysis platform. Imeta. 2022;1(3):e36. pmid:38868713
14. Ritchie ME, Phipson B, Wu D, Hu Y, Law CW, Shi W, et al. limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res. 2015;43(7):e47. pmid:25605792
15. Kaisrlikova M, Kundrat D, Koralkova P, Trsova I, Lenertova Z, Votavova H, et al. Attenuated cell cycle and DNA damage response transcriptome signatures and overrepresented cell adhesion processes imply accelerated progression in patients with lower-risk myelodysplastic neoplasms. Int J Cancer. 2024;154(9):1652–68. pmid:38180088
16. Győrffy B. Integrated analysis of public datasets for the discovery and validation of survival-associated genes in solid tumors. Innovation (Camb). 2024;5(3):100625. pmid:38706955
17. Kuleshov MV, Jones MR, Rouillard AD, Fernandez NF, Duan Q, Wang Z, et al. Enrichr: a comprehensive gene set enrichment analysis web server 2016 update. Nucleic Acids Res. 2016;44(W1):W90–7. pmid:27141961
18. Szklarczyk D, Kirsch R, Koutrouli M, Nastou K, Mehryary F, Hachilif R, et al. The STRING database in 2023: protein-protein association networks and functional enrichment analyses for any sequenced genome of interest. Nucleic Acids Res. 2023;51(D1):D638–46. pmid:36370105
19. Gilson MK, Liu T, Baitaluk M, Nicola G, Hwang L, Chong J. BindingDB in 2015: a public database for medicinal chemistry, computational chemistry and systems pharmacology. Nucleic Acids Res. 2016;44(D1):D1045–53. pmid:26481362
20. Roskoski R Jr. Rule of five violations among the FDA-approved small molecule protein kinase inhibitors. Pharmacol Res. 2023;191:106774. pmid:37075870
21. Trott O, Olson AJ. AutoDock Vina: improving the speed and accuracy of docking with a new scoring function, efficient optimization, and multithreading. J Comput Chem. 2010;31(2):455–61. pmid:19499576
22. Meng Q, Wang J, Jiang B, Zhang X, Xu J, Cao Y, et al. SiRNA-based delivery nanoplatform attenuates the CRC progression via HIF1α-AS2. Nano Today. 2022;47:101667.
23. Eissa SI. Synthesis, characterization and biological evaluation of some new indomethacin analogs with a colon tumor cell growth inhibitory activity. Med Chem Res. 2017;26(9):2205–20.
24. An N, Sun Y, Ma LG. Helveticoside exhibited p53-dependent anticancer activity against colorectal cancer. Arch Med Res. 2020;51(3):224–32.
25. Guo YC, Chang CM, Hsu WL. Indomethacin inhibits cancer cell migration via attenuation of cellular calcium mobilization. Molecules. 2013;18:6584–96.
26. Thomson SA, Banker P, Bickett DM, Boucheron JA, Carter HL, Clancy DC, et al. Anthranilimide based glycogen phosphorylase inhibitors for the treatment of type 2 diabetes. Part 3: X-ray crystallographic characterization, core and urea optimization and in vivo efficacy. Bioorg Med Chem Lett. 2009;19(4):1177–82. pmid:19138846
27. Massagué J, Blain SW, Lo RS. TGFbeta signaling in growth control, cancer, and heritable disorders. Cell. 2000;103(2):295–309. pmid:11057902
28. Itóh S, Landström M, Hermansson A, Itoh F, Heldin CH, Heldin NE, et al. Transforming growth factor beta1 induces nuclear export of inhibitory Smad7. J Biol Chem. 1998;273(44):29195–201. pmid:9786930
29. Slattery ML, Folsom AR, Wolff R, Herrick J, Caan BJ, Potter JD. Transcription factor 7-like 2 polymorphism and colon cancer. Cancer Epidemiol Biomarkers Prev. 2008;17(4):978–82. pmid:18398040
30. Cao T, Wang J. PYGL regulation of glycolysis and apoptosis in glioma cells under hypoxic conditions via HIF1α-dependent mechanisms. Transl Cancer Res. 2024;13(10):5627–48. pmid:39525037
31. Ji Q, Li H, Cai Z, Yuan X, Pu X, Huang Y, et al. PYGL-mediated glucose metabolism reprogramming promotes EMT phenotype and metastasis of pancreatic cancer. Int J Biol Sci. 2023;19(6):1894–909. pmid:37063425
32. Ke R, Lv L, Zhang S, Zhang F, Jiang Y. Functional mechanism and clinical implications of MicroRNA-423 in human cancers. Cancer Med. 2020;9(23):9036–51. pmid:33174687
33. Hanahan D. Hallmarks of cancer: new dimensions. Cancer Discov. 2022;12(1):31–46.
34. Morelli E, Gullà A, Amodio N, Taiana E, Neri A, Fulciniti M, et al. CRISPR interference (CRISPRi) and CRISPR activation (CRISPRa) to explore the oncogenic lncRNA network. Methods Mol Biol. 2021;2348:189–204. pmid:34160808
35. Holtzman L, Gersbach CA. Editing the epigenome: reshaping the genomic landscape. Annu Rev Genomics Hum Genet. 2018;19:43–71. pmid:29852072
36. Kampmann M. CRISPRi and CRISPRa screens in mammalian cells for precision biology and medicine. ACS Chem Biol. 2018;13(2):406–16. pmid:29035510
37. Kon E, Ad-El N, Hazan-Halevy I, Stotsky-Oterin L, Peer D. Targeting cancer with mRNA-lipid nanoparticles: key considerations and future prospects. Nat Rev Clin Oncol. 2023;20(11):739–54. pmid:37587254



Supporting information

S1 Table. Phenotype-associated SNPs and genes identified in the GCST90018808 dataset.

S2 Table. Binding energies and IC50 values of compounds derived from BindingDB that target the PYGL protein.
