Abstract
Gastric cancer (GC) is a major cause of cancer mortality and remains difficult to diagnose early and treat effectively. Although transcriptomic profiling has defined extensive molecular heterogeneity, many studies are not anchored to clinicopathological variables or interpreted in the context of tissue-level pathobiology. We applied an integrative transcriptomic and network-based framework to identify molecular signatures that reflect structural and biochemical reprogramming of gastric tissue during malignant transformation. RNA-seq expression profiles and clinical annotations were analyzed using non-parametric differential expression filtering, functional enrichment, and protein–protein interaction network modeling. Expression patterns were further evaluated across clinicopathological strata (stage, grade, nodal status, and metastasis) and by network-informed clustering. Across clinical strata, GC showed a consistent signature of developmental reactivation and loss of gastric epithelial identity. Developmental regulators, most prominently HOX-cluster genes and the histone variant HIST1H3J, were upregulated, consistent with epigenetic/chromatin reprogramming. In parallel, gastric differentiation and secretory lineage markers (ATP4A, KCNE2, PTF1A, VSTM2A) were persistently downregulated, reflecting suppression of parietal/ductal programs and altered metabolic/secretory function. ADIPOQ demonstrated stage-dependent repression, and high ADIPOQ expression showed a nominal association with poorer survival that did not remain significant after false discovery rate correction, suggesting an exploratory, context-dependent prognostic role. Enrichment and network analyses also highlighted FGFR-centered signaling as a dominant oncogenic axis linked to proliferation and invasion. This study identifies a molecular pathology framework in which GC progression involves coordinated chromatin-level developmental reprogramming and sustained loss of gastric differentiation programs, accompanied by FGFR-driven oncogenic signaling and stage-dependent metabolic disruption.
These signatures provide candidate diagnostic and prognostic biomarkers and support prioritization of therapeutically actionable pathways in GC.
Citation: Mottaghi-Dastjerdi N, Soltany-Rezaee-Rad M (2026) Molecular and structural reprogramming of gastric cancer revealed by systems-level transcriptomic analysis. PLoS One 21(5): e0344143. https://doi.org/10.1371/journal.pone.0344143
Editor: Gary S. Stein, University of Vermont, UNITED STATES OF AMERICA
Received: February 16, 2026; Accepted: April 29, 2026; Published: May 21, 2026
Copyright: © 2026 Mottaghi-Dastjerdi, Soltany-Rezaee-Rad. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: TCGA-STAD RNA-seq and clinical data are publicly available from the Genomic Data Commons (GDC) Data Portal (https://portal.gdc.cancer.gov/). Processed data supporting the findings of this study are available from Figshare (https://doi.org/10.6084/m9.figshare.32050323). Analysis scripts used in this study are publicly available without restriction at GitHub (https://github.com/negmot/tcga-stad-network-analysis) and have been archived at Zenodo (https://doi.org/10.5281/zenodo.18537980) to ensure reproducibility.
Funding: The author(s) received no specific funding for this work.
Competing interests: The authors have declared that no competing interests exist.
Abbreviations: BP, Biological Process; CC, Cellular Component; DEG, Differentially Expressed Gene; FGFR, Fibroblast Growth Factor Receptor; GC, Gastric Cancer; GO, Gene Ontology; HOX, Homeobox Gene; KEGG, Kyoto Encyclopedia of Genes and Genomes; MF, Molecular Function; PPI, Protein-Protein Interaction; RNA-seq, RNA Sequencing; STAD, Stomach Adenocarcinoma; TCGA, The Cancer Genome Atlas; UALCAN, University of Alabama at Birmingham Cancer Data Analysis Portal
Introduction
Gastric cancer (GC) remains one of the leading causes of cancer-related mortality worldwide and continues to impose a major global health burden, particularly in East Asian populations, where late-stage diagnosis and therapeutic resistance are especially prevalent [1,2]. Recent data indicate that GC remains highly prevalent across East Asia, with persistently high mortality and only modest improvements in survival in recent years [1,3,4]. Despite advances in surgical and chemotherapeutic intervention, 5-year overall survival for GC remains below 30–35% in much of the world, particularly for patients diagnosed at advanced stages [1,5]. Over the last decade, large-scale transcriptomic profiling of TCGA-based and integrated bulk/single-cell datasets has revealed extensive molecular heterogeneity in GC, uncovering novel subtypes, signaling networks, and gene expression patterns [6,7]. However, translating these findings into clinically useful biomarkers and therapeutic targets remains difficult.
GC pathobiology involves coordinated transcriptional dysregulation, immune evasion, metabolic rewiring, and dedifferentiation. During tumorigenesis, gastric epithelial identity is progressively lost as embryonic/developmental pathways (e.g., Wnt, Notch, FGFR) are reactivated, producing transcriptomic signatures linked to early malignancy and clinical outcomes. Systems-biology and gene-network analyses help decode these complex patterns and prioritize central regulatory genes with translational relevance [8,9].
Despite extensive transcriptomic profiling, many studies still report DEGs without network context or stratification by key clinicopathological variables (stage, grade, nodal status). As a result, systems-level insight into developmental reactivation (e.g., HOX clusters) and loss of epithelial lineage programs remains limited. While integrative network modeling has proven useful for identifying drivers and resistance mechanisms across GI and hepatic cancers [10–16], unified pipelines that link phenotype correlations with multidimensional network clustering are still uncommon, leaving a gap at the intersection of transcriptomics, network centrality, and clinical relevance for biomarker discovery.
To address these gaps, we performed an integrated transcriptomic and systems-level analysis of TCGA-STAD RNA-seq data, combining non-parametric DEG filtering with dual PPI network construction, functional enrichment, and clinicopathological stratification (stage, grade, nodal status). This framework aimed to identify stable diagnostic and prognostic biomarkers, characterize developmental and immune transcriptional programs, and prioritize therapeutic candidates, particularly within HOX genes, gastric lineage markers, and core regulatory transcription factors—building on our prior work in gastric and other gastrointestinal cancers [17,18].
Materials and methods
Computational environment and tools
A schematic workflow summarizing the data processing and analysis steps is provided in S1 Fig. All preprocessing of RNA-seq data, normalization, filtering, and differential expression analysis were carried out in Python (v3.10) via Jupyter Notebook. The main libraries used included pandas for data manipulation, numpy for numerical computations, and matplotlib and seaborn for data visualization. All analysis scripts are available at GitHub and have been archived at Zenodo, as described in the Data Availability section.
Filtering low-expression genes and log2 normalization
RNA-seq raw count data for stomach adenocarcinoma (TCGA-STAD) were obtained from the Genomic Data Commons (GDC) Data Portal (https://portal.gdc.cancer.gov/). Specifically, HTSeq-count files corresponding to 448 samples (412 tumor and 36 normal tissues) were downloaded and used for downstream analyses. Low-expression genes (≤1 count in >95% of the samples) were removed from the raw count matrix of these 448 samples, leaving 56,678 genes expressed in ≥5 samples. A log₂(x + 1) transformation was then applied to stabilize variance and improve comparability across samples.
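The filtering and transformation steps above can be sketched in pandas and NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' script (which is available in the linked GitHub repository): the function name `filter_and_log2` and the toy matrix are hypothetical, and the sketch implements only the "≤1 count in >95% of samples" rule as we read it.

```python
import numpy as np
import pandas as pd


def filter_and_log2(counts: pd.DataFrame, max_count: int = 1,
                    max_low_fraction: float = 0.95) -> pd.DataFrame:
    """Remove low-expression genes, then apply log2(x + 1).

    counts: genes x samples matrix of raw counts (rows are genes).
    A gene is dropped when it has <= max_count reads in more than
    max_low_fraction of the samples (assumed reading of the rule).
    """
    # Fraction of samples in which each gene is at or below the count cut-off.
    low_fraction = (counts <= max_count).mean(axis=1)
    kept = counts.loc[low_fraction <= max_low_fraction]
    # log2(x + 1) keeps zero counts finite and compresses the dynamic range.
    return np.log2(kept + 1)


# Toy matrix: 3 genes x 4 samples (hypothetical values, not TCGA data).
toy = pd.DataFrame(
    {"S1": [0, 10, 1], "S2": [0, 20, 3], "S3": [1, 30, 0], "S4": [0, 40, 7]},
    index=["geneA", "geneB", "geneC"],
)
filtered = filter_and_log2(toy)
print(filtered.index.tolist())  # ['geneB', 'geneC']; geneA has <= 1 count everywhere
```

Applying the cut-off before the transform keeps the threshold on the original count scale.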
Clinical metadata processing
Clinical metadata for TCGA-STAD patients (n = 511) were retrieved from the GDC Data Portal (https://portal.gdc.cancer.gov/), including variables such as patient ID, gender, tumor stage, and survival information. The data were subsequently cleaned and standardized using Python (v3.10) and the pandas library.
Differential expression analysis (DEA)
RNA-seq data from 448 samples (412 tumors and 36 normal tissues) were analyzed using the Mann–Whitney U test. Differentially expressed genes (DEGs) were defined as those with |log2FC| ≥ 1 and FDR < 0.05 (Benjamini–Hochberg). Genes with an average expression ≥ 1 in both groups were retained. To ensure a focused and interpretable network analysis while reducing noise from less biologically relevant genes, the top 100 upregulated and top 100 downregulated DEGs were selected based on absolute log2 fold change. This threshold was applied to prioritize genes with the strongest expression differences while maintaining a manageable network size for reliable topological analysis. Importantly, a separate genome-scale network analysis using a larger DEG set (~6,500 genes) was also performed to preserve global system-level insights. Ensembl IDs were converted to gene symbols using MyGene.info. All analyses were performed in Python (v3.10) using the pandas, scipy, statsmodels, and mygene packages. The complete DEG list is provided in S1 File. The lists of the top 100 upregulated and top 100 downregulated genes are provided in S1 Table.
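The selection logic described above can be sketched as follows, assuming a genes × samples DataFrame of log₂(x + 1) values and a boolean tumor mask. The function names (`bh_fdr`, `call_degs`), the choice to compute log2FC as a difference of group means on the log₂(x + 1) scale, and the order in which the filters are applied are assumptions for illustration. `bh_fdr` reimplements the Benjamini–Hochberg step-up procedure (the same adjustment statsmodels provides via `multipletests(..., method="fdr_bh")`) so that the sketch needs only NumPy, pandas, and SciPy.

```python
import numpy as np
import pandas as pd
from scipy.stats import mannwhitneyu


def bh_fdr(pvals) -> np.ndarray:
    """Benjamini-Hochberg step-up adjusted p-values."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    scaled = p[order] * m / np.arange(1, m + 1)
    # Running minimum from the largest p-value down keeps q-values monotone.
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.minimum(scaled, 1.0)
    return adjusted


def call_degs(log_expr, is_tumor, lfc_cut=1.0, fdr_cut=0.05,
              min_mean=1.0, top_n=100):
    """Mann-Whitney DEG calling on a genes x samples log2(x + 1) matrix."""
    tumor = log_expr.loc[:, is_tumor]
    normal = log_expr.loc[:, ~is_tumor]
    # Assumed definition: difference of group means on the log2(x + 1) scale.
    lfc = tumor.mean(axis=1) - normal.mean(axis=1)
    pvals = [
        mannwhitneyu(tumor.loc[g], normal.loc[g], alternative="two-sided").pvalue
        for g in log_expr.index
    ]
    res = pd.DataFrame({"log2FC": lfc, "pvalue": pvals}, index=log_expr.index)
    res["FDR"] = bh_fdr(res["pvalue"])
    # Expression filter: mean >= min_mean in both groups (log scale assumed).
    expressed = (tumor.mean(axis=1) >= min_mean) & (normal.mean(axis=1) >= min_mean)
    sig = res[(res["log2FC"].abs() >= lfc_cut) & (res["FDR"] < fdr_cut) & expressed]
    # Top-N per direction by absolute log2FC.
    up = sig[sig["log2FC"] > 0].nlargest(top_n, "log2FC")
    down = sig[sig["log2FC"] < 0].nsmallest(top_n, "log2FC")
    return res, up, down


# Toy data: 50 genes, 20 tumor and 10 normal samples (simulated, not TCGA).
rng = np.random.default_rng(0)
n_t, n_n = 20, 10
values = rng.normal(5.0, 0.3, size=(50, n_t + n_n))
values[0, :n_t] += 3.0  # gene0 raised in tumor samples
values[1, :n_t] -= 3.0  # gene1 lowered in tumor samples
columns = [f"T{i}" for i in range(n_t)] + [f"N{i}" for i in range(n_n)]
log_expr = pd.DataFrame(values, index=[f"gene{i}" for i in range(50)], columns=columns)
is_tumor = np.array([c.startswith("T") for c in columns])

res, up, down = call_degs(log_expr, is_tumor)
print(up.index.tolist(), down.index.tolist())  # ['gene0'] ['gene1']
```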
Functional enrichment analysis
From the initially selected top 100 upregulated and 100 downregulated DEGs, 85 upregulated and 94 downregulated genes were retained for enrichment analysis after removal of genes lacking valid annotation or a recognized gene symbol in the enrichment database. The two gene sets were analyzed separately using Enrichr to identify enriched Gene Ontology terms (GO: Biological Process (BP), Molecular Function (MF), and Cellular Component (CC)) and Reactome 2024 pathways [19]. Only results with FDR < 0.05 were considered significant. Enrichment results were visualized as bar plots of −log₁₀(FDR) for comparative interpretation.
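Enrichr reports p-values from Fisher's exact test, which for an over-representation question equals the upper tail of the hypergeometric distribution. The sketch below computes that tail for a single term with SciPy and cross-checks it against a one-sided Fisher's exact test. The background size of 20,000 genes, the term size, and the overlap count are hypothetical numbers chosen for illustration, and Enrichr's own background and adjustment details may differ.

```python
import numpy as np
from scipy.stats import fisher_exact, hypergeom


def ora_pvalue(overlap, n_list, n_term, n_background):
    """One-sided over-representation p-value, P(X >= overlap).

    X ~ Hypergeometric(population=n_background, successes=n_term, draws=n_list).
    """
    return hypergeom.sf(overlap - 1, n_background, n_term, n_list)


# Hypothetical numbers: 85 query genes, a 50-gene term, 20,000-gene background.
p = ora_pvalue(overlap=5, n_list=85, n_term=50, n_background=20_000)

# Same question as a 2x2 table: rows = in list / not in list,
# columns = in term / not in term.
table = [[5, 85 - 5], [50 - 5, 20_000 - 85 - 50 + 5]]
_, p_fisher = fisher_exact(table, alternative="greater")

# -log10 is the scale used for the enrichment bar plots.
print(p, p_fisher, -np.log10(p))
```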
Protein-protein interaction (PPI) network analysis
Two PPI networks were constructed using STRING in Cytoscape v3.10: one from ~6,500 DEGs (split into four subsets) and another from the top 100 up- and downregulated genes [20,21]. Networks were generated with a confidence cutoff of 0.15. This threshold was selected to preserve sufficient interaction coverage in a large transcriptome-derived gene set. STRING integrates multiple evidence channels, including experimentally validated interactions, curated databases, text mining, co-expression patterns, and computational predictions, each contributing probabilistic scores to the combined interaction confidence. Applying stricter thresholds can substantially reduce network connectivity by excluding interactions supported by indirect or predictive evidence, which are common for transcriptional regulators and less-characterized proteins. Because the present analysis aimed to capture the broader interaction landscape among DEGs, a permissive cutoff was used to avoid excessive network fragmentation. To mitigate potential sensitivity to individual interaction scores, hub gene identification was performed using multiple complementary topological metrics (MCC, DMNC, MNC, and Degree) in cytoHubba, and genes consistently ranked among the top-5 across these metrics were considered high‑confidence hubs [22].
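The consensus hub step can be illustrated with a small pure-Python sketch. It assumes STRING edges exported as (protein A, protein B, combined score) tuples on STRING's 0–1 scale, where 0.15 corresponds to STRING's "low confidence" level. It implements three of the four cytoHubba metrics (Degree, MNC, and MCC, following the published definitions as we understand them) and treats a "consistent" hub as a node ranked in the top k of every metric; DMNC is omitted, and all function names and the toy edges are hypothetical. This is not cytoHubba's implementation.

```python
from collections import deque
from math import factorial


def build_graph(edges, min_score=0.15):
    """Undirected adjacency sets from (a, b, score) tuples, dropping weak edges."""
    adj = {}
    for a, b, score in edges:
        if a == b or score < min_score:
            continue
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def maximal_cliques(adj):
    """Basic Bron-Kerbosch enumeration (no pivoting; adequate for small graphs)."""
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            cliques.append(r)
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques


def mnc(adj, v):
    """Largest connected component size in the subgraph induced by v's neighbours."""
    nbrs, seen, best = adj[v], set(), 0
    for start in nbrs:
        if start in seen:
            continue
        seen.add(start)
        queue, size = deque([start]), 0
        while queue:
            u = queue.popleft()
            size += 1
            for w in adj[u] & nbrs:
                if w not in seen:
                    seen.add(w)
                    queue.append(w)
        best = max(best, size)
    return best


def consensus_hubs(adj, k=5):
    """Nodes that rank in the top k for every metric (Degree, MNC, MCC)."""
    cliques = maximal_cliques(adj)
    metrics = {
        "Degree": {v: len(adj[v]) for v in adj},
        "MNC": {v: mnc(adj, v) for v in adj},
        # MCC(v): sum of (|C| - 1)! over maximal cliques C that contain v.
        "MCC": {v: sum(factorial(len(c) - 1) for c in cliques if v in c) for v in adj},
    }
    top_sets = []
    for scores in metrics.values():
        ranked = sorted(scores, key=lambda v: (-scores[v], v))  # ties: alphabetical
        top_sets.append(set(ranked[:k]))
    return set.intersection(*top_sets)


# Toy STRING-style edges (hypothetical proteins and scores).
edges = [("A", "B", 0.9), ("A", "C", 0.8), ("A", "D", 0.7), ("A", "E", 0.4),
         ("A", "F", 0.2), ("B", "C", 0.5), ("C", "D", 0.3), ("G", "H", 0.6),
         ("A", "G", 0.1)]  # the A-G edge falls below 0.15 and is removed
adj = build_graph(edges)
print(sorted(consensus_hubs(adj, k=2)))  # ['A', 'C']
```

With real networks, tied scores are common, so the tie-breaking rule (alphabetical here) can change which genes enter the top five.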
Cluster analysis of the network
To identify functional modules within the ~6,500-gene PPI network, we applied the IPCA algorithm in the CytoCluster app (v2.1.0) for Cytoscape [23] with TinThreshold = 0.9, ComplexSize ≥ 10, and PathLength = 2. This yielded 28 clusters. The top four clusters (ranked by density) were analyzed for pathway enrichment using Reactome via Enrichr, focusing on the top 20 significant pathways per cluster.
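Ranking clusters by density can be sketched as below, assuming density is the usual fraction of possible edges present, 2E / (N(N − 1)), within each cluster's induced subgraph; whether CytoCluster uses exactly this definition is an assumption. IPCA itself is not reimplemented here, and the function names and toy clusters are hypothetical.

```python
def cluster_density(nodes, adj):
    """Fraction of possible edges present: 2E / (N(N - 1))."""
    nodes = set(nodes)
    n = len(nodes)
    if n < 2:
        return 0.0
    # Each internal edge is counted from both endpoints, hence the // 2.
    e = sum(len(adj.get(v, set()) & nodes) for v in nodes) // 2
    return 2 * e / (n * (n - 1))


def top_clusters(clusters, adj, top=4, min_size=10):
    """Keep clusters with >= min_size nodes and return the `top` densest."""
    eligible = [c for c in clusters if len(c) >= min_size]
    return sorted(eligible, key=lambda c: cluster_density(c, adj), reverse=True)[:top]


# Toy graph: a triangle (density 1.0) and a 3-node path (density 2/3).
adj = {"A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B"},
       "X": {"Y"}, "Y": {"X", "Z"}, "Z": {"Y"}}
clusters = [{"X", "Y", "Z"}, {"A", "B", "C"}]
print(top_clusters(clusters, adj, top=1, min_size=3))
```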
UALCAN-based clinicopathological expression and survival analysis
Expression of the 13 hub genes in TCGA-STAD was assessed using the UALCAN portal (https://ualcan.path.uab.edu). TPM values were retrieved from the cancer stage, tumor grade, nodal metastasis status, and survival modules, with normal gastric tissue as the reference. For differential expression across clinical subgroups, UALCAN applies Welch’s unpaired t-test. Survival analyses were also performed in UALCAN, which stratifies patients into expression-based groups according to dataset-specific quantile thresholds. Kaplan-Meier survival curves were generated to evaluate prognostic associations, and the raw p-values were subsequently corrected using the Benjamini-Hochberg FDR method [24].
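The statistics UALCAN is described as applying can be approximated locally as follows. This is a sketch, not UALCAN's code: the helper names are hypothetical, and splitting at the 75th percentile (high vs. low/intermediate) is an assumed cut-off that is roughly consistent with the group sizes reported for ADIPOQ (100 vs. 292) but has not been verified against the portal. The Kaplan–Meier function is the standard product-limit estimator.

```python
import numpy as np
from scipy.stats import ttest_ind


def welch_pvalue(group, reference):
    """Two-sided Welch's unpaired t-test (no equal-variance assumption)."""
    return ttest_ind(group, reference, equal_var=False).pvalue


def high_expression_mask(expr, quantile=0.75):
    """True for samples above the chosen quantile (assumed cut-off)."""
    expr = np.asarray(expr, dtype=float)
    return expr > np.quantile(expr, quantile)


def kaplan_meier(times, events):
    """Product-limit survival estimate at each distinct event time.

    times: follow-up times; events: 1 = death observed, 0 = censored.
    """
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=int)
    event_times = np.unique(times[events == 1])
    survival, s = [], 1.0
    for t in event_times:
        at_risk = np.sum(times >= t)  # patients still under follow-up at t
        deaths = np.sum((times == t) & (events == 1))
        s *= 1.0 - deaths / at_risk
        survival.append(s)
    return event_times, np.array(survival)


# Five hypothetical patients; the third and fifth are censored.
t, s = kaplan_meier([1, 2, 3, 4, 5], [1, 1, 0, 1, 0])
print(t, s)
```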
External validation of the hub genes
To externally validate representative key dysregulated genes identified in the TCGA-STAD cohort, the gastric cancer microarray dataset GSE54129 was retrieved from the Gene Expression Omnibus (GEO) database. Differential expression analysis between gastric cancer and normal gastric tissues was performed using the GEO2R web tool, which applies the limma package. Representative hub genes from the Top200 DEG network were evaluated based on expression direction, log fold change, and adjusted p-value. For genes represented by multiple probes, the probe with the lowest adjusted p-value was retained [25].
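The probe-collapsing rule and a concordance check against TCGA-STAD can be sketched with pandas. The column names (`Gene.symbol`, `logFC`, `adj.P.Val`) are assumed to follow GEO2R's exported table, and the gene labels and values in the toy example are hypothetical placeholders, not results from GSE54129.

```python
import pandas as pd


def best_probe_per_gene(geo2r: pd.DataFrame) -> pd.DataFrame:
    """For each gene symbol, keep the probe with the lowest adjusted p-value."""
    annotated = geo2r.dropna(subset=["Gene.symbol"]).reset_index(drop=True)
    best_rows = annotated.groupby("Gene.symbol")["adj.P.Val"].idxmin()
    return annotated.loc[best_rows].set_index("Gene.symbol")


def concordance(tcga_log2fc: pd.Series, geo: pd.DataFrame, alpha=0.05) -> pd.DataFrame:
    """Same direction of change in both cohorts, and significant in GEO."""
    out = tcga_log2fc.rename("tcga_log2FC").to_frame()
    out = out.join(geo[["logFC", "adj.P.Val"]], how="left")  # NaN if gene absent
    out["concordant"] = (out["tcga_log2FC"] * out["logFC"]) > 0
    out["validated"] = out["concordant"] & (out["adj.P.Val"] < alpha)
    return out


# Hypothetical GEO2R-style rows; GENE_A has two probes.
geo2r = pd.DataFrame({
    "ID": ["p1", "p2", "p3", "p4"],
    "Gene.symbol": ["GENE_A", "GENE_A", "GENE_B", "GENE_C"],
    "logFC": [1.2, 0.8, -3.0, 0.3],
    "adj.P.Val": [0.001, 0.02, 1e-6, 0.4],
})
geo = best_probe_per_gene(geo2r)
tcga = pd.Series({"GENE_A": 2.0, "GENE_B": -5.0, "GENE_C": -2.0, "GENE_D": -4.0})
res = concordance(tcga, geo)
print(res)
```

A gene missing from the GEO output (GENE_D here) ends up neither concordant nor validated, which mirrors how an unavailable gene is reported separately rather than counted as a failure or a success.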
The processed dataset used in this study is publicly available as described in the Data Availability section.
Results
Identification of DEGs in GC
RNA-seq data from 448 gastric samples (412 tumor, 36 normal) were compared using the Mann–Whitney U test to identify DEGs. Applying thresholds of |log2FC| ≥ 1 and FDR < 0.05 yielded 9,896 DEGs (9,592 upregulated, 304 downregulated). To improve reliability, we retained only genes with average expression ≥ 1 in both groups. From this filtered list, the top 100 upregulated and top 100 downregulated DEGs (ranked by absolute log2FC) were selected for downstream analysis. Gene IDs were converted to official symbols using the MyGene.info API.
Network reconstruction and analysis
To distinguish therapeutic targets from diagnostic/prognostic biomarkers, we constructed two PPI networks from TCGA-STAD DEGs. The genome-scale network (~6,500 DEGs) mapped global GC interactions and, using cytoHubba (four topology algorithms), identified highly central hubs, including histone variants (H3C12, H3-4), transcription factors (PAX2, HOXD11/13, HOXC12), signaling molecules (PRKACG, FGF8), and immune mediators (IFNG, IFNL2), as candidate drug targets because perturbing them could disrupt broad tumor-promoting circuitry (Table 1).
A second, focused network built from the top 200 DEGs (100 up, 100 down) yielded 93 nodes and 318 edges (Fig 1) and highlighted 13 hubs (HOXC8, HOXC9, HOXA11, HOXC11, HOXA13, HIST1H3J (H3C12), WT1, ADIPOQ, GCG, VSTM2A, ATP4A, KCNE2, PTF1A) with both strong expression shifts and network centrality, supporting their potential as diagnostic/prognostic biomarkers (Table 2). This dual-network strategy separates broadly actionable targets from clinically tractable biomarkers by integrating systems-level topology with differential expression magnitude.
Constructed using STRING in Cytoscape (confidence score: 0.15).
Functional enrichment analysis
Functional enrichment analysis was performed to characterize the biological roles of the identified up- and downregulated genes. GO terms and pathway associations were analyzed separately for the two gene sets to reveal their distinct functional profiles (Figs 2 and 3).
Top GO terms (BP, CC, MF) and Reactome pathways enriched among 85 upregulated GC genes, ranked by −log₁₀(FDR) using Enrichr.
Top GO terms and Reactome pathways for 94 downregulated GC genes, ranked by −log₁₀(FDR) using Enrichr.
Upregulated genes (n = 85) were enriched for embryonic/developmental programs and immune differentiation (e.g., morphogenesis terms and myeloid dendritic cell activation), along with transcriptional regulation (sequence-specific DNA binding) and multiple solute/anion transport activities. Reactome analysis similarly highlighted organic anion and vitamin/nucleoside transport, kidney developmental pathways, and drug-metabolism modules (e.g., ciprofloxacin/atorvastatin ADME, bile acid metabolism), supporting a phenotype of developmental reactivation, transcriptional control, and altered transport/metabolic capacity.
Downregulated genes were predominantly enriched for ion homeostasis and membrane transport, particularly sodium/potassium handling (transmembrane transport, export/import, membrane repolarization), and localized to vesicular/endosomal–Golgi compartments and transport complexes (e.g., clathrin-coated vesicles, Na⁺/K⁺-ATPase complexes). Functional terms also indicated reduced channel and transporter activities, with Reactome showing decreased innate immune defense (defensins/antimicrobial peptides), aquaporin and ion channel transport, and neuroendocrine signaling pathways (acetylcholine release, incretin/glucagon signaling), consistent with loss of normal physiological and immune functions.
Overall, GC displayed a shift toward developmental reprogramming, transcriptional activation, and altered solute/drug metabolism (upregulated genes), alongside suppression of epithelial ion/transport homeostasis and innate immunity (downregulated genes), underscoring tissue remodeling and functional derangement relevant to biomarker and target discovery.
Cluster analysis and reactome pathway enrichment
To gain deeper insight into the modular organization of the gene network and to identify distinct biological themes, we performed Reactome pathway enrichment on the top four IPCA clusters. These clusters were mainly enriched for FGFR signaling and its canonical downstream pathways, developmental biology, and transcriptional regulation, indicating that they are biologically coherent and functionally specialized (Fig 4 and Table 3).
The presence of a pathway in a given cluster is indicated in blue, and absence is shown in white. Data are based on the top 20 enriched Reactome pathways for each of the top four PPI network clusters identified using the IPCA algorithm in CytoCluster.
Across all clusters, FGFR signaling emerged as a consistent hallmark, including ligand-driven activation of FGFR1–4 and isoforms (FGFR1c/2c/3b/3c), alongside evidence of oncogenic FGFR2 mutant activation and regulatory counterbalance via negative modulators (e.g., FGFRL1). Enrichment extended to the major downstream cascades, namely SHC/FRS-mediated, PLC, and PI3K/IRS signaling, implicating proliferation, survival, motility, and metabolic regulation. Developmental programs were also reactivated (e.g., gastrulation, kidney/nephron and ureteric bud development, neural plate formation), suggesting increased cellular plasticity and possible EMT-related microenvironmental shifts. Additional cluster-specific themes included disrupted transcriptional control (early pancreatic precursor programs; SLIT/ROBO regulation) and matrix remodeling via metalloproteinase activation. Overall, the IPCA clusters define a coherent network in which FGFR signaling links developmental reprogramming, intracellular signaling, transcriptional regulation, and invasive behavior in gastric cancer.
Analysis of tumor grade-specific gene expression and potential biomarker utility
Gene expression analysis across tumor grades in stomach adenocarcinoma (STAD) uncovered a dynamic transcriptional landscape, with several genes exhibiting significant and grade-specific expression changes (Fig 5). These alterations offer insight into molecular mechanisms underlying tumor progression and differentiation, and highlight potential diagnostic and prognostic biomarkers.
Boxplots show transcript per million (TPM) values across normal and cancer grades 1-3 (TCGA data via UALCAN).
Grade-stratified analysis showed increasing expression of HOXA11/HOXA13/HOXC8/HOXC9/HOXC11 and HIST1H3J from Grade 1 to Grades 2–3, consistent with developmental/epigenetic reprogramming. In contrast, gastric lineage markers (ATP4A, KCNE2, PTF1A, VSTM2A) were strongly downregulated early and remained low across grades, indicating sustained loss of epithelial identity. WT1 rose mainly in Grades 2–3, while ADIPOQ was repressed early with modest recovery and GCG declined gradually. Together, these patterns support a panel where HOX/HIST1H3J (±WT1) reflect progression, and ATP4A/KCNE2/PTF1A/VSTM2A (±ADIPOQ) mark early dedifferentiation.
Expression of genes in STAD based on nodal metastasis status and biomarker potential
When TCGA-STAD samples were examined according to nodal metastasis stage (N0–N3), a clear and recurring alteration appeared among the 13 studied genes, pointing to their possible value as biomarkers for identifying and tracking disease (Fig 6).
Boxplots represent gene expression levels in different nodal stages (N0–N3) obtained from TCGA-STAD data.
Early Detection and Diagnostic Biomarkers: The HOX cluster genes (HOXA11, HOXA13, HOXC8, HOXC9, and HOXC11), together with HIST1H3J, showed marked and statistically strong overexpression across every nodal category compared with normal gastric tissue (most with P ≤ 10⁻⁹). The elevated expression was already present in node-negative tumors, pointing to early engagement of these genes in tumor development rather than secondary activation during metastasis. Their persistent activity may therefore serve as an early indicator of primary tumors that have not yet spread to the lymphatic system.
Progression and Metastatic Biomarkers: Although HOX gene and HIST1H3J expression did not increase progressively with higher nodal stage (N1–N3), certain genes showed patterns suggestive of progression markers. WT1 expression was sharply elevated in N0 and N1 tumors (P around 10⁻⁶–10⁻⁷) and rose further by the N3 stage, which may point to a role in promoting or tracking advanced nodal spread. The smaller but steady changes seen for KCNE2 and PTF1A across the same stages might reflect the gradual erosion of differentiation as tumors become more metastatic, an observation that deserves closer study for its potential prognostic value.
Loss of Differentiation Markers: Gastric lineage genes (ATP4A, KCNE2, PTF1A, VSTM2A) and ADIPOQ were markedly downregulated in both node-negative and metastatic tumors (all P < 0.01 vs normal), indicating early, sustained dedifferentiation and supporting their use as negative diagnostic markers independent of nodal status. GCG showed no nodal-group differences and may be excluded from the core biomarker set. Overall, the data support a two-tier diagnostic framework of early activation markers (HOX-cluster genes, HIST1H3J) and persistent loss-of-identity markers (ATP4A, KCNE2, PTF1A, VSTM2A, ADIPOQ), complemented by WT1 as a potential progression marker associated with nodal spread.
Tumor stage–specific gene expression and implications for biomarker discovery in STAD
Across tumor stages in STAD, certain genes stood out for how sharply their expression shifted, patterns that could eventually guide biomarker discovery and treatment design. Several genes, among them HOXA11, HOXA13, HOXC8, HOXC9, HOXC11, HIST1H3J, WT1, and the differentiation markers ATP4A, KCNE2, VSTM2A, and PTF1A, showed stage-specific shifts in activity, suggesting value for classifying tumors or refining clinical decisions (Fig 7).
Boxplots show transcript per million (TPM) values across normal tissue and cancer stages I–IV using TCGA-STAD data via UALCAN.
Across STAD stages, HIST1H3J and multiple HOX-cluster genes are consistently upregulated (often P < 10⁻¹²), with early onset and sustained expression, suggesting roles in tumor establishment and maintenance and potential utility as early molecular markers. In contrast, the gastric differentiation genes ATP4A, KCNE2, VSTM2A, and PTF1A are persistently downregulated across all stages relative to normal tissue, indicating stable loss of parietal/ductal identity and supporting their use as diagnostic markers of malignant transformation. WT1 expression increases with stage, reaching stronger significance in advanced disease, consistent with a progression-associated biomarker. Together, these patterns define a stable STAD signature of early HOX/HIST1H3J activation with sustained repression of gastric identity genes, while WT1 may aid risk stratification and monitoring.
Survival analysis of candidate genes in STAD
We used Kaplan-Meier analysis to examine how the 13 candidate genes relate to patient survival in STAD. Among these genes, ADIPOQ showed the lowest nominal p-value (raw p = 0.012), whereas the remaining genes exhibited no apparent association with overall survival (all p > 0.05) (Fig 8). For ADIPOQ, patients were stratified by expression level (high, n = 100 vs. low/intermediate, n = 292). The Kaplan-Meier curve showed a trend toward poorer survival in the high-expression group, suggesting a potential negative prognostic pattern. This observation is biologically plausible given the context-dependent, inflammation-modulating roles of adiponectin in the tumor microenvironment. However, this interpretation should be considered exploratory. Survival analyses for all 13 genes were performed using the TCGA-STAD dataset via the UALCAN platform, and the raw p-values were corrected for multiple comparisons using the Benjamini-Hochberg false discovery rate (FDR) method. After correction, the association for ADIPOQ was no longer statistically significant (raw p = 0.012; FDR = 0.156), and none of the other genes reached significance (all FDR values > 0.25).
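The reported adjustment for ADIPOQ can be checked by hand. Under the Benjamini–Hochberg procedure with m tested genes and raw p-values sorted in ascending order, the adjusted value for the i-th ranked gene is the minimum of m·p₍ₖ₎/k over all ranks k ≥ i. With m = 13 and ADIPOQ ranked first (raw p = 0.012):

```latex
q_{(i)} = \min_{k \ge i} \frac{m\, p_{(k)}}{k}, \qquad
q_{(1)}^{\mathrm{ADIPOQ}} = \min\!\left( \frac{13 \times 0.012}{1},\; \min_{k \ge 2} \frac{13\, p_{(k)}}{k} \right) \le 0.156 .
```

The reported FDR of 0.156 equals the rank-1 term, so none of the terms for larger p-values fell below it, and the adjusted value lies well above the 0.05 threshold.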
Kaplan-Meier survival curves show overall survival differences between patient groups stratified by ADIPOQ expression levels based on TCGA-STAD data.
External validation of the hub genes
To further evaluate the robustness of our findings, hub genes identified from the Top200 DEG network were externally validated using the independent GEO gastric cancer dataset GSE54129. As shown in Table 4, several key genes demonstrated concordant and statistically significant differential expression patterns compared with the TCGA-STAD cohort. Among the upregulated genes, HOXC9, HOXA11, HOXC11, HOXA13, and WT1 remained significantly overexpressed in gastric cancer tissues. Among the downregulated genes, VSTM2A, KCNE2, and ATP4A remained significantly decreased in the validation cohort. In addition, HOXC8 and GCG showed concordant expression trends but did not reach statistical significance. In contrast, H3C12 and ADIPOQ did not demonstrate concordant regulation in the external dataset, while PTF1A was not available in the queried GEO output. Overall, most prioritized genes showed consistent expression trends across independent datasets, supporting the reproducibility and robustness of the present findings. The complete GEO2R differential expression results for the GSE54129 dataset are provided in S2 Table.
Discussion
In this work, we took a broad look at the STAD transcriptome to map gene expression changes and their biological and clinical implications. By combining the full set of roughly 6,500 DEGs with a focused analysis of the top 200, we identified two distinct gene modules. The first includes histone genes, immune regulators, and kinases that occupy central positions in the protein–protein interaction network, making them attractive targets for therapeutic disruption. The second includes transcription factors, epithelial genes, and metabolic regulators that show striking expression differences, making them appealing candidates for diagnostic and prognostic markers.
We found that HOXA11, HOXA13, HOXC8, HOXC9, and HOXC11 were consistently upregulated across tumor grades, stages, and nodal groups, a result that fits well with earlier evidence of HOX gene reactivation in gastrointestinal cancers [26]. Their expression tended to climb as tumors became less differentiated and more advanced, most noticeably in the aggressive forms. This steady rise links them to both loss of cellular identity and increasing malignancy. The histone gene HIST1H3J showed a comparable pattern, with levels rising step by step with tumor grade and stage, in line with reports of broad epigenetic remodeling [27]. These genes appear to switch on early in tumor formation and remain active as the disease develops, potentially sustaining transcriptional programs that support growth and survival.
Conversely, the downregulation of gastric lineage markers (ATP4A, KCNE2, VSTM2A, PTF1A) from early stages supports the idea that dedifferentiation is an initiating event in gastric tumorigenesis [28,29]. Our observations agree with earlier reports showing that genes tied to acid secretion and epithelial differentiation are already dampened in early-stage gastric cancer. In particular, several studies have noted reduced expression of ATP4A and ATP4B, both of which have been examined as potential diagnostic markers [28]. ADIPOQ also showed dynamic regulation, with an early decrease followed by partial reactivation in advanced tumors; recent work suggests that low adiponectin levels or expression correlate with more aggressive GC features [30]. In our survival analysis, high ADIPOQ expression was nominally associated with poorer outcomes, which would be consistent with a context-dependent prognostic role [30]. However, this association did not withstand Benjamini-Hochberg FDR correction across the 13 tested genes, indicating that the raw p-value likely reflects a chance finding arising from multiple hypothesis testing rather than a robust, independent prognostic signal. The direction of the initial trend (overall downregulation of ADIPOQ in tumors, yet poorer survival with higher expression within the tumor cohort) therefore warrants cautious interpretation. This apparent discrepancy may reflect the inherent heterogeneity of bulk transcriptomic data. Because TCGA samples contain both malignant cells and various stromal components, measured ADIPOQ levels may reflect the contribution of non-epithelial elements of the tumor microenvironment rather than the cancer cells themselves.
In such a scenario, a relative increase in ADIPOQ expression could be associated with a more prominent stromal presence or a chronic inflammatory state, both of which are known factors in tumor progression. Nevertheless, because the association did not remain significant after multiple-testing adjustment, these observations should be interpreted strictly as exploratory and hypothesis-generating. Further investigation using single-cell sequencing or stromal-epithelial deconvolution methods would be needed to clarify whether this gene plays a meaningful role in the gastric cancer microenvironment.
Our enrichment analysis pointed to a set of molecular patterns that mirror what other transcriptomic studies have described in gastric cancer. The genes showing higher expression were mostly involved in transcriptional control, immune signaling, and developmental processes, similar to earlier reports describing a reactivation of embryonic and inflammatory pathways during tumor growth [31]. By contrast, the genes showing reduced activity were mostly linked to ion transport, vesicle movement, and antimicrobial defense. Their decline mirrors the loss of normal gastric cell function that tends to accompany malignant transformation [32]. Overall, the data suggest a broad reorganization of the tumor transcriptome, shifting away from the physiological roles of the stomach toward a more adaptable, development-like and metabolically flexible state, a trend that recent single-cell and spatial transcriptomic analyses of gastric and gastrointestinal cancers have also captured [31,32]. Our cluster-based pathway enrichment further emphasized the centrality of FGFR signaling and its downstream cascades. This finding is in agreement with recent clinical and preclinical studies demonstrating FGFR dysregulation as a driver of oncogenic transcriptional programs in gastric cancer [33,34].
Although the normal (n = 36) and tumor (n = 412) groups differed substantially in size, preprocessing and normalization were applied uniformly across all samples, and the rank-based Mann-Whitney U test accommodates unequal group sizes and does not assume normally distributed expression values. The observed predominance of upregulated genes is therefore unlikely to be a normalization artifact and is instead consistent with the extensive transcriptional activation characteristic of gastric cancer biology.
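Because this argument rests on the rank-based nature of the Mann-Whitney U test, a minimal pure-Python sketch may help. It computes U from pooled average ranks, with both sample sizes entering the null mean and tie-corrected variance, derives a two-sided p-value from the normal approximation, and then shows that a strictly monotone transform such as log2(x + 1) leaves U and the p-value unchanged. The group sizes mirror the 36 normal and 412 tumor samples, but the expression values are simulated and the helper names are hypothetical; this is not the study's analysis code.

```python
import math
import random


def average_ranks(values):
    """Assign 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Return (U for x, two-sided normal-approximation p-value)."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = list(x) + list(y)
    ranks = average_ranks(pooled)
    u = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    # Tie-corrected variance of U under the null hypothesis
    tie_sizes = {}
    for v in pooled:
        tie_sizes[v] = tie_sizes.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in tie_sizes.values())
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    z = (u - n1 * n2 / 2) / math.sqrt(var_u)
    return u, math.erfc(abs(z) / math.sqrt(2))


# Simulated integer counts; group sizes match the text, values do not
rng = random.Random(0)
normal = [rng.randint(0, 500) for _ in range(36)]
tumor = [rng.randint(200, 1000) for _ in range(412)]

u_raw, p_raw = mann_whitney_u(tumor, normal)
u_log, p_log = mann_whitney_u([math.log2(v + 1) for v in tumor],
                              [math.log2(v + 1) for v in normal])
print(u_raw == u_log, p_raw == p_log)  # True True
```

Since only the ordering of values enters U, any uniform monotone rescaling of expression values (such as a log transform applied to all samples) cannot by itself change the test outcome.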
A key strength of this study is its layered design, combining stringent DEG filtering with network modeling, functional annotation, and clinicopathological interpretation to identify clinically relevant targets. Our findings are consistent with recent bulk and single-cell GC studies reporting metabolic reprogramming, immune microenvironment changes, and loss of gastric differentiation [7,35]. Limitations include reliance on TCGA-STAD only, lack of experimental validation, and the inability of transcriptomics to capture post-transcriptional/post-translational regulation; future work should validate results in independent cohorts and integrate additional omics layers.
Going forward, it will be important to test how well these biomarkers perform in other patient cohorts and whether they can be tracked in more accessible materials such as blood or gastric fluid. Perturbation experiments targeting HOX genes, WT1, and FGFR-pathway effectors in cell lines or organoids should also help clarify their functional roles in tumor biology. In addition, the relationship between ADIPOQ expression and the immune microenvironment warrants investigation to clarify its potential dual role in metabolism and immunoregulation.
In summary, our study delineates two functionally distinct gene modules in gastric cancer: a regulatory core enriched in HOX and histone genes that may be exploited for early diagnosis, and a set of suppressed differentiation genes marking the loss of gastric identity. These results outline a molecular framework that could guide biomarker-based patient classification and help shape new therapeutic strategies in STAD. They also illustrate how combining network-level analysis with clinical data can support more precise, biomarker-guided cancer care.
Conclusion
Using an integrated transcriptomic and network-based framework (differential expression, PPI mapping, hub ranking, and pathway enrichment), we identified a coordinated STAD signature marked by upregulation of developmental/immune programs and repression of epithelial differentiation and transport genes. Dual-network analysis prioritized a focused biomarker set (HOX genes, HIST1H3J, ATP4A, KCNE2, and PTF1A) with potential utility for early detection, monitoring, and risk stratification, and highlighted actionable pathways relevant to precision oncology. Key limitations are reliance on TCGA-STAD data and lack of experimental validation; future work should validate these markers in independent cohorts and test mechanisms in vitro/in vivo, ideally integrating proteomic/epigenomic and spatial transcriptomic layers to refine heterogeneity and clinical translation.
Supporting information
S1 Fig. Schematic overview of the study workflow.
The workflow comprises data acquisition from TCGA-STAD, preprocessing, differential expression analysis, functional enrichment, PPI network construction, hub gene identification, clustering, and external validation.
https://doi.org/10.1371/journal.pone.0344143.s001
(TIF)
S1 File. Complete list of differentially expressed genes (DEGs) identified from TCGA-STAD RNA-seq analysis.
The table lists gene symbols and log2 fold changes (log2FC) for all 6500 genes.
https://doi.org/10.1371/journal.pone.0344143.s002
(RAR)
S1 Table. Top 100 upregulated and top 100 downregulated genes identified from TCGA-STAD analysis after applying significance thresholds and expression filtering criteria.
The table lists gene symbols and log2 fold changes (log2FC) for the top 200 DEGs.
https://doi.org/10.1371/journal.pone.0344143.s003
(XLSX)
S2 Table. Complete differential expression results from GEO2R analysis of the GSE54129 dataset.
The table includes probe identifiers, corresponding gene symbols, log2 fold change (log2FC), and statistical significance values (adjusted p-values/FDR) for all analyzed genes.
https://doi.org/10.1371/journal.pone.0344143.s004
(XLSX)
References
- 1. Chen Y, Jia K, Xie Y, Yuan J, Liu D, Jiang L, et al. The current landscape of gastric cancer and gastroesophageal junction cancer diagnosis and treatment in China: a comprehensive nationwide cohort analysis. J Hematol Oncol. 2025;18(1):42. pmid:40234884
- 2. Mousavi SE, Ilaghi M, Elahi Vahed I, Nejadghaderi SA. Epidemiology and socioeconomic correlates of gastric cancer in Asia: results from the GLOBOCAN 2020 data and projections from 2020 to 2040. Sci Rep. 2025;15(1):6529. pmid:39988724
- 3. Lin J-L, Lin J-X, Lin G-T, Huang C-M, Zheng C-H, Xie J-W, et al. Global incidence and mortality trends of gastric cancer and predicted mortality of gastric cancer by 2035. BMC Public Health. 2024;24(1):1763. pmid:38956557
- 4. Shin WS, Xie F, Chen B, Yu P, Yu J, To KF, et al. Updated Epidemiology of Gastric Cancer in Asia: Decreased Incidence but Still a Big Challenge. Cancers (Basel). 2023;15(9):2639. pmid:37174105
- 5. Zhang Z, Wang J, Song N, Shi L, Du J. The global, regional, and national burden of stomach cancer among adolescents and young adults in 204 countries and territories, 1990–2019: A population-based study. Front Public Health. 2023;11:1079248.
- 6. Zhang H, Yang W, Tan X, He W, Zhao L, Liu H, et al. Long-term relative survival of patients with gastric cancer from a large-scale cohort: a period-analysis. BMC Cancer. 2024;24(1):1420. pmid:39558281
- 7. Lin X, Yang P, Wang M, Huang X, Wang B, Chen C, et al. Dissecting gastric cancer heterogeneity and exploring therapeutic strategies using bulk and single-cell transcriptomic analysis and experimental validation of tumor microenvironment and metabolic interplay. Front Pharmacol. 2024;15:1355269. pmid:38962317
- 8. Mottaghi-Dastjerdi N, Ghorbani A, Montazeri H, Guzzi PH. A systems biology approach to pathogenesis of gastric cancer: gene network modeling and pathway analysis. BMC Gastroenterol. 2023;23(1):248. pmid:37482618
- 9. Khoshdel F, Mottaghi-Dastjerdi N, Yazdani F, Salehi S, Ghorbani A, Montazeri H, et al. CTGF, FN1, IL-6, THBS1, and WISP1 genes and PI3K-Akt signaling pathway as prognostic and therapeutic targets in gastric cancer identified by gene network modeling. Discov Oncol. 2024;15(1):344. pmid:39133458
- 10. Rahimi-Farsi N, Shahbazi T, Ghorbani A, Mottaghi-Dastjerdi N, Yazdani F, Mohseni P, et al. Network-based analysis of candidate oncogenes and pathways in hepatocellular carcinoma. Biochem Biophys Rep. 2025;43:102086. pmid:40546347
- 11. Alvani A, Mottaghi-Dastjerdi N, Gholami A, Ghorbani A, Pazhoohesh Z, Pajdam M, et al. Identification of indirect pathways enhancing the biocompatibility of DOX/GO/Fe3O4 nanomaterials in Glioblastoma: Gene network modeling and pathway analysis. Biochem Biophys Res Commun. 2025;773:152077. pmid:40440993
- 12. Rahimi-Farsi N, Ghorbani A, Mottaghi-Dastjerdi N, Shahbazi T, Bostanian F, Mohseni P, et al. Comprehensive systems biology analysis of microRNA-101-3p regulatory network identifies crucial genes and pathways in hepatocellular carcinoma. J Genet Eng Biotechnol. 2025;23(1):100471. pmid:40074445
- 13. Rahimi H, Farzadifar E, Mottaghi-Dastjerdi N, Ghorbani A, Amerizadeh F, Pasdar A. Integrative Systems Biology Analysis of MicroRNA Regulation in Wnt Signaling Pathway During Breast Cancer Progression. Int J Cancer Manag. 2025;18(1).
- 14. Saeidi M, Pasdar A, Rahmani F, Ghorbani A, Mottaghi N, Amerizadeh F. A Prospective to Regulatory Role of miRNAs on Wnt/β-catenin Signaling and Its Crosstalk to the Other Cellular Pathways in Tumorigenesis of Glioblastoma by a Systems Biology Approach. Int J Cancer Manag. 2025;18(1).
- 15. Yazdani F, Mottaghi-Dastjerdi N, Shahbazi B, Ahmadi K, Ghorbani A, Soltany-Rezaee-Rad M, et al. Identification of key genes and pathways involved in T-DM1-resistance in OE-19 esophageal cancer cells through bioinformatics analysis. Heliyon. 2024;10(18):e37451. pmid:39309859
- 16. Soltany-Rezaee-Rad M, Mottaghi-Dastjerdi N, Setayesh N, Roshandel G, Ebrahimifard F, Sepehrizadeh Z. Overexpression of FOXO3, MYD88, and GAPDH Identified by Suppression Subtractive Hybridization in Esophageal Cancer Is Associated with Autophagy. Gastroenterol Res Pract. 2014;2014:185035. pmid:24527027
- 17. Mottaghi-Dastjerdi N, Soltany-Rezaee-Rad M, Sepehrizadeh Z, Roshandel G, Ebrahimifard F, Setayesh N. Genome expression analysis by suppression subtractive hybridization identified overexpression of Humanin, a target gene in gastric cancer chemoresistance. Daru. 2014;22(1):14. pmid:24401285
- 18. Mottaghi-Dastjerdi N, Soltany-Rezaee-Rad M, Sepehrizadeh Z, Roshandel G, Ebrahimifard F, Setayesh N. Identification of novel genes involved in gastric carcinogenesis by suppression subtractive hybridization. Hum Exp Toxicol. 2015;34(1):3–11. pmid:24812152
- 19. Xie Z, Bailey A, Kuleshov MV, Clarke DJB, Evangelista JE, Jenkins SL, et al. Gene Set Knowledge Discovery with Enrichr. Curr Protoc. 2021;1(3):e90. pmid:33780170
- 20. Szklarczyk D, Kirsch R, Koutrouli M, Nastou K, Mehryary F, Hachilif R, et al. The STRING database in 2023: protein-protein association networks and functional enrichment analyses for any sequenced genome of interest. Nucleic Acids Res. 2023;51(D1):D638–46. pmid:36370105
- 21. Ono K, Fong D, Gao C, Churas C, Pillich R, Lenkiewicz J, et al. Cytoscape Web: bringing network biology to the browser. Nucleic Acids Res. 2025;53(W1):W203–12. pmid:40308211
- 22. Chin C-H, Chen S-H, Wu H-H, Ho C-W, Ko M-T, Lin C-Y. cytoHubba: identifying hub objects and sub-networks from complex interactome. BMC Syst Biol. 2014;8 Suppl 4(Suppl 4):S11. pmid:25521941
- 23. Li M, Li D, Tang Y, Wu F, Wang J. CytoCluster: A Cytoscape Plugin for Cluster Analysis and Visualization of Biological Networks. Int J Mol Sci. 2017;18(9):1880. pmid:28858211
- 24. Chandrashekar DS, Karthikeyan SK, Korla PK, Patel H, Shovon AR, Athar M, et al. UALCAN: An update to the integrated cancer data analysis platform. Neoplasia. 2022;25:18–27. pmid:35078134
- 25. Clough E, Barrett T, Wilhite SE, Ledoux P, Evangelista C, Kim IF, et al. NCBI GEO: archive for gene expression and epigenomics data sets: 23-year update. Nucleic Acids Res. 2024;52(D1):D138–44. pmid:37933855
- 26. Li Z, Lu T, Chen Z, Yu X, Wang L, Shen G, et al. HOXA11 promotes lymphatic metastasis of gastric cancer via transcriptional activation of TGFβ1. iScience. 2023;26(8):107346. pmid:37539033
- 27. Wang Y, Liu H, Zhang M, Xu J, Zheng L, Liu P, et al. Epigenetic reprogramming in gastrointestinal cancer: biology and translational perspectives. MedComm (2020). 2024;5(9):e670. pmid:39184862
- 28. Chen Q, Wang Y, Liu Y, Xi B. ESRRG, ATP4A, and ATP4B as Diagnostic Biomarkers for Gastric Cancer: A Bioinformatic Analysis Based on Machine Learning. Front Physiol. 2022;13:905523. pmid:35812327
- 29. Akhtar A, Hameed Y, Ejaz S, Abdullah I. Identification of gastric cancer biomarkers through in-silico analysis of microarray based datasets. Biochem Biophys Rep. 2024;40:101880. pmid:39655267
- 30. Ming C, Orita H, Shangcheng Y, Yuan Q, Fedor CN, Yongyou W, et al. The role of adiponectin in gastric cancer. J Cancer Metastasis Treat. 2023.
- 31. Sun Y, Nie W, Xiahou Z, Wang X, Liu W, Liu Z, et al. Integrative single-cell and spatial transcriptomics uncover ELK4-mediated mechanisms in NDUFAB1+ tumor cells driving gastric cancer progression, metabolic reprogramming, and immune evasion. Front Immunol. 2025;16:1591123. pmid:40688093
- 32. Chen H, Jing C, Shang L, Zhu X, Zhang R, Liu Y, et al. Molecular characterization and clinical relevance of metabolic signature subtypes in gastric cancer. Cell Rep. 2024;43(7):114424. pmid:38959111
- 33. Lau DK, Collin JP, Mariadason JM. Clinical Developments and Challenges in Treating FGFR2-Driven Gastric Cancer. Biomedicines. 2024;12(5):1117. pmid:38791079
- 34. Edirisinghe O, Ternier G, Alraawi Z, Suresh Kumar TK. Decoding FGF/FGFR signaling: insights into biological functions and disease relevance. Biomolecules. 2024;14(12):1622.
- 35. Cai X, Yang J, Guo Y, Yu Y, Zheng C, Dai X. Re-analysis of single cell and spatial transcriptomics data reveals B cell landscape in gastric cancer microenvironment and its potential crosstalk with tumor cells for clinical prognosis. J Transl Med. 2024;22(1):807. pmid:39215354