Deciphering the molecular network of Trichostatin A in regulating Alzheimer’s disease: screening of core genes and mechanistic investigation based on multidimensional bioinformatics and molecular simulation

doi:10.1371/journal.pone.0347532

Fig 1.

Flow-chart of dataset analysis in this paper.

This workflow delineates the regulatory network of TSA in AD. It encompasses prediction of putative TSA targets, screening of AD-related genes and core modules from 5 GEO datasets, core gene identification with 130 integrated machine learning algorithms, multi-dimensional functional validation (immune and single-cell profiling, regulatory network analysis), and verification of ligand-receptor binding by molecular docking and 100 ns molecular dynamics simulations.

Table 1.

Dataset content.

Fig 2.

Schematic of TSA target prediction and AD-related gene screening.

(A). 2D chemical structure of TSA (PubChem CID: 444732). (B). Venn diagram of potential TSA targets predicted by ChEMBL, PharmMapper and SwissTargetPrediction, yielding 949 unique targets. (C). PCA plots of the merged training cohort (n = 295) before and after ComBat batch correction. (D). Volcano plot of AD DEGs (screening thresholds: |logFC| > 0.4, adj.P.Val < 0.05). (E). Heatmap of module-trait correlations from WGCNA. (F). Bar chart of gene counts in each WGCNA co-expression module. (G). Venn diagram of the intersection between 1363 DEGs and 2756 genes in the WGCNA MEturquoise module (1070 overlapping genes). (H). Venn diagram of the intersection between the 1070 module genes and the 949 TSA targets (60 key genes).
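The screening thresholds in panel D and the set intersections in panels G and H can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the gene names, values, tuple layout, and the helper `select_degs` are hypothetical and serve only to show the filtering logic.

```python
# Toy illustration of the caption's filtering logic; all values are made up.
LOGFC_CUTOFF = 0.4   # |logFC| > 0.4 (panel D)
ADJ_P_CUTOFF = 0.05  # adj.P.Val < 0.05 (panel D)

def select_degs(rows):
    """Return the set of genes passing both thresholds.

    rows: iterable of (gene, logFC, adj_p) tuples (hypothetical layout).
    """
    return {
        gene
        for gene, logfc, adj_p in rows
        if abs(logfc) > LOGFC_CUTOFF and adj_p < ADJ_P_CUTOFF
    }

# Hypothetical differential-expression results.
results = [
    ("GENE_A", 0.85, 0.001),
    ("GENE_B", -0.52, 0.010),
    ("GENE_C", 0.30, 0.001),   # fails the logFC cutoff
    ("GENE_D", -0.90, 0.200),  # fails the adjusted P cutoff
    ("GENE_E", 0.41, 0.049),
]
degs = select_degs(results)

# Hypothetical co-expression module membership and predicted drug targets.
module_genes = {"GENE_A", "GENE_B", "GENE_C", "GENE_F"}
drug_targets = {"GENE_B", "GENE_E", "GENE_G"}

# Panel G analogue: DEGs inside the module. Panel H analogue: those that are also targets.
module_degs = degs & module_genes
key_genes = module_degs & drug_targets
print(sorted(degs), sorted(module_degs), sorted(key_genes))
```

Both thresholds are strict inequalities, so a gene with |logFC| exactly 0.4 would be excluded in this sketch.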

Fig 3.

Functional Annotation of 60 TSA-associated Key Genes for AD and PPI Network Analysis.

(A-C). GO, KEGG, and DO enrichment bubble plots (significance threshold: P < 0.05, adjusted q < 0.05). (D). PPI network constructed from the STRING database (confidence threshold: 0.4). (E). Key nodes of the PPI network identified by the cytoHubba plugin. (F). Key nodes of the PPI network identified by the CytoNCA plugin; red = upregulated genes in AD, blue = downregulated genes in AD. (G). Venn diagram of the intersection of the cytoHubba and CytoNCA screening results (14 candidate core genes).

Fig 4.

Construction and Performance Evaluation of Machine Learning Models for Core Gene Screening.

(A). AUROC heatmap of 130 models across 3 cohorts, with the optimal glmBoost + RF model (AUROC = 0.994) marked in red, used for final core gene screening and AD risk prediction. (B). Top 10 gene feature importance bar plot of the optimal model, quantifying each gene’s contribution to AD diagnostic efficacy. (C). Calibration curve of the optimal model (45° line = perfect calibration), verifying the consistency between predicted and actual AD risk. (D). Nomogram for individualized AD risk prediction based on the optimal model, quantifying the independent contribution of each core gene to AD risk. (E). DCA plot comparing clinical net benefit of the combined model vs. single-gene prediction, confirming the superior clinical application value of the 8-gene model. (F). Cost-benefit analysis plot based on the GSE44771 dataset, evaluating the model’s clinical application potential under different risk thresholds.
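For readers unfamiliar with decision curve analysis (panel E), the quantity plotted at each risk threshold is the net benefit, conventionally defined as TP/N − (FP/N) × pt/(1 − pt). The following Python sketch computes it on made-up labels and probabilities; the data and function names are illustrative assumptions and do not reproduce the authors' model or software.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of treating every sample whose predicted risk is at or above threshold."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    weight = threshold / (1.0 - threshold)  # odds of the risk threshold
    return tp / n - (fp / n) * weight

def net_benefit_treat_all(y_true, threshold):
    """Reference strategy: treat everyone regardless of predicted risk."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)

# Hypothetical labels (1 = AD, 0 = control) and predicted risks.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_prob = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.7, 0.1]

nb_model = net_benefit(y_true, y_prob, 0.5)
nb_all = net_benefit_treat_all(y_true, 0.5)
print(nb_model, nb_all)
```

A model is useful at a given threshold when its net benefit exceeds both the treat-all reference and the treat-none reference, which is zero by definition.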

Fig 5.

Diagnostic efficacy validation of 8 core genes and SHAP-based model interpretability analysis.

(A, C). ROC curves of the 8-gene combined model in the GSE44771 (129 AD, 101 controls) and GSE109887 (46 AD, 32 controls) validation cohorts, verifying the stable diagnostic performance of the core genes. (B, D). Box plots of the 8 core genes’ expression in the two validation cohorts; unpaired Student’s t-test, **** P < 0.0001, showing the significant AD-specific differential expression pattern of the core genes. (E). SHAP beeswarm plot of core genes ranked by mean SHAP value, clarifying the direction and magnitude of each gene’s effect on AD prediction. (F). Bar plot of AUC reduction after gene permutation, identifying KCNC2 as the top contributor to the model’s predictive performance. (G). SHAP waterfall plot for a representative AD sample, showing each gene’s contribution to individual AD risk prediction. (H). SHAP scatter plot of core genes, revealing the nonlinear effect of core genes on AD risk and their co-expression pattern.
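The permutation analysis in panel F measures how much the AUC falls when one gene's values are shuffled across samples. A minimal Python sketch of that idea follows. It assumes a stand-in scoring function `toy_predict` and made-up data, both hypothetical; the authors' fitted model and exact procedure are not described in this caption.

```python
import random

def auc(y_true, scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def permutation_auc_drop(predict, X, y, column, n_repeats=20, seed=0):
    """Mean AUC reduction after shuffling one feature column across samples."""
    rng = random.Random(seed)
    baseline = auc(y, [predict(row) for row in X])
    drops = []
    for _ in range(n_repeats):
        shuffled = [row[column] for row in X]
        rng.shuffle(shuffled)
        X_perm = [row[:column] + [v] + row[column + 1:] for row, v in zip(X, shuffled)]
        drops.append(baseline - auc(y, [predict(row) for row in X_perm]))
    return sum(drops) / n_repeats

def toy_predict(row):
    """Hypothetical stand-in for a fitted classifier: it uses feature 0 only."""
    return row[0]

# Made-up expression values for two features; labels 1 = AD, 0 = control.
X = [[0.9, 0.1], [0.8, 0.7], [0.7, 0.3], [0.2, 0.9], [0.1, 0.4], [0.3, 0.6]]
y = [1, 1, 1, 0, 0, 0]

drop_used = permutation_auc_drop(toy_predict, X, y, column=0)
drop_unused = permutation_auc_drop(toy_predict, X, y, column=1)
print(drop_used, drop_unused)
```

Shuffling a feature the model ignores leaves the AUC unchanged, whereas shuffling an informative feature lowers it; ranking genes by that drop gives a bar plot of the kind shown in panel F.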

Fig 6.

Immune infiltration characteristics of AD and correlation with 8 core genes analyzed via CIBERSORT.

(A). Immune cell abundance heatmap of the training set (142 Ctrl, 153 AD cases). (B). Spearman correlation heatmap of infiltrating immune cell populations. (C). Box plot of differentially infiltrated immune cells between AD and Ctrl groups (* P < 0.05, *** P < 0.001). (D). Spearman correlation heatmap between core genes and infiltrating immune cells; significant results (adjusted P < 0.05) are marked in bold.
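The Spearman coefficients in panels B and D are Pearson correlations computed on ranks. The short Python sketch below shows the calculation for one gene and one immune cell fraction. The input values are invented for illustration, and the helper names are assumptions rather than part of the authors' CIBERSORT workflow.

```python
def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

def spearman(x, y):
    """Spearman rho: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical per-sample gene expression and immune cell fraction.
gene_expr = [2.1, 3.4, 1.0, 5.2, 4.8]
cell_frac = [0.10, 0.15, 0.05, 0.40, 0.22]
rho = spearman(gene_expr, cell_frac)
print(rho)
```

Because only ranks matter, the coefficient is insensitive to monotone transformations of the data, which suits estimated cell fractions that are not normally distributed.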

Fig 7.

Pathway Enrichment Analysis of Core Genes and Regulatory Network Construction.

(A). GSEA enrichment plot of core genes, highlighting top AD-related significant pathways. (B). ceRNA regulatory network centered on 8 core genes (48 lncRNAs, 51 miRNAs; light green = lncRNAs, green = miRNAs, orange = core mRNAs). (C). Bar chart of NES of significantly enriched TF motifs (RcisTarget, NES > 3.5). (D). TF motif gene recovery curve, verifying the enrichment reliability of core target genes.

Fig 8.

Preprocessing and dimensionality reduction analysis of scRNA-seq data (GSE161045: 4 AD vs 4 controls) performed via Seurat.

(A). PCA plot of scRNA-seq samples, showing cell population separation between AD and Ctrl groups. (B). Quality control distribution plot (filtering criteria: ≥ 300 genes/cell, mitochondrial ratio < 20%), ensuring high data quality for subsequent analysis. (C). Scatter plot of gene average expression vs. normalized variance; 2500 highly variable genes are marked in red, providing the basis for cell clustering and annotation. (D). Heatmap of cluster-specific marker genes, showing the unique expression signature of each cell cluster. (E). UMAP plot of 22 cell clusters (resolution = 0.6, top 15 PCs), displaying the distribution of brain cell populations in UMAP space.
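The quality control criteria in panel B amount to a per-cell filter. The authors performed this step in Seurat; the Python sketch below only restates the two thresholds on invented per-cell summaries, and the cell names, counts, and helper `passes_qc` are hypothetical.

```python
MIN_GENES = 300            # keep cells with >= 300 detected genes
MAX_MITO_FRACTION = 0.20   # keep cells with mitochondrial fraction < 20%

def passes_qc(n_genes, mito_counts, total_counts):
    """Apply both filtering criteria to one cell's summary statistics."""
    if total_counts == 0:
        return False  # an empty cell cannot pass
    mito_fraction = mito_counts / total_counts
    return n_genes >= MIN_GENES and mito_fraction < MAX_MITO_FRACTION

# Hypothetical cells: (detected genes, mitochondrial counts, total counts).
cells = {
    "cell_1": (1200, 50, 4000),   # passes both criteria
    "cell_2": (250, 10, 600),     # too few detected genes
    "cell_3": (900, 700, 3000),   # about 23% mitochondrial counts
    "cell_4": (300, 199, 1000),   # boundary case: 300 genes, 19.9% mitochondrial
}
kept = [name for name, (g, m, t) in cells.items() if passes_qc(g, m, t)]
print(kept)
```

The gene-count bound is inclusive and the mitochondrial bound is strict, matching the "≥" and "<" signs in the caption.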

Fig 9.

Cell Type Annotation and Core Gene Expression Characteristic Analysis of scRNA-seq Data.

(A). UMAP plot of annotated cell types (Astrocytes, Endothelial cells/Pericytes, Microglia, Neurons, Oligodendrocytes, OPC), clarifying the main cell populations involved in AD pathology. (B). Bar chart of cell type proportion in each sample, showing cell composition changes in AD brain. (C). Sankey diagram of the top 10 marker genes per cell type, displaying cell-specific marker genes. (D). Violin plot of core gene expression across cell clusters, revealing the cell-type-specific expression profile of core genes, and providing a cellular basis for TSA’s regulatory mechanism in AD.

Table 2.

Binding energies of ligands and receptors.

Fig 10.

Molecular docking analysis of the binding between TSA and 7 core AD target proteins performed via CB-Dock2 (AutoDock Vina v1.1.2).

(A-G). Molecular docking conformation diagrams of complexes formed by TSA binding to MET, GRIA2, GABRB2, GABARAPL1, EGR1, EFNA1 and CDK5 proteins, respectively. Receptor: ribbon model; TSA: stick model; yellow dashed lines: hydrogen bonds.

Fig 11.

100 ns MD Simulation Analysis and Disease Prediction Based on Core Genes.

(A). Dynamic parameters of the TSA-CDK5 complex during the 100 ns MD simulation: protein backbone RMSD stabilized within 0.3 nm after 40 ns; hydrogen bond count averaged 1 (range 0–2); Rg remained at ~3.42 nm; SASA fluctuated around 147 nm²; most residues had RMSF < 0.3 nm; MMPBSA binding energy ΔG = −1.163 kcal/mol, indicating weak binding affinity. (B). Dynamic parameters of the TSA-GABRB2 complex during the 100 ns MD simulation: protein backbone RMSD stabilized at ~0.25 nm after 40 ns; hydrogen bond count averaged 1 (range 0–3); Rg remained at ~7.63 nm; SASA fluctuated around 190 nm²; most residues had RMSF < 0.3 nm; MMPBSA binding energy ΔG = −0.017 kcal/mol, indicating weak binding affinity. (C). Disease prediction plot based on the core genes and the DisGeNET database, verifying the close association between the core genes and AD-related neurodegenerative diseases.

Table 3.

Predicted drug candidates.
