Fig 1.

Development of a machine learning method for predicting neutrophil-specific functional genes.

(A) The NeuRGI model training workflow involved: 1) extracting gene features from various databases; 2) using known neutrophil-related genes as positives and genes selected by PU-learning as negatives; 3) balancing the training set with under-sampling and training the NeuRGI random forest model with 10-fold cross-validation, then employing a Gaussian Mixture Model (GMM) on the NeuRGI scores to identify potential positives; and 4) using OntoVAE for in silico knockout of GMM-classified genes to find key regulatory factors for guiding follow-up experiments. (B) AUC and PR curves and the mean AUC value for the NeuRGI model trained with 10-fold cross-validation. The data are split into 10 parts, with 9 parts used for training and 1 for testing; this is repeated 10 times so that each part is used once for testing. (C) The boxplot illustrates the transformed feature values for 293 known positive genes and 293 PU-learning negative genes in four feature groups. The p value was calculated using the Student’s t-test. (D) Distribution of the relative importance score, calculated as the average mean decrease in Gini, for features used in the NeuRGI model (color-coded by category).
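
The 10-fold split described in panel B can be sketched as follows. This is a minimal illustration using only the Python standard library, not the authors' pipeline; the function name `k_fold_indices`, the shuffling seed, and the interleaved fold assignment are assumptions made for the example.

```python
import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs so that each sample is tested exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test_idx = folds[i]
        # The remaining k - 1 folds form the training set.
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train_idx, test_idx

# 293 positives + 293 negatives, matching the balanced set in panel C.
splits = list(k_fold_indices(586, k=10))
for fold_number, (train_idx, test_idx) in enumerate(splits, start=1):
    print(fold_number, len(train_idx), len(test_idx))
```

Each pass would train a model on `train_idx` and score it on `test_idx`; averaging the per-fold AUC values gives a mean AUC of the kind reported in the panel.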

Fig 2.

Machine learning reveals MAP4K4 as a novel regulator of hematopoietic neutrophil differentiation.

(A) Three-component Gaussian Mixture Models (GMM) were applied to the NeuRGI scores across all 19,288 predicted genes. The solid black line represents the overall score distribution, and the dashed lines indicate the three probability density functions (PDFs). Intersections of the three PDFs define NeuRGI thresholds for classifying genes into functional (red shading), non-functional (blue shading), and uncertain (middle, white) categories. (B) The line plot depicts the overlap of functional transcription factors (TFs) predicted by CellOracle and NeuRGI (red line), and by SCENIC and NeuRGI (blue line). The x-axis represents the top N TFs ranked by CellOracle’s perturbation score or SCENIC regulon activity. The y-axis shows the percentage of overlap between the predicted functional TFs. (C) The boxplot illustrates the transformed feature values for 4,786 functional genes and 4,734 non-functional genes in four feature groups. The p value was calculated using the Student’s t-test. (D) The GO-BP term network illustrates the five main clusters enriched among the 4,786 predicted functional genes (see Methods). Each dot represents a GO term, and the dot size indicates the enrichment score. The dashed oval indicates GO terms with similar functions. (E) Proportion of regulatory (red) and other (blue) genes across three gene sets. Regulatory genes include enzymes (Enz), membrane proteins (MP), RNA binding proteins (RBP), and transcription factors (TF). The p value was calculated by the proportion test. Numbers within bars represent the gene counts. (F) Boxplot of NeuRGI scores, color-coded by gene type: enzymes (Enz), membrane proteins (MP), RNA binding proteins (RBP), transcription factors (TF), and others. The p value was calculated using the Student’s t-test. (G) The scatter plot shows the impact of in silico knockout of 2,569 predicted functional regulatory genes on the “positive regulation of myeloid cell differentiation” pathway and the MAGMA Zscore (neutrophil count).
The y-axis represents the -log10 (p value) for the pathway after in silico knockout of the gene (higher values indicate greater impact), and the x-axis represents the gene’s effect on the ‘neutrophil count’ trait (higher Zscores indicate greater impact). Different colors represent different categories of genes. Dot size indicates the NeuRGI score, and contour lines show point density. A cutoff (y = 1.8 and x = 3.6) was set based on the contour lines, dividing the scatter plot into four regions. (H) Expression of the top 12 genes in different immune cells from ImmuNexUT, including Naïve CD4 T cells (Naïve CD4), Memory CD4 T cells (Mem CD4), T helper 1 cells (Th1), T helper 2 cells (Th2), T helper 17 cells (Th17), T follicular helper cells (Tfh), Fraction II effector regulatory T cells (Fr. II eTreg), Fraction I naïve regulatory T cells (Fr. I nTreg), Fraction III non-regulatory T cells (Fr. III T), Naïve CD8 T cells (Naïve CD8), CD8+ T effector memory CD45RA+ cells (TEMRA CD8), Effector Memory CD8 T cells (EM CD8), Central Memory CD8 T cells (CM CD8), Naïve B cells (Naïve B), Unswitched memory B cells (USM B), Switched memory B cells (SM B), Double Negative B cells (DN B), Plasmablasts (Plasmablast), Natural Killer cells (NK), CD16 positive monocytes (CD16p Mono), Non-classical monocytes (NC Mono), Intermediate monocytes (Int Mono), Classical monocytes (CL Mono), Myeloid dendritic cells (mDC), Plasmacytoid dendritic cells (pDC), neutrophils (Neu), Low-Density Granulocytes (LDG). (I) Expression of 9 genes during neutrophil differentiation in human UCB. These 9 of the top 12 genes were dynamically upregulated: CREBBP, CYP27A1, JAK2, IFNGR1, MAP4K4, PLCG2, PTPRC, TIGAR, and TYK2. We set a ‘time cut’ for cells at different differentiation stages, with HSC set as 1 and Neu as 5, and performed linear regression fitting for the expression of all 9 genes. R represents the Pearson correlation coefficient, and the p value (p) was calculated by the Student’s t-test.
(J) Expression of the top 10 genes in single-cell pseudotime analysis of 2,803 neutrophils in mouse bone marrow. All 12 genes except Cyp27a1 and HLA-DQA1 were upregulated during neutrophil maturation. (K) The heatmap displays the log2 (fold change) in the expression of the top 12 genes in neutrophils from patients with 10 immune-related diseases compared to those from healthy individuals, including ANCA-associated vasculitis (AAV), Takayasu arteritis (TAK), Adult-onset Still’s disease (AOSD), Behçet’s disease (BD), Rheumatoid arthritis (RA), Systemic sclerosis (SSc), Idiopathic inflammatory myopathy (Myo), Sjögren’s syndrome (SjS), Mixed connective tissue disease (MCTD), Systemic lupus erythematosus (SLE). The p value was calculated using the Student’s t-test, and * represents statistical significance (p < 0.05). The histogram represents the number of diseases in which each gene changed significantly. (L) Bar plot displaying significantly affected pathways after OntoVAE in silico knockout of MAP4K4 in neutrophils. The blue dashed line represents the significance threshold (p = 0.05).
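
Panel A derives its classification thresholds from the points where neighboring mixture-component PDFs intersect. The sketch below shows one way such a crossing point could be computed for two weighted Gaussian components by solving the quadratic obtained from equating their log densities. The function names and the component parameters in the usage lines are hypothetical and are not taken from the fitted NeuRGI model; the two component means are assumed to differ.

```python
import math

def gaussian_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and standard deviation."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def pdf_intersection(w1, m1, s1, w2, m2, s2):
    """Return the x between m1 and m2 where w1*N(m1, s1) equals w2*N(m2, s2)."""
    # Equating the log weighted densities gives a*x^2 + b*x + c = 0.
    a = 1 / (2 * s2**2) - 1 / (2 * s1**2)
    b = m1 / s1**2 - m2 / s2**2
    c = (m2**2 / (2 * s2**2) - m1**2 / (2 * s1**2)
         + math.log(w1 / s1) - math.log(w2 / s2))
    if abs(a) < 1e-12:
        # Equal variances: the equation is linear in x.
        roots = [-c / b]
    else:
        disc = b * b - 4 * a * c
        if disc < 0:
            raise ValueError("the two weighted PDFs do not intersect")
        root = math.sqrt(disc)
        roots = [(-b - root) / (2 * a), (-b + root) / (2 * a)]
    low, high = sorted((m1, m2))
    between = [x for x in roots if low <= x <= high]
    if not between:
        raise ValueError("no intersection lies between the component means")
    return between[0]

# Hypothetical components standing in for two neighboring score populations.
threshold = pdf_intersection(0.5, 0.2, 0.1, 0.5, 0.8, 0.2)
print(round(threshold, 4))
```

Applying the same function to each pair of adjacent components in a three-component mixture would yield two thresholds, which separate three score ranges of the kind shaded in the panel.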

Fig 3.

Loss of MAP4K4 expression causes a decrease in the number of neutrophils.

(A) Scheme of all-trans-retinoic acid (ATRA)-induced neutrophil differentiation of the hematopoietic HL-60 cell model (left); flow cytometry analysis of the neutrophil marker CD11b confirmed that ATRA can induce differentiation of HL-60 cells into neutrophils. This is demonstrated by histogram plots showing variations in mean fluorescence intensity (MFI) across different conditions: cells not treated with ATRA, cells treated with ATRA for 72 hours, and cells treated with ATRA for 96 hours (right). (B) Percentage of CD11b+ cells among HL-60 cells treated with 0-20 µM MAP4K4 inhibitor (PF-06260933, PF) for 48 hours, followed by a subsequent 96-hour treatment with both PF and all-trans-retinoic acid (ATRA). (C) The number of granulocyte-macrophage progenitor cell (CFU-GM, CFU-G, CFU-M) colonies formed by 25,000 whole bone marrow cells from control (Ctrl) or Map4k4-cKO (cKO) mice; BM, bone marrow (n=4; mean ± SD). (D) Numbers of neutrophils in the bone marrow of control (Ctrl) or Map4k4-cKO (cKO) mice; BM, bone marrow; Neu, neutrophil. (Ctrl n=7, cKO n=6; mean ± SD). (E) PB WBC numbers of control (Ctrl) or Map4k4-cKO (cKO) mice; PB, peripheral blood; WBC, white blood cell; (Ctrl n=7, cKO n=6; mean ± SD). (F) PB Mon and GR numbers of control (Ctrl) or Map4k4-cKO (cKO) mice; PB, peripheral blood; Mon, monocyte; GR, granulocyte; (Ctrl n=7, cKO n=6; mean ± SD). (G) Numbers of eosinophils in PB of control (Ctrl) or Map4k4-cKO (cKO) mice; PB, peripheral blood; Eso, eosinophils; (Ctrl n=7, cKO n=6; mean ± SD). (H) Numbers of monocytes in PB of control (Ctrl) or Map4k4-cKO (cKO) mice; PB, peripheral blood; Mon, monocytes; (Ctrl n=7, cKO n=6; mean ± SD). (I) Numbers of neutrophils in PB of control (Ctrl) or Map4k4-cKO (cKO) mice; PB, peripheral blood; Neu, neutrophil; (Ctrl n=7, cKO n=6; mean ± SD). (J) Spleen cell numbers of control (Ctrl) or Map4k4-cKO (cKO) mice (Ctrl n=7, cKO n=6; mean ± SD).
(K) Numbers of eosinophils in the spleen of control (Ctrl) or Map4k4-cKO (cKO) mice; SP, spleen; Eso, eosinophils; (Ctrl n=7, cKO n=6; mean ± SD). (L) Numbers of monocytes in the spleen of control (Ctrl) or Map4k4-cKO (cKO) mice; SP, spleen; Mon, monocytes; (Ctrl n=7, cKO n=6; mean ± SD). (M) Representative FACS analysis of spleen CD11b+ Ly6G+ neutrophils (left); percentage of neutrophils in the spleen of control (Ctrl) or Map4k4-cKO (cKO) mice (right); SP, spleen; Neu, neutrophil; (Ctrl n=7, cKO n=6; mean ± SD). (N) Numbers of neutrophils in the spleen of control (Ctrl) or Map4k4-cKO (cKO) mice; SP, spleen; Neu, neutrophil; (Ctrl n=7, cKO n=6; mean ± SD). Mann-Whitney U test.
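
The group comparisons in this figure use the Mann-Whitney U test. As a minimal sketch of the statistic itself, the function below counts, over all cross-group pairs, how often a value in one group exceeds a value in the other, with ties counted as one half. It omits the p value calculation, and the cell counts in the usage lines are hypothetical values chosen only to mirror the group sizes (Ctrl n=7, cKO n=6).

```python
def mann_whitney_u(x, y):
    """Return (U_x, U_y), the Mann-Whitney U statistics for samples x and y."""
    u_x = 0.0
    for xi in x:
        for yi in y:
            if xi > yi:
                u_x += 1.0      # x value ranks above y value
            elif xi == yi:
                u_x += 0.5      # ties contribute one half
    # The two statistics always sum to the number of cross-group pairs.
    u_y = len(x) * len(y) - u_x
    return u_x, u_y

# Hypothetical neutrophil counts (arbitrary units), not data from the study.
ctrl = [5.1, 4.8, 5.6, 5.0, 4.9, 5.3, 5.2]
cko = [3.9, 4.1, 3.7, 4.4, 4.0, 3.8]
u_ctrl, u_cko = mann_whitney_u(ctrl, cko)
print(u_ctrl, u_cko)
```

A U statistic of zero for one group means every value in that group is lower than every value in the other; converting U to a p value requires the exact null distribution or a normal approximation, which statistical packages provide.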

Fig 4.

Loss of MAP4K4 expression impairs the differentiation process of bone marrow neutrophils.

(A) Overview of study design, created using Biorender. (B) UMAP plots of single-cell gene expression pooled across bone marrow samples, with clusters visualized and labeled by cell type. The plot is split by control (Ctrl) vs Map4k4-cKO (cKO). (C) Proportions of the six neutrophil clusters in control (Ctrl) and Map4k4-cKO (cKO) samples. The p value was calculated by the adjusted proportion test. (D) Numbers of neutrophil progenitor cells in the bone marrow of control (Ctrl) or Map4k4-cKO (cKO) mice; c-Kit^hi Ly6G^neg (myeloblasts, MB); c-Kit^int Ly6G^neg (promyelocytes, PM); c-Kit^neg Ly6G^low (myelocytes, MC); c-Kit^neg Ly6G^int (metamyelocytes, MM); and c-Kit^neg Ly6G^hi (band cells and segmented neutrophils, BC/SC); BM, bone marrow (Ctrl n=7, cKO n=6; mean ± SD). Mann-Whitney U test. (E) ROS generation of neutrophil progenitor cells in BM of control (Ctrl) or Map4k4-cKO (cKO) mice; MFI, mean fluorescence intensity; ROS, reactive oxygen species; MB, myeloblasts; PM, promyelocytes; MC, myelocytes; MM, metamyelocytes; BC/SC, band cells and segmented neutrophils; (n=6; mean ± SD). Mann-Whitney U test. (F) The number of DEGs between control (Ctrl) and Map4k4-cKO (cKO) in the six neutrophil subpopulations. (G) GO-BP analysis of cluster-based DEGs between control (Ctrl) and Map4k4-cKO (cKO) for clusters from HSC to neutrophils. Selected GO terms with Benjamini-Hochberg-corrected p values < 0.05 (one-sided Fisher’s exact test) are shown. The dot size represents the number of genes. The color scale represents the adjusted p value.
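
Panel G filters GO terms by Benjamini-Hochberg-corrected p values. The following is a minimal sketch of that correction in plain Python, written for illustration rather than taken from the authors' analysis code; the input p values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p values in the original input order."""
    m = len(p_values)
    # Indices of the p values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical raw enrichment p values for four GO terms.
raw = [0.01, 0.04, 0.03, 0.005]
adjusted = benjamini_hochberg(raw)
significant = [i for i, q in enumerate(adjusted) if q < 0.05]
print(adjusted, significant)
```

Each raw p value is scaled by the number of tests divided by its rank, and the running minimum ensures that a term with a smaller raw p value never receives a larger adjusted value than a term ranked after it.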

Fig 5.

Loss of MAP4K4 expression affects apoptosis during the process of neutrophil differentiation.

(A) UMAP plot of the neutrophil co-expression network. Each node represents a single gene, and edges represent co-expression links between genes and module hub genes. Point size is scaled by eigengene-based connectivity (kME). Nodes are colored by co-expression module assignment. The top five hub genes per module are labeled. Network edges were down-sampled for visual clarity. (B) Heatmap of scaled gene expression for the top 10 hub genes by kME in each module. (C) UMAP colored by pseudotime. (D) UMAP colored by harmonized module eigengenes (hMEs) of each module. (E) Module eigengenes (MEs) as a function of pseudotime for each co-expression module. For each module, a separate loess regression line is shown for each condition. The p value was calculated by distance correlation t-test (dcorT.test). dcor is a transformation of a bias-corrected version of distance correlation. The smaller the dcor, the greater the distance between control (Ctrl) and Map4k4-cKO (cKO) conditions. (F) Differentially expressed hub genes between control (Ctrl) and Map4k4-cKO (cKO) samples are involved in positive regulation of apoptotic process, as determined by Gene Set Enrichment Analysis (GSEA). RES represents the running enrichment score. RLM represents the ranked list metric. (G) Violin plots of apoptosis scores of control (Ctrl) against Map4k4-cKO (cKO) for each cluster. The p value was calculated by the Student’s t-test. (H) Percentages of annexin V+ and SYTOX cells in bone marrow neutrophil progenitor cells of control (Ctrl) or Map4k4-cKO (cKO) mice; MM, metamyelocytes; BC/SC, band cells and segmented neutrophils; BM, bone marrow (n=10; mean ± SD). Mann-Whitney U test. (I) Percentages of annexin V+ and SYTOX cells of HL-60 cells treated with 0-20 µM MAP4K4 inhibitor (PF-06260933, PF) for 48 hours, followed by a subsequent 96-hour treatment with both PF and all-trans-retinoic acid (ATRA) (n=4; mean ± SD). Mann-Whitney U test.
(J) Quantification of HL-60 cell numbers after a 48-hour treatment with a 10 µM concentration of the MAP4K4 inhibitor (PF-06260933, PF), followed by a subsequent 96-hour treatment with both PF and all-trans-retinoic acid (ATRA) (n=4; mean ± SD). Mann-Whitney U test.

Fig 6.

Knocking out MAP4K4 reduces phosphorylation levels in the apoptosis pathway, and overexpression of STAT5A rescued neutrophil apoptosis caused by MAP4K4 deficiency.

(A) Schematic of mass spectrometry analysis, created using Biorender. (B) The number of proteins with significant differences in protein and phosphorylation levels after MAP4K4 knockout. (C) Comparison heatmap of log2 (fold-change) in protein and phosphorylation levels after MAP4K4 knockout. (D) Differential phosphorylation between control and MAP4K4 KO samples is involved in the positive regulation of the apoptotic process, as determined by Gene Set Enrichment Analysis (GSEA). RES represents the running enrichment score. RLM represents the ranked list metric. (E) Volcano plot of the apoptosis-related proteins with differential phosphorylation levels. The x-axis shows the log2 (fold-change) in phosphorylation between MAP4K4 KO and Ctrl. The y-axis shows –log10 (adjusted p value) of the t-test between these two groups. Red indicates proteins with up-regulated phosphorylation after MAP4K4 KO, and blue indicates proteins with down-regulated phosphorylation after MAP4K4 KO. The top 20 proteins with the most significant differences in phosphorylation among the apoptosis-related proteins are labeled. (F) Western blot (WB) for MAP4K4, p-STAT5A, STAT5A, p-STAT5B, STAT5B, and GAPDH in sgAAVS1 and sgMAP4K4 cells. (G) HL-60 cells were first infected with STAT5A-overexpression, STAT5B-overexpression, or control plasmids and then treated with 10 µM MAP4K4 inhibitor (PF-06260933, PF) for 48 hours, followed by a subsequent 96-hour treatment with both PF and all-trans-retinoic acid (ATRA). Representative FACS analysis of annexin V and SYTOX staining is shown (left). Percentages of annexin V+ and SYTOX cells (right) (n=4; mean ± SD). Mann-Whitney U test.
