Prediction of myeloid malignant cells in Fanconi anemia using machine learning

  • Luis A. Flores-Mejía,

    Roles Conceptualization, Formal analysis, Investigation, Supervision, Validation, Writing – original draft, Writing – review & editing

    Affiliations Departamento de Medicina Genómica y Toxicología Ambiental, Universidad Nacional Autónoma de México, México, Laboratorio de Falla Medular & Carcinogénesis, Instituto Nacional de Pediatría, México

  • Pablo Siliceo,

    Roles Data curation, Methodology, Resources, Software, Writing – review & editing

    Affiliations Departamento de Medicina Genómica y Toxicología Ambiental, Universidad Nacional Autónoma de México, México, Laboratorio de Falla Medular & Carcinogénesis, Instituto Nacional de Pediatría, México, Programa de Doctorado en Ciencias Biomédicas, Universidad Nacional Autónoma de México, México

  • Ulises Juárez Figueroa,

    Roles Data curation, Methodology, Writing – review & editing

    Affiliations Laboratorio de Citogenética, Instituto Nacional de Pediatría, México, Programa de Doctorado en Ciencias Biológicas, Universidad Nacional Autónoma de México, México

  • Angel A. De la Cruz,

    Roles Data curation, Methodology, Validation, Writing – review & editing

    Affiliations Departamento de Medicina Genómica y Toxicología Ambiental, Universidad Nacional Autónoma de México, México, Laboratorio de Falla Medular & Carcinogénesis, Instituto Nacional de Pediatría, México, Programa de Maestría y Doctorado en Ciencias Bioquímicas, Universidad Nacional Autónoma de México, México

  • Cecilia Ayala-Zambrano,

    Roles Conceptualization, Supervision

    Affiliations Departamento de Medicina Genómica y Toxicología Ambiental, Universidad Nacional Autónoma de México, México, Laboratorio de Falla Medular & Carcinogénesis, Instituto Nacional de Pediatría, México

  • Hugo Tovar,

    Roles Methodology, Software

    Affiliation Computational Genomics Division, Instituto Nacional de Medicina Genómica (INMEGEN), Mexico City, Mexico

  • Sara Frías,

    Roles Conceptualization, Project administration, Supervision

    Affiliations Departamento de Medicina Genómica y Toxicología Ambiental, Universidad Nacional Autónoma de México, México, Laboratorio de Citogenética, Instituto Nacional de Pediatría, México

  • Alfredo Rodríguez

    Roles Conceptualization, Funding acquisition, Project administration, Supervision, Writing – review & editing

    alfredo.rodriguez@iibiomedicas.unam.mx

    Affiliations Departamento de Medicina Genómica y Toxicología Ambiental, Universidad Nacional Autónoma de México, México, Laboratorio de Falla Medular & Carcinogénesis, Instituto Nacional de Pediatría, México

Abstract

Fanconi anemia (FA) is an inherited bone marrow failure syndrome with cancer predisposition. Most FA patients develop aplastic anemia during childhood and have an extremely high cumulative lifetime risk of developing cancer. Myeloid malignancy, including high-risk myelodysplastic syndrome (MDS), recently renamed myelodysplastic neoplasm, and acute myeloid leukemia (AML), is one of the main neoplastic risks for patients with FA. Although bone marrow transplantation is the treatment of choice for FA patients who develop aplastic anemia, patients with a more stable bone marrow remain untransplanted and at high risk of developing MDS/AML; these patients should therefore be monitored for the appearance of myeloid malignant clones. Markers that identify emerging myeloid malignant cells as early as possible are needed for monitoring patients with FA, since neoplastic transformation calls for prompt medical action. In this work we developed a deep neural network (DNN) model that was trained with publicly available single-cell RNA-seq (scRNA-seq) datasets from patients with AML and used to predict the presence of AML-like cells in scRNA-seq datasets obtained from bone marrow samples of patients with FA. The predictor displayed high sensitivity, specificity, and accuracy for detecting myeloid malignant transcriptional profiles at single-cell resolution. Functional analyses of the predicted-AML cells from FA patients showed enrichment of lympho-myeloid-primed progenitor (LMPP) and granulocyte-monocyte progenitor (GMP) populations, as well as transcriptional profiles associated with malignant transformation. Cues of immune evasion were also detected using single cell pathway analysis (SCPA) and cell-cell communication profiles.

Background

Inherited bone marrow failure syndromes (IBMFS) are rare diseases characterized by physical abnormalities and an exacerbated risk of developing bone marrow failure (BMF) [1,2]. Among them, Fanconi anemia (FA) is the most frequent, with a global occurrence estimated at 1 in every 160,000–360,000 individuals, a carrier frequency of 0.3%, and a higher prevalence in populations with high consanguinity rates [3,4]. Patients with FA have inherited defects in the FA/BRCA pathway, which is responsible for the repair of DNA interstrand crosslinks (ICLs) [5–7].

Individuals with FA exhibit an extremely high predisposition to develop squamous cell carcinomas of the oral cavity and anogenital region, 500–1000 times higher than that of age-matched peers [7,8], and a significantly increased lifetime risk of developing myeloid malignancies, such as myelodysplastic neoplasm (MDS, up to 6000-fold increased risk) and acute myeloid leukemia (AML, approximately 700 times higher than that in the general population) [7,9,10].

Given the complex aetiology of FA, multiple therapeutic approaches are used to treat BMF in these patients, including administration of androgens and hematopoietic growth factors, as well as hematopoietic stem cell transplantation (HSCT) [4,11]. HSCT remains the only curative treatment for BMF and MDS/AML in FA, but it is not exempt from serious complications, including graft-versus-host disease and secondary malignancies [12]. Moreover, early clonal evolution toward MDS or AML often precedes clinical symptoms, making early detection critical but challenging [13].

MDS is a clonal hematopoietic neoplasm characterized by bone marrow (BM) dysplasia, compromised hematopoiesis and variable risk of progression to AML. One out of three patients diagnosed with MDS will progress to AML, characterized by an increased percentage (>20%) of myeloid blasts in the BM. Despite the high risk of leukemic transformation in FA patients, current monitoring protocols vary between institutions, and the optimal timing or tools for early detection remain under discussion [6,9,14,15].

Common BM surveillance techniques include GTG karyotype and fluorescent in situ hybridization (FISH). Among cytogenetic abnormalities, duplication of chromosome 3q (3q+), deletion of 7q (7q−), and monosomy 7 (−7) are considered high-risk markers for progression to MDS and AML in FA patients [9]. These aberrations not only mark the emergence of MDS but also help identify cases at highest risk for rapid leukemic evolution. Hence, high-risk MDS and AML with MDS-related changes are increasingly recognized as a continuum of disease progression rather than discrete clinical entities [15].

Presence of MDS and AML clones is an indicator for BM transplantation (BMT) in patients with FA, making timely detection of malignant clones a critical step to expedite the BMT preparatory regime [5,16,17]. Although BM karyotype and FISH are highly reliable techniques for the detection of malignant clones, they are time-consuming. In patients with the highest risk of neoplastic transformation, such as patients with FA, the earliest possible detection of abnormal clones is of the utmost importance since rapid evolution of malignant clones in these patients is commonly observed [5,18].

Recent single-cell RNA sequencing (scRNA-seq) technologies have increased our understanding of the transcriptional programs of multiple cancer types at single-cell resolution [19–21]. The advent of single-cell profiling and publicly available AML datasets [22,23] can be exploited to understand the spectrum of the MDS-AML myeloid malignancies.

Artificial intelligence (AI) has gained relevance for analyzing large and complex multidimensional datasets. In the field of AI, machine learning encompasses multiple pattern recognition algorithms used for fitting predictive models to data and/or identifying informative groupings within data [24]. Artificial neural networks are models that consist of multi-layered interconnected nodes that, by mimicking the neuronal connectivity of biological brains, learn hierarchical features of data. Each node, situated within a layer, symbolizes a weighted computation of a vector of variables, learning key aspects of the data and transmitting “learned” information between nodes from the network’s input, hidden, and output layers [24].

This deep learning approach allows the utilization of the above mentioned large and complex cancer scRNA-seq datasets to train machine learning algorithms with the capacity to predict malignant cells from complex cell populations, such as the BM of cancer predisposition syndromes, including FA.

In this work we developed and trained a multi-layer deep neural network (DNN) model for predicting and identifying cells with AML-related transcriptional profiles in scRNA-seq datasets from the BM of patients with FA. The architecture of the DNN consisted of an input layer corresponding to the normalized gene expression matrix, followed by multiple hidden layers with nonlinear activation functions (ReLU), and a final output layer producing binary classification scores [25]. The model was trained using supervised learning with labelled data derived from known AML and healthy scRNA-seq profiles, and optimized using a cross-entropy loss function via backpropagation and stochastic gradient descent. Regularization techniques such as dropout and early stopping were employed to prevent overfitting and improve generalization. The predicted-AML cells were found to be enriched in the lympho-myeloid-primed progenitor (LMPP) and the granulocyte-monocyte progenitor (GMP) compartments, displaying gene expression profiles compatible with malignancy. We further analyzed the gene expression profile of these predicted AML cells and propose some markers for their identification.

Methods

Classifier

Data summary.

To train the model with a solid compendium of single-cell RNAseq data comprising transcriptomic profiles of AML cells, we downloaded count matrices from 16 patients with a confirmed diagnosis of AML, as well as 2 healthy donors with a normal bone marrow, according to van Galen et al (2019, GEO access number GSE116256) [23]. For our query dataset, on the other hand, we used publicly available raw sequencing reads from 6 patients with a confirmed diagnosis of FA and 4 healthy donors with a normal bone marrow, according to Rodríguez et al (2021, GEO access number GSE157591) [26].

Processing of single-cell expression matrices

Raw sequencing data from Rodríguez et al (2021, GEO access number GSE157591) [26] was processed with the Cell Ranger pipeline (v9.0.1) for alignment with the GRCh38 reference genome, barcode demultiplexing, and UMI counting. The resulting count matrices, along with the ones downloaded from van Galen et al (2019, GEO access number GSE116256) [23], underwent standard quality control filters using Seurat (v5.2.1) (min.cells = 3, min.features = 200, max.features = 5000, mt.percent <10%).
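The cell-level thresholds quoted above can be illustrated with a minimal, dependency-free Python sketch. This is not the Seurat code used in the study: the `CellStats` record and the `passes_qc` helper are hypothetical names introduced here, the example cells are invented, and treating the feature bounds as inclusive is an assumption. The gene-level filter (min.cells = 3) is not shown.

```python
from dataclasses import dataclass


@dataclass
class CellStats:
    """Per-cell summary statistics (hypothetical container for illustration)."""
    barcode: str
    n_features: int   # number of genes detected in the cell
    pct_mito: float   # percentage of counts from mitochondrial genes


def passes_qc(cell: CellStats,
              min_features: int = 200,
              max_features: int = 5000,
              max_pct_mito: float = 10.0) -> bool:
    """Return True if the cell satisfies the thresholds quoted in the text.

    Assumption: feature bounds are inclusive; the mitochondrial bound is strict.
    """
    return (min_features <= cell.n_features <= max_features
            and cell.pct_mito < max_pct_mito)


# Invented example cells, one per failure mode.
cells = [
    CellStats("cell_a", n_features=1500, pct_mito=3.2),   # kept
    CellStats("cell_b", n_features=120, pct_mito=2.0),    # too few genes
    CellStats("cell_c", n_features=6200, pct_mito=4.0),   # too many genes
    CellStats("cell_d", n_features=2500, pct_mito=18.5),  # high mitochondrial share
]
kept = [c.barcode for c in cells if passes_qc(c)]
```

Only `cell_a` survives all three filters in this toy input.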

Model input formatting

To generate a single dataset to train and validate the model, datasets corresponding to healthy donors and AML patients from van Galen et al (2019, GEO access number GSE116256) along with healthy donors from Rodríguez et al (2021, GEO access number GSE157591) were first merged into a single Python (v3.13.3) object composed of a matrix with cells as rows and genes as columns. Only genes present in both datasets were kept; these shared genes were taken as the input variables for the model.
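A minimal sketch of the gene-intersection step, using plain Python lists in place of the real count matrices. The helper names (`shared_genes`, `subset_columns`), the gene lists and the counts are all illustrative assumptions, not taken from the study's code.

```python
def shared_genes(genes_a, genes_b):
    """Genes present in both datasets, in the order they appear in genes_a."""
    in_b = set(genes_b)
    return [g for g in genes_a if g in in_b]


def subset_columns(matrix, genes, keep):
    """Reorder a cells x genes matrix so its columns follow `keep`."""
    index = {g: i for i, g in enumerate(genes)}
    return [[row[index[g]] for g in keep] for row in matrix]


# Toy inputs: two datasets with different gene orders and partial overlap.
genes_aml = ["CD34", "MPO", "GATA1", "CD74"]
genes_fa = ["CD74", "CD34", "HBB", "MPO"]
counts_aml = [[5, 0, 2, 9], [1, 7, 0, 3]]   # 2 cells x 4 genes
counts_fa = [[4, 6, 8, 0]]                  # 1 cell x 4 genes

keep = shared_genes(genes_aml, genes_fa)
# Stack the cells of both datasets over the same, identically ordered genes.
merged = (subset_columns(counts_aml, genes_aml, keep)
          + subset_columns(counts_fa, genes_fa, keep))
```

Aligning the column order before stacking matters as much as the intersection itself, because the model treats each column position as one fixed input variable.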

The combined dataset was transformed into an annotated object using the AnnData function from Scanpy (v1.11.1), with observations as cells, shared genes as variables and annotations as metadata. Annotations from the van Galen et al (2019) dataset (malignant and healthy) were already available from [23], meanwhile all cells from healthy donors from the Rodríguez et al [26] dataset were annotated as “healthy”.

RNA counts were normalized using the Scanpy (v1.11.1) function normalize_total (target_sum = 1e6) and then log-transformed using the log1p function (default parameters). To format the data into the training and testing sets for the model, annotations were encoded as binary values (healthy = 0, malignant = 1) and an array with those values was created using the array function from NumPy (v2.2.5). Both the binarized annotations and their corresponding RNA count values were randomly divided into training sets, accounting for 80% of the data, and testing sets, with the remaining 20%, using the scikit-learn (v1.10) function train_test_split (shuffle = True) (Fig 1A and 1B).
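The arithmetic behind this step can be sketched without any library. For a cell with counts c_g and total count N, the transformed value is log(1 + 1e6 · c_g / N). The helpers below are hypothetical stand-ins for the Scanpy and scikit-learn functions named in the text; the random seed and the rounding of the test-set size are assumptions of this sketch.

```python
import math
import random


def normalize_log1p(counts, target_sum=1e6):
    """Scale one cell's counts to `target_sum` total, then apply log(1 + x)."""
    total = sum(counts)
    if total == 0:
        return [0.0 for _ in counts]
    return [math.log1p(c * target_sum / total) for c in counts]


def split_indices(n_cells, test_fraction=0.2, seed=0):
    """Shuffle cell indices and split them into (train, test) lists."""
    indices = list(range(n_cells))
    random.Random(seed).shuffle(indices)
    n_test = round(n_cells * test_fraction)
    return indices[n_test:], indices[:n_test]


cell = [1, 3, 0, 6]                      # toy counts, total = 10
normalized = normalize_log1p(cell)       # first gene: log1p(1e5)
train_idx, test_idx = split_indices(10)  # 8 training cells, 2 test cells
```

Because the split is drawn per cell, cells from the same donor can land in both sets; whether that matters for the reported metrics is outside the scope of this sketch.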

Fig 1. Development of a single-cell resolution deep neural network (DNN) predictor for AML cells in patients with FA.

(A) Workflow illustrating the preprocessing of publicly available single-cell RNA sequencing (scRNA-seq) datasets from AML patients and healthy donors. A deep neural network (DNN) model was trained to identify AML cells (predicted_AML, and non_AML). The training set was composed of 16 patients with AML and 2 healthy donors from the van-Galen et al. (2019) dataset (GSE116256), as well as 4 healthy donors from the Rodríguez et al. (2021) dataset (GSE157591). The combined datasets were split into two sets, 80% of cells were used for training and 20% were used for validation. (B) The trained model was used to predict AML-like cells, i.e., those with gene expression profiles similar to AML cells, in a dataset composed of 6 patients with FA from the Rodríguez et al. (2021) dataset (GSE157591). Cell annotation and functional analyses were performed in the classified cells.

https://doi.org/10.1371/journal.pone.0340578.g001

Building and testing the model

A deep neural network (DNN) model was built using the deep learning API from Keras (v3.9.0). The model was a sequential, layer-based model with 3 hidden layers (Layer 1 = 1000 nodes, Layer 2 = 800 nodes, Layer 3 = 50 nodes) with a sigmoidal type of activation, while the output layer used a softmax activation function. The model was configured using the model.compile function (loss = 'sparse_categorical_crossentropy', optimizer = Adam(), metrics = ["accuracy"]). Finally, the model was trained on the training split (80% of the combined datasets) with the model.fit function (batch_size = 120, epochs = 8).
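To make the output layer and loss concrete, the sketch below computes a softmax over two output scores and the sparse categorical cross-entropy for one cell with an integer label (0 = healthy, 1 = malignant). It is a hand-written illustration of the formulas, not the Keras implementation, and the input scores are invented.

```python
import math


def softmax(logits):
    """Convert raw output scores into probabilities that sum to 1."""
    shift = max(logits)                      # subtract max for numerical stability
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sparse_categorical_crossentropy(probs, label):
    """Negative log-probability assigned to the true integer class label."""
    return -math.log(probs[label])


# An undecided network: equal scores give p = 0.5 per class and loss = log(2).
undecided = softmax([0.0, 0.0])
loss_undecided = sparse_categorical_crossentropy(undecided, 1)

# A confident, correct prediction for a malignant cell gives a smaller loss.
confident = softmax([-2.0, 2.0])
loss_confident = sparse_categorical_crossentropy(confident, 1)
```

The "sparse" variant takes the class index directly, which is why the labels can stay as 0/1 integers rather than one-hot vectors.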

Predicting malignant cells

The query dataset, consisting of the FA patients [26], was formatted as described earlier and their cell types were predicted with the DNN model using the model.predict function, taking the normalized RNA counts as input. Output predictions were renamed from healthy and malignant to non_AML and predicted_AML, respectively.
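A minimal sketch of how per-cell class probabilities could be turned into the two labels. The text does not state the decision rule, so taking the class with the highest probability (argmax) is an assumption here; `label_cells` and the probability values are hypothetical.

```python
LABELS = ("non_AML", "predicted_AML")   # index 0 = healthy, 1 = malignant


def label_cells(probabilities):
    """Assign each cell the label of its highest-probability class (assumed rule)."""
    labels = []
    for p in probabilities:
        best = max(range(len(p)), key=lambda i: p[i])
        labels.append(LABELS[best])
    return labels


# Invented softmax outputs for three query cells.
probs = [[0.91, 0.09], [0.20, 0.80], [0.55, 0.45]]
labels = label_cells(probs)
```

With two classes, argmax is equivalent to a 0.5 threshold on the malignant probability; a stricter threshold would trade sensitivity for specificity.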

Cell Annotation, StemNet visualization and differentially expressed genes (DEG) evaluation.

After obtaining predictions, the dataset was transferred to an R environment, where a Seurat object was created using the CreateSeuratObject function. The object consisted of previously normalized RNA counts across all cells and predictions as meta.data.

Following this, the Azimuth (v0.5.0) algorithm, an automated reference-based approach for single-cell annotation, was applied to the Seurat object using the Human – Bone Marrow reference. This reference includes 297,627 bone marrow cells from 39 donors and three different studies [27–29], as well as the Human Cell Atlas Immune Cell Census.

STEMNET (Velten et al., 2017) was employed to reconstruct the differentiation trajectory, specifically a gradient commitment toward the lymphoid-myeloid and megakaryocyte-erythroid lineages.

Differential expression analysis between groups was conducted using the DESeq2 R package (version 1.42.0). Raw count data were normalized, and dispersion estimates were calculated by applying the DESeq function. Differentially expressed genes (DEGs) were identified based on an adjusted p-value (Benjamini-Hochberg corrected) threshold of 0.05. To focus on genes with the highest variability, the top 70 genes ranked by variance across samples were selected for heatmap visualization using the pheatmap package (version 1.0.12). Volcano plots were generated utilizing the EnhancedVolcano package (version 1.15.4), plotting log2 fold change (log2FC) against the adjusted p-value to visually represent significantly up- and down-regulated genes. Stringent cutoffs were applied for volcano plot annotation, setting adjusted p-value < 1e-5 and |log2FC| > 1. Key genes of biological relevance, including TP53 and MYC, were highlighted. Custom color schemes were employed to distinguish expression levels, and connectors were drawn to enhance label clarity.
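The two numerical steps in this paragraph, Benjamini-Hochberg adjustment and the volcano-plot cutoffs, can be sketched in a few lines of Python. This is a plain re-implementation for illustration, with invented p-values; DESeq2 additionally applies its own filtering before adjustment, which is not reproduced here, and `volcano_class` is a hypothetical helper.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def volcano_class(log2fc, padj, p_cutoff=1e-5, fc_cutoff=1.0):
    """Classify a gene with the cutoffs quoted in the text."""
    if padj < p_cutoff and log2fc > fc_cutoff:
        return "up"
    if padj < p_cutoff and log2fc < -fc_cutoff:
        return "down"
    return "ns"


pvals = [0.01, 0.04, 0.03, 0.005]   # invented raw p-values
padj = benjamini_hochberg(pvals)    # approximately [0.02, 0.04, 0.04, 0.02]
```

A gene must clear both cutoffs to be annotated: a large fold change with a weak adjusted p-value, or the reverse, is classified as not significant.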

Single cell pathway analysis (SCPA).

SCPA was conducted in RStudio (v2024.12.1 + 563) using the SCPA package (v1.6.2), which involved extracting log1p-normalized data from each relevant population. The pathways used in the analysis were generated from the publicly available Molecular Signatures Database using the msigdbr package (v10.0.2) within R. Comparisons were performed using the compare_pathways function within SCPA, with the only inclusion criterion being gene sets with 15–200 genes [30]. Data processing and visualization were carried out using the Seurat (v5.2.1), ggplot2 (v3.5.2), and ComplexHeatmap (v2.18.0) R packages.

Subsequently, expression levels of genes encoding cell surface proteins and soluble factors, including immunomodulatory proteins and growth factors, were evaluated across cell types. The SCpubr package (v1.2.0) was used to generate customized boxplots from the normalized expression data of each population. Pairwise comparisons per marker were conducted among healthy cells, FA Non-AML, and FA Predicted-AML cells using the Wilcoxon rank-sum test as implemented in SCpubr. Markers evaluated include LGALS9, CD200, CD74 and IL-16, among others. The boxplots were generated without silhouette plots, and significance annotations were displayed, to prioritize clarity in the visualization of expression distributions. These analyses enabled the assessment of differential expression of specific surface proteins and soluble factors relevant to immune regulation and disease progression.

Code availability

The code supporting the findings of this article is available in the following GitHub repository: https://github.com/BMF-CP-Lab/DNN-AML-MDS-classifier

Ethics statement

Publicly available pre-processed scRNAseq datasets were retrieved from public repositories, including van Galen et al (2019, GEO access number GSE116256) [23] and Rodríguez et al (2021, GEO access number GSE157591) [26]. All data were fully anonymized, and the authors did not have access to patients’ identities. This project was approved by the Institutional Review Board (IRB) of the National Institute of Pediatrics in Mexico, under approval number: 2023/003. This IRB is registered with the U.S. Department of Health and Human Services (HHS) under IRB number: IRB00013674.

Results

Using bone marrow derived scRNAseq datasets from healthy donors and patients with AML, we developed and trained a DNN model that predicts, at the single cell level, cells with AML transcriptional profiles (Fig 1A). This model was used to predict the presence of cells with transcriptional profiles resembling AML cells in the BM of patients with FA (Fig 1B) [23]. Our DNN AML classifier displays 96% accuracy, 98% sensitivity and 93% specificity in the prediction of cells with AML-like transcriptional profiles (Fig 2A).
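For reference, accuracy, sensitivity and specificity follow directly from the four cells of a binary confusion matrix. The sketch below uses invented counts chosen for easy arithmetic; they are not the counts behind Fig 2A, and `classification_metrics` is a hypothetical helper.

```python
def classification_metrics(tp, fn, tn, fp):
    """Compute accuracy, sensitivity and specificity from confusion-matrix counts.

    tp/fn: malignant cells predicted malignant / healthy.
    tn/fp: healthy cells predicted healthy / malignant.
    """
    total = tp + fn + tn + fp
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),   # true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
    }


# Invented counts for illustration only.
metrics = classification_metrics(tp=90, fn=10, tn=80, fp=20)
```

Here the toy classifier reaches 85% accuracy, 90% sensitivity and 80% specificity; accuracy alone can hide an imbalance between the last two when one class dominates the test set.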

Fig 2. Predicted AML cells are more abundant in the lymphomyeloid lineage of patients with FA.

(A) Confusion matrix of the AML DNN predictor model. (B) StemNet plot showing the trajectory of differentiation of HSPCs from FA patients. The GMP, CLP and LMPP progenitors are the most undifferentiated cell types (as shown by the diamonds in the centre of the plot). (C) StemNet plot projecting the predicted-AML cells in FA patients (red dots). The FA predicted-AML cells are mainly found in the undifferentiated lymphomyeloid compartments (LMPP, GMP and CLP), as indicated by the centrally located red dots in the plot. (D) Bar plot showing the percentage of predicted-AML cells (red stacked bar) per cell type in samples from FA patients. (E) Bar plot showing the number of predicted-AML cells (red stacked bar) per cell type compartment in samples from FA patients. (F) Bar plots showing the number of predicted-AML cells (red stacked bar) per cell type compartment per FA patient.

https://doi.org/10.1371/journal.pone.0340578.g002

Using Azimuth [27–29] as reference, we annotated the scRNA-seq dataset from the BM of patients with FA and obtained different cell types, including Hematopoietic Stem Cells (HSC), Erythroid Megakaryocyte Progenitor (EMP), Lymphoid Primed Multipotent Progenitor (LMPP), Common Lymphoid Progenitor (CLP), Granulocyte Monocyte Progenitor (GMP), Early Erythroid, Late Erythroid, Progenitor B (pro B), Precursor Plasmacytoid Dendritic Cell (pre-pDC), Precursor Myeloid Dendritic Cell (pre-mDC) and Basophil-Eosinophil-Mast Progenitor (BaEoMa). We then performed a StemNet representation analysis to display the different progenitors and their maturation state (Fig 2B). We visualized the distribution of the predicted_AML, non_AML, and healthy cells from Rodríguez et al [26] and observed that most of the FA predicted_AML cells appeared in the LMPP compartment and were less differentiated (at the center of the StemNet plot) (Fig 2C). We then calculated the percentage of predicted_AML cells per cell type (Fig 2D), the number of predicted_AML cells per cell type (Fig 2E) and the number of cells per cell type and per patient (Fig 2F), in every case relative to the FA non_AML cells.
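The per-cell-type percentage is a simple grouped count. The sketch below shows the calculation on invented annotations; `percent_predicted_aml` is a hypothetical helper and the values bear no relation to the figures.

```python
from collections import Counter


def percent_predicted_aml(cell_types, predictions):
    """Percentage of cells labelled predicted_AML within each cell type."""
    totals = Counter(cell_types)
    flagged = Counter(ct for ct, pred in zip(cell_types, predictions)
                      if pred == "predicted_AML")
    return {ct: 100.0 * flagged[ct] / totals[ct] for ct in totals}


# Invented annotations for seven cells.
cell_types = ["LMPP", "LMPP", "LMPP", "LMPP", "GMP", "GMP", "HSC"]
predictions = ["predicted_AML", "predicted_AML", "non_AML", "predicted_AML",
               "predicted_AML", "non_AML", "non_AML"]
percentages = percent_predicted_aml(cell_types, predictions)
```

Reporting both the percentage and the absolute count, as in the figure panels, guards against over-reading a high percentage in a compartment with very few captured cells.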

After identifying the FA predicted_AML cells, we used pseudo-bulk RNA-seq analysis to compare their gene expression profile against that of healthy cells, the remaining FA non_AML cells, and AML cells. Interestingly, the differentially expressed genes (DEG) obtained through this analysis separated into three gene modules corresponding to the different cell identities. Module A is composed of genes expressed only in healthy cells and downregulated in all FA cells as well as in all AML cells. Module B is a set of genes expressed only in the FA non_AML cells and whose expression is lost in the FA predicted-AML cells. Finally, Module C is composed of genes that are downregulated in healthy and FA non_AML cells, start to gain expression in the FA predicted-AML cells, and are fully activated in the AML cells, suggesting an activation gradient as the cells progress from non-malignant towards AML (S1A Fig). Module score analysis at single-cell resolution confirmed the downregulation of module A genes in AML cells and in the FA predicted-AML cells (S1B Fig). Module A is composed of potential tumour suppressor genes like PCDH9 [31,32], aging-related genes like ATP6V1G3 [33,34], genes involved in cell cycle regulation and cancer invasiveness like PKP2 [35], and long noncoding RNAs like HCN3 [36], HIST2H2AA4 [37], Six3os [38], and LINC01173, related to homeostasis maintenance [39] and cancer inhibition [34,40,41]. Pseudogenes in this module with undefined functions include RN7SL668P, RNA5SP68, RNU6ATAC27P, RP11-78H24.1 and RP11-696F10.1.

We also assessed DEG between the FA predicted_AML cells and the FA non_AML cells. Of note, we obtained a distinct gene expression profile, including increased expression of WISP3, a gene previously associated with aggressive inflammatory breast cancer and breast cancer metastasis [42–44], and CCNA1 (Cyclin A1), a canonical cyclin that promotes S and G2 phase progression and was previously reported to be overexpressed in up to 82% of AML cells [45–48]. We also noted downregulation of immune inhibitory molecules, specifically CTLA4 and LAIR1 (Fig 3A).

Fig 3. Gene expression profile of the predicted AML cells from FA patients.

(A) Volcano plot showing DEG between the FA predicted_AML cells and the FA non-AML cells; pCutoff = 1e-5, FCcutoff = 1. (B) Single-cell pathway analysis (SCPA) showing enrichment of negative regulators of cell death pathways, specifically in the FA predicted_AML cells. (C) SCPA showing enrichment of inflammatory response pathways in sub-compartments of the FA predicted_AML cells.

https://doi.org/10.1371/journal.pone.0340578.g003

Later, using Single Cell Pathway Analysis (SCPA) and GO (Gene ontology) terms (Biological Processes), we aimed to identify coordinated transcriptional changes in biological pathways of interest. The pathway “Negative regulation of cell death” (gene list shown in the S1 Table) was found to be more active in the predicted_AML cells, and more prominently in the LMPP and GMP sub-compartments (Fig 3B). The “Regulation of the inflammatory response” was another pathway (gene list shown in the S2 Table) active in the FA predicted_AML cells, specifically in the LMPP, pre-mDC and BaEoMa subcompartments (Fig 3C). Importantly, promotion of an inflammatory environment is among the most reported mechanisms driven by malignant cells to promote their development and proliferation.

Another mechanism relevant to FA, particularly described in the Japanese population, is the aldehyde degradation deficiency syndrome, in which ALDH2 provides a critical compensatory role in detoxifying formaldehyde when ADH5 is deficient [49]. Interestingly, our expression analysis revealed upregulation of both ALDH2 and ADH5 in FA_predicted_AML cells compared with FA_non-AML and healthy cells (S3 Fig). These transcriptional patterns suggest that the predicted AML-like cells may gain a survival advantage under aldehyde-induced stress in the bone marrow microenvironment.

We next explored the expression of potential surface markers and soluble factors differentially expressed by the FA predicted_AML cells in comparison to the FA non_AML and healthy cells. We obtained 11 potential surface markers significantly upregulated in the FA predicted_AML cells, including CD200, CD99, CD74, HLA-DR/DP/DQ, CXCR4, LAIR1, L-Selectin, P-Selectin, Galectin-9 and PECAM-1 (Fig 4A), previously reported by other authors as surface markers in different cell types [50–60] or even as markers to identify leukemic cells [50,52,55,61–64]. We also obtained 6 potential soluble factors overexpressed by the FA predicted_AML cells, including TNFSF13B, APP, IL-16, HGF, Pro-granulin and Semaphorin-4 (Fig 4B). TNFSF13B (BAFF), IL-16, HGF and progranulin are known to modulate the immune microenvironment, cell survival, and inflammatory signalling [65–70]. During the progression to acute myeloid leukemia (AML), dysregulated expression or secretion of these soluble factors can contribute to immune evasion, support leukemic cell proliferation, and remodel the bone marrow niche in favour of malignant hematopoiesis [70–73]. The soluble nature of these molecules allows them to act at long range, amplifying systemic effects that may further disrupt hematopoietic homeostasis and promote different disease manifestations or symptoms [74].

Fig 4. Overexpression of potential cell surface markers and soluble factors in the FA-predicted AML cells.

(A) Boxplots showing overexpression of genes encoding potential cell surface markers in the FA predicted_AML cells in comparison to healthy cells and other FA cells. (B) Boxplots showing overexpression of genes encoding soluble factors in the FA predicted_AML cells in comparison to healthy cells and other FA cells. The Wilcoxon rank-sum test was used for comparisons.

https://doi.org/10.1371/journal.pone.0340578.g004

After identifying differential activation of signalling pathways in the FA predicted_AML cells, we evaluated whether these cells could be communicating or interacting with other cell types in the scRNA-seq dataset of FA patients. Using CellChat [75] we inferred interactions occurring among cell types and observed that the predicted_AML LMPP and GMP cell types, were the main interactors with other cell types classified as FA non_AML (Fig 5A and 5B).

Fig 5. Analysis of cellular interactions between predicted_AML cells and non-malignant cell types in patients with FA.

(A) NetVisual circle showing the interactions among healthy progenitors (LMPP and GMP) and other cell types. (B) NetVisual circle showing interactions among the FA predicted_AML cells (LMPP and GMP) and other cell types. The predicted malignant cells are indicated with a red arrowhead. (C) NetVisual Bubble plot showing the signaling pathways involved in cell interaction among healthy progenitors (LMPP and GMP) and the other cell types. (D) NetVisual Bubble plot showing the signaling pathways involved in cell interaction among the predicted AML cells (LMPP and GMP) and the other cell types.

https://doi.org/10.1371/journal.pone.0340578.g005

One of the most enriched pathways of intercellular communication is the MIF-(CD74 + CXCR4) pathway (Fig 5C and 5D). The MIF-CD74 interaction triggers the activation of the pro-survival and proliferative Akt and ERK pathways, both important in tissue repair [76]. This interaction has also been shown to regulate tumour progression and to determine patient outcomes in advanced melanoma and tumorigenesis [77,78]. Another enriched pathway was the CD99-CD99 pathway. CD99 is a molecule involved in crucial biological processes, including cell adhesion, migration, death, differentiation and diapedesis [79]. CD99 influences processes associated with inflammation, immune responses and cancer, including lymphoma/leukemia [80] and myeloid malignancies [52]. Finally, the APP-CD74 pathway appears increased in the FA predicted-AML cells. This pathway has been implicated in the production of beta amyloid proteins, but recent studies have reported the interaction to be associated with malignancy, including melanoma and adenoid cystic carcinoma [81,82]. Altogether, these results suggest that the predicted_AML cells, mainly in the LMPP and GMP sub-compartments, are activating potential early mechanisms associated with malignancy (Fig 5C and 5D).

Although our DNN model was trained on scRNA-seq datasets and we explored its potential to flag cells exhibiting transcriptional features associated with AML, the underlying technology imposes important limitations. The 10x Genomics 3′-capture scRNA-seq platform is optimized for transcriptomic profiling and provides restricted, non-uniform coverage of transcripts, preventing reliable detection of pathogenic variants or somatic mutations, particularly in genes with low or variable expression [83]. Consequently, mutation-level resolution is beyond the capability of our current model. Accurate identification of genomic alterations, including those relevant to leukemic progression, would require complementary DNA-based approaches such as whole-genome or whole-exome sequencing.

Discussion

FA is a chromosome instability and cancer predisposition syndrome, with an exacerbated risk of developing MDS and AML. We therefore reasoned that cells with gene expression profiles similar to AML could be found in the BM of FA patients even at pre-clinical stages, and that such cells could be detected using AI tools applied to scRNAseq datasets. In this work, using a DNN model, we first aimed to predict and identify cells with gene expression profiles similar to bona fide AML cells, and subsequently analysed their gene expression profile to identify potential cell surface markers and infer how these cells interact with other cells in the BM microenvironment.

Importantly, the AML-like cells predicted by our DNN model in FA patients were enriched in the LMPP and GMP compartments, suggesting a very primitive identity and a transcriptional profile that resembles physiological primitive progenitor cells. Of note, others have proposed that these are important compartments for the origin of myeloid malignancy [84]. Our model did not predict AML cells in the HSC compartment, probably because FA patients have very few of these primitive cells, and their capture with microfluidic single-cell technologies was therefore scarce.
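One way to quantify such an enrichment is sketched below: the fraction of flagged cells is computed per compartment, and a one-sided hypergeometric test asks whether a compartment holds more flagged cells than expected by chance. This is a hedged illustration, not the procedure used in this study; the function names are hypothetical and the cell counts are toy values invented for the example.

```python
from collections import Counter
from math import comb


def flagged_fraction(compartments, flagged):
    """Fraction of flagged (predicted AML-like) cells within each compartment."""
    totals = Counter(compartments)
    hits = Counter(c for c, f in zip(compartments, flagged) if f)
    return {c: hits[c] / totals[c] for c in totals}


def enrichment_p_value(compartments, flagged, compartment):
    """One-sided hypergeometric P(X >= observed) for a single compartment."""
    n_cells = len(compartments)
    n_flagged = sum(flagged)
    n_in_comp = sum(1 for c in compartments if c == compartment)
    observed = sum(1 for c, f in zip(compartments, flagged) if f and c == compartment)
    denom = comb(n_cells, n_in_comp)
    upper = min(n_flagged, n_in_comp)
    return sum(
        comb(n_flagged, k) * comb(n_cells - n_flagged, n_in_comp - k) / denom
        for k in range(observed, upper + 1)
    )


# Toy data (invented): 100 cells, 24 of them flagged by the classifier.
compartments = ["HSC"] * 5 + ["LMPP"] * 20 + ["GMP"] * 25 + ["MEP"] * 50
flagged = (
    [False] * 5                   # HSC: 0 of 5 flagged
    + [True] * 12 + [False] * 8   # LMPP: 12 of 20 flagged
    + [True] * 10 + [False] * 15  # GMP: 10 of 25 flagged
    + [True] * 2 + [False] * 48   # MEP: 2 of 50 flagged
)

fractions = flagged_fraction(compartments, flagged)
for comp in ("HSC", "LMPP", "GMP", "MEP"):
    p = enrichment_p_value(compartments, flagged, comp)
    print(comp, round(fractions[comp], 2), p)
```

With real data, a compartment with very few captured cells (such as HSC here) has little statistical power, so the absence of flagged cells in it should be read with caution.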

Most of the predicted-AML cells were detected in four out of six FA patients (Fig 2F). In patient no. 4, BM cytogenetics at the time of scRNAseq detected a clone with chromosome 7q deletion. This chromosome abnormality is well known to have a high negative predictive score [16], and this sole abnormality places patients in the high-risk AML group with the worst prognosis [85]. Recent work has found that 7q loss is a common event during the progression of FA patients towards AML [17]. In patient no. 1, mildly dysplastic megakaryocytes were detected during routine BM examination, but the BM karyotype was reported as normal. Patients no. 2 and no. 5 also had predicted AML cells; however, neither cytogenetic clones nor morphological changes were detected in their routine clinical evaluation at the time of scRNAseq. The prediction of AML-like cells in these three patients highlights the relevance of searching for novel ways, beyond conventional karyotype and FISH, to identify malignant progression. In patients no. 3 and no. 7, a negligible number of malignant cells was predicted. Interestingly, patient no. 3 was previously found to have a clone with chromosome X trisomy, an abnormality that has not been linked to MDS or AML in FA [16,17] and might therefore not be relevant for malignant transformation.

Gene expression analysis gave us a broad idea of the potential cellular mechanisms that set the predicted malignant cells apart from healthy cells. The expression profile of the predicted_AML cells suggests transformation towards malignancy, including changes in immune modulation (downregulation of the immune inhibitory molecules CTLA4 and LAIR1) and changes in molecules associated with tumour progression (increased expression of CCNA1, HLA-C and WISP3) (S1 Fig and Fig 3A). Interestingly, others have reported these molecules as onco-therapeutic targets [45,86–89].

Our gene expression analysis concurs with previous reports, where overexpression of CD74, CTLA-4, HLA-C, CD79A, IRF5 and LAG3 has been associated with AML and other types of cancer, indicating their potential participation in tumour development, either as tumour initiators or as immunological checkpoint modulators that allow malignant progression [76,77,90–95]. Interestingly, upregulation of CD200, CD99 and PECAM1 in the FA predicted AML cells, in comparison to Healthy or FA non_AML cells (Fig 4A), opens the possibility of discovering novel AML-associated markers, especially in the FA context.

SCPA analysis showed increased activation of anti-cell death mechanisms and differential activation of several inflammatory pathways in the FA Predicted_AML cells in comparison to the Healthy or FA Non_AML cells (Fig 3B-3C). This highlights the relevance of inflammation in these patients, as the absence of FANC proteins leads to increased ROS levels, inflammasome activation and production of inflammatory cytokines [96,97].

Inferring how the predicted-AML cells interact with the rest of the BM cell populations is now possible with tools such as CellChat [75], which allowed us to explore the interactions between the FA predicted malignant LMPP and GMP cells and the remaining cells. In this analysis, the main pathway predicted to mediate communication between the FA predicted_AML and the remaining FA non_AML cells is the MIF-CD74 pathway. This pathway is important in protecting against injury and promoting healing in different parts of the body, but it has also been reported in some types of cancer, such as adenoid cystic carcinoma and melanoma [76,77,81,82]. Communication through the CD99-CD99 pathway was also detected. CD99 has been found to be relevant in lymphoma, leukaemia and myeloid malignancies [98,99]. This pathway is particularly interesting because CD99 expression in T cells is used to detect minimal residual disease in acute lymphoblastic leukemia [80], and some clinical trials propose CD99 as a therapeutic target in AML [52,100].
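The intuition behind ligand-receptor inference can be sketched with a deliberately simplified score: the mean ligand expression in the sender population multiplied by the mean receptor expression in the receiver population, with multi-subunit receptors such as CD74 + CXCR4 summarised by the geometric mean of their subunits. This is a stand-in for illustration and is not CellChat's actual model, which is more elaborate; the function names, cell identifiers and expression values below are hypothetical.

```python
from math import prod
from statistics import mean


def group_mean(expr, cells, gene):
    """Mean expression of a gene across a group of cells (missing gene counts as 0)."""
    return mean(expr[c].get(gene, 0.0) for c in cells)


def lr_score(expr, sender_cells, receiver_cells, ligand, receptor_subunits):
    """Simplified communication score: sender ligand level x receiver receptor level.

    The receptor level is the geometric mean of its subunits, so the score is
    zero whenever any required subunit is not expressed in the receivers.
    """
    ligand_level = group_mean(expr, sender_cells, ligand)
    subunit_levels = [group_mean(expr, receiver_cells, g) for g in receptor_subunits]
    receptor_level = prod(subunit_levels) ** (1.0 / len(subunit_levels))
    return ligand_level * receptor_level


# Toy expression values (invented), keyed by cell and then by gene.
expr = {
    "aml_like_1": {"MIF": 4.0, "CD99": 2.0},
    "aml_like_2": {"MIF": 2.0, "CD99": 2.0},
    "other_1": {"CD74": 4.0, "CXCR4": 1.0, "CD99": 1.0},
    "other_2": {"CD74": 4.0, "CXCR4": 1.0, "CD99": 3.0},
}
senders = ["aml_like_1", "aml_like_2"]
receivers = ["other_1", "other_2"]

mif_score = lr_score(expr, senders, receivers, "MIF", ["CD74", "CXCR4"])
cd99_score = lr_score(expr, senders, receivers, "CD99", ["CD99"])
print("MIF-(CD74+CXCR4):", mif_score)
print("CD99-CD99:", cd99_score)
```

Scores of this kind are only comparable within one dataset and one normalisation, and a real analysis would also need a permutation or similar test to decide which interactions are stronger than expected by chance.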

To the best of our knowledge, this is the first effort in which publicly available scRNAseq datasets are leveraged to train a machine learning predictor aimed at identifying malignant cells in cancer-prone bone marrow failure syndromes. This analysis provides insights into the identification of markers for early detection of myeloid malignant cells in the BM of patients with FA. Based on our results, we aim to further characterize these AML_predicted cells and propose potential therapeutic strategies that target these malignant cells before full-blown AML occurs.

Our study has limitations. We rely on publicly available scRNA-seq data and their associated metadata; therefore, we could not directly correlate our findings with the longitudinal clinical follow-up of the FA patients. The current lack of access to updated or extended clinical outcomes limits our ability to assess the predictive value of the identified AML-like cell populations for disease progression or relapse in these specific patients. However, a prospective search for AML-like cells using markers derived from our predictions is warranted.

Our analyses are also constrained by the technical properties of the 10x Genomics Chromium 3′-end scRNA-seq platform, which captures only the terminal portion of transcripts and provides limited sequencing depth. As a result, full-length coverage of FA genes and cancer-associated genes is not achievable, precluding reliable detection of pathogenic germline variants, secondary somatic mutations (e.g., in TP53), or complex cytogenetic abnormalities. Similarly, the sparsity and dropout inherent to this technology limit the sensitivity of CNV-inference tools and restrict the model’s ability to resolve genotype- or population-specific effects. These constraints underscore that our DNN predictions reflect transcriptional consequences rather than direct genomic alterations and highlight the need for future integration of complementary single-cell DNA or full-length RNA sequencing modalities.

Conclusion

In this work we implemented a DNN machine learning algorithm that was trained using publicly available scRNA-seq datasets for the detection of AML cells. Using this algorithm, we predicted the presence of AML cells in scRNA-seq datasets from the BM of patients with FA. The predicted_AML cells were found enriched in the LMPP and GMP hematopoietic compartments and have gene expression profiles compatible with malignancy. Further experimental approaches that confirm the identity of these predicted malignant cells are warranted.

Supporting information

S1 Fig. Gene modules provide identity to healthy HSPCs, AML cells and FA cells.

(A) Heatmap of differentially expressed genes, identified through pseudo-bulk analysis of the scRNAseq datasets, among healthy cells, FA non-AML cells, FA predicted-AML cells and AML cells. Genes that allow identification of cell types are classified in modules. (B) Module score analysis using scRNAseq data showing average expression of gene modules per cell type.

https://doi.org/10.1371/journal.pone.0340578.s001

(TIFF)

S2 Fig. Expression of FA genes with respect to the mutated FANC gene in FA patients.

(A) Bubble plot showing the average expression of the FA pathway genes per cell type, with FA patients grouped according to their inactivated germline FANC gene.

https://doi.org/10.1371/journal.pone.0340578.s002

(TIFF)

S3 Fig. Increased expression of ALDH2 and ADH5 in the FA predicted-AML cells.

(A) Boxplots showing increased expression of ALDH2 in the FA-predicted AML cells in comparison to healthy and FA non AML cells. (B) Boxplots showing increased expression of ADH5 in the FA-predicted AML cells and in the FA non AML cells in comparison to healthy cells.

https://doi.org/10.1371/journal.pone.0340578.s003

(JPEG)

S1 Table. Gene list “Negative regulation of cell death”.

https://doi.org/10.1371/journal.pone.0340578.s004

(XLSX)

S2 Table. Gene list “Regulation of the inflammatory response”.

https://doi.org/10.1371/journal.pone.0340578.s005

(XLSX)

References

  1. Dokal I, Tummala H, Vulliamy T. Inherited bone marrow failure in the pediatric patient. Blood. 2022;140(6):556–70. pmid:35605178
  2. Elghetany MT, Punia JN, Marcogliese AN. Inherited bone marrow failure syndromes: biology and diagnostic clues. Clin Lab Med. 2021;41(3):417–31.
  3. Steinberg-Shemer O, Goldberg TA, Yacobovich J, Levin C, Koren A, Revel-Vilk S, et al. Characterization and genotype-phenotype correlation of patients with Fanconi anemia in a multi-ethnic population. Haematologica. 2020;105(7):1825–34. pmid:31558676
  4. Eghbali A, Safdari SM, Yousefi Roozbahani M, Tavajohi K, Hosseini S. Fanconi anemia: challenges in diagnosis and management - a case series report. Clin Case Rep. 2024;12(11):e9583.
  5. Moreno OM, Paredes AC, Suarez-Obando F, Rojas A. An update on Fanconi anemia: Clinical, cytogenetic and molecular approaches (Review). Biomed Rep. 2021;15(3):74. pmid:34405046
  6. Che R, Zhang J, Nepal M, Han B, Fei P. Multifaceted Fanconi anemia signaling. Trends in Genetics. 2018;34(3):171–83.
  7. Romick-Rosendale LE, Lui VWY, Grandis JR, Wells SI. The Fanconi anemia pathway: repairing the link between DNA damage and squamous cell carcinoma. Mutat Res. 2013;743–744:78–88. pmid:23333482
  8. Alter BP, Giri N, Savage SA, Rosenberg PS. Cancer in the National Cancer Institute inherited bone marrow failure syndrome cohort after fifteen years of follow-up. Haematologica. 2018;103(1):30–9. pmid:29051281
  9. Bhandari JTP, Puckett Y. Fanconi Anemia. https://www.ncbi.nlm.nih.gov/books/NBK559133/
  10. Alter BP. Fanconi anemia and the development of leukemia. Best Pract Res Clin Haematol. 2014;27(3–4):214–21. pmid:25455269
  11. Calado RT, Clé DV. Treatment of inherited bone marrow failure syndromes beyond transplantation. Hematology Am Soc Hematol Educ Program. 2017;2017(1):96–101. pmid:29222242
  12. Liu YC, Eldomery MK, Maciaszek JL, Klco JM. Inherited predispositions to myeloid neoplasms: pathogenesis and clinical implications. Annu Rev Pathol. 2025;20(1):87–114.
  13. Jonas BA, Greenberg PL. MDS prognostic scoring systems – past, present, and future. Best Pract Res Clin Haematol. 2015;28(1):3–13. pmid:25659725
  14. Chen J, Kao Y-R, Sun D, Todorova TI, Reynolds D, Narayanagari S-R, et al. Myelodysplastic syndrome progression to acute myeloid leukemia at the stem cell level. Nat Med. 2019;25(1):103–10. pmid:30510255
  15. Zavras PD, Sinanidis I, Tsakiroglou P, Karantanos T. Understanding the continuum between high-risk myelodysplastic syndrome and acute myeloid leukemia. Int J Mol Sci. 2023;24(5).
  16. Behrens YL, Göhring G, Bawadi R, Cöktü S, Reimer C, Hoffmann B, et al. A novel classification of hematologic conditions in patients with Fanconi anemia. Haematologica. 2021;106(11):3000–3. pmid:34196171
  17. Sebert M, Gachet S, Leblanc T, Rousseau A, Bluteau O, Kim R, et al. Clonal hematopoiesis driven by chromosome 1q/MDM4 trisomy defines a canonical route toward leukemia in Fanconi anemia. Cell Stem Cell. 2023;30(2):153-170.e9. pmid:36736290
  18. Chang L, Cui Z, Shi D, Chu Y, Wang B, Wan Y, et al. Polyclonal evolution of Fanconi anemia to MDS and AML revealed at single cell resolution. Exp Hematol Oncol. 2022;11(1):64. pmid:36167633
  19. Del Giudice M, Peirone S, Perrone S, Priante F, Varese F, Tirtei E, et al. Artificial intelligence in bulk and single-cell RNA-sequencing data to foster precision oncology. Int J Mol Sci. 2021;22(9).
  20. Huang G-H, Zhang Y-H, Chen L, Li Y, Huang T, Cai Y-D. Identifying lung cancer cell markers with machine learning methods and single-cell RNA-seq data. Life (Basel). 2021;11(9):940. pmid:34575089
  21. Petti AA, Williams SR, Miller CA, Fiddes IT, Srivatsan SN, Chen DY, et al. A general approach for detecting expressed mutations in AML cells using single cell RNA-sequencing. Nat Commun. 2019;10(1):3660. pmid:31413257
  22. Ganan-Gomez I, Yang H, Ma F, Montalban-Bravo G, Thongon N, Marchica V, et al. Stem cell architecture drives myelodysplastic syndrome progression and predicts response to venetoclax-based therapy. Nat Med. 2022;28(3):557–67. pmid:35241842
  23. van Galen P, Hovestadt V, Wadsworth Ii MH, Hughes TK, Griffin GK, Battaglia S, et al. Single-cell RNA-seq reveals AML hierarchies relevant to disease progression and immunity. Cell. 2019;176(6):1265-1281.e24. pmid:30827681
  24. Greener JG, Kandathil SM, Moffat L, Jones DT. A guide to machine learning for biologists. Nat Rev Mol Cell Biol. 2022;23(1):40–55. pmid:34518686
  25. Serghiou S, Rough K. Deep learning for epidemiologists: an introduction to neural networks. Am J Epidemiol. 2023;192(11):1904–16. pmid:37139570
  26. Rodríguez A, Zhang K, Färkkilä A, Filiatrault J, Yang C, Velázquez M, et al. MYC promotes bone marrow stem cell dysfunction in Fanconi anemia. Cell Stem Cell. 2021;28(1):33-47.e8. pmid:32997960
  27. Granja JM, Klemm S, McGinnis LM, Kathiria AS, Mezger A, Corces MR, et al. Single-cell multiomic analysis identifies regulatory programs in mixed-phenotype acute leukemia. Nat Biotechnol. 2019;37(12):1458–65. pmid:31792411
  28. Oetjen KA, Lindblad KE, Goswami M, Gui G, Dagur PK, Lai C, et al. Human bone marrow assessment by single-cell RNA sequencing, mass cytometry, and flow cytometry. JCI Insight. 2018;3(23):e124928. pmid:30518681
  29. Stuart T, Butler A, Hoffman P, Hafemeister C, Papalexi E, Mauck WM, et al. Comprehensive integration of single-cell data. Cell. 2019;177(7):1888-1902.e21. pmid:31178118
  30. Bibby JA, Agarwal D, Freiwald T, Kunz N, Merle NS, West EE, et al. Systematic single-cell pathway analysis to characterize early T cell activation. Cell Rep. 2022;41(8):111697.
  31. Zhang J, Yang H-Z, Liu S, Islam MO, Zhu Y, Wang Z, et al. PCDH9 suppresses melanoma proliferation and cell migration. Front Oncol. 2022;12:903554. pmid:36452505
  32. Lv J, Zhu P, Zhang X, Zhang L, Chen X, Lu F, et al. PCDH9 acts as a tumor suppressor inducing tumor cell arrest at G0/G1 phase and is frequently methylated in hepatocellular carcinoma. Mol Med Rep. 2017;16(4):4475–82. pmid:28791409
  33. Xue X-P, Sheng Y, Ren Q-Q, Xu S-M, Li M, Liu Z-X, et al. Inhibition of ATP1V6G3 prompts hepatic stellate cell senescence with reducing ECM by activating Notch1 pathway to alleviate hepatic fibrosis. Tissue Cell. 2024;91:102554. pmid:39316936
  34. Saxena V, Arregui S, Zhang S, Canas J, Qin X, Hains DS, et al. Generation of Atp6v1g3-Cre mice for investigation of intercalated cells and the collecting duct. Am J Physiol Renal Physiol. 2023;325(6):F770–8. pmid:37823193
  35. Takahashi H, Nakatsuji H, Takahashi M, Avirmed S, Fukawa T, Takemura M, et al. Up-regulation of plakophilin-2 and Down-regulation of plakophilin-3 are correlated with invasiveness in bladder cancer. Urology. 2012;79(1):240.e1-8. pmid:22119253
  36. Tang W, Dong K, Li K, Dong R, Zheng S. MEG3, HCN3 and linc01105 influence the proliferation and apoptosis of neuroblastoma cells via the HIF-1α and p53 pathways. Sci Rep. 2016;6:36268. pmid:27824082
  37. Wang Y, Tan J, Xu C, Wu H, Zhang Y, Xiong Y, et al. Identification and construction of lncRNA-associated ceRNA network in diabetic kidney disease. Medicine (Baltimore). 2021;100(22):e26062. pmid:34087849
  38. Ramos AD, Diaz A, Nellore A, Delgado RN, Park K-Y, Gonzales-Roybal G, et al. Integration of genome-wide approaches identifies lncRNAs of adult neural stem cells and their progeny in vivo. Cell Stem Cell. 2013;12(5):616–28. pmid:23583100
  39. Fischer M, Riege K, Hoffmann S. The landscape of human p53-regulated long non-coding RNAs reveals critical host gene co-regulation. Mol Oncol. 2023;17(7):1263–79. pmid:36852646
  40. Kui M, Pluznick JL, Zaidman NA. The transcription factor Foxi1 promotes expression of V-ATPase and Gpr116 in M-1 cells. Am J Physiol Renal Physiol. 2023;324(3):F267–73. pmid:36603001
  41. Yin X, Lin H, Lin L, Miao L, He J, Zhuo Z. LncRNAs and CircRNAs in cancer. MedComm (2020). 2022;3(2):e141. pmid:35592755
  42. Kleer CG, Zhang Y, Merajver SD. CCN6 (WISP3) as a new regulator of the epithelial phenotype in breast cancer. Cells Tissues Organs. 2007;185(1–3):95–9. pmid:17587813
  43. Kleer CG, Zhang Y, Pan Q, van Golen KL, Wu Z-F, Livant D, et al. WISP3 is a novel tumor suppressor gene of inflammatory breast cancer. Oncogene. 2002;21(20):3172–80. pmid:12082632
  44. Tran MN, Kleer CG. Matricellular CCN6 (WISP3) protein: a tumor suppressor for mammary metaplastic carcinomas. J Cell Commun Signal. 2018;12(1):13–9. pmid:29357008
  45. Huang W, Pal A, Kleer CG. On how CCN6 suppresses breast cancer growth and invasion. J Cell Commun Signal. 2012;6(1):5–10. pmid:21842227
  46. Goswami M, Hensel N, Smith BD, Prince GT, Qin L, Levitsky HI, et al. Expression of putative targets of immunotherapy in acute myeloid leukemia and healthy tissues. Leukemia. 2014;28(5):1167–70. pmid:24472813
  47. Gaafar A, Sheereen A, Almohareb F, Eldali A, Chaudhri N, Mohamed SY, et al. Prognostic role of KIR genes and HLA-C after hematopoietic stem cell transplantation in a patient cohort with acute myeloid leukemia from a consanguineous community. Bone Marrow Transplant. 2018;53(9):1170–9. pmid:29549293
  48. Khoyratty TE, Udalova IA. Diverse mechanisms of IRF5 action in inflammatory responses. Int J Biochem Cell Biol. 2018;99:38–42. pmid:29578052
  49. Mu A, Hira A, Mori M, Okamoto Y, Takata M. Fanconi anemia and Aldehyde Degradation Deficiency Syndrome: Metabolism and DNA repair protect the genome and hematopoiesis from endogenous DNA damage. DNA Repair (Amst). 2023;130:103546. pmid:37572579
  50. D’Arena G, Vitale C, Rossi G, Coscia M, Omede P, D’Auria F, et al. CD200 included in a 4-marker modified Matutes score provides optimal sensitivity and specificity for the diagnosis of chronic lymphocytic leukaemia. Hematol Oncol. 2018.
  51. Cheong YK, Ngoh ZX, Peh GSL, Ang H-P, Seah X-Y, Chng Z, et al. Identification of cell surface markers glypican-4 and CD200 that differentiate human corneal endothelium from stromal fibroblasts. Invest Ophthalmol Vis Sci. 2013;54(7):4538–47. pmid:23744997
  52. Chung SS, Eng WS, Hu W, Khalaj M, Garrett-Bakelman FE, Tavakkoli M, et al. CD99 is a therapeutic target on disease stem cells in myeloid malignancies. Sci Transl Med. 2017;9(374):eaaj2025. pmid:28123069
  53. Milanezi F, Pereira EM, Ferreira FV, Leitão D, Schmitt FC. CD99/MIC-2 surface protein expression in breast carcinomas. Histopathology. 2001;39(6):578–83. pmid:11903575
  54. Zhang L, Woltering I, Holzner M, Brandhofer M, Schaefer C-C, Bushati G, et al. CD74 is a functional MIF receptor on activated CD4+ T cells. Cell Mol Life Sci. 2024;81(1):296. pmid:38992165
  55. Li H, Cao Z, Liu Y, Xue Z, Li Y, Xing H, et al. Slow-replicating leukemia cells represent a leukemia stem cell population with high cell-surface CD74 expression. Mol Oncol. 2024;18(10):2554–68. pmid:38922758
  56. Martínez-Esparza M, Ruiz-Alcaraz AJ, Carmona-Martínez V, Fernández-Fernández MD, Antón G, Muñoz-Tornero M, et al. Expression of LAIR-1 (CD305) on human blood monocytes as a marker of hepatic cirrhosis progression. J Immunol Res. 2019;2019:2974753. pmid:31019980
  57. Chowdhury RR, D’Addabbo J, Huang X, Veizades S, Sasagawa K, Louis DM, et al. Human coronary plaque T cells are clonal and cross-react to virus and self. Circ Res. 2022;130(10):1510–30. pmid:35430876
  58. Kappelmayer J, Nagy B Jr, Miszti-Blasius K, Hevessy Z, Setiadi H. The emerging value of P-selectin as a disease marker. Clin Chem Lab Med. 2004;42(5):475–86. pmid:15202782
  59. Ding AK, Wallis ZK, White KS, Sumer CE, Kim WK, Ardeshir A. Galectin-3, Galectin-9, and Interleukin-18 are associated with Monocyte/macrophage activation and turnover more so than simian immunodeficiency virus-associated cardiac pathology or encephalitis. AIDS Res Human Retroviruses. 2024;40(9):531–42.
  60. Paul EN, Carpenter TJ, Fitch S, Sheridan R, Lau KH, Arora R, et al. Cysteine-rich intestinal protein 1 is a novel surface marker for human myometrial stem/progenitor cells. Commun Biol. 2023;6(1):686. pmid:37400623
  61. Angeles-Floriano T, Rivera-Torruco G, García-Maldonado P, Juárez E, Gonzalez Y, Parra-Ortega I, et al. Cell surface expression of GRP78 and CXCR4 is associated with childhood high-risk acute lymphoblastic leukemia at diagnostics. Sci Rep. 2022;12(1):2322. pmid:35149705
  62. Zhang Y, Xue S, Hao Q, Liu F, Huang W, Wang J. Galectin-9 and PSMB8 overexpression predict unfavorable prognosis in patients with AML. J Cancer. 2021;12(14):4257–63. pmid:34093826
  63. Aval OS, Ahmadi A, Hemid Al-Athari AJ, Soleimani Samarkhazan H, Sotudeh Chafi F, Asadi M, et al. Galectin-9: A double-edged sword in Acute Myeloid Leukemia. Ann Hematol. 2025;104(6):3077–90. pmid:40341460
  64. Sun X, Huang S, Wang X, Zhang X, Wang X. CD300A promotes tumor progression by PECAM1, ADCY7 and AKT pathway in acute myeloid leukemia. Oncotarget. 2018;9(44):27574–84. pmid:29938007
  65. Mackay F, Schneider P, Rennert P, Browning J. BAFF AND APRIL: a tutorial on B cell survival. Annu Rev Immunol. 2003;21:231–64. pmid:12427767
  66. Kawabata K, Makino T, Makino K, Kajihara I, Fukushima S, Ihn H. IL-16 expression is increased in the skin and sera of patients with systemic sclerosis. Rheumatology (Oxford). 2020;59(3):519–23. pmid:31377804
  67. Spender LC, Cornish GH, Sullivan A, Farrell PJ. Expression of transcription factor AML-2 (RUNX3, CBF(alpha)-3) is induced by Epstein-Barr virus EBNA-2 and correlates with the B-cell activation phenotype. J Virol. 2002;76(10):4919–27. pmid:11967309
  68. Cruikshank WW, Kornfeld H, Center DM. Interleukin-16. J Leukoc Biol. 2000;67(6):757–66.
  69. Jiang L, Meng W, Yu G, Yin C, Wang Z, Liao L, et al. MicroRNA-144 targets APP to regulate AML1/ETO+ leukemia cell migration via the p-ERK/c-Myc/MMP-2 pathway. Oncol Lett. 2019;18(2):2034–42. pmid:31423275
  70. Liu L, Yang L, Liu X, Liu M, Liu J, Feng X, et al. SEMA4D/PlexinB1 promotes AML progression via activation of PI3K/Akt signaling. J Transl Med. 2022;20(1):304. pmid:35794581
  71. Bolkun L, Lemancewicz D, Jablonska E, Szumowska A, Bolkun-Skornicka U, Ratajczak-Wrona W, et al. The impact of TNF superfamily molecules on overall survival in acute myeloid leukaemia: correlation with biological and clinical features. Ann Hematol. 2015;94(1):35–43. pmid:25085377
  72. Smith MA, Smith JG, Pallister CJ, Singer CR. Kinetic characteristics of de novo and secondary AML cells influence their response to haemopoietic growth factor (HGF) priming and correlate with clinical outcome. Leuk Res. 1999;23(11):987–94. pmid:10576502
  73. Weimar IS, Voermans C, Bourhis JH, Miranda N, van den Berk PC, Nakamura T, et al. Hepatocyte growth factor/scatter factor (HGF/SF) affects proliferation and migration of myeloid leukemic cells. Leukemia. 1998;12(8):1195–203. pmid:9697873
  74. Kefaloyianni E. Soluble forms of cytokine and growth factor receptors: mechanisms of generation and modes of action in the regulation of local and systemic inflammation. FEBS Lett. 2022;596(5):589–606. pmid:35113454
  75. Jin S, Guerrero-Juarez CF, Zhang L, Chang I, Ramos R, Kuan C-H, et al. Inference and analysis of cell-cell communication using CellChat. Nat Commun. 2021;12(1):1088. pmid:33597522
  76. Farr L, Ghosh S, Moonah S. Role of MIF Cytokine/CD74 receptor pathway in protecting against injury and promoting repair. Front Immunol. 2020;11:1273. pmid:32655566
  77. Fukuda Y, Bustos MA, Cho S-N, Roszik J, Ryu S, Lopez VM, et al. Interplay between soluble CD74 and macrophage-migration inhibitory factor drives tumor growth and influences patient survival in melanoma. Cell Death Dis. 2022;13(2):117. pmid:35121729
  78. Ghoochani A, Schwarz MA, Yakubov E, Engelhorn T, Doerfler A, Buchfelder M, et al. MIF-CD74 signaling impedes microglial M1 polarization and facilitates brain tumorigenesis. Oncogene. 2016;35(48):6246–61. pmid:27157615
  79. Takheaw N, Earwong P, Laopajon W, Pata S, Kasinrerk W. Interaction of CD99 and its ligand upregulates IL-6 and TNF-α upon T cell activation. PLoS One. 2019;14(5):e0217393. pmid:31120992
  80. Dworzak MN, Fröschl G, Printz D, Zen LD, Gaipa G, Ratei R, et al. CD99 expression in T-lineage ALL: implications for flow cytometric detection of minimal residual disease. Leukemia. 2004;18(4):703–8. pmid:14961034
  81. Anderson AN, Conley P, Klocke CD, Sengupta SK, Robinson TL, Fan Y. Analysis of uveal melanoma scRNA sequencing data identifies neoplastic-immune hybrid cells that exhibit metastatic potential. bioRxiv. 2023.
  82. An P-G, Wu W-J, Tang Y-F, Zhang J. Single-cell RNA sequencing reveals the heterogeneity and microenvironment in one adenoid cystic carcinoma sample. Funct Integr Genomics. 2023;23(2):155. pmid:37162576
  83. Zhang Y, Wang D, Peng M, Tang L, Ouyang J, Xiong F, et al. Single-cell RNA sequencing in cancer research. J Exp Clin Cancer Res. 2021;40(1):81. pmid:33648534
  84. Joudinaud R, Boyer T. Stem cells in myelodysplastic syndromes and acute myeloid leukemia: first cousins or unrelated entities? Front Oncol. 2021;11:730899. pmid:34490124
  85. Grimwade D, Hills RK, Moorman AV, Walker H, Chatters S, Goldstone AH, et al. Refinement of cytogenetic classification in acute myeloid leukemia: determination of prognostic significance of rare recurring chromosomal abnormalities among 5876 younger adult patients treated in the United Kingdom Medical Research Council trials. Blood. 2010;116(3):354–65. pmid:20385793
  86. Chujan S, Kitkumthorn N, Siriangkul S, Mutirangura A. CCNA1 promoter methylation: a potential marker for grading Papanicolaou smear cervical squamous intraepithelial lesions. Asian Pac J Cancer Prev. 2014;15(18):7971–5. pmid:25292097
  87. da Silva RM, Santos JN, Uno M, Chammas R, Kulcsar MAV, Sant’Anna LB, et al. CCNA1 gene as a potential diagnostic marker in papillary thyroid cancer. Acta Histochem. 2020;122(8):151635. pmid:33007517
  88. Leung WK, Workineh A, Mukhi S, Tzannou I, Brenner D, Watanabe N, et al. Evaluation of cyclin A1-specific T cells as a potential treatment for acute myeloid leukemia. Blood Adv. 2020;4(2):387–97. pmid:31985805
  89. Yoon J. Acute myeloid leukemia is a disease associated with HLA-C3. Acta Haematol. 2015;133(2):164–7. pmid:25278127
  90. Andrews LP, Marciscano AE, Drake CG, Vignali DAA. LAG3 (CD223) as a cancer immunotherapy target. Immunol Rev. 2017;276(1):80–96. pmid:28258692
  91. Sadeghi M, Khodakarami A, Ahmadi A, Fathi M, Gholizadeh Navashenaq J, Mohammadi H, et al. The prognostic and therapeutic potentials of CTLA-4 in hematological malignancies. Expert Opin Ther Targets. 2022;26(12):1057–71. pmid:36683579
  92. Shi A-P, Tang X-Y, Xiong Y-L, Zheng K-F, Liu Y-J, Shi X-G, et al. Immune checkpoint LAG3 and Its ligand FGL1 in cancer. Front Immunol. 2022;12:785091. pmid:35111155
  93. Van Coillie S, Wiernicki B, Xu J. Molecular and cellular functions of CTLA-4. Adv Exp Med Biol. 2020;1248:7–32.
  94. Arber DA, Jenkins KA, Slovak ML. CD79 alpha expression in acute myeloid leukemia. High frequency of expression in acute promyelocytic leukemia. Am J Pathol. 1996;149(4):1105–10. pmid:8863659
  95. Kozlov I, Beason K, Yu C, Hughson M. CD79a expression in acute myeloid leukemia t(8;21) and the importance of cytogenetics in the diagnosis of leukemias with immunophenotypic ambiguity. Cancer Genet Cytogenet. 2005;163(1):62–7. pmid:16271957
  96. Minton K. Inflammation: Inflammatory pathology of Fanconi anaemia. Nat Rev Immunol. 2016;16(6):336–7. pmid:27180812
  97. Repczynska A, Ciastek B, Haus O. New insights into the Fanconi anemia pathogenesis: a crosstalk between inflammation and oxidative stress. Int J Mol Sci. 2024;25(21).
  98. Pasello M, Manara MC, Scotlandi K. CD99 at the crossroads of physiology and pathology. J Cell Commun Signal. 2018;12(1):55–68. pmid:29305692
  99. Shastri A, Will B, Steidl U, Verma A. Stem and progenitor cell alterations in myelodysplastic syndromes. Blood. 2017;129(12):1586–94. pmid:28159737
  100. Chung SS, Tavakkoli M, Devlin SM, Park CY. CD99 is a therapeutic target on disease stem cells in acute myeloid leukemia and the myelodysplastic syndromes. Blood. 2013;122(21):2891–2891.