Identification of DNA methylation prognostic signature of acute myelocytic leukemia

Background The aim of this study is to find the potential survival related DNA methylation signature capable of predicting survival time for acute myelocytic leukemia (AML) patients. Methods DNA methylation data were downloaded. DNA methylation signature was identified in the training group, and subsequently validated in an independent validation group. The overall survival of DNA methylation signature was performed. Functional analysis was used to explore the function of corresponding genes of DNA methylation signature. Differentially methylated sites and CpG islands were also identified in poor-risk group. Results A DNA methylation signature involving 8 DNA methylation sites and 6 genes were identified. Functional analysis showed that protein binding and cytoplasm were the only two enriched Gene Ontology terms. A total of 70 differentially methylated sites and 6 differentially methylated CpG islands were identified in poor-risk group. Conclusions The identified survival related DNA methylation signature adds to the prognostic value of AML.


Introduction
Acute myeloid leukemia (AML) is a highly aggressive hematologic malignancy characterized by a vast proliferation of immature myeloid blasts that accumulate in the bone marrow and blood. AML is closely correlated to cytokine networks of proliferation, differentiation and apoptosis of leukemic cells [1]. AML is caused by different factors including radiation, mutations and carcinogens [2][3][4][5]. The disease can progress quickly and can become fatal in a short period of time without treatment. Known prognostic factors of AML include age, mutations, complex karyotype, the antecedent hematologic disease, presence of elevated white blood cell counts and prior chemo or radiotherapy for another malignancy [6]. Intensive chemotherapy is initially effective in most patients with AML, however, the surviving LIC clones repopulate the disease and lead to subsequent disease relapse and poor prognosis [7]. In addition, the treatment of elderly patients and patients with relapsed refractory remains a challenge [8][9][10]. Therefore, an improved understanding of the molecular mechanisms underlying of AML could be helpful to improve the treatment efficacy to prolong the survival time for patients.
DNA methylation is an important epigenetic mechanism in regulating gene expression [11]. It is pointed out that epigenetic disturbances have been involved in the pathogenesis of leukemia [12]. Jiang Y et al found that progression from myelodysplastic syndrome to AML was correlated to increased aberrant DNA methylation [13]. Previous studies have investigated genome-wide methylation in AML [14]. In AML, the presence of common methylation patterns in p15 and E-cadherin has been described [15,16]. In addition, methylation of secreted frizzled related protein (sFRP)1, sFRP12, sFRP13 and sFRP15 with corresponding transcriptional silencing has been found in AML cell lines [17]. The survival analysis showed that GATA binding protein 4 (GATA4) promoter methylation was significantly associated with shorter overall survival of pediatric AML [18]. Thus it can be seen that DNA methylation may play a crucial role in the development of AML. In this study, we aimed to find potential survival-related DNA methylation signature in AML, which may pave the way for the development of novel tumor markers and therapeutic targets for AML.

DNA methylation data retrieval and analysis
DNA methylation data were downloaded from the TCGA dataset (http://tcga-data.nci.nih. gov/tcga). The data was derived from blood of AML. Among which, there were 195 samples with DNA methylation information. There were 188 samples with follow-up information. Finally, we selected 182 samples with both DNA methylation information and follow-up information. In order to improve data accuracy, the DNA methalytion sites were first preprocessed. DNA methalytion sites on sex chromosomes were excluded. Considering the heterogeneity of blood, based on the DNA methylation data, blood components were then predicted using the R packages in RefFreeEWAS. Finally, all samples were divided into the training group (127 cases) and validation group (55 cases) randomly. There were no overlapped cases between two groups. The chi-square test and t-test were used to analyze the statistical difference of clinical index between the two groups. And there was no significant difference in age, race, gender, vital status, survival time and disease risk result between the two groups. The clinical characteristics of these two groups were shown in Table 1.

Identification of survival related DNA methylation sites
In order to select the survival associated DNA methylation sites, all DNA methylation sites were analyzed by the single factor Cox proportional hazard (CoxPH) regression after adjustment of age, race, gender, blood constituent and cytogenetic risk. Similarly, after further adjustment of age, race, gender, blood constituent and cytogenetic risk, the multi-factor CoxPH regression analysis was used for identification of DNA methylation signature in the survival evaluation. The statistical significance was set at p<0.001. In order to investigate the characteristic of DNA methylation signature, the Illumina Infinium HumanMethylation450 BeadChips Assay was utilized for DNA methylation sites annotation.

Functional analysis of survival related DNA methylation signature genes
In order to study the biological function of survival-related DNA methylation signature genes, the Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis were performed by using the online software GeneCodis3 (http://genecodis.cnb.csic. es/analysis). And the threshold of false discovery rate (FDR) < 0.05 was set as the criteria of statistical significance.

Survival time analysis of DNA methylation signature in the training group and validation group
The risk score (RS) of identified DNA methylation signature was calculated by the following equation: N presents the number of DNA methylation; Methi presents the spectrum of ith DNA methylation site; Ci represents regression coefficient of ith DNA methylation site in the multi-factor CoxPH regression analysis. Kaplan-Meier survival curves were drawn and compared among subgroups using log-rank tests.
In addition, the receiver operating characteristic (ROC) analysis was performed to assess the 5 years' survival value of DNA methylation signature by using pROC package in R language. The area under the curve (AUC) under binomial exact confidence interval was calculated and the ROC curve was generated.

Identification of risk related DNA methylation sites
In order to identify the risk-related DNA methylation sites between patients with poor-risk (38 cases) and favorable-risk (35 cases), the related DNA methylation data were downloaded. We selected CpG sites based on differential methylation value calculated as mean (β case) − mean (β normal) (Δβ) combined with the false discovery rate (FDR) values. Finally, the threshold of |Δβ|>0.2 and FDR<0.05 was set as the criteria of statistical significance.

Survival related DNA methylation signature
After original data preprocess, a total of 3884 DNA methylation sites were identified in the training group. These DNA methylation sites were used for the single factor and multifactor CoxPH regression analysis. The result showed that 8 DNA methylation sites were identified. 8 DNA methylation sites were located in 4 CpG islands (chr12:81102034-81102716, chr17:78863569-78863813, chr3:10183305-10183941 and chr6:29600192-29600661) and 6 genes (MYF6, RPTOR, MMP10, SH3PXD2B, VHL and GABBR1). The annotation of 8 DNA methylation sites was shown in Table 2.

Functional annotation of survival related DNA methylation signature genes
In order to further study the biological function of survival-related DNA methylation signature genes (MYF6, RPTOR, MMP10, SH3PXD2B, VHL and GABBR1), GO and KEGG functional annotation were performed. The result showed that only 2 GO terms were obtained. Protein binding was the most significantly enriched molecular function (FDR = 0.0103025) involving RPTOR, VHL and SH3PXD2B; cytoplasm (FDR = 0.00664319) was the most significantly enriched cellular component involving RPTOR, GABBR1, VHL and SH3PXD2B. Enriched GO terms of survival related DNA methylation signature genes were shown in Table 3.

Survival time analysis of DNA methylation signature
In order to explore the association between the risk score and identified DNA methylation signature, the clustering analysis map of methylation value in the DNA methylation signature sites was performed (Fig 1). In the risk score calculation, the patients were dichotomized into either low risk or the high risk group. In the training group, a highly significant difference was observed between the high risk and the low risk group (p = 1.1e-06), which was shown in Fig  2. When the same DNA methylation signature equation was applied to the validation group, a similar significant difference was also observed between the high risk and the low risk group (p = 0.054), which was shown in Fig 3. Additionally, we performed 5 years' survival analysis of DNA methylation signature by ROC and calculated the AUC to assess the discriminatory ability of DNA methylation signature in the training and validation group, respectively (Figs 4 and 5). The AUC of the DNA methylation signature in the training group was 0.8441558, and the validation was 0.8170732. Our result suggested that the DNA methylation signature could be the prognosis model for predicting the survival situation of AML.

Identification of risk related DNA methylation sites
In order to identify the DNA methylation sites between patients with poor-risk and favorablerisk, the differentially methylated sites and CpG islands were analyzed based on the threshold of |Δβ|>0.2 and FDR<0.05. The result showed that 70 differentially methylated sites (64 hypermethylation and 6 hypomethylation sites) and 6 differentially methylated CpG islands (3 hypermethylation and 3 hypomethylation CpG islands) were identified. The top 20 differentially methylated sites in the poor-risk and favorable-risk group were shown in Table 4.

Discussion
AML is the most common form of adult leukemia and the survival rate is very low [19][20][21]. Therefore, it is urgent to understand the pathological mechanism and find potential survival  1. The clustering analysis map of methylation value in the DNA methylation signature sites. Diagram presents the result of a two-way hierarchical clustering of DNA methylation sites and risk score. The clustering is constructed using the complete-linkage method together with the Euclidean distance. Each row represents a DNA methylation site, and each column, a risk score value. The risk score clustering tree is shown on the right. The colour scale illustrates the relative value of the risk score: red, below the reference channel; green, higher than the reference. related genes in the development of AML. In this study, we found a DNA methylation signature involving MYF6, RPTOR, MMP10, SH3PXD2B, VHL and GABBR1, which could be a valuable tool in guiding treatment decisions for AML. Myogenic factor 6 (MYF6, also called MRF4 or Herculin) is expressed in skeletal muscle and is related to myogenesis [22][23][24][25]. It is found that the MYF6 is a differentially methylation gene in plasma cf-DNA in the different stage of hepatocellular carcinoma development [26]. In addition, the methylation frequency of MYF6 in stage I non-small cell lung cancer is obviously higher than that of non-cancerous lung disease control, which suggested that MYF6 could offer a specificity and a sensitivity in the stage I non-small cell lung cancer diagnosis [27]. Thus it can be seen that MYF6 methylation was associated with the development of cancer. In this study, we first found the DNA methylation of MYF6 in the blood of AML. Our result showed that MYF6 was significantly associated with survival time of AML and could be a diagnostic and prognostic marker of AML.
Regulatory associated protein of MTOR complex 1 (RPTOR) is a signal transduction gene important for hematopoiesis [28]. RPTOR, mechanistic target of rapamycin kinase and MTOR associated protein (mTOR), LST8 homolog (MLST8) constitute the core subunits of the mammalian TORC1 complex which play an important role in controlling cell growth, survival and metabolism and is often deregulated in cancer [29][30][31][32]. It is reported that RPTOR is hypermethylated in human hepatocytes [33]. Previous reports have demonstrated that RPTOR is methylated gene in breast tumors [34,35]. This suggested that RPTOR methylation may play an important role in the process of cancer. In the present study, we found that RPTOR was methylated in the blood of AML and significantly associated with the survival time of AML patients. Our result suggested that RPTOR played a crucial role in AML and may be a diagnostic and prognostic marker in the process of AML.
Matrix metallopeptidase 10 (MMP10), an enzyme promoting angiogenesis, promotes cell growth and invasion, and exerts anti-apoptotic property in vitro [36,37]. It is noted that MMP10 is essential to the tumor microenvironment. In colorectal cancer, MMP10 is up-regulated in cancerous tissue and adversely associated with patients survival [38,39]. In cutaneous melanoma, MMP10 is potentially modified by modification of histone acetylation [40]. In head and neck squamous cell cancer, MMP10 is a differentially methylated gene in radiationsensitive and -resistant tumors [41]. In immature teratomas, reduced methylation of MMP10 significantly enriched in axonal guidance signaling pathway [42]. Previous reports indicated that the epigenomics changes of MMP10, especially DNA methylation may be involved in the occurrence of cancer. Herein, we found the methylation of MMP10 in the blood of AML. Moreover, MMP10 was related to the survivability of AML patients. Our result may provide a new field in the diagnosis and prognosis of AML.  . Herein, we also found the relationship between VHL and AML. Our result showed that VHL was methylated and was significantly associated with survival time of AML patients.
Gamma-aminobutyric acid type B receptor subunit 1 (GABBR1) encodes the G protein-coupled receptor that can form the heterodimer with GABAB receptor 2, which triggering the proliferation, differentiation and migration of cancer cells. It is showed that multiple loci of GABBR1 within 6p21.3 are related to nasopharyngeal carcinoma [52,53]. GABBR1 is a different methylated gene in plasma cf-DNA in different early stage of hepatocellular carcinoma development [26]. In addition, GABBR1 is associated with the survival time of patients with gastric cancer [54]. It is also a survival-associated methylation marker for oral squamous cell carcinoma [55]. Thus it can be seen that GABBR1 methylation play a crucial role in various cancers. In the present study, we found that GABBR1 was methylated and associated with AML survival. It suggested that GABBR1 may be a survival-associated methylation marker for AML.
Previous studies have reported that the 5-year survival rate was 55% for patients with favorable cytogenetics, 24% for patients with intermediate risk, and 5% for patients with poor-risk cytogenetics [56]. Thus it can be seen that the survival time was associated with AML risk. In this study, we identified numbers of survival-associated differentially methylated genes such as LRPAP1, MAEA and TNF between poor-risk and favorable-risk patients.
LDL receptor related protein associated protein 1 (LRPAP1) is a gene that involved in the cell proliferation in cancer [57]. It is noted that LRPAP1 is a biologically relevant gene found in leukemia and was associated with different biological processes including cell apoptosis, signaling pathway and cell cycle checkpoint [58,59]. Macrophage erythroblast attacher (MAEA) is a 36-kD transmembrane protein that expressed by erythroblasts and macrophage cells and plays a crucial role in hematopoiesis [60,61]. It is reported that MAEA is a differentially methylated gene in Type-2 diabetes and idiopathic pulmonary fibrosis [62,63]. Tumor necrosis factor (TNF) is a proinflammatory cytokine. It has been identified that TNF is elevated in serum of patients with aplastic anemia and myelodysplastic syndromes, suggesting that the hematopoietic repressive activity of TNF may contribute to the cytopenic phenotype of these patients [64][65][66][67]. Interestingly, TNF levels are significantly higher in the peripheral blood of AML patients of M3, M4, and M5 subtypes when compared with healthy donors [68]. Furthermore, the increased level of TNF is associated with poor prognosis of patients with AML, especially older adults [7,[69][70][71]. Our result further indicated the important role of LRPAP1, MAEA and TNF in AML.

Conclusions
In a word, we have identified and successfully validated a DNA methylation signature in patients with AML. This signature adds to the potential predictive role in the survival time of However, there is a limitation to our study. In the present study, we didn't perform the deeper mechanism study based on the identified methylated genes. Some animal model and cell experiments are further needed to explore the potential mechanism of AML.

Author Contributions
Data curation: Ling Ye.