The t(8;21) and Inv(16) translocations disrupt the normal function of core binding factors alpha (CBFA) and beta (CBFB), respectively. These translocations represent two of the most common genomic abnormalities in acute myeloid leukemia (AML) patients, occurring in approximately 25% of pediatric and 15% of adult patients with this malignancy. Both translocations are associated with favorable clinical outcomes after intensive chemotherapy, and given the perceived mechanistic similarities, patients with these translocations are frequently referred to as having CBF-AML. It remains uncertain whether these translocations are mechanistically the same or impact different pathways in subtle ways that have both biological and clinical significance. Therefore, we used transcriptome sequencing (RNA-seq) to investigate the similarities and differences in genes and pathways between these subtypes of pediatric AML. Diagnostic RNA samples from patients with t(8;21) (N = 17), Inv(16) (N = 14), and normal karyotype (NK, N = 33) were subjected to RNA-seq. Analyses compared the transcriptomes across these three cytogenetic subtypes, using the NK cohort as the control. A total of 1291 genes in t(8;21) and 474 genes in Inv(16) were differentially expressed relative to the NK controls, with 198 genes differentially expressed in both subtypes. The majority of these shared genes (175/198; binomial test p-value < 1.0E-30) changed expression in the same direction in the two subtypes, suggesting that the expression profiles of the two CBF cohorts are more similar to each other than to that of the NK cohort. Our analysis also revealed alternative splicing events (ASEs) that differed across subtypes, with 337 t(8;21)-specific and 407 Inv(16)-specific ASEs detected; the majority of the affected genes encode acetylated proteins (p = 1.5E-51 and p = 1.8E-54 for the two subsets). In addition to known fusions, we identified and verified 16 de novo fusions in 43 patients, including three fusions involving NUP98 in six patients.
Clustering of differentially expressed genes indicated that the homeobox (HOX) gene family, including two transcription factor genes (MEIS1 and NKX2-3), was down-regulated in CBF compared to NK samples. This finding supports existing data that the dysregulation of HOX genes plays a central role in the biology of CBF-AML. These data provide comprehensive transcriptome profiling of CBF-AML and delineate genes and pathways that are differentially expressed, providing insights into the shared biology as well as differences in the two CBF subsets.
Citation: Hsu C-H, Nguyen C, Yan C, Ries RE, Chen Q-R, Hu Y, et al. (2015) Transcriptome Profiling of Pediatric Core Binding Factor AML. PLoS ONE 10(9): e0138782. https://doi.org/10.1371/journal.pone.0138782
Editor: Ken Mills, Queen's University Belfast, UNITED KINGDOM
Received: June 15, 2015; Accepted: September 3, 2015; Published: September 23, 2015
This is an open access article, free of all copyright, and may be freely reproduced, distributed, transmitted, modified, built upon, or otherwise used by anyone for any lawful purpose. The work is made available under the Creative Commons CC0 public domain dedication
Data Availability: All BAM files have been deposited at The database of Genotypes and Phenotypes (dbGaP, http://www.ncbi.nlm.nih.gov/gap) under substudy, phs000465.v10.p3, TARGET: Acute Myeloid Leukemia (AML).
Funding: This work was funded by a grant from the National Cancer Institute, NCI 5 R01 CA114563-08 (www.nih.gov) and a Scientific Leadership NIH National Clinical Trials Network (NCTN) Grant (U10CA180886).
Competing interests: The authors have declared that no competing interests exist.
Acute myeloid leukemia (AML) is a hematopoietic malignancy defined by genetic (and epigenetic) alterations in hematopoietic stem or progenitor cells that lead to dysregulation of critical signal transduction pathways, resulting in clonal expansion without complete differentiation. The genomic landscape of AML is under active investigation. Distinct profiles have been discovered for different karyotypes and single-nucleotide polymorphisms (SNPs), revealing the heterogeneity and complexity of AML. This genomic complexity leads to variability in responses to chemotherapy and disparate outcomes. Moreover, we and others have found age-dependent shifts in the genomic abnormalities of AML, some of which [2, 3] may contribute to differential outcomes observed in adult vs. pediatric AML. Although these previous studies have helped us to better understand the correlation between genotypes and phenotypes in AML, a more detailed examination of defined molecular subgroups may yield a level of understanding that is not readily attainable by examining more molecularly diverse AML populations.
Cytogenetic alterations have been shown to play a critical role in the diagnosis of AML. Fusions involving RUNX1-RUNX1T1 and CBFB-MYH11, collectively referred to as core binding factor (CBF) AML, are among the most frequent and most-studied genomic events in AML [5, 6]. Despite extensive studies into the biologic implications of these fusion transcripts and their use for risk stratification [7, 8], knowledge of the presence of these fusions has not led to new targeted interventions. Further, although t(8;21) and Inv(16) implicate CBFA and CBFB, respectively, and lead to similar clinical outcomes, their potential mechanistic similarities and differences remain to be well defined.
RNA-seq for whole-transcriptome sequencing has become a powerful approach for studying mRNA transcripts [9, 10]. In contrast to traditional microarray methods, RNA-seq can identify de novo transcripts that are not represented in the reference genome (e.g., fusion genes) while quantifying previously described reference transcripts and identifying splicing alterations. Recently, several adult AML studies using next-generation sequencing (NGS) technologies have been reported. The Cancer Genome Atlas (TCGA) Research Network revealed the genomic and epigenetic landscapes of 200 adult de novo AML patients using whole-genome, whole-exome, RNA, and microRNA sequencing, along with DNA methylation studies. In addition, MacRae et al. used RNA-seq to analyze 55 adult leukemia samples, identifying 119 genes whose expression is more consistent than that of the commonly used control genes across those leukemia samples. Lilljebjorn et al. also used RNA-seq to identify fusion genes in adult leukemia patients. In contrast, the study of the pathogenesis of pediatric AML using NGS technologies is still in its earliest stages, and large studies have not extensively evaluated CBF-AML patients using this technology.
In this report, we use whole-transcriptome sequencing to interrogate the transcript profiles of pediatric CBF-AML, comparing these to transcripts from cases with normal karyotype. The results reveal that the t(8;21) and Inv(16) translocations aberrantly impact a set of common genes and molecular pathways, and that unique gene-expression signatures, splicing differences, and fusions are also observed in the CBF subtypes.
This cohort includes specimens from 64 patients with de novo AML with t(8;21) (N = 17), Inv(16) (N = 14), or normal karyotype (NK; N = 33), all treated on Children’s Oncology Group (COG) pediatric AML clinical trials. The NK cohort was selected to include patients with and without the FLT3/ITD mutation (N = 14 and 19, respectively). Baseline characteristics of the patients are shown in S1 Table.
RNA sequencing in pediatric AML samples
RNA sequencing was performed using the Illumina platform for all 64 samples, with an average of 47 million reads per sample (range: 27,576,734–91,175,150). Ninety-six percent of these reads were mapped to the human reference sequence (hg19/NCBI Build 37) using the next-generation sequencing (NGS) aligner Novoalign (www.novocraft.com); ~26,000 RefSeq genes were covered by at least one read and ~16,500 RefSeq genes had RPKM (Reads Per Kilobase per Million mapped reads) ≥ 1 (S2 Table). Ninety percent of the mapped reads were located within gene regions, including coding, UTR, and intronic regions, and the distribution was very similar among the different cytogenetic abnormalities (Fig 1).
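The RPKM threshold used above can be illustrated with a minimal Python sketch. The function name and the example numbers (a 2,000 bp gene with 940 reads in a 47-million-read library) are hypothetical and chosen only for illustration; they are not taken from S2 Table.

```python
def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads Per Kilobase per Million mapped reads for one gene.

    RPKM = reads / (gene length in kb) / (mapped reads in millions)
         = reads * 1e9 / (gene length in bp * total mapped reads)
    """
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("gene length and library size must be positive")
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


# Hypothetical gene: 940 reads on a 2,000 bp gene, 47 million mapped reads.
example_rpkm = rpkm(940, 2000, 47_000_000)

# A gene counts as expressed under the RPKM >= 1 criterion quoted in the text.
is_expressed = example_rpkm >= 1
```

Normalizing by both gene length and library size is what makes values comparable across genes and across samples with different sequencing depths.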
Identification of differentially expressed genes by RNA sequencing
In order to determine differential gene expression patterns specific to different cytogenetic categories, we performed principal component analysis (PCA) (Fig 2A). The PCA using all genes successfully separated out expression profiles for samples with Inv(16), t(8;21), or NK into three distinct clusters, suggesting that cytogenetic abnormalities profoundly affected gene-expression patterns. Two patients with NK had expression profiles that clustered with those with Inv(16). Closer examination of the two cases demonstrated the presence of CBFB-MYH11 through fluorescence in situ hybridization (FISH) in 22% of the studied metaphases in one case. However, the second case did not show CBFB-MYH11 fusions through FISH or real-time polymerase chain reaction (RT-PCR). The only fusion event shared by these two cases was the intergenic fusion NDRG1-ST3GAL1, which was also found in one t(8;21) sample and in one Inv(16) sample, but not in other NK samples.
(A) Principal component analysis for samples with different cytogenetic abnormalities. (B-D) Circular plots were drawn with the in-house software application OmicCircos to represent the t(8;21)-specific, Inv(16)-specific, and normal-specific differentially expressed genes. The tracks, from outside to inside, are: the symbols of differentially expressed genes with high significance (p-value < 1.0E-08); genome positions by chromosome (black lines are cytobands); average expression level for the samples with the specific cytogenetic abnormality (yellow); average expression level for the remaining samples (pink); fold change (red: up-regulated; blue: down-regulated); and the p-values associated with the expression patterns between one subtype and the remaining samples.
To identify differentially expressed genes specific to each of the cytogenetic cohorts, we performed differential expression analysis using the DESeq package, which uses a model based on the negative binomial distribution with variance and mean linked by local regression. Comparing t(8;21) samples with the remaining samples, a total of 827 t(8;21)-specific genes were found to be differentially expressed with an adjusted p-value (corrected for multiple testing using the Benjamini-Hochberg method) of less than 0.05 (Fig 2B). Among these, 365 genes were up-regulated, with the RUNX1T1 gene most significantly up-regulated (p = 2.21E-31; Table 1). RNA-seq reads mapped uniquely across the entire coding region of RUNX1T1 in the 17 samples with t(8;21), with very few reads mapping to this gene in patients with Inv(16) or NK (S1 Fig). Additionally, 462 genes were down-regulated in samples with t(8;21), with the RFX8 gene being the most under-expressed (p = 7.18E-20). Eight of the top 20 under-expressed genes belonged to the HOX family (Table 1).
Similarly, AML samples with Inv(16) displayed 279 genes that were differentially expressed compared to the remaining samples, with 181 of these genes up-regulated and 98 down-regulated at an adjusted p-value threshold of 0.05 (Fig 2C). Matrix metallopeptidase 14 (membrane-inserted) (MMP14) mRNA was the most significantly up-regulated (p = 7E-22). Conversely, collagen type XXIII, alpha 1 (COL23A1) mRNA was the most significantly down-regulated. Three hundred and eighty normal-specific genes were also found (Fig 2D), indicating a widespread presence of differentially expressed genes among the different cytogenetic abnormalities.
RUNX1 binding sites were enriched in differentially expressed genes
RUNX1 has been shown to play a crucial role in haematopoiesis during embryonic development, and the two subunits of the core binding factors (CBFs), i.e., CBFA and CBFB, have been suggested to modify transcriptional regulatory functions in AML by altering the normal RUNX1 transcription program, interfering with RUNX1 assembly, or recruiting histone deacetylases and inhibiting RUNX1 activity [19–21]. To study the expression of RUNX1 target genes in the three cytogenetic categories, RUNX1 ChIP-Seq data from the ME-1 cell line were analyzed (GEO accession number GSE46044). A total of 34,654 peaks were identified using the HOMER ChIP-Seq analysis package (http://homer.salk.edu/), and 11,844 of 20,805 (56.9%) Ensembl coding genes (GRCh37) were targeted by these ChIP-Seq peaks. By comparison, 72.8% of the differentially expressed genes in the t(8;21) samples, 73.8% of those in the Inv(16) samples, and 69.0% of those in the normal samples were targeted by these ChIP-Seq peaks. RUNX1 binding sites are therefore significantly enriched among the differentially expressed genes in all three cytogenetic categories (binomial test p-values of 2.1E-17, 7.1E-08, and 2.1E-05, respectively; S3 Table).
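The enrichment test described above can be sketched in Python as a one-sided binomial tail probability, using the genome-wide fraction of peak-targeted genes (11,844/20,805) as the null probability. This is a minimal illustration, not the authors' code: the one-sided form is an assumption, and the observed count of 602 is back-calculated from the reported 72.8% of 827 t(8;21)-specific genes rather than taken from S3 Table, so the result is not expected to reproduce the published p-value exactly.

```python
from math import exp, lgamma, log


def log_binom_pmf(i, n, p):
    """Natural log of P(X = i) for X ~ Binomial(n, p), with 0 < p < 1."""
    log_choose = lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
    return log_choose + i * log(p) + (n - i) * log(1 - p)


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), summed term by term in log space."""
    return sum(exp(log_binom_pmf(i, n, p)) for i in range(k, n + 1))


# Null probability: fraction of coding genes with a RUNX1 ChIP-Seq peak.
background = 11844 / 20805

# ASSUMPTION: 602 of 827 genes (about 72.8%) carry a peak; illustrative only.
p_enrichment = binomial_upper_tail(602, 827, background)
```

Working in log space avoids overflow in the binomial coefficients when the number of genes is large.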
Genes commonly expressed in CBF AML
Because the cases referred to collectively as CBF AML share a common biology, clinical presentation, and outcome, we asked whether the two cohorts also share an expression profile. To this end, we identified differentially expressed genes in t(8;21) and Inv(16) using the NK cohort as the control. Of the total of 1567 genes that are differentially expressed in CBF AML cases compared to samples with normal karyotype (NK) [1291 in t(8;21) and 474 in Inv(16)], 198 are shared by the two subtypes (Fig 3A): 87 of these genes are up-regulated and 88 are down-regulated in both subtypes (S4 Table), while another 23 genes have opposing expression profiles (down-regulated in one subtype but up-regulated in the other). Thus, far more of the shared genes change in the same direction in the two subtypes than in opposite directions (175/198; binomial test p-value < 1.0E-30). Furthermore, the 88 shared down-regulated genes include many HOX genes and are enriched in genes involved in morphogenesis, specifically embryonic skeletal system development (Fig 3B). In contrast, no gene sets are significantly enriched for the 87 shared up-regulated genes.
(A) Differentially expressed genes (red dots in the MA plot) in t(8;21) and Inv(16) vs. those with NK. 198 genes are shared in the two subtypes. (B) Gene Set Enrichment Analysis (GSEA) shows enriched functions for shared down-regulated genes between them.
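The bookkeeping behind the shared-gene counts can be sketched as follows. The gene names and log2 fold changes are hypothetical placeholders, and the helper `classify_shared` is an illustrative name rather than part of any analysis package; only the totals in the last lines come from the text.

```python
def classify_shared(lfc_a, lfc_b):
    """Split genes differentially expressed in both subtypes by direction.

    lfc_a, lfc_b: dicts mapping gene -> log2 fold change versus the NK control,
    each containing only the genes called differentially expressed.
    """
    shared = lfc_a.keys() & lfc_b.keys()
    up_both = {g for g in shared if lfc_a[g] > 0 and lfc_b[g] > 0}
    down_both = {g for g in shared if lfc_a[g] < 0 and lfc_b[g] < 0}
    opposing = shared - up_both - down_both
    return up_both, down_both, opposing


# Hypothetical toy data, not values from the study.
t821 = {"GENE_A": 2.1, "GENE_B": -1.7, "GENE_C": 0.9, "GENE_D": -3.0}
inv16 = {"GENE_B": -2.2, "GENE_C": -1.1, "GENE_D": -0.8, "GENE_E": 1.5}
up_both, down_both, opposing = classify_shared(t821, inv16)

# Counts reported in the text: the union follows from inclusion-exclusion.
reported_union = 1291 + 474 - 198      # 1567 genes in total
reported_concordant = 87 + 88          # 175 genes changing in the same direction
reported_shared = reported_concordant + 23
```

The reported totals are internally consistent: 1291 + 474 - 198 gives the 1567 genes cited, and 87 + 88 + 23 gives the 198 shared genes.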
Expression profile in samples with normal karyotype (NK)
Comparison of the expression profiles of samples with NK against those with t(8;21) or Inv(16) identified 175 significantly up-regulated and 205 down-regulated genes. We further studied the expression profiles of samples with and without the FLT3/ITD mutation in the NK cohort to determine whether the mutation is associated with a specific expression pattern. Although the expression signature for samples with NK was distinct from that of CBF AML, PCA of expression failed to define a distinct expression profile for samples with and without the FLT3/ITD mutation (Fig 4A). HOXB7 was the only gene whose expression was significantly associated with the FLT3/ITD mutation (adjusted p-value = 0.034; Fig 4B).
Co-expressed genes define gene networks
Clustering of differentially expressed genes for each cytogenetic abnormality indicates the existence of co-expressed gene networks (Fig 5A, S2A and S2B Fig). To study the functionality of these networks, we calculated the correlation coefficient (R) for each pair of differentially expressed genes in each subtype and used Cytoscape to identify co-expressed genes with R² > 0.6 and to determine subsets of the co-expressed gene networks with specific molecular functions (Fig 5B, S2C and S2D Fig). For the t(8;21)-specific differentially expressed genes, up-regulated and down-regulated genes clustered into different groups. A sub-group containing 39 genes, located in the group of down-regulated genes, is enriched in the homeobox (HOX) gene family (Fig 5B). The group includes two HOX gene clusters on human chromosomes 7p15 (HOXA) and 17q21 (HOXB), the HOX cofactor myeloid ecotropic viral integration site 1 (MEIS1), and NK2 homeobox 3 (NKX2-3). MEIS1 is a common leukemic collaborator and NKX2-3 is a homeobox transcription factor. All of these HOX genes were down-regulated in the samples with t(8;21) (Fig 5C). Most HOX genes were also down-regulated in the samples with Inv(16), except for HOXB2, HOXB3, HOXB4 and MEIS1. Furthermore, although samples with NK had higher expression levels across the HOX gene family, most genes in the HOXB gene cluster, as well as NKX2-3, had even higher expression levels in NK samples carrying the FLT3/ITD mutation.
(A) Heatmap showing the clustering of 827 t(8;21)-specific genes in 64 pediatric AML samples. (B) Co-expressed genes were determined based on the coefficient of determination (R² > 0.6). The co-expression gene networks were generated using Cytoscape 2.8.3. Node color is based on the fold change of the differentially expressed gene (red: up-regulated; green: down-regulated), and node size corresponds to the degree of the node (i.e., the number of edges incident to it). (C) Gene expression of the HOX gene family for the three types of cytogenetic abnormalities, where NK is separated into two groups based on FLT3/ITD mutation status.
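The construction of such a co-expression network can be sketched in plain Python: compute a correlation coefficient for every gene pair, keep pairs whose R² exceeds 0.6 as edges, and count edges per node to obtain the degree. This is a minimal sketch under assumptions: the text does not state which correlation coefficient was used, so Pearson correlation is assumed here, and the expression values and gene names are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation of two equal-length expression vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant vector carries no co-expression signal
    return sxy / sqrt(sxx * syy)


def coexpression_edges(expr, r2_cutoff=0.6):
    """Return (gene1, gene2, r) for every pair with r squared above the cutoff."""
    edges = []
    for g1, g2 in combinations(sorted(expr), 2):
        r = pearson_r(expr[g1], expr[g2])
        if r * r > r2_cutoff:
            edges.append((g1, g2, r))
    return edges


def node_degrees(edges):
    """Number of edges incident to each gene (drives node size in the figure)."""
    degrees = {}
    for g1, g2, _ in edges:
        degrees[g1] = degrees.get(g1, 0) + 1
        degrees[g2] = degrees.get(g2, 0) + 1
    return degrees


# Hypothetical expression values across five samples.
expr = {
    "GENE_A": [1.0, 2.0, 3.0, 4.0, 5.0],
    "GENE_B": [2.1, 3.9, 6.2, 8.1, 9.8],  # rises with GENE_A
    "GENE_C": [5.0, 4.1, 2.9, 2.2, 1.0],  # falls as GENE_A rises
    "GENE_D": [3.0, 1.0, 4.0, 1.0, 3.0],  # unrelated to the others
}
edges = coexpression_edges(expr)
degrees = node_degrees(edges)
```

Because the cutoff is applied to R² rather than R, strongly anti-correlated pairs are connected as well as positively correlated ones.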
Several immunoglobulin-related gene families, including the Leukocyte Immunoglobulin-like Receptor (LIR) gene family and the Immunoglobulin Heavy (IGH) gene family, were also enriched in down-regulated genes for the t(8;21) AML samples, while no molecular processes or functions were enriched in up-regulated genes for the t(8;21) AML samples.
Alternative splicing is common in AML and is affected by acetylation
In addition to evaluating differential expression patterns, we assessed alternative splicing among the three cytogenetic cohorts. We used the Multivariate Analysis of Transcript Splicing (MATS) application to identify alternative splicing characteristic of samples from the cytogenetic subtypes. MATS is a computational tool that uses a statistical model with a multivariate uniform prior to detect differential alternative splicing events from RNA sequencing data. Five types of alternative splicing events were detected by MATS: skipped exon (SE), alternative 5’ splice site (A5SS), alternative 3’ splice site (A3SS), mutually exclusive exons (MXE), and retained intron (RI).
In our study, 337 t(8;21)-specific, 407 Inv(16)-specific, and 272 NK-specific alternative splicing events were detected. Skipped exon (SE), mutually exclusive exons (MXE), and retained intron (RI) appeared to be the predominant alternative splicing events in pediatric AML samples (Fig 6A and 6B). Furthermore, MATS separated all alternative splicing events into inclusion or skipping groups based on whether the alternative exon was included or skipped in the samples. Samples with t(8;21) and NK tended to include the alternative exons (Fig 6A), while samples with Inv(16) were more likely to skip them (Fig 6B). A total of 216 genes were affected by t(8;21)-specific alternative splicing events; the most significant events involved genes including RAB10, SERF2, HNRNPC, HNRNPD, HNRPDL, HINT1, NACA, PABPC1, RPL10, RPS12, RPS27, ARPC3, EIF1, STMN1, and ARPC4. Similarly, 233 and 158 genes were affected by Inv(16)-specific and normal-specific alternative splicing events, respectively. Gene Set Enrichment Analysis (GSEA) indicated that the majority of the affected genes encode acetylated proteins: 127 genes (59%; p-value = 1.5E-51) for t(8;21)-specific events; 135 genes (58%; p-value = 1.8E-54) for Inv(16)-specific events; and 98 genes (62%; p-value = 1.2E-42) for normal-specific events. The affected genes are also enriched in the KEGG ribosome pathway.
Alternative splicing events detected by MATS for three cytogenetic abnormalities. Five different alternative splicing events were detected: skipped exon (SE), alternative 5’ splice site (A5SS), alternative 3’ splice site (A3SS), mutually exclusive exons (MXE) and retained intron (RI). All events were further separated into two groups based on whether the alternative exon was included (A) or skipped (B) in the samples.
Identification of fusion transcripts in pediatric AML samples
RNA sequencing has also been successfully used to identify gene-fusion events in cancers [26, 27], and many computational tools have been developed to detect these events [11, 28–30]. Most detection methods, however, have proved problematic because of high false-positive rates [31, 32]. We used four gene-fusion detection methods (Defuse, Tophat-Fusion, FusionMap, and Snowshoes-FTD) to identify gene-fusion events in our pediatric AML samples. The number of putative fusion events identified ranged from 300 to more than 2000 for each detection method, while only a few fusion events (2%–5%) overlapped between any two methods (Fig 7A). To reduce the high rate of false positives, only the 69 putative fusion events (Fig 7B; S5 Table) that were identified by at least two detection methods, or by one method with a ChimerDB hit, were used in our study. ChimerDB is a knowledgebase of fusion genes identified using bioinformatics analysis of transcript sequences based on various public resources, including GenBank, the Sanger Cancer Genome Project (CGP), OMIM, PubMed, and the Mitelman database. Fifty-one of the 69 putative fusion events (74%) were intra-chromosomal (Fig 7B), and the remaining 18 (26%) were inter-chromosomal. Fifty-nine of the 69 identified fusions involved the coding regions of the affected gene (S4 Table). Eight putative fusion events were found in ChimerDB and six of them were previously reported in AML [34–37], suggesting that the combination of multiple gene-fusion detection methods and ChimerDB can accurately identify fusion events. The CBFB-MYH11 fusion event was identified in 12 out of 14 samples with clinically annotated Inv(16) (Fisher's exact test; p-value = 1.25E-11). Closer interrogation of the two cases without the CBFB-MYH11 fusion call demonstrated the presence of reads consistent with the fusion transcripts, but because of their low coverage, these cases did not meet the statistical threshold for identification.
In addition, RUNX1-RUNX1T1 transcript fusions were identified in all 17 samples with clinically annotated t(8;21) (Fisher's exact test; p-value = 2.23E-16), further suggesting that the identification of putative fusion events was accurate.
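The filtering rule described above (keep a fusion if at least two detection methods call it, or if a single method calls it and it has a ChimerDB hit) can be sketched as a small set operation. This is an illustrative sketch only: the tool labels, the placeholder gene pairs, and the `known_fusions` set standing in for a ChimerDB lookup are all hypothetical, and no real tool output format is assumed.

```python
def consensus_fusions(calls_by_tool, known_fusions, min_tools=2):
    """Keep fusions supported by >= min_tools callers, or by any caller plus
    a hit in a reference set of known fusions (a stand-in for ChimerDB)."""
    support = {}
    for tool, calls in calls_by_tool.items():
        for fusion in calls:
            support.setdefault(fusion, set()).add(tool)
    kept = {}
    for fusion, tools in support.items():
        # Every fusion in `support` was called by at least one tool already.
        if len(tools) >= min_tools or fusion in known_fusions:
            kept[fusion] = sorted(tools)
    return kept


# Hypothetical calls from four callers; gene pairs are (5' partner, 3' partner).
calls_by_tool = {
    "tool_1": {("CBFB", "MYH11"), ("GENE_P", "GENE_Q")},
    "tool_2": {("CBFB", "MYH11"), ("GENE_X", "GENE_Y")},
    "tool_3": set(),
    "tool_4": set(),
}
# Hypothetical stand-in for fusions present in a reference database.
known_fusions = {("GENE_P", "GENE_Q")}

kept = consensus_fusions(calls_by_tool, known_fusions)
```

In this toy input the fusion called by two tools and the single-tool fusion with a database hit are retained, while the single-tool call without a hit is discarded, mirroring the trade-off between sensitivity and false positives described in the text.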
(A) Gene-fusion events were detected using four gene fusion detection methods. (B) 69 putative fusion events shown in a circular plot. Red: intra-chromosomal fusion event; Blue: inter-chromosomal fusion event. (C) Three fusion variants of NUP98. (D) Two in-frame fusions.
In addition to the known fusion transcripts in the 31 patients with CBF AML (CBFB-MYH11 and RUNX1-RUNX1T1), an additional 132 fusion transcripts (for 47 fusion events) were identified, including 21 inter- and 111 intra-chromosomal fusions. In the 33 patients without known karyotypic alterations (NK), a total of 126 fusion transcripts (for 53 fusion events) were detected, including 30 inter- and 96 intra-chromosomal fusions. In total, 287 fusion transcripts were identified. These included inter-chromosomal junctions of three fusion variants of NUP98 in 6 patients (NUP98-NSD1, N = 4; NUP98-HOXD13, N = 1; and NUP98-HMGB3, N = 1; Fig 7C). Frequent high-confidence in-frame fusions, namely PIM3-SCO2, ADSL-SGSM3 and SIDT2-TAGLN (Fig 7D), as well as all NUP98 variants, were confirmed by secondary methodology (PCR and Sanger sequencing).
There were 119 genes involved in the 69 fusion events and gene set enrichment analysis of these 119 genes using the Database for Annotation, Visualization, and Integrated Discovery (DAVID) (Huang et al. 2009) indicated that these fusion genes are enriched in genes that code for proteins which are post-translationally modified by the attachment of at least one methyl, phosphate, or acetyl group.
The fusion transcripts RUNX1-RUNX1T1, resulting from t(8;21), and CBFB-MYH11, resulting from Inv(16), define the subtypes collectively referred to as CBF AML, which have similar clinical outcomes, but the similarities and differences between the two entities have not been studied in detail. We investigated the transcriptome profiles of specimens from children with AML characterized by t(8;21) or Inv(16), and also from a third subset of patients with normal karyotype (NK), in order to define transcript-expression patterns as well as isoforms and pathways that are differentially expressed in these genomic subsets and to delineate the similarities and differences among these groups of patients. Hundreds of differentially expressed genes were found in each cohort, indicating a widespread presence of differential expression among the different cytogenetic abnormalities. Using the NK cohort as our control, we established expression profiles for each of the two subtypes [i.e., t(8;21) and Inv(16)] in CBF AML to define genes whose expression patterns are shared or differ between the two subtypes. A large number of differentially expressed genes were identified for the t(8;21) and Inv(16) subsets compared to the NK cohort. We then demonstrated that the majority of the differentially expressed genes shared between t(8;21) and Inv(16) (175/198; binomial test p-value < 1.0E-30) change in the same direction in the two subtypes (up- or down-regulated in both), and that these genes are enriched for a number of HOX genes involved in morphogenesis, specifically the development of the embryonic skeletal system. Additionally, the differentially expressed genes in each of the subgroups were enriched for RUNX1 binding sites: 72.8% of genes in the t(8;21) samples, 73.8% of genes in the Inv(16) samples, and 69.0% of genes in the normal samples.
Besides karyotype-specific expression patterns, alternative splicing patterns among the cytogenetic cohorts were assessed. We identified a large number of specific alternative splicing events associated with each cytogenetic subset. Our data demonstrated that several types of splicing exist in our cohorts, with skipped exon (SE), mutually exclusive exons (MXE), and retained intron (RI) predominating. This suggests that splicing patterns in AML could be karyotype-specific. Different splicing events can modulate gene function by introducing or deleting functional gene domains, or can silence genes by causing frame-shifts or introducing an early termination codon. Distinguishing the functional isoforms that would be translated from those destined for nonsense-mediated decay is likely to prove critical in separating biologically significant spliced-gene products from non-functional ones. Additionally, we demonstrated that the protein products of a significant majority of alternatively spliced genes are substrates of post-translational acetylation. Previous studies have demonstrated that acetylation strongly influences alternative splicing [39, 40], implicating acetylation in the karyotype-specific alternative splicing observed here. Gene set enrichment analysis of the genes altered by splicing indicated that various members of the KEGG ribosome pathway are involved. This is consistent with previous studies showing that the fusion proteins RUNX1-RUNX1T1 and CBFB-MYH11 regulate ribosomal RNA transcription [41, 42]. The dysregulation of these processes is thought to aid the fusion proteins’ role in altering differentiation and proliferation and in disrupting normal hematopoiesis.
We further assessed the presence of fusion transcripts in our study population. Given the high false-positive rate of most current gene-fusion detection methods [10, 31], our integration of four distinct gene-fusion detection methods with ChimerDB, a knowledgebase of fusion genes, allowed us to identify fusion transcripts more accurately than is possible with approaches that use a single method. We correctly identified the known fusion events (CBFB-MYH11 and RUNX1-RUNX1T1) in 29 of the 31 CBF AML cases. In addition to these 29 known fusion transcripts, we detected an additional 258 fusion transcripts (67 fusion events in 115 genes). Fusions of high interest include those involving NUP98, a nucleoporin gene that encodes a building block of the nuclear pore complex, which mediates the transport of mediators of cellular function across the nuclear membrane. Three fusion variants of NUP98 were identified (NUP98-NSD1, NUP98-HOXD13, and NUP98-HMGB3). NUP98-NSD1 has been shown to co-occur with the FLT3/ITD mutation and to be strongly associated with adverse outcomes, whereas NUP98-HOXD13 and NUP98-HMGB3 are less-known variants whose prevalence and clinical implications are yet to be determined. Further, we identified a large number of intra-chromosomal fusions whose true functionality needs to be evaluated. Some of these fusions may result from transcriptional read-throughs that may or may not be functionally significant. Studies of the protein products and functional significance of these lesions are ongoing. Our study also indicates that the identified fusion transcripts are enriched in genes that encode proteins undergoing post-translational modification by acetylation, methylation, or phosphorylation, suggesting potential functional implications.
A previous study has shown that chromatin proteins and metabolic enzymes are highly represented in acetylated, methylated, or phosphorylated proteins, suggesting that gene fusions may profoundly affect gene expression and metabolism in pediatric AML subtypes.
This study also highlights the significance of homeobox (HOX) genes in CBF AML. Dysregulation of HOX genes contributes to the perturbation of normal hematopoiesis [45, 46], and the overexpression of HOX genes in hematopoietic cells can contribute to leukemogenesis [47, 48]. Our results demonstrate that all HOX genes were down-regulated in the t(8;21) subtype, whereas four genes (HOXB2, HOXB3, HOXB4 and MEIS1) had much higher expression levels in the Inv(16) subtype than in the t(8;21) subtype. Differential HOX expression in the CBF AML subtypes may cooperate with the leukemogenic potential of the two fusion events (CBFB-MYH11 and RUNX1-RUNX1T1). Furthermore, given our observations of high HOX expression in patients with the FLT3/ITD mutation and previous reports associating elevated HOX expression with adverse outcomes [8, 50], the hypothesis that HOX expression may mediate the evolution of resistance should be considered.
This study provides comprehensive transcriptome profiles for CBF AML subtypes alpha [t(8;21)] and beta [Inv(16)]. It delineates differential gene-expression profiles, transcript splice isoforms, and fusion transcript profiles for CBF AML subtypes, and it also identifies specific genes and pathways that may provide targets for therapeutic intervention.
Materials and Methods
Pediatric AML samples
Sixty-four diagnostic samples derived from either bone marrow (n = 59) or peripheral blood (n = 5) were used in this study. All samples were obtained with written consent from the parents/guardians of minors enrolled in three consecutive Children’s Oncology Group clinical trials (CCG-2961, AAML-03P1, and AAML-0531). The Institutional Review Board at Fred Hutchinson Cancer Research Center has reviewed and approved this study. It is filed under protocol 1642 (Biology of the Alterations of the Signal Transduction Pathway in Pediatric Cancer), IR File #5236. Collectively, the percentage of leukemic blasts in the samples is very high, with a median of 77.5% (range 40–100%). Patient age ranges from 0.83 to 20.82 years, with a median of 12.29 years. Males represent 42 of the 64 (66%) patients.
RNA preparation and sequencing
Genetic material from AML specimens was extracted using AllPrep DNA/RNA Mini Kits (Qiagen, Valencia, CA). At Hudson Alpha Institute (Huntsville, AL), 1 μg of high-quality total RNA was used for the conversion of mRNA into a cDNA library of template molecules based on mRNA capture with poly(T) magnetic beads, fragmentation, and reverse transcription to first-strand cDNA with reverse transcriptase and random primers using Illumina's TruSeq RNA Sample Prep kit (Illumina, San Diego, CA) according to the manufacturer's instructions. After adaptor ligation, each cDNA library was purified and enriched by PCR amplification; the final average fragment size, including adaptors, was 280 bases. Each library was then subjected to 50-cycle paired-end sequencing on the Illumina HiSeq, with four samples multiplexed into each flow cell lane.
Alignment of RNA-sequencing reads to the human genome
Paired-end RNA-sequencing reads were aligned to the human reference genome (hg19/NCBI Build 37). The human reference genome and the splice-junction sequences were combined to form the reference sequences using the USeq MakeTranscriptome program, and RNA-sequencing reads were aligned to the whole genome and splice junctions using Novoalign (Novocraft 2010). Novoalign used a structural-variation penalty to determine whether paired-end reads should be reported when they did not form proper fragments. Finally, the aligned reads were sorted and indexed using SAMtools and were stored in the SAM/BAM format. All BAM files have been deposited at the database of Genotypes and Phenotypes (dbGaP, http://www.ncbi.nlm.nih.gov/gap) under substudy phs000465.v10.p3, TARGET: Acute Myeloid Leukemia (AML).
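To illustrate what a combined genome-plus-junction reference contains, the following minimal sketch builds splice-junction sequences by joining the bases flanking each pair of exon boundaries. It is not the USeq MakeTranscriptome implementation; the function name, the 0-based half-open coordinates, the `flank` parameter, and the toy sequence are all assumptions made for illustration.

```python
def junction_sequences(chrom_seq, exons, flank):
    """Build junction sequences for every ordered pair of exons.

    chrom_seq: chromosome sequence as a string (toy data here).
    exons: sorted list of (start, end) tuples, 0-based and half-open.
    flank: maximum number of bases taken from each side of a junction.
    Returns a dict keyed by (donor_end, acceptor_start).
    """
    junctions = {}
    for i in range(len(exons)):
        for j in range(i + 1, len(exons)):
            donor_start, donor_end = exons[i]
            acceptor_start, acceptor_end = exons[j]
            # Take up to `flank` bases, never reaching outside the exon.
            left = chrom_seq[max(donor_start, donor_end - flank):donor_end]
            right = chrom_seq[acceptor_start:min(acceptor_end, acceptor_start + flank)]
            junctions[(donor_end, acceptor_start)] = left + right
    return junctions


# Hypothetical three-exon gene on a toy 24-base chromosome.
toy_chrom = "AAAACCCCGGGGTTTTACGTACGT"
toy_exons = [(0, 4), (8, 12), (16, 20)]
toy_junctions = junction_sequences(toy_chrom, toy_exons, flank=3)
for key, seq in sorted(toy_junctions.items()):
    print(key, seq)
```

Reads that span an exon-exon boundary can then align to one of these short junction sequences even though they do not align contiguously to the genome.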
Identification of differentially expressed genes
Aligned reads were annotated using the HTSeq package (http://www-huber.embl.de/users/anders/HTSeq/). We used the HTSeq-count program to calculate the number of reads mapped to each gene based on the Homo_sapiens.GRCh37.69.gtf annotation file downloaded from ensembl.org. The union overlap resolution mode was used to remove ambiguous reads. DESeq was used to calculate the p-value for each gene among samples with different cytogenetic abnormalities. We applied the Benjamini-Hochberg procedure to correct for multiple testing and reported genes with an adjusted p-value < 0.05 as differentially expressed genes (S6 Table).
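The multiple-testing correction can be sketched as follows. This is a minimal stand-alone version of the Benjamini-Hochberg adjustment, not the DESeq code path used in the study; the gene labels and p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical per-gene p-values (not taken from the study).
raw = {"GENE_A": 0.001, "GENE_B": 0.008, "GENE_C": 0.039,
       "GENE_D": 0.041, "GENE_E": 0.60}
adj = benjamini_hochberg(list(raw.values()))
significant = [gene for gene, q in zip(raw, adj) if q < 0.05]
print(significant)
```

In this toy example GENE_C and GENE_D have raw p-values below 0.05 but are not reported, because their adjusted p-values exceed the 0.05 threshold.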
Gene set enrichment analysis and visualization
We used DAVID to perform gene set enrichment analysis (GSEA) in order to associate molecular functions with the set of differentially expressed genes as well as with sets of alternative splicing and fusion genes. Furthermore, we used OmicCircos and Cytoscape to visualize the results of the analysis.
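As an illustration of the kind of over-representation test that underlies such an enrichment analysis, the sketch below computes a one-sided hypergeometric p-value for the overlap between a gene list and an annotation term. It is an assumption-laden simplification: DAVID reports a modified Fisher exact statistic (the EASE score), which this sketch does not reproduce, and all counts shown are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(n_background, n_annotated, n_selected, n_overlap):
    """Probability of seeing at least n_overlap annotated genes by chance.

    n_background: genes in the background set.
    n_annotated: background genes carrying the annotation term.
    n_selected: genes in the list of interest (e.g. differentially expressed).
    n_overlap: selected genes that carry the annotation term.
    """
    total = comb(n_background, n_selected)
    upper = min(n_annotated, n_selected)
    tail = sum(
        comb(n_annotated, k) * comb(n_background - n_annotated, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Hypothetical counts: 20 background genes, 5 annotated, 4 selected, 3 overlap.
p_enrich = hypergeom_enrichment_p(20, 5, 4, 3)
print(round(p_enrich, 4))
```

A small p-value indicates that the annotation term appears in the gene list more often than expected from the background; in practice these p-values would also be corrected for the number of terms tested.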
Detection of alternative splicing among different cytogenetic abnormalities
We used the MATS program to detect five distinct types of alternative splicing events. Putative alternative splicing events were identified from the RNA-sequencing data using the annotation file downloaded from ensembl.org. TopHat was used to identify alternative splicing events, MATS was used to calculate the p-value for each alternative splicing event, and false-discovery rate (FDR) control was applied to find differential alternative splicing events among samples with distinct cytogenetic abnormalities (S7 Table).
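Differential splicing analysis of this kind compares exon inclusion levels between sample groups. The sketch below computes a length-normalized inclusion level for a skipped-exon event and the difference between two groups. It does not reproduce the MATS statistical test; the default effective lengths (two junctions supporting inclusion, one supporting skipping) and the read counts are assumptions chosen for illustration.

```python
def inclusion_level(inc_reads, skip_reads, inc_len=2, skip_len=1):
    """Length-normalized exon inclusion level (psi) for one event.

    inc_reads: reads supporting the exon-inclusion isoform.
    skip_reads: reads supporting the exon-skipping isoform.
    inc_len, skip_len: assumed effective lengths (here, junction counts).
    Returns None when the event has no supporting reads.
    """
    inc = inc_reads / inc_len
    skip = skip_reads / skip_len
    if inc + skip == 0:
        return None
    return inc / (inc + skip)


# Hypothetical pooled read counts for one exon in two cytogenetic groups.
psi_group1 = inclusion_level(inc_reads=40, skip_reads=20)
psi_group2 = inclusion_level(inc_reads=10, skip_reads=45)
delta_psi = psi_group1 - psi_group2
print(psi_group1, psi_group2, round(delta_psi, 3))
```

A large absolute difference in inclusion level flags a candidate event; whether that difference is statistically supported is what the p-value and FDR steps described above decide.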
Identification of fusion events
Four gene-fusion detection methods (deFuse, TopHat-Fusion, FusionMap, and Snowshoes-FTD) were used to identify gene-fusion events in the pediatric AML samples. Fusion events identified by more than one method were chosen as putative fusion events (S8 Table). In addition, a knowledgebase of fusion genes, ChimerDB, was used to include fusion events that were detected by only one gene-fusion detection method. Fusion events were visualized using an in-house Perl program, which is based on the GD graphics library and uses the UCSC hg19 known-gene track as the gene and exon reference.
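The selection rule described above (two or more callers, or one caller plus a knowledgebase entry) can be sketched as follows. This is an illustrative reconstruction, not the pipeline used in the study; the function name, the input layout, and the example calls are hypothetical.

```python
from collections import defaultdict


def select_putative_fusions(calls, known_fusions, min_callers=2):
    """Keep fusions supported by several callers or by a knowledgebase.

    calls: iterable of (tool_name, (gene_5prime, gene_3prime)) tuples.
    known_fusions: set of (gene_5prime, gene_3prime) knowledgebase entries.
    Returns a dict mapping each kept fusion to the reason it was kept.
    """
    callers = defaultdict(set)
    for tool, fusion in calls:
        callers[fusion].add(tool)

    selected = {}
    for fusion, tools in callers.items():
        if len(tools) >= min_callers:
            selected[fusion] = "multiple callers"
        elif fusion in known_fusions:
            selected[fusion] = "single caller + knowledgebase"
    return selected


# Hypothetical caller output; tool names follow the text above.
example_calls = [
    ("deFuse", ("RUNX1", "RUNX1T1")),
    ("TopHat-Fusion", ("RUNX1", "RUNX1T1")),
    ("FusionMap", ("NUP98", "NSD1")),
    ("deFuse", ("GENE_X", "GENE_Y")),
]
example_known = {("NUP98", "NSD1"), ("CBFB", "MYH11")}
putative = select_putative_fusions(example_calls, example_known)
print(putative)
```

Here the fusion reported by a single caller and absent from the knowledgebase is dropped, which mirrors the intent of requiring independent support before RT-PCR validation.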
RT-PCR validation for putative fusion events
RNA was reverse-transcribed using Thermo Scientific’s Maxima H Minus First Strand cDNA Synthesis Kit (Thermo Fisher Scientific, Pittsburgh, PA). The resulting cDNA was used in PCR amplification of fusion junctions with primers listed in S9 Table. Fusion transcripts were verified by Sanger sequencing.
S1 Fig. An example showing the differential expression of genes between samples with different cytogenetic abnormalities.
RNA-seq reads were mapped to the region of RUNX1T1 for 64 pediatric AML samples (red: high read density; green: low read density).
S2 Fig. Co-expression of Inv(16)-specific and normal-specific genes.
(A-B) Heatmaps showing the clustering of differentially expressed genes among 64 pediatric AML samples for Inv(16)-specific and normal-specific differentially expressed genes. (C-D) Co-expression gene networks for Inv(16)-specific and normal-specific differentially expressed genes. Co-expressed genes were determined based on the coefficient of determination (R² > 0.6). The co-expression gene network was generated using Cytoscape 2.8.3 (Smoot et al. 2011). Node color is based on the fold change of the differentially expressed gene (red: up-regulated; green: down-regulated) and node size corresponds to the degree of the node (the number of edges incident to the node).
S1 Table. Baseline characteristics of 64 pediatric AML patients.
S2 Table. Summary of mapping data for 64 pediatric AML samples generated by RNA sequencing.
S3 Table. List of genes with RUNX1 binding sites in each cytogenetic group.
S4 Table. Shared down-regulated and up-regulated genes in CBF AML vs. those with normal karyotype (NK).
S6 Table. Differential expression analysis using DESeq.
S7 Table. Multivariate Analysis of Transcript Splicing (MATS).
S8 Table. Gene fusion events identified by two or more detection methods, or by one detection method plus ChimerDB.
The authors would like to acknowledge the late Robert Arceci, who passed away after this manuscript was submitted. Dr. Arceci was a world-renowned pediatric oncologist and a pioneer in the field of leukemia. We are saddened that we will not have the opportunity to continue to collaborate with him, but we are thankful to have had the chance to work with him for so many years. The authors would also like to thank Laura Kay Fleming, Ph.D., NCI Center for Biomedical Informatics and Information Technology, for providing editorial support during the preparation of this manuscript.
Conceived and designed the experiments: SM RER. Performed the experiments: SL. Analyzed the data: CH CN CY QC YH GK DM. Contributed reagents/materials/analysis tools: SM RER SL DM. Wrote the paper: CH RER SM DM DLS FO.
- 1. Schuback HL, Arceci RJ, Meshinchi S. Somatic characterization of pediatric acute myeloid leukemia using next-generation sequencing. Seminars in hematology. 2013;50(4):325–32. pmid:24246700.
- 2. Hollink IH, Feng Q, Danen-van Oorschot AA, Arentsen-Peters ST, Verboon LJ, Zhang P, et al. Low frequency of DNMT3A mutations in pediatric AML, and the identification of the OCI-AML3 cell line as an in vitro model. Leukemia. 2012;26(2):371–3. pmid:21836609.
- 3. Creutzig U, van den Heuvel-Eibrink MM, Gibson B, Dworzak MN, Adachi S, de Bont E, et al. Diagnosis and management of acute myeloid leukemia in children and adolescents: recommendations from an international expert panel. Blood. 2012;120(16):3187–205. pmid:22879540.
- 4. Appelbaum FR, Gundacker H, Head DR, Slovak ML, Willman CL, Godwin JE, et al. Age and acute myeloid leukemia. Blood. 2006;107(9):3481–5. pmid:16455952; PubMed Central PMCID: PMC1895766.
- 5. Ito Y. Structural alterations in the transcription factor PEBP2/CBF linked to four different types of leukemia. Journal of cancer research and clinical oncology. 1996;122(5):266–74. pmid:8609149.
- 6. Mrozek K, Heinonen K, de la Chapelle A, Bloomfield CD. Clinical significance of cytogenetics in acute myeloid leukemia. Seminars in oncology. 1997;24(1):17–31. pmid:9045301.
- 7. Grimwade D, Walker H, Oliver F, Wheatley K, Harrison C, Harrison G, et al. The importance of diagnostic cytogenetics on outcome in AML: analysis of 1,612 patients entered into the MRC AML 10 trial. The Medical Research Council Adult and Children's Leukaemia Working Parties. Blood. 1998;92(7):2322–33. pmid:9746770.
- 8. Thiede C, Steudel C, Mohr B, Schaich M, Schakel U, Platzbecker U, et al. Analysis of FLT3-activating mutations in 979 patients with acute myelogenous leukemia: association with FAB subtypes and identification of subgroups with poor prognosis. Blood. 2002;99(12):4326–35. pmid:12036858.
- 9. Ozsolak F, Milos PM. RNA sequencing: advances, challenges and opportunities. Nature reviews Genetics. 2011;12(2):87–98. pmid:21191423; PubMed Central PMCID: PMC3031867.
- 10. Wang Z, Gerstein M, Snyder M. RNA-Seq: a revolutionary tool for transcriptomics. Nature reviews Genetics. 2009;10(1):57–63. pmid:19015660; PubMed Central PMCID: PMC2949280.
- 11. Asmann YW, Hossain A, Necela BM, Middha S, Kalari KR, Sun Z, et al. A novel bioinformatics pipeline for identification and characterization of fusion transcripts in breast cancer and normal cell lines. Nucleic acids research. 2011;39(15):e100. pmid:21622959; PubMed Central PMCID: PMC3159479.
- 12. Wang X, Cairns MJ. Gene set enrichment analysis of RNA-Seq data: integrating differential expression and splicing. BMC bioinformatics. 2013;14 Suppl 5:S16. pmid:23734663; PubMed Central PMCID: PMC3622641.
- 13. Eswaran J, Horvath A, Godbole S, Reddy SD, Mudvari P, Ohshiro K, et al. RNA sequencing of cancer reveals novel splicing alterations. Scientific reports. 2013;3:1689. pmid:23604310; PubMed Central PMCID: PMC3631769.
- 14. Cancer Genome Atlas Research Network. Genomic and epigenomic landscapes of adult de novo acute myeloid leukemia. The New England journal of medicine. 2013;368(22):2059–74. pmid:23634996; PubMed Central PMCID: PMC3767041.
- 15. Macrae T, Sargeant T, Lemieux S, Hebert J, Deneault E, Sauvageau G. RNA-Seq reveals spliceosome and proteasome genes as most consistent transcripts in human cancer cells. PloS one. 2013;8(9):e72884. pmid:24069164; PubMed Central PMCID: PMC3775772.
- 16. Lilljebjorn H, Agerstam H, Orsmark-Pietras C, Rissler M, Ehrencrona H, Nilsson L, et al. RNA-seq identifies clinically relevant fusion genes in leukemia including a novel MEF2D/CSF1R fusion responsive to imatinib. Leukemia. 2014;28(4):977–9. pmid:24186003.
- 17. Anders S, Huber W. Differential expression analysis for sequence count data. Genome biology. 2010;11(10):R106. pmid:20979621; PubMed Central PMCID: PMC3218662.
- 18. Ichikawa M, Yoshimi A, Nakagawa M, Nishimoto N, Watanabe-Okochi N, Kurokawa M. A role for RUNX1 in hematopoiesis and myeloid leukemia. International journal of hematology. 2013;97(6):726–34. pmid:23613270.
- 19. Mandoli A, Singh AA, Jansen PW, Wierenga AT, Riahi H, Franci G, et al. CBFB-MYH11/RUNX1 together with a compendium of hematopoietic regulators, chromatin modifiers and basal transcription factors occupies self-renewal genes in inv(16) acute myeloid leukemia. Leukemia. 2014;28(4):770–8. pmid:24002588.
- 20. Shigesada K, van de Sluis B, Liu PP. Mechanism of leukemogenesis by the inv(16) chimeric gene CBFB/PEBP2B-MHY11. Oncogene. 2004;23(24):4297–307. pmid:15156186.
- 21. Liu PP, Hajra A, Wijmenga C, Collins FS. Molecular pathogenesis of the chromosome 16 inversion in the M4Eo subtype of acute myeloid leukemia. Blood. 1995;85(9):2289–302. pmid:7727763.
- 22. Smoot ME, Ono K, Ruscheinski J, Wang PL, Ideker T. Cytoscape 2.8: new features for data integration and network visualization. Bioinformatics. 2011;27(3):431–2. pmid:21149340; PubMed Central PMCID: PMC3031041.
- 23. Thorsteinsdottir U, Kroon E, Jerome L, Blasi F, Sauvageau G. Defining roles for HOX and MEIS1 genes in induction of acute myeloid leukemia. Mol Cell Biol. 2001;21(1):224–34. pmid:11113197.
- 24. Pabst O, Zweigerdt R, Arnold HH. Targeted disruption of the homeobox transcription factor Nkx2-3 in mice results in postnatal lethality and abnormal development of small intestine and spleen. Development. 1999;126(10):2215–25. pmid:10207146.
- 25. Shen S, Park JW, Huang J, Dittmar KA, Lu ZX, Zhou Q, et al. MATS: a Bayesian framework for flexible detection of differential alternative splicing from RNA-Seq data. Nucleic acids research. 2012;40(8):e61. pmid:22266656; PubMed Central PMCID: PMC3333886.
- 26. Maher CA, Kumar-Sinha C, Cao X, Kalyana-Sundaram S, Han B, Jing X, et al. Transcriptome sequencing to detect gene fusions in cancer. Nature. 2009;458(7234):97–101. pmid:19136943; PubMed Central PMCID: PMC2725402.
- 27. Pflueger D, Terry S, Sboner A, Habegger L, Esgueva R, Lin PC, et al. Discovery of non-ETS gene fusions in human prostate cancer using next-generation RNA sequencing. Genome research. 2011;21(1):56–67. pmid:21036922; PubMed Central PMCID: PMC3012926.
- 28. McPherson A, Hormozdiari F, Zayed A, Giuliany R, Ha G, Sun MG, et al. deFuse: an algorithm for gene fusion discovery in tumor RNA-Seq data. PLoS computational biology. 2011;7(5):e1001138. pmid:21625565; PubMed Central PMCID: PMC3098195.
- 29. Kim D, Salzberg SL. TopHat-Fusion: an algorithm for discovery of novel fusion transcripts. Genome biology. 2011;12(8):R72. pmid:21835007; PubMed Central PMCID: PMC3245612.
- 30. Ge H, Liu K, Juan T, Fang F, Newman M, Hoeck W. FusionMap: detecting fusion genes from next-generation sequencing data at base-pair resolution. Bioinformatics. 2011;27(14):1922–8. pmid:21593131.
- 31. Shah SP, Morin RD, Khattra J, Prentice L, Pugh T, Burleigh A, et al. Mutational evolution in a lobular breast tumour profiled at single nucleotide resolution. Nature. 2009;461(7265):809–13. pmid:19812674.
- 32. Wang XS, Prensner JR, Chen G, Cao Q, Han B, Dhanasekaran SM, et al. An integrative approach to reveal driver gene fusions from paired-end sequencing data in cancer. Nature biotechnology. 2009;27(11):1005–11. pmid:19881495; PubMed Central PMCID: PMC3086882.
- 33. Kim P, Yoon S, Kim N, Lee S, Ko M, Lee H, et al. ChimerDB 2.0—a knowledgebase for fusion genes updated. Nucleic acids research. 2010;38(Database issue):D81–5. pmid:19906715; PubMed Central PMCID: PMC2808913.
- 34. Mueller D, Garcia-Cuellar MP, Bach C, Buhl S, Maethner E, Slany RK. Misguided transcriptional elongation causes mixed lineage leukemia. PLoS biology. 2009;7(11):e1000249. pmid:19956800; PubMed Central PMCID: PMC2774266.
- 35. Hollink IH, van den Heuvel-Eibrink MM, Arentsen-Peters ST, Pratcorona M, Abbas S, Kuipers JE, et al. NUP98/NSD1 characterizes a novel poor prognostic group in acute myeloid leukemia with a distinct HOX gene expression pattern. Blood. 2011;118(13):3645–56. pmid:21813447.
- 36. Raza-Egilmez SZ, Jani-Sait SN, Grossi M, Higgins MJ, Shows TB, Aplan PD. NUP98-HOXD13 gene fusion in therapy-related acute myelogenous leukemia. Cancer research. 1998;58(19):4269–73. pmid:9766650.
- 37. Tonks A, Pearn L, Musson M, Gilkes A, Mills KI, Burnett AK, et al. Transcriptional dysregulation mediated by RUNX1-RUNX1T1 in normal human progenitor cells and in acute myeloid leukaemia. Leukemia. 2007;21(12):2495–505. pmid:17898786.
- 38. Huang da W, Sherman BT, Lempicki RA. Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nature protocols. 2009;4(1):44–57. pmid:19131956.
- 39. Zhou HL, Hinman MN, Barron VA, Geng C, Zhou G, Luo G, et al. Hu proteins regulate alternative splicing by inducing localized histone hyperacetylation in an RNA-dependent manner. Proceedings of the National Academy of Sciences of the United States of America. 2011;108(36):E627–35. pmid:21808035; PubMed Central PMCID: PMC3169152.
- 40. Hnilicova J, Hozeifi S, Duskova E, Icha J, Tomankova T, Stanek D. Histone deacetylase activity modulates alternative splicing. PloS one. 2011;6(2):e16727. pmid:21311748; PubMed Central PMCID: PMC3032741.
- 41. Bakshi R, Zaidi SK, Pande S, Hassan MQ, Young DW, Montecino M, et al. The leukemogenic t(8;21) fusion protein AML1-ETO controls rRNA genes and associates with nucleolar-organizing regions at mitotic chromosomes. Journal of cell science. 2008;121(Pt 23):3981–90. pmid:19001502; PubMed Central PMCID: PMC2904240.
- 42. Lopez-Camacho C, van Wijnen AJ, Lian JB, Stein JL, Stein GS. CBFbeta and the leukemogenic fusion protein CBFbeta-SMMHC associate with mitotic chromosomes to epigenetically regulate ribosomal genes. Journal of cellular biochemistry. 2014;115(12):2155–64. pmid:25079347; PubMed Central PMCID: PMC4199869.
- 43. Thanasopoulou A, Tzankov A, Schwaller J. Potent co-operation between the NUP98-NSD1 fusion and the FLT3-ITD mutation in acute myeloid leukemia induction. Haematologica. 2014;99(9):1465–71. pmid:24951466.
- 44. Zhao S, Xu W, Jiang W, Yu W, Lin Y, Zhang T, et al. Regulation of cellular metabolism by protein lysine acetylation. Science. 2010;327(5968):1000–4. pmid:20167786; PubMed Central PMCID: PMC3232675.
- 45. Rice KL, Licht JD. HOX deregulation in acute myeloid leukemia. The Journal of clinical investigation. 2007;117(4):865–8. pmid:17404613; PubMed Central PMCID: PMC1838955.
- 46. Sitwala KV, Dandekar MN, Hess JL. HOX proteins and leukemia. International journal of clinical and experimental pathology. 2008;1(6):461–74. pmid:18787682; PubMed Central PMCID: PMC2480589.
- 47. Magli MC, Largman C, Lawrence HJ. Effects of HOX homeobox genes in blood cell differentiation. Journal of cellular physiology. 1997;173(2):168–77. pmid:9365517.
- 48. Argiropoulos B, Humphries RK. Hox genes in hematopoiesis and leukemogenesis. Oncogene. 2007;26(47):6766–76. pmid:17934484.
- 49. Kuo YH, Landrette SF, Heilman SA, Perrat PN, Garrett L, Liu PP, et al. Cbf beta-SMMHC induces distinct abnormal myeloid progenitors able to develop acute myeloid leukemia. Cancer cell. 2006;9(1):57–68. pmid:16413472.
- 50. Kottaridis PD, Gale RE, Frew ME, Harrison G, Langabeer SE, Belton AA, et al. The presence of a FLT3 internal tandem duplication in patients with acute myeloid leukemia (AML) adds important prognostic information to cytogenetic risk group and response to the first cycle of chemotherapy: analysis of 854 patients from the United Kingdom Medical Research Council AML 10 and 12 trials. Blood. 2001;98(6):1752–9. pmid:11535508.
- 51. Nix DA, Courdy SJ, Boucher KM. Empirical methods for controlling false positives and estimating confidence in ChIP-Seq peaks. BMC bioinformatics. 2008;9:523. pmid:19061503; PubMed Central PMCID: PMC2628906.
- 52. Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics. 2009;25(16):2078–9. pmid:19505943; PubMed Central PMCID: PMC2723002.
- 53. Benjamini Y, Hochberg Y. Controlling the False Discovery Rate: a Practical and Powerful Approach to Multiple Testing. Journal of the Royal Statistical Society. 1995;57(1):289–300.
- 54. Hu Y, Yan C, Hsu CH, Chen QR, Niu K, Komatsoulis GA, et al. OmicCircos: A Simple-to-Use R Package for the Circular Visualization of Multidimensional Omics Data. Cancer informatics. 2014;13:13–20. pmid:24526832; PubMed Central PMCID: PMC3921174.
- 55. Trapnell C, Pachter L, Salzberg SL. TopHat: discovering splice junctions with RNA-Seq. Bioinformatics. 2009;25(9):1105–11. pmid:19289445; PubMed Central PMCID: PMC2672628.