Complex Patterns of Chromosome 11 Aberrations in Myeloid Malignancies Target CBL, MLL, DDB1 and LMO2

Exome sequencing of primary tumors identifies complex somatic mutation patterns. Assigning relevance to individual somatic mutations is difficult and poses the next challenge for the interpretation of next generation sequencing data. Here we present an approach in which exome sequencing is combined with SNP microarray data to identify targets of chromosomal aberrations in myeloid malignancies. The rationale of this approach is that hotspots of chromosomal aberrations might also harbor point mutations in the target genes of deletions, gains or uniparental disomies (UPDs). Chromosome 11 is a frequent target of lesions in myeloid malignancies. Therefore, we studied chromosome 11 in a total of 813 samples from 773 individual patients with different myeloid malignancies by SNP microarrays and complemented the data with exome sequencing in selected cases exhibiting chromosome 11 defects. We found gains, losses and UPDs of chromosome 11 in 52 of the 813 samples (6.4%). Chromosome 11q UPDs were frequently associated with mutations of CBL. In one patient the 11qUPD amplified somatic mutations in both CBL and the DNA repair gene DDB1. A duplication within MLL exon 3 was detected in another patient with 11qUPD. We identified several common deleted regions (CDRs) on chromosome 11. One of the CDRs was associated with de novo acute myeloid leukemia (P=0.013). One patient with a deletion at the LMO2 locus harbored an additional point mutation on the other allele, indicating that LMO2 might be a tumor suppressor frequently targeted by 11p deletions. Our chromosome-centered analysis indicates that chromosome 11 contains a number of tumor suppressor genes and that the role of this chromosome in myeloid malignancies is more complex than previously recognized.


Introduction
Hematological malignancies are broadly categorized into myeloid and lymphoid malignancies, depending on the hematopoietic lineage involved. This study focused on myeloid malignancies, in particular the disease entities acute myeloid leukemia (AML), chronic myeloid leukemia (CML), myelodysplastic syndromes (MDS) as well as the three classical myeloproliferative neoplasms (MPNs) polycythemia vera (PV), essential thrombocythemia (ET) and primary myelofibrosis (PMF). MDS and MPN are in most cases stable, chronic diseases. A fraction of patients, however, develop signs of disease progression such as myelofibrosis or elevated numbers of hematopoietic progenitors in peripheral blood, referred to as "accelerated phase". A transformation to post-MPN or post-MDS AML marks the final stage of the disease and is associated with a very poor prognosis [1]. Genetic aberrations involving chromosome 11 have been widely reported across all hematological malignancies. Translocations of chromosome 11q affecting the 11q23 region have been intensely studied since the late 1970s, when the first translocation between chromosomes 11 and 4 was described in acute lymphoblastic leukemia (ALL) [2]. In 1991 the gene affected by these translocations on chromosome 11 was identified as MLL (myeloid/lymphoid or mixed-lineage leukemia) [3]. These t(4;11) translocations led to the formation of a fusion gene of MLL and AF4 (ALL1-fused gene from chromosome 4; current official symbol AFF1) on chromosome 4 [4]. Since then a variety of translocations involving MLL and more than 60 fusion gene partners have been identified. They are found in both ALL and AML, with a high prevalence in infants [5]. In addition to translocations, partial tandem duplications of MLL have also been described in AML [6,7]. The internal tandem duplications of MLL most often span the region from exon 3 to exons 9-11 [8] and show a strong association with chromosome 11q trisomies [7].
Classical karyotyping has revealed chromosomal deletions as common genetic changes in chronic lymphocytic leukemia (CLL), AML, MDS and other hematological malignancies. A frequently deleted region mapped to 11q23 [9]. In recent years the advent of single nucleotide polymorphism (SNP) microarrays has allowed the detection of chromosomal gains and losses at a much higher resolution than with classical cytogenetics. Acquired copy number neutral loss of heterozygosity (LOH) associated with uniparental disomies (UPDs), which was previously undetectable by classical cytogenetics, is now recurrently found in hematological malignancies. The first large study in AML using SNP microarrays identified chromosomal aberrations of all three types across the whole genome [10]. We, alongside others, reported such studies in the myeloproliferative neoplasms (MPN) [11][12][13][14]. All of these studies observed frequent aberrations on chromosome 11 including gains, losses and UPDs. UPDs were shown to somatically amplify mutant alleles of genes on various chromosomal arms such as 9p (JAK2), 1p (MPL) or 4q (TET2) [15][16][17][18][19][20][21][22][23]. On the short arm of chromosome 11, mutant alleles of WT1 were associated with UPDs in AML [24], while CBL mutations were associated with UPDs on chromosome 11q in several hematological malignancies [25][26][27]. CBL encodes an E3 ubiquitin ligase that attaches ubiquitin to a number of membrane-associated and cytosolic proteins (such as Flt3, Kit, Jak2 and Mpl) and targets them for degradation [28,29]. In this study, we present a systematic analysis of chromosome 11 in a set of 813 samples across different myeloid malignancies. We used high resolution SNP microarrays and whole exome sequencing to identify novel genetic aberrations of chromosome 11 in myeloid malignancies. We were able to detect commonly aberrant regions on this chromosome and to identify potential target genes of large aberrations.

Chromosome 11 aberrations in myeloid malignancies
In order to systematically analyze chromosome 11 aberrations in myeloid malignancies, we combined data from a total of 813 blood samples that were genotyped at high resolution with Affymetrix Genome-Wide Human SNP 6.0 microarrays. This cohort included 180 de novo acute myeloid leukemia (AML), 62 chronic myeloid leukemia (CML), 101 myelodysplastic syndrome (MDS), 244 polycythemia vera (PV), 118 essential thrombocythemia (ET) and 108 primary myelofibrosis (PMF) samples (Table 1, Figure 1A). For PV, ET, PMF and MDS, the majority of samples were in the chronic phase of the disease; some samples were taken when patients showed signs of disease progression or had transformed to post-chronic phase AML, as outlined in Table 1. Chromosome 11 aberrations were detected in 52 of 813 samples (6.4%) (Table 1 and Figure 1B). The 52 samples were from 50 patients; for 2 patients we had 2 samples from different disease stages (Table S1). The samples harbored between 1 and 3 genetic changes on chromosome 11, except for sample 42, which had a complex aberration of chromosome 11 (Table S1). Excluding sample 42, we detected a total of 30 deletions, 11 gains and 17 UPDs (Figure 2 and Table S1). In MPN, aberrations of chromosome 11 were significantly associated with post-MPN AML compared to chronic phase MPN (P<0.0001, Fisher's exact test, Figure 1B). MPN patients that exhibited myelofibrosis or were in the accelerated phase of the disease but had not fully transformed to post-MPN AML (<20% of blasts in peripheral blood or bone marrow) were regarded as chronic phase patients in this analysis. This finding indicates that genes located on chromosome 11 contribute to disease progression if mutated. Associations of chromosome 11q losses of heterozygosity with disease progression or poor prognosis have been described previously in B cell chronic lymphocytic leukemia [30] and neuroblastoma [31]. Abnormalities of chromosomal band 11q23 were associated with a poor outcome in infant acute lymphoblastic leukemia (ALL) [32].

CBL is a frequent target of chromosome 11q aberrations
We found that UPDs of chromosome 11q are the most recurrent defects in our dataset. A number of studies have shown that 11q UPDs are associated with mutations of the CBL gene (ensembl gene ID: ENSG00000110395) [25][26][27]. Mutations of CBL have been described to cluster within exons 8 and 9 or their exon-intron junctions [25][26][27]. Therefore, we sequenced these two exons of CBL in all samples that harbored chromosomal aberrations overlapping the CBL locus.
Of the 14 patients that had 11q UPDs, we detected SNVs in CBL in 9 patients (Table S1). One patient (sample 45) harbored a 6 bp tandem duplication (Figure 3A). Out of 6 patients that had 11q gains overlapping CBL, one had a somatic mutation in CBL (C384Y in sample 44). PCR subcloning revealed that the gain amplifies the mutant allele (data not shown). No mutations were detected in the 7 patients with deletions overlapping CBL.
For the patients for whom we had control tissue available, the somatic origin of the variants detected in CBL was confirmed (Table S1). In order to identify mutations in other exons of CBL or in other genes that potentially associate with 11q aberrations, we performed whole exome sequencing on three samples with 11q uniparental disomies (samples 30, 36 and 50) and two samples with 11q gains (samples 42 and 43) which did not have mutations in exons 8 and 9 of CBL. Only one of these samples (36) showed a mutation in CBL at the 3' splice site of exon 7 (Table S1). The variant was somatic and independently validated by Sanger sequencing. A particularly interesting case in this set of patients had two mutations of CBL affecting exon 8 (W408C) and the 5' splice site of exon 9 (Figure 3B). Both mutations were somatic and PCR subcloning revealed that these two mutations were on independent DNA strands. As all the bacterial clones analyzed contained only one of the mutations and no clones with wild type CBL were detected, we concluded that the patient harbors a compound heterozygous progenitor clone with distinct mutations of both CBL alleles (Figure 3B).

Mechanisms that increase mutant CBL dosage
Our data on CBL suggest that there are several different genetic mechanisms by which the malignant clone can increase the mutant CBL allele dosage (Figure 3C). The first mechanism is mitotic recombination resulting in UPD. The second mechanism is amplification of the mutant allele by duplication. Another possibility is the inactivation of both wild type alleles by two independent point mutations (compound heterozygosity). Interestingly, it seems possible that loss of a single CBL allele (haploinsufficiency) might be oncogenic, as 7 patients in our cohort carried hemizygous CBL deletions (Figure 2). In support of this hypothesis, heterozygous Cbl deficiency in mice resulted in accelerated blast crisis compared to Cbl wild type animals in a BCR-ABL transgenic murine model [27]. In addition, hemizygous deletions of CBL have been shown by others in MDS and related disorders [33].
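The dosage mechanisms listed above can be made concrete with a small calculation of the mutant allele fraction one would expect to read from sequencing data. The following is a minimal sketch and not part of the original analysis: the function name `expected_vaf` is hypothetical, and it assumes an idealized mixture of tumor cells with normal diploid cells that carry two wild type copies.

```python
def expected_vaf(mutant_copies: int, total_copies: int, clone_fraction: float = 1.0) -> float:
    """Expected mutant allele fraction at a locus (illustrative model only).

    mutant_copies / total_copies describe the locus inside the malignant clone.
    clone_fraction is the share of cells belonging to that clone; the remaining
    cells are assumed to be diploid and wild type (an assumption of this sketch).
    """
    if not 0.0 <= clone_fraction <= 1.0:
        raise ValueError("clone_fraction must lie between 0 and 1")
    if not 0 <= mutant_copies <= total_copies or total_copies == 0:
        raise ValueError("need 0 <= mutant_copies <= total_copies and total_copies > 0")
    mutant = clone_fraction * mutant_copies
    total = clone_fraction * total_copies + (1.0 - clone_fraction) * 2
    return mutant / total


# Heterozygous point mutation, fully clonal: one mutant copy out of two.
print(expected_vaf(1, 2))        # 0.5
# UPD duplicating the mutant allele: two mutant copies out of two.
print(expected_vaf(2, 2))        # 1.0
# Gain of the mutant allele: two mutant copies out of three.
print(expected_vaf(2, 3))
# UPD present in only half of the cells (subclonal).
print(expected_vaf(2, 2, 0.5))   # 0.5
```

Under these assumptions a mutation amplified by a fully clonal UPD appears homozygous, while a subclonal UPD gives an intermediate fraction.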

Mutation of DDB1 associated with 11q UPD
Recently, mutations in the splicing factor 1 gene SF1 (ensembl gene ID: ENSG00000168066) and in EED, a member of the polycomb repressive complex 2 (ensembl gene ID: ENSG00000074266), were found in myeloid malignancies [34,35]. Both genes are located on chromosome 11q (Figure 2). We did not find any mutations in these two genes by either whole exome or Sanger sequencing of EED and the C-terminal proline-rich region of SF1, which was found to be the mutational hotspot of the gene [34]. All samples that had aberrations spanning the two loci were analyzed (Figure 2). We performed whole exome sequencing of samples 30, 36, 42, 43 and 50 and attempted to identify genes other than CBL that might be associated with aberrations of chromosome 11q (Table S2). In two of the patients (samples 30 and 36), we performed a paired analysis, as whole exome sequenced T lymphocyte DNA was available as germline control (samples 30c and 36c). In sample 30 we did not find any somatic mutations with an allelic frequency > 50%, the frequency expected for variants within the fully clonal 11qUPD region (data not shown). In addition to the somatic mutation in CBL described above, sample 36 also harbored another somatic mutation in DDB1 (ensembl gene ID: ENSG00000167986) (Figure 4A). The CBL and DDB1 mutations in sample 36 were validated by Sanger sequencing and shown to be homozygous and fully clonal (Figure 4A). Both mutations were also detected in an earlier sample of the same patient (sample 23). Sample 23 harbored an 11qUPD in a subclone and accordingly, the mutations in CBL and DDB1 were not fully clonal (data not shown). The Polyphen2 tool, which predicts functional effects of human non-synonymous single nucleotide variants, estimated the variant in DDB1 to be "probably damaging" with the highest probability score of 1. DDB1 was originally identified in patients suffering from Xeroderma pigmentosum, with inherited deficiency in nucleotide excision repair (NER). The gene was cloned together with its binding partner DDB2, with which it forms the DDB protein complex [36]. Later, DDB1 was found to form an E3 ubiquitin ligase complex together with CUL4A, ROC1 and a variable fourth protein that determines the target specificity of the E3 ligase. Overall, more than 30 different proteins have been identified as binding partners [37]. The ubiquitination activity of DDB1-CUL4A-ROC1 complexes has been shown to play important roles not only in NER [38] but also in regulating the expression of the tumor suppressor CDKN2A [39]. CDKN2A gene expression is associated with histone 3 lysine 4 (H3K4) trimethylation mediated by the MLL-RBBP5-WDR5 complex. RBBP5 and WDR5 are two of the binding partners of the DDB1-CUL4A-ROC1 complex. DDB1 expression is required, together with MLL, for proper CDKN2A transcriptional activation [39]. Thus, inactivating mutations of DDB1 are likely to contribute to cancer not only by impairing NER, but also by preventing the transcription of tumor suppressor genes. It remains to be seen if the described example of a concerted action of DDB1 and MLL is unique or if there is a systematic relationship between these two genes that might play a role in hematologic malignancies.
In the remaining three samples that were whole exome sequenced (samples 42, 43 and 50) we identified a number of SNVs and small indels that we could validate by Sanger sequencing (Table S3). Only one gene, HEPHL1, was recurrently affected in this dataset. Two patients (samples 42 and 50) each harbored an SNV in the HEPHL1 gene, as indicated in Table S3. The function of HEPHL1 is not known. As we did not have control tissue available from these patients, we were unable to determine whether these variants were of somatic or germline origin.

Tandem duplication in exon 3 of MLL associated with 11q UPD
In order to find small scale genetic alterations that are either too small to be detected by Affymetrix microarrays or too large to be detected by standard exome sequencing pipelines, we analyzed exome coverage data obtained after alignment of the short sequence reads to the human reference genome. We compared the coverage data of each of the five exome datasets to a set of control samples to identify regions of focal deletions or gains on chromosome 11q. In sample 50, we were able to detect a focal amplification in exon 3 of MLL (ensembl gene ID: ENSG00000118058) (Figure 4B). Independent analysis by Sanger sequencing revealed a 513 bp tandem duplication in MLL exon 3. This duplication translates to an in-frame duplication of 171 amino acids from position 528 to 698 of the MLL protein (uniprot ID Q03164-1) (Figure 4B). We did not have control tissue of this patient available to confirm the somatic origin of this duplication. However, the duplication was not present in 196 control subjects, ruling out the possibility of a common germline polymorphism. Tandem duplications in MLL have been described but usually affect the region from exon 3 to exon 9, 10 or 11 [8]. Small tandem duplications such as the 513 bp duplication within exon 3 detected in our study have not been reported so far.
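The statement that the 513 bp duplication is in-frame and corresponds to 171 amino acids (positions 528 to 698) follows from simple arithmetic, which can be checked as below. This is only an illustrative sketch; the helper names are hypothetical and are not taken from the authors' pipeline.

```python
def codons_if_in_frame(length_bp: int) -> int:
    """Return the number of whole codons if the length is a multiple of 3."""
    if length_bp % 3 != 0:
        raise ValueError("length is not a multiple of 3, so the event would shift the frame")
    return length_bp // 3


def residues_spanned(first: int, last: int) -> int:
    """Number of residues in an inclusive protein interval."""
    return last - first + 1


duplicated_residues = codons_if_in_frame(513)
print(duplicated_residues)           # 171
print(residues_spanned(528, 698))    # 171
```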

Chromosome 11p defects associate with de novo AML or target LMO2
On chromosome 11p, we identified a total of 4 CDRs (Figure 2). The most telomeric CDR contained 14 genes. Interestingly, we found a significant association of aberrations spanning this CDR with de novo AML compared to secondary AML (P = 0.013). It is likely that one or more of the genes in this region play a particular role in de novo AML pathogenesis. The most centromeric CDR on the short arm of chromosome 11, defined by a deletion in sample 39, contains the LMO2 gene (ensembl gene ID: ENSG00000135363). In sample 32, where we detected a deletion spanning the LMO2 locus (Table S1), we also found an SNV in LMO2 in the remaining allele (c.G388A; p.G130S, Uniprot ID P25791-3) that was hemizygous in Sanger sequencing traces (Figure 4C). The Polyphen2 tool estimated the variant to be "probably damaging" with the highest probability score of 1. Due to the lack of control tissue in this patient we could not determine whether this SNV was of somatic or germline origin. Based on the available data we postulate that there is a full loss of LMO2 activity in this patient. We tested all other patients with aberrations overlapping the LMO2 locus, but were unable to find any mutations in the coding region or at splice sites of LMO2 (data not shown). The deletions were detected across several different pathologies. LMO2 is frequently involved in translocations in T-cell leukemia [40]. It is expressed in different fetal tissues [41] and the full knockout in the mouse is known to be embryonic lethal [42]. Warren et al. showed that LMO2 is essential for erythroid development in the mouse. Deficiency in erythropoiesis was detected at E9.75. They confirmed by in vitro differentiation assays that this defect is intrinsic to the hematopoietic system and specific for the erythroid lineage [42]. Interestingly, the patient in our study showed anemia with a hemoglobin level of 97 g/L at the time of sampling.

Concluding remarks and perspectives
In this study we applied a chromosome centered genetic analysis of myeloid malignancies. The rationale of this approach is that those chromosomes that exhibit frequent chromosomal defects might also harbor point mutations in the target genes of deletions, gains or UPD. Combining SNP microarray analysis and exome sequencing may increase the likelihood of identification of novel tumor suppressor genes or oncogenes. Applying this approach we systematically analyzed chromosome 11 in myeloid malignancies and detected a large complexity of genetic aberrations especially in patients with AML (de novo or secondary to MPN and MDS). The various genetic lesions of chromosome 11 in myeloid malignancies target CBL, MLL, DDB1, LMO2 and possibly other tumor suppressor genes that we could not identify in this study. The marked cytogenetic complexity associated with AML points towards a highly individual course of disease progression in each patient and might explain the current difficulty in treating patients that have transformed to AML.

Figure 4. A: We found two somatic mutations in DDB1 and CBL. As can be seen in the Sanger sequencing traces, both mutations are homozygous due to amplification by the UPD. B: In sample 50 a tandem duplication in MLL exon 3 was detected. The top graph shows whole exome coverage data across MLL exon 3. The data is plotted as the log2 ratio of the normalized exome sequencing coverage in the patient sample divided by the median normalized coverage of 8 independent control samples at each genomic position (X-axis). The position of the duplication is indicated by the red bar. Sanger sequencing confirmed an in-frame tandem duplication of 171 amino acids as shown at the bottom. C: A common deleted region on chromosome 11p targets LMO2. All deletions in the analyzed cohort that span the LMO2 locus are depicted next to the chromosome 11 ideogram. Red bars indicate deletions, green bars indicate gains. In sample 42, which harbored a deletion spanning the LMO2 locus, we also detected a point mutation in LMO2. The middle section shows a signal intensity plot measuring copy number from Affymetrix microarrays. The plot depicts signal intensity (log2 scale) differences between the patient and a healthy control pool for each probe (as implemented in the Affymetrix Genotyping Console software). The deletion in sample 42 can be seen as the deviation from 0 for all probes in the deleted genomic region (X-axis). The point mutation in LMO2 as identified by Sanger sequencing is depicted at the bottom of panel C. A, B and C: Depicted are the genomic (letters) as well as the respective amino acid (box chains) sequences. Numbers above the boxes indicate amino acid positions in the proteins. Amino acids substituted in the patient samples are indicated by red boxes. The red circle indicates a splice site mutation. Reference and mutant sequences are shown. The arrows indicate the site of mutations below the Sanger sequencing traces. doi: 10.1371/journal.pone.0077819.g004
Our data indicate that genetic stratification of patients into comparable groups at advanced disease stage will be extremely challenging or impossible due to highly individual mutagenesis profiles. Despite individual mutagenesis profiles, it is possible that common molecular features may emerge (based on gene expression and/or protein phosphorylation profiles). Systems level approaches may help in overcoming this obstacle of genetic heterogeneity, opening up the possibility of targeted therapies in the future. Based on current knowledge, treatment efforts in the chronic phase of myeloid malignancies should focus not only on correction of blood counts but also on prevention of disease progression, as therapeutic intervention in advanced disease stages is predicted to be difficult once the genetic complexity of tumors reaches an immense scale.

Ethics statement
Peripheral blood samples were collected from patients after written informed consent. Sample collection was approved by local ethics committees. These were the "Ethik Kommission der Medizinischen Universität Wien" for samples collected in Austria, the "Comitato di Bioetica" for samples collected at the Fondazione Istituto di Ricovero e Cura a Carattere Scientifico (IRCCS) Policlinico San Matteo, Pavia, Italy, the "Local Ethical Committee of Azienda Ospedaliera-Universitaria Careggi, Firenze" for samples collected at the University of Florence, Italy, the Ethics Committee of University Hospital Brno for samples collected at the Masaryk University Brno, Czech Republic, and the "Eticki odbor Klinickog centra Srbije" for samples collected at the University of Belgrade, Serbia.

Patient samples
We analyzed a total of 813 samples from 773 patients. For 40 patients we had two samples available, which were in all cases from two different disease stages. Detailed information on the studied cohort is provided in Table 1. Genomic DNA was isolated from whole blood, granulocytes or mononuclear cell fractions according to standard procedures. For a subset of patients we had control tissue DNA available, extracted from either buccal mucosa cells, T lymphocyte fractions of peripheral blood or cultured skin fibroblasts.

Microarray analysis and whole exome sequencing
The genomic DNA was processed and hybridized to Genome-Wide Human SNP 6.0 arrays (Affymetrix, Santa Clara, CA) according to the manufacturer's instructions. Chromosomal copy number changes and UPDs were detected using the Genotyping Console version 3.0.2 software (Affymetrix).
For samples 30 and 36, where control tissue DNA was whole exome sequenced (samples 30c and 36c), we performed an analysis for somatic mutations using the VarScan2 software with default parameters [46], starting from the post-processed alignment files generated by GATK.
For samples 42, 43 and 50, the final variant lists of the GATK Unified Genotyper were filtered for single nucleotide variants (SNVs) and indels on chromosome 11 that passed the filter criteria of the GATK best practice guidelines v3 and that were not annotated in dbSNP137. Gene annotation was done using the ANNOVAR tool version 2012-02-23 [47].
The raw data of microarray analysis and whole exome sequencing are deposited in the ArrayExpress database under the accession numbers E-MTAB-1845 and E-MTAB-1850, respectively.
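The filtering step described in this section for samples 42, 43 and 50 (variants on chromosome 11, passing filters, not annotated in dbSNP) can be sketched as follows. This is a minimal illustration and not the authors' script: the function name `filter_chr11_novel` is hypothetical, and it assumes tab-separated VCF-style records in which the FILTER column reads `PASS` for passing variants and the ID column is `.` for variants without a dbSNP entry.

```python
from typing import Dict, Iterable, Iterator


def filter_chr11_novel(vcf_lines: Iterable[str]) -> Iterator[Dict[str, object]]:
    """Yield passing chromosome 11 variants that carry no dbSNP identifier."""
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        chrom, pos, var_id, ref, alt, _qual, flt = fields[:7]
        if chrom not in ("11", "chr11"):
            continue  # keep chromosome 11 only
        if flt != "PASS":
            continue  # assumption: filter criteria are encoded in the FILTER column
        if var_id != ".":
            continue  # assumption: dbSNP-annotated variants carry an ID such as an rs number
        # Single-base REF and ALT alleles are called SNV; anything else is
        # labeled "indel" in this simplified sketch.
        is_snv = len(ref) == 1 and all(len(a) == 1 for a in alt.split(","))
        yield {
            "chrom": chrom,
            "pos": int(pos),
            "ref": ref,
            "alt": alt,
            "type": "SNV" if is_snv else "indel",
        }


# Made-up example records, for illustration only.
example = [
    "##fileformat=VCFv4.1",
    "11\t100\t.\tG\tA\t50\tPASS\t.",
    "11\t200\trs123\tC\tT\t50\tPASS\t.",
    "12\t300\t.\tC\tT\t50\tPASS\t.",
    "11\t400\t.\tCA\tC\t50\tLowQual\t.",
    "chr11\t500\t.\tCA\tC\t50\tPASS\t.",
]
kept = list(filter_chr11_novel(example))
for variant in kept:
    print(variant)
```

The records in `example` are invented to exercise each filter condition and do not correspond to variants from the study.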

Coverage analysis from whole exome sequencing data
The analysis was performed for the five tumor samples that had been whole exome sequenced. Samtools 0.1.18 [48] was used with the "depth" option to retrieve coverage data for chromosome 11 from the post-processed alignment files generated by the GATK analysis pipeline. The coverage for each base on chromosome 11 in a particular patient was normalized by the summed coverage of all bases of chromosome 11 in that particular patient. The normalized coverage of sample 30 was compared to the median normalized coverage of a set of 5 independent control samples that had been processed and whole exome sequenced with similar chemistry and instrumentation as sample 30. A comparable control set of 8 independent control samples was generated for samples 36, 42, 43 and 50. All of the control samples used showed wild-type chromosome 11 as analyzed by Genome-Wide Human SNP 6.0 arrays (Affymetrix, data not shown).
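The normalization and comparison to controls described above can be sketched in a few lines. The code below is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, per-base depths are assumed to be already loaded into equally long lists (for example from the output of samtools depth), and positions where either the patient or the control median has zero coverage are returned as `None`.

```python
import math
from statistics import median
from typing import List, Optional, Sequence


def normalize(depths: Sequence[float]) -> List[float]:
    """Divide each per-base depth by the summed depth of the whole region."""
    total = sum(depths)
    if total == 0:
        raise ValueError("sample has no coverage in this region")
    return [d / total for d in depths]


def log2_ratio_track(patient: Sequence[float],
                     controls: Sequence[Sequence[float]]) -> List[Optional[float]]:
    """log2(normalized patient coverage / median normalized control coverage)."""
    pat = normalize(patient)
    ctrl = [normalize(c) for c in controls]
    track: List[Optional[float]] = []
    for i, p in enumerate(pat):
        m = median(c[i] for c in ctrl)
        track.append(math.log2(p / m) if p > 0 and m > 0 else None)
    return track


# Toy data: the third position has twice the depth of its neighbours.
patient_depth = [10, 10, 20, 10]
control_depths = [[10, 10, 10, 10], [20, 20, 20, 20], [30, 30, 30, 30]]
track = log2_ratio_track(patient_depth, control_depths)
print(track)
```

In this toy example the doubled position sits exactly one log2 unit above its neighbours; a focal duplication such as the one in MLL exon 3 would appear as a similar local elevation, although real data are noisier.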

PCR, Sanger sequencing, PCR subcloning
Primers for PCR were designed using the Primer 3 tool (http://www.bioinformatics.nl/cgi-bin/primer3plus/primer3plus.cgi) or the ExonPrimer tool (http://ihg.gsf.de/ihg/ExonPrimer.html), except for the primers amplifying CBL exons 8 and 9, which were taken from a publication by Sanada et al. [27]. Primer sequences and PCR conditions are listed in Table S4. PCRs were performed using the AmpliTaq Gold DNA Polymerase with Gold Buffer and MgCl2 solution (Applied Biosystems / Life Technologies, Paisley, UK) or the AmpliTaq Gold 360 Mastermix (Applied Biosystems). Sanger sequencing was performed using the BigDye Terminator v3.1 Cycle Sequencing kit and the 3130xl Genetic Analyzer (Applied Biosystems). Sequence analysis was done using the Sequencher Software 4.9 (Gene Codes, Ann Arbor, MI). For PCR product subcloning the TOPO Cloning Kit (Invitrogen / Life Technologies, Paisley, UK) was used according to the manufacturer's instructions. PCR products derived from single bacterial clones were sequenced as described above.

Statistical analysis and plots
Fisher's exact tests were performed using Graphpad QuickCalcs (www.graphpad.com/quickcalcs). The plots depicting cohort distributions in Figure 1 were generated using R version 2.8.1 (2008-12-22) [49]. The coverage plot in Figure 4B and the signal intensity plot in Figure 4C were generated using GraphPad Prism version 5.0d for Mac OS X, GraphPad Software (San Diego, CA), www.graphpad.com.