New Mutations in Chronic Lymphocytic Leukemia Identified by Target Enrichment and Deep Sequencing

Chronic lymphocytic leukemia (CLL) is a heterogeneous disease without a well-defined genetic alteration responsible for the onset of the disease. Several lines of evidence coincide in identifying stimulatory and growth signals delivered by B-cell receptor (BCR), and co-receptors together with NFkB pathway, as being the driving force in B-cell survival in CLL. However, the molecular mechanism responsible for this activation has not been identified. Based on the hypothesis that BCR activation may depend on somatic mutations of the BCR and related pathways we have performed a complete mutational screening of 301 selected genes associated with BCR signaling and related pathways using massive parallel sequencing technology in 10 CLL cases. Four mutated genes in coding regions (KRAS, SMARCA2, NFKBIE and PRKD3) have been confirmed by capillary sequencing. In conclusion, this study identifies new genes mutated in CLL, all of them in cases with progressive disease, and demonstrates that next-generation sequencing technologies applied to selected genes or pathways of interest are powerful tools for identifying novel mutational changes.


Introduction
Chronic lymphocytic leukemia/small lymphocytic lymphoma (CLL/SLL) is the most common adult leukemia in the western world. Most CLL cells are inert and arrested in G0/G1 of the cell cycle. However, it is the progressive accumulation of tumor cells that ultimately leads to symptomatic disease. Pathogenic mechanisms involve multiple external events (for instance, microenvironmental and antigenic stimuli) and internal events (genetic and epigenetic) [1] that are associated with the transformation, progression and evolution of CLL.
CLL is a heterogeneous disease. Some patients progress rapidly and have short survival, whereas others have a more stable clinical course that may not need treatment for years.
Previous studies have demonstrated that overexpression of genes belonging to the B-cell receptor (BCR) pathway, and regulators, such as CD79B, BTK, TNFRSF7 (CD27), IKB2 and SYK, are correlated with a more aggressive behavior in CLL [2,3,4,5]. Along with this finding, oncogenic mutations in multiple genes such as CARD11 [6], LYN [7] and A20 [8], among others, are described as causing deregulation in the BCR and related pathways, such as NFkB in B-cell lymphoma.
Several lines of evidence coincide in revealing that stimulatory and growth signals delivered by BCR and co-receptors enable lymphoma cells to avoid apoptosis and so to proliferate, thereby constituting the driving force for B-cell survival. These seem to be common features of lymphoma pathogenesis. It has been proved that maintained signaling mediated via this pathway is an essential mechanism for tumor cell survival in some lymphoma types, such as CLL and diffuse large B-cell lymphoma (DLBCL) [9,10,11]. However, the molecular mechanism responsible for the activation has not been fully clarified. On the basis of the hypothesis that BCR activation may depend on somatic mutations of the BCR genes and related pathways [12], we have used deep sequencing technology to investigate the existence of changes in the coding and regulatory sequences of a selection of 301 genes that are associated with BCR, or related to signaling from the B-cell receptor, such as the NFKB and JAK/STAT pathways.

Results
Clinical data of the 10 patients included in the study are shown in Table 1. It is a heterogeneous group of patients in terms of staging, genetic alterations and prognostic. Four of the patients showed progressive diseases, which is defined as the need for therapy, while stable disease is defined as those cases that do not require therapy.
FISH analysis revealed a variety of cytogenetic abnormalities, including del(13q), del(17p), del(11q) and trisomy 12. The most frequent of these was del(13q), which was found in 6 of the 10 samples.
To explore whether BCR activation depends on somatic mutations in BCR-signaling genes and related pathways, we used single-end deep sequencing to perform a mutational screening of the exonic and regulatory regions of 301 selected genes (Table S1). The study pipeline and DNA sequencing data, including QC metrics, are summarized in Figure S1 and Table S2.
In summary, detected variants were ranked by their VCF QUAL parameter and manually reviewed in the IGV browser (http://www.broadinstitute.org/igv/). The percentage of reads supporting the mutation of the total number of reads at a given position was taken as .20% with a minimum depth of 306 in tumor DNA, and ,0.5% in normal DNA [13]. Selected candidates (non-synonymous, nonsense or frameshift mutations, or that affecting intronic or UTR regions) were validated by PCR amplification of the genomic region and capillary sequencing (File S1). Seven variants were selected for validation, and all of them were validated by capillary sequencing (Table 2 and Figure 1).
Three of these mutations (present in KRAS, SMARCA2 and PRKD3 genes) were non-synonymous point mutations, located in the coding region of the genes. Two were G.A transitions and one was an A.G change. There was also a frame-shift mutation due to a 4-bp deletion (in NFKBIE gene) that resulted in a stop codon 13 aminoacids downstream from it, which would generate a truncated protein. The Ensembl Variant Effect Predictor (VEP) [14] predicted that mutations in two of the genes (KRAS and SMARCA2) were deleterious to protein function ( Figure 2 and Table 2).
The mutation in SMARCA2 lies inside the bromodomain of the protein. Using the NMR structure of the bromodomain (PDB code 2DAT, chain A) we were able to check that the WT residue is placed in one of the loops of the bromodomain, opposite to the region of the helical bundle involved in the binding with the acetyllisine, and so the effect of this mutation is difficult to predict (data not shown).
Finally, there were two point mutations located at intronic sites (STAT6 and ILB1) and one in the 39 UTR (LIFR).
Mutations in coding sequences were detected in the 3 cases with a progressive course. CLL7 and CLL5 (with 2 and 1 different mutations, respectively) exhibited adverse prognosis factors, such as unmutated IGVH (IGHV1-69 and IGHV2-70, respectively) and ZAP-70 positivity; case CLL5 also harbor 17p13 deletion and c-myc amplification. Case CLL9 progressed from stage RAI-BINET A/0 at diagnosis to C/III. This case harbors del(13q) and mutated IGVH3-21, a finding usually associated with a poor prognosis, irrespective of the IGVH mutational status [15] ( Table 1).

Discussion
Based on the hypothesis that BCR activation may depend on somatic mutations of the BCR and related pathways we have performed a complete mutational screening of 301 selected genes associated with BCR signaling and related pathways, such as Table 1. Clinical features, evolution and biological characteristics of the ten CLL patients studied. NFkB, using massive parallel sequencing technology in 10 CLL cases. Three out of four progressive cases included in the study have mutated genes in the coding region. The mutated genes were KRAS, SMARCA2, NFKBIE and PRKD3, all of which play a role in BCR, NFkB and related signaling pathways. The KRAS (v-Ki-ras2 Kirsten rat sarcoma viral oncogene homolog) protein is a GDP/GTP-binding protein that acts as an intracellular signal transducer, a molecular regulator in a variety of signal transduction pathways, such as BCR signaling pathways ( Figure S2), or controlling phenomena such as cytoskeleton integrity, proliferation differentiation, adhesion, cell migration and apoptosis. From 17 to 25% of all human tumors have the activating KRAS mutation, but it has been described in the hematopoietic and lymphoid tissue of only about 5% of cases (COSMIC database; http://www.sanger.ac.uk/genetics/CGP/ cosmic/). Specifically in CLL, KRAS mutations have been described only exceptionally [16,17]. The mutation identified here, located at codon 60, has not been described before, but is located in the hotspot region of codons 59, 61 and 62. Therefore, the putative effect of this mutation on protein function should be similar, leading to the activation of the protein.

Samples
NFKBIE (nuclear factor of kappa light polypeptide gene enhancer in B-cell inhibitor epsilon) is upregulated following NF-kB activation. NFKBIE is able to inhibit NF-kB-directed transactivation via cytoplasmic retention of REL proteins. For NFKBIE, inactivating mutations have been described in Hodgkin/ Reed-Sternberg cells [18]. In the paper by Quesada et al. [19] a non-sense mutation in a CLL case is also described (chr6: 44230329; C/A) and in the COSMIC database, two mutations are described in breast cancer, one of which is a nonsense mutation, like the one described here. This kind of mutation should give rise to a truncated protein without the ankyrin motif mediating the  protein-protein interaction and would eventually be degraded. The inactivation of NFKBIE by mutation would lead to the sustained activation of the NFKB pathway ( Figure S2), which has previously been linked to CLL pathogenesis [20] SMARCA2 (SWI/SNF related, matrix associated, actin dependent regulator of chromatin, subfamily a, member 2), also known as SNF2L2 or BRM, is part of the large ATP-dependent chromatin remodeling complex SWI/SNF, which is required for transcriptional activation of genes normally repressed by chromatin. It has been described as a mediator of NFkB transcriptional activation, and is also associated with BCR signaling mediated by BTK [21,22]. ( Figure S2). Four substitution mutations are reported for the SMARCA2 gene in the COSMIC database and in the ICGC portal (http://dcc.icgc.org/); two mutations were identified and validated in CLL (intronic) and glioblastoma (nonsynonymous coding). The mutation found in the present study is located in the bromodomain, whose function is not completely clear although it may play a role in the assembly or activity of multi-component complexes involved in transcriptional activation. Mutations have been described in other members of the same family, such as ARID1A in around 50% of ovarian clear cell carcinomas and in gastric cancer [13,23,24], and SMARCA4/ BRG1 in lung tumors [25,26]. In both cases, mutations are located throughout the entire gene coding sequence.
PRKD3 (protein kinase D3) belongs to the protein kinase C family of serine-and threonine-specific protein kinases that can be activated after BCR engagement [27] by calcium and the second messenger, diacylglycerol ( Figure S2). It phosphorylates a wide variety of protein targets and is involved in diverse cellular signaling pathways [28]. Only sporadic mutations have been described in the PRKD3 gene in lung adenocarcinoma, ovary and brain glioma (COSMIC and ICGC databases).
The other mutations found in STAT6, IL1B and LIFR, are located in intronic regions and their consequences are therefore not clear.
Given the small number of patients included in the series, this study is unable to conclude statistical significance. To infer a possible role and relevance for each mutated gene described, they need to be validated in a larger series of patients and through functional studies. Nevertheless, the findings are consistent with the idea that, at least partially, BCR and NFkB signaling and related pathways in CLL cells could depend on somatic mutations.
In conclusion, this study identifies new genes mutated in CLL and demonstrates that next-generation sequencing technologies applied to selected genes or pathways of interest are powerful tools for identifying novel mutational changes.

Clinical samples
The current study was approved by the institutional Ethics Committee of the ''Instituto de Salud Carlos III'' (CEI PI 03_2011). Fresh blood samples from 10 untreated CLL patients were obtained from Hospital Universitario Puerta de Hierro-Majadahonda, with the collaboration of the CNIO Tumour Bank, in adherence with the protocols of the Ethics Committee of the Instituto de Salud Carlos III. All patients involved gave written informed consent for their participation in the study. Clinical data were collected according to the criteria of the National Cancer Institute (NCI) Working Group [32].
Tumoral DNA was collected from peripheral blood cells, CD19+ B-cell, then purified using the RossetteSep Human B cell enrichment kit (StemCell Technologies) or an immunomagnetic purification process (Myltenyi Biotec). Tumor cell purity was taken to be the ratio CD19/CD3, which was measured by FACS. The tumor B-cell purity of CLL samples after purification ranged from 88.00 to 99.80%. Non-tumoral DNA from each patient was collected from oral mucosa samples.
DNA was extracted by the phenol-chloroform method and the quality and quantity of purified DNA was assessed by fluorometry (Qubit, Invitrogen) and gel electrophoresis.

Target enrichment and sequencing
Gene selection and bait design. To select genes we browsed the KEGG and BioCarta databases and previously published data on CLL and genes acting on BCR signaling and related pathways, such as NFkB and JAK/STAT [2,3,11,12,20] ( Table S1).
The USCS Genome Browser (http://genome.ucsc.edu/) was used to select exonic sequences. A region 200 bp downstream of the start transcription point was also included. The coordinates of the genomic sequences were based on UCSC hg19.
The web-based design tool eArray (Agilent Technologies Inc, Santa Clara, CA, USA) was used to design the baits for the SureSelect Target Enrichment System kit.
Biotinylated-RNA 120-mer capture probes had sequences complementary to those of the selected regions, and extended 20 bp upstream and downstream of each region. Repeat masked regions were avoided in the design. The targeted region was 1.36 Mbp (up to 0.04% of the human genome), thereby consisting of 4,000 exons and 580 ds regions. 35,985 capture-centered probes were designed, allowing an overlap of 60 bp and a 46 tiling frequency. The average tiling frequency achieved was 3.26, because exons shorter than 120 bp had a unique complementary probe.
Selected sequences were enriched for each library using the SureSelect custom design Kit (Agilent Technologies).
Library preparation and sequencing. The protocol for ''SureSelect Target Enrichment System'' kit (Agilent Technologies) combined with the ''Genomic DNA Sample Prep for singleend sequencing'' guide (Illumina, San Diego, CA, USA) were used for sample sequence enrichment (31) and library preparation. Briefly, 1-3 mg of high molecular weight genomic DNA from each sample was fragmented by acoustic shearing on a Covaris S2 instrument. Fractions of 150-300 bp were ligated to Illumina's adapters and PCR-amplified for 6 cycles. Exon Enrichment: 300 ng of whole library were hybridized to SureSelect Oligo Capture kit for 24 h at 65uC. Biotinylated hybrids were captured, and the enriched libraries were completed with 12 cycles of PCR. The resulting purified DNA library was applied to an Illumina flow cell for cluster generation and sequenced using the Illumina Genome Analyzer IIx for 36 bases in a single-read format by following the manufacturer's protocols.

Data analysis
Alignment, mapping and identification of somatic variants. Sequencing data were first checked by FastQC (http://www.bioinformatics.bbsrc.ac.uk/projects/fastqc/) for quality control checks on raw sequence data and then aligned to the human reference genome (GRCh37) using Burrows-Wheeler alignment (BWA). That reads unmapped by BWA [33] were realigned using BFAST [34]. Somatic variants were identified using the Unified Genotyper v2 available at the GATK [35]. For variant calling we used GATK Unified Genotyper v2 applying the ''Discovery'' genotyping mode and default parameters for filtering. Thus, SNPs available at dbSNP 132 (hg19) and those reported by the 1000 Genomes Project were filtered out from VCF output files. The GATK QUAL field was employed for ranking selected somatic variants. The percentage of reads supporting the mutation of the total number of reads at a given position was taken as .20% in the tumor DNA, and ,0.5% in normal DNA [13]. Only those variants showing a depth 306 in tumoral samples, were considered for validation. Data has been deposited in the Sequence Read Archive (SRA) database (http://www.ncbi.nlm. nih.gov/sra) (SRA049097).
Biological impact predictions for detected variants were obtained from Ensembl Variant Effect Predictor (VEP: http:// www.ensembl.org/tools.html) [14]. Figure S1 Workflow. The USCS Genome Browser (http:// genome.ucsc.edu/) was used to select exonic sequences and the web-based design tool eArray (Agilent Technologies) was used to design the baits for the SureSelect Target Enrichment System kit. The library was prepared and sequenced in a GA2 sequencer from Illumina. Sequencing data were aligned to the human reference genome (GRCh37) using Burrows-Wheeler alignment (BWA) and BFAST. Somatic variants were identified using the Unified Genotyper v2 available at the GATK. Somatic variants were filtered by i) known SNPs available at dbSNP 132 (hg19) and those reported by 1000 Genome Project, ii) changes present in the matched germinal DNA and iii) synonymous changes. Ensemble VEP was used to predict biological effect and, finally the variants were ranked by GATK quality score, manually reviewed and validated by capillary sequencing.