Single-Cell Analysis and Next-Generation Immuno-Sequencing Show That Multiple Clones Persist in Patients with Chronic Lymphocytic Leukemia

The immunoglobulin heavy chain (IGH) gene rearrangement in chronic lymphocytic leukemia (CLL) provides a unique molecular signature; however, we demonstrate that 26/198 CLL patients (13%) had more than one IGH rearrangement, indicating the power of molecular technology over phenotypic analysis. Single-cell PCR analysis and next-generation immuno-sequencing identified IGH-defined clones. In 23% (18/79) of cases whose clones carried unmutated immunoglobulin heavy chain variable (IGHV) genes (U-CLL), IGH rearrangements were biallelic with one productive (P) and one non-productive (NP) allele. Two U-CLL were biclonal, each clone being monoallelic (P). In 119 IGHV-mutated (M-CLL) cases, one had biallelic rearrangements in their CLL (P/NP) and five had 2–4 distinct clones. Allelic exclusion was maintained in all B-clones analyzed. Based on single-cell PCR analysis, 5/11 partner clones (45%) reached levels of >5x10^9 cells/L, suggesting second CLL clones. Partner clones persisted over years. Conventional IGH characterization and next-generation sequencing of 13 CLL, 3 multiple myeloma, 2 Waldenstrom’s macroglobulinemia and 3 age-matched healthy donors consistently identified the same rearranged IGH sequences. Most multiple clones occurred in M-CLL, perhaps indicative of weak clonal dominance, thereby associating with a good prognosis. In contrast, biallelic CLL occurred primarily in U-CLL, thus being associated with poor prognosis. Extending beyond intra-clonal diversity, molecular analysis of clonal evolution and apparent subclones in CLL may also reflect inter-clonal diversity.


Introduction
Chronic lymphocytic leukemia (CLL) is characterized by monoclonal B-cells having a unique immunoglobulin heavy chain (IGH) gene rearrangement. Mutational status of the clonotypic immunoglobulin heavy variable (IGHV) gene stratifies CLL patients into two groups. In about 60% of cases the IGHV is mutated (M-CLL), while in about 40% it is in germline configuration (U-CLL). In general, patients with U-CLL have a worse prognosis than those with M-CLL. The cellular origin(s) of the CLL clone remain unresolved, but recent DNA methylation studies have suggested that the U-CLL cell is more similar to a naïve B-cell, with M-CLL being similar to a memory B-cell [1].
Multiple productive IGH rearrangements (P) have been reported in a subset of CLL [2]. It is unclear whether these are derived from distinct/unrelated clones or if two productive rearrangements arise in a single B-CLL cell. The rule of allelic exclusion demands that each cell harbors only one productive rearrangement. If the first attempt at IGH rearrangement fails, the second allele is then allowed to rearrange; if the second allele fails to yield a productive rearrangement, the B-cell dies. A previous study suggested that CLL cells may not follow this rule and that the presence of two productive IGH rearrangements in a single cell could result from IGHV gene replacement [3,4]. A more recent study, however, suggested that multiple productive IGH rearrangements in CLL may represent multiple independent clones, as indicated by light chain restriction or phenotype [5]. In support of this latter hypothesis are the observations that, by immunophenotyping, biclonal CLL is seen in a small percentage of patients [5][6][7][8][9][10][11]. In addition, unique molecular and cytogenetic features characterized phenotypically distinct clones coexisting in MBL, CLL and other B-cell lymphoproliferative disorders [12,13]. In spite of these collective data, the absence of single-cell analysis (SCA) in most studies has made it difficult to pinpoint the distinct clones, especially those minor but still frequent clones that are likely to be missed by phenotyping, or clones that cannot be distinguished phenotypically.
Aberrant and recurrent mutations have been reported in multiple genes using conventional Sanger sequencing as well as genome-wide next-generation sequencing, suggesting that certain recurrently mutated genes contribute to clonal evolution and disease progression in CLL [14][15][16]. Given that even very small subclones appear to have a significant negative impact on outcome [17], this may be clinically important. While it is believed that these subclones are related to the primary CLL clone, recent studies suggest that they may reflect small secondary clones which have a survival and growth advantage over the primary clone [5].
In the present study, we molecularly determined the incidence of multiple productive rearrangements in CLL, their clonal origin and their persistence throughout the course of disease. CLL patients identified as harboring more than one IGH rearrangement were analyzed to determine whether this represented biallelic rearrangements in the same host cell or distinct B-cell clones (bi- or multiclonality). Partner clones were confirmed using next-generation IGH sequencing (NGS) and their frequencies among B-cells were verified using SCA. For this cohort of patients, we found that the rules of allelic exclusion were maintained in all clones analyzed. Partner clones arose in both U-CLL and M-CLL, with a trend towards multiple clones among patients with M-CLL. In contrast, monoclonal disease with biallelic IGH typically arose in U-CLL. For patients with multiple independent clones, the partner clones were detected among very large numbers of the "primary" CLL clone, indicating that their frequencies exceed that of any normal B-cell population. Some partner clones exceeded 5x10^9 cells/L and were persistent over time and with treatment. Thus, in addition to potential intra-clonal diversity, molecular analysis of clonal evolution and apparent subclones in CLL may also reflect inter-clonal diversity.
Biology Tumor Archive. Three age-matched healthy donors (HD) were anonymous. Three multiple myeloma (MM) [19] and two Waldenstrom's macroglobulinemia (WM) [20] samples were from the Cross Cancer Institute. The study was approved by the Health Research Ethics Board of Alberta and the University of Manitoba Research Ethics Boards, after written informed consent in accordance with the Declaration of Helsinki. Clinical characteristics of the 198 randomly selected CLL patients are summarized in Table 1. The cutoff for designating U-CLL or M-CLL was a 2% mutation frequency in IGHV genes.
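The 2% cutoff can be illustrated with a minimal sketch. The function name and the handling of a value of exactly 2% are assumptions made for illustration; the text does not state how the boundary case was assigned.

```python
# Cutoff taken from the text: 2% mutation frequency in the IGHV gene.
CUTOFF_PERCENT = 2.0


def classify_ighv(mutation_percent: float) -> str:
    """Return 'U-CLL' or 'M-CLL' for an IGHV mutation frequency given in percent.

    Assumption: a frequency of exactly 2% is treated as unmutated (U-CLL);
    the source does not specify the boundary case.
    """
    if not 0.0 <= mutation_percent <= 100.0:
        raise ValueError("mutation_percent must be between 0 and 100")
    return "M-CLL" if mutation_percent > CUTOFF_PERCENT else "U-CLL"
```

For example, a hypothetical clone with 0.3% mutation would be labeled U-CLL and one with 6.5% would be labeled M-CLL.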

Samples
Peripheral blood CLL lymphocytes were stored as a frozen cell pellet and aliquots were cryopreserved. Samples with a high lymphocyte count (>40x10^9 cells/L) were not fractionated. Those with low counts (10-40x10^9 cells/L) were B-cell enriched by negative selection using the RosetteSep Human B-Cell Enrichment Cocktail (STEMCELL Technologies, Vancouver, BC, Canada). Those with lymphocyte counts <10x10^9 cells/L had positive CD19 selection.

Complementarity determining region 3 (CDR3) analysis
CDR3 analysis, primer sequences and calculation of CDR3 length followed Kriangkum et al. [20]. CDR3 regions were amplified from gDNA using a fluorescence-labeled FR3/JHc primer set. DNA fragment analysis was run on an ABI Prism 3130xl Genetic Analyzer (Applied Biosystems, Burlington, ON, Canada) and data were analyzed with GeneMapper software v4.0. PCR products were also cloned using the TOPO TA cloning kit (Invitrogen) and sequence analysis was performed using BigDye Terminator v3.1 reagent (ABI) following the manufacturer's instructions.

Identification of clonotypic IGH sequences
This procedure followed Taylor et al. [21]. All clonotypic IGH sequences were amplified from gDNA using primer sets that bind to leader sequences of IGHV gene families and the IGHJ region. The most prominent rearranged IGH products were sequenced and confirmed by matching with the CDR3 analysis. Mutational and junctional analysis was performed using the IMGT/V-QUEST program version 3.2.30 [22]. The primers for each rearranged IGH were designed based on unique CDR1 (sense) or CDR2 (sense) and CDR3 (antisense) sequences. Primers were tested for specificity against the clone of interest and for lack of cross-reactivity with various other B-clones of the same IGHV gene family. The selected primer set was used for clonal identification in SCA.

Cell sorting and SCA
Cryopreserved samples were thawed and maintained overnight in culture medium at 37°C, 5% CO2. CD19+ cells were sorted into PCR tubes at a frequency of 1, 10 or 100 cells/tube using an Influx cell sorter (BD Biosciences, Mississauga, ON, Canada). Sorted single cells were analyzed by nested PCR [19]. Analysis was performed on 16-24 individual cells in samples that were monoclonal biallelic. For those samples with multiple clones, SCA was carried out on 96-110 individual cells. Clonal frequency was calculated as the percentage of cells positive in the test reaction over the total number of cells positive for β2 microglobulin (β2m). Analysis of 10- and 100-cell aliquots was performed to validate those with low clonal frequencies. The frequency was interpreted as the presence of at least one clonal cell in the aggregate pool of cells analyzed (e.g., if 10 tubes with 100 cells each, a total of 1000 cells, are analyzed and only one tube shows positivity, the frequency is estimated to be at least 1/1000).
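The two frequency calculations described above can be written out as a short sketch. The function names are hypothetical, and the pooled estimate assumes, as stated in the text, that each positive tube contributes at least one clonal cell.

```python
def clonal_frequency(clone_positive: int, b2m_positive: int) -> float:
    """Percentage of single cells positive for the clonotypic PCR among
    cells positive for the beta-2 microglobulin control reaction."""
    if b2m_positive <= 0:
        raise ValueError("need at least one beta-2 microglobulin positive cell")
    if not 0 <= clone_positive <= b2m_positive:
        raise ValueError("clone_positive must lie between 0 and b2m_positive")
    return 100.0 * clone_positive / b2m_positive


def pooled_lower_bound(positive_tubes: int, n_tubes: int, cells_per_tube: int) -> float:
    """Minimum clonal frequency (as a fraction) implied by pooled aliquots,
    counting one clonal cell per positive tube."""
    if n_tubes <= 0 or cells_per_tube <= 0:
        raise ValueError("n_tubes and cells_per_tube must be positive")
    if not 0 <= positive_tubes <= n_tubes:
        raise ValueError("positive_tubes must lie between 0 and n_tubes")
    return positive_tubes / (n_tubes * cells_per_tube)


# Example from the text: 10 tubes of 100 cells, one positive tube.
lower_bound = pooled_lower_bound(positive_tubes=1, n_tubes=10, cells_per_tube=100)
```

Because a positive tube may contain more than one clonal cell, the pooled value is only a lower bound, which is why it is reported as "at least" 1/1000.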

Repertoire analysis by NGS
IGH CDR3 regions were amplified and sequenced by Adaptive Biotechnologies Corp (Seattle, WA, USA) using ImmunoSEQ, a multiplex PCR system used to amplify CDR3 sequences from gDNA samples [23][24][25]. Amplicons were sequenced using the Illumina HiSeq platform. Raw sequence data were filtered based on the IGHV, IGHD and IGHJ gene definitions provided by the IMGT database (www.imgt.org) and binned using a modified nearest-neighbor algorithm to merge closely related sequences and remove both PCR and sequencing errors. Data were analyzed using the ImmunoSEQ analyzer toolset.
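The idea of merging closely related sequences can be illustrated with a simple greedy scheme. This is not the ImmunoSEQ algorithm, whose details are not given here; it is a generic sketch that assumes equal-length sequences, a Hamming-distance threshold of one mismatch, and hypothetical read counts.

```python
def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def merge_near_neighbors(read_counts: dict[str, int], max_mismatches: int = 1) -> dict[str, int]:
    """Greedily fold low-count sequences into a more abundant sequence of the
    same length that differs by at most max_mismatches positions."""
    merged: dict[str, int] = {}
    # Visit sequences from most to least abundant so abundant ones become centers.
    for seq, count in sorted(read_counts.items(), key=lambda kv: (-kv[1], kv[0])):
        for center in merged:
            if len(center) == len(seq) and hamming(center, seq) <= max_mismatches:
                merged[center] += count
                break
        else:
            merged[seq] = count
    return merged


# Hypothetical read counts: the second sequence differs from the first at one base.
reads = {"TGTGCGAGA": 90, "TGTGCGAGT": 3, "TGTACCACA": 7}
binned = merge_near_neighbors(reads)
```

In this toy input the single-mismatch variant is absorbed into its abundant neighbor, while the sequence differing at three positions remains a separate bin.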

Statistical analysis
Data management and analysis were performed using Microsoft Excel (Microsoft Corp., Redmond, WA, USA) and SAS 9.2 (SAS Institute Inc., Cary, NC, USA).

Results
A substantial subset of CLL patients harbor more than one CDR3 peak
Typically, CLL is characterized by a monoclonal B-cell expansion yielding a single CDR3 peak profile in DNA fragment analysis, unlike the polyclonal profile usually seen in healthy donors. In a cohort of 198 CLL patients, CDR3 profiling identified 26 patients who exhibited 2-5 dominant CDR3 peaks, suggesting the presence of more than one B-cell clone and/or clones with biallelic IGH rearrangements. Table 2 shows the patients grouped by the number of IGH rearrangements identified. These 26 CLL were subjected to further molecular characterization as outlined in

IGH biallelic rearrangements and/or multiple clones in a subset of CLL patients
Of the 26 patients exhibiting more than one dominant CDR3 peak, 20 were in the U-CLL subgroup and 6 in the M-CLL subgroup; clinical features are shown in S1  Among U-CLL cases having two IGH rearrangements, each IGH rearrangement utilized a different IGHV gene (Table 3). In 18/20 patients, the two rearrangements comprised one P and one NP (i.e., out-of-frame or containing premature termination codons). SCA showed that these P/NP rearrangements resided in the same cell (Fig 2A, patient CLL-2), which confirmed them as biallelic rearrangements in 7/7 patients analyzed (Table 3). Biclonality was confirmed in two other patients (CLL-67 and CLL-100, Fig 2B). For CLL-67, this was also supported by immunophenotypic analysis (data not shown).
For the 6 M-CLL patients with more than one CDR3 peak (Table 4), one patient (CLL-40) had one biallelic P/NP clone. The other five patients had two or more partner B-cell clones (Fig 2B). The most abundant clone was designated as the primary B-clone. CLL-43 and CLL-129 had a biallelic P/NP rearrangement in the primary clone and a second clone with a monoallelic P rearrangement. Patient CLL-105 had three distinct monoallelic clones and one clone with a biallelic P/NP rearrangement. In this case, initial flow analysis had indicated three different CD5+ subsets: CD5+/kappa, CD5+/lambda and CD5+/polyclonal. Two clonal sequences, 105P_24 and 105NP_82, utilized the same IGHV gene segment but their different mutational profiles and different IGHD-IGHJ gene usage argue against a shared origin (S2 Fig). A similar observation was also made for 105P_42 and 105P_45 (S2
Altogether, both biallelic rearrangements and/or multiclonality characterize a subset of patients with B-CLL. Each of the 37 B-cell clones analyzed from 26 patients had only one P IGH rearrangement per B-cell, meeting the restrictions of allelic exclusion. The incidence of biallelic rearrangements in the primary CLL clones was calculated to be 23% (18/79) in U-CLL, 2.5% (3/119) in M-CLL, or 10.6% (21/198) for the entire cohort. Biclonality occurred in 2.5% (2/79) of U-CLL while multiclonality (2-4 clones) was found in 4.2% (5/119) of M-CLL. Thus, biallelic rearrangements were more frequent in U-CLL than M-CLL (p<0.0001; Fisher's exact test) but the incidence of bi/multiclonality was comparable (p = 0.7047; Fisher's exact test). There was no statistical difference in mortality between patients with multiclonal vs monoclonal disease (S1 Text).
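The two Fisher's exact comparisons above can be checked from the reported counts. The following is a minimal pure-Python sketch of a two-sided test (summing all tables no more probable than the observed one); the study itself used Excel and SAS, so the function name and implementation are assumptions for illustration only.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(k: int) -> float:
        # Hypergeometric probability of k events in the first row.
        return comb(row1, k) * comb(row2, col1 - k) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum tables as or less probable than the observed one (small tolerance for ties).
    return sum(p for p in (prob(k) for k in range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))


# Biallelic primary clones: 18 of 79 U-CLL vs 3 of 119 M-CLL (reported p<0.0001).
p_biallelic = fisher_exact_two_sided(18, 79 - 18, 3, 119 - 3)

# Bi/multiclonality: 2 of 79 U-CLL vs 5 of 119 M-CLL (reported p = 0.7047).
p_multiclonal = fisher_exact_two_sided(2, 79 - 2, 5, 119 - 5)
```

The percentages quoted in the text follow from the same counts: 18/79 is about 23%, 3/119 about 2.5%, and 21/198 about 10.6%.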

Partner clones persist over time
Longitudinal studies were carried out in the seven CLL patients with more than one clone. SCA indicated that for 7/7 patients, multiple B-cell clones persisted over time (Tables 5 and 6). Partner clones were detected against a "background" of an abundant primary CLL clone with a B-cell count from 5x10^9 cells/L to 312x10^9 cells/L. For the CLL patients evaluated here, 9/11 (82%) partner clones had at one or more points in the disease a B-cell count >1x10^9 cells/L (Tables 5 and 6). Five patients (CLL-43, CLL-67, CLL-100, CLL-105, CLL-112) harbored partner clones that at some point in time had a B-cell count >5x10^9 cells/L (range = 5.8-28.7x10^9 cells/L), fitting the working definition for a second CLL clone. For the two biclonal U-CLL patients (CLL-67 and CLL-100), partner clones were abundant (Table 5). Prior to treatment, CLL-67 had equivalent biclonal frequencies; treatment reduced the total lymphocyte count (TLC) from 34x10^9 to 0.5x10^9 cells/L, preferentially reducing the partner B-cell clone. In CLL-100, treatment reduced TLC from 312x10^9 to 13.5x10^9 cells/L, with both clones proportionately reduced.
None of the five patients in the M-CLL subgroup received treatment during the period of study (Table 6). For CLL-43, both major and minor clones persisted over a period of seven years, with samples taken between years 15 and 22 of the disease course. For CLL-105, CLL-129 and CLL-200, the ratios between the primary CLL clone and the partner clone(s) were consistent over time. In both CLL-105 and CLL-129, the absolute numbers for each clone continued to rise, with a steady increase in TLC. For CLL-112, the disease progressed during the third year, but the TLC remained relatively constant. By year 5, clonal dynamics in this patient led to preferential expansion of the primary clone.
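As a rough illustration of how a partner clone's absolute count relates to the 5x10^9 cells/L working definition, the sketch below multiplies a B-cell count by a clonal fraction. This derivation of absolute counts is an assumption, and the input values are hypothetical rather than taken from Tables 5 and 6.

```python
CLL_THRESHOLD = 5e9  # cells/L, working definition of a CLL clone used in the text


def partner_clone_count(b_cell_count_per_l: float, clone_fraction: float) -> float:
    """Absolute count (cells/L) of a clone making up clone_fraction of B-cells."""
    if not 0.0 <= clone_fraction <= 1.0:
        raise ValueError("clone_fraction must be between 0 and 1")
    return b_cell_count_per_l * clone_fraction


def exceeds_cll_threshold(count_per_l: float) -> bool:
    """True when the clone exceeds the 5x10^9 cells/L working definition."""
    return count_per_l > CLL_THRESHOLD


# Hypothetical patient: 100x10^9 B-cells/L, partner clone at 12% of B-cells.
count = partner_clone_count(100e9, 0.12)
is_second_cll = exceeds_cll_threshold(count)
```

The same arithmetic shows why even a low-frequency partner clone can be numerically large when the primary clone is very abundant.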

Confirmation of IGH multiclonality in CLL by NGS
To validate our conventional clonal analyses, and to screen for any additional multiclonality, ImmunoSEQ analysis was performed on 13 CLL patients, including seven biclonal or multiclonal CLL and six "typical" CLL having a single dominant clone, three MM and two WM previously reported as biclonal [19,20], and three HD. ImmunoSEQ generated 2x10^5 to 3x10^6 reads/sample, giving 4x-70x coverage. The dataset includes the sequence of CDR3 regions and part of IGHV sufficient for identifying the gene family. Clonal frequencies were calculated based on sequences of P rearrangements and are shown in Fig 3 and Table 7.
Overall, NGS found all B-cell clones identified by DNA fragment analysis and SCA, including the NP sequences. ImmunoSEQ also identified a small number of additional clones (open circles in Fig 3). For CLL-43, two more clones were identified. For CLL-112 and CLL-200, each patient had one additional clone. Clonal frequencies determined by NGS were generally consistent with those by SCA of the same sample (Table 7).
Although numbers are small, it is provocative that only M-CLL was accompanied by more than one partner clone, in 4/5 patients analyzed (Fig 3). In contrast, even though both MM and WM also have mutated IGHV, each of the 5 patients analyzed had only one partner clone [19,20]. The two biclonal U-CLL had only one partner clone, and control CLL classified as having only the primary CLL clone by conventional analysis also had only one clone by ImmunoSEQ analysis.

Discussion
Here we show that the presence of more than one rearranged IGH allele in CLL may be related to a P/NP rearrangement in the same cell, and/or to the presence of unrelated "partner clones" that coexist with the primary CLL clone. Although others have reported the presence of multiple clones in CLL, for the most part analysis was by immunophenotype or light chain restriction. This is the first report to demonstrate expansion of multiple B-cell clones in a subset of CLL patients, using single-cell analysis and next-generation IGH sequencing. While biallelic P/NP rearrangements were more frequent in U-CLL, the presence of more than one clone occurred with equal frequency in U-CLL and M-CLL. Interestingly, those cases with >2 clones were more frequent in M-CLL. In general, the secondary clones may represent coexisting MBL, although in some cases they were of sufficient size to constitute a second CLL clone.
Initial screening of 198 patients by CDR3 analysis identified 172 patients having a single rearranged IGH allele (monoallelic), and 26 patients who exhibited more than one rearranged IGH allele and/or more than one clone (biallelic, biclonal or multiclonal). Altogether, we analyzed 37 B-cell clones from 26 CLL patients: each B-cell clone carried only one productive IGH allele. In contrast to a previous study [3], we found no failures of allelic exclusion in this cohort of patients. We also identified a frequent subset of U-CLL patients whose clonal B-cells harbored a failed IGH rearrangement in addition to their productive IGH rearrangement. The proportion of U-CLL with biallelic rearrangements (23%) was comparable to the range in normal B-cell populations [26][27][28]. The biallelic rearrangement pattern was in contrast to that seen in M-CLL, MM or WM, most of which harbored one productive rearrangement and a presumptive germline allele. The frequency of P/NP rearrangements in patients with M-CLL was 10-fold lower (2.5%) than in U-CLL. Since memory B-cells are known to include those with biallelic P/NP [26], the parent B-cells that give rise to M-CLL appear to be negatively selected for biallelic IGH. This may reflect fundamental differences in the parent B-cells that give rise to M-CLL, MM and WM as compared to those giving rise to U-CLL or healthy memory B-cells.
Most previous reports of more than one rearrangement in CLL were not molecularly confirmed at the single-cell level but relied on phenotypic characterization, a less definitive clonal identifier. Here multiple clones were identified by two different methods and the findings were confirmed by evaluation of individual CLL cells. SCA indicated that all seven CLL having a productive plus a non-productive rearrangement were biallelic. For those CLL harboring more than one productive rearrangement, SCA confirmed that they represented two or more distinct B-cell clones. The incidence of molecularly defined multiclonality in typical CLL shown here (7/198, 3.5%) was compatible with that reported by others for typical CLL [9]. However, this value is likely to be an underestimate because initial screening by CDR3 analysis in our study did not identify clones with equivalent CDR3 length. For multiclonal CLL reported here, the most abundant clone was designated as the primary CLL clone. Partner clones were consistently detectable for many years, at relatively constant ratios with the primary CLL clone. High-count MBL may be little different from CLL. With this in mind, for several patients the absolute number of cells with the partner clonal rearrangement reached ≥5x10^9 cells/L, the working definition for a second CLL clone in affected patients. The relatively frequent presence of partner clones suggests that evaluation of clonal heterogeneity and clonal evolution in CLL would benefit from inclusion of molecular analysis for IGHV-IGHD-IGHJ signatures to distinguish between intra-clonal and inter-clonal diversity. This would provide a means to identify minor clones with mutations, as detected by genome-wide analysis [17].
NGS analyzes the repertoire of B-cells in a large dataset that quantifies each clonal frequency. For comparison and as controls, we analyzed B-cells from CLL, MM, WM and HD. Monoallelic CLL identified by conventional means also scored as monoallelic by ImmunoSEQ NGS, confirming its ability to discriminate biallelic and multiclonality in CLL. To distinguish the small increase in monoclonal B-cells found in HD from the considerably more abundant clonal B-cell expansions identified in CLL patients, an arbitrary cutoff was made above the highest frequency found in HD (Fig 3). Overall, both ImmunoSEQ and SCA yielded compatible frequencies (Table 7), except for CLL-67 in which SCA showed a lower frequency. This was likely due to clonotypic IGHV sequence heterogeneity found in one of the two clones (data not shown). However, ImmunoSEQ NGS could not replace SCA for verifying clonal identity.
Fig 3 legend (partial): ... [19,20]. For sample N3, top frequencies were 0.035%, thus were placed outside of the y-axis for reference only, not to scale. An arbitrary cutoff line was drawn at the highest frequency found in HD. Dominant clones in CLL are defined as those with frequencies above the cutoff line. The number of dominant clones for each sample is shown on the right. Closed circle, clone identified by both ImmunoSEQ and SCA; open circle, clone identified only by ImmunoSEQ.
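The cutoff rule for calling dominant clones can be sketched as follows. The function name, clone labels and frequencies are hypothetical; the only rule taken from the text is that a clone is dominant when its frequency lies above the highest frequency observed in healthy donors.

```python
def dominant_clones(sample_freqs: dict[str, float], hd_freqs: list[float]) -> list[str]:
    """Return clone labels whose frequency (%) exceeds the highest healthy-donor frequency."""
    if not hd_freqs:
        raise ValueError("at least one healthy-donor frequency is required")
    cutoff = max(hd_freqs)  # cutoff set at the highest frequency seen in HD
    return [clone for clone, freq in sample_freqs.items() if freq > cutoff]


# Hypothetical frequencies (% of productive sequences); not data from Table 7.
healthy_donor_top = [0.9, 1.4, 2.1]
patient = {"clone_A": 88.0, "clone_B": 6.5, "clone_C": 0.4}
calls = dominant_clones(patient, healthy_donor_top)
```

Because the cutoff is arbitrary, the number of dominant clones called for a borderline sample depends on which healthy-donor samples are used as the reference.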
In all CLL cases, clonal cells are present prior to the diagnosis of CLL, with multiclonality in low-count MBL [29][30][31]. In the majority of cases, only a single transformed clone reaches the threshold for massive clonal expansion. However, here we show that in five CLL patients who have at least one partner clone, the partner clone was sufficiently frequent at some points in time for designation as a second CLL. Thus, it may be clinically important to determine which clone harbors non-IGHV driver mutations identified by NGS. There is as yet no way to determine the clinical significance of partner clones. Nevertheless, their presence in a subset of CLL patients means that genome-wide sequencing analysis should address the contributions of inter-clonal diversity to genomic patterns.