Complex Mutations & Subpopulations of Deletions at Exon 19 of EGFR in NSCLC Revealed by Next Generation Sequencing: Potential Clinical Implications

Microdeletions at exon 19 are the most frequent genetic alterations affecting the Epidermal Growth Factor Receptor (EGFR) gene in non-small cell lung cancer (NSCLC) and they are strongly associated with response to treatment with tyrosine kinase inhibitors. A series of 116 NSCLC DNA samples investigated by Sanger Sequencing (SS), including 106 samples carrying exon 19 EGFR deletions and 10 without deletions (control samples), were subjected to deep next generation sequencing (NGS). All samples with deletions at SS showed deletions with NGS. No deletions were seen in control cases. In 93 (88%) cases, deletions detected by NGS were exactly corresponding to those identified by SS. In 13 cases (12%) NGS resolved deletions not accurately characterized by SS. In 21 (20%) cases the NGS showed presence of complex (double/multiple) frameshift deletions producing a net in-frame change. In 5 of these cases the SS could not define the exact sequence of mutant alleles, in the other 16 cases the results obtained by SS were conventionally considered as deletions plus insertions. Different interpretative hypotheses for complex mutations are discussed. In 46 (43%) tumors deep NGS showed, for the first time to our knowledge, subpopulations of DNA molecules carrying EGFR deletions different from the main one. Each of these subpopulations accounted for 0.1% to 17% of the genomic DNA in the different tumors investigated. Our findings suggest that a region in exon 19 is highly unstable in a large proportion of patients carrying EGFR deletions. As a corollary to this study, NGS data were compared with those obtained by immunohistochemistry using the 6B6 anti-mutant EGFR antibody. The immunoreaction was E746-A750del specific. In conclusion, NGS analysis of EGFR exon 19 in NSCLCs allowed us to formulate a new interpretative hypothesis for complex mutations and revealed the presence of subpopulations of deletions with potential pathogenetic and clinical impact.


Introduction
Lung cancer is the leading cause of cancer-related deaths in western countries and standard therapeutic strategies including surgery, chemotherapy, and radiotherapy have almost reached a plateau [1]. In recent years, the pharmacological treatment of non-small cell lung cancer (NSCLC) has undergone a major contribution by the introduction of new molecular targeted drugs whose effectiveness is closely dependent on the presence of specific genetic mutations in the tumor context [2][3][4][5][6]. Somatic mutations in the tyrosine kinase domain of the Epidermal Growth Factor Receptor (EGFR) gene emerged as one of the most relevant targets for lung cancer treatment [7][8][9][10][11][12]. Most of EGFR mutations are inframe microdeletions at exon 19 affecting the conserved amino acids ELREA. These mutations represent 44% to 80% of EGFR mutations in different studies [13] and they are strongly associated with sensitivity to tyrosine kinase inhibitors [14][15][16]. Exon 19 deletions usually affect one allele, with the other one being wild type. The technique most widely used to detect and characterize EGFR deletions is Sanger sequencing (SS) of an exon 19 PCR product [17,18]. The presence of wild type DNA amplified from the normal exon 19 allele may hamper an accurate detection of the microdeletion in the mutant allele even if the best sequence alignment algorithms are used. DNA-cloning in plasmids followed by sequencing of multiple clones can allow a more accurate analysis of deletions, especially in case of complex mutations. Since DNA-cloning and sequencing is time consuming this approach has been rarely used [19,20].
Massive parallel sequencing, also known as next generation sequencing (NGS), could be particularly suited for the detection of microdeletions. This new technology, based on PCR from single molecules before sequencing, realizes a sort of chemical cloning. Therefore, wild type and mutant alleles are analyzed separately, resorting in an accurate characterization of mutations. The high accuracy of NGS technologies is also achieved by multiple read coverage of a variant base in an individual sample [21][22][23][24].
These particular features make the NGS one of the most sensitive technology currently available for mutation scanning, allowing to detect somatic mutations in subpopulations of DNA molecules, as shown in dilution experiments [25,26].
We decide to investigate a large number of somatic microdeletions of the EGFR gene by deep sequencing. Results were compared with those obtained by SS and potential biological and clinical implications are highlighted.

Results
A series of 116 NSCLC DNA samples investigated by SS, including 106 samples carrying exon 19 EGFR deletions and 10 samples without deletions (control samples), were subjected to deep NGS. About 440.000 sequences, with a mean of 3497+/ 2158 sequences per samples, for a total of about 72.000.000 bp, were obtained.
All the samples with deletions at SS were found to be positive for exon 19 deletions with NGS. No deletions were observed in control cases. Deletions detected by SS were exactly confirmed in 93 (88%) cases and the frequency of concordant data was not statistically different in the three centers. In 13 cases (12%) the deletion was not correspondent to that observed by SS where, in 6 cases the starting-ending point was not correctly detected and the deletion was misinterpreted (i.e. c2235-2249del instead of c2236-2250del and viceversa or 2237_2251del instead of 2236_2250del, as reported in Figure 1), in 7 cases the starting-ending point of the deletions were clear, but the exact sequence of the deleted bases could not be accurately defined (Table 1).
In 21 cases (20%) the AVA software of the 454 Junior instrument showed the presence of double or multiple frameshift deletions that produced a net in-frame change. Most of these complex mutations (16 cases) were the sum of a long and short non-in frame deletion (17 bp_del+1 bp_del). A double deletion of 18 bp (10 bp_del+8 bp_del), two double deletions of 15 bp (12 bp_del+3 bp_del and 8 bp_del+7 bp_del), and a triple deletion of 12 bp (2 bp_del+8 bp_del+2b p_del) were also observed ( Figure 2). In addition to these multiple deletions, a case showing a novel, particularly complex mutation, composed of a 24 bp_del followed by a duplication-insertion of 12 bp was also seen. In 5 (24%) of these cases carrying complex mutations the SS did not allow to define the exact sequence of the mutant alleles (Table 1), in the other 16 (14%) cases the alterations obtained by SS were interpreted by the operators as complex deletions (deletions+insertions), according to previously reported data [19,27,28]. A comparison between SS and NGS data is reported in Table 2. Five of the complex mutations observed in the present study were considered novel mutations, as they were never reported before (cases indicated by an asterisk* in Table 1 and 2). In 3 (60%) of these cases the SS was unable to accurately detect the mutations.
In 70 (66%) tumors the NGS analysis revealed, in addition to the main deletion, the presence of subpopulations of DNA molecules carrying different deletions which, in most cases, were structurally related to the main one. Each of these deletions accounted for 0.03% to 17% of the genomic DNA in the different tumors investigated. Most (69%) of the deletions observed in subpopulations of DNA molecules were in frame, in 31% of cases minimally expanded non-in frame deletions were seen. In the cases with subpopulations of deletions, PCR amplification and NGS analysis were repeated using the same experimental conditions. We confirmed all the subpopulations present in at least the 0.1% of the DNA molecules. Instead, we were able to confirm the data in only a portion (about 40%) of cases carrying subpopulations in less than 0.1% of the DNA molecules (data not shown). Aware of the fact that it is theoretically very difficult to confirm, in independent PCRs, the presence of subpopulations of deletions if they are present in few readouts, we decided to consider these data as low confidence events. Therefore, we divided the subpopulations in high confidence events, when they were present in at least the 0.1% and low confidence events when they occurred in less than 0.1% of the DNA. Forthy-six tumors (43%) showed high confidence subpopulations of deletions. Four tumors (4%) showed substantial subpopulations of deletions with at least one of them representing more than 2% of genomic DNA. These substantial subpopulations were present in 4 (19%) of 21 cases in which the main mutation was a double/multiple deletion and in none of the 85 cases carrying simple deletions as main mutations (P,0.001). A selection of cases carrying subpopulations of deletions is shown in Figure 4. Table S2 reports all the EGFR mutations observed by NGS in the whole series of tumors investigated.

Discussion
The present study was devised to evaluate a large series of microdeletions at exon 19 of the EGFR gene by the 454 GS Junior system. This new technical approach is based on the amplification of single DNA molecules by emulsion PCR giving rise to a sort of chemical cloning before pyrosequence analysis. Since expanded wild type and mutant alleles are examined separately, the method can allow an accurate evaluation of mutations without the interference of wild type alleles as it occurs when conventional sequencing methods are used. The results of NGS were compared with those obtained by conventional SS. In about 12% of the samples analyzed, SS failed in finding the accurate sequence of deleted bases, either using a dedicated software or manually. In particular, in 6 cases, the starting-ending point of the deletion point was not correctly detected. This can be ascribed to the fact that, by using Sanger sequencing, the wild type and mutated alleles are superimposed on the same electropherogram. In some samples, it can happen that the alignment of the two alleles is not perfect, as shown in Figure 1, so that the operator can misinterpret the starting-ending point of the deletion. In other 7 cases, the starting-ending point of the deletions were clear but the exact sequence of the mutant alleles was partially obscured (not assessable).
An accurate detection of EGFR microdeletions is highly recommendable, in that different deletions could have different effect on tumor development and progression or patient outcome after treatment with EGFR TKIs. Moreover, different deletion may have specific effects on the antigenicity of proteins carrying deletion. In our study, we have tested this hypothesis by evaluating the possibility to detect EGFR deletions with specific monoclonal antibodies in tumors with different deletions accurately characterized by NGS. We have shown that the 6B6 anti-mutant EGFR antibody strongly reacts to EGFR proteins carrying the E746_A750del, whereas the immunoreaction was absent in tumors affected by other types of deletions. Our data are in agreement with previously published data obtained with this antibody in larger series of NSCLCs [29,30]. However, due to the limited number of cases examined in these reports, additional studies will be required to definitely clarify this point. Our results suggest that the development of new monoclonal antibodies or cocktails of antibodies would require the exact knowledge of the deletions and that NGS could be a very accurate and reliable technique to address this point.
The alignment of the numerous sequences obtained by NGS by dedicated softwares, allowed to formulate a new interpretative hypothesis on the nature of particular EGFR deletions. A series of frequent (20% of cases) complex deletions, most of which reported as deletions associated to insertions in previous reports [19,27,28], may also be ascribed to the presence of non-in frame double or multiple deletions producing a net in-frame loss of genetic material. This interpretation is in keeping with the hypothesis that a short region within exon 19 is particularly fragile and preferentially subjected to microdeletions. The frequency of these complex mutations was statistically higher in cases with longer (18 bp) losses: about 80% of double/multiple deletions resulted in a loss of 18 bps. Five novel mutations were observed by NGS in this study and all of them were complex, double/multiple mutations, that in 3 (60%) cases were not resolved by Sanger Sequencing. This clearly confirms the superiority of NGS in the characterization of EGFR microdeletions at exon 19. Rare cases of double/multiple deletions have been reported in previous studies, especially when conventional sequencing was conducted on multiple samples after cloning of genomic DNA into plasmids [19,20,31]. Cloning was essential to better characterize multiple deletions and rule out the possibility that they were present on different alleles. However, biological cloning and sequencing of multiple samples is time consuming and not suitable for routine clinical diagnostic purposes. The GS Junior Technology, based on PCR cloning before pyrosequencing of multiple samples, is in our opinion an ideal approach for a fine characterization of complex deletions in exon 19 of the EGFR gene.
The NGS assay is one of the most sensitive methods available for the detection of somatic mutations when used in deep sequencing, and the sensitivity of NGS is dependent on the number of sequences obtained per sample [25,26]. In this study we decided to perform a deep NGS analysis taking a median of more than 3.000 sequences per sample. The high sensitivity of this NGS assay allowed us to detect in about 70% of cases, subpopulations of DNA molecules carrying exon 19 deletions different from the main mutation, but in most cases structurally related to it (Figure 4). In the majority of cases these subpopulations carried in frame simple or double deletions. In about one third of cases, less expanded non-in frame deletions were observed. These subpopulations were confirmed in independent PCR-NGS assays in 43% of the tumors investigated. However, only 4 tumors (4%) showed substantial subpopulations of deletions with at least one of them representing more than 2% of genomic DNA.  The presence of these subpopulations in a large proportion of cases examined is at moment unclear. It is extremely unlikely that they are due to cross-contaminations, since they were highly heterogeneous, usually related to the main deletion, and they were not seen in control samples. In addition, particular strategies were adopted to minimize cross-contaminations in our study (see Materials and Methods). These subpopulations could represent modifications of the main deletion or ex novo deletions acquired in cell clones during tumor progression. In both cases, this finding would support the hypothesis that this region within exon 19 of EGFR is highly instable in most patients affected by NSCLC carrying EGFR deletions. This genetic fragility may enable the development of both complex deletions and multiple subpopulations. To the best of our knowledge, this is the first demonstration of subpopulations of EGFR deletions in NSCLC. Our findings could have important clinical implications. Recent evidence indicate that tumor genotype may evolve dynamically under the selective pressure of targeted therapies [32]. We are tempted to hypothesize that the genetic fragility of exon 19 in particular patients could take a role in the dynamic evolution of lung tumors subjected to different therapeutic strategies. It would be interesting to monitor by NGS on repeated biopsies the main EGFR deletion as well as the deletions in minor clones during treatment.
In conclusion, our results indicate that NGS is particularly suitable for the study of EGFR deletions. This technique can accurately characterize EGFR deletions, even in cases in which conventional methods fail. Data obtained by NGS analysis allowed us to formulate a new interpretative hypothesis for complex deletions which represent about 20% of EGFR mutations in exon 19 as well as to identify 5 novel deletions. In addition, we report, for the first time to our knowledge, the presence of subpopulations of different deletions in most of the tumors investigated with pothential pathogenetic and clinical impact.

PCR amplification
DNA fusions primers containing genome-specific sequences, along with one of 7 distinct 10-bp MIDs (multiplex identifier sequences used to differentiate samples being run together on the same plate) and sequencing adapters were used to amplify a 108 bp region in EGFR (NM_005228.3) exon 19 ( Figure S1 and Table S1). PCR primers, were designed using the OligoAnalyzer 3.1 software (http://eu.idtdna.com/analyzer/Applications/ OligoAnalyzer/) and synthesized at MWG-Biotech AG.
A touch-down PCR cycling program was performed on the Gene Amp PCR system 9700 thermocycler (Applied Biosystems) with an initial step at 94uC for 2 min followed by 43 cycles at 94uC for 30 sec, 64uC (decreasing the temperature by 1uC each cycle for six cycles) for 30 sec, and 70uC for 30 sec, and a final step at 70uC for 5 min. Different strategies were adopted to avoid crosscontaminations: a) reactions were set up in positive-pressure hoods with UV sterilization systems to decontaminate reagents and equipment prior to carrying out PCRs; b) different hoods were used for PCR amplification of samples subjected to different runs; c) PCR reactions were conducted on 96-well plates, with a maximum of 4 samples loaded per plate.

Next Generation Sequencing
PCR products were visualized on agarose gel, purified using size-exclusion SPRI Ampure-XP DNA-binding paramagnetic beads (Agencourt Bioscience Corp., Beckman Coulter S.p.A, Milan, Italy), and quantified in 96-well format with the QuantiFluor TM -ST Fluorometer (Promega, Madison, Wisconsin, USA) using a PicoGreenH assay (Invitrogen, Carlsbad, CA). Samples were then diluted to an approximate concentration of 1610 9 molecules/mL and pooled at equimolar concentrations to create a highly multiplexed amplicon library. After pooling, the library was further diluted to 10 6 molecules/ml and subjected to emulsion PCR (emPCR) using the 454 GS Junior Titanium Series Lib-A emPCR Kit (Roche Diagnostics), according to the manufacturer's protocols. Following emPCR, the captured beads with bound DNA were enriched with a second DNA capture mechanism to separate out beads with and without bound emPCR products. By using a bead counter, the number of enriched beads was estimated to be between 300,000 and 1 million. The enriched pool of beads was then used for massively parallel pyrosequencing in a Titanium PicoTiterPlateH (PTP) with Titanium reagents (Roche Diagnostics), on the GS Junior instrument, according to the 454 GS Junior Titanium Series Amplicon Library Preparation Method Manual (available online: www.454.com).

Analysis of sequence data
Processed and quality-filtered reads were analysed with the GS Amplicon Variant Analyzer (AVA) software version 2.5.3 (454 Life Sciences). EGFR exon 19 reference sequence was extracted from Hg19 Human Genome Version together with both neighbor intronic regions. Such sequence was used as Reference Sequence to align every reads. The final alignments were checked manually and wrong alignments were edited by Jalview Multi-alignment Tool (http://www.jalview.org/). In order to parse and manage all data sequences and results (variations, frequency of mutations, forward and reverse reads check) custom scripts were created using Perl language.

Immunohistochemical analysis
The EGFR E746-A750 deletion specific (6B6) antibody (Cell Signaling Technology, Inc.) was used for immunohistochemical staining. Paraffin embedded sections were dewaxed, hydrated, treated with proteinase K (DAKO, Glostrup Denmark) and immunostained using a labelled polymer detection system (Bond Polymer Define Detection, Vision Biosystem, Mount Waverley, Australia) and automated stainer (BOND-maX, Vision BioSystem). The primary polyclonal antibody was used at a dilution of 1:100. Negative controls were obtained by replacement of primary antiserum with buffer. IHC expression of mAbs against EGFR was evaluated using the following scoring: 0 = negative; 1 = weak staining in .10% of cancer cells; 2 = moderate staining in .10% of cancer cells; 3 = strong staining in .10% of cancer cells. A score of 0 was considered negative, a score of 1 was considered weakly positive, and a score of 2 or 3 was considered highly positive.

Statistical analysis
The variables measured in the study were investigated for association by the Fisher's exact test or x 2 test as appropriate. A P,0.05 was considered as significant. Statistical analyses were performed using SPSS version 15 (SPSS, Chicago, IL). Figure S1 Polymerase chain reaction (PCR) primer design. The squares represent the primer binding sequence, the circles represent the multiplex identifier (MID) sequence, and the thick lines represent the fusion primer sequence for 454 applications. The lenght of each primer is 61 and 57 nucleotides for forward and reverse, respectively. The total amplicon lenght is 178 bp including 108 bp of the EGFR gene. (TIF)

Supporting Information
Table S1 Primer sequence for EGFR ex 19. Starting at the 59-end of each primer, fusion primer sequence for forward and reverse emulsion PCR and pyrosequencing is in standard font, followed by multiplex identifier sequence in italics and primer binding sequence in bold. (DOC)