Next-Generation Sequencing of Lung Cancer EGFR Exons 18-21 Allows Effective Molecular Diagnosis of Small Routine Samples (Cytology and Biopsy)

Selection of lung cancer patients for therapy with tyrosine kinase inhibitors directed at EGFR requires the identification of specific EGFR mutations. In most patients with advanced, inoperable lung carcinoma, limited tumor samples represent the only material available for both histologic typing and molecular analysis. We defined a next generation sequencing protocol targeted to EGFR exons 18-21 suitable for the routine diagnosis of such clinical samples. The protocol was validated in an unselected series of 80 small biopsy (n=14) and cytology (n=66) specimens representative of the material ordinarily submitted for diagnostic evaluation to three referral medical centers in Italy. Specimens were systematically evaluated for tumor cell number and proportion relative to non-neoplastic cells. They were analyzed in batches of 100-150 amplicons per run, reaching an analytical sensitivity of 1% and obtaining an adequate number of reads to cover all exons on all samples analyzed. Next generation sequencing was compared with Sanger sequencing. The latter identified 15 EGFR mutations in 14/80 cases (17.5%) but did not detect mutations when the proportion of neoplastic cells was below 40%. Next generation sequencing identified 31 EGFR mutations in 24/80 cases (30.0%). Mutations were detected with a proportion of neoplastic cells as low as 5%. All mutations identified by the Sanger method were confirmed. In 6 cases next generation sequencing identified exon 19 deletions or the L858R mutation not seen after Sanger sequencing, allowing the patient to be treated with tyrosine kinase inhibitors. In one additional case the R831H mutation associated with treatment resistance was identified in a tumor that was EGFR wild type after Sanger sequencing. Next generation sequencing is robust, cost-effective and greatly improves the detection of EGFR mutations. Its use should be promoted for the clinical diagnosis of mutations in specimens with unfavorable tumor cell content.


Introduction
Lung carcinoma often presents at an advanced stage and is the leading cause of cancer-related death in developed countries (http://seer.cancer.gov/statfacts/html/lungb.html#survival). The discovery in 2004 that activating somatic mutations in the tyrosine kinase domain of the EGFR gene make the tumor sensitive to tyrosine kinase inhibitor (TKI) treatment has represented one of the most significant breakthroughs in the field of molecular oncology in the past decade [1,2]. Randomized clinical trials have shown patient responses to the TKIs Erlotinib (Tarceva, OSI Pharmaceutical) or Gefitinib (Iressa, Astrazeneca) as first-line treatment in approximately two thirds of patients with EGFR mutated tumors, with rates far superior to those obtained with conventional platinum-based chemotherapy [3][4][5][6][7][8][9].
EGFR mutated tumors are typically adenocarcinomas, where mutations can be identified in approximately a quarter of cases, and in a higher proportion of tumors from Asian patients.
Adenocarcinomas are now regarded as the most common lung carcinoma subtype, constituting approximately 40% of all non-small cell lung cancers (NSCLC) [24], and molecular analysis of EGFR exons 18, 19, 20 and 21 is recommended in all adenocarcinomas or lung tumors with an adenocarcinoma component [10]. Thus, the pathologic evaluation of a lung carcinoma now requires both accurate subtyping by histological and immunohistochemical studies and the determination of the EGFR mutational status to select patients for TKI therapy. This in-depth evaluation requires adequate amounts of tumor tissue of good quality, like those obtained from lung resections [25].
Unfortunately, 60% of NSCLC are high-stage, locally advanced and/or inoperable tumors that have already metastasized to distant sites by the time they are detected (http://seer.cancer.gov/statfacts/html/lungb.html#survival). Thus, in patients with such tumors very limited samples (small biopsies or cytology specimens) are usually the only material available for histologic typing and for molecular analysis [26]. In these samples the issue of specimen purity, i.e. the proportion of lesional material relative to the "contaminating" benign or nonlesional cells, is often critical [27,28]. Sanger sequencing, the most widely used method for mutation detection, does not have enough analytical sensitivity to reliably identify mutations in samples with a low proportion of tumor cells. It can give false negative results if the percentage of neoplastic cells is below a general threshold of 50%, which corresponds to 25% mutated alleles, assuming the mutation is heterozygous and that the EGFR chromosomal site 7p12 is disomic [29]. Therefore alternative methods, each with its own advantages and disadvantages, have been proposed to detect EGFR mutations with the goal of achieving higher sensitivity. Many are currently used for molecular analysis of routine samples [30]. These include High Resolution Melting (HRM) [31], Restriction Fragment Length Polymorphism (RFLP) [32], mutant allele-specific PCR [33], Peptide Nucleic Acid Locked PCR Clamping (PNA-PCR) [34,35], pyrosequencing [36], and immunohistochemistry with specific EGFR antibodies that detect the L858R mutation and exon 19 deletion [37,38]. Some mutation-specific methods like the Scorpion ARMS (TheraScreen EGFR29 mutation kits from QIAGEN Manchester [formerly DxS], Manchester, England) [39] reach a very high sensitivity (~1%) but miss mutations they were not designed to detect and require a rather significant amount of DNA, which is not always available in limited samples [40,41].
The development of next generation sequencing (NGS) methods, also known as massively parallel sequencing since they allow the parallel analysis of a very large number of DNA molecules, has represented one of the more significant technical advances in molecular biology [42]. These methods have been available since 2005 and are producing remarkable breakthroughs in oncology, including the definition of the entire DNA sequence of common types of human cancers [43]. This is a multicenter study to evaluate the application of NGS to the molecular diagnosis of EGFR mutations in limited samples of NSCLC, namely small biopsies and cytology specimens. Instead of analyzing a few cases for a large number of genes, we chose to target parallel sequencing to the EGFR mutation "hot spots". The focus was on EGFR analysis because, as mentioned above, EGFR mutations often need to be identified in DNA extracted from specimens that are both problematic for molecular diagnosis and crucial for deciding patient treatment. NGS results were compared with those of Sanger sequencing, the method currently in use for routine molecular analysis in the three Italian partner centers in Bologna, Padova and Naples that participated in the study.
We reasoned that, in spite of its greater complexity compared with Sanger sequencing (or other alternative methods), NGS offers several advantages (high analytical sensitivity, screening of the entire nucleotide sequence of the target region, semiquantitative evaluation of the mutated allele, and analysis of many samples in a single run [high throughput]) that make it ideal for the study of lung carcinoma. Our results indicate that NGS can be effectively applied to meet the needs of routine DNA analysis, and that it represents a practical alternative to other methods currently used to detect EGFR mutations.

Ethics Statement
Since EGFR mutational analysis is part of the routine diagnostic workup of patients with NSCLC, ethics committee approval was not necessary for this study, in accordance with the medical ethical guidelines of the Azienda Unità Sanitaria Locale di Bologna, Azienda Universitaria Policlinico Università degli Studi di Napoli Federico II, Azienda Universitaria Policlinico Università degli Studi di Padova and in accordance with the general authorisation to process personal data for scientific research purposes from "The Italian Data Protection Authority" (http://www.garanteprivacy.it/web/guest/home/docweb/-/docweb-display/export/2485392). According to these guidelines, a comprehensive written informed consent was signed for the procedures (fine needle aspiration, biopsies and surgical resections) that produced the tissue samples and for their diagnostic workup. All information regarding the human material was managed using anonymous numerical codes. Clinical data and follow-up information were not used for this study. All samples were handled in compliance with the Helsinki declaration (http://www.wma.net/en/30publications/10policies/b3/).
University Hospitals of Naples and Padova. All patients had a clinical indication for EGFR mutation testing for advanced lung carcinoma. Tumor cells for molecular analysis were obtained from cytology slides in 66 cases and from formalin-fixed paraffin embedded (FFPE) tissue sections in 14 biopsy specimens. Both cytology and biopsy specimens were routinely processed and diagnoses rendered according to standard criteria.
All cases were processed for DNA extraction after pathologic review ensured the presence of at least one hundred neoplastic cells. The proportion of neoplastic cells relative to the total number of cells (i.e. tumor cell enrichment) was estimated in all cases. The total number of neoplastic cells present in the sample processed for molecular analysis was also evaluated. It was estimated as follows: in a given sample neoplastic cells were counted in five one square millimeter fields (1 mm²) and the mean of these five values calculated; the mean was then multiplied by the total area (in square millimeters) of the specimen (cytology smear or histology section of the biopsy) selected for molecular analysis.
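The cell count estimate described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic (mean count per 1 mm² field multiplied by the selected area); the function name and the example counts are hypothetical and are not taken from the study.

```python
from statistics import mean

def estimate_total_neoplastic_cells(field_counts, area_mm2):
    """Estimate the total number of neoplastic cells in a specimen.

    field_counts: neoplastic cells counted in five 1 mm² fields.
    area_mm2: total area (mm²) of the smear or section selected for analysis.
    """
    if len(field_counts) != 5:
        raise ValueError("expected counts from five 1 mm² fields")
    if area_mm2 <= 0:
        raise ValueError("area must be positive")
    # Mean cells per mm² times the total area in mm².
    return mean(field_counts) * area_mm2

# Hypothetical counts, for illustration only (not data from the study).
counts = [40, 55, 35, 60, 50]
total = estimate_total_neoplastic_cells(counts, area_mm2=20)
print(total)
```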
For cytology samples cells were scraped from the area selected for the analysis after cover-slip removal by immersion of the slide in xylene. For FFPE material, six 10 μm thick sections were cut from each selected block, followed by one Hematoxylin and Eosin stain (H&E) control slide. The tumor area was marked on the control slide and tumor material was manually dissected under microscopic guidance from the corresponding 10 μm sections using a sterile blade. Twenty additional DNA samples, previously characterized for the EGFR mutational status, were utilized to evaluate the reproducibility of 454 parallel sequencing (see below).

DNA extraction
For Sanger sequencing DNA was extracted according to standard procedures as previously reported [44][45][46]. To ensure maximal yield of DNA for NGS two distinct protocols were followed for cytology and biopsy specimens. DNA was extracted from cytology samples using the MasterPure DNA Purification Kit (Epicentre, Madison, WI, USA) according to the manufacturer's instructions. For samples obtained from biopsies DNA was extracted with the High Pure PCR Template Preparation Kit (Roche Diagnostics, Mannheim, Germany) following the manufacturer's protocol.

Sequencing
Eighty samples were tested for EGFR (exon 18, 19, 20 and 21) using Sanger and/or Next Generation sequencing. Sanger sequencing was performed at the Molecular Pathology facilities of the Anatomic Pathology sections of the Bellaria Hospital-University of Bologna (33 cases), of the University Hospital of Naples (29 cases) and of Padova (18 cases). Next Generation sequencing of all cases was performed in parallel using a 454 GS-Junior Next Generation sequencer at the Molecular Pathology facility of the Bellaria Hospital-University of Bologna, Anatomic Pathology section, starting from routinely processed material originally selected from the Anatomic Pathology section of the Bellaria Hospital and from the submitting institutions in Padova and Naples. Negative controls and no template DNA controls were included in all runs.
Sanger sequencing. PCR reactions were performed using the FastStart Taq DNA polymerase kit (Roche Applied Science, Mannheim, Germany), starting from 15-50 ng of DNA. Primers used for PCR are described in Table 1. PCR products were checked on 2.5% agarose gel and amplicons purified using Agencourt Ampure XP beads (Beckman Coulter, Inc., Fullerton, CA, U.S.A.). Sequencing was carried out according to standard procedures using the GenomeLab DTCS Kit
Briefly, these include the following steps: PCR amplification of the target sequence, purification of the amplified fragments, emulsion PCR, and recovery of the emulsion PCR products that are then loaded on the Titanium PicoTiterPlate (Roche
For EGFR mutation analysis we defined the following workflow format, using the primers shown in Table 1 to analyze exons 18, 19, 20 and 21. Approximately 10 ng of genomic DNA were amplified for each exon. All PCR reactions were performed using a FastStart High Fidelity Taq Polymerase (Roche Diagnostic, Mannheim, Germany). In each sequencing run 100 to 120 different target sequences were analyzed by parallel pyrosequencing. This allowed us to study 25-30 cases per run, since all four EGFR exons were evaluated on all patients, with a putative number of approximately 700 reads per target, according to the manufacturer's specifications (http://www.454.com/).
Considering that each target sequence (exon and patient specific) is uniquely identified by a specific pair of MIDs, at least 5 forward primers and at least 6 reverse primers with unique MIDs were necessary to analyze 30 cases in the same run (or vice versa, at least 6 forward primers and at least 5 reverse ones). We used grid schemes for each EGFR exon to identify the unique association between MID pairs and specific target sequences (see Figure 1). Both forward and reverse strand sequences ("reads") were evaluated after parallel amplification of the target DNA. The sequences obtained were analyzed using the Amplicon Variant Analyzer (AVA) Software (Roche Diagnostics, Mannheim, Germany). Only nucleotide variations observed in both strands were considered for the mutational call. Ambiguous base calls associated with homopolymer stretches of 4 base pairs or longer were not considered mutations, due to the limitations of the pyrosequencing chemistry that is used by 454 NGS for sequence analysis [47,48].
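The MID grid logic described above (each case within an exon identified by one forward/reverse MID pair) can be illustrated with a short Python sketch. The MID labels (F1, R1, ...) and function names are hypothetical placeholders, not the actual MID sequences or the grid shown in Figure 1.

```python
from itertools import product

def mid_grid(n_forward, n_reverse):
    """Return every unique (forward MID, reverse MID) pair for one exon."""
    forward = [f"F{i}" for i in range(1, n_forward + 1)]
    reverse = [f"R{j}" for j in range(1, n_reverse + 1)]
    return list(product(forward, reverse))

def assign_cases(case_ids, n_forward, n_reverse):
    """Map each case to its own MID pair; fail if the grid is too small."""
    pairs = mid_grid(n_forward, n_reverse)
    if len(case_ids) > len(pairs):
        raise ValueError("not enough unique MID pairs for this many cases")
    return dict(zip(case_ids, pairs))

# 5 forward x 6 reverse MIDs give 30 unique pairs, enough for 30 cases.
cases = [f"case{i}" for i in range(1, 31)]
assignment = assign_cases(cases, n_forward=5, n_reverse=6)
print(len(assignment), assignment["case1"], assignment["case30"])
```

The same grid is reused for each of the four exons, since the amplicon sequence itself distinguishes the exon.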
Analytical sensitivity. The analytical sensitivity of our 454 sequencing workflow format for EGFR mutational analysis was tested by serially diluting (1:1, 1:2, 1:10, 1:100, 1:1000) DNA from a pool of samples harboring a homozygous nucleotide polymorphism G>A at position 2470 of the EGFR sequence (c.2470 G>A, CAG → CAA, Q787Q, exon 20) in a pool of samples that did not harbor the nucleotide substitution (c.2470 G, CAG). Each analysis was repeated at least twice.
Minimal amount of input DNA at the analytical sensitivity threshold. The input DNA at the analytical sensitivity threshold was serially diluted in H2O to determine the minimal amount of DNA necessary for mutation detection.
454: Next Generation Sequencing reproducibility. Interassay reproducibility (i.e. the consistency of results with the same protocol in different runs) was assayed by repeating the sequence analysis of 20 DNA samples that were previously characterized for their EGFR mutational status (11 wild type cases and 9 EGFR mutated ones: 7 with a deletion in exon 19, one with the L858R and one with the T790M mutation).

Performance of the 454 Next Generation Sequencing protocol
Analytical sensitivity and definition of the threshold for mutational call.
The analytical sensitivity of 454 NGS depends on the total number of reads that can be obtained for a given sample. Following our 454 sequencing format protocol, which targets a putative number of approximately 700 reads per amplicon, we tested a serial (1:1, 1:2, 1:10, 1:100, 1:1000) dilution of DNA with the c.2470 G>A nucleotide substitution at the 2470 polymorphic site of the EGFR gene sequence in a pool of human DNA without the substitution (c.2470 G).
The c.2470 G>A substitution was consistently detected down to a 1:100 dilution only if at least 10 consensual c.2470 G>A nucleotide reads were obtained after parallel 454 sequencing. This observation is illustrated in Figure 2. With a 1:100 dilution of c.2470 G>A DNA the nucleotide substitution was observed when the total number of reads was 2,359, corresponding to 23 consensual c.2470 G>A sequences (Figure 2D). It was not observed when the total number of reads was 750, which would have corresponded to 7 consensual c.2470 G>A sequences (Figure 2E). The c.2470 G>A substitution was also not observed with a 1:1000 dilution, even when the total number of reads analyzed was very high (between 3000 and 4000), because this was still not enough to reach 10 c.2470 G>A reads, which would have required a total of 10,000 reads per amplicon (Figure 2F).
Based on these observations we established two criteria to define a sample as mutated: identification of the mutation (i) in at least 10 reads and (ii) in at least 1% of the total number of reads analyzed. The requirement of 10 consensual reads makes the criteria for the mutational call more stringent for samples that generate a total number of reads lower than expected, thus ensuring the specificity of the results.
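The read arithmetic and the two calling criteria above can be expressed as a minimal Python sketch. The function names are hypothetical, the dilution is modeled simply as one mutated read per N total reads for a 1:N dilution, and the last three calls use made-up read counts rather than cases from the study.

```python
def expected_mutant_reads(total_reads, dilution_factor):
    """Mutated reads expected at a 1:dilution_factor dilution (rounded down)."""
    return total_reads // dilution_factor

def min_total_reads(dilution_factor, min_reads=10):
    """Total reads needed to expect at least min_reads mutated reads."""
    return min_reads * dilution_factor

def is_mutated(mutant_reads, total_reads, min_reads=10, min_fraction=0.01):
    """Apply both criteria: enough mutated reads and a high enough fraction."""
    if total_reads <= 0:
        return False
    return mutant_reads >= min_reads and mutant_reads / total_reads >= min_fraction

# Read counts quoted in the text for the dilution experiment.
print(expected_mutant_reads(2359, 100))  # 1:100 dilution, 2,359 reads
print(expected_mutant_reads(750, 100))   # 1:100 dilution, 750 reads
print(min_total_reads(1000))             # reads needed at a 1:1000 dilution

# Hypothetical samples, for illustration only.
print(is_mutated(12, 600))   # 2% of reads and at least 10 reads
print(is_mutated(7, 750))    # fewer than 10 mutated reads
print(is_mutated(30, 4000))  # at least 10 reads but below 1%
```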
Minimal amount of input DNA at the analytical sensitivity threshold. The amount of DNA required to detect c.2470 G>A DNA at the analytical sensitivity threshold of 1% (1:100 c.2470 G>A DNA dilution) was serially decreased starting from 10 ng, to determine the minimal input DNA necessary to detect the nucleotide substitution. A minimal amount of 2 ng of DNA was sufficient to consistently obtain amplifiable DNA and detect the c.2470 G>A substitution. Considering that each human diploid cell contains 7 pg of DNA, 2 ng of a 1:100 dilution of c.2470 G>A DNA in c.2470 G DNA correspond to the detection of approximately 4 mutated cells in a total of 200 cells without the mutation, assuming that the mutation is heterozygous and EGFR disomic.
Reproducibility. To evaluate the reproducibility of 454 parallel sequencing we utilized 20 DNA samples previously characterized for their EGFR mutational status: 11 were EGFR wild type, 7 had an exon 19 deletion, and one each had an L858R exon 21 mutation and a T790M exon 20 mutation. Each sample was analyzed twice with 454 NGS and in all cases the mutational status was confirmed. In samples that harbored EGFR mutations the percentage of mutated reads varied on average by 2.6% (median variation 1.45%, range 0.6%-8.6%).

Pathologic diagnosis and microscopic evaluation of tumor cellularity in non-small cell lung carcinoma samples
Eighty cases were studied. Pathologic diagnoses are summarized in Table 2. Fifty-six of 80 cases were primary lung lesions diagnosed as adenocarcinoma (n=33) or NSCLC not otherwise specified (NOS) (n=23). Twenty-two specimens were from tumor metastases: 18 in lymph nodes, 2 in the pleura, and 2 in the bone. Two samples were cytology preparations from pleural effusion.
The results of the evaluation of tumor cellularity are summarized in Table 2 and illustrated in Figure 3. The proportion of neoplastic cells/total number of cells (i.e. tumor cell enrichment) was evaluated on all cases. Percentages of neoplastic cells ranged from 5 to 80% (mean 45.2%, median 50.0%). In 35 samples the proportion of neoplastic cells was less than 40%. In half of the cases the proportion of neoplastic cells was less than 50%. The total number of neoplastic cells in the sample submitted to EGFR mutational analysis was estimated in 53 cases. It ranged between 190 and 730,000 (mean 68,147, median 17,400). A numerical estimate of tumor cell enrichment and of the total number of neoplastic cells analyzed for EGFR mutation was available in all but two cases with discrepant results between Sanger and next generation sequencing (see below).
Table 3, Figures 4 and 5).
454: Next Generation Sequencing. Each target sequence was analyzed at least 100 times per sample, ranging from 101 to 2,656 reads per target (mean 521.8). The criteria outlined above of at least 10 reads with a consensual mutation in at least 1% of the total number of reads were used to diagnose all mutations. Nucleotide variations associated with homopolymer stretches were observed, usually in only one strand (Figure 4F). They were considered technical artifacts resulting from the pyrosequencing chemistry used by 454 NGS for sequence analysis, which does not adequately discriminate repeated sequences of the same nucleotide.
A) cytology specimen from a 72 year old woman with adenocarcinoma metastatic to a mediastinal lymph node (May-Grünwald Giemsa, 200X magnification, inset 600X); the proportion of neoplastic cells in the sample is 35%; DNA analysis was wild type after Sanger sequencing, but NGS showed two EGFR mutations (G721W, R831H) (case 57 of Table 5). B) biopsy specimen from a 65 year old man with adenocarcinoma metastatic to bone (vertebral body) (Hematoxylin and Eosin, 200X magnification, inset 600X); the proportion of neoplastic cells in the sample is 5%; DNA analysis was wild type after Sanger sequencing, but NGS showed the L858R EGFR mutation (case 80 of Table 5).
Table 3. EGFR mutational analysis using Sanger and Next Generation sequencing.
C-T or G-A transitions have been associated with sequencing artifacts in FFPE samples with low amounts of DNA [49]. Of our five mutations with no previous record in literature databases, one (S752F) was a C → T transition (TCT → TTT) and none was a G → A transition.

The relevance for patient treatment with tyrosine kinase inhibitors of all mutations found in the study is summarized in Table 4.
Mutations were found in 16 of 52 (30.8%) cases diagnosed as adenocarcinoma and in 8 of 28 (28.6%) cytology samples that could not be further subtyped and were diagnosed as NSCLC. They were found in 6 of 14 (42.9%) biopsy specimens and in 18 of 66 (27.3%) cytology samples. 454 NGS detected mutations over a wide range of neoplastic cell numbers. The exon 19 deletion (del L747-A752) identified by Sanger sequencing in the specimen with the lowest number of neoplastic cells in our series (190 neoplastic cells) was also identified by 454 NGS. 454 NGS detected 22 EGFR mutations in cases with a proportion of neoplastic cells in the sample analyzed >40% and 9 mutations in cases with a low proportion of neoplastic cells (<40%) (Table 3, Figures 4 and 5).
In six cases two EGFR mutations (five cases) or three EGFR mutations (one case) were observed in the same specimen. In one case (Table 5, case 62) two mutations of the same exon (exon 18) were identified in different DNA strands (i.e. not on the same haplotype). In the remaining 5 cases mutations were located in different exons. In one of these 5 cases the percentage of mutated reads was identical for both mutations (T790M and L858R) (Table 5, case 68), and therefore compatible with the same population of neoplastic cells harboring both nucleotide changes. In 2 of the 5 cases the percentage of mutated reads was very similar (Table 5, cases 57 and 30), suggesting that the same population of neoplastic cells harbors more than one mutation.
The median number of neoplastic cells was 23,588 (range 1,390 -228,150) in the cases with multiple mutations and was 4,360 (range 190 -154,000) in those where a single mutation was detected. The difference in the number of neoplastic cells between the two groups was not statistically significant (p=0.3667) ( Figure 6A).
Cases with differences in the EGFR mutational status following Sanger and 454 Next Generation sequencing. Among the 80 cases analyzed in this study sequencing results with Sanger and 454 NGS were identical in 66 cases: 56 cases were wild type by both methods, and in 10 cases the same mutation was identified with both. Differences in the EGFR mutation pattern after Sanger and 454 NGS were observed in fourteen cases (17.5%) and are summarized in Table 5. One example is illustrated in Figure 4 E-F. Sanger sequencing did not detect EGFR mutations in 10 of these 14 cases. In these 10 cases one or more mutations were identified with 454 NGS. They included exon 19 deletions in 5 of the ten cases, as well as L858R and R831H in one case each. In the remaining 4 cases Sanger sequencing identified EGFR mutations (Table 5, Cases 2, 30, 62, 68). All of them (del E746-A750, L858R, G719A, T790M) were confirmed by 454 NGS, which also detected additional mutations.
The mutations detected by 454 NGS in the 14 cases were single in 8 and multiple in the other 6 cases, ranging from 2 to 3 mutations per case. In one case (Table 5, Case 68) Sanger sequencing identified two distinct mutations (T790M and L858R).
Among the 14 cases with different EGFR mutational status two sets could be easily recognized. In one, consisting of 7 cases (Table 5, Cases 25, 39, 57, 63, 67, 76, 80), the proportion of neoplastic cells in the sample analyzed was below 40%. All these cases were considered wild type after Sanger sequencing, but 454 NGS detected EGFR mutations with a percentage of mutated reads that was below the analytical sensitivity threshold of Sanger sequencing, which is generally set at 20% mutated alleles (40% neoplastic cells, assuming that the mutation is heterozygous and EGFR disomic) (Figure 4 E-F). In the second set, also consisting of seven cases (Table 5, Cases 2, 30, 59, 62, 66, 68, 79), the proportion of neoplastic cells was greater than 40%. All four cases where Sanger sequencing detected EGFR mutations belong to this set. In all cases the mutations confirmed by 454 NGS had a high proportion of mutated reads (>20%) (Figure 4 A-D). The additional mutations detected in these cases by 454 NGS, but not by Sanger, had a low percentage of mutated reads, below the Sanger detection threshold of 20% mutated alleles: it was <5% in 6 of the 7 cases, and 16% in the remaining one.
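The relation used above between the proportion of neoplastic cells and the expected fraction of mutated alleles can be written out as a short Python sketch. It assumes, as stated in the text, a heterozygous mutation and a disomic EGFR locus in both tumor and non-neoplastic cells; the function name and the copy-number parameters are hypothetical additions for illustration.

```python
def expected_mutant_allele_fraction(tumor_fraction, mutant_copies=1,
                                    tumor_copies=2, normal_copies=2):
    """Expected fraction of mutated alleles in a mixed sample.

    tumor_fraction: proportion of neoplastic cells (0 to 1).
    mutant_copies: mutated EGFR copies per tumor cell (1 = heterozygous).
    tumor_copies / normal_copies: EGFR copies per tumor / normal cell.
    """
    if not 0 <= tumor_fraction <= 1:
        raise ValueError("tumor_fraction must be between 0 and 1")
    mutant = tumor_fraction * mutant_copies
    total = tumor_fraction * tumor_copies + (1 - tumor_fraction) * normal_copies
    return mutant / total

# 40% neoplastic cells, heterozygous and disomic: about 20% mutated alleles.
print(round(expected_mutant_allele_fraction(0.40), 3))
# 5% neoplastic cells, the lowest proportion in the series: about 2.5%.
print(round(expected_mutant_allele_fraction(0.05), 3))
```

Under these assumptions a sample with 5% neoplastic cells falls well below the 20% Sanger threshold but remains above the 1% threshold used for the NGS mutational call.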
In the 14 cases with different EGFR mutation pattern the median number of neoplastic cells in the samples analyzed was 4,196 (range 848 -228,150). It was 17,400 (range 190 -730,000) in the 66 cases where sequencing with the Sanger method and 454 NGS gave identical results. The difference between the two groups was not statistically significant (p=0.1721) ( Figure 6B).

Discussion
The remarkable association between certain EGFR mutations, especially exon 19 deletions or the L858R mutation in exon 21, and clinical benefit in patients with NSCLC treated with EGFR tyrosine kinase inhibitors (TKIs), such as Gefitinib and Erlotinib, is well established [1][2][3][4][5][6][7][8][9][13][14][15][16]23]. Therefore the selection of lung cancer patients for molecular therapy with TKIs mandates the analysis of DNA extracted from tumor samples. Unfortunately it is difficult to achieve full control and standardization of all the pre-analytical steps related to sample collection that ultimately define the quality of the specimens that are submitted for molecular analysis. Guidelines identify a careful evaluation of tumor cell content before DNA analysis as a critical issue [10,11]. In patients with advanced lung carcinoma small biopsies or cytology specimens are frequently the only material available to establish the pathologic diagnosis and for molecular testing. In these samples the percentage of neoplastic cells is often low, and enriching the tumor cell content by dissecting tumor cells is cumbersome and often impossible. Since chemotherapy is the only available treatment for patients with advanced lung carcinoma, these samples are crucial. Their use needs to be carefully optimized for both morphologic diagnosis and DNA analysis in spite of their suboptimal nature [26]. Therefore, while the search for methods and protocols that identify mutations with high sensitivity is one of the goals of molecular diagnosis, the issue is particularly relevant for the management of patients with lung carcinoma. Many strategies that utilize a variety of technical approaches have been developed in the recent past and are being used for diagnosis [30][31][32][33][34][35][36][37][38][39]. The application of NGS to the routine analysis of patient samples is being evaluated and may radically change the approach to the molecular diagnostics of solid tumors [50][51][52].
Reports on the use of NGS for the analysis of lung carcinoma are still relatively few [53][54][55][56][57], but show encouraging results. Although most samples analyzed in these reports have been lung resection specimens, some with limited amounts of tumor have also been studied. Remarkably, Buttitta et al. have shown that in principle NGS can identify EGFR mutated alleles in bronchoalveolar lavages and in the pleural fluid of samples where tumor cells were very scarce or even altogether absent after microscopic evaluation [55].
To the best of our knowledge this is the first study to systematically assess the application of NGS to the analysis of the small biopsy and cytology lung carcinoma samples that represent the majority of the specimens routinely submitted for both tumor typing and molecular analysis. All specimens were unselected and representative of the material ordinarily submitted to three Italian referral centers. The proportion of neoplastic cells (i.e. tumor cell enrichment) was evaluated in all cases and in the majority of them the absolute number of tumor cells was also assessed.
For NGS analysis we used the 454 GS-Junior sequencer and defined a protocol format that targets EGFR exons 18, 19, 20 and 21, all of which need to be evaluated according to current guidelines [10]. Our protocol is designed to analyze 100-150 amplicons per run, corresponding to 25-35 samples screened for the four EGFR exons, and reaches an analytical sensitivity of 1%. This analytical sensitivity is equivalent to that of the most sensitive real-time PCR based methods currently available, which are however mutation specific and therefore not able to identify all possible EGFR mutations present in the sample. With the analysis of 100-150 amplicons per run and given the features of the 454 GS-Junior sequencer, the theoretical re-sequencing depth is approximately 700 reads per amplicon. We obtained an average number of ~522 reads per amplicon with adequate coverage of all exons on all samples analyzed. Should an even higher analytical sensitivity and greater re-sequencing depth be necessary in individual cases, this can easily be accomplished by decreasing the number of amplicons (i.e. patient samples) analyzed per run.
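The trade-off between the number of amplicons per run and the re-sequencing depth can be sketched as follows. The run yield used here (84,000 reads, i.e. 700 reads for each of 120 amplicons) is an assumption chosen only to be consistent with the figures quoted above; it is not a specification of the instrument, and the function names are hypothetical.

```python
# Assumed usable reads per run: 700 reads x 120 amplicons (illustrative only).
ASSUMED_RUN_READS = 84_000

def mean_depth(run_reads, n_amplicons):
    """Average reads per amplicon if reads are spread evenly across amplicons."""
    return run_reads // n_amplicons

def max_samples(run_reads, target_depth, exons_per_sample=4):
    """Samples that fit in one run at a target depth, four exons per sample."""
    max_amplicons = run_reads // target_depth
    return max_amplicons // exons_per_sample

print(mean_depth(ASSUMED_RUN_READS, 120))    # depth with 120 amplicons
print(max_samples(ASSUMED_RUN_READS, 700))   # samples at ~700 reads/amplicon
print(max_samples(ASSUMED_RUN_READS, 1000))  # fewer samples at 1,000 reads
```

Under this assumption, raising the target depth from 700 to 1,000 reads per amplicon reduces the number of samples per run from 30 to 21, which is the kind of adjustment the text describes for individual cases.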
The importance of using methods that are highly sensitive is underlined by the observation that the average proportion of tumor cells in the specimens was ~45% and therefore inferior to the general threshold of 50% required to reliably diagnose mutations with Sanger sequencing. In case of a negative mutational result all these cases would have required re-testing with one of the high sensitivity methods currently available or re-biopsy to obtain a higher proportion of neoplastic cells, causing additional costs and treatment delays. This issue is clearly illustrated by the samples shown in Figure 3, representative of many of our randomly selected cases for which EGFR analysis was requested. In these samples a large excess of non-neoplastic elements is intermixed so closely with the neoplastic cells that tumor enrichment is impossible by manual dissection and very difficult even using laser assisted microdissection. It has to be underlined that there is no guarantee that re-biopsying the patient would give a better specimen for molecular analysis, since an excess of nonneoplastic tissue is very common in those obtained from metastatic tumor sites and in many aggressive high stage carcinomas. Interestingly, while the proportion of tumor cells in the samples submitted for molecular diagnosis was clearly an issue, the number of neoplastic cells was not. Several thousand tumor cells were present in most specimens, including cytology samples. This observation supports the findings of several studies that have shown how cytology specimens can be utilized to predict the response of patients with lung carcinoma to TKIs using a variety of molecular methods to identify mutations [30,44,[58][59][60][61].
All mutations identified by conventional Sanger sequencing were also identified by targeted NGS, but the proportion of cases (14 out of 80) in which NGS identified at least one additional EGFR mutation compared with the Sanger method is considerable. Overall, 16 more mutations were detected, and all were observed with a percentage of reads that was below the sensitivity of conventional sequencing. We found a very good correlation between the proportion of tumor cells in the specimen and the results of mutational analysis, similar to a pilot study that has recently addressed this issue using 454 NGS [54]. In half of the cases where NGS identified EGFR mutations not seen after Sanger sequencing, the proportion of neoplastic cells was below 40%. In the other half the proportion of neoplastic cells was adequate, but the far superior sensitivity of NGS detected mutational events not seen by the Sanger method. The issue of the absolute number of neoplastic cells in the specimens subjected to NGS analysis has not been addressed by other studies. We observed that the absolute number of neoplastic cells was lower in discrepant cases where NGS identified mutations not seen by conventional sequencing. The difference did not reach statistical significance, but it appears that the separate parallel analysis of individual nucleotide sequences allows better resolution in those specimens where the absolute number of neoplastic cells is relatively limited.
Very importantly, in six of the 14 discrepant cases mentioned above, targeted NGS identified exon 19 deletions or the L858R mutation not seen after Sanger sequencing, allowing the patient to be treated with TKIs. In one additional case (Table 5, case 57) the R831H mutation associated with resistance to TKIs [22] was identified in a tumor that was EGFR wild type by the Sanger method. This mutation was present in a small subpopulation of cells, corresponding to 3% of mutated reads, and its identification would have been very difficult with conventional sequencing methods, including pyrosequencing.
By generating a quantitative assessment of the number of mutated reads and by defining the haplotype of the nucleotide changes, targeted NGS makes it possible to discriminate between large populations of mutated cells and small subclones, and can provide useful information as to whether mutations are present in the same population of cells or not. It may thus contribute important insights into the clonal evolution of EGFR mutated cases [56]. Sanger sequencing identified one case (Table 5, case 68) with two distinct mutations: the L858R predictive of response and the T790M associated with acquired resistance to TKI treatment. Targeted NGS found both mutations in a conspicuous number of reads, confirming the result of conventional sequencing. Since both mutations had the same proportion of mutated reads, they were most likely present in the same neoplastic cell population. A third, much smaller population of mutated alleles, compatible with a small neoplastic cell subclone, was also identified in the same sample. In five more specimens multiple mutations were identified only by NGS. In two of them there were two nucleotide variants with a similar proportion of mutated reads in the same specimen, suggesting that both variants may have arisen in the same neoplastic cells (Table 5, cases 57 and 30). In three of the five cases the relative proportions of variant alleles were more consistent with the existence of a dominant mutated neoplastic cell clone and of a small neoplastic cell subset carrying the additional mutation. Interestingly, in one of these cases a P772S exon 20 mutation known to be associated with TKI treatment response was identified in a very small proportion of reads, in a case with a dominant population of neoplastic cells carrying one of the exon 19 deletions typically associated with sensitivity to TKI treatment, indicating that more than one favorable mutational event may be present in the same tumor (Table 5, case 2).
Although multiple mutations were more commonly observed in cases with a large number of neoplastic cells, the number of neoplastic cells present in the specimen, in absolute terms or relative to that of the "contaminant" non-neoplastic cell population, did not correlate with the ability of NGS to identify multiple mutations.
Because targeted NGS screens each EGFR exon in its entirety and has high analytical sensitivity, we identified in our series three uncommon mutations (T785I, F795S, V845M) that have been previously reported in lung carcinoma or in other tumor types, the clinical relevance of which is currently undefined [62][63][64][65][66]. We also identified five mutations that have not been previously reported (P691T, K708N, C721W, S752F, D807G). Low amounts of template DNA and formalin fixation can cause random polymerase errors in nucleotide incorporation and sequencing artifacts that are usually C → T or G → A transitions [49,67]. In the five cases with previously unreported mutations thousands of neoplastic cells were present. Only two of the five cases were formalin-fixed biopsies, and in only one case (a cytology specimen fixed in alcohol) was the mutation a C → T transition. Considering that most of the knowledge collected on the spectrum of EGFR mutations has been acquired through conventional sequencing methods, it is not surprising that novel nucleotide variations may be disclosed by highly sensitive next generation methods. One recent study using the next generation Illumina HiSeq 2000 sequencing platform for the analysis of routinely processed lung carcinoma samples has identified both uncommon and previously unreported mutations at a rate very similar to ours [53]. Although the meaning of these findings clearly requires additional investigation, we do not believe that unexpected sequence variants should be discounted, also considering that some of these changes can modify the response to TKI treatment and may be markers of resistance to targeted molecular therapy [53].
Methods to test EGFR or other mutations in specimens with unfavorable tumor cell content need to be very sensitive [30]. Testing algorithms with parallel duplicate analysis using both conventional sequencing and a highly sensitive method have been suggested as a strategy for these specimens, but this adds to cost and turnaround time, and requires additional DNA for the analysis. Mutational analysis of EGFR by NGS overcomes the issue of limited tumor amounts because it is highly sensitive. If the percentage of tumor cells in the sample to be sequenced is established by pre-analytical microscopic examination, quantitative NGS data make it easy to distinguish dominant mutations from alterations found only in subclonal fractions of the tumor. We observed only small discrepancies in the proportion of mutated alleles when samples with DNA mutations were re-analyzed, indicating that the reproducibility of our targeted NGS protocol is more than adequate.
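The reasoning that links the microscopically estimated tumor cell content to the proportion of mutated reads can be sketched as follows. This is an illustrative heuristic, not the procedure used in this study: it assumes a heterozygous mutation in a diploid tumor without EGFR copy number changes (which are in fact frequent in lung carcinoma), and both the function names and the `subclonal_ratio` cut-off are hypothetical.

```python
def expected_clonal_vaf(tumor_fraction: float) -> float:
    """Expected fraction of mutated reads for a heterozygous mutation carried
    by every tumor cell, assuming diploid tumor and normal cells and no
    copy number change (assumption)."""
    if not 0.0 < tumor_fraction <= 1.0:
        raise ValueError("tumor_fraction must be in (0, 1]")
    return tumor_fraction / 2.0


def classify_variant(observed_vaf: float, tumor_fraction: float,
                     subclonal_ratio: float = 0.5) -> str:
    """Label a variant 'dominant' if its observed read fraction reaches at
    least `subclonal_ratio` of the expected clonal value, else 'subclonal'.
    The cut-off is hypothetical and would need empirical calibration."""
    expected = expected_clonal_vaf(tumor_fraction)
    return "dominant" if observed_vaf >= subclonal_ratio * expected else "subclonal"


if __name__ == "__main__":
    # A specimen with 45% tumor cells: a clonal heterozygous mutation is
    # expected in roughly 22.5% of reads under the stated assumptions.
    print(expected_clonal_vaf(0.45))
    print(classify_variant(0.20, 0.45))  # close to the expected clonal value
    print(classify_variant(0.03, 0.45))  # far below it
```

Under these assumptions, a variant seen in 3% of reads in a specimen with about 45% tumor cells falls well below the expected clonal value and is more compatible with a small subclone than with the dominant tumor population.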
It should be pointed out that our NGS protocol requires a minimum of 2 ng of DNA to consistently obtain amplifiable template for the detection of the c.2470 G>A polymorphism, and that it is effective with the quantity and quality of DNA currently obtained from limited formalin-fixed biopsies and routinely processed cytology samples collected at the different medical centers that participated in this multicenter study. The only technical drawback of 454 NGS that we encountered is its inability to discriminate homopolymer sequences. This is a consequence of the pyrosequencing chemistry used by the 454 platform and may result in ambiguous base calls that can be misinterpreted as frame-shift mutations [47,48] (Figure 4F). Even so, several studies are demonstrating that the performance of NGS in the analysis of routine samples is superior to that of other sensitive techniques, including conventional pyrosequencing and highly sensitive mutation-specific methods like Therascreen and chip array hybridization [53,57,68].
One issue that may limit the application of NGS to the routine practice of molecular diagnosis is that the procedure is relatively labor intensive, and therefore impractical for the ad hoc analysis of individual specimens as soon as they arrive at the laboratory. However, many samples can be analyzed at the same time, even for a considerable number of different genes. Our protocol, optimized for the analysis of 100-150 amplicons in one run, has been designed for the needs of a referral molecular diagnostic laboratory where requests for mutational evaluation of EGFR and other genes easily accumulate in a short time. Since the entire NGS analysis is accomplished in 2 working days, turnaround time requirements of 10 working days [10,11] can be effectively satisfied by grouping the evaluation of specimens in batches. Overall reagent costs per run are approximately 2,000 Euro. If 100 amplicons are analyzed in a given run the reagent cost per amplicon is 20 Euro, and even if the cost of technical operators is added, the overall figures per sample are lower than those of most commercially available kits for EGFR mutation detection.
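The cost arithmetic above can be reproduced with a short sketch. The run cost and batch size are the figures stated in the text; the assumption of one amplicon per exon, and therefore four amplicons per sample, is ours for illustration, and operator costs are not included.

```python
def reagent_cost_per_amplicon(run_cost_eur: float, n_amplicons: int) -> float:
    """Reagent cost of one amplicon when a run is shared by n_amplicons."""
    if n_amplicons <= 0:
        raise ValueError("n_amplicons must be positive")
    return run_cost_eur / n_amplicons


def reagent_cost_per_sample(run_cost_eur: float, n_amplicons: int,
                            amplicons_per_sample: int = 4) -> float:
    """Reagent cost of one sample, assuming one amplicon per EGFR exon
    (exons 18-21, i.e. four amplicons per sample; illustrative assumption)."""
    return reagent_cost_per_amplicon(run_cost_eur, n_amplicons) * amplicons_per_sample


if __name__ == "__main__":
    RUN_COST_EUR = 2000.0  # approximate reagent cost per run, from the text
    for n_amplicons in (100, 150):
        per_amplicon = reagent_cost_per_amplicon(RUN_COST_EUR, n_amplicons)
        per_sample = reagent_cost_per_sample(RUN_COST_EUR, n_amplicons)
        print(f"{n_amplicons} amplicons: {per_amplicon:.2f} Euro per amplicon, "
              f"{per_sample:.2f} Euro per sample")
```

Because the run cost is essentially fixed, the cost per sample falls as the batch fills, which is why the protocol suits laboratories where requests accumulate quickly.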
In conclusion, we have defined an NGS protocol based on the 454 GS-Junior platform for the analysis of EGFR and validated it with unselected limited tumor samples routinely submitted for molecular diagnosis to three different Italian laboratories. Targeted NGS is robust, cost-effective and greatly improves the detection of EGFR mutations in lung carcinoma patients. Its use should be promoted for the clinical diagnosis of mutations in specimens with unfavorable tumor cell content.