Novel Insight into Mutational Landscape of Head and Neck Squamous Cell Carcinoma

Development of head and neck squamous cell carcinoma (HNSCC) is characterized by accumulation of mutations in several oncogenes and tumor suppressor genes. We have formerly described the mutation pattern of HNSCC and described NOTCH signaling pathway alterations. Given the complexity of the HNSCC, here we extend the previous study to understand the overall HNSCC mutation context and to discover additional genetic alterations. We performed high depth targeted exon sequencing of 51 highly actionable cancer-related genes with a high frequency of mutation across many cancer types, including head and neck. DNA from primary tumor tissues and matched normal tissues was analyzed for 37 HNSCC patients. We identified 26 non-synonymous or stop-gained mutations targeting 11 of 51 selected genes. These genes were mutated in 17 out of 37 (46%) studied HNSCC patients. Smokers harbored 3.2-fold more mutations than non-smokers. Importantly, TP53 was mutated in 30%, NOTCH1 in 8% and FGFR3 in 5% of HNSCC. HPV negative patients harbored 4-fold more TP53 mutations than HPV positive patients. These data confirm prior reports of the HNSCC mutational profile. Additionally, we detected mutations in two new genes, CEBPA and FES, which have not been previously reported in HNSCC. These data extend the spectrum of HNSCC mutations and define novel mutation targets in HNSCC carcinogenesis, especially for smokers and HNSCC without HPV infection.


Introduction
HNSCC is a disease with significant morbidity and mortality. It is the fifth most common cancer, responsible for 5% of all cancer patients, and accounts for 560,000 new cancer cases and 300,000 cancer deaths worldwide [1]. More than 50,000 new cases of HNSCC are diagnosed in the United States yearly, with 12,000 deaths annually [2]. The survival rate is only 50% within 5 years after diagnosis [2]. This malignancy is highly related to habitual factors, such as smoking and alcohol consumption, and to infection with human papilloma virus (HPV), which has been associated with the majority of oropharynx cancers [3,4,5]. Despite advances in medical care and operative skills, mortality of HNSCC has not significantly improved over the past three decades. Therefore, investigation of the molecular biology of HNSCC may strongly enhance the development of modern chemotherapy agents that could improve and prolong the lifespan of HNSCC patients.
To acquire the comprehensive genetic signature of HNSCC, two independent groups performed next generation targeted exon sequencing of tumor DNA and matched normal DNA for 32 [20] and 74 [21] HNSCC patients. Both groups confirmed mutation rate in the genes that were previously reported for HNSCC, including TP53, CDKN2A, PIK3CA, PTEN, and HRAS. Importantly both groups reported the mutations in NOTCH1 for the first time in HNSCC. Overall they have demonstrated 14-15% mutation rate for NOTCH1. Later on NOTCH1 was sequenced and mutations in this gene were confirmed for head and neck cancer cell lines [22]. Mutations in the genes of PI3K pathway were also further validated [16]. Additional new mutated genes in HNSCC were discovered by both Agrawal et al and Stransky et al [20,21].
In order to validate the discovery of NOTCH mutations in HNSCC and to evaluate the role of the NOTCH pathway in HNSCC tumorigenesis, we recently performed a comprehensive analysis of the genetic, epigenetic and transcriptional alterations of the NOTCH pathway in a cohort containing 44 HNSCC tumor and 25 normal tissues [23]. HNSCC tissues were deep-sequenced for NOTCH1 mutations and demonstrated a 10.8% mutation rate for this gene, similar to previous reports [20,21]. We also demonstrated a bimodal pattern of NOTCH pathway alterations in HNSCC: while the NOTCH1 receptor is targeted by inactivating mutations in 10.8% of HNSCC, another 32% of HNSCC show significant upregulation of the NOTCH pathway, as determined by ligand and receptor activation, expression, and copy number increase [23].
In this work we extend our initial analysis of HNSCC mutational landscape, and annotate additional genetic alteration events in HNSCC. We employ next generation sequencing techniques to profile samples from 37 HNSCC patients from Johns Hopkins University. Based on our prior HNSCC mutation reports and publications reporting highly mutated genes in different tumor types, we select 51 highly actionable cancer related genes, including TP53, EGFR, HRAS, SRC, ABL1, PI3K, and others [3,6,9,12,15,16,20,21,24,25,26]. These 51 genes were selected from COSMIC, the Catalogue Of Somatic Mutations In Cancer, based upon the spectrum of mutations in individual cancer types with established or potential role in HNSCC tumorigenesis [27]. As a result, this study allows us to gain a comprehensive understanding of the established and additional abnormal genetic alterations in HNSCC.
The deep sequencing analysis performed in this work allows us to confirm the mutation rates for HNSCC-related genes, such as TP53, NOTCH1, CDH1 and PIK3CA. We also report here an association between smoking and increased mutation rates. Moreover, 38% of the point mutations discovered in this study have not been previously reported in the COSMIC database [20,21,27]. Two new genes, CEBPA and FES, are identified as mutated in HNSCC. These are a known tumor suppressor (CEBPA) and a proto-oncogene (FES) that have been shown to be mutated in several tumor types [28,29,30]. Adaptation of targeted therapy for the mutated forms of these genes has the potential to improve the clinical outcome of individual HNSCC patients carrying such mutations.

Tissue samples
Primary tumor tissues and matched lymphocytes were collected from 37 HNSCC patients at Johns Hopkins Hospital. Every participant signed a written informed consent before participating in this study. This study was approved by the Johns Hopkins Medicine Institutional Review Board (JHM IRB), and performed under approved research protocol NA_00036235. All samples were stored at −140°C (liquid nitrogen) until use. All cancer samples were examined by board-certified pathologists from the Pathology Department of Johns Hopkins Hospital, JHH (WHW and JB). Tumor samples confirmed as HNSCC were subsequently microdissected to yield at least 75% tumor content. The clinical characteristics of the study cohort are listed in Table 1. These samples were adopted from a previously reported discovery cohort, used for analysis by multiple high-throughput platforms including DNA copy number, methylation, and expression analysis [23].

DNA preparation
Microdissected tissue samples and collected lymphocytes were digested in 1% SDS (Sigma) and 50 mg/ml proteinase K (Invitrogen) solution at 48°C for 48-72 hours for removal of proteins bound to DNA. DNA was then purified using standard phenol-chloroform extraction and ethanol precipitation methods as previously described [31,32]. DNA was resuspended in LoTE buffer (EDTA 2.5 mM and Tris-HCl 10 mM, pH 7.5). FFPE

Content selection for targeted exon next-generation sequencing
The SuraSeq 7500 enrichment array was developed for deep sequencing analysis across many cancers and included a total of 51 genes selected from the COSMIC mutation database v58 (see Table 2 for details). Many of these genes, such as TP53, HRAS, RB1, PI3K and others, have been previously shown to be mutated in many tumors, including HNSCC [9,16,20,21]. Other genes, such as ABL1, GATA1 and PAX5, are involved in key cancer-related pathways and are known to be mutated in other tumor types, including leukemia, pancreatic and lung cancer [27,33,34,35,36]. Expression of several genes, including HIF1A, is known to be altered in HNSCC, but their mutational status had never been evaluated. Twenty-eight of the selected genes were analyzed by targeted "hotspot" sequencing. These are genes in which mutations preferentially occur at specific loci (hot spots), and they were selected for targeted sequencing of those specific regions. Such regions contained predominantly single nucleotide substitutions within the selected exons. Twenty-three more genes with a more distributed mutational profile across many exons, such as tumor suppressors, were selected for sequencing of entire coding exons (whole exon sequencing), covering 85-100% of the coding sequence. Refer to Table 2 for more details.

Target enrichment
Exon sequencing was performed by Asuragen, Inc. (Austin, Texas). Gene-specific primers were designed to amplify products up to 200 bp. Genomic DNA from HNSCC tumor tissues and matched lymphocytes, used as a putative germline reference, was fragmented to an average size of ~4 kb using the Covaris S220 (Covaris, Woburn, MA). All DNA samples were evaluated for the extent of fragmentation following analysis using E-gels (Life Technologies). Fragmented genomic DNA (500 ng) was then merged with an emulsified ~2000-member primer library (SuraSeq 7500) using the RainDance RDT 1000 platform. The RDT 1000 instrument is a microfluidic chip-based platform that incorporates microdroplet-based technology to amplify hundreds to thousands of genomic loci with high specificity and uniformity [37]. Templates within merged droplets were amplified using the following PCR conditions: 1 cycle of 94°C for 2 min, 55 cycles of (94°C for 15 sec; 54°C for 15 sec; 68°C for 30 sec), 1 cycle of 68°C for 10 min and 4°C hold. Following emulsion breaking, the resulting PCR products were purified using the Qiagen MinElute kit according to the manufacturer's instructions. A fraction of the purified PCR products was examined for size and quantity using the Agilent Bioanalyzer Lab-on-a-chip DNA 12000 and the Nanodrop, respectively.
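The enrichment PCR conditions can be written down as a small data structure, which makes the programmed hold time easy to check. The following is only an illustrative sketch: the `Stage` class and `nominal_seconds` helper are hypothetical names introduced here, and the calculation ignores ramping between temperatures and the open-ended 4°C hold, so the real run time would be longer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    """One stage of a thermal profile (hypothetical helper, not from the study)."""
    label: str
    steps: tuple  # ((temperature in deg C, seconds), ...)
    cycles: int


# Conditions transcribed from the text; the final 4 deg C hold is omitted
# because it has no fixed duration.
ENRICHMENT_PCR = (
    Stage("initial denaturation", ((94, 120),), 1),
    Stage("amplification", ((94, 15), (54, 15), (68, 30)), 55),
    Stage("final extension", ((68, 600),), 1),
)


def nominal_seconds(program):
    """Sum of programmed hold times, excluding temperature ramping."""
    return sum(
        stage.cycles * sum(seconds for _, seconds in stage.steps)
        for stage in program
    )


total = nominal_seconds(ENRICHMENT_PCR)
print(f"{total} s ({total / 60:.0f} min) of programmed hold time")
```

The same structure could hold the 10-cycle tagging PCR by swapping in its own stages.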

Deep sequencing with the Illumina GAIIx platform
Following gene-specific PCR, a tagging PCR reaction was performed to append unique barcode sequences to the gene-specific products from each sample and to add adapters specific for sequencing on the Illumina GAIIx platform. Samples were fingerprinted for multiplex sequencing using one of 48 barcodes (Illumina). Purified products (10 ng) were tagged using the following conditions: 1 cycle of 94°C for 2 min, 10 cycles of (94°C for 30 sec; 56°C for 30 sec; 68°C for 1 min), 1 cycle of 68°C for 10 min, and 4°C hold. These PCR products were then pooled and purified using the Qiagen MinElute PCR purification kit, and quantified using the KAPA Library Quant kit (KAPA Biosystems, Cape Town, South Africa) following the manufacturer's instructions. All samples were normalized to 8.6 nM and pools of 15 samples per lane were prepared. Flow cell preparation and data acquisition were completed using Illumina's recommended protocols. Paired-end sequencing runs (2 × 151) were performed using the Illumina GAIIx platform.

Mutation Confirmation on MiSeq
Samples from patients X15 and X33 were selected for targeted confirmation sequencing on the MiSeq, spanning custom amplicons. All sequencing was performed by Asuragen. Gene-specific primers were designed to amplify targeted products up to 106 bp, with fully assembled products (containing adapter sequences compatible with the MiSeq) up to 233 bp. Templates were amplified as previously described [38]. For each sample, PCR products with Illumina adapters and barcodes (5 µL) were purified and normalized using 5 µL of AxyPrep™ Mag PCR Normalization and Clean-Up System, according to the manufacturer's recommendation (Axygen, Union City, California). Samples were eluted with 25 µL (EB-N) and equal volumes (2 µL) were pooled to make a single multiplex library. The pooled library was quantified using the KAPA Library Quantification Kit following the manufacturer's instructions. The library was then denatured and prepared to 13.6 pM in the presence of 15% PhiX (2.4 pM) to increase base diversity. Flow cell preparation and sequencing on the MiSeq were performed using version 3 chemistry, as recommended by the manufacturer's protocol.
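The stated loading concentrations are consistent with a 15% PhiX spike-in if the percentage is taken as the PhiX share of the combined molar concentration. The short sketch below checks that arithmetic; reading 13.6 pM as the library concentration and 2.4 pM as the PhiX concentration in the same final mix is an assumption, and `spike_in_fraction` is a hypothetical helper name.

```python
def spike_in_fraction(library_pm, phix_pm):
    """Fraction of the loaded molecules that are PhiX control (assumed molar basis)."""
    total_pm = library_pm + phix_pm
    if total_pm <= 0:
        raise ValueError("total concentration must be positive")
    return phix_pm / total_pm


# Values from the text: 13.6 pM library, 2.4 pM PhiX.
fraction = spike_in_fraction(13.6, 2.4)
print(f"PhiX share of loaded library: {fraction:.1%}")
```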

Somatic mutation calling
The sequence read data generated from the GAIIx were demultiplexed; adapter and primer sequences were removed, and reads were trimmed to retain high-quality data (Q20 or higher) [37,39,40]. The sequence read data were filtered and aligned, and variant scores were calculated as previously described [38]. We retained only the high-coverage regions for analysis, where high coverage was defined as greater than 2% of the sample median coverage and above 100 reads. We also flagged loci that are known to be associated with systematic false positives. For each set of matched samples, we filtered out variants that were present in the lymphocytes, either as putative germline variants or as sample-specific systematic errors. In our analysis, if the matched normal locus had greater than 1% variant reads, or the variant score difference was within the 99.5th percentile of all pair-wise differences for non-annotated loci across the matched pair, the variant was removed from consideration. The mutation read frequency threshold used, 6%, has a positive predictive value of over 90%. Variants were annotated using gene structure from snpEff version 2.1b. Coding base substitutions were classified as missense, nonsense, splice site, or silent. All sequencing data are available in the NCBI BioProject database. The SRA Accession Number is SRR1055850.
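The coverage, matched-normal and read-frequency filters described above can be illustrated with a minimal sketch. This is not the pipeline used in the study: the `Locus` class, the constant names and the example loci are hypothetical, the coverage criteria are assumed to apply to the tumor sample, the 6% threshold is assumed to be inclusive, and the variant-score percentile filter and the systematic-error flags are omitted.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Locus:
    """Read counts at one candidate variant position (hypothetical structure)."""
    name: str
    tumor_depth: int
    tumor_alt: int
    normal_depth: int
    normal_alt: int


MIN_DEPTH = 100                 # "above 100 reads"
MIN_FRACTION_OF_MEDIAN = 0.02   # "greater than 2% of the sample median coverage"
MAX_NORMAL_VAF = 0.01           # variant removed if normal has >1% variant reads
MIN_TUMOR_VAF = 0.06            # 6% mutation read frequency threshold


def call_somatic(loci):
    """Return names of loci passing the three simplified filters."""
    median_depth = median(locus.tumor_depth for locus in loci)
    calls = []
    for locus in loci:
        # Coverage filter (assumed to be applied to the tumor sample).
        if locus.tumor_depth <= MIN_DEPTH:
            continue
        if locus.tumor_depth <= MIN_FRACTION_OF_MEDIAN * median_depth:
            continue
        # Matched-normal filter: putative germline or systematic error.
        if locus.normal_depth and locus.normal_alt / locus.normal_depth > MAX_NORMAL_VAF:
            continue
        # Tumor read-frequency threshold.
        if locus.tumor_alt / locus.tumor_depth >= MIN_TUMOR_VAF:
            calls.append(locus.name)
    return calls


# Made-up example loci, for illustration only.
example = [
    Locus("somatic_like", 2000, 400, 1800, 2),
    Locus("germline_like", 2500, 1200, 2200, 1100),
    Locus("low_coverage", 60, 30, 50, 0),
    Locus("below_threshold", 3000, 90, 2800, 0),
]
print(call_somatic(example))
```

Only the first example locus survives: the second is removed by the matched normal, the third by coverage, and the fourth by the 6% threshold.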

Statistical analysis
P-values were calculated using Fisher's exact test, comparing mutation events or mutation rates with clinical characteristics.
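For readers unfamiliar with the test, a two-sided Fisher's exact test on a 2×2 table can be sketched with the standard library alone. The function below is an illustrative implementation, not the software used in the study. The example table is back-calculated from the percentages reported for this cohort (at least one mutation in 50% of 24 smokers and 25% of 8 non-smokers), which is an assumption about the underlying counts.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denominator = comb(row1 + row2, col1)

    def probability(x):
        return comb(row1, x) * comb(row2, col1 - x) / denominator

    observed = probability(a)
    low, high = max(0, col1 - row2), min(row1, col1)
    total = sum(
        p for p in (probability(x) for x in range(low, high + 1))
        if p <= observed * (1 + 1e-9)  # tolerance for floating-point ties
    )
    return min(1.0, total)


# Rows: smokers, non-smokers; columns: >=1 mutation, no mutation.
# Counts inferred from reported percentages (assumption).
p_value = fisher_exact_two_sided(12, 12, 2, 6)
print(f"p = {p_value:.3f}")
```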

Clinical characteristics of the 37 HNSCC tumors
To discover the HNSCC genetic signature and new genetic mutations, we performed selected deep exon sequencing on 37 HNSCC tumors and their paired normal lymphocytes (Table 1). The characteristics of the HNSCC population largely reflect the demographics of head and neck cancer patients in the United States (Table 1). The HNSCC patients were largely male (70%, 26 of 37) and Caucasian (95%, 35 of 37), age 34 to 87 years (median ± SD = 58 ± 12 years). There was a history of tobacco and alcohol consumption in 65% (24 of 37) and 57% (21 of 37) of all patients, respectively, with average smoking history being 25.

Mutation spectrum in 37 HNSCC tumors
In our analysis, we found a total of 26 disruptive mutations in 46% (17 of 37) of HNSCC patients (Figure 1). These mutations include 21 non-synonymous and 5 stop-gained mutations (Table 3) that were detected in 11 of the 51 sequenced genes, including TP53, NOTCH1, and CDH1. We also detected other mutated genes, such as PIK3CA, PIK3R1, CDKN2A, and FGFR3, which were previously implicated in HNSCC tumorigenesis [13,16,21]. Of note, 10 of the 26 detected site-specific point mutations (38%) had never been reported in the COSMIC database for any cancer type [27].
Overall, TP53 was found mutated 13 times in 29.7% (11 of 37) HNSCC patients, and FGFR3 was found mutated in 5.4% (2 of 37) HNSCC patients ( Figure 1 and Table S1). NOTCH1 mutations were found in 8.1% (3 of 37) patients, similar to prior reports [20,21,23]. Together with mutated genes that were previously identified and reported for HNSCC, we have discovered new mutation events for HNSCC. The novel mutated genes include: CEBPA and FES (Table 3 and S3). Other mutations can be found in Figure 1 and Tables 3 and S1.
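The per-gene frequencies quoted above follow directly from the patient counts in a cohort of 37. A short sketch reproducing the arithmetic is given here; the dictionary and the `percent` helper are illustrative names, and only counts stated in the text are used.

```python
COHORT_SIZE = 37

# Number of patients carrying at least one mutation, as stated in the text.
patients_with_mutation = {
    "any of the 11 genes": 17,
    "TP53": 11,
    "NOTCH1": 3,
    "FGFR3": 2,
}


def percent(count, total=COHORT_SIZE):
    """Percentage of the cohort, rounded to one decimal place."""
    return round(100 * count / total, 1)


for gene, count in patients_with_mutation.items():
    print(f"{gene}: {count}/{COHORT_SIZE} = {percent(count)}%")
```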
We validated the CEBPA and FES mutations in matched FFPE tumor DNA using the SuraSeq 7500 enrichment panel, and also in frozen tumor tissue using a targeted amplicon PCR-based library preparation coupled with sequencing on the MiSeq platform (Illumina). Confirmatory analysis yielded similar results, with 46% and 14% read frequency for the CEBPA and FES mutations in samples X15 and X33, respectively (Table S3). High mutation read frequencies, 29% (CEBPA) and 30% (FES), were also detected in FFPE DNA, confirming our original discovery. Analysis of HNSCC-TCGA data further supported FES as a mutation target, demonstrating a 1.4% mutation rate of this gene in HNSCC (Table S1).

Correlation with clinical data
We evaluated the association of mutations with clinical characteristics, such as tumor site, tumor stage, tumor recurrence, patient gender, race, age, HPV infection, follow-up period, disease status, smoking, and alcohol consumption status. We found a correlation of mutation status with smoking. As expected, more mutations were identified in tumor samples from patients with a history of tobacco consumption: on average, 0.25 mutations per patient in the non-smoking group vs. 0.79 in the smoking group. At least one mutation was found in 50% of smoking patients vs. only 25% of non-smoking patients (Figure 1). Similarly, 71% of patients with somatic tumor mutations were smokers, while no more than 60% of patients without any detected mutations in our study had a history of tobacco use. Interestingly, no mutations were detected in the commonly mutated TP53, NOTCH1, and CDH1 genes within the non-smoking group from our study. There were only 2 mutation events in the non-smoking group, both in the FGFR3 gene. Interestingly, no FGFR3 mutations were found in the smoking population.
In agreement with previous reports [20,21,26], we found an increased TP53 mutation rate in the HPV-negative population. On average, the TP53 mutation rate in the HPV-negative population is 4-fold higher than in the HPV-positive population: 0.43 and 0.11 mutations, respectively. No other correlations of clinical characteristics with mutation status were detected in our study group.
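The fold differences quoted for smoking and HPV status are simple ratios of the average mutation rates reported in this section. The sketch below recomputes them from the rounded values in the text (0.79 vs. 0.25, and 0.43 vs. 0.11); `fold_change` is a hypothetical helper, and because the inputs are already rounded, the results are approximate.

```python
def fold_change(rate_a, rate_b):
    """Ratio of two average mutation rates; rate_b must be non-zero."""
    if rate_b == 0:
        raise ValueError("reference rate must be non-zero")
    return rate_a / rate_b


# Average mutations per patient: smokers vs. non-smokers.
smoking_fold = fold_change(0.79, 0.25)
# Average TP53 mutations: HPV-negative vs. HPV-positive.
hpv_fold = fold_change(0.43, 0.11)

print(f"smokers vs. non-smokers: {smoking_fold:.1f}-fold")
print(f"HPV-negative vs. HPV-positive (TP53): {hpv_fold:.1f}-fold")
```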

HNSCC mutational landscape
HNSCC is a complex disease, usually characterized by accumulation of genetic and epigenetic alterations [7,8,9,20,21]. In this work, we performed comprehensive next generation exon sequencing of 51 tumor suppressor genes, oncogenes, currently actionable genes, and potentially actionable genes that were selected based on frequency of mutation across many cancer types, including HNSCC. We were able to support our prior discovery of HNSCC mutations, as well as to discover new genetic alterations in HNSCC. Moreover, within the 11 mutated genes, which were previously shown to be mutated in HNSCC and/or other tumor types, 38% of the detected point mutations were at sites not previously reported in COSMIC. To obtain a general picture of the mutation events, we drafted the common pathways that include the 11 mutated genes (Figure S1). The majority of the mutations were found in trans-membrane receptors or in signal transduction pathways, such as PI3K/AKT and Wnt/CDH. The other group of mutated genes encodes transcription factors (Figure S1).

Correlation with the previous data
Our data are highly consistent with previously published reports from our group and other investigators. TP53 was found mutated in 47% [20], in 62% [21], and in 42% of HNSCC tumor samples as reported by COSMIC [27]. Our analysis detected TP53 mutations in 30% of HNSCC samples (Figure 1 and Table S1). The lower rate of detected TP53 mutations may be explained by the fact that the 5′UTR and 3′UTR of TP53 were not covered by the selected panel. The Agrawal and Stransky groups reported 14% and 15% mutation rates in NOTCH1 [20,21]. Our data support theirs, with an 8.1% mutation rate for NOTCH1. Stransky and colleagues [21] also found 2 mutations of CDH1 in 2.7% of patients, matching our results. PIK3CA was found mutated in 4.1% to 10.6% of HNSCC patients [16,21]. We detected a PIK3CA mutation in one tumor sample (2.7%).
A similar mutation rate was found for PIK3R1: 2.6% in [16] and 2.7% in our work. Our results are in agreement with the data from Stransky and colleagues showing a low incidence of FGFR3 mutations in the HNSCC population (5.4% in our work, 1.35% in [21]). On the other hand, the G2128T (pG697C) mutation of FGFR3 was previously found in 62% of OSCC [41]. However, whether FGFR3-pG697C is a polymorphism rather than a mutation remains unclear.
We have included several genes into deep-sequencing analysis, which were previously reported in several tumor types, but not in HNSCC. These are ABL1, AKT1, CEBPA, FOXL2, GATA1, IDH1, IDH2, VHL and more ( Table 2). Two of them were found mutated in our study in HNSCC for the first time: CEBPA and FES.

Novel mutations
CEBPA and FES single mutations were detected in our experiments. The incidence of these novel mutations cannot be accurately ascertained due to the sample size. Higher rate of mutation for those genes could be confirmed with a larger sample size.
CEBPA, CCAAT/enhancer-binding protein alpha, is a tumor suppressor gene. Its activity is diminished by multiple mutations found in different tumor sites [30]. The mutated fraction for CEBPA was as high as 44% (Tables S2 and S3). We further confirmed this novel HNSCC mutation in a matched FFPE tumor DNA sample, where the G-to-C mutation was detected at 29%. We also validated this novel CEBPA mutation through ultra-deep sequencing of the frozen tumor DNA using the Illumina MiSeq platform (Table S3). Interestingly, the pR323G mutation found in the CEBPA protein lies within the highly mutated leucine zipper domain. Arginine at position 323 has never been shown to be substituted by mutations in any other cancer type, including cervical cancer and leukemia [27].
In the case of the non-receptor protein-tyrosine kinase FES, the pS96L mutation is a new addition to the mutation library for this protein. It lies just outside of the FCH domain that allows binding to the cytoskeleton. This mutation has never been found in any other cancer type, even among esophageal and lung adenocarcinomas, which are known for FES mutations [29,42]. Both the SuraSeq 7500 enrichment panel and the confirmatory assays on the MiSeq platform detected the C-to-T mutation, at 12% and 14%, respectively. This mutation was also detected in the matched FFPE tumor sample at 30%. In the TCGA cohort of 279 HNSCC patients, FES was mutated at a rate of 1.4% (Table S1), but none of the TCGA cohort mutations targeted the S96 amino acid.

Mutation rate and smoking
Among the 37 HNSCC patients who participated in this study, 24 (65%) had a history of tobacco use. The smoking population of this study carries the majority (73%) of all detected mutations, with 0.79 mutations per sample, which is 3.2-fold higher than the number of mutations per sample in the non-smoking group (8 of the 37 HNSCC patients, 22%). The non-smoking group carried only 2 mutations overall, one in each of 2 of the 8 patients. Both mutations targeted FGFR3, which was not found mutated in any smoking patient in our study. The other 6 non-smoking patients did not have any detected mutations. The remaining 5 patients from our study group did not specify their smoking history. Interestingly, within the smoking group, the mutation rate was as high as 37.5% for TP53. None of the newly discovered mutations in the CEBPA and FES genes were found in the non-smoking group.

TP53 mutations and HPV
Overall, 30% of the studied HNSCC samples harbor a TP53 mutation. This rate is lower than the previously reported TP53 mutation rate in COSMIC (42%) and in our prior report (47% in [20]); see Table S1 for details. The reason for such a low incidence (30%) of TP53 mutations in this cohort is unclear. Nine of the 37 HNSCC patients in our study group had HPV infection in the oropharynx. Only one of them harbored a TP53 mutation, similar to the results in [26]. Other reports could not detect any TP53 mutations in HPV-positive samples, which could be explained by the underrepresentation of HPV-positive samples in their cohorts [20,21]. In agreement with these prior reports [20,21,26], we detected a higher TP53 mutation rate in HPV-negative patients: 0.43 in HPV-negative vs. 0.11 in HPV-positive patients. While other reports demonstrated a higher rate of TP53 mutations in the HPV-negative group, from 73% to 100%, we found only a modest proportion, 37.5%, of HPV-negative patients to harbor a TP53 mutation (Figure 1). This may be explained by the fact that we did not analyze the 5′UTR and 3′UTR of TP53.

Summary
Using targeted exon sequencing in primary HNSCC, we have validated and confirmed prior published mutational data, and identified 26 point mutations in 11 out of 51 analyzed genes. Although most of the mutated genes have been described in other solid tumors, mutations in two of them have not been previously detected in HNSCC. Among these 11 genes, 4 and 8 were also found mutated in the studies from our group and the Stransky group, respectively. Of note, 10 out of the 11 genes were found mutated in the studies from the TCGA group. The genes found mutated both in our study and in the other three studies included TP53, NOTCH1, and PIK3CA. Yet only 10 out of the 11 reported genes were found recurrently mutated in these three studies. Agrawal's group reported 3 out of the 11 genes to be recurrently mutated: TP53, NOTCH1 and PIK3CA. Stransky discovered these three as well as CDKN2A, CDH1 and PIK3R1 to be recurrently mutated. On the other hand, among the 26 non-synonymous or stop-gained mutations, 12 were found in the other three studies mentioned above and only 6 of them were found recurring in these three studies. Of note, CEBPA was not previously reported in any of the cited HNSCC sequencing studies, and was thoroughly validated in our study (including MiSeq ultra-deep validation and FFPE DNA sequencing re-evaluation). Altogether, only a small fraction of the mutations recurred in HNSCC, which reflects the extent of tumor heterogeneity. The mutation rate and functional consequences of the newly discovered mutated genes in HNSCC will be further investigated.
The role of newly discovered mutated genes, as well as the role of novel point mutations in previously reported genes, requires further validation using a larger HNSCC cohort and understanding of their role in HNSCC development and metastasis. The utilization of the newly adopted Asuragen SuraSeq 7500 platform for deep sequencing analysis of additional FFPE DNA samples will be further investigated. The mutation status of additional genes, including FBXW7, Caspase8, and Fat-1 [20,21], in HNSCC will also be investigated in the prospective study. Such insight is particularly needed for HNSCC populations with a prior or current history of smoking and without HPV infection. In the future, some of the novel mutations identified here may be considered for personalized targeted therapies in HNSCC patients.

Figure S1 Pathways alterations in HNSCC. The 11 genes with detected mutations were used to draft simplified pathways altered in HNSCC. Several trans-membrane receptors, transcription factors and members of signal transduction domains were found mutated during this study. The PI3K/AKT and Wnt/CDH1 pathways were altered through several mechanisms. The rate of mutation for each individual gene is reported below the gene name. Red, blue and green colors stand for oncogenes, tumor suppressor genes and genes with dual function, respectively. (TIF)