Meta-Analyses of Microarray Datasets Identifies ANO1 and FADD as Prognostic Markers of Head and Neck Cancer

The head and neck squamous cell carcinoma (HNSCC) transcriptome has been profiled extensively, nevertheless, identifying biomarkers that are clinically relevant and thereby with translational benefit, has been a major challenge. The objective of this study was to use a meta-analysis based approach to catalog candidate biomarkers with high potential for clinical application in HNSCC. Data from publically available microarray series (N = 20) profiled using Agilent (4X44K G4112F) and Affymetrix (HGU133A, U133A_2, U133Plus 2) platforms was downloaded and analyzed in a platform/chip-specific manner (GeneSpring software v12.5, Agilent, USA). Principal Component Analysis (PCA) and clustering analysis was carried out iteratively for segregating outliers; 140 normal and 277 tumor samples from 15 series were included in the final analysis. The analyses identified 181 differentially expressed, concordant and statistically significant genes; STRING analysis revealed interactions between 122 of them, with two major gene clusters connected by multiple nodes (MYC, FOS and HSPA4). Validation in the HNSCC-specific database (N = 528) in The Cancer Genome Atlas (TCGA) identified a panel (ECT2, ANO1, TP63, FADD, EXT1, NCBP2) that was altered in 30% of the samples. Validation in treatment naïve (Group I; N = 12) and post treatment (Group II; N = 12) patients identified 8 genes significantly associated with the disease (Area under curve>0.6). Correlation with recurrence/re-recurrence showed ANO1 had highest efficacy (sensitivity: 0.8, specificity: 0.6) to predict failure in Group I. UBE2V2, PLAC8, FADD and TTK showed high sensitivity (1.00) in Group I while UBE2V2 and CRYM were highly sensitive (>0.8) in predicting re-recurrence in Group II. Further, TCGA analysis showed that ANO1 and FADD, located at 11q13, were co-expressed at transcript level and significantly associated with overall and disease-free survival (p<0.05). The meta-analysis approach adopted in this study has identified candidate markers correlated with disease outcome in HNSCC; further validation in a larger cohort of patients will establish their clinical relevance.


Introduction
Molecular profiling of tumors, including Head and Neck Squamous Cell Carcinoma (HNSCC), has been employed to understand the mechanisms of carcinogenesis as well as the underlying reasons for treatment resistance. The increasing global incidence of HNSCC along with the lack of significant improvement in overall survival rate during the past four decades, necessitate newer approaches to understand the molecular mechanisms of the disease and to identify potential markers for early detection, prognosis and as targets for therapy. Molecular profiling in combination with patient validation in cancers of the prostate, ovary and breast has led to clinical application of marker panels such as Mamma print and Oncotype Dx. [1,2]. Similar attempts to establish diagnostic or prognostic molecular markers in HNSCC for clinical application have remained elusive.
High throughput molecular profiling using techniques that catalog the differences at the whole genome, transcriptome [3] [4] [5] and proteome [6] levels have been carried out in HNSCC. These studies have catalogued markers of progression [7] [8] and response to treatment [9]. Nevertheless, translation of these markers for clinical benefit has not been achieved due to the challenges associated with high throughput techniques in terms of marker selection and the need for large scale patient validation. These challenges are further confounded due to issues such as significant discordance between different high throughput studies, attributed to differences in sample processing, type of platforms used and the analytical algorithms applied [10] [11]. This problem is further augmented in transcriptomic studies primarily because transcript level changes can vary based on stage, tissue-specific and spatial fluctuations in expression of various genes.
Analytical strategies that can utilize existing databases and identify markers concordant across studies may be a beneficial approach to narrow down to markers of increased confidence and clinical applicability. Meta-analysis, referring to the analysis of publically available datasets, could hence, potentially enable identification of clinically relevant pool of markers; many such studies have been reported in cancers of different sites. Meta-analysis based approaches in pancreatic cancer have led to identification of novel targets and candidate biomarkers [12]. In prostate cancers, markers causal to the carcinogenic process were identified using a similar approach [13]. In addition, markers highly associated with prognosis and treatment outcome prediction in breast cancer [14] [15] were also identified through similar meta-analysis based approaches. The primary objective of this study was to carry out a cross-platform, cross-study meta-analysis of publically available microarray datasets in head and neck cancer, identify markers of biological relevance through a common analytical pipeline and subsequently validate them in clinical samples.

Search Criteria and Data Mining
Analysis of the publically available microarray datasets was carried out as per the PRISMA guidelines [16]. The public databases, Gene Expression Omnibus (GEO) (NCBI-http://www. ncbi.nlm.nih.gov/geo/) and Array Express (EBI) [http://www.ebi.ac.uk/arrayexpress] were searched for the presence of raw data of microarray experiments carried out in head and neck cancer. The series selected for the analysis included data from i) studies carried out in Head and Neck Squamous Cell carcinoma patients ii) studies including treatment naïve patients and iii) studies carrying out global profiling of transcriptomics using high-density arrays. Studies/ samples including thyroid, oropharynx and nasopharynx were excluded from the study due to their varied etiologies. The two platforms included in the analyses were Affymetrix [Affymetrix Inc., California, USA] and Agilent [Agilent Technologies, California, USA]. The raw data series profiled by both the platforms were grouped based on the individual technology and each technology (of either platform) was included in the analysis if at least 2 series were available in the database. The general work flow followed the stepwise protocol suggested by Ramasamy et al for carrying out meta-analysis [17]. The basic work algorithm is detailed in Fig 1.

Data Analysis Using Gene Spring
The raw data files used for the analyses included .CEL (Affymetrix platform) and .TXT (Agilent) files. Meta-analysis was carried out using Gene spring software (http://genespringsupport.com) [v12.5, Agilent, California, USA]. The raw data pertaining to each chip was uploaded onto the software wherein the samples were baseline transformed and normalized by Robust Multi-array Analysis (RMA) in Affymetrix or 75 th percentile in Agilent platforms (single color). The individual sample files were then classified into 'normal' and 'tumor' and reanalyzed as a single experiment. Gene level experimental data (arithmetic mean of all probes mapping to the same probe ID) was generated and quality control was carried out using Principal Component Analysis (PCA) in GeneSpring. The samples that were outliers in the PCA plots were removed and clustering carried out subsequently to ensure a clear stratification between the two categories of samples (Normal and Tumor). The process was carried out with multiple iterations to obtain a refined separation. Fold change analysis was then executed on the samples following which unpaired t-test (unequal variance) was performed to obtain significant gene entities. The p-value computation (asymptotic) and multiple testing correction (Benjamini Hochberg FDR) were further performed to obtain gene entities with p-value <0.05 and fold change (FC) of > 2.0. Inbuilt pathway analysis was carried out using the identified gene list. The significant gene list and pathways across the different technologies of Affymetrix were compared using the gene symbol/pathway names as an identifier. The individual gene entity list from each technology were extracted from the Genespring software and exported to excel files. The concordant gene entities as well as common pathways were identified across the technologies by using Microsoft Access (Microsoft Inc. WA, USA). Similar approach was adopted to identify all the common genes and pathways across the two platforms of Affymetrix and Agilent (Fig 1).

Functional Annotation Using Multiple Web Resources
The concordant gene list across the different platforms was analyzed in the different web resources to assess functional classes, protein-protein interactions and miRNAs targeting the genes. Gene Ontology Analysis was performed using PANTHER (Protein Annotation through Evolutionary Relationship) classification system (http://www.pantherdb.org/) [18] to evaluate the functional classes of the genes. The STRING database (STRING v9.1) (http://string-db. orgnewstring_cgi) [19] was used to predict and catalog the protein-protein interactions between the concordant genes. The miRNAs that target these genes were investigated using the miRWalk2.0 database (http://www.umm.uni-heidelberg.de/apps/zmf/mirwalk/index.html) [20].

Patient Based Validation
The concordant gene list was also cross compared with the TCGA database (http://www. cbioportal.org) [21] for their mutation, copy number variation (CNV) and mRNA expression status. The significant gene panel identified was also assessed for their co-expression data, overall and disease-free survival in correlation with their expression patterns in the TCGA cancer studies (Head and Neck Squamous cell carcinoma, Provisional; N = 528 samples).
The validation of selected genes was also carried out in HNSCC patients using quantitative real time PCR. The study was approved by Narayana Hrudayalaya Hospitals Institutional review board [NH/IRB-CL-2012-058]. Surgical samples of patients were selected from the bio repository of the Head and Neck Oncology department, Mazumdar Shaw Medical Center, Bangalore. All samples were collected after written informed consent. Previously untreated and recurrent patients diagnosed with HNSCC of all sites (excluding thyroid, oropharynx and nasopharynx) and stages during the period March 2010 to 2014, with details of follow up, were included in the study. The normal tissue samples were collected from healthy volunteers during the dental extraction after written informed consent. The demographic, clinical and pathological data of the patients were obtained from the electronic medical records and the patients were followed up for a period of 2 years.
The patients were categorized based on their Human papillomavirus (HPV) status as identified by p16 immunohistochemistry using established protocols. Formalin fixed paraffin embedded sections of the patients were collected from the department of pathology at the center. Briefly, tumor sections (5 micron) were de-paraffinised and treated with 3% hydrogen peroxide to stop endogenous peroxidase activity. The sections were incubated with the primary antibody (anti-p16, BioGenex, Fremont, CA, USA) and subsequent detection was carried out using the Horse Radish Peroxidase (HRP) system as per manufacturer's instructions (Dako REAL™ EnVision™ Detection System, Denmark). Sections processed similarly but without the primary antibody were taken as negative controls. The scoring of the stained slides was as per established studies [22]. The slides were evaluated using light microscopy and scored on a 0-3 scale taking into account percentage positivity and intensity of staining; 0 (complete absence of tumor staining), 1 (weak staining of tumor cells), 2 (< 50% tumors cells stained with moderate intensity) and 3 (>50% tumor staining with moderate/intense staining).

Marker Profiling for Validation by Quantitative Real Time PCR (qPCR)
RNA was extracted from the samples, archived in RNA Later (Ambion, Thermo Fisher Scientific, Massachusetts, USA), using TRIZOL reagent (Sigma Aldrich, MO, USA) and treated with DNase (Fermentas, Thermo Fisher Scientific, Massachusetts, USA) and assessed for contamination by Polymerase Chain Reaction (PCR). The integrity and quality of RNA was ascertained by gel electrophoresis and the 260/280 ratio. 1μg of RNA (260/280: 1.8-2.0) was converted into complementary DNA by the High Capacity cDNA Conversion kit (Applied Biosystems, CA, USA) as per the manufacturer's instructions. The selected markers were profiled using specific primers listed in S1 Table and all primers were evaluated for specificity using Basic Local Alignment Search Tool (BLAST) analysis (National Center for Biological Information, NLM, US). The qPCR efficiency was assessed using the slope of the linear regression model as per standard protocols. The Cp was measured across multiple serial dilutions of each cDNA to obtain the standard curve and the corresponding slope. The efficiency is calculated using the formula, Efficiency = 10 (-1/slope) . All real time PCR reactions were performed in triplicates using the Roche Light Cycler 480 real time PCR system (Roche Diagnostics, Germany) or the ABI Step One Real Time Machine (Applied Biosystems, CA, USA) using the Kapa SYBR Green PCR Master Mix (Kapa Biosystems, MA, USA). The thermal cycling conditions were 50°C for 2 min followed by an initial denaturation step at 95°C for 10 min, 45 cycles at 95°C for 30s, 60°C for 30s and 72°C for 30s. The experiments were carried out in triplicate for each data point. The Crossing point (Cp) and the subsequent analysis was carried out using the Second derivative max method in the software (Roche Diagnostics, Basel, Switzerland). All reactions were confirmed for their specificity by melting curve analysis of sample and the no-target control (NTC). A set (N = 4) of reference genes [(Glyceraldehyde-3-Phosphate Dehydrogenase (GAPDH), 60S acidic ribosomal protein P0 (RPLP0), 18S ribosomal Ribonucleic Acid (18SRNA) and β-Glucuronidase (GUS)] were assessed using the RefFinder [23] in a subset of the patient cohort (N = 26) and two of the best Reference Genes (RGs) were selected for further analysis. The relative change in expression was evaluated using the geometric mean of the selected reference genes [24]. The expression in normal samples served as the calibrator.

Statistical Methods
Statistical analysis was performed using STATA11.1 (College Station, TX, USA). Receiver Operating characteristic (ROC) curve analysis was used to evaluate the predictive power of each of the biomarkers, the optimal cut point that yielded the maximum sensitivity and specificity was determined for each biomarker. ROC curves were then plotted on the basis of the set of optimal sensitivity and specificity values. Area under the curve (AUC) was computed via numerical integration of ROC Curves. The biomarker that has the largest area under the ROC curve was identified as having the strongest association with the presence of head and neck tumor. The sensitivity and specificity of the association of the markers with recurrence/rerecurrence was analyzed using Clinical calculator (http://vassarstats.net/clin1.html).

Data Mining
Based on the search criteria, data mining of the different databases identified a total of 42 series [Affymetrix platform: N = 32, Agilent: N = 10]. After a further filtration of the data based on the inclusion and exclusion criteria, 18 series from Affymetrix and two from Agilent were included in the study ( Table 1). The technology and chips that were incompatible with the analytical pipeline and wherein only one series was available were excluded from the study. The . The sub sites investigated in the selected series were buccal mucosa, tongue, larynx, pharynx, and alveolus and retro-molar trigone. The normal samples were from gingivobuccal site in all the series. The series from each chip within each platform were analyzed separately using Gene spring analysis software and as per the analytical pipeline to identify the concordant gene entities lists.
Analysis across a single platform. In the Affymetrix platform, 18 series were analyzed, which included studies carried out on U133 plus 2 (N = 13), U133 A_2 (N = 2) and U133A (N = 4) chips; in one of the series, data was analyzed using two different chips (Table 1). Five series of U133 plus 2 were excluded since the samples were outliers during PCA and clustering; a total of 373 samples were included in the final analysis (N = 122, T = 251). The statistically significant gene list with a FC >2.0 and a p-value of <0.05 was used for further analysis. A total of 12,079 genes were identified after the analyses of U133 plus 2.0 while 4134 were identified from U133A and 3438 genes gene from U133A_2. Microsoft access analysis across these gene lists revealed a list of 965 gene entities of which 64.46% (N = 622) entities are concordant across the 3 chips (585 up regulated and 37 down regulated) (S1A File) and 35.54% (N = 343) of the genes showed mismatched trends of regulation. Similarly, an assessment of the concordance and discordance in regulation carried out across the technologies showed different trends (U133 Plus 2 Vs U133A Concordance = 76.53%; U133 Plus 2 Vs U133A_2 Concordance = 58.27%; U133A Vs U133A_2 Concordance = 86.11%).
The common pathways across the technologies were identified by comparison of their individual pathway lists. The list (N = 54) was then sorted based on the percentage of matched entities with respect to the total number of pathway entities. Twenty one pathways were identified with 30% or more matched entities in all the 3 technologies, among which a majority of them are cell cycle related pathways (S1B File).
In the Agilent platform (4x44k G4112F), 2 series, from which a total of 44 samples (N = 18, T = 26) were included in the final analysis A total of 8493 genes (up and down regulated) were identified from the analysis (FC>2.0; p<0.05); 338 of them being highly significant (p<0.001) with high fold change differences (FC>25 fold). Applying a cut off of 50% of matched entities from the total number of entities, 43 pathways were identified. The most significant pathways (> 65% of matched entities) identified were inflammatory response and extracellular matrix organization (ECM) (S1C and S1D File).
Cross-platform Analysis. The list of genes commonly identified between the two platforms revealed a total of 402 genes (p<0.05 and FC > 2.0) of which 181 (45.03%; 13 down regulated and 168 up regulated) were concordant and 221 (54.97%) discordant in the regulation   (Fig 2). The concordant list when further analyzed to identify the experimentally validated gene targets for miRNA expression from the miRWalk2.0 database, two major families of miRNA [let (N = 17) and mir (N = 47)] were identified, which targeted more than 5 genes from the list (S3 File).

TCGA Database and Experimental Validation of the Genes in HNSCC Patients
The 181 gene list was cross compared with the TCGA database (http://www.cbioportal.org) to assess the mutation, copy number and mRNA expression status in the head and neck cancer cohort (N = 300). A total of 59 genes were altered in at least 10% of the patients while a sub-set   disease free (Four patients were lost to follow up) (S2 Table). In the recurrent cohort, the patients underwent chemotherapy and radiation prior to the surgical treatment.
The genes for validation were selected from the 181 gene list based on multiple criteria; down regulated more than 5 fold in 2 technologies [Placenta-Specific 8 (PLAC8) and Crystallin, Mu (CRYM)], nodes within the String database (FOS, TTK, Ubiquitin conjugating enzyme E2  Fig). qPCR validation of the selected genes (N = 9) was then carried out using the relative quantification method using the geometric mean of the two reference genes. The markers showed regulatory trends in both the groups similar to those obtained from the meta-analysis (Fig 3A-3D). Six of the genes showed more than 70% validation in the sam-

Correlation of the Marker Profile with Treatment Outcome
The markers were correlated with treatment outcome (recurrence in the Group I and re-recurrence in the Group II). ANO1 showed the best efficacy (sensitivity: 0.83, specificity: 0.67) for detecting patients that fail treatment in cohort 1. ANO1 also showed a 3-fold increase in median fold expression in the recurrent cohort (4/5) of Group I when compared to non-recurrent cohort. ECT2 (0.83), FADD and UBE2V2 (1) also showed high sensitivity in Group I. In Group II 7/9 (UBE2V2, CRYM, FADD, CEP55, PLAC8, TTK and FOS) showed high sensitivity (0.67 to 1) in detecting treatment failure. The other markers showed low sensitivity and specificity (S3 Table) in both the groups.
HPV positivity was a good prognosticator in the overall cohort with 37% (3/8) showing treatment failure as compared to 63% (7/11) in negative patients. However, in combination with FADD, the only marker that showed significant association with HPV status, FADD high / HPV-patients showed a high percentage of treatment failure (60%, 6/10) as compared to FAD-D low /HPV+ patients (33%, 1/3).
Among these genes, only ANO1 and FADD showed significant association with overall survival (OS) and disease free survival (DFS) (p<0.05) in TCGA database. In addition, co-expression data showed that both these genes are expressed in patients with HNSCC (Spearman's correlation = 0.68) (Fig 4A). The 109 cases with upregulated expression of ANO1 showed low median survival as compared to the cohort without any alterations (18.96 vs 56.44 months; p = 0.0003) (Fig 4B). Similar association was also observed in the patients with treatment outcome (N = 217). Fifty four of these patients with increased ANO1 expression had a low DFS (20.04 vs 53.09 months; p = 0.02) (Fig 4C). FADD showed association only with overall survival   (Fig 4D). Assessment of the combined expression showed that coexpression of these two genes had a significant effect on OS (21.48 vs 57.88; p = 0.0007) and DFS (25.72 vs 53.09; p = 0.04) in HNSCC patients (Fig 4E and 4F).

Discussion
High throughput global profiling at the transcript and protein levels provide an extensive repertoire of probable candidate biomarkers for diagnosis and prognosis of various cancers. Nevertheless, issues pertaining to the analysis of data, possible cross-study discordance and the need for selecting apt candidates for extensive patient based validation has made the clinical utilization of this data extremely challenging. In addition, the data generated from any single study using these approaches are rarely utilized completely, with only subsets of the data being analyzed in context of the specific objective for which the study is designed. Multiple in vitro and patient related studies have been carried out in head and neck cancer that profiles the expression differences in the tumor as compared to the tumor adjacent mucosa or the completely normal tissue [11,25,26]. Consequently, although a number of disease-specific biomarkers have been identified, very few have been translated to clinical application. These issues associated with global profiling studies necessitates optimal utilization of publically available databases and identification of a functionally relevant, robust subset of biomarkers that can be validated for their clinical applicability. Meta-analysis, a systematic method that enables data analyses from independent studies and integrates them using common statistical pipelines, is invaluable in this context [17]. The primary objective of this study was to identify a functionally relevant and clinically applicable subset of biomarkers for HNSCC by utilizing a meta-analysis based approach.
The meta-analyses approach has been employed to identify clinically relevant biomarkers in cancers of many sites. Studies across microarray datasets in pancreatic cancer, prostate cancer and osteosarcoma have led to the identification of novel targets as well as biomarkers [12,13,27]. This approach has also led to gene signatures that predict treatment outcome and relapsefree survival in breast cancers that are now in clinical use [14,15]. Publically available databases on expression profiling of HNSCC patients were analyzed in a previous study, leading to identification of a signature for survival [28]. Another study identified six categories of HNSCC patients based on their molecular profile with the hypoxia-associated and mesenchymal subtypes being more aggressive [29]. Our study identified a subset of genes based on concordance across studies carried out using different technological platforms, thereby imparting a higher confidence value with regard to their potential clinical utility. The high percentage of genes (>80%) that were experimentally validated in terms of their trends in regulation between the microarray studies and the patient expression profiles further emphasized the significance of this approach to identify clinically relevant markers. The patient cohorts used in the study was of a smaller size but a further validation using TCGA data emphasized the clinical relevance of the selected markers. Cross-platform analysis carried out across the Affymetrix and Agilent platforms also highlighted the high percentage of discordance (>50%) among the trends of the genes identified. This discordance may be reflective of genuine biological differences owing to tumor heterogeneity across different sample cohorts or due to technical artifacts; nevertheless, this aspect further emphasizes the need for a meta-analysis based approach for marker identification. Cross-database comparison of the 181 gene signature identified in this study with the TCGA data identified ANO1 and FADD as altered with high frequency in HNSCC (>30%) and further these genes showed significant association with overall and disease-free survival. Both these genes are located on chromosome 11 (11q13), a highly amplified region in cancers [30,31]. Anoctamin 1 (ANO1), a transmembrane, Ca (2+) activated Cl (-) channel is known to be specifically expressed in HNSCC (>80% of the tumors) [32] correlating with the development of distant metastasis [33]. The expression of this gene, regulated by promoter hypermethylation, is known to provide the balance between tumor growth and metastasis [34]. Further, in vitro studies have shown that ANO1 promotes anchorage-independent growth and cell proliferation mediated by ERK1/2 and CCND1 (Cyclin D1) induction [35] [32]. The Fas Associated Death Domain (FADD), a protein involved in apoptotic signaling [36], along with additional roles in cell proliferation, has been associated with nodal metastasis, overall and diseasefree survival in patients with head and neck cancers [37] [38]. Concomitant amplification of the 11q13 region in HNSCC [39] and over expression of FADD have, in combination, suggested this gene to be the driver of this amplicon [40]. As observed in this study, both these genes, in combination, are also identified to be predictive of prognosis in breast cancers [41]. Identification of these two genes from this cross-study/cross-platform analysis, necessitates detailed investigation as to their role in the disease and their clinical applicability as biomarkers and/or therapeutic targets.
Among the other genes validated, ECT2 was altered in the highest number of HNSCC patients in TCGA. This cytokinesis regulator, when dysregulated, can activate Rho signaling pathways and thereby may result in malignant transformation as observed in lung adenocarcinoma and glioma [42][43][44]. A study in oral cancer has further implicated the gene to be involved in cellular proliferation and therefore would be of relevance in early carcinogenesis [45]. In this study, PLAC8 and UBE2V2 showed validation across maximum percentage of patient samples (90%). Interestingly, PLAC8 is reported to be a candidate oncogene and is known to affect regulation of autophagy and promote ERK-dependent epithelial mesenchymal transition in colon cancers [46,47]. However, in vitro studies in other cancer cell lines have shown contradictory functions of the gene, wherein it contributes towards pro-apoptotic functions [48]. These functional differences are suggested to be due to possible splice variants of the gene, an aspect that needs to be extensively investigated in order to establish its role in the early carcinogenesis of head and neck cancer. Studies that report the role of UBE2V2 in cancer are comparatively few, the gene is reported to be associated with poor prognosis in breast cancer [49] while the suppression of the gene in colorectal carcinoma cells is known to reverse oxaliplatin resistance [50]. Further studies are essential to elucidate its role in Head and Neck cancer.
Human papilloma virus is considered a major etiological factor in Head and Neck cancers especially in oropharyngeal cancers [51], recent evidence suggest an increasing influence in cancers of the oral cavity also [52]. Cancers with a HPV-based etiology are reported to lesser frequency of genetic changes [53,54], in this study although the overall differential expression pattern of the genes in the patients did not correlate to HPV status, FADD showed a significant association. Majority of the HPV+ patients showed a downregulation of FADD, in concordance with reports that HPV oncoprotein E6 binds to FADD and subsequently leading to a degradation of the protein [55]. This is considered a protective mechanism against apoptosis. In combination with the association of FADD/HPV positive patients with recurrence and prognosis, these evidences indicate contrasting roles for the gene that is based on the etiology of the cancer, further in vitro studies and patient validation can establish the same.
A surprising aspect of the study is the absence of the reported and known players in head and neck cancer, Epidermal Growth Factor Receptor (EGFR), Tumor Protein p53 (TP53) and Ras from the statistically significant list of differentially expressed genes. Previous review of transcriptomic profiling studies in head and neck cancer also showed that these major players have been absent from most transcriptomic studies [56]. This might suggest that the differences at transcript level expression of these genes are not as significant in the patients with the disease compared to the mutation based and protein level differences. On the other hand, the other known classes of genes reported previously such as the collagen family, Serpins, FN1, Cyclin dependent kinases are well represented in our cohort. Documentation of new molecules such as ECT2 and PLAC8, not extensively reported in HNSCC previously, further indicates that the meta-analysis based approach can enable identification of novel candidate biomarkers and therapeutic targets.
Analysis of the interaction networks between the statistically significant concordant genes showed two major clusters with many genes as cross connectors. FOS and MYC, the wellknown molecules involved in cell proliferation and differentiation were prime among them. The FOS family of genes is known to be correlated with increased aggressiveness of the tumors, their metastatic nature and thereby conferring poor prognosis in HNSCC [25,57,58]; a similar correlation was obtained from this study. MYC along with its co-amplified partner, CDK1 are reported to be correlated with survival and prognosis in head and neck cancers [59][60][61]. The significance of this network and the cross connecting nodes needs to be further investigated to understand their biological significance.
This study has hence identified a list of biomarkers of head and neck cancer with high confidence in terms of their functional and biological relevance and potential clinical applicability. Identification of these markers, concordant across multiple studies and different technologies, suggest their strong relevance to HNSCC. This study did not look into the site-specific variations that might be relevant in the biology of the disease; validation in larger and well annotated patient cohorts might provide insights into their specific roles in the process. In addition, similar meta-analysis based approaches towards identification of markers of resistance/response to chemotherapy in the patients will be extremely beneficial. Studies in this direction are currently ongoing in our group.