Molecular Portrait of Oral Tongue Squamous Cell Carcinoma Shown by Integrative Meta-Analysis of Expression Profiles with Validations

Oral tongue squamous cell carcinoma (OTSCC), the most frequently affected oral cancer sub-site, is associated with poor therapeutic outcome and survival despite aggressive multi-modality management. To date, there are no established biomarkers to indicate prognosis and outcome in patients presenting with tongue cancer. There is an urgent need for reliable molecular prognostic factors to identify patients at high risk of recurrence and treatment failure in OTSCC management. In the current study, we present a meta-analysis of OTSCC microarray-based gene expression profiles, deriving a comprehensive molecular portrait of tongue cancer biology and showing the relevant genes and pathways that can be pursued further to derive novel, tailored therapeutics as well as for prognostication. We studied 5 gene expression profiling data sets available exclusively on the oral tongue subsite, with a total sample size of n = 190 (111 tumors and 79 normals). The meta-analysis showed 2405 genes differentially regulated between OTSCC tumor and normal. The top up regulated genes were involved in extracellular matrix (ECM) degradation and epithelial to mesenchymal transition (EMT) pathways. The top down regulated genes were involved in detoxification pathways. We validated the results in clinical samples (n = 206), comprising histologically normal (n = 10), prospective (n = 29) and retrospective (n = 167) OTSCC samples, by evaluating MMP9 and E-cadherin gene expression by qPCR and immunohistochemistry. Consistent with the meta-analysis results, MMP9 mRNA expression was significantly up regulated in OTSCC primary tumors compared to normals. MMP9 protein over expression was found to be a significant predictor of poor prognosis, disease recurrence and poor Disease Free Survival (DFS) in OTSCC patients.
Analysis by univariate and multivariate Cox proportional hazard model showed patients with loss of E-cadherin expression in OTSCC tumors having a poorer DFS (HR = 1.566; P value = 0.045) and poorer Overall Survival (OS) (HR = 1.224; P value = 0.003) respectively. Combined over-expression of MMP9 and loss of E-cadherin membrane positivity in the invasive tumor front (ITF) of OTSCC had a significant association with poorer DFS (Log Rank = 16.040; P value = 0.001). These results suggest that along with known clinical indicators of prognosis like occult node positivity, assessment of MMP9 and E-cadherin expression at ITF can be useful to identify patients at high risk and requiring a more intensive treatment strategy for OTSCC. Meta-analysis study of gene expression profiles indicates that OTSCC is a disease of ECM degradation leading to activated EMT processes implying the aggressive nature of the disease. The triggers for these processes should be studied further. Newer clinical application with agents that can inhibit the mediators of ECM degradation may be a key to achieving clinical control of invasion and metastasis of OTSCC.


Introduction
Oral Tongue Squamous Cell Carcinoma (OTSCC) is regarded as a biologically unique entity compared to cancers occurring in the other oral sub-sites. The trend in epidemiology of oral cancer in Asia in the past decade (2000-2012) shows OTSCC as the most frequently affected oral sub-site. [1] Earlier studies also report a higher incidence of OTSCC in India compared to other countries. [2,3,4,5] According to the population based cancer registry (PBCR), the age adjusted incidence rate (AAR) for OTSCC in Chennai shows an increasing trend from 3.6 to 5.7 per 100,000 persons above 25 years. Though there are indicators of poor prognosis for OTSCC such as occult node positivity, tumor depth, lymphovascular invasion and perineural invasion, there is still a need for reliable and robust molecular prognostic biomarkers to identify patients who are likely to have an adverse outcome.
Microarrays, a tool for genomic-scale profiling of gene expression, are a well known and potentially valuable means of understanding the complex interactions and networks in the development of several diseases including cancer. [6,7] These high throughput studies offer the advantage of understanding the biology of a cancer through an exhaustive analysis. The launch of public microarray data archives like Gene Expression Omnibus and the advent of advanced computational informatics tools have made it possible to compare and converge gene expression studies done independently across different platforms. However, the hallmark of scientific progress is reproducibility of published outcomes, which has been difficult to achieve for several microarray studies. The major sources of discordance are variation caused by random noise, biological and experimental differences, and differences in technical methods. [8] Findings are often not reproducible across studies because of data perturbations in individual studies, improper validations, and insufficient control of false positives. Despite these obstacles, several groups have successfully gleaned important insights from the focused comparison of disparate microarray results. [9,10] Many of the limitations can be mitigated by the use of standard reporting methods, together with careful application of large-scale meta-analysis techniques.
The current study presents, for the first time, a meta-analysis of OTSCC as an exclusive sub-site, which is our primary objective. It attempts to overcome the limitations of the individual expression profiling studies by resolving inconsistencies and reducing the likelihood of random errors, thus laying a foundation for uncovering the molecular aspects of OTSCC. We present the differentially expressed genes (DEG) obtained by comparing the OTSCC and normal expression profiles, along with the signaling pathways involved. As our second objective, we validated two biomarkers identified in the meta-analysis, MMP9 and E-cadherin, in prospective and retrospective clinical samples.

Materials and Methods
Identification of eligible OTSCC gene expression data sets
OTSCC expression profiling studies were identified by searching the PubMed database. The following keywords and their combinations were used: "Oral tongue cancer gene expression microarray". The Gene Expression Omnibus database (http://www.ncbi.nlm.gov/geo) was also searched for the terms "Oral Tongue Cancer", "Oral Tongue Squamous cell carcinoma" and "mobile tongue cancer".
Inclusion Criteria. Gene expression data sets from exclusively anterior 2/3 (mobile tongue cancer) were taken for the study. The original experimental studies comprising of gene expression values for tongue tumor and normal tissues were taken. The expression data sets obtained from only standard microarray platforms were undertaken for the current study.
Exclusion Criteria. Studies on head and neck cancer with a few samples on tongue cancer were excluded. Studies from datasets with base of tongue samples, tongue cancer cell lines, non human tissues were excluded. Studies without inclusion of normal samples were excluded.

Individual Study Analysis
GEO accession number, sample type, platform, number of cases and controls, references and gene expression data were extracted from each of the selected studies. The Entrez ID, gene symbol, and gene description were assigned to each probe in the series matrix expression file based on the corresponding platform file. All the expression values were base 2 log transformed. The dChip software was used to remove duplicates for a given Entrez ID by averaging the expression values from multiple probes corresponding to the same gene. [11,12] The expression values in each data-set were normalized by global median normalization. Each of the data-sets was first analyzed by BRB ArrayTools version 4.4 (http://linus.nci.nih.gov/BRB-ArrayTolls.html) [13] to identify the DEG between the normal and tongue tumour tissue samples. Study specific, gene specific P values were calculated using two sample random permutation t tests. A t statistic (t) for an individual gene was calculated and compared with 10,000 t statistics generated by randomly assigning the sample labels to the expression values of the gene. A transcript was considered differentially expressed when the difference in the mean expression was > 2 fold with a significance of P < 0.05. The common genes across the 5 data-sets were derived by the combined P-value method. To improve cross-batch comparisons, the individual profiles were analyzed independently before combining them. Class comparison by BRB ArrayTools was used to identify the differentially expressed genes between tumor and normal samples in the independent studies. Fisher's χ2-based method was used to calculate the combined P-value for each gene; the combined P-values and the average fold-change of each gene across all the data sets are summarized in S1 Table.

Meta-analysis of the OTSCC data
GEO data tables were constructed from the gene-expression data as mentioned earlier, with genes/probes in the rows and samples in the columns, and uploaded into INMEX (http://www.inmex.ca/INMEX). [14] The data were annotated by converting the different gene or probe IDs to Entrez IDs. For each probe set, intensity values were subjected to simple log 2 transformation followed by quantile normalization. The FDR was fixed at 0.01. After all the data-sets were uploaded, they were processed and annotated, and a data integrity check was performed prior to meta-analysis. Cochran's Q test was used to calculate the weighted sum of squared differences between the individual study effects and the pooled effect across studies. The QQ plot was inspected and, since the estimated Q values deviated significantly from the Chi squared distribution, a random effects model with a significance of 0.05 was used for meta-analysis. The random effects model presumes that different studies present substantial diversity, and evaluates the between study variance along with the within study sampling errors. [15] The statistical analysis was done using the INMEX program.
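The heterogeneity check and random effects pooling described above can be illustrated for a single gene. The following is a minimal sketch, assuming the widely used DerSimonian-Laird estimator of the between study variance; the source does not state which estimator INMEX applies internally, and the function name and input layout are hypothetical.

```python
import math

def random_effects_meta(effects, variances):
    """Pool per-study effect sizes for one gene (illustrative sketch).

    effects   -- effect size of the gene in each study
    variances -- within-study sampling variance of each effect size
    Assumption: DerSimonian-Laird estimator; not confirmed as INMEX's method.
    """
    k = len(effects)
    # Inverse-variance (fixed effect) weights and pooled estimate
    w = [1.0 / v for v in variances]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
    # Cochran's Q: weighted sum of squared deviations from the pooled effect
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    # Between-study variance tau^2, truncated at zero
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - (k - 1)) / c)
    # Random effects weights add tau^2 to every within-study variance
    w_star = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    # Two-sided P value from the standard normal distribution
    p = math.erfc(abs(pooled / se) / math.sqrt(2.0))
    return {"Q": q, "tau2": tau2, "pooled": pooled, "se": se, "p": p}
```

Under the null hypothesis of homogeneity, Q follows a Chi squared distribution with k - 1 degrees of freedom, which is why a systematic departure of the observed Q values from that distribution argues for the random effects model.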

Functional Analysis
The Functional analysis of INMEX was used to generate a new hypothesis by exploiting the characteristics of the DEG in the meta-analysis. A heat map was created using the pattern extractor produced gene expression profiles across the different data-sets and conditions used for the study. A Gene Ontology (GO) enrichment analysis was performed using a web-based Software GENECODIS (http://genecodis.cnb.csic.es) to interpret the biological implications of the DEG in OTSCC. [16] GENECODIS integrated information from different sources like NCBI Entrez Gene, KEGG, Swiss-Prot and other databases for concurrent enrichment studies. Two statistical tests, namely, the hypergeometric distribution and the χ2 test of independence were applied to identify categories, and their combinations, that were significantly enriched in the list of genes. In addition, we also performed the pathway enrichment analysis based on the Kyoto Encyclopedia of Genes and Genomes (KEGG) database.
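The hypergeometric test mentioned above can be sketched for a single annotation category. This is an illustrative example only: the function and parameter names are hypothetical, and it does not reproduce GENECODIS's handling of category combinations or its multiple testing correction.

```python
from math import comb

def hypergeom_enrichment_p(n_background, n_category, n_deg, n_overlap):
    """One-sided enrichment P value, P(X >= n_overlap).

    n_background -- genes in the reference set
    n_category   -- reference genes annotated to the category
    n_deg        -- genes in the DEG list
    n_overlap    -- DEG annotated to the category
    """
    upper = min(n_category, n_deg)
    total = comb(n_background, n_deg)
    # Sum the hypergeometric probabilities of the observed overlap or larger
    tail = sum(
        comb(n_category, x) * comb(n_background - n_category, n_deg - x)
        for x in range(n_overlap, upper + 1)
    )
    return tail / total
```

A small P value indicates that the category is over-represented in the DEG list relative to the reference set; in practice the P values across categories are adjusted for multiple testing before interpretation.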

Network construction
The protein-protein interactions (PPIs) were derived using GeneMANIA (www.genemania.org). [17] GeneMANIA created a consensus network for the genes of interest, namely the top 30 up regulated and top 30 down regulated genes, based on the genomic networks available in public sources like Reactome, BioCyc, BioGRID and Pathway Commons. Here we studied the co-expression, physical interaction, genetic interaction, shared protein domains, co-localization, pathways involved and predicted functional relationships between the differentially expressed genes relevant to OTSCC. We also used STRING (www.string.db.org) for network construction. [18] The STRING database contains known and predicted protein interactions. The interactions include direct (physical) and indirect (functional) associations derived from four sources, namely genomic context, high throughput experimentation, previous knowledge and conserved co-expression. STRING quantitatively integrated the interaction data from these sources to derive the integration maps.

Ethical Statement
All research involving human participants was approved by the authors' Institutional Review Board (IRB) and all clinical investigations were conducted according to the principles expressed in the Declaration of Helsinki. Written informed consent was obtained from all the participants and the content of the informed consent was approved by the respective Institutional Review Boards, namely Cancer Institute WIA; Protocol 1 HNCOG (Cancer Institute, Womens India Association; Protocol 1 Head and Neck Co-operative Oncology Group); SBDCECM105/13/58 (Sree Balaji Dental College Ethical Committee Meeting 105/13/158) and IHEC/SD/MDS/120MP2 (Institute Human Ethics Committee/ Saveetha Dental/ Master of Dental Surgery/120Maxillofacial Pathology 2). For patients who were illiterate, fingerprints were obtained after the protocol was explained, and written consent was additionally taken from a relative of the patient acting as witness.

Patient Data
Retrospective samples used for the study were from paraffin embedded sections of early staged tongue cancers (clinical stages, T1 and T2 with N0) (n = 167) from patients presenting between the years 1995 and 2007 to the Head and Neck Oncology clinic, Cancer Institute (WIA). The prospective primary tongue cancer samples (n = 21), corresponding adjacent apparent normal tissue (n = 4) were obtained from the patients presenting to the Head and Neck Oncology clinic, Cancer Institute (WIA) and taken for wide excision glossectomy. Histologically normal tongue tissues (n = 4) and formalin fixed paraffin embedded tongue normals (n = 10) were obtained from patients presenting to the dental outpatient clinic of Sree Balaji Dental College and Hospital and Saveetha Dental College and Hospital. These patients were non tobacco users and presented with erythema on tongue suspected clinically as erythroplakia due to irritation. After biopsy, these samples were found histologically normal. All the prospective samples were collected and stored in RNA later. Variables recorded and evaluated for the study included age, sex, site, pattern of the lesion, depth of invasion, clinical stage, histological grade (both Broders grading and Bryne's grading), occult node positivity and tobacco habits.

Treatment Details
Comprehensive history and physical examination of the oral cavity and additionally upper aero-digestive tract with neck imaging was done using ultrasound for the OTSCC patients. The patients had undergone standard treatment at Cancer Institute (WIA) consisting of either wide excision glossectomy or brachytherapy, with or without selective neck dissection (Levels I to IV). Patients unwilling/unfit for surgery were treated using External Beam Radiotherapy as per the decisions of multidisciplinary board of the Institution. Pattern of Failure and good outcome was recorded for each patient.

Histopathological analysis
Each patient had undergone a routine evaluation which included a biopsy for histological confirmation of cancer. All the clinical samples used for the validation studies (n = 206) were individually examined by two oral pathologists (VS and PR) to assess the histopathology along with the percentage of tumor cells present. Only tissues with >70% tumor cells were included in the validation studies. The invasive tumor front grading was done on OTSCC H and E stained slides according to the criteria of degree of keratinisation, nuclear pleomorphism, pattern of invasion and lymphoplasmacytic infiltrate, giving scores of 1-4 as Bryne's grade and 1-8 as invasive pattern grading score (IPGS), as described previously. [19,20] When a true ITF was absent in the section, the deepest portion of the visible tumor was graded. The depth of invasion was measured from the highest portion of the basement membrane to the deepest portion of the tumor as described before [21], directly in micrometers, using ProgRes CapturePro 2.8.8 software (JENOPTIK optical systems) at 4x objective magnification.

Real time PCR
Real time PCR was performed on prospective tissue specimens (n = 29) to measure the mRNA expression of MMP9 and E-cadherin. The primer sequences used for the study are shown in Table 1. The quantitative real-time RT-PCR was performed using FastStart Universal SYBR Green Master (Rox) (Roche) according to the manufacturer's instructions on a 7500 Real Time PCR System (Applied Biosystems). The universal thermal cycling conditions used were as follows: 10 min at 95°C, followed by 40 cycles of denaturation at 95°C for 15 sec and annealing and extension at 60°C for 1 min. Data were collected at every temperature phase during each cycle. The β-Actin gene was used as a reference control to normalize the expression values. Triplicate reactions were performed for each gene and the mean value was computed for subsequent analysis. The relative expression level of the genes (fold change) was calculated using the comparative threshold cycle (Ct) method, 2^(-ΔΔCt).
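The comparative Ct calculation can be written out as a short worked example. This is a minimal sketch of the standard 2^(-ΔΔCt) formula, where ΔCt = Ct(target) - Ct(reference) within a sample and ΔΔCt = ΔCt(tumor) - ΔCt(normal); the function name and the Ct values in the usage line are hypothetical and are not data from the study.

```python
def fold_change_ddct(ct_target_tumor, ct_ref_tumor,
                     ct_target_normal, ct_ref_normal):
    """Relative expression of a target gene in tumor versus normal.

    Each argument is a list of replicate Ct values (for example triplicates);
    the reference gene would be beta-actin in the protocol described above.
    """
    def mean(values):
        return sum(values) / len(values)

    # Normalize the target gene to the reference gene within each sample
    d_ct_tumor = mean(ct_target_tumor) - mean(ct_ref_tumor)
    d_ct_normal = mean(ct_target_normal) - mean(ct_ref_normal)
    # Compare tumor with normal and convert to a fold change
    dd_ct = d_ct_tumor - d_ct_normal
    return 2.0 ** (-dd_ct)

# Hypothetical Ct values: the target crosses threshold 3 cycles earlier in tumor
fold = fold_change_ddct([22.0, 22.0, 22.0], [18.0, 18.0, 18.0],
                        [25.0, 25.0, 25.0], [18.0, 18.0, 18.0])
```

With these illustrative inputs ΔΔCt is -3, which corresponds to an 8 fold higher expression in the tumor. The formula assumes approximately 100% amplification efficiency for both the target and the reference gene.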

Immunohistochemistry
The IHC detection of MMP9 and E-cadherin expression was performed on 5 μm sections of FFPE tissues (n = 167) and histologically normal sections (n = 10). The sections were deparaffinised in xylene and rehydrated in absolute ethanol. Antigen retrieval was done with 0.05M Tris Buffer (pH 9) in pressure cooker for 20 minutes. Endogenous peroxidase activity was blocked by incubation in 0.03% hydrogen peroxide in distilled water for 10 minutes and then washed with phosphate buffered saline (PBS). Sections were pre-incubated with power block (Bio-Genex Laboratories, San Ramon, CA) for 10 minutes and then incubated with primary antibody against MMP9 (BioGenex Cat#AN504, clone EP1255Y, rabbit monoclonal antibody) and E-cadherin (BioGenex Cat#AM390, clone 36, mouse monoclonal antibody) at room temperature for 90 minutes. MMP9 and E-cadherin expression was observed using the SuperSensitive™ Polymer-HRP IHC Detection System (BioGenex Laboratories, San Ramon, CA). Sections were counterstained with hematoxylin, dehydrated, and mounted in DPX. Positive controls and negative controls were included appropriately for MMP9 and E-cadherin where primary antibody was replaced with 2% BSA in negative control. Immunostaining of the sections was reviewed with the corresponding haematoxylin and eosin stained sections.
Scoring of MMP9 and E-cadherin. The immunohistochemically stained tissue sections were reviewed and scored independently by two oral pathologists (VS and PR), who were blinded to the clinical parameters. For MMP9, the staining intensity was scored as shown previously. [22,23] The extent of MMP9 staining was defined as the percentage of positively stained tumor cells or normal tongue epithelial cells in relation to the whole tissue area, scored on a scale of 0 to 4 as follows: 0, <10%; 1, 10-25%; 2, 26-50%; 3, 51-75%; 4, >75%. The sum of the staining intensity and staining extent scores gave the final staining score. For statistical analysis, final staining scores of 0-5 and 6-7 were considered low and high MMP9 expression respectively. E-cadherin was scored as described before. [24] Expression was scored separately in representative areas of the mid tumor region and in the invasive tumor fronts.
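The MMP9 scoring rule can be expressed as a small function. This is an illustrative sketch under two assumptions that are not stated in the text: the intensity score ranges from 0 to 3 (implied by the maximum final score of 7), and the extent categories are read as non-overlapping bands with boundaries at 10%, 25%, 50% and 75%. The function names are hypothetical.

```python
def mmp9_extent_score(percent_positive):
    """Map the percentage of positively stained cells to an extent score 0-4."""
    if percent_positive < 10:
        return 0
    if percent_positive <= 25:
        return 1
    if percent_positive <= 50:
        return 2
    if percent_positive <= 75:
        return 3
    return 4

def mmp9_final_score(intensity, percent_positive):
    """Return (final score, expression group) for one section.

    intensity -- staining intensity score, assumed here to lie in 0-3
    """
    if intensity not in (0, 1, 2, 3):
        raise ValueError("intensity is assumed to be an integer from 0 to 3")
    final = intensity + mmp9_extent_score(percent_positive)
    # Final scores 0-5 are grouped as low, 6-7 as high
    group = "high" if final >= 6 else "low"
    return final, group
```

For example, a section with intensity 3 and 60% positive cells receives a final score of 6 and falls in the high expression group, whereas intensity 3 with 50% positive cells gives a final score of 5 and falls in the low group.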

Statistical Analysis
All statistical analyses were done in SPSS version 16.0. The distribution of categorical variables was compared by Pearson's Chi-squared test or Fisher's exact test according to the expected frequency counts. Overall Survival (OS) in months was calculated from the start of the

Studies included in OTSCC Meta-analysis
In the current study, we collected a total of 5 expression profiling data-sets according to the specified inclusion and exclusion criteria, comprising 111 tongue tumors and 79 normals serving as controls. Selected details of the individual data-sets are summarized in Table 2. The current study design is shown as a flow chart (Fig 1).

Class Comparison and common genes across databases by BRB array tools
The common genes across the 5 data-sets were derived by combined P-value method. Class comparison by BRB array tools gave a list of 434 DEG, of which 194 were up-regulated and 241 were down-regulated. (S1 Table) The advantage of this method is that the actual fold-change of gene expression and the consistencies of deregulation across the 5 datasets were also taken into consideration, which helped us to narrow down on genes that are both biologically and statistically significant.
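Fisher's combined P-value method used here can be illustrated for one gene. The following is a minimal sketch: the statistic X = -2 Σ ln(p_i) over k independent studies is compared with a Chi squared distribution with 2k degrees of freedom, for which the survival function has a closed form because the degrees of freedom are even. The function name is hypothetical, and the sketch is not the BRB ArrayTools implementation.

```python
import math

def fisher_combined_p(p_values):
    """Combine independent per-study P values for one gene.

    Returns (chi-squared statistic, combined P value).
    Every P value must be greater than 0.
    """
    k = len(p_values)
    stat = -2.0 * sum(math.log(p) for p in p_values)
    # Survival function of chi-squared with 2k degrees of freedom:
    # exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    half = stat / 2.0
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return stat, math.exp(-half) * total
```

Because the method uses only P values, it does not carry the direction of change, which is one reason the fold-change and its consistency across the data-sets are examined alongside the combined P-value.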

Individual data analysis followed by Meta-analysis of Gene Expression in OTSCC
The individual datasets were subsequently loaded into INMEX and subjected to simple log 2 transformation and quantile normalization, with a fixed FDR of 0.01. A random effects model of Effect Size (ES) measures was applied to integrate the gene expression patterns across the studies. The DEG with a significance of <0.05 were selected. The total number of DEG from the meta-analysis is illustrated as a Venn diagram in Fig 2. The 'gained' genes (n = 178) are the DEG identified only in the current meta-analysis, and the 'lost' genes (n = 3315) are genes identified as DEG in any of the individual data analyses but not in the current meta-analysis. The lost genes are those that present conflicting changes in the expression profiles or show large variations between studies. A heat map showing hierarchical clustering depicts the total number of DEG (n = 2405) in the OTSCC datasets (Fig 3). There was a significant overlap in the DEG obtained by both the above methods. The top 30 up regulated and top 30 down regulated genes are shown in Tables 3 and 4. The S2 Table shows the full list of DEG categorized by combined ES and significant P value.
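The preprocessing step (log 2 transformation followed by quantile normalization) can be sketched as follows. This is an illustrative pure Python version, not the INMEX implementation; the function name and the list-of-columns layout are assumptions, and ties are broken by gene order rather than averaged.

```python
import math

def log2_quantile_normalize(columns):
    """Log2-transform and quantile-normalize a small expression matrix.

    columns -- one list of positive intensities per sample, all lists in
               the same gene order and of the same length
    Returns the normalized values in the same layout.
    """
    logged = [[math.log2(v) for v in col] for col in columns]
    n_genes = len(logged[0])
    # Mean of the r-th smallest value across samples defines the target
    # distribution shared by every sample after normalization
    sorted_cols = [sorted(col) for col in logged]
    rank_means = [
        sum(col[r] for col in sorted_cols) / len(sorted_cols)
        for r in range(n_genes)
    ]
    normalized = []
    for col in logged:
        # Gene indices ordered by expression within this sample
        order = sorted(range(n_genes), key=lambda g: col[g])
        out = [0.0] * n_genes
        for rank, gene in enumerate(order):
            out[gene] = rank_means[rank]
        normalized.append(out)
    return normalized

# Two hypothetical samples with three genes each
result = log2_quantile_normalize([[2.0, 4.0, 8.0], [16.0, 4.0, 64.0]])
```

After normalization every sample has the same set of values, and only the assignment of those values to genes differs, which removes sample-wide intensity differences before the effect sizes are computed.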

Functional Annotation of DEG in OTSCC
To gain insights into the biological roles of the DEG from OTSCC, we performed a GO category enrichment analysis. Gene ontology provides a common descriptive framework, functional annotation and classification for analyzing gene set data. GO categories were organized into three groups: biological process, cellular component, and molecular function (Fig 4A, 4B and 4C). For Biological process, the significantly enriched GO terms were Collagen Catabolic Process (GO:0030574; adj P value = 3.23E-07), Extracellular Matrix Disassembly (GO:0022617; adj P value = 1.74E-06), Response to Stress (GO:0006950; adj P value = 5.29E-06), Cellular Component Movement (GO:0006928; adj P value = 9.12E-06), and Biological Adhesion (GO:0022610; adj P value = 9.12E-06). For Molecular function, the enriched GO term was metallopeptidase activity (GO:0004222; adj P value = 2.55). S3 Table shows the significantly enriched GO processes of the top DEG.
OTSCC tumorigenesis involves breakdown of the extracellular matrix (ECM) mediated by up regulation of matrix metalloproteases (MMPs). Several matrix metalloproteases were found to be up regulated in OTSCC, namely MMP12, MMP1, MMP13, MMP10, MMP3 and MMP9, and they were among the top 30 up regulated genes involved in the aggressive course of OTSCC (Table 3). It is well known that MMPs secreted by both the tumor and stromal cells are the chief players in ECM degradation, promoting invasion and metastasis.

The other important Epithelial to mesenchymal transition (EMT) genes have important roles in OTSCC tumorigenesis. Cadherins CDH19, CDH16 and CDH11 were found to be down regulated in OTSCC tumors as a read out of the EMT process. Several genes implicated in EMT were found to be up regulated in OTSCC. Most of the integrins (ITGA5, ITGA2, ITGB6, ITGB4, ITGA3, and ITGA6) showed a combined ES value greater than 1.2 and, notably, GSK3beta was also up regulated. Tenascin (TNC), a known anti-adhesive ECM molecule, and its known receptor integrin alpha beta 6 (ITGB6), which is important for assigning a mesenchymal property to the epithelial cell, were found to be up regulated. SNAI2/SLUG, an important gene in the EMT process, was up regulated in OTSCC (Combined ES = 1.934). The encoded protein is well known to be involved in epithelial-mesenchymal transitions and also has anti apoptotic activity. SNAI2/SLUG acts as a transcriptional repressor that binds to E-box motifs and is known to repress E-cadherin transcription. We found down regulation of genes like CAV2, KRT19, MITF, and NUDT13, which are also important in driving the EMT pathways.

Protein-Protein Interaction Network shows the members of the uPA-uPAR System having important implications in OTSCC. Construction of protein-protein interaction networks using the gene list obtained showed the uPA and uPAR system to be chiefly up regulated in OTSCC (Fig 5). The PPI network also showed Osteopontin (SPP1), up regulated in OTSCC tumors, directly interacting with uPAR. The urokinase plasminogen activator (uPA) system, consisting of the genes uPA (PLAU) (Combined ES = 2.3292) and its receptor uPAR (PLAUR) (Combined ES = 1.895), was found to be up regulated in OTSCC tumors compared to normals.
Collagen Catabolism pathway is activated in OTSCC. Several genes involved in collagen catabolism pathway were found to be differentially expressed in OTSCC, of which COL4A6, COL5A2 showed significant up regulation and COL14A1 showed down regulation.
The down regulated genes in OTSCC are implicated in detoxification of carcinogenic compounds and environmental toxins. Among the down regulated genes, Hepatic Leukemia Factor (HLF; Combined ES = -2.1969) encodes a member of the proline and acidic-rich (PAR) protein family, a subset of the bZIP transcription factors. The encoded protein forms homodimers or heterodimers with other PAR family members and binds sequence-specific promoter elements to activate transcription. OTSCC also showed down regulation of

Validation Studies
Quantitation of MMP9 and E-cadherin mRNA expression in OTSCC tumors. MMP9 mRNA was significantly up-regulated in OTSCC compared to apparent normal (**P value = 6E-05) and compared to histologically normal tongue tissue (*P value = 0.02). E-cadherin mRNA expression was found to be down regulated in tumors; however, the difference between the pairs of tumors and normals was not statistically significant (Fig 6).

MMP9 protein over-expression denotes poor prognosis and is significantly associated with disease recurrence and poorer DFS in OTSCC patients. Immunohistochemistry showed normal tongue mucosa epithelium negative for MMP9 expression. Positive MMP9 staining was noted in muscle cells, which served as internal positive control. The characteristic distribution pattern of MMP9 in our series was diffuse expression in both the tumor and stromal areas of OTSCC. Fig 7A, 7B and 7C show the pattern of expression of [...] (Fig 8)

Loss of membrane positive E-cadherin expression at invasive tumor front indicates failure and poorer DFS. E-cadherin staining results are reported for 156 patients with OTSCC. In 11 samples the staining was unfit for interpretation because of folds and loss of tissue. Fig 9A shows the pattern of expression of E-cadherin in OTSCC. The membrane staining of E-cadherin was found to be preserved in 107/156 (68.5%) and staining was found to be absent for [...] There was a significantly higher nodal and loco-regional recurrence among patients whose [...]

Other Prognostic Variables in OTSCC. Occult cervical node metastasis detected by elective neck dissection and a subsequent neck recurrence was observed in 17% of patients, of which 30% belonged to stage T2. Univariate analysis showed tobacco habits and clinical stage tending to influence DFS, but OS was found to be significantly associated with T stage (P value = 0.013).
However, occult node positivity was found to be a significant predictor of poor DFS (P value = 0.007 and 0.019) by both univariate and multivariate Cox regression analysis models respectively along with the biomarkers discussed.

Discussion
The current study is the first attempt at an integrated meta-analysis of OTSCC gene expression profiles, with the primary objective of deriving a molecular portrait of tongue cancer. We show that meta-analysis can increase the statistical power towards a more precise estimate of gene expression, increasing the reliability, resolving inconsistencies, and reducing the likelihood of random errors. Our analysis showed a "loss" of 3315 genes, indicating that these genes would otherwise have been deemed significant and pursued in individual studies but are not perceived as relevant when analyzed across the 5 datasets. A "gain" of 178 genes depicts those that are significant only in the meta-analysis. Together with the common genes that are consistent across the datasets, these are more reliable and reproducible for unraveling OTSCC biology. OTSCC has a poor outcome, with a survival rate of ~50% which has remained unchanged for the past four decades. [30] We have shown previously that OTSCC in the Indian population shows a changing epidemiological trend, with nearly 50% of the cases being non-tobacco associated [3], and also that, unlike in oropharyngeal cancers, p16 expression was not found to be a surrogate marker for HPV in OTSCC. [31,32] The rationale of the current study was therefore to understand the biology of OTSCC using gene expression profiles with a larger sample size, deriving the molecular aspects to help develop newer management strategies for this aggressive cancer in future.

Tumor Microenvironment with MMPs, various ECM and EMT mediators in OTSCC
The molecular portrait derived in the current study shows OTSCC to be a disease of the tumor microenvironment. The top deregulated genes are those involved in extracellular matrix remodeling and epithelial to mesenchymal transition (EMT) processes, which can contribute towards an increased propensity for invasion leading to the poorer outcome of OTSCC. Six members of the MMP family (MMP12, MMP1, MMP13, MMP10, MMP3, and MMP9) were among the top up regulated genes. These can play a significant role in all aspects of tumor progression via the enhancement of tumor-induced angiogenesis and the destruction of the local tissue architecture, thus allowing tumor invasion and metastasis. [33] Invasion of the tissues and metastasis to neck nodes is very common in OTSCC and carries an adverse prognosis. In our current series also, occult node positivity was very significantly associated with a poor DFS. The MMPs are well known to facilitate ECM degradation, collagen breakdown and basement membrane degradation, which are important for tumor cells to gain access to lymphatics and blood vessels, resulting in dynamic changes in the structure of the ECM and implying the aggressive course of the disease. [34] Some of the other relevant genes that would be interesting to pursue in future studies are LAMC2, MYO1B and podoplanin. LAMC2 (a major component of laminin-5) has been reported to be highly expressed in several types of invasive tumors [35,36] including carcinomas of the pancreas [37,38], tongue [39,40], colorectal [41,42], lung [43], cervix [44], lung [45] and extra hepatic cholangiocarcinoma. [46] An elevated level of LAMC2 in human cancers has been shown to be associated with poor survival in esophageal cancers. [47] Recent studies have shown that MYO1B is functionally involved in lymph node metastasis of human HNSCC. [48] Podoplanin, identified in the current study, has been reported as a reliable marker for determining the presence of lymphatic vessels, differentiating lymphatic endothelium from the endothelium of blood vessels. Increased podoplanin has been shown to reflect increased lymphatic vessel density indicative of lymph node metastasis. [49,50,51,52] Earlier studies suggest that podoplanin mediates the remodeling of the actin cytoskeleton via the down regulation of the activities of small Rho family GTPases. [53] Therefore LAMC2, MYO1B and podoplanin can be further explored as useful markers for OTSCC prognosis.

uPA-uPAR
Our protein-protein interaction network data showed that the uPA-uPAR system is chiefly involved in OTSCC tumorigenesis, suggesting that it is an important pathway warranting further study with relevant inhibitors. The uPA-uPAR system has implications for tumor survival, growth, migration, invasion and metastasis. It is also well known that the cleavage of plasminogen to active plasmin is catalyzed by uPA bound to uPAR, and that this unleashes a cascade of proteolysis by plasmin, whose substrates include MMP2 and MMP9, which can degrade the extracellular matrix. [54,55]

Detoxification pathways down regulated
Interestingly, the current study showed the down regulated genes to be involved in xenobiotic metabolic pathways that are important for the metabolism of carcinogens, which may or may not be tobacco related. Our data showed down regulation of FAM107A, warranting a more detailed study in OTSCC in the future. Studies have shown that FAM107A is dramatically decreased in non-small cell lung cancers (NSCLC), with a minority of samples showing promoter methylation. [56] FAM107A has been described in renal cell carcinoma as a putative tumor-suppressor gene on the basis of its role in the regulation of apoptotic processes [57,58] and has been found in the cluster of hypermethylated and transcriptionally repressed genes in hepatocellular carcinoma. [59] Our study also showed down regulation of GSTM5, a mu class GST enzyme involved in the detoxification of electrophilic compounds including carcinogens, therapeutic drugs, environmental toxins and products of oxidative stress. EPHX2, another prominent down regulated gene identified in the current study, codes for an epoxide hydrolase. Epoxide hydrolases are important biotransformation enzymes that convert epoxides obtained from the degradation of aromatic compounds to trans-dihydrodiols, which can be conjugated and excreted from the body. Down regulation of these enzymes in OTSCC may therefore result in inefficient degradation of epoxides to dihydrodiols. MITF, another down regulated gene in our data set, has been described by a 'rheostat model' in which it controls phenotypic switches between proliferative, differentiated and tumorigenic/invasive phenotypes, as shown in malignant melanoma. [60]

Validation of MMP9 and E-cadherin
Conventional histopathology lacks the power to discriminate between lesions that will recur and those that will not, and therefore has little efficacy in the prognostication of early stage tongue cancers. Our study showed that patients with MMP9 over expression had a poorer DFS, which is strongly indicative of lymphatic metastasis and consistent with earlier reports. [61,62,63,64,65] E-cadherin, a well known EMT marker and membrane adhesion protein, was considered preserved when localized to the membrane and aberrant when observed in the cytoplasm. The current study showed that loss of E-cadherin at the ITF indicated a poorer prognosis, as shown previously. [66,67,68] The current study suggests that the pattern of MMP9 and E-cadherin protein expression, both in the centre and superficial areas of the OTSCC tumor and at the ITF, as evaluated by immunohistochemistry, can predict the treatment outcome. It is well known that the transformation of cohesive and polarized epithelial cells into highly mobile mesenchymal-like cells is the hallmark of EMT. [69,70,71,72,73] Since the deepest and most invasive areas of the tumor are histologically located at the ITF, these areas are known to orchestrate the clinical behavior. [19,74] Thus, combined MMP9 over expression and loss of E-cadherin in OTSCC imply the aggressive nature of the disease.
A limitation of the current meta-analysis is that all the samples studied belonged to a particular ethnic group, mostly Americans. The results may therefore vary marginally in other ethnic groups such as Asians, among whom OTSCC is more prevalent. Still, the present study provides a comprehensive view of the biology of OTSCC along with the relevant genes and pathways to be pursued.

Conclusion
The current study indicates that the ECM is severely compromised and EMT processes are activated in OTSCC. The key genes involved in degradation of the basement membrane and lamina propria, and in EMT activation, can lead to infiltration and metastasis, as explained by activated collagen catabolism pathways. Down regulation of genes involved in the xenobiotic metabolism of carcinogens should also be probed further for mechanisms. The probable "triggers" for these conditions have to be explored in detail. Validation of the meta-analysis results showed that prognostic implications for OTSCC can be derived by evaluating E-cadherin and MMP9 expression at the invasive tumor front by a routine technique such as immunohistochemistry. Exploring these markers in liquid biopsies may also help in determining prognosis and detecting residual disease post treatment. Clinical application of newer agents that can inhibit the mediators of ECM degradation may be a key to achieving clinical control of invasion and metastasis of tongue squamous cell carcinoma.
Supporting Information S1