Similar Squamous Cell Carcinoma Epithelium microRNA Expression in Never Smokers and Ever Smokers

The incidence of oral tumors in patients who never used mutagenic agents such as tobacco is increasing. In an effort to better understand these tumors we studied microRNA (miRNA) expression in tumor epithelium of never tobacco users, tumor epithelium of ever tobacco users, and nonpathological control oral epithelium. A comparison of levels among 372 miRNAs in 12 never tobacco users with oral squamous cell carcinoma (OSCC) versus 10 healthy controls was made using the reverse transcription quantitative polymerase chain reaction. A similar analysis was done with 8 ever tobacco users with OSCC. These comparisons revealed miR-10b-5p, miR-196a-5p, and miR-31-5p as enriched in the tumor epithelium in OSCC of both never and ever tobacco users. Examination of The Cancer Genome Atlas (TCGA) project miRNA data on 305 OSCCs and 30 controls revealed 100% of those miRNAs enriched in never smoker OSCCs in this patient group were also enriched in ever smoker OSCCs. Nonsupervised clustering of TCGA OSCCs was suggestive of two or four subgroups of tumors based on miRNA levels with limited evidence for differences in tobacco exposure among the groups. Results from both patient groups together stress the importance of miR-196a-5p in OSCC malignancy in both never and ever smokers, and emphasize the overall similarity of miRNA expression in OSCCs in these two risk groups. This implies that there may be great similarity in etiology of OSCC in never and ever smokers and that classifying OSCC based on tobacco exposure may not be helpful in the clinic.


Introduction
MicroRNAs (miRNAs) in mature form are noncoding RNAs, 19 to 25 nucleotides in length, with the ability to inhibit the translation and shorten the half-life of mRNAs [1]. MiRNAs can directly regulate multiple mRNAs, which encode the proteins that control important cellular processes. Many of these regulated pathways, including apoptosis, cell proliferation, and cell migration, can also contribute to cancer [2][3][4]. There are over 2000 known miRNAs, a subset of which show changes in levels that correlate with various cancers [5,6] (http://mirbase.org/). Global expression analysis of these miRNAs in different cancers has identified miRNAs that function as oncomirs, like miR-21-5p, and are consistently upregulated in some cancer types, while other miRNAs are reduced in certain tumor types and appear to be tumor suppressors [7]. Various tumor types have been characterized by a signature of miRNA levels associated with these tumors and their progression, which may aid in diagnosis and prognosis [5,6]. The small size and regulatory function of miRNAs have also made them the focus of research using them or similar molecules to change tumor cell properties and thus treat cancer [8][9][10].
Much effort has gone into describing a set of miRNAs that show consistent changes in levels, first with head and neck squamous cell carcinomas (HNSCCs) [11][12][13] and more recently with subsets of these cancers such as oral squamous cell carcinoma (OSCC) [14]. The results of these analyses have shown some consistencies, such as fairly universal upregulation of miR-21-5p, and somewhat lower consensus on other potential oncomirs, probably due to the variable amount of mixed epithelium/stroma in samples and diversity of etiology of the tumor subtypes. For example, oral pharyngeal cancer, unlike OSCC even in nonsmokers, is often associated etiologically with transforming HPV, specifically HPV16 [15][16][17]. Recent work has brought to light two distinct etiologies of OSCC, those associated with the main risk factor known, tobacco usage, and those not [18][19][20]. Like small-cell lung cancer in never smokers, OSCCs in never tobacco users seem to be distinct [14]. Hereafter, in this report, we will abbreviate this group who do not use tobacco, or other mutagenic products such as betel nut, to "never smokers". OSCC in never smokers, which is on the increase in the United States, seems to strike on average both older and younger patients than those associated with tobacco usage. It also tends to present at an earlier stage, and occurs most frequently in the tongue and gingiva, not the floor-of-mouth where tobacco-associated OSCCs occur. Molecularly, OSCCs in never smokers show lower rates of p53 gene mutations, and there is some evidence of differences in gene expression [19][20][21][22]. Like tobacco-related OSCCs they are rarely associated with transforming HPV or enrichment of the p16 tumor suppressor [23][24][25][26]. Fewer than 10% of oral cancers are HPV gene expression positive even in patients with no history of tobacco use [24,26].
Overall little is known about the etiology or the changes in mRNA or miRNA associated with this subtype of OSCC [24]. We quantified levels of 372 miRNAs in 12 OSCC epithelial samples from never smokers versus 10 samples from a control group of subjects with apparently normal oral mucosa. We also tested levels of these miRNAs in a test group of OSCCs associated with tobacco usage. Next, we did similar comparisons using the miRNA expression data of the 344 control and OSCC samples from The Cancer Genome Atlas (TCGA) HNSCC cohort. Tumor samples of that study were dissected surgically and contain some stromal tissue. Together we used these datasets to compare miRNA expression in never and ever smokers with the goal of starting to gain insight into the etiology of OSCC in these two different OSCC risk groups.

Clinical sampling
Two brush cytology samples each were collected from 12 subjects who never used tobacco, or other mutagenic agents like betel nut, and who presented with oral lesions that were biopsy-proven OSCC. These patients were seen in the Oral and Maxillofacial Surgery Clinic at the University of Illinois Medical Center. Samples from normal controls were from 10 never tobacco users from oral sites that were normal on clinical examination by the oral surgeon. The second group of brush cytology samples was taken from 18 current or former tobacco users at lesion sites of either OSCC or nonmalignant disease with intact mucosa. Benign samples included mucosal lesions such as leukoplakia, all without dysplasia. All diagnoses were verified by histopathologic examination of surgically obtained tumor tissue for OSCCs and scalpel biopsy material for nonmalignant lesions. All subjects in all groups provided written consent to participate in accordance with guidelines of the Office for the Protection of Research Subjects of the University of Illinois at Chicago, the local Institutional Review Board that formally approved this research.

Brush cytology
Brush cytology was performed on patients as they presented in the clinic just prior to biopsy as described earlier, taking care to sample areas with intact epithelium [27]. Samples were immediately placed in Trizol (Life Technologies, Carlsbad, CA, USA), mixed, and frozen. We used a cervical cytology brush with RNA purification as described in Schwartz et al. [27].

RNA
Recent publications have stressed the problems with usage of Trizol to isolate miRNA with ethanol or isopropanol precipitations when RNA levels show a wide range [28]. While cells were stored in Trizol, we used a methodology similar to that recommended by Kim et al. and immediately following phase separation all samples were subjected to silicate-based binding purification to prevent selective miRNA loss (S1 Fig). We used RNeasy chromatography (Qiagen, Germantown, MD, USA) to remove mRNA followed by ethanol addition and RNeasy MinElute chromatography (Qiagen) to bind then elute small RNAs, including mature miRNA. There was a 6-fold range in sample RNA levels based on RT-PCR with similar average levels in the malignant and nonmalignant groups.
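As a rough guide to how a fold difference in input RNA translates into RT-PCR cycles, the following minimal sketch converts a fold range or dilution factor into an expected Ct shift. It assumes ideal amplification (product doubling every cycle) unless another efficiency is supplied; the function name `expected_ct_shift` is hypothetical and is not part of the published analysis.

```python
from math import log2

def expected_ct_shift(fold_difference, efficiency=1.0):
    """Expected Ct difference for a given fold difference in template.

    Assumes template amount scales as (1 + efficiency) ** (-Ct), so an
    efficiency of 1.0 means perfect doubling per cycle (an idealization).
    """
    return log2(fold_difference) / log2(1.0 + efficiency)

# A 6-fold range in RNA level corresponds to roughly 2.6 cycles,
# and a 9x dilution to roughly 3.2 cycles, under ideal doubling.
print(round(expected_ct_shift(6), 2))
print(round(expected_ct_shift(9), 2))
```

Under this idealization, a uniform shift of about three cycles across all miRNAs after a 9x dilution would be consistent with recovery that does not favor particular miRNA species.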

Quantitative RT-PCR
10 ng RNA was reverse transcribed in 5 µl reactions using the miRCURY LNA Universal RT microRNA PCR, Polyadenylation and cDNA synthesis kit (Exiqon, Woburn, MA, USA). cDNA was diluted 20 fold and assayed in 10 µl PCR reactions according to the protocol for miRCURY LNA Universal RT microRNA PCR against a panel of 4 miRNAs and a spike-in control for cDNA synthesis. Of each sample pair from a single subject, the sample with the higher yield based on reverse transcription quantitative polymerase chain reaction (RT-qPCR) was subjected to a scaled up cDNA synthesis and was assayed once by RT-qPCR on the microRNA Ready-to-Use PCR, Human panel I (Exiqon), which includes 372 miRNA primer sets. Negative controls, excluding template from the reverse transcription reaction, were tested and profiled like the samples with individual primer pairs. The amplification was performed in an Applied Biosystems Viia 7 RT-qPCR System (Life Technologies, Carlsbad, CA, USA) in 384 well plates. The amplification curves were analyzed for Ct values using the built-in software, with a single baseline and threshold set manually for each plate. Results with miRNAs shown to be differentially expressed in the initial screen were corroborated using a similar RT-PCR assay minimally in duplicate (Exiqon).

miRNA data analysis
For RT-qPCR data analysis, 40 miRNAs were selected as standards to normalize Ct values for each plate. These references were chosen because they were among a large subgroup of miRNAs expressed in all samples. We used the delta delta Ct method to calculate expression values. All Ct values were imported into the Rank Product program, which ranks levels for each miRNA within a sample, multiplies the values for all samples in one group to get the rank product, and then calculates a combined probability of the distribution for each RNA in the two groups to determine the probability of differential expression [29,30]. A cutoff for the percentage (proportion) of false positives of 0.05 was taken as significant for differential expression. TCGA RNAseq data for 314 OSCC samples and 30 controls were downloaded from the TCGA Data Portal (https://tcga-data.nci.nih.gov/tcga) as normalized miRNA quantification files along with accompanying patient clinical information files. The exact names for TCGA-derived miRNAs were obtained by examining a subset of 5 individual sample isoform quantification files to identify each differentially expressed miRNA based on its mapped genomic site as the 3p or 5p isoform. When ambiguous this designation was not given. Normalized miRNA level counts were loaded directly into the RankProdit Program to perform rank products analysis [29,30]. Nonnegative Matrix Factorization Consensus Clustering, accessed through the GenePattern portal (www.broadinstitute.org/), was used to identify the optimal number of distinct sample clusters among the 305 OSCCs in the TCGA dataset with known smoking status [31][32][33]. First, all miRNAs with more than 50% samples with zero values were filtered out. Clustering was then done based on the levels of the 238 miRNAs which showed the greatest variation in expression levels (normalized standard deviation > 1). The cophenetic coefficient derived served as a measure of correlation between the sample distance induced by the consensus matrix.
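The delta delta Ct calculation mentioned above can be illustrated with a minimal sketch. It assumes that each miRNA Ct is normalized to the mean Ct of the reference miRNAs in the same sample and that amplification doubles product each cycle; the miRNA names and Ct values are hypothetical and are not data from this study.

```python
from statistics import mean

def delta_ct(sample_ct, reference_names):
    """Subtract the mean Ct of the reference miRNAs from every Ct in one sample."""
    reference = mean(sample_ct[name] for name in reference_names)
    return {name: ct - reference for name, ct in sample_ct.items()}

def fold_change(tumor_samples, control_samples, reference_names, target):
    """Return 2 ** (-ddCt) for one target miRNA, tumor group versus control group."""
    tumor = mean(delta_ct(s, reference_names)[target] for s in tumor_samples)
    control = mean(delta_ct(s, reference_names)[target] for s in control_samples)
    ddct = tumor - control
    return 2 ** (-ddct)

# Hypothetical Ct values: "miR-A" is the target, "ref1" and "ref2" are references.
tumor = [{"miR-A": 24.0, "ref1": 22.0, "ref2": 24.0},
         {"miR-A": 25.0, "ref1": 23.0, "ref2": 25.0}]
control = [{"miR-A": 27.0, "ref1": 22.0, "ref2": 24.0},
           {"miR-A": 28.0, "ref1": 23.0, "ref2": 25.0}]

print(fold_change(tumor, control, ["ref1", "ref2"], "miR-A"))
```

In this toy example the target miRNA reaches threshold three cycles earlier in tumors than in controls after normalization, which corresponds to an 8-fold enrichment.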
Heat maps for visual presentation of the miRNA expression data were generated using BRB Array tools [34]. For the representation of RNA from brush cytology data set we used hierarchical clustering with 1-correlation and average linkage of the expression levels of 50 miRNAs shown to be differentially expressed between OSCC and nonOSCCs based on class comparison using BRB Array tools. For the representation of TCGA expression data nonnegative matrix factorization was used to cluster samples based on the expression levels of 228 most variably expressed miRNAs.
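The selection of variably expressed miRNAs before clustering can be sketched as follows. This is an illustration only: it assumes that "normalized standard deviation" means the sample standard deviation divided by the mean, which the text does not define, and the function name and counts are hypothetical.

```python
from statistics import mean, stdev

def filter_variable_mirnas(counts, max_zero_fraction=0.5, min_normalized_sd=1.0):
    """Keep miRNAs that are detected often enough and vary strongly across samples.

    counts maps a miRNA name to its normalized counts, one value per sample.
    """
    kept = []
    for name, values in counts.items():
        zero_fraction = sum(1 for v in values if v == 0) / len(values)
        if zero_fraction > max_zero_fraction:
            continue  # zero in more than half of the samples
        average = mean(values)
        if average == 0:
            continue  # avoid division by zero
        # Assumption: normalized SD = SD / mean (coefficient of variation).
        if stdev(values) / average > min_normalized_sd:
            kept.append(name)
    return kept

# Hypothetical counts for three miRNAs across four samples.
counts = {
    "mostly_zero": [0, 0, 0, 5],   # 75% zeros, filtered out
    "stable": [10, 11, 9, 10],     # little variation, filtered out
    "variable": [1, 1, 1, 40],     # strong variation, kept
}
print(filter_variable_mirnas(counts))
```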

miRNA expression in OSCC in ever smokers
This work uses RNA from oral mucosal cells obtained by brush cytology [27,35] so we first sought to verify this approach to measure miRNA levels by examining miRNAs associated with OSCC in tobacco users. To focus on malignancy-specific pathologic changes, we compared expression of miRNAs in OSCCs of ever smokers versus that in nonmalignant lesions in a similar population. We compared epithelial miRNA from OSCC lesions in 8 patients, and nonmalignant oral lesions/conditions of 9 tobacco users, as outlined in Table 1. These included a granular cell tumor, mucosal aberrations such as fibrous hyperplasia and hyperkeratosis, and a soft tissue ameloblastoma of the gingiva. These lesions often show increased cell proliferation and possibly inflammation, but not other properties such as blocks to apoptosis and the increased tissue invasion that can occur with malignancy. We used the rank product methodology, a nonparametric statistical tool, to determine differentially expressed miRNAs [29,30]. This test for miRNA differential expression with OSCC in ever smokers revealed one miRNA that was induced specifically with OSCC using the criteria of >2-fold change and at p < 0.05 for the rank product test (Table 2A). This induced miRNA was miR-196a-5p, a miRNA associated with oral cancer in many studies. No miRNAs showed a decrease in expression.
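The core of the rank product statistic can be sketched in a few lines. This is a simplified illustration, not the Rank Product program used here: each miRNA is ranked within each replicate comparison, and the geometric mean of its ranks is taken. The permutation step that converts rank products into a proportion of false positives is omitted, and the miRNA names and values are hypothetical.

```python
from math import prod

def rank_descending(values):
    """Rank items so that 1 is the largest value; ties are broken by position."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    ranks = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        ranks[index] = rank
    return ranks

def rank_products(replicates, names):
    """Geometric mean of each miRNA's rank across replicate comparisons.

    replicates is a list of lists of log fold changes, each aligned with names.
    A small rank product indicates consistent upregulation.
    """
    k = len(replicates)
    ranks_per_replicate = [rank_descending(r) for r in replicates]
    return {
        name: prod(ranks[j] for ranks in ranks_per_replicate) ** (1 / k)
        for j, name in enumerate(names)
    }

# Hypothetical log2 fold changes (tumor versus control) in three comparisons.
names = ["miR-X", "miR-Y", "miR-Z"]
replicates = [[3.0, 0.1, -0.2], [2.5, -0.3, 0.4], [4.1, 0.2, 0.0]]
rp = rank_products(replicates, names)
print(min(rp, key=rp.get))
```

Because the statistic depends only on ranks, it makes no assumption about the distribution of expression values, which is why the method is described as nonparametric.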
Many published studies on oral cancer largely focus on betel- or tobacco-associated cancers and compare RNA in tumors versus normal nonpathological tissue. We also compared miRNAs enriched with tobacco-associated OSCC versus miRNAs in normal oral tissue. With this approach we saw induction of 8 miRNAs including miR-196a-5p, miR-10b-5p, miR-31-5p, miR-451a and miR-144-3p (Table 2B). Of these, besides miR-196a-5p, miR-10b-5p and miR-31-5p have been shown to be enriched in earlier studies of head and neck cancer using surgically obtained whole mucosa, and miR-144-5p and miR-451a were shown to be induced in the saliva of those with head and neck cancer [36][37][38].

miRNA expression in OSCC in never smokers
Never smoker patient features are summarized in Table 3. There were 12 patients, 7 females and 5 males, with ages ranging from 37 to 90 years and an average age of 67. The control group of subjects ranged in age from 28 to 77, with an average of 61. There were 6 females and 4 males. MiRNA levels in these never smoker OSCCs were compared to those in nonpathologic mucosa in never smokers.
Among 372 miRNAs analyzed seven, miR-196a-5p, miR-10b-5p, miR-503-5p, miR-451a, miR-144-3p, miR-187-3p and miR-31-5p, showed increased expression in the OSCC samples of the never smokers, again using the rank product methodology (Table 2C) [29,30].

Confirmation of similar OSCC miRNA expression changes in ever smokers and never smokers
The TCGA data set of global miRNA expression data of 305 OSCC and control mucosa samples, all prepared under standard methods and linked to clinical data, was used to further explore miRNAs in ever smokers versus never smokers. The set confirmed the observation that ever smoker and never smoker OSCCs show similar miRNA levels. We performed class comparison among the 217 ever smoker OSCC samples versus the 20 ever smoker controls (Table 4a, S5 Table). We then did the same between the 88 never smoker OSCC samples versus the 10 never smoker controls (Table 4b, S6 Table). All 10 miRNAs enriched in the never smoker OSCCs by rank product were contained in the list of miRNAs enriched with the ever smoker group. The ever smoker group showed more miRNAs differentially expressed but that may be due to the larger size of the ever smoker data sets. No miRNAs showed reduced levels in the never smoker tumors though three were depressed in the ever smoker tumor group versus control, miR-375, miR-1-2-3p, and miR-99a (Table 4c). When a direct comparison between ever smoker and never smoker OSCC miRNA expression data was done we saw only one miRNA was differentially expressed, miR-637, confirming how similar the two groups are (Table 4d). It was shown some time ago that HNSCC fall into 4 subtypes based on global miRNA expression, though one of the groups is heavily weighted to oral pharyngeal tumors [39][40][41]. We performed unsupervised clustering using nonnegative matrix factorization (NMF), which identifies common gene/RNA expression patterns, or metagenes, among samples [31]. It calculates a cophenetic coefficient for each value of K (number of clusters), which is maximal when the clusters show maximal separation. Using the 238 miRNAs most variably expressed among the 305 OSCC samples, K = 2-7 produced the highest cophenetic coefficient at K = 2 and 4, indicating two or four clusters of samples (Fig 2) [32]. A heat map reveals differential expression of a subset of the miRNAs used to separate the cases into two subtypes or clusters (S2 Fig). Given these two OSCC groups, we examined tobacco usage among the subjects and found it was possible one group showed a slightly higher number of smokers but this did not reach significance (Fig 2C). When we compared pack year exposure between these two groups we found the first group showed almost 40% lower exposure, 62 ± 2.6 pack years versus 100 ± 4.4 for the second (t < 0.036), suggesting tobacco exposure may indeed have some effect on miRNA expression. When 4 OSCC subclasses were examined for tobacco usage there was no significant difference in proportion of ever and never smokers based on the Chi Square test, though some groups were small, making statistical analysis difficult (Fig 2A and 2D).
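A Chi Square test of smoking status against cluster membership can be sketched as below. The contingency counts are invented purely for illustration and are not the TCGA counts; the function name is hypothetical, and the critical values quoted in the comments are the standard 0.05 thresholds for 1 and 3 degrees of freedom.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for a contingency table.

    table is a list of rows, e.g. one row per cluster with columns
    (ever smokers, never smokers).
    """
    row_totals = [sum(row) for row in table]
    column_totals = [sum(column) for column in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * column_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    degrees_of_freedom = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, degrees_of_freedom

# Hypothetical counts: two clusters, columns are (ever smokers, never smokers).
statistic, dof = chi_square_statistic([[30, 10], [20, 20]])
# Critical values at the 0.05 level: 3.841 for 1 degree of freedom
# (two clusters), 7.815 for 3 degrees of freedom (four clusters).
print(round(statistic, 3), dof)
```

With four clusters and small groups, several expected counts can fall to a handful of subjects, which is the situation in which the chi-square approximation becomes unreliable.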

Discussion
The etiology of OSCC of never smokers is unknown. If OSCC in never smokers is a distinct subtype of this cancer then regulatory RNAs in the tumor epithelium may be different than those of tobacco-associated OSCCs. Tumors from both cohorts are generally not associated with p16 enrichment or transforming HPV gene expression and that was the case here with one subject positive for p16 and one positive for HPV16 RNA (S2 Table) [24,26]. This study sought to examine OSCC epithelium to discern miRNA misexpression with this disease in never smokers. We measured miRNAs enriched in OSCC epithelium of never smokers and in ever smokers, both compared to normal tissue. We found that miR-144-3p, miR-451a, miR-10b-5p, miR-31-5p, and miR-196a-5p were enriched with tumor formation whether the subject was an ever smoker or never smoker. Examination of the TCGA OSCCs and miRNA expression also suggests that overall there was much alike in OSCCs from ever and never smokers (S3 Table).
It was possible tobacco played a role in changing miRNA expression in a subgroup of OSCCs; however it has been noted that the "classical" subtype of HNSCCs, associated with heavy tobacco usage, is only a minor component of OSCCs. Indeed as published the TCGA group showed only 11.2% of a group of 178 oral tumors showed the "classical" mRNA expression profile [39]. When we separated the OSCCs based on miRNA expression we found two or four subclasses (Fig 2). The two subclass clusters showed the maximal agreement with the mRNA based subclassifications, with group one having most of the "atypical" mRNA expression pattern and group two having most of the "classical" pattern [40,41]. Curiously group two of the two subclasses showed a nominally higher percentage of smokers that did not reach statistical significance, but group one had almost 40% lower cigarette pack year exposure, 62 ± 2.6 versus 100 ± 4.4, t < 0.0389. These findings suggest tobacco usage may have some effect on miRNA expression, but other unknown factors have larger effects, and that relying on tobacco exposure to assign tumor type is not wise.
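Because a percentage difference depends on which group is taken as the reference, the pack year comparison can be checked with a short calculation. The sketch below uses only the two group means reported above (62 and 100 pack years); the function name is hypothetical.

```python
def percent_change(reference, value):
    """Percentage change of value relative to reference."""
    return (value - reference) / reference * 100.0

# Group one (62 pack years) relative to group two (100 pack years).
print(round(percent_change(100.0, 62.0), 1))
# Group two (100 pack years) relative to group one (62 pack years).
print(round(percent_change(62.0, 100.0), 1))
```

Relative to group two, group one is about 38% lower; relative to group one, group two is about 61% higher. Both statements describe the same pair of means.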
This study in part focuses on one cell type, epithelium, obtained by brush cytology, making it more sensitive to changes in expression of miRNAs found mainly in this specific cell type [27]. MiRNAs enriched in stroma and not epithelium of tumors would not be apparent. It is important to note that anything that causes the brush to acquire cells normally not present in the epithelium, such as the epithelial invasion of lymphocytes that can occur in inflamed OSCC, can also result in changes in RNAs in the sample. Some samples may also have blood contamination while others do not. In particular, malignant lesions can be more highly vascularized with blood vessels next to epithelial cells, which can greatly increase the mix of blood cells exposed to the brush. We saw elevated levels of miR-451a in both tumor groups compared to normal tissue. This RNA is highly expressed in red blood cells (RBCs) and is a well-known indicator of RBC RNA contamination in plasma samples [42,43]. MiR-451a levels were highest in OSCC samples of ever smokers and lowest in normal samples of never smokers. In the samples, levels of miR-451a correlated with a second blood-linked miRNA, miR-144-3p, with a correlation coefficient of 0.91 [36,37,44], data not shown. While we showed both miR-451a and miR-144-3p are markers for OSCC, this property is almost certainly indirect. There is little reason to believe that expression of these two miRNAs is elevated in OSCC epithelial cells; instead, the tumors likely tend to have blood vessels in the epithelium. The TCGA dataset, which contains RNA from surgically obtained OSCC and control tissue that is mainly epithelium and less than 50% stroma, did not show increases of these miRNAs, probably because all samples had some blood.
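The kind of correlation reported for miR-451a and miR-144-3p can be computed as a Pearson coefficient over paired sample values. The sketch below is illustrative only: the paired values are hypothetical, since the underlying data are not shown, and the text does not state which correlation measure was used.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    variance_x = sum((a - mean_x) ** 2 for a in x)
    variance_y = sum((b - mean_y) ** 2 for b in y)
    return covariance / sqrt(variance_x * variance_y)

# Hypothetical normalized levels of two blood-linked miRNAs in five samples.
mir_451a = [1.0, 2.5, 3.1, 4.8, 6.0]
mir_144 = [0.8, 2.0, 3.5, 4.1, 5.9]
print(round(pearson(mir_451a, mir_144), 2))
```

A coefficient near 1 for two miRNAs that are both abundant in red blood cells would be expected if their levels mainly track the amount of blood in each sample.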
In a study of lingual squamous cell carcinoma in young never smoker patients there was little evidence for differences in mutation spectrum in OSCCs of that group versus OSCCs of old smokers [45]. In contrast one HNSCC type, laryngeal tumors from tobacco users, showed a mutation spectrum, with a decrease of C>T mutations and an increase of C>A mutations, similar to lung carcinoma [45,46]. Both of these cancers are believed to be initiated by the mutagenic polycyclic aromatic hydrocarbons in tobacco and their combustion products. Because tobacco usage in OSCC did not correlate with known tobacco mutation spectra, one might speculate that, among many possibilities, tobacco may increase mutation rate via interactions with other compounds, or oral microorganisms may change the metabolism of tobacco chemicals, so that the types of mutations that are induced change. Alternatively, as Pickering et al. suggest, tobacco may only be a tumor promoter for the cancer process in the mouth [45]. It could do this by increasing inflammation in oral tissue, or cell proliferation, decreasing apoptosis, etc. Our finding that miRNA expression was quite similar in OSCCs of ever and never smokers suggests tumor formation may be through similar processes in both groups. Although it would seem tobacco usage increases the probability of OSCC formation without modifying the process very much, how it does this remains a mystery.
The study of miRNA in OSCC epithelium of never and ever smokers revealed three miRNAs that are enriched in epithelial OSCC samples obtained by brush cytology and the samples obtained surgically in the TCGA project (Table 5). Two of these miRNAs have been shown to be enriched with OSCC in earlier studies, miR-196a-5p and miR-31-5p. MiR-31-5p has been shown to be induced in HNSCC and OSCC compared to normal mucosa and is also induced in the epithelium with several inflammatory disorders like Idiopathic Bowel Syndrome [34][35][36]. Curiously, this miRNA was induced in benign lesions versus normal controls, suggesting its induction is not specific to malignancy (S3 Table), and in a recent study was shown to be induced with leukoplakia [47]. MiR-503-5p was found to be induced with both TCGA OSCC groups and the nonsmoker OSCC sampled by brush cytology. It has been shown to be depressed in some tumor types, though in esophagus, colon, adrenocortical and parathyroid tumors it is enriched in advanced stages [48][49][50][51][52]. Finally, miR-196a-5p was induced in all OSCC groups studied, suggesting it may be part of a key step in oral carcinogenesis (Table 2 and Fig 2). As it was not induced in the epithelium of nonmalignant oral pathologies, but only in OSCCs, it also may serve as an aid to diagnosis of OSCC (S3 Table). MiR-196a-5p has been shown to be induced in two studies of OSCC recently and in laryngeal carcinoma, and there is good evidence for a functional role in cervical squamous cell carcinoma [53][54][55][56][57]. Studies in cervical cancer suggest it has a role in regulating cell proliferation and that it targets p27, FOXO1 and the (PI3K) AKT pathway, and HOXC8, a cell remodeling protein [53,56] [53,58]. Additional studies in laryngeal cancer types saw properties for this alleged oncomir in cell migration and proliferation [55]. It seems to be highly induced in a large proportion of OSCCs of tobacco users, nontobacco users and betel nut users [53,56,58].
The work described suggests that there is much alike in OSCC in never and ever tobacco users in regard to miRNA misexpression so we sought to determine if these OSCC patients may share other factors related to OSCC causation. In addition to tobacco exposure, high ethanol intake and immunosuppression represent additional risk factors that have been linked to increased risk of head and neck cancer and OSCC [59][60][61]. Based on ethanol consumption the two groups were different. The never smokers showed statistically significant lower rates of total ethanol consumption in the TCGA data set and a much lower number of subjects who consumed ethanol (S4 Table). This was supported by the current study where half of the ever smokers consumed ethanol regularly, two at heavy levels, while only 3 out of 12 of the never smokers reported any ethanol usage (Tables 1 and 2). This difference did not reach statistical significance. While immunosuppression was not reported for TCGA patients, in the present study 2 of 12 never smoker patients were renal transplant patients treated with immunosuppressants, with one additional patient with rheumatoid arthritis treated with an anti-tumor necrosis factor drug, etanercept [60]. No immune suppression treatment history was found for the ever smoker OSCC group, though one subject was positive for HIV. In conclusion, despite the differences in risk factors between the two OSCC groups, with different events presumably causing the cancers, the OSCC miRNA profile for ever smokers and never smokers was similar, suggesting there is limited variability in miRNA changes when a normal cell progresses to OSCC. More work will be required to discern just how similar the carcinogenesis process is in the two groups and in other high-risk groups such as betel nut and ethanol users.

Supporting Information
S1 Table. Sequences of primers and probe used to amplify and detect HPV16 E6 mRNA in the mRNA samples from never smoker OSCC lesions. (DOC)
Table. Rank product test of differential miRNA expression of ever smoker OSCC versus ever smoker normal epithelium in TCGA data set. (XLSX)
S6 Table. Rank product test of differential miRNA expression of never smoker OSCC versus never smoker normal epithelium in TCGA data set.
Half of a single brush oral mucosal sample was diluted 9x in Trizol and then both halves were subjected to RT-PCR to quantify 13 different miRNAs. We show that for the methodology used, storage of the sample frozen in Trizol, followed by 1-bromo-3-chloropropane (BCP) phase separation, then immediate glass filter binding using RNeasy columns (Qiagen), the range of miRNA species recovered was uniform from a single sample. This occurred whether the sample was concentrated or diluted 9x. MiRNA from the concentrated and diluted samples was converted to cDNA then quantified using RT-PCR. A comparison of Ct values for 13 detectable miRNA species revealed similar relative amounts of each species with a correlation coefficient of 0.96.