Identification and Validation of Reference Genes for RT-qPCR Studies of Hypoxia in Squamous Cervical Cancer Patients

Hypoxia is an adverse factor in cervical cancer, and hypoxia-related gene expression could be a powerful biomarker for identifying the aggressive hypoxic tumors. Reverse transcription quantitative PCR (RT-qPCR) is a valuable method for gene expression studies, but suitable reference genes for data normalization that are independent of hypoxia status and clinical parameters of cervical tumors are lacking. In the present work, we aimed to identify reference genes for RT-qPCR studies of hypoxia in squamous cervical cancer. From 422 candidate reference genes selected from the literature, we used Illumina array-based expression profiles to identify 182 genes not affected by hypoxia in cervical cancer, i.e. genes regulated by hypoxia in eight cervical cancer cell lines or correlating with the hypoxia-associated dynamic contrast-enhanced magnetic resonance imaging parameter ABrix in 42 patients, were excluded. Among the 182 genes, nine candidates (CHCHD1, GNB2L1, IPO8, LASP1, RPL27A, RPS12, SOD1, SRSF9, TMBIM6) that were not associated with tumor volume, stage, lymph node involvement or disease progression in array data of 150 patients, were selected for further testing by RT-qPCR. geNorm and NormFinder analyses of RT-qPCR data of 74 patients identified CHCHD1, SRSF9 and TMBIM6 as the optimal set of reference genes, with stable expression both overall and across patient subgroups with different hypoxia status (ABrix) and clinical parameters. The suitability of the three reference genes were validated in studies of the hypoxia-induced genes DDIT3, ERO1A, and STC2. After normalization, the RT-qPCR data of these genes showed a significant correlation with Illumina expression (P<0.001, n = 74) and ABrix (P<0.05, n = 32), and the STC2 data were associated with clinical outcome, in accordance with the Illumina data. Thus, CHCHD1, SRSF9 and TMBIM6 seem to be suitable reference genes for studying hypoxia-related gene expression in squamous cervical cancer samples by RT-qPCR. Moreover, STC2 is a promising prognostic hypoxia biomarker in cervical cancer.


Patient cohorts and tumor specimens
Totally 160 patients with locally advanced squamous cell carcinomas of the uterine cervix, prospectively recruited to our chemoradiotherapy protocol at the Norwegian Radium Hospital from 2001 to 2012, were included (Table 1). Illumina gene expression profiles existed for 150 patients (Illumina cohort, Table 1) and were used to select candidate reference genes for testing with RT-qPCR (Fig 1). For 42 of these patients, the hypoxia-associated dynamic contrast enhanced (DCE)-magnetic resonance (MR) imaging parameter A Brix [25] was available and used as measure of tumor hypoxia status. Ten independent patients (RT-qPCR cohort 1) were used for pre-evaluation of nine candidate genes. Moreover, 74 of the 150 patients in the Illumina cohort (RT-qPCR cohort 2) were used for further evaluation and validation of candidates. In addition, 22 biopsies from eight independent patients (2-4 biopsies from each patient) were used to evaluate intra-tumor heterogeneity (heterogeneity cohort).
Tumor volume and presence of pathologic lymph nodes were assessed from pre-treatment MR images. One to four tumor biopsies (approximately 5×5×5 mm) were taken at diagnosis, immediately snap frozen, and stored at −80°C. All patients received external radiation of 50 Gy Subgroup of patients in Illumina cohort. c Determined from pre-treatment MR images. Tumor volume undetermined for 13 tumors; nine in Illumina cohort, one in RT-qPCR cohort 1, and three in RT-qPCR cohort 2. d Calculated based on 3 orthogonal diameters (a,b,c) as (π/6)abc. e Detected at diagnosis by MRI or CT (n = 13), according to the response evaluation criteria in solid tumors version 1.1. f Determined by pharmacokinetic analysis of pretreatment DCE-MR images based on the Brix model [25].
to the primary tumor and elective areas, 45 Gy to the remaining pelvis, and additional 14 Gy to pathologic lymph nodes. This was followed by brachytherapy of 25 Gy. Concurrent cisplatin (40 mg/m 2 ) was given weekly in maximum 6 courses according to tolerance. The patients were followed up according to standard procedure, as described [25].   Gene expression analysis RNA isolation. Total RNA was isolated from cell lines using RNeasy MiniKit (Qiagen, Germantown, MD) and from frozen samples using Trizol reagent (Life Technologies) (Illumina cohort, RT-qPCR cohort 2) or miRNeasy MiniKit (Qiagen) (RT-qPCR cohort 1, heterogeneity cohort). All clinical specimens had more than 50% tumor cells in hematoxylin and eosin-stained sections, derived from the central part of the biopsy. Apart from the heterogeneity cohort, RNA from different biopsies of the same tumor was pooled and RNA concentration was measured using a NanoDrop 2000 spectrophotometer (Thermo Scientific, Waltham, MA). RNA integrity was confirmed by Bioanalyzer (Agilent Technologies, Palo Alto, CA). All samples had an RNA integrity number (RIN) in the range 3.6-9.0 with a median RIN of 7.1.
Gene expression by Illumina bead arrays. For whole genome gene expression profiling, Illumina 48K bead arrays with approximately 48000 transcripts were used; HumanWG-6 v3 (patients, cell lines) and HumanHT-12 v4 (cell lines) (Illumina Inc., San Diego, CA). The patient data are part of the data sets used in previous work [25]. cRNA was synthesized, labeled, and hybridized to the arrays. Signal extraction and quantile normalization were performed using the software provided by the manufacturer (Illumina Inc.), and log2-transformed data were used in the analyses. The data were deposited in the gene expression omnibus (GEO) database (GSE75034).
Gene expression by RT-qPCR. For RT-qPCR analyses, the 7900HT Fast Real-Time PCR System (Applied Biosystems, Foster City, CA) was applied, and the amplification curves were analyzed using the SDS Software v2.3 (Applied Biosystems). The DNA-free™ Kit (#AM1906, Ambion, Austin, TX) was used to remove genomic DNA, according to the manufacturer's description. Reverse transcription was performed by the High-Capacity cDNA Reverse Transcription kit (Applied Biosystems), which includes random primers, using 2000 ng DNasetreated RNA in a 20 μl reaction. 1 μl cDNA was amplified using the TaqMan Fast Universal PCR Master Mix (2x) and Custom TaqMan Array 96-Well Fast Plates with dried-down Taq-Man Gene Expression Assays (Applied Biosystems), or single-tube TaqMan assays and 96-well Fast plates. Thermocycling parameters were as described in the Custom Array protocol, or for single-tube assays: 20 s at 95°C (enzyme activation), 40 thermal cycles of 1 s at 95°C (denaturation) and 20 s at 60°C (annealing and extension). TaqMan assays for candidate reference genes and three hypoxia-induced genes assessed in this study are provided in Table 2. Only pre-developed inventoried TaqMan assays were selected. For evaluating the TaqMan assays, the sizes of the PCR products were determined by gel electrophoresis, using a D1000 Screen-Tape 1 Station (Agilent Technologies). For the three selected reference genes (see below) and the three hypoxia-induced genes, the PCR efficiencies were determined from cDNA dilution curves and showed linear reaction efficiencies (S1 Fig).
All samples were run in duplicate, and the mean quantification cycle (C q ) for each sample was used in the analyses. Samples with mean C q > 37 or standard deviation > 0.4 were excluded. Minus reverse transcriptase controls were tested for residual genomic DNA for all TaqMan assays, and contamination was evaluated by non-template controls without RNA into the cDNA synthesis. All negative controls had undetermined C q . The expression level of a target gene was analyzed using the comparative C q -method [26], normalizing the C q -values of the target gene to a normalization factor calculated as the average C q -values of the reference genes; ΔC q,gene = C q,gene −(C q,CHCHD1 + C q,SRSF9 + C q,TMBIM6 )/3 for the reference genes identified in this study (see below).

Statistics
The geNorm [6] and NormFinder [7] algorithms were used to evaluate the stability of the candidate reference genes. The geNorm algorithm is based on the principle that the expression ratio of two ideal reference genes is identical in all samples. For each candidate, a stability measure M is calculated as the average pairwise variation of the gene with all other candidates; i.e. the average standard deviation of log2-transformed expression ratios between the genes. The genes with lowest M-values are considered to be most stable. The genes are ranked by stepwise exclusion of the least stable candidate and recalculation of M-values for the remaining genes is performed until the two most stable genes are left. To determine the number of reference genes to include for reliable normalization, pairwise variation between two sequential normalization factors (V n/n+1 ) was calculated for all samples, and the cut-off value for inclusion of an additional gene was set to 0.15 [6]. NormFinder is an ANOVA model-based approach, and was used for estimation of the overall variation of each candidate gene and its variation across patient subgroups by taking into account both the intra-and inter-group variations. The estimated variations are combined into a stability value for each gene, where a low value indicates more stable expression [7].
Spearman's rank correlation was used for estimating associations between continuous variables, Mann-Whitney U test was used to compare differences in gene expression levels between groups, and COX proportional hazard (PH) univariate analysis was used to assess the relationship of candidate reference genes to clinical outcome. The clinical endpoint was progression free survival (PFS) for follow-up until 5 years. PFS was defined as the time from diagnosis to disease-related death or first event of relapse. Deaths within 2 years after inclusion were regarded as disease-related unless another cause was documented. Patients were censored at their last appointment or at 5 years, and status was assessed at October 3rd 2014. A competing risk model was used for calculating cumulative incidence of disease progression, with death from other causes than cervical cancer as competing event. Eleven out of the 160 patients included died of other causes without experiencing recurrence during follow-up. Since around 1/3 of the patients experienced recurrence, cumulative incidence curves were compared between 1/3 of the cohort with the highest gene expression and 2/3 with the lowest expression. Gray's test was used to assess differences between the curves. Significance level was 5% unless otherwise indicated, and all P-values were two-sided.
All analyses were performed using R [27] version 3.1.1. For geNorm analyses, the read. qPCR() and selectHKs() functions in the ReadqPCR and NormqPCR packages [28] were applied. For the NormFinder analyses the Normfinder() function provided by Molecular Diagnostic Laboratory, Department of Molecular Medicine, Aarhus University Hospital Skejby, Denmark was used, and the competing risk analyses were performed with the cuminc() function in the cmprsk package [29].

Selection of candidate reference genes from array-based expression profiles
To identify stably expressed genes in cervical cancer samples, the selection of candidates was limited to genes that have been previously reported as reference genes in the literature [10,11,13,18,[30][31][32][33][34][35], as novel reference candidates based on meta-analysis of large-scale microarray or RNA sequencing data sets [19][20][21][22][23], or present on the TaqMan Express Human Endogenous Control Plate (Applied Biosystems). This resulted in a list of 422 different genes, for which 410 were present on the Illumina array used in the present study (Fig 1, S1 Table), represented by 631 different probes.
The most likely candidates among the 410 genes were identified by utilizing Illumina arraybased expression profiles of 150 cervical cancer patients and eight cervical cancer cell lines grown under normoxia and hypoxia. First, candidates were excluded based on their association with a hypoxia response in cervical cancer or with clinical parameters in the patient cohort. Genes for which expression of at least one probe in the clinical data set correlated with the hypoxia-associated DCE-MRI parameter A Brix (n = 42) were removed, as well as genes that were more than 1.5 fold up-or downregulated under hypoxia in at least one of the cell lines (S1 Table). In addition, candidates showing an association between their expression levels and tumor volume, stage, lymph node status, or clinical outcome in 150 patients (S1 Table) were excluded. The removed genes included GAPDH, HMBS, EEF1A1, ACTB, PGK1, RPLP0, and TBP, which have been identified as appropriate reference genes in previous studies of cervical carcinogenesis [10][11][12][13]36,37], and B2M, HPRT1, RPL11, RPL37A, ACTR3, and NDFIP1, which have been found to be suitable for hypoxia studies in head and neck cancer [18,30]. Totally, 99 different genes represented by 117 probes remained as reference candidates at this stage. To ensure detection of the reference genes in RT-qPCR assays, a further exclusion of genes based on a very low expression level (mean log2 expression of 150 patients < 7) was performed, reducing this number to 89 candidates represented by 101 probes.
In addition to stable expression, the reference gene should ideally have transcript abundance similar to the genes under investigation to increase the sensitivity to detect small gene expression changes [24]. We found that candidates with the lowest coefficient of variation (CV) were generally expressed at high levels, an observation also seen by others [24], whereas most hypoxia-regulated genes from our previous work [25] had a relatively low expression level (data not shown). A final panel of genes to be evaluated by RT-qPCR was therefore selected to represent different expression levels in addition to keeping the CV as low as possible. Candidates were also chosen in such a way that they represented different pathways or functional classes in order to minimize the risk of selecting co-regulated genes [6]. Finally, the availability of a suitable pre-designed and validated TaqMan-assay was used as a selection criterion. The panel of nine candidates to be tested in RT-qPCR assays is listed in Table 1, and had CVs in the range of 0.01-0.09 and mean log2 expression levels in the range of 7.4-14.0 in the clinical data set. Among these genes, GNB2L1 has been previously used as reference gene in hypoxia studies of head and neck cancer [18].

Pre-evaluation of 9 candidate reference genes by RT-qPCR
To ensure that the candidates could be properly detected in a TaqMan assay and possibly further limit the panel, a pre-evaluation of the nine genes was performed in 10 patients (RT-qPCR cohort 1, Table 1, Fig 1). The TaqMan assays were first evaluated by gel electrophoresis of the PCR-products from one patient (Fig 2A). For all assays, a strong band of expected size was seen. However, for SOD1 an additional weaker band of smaller size that might represent primer dimers also appeared, indicating that the assay was not working optimally. SOD1 was therefore excluded in the further analyses.
The eight remaining candidates were detected with C q -values ranging from 19.1 to 31.1, where GNB2L1 and TMBIM6 were the most abundant and RPL27A the least abundant transcript (Fig 2B). geNorm and NormFinder analyses were performed to rank the genes according to the overall stability across the ten patients (Fig 2C and 2D). The average expression stability M from the geNorm analysis ranged from 0.72 using all eight genes to 0.4 for the two most stable candidates (Fig 2C). The ranking of the genes in the NormFinder analysis was almost identical to the geNorm ranking (Fig 2D). For a more thorough stability analysis, including assessment of the stability across patient subgroups, and determination of the optimal number of reference genes to include for normalization, the five most stably expressed genes as determined by the two methods, i.e. CHCHD1, GNB2L1, IPO8, SRSF9 and TMBIM6, were examined further.

Determination of an RT-qPCR normalization factor
The five remaining candidates were evaluated by RT-qPCR analysis in 74 patients (RT-qPCR cohort 2, Table 1, Fig 1), resulting in C q -values between 17.9 and 25.4 (Fig 3A). geNorm and NormFinder analyses showed similar ranking of the genes according to the stability across the patients (Fig 3B and 3C). From the geNorm analysis, the average expression stability M ranged from 0.57 using all five genes to 0.45 using the two most stable candidates.
To evaluate the sufficient number of genes required for reliable normalization, a stepwise procedure was utilized by including genes sequentially into the normalization factor according to their increasing M-values until no significant contribution was achieved. The pairwise variation between the normalization factors with three and four genes were below the cut-off value (V 3/4 < 0.15), indicating that the three candidates with lowest M-value, CHCHD1, SRSF9 and TMBIM6, are sufficient for normalization (Fig 3D). These candidates were also found to be the three most stable genes in the NormFinder analysis ( Fig 3C). Moreover, their average M-value was 0.49 (Fig 3B), and below the range of 0.5-1 that should be required when evaluating reference genes in cancer biopsies [38].
We further evaluated the stability of the five candidates across patient subgroups using the NormFinder algorithm (Fig 4). In this analysis, CHCHD1, SRSF9 and TMBIM6 also appeared as the optimal reference genes. They were the most stable candidates across patients with low and high tumor stage or volume and with and without recurrence, and for patients with and without lymph node involvement at diagnosis the three candidates were among the four most stable genes. Moreover, in analysis of the tumor hypoxia status, CHCHD1, SRSF9 and TMBIM6 were the most stable candidates across patients with high and low A Brix (Fig 4). To further validate the stability of these three genes, geNorm analysis on RT-qPCR data from normoxic and hypoxic samples from eight cervical cancer cell lines was performed. The average M-value for the three genes were 0.79, indicating stable expression across normoxic and hypoxic culture conditions [38]. Thus, CHCHD1, SRSF9 and TMBIM6 seemed to be well suited for calculating a normalization factor and were selected for validation in studies of hypoxiainduced gene expression (Fig 1).  Validation of CHCHD1, SRSF9 and TMBIM6 as reference genes in hypoxia studies RT-qPCR analyses were performed for three genes that we have previously reported to be induced by hypoxia in cervical cancer cell lines and associated with A Brix in cervical cancer [25], i.e. DDIT3, ERO1A and STC2. The RT-qPCR cohort 2 of 74 patients, for which we had Illumina expression data, was utilized. Gel electrophoresis of the PCR products showed that the TaqMan assays worked properly, generating a single specific product of the expected size for each of the three genes (Fig 5A). The expression levels of DDIT3, ERO1A and STC2 were calculated by normalizing the C q -values to the average C q -value of CHCHD1, SRSF9 and TMBIM6. The -ΔC q -value was shown to be highly correlated with the Illumina data for all three genes (P < 0.001; ρ from 0.78 to 0.83). Further, significant negative correlations between Reference Genes for Clinical Hypoxia RT-qPCR Studies -ΔC q and A Brix were found, in agreement with the correlations obtained with Illumina data ( Table 3). The identified reference genes therefore seem suitable for measurement of hypoxiainduced changes in cervical cancer.
Since hypoxia is known to be associated with aggressiveness in cervical cancer [3][4][5], we evaluated the prognostic value of DDIT3, ERO1A and STC2 expression in the Illumina and RT-qPCR data sets. Based on the Illumina data of 150 patients, a statistically significant increased risk of disease progression was found for patients with the highest expression of DDIT3 or STC2 compared to the others (P = 0.047 and P = 0.015, respectively, Gray's test; data not shown), whereas ERO1A expression was not associated with outcome (data not shown). In the RT-qPCR cohort 2 of 74 patients, the strongest association to outcome was found for STC2 in both data sets, and for the RT-qPCR data the relationship was statistically significant (Fig 5B  and 5C). This further validated the suitability of CHCHD1, SRSF9 and TMBIM6 as reference genes in hypoxia studies, and suggested that high STC2 expression is associated with chemoradiotherapy failure, in accordance with a previous study on cervical cancer [39]. Moreover, by RT-qPCR analysis STC2 was verified as strongly hypoxia-induced in most of the eight cervical cancer cell lines (S2 Fig), supporting the finding that STC2 might be an important hypoxia-regulated gene in cervical cancer. DDIT3 and ERO1A were also verified as hypoxia-induced in cell lines, although a smaller induction was in general observed compared to STC2 (S2 Fig). Further, to address the intra-tumor heterogeneity of STC2 expression, its expression level in 22 biopsies from eight independent tumors was measured. The stability of the three reference genes was validated in these new samples with an average M-value of 0.89 in geNorm analysis. For most of the tumors all biopsies were classified into either the low-or high STC2 expression group (Fig 5D), further supporting a potential of this gene as hypoxia biomarker. Associations between STC2 expression and treatment outcome have also been shown for other cancer types, like gastric cancer [40], nasopharyngeal carcinoma [41], renal cell carcinoma [42], esophageal squamous-cell carcinoma [43], and colorectal cancer [44]. Studies in cell lines have identified STC2 to be involved in cellular growth, migration, and invasion, both under normoxic [43,45,46], and hypoxic conditions [47][48][49][50]. Taken together, STC2 might be a molecular biomarker of hypoxia-related tumor aggressiveness in cervical cancer that warrants further investigation.

Conclusions
We have identified and validated CHCHD1, SRSF9 and TMBIM6 as reference genes for studying hypoxia-related gene expression in fresh-frozen clinical samples from squamous cell carcinoma of the uterine cervix by RT-qPCR. We have also shown that high STC2 expression was Reference Genes for Clinical Hypoxia RT-qPCR Studies associated with chemoradiotherapy failure in our cohort, implying that STC2 might be a useful prognostic hypoxia biomarker in cervical cancer.