Effective Alu Repeat Based RT-qPCR Normalization in Cancer Cell Perturbation Experiments

Background: Measuring messenger RNA (mRNA) levels using the reverse transcription quantitative polymerase chain reaction (RT-qPCR) is common practice in many laboratories. Using a specific set of mRNAs as internal control reference genes is considered the preferred strategy to normalize RT-qPCR data. Proper selection of reference genes is a critical issue, especially in cancer cells that are subjected to different in vitro manipulations. These manipulations may result in dramatic alterations in gene expression levels, even of assumed reference genes. In this study, we evaluated the expression levels of 11 commonly used reference genes as internal controls for normalization of 19 experiments that include neuroblastoma, T-ALL, melanoma, breast cancer, non-small cell lung cancer (NSCLC), acute myeloid leukemia (AML), prostate cancer, colorectal cancer, and cervical cancer cell lines subjected to various perturbations. Results: The geNorm algorithm in the software package qbase+ was used to rank the candidate reference genes according to their expression stability. We observed that the stability of most of the candidate reference genes varies greatly in perturbation experiments. Expressed Alu repeats show relatively stable expression regardless of experimental condition. These Alu repeats are ranked among the best reference assays in all perturbation experiments and display acceptable average expression stability values (M < 0.5). Conclusions: We propose the use of Alu repeats as a reference assay when performing cancer cell perturbation experiments.


Background
Reverse transcription quantitative polymerase chain reaction (RT-qPCR) has proven to be a reliable method to quantify gene expression. Correct normalization is a critical issue for accurate interpretation of RT-qPCR results. This can be achieved using several strategies such as ensuring similar numbers of cells, similar amounts of input RNA, applying internal control reference genes like ribosomal RNAs (rRNAs) or messenger RNAs (mRNAs), or merging multiple strategies in one protocol [1,2].
The use of mRNAs as internal control reference genes for normalizing RT-qPCR data is being applied widely [2][3][4][5][6]. However, this strategy should be carried out carefully as its accuracy depends directly on the expression stability of the selected reference genes. According to the Minimum Information for Publication of Quantitative Real-Time PCR Experiments (MIQE guidelines) [7], it is no longer accepted to consider that certain reference genes are stable by convention. Our group has previously reported a strategy for accurate normalization of RT-qPCR data based on geometric averaging of multiple stably expressed internal control genes [4]. In this study, we show that the choice of reliable internal controls is of particular importance in experiments that involve perturbation of cancer cells. Treating cancer cells with therapeutic agents or RNAi-mediating siRNA or shRNA molecules induces dramatic changes in the expression levels of many genes including commonly used reference genes. This phenomenon is due to (non-specific) off-target effects that are encountered upon delivery of such molecules [8], or indirect regulation after treatment. Therefore, we evaluated the expression of commonly used reference genes and expressed Alu repeats as internal controls for normalization in experiments that include perturbed cancer cell lines. Alu repeats are found in the untranslated regions of several thousands of known protein coding genes, and they have been reported to be useful as a single normalization factor for RT-qPCR reactions [9].
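The geometric-averaging strategy referred to above can be illustrated with a minimal Python sketch. It assumes a fixed amplification efficiency of 2 and uses made-up Cq values; the function names are hypothetical and none of this is taken from qbase+ or from the data of this study.

```python
import math

def relative_quantity(cq, calibrator_cq, efficiency=2.0):
    """Convert a Cq value into a quantity relative to a calibrator sample.

    Assumes a fixed amplification efficiency (2.0 means perfect doubling).
    """
    return efficiency ** (calibrator_cq - cq)

def geometric_mean(values):
    """Geometric mean of strictly positive values."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

def normalize(target_quantity, reference_quantities):
    """Divide a target quantity by the geometric mean of the references."""
    return target_quantity / geometric_mean(reference_quantities)

# Hypothetical Cq values for one treated sample against an untreated calibrator.
target = relative_quantity(cq=24.0, calibrator_cq=26.0)
references = [
    relative_quantity(cq=18.0, calibrator_cq=18.0),  # reference assay 1
    relative_quantity(cq=20.0, calibrator_cq=22.0),  # reference assay 2
]
normalized = normalize(target, references)
```

The geometric mean is used rather than the arithmetic mean so that a single highly abundant reference assay does not dominate the normalization factor.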
Treatment with ATRA. All-trans retinoic acid (ATRA) is a small lipophilic molecule [7,12] that inhibits proliferation and induces differentiation of neuroblastoma cells [4,13,14,15]. We treated CLB-GA and NGP cells with 0 or 5 µM ATRA for one and five days and observed that ATRA induces the outgrowth of neurites (Figure 1B).
Treatment with withaferin-A. Withaferin-A is a steroidal lactone purified from the medicinal plant Withania somnifera. This compound induces apoptosis in neuroblastoma cells and is an antiangiogenic agent [8,16]. We treated SK-N-SH and IMR-32 neuroblastoma cells with withaferin-A and observed reduced cell viability in a dose- and time-dependent manner (Figure 1C). We then treated SK-N-SH and IMR-32 cells with 0 or 1 µM withaferin-A for one day to evaluate the stability of the reference genes.
Treatment of neuroblastoma cell lines with TAE-684. TAE-684 is a small molecule inhibitor of activated anaplastic lymphoma kinase (ALK) [9,17] and reduces cell viability of ALK-mutated neuroblastoma cells [18]. After treating SK-N-SH and CLB-GA cells with TAE-684, we observed reduced cell viability in a dose- and time-dependent manner (Figure 1D). We then treated these two cell lines with 0, 0.1, 0.3 and 1 µM TAE-684 for 3, 6, 12, 24, and 48 hours to evaluate the stability of the reference genes.
Treatment of an NSCLC cell line with TAE-684. H3122 is an NSCLC cell line with an EML4-ALK fusion gene; it was treated with TAE-684 in the same manner as described above.
Transient transfections of neuroblastoma cell lines with miR-1 mimic. MiR-1 targets the 3′-UTR of the PTK9 mRNA, leading to PTK9 degradation [19]. MiR-1 is often used as a positive control in experiments with miRNA mimic transfections to evaluate target gene mRNA downregulation by qPCR. We performed transient transfections of SK-N-BE(2c), SK-N-SH, SH-EP, and SH-SY5Y neuroblastoma cells with miR-1 mimic, negative control (a scrambled miRNA mimic), or mock transfection for 24 hours (Figure 1E).

Transient transfections of SH-EP cells with siRNAs against transcribed ultraconserved regions. We transfected SH-EP cells with siRNA against both strands of T-UCR uc.460, and an siRNA against the negative strand of T-UCR uc.73. The knockdown efficiency is shown in Figure 1F. More details about the experimental design are found in File S1.
Transient transfection of leukemia cell lines with miR-223 mimic. MiR-223 was found to be highly expressed in T-cell acute lymphoblastic leukemia (T-ALL) [20]. However, three T-ALL cell lines, HPB-ALL, ALL-SIL and TALL-1, presented with a low miR-223 level. We expected that overexpression of oncogenic miR-223 in these cell lines would increase the proliferative capacity of the cells and prove the oncogenic potential of this miRNA (data not shown).
Optimization of concentration of PHF6-targeting siRNA in T-ALL cell lines. We transfected the PHF6 wild-type T-ALL cell line JURKAT and evaluated the knockdown efficiency at the mRNA level (Figure 2). Next, we transiently transfected the HSB-2 and PF-382 T-ALL cell lines (both PHF6 wild-type) with PHF6-targeting siRNA. Significant PHF6 knockdown was confirmed by qPCR (Figure 2). More details about the experimental design are found in File S1.
Treatment of breast cancer, AML, prostate cancer, colorectal cancer and neuroblastoma cell lines with JQ1. JQ1 is a small molecule compound that inhibits the bromodomain protein BRD4. Targeting this oncogene leads to growth inhibition of cancer cell lines. One known mechanism is through downregulation of MYCN. We treated two breast cancer cell lines (MCF-7 and SKBR3), one AML cell line (K562), one prostate cancer cell line (PC-3), one colorectal cancer cell line (SW-620) and one neuroblastoma cell line (SJNB-12) with 1 µM JQ1 for 24 and 48 hours.

MCF-7 and HeLa Transcriptome PCR Arrays
We used commercially available ready-to-use cDNA plates from MCF-7 (breast cancer cell line) and HeLa (cervical cancer cell line). Each cDNA sample was synthesized from RNA extracted from a cell line that had been exposed to one of 90 different chemical inhibitors. These chemical inhibitors target a wide range of different pathways, resulting in various perturbations of two widely used cancer cell lines. The chemical inhibitors and the genes they target are listed in Table S1.

Alu Repeats are the most Stably Expressed Reference Sequence
mRNA levels of 11 candidate reference assays were measured in all experiments described above, and the average expression stability was calculated using the geNorm algorithm. GeNorm ranks the reference genes according to their stability value (referred to as the M-value) and calculates the optimal number of genes to be used for normalization in a given experiment using the V-value (Figure S1). The M-values can be used to rank the genes from the least to the most stable one [4].
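The M-value ranking can be sketched in a few lines of Python. This is a simplified illustration of the published geNorm procedure, not the qbase+ implementation: the function names and the toy expression values are hypothetical, and the sample standard deviation is assumed.

```python
import math
from statistics import stdev

def m_values(expr):
    """Return the M-value of each gene.

    expr maps a gene name to its relative quantities, one per sample.
    M is the mean, over all other genes, of the standard deviation of
    the log2 expression ratios between the two genes.
    """
    result = {}
    for gene in expr:
        variations = []
        for other in expr:
            if other == gene:
                continue
            log_ratios = [math.log2(a / b) for a, b in zip(expr[gene], expr[other])]
            variations.append(stdev(log_ratios))
        result[gene] = sum(variations) / len(variations)
    return result

def rank_by_stepwise_exclusion(expr):
    """Rank genes from most to least stable by repeatedly dropping the worst."""
    remaining = dict(expr)
    excluded = []
    while len(remaining) > 2:
        m = m_values(remaining)
        worst = max(m, key=m.get)
        excluded.append(worst)
        del remaining[worst]
    # The final pair shares one M-value and cannot be ordered further.
    return sorted(remaining) + excluded[::-1]

# Toy data: genes A and B change in parallel, gene C does not.
toy = {"A": [1, 2, 4, 8], "B": [2, 4, 8, 16], "C": [1, 8, 1, 8]}
toy_m = m_values(toy)
toy_ranking = rank_by_stepwise_exclusion(toy)
```

In the toy data the ratio A/B is constant across samples, so both genes receive the same low M-value, while C is excluded first.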
In 13 of the 19 perturbation experiments performed, Alu repeats (Alu-Sq) were ranked among the three most stable reference assays (Figure 3). This observation prompted us to further analyze the potential value of Alu repeats as stable reference candidates.
A rank aggregation method based on voting theory (Borda count) was used to combine the 19 ranked lists of reference candidates [21]. This method tries to find an ordered list of reference assays as close as possible to all individual ordered lists by calculating the weighted Spearman's footrule distance, and using a cross-entropy Monte Carlo algorithm or genetic algorithm [21]. The analysis of the 19 full gene lists, generated by the 19 different experiments, resulted in Alu repeats ranked at the first position (Figure 4). This confirmed that Alu repeats represent the most stable reference assay across all data sets.
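As a rough illustration of rank aggregation, the sketch below implements a plain Borda count in Python. It is deliberately simpler than the RankAggreg analysis used in this study (no weighting by M-value and no cross-entropy optimization); the function name and the three toy rankings are hypothetical and do not reproduce the study's data.

```python
from collections import defaultdict

def borda_aggregate(ranked_lists):
    """Combine several rankings (most stable first) into one consensus list.

    In a list of n items the top item earns n - 1 points and the last
    item earns 0. Ties in total score are broken alphabetically.
    """
    scores = defaultdict(int)
    for ranking in ranked_lists:
        n = len(ranking)
        for position, item in enumerate(ranking):
            scores[item] += n - 1 - position
    return sorted(scores, key=lambda item: (-scores[item], item))

# Toy rankings from three imaginary experiments.
toy_rankings = [
    ["Alu-Sq", "TBP", "GAPDH"],
    ["Alu-Sq", "GAPDH", "TBP"],
    ["TBP", "Alu-Sq", "GAPDH"],
]
consensus = borda_aggregate(toy_rankings)
```

A Borda count only uses rank positions, so it ignores how far apart two assays are in stability; weighting by M-value, as described in the text, addresses that limitation.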

Discussion
RT-qPCR is the most commonly used method to quantify gene expression and accurate normalization is required to interpret the RT-qPCR data correctly. Normalization using endogenous control genes is a widely used method to correct for the technical variations that occur during RT-qPCR reactions. Until recently, a single non-validated reference gene has routinely been used as an internal control. We have previously reported that this strategy can lead to incorrect data with an error up to 3-fold in 25% of the cases [4]. Using multiple reference genes as internal controls for normalizing RT-qPCR data as well as using the appropriate reference genes for specific experimental purposes has already been strongly advocated in literature [2,22]. Validating the stable expression of reference genes is an important issue in every single experimental procedure since cell manipulations such as treatment with therapeutic compounds can dramatically influence their expression.
In this study, we emphasize that proper selection of the reference genes is important for interpretation of RT-qPCR data and demonstrate that Alu repeats represent the most stable reference assay in a wide range of experimental conditions in nine different cancer types.
We selected 11 reference genes that are widely used in the literature and that belong to different functional classes to avoid coregulation. We selected structure-related genes (ACTB, RPL13A), metabolism-related genes (HPRT, GAPDH), and transcription-related genes (TBP). The rest of the genes are not categorized specifically in one functional class. We also included expressed Alu repeats, which are abundantly interspersed throughout the genome. A commonly used normalization factor for RT-qPCR experiments in the literature is 18S rRNA. The rRNA constitutes more than 90% of total RNA, and this fact led many researchers to use 18S rRNA as a control for normalization of gene expression data. However, it has been shown that equal fractions of rRNA do not necessarily ensure equal fractions of mRNA [23]. This concern is of even greater importance in cancer cell perturbation experiments, as these perturbations may lead to differential expression of RNA polymerase I and/or II, or differential degradation of the two RNA populations [24], and consequently may result in further imbalances in the rRNA/mRNA ratio. In addition, the high amount of rRNA as compared to mRNA makes it difficult to subtract the background fluorescence in the analysis of RT-qPCR data [4]. For these reasons, using 18S rRNA as a control in RT-qPCR experiments could lead to false interpretation of gene expression data. We have therefore not evaluated 18S rRNA in our study.
RT-qPCR was performed for the 11 reference assays in 21 cancer cell lines derived from 9 different cancer entities using SYBR green technology. The cell lines were exposed to harsh treatment conditions, generating 19 different datasets with a total of 418 samples. The cells were treated with various chemical inhibitors, including pro-apoptotic compounds and differentiation-inducing agents, or transfected with miRNA mimics or siRNAs. Using geNorm [4] implemented in qbase+ [1], we calculated the stability value or M-value. The M-value is the average pairwise variation (standard deviation) of the log-transformed ratios of expression levels of paired candidate reference genes. This value allowed us not only to rank the reference genes in terms of their stability, but also to compare the stability of these reference genes across different experimental conditions. M-values and the   Figure S2.
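The verbal definition of the M-value can be written out as follows. The notation is an assumption introduced here for illustration ($a_{ij}$ for the expression level of gene $j$ in sample $i$, $n$ candidate reference genes, and a base-2 logarithm as in the geNorm publication [4]); it is not taken from the text above.

```latex
% Log-transformed expression ratios of genes j and k across samples i = 1..m
A_{jk} = \left\{ \log_2\!\left(\frac{a_{ij}}{a_{ik}}\right) \right\}_{i=1,\dots,m}

% Pairwise variation: standard deviation of those log ratios
V_{jk} = \operatorname{SD}\!\left(A_{jk}\right)

% M-value of gene j: average pairwise variation with all other candidates
M_j = \frac{1}{n-1} \sum_{\substack{k=1 \\ k \neq j}}^{n} V_{jk}
```

Two genes whose expression ratio is constant across all samples give $V_{jk} = 0$, so a low $M_j$ indicates that gene $j$ varies in step with the other candidates.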
After calculating the M-values and ranking the reference genes, we noticed that Alu repeats are ranked among the most stable reference genes in the vast majority of the datasets and generally have low M-values. We then applied a rank aggregation strategy [3] to determine the optimal ranking of the reference genes across all 19 datasets. This analysis confirmed that Alu repeats represent the most stable reference assay with acceptable M-values. All our gene expression measurements were done using SYBR green. Different technologies are currently available, and several studies have compared the different platforms [25]. These comparisons have shown that SYBR green is in very good concordance with the widely used TaqMan gene expression assays. In the current study, we used cell lines that give a good yield of high-quality RNA (tested using the Experion system from Bio-Rad). We believe that using such high-quality RNA and such abundant reference genes will generate reproducible results regardless of the platform being used. Our results strongly emphasize the importance of proper selection of reference genes for different experimental setups. In addition, we showed that Alu repeats can serve as a stable reference in most of the experimental conditions.

Conclusions
The reliability of RT-qPCR data depends on accurate normalization of the generated data using internal reference genes. Stable and suitable endogenous control genes are a necessity for accurate normalization and for correct interpretation of gene expression data. In this study, we report that, among 11 commonly used reference genes, Alu repeats are the most stable reference sequence in cell lines from 9 different cancer types that were subjected to different perturbation experiments. We therefore recommend including Alu repeats as a first candidate for normalization of RT-qPCR data.

Selection of the Reference Genes
The selection of the internal control genes evaluated in this study (Table 1) is based on previous studies published by our group [2,4,5,6]. These genes are commonly used as reference genes in the literature and belong to different pathways to avoid coregulation upon different treatment conditions. We expanded this selection to include expressed Alu repeats.

Cell Lines and Culturing of Cells
The cells used are established cell lines from 9 tumour types: neuroblastoma, T-ALL, melanoma, breast cancer, acute myeloid leukemia, prostate cancer, colorectal cancer, non-small-cell lung    Table S1.

Statistical Measurements and Data Analysis
GeNorm, available in qbase+ (Biogazelle, http://www.qbaseplus.com), was used to calculate the M-values and quantify gene expression data. GeNorm is an algorithm that calculates a gene expression stability measure (M-value) for each of the selected reference genes. This is done by calculating the pairwise variation (standard deviation of logarithmically transformed expression ratios) of each reference gene with all other reference genes and averaging these values. The lowest M-value indicates the gene with the highest expression stability. Stepwise exclusion of the gene with the highest M-value allows the ranking of genes in terms of expression stability. The individual gene lists, ranked by M-value, were combined using the rank aggregation R package "RankAggreg" [21] to determine the most stable reference gene across all experiments. The rank aggregation analysis finds the list that is as close as possible to all individual lists generated by the individual experiments. More specifically, we used the Cross-Entropy (CE) Monte Carlo algorithm implemented in this package, which starts by generating random lists and then converges towards the optimal list through an iterative procedure that uses a distance function. The weighted Spearman's footrule distance is used for this purpose; in our case, the weight is the M-value generated by the geNorm algorithm for every gene in the individual lists.
Figure S1 GeNorm V-values of the individual experiments. The geNorm V-value is used to determine the optimal number of reference genes. GeNorm calculates the pairwise variation between two sequential normalization factors (NFs). The normalization factor is the geometric mean of expression of the selected reference genes. The normalization factor $NF_{n+1}$ is the geometric mean of the reference genes in $NF_n$ plus one additional reference gene. $V_{2/3}$ is the variation between $NF_2$ and $NF_3$, and so on. Vandesompele et al.