Comparison of MicroRNA Deep Sequencing of Matched Formalin-Fixed Paraffin-Embedded and Fresh Frozen Cancer Tissues

MicroRNAs regulate several aspects of tumorigenesis and cancer progression. Most cancer tissues are archived formalin-fixed and paraffin-embedded (FFPE). While microRNAs are a more stable form of RNA thought to withstand FFPE-processing and degradation there is only limited evidence for the latter assumption. We examined whether microRNA profiling can be successfully conducted on FFPE cancer tissues using SOLiD ligation based sequencing. Tissue storage times (2–9 years) appeared to not affect the number of detected microRNAs in FFPE samples compared to matched frozen samples (paired t-test p>0.7). Correlations of microRNA expression values were very high across microRNAs in a given sample (Pearson’s r = 0.71–0.95). Higher variance of expression values among samples was associated with higher correlation coefficients between FFPE and frozen tissues. One of the FFPE samples in this study was degraded for unknown reasons with a peak read length of 17 nucleotides compared to 21 in all other samples. The number of detected microRNAs in this sample was within the range of microRNAs detected in all other samples. Ligation-based microRNA deep sequencing on FFPE cancer tissues is feasible and RNA degradation to the degree observed in our study appears to not affect the number of microRNAs that can be quantified.


Introduction
MicroRNAs are noncoding small RNAs with a length of 22 to 26 nucleotides (nt). MicroRNAs result in degradation of complementary messenger RNA in many species. MicroRNAs play an important role in cell development, cell death, cell proliferation [1], apoptosis [2] and angiogenesis [3]. The pioneer bead-based flow cytometric microRNA expression profiling method revealed that microRNA profiles reflect the developmental lineage and differentiation of tumors [4]. Normal tissues and cancer tissues were shown to have distinctive expression profiles and microRNA profiles can describe microRNA-driven pathways in solid tumors [5].
The current methods for microRNA quantification are: realtime reverse transcription-PCR [6,7], microarray [8], Nanostring, and next generation sequencing [9][10][11]. Especially, the newly developed next generation sequencing technologies have the potential to discover novel microRNAs and other small RNAs. Most tissues processed in the health care setting will undergo formalin-fixation and paraffin-embedding (FFPE). Most clinical assays performed in pathology laboratories are optimized for FFPE tissues. FFPE tissues are typically stored at room temperature. Formalin preserves tissue samples by creating cross-linking between proteins, DNA and RNA. DNA extracted from FFPE tissue has been shown to be useful for copy number analysis and mutation analysis on at least one platform in some settings [12]. However, the chemical modification between macromolecules and RNAs caused by formalin fixation can accelerate the degradation of RNA [13] [14]. The stability of microRNAs is generally much more robust than messenger RNA [15]. Next generation sequencing based microRNA expression analysis has been developed on several platforms, including Roche/454 platform, Illumina's Genome Analyzer and ABI's SOLiD platform [16] [17], and different commercial protocols for miRNA library preparation have been developed by the three companies [18]. There is currently only limited evidence available that microRNA sequencing results are reliable and valid. Ma et al. demonstrated the feasibility of miRNA profiling based on Sanger sequencing in 10-year-old archived tissue [19]. To our knowledge there are only two peer-reviewed studies published that compared microRNA sequencing results of matched frozen and FFPE specimens using the Illumina platform [9,20]. The objective of this study was to determine if FFPE samples can be successfully characterized by deep miRNA sequencing. To this end we analyzed matched frozen and FFPE tissues of different histologies for which detailed processing and storage information was available.

Ethics Statement
Human malignant tissue samples procured during period 2003-2009 were obtained from the Cooperative Human Tissue Network (CHTN/NCI), Midwestern Division, which is funded by the National Cancer Institute, based on an internal review board (IRB) approved research protocol (Ohio State University Human Subjects Protocol -2011E0377). Other Investigators may have received specimens from the same subjects.

Tissue specimens
Eight paired human malignant tissue samples were received. Remnant surgical tissues were taken after diagnostic samples were secured from patients (five males and four females; median age 58 years, range 39-78 years) with breast invasive ductal carcinoma (2), renal clear cell carcinoma (2), lung adenocarcinoma (1), prostate adenocarcinoma (1), metastatic melanoma (1), and sarcoma of thigh (1). Surgical specimens were transported from the operating rooms to Tissue Procurement, examined under a pathologist's supervision and preserved within 5 to 57 recorded minutes. Paired tissue samples selected for research were either immediately snap frozen in liquid nitrogen and placed in a -80uC freezer for monitored storage or fixed in 10% buffered formalin for up to 24 hours and then processed into a paraffin embedded block and stored at room temperature. When the research samples were received from CHTN, the coded samples were accompanied by a coded final pathology report and a quality assessment report verifying collection of tumor tissue. The specimens have been reviewed by a pathologist and 4 cases had 100% tumor, 3 cases 90-95% and 1 case 50-60% tumor, and matched FFPE and frozen samples were generally comparable in tumor content and size. The clinicopathologic features of the cases used for microRNA sequencing and expression analysis are listed in table 1.
RNA Isolation, quantitation, and quality Assessment RNA from FFPE samples were extracted using Recover All Total Nucleic Acid Isolation Kit (Life Technologies Corporation, Carlsbad, CA). Briefly, 5-10 mg samples were sliced from paraffin blocks and deparaffinizated by xylene at 50uC, followed by 100% ethanol wash. The air-dry tissue samples were digested by proteinase K for 24 hrs in a microtube shaking incubator set at 50uC. The digested samples were mixed with appropriate volume of isolation additive and 100% ethanol. After passing the mixture through the filter cartridge, the DNA and RNA were retained on the filter. The DNA was removed by on-filter DNase digestion. The RNA was purified by washing buffer and eluted with nuclease-free water.
The microRNA samples from the fresh frozen samples were extracted using PureLink TM miRNA Isolation Kit (Life Technologies Corporation, Carlsbad, CA). Briefly, 5 mg fresh frozen tissue samples were mixed with 300 ml Binding Buffer (L3) and homogenized using tissue homogenizer. The homogenized samples were centrifuged, and supernatant were mixed with 300 ml 70% ethanol. The solution containing the RNA was purified by two round of spin columns cleanup. The RNA was eluted with 50-100 ml sterile RNase-free water.

MicroRNA sequencing
Small RNAs from FFPE and fresh frozen samples were prepared for SOLiD sequencing as follows: the total RNA samples were processed by the flashPAGE fractionator (Ambion) and flashPAGE Clean-Up Kit (Ambion). The enriched small RNA was then processed according to the SOLiD Small RNA Expression Kit protocol (Applied Biosystems). The purified small RNAs were ligated with 5' and 3' adapter mix using RNA ligase. The ligated products (40-60 bases in length) were reverse transcribed and purified on Novex 10% TBE-Urea gel. Subsequently, 15-18 cycles of PCR were performed by amplifying the purified cDNA with barcoded PCR primer sets provided in the kit, which differed by a unique 6-nucleotide sequence. The amplified products were loaded on Novex 6% TBE gel (Invitrogen) and the gel bands containing 110 to 130 bp fragments were excised. The amplified products were purfied from excised gel band, amplified by emulsion PCR, then loaded on Applied Biosystems SOLiD 4 next generation high throughput sequencing system for data acquisition. The quality of the samples and libraries were verified on the Agilent Bioanalyzer [21,22]. The raw SOLiD sequencing data were uploaded to the NCBI's Gene Expression Omnibus and are accessible through GEO Series accession number GSE45740 (http://www.ncbi.nlm. nih.gov/geo/query/acc.cgi?acc = GSE45740).

RT-qPCR for microRNA validation
The RNA from fresh frozen and FFPE tissues were validated by real-time (RT) quantitative PCR. microRNA expression was normalized to the small nuclear RNA U6 using TaqMan MicroRNA Assay kits (Applied Biosystems, Darmstadt, Germany). A 30 ng sample of RNA was processed by the TaqMan MicroRNA Reverse Transcription kit (Applied Biosystems, Darmstadt, Germany). In brief, 5 ml RNA was mixed with 1 mmol/l of each deoxyribonucleotide triphosphate, 50 units of Multiscribe Reverse Transcriptase, 56 reaction buffers, 4 units RNase inhibitor, and 56 gene-specific RT primers mix in a final reaction volume of 15 ml. Reactions were then incubated at 16uC for 30 min, 42uC for 30 min, 85uC for 15 min with a final hold at 4uC. Afterward, 15 ml of the cDNA solution was diluted by nuclease-free water to a final volume of 100 ml. Quantitive PCR was run on a 7900 HT PCR machine with denaturation step at 95uC for 10 minutes, followed by 40 cycles of a denaturation step at 95uC for 15 seconds, and an annealing/elongation step at 60uC for 60 seconds. Fold change for microRNA in tissue samples was then calculated using the equation 2-DCt test/2-DCt control.

Data analysis
The linker sequences of small RNA libraries were removed using cutadapt software (http://code.google.com/p/cutadapt/) under default parameters. The length distribution of microRNA was constructed by plotting reads counts against 0-30 nucleotide lengths. The small RNA Analysis Pipeline Tool from Applied Biosystems was used with the following parameters to include only sufficiently high quality reads in our study: 1) Minimal average read quality threshold was 20 (The QV value is calculated using a phred like score q = -106 log10 (p) where q is the quality value and (p) is the predicted probability that the color call is incorrect.), 2) to be stringent we allowed only 1 mismatch between reads and miRBase version 16 hits for each counted read 3) the minimal length of aligned reads had to be 17 bases. Data were normalized as reads per million (RPM). Sequences with expression value

Effects of FFPE sample degradation: microRNA read length distribution
Formalin fixation and long storage times are known to result in RNA degradation [23]. This degradation results in a shorter average RNA length [24]. To identify possible degradation effects of microRNAs, we analyzed the read length distribution of matched fresh frozen and FFPE cancer tissue samples. After the 3' adaptor sequences of each read were trimmed, sequence reads length distribution was examined (figure 1). Read lengths were observed to be centered at 21 nt in all frozen samples and all but one FFPE samples. There was one FFPE sample whose peak microRNA length was found at 17 nt, indicating significant degradation. We then carefully examined the small RNA classification profiles and correlation of microRNA expression values of fresh frozen and FFPE samples. We found that there are negligible or no effects on those analyses. Storage times or preparation methods were not different between this degraded FFPE sample and the other samples analyzed in this study. Also, the proportion of small RNAs in the 20-22 nt range was lower for several cases in the FFPE compared to the matched frozen specimens (one renal cell carcinoma case, the lung adenocarcinoma case, and the prostate cancer case). This is also likely due to degradation in FFPE samples, although small variations during the library preparation could partially contribute to the observed differences.

Effects of FFPE sample degradation: storage time
The storage time of the samples ranged from 2 to 9 years. 624 micoRNAs and their precursors were identified in the 8 pairs of tissue samples. Some microRNAs could not be detected in either fresh frozen or FFPE samples. Figure 2 shows the percentages of microRNAs that were detected in both fresh frozen and FFPE samples, only in fresh frozen samples and only in FFPE samples over the year of specimen collection. There were no storage-timedependent changes in those percentages, indicating that there is no significant association between storage time and the number of microRNAs present during this 9 year time window.
We next examined the percentage of microRNA reads among total reads (figure 3) to understand if the purity of microRNAs was reduced in the FFPE samples. Approximately 70% and 85% of reads in FFPE and frozen samples, respectively, were identified as microRNA sequences. A possible reason for this difference could be the presence of fragments of lncRNAs and mRNAs in the fraction of RNA smaller than 40 nucleotides (nt) in FFPE tissues (figure 4).

Effects of FFPE sample degradation: classification of small RNAs
To better understand if the difference in the percentage of small RNA sequences among all reads between frozen and FFPE tissue data we examined the distribution of RNA classes in one of the breast cancer sample. There were a total of 11563414 and 10273837 sequencing reads, in the two examined matched samples and the reads were mapped to the human genome (hg19, National Center of biotechnology information building) using custom Python scripts. The reads were classified in different RNA groups according to the EnsEMBL database. The majority of small RNAs in both fresh frozen and FFPE libraries were microRNAs (figure 4). Comparing data from matched fresh frozen samples to data from FFPE samples there were lincRNA and sequences of unknown classes in data obtained from FFPE samples. Given that only RNAs smaller than 40 nt were extracted and processed for sequencing this suggests that there is fragmentation of lincRNA and other long RNA in FFPE tissues, which are longer than 40 nt.
A total of 469 to 548 (mean 519) different microRNAs and their precursors were identified in both fresh frozen and FFPE samples. We determined the percentages of microRNAs that have more than a 2 fold difference of expression between matched fresh frozen and FFPE samples ( figure 5). Among the different cases, 69 to 145 microRNA transcripts were more than 2 fold overexpressed in fresh frozen samples. 73 to 186 microRNA transcripts were more than 2 fold overexpressed in FFPE tissues (figure 5).  (table 2). We sought out to identify features that are associated with higher correlation coefficients indicating invariance to the tissue preservation method. To test if the level of expression of a given microRNA is associated with the correlation between fresh frozen and FFPE tissue, Fisher transformed (arctanh) correlations were regressed onto mean FFPE expression ( figure 9). There was no association between expression values and correlations between data from frozen and FFPE tissue observed. We then sought out to understand if a possible lack of variation of individual microRNAs among the biological replicates is resulting in poor correlations. Fisher transformed correlations were regressed onto log transformed FFPE variances. The variance analysis showed that microRNAs with higher variance have significantly higher   correlations of data from FFPE and frozen tissues (figure 10; arctanh(r) = 0.47+0.596log10 (FFPE variance)), which may be due to the presence of true biological variation present in these microRNAs.

PCR validation of microRNA expression
We selected two microRNAs (miR-21 and miR-19a) that were expressed higher in frozen samples than in FFPE samples in our microRNA sequencing results for validation using PCR (Taqman). Both miR-21 and miR-19a were found to also be higher expressed in frozen tissue than in matched FFPE samples using a PCR assay (figure 11), consistent with our sequencing results.

Discussion
MicroRNAs as gene expression regulators appear to have an important role in cancer cell differentiation [4], proliferation [26], metastasis [27] and apoptosis [28]. There have been many microRNA profiling efforts using frozen cancer tissues [29] [30] [31]. Cancer tissue specimens are typically archived after formalin fixation and paraffin embedding (FFPE) and frozen tissue is only available for a minority of cases in pathology archives and tissue banks. A better understanding of the quality of microRNA sequencing results of FFPE tissues would allow for a more informed decision as to when use of FFPE tissue for small RNA sequencing experiments would be appropriate.
In this study, we analyzed small RNA sequencing profiles of eight pairs of matched fresh frozen and FFPE cancer tissue samples using the ABI SOLiD 4 platform. Storage time up to nine years appeared to not affect the number of detectable microRNAs in either frozen of FFPE tissue in this study. The presence of higher percentages of non-microRNAs in the fraction of nucleotides smaller than 40 nt suggested that FFPE tissues are more degraded, though. However, this apparent degradation did not result in fewer microRNAs being detected in FFPE tissues, suggesting that a sequencing approach can overcome sample degradation to some degree. While small RNAs in one of eight FFPE samples appeared degraded as suggested by the reduced read lengths centered at 17 nt compared to the other eight FFPE and all eight frozen samples centered at 21 nt, the number of aligned reads and the number of detected microRNAs in this degraded sample were similar to the data from other samples, which suggests that data obtained from samples with the above described degree of degradation would not necessarily need to be removed from all types of analysis.
Overall, there was excellent correlation between miRNA expression values between matched frozen and FFPE when combining all samples and all microRNAs ( fig. 6), or when analyzing all microRNAs within individual patient sample pairs ( fig. 7). Those findings are consisted with previous reports using different sequencing platforms suggesting that sequencing approaches can result in good correlations between FFPE and frozen samples when including several hundred microRNAs [32] [33] [9] [34].
Correlations of expression values of individual microRNAs (see figs. 8,9,10) were lower than correlations including all microRNAs (figs 6, 7). The underlying reason for this observation likely originates in the identified association of correlation coefficients of individual microRNAs and variance of microRNA expression values (fig 10). Very similar expression values of an individual microRNA among the eight cases (equal to low variance in fig. 10) typically result in low correlation coefficients because the variation between frozen and FFPE data is larger than the variation of expression between the eight cases included in this study. Therefore it is not necessarily surprising to have a wide distribution of correlations of individual microRNAs as analyzed in detail in figures 8, 9, and 10. Limitations of this analysis include limited small sample size and potential PCR drifts and PCR selection biases during amplification in the multiplex PCR based library construction [35] [36]. This disadvantage should be overcome by the 3 rd generation of sequencing technology, which is thought to not require amplification of the target sequence anymore but detect single nucleic acid molecules [37]. Furthermore, the use of adjacent tissue to create matched specimens for this analysis introduces some tissue heterogeneity effects and likely contributes to the observed differences between matched frozen and FFPE samples. In summary, microRNA sequencing of FFPE tissues is feasible using ligation-based sequencing and microRNA expression values of individual patients are highly correlated with expression data from matched frozen tissues. MicroRNAs with higher variance appear to have a higher correlation of their expression values in matched frozen and FFPE tissue data.