Reduction of Systematic Bias in Transcriptome Data from Human Peripheral Blood Mononuclear Cells for Transportation and Biobanking

Transportation of samples is essential for large-scale biobank projects. However, RNA degradation during pre-analytical operations prior to transportation can cause systematic bias in transcriptome data, which may prevent subsequent biomarker identification. Therefore, to collect high-quality biobank samples for expression analysis, specimens must be transported under stable conditions. In this study, we examined the effectiveness of RNA-stabilizing reagents to prevent RNA degradation during pre-analytical operations with an emphasis on RNA from peripheral blood mononuclear cells (PBMCs) to establish a protocol for reducing systematic bias. To this end, we obtained PBMCs from 11 healthy volunteers and analyzed the purity, yield, and integrity of extracted RNA after performing pre-analytical operations for freezing PBMCs at −80°C. We randomly chose 7 samples from 11 samples individually, and systematic bias in expression levels was examined by real-time quantitative reverse transcription polymerase chain reaction (qRT-PCR), RNA sequencing (RNA-Seq) experiments and data analysis. Our data demonstrated that omission of stabilizing reagents significantly lowered RNA integrity, suggesting substantial degradation of RNA molecules due to pre-analytical freezing. qRT-PCR experiments for 19 selected transcripts revealed systematic bias in the expression levels of five transcripts. RNA-Seq for 25,223 transcripts also suggested that about 40% of transcripts were systematically biased. These results indicated that appropriate reduction in systematic bias is essential in protocols for collection of RNA from PBMCs for large-scale biobank projects. Among the seven commercially available stabilizing reagents examined in this study, qRT-PCR and RNA-Seq experiments consistently suggested that RNALock, RNA/DNA Stabilization Reagent for Blood and Bone Marrow, and 1-Thioglycerol/Homogenization solution could reduce systematic bias. On the basis of the results of this study, we established a protocol to reduce systematic bias in the expression levels of RNA transcripts isolated from PBMCs. We believe that these data provide a novel methodology for collection of high-quality RNA from PBMCs for biobank researchers.


Introduction
On Friday, March 11, 2011, the northern part of Japan was affected by a magnitude-9.0 earthquake, termed the Great East Japan Earthquake. This event was the fourth most powerful earthquake on record and triggered a devastating tsunami. After the disaster, the incidence of cardiovascular and psychiatric diseases increased in the coastal areas of Iwate [1], [2] and Miyagi prefectures [3], [4], and a large-scale population-based biobank, the Tohoku Medical Megabank, was instituted for systematic data collection, storing, and parceling of biological samples, such as serum, plasma, peripheral blood mononuclear cells (PBMCs), and genomic DNA and RNA extracted from PBMCs and urine, as well as heath information (lifestyle, history of illness, and living environment) and gene analysis information, in order to investigate the effects of the earthquake and tsunami on the health of local residents.
Collection of blood RNA in population-based cohort studies provides an invaluable opportunity within research communities to search for blood RNA biomarkers for assessing risks associated with the onset of common diseases. Quality management of sample transportation from remote assessment centers to the central laboratory is essential for biobank projects. Although genomic DNA is stable against pre-analytical operations such as freezing and transportation [5], [6], RNA is less stable to conformational changes than DNA because RNA is subjected to spontaneous degradation at room temperature. To collect stabilized RNA from whole blood samples, the UK biobank (http://www.ukbiobank.ac.uk/) employs Tempus Blood RNA tubes (Life Technologies, Carlsbad, CA, USA) [7], while other biobank projects use the PAXgene RNA System (Becton Dickinson and Company, Franklin Lakes, NJ, USA) [8]. Whole blood components, such as plasma and heme-containing red blood cells, inhibit polymerase chain reaction (PCR) and reverse transcription [9], and high concentrations of globin mRNA in whole blood RNA interfere with accurate transcriptome analysis [10]. However, using PAXgene [11], [12] or Tempus [13] tubes can help to stabilize RNA profiles in whole blood. Some reports using these blood collection tubes have shown that globin reduction can alter the expression of several genes [13][14][15]. Therefore, PBMCs are commonly used for gene expression profiling, and clinical transcriptional studies can provide important findings for potential biomarkers of disease initiation [16], [17].
Here, we propose a protocol to minimize the effects of preanalytical procedures on the transcriptome of PBMCs by assessing changes in the transcriptome with and without the use of stabilizing reagents. Seven commercially available stabilizing reagents were considered. Based on qRT-PCR experiments, we carried out quantitative analysis of gene expression for 19 selected candidate genes. Subsequently, we performed whole transcriptome analysis using a HiSeq2500 sequencer (Illumina Inc., San Diego, CA, USA) to examine changes in gene expression under various sample conditions. From our data, we developed a standard protocol for collecting high-quality RNA derived from PBMCs.

Ethics statement
Ethical approval for the study was obtained from the Ethical Committee of Iwate Medical University (Approval ID: HG H25-1). All the subjects gave us written informed consent to participate this study and the use of anonymous samples, explicitly explained to participants.

Blood collection and pre-analytical operations
The workflow for this study is shown in Figure 1. Forty milliliters of whole blood was collected from 11 healthy volunteers (mean age, 35.3610.6 years; eight men and three women) into five BD Vacutainer CPT tubes with sodium heparin (8 mL; Becton Dickinson and Company) through a 21-gauge needle. PBMCs were separated by centrifugation (Sorvall Legend XFR; Thermo Fisher Scientific, Waltham, MA) at 1,7006g for 20 min at room temperature. To remove any contaminating platelets and plasma, PBMCs were washed three times in 30 mL phosphate buffer saline (PBS) containing 2 mM EDTA, followed by centrifugation at 2506g for 10 min at room temperature [18]. PBMCs were then resuspended in 1 mL PBS, and cell numbers were determined using a C-Chip disposable hemocytometer (Biochrom AG, Berlin, Germany).

RNA extraction and measurement of purity, yield, and integrity
After thawing frozen PBMC samples at 37uC, total RNA was immediately extracted using a Maxwell16 LEV simplyRNA Blood Kit on a Maxwell16 Instrument (Promega) according to the manufacturer's instructions. A 4-mL aliquot of each sample of extracted total RNA was used for RNA quality control assessments, while the remaining RNA sample was stored at 2 80uC until use. The RNA yield was measured using a Qubit RNA Assay Kit on a Qubit2.0 Fluorometer (Life Technologies). RNA purity was estimated by measuring the ratio of absorbance at 260 and 280 nm (A260/A280) using a Nanodrop 2000 UV-Vis Spectrophotometer (Thermo Fisher Scientific). The RNA integrity number (RIN) was assessed using an RNA 6000 Nano Kit on an Agilent 2100 Bioanalyzer (Agilent Technologies, Santa Clara, CA, USA) according to the manufacturer's instructions.
Quantitative real time-PCR and data analysis RNA expression levels were measured using TaqMan Gene Expression Assays (Applied Biosystems). We prepared TaqMan PCR primers and probes for 21 genes (Table 1). GAPDH and ACTB were selected as controls (also referred to as reference transcripts). Twelve genes (IL1B, IL8, IL10, IL15, IFNG, TNF, JUN, HSPA1A, FOS, NFKB1, TP53, and NOS3) were selected according to previous studies of RNA stability [19]. SPI1 and CSF1R were selected as monocyte-specific markers. The remaining five genes were selected because these genes have been suggested to be biomarkers of human disease (IL6, SLC6A4, HDAC5, NR3C1, and BDNF).
cDNA was synthesized from RNA samples using a SuperScript VILO cDNA Synthesis Kit (Life Technologies). For a combination of 48 samples648 pairs of PCR primers, pre-amplification of cDNA and qRT-PCR experiments were performed in parallel BioMark 48.48 Dynamic Array chips (Fluidigm, South San Francisco, CA, USA). Two replicates of threshold cycle (Ct) values were measured for each condition.
The Ct values were analyzed using an in-house Perl script based on the DDCt method [20]. Let Ct i,j,k be a Ct value for a condition i, a transcript j, and a replicate k. In our script, at first, Ct i,j = Average k (Ct i,j,k ) was calculated by averaging Ct values over replicates. Next, DCt i,j = Ct i,j -Ct i,j9 was calculated by subtracting the Ct of the reference transcript j9 (e.g., GAPDH or ACTB) from the Ct of a target transcript j. Finally, DDCt i,j = DCt i,j-DCt i9,j was calculated by subtracting the DCt of the reference condition i9 (e.g., Ctrl1) from the DCt of the target condition i.

Data analysis of RNA-seq
Base calling and quality filtering were performed using real-time analysis (RTA) with HCS v2.0. The quality of sequence data was evaluated by FastQC (http://www.bioinformatics.babraham.ac.uk/ projects/fastqc/). The human reference sequence file (hs37d5.fa) was downloaded from the 1000genome ftp site (ftp://ftp.1000 genomes.ebi.ac.uk/vol1/ftp/technical/reference/phase2_reference_ assembly_sequence/), and the annotated general feature format (gff) file was downloaded from the Illumina iGenome ftp site (ftp:// igenome:G3nom3s4u@ussd-ftp.illumina.com/Homo_sapiens/NCBI /build37.2/). The human genome index was constructed with the bowtie-build v.0.12.9 [21]. The fastq files were aligned to the reference genomic sequence by TopHat v.2.0.9 [22] with samtools v.0.1.19 [23]. Estimation of transcript abundance was calculated, and the count values were normalized to the upper quartile of the fragments per kilobase of transcript per million fragments mapped reads (FPKM) using Cuffdiff v2.1.1 [24]. Since RNA from three individuals was analyzed by both qRT-PCR and RNA-Seq, the Spearman correlation coefficient between the results of qRT-PCR (DCt value) and RNA-Seq (log 10 [FPKM+1] value) was calculated by comparing 456 (3 individuals68 conditions619 transcripts) dimensional vectors. Pearson correlation coefficients between pre-analytical conditions were calculated based on log 10 [FPKM+1] values. Therefore, we first calculated the Pearson correlation coefficients between condition pairs for each of the seven individuals, and then the average and standard deviation of the correlation coefficients across the seven individuals were calculated for each condition pair. Cluster analysis was also performed based on log 10 [FPKM+1] values. A dendrogram was created using the CummeRbund package in R [25]. Transcripts that were differentially expressed among preanalytical conditions were tested based on the Wilcoxon signed rank test.

Results
Effects of pre-analytical conditions on RNA yield, integrity, and purity PBMCs from 11 volunteers were subjected to 10 pre-analytical conditions: Ctrl1, Ctrl2, Without stab, Later, Protect, Lock, Shield, Stab, SDS, and 1-Thio. The purity, yield, and integrity of extracted RNA for each of the 10 conditions are shown in Table 2.
The RNA yield of the Ctrl2 condition was equivalent to that of the Ctrl1 condition, confirming that our experimental results were consistent between replicates. Moreover, the RNA yield of the Without stab condition was equivalent to that of the Ctrl1 condition. This result implied that pre-analytical freezing did not lower RNA yield. Similarly, the RNA yield of the Protect, Stab, and 1-Thio conditions were not significantly different from that of the Ctrl1 condition. In contrast, the RNA yield of the Lock, SDS, Later, and Shield conditions was significantly lower than that of the Ctrl1 condition (Table 2). These data indicated that there was insufficient RNA yield for biobanking purposes under these conditions.
The RNA integrity (RIN value) of the Ctrl2 condition was equivalent to that of the Ctrl1 condition. The RNA integrity of the Without stab condition was significantly lower than that of the Ctrl1 condition (Table 2). This result suggested that pre-analytical freezing significantly lowered RNA integrity, implying that addition of stabilizing reagents was essential for prevention of reduced RNA integrity. Under the Lock and Stab conditions, RNA integrity was equivalent to that of the Ctrl1 condition. The RNA integrity of the 1-Thio sample was higher than that of the Ctrl1 condition, while the RNA integrity under the Protect and SDS conditions was significantly lower than that of the Ctrl1 condition. For the Later and Shield conditions, RNA integrity could not be measured, possibly due to low RNA yield and concentration. These results implied that RNA degradation during pre-analytical operations was effectively prevented under the Lock, Stab, and 1-Thio conditions.
The RNA purity (i.e., A260/A280) under the Ctrl2 condition was equivalent to that of the Ctrl1 condition. Similarly, under the Without stab and SDS conditions, the difference in RNA purity compared to the Ctrl1 condition was not significant. The RNA purities under the Later, Protect, and Shield conditions were significantly lower than that of the Ctrl1 condition. Conversely, the RNA purities of the Lock, Stab, and 1-Thio conditions were improved compared to that of the Ctrl1 condition. These results indicated that stabilizing reagents could significantly affect the purity of RNA.
From these results, we excluded the Later and Shield conditions from candidate reagents for preventing RNA degradation because of their insufficient RNA yield. The other six conditions (Without stab, Protect, Lock, Stab, SDS, and 1-Thio) as well as the two control conditions (Ctrl1 and Ctrl2) were examined in subsequent quantitative expression analyses.

Gene expression bias during pre-analytical operations analyzed by qRT-PCR
To examine whether gene expression in PBMCs was biased during pre-analytical operations and whether the bias could be reduced using stabilizing reagents, we performed qRT-PCR assays to analyze the expression of 21 transcripts from seven individuals for each of the eight conditions. When GAPDH was used as control, no significant bias in DCt was observed between the Ctrl1 and Ctrl2 conditions for all target transcripts (Figure 2), suggesting that our experimental results were consistent between replicates. Between the Without stab and Ctrl1 conditions, five transcripts (FOS, SPI1, IL6, HSPA1A, and BDNF) were significantly biased (significance level of 0.05), indicating that absence of a stabilizing reagent promoted bias of transcript expression in a significant portion (,40%) of transcripts. Under the Protect, Lock, Stab, SDS, and 1-Thio conditions, nine, three, one, five, and one transcripts were significantly biased, respectively, compared to the Ctrl1 condition ( Figure 2). These results indicated that gene expression bias during pre-analytical operations was reduced under the Lock, Stab, and 1-Thio conditions. Similar results were obtained when ACTB was used as control ( Figure S1).

Pre-analytical operation bias in whole transcriptome analyzed by RNA-Seq
To extend the above expression analysis of the 19 target transcripts to the whole transcriptome, we performed RNA-Seq for 56 samples (7 individuals68 conditions; Figure 1). The base quality of 95-99 nucleotides was 32.860.2 ( Figure 3A). The number of sequenced reads was 38.865.0 million. The percentage  of mapped reads was 90.1% 60.0% ( Figure 3B). These statistics suggested that our sequencing data had sufficient quality for expression analyses.
As an indicator of normalized gene expression levels, FPKM values were calculated using TopHat and Cufflinks (File S1) [22], [23], [24]. Spearman correlation coefficient between DCt values measured by qRT-PCR experiments (GAPDH was used as a reference transcript) and the FPKM value calculated from RNA-Seq data was 20.762 (P,2.2610 216 ), suggesting that qRT-PCR and RNA-Seq showed consistent quantitative results ( Figure S2). Figure 4A shows the average and standard deviation of Pearson correlation coefficients between the Ctrl1 conditions and the seven other conditions. Importantly, the Pearson correlation coefficient between the Ctrl1 and Ctrl2 conditions was 0.995560.00057, indicating the markedly high reproducibility of our experimental results between replicates. Additionally, the Pearson correlation coefficient between the Without stab and Ctrl1 conditions was 0.982560.010, which was significantly lower than that between the Ctrl1 and Ctrl2 conditions (P = 0.016; Wilcoxon signed rank test). This result suggested that pre-analytical freezing induced a systematic bias in the expression level of the whole transcriptome. As expected, under the 1-Thio, Stab, Lock, and Protect conditions, Pearson correlation coefficients with the Ctrl1 condition were not significantly different from that between the Ctrl1 and Ctrl2 conditions ( Figure 4A). However, the Pearson correlation coefficient between the SDS and Ctrl1 conditions was significantly lower than that between the Ctrl1 and Ctrl2 conditions (P = 0.016; Wilcoxon signed rank test). These results showed that the 1-Thio, Stab, Lock, and Protect conditions could reduce systematic bias due to pre-analytical operations. Pair-wise scatter plots between the eight conditions are shown in Figure S3.
To examine whether the transcriptomic bias due to preanalytical operations was larger than transcriptomic variations between individuals, we performed clustering analysis of the 56 transcriptomes ( Figure 4B). Interestingly, the Ctrl1, Ctrl2, 1-Thio, Stab, Lock, and Protect conditions were clustered together for each individual. The Without stab and SDS conditions were not included in the individual clusters. These results indicated that transcriptomic bias under the 1-Thio, Stab, Lock, and Protect conditions was less drastic than transcriptomic variations between We detected differentially expressed transcripts (DETs) between conditions by testing whether FPKM values under a condition were different from those under another condition using the Wilcoxon signed rank test with a significance level of 0.05 ( Figure 4C). Of 25,223 transcripts, only 558 transcripts were detected as DETs between the Ctrl1 and Ctrl2 conditions. The number of DETs was within the expected number of false-positive discoveries (25,22360.05 = 1,261). The number of DETs between the Without stab and Ctrl1 conditions was 10,411, which was much larger than the expected number of false-positive discoveries. Under the 1-Thio, Stab, Lock, and Protect conditions, the numbers of DETs were 3,293, 2,740, 4,621, and 3,940, respectively. These results suggested that the stabilizing reagents  tested in this study could decrease systematic bias in FPKM values, although the number of DETs was larger than the expected number of false-positive discoveries. The number of DETs between the SDS and Ctrl1 conditions was 12,086, indicating that SDS increased rather than reduced systematic bias in FPKM values. To take into account not only the number of DETs, but also the P-value distribution, we drew quantile-quantile plots (QQplots, Figure S4). The P-value distributions of the 1-Thio, Stab, Lock, and Protect conditions were more similar to that of the Ctrl2 condition compared to those of the Without stab and SDS conditions.

Discussion
In this study, we examined the effects of pre-analytical freezing, which is essential for the collection of PBMCs from remote assessment centers to a central laboratory, on gene expression levels in a transcriptome-wide manner. Even when no stabilizing reagent was used (i.e., the ''Without stab'' condition), RNA yield and purity were not lowered by pre-analytical freezing. However, RNA integrity was lowered under the Without stab condition, indicating that substantial degradation of RNA molecules occurred. qRT-PCR analysis supported this conclusion, demonstrating that five out of 19 transcripts exhibited systematic bias under the Without stab condition. In addition, whole-transcriptome analysis suggested that the expression levels of 10,411 out of 25,223 transcripts were systematically biased, suggesting that approximately 40% of whole transcripts were systematically biased due to pre-analytical freezing. Accordingly, it is necessary to implement an appropriate protocol to reduce systematic bias in gene expression levels to collect RNA from PBMCs for large-scale biobanks.
In this study, we analyzed the effects of seven commercially available stabilization reagents (Later, Protect, Lock, Shield, Stab, SDS, and 1-Thio; see the Methods section) on RNA purity, integrity, and yield. Our results demonstrated that Later and Shield significantly lowered RNA yield, making these two conditions inappropriate for biobanking purposes. Of the five remaining candidate reagents, the reduced RNA integrity observed under the Without stab condition was improved under the Lock, Stab, and 1-Thio conditions. Furthermore, qRT-PCR and RNA-Seq experiments consistently confirmed that systematic bias in expression levels was substantially reduced under these three conditions. Thus, use of these reagents could substantially improve collection and analysis of RNA for biobanks.
Protocols for sample collection for large-scale biobanks must be as simple and as low cost as possible. Based on the results of the present study, we established a protocol for pre-analytical treatment of samples at remote assessment centers to obtain high-quality RNA from PBMCs at a central laboratory ( Figure 5). For blood collection (Procedure 1), we used a Vacutainer CPT tube, which could help to reduce contamination of red blood cells in the PBMC layer without the need for technical expertise. After centrifugation (Procedure 2), extracted PBMCs were washed with more than 15 volumes of 10 mM PBS containing 2 mM EDTA (pH 7.2) to remove platelets (Procedures 3-6). Then, PBMCs are resuspended in a suitable volume of 10 mM PBS (Procedures 7-9). We assumed that RNA extraction was performed using a Maxwell16 instrument at a central laboratory (Procedure 15). Considering the cost of reagents, we selected 1-Thio, which is supplied with the Maxwell16 LEV simplyRNA Blood Kit, as a stabilizing reagent (Procedure 10). After this pretreatment, the tubes were frozen at 280uC for at least 24 h and at most one week (Procedure 11). For these pre-analytical operations, a cool incubator, a deep freezer, a tabletop centrifuge, a tabletop clean bench, and an aspirator are required at remote assessment centers. This equipment can be arranged within an area of 262 m.
Our protocol would require remote assessment centers to ship frozen tubes to a central laboratory using dry ice (Procedure 12). When the tubes would arrive at the central laboratory, they would be de-identified (Procedure 13) and would then be stored at 2 80uC until RNA extraction (Procedure 14). For RNA extraction, 32 tubes could be processed simultaneously. After thawing tubes at 37uC, RNA would be extracted from PBMCs using a Maxwell16 instrument (Procedure 15). The RNA liquid would be quantified and transferred into 4 sterile, barcoded cryovials (Procedure 16), which would then be stored separately in four deep freezers (Procedure 17). These procedures and barcoded samples could be managed using a robust and flexible LIMS system.
Our protocol would require two technical staff at each remote assessment center and two technical staff at the central laboratory. We trained all technical staff at the central laboratory using standardized training curriculum and a certification examination. Compared with the collection methods using PAXgene [11], [12] and Tempus [13] tubes, our protocol was more labor-intensive, and the cost of consumed goods was more expensive. Accordingly, there was a tradeoff between cost and quality in our protocol, and this tradeoff should be discussed for each biobank project. Our protocol presented in this study provided a novel methodology to collect high-quality RNA from PBMCs for biobank researchers. Collecting RNA from PBMCs based on our protocol may enable, for example, multilayer omics analysis (genome, methylome, and transcriptome) of PBMCs. We plan to collect 5,400 RNA samples according to our protocol over the next two years (2015-2016). In our feasibility trial, 32 RNA samples have been collected. Based on the trial, we modified and validated our protocol to make it as effective, manageable, and feasible as possible. Figure S1 Comparisons of alternative gene expression using the DDCt method. The reference gene was ACTB, and the reference sample was Ctrl1. The bar plot shows the mean DDCt 6 SD for each sample condition. *, p,0.05; **, p,0.01; ***, p,0.001 (Wilcoxon signed rank test).