Amyotrophic lateral sclerosis (ALS) is a neurodegenerative disease that causes death within a mean of 2–3 years from symptom onset. There is no diagnostic test and the delay from symptom onset to diagnosis averages 12 months. The identification of prognostic and diagnostic biomarkers in ALS would facilitate earlier diagnosis and faster monitoring of treatments. Gene expression profiling (GEP) can help to identify these markers as well as therapeutic targets in neurological diseases. One source of genetic material for GEP in ALS is peripheral blood, which is routinely accessed from patients. However, a high proportion of globin mRNA in blood can mask important genetic information. A number of methods allow safe collection, storage and transport of blood as well as RNA stabilisation, including the PAXGENE and TEMPUS systems for the collection of whole blood and LEUKOLOCK, which enriches for the leukocyte population. Here we compared these three systems and assessed their suitability for GEP in ALS. We collected blood from 8 sporadic ALS patients and 7 controls. RNA extracted with PAXGENE and TEMPUS additionally underwent globin depletion using GlobinClear. RNA was amplified and hybridised onto Affymetrix U133 Plus 2.0 arrays. Lists of genes differentially regulated between ALS patients and controls were created for each method using the R package PUMA, and RT-PCR validation was carried out on selected genes. TEMPUS/GlobinClear and LEUKOLOCK produced high quality RNA with sufficient yield and consistent array expression profiles. PAXGENE/GlobinClear yield and quality were lower. Globin depletion for PAXGENE and TEMPUS uncovered the presence of over 60% more transcripts than when samples were not depleted. TEMPUS/GlobinClear and LEUKOLOCK gene lists contained 3619 and 3047 genes, respectively, differentially expressed between patients and controls. Real-time PCR validation revealed similar reliability between these two methods and gene ontology analyses revealed similar pathways differentially regulated in disease compared to controls.
Citation: Bayatti N, Cooper-Knock J, Bury JJ, Wyles M, Heath PR, Kirby J, et al. (2014) Comparison of Blood RNA Extraction Methods Used for Gene Expression Profiling in Amyotrophic Lateral Sclerosis. PLoS ONE 9(1): e87508. https://doi.org/10.1371/journal.pone.0087508
Editor: Udai Pandey, Louisiana State University Health Sciences Center, United States of America
Received: September 22, 2013; Accepted: December 26, 2013; Published: January 27, 2014
Copyright: © 2014 Bayatti et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Funding: PJS, JK and NB are supported by funding from the European Union: Seventh Framework Programme (FP7/2007-2013) under the Euro-MOTOR project (No: 259867, http://www.euromotorproject.eu/) and PJS and JK by the EU Joint Programme–Neurodegenerative Disease Research (JPND), Sampling and biomarker OPtimization and Harmonization In ALS and other motor neuron diseases (SOPHIA). This is an EU Joint Programme - Neurodegenerative Disease Research (JPND) project. The project is supported through the following funding organisations under the aegis of JPND - www.jpnd.eu: France, Agence Nationale de la Recherche (ANR); Germany, Bundesministerium für Bildung und Forschung (BMBF); Ireland, Health Research Board (HRB); Italy, Ministero della Salute; The Netherlands, The Netherlands Organisation for Health Research and Development (ZonMw); Poland, Narodowe Centrum Badań i Rozwoju; Portugal, Fundação a Ciência e a Tecnologia; Spain, Ministerio de Ciencia e Innovación; Switzerland, Schweizerischer Nationalfonds zur Förderung der wissenschaftlichen Forschung (SNF); Turkey, Tübitak; United Kingdom, Medical Research Council (MRC). JCK holds a MND Association/MRC Lady Edith Wolfson Fellowship award (MR/K003771/1). JJB is funded by an MRC PhD studentship. PJS is supported as an NIHR Senior Investigator. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
Amyotrophic lateral sclerosis (ALS) is a devastating and fatal disease that preferentially affects the motor system. Clinically, ALS manifests as progressive weakness of voluntary muscles and patients survive on average for 2–3 years after onset of symptoms. The mechanisms that cause neurodegeneration in ALS are incompletely understood, and are considered to operate through a number of molecular and genetic pathways including glutamate toxicity, oxidative stress, the formation of protein aggregates and defects in axonal transport. Therefore, the identification of biomarkers that may detect the early signs of ALS, assess disease progression, monitor the effects of treatment, or even help identify the cause of the disease is of great importance.
Gene expression profiling (GEP) is a powerful tool to help identify potential diagnostic and therapeutic targets in neurological diseases. Analysis of global expression patterns and differentially expressed genes in an unbiased manner allows for identification of affected functional categories or specific pathways. One potential source for genetic material in this type of study is peripheral blood, which is routinely and easily accessed from patients. Although GEP of whole blood is informative in studying the mechanisms and pathogenesis of a number of diseases, including neurological disorders, the high proportion of globin mRNA present in red blood cells masks potentially important genetic information, and increases noise, thereby reducing sensitivity.
Conventional methods for reducing the relative amounts of globin mRNA in blood samples by density gradient centrifugation and extraction of the white blood cell population have been shown to be effective in reducing globin interference. However, the long duration of experimental handling during the extraction process leads to RNA degradation and possible unintended gene induction that might affect the validity of the disease-related changes in gene expression. Methods that allow rapid RNA stabilization and long-term storage without degradation are commercially available, e.g. PAXGENE (Qiagen) and TEMPUS (Applied Biosystems), and are currently being utilised. These methods extract RNA from whole blood and so will be affected by globin interference, unless an additional globin depletion step is completed. An alternative strategy has been developed that aims to specifically reduce the amount of globin contamination by enriching the leukocyte population through immobilisation on a filter (LEUKOLOCK, Ambion), thereby avoiding a globin depletion step.
In order to discover the best method of RNA isolation from blood to carry out GEP in a neurological disorder such as ALS, we compared these three commercially available systems for RNA extraction from blood: PAXGENE (PAX) and TEMPUS (TEM), which are column-based methods that extract RNA from whole blood, and LEUKOLOCK (LL), a method for isolating white blood cells for subsequent RNA extraction. We used a cohort of 8 sporadic ALS patients and 7 age-matched controls. RNA from PAX and TEM extractions also underwent a step to deplete globin using GlobinClear (GC, Ambion). Therefore, RNA from 5 experimental conditions (TEM, TEM+GC, PAX, PAX+GC, and LL) was hybridized onto U133 Plus 2.0 Human whole genome arrays (Affymetrix) and analysed. A number of genes found to be dysregulated in disease compared to controls were confirmed by real-time PCR. This allowed us to identify which method/condition is best suited for carrying out GEP in ALS.
RNA extraction, quality and quantity
RNA was extracted by PAX, TEM and LL as described in the methods (Figure 1). TEM and LL both had optional DNase steps which were omitted, and the on-column DNase step in PAX was also not carried out to avoid potential damage to RNA quality and yield. The quality and yield of extracted RNA were analysed by examining electropherogram traces generated by the Agilent Bioanalyzer running the samples on a total eukaryote RNA nano chip, and by UV spectrophotometry using Nanodrop. Representative Bioanalyzer traces show that in the cases of TEM and LL, high quality RNA (RIN >7.0) can routinely be extracted (Figure 2). However, in the case of PAX extracted RNA, genomic DNA contamination led to a skewing of the 28S peak, and occasionally an additional genomic DNA peak was observed. Therefore all PAX samples were subjected to DNase treatment. No such contamination was observed with TEM and LL. After DNase digestion, high quality RNA was extracted with no signs of genomic DNA contamination; however, this resulted in decreased RNA yield. Globin mRNA depletion (GC) using the GlobinClear kit was carried out using 1 µg of total RNA as input for each sample.
Five blood samples were drawn from each of 8 patients and 7 controls and extracted by three methods: LEUKOLOCK, LL; PAXGENE, PAX (x2); TEMPUS, TEM (x2). Samples were extracted separately and not pooled. Additional aliquots of RNA extracted from PAX and TEM samples were depleted for globin RNA using GlobinClear. The resulting 75 samples were amplified by IVT and run on Affymetrix U133 Plus 2.0 whole human genome arrays and analysed with MAS5.0, GCRMA or with the R package PUMA.
High quality RNA (RIN >7.0) can be extracted from all three methods. PAXGENE however requires a DNase step as traces from initial extractions show genomic DNA contamination at high molecular weights and as a “shoulder” to the 28S peak (indicated by asterisks). Good quality RNA can be detected after globin depletion with both TEM and PAXGENE. However, PAXGENE samples showed consistently lower concentration levels, as indicated by the smaller 18S and 28S peaks (FU, fluorescent units). Traces are representative and from single samples from each extraction method.
Averages of RNA yield and RNA quality (RIN values) for each method were quantified using the Nanodrop and Bioanalyzer and are displayed in Figures 3A and B. In terms of yield, quantity of RNA was as follows: TEM > PAX > LL and TEM GC > PAX GC (Figure 3A). RNA quality as measured by RIN values showed little difference between samples; however, LL tended to produce samples with higher average RIN values, while RIN values from PAX GC were the lowest. Although there was a reduction in RIN values after GC in the case of PAX, the RIN values of TEM GC samples remained high (LL, RIN = 8.5, ±1.0; TEM, RIN = 7.9, ±0.44; PAX, RIN = 7.9, ±0.44; TEM GC 7.9, ±0.3; PAX GC 7.2, ±0.59, see Figure 3B). PAX required the longest time and most experimental manipulation to carry out, while significantly fewer steps and less time were required for the other methods.
aRNA amplification and fragmentation
After the in vitro transcription (IVT) reaction all samples yielded enough material to progress to the hybridisation step (Figure 3C), however some PAX samples required concentrating with Glycoblue. Figure 4 shows representative Bioanalyzer traces from pre- and post- fragmentation amplified RNA (aRNA). Each method presented a distinctive aRNA profile. However, after fragmentation the profiles were not distinguishable. The profiles of LL, TEM GC, and PAX GC exhibited IVT products over a wide range of molecular weight sizes while PAX and TEM profiles displayed high intensity peaks at low or medium nucleotide sizes, either indicating degradation of product, or the high expression levels of a particular species of RNA that disappeared upon globin mRNA depletion. All samples were hybridised onto Human Genome U133 Plus 2.0 arrays.
After IVT, the aRNA profiles from each method showed distinctly different profiles. Generally, IVT resulted in a wide range of detectable products. However in the cases of TEM and PAX especially, distinct high intensity peaks were observed in all cases (black arrows). These disappeared in all cases in the globin mRNA depleted samples. After fragmentation, aRNA profiles were indistinguishable from each other (FU, fluorescent units). Traces are representative and from single samples from each extraction method.
Array hybridisation and Quality Control
Quality control (QC) for the 75 hybridised arrays was carried out. The polyA controls showed that there was no bias between high and low expressed genes during the preparation steps (Figure 5A). However, the different patterns of hybridisation controls (Figure 5B) indicate that the conditions of RNA extraction affect PAX hybridisations, causing a highly variable profile which disappears upon globin depletion. The % present calls (Tables 1 and 2, Figure 5C) are all consistent (∼40–50%), except for PAX (∼27–39%) and TEM (∼35–43%), which are clearly lower than the others. Scale factors (Figure 5D, Tables 1 and 2) were consistent both within and between groups, except for PAX samples, which were higher than the rest. Only a few samples showed RNA degradation, and they were included in the further analysis if the scale factor was not affected. The relative log signal of all arrays was generally similar apart from the outliers LL12 and TGC12 (Figure 5E), and in general signals from PAX hybridisations showed more variability. These observations indicate that the presence of globin mRNA is an important factor affecting hybridisation quality control. Of the 75 arrays, 72 passed quality control. Three arrays were excluded for not passing QC standards, including scale factor and outlying box plots.
Analysis of array data of all 75 samples run was carried out in Affymetrix Expression Console. PolyA (A) and Hyb (B) controls show that the preparation steps were consistent, and that in general all hybridisations were consistent, but that hybridisation of PAX samples differed from the rest. The % present call (C) values were generally consistent in LL, TEM GC and PAX GC samples, but were clearly lower with TEM and PAX samples. The scale factors between chips were generally low, except for PAX hybridisations (D), while the relative log expression (E) of all chips was generally similar with a few exceptions, and a slight general increase in PAX samples (FU, fluorescent units).
Further quality control was carried out on GCRMA normalised data to check the effects of GC on globin RNA levels. Genespring 12.5 (Agilent) was used to plot the average expression levels of a number of probe sets in the highest expressing human globin isoforms (alpha and beta) for each of the five conditions. In the case of alpha globin probe sets, representative plots (Figure 6A, B, C) indicate that alpha globin mRNA levels are highest in PAX and TEM samples, lower in LL, and lowest in TEM GC and PAX GC. A similar pattern is also seen in the case of beta globin (Figure 6D).
Data were GCRMA normalised and levels of 3 representative alpha globin probe sets (A, B and C) and one beta probe set (D) were averaged for each condition. Expression level box plots indicate that in both the cases of alpha and beta globin, TEM and PAX exhibit highest levels of globin mRNA while TEM GC and PAX GC were the lowest, LL levels were generally lower than PAX and TEM, but higher than PAX GC and TEM GC.
Analysis in MAS5.0 was carried out to determine the effect of globin clearance on transcripts present. Analysis of the number of transcripts present in all arrays for each condition (Figure 7) indicated that globin clearance unmasks a large percentage of transcripts that would otherwise not be detected. Comparing all arrays in each condition for transcripts called present (Figure 7A) indicated that arrays from “globin positive” conditions (PAX and TEM) exhibited far fewer present transcripts than the “globin negative”, globin depleted samples from PAX GC and TEM GC. The LL method, which enriches RNA from leukocytes, revealed present calls similar to the “globin negative” group of conditions. This pattern was also similar when comparing present calls in either controls only or patients only (data not shown). Venn diagrams comparing the number of present calls between LL, PAX and TEM showed 9632 probe sets in common, while a comparison between LL, PAX GC and TEM GC exhibited 15576 probe sets in common (Figure 7B, C). A further comparison between these two populations of transcripts indicated 6022 probe sets in the population of 15576 that were called present only in the “globin negative” group, representing genes unmasked by globin depletion (Figure 7D). DAVID analysis was carried out with the 6022 probe sets. Genes associated with intracellular organelles, most notably the nucleus and mitochondria, were identified, including many related to transcription (see Table S1).
MAS5.0 normalised data indicated that the presence of globin affects the number of probe sets called present. Comparison of all probe sets called present in every array for each condition reveals higher numbers in the case of LL, TEM GC and PAX GC (“globin negative”) as compared to TEM and PAX (“globin positive”; A). Comparison between probe sets called present in LL, TEM and PAX reveals 9632 probe sets in common (B), and 15576 probe sets were found to be in common between LL, TEM GC and PAX GC (C). By comparing these 2 populations 6022 probe sets were found to be unmasked by globin depletion (D).
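The Venn comparison described here amounts to a few set operations on the probe sets called present. The following Python sketch illustrates that logic only; it is not the authors' MAS5.0 workflow, and the probe identifiers (`p1` to `p6`), the toy present calls and the helper names are all hypothetical.

```python
def present_in_all(arrays):
    """Probe sets called present on every array of one condition."""
    return set.intersection(*(set(a) for a in arrays))


def common_to(conditions, per_condition):
    """Probe sets shared by all of the named conditions."""
    return set.intersection(*(per_condition[c] for c in conditions))


# Hypothetical present calls: two arrays per condition, made-up probe IDs.
calls = {
    "LL": [{"p1", "p2", "p3", "p4"}, {"p1", "p2", "p3", "p4", "p5"}],
    "TEM": [{"p1", "p2"}, {"p1", "p2", "p6"}],
    "PAX": [{"p1", "p2", "p3"}, {"p1", "p2"}],
    "TEM GC": [{"p1", "p2", "p3", "p4"}, {"p1", "p2", "p3", "p4"}],
    "PAX GC": [{"p1", "p3", "p4"}, {"p1", "p3", "p4", "p5"}],
}

per_condition = {name: present_in_all(arrays) for name, arrays in calls.items()}

# Analogue of Figure 7B: LL, TEM and PAX in common.
common_positive = common_to(["LL", "TEM", "PAX"], per_condition)
# Analogue of Figure 7C: LL, TEM GC and PAX GC in common.
common_negative = common_to(["LL", "TEM GC", "PAX GC"], per_condition)
# Analogue of Figure 7D: present in the second population but not the first.
unmasked = common_negative - common_positive

print(sorted(common_positive), sorted(common_negative), sorted(unmasked))
```

With real data, each inner set would hold the probe set IDs flagged present on one array, and `unmasked` would correspond to the probe sets attributed to globin depletion.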
Gene expression analysis
Using QLUCORE OMICS Explorer, principal component analysis (PCA) on GCRMA normalised samples that took into account the expression levels of all probe sets on the arrays showed that 2 main clusters were visible, broadly corresponding to globin levels (Figure 8A). A tightly related “globin negative” cluster included the conditions LL, TEM GC and PAX GC, while a more variable “globin positive” cluster (PAX and TEM) was also present. Given this observation, together with the potential importance to the ontology of ALS of the 6022 probe sets uncovered by globin depletion, subsequent gene expression analysis was carried out on the conditions in the “globin negative” group while “globin positive” samples were excluded.
PCA plotted using QLUCORE OMICS Explorer on GCRMA normalised data: each dot represents an array and takes into account the expression levels of every probe set and shows 2 main clusters (A). One tightly clustering “globin negative” cluster includes the following conditions: LL, TEM GC and PAX GC, while a second more variable “globin positive” cluster includes arrays hybridised with PAX and TEM extracted RNA. Focusing on the globin negative group, PUMA analysis identified differentially regulated probe sets (ALS patients vs controls) for the following (B; LL, 3047; TEM GC, 3619; PAX GC, 4511). A Venn diagram comparing these lists shows that TEM GC and PAX GC share more genes with each other than either does with LL, and that 142 genes are common to all 3 methods.
A PUMA analysis of differentially regulated genes between patients and controls was carried out, and the three conditions revealed the following number of statistically significant (p≤0.05) differentially regulated probe sets: LL, 3047; TEM GC, 3619; PAX GC 4511 (Figure 8B). A Venn diagram comparing these results showed that 142 probe sets were in common between the three conditions, while TEM GC and PAX GC shared more common probe sets than either LL and TEM GC or LL and PAX GC. Due to the longer preparation time and additional steps required for PAX GC, PCR validation of gene lists was only carried out in the case of LL and TEM GC conditions (see Tables S2 and S3 for identities of differentially regulated probe sets for LL and TEM GC respectively). A control analysis was also carried out in order to estimate how many genes would appear differentially regulated between any two random groups of people of a similar size. This would help identify the amount of noise in our methods. The 15 individuals were randomly sorted into two groups five times (not based on ALS status) and these groups were compared against each other. PUMA analysis carried out for each random group pairing demonstrated that the number of differentially regulated genes between groups was lowest in the case of LL, with an average of 37.74% ±10.54 of the original number of genes. TEM GC exhibited more noise, with an average of 46.91% ±18.49 of the number of genes in the original study (Table S4).
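The random-grouping control can be sketched in Python as follows. This is only an illustration of the resampling step under stated assumptions: the subject labels, the fixed seed, the 8/7 group sizes and the `count_de` callback are hypothetical, and the callback stands in for the PUMA analysis, which is an R package and is not reimplemented here.

```python
import random


def random_splits(subjects, size_a, n_iter, seed=0):
    """Randomly divide the subjects into two groups, n_iter times."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    splits = []
    for _ in range(n_iter):
        shuffled = list(subjects)
        rng.shuffle(shuffled)
        splits.append((shuffled[:size_a], shuffled[size_a:]))
    return splits


def noise_percentages(splits, count_de, original_count):
    """Express each random comparison as a percentage of the original gene count."""
    return [100.0 * count_de(a, b) / original_count for a, b in splits]


# Hypothetical labels for the 15 individuals, ignoring ALS status.
subjects = [f"S{i:02d}" for i in range(1, 16)]
splits = random_splits(subjects, size_a=8, n_iter=5)


def count_de(group_a, group_b):
    """Placeholder for a real differential expression call; returns a dummy count."""
    return 100


percentages = noise_percentages(splits, count_de, original_count=400)
print(percentages)
```

In practice `count_de` would run the differential expression analysis on the arrays belonging to each random group and return the number of probe sets passing the chosen threshold.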
Real time PCR validation
To determine the reproducibility of the methods, 6 differentially regulated genes were chosen from each of the LL and TEM GC gene lists for real time PCR validation. These genes exhibited a fold change of at least ±1.5 (patient versus control), and two genes were chosen from each of the high, medium and low probability ranges (as calculated by PUMA). PCR fold change values were calculated from patient and control ΔCt values (Figure 9).
ΔCTs plotted for 6 genes chosen for validation using the LL method of RNA extraction from blood (A). Four out of 6 genes (CDK1, EGR1, SSPN and IL23R) could be validated from the Affymetrix microarray studies (C). ΔCTs plotted for 6 genes chosen for validation using the TEMPUS/GC method of RNA extraction from blood (B). Four out of 6 genes (EP400, E2F2, EHBP1 and KAZ) could be validated from the Affymetrix microarray studies (D). Blue bars represent levels of gene expression in ALS patients (n = 6) and red bars the levels in normal control individuals (n = 6).
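The text does not state the formula used to convert ΔCt values into fold changes. A common choice is the 2^(−ΔΔCt) calculation, and the Python sketch below assumes that method, with ΔCt defined as Ct(target) minus Ct(reference gene). The function names and the example values are hypothetical and do not come from the study.

```python
from statistics import mean


def fold_change(patient_dct, control_dct):
    """Fold change (patient vs control) by the assumed 2^(-ddCt) method."""
    ddct = mean(patient_dct) - mean(control_dct)
    return 2.0 ** (-ddct)


def signed_fold_change(fc):
    """Report down-regulation as a negative fold change, e.g. 0.5 becomes -2.0."""
    return fc if fc >= 1.0 else -1.0 / fc


# Made-up dCt values: a lower dCt in patients means higher expression.
up = fold_change([5.0, 5.0], [6.0, 6.0])
down = fold_change([7.0], [6.0])
print(up, signed_fold_change(down))
```

Under this convention, a gene meets a ±1.5 fold change cut-off when the absolute value of the signed fold change is at least 1.5.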
In the case of LL, 4 out of 6 genes were validated with significance: CDK1, EGR1, SSPN, and IL23R (all p<0.05 with t-test), while 2 genes were not validated as being regulated in the same direction as the arrays: MBLN3 and FOXP1.
In the case of TEMPUS/GC, 4 out of 6 genes were validated with significance: EP400, E2F2, EHBP1 and KAZ (all p<0.05), while one gene was regulated in a similar way to that seen in the array without significance: IGHMBP2 (p = 0.11), and one gene, PPP1R10, was not found to be regulated in a similar way to that observed with the arrays.
In summary, both methods were able to validate two-thirds (4/6) of the genes chosen, indicating that these methods are similarly reliable. However, the reliability of a method in terms of gene validation by PCR does not necessarily correlate with the relevance of the method to uncovering genes important in the disease process. Therefore, gene ontology studies were carried out on the lists of differentially expressed genes identified by the two methods.
Gene Ontology and KEGG pathway analysis
Initially, functional annotation clustering was carried out using DAVID. In the case of TEMPUS/GC 439 clusters were identified of which the top 15 are ranked (Table 3). In the case of LL, when high stringency was used, 383 gene clusters were identified of which the top 15 are ranked in Table 4.
Although there are similarities between the 2 lists in terms of cluster identification, e.g. nucleus/intracellular organelle, apoptosis/cell death, nucleotide binding, RNA recognition motifs and transcription/transcriptional regulation all being found to be enriched with both methods, there are major differences too. The extent of enrichment in TEM GC for the top 3 clusters (ribosome, nucleus and RNA splicing) was much greater than for the top 3 clusters for LL (nucleus, RNA recognition motif and nucleotide binding), and of the top 3 clusters in TEM GC, ribosome and RNA splicing were not present in the top 15 LL clusters. (In LL, splicing was ranked as 27, with an enrichment of 2.32, while ribosome biogenesis/RNA processing and metabolic processes was ranked as 267.) Of the top 3 LL clusters, all 3 are found within the top 15 TEMPUS/GC clusters.
KEGG pathway analysis was also carried out in DAVID, in order to give an indication of the extent of detected changes in intracellular pathways with each experimental method (see Tables S5 and S6). KEGG analysis of the LL gene list revealed dysregulation of 45 pathways including a number of signalling cascades such as MAPK, Jak-STAT, and phosphatidylinositol and associated receptor-mediated pathways (Table S5). KEGG analysis of the TEM GC gene list indicated 33 affected pathways (Table S6). Specific signalling cascades observed in the LL list were absent; however, a number of common receptor-associated pathways were affected, e.g. Insulin signalling pathway, Neurotrophin signalling, B-cell receptor signalling. Comparing the two KEGG pathway analyses, LL and TEM GC share 17 common dysregulated pathways (accounting for 38% of LL pathways, and 52% in the case of TEM GC).
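As a quick arithmetic check, the shared-pathway percentages follow directly from the pathway counts reported above (17 shared, 45 for LL, 33 for TEM GC). The short Python sketch below reproduces them; the function name is illustrative.

```python
def overlap_percent(shared, total):
    """Shared pathways as a whole-number percentage of one method's total."""
    return round(100.0 * shared / total)


shared_pathways = 17
ll_share = overlap_percent(shared_pathways, 45)   # LL: 45 dysregulated pathways
tem_share = overlap_percent(shared_pathways, 33)  # TEM GC: 33 affected pathways
print(ll_share, tem_share)
```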
Taken together, this analysis suggests that TEM GC and LL methodologies are similarly consistent when choosing genes to validate by qPCR, and that whilst they do result in similar enrichment in affected gene ontologies, the data from TEMPUS/GC shows more statistical significance.
This study compares three RNA extraction methods from blood and their suitability for GEP in a neurological disease such as ALS. All three methods allow blood drawn to be stored in stabilizing agents and therefore provide alternatives to traditional PBMC isolation, which requires the immediate processing of blood samples. Two column-based methods (PAX, TEM) resulting in total blood RNA extraction and one method which enriches for the leukocyte population (LL) were used. Previous studies indicate that long term storage of blood in RNA stabilizing solutions does not adversely affect the transcript profile in extracted RNA. However, there have been reported differences between PAX and TEM methods. Here we are the first to compare whole blood extraction methods with a leukocyte enrichment method in human blood. In addition, globin depletion was carried out on PAX and TEM blood samples in order to compensate for the masking effects of high levels of alpha and beta globin transcripts using the commercially available GC method. In total 5 conditions were therefore assessed and compared: LL, TEM, TEM/GC, PAX and PAX/GC.
Blood extraction and RNA isolation
Blood was drawn directly into TEM and PAX tubes containing RNA stabilizing solutions; the two methods required relatively small amounts of blood (3 ml and 2.5 ml respectively), and the tubes were stored at −20°C after the required mixing (PAX also requires 2 hours incubation at room temperature before freezing). In the case of LL, a larger quantity of blood, 8–9 ml, was first collected in a standard vacutainer blood collection tube before being filtered into another collection tube. The LL filter required flushing with PBS and saturating with RNAlater before being stored at −20°C directly. For RNA extraction using the 2 column-based methods, TEM required less time to carry out than PAX. TEM extraction also resulted in high yield, high quality pure RNA without the requirement for a DNase step, which was required for PAX (though this may be accounted for by the slightly smaller amount of blood drawn). LL extraction resulted in intermediate yield but also required less time to carry out than PAX. These observations are in broad agreement with previous studies. Isolation of RNA with the LL method also took less time and experimental manipulation than PAX, and also resulted in high yield, high quality RNA without the need for a DNase step.
Globin depletion was carried out on PAX and TEM extracted RNA samples using GlobinClear. Levels of the most abundant globin isoforms in adult human blood (alpha and beta) were reduced in these samples compared to non-globin cleared samples, averaging lower than levels in LL but with greater variability. LL samples did exhibit residual levels of globin mRNA, which were also lower than in non-globin cleared samples. Previous studies suggested that GC depletion results in lower RNA quality and RIN values. We did not observe any reduction in RNA quality, although the RNA yield was reduced, which was more marked in the PAX samples. GC utilises oligonucleotides that target the 2 most abundant human globin species (alpha and beta), yet similar reductions in lower expressing isoforms (i.e. delta, epsilon and zeta) were also observed in PAX and TEM GC RNA extractions (data not shown). Globin depletion added an additional step to the extraction protocol and the additional time and cost of the procedure must be taken into account when deciding a suitable method for RNA extraction from blood.
IVT and fragmentation
All methods/conditions yielded sufficient quality and quantity of RNA to proceed with a reverse transcription reaction. After second strand synthesis, amplification and production of aRNA by in vitro transcription was carried out. IVT yield was greatest in the following order: TEM > LL > PAX > TEM GC > PAX GC. Additionally, while PAX yielded on average more RNA after extraction than LL, the latter method resulted in higher levels of aRNA even though the same amount of input cDNA was used in the IVT reaction. Possible reasons for this may be explained by comparing pre-fragmentation profiles of aRNA observed with the different conditions. Intense discrete peaks at low mass nucleotide ranges could be observed in all TEM and PAX aRNA profiles after Bioanalyzer analysis. All TEM profiles exhibited similar peaks, and all PAX profiles exhibited similar peaks. These probably represent aRNA populations associated with globin since following globin depletion these peaks disappeared in all cases and a wide range of RNA populations were observed in both TEM and PAX. A similar broad range of aRNA populations were visible in pre-fragmented LL samples. After fragmentation, profiles were indistinguishable between methods. These observations highlight the importance of globin depletion/reduction in RNA extracted from blood samples, as aRNA populations from globin skew the relative proportions of other aRNA populations within samples.
Affymetrix array quality control
Quality control metrics were consistent within each sampling method. However, when comparing between methods the saturating presence of globin significantly affected the percentage of present calls using MAS5 analysis. PAX and TEM present calls were distinctly lower than LL, PAX GC and TEM GC. Thus, already we can distinguish two separate groups of arrays, a “globin positive” group with low % present calls/large scale factor, and a “globin negative” group with higher % present calls/low scale factor. The identity of these 2 groups is highlighted in the PCA plot in Figure 8. The arrays within the “globin negative” group are much more tightly related than the arrays in the “globin positive” group, which exhibit much more variability. These observations were underpinned by an approximate 60% increase in transcript present calls in the “globin negative” group compared to the “globin positive” group (15576 vs 9632 respectively). GO analysis of the approximately 6000 additional transcripts called present in the globin negative group revealed many genes associated with the nucleus, gene transcription and mitochondria. A number of these genes could potentially be important in the aetiology/progression of ALS, given the current focus on defects in RNA transcription and processing as well as oxidative stress. Levels of alpha and beta globin isoforms mirrored the expected globin status throughout all conditions, and although LL globin levels were higher than TEM GC and PAX GC, they exhibited less variability between samples. When comparing the number of present calls of LL with other conditions, more transcripts were found to be present and in common with TEM GC/PAX GC as compared to TEM/PAX, indicating that residual globin levels in LL do not obscure the visibility of lower expressed transcripts on arrays, in comparison with the GC depleted samples.
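The "approximate 60%" figure can be checked from the two probe set counts quoted above. The Python sketch below does that arithmetic; the function name is illustrative and nothing beyond the two reported counts is assumed.

```python
def percent_increase(before, after):
    """Relative increase from 'before' to 'after', as a percentage."""
    return 100.0 * (after - before) / before


# Probe sets called present in common: 9632 (globin positive) vs 15576 (globin negative).
increase = percent_increase(9632, 15576)
print(round(increase, 1))
```

The result is a little over 60%, in line with the approximate value given in the text.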
Real-time PCR validation of differentially regulated genes and gene ontology differences
Using the PUMA bioconductor package , lists of differentially regulated genes in ALS as compared to controls were produced from LL, TEM GC and PAX GC samples. Comparison of these gene lists indicated that PAX GC and TEM GC shared a larger number of common differentially regulated genes, compared with LL. This difference can be explained when considering that the starting material in PAX and TEM cases was whole blood, containing additional cell populations, while LL enriches for leukocytes. A control analysis was also carried out with LL and TEM GC samples, where the 15 subjects were randomly assigned into 2 groups irrespective of ALS status and gene expression analysis with PUMA was carried out in order to ascertain the probability of finding differentially regulated genes in any two random groups of subjects of a similar size. After 5 iterations, the average number of differentially regulated genes (expressed as a percentage of the original study) was lower in LL than TEM GC, suggesting less noise in the LL data. An average of almost 40% of the original number of genes (approx 1200) is likely to be seen as differentially expressed in any random study. Although a number of these may be bona-fide targets, this highlights the difficulty involved in identification of actual disease markers using the techniques employed in this study.
Due to the technical limitations of the PAX technique (low yield after additional methodological manipulation), this method was disregarded for PCR validation, which was carried out on LL and TEM GC samples, as they represent better methods for GEP. Six genes were chosen with a minimum of 1.5 fold difference between disease and controls for each method, and real-time PCR resulted in the validation of 4 out of 6 genes for each condition, indicating that both methods present similar consistency and reliability. These similarities in consistency between the 2 methods may not represent similarities in differential regulation of gene pathways, and therefore GO functional annotation clustering analysis was carried out on the respective gene lists to ascertain putative differences/similarities. Considering a) the relatively few differentially regulated genes in common between the 2 methods, and b) the difference in starting material (whole blood versus leukocyte enriched), GO analysis revealed a striking similarity in the number of common cluster terms, and this was supported by similar observations when carrying out KEGG pathway analysis. It is noteworthy that the top cluster terms in TEM GC, ribosome-associated genes, were not present in LL, and this raises the question as to whether these are associated with pathological changes and/or mirror underlying differences in the initial cell populations used as starting material for each extraction method. As there are only small differences between globin levels (depleted and undepleted isoforms) in TEM GC samples when comparing patients and controls (data not shown), we can discount any potential sampling/extraction bias leading to the observed differences in ribosome-associated gene expression.
Taking into account the relatively small number of ALS patients and controls in this study, conclusions regarding pathway analysis should be drawn with caution. However, pathways known to be involved in ALS include defects in RNA processing, and thus changes in expression of ribosome-associated genes during disease may arise from the aggregation of proteins such as TDP-43 and FUS, and from downstream effects, e.g. on stress granule formation arising from dissociating ribosomes. In addition, the most striking differences in LL-KEGG pathway analysis were the enrichment for specific intracellular signalling pathways, such as the MAPK and PI3K cascades, which are commonly involved in signal transduction from the cell membrane and in cell survival in ALS, as well as the JAK/STAT pathway, a regulator of neuronal apoptosis.
GEP with blood is a feasible approach for biomarker discovery in neurological disease. Overlap in transcript expression and splicing between blood and neural tissues has been demonstrated. The present study primarily highlights the importance of reducing the high levels of globin present when using blood for GEP in neurological disease. After reduction in globin levels, the expression of ∼30% additional transcripts is unmasked, many with functions of potential importance in the pathophysiology of ALS. Although reduction in globin was achieved through different methods, remarkably similar results were obtained when comparing dysregulated pathways in ALS. The preferred TEM GC and LL methods produced the highest yield, most consistent array performance, and similar reliability when validating gene targets with real-time PCR. PAX was a more labour-intensive method; it also produced good quality RNA, but with lower yield, and RNA quality was reduced after subsequent globin depletion. When deciding which method to use in future biomarker studies, the extra cost and time required for the globin depletion step in the case of TEM must be weighed against the larger volume of blood required for LL. Although we have not analysed whether any of our validated genes correspond to bona fide hits for biomarkers, future studies should concentrate on the identification and validation of potential targets that may arise as bona fide biomarkers for ALS. Single targets or clusters of targets can be validated by PCR in either human samples or those from animal models in a blinded fashion to see whether they are predictive of ALS or, in the case of serial sampling, of disease progression.
Ethical approval for the study was granted by South Sheffield Ethics Committee and written informed consent was obtained from each subject before commencing. Blood was collected after an overnight fast from patients (n = 8) or age-matched controls (n = 7, see Table 5) either directly into PAX/TEM tubes (approx. 3 ml) or into K2EDTA spray-coated collection (BD Vacutainer) tubes (approx. 8–9 ml), for immediate passage through a LL filter. RNA was stabilized in PAX and TEM tubes by slow inversion (PAX) or vigorous shaking (TEM) for 30 and 15 seconds respectively. LL filters were washed with PBS and saturated with RNAlater as described in the manufacturer's instructions. Samples were stored at −20°C until processed further.
RNA extraction and Globin depletion
RNA was extracted by PAX (Qiagen), TEM (Applied Biosystems) and LL (Ambion) methods as described in the respective manufacturer's instructions, in each case omitting any DNase step. All samples were extracted separately and not pooled. The quality and yield of extracted RNA were analysed by examining electropherogram traces generated by the 2100 Bioanalyzer (Agilent), running the samples on a total eukaryote RNA nano chip, and by UV spectrophotometry using the Nanodrop 1000 (Thermo Scientific). Samples were stored at −80°C until required for globin depletion (TEM) or amplification (LL). Before globin depletion was carried out on TEM and PAX samples, PAX samples underwent DNase (NEB) treatment (2 µg of RNA, 2U DNase I) for 10 min at 37°C, followed by addition of EDTA to a final concentration of 5 mM and subsequent heat inactivation at 75°C for 10 min. Samples were run on the 2100 Bioanalyzer to measure quality, and subsequent globin depletion (GlobinClear, Ambion) was carried out with 1 µg of RNA as described in the manufacturer's instructions. Where the RNA concentration was <67 ng/µl, samples had to be concentrated using 50 µg/ml Glycoblue (Ambion), in 0.5 M ammonium acetate, 50% isopropanol.
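The 67 ng/µl cut-off can be related to the 1 µg input used for globin depletion with a short calculation. The sketch below is illustrative: the function names and the example concentrations are hypothetical, and the reading that the threshold reflects an input-volume ceiling of roughly 15 µl is our inference rather than something stated in the text.

```python
def volume_for_input(concentration_ng_per_ul, input_ng=1000.0):
    """Volume (µl) that contains `input_ng` of RNA at the given concentration."""
    return input_ng / concentration_ng_per_ul


def needs_concentrating(concentration_ng_per_ul, threshold_ng_per_ul=67.0):
    """True when a sample falls below the concentration cut-off in the text."""
    return concentration_ng_per_ul < threshold_ng_per_ul


# At the cut-off, 1 µg of RNA occupies just under 15 µl.
print(round(volume_for_input(67.0), 1))

# Hypothetical sample concentrations in ng/µl.
for conc in (50.0, 80.0):
    print(conc, round(volume_for_input(conc), 1), needs_concentrating(conc))
```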
RNA Amplification and Affymetrix microarray analysis
Linear amplification was employed using the Affymetrix GeneChip 3′ IVT Express Kit according to the manufacturer's instructions. 200 ng of total RNA was reverse transcribed to synthesize first-strand cDNA with oligo(dT) primers containing a T7 flanking sequence. After second strand synthesis, labelled aRNA was synthesized in an in vitro transcription reaction. In order to minimise batch effects, 10 samples were randomly chosen for each round of IVT. A number of samples had to be concentrated using 50 µg/ml Glycoblue (Ambion), in 0.5 M ammonium acetate, 50% isopropanol before fragmentation. 15 µg of aRNA was fragmented as described in the manufacturer's instructions and hybridized overnight onto U133 Plus 2.0 human genome arrays as previously described. 12 arrays were hybridised per run, and the samples for each run were chosen in a similar random manner to the amplifications in order to minimise batch effects. After washing the chips in the Affymetrix Fluidics System 450, they were scanned in the GeneChip Scanner 3000 7G.
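The random allocation of samples to IVT rounds (10 per round) and hybridisation runs (12 per run) can be sketched as below. The sample identifiers, the total of 30 samples and the seeds are hypothetical; the text does not describe how the random selection was actually performed, so this is one possible implementation rather than the procedure used.

```python
import random


def randomised_batches(samples, batch_size, seed=None):
    """Shuffle the samples and cut the shuffled list into consecutive batches."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i:i + batch_size]
            for i in range(0, len(shuffled), batch_size)]


# Hypothetical sample identifiers.
samples = [f"S{i:02d}" for i in range(1, 31)]

ivt_rounds = randomised_batches(samples, batch_size=10, seed=1)
hyb_runs = randomised_batches(samples, batch_size=12, seed=2)

for n, batch in enumerate(ivt_rounds, start=1):
    print(f"IVT round {n}: {batch}")
for n, batch in enumerate(hyb_runs, start=1):
    print(f"Hybridisation run {n}: {batch}")
```

Simple shuffling does not guarantee that patients, controls or extraction methods are evenly represented in every batch; a stratified allocation would be needed to enforce that.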
Quality control was initially carried out in Affymetrix Expression Console on MAS5.0 normalised data. The data were also normalised by GCRMA for further quality control, for assessment of gender differences using the program QLUCORE OMICS Explorer, and for analysis of globin isoform levels in GENESPRING (Agilent). CEL file data were analysed by the R package PUMA to compile a list of differentially regulated genes. We utilised PPLR (probability of positive log-ratio), a tool within the PUMA suite that accounts for the uncertainty in the estimation of gene expression, i.e. the technical variability, in order to provide a robust estimate of differential expression, and listed significantly differentially regulated genes with a pLikeValue <0.05 (Liu et al., 2006). Gene Ontology and KEGG pathway analyses were carried out using the bioinformatics tool DAVID.
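The filtering step on pLikeValue can be illustrated with a small sketch. PPLR lies between 0 and 1, with values near 1 indicating up-regulation and values near 0 indicating down-regulation. The conversion used below, p-like = 1 − |2·PPLR − 1|, is our assumption about how a two-sided p-like score relates to PPLR; the text does not give the formula, and the gene labels and PPLR values are hypothetical.

```python
def p_like_value(pplr: float) -> float:
    """Two-sided p-like score: close to 0 when PPLR is near 0 or near 1.

    Assumed conversion, not taken from the text.
    """
    return 1.0 - abs(2.0 * pplr - 1.0)


def significant_genes(pplr_by_gene, threshold=0.05):
    """Keep genes whose p-like score falls below the threshold."""
    return {gene: pplr for gene, pplr in pplr_by_gene.items()
            if p_like_value(pplr) < threshold}


# Hypothetical PPLR values for four placeholder genes.
pplr_by_gene = {
    "geneA": 0.99,  # strong evidence of up-regulation
    "geneB": 0.01,  # strong evidence of down-regulation
    "geneC": 0.50,  # no evidence either way
    "geneD": 0.96,  # falls just short of the 0.05 cut-off
}

hits = significant_genes(pplr_by_gene)
print(sorted(hits))
```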
200 ng of total RNA from 6 of the patients and 6 of the controls (Table 5) was reverse transcribed using the High Capacity RNA-to-cDNA Kit (Applied Biosystems) in a reaction volume of 20 µl. These represented our best quality samples. As calculated from efficiency plots for each gene, samples of cDNA were diluted 1∶10, and 2.5 µl (2.5 ng) were used per PCR reaction for all genes except EHBP1, where 4 µl (4 ng) was used, with 5x Brilliant III Ultra-Fast SYBR Green QPCR Master Mix (Agilent). In all cases primer concentrations were optimised to 300 nM and reactions were carried out in triplicate. Real-time PCR was carried out as previously described using the MX3000P cycler and MxPro software for analysis of Ct values. ΔCt values for each sample were calculated by subtracting Ct values from genes of interest from corresponding Ct values of the housekeeping gene GAPDH, which exhibited consistent expression across all samples. Subsequent ΔΔCt values (ΔCt(patient) − ΔCt(control)) were calculated in order to assess fold changes using the following formula: fold change = 2^(−ΔΔCt). Unpaired two-tailed t-tests were used to determine whether differences in gene expression levels between patients and controls were statistically significant.
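The fold-change calculation can be sketched as follows. The sketch assumes the common convention ΔCt = Ct(gene of interest) − Ct(GAPDH), under which fold change = 2^(−ΔΔCt); if the subtraction is performed the other way round, the sign of the exponent flips. All Ct values are hypothetical and the function names are ours.

```python
from statistics import mean


def delta_ct(ct_gene, ct_reference):
    """ΔCt under the assumed convention: gene of interest minus reference."""
    return ct_gene - ct_reference


def fold_change(patient_dcts, control_dcts):
    """Fold change in patients relative to controls: 2^(−ΔΔCt)."""
    ddct = mean(patient_dcts) - mean(control_dcts)
    return 2.0 ** (-ddct)


# Hypothetical (gene Ct, GAPDH Ct) pairs, one per subject.
patients = [(24.0, 18.0), (24.5, 18.5)]
controls = [(25.0, 18.0), (25.5, 18.5)]

patient_dcts = [delta_ct(gene, ref) for gene, ref in patients]
control_dcts = [delta_ct(gene, ref) for gene, ref in controls]

fc = fold_change(patient_dcts, control_dcts)
print(f"fold change = {fc:.2f}")
```

In this example the gene crosses threshold one cycle earlier in patients than in controls, which corresponds to a two-fold higher expression.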
Gene Ontology of genes unmasked by globin depletion (top 100 GO terms).
List of statistically significant (pLikeValue <0.05) differentially regulated genes after PUMA analysis of LL samples (3047; FC, fold change).
List of statistically significant (pLikeValue <0.05) differentially regulated genes after PUMA analysis of TEMPUS GC samples (3619; FC, fold change).
Control analysis: the number of differentially regulated genes for each of the 5 random pairs of groups after PUMA analysis and the number of genes expressed as a percentage of the original study. The percentages were averaged and standard deviation was also calculated (STDEV). The distribution of ALS patients between groups is also detailed.
KEGG pathway analysis of LL affected pathways.
KEGG pathway analysis of TEM GC affected pathways.
Kind thanks to the ALS patients and control subjects who consented to give blood samples for this study.
Conceived and designed the experiments: NB JJB JC-K PRH JK PJS. Performed the experiments: NB JC-K JJB MW PRH. Analyzed the data: NB JC-K JJB MW PRH JK PJS. Wrote the paper: NB PRH JK PJS.
- 1. Kiernan MC, Vucic S, Cheah BC, Turner MR, Eisen A, et al. (2011) Amyotrophic lateral sclerosis. Lancet 377: 942–955.
- 2. Ferraiuolo L, Kirby J, Grierson AJ, Sendtner M, Shaw PJ (2011) Molecular pathways of motor neuron injury in amyotrophic lateral sclerosis. Nat Rev Neurol 7: 616–630.
- 3. Sharp FR, Xu H, Lit L, Walker W, Apperson M, et al. (2006) The future of genomic profiling of neurological diseases using blood. Arch Neurol 63: 1529–1536.
- 4. Cooper-Knock J, Kirby J, Ferraiuolo L, Heath PR, Rattray M, et al. (2012) Gene expression profiling in human neurodegenerative disease. Nat Rev Neurol 8: 518–530.
- 5. Kudo LC, Parfenova L, Vi N, Lau K, Pomakian J, et al. (2010) Integrative gene-tissue microarray-based approach for identification of human disease biomarkers: application to amyotrophic lateral sclerosis. Hum Mol Genet 19: 3233–3253.
- 6. Rollins B, Martin MV, Morgan L, Vawter MP (2010) Analysis of whole genome biomarker expression in blood and brain. Am J Med Genet B Neuropsychiatr Genet 153B: 919–936.
- 7. Wright C, Bergstrom D, Dai H, Marton M, Morris M, et al. (2008) Characterization of globin RNA interference in gene expression profiling of whole-blood samples. Clin Chem 54: 396–405.
- 8. Rainen L, Oelmueller U, Jurgensen S, Wyrich R, Ballas C, et al. (2002) Stabilization of mRNA expression in whole blood samples. Clin Chem 48: 1883–1890.
- 9. Vartanian K, Slottke R, Johnstone T, Casale A, Planck SR, et al. (2009) Gene expression profiling of whole blood: comparison of target preparation methods for accurate and reproducible microarray analysis. BMC Genomics 10: 2.
- 10. Matheson LA, Duong TT, Rosenberg AM, Yeung RS (2008) Assessment of sample collection and storage methods for multicenter immunologic research in children. J Immunol Methods 339: 82–89.
- 11. Asare AL, Kolchinsky SA, Gao Z, Wang R, Raddassi K, et al. (2008) Differential gene expression profiles are dependent upon method of peripheral blood collection and RNA isolation. BMC Genomics 9: 474.
- 12. Oster M, Pollinger JP, Stahler DR, Wayne RK (2012) Optimization of RNA isolation and leukocyte viability in canid RNA expression studies. Conservation Genetics Resources 4: 27–29.
- 13. Hammerle-Fickinger A, Riedmaier I, Becker C, Meyer HH, Pfaffl MW, et al. (2010) Validation of extraction methods for total RNA and miRNA from bovine blood prior to quantitative gene expression analyses. Biotechnol Lett 32: 35–44.
- 14. Mougeot JL, Li Z, Price AE, Wright FA, Brooks BR (2011) Microarray analysis of peripheral blood lymphocytes from ALS patients and the SAFE detection of the KEGG ALS pathway. BMC Med Genomics 4: 74.
- 15. Prezeau N, Silvy M, Gabert J, Picard C (2006) Assessment of a new RNA stabilizing reagent (Tempus Blood RNA) for minimal residual disease in onco-hematology using the EAC protocol. Leuk Res 30: 569–574.
- 16. Debey-Pascher S, Hofmann A, Kreusch F, Schuler G, Schuler-Thurner B, et al. (2011) RNA-stabilized whole blood samples but not peripheral blood mononuclear cells can be stored for prolonged time periods prior to transcriptome analysis. J Mol Diagn 13: 452–460.
- 17. Nikula T, Mykkanen J, Simell O, Lahesmaa R (2013) Genome-wide comparison of two RNA-stabilizing reagents for transcriptional profiling of peripheral blood. Transl Res 161: 181–188.
- 18. Menke A, Rex-Haffner M, Klengel T, Binder EB, Mehta D (2012) Peripheral blood gene expression: it all boils down to the RNA collection tubes. BMC Res Notes 5: 1.
- 19. Field LA, Jordan RM, Hadix JA, Dunn MA, Shriver CD, et al. (2007) Functional identity of genes detectable in expression profiling assays following globin mRNA reduction of peripheral blood samples. Clin Biochem 40: 499–502.
- 20. Sankaran VG, Xu J, Orkin SH (2010) Advances in the understanding of haemoglobin switching. Br J Haematol 149: 181–194.
- 21. Pearson RD, Liu XJ, Sanguinetti G, Milo M, Lawrence ND, et al. (2009) puma: a Bioconductor package for propagating uncertainty in microarray analysis. BMC Bioinformatics 10.
- 22. Wolozin B (2012) Regulated protein aggregation: stress granules and neurodegeneration. Mol Neurodegener 7: 56.
- 23. Kim EK, Choi EJ (2010) Pathological roles of MAPK signaling pathways in human diseases. Biochim Biophys Acta 1802: 396–405.
- 24. Kirby J, Ning K, Ferraiuolo L, Heath PR, Ismail A, et al. (2011) Phosphatase and tensin homologue/protein kinase B pathway linked to motor neuron survival in human superoxide dismutase 1-related amyotrophic lateral sclerosis. Brain 134: 506–517.
- 25. Loucks FA, Le SS, Zimmermann AK, Ryan KR, Barth H, et al. (2006) Rho family GTPase inhibition reveals opposing effects of mitogen-activated protein kinase kinase/extracellular signal-regulated kinase and Janus kinase/signal transducer and activator of transcription signaling cascades on neuronal survival. J Neurochem 97: 957–967.
- 26. Huang da W, Sherman BT, Lempicki RA (2009) Bioinformatics enrichment tools: paths toward the comprehensive functional analysis of large gene lists. Nucleic Acids Res 37: 1–13.