Hyperacute changes in blood mRNA expression profiles of rats after middle cerebral artery occlusion: Towards a stroke time signature

Stroke evolution is a highly dynamic but variable disease which makes clinical decision making difficult. Biomarker discovery programs intended to aid clinical decision making have however largely ignored the rapidity of stroke evolution. We have used gene array technology to determine blood mRNA expression changes over the first day after stroke in rats. Blood samples were collected from 8 male spontaneously hypertensive rats at 0, 1, 2, 3, 6 and 24h post stroke induction by middle cerebral artery occlusion. RNA was extracted from whole blood stabilized in PAXgene tubes and mRNA expression was detected by oligonucleotide Affymetrix microarray. Using a pairwise comparison model, 1932 genes were identified to vary significantly over time (p≤0.5x10-7) within 24h after stroke. Some of the top20 most changed genes are already known to be relevant to the ischemic stroke physiopathology (e.g. Il-1R, Nos2, Prok2). Cluster analysis showed multiple stereotyped and time dependent profiles of gene expression. Direction and rate of change of expression for some profiles varied dramatically during these 24h. Profiles with potential clinical utility including hyper acute or acute transient upregulation (with expression peaking from 2 to 6h after stroke and normalisation by 24h) were identified. We found that blood gene expression varies rapidly and stereotypically after stroke in rats. Previous researchers have often missed the optimum time for biomarker measurement. Temporally overlapping profiles have the potential to provide a biological “stroke clock” able to tell the clinician how far an individual stroke has evolved.


Introduction
Stroke is one of the leading causes of death and of disability [1]. The most specific and biologically powerful treatment for acute ischemic stroke (IS) is thrombolysis with recombinant tissue PLOS  plasminogen activator (rt-PA) given within the first 4.5 hours of onset [2,3]. Unfortunately, thrombolysis is disappointingly underused. This is mainly because of uncertainty about diagnosis, time of onset and the perceived risks of cerebral bleeding these entail [4][5][6][7]. Even in specialized centers, these limitations mean that fewer than 15% of ischaemic stroke patients can generally be treated with thrombolysis [6,8,9]. We already use imaging biomarkers to assist in the clinical decision making process. Indeed, computerized tomography (CT) detection of bleeds is critical for rt-PA use [2,10]. Penumbral imaging by CT and magnetic resonance imaging (MRI) are increasingly used to select patients likely to respond well to thrombolysis [11,12]. To date biochemical biomarker studies have focused on confirmation of stroke diagnosis and particularly on differentiation between ischemic and hemorrhagic strokes [13][14][15][16][17]. Whilst some leads are promising none has yet provided a clinically useful biomarker. Lack of specificity and sensitivity limit safe clinical utility and rather surprisingly most measurements have been performed later than the clinically relevant thrombolysis time window [18][19][20].
Some measurements have been made early in the evolution of a stroke. Cytokine and chemokine expression change rapidly after a range of injuries including stroke [21][22][23]. Similarly, a number of general and ischemia specific stress response markers such as cortisol, hypoxiainducible factors (HIFs) and oxidative stress markers such as glutathione S-transferase have been found to change rapidly after stroke [24][25][26][27][28]. However, little attention has been given to the possibility that the pattern of hyperacute changes in gene and protein expression in the blood might provide clinically useful information. This is surprising given the highly dynamic nature of stroke when speed of intervention is known to be critical for good outcome [3,29,30]. Systematic reviews have identified over 130 candidate stroke biomarkers selected for investigation primarily because of a known role for the molecule in stroke pathophysiology [18,31]. However, recent estimates for the human genome have placed the number of genes we possess to be around 19,000 for protein-coding, and 1,500 for non-protein coding [32]. Therefore, stroke researchers have barely scratched the surface of the candidates available.
Because stroke is a highly dynamic but variable process and because previous researchers have largely ignored the rapidity with which changes might take place, we examined the hyperacute responses to stroke through the lens of blood gene-expression over the first 24 hours after stroke in rats. Our explicit goal was to identify profiles of change that could aid clinical decision making. We employed whole genome microarrays to permit an unbiased identification of blood-based RNA expression profiles occurring after stroke. Others have used this technology before [14,27] but have not focused on the early changes after stroke. Specific acute stroke biomarkers acting as a stroke clock would be valuable in clinical practice to help identify increased numbers of stroke patients who could benefit from thrombolysis.

Ethical approval
This experiment was conducted in accordance with national guidelines (Australian code of practice for the care and use of animals for scientific purposes, 8 th edition, 2013). The use of animals (surgical and behavioural procedures) was prospectively approved by the Austin Health Animal Ethics Committee (Austin Health, Heidelberg, Victoria, Australia).

Animals and induction of stroke
Transient stroke was induced by middle cerebral artery occlusion (MCAo) in eight male 16 week old spontaneously hypertensive rats (SHR). Anesthesia was induced by 5%, and maintained with 2%, isoflurane in a 50:50 O 2 :air mixture through a nose cone. Body temperature was maintained at 37.4˚C throughout surgery via a rectal thermocouple and temperature control unit. Laser Doppler flowmetry (moorVMS-LDF, right-angled laser optic probe 0.8mm, MP5b, MoorLab, Devon, UK, coupled to iWORX, 308T, Dover NH, USA) was used to monitor cerebral blood flow (CBF) in the area 5mm lateral and 1mm posterior to bregma, within the MCA perfused cortex. MCAo was achieved using a silicon-coated thread (coating diameter 0.35mm and 2mm length), which was maneuvered through the internal carotid artery approximately 18mm until resistance was felt and a drop in CBF noted on Laser Doppler [33,34]. All incisions were closed using silk sutures and the animals allowed to recover from anesthesia. Reperfusion under anesthesia was carried out 1.5 hours after vessel occlusion by withdrawing the thread into the external carotid stump and sealing it in place.
Behavioral testing for neurologic deficit was undertaken at 1, 1.5 (immediately prior to anesthesia for reperfusion), 2, 3, 6 and 24 hours post stroke induction using a 5-point scale modified from that of Petullo et al. [35].
At 24 hours post-occlusion, the rats were killed by isoflurane overdose and additional blood (see below) was collected from the heart by syringe. The brain was cut into 2mm slices using a rat brain matrix, and stained with 1% 2,3,5-triphenyltetrazolium chloride (TTC) in saline for 20 minutes (10 minutes each side), before fixation in 10% formalin. Digital images of the stained brain slices were captured on a flatbed scanner (2400dpi), and infarct volume measured using ImageJ. Infarct areas were converted to volumes by finding the average area of two sections and multiplying by the distance between the slices.

Blood sampling
Animals were warmed briefly under an infrared lamp for approximately 10 minutes to facilitate dilation of the tail veins. Rats were firmly wrapped in a towel and blood collected from the lateral tail vein. Whole blood (250μl) was drawn from the lateral tail vein (1ml syringe, 26G needle) into 500 international units of heparin (500IU, Pfizer) at 0 (immediately before anesthesia induction), 1, 2, 3, and 6 hours post MCA occlusion. The 24 hour sample was taken by heart puncture immediately following anesthetic overdose. After mixing, the samples were centrifuged at 1100g for 10 minutes to separate the plasma and cells. About 150μl plasma was collected onto ice and transferred to -80˚C within 15 minutes. Seven hundred microliters of PAXgene solution (collected from PAXgene blood tubes, PreAnalytiX, Qiagen/ BD) was added to the blood cell pellet, the sample mixed and left at room temperature for 10-30 minutes before storage at -80˚C until RNA extraction.

RNA and microarray processing
RNA was prepared from the whole blood cell pellets using the PAXgene Blood RNA Kit (PreA-nalytiX #762174, Qiagen). The samples were randomized between 6 batches for workup in groups of 8. The manufacturer's instructions were followed except in accommodating the 10-fold smaller volume of the initial blood samples, which were in 1.6ml microfuge tubes. The PAXgene pellet was washed in 1 ml of RNase-free water and the final RNA samples were eluted in 40 +20 μl buffer BR5. The RNA was quantitated on a Nanodrop ND-2000 (mean yield 6.9μg) and quality was assessed by the OD 260/280 ratio (range 1.9-2.2). RNA was stored at -80˚C.
Standard Affymetrix protocols were followed for cDNA synthesis, fragmentation and labelling of samples for the microarrays. Briefly, RNA amplification was undertaken with the Nugen Applause WT Amp System (system cat# 5500-24, Millenium Science) using 200ng RNA for the first strand synthesis. cDNA was purified with the Qiagen Min-elute Reaction Clean-up kit and the Encore Biotine module (NuGen) was used for fragmentation and labelling.

Identification of existing candidate biomarkers
For comparison purposes, a systematic review was performed to identify papers claiming the utility of individual genes as candidate stroke biomarkers because of statistically significant changes reported in the manuscript. Those markers that only showed utility for stroke prognosis or when combined with a panel of other markers were not included in this list.

Data analysis
Infarct volumes are presented as average ± standard deviation.
For gene expression, pairwise comparisons were made between neighboring time points of the experiment using linear models, and the resulting F-statistic was used to identify genes whose expression varied the most across the time course of the experiment. Models were constructed, and significance testing was performed using the R package limma [36]. The level of significance for gene expression change over time was initially set by using the traditional p-value of <0.05. However to reduce false positive discovery this was later set to a pvalue of <5x10 -7 . Cluster analysis was performed using a Spearman correlation distance measure followed by a K-means clustering using the R function kmeans [36]. Gene expression data was averaged across the samples available for each time point, and mean centered so that similar clusters could be identified across different gene expression magnitudes. Heat maps were constructed using the R package gplots [37] and current gene annotations, including homolog identification and array coverage were obtained using the R package biomaRt [38].
For comparison of our data with other researchers' candidate biomarker gene lists, data manipulation and analysis has been conducted in R.

Animals
Stroke surgery was undertaken in 8 SHR rats. All animals were 16 weeks old and weighted 321 ±13g (range 305-342g) prior to surgery. Anesthesia duration required to induce MCAo was similar across animals (Average 50 minutes; range 41-69 minutes). Stroke was successfully induced in all animals, as evidenced by the development of behavioral deficits and a cerebral infarct at 24 hours post MCAo (Fig 1a and 1c). Average infarct volume was 205.76±45.79mm 3 . Damage consistently involved both the striatum and cortex (36.76±8.29mm 3 , 169.0±41.44mm 3 respectively; Fig 1b).

Blood samples
In all 8 rats, the 6 blood samples were collected except for one time point in one rat (6 hours post MCA occlusion in animal 3). A total of 47 samples were therefore collected for biomarker analysis.

Time responsive genes
Genes with expression varying the most over time were ranked by p-value of the F statistic, with a cut-off of 5x10 -7 . One thousand nine hundred and thirty-two genes reached this level of significance (S1 Table). The 20 most time responsive genes, their functional designation and overall expression are listed in Table 1

Patterns of gene expression after stroke
The cluster analysis identified 25 different patterns of gene regulation over time. Eight of them are illustrated here because of their potential clinical utility (Fig 3). Lists of genes with clusters are shown in supplementary data (S2 Table).  Table 1. Time responsive genes. Top 20 of the genes with expression varying the most over time (ranked by p-value of the F statistic, p<5x10-7). Genes presented with their p-value, probe ID, and brief gene description.

p-value
Probe ID Gene Name Description

Comparison with existing candidate biomarkers
Based on the published literature, a list of 73 candidate biomarkers was generated [13,14,16,18,27,[39][40][41][42][43][44][45] and reported in Table 2 using human nomenclature. The majority of this data comprised measurements made far beyond the stroke thrombolysis window. Of the 73 biomarkers listed, one was a protein dimer (Ddimer) and could not be represented by gene expression, and one was a metabolite (homocysteine) which was the result of a metabolic pathway involving several genes. Of the remaining molecules, if the name reported in the literature was different to the current gene annotation, then this was reported in parenthesis, and used to identify matches in our data. To be able to examine their expression levels in our rat data, the corresponding homologous genes were identified in rat if they existed. We then plotted (as a heat map) the mRNA expression profiles of these published biomarker candidates as detected within our experimental results against time post stroke in rats. This heat map highlighted that, for genes originally selected because of change at 24 hours, expression levels often began to change much earlier (1-3 hours) and indeed often exhibited greater change within this earlier window (Fig 4).
The genes identified as candidate biomarkers from the literature were also identified in our data, ordered by level of significance from our analysis, and compared to our top 1932 timeresponsive gene candidates. The ranks showed that 29 of the 59 genes (49%) presented in Table 3 were in our list of most significant changing genes over time, with the highest ranked gene being Interleukin 1 receptor, type II (Il1r2, rank = 8). It should be noted however that most candidates from the literature fall much lower in our ranking of the most time dependently changed genes.

Effect of surgery and anesthesia
To explore the difficulty of selecting appropriate controls in animal experiments requiring surgery to induce stroke, and which our experiment lacked, we also compared our data set with an available list of genes potentially specific for the impact of surgery and anesthesia. In 2001, Tang et al. studied the blood gene expression for different neurological conditions including stroke. They identified genes significantly (more than two fold) over or under expressed in the ischemic stroke and sham groups after 24 hours compared to untouched controls [46]. Twenty five genes were shown to be significantly over expressed in the animal blood at 24 hours after ischemic stroke while 98 had significantly decreased expression. In the sham group, 40 genes were upregulated and 126 downregulated at 24 hours after the intervention. These four lists of up-and downregulated genes for ischemic stroke and sham models were obtained from Tang et al. supplementary data. Their 'genes' were reported as probesets from the Rat U34A array.
To determine whether the up and downregulated specific stroke genes identified by Tang were present in our dataset and exhibited a similar response, gene names were mapped between the two datasets. For this, Tang's probesets were first annotated with current gene designations from the rat genome using the R library biomaRt [38]. This library was also used for mappings between the Affymetrix GeneChip Rat Gene 1.0st platform used for our data collection, and the older U34A which Tang employed.
The sham and stroke up-and downregulated genes identified by Tang were extracted from our stroke data and plotted in 4 different heat maps (Figs 5 and 6). These 4 heat maps confirmed that expression of these genes varied over time after stroke and that some of these changes occurred very early after stroke (as early as after one hour post vessel occlusion). In the significantly downregulated genes after stroke, more than a third of the genes identified by Tang et al. were actually first upregulated between 2 and 6 hours after MCA occlusion and only then downregulated at 24 hours (top half of Fig 7). This subset of downregulated genes after stroke contrasts with another subset also represented in the same heat map where the genes appeared to be upregulated before the start of the surgery then downregulated over a 24 hour period after stroke, with minimal expression observed between 2 and 6 hours after stroke (bottom half of Fig 7). Interestingly, from the heat map plot of genes from Tang's sham downregulated class (Fig 6 separated in A and B for better image quality), we can identify a subset of genes whose expression is suppressed after the initiation of the anesthesia and start of the surgery and with a maximum of downregulation peaking for the samples collected at 2 to 6 hours (after the end of the anesthesia period, Fig 6B).
Focusing on the 24 hour time point, it appeared that not all the genes identified as up or downregulated by Tang showed these characteristics in our dataset. The concordance between Towards a stroke clock using mRNA expression profiles Table 3. List of candidate ischemic stroke biomarkers ranked in our data. Genes identified based on published results ranked by p-value of the F statistic based on their gene expression over time in our data set. The red line represents the p<5x10 -7 cut-off.    Towards a stroke clock using mRNA expression profiles  the sham groups and generated a Venn diagram of the significantly differently expressed specific genes for the two conditions ( Fig 8A). We identified 49 genes whose expression was significantly changed only in the ischemic stroke group (17 upregulated and 32 downregulated genes). These potential ischemic-specific genes were then ranked in our prioritized list of the top 1932 genes varying over time after stroke (ordered by the p-value of the F-statistic). From the 17 genes upregulated exclusively in the Tang stroke group, 16 were successfully matched in our data set and four of them were found to be in our top 1932 gene list (Table 4). From the downregulated list of 32 genes, 24 were successfully matched and 7 were present in our top list ( Table 5). The expression levels of these potential specific ischemic genes were reanalyzed over time in our rat data set and the corresponding heat maps generated (Fig 8B and 8C). These heat  Towards a stroke clock using mRNA expression profiles Table 4. List of upregulated potential specific ischemic stroke genes ranked in our data. Genes identified by Tang et al. ranked by p-value of the F statistic based on their gene expression over time in our data set. The red line represents the p<5x10-7 cut-off. Towards a stroke clock using mRNA expression profiles maps also demonstrated a time evolving pattern of expression after stroke which was not as well defined as that for the other candidate biomarkers identified from the literature ( Table 2).

Discussion
The use of tPA therapy in ischemic stroke is largely limited by uncertainty about the time of stroke onset. The main objective of this study was to provide proof of principle that changes in gene expression profiles after stroke might provide a time dependent signature. The current study clearly identified significant hyperacute changes in the patterns of mRNA expression in the blood of rats over time after MCAo, and that the elaboration of a stroke clock based on gene expression patterns from blood is indeed feasible. Such a stroke clock would allow clinicians to understand where in the highly dynamic evolution of a stroke a patient is when they first present at hospital. Specific blood biomarkers capable of predicting stroke onset time may expedite stroke diagnosis in the emergency department and help increasing the number of patients eligible for intravenous thrombolysis or other beneficial intravascular procedures. To identify the most variable genes over time after stroke, we used a pairwise comparison model and significance testing. Our initial analysis found 13000 genes significantly changing over time with the classical p-value cut off for significance of 0.05. Since this would still give 5% false positives (equivalent to more than 1300 genes in our data set), we decided to increase the level of significance required to a p-value of 5x10 -7 to reduce false discovery.
Despite the lack of control group in our experiment, the biology of the genes identified by this highly selective regime as the most variable over time is consistent with what we known of the IS physiopathology. Some of the top 20 most time responsive genes appear to be genes that are relevant to the biology of the injury. For example, the interleukin-1 receptor (Il-1R) has been a target for antagonism in the management of acute stroke. Blockage of Il-1 action via its receptors has been shown to reduce brain damage in rodent model of cerebral ischemia [47,48]. It is also well known that the neuronal nitric-oxide synthase plays a crucial role in the regulation of cerebral blood flow against pathogenic factors associated with cerebral ischemia [49]. Prokineticin 2 (Prok2) has also recently been identified as a deleterious mediator for cerebral ischemia [50].
By cluster analysis, we found evidence for more different patterns of gene expression than expected. The majority of genes showed first either an upregulation or downregulation immediately after the ischemic event and then a progressive trend to return to their basal activity at 24 hours. These included gene expression profiles with high potential clinical utility including hyperacute or acute transient upregulation/downregulation (with broad or very narrow expression peaks (or troughs) at different points within the 2 to 6 hour window after stroke (Fig 3A and 3B). Many genes showed complex expression profiles with more than one peak of expression (Fig 3D) or varied in opposite directions (Fig 3C and 3H) over time.
In our experiment, the shape of the profiles for different gene sets identified differs dramatically and will allow the construction of a stroke clock where the expression patterns of a series of genes would be able to define the evolution of the stroke damage. Indeed, by pulling out the gene expression levels of Mrgprx3, Prok2 and Bank1 (all part of the top 20 most variable genes over time list) and combining them 2 by 2 at 3 different time points (expression of Bank1 and Mrgprx3 at 2 hours, expression of Bank1 and Prok2 at 6 hours and expression of Bank1 and Mrgprx3 at 24 hours; see striking image), we could create a tool capable of determining the stroke onset time.
The strength of our study is the use of sequential blood drawing in the same animals and the high frequency of repeated blood samplings starting hyperacutely after the initiation of the ischemic brain injury. Our finding gives a new perspective on most previously published stroke biomarker results. By drawing blood samples repetitively in the first 24 hours after stroke, we have been able to provide new evidence that many candidate biomarkers change hyperacutely after stroke and trend back towards their basal value within 24 hours.
Importantly, our study demonstrated that our identified biomarkers profiles are highly reproducible across individual animals and that changes in gene expression are of a significant magnitude (multiple log change). These observations are consistent with highly reproducible induction of stroke and collection of blood at well-defined times after stroke highlighting the utility of animal models in the quest for ischemic stroke biomarkers.
By analyzing the expression of Tang's gene lists (based on samples collected at 24 hours after stroke induction) within our data set, we found that marked changes of gene expression had occurred within the first 24 hours after stroke. We also noted differences in the identification of up or downregulated expression of some genes at 24 hours after stroke or sham surgery. Finally, we found that only a small proportion of the genes presented by Tang as potentially specific for ischemic stroke were part of our list of time responsive genes (when ranked by the p-value of the F-statistic and with a p-value cut off of 5x10 -7 ). It is not too surprising that our results differed from Tang's. Firstly, arrays have evolved considerably over the last ten years. Tang's expression data were collected on the Affymetrix GeneChip Rat Genome U34A array which contained probesets for 7,000 full-length sequences, and about 1,000 EST clusters. The Affymetrix GeneChip RatGene 1.0 ST array offers a coverage of more than 27,000 protein coding transcripts and 24,000 Entrez genes. Our data set is therefore enriched. This also explains why we could not find a match in our data set for all the genes presented by Tang. Mapping of older Affymetrix UTR-probeset array technology to the current rat genome was not perfect, as the Rat genome has been refined and improved since the U34A array was designed. As a result not all the genes, or probesets, identified from analysis of the array data had current annotations in the rat genome and in some cases U34A probesets matched to multiple genes. Differences observed between the two experiments at the 24 hours sampling time can also be explained by a difference of collecting methods (Tang et al. did not used PAXgene tubes and mixed the blood from two different animals to have sufficient blood to conduct the microarrays). The use of a single time point of 24 hours can also account for the relative short list of upregulated genes specific to stroke that Tang presented since we find that most genes have returned to near normal by this late time.
The reanalysis of the other genes identified as candidate stroke biomarkers from the human literature showed a greater overlap with our data. More than half of these genes were present in our most significant time responsive gene list. This argues in favor of the genes present in our list being specific to stroke physiopathology and that the changes in gene expression observed in our data are unlikely to all be due to the surgery or anesthesia required to induce stroke in our model system since this is not a feature of the human studies. This concordance of the rat and human data points to the potential for successful translation. It should be noted however that others have cautioned on extrapolating from rat to human data [19]. The heat maps created with the literature gene lists presented time patterns similar to those we identified with our cluster analysis. Even if these candidate biomarker genes were initially not identified for this use, they have the potential to contribute to a time signature for stroke. The more dramatic time signatures discovered in our unbiased whole genome experiment suggest that many of these genes may have even greater utility for this purpose.
Candidate biomarkers have rarely been selected in the context of looking for a stroke clock. Turck et al. analyzed blood samples taken at 2 different time points (within 3 hours and within 36 hours) after stroke but had to combine two separate cohorts of patients to achieve their aim [27]. The only study collecting sequential human blood samples in the same individual early after stroke was conducted by Tang et al. in 2006 but their attention focused on elaboration of a diagnostic tool capable of differentiating ischemic stroke patients from controls and the earliest blood samples were collected 3 hours after the ischemic event [14]. Therefore, this study is to our knowledge the first to combine whole genome screening and sequential blood sampling within a time frame capable of informing clinical decision making.
All experiments have limitations. Our study will need to be replicated and to include a comprehensive series of sham groups to provide information about the specificity of these biomarkers to brain ischemia and the potentially cofounding roles of surgery, ischemia in other tissues (primarily neck and jaw), anesthesia and stress. The experiment will also have to be replicated in female rats to account for sexual dysmorphism in stroke. Finally, the data reported here comes from an animal model of stroke. Such data can only be hypothesis generating and the changes identified here will need to be replicated in studies of humans with stroke. Nevertheless, in mass spectrometry analysis of plasma from patients with stroke, we have already found that protein expression in humans displays similar temporal profiles [51].
In conclusion, we have demonstrated that gene expression changes over time after ischemic stroke and that this change occurs quickly after the event. Gene expression might be used to construct a "Stroke Clock" which provides important information on patient selection for acute stroke therapy.
Supporting information S1 Table. Time responsive genes. List of the 1932 the genes with expression significantly varying over time (ranked by p-value of the F statistic, p<5x10 -7 ). Genes presented with their pvalue, probe ID, Ensemble gene ID and brief gene description.