Gene Expression Patterns in Peripheral Blood Correlate with the Extent of Coronary Artery Disease

Systemic and local inflammation plays a prominent role in the pathogenesis of atherosclerotic coronary artery disease, but the relationship of whole blood gene expression changes with coronary disease remains unclear. We have investigated whether gene expression patterns in peripheral blood correlate with the severity of coronary disease and whether these patterns correlate with the extent of atherosclerosis in the vascular wall. Patients were selected according to their coronary artery disease index (CADi), a validated angiographical measure of the extent of coronary atherosclerosis that correlates with outcome. RNA was extracted from blood of 120 patients with at least a stenosis greater than 50% (CADi≥23) and from 121 controls without evidence of coronary stenosis (CADi = 0). 160 individual genes were found to correlate with CADi (rho>0.2, P<0.003). Prominent differential expression was observed especially in genes involved in cell growth, apoptosis and inflammation. Using these 160 genes, a partial least squares multivariate regression model resulted in a highly predictive model (r2 = 0.776, P<0.0001). The expression pattern of these 160 genes in aortic tissue also predicted the severity of atherosclerosis in human aortas, showing that peripheral blood gene expression associated with coronary atherosclerosis mirrors gene expression changes in atherosclerotic arteries. In conclusion, the simultaneous expression pattern of 160 genes in whole blood correlates with the severity of coronary artery disease and mirrors expression changes in the atherosclerotic vascular wall.


Introduction
Coronary artery disease, a multifactorial chronic disease, is the leading cause of death in Western countries. Despite considerable advances in the prevention and treatment of coronary artery disease and its complications, morbidity and mortality remain high. In half of patients with coronary artery disease, the first manifestation is death [1]. Consequently, substantial efforts are being put into the development of new strategies for accurate noninvasive diagnosis of coronary artery disease and the identification of novel treatment targets [2].
Systemic and local inflammation has been shown to play a prominent pathologic role in atherosclerotic coronary artery disease [3]. Adhesion of leukocytes to activated endothelial cells and their migration into the arterial wall are thought to initiate, propagate, and destabilize coronary plaques. All types of blood constituents appear to play a role in plaque formation, although the majority of inflammatory lesions in atherosclerotic vascular tissue consist of foam cell macrophages and activated T-cells [4]. Several studies have found distinct gene expression patterns in atherosclerotic arteries [5][6][7][8]. While other pathways are likely also important, a consistent feature has been differential expression of inflammatory genes and genes involved in cell cycle control [9][10][11][12].
Microarray analysis of peripheral blood cells is a practical approach to study gene expression changes that may reflect not only genetic predisposition but also presence and activity of disease, environmental modifier effects, and treatment responses [13]. Total peripheral leukocyte count correlates with the severity of coronary atherosclerosis and is a strong predictor of cardiovascular outcome [14], but little is known about the role of phenotypic changes in circulating blood cells of patients with coronary atherosclerosis. In a recent micro-array analysis, 526 genes were found to be differentially expressed in isolated mononuclear cells from 41 patients [15]. Gene expression patterns of 50 of these genes together with 56 genes selected from the literature were subsequently shown to be associated with the presence of coronary artery disease in two independent cohorts. The aim of the present study was 1) to identify distinct genomic markers in peripheral whole blood that correlate with the severity of coronary artery disease using micro-array analysis and 2) to investigate to what extent gene expression patterns in peripheral blood mirror those in atherosclerotic arteries.

Patient Selection and Characteristics
Patients and control subjects were recruited from individuals who had undergone catheterization in the Duke University Hospital Cardiac Catheterization Laboratory and participated in a proteomics study to discover candidate proteins that are differentially displayed in populations with and those without angiographic coronary artery disease [16]. After being approached and providing informed written consent, subjects had clinical and laboratory data collected. The investigation conforms to the principles outlined in the Declaration of Helsinki, and was approved by the Duke Institutional Review Board.
Patient selection, design and results from the main proteomics study have been reported previously [16]. Populations were initially defined in order to minimize differences in plasma proteins unrelated to the presence or absence of coronary artery disease. As a practical strategy, three different cohorts of subjects (cases and controls) were enrolled: 1) matched men (n = 106), who were matched for age and ethnic group, 2) unmatched men (n = 82), who did not fulfill the matching criteria and 3) unmatched women (n = 53). The severity of coronary artery disease was scored using the Duke Coronary Artery Disease Index (CAD-Index) [17,18]. The CAD-index is a prognostic assessment of the extent of coronary artery disease, accounting for the number and severity of lesions and diseased vessels and involvement of left anterior descending and left main disease.
Inclusion criteria for the coronary artery disease patient population (cases) were: age between 40 and 65 and coronary artery stenosis of >50% in at least one major coronary artery. Inclusion criteria for the control population (controls) were: age between 40 and 65 for matched men cohort only, no angiographically detectable coronary artery stenosis on cardiac catheterization within the last two years, normal left ventricular ejection fraction and normal regional wall motion. Exclusion criteria for controls were typical signs of angina, or any history or evidence of myocardial ischemia on stress testing, myocardial infarction or unstable angina, any history of peripheral arterial or cerebrovascular disease, or significant vascular stenosis on noninvasive imaging or angiography. Exclusion criteria also included myocardial infarction within one month (for cases), diabetes, uncontrolled hypertension (systolic blood pressure >180 mmHg or diastolic blood pressure >100 mmHg) or with end-organ damage, renal insufficiency (creatinine >2.0 mg/dL or BUN >40 mg/dL), active malignancy, significant valvular heart disease, NYHA Class III or IV heart failure, cigarette smoking >2 packs per day, total cholesterol >300 mg/dL or triglyceride >400 mg/dL, anemia (hemoglobin <12.5 g/dL for females or <13.5 g/dL for males), and hypotension (systolic blood pressure <90 mmHg and diastolic blood pressure <50 mmHg).

Blood Sampling and Gene Expression Analysis
The blood samples (2.5 mL) were collected in PAXgene™ Blood RNA tubes and total RNA was isolated using the standardized RNA Kit (PreAnalytiX, Qiagen) [19]. RNA isolation started with a centrifugation step to pellet nucleic acids in the PAXgene Blood RNA Tube. The pellet was then washed, and Proteinase K was added to digest proteins. Alcohol was added to adjust binding conditions, and the sample was applied to a PAXgene RNA spin column. During a brief centrifugation, RNA selectively bound to the PAXgene silica-gel membrane; it was then eluted using an optimized buffer.
RNA was then quantified by absorbance at A260 nm and the purity was estimated by the ratio A260 nm/A280 nm. RNA integrity was confirmed by non-denaturing agarose gel electrophoresis. RNA was stored at -80°C until further analysis. The quality of 19 RNA samples was insufficient for microarray analysis due to degradation. The genomic studies were conducted in the Novartis Genomics Factory, Basel, Switzerland.
Genome-wide transcript profiling was assessed using human HGU133A oligonucleotide expression probe arrays (Affymetrix, Santa Clara, CA, U.S.A.), comprising 22,483 probe sets. The experiments were done according to the recommendations of the manufacturer [20]. Data was normalized using MAS5 (Affymetrix); the data is publicly available at the Gene Expression Omnibus (GEO) repository (accession number GSE12288, http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE12288). As quality control, RT-PCR was performed on 8 selected genes in 2×20 subjects from the 'matched men' cohort.

Independent Evaluation of Predictive Gene Model in Human Aorta Tissue
To test whether the expression pattern in peripheral whole blood is representative for atherosclerosis in general, we have examined the capability of expression of genes derived from the peripheral blood cell study to predict the severity of atherosclerosis in human aortas. Gene expression data was generated using RNA extracted from a unique collection of freshly harvested human aortas with varying degrees of atherosclerosis (n = 67 donors). Donor identification, RNA extraction and micro-array methods (Affymetrix U95Av2) as well as gene expression signatures that differentiate between atherosclerotic disease states in human aortas have been reported previously [8]. As indicated in the original report, disease extent (normal, intermediate, severe) was scored by combining Sudan IV staining and raised lesion data. The "normal" or minimally diseased group showed no Sudan IV staining and contained no raised lesions, while the "intermediate" group showed more than 20% Sudan IV staining but contained no raised lesions. The "severe" group contained raised lesions covering more than 10% of the surface. We identified 20 normal, 25 intermediate and 22 severely diseased sections for this analysis.

Statistical Methods
Spearman rank correlation between CAD-index and gene expression was calculated (Partek Genomics Suite Version 6.3). An absolute correlation coefficient (rho) >0.2 was considered clinically relevant, corresponding to a p-value of 0.003 (n = 222). Among the 22,483 probe sets of the Affymetrix HGU133A chip, about 60 probe sets can be expected to have an absolute rho >0.2 by chance (false positives). Student's t test, parametric correlation and rank correlation according to Spearman were performed with the statistical software package S-Plus Version 6.
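The correspondence between the rho cut-off, its p-value and the number of probe sets expected by chance can be checked with a short calculation. The sketch below is illustrative only and was not part of the original analysis. It assumes the usual t-transformation of a correlation coefficient and, because n is large, replaces the t-distribution by the standard normal, so the p-value it returns is slightly smaller than the exact one. The function name is hypothetical.

```python
import math

def rho_to_p(rho, n):
    """Approximate two-sided p-value for a correlation coefficient rho with n subjects.

    Uses t = |rho| * sqrt((n - 2) / (1 - rho^2)) and, as a simplifying assumption,
    a standard normal in place of the t-distribution with n - 2 degrees of freedom.
    """
    t = abs(rho) * math.sqrt((n - 2) / (1.0 - rho ** 2))
    return math.erfc(t / math.sqrt(2.0))  # two-sided normal tail probability

N_SUBJECTS = 222      # subjects with usable RNA
N_PROBE_SETS = 22483  # probe sets on the array, as stated in the text

p_cutoff = rho_to_p(0.2, N_SUBJECTS)
expected_by_chance = N_PROBE_SETS * p_cutoff
print(f"p at |rho| = 0.2: {p_cutoff:.4f}")
print(f"probe sets expected by chance: {expected_by_chance:.0f}")
```

Under this approximation both numbers land in the same range as the values quoted above; an exact t-distribution would move the p-value slightly upward.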
Projections to Latent Structures (PLS) analysis including Orthogonal Signal Correction (OSC) (SIMCA-P Version 10.0) was used to identify gene sets that discriminate between increasing CAD-indices or the three classes (normal, intermediate and severe) of atherosclerosis in the aorta samples. To reduce gene selection bias, models were subsequently repeatedly built based on data from two cohorts to predict CAD in the third cohort. In addition, extensive cross-validation by leave-one-out technique and validation by response permutation was applied to 7 groups of approximately 32 subjects to reduce bias in creating a predictive gene set.
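The two resampling schemes described here, holding out one cohort at a time and leaving out one of 7 groups at a time, can be sketched as index splits. This is a minimal illustration rather than the SIMCA-P procedure: the cohort labels, the random assignment to groups and the helper names are assumptions, and the cohort sizes are those of the enrolled subjects rather than of the 222 subjects with usable RNA.

```python
import random

# Cohort sizes as reported for the enrolled subjects (labels are hypothetical).
COHORTS = {"matched_men": 106, "unmatched_men": 82, "unmatched_women": 53}

def leave_one_cohort_out(cohort_labels):
    """Yield (held_out, train_idx, test_idx), holding out each cohort once."""
    for held_out in sorted(set(cohort_labels)):
        train = [i for i, c in enumerate(cohort_labels) if c != held_out]
        test = [i for i, c in enumerate(cohort_labels) if c == held_out]
        yield held_out, train, test

def k_groups(n_subjects, k=7, seed=0):
    """Randomly partition subject indices into k groups of near-equal size."""
    idx = list(range(n_subjects))
    random.Random(seed).shuffle(idx)
    return [idx[g::k] for g in range(k)]

labels = [name for name, size in COHORTS.items() for _ in range(size)]
for held_out, train, test in leave_one_cohort_out(labels):
    print(held_out, "train:", len(train), "test:", len(test))

groups = k_groups(222, k=7)
print("group sizes:", [len(g) for g in groups])
```

With 222 subjects and 7 groups, each group holds 31 or 32 subjects, which matches the "approximately 32" stated above.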

Patient Demographics
Demographic data, medical history and medication of the study population are summarized in table 1. A history of hypertension was significantly more common in the cases. Aspirin, statins, and blood pressure lowering agents were more frequently taken by the cases. All controls had no angiographically significant coronary artery disease (CAD-Index = 0). Within the cases, however, there was a wide distribution, with 81% of cases having a CAD-Index between 25 and 63. Although most cases (93%) had at least two-vessel disease or severe single-vessel disease, the distribution of cases is skewed towards the lower end of CAD-Index.
Clinical laboratory parameters were available for all subjects (table 1). Hematocrit and white blood cell counts were not significantly different. Total cholesterol and LDL-cholesterol levels were significantly lower in the coronary artery disease group, probably reflecting a higher use of statins.

Gene Expression
Gene expression data from 222 out of 241 subjects were available for this analysis (110/121 cases and 112/120 controls); RNA from the remaining 19 subjects did not pass quality control due to degradation.
In a univariate analysis, 160 genes were found to correlate with CAD-Index with an absolute rank correlation coefficient (rho) >0.2 (P<0.003). All probesets correlating with CAD-Index are listed in table 2. Most of these genes are known to be involved in hematopoietic cell differentiation, cell growth or growth arrest, apoptosis, cell adhesion, matrix modulation and inflammatory and immune response, processes known to modulate atherosclerosis.
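The univariate screen amounts to computing one Spearman coefficient per probe set and keeping those above the cut-off. The following sketch shows the idea in plain Python; the study itself used Partek Genomics Suite, so this is a stand-in, and the probe-set names and values are made up for illustration.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2.0 + 1.0
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks

def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def spearman_rho(x, y):
    """Spearman rho is the Pearson correlation of the tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))

def screen(expression, cad_index, threshold=0.2):
    """Return {probe_set: rho} for probe sets with |rho| above the threshold."""
    hits = {}
    for probe, values in expression.items():
        rho = spearman_rho(values, cad_index)
        if abs(rho) > threshold:
            hits[probe] = rho
    return hits

# Toy data: hypothetical probe sets, made-up expression values.
cad_index = [0, 0, 0, 23, 32, 48, 63, 74]
expression = {
    "probe_up": [5.1, 5.0, 5.3, 5.6, 5.9, 6.2, 6.1, 6.8],
    "probe_flat": [7.2, 7.6, 6.9, 7.5, 7.0, 7.3, 7.1, 7.4],
}
hits = screen(expression, cad_index)
print(hits)
```

Tie handling matters in this setting because all controls share a CAD-Index of 0, so roughly half of the response values are tied.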
Using log-transformed data with signal intensities >80, only 19 probesets were found to be significantly differentially expressed in a multiway ANOVA (smoking, age, gender, cohort, race, CAD (i.e. case vs. control) and CAD-index as fixed factors or random effects, respectively) (table 2). However, when only taking the 20 controls with the least predicted CAD versus the 20 cases with the most predicted disease into account, a formal comparison yielded 90 out of the 160 probesets with statistically significant differential expression (p<0.05, no adjustment for multiple comparisons) (table 2). rt-PCR confirmed the Affymetrix results for 7 of the 8 genes tested in 20 cases and 20 controls (FKBP8, ITPK1, MARCH2, PNPLA2, TUBA3, UBXD1, FTL); the remaining gene (PINK1) did not show a significantly different expression on rt-PCR.

Correlation of Gene Expression Profile with Coronary Disease
All 160 genes with rho >0.2 were included in the PLS analysis, with CAD-Index as the only response variable. Polynomial regression analysis of the resulting t1-scores versus CAD-Index resulted in the prediction model including 95% confidence range of the regression and the 95% prediction interval with r2 = 0.764 (p<0.001) (figure 1). Predictive accuracy was found to be excellent in the overall population (RMSEE (root mean square error of estimation) = 0.323), but improved with increasing threshold of CAD (RMSEE = 0.249 for controls vs cases with CAD >40; RMSEE = 0.204 for controls vs cases with CAD >60 and RMSEE = 0.172 for controls vs cases with CAD >70).
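To make the terms t1-score and RMSEE concrete, the sketch below computes the first component of a single-response PLS model and its estimation error on made-up data. It is a simplified stand-in for the SIMCA-P analysis: it omits Orthogonal Signal Correction and the polynomial regression step, uses only one component, and takes the RMSEE denominator as N - 1 - A (A components), which is an assumption about the software's convention. All function names and values are hypothetical.

```python
import math

def autoscale(column):
    """Center a list of numbers to mean 0 and scale to unit sample variance."""
    n = len(column)
    mean = sum(column) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in column) / (n - 1))
    return [(v - mean) / sd for v in column]

def pls1_first_component(X, y):
    """First PLS component for a single response.

    X is a list of rows (subjects) of already-scaled gene values and
    y the scaled response. Returns weights w, scores t1 and fitted y.
    """
    n_genes = len(X[0])
    # The weight of each gene is proportional to its covariance with y.
    w = [sum(row[j] * yi for row, yi in zip(X, y)) for j in range(n_genes)]
    norm = math.sqrt(sum(v * v for v in w))
    w = [v / norm for v in w]
    t1 = [sum(row[j] * w[j] for j in range(n_genes)) for row in X]
    c = sum(t * yi for t, yi in zip(t1, y)) / sum(t * t for t in t1)
    fitted = [c * t for t in t1]
    return w, t1, fitted

def rmsee(y, fitted, n_components=1):
    """Root mean square error of estimation with N - 1 - A degrees of freedom."""
    ss_res = sum((a - b) ** 2 for a, b in zip(y, fitted))
    return math.sqrt(ss_res / (len(y) - 1 - n_components))

# Made-up data: 6 subjects, 3 hypothetical genes.
cad = [0, 0, 23, 37, 48, 74]
genes = [
    [5.0, 5.2, 5.5, 5.9, 6.1, 6.6],  # rises with CAD-Index
    [8.1, 8.0, 7.6, 7.5, 7.1, 6.8],  # falls with CAD-Index
    [3.3, 3.1, 3.4, 3.0, 3.2, 3.3],  # roughly unrelated
]
cols = [autoscale(g) for g in genes]
X = [list(row) for row in zip(*cols)]
y = autoscale(cad)
w, t1, fitted = pls1_first_component(X, y)
print("weights:", [round(v, 3) for v in w])
print("RMSEE:", round(rmsee(y, fitted), 3))
```

Because the response is scaled to unit variance in this sketch, an RMSEE well below 1 indicates that the scores capture most of the variation in the response.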
In order to test for robustness of the model, the PLS analysis was performed separately for each of the three cohorts, with the model repeatedly constructed using two cohorts (training sample) and tested in the third cohort (test sample). While the controls remained quite stable in the range of -2 standard deviations, the t1-scores of the cases were located mainly in the +2 standard deviation range and increased with increasing CAD-Index (figure 2). This relationship is clearly present in each cohort. Cross-validation of the model was also performed by dividing the data into 7 groups of on average 32 subjects and then developing a number of parallel models from reduced data with one of the groups deleted. The omitted group was then used as a test data set, and the differences between actual and predicted CAD-Indices were subsequently calculated for these data points. The reduced models validation demonstrated a Q2cum of 0.776, indicating an excellent predictive ability.
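The cross-validated statistic can be illustrated as Q2 = 1 - PRESS/SS, where PRESS sums the squared differences between actual and predicted responses over the omitted groups and SS is the total sum of squares of the response. The sketch below uses ordinary least squares on a single score as a stand-in for the PLS model, assigns every seventh subject to the same group, and runs on synthetic data; these choices are assumptions made for illustration.

```python
def fit_line(x, y):
    """Ordinary least-squares slope and intercept for one predictor."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return slope, my - slope * mx

def q2_cross_validated(x, y, k=7):
    """Q2 = 1 - PRESS / SS, with each of k groups predicted from the other k - 1."""
    n = len(y)
    press = 0.0
    for g in range(k):
        test = set(range(g, n, k))  # every k-th subject forms one group
        train = [i for i in range(n) if i not in test]
        slope, intercept = fit_line([x[i] for i in train], [y[i] for i in train])
        press += sum((y[i] - (intercept + slope * x[i])) ** 2 for i in test)
    mean_y = sum(y) / n
    ss_total = sum((v - mean_y) ** 2 for v in y)
    return 1.0 - press / ss_total

# Synthetic example: a score that tracks the response plus a small periodic error.
scores = [i / 10.0 for i in range(42)]
response = [2.0 * s + 0.3 * ((i % 5) - 2) for i, s in enumerate(scores)]
q2 = q2_cross_validated(scores, response)
print(round(q2, 3))
```

A Q2 close to 1 means that subjects left out of model building are predicted almost as well as those used to fit it; a value near or below 0 means the model predicts no better than the mean response.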
A Variable Importance in the Projection (VIP) of each gene for the separate PLS analyses of the three cohorts compared to the PLS analysis including all subjects was calculated. The VIP of the first 24 genes shows only little variation between the three cohorts, suggesting a rather high stability of the prediction model (figure 3). A set of eight genes appears to have the highest impact on the model (FTL, FKBP8, TUBA3, PNPLA2, UBXD1, MARCH2, ITPK1, PINK1, in order of contribution; listed in bold in table 2). A PLS analysis only involving these eight highest ranking genes in the VIP analysis showed that the expression profiles of these eight genes are also able to predict the CAD-Index (r2 = 0.752). Adding traditional risk factors and biochemical markers did not significantly improve this model (r2 = 0.782).
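For readers unfamiliar with VIP, the sketch below implements the commonly used definition: for each gene, the squared normalized weights are summed over components, each weighted by the share of response variation that component explains, and the result is scaled by the number of genes. Whether SIMCA-P uses exactly this form is an assumption, and the weights and explained sums of squares shown are hypothetical.

```python
import math

def vip_scores(weights, ssy_explained):
    """Variable Importance in the Projection for each variable.

    weights: one list per component, each holding one weight per gene
             (rescaled here to unit length).
    ssy_explained: sum of squares of the response explained by each component.
    """
    n_vars = len(weights[0])
    total = sum(ssy_explained)
    vip = []
    for j in range(n_vars):
        acc = 0.0
        for w, ss in zip(weights, ssy_explained):
            norm_sq = sum(v * v for v in w)
            acc += ss * (w[j] ** 2) / norm_sq
        vip.append(math.sqrt(n_vars * acc / total))
    return vip

# Hypothetical two-component model with four genes.
weights = [[0.8, 0.5, 0.3, 0.1], [0.1, 0.2, 0.6, 0.7]]
ssy_explained = [70.0, 10.0]
vip = vip_scores(weights, ssy_explained)
print([round(v, 2) for v in vip])
```

By construction the squared VIP values average to 1 across genes, which is why values above 1 are conventionally read as above-average contributors.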

Test of Predictability in Human Aorta Tissue Samples
Since the genes whose expression contributes to prediction of CAD were studied within circulating leukocytes, we sought to define whether they actually reflect a molecular process that is ongoing within atherosclerotic arteries or not. Furthermore, as a test of reproducibility of the contribution of these 160 genes to predicting atherosclerotic disorders, we have investigated whether the in situ expression pattern of our 160 genes derived from peripheral blood could also adequately predict the severity of aorta atherosclerotic lesions. To achieve this goal, we have used gene expression data extracted from a large set of human aortas obtained from heart donors (n = 67), an independent human model of atherosclerosis. Excluding genes that are not present on the microarray used in the aorta expression study, the expression pattern of the remaining genes accurately separated the aorta samples according to the severity of atherosclerosis (figure 4). These results indicate that the gene expression changes in peripheral blood that are correlated with the extent of coronary atherosclerosis reflect similar pathophysiological changes in atherosclerotic arteries.

Discussion
In this large-scale expression analysis of peripheral whole blood cells, we have found 160 genes whose expression correlates with the severity of angiographically documented coronary artery atherosclerosis. Taking into account that the CAD-Index is a semiquantitative estimate of the extent of coronary atherosclerotic disease, which implies variation across subjects even with the same degree of disease, the prediction based on expression pattern of these genes is robust. Our findings are also robust as assessed by internal validation and consistency across three distinct subgroups. Importantly, the in situ expression pattern of the 160 genes derived from the peripheral blood analysis was also predictive of the severity of atherosclerosis in human aorta tissue. This provides validation of the association of this set of genes with atherosclerosis and support for the concept that peripheral blood gene expression reflects pathophysiology in the vascular wall. Taken together, the molecular signature in peripheral blood for varying degrees of coronary artery disease is remarkably consistent with that seen in the atherosclerotic arterial wall, providing valuable new information on the pathways and genes that are involved in the atherosclerotic process.
Peripheral blood is easily accessible and routinely used for diagnostic laboratory analysis and thus is a good resource for additional tests that might define extent of coronary artery disease. Several inflammatory markers, including high sensitivity C-     [21]. Nevertheless, there is debate as to the additional prognostic value of these tests beyond traditional risk factors [22]. Other non-invasive analyses, such as coronary multislice CT can identify the extent of coronary artery disease, but such tests require specialized equipment and involve use of intravenous contrast and radiation. A simple blood test that predicts the extent of coronary artery disease could provide an additional useful tool for screening for coronary artery disease in at-risk populations. A similar approach has been successfully used for detection of cardiac allograft rejection and the response to immunosuppressive therapy [23]. Most of the differentially expressed genes in the present study are involved in bone marrow cell differentiation, cell growth or growth arrest, apoptosis, cell adhesion and matrix modulation, and inflammatory and immune response, processes known to modulate atherosclerosis. Since blood samples were taken in stable patients, our finding that circulating blood cells differentially express many pro-inflammatory genes supports the paradigm that inflammation is an important process in patients with coronary artery disease. Expression patterns of the same genes were found to correlate with the extent of atherosclerosis in human aortas as well, indicating that gene expression patterns in peripheral blood cells associated with coronary artery disease to some extent mirror gene expression changes in the atherosclerotic vessel wall. 
Indeed, many of the genes shared by our predictive models modulate monocyte or macrophage function, including MAN2A [24], RXRA [25], LGALS9 [26], PSG3 [27], CEBPA [28], ARHGAP4 [29], MADH5 [30], AIF1 [31], ELAVL2 [32], STXBP2 [33], KCNMB1 [34], PDE4D [35], EPHB2 [36], GGA3 [37], PLAUR [38], NPR3 [39] and TNFRSF5 (CD40) [40]. Interestingly, four of these genes (KCNMB1, NEDD4L, ADD1 and NPR3) have been implicated in genetic susceptibility for hypertension [41][42][43][44], while two genes have been associated with genetic susceptibility for stroke (PDE4D) [45] or myocardial infarction (ADD1) [46]. The present results also appear to support a role for ferritin light chain (FTL) in atherosclerosis [47]. Ferritin is the major intracellular iron storage protein that plays a major role in the reaction to oxidative stress. Using a proteomic approach, You et al. found that the levels of ferritin light chain protein were significantly increased in atherosclerotic coronary arteries [48]. Ferritin light chain is also upregulated in circulating leukocytes of patients with juvenile rheumatoid arthritis, sickle cell disease, autoimmune renal disease or multiple sclerosis, indicating that altered FTL gene expression in   [49][50][51][52]. We intentionally did not separate peripheral blood cells or leukocyte subtypes. There is currently little pathophysiological evidence that the study of leukocyte subgroups would add to our predictive model and the isolation process could, in itself, affect the gene expression pattern. Using whole blood cells not only allows aggregate RNA expression analysis per patient without the need to pool rare subtypes, but is also more practical from a clinical perspective. Leukocyte levels in all groups were very similar, although it cannot be excluded that the percentage of specific subtypes differs between groups, and hence that different numbers of subtypes are responsible for the observed effect. 
Peripheral whole blood might also include differential expression signatures from reticulocytes, platelets or rare hematopoietic progenitors.
In a recent paper, Wingrove et al. reported 526 differentially expressed genes (>1.3-fold expression) from a genome-wide microarray analysis of peripheral blood mononuclear cells of 27 cases with angiographically documented CAD and 14 controls [15]. The authors found that 14 genes, out of a set of 106 genes including the 50 most significant genes from the microarray analysis and 56 genes selected from the literature, were associated with the presence of CAD and the severity of CAD in two independent cohorts. The overlap between our study and the Wingrove study at the individual gene level appears to be very limited. This might be in part due to the considerably different design of our study. Not only did we prefer a correlation-based approach, the Wingrove study also used a much smaller subset of patients for unbiased microarray-based gene discovery, and added 56 literature-based genes for the subsequent analysis in their two cohorts. As a result of our correlation analysis, we also did not exclude genes with differential expression below 1.3-fold; since atherosclerosis is a chronic disease, small changes in gene expression might accrue over time and result in a clinically relevant phenotype. Moreover, in contrast with our study, a substantial proportion of microarray samples in the Wingrove analysis were taken from patients presenting with an acute coronary syndrome, which might have significantly influenced expression levels. Another reason for the discrepancies between the two studies might be the different types of microarray used and different types of cells studied. In our study, we analyzed RNA from whole blood in all patients, in contrast with isolated mononuclear cells used in the discovery phase of the Wingrove study. 
An Ingenuity Pathway Analysis (IPA, Ingenuity Systems, Redwood City, CA, USA) comparing the 366 genes with p<0.05 (from the 526 probesets) and our 160 genes with rho >0.2 shows that similar biological functions were hit, despite the different microarrays and different matrices used (data not shown). In any case, the discrepancies between both studies suggest that these results need to be validated in larger and more diverse populations.
Of the 160 genes we found to be correlated with the extent of CAD, only 19 were significantly differentially expressed between all cases and controls, while gene expression was significantly different for 90 genes when comparing 20 patients with the least predicted CAD-index to 20 patients with the highest predicted CAD-index. Most of our cases only have mild to moderate disease, with only a minority having extensive disease. Thus, in part as a result of our proteomics-driven patient selection, there is likely to be a very gradual transition from controls to cases, with the distribution of cases being skewed towards the lower end of CAD-index. We therefore assumed that the difference between controls and cases was not likely to be very large, hence our preference for a correlation-based analysis. Furthermore, since the average age of the controls was 52 years, it is highly likely that some degree of coronary atherosclerosis is present in these subjects. Interestingly, patients with normal angiograms but with microvascular dysfunction may also demonstrate peripheral monocyte activation, although not to the extent seen in patients with angiographically documented coronary artery disease [53]. Our finding that the present model also accurately predicts the severity of coronary artery disease in female patients, in whom advanced coronary artery disease is less likely at the age of 50, is reassuring. It is notable that CRP and LDL did not predict disease in our population. However, while these are excellent markers for future cardiovascular events [54], their ability to predict the severity of angiographically documented CAD is known to be low [55][56][57][58]. We even observed an inverse correlation between LDL-cholesterol levels and CAD-index. This might be at least in part due to differences in treatments, especially in statin use. 
Statins might indeed blunt gene expression differences in vascular cells and circulating monocytes to a certain extent, which might have influenced our findings [59,60].
In conclusion, the combined predictive value of differentially expressed genes in peripheral blood correlates with the extent of coronary atherosclerosis. Importantly, the expression pattern of the same genes is also correlated with the extent of disease in atherosclerotic aortas. While these findings need prospective validation in further populations, our findings also suggest that gene expression profiles might represent a novel and promising non-invasive test to assess the presence and extent of coronary artery disease. Although the extent of angiographic disease is a strong predictor of clinical outcome, further studies in larger and unselected populations will also be needed to examine the potential role of gene expression patterns in predicting outcome and to address potential confounding factors.