Genome-Wide Identification of Expression Quantitative Trait Loci (eQTLs) in Human Heart

In recent years genome-wide association studies (GWAS) have uncovered numerous chromosomal loci associated with various electrocardiographic traits and cardiac arrhythmia predisposition. A considerable fraction of these loci lie within inter-genic regions. The underlying trait-associated variants likely reside in regulatory regions and exert their effect by modulating gene expression. Hence, the key to unraveling the molecular mechanisms underlying these cardiac traits is to interrogate variants for association with differential transcript abundance by expression quantitative trait locus (eQTL) analysis. In this study we conducted an eQTL analysis of human heart. For a total of 129 left ventricular samples that were collected from non-diseased human donor hearts, genome-wide transcript abundance and genotyping was determined using microarrays. Each of the 18,402 transcripts and 897,683 SNP genotypes that remained after pre-processing and stringent quality control were tested for eQTL effects. We identified 771 eQTLs, regulating 429 unique transcripts. Overlaying these eQTLs with cardiac GWAS loci identified novel candidates for studies aimed at elucidating the functional and transcriptional impact of these loci. Thus, this work provides for the first time a comprehensive eQTL map of human heart: a powerful and unique resource that enables systems genetics approaches for the study of cardiac traits.


Introduction
It is well established that many cardiac traits and susceptibility to heart disease are heritable [1,2,3,4,5,6,7]. Several genome-wide association studies (GWAS) have uncovered common genetic variation, in the form of single nucleotide polymorphisms (SNPs), impacting on cardiac traits such as susceptibility to atrial fibrillation [8], ventricular fibrillation [9], heart rate [10] and electrocardiographic (ECG) indices of cardiac conduction [11,12,13,14] and repolarization [15,16]. There is widespread consensus that functional studies of GWAS-defined loci will advance our understanding of the molecular underpinnings of the associated traits.
SNPs identified by GWAS are considered to impact the respective clinical phenotype, either directly or indirectly by virtue of linkage disequilibrium (LD) with the causal variant(s) in the context of a haplotype. Many trait-associated haplotypes occur in non-coding regions of the genome [17] and are hypothesized to modulate the respective trait through effects on gene expression [18]. Such SNPs are particularly challenging to understand because they may exert effects on the trait either by affecting the expression of a neighbouring gene (cis-effect) or the expression of a gene located elsewhere in the genome (trans-effects). One way of understanding GWAS signals thus entails interrogating traitassociated variants for association with differential transcript abundance by expression quantitative trait locus (eQTL) analysis. Studying gene expression level effects of disease-associated haplotypes has successfully uncovered the molecular mechanisms underlying loci associated with increased risk of myocardial infarction [19], coronary artery disease [20] and colorectal cancer [21]. In recent years, multiple genome-wide eQTL resources have become available for various tissues including brain, liver and adipose tissue [22,23,24,25,26,27,28,29]. Because eQTLs may be tissue-specific, a similar resource for human heart is anticipated to have great value [23,29,30,31].
To this end, we have generated a human heart eQTL resource by genome-wide genotyping and determination of transcript abundance in 129 human donor heart samples. We subsequently overlaid previously identified cardiac trait GWAS signals with the identified eQTLs to identify candidate causal genes for the effects at these GWAS loci. This work provides an eQTL map of human heart, a resource that is likely to play an important role in furthering our understanding of the mechanisms associated with loci identified in GWAS on cardiac traits.

General design of study
We collected left ventricular samples from 180 non-diseased human hearts of unrelated organ donors whose hearts were explanted to obtain pulmonary and aortic valves for transplant surgery or explanted for heart transplantation but not used due to logistical reasons (e.g. no tissue-matched recipient was available). The subjects were assumed to be mainly of Western European descent. mRNA and DNA were isolated according to standard procedures. Transcript abundance was measured using the HumanHT-12 v4.0 whole genome array (Illumina) and genotyping was carried out using the HumanOmniExpress genome-wide SNP arrays (Illumina).

Data preprocessing and normalization
Gene transcript abundance: Of the 47,231 transcripts whose expression levels were measured on the array, only those that were expressed above background level and for which the probe sequence mapped unambiguously to the genome and did not contain common SNPs, were used in further analyses. This procedure left 18,402 transcripts for eQTL analysis. Model-based background correction and normalization across arrays and transcripts was performed to correct for technical variance present in gene expression levels. A total of 162 arrays passed the standardized microarray gene expression quality control.
Genotyping: Manhattan distance clustering and principal component analysis of the genotype data of 154 samples that were successfully genotyped, revealed 13 genetic outliers ( Figure  S1). To ensure a genetically homogenous group for further analysis, samples pertaining to these clusters were removed. An additional 12 samples were removed due to low call rate (,95%), high proportion of alleles identical-by-state (.95%), or extreme heterozygosity (FDR 1%). Only SNPs with a minor allele frequency (MAF) higher than 0.15 were considered in eQTL analysis. This cutoff was chosen to ensure sufficient power to detect eQTLs within a broad range of effect sizes ( Figure S2)

Genome-wide eQTL mapping
Each of the measured transcripts was tested for association with all SNPs using linear modeling, taking age, sex and clinical/ university center as covariates. We thus identified 6402 significant eQTLs (FDR #0.05). To remove redundant signals and identify independent expression-controlling loci, we performed linkagedisequilibrium (LD)-pruning. For this we grouped SNPs exhibiting LD (r 2 .0.6) into clusters, revealing 771 independent loci regulating 429 unique transcripts. These results are comparable   to eQTL studies in other non-diseased tissues of similar sample size [22,23,24,28,29]. Of these 771 eQTLs, 770 were cis-eQTLs for 428 unique transcripts (p,2.82610 25 ; FDR #0.05), where the associated SNPs lie within 1 Mb of the transcriptional start site (TSS) of the cognate transcript. For the four most significant cis-eQTLs, boxand-whisker plots and mean-standard-error plots for the individual genotypes are given in Figure 1. An overview of the most significant cis-eQTLs is given in Table 1 and the complete results are given in supplemental Table S1.
Of the independent significant eQTLs, one was found to be in trans (p,2.12610 211 ; FDR #0.05), with the expression of LOC644936 located on chromosome 5 being seemingly modulated by an eQTL (rs852423) on chromosome 7. However, as LOC644936 is a known pseudogene of ACTB and rs852423 is located within ACTB, we cannot rule out the possibility that rs852423 is in fact a cis eQTL for ACTB rather than a trans eQTL for LOC644936. Using BLAST to align the microarray probe sequence of LOC644936 to the human transcriptome uncovered a partial match with ACTB in addition to a 100% match with LOC644936.

Integration of eQTL data with cardiac GWAS loci
In order to provide candidate genes for the reported heartrelated GWAS loci, we listed the 102 SNPs previously associated with a cardiac trait at genome-wide statistical significance (p gwas # 5610 28 ), representing 74 independent loci (LD-pruned with r 2 . 0.6, see Materials & Methods). These corresponded to loci associated with ventricular fibrillation/sudden cardiac death, atrial fibrillation, heart rate, PR interval, QRS duration and QTc interval. Of these, the 64 SNPs that displayed a MAF of 15% or higher in the eQTL sample were overlaid with the eQTL data to identify transcripts under genetic regulation by these loci. All GWAS SNPs were tested for association with transcript levels of all 18,402 transcripts in this study. We identified a cis association between rs9912468, a modulator of QRS duration [12] with the level of expression of the PRKCA transcript at genome-wide statistical significance (p = 2.90610 29 , see Figure 2A). Besides PRKCA, no other GWAS SNP displayed an eQTL association pvalue that passed the stringent Bonferroni-corrected p-value threshold (p,0.05/64 SNPs 618,402 transcripts , 4610 28 ). A total of 34 SNPs were associated with the transcript level of a gene at a p#0.05 ( Table 2). Among these, rs8049607, a modulator of QTc-interval [16] was found to be associated in cis with the transcript level of LITAF (p,5610 24 , Figure 2C), and rs7612445 and rs6882776, both associated with heart rate [10] were associated in cis with the transcript levels of GNB4 (p,2610 24 , Figure 2B) and NKX2-5 (p,6610 23 , Figure 2D), respectively. The number of nominal associations for the 64 cardiac traitassociated SNPs tested represents a more than 7-fold enrichment (p,0.05, see Materials & Methods) compared to a random selection of 64 variants from the entire set of SNPs used in eQTL analysis.

Discussion
We conducted a genome-wide eQTL analysis in 129 samples of normal human myocardium, identifying genetic variation regulating gene expression in human heart and uncovering 771 genome-wide significant independent eQTLs. This resource, heretofore unavailable in human heart will contribute to advancing our understanding of the genetic mechanisms underlying loci associated with cardiac traits. All but one of the eQTLs identified were cis eQTLs. Other eQTL studies have identified only few trans eQTLs [ difficulty of detecting trans-regulatory variants in eQTL studies [31,32]. Based on larger eQTL studies in other tissues [22,24,25,26,29] as many as 4000 independent cardiac cis eQTLs are expected to be present, hence the results presented here are a subset of this theoretical complete set of cardiac eQTLs. In recent years, many novel loci associated with a number of cardiac traits, including cardiac arrhythmia and ECG indices, have been discovered. However, the identification of (novel) genes at these loci has lagged behind. The availability of a cardiac eQTL resource is likely to aid in the dissection of these loci by providing a means of prioritizing candidate genes for follow-up functional studies. Indeed, our current findings already provide candidate genes for a number of these loci ( Table 2). One such example is the PRKCA gene for the effect observed on QRS duration for the rs9912468-tagged haplotype on chromosome 9. PRKCA encodes protein kinase C alpha, a fundamental regulator of cardiac contractility and Ca 2+ handling in cardiomyocytes [33]. The mechanism by which it regulates QRS duration is unknown. Other candidates include the LITAF gene (encoding lipopolysaccharide-induced TNF factor) for the rs8049607-tagged haplotype associated with QTc-interval and the GNB4 gene (encoding guanine nucleotide binding protein) for the rs7612445-tagged haplotype associated with heart rate. None of these eQTLs (for PRKCA, LITAF and GNB4) have been previously identified in noncardiac tissues.
The utility of this approach is further evidenced by the fact that the 64 GWAS SNPs were enriched in nominally significant eSNPs as compared to a random selection of 64 variants from the entire set of SNPs used in eQTL analysis. Such an enrichment was reported before for GWAS loci in general based on eQTLs identified in lymphoblastoid cell lines from HAPMAP samples [18].
The eQTLs we identified represent an enriched set of highly relevant candidates to test in future studies for association with cardiac traits and disease. Among the highly significant eQTLs listed in Table 1, at least two SNPs could also be interesting from a pharmacogenetic point of view. One is rs1222809 which was found to be strongly associated with the expression level of the DHFR gene encoding dihydrofolate reductase, a putative target of the drug methotrexate. Of note previous studies have provided evidence that rs1650697, which is in complete LD with rs1222809, may be associated with adverse events to methotrexate in patients with rheumatoid arthritis [34,35]. The other potentially interesting eQTL from a pharmacogenetic point of view is rs4822466 which was found to be highly associated with the expression of GSTT1, a gene encoding the liver detoxifying enzyme Glutathione Stransferase T1.
The eQTLs we identified are expected to be enriched in the regulatory regions of the genome such as promoter regions, enhancers and transcription factor binding sites [36]. Recent work has begun to uncover these relationships for adult human heart [37]. However, formal testing for enrichment of eQTLs in the known regulatory regions [37] did not provide statistically significant enrichment (data not shown). At least in part, this may be due to the limited number of eQTLs we have identified.
A limitation of the presented study concerns the fact that not all transcripts have been tested for eQTL effects. Transcripts that were expressed below the (array-based) detection level or for which probe design was not optimal could not be tested. Conversely, not all haplotypes in the genome were tested as for instance we only tested SNPs with a MAF higher than 0.15. Furthermore, our sample size and therefore statistical power was limited, preventing the identification of eQTLs of smaller effect and trans eQTLs. The interpretation of the data concerning SNPs from GWAS presented    in Table 2 must take these considerations into account. Additionally, the single trans eQTL we identified is likely a false discovery and will require further investigation.
Our study was conducted in left ventricular myocardium. However, it is well known that different cardiac compartments such as the atria or the specialized conduction system display different gene expression patterns [38,39,40,41] and eQTL effects might thus differ across cardiac compartments. Furthermore, we have no information relating to cardiac traits such as ECG indices in the 129 individuals from whom the left ventricular samples were obtained; we were therefore unable to correlate gene expression with cardiac traits in these individuals [23,42].
In summary, we here provide the first eQTL map of human left ventricular myocardium that will enable systems genetics approaches in the study of cardiac traits.

Sample collection
Left ventricular samples were obtained from 180 non-diseased human hearts of unrelated organ donors whose hearts were explanted to obtain pulmonary and aortic valves for transplant or valve replacement surgery or explanted for transplantation but not used due to logistical reasons. The tissues were ascertained at the University of Szeged (Hungary; n = 79), Vanderbilt University (Nashville, USA; n = 46), University of Miami (USA; n = 30), and the University of Sydney (Australia; n = 25) and assumed to consist mainly of subjects of Western European descent based on selfreported ethnicity. The Vanderbilt samples were procured with the assistance of the National Disease Research Interchange (Philadelphia, PA).

Generation and processing of gene expression data
Total RNA was extracted from the human left ventricular heart samples using the mirVana miRNA isolation kit (Ambion) at the AMC, Amsterdam, The Netherlands. Sample processing order was randomized. RNA quality was assessed by Agilent Bioanalyzer (minimum RIN = 7) and spectrophotometry (minimum 260 nm:280 nm = 1.8). The Illumina TotalPrep-96 RNA Amplification Kit was used to generate cRNA starting from 200 ng total RNA. Genome-wide gene expression data was generated using Illumina HumanHT-12 v4 BeadArrays, containing 47,231 probes representing 28,688 RefSeq annotated transcripts (ServiceXS, Leiden, The Netherlands), following the instructions of the manufacturer.
Raw expression data were imported into the Illumina Bead-Studio and summarized at probe-level for each sample without normalization or background correction. The summarized data were subsequently imported into R (version 2. 15 beadarray package [44]. Quality control was performed using the ArrayQualityMetrics package in R [45]. Samples displaying transcriptional stratification using hierarchical clustering were omitted from the analysis. The summarized data of the 162 remaining samples was background corrected and quantile normalized using the neqc algorithm [46] across all samples. The neqc algorithm is the current standard data-preprocessing method for Illumina gene expression BeadArrays [47], and has been applied in eQTL studies with comparable sample size [29,30]. Probes containing common SNPs (HAPMAP Phase III release 2) [27,29] and probes whose sequence did not align or aligned ambiguously to the human reference genome (HG19), according to up-to-date Illumina HumanHT-12 v4.0 BeadArray annotation available from the Bioconductor project, were left out of the analysis. Additionally, probes with median expression levels below a study specific threshold (the median expression levels of Y chromosome transcripts in the female subjects of the sample population) were not considered for subsequent analyses.
Genotyping and genotype imputation DNA was extracted for genotyping from 162 heart samples that passed the gene expression analysis quality control criteria (see above) at the AMC, Amsterdam, The Netherlands. Genome-wide SNP genotyping was carried out using Illumina HumanOmniExpress Beadchips interrogating 733,202 genetic markers (Genome Analysis Center, Helmholtz Zentrum München, Germany). A total of 8 samples had sample quality issues (and were not hybridized) or failed hybridization, leaving genotype data for 154 samples. Quality control was performed in the GenABEL [48] package in R using default settings. Samples with low call rate (, 95%), extreme heterozygosity (FDR 1%) or high proportion of alleles identical-by-state (.95%) were removed. Additionally, any remaining samples showing genetic stratification through Manhattan distance hierarchical clustering (using the popgen [49] package in R), and confirmed with principal component analysis [48], were not considered ( Figure S1).
Power calculations were performed (with a fixed FDR of 0.05) to assess the influence of MAF on power in relation to observed gene expression fold changes. Based on these results, a MAF threshold of 0.15 was chosen to ensure sufficient power to detect cis eQTLs within a broad range of effect sizes ( Figure S2). Additionally, assuming Hardy-Weinberg equilibrium, a MAF of 0.15 or higher yields an expected number of three individuals homozygous for the minor allele, which we considered the minimum for fitting a meaningful additive genetic model. Imputation was performed using the MACH software [50] and the HAPMAP Phase III data. Only SNPs imputed with sufficient confidence were considered, using the estimate of the squared correlation between imputed and true genotypes. By setting the cut-off at 0.30, most of the poorly imputed SNPs are filtered out, compared to only a small number (,1%) of well imputed SNPs [51].

eQTL statistical analysis
After pre-processing and stringent quality control of gene expression and genotypic data as described above, a total of 129 heart samples were used in eQTL analysis. Each transcript was tested for association with SNP genotypes genome-wide using linear modeling (assuming an additive genetic model), taking age, gender and tissue collection center as covariates, using the GenABEL package [48] in R. Correction for multiple testing was performed on the complete set of cis eQTL p-values in the qvalue package in R [52]. A q-value (FDR) #0.05 was considered significant for cis eQTLs, corresponding to a p-value of 2.82610 25 . Cis relations were defined as those within 1 Mb of a transcription start site (TSS), in accordance with previous reports demonstrating that over 90% of cis SNPs are situated within 100 Kb of a TSS [26,27,29,47,53]. SNPs with an LD R 2 of larger than 0.6 were considered dependent and LD-pruned into clusters (LD clusters), in accordance with previous studies [23,29,30]. For trans eQTLs, only results with a p-value ,5610 28 were considered (corresponding to a target a (or p value) of 0.05 with a Bonferroni correction for 1 million independent tests [54,55]). Correction for multiple testing was done by using a step-up Benjamini & Hochberg procedure on all p-values ,5610 28 , and a q-value (FDR) #0.05 was considered genome-wide significant for trans eQTLs, corresponding to a p-value of 2.12610 211 .

eQTL biological interpretation and candidate gene prioritization
To prioritize candidate genes for further studies, additional data sources were integrated. Additional trait and disease associated SNPs were extracted from PubMed (www.ncbi.nlm.nih.gov/ pubmed; search terms: 'GWAS' AND 'cardiac', 'atrial fibrillation', 'sudden cardiac death', 'ECG [electrocardiographic]', 'PR interval', 'QRS', 'QT', 'repolarization'), the NHGRI catalog of published GWAS (http://www.genome.gov/gwastudies/), and GWAS central (https://www.gwascentral.org) on January 8, 2013. Analyses were restricted to samples of European ancestry. Results were classified into six categories: sudden cardiac death, atrial fibrillation, heart rate, PR duration, QRS duration and QTc duration. Next, each GWAS SNP passing genome-wide significance in the respective study (5610 28 , a target a of 0.05 with a Bonferroni correction for 1 million independent tests) was tested for association with expression of all 18,402 measured transcripts.
To determine the number of independent loci, LD-pruning was performed by merging all GWAS SNPs with LD r 2 .0.6 (HAPMAP R22 and HAPMAP Phase III). The p-value threshold for significant eQTL effects was set at 4610 28 , a target a of 0.05 with a Bonferroni correction for 1,177,728 tests (64 independent loci 618,402 transcripts).
To quantify the enrichment of eQTLs among the cardiac trait GWAS SNPs, we generated 100,000 randomized independent SNP sets of the same size as the number of independent GWAS loci, and with corresponding MAF distribution and proximity to genes. The number of nominally significant eQTL associations for the original independent GWAS loci is referred to as Q. Next, for each random set S i , we determined the number of eQTLs at nominal significance (p#0.05), referred to as Q i . The simulations yielded a fold-enrichment score, calculated as the average over all random sets of the ratio between Q and Q i , and an empirical pvalue, calculated as the proportion of simulations in which the number of eQTLs exceeds the number of nominally significant eQTL associations in the original independent GWAS loci.

Public access to microarray data
The microarray genotyping and gene expression data of the study have been deposited online at the Gene Expression Omnibus (GEO), with accession number GSE55232. Figure S1 Manhattan distance hierarchical clustering dendogram of 154 genotyped subjects. Manhattan distance hierarchical clustering revealed several genotypic outliers. The clustering was repeated using principal component analysis, identifying the same groups of outliers. (TIF) Figure S2 Results of eQTL power analyses in relation to MAF and gene expression fold change. eQTL power analyses were performed for different minimum minor allele frequencies (0.05, 0.10, 0.15, 0.20, 0.30 and 0.40). The gene expression fold change is defined as log 2 difference in gene expression observed per copy of the minor allele. In each analysis, for each log 2 fold change X, all eQTLs with an absolute log 2 fold change larger than X were considered, and the power was calculated as the percentage of those eQTLs for which the null hypothesis is rejected at FDR #0.05.

(TIF)
Table S1 Table of all significant eQTLs. This table contains the complete results for all significant non-diseased human heart eQTLs (FDR #0.05). It contains for each SNPtranscript pair the SNP ID, gene or transcript IDs (HGNC, Entrez Gene, RefSeq), genomic locations, minor and major allele, minor allele frequency, beta (effect size per copy of the minor allele), pvalue and distance between SNP and gene. The table is sorted on HGNC official gene symbol. (XLS)