Genetic Association Study Identifies HSPB7 as a Risk Gene for Idiopathic Dilated Cardiomyopathy

Dilated cardiomyopathy (DCM) is a structural heart disease with a strong genetic background. Monogenic forms of DCM are observed in families with mutations located mostly in genes encoding structural and sarcomeric proteins. However, strong evidence suggests that genetic factors also affect the susceptibility to idiopathic DCM. To identify risk alleles for non-familial forms of DCM, we carried out a case-control association study, genotyping 664 DCM cases and 1,874 population-based healthy controls from Germany using a 50K human cardiovascular disease bead chip covering more than 2,000 genes pre-selected for cardiovascular relevance. After quality control, 30,920 single nucleotide polymorphisms (SNPs) were tested for association with the disease by logistic regression adjusted for gender, and results were genomic-control corrected. The analysis revealed a significant association between a SNP in the HSPB7 gene (rs1739843, minor allele frequency 39%) and idiopathic DCM (p = 1.06×10−6, OR = 0.67 [95% CI 0.57–0.79] for the minor allele T). Three more SNPs showed p < 2.21×10−5. De novo genotyping of these four SNPs was done in three independent case-control studies of idiopathic DCM. Association between SNP rs1739843 and DCM was significant in all replication samples: Germany (n = 564 cases, n = 981 controls, p = 2.07×10−3, OR = 0.79 [95% CI 0.67–0.92]), France 1 (n = 433 cases, n = 395 controls, p = 3.73×10−3, OR = 0.74 [95% CI 0.60–0.91]), and France 2 (n = 249 cases, n = 380 controls, p = 2.26×10−4, OR = 0.63 [95% CI 0.50–0.81]). The combined analysis of all four studies, including a total of n = 1,910 cases and n = 3,630 controls, showed highly significant evidence for association between rs1739843 and idiopathic DCM (p = 5.28×10−13, OR = 0.72 [95% CI 0.65–0.78]). None of the other three SNPs showed significant results in the replication stage.
This finding of the HSPB7 gene from a genetic search for idiopathic DCM using a large SNP panel underscores the influence of common polymorphisms on DCM susceptibility.


Introduction
Dilated cardiomyopathy (DCM) is a common form of heart muscle disease with a prevalence of 1:2,500 in the general population. It represents a major cause of cardiovascular morbidity and mortality and is characterized by systolic dysfunction as well as dilation and impaired contraction of the ventricles, often leading to chronic heart failure and eventually requiring cardiac transplantation [1]. In about 35% of cases DCM is a familial disease [2]. However, in the sporadic form of DCM, i.e. after exclusion of affected family members and all detectable causes (also called idiopathic DCM), a genetic component is discussed but cannot thus far be assigned to single gene defects. Knowledge of genetic risk factors for both familial and nonfamilial forms of DCM is important to initiate treatment prior to symptomatic onset of the disease, to delay its occurrence or possibly halt its progression. To date, only a few common susceptibility alleles for sporadic DCM have been identified from candidate-gene approaches, but these could not be confirmed in replication samples [2,3], a common problem of single-gene-based analyses [4]. In contrast, unbiased genome-wide association studies (GWAS) allow the identification of genetic risk factors even outside of known genes, but higher power is needed to compensate for multiple testing [5]. No comprehensive GWAS has been performed to date on the sporadic form of DCM.
The cardiovascular gene-centric 50K single nucleotide polymorphism (SNP) ITMAT-Broad-CARe (IBC) array represents an established compromise between GWAS and hypothesis-driven candidate gene approach by analyzing polymorphisms in more than 2,000 genes known or predicted to be involved in cardiovascular phenotypes [6].
In this study, we conducted a screening based on the cardiovascular 50K SNP array, followed by three independent replication studies, to gain insight into the genetic contribution to idiopathic DCM. The four samples from Germany and France included 1,910 sporadic DCM cases and 3,630 healthy control individuals. We identified a common intronic variant in HSPB7, encoding a cardiovascular small heat shock protein, to be associated with the sporadic form of DCM.

Screening stage
In our screening case-control sample, DCM cases were more likely men, were slightly younger and less frequently smokers, had a lower BMI and a higher prevalence of hypertension, hypercholesterolemia as well as type 2 diabetes (Table 1).

Author Summary
Dilated cardiomyopathy is a severe disease of the heart muscle and often leads to chronic heart failure, eventually with the consequence of cardiac transplantation. Identification of genetic disease markers in at-risk persons could play an important role in preventive health care. Several mutations in familial forms of the disease are described. Here, we examine the role of common genetic variants in the sporadic form of dilated cardiomyopathy. By screening about 2,000 candidate genes previously related to cardiovascular disease in more than 1,900 cases and 3,600 controls, we show that a polymorphism in the HSPB7 gene (rs1739843) is strongly associated with susceptibility to dilated cardiomyopathy. We also show that the effect on disease risk is present in both German and French cohorts. Therefore, this study is an important step towards gaining insight into the genetic background of the sporadic form of dilated cardiomyopathy.
[...] presence of six genes and 27 polymorphisms in LD with the lead SNP (r² > 0.5) (Figure 1A). Nine of these SNPs were present on the cardiovascular 50K array after quality control and were located in the HSPB7 gene as well as in two genes downstream, CLCNKA and CLCNKB (Figure 1B; Table S1). In this sample, the genomic inflation factor λ was 1.285 for the highest 90% of the 30,920 observed p-values. When correcting rs1739843 for this λ factor, the p-value was 1.06×10−6 and OR = 0.67 [95% CI 0.57–0.79] (Table 2).

Replication
The four SNPs with uncorrected p < 2.15×10−6 in the initial scan (rs1739843, rs11701453, rs7597774 and rs2229714) were analyzed using logistic regression adjusted for gender in three independent replication samples (Table 2). None of the four polymorphisms showed deviation from Hardy-Weinberg equilibrium in any of the replication samples.
[Figure caption, partly recovered: SNPs (Table S1; *, two SNPs) in the HSPB7 gene region on the 50K gene-centric human CVD bead chip after quality control and λ-corrected association results in 664 DCM cases and 1,874 controls. Plots were generated using the SNAP tool [33].]
In a combined analysis of the screening step, corrected for the λ factor of 1.285, and the three follow-up studies (n = 5,540), the SNP rs1739843 reached a p-value of 5.28×10−13 (OR = 0.72 [95% CI 0.65–0.78]) for association with idiopathic DCM (Table 2, Figure 2). There was no between-study heterogeneity for this effect (I² = 6.9%, p = 0.36).
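The combined estimate can be approximated from the published per-study odds ratios with a fixed-effect (inverse-variance) model. The sketch below is illustrative only: it assumes standard errors can be recovered from the rounded 95% confidence intervals and that the p-value is a two-sided Wald test, neither of which is stated in this form in the text. The function and variable names are hypothetical. Because the inputs are rounded, the output only approximates the reported p-value and I².

```python
import math

# (label, OR, 95% CI lower, 95% CI upper) for rs1739843 as reported in the text
STUDIES = [
    ("Screening, Germany (lambda-corrected)", 0.67, 0.57, 0.79),
    ("Replication, Germany", 0.79, 0.67, 0.92),
    ("Replication, France 1", 0.74, 0.60, 0.91),
    ("Replication, France 2", 0.63, 0.50, 0.81),
]


def log_or_and_se(odds_ratio, ci_low, ci_high):
    """Recover log(OR) and its standard error from a 95% CI (assumes a Wald interval)."""
    beta = math.log(odds_ratio)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * 1.96)
    return beta, se


def fixed_effect(studies):
    """Inverse-variance fixed-effect pooling with Cochran's Q and I^2."""
    betas, weights = [], []
    for _label, odds_ratio, lo, hi in studies:
        beta, se = log_or_and_se(odds_ratio, lo, hi)
        betas.append(beta)
        weights.append(1.0 / se ** 2)
    total_w = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, betas)) / total_w
    pooled_se = math.sqrt(1.0 / total_w)
    # Cochran's Q measures the spread of study effects around the pooled effect
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, betas))
    df = len(studies) - 1
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    z = pooled / pooled_se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return {
        "or": math.exp(pooled),
        "ci_low": math.exp(pooled - 1.96 * pooled_se),
        "ci_high": math.exp(pooled + 1.96 * pooled_se),
        "p": p,
        "q": q,
        "i2": i2,
    }


result = fixed_effect(STUDIES)
print(f"pooled OR = {result['or']:.2f} "
      f"[{result['ci_low']:.2f}, {result['ci_high']:.2f}], "
      f"p = {result['p']:.2e}, I2 = {100 * result['i2']:.1f}%")
```

With these rounded inputs, the pooled odds ratio comes out close to the reported 0.72, and the heterogeneity estimate is small, in line with the reported result.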

Resequencing
To reveal potential causal variants, the coding region of HSPB7 was resequenced in a total of 48 DCM patients. We detected three known synonymous variants (rs945416, rs732286 and rs1739840). The synonymous variants rs945416 (position 19, serine) and rs732286 (position 33, alanine) are in high LD with rs1739843 (r² = 0.96, HapMap CEU data release #24). SNP rs1739840 (position 117, threonine) is not available in HapMap. In the initial sample of 664 DCM patients, all three synonymous polymorphisms are in perfect LD with each other and with rs1739843, as shown by genotyping. Neither missense nor splice site de novo mutations were identified by sequencing. The synonymous SNP rs11807575 and the non-synonymous variants rs77021870 and rs74626772 were listed in databases but were not found to be polymorphic in our sample.

Analysis of DCM candidate genes
Since the design of the 50K human gene-centric bead chip (IBC array) aims at a large-scale gene-based approach, we screened candidate genes which are known for or potentially involved in [...]
Table 2. Association of SNPs showing p-values < 2×10−6 in the initial screening sample and follow-up in three independent replication samples, analyzed by logistic regression adjusted for gender.

Discussion
In the present case-control study, we evaluated the relationship of common SNPs with sporadic DCM using a large-scale screening approach. Our comprehensive strategy set out to analyze the human gene-centric 50K bead chip (IBC array), which focuses on loci with a potential functional link to cardiovascular disease (CVD) and covers more than 45,000 SNPs from about 2,000 genes [6].
Our study identified a polymorphism (rs1739843) in intron 2 of the HSPB7 gene that is associated with susceptibility to DCM in a German case-control sample and in three replication samples. Recently, Cappola et al. reported an association between rs1739843 and both ischemic and non-ischemic heart failure, applying the same gene-centric 50K bead chip [7]. They found a protective effect of the minor allele, which is consistent with our results on DCM. As DCM is a potential preliminary stage of non-ischemic heart failure, these independent findings point to a possible common pathophysiologic cascade. However, a second association signal for heart failure located in the FRMD4B region (rs6787362, minor allele frequency (MAF) 10.4%) identified by Cappola et al. [7] could not be detected in our DCM case-control sample (p = 0.64). Our study had a power of 99% to find a nominal association between DCM and rs6787362 with p < 0.05 and an OR = 0.67.
The finding on HSPB7 is also in line with a previously reported large-scale re-sequencing approach in four biologically relevant cardiac signaling genes, which detected HSPB7 sequence diversity in sporadic cardiomyopathy [8]. Our data, together with the results from Cappola et al. [7] and Matkovich et al. [8], substantiate the importance of rs1739843 or related polymorphisms in the HSPB7 locus for DCM and heart failure and possibly underscore a common genetic basis for these related phenotypes.
Matkovich et al. further report that none of the detected HSPB7 gene variants altered the amino acid sequence [8], which is consistent with the fact that we found neither missense nor splice site mutations in the HSPB7 sequence. Therefore, the biological mechanism explaining the association between the polymorphism rs1739843 and DCM risk remains unclear. The three detected synonymous variants (rs945416, rs732286 and rs1739840) are in high LD with each other as well as with our lead SNP rs1739843 and lie on one LD block. Therefore, it could be hypothesized that these SNPs represent causal risk factors for DCM, as described for the P-glycoprotein encoding gene MDR1, where a synonymous variant affected drug and inhibitor interactions [9]. Synonymous SNPs change codon usage and may have functional consequences if altered translation efficiency leads to conformational changes in protein structure. Alternatively, a de novo splice site could be created by a SNP, or other (unmapped) polymorphisms outside the HSPB7 coding region may alter its gene expression. Clearly, functional studies would be required to test these hypotheses.
Besides the HSPB7 gene, where the lead SNP is located, five other genes (CLCNKA, CLCNKB, C1orf64, ZBTB17 and SPEN) lying on the same LD block may potentially be responsible for the association with DCM. CLCNKA and CLCNKB encode two members of the family of voltage-gated chloride channels. These proteins are predominantly expressed in the kidney and participate in renal salt reabsorption [10]. The function of C1orf64 is currently unknown. ZBTB17, also known as MIZ-1, encodes a zinc finger protein involved in the regulation of c-myc [11]. SPEN (RBM15C or MINT) encodes a conserved transcriptional repressor that controls the expression of regulators in diverse signaling pathways [12,13].
HSPB7, encoding the small heat shock protein cvHsp (also known as HspB7), is the functionally most plausible candidate gene in this genomic region. It is known to be expressed in cardiovascular and insulin-sensitive tissues [14]. In general, the expression and activation of heat shock proteins is influenced by elevated temperatures as well as ischemia, hypoxia and acute cellular stress [15,16]. In aging skeletal muscle, an increase in cvHsp protein content was observed [17]. cvHsp was shown to be constitutively localized to nuclear splicing speckles under non-stressful conditions and may influence mRNA processing [18]. Recent data suggest co-localization of cvHsp and αB-crystallin in the Z-band of cardiac tissue and interaction with other small heat shock proteins [19]. However, further investigations such as genomic fine-mapping and subgroup analyses in the context of cardiomyopathies are needed.
Genetic analyses in familial forms of DCM led to the identification of risk loci showing X-linked, autosomal dominant or autosomal recessive patterns of inheritance [2,20,21]. Some of the DCM-causing genes or plausible candidate genes were also covered by the 50K bead chip, so we specifically tested those SNPs lying in risk gene regions (10 kb upstream and downstream, respectively). In these analyses, no significant association with any of the gene variants was found, indicating that sporadic cases of DCM probably involve other pathways than familial DCM. However, less frequent variants may have been missed due to insufficient power of our screening sample. Furthermore, the distinction between familial and sporadic forms of DCM is to a certain degree arbitrary. Screening of family members is rarely done in clinical routine, but when carried out on a systematic basis, up to 7% of previously healthy first-degree relatives have reduced left ventricular function or dilation without cardiac symptoms [22]. Therefore, it might be anticipated that genetic testing could help to identify individuals at risk in familial DCM but also in families of patients affected by so-called idiopathic forms of the disease.
Already known genetic factors account for only a fraction of DCM heritability [20]. Given a 1.5-fold increased risk of DCM among heterozygous subjects in our screening sample (48% in the general population-based KORA study) and a 2.25 times increased risk among homozygous subjects (34% in KORA), 49% of DCM cases would be attributable to the SNP rs1739843 (or correlated polymorphisms) with 19% attributable to heterozygous and 30% to homozygous carriers, respectively. Therefore, the genetic component seems to account for a large proportion of this disease. However, with the prevalence of the idiopathic form of the disease being about 1:2,700 [23], a genetic screening of the general population would include four cases out of 10,000 screened persons and two of these would have the disease due to this SNP. Therefore, the greater potential of this variant might lie in screening of high-risk populations, or in pointing to a pathway with potential drug targets. Further investigations should aim (1) to identify additional variants underlying DCM susceptibility with otherwise unknown etiology and (2) to analyze the potential influence of these common alleles as modifiers of familial forms of DCM. Taken together, for both modifiers of familial forms and susceptibility alleles in idiopathic DCM, knowledge of the genetic background will support preventive medical measures in the future.
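The attributable fractions quoted above can be reproduced numerically. The sketch below assumes that Levin's formula, p(RR − 1) / (1 + p(RR − 1)), was applied to each genotype group separately and the two results were summed. This is an inference from the reported numbers, not a method stated in the text, and the function name is hypothetical.

```python
def levin_paf(exposure_freq, relative_risk):
    """Levin's population attributable fraction for one exposure group."""
    excess = exposure_freq * (relative_risk - 1.0)
    return excess / (1.0 + excess)


# Genotype frequencies (KORA) and relative risks as quoted in the text
paf_het = levin_paf(0.48, 1.5)    # heterozygous carriers
paf_hom = levin_paf(0.34, 2.25)   # homozygous carriers
paf_total = paf_het + paf_hom     # assumption: per-genotype fractions are summed

# Expected cases when screening 10,000 people at a prevalence of 1:2,700
cases_per_10000 = 10000 / 2700
attributable_cases = cases_per_10000 * paf_total

print(f"heterozygous: {paf_het:.1%}, homozygous: {paf_hom:.1%}, total: {paf_total:.1%}")
print(f"cases per 10,000: {cases_per_10000:.1f}, attributable to SNP: {attributable_cases:.1f}")
```

Under these assumptions the fractions round to 19%, 30% and 49%, and the screening arithmetic gives roughly four cases per 10,000 persons, about two of them attributable to the variant.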
Some limitations of our study should be mentioned. First, we conducted a large-scale SNP analysis focused on genes potentially involved in cardiovascular traits. Therefore, on the one hand we were able to detect associations between DCM and polymorphisms only in these pre-selected genes. On the other hand, the 50K human CVD bead chip allows comprehensive gene-based analysis with more than 2,000 well covered loci. Second, our sample size only allowed detection of moderate to large effects (e.g. for OR > 1.3 and MAF = 30%, OR > 1.5 and MAF = 20%, or OR > 1.7 and MAF = 10%, the power was 19%, 75% and 80%, respectively, for p < 2.15×10−6). Therefore, we may have overlooked real association signals in our screening step. Third, there could be some population stratification in our initial screening sample. However, the observed λ could also be caused in part by underlying association due to the analysis of pre-selected loci known or suggested to be involved in cardiovascular phenotypes. The fact that the association between rs1739843 in HSPB7 and idiopathic DCM was replicated in three independent samples strongly enhances the confidence in our results.

Ethics statement
The ethics committees of the participating study centers approved the study protocol and all participants gave their written informed consent. The study was in accordance with the principles of the current version of the Declaration of Helsinki.

Case-control samples and phenotyping
Cases for the initial German screening study were recruited from the German Heart Institute (Berlin), and controls were from the population-based German KORA study (follow-up survey F3, Augsburg) [24]. Phenotypic details are summarized in Table 1. Controls (n = 1,874) had no medical history of coronary artery disease (CAD), myocardial infarction or DCM; mean age was 62±11 years and slightly more women (n = 986) than men (n = 888) were present in the control group. Inclusion criteria for DCM cases were the following: reduced systolic function (left ventricular ejection fraction (LVEF) < 45%) without angiographically assessed evidence of major CAD, significant valvular heart disease (> grade 2, such as mitral or aortic regurgitation), hypertensive heart disease, congenital heart disease, myocarditis (by endomyocardial biopsy, when available) or other secondary forms of heart failure. Patients with a positive family history were also excluded from this study. In DCM cases (n = 664), mean LVEF was 24±3% and mean age at disease diagnosis was 46±11 years.
A second replication study (France 1) was recruited in France (CARDIGENE) [25,26]. The French cases were of white European origin (all born in France, from parents born in France or neighboring countries) with a diagnosis of DCM, i.e. enlarged left ventricle end-diastolic volume/diameter (> 140 ml/m² on ventriculography or > 34 mm/m² on echocardiography) and LVEF ≤ 40% confirmed over a six-month period, in the absence of causal factors such as CAD or sustained hypertension, intrinsic valvular disease, documented myocarditis, congenital malformation or insulin-dependent diabetes. Only apparently sporadic DCM cases without an additional (first-degree) relative with DCM were included (but 8% were in fact found to have a familial form after careful cardiac examination of relatives). Recruitment was performed in ten hospitals in six regions in France (Lille, Lyon, Nancy, Nantes, Paris-Ile de France, Strasbourg) from September 1994 to February 1996. A total of 433 patients (229 had undergone a cardiac transplantation) were included (n = 345 men, n = 88 women). Mean age of patients was 45±11 years, mean LVEF was 23±7% and mean end-diastolic volume was 195±67 ml/m². Controls (n = 395) were age- and gender-matched (n = 310 men, n = 85 women).
The third replication sample was also of French origin (France 2). Inclusion criteria were identical to the France 1 sample. A total of 249 patients from EUROGENE and PHRC were included (n = 198 men, n = 51 women). Mean age of patients at diagnosis was 51±10 years. Controls (n = 380) were free of medical history of CAD, myocardial infarction or DCM and mean age was 46±11 years (n = 301 men, n = 79 women).

Genotyping
Initial genotyping was carried out using the 50K gene-centric human CVD bead chip version 1 (IBC v1 array) (Illumina, San Diego, CA, USA) [6] following the manufacturer's protocol. Data were analyzed (calling and sample clustering) and exported employing BeadStudio analysis software (Illumina). From the initial 45,707 SNPs, those markers with low call rates (< 95%) or low frequency (MAF < 1%) were excluded. Minimal call rate per individual was 90%. We used identity-by-descent methods to exclude unknown first-degree relation of participants.
SNP rs1739843 was re-genotyped using the by-design TaqMan assay in the initial case sample (n = 664) to check for discrepancies between human CVD bead chip and TaqMan genotypes. A > 99.8% concordance of genotypes was found. For all genotyped samples, a call rate > 97% was reached for each SNP assay.
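The SNP-level quality-control thresholds described above (call rate at least 95%, MAF at least 1%) can be expressed as a simple filter. The sketch below is illustrative and is not the BeadStudio or PLINK procedure used in the study. The function name, argument layout and the example genotype counts are hypothetical.

```python
def snp_passes_qc(n_aa, n_ab, n_bb, n_missing,
                  min_call_rate=0.95, min_maf=0.01):
    """Return True if a SNP meets the call-rate and MAF thresholds.

    n_aa, n_ab, n_bb: counts of the three called genotypes
    n_missing: number of samples without a genotype call
    """
    n_called = n_aa + n_ab + n_bb
    n_total = n_called + n_missing
    if n_called == 0:
        return False
    call_rate = n_called / n_total
    # Frequency of allele B among called genotypes; MAF is the rarer allele
    freq_b = (n_ab + 2 * n_bb) / (2 * n_called)
    maf = min(freq_b, 1.0 - freq_b)
    return call_rate >= min_call_rate and maf >= min_maf


# Hypothetical genotype counts, for illustration only
print(snp_passes_qc(900, 1200, 400, 38))    # common SNP, few missing calls
print(snp_passes_qc(2480, 20, 0, 38))       # rare SNP, MAF below 1%
print(snp_passes_qc(800, 1100, 400, 238))   # too many missing calls
```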

Resequencing
Polymerase chain reaction (PCR) primers were generated using Primer3Plus (http://www.bioinformatics.nl/cgi-bin/primer3plus/primer3plus.cgi) [28] to cover the coding parts of the three HSPB7 exons (GenBank accession No. NM_14424.4). The primer sequences and PCR amplification products are listed in Table S3. Included intronic regions were 267 bp for the 5′ end of intron 1, 156 bp for the 3′ end of intron 1, 136 bp for the 5′ end of intron 2, and 89 bp for the 3′ end of intron 2, respectively. PCR cycling conditions consisted of an initial denaturation at 95°C for 9 min, followed by 40 cycles with denaturation at 95°C for 30 s, annealing at 60°C for 30 s, and elongation at 72°C for 30 s, with a final elongation step at 72°C for 7 min.
After PCR amplification, primers and dNTPs were removed using ExoSAP-IT (USB Europe, Staufen, Germany) following the manufacturer's instructions. The purified PCR products were directly sequenced using the ABI PRISM BigDye Terminator Cycle Sequencing Ready Reaction Kit Version 3.1 on the ABI 3730 (Applied Biosystems, Foster City, CA, USA).

Statistical analyses
For initial screening and replication analyses, logistic regression adjusted for gender was used. P-values, odds ratios (OR) and their 95% confidence intervals (CI) were reported. The inflation factor λ was computed in the 50K initial screening analysis for logistic regression analysis assuming a χ² distribution with two degrees of freedom for the −2 logₑ(p) values (highest 90% of p-values). The p-values and CI from the initial screening analysis were genomic-control corrected using this λ factor via standard errors (standard error [corrected] = sqrt(λ) × standard error) and beta estimates (95% CI beta [corrected] = beta ± 1.96 × standard error [corrected]). Deviation from Hardy-Weinberg equilibrium was calculated with an exact test [29]. Statistical and association analyses were performed using JMP 7.0.2 (SAS Institute Inc, Cary, NC, USA) and PLINK v1.07 (http://pngu.mgh.harvard.edu/~purcell/plink/) [30], respectively. Power analysis was carried out using Quanto 1.2.4 (http://hydra.usc.edu/gxe/). We combined the initial scan results corrected for λ with the replication studies' results using a fixed effect model. Annotation of association results on a genome level was performed with WGAViewer software (http://people.genome.duke.edu/~dg48/WGAViewer/) [31]. LD patterns were calculated using HapMap releases #22 and #24 (http://www.hapmap.org/) [32].
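The genomic-control correction described above can be sketched as follows. The uncorrected standard error used here is a hypothetical value, back-derived from the reported corrected confidence interval purely for illustration. The two-sided Wald p-value is likewise an assumption about how the corrected p-value was obtained, so it only approximates the reported figure.

```python
import math


def genomic_control_correct(beta, se, lam):
    """Inflate the standard error by sqrt(lambda) and recompute CI and p-value."""
    se_corrected = math.sqrt(lam) * se
    ci_low = beta - 1.96 * se_corrected
    ci_high = beta + 1.96 * se_corrected
    z = beta / se_corrected
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value (assumed Wald test)
    return se_corrected, (ci_low, ci_high), p


beta = math.log(0.67)      # log odds ratio for rs1739843, from the text
se_uncorrected = 0.0735    # hypothetical uncorrected standard error
lam = 1.285                # genomic inflation factor, from the text

se_corr, (lo, hi), p_corr = genomic_control_correct(beta, se_uncorrected, lam)
print(f"corrected SE = {se_corr:.4f}")
print(f"OR = {math.exp(beta):.2f} [95% CI {math.exp(lo):.2f}, {math.exp(hi):.2f}]")
print(f"corrected p = {p_corr:.2e}")
```

The point estimate is unchanged by the correction. Only the standard error, and with it the interval width and the p-value, is scaled.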