Genome-Wide Meta-Analysis of Systolic Blood Pressure in Children with Sickle Cell Disease

In pediatric sickle cell disease (SCD) patients, higher systolic blood pressure (SBP) has been reported to be associated with increased risk of a silent cerebral infarction (SCI). SCI is a major cause of neurologic morbidity in children with SCD, and blood pressure is a potential modulator of clinical manifestations of SCD; however, the risk factors underlying these complications are not well characterized. The aim of this study was to identify genetic variants that influence SBP in an African American population in the setting of SCD, and to explore the use of SBP as an endo-phenotype for SCI. We conducted a genome-wide meta-analysis for SBP using two SCD cohorts, as well as a candidate screen based on published SBP loci. A total of 1,617 patients were analyzed, and while no SNP reached genome-wide significance (P-value <5.0×10⁻⁸), a number of suggestive candidate loci were identified. The most significant SNP, rs7952106 (P-value=8.57×10⁻⁷), was in the DRD2 locus on chromosome 11. In a gene-based association analysis, MIR4301 (micro-RNA4301), which resides in an intron of DRD2, was the most significant gene (P-value=5.2×10⁻⁵). Of 27 previously reported SBP-associated SNPs examined, 4 were nominally significant. A genetic risk score constructed to assess the aggregated genetic effect of the published SBP variants showed a nominally significant association with SBP (P=0.05). We also assessed whether these variants are associated with SCI, to explore the use of SBP as an endo-phenotype for SCI. Three SNPs were nominally associated, and only rs2357790 (5’ CACNB2) was significant for both SBP and SCI. None of these SNPs retained significance after Bonferroni correction. Taken together, our results suggest the importance of DRD2 genetic variation in the modulation of SBP, and extend the aggregated importance of previously reported SNPs in the modulation of SBP to an African American cohort, more specifically to children with SCD.


Introduction
Sickle cell disease (SCD) is an inherited hemoglobin disorder affecting approximately 1 in 600 individuals of African American ancestry in the United States [1]. The clinical manifestations of SCD begin early in life [2] and continue with an increasing incidence of adverse events, involving genetic as well as environmental factors [3]. Previous studies have demonstrated that the arterial blood pressure in steady state patients with SCD is significantly lower than that of age, sex and race matched controls [4][5][6][7][8][9]. These findings are counterintuitive in view of the well-known vascular and renal abnormalities associated with SCD and the high prevalence of hypertension in African American adults [10][11][12]. Recently, in SCD children with no history of overt stroke or seizures, we have reported that higher systolic blood pressure (SBP) is associated with increased risk of a silent cerebral infarction (SCI) [13], a common form of neurological injury among children with SCD, occurring in at least 27% prior to six years of life and 37% by 14 years of life [14,15]. In addition, previous clinical studies have shown that SCD may have a deleterious effect on the myocardium, which contributes to abnormal rates of change in left ventricular cavity size and systolic/diastolic function in SCD patients [16][17][18]. Indeed, nearly one third of adults with SCD also develop an elevated tricuspid regurgitant velocity (TRV), which is associated with a much higher death rate than in SCD patients without elevated TRV. About 5-10% of these patients have true pulmonary hypertension [19]. Identifying genetic factors associated with SBP may help both to define pathophysiological mechanisms and to identify patients at increased risk for SCI.
Studies of familial aggregation provide significant evidence that blood pressure is a highly heritable trait [20]; however, these estimates provide no information as to whether the same genetic variants influence blood pressure across human populations. In 2009, two genome-wide association studies (GWAS) and meta-analysis of inter-individual blood pressure variation in adults were conducted by the Cohorts for Heart and Aging Research in Genome Epidemiology (CHARGE) Consortium [21] and the Global Blood Pressure Genetics (Global BPgen) Consortium [22], leading to the identification of a number of genomic loci implicated with these traits, including 7 for SBP. Subsequently, these consortia were combined and expanded to form the International Consortium for Blood Pressure (ICBP), who reported many additional novel loci for these traits utilizing individuals of European (N=200,000), East Asian (N=30,000), South Asian (N=24,000) and African (N=20,000) descents [23]. In the present study, we sought to apply two complementary approaches for identifying SBP variants in individuals with SCD. First, we performed a genome-wide association study for SBP in SCD cohorts of African American ancestry. Second, we attempted to validate the ICBP identified SBP loci in these patients. Finally, we assessed whether these reported variants are also associated with SCI, exploring the use of SBP as an endo-phenotype for SCI.

Study and Population Samples
This study includes two unrelated admixed African American ancestry SCD cohorts. Study protocols of both cohorts were approved by the Institutional Review Board (IRB) of Johns Hopkins University and Boston Medical Center. Additionally, IRB approval was acquired from all of the participating sites for subject enrollment and conducted in accordance with institutional guidelines.

Silent Infarct Transfusion (SIT) Trial cohort
The Silent Infarct Transfusion (SIT) Trial is an international, multi-center clinical study funded by the National Institute of Neurological Disorders and Stroke (NINDS) (http://sitstudy.wustl.edu/) [24]. All the participants included in our study are of African American ancestry and written informed consent was obtained from parents of the SCD-affected individuals. For each patient, DNA was collected from Epstein-Barr virus (EBV) transformed lymphoblasts using Puregene Genomic DNA Purification kits (Gentra Systems, Inc). Demographic and phenotypic information were collected for each participant and the inclusion criteria for the recruitment were age (5-15 years) and hemoglobinopathy diagnosis (either Hb SS or Hb Sβ⁰-thalassemia). Details of the SIT Trial study design are given elsewhere [24].

Cooperative Study of Sickle Cell Disease
The Cooperative Study of Sickle Cell Disease (CSSCD) was a multi-institutional prospective longitudinal study of SCD funded by the National Heart, Lung, and Blood Institute (NHLBI) of the National Institutes of Health (NIH) [25]. In our study, we only included CSSCD participants who are of African American ancestry, and to match the samples with the SIT Trial cohort, age inclusion criteria of <15 years was used. Details of the CSSCD study design are given elsewhere [25].

Phenotype Assessment
Systolic Blood Pressure (SBP). In the SIT Trial, a single measurement of blood pressure was obtained at well-visit for children with SCD. No guidelines were formulated to include uniform assessment. In the CSSCD cohort, longitudinal blood pressure measurements were collected during routine visits but not during episodes of acute illness. Blood pressure measurements were made by study nurses or physicians following the procedure described in the study manual of operations [25]. Subjects were asked not to smoke for at least 30 minutes prior to the examination and were allowed to rest quietly for at least 5 minutes before the measurement was made. A single measurement of pressure was made with the patient in the sitting position. Mercury sphygmomanometers were used for the measurement with a cuff size sufficient to cover two thirds of the upper arm. Systolic and diastolic pressures were reported as the first and fifth Korotkoff sounds, respectively.
Silent Cerebral Infarction (SCI). A magnetic resonance imaging (MRI) scan was obtained from all participants and the presence or absence of SCI was adjudicated by a blinded panel of three expert neuro-radiologists.

Genotyping and Quality Control
Genotyping of the SIT Trial cohort was performed in two stages. For stage 1, a subset of 573 samples, along with 24 International HapMap Consortium [26] controls and 13 known duplicates, were genotyped at the Center for Inherited Disease Research (CIDR) at Johns Hopkins University using the Illumina HumanHap650Y SNP array (Illumina Inc., San Diego CA, USA). This array contains approximately 661,000 SNPs, of which ~100,000 were selected as tags for populations with African ancestry [27]. The Beadstudio software (Illumina Inc.) was used to cluster the data and samples with <96.5% call rate were re-genotyped. The reproducibility, calculated from duplicate pairs was 99.98% and genotype concordance with HapMap data was 99.76%. In stage 2, 509 samples were genotyped at the Center for Disease Control (CDC) at Washington University using the Illumina Infinium HumanOmni1-Quad SNP array (Illumina Inc.) and achieved the call rate of ≥96%.
For quality control (QC), we performed several rounds of data cleaning. A 96.5% cutoff was used for the sample call rate and SNP coverage in the combined SIT Trial data (n=1082), which resulted in the exclusion of 6 individuals and 1,260 SNPs from the study. Cryptic relatedness was determined by examining pair-wise identity-by-descent (IBD), and 77 samples were identified as first-degree relatives and dropped from the study. Given the admixed nature of the study participants, we used principal component analysis (PCA) as implemented in EIGENSTRAT [28] to both identify genetic outliers (>6 standard deviations on any of the top ten principal components) and correct for any potential residual population substructure. In the SIT Trial, twenty-six individuals were identified as genetic outliers and further excluded from the analysis. Additionally, 48 samples were dropped from the study due to incomplete phenotype data, leaving 925 samples (males: 51.9% & females: 48.1%) for the subsequent GWAS analysis. Among them, 89% of the samples (n=826/924) had a confirmed SCI status and 98 individuals (11%) were not classified. In total, 251 SCI positive and 575 SCI negative samples were assigned as cases and controls, respectively.
In the CSSCD cohort, DNA samples were genotyped at Boston University using Illumina Human610-Quad SNP arrays (Illumina, San Diego, CA, USA) with approximately 600,000 SNPs. All samples were processed according to the manufacturer's protocol and the BeadStudio software was used to make genotype calls utilizing the Illumina pre-defined clusters. Samples with <95% call rate were removed and SNPs with a call rate <97.5% were re-clustered. After re-clustering, SNPs with call rates >97.5%, cluster separation score >0.25, and excess heterozygosity between -0.10 and 0.10 were retained in the analysis. The pair-wise IBD was used to identify cryptic relatedness and PCA was applied to detect genetic outliers. After excluding these samples and following the SIT Trial age inclusion criteria (<15 years), analysis was restricted to a dataset of 692 samples (males: 52% & females: 48%).
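The outlier rule described above (more than 6 standard deviations on any of the top principal components) can be illustrated with a minimal Python sketch. It is not the EIGENSTRAT implementation: the function name `flag_pca_outliers` and the input layout are hypothetical, and the sketch makes a single pass, whereas the actual software may repeat the procedure after removing outliers.

```python
from statistics import mean, pstdev

def flag_pca_outliers(pcs, n_sd=6.0):
    """Return sorted indices of samples lying more than n_sd standard
    deviations from the mean on any principal component.

    pcs: list of per-sample lists of PC scores (hypothetical layout,
         e.g. the top ten components for each sample).
    """
    n_components = len(pcs[0])
    outliers = set()
    for k in range(n_components):
        column = [row[k] for row in pcs]
        mu, sd = mean(column), pstdev(column)
        if sd == 0:
            continue  # a constant component cannot define an outlier
        for i, value in enumerate(column):
            if abs(value - mu) > n_sd * sd:
                outliers.add(i)
    return sorted(outliers)
```

Samples flagged in this way would be dropped before association testing, and the leading components of the remaining samples would then be available as covariates.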

Merging GWAS Data and Imputation
To infer un-genotyped SNPs and fill in missing data in the SIT Trial and CSSCD cohorts, the HumanHap650Y, HumanOmni1-Quad and Human610-Quad SNP array datasets were merged, and imputation was then performed for autosomes using a hidden Markov model, as implemented in the MaCH software [29] (version 1.16) (http://www.sph.umich.edu/csg/abecasis/MaCH/), with 50 rounds and 200 states. QC was performed both before and after imputation; poorly imputed SNPs (RSq <0.5, where RSq is the squared correlation between imputed and true genotypes) were excluded, and a total of 1,019,297 SNPs were analyzed.

SNP Selection for Validation
Due to the lack of any published studies that report genetic determinants for SBP in children, and/or more specifically in African American children, at genome-wide significance, we used the ICBP identified SBP SNPs for validation. To validate results from the ICBP study in SCD patients, we examined the 28 SNPs that were reported to be associated with SBP [23]. For SNPs that were not available on our genotyping array, a close proxy for the index SNP, with a criterion of r² ≥ 0.6 from HapMap Phase III (release 2, ASW panel) [30] or the 1000 Genomes project (Pilot 1, YRI panel) [31], was used.

GWAS Meta-analysis and Statistics
All quality control measures in both cohorts were performed using the PLINK software package [32], version 1.06 (http://pngu.mgh.harvard.edu/purcell/plink/). In the SIT Trial cohort, to account for the uncertainty of the imputed data, the estimated allele dosage was analyzed using ProbABEL [33] under a multivariate linear regression framework. Association for each SNP was assessed after adjusting for age, sex, height and the 1st principal component, and assuming an additive effect of allele dosage on SBP. In the CSSCD cohort, SBP measurements were available for longitudinal time points and data were analyzed with a linear mixed effect model using the lme4 package (http://lme4.r-forge.r-project.org/) in R (http://www.r-project.org/) (version 2.14.1). In the mixed effect model, age, sex, height and the 1st principal component were used as fixed effect covariates, while multiple SBP measurements within each individual were treated as a random effect. GWAS results from both cohorts were meta-analyzed using inverse-variance weighted fixed-effect models as implemented in METAL (http://www.sph.umich.edu/csg/abecasis/metal) [34]. The variance inflation factor for genomic control (λGC), as described by Devlin and Roeder [35], was evaluated in each cohort prior to meta-analysis and a total of 1,019,297 SNPs were meta-analyzed. To explore the previously published SBP associated SNPs [23], a one-sided test of significance was used. To estimate the effect of these SNPs on SCI (data available only from the SIT Trial cohort), multivariate logistic regression was used after adjustment for age, sex and the 1st principal component. The genetic risk scores for SBP and SCI were constructed (based on previously published SBP variants and weighted according to their effect sizes) using the R package Genetics ToolboX (http://cran.r-project.org/web/packages/gtx/index.html).
To construct the genetic risk scores, this R package uses the same underlying statistics as were used by the ICBP [23], which can be defined as follows. Assume a set of $m$ SNPs from a discovery panel, and let $x_{ij}$ denote the coded genotype (for directly genotyped SNPs) or the estimated allele dosage (for imputed SNPs) of the $i$-th SNP in the $j$-th individual. If the regression coefficients of the reported SNPs are $w_1, w_2, \ldots, w_m$, then the risk score for individual $j$ is defined as $s_j = s_0 + w_1 x_{1j} + w_2 x_{2j} + \cdots + w_m x_{mj}$, where $s_0$ is the intercept. In our analyses, we specify the coefficients $w_1, w_2, \ldots, w_m$ to be the effect sizes (in mmHg per coded allele).
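As an illustration of the weighted sum defined above, the following minimal Python sketch computes a risk score for one individual. It does not use or reproduce the gtx package; the function name `genetic_risk_score` and the example weights and dosages are hypothetical values chosen only to show the arithmetic.

```python
def genetic_risk_score(dosages, weights, intercept=0.0):
    """Weighted genetic risk score for one individual.

    dosages:   coded genotypes or imputed allele dosages x_1j ... x_mj
    weights:   published per-allele effect sizes w_1 ... w_m (mmHg per allele)
    intercept: the constant term s_0
    """
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return intercept + sum(w * x for w, x in zip(weights, dosages))


# Hypothetical example: three SNPs with made-up effect sizes and dosages.
example_weights = [0.5, -0.3, 1.0]   # not taken from the ICBP report
example_dosages = [2.0, 1.0, 0.0]    # coded-allele counts for one person
example_score = genetic_risk_score(example_dosages, example_weights)
```

In an analysis such as the one described here, the score computed for each individual would then be used as a single predictor of SBP (or SCI) in a regression model.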

Gene-based Association Testing
To increase power by combining independent associations within a gene into a single, stronger aggregated signal, gene-based association tests were performed using GWiS [37]. GWiS uses greedy Bayesian model selection (selecting a minimal subset of associated SNPs within a gene) to identify independent effects and estimates overall significance through permutation. Using meta-analysis summary data, the gene P-values for each test statistic were computed using 1,000,000 permutations, with the 1000 Genomes Project ASW panel as the reference population to account for linkage disequilibrium (LD) between SNPs.
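The idea of estimating significance through permutation can be sketched generically as follows. This is a common empirical P-value estimator and is an assumption for illustration only; it is not taken from GWiS, whose exact estimator may differ, and the function name `empirical_p_value` is hypothetical.

```python
def empirical_p_value(observed, permuted, add_one=True):
    """Empirical P-value: the share of permuted test statistics that are
    at least as large as the observed statistic.

    add_one=True applies the usual (k + 1) / (n + 1) correction, which
    prevents a reported P-value of exactly zero.
    """
    exceed = sum(1 for stat in permuted if stat >= observed)
    if add_one:
        return (exceed + 1) / (len(permuted) + 1)
    return exceed / len(permuted)
```

With this estimator, the smallest P-value that n permutations can resolve is about 1/(n + 1), so 1,000,000 permutations resolve P-values down to roughly 10⁻⁶.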

Genome-wide single SNP association
Genome-wide association and meta-analysis were performed for SBP in 1,617 subjects (843 males, 774 females) from the SIT Trial and CSSCD cohorts. The average ages of studied samples from the SIT Trial and CSSCD cohorts were 8.96 and 9.57 years, respectively. Detailed demographic and clinical characteristics for the study subjects are described in Table 1. The observed P-values show no early departure from the null (Figure S1), indicating minimal inflation (λGC=0.998) in test statistics due to potential population stratification and/or cryptic relatedness. None of the SNPs reached genome-wide significance (P-value <5.0×10⁻⁸) (Figure 1). However, a number of suggestive candidate loci (P-value <5.0×10⁻⁵) were identified (Table 2). The most significant signal was observed for rs7952106 (P-value: SIT Trial=6.40×10⁻³; CSSCD=3.94×10⁻⁵; Meta-analysis=8.57×10⁻⁷). This SNP is located ~78 kb upstream (5') of the dopamine receptor D2 subtype (DRD2) gene on chromosome 11. rs7952106 is a common SNP with a minor allele frequency (MAF) of 23%, and was directly genotyped in both cohorts. The direction of effect of the minor allele (G) is consistent across both GWAS (SIT Trial: Effect size=1.65 mmHg/allele; CSSCD: Effect size=1.50 mmHg/allele) and is associated with an increase in SBP. A second, intronic DRD2 SNP (rs17529477; minor allele [A] frequency=12%) showed the same direction of effect (Effect size=1.76 mmHg/minor allele; P-value=1.93×10⁻⁵) and was in low LD with rs7952106 (r²: SIT Trial=0.25 and CSSCD=0.24).
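The inverse-variance weighted fixed-effect combination of per-cohort estimates can be sketched in a few lines of Python. This is a minimal illustration of the standard formula, not METAL itself. The function name `ivw_fixed_effect` is hypothetical, and the standard errors in the example are assumed values, since per-cohort standard errors are not given in the text.

```python
import math

def ivw_fixed_effect(betas, ses):
    """Inverse-variance weighted fixed-effect meta-analysis.

    betas: per-cohort effect estimates (same coded allele in every cohort)
    ses:   per-cohort standard errors
    Returns (pooled beta, pooled SE, z-score, two-sided normal P-value).
    """
    weights = [1.0 / se ** 2 for se in ses]
    total_weight = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    se = math.sqrt(1.0 / total_weight)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided P from the normal
    return beta, se, z, p


# Effect sizes for rs7952106 as reported above; the standard errors are
# ASSUMED for illustration only and do not come from the study.
pooled = ivw_fixed_effect(betas=[1.65, 1.50], ses=[0.60, 0.37])
```

Because each cohort is weighted by the inverse of its variance, the cohort with the more precise estimate contributes more to the pooled effect, and two concordant signals of moderate strength can combine into a stronger one.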

Gene-based association analysis
Given the suggestion of multiple independent signals in DRD2, we performed a gene-based test combining independent associations within a gene (including a 20 kb flanking region) and obtained a P-value for each gene using permutation. In total, 32,155 autosomal genes were tested, and results are shown in Table 3. Among all the tested genes, none met the genome-wide significance criterion (P-value <2.0×10⁻⁶).
The most significant gene was MIR4301 (micro-RNA4301, Gene ID: 100422855) with a P-value=5.2×10⁻⁵ (Table 3). MIR4301 is a 65 base-pair long non-coding RNA on chromosome 11 that is contained within the DRD2 intronic region and shares the same set of associated SNPs as observed for the DRD2 gene. To examine micro-RNA target binding predictions, we used the software RNA22 (version 1.0) (http://cbcsrv.watson.ibm.com/rna22.html) [38]. MIR4301 showed a predicted target site in the 3' UTR region of the DRD2 transcript (ENSEMBL: ENST00000355319). In our study, the observed suggestive genetic signals of DRD2 region SNPs, together with the prediction of DRD2 as a potential MIR4301 target, suggest the plausible involvement of the DRD2 region in the regulation of SBP in SCD cohorts. Examining the GTEx database, which queries lymphoblastoid, liver, brain cerebellum, frontal cortex, and temporal cortex tissue, we found no known eQTLs within or in close proximity to this locus.

Association of previously reported SBP loci
A total of 29 independent chromosomal loci associated with blood pressure have been reported from the ICBP meta-analysis, of which 28 show strong evidence for association with SBP. We attempted to validate these loci in the combined SCD cohorts. To ensure uniform comparison of the genetic effect and its direction, the SNPs were analyzed according to the reported coded alleles (under an additive genetic model). We were able to test 27 of these SNPs directly or with a proxy SNP (r² ≥0.6), and 15/27 SNPs showed the same direction of effect on SBP as reported in the ICBP study (Table 4, Figure S2). Of the directly genotyped/imputed or proxy SNPs, 4 were nominally significant in the combined SCD cohorts, and 6 were significant at P-value <0.10. None remained significant after Bonferroni correction (α=0.05/27=1.85×10⁻³). Given the limited power to detect significance for individual SNPs in the relatively small SCD samples (Figure S3), we constructed a genetic risk score for SBP incorporating the 27 previously reported SNPs, weighted according to effect sizes observed in the ICBP meta-analysis. The risk score derived from these 27 directly genotyped or proxy SNPs was nominally associated with SBP (P-value=0.05), supporting an aggregate role of these SNPs in the modulation of SBP in the SCD cohort.
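The two thresholds used in this candidate analysis, the Bonferroni-corrected α and the one-sided P-value in the previously reported direction, can be sketched as follows. The function names are hypothetical. Returning `None` for a discordant direction is an assumption that mirrors how such SNPs are marked NA in Table 4; the authors' actual computation is not described in the text.

```python
def bonferroni_alpha(alpha, n_tests):
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_tests

def one_sided_p(two_sided_p, same_direction):
    """One-sided P-value for a test in the previously reported direction.

    If the observed effect has the same direction as the published effect,
    the two-sided P-value is halved. Otherwise None is returned, which is
    an assumed convention matching the NA entries for discordant SNPs.
    """
    if not same_direction:
        return None
    return two_sided_p / 2.0


# Threshold for the 27 candidate SNPs tested here: 0.05 / 27.
alpha_27 = bonferroni_alpha(0.05, 27)
```

A candidate SNP would be declared significant after correction only if its one-sided P-value fell below `alpha_27`.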

Association of SBP reported loci with SCI
Given that higher SBP is associated with increased risk of an SCI in SCD patients, we determined whether any of the SNPs (or their proxies) associated with SBP in ICBP study was associated with SCI. Three SNPs were nominally associated with SCI (Table 4), with only the CACNB2 (5') locus (rs2357790) consistent between the SBP (P-value=0.05) and SCI (P-value=0.04) analyses. None of these SNPs was significant after Bonferroni correction. Further, to estimate the aggregated effect of these SNPs on increased risk of SCI, we constructed the genetic risk score for SCI (weighted according to effect sizes of published SBP SNPs) and no significant association was observed (P-value= 0.95).

Discussion
In recent years, genome-wide scans demonstrated a successful means of identifying novel common genetic variants that contribute to susceptibility to complex diseases, including blood pressure [21][22][23]39,40]. Here, we present the results from a meta-analysis of SBP from two SCD cohorts comprised of 1,617 SCD patients, all with African American ancestry. No associations were genome-wide significant; however we observed suggestive association at rs7952106, a 5' upstream SNP to DRD2 gene on chromosome 11, which showed consistent association evidence in both the studied cohorts. Further, in a gene-based test, a suggestive signal of noncoding RNA (MIR4301) and the prediction of MIR4301 binding to DRD2 as a potential target, suggests the plausible involvement of DRD2 region in the regulation of SBP in SCD cohorts.
Previously, several studies have shown that dopamine synthesis in the kidney has an important role in the regulation of fluid and electrolyte balance and systemic blood pressure [41][42][43]. Dopamine exerts its actions via 2 families of G-protein-coupled receptors: D1-like receptors (DRD1 and DRD5) and D2-like receptors (DRD2, DRD3, and DRD4). Later, several lines of evidence also showed that an intact dopaminergic system is necessary to maintain normal blood pressure and that genetic hypertension is associated with alterations in dopamine production and receptor function [41][42][43][44]. Deletion of any dopamine receptor in mice results in increased blood pressure by mechanisms that are receptor dependent. In particular, mice lacking the DRD2 gene (DRD2-/-) have reactive oxygen species (ROS)-dependent hypertension [44]. In addition, DRD2 polymorphisms have been reported to be associated with decreased DRD2 expression [45] and shown to affect DRD2 mRNA stability and synthesis of the receptor [46]. These studies suggest that the DRD2 locus is plausibly involved in the regulation of SBP. Our results support these findings, and the suggestive association from the DRD2 region SNPs may represent a true signal associated with SBP in SCD patients.
Recently, several large and well powered studies from European ancestry populations have identified 29 genomic loci associated with blood pressure [21][22][23]. We sought to validate these reported loci in the setting of SCD in populations of African American ancestry, and to further test whether any of these loci were involved in SCI. Our study shows that the derived genetic risk score is significantly associated with SBP in SCD children of African American ancestry. This association of the aggregated genetic risk score with SBP highlights the importance of these loci in the modulation of SBP in the studied SCD cohort.
In SCD patients, neurovascular complications are common and largely due to tissue ischemia and infarctions [13,47]. SCI is a major cause of neurologic morbidity in SCD children with unclear genetic susceptibility [47,48]. Given that hypertension is a known risk factor for stroke and more recently, SBP has been reported associated with risk for SCI [13,49], identifying genetic variants associated with SBP in SCD patients may also lead to the identification of genes associated with SCI. Although we confirm the association of 4 loci with SBP, only the CACNB2 (5') locus showed the consistent nominal significance for both SBP and SCI (Table 4).
Table 3. Gene-based association analysis of systolic blood pressure. Gene start and end positions includes ±20 Kb of 5' and 3'-untranslated regions of the genes. The threshold for genome-wide gene significance (2.0×10⁻⁶) was established using …
A few limitations to the current study need to be acknowledged. First, although we observed a significant association of the aggregated genetic risk score with SBP, we failed to reproduce the significance of any individual SBP associated SNP after multiple-testing correction, and this may be due to the limited sample size of our study. As shown in Figure S3, our study (n=1,617) was also under-powered to identify novel genome-wide significant variants. Secondly, biological differences that exist between ethnic groups and complex interactions of age with different genetic alterations may also have negatively impacted our ability to identify significant loci. Also, the admixed nature of the African ancestry population may lead to differences in local LD patterns. Since we are not likely to be genotyping the functional variant, changes in LD patterns between Europeans and the studied African American populations can change the nature of the observed associations. Thirdly, the ability to detect genetic determinants associated with any trait of interest largely depends on the quality and reliability of the data. In our study, it is noteworthy that we used the longitudinal SBP measurements from the CSSCD cohort, as opposed to the single time point data available from the SIT Trial. In addition, CSSCD SBP measurements were taken following the procedure described in the study manual of operations and are therefore likely to be less variable, whereas no uniform guidelines were adopted in the SIT Trial. Lastly, it has been known for over a decade that blood pressure is an age-dependent process [50,51]. The ICBP study was performed in adult individuals (38-72 yrs), whereas in our study, SCD patients are restricted to age <15 years; therefore, it is possible that the lack of association for other loci may be due to an age-dependent genetic effect.
In summary, our results not only suggest the importance of DRD2 genetic variation in the modulation of SBP, but also extend the genetic significance of the 4 previously published loci in SBP in an African American population. Further, our study also identifies a significant association for the genetic risk score with SBP, suggesting the aggregated importance of previously reported SNPs in the modulation of SBP in the setting of SCD. This study provides new insight in SBP regulation in an admixed African American ancestry cohort, more specifically in children with SCD, and highlights the overlap in genetic signals between African American populations and European ancestry populations.
In the absence of the genotype data for the reported index SNP, best proxy was selected using HapMap phase III (release 2, ASW panel) and 1000 Genomes project (Pilot 1, YRI panel) with cutoff r² ≥ 0.6.
† Observed effects are based on inverse variance weighted meta-analysis.
* P-values are based on one-tailed significance.
NA indicates opposite direction of effect between ICBP and SCD cohorts.
‡ Odds ratio based on the multivariate logistic regression adjusted for age, sex and 1st principal component.