A Genome-Wide Association Study of Total Bilirubin and Cholelithiasis Risk in Sickle Cell Anemia

Serum bilirubin levels have been associated with polymorphisms in the UGT1A1 promoter in normal populations and in patients with hemolytic anemias, including sickle cell anemia. When hemolysis occurs circulating heme increases, leading to elevated bilirubin levels and an increased incidence of cholelithiasis. We performed the first genome-wide association study (GWAS) of bilirubin levels and cholelithiasis risk in a discovery cohort of 1,117 sickle cell anemia patients. We found 15 single nucleotide polymorphisms (SNPs) associated with total bilirubin levels at the genome-wide significance level (p value <5×10−8). SNPs in UGT1A1, UGT1A3, UGT1A6, UGT1A8 and UGT1A10, different isoforms within the UGT1A locus, were identified (most significant rs887829, p = 9.08×10−25). All of these associations were validated in 4 independent sets of sickle cell anemia patients. We tested the association of the 15 SNPs with cholelithiasis in the discovery cohort and found a significant association (most significant p value 1.15×10−4). These results confirm that the UGT1A region is the major regulator of bilirubin metabolism in African Americans with sickle cell anemia, similar to what is observed in other ethnicities.


Introduction
Cholelithiasis is the most common gastrointestinal disease to require hospitalization worldwide [1]. Most gallstones are primarily comprised of cholesterol; however, in certain patient risk groups, bile pigment stones are extremely common and represent a significant cause of morbidity. These stones are associated with increased serum unconjugated bilirubin levels, which occur in a number of conditions including hemolytic anemia. In other patient groups, ineffective erythropoiesis, ileal diseases, or gastrointestinal resections producing increased colonic spillage of bile salts, predispose to bile pigment stones [2].
Patients with sickle cell disease (SCD), particularly those with sickle cell anemia (homozygosity for HBB glu6val), are at risk for bile pigment cholelithiasis due to the association of this disease with hemolysis which produces an unconjugated hyperbilirubinemia [3]. Cholelithiasis occurs in up to 70% of these patients, resulting from the precipitation of polymerized calcium bilirubinate within the gallbladder lumen when the ion product of calcium and unconjugated bilirubin exceeds its solubility. Up to 50% of these patients undergo cholecystectomy at some point during their lifetimes, suggesting increased associated morbidity, creating a need for hospitalization, and potential mortality from infectious or other complications related to this process (1).
Cholelithiasis risk, for both cholesterol and bile pigment stones, is thought to be partially mediated by genes impacting bilirubin metabolism [4,5,6]. Genome-wide association studies (GWAS) of three large Caucasian cohorts have identified single nucleotide polymorphisms (SNPs) in UGT1A1 that are responsible for 18% of clinical variability of serum bilirubin [7]. Sequencing of the UGT1A1 gene in patients with Gilbert's syndrome, associated with unconjugated hyperbilirubinemia, revealed that an important modulator of enzyme activity is the length of TA repeats in the promoter region; the greater the number of repeats being associated with the highest serum bilirubin levels and cholelithiasis risk [8,9].
Genome-wide studies examining hyperbilirubinemia and cholelithiasis risk have not been performed in persons of African descent with sickle cell anemia. However, sequencing of UGT1A1 has been undertaken in SCD cohorts in Guadeloupe, Portugal, Jamaica, India and the United States; genotyping of 324 SCD children from the Cooperative Study of Sickle Cell Disease (CSSCD) revealed that the common (TA)6 and (TA)7 alleles were present in 75% [10][11][12][13][14][15]. The (TA)7/(TA)7 genotype was associated with increased serum bilirubin levels and cholelithiasis risk compared with the (TA)6/(TA)7 and (TA)6/(TA)6 genotypes across cohorts with suggestions that those with a (TA)8 allele are at highest risk, suggesting a similar pattern of inheritance as observed in Caucasians [10][11][12][13][14][15]. Treatment with hydroxyurea, known to decrease the hemolytic rate in sickle cell anemia patients, was unable to decrease bilirubin levels to normal levels in those with the (TA)7/(TA)7 genotype suggesting that this genetic defect may represent a new pharmacologic target for these patients [14].
One of the reasons for the focus on UGT1A1 in genetic studies relates to its known importance in bilirubin metabolism [8,9]. Upon release into the plasma, heme is metabolized to bilirubin via the actions of heme oxygenase-1. Bilirubin is poorly water soluble and in order for excretion to occur, disruption of its hydrogen bonds via a process termed glucuronidation, or conjugation, is essential. Glucuronidation of bilirubin is mediated by a family of enzymes termed the uridine-diphosphoglucuronate glucuronosyltransferases (UGT). Of these, UGT1A1 is the most important regulator of this process clinically and is one of the isoforms of the UGT1A gene complex comprised of 9 transcriptional units. Unconjugated or indirect bilirubin is responsible for all of the toxic effects of hyperbilirubinemia, supporting the clinical importance of this class of enzymes. Additionally, as alluded to previously, genetic studies of other patient populations characterized by hyperbilirubinemia, Gilbert's syndrome and cystic fibrosis, have demonstrated an association between genetic variants within dinucleotide repeats in the promoter of UGT1A1 and bile pigment stone formation [7].
We conducted a GWAS of serum total bilirubin in the largest number of African American sickle cell anemia patients studied to date to determine if genetic variants played a role in the increased risk of cholelithiasis observed in this population. Our analysis shows that UGT1A1 is the major regulator of bilirubin levels in African Americans with sickle cell anemia, and that genetically mediated differences in bilirubin conjugation play an important role in cholelithiasis risk in these patients.

Ethics Statement
All studies were approved by the Institutional Review Board of Boston Medical Center, Duke University, University of Pittsburgh, Johns Hopkins University, Washington University and the University of North Carolina at Chapel Hill. Additionally, IRB approval was acquired from all of the sites participating in the Walk-PHaSST, Multicenter Study of Hydroxyurea Use and SITT trials for subject enrollment. All patients signed informed written consent for participation in these studies.

Study Subjects
The discovery set consisted of 1,117 African American subjects from the CSSCD [16,17]. The replication cohorts were comprised of 195 subjects from the Multicenter Study of Hydroxyurea (MSH), 522 subjects from the Pulmonary Hypertension and Sickle Cell Disease with Sildenafil Therapy (Walk-PHaSST) study (NCT00492531) [18], 530 subjects from the Outcome Modifying Genes study (referred to subsequently as ''Duke'') that enrolled patients from Duke University Medical Center, the University of Table 1. Patient Characteristics in CSSCD, MSH and Walk-PHaSST cohorts.

Genotyping
The DNA from the CSSCD, MSH, and Walk-PHaSST samples were genotyped at Boston University using the Illumina Human610-Quad SNP array, with approximately 600,000 SNPs and the Illumina HumanCNV370-duo bead chip (MSH). The Duke samples were also genotyped using the Human610-Quad SNP array, whereas the SITT cohort was genotyped in two stages. In stage 1, a subset of 573 recruited samples was genotyped at the Center for Inherited Disease Research (CIDR) using IIIumina HumanHap650Y array. The genotype data for the remaining 509 samples were generated at the Center for Disease Control (CDC) in collaboration with Washington University using Illumina Infinium HumanOmni1-Quad array. To infer un-typed SNPs and fill-in missing data across Omni1M and HumanHap650Y platforms, imputation was performed by using Hidden Markov model implemented in MaCH software [20]. All samples genotyped at Boston University were processed according the manufacturer's protocol and BeadStudio Software was used to make genotype calls utilizing the Illumina pre-defined clusters. Samples with less than a 95% call rate were removed and SNPs with a call rate ,97.5% were re-clustered. After re-clustering, SNPs with call rates .97.5%, cluster separation score ..25, excess heterozygosity between 2.10 and .10, and minor allele frequency .5% were retained in the analysis. We used the genome-wide identity by descent analysis in PLINK to discover unknown relatedness. Pairs with identity by descent measurements greater  than .2 were deemed to be related subjects. Related subjects within individual or different studies were removed. We also removed samples with inconsistent gender findings defined by heterozygosity of the X chromosome and gender recorded in the database. Quality control parameters for the DUKE and SITT samples were comparable. In SITT, a 98% cutoff was used for both the sample and SNP call rates in the combined dataset (n = 1082), and resulted the exclusion of 6 individuals and 1,260 SNPs from the study. Seventy seven samples were identified as first-degree relatives (pairs with identity by descent measurements . 0.4), and 1 individual from each pair was removed. Twenty six individuals were identified as genetic outliers using EIGEN-STRAT (.6 standard deviations on any of the top ten principal components) and removed. Additionally, 68 samples were also dropped from the study, due to missing phenotype data, leaving 905 samples for subsequent analysis.

Phenotype
Serum total and direct bilirubin, lactate dehydrogenase (LDH), hemoglobin concentrations, and reticulocyte counts were measured using automated chemical and hematologic analyzers at the individual medical centers participating in these studies.
For the CSSCD patients, longitudinal bilirubin measurements were collected from phases 1, 2, and 3 of the study [16]. Only steady state measurements were used (4 months removed from blood transfusion). The longitudinal measurements of 3,250 study patients were analyzed using a Bayesian hierarchical mixed model that included a random effect per patient to account for the repeated measurements, as well as random intercept and age effects that were allowed to vary with the clinics. The random intercept and age effects were used to remove the between-site systematic differences. Markov Chain Monte Carlo method in Openbugs was used to estimate the predicted total bilirubin values, and log-transformed median predicted values were used as the phenotype for the GWAS. The details of the analysis are available as (Information S1). For MSH, Duke, Walk-PHaSST and SITT patients, the log transformed baseline total bilirubin was used.
CSSCD subjects were assessed for gallstones by cholecystogram, plain film report, and abdominal ultrasound. A combination of questionnaires and medical records were utilized to determine a history of cholelithiasis or cholecystectomy. The cholelithiasis phenotype was dichotomized into: a) those who had gallstones (or history of), a history of cholecystectomy, a nonfunctional gallbladder; or b) none of these conditions. Of the 1,062 CSSCD patients with gallstone information, 348 had gallstones, or a nonfunctional gallbladder, cholecystectomy or a history of one of these events.

Heritability of Total Bilirubin
To further examine heritability of serum bilirubin in the CSSCD population, we examined the correlation of serum bilirubin in 90 sibling pairs. As a comparison, we randomly selected serum bilirubin values for 200 unrelated pairs 1,000 times to get an average of the estimated correlation among unrelated individuals.

Description of GWAS and Secondary Genetic Association Analyses
The association between total bilirubin levels and each SNP that passed quality control was tested using multiple linear regression, adjusting for age and gender using the additive and genotypic model in PLINK software. Age and gender were both included as covariates, as both were found to have a significant association with total bilirubin levels (p,0.0001 for both covariates). The minor allele was the coded allele in the additive model. To control for population stratification, a separate GWAS was performed on the discovery set adjusting for age, gender and the top ten principal components calculated using EIGENSOFT [21].
Haploview was employed to generate LD heatmaps using genotype data from the CSSCD. The SNPs that reached genome-

5.44E226
Genome wide significant SNPs in the CSSCD study and their replicates in the independent cohorts. The table reports the SNP identifier from dbSNP, chromosome, physical coordinates (hg18), the coded allele in PLINK (also minor allele) and the non-coded allele, the minor allele frequency (MAF) from the CSSCD cohort, the gene clusters where the SNP is located, and regression coefficientt and p-value in each study.   wide significance in the CSSCD sample were analyzed for association in the other cohorts. In SITT, the genetic associations were performed after adjustments for the same covariates (age, gender and top ten principal components). To account for the uncertainty of imputed data, the estimated allele dosage was analyzed using ProABEL under a linear regression framework [22]. Duke only controlled for age and gender, because previous analyses had indicated there was not appreciable population stratification in that data set. In a secondary analysis, SNPs that achieved genomewide significance from the additive model were tested for association with the cholelithiasis in the CSSCD cohort. This association was tested using an additive multiple logistic regression model, adjusting for age. The same SNPs were also tested for association with LDH, reticulocyte counts and hemoglobin concentration, using the same Bayesian hierarchical model.

Patient Characteristics and Heritability of Total Bilirubin Levels
Patient characteristics for each cohort are shown in Tables 1 and 2. The CSSCD cohort was younger than the MSH, Duke, and Walk-PHaSST cohorts. Each of the cohorts were equally divided between males and females. Examination of the serum bilirubin levels amongst 90 sibling pairs revealed a positive correlation (r = 0.270, p = 0.009), consistent with heritability of bilirubin levels ( Figure 1A). In contrast, there was no correlation amongst 1,000 randomly selected pairs (r = 2.002 average p value = 0.502, Figure 1B).

Association Between UGT SNPs and Serum Bilirubin Levels
After our quality control procedures, 569,615 SNPs were analyzed in the GWAS of serum bilirubin in the CSSCD. Figure 2A shows the Manhattan plot, with the results of the GWAS for the additive model and the large spike of significant associations in chromosome 2. The QQ plot ( Figure 2B) shows no inflation (lambda factor, l = 1.01) suggesting that there is no confounding by population stratification. To confirm this, we repeated the analysis after adjustment for the top ten principal components and the magnitude and significance of the top SNPs did not change. Fifteen SNPs reached genome-wide significance for both the genotypic and additive models and we report only the results from the additive model (Table 3). These genes are located on a contiguous part of chromosome 2 of the UGT1 region. The linkage disequilibrium (LD) structure of these SNPs in Figure 3 shows two blocks in LD, although all 15 SNPs are in very strong LD (D'.0.53) (the LD structure of these SNPs can be show in terms of r 2 in Information S1). To determine if these 15 SNPs were part of one or multiple independent signals, another GWAS was performed adjusting for age, sex and our top SNP, rs887829. None of the fourteen SNPs showed a significant association with serum bilirubin after this adjusted analysis (complete results can be seen in Information S1), thus suggesting that these 15 SNPs are a part of the same signal. A complete list of the SNPs with a p value less than 5610 205 consistent with genome-wide significance can be found in Information S1. An increase in the number of minor alleles in all SNPs except rs4148326 and rs3755319 was associated with an increase in log total bilirubin (effect sizes ranging from 0.108 to 0.1838), so that the less common alleles are the risk alleles; however, the other two SNPs showed effect estimates in the opposite direction (effect sizes ranging from 2.13 to 2.14), so that the minor alleles are associated with lower levels of serum bilirubin. Only four of the SNPs that reached genome-wide significance in the CSSCD population were available for genotyping on the 370K array used for the MSH samples, while all of them were genotyped in the Duke, Walk-PHaSST and SITT cohorts. All of the SNPs were significantly associated with bilirubin levels and had a regression coefficient in the same direction in both the discovery and replication sets (Table 3). Table 4 shows the association between the 15 genome-wide significant SNPs and the odds for cholelithiasis. Twelve of the 15 SNPs were significantly associated with cholelithiasis (p ,.05) and all of them had associated odds ratios for cholelithiasis in a direction consistent with that observed in the analysis of total bilirubin: An increase in the number of risk alleles for all SNPs except rs4148326 and rs3755319 was associated with an increased odds for cholelithiasis (odds ratios ranging from 1.324 to 1.757). The other two SNPs showed effect estimates in the opposite direction (odds ratios of .524 and .561). We also tested whether these 15 SNPs were significantly associated with LDH levels, hemoglobin concentration and reticulocyte count to see if there was an association with markers of hemolysis. None of the SNPs showed an association with LDH, hemoglobin concentration or reticulocyte count (Table 5).

Discussion
Intravascular hemolysis in sickle cell anemia patients produces increased circulating heme levels and has been associated with clinical complications including cholestatic jaundice and cholelithiasis [10,[23][24][25][26][27][28] Unconjugated hyperbilirubinemia is a risk factor for the development of bile pigment stones in sickle cell disease and other hemolytic anemias [12,[29][30][31][32][33][34][35][36][37][38][39][40][41]. Typically, bilirubin is formed from metabolism of heme, which is derived principally from erythrocytic hemoglobin. Heme is oxidatively metabolized by heme oxygenase-1 into biliverdin, which is subsequently reduced enzymatically by biliverdin reductase to bilirubin. Upon formation, bilirubin is water insoluble, due to the presence of internal hydrogen bonding that creates a contorted molecule structure, inhibiting solubility. This form of bilirubin, termed unconjugated or indirect bilirubin, is responsible for all of the toxic effects of bilirubin observed clinically. To permit excretion, bilirubin complexes with albumin and is transported via the bloodstream to the liver, where it undergoes glucuronidation by the UGT family of enzymes; UGT1A1 is the only one with clinical relevance in humans [42]. UGT1A is a gene complex composed of 9 transcriptional units encoding an isoform of the UGT gene [30,31]. The most common genetic cause of impaired bilirubin glucuronidation occurs in Gilbert's syndrome, which follows an autosomal recessive inheritance pattern and most commonly affects Caucasians of European descent [43,44]. Clinically, these patients have a mild indirect hyperbilirubinemia, which worsens during periods of stress or febrile illnesses [45]. Genetic studies of these patients have identified a dinucleotide repeat polymorphism (TA) [5][6][7][8] in the TATA box of the UGT1A1 gene promoter that is associated with reduced UGT expression and produces hyperbilirubinemia [30][31]37,[46][47]. This polymorphism has also been observed in small cohorts of sickle cell anemia patients suggesting a common pathogenic link between ethnically divergent etiologies of indirect hyperbilirubinemia [5,[10][11][12][13][14][15]33]. In the current study, genome-wide genetic screening of a large cohort of sickle cell anemia patients not only confirmed these findings but also supported their biological relevance.
In addition to defects in bilirubin metabolism, genetic alterations in the glucuronidation pathway have been linked to abnormalities in the hepatic metabolism of certain medications, pre-disposition to cardiovascular disease and certain types of malignancy suggesting a clinical level of importance beyond cholelithiasis [35,42,[48][49][50][51][52][53]. How this relates to the pathogenesis of sickle cell anemia is not entirely clear clinically, but may be worthy of future study.
In genetic studies of primarily Caucasian cohorts, cholelithiasis and serum bilirubin levels are heritable traits. Family-based studies show that the UGT locus accounted for a significant proportion of the variation observed in both of these variables [18,[54][55][56]. Our study is the first to use GWAS to examine an African American population with sickle cell anemia, a condition where hemolysis predisposes patients to hyperbilirubinemia, to determine the heritability of serum bilirubin levels. Fifteen SNPs in UGT1A1, UGT1A3, UGT1A4, UGT1A5, UGT1A6, UGT1A7, UGT1A8 and UGT1A9 reached genome-wide significance for association with total bilirubin levels in the CSSCD cohort; 13 of these were also associated with the presence or history of cholelithiasis. These findings were confirmed in 4 other cohorts totaling 3,269 patients, unusual for a rare disease such as sickle cell anemia, and representing a greater number of patients than all of the previous studies of this population combined.
SNPs we identified in the current study have been linked to hyperbilirubinemia in other populations. SNP rs887829, the most significant SNP associated with bilirubin levels identified in the CSSCD, Duke and Walk-PHaSST cohorts, was previously observed in associated with total bilirubin levels in a GWAS of 4,300 Sardinians [57]. This study also reported a significant association with bilirubin and the SLCO1B3 locus; however, we were unable to replicate these results in our study due to unavailability or low MAF of the markers in our array. In a study performed by Cheng et al. examining genetic variants associated with serum bilirubin in 619 healthy African Americans revealed that the top SNP was rs887829 (p = 1.77610 222 ) [58]. This suggests that this variant is not unique to African Americans with sickle cell disease. It has been theorized that this SNP may confer protection against malaria which may explain its penetrance amongst African populations. SNP rs887829 is located in the promoter region of UGT1A1, 221 bp upstream from the (TA) n repeat sequence. Horsfall et al. found a strong association was found between rs887829 and the (TA) 7 and (TA) 8 repeat sequence suggesting that this may be a marker for the (TA)7/(TA)7 or 8 genotypes [34]. This proximity to the promoter element and strong level of association suggest that rs887829 is a marker for the (TA) n repeats and that additional sequencing of this gene would be redundant [58]. SNP rs6742078, identified in the current study, was associated with total bilirubin in a meta-analysis previously performed by Johnson et al., including the Framingham Heart Study, Rotterdam Study and Age, Gene, Environment and Susceptibility-Reykjavik study cohorts [33]. In addition, SNPs rs3755319, rs7586110, and rs6759892 were also found to be significantly associated with total bilirubin in a cohort of 4,300 Sardinians [57]. SNP rs887829 was found to be associated with cholelithiasis in a study published by Buch et al., in which a cohort of 2,606 German cholelithiasis patients was compared with 1,121 South American controls (OR = 1.73, p value .003) [11].
There was no association between LDH, hemoglobin concentration or reticulocyte count, markers of hemolysis, and the SNPs identified in the present study. A potential explanation for this is the weak association between bilirubin levels and hemolytic rate. While the "gold standard" for hemolysis is the red cell lifespan, this is rarely performed and surrogate blood biomarkers such as reticulocyte count, LDH, bilirubin and haptoglobin levels are used clinically to estimate the degree of hemolysis. None of these measures are specific for hemolysis. An elevated serum bilirubin level in sickle cell anemia reflects multiple pathophysiologic processes that include liver disease in addition to hemolysis and therefore may lack the phenotypic specificity required for genetic association studies of hemolysis. Another possible reason for this finding is that the genes associated with bilirubin levels are all involved in bilirubin catabolism and not production suggesting that they are not reflective of the processes leading to bilirubin formation.
In summary, SNPs in UGT1A1 are most tightly associated with bilirubin levels in African Americans with sickle cell anemia. This study in conjunction with those conducted in other populations at risk for unconjugated hyperbilirubinemia and other cohorts of sickle cell anemia patients support the concept that genetically mediated differences in bilirubin conjugation play an important role in the cholelithiasis risk. It is possible that targeting this pathway pharmacologically may offer new therapeutic options for these patients.

Supporting Information
Information S1 Description of the Bayesian hierarchical model used to create the phenotype in the CSSCD cohort. Supplementary Figure 1 contains information on the LD structure of the UGT1A region in the CSCCD cohort. Supplementary Table 1 contains information on the analysis of the association between bilirubin and after adjusting for our most significant SNP. Figure 1: LD Structure in CSSCD Cohort. LD plots for regions in genes UGT1A1, UGT1A3, UGT1A4, UGT1A5, UGT1A6, UGT1A7, UGT1A8, UGT1A9 and UGT1A10 on chromosome 2 in the CSSCD subjects. The LD plot was generated using Haploview 4.2. Each diamond represents the r 2 value between two SNPs. The LD color scheme is: white r 2 = 0, 0,r 2 ,1 grey (the darker the shade of grey, the higher the r 2 value), black r 2 = 1.