A Genome-Wide Association Study on Chronic HBV Infection and Its Clinical Progression in Male Han-Taiwanese

It is common to observe the clustering of chronic hepatitis B surface antigen (HBsAg) carriers in families. Intra-familial transmission of hepatitis B virus (HBV) could be the reason for the familial clustering of HBsAg carriers. Additionally, genetic and gender factors have been reported to be involved. We conducted a three-stage genome-wide association study to identify genetic factors associated with chronic HBV susceptibility. A total of 1,065 male controls and 1,623 male HBsAg carriers were included. The whole-genome genotyping was done on Illumina HumanHap550 beadchips in 304 healthy controls and HumanHap610 beadchips in 321 cases. We found that rs9277535 (HLA-DPB1, P = 4.87×10−14), rs9276370 (HLA-DQA2, P = 1.9×10−12), rs7756516 and rs7453920 (HLA-DQB2, P = 1.48×10−11 and P = 6.66×10−15 respectively) were significantly associated with persistent HBV infection. A novel SNP rs9366816 near HLA-DPA3 also showed significant association (P = 2.58×10−10). The “T-T-G-G-T” haplotype of the five SNPs further signified their association with the disease (P = 1.48×10−12; OR = 1.49). The “T-T” haplotype composed of rs7756516 and rs9276370 was more prevalent in severe disease subgroups and associated with non-sustained therapeutic response (P = 0.0262). The “G-C” haplotype was associated with sustained therapeutic response (P = 0.0132; OR = 2.49). We confirmed that HLA-DPB1, HLA-DQA2 and HLA-DQB2 loci were associated with persistent HBV infection in male Taiwan Han-Chinese. In addition, the HLA-DQA2 and -DQB2 complex was associated with clinical progression and therapeutic response.


Introduction
It is common to observe clustering of chronic hepatitis B surface antigen (HBsAg) carriers and hepatocellular carcinoma (HCC) in families of affected individuals [1][2][3][4][5]. A considerable number of patients exposed to HBV in the early stage of life became chronic HBsAg carriers [3,[6][7][8]. Most of the children born to the mothers with active HBV replications became HBsAg carriers. This type of perinatal infection gives little room for genetic roles to participate [3]. However, horizontal transmission is as important as perinatal infection in HCC families. The total number of HBsAg carriers was higher in children of male HCC patients (high horizontal transmission) than in children of female HCC patients (high perinatal infection) [5]. Genetic roles could be operating in patients with the horizontal transmission as well as in those mothers with inactive HBV replications. Previous candidate gene studies suggested that human leukocyte antigen (HLA), cytokines, DNA repair and others were involved in HBV clearance or progression to hepatocarcinogenesis [8][9][10]. Recent genome-wide association studies (GWAS) confirmed that human gene variations in HLA-DP and HLA-DQ areas were associated with HBsAg persistence [11][12]. However, the clinical significance of these genes on HBV-related disease progression remained unclear. It could be shielded by multi-factorial problems associated with chronic HBV infection. It is well-known that the prevalence of HBsAg carriers in males is much higher than that in females [6]. Therefore, we conducted a GWAS in male Taiwan-Han Chinese to examine the differences between non-HBsAg carriers, inactive HBsAg carriers and active HBV related liver diseases.

Study Participants
A total of 1623 male case samples (321 in the GWAS scan, 646 in the first replication and 656 in the second replication) were recruited from Chang Gung Memorial Hospital (CGMH), Linkou or Taiwan Liver Cancer Network (TLCN). The cases with age. 30 years were enrolled when they were seropositive for the HBsAg for .6 months and seronegative for the anti-hepatitis C antibody (anti-HCV). All of the patients denied HIV infection. The 1623 HBsAg carriers were classified into three groups. The first group includes unrelated HBsAg carriers with persistent normal alanine aminotransferase (ALT) level (PNALT) for .5 years and with HBV DNA,10 5 cps/mL, which were carefully selected from HBsAg carriers who visited the carrier clinic of Chang Gung Memorial Hospital and were followed for a long run [13,14]. The second group was comprised of unrelated chronic hepatitis B (CHB) patients with fluctuated ALT levels and HBV DNA. 10 5 cps/mL. The third group included unrelated HCC patients. The age and sample size information for this three-stage study is given in Table 1.
The 304 male control samples for the first stage GWAS scan were obtained from the Taiwan Han Chinese Cell and Genome Bank (who had no liver related disease and were seronegative for HBsAg and anti-HCV) [15]. The 761 male control individuals in the two replication studies were selected from local residents of Taoyuan County, Taiwan through a project designated ''Integrated Delivery System of Health Screening, Taoyuan, Taiwan''. We included Han-Chinese origin subjects with age.30 years. The inclusion criteria for the controls were seronegative for HBsAg and anti-HCV, and with normal liver biochemistry records. We did not use anti-Hepatitis B core protein (anti-HBc) in our inclusion criteria because nearly 90% of general population age.30 years were infected with HBV [16,17]. Those anti-HBc negative subjects who have been living in Taiwan, an HBV endemic area, for more than 30 years were not excluded. They had chances to expose to HBV, but may be genetically resistant to HBV infection [3].
This study as well as the inform consent were approved by the Ethics Committee of Chang Gung Memorial Hospital (IRB 94-581). A written informed consent was obtained from all the participants before the study. This clinical investigation had been conducted according to the principles expressed in the Declaration of Helsinki.

Genome-wide SNP Genotyping and Quality Control
The genomic DNA was extracted from peripheral blood lymphocytes using the MagNA Pure LC DNA Isolation Kit with automatic DNA isolation instruments (MagNA Pure LC II; Roche Diagnostics GmbH, Mannheim, Germany). In the first-stage GWAS scan, the genome-wide SNP genotyping was done by Genizon Biosciences (Quebec, Canada) using Illumina Human-Hap610 beadchips in the case group; while the genotyping for the control group was done by DeCode Genetics (Reykjavik, Iceland) using Illumina HumanHap550 beadchips. Only the overlapping SNPs on both chips were included in analysis. For filtering highquality genotype data and for enlarging the number of potentially associated SNPs, we carried out the following quality control procedures other than the widely-used standard of the call rate of 95% and a minor allele frequency (MAF) of 5%; we included SNPs if: (1) a successful call rate in both cases and controls was . 90%; (2) an MAF in the controls was .1%; (3) the Hardy-Weinberg equilibrium (HWE) was not violated in the controls (HWE test P-value.10 27 ).
SNP genotyping. In the two replication studies, the genotyping of the SNPs identified by GWAS was performed using TaqMan Genotyping assays (Applied Biosystems Inc.) or Seque-nomMassARRAY System (San Diego, CA). The experiment with TaqMan assays was done by Vita Genomics (New Taipei City, Taiwan), and the experiment with SequenomMassArray was done by the Academia Sinica National Genotyping Center (Taipei, Taiwan). Table 1. Descriptive characteristic of the three-stage study samples in different clinical groups.

Statistical Analysis
We performed three single-locus association tests: genotype test, allele test and Cochran-Armitage trend test to compare allele and genotype frequencies assuming three modes of inheritance: additive, dominant and recessive between the cases and the controls for each SNP. To increase the chances of identifying potentially associated SNPs in the GWAS scan, a SNP marker with -log 10 (P-value) $5 in any of the three above mentioned tests would be considered for further analysis. In the first and the second replication studies, the significance level after Bonferroni correction for multiple comparisons (P = 0.05 divided by the number of SNPs in the analysis) was used instead. The metaanalysis was then carried out using the Mantel-Haenszel method. The odds ratios (ORs) and 95% confidence intervals (CIs) were calculated by logistic regression analysis with age adjustment. The linkage disequilibrium (LD) map and values for the analyzed SNPs were generated by the Haploview software [18]. Haplotypes were inferred and maximum likelihood estimates of haplotype frequencies were generated by using the expectation-maximization (EM) algorithm. Those haplotype categories with estimated frequencies of less than 5% were combined into one category called ''All other''. Effects of each haplotype category with estimated frequencies of at least 5% on persistent HBV infection were then assessed between the HBsAg carriers and the controls using logistic regression.
To detect the effect of potential population stratification, we used the principle component analysis (PCA) based software ''EIGENSTRAT'' and PLINK Multidimensional scaling analysis (MDS) to investigate the structure of the first stage GWAS samples. Based on the distribution of the observed -log 10 (trend test P-value) for the whole-genome SNPs against the theoretical distribution of expected -log 10 (trend test P-value), the quantilequantile (Q-Q) plot was produced to imply the potential existence of HBV-associated genetic variants.
In the evaluation of the cumulative effects of the risk alleles identified, we calculated the ORs and 95% CIs using logistic regression models with the median number of total risk alleles used as the reference group for the total 2688 individuals.
To assess the association between the clinical characteristics and the SNPs, the three clinical groups (PNALT, CHB and HCC) were investigated in the case-only analysis. Each of the three groups was coded as a binary trait and was tested for SNP associations using logistic regression models under assumptions of three modes of inheritance and with age adjustment. In the restrained samples who received any one of three anti-HBV treatment courses, the relationship of the SNP genotypes and the therapeutic response was evaluated using the Fisher's exact test in each treatment group and the Cochran-Mantel-Haenszel (CMH) test after controlling for treatment. In logistic regression analysis, the additive effect of risk alleles on the therapeutic response was assessed in each treatment group with age adjustment and in the full sample with age and treatment adjustment. In addition, association of inferred haplotype categories with therapeutic response was analyzed to evaluate the combined effects of multiple causal alleles and loci.

Genome Wide Association Study and Replication
In the first-stage GWAS scan, the Illumina HumanHap610 beadchips annotated 598,821 SNPs in the cases and the Illumina HumanHap550 beadchips annotated 560,184 SNPs in the controls. The gene chips covered 547,435 SNPs in common. After the quality control filtering, a total of 456,262 SNPs on the 22 autosomal chromosomes were obtained and then used in our initial discovery of GWAS. The population structure (Figures S1 (A) and (B)) and the Q-Q plot ( Figure S2) using the whole genome genotype data of 321 cases and 304 controls indicated no evidence of population stratification between the HBV cases and the controls (inflation factor l = 1.002).
A total of 61 SNPs which reached the significance level of 10 25 were found in the whole genome (Table S1 and Figure S3). After eliminating the SNPs which failed to be validated by Taqman genotyping, we focused our study on thirty-eight SNPs on chromosome 6 (P,10 25 in any of the three tests; Figure 1 and Table S1). Based on the Linkage Disequilibrium (LD) block and the trend test P-value, fifteen SNPs were selected for the first replication in an independent cohort of 646 cases and 345 controls (Table S2). Seven SNPs showed significant association with P, 0.01 in the first replication.
Two SNPs rs9277554 and rs2051549 were not considered further due to their close proximity to rs9277535 and rs7453920. The other five significantly associated SNPs (rs9276370, rs7756516, rs7453920, rs9277535 and rs9366816) in the first replication were selected for the second replication in an independent cohort with 656 cases and 416 controls.
The combined analysis of total series revealed consistent results as in the single stages. The most significant association was conferred by rs7453920 (P = 6.66610 215 , OR = 2.31) on HLA-DQB2 and rs9277535 on HLA-DPB1 (P = 4.87610 214 , OR = 1.59).

Pairwise LD Patterns in the Five Associated SNPs and Haplotype Association
The pair wise LD of the five associated SNPs was calculated for the 625 individuals in the GWAS scan. The haplotype block was generated from the Haploview software ( Figure 2). The LD block built up by the three SNPs rs9276370, rs7756516 and rs7453920 suggested that they might not contribute independently to the disease risk. The rs9277535 and rs9366816 were independent from the other three SNPs mentioned above, and they showed weak association with each other (r 2 = 0.30). Hence we reevaluated the association of each one of the five SNPs using logistic regression analysis adjusted for the other specified SNP and age in the combined samples. As a result, the rs7453920 remained significant after controlling for the effect of nearby rs9276370 and rs7756516 (Table S3 P (Table 3). After examining haplotype-disease association using logistic regression models, the ''T-T-G-G-T'' haplotype appeared to confer evident susceptibility to persistent HBV infection (52.83% in cases vs. 43.05% in controls; P-value = 1.48610 212 , OR = 1.49, 95% CI: 1.33-1.66 after age adjustment). We also found that the ''T-T-G-A-T'' haplotype had protective effect from developing the disease (19.69% in cases vs. 24.13% in controls; P-value = 1.17610 24 , OR = 0.77, 95% CI: 0.68-0.88 after age adjustment). When estimating analyzing the haplotypes made up of four SNPs excluding rs9366816, the opposite effects of the rs9277535 G and A alleles contributing to persistent HBV infection appeared to be more prominent (Table S5).

Clinical Significance of the SNPs
Among the 1623 HBsAg carriers, the SNP rs9276370 showed a lower TT genotype frequency in the PNALT group than in the other two groups (0.825 vs. 0.865; trend test p value = 0.0241; Table 4). The test for additive effects of the T alleles reached significance with P = 0.0376 (OR = 0.73; 95% CI: 0.55-0.96).
Those in the PNALT group were less likely to carry rs7756516-T alleles compared to the other two more aggressive liver disease groups (CHB and HCC, Table S4). The frequency of the TT genotype was lower in the PNALT group than in the other two groups (0.815 vs 0.865; additive genetic effects P-value = 0.0249; OR = 0.74, 95% CI: 0.56-0.96; Table S4).

Therapeutic Response
Among the CHB group, we reviewed the charts retrospectively and focused on the patients who received a course of the anti-HBV therapy. A variety of regimens developed in the past 20 to 30 years. We only considered the patients who received three widely used regimens: (1) lamivudine 100 mg daily, which has been used widely since its availability in early 2000 [19]; (2) entevavir 0.5 mg daily, soon replacing lamivudine as the most widely used oral antiviral agent accredited to its high potency of antiviral effect and low rate of drug resistance [20]; and (3) pegylated interferon a-2a 135-180 mcg weakly for one year [21]. Most of the patients received free therapies according to the nationwide health insurance policy. The regimen initially covered a one-year therapy only but changed to an up to 4-year regimen in recent years instead [22][23][24]. Therefore, we excluded those who received lamivudine or entecavir for ,1 year or .4 year, and those who received pegylated interferon a-2a for ,1 year. A total of 226 patients were included for this analysis. Those patients with HBV DNA,10 4 cps/mL with normal ALT at the 1-year post-treatment and then remained persistent normal ALT, or transient ALT elevation to ,2 times the upper limit of normal during follow-up were defined as sustained responders [25][26]. Those patients with ALT.2 times the upper limit of normal or HBV DNA.10 4 during the post-treatment follow-up period were defined as nonsustained responders.
The treatment naive rate was highest in the lamivudine group (69.8%), followed by 40.6% in the entecavir group and 39.5% in the pegylated interferon a-2a group (P,10 24 ; data not shown). The mean age at treatment was older in the entecavir group than in the other two groups (P = 10 24 , Table 5). In the 226 cases, the frequency of GG and GT genotypes in the non-sustained responders was only 12.43% compared to 26.32% in the sustained responders (Fisher's exact P = 0.0121 and CMH associated P = 0.0190). Similar trends were found in the treatment groups of lamivudine and pegylated interferon a-2a, but not in the entecavir group. Furthermore, in logistic regression analysis, the association of rs9276370 G alleles revealed significant additive effect on therapeutic response after adjusted for age and treatment (P = 0.0138, OR = 2.51, 95% CI: 1.21-5.22, Table 5). In estimating the haplotypes composed of two risk SNPs rs9276370 and rs7756516, two major haplotypes ''TT'' (90.93%) and ''GC'' (7.74%) were obtained (Table S6). The ''G-C'' haplotype had positive effect on sustained therapeutic response (P-value = 0.0132, OR = 2.49, 95% CI: 1.21-5.11 after adjusting for age at treatment); while the majority ''T-T'' haplotype showed negative effect on sustained therapeutic response (P-value = 0.0262, OR = 0.46, 95% CI: 0.23-0.91 after adjusting for age at treatment) as given in Table S6. The result indicated the potential use of the ''T-T'' and ''G-C'' for distinguishing the two types of

Discussion
From a total of 1065 male controls and 1623 male HBsAg carriers, this multi-stage GWAS study identified that three SNPs within HLA-DPB1 (rs9277535), HLA-DQB2 (7453920) and HLA-DPA3 (rs9366816) loci were independently associated with persistence HBV infection in a male Taiwanese Han Chinese population. The former two SNP associations were also reported in previous Japanese and Mainland Chinese GWAS studies [11,12,27]. Our findings confirmed that HLA-DP and -DQ loci played important roles in the development of persistent HBV infection.
A study from Mainland China revealed that HLA-DPA1 was highly associated with persistent HBV infection in Northern Han Chinese; while HLA-DPB1 was associated with the Southern minorities [28]. Taiwan Han Chinese mainly migrated from southern China [29]. Thus we did not identify SNPs in HLA-DPA1 in the GWAS scan. In addition to HLA-DP and HLA-DQ, a recent publication from Hu et al. (2013) reported that two novel SNPs rs3130542 (near HLA-C) and rs4821116 (in UBE2L3) were associated with persistent HBV infection in central China [27]. However, both of the SNPs did not show significant signals in our screening stage.
The rs9277535 in 39UTR of HLA-DPB1 showed significant association with persistent HBV replication in our as well as other series ( Table 2). The results of the haplotype analysis implied the crucial role that the rs9277535 SNP might play in the HBV persistence and viral clearance. In both of our haplotype analysis composed of five and four (including and excluding rs9366816), the individuals with the ''T-T-G-G-T'' or ''T-T-G-G'' haplotype were more likely to be persistent HBsAg carriers compared to those without the haplotypes (OR = 1.49 and 1.67 respectively); whereas those with the ''T-T-G-A-T'' or ''T-T-G-A'' haplotype were more likely to clear the HBV virus (Table 4 and S5).
In a study of European and Africa-American populations, rs9277534 rather than rs9277535 in the HLA-DPB1 39UTR region was associated with HBV recovery [30]. In their flow  Table 3. Haplotype association of the 5 risk SNPs rs9276370, rs7756516, rs7453920, rs9277535 and rs9366816 with persistent HBV infection. cytometry and mRNA analysis, the HLA-DP expression level was higher with the GG genotype than with the other genotypes. The higher HLA-DP expression might promote mechanisms of high HBV clearance in the controls [30,31].
The SNP rs7453920 (HLA-DQB2) showed significant association with persistent HBV infection but did not associate with clinical outcome [32]. The SNP rs7756516 on 39UTR of HLA-DQB2 identified in our cohort also showed significant association between the PNALT group and the other two more aggressive disease groups (CHB and HCC) ( Table S4, additive effect P = 0.0249; OR = 0.74). There is a lower prevalence of rs7756516 genotype TT in the PNALT group than in the other two groups (0.815 vs. 0.865, trend test P = 0.0134). From the NIH web site, the highest prevalence of TT genotype of rs7756516 (0.788) was reported for the Han Chinese in Metropolitan Denver (HAPMAP-CHD). Their prevalence is similar to the prevalence in our controls (0.736) and lower than that in our cases (0.846, Table 2). These findings are compatible with a high prevalence of HBsAg in Taiwan [6].
The SNP rs9276370 near the 59 region of HLA-DQA2 was also associated with persistent HBV infection (Table 1) and clinical progression (Table 3). Together with data of HLA-DQB2, the HLA-DQA2 and HLA-DQB2 complex may participate in HBV persistent infection and disease progression. In the 226 cases who received a course of anti-HBV therapy, a higher non-sustained responder rate was found in the TT genotype of rs9276370. This is especially true in the lamivudine group (P = 0.0074) and the pegylated interferon a-2a group (P = 0.0814, Table 4). The entecavir group did not show this trend. The mean age at treatment in the lamivudine group was younger and had been followed for a longer period (P = 10 24 , Table 5). There was also a higher prevalence of treatment naive cases (69.8%) than the other two groups (40.6% and 39.5%; P,10 24 ). Whether these differences may explain the therapeutic difference between Lamivudine and entecavir groups remain uncertain. Entecavir is a strong nucleoside analog that may suppress HBV replication completely [33]. The low HBV protein expression may decrease the impact of antigen presentation by HLA. The results of our haplotype analysis indicated the important role that HLA-DQA2 rs9276370 and HLA-DQB2 rs7756516 might play in therapeutic response. The ''T-T'' haplotype composed of rs9276370 and rs7756516 was associated with non-sustained therapeutic response (P = 0.0262, Table S6); while the ''G-C'' haplotype was associated with sustained therapeutic response (P = 0.0132; OR = 2.49). The ''T-T'' and ''G-C'' haplotypes have the potential for predicting the two types of therapeutic responders, although the sample size of the current analysis was too small and further replication studies should be carried out to validate the combined effects of these two SNPs.
Recently, Pan et al. identified that four tagging SNPs rs477515, rs28366298, rs3763316 and rs13204672 showed significant associations with non-response to hepatitis B vaccination [34]. The SNP rs477515 (located ,12 kb upstream of HLA-DRB1) as well as rs2516049 achieved the significance level in our initial screening stage, but we failed to validate them due to multiple repetitive sequences near these SNPs.
The rs9366816 near HLA-DPA3 was associated with persistent HBV infection (Figure 2 and Table S3). The rs9366816 is close to HLA-DPB2 as well. The information about these two HLA class II pseudogenes is limited [35]. The SNP rs9366816 in HLA-DPA3 is not in LD to with all the other associated SNPs except for rs9277535 (r 2 = 0.30; Figure 2), which implies its independent contribution to the disease susceptibility. The additive G allele effect on the therapeutic response (sustained vs. non-sustained) was evaluated in logistic regression analysis. For each of the three treatment groups, the P-values, ORs and CIs were adjusted for age, whereas for the entire sample, the statistics were adjusted for age and treatment. doi:10.1371/journal.pone.0099724.t005 We concluded that HLA-DPB1, HLA-DQA2 and HLA-DQB2 are associated with persistent HBV infection in male Han Taiwanese. HLA-DQA2 and -DQB2 complex is associated with the clinical progression and therapeutic response of chronic HBV infection.