Molecular characterization of hepatitis B virus in Bangladesh reveals a highly recombinant population

The natural history and treatment outcome of hepatitis B virus (HBV) infection depend largely on the viral genotype, subgenotype, and the presence or absence of virulence-associated mutations. We studied the prevalence of genotypes and subgenotypes, virulence- and drug resistance-associated mutations, and recombinant isolates among HBV from Bangladesh. A prospective cross-sectional study was conducted among treatment-naïve HBV patients attending Bangabandhu Sheikh Mujib Medical University, Dhaka, Bangladesh, for HBV viral load assessment between June and August 2015. A systematically selected 50% of HBV DNA positive patients (every second patient) were enrolled. Biochemical and serological markers of HBV infection were measured, and whole genome sequencing (WGS) was performed on virus-positive samples. Genotypes, subgenotypes, virulence-associated and nucleos(t)ide analogue (NA) resistance (NAr) mutations, and the prevalence of recombinant isolates were determined. Among 114 HBV DNA positive patients, 57 were enrolled and 53 HBV whole genome sequences were generated for downstream analysis. Overall, 38.6% (22/57) and 61.4% (35/57) of patients had acute and chronic HBV infections, respectively. The prevalence of genotypes A, C, and D was 18.9% (10/53), 45.3% (24/53), and 35.8% (19/53), respectively. Subgenotypes A1 (90%; 9/10), C1 (87.5%; 21/24), and D2 (78.9%; 15/19) predominated among genotype A, C, and D isolates, respectively. The proportion of acute infections, the prevalence of virulence-associated mutations, and viral load were higher among genotype D isolates. Evidence of recombination was identified in 22.6% (12/53) of the HBV isolates, including 20.0% (2/10), 16.7% (4/24), and 31.6% (6/19) of genotype A, C, and D isolates, respectively. Recombination was more prevalent among HBV from chronic than from acute infections (32.3%, 10/31 versus 9.1%, 2/22; p<0.05). NAr mutations were identified in 47.2% (25/53) of the isolates, and novel mutations in 34.0% (18/53).
HBV genotypes C and D predominated in this population in Bangladesh, and recombinant HBV circulates at a comparatively high prevalence in this setting.


Introduction
There are an estimated two billion people with serological markers of present or past hepatitis B virus (HBV) infection globally, of whom 257 million are chronically infected [1]. The outcomes of acute HBV infection range from complete recovery to fulminant liver disease. Failure to clear HBV after acute infection may lead to inactive or active chronic infection, which can progress to hepatic insufficiency and end-stage liver disease, including liver cirrhosis (LC) and hepatocellular carcinoma (HCC) [2,3].
Current classification divides HBV into 10 genotypes (A-J), which are separated by >7.5% genomic sequence divergence; these are further classified into 40 subgenotypes, which are separated by >4% genomic sequence divergence [4]. The dominant HBV genotypes and subgenotypes differ by geographical location, transmission dynamics, disease progression, and response to antiviral therapy [5]. The clinical progression of HBV infection depends on multiple factors, including age at infection, host genetic factors, and the infecting genotype and subgenotype [6]. Notably, genotype and subgenotype, as well as HBV e antigen (HBeAg) status, viral load, drug resistance mutations in the reverse transcriptase (RT) domain of the polymerase gene, and mutations in the basal core promoter (BCP), precore (PC), and core gene, have a major influence on disease progression and treatment outcome [7][8][9]. End-stage liver disease and poor response to interferon therapy are commonly observed in chronic HBV infection associated with genotypes C and D [5]. Horizontal HBV transmission has also been observed to be more common with genotypes A and D [5]. Further, a number of genetic characteristics, including mutations in the preS1, preS2, and S genes, have been associated with viral replication, progression of liver disease including HCC, occult HBV infection (OBI), immune escape, and therapy escape [10][11][12][13]. Mutations in the core promoter (CP) have been associated with the severity of liver disease; the G1896A mutation in the precore region creates a premature stop codon at codon 28 and abolishes HBeAg synthesis [10,14].
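Genotype and subgenotype boundaries are usually applied to pairwise nucleotide distances between complete genomes. The Python sketch below illustrates the idea with an uncorrected p-distance over an existing alignment, interpreted with the conventional cut-offs of more than 7.5% divergence between genotypes and more than 4% between subgenotypes. The function names, the choice to skip gap and ambiguous columns, and the reliance on a single pairwise distance (rather than comparison against full reference panels) are simplifying assumptions, not the classification procedure of [4].

```python
def p_distance(seq_a: str, seq_b: str) -> float:
    """Uncorrected p-distance between two aligned nucleotide sequences.

    Columns where either sequence has a gap or an ambiguous base are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in "ACGT" and b in "ACGT":
            compared += 1
            differences += a != b
    if compared == 0:
        raise ValueError("no comparable columns")
    return differences / compared


def divergence_level(distance: float,
                     genotype_cutoff: float = 0.075,
                     subgenotype_cutoff: float = 0.04) -> str:
    """Interpret a pairwise distance with the >7.5% / >4% cut-offs (hypothetical helper)."""
    if distance > genotype_cutoff:
        return "different genotypes"
    if distance > subgenotype_cutoff:
        return "different subgenotypes"
    return "same subgenotype"


# Toy example: 5 mismatches over 100 comparable sites gives a distance of 0.05
d = p_distance("A" * 100, "C" * 5 + "A" * 95)
print(d, divergence_level(d))
```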
Bangladesh has intermediate HBV endemicity, with a chronic HBV carriage rate of 2-6%. The prevalence of chronic HBV among the general population and various high-risk groups, including intravenous drug users, ranges from 0.8% to 6.2% [15,16]. However, data from Bangladesh on the prevalence of HBV genotypes and subgenotypes, recombinant viruses, and virulence- and drug resistance-associated mutations are limited. To address this paucity of data, we performed a prospective cross-sectional study to determine the dominant HBV genotypes and subgenotypes. We additionally assessed the prevalence of recombination, drug resistance-associated mutations, and virulence-associated mutations in HBV in this setting.

Study population
A prospective cross-sectional study was conducted among treatment-naïve patients attending Bangabandhu Sheikh Mujib Medical University (BSMMU), a tertiary care hospital in Dhaka, for HBV viral load testing between June and August 2015. All patients attending BSMMU for an HBV viral load assay who provided written informed consent were eligible for enrollment. Parental or guardian consent was obtained for patients <18 years of age. A systematically selected 50% (every second patient) of the HBV DNA positive patients were enrolled in the present study. Venous blood samples were collected from enrolled patients for biochemical, virological, and molecular testing.
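The enrollment rule (every second HBV DNA positive patient) is a systematic sample with a sampling interval of 2. A minimal Python sketch, assuming patients are listed in order of presentation and that selection starts with the first patient (the starting point is not stated); `systematic_sample` and the patient IDs are hypothetical:

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def systematic_sample(patients: Sequence[T], interval: int = 2, start: int = 0) -> List[T]:
    """Return every `interval`-th patient in order of presentation, beginning at index `start`."""
    if interval < 1:
        raise ValueError("interval must be at least 1")
    if not 0 <= start < interval:
        raise ValueError("start must fall within the first sampling interval")
    return list(patients[start::interval])


# Hypothetical IDs for the 114 HBV DNA positive patients, in order of presentation
positives = [f"P{i:03d}" for i in range(1, 115)]
enrolled = systematic_sample(positives)
print(len(enrolled), enrolled[:3])  # 57 ['P001', 'P003', 'P005']
```

With `start=1` the same rule would enroll P002, P004, and so on; either starting point yields 57 of the 114 patients.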

HBV, HCV, HIV serology
All plasma samples were tested for HBsAg, HBeAg, anti-HBs, anti-HBe, total anti-HBc, and anti-HBc IgM using serological assays according to the manufacturer's recommendations (Beijing Wantai Biological Pharmacy Enterprise Co., Ltd., Beijing, China). Samples were classified as acute HBV infections (HBsAg positive, total anti-HBc positive, anti-HBc IgM positive, anti-HBs negative) or chronic HBV infections (HBsAg positive, total anti-HBc positive, anti-HBc IgM negative, anti-HBs negative) according to the CDC guidelines for the interpretation of hepatitis B serological test results (http://www.cdc.gov/hepatitis/HBV/PDFs/SerologicChartv8.pdf). All samples were screened for anti-hepatitis C virus (HCV) antibody by ELISA (Beijing Wantai Biological Pharmacy Enterprise Co., Ltd., Beijing, China) and for anti-human immunodeficiency virus (HIV) antibody by ELISA (DIALAB ELISA, Biorad, France) according to the manufacturer's recommendations.
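The two serological patterns used to classify infections can be written as a small decision rule. The Python sketch below encodes only the acute and chronic patterns described above and labels every other marker combination as unclassified, which is a simplification of the full CDC interpretation chart; the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SerologyPanel:
    """Qualitative results (positive = True) for the markers used in the classification."""
    hbsag: bool
    anti_hbc_total: bool
    anti_hbc_igm: bool
    anti_hbs: bool


def classify_infection(panel: SerologyPanel) -> str:
    """Return 'acute', 'chronic', or 'unclassified' following the two patterns in the text."""
    # Both patterns require HBsAg and total anti-HBc positivity with anti-HBs negativity
    if not (panel.hbsag and panel.anti_hbc_total) or panel.anti_hbs:
        return "unclassified"
    # anti-HBc IgM then separates acute from chronic infection
    return "acute" if panel.anti_hbc_igm else "chronic"


print(classify_infection(SerologyPanel(True, True, True, False)))   # acute
print(classify_infection(SerologyPanel(True, True, False, False)))  # chronic
```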
PCR amplicons were purified using the QIAamp PCR product purification kit (QIAgen GmbH, Hilden, Germany). The eluted DNA was quantified by a fluorescence-based dsDNA quantification method using the Quant-iT dsDNA Assay Kit on a Qubit fluorometer (Invitrogen). For sequencing, the PCR amplicons from each sample were pooled in equal quantities. One nanogram of pooled DNA from each sample was used for library preparation with the Nextera XT DNA sample preparation kit (Illumina, San Diego, CA, USA), and each sample was assigned a unique barcode sequence using the Nextera XT Index Kit (Illumina). Libraries were sequenced using the MiSeq reagent kit v2 (300 cycles, Illumina) on an Illumina MiSeq platform, with all samples sequenced in a single run. After clipping primer sequences, the Illumina fastq files were assembled with the reference-based mapping tool in the Geneious 8.0.5 software package (Biomatters Ltd, Auckland, New Zealand); that is, the consensus sequence of each sample was obtained by mapping its individual reads to a reference sequence. A minimum variant frequency of 5% and a minimum coverage of 500-fold were used as cut-off values, and all analyses were performed on the dominant (consensus) variants. The resulting sequences were deposited in GenBank under accession numbers MF925358 to MF925410.
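To illustrate how the two cut-offs interact, the Python sketch below takes hypothetical per-position base counts from mapped reads, masks positions below 500-fold coverage, calls the majority base as the consensus, and reports minority bases present at a frequency of at least 5%. This is a generic majority-rule illustration under those assumptions, not the algorithm implemented in Geneious.

```python
from typing import Dict, List, Tuple

Pileup = List[Dict[str, int]]  # one {base: read count} dict per reference position


def call_consensus(pileup: Pileup,
                   min_coverage: int = 500,
                   min_variant_freq: float = 0.05) -> Tuple[str, Dict[int, Dict[str, float]]]:
    """Majority-rule consensus plus minority variants above a frequency cut-off."""
    consensus = []
    minor_variants: Dict[int, Dict[str, float]] = {}
    for position, counts in enumerate(pileup, start=1):
        depth = sum(counts.values())
        if depth < min_coverage:
            consensus.append("N")  # insufficient coverage: leave the position ambiguous
            continue
        dominant = max(counts, key=counts.get)  # ties resolved by dict order
        consensus.append(dominant)
        minors = {base: n / depth for base, n in counts.items()
                  if base != dominant and n / depth >= min_variant_freq}
        if minors:
            minor_variants[position] = minors
    return "".join(consensus), minor_variants


seq, variants = call_consensus([{"A": 950, "G": 50}, {"C": 600, "T": 20}, {"G": 100}])
print(seq, variants)  # ACN {1: {'G': 0.05}}
```

Position 2 passes the coverage cut-off but its minority base (20/620, about 3%) falls below 5%, and position 3 is masked because 100 reads is below 500-fold coverage.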

HBV recombination and phylogenetic analysis
One hundred and three HBV whole genome sequences representing all 10 genotypes, with at least two sequences for each subgenotype, were downloaded from GenBank and combined with the 53 HBV WGS from the current study. These 156 sequences were subjected to recombination and phylogenetic analysis (S1 File) [4,18]. All sequences were analyzed for possible recombination using RDP4 v4.55. Recombination detected by at least 5 of the 7 programs (RDP, Geneconv, Bootscan, Maxchi, Chimaera, Siscan, and Topol) was considered true recombination. Default RDP4 v4.55 settings were used, except that a window size of 300 bp and a step size of 30 bp were used for Bootscan and Siscan. The type of recombination, the recombination breakpoints (start and end points), sequence homology with the major and minor parents, the size of the recombinant fragment, and the location of the recombination were determined.
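The acceptance rule (an event flagged by at least 5 of the 7 detection methods) and the 300-bp window with a 30-bp step can be sketched in Python as follows. The helper names are hypothetical, method names are matched exactly as listed above, and the windows treat the genome as linear, whereas the HBV genome is circular and RDP4 may handle the ends differently; this illustrates the settings, not RDP4's internals.

```python
from typing import Iterable, List, Tuple

# Detection methods as listed in the text
METHODS = ("RDP", "Geneconv", "Bootscan", "Maxchi", "Chimaera", "Siscan", "Topol")


def is_supported(detected_by: Iterable[str], min_methods: int = 5) -> bool:
    """True if at least `min_methods` distinct listed methods flagged the event."""
    return len(set(detected_by) & set(METHODS)) >= min_methods


def sliding_windows(length: int, window: int = 300, step: int = 30) -> List[Tuple[int, int]]:
    """1-based inclusive (start, end) windows across a linear sequence."""
    return [(start + 1, start + window)
            for start in range(0, length - window + 1, step)]


print(is_supported(["RDP", "Geneconv", "Bootscan", "Maxchi", "Chimaera"]))  # True
print(is_supported(["RDP", "Geneconv", "Bootscan", "Maxchi"]))              # False
windows = sliding_windows(3215)
print(len(windows), windows[0], windows[-1])  # 98 (1, 300) (2911, 3210)
```

Counting distinct method names means a single method reporting the same event several times cannot satisfy the rule on its own.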
For phylogenetic analysis, all 156 complete genome sequences were aligned using MUSCLE in the Geneious software package (S1 File) [19]. Phylogenetic analysis was conducted in two stages. In the first stage, the full-length genome sequences were analyzed (data not shown). Because the recombination analysis revealed a highly recombinant population and defined the limits of the region susceptible to recombination, we conducted a second stage of analysis to determine the evolutionary relationships between the isolates. In this stage, two partial sequence alignments were created: the first contained the WGS without the recombination-susceptible region (1-1,272 bp and 2,028-3,215 bp), and the second contained only the recombination-susceptible region (1,273-2,027 bp). The sequence alignments were subjected to jModelTest to identify the best-fitting nucleotide substitution model. The suggested model (GTR+G+I) was subsequently used for phylogenetic analysis with RAxML v7.2.8 (available in the Geneious package). To assess the reliability of the phylogenetic trees, bootstrap resampling and reconstruction were performed with 1000 replicates [20].
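The two partial alignments and the bootstrap pseudo-replicates can be illustrated with the Python sketch below. It assumes that alignment columns map directly onto the 1-based positions given above (no insertion columns shift the numbering) and uses a toy single-sequence alignment; the helper names are hypothetical, and this is not the software used in the study.

```python
import random
from typing import Dict, List, Tuple

Alignment = Dict[str, str]  # sequence name -> aligned sequence
Region = Tuple[int, int]    # 1-based, inclusive coordinates

OUTSIDE_RECOMBINATION: List[Region] = [(1, 1272), (2028, 3215)]
RECOMBINATION_REGION: List[Region] = [(1273, 2027)]


def extract_regions(alignment: Alignment, regions: List[Region]) -> Alignment:
    """Concatenate the given column ranges for every sequence."""
    return {name: "".join(seq[start - 1:end] for start, end in regions)
            for name, seq in alignment.items()}


def bootstrap_replicate(alignment: Alignment, rng: random.Random) -> Alignment:
    """Resample alignment columns with replacement, keeping each column intact across sequences."""
    names = list(alignment)
    n_columns = len(alignment[names[0]])
    columns = rng.choices(range(n_columns), k=n_columns)
    return {name: "".join(alignment[name][c] for c in columns) for name in names}


toy = {"s1": "A" * 1272 + "C" * 755 + "G" * 1188}
outside = extract_regions(toy, OUTSIDE_RECOMBINATION)
inside = extract_regions(toy, RECOMBINATION_REGION)
print(len(outside["s1"]), len(inside["s1"]))  # 2460 755
replicates = [bootstrap_replicate(outside, random.Random(seed)) for seed in range(1000)]
```

Each replicate would then be passed to the tree-building program, and the bootstrap support for a clade is the fraction of replicate trees that recover it.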
All data (sociodemographic, biochemical, and virological) were recorded and analyzed using the Statistical Package for the Social Sciences (IBM SPSS version 23, NY, USA). Pearson's chi-squared test was used to compare qualitative variables, and the Mann-Whitney U test was used for ordinal variables. One-way ANOVA was used for comparisons among three or more groups. A p value <0.05 was considered statistically significant.

Results
From June to August 2015, a total of 274 treatment-naïve patients attending BSMMU for HBV viral load measurement were invited to participate in the study. Among these, 159 patients provided informed consent. Of these 159 patients, 114 were HBV DNA positive, and 57 (50%; every second patient) were enrolled in this study. The sex, demographics, liver enzyme profiles, HBV serology, and hepatitis status of all 57 patients are presented in Table 1. Approximately 80% (47/57) of the patients were male, and the median age was 32.2 years (range 12-65 years; IQR 20 years); 38.6% (22/57) of patients had an acute HBV infection and 61.4% (35/57) had a chronic HBV infection. AST, ALT, and serum bilirubin were significantly higher in those with an acute HBV infection (p<0.05). HBeAg seropositivity was higher among acute than among chronic HBV patients, although the difference was not statistically significant. The median viral load was significantly higher in acute than in chronic HBV infection (2.9×10⁶ versus 3.2×10³; p<0.001) (Table 1). All patients were HCV and HIV negative (data not shown).
Recombination analysis using RDP4 identified evidence of recombination in 22.6% (12/53) of the HBV isolates, including 20.0% (2/10), 16.7% (4/24), and 31.6% (6/19) of genotype A, C, and D isolates, respectively. The recombination events were classified into four groups based on the size of the recombinant fragment and the major parent genotypes: groups 1 (4 isolates), 2 (4 isolates), 3 (2 isolates), and 4 (2 isolates) showed recombination between genotypes C/B, D/C, A/C, and D/B, respectively. The recombinant groups, types, fragment lengths, breakpoints, major and minor parents, and sequence homologies with these parents are shown in Fig 1B, Fig 2, and S1 Table. The majority of the recombination events were identified in the X gene and in the early part of the PC/C gene. Recombination was significantly more prevalent among HBV from chronic infections (32.3% (10/31) versus 9.1% (2/22); p<0.05).
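As a worked check of the chronic versus acute comparison, the Python sketch below computes an uncorrected Pearson chi-squared statistic for the 2×2 table of recombinant and non-recombinant isolates (10 of 31 chronic, 2 of 22 acute) and its p value with 1 degree of freedom, using the identity P(χ²₁ > x) = erfc(√(x/2)). The function name is hypothetical; it uses only the Python standard library and is not the SPSS procedure itself.

```python
import math
from typing import Tuple


def pearson_chi2_2x2(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """Uncorrected Pearson chi-squared test for the table [[a, b], [c, d]] (df = 1)."""
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    chi2 = n * (a * d - b * c) ** 2 / denominator
    p_value = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-squared with 1 df
    return chi2, p_value


# Rows: chronic, acute; columns: recombinant, non-recombinant
chi2, p = pearson_chi2_2x2(10, 31 - 10, 2, 22 - 2)
print(f"chi2 = {chi2:.2f}, p = {p:.3f}")
```

This gives χ² ≈ 3.94 and p ≈ 0.047, consistent with the reported p<0.05.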
NAr mutations were further characterized according to their location in RT domains or interdomain regions. Interdomain mutations were more common than domain mutations among previously reported mutations (74.6%; (11/15)

Discussion
A description of genotypes and subgenotypes is important for a better understanding of the epidemiology, transmission, virulence potential, and clinical outcome of HBV infections [5]. Whole genome sequencing is one of the key requirements for assigning a virus to a subgenotype. Here, we analyzed a collection of HBV genome sequences obtained at a single healthcare facility in Bangladesh. To our knowledge, this is the first study reporting HBV genotypes and subgenotypes in Bangladesh using WGS. Considering the prevalence of chronic HBV and the limited availability of facilities for characterizing HBV in Bangladesh, these genotyping and subgenotyping data from the Bangladeshi population are important for clinical decision making, disease modeling, and health resource allocation for the management of chronic HBV [31]. The majority of the HBV in our study belonged to genotypes C and D, which are associated with a higher risk of HCC and chronic infection than genotypes A and B. Progression to chronic HBV infection has been shown to be more commonly associated with genotypes A and D than with other genotypes [31]. Our data are in agreement with recently published HBV genotyping data from Bangladesh and with data from neighboring countries, including India [32,33]. Genotypes A and D are known to be transmitted horizontally, and more than half (29/53) of the HBV identified here were genotype A or D, suggesting possible horizontal transmission through blood or blood products [31]. Additionally, we observed a high degree of recombination among these HBV isolates; it is not clear whether this recombination occurred as a result of co-infection or superinfection with two genotypes within the patient or whether the patient was infected with a recombinant strain. In half of the recombinant sequences, the recombinant fragment (minor parent) was derived from a genotype C virus.
As the prevalence of genotype C HBV is high in Bangladesh, it is possible that patients were co-infected or superinfected with two genotypes. The majority of the recombination breakpoints identified here were consistent with previous studies, in which recombination breakpoint hotspots have been observed in the X gene and the precore/core gene [34]. Although recombination (HBV B/C) is highly prevalent in HBV genotype B/Ba (B2-B5) from Vietnam, China, Hong Kong, Indonesia, and Thailand [35], genotype D/C and D/B recombinants are relatively infrequent [34].
We chose to select treatment-naïve patients in order to identify preexisting drug resistance mutations that were unlikely to have been influenced by treatment selective pressure. It is not unexpected that the majority of the viruses did not harbor primary or secondary drug resistance-associated mutations, as all patients were treatment naïve. Approximately half of the HBV isolates harbored putative or pretreatment NAr mutations, a third carried a novel mutation, and the prevalence of novel mutations was higher among genotype D viruses. Eleven genotype-dependent amino acid (aa) polymorphic positions were identified for genotypes A, C, and D; similar observations have been reported previously. The cause of the novel NAr-associated aa substitutions and of the genotype-dependent polymorphisms is not known; however, it has been suggested that such mutations may be associated with the evolution and adaptation of HBV in a defined population [21]. We found that isoleucine at rt91 and tyrosine at rt221 were genotype A dependent; however, both (isoleucine at rt91 and tyrosine at rt221) have been reported as putative NAr mutations. Therefore, these potential NAr mutations need further in vitro and in vivo investigation for nucleos(t)ide resistance. Moreover, aa sites in the interdomain regions displayed the highest mutation frequency (S1 Fig). The interdomain regions are likely less crucial for RT function and antiviral resistance, and mutations within them might instead be driven by host immune responses [36]. All genotype D isolates and 8% of genotype C isolates had an 18-bp preS1 deletion. Longitudinal studies have shown that preS deletion mutations arise during the long course of liver disease rather than at the beginning of HBV infection [12]. Such deletions are thought to evolve during long-lasting infections and are associated with a higher risk of HCC.
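Deletions such as the 18-bp preS1 deletion appear as runs of gap characters when an isolate is aligned against a reference that lacks them. The Python sketch below, assuming such a pairwise alignment is already available and using a hypothetical helper name, reports gap runs of at least a chosen length as 1-based alignment columns and flags whether each is in frame. Terminal gaps caused by incomplete sequences would also be reported and would need to be excluded in practice.

```python
import re
from typing import List, Tuple


def find_deletions(aligned_isolate: str, min_length: int = 1) -> List[Tuple[int, int]]:
    """Return (start, end) 1-based alignment columns of gap runs at least `min_length` long."""
    return [(match.start() + 1, match.end())
            for match in re.finditer(r"-+", aligned_isolate)
            if match.end() - match.start() >= min_length]


# Toy aligned isolate with an 18-bp gap after the first codon
isolate = "ATG" + "-" * 18 + "GCAACC"
for start, end in find_deletions(isolate, min_length=18):
    length = end - start + 1
    print(start, end, length, "in frame" if length % 3 == 0 else "frameshift")
```

An 18-bp deletion removes exactly six codons, so it shortens the protein without shifting the downstream reading frame.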
We found that the HCC-associated mutations A168V and V184A were significantly more frequent in genotype D isolates [12]. Analysis of the major hydrophilic region (MHR) showed that 17% of the isolates had mutations in the "a" determinant region. These mutations can affect the antigenicity of HBsAg and have been shown to be responsible for false-negative results in commercial HBsAg assays, evasion of anti-HBV immunoglobulin therapy, and evasion of vaccine-induced immunity. Such "vaccine-escape" mutants are more common in countries with high rates of endemic infection and universal immunization programs [23]. Mutations in the BCP region, specifically the A1762T/G1764A double mutation, have been suggested to be closely associated with HCC [13]. One-third of the isolates in the present study, across all genotypes, harbored these mutations, indicating the presence of HBV with an increased risk for HCC in Bangladesh. The G1896A mutation in the precore region creates a stop codon at codon 28 and has been shown to be associated with fulminant hepatitis [10]. Approximately half of the genotype D isolates in this study harbored this mutation, indicating a potential for hepatic flares in these patients.
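Screening for the BCP double mutation (A1762T/G1764A) and the precore G1896A stop mutation amounts to checking fixed nucleotide positions. The Python sketch below assumes each consensus genome is a string already placed in the conventional numbering that starts at the EcoRI site, so that string index = position - 1; real genomes would first need alignment to a numbered reference because of genotype-specific insertions and deletions. The table and function names are hypothetical.

```python
from typing import Dict, Tuple

# name -> (1-based position, wild-type base, mutant base)
HOTSPOTS: Dict[str, Tuple[int, str, str]] = {
    "A1762T": (1762, "A", "T"),
    "G1764A": (1764, "G", "A"),
    "G1896A": (1896, "G", "A"),
}


def screen_hotspots(genome: str) -> Dict[str, bool]:
    """True for each hotspot where the genome carries the mutant base."""
    genome = genome.upper()
    calls = {}
    for name, (position, _wild_type, mutant) in HOTSPOTS.items():
        if position > len(genome):
            raise ValueError(f"genome too short to cover position {position}")
        calls[name] = genome[position - 1] == mutant
    return calls


def has_bcp_double_mutation(calls: Dict[str, bool]) -> bool:
    """Both BCP positions must carry the mutant base."""
    return calls["A1762T"] and calls["G1764A"]


# Toy 3,215-nt genome carrying only the BCP double mutation
bases = ["C"] * 3215
bases[1761], bases[1763], bases[1895] = "T", "A", "G"
calls = screen_hotspots("".join(bases))
print(calls, has_bcp_double_mutation(calls))
```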
This study has limitations. First, specimens were collected over a short period from a single tertiary care hospital and may not be representative of the population of Dhaka or of Bangladesh as a whole. Second, the lack of data on the clinical presentation of the patients from whom samples were collected limits conclusions about the clinical relevance of the viral subgenotypes. Longitudinal studies of patients infected with specific subgenotypes are needed to fill this knowledge gap.
Supporting information S1 File. The HBV reference genome sequences and the sequences isolated from Bangladesh in this study used for phylogenetic analysis. Genotype, subgenotype, GenBank accession number, and country of origin of the reference sequences used for phylogenetic analysis. BD_HBV followed by isolate number, GenBank accession number, and subgenotype of the HBV isolates from this study.