Genetic diversity, haplotype analysis, and risk factor assessment of hepatitis A virus isolates from the West Bank, Palestine during the period between 2014 and 2016

Background Hepatitis A virus (HAV) infection is one of the major causes of acute viral hepatitis. HAV genotypes and its genetic diversity is rarely investigated in our region as well as worldwide. Aims The aims of the present study were to determine the HAV genotypes and its risk factors and to investigate the genetic diversity of the HAV isolates in the West Bank, Palestine. Study design A cohort of 161 clinically and laboratory-confirmed HAV (IgM-positive) cases and 170 apparently healthy controls from all the districts of the West Bank, Palestine during the period of 2014 to 2016 were tested for HAV infection using IgM antibodies, RT-PCR and sequence analysis of the VP3/VP1 junction region of the HAV genome. Phylogenetic analysis, genetic diversity and haplotypes analysis were used to characterize the VP3/VP1 sequences. Results All the 34 sequences of the HAV were found to be of HAV-IB sub-genotype. The phylogenetic analysis showed four main clusters with cluster III exclusively consisting of 18 Palestinian isolates (18/23-78%), but with weak bootstrap values. A high haplotype diversity (Hd) and low nucleotide diversity (π) were observed. Cluster III showed high number of haplotypes (h = 8), but low haplotype (gene) diversity (Hd = 0.69). A total of 28 active haplotypes with some consisting of more than one sequence were observed using haplotype network analysis. The Palestinian haplotypes are characterized by closely related viral haplotypes with one SNV away from each other which ran parallel to cluster III in the phylogenetic tree. A smaller Palestinian haplotype (4 isolates) was three SNVs away from the major haplotype cluster (n = 10) and closer to others haplotypes from Iran, Spain, and South Africa. Young age, low level of parent’s education, infrequent hand washing before meals, and drinking of un-treated water were considered the major HAV risk factors in the present study. Conclusion Haplotype network analysis revealed haplotype variation among the HAV Palestinian sequences despite low genetic variation and nucleotide diversity. In addition, this study reconfirmed that age and parent’s level of education as HAV risk factors, while hand washing and treating drinking water as protective factors.


Introduction
Hepatitis A virus (HAV) is "a non-enveloped RNA virus belonging to the family Picornaviridae, genus Hepatovirus". HAV is one of the major causes of acute hepatitis worldwide and contributes to substantial morbidity in both developed and developing countries. Based on HAV genome sequencing, human HAV have been classified into three genotypes, HAV I, II, III and sub-divided into 6 sub-genotypes (IA, IB, IIA, IIB, IIIA, IIIB) [1].
Worldwide, the incidence rate of the HAV infection is underestimated due to the clinical presentations of this disease, since infection at early childhood is largely passed asymptomatic or has mild forms [2]. Recent data showed that the global incidence of HAV is 1.9% with an estimate of 119 million cases of HAV infection [3,4]. Approximately 1.4 million new cases of HAV reported each year with up to 22% of the cases being hospitalized [5]. Palestine (West Bank, and Gaza Strip) was classified as an area of high endemicity of HAV infection [6]. The Palestinian official figures put the HAV infection incidence rate as high as 9.5-85 per 100,000 during the period between 2000 and 2018 [7]. Actual incidence rate is thought to be higher due to underreporting and asymptomatic cases.
HAV is transmitted mainly by the fecal-oral route, consumption of contaminated water and food and to a lesser extend from person to person or via blood transfusion [8][9][10]. Poor hygiene and sanitation practices are reported as the major risk factors for HAV infection, particularly in low and middle-income countries [11].
The World Health Organization (WHO) classified the HAV endemicity based on anti-HAV IgG antibodies as follows: high � 90% IgG seroprevalence by 10 years of age, intermediate � 50% IgG seroprevalence by 15 years of age, < 90% IgG seroprevalence by 10 years of age, and low � 50% IgG seroprevalence by 30 years of age, < 50% IgG seroprevalence by 15 years of age [12].
The HAV infection in children is either asymptomatic or with mild symptoms. On the contrary, infection in adults is more frequently occurring with symptoms. In high endemic regions, HAV infection is acquired in early age of childhood and most adult is positive for anti-HAV IgG with a life-long immunity, whereas, in low endemic countries, most adult population is susceptible to infection [3].
A recent review investigated and analyzed data based on anti-HAV seroprevalence in the Middle East and North Africa (MENA) countries reported a gradual shift in the age of HAV infection from early childhood to late child and adulthood and indicating a shift towards intermediate endemicity in these countries in general [6]. However, based on a single old study conduct in Gaza Strip, Palestine, between 1995 and 2001, Palestine is still considered a high endemic country. Therefore, no solid conclusion has been made regarding the current endemicity levels of HAV and vaccine recommendations in Palestine [13].
The aims of this study were to determine the HAV genotypes, the risk factors associated with HAV infections and to visualize the genealogical relationship between intraspecific HAV individual genotypes at the population level and exploring the genetic diversity between the HAV sequences.

Study sample
This case-control, cross-sectional study design, comprised of 161 clinically and laboratory confirmed HAV (IgM-positive) cases and 170 apparently healthy controls from all the districts of the West Bank, Palestine during the period of 2014-2016. All the 161 HAV cases included in the study were randomly-selected from clinically and laboratory-confirmed (IgM-positive) pool of cases tested for HAV-IgM antibodies at the Central Public Health Laboratory, Palestinian Ministry of Health. In parallel, the 170 healthy controls were randomly-selected individuals attending the primary health care centers in the Palestinian Ministry of Health, for routine medical intervention without any apparent symptoms of HAV infection. Cases and controls were matched for number of subjects. A questionnaire was used to collect demographic, socioeconomic status, behavioral and medical data including age, sex, residence, education, working status, housing details, toilet facilities (flush vs. pit), drinking water sources (pipe-public net sources vs. collected well or local spring), available used sewage systems (public net system vs. holes), income, educational level of parents, and other socio-economic data (S1 and S2 Questionnaires). The study was approved by the Palestinian Ministry of Health under the reference number 145/1541/2014. Verbal informed consent was obtained from all patients and control group or their guardians in case of minors. The data was analyzed anonymously. Pregnant women and non-communicable disease patients were excluded from the study.

Clinical samples
Five milliliters blood were collected from each participant in a plain tube. The serum samples were separated by centrifugation at 4000 rpm for 10 minutes. Then, the serum was separated into two 1.5ml micro-tubes, one for serological assays and the other for RNA extraction. The serum samples were kept at -20 o C until use.

Serological assays
All serum samples of both the HAV cases and the healthy subjects were tested for HAV-IgM antibodies at the Central Public Health Laboratories, Palestinian Ministry of Health using a chemiluminescent microparticle immunoassay (CMIA) for the qualitative detection of IgM antibody (Architect, Abbott-USA) according to manufacturer's instruction. Briefly, in the first step, the prediluted sample, assay diluent, and hepatitis A virus (human) coated paramagnetic microparticles are mixed and washed. The anti-HAV IgM binds to the anti-human acridiumlabeled conjugate that that was added in the second step. Following a second wash, pre-trigger and trigger solutions were added to the reaction mixture. The resulting chemiluminescent reaction was measured as relative light units (RLUs). The presence or absence of IgM anti-HAV in the sample was determined by comparing the chemiluminescent signal in the reaction to the cutoff signal determined from an Architect anti-HAV IgM calibration.

Molecular assays
Extraction of viral RNA. The HAV viral genome was extracted from 200μl of serum samples, using a QIAamp Mini Elute Virus spin kit for the viral RNA/DNA extraction (QIAGEN, Germany) according to the manufacturer's instructions. Briefly, 200μl of serum were added to 20μl of proteinase K and 200μl of lysis buffer with carrier RNA. The mixture was incubated at 56˚C for 15 minutes using heat block. Then, the whole mixture was transferred into QIAamp Mini column and centrifuge for 1 minute at 8000 rpm. The QIAamp mini column was washed twice using washing buffer 1 and 2. Finally, the RNA was eluted in 1.5 ml micro-tube using 40μl diethylpyrocarbonate (DEPC) water. The RNA extract was stored according to manufacturer's instruction until use.
Reverse transcriptase-polymerase chain reaction (RT-PCR). The VP3/VP1 junction of the HAV genomes of the 136 serum samples from IgM-positive patients were amplified using the primer pair described by Lee et al., (2012) [14]. Briefly, the synthesis of the cDNA and the RT-PCR were carried out in a 25μl reaction mixture containing 4μl of viral RNA extraction, 10U of Avian Myeloblastosis Virus (AMV) reverse transcriptase, 10pmol of each the forward and the reverse primers: (HAV1; 5'-GCTCCTCTTTATCATGCTATGGAT-3' and rHAV2; 5'-CAGGAAATGTCTCAGGTACTTTC-3') and 12.5μl of PCR Reddy master mix (Thermo-Fisher Scientific). PCR products (6μl) were loaded into a 2% agarose gel, electrophoresed, and stained with ethidium bromide for band visualization at an expected length of 244bp using the Gel Doc System 2000 (Bio-Rad Laboratories-Segarate, Milan, Italy). Of the 136 PCR-positive samples, 34 representative PCR amplicons were selected randomly for sequence analysis. Both the RNA extraction and the RT-PCR were carried out at the research laboratory, the Arab American University-Palestine, in the Medical Laboratory Sciences Department. The PCR amplicons of the 34 samples were purified and sequenced using the new ABI Big Dye Terminator Cycle Sequencing Ready Reaction kit (Applied Biosystems) using an automated sequencer (ABI 3730xl DNA Analyzer, Applied Biosystems, Foster City, CA, USA). The HAV identity search was conducted using GenBank Basic Local Alignment Search Tool (BLAST) http://www.ncbi.nlm.nih.gov/BLAST/Blast.cgi.

Risk assessment analysis
The EpiInfo, a free CDC statistical package, was used to analyze data. Odds ratio, 95% confidence interval, and Fisher's exact test were used to assess the risk factors of HAV infection. Pvalue was considered significant when less than 0.05.

Genetic diversity analysis
The GenBank-deposited sequences produced by the present study along with sequences retrieved from the Genbank were aligned using the MEGA version X [15]. Maximum likelihood phylogenetic tree with 1000 iterations for bootstrapping was constructed using MEGA version X.
Population nucleotide diversity indices such as nucleotide diversity per site (π), average number of nucleotide differences (k), mean genetic diversity (Hd), genetic differentiation parameters (Fst and Nm) and neutrality tests including Tajima's D and Fu Li's F test were calculated using DnaSP ver. 6.12.03 [16]. The PopArt 1.7 [17] was used to construct a medianjoining haplotype network analysis based on country of source of viral genomes to estimate relationship between haplotypes using nexus input files produced by DnaSP version 6.12.03. Haplotype network analysis was colored and edited using a free open-source vector graphics editor called Inkscape 1.0 (www.inkscape.org). The haplotype analysis was double checked by reconstructing the median joining tree using Network 10 (https://www.fluxusengineering. com/) with RDP input file generated by DnaSP ver. 6.12.03 and default parameters of the software including epsilon value of zero and the connection cost method of Röhl for genetic distance calculation [17]. Greedy FHP method for the genetic distance calculation was also used [18].

Characteristics of study population
A total of 331 individuals were included in the study with a median age of 15 years and a 1:1 female-to-male ratio. The HAV case group were from all 11 Palestinian districts in the West Bank-Palestine with 35% (56/161) from the Al-Khalil district. All the 34 sequenced samples were shown to be of HAV-IB genotype. Twenty-three sequences were deposited in the gene Bank ( Table 1). The remaining 11 sequences were poorly sequenced yielding upon trimming sequences less than 200 nucleotides, which is the minimum sequence size accepted by the Gen-Bank. These 11 sequences were dropped from the genetic diversity analysis.

Risk assessment
The overall, 331 participants (161 cases and 170 health individuals) were analyzed for the HAV risk factors (Table 3). Young age as a demographic variable was found to be significantly associated with HAV infection (OR = 7.16, CI: 4.41-11.63, P<0.0001). Level of parent's education was the only socioeconomic risk factor significantly associated with HAV infections (OR = 4.72, CI: 2.64-8.41, P < 0.0001). Furthermore, treating drinking water and washing hands before meals as hygiene and behavior risk variables were found to be statistically significant (R = 0.11, CI: 0.01-0.965, P< 0.03 and R = 0.24, C:0.15-0.39, P<0.0001) ( Table 2).

Phylogenetic analysis
Phylogenetic tree of the VP3/ VP1 junction region of HAV genome was constructed from 23 HAV GenBank-deposited sequences from the current study along with 28 sequences retrieved from the Genbank. The phylogenetic tree showed four main clusters with cluster III exclusively consisting of 18 Palestinian isolates. Cluster I consisted of 12 strains, of which four were isolated in Palestine (Fig 1). However, the four clusters were weakly supported by bootstrap values.

Genetic differentiation and diversity
Population nucleotide diversity indices and neutrality tests were calculated for the VP3/VP1 junction region of the HAV genome, based on phylogenetic clusters (Tables 3 and 4). A high haplotype diversity (Hd) and low nucleotide diversity (π) were observed ( Table 3). The total haplotype (gene) diversity (Hd) for the 23 HAV sequences from Palestine and 28 from the Genbank was 0.93± 0.07. At the same time, the total nucleotide diversity per site (π) was 0.01± 0.002, confirming low genetic diversity in the HAV study bulk. The average number of nucleotide differences between any two sequences (k) was 1.7 which is very low. The DnaSP ver. 6.12.03 estimated the total number of haplotypes for the four probable cluster at 24 with highest in cluster I (h = 9) ( Table 3). Cluster III is composed of 18 isolates, exclusively Palestinian isolates, while one third of cluster I was from Palestine. The Palestinian cluster (III) showed high number of haplotypes (h = 8), but lowest haplotype-to-sequence (h:n) ratio (0.4:1), compared to the other clusters and lowest haplotype (gene) diversity (Hd = 0.69). Haplotype diversity (H d ), and number of segregating (polymorphic) sites (S) were highest in cluster I which included Palestinian strains (4/12) along with those from other countries (Fig 1); confirming the highest level of genetic diversity between the four probable clusters. Nucleotide diversity (π) is equally low in all clusters (0.01± 0.002). The average number of nucleotide differences (k) is also equal in all clusters except cluster I. Tajima Table 3). Inter-population pairwise genetic distance (Fst) between the four HAV probable populations ranged from 0.46 to 0.65 with Nm value from 0.11 to 0.12 (Table 4)  differentiation and minimal migration and gene flow (Nm) between subpopulations. Fst values for cluster III which is purely Palestinian compared to clusters I, II and IV were highest (0.52, 0.52 and 0.63, respectively). However, genetic differentiation among subpopulation is generally low especially between clusters I and IV (Gst = 0.073). The genetic differentiation between clusters is low as supported by other low genetic differentiation parameters including Gst, Da, and Dxy (Table 4). Fst: Wright's F-statistics, pairwise genetic distance [22], Nm: Gene flow and population migration among populations [22,23], Kxy: Average proportion of nucleotide differences between populations. Dxy: The average number of nucleotide substitutions per site between populations [20], Da: The number of net nucleotide substitutions per site between populations [20], Gst: Genetic differentiation index based on the frequency of haplotypes [24].

Haplotype network analysis
The median-joining haplotype network constructed by PopArt 1.7 using the 49 taxa produced a total of 28 active haplotypes with some of which consisting of more than one sequence (2-10 sequences) (Fig 2A). Peripheral haplotypes mainly had single nucleotide variation (SNV) from central haplotypes (Fig 2A). The Palestinian haplotypes (red circles) are characterized by closely related viral haplotypes with one SNV away from each other. The Palestinian haplotypes (red) formed a cluster of nodes (n = 9) surrounding a major node (haplotype) consisting of 10 identical HAV sequences which ran parallel to cluster III in the phylogenetic tree (Fig 1). A smaller Palestinian HAV haplotype consisting of four sequences was three SNVs away from the major haplotype cluster (n = 10) and closer to haplotypes from Iran, Spain, and South Africa which again matches cluster I in the phylogenetic tree. The reconstruction of median joining haplotype network analysis using Network 10 revealed similar haplotype network analysis profile by both distance calculation methods, the Röhl method and the Greedy HPF method (Fig 2B).

Discussion
In this study, young age, low level of parent's education significantly increased the odds of HAV infection (OR>1, P<0.05), while the significantly low odds (OR<1, P<0.05) with the increase of hand washing before meal and treating of drinking water indicated the decrease of HAV infection suggesting these two variables as protective factors. On the contrary, Koroglu et al., (2017) reported that water and sanitation were not significant risk factors for HAV infection, whereas, gross domestic product (GDP), gross national income (GNI), and the human development index (HDI) were all highly associated with HAV infection rate in the Middle East and North Africa [6]. Furthermore, Hayajneh et al., (2015), reported that the incidence rate of HAV infection in Jordan decreased due to increase in level of maternal education, use of processed bottled drinking water and good sanitation practice [25][26][27]. Our results were in congruence with other studies that reported personal hygiene (hand washing before food preparation, cooking, eating and after defecation), living on crowded campuses, drinking unprocessed water, were the major risk factors of HAV infections [28][29][30][31]. In addition, political conflict in Palestine and the occupation of the West Bank and Gaza Strip, Palestine as well as the wars in neighboring Middle East countries disturbed the environmental and health systems and subsequently increased the spreading of infectious disease outbreaks, including HAV, not only in source countries but also in neighboring regions that host refugees.
To the best of our knowledge, the present study is the first to use genetic diversity indices and haplotype analysis networking to analyze HAV genetic variation from clinical sample. All of the 34 sequenced samples in the present study proved to be of the sub-genotype IB, which is the predominantly circulating genotype in the Mediterranean region from both clinical and environmental samples such as Spain, Jordan, Egypt and Turkey [32][33][34][35].
The maximum likelihood phylogenetic tree of the 23 HAV-IB sequences in the present study with other 28 HAV sequences retrieved from the Gene Bank, identified at least four clusters with weak bootstrap values. Although most (78%) of the isolates in 2015 and 2016 distinctively clustered in clade III, four isolates detected between the years 2014 and 2016 intermixed with other HAV-IB isolates from Turkey, Spain, South African, Iran and Uruguay in clusters I. One isolate clustered uniquely by its own (Fig 1). Similar results had been described in a Bulgarian study which reported the splitting of HAV-IB into several clusters with few cases intermixing between Bulgarian and European isolates while others formed a unique Bulgarian cluster [4]. Furthermore, Wang et al., (2013) delineated HAV isolates from the same area and during the same period to have clustered to several closely related lineages with low genetic diversity, suggesting either they possess a fitness advantage in the region, or an endemic transmission of closely related strains circulating in the neighboring regions [36]. It is more likely that the four Palestinian isolates in cluster I and the lonely clustering Palestinian isolate were contracted from imported food or travelers coming from different endemic areas, whereas, cluster III with purely Palestinian isolates indicates a common source epidemic that swept over the area during the period of study in 2015 and 2016. In addition to the weak bootstrap values, the genetic diversity indices such as total nucleotide diversity per site (π), average number of nucleotide differences between any two sequences (k), and neutrality indices supported the low population nucleotide diversity among HAV sequences. This is consistent with the low antigenic variability of HAV, which is reflected by the existence of a single serotype [36].
The negative values of Tajima's D and Fu-Li's F indicate the amount of nucleotide variation observed (π) between HAV isolates is much less than expected (θ) which means low nucleotide variation. HAV clusters I and III with relatively high negative values for Tajima's D and Fu'Li's F statistics and low Hd (0.69) may have been subjected to recent population expansion or selection variation events that reduce genetic diversity such as selective sweep or bottleneck. This low variation between HAV clusters was further supported by the equal values of nucleotide diversity (π) in almost all cluster and the same applied to the average number of nucleotide differences (k). On the other hand, signs of polymorphism and nucleotide variation can be discerned in cluster IV which had positive Tajima's D value (0.17). This cluster consisting of European HAV isolates can be argued to have begun a balancing selection where cluster IV is on the brink of developing into a distinct population ( Table 3). The nucleotide intra-variation within the same cluster might be the result of a locally-generated mutant in the course of HAV epidemic with such minor variations originating from different selective pressures such as viral response to host defense mechanisms or viral replication in the face of physiological alterations [36]. Such data had been supported by recent published data from Bulgarian study which showed that the sequence analysis of HAV-IA subgenotypes were either identical or showed very few (1 to 4) nucleotide variations [37].
The fixation indices such as Fst confirmed genetic differentiation between the four HAV probable populations with minimal migration between populations, thus reducing gene flow (Nm) between the clusters (Fst > 0.25 and Nm<1).
The total number of haplotypes produced by the haplotype analysis was 28. Indeed, several haplotypes (4-9) were observed in each phylogroup. This proves that despite low genetic diversity indices, haplotypes networking can be a good surrogate for unraveling the diversity among monomorphic population and even within the same phylogenetic group which is due to the fact that haplotype network analysis is based on detecting the single nucleotide variation (SNV) among studied sequences making it a powerful tool to detect genetic and haplotype variation. Interestingly, our study showed that the number of haplotypes in the clusters containing the Palestinian isolates, I and III, were the highest with h = 9 and 8, respectively ( Table 3). The high number of haplotypes in cluster III indicates that the Palestinian HAV isolates, despite being from a geographically confined area, are more diverse and heterogeneous than those from Europe (cluster IV). However, despite the high number of haplotypes among our isolates, in comparison to the European cluster, the nucleotides variation is very low (1-2 nucleotides) which might be the result of a common origin coming from recent population expansion or selective sweep or genetic bottleneck.
Despite the high number of Palestinian haplotypes (red circles in Fig 2), but they are characterized by closely related viral haplotypes with 1-2 SNVs away from each other reflecting how close these haplotypes are. Cluster I is the most heterogeneous as it had the highest number of haplotypes (h = 9), the haplotypes �2 SVNs apart from each other, and representing isolates from Asia, Europe, South America and Africa.
Moreover, the identical median-joining haplotype networks constructed by PopArt 1.7 and Network 10.0 (Fig 2) supported the low-bootstrap value phylogroups created by the maximum likelihood phylogenetic tree using MEGA version X (Fig 1) and running parallel with genetic and haplotype diversity parameters calculated by DnaSP ver. 6.12.03.
It is noteworthy, that even though a variety of genetic and haplotype diversity indices and several genetic differentiation parameters were calculated and networks and trees constructed, some variation may have been missed due to different reasons such as missing nucleotide during sequencing and insufficient number of sequences from study area and from Genbank.
In conclusion, this study confirmed that based on VP3/VP1 junction region of the HAV genome; HAV isolates possesses low genetic variation and nucleotide diversity, albeit haplotype network analysis revealed haplotype variation among the Palestinian sequences. Furthermore, the study reconfirmed that age and parent's level of education are considered as HAV risk factors, while hand washing before meals and treating drinking water as protective factors.