Hepatitis B virus genotypes A1, A2 and E in Cape Verde: Unequal distribution through the islands and association with human flows

Hepatitis B virus (HBV) diversity has not previously been studied in Cape Verde. The archipelago was discovered in 1460 by Portuguese explorers, who brought African slaves to colonise the islands. In this study, we investigated the characteristics of HBV from 183 HBsAg-positive Cape Verdean individuals. Phylogenetic analysis of the pre-S/S region and the full-length genomes revealed 54 isolates with HBV/A1 (57%), 21 with HBV/A2 (22%), 19 with HBV/E (20%), and one with HBV/D (1%). HBV genotypes and subgenotypes were unequally distributed across the islands. In São Vicente, the main northern island, most isolates (84%) belonged to the African-originated HBV/A1, with the remaining isolates belonging to HBV/A2, which is prevalent in Europe. Interestingly, the HBV/A1 isolates from São Vicente were closely related to Brazilian sequences within the Asian-American clade, which suggests the dissemination of common African ancestors through the slave trade. In contrast, in Santiago and the nearby southern islands, which have received a recent influx of different populations, a higher diversity of HBV was observed: HBV/A1 (40%); HBV/E (32%); HBV/A2 (28%); and HBV/D (1%). HBV/E is a recent genotype disseminated in Africa that was absent in the era of the slave trade. African and European human flows at different times in history may explain the HBV diversity in Cape Verde. The possible origin and specific features of each HBV genotype circulating in Cape Verde are discussed.


Introduction
Hepatitis B virus (HBV) infection remains a major cause of liver disease worldwide. It is estimated that two billion people have been infected with HBV and more than 240 million are chronic carriers [1]. HBV infection is highly endemic in Asia and Africa. Among African countries, the prevalence of HBV surface antigen (HBsAg) varies from 5% to as much as 15% [2][3][4]. Despite this, HBV epidemiology is still poorly documented in most African countries [5]. [...] with unique features, such as a second start codon in the pre-S1 region and the rare ayw4 serotype [5]. Because of its low genetic variability and the fact that this genotype is found exclusively in Africa or in people of African descent, Mulders and colleagues [35] suggested that the introduction of genotype E into the human population is a recent event that occurred from the mid to the late 19th century, when the slave trade was over. Despite the possibly recent introduction of HBV/E as a human pathogen, different clusters of HBV/E have been identified [36,37]. The aim of this study is to associate the phylogenetic data of HBV with historical facts in order to shed light on the origins and diversity of HBV subgenotypes in Cape Verde. The characteristics and specific features of each HBV genotype circulating in Cape Verde are discussed.

Ethics statement
The Cape Verde National Ethics Committee in Health and Research and the Research Ethics Committee of Oswaldo Cruz Institute, Rio de Janeiro, Brazil approved the study. Serum samples that tested positive for HBsAg were stored at -20 °C in local Cape Verdean health units, coded, handled anonymously, and sent on dry ice to Brazil for HBV molecular analysis. The results of the present research were made available to the Cape Verde National Ethics Committee and to the local health services where the serum samples were stored.

Serum samples, viral DNA extraction and PCR amplification
Blood samples were collected between 2010 and 2016 from individuals living on the islands of Boa Vista, Fogo, Maio, São Vicente, Santiago and Santo Antão. Sera were stored at -20 °C in the only two public hospitals able to perform HBsAg serology of suspected cases, located in Praia (Santiago) and Mindelo (São Vicente). Samples from 183 individuals who tested positive for HBsAg by ELISA (Monolisa HBsAg ULTRA, Bio-Rad Laboratories, France) were randomly selected.
HBV-DNA was extracted from 200 μl of serum with the High Pure Viral Nucleic Acid Kit (Roche Diagnostics, Mannheim, Germany) according to the manufacturer's instructions. HBV-DNA from the pre-S/S region was PCR-amplified in a semi-nested reaction as previously described [38]. The first round of amplification was performed with the PS1-P3 oligonucleotide primers. The second round of amplification was performed with the sense primer PS1 and a mixture of two antisense primers, S2 and S22 (S1 Table). Serum samples positive for pre-S/S PCR amplification were further subjected to amplification of the whole genome using primers P1 and P2 (S1 Table) as previously described [39].

Nucleotide sequencing
PCR products from the pre-S/S region and full-length HBV genomes were purified from agarose gel after electrophoresis and directly sequenced (Big Dye Terminator v3.1 Cycle Sequencing kit, Applied Biosystems, Foster City, CA), using HBV internal primers (S1 Table) as described previously [38,40]. Sequencing reactions were analysed on an ABI 3730 automated sequencer (Applied Biosystems). Sequence alignments were performed by the Clustal X programme using reference sequences of all HBV genotypes/subgenotypes, as previously described [41]. Pre-S/S and full-length sequences of each genotype/subgenotype from Cape Verde were further aligned with representative genomes for which complete genome and geographic localisation were available in GenBank, as previously described [32]. Nucleotide alignments were performed by Clustal in MEGA version 6 software. This software was also used to calculate genetic distances and to deduce the amino acids of each genomic region. Phylogenetic trees were constructed by the maximum likelihood method inferred with the PhyML programme [42] using an online web server [43] under the GTR + I + G nucleotide substitution model selected using the jModeltest v.2 programme and the SPR (Subtree Pruning and Regrafting) branch-swapping algorithm for heuristic tree search. The consistency of the tree topology was estimated with an approximate likelihood-ratio test [44] based on a Shimodaira-Hasegawa-like procedure.
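As a rough illustration of the genetic distance calculation mentioned above, the following minimal sketch computes the mean pairwise p-distance (percentage of differing sites) for a set of aligned sequences. The distance model used in MEGA is not stated in the text, so the uncorrected p-distance, the function names and the toy sequences are assumptions made for illustration only.

```python
from itertools import combinations


def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of differing sites between two aligned sequences.

    Columns where either sequence has a gap ('-') or an ambiguous
    base ('N') are skipped (pairwise deletion).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in "-N" or b in "-N":
            continue
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


def mean_pairwise_distance(sequences: list[str]) -> float:
    """Mean p-distance over all sequence pairs, as a percentage."""
    pairs = list(combinations(sequences, 2))
    if not pairs:
        raise ValueError("need at least two sequences")
    return 100.0 * sum(p_distance(a, b) for a, b in pairs) / len(pairs)


# Hypothetical toy alignment (not real HBV data).
toy_alignment = [
    "ACGTACGTAC",
    "ACGTACGTAT",
    "ACGAACGTAC",
]
print(round(mean_pairwise_distance(toy_alignment), 1))
```

Real HBV alignments are about 3.2 kb long, and a model-corrected distance would give slightly larger values than the raw p-distance for the same data.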

Statistical analysis
Categorical variables were compared using Fisher's exact and chi-square tests on contingency tables. All statistical analyses were performed using GraphPad software. Differences were considered statistically significant when the p-value was less than 0.05.
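The kind of contingency-table comparison described here can be sketched as follows, using the HBV/A1, HBV/A2 and HBV/E counts reported in the Results for São Vicente and for Santiago with the nearby southern islands. This is only an illustrative recomputation under stated assumptions: the authors used GraphPad, the exact table they tested is not given, and the single HBV/D isolate and the Boa Vista samples are left out here. The function name is hypothetical.

```python
import math

# Genotype counts taken from the Results text:
# rows = regions, columns = HBV/A1, HBV/A2, HBV/E.
observed = {
    "São Vicente": [32, 6, 0],
    "Santiago and nearby southern islands": [21, 15, 17],
}


def chi_square(table: list[list[int]]) -> tuple[float, int]:
    """Pearson chi-square statistic and degrees of freedom."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, count in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (count - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(col_totals) - 1)
    return statistic, dof


statistic, dof = chi_square(list(observed.values()))
# For exactly 2 degrees of freedom the chi-square tail probability
# has the closed form exp(-x / 2); other dof need a library routine.
assert dof == 2
p_value = math.exp(-statistic / 2)
print(f"chi2 = {statistic:.2f}, dof = {dof}, p = {p_value:.1e}")
```

With these counts the p-value falls below 0.0001, which is in line with the significance level reported for the unequal distribution of genotypes.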

Results
The demographic profile of the 183 HBsAg-positive subjects studied here is shown in Table 1. Among the individuals whose gender was known, 104 (59%) were men, and 73 (41%) were women. Their ages ranged from five to 74 years with an average of 35.5 years and a median of 34 years. Only 30 (16%) individuals were more than 51 years old. Adults (20-50 years old) comprised approximately 74% of the studied population.
Among the 145 individuals who gave information on their educational status, 90 (62%) had never attended school or had at most completed their first educational level. Only 22/145 (15%) had attended upper secondary/tertiary studies. Ninety-seven HBsAg-positive samples were from individuals who lived in Santiago, the main island in the southern region of the archipelago. Three additional samples were collected in southern islands close to Santiago (Brava, Fogo and Maio). São Vicente, with 77 collected HBsAg-positive samples, is the second most important island and is located in the northern region of the archipelago. One sample was collected in Santo Antão. Finally, five samples came from Boa Vista, the easternmost island of Cape Verde. HBV-DNA could be detected in 126 of the 183 (69%) serum samples. There were no significant differences in HBV-DNA positivity with regard to the demographic characteristics of the participants.
The classification of HBV sequences into genotypes and subgenotypes was performed after alignment and construction of a phylogenetic tree of all 95 pre-S/S sequences obtained here, in combination with reference strains from all of the genotypes/subgenotypes of HBV. Among these isolates, HBV full-length genomes were successfully PCR-amplified from 19 isolates. Four different genotypes/subgenotypes were found. HBV/A1 was the most prevalent (n = 54, 57%), followed by HBV/A2 (n = 21, 22%) and HBV/E (n = 19, 20%). One HBV/D isolate was also detected (Table 1). As expected, the deduced amino acids of the small S proteins of all HBV/A isolates corresponded to the adw2 serotype. HBV/E isolates were of the ayw4 serotype and the HBV/D isolate was ayw2.
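The proportions quoted above can be rechecked with a few lines of arithmetic. The sketch below simply recomputes the rounded percentages from the counts given in the text; the variable names are illustrative.

```python
# Counts as reported in the text.
hbsag_positive = 183
hbv_dna_positive = 126
genotype_counts = {"HBV/A1": 54, "HBV/A2": 21, "HBV/E": 19, "HBV/D": 1}

# Number of genotyped pre-S/S sequences.
genotyped = sum(genotype_counts.values())

# HBV-DNA detection rate among HBsAg-positive samples, in percent.
detection_rate = round(100 * hbv_dna_positive / hbsag_positive)

# Share of each genotype among genotyped isolates, in percent.
genotype_percent = {
    name: round(100 * count / genotyped)
    for name, count in genotype_counts.items()
}

print(genotyped, detection_rate, genotype_percent)
```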
There were no significant differences in the demographic characteristics of the participants with respect to genotype distribution, with the exception of geographic residence. Although HBV/A1 was detected in both the southern and the northern islands, a large proportion of HBV/A1 isolates, 32/54 (59%), came from the northern island of São Vicente. Moreover, in São Vicente, HBV/A1 was the most frequent genotype (32/38, 84%), with a small proportion of HBV/A2 (6/38, 16%). In contrast, most HBV/A2 and HBV/E isolates were found in Santiago and the nearby southern islands (15/21, 71% and 17/19, 89%, respectively). Contributing to genotype diversity, among the 53 genotyped samples in the southern islands, 21 (40%) belonged to subgenotype A1 and one HBV/D isolate was found in Santiago. In Boa Vista (the easternmost island of Cape Verde), among the three genotyped samples, two belonged to HBV/E and one to HBV/A1. This unequal distribution of HBV genotypes in the archipelago was highly significant (two-sided P value < 0.0001) (Table 1).
One hundred and thirty-four sequences of HBV/A1, 231 sequences of HBV/A2 and 250 sequences of HBV/E with complete genome and geographic localisation information in GenBank were used for alignment and comparison with sequences from Cape Verde (see supplementary material). The pre-S/S trees were constructed after alignment of the pre-S/S region of the same GenBank sequences. Similar phylogenetic trees were obtained for HBV/A genotypes when comparing pre-S/S and full-length trees. Most (50) of the 54 Cape Verdean HBV/A1 isolates (51 isolates with only pre-S/S sequences and three with complete genome sequences) clustered together with isolates from the 'Asian-American' clade (with support of 0.79 and 0.99 for the pre-S/S and full-length trees, respectively), represented in [...] clade. The mean distance among full-length sequences from this clade was 1.9 ± 0.1. In contrast, a small number (3/54) of Cape Verdean isolates clustered into the African clade (blue colour) together with the isolates from seven sub-Saharan countries, namely Congo, Kenya, Malawi, Rwanda, Tanzania, Uganda and Zimbabwe, as well as with 16/18 South African isolates (Fig 2A); the two remaining South African isolates clustered into the Asian-American clade. The mean genetic distance within full-length sequences from the African clade was 2.3 ± 0.1. One pre-S/S HBV/A1 sequence from Cape Verde (Fig 2A) was outside of both clades.
Although a larger proportion (32/54, 59%) of HBV/A1 isolates came from the northern islands of São Vicente and Santo Antão, the four sequences grouping outside the Asian-American clade were from Santiago. The overall genetic distance within all full-length HBV/A1 sequences was 2.3 ± 0.1. The three full-length HBV/A1 isolates from Cape Verde clustering in the Asian-American clade were very closely related, with a mean genetic distance of 0.4 ± 0.1. Compared with the other members of the Asian-American clade, these three A1 sequences from Cape Verde were more closely related to the 23 Brazilian sequences, with a net average distance between them of 0.4 ± 0.1. By pairwise comparison, the distance between them varied from 0.4 to 1.9. The deduced amino acid sequences of all seven HBV open reading frames (polymerase, pre-core/core, pre-S1/pre-S2, S and X protein) for the 188 HBV/A1 isolates (134 reference sequences and 54 isolates from this work) used to construct the phylogenetic trees in Fig 2A and 2B were compared. Table 2 highlights seven consensus amino acids of the HBV polymerase of the Cape Verdean A1 isolates. The first four positions, A91, H138, P198, and H269, were shared by the Asian-American clade. These amino acids were found to be largely predominant (88%) in isolates from the Asian-American clade. For all other African countries, with the exception of Somalia, the typical residues at those positions were I (91), Q (138), S (198), and Y (269). Somalian isolates displayed I91, Q138 and the amino acid consensus of the Asian countries at the other six positions. Cape Verdean isolates displayed a unique consensus of three amino acids in the polymerase: T356, I601 and S665, with frequencies of 83% for T356 and 100% for the two others. No other unique amino acid consensus of the HBV/A1 isolates of Cape Verde was found in the other open reading frames.
Table 1. Sociodemographic characteristics of the HBsAg-positive individuals and genotype distribution. (Table body not recovered; surviving column headers: A2 (n = 21), E (n = 19), D (n = 1), each reported as n (%).)
Fig 3. Two hundred and fifty HBV/E isolates, whose full-length sequences and geographic origins were available in GenBank, were used for alignment and comparison with 20 HBV sequences from this work (marked with a black square). Accession numbers are indicated in S1 File. The Southwest African lineage (SWAL) is shown in purple. All nodes marked with an asterisk showed aLRT support ≥ 0.80. Countries are indicated when two or more neighbouring sequences are of common origin. https://doi.org/10.1371/journal.pone.0192595.g003
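The consensus residues and frequencies discussed for the polymerase can be derived from an amino acid alignment by counting residues column by column. The sketch below shows the idea on hypothetical toy fragments; the function name and the sequences are assumptions and do not reproduce the authors' data or software.

```python
from collections import Counter


def consensus_at(sequences: list[str], position: int) -> tuple[str, float]:
    """Most common residue at a 1-based alignment position and its frequency (%)."""
    column = [seq[position - 1] for seq in sequences]
    residue, count = Counter(column).most_common(1)[0]
    return residue, 100.0 * count / len(column)


# Hypothetical toy protein fragments (not real HBV polymerase sequences).
toy_isolates = ["MTIS", "MTIS", "MAIS", "MTIS", "MTIS", "MTIS"]

residue, freq = consensus_at(toy_isolates, 2)
print(f"{residue}2: {freq:.0f}%")
```

Applying the same count to each group of isolates (for example, by clade or by country) gives the per-group consensus residues that such a table compares.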
Twenty-one out of 22 HBV/A2 pre-S/S sequences from Cape Verde grouped into a single cluster quite separate from the HBV/A2 sequences of the other geographic regions (Fig 2C). The seven full-length genomes of Cape Verde isolates that were successfully sequenced confirmed that the Cape Verde HBV/A2 isolates clustered in a group separated from the others, with 0.99 of support (Fig 2D). Among all 231 isolates analyzed, only one clustered with the HBV/A2 Cape Verdean isolates (Fig 2C and 2D). This sequence was from Poland (accession number GQ477464, see S1 File). The overall mean genetic distance within the 238 HBV/A2 full-length sequences was 1.1 ± 0.1. The genetic distances between the HBV/A2 full-length genomes of Cape Verde and those of the other localities (continental Africa, Asia, Americas or Europe) were similar (approximately 2.2).
Genetic distances among HBV/A2 isolates within the countries where they circulate were determined from full-length sequences. Low values (0.2-0.4) were found for Cuba, Brazil, Japan, Belgium, Estonia and Latvia. In other countries (Argentina, Martinique, South Africa, Germany and Italy), they varied from 0.7 to 1.1. The genetic distances between HBV/A2 sequences of Cape Verde were higher (1.8 ± 0.2), as is the case for Poland (1.7 ± 0.1) and France (1.5 ± 0.1). The highest value (2.2 ± 0.1) was observed in Spain.
Interestingly, the deduced amino acid sequences of the HBV proteins showed that some amino acid consensuses of the Cape Verdean A2 isolates were different from those observed in other regions of the world (Table 3). This is the case for four, 10, and three consensuses in the pre-S1, polymerase, and X protein, respectively. These variations appeared at high frequencies (57-100%) among the isolates from Cape Verde. High rates (78-99%) [37] of other amino acids at these positions were observed for all other HBV/A2 sequences from around the world. In all cases, those differences in frequency were highly statistically significant (P < 0.0001).
Fig 3A and 3B show HBV/E phylogenetic analyses based on the pre-S/S region and the full-length genomes, performed with all 250 complete HBV/E isolates available in GenBank. Nine full-length HBV/E sequences from Cape Verde, as well as ten additional pre-S/S sequences (total = 19), were included. A large majority (n = 225) of the HBV/E sequences from GenBank were from African isolates, since HBV/E originated recently in Africa [26]. The remaining 25 sequences were from Argentina, Belgium, Colombia, Cuba, Mexico, Martinique, United Kingdom, Saudi Arabia and Japan. The genetic distance among all the full-length HBV/E isolates was 2.0 ± 0.1. Unlike the HBV/A trees, where similar phylogenetic patterns were observed, the HBV/E trees showed a rather distinct clustering pattern when comparing the pre-S/S region and the full-length genome trees (Fig 3A and 3B). In the pre-S/S analysis, the HBV/E sequences of Cape Verde were scattered in the tree, clustering with sequences from different countries (Angola, Guinea, Nigeria, and United Kingdom). As previously described, the full-length genomes of all isolates from Angola, Namibia and the Democratic Republic of Congo clustered together into a cluster called the 'Southwest African lineage' (SWAL) [37].
Among the nine full-length Cape Verdean HBV/E isolates, two sequences clustered into SWAL, with support of 0.99. In addition, sporadic sequences isolated in South Africa and Central-South America (Argentina, Colombia, Cuba and Mexico) belonged to the separate SWAL lineage (represented in purple). The other seven full-length genomes from Cape Verde clustered near several sequences from Guinea with support of 0.85 (Fig 3A and 3B).
As previously noted, some amino acid variations are specific to the SWAL lineage [37]. Table 4 shows that the two sequences from Cape Verde clustering in the SWAL group displayed I57 in the small S, H177 and L612 in the polymerase, and L30 and G36 in the X protein. This was expected because these amino acid residues are the consensus inside the SWAL lineage. However, both Cape Verdean isolates belonging to the SWAL lineage displayed G245 in the polymerase, which is the consensus for the African isolates outside SWAL, instead of W245, which is the consensus of isolates inside SWAL. No specific variations were observed for the HBV/E isolates from Cape Verde. Finally, the only HBV/D isolate was more closely related to the HBV/D4 subgenotype (Fig 3A).

Discussion
Two main HBV genotypes, A and E, were observed in Cape Verde. Both are frequently found in Africa, with one or the other being highly prevalent in most African countries. Despite the high variability of African HBV/A, only two subgenotypes, HBV/A1 and HBV/A2, were detected in Cape Verde. HBV/A1 was the most frequently detected subgenotype in Cape Verde. It is also the major African genotype spread among Afro-descendants outside of Africa. It is likely that HBV/A1 was spread by the slave trade, which exported African slaves to Asia in the 17th century as a result of Arab or Portuguese trade and to Latin America in the 16th to 19th centuries through the trans-Atlantic slave trade [31]. By phylogenetic analysis, we showed that most HBV/A1 isolates from Cape Verde clustered into the Asian-American clade and that these isolates were very closely related to the HBV/A1 isolates from Brazil (Fig 2A and 2B). This similarity of the HBV genomes may be explained by the spread of HBV/A1 during the Atlantic slave trade. In fact, it is known that most of the slaves brought to Cape Verde by the Portuguese were en route to Brazil and the West Indies (Antilles and Central America) [6][7][8]. Where the Africans who spread HBV/A1 were forcibly captured remains an open question. Tracing the routes of African slavery is difficult, as the Portuguese mixed different ethnic groups captured in different African regions, exporting them inside and outside of Africa. Our results and other studies [31,32,45,46] show that inside Africa, other than Cape Verde, only Somalia has a significant number of isolates clustering into the HBV/A1 Asian-American clade. Moreover, some other Somalian isolates cluster into the African clade (Fig 2A and 2B). Bantu people, who were dispersed across Africa at the time of slavery, seem to be the link between Somalia, Asia, Cape Verde and Brazil.
Bantu people in Somalia may be the origin of the HBV/A1 that differentiated into the two clades (African and Asian-American) since it was dispersed through Asian and American countries. Another possibility for the geographic origin of HBV/A1 is Mozambique, from which Africans were captured for slavery between 1837 and 1856, circumventing the laws that banned the transatlantic slave trade [32,47]. To date, no HBV full-length sequences from this locality have been published. Other previous studies corroborate that the dispersion of HBV/A1 through continents was via Somalia and/or Bantu people during the Atlantic slave trade [31,45,46,48]. However, it is not possible to rule out the hypothesis that HBV/A1 originated from western African countries (with Angola and Congo as the major sources of slaves to Brazil) and that HBV/A has now been replaced by HBV/E.
HBV/A2 was the second most prevalent subgenotype in Cape Verde, with 21/95 (22%) isolates. HBV/A2 is frequently detected in northwest Europe and the USA, and has been isolated in South Africa [26,27,29,31]. Several studies have suggested that the dispersion of HBV/A2 is more recent than that of HBV/A1 [48,49]. The evolutionary pattern of HBV/A2 suggests an exponential growth of infections between 1970 and the mid-1990s [49]. The spread of HBV/A2 seems to be linked to sexual transmission, since HBV/A2 is more prevalent in sexual behavioural risk groups, such as men who have sex with men [50][51][52][53]. Although it is possible that HBV/A2 was disseminated at the time of slavery, as suggested previously [46], the origin of the currently circulating HBV/A2 seems to be more recent than that of HBV/A1, and would have occurred during the last decades of the 20th century, as described above. Indeed, we observed that the isolates from Cape Verde did not cluster with the isolates from South Africa (Fig 2C and 2D), suggesting that the introduction of HBV/A2 into Cape Verde did not occur via South Africa.
HBV/A2 is highly prevalent in Europe, but it is unlikely that HBV/A2 was introduced into Cape Verde by the Portuguese explorers during colonisation. It is known that Portuguese settlers contributed little to the formation of the Cape Verdean people, as shown by historical facts and by a mitochondrial DNA study [6]. Moreover, there is a consensus that HBV/A2 is more recent than HBV/A1 [46]. Indeed, we found that the genetic distances within all of the isolates of HBV/A2 were much lower (1.1 ± 0.1) than those within HBV/A1 (2.3 ± 0.1), in agreement with studies demonstrating that the HBV/A2 subgenotype is more recent than HBV/A1 [48,49]. It is probable that the introduction of HBV/A2 into Cape Verde occurred during or after the exponential growth of HBV/A2 in Europe. Corroborating a recent European origin of the HBV/A2 isolates in Cape Verde, the only full-length sequence that clustered with the Cape Verdean HBV/A2 isolates was a sequence from Poland. The genetic distances within Cape Verde isolates (1.8 ± 0.1) were similar to those observed within some European countries such as Poland (1.7 ± 0.1) and France (1.5 ± 0.1), thereby suggesting a similar time of infection/differentiation in Cape Verde and in some European countries.
Several amino acid consensuses of the Cape Verdean HBV/A1 and HBV/A2 isolates were different from those observed in other regions of the world. Two of these variations, namely pre-S1 74V and 91V, observed among the HBV/A2 isolates from Cape Verde, were present in almost all 151 HBV/A1, QSA3 and HBV/A4 isolates used for sequence alignment. This raises the hypothesis that the HBV/A2 isolates from Cape Verde may be intersubgenotypic recombinants. However, no evidence of recombination was observed for any Cape Verdean sequence when analysed with the RDP4 and Simplot recombination programs. Two other variations in HBV/A2, namely 132M and 133I in the X protein, correspond to the well-studied double mutation 1762T-1764A in the basal core promoter, associated with the anti-HBe phenotype. This double mutation has been observed in different genotypes of HBV. However, its frequency was much higher in HBV/A2 from Cape Verde than in HBV/A2 from other countries.
While HBV/A predominates in eastern and southeastern Africa, HBV/E is the most frequent genotype in a large area of central and western Africa [26,49]. Outside Africa, HBV/E isolates have been found only sporadically within the Americas, indicating that this genotype was introduced into the general African population after the end of the trans-Atlantic slave trade [26], likely within the last 130 years [54]. Unlike most genotypes, HBV/E has a low degree of genetic diversity, and HBV/E isolates are classified into a single monophyletic group [35]. In this study, HBV/E was detected only in the southern islands and Boa Vista, with an overall frequency of 20% (19/95 isolates). The genetic distance within all HBV/E isolates (2.0 ± 0.1) was similar to that within a single subgenotype (HBV/A1; 2.3 ± 0.1). This relatively low degree of variability is probably due to the more recent origin of HBV/E. However, pre-S/S and full-length phylogenetic analyses showed that HBV/E isolates from Cape Verde did not cluster together (Fig 3A and 3B), thereby suggesting different origins/introductions of the HBV/E circulating in Cape Verde.
In contrast to HBV/A1 and HBV/A2, no specific consensus of amino acids was observed among the HBV/E isolates circulating in Cape Verde, indicating that HBV/E was more recently introduced in Cape Verde than HBV/A1 and HBV/A2. The lower prevalence of HBV/E in relation to HBV/A1 indicates that there has not been, at least so far, any explosion of HBV/E in Cape Verde, as is supposed to have occurred in western Africa [26].
In this study, we observed an unequal distribution of the HBV genotypes throughout the Cape Verde islands. São Vicente and the northern islands showed a high prevalence of HBV/A1 from the Asian-American clade, a low prevalence of HBV/A2 and the absence of the recent HBV/E genotype. This distribution is as expected for the time of the discovery of the archipelago: a high frequency of HBV/A1, all belonging to the Asian-American clade associated with the slave traffic, and a small percentage of HBV/A2, which corresponds to the European genotype. In contrast, Santiago and the southern islands displayed a mixture of ancient and recent HBV genotypes (HBV/A1, A2, E and D). Furthermore, Santiago displayed HBV/A1 sequences grouping inside and outside of the Asian-American clade. HBV genotype prevalence in the southern islands may reflect the ancient and modern influx of human populations into Cape Verde.
In conclusion, the diversity of HBV genotypes observed in Cape Verde seems to be linked to the ancient historical relationship and the modern relationship of Cape Verde with African and European countries.
Supporting information
S1 File. Full-length genome sequences used to construct phylogenetic trees (Figs 2 and 3). Sequences are identified by their GenBank accession numbers and countries of origin. The following criteria were used for inclusion in the phylogenetic studies: non-recombinant human isolates with known country of origin whose nucleotide sequences have been totally determined and did not show any insertion. (DOCX)
S1