Deciphering the Complex Distribution of Human Immunodeficiency Virus Type 1 Subtypes among Different Cohorts in Northern Tanzania

Background
Increased understanding of the genetic diversity of HIV-1 is challenging but important in the development of an effective vaccine. We aimed to describe the distribution of HIV-1 subtypes in northern Tanzania among women enrolled in studies preparing for HIV-1 prevention trials (hospitality facility-worker cohorts), and among men and women in an open cohort demographic surveillance system (Kisesa cohort).

Methods
A region of the polymerase gene encompassing partial reverse transcriptase was sequenced, phylogenetic analysis was performed, and subtypes were determined. Questionnaires documented demographic data. We examined factors associated with subtype using multinomial logistic regression, adjusted for study, age, and sex.

Results
Among 140 individuals (125 women and 15 men), subtype A1 predominated (54, 39%), followed by C (46, 33%), D (25, 18%) and unique recombinant forms (URFs) (15, 11%). There was weak evidence to suggest different subtype frequencies by study (for example, 18% URFs in the Kisesa cohort versus 5–9% in the hospitality facility-worker cohorts; adjusted relative-risk ratio (aRR) = 2.35 [95% CI 0.59,9.32]; global p = 0.09). Compared to men, women were less likely to have subtype D versus A (aRR = 0.12 [95% CI 0.02,0.76]; global p = 0.05). There was a trend to suggest lower relative risk of subtype D compared to A with older age (aRR = 0.44 [95% CI 0.23,0.85] per 10 years; global p = 0.05).

Conclusions
We observed multiple subtypes, confirming the complex genetic diversity of HIV-1 strains circulating in northern Tanzania, and found some differences between cohorts and by age and sex. This has important implications for vaccine design and development, providing opportunity to determine vaccine efficacy in diverse HIV-1 strains.


Introduction
Human immunodeficiency virus type 1 (HIV-1) is characterised by extensive genetic variability, as a consequence of high replication and mutation rates and frequent recombination [1,2]. Based on phylogenetic analysis, HIV-1 strains have been divided into four major phylogenetic groups: M, N, O and P. Group M, the predominant circulating group responsible for the global HIV-1 pandemic [3], is divided into nine subtypes (designated A-D, F-H, and J-K) [2]. Some of the viral strains, such as subtypes A and F, have been further sub-divided into sub-subtypes A1-A5, and F1-F2, respectively [4]. In addition, different subtypes may recombine to form circulating recombinant forms (CRFs), which continue to be transmitted from one individual to another, or unique recombinant forms (URFs) if there is no evidence of transmission from the patient in which they arose to another. Currently, at least 58 CRFs and numerous URFs have been identified [5].
Most known HIV-1 groups and subtypes have been reported in Africa; this region shows the greatest diversity of circulating HIV-1 viral strains [4]. Tanzania is among the countries in Africa severely affected by the HIV epidemic, with an estimated prevalence of HIV among the general adult population of 5.7% in 2007/2008 [6]. The distribution of HIV-1 subtypes in different regions of Tanzania gives an impression of a genetically more complex and diverse epidemic than some of its neighbouring countries. During the initial phase of the epidemic in Tanzania, HIV-1 subtypes A and D predominated and were found in similar proportions [7]. Subtype C was subsequently introduced during the late 1980s to early 1990s from neighbouring southern African countries [4,7]. Currently, subtypes A1 and C are the main circulating strains among the general population [8][9][10] and in high-risk groups [9,11]. This is in contrast to the epidemics in other East African countries, which consist of mainly subtypes A1 and D [12][13][14]. Tanzania also reports the highest prevalence of recombinant forms within the East African region [8,9].
Increased understanding of the genetic diversity of HIV-1 is challenging but important in the development of an effective vaccine [15], and may impact transmission, diagnosis, disease progression, viral burden, response to treatment and emergence of ART resistance [1,16]. Limited information is available on the distribution of HIV-1 subtypes in Tanzania and no studies have been conducted in Mwanza region of northern Tanzania. We describe the distribution of HIV-1 subtypes and associated risk factors in this region, where we developed local laboratory capacity to conduct HIV-1 viral genotyping. We included a general population cohort and two high-risk cohorts to compare the distribution of HIV-1 subtypes among different populations and geographical locations.

Ethics statement
The studies were approved by the Ethics Committees of the Tanzania National Institute for Medical Research (NIMR), Kilimanjaro Christian Medical Centre (KCMC) and London School of Hygiene and Tropical Medicine. All participants involved in this study gave written informed consent (all aged ≥18 years). Study documents were stored in a secure location to ensure participants' confidentiality. Staff were trained on confidentiality issues, research ethics and protection of human subjects.
Those testing HIV-positive were referred to care and treatment centres.

Study populations
We analysed data from women participating in two prospective cohort studies conducted in preparation for trials of candidate microbicides and HIV-1 vaccines (hospitality facility-worker cohorts), and among men and women from the general population enrolled in the Kisesa open cohort for demographic health surveillance (Kisesa cohort; Figure 1). We aimed to enrol 50 individuals from each of the three cohorts, and analysed one sample per participant.

Study procedures for hospitality facility-worker cohorts
Participants in the cohorts were recruited between 2008 and 2010 from hotels, restaurants, bars, guesthouses, food-sellers at makeshift facilities, and shops selling traditionally-brewed beer in the towns of Geita, Kahama, and Shinyanga near Mwanza city (microbicides-preparedness study) and Moshi town (vaccines-preparedness study). During a screening visit, women underwent a brief interview to collect limited information about demographic characteristics, behavioural risk factors and other information to determine their study eligibility, and a blood sample was collected for HIV testing. Women were eligible if they were aged 18-44 years, willing to undergo HIV testing and receive results, and not planning to move away from the recruitment site for the duration of the study.
Women fulfilling the inclusion criteria were invited to come to an enrolment visit within 4 weeks. HIV-negative women were eligible for enrolment in both the microbicides-preparedness and the vaccines-preparedness studies. In the vaccines-preparedness study, HIV-positive women with CD4 cell count ≥350 cells/mm³ and who were ART-naïve with no indication to start ART were also enrolled. At enrolment, structured face-to-face interviews were conducted to obtain information about socio-demographic characteristics, employment, reproductive history, sexual behaviour and work mobility. All women enrolled in the study were scheduled to return to the clinic after every three months for 12  For this virological study, we aimed to recruit all HIV seroconverters identified during the follow-up of the cohorts. In order to reach the target sample size, we also aimed to recruit all HIV-positive women enrolled in the vaccines-preparedness study, and a random sample of HIV-positive women identified during the screening process in the microbicides-preparedness study.

Study procedures for Kisesa cohort
The Kisesa cohort has been described previously [17] and will be reviewed here briefly. Kisesa is located about 20 km from Mwanza city, along the main road connecting Mwanza with major cities in Kenya. Demographic surveillance has been conducted at approximately half-yearly intervals since 1994. In addition, detailed sero-surveys are conducted approximately every three years to collect blood for HIV-1 testing and sexual behaviour data. Typically around 70% of all adults aged ≥15 years who were invited participated in the sero-surveys [18]. A sero-survey was conducted in 2010-2011 (sero 6), in which dried blood spots (DBS) were taken from 9,276 participants and tested for HIV in the NIMR laboratory in Mwanza. Individuals who requested voluntary counselling and testing (VCT) in order to know their HIV status were given a separate rapid test by trained VCT counsellors, and a plasma blood sample was taken for storage in NIMR laboratories. For the purpose of this study, we randomly selected plasma samples from individuals who were HIV-1 positive during the sero 6 sero-survey.

HIV-1 diagnosis
Testing for infections was performed according to standard operating procedures in each of the NIMR and KCMC laboratories. In the hospitality facility-worker cohorts, HIV rapid testing was performed at screening in parallel using SD Bioline HIV-1/2 3.0 (Standard Diagnostics, Inc., Korea) and Determine HIV-1/2 (Alere Medical, Co., Ltd, Japan) tests. If the rapid tests were positive or discordant, HIV infection was confirmed in the respective laboratories using either third generation Murex HIV 1.2.O (Abbott UK, Dartford, Kent, England) and Vironostika HIV Uniform II plus O (bioMérieux Bv, The Netherlands) enzyme-linked immunosorbent assays (ELISAs; at NIMR laboratory for the microbicides-preparedness study), or only Vironostika HIV Uniform II plus O ELISA (at KCMC laboratory for the vaccines-preparedness study). In the microbicides-preparedness study, samples discrepant or indeterminate on ELISA were tested for P24 Antigen (Genetics Systems HIV-1 Ag EIA, Bio-rad Laboratories, Marnes-La Coquette, France) and if positive were classified as HIV-positive. Samples negative for P24 antigen were tested by Western Blot (INNO-LIA, HIV I/II score, Innogenetics NV, Gent, Belgium). At enrolment and follow-up visits, HIV testing was done using ELISAs as per the screening algorithm. In the vaccines-preparedness study, the HIV testing algorithm used at screening was applied at enrolment and throughout the follow-up period.
In the Kisesa cohort, HIV tests were run on the DBS samples using two ELISAs in series (Vironostika HIV-Uniform II Plus O (bioMérieux Bv, the Netherlands) and, if positive, Enzygnost Anti-HIV 1/2 Plus (Dade Behring, Newark, DE, USA)); only those with two positive results were considered HIV-positive.

Table 1 notes: [2] P-value omitted since differences are by design. [3] Including food preparation, mamalishe and bar work. [4] Lower limit of detection was 300 copies/ml, except for the 10 subsequent seroconverters, where the lower limit of detection was 75 copies/ml. [5] Imputed as half the lower limit of detection, for those with undetectable HIV VL. [6] Comparing categories split by the overall median. doi:10.1371/journal.pone.0081848.t001

Sample collection, viral load measurement and genotyping
Among those selected for this virological study, whole blood was collected to isolate peripheral blood mononuclear cells (PBMC) and aliquot plasma.  [19,20]. The PCR products were purified with a QIAquick PCR purification kit (Qiagen, Valencia, CA) and sequenced using the BigDye® Terminator v3.  [19,20]. Sequencing products were purified using Centri-Sep™ spin columns (Princeton Separations, Inc., New Jersey, USA).

Genotyping assay quality control
To successfully implement the genotyping assay developed by UoM at NIMR, we randomly selected from our archive 39 duplicate plasma samples, which were previously processed in UoM. Genotyping was performed independently at NIMR and data generated compared to data obtained from UoM. Sequences were assembled and edited using Sequencher, version 4.10.1 (GeneCodes Corporation, Ann Arbor, MI) and aligned using ClustalW in Bio-Edit Sequence Alignment Editor v7.0.9.0 [21], yielding a percentage identity for NIMR versus UoM results. We determined the proportion achieving at least 98% [22] identity, and the proportion with matching subtypes.
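The percentage-identity comparison described above can be illustrated with a minimal sketch. It assumes the two sequences have already been aligned to equal length and that positions where either sequence has a gap are skipped; both are assumptions made for illustration, and BioEdit's own identity calculation may treat gaps and ambiguous bases differently. The function names are hypothetical.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percentage of compared alignment positions at which two sequences agree.

    Assumption: positions where either sequence has a gap ('-') are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # ignore gapped columns
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * matches / compared


def meets_identity_threshold(seq_a: str, seq_b: str, threshold: float = 98.0) -> bool:
    """True if the pair reaches the identity threshold (98% in the text)."""
    return percent_identity(seq_a, seq_b) >= threshold
```

Applied to each NIMR and UoM sequence pair, the share of pairs for which `meets_identity_threshold` returns true corresponds to the proportion achieving at least 98% identity.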

Genetic analysis of cohort samples
The nucleotide sequence chromatogram files were assembled and edited using Sequencher version 4.10.1 (GeneCodes Corporation, Ann Arbor, MI). Sequences were aligned with reference pure subtype sequences of HIV-1 group M downloaded from the Los Alamos National Laboratory (LANL) HIV Sequence Database [5] using ClustalW in Bio-Edit Sequence Alignment Editor v7.0.9.0 [21]. The REGA HIV-1 sub-typing tool [23] at the Stanford database was used to assign subtypes [24]. Phylogenetic trees were inferred in SeaView version 4.3.5 [25], using the HKY85 model [26] and the Neighbor-joining method [27] with 1000 bootstrap replicates [28].

Statistical methods
Participant characteristics were summarised by cohort and the following variables: HIV-1 seroconverter (defined as ≤36 months between the last negative and first positive test dates), sex, age, education, main job, ethnicity, religion, age at first sex, number of partners in lifetime and in the last 12 months, and HIV VL. We investigated the characteristics associated with HIV-1 subtype using multinomial logistic regression (with outcome of subtype categories), adjusted for study, age and sex. The variables were added to the model one at a time, and we included those with overall p<0.10 in a multivariable model, from which we removed those no longer meeting the p<0.10 threshold. Finally, we considered adding each of the omitted variables (retaining those with p<0.10). Continuous variables were categorised or their effects were assumed linear (see Table 1). Analyses were performed using Stata (StataCorp. 2009. Stata Statistical Software: Release 11. College Station, TX: StataCorp LP).
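The three-stage variable selection described above can be sketched as follows. This is one possible reading of the procedure, not the authors' Stata code: the backward step here drops the weakest variable one at a time, which is an assumption. The `p_value` callback is hypothetical and stands in for fitting the multinomial model (always adjusted for study, age and sex) and returning the overall p-value for one variable given the other variables in the model. The p-values in the usage lines are invented purely to exercise the logic.

```python
from typing import Callable, FrozenSet, List, Sequence

# Hypothetical callback: overall p-value for `variable` in a model that also
# contains `adjusted_for` (plus the forced terms study, age and sex).
PValueFn = Callable[[str, FrozenSet[str]], float]


def select_covariates(
    candidates: Sequence[str], p_value: PValueFn, threshold: float = 0.10
) -> List[str]:
    # Stage 1: add each candidate on its own and keep those below the threshold.
    kept = [v for v in candidates if p_value(v, frozenset()) < threshold]

    # Stage 2: in the multivariable model, repeatedly drop the weakest variable
    # until every remaining variable meets the threshold.
    while kept:
        pvals = {v: p_value(v, frozenset(kept) - {v}) for v in kept}
        worst = max(pvals, key=pvals.get)
        if pvals[worst] < threshold:
            break
        kept.remove(worst)

    # Stage 3: reconsider each omitted variable against the current model.
    for v in candidates:
        if v not in kept and p_value(v, frozenset(kept)) < threshold:
            kept.append(v)
    return kept


# Invented p-values for illustration only; they are not study results.
toy_p = {"var_a": 0.30, "var_b": 0.04, "var_c": 0.08}
chosen = select_covariates(list(toy_p), lambda v, adjusted_for: toy_p[v])
```

If no candidate reaches the threshold at any stage, the function returns an empty list, which corresponds to a final model containing only the forced adjustment terms.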

Sequence Data
Sequences from this study were deposited in GenBank under the following accession numbers: KC831606-KC831741.

Participant characteristics
We selected 162 participant samples for analysis of which 11 (7%) were PCR-negative, 10 (6%) failed sequencing or had poor sequences, and 1 (1%) was dropped due to sample mix-up in the quality control check, leaving 140 (86%) samples that were included in the present study. Of these, 53 (38%), 42 (30%) and 45 (32%) were from the microbicides-preparedness study, the vaccines-preparedness study, and the Kisesa cohort, respectively.
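The sample flow above can be checked with a few lines of arithmetic. The sketch below uses only the counts given in the text; the variable names are ours.

```python
selected = 162
excluded = {
    "PCR-negative": 11,
    "failed sequencing or poor sequence": 10,
    "sample mix-up in quality control": 1,
}
included = selected - sum(excluded.values())

by_cohort = {
    "microbicides-preparedness": 53,
    "vaccines-preparedness": 42,
    "Kisesa": 45,
}


def pct(numerator: int, denominator: int) -> int:
    """Percentage rounded to the nearest whole number."""
    return round(100 * numerator / denominator)


# Exclusions and the final sample are percentages of the 162 selected samples;
# cohort shares are percentages of the samples included in the analysis.
exclusion_pct = {reason: pct(n, selected) for reason, n in excluded.items()}
cohort_pct = {cohort: pct(n, included) for cohort, n in by_cohort.items()}
```

The cohort counts sum to the number of included samples, and the rounded percentages agree with those reported in the paragraph.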
In Table 1, we present the characteristics by study cohort among participants included in the final genotyping analysis. Overall, 31 (22%) were identified as HIV-1 seroconverters, the majority of whom by design were female hospitality facility-workers, since Kisesa surveys only occur every 3 years. Among seroconverters, the median time from the first positive test date to blood collection was 2 (interquartile range [IQR]: 0, 9) months. The majority of participants were female (89%). Female hospitality facility-workers were younger than participants from the Kisesa cohort (median 30 versus 35 years). Women from the hospitality facility-worker cohorts predominantly worked as waitresses or in similar roles (42% in the towns near Mwanza city and 52% in Moshi town), compared to mainly farming in the Kisesa cohort (61%). Hospitality facility-workers were less likely to be married or   Overall, 93% of participants had not progressed to secondary level education, 47% were of Sukuma ethnicity and 78% were Christian. Participants from the hospitality facilities in Moshi and the Kisesa cohort had lower numbers of sexual partners over their lifetime and in the last 12 months, compared to hospitality facility-workers in towns near Mwanza city (p = 0.07 and <0.001, respectively). The median HIV VL was 4.3, 4.1 and 5.2 log10 copies/ml in the microbicides-preparedness study, vaccines-preparedness study and Kisesa cohort, respectively.
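The table notes state that undetectable viral loads were imputed as half the lower limit of detection (300 copies/ml, or 75 copies/ml for the 10 subsequent seroconverters) before summarising on the log10 scale. A minimal sketch of that rule follows; representing an undetectable result as `None` is our assumption, and the function name is hypothetical.

```python
import math
from typing import Optional


def log10_viral_load(copies_per_ml: Optional[float], lower_limit: float = 300.0) -> float:
    """log10 viral load, imputing half the detection limit when undetectable.

    Assumption: an undetectable result is passed in as None.
    """
    if copies_per_ml is None:
        copies_per_ml = lower_limit / 2  # e.g. 150 for a 300 copies/ml assay
    return math.log10(copies_per_ml)
```

Medians such as those reported above would then be taken over these log10 values, for instance with `statistics.median`.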

HIV genotyping
Quality Control. From the list of samples already processed in UoM, we randomly selected 39 for HIV genotyping at NIMR. Of these, four samples were not sequenced at UoM (2 failed PCR, 2 failed sequencing) and one did not have a matching sample in the NIMR archive. Of the remaining 34 samples, 32 samples were successfully sequenced and analysed, with 27 (84%) samples meeting the ≥98% identity threshold. We failed to determine the subtype of one sequence generated at UoM due to poor sequence quality. Of the remaining 31 sequences, 28 (90%) had concordant HIV subtypes. The three discrepant sequences were of subtypes D versus C/D, C versus C/H, and CRF10_CD versus C, for NIMR versus UoM, respectively.
In Table 2, we show the distribution of HIV-1 subtypes by participant characteristics. The proportions of participants infected with subtype A virus were similar across the three study populations, while the proportion with subtype C was relatively higher among hospitality facility-workers in towns near Mwanza city (43%) compared to the other populations (31% among hospitality facility-workers in Moshi town and 22% in the Kisesa cohort). Participants in the Kisesa cohort were more likely to be infected with URF viruses (18%) compared to hospitality facility-workers in towns near Mwanza city (9%) and Moshi town (5%). The proportion of participants infected with subtype D was lowest among hospitality facility-workers in towns near Mwanza (11%).
The distribution of HIV-1 subtypes by demographic and behavioural factors was not uniform. The crude data suggest that participants who were aged ≥40 years, those who had incomplete primary versus higher education, Muslims compared to other religions, and those reporting ≥10 lifetime sexual partners had relatively higher proportions of subtype A. We also examined a number of factors which were collected among female hospitality facility-workers only (data not shown), including duration of working in the facilities, number of places lived in the past year, number of times in the past year spent >1 week away from home, problematic alcohol drinking and past year history of exchanging money or gifts for sex. There was no evidence that the HIV-1 subtype distribution differed by these factors.

Table 2 notes: [1] P-value from χ² test. [2] HIV seroconverters defined as those with ≤36 months between known last negative and first positive test dates. [3] Including food preparation, mamalishe and bar work. [4] Imputed as half the lower limit of detection (300 copies/ml, except for the 10 subsequent seroconverters, where the lower limit of detection was 75 copies/ml), for those with undetectable HIV VL. Median is as shown in Table 1 (4.5 log10 copies/ml). doi:10.1371/journal.pone.0081848.t002

Table 3. Study-, age- and sex-adjusted associations between subtype and participant characteristics.
Table 3 notes: [2] HIV seroconverters defined as those with ≤36 months between last negative and first positive test dates. [3] Including food preparation, mamalishe and bar work. [4] Only 33% of men reported the number of lifetime partners, so it was not possible to fit a model with both sex and number of lifetime partners; the results reported are therefore study- and age-adjusted only (and compared to a model with study and age only). [5] Imputed as half the lower limit of detection (300 copies/ml, except for the 10 subsequent seroconverters, where the lower limit of detection was 75 copies/ml), for those with undetectable HIV VL. doi:10.1371/journal.pone.0081848.t003

Our final regression model included study population, age and sex only, since none of the other covariates reached p<0.10 after adjustment for these factors. As shown in Table 3, there was some evidence to suggest independent associations with subtype by study population, age and sex (p = 0.09, 0.05, and 0.05, respectively), although the confidence intervals were generally wide. In particular, comparing females versus males, there was a lower relative risk of being infected with subtype D compared to subtype A (adjusted relative-risk ratio (aRR) = 0.12, 95% CI 0.02, 0.76). Furthermore, there were trends to suggest lower relative risk of subtypes C, D and URF compared to subtype A with older age (aRR = 0.86, 95% CI 0.54,1.38; aRR = 0.44, 95% CI 0.23,0.85; and aRR = 0.67, 95% CI 0.34,1.33, per 10 years, respectively).
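Because age was modelled as a linear effect on the log scale, a relative-risk ratio reported per 10 years can be re-expressed for another age increment by scaling its logarithm. The sketch below illustrates this interpretation only; it does not re-fit the model, and the function name is ours.

```python
import math


def rescale_rrr(rrr: float, from_years: float, to_years: float) -> float:
    """Re-express a relative-risk ratio for a different age increment.

    Assumes the effect is linear in age on the log scale, so
    RRR(to) = exp(log(RRR(from)) * to / from).
    """
    return math.exp(math.log(rrr) * to_years / from_years)


# Subtype D versus A: aRR = 0.44 per 10 years of age (from the text).
per_year = rescale_rrr(0.44, from_years=10, to_years=1)
per_20_years = rescale_rrr(0.44, from_years=10, to_years=20)  # equals 0.44 squared
```

The same transformation can be applied to each confidence limit, since the interval is also constructed on the log scale.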

Discussion
This study describes the molecular epidemiology of HIV-1 among men and women in a peri-urban general population and among women known to be at increased risk of HIV-1 infection in northern Tanzania. We observed multiple HIV-1 subtypes in the study populations, confirming the complex genetic diversity of HIV-1 strains circulating in these areas.
Our results show that HIV-1 subtypes A (39%) and C (33%) were the most prevalent, followed by subtype D (18%) and URF (11%). This is consistent with previous studies conducted in northern Tanzania [8,11,29]. To our knowledge, this is the first study to investigate the molecular epidemiology of HIV-1 in the towns near Mwanza city, the main Tanzanian urban centre on the shores of Lake Victoria. To our knowledge, only one study has previously described the distribution of HIV-1 subtypes around Lake Victoria, and this was conducted in Bukoba town of Kagera region [8]. Findings from the previous and present studies suggest that HIV-1 subtypes A and C may be the most prevalent subtypes in these regions around Lake Victoria.
We observed a substantial proportion of recombinant viruses across the three cohorts, with the general population cohort in Kisesa having the highest frequency. However, this was a relatively lower frequency than previously reported [7]. This confirms that re-infection (or superinfection) may occur among people infected with HIV, leading to recombinants that are subsequently transmitted [30]. The high genetic variability of the virus may pose significant problems in the specificity and/or sensitivity of serological and molecular diagnostic tests [31]. In Uganda, genetic variability was shown to have an effect on response to ART [32] and the development of HIV dementia among individuals with advanced immunosuppression [32] and this has implications for the treatment of HIV infection. Likewise, an effective anti-HIV-1 vaccine should elicit efficient cellular as well as humoral immune responses and, in particular, broadly neutralising antibodies able to target the largest number of HIV-1 genetic forms [1,15,16,33].
We found only weak evidence of a difference in subtype distribution by cohort, with a suggestion of higher prevalence of subtype C, and lower subtype D, in hospitality facility-workers from towns near Mwanza city compared to hospitality facility-workers in Moshi town and the Kisesa general population. Higher prevalence of subtype C among hospitality facility-workers may be because they are more likely to be exposed to sex partners from areas where subtype C is highly prevalent. We observed that females were less likely to be infected with subtype D, compared to subtype A. Furthermore, we found trends towards higher relative prevalence of subtype A with older age. This may be due to subtype A, along with subtype D, being more predominant at earlier stages of the epidemic, with subtype C increasing over time, or may be related to better prognosis among those infected with subtype A compared to subtype D [34,35]. While this is a small study, these differences warrant further investigation.
In the present study, we genotyped samples from multiple cohorts recruited in five towns in northern Tanzania, including women known to be at increased risk of infection and men and women from the general population. Thus, we were able to compare our results across the study populations with different socio-demographic and risk behaviour profiles. These strengths should be considered in light of a number of limitations. Firstly, many of the participants were enrolled as prevalent cases, and so the time of infection was unknown and may have been many years earlier. This is relevant both because subtype distributions may change over time, and because data on some risk factors, particularly sexual behaviour, may apply to periods after infection. Secondly, we amplified only a partial region of the polymerase gene, and this may lead to misclassification of HIV-1 recombinant forms. However, sequencing this gene provides an opportunity for future studies to examine primary HIV drug resistance mutations [36]. Thirdly, the sample size was small, and included only a relatively small number of men and seroconverters; the latter constrained our ability to examine trends in the distribution over time.
In summary, we have described the molecular epidemiology of HIV-1 in northern Tanzania across divergent populations, and confirmed the complex genetic diversity of HIV-1 strains circulating in the study areas. These results underscore the need for appropriate HIV-1 vaccine development to address the multifaceted HIV epidemic. As reported in previous studies, HIV-1 subtypes A and C were the most prevalent viral strains. We also observed a substantial proportion of recombinant viruses, although somewhat lower than previously reported. We found some differences in the distribution of HIV-1 subtypes between cohorts, and by age and sex. This study provides the groundwork for future HIV-1 vaccine research to be conducted in northern Tanzania. The populations involved in this study provide opportunity to assess the efficacy of candidate vaccines against diverse HIV-1 strains.