HIV-1 subtype diversity, transmission networks and transmitted drug resistance amongst acute and early infected MSM populations from Coastal Kenya

Background HIV-1 molecular epidemiology amongst men who have sex with men (MSM) in sub-Saharan Africa remains poorly characterized. We aimed to determine HIV-1 subtype distribution, transmission clusters and transmitted drug resistance (TDR) in acute and early infected MSM from Coastal Kenya. Methods We analyzed HIV-1 partial pol sequences from MSM recruited 2005–2017 and sampled within six months of the estimated date of infection. Volunteers were classified as men who have sex with men exclusively (MSME) or with both men and women (MSMW). HIV-1 subtype and transmission clusters were determined by maximum-likelihood phylogenetics. TDR mutations were determined using the Stanford HIV drug resistance database. Results Of the 97 volunteers, the majority (69%) were MSMW; 74%, 16%, 9% and 1% had HIV-1 subtypes A1, D, C and G, respectively. Overall, 65% formed transmission clusters, with substantial mixing between MSME and MSMW. The majority of volunteer sequences were either not linked to any reference sequence (56%) or clustered exclusively with sequences of Kenyan origin (19%). Eight (8% [95% CI: 4–16]) had at least one TDR mutation against nucleoside (n = 2 [2%]) and/or non-nucleoside (n = 7 [7%]) reverse transcriptase inhibitors. The most prevalent TDR mutation was K103N (n = 5), with sequences forming transmission clusters of two and three taxa each. There were no significant differences in HIV-1 subtype distribution or TDR between MSME and MSMW. Conclusions This HIV-1 MSM epidemic was predominantly sub-subtype A1, of Kenyan origin, with many transmission clusters and an intermediate level of TDR. Targeted HIV-1 prevention, early identification and care interventions are warranted to break the transmission cycle amongst MSM from Coastal Kenya.


Study design
Data and samples were obtained from a prospective observational study following high-risk volunteers in an HIV-1 vaccine feasibility study in Coastal Kenya. MSM volunteers, further characterized into MSMW and MSME, were recruited between 2005 and 2017, and followed monthly or quarterly as previously described [12].
Identification of acute and early infection in this cohort has also been described in detail elsewhere [12]. In brief, HIV-1 testing was performed at each study visit using two rapid antibody test kits in parallel (Determine, Abbott Laboratories; Unigold, Trinity Biotech). Discordant results were resolved using an enzyme-linked immunosorbent assay (ELISA, Genetic System HIV-1/2 plus O EIA; Bio-Rad). All HIV-1 negative or discordant samples were tested for p24 antigen (Vironostika HIV-1 p24 ELISA; Biomerieux), and pre-seroconversion and post-seroconversion samples were tested for HIV-1 ribonucleic acid (RNA) (Amplicor Monitor 1.5; Roche).
HIV-1 antibody test results were relayed to volunteers in real time. All volunteers testing HIV-1 negative were supported with risk reduction counselling. Those testing HIV-1 positive were either enrolled for follow up care in other early infection studies [29,30], or referred to their proximate clinics of choice for follow up care and antiretroviral therapy (ART).
In Kenya, ART was rolled out in public health facilities in 2006, with eligibility based on a pre-defined CD4+ T-cell count or WHO clinical staging criteria. The standard first-line regimen included two nucleoside reverse transcriptase inhibitors (NRTIs) and a non-nucleoside reverse transcriptase inhibitor (NNRTI). Individuals failing the first-line regimen were switched to a second-line regimen comprising two NRTIs and a protease inhibitor (PI) [31]. Immediate ART initiation, regardless of CD4+ T-cell count or WHO clinical stage, was recommended from 2016 [32].
For the purpose of our analysis, the earliest HIV-1 infected samples (or HIV-1 pol sequence, where available) from volunteers diagnosed with acute and early infection, defined as samples collected within 6 months of the estimated date of infection (EDI), were considered [33]. Overall, 97 MSM volunteers met our eligibility criteria and were included in the analyses.

HIV-1 genotyping
Of those included in our analyses, samples from 81 volunteers had available HIV-1 pol sequence data (genotyping details published elsewhere) [28]. Genotyping of the remaining 16 samples was done as follows: HIV-1 RNA was extracted from 100 μl of blood plasma using the RNeasy lipid tissue mini kit (Qiagen). HIV-1 RNA was reverse transcribed and PCR amplified using the one-step Superscript III RT/Platinum Taq High Fidelity protocol (Invitrogen) according to the manufacturer's instructions, with pol primers JA269-JA272 [34]. A nested PCR with primers JA270 and JA271 was then done using Dream Taq DNA polymerase (5 U/μl) (ThermoFisher Scientific) according to the manufacturer's instructions. Successful amplification was confirmed by agarose gel electrophoresis, and PCR products were prepared for sequencing using the inner primers (JA270 and JA271) and the Big Dye terminator kit (Applied Biosystems). These were processed using the 3130 genetic analyzers (Applied Biosystems).

HIV-1 subtype determination
The forward and reverse fragments were assembled using Sequencher (v5.4.6) and saved in a consensus FASTA file. All sequence files were aligned using the Clustal algorithm in MEGA7 [35]. In addition, the most recent (2010) HIV-1 subtype reference sequence dataset was obtained from the Los Alamos HIV sequence database (https://www.hiv.lanl.gov/content/index). A profile alignment was done in Clustal X2 (v2.1) [36] for the volunteer and the reference sequences. The combined volunteer and reference sequence alignment was edited in MEGA7 and submitted for phylogenetic reconstruction using the general time reversible (GTR) model of nucleotide substitution with gamma distributed rate heterogeneity. Branch support was assessed using the Shimodaira-Hasegawa like approximate Likelihood Ratio Test (aLRT-SH) on the PhyML online portal [37]. The resulting phylogenetic tree was viewed in Figtree (v1.4.3) (http://tree.bio.ed.ac.uk/software/figtree/), with branch support of aLRT-SH ≥ 0.90 considered significant [38].

Transmission clusters
Based on the subtyping results above, sequences were grouped into the main subtypes observed. For each subtype-specific dataset, a search for related sequences was done separately using the NCBI GenBank BLAST tool [39], with results limited to a threshold of 10 similar hits per volunteer sequence. Duplicate sequences were removed based on the sequence identifiers and accession numbers. Redundant sequences were then removed using Skipredundant on EMBOSS (http://www.bioinformatics.nl/cgi-bin/emboss/skipredundant). Every single hit was further explored to identify and exclude previously published volunteer sequences.
Overall, 330 reference sequences were identified (S1 Table). These were aligned in turn with volunteer subtypes A, C and D sequences, and submitted for phylogenetic analysis as outlined above. Clusters were identified using Cluster Picker [40]. Branch support of aLRT-SH ≥ 0.90 and a genetic distance of ≤ 0.06 were considered acceptable to infer transmission clusters [41]. Active transmission clusters were further explored using aLRT-SH branch support of ≥ 0.900 and genetic distance ≤ 0.015 [41]. Transmission networks were defined based on the number of MSM sequences as dyads (2 sequences) and networks (≥ 3 sequences) [38,42].
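The cluster definitions above can be expressed as a small rule. The following is a minimal sketch, not the Cluster Picker implementation: the `Cluster` record, its field names and the `classify` function are hypothetical, and it assumes that branch support, maximum within-cluster genetic distance and the number of MSM sequences have already been extracted for each candidate cluster.

```python
from dataclasses import dataclass

# Thresholds quoted in the text; the constant names are illustrative.
MIN_SUPPORT = 0.90           # aLRT-SH branch support
MAX_DISTANCE = 0.06          # genetic distance for a transmission cluster
ACTIVE_MAX_DISTANCE = 0.015  # stricter distance for an "active" cluster


@dataclass
class Cluster:
    """Hypothetical summary of one candidate cluster."""
    support: float       # aLRT-SH support of the cluster's node
    max_distance: float  # largest genetic distance within the cluster
    n_msm: int           # number of MSM volunteer sequences in the cluster


def classify(cluster: Cluster) -> tuple[str, bool]:
    """Return (label, is_active); label is 'none', 'dyad' or 'network'."""
    qualifies = (
        cluster.support >= MIN_SUPPORT
        and cluster.max_distance <= MAX_DISTANCE
        and cluster.n_msm >= 2
    )
    if not qualifies:
        return ("none", False)
    label = "dyad" if cluster.n_msm == 2 else "network"
    is_active = cluster.max_distance <= ACTIVE_MAX_DISTANCE
    return (label, is_active)


# Made-up example values, for illustration only.
print(classify(Cluster(support=0.95, max_distance=0.010, n_msm=3)))
print(classify(Cluster(support=0.92, max_distance=0.040, n_msm=2)))
```

Counting only MSM volunteer sequences, rather than all tips in the clade, follows the definition of dyads and networks given above; reference sequences in the same clade would not change the label in this sketch.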

Transmitted drug resistance
Volunteer sequences were submitted to the Stanford HIV drug resistance database using the calibrated population resistance tool to screen for pol resistance-associated mutations (http://cpr.stanford.edu/cpr.cgi). Transmitted resistance mutations were identified based on the WHO list for surveillance of genotypic drug resistance mutations [43]. The prevalence of transmitted resistance was estimated and presented with 95% binomial confidence intervals (CI). In addition, phylogenetic analysis of volunteer and reference sequences was repeated, as described above, to assess for clustering amongst isolates identified with surveillance drug resistance mutations.
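For illustration, a 95% binomial confidence interval of this kind can be computed as below. This is a sketch that assumes the exact (Clopper-Pearson) method; the text does not state which binomial method was used, and the function names are illustrative. It uses only the Python standard library and finds each bound by bisection on the binomial distribution function.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _root(h, iters=60):
    """Bisection on [0, 1] for a function h that increases from negative to positive."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if h(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(k, n, alpha=0.05):
    """Exact two-sided (1 - alpha) interval for k events in n trials.

    Assumption: the exact method is used here for illustration only.
    """
    # Lower bound: the p at which P(X >= k) equals alpha / 2.
    lower = 0.0 if k == 0 else _root(
        lambda p: (1 - binom_cdf(k - 1, n, p)) - alpha / 2
    )
    # Upper bound: the p at which P(X <= k) equals alpha / 2.
    upper = 1.0 if k == n else _root(
        lambda p: alpha / 2 - binom_cdf(k, n, p)
    )
    return lower, upper


# 8 of 97 volunteers had at least one TDR mutation (figures from the abstract).
low, high = clopper_pearson(8, 97)
print(f"{8 / 97:.1%} (95% CI: {low:.1%} to {high:.1%})")
```

The resulting interval can be compared against the reported 95% CI of 4–16%; other binomial methods (for example Wilson) would give slightly different bounds.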

Data analysis
Continuous data were presented using medians and interquartile ranges (IQR). Age was further stratified into two categories based on the median value as youth/younger adults (<25.0 years) and older (≥25.0 years) participants. Categorical data were presented using frequencies and percentages. Associations in continuous data were determined using the non-parametric rank-sum test. Associations in categorical data were determined using Pearson's chi-squared test. All analyses were done using Stata/IC 15.0 (StataCorp LLC).
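As a worked illustration of the categorical comparison, Pearson's chi-squared test for a 2x2 table can be sketched as follows. The function name is illustrative, no continuity correction is applied (an assumption), and the example counts are hypothetical: they were back-calculated to be consistent with the TDR percentages reported for older and younger volunteers (15.0% vs 3.5%) and are not taken from the study tables.

```python
from math import erfc, sqrt


def chi2_2x2(a, b, c, d):
    """Pearson chi-squared test (no continuity correction) for [[a, b], [c, d]].

    Returns (statistic, p_value) on 1 degree of freedom.
    """
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    observed = ((a, b), (c, d))
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            stat += (observed[i][j] - expected) ** 2 / expected
    # Survival function of a chi-squared variable with 1 df.
    p_value = erfc(sqrt(stat / 2))
    return stat, p_value


# Hypothetical counts: TDR present / absent in older (6 / 34)
# and younger (2 / 55) volunteers. Assumed, not from the source tables.
stat, p = chi2_2x2(6, 34, 2, 55)
print(f"chi2 = {stat:.2f}, p = {p:.3f}")
```

With expected cell counts this small, an exact test is often preferred; the sketch keeps to the Pearson test named in the text.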

Ethical considerations
The study received ethics approval from the Kenya Medical Research Institute (KEMRI) Scientific and Ethics Review Unit (parent protocol numbers SSC 894 and SSC 1027). All volunteers provided written informed consent. All consensus HIV-1 pol sequence FASTA files are available from GenBank (accession numbers MK192535-MK192631).

Transmission clusters
i) Subtype A1. Of the 72 sub-subtype A1 volunteer sequences, the majority were either not related to any of the reference sequences [...] Overall, the 72 sub-subtype A1 volunteer sequences had a mean genetic distance of 0.019 nucleotide substitutions/site. There was no significant difference between the mean genetic distances of MSME and MSMW sequences (0.024 vs 0.017 nucleotide substitutions/site, respectively, p = 0.218).
ii) Subtype C. Of the nine subtype C volunteer sequences, 7 (78%) formed a network and a dyad (Fig 2, S2 Fig). Overall, the nine subtype C volunteer sequences had a mean genetic distance of 0.014 nucleotide substitutions/site. There was no significant difference between the mean genetic distances of MSME and MSMW volunteers (0.003 vs 0.019 nucleotide substitutions/site, respectively, p = 0.211).
iii) Subtype D. Of the 15 subtype D volunteer sequences, 8 (53%) formed two networks and one dyad (Fig 2, S3 Fig). Both networks were triads and comprised mixed MSME/MSMW clusters. The dyad also comprised a mixed MSME/MSMW cluster. The majority of the subtype D sequences were not related to any of the reference sequences (n = 10 [67%]). The remaining sequences were related to reference sequences of either Kenyan (n = 3 [20%]) or Ugandan (n = 1 [7%]) origin. One sequence (7%) was related to a mix of sequences of both Kenyan and Ugandan origin (S3 Fig). Overall, the 15 subtype D volunteer sequences had a mean genetic distance of 0.021 nucleotide substitutions/site. There was no significant difference between the mean genetic distances of MSME and MSMW volunteers (0.019 vs 0.021 nucleotide substitutions/site, respectively, p = 0.849).
[...] N = 158). Tip labels are colored according to risk group as follows: grey (references), blue (men who have sex with men exclusively, MSME) and green (men who have sex with men and women, MSMW). Branches are colored according to HIV-1 pol subtype inferences as follows: grey (subtype references), red (subtype A1), purple (subtype D), brown (subtype C) and green (subtype G).
The most prevalent TDR mutation was K103N (n = 5). The HIV-1 pol sequences from these volunteers formed two highly supported transmission clusters of three and two sequences each (S4 [...]). There was evidence to suggest that older volunteers had higher TDR levels than younger volunteers (25.0–39.9 vs 18.0–24.9 years; 15.0% vs 3.5%, p = 0.043). A temporal increase in the prevalence of TDR was also observed, from an estimated 3.9% in 2005–2008 to 13.6% in 2013–2017. However, this did not attain statistical significance (p = 0.470). There were also no statistically significant TDR differences between MSME and MSMW volunteers (6.7% vs 9.0%, p = 0.705) (Table 4).

Discussion
While little mixing has been reported between MSM and the general population in Kenya [9], our findings demonstrate substantial intermingling between MSME and MSMW in Coastal Kenya. This is evident from the mixing of MSME and MSMW sequences observed in most of the clusters, and from the comparable genetic distances in the two groups. In addition, there were no significant differences in HIV-1 subtype distribution or transmitted drug resistance between MSME and MSMW. Our data therefore suggest that the HIV-1 MSME and MSMW epidemics are molecularly homogeneous, and that the differential HIV-1 acquisition risk is likely behavioral, including a higher frequency of receptive anal intercourse among MSME as reported earlier [12].
Overall, our data also confirm that the HIV-1 MSM epidemic from Coastal Kenya is largely characterized by a predominance of sub-subtype A1 infections, which is consistent with subtype diversity literature from Kenya [13–18]. However, while some studies have reported high and/or increasing proportions of circulating or complex recombinant forms amongst the general heterosexual population [19–22], this was not evident in our study population. The absence of recombinants in this HIV-1 infected MSM population is consistent with a previous study reporting little mixing between the general heterosexual and MSM populations [9]. However, subtype inferences in our analyses were based on partial HIV-1 pol sequence data, and the possibility that we missed recombination breakpoints occurring outside the pol region cannot be ruled out. Indeed, full genome analysis of 13 MSM isolates from Coastal Kenya reported four unique recombinant forms, with sub-subtype A1 related segments in most of the pol region, but C and D segments occurring in vpu, gag, env and nef [44].
About two-thirds of the volunteer sequences formed phylogenetically linked clusters, with little genetic variability between related sequences. This is consistent with literature on MSM epidemics from sub-Saharan Africa (sSA) [11] and from more developed settings [8,38]. While some networks comprised volunteers who were all infected within one year of each other, one network included individuals who were infected over a period of more than nine years. These findings point towards a high proportion of both active and long-sustained transmission networks. However, the high clustering may also reflect the high sampling density of MSM in our setting. Nonetheless, prevention interventions, including pre-exposure prophylaxis (PrEP), particularly targeting MSM at high risk of being in transmission clusters, are warranted. Early identification and linkage to care of HIV-1 infected but unaware MSM may also contribute to the control of the MSM epidemic in this setting.
The majority of our HIV-1 MSM sequences were either not related to any reference sequence, or closely related to sequences of Kenyan origin. Our data therefore suggest that the HIV-1 MSM epidemic in Coastal Kenya is predominantly of local origin, and not necessarily imported from outside Africa. Not surprisingly, the few remaining volunteer sequences were mostly linked to sequences from other African countries, including subtype D predominant Uganda, subtype A predominant Tanzania, and subtype C predominant Burundi and Zimbabwe. This is likely a reflection of the extensive transport and commerce networks by road and railway through the Coastal town of Mombasa, which have been postulated to contribute to the spread of HIV-1 in East Africa [21,45,46]. Only four volunteer sequences suggested genetic relatedness with references from outside Africa, including the United States of America, Finland and the Philippines. It is possible that these references are from Kenyans who travelled or emigrated to these countries, or from residents of these countries who were infected whilst visiting Coastal Kenya, a common tourist destination.
Our data also suggest a possible TDR increase, from low (<5%) to intermediate (5–15%) level resistance, over the last decade amongst HIV-1 infected MSM in Coastal Kenya, which is consistent with other studies from the general population in Kenya [26]. The intermediate level resistance in our early and recently infected MSM population is consistent with estimates from HIV-1 newly diagnosed heterosexual adults [24] but contrasts with low TDR levels reported in volunteers with chronic HIV-1 infection [27], all from the same setting. This may reflect decreased viral fitness of transmitted resistant variants during early infection, and their subsequent replacement by wild-type virus with better replicative capacity in the absence of selection from ART in chronic infection [47,48].
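The TDR levels referred to above can be written as a simple lookup. This is a minimal sketch: the function name is illustrative, the low and intermediate bands are those quoted in the text, and the "high" band (above 15%) is an assumption taken from the usual WHO surveillance categories rather than from this study.

```python
def tdr_level(prevalence_pct: float) -> str:
    """Map a TDR prevalence (in percent) to a surveillance category."""
    if prevalence_pct < 5:
        return "low"
    if prevalence_pct <= 15:
        return "intermediate"
    # Assumed category; not stated in the text.
    return "high"


# Period estimates reported in the results: 3.9% (2005-2008) and 13.6% (2013-2017).
for period, pct in [("2005-2008", 3.9), ("2013-2017", 13.6)]:
    print(period, tdr_level(pct))
```

The categories apply to point estimates only; as noted in the results, the temporal difference itself did not reach statistical significance.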
Observed TDR mutations were non-complex, with a predominance of the common NNRTI-associated K103N mutation. The high frequency of K103N is consistent with literature from Kenya [23,24,26], and has been attributed to the widespread use of NNRTIs in first-line ART and the historic use of nevirapine monotherapy for the prevention of mother-to-child transmission. NNRTIs have a low genetic barrier to resistance [49] and mutations may persist for long durations [50], which facilitates onward transmission. Indeed, all our observed K103N mutations fell within two phylogenetically related transmission clusters spanning several years, suggesting that they may have been in circulation for a while and were unknowingly propagated in the community as new infections. This observation has also been reported in developed settings [51,52]. Interventions focused on early identification of infected MSM and linkage to care are therefore warranted, and may contribute to a subsequent reduction in transmitted NNRTI resistance mutations.
The main strength of our study is the use of data and samples from a well-characterized acute and early infected MSM cohort collected over more than 10 years in a sSA setting. However, our study is not without limitations. Firstly, the small sample size limited our ability to conduct an in-depth analysis of temporal effects and associations by subtype, transmission cluster and transmitted drug resistance. Secondly, we only included data and samples from MSM populations and were therefore unable to make systematic comparisons of subtype distribution, transmission clusters and TDR between MSM and the general heterosexual population. Lastly, we only used MSM data and samples from one setting along Coastal Kenya, which limits the extent to which our findings may be generalized to the national level or to other regions.
In conclusion, and limitations notwithstanding, our data from an HIV-1 acute and early infected MSM population suggest that this concentrated epidemic is characterized by a predominance of HIV-1 sub-subtype A1, likely of Kenyan origin, with many MSM transmission clusters and an intermediate level of TDR. The high proportion of both active and long-sustained transmission networks is likely propagated by HIV-1 infected individuals unaware of their HIV-1 status. Targeted HIV-1 prevention, early identification and care interventions are therefore warranted to break the transmission cycle amongst MSM in Coastal Kenya. A bigger, well designed, and nationally representative study aimed at understanding the molecular epidemiology of HIV-1 infection within and between high-risk groups is also critical to inform targeted interventions towards controlling the HIV-1 epidemic in Kenya.

Supporting information
S1 Table. A