The classification of HIV-1 strains into subtypes and Circulating Recombinant Forms (CRFs) has helped in tracking the course of the HIV pandemic. In Senegal, which is located at the tip of West Africa, CRF02_AG predominates in the general population and among Female Sex Workers (FSWs). In contrast, 40% of Men having Sex with Men (MSM) in Senegal are infected with subtype C. In this study we analyzed the geographical origins and introduction dates of HIV-1 C in Senegal in order to better understand the evolutionary history of this subtype, which predominates today in the MSM population.
We used a combination of phylogenetic analyses and a Bayesian coalescent-based approach to study the phylogenetic relationships in pol of 56 subtype C isolates from Senegal with 3,025 subtype C strains that were sampled worldwide. Our analysis shows a significantly well-supported cluster that contains all subtype C strains circulating among MSM in Senegal. The MSM cluster and other strains from Senegal are widely dispersed among the different subclusters of African HIV-1 C strains, suggesting multiple introductions of subtype C into Senegal from many different southern and east African countries. More detailed analyses show that HIV-1 C strains from MSM are more closely related to those from southern Africa. The MRCA of subtype C in the MSM population in Senegal is estimated to date to the early 80's.
Our evolutionary reconstructions suggest that multiple subtype C viruses with a common ancestor originating in the early 1970s entered Senegal. There was only one efficient spread in the MSM population, which most likely resulted from a single introduction, underlining the importance of high-risk behavior in spread of viruses.
Citation: Jung M, Leye N, Vidal N, Fargette D, Diop H, Toure Kane C, et al. (2012) The Origin and Evolutionary History of HIV-1 Subtype C in Senegal. PLoS ONE 7(3): e33579. https://doi.org/10.1371/journal.pone.0033579
Editor: Chiyu Zhang, Jiangsu University, China
Received: September 26, 2011; Accepted: February 15, 2012; Published: March 28, 2012
Copyright: © 2012 Jung et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Funding: MJ was supported by a PhD grant from the Région Languedoc-Roussillon and from the University of Montpellier 2, France. Nafissatou Leye has a PhD grant from S.C.A.C. (Service de Coopération et d'Action Culturelle) of the French Embassy in Senegal. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
HIV-1 group M, which predominates in the global HIV/AIDS epidemic, can be further subdivided into subtypes (A–D, F–H, J, K), sub-subtypes (A1 to A4, F1 and F2), circulating recombinant forms (CRF01 to CRF51) and numerous unique recombinant forms (URFs) (www.hiv.lanl.gov). This genetic diversity has an impact on almost all aspects of the management of this infection, from identification and monitoring of infected persons to treatment efficacy and vaccine design. The classification of HIV strains has also helped in tracking the course of the HIV pandemic. Numerous molecular epidemiological studies showed a heterogeneous geographic distribution of the different HIV-1 M subtypes and CRFs. The initial diversification of group M most likely occurred within or near the Democratic Republic of Congo (DRC), where the highest diversity of group M strains has been observed and the earliest cases of HIV-1 infection (1959 and 1960) have been documented in Kinshasa, the capital city. Different HIV variants then spread across the world, and the epidemics in the different continents and countries are the result of different founder effects. Today, subtype C accounts for 50% of all infections. The majority of subtype C infections are found in southern Africa where they represent almost 100% of circulating HIV-1 strains. Subtype C also predominates in India, Ethiopia and southern China, and has entered East Africa, Brazil, and many European countries. With increasing mobility and human migration, HIV-1 variants inevitably intermix in different parts of the world and the distribution of the different HIV-1 variants is a dynamic process.
In Senegal, which is located at the tip of West Africa, both AIDS viruses, HIV-1 and HIV-2, co-circulate. HIV-2 was first described in Senegal, but as in other West African countries, the prevalence of HIV-2 remained low and is decreasing. Today HIV-1 predominates, and since the description of the first HIV-1 AIDS case in 1986, HIV-1 seroprevalence has remained below 1% in the general population but can reach up to 20% in population groups with high-risk behavior such as female sex workers (FSWs) or men having sex with men (MSM). Several studies showed that CRF02_AG predominates in Senegal, representing 50–70% of circulating strains in the general population and FSWs, but in contrast to surrounding West African countries, a wide diversity of other HIV-1 variants co-circulate; subtypes A1, A3, B, D, F, G, H, CRF01, CRF06, CRF09, CRF11, CRF45 and HIV-1 group O have all been documented. As mentioned above, the distribution of HIV-1 subtypes/CRFs can differ between geographic origins and between population groups. Recently our studies showed that 40% of MSM in Senegal are infected with subtype C, which is in strong contrast with 4% to 10% in the general population and FSWs. The factors associated with the rapid spread of subtype C and its predominance in the global epidemic are not entirely known, but in certain regions where it has been introduced, subtype C has overtaken other HIV-1 variants. The high prevalence and the rapid spread of subtype C among MSM thus need particular attention, because they could also lead to an increase over time of subtype C in the general population, given that more than 90% of MSM report having sex with women.
Using a combination of phylogenetic analyses and a Bayesian coalescent-based approach, we studied the phylogenetic relationships of subtype C isolates from Senegal with other subtype C strains that were sampled worldwide, in order to define the origin and onset of the subtype C epidemic in MSM in Senegal.
Origin of subtype C sequences in Senegal
Among the HIV-1 subtype C pol sequences that were downloaded, we first eliminated all sequences that were not identified as subtype C (i.e. intersubtype recombinants) by the REGA-subtyping tool and kept only one isolate per patient. The final dataset includes a total of 3,081 sequences spanning a 1,011 bp fragment in pol between positions 2,253 and 3,263 on the HXB2 genome, including 56 strains from Senegal (of which 24 are from MSM and 18 are newly sequenced) (Table 1 and Table S1). Sequences were included from 4 different continents and 61 countries: Africa (22 countries), the Americas (7 countries), Asia (9 countries) and Europe (23 countries) (Table 2). The majority (67.73%) of the sequences are from Africa, and more precisely from southern Africa (55.14%): mainly South Africa (22.36%) and Zambia (20.55%), and to a lesser extent Botswana (4.32%), Mozambique (3.18%), Malawi (2.30%), Swaziland (1.53%), and Zimbabwe (0.91%). Subtype C sequences from Asia are predominantly from India (355 of a total of 380 sequences) and those from the Americas mainly from southern Brazil (253 of a total of 299 sequences). Subtype C sequences from Europe represent 10.22% of the dataset and were collected from 23 different countries, without a single country or area that predominates in the dataset.
The maximum likelihood (PhyML) tree of the 3,081 subtype C sequences is shown in Figure 1. The strains from Senegal are highlighted in red, those from southern Africa (South Africa, Zambia, Zimbabwe, Malawi, Mozambique, Botswana, and Swaziland) in orange and those from the other African countries, which are predominantly from East Africa, in yellow. Strains from Asia, the Americas, and Europe are highlighted in green, purple and blue, respectively. The sequences from Senegal are interspersed with the other African strains, but one significant cluster (98.9% aLRT support), which comprised all sequences obtained from MSM from Senegal, was identified. The phylogenetic tree also shows separate clades for subtype C strains from southern Africa and one clade from eastern Africa (cluster B, 75.9% aLRT support), each of which contains sequences from Senegal. The tree shows the presence of two other major clusters, one for the majority of South American strains (cluster A, purple) and one for the Asian strains (cluster C, green), each apparently resulting from a different single introduction, but no strain from Senegal was observed in these clusters. The clusters from South America and Asia are supported by aLRT values of 72.7% and 82.3%, respectively. No significant cluster of European subtype C strains was observed; they are all interspersed with strains from different geographic origins, mainly in Africa and in Asia and southern America. In order to exclude the possibility of artifactual phylogenetic clustering due to drug-induced convergent evolution, especially for the clades from Senegal, the phylogenetic tree analysis was repeated on an alignment from which 43 codon positions (i.e. 129 nt, ∼12.7% of the full alignment) known to be associated with major resistance mutations were removed. This analysis shows the same subtype C clusters (Figure S1).
Maximum likelihood (PhyML) phylogenetic tree based on 1,011 nucleotide sites of the pol gene sequence (nucleotides 2,253–3,263 of HXB2 coordinates) from 3,081 HIV-1 subtype C isolates. Sequences were isolated in the countries shown in Table 2. Sequences are colored according to their region of origin: Senegal in red, southern African countries (South Africa, Botswana, Malawi, Mozambique, Swaziland, Zambia and Zimbabwe) in orange, other African countries (mainly from the East) in yellow, North and South America in purple, Asia in green and Europe in blue. The branch supports (aLRT) of clades A, B, C and MSM are 73%, 76%, 82% and 99%, respectively.
The above analysis showed that subtype C was introduced into Senegal on multiple occasions. Figure 2 shows in more detail the subtype C sequences that are most closely related to those observed in Senegal. As described in Materials and Methods, only sequences that branched with one or more sequences from Senegal up to the second ancestral node in the phylogenetic tree of the 3,081 sequences were used for this subtree. In addition to the 56 sequences from Senegal, 121 other subtype C sequences were included (Table S2); together these 177 sequences represent 5.7% of the total alignment. Figure 2 shows the tree obtained by PhyML with strains colored according to their geographic origin (the same tree with strain names is available in Figure S2). HIV-1 strains from Zambia are represented by a separate color in this tree because strains from this country are frequently present. The majority of the subtype C strains from Senegal and those from the MSM cluster (node C) fall in clusters (aLRT >85%) that are mainly represented by strains from Zambia and other countries from southern Africa (for example nodes A, E and F). Nevertheless, some strains from Senegal are related to subtype C from east African countries (mostly Ethiopia: node D). Although the exact country of origin of the most recent common ancestor of the MSM strains remains uncertain, it was most likely in southern Africa. The first ancestral node to the MSM cluster (node B) suggests an origin in Zambia, but this node is only supported by an aLRT value of 83.7% and a bootstrap value of 11%. The next ancestral node (node A), supported by an aLRT value of 94.7% and a bootstrap value of 49%, contains mainly strains from Zambia but also from other southern African countries. The Bayesian phylogenetic tree analysis performed with MrBayes shows similar results (Figure S3).
Detailed maximum likelihood (PhyML) phylogenetic tree constructed using 1,011 nucleotide sites of the pol gene sequence (nucleotides 2,253–3,263 of HXB2 coordinates) from 177 HIV-1 subtype C isolates from Senegal and close relatives (see text). Branch support values (bootstrap and aLRT) are displayed (see figure legend). Colors indicate the geographic origin and sequences were isolated in the following countries: 56 in red from Senegal, 25 in orange from Zambia, 49 in yellow from southern Africa (Botswana 6; Mozambique 5; Swaziland 2; South Africa 35; Zimbabwe 1), 14 in green from East Africa (Burundi 2; Ethiopia 9; Kenya 1; Sudan 2), 3 in blue from other African countries (DRC 1; Equatorial Guinea 1; Gabon 1) and 30 in black from European and Asian countries (Belgium 4; China 1; Germany 2; Denmark 1; Spain 5; France 1; Greece 1; Israel 1; India 1; Italy 1; Luxembourg 1; Norway 2; Portugal 2; Sweden 7).
Dating the subtype C epidemic in Senegal and MSM population
We used a Bayesian MCMC approach implemented in BEAST v1.6.1 to estimate the dates of the most recent common ancestors (MRCAs) for the subtype C sequences from Senegal in the general population and for the subtype C epidemic in the MSM population. We combined the Bayesian skyride population growth model with three molecular clock models: strict, relaxed uncorrelated lognormal, and relaxed uncorrelated exponential. Moreover, we used four different priors, with varying levels of information, on the average substitution rate among branches. Figure 3 shows the resulting estimates of the MRCA dates for the different models and priors used. More details are provided in Table S3, including substitution rate estimations.
Coalescent-based estimations (BEAST) and 95% highest posterior density (HPD) intervals of the MRCA dates of 56 HIV-1 subtype C pol sequences obtained from the general and the MSM population. Results are displayed for all tested substitution rate priors and molecular clock models, except for the relaxed exponential model with the two less informative priors, which yields very large 95% HPD intervals and shows convergence problems (see Table S3 for detailed results, including substitution rate estimations).
Bayes factors (BF) indicate that the relaxed exponential model has a small advantage (BF in the 3 to 5 range) over the relaxed lognormal model, which in turn is slightly better (BF in the 3 to 6 range) than the strict molecular clock. However, the relaxed exponential model becomes non-informative when non- or poorly informative priors on the substitution rate are used (U[0,1] and N[2.5×10⁻³, 10×10⁻⁴], see Materials and Methods): it reveals spurious peaks leading to very large (up to ∼400 years) 95% Highest Posterior Density (HPD) intervals and unrealistic estimates. Except in these two cases, the results with all models and priors are quite consistent. As expected, more informative priors yielded narrower 95% HPD intervals. Nevertheless, the median date estimates of the MRCA of subtype C in the general population of Senegal and of the MRCA of the MSM cluster are each similar across all models and priors. They indicate a likely epidemic origin in the early 80's in the MSM population, whereas the MRCA of the subtype C strains that entered the general population on multiple occasions (i.e. by heterosexual or mother-to-child transmission) is estimated in the early 70's.
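The model comparison above rests on Bayes factors computed from marginal likelihoods. As a minimal sketch of that arithmetic, the Python snippet below estimates a log marginal likelihood from posterior log-likelihood samples and takes the difference between two models. The harmonic-mean estimator is an assumption made here for illustration (the text only states that marginal likelihoods were taken from Tracer v.1.5), the function names are hypothetical, and the sample values are invented placeholders, not results from this study.

```python
import math


def log_marginal_likelihood_hme(log_likelihoods):
    """Harmonic-mean estimate of the log marginal likelihood.

    Uses log ML = log(n) - logsumexp(-loglik_i), evaluated stably.
    """
    n = len(log_likelihoods)
    negated = [-x for x in log_likelihoods]
    shift = max(negated)
    log_sum = shift + math.log(sum(math.exp(v - shift) for v in negated))
    return math.log(n) - log_sum


def log_bayes_factor(log_ml_a, log_ml_b):
    """Log Bayes factor of model A over model B (positive favors A)."""
    return log_ml_a - log_ml_b


# Invented posterior log-likelihood samples for two clock models (illustration only).
samples_a = [-10234.1, -10236.7, -10233.9, -10235.2]
samples_b = [-10238.4, -10240.0, -10237.5, -10239.1]

log_bf = log_bayes_factor(
    log_marginal_likelihood_hme(samples_a),
    log_marginal_likelihood_hme(samples_b),
)
print(f"log Bayes factor (A over B): {log_bf:.2f}")
```

Harmonic-mean estimates are known to be unstable, so a sketch like this is only meant to show how the comparison is formed, not to reproduce the reported values.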
To illustrate in more detail the MRCA of the subtype C strains in the MSM population and their relation to the other HIV-1 C strains from Senegal, the maximum clade credibility (MCC) tree with time scale obtained from BEAST is shown in Figure 4. We see the same MSM cluster as in the phylogeny of Figure 2 (see also Figure S2 and S3), and the early 70's and 80's dates for the MRCAs of general and MSM population respectively.
Maximum clade credibility tree with time scale obtained with BEAST using 1,011 nucleotide sites of pol gene sequences (nucleotides 2,253–3,263 of HXB2 coordinates) from 56 HIV-1 subtype C isolates from Senegal. This tree is obtained using the relaxed uncorrelated lognormal molecular clock model and moderately informative substitution rate prior (Normal: 2.5×10⁻³, 7.5×10⁻⁴). Clades with posterior probabilities ≥95% are indicated by diamonds. MSM isolates are colored in red.
We verified whether presence of drug resistance mutations could have an impact on MRCA dates and substitution rate estimations. Therefore calculations were repeated on the three different molecular clock models and for the four priors on an alignment where 43 codon positions known to be associated with major resistance mutations were removed. This analysis showed no significant difference, compared to the results obtained with the complete alignment (Table S3 for details on estimations and Figure S4 for the MCC tree with time scale).
Finally, our reconstruction of the demographic history of HIV-1 C in Senegal identified an initial slow growth phase until the end of the 70's, followed by a period of quick exponential-like growth until the end of the 90's, when epidemic growth became slower (Figure 5).
Estimates of the HIV-1 C effective number of infections (Ne) over time from 56 Senegalese pol sequences using a Bayesian skyride plot in BEAST with relaxed uncorrelated lognormal molecular clock and moderately informative substitution rate prior (ucld.mean Normal: 2.5×10⁻³, 7.5×10⁻⁴). The X-axis represents the time in years. The Y-axis represents the HIV-1 effective number of infections (log10 scale). The black line marks the median estimate for Ne and the blue shaded region displays the 95% highest posterior density (HPD) interval.
In this study we analyzed the geographical origins and introduction dates of HIV-1 subtype C in Senegal in order to better understand the evolutionary history of this subtype, which predominates today in the MSM population. Our evolutionary reconstructions suggest that multiple subtype C viruses with a common ancestor originating in the early 1970s entered the country, followed by a sharp growth of the effective number of infections over the next decade.
This analysis of more than 3,000 globally collected reference sequences most likely provides an adequate representation of global subtype C diversity, and also provides additional information on the subtype C epidemic in other continents. The phylogenetic tree analysis showed several major clusters of subtype C sequences, mainly related to the continent of origin, like Asia, South America or Africa, except for Europe. Interestingly, among the African strains, a separate cluster of strains derived from patients living in east African countries was observed, whereas subtype C strains from Europe do not form a separate cluster and are interspersed among the different continents and major clusters. Our data also confirm the previously reported link of the subtype C epidemic in Brazil with east Africa.
Our analyses with various methods (PhyML, MrBayes and BEAST) showed a significantly well-supported cluster which contained all subtype C strains that circulate among MSM in Senegal. The MSM cluster and other strains from Senegal are widely dispersed among the different subclusters of African strains, suggesting multiple introductions of subtype C into Senegal from many different southern and also eastern African countries. More detailed analyses showed that the majority of the HIV-1 C strains from Senegal, including those circulating among MSM, are more closely related to strains from southern African countries, mainly Zambia. The cluster of subtype C strains derived from the MSM population also includes strains from HIV-1 infected men from Senegal who were not identified as MSM. Homosexuality is illegal in Senegal and male-to-male sex is condemned by political and religious authorities and by the general population. Most MSM therefore keep their sexual life secret, including from their own family, and more than 90% of MSM reported also having sex with women. Thus, these additional strains in the MSM cluster are most likely from individuals with male-to-male sex activities. Subtype C in MSM may have originated directly from southern Africa, but it is also possible that the ancestor of this subtype C cluster had already circulated for a certain period in the general population in Senegal before it was introduced into the MSM group.
The wide diversity and multiple introductions of subtype C also fit with the distribution of the HIV-1 variants in the general population in Senegal. Several studies showed that in addition to CRF02_AG, many other HIV-1 subtypes and CRFs are also present in the country, reflecting multiple introductions. This is most likely related to the important trading activity and travel links of the country with many other African countries. Our estimates suggest that the MRCA of the subtype C strains that entered Senegal dates to the early 1970s, about 10–15 years before the description of the first HIV-1 AIDS case in the country or of the first HIV-1 subtype C strain in Senegal in 1988. The MRCA date estimate of subtype C in Senegal is relatively close to those estimated in other African countries, like 1966 for subtype C in Ethiopia, the beginning of the 70's for Zimbabwe or the late 60's for Malawi. As expected, we found that the MRCA of subtype C in Senegal is not specific to the country, because multiple introductions occurred, and our MRCA date estimate most likely corresponds to that of subtype C strains outside Senegal. In contrast to southern African countries, subtype C did not become the predominant strain in Senegal and spread efficiently only in the MSM population, underlining the importance of high-risk behavior in the spread of viruses. The MRCA of subtype C in the MSM population is estimated in the early 80's and is the result of a single introduction. This estimate coincides with the period when the HIV-1 C epidemic started a quick exponential-like growth phase in Senegal that lasted nearly 15 years according to the Bayesian skyride analysis.
Our study also showed that analyzing alignments with or without codons associated with drug resistance did not have a significant impact on phylogenetic clustering or on MRCA date and substitution rate estimations. Among the different molecular clock models used, Bayes factors favored the relaxed exponential molecular clock over the more frequently used relaxed lognormal molecular clock. However, the very large 95% HPD intervals and convergence problems of the exponential model with poorly informative priors, and the nearly identical results of both models with informative priors, probably explain the preferential use of the relaxed lognormal molecular clock model for HIV.
Previous studies suggest that subtype C could spread more efficiently due to the predominance of CCR5 variants or a stronger predisposition for localization in the female genital mucosa than other subtypes, which may facilitate both vertical and heterosexual transmission. Increase of subtype C could also have implications on treatment because other subtype C specific mutations have been documented and commercial drug resistance assays cannot correctly test subtype C infections. A cross-sectional study of women in Kenya indicated that women infected with subtype C had a higher viral load and lower CD4 counts than those infected with subtypes A and D, which could also have an impact on pathogenesis and transmission. Therefore, it is important to continue to monitor HIV-1 subtype/CRF distribution among different population groups in Senegal. However, in order to be able to compare trends over time, such studies should be organized in a standardized way. For example, WHO proposed standardized protocols for surveillance of drug resistance mutations in recently infected individuals. These studies can be combined with subtype/CRF characterization.
Because MSM reported also having sex with women, they could potentially serve as a bridge between high-risk men and low-risk women. This sexual mixing pattern might contribute in the future to a subsequent increase of subtype C in the general population. An increase from 4% to almost 10% between 2000 and 2010 among the general population in Senegal has already been observed, and subtype C sequences obtained in 2011 from HIV-1 C infected women that cluster within the clade of strains from the MSM population have now been observed (Coumba Toure Kane, unpublished results). Understanding the origins and dispersal patterns of HIV-1 clades at regional and country levels is useful to improve the characterization and control of HIV spread. Continuous monitoring of HIV variants seems necessary to adapt treatment and vaccine strategies so that they are efficient against local and contemporary circulating HIV variants.
Materials and Methods
Nucleotide sequence dataset
In order to increase the number of sequences and to cover a wide geographic range, we used the pol region for our analysis. Pol sequences are highly studied because they are the target of antiretroviral drugs. A total of 56 subtype C pol gene sequences from Senegal were used in this study. Thirty-eight were obtained from the Los Alamos HIV sequence database (www.hiv.lanl.gov) from previously published reports and eighteen were newly characterized from ongoing molecular epidemiology and/or drug resistance studies mainly in Dakar, the capital city of Senegal (Table 1). We downloaded only sequences that were at least 1,000 nucleotides in length and spanned the genomic region covering protease and the majority of RT in pol between positions 2,253–3,263 on the HXB2 genome. Sequences were from blood samples collected between 1990 and 2009. In addition, all available subtype C sequences spanning the same genomic region and for which country of origin and sampling year were known were also downloaded from the Los Alamos HIV database (www.hiv.lanl.gov). We then submitted all the sequences to the REGA subtyping tool v.2 to confirm subtype assignments and to eliminate possible intersubtype recombinants. We selected one sequence per individual when sequential sequences were available or when sequences were epidemiologically linked by direct donor–recipient transmission.
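The selection rules described above (coverage of HXB2 positions 2,253–3,263, a minimum length of 1,000 nucleotides, known country and sampling year, one sequence per individual) can be sketched as a simple filter. The Python example below is illustrative only: the `Record` fields and the `select_records` function are hypothetical, and keeping the first record encountered for each individual is an assumption, since the text does not state which sequence was retained.

```python
from dataclasses import dataclass
from typing import Optional

REGION_START, REGION_END = 2253, 3263  # HXB2 coordinates given in the text
MIN_LENGTH = 1000                      # minimum sequence length given in the text


@dataclass
class Record:
    """Hypothetical metadata for one downloaded sequence."""
    accession: str
    patient_id: str
    hxb2_start: int
    hxb2_end: int
    country: Optional[str]
    year: Optional[int]


def select_records(records):
    """Keep records that satisfy the inclusion criteria, one per patient."""
    kept, seen_patients = [], set()
    for rec in records:
        span = rec.hxb2_end - rec.hxb2_start + 1
        covers_region = rec.hxb2_start <= REGION_START and rec.hxb2_end >= REGION_END
        if span < MIN_LENGTH or not covers_region:
            continue  # too short or does not span the pol fragment
        if rec.country is None or rec.year is None:
            continue  # country of origin or sampling year unknown
        if rec.patient_id in seen_patients:
            continue  # assumption: keep the first record seen per individual
        seen_patients.add(rec.patient_id)
        kept.append(rec)
    return kept
```

Subtype confirmation with the REGA subtyping tool is a separate step that is not modeled in this sketch.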
HIV-1 pol sequencing
The 18 new HIV-1 pol sequences were obtained with an in-house technique as previously described. Briefly, RNA was extracted using the QIAamp Viral RNA extraction kit (Qiagen SA, Courtaboeuf, France) and processed for reverse transcription polymerase chain reaction (RT-PCR) with the integrase-specific primer IN3 5′-TCTATBCCATCTAAAAATAGTACTTTCCTGATTCC-3′ using the Expand reverse transcriptase (Roche Diagnostics, Meylan, France) according to the manufacturer's instructions. The resulting cDNA served as template in the subsequent nested PCR reaction during which a 1,865 base pair fragment, corresponding to the protease and the first 440 amino acids of the reverse transcriptase region of the pol gene, was amplified with previously described primers and cycling conditions using the Expand Long Template PCR system (Roche Diagnostics, Meylan, France). The amplified HIV-1 nucleic acid fragments were purified using the Geneclean Turbo Kit (Q-Biogen, MPbiomedicals, France) and directly sequenced with primers encompassing the pol region using BigDye Terminator version 3.1 (Applied Biosystems, Courtaboeuf, France) according to the manufacturer's instructions. Electrophoresis and data collection were done on an Applied Biosystems 3130XL Genetic Analyzer. The sequenced fragments from both strands were assembled using Seqman II from the DNAstar package v5.08 (Lasergene, Madison, WI, USA).
Sequence alignment and phylogenetic tree analysis
The 18 newly obtained sequences were aligned with the alignment of subtype C sequences downloaded from the Los Alamos HIV database, using the L-INS-i method from MAFFT, and then manually edited with MEGA5. The HXB2 subtype B prototype strain was used as outgroup. In order to study potential bias due to drug-induced convergent evolution, all our analyses were also repeated on an alignment from which we removed 43 codon positions known to be associated with major resistance mutations according to the WHO list of 2009. The following positions were excluded for protease (23, 24, 30, 32, 46, 47, 48, 50, 53, 54, 73, 76, 82, 83, 84, 85, 88, 90) and RT (41, 65, 67, 69, 70, 74, 75, 77, 100, 101, 103, 106, 115, 116, 151, 179, 181, 184, 188, 190, 210, 215, 219, 225, 230), leaving 882 nt in the final alignment. Both complete (1,011 nt) and restricted (882 nt) sequence alignments are available from the authors upon request. Maximum likelihood phylogenies were inferred using the GTR+I+Γ4 nucleotide substitution model, as implemented in PhyML v3.0. The SPR option was selected to search the tree space and aLRT SH-like branch supports were used to assess confidence in topology. The phylogenetic tree was drawn with FIGTREE (tree.bio.ed.ac.uk/software/figtree/).
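The removal of the 43 resistance-associated codons can be illustrated with a short Python sketch. It assumes that the 1,011 nt alignment is in frame, with protease codon 1 at the first column (HXB2 position 2,253) and RT codon 1 immediately after the 99 protease codons; the function names are hypothetical and this is not the authors' own script.

```python
# Codon positions listed in the text (protease and RT numbering).
PR_POSITIONS = [23, 24, 30, 32, 46, 47, 48, 50, 53, 54, 73, 76, 82, 83, 84, 85, 88, 90]
RT_POSITIONS = [41, 65, 67, 69, 70, 74, 75, 77, 100, 101, 103, 106, 115, 116,
                151, 179, 181, 184, 188, 190, 210, 215, 219, 225, 230]

PR_CODONS = 99  # assumption: protease spans 99 codons, so RT codon 1 is alignment codon 100


def columns_to_drop(pr_positions=PR_POSITIONS, rt_positions=RT_POSITIONS):
    """Return the 0-based alignment columns covered by the excluded codons."""
    codon_indices = [p - 1 for p in pr_positions]
    codon_indices += [PR_CODONS + p - 1 for p in rt_positions]
    columns = set()
    for codon in codon_indices:
        columns.update((3 * codon, 3 * codon + 1, 3 * codon + 2))
    return columns


def strip_codons(sequence, drop=None):
    """Remove the excluded codon columns from one aligned sequence."""
    drop = columns_to_drop() if drop is None else drop
    return "".join(base for i, base in enumerate(sequence) if i not in drop)


placeholder = "A" * 1011  # stand-in for one row of the full alignment
print(len(columns_to_drop()), len(strip_codons(placeholder)))
```

Under these assumptions the 18 protease and 25 RT codons cover 129 columns, which is consistent with the 882 nt restricted alignment reported in the text.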
In order to better determine and visualize the relationship of the subtype C sequences from Senegal to those from other geographic areas, another phylogenetic analysis was performed with fewer sequences. For this subtree, we collected from the previous, large phylogenetic tree all descendant sequences of nodes that are first- or second-level ancestors of at least one sequence from Senegal (i.e., all Senegalese sequences plus their sisters and close relatives). A phylogeny was then inferred using the same method and options as described above, but in addition to aLRT we ran a non-parametric bootstrap with 100 replicates to obtain a second assessment of branch supports. A phylogeny of this subset of sequences was also inferred using MrBayes v3.1 with the same substitution model as for the maximum likelihood tree, and with chain length and tree sampling frequency of 5×10⁷ and 1×10⁴ generations, respectively. A burn-in of 2,000 sampled trees (i.e. ∼40%) was selected. By the end of the run, the average standard deviation of split frequencies was below 0.01 and the potential scale reduction factor of every parameter was in the range [0.999, 1.001], except for the parameter pinvar, which was at 1.002, indicating convergence of the Markov chains (see MrBayes manual).
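The subtree selection rule described above (all descendants of the first- or second-level ancestors of any sequence from Senegal) can be sketched on a toy rooted tree. The `Node` class, the `close_relatives` function and the leaf names below are hypothetical and serve only to illustrate the rule; a real analysis would read the PhyML tree with a dedicated tree library, and the recursive traversal used here is only suitable for small examples.

```python
class Node:
    """Minimal rooted-tree node; leaves carry a sequence name."""

    def __init__(self, name=None, children=()):
        self.name = name
        self.children = list(children)
        self.parent = None
        for child in self.children:
            child.parent = self

    def leaves(self):
        """Return all leaf nodes below (or equal to) this node."""
        if not self.children:
            return [self]
        found = []
        for child in self.children:
            found.extend(child.leaves())
        return found


def close_relatives(root, is_target, levels=2):
    """Names of all leaves below the `levels`-th ancestor of any target leaf.

    Descendants of the second-level ancestor include those of the
    first-level ancestor, so climbing `levels` steps is sufficient.
    """
    selected = set()
    for leaf in root.leaves():
        if not is_target(leaf.name):
            continue
        node = leaf
        for _ in range(levels):
            if node.parent is None:
                break
            node = node.parent
        selected.update(tip.name for tip in node.leaves())
    return selected


# Toy tree with invented leaf names: (((SN1, ZM1), ZA1), ((IN1, IN2), BR1))
toy = Node(children=[
    Node(children=[Node(children=[Node("SN1"), Node("ZM1")]), Node("ZA1")]),
    Node(children=[Node(children=[Node("IN1"), Node("IN2")]), Node("BR1")]),
])

print(sorted(close_relatives(toy, lambda name: name.startswith("SN"))))
```

In this toy example the selection keeps the Senegalese leaf together with its sister and the sister of their common parent, and leaves out the unrelated clade.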
Dating the introduction of subtype C in Senegal and MSM population
Estimates of the substitution rate and dates of the most recent common ancestor (MRCA) of subtype C in Senegal and in the sub-epidemic in MSM were obtained using BEAST v1.6.1. The 56 pol gene subtype C sequences from Senegal were analyzed under a GTR+I+Γ4 substitution process (as for the phylogenetic analyses). We used three different molecular clock models (strict clock, relaxed uncorrelated exponential and relaxed uncorrelated lognormal) as implemented in BEAST, with a Bayesian skyride tree prior as a coalescent demographic model with time-aware smoothing. For the rate parameter of each molecular clock model (ucld.mean, uced.mean and clock.rate for the relaxed lognormal, relaxed exponential and strict molecular clock, respectively) we tested a total of four different priors: one non-informative prior based on a uniform distribution (between 0.0 and 1.0) and three priors with varying information levels based on a normal distribution with a mean of 2.5×10⁻³ (based on estimations from a previous study in the same genomic region and as estimated by Path-O-Gen: tree.bio.ed.ac.uk/software/pathogen/) and standard deviations of 10×10⁻⁴, 7.5×10⁻⁴, and 5.0×10⁻⁴, respectively. For the ucld.stdev parameter (representing the variability of the rates among branches for the relaxed lognormal molecular clock) we used a prior based on an exponential distribution with a mean of 0.1 (personal communication with A. Drummond). MCMC simulations were run for 2.5×10⁸ chain steps with sub-sampling every 2.5×10⁵ steps. Convergence of the chains was inspected using Tracer v.1.5. For each tested prior and for each parameter, effective sample size (ESS) values were always above 300. The Bayes factor was calculated to compare molecular clock models, using marginal likelihoods as implemented in Tracer v.1.5. The Maximum Clade Credibility (MCC) tree with time scale was obtained with TreeAnnotator v1.6.1 with a burn-in of the first hundred trees.
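To give a sense of how informative the three normal priors on the substitution rate are, the following Python sketch computes the central 95% interval of each prior from the mean and standard deviations stated above. It uses only the standard library; the function name `central_interval` is hypothetical, and the calculation ignores any truncation at zero that the software may apply.

```python
from statistics import NormalDist

PRIOR_MEAN = 2.5e-3                  # prior mean given in the text
PRIOR_SDS = [10e-4, 7.5e-4, 5.0e-4]  # prior standard deviations given in the text


def central_interval(mean, sd, mass=0.95):
    """Return the central interval holding `mass` of a normal distribution."""
    dist = NormalDist(mean, sd)
    tail = (1.0 - mass) / 2.0
    return dist.inv_cdf(tail), dist.inv_cdf(1.0 - tail)


for sd in PRIOR_SDS:
    low, high = central_interval(PRIOR_MEAN, sd)
    print(f"sd={sd:.1e}: 95% of prior mass between {low:.2e} and {high:.2e}")
```

The interval narrows as the standard deviation decreases, which is the sense in which the three priors carry increasing amounts of information.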
Maximum likelihood phylogenetic tree based on 3,081 HIV-1 subtype C pol sequences, without codons associated with drug resistance in PR and RT. Maximum likelihood phylogenetic tree (PhyML, with the same options as for the tree in Figure 1) based on 882 nucleotide sites of pol gene sequence from 3,081 HIV-1 subtype C isolates; nucleotide sites with coordinates 2,253–3,263 of HXB2 are included, but codon positions known to be associated with major resistance mutations according to the WHO list of 2009 were removed (see Materials and Methods). Sequences were isolated in the countries shown in Table 2. Sequences are colored according to their region of origin: Senegal in red, southern African countries (South Africa, Botswana, Malawi, Mozambique, Swaziland, Zambia and Zimbabwe) in orange, other African countries (mainly from the East) in yellow, North and South America in purple, Asia in green and Europe in blue. The branch support values (aLRT) of clades A, B, C and MSM are 94%, 92%, 83% and 96%, respectively.
Maximum likelihood phylogenetic tree constructed from 56 HIV-1 C pol sequences from Senegal and 121 close relatives. Detailed maximum likelihood (PhyML) phylogenetic tree constructed using 1,011 nucleotide sites of pol gene sequence (nucleotides 2,253–3,263 of HXB2 coordinates) from 177 HIV-1 subtype C isolates from Senegal and close relatives (see Materials and Methods), as shown in Figure 2 but with the strain names added. Branch support values (bootstrap and aLRT) are displayed (see figure legend). Colors indicate the geographic origin of the sequences, which were isolated in the following countries: 56 in red from Senegal, 25 in orange from Zambia, 49 in yellow from southern Africa (Botswana 6; Mozambique 5; Swaziland 2; South Africa 35; Zimbabwe 1), 14 in green from East Africa (Burundi 2; Ethiopia 9; Kenya 1; Sudan 2), 3 in blue from other African countries (DRC 1; Equatorial Guinea 1; Gabon 1) and 30 in black from European and Asian countries (Belgium 4; China 1; Germany 2; Denmark 1; Spain 5; France 1; Greece 1; Israel 1; India 1; Italy 1; Luxembourg 1; Norway 2; Portugal 2; Sweden 7).
Bayesian phylogenetic tree of 56 HIV-1 C pol sequences from Senegal and 121 close relatives. Detailed Bayesian phylogenetic tree (MrBayes, same model and similar options as for the tree in Figure 2, see Materials and Methods) constructed using 1,011 nucleotide sites of pol gene sequence (nucleotides 2,253–3,263 of HXB2 coordinates) from 177 HIV-1 subtype C isolates from Senegal and close relatives. Clades with posterior probabilities ≥95% are shown. Colors indicate the geographic origin of the sequences, which were isolated in the following countries: 56 in red from Senegal, 25 in orange from Zambia, 49 in yellow from southern Africa (Botswana 6; Mozambique 5; Swaziland 2; South Africa 35; Zimbabwe 1), 14 in green from East Africa (Burundi 2; Ethiopia 9; Kenya 1; Sudan 2), 3 in blue from other African countries (DRC 1; Equatorial Guinea 1; Gabon 1) and 30 in black from European and Asian countries (Belgium 4; China 1; Germany 2; Denmark 1; Spain 5; France 1; Greece 1; Israel 1; India 1; Italy 1; Luxembourg 1; Norway 2; Portugal 2; Sweden 7).
Bayesian tree with time scale of 56 HIV-1 C pol sequences from Senegal, without sites associated with known major resistance in PR and RT. Maximum clade credibility tree with time scale obtained with BEAST using 1,011 nucleotide sites of pol gene sequences (nucleotides 2,253–3,263 of HXB2 coordinates) from 56 HIV-1 subtype C isolates from Senegal. This tree was obtained using the relaxed uncorrelated lognormal molecular clock model and the moderately informative substitution rate prior (Normal: 2.5×10⁻³, 7.5×10⁻⁴). Clades with posterior probabilities ≥95% are indicated by diamonds. MSM isolates are colored in red.
GenBank accession numbers per country of subtype C HIV-1 strains included in the study.
Details of the strains included in the restricted phylogenetic tree analysis from Figures 2, S2 and S3.
Dating the subtype C epidemic in the general and MSM populations in Senegal. Coalescent-based estimates (BEAST) and 95% highest posterior density (HPD) intervals of the MRCA dates and substitution rates of 56 HIV-1 subtype C pol sequences obtained from the general and the MSM populations. Results are displayed for all tested substitution rate priors and molecular clock models.
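For readers unfamiliar with HPD intervals, the following Python sketch shows one common way to compute them from posterior samples: the shortest interval that contains the requested share of the sorted samples. It is an illustration only, with a hypothetical function name and toy input values; it is not the code used by BEAST or Tracer and does not reproduce any estimate from the study.

```python
import math
from typing import Sequence, Tuple


def hpd_interval(samples: Sequence[float], mass: float = 0.95) -> Tuple[float, float]:
    """Shortest interval containing at least `mass` of the samples."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0.0 < mass <= 1.0:
        raise ValueError("mass must be in (0, 1]")
    ordered = sorted(samples)
    n = len(ordered)
    # Number of samples the interval must cover; the small tolerance
    # guards against floating-point noise in mass * n.
    window = max(1, math.ceil(mass * n - 1e-9))
    best_low, best_high = ordered[0], ordered[window - 1]
    # Slide a window of fixed sample count and keep the narrowest one.
    for start in range(1, n - window + 1):
        low, high = ordered[start], ordered[start + window - 1]
        if high - low < best_high - best_low:
            best_low, best_high = low, high
    return best_low, best_high


if __name__ == "__main__":
    # Toy values, for illustration only.
    toy = [0.0, 10.0, 10.5, 30.0]
    print(hpd_interval(toy, mass=0.5))
    print(hpd_interval(toy, mass=1.0))
```

Unlike an equal-tailed interval, the HPD interval is not forced to be symmetric around the median, which matters for skewed posteriors such as those of node ages.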
Conceived and designed the experiments: MP OG. Performed the experiments: MJ NL NV. Analyzed the data: MJ NL NV DF HD CTK OG MP. Contributed reagents/materials/analysis tools: HD CTK. Wrote the paper: MJ DF OG MP.