FL and AJLB designed the study, and FL performed preliminary analyses. GJH analyzed the data and prepared the results for publication. AR developed the BEAST software and provided advice and assistance for data analysis. ALP recruited study subjects and supervised clinical care. All authors contributed to writing the paper.
¤ Current address: Veterinary Epidemiology Research Unit, SAC (Scottish Agricultural College), Inverness, Scotland
ALP reports receiving consulting and lecture fees from Bristol-Myers Squibb, Gilead Sciences, and GlaxoSmithKline. The other authors declare that they have no competing interests.
The structure of sexual contact networks plays a key role in the epidemiology of sexually transmitted infections, and their reconstruction from interview data has provided valuable insights into the spread of infection. For HIV, the long period of infectivity has made the interpretation of contact networks more difficult, and major discrepancies have been observed between the contact network and the transmission network revealed by viral phylogenetics. The high rate of HIV evolution in principle allows for detailed reconstruction of links between virus from different individuals, but often sampling has been too sparse to describe the structure of the transmission network. The aim of this study was to analyze a high-density sample of an HIV-infected population using recently developed techniques in phylogenetics to infer the short-term dynamics of the epidemic among men who have sex with men (MSM).
Sequences of the protease and reverse transcriptase coding regions from 2,126 patients, predominantly MSM, from London were compared: 402 of these showed a close match to at least one other subtype B sequence. Nine large clusters were identified on the basis of genetic distance; all were confirmed by Bayesian Markov chain Monte Carlo (MCMC) phylogenetic analysis. Overall, 25% of individuals with a close match to at least one other sequence were linked to 10 or more others. Dated phylogenies of the clusters using a relaxed clock indicated that 65% of the transmissions within clusters took place between 1995 and 2000, and that 25% occurred within 6 mo after infection. Because not all members of the clusters are likely to have been identified, the latter observation is conservative.
Reconstruction of the HIV transmission network using a dated phylogeny approach has revealed the HIV epidemic among MSM in London to have been episodic, with evidence of multiple clusters of transmissions dating to the late 1990s, a period when HIV prevalence is known to have doubled in this population. The quantitative description of the transmission dynamics among MSM will be important for parameterization of epidemiological models and in designing intervention strategies.
Using viral genotype data from HIV drug resistance testing at a London clinic, Andrew Leigh Brown and colleagues derive the structure of the transmission network through phylogenetic analysis.
Human immunodeficiency virus (HIV), the cause of acquired immunodeficiency syndrome (AIDS), is mainly spread through unprotected sex with an infected partner. Like other sexually transmitted diseases, HIV/AIDS spreads through networks of sexual contacts. The characteristics of these complex networks (which include people who have serial sexual relationships with single partners and people who have concurrent sexual relationships with several partners) affect how quickly diseases spread in the short term and how common the disease is in the long term. For many sexually transmitted diseases, sexual contact networks can be reconstructed from interview data. The information gained in this way can be used for partner notification so that transmitters of the disease and people who may have been unknowingly infected can be identified, treated, and advised about disease prevention. It can also be used to develop effective community-based prevention strategies.
Although sexual contact networks have provided valuable information about the spread of many sexually transmitted diseases, they cannot easily be used to understand HIV transmission patterns. This is because the period of infectivity with HIV is long and the risk of infection from a single sexual contact with an infected person is low. Another way to understand the spread of HIV is through phylogenetics, which examines the genetic relatedness of viruses obtained from different individuals. Frequent small changes in the genetic blueprint of HIV allow the virus to evade the human immune response and to become resistant to antiretroviral drugs; because these changes accumulate quickly, they also leave a record of how closely the viruses infecting different people are related. In this study, the researchers use recently developed analytical methods, viral sequences from a large proportion of a specific HIV-infected population, and information on when each sample was taken, to learn about transmission of HIV/AIDS in London among men who have sex with men (MSM; a term that encompasses gay, bisexual, and transgendered men and heterosexual men who sometimes have sex with men). This new approach, which combines information on viral genetic variation and viral population dynamics, is called “molecular phylodynamics.”
The researchers compared the sequences of the genes encoding the HIV-1 protease and reverse transcriptase from more than 2,000 patients, mainly MSM, attending a large London HIV clinic between 1997 and 2003. Of these, 402 sequences closely matched at least one other subtype B sequence (the HIV/AIDS epidemic among MSM in the UK primarily involves HIV subtype B). Further analysis showed that, based on the genetic relatedness of their viruses, the patients from whom this subset of sequences came formed six clusters of ten or more individuals, as well as many smaller clusters. The researchers then used information on the date when each sample was collected and a “relaxed clock” approach (which accounts for the possibility that different sequences evolve at different rates) to determine dated phylogenies (patterns of genetic relatedness that also indicate when the sequences diverged) for the clusters. These phylogenies indicated that at least one in four transmissions between the individuals in the large clusters occurred within 6 months of infection, and that most of the transmissions within each cluster occurred over periods of 3–4 years during the late 1990s.
This phylodynamic reconstruction of the HIV transmission network among MSM in a London clinic indicates that the HIV epidemic in this population has been episodic with multiple clusters of transmission occurring during the late 1990s, a time when the number of HIV infections in this population doubled. It also suggests that transmission of the virus during the early stages of HIV infection is likely to be an important driver of the epidemic. Whether these results apply more generally to the MSM population at risk for transmitting or acquiring HIV depends on whether the patients in this study are representative of that group. Additional studies are needed to determine this, but if the patterns revealed here are generalizable, then this quantitative description of HIV transmission dynamics should help in the design of strategies to strengthen HIV prevention among MSM.
Please access these Web sites via the online version of this summary at
Read a related
Information is available from the US National Institute of Allergy and Infectious Diseases on
The US Centers for Disease Control and Prevention provides information on
Information is available from Avert, an international AIDS charity, on
The Center for AIDS Prevention Studies (University of California, San Francisco) provides information on
The US National Center for Biotechnology Information provides a
The NIH-funded
Sexually transmitted infections spread through an often complex network of sexual contacts [
Another line of investigation has taken the approach of phylogenetic analysis of population-based samples of viral sequences. These have yielded different outcomes according to risk group. Infections among injection drug user populations often reveal clustering to a greater or lesser extent [
One possible reason why uncertainty over the degree of clustering has persisted has been the nature of sampling. Diagnosis of acute (or recent) infection is usually made in only a small proportion of individuals. Pilcher et al. describe 107 (18%) and 23 (4%) individuals out of 583 as “recent” and “acute” infections, respectively [
There have been a number of recent developments that have permitted a new approach to this issue. The recommendation that patients with HIV should receive a genotype test for resistance before commencing antiretroviral therapy (ARV) [
In this study, we have used HIV sequences generated from routine clinical treatment to provide a dense sample of the population attending a large London clinic. We adopted a “relaxed clock” [
Our base dataset comprised 2,126 anonymised HIV-1 nucleotide sequences (concatenated full-length protease [PR] and partial reverse transcriptase [RT] coding sequences, 1,497 nucleotides in length) from patients attending HIV clinics at the Chelsea and Westminster Hospital, London, during the period 1997–2003. The Chelsea and Westminster clinic is the largest clinic serving patients with HIV in London, with more than 6,500 patients, contributing 29% of the London patients to the United Kingdom Collaborative HIV Cohort (UK CHIC) study in 2006. Its primary catchment area comprises the inner-city London boroughs of Westminster, Kensington and Chelsea, and Wandsworth, with an HIV-infected patient population that is characteristic of London, including a high proportion (>75%) of MSM. Sequences were provided by Virco BVBA (Mechelen, Belgium), having been generated for VircoGEN resistance reports for patients about to start therapy or experiencing failure of therapy. Overall, 384 (18%) patients were receiving ARV at the time the analyzed sample was taken. More details of patients receiving ARV are given in
For patients with multiple sequences, only the oldest sequence was included. Sex and self-reported exposure group were available for each sequence, but other identifiers and clinical data were delinked to maintain confidentiality. HIV-1 subtype was determined using the REGA HIV Subtyping Tool version 2.0 [
In order to remove the influence of convergent evolution at antiretroviral drug resistance mutations on the phylogenetic analysis, two versions of the dataset were analyzed: (i) third-base positions only (for analyses of exact and exact plus ambiguous differences; 499 sites) and (ii) a codon-stripped dataset from which 37 codons associated with major resistance in PR (30, 32, 33, 46, 47, 48, 50, 54, 76, 82, 84, 88, and 90) and RT (41, 62, 65, 67, 69, 70, 74, 75, 77, 100, 103, 106, 108, 115, 116, 151, 181, 184, 188, 190, 210, 215, 219, 225, and 236) were stripped from the alignment (leaving 1,386 nt). Analyses based on genetic distance used the first dataset with uncorrected (Hamming) distances; those based on Bayesian Markov chain Monte Carlo (MCMC) phylogenetic approaches used the second.
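The two alignment versions described above can be sketched in Python as follows. This is a minimal illustration rather than the study's own pipeline. It assumes (hypothetically) that every sequence is aligned in frame, that the concatenated sequence starts with the 99 PR codons, and that RT codon 1 follows immediately after PR codon 99; the helper names `third_positions` and `strip_codons` are illustrative.

```python
from typing import Iterable

# Assumption: full-length PR is 99 codons and RT codon 1 follows it directly.
PR_LENGTH_CODONS = 99


def third_positions(seq: str) -> str:
    """Return the bases at third codon positions (nt 3, 6, 9, ...)."""
    return seq[2::3]


def strip_codons(seq: str, pr_codons: Iterable[int], rt_codons: Iterable[int],
                 pr_len: int = PR_LENGTH_CODONS) -> str:
    """Remove whole codons (1-based numbering within each gene) from a PR+RT sequence."""
    # Translate gene-relative codon numbers into 0-based codon indices in the alignment.
    drop = {c - 1 for c in pr_codons} | {pr_len + c - 1 for c in rt_codons}
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    return "".join(codon for i, codon in enumerate(codons) if i not in drop)
```

Applied to a 1,497-nt sequence, `third_positions` returns the 499 third-base sites, and stripping any 37 distinct codons leaves 1,386 nt.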
Phylogenies were constructed with MrBayes [
The ancestral state of amino acids for each cluster group was determined using MrBayes. Separate runs (10⁶ generations, sampling every 100 generations, burn-in 25%) were performed for each group using an HIV-1 subtype C sequence as outgroup under the GTR + I + Γ model of nucleotide substitution. For each run, trees were constrained to include a topology prior for a monophyletic group comprising the cluster group of interest. At the root node of this constraint, the ancestral states of each of the 37 amino acid positions (both genes) associated with drug resistance were compared to known mutations attributed to drug resistance (
Dated phylogenies were obtained using a Bayesian MCMC method (BEAST version 1.4.2; available from
We analyzed consensus sequences of the HIV-1 PR and RT coding regions obtained from the plasma viral population of patients (
Initial comparisons of all sequences in the dataset required a simple, computationally tractable approach because of its large size (2.26 × 10⁶ pairwise comparisons), while recognizing the potential for bias through convergent evolution in patients prescribed the same drugs. This bias was reduced by restricting analysis to the third-base position in the 499 codons sequenced. We present these data as a colour density graph of the number of exact and ambiguous differences between all patient consensus sequences (
Key indicates the number of comparisons for each datapoint by colour. Sequences were compared at all 499 third-base sites, and each site was scored as an exact difference or an ambiguous difference (see text for details). Two major peaks reflect within-subtype (30–60 differences) and between-subtype (100–110 differences) comparisons, respectively. The third, smaller region of density close to the origin identifies patients with at least one other closely related sequence in the dataset.
In addition to the two major peaks, a small yet distinct third density peak can be seen close to the origin, which represents pairwise comparisons between particularly closely related sequences. The apparent trough in density between this and the adjacent group corresponded to approximately 25 nucleotide differences, which was used to define a subset of sequences that had at least one other close relative. This subset includes 483 patients, of whom 402 were infected with HIV-1 subtype B. These 402 were overwhelmingly MSM: only ten self-reported injection drug use (four female, six male) and 13 self-reported heterosexual contact (nine female, four male) as a risk factor. These 402 sequences were the basis of the detailed studies described below.
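The all-pairs screen can be sketched as below. The section does not define exact and ambiguous differences in detail, so this sketch assumes a definition: an exact difference is a site where both bases are unambiguous and differ, an ambiguous difference is a non-identical pair involving at least one IUPAC ambiguity code, and `N` or gap characters are skipped as missing data. Treating the cut-off of about 25 differences as inclusive and applying it to the sum of both kinds is likewise an assumption; `count_differences`, `closely_related`, and `n_comparisons` are hypothetical helper names.

```python
from itertools import combinations
from math import comb
from typing import Dict, Set, Tuple

# IUPAC nucleotide codes (N and gaps deliberately omitted: treated as missing data).
IUPAC: Dict[str, Set[str]] = {
    "A": {"A"}, "C": {"C"}, "G": {"G"}, "T": {"T"},
    "R": {"A", "G"}, "Y": {"C", "T"}, "S": {"C", "G"}, "W": {"A", "T"},
    "K": {"G", "T"}, "M": {"A", "C"},
    "B": {"C", "G", "T"}, "D": {"A", "G", "T"}, "H": {"A", "C", "T"}, "V": {"A", "C", "G"},
}


def count_differences(a: str, b: str) -> Tuple[int, int]:
    """Return (exact, ambiguous) differences between two aligned sequences."""
    exact = ambiguous = 0
    for x, y in zip(a.upper(), b.upper()):
        if x not in IUPAC or y not in IUPAC or x == y:
            continue  # missing data or identical codes
        if len(IUPAC[x]) == 1 and len(IUPAC[y]) == 1:
            exact += 1
        else:
            ambiguous += 1
    return exact, ambiguous


def closely_related(seqs: Dict[str, str], max_diff: int = 25) -> Set[str]:
    """IDs with at least one other sequence within max_diff total differences."""
    hits: Set[str] = set()
    for (id_a, sa), (id_b, sb) in combinations(seqs.items(), 2):
        exact, ambiguous = count_differences(sa, sb)
        if exact + ambiguous <= max_diff:
            hits.update((id_a, id_b))
    return hits


def n_comparisons(n: int) -> int:
    """Number of unordered pairs among n sequences."""
    return comb(n, 2)
```

For 2,126 sequences, `n_comparisons(2126)` gives 2,258,875 pairs, the 2.26 × 10⁶ comparisons mentioned earlier; at that scale a vectorized implementation would be far faster than this pure-Python loop.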
In order to identify patterns of transmission among the 402 subtype B sequences, two complementary approaches were adopted. The first, based on the matrix of pairwise exact differences at 499 third-base positions, recognized all linkages between individuals involving 24 differences (4.8%) or fewer. Many of the groups revealed linked only two or three individuals, but nine groups with six or more members were identified using this criterion (
Patients included in major clusters are represented by a red node, and connecting lines between red nodes represent a genetic distance of less than 4.8% (24 differences). Sensitivity of the clusters to the distance criterion shown by additional nodes in green (5.0%), blue (5.2%), and orange (5.4%).
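The distance-based grouping can be sketched as single-linkage clustering: every pair within the cut-off is joined, and groups are the connected components, so two members of one group may differ by more than the cut-off if they are linked through intermediates. Treating groups as connected components is an assumption about how the linkages were assembled, and the helper names are hypothetical. The sketch also shows how the percentage cut-offs map onto whole differences at 499 sites (4.8%, 5.0%, 5.2%, and 5.4% correspond to 24, 25, 26, and 27 differences).

```python
from itertools import combinations
from typing import Dict, List

N_SITES = 499  # third-base sites compared


def diff_threshold(percent: float, n_sites: int = N_SITES) -> int:
    """Convert a percentage distance cut-off to a whole number of differences."""
    return round(percent / 100 * n_sites)


def exact_differences(a: str, b: str) -> int:
    """Count sites where both bases are unambiguous (A/C/G/T) and differ."""
    return sum(1 for x, y in zip(a, b) if x in "ACGT" and y in "ACGT" and x != y)


def link_clusters(seqs: Dict[str, str], max_diff: int = 24) -> List[List[str]]:
    """Single-linkage groups (connected components), largest first."""
    ids = list(seqs)
    parent = {i: i for i in ids}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    for a, b in combinations(ids, 2):
        if exact_differences(seqs[a], seqs[b]) <= max_diff:
            ra, rb = find(a), find(b)
            if ra != rb:
                parent[rb] = ra  # merge the two groups
    groups: Dict[str, List[str]] = {}
    for i in ids:
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values(), key=len, reverse=True)
```

Rerunning `link_clusters` with `max_diff` set to 25, 26, and 27 reproduces the kind of sensitivity check shown in the figure, and filtering for `len(group) >= 6` selects the larger groups.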
In the second approach, we performed a detailed phylogenetic analysis on the same sequence dataset. To maximize resolution, the full sequence dataset was used (all codon positions), but with 37 codons associated with major resistance in PR and RT removed, leaving 1,386 aligned nucleotide positions. A Bayesian MCMC phylogeny was reconstructed from these site-stripped sequences, using a subtype C sequence as an outgroup (
Clusters with ≥10 members (and a posterior probability of 1) are shown in red. Letters indicate the position of identified clusters. Scale bar indicates number of substitutions.
A cluster is defined by nodes with a posterior probability of 1. Letters indicate the six largest clusters phylogenetically defined.
Six of the larger clusters were made up entirely of MSM (clusters B, D, E, F, H, and I). Clusters A and C were also composed primarily of MSM, but each included one female patient. Cluster G (seven individuals) included one female and one injection drug user. Thus, the clustering identified in this study reflects the epidemiology of HIV transmission by sexual contact among MSM.
Clusters A–E and H, comprising 88 patients in all, were selected for further analysis on an individual basis using phylogenetic methods incorporating a relaxed molecular clock (the other clusters were not large enough for this analysis). For each cluster, an exponential model of population growth was significantly favoured over a constant population size (data not shown). Evolutionary rate estimates were obtained using the unpartitioned GTR + I + Γ nucleotide substitution model and the SRD06 model, which has two rate partitions: one for third-base positions and one for first- plus second-base positions. Substitution rates at third-base positions varied nearly 3-fold among clusters (
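For reference, the two demographic models compared above are commonly parameterized as follows. This is the standard coalescent form, stated as an assumption because the exact BEAST settings are not given in this section: t is time measured backwards from the most recent sample, N_0 is the effective population size at t = 0, and r > 0 is the growth rate. Forward in time, an exponentially growing population doubles every t_2.

```latex
N_{\mathrm{const}}(t) = N_0, \qquad
N_{\mathrm{exp}}(t) = N_0 \, e^{-r t}, \quad t \ge 0, \qquad
t_{2} = \frac{\ln 2}{r}
```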
Relaxed-Clock Analysis of Major Clusters Using BEAST
Time-scaled phylogenies were generated using the partitioned SRD06 model.
(A) Full trees with scale bar graduations in years.
(B) Terminal branches removed with scale in calendar years to show timing of transmission events.
It is important to know whether the structure of the transmission network has influenced the transmission of drug-resistant virus. We have identified all patient samples with resistance-associated mutations in the six clusters by coloured circles at the tips (
Each viral sequence in the time-scaled phylogenies represents a different patient. Therefore, for any two sequences, the branches connecting them through their most recent common ancestor (MRCA) include at least one transmission event. Consequently, the interval between an MRCA node and its parent node provides an upper bound on the time between successive transmission events.
It is very likely that not all members of any transmission cluster have been sampled, so the average time between transmissions will almost certainly be overestimated. Analysis of the overall distribution of internode intervals, estimated from the trees (
Median branch length was 13.14 mo, and the 25th percentile was 5.8 mo. Similar results were obtained using the GTR + I + Γ model (unpublished data).
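The internode-interval summary can be sketched as follows, assuming (hypothetically) that a dated tree is stored as a child-to-parent map plus a calendar date for every node in decimal years. Terminal branches are excluded, matching the trees with terminal branches removed, and each remaining interval is an upper bound on the time between transmissions. The 25th percentile here uses linear interpolation between order statistics; the percentile method used in the study is not stated. The function names are illustrative.

```python
from statistics import median, quantiles
from typing import Dict, List, Optional, Tuple


def internode_intervals_months(parent: Dict[str, Optional[str]],
                               date: Dict[str, float]) -> List[float]:
    """Intervals (months) between each internal, non-root node and its parent.

    `parent` maps every node to its parent (None for the root); `date` gives
    each node's date in decimal calendar years. Terminal branches are skipped.
    """
    internal = {p for p in parent.values() if p is not None}  # nodes with children
    return [(date[node] - date[par]) * 12.0
            for node, par in parent.items()
            if par is not None and node in internal]


def summarize(intervals: List[float]) -> Tuple[float, float]:
    """Median and 25th percentile (linear interpolation, needs >= 2 values)."""
    q1 = quantiles(intervals, n=4, method="inclusive")[0]
    return median(intervals), q1
```

For a toy tree with root R (1995.0), internal node X (1996.0) below R, and internal node Y (1996.5) below X, the intervals are 12 and 6 months, giving a median of 9 months and a 25th percentile of 7.5 months.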
We have made use of sequence data obtained for resistance genotyping at the largest clinical centre treating patients with HIV in London to reconstruct the transmission network in this population. In contrast to previous studies based on sparsely sampled populations, by examining all pairwise comparisons among sequences from more than 2,000 patients, we were able to identify a subset of 402 subtype B sequences, each within a genetic distance of 5% or less of at least one other sequence. Detailed phylogenetic analysis identified a number of large clusters among these patients, which together comprised 25% of this group. “Dated phylogeny” analysis of these clusters [
New HIV diagnoses among MSM have risen steadily in the United Kingdom for almost 10 years, and are now approaching twice the number recorded annually in the mid to late 1990s [
We have used a database of HIV sequences collected in the course of routine clinical treatment from 2,126 patients to characterize the relationships between viruses infecting different individuals attending a large clinic in London. The depth of sampling meant this study was much more informative about the transmission patterns than previous studies [
Other possible limitations of the study should be recognised. Although the use of a phylogenetic definition of clusters avoids the necessity to select an arbitrary distance value, there are clear restrictions on what can be concluded from the phylogeny. The similarity between many of the sequences
One of the possible consequences of rapid transmission within clusters is a local increase in the transmission of drug-resistant strains [
As the time-dependent phylogenies are calibrated in calendar years, we are able to estimate when most of the transmissions in each cluster occurred. For cluster A, the largest cluster comprising 30 individuals, most of the transmissions between them occurred within about 6 y preceding 1999 (
As epidemiological models become increasingly complex, incorporating variable mixing patterns [
Rates estimated using the GTR + I + Γ (solid line) and SRD06 (empty square = third-codon position; filled circle = first-/second-codon positions) models, respectively, for the six largest clusters. Error bars show the 95% highest posterior density intervals of SRD06 rates.
Rates using the GTR + I + Γ (filled circles) and SRD06 (empty circles) substitution models are shown separately (solid line = median) for the six largest clusters.
BEAST analysis is based on a summary of a large number of phylogenetic trees, in which individual trees may have slightly different topologies (depending on the degree of support for each node). To indicate support for an estimated node date, the marginal distribution of the TMRCA can be estimated for all taxa found below each node of the maximum clade credibility tree. For each node within each cluster tree, the distribution of the TMRCA was calculated from the same tree sample used to produce the trees. The distributions of each TMRCA have been aligned (according to time) with the cluster trees from
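The per-node TMRCA summary can be sketched in two steps: gather the TMRCA of a given clade from every sampled tree that contains it, then report a highest posterior density (HPD) interval, the shortest interval holding the requested share of samples. The sketch assumes (hypothetically) that each sampled tree has already been reduced to a dictionary mapping a clade, written as a `frozenset` of tip names, to its TMRCA; parsing the BEAST tree file is not shown, and this is not BEAST's or TreeAnnotator's own code.

```python
import math
from typing import Dict, FrozenSet, List, Sequence, Tuple

Clade = FrozenSet[str]


def clade_tmrca_samples(tree_samples: Sequence[Dict[Clade, float]],
                        clade: Clade) -> List[float]:
    """TMRCA of `clade` from every sampled tree in which the clade occurs."""
    return [tree[clade] for tree in tree_samples if clade in tree]


def hpd_interval(samples: Sequence[float], mass: float = 0.95) -> Tuple[float, float]:
    """Shortest interval containing at least `mass` of the samples."""
    xs = sorted(samples)
    n = len(xs)
    if n == 0:
        raise ValueError("need at least one sample")
    k = max(1, math.ceil(mass * n))  # number of samples the interval must contain
    # Slide a window of k consecutive order statistics and keep the narrowest one.
    best = min(range(n - k + 1), key=lambda i: xs[i + k - 1] - xs[i])
    return xs[best], xs[best + k - 1]
```

Because a clade in the maximum clade credibility tree may be absent from some sampled trees, the number of TMRCA samples per node can differ, which is itself a rough indication of support for that node.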
Numbers of analyzed samples where patient had been receiving ARV before sample was taken, by year.
Average difference between 1,695 subtype B sequences and proportion of polymorphic sites. p-distance: uncorrected genetic distance; dS p-distance: uncorrected distance at synonymous nucleotide sites; dN p-distance: uncorrected distance at non-synonymous nucleotide sites.
Numbers of resistance-associated mutations observed in each cluster according to coding region and in total.
The sequences analyzed have been deposited in GenBank (
We are very grateful to Esther Fearnhill and Dr. David Dunn, Medical Research Council Clinical Trials Unit (London, United Kingdom), for assistance with data; to Dr. Sergei Kosakovsky Pond for his assistance with the analysis in the early stages of this work; and to Dr. Simon Frost for discussions.
antiretroviral therapy
general time-reversible
Markov chain Monte Carlo
most recent common ancestor
men who have sex with men
protease
reverse transcriptase
time to MRCA
United Kingdom Collaborative HIV Cohort