Inferring HIV Transmission Dynamics from Phylogenetic Sequence Relationships

New insights into HIV transmission dynamics, say the authors, are likely to come from analyzing the viral sequence information that is being routinely collected during HIV genotyping.

Despite the range of resources directed at understanding the HIV pandemic over the past 25 years, surprisingly little is known about how HIV infection spreads through populations. Unlike some other infectious diseases, acute infection with HIV is difficult to identify. HIV disease most often manifests years after the transmission event. Together with the special challenges involved in determining exposures related to sexual behavior or drug use, all of these factors have made it difficult to apply the tools of traditional epidemiologic investigation. Recent antibody testing strategies to identify incident HIV for surveillance programs have met with limited success [1]. Key questions that remain unanswered by empirical data include the role of acute infections in sustaining the current pandemic, and the effects of antiretroviral treatment programs on transmission of drug-resistant and drug-susceptible strains of HIV. Without really understanding how HIV spreads, it is difficult to optimize prevention or control strategies.
As effective anti-HIV therapies have emerged over the past decade, clinical care and surveillance programs have increasingly emphasized the importance of testing for resistance to antiretroviral drugs. This most commonly involves sequencing viral genes to detect resistance mutations. The rapid expansion of this HIV genotyping has predictably resulted in the creation of vast databases of viral sequence information. The new study by Andrew Leigh Brown and colleagues in this issue of PLoS Medicine [2] shows that modern analytic tools may yield important new insights into HIV transmission dynamics from the information routinely collected in such sequence databases.

HIV "Phylodynamics" for the Study of Local Epidemiology
Leigh Brown and colleagues were interested in better understanding the epidemiology of HIV among men who have sex with men in London. To this end, they obtained access to a relatively large convenience sample of HIV pol sequences (see Glossary) obtained through the routine testing of 2,126 unique HIV-infected patients served by a large university medical center in London. They used a "phylodynamic" approach, an interdisciplinary blend of immunodynamics, epidemiology, and evolutionary biology, to infer the short-term dynamics of HIV transmission in the base population from relationships among sequences in their study sample.
The authors initially applied a viral genetic relatedness cutoff to filter the data down to a computationally manageable subset of 402 HIV-infected individuals, each of whom had at least one close sequence relative in the study population. Nine large putative transmission clusters were identified within this subset of protease and reverse transcriptase sequence data on the basis of genetic (Hamming) distance. The presence of these transmission clusters was then independently verified using Bayesian Markov chain Monte Carlo phylogenetic methodology. Finally, the authors used a "relaxed clock" approach to generate time-scaled phylogenies of these data and to infer the timing and distribution of transmission events within the 88 sequences contained in the six clusters that were large enough for analysis.
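The distance-based filtering step can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names, the toy sequences, the 0.1 cutoff, and the choice to group sequences by single linkage (connected components of the "close relative" graph) are all assumptions made for illustration, and the Bayesian verification and relaxed-clock dating steps are not shown.

```python
from itertools import combinations


def hamming_fraction(a: str, b: str) -> float:
    """Fraction of aligned sites at which two sequences differ (gap sites skipped)."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differing = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue  # ignore alignment gaps
        compared += 1
        if x != y:
            differing += 1
    # With no comparable sites, treat the pair as unrelated (assumption).
    return differing / compared if compared else 1.0


def putative_clusters(seqs: dict, cutoff: float) -> list:
    """Group sequences linked by pairwise distance <= cutoff (single linkage).

    Sequences with no close relative are dropped, mirroring the idea of
    filtering down to individuals with at least one close sequence relative.
    """
    parent = {name: name for name in seqs}

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(seqs, 2):
        if hamming_fraction(seqs[a], seqs[b]) <= cutoff:
            parent[find(a)] = find(b)

    groups = {}
    for name in seqs:
        groups.setdefault(find(name), set()).add(name)
    return [members for members in groups.values() if len(members) > 1]


# Hypothetical toy alignment; real pol alignments are far longer.
toy = {
    "p1": "ACGTACGTAC",
    "p2": "ACGTACGTAT",  # one site different from p1
    "p3": "ACGTACGAAT",  # one site different from p2, two from p1
    "p4": "TTGCACCTGG",  # distant from all others
}
clusters = putative_clusters(toy, cutoff=0.1)
```

In this toy input, p1 and p3 are not within the cutoff of each other but end up grouped together because each is close to p2. This chaining behavior is one reason that clusters found by distance alone are usually checked against a phylogenetic analysis.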
While components of the methodology were previously established and applied in other contexts, the results of this first successful application of phylodynamics to HIV sequence data-mining are themselves noteworthy for several reasons. First, the internode distances within the study's time-scaled phylogenies were surprisingly short: in more than a quarter of cases, transmission events appear to have occurred fewer than six months after infection. Second, a substantial majority of the transmissions inferred to have taken place in the clusters were concentrated in a well-defined five-year period, bounded by periods of less frequent transmission. Together, the phylodynamic data suggest that the (sexual) transmission of HIV in London over the previous decade may have occurred not as a slow and steady process, but rather via discrete outbreaks fueled in part by efficient transmission during acute HIV infection.

Differences from Previous Studies
Phylogenetic sequence analysis has been used extensively in HIV epidemiology. These data are commonly used to support the identity of supposed "transmission pairs" for purposes of contact investigation [3], translational biological studies [4], and epidemiologic studies in which HIV transmission is an outcome [5]. Looking at larger sequence databases, a number of investigators have taken the clustering outcome as evidence of individual membership in a contact network or as an (indirect) marker of infectivity. Their studies have correlated clustering with acute disease stage [6-8], viral factors [9], risk behaviors [7,10], and even geography [10]. The present study is distinguished from these reports by its focus on the internal architecture of the sequence clusters. Leigh Brown and colleagues' ability to study internal cluster structure clearly depends on access to large numbers of clustered sequences (which might relate in turn to either the structure of underlying contact networks or to the density of population sampling).

Implications for Public Health and Clinical Practice
If application of Leigh Brown and colleagues' phylodynamic methods to HIV can be further validated and their results confirmed by additional investigators, the finding that HIV is frequently transmitted through discrete outbreaks would suggest the need for a stronger emphasis on outbreak detection and network intervention/outbreak control strategies [11]. These strategies are currently used for other diseases, such as syphilis and tuberculosis. In this context, it is worth noting that sequence data-mining techniques can be as easily misused as used properly [12]. Guidelines are needed to clarify individual privacy rights and provide a legal framework for dealing with such sequence data that balances patient autonomy with scientific and public health objectives. Until then, exceptional caution should be used in dealing with phylogenetic/dynamic associations at the individual level.
Most immediately, the ability to illustrate epidemic dynamics through the analysis of phylogenetic sequence information should encourage surveillance and prevention researchers to explore sequence databases with renewed vigor. With the revision of guidelines encouraging more frequent resistance testing of newly diagnosed patients [13], and the creation of new sequence databases worldwide, the number of populations sampled densely enough for phylodynamic analysis may well be increasing. What is occurring globally, in diverse settings, with the introduction of antiretroviral treatment programs? To what degree is transmission efficiency affected by drug resistance, and how will this affect future treatment options? Do the dynamics of devastating epidemics in sub-Saharan Africa or Eastern Europe differ in some fundamental way from those in the most developed countries? The provocative data from Leigh Brown and colleagues suggest an outbreak model for London's community of men who have sex with men; similar and complementary investigations in diverse settings should clarify the actual need for new global HIV control strategies.