The Role of Viral Introductions in Sustaining Community-Based HIV Epidemics in Rural Uganda: Evidence from Spatial Clustering, Phylogenetics, and Egocentric Transmission Models

Using different approaches to investigate HIV transmission patterns, Justin Lessler and colleagues find that extra-community HIV introductions are frequent and likely play a role in sustaining the epidemic in the Rakai community. Please see later in the article for the Editors' Summary.


Introduction
Effective prevention and control of the human immunodeficiency virus (HIV) builds upon an understanding of the dynamics that sustain viral transmission within sexual networks [1,2]. These networks are composed of sexual partnerships between individuals within households, between community members not sharing a household, and between individuals in different communities. While sufficiently large intra-community sexual networks can potentially maintain local HIV epidemics, virus introduced from sources external to the community may also sustain incidence [3,4]. The effectiveness of interventions designed to prevent HIV transmission within a given community or any other geographic unit depends in part upon the attributable fraction of new cases infected through partners residing within the targeted area and those infected from partners residing outside of that area [4-7]. These proportions are particularly relevant to population-based antiretroviral therapy (ART) strategies for HIV prevention that aim to benefit individuals who do not themselves receive the treatment by reducing their risk of infection.
In 2011, ART was established as a highly effective tool for HIV prevention in the landmark HPTN 052 clinical trial [8], which showed that ART almost universally prevents HIV transmission within HIV-discordant couples [8,9]. The concept of ART for HIV prevention ("treatment as prevention") is now widely accepted, and in 2012, it was adopted by the US President's Emergency Plan for AIDS Relief as a key strategy for population-based HIV control [10]. Despite the widely heralded success of HPTN 052, it is unknown whether ART can be scaled to levels necessary to interrupt community-level HIV transmission. Uncertainty remains, in part, because the treated population in HPTN 052 represented a unique subset of the total HIV-infected population: participants were in the chronic stages of HIV infection, receiving care for their disease, and in a stable sexual partnership [8]. Transmission in the broader population occurs along a complex sexual network in which virus is transmitted by infected individuals in early and chronic stages of HIV infection and between individuals who may or may not be in stable sexual partnerships. These complexities have motivated large community-randomized controlled trials (CRCTs) of ART for HIV prevention in African populations, including the HPTN 071 study in Zambia and South Africa [11] and the Mochudi Prevention Project in Botswana [12]. By virtue of their community-randomized design, these CRCTs presume that the preponderance of viral transmissions occurs between partners residing within the same communities of randomization [13]; however, it is unknown what fraction of HIV transmissions in Africa occur within communities versus across community boundaries.
The empirical study of HIV transmission outside of stable couples is challenging, but new approaches to epidemiological inference and evolutionary biology provide unprecedented opportunities to understand the spatial scale of HIV transmission networks. Here we test the hypothesis that extra-household HIV transmission is predominantly sustained through intra-community sexual networks using population-based cohort data from 14,594 individuals, including 189 individuals with incident HIV residing within 46 communities in the Rakai District, Uganda. Rakai, bordered by Tanzania to the south and Lake Victoria to the east, is rural and represents one of the earliest epicenters of the HIV/AIDS epidemic in east Africa [14]. Presently, HIV transmission in Rakai is endemic, with circulation of HIV-1 subtypes A, D, and C, and multiple recombinant viruses [15].
Our study consists of three primary analyses, in all of which the primary geographic unit of interest was the community. In the first analysis we used the geographic coordinates of participant households and measured the tendency of HIV-seropositive persons to spatially cluster within and outside of communities. If local transmission dynamics dominate, we expect infected persons to spatially cluster at geographic distances consistent with intra-community transmission. In the second analysis we examined the genetic relatedness of infecting viruses within communities. If transmission is sustained through local sexual networks, viruses within newly infected persons should be more similar to viruses of other HIV-infected persons within the community than to those of individuals outside the community. Finally, we used egocentric network information on the geographic locations of recent sexual partners to estimate the proportions of new transmissions occurring between household, community, and extra-community partners. In this third analysis we also estimated the proportion of household transmissions occurring within 1 y of an index household infection. Each of these three independent, yet complementary, analyses has its own strengths and weaknesses, and together they are a powerful set of inferential tools for understanding the spatial scale and structure of HIV transmission networks.

Ethics Statement
The study was independently reviewed and approved by Ugandan (Ugandan Virus Research Institute Security and Ethics Committee; Protocol GC/127/13/01/16) and US (Western Institutional Review Board; Protocol 200313317) institutional review boards. All study participants provided written informed consent at baseline and follow-up visits using institutional review board-approved forms.

Study Population and Setting
The Rakai Community Cohort Study (RCCS) is a well-characterized population-based HIV surveillance cohort in the Rakai District, Uganda (Figure 1A). Methods for the RCCS have been described in detail elsewhere [6]. Briefly, the RCCS enrolls all consenting persons aged 15-49 y residing in 50 village communities. The RCCS defines a household as a group of persons who sleep under one roof and eat out of a common pot, and a community as an administrative unit whose boundaries are determined by the Ugandan government (Local Council 1 and Local Council 2 units, the two smallest political units in Uganda). Eleven larger community groupings (2-8 communities each), referred to as geographic regions, were previously designated by the RCCS based upon geographic proximity and the frequency of cross-community contact (Figure 1B) [6].
Study participants are administered a detailed questionnaire at visits occurring every 12-18 mo and provide a serological sample at each visit. HIV serostatus is assessed by two enzyme immunoassays (Vironostika HIV-1, BioMerieux, and Recombigen, Cambridge Biotech), with Western blot confirmation of discordant enzyme immunoassays and for all HIV seroconverters (HIV-1 WB, BioMerieux-Vitek). RCCS participation rates are ~90% of persons present at time of survey, and follow-up rates between successive visits are ~75%.
In this study, we used data from RCCS survey round 13 (RCCS R13) for all data analyses (spatial clustering, viral phylogenetics, and egocentric transmission models). RCCS R13 was conducted between June 17, 2008, and December 7, 2009, within 46 of the 50 RCCS communities. It included surveys of 14,594 participants residing in 8,899 households, the collection of household GPS coordinates (8,156/8,899, or 91.6% of study households; resolution ~3-5 m), and viral sequencing for ART-naïve HIV-seropositive individuals. Participants who were HIV seropositive upon entry into RCCS R13 were defined as HIV seroprevalent in all analyses. The average maximum distance between any two households within a community (i.e., the community size) was ~3 km (Figure S1). Though our three primary analyses use data drawn from the same study population (RCCS R13), each analysis was conducted independently of the others.

Spatial Clustering Analysis
Using the geographic coordinates of participant households in RCCS R13, the spatial relatedness between HIV-seropositive individuals was characterized by τ(d1, d2), defined as the relative probability that a participant A residing within a distance range, d1 to d2, from an HIV-seropositive participant B was also HIV seropositive versus the probability that any RCCS participant was HIV seropositive, regardless of spatial location [16]. It is estimated as: where V i (d
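The ratio defined at the start of this section can be illustrated with a minimal sketch. It assumes planar household coordinates in metres and a half-open distance window [d1, d2); the function name `tau`, the tuple layout, and the use of overall prevalence as the denominator are illustrative assumptions based on the verbal definition, not the estimator used in the study.

```python
from itertools import permutations
from math import hypot


def tau(points, d1, d2):
    """Relative probability of seropositivity near a seropositive participant.

    points: list of (x_m, y_m, is_positive) tuples, one per participant
            (hypothetical layout; coordinates assumed planar, in metres).
    d1, d2: bounds of the distance window, d1 <= distance < d2.
    """
    # Probability that any participant is seropositive, regardless of location.
    overall = sum(1 for p in points if p[2]) / len(points)

    pairs = 0      # ordered pairs (B, A) with B seropositive and A in the window
    pos_pairs = 0  # the subset of those pairs in which A is also seropositive
    for b, a in permutations(points, 2):
        if not b[2]:
            continue  # the reference participant B must be seropositive
        if d1 <= hypot(a[0] - b[0], a[1] - b[1]) < d2:
            pairs += 1
            pos_pairs += int(a[2])

    if pairs == 0 or overall == 0:
        return float("nan")
    return (pos_pairs / pairs) / overall


# Toy data: one household with two seropositive members, and a second
# household 1 km away with two seronegative members.
toy = [(0, 0, True), (0, 0, True), (1000, 0, False), (1000, 0, False)]
same_household = tau(toy, 0, 10)
far_apart = tau(toy, 500, 2000)
```

Values above 1 indicate that seropositive participants cluster at that distance; values near 1 indicate no clustering relative to the cohort as a whole.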

Viral Extractions and HIV-1 Subtype Assignment
Viral RNA extractions were performed on sera of all ART-naïve HIV-seropositive participants in RCCS R13 (n = 1,434) using the QIAamp Viral Mini Kit (Qiagen). Extracted RNA was amplified by reverse transcription PCR and an additional nested PCR in two separate assays for partial gag (HXB2 nucleotides 1249 to 1704) and env (HXB2 nucleotides 7858 to 8260) sequences, as previously described [15,17]. RNA extractions and PCR assays were conducted in separate designated laboratory spaces for quality control. HIV amplicons were sequenced using direct Sanger methods on the Applied Biosystems 373xl DNA Analyzer. Results were examined immediately for contamination and batch effects. We also repeated testing for a subset of specimens (extraction through sequencing). Sequential samples from the same individual always clustered together when compared using phylogenetic methods (Figure S2).
HIV-1 subtype assignments were made using the US National Center for Biotechnology Information genotyping database and then confirmed phylogenetically with reference sequences from the Los Alamos National Laboratory HIV Sequence Database (HIVDB). Sequences were aligned with MUSCLE v3.7 and manually edited in Bioedit v7.1.3 [18]. Ambiguous regions in sequence alignments were removed using GBLOCKS v0.91b [19]. Final alignments were ~564 bp in the gag gene and ~467 bp in the env gene. Sequences were scanned with all available methods in the Recombination Detection Program v3.44 [20]. Within-gene recombination events identified in one or more analyses were verified using jumping hidden Markov models [21]. Intra-gene recombinant sequences were excluded from additional phylogenetic analyses (gag, n = 17; env, n = 8).

Phylogenetic Analysis
Maximum likelihood (ML) methods under an HKY-85 model of nucleotide substitution were used to estimate genetic pairwise distances and reconstruct phylogenetic trees for gag and env genes and for HIV-1 A, D, and C subtypes separately (six datasets in total). African reference sequences (one per individual reference ID) were selected from the Los Alamos National Laboratory HIV Sequence Database for analyses. Using HKY-85 genetic pairwise distance, the three Los Alamos National Laboratory HIV Sequence Database reference sequences most similar to each participant's sequence were identified, and the unique subset of these sequences was defined as the reference set for RCCS R13 (Table S1). The reference set included viral sequences from all major geographic regions in sub-Saharan Africa.
ML phylogenetic trees were reconstructed under two models of nucleotide substitution, the HKY-85 model and the general time reversible model with gamma distributed rate heterogeneity and a proportion of invariable sites (GTR+I+G) [22,23]. In the GTR+I+G model all possible nucleotide substitution rates are estimated, whereas in the HKY-85 model only transition and transversion rates are estimated (six versus two substitution rate parameters). We defined a cluster of related HIV cases as two or more participants whose sequences were contained within a monophyletic group in ML trees in either one or both gene regions (gag or env) at a bootstrap threshold of 90% or greater (1,000 replications). Clusters also met intra-cluster median genetic distance thresholds, where thresholds were defined using RCCS genetic data from epidemiologically linked HIV-incident couples (i.e., where at least one of the partners was an incident case). Specifically, genetic distance thresholds for each gene region were defined as the 95% quantile of the distribution of ML branch length distances between epidemiologically linked sexual partners (i.e., known couples) where at least one of the partners was an incident case and the partner sequences were contained within a monophyletic cluster with moderately high clade support (≥70%; Figure S3). Distance thresholds estimated for gag and env genes were 1.3% and 2.6%, respectively.
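The two-part cluster rule described above (bootstrap support plus a median genetic distance threshold derived from linked couples) can be sketched as follows. The function names, the input layout, and the choice of the "inclusive" quantile method are assumptions made for illustration; the text does not specify how the 95% quantile was interpolated.

```python
from statistics import median, quantiles

# Thresholds reported in the text for each gene region (as proportions).
GAG_THRESHOLD = 0.013
ENV_THRESHOLD = 0.026


def distance_threshold(linked_couple_distances):
    """95% quantile of distances between epidemiologically linked partners.

    quantiles(..., n=20) returns 19 cut points; the last one is the 95th
    percentile. The interpolation method is an assumption.
    """
    return quantiles(linked_couple_distances, n=20, method="inclusive")[18]


def is_cluster(bootstrap_support_pct, intra_cluster_distances, threshold,
               min_support_pct=90.0):
    """Apply both criteria to one candidate monophyletic group in one gene."""
    well_supported = bootstrap_support_pct >= min_support_pct
    close_enough = median(intra_cluster_distances) <= threshold
    return well_supported and close_enough


# Hypothetical calibration data: distances 0.00, 0.01, ..., 1.00 (illustration only).
example_threshold = distance_threshold([i / 100 for i in range(101)])

# A candidate pair in gag with 95% bootstrap support and 1.1% distance.
accepted = is_cluster(95.0, [0.011], GAG_THRESHOLD)
```

Under the definition in the text, a group would be retained if this check passes in either gag or env; the sensitivity analyses correspond to varying `min_support_pct` and to dropping the distance criterion.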
ML clusters were confirmed using Bayesian phylogenetics, where confirmation was established if the same sequences clustered together in the Bayesian tree with posterior probability equal to one. The ML tree topologies obtained using the more parameter-rich GTR+I+G model were similar to those obtained under the HKY-85 model, and so Bayesian confirmation of clusters was conducted using the HKY-85 model only. Bayesian analyses were conducted using MrBayes v3.2 [24], where trees were obtained through separate unconstrained phylogenetic analyses (i.e., no molecular clock) and each codon position was allowed to have its own site-specific rate. Four independent runs were performed for 3 × 10^8 generations, and a burn-in of 25% was used for final analyses. Effective sample sizes for all parameters exceeded 200.
We assessed the sensitivity of our cluster definition using alternate cluster definitions in the ML analysis: 70%, 80%, 90%, and 99% bootstrap thresholds with and without genetic distance thresholds for HKY-85 and GTR+I+G models of substitution. We present the ML radial and square phylogenetic trees estimated under the HKY-85 model as figures in this article and in the Supporting Information. Community and household labels used in the square trees were blinded (i.e., true RCCS identification numbers were not used), and the exact community locations were not labeled on geographic maps to ensure the privacy of our study participants. The ML phylogenetic trees constructed under the GTR+I+G model and the Bayesian phylogenetic trees are available from the authors upon request.

Egocentric Transmission Model
Study participants in RCCS R13 were asked about their most recent sexual partners (up to four partners, restricted to the last 12 mo). Stable partnerships were defined as either marriages or long-term consensual unions. All other partner types (boyfriend/girlfriend and casual) were defined as non-stable. Participants were asked whether each sexual partner's primary residence was within the same household, within the same community, or outside of that individual's community. As per protocol, RCCS participant identifiers could be matched with a named partner only for stable (usually household) partners. If the stable partner was also an RCCS participant, we considered those partners to be epidemiologically linked. In instances where the epidemiologically linked partner did not participate in RCCS R13 but did so in a prior RCCS survey and he/she was HIV seropositive at his/her last study visit, we considered that partner HIV seroprevalent. When discrepancies between the self-reported geographic locations of household partners and GPS data obtained through RCCS were identified (~2%, n = 256 self-reported partners), data were independently reviewed and adjudicated by study investigators (M. K. G., A. D. R.).
We considered a household HIV-seropositive partner to be on ART if that person was on ART for ≥50% of the inter-survey interval in which their initially uninfected partner was at risk for HIV. The RCCS has identified no HIV seroconversions within serodiscordant couples where the HIV-infected partner is on ART since ART was introduced in Rakai in 2004 [25]; therefore, we assumed that HIV-seropositive household partners on ART posed no risk to their uninfected partners in this analysis.
HIV sequence data for self-reported sexual partners were obtained only if those partners could be identified as being another RCCS participant, and this was possible only for stable partners. For phylogenetic methods to exclude any self-reported partner as a source of infection, sequences from all partners and the ability to detect co-infection are needed. As neither was available in this study, the egocentric transmission model and phylogenetics were conducted as independent, though complementary, analyses.
We used egocentric sexual partner data from HIV-seronegative and -incident participants (excluding those HIV-seronegative participants who entered into the study for the first time in RCCS R13 or who had missed more than two previous study visits) to model the probability of HIV infection from self-reported partners and unreported partners/sources as follows: where Y_i is equal to 1 if participant i is an incident case; n_i is the number of partners of case i; w_ij, z_ij, and m_ij are indicators of whether partner j of case i is ART-naïve seroprevalent, incident, or missing HIV status, respectively; a and c are the probabilities of infection from ART-naïve seroprevalent and incident partners, respectively, between study rounds; p_ij is the probability of case i being infected by a partner j with missing status given their respective locations; and r_i is the probability of i being infected from an unnamed partner/source. The probability of infection from a self-reported partner of unknown HIV status was modeled as follows: where logit(p_ij) is the log odds that i was infected by partner j, HH_ij is an indicator of whether participant i shares a household with partner j, C_ij is an indicator of whether the partner is outside the community, and F_i is an indicator of whether participant i is female. Parameters were estimated using Markov chain Monte Carlo methods. The numbers of infections attributable to specific partnership types were estimated by sampling parameters from the posterior distribution and then simulating sources of infections for each parameter set (250,000 iterations). In households where both partners had incident infection we initially randomly assigned one partner as having been infected first (i.e., without an identifiable incident partner) and the other partner as having been infected second (i.e., with an identifiable incident partner). Assignments were updated in each Markov chain Monte Carlo iteration and accepted or rejected using the standard Metropolis-Hastings criteria. For each incident infection, the probability of infection by each type of partner was calculated based upon the current parameter set and then normalized so that they summed to one (i.e., calculated conditional on that individual having been infected). Which partner (or unknown source) infected each individual was then randomly selected based upon these probabilities.
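The source-attribution step described above (compute a probability for each possible source, normalize so the probabilities sum to one, then draw a source at random) can be sketched as follows. The sketch assumes that each partner and the unnamed source act as independent risks, which is an assumed functional form rather than the model equation used in the study; all function names are hypothetical, and the numeric inputs are the point estimates reported in the Results, used here only for illustration.

```python
import random


def infection_probability(partner_probs, r):
    """Probability of infection from any source, assuming independent risks.

    partner_probs: per-partner infection probabilities (a, c, or p_ij,
                   depending on the partner's HIV status).
    r: probability of infection from an unnamed partner/source.
    """
    escape = 1.0 - r
    for p in partner_probs:
        escape *= 1.0 - p
    return 1.0 - escape


def source_weights(partner_probs, r):
    """Per-source probabilities normalized to sum to one.

    The last entry corresponds to the unnamed source. Normalizing makes the
    weights conditional on the individual having been infected.
    """
    raw = list(partner_probs) + [r]
    total = sum(raw)
    return [x / total for x in raw]


def sample_source(partner_probs, r, rng):
    """Draw the index of the infecting source; len(partner_probs) = unnamed."""
    weights = source_weights(partner_probs, r)
    return rng.choices(range(len(weights)), weights=weights, k=1)[0]


# Illustration: an incident case whose only reported partner is an ART-naive
# seroprevalent household partner (a = 0.153), with r = 0.003.
a, r = 0.153, 0.003
p_infected = infection_probability([a], r)
weights = source_weights([a], r)
drawn = sample_source([a], r, random.Random(0))
```

Repeating the draw across posterior parameter samples and tallying the selected sources by partnership type yields attributable fractions of the kind summarized in the Results.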
The sensitivity of the parameter estimates from our transmission model to unreported partnerships and misreported community status of partners was assessed by running 100 simulations where 10% of the reported partnerships in the original data were unreported and 100 simulations where the community status of 10% of extra-household partners was misreported (i.e., intra-community was changed to extra-community or vice versa).
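The two perturbations used in these sensitivity simulations can be sketched as simple data transformations. The record layout (a list of dictionaries with a `location` field) and the function names are hypothetical; refitting the transmission model to each perturbed dataset is not shown.

```python
import random


def drop_partnerships(partnerships, fraction, rng):
    """Return a copy of the data with a random fraction of partnerships removed,
    mimicking partnerships that went unreported."""
    n_drop = round(len(partnerships) * fraction)
    dropped = set(rng.sample(range(len(partnerships)), n_drop))
    return [p for i, p in enumerate(partnerships) if i not in dropped]


def flip_community_status(partnerships, fraction, rng):
    """Return a copy in which a random fraction of extra-household partners
    have their community status swapped (community <-> extra-community)."""
    extra_household = [i for i, p in enumerate(partnerships)
                       if p["location"] != "household"]
    n_flip = round(len(extra_household) * fraction)
    flipped = set(rng.sample(extra_household, n_flip))
    swap = {"community": "extra", "extra": "community"}
    return [dict(p, location=swap[p["location"]]) if i in flipped else p
            for i, p in enumerate(partnerships)]


# Hypothetical toy data: 50 household, 30 community, 20 extra-community partners.
toy = ([{"location": "household"}] * 50
       + [{"location": "community"}] * 30
       + [{"location": "extra"}] * 20)
rng = random.Random(1)
with_unreported = drop_partnerships(toy, 0.10, rng)
with_misreported = flip_community_status(toy, 0.10, rng)
```

Each of the 100 simulations would apply one of these transformations with a fresh random draw and then re-estimate the model parameters for comparison with the original estimates.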

Spatial Clustering of HIV-Seropositive Individuals
Spatial clustering of HIV-seropositive individuals within households. We observed strong spatial clustering of HIV-seropositive individuals within households (Figure 2A-2C). The probability that a participant living in the same household as an HIV-seropositive participant was also HIV seropositive was 3.2 (95% CI: 2.7-3.7) times greater than the probability that any RCCS participant was HIV seropositive (shown in red, Figure 2A). Even stronger household spatial clustering was observed among HIV-incident cases: the probability that a participant living with an HIV-incident case was also HIV incident was 10.8 (95% CI: 2.3-23.6) times the probability that any participant was an HIV-incident case (shown in blue, Figure 2C).

Spatial clustering of HIV-seropositive individuals within communities. We explored whether there was spatial clustering of HIV-seropositive individuals outside of households at distances up to 30 km. We found statistically significant though weaker spatial clustering of HIV-seropositive persons outside of households. Compared to all study participants, persons living 10-250 m from an HIV-seropositive participant were 1.22 (95% CI: 1.14-1.29) times as likely to be HIV seropositive themselves, and those living 250-500 m away were 1.08 (95% CI: 1.00-1.17) times as likely to be HIV seropositive (Figure 2A and 2D).
We also examined whether incident cases spatially clustered with other HIV-incident and -seroprevalent cases outside the household, since spatial clustering among all HIV-seropositive persons may reflect historic rather than recent patterns of HIV transmission. In contrast, we observed no statistically significant extra-household spatial clustering of HIV-incident cases with other incident or seroprevalent cases (Figure 2B and 2D), though incident cases appeared to weakly cluster with seroprevalent cases at geographic distances less than 500 m (shown in yellow, Figure 2B and 2D). There was no significant spatial clustering beyond 500 m in any spatial analyses and no significant intra-household or extra-household spatial clustering between HIV-incident and HIV-seroprevalent persons on ART (Figure S4).

HIV Phylogenetics within and across Communities
Viral sequence data for the gag and env genes were obtained for 1,099/1,434 (76.6%) HIV-seropositive participants who were not on ART at the time of the RCCS R13 survey (Table S2), including 164 of 189 (86.7%) incident cases (Table S3). On average, 15 (range 3-24) viral sequences were retrieved from HIV-incident cases, and 85 (range 15-143) sequences were retrieved from HIV-prevalent cases per geographic region. Sequences were predominantly HIV-1 subtypes A1 or D, and both subtypes were found in all communities. Of those participants with sequence information in both gene regions (n = 842/1,099), 21.1% (n = 178/842) did not share the same HIV-1 subtype in gag and env genes. No statistically significant differences were found between HIV-infected individuals from whom viral sequences were obtained (in either or both genes) and those from whom no viral genetic data were obtained for duration of the participant's infection (prevalent or incident), gender, marital status, or geographic region of residence. However, there was a significant decrease in the number of sequences obtained with each increasing year of age (either gene: RR = 0.988; 95% CI: 0.980-0.995; both genes: RR = 0.990; 95% CI: 0.982-1.00).

Genetic relatedness of HIV viruses within households.
Our study population included 165 epidemiologically linked couples where both partners had participated in RCCS R13 and were HIV seropositive and not on ART at the time of the survey. Twenty-five percent (n = 42/165) of these couples included at least one incident case (both partners were HIV incident in 9/42 incident couples). Sequence information was available for at least one gene region (either gag or env) in 63.6% (n = 105/165) of epidemiologically linked couples, including 76.2% (n = 32/42) of those with one or more incident cases (n = 7/9, 77.7% of those where both cases were incident). Ninety-nine percent (n = 104/105) of epidemiologically linked pairs with sequence data shared a household, including all 32 incident couples.
There were 12 households where sequence data were available for two persons who were not epidemiologically linked, all of whom were HIV-seroprevalent pairs. Median intra-subtype genetic distance in these pairs was 6.4% in gag (n = 7/12, IQR: 3.0%-7.5%) and 9.4% in env (n = 10/12, IQR: 7.0%-10.7%), and only one pair phylogenetically clustered within the ML trees. A detailed summary of the HIV genetic data for all of the 105 epidemiologically linked couples with HIV sequence data is included in Table S4.
Genetic relatedness of HIV viruses within and across communities. Shown in Figure 3B is the distribution of intra-subtype genetic distances in the gag gene for incident couples (i.e., one sequence obtained from an incident case) sharing the same community (median = 6.3%; IQR: 5.4%-7.3%). This distribution was nearly identical to that seen within geographic regions (median = 6.4%; IQR: 5.4%-7.4%) and across all communities (median = 6.4%; IQR: 5.5%-7.3%). Similar distributions were observed in the env gene (data not shown).
Two participants sharing a phylogenetic cluster suggests, because of our strict cluster definition, that they are separated by a relatively short and recent chain of transmission. Only 19.0% (209/1,099) of HIV-infected participants in RCCS R13 shared a phylogenetic cluster with at least one other RCCS study participant in either the gag or env genes. A total of 95 phylogenetic clusters were identified across all ML phylogenetic trees (n = 209 individuals; Tables 2 and S4). The majority of clusters included only two (86.3%, n = 82/95) or three HIV-infected persons (9.5%, n = 9/95). We also identified four additional phylogenetic clusters, of which two clusters contained four individuals each (2.1%, n = 2/95) and two clusters contained five individuals each (2.1%, n = 2/95). None of the identified phylogenetic clusters contained a reference sequence, and 40.0% (n = 38/95) contained at least one incident case, encompassing 50 incident cases in total (Table 2). Almost half of all phylogenetic clusters identified (44.2%, n = 42/95) were household pairs of two (63 prevalent cases; 21 incident cases). Of the 53 clusters that contained participants who spanned households (n = 53/95), 38 clusters crossed community boundaries (71.7%). These 38 cross-community clusters included 28 pairs (47 prevalent cases; nine incident cases), seven triplets (18 prevalent cases; three incident cases), two clusters of size four (four prevalent cases; four incident cases), and one cluster of size five (one prevalent case; four incident cases). Nearly half of the cross-community clusters (47.4%, n = 18/38) also spanned geographic regions. Community clusters (n = 15/53) included 12 pairs (19 prevalent cases; five incident cases), two triplets (four prevalent cases; two incident cases), and one cluster of size five (three prevalent cases; two incident cases). When analyses were restricted to only those clusters containing at least one incident case (n = 38/95), similar geographic patterns were observed (Table 2).
There were six phylogenetic clusters that contained only incident cases (6.3%, n = 6/95), of which five contained a single household pair (ten incident cases) and one contained two household pairs (four incident cases). Our definition of a phylogenetic cluster may have precluded the identification of some transmission chains; however, in sensitivity analyses the proportion of clusters with more than one household that crossed community boundaries was robust to the strictest (66.7%, n = 18/27 crossed community boundaries) and most relaxed (74.0%, n = 77/104 crossed community boundaries) phylogenetic cluster definitions assessed (Table S5). A detailed summary of each of the 95 phylogenetic clusters identified is included in Table S5.

Probable Infection from Household, Community, and Extra-Community Sources
A total of 11,992 recent sexual partners were self-reported by 5,368 women and 4,152 men who were HIV seronegative at a previous study visit (Table 3). Of these self-reported partners, 42.1% (n = 5,043) could be epidemiologically linked to another RCCS participant who participated in RCCS R13 or a previous survey round. Ninety-six percent (n = 5,159/5,368) of women reported only one sexual partner in the last 12 mo, compared to 59.2% of men (n = 2,458/4,152) (Table S7). Of enumerated self-reported partners, 63.0% (n = 7,549/11,992) held primary residence within the participant's household, 19.5% (n = 2,342/11,992) were within the participant's community but outside of the household, and 17.5% (n = 2,101/11,992) had a primary residence outside of the participant's community (Table S8). Household partnerships were almost always stable partnerships (i.e., 99% were marital or long-term consensual unions), whereas partnerships outside of the household were usually not stable (95%; Table S8). The majority of extra-household sexual partners were reported by unmarried persons (n = 2,895/4,443, 65.2%).
Attributable fractions of HIV infections from household-based transmission. Using the egocentric partner data, we estimated that 39.0% (95% CI: 32.3%-43.9%) of 189 incident cases were infected by a household sexual partner (Table 4). Those with an incident household partner (n = 9 household pairs) had an estimated 26.0% (95% CI: 13.4%-45.0%) probability of acquiring HIV from that partner (Table 5). In 20.6% of cases where infection was attributed to a household partner with known HIV status, that partner was him/herself an incident case. There were 38 incident events among 250 individuals in a stable sexual partnership with an ART-naïve HIV-seroprevalent partner. After accounting for risk from other self-reported partners and unknown sources, we estimate that the probability of transmission from these seroprevalent household partners not on ART was 15.3% (95% CI: 10.9%-20.6%). Among at-risk individuals who had an HIV-seroprevalent partner who was on ART for 50% or more of the risk interval (n = 29), only one became HIV-infected, and there were no infections among the 27 with partners who were on ART for 60% or more of the interval.
The HIV status for the suspected index partner in 16.2% (95% CI: 11.6%-20.1%) of household transmissions was unknown.

Attributable fractions of HIV infections from community, extra-community, and unknown sources. Infections from self-reported extra-household partners were estimated to account for 39.5% of new cases (95% CI: 33.9%-42.3%), of which the majority (62.1%, 95% CI: 54.9%-69.7%) were from self-reported partners outside the community (Table 4). Where the specific location of these self-reported extra-community sexual partners was known (68%), 50% lived outside of the Rakai District and were geographically dispersed throughout Uganda (Figure 1A). While men were 1.8 times more likely to disclose an extra-community partner than women (1,061/4,152 versus 761/5,368; 95% CI: 1.7-2.0), those women who reported an extra-community partner had higher odds of HIV acquisition from that self-reported partner than men who reported an extra-community partner (odds ratio = 5.0; 95% CI: 2.2-14.1). Acquisition from unknown sources accounted for 21.4% of total infections (95% CI: 14.8%-29.6%), although the individual probability of such infections was low (0.3%; 95% CI: 0.2%-0.5%).
Sensitivity analysis. Sensitivity analyses were conducted to determine the robustness of the parameter estimates in Table 5 to underreporting and misreporting of self-reported sexual partnerships. In simulations where 10% of self-reported partnerships were considered unreported, the median bias in parameter estimates for the transmission model was less than 10% of the width of the 95% confidence interval in all cases except for the probability of infection from an unnamed source (r), which increased as expected. Moreover, all 95% CIs included the original point estimate, with the exception of r, which differed as expected. In simulations where 10% of extra-household partnerships were considered to have a misreported geographic relationship with the study participant (i.e., extra-community partners were reported as community partners or vice versa), the median bias of each parameter estimate was less than 10% of the reported 95% CI width, and 97% or more of the 95% CIs from simulated estimates included the estimate from the original data.
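As a worked check on the disclosure ratio reported earlier in this section (1,061/4,152 men versus 761/5,368 women), the following sketch computes a risk ratio with a Wald interval on the log scale. The text does not state how the reported interval was obtained, so the interval method and the function name are assumptions; the counts are taken directly from the text.

```python
from math import exp, sqrt


def risk_ratio_ci(events_1, total_1, events_2, total_2, z=1.96):
    """Risk ratio of group 1 versus group 2 with a log-scale Wald interval."""
    rr = (events_1 / total_1) / (events_2 / total_2)
    # Standard error of log(RR) for two independent binomial proportions.
    se_log = sqrt(1 / events_1 - 1 / total_1 + 1 / events_2 - 1 / total_2)
    return rr, rr * exp(-z * se_log), rr * exp(z * se_log)


# Men versus women reporting an extra-community partner.
rr, lower, upper = risk_ratio_ci(1061, 4152, 761, 5368)
```

Under these assumptions the ratio and both bounds agree with the reported values when rounded to one decimal place.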

Discussion
Using spatial statistics, viral phylogenetics, and egocentric transmission models we find evidence that extra-community HIV introductions are frequent, and likely play a significant role in sustaining ongoing HIV incidence in rural Rakai, Uganda. We estimate that viral introductions combined with intra-household transmission account for the majority of incident infections in this HIV-endemic region, though our data also suggest that community-based sexual networks play a critical part in HIV spread. Our results underscore the complexities of HIV epidemic dynamics and sexual networks in rural Uganda and have important implications for the design and implementation of CRCTs and HIV prevention programs.
Each of the analyses used illuminates a different aspect of HIV transmission networks, and together they provide a powerful framework for understanding the spatial scale and structure of HIV transmission networks (Figure 4). Spatial analyses reveal whether HIV incidence or prevalence is elevated in close proximity to HIV-infected persons, but cannot distinguish whether spatially related cases are part of the same sexual network. Viral phylogenetics provides insight into the relationship between spatial and viral genetic similarity; however, high mutation rates and sparse sampling of networks make it impossible to definitively link cases to an infecting source. Egocentric transmission models relate the geographic distribution of personal sexual networks to individuals' risk of HIV infection, but give minimal insight into global network structure.
All three analyses suggest that frequent HIV introductions into communities play a critical role in ongoing HIV incidence in rural Rakai, Uganda (Figure 4). They show limited spatial clustering of HIV cases outside of households, multiple circulating HIV viruses within communities, and a significant proportion of incidence resulting from extra-community partnerships. Together, our data imply that there are frequent viral introductions into communities, followed by onward transmission within households (where we estimate over one-third of transmission occurs) and within small intra-community sexual networks. These findings do not rule out an important role for community-level sexual networks in the Rakai HIV epidemic, but do suggest that local HIV epidemics are not sustained through community-based viral transmission alone. Furthermore, they highlight the risks of applying the results of sexually transmitted infection studies in urban areas outside of Africa (e.g., studies showing strong spatial clustering of gonorrhea cases in Baltimore [26]) to HIV control efforts within rural Africa.
In this prospective population-based cohort, intra-household HIV transmission was common, accounting for approximately 39% of new incident cases. This fraction is within the range of that previously estimated in 18 sub-Saharan African countries [27], but lower than the 55%-97% estimated in Zambia and Rwanda [28], both based on cross-sectional Demographic and Health Surveys (DHS). Hence, targeting treatment to stable HIV-discordant couples could prevent substantial numbers of new infections, but the effectiveness of this strategy is largely contingent on the rapid identification and treatment of HIV-infected index partners. Consistent with other studies [29,30], we found that the highest risk of HIV acquisition was within the first 18 mo of an index partner's infection. Chronically HIV-infected individuals also posed substantial, though lower, risk to their uninfected partners; however, ART appeared to eliminate this risk entirely. The strong protective effect of ART observed in this population-based study corroborates the findings from the HPTN 052 clinical trial and other observational studies of HIV transmission in Africa [25]. Though no individuals in our study acquired HIV from an identifiable HIV-seroprevalent partner on ART, we cannot rule out the possibility that non-identifiable sexual partners of incident cases were taking ART at the time of transmission.

Table 4. Attributable HIV transmissions by geographic location of sexual partner and gender of newly infected participant (estimated from egocentric transmission model).

While intra-household transmission was common, it is extra-household transmission that determines the geographic scale of HIV epidemics. Here we estimate that more than half of all household introductions were the result of extra-community partnerships, with a wide geographic range of sexual partner networks. Fifty percent of extra-community partners had a primary residence outside of Rakai, including major urban centers in Uganda (i.e., Kampala and Masaka). Within the Rakai District, but outside of the RCCS target area, there are fishing communities along Lake Victoria where HIV prevalence is extremely high (~40% in data from an unpublished pilot study of 2,106 individuals in fishing communities in the Rakai District). Preliminary data show that men from these high-risk fishing communities frequently travel to RCCS communities, which may in part explain the high rate of HIV infection we observed among unmarried women with extra-community partners. Mobility has long been associated with HIV transmission in Africa [31,32], though how exactly it relates to local epidemic dynamics, including the persistence of viral transmission in African contexts, remains understudied. Studies of other infectious diseases and network simulations suggest that such long-distance "jumps," even when infrequent, can facilitate persistence of infection within broader contact networks [33][34][35].
We did not measure the impact of local treatment as prevention in this study; however, our results provide insight into the mechanisms and upper limits of its effectiveness when implemented only locally, given the relative fractions of community and cross-community HIV spread. Our results suggest that community-based ART programs could have a major impact on African epidemics, but also highlight the need to target extra-community sources of HIV infection. Viral introductions could be reduced either through more widespread coverage of ART among HIV-infected persons or through prevention interventions that provide direct protection to uninfected individuals (e.g., male circumcision or pre-exposure prophylaxis). Targeting interventions that provide direct protection to those most likely to have extra-community partners may be an important addition to local HIV control strategies.

Viral introductions pose significant challenges to epidemiological studies of HIV risk and prevention. Exposure misclassification may be common when using community viral load or other aggregated community-level measures of individual HIV risk [36,37]. Similarly, in the case of CRCTs, indirect intervention effects may be obscured when cross-community transmissions are frequent [13]. Incorporating phylogenetics and detailed information on individual partnerships into study design may facilitate interpretation of results from community-based studies of treatment as prevention, including the upcoming HPTN 071 and Mochudi Prevention Project trials [11,12].
Our study had several limitations. While RCCS demographics, including age distribution, marital status, and sexual behaviors, are largely representative of the broader Ugandan population (Table S9) [38], our results may not be generalizable and suggest the need to study the spatial dynamics of HIV in other settings. In particular, uptake of HIV preventive services may be greater in RCCS communities, which could bias our estimates of per-partnership risk if local HIV-infected partners were less likely to be infectious than partners outside of Rakai. A comparison of male circumcision prevalence in our study population versus in the general Ugandan population, as sampled in the DHS in 2011, revealed that the male circumcision rate was higher among RCCS participants than among DHS participants (39.4% versus 26.8%), though HIV prevalence and ART use among HIV-infected persons were similar between RCCS and DHS sampled populations (Table S9). We also considered newly enrolled HIV-seropositive persons to be HIV seroprevalent, potentially underestimating the effect of early HIV infections on transmission. Overrepresentation of particular types of partnerships in our sample could also have biased results. For example, oversampling of household partners could lead to overestimation of the importance of household transmission; however, the proportions of men and women who were married in RCCS were similar to those reported in the Ugandan DHS, and household partners were not selectively recruited over community partners [38]. Another limitation is that we identified the geographic sources of HIV infection from self-reported sexual partner data that may be inaccurate. The presence of HIV-incident cases for which no possible infecting partner could be determined indicates that some sexual partners were unreported. If these unreported sources of infection were evenly split between community and extra-community partners (as opposed to following the distribution in the data), our estimate of the percentage of extra-household transmission due to community partners would increase from 38% to 45%. Furthermore, sensitivity analyses show that randomly unreported partnerships or randomly misreported community status would not substantially bias the results. However, systematic biases in partnership reporting could bias our results.
A notable strength of our study was its prospective population-based study design, which captured a representative sample of the sexually active adult population in rural Rakai and yielded a sampling fraction of local sexual networks (~70% of the censused population) in the 46 surveyed communities. Individuals lost to follow-up during the interval of observation were more likely to be unmarried and younger than those who remained in the study. Such missing persons may be more mobile and at higher HIV risk. If so, our estimate of the frequency of cross-community transmission is likely an underestimate of the true value. Despite limited losses to follow-up and a high sampling fraction of the primary geographic unit of analysis (the community), we still observed minimal phylogenetic clustering between HIV sequences obtained from the same community, which limited our ability to identify HIV transmission chains using molecular epidemiological methods. Low levels of phylogenetic clustering are not uncommon in studies of HIV epidemics, particularly phylogenetic studies of heterosexual HIV transmission networks [39,40]. Still, we were surprised to find so many singleton lineages within communities, given study participation rates. While it is true that we may have undersampled local sexual networks to some extent, high viral diversity within communities, coupled with a lack of spatial clustering outside of households and a high probability of infection from extra-community partners, implies that the limited phylogenetic clustering is a reflection of frequent viral introductions, at least in part. Intra-host HIV evolutionary dynamics, including HIV co-infection and rapid HIV genetic drift, also may have obscured the identification of HIV transmission chains using our phylogenetic approaches.
Taken together, our analyses reveal a complex picture of HIV dynamics in rural Uganda, and suggest that incidence is in part sustained through repeated introductions of HIV, with frequent intra-household transmission and some onward transmission through small intra-community networks. It remains unknown whether these patterns reflect broader source-sink dynamics, in which localized key populations, such as fishing communities with high HIV prevalence, may have a major effect on regional HIV transmission dynamics. HIV introductions present a challenge to local HIV control programs and CRCTs, necessitating a commitment to widespread combination HIV prevention in sub-Saharan Africa, and a deeper understanding of the extra-community partnerships that reintroduce infection into rural populations.

Supporting Information
Figure S1 The geographic scale of RCCS communities. Communities are color-coded according to their RCCS geographic region (see Figure 1 for color key). The means for the average and maximum geographic distances between households within a community (across all communities) are marked with dotted red lines. The size of the dot is proportional to the size of the surveyed population/community size. (PDF)

Editors' Summary
Background. About 35 million people (25 million of whom live in sub-Saharan Africa) are currently infected with HIV, the virus that causes AIDS, and about 2.3 million people become newly infected every year. HIV destroys immune system cells, leaving infected individuals susceptible to other infections. HIV infection can be controlled by taking antiretroviral drugs (antiretroviral therapy, or ART) daily throughout life. Although originally available only to people living in wealthy countries, recent political efforts mean that 9.7 million people in low- and middle-income countries now have access to ART. However, ART does not cure HIV infection, so prevention of viral transmission remains extremely important. Because HIV is usually transmitted through unprotected sex with an infected partner, individuals can reduce their risk of infection by abstaining from sex, by having one or a few partners, and by using condoms. Male circumcision also reduces HIV transmission. In addition to reducing illness and death among HIV-positive people, ART also reduces HIV transmission.
Why Was This Study Done? Effective HIV control requires an understanding of how HIV spreads through sexual networks. These networks include sexual partnerships between individuals in households, between community members in different households, and between individuals from different communities. Local sexual networks (household and intra-community sexual partnerships) are sometimes assumed to be the dominant driving force in HIV spread in sub-Saharan Africa, but are viral introductions from sexual partnerships with individuals outside the community also important? This question needs answering because the effectiveness of interventions such as ART as prevention partly depends on how many new infections in an intervention area are attributable to infection from partners residing in that area and how many are attributable to infection from partners living elsewhere. Here, the researchers use three analytical methods (spatial clustering statistics, viral phylogenetics, and egocentric transmission modeling) to ask whether HIV transmission in rural Uganda is driven predominantly by intra-community sexual networks. Spatial clustering analysis uses the geographical coordinates of households to measure the tendency of HIV-infected people to cluster spatially at scales consistent with community transmission. Viral phylogenetic analysis examines the genetic relatedness of viruses; if transmission is through local networks, viruses in newly infected individuals should more closely resemble viruses in other community members than those in people outside the community. Egocentric transmission modeling uses information on the locations of recent sexual partners to estimate the proportions of new transmissions from household, intra-community, and extra-community partners.
What Did the Researchers Do and Find? The researchers applied their three analytical methods to data collected from 14,594 individuals living in 46 communities (governmental administrative units) in Rakai District, Uganda. Spatial clustering analysis indicated that individuals who lived in households with individuals with incident HIV (newly diagnosed) or prevalent HIV (previously diagnosed) were 3.2 times more likely than the general population to be HIV-positive themselves. Spatial clustering outside households was relatively weak, however, and was confined to distances of less than half a kilometer. Viral phylogenetic analysis indicated that 44% of phylogenetic clusters (viruses with related genetic sequences found in more than one individual) were within households, but that 40% of clusters crossed community borders. Finally, analysis of the locations of self-reported sexual partners indicated that 39% of new viral transmissions occurred within stable household partnerships, but that among people newly infected by extra-household partners, nearly two-thirds were infected by partners from outside their community.
What Do These Findings Mean? The results of all three analyses suggest that HIV introductions into communities are frequent and are likely to play an important role in sustaining HIV transmission in the Rakai District. Specifically, within this rural HIV-endemic region (a region where HIV infection is always present), viral introductions combined with intra-household transmission account for the majority of new infections, although community-based sexual networks also play a critical role in HIV transmission. These findings may not be generalizable to the broader Ugandan population or to other regions of Africa, and their accuracy is likely to be limited by the use of self-reported sexual partner data. Nevertheless, these findings indicate that the dynamics of HIV transmission in rural Uganda (and probably elsewhere) are complex. Consequently, to halt the spread of HIV, prevention efforts will need to be implemented at spatial scales broader than individual communities, and key populations that are likely to introduce HIV into communities will need to be targeted.
Additional Information. Please access these websites via the online version of this summary at http://dx.doi.org/10.1371/journal.pmed.1001610.
• Information is available from the US National Institute of Allergy and Infectious Diseases on HIV infection and AIDS
• NAM/aidsmap provides basic information about HIV/AIDS, and summaries of recent research findings on HIV care and treatment
• Information is available from Avert, an international AIDS charity, on many aspects of HIV/AIDS, including information on HIV and AIDS in Uganda and on HIV prevention strategies (in English and Spanish)
• The UNAIDS Report on the Global AIDS Epidemic 2013 provides up-to-date information about the AIDS epidemic and efforts to halt it
• The Center for AIDS Prevention Studies (University of California, San Francisco) has a fact sheet about sexual networks and HIV prevention
• Wikipedia provides information on spatial clustering analysis (note that Wikipedia is a free online encyclopedia that anyone can edit; available in several languages)
• A PLOS Computational Biology Topic Page (a review article that is a published copy of record of a dynamic version of the article as found in Wikipedia) about viral phylodynamics is available
• Personal stories about living with HIV/AIDS are available through Avert, NAM/aidsmap, and Healthtalkonline

Figure 1. Rakai District, Uganda. (A) Rakai (~2,200 km²), a rural district in southwest Uganda, with population ~450,000 (~700 communities). RCCS R13 study participants (n = 1,085) reported 1,169 sexual partners with primary residence outside the Rakai District, but within Uganda (where disclosed, residential locations of sexual partners are indicated with red dots on the map). Only three sexual partners were reported to be living outside Uganda (two in Tanzania and one in the United Kingdom, not shown). (B) The Rakai District at a higher resolution, with the 11 geographic regions surveyed in RCCS R13 indicated in color. There are two primary highways (Masaka Road to Tanzania and the Trans-African National Highway to Rwanda and the Democratic Republic of the Congo [DR of Congo]) and numerous secondary roads that extend throughout the district. doi:10.1371/journal.pmed.1001610.g001

Figure 2. Spatial clustering of HIV-seropositive persons within households (0 km) and in geographic windows of 250 m up to 10 km (the first window is 10-250 m, and windows are centered every 50 m starting at 125 m). Spatial clustering analyses show whether HIV prevalence or incidence is elevated within certain distances of other HIV-seropositive persons. We define the spatial clustering of HIV-seropositive individuals as τ(d₁, d₂), the relative probability that an HIV-seropositive person resides within a distance window, d₁ to d₂, from another HIV-seropositive person compared to the probability that any individual is HIV seropositive in the entire study population. Where spatial clustering exists, values of τ(d₁, d₂) exceed one. Shaded areas show the 95% bootstrapped confidence intervals for spatial clustering estimates. (A) The spatial clustering between HIV-seropositive persons (prevalent or incident cases with other prevalent or incident cases; red). (B) The spatial clustering of HIV-seroincident cases with ART-naïve HIV-seroprevalent persons (yellow). (C) The spatial clustering of HIV-seroincident cases with other HIV-seroincident cases (blue). (D) A blowup of the area where significant extra-household spatial clustering (<500 m) was identified among all HIV-seropositive persons (marked with black box in [A-C]). Data are shown only up to 10 km (no significant spatial clustering was observed beyond this distance). doi:10.1371/journal.pmed.1001610.g002
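The clustering statistic defined in the Figure 2 caption can be illustrated with a small sketch. This is a simplified reading of the caption's definition, not the authors' estimator: it assumes planar coordinates in kilometers, a half-open window [d1, d2), and no bootstrapping, and the function name `tau_window` and the toy data are hypothetical.

```python
import math


def tau_window(coords, positive, d1, d2):
    """Simplified clustering ratio for the distance window [d1, d2).

    Numerator: fraction of people living d1 <= distance < d2 from a
    seropositive person who are themselves seropositive.
    Denominator: overall seroprevalence in the study population.
    Assumption: planar coordinates in km; not the authors' exact method.
    """
    n = len(coords)
    positive_pairs = 0  # seropositive neighbors of seropositive persons
    all_pairs = 0       # all neighbors of seropositive persons
    for i in range(n):
        if not positive[i]:
            continue
        for j in range(n):
            if i == j:
                continue
            if d1 <= math.dist(coords[i], coords[j]) < d2:
                all_pairs += 1
                positive_pairs += int(positive[j])
    if all_pairs == 0:
        return float("nan")  # no one lives in this window
    overall_prevalence = sum(positive) / n
    return (positive_pairs / all_pairs) / overall_prevalence


# Toy data (hypothetical): six people along a line, two seropositive.
coords = [(0.0, 0.0), (0.1, 0.0), (0.2, 0.0), (5.0, 0.0), (5.1, 0.0), (10.0, 0.0)]
positive = [True, True, False, False, False, False]

tau_near = tau_window(coords, positive, 0.01, 0.5)  # within 500 m
tau_gap = tau_window(coords, positive, 1.0, 4.0)    # empty window
print(tau_near, tau_gap)
```

In the toy data, half of the people living within 500 m of a seropositive person are seropositive, against an overall prevalence of one-third, so the ratio exceeds one, which is the signature of clustering described in the caption.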

Figure 3. Maximum likelihood phylogenetic analyses of the HIV-1 gag gene. (A) Boxplots of the intra-subtype gag genetic pairwise distances for epidemiologically linked (Epi linked) incident couples (i.e., at least one member of the couple was an incident case) and for all epidemiologically unlinked incident pairs of individuals in RCCS R13. (B) Boxplots of intra-subtype gag genetic pairwise distances by the geographic distance between the incident pair. (C) A ML phylogenetic tree (radial) of HIV-1 subtype A gag sequences from HIV-seroprevalent (n = 245) and HIV-incident (n = 55) cases, where taxa are colored by the geographic region from which they were isolated. Reference strains (n = 87) are in black. Grey circles indicate nodes with bootstrap support of ≥70%; black circles indicate intra-household clusters; † indicates an intra-household virus also sharing a cluster with at least one other household. Additional radial and rectangular phylogenetic trees for HIV-1 subtypes A, D, and C for gag and env genes are included in Figures S5, S6, S7, S8, S9, S10, S11, S12, S13. doi:10.1371/journal.pmed.1001610.g003

Figure 4. Summary of inferential methods and study results and conclusions. The dotted blue line represents the border of a hypothetical community. doi:10.1371/journal.pmed.1001610.g004

Figure S2 Phylogenetic analyses of gag and env genes for specimens that underwent repeated viral RNA

Table 1. Summary statistics for the 46 Rakai communities (within 11 geographic regions) surveyed in RCCS R13.
a Married refers to currently married individuals and includes married monogamous and married polygamous individuals.
b Calculated using Poisson regression and assuming that HIV seroconversion occurred at the midpoint of the follow-up interval.
doi:10.1371/journal.pmed.1001610.t001

Table 2. Characteristics of 95 phylogenetic clusters identified in maximum likelihood phylogenetic analyses (HKY-85) of 915 gag sequences and 1,026 env sequences obtained from 1,099 HIV-infected participants in RCCS R13.
a Categories not mutually exclusive.
b Refers to clusters of two individuals who shared the same household.
c Refers to clusters of two or more individuals who spanned households but shared the same community.
d Refers to clusters of two or more individuals who spanned households and communities.
e Refers to clusters of two or more individuals who spanned households, communities, and geographic regions.
doi:10.1371/journal.pmed.1001610.t002

Table 3. Descriptive characteristics of HIV-seronegative and -incident participants in egocentric partner analysis (n = 9,520).
a "Married, polygamous" for women refers to a woman in a marital relationship with a man who has multiple wives.
b Self-reported sexual partners from the egocentric partnership block of the RCCS study questionnaire (records up to four partners in the last 12 mo).
c Categories not mutually exclusive (i.e., participants may report multiple partners with different HIV serostatus).
d Partners were considered to be on ART if they were using ART for 50% or more of the corresponding index participant's time at risk.
e Partner on ART for 58% of the newly infected index participant's time at risk (previous to current survey interval).
doi:10.1371/journal.pmed.1001610.t003

Table 5. Probability of HIV infection by partner type over 18-mo study interval.

Table S1 Accession numbers for Los Alamos National Laboratory HIV Sequence Database reference sequences used for maximum likelihood and Bayesian phylogenetic analyses. This table includes the accession numbers, geographic location, year of collection, and HIV-1 subtype for each gag and env reference sequence used in phylogenetic analyses. (DOCX)
Table S6 Sensitivity analyses of phylogenetic clustering results to choice of evolutionary model and bootstrap and genetic distance thresholds. Phylogenetic cluster analyses were conducted at 70%, 80%, 90%, and 99% bootstrap thresholds, with and without genetic distance cutoffs under the HKY-85 and GTR+I+G models of evolution. We present the cluster summary data shown in Table 2 under these different evolutionary models and genetic distance and bootstrap threshold criteria.