PLoS ONE (ISSN 1932-6203). Public Library of Science, San Francisco, USA. Manuscript PONE-D-12-13101. doi:10.1371/journal.pone.0046156
Research Article
Subject areas: Public Health and Epidemiology; Infectious Diseases; Computational Biology; Epidemiology; Computer Science
Combining Epidemiological and Genetic Networks Signifies the Importance of Early Treatment in HIV-1 Transmission
Running title: Reconstruction of HIV-1 Transmission Networks
Narges Zarrabi^{1,*}, Mattia Prosperi^{2,3}, Robert G. Belleman^{1}, Manuela Colafigli^{3}, Andrea De Luca^{3,4}, Peter M. A. Sloot^{1,5,6}
1 Computational Science, University of Amsterdam, Amsterdam, The Netherlands
2 College of Medicine, Department of Pathology, Immunology and Laboratory Medicine, Emerging Pathogens Institute, University of Florida, Gainesville, Florida, United States of America
3 Clinic of Infectious Diseases, Catholic University of Sacred Heart, Rome, Italy
4 Unit of Infectious Diseases, Siena University Hospital, Siena, Italy
5 National Research University ITMO, St. Petersburg, Russia
6 Nanyang Technological University, Singapore, Singapore
Editor: Yury E. Khudyakov, Centers for Disease Control and Prevention, United States of America
* Email: N.Zarrabi@uva.nl
The authors have declared that no competing interests exist.
Performed the phylogenetic analysis: MP. Provided the visualization support: RGB. Conceived and designed the experiments: NZ PMAS. Performed the experiments: NZ. Analyzed the data: MP MC ADL NZ RGB PMAS. Contributed reagents/materials/analysis tools: NZ MP RGB. Wrote the paper: NZ MP RGB MC ADL PMAS.
Received 7 May 2012; accepted 28 August 2012; published 28 September 2012. PLoS ONE 7(9): e46156.
Copyright: Zarrabi et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Inferring disease transmission networks is important in epidemiology in order to understand and prevent the spread of infectious diseases. Reconstruction of the infection transmission networks requires insight into viral genome data as well as social interactions. For the HIV-1 epidemic, current research either uses genetic information of patients' virus to infer the past infection events or uses statistics of sexual interactions to model the network structure of viral spreading. Methods for a reliable reconstruction of HIV-1 transmission dynamics, taking into account both molecular and societal data, are still lacking. The aim of this study is to combine information from both genetic and epidemiological scales to characterize and analyse a transmission network of the HIV-1 epidemic in central Italy.
We introduce a novel filter-reduction method to build a network of HIV-infected patients based on their social and treatment information. The network is then combined with a genetic network to infer a hypothetical infection transmission network. We apply this method to a cohort study of HIV-1 infected patients in central Italy and find that patients who are highly connected in the network have longer untreated infection periods. We also find that the network structures for homosexual males and heterosexual populations are heterogeneous, consisting of a majority of ‘peripheral nodes’ that have only a few sexual interactions and a minority of ‘hub nodes’ that have many sexual interactions. Inferring HIV-1 transmission networks using this novel combined approach reveals remarkable correlations between high out-degree individuals and longer untreated infection periods. These findings signify the importance of early treatment and support the potential benefit of wide population screening, management of early diagnoses and anticipated antiretroviral treatment to prevent viral transmission and spread. The approach presented here for reconstructing HIV-1 transmission networks can have important repercussions in the design of intervention strategies for disease control.
This research was partly sponsored by the DynaNets project (www.dynanets.org), European Union grant agreement number 233847, and a grant from the ‘Leading Scientist Program’ of the Government of the Russian Federation, under contract 11.G34.31.0019. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. No additional external funding was received for this study.
Introduction
Understanding the dynamics of infectious disease spreading demands a holistic approach [1]. Social interactions as well as genetic diversity of the transmitted viral agent among individuals dictate the dynamics of infectious disease spreading in a population. Hence, the infection transmission can be investigated at different spatiotemporal scales, from molecular to epidemiological levels.
At the epidemiological level, the spread of infectious diseases has been studied using social or sexual contact networks: the population is modelled as a complex network (where nodes are individuals and links are relationships) and models of disease spread are run on top of it. In the case of HIV-1 infection, these models have been used to understand the complexity of HIV-1 transmission and the spread of viral drug resistance [2]–[8]. However, these models require estimation of many parameters, such as the frequency of sexual acts, the transmission probability per act, and parameters that shape the network structure. For example, even though there is uncertainty about the network structures formed by social/sexual contacts, the network structure of ‘men who have sex with men’ (MSM) is assumed to be approximately scale-free with an exponent value in the range from 1.5 to 2.0 [6], [9]. That is, the degree distribution is assumed to follow a power law with a scaling factor equal to the exponent. A power-law distribution implies that low-degree nodes are many, whereas high-degree nodes are few [10], [11]. These assumptions, however, are subject to change across communities and cultures. Therefore, the primary assumptions on the network structure and the choice of the uncertain parameter values used to build a sexual contact network are still controversial.
Phylogenetic analysis has been employed to study the evolution of HIV-1 both at the population and intra-host level during different stages of the disease, using molecular sequences [12], [13]. Phylogenetic theory exploits genetic information of viruses and other species using mathematical methods of molecular evolution [14], [15]. Phylogenetic trees show the evolutionary relationships among genetic sequences in a population, where topology and branch lengths can be estimated via likelihood-based, parsimony-based or distance-based methods. Genetic isolates are placed at the leaves of these trees and the internal nodes are considered as hypothetical ancestors under a species' coalescence paradigm. Phylogenetic trees can be used to infer transmission clusters [16], [17], as well as temporal and spatial dynamics of the species' evolution, in a so-called phylodynamic framework [18], [19]. However, phylogenetic methods may not accurately represent the evolution of species and the transmission of disease, both because of strong assumptions in the underlying mathematical models and because of noise in the data. For instance, the evolution of a species is not always reducible to a tree form, and a hierarchical tree may not represent it, as in the case of recombination events [20]. Moreover, the agreement between phylogenetic reconstruction and epidemiological evidence of transmission events can be reduced by other factors: in the case of HIV-1 infection, these include the long period of infectivity and convenience sampling (i.e. biased, non-uniform sampling in terms of locations or periods) [20], [21], [22].
This work proposes a new approach to combine information present at both genetic and epidemiological levels in order to obtain a more comprehensive picture of HIV-1 transmission. A filter-reduction method is applied to infer a meta-network of HIV-1 sequences based on the corresponding patients' demographic and medical information. For this meta-network, we use the term contact network, as it contains all contacts that are socially and sexually possible between infected individuals in the population. In contrast to standard network methods, no assumptions are made on the network structure. The intersection of this contact network with a genetic distance network is subsequently computed, from which a hypothetical transmission network is inferred. The method is then applied to identify the HIV-1 subtype B transmission networks in central Italy. The structure of the inferred networks for the MSM and heterosexual risk groups is in agreement with the recognized network structures for social and sexual contacts in the HIV-1 infected population [23]. Moreover, highly connected patients in the network are found to be significantly correlated with longer periods without antiretroviral treatment.
Considering population-level data alongside genomic data is essential for understanding the true nature of infectious disease transmission networks, as was alluded to by DeGruttola et al. [24]. The approach presented here is, to the best of our knowledge, the first attempt to use both genetic and social information in order to characterise transmission networks for HIV-1.
Results
Characteristics of the study population
A dataset of 895 HIV-1 infected patients from a regional study cohort in Rome, Italy (see Methods) was used in this study. Patients were divided into two groups according to their viral subtype: B and non-B. One hundred twenty-two (13.5%) patients with a non-B subtype were excluded from the analysis. Of the 773 (86.5%) subtype B patients, 118 (15.3%) who had an unknown/other entry for the transmission group were also excluded. Of the 655 patients included in the analysis, 65.0% were male and 35.0% female; HIV transmission risk categories were 27.0% MSM, 39.0% heterosexual contacts, 33.0% injecting drug users (IDU), and 1.0% infected through blood products; 84.4% were Italian-born and 10.4% non-Italian-born, while for 5.2% the nation of birth was unknown. The median (interquartile range, IQR) age was 48 (43–53) years; the median (IQR) calendar year of estimated seroconversion, an estimate of the start of the infection, was 1996 (1993–2000); the median (IQR) calendar year of viral genotyping was 2004 (2001–2007). At the time of viral genotyping, the overall median (IQR) plasma viral load was 4.1 (3.5–4.7) log_{10} HIV RNA copies/ml. The percentage of therapy-naive patients was 19.3%, whilst 80.7% were antiretroviral therapy-experienced. The median (IQR) time from the estimated seroconversion date to the first viral sequence date was 8 (4–11) years. In the subset of therapy-experienced patients, the median (IQR) time from the estimated seroconversion date to the first therapy date was 3 (1.25–5) years, and the median (IQR) time from the first therapy date to the viral sequencing date was 4 (1–8) years.
Filter-reduction method and network construction
We propose a new filter-reduction method to infer networks of HIV-infected patients, taking into account patients' attributes and parameters from the literature. The filter-reduction method is defined as follows. Consider a social-sexual network as a graph composed of N nodes, V(N). We start with an undirected, fully connected network on V(N), in which there is a link between each pair of nodes. A set of filters F is applied to the fully connected network, reducing the number of edges through the filtering process. Depending on the data and the type of network, the filtering process can vary. To build the network, we used HIV-1 sequence data annotated with demographic information and applied a set of social filters (Table 1). The social filters are basic epidemiological criteria: belonging to a similar age range (filter 1), belonging to the same transmission risk group (filter 2), and the effect of treatment in reducing the transmission probability (filter 3). A direct connection between any two nodes that did not satisfy the epidemiological criteria was removed from the network. Table 1 summarizes the specific filtering rules used for reduction of the associated contact network (for details on the filtering process see Materials and Methods). An undirected contact network is derived through the filtering process. For the heterosexual population a bipartite network is derived. This is an effect of rule b of the second filter, in which we consider two populations with different genders, males (g1) and females (g2), and only links between different genders are allowed. A seroconversion function is then applied to convert the undirected network to a directed one. The seroconversion function is based on each patient's estimated seroconversion date and directs each link from the patient with the older seroconversion date to the patient with the more recent one.
The function produces networks without directed cycles, meaning that there is no way to start at some vertex v and follow a sequence of edges that loops back to v again. Hence, the inferred network is a directed acyclic graph (DAG), a directed graph with no directed cycles [25]. DAGs are suitable to study and model processes in which information flows in a consistent direction through the network, such as disease transmission [26], [27]. In the case of HIV-1, a “superinfection” may rarely occur, in which a patient is infected twice with two different virus strains (from different donors). However, it is highly unlikely that a patient is infected again with a variant of their own virus. In a DAG, nodes can receive more than one incoming link (the case of superinfection) but, since there are no directed cycles in the network, a node is never reinfected with a variant of its own virus. Figure 1 shows the workflow for constructing networks using the filter-reduction method.
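The acyclicity argument can be checked mechanically. The sketch below is illustrative only: the function names and the toy dates are hypothetical, and the handling of identical seroconversion dates (the pair is dropped) is an assumption that the text does not specify. It orients undirected pairs by seroconversion date and then verifies the DAG property with Kahn's topological-sort algorithm.

```python
from collections import deque

def orient_by_seroconversion(pairs, serocon):
    """Direct each undirected pair from the older to the more recent seroconversion.

    Assumption: pairs with identical dates cannot be oriented and are dropped.
    """
    return [(a, b) if serocon[a] < serocon[b] else (b, a)
            for a, b in pairs if serocon[a] != serocon[b]]

def is_acyclic(nodes, edges):
    """Kahn's algorithm: True if the directed graph has no directed cycle."""
    indegree = {v: 0 for v in nodes}
    successors = {v: [] for v in nodes}
    for src, dst in edges:
        successors[src].append(dst)
        indegree[dst] += 1
    queue = deque(v for v in nodes if indegree[v] == 0)
    visited = 0
    while queue:
        v = queue.popleft()
        visited += 1
        for w in successors[v]:
            indegree[w] -= 1
            if indegree[w] == 0:
                queue.append(w)
    # Every node is visited exactly when no cycle blocks the ordering.
    return visited == len(nodes)

# Hypothetical toy data: estimated seroconversion years for three patients.
serocon = {"p1": 1990, "p2": 1994, "p3": 1998}
pairs = [("p2", "p1"), ("p3", "p2"), ("p1", "p3")]
oriented = orient_by_seroconversion(pairs, serocon)
```

Because every edge points strictly forward in time, any directed path visits strictly increasing dates and can never return to its starting node.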
Figure 1. Workflow for constructing networks using the filter-reduction method. doi:10.1371/journal.pone.0046156.g001
Starting from an undirected, fully connected network of all HIV sequences in the data, a set of social/sexual filters is applied to obtain an undirected filtered network. To convert the network to a directed one, a seroconversion function is applied, deriving a contact network.
Table 1. Social/sexual filters for constructing a contact network. doi:10.1371/journal.pone.0046156.t001
For patients 1 and 2:
Filter 1
  If (maximum_age_range < age_{1} − age_{2}) connection = 0
Filter 2
  Rule a: If (r_{1} is not equal to r_{2}) connection = 0
  Rule b: If (r_{1} = r_{2} = “Heterosexual” & g_{1} = g_{2}) connection = 0
  Rule c: If (r_{1} = “Blood products” or r_{2} = “Blood products”) connection = 0
Filter 3
  If (t_{1} is older than s_{2}) connection = 0
  If (t_{2} is older than s_{1}) connection = 0
Rules for social/sexual filters: gender (g), risk group (r), therapy date (t), estimated seroconversion date (s).
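The rules of Table 1 can be sketched as a pairwise predicate followed by the seroconversion orientation. This is a minimal illustration and not the authors' implementation: the `Patient` record, the value of `MAX_AGE_RANGE`, the use of an absolute age difference, the treatment of therapy-naive patients (no therapy date, so filter 3 never fires for them) and the skipping of tied seroconversion dates are all assumptions made for the example.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Optional

@dataclass
class Patient:
    pid: str
    age: float                 # years
    gender: str                # "M" or "F"
    risk: str                  # "MSM", "Heterosexual", "IDU" or "Blood products"
    serocon: float             # estimated seroconversion date (calendar year)
    therapy: Optional[float]   # first therapy date; None if therapy-naive

MAX_AGE_RANGE = 15.0  # hypothetical value, chosen only for this example

def passes_filters(a: Patient, b: Patient) -> bool:
    # Filter 1: age difference (absolute value is an assumption).
    if abs(a.age - b.age) > MAX_AGE_RANGE:
        return False
    # Filter 2, rule a: different risk groups are never linked.
    if a.risk != b.risk:
        return False
    # Filter 2, rule b: heterosexual links require different genders.
    if a.risk == "Heterosexual" and a.gender == b.gender:
        return False
    # Filter 2, rule c: blood-product infections stay isolated.
    if "Blood products" in (a.risk, b.risk):
        return False
    # Filter 3: no link if one patient started therapy before the other seroconverted.
    if a.therapy is not None and a.therapy < b.serocon:
        return False
    if b.therapy is not None and b.therapy < a.serocon:
        return False
    return True

def contact_network(patients):
    """Directed edges (older seroconversion -> more recent) that survive all filters."""
    edges = set()
    for a, b in combinations(patients, 2):
        if not passes_filters(a, b) or a.serocon == b.serocon:
            continue  # assumption: tied dates cannot be oriented
        src, dst = (a, b) if a.serocon < b.serocon else (b, a)
        edges.add((src.pid, dst.pid))
    return edges

# Hypothetical toy cohort.
patients = [
    Patient("p1", 40, "M", "MSM", 1990.0, 1996.0),
    Patient("p2", 42, "M", "MSM", 1994.0, None),
    Patient("p3", 38, "M", "MSM", 1998.0, 2000.0),
    Patient("p4", 41, "F", "Heterosexual", 1995.0, None),
]
edges = contact_network(patients)
```

In this toy cohort the link between p1 and p3 is removed by filter 3, because p1 started therapy in 1996, before p3 seroconverted in 1998.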
Analyzing characteristics of the contact network
To analyse the inferred networks we first visualized the networks and plotted the degree distributions. Figure 2 shows the network for the entire population, which consists of three subnetworks corresponding to the major HIV-1 transmission risk groups (MSM, heterosexual, IDU). A few patients with the “blood product” mode of infection were isolated from the other risk groups. We analyzed the degree distribution of the network as a whole (i.e., for all risk groups) and the degree distribution of each subnetwork separately. The cumulative distributions of the total, in- and out-degrees of the contact networks are plotted on a log-log scale in Figure 3. The in-degree is the number of incoming edges to a node and the out-degree is the number of outgoing edges from a node. The total degree is the sum of the in- and out-degrees.
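As a small illustration of these definitions, the sketch below computes in-, out- and total degrees from a directed edge list and the cumulative distribution P(K ≥ k) that is usually plotted on log-log axes. The function names and the toy edge list are hypothetical, and nodes without any edge are not represented because they do not appear in an edge list.

```python
from collections import Counter

def degrees(edges):
    """Return (in_degree, out_degree, total_degree) dicts for a directed edge list."""
    indeg, outdeg = Counter(), Counter()
    nodes = set()
    for src, dst in edges:
        outdeg[src] += 1
        indeg[dst] += 1
        nodes.update((src, dst))
    in_d = {v: indeg[v] for v in nodes}
    out_d = {v: outdeg[v] for v in nodes}
    total = {v: in_d[v] + out_d[v] for v in nodes}
    return in_d, out_d, total

def cumulative_distribution(degree_by_node):
    """Sorted list of (k, fraction of nodes with degree >= k)."""
    n = len(degree_by_node)
    counts = Counter(degree_by_node.values())
    remaining = n
    result = []
    for k in sorted(counts):
        result.append((k, remaining / n))
        remaining -= counts[k]
    return result

# Hypothetical toy network: a -> b, a -> c, b -> c.
in_d, out_d, total = degrees([("a", "b"), ("a", "c"), ("b", "c")])
out_ccdf = cumulative_distribution(out_d)
```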
Figure 2. Visualization of the contact network consisting of three subnetworks corresponding to the major HIV-1 transmission risk groups: MSM (yellow), Heterosexual (red), and IDU (green).
Figure 3. Degree distributions of the contact network. doi:10.1371/journal.pone.0046156.g003
The cumulative total (black), in- (blue), and out-degree (pink) distributions for the entire network (all risk groups) and for the MSM, Heterosexual, and IDU risk groups, plotted on a log-log scale.
The degree distributions presented in Figure 3 are based on social and demographic information and are intermediate results before incorporating the genetic data. From the distributions, however, one can see that the degrees of highly connected patients in the IDU group are markedly higher than those of MSM patients and of patients who acquired infection through heterosexual contacts. To further investigate the structural differences between the networks of the three risk groups, we measured additional network properties, including the fraction of removed edges, average degree, average path length, global and local clustering coefficients, and assortativity (Table 2).
Table 2. Properties of the contact network. doi:10.1371/journal.pone.0046156.t002

| Property | MSM | Heterosexual | IDU | All risk groups |
| --- | --- | --- | --- | --- |
| Fraction of removed edges | 80.4% | 91.2% | 45.7% | 91.3% |
| Average degree | 34.7 | 22.30 | 117.1 | 56.7 |
| Average path length | 2.16 | 2.83 | 1.48 | 2.20 |
| Clustering coefficient (global) | 0.59 | 0.00 | 0.76 | 0.71 |
| Clustering coefficient (local) | 0.70 | 0.00 | 0.82 | 0.47 |
| Assortativity (degree) | 0.04 | −0.20 | −0.11 | 0.45 |
The percentage of edges removed from the MSM and heterosexual networks is almost twice the percentage removed from the IDU network. This implies that the MSM and heterosexual contact networks are sparser than the IDU network: although the same filters were applied to all risk groups, the nodes in the IDU contact network remain more connected and the network structure is more compact. These observations, together with the discrepancies in the degree distributions (Figure 3) and the measurements in Table 2, imply that there are structural differences in the contact networks, and therefore in HIV-1 transmission dynamics, between the IDU, MSM and heterosexual populations. The higher degree in the IDU population can be understood from the fact that IDU was one of the first risk groups affected by the HIV epidemic in Northern Italy and had the highest risk of HIV infection in 1985 [28]. Moreover, needle sharing among IDU has a much higher probability of transmission per single act. It is therefore plausible that, besides the differences between risk groups in epidemic trend and access to treatment over time, the mode of transmission within IDU might by itself have contributed to the observed higher degrees. The heterosexual population has a bipartite contact network and therefore its clustering coefficients are zero. Bipartite networks are representative of heterosexual contact networks for sexually transmitted diseases (STDs) such as HIV/AIDS, since the infection only transmits between males and females and not between individuals of the same gender [29].
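The zero clustering coefficients of the bipartite heterosexual network follow directly from the definition: a node's local clustering coefficient is the fraction of pairs of its neighbours that are themselves linked, and in a bipartite network all neighbours of a node belong to the same side and are never linked to each other. The sketch below illustrates this on toy data; the function names are hypothetical, and assigning 0 to nodes with fewer than two neighbours is an assumption (conventions differ between tools).

```python
from itertools import combinations

def build_adjacency(edges):
    """Undirected adjacency sets from an edge list."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj

def local_clustering(adj, v):
    """Fraction of pairs of v's neighbours that are linked to each other."""
    neighbours = adj[v]
    k = len(neighbours)
    if k < 2:
        return 0.0  # assumption: undefined cases count as zero
    links = sum(1 for a, b in combinations(neighbours, 2) if b in adj[a])
    return 2.0 * links / (k * (k - 1))

def average_local_clustering(adj):
    return sum(local_clustering(adj, v) for v in adj) / len(adj)

# A triangle is fully clustered; a male-female bipartite graph has no triangles.
triangle = build_adjacency([("a", "b"), ("b", "c"), ("a", "c")])
bipartite = build_adjacency([("m1", "f1"), ("m1", "f2"), ("m2", "f1"), ("m2", "f2")])
```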
We used a community detection method based on the leading eigenvector of the modularity matrix to identify community structures in the network [30]. The method identifies parts of a network where nodes are densely connected to each other but sparsely connected to the rest of the network. The results confirmed the existence of two major communities in the MSM and heterosexual risk groups (Figure S1). We explain the appearance of these communities from an epidemiological point of view. Communities form in a network because of a local increase in the connectivity between nodes in some parts of the network. That the connectivity of patients within a community is higher than between communities suggests that people residing in one community had a higher probability of having contacts with, and infecting, each other. To explore possible reasons for this higher chance of transmission events within a community, we mapped each patient's estimated seroconversion year to a colour code from cyan to red. An interesting trend was observed: the first community (blue to green) contains patients who were infected from 1980 to the late 1990s, while the second community (yellow to red) contains patients who were infected more recently, after the year 2000 (Figure 4). The temporal separation of the communities may reflect the influence of the introduction of more potent and effective antiretroviral therapies during the second half of the 1990s [31]. The observed trend in the estimated seroconversion year also showed that HIV-1 incidence in the IDU population decreased over time after the late 1980s (see Figure 4). This is in line with the observed decrease in the spread of HIV among the IDU population in Italy after the 1980s, as reported by Rezza et al. [32].
However, the trend of HIV infections through different modes of transmission in our data set (see Figure S2) did not necessarily follow the overall Italian trends [33], [34], and a more representative sample is needed to extend the results from the county/regional to the national scale.
Figure 4. The inferred contact network coloured based on estimated year of seroconversion. doi:10.1371/journal.pone.0046156.g004
The colouring trend in the patients' estimated seroconversion year, ranging from 1982 (blue) to 2008 (red).
Next we studied the relationship between the untreated infection period and the connectivity of the patients in the network. For this, we defined an untreated infection period (UIP) for each patient: the period during which the patient was infected but had not yet started antiretroviral therapy (because of being unaware of the infection, not fulfilling the immunovirological criteria to be eligible for treatment, or not being willing to be treated). We detected a correlation between the untreated infection period and the number of outgoing edges from a node (out-degree) in the network. The correlation is strongest for the MSM population, with high statistical significance (r = 0.90, 95% confidence interval (CI) (0.87, 0.93), p-value < 2.2e-16), where r is Pearson's product-moment correlation coefficient. The correlation was less strong but still highly significant for the heterosexual contacts (r = 0.74, 95% CI (0.68, 0.79), p-value < 2.2e-16), IDU (r = 0.86, 95% CI (0.83, 0.89), p-value < 2.2e-16) and the overall population (r = 0.83, 95% CI (0.81, 0.85), p-value < 2.2e-16). The UIP versus the out-degree of nodes is plotted in Figure 5, and one can clearly see that nodes with higher out-degree tend to have longer UIPs. The inferred networks are a direct outcome of the filters we applied. To test the effect of the filters on the detected correlations, we rebuilt the networks, each time removing one filter from the filtering process, and measured the correlations again. Removing the age or risk group filter does not significantly change the correlations. Removing the treatment filter decreases the correlations, but they remain statistically significant (data shown in Table S1).
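A minimal sketch of this analysis is given below. It is illustrative only: the function names and the toy values are hypothetical, and the UIP helper assumes that the period runs from the estimated seroconversion date to the first therapy date, with therapy-naive patients counted up to their sequencing date. That last choice is an assumption of this example and is not stated in the text.

```python
import math

def untreated_infection_period(serocon, therapy_start, sequence_date):
    """UIP in years.

    Assumption: for therapy-naive patients (therapy_start is None) the
    period is counted up to the viral sequencing date.
    """
    end = therapy_start if therapy_start is not None else sequence_date
    return end - serocon

def pearson_r(xs, ys):
    """Pearson's product-moment correlation coefficient."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical toy values: (seroconversion, first therapy, sequencing) per patient.
records = [(1995.0, 1996.0, 2004.0), (1994.0, 1996.0, 2003.0),
           (1993.0, 1996.0, 2005.0), (2000.0, None, 2004.0)]
uips = [untreated_infection_period(*rec) for rec in records]
out_degrees = [2, 4, 6, 8]
r = pearson_r(uips, out_degrees)
```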
Figure 5. Untreated infection period (UIP) versus out-degree. doi:10.1371/journal.pone.0046156.g005
UIP vs. the out-degree of nodes in the MSM, Heterosexual, IDU and all risk groups populations. Pearson's correlation coefficients, 95% confidence intervals and p-values are shown on each graph.
Constructing the hypothetical transmission networks
To construct a hypothetical transmission network we coupled information from both genetic and epidemiological scales. To this end, we computed the intersection of the contact network with a genetic network obtained from a genetic distance matrix [16], [35]. The genetic distance matrix gives a weighted, fully connected network that connects all sequences with each other using their genetic distances as weights (see Dataset S1). The connection between any two nodes with a genetic distance higher than a certain threshold was removed from the network. We used a threshold value of 0.04 nucleotide substitutions per site and derived a genetic network (see Figure S3 and Figure S4). The threshold of 0.04 corresponds to the 15^{th} percentile of the overall distance distribution measured through the phylogenetic tree. In other words, all retained links connect sequences that are closer to each other than 85% of all pairwise comparisons (see [17] for a discussion on the optimal threshold). Additionally, we measured the fraction of removed edges from the genetic network while varying this parameter from 0.02 (1^{st} percentile) to 0.05 (35^{th} percentile). As the threshold value increases, the percentage of removed edges decreases gradually for the MSM, whereas for the heterosexual, IDU and all risk groups the percentages drop below 50% at a threshold value of 0.05 (Table S2). Subsequently, the genetic network was overlaid with the contact network and the intersection network was computed. The resulting social-genetic intersection network, taken as a hypothetical transmission network, satisfies both genetic and epidemiological criteria for transmission events. Figure 6 shows the hypothetical transmission network of the entire population. To analyse the characteristics of the inferred network, we plotted the degree distributions (Figure 7) and measured the network properties presented in Table 3.
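The thresholding and intersection steps can be sketched as set operations. The function names and the toy distances below are hypothetical; the only values taken from the text are the 0.04 threshold and the rule that links with a distance higher than the threshold are removed.

```python
def genetic_network(distances, threshold=0.04):
    """Undirected links whose genetic distance does not exceed the threshold."""
    return {frozenset(pair) for pair, d in distances.items() if d <= threshold}

def transmission_network(contact_edges, genetic_edges):
    """Directed contact edges that are also supported by a genetic link."""
    return {(src, dst) for src, dst in contact_edges
            if frozenset((src, dst)) in genetic_edges}

# Hypothetical pairwise distances (nucleotide substitutions per site).
distances = {("p1", "p2"): 0.01, ("p1", "p3"): 0.03, ("p2", "p3"): 0.07}
contact_edges = {("p1", "p2"), ("p2", "p3")}

genetic_edges = genetic_network(distances)
hypothetical_transmissions = transmission_network(contact_edges, genetic_edges)
```

The direction of each retained edge comes from the contact network, since genetic distance alone is symmetric and carries no direction.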
Figure 6. The hypothetical transmission network of the entire population, obtained by computing the intersection of the contact and the genetic network. Patients are colored based on their risk groups: MSM (yellow), Heterosexual (red), IDU (green) and blood products (cyan).
Figure 7. Degree distributions of the hypothetical transmission network. doi:10.1371/journal.pone.0046156.g007
Cumulative total (black), in- (blue), and out- (pink) degree distributions of the hypothetical transmission network of the MSM, heterosexual, IDU and all risk groups, plotted on a log-log scale.
Table 3. Properties of the hypothetical transmission network. doi:10.1371/journal.pone.0046156.t003

| Property | MSM | Heterosexual | IDU | All risk groups |
| --- | --- | --- | --- | --- |
| Fraction of removed edges | 98.1% | 98.0% | 74.4% | 96.7% |
| Average degree | 3.32 | 4.86 | 55.30 | 21.10 |
| Average path length | 2.86 | 3.27 | 1.78 | 2.22 |
| Clustering coefficient (global) | 0.36 | 0.00 | 0.60 | 0.59 |
| Clustering coefficient (local) | 0.50 | 0.00 | 0.74 | 0.45 |
| Assortativity (degree) | −0.07 | −0.17 | −0.22 | 0.11 |
In Figure 7, the cumulative degree distributions of the hypothetical transmission networks for the MSM, heterosexual, IDU and all risk groups are shown. For the MSM and heterosexual populations, the cumulative out-degree distributions were fitted to a straight line on a log-log scale, with slopes equal to 2.65±0.43 and 1.88±0.31, respectively. A straight-line fit on a log-log scale suggests that the degree distribution follows a power law with a scaling factor equal to the slope [10], [11]. To verify the fit to the power-law distribution we performed a statistical test, using maximum-likelihood fitting methods with goodness-of-fit tests based on the Kolmogorov-Smirnov statistic [10]. We followed the procedure proposed by Newman et al. (2007) [11] to test for a power-law distribution of the data. The method uses maximum-likelihood estimators to fit the power-law distribution to the data, along with a goodness-of-fit based approach to estimate the lower cutoff of the scaling region. The uncertainty in the fitted parameters was estimated with a nonparametric approach. The p-value for the fitted power-law model was calculated with a Kolmogorov-Smirnov test. If the resulting p-value is greater than 0.1, the power law is a plausible hypothesis for the data; otherwise it is rejected (see Table 4).
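To make the fitting step concrete, the sketch below estimates the exponent for a fixed lower cutoff and measures the Kolmogorov-Smirnov distance between the data and the fitted model. It is a simplified illustration under stated assumptions, not the procedure used for Table 4: it uses the well-known approximate maximum-likelihood estimator for integer data, α ≈ 1 + n / Σ ln(x_i / (x_min − 1/2)), and the matching rounding approximation for the model tail. It does not search for the best cutoff and does not compute the bootstrap p-value. All names and the toy data are hypothetical.

```python
import math

def fit_alpha(data, xmin):
    """Approximate discrete maximum-likelihood exponent for values >= xmin."""
    tail = [x for x in data if x >= xmin]
    n = len(tail)
    alpha = 1.0 + n / sum(math.log(x / (xmin - 0.5)) for x in tail)
    return alpha, n

def ks_distance(data, xmin, alpha):
    """Largest gap between empirical and model P(X >= k) over observed values.

    Assumption: the model tail uses the same rounding approximation as the
    estimator, P(X >= k) = ((k - 0.5) / (xmin - 0.5)) ** -(alpha - 1).
    """
    tail = [x for x in data if x >= xmin]
    n = len(tail)
    distance = 0.0
    for k in sorted(set(tail)):
        empirical = sum(1 for x in tail if x >= k) / n
        model = ((k - 0.5) / (xmin - 0.5)) ** (-(alpha - 1.0))
        distance = max(distance, abs(empirical - model))
    return distance

# Hypothetical toy out-degrees.
degrees_sample = [1, 1, 1, 1, 2, 2, 3, 5, 8]
alpha, n_tail = fit_alpha(degrees_sample, xmin=1)
distance = ks_distance(degrees_sample, 1, alpha)
```

In a full analysis the cutoff would be chosen as the value that minimizes this distance, and the p-value would be obtained by repeating the fit on synthetic samples drawn from the fitted model.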
Table 4. Basic parameters of the data and the power law fit. doi:10.1371/journal.pone.0046156.t004

| Quantity | n | Degree | Data: <x> | Data: σ | Data: x_max | Power law: α | Power law: x_min | Goodness-of-fit p-value |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| MSM | 176 | Total | 3.32 | 5.62 | 27 | 1.82 (0.54) | 11 (1.80) | 0.0040 |
| MSM | 176 | In | 1.66 | 3.05 | 18 | 2.09 (0.38) | 2 (1.26) | 0.0590 |
| MSM | 176 | Out | 1.66 | 3.58 | 27 | 2.65 (0.43) | 5 (1.61) | 0.1730 |
| Heterosexual | 255 | Total | 4.86 | 7.68 | 49 | 3.50 (0.61) | 18 (4.78) | 0.6130 |
| Heterosexual | 255 | In | 2.43 | 3.56 | 17 | 2.50 (0.48) | 4 (1.71) | 0.0030 |
| Heterosexual | 255 | Out | 2.43 | 5.67 | 39 | 1.88 (0.31) | 2 (1.87) | 0.1020 |
| IDU | 217 | Total | 55.30 | 43.43 | 175 | 3.50 (0.12) | 69 (5.23) | 0.0170 |
| IDU | 217 | In | 27.65 | 23.90 | 90 | 3.50 (0.31) | 42 (6.02) | 0.0000 |
| IDU | 217 | Out | 27.65 | 33.54 | 146 | 1.96 (0.47) | 15 (12.37) | 0.0000 |
| All risk groups | 655 | Total | 21.10 | 35.15 | 175 | 3.5 (0.82) | 69 (27.84) | 0.0160 |
| All risk groups | 655 | In | 10.55 | 18.47 | 90 | 1.6 (0.51) | 5 (3.91) | 0.0000 |
| All risk groups | 655 | Out | 10.55 | 23.08 | 146 | 2.0 (0.29) | 14 (7.24) | 0.0000 |

Basic parameters of the data (total, in- and out-degree distributions of the MSM, heterosexual, IDU and all risk groups), along with their power-law fits and the corresponding p-values. Goodness-of-fit tests compare the observed data to the hypothesized power-law distribution. If the resulting p-value is greater than 0.1, the power law is plausible for the data (statistically significant values are denoted in bold).
We then performed statistical tests (via a likelihood ratio test) to compare the power law against alternative (Exponential and Poisson) distributions for the data. For each alternative distribution, we computed the likelihood ratio shown in Table 5. If the calculated likelihood ratio is significantly different from zero, its sign indicates whether or not the alternative is favored over the power-law model. The test results and the positive likelihood ratios show that the power-law model is a good fit to the MSM out-degree distribution in comparison to the Exponential and Poisson distributions.
Table 5. Test of power law behavior in the data and likelihood ratios of alternative distributions. doi:10.1371/journal.pone.0046156.t005

| Distribution | Power law (p-value) | Poisson LR | Poisson p-value | Exponential LR | Exponential p-value | Support for power law |
| --- | --- | --- | --- | --- | --- | --- |
| MSM (out-degree) | 0.1730 | 2.31 | 0.02 | 0.35 | 0.72 | good |
| Heterosexual (total-degree) | 0.6130 | 4.08 | <0.01 | −2.67 | 0.01 | moderate |
| Heterosexual (in-degree) | 0.1020 | 3.28 | <0.01 | 1.85 | 0.06 | good |

For each degree distribution we give a p-value for the fit to the power-law model and likelihood ratios (LR) for the alternatives. We also quote p-values for the significance of each of the likelihood ratio tests. Significant p-values are denoted in bold. Positive values of the likelihood ratios indicate that the power-law model is favored over the alternative. The final column of the table lists the judgment of the statistical support for the power-law hypothesis for each distribution. “Moderate” indicates that the power law is a good fit but there are other plausible alternatives as well; “good” indicates that the power law is a good fit and that none of the alternatives considered is plausible.
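The sign convention of the likelihood ratio can be illustrated with a small sketch. It is a simplification under stated assumptions and not the computation behind Table 5: it treats the data as continuous, compares a power law with a shifted exponential (both with the same lower cutoff), and uses the normal approximation for the significance of the ratio, p = erfc(|R| / (σ√(2n))). The function name and the toy data are hypothetical.

```python
import math

def power_law_vs_exponential(data, xmin):
    """Return (R, normalized R, p): R > 0 favours the power law.

    Assumption: continuous approximations of both distributions on x >= xmin.
    """
    tail = [float(x) for x in data if x >= xmin]
    n = len(tail)
    # Maximum-likelihood parameters of each model.
    alpha = 1.0 + n / sum(math.log(x / xmin) for x in tail)
    lam = 1.0 / (sum(tail) / n - xmin)
    # Pointwise log-likelihood differences (power law minus exponential).
    diffs = [
        (math.log((alpha - 1.0) / xmin) - alpha * math.log(x / xmin))
        - (math.log(lam) - lam * (x - xmin))
        for x in tail
    ]
    ratio = sum(diffs)
    mean = ratio / n
    sigma = math.sqrt(sum((d - mean) ** 2 for d in diffs) / n)
    p_value = math.erfc(abs(ratio) / (math.sqrt(2.0 * n) * sigma))
    return ratio, ratio / (sigma * math.sqrt(n)), p_value

# Hypothetical heavy-tailed toy sample.
sample = [1, 1, 1, 2, 2, 3, 4, 7, 15, 40, 120]
ratio, normalized_ratio, p_value = power_law_vs_exponential(sample, xmin=1)
```

A small p-value means the sign of the ratio is unlikely to be a chance fluctuation; a large p-value means the data cannot discriminate between the two models.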
Transmission network and phylogenetic clusters
We compared the inferred transmission network with a set of genetic clusters obtained through phylogenetic analysis of the corresponding viral sequences (see Materials and Methods and Figure S5). A total of 61 clusters (of size 2 to 52) were identified, and 39% of all patients were included in these clusters (see Figure S6 for the cluster size distribution). Nodes, representing individual viral isolates, that reside in the same cluster are genetically close and therefore possibly transmitted the virus to each other. For every two nodes in the same genetic cluster we tested whether they were connected (directly or indirectly) in the transmission network. The percentage of genetically close nodes that were connected in the transmission network was 37% for MSM, 55% for heterosexual, and 95% for IDU. The high percentage of connected, genetically close nodes in the IDU population also supports the idea that needle sharing plays an important role in the transmission of HIV in the resulting contact network.
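The pairwise test described above can be sketched as follows. The names and the toy data are hypothetical, and the sketch assumes that "connected (directly or indirectly)" means that a path exists when edge direction is ignored; the text does not state whether direction was taken into account.

```python
from itertools import combinations

def components(nodes, edges):
    """Map each node to a component label, ignoring edge direction (union-find)."""
    parent = {v: v for v in nodes}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for a, b in edges:
        parent[find(a)] = find(b)
    return {v: find(v) for v in nodes}

def connected_pair_fraction(clusters, nodes, edges):
    """Fraction of same-cluster pairs that lie in one network component."""
    label = components(nodes, edges)
    total = linked = 0
    for members in clusters:
        for a, b in combinations(members, 2):
            total += 1
            linked += label[a] == label[b]
    return linked / total if total else 0.0

# Hypothetical toy data: one chain p1 -> p2 -> p3 and two isolated patients.
nodes = ["p1", "p2", "p3", "p4", "p5"]
edges = [("p1", "p2"), ("p2", "p3")]
clusters = [["p1", "p3"], ["p4", "p5"], ["p2", "p4"]]
fraction = connected_pair_fraction(clusters, nodes, edges)
```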
Factors associated with superspreaders
High outdegree nodes in the network have a higher probability of spreading the virus to more contacts. In a population these nodes can play the role of superspreaders with many connections [36]–[39]. In Table 6, we report the results of a multivariable linear regression analysis conducted to identify factors associated with superspreaders, that is, with higher outdegree nodes in the network. In all populations a longer untreated infection period and a higher number of incoming links were associated with superspreaders. The risk of being a superspreader was also associated with a higher viral load and an older age in the MSM population. The risk was higher in males than in females in the heterosexual population and in all risk groups. We also performed a univariable regression analysis to identify the independent effect of covariates with respect to superspreaders (see Figure S7).
Table 6. Factors associated with outdegree nodes. doi:10.1371/journal.pone.0046156.t006
Factor/risk group | MSM Coef | MSM Std | MSM P value | Heterosexual Coef | Heterosexual Std | Heterosexual P value | IDU Coef | IDU Std | IDU P value | All risk groups Coef | All risk groups Std | All risk groups P value
Age (years) | 0.02 | 0.01 | 0.0078 | 0.01 | 0.01 | 0.7788 | 0.03 | 0.00 | <0.0001 | 0.02 | 0.00 | <0.0001
Viral load (copies/ml) | 0.02 | 0.06 | 0.7450 | −0.03 | 0.04 | 0.5242 | −0.06 | 0.01 | 0.0001 | −0.08 | 0.01 | <0.0001
UIP (years) | 0.22 | 0.01 | <0.0001 | 0.24 | 0.01 | <0.0001 | 0.13 | 0.00 | <0.0001 | 0.17 | 0.00 | <0.0001
Gender (Male/Female) |  |  |  | 0.39 | 0.10 | 0.0001 | 0.09 | 0.03 | 0.0044 | 0.31 | 0.03 | <0.0001
Indegree | 0.12 | 0.01 | <0.0001 | 0.15 | 0.01 | <0.0001 | 0.01 | 0.00 | <0.0001 | 0.03 | 0.00 | <0.0001
Results of a multivariable regression analysis showing the factors associated with high outdegree nodes. The outdegree is the dependent variable in the analysis, and age, viral load, UIP, gender, and indegree are the independent variables.
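As a rough illustration of the multivariable linear regression summarised in Table 6, the sketch below fits an ordinary least-squares model with outdegree as the dependent variable. It is a pure-Python sketch under stated assumptions: the function name and the toy data are hypothetical, and only coefficients are computed (the standard errors and P values reported in the table would require additional steps or a statistics package).

```python
def ols_fit(X, y):
    """Ordinary least squares; returns [intercept, b1, ..., bp]."""
    rows = [[1.0] + [float(v) for v in r] for r in X]  # prepend intercept column
    p = len(rows[0])
    # Normal equations (A^T A) beta = A^T y, stored as an augmented matrix.
    M = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))]
         for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(M[r][c]))
        if abs(M[piv][c]) < 1e-12:
            raise ValueError("design matrix is rank deficient")
        M[c], M[piv] = M[piv], M[c]
        M[c] = [v / M[c][c] for v in M[c]]
        for r in range(p):
            if r != c:
                f = M[r][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    return [M[i][p] for i in range(p)]

# Toy data (not from the study): columns are UIP in years and indegree.
X = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1)]
outdegree = [1, 3, 4, 6, 8]  # constructed as 1 + 2*UIP + 3*indegree
coefs = ols_fit(X, outdegree)
```

In the toy data the fitted coefficients recover the constructed relationship; with real covariates such as age, viral load, UIP, gender and indegree, each row of `X` would simply carry more columns.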
Comparison with random networks
To compare the hypothetical transmission networks with random graphs, we generated random networks of the same size (nodes and edges) as the inferred transmission networks for each population (MSM, heterosexual, IDU and all risk groups). To do so, we used the fraction of remaining edges in each network as the probability of generating an edge in the random network. Table 7 compares the properties of the inferred transmission networks with those of the random networks. The inferred networks differ from random networks of the same size in having lower average path lengths, higher clustering coefficients and higher assortativity coefficients.
Table 7. Properties of the hypothetical transmission network against random networks. doi:10.1371/journal.pone.0046156.t007
Property | MSM Inferred | MSM Randomized | Heterosexual Inferred | Heterosexual Randomized | IDU Inferred | IDU Randomized | All risk groups Inferred | All risk groups Randomized
average degree | 3.32 | 3.24 | 4.86 | 5.02 | 55.30 | 55.35 | 21.10 | 21.50
average path length | 2.86 | 4.38 | 3.27 | 3.59 | 1.78 | 1.74 | 2.22 | 2.44
clustering coefficient (global) | 0.36 | 0.02 | 0.00 | 0.02 | 0.60 | 0.25 | 0.59 | 0.032
clustering coefficient (local) | 0.50 | 0.02 | 0.00 | 0.01 | 0.74 | 0.25 | 0.45 | 0.032
assortativity (degree) | −0.07 | −0.02 | −0.17 | 0.03 | −0.22 | −0.02 | 0.11 | <−0.01
Both inferred and randomized networks are of the same size in terms of number of nodes and edges. The properties of the randomized network are averaged over 5 random networks.
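The random-network comparison can be sketched as follows. This is a minimal illustration under stated assumptions: it draws each possible edge independently with probability equal to the fraction of edges present in the inferred network, so the number of edges matches the inferred network only in expectation. The function name, the `directed` flag and the example sizes are hypothetical.

```python
import random

def random_network(n_nodes, n_edges_inferred, directed=True, seed=None):
    """Draw a random graph whose edge probability equals the inferred edge density."""
    rng = random.Random(seed)
    possible = n_nodes * (n_nodes - 1)
    if not directed:
        possible //= 2
    p = n_edges_inferred / possible  # fraction of edges that survived filtering
    edges = []
    for u in range(n_nodes):
        for v in range(n_nodes):
            if u == v or (not directed and v < u):
                continue  # no self-loops; count each undirected pair once
            if rng.random() < p:
                edges.append((u, v))
    return edges

# Five replicates, mirroring the averaging over 5 random networks in Table 7.
replicates = [random_network(100, 300, seed=s) for s in range(5)]
mean_edges = sum(len(r) for r in replicates) / len(replicates)
```

Network properties such as path length or clustering would then be computed per replicate and averaged.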
Discussion
We introduce a new method for inferring hypothetical HIV-1 transmission networks using information from both genetic and epidemiological scales. This study constitutes, to the best of our knowledge, the first attempt to combine social and genetic data to characterise transmission networks for HIV-1. We propose a new filter-reduction method for network construction and use it to build a network of HIV-1 sequences based on their associated social and demographic information. To characterise the hypothetical transmission networks we compute the intersection of the social network with the genetic network obtained from the genetic distance matrix of Italian patients. Standard network approaches assume a predefined network structure with certain parameter values, such as a scale-free structure with an exponent in the range of 1.5 to 2.0 for the MSM population in HIV transmission [5], [6]. The main advantage of the method presented here is that it requires no prior assumption about the network structure; the structure is an emergent characteristic of our approach. The power-law fit to the MSM and heterosexual outdegree distributions yields a scale-free structure for these networks, with exponents equal to 2.65 and 1.88, respectively. This means that the structure of the hypothetical transmission network for the MSM and heterosexual populations is heterogeneous, consisting of a majority of ‘peripheral nodes’ that have only a few sexual interactions and a minority of ‘hub nodes’ that have many sexual interactions. This finding is in line with the results obtained from analysis of the degree distribution of HIV transmission networks for the MSM population in the UK [23].
Interestingly, we uncover a positive correlation between the duration of the untreated infection period and the outdegree of the nodes in the network. This important finding may be explained by the fact that untreated individuals have higher viral loads and are therefore more infectious; moreover, not being on therapy is generally associated with a higher probability of not being diagnosed or of not complying with treatment and prevention messages conveyed by health care providers. This finding underscores the importance of case finding, early diagnosis and anticipated antiretroviral treatment as tools to prevent HIV-1 transmission and spread [40], [41].
The delay between the median estimated seroconversion and the start of genotyping may have caused the older half of infections to be a biased sample, as in the pre-HAART (highly active antiretroviral therapy) era only the slow progressors survived to be genotyped later. To investigate this effect, we performed the analysis on a subset of recent infections, considering only instances with a first positive test after calendar year 1998. There were 202 patients with a recent infection in the data, of whom 79 were MSM, 99 were heterosexual, and 24 were IDU. The correlation between the untreated infection period and the outdegree of nodes in the contact network still holds (Figure S8). However, the degree distributions of the transmission network did not pass the statistical test for fit to a power law. The 202 recent infections in our current dataset constitute a relatively small sample. Performing the analysis on recent infections is worthwhile but requires access to recently collected data, which we will consider in future studies.
Superspreaders are highly infectious individuals with a high viral load and a high rate of partner change [36], [42]. Identifying and controlling these superspreaders is crucial for stopping the spread of disease in a population [43], [44]. The factors associated with superspreaders highlighted in the Results section could help to achieve this goal. The correlation presented in this paper also suggests an association between hubs in the network (superspreaders) and not being on antiretroviral treatment for longer periods. The stages of infection between seroconversion, detection of the infection, and initiation of therapy are crucial in driving transmission epidemics. Individuals who do not test regularly and have risky sexual behaviour can more easily become hubs or superspreaders, along with those who do not initiate therapy early after the first positive test and do not change at-risk behaviours. The fact that, in this study, the network hubs were those with a longer untreated period confirms this hypothesis. Until recently, the initiation of antiretroviral treatment has not been decided by a transmission prevention policy, but rather by considering the patient's immunological condition [HIV/AIDS treatment 2011 guidelines: http://www.aidsinfo.nih.gov/contentfiles/adultandadolescentgl.pdf]. Our observation, along with data presented from recent clinical studies [40], [45], strongly suggests that early treatment should be considered in order to prevent transmission, although the cost-benefit of such a strategy must be further assessed in different populations and epidemiological scenarios.
The transmission of HIV drug resistance is another important clinical and epidemiological concern, since it induces treatment failure. Approximately 10% of newly diagnosed patients with HIV-1 infection in Europe are infected with a drug-resistant virus [46], [47]. Therefore, there is an urgent need for prevention strategies to block the transmission of drug-resistant virus. Characterisation of the HIV transmission networks proposed in this paper is a first step that can facilitate investigations into the transmission of viral drug resistance.
In this study we have limited ourselves to transmission within the three main risk groups, omitting transmission between risk groups, which is also observed in the phylogenetic analysis [17]. The reason is that we had no access to reliable social and behavioral data on transmission between risk groups; we will consider extending the current study in that direction once the required data become available.
We believe that the new approach presented here for inferring transmission networks can have important repercussions for the design of disease-control interventions, not only for HIV but potentially for a wide range of viruses and emerging pathogens.
Materials and Methods
In this study, we combined information from both genetic (derived from HIV-1 RNA sequences) and epidemiological scales to characterize a transmission network of the HIV-1 epidemic in central Italy. The study population included HIV-1-infected patients, with viral genotyping between 1997 and 2009, enrolled and followed up at the Clinic of Infectious Diseases of the Catholic University of the Sacred Heart in Rome, Italy. The inclusion criterion was at least one viral genotype sequence per patient; multiple observations were allowed for patients with more than one viral genotype available. We applied a novel filter-reduction method to infer a network of HIV-1 sequences based on the corresponding patients' epidemiological information, obtaining a potential contact network. The method is based on real patient data and makes no prior assumptions about the network structure. To characterize the transmission network of HIV-1, the intersection of the contact network with a genetic network based on a genetic distance matrix was computed.
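The intersection step can be sketched as follows: an edge of the contact network is retained only if the two sequences are also linked in the genetic network, that is, if their genetic distance lies below a threshold. This is a minimal sketch; the function name and the toy data are hypothetical, and the default threshold of 0.04 is taken from the description of the genetic network in the Supporting Information.

```python
def transmission_edges(contact_edges, distance, threshold=0.04):
    """Keep contact edges whose endpoints are genetically closer than the threshold.

    contact_edges: iterable of (source, target) pairs from the contact network.
    distance: dict mapping frozenset({a, b}) to the genetic distance of a and b.
    """
    kept = []
    for u, v in contact_edges:
        d = distance.get(frozenset((u, v)))
        if d is not None and d < threshold:
            kept.append((u, v))
    return kept

# Toy example with made-up distances.
contact = [(1, 2), (2, 1), (1, 3)]
dist = {frozenset((1, 2)): 0.01, frozenset((1, 3)): 0.20}
network = transmission_edges(contact, dist)
```

Pairs with no recorded distance are dropped here, which is a design choice of this sketch rather than a documented rule.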
The Data
HIV-1 RNA sequences from a region-wide cohort study of HIV-1-infected people in Rome and the Lazio region, Italy, were used [the database is part of three national HIV data cohorts in Italy: ARCA (www.hivarca.net), Icona (http://www.fondazioneicona.org), and Master (http://www.mastercohort.it)]. The viral sequence information encompassed the HIV pol gene region, covering the whole protease and most of the reverse transcriptase gene (at least amino acids 1–250). Sequence data were annotated with the corresponding patient's demographic and treatment information, including: sequence id (numeric), viral subtype, sequence calendar year (numeric), patient's gender (male/female), age (numeric), mode of HIV transmission (MSM, heterosexual, IDU, blood products, other/unknown), country of origin (Italian/non-Italian/unknown), ART status (ART-experienced/ART-naive), seroconversion year (median time between the last HIV-1 negative test date and the first HIV-1 positive test date), calendar year of first HIV positive test and of first available antiretroviral therapy (numeric), plasma HIV-RNA load (numeric) at viral sequencing time, and presence of resistance mutations for nucleoside/nucleotide and non-nucleoside reverse transcriptase inhibitors and protease inhibitors in the HIV-1 sequence (binary). Members of the unknown/other risk group were excluded from the analysis. When the last negative test date was missing, we estimated the seroconversion date as the first positive test date minus one year, which is the average time difference between the estimated seroconversion date and the first positive test in the data. For a number of patients in the dataset, multiple sequences were recorded at different time points, but we considered only the earliest sequence per patient for the social/epidemiological analysis.
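The seroconversion estimate described above can be written as a small helper. This is a sketch under stated assumptions: dates are represented as calendar years, the "median time" between two test dates is taken to be their midpoint, and the function and parameter names are hypothetical.

```python
def estimate_seroconversion(first_positive, last_negative=None, default_gap=1.0):
    """Estimate the seroconversion year from HIV test dates (calendar years).

    If the last negative test is known, return the midpoint between it and the
    first positive test; otherwise subtract the average gap of one year.
    """
    if last_negative is None:
        return first_positive - default_gap
    return (last_negative + first_positive) / 2.0

with_negative = estimate_seroconversion(2000, last_negative=1996)
without_negative = estimate_seroconversion(2000)
```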
The sequence data were used for phylogenetic analysis and subsequent inference of transmission clusters, while the annotated demographic and treatment information was used for social network construction. Patient characteristics are summarised in Table 8.
Table 8. The statistics of patient characteristics (total n = 655, subtype B patients, excluding entries with unknown risk group). doi:10.1371/journal.pone.0046156.t008
Characteristic | Data statistics | Number of unknown/missing data entries
Risk group | 22.8% MSM (n = 176); 33.0% heterosexual (n = 255); 28.0% IDU (n = 217); 0.9% blood products (n = 7) |
Gender | 65% male (n = 426); 35.0% female (n = 229) |
Country of origin | 84.0% Italian (n = 553); 10.4% non-Italian (n = 68) | 5.2% unknown (n = 34)
Antiretroviral therapy | 19.3% therapy-naïve (n = 127); 80.7% therapy-experienced (n = 528) |
Characteristic | Median (IQR) | Number of unknown/missing data entries
Age | 48 (43–53) years |
Estimated seroconversion date | 1996 (1993–2000) calendar year | 79.0% unknown (n = 517)
Last negative test date | 1995 (1991–1999) calendar year | 78% unknown (n = 515)
First available positive test date | 1995 (1991–2000) calendar year |
Viral genotyping date | 2004 (2001–2007) calendar year | 0.4% unknown (n = 3)
First available therapy date | 1998 (1995–2003) calendar year |
Plasma viral load (at the time of viral genotyping) | 4.1 log10 HIV RNA copies/ml (3.5–4.7) | 0.4% unknown (n = 3)
Time from estimated seroconversion date to the first therapy date | 3 (1.25–5) years | 79.0% unknown (n = 517)
Time from estimated seroconversion date to the first viral sequence date | 8 (4–11) years | 79.0% unknown (n = 517)
Phylogenetic analysis
HIV-1 sequences matching the inclusion criteria were aligned using the MUSCLE software [48], and the resulting multiple alignments were edited to remove drug-resistance-associated mutations [IAS-USA list 2010 (http://www.iasusa.org/pub/topics/2010/issue5/156.pdf)] that can lead to a convergent evolution bias in the phylogenetic tree estimation. A phylogenetic tree was then estimated using the maximum likelihood FastTree software [49], assessing node reliability via the built-in Shimodaira-Hasegawa test. Transmission clusters were extracted from the phylogenetic tree using the PhyloPart Java application [17]. PhyloPart uses a depth-first algorithm to extract a crisp partition (i.e. clustering) from an input phylogenetic tree, constraining its search on the comparison between subtree (i.e. potential cluster) and whole-tree patristic distance distributions, plus additional ancillary topological criteria. A cluster is found when the subtree is highly (>90%) supported by bootstrap (or posterior probability or another statistical test), when at least two distinct patients are in the subtree, and when the median patristic distance is below a percentile threshold of the whole-tree distance distribution. If the depth-first search reaches a leaf node without finding any cluster, then the instance is classified as a singleton. Additionally, a genetic distance matrix was calculated with the MEGA software using the LogDet function [50].
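The cluster-extraction rule described above can be illustrated with the following simplified sketch. It is not the PhyloPart implementation: the `Node` class, the function names and the toy tree are hypothetical, the ancillary topological criteria are omitted, and pairwise patristic distances are recomputed at every level for brevity rather than efficiency.

```python
from dataclasses import dataclass, field
from statistics import median
from typing import Optional

@dataclass
class Node:
    length: float = 0.0             # branch length to the parent
    support: float = 0.0            # reliability of the clade rooted here (0..1)
    patient: Optional[str] = None   # patient id, set for leaves only
    children: list = field(default_factory=list)

def leaves_and_pairs(node):
    """Return (leaf depths below node, pairwise patristic distances in the subtree)."""
    if not node.children:
        return [(node.patient, 0.0)], []
    depths, pairs = [], []
    for child in node.children:
        c_depths, c_pairs = leaves_and_pairs(child)
        c_depths = [(p, d + child.length) for p, d in c_depths]
        pairs += c_pairs
        # Leaves in different child subtrees are joined through this node.
        pairs += [d1 + d2 for _, d1 in depths for _, d2 in c_depths]
        depths += c_depths
    return depths, pairs

def whole_tree_threshold(root, q):
    """Distance at (roughly) the q-th quantile of all pairwise patristic distances."""
    _, pairs = leaves_and_pairs(root)
    pairs.sort()
    return pairs[min(len(pairs) - 1, int(q * len(pairs)))]

def extract_clusters(node, dist_threshold, min_support=0.9):
    """Depth-first search returning (clusters, singletons)."""
    if not node.children:
        return [], [node.patient]
    depths, pairs = leaves_and_pairs(node)
    patients = {p for p, _ in depths}
    if (node.support > min_support and len(patients) >= 2
            and median(pairs) < dist_threshold):
        return [sorted(patients)], []
    clusters, singletons = [], []
    for child in node.children:
        c, s = extract_clusters(child, dist_threshold, min_support)
        clusters += c
        singletons += s
    return clusters, singletons

# Toy tree: two closely related leaves in a well-supported clade plus one outlier.
clade = Node(length=0.5, support=0.95,
             children=[Node(0.01, patient="p1"), Node(0.01, patient="p2")])
root = Node(children=[clade, Node(0.5, patient="p3")])
clusters, singletons = extract_clusters(root, dist_threshold=0.1)
```

In practice `dist_threshold` would be derived from the whole-tree distribution, for example with `whole_tree_threshold(root, q)` for a chosen percentile `q`.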
Filtering process in the filterreduction method
The filter-reduction method was used to build a contact network from the dataset. Each node in the network represents a viral sequence isolate of HIV-1 obtained from a patient. Starting from an undirected, fully connected network of all patients, a set of social/sexual filters was applied. These filters considered the patients' demographic and treatment information. A direct connection between any two nodes that did not satisfy the epidemiological criteria was removed from the network (the percentage of edges removed from the network by each filter is presented in Table S3). The social filters used to build the contact network are described in more detail below:
Filter 1: The age filter indicates the maximum age range for an individual to be socially or sexually interactive with another individual. If the age difference between two patients exceeds the maximum age range, the direct connection between them is filtered. The maximum age difference is a free parameter and can be changed. We used a value of 10 years for this parameter based on a study on age-disparate and intergenerational sex in southern Africa [51]. We also performed a sensitivity analysis on this parameter by varying the value between 2 and 20 years (data shown in Table S4).
Filter 2: This filter considers the patient's gender (g) and risk group (r). Three rules are implemented. Rule a: the connection between patients from different risk groups is filtered, which creates three separate subnetworks corresponding to the major HIV transmission risk groups (MSM, heterosexual, and IDU). Rule b: for the heterosexual risk group, the connection between patients of the same gender is filtered. Rule c: patients in the “blood products” risk group are isolated from the population, as they were not infected through sexual relationships.
Filter 3: Observational studies suggest that the transmission probability of HIV-1 decreases by 80–98% after a patient starts treatment [52], [53]. This is mainly due to the smaller amount of viral particles in the genital secretions and mucosa after treatment and to the behavioural changes in the patients' sexual and social habits when they become aware of their disease. Following this observation, we filtered connections to a patient A from any other patient whose therapy initiation date (t) predated patient A's estimated seroconversion date (s).
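The three filters above can be sketched as a single predicate applied to every ordered pair of patients. This is a minimal illustration under stated assumptions: the `Patient` fields, the risk-group codes and the function names are hypothetical, and candidate edges are treated as directed (source to target) so that Filter 3 can be applied per direction.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Patient:
    pid: int
    age: float
    gender: str                     # "M" or "F"
    risk: str                       # "MSM", "HET", "IDU" or "BLOOD"
    seroconversion: float           # estimated seroconversion year (s)
    therapy_start: Optional[float]  # first therapy year (t); None if untreated

def may_transmit(src, dst, max_age_gap=10):
    """True if the directed edge src -> dst survives all three filters."""
    if abs(src.age - dst.age) > max_age_gap:             # Filter 1: age difference
        return False
    if src.risk != dst.risk:                             # Filter 2, rule a
        return False
    if src.risk == "HET" and src.gender == dst.gender:   # Filter 2, rule b
        return False
    if src.risk == "BLOOD":                              # Filter 2, rule c
        return False
    # Filter 3: the source started therapy before the target seroconverted.
    if src.therapy_start is not None and src.therapy_start < dst.seroconversion:
        return False
    return True

def contact_network(patients, max_age_gap=10):
    """All directed edges that remain after filtering the fully connected network."""
    return [(a.pid, b.pid) for a in patients for b in patients
            if a.pid != b.pid and may_transmit(a, b, max_age_gap)]

# Toy patients (made-up values).
patients = [
    Patient(1, 30, "M", "MSM", 1995, 1996),
    Patient(2, 35, "M", "MSM", 1998, None),
    Patient(3, 50, "M", "MSM", 1990, None),
]
edges = contact_network(patients)
```

In the toy example only the edge from patient 2 to patient 1 remains: patient 1 started therapy before patient 2 seroconverted, and patient 3 is outside the 10-year age range of both.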
Network visualization
The network visualizations in this article were produced using an in-house interactive visualization tool called “Twilight”, which is based on the igraph software package for complex network research [54]. The layout for all graphs was produced using an implementation of the Fruchterman-Reingold algorithm provided by igraph [55]. A demo of the network visualization is shown in Video S1, and more information on Twilight can be found at http://uva.computationalscience.nl.
Supporting Information
Figure S1. Communities in the MSM and heterosexual populations. Two main communities (green and blue) identified in the MSM and heterosexual populations using community structure detecting methods based on the leading eigenvector of the community matrix. The red edges connect different communities. (TIF)
Figure S2. Prevalence of mode of transmission groups stratified by calendar year in the study population. (TIF)
Figure S3. Visualization of the genetic network. The genetic network is built based on the genetic distance matrix. There is a link between every two patients in the network if their genetic distance is smaller than the threshold value of 0.04 ns/s. Patients are coloured based on their corresponding risk group: MSM (yellow), heterosexual (red), IDU (green) and blood products (cyan). (TIF)
Figure S4. Degree distributions of the genetic network. Cumulative total (black), in (blue), and outdegree (pink) distributions of the genetic network plotted in log-log scale for the MSM, heterosexual, IDU and all risk groups. (TIF)
Figure S5. Phylogenetic tree and genetic clusters. Phylogenetic tree with the leaves colored by cluster Id (nodes residing in one genetic cluster have the same cluster Id). The colors have been generated by dividing the RGB spectrum into specific intervals, corresponding to the number of distinct clusters. The red leaves scattered through the whole tree are “singletons” (i.e. unclustered isolates). (TIFF)
Figure S6. Genetic clusters size distribution. Genetic clusters extracted from the phylogenetic tree analysis. A total of 61 clusters (from size 2 to 52) were identified and 39% of all patients were included in these clusters. (TIF)
Figure S7. Univariable regression analysis of factors associated with superspreaders. Plots of numerical factors (age, viral load, UIP and indegree) versus the outdegree of nodes in the MSM, heterosexual, IDU and all risk groups. The correlation coefficients depicted on the graphs show the strength of a linear relationship between independent factors with respect to superspreaders. (TIF)
Figure S8. Untreated infection period (UIP) versus outdegree of recent infections. UIP vs. the outgoing degree of nodes in the MSM, heterosexual, IDU and all risk groups populations, for recent infections in the dataset (instances with first positive test after 1998 calendar year). The Pearson's correlation coefficients, 95% confidence intervals and p-values are depicted on each graph. (TIF)
Table S1. Correlation between the UIP and outdegree of the nodes by removing each filter from the filtering process in network construction. None implies that all filters are applied and none is removed from the filtering process. (DOC)
Table S2. Fraction of removed edges from the genetic network using different genetic thresholds. Each threshold value corresponds to a percentile of the overall distance distribution measured through the phylogenetic tree. (DOC)
Table S3. Percentage of edges filtered from the network by applying each different filter and all filters. (DOC)
Table S4. Sensitivity analysis on the “maximum age difference” parameter. (DOC)
Genetic distance matrix. Excel file of the measured genetic distance between every two viral sequences in the Italian patient dataset. (CSV)
Video S1. Appearance of risk group clusters in a contact network. The video shows the construction of a contact network and appearance of three clusters corresponding to the three major HIV risk groups (MSM, heterosexual, IDU). (RAR)
References
1. Ferguson N (2007) Capturing human behaviour.
2. Dodd PJ, Garnett GP, Hallett TB (2010) Examining the promise of HIV elimination by ‘test and treat’ in hyperendemic settings.
3. Walensky RP, Paltiel AD, Losina E, Morris BL, Scott CA, et al. (2010) Test and treat DC: forecasting the impact of a comprehensive HIV strategy in Washington DC.
4. Sorensen SW, Sansom SL, Brooks JT, Marks G, Begier EM, et al. (2012) A mathematical model of comprehensive test-and-treat services and HIV incidence among men who have sex with men in the United States.
5. Smith RJ, Okano JT, Kahn JS, Bodine EN, Blower S (2010) Evolutionary dynamics of complex networks of HIV drug-resistant strains: the case of San Francisco.
6. Mei S, Quax R, Van De Vijver D, Zhu Y, Sloot PMA (2011) Increasing risk behaviour can outweigh the benefits of antiretroviral drug treatment on the HIV incidence among men-having-sex-with-men in Amsterdam.
7. Verdasca J, Da Gama MMT, Nunes A, Bernardino NR, Pacheco JM, et al. (2005) Recurrent epidemics in small world networks.
8. Sloot PMA, Ivanov SV, Boukhanovsky AV, Van De Vijver D, Boucher CAB (2008) Stochastic simulation of HIV population dynamics through complex network modelling.
9. Schneeberger A, Mercer CH, Gregson SA, Ferguson NM, Nyamukapa CA, et al. (2004) Scale-free networks and sexually transmitted diseases: a description of observed patterns of sexual contacts in Britain and Zimbabwe.
10. Newman MEJ (2005) Power laws, Pareto distributions and Zipf's law.
11. Clauset A, Shalizi CR, Newman MEJ (2007) Power-law distributions in empirical data.
12. Lemey P, Rambaut A, Pybus OG (2006) HIV evolutionary dynamics within and among hosts.
13. Brenner BG, Roger M, Routy JP, Moisi D, Ntemgwa M, et al. (2007) Quebec Primary HIV Infection Study Group, High rates of forward transmission events after acute/early HIV-1 infection.
14. Felsenstein J (2004) Inferring Phylogenies.
15. Steel M (2010) The Phylogenetic Handbook: A Practical Approach to Phylogenetic Analysis and Hypothesis Testing. Edited by Lemey P, Salemi M, Vandamme AM. Biometrics, vol. 66, no. 1, pp. 324–325 [second edition].
16. Lewis F, Hughes GJ, Rambaut A, Pozniak A, Leigh Brown AJ (2008) Episodic Sexual Transmission of HIV Revealed by Molecular Phylodynamics.
17. Prosperi MCF, Ciccozzi M, Fanti I, Saladini F, Pecorari M, et al. (2011) A novel methodology for large-scale phylogeny partition.
18. Lemey P, Rambaut A, Drummond AJ, Suchard MA (2009) Bayesian Phylogeography Finds Its Roots.
19. Drummond AJ, Rambaut A (2009) Bayesian evolutionary analysis by sampling trees.
20. Huson DH, Rupp R, Scornavacca C (2010) Phylogenetic Networks. Cambridge University Press, UK.
21. Hue S, Clewley J, Cane P, Pillay D (2004) HIV-1 pol gene variation is sufficient for reconstruction of transmissions in the era of antiretroviral therapy.
22. Brown A, Gifford RJ, Clewley JP, Kucherer C, Masquelier B, et al. (2009) Phylogenetic reconstruction of transmission events from individuals with acute HIV infection: toward more-rigorous epidemiological definitions.
23. Brown AL, Lycett S, Weinert L, Hughes G, Fearnhill E, et al. (2010) Analysis of the Degree Distribution of HIV Transmission Networks Inferred from Viral Sequence Data. 17th Conf Retrov Opportun Infect.
24. DeGruttola V, Schooley RT (2011) Antiretroviral therapy as prevention: linking the mainframe to Main Street.
25. Thulasiraman K, Swamy MNS (1992) Graphs: Theory and Algorithms. p. 118.
26. Fleischer NL, Diez Roux AV (2008) Using directed acyclic graphs to guide analyses of neighbourhood health effects: an introduction.
27. Oakes JM, Kaufman JS, Glymour MM (2006) Using causal diagrams to understand common problems in social epidemiology. Method Soc Epidemiol. San Francisco, CA: Jossey-Bass, pp. 393–428.
28. Barcherini S, Cantoni M, Grossi P, Verdecchia A (1999) Reconstruction of human immunodeficiency virus (HIV) subepidemics in Italian regions.
29. Gomez-Gardenes J, Latora V, Moreno Y, Profumo E (2008) Spreading of sexually transmitted diseases in heterosexual populations.
30. Newman MEJ (2006) Finding community structure in networks using the eigenvectors of matrices.
31. Phillips KD (1992) Protease inhibitors: a new weapon and a new strategy against HIV.
32. Rezza G, Nicolosi A, Zaccarelli M, Sagliocca L, Nespoli M, et al. (1994) Understanding the dynamics of the HIV epidemic among Italian intravenous drug users: a cross-sectional versus a longitudinal approach.
33. Brancato G, Pezzotti P, Rapiti E, Perucci CA, et al. (1997) Multiple imputation method for estimating incidence of HIV infection. The Multicenter Prospective HIV Study.
34. Giuliani GRM, Di Carlo A, Palamara G, Dorrucci M, Latini A, et al. (2005) Increased HIV incidence among men who have sex with men in Rome.
35. Hughes GJ, Fearnhill E, Dunn D, Lycett SJ, Rambaut A, et al. (2009) Molecular Phylodynamics of the Heterosexual HIV Epidemic in the United Kingdom.
36. Lloyd-Smith JO, Schreiber SJ, Kopp PE, Getz WM (2005) Superspreading and the effect of individual variation on disease emergence.
37. Liljeros F, Edling CR, Nunes Amaral LA, Stanley HE, Aberg Y (2001) The web of human sexual contacts.
38. Castellano C, Pastor-Satorras R (2012) Competing activation mechanisms in epidemics on networks. Scientific Reports 2, 371.
39. Kitsak M, Gallos L, Havlin S, Liljeros F, Muchnik L, Stanley H, Makse H (2010) Identification of influential spreaders in complex networks.
40. Cohen MS, Chen YQ, McCauley M, Gamble T, Hosseinipour MC (2011) Prevention of HIV-1 Infection with Early Antiretroviral Therapy.
41. Anglemyer A, Rutherford GW, Baggaley RC, Egger M, Siegfried N (2011) Antiretroviral therapy for prevention of HIV transmission in HIV-discordant couples.
42. Chen L, Jha P, Stirling B, Sgaier SK, Daid T, et al. (2007) Sexual Risk Factors for HIV Infection in Early and Advanced HIV Epidemics in Sub-Saharan Africa: Systematic Overview of 68 Epidemiological Studies.
43. Metzger VT, Lloyd-Smith JO, Weinberger LS (2011) Autonomous Targeting of Infectious Superspreaders Using Engineered Transmissible Therapies.
44. Pastor-Satorras R, Vespignani A (2002) Immunization of complex networks.
45. Granich RM, Gilks CF, Dye C, De Cock KM, Williams BG (2009) Universal voluntary HIV testing with immediate antiretroviral therapy as a strategy for elimination of HIV transmission: a mathematical model.
46. Vercauteren J, Wensing AM, Van De Vijver DA, Albert J, Balotta C, et al. (2009) Transmission of drug-resistant HIV-1 is stabilizing in Europe.
47. Van de Vijver DA, Wensing AMJ, Boucher CAB (2006) The Epidemiology of Transmission of Drug Resistant HIV-1.
48. Edgar RC (2004) MUSCLE: multiple sequence alignment with high accuracy and high throughput.
49. Price MN, Dehal PS, Arkin AP (2010) FastTree 2 – Approximately Maximum-Likelihood Trees for Large Alignments.
50. Tamura K, Dudley J, Nei M, Kumar S (2007) MEGA4: Molecular Evolutionary Genetics Analysis (MEGA) software version 4.0.
51. Leclerc-Madlala S (2008) Age-disparate and intergenerational sex in southern Africa: the dynamics of hypervulnerability.
52. Castilla J, Del Romero J, Hernando V, Marincovich B, García S, et al. (2005) Effectiveness of highly active antiretroviral therapy in reducing heterosexual transmission of HIV.
53. Attia S, Egger M, Müller M, Zwahlen M, Low N (2009) Sexual transmission of HIV according to viral load and antiretroviral therapy: systematic review and meta-analysis.
54. Csárdi G, Nepusz T (2006) The igraph software package for complex network research.
55. Fruchterman TMJ, Reingold EM (1991) Graph drawing by force-directed placement.