An HIV Epidemic Model Based on Viral Load Dynamics: Value in Assessing Empirical Trends in HIV Virulence and Community Viral Load

Trends in HIV virulence have been monitored since the start of the AIDS pandemic, as studying HIV virulence informs our understanding of HIV epidemiology and pathogenesis. Here, we model changes in HIV virulence as a strictly evolutionary process, using set point viral load (SPVL) as a proxy, to make inferences about empirical SPVL trends from longitudinal HIV cohorts. We develop an agent-based epidemic model based on HIV viral load dynamics. The model contains functions for viral load and transmission, SPVL and disease progression, viral load trajectories in multiple stages of infection, and the heritability of SPVL across transmissions. We find that HIV virulence evolves to an intermediate level that balances infectiousness with longer infected lifespans, resulting in an optimal SPVL∼4.75 log10 viral RNA copies/mL. Adaptive viral evolution may explain observed HIV virulence trends: our model produces SPVL trends with magnitudes that are broadly similar to empirical trends. With regard to variation among studies in empirical SPVL trends, results from our model suggest that variation may be explained by the specific epidemic context, e.g. the mean SPVL of the founding lineage or the age of the epidemic; or improvements in HIV screening and diagnosis that results in sampling biases. We also use our model to examine trends in community viral load, a population-level measure of HIV viral load that is thought to reflect a population's overall transmission potential. We find that community viral load evolves in association with SPVL, in the absence of prevention programs such as antiretroviral therapy, and that the mean community viral load is not necessarily a strong predictor of HIV incidence.


Introduction
Virulence can be defined as the severity of disease caused by a pathogen; the virulence of a pathogen may evolve within a host population as the rates of transmission and host mortality are balanced by natural selection. For HIV, virulence can be defined as the rate of disease progression in the absence of antiretroviral treatment. Understanding if HIV virulence has evolved will inform our understanding of HIV epidemiology and pathogenesis, as increases in HIV virulence would result in more rapid disease progression [1][2][3][4], the earlier initiation of antiretroviral therapy [5], and an increased per-contact transmission risk [6][7][8][9][10].
The estimation of epidemic trends in HIV virulence includes the measurement and analysis of proxy markers for HIV disease progression, with set point viral load (SPVL) the most prognostic single marker of the time to AIDS after HIV infection [1][2][3][4]. While exact definitions of SPVL vary, it is generally the HIV plasma RNA viral load after the resolution of acute infection, but within 6 months to two years after seroconversion (and prior to initiation of antiretroviral therapy). We previously completed a meta-analysis of 8 published studies of population-level trends in SPVL; our meta-analysis found a positive summary trend (0.013 log 10 copies/mL/year, P = 0.07) [11], consistent with increased virulence of HIV. However, the studies also showed large variation in SPVL trends (range: 20.013 to 0.035 log 10 copies/mL/year) [11].
The causes of this variability remain unexplained, but may include estimation methods (clinical or statistical) or differences among the HIV cohorts (and associated local epidemics) in which trends were studied. These cohorts contain primarily men of European descent with HIV subtype B infections. Our previous analysis of study parameters in these 8 cohorts (including transmission risk group frequency, sample size, length, calendar year time period, seroconversion lag, and sampling lag) showed that only seroconversion lag was associated with SPVL trend; i.e., shorter periods between the last negative and first positive HIV-1 antibody tests were correlated with increased SPVL trends. It followed that for the 6 of 8 studies in the meta-analysis that were prospective seroconverter cohorts, the summary SPVL trend was 0.018 log 10 copies/mL/year, 38% greater than the summary trend of 0.013 copies/mL/year for all eight cohorts (including both seroconverter and seroprevalent cohorts). However, this explanation for variation in SPVL trends is insufficient, as trends from seroconverter cohorts are still highly variable (range: 20.002 to 0.035 log 10 copies/mL/year) [11].
Positive trends in mean population SPVL (HIV virulence) may reflect the process of viral adaptive evolution in the human population; recent modeling studies [12,13] have proposed the existence of an evolutionarily optimal SPVL of 4.52 log 10 copies/ mL, defined as the viral load that balances transmission probability with infected lifespan. For example, low SPVL will result in lower infectivity but more total transmissions (due to greater life expectancy), whereas high SPVL will result in high infectivity but fewer total transmissions (due to decreased life expectancy). The proposed optimal SPVL of 4.52 log 10 was qualitatively consistent with mean SPVL levels found in the Amsterdam Cohort Study (4.36 log 10 copies/mL) [12] and the Zambian Transmission Study (4.74 log 10 copies/mL) [9]. Mean SPVLs in the studies included in our meta-analysis were also qualitatively similar to 4.52, albeit with a wider range (from 4.25 to 5.2 log 10 copies/mL). An even greater range in population viral loads was seen in a review of 57 studies, where medians ranged from 3.7 and 5.6 log 10 copies/mL and the overall median was 4.45 log 10 copies/mL [14]. These 57 studies were not studies of SPVL distributions, specifically; however, viral loads in the asymptomatic period are generally stable and likely to be close to SPVL [15,16].
We have developed a stochastic, agent-based HIV evolutionary and epidemic model based on the dynamics of HIV viral load. This model is unique for HIV epidemic models in that it allows for the viral virulence phenotype (set point viral load) to change over the course of an epidemic. The model contains known functions related to viral load trajectories in acute, chronic and disease stages, viral load and transmission probability, SPVL and disease progression rate, and the heritability of SPVL across transmission pairs. These components provide an evolutionary framework in which a balance is achieved between efficient transmission and slow disease progression. This type of evolutionary structure for HIV transmission potential was proposed by Fraser [12].
Our primary aims are to understand both the underlying causes of empirical SPVL trends and also the variation among populations/cohorts in the estimation of these trends (rather than variation in SPVL among individuals e.g., [15,17], or the relative contributions of viral genetic and environmental factors to SPVL variation [13]). To do this, we estimate the temporal changes in SPVL that can result from adaptive viral evolution in human populations, and we assess deviations from these trends related to virologic parameters such as initial mean SPVL, maximum percontact transmission rate, or epidemic stage. We also examine the effects of potential sampling biases due to improvements in the rates of HIV diagnosis, changes in cohort recruitment procedures, or earlier initiation of ART after diagnosis-biases that were not explicitly considered by the studies of empirical SPVL trends described in the meta-analysis.
A secondary aim is to understand trends in HIV community viral load (CVL). CVL is typically defined as the arithmetic or geometric mean (or median) of all reported individual viral loads (individuals diagnosed and sampled, with detectable viral load) in a specific community. CVL is considered to be a heuristic proxy for overall transmission potential (and level of HIV-associated health care) in a given community [18][19][20][21]. Thus, the ability to accurately assess trends in CVL is critical for prevention programs that promote reduced population viral load and transmission potential as a measure of effectiveness: CVL needs to be a robust and informative metric for comparisons through time, meaningfully related to HIV epidemiological parameters that are important to the public health community. However, recent work has described potential pitfalls in the relationship between CVL and transmission potential (e.g., issues of sampling bias, population context, and ecological fallacy) [22]. In our analysis, we focused on the potential evolutionary context of trends in CVL: there are similarities between the estimation of SPVL and CVL trends, and we hypothesize that understanding biases related to SPVL trends will inform our understanding of CVL trends. Overall, we expect our model will improve our understanding of HIV epidemics and the virologic metrics that are used to study them.

Materials and Methods
We have constructed a stochastic agent-based HIV epidemic model that simulates viral load dynamics within and between individuals. The model has four main components: 1) the distribution of set point viral load (SPVL) in a population of HIV-infected individuals linked via a sexual contact network (the population component) [12,14,23]; 2) the predictive relationship between HIV viral load and the per-contact viral transmission rate [6][7][8][9][10]; 3) the predictive relationship between SPVL and the rate of disease progression [1][2][3][4]16]; and 4) the partial heritability of SPVL across transmission pairs (the viral genotype plays a role in determining SPVL) [24][25][26][27]. These components were embedded within a simple model for a sexual contact network. The model includes fixed parameters (with estimates following from previous studies, when possible) and variable parameters (with unknown or uncertain estimates that we varied to evaluate their effects on epidemic and evolutionary output) ( Table 1). The underlying model was written in C with a front-end written in R.

Distribution of set point viral load within a population
This component includes HIV-uninfected people, HIV-infected people, and people who have died of AIDS. Each simulation starts with N total HIV-uninfected and infected individuals at time zero, with each infected individual (of n total HIV-infected individuals) provided a SPVL value following from a Gaussian distribution with user-defined mean and variance, with the variance parameter following published estimates [12,14]. Entry of new (uninfected) individuals into the population is at a constant input rate such that

Author Summary
Virulence can be defined as disease severity; virulence of a pathogen may evolve as the rates of host mortality and transmission are balanced. HIV virulence trends are estimated using set point viral load, a proxy for the rate of HIV disease progression. To assess the capacity for HIV virulence to evolve and to place published virulence trends in an evolutionary context, we developed an evolutionary model based on HIV viral load dynamics. Our model reveals that HIV virulence evolves to an optimal set point (,4.75 log10 copies/mL) similar to levels seen in natural populations. In comparing published trends to modelbased trends, we infer: a) published trends are consistent with HIV adaptation to the human population; and b) variation among published trends may be explained by epidemic context or by sampling biases resulting from improvements in HIV screening and diagnosis. We also assess trends in HIV community viral load, defined as the mean viral load of all reported viral loads in a community, and thought to reflect a population's transmission potential. We find that community viral load evolves in association with set point, in the absence of prevention programs like antiretroviral therapy, and does not necessarily predict HIV incidence.
the population, in absence of HIV infection, will stay at its initial value. Infected individuals may die of their HIV infections according to formulas given below. Further model details are given in Text S1.

Epidemic simulations
We ran epidemic simulations for 100 years, in discrete timesteps of one day. For each model run we tracked individual viral load trajectories, infected lifespan, sexual partnerships and transmission histories, and epidemic growth or decline, and we estimated population measures of AIDS mortality, SPVL trends and heritability. The mean, median, and variance of SPVL in the population could be calculated directly at any day in a simulated epidemic. Likewise, the mean, median, and variance of community viral load (CVL) could be calculated at any day using the viral load from each currently alive and infected individual. We conducted sensitivity analyses on each parameter from Table 1, but focused on: the rate of viral load increase in the asymptomatic period (s from Equation 1 in Text S1); the rate of disease progression (maximum time to AIDS after infection, D max from Equation 2 in Text S1); and the maximum daily rate of transmission in the asymptomatic period (B max in Equation 3 in Text S1).
To examine the potential for adaptive evolution of HIV virulence, we examined 10 replicate epidemic simulations for three different values of initial population mean SPVL (3.5, 4.5, and 5.5 log10 copies/mL). Trends in SPVL were calculated by regressing against the times of infection recorded for each individual (each individual had individual-specific SPVL and time of infection).

Null distributions of 20-year SPVL trends
We sought to place published SPVL trends in the context of results from our epidemic model. To do this, we created hypothetical null distributions of SPVL trends for epidemics with initial mean SPVL values equal to 3.5, 4.5, or 5.5 log 10 copies/ mL. For each of the 10 replicate runs for each initial mean SPVL, we randomly chose 100 separate 20-year time periods with replacement; we chose 20-year periods as most empirical estimates of SPVL trends cover 20-year periods. From each of these 20-year time periods we estimated a univariate linear regression of SPVL by calendar year of infection. This created a distribution of 1610 3 possible observed SPVL trends (100 20-year periods from 10 replicate model runs) for any given set of model parameters, where only the starting random number seed and the randomly chosen 20-year period were different between trend estimates.
The HIV epidemics of European and North American countries are younger than 100 years; the first introduction of HIV to the US is estimated to have occurred in 1969 [28] with introductions to Europe following. Empirical SPVL trends have been estimated most often using data from approximately 1985 to 2010 [29]. This may affect our choice of an appropriate Table 1. Parameters of the model and standard initial values.

Recreation of sampling biases
Two distinct types of sampling biases may potentially result from the fact that HIV viral loads in primary infection are higher in symptomatic individuals [30,31]: a trend in mean SPVL may result from improvements over time in referral or cohort recruitment practices [32], or improvements over time in diagnostic techniques and identification of new HIV infections [33]. For example, if rapid progressors are more readily identified and diagnosed as the epidemic progresses, and these individuals initiate ART before set point is measured, then fewer individuals with (relatively) high SPVL will be sampled as the epidemic ages. This is a bias caused by earlier diagnosis and ART initiation, and may lead to inferred trends of decreasing virulence. Alternatively, if increased diagnosis and recruitment of newly infected individuals is associated with improvements in referral or cohort recruitment, but does not lead to earlier ART initiation, biased sampling in the opposite direction may result. In this case, relatively more symptomatic individuals would be sampled, and trends of increasing virulence may be inferred. (We recognize that increased and earlier diagnosis not leading to earlier ART initiation is not consistent with current clinical practice; however, most published studies of HIV virulence trends included sampling years from the 1980s and 1990s.) We recreated the above types of biased sampling in simulated epidemics as follows: 1) we divided randomly chosen 20-year epidemic periods (used to estimate SPVL trends) into individuals infected in years 1-10 versus years 11-20; 2) from the latter subpopulation (years 11-20), we created subsamples that included individuals with SPVLs either greater or less than SPVL = 5.0 log 10 copies/mL (http://aidsinfo.nih.gov/guidelines/archive/ adult-and-adolescent-guidelines); 3) depending on the type of sampling bias to be recreated, we randomly selected a portion of individuals to be removed from either the ''SPVL.5.0'' or ''SPVL,5.0'' subsamples ( Figure S1).

Community viral load trends
For each day in a simulated epidemic, we calculated community viral load (CVL) as the mean and median viral load of all HIVinfected individuals who were currently in day 45 or greater of their infection (one half of the time to reach set point; with the length of acute infection set to 90 days in our standard parameter settings). This threshold is meant to mimic empirical estimates of CVL, where the majority of individuals in early acute infection are unsampled and do not contribute to CVL estimates.

Results
We developed a stochastic, agent-based, HIV evolutionary and epidemic model based on viral dynamics. Within this model, variation in SPVL across individuals is due to viral and other (host and environmental) factors. This provides an evolutionary framework based on viral transmission potential [12,13] that determines the population trends in virulence and results in an evolutionary balance between virulence (disease severity) and transmission, as SPVL governs the rate of disease progression (the length of the asymptomatic period) and the transmission rate (based on individual viral load, which follows from SPVL as infections progress). Our primary goal is to understand the underlying causes of empirical SPVL trends and also the variation among cohorts/populations in the estimates of these SPVL trends.

Model validation
For model runs of 100 years with standard parameter values (Table 1), we found the following: epidemic growth, starting from 500 HIV-infected individuals in an initial population size of 75,000 (infected and uninfected), varies across initial mean SPVL values. Runs with intermediate (4.5) and high (5.5) initial mean SPVLs resulted in faster rates of epidemic growth; these resulted in .35,000 HIV-infected individuals before year 20 ( Figure 1A). Kaplan-Meier survival curves stratified by quartiles of SPVL were consistent with published survival analyses from two separate cohorts [3,12] ( Figure S2). The estimated heritability of SPVL across transmission pairs were consistent with empirical estimates of SPVL heritability [24][25][26][27], and decreased over the course of simulated epidemics as variance in SPVL decreased ( Figure S3). Population variation in SPVL and VL were consistent with empirical estimates: a recent meta-analysis reported interquartile ranges (IQR) for 51 studies of population viral loads [14] with an average IQR of ,1.0 log 10 copies/mL; IQRs of SPVL for our model were 0.9, 0.86, and 0.84 log 10 copies/mL for new infections taking place in years 25, 50 and 75 of simulated epidemics, respectively ( Figure S4).

Frequency of transmissions from early HIV infection
For a standard epidemic run (Table 1), ,5% of all infections (over 100 year epidemic simulations) took place while the source partner was in ''early HIV infection'' (defined as 3 months, up to Fiebig stage V [34]), although this frequency was higher (,10%) in the early stages of simulated epidemics ( Figure S5). Changing the definition of ''early HIV infection'' to include the entire first year of infection increased the frequency of early infection transmissions to ,20% of all transmissions (and this frequency is higher, ,25%, if only simulated epidemic years 10 to 40 (,1980 to 2010) were considered) ( Figure S5).

Set point viral load evolves to an optimal level
We found that SPVL evolves toward a equilibrium value of approximately 4.75 log 10 copies/mL, regardless of the mean SPVL of the founding population of infected individuals (at time zero) ( Figure 1B). Variability across 10 replicate model runs (with only the random number seed changed) in SPVL trends was minimal ( Figure 1B). This SPVL is slightly higher than the 4.52 optimal SPVL predicted by Fraser [12] and Shirreff [13], as well as the 4.45 overall median of 57 studies reported by Korenromp [14].

Empirical trends are similar to model-based trends
We next compared empirical SPVL trends to results from our model by producing null distributions of 20-year linear SPVL trends. Figure 2A shows the different null distributions for model runs with founder mean SPVLs of 3.5, 4.5, and 5.5 log 10 copies/ mL, using all 100 years of simulated epidemics. Broadly, trends estimated from epidemics with initial mean SPVL = 3.5 were positive and trends from epidemics with initial mean SPVL = 4.5 or 5.5 were negative (Figure 2A). Interestingly, the distribution for epidemics with starting mean SPVL = 4.5 included mostly negative trends despite the optimal SPVL equal to ,4.75 copies/mL, due to transient increases in virulence early in epidemics, before the number of susceptible individuals begins to decrease ( Figure 1B) [13,35]. We compared the empirical SPVL trends to these null distributions, and the magnitudes of the empirical trends were similar to (and therefore consistent with) the model-based trends. The majority of these trends were located to the far right (positive) side of the distributions (SPVL trends.0.01 log10 copies/mL/year) ( Figure 2A). (95% confidence intervals for the empirical trends are overlaid on null distributions created from model years 0 to 100 in Figure S6A.)

Founding HIV lineages may have had low virulence
When we compared empirical SPVL trends to potentially more appropriate null distributions (produced by sampling only years 0 to 40, or years 10 to 40, of simulated epidemics), the empirical trends were even more similar to model-based trends. Specifically, the larger empirical trends (.0.01 log10 copies/mL/year) were consistent with the median trends from null distributions for simulated epidemics with initial mean SPVL = 3.5 ( Figure 2B). Thus, the larger empirical trends may be unbiased measures of adaptive viral evolution in populations where the initial founding viral lineages contained low virulence. The median SPVL trend for the null distribution produced from sampling from years 10 to 40 of epidemics with initial mean SPVL = 3.5 was 0.0073 log 10 copies/mL/year ( Figure 2B), compared to a median SPVL trend of 0.0011 log 10 copies/mL/year when sampling from years 0 to 100. (95% confidence intervals for the empirical trends are shown overlaid on null distributions created from model years 10-40 in Figure S6B.)

The affect of sampling biases on empirical SPVL trends
We simulated two types of sampling biases to begin in the latter half of 20-year time periods: 1) earlier ART initiation; and 2) increased rates of diagnosis (without associated earlier ART initiation). Hypothetically, these sampling biases may result in linear SPVL trends that are decreased and increased relative to true trends, respectively. Indeed, simulating these biases on model output resulted in incorrect estimates of the true evolutionary trends in SPVL (Figures 3A and 3B). Figure 3A shows incremental positive shifts in SPVL trend distributions that result from increased sampling biases (by increasing the proportion of the HIV-infected population that is be affected by, in this example, increased rates of diagnosis without associated earlier ART). For this specific example (which included only simulated epidemics with initial mean SPVL = 3.5), the medians of biased SPVL trends were significantly different from the median of the true trend distribution (unbiased median = 0.0011; 10% removed median = 0.0028; 50% removed median = 0.0129 log 10 copies/mL/ year). Similar shifted distributions, though in the opposite direction, were found for biases due to earlier ART initiation.
A more appropriate null distribution to assess sampling biases Figure 3A shows the effect of sampling biases on null distributions produced from years 0 to 100 of simulated epidemics. As noted above, a more appropriate null distribution may be a time period from simulated epidemics that reflects the expected sampling times of HIV subtype B epidemics; this can be years 10 to 40 of simulated epidemics, reflecting approximately years 1980 to 2010. When this narrowed time period is used to recreate potential sampling biases, the effects on SPVL trend distributions are distinct than those seen when using the full 100 years of simulated epidemics: the median trend values increase significantly with more increased sampling biases (unbiased median = 0.0073; 10% removed median = 0.0094; 50% removed median = 0.0208 log 10 copies/mL/year), but all distributions overlapped extensively ( Figure 3B). In effect, with this narrow null distribution it is more difficult to distinguish between adaptive viral evolution and sampling biases as possible explanations for the empirical SPVL trends.

SPVL trends at different epidemic stages (biases due to evolutionary context)
An assumption inherent to many studies of HIV virulence is that SPVL trends will be linear. This assumption is likely false, as epidemic growth or decline is affected by the availability of susceptible individuals (among other factors); epidemics may not experience constant linear growth, and thus evolutionary pressures on the virus may shift over the course of an epidemic. Furthermore, processes of HIV evolution, including natural selection and genetic drift, can be affected in complex ways by changes in the viral effective population size (measured in this context most simply as prevalence). We examined the distribution of SPVL slopes in progressing stages of simulated epidemics ( Figure 4A). For simulated epidemics with varying initial mean SPVLs (3.5, 4.5, and 5.5 log 10 copies/mL), the distribution of slopes changed over time, as more extreme slopes (further away from zero) occurred in the first 20 years of epidemics, and all distributions converged to near zero as the epidemic entered years 40 and greater. This suggests that if the empirical SPVL trends are due to adaptive viral evolution, then we can hypothesize that local/regional epidemics (represented by national HIV cohort populations) with increasing SPVL were founded by viruses of low virulence (SPVL,3.5).

Sensitivity to parameter values
We tested whether model output was sensitive to the parameters of the viral load functions. We initially focused on: 1) the viral load progression rate in chronic infection; 2) the maximum rate of progression to AIDS; and, 3) the maximum daily rate of transmission in the asymptomatic period. Variation in the rate of viral load increase and in the rate of disease progression had only minor effects on epidemic growth and the evolution of virulence ( Figures S7 and S8). Variation in the maximum transmission rate (B max ) had large effects on epidemic growth and the early pattern of evolution toward optimal SPVL ( Figure S9). With higher maximum transmission rates, the epidemic size increases more rapidly and virulence (SPVL) increases; these increases in SPVL were transient, however, as the supply of susceptible individuals soon declines and limits further epidemic growth. The evolutionarily optimal SPVL remains equivalent across different values of B max . This pattern of increasing virulence in the early stage of the epidemic followed by decreasing virulence as susceptible supply declines might be expected [35] and was also observed in the simulations of Shirreff [13].
Model simulations with B max following directly from Fraser [12] (B max = 0.001044 per day) resulted in epidemic growth that was slow relative to expectations from empirical HIV epidemic data. As B max in Fraser [12] and Shirreff [13] was the maximum transmission rate estimated for serodiscordant couples within HIV cohorts [9], and likely an underestimate of the transmission rate in the general population [12], we increased B max for our standard runs (e.g., Figures 1 to S7) to B max = 0.0025 per day. Increasing B max allowed more accurate recreation of HIV epidemic growth rates; this did not affect the optimal HIV SPVL, but variation in B max did affect the rate at which SPVL changes in the early epidemic, with higher B max associated with early increases to higher SPVL ( Figure S9). In addition to the viral load progression rate, the maximum rate of disease progression, and the maximum transmission rate, we examined the effects of variation in other model parameters on SPVL trends. However, the time to peak viremia, the peak viral load in acute infection, the length of acute infection, the viral load at AIDS, the heritability of SPVL at time zero, and the mutational variance of SPVL all had minimal impacts on optimal SPVL.

Community viral load trends
In our simulated epidemics the mean CVL, measured as the mean or median of log 10 -transformed VL of all infected (and sampled) individuals in a population at a given time, changed over time ( Figure 4B). (Mean and median CVL are nearly identical in our simulations.) Community viral load evolved in association with SPVL: CVL trends were influenced by the initial mean SPVL of the founding population, and annual trends in mean/median CVL were qualitatively similar to annual SPVL trends. Mean CVL was consistently lower than mean SPVL due to frailty bias (individuals with low SPVLs were included in CVL estimates more often than individuals with high SPVLs) ( Figure 4B). The variance around the mean CVL decreased over the first half of the model runs before stabilizing around 0.3 log 10 copies/mL. The distribution of all viral loads in our simulations was qualitatively similar to the expected distribution proposed by Miller [22] for a population sample of ''detectable viral loads from individuals not on treatment.' ' We assessed the hypothesis that CVL is a heuristic measure of a population's overall transmission potential, i.e., that mean CVL is positively associated with incidence [18][19][20]. First, for complete 100-year epidemic simulations, with initial mean SPVL values of 3.5, 4.5 and 5.5 log 10 copies/mL, we compared yearly values of mean CVL to annual incidence (infected/susceptible) for each year ( Figure 5A). Over this extended timescale, mean CVL and annual new infections were significantly correlated only for initial SPVL = 3.5 (Spearman's rho = 0.56, P-value = 1.86 2 09; for initial SPVL = 4.5, rho = 0.11, P-value = 0.293; for initial SPVL = 5.5, rho = 0.12; P-value = 0.254).
Next, we modified this model-based comparison of CVL and incidence in order to make comparisons more equivalent to the 10-year timescales of empirical observations of CVL [18][19][20]. We created a data set containing all 10-year time periods from the 100-year simulated epidemics, each created with a sliding window of length 10 years with increment of one year. For each of these 10-year periods, we estimated Spearman's rho and P-value between yearly CVL and new infections ( Figure 5B). A minority of 10-year time periods from each epidemic (each initial mean SPVL) contained significant (P,0.05) associations between yearly mean CVL and number of infections, and the direction of association (Spearman's rho) alternated between positive and negative over the course of the epidemic ( Figure 5B). These simulations were run in the absence of prevention programs such as antiretroviral therapy, yet suggest that mean CVL is not necessarily a strong predictor of HIV transmission potential.

Discussion
We developed a stochastic agent-based HIV evolutionary and epidemic model that allows for the virulence phenotype, defined by the set point viral load, to change over time. The model is based on viral dynamic functions of transmission, disease progression, and heritability, but retains standard HIV epidemic model output such as prevalence, incidence and the proportion of transmissions occurring during different stages of infection. With this model we addressed questions related to HIV virulence evolution, sampling biases, and epidemic context, and explored how viral dynamic parameters affect SPVL trends and epidemic growth. We also addressed questions about the relationship between community viral load and incidence, in order to assess the use of community viral load as a useful public health metric.

Set point viral load trends
Our model shows that HIV virulence, using set point viral load as a proxy, can adaptively evolve in a host population. This result is consistent with previous work [12,13], in both evolutionary processes (adaptation of HIV virulence to optimize transmission potential) and patterns (optimal SPVL ,4.75 log 10 copies/mL). It is also consistent with studies showing HIV adaptation to the human population in response to both cellular [36,37] and humoral [38,39] responses over the course of the epidemic.
To assess the published empirical trends in SPVL, we created hypothetical null distributions of 20-year linear trends in SPVL and placed the empirical trends within these distributions. Because local epidemics with published SPVL trends may have unique epidemic and evolutionary contexts, particularly the SPVL (virulence) of the founding viral strain, we created null distribu- Figure 3. The effect of sampling biases on the estimation of model-based SPVL trends. A. Comparison of distribution of 20-year linear SPVL trends estimated from unbiased (black lines, initial mean SPVL = 3.5; grey lines, initial mean SPVL = 4.5) and biased (dotted lines, multiple colors representing multiple sub-sampling levels) data sets. The underlying distributions are produced from years 0 to 100 from simulated epidemics. Removing subsets (%) of all individuals (a schematic representation of the biased sampling process is shown in Figure S1A) results in a distribution of linear trends with a median SPVL trend of greater magnitude than the unbiased trends. B. Comparison of distribution of 20-year linear SPVL trends estimated from unbiased and biased data sets, but with the underlying distributions produced from years 10 to 40 from simulated epidemics. doi:10.1371/journal.pcbi.1003673.g003 tions of SPVL trends for simulated epidemics with initial SPVL means of 3.5, 4.5 and 5.5 log 10 copies/mL. Notably, the simulated and empirical trends are within the same magnitude (between 20.02 to 0.03 log 10 copies/mL/year), which suggests that adaptive HIV evolution may explain observed trends in HIV virulence.
What explains the variation among empirical trends? The placement of the empirical SPVL trends spans the three null distributions; yet, five (out of eight) lie within the upper distribution (.0.01 log 10 copies/mL/year) of the simulated trends (Figures 2A).
If we narrow our null distribution to years 10 to 40 of simulated epidemics to better reflect the years 1980 to 2010 of European and North American subtype B epidemics, the empirical trends of greater magnitude appear more consistent with model-based trends; these empirical trends are consistent with these epidemics being founded by viral populations of low virulence (virulence less than the optimal level) ( Figure 2B). We know from the respective publications that 4 of these 5 empirical trends belong to HIV cohorts with initial mean or median SPVLs less than our modelpredicted optimal SPVL (,,4.75): 3.6 [40], 4.19 [41], 4.3 [42], and 4.4 [43] log 10 copies/mL (the first sampling period for these studies was most often a 2 to 5-year period starting in 1985; this would correspond to a time approximately between years 15 to 20 in our simulated epidemics). It is essential for future work to compare simulated adaptive evolution of SPVL to real epidemic data, using phylodynamic analysis to reconstruct epidemic histories overlaid with empirical and simulated SPVL trends.

Recreation of cohort-specific sampling biases
While our model results suggest that adaptive evolution may explain empirical trends, and that epidemic context may explain variation in empirical trends, we also assessed whether trends and variation among trends could be the result of cohort-specific sampling biases. To do so, we recreated two types of simple biases: improvements in diagnosis that result in increased sampling of high virulence individuals; or earlier ART initiations that result in decreased sampling of high virulence individuals. We found that the empirical SPVL trends are consistent with biased trends resulting from improved diagnosis of symptomatic cases. Whether this particular type of sampling bias may explain the empirical trends-as opposed to being explained by adaptive evolutionrequires study of specific cohort clinical and community practices. Diagnosis of symptomatic individuals without earlier ART initiation is not consistent with clinical practice; it is unclear what the strength of this bias could be over time in HIV cohorts, and it is likely that both of the potential biases recreated here do exist (to some relative degree) in every HIV cohort.
We can try to infer, using data from the meta-analysis of observed SPVL trends [11] and our model output, which potential type of sampling bias is stronger in the empirical data. The metaanalysis contained 8 studies of SPVL trends, 6 of which were prospective cohorts that estimated virulence trends using only SPVL data from individuals with an estimated date of HIV infection. These seroconverter cohorts are less vulnerable to sampling biases, because individuals enter the cohort uninfected. In the meta-analysis, the summary SPVL trend for all eight cohorts (including both seroconverter and seroprevalent cohorts) was 0.013 log 10 copies/mL/year, 38% lower than the summary trend of 0.018 seen for the six seroconverter cohorts. A discrepancy between trend estimates in this direction (decreasing SPVL trend) is consistent with a sampling bias caused by earlier ART initiation in the seroprevalent cohorts. (i.e. the lower SPVL trends in the seroprevalent cohorts could be explained by a sampling bias caused by earlier ART initiation in those populations). This hypothesis is consistent with decreasing clinical thresholds for ART initiation (http://aidsinfo.nih.gov/guidelines/archive/adultand-adolescent-guidelines).

Community viral load trends
The use of community viral load as a population-level metric of HIV transmission potential has been proposed [18][19][20][21]. To date few studies, empirical or modeling, have assessed this proposition thoroughly [22]; for CVL to be a useful tool for public health inferences, it must accurately and precisely reflect HIV epidemiological parameters of interest. There are similarities between the estimation of SPVL and CVL trends-we hypothesize that understanding the underlying causes or sampling biases related to SPVL trends can inform our understanding of CVL trends. With our model, we attempted to evaluate the possible affects of HIV epidemic and evolutionary context on trends in mean CVL. Our findings suggest the relationship between CVL and incidence is not straightforward, yet is strongly modulated by epidemic context, including: 1) the initial mean SPVL of an epidemic; and 2) the epidemic stage in which the CVL and incidence relationship is evaluated. As shown in Figure 5B, significant associations between CVL and incidence can be identified in simulated epidemics, but there are both positive and negative associations. In this scenario CVL is not a robust population-level metric of HIV transmission potential. However, our estimate of CVL does not include individuals on ART with depressed viral loads and thus does not perfectly coincide with real world applications. Nevertheless, our model shows that attempts to infer transmission potential from CVL can give highly misleading results, as CVL is influenced by both historical and evolutionary factors.

Caveats and future additions
The global HIV pandemic is one of multiple separate epidemics that can be stratified by viral genotype (subtypes or circulating recombinant forms) [44,45], and by human population (e.g., transmission risk group, geography) [46,47]. Our model is not designed to recapitulate the entire global pandemic, but rather local epidemics containing a single subtype. The disease progression function of our model is based on subtype B data; it is possible that the relationship between SPVL and disease progression is different across subtypes [48][49][50][51][52], but the disease progression function likely holds for different risk groups within local subtype B epidemics (e.g., heterosexual sex, men who have sex with men, or injecting drug use within subtype B epidemics). The transmission function of our model is based on serodiscordant heterosexual couples with counseling [9]; as discussed above, using B max from Fraser [12] that saturates at viral loads results in slower epidemic growth rates. Elevating this rate to perhaps better reflect transmission rates in the general population results in more realistic growth rates but a similar level of optimal virulence. Additionally, SPVL is known to vary among hosts due to host genetics (HLA type) [53]; our model does not distinguish among individuals in their susceptibility to infection or host effect on SPVL (which can influence transmission and disease progression). Nor does our model allow for variation within individuals in viral reproductive rate; i.e., in our model all viral lineages within a single person are assumed to contain the same viral genetic factors for viral reproduction and SPVL/virulence. When this is not the case, and virions within a host are allowed to vary in reproductive rate, it is theoretically possible that different evolutionary trends in relation to optimal SPVL may be seen [54].
Our model includes simple demographic and sexual mixing terms, which we believe are sufficient to address the issues explored in this paper. A notable result from our model epidemics was the relatively low frequency of transmissions from ''early HIV infection'' relative to other published estimates [55]. For our standard epidemic runs, ,10% to 25% of infections took place while the source partner was in ''early HIV infection'' (with ''early HIV infection defined as either the duration of acute infection or the first year of infection, respectively). Further work will clarify whether our low estimates suggest that behavioral or network parameters (rather than strictly viral dynamics, as included in our model) are the likely source of the high contribution of early HIV infection to onward transmission that is reported elsewhere. An additional possible cause of the high reported frequencies of transmission in early HIV infection are viral genetic factors associated with increased transmissibility in early infection [56][57][58][59]; our model does not provide viruses with stagespecific transmission probabilities.
We plan to extend our model and analysis by comparing virulence trends among populations with more complex sexual mixing patterns, and among populations with varying sample fractions at different stages of an ART treatment cascade. We hope this epidemic modeling approach based on viral dynamics will be a useful tool in the prediction or evaluation of potential outcomes of prevention programs. Figure S1 Recreation of simple SPVL sampling biases. The schematic illustrates two types of potential sampling biases, each reconstructed within a 20-year period (randomly-selected from a larger 100-year epidemic). Each 20-year period is divided into two 10-year periods, with the second 10-year period sub-sampled to recreate a sampling bias that may occur as an HIV epidemic progresses within a clinic, cohort, region or country. The blue region illustrates a sampling bias caused by improved diagnosis of symptomatic cases in the 2 nd 10year period (symptomatic cases are associated with higher viral loads; SPVLs.5.0 log 10 copies/mL). In this case, one will not sample a greater portion of individuals with low set point viral loads as the epidemic progresses, which we simulate by removing a portion of the individuals with SPVL,5.0 log 10 copies/mL in the 2 nd 10-year period (light blue circles). This would lead to an higher estimated rate of SPVL change over time. The green region illustrates a sampling bias caused by increased rates of earlier ART initiation to symptomatic individuals in the 2 nd 10-year period. In this case there are fewer people with higher SPVLs to sample, which we simulate by removing a portion of the individuals with SPVL.5.0 log 10 copies/mL in the 2 nd 10-year period (light green circles). This would lead to an lower estimated rate of SPVL change over time.  Figure S6 Confidence intervals of the empirical SPVL trends. A. Confidence intervals are placed on top of modelproduced null distributions produced using years 0 to 100 of the simulated epidemics. Shown are null distributions from multiple runs with initial population mean SPVLs of 3.5, 4.5 and 5.5 log 10 copies/mL (see Figure 2A). B. Confidence intervals are placed on top of model-produced null distributions produced using years 10 to 40 of the simulated epidemics. Shown are null distributions from multiple runs with initial population mean SPVLs of 3.5, 4.5 and 5.5 log 10 copies/mL (see Figure 2B).  Figure 5. Mean community viral load is not linearly or consistently associated with annual incidence. A. Plot of yearly estimates of mean community viral load versus annual incidence, for 100 years of a simulated epidemic. B. Distributions of P-values for Spearman correlations between mean community viral load and incidence, by year, for 10-year periods from the same 100 year epidemic, from a sliding window of 10 with one year increments. Shown are a plot of CVL and incidence for a 100-year simulated epidemic, Spearman correlation coefficients between CVL and incidence for each overlapping 10-year period, and P-value for each Spearman correlation test. Significant associations between CVL and incidence can be positive or negative, depending on epidemic context. doi:10.1371/journal.pcbi.1003673.g005 Figure S9 Increased viral transmission rates result in increased epidemic growth but similar set point dynamics. A. Infected individuals and B. Mean set point viral load over time. Variation in the annual rate of transmission (B max in Equation 3) significantly affects epidemic growth (due to changes in the number of susceptible individuals), but does not affect the evolutionarily optimal set point viral load (although it changes the shape of SPVL change).

(TIF)
Text S1 Additional detailed description of the model. (DOCX)