HIV Treatment as Prevention: Systematic Comparison of Mathematical Models of the Potential Impact of Antiretroviral Therapy on HIV Incidence in South Africa

Background Many mathematical models have investigated the impact of expanding access to antiretroviral therapy (ART) on new HIV infections. Comparing results and conclusions across models is challenging because models have addressed slightly different questions and have reported different outcome metrics. This study compares the predictions of several mathematical models simulating the same ART intervention programmes to determine the extent to which models agree about the epidemiological impact of expanded ART. Methods and Findings Twelve independent mathematical models evaluated a set of standardised ART intervention scenarios in South Africa and reported a common set of outputs. Intervention scenarios systematically varied the CD4 count threshold for treatment eligibility, access to treatment, and programme retention. For a scenario in which 80% of HIV-infected individuals start treatment on average 1 y after their CD4 count drops below 350 cells/µl and 85% remain on treatment after 3 y, the models projected that HIV incidence would be 35% to 54% lower 8 y after the introduction of ART, compared to a counterfactual scenario in which there is no ART. More variation existed in the estimated long-term (38 y) reductions in incidence. The impact of optimistic interventions including immediate ART initiation varied widely across models, maintaining substantial uncertainty about the theoretical prospect for elimination of HIV from the population using ART alone over the next four decades. The number of person-years of ART per infection averted over 8 y ranged between 5.8 and 18.7. Considering the actual scale-up of ART in South Africa, seven models estimated that current HIV incidence is 17% to 32% lower than it would have been in the absence of ART. Differences between model assumptions about CD4 decline and HIV transmissibility over the course of infection explained only a modest amount of the variation in model results. 
Conclusions Mathematical models evaluating the impact of ART vary substantially in structure, complexity, and parameter choices, but all suggest that ART, at high levels of access and with high adherence, has the potential to substantially reduce new HIV infections. There was broad agreement regarding the short-term epidemiologic impact of ambitious treatment scale-up, but more variation in longer term projections and in the efficiency with which treatment can reduce new infections. Differences between model predictions could not be explained by differences in model structure or parameterization that were hypothesized to affect intervention impact.


Introduction
There has recently been increasing interest in expanding provision of antiretroviral therapy (ART) as a tool for reducing the spread of HIV in generalised epidemics in sub-Saharan Africa [1][2][3][4][5]. As momentum gathers for "HIV treatment as prevention", there is an urgent need to understand how ART might contribute to averting HIV transmissions, in addition to its direct benefits in reducing morbidity and mortality amongst treated patients. Mathematical modelling has supplied critical insights to discussions about treatment as prevention by providing a framework for combining information about the relationship between an infected individual's viral load and HIV transmissibility [6,7], the reduction in a host's HIV viral load when on ART [8,9], and the population-level contact structure over which HIV is transmitted [10,11].
The idea of using medicines that suppress viral concentrations to reduce transmission of infection was posed almost as soon as the first HIV drugs were developed [12,13]. Early models of the impact of ART focused on the opposing effects of reduced transmissibility and extended survival on new HIV infections, and whether associated increases in sexual risk behaviour would negate the prevention benefits of ART [10,12][14][15][16][17][18][19][20][21][22][23]. Since then, longitudinal observational data and one randomized controlled trial have demonstrated substantial reductions in the risk of heterosexual HIV transmission when the infective partner is virally suppressed [24][25][26][27][28], and continued follow-up of individuals receiving ART has confirmed the durability of viral suppression [29], including in sub-Saharan Africa [30,31]. At the same time, there have been tremendous improvements in access to treatment in sub-Saharan Africa [32]. More recent modelling has shown more optimism about the potential for treatment to reduce new HIV infections in this region, with much work focused on the setting of South Africa, home to one in six people living with HIV globally [33].
Perhaps the most provocative of these modelling efforts has been the study by Granich and colleagues suggesting that a strategy involving annual testing and immediate treatment for all HIV-infected individuals, combined with other interventions, could eliminate HIV by the year 2050 [34]. Wagner and Blower implemented the same model but used different assumptions about treatment uptake amongst asymptomatic infected individuals that they characterised as being more realistic, and concluded that elimination would not be possible [35]. Kretzschmar et al. highlighted how choices in model structure affect the epidemic dynamics and intervention impact [36]. Dodd et al. showed that the potential for treatment to eliminate HIV depends on the patterns of sexual mixing in the population [11]. An age-structured model by Bacaër et al. found that elimination might be possible with less frequent testing than proposed by Granich et al., given recent epidemic trends and increases in condom usage [37]. Bendavid et al. used a microsimulation model to highlight that, in addition to increasing HIV testing, improving linkage to and retention in care are essential to achieving maximal benefits of test-and-treat interventions [38].
Other models have focused on the potential prevention benefits of providing treatment in line with current therapeutic guidelines. Eaton et al. estimated that 60 to 90 new infections could be averted for every 1,000 additional persons treated with CD4 cell count below 350 cells/µl (the current World Health Organization recommendation for when to start treatment [39]), depending on how well patients on treatment are retained in care [40]. The Goals model, used in the evaluation of the new UNAIDS Investment Framework, found that a US$46.5 billion incremental investment over the years 2011 to 2020, incorporating expanding access to ART, could avert 12.2 million new infections and 7.4 million deaths globally over that period [41]. Using a microsimulation model of the HIV epidemic in KwaZulu-Natal Province, Hontelez et al. found that expanding access to ART from those with CD4 cell count ≤200 cells/µl to those with ≤350 cells/µl required 28% more patients to receive treatment, but amounted to only a 7% increase in annual investment [42]. Cumulative net costs broke even after 16 y.
Models have also sought to understand the impact of past and current treatment policies; Johnson et al. used the ASSA2003 and STI-HIV Interaction models to assess the relative contributions of increased condom usage and ART scale-up to the declines in HIV incidence in South Africa up to 2008 [43]. Finally, other mathematical models have been used for short-term projections as a basis for power calculations for community-randomized trials of treatment as prevention [44].
Each of these models has predicted dramatic epidemiologic benefits of expanding access to ART, but models appear to diverge in their estimates of the possibility of eventually eliminating HIV using ART, the cost-effectiveness of increasing the CD4 threshold for treatment eligibility, and the benefits of immediate treatment compared to treatment based on the current World Health Organization eligibility guidelines. Directly comparing the models' predictions is challenging because each model has been applied to a slightly different setting, has used different assumptions regarding other interventions, has been used to answer different questions, and has reported different outcome metrics.
In this study we seek to understand the extent to which diverse mathematical models agree on the epidemiological impact of expanded access to ART by simulating the same set of intervention scenarios across the models and focusing on standardised outputs. The intervention scenarios are designed to be simple enough to be consistently implemented across different types of models in order to control several aspects of the treatment programme and isolate the effects of model structure, parameters, and assumptions about the underlying epidemic on estimates of intervention impact. The purpose of this study is not to make predictions about the impact of any particular intervention in any specific setting, but rather to better characterise the array of mathematical models being used to inform policy about treatment as prevention in hyperendemic settings such as South Africa.

Study Design
Literature and reports of meetings on related topics were reviewed in August 2011, and researchers who had previously developed mathematical models of the potential epidemiological impact of expanded access to ART, calibrated to the South African epidemic setting, were invited to participate in the model comparison exercise by simulating a standardised set of ART scale-up scenarios. Three aspects of the treatment programme were systematically varied: the CD4 threshold for treatment eligibility, access to treatment for those eligible, and the retention of patients on treatment. The timing of ART introduction and the rate at which individuals start treatment after becoming eligible were also standardised. The impact of an intervention was measured by comparing the number of new infections in the intervention scenario with that in a counterfactual epidemic simulation in which no ART is provided within the same model population. The counterfactual of no ART was chosen so that comparison between models would be independent of assumptions about the historic growth in ART uptake. As such, the results should not be interpreted as estimates of the future impact of treatment compared to current patterns of ART coverage, but can be generally taken as estimates of the overall net impact of treatment in a hypothetical scenario that assumes rapid ART scale-up and a homogeneous rate of ART initiation across all ART-eligible adults. Although different models may incidentally have been calibrated using the same data, no standardisation was imposed on the specific epidemiologic data used for model calibration or on the calibration procedure itself in this exercise.

Mathematical Models
Twelve groups accepted the invitation to participate in the model comparison exercise. The collection of models encompasses a wide range of model structures, mechanisms for representing HIV transmission and disease progression, overall levels of complexity, and detail in the characterisation of treatment programmes. Table 1 summarises the names, authors, and key references for each model, and compares aspects of model structure. Four of the models are agent-based microsimulation models (i.e., models that track the behaviour and infection status of individual people) and use random-number generators to simulate particular events such as a new partnership formation or transmission events. The remaining eight models are deterministic compartmental models that stratify the population into groups according to each individual's characteristics and HIV infection status and use differential or difference equations to track the rate of movement of individuals between these groups. One of the models, the BBH model, solves the differential equations analytically, while the others numerically evaluate the differential equations. Ten of the models explicitly simulate both sexes and heterosexual HIV transmission, and six of the models include some form of age structure, although the extent to which age affects the natural history of HIV, the risk of HIV acquisition, and the risk of HIV transmission varies amongst these. All of the models simulate the national HIV epidemic in South Africa except for the STDSIM model, which simulates the higher prevalence Hlabisa subdistrict of KwaZulu-Natal Province, South Africa. Box 1 gives further comparative description of the structures and parameterization of the mathematical models.
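To illustrate what a deterministic compartmental model advanced by difference equations looks like, the following is a minimal toy sketch. It is not any of the twelve participating models: the three compartments, the function name `simulate_new_infections`, and every parameter value are hypothetical and chosen only for illustration.

```python
def simulate_new_infections(years: float, art_start_rate: float,
                            dt: float = 0.01) -> float:
    """Return cumulative new infections (as a fraction of the initial
    population) from a toy three-compartment model."""
    # All parameter values below are hypothetical illustrations.
    beta = 0.30          # transmission rate per year, untreated infection
    art_efficacy = 0.92  # relative reduction in transmissibility on ART
    mu = 0.02            # background exit rate per year, all compartments
    delta = 0.10         # excess mortality per year, untreated infection
    entry = 0.02         # entry rate into the susceptible compartment

    s, i, t = 0.85, 0.15, 0.0  # susceptible, infected untreated, on ART
    cumulative = 0.0
    for _ in range(int(round(years / dt))):
        n = s + i + t
        # Force of infection: people on ART transmit at a reduced rate.
        force = beta * (i + (1.0 - art_efficacy) * t) / n
        new_inf = force * s
        ds = entry * n - new_inf - mu * s
        di = new_inf - (mu + delta + art_start_rate) * i
        dt_art = art_start_rate * i - mu * t
        # Forward-Euler difference-equation update.
        s += ds * dt
        i += di * dt
        t += dt_art * dt
        cumulative += new_inf * dt
    return cumulative


no_art = simulate_new_infections(8, art_start_rate=0.0)
with_art = simulate_new_infections(8, art_start_rate=0.5)
print(f"infections averted over 8 y: {100 * (no_art - with_art) / no_art:.1f}%")
```

The sketch runs the same population twice, once without ART and once with ART, which mirrors the counterfactual comparison used in this exercise. A microsimulation model would instead track individuals and draw random numbers for each event.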

Intervention Scenarios
Three different CD4 cell count thresholds for treatment eligibility were considered: CD4 count ≤200 cells/µl, CD4 count ≤350 cells/µl, and all HIV-infected individuals. In each eligibility scenario, treatment initiation was simulated under the assumption that all eligible individuals had equal access, without prioritisation for any subpopulations. It was further assumed that eligible individuals with access to the intervention would initiate ART at a constant rate after reaching eligibility, such that average time from eligibility to treatment initiation would be 1 y.
Treatment access was defined as the proportion of eligible individuals who eventually initiate treatment. For example, 60% access and eligibility at CD4 ≤350 cells/µl implies that 60% of individuals will initiate treatment, on average 1 y after their CD4 count drops below 350 cells/µl, while 40% will never access treatment. Seven levels of treatment access were evaluated: 50%, 60%, 70%, 80%, 90%, 95%, and 100%.
Retention was defined as the percentage of individuals remaining on treatment after 3 y, excluding from both the numerator and the denominator those who had died while on treatment. The levels of retention were 75%, 85%, 95%, and 100% (no dropout), with individuals dropping out from treatment at a constant rate such that the desired level of retention was achieved at the 3-y time point. The prognosis and future treatment options for individuals who dropped out from treatment were not standardised.
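The access, initiation, and retention definitions above can be converted into constant rates if waiting times are assumed to be exponentially distributed. The following is a minimal sketch under that assumption, not code from any participating model; the function names are hypothetical, and deaths are ignored for simplicity.

```python
import math


def initiation_rate(mean_delay_years: float = 1.0) -> float:
    """Constant ART initiation rate (per year) giving the stated mean delay."""
    return 1.0 / mean_delay_years


def dropout_rate(retention: float, horizon_years: float = 3.0) -> float:
    """Constant dropout rate (per year) such that a fraction `retention`
    of surviving patients is still on ART at `horizon_years`."""
    if not 0.0 < retention <= 1.0:
        raise ValueError("retention must be in (0, 1]")
    return math.log(1.0 / retention) / horizon_years


def fraction_initiated(access: float, years_since_eligible: float,
                       mean_delay_years: float = 1.0) -> float:
    """Fraction of an eligible cohort that has started ART by a given time.
    Only the `access` fraction ever starts; deaths are ignored."""
    rate = initiation_rate(mean_delay_years)
    return access * (1.0 - math.exp(-rate * years_since_eligible))


# Dropout rates implied by the four retention levels in this exercise.
for r in (0.75, 0.85, 0.95, 1.00):
    print(f"retention {r:.0%}: dropout rate {dropout_rate(r):.4f} per year")
```

With 100% retention the dropout rate is zero, and `fraction_initiated` approaches the access level as time since eligibility grows, matching the definition of access as the proportion who eventually initiate treatment.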

Intervention Scale-Up
ART was assumed to be introduced into the population from the beginning of year 2012, with no treatment provision prior to this (in contrast to the rapid scale-up of treatment that has actually occurred prior to 2012 in South Africa). Intervention scale-up was immediate: a fraction (corresponding to the specified level of ART access) of individuals already eligible for treatment at the start of the intervention period were assumed to initiate treatment at a constant rate from that point, along with individuals who became eligible for treatment after the start of the intervention period.

Output Metrics
The measures of intervention impact were the percentage reduction in HIV incidence rate among adults (aged 15-49 y) in the ART scenario versus the no-ART counterfactual, the cumulative number of person-years of ART provided since the introduction of ART, and the cumulative number of person-years of ART provided per infection averted as a measure of the "efficiency" with which ART prevents infections. The percentage reduction in incidence was defined by calculating the difference in the adult HIV incidence rate between the intervention and no-ART counterfactual in the same year and dividing this by the incidence rate in the counterfactual scenario. The number of person-years of ART provided per infection averted was calculated by dividing the cumulative number of person-years of ART by the difference between the number of new adult infections since year 2012 in the intervention and the counterfactual scenario. Each of these metrics was reported at the midpoints of the years 2020 and 2050. Most of the models included in this study were not designed with the intention of making realistic projections to year 2050, but these results were included to gain some insight into the long-term dynamics of the models.
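The two derived metrics defined above can be written as short functions. This is a sketch of the stated definitions only; the function names and the example numbers are hypothetical and are not outputs of any model in this study.

```python
def percent_reduction_in_incidence(incidence_art: float,
                                   incidence_no_art: float) -> float:
    """Percentage reduction in the adult incidence rate in the ART scenario
    relative to the no-ART counterfactual in the same year."""
    return 100.0 * (incidence_no_art - incidence_art) / incidence_no_art


def person_years_per_infection_averted(cumulative_art_person_years: float,
                                       infections_no_art: float,
                                       infections_art: float) -> float:
    """Cumulative person-years of ART divided by infections averted,
    with both infection counts accumulated since the start of ART."""
    averted = infections_no_art - infections_art
    if averted <= 0:
        raise ValueError("no infections averted; the ratio is undefined")
    return cumulative_art_person_years / averted


# Hypothetical illustration: incidence of 0.9 versus 1.5 per 100 person-years,
# and 12 million person-years of ART averting 1 million infections.
print(round(percent_reduction_in_incidence(0.9, 1.5), 1))
print(person_years_per_infection_averted(12_000_000, 3_000_000, 2_000_000))
```

A lower value of the second metric means that fewer person-years of treatment are needed to avert one infection, that is, higher efficiency.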
In addition to these measures of intervention impact, each model reported the HIV prevalence and HIV incidence rate amongst males and females aged 15-49 y for the no-treatment counterfactual simulation and the total size of the adult population (age 15 y and older). Each model also produced the proportion of the HIV-infected population in each CD4 count category (≤200, 200-350, and >350 cells/µl) and in early HIV infection in year 2012, and the proportion of HIV transmissions from individuals in each category.
The Eaton and STI-HIV Interaction models report posterior means and 95% credible intervals for model outcomes of interest (see Box 1). The Bendavid model completed simulations only for 50%, 80%, and 100% access, and 75%, 85%, and 100% retention scenarios, and only simulated to year 2040, so results for this model are reported for the year 2040 where other model results are reported for year 2050. The BBH model completed simulations only for the 85% and 100% retention scenarios. The Granich model did not simulate ART for the CD4 ≤200 cells/µl eligibility threshold, while the STI-HIV Interaction model did not simulate ART eligibility for all HIV-infected individuals. As a result of these models not completing all intervention scenarios and outputs, some analyses include only a subset of the models. To maximise comparability, the 40% reduction in transmission due to combination with other preventive interventions assumed by Granich and colleagues in [34] was not incorporated here.

Box 1. Comparative Description of Mathematical Models
This box elaborates on the comparison of aspects of the models' structure, assumptions, and parameterization presented in Table 1. Specific details about the structure and implementation of each of the models are available in the references included in Table 1 or from the HIV Modelling Consortium (http://www.hivmodelling.org/plos-medicinespecial-collection). Many of the models allow individuals to have different propensities for sexual risk behaviour. Each of the microsimulation models allows individuals to have both long-term (or marital) partnerships and short-term (or informal or casual) partnerships that are different in duration, and individuals have heterogeneous propensities to form short-term partnerships. In the STDSIM model a proportion of the population engages in commercial sex work partnerships; in the EMOD model a proportion can have transitory partnerships, a third partnership type that is shorter than a casual partnership. Among the microsimulation models, EMOD and STDSIM explicitly simulate the sexual partnership network, while the Bendavid and Synthesis Transmission models calculate the risk of acquiring HIV for an individual in a partnership by sampling the distribution of viral load across, respectively, the entire population and potential partners. The deterministic models assume that sexual contacts occur instantaneously. The BBH, Granich, and HIV Portfolio models assume that all individuals form new contacts at the same rate and mix homogeneously. The other deterministic models stratify the population into risk groups that form new contacts at different rates (Eaton, Fraser, and Goals: three groups; STI-HIV Interaction: two groups). The STI-HIV Interaction and Goals models additionally include commercial sex workers, and the Goals model includes transmission among men who have sex with men and injecting drug users.
The STI-HIV Interaction model separates both the low- and high-risk groups into those with short-term or long-term partnerships or both. The Eaton, Fraser, and STI-HIV Interaction models all include a degree of "assortative" mixing (preferential partnership formation with those in the same risk group), and all partnerships are formed in the same risk group in the Goals model, except for low-risk men and women who are married to high-risk partners. The CD4 HIV/ART model does not explicitly model sexual mixing but rather calculates the number of new HIV infections by multiplying the current number of HIV-infected adults by a fixed force of infection calculated from the Spectrum model projection for South Africa. All of the models except for the Granich model simulate different stages of HIV infection that affect the transmissibility of an individual, including a period of elevated infectiousness during the first few weeks of infection and increased transmission during later stage infection. Parameters governing the relative transmissibility during early infection are based principally on two sources: a meta-analysis of HIV transmission per coital act by Boily et al. [68], which estimated a 10-fold increase in transmission relative to asymptomatic infection (BBH, CD4 HIV/ART, Goals, and STI-HIV Interaction), or a reanalysis of data from Rakai, Uganda [70], by Hollingsworth et al. [69], which estimated a 26-fold increase (Eaton, EMOD, Fraser, and Synthesis Transmission). Relative transmissibility after the early stage is according to clinical stage (asymptomatic and AIDS: BBH, CD4 HIV/ART, EMOD, Goals; asymptomatic, pre-AIDS, and AIDS: STDSIM, STI-HIV Interaction) or CD4 count (Eaton, Fraser, and HIV Portfolio). The Bendavid and Synthesis Transmission models simulate the change in viral load for infected individuals and associate HIV transmission with this according to an empirically described relationship [6].
Many models assume an increased risk of male-to-female transmission compared to female-to-male transmission, and attenuation in female-to-male transmission due to male circumcision. The Goals, STDSIM, and Synthesis Transmission models include an increased risk of HIV transmission in the presence of other sexually transmitted infections. The models that simulate each individual's viral load (Bendavid and Synthesis Transmission) mechanistically relate reduction in transmission on treatment to the effect of ART on viral load, while the other models all assume a reduction in transmission of greater than 90% for individuals on ART. The Bendavid, Eaton, and Synthesis Transmission models simulate a period of a few months of incomplete viral suppression after ART initiation before the full reduction in infectiousness is achieved. These three models and EMOD include a return to higher infectiousness during treatment failure. The remaining models assume a fixed reduction in transmissibility as soon as treatment is started, until either death on ART or dropout from treatment. The Bendavid and Synthesis Transmission models simulate switching to second-line ART upon an immunologic (Bendavid) or virologic (Synthesis Transmission) failure event. The Synthesis Transmission model is the only model to explicitly simulate heterogeneous adherence to treatment between patients and the emergence and impact of resistance. The models vary in their assumptions about what happens to an individual after dropping out from treatment. The CD4 HIV/ART, Fraser, Goals, Granich, and HIV Portfolio models return individuals who drop out to an untreated state, allowing them to restart treatment in exactly the same manner as those that have never been treated, while the Bendavid, STDSIM, STI-HIV Interaction, and Synthesis Transmission models do not allow individuals to start treatment again in the implementation for this exercise.
Eaton allows individuals to restart treatment, but only once, and EMOD allows half of individuals to restart treatment after they once again satisfy the eligibility criterion. Eleven of the models simulate the South African national HIV epidemic, while the STDSIM model has been calibrated specifically to the higher prevalence Hlabisa subdistrict of KwaZulu-Natal Province, South Africa. Nine models were calibrated to reproduce the historical time series of HIV prevalence in South Africa, while the BBH, HIV Portfolio, and Bendavid models were initialized using the current epidemic state in the years 2009, 2011, and 2012, respectively, and simulated forward from that point. Most of the models were calibrated to yield a single set of model parameters and outputs. Two of the models (Eaton and STI-HIV Interaction) were calibrated using a Bayesian framework allowing for uncertainty in model parameters, which produces a joint posterior distribution of parameter combinations consistent with the observed HIV epidemic [43,84]. The STI-HIV Interaction model allows for uncertainty in sexual behaviour, the natural history of HIV infection, and the effect of ART, while the Eaton model only allows for uncertainty in sexual behaviour and sexual mixing parameters. Many of the models include facilities to simulate HIV testing and diagnosis, retention in care prior to treatment eligibility, and other processes related to achieving successful treatment, but these were not implemented for this exercise in order to conform to the simple intervention scenarios.

Scenarios Representing the Existing ART Programme in South Africa
In a separate analysis, seven of the models (CD4 HIV/ART, Eaton, Fraser, Goals, Granich, STDSIM, and STI-HIV Interaction) were used to estimate the impact that the existing scale-up of ART in South Africa has had on HIV incidence and prevalence by comparing model simulations that include the ART scale-up over the past decade with the no-ART counterfactual. Models either used an existing calibration to the number of people on ART in South Africa (Fraser and STDSIM) or were calibrated using estimates of the number of adults starting and on ART in each year from 2001 to 2011 [45] (CD4 HIV/ART, Eaton, Goals, Granich, and STI-HIV Interaction).
Five models (Bendavid, CD4 HIV/ART, Eaton, Goals, and Granich) constructed short-term projections of HIV incidence in South Africa assuming different trajectories for continued ART scale-up from 2011 to 2016, the period covered by South Africa's national strategic plan [46]. Starting from the number of patients on ART in mid-2011, the numbers of adults starting ART in each of the years from mid-2011 through mid-2016 were specified. A "baseline" scenario was considered in which 400,000 adults would start ART in each of the next 5 y (approximately the number who started ART in 2009), for a total of 2 million new adults initiating ART. Three other scenarios were considered for the total numbers starting ART over the same period: (i) "low": 1.2 million start ART; (ii) "medium": 3 million start ART; and (iii) "high": 3.9 million start ART. (The exact number starting in each year is listed in Table 2.) The HIV incidence rate at the midpoint of 2016 and the cumulative number of new adult HIV infections over the period 2011 to 2016 were reported for each of these scenarios. For these projections, assumptions regarding CD4 distributions at ART initiation and rates of retention were based on actual treatment guidelines and programme experiences, but were not standardised across models.

Results
Figure 1 shows HIV prevalence and HIV incidence in 15- to 49-y-old males and females simulated by each of the models under the counterfactual assumption of no ART provision. Other epidemiologic statistics are presented in Table 3. The estimates of adult male HIV prevalence in year 2012 ranged between 10% and 16%, and estimates of female prevalence between 17% and 23%. Male HIV incidence in year 2012 ranged between 1.1 and 2.0 per 100 person-years, and female incidence ranged between 1.7 and 2.6. The STDSIM model calibrated to KwaZulu-Natal Province simulated a considerably larger burden of HIV, consistent with observation [47], with prevalences in year 2012 of 23% and 33% in males and females, respectively, and incidence rates of 3.0 and 3.9 per 100 person-years, respectively. All of the sex-stratified models simulated higher HIV prevalence in adult women than in men, with sex ratios in HIV prevalence in year 2012 between 1.2 and 1.7, and all of the models except for Bendavid simulated higher incidence in year 2012 in females than in males.
Nearly all of the models projected declines in HIV incidence after 2012 in the absence of ART, but the magnitude of the projected natural changes between 2012 and 2050 varied widely from almost no change (Goals and Granich) to greater than 45% reduction (Bendavid and Synthesis Transmission).
Model projections of future national population growth in the absence of ART varied widely, ranging from a 6% reduction to a 13% increase in the population aged 15 y and older between the years 2012 and 2020. For comparison, the low and high variants for the projected total population growth from the United Nations Population Division over the same period (which incorporates some assumptions about ART provision) are 1.5% and 6.1% [48].
Figure 2 presents the outcomes of an intervention starting in year 2012 with ART eligibility at CD4 count ≤350 cells/µl, reaching 80% of those requiring treatment, and retaining 85% of patients on ART after 3 y. This scenario reflects an optimised implementation of the current World Health Organization treatment guidelines [39] and the Joint United Nations Programme on HIV/AIDS definition for "universal access" of reaching 80% of those in need [32]. Compared to the no-treatment counterfactual scenario, ART provision reduced incidence in year 2020 by 35% to 54% across all models (Figure 2A). There was much greater variation, however, in the estimated long-term impact of the intervention. In year 2050, the range of the predicted reduction in incidence was from 32% to 74%. The relative impact of the ART intervention on HIV incidence decreased between 2020 and 2050 in four models and increased in seven.

Number of Person-Years of ART per Infection Averted
There was considerable variation between models in estimates of the number of person-years of treatment per infection averted. For the scenario described above, the range of estimates for the number of person-years of ART per infection averted between 2012 and 2020 was between 6.3 and 18.7, and over the period 2012 to 2050, the range was 4.5 to 20.2 (Figure 2B). The four models with the greatest estimates of the number of person-years of ART provided per infection averted (Eaton, EMOD, STI-HIV Interaction, and Synthesis Transmission) all explicitly included variation in transmission by age (e.g., allowing for reduced impact through ART provision to older adults who are less sexually active and hence less likely to expose susceptible individuals), whereas the other models did not assume reduced transmission by older people.
(STDSIM allows for decreased sexual activity for those older than 50 and has the lowest estimate of person-years of ART per infection averted, but simulates a much higher HIV incidence.)

Determinants of Programme Impact
The impact on incidence of increasing access from 50% to 100%, improving 3-y programme retention from 85% to 100%, and changing the CD4 threshold for treatment eligibility, is shown for each model in Figure 3. The reduction in incidence increases approximately linearly with access in all models. In most models, improvements in retention in care led to greater impact of treatment on HIV incidence. The benefit of improving retention was minimal for the Fraser, Granich, and HIV Portfolio models. Each of these models regards individuals who have dropped out of treatment identically to untreated eligible individuals, allowing them to start treatment again on average within 1 y. In several models, improved retention means that the impact improves more rapidly with increasing access (i.e., the slope in reduction in incidence as access increases is steeper for higher retention).
Figure 4 shows how the number of person-years of ART provided per infection averted up to year 2020 varied in relation to the intervention programme. There were no consistent trends across all models. In some models, with earlier initiation of treatment, fewer years of ART were required per infection averted (efficiency increases), whereas the opposite was predicted in others.

Treatment Eligibility and the Theoretical Impact of ''Test and Treat''
The models varied in their predictions as to the relative benefit of increasing treatment eligibility from a CD4 threshold of ≤200 cells/µl (national guidelines in some settings and close to actual experience in many) to ≤350 cells/µl (international guidelines) compared to further increasing eligibility to all infected individuals (Figure 3). The Bendavid, CD4 HIV/ART, Goals, HIV Portfolio, and Synthesis Transmission models all predicted that there would be only a relatively modest benefit in moving from initiation at ≤200 cells/µl to ≤350 cells/µl, and a much greater benefit in moving from initiation at ≤350 cells/µl to immediately upon diagnosis of HIV infection. In contrast, the BBH model simulated very little benefit in moving from the ≤350 cells/µl threshold to immediate eligibility. The Eaton, Fraser, and EMOD models showed similar benefits associated with each of the increments at moderate levels of access.
One important argument that has been made for immediate ART is that commitment of a large amount of ART now could reduce the cumulative amount of ART required in the future as a result of averted HIV infections [2,49]. Whether such savings could occur was evaluated by investigating whether the cumulative person-years of treatment through year 2050 to implement immediate treatment is less than the amount of ART required when treating after the CD4 count falls below 350 cells/µl for the same levels of access and retention. In six (BBH, CD4 HIV/ART, Fraser, Goals, HIV Portfolio, and STDSIM) out of eleven models (excluding STI-HIV Interaction) this was not the case: increasing eligibility from CD4 ≤350 cells/µl to immediate initiation always required more person-years of treatment, even with ''perfect'' ART programmes (100% access and 100% retention). However, for the EMOD model, expanding eligibility from CD4 ≤350 cells/µl to all HIV-infected adults required fewer cumulative person-years of treatment in all intervention scenarios (including access as low as 60% and retention in care as low as 75%). The Synthesis Transmission model found expanding access to be ART-saving with 70% access and retention above 95%, or with 80% access and retention above 85%. The other three models that found that expanding access could be ART-saving required more demanding assumptions about programmes: according to the Granich model, immediate initiation would be ART-saving if access were above 90% and retention above 95%; according to the Eaton model, access and retention would need to exceed 95%; and according to the Bendavid model, access and retention would both need to be 100%.
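The ''ART-saving'' criterion described above can be sketched as a comparison of cumulative person-years of treatment under the two eligibility rules. This is a minimal illustration only: the function names and the annual values are hypothetical, and the sketch assumes each model supplies annual person-years on ART for both scenarios at the same levels of access and retention.

```python
from itertools import accumulate


def cumulative_person_years(annual_person_years):
    """Running total of annual ART person-years."""
    return list(accumulate(annual_person_years))


def is_art_saving(py_immediate, py_cd4_350):
    """True if immediate eligibility needs fewer cumulative person-years
    over the whole horizon than eligibility at CD4 <= 350 cells/µl."""
    return sum(py_immediate) < sum(py_cd4_350)


def first_saving_index(py_immediate, py_cd4_350):
    """Index of the first year in which the cumulative total under immediate
    eligibility falls below the CD4 <= 350 total, or None if it never does."""
    cum_immediate = cumulative_person_years(py_immediate)
    cum_350 = cumulative_person_years(py_cd4_350)
    for index, (a, b) in enumerate(zip(cum_immediate, cum_350)):
        if a < b:
            return index
    return None


# Hypothetical trajectories (arbitrary units): immediate treatment costs more
# ART up front but less later, as fewer new infections need treatment.
py_immediate = [10.0, 9.0, 7.0, 5.0]
py_cd4_350 = [6.0, 8.0, 9.0, 9.0]

saving = is_art_saving(py_immediate, py_cd4_350)
crossover = first_saving_index(py_immediate, py_cd4_350)
print(saving, crossover)
```

In this toy example the saving only appears in the final year, which illustrates why the conclusion is sensitive to the time horizon and to assumed access and retention.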
In an intervention treating all HIV-infected adults with 95% access and 95% retention, three (CD4 HIV/ART, EMOD, and HIV Portfolio) out of nine models (excluding BBH, Bendavid, and STI-HIV Interaction) predicted that HIV incidence would fall below 0.1% per year by 2050. The Granich model, which was used to argue the case for HIV elimination using treatment, projected that incidence in South Africa would be 0.13% under this scenario, a 92% reduction (in the original published projections, there was an assumption that risk of infection would fall by an additional 40% due to other interventions [34]).
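The relationship between a percentage reduction and the 0.1% per year threshold can be made concrete with a short calculation. The sketch below assumes, as a labelled assumption not stated explicitly in the text, that the Granich model's 92% reduction is measured against the incidence that would otherwise have occurred in the same year; the function names are hypothetical.

```python
ELIMINATION_THRESHOLD = 0.1  # incidence in % per year, the threshold used in the text


def implied_baseline(incidence_after, fractional_reduction):
    """Baseline incidence implied by a post-intervention incidence and a
    fractional reduction relative to that baseline (assumed comparison)."""
    return incidence_after / (1.0 - fractional_reduction)


def reduction_needed(baseline, target):
    """Fractional reduction from `baseline` required to reach `target`."""
    return 1.0 - target / baseline


# Figures quoted in the text: 0.13% per year after a 92% reduction.
baseline = implied_baseline(0.13, 0.92)
needed = reduction_needed(baseline, ELIMINATION_THRESHOLD)
print(round(baseline, 3), round(needed, 3))
```

Under this assumption the implied baseline is roughly 1.6% per year, so falling below the threshold would require a reduction of about 94% rather than 92%.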

Understanding Differences between Model Predictions
One factor expected to influence how much ART reduces HIV is the fraction of all transmission that occurs after individuals reach treatment eligibility thresholds, in the absence of any treatment [50]. Figure 5A shows the proportion of transmissions that occur from individuals in each CD4 count range in the counterfactual simulation in year 2012. Of the models that include a period of early infection, the percentage of new infections that occurs during this stage is between 4% and 28%, while between 20% and 51% of transmission results from individuals with CD4 cell count ≤200 cells/µl. These percentages of transmission after ART eligibility can be compared with the percentage reduction in incidence in year 2020 (Figure 5B). Here, it is assumed that access is 80% and 3-y retention in care is 85%. Although this comparison explains why, within one model, earlier treatment initiation reduces HIV incidence more, the amount of between-model variation in projected impact explained by the distribution of transmission by CD4 count is modest. R² values for this relationship were 0.28, 0.20, and 0.40 for eligibility at CD4 ≤200, eligibility at CD4 ≤350, and immediate eligibility, respectively. The correlation did not improve when considering higher access or higher retention scenarios.
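To make the R² comparison concrete, the following minimal sketch computes the coefficient of determination for a simple linear regression of percentage reduction in incidence on the percentage of transmission occurring after eligibility, with one point per model. It assumes a straight-line fit with an intercept, which the text does not specify, and the data points are hypothetical rather than values read from Figure 5.

```python
def r_squared(x, y):
    """R² of a simple least-squares line of y on x (with intercept).

    For a one-predictor linear regression this equals the squared
    Pearson correlation between x and y.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must each vary across models")
    return sxy ** 2 / (sxx * syy)


# Hypothetical values, one entry per model (not read from Figure 5).
pct_transmission_after_eligibility = [30.0, 40.0, 50.0, 60.0]
pct_reduction_in_incidence = [20.0, 35.0, 30.0, 45.0]

r2 = r_squared(pct_transmission_after_eligibility, pct_reduction_in_incidence)
print(round(r2, 3))
```

An R² of 0.20 to 0.40, as reported above, would mean that most of the between-model variation in impact is left unexplained by this single factor.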
Two other factors hypothesized to explain the differences between the model projections are different assumptions about the efficacy of ART in reducing transmission (between 90% and 99%) and different assumptions about the outcomes of individuals who drop out from treatment programmes. To test the importance of these factors, selected intervention scenarios were repeated under the artificial assumption that an individual never transmits after initiating treatment (treatment is 100% efficacious at preventing transmission, and retention on treatment is 100%). This assumption increased the intervention impact in every model, but, surprisingly, did not reduce the variation in the results between models or improve the ability of factors such as different model assumptions about CD4 progression, HIV transmission, or the future trajectory of HIV incidence to explain the variation.

[...] existing calibrations to ART coverage levels in the Western Cape and KwaZulu-Natal Provinces, respectively. All of the models predicted that ART should already have had a substantial impact on the HIV epidemic, estimating that HIV incidence in year 2011 was between 17% and 32% lower than it would have been in the absence of ART. The increasing impact on HIV incidence over time mirrors the steep increase every year in the number of people starting treatment during this period. The impact on prevalence was more modest and less consistent across models. The Eaton and STI-HIV Interaction models estimated that prevalence is around 8% higher than it would have been without treatment (an absolute increase in prevalence of one percentage point) due to the increased survival for those infected with HIV. The Fraser and Granich models suggest that this effect is offset by the reductions in incidence, so that there is no net change in prevalence. It is unlikely that standard surveillance methods based on monitoring trends in prevalence would have detected this impact, despite the significant underlying reductions in incidence.
The estimated potential impact of further ART scale-up is summarised in Table 4. In the baseline scenario, where 400,000 people are started on ART each year, the models estimated that incidence would be reduced in 2016 by between 13% and 26% compared to the incidence rate in 2011. If 800,000 fewer people are put on ART, then between 39,000 and 186,000 more new adult HIV infections would occur over the period 2012 to 2016 than under the baseline scenario. If more people are put on ART (3.0 or 3.9 million over the next 5 y), then the models estimated that the number of new infections over the 5-y period would be reduced by 64,000 to 327,000 and 270,000 to 521,000, respectively, compared to the baseline. The table underscores that there are still substantial potential preventive benefits from expanding ART coverage in South Africa. The models that tended to estimate the greatest reduction in incidence in hypothetical programmes over the medium term (CD4 HIV/ART, Goals, and Granich) also tended to project greater reductions in incidence over the short term in these more realistic scenarios.

Discussion
The mathematical models used to simulate the impact of treatment on HIV incidence in South Africa are diverse in their structure, level of complexity, representation of the HIV transmission process and the ART intervention, and parameter choices. All twelve of the models compared in this analysis predicted that treatment could substantially reduce HIV incidence, even using past or existing treatment guideline eligibility criteria, provided that coverage is high. Only three (CD4 HIV/ART, EMOD, and HIV Portfolio) out of nine models (excluding BBH, Bendavid, and STI-HIV Interaction), however, predicted that treatment could reduce HIV incidence below 0.1% by year 2050 (the definition of ''elimination'' established by [34]), even with very high access and retention. When simulating the historical scale-up of ART in South Africa, the models indicated that ART may already have reduced HIV incidence by between 17% and 32% in 2011, compared to what would have been expected in the absence of ART. Although there have been ad hoc informal model comparison exercises [51], collections of work using standardised assumptions for interventions [52], and thorough model comparisons involving a few research groups [53,54], to our knowledge, this exercise is the first to bring together such a large number of independent modelling groups to examine the same set of interventions. We hope that this will provide a foundation for much more collaborative work.
In this study we set out to test whether different models of the potential impact of treatment on new HIV infections in South Africa would make similar predictions when implementing the same intervention scenarios. We found substantial consistency between the model projections of the impact of ART interventions on HIV incidence in the short term (8 y). However, there was more variation in the predicted longer term (38 y) reductions in incidence, and models also produced divergent estimates of the number of person-years of ART provided per infection averted. While establishing where models agree and disagree about the epidemiological impact of ART represents an important scientific finding in itself, the substantial variation in the long-term impact and efficiency of interventions demands further investigation and explanation.
Based on epidemiological theory and previous modelling studies, we hypothesized a number of model attributes that might explain differences in model predictions about the impact of ART, including the amount of transmission in different stages of HIV infection, the assumed efficacy of ART for preventing transmission, opportunities for treatment reinitiation following dropout from a treatment programme, the age and sex structure of the population, future population growth rates, the degree of heterogeneity and assortativity in sexual mixing, the future trajectory of HIV incidence in the absence of intervention, and the inclusion of changes in sexual behaviour over the past decade. There was indeed substantial variation between the models in their characterisation of each of these aspects of the system, largely reflecting the true uncertainties that persist even after decades of tremendous research into the epidemiology of HIV in South Africa. We were able to show that crude differences in the proportion of transmission at each stage of infection explained a modest amount of the variation in the short-term impact of ART, but less of the long-term impact. However, beyond this, findings from the models did not appear to clearly support any of these hypotheses in univariate analyses, likely because of the large number of processes that interact nonlinearly to create HIV epidemics and interventions. For example, projecting a seemingly simple quantity such as the number of person-years of ART that will be provided in an intervention depends on future population growth, the natural trend in the epidemic, the proportion of HIV-infected individuals qualifying for treatment, retention and survival on ART, and the impact that ART provision has on future HIV incidence. 
This situation contrasts with that of an earlier exercise that compared predictions of the impact of male circumcision interventions [51], where the relationship between the established efficacy of the intervention and population-level impact was less complicated.
Having investigated the extent to which differences in ART programmes determine differences in results, the natural focus for future model comparison studies should be to explore the contribution of other hypotheses through incrementally standardising biological, behavioural, and demographic model parameters, and calibrating models to the same levels of HIV prevalence and incidence. A systematic approach to standardising model parameters would identify which parameters most significantly influence the results and guide priorities for future data collection. The HIV Modelling Consortium (http://www.hivmodelling.org) will coordinate such research efforts in coming months to investigate the extent to which variation in model predictions is driven by differences in underlying models of sexual mixing, or different models of the natural history of infection and epidemic trajectory.
Although our experiment and analysis have focused on how factors included in models can affect model predictions, it is important to note that if all models exclude an important aspect of the system, they could all be wrong. Early models of the impact of ART on HIV incidence were very focused on the concern that increased sexual risk behaviour might offset the reduction in transmission for those on treatment, but for this exercise all of the models assumed that population risk behaviour would not change in response to the introduction of ART. This may be a reasonable assumption given consistent evidence that patients report safer sexual behaviour after starting ART [55–59] and given the relative lack of information from sub-Saharan Africa about how the untreated and HIV-negative population responds to the availability of treatment [60]. But in other epidemic settings the availability of ART has been associated with receding gains in protective behaviour [61–63], and monitoring this in sub-Saharan African settings will be a priority for surveillance over coming years. The models also all assumed high efficacy of ART to reduce transmission. True effectiveness will depend on the level of viral suppression, which is mainly determined by adherence levels. While there are some data from South Africa on viral suppression rates outside carefully controlled trial settings [64], further information on this and on patterns of acquired and transmitted resistance will help in the calibration of models. Only one of the models in this exercise (Synthesis Transmission) explicitly incorporated the effect of antiretroviral drug resistance on the impact of ART interventions. Models have predicted that antiretroviral drug resistance could be widespread in sub-Saharan Africa in coming decades [20], which could eventually lead to the spread of transmitted drug resistance [65,66]. 
This could affect the long-term costs and efficacy of treatment-as-prevention strategies [67].
Another finding from systematically comparing models is that often seemingly independent modelling studies rely on the same limited data. Nearly all of the models relied on two sources to derive parameters for elevated infectiousness during the first few weeks of infection [68,69], but both of these sources are based principally on data from a few retrospective couples in Rakai, Uganda [70] (see [71]). This highlights both how invaluable these data are and also the importance of recognising the dependencies between seemingly independent modelling studies. However, even using the same data, models may reach different conclusions. The Eaton, EMOD, and Fraser models all in some way used the estimates of early HIV infectivity from [69] but estimate very different contributions of this stage to overall HIV transmission (Figure 4A), and the three models all reached different conclusions from those in another recent modelling study relying on these same estimates [72].
Table 4. HIV incidence rate per 100 person-years in year 2016 for different potential scenarios of future ART scale-up.
The purpose of this exercise was not to draw conclusions or recommendations about specific ART intervention strategies, but rather to test the hypothesis that a range of different models would come to similar conclusions about the impact of ART on HIV incidence when the same interventions were modelled. The simulated interventions were artificially simple and stylized to enable comparison between models. These did not explicitly simulate the steps of HIV testing, diagnosis, linkage to care, and adherence to ART required to achieve the access levels specified in the intervention scenarios (although several of the models include facility for this and have investigated this in independent analyses). 
Interpretation of models simulating high levels of treatment coverage should be cautioned by data suggesting that at present fewer than one-third of patients in sub-Saharan Africa are continuously retained in care from HIV diagnosis to ART initiation [73], and that barriers remain to access to and uptake of HIV testing [32]. The models assumed that all individuals eligible for treatment were equally likely to access treatment, which might not be true in practice (for example, women are more likely to start treatment than men [74]). The comparison scenario (counterfactual) against which interventions were evaluated assumed no treatment at all, which made it easier to compare models, but is clearly not the relevant benchmark for policymakers. This study has also considered treatment in isolation from other interventions, even as there is broad consensus that ''combination prevention'' strategies are presently the best strategy for attacking the epidemic [41,75].
We hope that this study will help to characterise the models that are being used to investigate questions related to the impact of HIV treatment and enable those who rely on models for decision-making to think critically about how the assumptions underlying models affect the results. The relative consistency between models' estimates of the short-term epidemiological impact of ART, including the impact of the existing ART programme, provides some reassurance that model projections on this time scale may be relatively robust to the substantial uncertainties in parameters and systems. This is a significant result considering that such short-term projections are often the most relevant for policy and resource allocation questions. On the other hand, the substantial variation in long-term epidemiological impacts and efficiency of ART, upon which arguments of substantial epidemic reduction and cost savings hinge, suggests that results in these areas from any single model should be extrapolated with caution. Care should be taken to ensure that models evaluating the long-term costs, benefits, and cost-effectiveness of treatment programmes adequately communicate the degree and myriad sources of uncertainty that influence these outputs.
A common question when faced with a diversity of model results is whether some models are ''better'' or ''worse''. Without data against which to test the predictions of models, it is not possible to answer this question in a study such as this, nor is this the correct question to be asking. Rather, users of model outputs should ask whether models include the necessary components to capably answer the specific questions at hand, and whether the models make credible assumptions in light of the information available, and choose models accordingly. Evaluated along these guidelines, the most appropriate models will vary between applications, so there is no single ''best'' model. However, in this exercise, the models that tended to project more ''pessimistic'' outcomes for the interventions seemed to do so for important reasons. For example, models that estimated poorer efficiency of ART for averting infections tended to be those that simulated ART provision for those at older ages, who might be at lower risk of transmitting, or included the elevated risk of transmission for those failing treatment, whereas models with more optimistic predictions assumed that risk behaviour did not vary by age or that transmission was fully suppressed immediately upon beginning treatment until death on ART or dropout. Artificial convergence of models should be avoided when true uncertainties persist about the system. It is incumbent upon modellers to incorporate and communicate uncertainty in projections, and identify which components of the system account for the uncertainty. For this exercise, only one model (STI-HIV Interaction) included a comprehensive analysis accounting for uncertainty about basic epidemiology and intervention efficacy. While the focus of the study was on variation between models, it is interesting to observe that the 95% credible interval representing parameter uncertainty for this model encompassed the point estimates of the other eleven models.
Fortunately there will be important new opportunities in the near future to test, validate, and improve epidemiological models of HIV treatment. These include comparing projections to the experience of expanded ART in industrialised countries [61,63], the observed impact of ART in well-characterised communities [76], and results of a number of community-randomized trials of treatment as prevention that will soon be underway [44]. As new data are reported, the accuracy of models projecting the impact of treatment as prevention should improve, and we expect that validated and scientifically based model projections will continue to be central in understanding how ART can have the greatest impact in mitigating the global HIV epidemic.

Editors' Summary
Background. Following the first reported case of AIDS in 1981, the number of people infected with HIV, the virus that causes AIDS, increased rapidly. In recent years, the number of people becoming newly infected has declined slightly, but the virus continues to spread at unacceptably high levels. In 2010 alone, 2.7 million people became HIV-positive. HIV, which is usually transmitted through unprotected sex, destroys CD4 lymphocytes and other immune system cells, leaving infected individuals susceptible to other infections. Early in the AIDS epidemic, half of HIV-infected people died within eleven years of infection. Then, in 1996, antiretroviral therapy (ART) became available, and, for people living in affluent countries, HIV/AIDS gradually became considered a chronic condition. But because ART was expensive, for people living in developing countries HIV/AIDS remained a fatal condition. Roll-out of ART in developing countries first started in the early 2000s. In 2006, the international community set a target of achieving universal ART coverage by 2010. Although this target has still not been reached, by the end of 2010, 6.6 million of the estimated 15 million people in need of ART in developing countries were receiving ART.
Why Was This Study Done? Several studies suggest that ART, in addition to reducing illness and death among HIV-positive people, reduces HIV transmission. Consequently, there is interest in expanding the provision of ART as a strategy for reducing the spread of HIV (''HIV treatment as prevention''), particularly in sub-Saharan Africa, where one in 20 adults is HIV-positive. It is important to understand exactly how ART might contribute to averting HIV transmission. Several mathematical models that simulate HIV infection and disease progression have been developed to investigate the impact of expanding access to ART on the incidence of HIV (the number of new infections occurring in a population over a year). But, although all these models predict that increased ART coverage will have epidemiologic (population) benefits, they vary widely in their estimates of the magnitude of these benefits. In this study, the researchers systematically compare the predictions of 12 mathematical models of the HIV epidemic in South Africa, simulating the same ART intervention programs to determine the extent to which different models agree about the impact of expanded ART.
What Did the Researchers Do and Find? The researchers invited groups who had previously developed mathematical models of the epidemiological impact of expanded access to ART in South Africa to participate in a systematic comparison exercise in which their models were used to simulate ART scale-up scenarios in which the CD4 count threshold for treatment eligibility, access to treatment, and retention on treatment were systematically varied. To exclude variation resulting from different model assumptions about the past and current ART program, it was assumed that ART is introduced into the population in the year 2012, with no treatment provision prior to this, and interventions were evaluated in comparison to an artificial counterfactual scenario in which no treatment is provided. A standard scenario based on the World Health Organization's recommended threshold for initiation of ART, although unrepresentative of current provision in South Africa, was used to compare the models. In this scenario, 80% of HIV-infected individuals received treatment, they started treatment on average a year after their CD4 count dropped below 350 cells per microliter of blood, and 85% remained on treatment after three years. The models predicted that, with a start point of 2012, the HIV incidence would be 35%-54% lower in 2020 and 32%-74% lower in 2050 compared to a counterfactual scenario where there was no ART. Estimates of the number of person-years of ART needed per infection averted (the efficiency with which ART reduced new infections) ranged from 6.3-18.7 and from 4.5-20.2 over the periods 2012-2020 and 2012-2050, respectively. Finally, estimates of the impact of ambitious interventions (for example, immediate treatment of all HIV-positive individuals) varied widely across the models.
What Do These Findings Mean? Although the mathematical models used in this study had different characteristics, all 12 predict that ART, at high levels of access and adherence, has the potential to reduce new HIV infections. However, although the models broadly agree about the short-term epidemiologic impact of treatment scale-up, their longer-term projections (including whether ART alone can eliminate HIV infection) and their estimates of the efficiency with which ART can reduce new infections vary widely. Importantly, it is possible that all these predictions will be wrong: all the models may have excluded some aspect of HIV transmission that will be found in the future to be crucial. Finally, these findings do not aim to indicate which specific ART interventions should be used to reduce the incidence of HIV. Rather, by comparing the models that are being used to investigate the feasibility of ''HIV treatment as prevention,'' these findings should help modelers and policymakers think critically about how the assumptions underlying these models affect the models' predictions.