Bridging the gap between HIV epidemiology and antiretroviral resistance evolution: Modelling the spread of resistance in South Africa

The scale-up of antiretroviral therapy (ART) in South Africa substantially reduced AIDS-related deaths and new HIV infections. However, its success is threatened by the emergence of resistance to non-nucleoside reverse-transcriptase inhibitors (NNRTI). The MARISA (Modelling Antiretroviral drug Resistance In South Africa) model presented here aims at investigating the time trends and factors driving NNRTI resistance in South Africa. MARISA is a compartmental model that includes the key aspects of the local HIV epidemic: continuum of care, disease progression, and gender. The dynamics of NNRTI resistance emergence and transmission are then added to this framework. Model parameters are informed using data from HIV cohorts participating in the International epidemiology Databases to Evaluate AIDS (IeDEA) and literature estimates, or fitted to UNAIDS estimates. Using this novel approach of triangulating clinical and resistance data from various sources, MARISA reproduces the time trends of HIV in South Africa in 2005–2016, with a decrease in new infections, undiagnosed individuals, and AIDS-related deaths. MARISA captures the dynamics of the spread of NNRTI resistance: high levels of acquired drug resistance (ADR, in 83% of first-line treatment failures in 2016), and increasing transmitted drug resistance (TDR, in 8.1% of ART initiators in 2016). Simulation of counter-factual scenarios reflecting alternative public health policies shows that increasing treatment coverage would have resulted in fewer new infections and deaths, at the cost of higher TDR (11.6% in 2016 for doubling the treatment rate). Conversely, improving switching to second-line treatment would have led to lower TDR (6.5% in 2016 for doubling the switching rate) and fewer new infections and deaths. Implementing drug resistance testing would have had little impact. The rapid ART scale-up and inadequate switching to second-line treatment were the key drivers of the spread of NNRTI resistance in South Africa. However, even though some interventions could have substantially reduced the level of NNRTI resistance, no policy including NNRTI-based first line regimens could have prevented this spread. Thus, by combining epidemiological data on HIV in South Africa with biological data on resistance evolution, our modelling approach identified key factors driving NNRTI resistance, highlighting the need of alternative first-line regimens.

T 2 , respectively for rst and second-line treatment), "suppressed" (S 1 and S 2 ) and "failed" (F 1 and F 2 ). An individual is considered as having "started treatment" (T 1 or T 2 ) if he initiated treatment less than 3 months ago. Afterwards, he/she is considered either as suppressed if VL < 1000 cp/ml, or as failed otherwise. The four dierent CD4 strata are represented by the letter i in the equations and are respectively: CD4 > 500cells/µl (i = 1), 350 < CD4 < 500c/µl (i = 2), 200 < CD4 < 350c/µl (i = 3) and CD4 < 200c/µl (i = 4). NNRTI resistance is represented by j, and its value is 0 if an individual is susceptible to NNRTI and 1 otherwise. The gender dimension, represented by k, takes 0 for male and 1 for female. The indices i, j and k are used in equations in order to specify a particular layer of each dimension. When an index is missing, it means that we have summed over all the layers that the dimension contains (e.g. I k (t) := i,j I i,j,k (t)).

Modelling HIV transmission
The number of newly infected individuals per time step (1 month) is split in three parts, characterising the three dierent transmission routes: from men to men (men who have sex with men -MSM), from men to women and from women to men (heterosexual transmission -HET). Let k and k respectively be the gender of a susceptible (HIV-negative) and an infected individual. Assuming a density-dependent transmission and dierent risk behaviours between infected individuals knowing or not their status, the number of individuals of gender k that have been newly infected by individuals of gender k is: where I k,k , Susc k,k and N k,k represent respectively the number of infected people, the number of susceptible people and the total number of people of gender k that have unprotected intercourse with the gender k , β d and β u the frequency of unprotected intercourse per month, respectively for infected people knowing their status (diagnosed) and for those who do not (undiagnosed) and ν k,k the probability that an unprotected intercourse leads to a new infection between an infected individual of gender k and a susceptible individual of gender k. As HET and MSM populations were not formally split in the model, we approximated I k ,k N k ,k by I k N k (where I k := k I k ,k and N k := k N k ,k ), assuming a similar HIV prevalence in MSM as in the overall male population. We also replaced Susc k,k by ρ k,k Susc k , where ρ k,k represents the proportion of people having sexual intercourse with gender k among sexually active people of gender k and where Susc k = Susc k,0 + Susc k,1 . We assumed that ν 1,1 = 0 as there is no risk of infection during a sexual intercourse between two women. The proportion of MSM among men ρ 0,0 is 0.05, as reported by Anova Health Foundation [1]. Similarly, we set the proportion of sexually active women that have a sexual intercouse with men ρ 1,0 to 0.95. Using Eq 1, the number of newly infected individuals of gender k per time step is: We assumed that all newly infected individuals arrive at the rst CD4 stratum (i = 1). See Eq 18 for more details.

Modelling mortality
Mortality rates dier according to CD4 counts and care stages. We combined the results of two studies in order to estimate the relative risks for each group [2,3]. The group of suppressed individuals with CD4 > 500c/µl is dened as the reference as they have the lowest risk. To convert the relative risks into rates, we used a free parameter µ, which corresponds to the mortality rate of the reference group (see Table 5). As [2] observed a high heterogeneity in mortality risk among people with CD4 < 200c/µl depending on CD4 counts, mortality was modelled dierently for this class, based on what has already been done in Thembisa model [4]. Instead of assuming a xed risk as for the other classes, we allowed the mortality rate of the last CD4 stratum (CD4 < 200c/µl) to vary within a range, according to its proportion of people with very low CD4 counts. The upper level of the range corresponds to the relative risk in the scenario where all individuals have CD4 < 50c/µl, while the lower level corresponds to the scenario where all individuals have CD4 > 50c/µl. As we do not have any accurate information over time about this proportion, we used rate of ART initiation as a proxy. We assumed that average treatment rate of the 3 previous years determines the proportion of individuals with CD4 > 50c/µl, as shown in Eq 3. This average corresponds to the increase in treatment rate relative to 2005. We assumed an exponential decrease of the proportion of people with CD4 < 50c/µl when the treatment rate is increasing, in line which structure of the compartmental model assuming exponential distribution of the rates.

Modelling diagnosis and treatment rate Diagnosis rate
To model the diagnosis rate, we needed to distinguish three dierent types of testing: 1) testing asymptomatic individuals γ k diag1 , 2) testing symptomatic (opportunistic infection -OI) γ i diag2 and 3) testing pregnant women γ i,k diag3 . The overall diagnosis rate for a given CD4 stratum i and gender k is thus: [6]. The diagnosis rate γ k diag1 has increased over the years, as a consequence of the augmentation in the number of HIV-tests performed (in the asymptomatic population). To model the increase over time, two free parameters are included into the diagnosis rate: one representing the diagnosis rate in 2005 for men, and one representing its increase between 2005 and 2015. We assumed similar diagnosis rates for the four CD4 strata. As reported in [6], diagnosis rate varies across gender, being higher for women. The diagnosis for women is thus: γ k=1 diag1 = γ k=0 diag1 /p I→D , where p I→D is a xed parameter [6]. The rate of diagnosis due to OI γ i diag2 depends on CD4 counts as OI is more likely to occur with low CD4 counts. Following what has been done by Thembisa [4], we set γ i diag2 as: γ i diag2 = inc i OI ·γ test2 (t), where inc i OI represents the OI incidence and is set as 0.05/(1000person·year) for ind. with CD4 > 500c/µl, 0.12/(1000py) for ind. with 350 < CD4 < 500c/µl, 0.27/(1000py) for ind. with 200 < CD4 < 350c/µl, 0.9/(1000py) for ind. with CD4 < 200c/µl. γ test2 (t) represents the monthly testing rate for individuals having an OI. To model its increase over time, we used a sigmoid function increasing from 2% in 2005 to 8% in 2015 [4]. γ i,k diag3 model the increased diagnosis rate due to pregnancy. It is set as 0 for men and decreases with CD4 counts, as fertility rate is lower for women with low CD4 counts. This rate is: where θ birth = 23/(1000py) is the birth rate in the overall South African population, θ i birth CD4 the decreased in birth rate according to CD4 counts and γ k test3 (t) the monthly testing rate for pregnant women [4]. θ i birth CD4 is 1 for women with CD4 > 500c/µl, 0.96 for women with 350 < CD4 < 500c/µl, 0.87 for women with 200 < CD4 < 350c/µl, 0.74 for women with CD4 < 200c/µl. We used a sigmoid function to model to model the monthly testing rate for pregnant women. This increases from 4% in 2005 to 8% in 2010 [4].

Treatment rate
We allowed treatment rate to vary over time and CD4 classes. The treatment rate for the CD4 class i at time t is : where γ CD4<200

2005
is a free parameter representing the treatment rate for an eligible individual in 2005 with CD4 < 200c/µl, r elig i (t) the proportion of individuals assumed to be eligible within CD4 class i at time t, r CD4 i (t) the relative treatment rate for CD4 class i relative to that in CD4 < 200c/µl class (i = 4), r time (t) the relative treatment rate at time t relative to that in 2005. Aside from the free parameter γ CD4<200

2005
, the values of all other parameters from Eq 4 are taken from the Thembisa model [4]. We set r CD4 i to 0.4 for the CD4 > 500c/µl class (i = 1), 0.5 for the 500 > CD4 > 350c/µl class (i = 2), 0.7 for the 350 > CD4 > 200c/µl class (i = 3) and 1 (reference group) for the CD4 < 200c/µl class (i = 4). The parameter r elig i (t) models the broadening of eligibility criteria over time as well as the delay between guideline change and change in practice. The product r elig Figure 1. We used a sigmoid function in order to model the gradual increase of the treatment rate over time, represented by r time (t) in the equation. Based on Thembisa data [4, p.28], the function r time (t) implies a 17-fold increase between 2002 and 2012. Initiation of PI-based treatment as a rst-line regimen is represented by the rates γ i D→T 2 . These rates has been estimated using IeDEA-SA data (see section 3.1).

Modelling resistance
The resistance dimension consists of two layers: "being NNRTI-susceptible" and "being NNRTIresistant". Individuals can enter the resistant layer in two dierent ways: either by acquiring drug resistance or by being infected by a resistant strain. Acquisition of NNRTI resistance at a rate σ res is only possible when failing the rst-line treatment. Alternatively, there is a risk to be directly infected by a resistant virus, this risk being proportional to the percentage of infectious individuals that are resistant. Resistant individuals can also revert back to the susceptible layer at a rate σ rev when no more drug pressure is exerted. As time to virological failure is lower for resistant individuals [7], we added a xed parameter α in order to express this dierence. It represents the hazard ratio of the time to virological suppression (susceptible vs resistant) and has been collected from literature [7]. Similar dierences have been observed in time to viral failure between susceptible and resistant individuals. For the sake of simplicity, we decided to use the same parameter α to model both dierences, as estimates found in the literature were within the same range.

Modelling demographic changes
Susceptible people As the model only describes the HIV-infected population, the susceptible population Susc k (t) is not part of the model. However, as Susc k (t) is needed in order to model infections, it is estimated as follows: is the total number of adults of gender k in South Africa estimated by the World Health Organisation (WHO) [8] and Inf k (t) is the total number of infected adults of gender k calculated by the model (sum over all compartments). We used demographic data and interpolated them in order to obtain a function N k (t) that is continuous over time.
Children reaching adulthood The inow of children reaching adulthood (≥ 15 years old) was calculated using [4], [9] and [10]. Thembisa model provides yearly estimates of the number of 15-year olds that are 1) HIV-infected, 2) diagnosed and 3) on ART, stratied by gender. NNRTI resistance prevalence in 15-year olds on ART were estimated by only considering acquisition of NNRTI-resistance during ART. Transmission of resistance due to the prevention of mother to child transmission (PMTCT) treatment was not considered, as the national PMTCT programme only started in 2002 in South Africa, and thus does not have any eect on 15-year olds before 2017. Moreover, we did not consider transmission of resistance from mother to child, as level of resistance was very low in 2002. We considered that 20% of 15-year olds were on a failing ART [9]. Among them, we considered that 90% were resistant to NNRTI [10]. We interpolated these yearly estimates into continuous monthly estimates. Eq 6-9 model the dierence in treatment failure/success rates depending on the absence or presence of NNRTI-resistance. Table 1 shows the dierent compartments of the model and model outcomes (see also .
Impact of resistance on clinical outcome Number of newly infections between time s − 1 and s (see Eq.2) Number of diagnosed HIV-infected individuals at time t Number of individuals on ART at time t

Number of infectious and diagnosed individuals
Inf jk d (t) = D jk (t) + T jk Susc k number of susceptible indiv. of gender k by denition: Inf jk u (t) number of undiagnosed indiv.
where ∆D j (t) represent the inow of individuals going from I j to D j at each time step (newly diagnosed individuals).
Contribution of TDR to ADR between times s − 1 and s (see Discussion section in the manuscript) Supplementary Material June 8, 2019 class, which has twice as many as in the second CD4 class, which has twice as many as in the rst one. Finally, we assumed that 1% of infected individuals were resistant to NNRTI. For sake of simplicity, this percentage did not depend on CD4 counts or on care stages.

Calibration and model simulation
We calibrated the model in two successive steps. First, IeDEA data were used to estimate the majority of the rates with survival analysis. Second, we tted our model to Thembisa data using a maximum likelihood approach. A few parameters, mainly the ones related to resistance, were collected from literature. For more details, see Table 4.

Survival analysis
To estimate the majority of the rates that are related to continuum of care or disease progression, we used data from IeDEA-SA, a network collecting individual information about HIV-infected patients from several cohorts in Southern Africa. After having selected only adults from the 5 South African cohorts and discarded patients with erroneous information, we ended up with information about 54'016 patients. This includes 1) start/end of drug regimen, 2) VL measurements, 3) CD4 counts measurements and 4) outcome (i.e. death). Let A and B be two compartments of the model and r A→B the rate corresponding to the movement from A to B. When a linear interaction ( dB dt = r A→B · A) is chosen, the compartmental model assumes that the time T A→B spent in A before switching to B is exponentially distributed with mean r −1 A→B : T A→B ∼ Exp(r A→B ). Therefore, we estimated the dierent rates summarised in the section below by using survival analysis and assuming a exponentially distributed times.
Rates related to movement between CD4 classes All the rates that are related to the progression of the disease (switch to another CD4 class) within the same care stage are estimated with IeDEA data. As the time spent on one care stage before switching to another stage is generally long, the interval censoring approach provided accurate estimates of almost all rates. As no reliable data exist about untreated individuals (I and D), the progression in CD4 counts for I and D was estimated from literature [11].
Rates related to movement between care stages Rates that model HIV-transmission, diagnosis and treatment initiation were not estimated with IeDEA data and have already been described in Section 1. For all the other rates that are related to continuum of care, we adapted the survival analysis method to handle the sparsity of IeDEA data. First, as few VL measurements are reported per individuals, some steps are missing in the IeDEA data, e.g. some individuals passing directly from "diagnosed" (D) to "suppressed" Supplementary Material (S 1 ), which implies that treatment start is missing in the database. In this case, we aimed to reconstruct the history of care of these patients based on the sparse information we had from IeDEA. In this example, as no rate exists between D and S 1 , we should assume that the individual stayed at T 1 before going to S 1 . Therefore, the information provided by IeDEA data could here help us to estimate two rates: γ D→T 1 and γ T 1 →S 1 . Second, we also modied the method in order to take into account the risk of bias caused by the sparsity of the CD4 measurements. As an example, let's suppose that we want to estimate the suppression rate γ 1 T 1 →S 1 for individuals with CD4 > 500c/µl. If the CD4 counts fell below 500c/µl between the last time he has been reported at T 1 and the rst time he was at S 1 , it is impossible to know whether the patient got suppressed when his CD4 counts was below or above 500c/µl. Therefore, information about this patient cannot be used to inform any of the two rates γ 1 T 1 →S 1 or γ 2 T 1 →S 1 . Although the quality of the estimates remains good for most of them due to the high number of individuals, rejecting those patients could introduce a bias. We tried to correct for this bias by adapting the standard method. Let's consider the rate γ 1 T 1 →S 1 . First, we estimated this rate without CD4 stratication: γ T 1 →S 1 . Next, we compared the weighted average of the rate stratied by CD4 with its estimate without CD4 stratication, by taking the ratio: This constant c represents the bias made when estimating γ T 1 →S 1 by CD4 stratication. To correct for this bias, we therefore multiplied the previously estimated rates γ i T 1 →S 1 by c: c· γ i T 1 →S 1 . Table 2 shows estimates of the rates that are related to the progression of the disease and

Likelihood maximisation
Four dierent rates -transmission rate, diagnosis rate, ART initiation rate and mortality rate -were estimated during the second phase. These four rates were modelled with 7 parameters (see Table 5). Estimates from the rst phase (Table 2 and 3) and from literature (Table 4) were used to run the model. We used an maximum likelihood approach to t four model outcomes to outcomes from Thembisa model (see Eq 20). The four model outcomes are: the number of yearly new infection ∆Inf , the number of undiagnosed individuals Diag, the number of treated individuals T reat and the number of AIDS-related deaths M ort. We used the optim function in R together with the L-BFGS-B method. This method allowed us to provide lower and upper bounds for each parameters. These bounds were chosen in order to include all reasonable values. As optimisation of the likelihood function might give dierent results depending on the starting values of the parameters, dierent starting values were randomly chosen within the range dened by the lower and upper bounds. We selected the set of parameters that maximized the likelihood over the simulations that converged. Calculations were performed on UBELIX (http://www.id.unibe.ch/hpc), the HPC cluster at the University of Bern.

Sensitivity analysis
In the sensitivity analysis, we perturbed 200 times seven parameters using a Latin Hypercube Sampling method (see Table 7). As varying the transmission-related parameters may modify the overall transmission rate, an adjustment is made to have a transmission rate similar to the baseline model. Sensitivity analysis were run for the baseline model as well as for each  Figure E. and F. respectively. Note, however, that these two latter indicators are not used during the tting procedure. The lines correspond to the best model t while the grey area is delimited by the lower and upper bounds of the 10 best ts, whose variation is due to dierent starting points (see Section 3.2).
counterfactual scenario (10 dierent scenarios in total). 100% sensitivity ranges are displayed in Figures 2 and 3 and Table 1 of the main paper.