HIV Model Parameter Estimates from Interruption Trial Data including Drug Efficacy and Reservoir Dynamics

Mathematical models based on ordinary differential equations (ODE) have had significant impact on understanding HIV disease dynamics and optimizing patient treatment. A model that characterizes the essential disease dynamics can be used for prediction only if the model parameters are identifiable from clinical data. Most previous parameter identification studies for HIV have used sparsely sampled data from the decay phase following the introduction of therapy. In this paper, model parameters are identified from frequently sampled viral-load data taken from ten patients enrolled in the previously published AutoVac HAART interruption study, providing between 69 and 114 viral load measurements from 3–5 phases of viral decay and rebound for each patient. This dataset is considerably larger than those used in previously published parameter estimation studies. Furthermore, the measurements come from two separate experimental conditions, which allows for the direct estimation of drug efficacy and reservoir contribution rates, two parameters that cannot be identified from decay-phase data alone. A Markov-Chain Monte-Carlo method is used to estimate the model parameter values, with initial estimates obtained using nonlinear least-squares methods. The posterior distributions of the parameter estimates are reported and compared for all patients.


Introduction
Many researchers have analyzed the dynamics of human immunodeficiency virus (HIV) using nonlinear ordinary differential equation (ODE) models [1][2][3][4][5]. Using various mathematical models, they sought to simulate the dynamics of the virus or to help design a treatment. Many of these studies have attempted to identify model parameters from patient data, but the sparsity of measurements resulted in very large confidence intervals in the parameter estimates. Furthermore, the experimental data used only included the period of viral decay following the introduction of therapy. Identification of the HIV model parameters under these conditions requires assuming that the drug efficacy is known, which in turn affects the estimates of the remaining parameters.
The amount of data normally available for HIV model identification is sparse. Typically, a patient on effective therapy has his or her viral load tested every 3 or 4 months, a rate too slow to accurately capture the dynamic characteristics accurately. In this paper, the data used to identify model parameters are from the AutoVac study conducted at the IrsiCaixa HIV research foundation in Barcelona [6]. In the AutoVac study, 12 patients underwent a series of about 30-day treatment interruptions, followed by resumption of suppressive therapy, and viral load measurements were taken at 3-day intervals during the interruption. This resulted in between 69 and 114 viral load measurements per patient, with between 38 and 77 data points per patient above the limit of detection. Therefore, data gathered from this particular experiment are sufficiently rich for model identification. All patients enrolled in the AutoVac study had maintained undetectable viral loads on therapy for at least two years prior to the study, and had baseline CD4+ T-Cell counts over 700. Patient nine and twelve both had interruptions without measurable viral load rebound, and were excluded from this study due to insufficient usable data.
Full-state measurements are not available for HIV model identification, so the problem of identifiability must be considered. For HIV models, Stafford et al. [7] have studied the identifiability of a 3 state model. Xia and Moog [8] analyzed the theoretical identifiability of a 4 state model and determined the minimal number of state measurements needed for estimating all model parameters. Frequently, only viral load data are available and in this case, not all parameters can be identified independently [7,9]. Recently, Miao et al. also investigated some identifiability issues for viral dynamics [10].
For identifying the parameters of a viral dynamics model, two major methods are commonly used: nonlinear least squares [11] and Bayesian estimation [12][13][14][15][16]. In this paper, we employ a Bayesian Markov-Chain Monte Carlo technique as in [12,16], with nonlinear least-squares used to generate initial conditions for the MCMC technique. The primary difference between this work and previous works is the quality of the data used for estimation. The data used in [16] consists of 10 measurements from 12 patients taken at 5 time points. The data used in [12] consists of seven measurements from 42 patients taken at seven time points. The data used in [15] consists of 9 viral load measurements from 42 patients taken at nine time points, plus a single baseline measurement of phenotypic drug susceptibility and survey data on patient adherence. The data used in [17] used 18 measurements of viral load from 18 time points. All four of these studies only included data from a single virus decay phase following treatment initiation. As a result of the sparse data, these previous studies had to make a number of simplifying assumptions about parameter values in order to preserve identifiability. By contrast, the AutoVac patient study provides us with between 69 and 114 viral load measurements from 10 patients from between 3 to 5 treatment interruptions cycles per patient. The high quality of the data used in this study allows reliable estimation of parameter values without resorting to the simplifying assumptions used in previous studies.
We consider the following mathematical model characterizing the viral dynamics for a patient: There are three states: x, the concentration of target CD4+ T cells; y, the concentration of actively infected CD4+ T cells; v, the viral load. l is the proliferation rate and d is the death rate of target CD4+ T cells; b is the infection rate; g is the drug efficacy; a is the death rate of actively infected cells; l y (t) is the contribution of the reservoir to actively infected CD4+ T cells; c is the rate of free virus production by infected cells; v is the clearance rate for the free virus. The drug application u is 0 during interruptions and 1 during treatment. This is a variation of a model first proposed in [18], with the addition of the l y (t) term describing the additional contribution of infected cells from all viral reservoir processes. This model is essentially the same as the model identified against patient data in the previous studies [11][12][13][14][15].
Highly active antiretroviral therapy (HAART) has proven effective to reduce the active viral load [19,20] and is standard care for HIV patients. However, it cannot eradicate the virus completely [21,22]. Although scientists suspect that the existence of long-term latent reservoirs in patients is the main reason for viral persistence [23,24], there is little quantitative understanding of their contribution, mainly because of the difficulty measuring the virus reservoir directly. Some HIV investigators have proposed mathematical models to describe the dynamics of long-term latent reservoirs [25][26][27]. Little research has been done to estimate the parameters of these models quantitatively based on clinical data.
In this study, the total contribution of the reservoir processes to the actively infected CD4+T cells is estimated from standard viralload time-series data. We analyze the identifiability for each parameter in Model 1 using differential algebra tools. The implementation of Bayesian estimation method is presented, and the joint posterior parameter distributions calculated by the Bayesian methods for each patient are reported.

Experimental Methods
Ethics statement. The previously published clinical study [6] was carried out in accordance with a human subjects protocol approved by the institutional ethics review committee at the University Hospital Germans Trias i Pujol in Barcelona, Spain. Written informed consent was obtained from all study participants. De-identified patient data was shared in accordance with a protocol approved by the University of Delaware Institutional Review Board.
Study design. This research described here uses data from a previously published study. The measurements which are the focus of this work have been previously described in [6]. Briefly, a randomized prospective Structured Treatment Interruption study enrolled 26 HIV-1 positive asymptomatic adults with no detectable virus for at least two years prior to entering the study (limit of detection 50 virions per ml). 14 were randomized to a control group, continuing their previous cART regimens. 12 were randomized to the experimental group, and underwent between three and five cycles of interrupted antiviral therapy, remaining off therapy until two consecutive viral load measurements above 3000 virions/mL were reached, or for a maximum of 30 days, then reinitiating the original cART regimen for 90 days before the next interruption cycle began. HIV-1 RNA PCR quantitative analysis was performed on samples collected three times weekly following treatment interruption, and then weekly for the two months following re-initiation of treatment.

Modeling Persistence of Latent Reservoirs
Although highly active antiretroviral therapy (HAART) efficiently suppresses viral load to undetectable levels, current regimens cannot eradicate the virus completely [21,26,[28][29][30]. One possible reason is the persistent replication of HIV at a very low level, even under HAART conditions [26,31,32]. Another possible reason is the existence of stable reservoirs of latently infected cells [26,33,34]. These two possibilities are not mutually exclusive, and it is likely that a combination of persistent viremia and reservoir activation combine to maintain the reservoirs and prevent eradication [26].
Rong and Perelson proposed models to describe the dynamics of the latent reservoir [26]. However, in order to be identifiable from viral-load data, the models must be simplified. Siliciano et al. [35] found that the average half-life of the latent reservoir in resting CD4 z T cells is 44 months, which means it is extremely stable. There is no strong evidence that the activation rate of the reservoir is constant, however, and a combination of various factors may contribute to the maintenance of a residual virus load during effective antiviral suppression. Therefore, in Equation 1 l y (t) represents the total average contribution of reservoir dynamics to the active infected cell compartment y during the treatment period. The time between ceasing antiviral therapy and the viral load reaching measurable levels (the rebound time) is very sensitive to the value of l y (t), and consequently the goodness of fit for the entire model is also very sensitive to l y (t). Surprisingly, as shown in our results, a patient-specific constant value for l y (t) provided an excellent fit for all ten patients for all interruption cycles, implying that the average contribution of reservoir dynamics to the active infected cell compartment is relatively constant across several interruption cycles.

Identifiability Analysis
Equation 1 is a special case of the following general nonlinear model: In this study X (P,t) is the state vector ½x,y,v', P is the parameter vector ½l,d,b,u,a,l y ,c,v', and F is the function which describes system dynamics.
We adapt the following concepts of identifiability from [36][37][38]: Definition 1. Equation 2 is said to be globally identifiable from the given states if the equation F(X (P,t),P)~F (X (P Ã ,t),P Ã ) has only one solution P~P Ã .
Definition 2. Equation 2 is said to be locally identifiable from the given states if in some open neighborhood, U p Ã , around the true parameter vector, the equation F (X (P,t),P)~F (X (P Ã ,t),P Ã ) has only one solution P~P Ã and P Ã [U p Ã .
The identifiability of HIV dynamic models has been analyzed previously [8,10]; however, these previous works assumed more than one state could be measured. Here, only the viral load data are assumed to be available. Differential algebra is used to analyze the identifiability issue of Model 1. Details of the differential algebraic techniques are found in [37][38][39]. The steps in this analysis are as follows: i) We choose the following order relation for Equation 1: vv_ v vv€ v vvv. ii) Based on the above order, the normalized characteristic polynomial of v is calculated:

iii)
By extracting the coefficients in the above polynomial we obtain the following set of identifiable parameters: iv) The identifiability of each parameter can be checked by checking the injectivity of the the map defined in Equation 4. All parameters except c and l are uniquely identifiable; the product cl is uniquely identifiable. Since the patients enrolled in the AutoVac study had all maintained undetect-  Table 1. The average correlation coefficients between parameters for all ten patients.
... able viral loads on therapy for at least two years prior to the study, it is reasonable to assume that the initial conditions of this study are the steady states in Equation 1. Therefore, the steady state of x in Equation 1, l d , is set as the concentration of CD4+ T Cells at the beginning of this study. Under this assumption, the value of l is defined as the product of the initial number of target cells x(0) and the estimated decay rate of target cell d. Although theoretically v is uniquely identifiable, the current best estimate of v is between 9 and 36|day {1 [40]. Estimation of v would require very high frequency measurements (several measurements per hour, much faster than our data). Therefore the value of v is set as 18:8|day {1 (corresponding to the geometric mean of the estimated half-lives in [40]), and the virus dynamics are treated as a singular perturbation to the system. Assuming that the true value of v is sufficiently large that the singular perturbation approximation is valid, the only identified parameter that depends on this assumption is c; the posterior distribution of estimated values for c would shift in a inversely proportional manner with a change in the assumed value of v, with no change in the posterior distributions of any other parameter.
The identifiability analysis described above does not take into account the number of measurements or the sampling rate; it simply states that for a sufficiently large number of measurements taken at a sufficiently fast sampling rate, the set of parameters described are identifiable. The uncertainty in the estimated parameter values will depend on the actual number of measurements and their frequency.

Bayesian Estimation
Initial estimates. During the AutoVac study treatment for each patient was interrupted and after a period of time, restarted. This cycle of interruption and reinstating the treatment is repeated 3 to 5 times. In order to generate an acceptable initial value for the MCMC method, a two-step least-squares method was used, using data from the first three interruption cycles. The data from the period in which the treatment is interrupted, region 1, is used to estimate the 5 parameters, l,d,b,a and c using a constrained nonlinear least squares method. The data from region 2, where  treatment is reinstated, contains information about the drug efficacy. With the value of the six parameters fixed to those values found by least squares in region 1 (as shown in Fig. 1), the data of region 2 is used to estimate the drug efficacy. The initial values obtained using this two-step procedure are used as the kernel for the MCMC methodology described below; the MCMC methodology identifies all six parameters against all data from both regions of all treatment interruptions. MCMC methodology. From the steady-state values of Equation 1, the relationship between d and l can be written as: Where x(0) is the initial measurement of CD4+ T cells taken for each patient at the beginning of the study (as in [16]). Therefore, in this method, the parameter set ½l,b,a,c,l y ,g T are estimated. g is calculated for each iteration by least-squares subject to the values of the other five parameters. The MCMC approach taken here is based on the Metropolis-Hasting algorithm [13]. Assume that the i th subject, we have m i measurements of viral load. We denote the parameters as: Following the iterative MCMC algorithm of [13], the implementation can be written as: where Ga is the gamma distribution and Wi is the Wishart prior distribution. The hyper-parameters, a,b,g,L,V and n are known with the following values: The hyper-prior values for the initial variance of h i , L, are sufficiently large that this may be considered a non-informative prior distribution. For comparison, the analysis in [16] had initial variance for l and d of 0.12 and 0.016 respectively, heavily biasing

Repeat step 2 until the chain converges.
To obtain reasonable results from the MCMC method, good initial estimates of h i are needed. The constrained least squares approach described previously is used to get an initial estimate.
The parameters are constrained so that R 0~l bc dav , which is the basic reproduction ratio during treatment interruptions, is greater than 1. If R 0 were less than 1, then the virus would not have successfully established infection. The above procedure was applied to the data for 10 patients with sufficient data. The MCMC procedure produced 200,000 possible sets of parameters for each patient that are consistent with the patients' data. For the purposes of analysis, the first 50,000 iterations were discarded to allow the chain to converge, leaving 150,000 parameter sets per patient for the final analysis. From this result, the marginal probability densities for of the six parameters can be established.

Nonlinear Least Squares Estimation
Parameter estimates were generated for each of 10 patients using the nonlinear least squares method. Of the 12 patients in the study, 2 had no detectable virus after an interruption, leaving insufficient data above the measurement threshold to identify model parameters. Although the nonlinear least-square method cannot guarantee globally optimal results, it can provide good initial estimates for the prior distribution of the MCMC method. Simulated viral load curves based on the identified parameter values using this method are compared to the measured data from Patient 2 in Fig. 2.

Bayesian Estimation
The MCMC model fitting procedure was run for each of the 10 patients with sufficient data. Histograms of the marginal posterior distributions for the six parameters for each of the 10 patients are shown in Fig. 3.
Note that the parameters are not independent, and the parameter vectors should be considered as complete sets. Table 1 shows the average correlation coefficient among each different pair of parameters; it is clear that most pairs of parameters would be considered highly correlated. In addition to this first-order correlation between parameters, there are also higher order nonlinear correlations. Fig. 4 shows the high level of correlation between the product of the parameters which form the numerator and denominator of R 0~l bc dav , further emphasizing the need to consider parameter vectors rather than individual parameters. Values chosen independently from each parameter's distribution can easily generate a parameter set which is a particularly poor representation of the data. Consequently, we also report as tables in the supplementary material the entire posterior distribution for each of the 10 patients (shown in supplementary tables S1, S2, S3, S4, S5, S6, S7, S8, S9, S10). The posterior distribution generated by this MCMC method provides a database for testing the robustness of treatment optimization strategies, such as those described in [25,[41][42][43].
The histograms in Fig. 3 show the range of values of each parameter, and Table 2 gives the maximum likelihood estimate for each parameter. From the histograms it is clear that the distributions for the parameters b,a, and c vary little between patients, indicating that the infection rate and burst size of the virus and the death rate of infected cells may be consistent across the patient population. By contrast, the parameters l, l y , and g vary substantially between patients, indicating that the regeneration rate of CD4+ T cells, the reservoir contribution rate, and the drug efficacy may vary significantly across the patient population. Table 3 gives a summary of the estimated population parameters (the average value of the 10 identified patients), compared with those from previously published papers [15,16]. R 0pre is the basic reproductive ratio R 0 of the virus without treatment. R 0post is R 0 (1{g), the reproductive ratio of the virus under treatment conditions. The values for parameters l,d,b,a, and c are consistent with the previously published best estimates for these parameters. In particular, the maximum likelihood estimates for the death rate of target cells d ranged from 0:045{0:45|day {1 , in perfect agreement with the estimates obtained from patient data in [15]. This estimate is considerably higher than the decay rate estimated in [16], but careful examination of the methods in that previous work show that the value of d was essentially constrained to the prior distribution. The ratio l d is consistent between all three studies. Our maximum likelihood estimates for the death rate of infected cells a ranged from 0:18{2:3|day {1 , in agreement with the current best estimate of 0:7{1:3|day {1 [17,26], as well as the data from the two comparable studies [15,16] [16]. Our maximum likelihood estimates of the target cell recruitment rate l ranged from 35{760|cells|mL {1 |day {1 , higher than the comparable range of 86{111|cells|mL {1 |day {1 reported in [15]; however, this is expected, as the inclusion criteria for our experiment resulted in patients with much healthier immune systems overall compared to the patients in [15]. The much lower rate of 0:6{1:6|cells|mL {1 |day {1 reported in [16] is a direct result of that study's unrealistically low prior value for d. . This suggests that the replenishment of the active compartment by the viral reservoirs is very heterogenous between patients. However, it is noteworthy that a single constant value of l y was able to accurately predict rebound time across multiple interruption cycles for the same patient despite variation in interruption length, indicating that the replenishment rate for a given patient is relatively constant over the course of the experiment. The efficacy of the antiviral drugs is estimated directly from the viral load data. To our knowledge, this is the first time this has been done. Previous estimates of model parameters have typically inferred drug efficacy from PK/PD data in the plasma, resulting in estimates of g&0:95 [15], or relied heavily on the assumption that the first measurement was at steady-state [16], resulting in estimates between 0:64ƒgƒ0:84. By contrast, our direct estimate of g from the viral load data give us maximum likelihood estimates ranging from g~0:67 to g~0:88. This indicates that the estimates based on plasma pharmacokinetic data may overestimate the true drug efficacy, though the estimates which relied exclusively on the first measurement were consistent with our results. Fig. 5 shows the histograms of the coefficient of determination R 2 for each of the 10 patients. The statistical significance threshold for the fit relative to an average measurement model was calculated using the F-test (Pv0:05 is considered statistically significant); this shows that, for all patients except Patient 6, all 150,000 parameter vectors in the posterior distribution would be considered a statistically significant fit to the data if considered in isolation.
The viral load fitted by the model using the maximum likelihood estimates of the parameters and simulation results for Patient 1 are shown in Fig. 6.

Discussion
As shown in this study, the parameters in the HIV dynamics model are heavily correlated. The strong correlation between parameters in this model mean that only the distribution of complete parameter sets (as opposed to independent distributions of each parameter) can be considered to accurately represent the fit of the model to the data. The quality and extent of the data available in this study was considerably higher than any previously published parameter estimation study, allowing for the accurate estimation of the model parameters without using the simplifying assumptions necessitated by the sparsity of data in previous studies. Furthermore, the level of excitation of the dynamics provided by the multiple interruption schedules allowed us to directly identify two parameters not previously identifiable from patient data. The results agree for the most part with previously published data, but the results are more reliable and significant given the dramatic increase in the amount of data available for identification. This paper presents, for the first time, the posterior distribution of parameters for a commonly used HIV infection model identified against measured patient data. Analysis of this distribution shows good representation of the data. As shown in the histograms of Fig. 5, all parameter estimates in the posterior distributions for all patients except a subset of patient 6 would be considered statistically significant by the standard F Test (Pv0:05), implying that the reported posterior distribution describes the range of feasible parameter values based on the measured data well. Inspection of the data from patient 6 shows the data available above the limit of detection are limited compared to the other patients, which can explain the high value of R 2 corresponding to an F-test P value of 0.05 for that patient. However, even for this patient, more than half of estimates are still statistically significant, and there were 39 measurement points above the limit of detection, substantially more measurements than were available for any previously published parameter estimation study.
The multiple interruptions in the patient data provided the opportunity to quantify the contribution rate of viral reservoirs to the active infected cell compartment. These rates varied widely from patient to patient. The rate was characterized in terms of number of productively infected cells produced per day, and equally well describes such diverse potential reservoir processes as low-level persistent replication, viral blipping, and spontaneous reactivation of quiescent cells. The overall fit is highly sensitive to the rebound time, and the rebound time is uniquely determined by the reservoir contribution rate. It is surprising, therefore, that a single value for the reservoir contribution rate was able to describe the data well over multiple interruptions in the same patient; this strongly suggests that the underlying process represented by the parameter l y is continuous rather than bursting in nature. The quantification and understanding of the viral reservoir dynamics is of critical importance to understanding the nature of ongoing viral evolution under conditions of effective suppression, and will be a necessary precursor to any attempts to flush the reservoirs and achieve a functional cure for HIV.
The results presented here also show that the overall drug efficacy is between 0:6 and 0:9, and that the effective reproductive ratio of the virus while on therapy is between 0:1 and 0.8, verifying the results presented in [16]. It is important to emphasize that, while the antiviral drug regimens are very successful at suppressing overall virus load, they are not perfect in inhibiting infection, and de novo infection events continue to occur even under highly effective therapy. The results presented here also verify the previously reported findings in [15] that uninfected CD4+ target cells have a half-life of approximately 2{15 days in vivo, which is substantially shorter than estimates obtained in vitro. This is likely due to the higher state of activation of the CD4+ T-cells during active viremia in vivo, but whatever the cause, it is a consistent finding across multiple experiments.
In addition to being the most accurate way of describing the fit of a model to data with high levels of measurement uncertainty, the publication of complete parameter distributions identified from patient data also has significant practical importance. A growing number of model-based interventions using variations of the model described in Equation 1 are being proposed, including our own methods designed to minimize the risk of resistance emerging during antiviral regimen switching [25,41,42,[44][45][46][47]. Most of these methods have used either nominal parameters or a single parameter set. The parameter distributions published in this work provide a parameter set against which the robustness of a proposed model-based method to expected patient variation can be tested. The data used in this paper came from a cohort restricted to patients with good immunological control of the virus under antiviral suppression, so the distribution of parameters can only be said to be representative of such a subgroup of HIV-infected persons. However, the publication of the identification methods will likely lead to the publication of parameter distributions from patients in other studies in the future, leading to a growing library of virtual patients. Table S1 The posterior distribution {h i } for Patient 1. The first 50,000 rows were excluded from analysis to allow for convergence of the Markov Chain. (XLSB)