Mean Recency Period for Estimation of HIV-1 Incidence with the BED-Capture EIA and Bio-Rad Avidity in Persons Diagnosed in the United States with Subtype B

HIV incidence estimates are used to monitor HIV-1 infection in the United States. Use of laboratory biomarkers that distinguish recent from longstanding infection to quantify HIV incidence relies on accurate knowledge of the average time that individuals spend in a transient state of recent infection between seroconversion and reaching a specified biomarker cutoff value. This paper describes five estimation procedures from two general statistical approaches: a survival time approach, and an approach that fits binomial models of the probability of being classified as recently infected as a function of time since seroconversion. We compare these procedures for estimating the mean duration of recent infection (MDRI) for two biomarkers used by the U.S. National HIV Surveillance System for determination of HIV incidence, the Aware BED EIA HIV-1 incidence test (BED) and the avidity-based, modified Bio-Rad HIV-1/HIV-2 plus O ELISA (BRAI) assay. Collectively, 953 specimens from 220 HIV-1 subtype B seroconverters, taken from 5 cohorts, were tested with a biomarker assay. Estimates of MDRI using the non-parametric survival approach were 198.4 days (SD 13.0) for BED and 239.6 days (SD 13.9) for BRAI, using cutoff values of 0.8 normalized optical density and 30%, respectively. The probability of remaining in the recent state as a function of time since seroconversion, based upon this revised statistical approach, can be applied in the calculation of annual incidence in the United States.


Introduction
Measurement of the number of new HIV-1 infections per year and the annual rate at which incident infections occur is important for tracking HIV and monitoring transmission, for evaluation of preventive interventions, and for resource allocation. Application of laboratory methods to distinguish recent from non-recent HIV infection, based upon characteristics of the antibody response early after seroconversion, has led to a less costly approach in the estimation of HIV incidence [1]. Since the publication of a sensitive/less-sensitive testing algorithm in 1998 [2], a number of bioassays have been developed to detect recent infections [3][4][5][6]. Many of these assays have recently received a formal evaluation of performance as an incidence assay by the Consortium for the Evaluation and Performance of HIV Incidence Assays [7].
For future biomarker-based HIV incidence estimates using the stratified extrapolation approach [8], the U.S. National HIV Surveillance System (NHSS) will transition from using the Aware BED EIA HIV-1 Incidence test (BED) to the Bio-Rad avidity index (BRAI), a modification of the HIV-1/HIV-2 plus O ELISA test, to classify infections as recent vs. non-recent. The accuracy of new incidence estimates will depend on having an accurate estimate of the mean duration of recent infection (MDRI) and an accurate estimate of the distribution of recency duration from a representative sample of U.S. subtype B incident HIV infections. It is important to estimate MDRI for both BED and BRAI using the same statistical procedures to facilitate bridging between NHSS historic and future trends in HIV incidence.
A critical parameter in the estimation of HIV incidence, MDRI for a given bioassay such as BED or BRAI, is defined as the average length of time, over a fixed time T, that persons with newly acquired infection are classified by the bioassay as having recently acquired infection.

MDRI is expressed as

$$\Omega_T = \int_0^T P_R(t)\,dt,$$

where $P_R(t)$ is the probability of obtaining a recent result and $t$ is the time since detectable infection. $T$ was chosen to capture most of the dynamic range of a bioassay; i.e., $P_R(t) \approx 0$ for $t > T$ [9]. Estimation of $\Omega_T$ consists of analyzing longitudinal data from HIV seroconverters, where standard methods, such as survival analysis or binomial regression, must be carefully adapted to consider fluctuations of HIV-infected persons in and out of the bioassay-defined state of recent infection and imprecisely known seroconversion times. This report details two general approaches used to estimate $\Omega_T$ and provides estimates of the $\Omega_T$ incidence parameter for both BED and BRAI for persons infected with subtype B, the predominant subtype in the United States. We also describe the distribution of recency duration for application to NHSS HIV incidence data.

Materials and Methods
Biomarker data were generated from longitudinal specimens of therapy-naïve individuals from the United States infected with HIV-1 subtype B whose last negative and first positive HIV antibody tests were less than 365 days apart. Specimens were collected as part of various cohort studies, including plasma samples from the HIV Network for Prevention Trials (HIVNET) [10], the AIDSVAX B/B candidate vaccine phase III trial (Vax004) [11], the Multicenter AIDS Cohort Study (MACS) [12], and the Seroconversion Incident Panel Project (SIPP) in collaboration with SeraCare Life Sciences, Inc. in Milford, MA. In addition, biomarker data were generated from commercially available seroconversion panels obtained from SeraCare Life Sciences (previously known as Boston Biomedical Inc. (BBI)) (Table 1). Specimens were unlinked from personal identifiers, and their use was determined not to be human subjects research by the Centers for Disease Control and Prevention.
The principles of the BED-capture enzyme immunoassay, which measures the proportion of immunoglobulin G that is specific to HIV, and of the Bio-Rad GS HIV-1/HIV-2 Plus O EIA, which measures antibody avidity and exploits the tendency of maturing antibodies to bind less strongly to the antigen early after infection, have been described in detail elsewhere [4,13,14]. The thresholds used to classify recent vs. long-standing infection were 0.8 normalized optical density (OD) for BED and 30% for BRAI [15][16].
Two general approaches were used to estimate $\Omega_T$: (i) survival time methods, and (ii) binomial models of the probability of testing recent as a function of time since seroconversion. The fitted models were used to calculate $\Omega_T$ at the specified recency thresholds and within the dynamic range of each bioassay, at $T = 2$ years post seroconversion. Differences in survival time distributions by cohort were assessed using a log-rank test with a p-value adjustment for multiple comparisons.
In the first approach, the Kaplan-Meier estimator of the survival function describing the probability of being in the recent state as a function of time since seroconversion was estimated by a step function using a maximum likelihood approach, after approximating entry (seroconversion) and exit (transition from recent to non-recent) times for each subject. To use the Kaplan-Meier estimator, the time spent in the recent state must be known or right-censored (have a lower bound). Before application of the Kaplan-Meier estimator, seroconversion times were approximated by the midpoints of seroconversion intervals, i.e., the time between last HIV-negative and first HIV-positive tests. This approximation assumes the seroconversion times were uniformly distributed within the seroconversion intervals. The times of transitions from recent to non-recent infection were estimated using linear interpolation or regression (see below) between measurement readings. If a person did not have a non-recent result, then the time in the recent state was right-censored; i.e., time in the state was greater than the time from estimated seroconversion to the subject's last visit (lower bound for survival time). Given the resulting set of distinct times spent in the recent state after seroconversion, the estimator for the probability of being in the recent state at time $t$ post-seroconversion is given by

$$\hat{S}(t) = \prod_{i:\, t_i \le t} \left(1 - \frac{d_i}{n_i}\right),$$

where $d_i$ is the number of transitions from recent to non-recent occurring at time $t_i$ post-seroconversion, and $n_i$ is the number of subjects in the sample at time $t_i$ after seroconversion (the number of subjects who have not yet transitioned and not been censored by time $t_i$); $\hat{S}(t) = 1$ for $t < t_1$. $\Omega_T$ was estimated by $\int_0^T \hat{S}(t)\,dt$. The survival probability function was computed from the Kaplan-Meier step function, with probabilities given at each day between 0 and $T$ days.
Gaps between days were filled with the probability for the last day before the gap, and for those days with an estimated probability that included a fraction of a day, a weighted sum of probabilities was computed for the day, with weights based upon the duration per fraction. A correction to the Kaplan-Meier estimator was applied by reclassifying the maximum observation as an event if it was censored [17]. In addition to the non-parametric survival approach, survival times spent in the recent state, as defined above, were modelled parametrically by the Weibull distribution. The probability density function for the Weibull distribution is given by

$$g(t; \alpha, \beta) = \frac{\alpha}{\beta}\left(\frac{t}{\beta}\right)^{\alpha - 1} \exp\left[-\left(\frac{t}{\beta}\right)^{\alpha}\right],$$

where $\beta$ is the Weibull scale parameter, or spread of the distribution, and $\alpha$ is the shape parameter.

Table 1. Characteristics of data used in estimation of MDRI by data source. The median and interquartile range of distributions are given for the HIV-negative seroconversion interval, i.e., time between last HIV-negative and first HIV-positive tests; the HIV-positive follow-up, reflecting the total duration of observation after testing positive; the number of HIV-positive samples; and the HIV-positive sampling intervals or times between consecutive samples.

Subjects observed often enough during the phase when the bioassay was near the recent/non-recent threshold may exhibit multiple fluctuations back and forth between the two states, a feature neglected in our previous estimates of MDRI [18]. For these subjects, predicted exit times were computed from linear regression of the bioassay measurement values on approximated days since seroconversion. The regression used all data between the observation prior to the time when the threshold was first reached and either the first observation above the threshold after the last value below it, or the last observation for those individuals not observed above the threshold at the end of their follow-up. If the prediction slope was negative, time was right-censored at the observation prior to when the threshold was first reached.
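The non-parametric survival procedure described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the SAS implementation used for the reported estimates: the function names (`midpoint`, `kaplan_meier`, `mdri_from_steps`) and the four example durations are hypothetical, and exit times are assumed to have been approximated already by interpolation or regression.

```python
from typing import List, Tuple


def midpoint(last_negative_day: float, first_positive_day: float) -> float:
    """Approximate the seroconversion time by the midpoint of the interval."""
    return (last_negative_day + first_positive_day) / 2.0


def kaplan_meier(durations: List[float], events: List[bool],
                 treat_last_as_event: bool = False) -> List[Tuple[float, float]]:
    """Return [(t_i, S(t_i))] at each distinct observed transition time.

    durations: days in the recent state (a lower bound if right-censored).
    events: True if the recent -> non-recent transition was observed.
    treat_last_as_event: if True, the largest observation is reclassified
        as an event when censored (the correction mentioned in the text).
    """
    if treat_last_as_event:
        t_max = max(durations)
        events = [e or t == t_max for t, e in zip(durations, events)]
    event_times = sorted({t for t, e in zip(durations, events) if e})
    steps = []
    surv = 1.0
    for t_i in event_times:
        n_i = sum(1 for t in durations if t >= t_i)  # still at risk at t_i
        d_i = sum(1 for t, e in zip(durations, events) if e and t == t_i)
        surv *= 1.0 - d_i / n_i
        steps.append((t_i, surv))
    return steps


def mdri_from_steps(steps: List[Tuple[float, float]], T: float) -> float:
    """Integrate the step function S(t) from 0 to T (S = 1 before t_1)."""
    area = 0.0
    prev_t, prev_s = 0.0, 1.0
    for t_i, s_i in steps:
        if t_i >= T:
            break
        area += prev_s * (t_i - prev_t)
        prev_t, prev_s = t_i, s_i
    area += prev_s * (T - prev_t)
    return area


# Hypothetical durations (days); the third subject is right-censored.
example_steps = kaplan_meier([100, 200, 300, 400], [True, True, False, True])
example_mdri = mdri_from_steps(example_steps, T=730)
print(round(example_mdri, 1))
```

Because the estimate is the area under a step function, it changes only at observed transition times; censored subjects contribute through the at-risk counts rather than as transitions.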
In the second adapted approach [19], the probability of testing recent was modeled by fitting the binary non-recent/recent classification, instead of the original OD or avidity value for BED and BRAI, respectively. A linear binomial regression model, assuming a logit (inverse sigmoidal logistic) parametric form for the probability of testing recent as a function of time since seroconversion, i.e., $g(p) = \log(p/(1 - p))$, was fit to the data using a maximum likelihood approach. Model goodness of fit was assessed using the Akaike information criterion (AIC) [20]. The model form was $g(P_R(t)) = \sum \beta t$, where $g(\cdot)$ was the logit link function and $\sum \beta t$ the linear predictor. The predictor was log-transformed time since seroconversion. The seroconversion times were approximated by the midpoints of the seroconversion intervals prior to model fitting. $\Omega_T$ was estimated by $\int_0^T g^{-1}(\sum \beta t)\,dt$ using the trapezoidal rule for 731 days (0 to $T$). Bootstrapping was performed by sampling from subjects; 1000 replicates were computed. To account for a potential subject-level clustering effect on estimates of MDRI, a random intercept binomial model with the logit parametric form was fit to the data using a pseudo-likelihood technique [21]. The linear predictor is of the form $\sum \beta t + \gamma$, where $\gamma$ is the subject-specific random effect. The minimum and maximum cluster sizes per subject were 1 and 15, respectively.
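The integration step of the binomial approach can be illustrated with a short Python sketch. It assumes a linear predictor with one intercept and one slope on log-transformed days, and the coefficient values `b0 = 16.0` and `b1 = -3.0` are purely hypothetical, chosen only so that the curve falls from 1 toward 0 within two years; they are not fitted values from this study. The function names are likewise illustrative.

```python
import math


def p_recent(t_days: float, b0: float, b1: float) -> float:
    """Inverse logit of b0 + b1 * log(t): probability of testing recent.

    Assumption: b1 < 0, so the probability tends to 1 as t -> 0; the
    value at t = 0 is therefore set to 1 instead of evaluating log(0).
    """
    if t_days <= 0:
        return 1.0
    eta = b0 + b1 * math.log(t_days)
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)  # numerically stable branch for negative eta
    return z / (1.0 + z)


def mdri_trapezoid(b0: float, b1: float, T: int = 730) -> float:
    """Trapezoidal rule over the 731 daily points 0, 1, ..., T."""
    p = [p_recent(t, b0, b1) for t in range(T + 1)]
    return sum((p[i] + p[i + 1]) / 2.0 for i in range(T))


# Hypothetical coefficients, for illustration only.
omega_T = mdri_trapezoid(b0=16.0, b1=-3.0)
print(f"Illustrative MDRI: {omega_T:.1f} days")
```

With a daily grid the trapezoidal rule reduces to averaging adjacent daily probabilities, so the result is in days and is bounded above by T.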
In addition to this parametric approach, a generalized additive model (GAM) was fit to the binomial data with a logit link. The additive model generalizes the linear binomial model by modeling the expectation of testing recent as $g(P_R(t)) = \sum C(t)$, where $\sum C(t)$ was a smooth function of time since seroconversion, estimated non-parametrically with flexible spline terms for the predictor [22]. The degrees of freedom determine the amount of smoothing. A cross-validation method was used to optimize the effective number of parameters defined by the degrees of freedom [23]. The 'log-transformed time since seroconversion' term was fit by using a univariate smoothing spline with four degrees of freedom; of these, one was taken up by the linear portion of the fit and three by the nonlinear spline portion. As with the parametric binomial model, bootstrapping was performed to derive measures of accuracy.
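Resampling whole subjects, rather than individual specimens, keeps each subject's repeated measurements together in every replicate. The Python sketch below shows one way such a subject-level bootstrap could be organized; the function `bootstrap_by_subject`, the toy estimator, and the three-subject data set are hypothetical stand-ins, and in practice the estimator would refit the binomial model or GAM and integrate it to obtain MDRI.

```python
import random
from statistics import mean, stdev
from typing import Callable, Dict, List, Tuple


def bootstrap_by_subject(data: Dict[str, List[int]],
                         estimator: Callable[[List[List[int]]], float],
                         n_replicates: int = 1000,
                         seed: int = 0) -> Tuple[float, float]:
    """Resample subjects with replacement and re-apply the estimator.

    data: subject id -> that subject's observations (kept together).
    Returns the mean and standard deviation of the replicate estimates.
    """
    rng = random.Random(seed)
    ids = list(data)
    estimates = []
    for _ in range(n_replicates):
        sampled_ids = [rng.choice(ids) for _ in ids]  # same number of subjects
        replicate = [data[s] for s in sampled_ids]
        estimates.append(estimator(replicate))
    return mean(estimates), stdev(estimates)


def pooled_proportion_recent(replicate: List[List[int]]) -> float:
    """Toy estimator: share of recent (1) results across all specimens."""
    flat = [x for subject in replicate for x in subject]
    return sum(flat) / len(flat)


# Hypothetical recent (1) / non-recent (0) classifications per subject.
toy_data = {"s1": [1, 1, 0], "s2": [1, 0, 0, 0], "s3": [1, 1, 1]}
boot_mean, boot_sd = bootstrap_by_subject(toy_data, pooled_proportion_recent)
print(boot_mean, boot_sd)
```

The standard deviation across replicates serves as the bootstrap standard error; sampling by subject lets cluster sizes vary between replicates, as they do in the observed data.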
SAS software, version 9.3 (SAS Institute Inc., Cary, NC, USA) was used to implement all steps of the estimation methods.

Results
Collectively, 953 specimens from 220 HIV-1 subtype B seroconverters taken from 5 cohorts were tested using either BED, BRAI, or both (Table 1, S1 and S2 Tables). The median time between last HIV-negative and first HIV-positive specimens, based upon Western blot diagnostic testing criteria, was 180 (interquartile range (IQR) 114, 192) days. The median duration of follow-up after HIV-positive diagnosis was 208 (IQR 65, 510) days; there were a median of 4 (IQR 3, 5) specimens per subject, sampled approximately every 51 (IQR 28, 95) days. The percentage of subjects who did not enter the non-recent state during the course of their follow-up, as defined by the bioassay threshold, was 29% (61/209) for BED and 38% (62/162) for BRAI. There were no significant differences by cohort in the BED or BRAI probability distributions for remaining in the recent state (p>0.05).
Most subjects showed increases in HIV antibody levels over time, though there was heterogeneity among individual responses (Fig 1) for both BED and BRAI. The estimates for $\Omega_T$ ranged from 198.4 to 215.7 days for BED and from 239.3 to 253.6 days for BRAI, differences of approximately 2-3 weeks between estimation methods (Table 2). The relative standard error (RSE), i.e., standard error divided by the mean, ranged from 5.8 to 8.0% for BED and 5.4 to 7.0% for BRAI, with the lower RSEs associated with the survival procedures or the binomial random intercept models, suggestive of slightly better fit, but with similar dispersion observed for the two bioassays.
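As a worked check of the RSE definition, and assuming that the dispersion values reported with the non-parametric survival estimates (13.0 days for BED and 13.9 days for BRAI) play the role of the standard error, the arithmetic is:

$$\mathrm{RSE}_{\mathrm{BED}} = \frac{13.0}{198.4} \approx 6.6\%, \qquad \mathrm{RSE}_{\mathrm{BRAI}} = \frac{13.9}{239.6} \approx 5.8\%.$$

Both values fall within the ranges quoted for the respective bioassays.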

Discussion
In this report we have used five estimation procedures based on two statistical approaches to estimate the MDRI, a key parameter in estimation of population-level HIV incidence, and the distribution of time from seroconversion to non-recent classification, which can be used by NHSS to estimate the number of new HIV-1 infections for monitoring of trends in incidence at the national and state levels. Resulting estimates of $\Omega_T$ ranged from 198.4 to 215.7 days for BED and 239.3 to 253.6 days for Bio-Rad avidity. These results constitute revised estimates for the BED mean duration of recency used in the HIV-1 incidence algorithm [18] and are consistent with other recently published estimates of BED MDRI [19,24]. Differences in $\Omega_T$ due to revisions will be inversely proportional to differences in estimated incidence when $\Omega_T$ is applied to NHSS incidence data, because this parameter is in the denominator of the incidence algorithm [20,25,26], and will result in lower estimates of HIV-1 incidence than previously published [27,28]. Improved estimation methods that account for the wide variability in individual response to antibody maturation, and specifically the potential fluctuations around the bioassay threshold, have been implemented. In addition, we have developed a random intercept model for the binomial approach to appropriately account for varying subject-level cluster sizes. The resulting MDRIs based upon our data were 1-2 weeks shorter when estimated from the random effects models relative to the logit models, which assumed independence of observations.
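The inverse relationship can be written schematically. Assuming, as a simplification of the incidence algorithm, that estimated incidence $I$ depends on the recency parameter only through a factor $1/\Omega_T$ with all other inputs held fixed, then

$$\frac{I_{\text{revised}}}{I_{\text{previous}}} = \frac{\Omega_T^{\text{previous}}}{\Omega_T^{\text{revised}}}.$$

For instance, a hypothetical revision that lengthened $\Omega_T$ by 10% would scale the incidence estimate by $1/1.10 \approx 0.91$, i.e., about 9% lower; the 10% figure is illustrative only and is not taken from the data.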
Estimation of $\Omega_T$ in general, and our estimation work in particular, have important limitations. Though we restricted our analyses to study cohorts and commercially available seroconverter panels of data from the United States for estimation of $\Omega_T$, a key difference from previously published results, these data may not be representative of all newly diagnosed HIV-1 subtype B infections in the United States. In addition, new diagnostic testing recommendations may lengthen MDRI by shortening the duration of time between transmission and detected infection. Future estimates will require an adjustment to MDRI or use of new seroconversion panel data from subjects diagnosed consistent with new HIV diagnostic testing algorithms. An important assumption is that of uniformly distributed seroconversion times in the seroconversion interval, the time from last HIV-negative to first HIV-positive tests. The assumption of uniformly distributed seroconversion times may be reasonable for individuals enrolled in studies that schedule visits. However, testing and seroconversion times may not be independent for persons motivated to test by suspicion of infection or with concomitant needs for clinical care, e.g., diagnoses of STDs or pregnancy [29][30][31][32]. Results from simulation studies demonstrated that the estimation methods described in this report were robust to most data issues, e.g., duration of time between last HIV-negative and first HIV-positive tests, frequency of HIV-positive sampling, and loss to follow-up [33]. However, when data issues were compounded, e.g., when both HIV-negative and HIV-positive sampling intervals were lengthy, absolute bias increased. Although a number of methods can be used to estimate $\Omega_T$, there were differences in accuracy and precision depending upon the data quality.
For example, models we implemented in estimation of MDRI that relied on a parametric assumption or ignored within-subject correlation in the presence of varying cluster sizes (measurements per subject) did not perform as well as the non-parametric survival approach in the presence of increasing measurement noise with T = 2 years. Although the estimates reported herein will require validation from other studies of seroconverters, the MDRI estimates based upon the revised survival approach, 198.4 days for BED and 239.6 days for BRAI, and more specifically, the probability of remaining in the recent state as a function of time since seroconversion, will be applied in calculation of annual incidence in the United States as described previously [25,27].
Supporting Information
S1 Table. Spreadsheet with BED OD values at times since estimated time of seroconversion, assayed for 858 specimens from 209 HIV-1 subtype B seroconverters, taken from 4 data cohorts. (XLSX)
S2 Table. Spreadsheet with Bio-Rad avidity values at times since estimated time of seroconversion, assayed for 749 specimens from 162 HIV-1 subtype B seroconverters, taken from 3 data cohorts.