Towards Estimation of HIV-1 Date of Infection: A Time-Continuous IgG-Model Shows That Seroconversion Does Not Occur at the Midpoint between Negative and Positive Tests

Estimating date of infection for HIV-1-infected patients is vital for disease tracking and informed public health decisions, but is difficult to obtain because most patients have an established infection of unknown duration at diagnosis. Previous studies have used HIV-1-specific immunoglobulin G (IgG) levels as measured by the IgG capture BED enzyme immunoassay (BED assay) to indicate if a patient was infected recently, but a time-continuous model has not been available. Therefore, we developed a logistic model of IgG production over time. We used previously published metadata from 792 patients for whom the HIV-1-specific IgG levels had been longitudinally measured using the BED assay. To account for patient variability, we used mixed effects modeling to estimate general population parameters. The typical patient IgG production rate was estimated at r = 6.72 [approximate 95% CI 6.17, 7.33] × 10⁻³ OD-n units day⁻¹, and the carrying capacity at K = 1.84 [1.75, 1.95] OD-n units, predicting how recently patients seroconverted in the interval t̂ = (31, 711) days. Final model selection and validation were performed on new BED data from a population of 819 Swedish HIV-1 patients diagnosed in 2002–2010. On an appropriate subset of 350 patients, the best model parameterization had an accuracy of 94% in finding a realistic seroconversion date. We found that seroconversion on average is at the midpoint between last negative and first positive HIV-1 test for patients diagnosed in prospective/cohort studies such as those included in the training dataset. In contrast, seroconversion is strongly skewed towards the first positive sample for patients identified by regular public health diagnostic testing as illustrated in the validation dataset.
Our model opens the door to more accurate estimates of date of infection for HIV-1 patients, which may facilitate a better understanding of HIV-1 epidemiology on a population level and individualized prevention, such as guidance during contact tracing.


Introduction
Accurately estimating incidence of an infectious disease is vital for informed and targeted prevention, and knowing the date of infection per case is important for estimating the incidence in a population. For acute infections, like influenza, it is relatively straightforward to infer the date of infection because it occurred just shortly before the diagnosis. For chronic infections, like human immunodeficiency virus type 1 (HIV-1) infection, it is more complicated to infer the date of infection because only rarely are persons diagnosed during primary HIV-1 infection (PHI). Instead, most diagnosed persons have an established HIV-1 infection of unknown duration. Consequently, the World Health Organization (WHO), the Joint United Nations Programme on HIV/AIDS (UNAIDS), as well as national public health institutes usually simply report the number of newly diagnosed cases. Due to the current problems with HIV-1 incidence estimation, there is considerable interest in the development of assays and biomarkers that can determine if an HIV-1 infection is recent, in order to allow for estimating HIV-1 incidence in a population [1,2,3,4,5,6,7,8].
Seroconversion occurs on average 21 days after HIV-1 infection [9,10], and is thus a useful date to infer by serology. Serological assays are based on knowledge of the development and maturation of the HIV-1 antibody response in infected persons (reviewed in [3,4,6,11]). These assays are collectively referred to as Serological Testing Algorithm for Recent HIV Seroconversion (STARHS) [4] or Recent Infection Testing Algorithm (RITA) [2]. In 1998 Janssen et al. described the first mathematical method that was specifically developed to estimate HIV-1 incidence using a cross-sectional sampling approach [12]. This method used results from a "less-sensitive" (or detuned) version and a standard version of an HIV-1 enzyme-linked immunoassay (EIA). Since then, additional assays have been developed, such as the IgG capture BED enzyme immunoassay (BED assay) [13], the IDE-V3 assay [14], and several different avidity assays (reviewed in [15]). Adjustments of Janssen's original formula have also been presented [16,17]. The BED assay, which was developed by the US Centers for Disease Control and Prevention (CDC), has been commercialized. The assay name 'BED' signifies that it is based on a trimeric branched peptide with each branch derived from the immunodominant region of the gp41 glycoprotein of HIV-1 subtype B, circulating recombinant form (CRF) 01_AE or subtype D to overcome subtype-specific differences associated with some other assays [4]. Importantly, the BED assay, like most other serological assays, has been designed for incidence estimates in populations. At present, it provides a binary result, i.e., recent vs. long-term infection based on a cutoff value of a normalized optical density (OD-n = 0.8) in the EIA, rather than a quantitative estimation of time since seroconversion.
The mean time interval from seroconversion to this cutoff value, i.e., the mean recency period, has been estimated at around 180 days, with some differences between genetic subtypes and populations [18]. The cutoff value was optimized to minimize misclassification of recent and long-term infections, but such misclassifications still occur. For instance, it is well-established that the BED assay can give a false impression of recent infection for some patients with advanced disease because HIV-1 antibody levels to the BED peptides sometimes wane with advancing immunodeficiency [13]. For this reason different approaches to adjust BED incidence estimates have been suggested [16,17].
The objectives for this study were: 1) To create a biologically motivated time-continuous model of the production of BED-specific IgG (BED IgG); 2) To address the patient variability of the BED IgG growth following HIV-1 seroconversion; 3) To critically examine the common modeling assumption that seroconversion happens at the midpoint between last negative and first positive HIV-1 test result; and 4) To reevaluate national Swedish surveillance data utilizing BED data. To achieve these goals, we explored various parameterizations of a basic logistic growth model describing the production of BED IgG, trained on a large cohort metadata set from a recent study by Parekh et al. [18]. To account for patient variability, universal parameter values were estimated using mixed effects modeling, and final model selection and validation were performed on a second large dataset, consisting of new BED data from Swedish patients newly diagnosed with HIV-1 infection between 2002 and 2010. While we informed the model with BED assay results, because it is currently the most used biomarker for recency estimation, our model could be adjusted to other available and future serological biomarkers as well as be included in multi-assay approaches [8,19,20].

Ethical Approval
For the new data in this study, collected in Sweden, informed written or oral consent was obtained from all adult participants and from the next of kin, caregivers or guardians on behalf of participants who were minors or children. The research was conducted according to the Declaration of Helsinki and was approved by the Regional Medical Ethics Board in Stockholm, Sweden, which had permitted the use of oral consent to minimize the risk of selection biases due to patient drop-out because some ethnic groups of participants were known to be willing to take part in the study, but reluctant to provide written consent (Dnr 02-367, 04-797 and 2007/1533). Written or oral consent was documented in the patient clinical records.

Study Populations and BED Measurements
To infer parameters for our time-continuous IgG model, we used BED data from a previous meta-study by Parekh et al. [18], encompassing 756 HIV-1 diagnosed patients sampled at regular intervals in 16 cohorts. These metadata came from longitudinal cohorts where patients were tested at regular intervals with a maximum time span between the last HIV-negative and first HIV-positive test of 365 days (median = 168 days), where the authors assumed that seroconversion occurred at the interval midpoint. We used patients sampled at least twice, resulting in 2975 OD-n measurements from 718 HIV-1 patients.
In addition, we performed BED-testing on plasma samples from 819 patients who were previously diagnosed as HIV-1-infected in Sweden in 2002-2010. These patients are a subset from a recently published study of transmitted drug resistance [21]; we included patients who were living in Sweden when they became infected, whereas patients infected before first arrival in Sweden were not included. The study population constituted 68% (819 of 1196) of all Swedish patients in this category who were diagnosed during the study period and they also accurately reflect the entire population with respect to gender, age, transmission routes, and infections with various HIV-1 subtypes (approximately 40% subtype B, and further subtypes A, C, D, CRF01, CRF02 and others). In contrast to the patients studied by Parekh et al., the Swedish patients did not undergo HIV-testing at regular intervals as they were identified by regular public health diagnostic testing. Nevertheless, for 523 of the 819 Swedish patients we had the date of the last negative HIV-test result.
For the Swedish samples BED OD-n was measured using the Aware™ BED™ EIA HIV-1 Incidence Test (Calypte Biomedical Corporation, Portland, OR, USA) according to the manufacturer's instructions on a Dynex Technologies MRX Revelation spectrophotometer. A calibrator with a known amount of HIV-1-specific IgG is used in order to make individual runs comparable. Thus, a normalized OD value (OD-n) for each well is calculated by dividing the raw OD measurement by the median calibrator value of that individual run. As specified by the manufacturer, samples with OD-n values < 1.2 were rerun in triplicate and the median value was used.
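The normalization and rerun rule described above can be illustrated with a short sketch. The function names, the example well values, and the handling of a missing triplicate are hypothetical and are not taken from the manufacturer's protocol; only the division by the median calibrator value and the rerun threshold of 1.2 come from the text.

```python
from statistics import median


def normalized_od(raw_od, calibrator_ods):
    """Return OD-n: the raw OD divided by the median calibrator OD of the same run."""
    calibrator = median(calibrator_ods)
    if calibrator <= 0:
        raise ValueError("calibrator OD must be positive")
    return raw_od / calibrator


def reported_od_n(initial_od_n, rerun_od_ns=(), rerun_threshold=1.2):
    """Apply the rerun rule: low samples are replaced by the median of a triplicate rerun."""
    if initial_od_n >= rerun_threshold:
        return initial_od_n
    if len(rerun_od_ns) != 3:
        raise ValueError("samples below the threshold need a triplicate rerun")
    return median(rerun_od_ns)


# Hypothetical run: three calibrator wells and one patient well.
od_n = normalized_od(0.60, [0.70, 0.75, 0.80])
print(round(od_n, 2))
```

Because every well is divided by the calibrator of its own run, OD-n values from different runs can be compared directly.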

A Logistic Model Describes BED IgG Production
Similar to many biological systems where the rate of reproduction is proportional to the existing population and limited resources, the growth of HIV-1-specific IgG following seroconversion can be modeled by a logistic function,

$$\frac{dP_{IgG}}{dt} = r\,P_{IgG}\left(1 - \frac{P_{IgG}}{K}\right),$$

where r is the growth rate of HIV-1-specific IgG and K is the limiting factor, $\lim_{t\to\infty} P_{IgG}(t) = K$, also known as the carrying capacity. We assume that the carrying capacity does not change over the time we are interested in, i.e., during the ramp-up of IgG directed to HIV-1 in the first few years of infection. The solution to this differential equation is

$$P_{IgG}(t) = \frac{K\,P_{IgG}(0)\,e^{rt}}{K + P_{IgG}(0)\left(e^{rt} - 1\right)}.$$

We focus on the portion of HIV-1-specific IgG that is measured by the BED assay [13,18,22], henceforth referred to as "BED IgG". This assay measures the absorbance of light (λ = 450 nm with reference 630-650 nm) of HIV-1-specific IgG complexes. According to Beer-Lambert's law, the absorbance (optical density, OD) is directly proportional to the concentration of the absorbing species, OD IgG (λ) = log₁₀(I₀/I), where I₀/I is the ratio of incident to transmitted light intensity for a solution containing BED IgG complexes. The measured OD IgG is normalized using an assay standard as described above; the calibrated OD value is denoted OD-n.
Hence, because OD-n operates on a logarithmic scale, the logistic function is transformed to a linear-asymptotic curve (Eq. 2), in which r is modeled as the logarithm to enforce positivity of the rate constant, ensuring that the curve will reach the asymptote K.
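As an illustration of the two curves discussed in this section, the sketch below evaluates the closed-form logistic solution and one plausible form of a linear-asymptotic curve. The asymptotic expression K + (OD0 − K)·exp(−exp(log r)·t) is an assumption, chosen because it is the standard three-parameter asymptotic regression curve (the form used by R's SSasymp self-starting model); the exact expression of Eq. 2 is not recoverable from this text. The starting value p0 = 0.05 is hypothetical, while r and K are the typical-patient estimates reported in the abstract.

```python
import math


def logistic_igg(t, p0, r, k):
    """Closed-form solution of dP/dt = r * P * (1 - P / K) with P(0) = p0."""
    growth = math.exp(r * t)
    return k * p0 * growth / (k + p0 * (growth - 1.0))


def asymptotic_od_n(t, od0, log_r, k):
    """Assumed linear-asymptotic curve: K + (od0 - K) * exp(-exp(log_r) * t).

    The rate enters as exp(log_r), which keeps it positive for any real log_r.
    """
    return k + (od0 - k) * math.exp(-math.exp(log_r) * t)


R_PER_DAY = 6.72e-3  # typical-patient rate reported in the abstract
K_OD_N = 1.84        # typical-patient carrying capacity reported in the abstract

for days in (0, 100, 365, 711):
    curve = asymptotic_od_n(days, 0.0, math.log(R_PER_DAY), K_OD_N)
    print(days, round(curve, 3), round(logistic_igg(days, 0.05, R_PER_DAY, K_OD_N), 3))
```

Both curves start at their initial value and approach K for large t; parameterizing the rate through its logarithm is what guarantees that the asymptote is approached rather than departed from.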

Estimating Logistic Model Parameters
Based on metadata from Parekh et al. [18], we selected patients with longitudinal samples (n ≥ 2) to use as model training data. In Parekh et al. the mean time interval between their estimated time of seroconversion and reaching a specified assay cutoff value in a population was defined as the "mean recency period". We are ultimately interested in the date of infection of each patient, T inf (i); for that, we first estimate the time (t) between when the sample for BED testing was collected (T BED) and when seroconversion occurred (T sc), as defined in Figure 1. Thus, the logistic model parameters will refer to time since seroconversion for a typical patient. We used a mixed effects linear-asymptotic model to accommodate the logarithmic OD-n scale to infer OD IgG, r and K (Eq. 2) corresponding to the three parameters of the logistic IgG model (Eq. 1). In addition, we modeled OD IgG and r independently from K by a generalized linear mixed effects model in the IgG growth phase (t ≤ 350 days).
While we model both the response OD IgG and the random effects B as random variables, we only observe values of OD IgG .
The conditional distribution OD IgG |B and the marginal distribution B are independent, multivariate normal distributions. Values of OD IgG were grouped corresponding to patient, resulting in 2975 OD-n measurements in 718 groups (Figure S1). Consequently, the model estimates parameter values representative for the whole population of the training data in the fixed effects, while the random effects describe conditional modes of the estimated parameters. The random effects in the linear-asymptotic mixed model are described in a general positive-definite matrix structure, allowing parameter inclusion/exclusion as well as defining different covariance dependencies. We tested all possible random effects structures (df = 5-9) to investigate which of them could be omitted to avoid over-parameterizing our model (Table 1). Similarly, to investigate parameterization level and dependencies in the generalized linear mixed effects modeling of OD IgG and r only, we tested whether a correlated (df = 4) or uncorrelated (df = 5) random effects model would better fit the data. To maximize the amount of patient data and to allow for varying trends as well as varying sampling periods, for this analysis we included all patients sampled at least 5 times within t < 350 days (N = 116).
All submodels are referred to in the following text by their model abbreviations as defined in Table 1.
The linear-asymptotic mixed model was fitted by full maximum likelihood estimation using the nlme package version 3.1-103 [23] and the generalized linear mixed model was fitted by restricted maximum likelihood (REML) estimation using the lme4 package version 0.999375-42 [24], both made for the R computing environment [25].

Model Selection, Validation and Estimating Time-bias
To select which of the mixed effects parameterizations best described independent data, model validation was done with the Swedish data. We used our logistic model with parameters estimated by the fixed effects to translate the Swedish OD-n measurements into estimates of the time interval (t̂) between a patient's date of seroconversion (T̂ sc) and date of BED test (T BED). By definition, when OD-n < 0.07, the lower BED detection limit [18], t̂ = 0. As defined in Figure 1, this time was compared to a time interval constrained by the last negative and first positive HIV-1 testing (the serological interval) relative to BED testing, T(−) and T(+), respectively. Thus, t̂ = α + β|T(−) + τ(T(+) − T(−))|, where τ = (0,1) describes the relative position within the serological interval (T(−), T(+)), optimized when α = 0 and β = 1, which assumes that t̂ perfectly infers T sc. Final model selection was performed by a hit-and-miss statistic, formally evaluated by a Poisson test. The hit accuracy was measured as the mean distance (in days) between the model-inferred t̂'s and the serological intervals (targets). When a patient's target was hit the accuracy distance was zero. The precision is then defined as the distribution of the accuracy distances.

Figure 1. Definitions of dates and time intervals relative to BED testing. The time since seroconversion (t) is the time from the date a patient seroconverted (T sc) to when a sample for BED testing was collected (T BED). We estimate t by a logistic IgG model (Eq. 1) as t̂. Date of infection (T inf) occurred on average 21 days prior to T sc [9,10]. When available, the patient history also includes the dates of last negative HIV-1 antibody testing (T(−)) and first positive HIV-1 antibody testing (T(+)), defining the serological interval. Note that T(+) and T BED may often occur at the same date. To reevaluate national Swedish HIV surveillance data we compared T̂ inf with T(+), resulting in a time difference Δ. doi:10.1371/journal.pone.0060906.g001
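The translation of an OD-n value into a time estimate and the hit-and-miss scoring can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the asymptotic curve OD-n(t) = K + (OD0 − K)·exp(−r·t), and the intercept OD0 is set to zero only for illustration because the fitted intercept is not given in this text, so the numbers will not reproduce the reported intervals exactly. All function names are hypothetical.

```python
import math


def time_since_seroconversion(od_n, r, k, od0=0.0, detection_limit=0.07):
    """Invert K + (od0 - K) * exp(-r * t) for t in days (assumed model form)."""
    if od_n < detection_limit:
        return 0.0  # below the lower BED detection limit the estimate is zero
    if od_n >= k:
        raise ValueError("OD-n at or above the asymptote carries no timing information")
    return max(0.0, -math.log((k - od_n) / (k - od0)) / r)


def accuracy_distance(t_hat, days_to_first_positive, days_to_last_negative):
    """Days by which t_hat misses the serological interval; zero means a hit.

    Both bounds are counted in days before BED sampling, so the last negative
    test lies further back in time than the first positive test.
    """
    if t_hat < days_to_first_positive:
        return days_to_first_positive - t_hat
    if t_hat > days_to_last_negative:
        return t_hat - days_to_last_negative
    return 0.0


def hit_rate(distances):
    """Fraction of patients whose serological interval was hit."""
    return sum(d == 0 for d in distances) / len(distances)


# Hypothetical patient: OD-n of 0.8, diagnosed at BED sampling,
# last negative test 200 days earlier.
t_hat = time_since_seroconversion(0.8, r=6.72e-3, k=1.84)
print(round(t_hat, 1), accuracy_distance(t_hat, 0, 200))
```

The mean of the accuracy distances over all patients gives the hit accuracy, and their distribution gives the precision as defined above.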

Comparing Date of Infection to National Swedish HIV Surveillance Data
National data on number of diagnoses per year in Sweden were collected from the Swedish Institute for Communicable Disease Control (http://www.smittskyddsinstitutet.se/statistik/hivinfektion/, accessed 01-26-2012). To match the Swedish BED data we excluded patients that had been infected before first arrival to Sweden. Between years 2005-2008 the sampling of our BED data was directly proportional to the national number of diagnoses per year (p > 0.05, Wilcoxon rank sum test), and included all relevant transmission risk groups.
The Swedish BED data consisted of patients diagnosed September 2002 through July 2010 (N = 819). The date of infection (T inf) for each patient was estimated as the date of BED sampling (T BED) minus two time intervals: the model-inferred time since seroconversion (t̂) and a time interval of 21 days between infection and seroconversion. The latter time interval was based on published data on HIV-1 seroconversion phases [9,10]. Hence, T̂ inf = T BED − t̂ − 21 days. To reevaluate national Swedish HIV surveillance data we compared T̂ inf with the reported date of diagnosis (first positive HIV-1 sample) T(+), resulting in a time difference Δ (Figure 1). As mentioned above, the date of diagnosis and date of BED sampling were identical for many of the patients in our study.
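The date arithmetic above is simple enough to show directly. In this sketch the function names and the example dates are hypothetical, and the sign convention for Δ (diagnosis date minus estimated infection date) is an assumption; the 21-day offset is the value used in the text.

```python
from datetime import date, timedelta

INFECTION_TO_SEROCONVERSION_DAYS = 21  # average interval used in the text [9,10]


def estimate_infection_date(bed_sample_date, t_hat_days):
    """Estimated T inf = T BED - t_hat - 21 days, rounded to whole days."""
    days_back = round(t_hat_days) + INFECTION_TO_SEROCONVERSION_DAYS
    return bed_sample_date - timedelta(days=days_back)


def diagnosis_lag_days(first_positive_date, infection_date):
    """Delta: days from estimated infection to diagnosis (sign convention assumed)."""
    return (first_positive_date - infection_date).days


# Hypothetical patient diagnosed and BED-sampled on 1 March 2007 with t_hat = 60 days.
infection = estimate_infection_date(date(2007, 3, 1), 60)
print(infection, diagnosis_lag_days(date(2007, 3, 1), infection))
```

In this example a diagnosis recorded in 2007 corresponds to an estimated infection in December 2006, which is the kind of shift between calendar years that the surveillance reevaluation addresses.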

Model Training and Parameterizations Using Cohort Metadata
In the full mixed effects modeling the random effect of OD IgG (0) was found to have very small deviations (model ISR OD IgG (0) s.d. = 1.22 × 10⁻⁵ OD-n units). Furthermore, no covariance between OD IgG (0) and r was observed (model ISR OD IgG (0) to r correlation = 0.001). Similarly, when only modeling OD IgG (0) and r in the IgG growth phase (models CG and UG; Table 1), an ANOVA supported the observation that no pattern of correlation between intercepts OD IgG (0) and slopes r could be observed in the random effects (p ≪ 0.001, χ² = 134.72, df = 1; Table 1). In addition, biologically it is logical to assume that there is no patient variation in the HIV-1-specific IgG level before a patient has been infected; they should all be below the detection limit of the BED assay. Indeed, the fixed effect OD IgG (0) in all models was very close to zero at t = 0. Importantly, when OD IgG = 0, t̂ was also close to zero for all models, which means that the assumption that seroconversion on average occurred at the midpoint of the serological interval was correct for these cohort data.
For K the relevant time interval since seroconversion is defined by when the BED HIV-1-specific IgG production has reached its asymptote. In the training data, this occurred in SAR at approximately t̂ = 711 days (OD-n within 99% of K) in those patients who were followed at least that long (N = 74). For comparison to the fixed effect K, those data resulted in a normally distributed OD-n distribution (p = 0.64, Shapiro-Wilk test), with a weighted mean of OD-n = 2.65 and a standard deviation of 0.89 OD-n units. Thus, our fixed effect K is well within the spread of patient data that cover the asymptotic phase, justifying the use of our mixed effects modeling to find the typical patient parameter values.
Overall our best parameterization appeared to be SAR, which includes all fixed effects and the random effects of r and K (Table 1); however, we could not exclude models SR and ISR on statistical grounds using the cohort training data (AIC scores of SR and ISR were not significantly worse than that of SAR). Model SAR estimated the growth rate at r = 0.00672 OD-n units per day, and the asymptote at K = 1.84 OD-n units, while models SR and ISR both estimated r = 0.00151 and K = 3.78. To evaluate which of these models performed best on independent data, we examined new BED data collected from Swedish patients detected by regular public health diagnostic testing, i.e., non-cohort type data.

Model Selection and Validation Using Non-cohort Type Data
For model validation we used new data from 819 Swedish HIV-1-infected patients diagnosed in 2002-2010. For the SAR model, 500 patients had BED OD-n measurements that fell within the model predictive interval, and of these 350 had a previous negative test. For the SR and ISR models 703 patients had a BED OD-n measurement that fell within the model predictive intervals, and of these 464 had a previous negative test. These patients describe a general population, with different transmission modes, analyzed by one BED measurement per patient, similar to cross-sectional data. A hit-and-miss statistic, measuring whether the model-inferred date of seroconversion (T̂ sc) hit between the dates of collection of the last negative and first positive HIV-1 test (the serological interval; Figure 1), identified SAR as the best model. In the interval where all methods had predictive power (t̂ < 711 days), the point estimate of SAR hit 90% of the patients' serological intervals, compared to 88% for SR and ISR. When including the 95% confidence interval (CI), SAR hit 94% compared to 92% for SR and ISR. Critically, the number of patient serological intervals that SAR predicted correctly while SR and ISR failed was significantly larger than the number for which SR and ISR were correct and SAR failed (p < 0.05 and p < 0.0005, Poisson test, for the point estimate and the 95% CI estimate, respectively). Even when considering the longer predictive interval of SR and ISR (t̂ < 3056 days), these models only hit 86% of the serological intervals.
The accuracy of the estimated date of seroconversion using SAR was on average only 3.8 days off the serological interval, significantly better than SR and ISR at 60 days (p < 0.05, paired Wilcoxon rank sum test). Among those patients that SAR missed (N = 29 of 350), unsurprisingly, there was a tendency towards smaller serological intervals (mean target size = 327 days; p < 0.001, jackknife subsampling). However, the hit-and-miss statistic for targets < 327 days was still good at 88% accuracy. Importantly, SAR showed no correlation between the length of the serological interval (target size) and the precision of the model estimate, measured by the distribution of the accuracy measurements (p = 0.12, Pearson's correlation = −0.083).
Hence, the validation data showed that SAR was our best parameterization of the logistic IgG model (Figure S2), which supports the biological intuition that there is no patient variation in BED OD-n at t = 0. Thus, the typical patient is represented by a logistic growth of the BED detected HIV-specific IgG following infection (Figure 2). The model is informative of time since seroconversion when the BED OD-n value of the kit negative control is within an acceptable range of OD-n = (0, 0.3), corresponding to t̂ = (0, 52) days, but specific to each run of BED measurements. Using the average value for a positive test result in the cohort data (OD-n = 0.07; range: 0.05, 0.11 [18]) and OD-n ≤ 1.84 (corresponding to an OD-n value within 99% of the asymptote in SAR), the informative OD-n interval translates into a continuous time interval with predictive power in t̂ = (31, 711) days since seroconversion. In our Swedish data the BED kit negative control was at OD-n = 0.16 (range: 0.08, 0.28), corresponding to t̂ = 39 days. The fixed effects of this model describe the typical patient with an accuracy of 94%. For comparison, the BED binary classification of whether patients have "recent" or "long-term" HIV-1 infection is based on a cutoff at OD-n = 0.8, reported in different studies to correspond to a time since seroconversion of 109-220 days [16,17,18,26]. This time overlaps with our time-continuous model (SAR) estimate, which predicts that at OD-n = 0.8 the typical patient seroconverted 92-133 days before BED test sampling (95% CI).

Date of Seroconversion is Biased Towards Date of Diagnosis
We next investigated where the inferred date of seroconversion fell within each corresponding patient's serological interval in our Swedish data (N = 350). Naturally, seroconversion (and infection, bar the time from infection to detectable HIV-1 by a valid method [27]) must have happened sometime between the dates that define the serological interval (Figure 1). Recall that the Swedish data were not from a cohort study, but rather from patients detected by regular public health diagnostic testing. From this type of data it is not obvious that the population average date of seroconversion is in the middle of the serological interval. Indeed, the Swedish data show a clear bias of seroconversion shifted towards the date of diagnosis T(+) (Figure 3). The relative position (τ) of our model-based point estimate of the date of seroconversion within the serological interval was significantly right-skewed (p < 0.01, Wilcoxon rank sum test). Hence, this shows 1) that BED test results are applicable to infer the date of seroconversion in non-cohort type data, but 2) that estimating date of seroconversion as the midpoint between the last negative and the first positive HIV-1 test result is inaccurate and misleading in this type of data.
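The relative position within the serological interval can be computed per patient as in the sketch below. The function name and the example dates are hypothetical; the value 0.5 corresponds to the midpoint assumption and values near 1 to seroconversion close to the first positive test.

```python
from datetime import date


def relative_position(seroconversion_date, last_negative, first_positive):
    """Relative position tau of a seroconversion date within the serological interval.

    tau = 0 at the last negative test, tau = 1 at the first positive test.
    """
    span_days = (first_positive - last_negative).days
    if span_days <= 0:
        raise ValueError("first positive test must be after the last negative test")
    return (seroconversion_date - last_negative).days / span_days


# Hypothetical patient: negative test on 1 January 2006, diagnosed on 1 January 2007,
# model-inferred seroconversion on 1 October 2006.
tau = relative_position(date(2006, 10, 1), date(2006, 1, 1), date(2007, 1, 1))
print(round(tau, 2))
```

A population in which seroconversion occurred on average at the interval midpoint would show values of tau centered on 0.5, whereas the Swedish data described above are shifted towards 1.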
Patients in the Swedish data (that had a previous negative test and were within the model predictive interval, N = 350) were estimated to have seroconverted at a median of 60 days before BED testing (Figure 3C). However, the long tail of this distribution implies that many patients had seroconverted considerably longer ago. As expected, when including patients above the model predictive interval (OD-n > 1.84) a longer median time since seroconversion was estimated at 143 days (N = 500). Moreover, when analyzing the entire Swedish set (N = 819), the median time

Reevaluation of National Swedish HIV Surveillance Data
Using our model-based estimations of date of seroconversion we reevaluated epidemiological data for Swedish patients for whom previously only the date of diagnosis was known. As an illustrative example, Figure 4 shows reevaluations affecting year 2006, moving cases into 2006 from following years or out of 2006 to previous years. Most diagnoses stemmed from a date of infection within a year before or after 2006, and a few (n = 4) were estimated to have been infected longer ago than possible to estimate with the SAR model. Note that here the maximal time that can be estimated is 732 days, composed of the SAR upper predictive value (711 days) plus the average time from infection to seroconversion (21 days).
Panel A in Figure 5 shows the resulting distributions of time between diagnosis and date of infection as inferred by our time-continuous IgG model (SAR) for years 2003-2009, partitioned into men who have sex with men (MSM), injecting drug users (IDU), and heterosexual (HET) transmission groups. For comparison, we have included results from conventional BED assay interpretation using a binary model (Bin) which classifies infections as "recent" or "long-term", with a cutoff at OD-n = 0.8 [18]. Each field shows the predictions of Bin and SAR of patients classified as within (orange) or beyond (blue) the "recent" or quantifiable range, respectively. It is evident that the SAR model gives more informative results, i.e., SAR classifies more patients within its quantifiable range than Bin classifies as "recent".
Using our date of infection estimates we note some interesting results relating to the Swedish HIV epidemic: 1) MSM in general had larger proportions of yearly diagnosed individuals that were classified within the quantifiable range (< 711 days) than IDU and HET (Figure 5A Figure 5B), which corresponds to a well-described CRF01_AE outbreak among IDUs in Stockholm discovered in the summer of 2006 [28]. Indeed, further dividing the infections into subtypes showed that the shift was due to CRF01_AE infections and not an increase of subtype B infections, which previously had been the dominantly spreading subtype among Swedish IDUs. Furthermore, the fact that a relatively large number of cases discovered in 2007 were predicted to have been infected for more than 732 days indicates that the outbreak started earlier than the diagnosis dates would suggest.
A comparison of the number of HIV diagnoses (grey line) and SAR inferred date of infections (black line), summarized quarterly from mid-2002 to mid-2010, is shown in Figure 5 Figure S3.

Discussion
Due to the current limitations in HIV-1 incidence estimation there is considerable interest amongst international and national public health agencies in serological biomarkers that relate to the time of HIV-1 infection, such as the BED assay that was developed by the US CDC [1,2,3,4,5,6,12,13,29,30,31]. For example, an international panel recently called for novel incidence assays and algorithms, especially for use on cross-sectional data [32]. Here, we have created a time-continuous model that allows quantitative estimation of time since seroconversion based on BED assay results and that works on both cohort and cross-sectional types of data. We expect that time-continuous rather than two-level discrete (recent or long-term) estimates will make incidence estimation more accurate, given that this new detailed information can be incorporated in algorithms used to calculate incidence in a population. For instance, Sommen et al. [33] recently proposed an approach for estimating HIV incidence from continuous biomarker values. We exemplify that our quantitative estimation of time since seroconversion can improve national Swedish HIV surveillance data, which currently show a peak in newly diagnosed cases in 2007 while our analyses show that most of the corresponding infections occurred in 2006. Importantly, this finding is corroborated by independent phylodynamic analyses of an outbreak among IDUs [28]. Our method should be valuable also in other countries that have incorporated biomarker testing in their national HIV surveillance programs, e.g., France, the UK and Germany [6,30,31].
For patients diagnosed as a result of seeking public health diagnostic testing services we found that the date of seroconversion, as inferred by our BED-based SAR model, was closer to the first HIV-1 positive sample than to the midpoint between the last negative and first positive sample. This should not be surprising because persons who seek public health services often have a reason to get tested for HIV infection, such as an unsafe sexual encounter. In contrast, we found that midpoint dating is a valid approach for cohort population-studies of HIV-1 infection (followed longitudinally) because participants are tested at regular intervals rather than due to perceived risk of infection. Hence, date of seroconversion, and by inference date of infection, can be estimated using the midpoint approach for cohort data, but not for data resulting from public health diagnostic testing. This conclusion is supported by previous incidence modeling results showing that cohort-based estimates were robust against dependence between testing and time of infection, while STARHS estimates may be biased because of early testing in recently infected persons [34,35]. This finding is relevant because the "midpoint assumption" frequently is made also on data collected from patients diagnosed in public health services; this includes for instance the three large European collaborative projects Eurosida, Cascade, and Spread [36,37,38,39].
Every natural system is limited by its resources. The HIV-specific IgG growth within a patient thus has to reach a limit, in population biology often referred to as the carrying capacity of the system. Our model does not specify the limiting factor(s), but the limit is clearly related to the immune response to HIV replication and production, which in turn is controlled by, e.g., target cell populations and immune clearance [40]. Together these form a dynamic system that ultimately determines the carrying capacity. Over long infection times, and certainly with development of immunodeficiency, it is reasonable to assume that the carrying capacity changes. While logistic models with more than one carrying capacity have been developed [41,42], we did not include this complication because we are only interested in the stage during which the initial IgG response develops. Once the (first) carrying capacity is reached (here at t > 711 days), such models no longer have power to predict time. Encouragingly, based on more limited data, another paper published while our study was under review found a similar expression for the BED IgG growth using a statistical approach rather than our biologically principled approach [43]. Like the biological limiting factors that influence the carrying capacity, the OD measurements also have an upper limit. Most ELISA spectrophotometers used to measure absorbance have an upper limit of OD = 3–4, and therefore there is also a technical limit on the maximum IgG concentration that can be measured. Note that this refers to the raw OD measured directly on each sample. In the BED assay, the raw measurements are normalized by an assay standard, making results comparable between runs. This standard should have a raw absorbance in the interval OD = (0.380, 1.350), so that the upper limit of the OD-n range that can be reliably modeled lies between 2.22 and 10.53.
As our model estimates the carrying capacity at K = 1.84 [1.75, 1.95] OD-n units, instrument limitations should have minimal impact in the model predictive interval of t ≤ 711 days. For patients with high OD-n values, it is possible that the model predictive interval could be extended by serial dilutions of the patient's serum samples prior to BED testing, but this is something that we have not yet explored, partly because no such training data are available.
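The two saturation arguments above can be checked with a short Python sketch. The first function reproduces the quoted OD-n ceiling, assuming that OD-n is the raw sample OD divided by the raw OD of the assay standard (an assumption, although it is consistent with the quoted range). The second function inverts a textbook logistic curve, y(t) = K / (1 + a·exp(−r·t)) with a = (K − y0)/y0, to show why such a curve loses predictive power at the carrying capacity. This is not necessarily the exact form of Eq. 1, which is not restated in this section; treating r as a per-day rate is an assumption, and the starting value `Y0` is a hypothetical placeholder, so the resulting times should not be read as model predictions.

```python
import math

# Values quoted in the text.
RAW_OD_CEILING = (3.0, 4.0)     # upper raw OD limit of most ELISA readers
CALIBRATOR_OD = (0.380, 1.350)  # accepted raw OD window of the assay standard
K = 1.84                        # estimated carrying capacity (OD-n units)
K_CI_UPPER = 1.95               # upper bound of the approximate 95% CI for K
R = 6.72e-3                     # estimated rate, treated as per day (ASSUMPTION)

# HYPOTHETICAL OD-n at seroconversion; not given in this section.
Y0 = 0.05


def odn_ceiling_range():
    """Lowest and highest OD-n ceiling, ASSUMING OD-n = raw OD / standard OD."""
    lowest = RAW_OD_CEILING[0] / CALIBRATOR_OD[1]
    highest = RAW_OD_CEILING[1] / CALIBRATOR_OD[0]
    return lowest, highest


def logistic_odn(t_days, k=K, r=R, y0=Y0):
    """Textbook logistic growth curve evaluated t_days after seroconversion."""
    a = (k - y0) / y0
    return k / (1.0 + a * math.exp(-r * t_days))


def days_since_seroconversion(odn, k=K, r=R, y0=Y0):
    """Invert the textbook logistic curve.

    Returns None when odn is at or above k: the curve only approaches k,
    so no finite time corresponds to such a measurement.
    """
    if not 0 < y0 < k:
        raise ValueError("y0 must lie strictly between 0 and k")
    if odn <= 0:
        raise ValueError("odn must be positive")
    if odn >= k:
        return None
    a = (k - y0) / y0
    return math.log(a * odn / (k - odn)) / r


low, high = odn_ceiling_range()
print(f"OD-n ceiling range: {low:.2f} to {high:.2f}")
print("Upper CI bound of K below instrument ceiling:", K_CI_UPPER < low)
print("Inverse at OD-n = 2.0:", days_since_seroconversion(2.0))
```

Under these assumptions even the upper confidence bound of K stays below the lowest instrument ceiling, which is the sense in which instrument limitations should matter little. The inverse returns no time for measurements at or above K, mirroring the statement that the model cannot date infections once the carrying capacity is reached.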
Recently, Parekh et al showed that different human populations, as well as humans infected with different HIV-1 subtypes, may show different rates of development of BED-specific IgG in response to HIV-1 infection [18]. They analyzed a large set of cohorts from different geographical locations worldwide, and concluded that previous recency period cutoff times based on subtype B virus infections needed to be adjusted to better describe worldwide variation. The new BED kit instructions will be updated to reflect this important finding. The parameter values of our logistic model were informed by a slightly expanded set of these data, kindly provided by Dr. Parekh, and thus also reflect worldwide human as well as HIV-1 variation. Similarly, our Swedish data consisted of patients of different genetic backgrounds as well as infections with different HIV-1 subtypes [21,44]. However, we did not attempt to explicitly include data about human genetics or HIV-1 genetic subtype in our model because 1) it is still largely unknown whether human genetic factors involved in humoral immune responses differ among human populations, 2) differences within HIV-1 subtypes appear to affect OD-n trends as much as differences between subtypes [18], and 3) this type of information is often not available.
The fact that the BED assay can give a false impression of recent infection for patients with advanced disease [13] deserves some discussion. This is an important problem when the BED assay is used for HIV incidence estimation in populations by anonymous testing. If diagnosis occurs in late-stage infection and no other clinical data, e.g. CD4 counts, are available, the problem with false recent classification becomes more severe. For our Swedish patients, however, we had access to CD4 counts, which indicate possible late-stage infection. We are currently exploring whether our model can be further improved by formal incorporation of CD4 counts as a covariate and/or by using results from two or more consecutive BED tests from each patient. Similarly, "late presentation", i.e. persons presenting for care with a CD4 count below 350 cells/µL [45,46], affects around 50% of patients diagnosed in several European and US settings [45,47,48,49]. Late presentation is an important clinical problem because it leads to increased morbidity and mortality [45,46,50], and an epidemiological problem because patients who are unaware of their infection are more likely to transmit the infection to others than patients who have been diagnosed [51]. However, it is important to point out that late presentation is not equivalent to a longstanding infection [52]. Thus, our method to estimate the date of infection could add important information on the epidemiology of late presentation.
In conclusion, we have created a model that quantifies the time since seroconversion based on a simple serological assay, i.e. the BED assay. The model is applicable to BED results from patients included in cohort studies as well as patients diagnosed as a result of public health services. This model should be generally applicable to many quantitative antibody tests, such as improved HIV-1 "recency" tests as well as tests for other pathogens. We show that using the midpoint between the last negative and the first positive HIV-1 sample gives an inaccurate estimate of date of seroconversion for patients identified by regular public health diagnostic testing, but has validity for patients who are sampled at pre-defined intervals, e.g. in longitudinal cohort studies. We expect that our method can improve incidence estimates, and thus provide valuable information for HIV-1 surveillance and prevention.
Figure S1 Model training data. The graph shows data from Parekh et al [18] for patients sampled ≥2 times. This formed the model training data and included 2975 OD-n measurements from 718 patients. (PDF)
Figure S2 Individual patient data compared to population estimate. Each of the 718 patients' BED data in the model training set is individually compared to our logistic IgG model (Eq. 1) informed by the SAR-estimated fixed effects parameter values (blue lines). (PDF)
Figure S3 The difference of estimating date of infection compared to using date of diagnosis. The 95% confidence bands of the difference between number of diagnoses and estimated infections shown quarterly for the IDU transmission group. Points on the zero line indicate that the number of diagnoses is a good approximation of the number of infections in that quarter. A significant deviation from zero is highlighted with red points when diagnoses would underestimate infections and blue points for overestimating infections. (EPS)