Seroconverting Blood Donors as a Resource for Characterising and Optimising Recent Infection Testing Algorithms for Incidence Estimation

Introduction. Biomarker-based cross-sectional incidence estimation requires a Recent Infection Testing Algorithm (RITA) with an adequately large mean recency duration, to achieve reasonable survey counts, and a low false-recent rate, to minimise exposure to further bias and imprecision. Estimating these characteristics requires specimens from individuals with well-known seroconversion dates or confirmed long-standing infection. Specimens with well-known seroconversion dates are typically rare and precious, presenting a bottleneck in the development of RITAs.

Methods. The mean recency duration and a 'false-recent rate' were estimated from data on seroconverting blood donors. Within an idealised model for the dynamics of false-recent results, blood donor specimens were used to characterise RITAs by a new method that maximises the likelihood of cohort-level recency classifications, rather than modelling individual sojourn times in recency.

Results. For a range of assumptions about the false-recent results (0% to 20% of biomarker response curves failing to reach the threshold distinguishing test-recent and test-non-recent infection), the mean recency duration of the Vironostika-LS ranged from 154 (95% CI: 96–231) to 274 (95% CI: 234–313) days in the South African donor population (n = 282), and from 145 (95% CI: 67–226) to 252 (95% CI: 194–308) days in the American donor population (n = 106). The significance of gender and clade on performance was rejected (p-value = 10%), and utility in incidence estimation appeared comparable to that of a BED-like RITA. Assessment of the Vitros-LS (n = 108) suggested potentially high false-recent rates.

Discussion. The new method facilitates RITA characterisation using widely available specimens that were previously overlooked, at the cost of possible artefacts. While accuracy and precision are insufficient to provide estimates suitable for incidence surveillance, a low-cost approach for preliminary assessments of new RITAs has been demonstrated. The Vironostika-LS and Vitros-LS warrant further analysis to provide greater precision of estimates.


A The Data
Scatter plots of the datasets (by Recent Infection Testing Algorithm (RITA) and country of collection) are provided below. For each seroconverting blood donor, the standardised optical density (SOD) at the time of the first seropositive donation and the interdonation (ID) interval between the last seronegative and first seropositive donation are shown. SOD values below the thresholds (indicated by blue lines) indicate recent infections.

B The RITA Characteristic Estimators
The maximum likelihood estimators for the RITA characteristics to be estimated are derived below. The distributional properties of the estimators, for large samples, are also noted.

Derivation of the Maximum Likelihood Estimators
For a seroconverter with interdonation (ID) interval ∆ between the last seronegative test and first seropositive test:

1. $X$ denotes the result of the RITA (recent or non-recent) at the time of the first seropositive test, and has probability mass function $f_X(x)$.
2. $Y$ is the time since seroconversion at the time of the first seropositive donation. The time of seroconversion is uniformly distributed in the ID interval, so that $f_Y(y) = 1/\Delta$ for $0 \le y \le \Delta$.
3. $S_R(t)$ is the probability that the seroconverter is in the state of recent infection a time $t$ after seroconversion, conditional on being alive: $S_R(t) = f_{X|Y}(\text{recent} \mid t)$.

The joint probability function of $X$ and $Y$ is denoted by $f_{X,Y}(x, y)$, and the distribution of $X$ conditional on $Y$ by $f_{X|Y}(x \mid y)$.

The probability, $p$, that the seroconverter is classified as recently infected at the time of the first seropositive donation is

$$p = f_X(\text{recent}) = \int_0^{\Delta} f_{X|Y}(\text{recent} \mid y)\, f_Y(y)\, dy = \frac{1}{\Delta} \int_0^{\Delta} S_R(t)\, dt. \tag{1}$$

The likelihood, $L$, of all RITA classifications in a sample of $n$ seroconverters is

$$L(x) = \prod_{i=1}^{n} f_{X_i}(x_i) = \prod_{i:\, x_i = \text{recent}} p_i \prod_{i:\, x_i = \text{non-recent}} (1 - p_i), \tag{2}$$

where the subscript $i$ denotes quantities relating to the $i$th seroconverter in the sample, and $x$ denotes the observed values of $X$.
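As a rough numerical illustration of the two quantities just defined, the sketch below averages a recency curve over the ID interval to obtain p (the seroconversion time being uniform on that interval) and then sums the Bernoulli log-likelihood contributions. The exponential curve `s_r_example` with a 200-day mean, the midpoint integration rule and all function names are assumptions made purely for illustration; they are not taken from the analysis.

```python
import math

def prob_recent(s_r, delta, steps=1000):
    """p = (1/delta) * integral of s_r(t) over [0, delta], by the midpoint rule."""
    h = delta / steps
    return sum(s_r((k + 0.5) * h) for k in range(steps)) * h / delta

def log_likelihood(s_r, deltas, x):
    """Log-likelihood of the classifications; x[i] is 1 for recent, 0 otherwise.

    Assumes 0 < p < 1 for every ID interval, otherwise math.log fails.
    """
    total = 0.0
    for delta, xi in zip(deltas, x):
        p = prob_recent(s_r, delta)
        total += xi * math.log(p) + (1 - xi) * math.log(1.0 - p)
    return total

def s_r_example(t):
    """Hypothetical recency curve: exponential time in recency, mean 200 days."""
    return math.exp(-t / 200.0)

# Probability of a recent result for a 400-day ID interval under the toy curve.
p_400 = prob_recent(s_r_example, 400.0)
```

For the toy curve the integral is available in closed form, 0.5 (1 − e^(−2)) for a 400-day interval, which gives a simple check on the numerical integration.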
The analyses of McDougal et al. [1], McWalter and Welte [2] and Wang and Lagakos [3] assume that individual SOD curves either cross the threshold (distinguishing recent from non-recent infection) and remain above it, or fail to reach the threshold. Therefore $S_R(t)$ approaches a constant value, α, which is the proportion of SOD curves that fail to cross the threshold, for $t$ larger than some time cutoff $T$:

$$S_R(t) = \alpha + (1 - \alpha)\, \tilde{S}_R(t), \qquad \tilde{S}_R(t) = 0 \ \text{for} \ t > T, \tag{3}$$

where $\tilde{S}_R(t)$ is the probability that an SOD curve that does cross the threshold has not yet crossed it a time $t$ after seroconversion. The mean recency duration, ω, is the mean of the times taken to cross the threshold, for those SOD curves that do so, described by $\tilde{S}_R(t)$.

More generally, $S_R(t)$ may not remain constant for $t > T$. A false-recent rate, ε, may then be defined as the proportion of individuals who have been seropositive for longer than $T$ that are classified as recently infected [4].

For $S_R(t)$ exhibiting little variability around an approximately constant value for $t > T$, the parameterisation in (3) is used to obtain rough estimates of the RITA characteristics.

Substituting (3) into (1), the probability that the seroconverter is recently infected at the time of the first seropositive donation becomes

$$p = \alpha + \frac{1 - \alpha}{\Delta} \int_0^{\Delta} \tilde{S}_R(t)\, dt. \tag{4}$$

For $\tilde{S}_R(t) = \tilde{S}_R(\theta, t)$, $L$ is a function of the unknown parameters θ and α (if there is no input estimate for α), which are estimated to maximise $L$. The estimate of ω is

$$\hat{\omega} = \int_0^{\infty} \tilde{S}_R(\hat{\theta}, t)\, dt, \tag{5}$$

where $\hat{\theta}$ is the estimate of θ.

This likelihood approach also facilitates non-parametric inference, by considering only individuals with large ∆, since if $\Delta > T$, then

$$\int_0^{\Delta} \tilde{S}_R(t)\, dt = \int_0^{\infty} \tilde{S}_R(t)\, dt = \omega \tag{6}$$

is the mean recency duration.

Substituting (6) into (4), $p$ becomes a function of the RITA characteristics,

$$p = \alpha + (1 - \alpha)\, \frac{\omega}{\Delta}, \tag{7}$$

and the likelihood function becomes

$$L(x) = \prod_{i:\, x_i = \text{recent}} p_i \prod_{i:\, x_i = \text{non-recent}} (1 - p_i), \qquad p_i = \alpha + (1 - \alpha)\, \frac{\omega}{\Delta_i}, \quad i = 1, \dots, n^*, \tag{8}$$

where $n^*$ (≤ $n$) is the size of the sample consisting of all seroconverters with ID intervals larger than $T$ (and the subscript $i$ denotes quantities relating to the $i$th individual in this smaller sample). The estimated RITA characteristics maximise the likelihood $L$, which is now a function of ω, and of α (if there is no input estimate of α).
Simultaneous estimation of the RITA characteristics is less feasible in samples with closely clustered ID intervals. In the extreme case of $\Delta_i = \Delta$ for all $i$, simultaneous estimation is not possible, as there are no unique estimates of ω and α that maximise the likelihood function (which is maximised whenever the common probability of a recent classification, $p$, equals the observed proportion of seroconverters classified as recent).
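The non-parametric estimation of ω for a known α can be sketched as follows, assuming the linear form p_i = α + (1 − α) ω / ∆_i implied by the derivation above for ID intervals longer than T. Because p_i is linear in ω, the log-likelihood is concave in ω, so the sketch simply bisects on the score (the derivative of the log-likelihood). The function names, the bisection approach and the toy data are illustrative assumptions, not the authors' implementation.

```python
def p_recent(omega, alpha, delta):
    """Probability of a recent classification for an ID interval delta > T."""
    return alpha + (1.0 - alpha) * omega / delta

def score(omega, alpha, deltas, x):
    """Derivative of the Bernoulli log-likelihood with respect to omega."""
    total = 0.0
    for d, xi in zip(deltas, x):
        p = p_recent(omega, alpha, d)
        total += (xi - p) / (p * (1.0 - p)) * (1.0 - alpha) / d
    return total

def estimate_omega(deltas, x, alpha=0.0, tol=1e-9):
    """Maximum likelihood estimate of omega for a known alpha, by bisection.

    deltas: ID intervals (all assumed larger than T); x: 1 = recent, 0 = not.
    The search is restricted to 0 < omega < min(deltas) so that 0 < p < 1.
    """
    lo, hi = 1e-9, min(deltas) * (1.0 - 1e-9)
    if score(lo, alpha, deltas, x) <= 0.0:
        return 0.0          # likelihood decreasing from the start
    if score(hi, alpha, deltas, x) >= 0.0:
        return hi           # likelihood still increasing at the boundary
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if score(mid, alpha, deltas, x) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Toy data: 100 donors with identical 500-day ID intervals, 30 classified recent.
toy_deltas = [500.0] * 100
toy_x = [1] * 30 + [0] * 70
omega_hat = estimate_omega(toy_deltas, toy_x, alpha=0.0)
```

With identical ID intervals the estimate has the closed form ω = ∆ (p̂ − α) / (1 − α), where p̂ is the observed proportion recent. Every assumed α therefore yields a different ω that fits the data equally well, which is the non-identifiability described above.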

Properties of the Estimators
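A maximum likelihood estimator, $\hat{\xi}$, is asymptotically (as the sample size $n \to \infty$) normally distributed around the true parameter value, ξ, with variance equal to the inverse of the expected Fisher information matrix:

$$\hat{\xi} \sim N\!\left(\xi,\ \left[-E\!\left(\frac{\partial^2 \ln L(\xi)}{\partial \xi\, \partial \xi^{T}}\right)\right]^{-1}\right),$$

where $E(\cdot)$ is the expected value and $L(\cdot)$ is the likelihood function, under regularity conditions [5].

Large sample approximations for the properties of the estimators for the mean recency duration and the proportion of SOD curves that fail to reach the threshold, $\hat{\omega}$ and $\hat{\alpha}$, follow.

When α is known,

$$\hat{\omega} \sim N\!\left(\omega,\ \sigma_{\omega}^2\right), \qquad \sigma_{\omega}^2 = \left[-E\!\left(\frac{\partial^2 \ln L}{\partial \omega^2}\right)\right]^{-1} = \left[\sum_{i=1}^{n^*} \frac{(\partial p_i / \partial \omega)^2}{p_i (1 - p_i)}\right]^{-1},$$

where $p_i$ and $L = L(\omega)$ are given in (8).

When ω and α are estimated simultaneously,

$$\begin{pmatrix} \hat{\omega} \\ \hat{\alpha} \end{pmatrix} \sim N\!\left(\begin{pmatrix} \omega \\ \alpha \end{pmatrix},\ \Sigma\right),$$

where the covariance matrix is

$$\Sigma = \begin{pmatrix} -E\!\left(\dfrac{\partial^2 \ln L}{\partial \omega^2}\right) & -E\!\left(\dfrac{\partial^2 \ln L}{\partial \omega\, \partial \alpha}\right) \\[2ex] -E\!\left(\dfrac{\partial^2 \ln L}{\partial \alpha\, \partial \omega}\right) & -E\!\left(\dfrac{\partial^2 \ln L}{\partial \alpha^2}\right) \end{pmatrix}^{-1}$$

and $p_i$ and $L = L(\omega, \alpha)$ are given in (8).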
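To make the large-sample variances concrete, the following sketch evaluates the expected Fisher information for Bernoulli classifications, whose entries are sums of (∂p_i/∂a)(∂p_i/∂b) / (p_i (1 − p_i)) over the sample. It assumes the linear form p_i = α + (1 − α) ω / ∆_i, so that ∂p_i/∂ω = (1 − α)/∆_i and ∂p_i/∂α = 1 − ω/∆_i. Function names and the example inputs are hypothetical.

```python
import math

def p_recent(omega, alpha, delta):
    """Probability of a recent classification for an ID interval delta > T."""
    return alpha + (1.0 - alpha) * omega / delta

def fisher_information(omega, alpha, deltas):
    """Expected 2x2 information matrix for (omega, alpha)."""
    i_ww = i_wa = i_aa = 0.0
    for d in deltas:
        p = p_recent(omega, alpha, d)
        weight = 1.0 / (p * (1.0 - p))
        dp_domega = (1.0 - alpha) / d
        dp_dalpha = 1.0 - omega / d
        i_ww += weight * dp_domega * dp_domega
        i_wa += weight * dp_domega * dp_dalpha
        i_aa += weight * dp_dalpha * dp_dalpha
    return [[i_ww, i_wa], [i_wa, i_aa]]

def se_omega_known_alpha(omega, alpha, deltas):
    """Approximate standard error of the omega estimate when alpha is known."""
    return 1.0 / math.sqrt(fisher_information(omega, alpha, deltas)[0][0])

def covariance(omega, alpha, deltas):
    """Approximate covariance matrix of the joint (omega, alpha) estimates.

    The determinant approaches zero when the ID intervals are closely
    clustered, in which case the joint estimates are not usable.
    """
    (a, b), (_, c) = fisher_information(omega, alpha, deltas)
    det = a * c - b * b
    return [[c / det, -b / det], [-b / det, a / det]]

# Hypothetical inputs: omega = 150 days, alpha = 0, 100 donors at 500 days.
se = se_omega_known_alpha(150.0, 0.0, [500.0] * 100)
ci_95 = (150.0 - 1.96 * se, 150.0 + 1.96 * se)
```

The variance of the ω estimate from the joint covariance matrix is never smaller than the known-α variance, reflecting the precision lost when α must also be estimated.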

C Fit of Estimated RITA Characteristics, Vironostika-LS
The method of maximum likelihood, outlined in Section B above, was used to characterise the Vironostika-LS, in the South African and American repeat donor populations. Firstly, the mean recency duration, ω, was estimated assuming a known α. Secondly, simultaneous estimation of ω and α was performed. Non-parametric estimation was applied, using data on seroconverters with interdonation (ID) intervals larger than T = 1 year (n = 282 for South Africa and n = 106 for USA).
In the figures below, the observed percentages and expected percentages (obtained by substituting estimated RITA characteristics into (7)) of seroconverters who were recently infected at the first seropositive donations, as a function of ID interval, are compared. Subjects with similar ID intervals were grouped together (at least 20 subjects per group), and the observed and expected percentages were plotted against the average ID interval, per group. In Figures 4 and 6, the 95% confidence interval limits for the expected percentages (dotted lines) are obtained by substituting the 95% confidence interval limits for ω, not taking any uncertainty in α into account, into (7). In Figures 5 and 7, the plotted limits for the expected percentages (dotted lines) indicate the minimum and maximum values for the probability of being recently infected, p, obtained when considering all pairs of values for the RITA characteristics lying within the 95% confidence regions for ω and α.
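The grouping used for these comparisons might be sketched as below: subjects are sorted by ID interval, cut into consecutive groups of at least 20, and the observed percentage recent in each group is set beside the percentage expected from the estimated characteristics. Evaluating the expected percentage at the group's average ID interval, folding a short final group into the previous one, and the function names are all assumptions of this sketch.

```python
def group_by_interval(deltas, recent, min_size=20):
    """Sort subjects by ID interval and cut into groups of at least min_size."""
    pairs = sorted(zip(deltas, recent))
    groups = [pairs[k:k + min_size] for k in range(0, len(pairs), min_size)]
    if len(groups) > 1 and len(groups[-1]) < min_size:
        tail = groups.pop()          # fold a short final group into the previous one
        groups[-1].extend(tail)
    return groups

def observed_vs_expected(deltas, recent, omega, alpha, min_size=20):
    """Return (mean ID interval, observed % recent, expected % recent) per group."""
    rows = []
    for group in group_by_interval(deltas, recent, min_size):
        mean_delta = sum(d for d, _ in group) / len(group)
        observed = 100.0 * sum(x for _, x in group) / len(group)
        expected = 100.0 * (alpha + (1.0 - alpha) * omega / mean_delta)
        rows.append((mean_delta, observed, expected))
    return rows

# Hypothetical data: 45 donors, ID intervals 400 to 840 days, every fourth recent.
toy_deltas = [400.0 + 10.0 * k for k in range(45)]
toy_recent = [1 if k % 4 == 0 else 0 for k in range(45)]
rows = observed_vs_expected(toy_deltas, toy_recent, omega=150.0, alpha=0.0)
```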

D Parametric Versus Non-Parametric Estimation
By using only data with sufficiently large interdonation (ID) intervals, the need for parametric assumptions is circumvented (see Section B above). While this protects against bias arising from poor parametric assumptions, the sample size is reduced.
The RITA characteristics of the Vironostika-LS, in the South African repeat donor population, were estimated using all data and a number of parametric assumptions (characterisations of $\tilde{S}_R(t) = \tilde{S}_R(\theta, t)$, the probability that an SOD curve that crosses the threshold has not yet done so a time $t$ after seroconversion, where θ is a vector of parameters). The seven assumed forms for $\tilde{S}_R(\theta, t)$ are plotted in Figure 8. By design, θ = ω for each form.
In Table 1, estimates of the mean recency duration, ω, using the various parametric assumptions, are tabulated. The results of the chi-squared goodness of fit tests [6], used to assess the agreement between the data and assumptions, are also provided. Widely varying estimates of ω were obtained, even after discarding those estimates for which data and assumptions poorly agreed. Since the underlying dynamics of the data are unknown, the extent of bias is unknown.
Simulated data were therefore used to investigate the trade-off between the increased precision from larger samples and the increased potential for bias from poor parametric assumptions, when moving to a parametric approach. 100 datasets (of 500 seroconverters each) were simulated, assuming each of the seven forms for $\tilde{S}_R(\theta, t)$ and α = 0%. ID intervals were simulated from a non-parametric distribution fitted to the ID intervals in the dataset for the Vironostika-LS, South Africa. For each dataset, the goodness of fit was assessed and ω estimated, using each parametric assumption. The non-parametric method was also applied, using all ID intervals greater than T = 1 year. Underestimation of ω is therefore expected for the distributions with maximum times in the state of recent infection greater than 1 year.
The results of the investigation, provided in Table 2, indicate that, although moving to a parametric approach allows all data to be exploited, there is the potential of introducing large bias in estimates from indistinguishably poor parametric assumptions. The average 95% confidence interval widths, when using the correct parametric assumptions and the non-parametric approach, are also provided in Table 2. The increased widths when moving to the non-parametric approach illustrate the loss of precision incurred when discarding data with insufficiently large ID intervals.
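A much reduced version of such a simulation is sketched below for a single assumed form, with α = 0. ID intervals are drawn uniformly between 100 and 1500 days and threshold-crossing times uniformly on [0, 2ω]; both distributions, the sample size and the seed are stand-ins chosen for illustration (the study drew ID intervals from a distribution fitted to the donor data and considered seven forms). The non-parametric estimator uses only ID intervals longer than T = 365 days.

```python
import random

def simulate(n, omega_true, rng):
    """Simulate (ID interval, recent flag) pairs with alpha = 0.

    Assumed forms: ID interval uniform on [100, 1500] days; time to cross
    the threshold uniform on [0, 2 * omega_true], so its mean is omega_true.
    """
    data = []
    for _ in range(n):
        delta = rng.uniform(100.0, 1500.0)
        time_since_sc = rng.uniform(0.0, delta)   # seroconversion uniform in interval
        crossing_time = rng.uniform(0.0, 2.0 * omega_true)
        data.append((delta, 1 if time_since_sc < crossing_time else 0))
    return data

def estimate_omega(data, t_cut=365.0):
    """Non-parametric estimate of omega (alpha = 0) from ID intervals > t_cut."""
    kept = [(d, x) for d, x in data if d > t_cut]

    def score(omega):
        # Derivative of the log-likelihood, with p = omega / delta.
        return sum((x - omega / d) / ((omega / d) * (1.0 - omega / d)) / d
                   for d, x in kept)

    lo, hi = 1e-6, t_cut                          # omega cannot exceed the cutoff
    for _ in range(60):                           # bisection on the score
        mid = 0.5 * (lo + hi)
        if score(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

rng = random.Random(1)
sample = simulate(5000, 150.0, rng)
omega_hat = estimate_omega(sample)
```

Here the longest possible crossing time (300 days) is below the cutoff, so the estimate should scatter around the true value of 150 days. Choosing a form whose crossing times can exceed one year would be expected to pull the estimate downward, as noted for the simulations above.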

E Fit of Estimated RITA Characteristics, Vitros-LS
The method of maximum likelihood, outlined in Section B above, was used to characterise the Vitros-LS, in the South African repeat donor population. Firstly, the mean recency duration, ω, was estimated assuming a known α. Secondly, simultaneous estimation of ω and α was performed. Non-parametric estimation was applied, using only data points with interdonation (ID) intervals larger than T.
In the figures below, the observed percentages and expected percentages (obtained by substituting estimated RITA characteristics into (7)) of seroconverters who were recently infected at the first seropositive donations, as a function of ID interval, are compared for T = 1 year (n = 108) and T = 2.5 years (n = 59). Subjects with similar ID intervals were grouped together (at least 20 subjects per group), and the observed and expected percentages were plotted against the average ID interval, per group. In Figures 9 and 11, the 95% confidence interval limits for the expected percentages (dotted lines) are based on the 95% confidence interval limits for ω, not taking any uncertainty in α into account. In Figures 10 and 12, the plotted limits for the expected percentages (dotted lines) indicate the minimum and maximum percentages obtained when considering all pairs of values for the RITA characteristics lying within the 95% confidence regions for ω and α.