Population-level HIV incidence estimates using a combination of synthetic cohort and recency biomarker approaches in KwaZulu-Natal, South Africa

  • Eduard Grebe ,

    Contributed equally to this work with: Eduard Grebe, Alex Welte

    Roles Conceptualization, Formal analysis, Methodology, Software, Visualization, Writing – original draft, Writing – review & editing

    Affiliation DST-NRF Centre of Excellence in Epidemiological Modelling and Analysis (SACEMA), Stellenbosch University, Stellenbosch, South Africa

  • Alex Welte ,

    Contributed equally to this work with: Eduard Grebe, Alex Welte

    Roles Conceptualization, Formal analysis, Methodology, Writing – review & editing

    Affiliation DST-NRF Centre of Excellence in Epidemiological Modelling and Analysis (SACEMA), Stellenbosch University, Stellenbosch, South Africa

  • Leigh F. Johnson,

    Roles Data curation, Formal analysis, Software, Writing – review & editing

    Affiliation Centre for Infectious Diseases Epidemiology and Research (CIDER), University of Cape Town, Cape Town, South Africa

  • Gilles van Cutsem,

    Roles Data curation, Funding acquisition, Investigation, Writing – review & editing

    Affiliations Centre for Infectious Diseases Epidemiology and Research (CIDER), University of Cape Town, Cape Town, South Africa, Médecins Sans Frontières, Cape Town, South Africa

  • Adrian Puren,

    Roles Investigation, Writing – review & editing

    Affiliation National Institute for Communicable Diseases (NICD), National Health Laboratory Service, Johannesburg, South Africa

  • Tom Ellman,

    Roles Data curation, Funding acquisition, Investigation, Writing – review & editing

    Affiliation Médecins Sans Frontières, Cape Town, South Africa

  • Jean-François Etard,

    Roles Data curation, Funding acquisition, Investigation, Writing – review & editing

    Affiliations TransVIHMI, Institut de Recherche pour le Développement (IRD), Institut National de la Santé et de la Recherche Médicale (INSERM), Montpellier University, Montpellier, France, Epicentre, Paris, France

  • the Consortium for the Evaluation and Performance of HIV Incidence Assays (CEPHIA) ,

    Membership list can be found in the Acknowledgments section.

  • Helena Huerga

    Roles Conceptualization, Data curation, Funding acquisition, Investigation, Methodology, Project administration, Writing – review & editing

    Affiliation Epicentre, Paris, France



There is a notable absence of consensus on how to generate estimates of population-level incidence. Incidence is a considerably more sensitive indicator of epidemiological trends than prevalence, but is harder to estimate. We used a novel hybrid method to estimate HIV incidence by age and sex in a rural district of KwaZulu-Natal, South Africa.


Our novel method uses an ‘optimal weighting’ of estimates based on an implementation of a particular ‘synthetic cohort’ approach (interpreting the age/time structure of prevalence, in conjunction with estimates of excess mortality) and biomarkers of ‘recent infection’ (combining Lag-Avidity, Bio-Rad Avidity and viral load results to define recent infection, and adapting the method for age-specific incidence estimation). Data were obtained from a population-based cross-sectional HIV survey conducted in Mbongolwane and Eshowe health service areas in 2013.


Using the combined method, we find that age-specific HIV incidence in females rose rapidly during adolescence, from 1.33 cases/100 person-years (95% CI: 0.98,1.67) at age 15 to a peak of 5.01/100PY (4.14,5.87) at age 23. In males, incidence was lower, 0.34/100PY (0.00-0.74) at age 15, and rose later, peaking at 3.86/100PY (2.52-5.20) at age 30. Susceptible population-weighted average incidence in females aged 15-29 was estimated at 3.84/100PY (3.36-4.40), in males aged 15-29 at 1.28/100PY (0.68-1.50) and in all individuals aged 15-29 at 2.55/100PY (2.09-2.76). Using the conventional recency biomarker approach, we estimated HIV incidence among females aged 15-29 at 2.99/100PY (1.79-4.36), among males aged 15-29 at 0.87/100PY (0.22-1.60) and among all individuals aged 15-59 at 1.66/100PY (1.13-2.27).


HIV incidence was very high in women aged 15-30, peaking in the early 20s. Men had lower incidence, which peaked at age 30. The estimates obtained from the hybrid method are more informative than those produced by conventional analysis of biomarker data, and represent a better use of available data than either the age-continuous biomarker or synthetic cohort methods alone. The method is mainly useful at younger ages, where excess mortality is low and uncertainty in the synthetic cohort estimates is reasonably small.


Application of this method to large-scale population-based HIV prevalence surveys is likely to result in improved incidence surveillance over methods currently in wide use. Reasonably accurate and precise age-specific estimates of incidence are important to target better prevention, diagnosis and care strategies.

Introduction
HIV epidemic surveillance largely relies on cross-sectional measurements of prevalence, often by means of representative household surveys. However, for a non-remissible condition with extended survival time like HIV, instantaneous prevalence reflects the epidemic trajectory (incidence, mortality and migration) over a significant period prior to the survey. Estimating HIV incidence—the most sensitive and informative indicator of current epidemiological trends—therefore poses significant methodological challenges.

The ‘gold standard’ method of directly observing new infections in cohorts of HIV-negative individuals followed up over time is costly and logistically challenging, and it is difficult to achieve sufficient population representativeness for results to be generalised. Several alternative approaches have been proposed for estimating HIV incidence, including a ‘synthetic cohort’ approach (i.e. inferring incidence from the age and/or time structure of prevalence) [1-6], the use of biomarkers for ‘recent infection’ measured in cross-sectional surveys [7-11], and dynamical population models that have been calibrated to survey data [12-15]. No single method by itself achieves the desired levels of accuracy and precision [16].

In this work we develop a novel hybrid method which uses an ‘optimal’ weighting of (a) an implementation of the ‘synthetic cohort’ approach of Mahiane et al. [6], i.e. interpreting the age and time structure of prevalence in conjunction with excess mortality, and (b) an adaptation of the Kassanjee et al. estimator for incidence from biomarkers of recent infection [8] that takes account of the age structure of recent infection (amongst HIV-positive individuals). The method of Mahiane et al. relies on the instantaneously exact, fully age- and time-structured, representation of the dynamical relation of prevalence, excess mortality and incidence. In the case of a relatively stable epidemic (i.e. relatively slow change in age-specific prevalence over time), the age structure of prevalence provides fairly precise age-specific incidence estimates.

We applied this method to a cross-sectional household survey conducted in a district of KwaZulu-Natal province (KZN) to estimate the HIV incidence by age and sex in the area at the time of the survey (2013). For the present analysis we assume stability, but we investigate the impact of plausible time-gradients of prevalence in the sensitivity analysis. Precision of incidence estimates is markedly lower for ages over 30, and we therefore report as primary results incidence over the age range 15-29 years.

Methods
Survey design and procedures

The data analysed in this study were obtained from the Mbongolwane and Eshowe HIV Impact in Population Survey, conducted in 2013 in Mbongolwane, a rural area, and Eshowe, the main town in the uMlalazi Municipality in KZN, South Africa. A two-stage stratified clustered sampling strategy was used for the selection of households according to the 2011 Census, which indicated a population of approximately 120,000 at the time of the survey [17]. Individuals aged 15-59 years old living in sampled households, and who provided informed consent, were enrolled in the study.

The University of Cape Town Human Research Ethics Committee (HREC 461/2012), the Health Research Committee of the Health Research and Knowledge Management Unit of the KwaZulu-Natal Department of Health and the Comité de Protection de Personnes de Paris in France approved the study protocol.

Face-to-face interviewer-administered questionnaires were used to collect information on socio-demographics and sexual history at the sampled household. HIV testing, including pre- and post-test counselling, was done by certified lay counsellors, on site, using the Determine Rapid HIV-1/2 Antibody test kit as a screening test followed, in the case of a positive result, by the Unigold Rapid HIV test kit for confirmation. Venous blood specimens were collected from all participants who consented. HIV antibody-positivity was determined using the on-site rapid result, confirmed by laboratory-based ELISA in the case of discordant rapid test results. Specimens from participants confirmed to be HIV antibody-positive were subjected to the Sedia Limiting Antigen Avidity EIA (LAg) assay [18] and the Bio-Rad Avidity assay [19], as well as quantitative viral load testing, a CD4 count and an ARV presence test. Viral load testing was performed using the NucliSens EasyQ HIV-1 v2.0 assay from bioMérieux. Qualitative testing for ARV drug levels, including nevirapine, efavirenz and lopinavir, was performed using an LC-MS/MS qualitative assay. In addition, to detect acute infections in antibody-negative participants, HIV-negative specimens were subjected to pooled Nucleic Acid Amplification Testing (NAAT) (in 5-member pools) using Roche AMPLISCREEN, and specimens from positive pools were tested individually using the Roche CAP/CTM assay.

More detail on the survey has been published elsewhere [17, 20-22].

Estimating incidence using biomarkers for ‘recent’ infection

We used calibration data from the Consortium for the Evaluation and Performance of HIV Incidence Assays (CEPHIA) to explore a range of recent infection case definitions based on combinations of LAg normalised optical density (ODn), Bio-Rad Avidity index (AI) and viral load thresholds, and selected an ‘optimal’ recent infection testing algorithm (RITA) based on the variance of the incidence estimates produced. The procedures and results are detailed in S1 Appendix. These show that a RITA that defines recent infection as NAT+/Ab− OR Ab+/ODn < 2.5/AI < 30/VL > 75 achieves a mean duration of recent infection (MDRI, adjusted for the sensitivity of the screening algorithm) of 217 days (95% CI: 192,244) and a context-specific false-recent rate (FRR) of 0.17% (95% CI: 0.05%,0.35%). We analyse sensitivity to imperfectly-estimated FRR in S2 Appendix.
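The recent infection case definition quoted above can be written as a small classification rule. The following is a minimal sketch rather than the study's own code: the function and argument names are hypothetical, the units (avidity index in percent, viral load in copies/ml) are assumed, and treating missing biomarker results as non-recent is an assumption made here for illustration.

```python
from typing import Optional

# Thresholds quoted in the text; the units are assumptions (AI in %, VL in copies/ml).
ODN_MAX = 2.5
AI_MAX = 30.0
VL_MIN = 75.0


def is_recent(antibody_positive: bool,
              naat_positive: bool,
              lag_odn: Optional[float] = None,
              biorad_ai: Optional[float] = None,
              viral_load: Optional[float] = None) -> bool:
    """Classify one participant as 'recent' under the RITA described in the text."""
    # Acute (NAAT-yield) infection: nucleic acid positive but antibody negative.
    if naat_positive and not antibody_positive:
        return True
    # Antibody-negative and NAAT-negative participants are uninfected.
    if not antibody_positive:
        return False
    # Assumption: a missing biomarker result is treated as non-recent.
    if lag_odn is None or biorad_ai is None or viral_load is None:
        return False
    # Antibody-positive: all three biomarker conditions must hold together.
    return lag_odn < ODN_MAX and biorad_ai < AI_MAX and viral_load > VL_MIN
```

Requiring all three conditions at once is what keeps the false-recent rate low: a treated, virally suppressed individual with low avidity fails the viral load condition and is classified as non-recent.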

This definition of recency was then employed to estimate incidence in the study population, by age group and sex, using the method of Kassanjee et al. [8]. The well-known Kassanjee et al. estimator, adapted for use in complex surveys by allowing the use of proportions and their standard errors, rather than survey counts, is given in Eq 1.

$$\lambda = \frac{P_H \left(P_{R|+} - \epsilon_T\right)}{\left(1 - P_H\right)\left(\Omega_T - \epsilon_T T\right)} \tag{1}$$

where PH is the prevalence of HIV, PR|+ is the proportion of recency tests performed on HIV-positive participants that produced a ‘recent’ result, ΩT is the MDRI, ϵT is the FRR, and T is the chosen time cutoff beyond which a ‘recent’ result is considered ‘falsely recent’ by definition. Note that the product of PH and PR|+ is the overall prevalence of recency in the sample. This estimator is implemented in the inctools R package [23]. The documentation of inctools provides details on estimating the variance of incidence estimates using both delta method and bootstrapping approaches.
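As a worked illustration of Eq 1, the sketch below computes a point estimate from proportions. It is written in Python for illustration only and is not the inctools implementation; the function name is hypothetical, the default cutoff T of two years is an assumption, and the input values in the usage line are invented rather than taken from the survey.

```python
def kassanjee_incidence(prev_hiv: float,
                        prev_recent_given_pos: float,
                        mdri_years: float,
                        frr: float,
                        cutoff_years: float = 2.0) -> float:
    """Point estimate of incidence (per person-year) from survey proportions.

    prev_hiv               HIV prevalence (PH)
    prev_recent_given_pos  proportion 'recent' among HIV-positives (PR|+)
    mdri_years             mean duration of recent infection in years
    frr                    false-recent rate
    cutoff_years           time cutoff T in years (assumed value)
    """
    numerator = prev_hiv * (prev_recent_given_pos - frr)
    denominator = (1.0 - prev_hiv) * (mdri_years - frr * cutoff_years)
    return numerator / denominator


# Hypothetical inputs; MDRI of 217 days and FRR of 0.17% are the values quoted in the text.
example = kassanjee_incidence(0.25, 0.03, 217 / 365.25, 0.0017)
print(f"{100 * example:.2f} cases/100PY")
```

The MDRI and the cutoff must be expressed in the same time unit (years here) so that the result is a rate per person-year.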

Owing to the small ‘recent infection’ case counts, statistical uncertainty reaches unacceptable levels when age groups are small, and we therefore estimated incidence using the conventional approach in 15 to 29 year-olds and 30 to 59 year-olds.

For the purpose of the combined method described below, we further adapted the estimator for age-dependent prevalence of recent infection, allowing us to estimate highly granular age-specific incidence using the recent infection biomarker data in the survey. Details are provided in the section on the combined method.

Estimating age-specific incidence using the Mahiane et al. ‘synthetic cohort’ method

We employ the incidence estimator of Mahiane et al. [6] to estimate incidence from the age structure of prevalence. The estimator was derived from the fundamental relationship between incidence, prevalence and mortality in a non-transient condition, with prevalence viewed as the accumulated incidence over time, accounting for the removal of prevalent cases from the population through condition-induced ‘excess’ mortality. This is shown using a simple dynamical SI-type model, where it is demonstrated that simply rearranging the differential equations describing change in the state variables for the susceptible and infected groups yields an estimator for incidence that relies only on prevalence and excess mortality (but critically, not total mortality). In an age-structured population, this approach yields the incidence estimator in Eq 2.

$$\lambda(a,t) = \frac{1}{1 - p(a,t)}\left(\frac{\partial}{\partial a} + \frac{\partial}{\partial t}\right) p(a,t) + \delta(a,t)\,p(a,t) \tag{2}$$

where p(a,t) is age and time-specific prevalence and δ(a,t) is age and time-specific excess mortality.

In a stable epidemic, where the age structure of prevalence is not changing at a significant rate in secular time (see Discussion section), the age-structure of prevalence from a single cross-sectional prevalence survey is informative, and the estimator can be simplified to:

$$\lambda(a) = \frac{1}{1 - p(a)}\,\frac{\mathrm{d}p(a)}{\mathrm{d}a} + \delta(a)\,p(a) \tag{3}$$

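We obtained age-specific incidence estimates by fitting a regression model for prevalence as a function of age to finely-grained data (i.e., not using integer ages, but the difference in days between the birth date and interview date of each participant), using a generalised linear model with a cubic polynomial in age as the linear predictor and a logit link:

$$g(p(a)) = \beta_0 + \beta_1 a + \beta_2 a^2 + \beta_3 a^3 \tag{4}$$

with g() the logit link function, so that

$$p(a) = \frac{1}{1 + \exp\left(-\left(\beta_0 + \beta_1 a + \beta_2 a^2 + \beta_3 a^3\right)\right)} \tag{5}$$

and

$$\frac{\mathrm{d}p(a)}{\mathrm{d}a} = \left(\beta_1 + 2\beta_2 a + 3\beta_3 a^2\right) p(a)\left(1 - p(a)\right) \tag{6}$$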

We fit the model, separately for males and females, to data from participants aged 15 to 34 years. This provided us with a continuous function, p(a), for 15 ≤ a < 35. We derived, for each sex, the function for excess mortality, δ(a), from age-specific AIDS mortality estimates for KwaZulu-Natal province produced by the Thembisa demographic model [24], allowing us to estimate age-specific incidence, λ(a).
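The steps above (a logistic prevalence curve with a cubic in age, its derivative with respect to age, and the stable-epidemic estimator) can be sketched as follows. This is an illustrative sketch only: the coefficients and the constant excess mortality function are invented, the function names are hypothetical, and model fitting is not shown.

```python
import math
from typing import Callable, Sequence


def prevalence(age: float, beta: Sequence[float]) -> float:
    """Prevalence p(a) from a logit-link model with a cubic polynomial in age."""
    b0, b1, b2, b3 = beta
    eta = b0 + b1 * age + b2 * age ** 2 + b3 * age ** 3
    return 1.0 / (1.0 + math.exp(-eta))


def prevalence_slope(age: float, beta: Sequence[float]) -> float:
    """Derivative dp/da, using the chain rule on the logistic function."""
    _, b1, b2, b3 = beta
    p = prevalence(age, beta)
    return (b1 + 2.0 * b2 * age + 3.0 * b3 * age ** 2) * p * (1.0 - p)


def synthetic_cohort_incidence(age: float,
                               beta: Sequence[float],
                               excess_mortality: Callable[[float], float]) -> float:
    """Stable-epidemic estimator: slope of prevalence scaled to the susceptible
    fraction, plus the prevalent cases removed by excess mortality."""
    p = prevalence(age, beta)
    return prevalence_slope(age, beta) / (1.0 - p) + excess_mortality(age) * p


# Hypothetical coefficients and a hypothetical constant excess mortality of 0.5%/year.
beta_example = (-9.0, 0.6, -0.012, 0.0001)
print(synthetic_cohort_incidence(20.0, beta_example, lambda a: 0.005))
```

Because only the excess mortality term enters, the sketch never needs total mortality, which is the property of the estimator highlighted in the text.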

Reproducibility of the incidence estimate at any given age was investigated by bootstrapping the dataset (reproducing the complex sampling frame employed in the survey), refitting the models and obtaining an incidence estimate for each of the 10,000 resampled datasets. The standard deviation of the obtained estimates was computed to approximate the standard error, and the 2.5th and 97.5th percentiles to approximate the 95% confidence interval.
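The summary step described above (standard deviation as an approximate standard error, and the 2.5th and 97.5th percentiles as an approximate 95% confidence interval) might look like the sketch below. The helper names are hypothetical, and the linear interpolation between order statistics is one common percentile convention chosen here as an assumption; the original analysis may use a different one.

```python
import statistics
from typing import List, Sequence, Tuple


def percentile(sorted_values: Sequence[float], q: float) -> float:
    """Percentile by linear interpolation between order statistics (0 <= q <= 1)."""
    position = q * (len(sorted_values) - 1)
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] * (1.0 - fraction) + sorted_values[upper] * fraction


def bootstrap_summary(estimates: List[float]) -> Tuple[float, float, float]:
    """Return (standard error, lower 95% bound, upper 95% bound) from replicates."""
    ordered = sorted(estimates)
    standard_error = statistics.stdev(ordered)
    return standard_error, percentile(ordered, 0.025), percentile(ordered, 0.975)
```

Each element of `estimates` would be the incidence estimate at a given age from one resampled dataset, so the function is applied once per age of interest.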

Estimating age-specific incidence using the combined method

In order to estimate age-specific incidence by combining HIV prevalence data and biomarkers for recent infection, available in the same dataset, we estimated age-specific incidence (and its variance) using (1) the synthetic cohort method described above, and (2) an adaptation of the Kassanjee et al. estimator to age-structured recency biomarker data. The adapted estimator is shown in Eq 7.

$$\lambda_R(a) = \frac{P_H(a)\left(P_{R|+}(a) - \epsilon_T\right)}{\left(1 - P_H(a)\right)\left(\Omega_T - \epsilon_T T\right)} \tag{7}$$

where PH(a) is the age-specific prevalence of HIV (estimated as in the previous section), and PR|+(a) is the age-specific prevalence of recency amongst HIV-positives (described below).

In order to obtain the prevalence of recency as a function of age we fit a generalised linear regression model with log of age as linear predictor and a complementary log-log link. This functional form implies an exponential decline in the prevalence of recency with age, which captures the epidemiologically sensible assumption that at young ages larger proportions of infections were acquired in the recent past. The model has the functional form:

$$g\left(P_{R|+}(a)\right) = \beta_0 + \beta_1 \ln(a) \tag{8}$$

with g() the link function, so that

$$P_{R|+}(a) = 1 - \exp\left(-\exp\left(\beta_0 + \beta_1 \ln(a)\right)\right) \tag{9}$$
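The complementary log-log model with log of age as predictor can be evaluated as in the sketch below. The coefficient values are invented for illustration and the function names are hypothetical; only the link function and the form of the predictor come from the text.

```python
import math


def cloglog(p: float) -> float:
    """Complementary log-log link: g(p) = ln(-ln(1 - p))."""
    return math.log(-math.log(1.0 - p))


def recency_prevalence(age: float, beta0: float, beta1: float) -> float:
    """Prevalence of recency among HIV-positives at a given age (inverse link)."""
    eta = beta0 + beta1 * math.log(age)
    return 1.0 - math.exp(-math.exp(eta))


# Hypothetical coefficients: a negative slope gives declining recency with age.
for age in (15, 20, 25, 30):
    print(age, round(recency_prevalence(age, 4.0, -2.5), 4))
```

With a negative coefficient on log age, the fitted proportion of recent infections falls monotonically with age, which is the behaviour the text describes as epidemiologically sensible.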

Owing to the use of prevalence in both estimates, the incidence estimates are necessarily correlated. We therefore resample the data (replicating the complex sampling frame), fit the models of PH(a) and PR|+(a), and at each age of interest, obtain the two incidence estimates, λP (incidence from age-structured prevalence) and λR (incidence from age-structured recency amongst positives). We then evaluate, at each age of interest, from 10,000 bootstrap iterations, the variances and covariance of the two incidence estimates, in the case of λR further incorporating uncertainty in MDRI and FRR. We then compute a combined incidence estimate using a weighted average of the two estimates. The implied weighting function, W(a), derived from the ‘optimal’ weighting factors, Wa (i.e. the weighting factor that minimises variance of the combined estimate) obtained at each evaluated age, is then convolved with the combined incidence function. At a particular age a, incidence from the combined method is given by Eq 10.

$$\lambda_a = W_a \lambda_P + \left(1 - W_a\right) \lambda_R \tag{10}$$

with 0 ≤ Wa ≤ 1, and consequently no normalisation to total weight is required. The variance of the incidence estimate at a given age is obtained from Eq 11.

$$\sigma^2_{\lambda_a} = W_a^2 \sigma_P^2 + \left(1 - W_a\right)^2 \sigma_R^2 + 2 \rho W_a \left(1 - W_a\right) \sigma_P \sigma_R \tag{11}$$

with σP and σR the standard deviations of λP and λR, and ρ the Pearson’s correlation coefficient between λP and λR at that age. The value of W that minimises total variance at the age of interest is obtained from the following formula, derived by setting the derivative of Eq 11 with respect to W to zero and solving for W:

$$W_a = \frac{\sigma_R^2 - \rho \sigma_P \sigma_R}{\sigma_P^2 + \sigma_R^2 - 2 \rho \sigma_P \sigma_R} \tag{12}$$
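The variance-minimising combination of two correlated estimates can be sketched as below, with the weight applied to the synthetic cohort estimate. This is an illustration under stated assumptions rather than the code in S1 Code: the function names are hypothetical, the variances and correlation would in practice come from the bootstrap replicates, and clipping the weight to the interval [0, 1] is an assumption added here to respect the stated constraint.

```python
import math
from typing import Tuple


def optimal_weight(var_p: float, var_r: float, rho: float) -> float:
    """Weight on the prevalence-based estimate that minimises combined variance.

    Undefined when both variances are equal and rho is exactly 1 (zero denominator).
    """
    cov = rho * math.sqrt(var_p) * math.sqrt(var_r)
    weight = (var_r - cov) / (var_p + var_r - 2.0 * cov)
    # Assumption: clip to [0, 1], since the text constrains the weight to this range.
    return min(1.0, max(0.0, weight))


def combined_variance(weight: float, var_p: float, var_r: float, rho: float) -> float:
    """Variance of the weighted average of two correlated estimates."""
    cov = rho * math.sqrt(var_p) * math.sqrt(var_r)
    return (weight ** 2 * var_p + (1.0 - weight) ** 2 * var_r
            + 2.0 * weight * (1.0 - weight) * cov)


def combine(lambda_p: float, lambda_r: float,
            var_p: float, var_r: float, rho: float) -> Tuple[float, float, float]:
    """Return (combined estimate, its variance, weight on lambda_p)."""
    weight = optimal_weight(var_p, var_r, rho)
    estimate = weight * lambda_p + (1.0 - weight) * lambda_r
    return estimate, combined_variance(weight, var_p, var_r, rho), weight
```

In the uncorrelated case the weight reduces to inverse-variance weighting: with variances of 1 and 4, the more precise estimate receives a weight of 0.8 and the combined variance is 0.8, lower than either input.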

The continuous incidence function λ(a) is then obtained by fitting a cubic interpolating spline (using the method of Forsythe, Malcolm and Moler) to estimated incidence, λa, for all ages in the range 15 to 35 years, evaluated at steps of 0.25 years.

For comparability with conventional age-group estimates, ‘average incidence’ was estimated in age bins. The integral of the λ(a) function was evaluated over the age range for which average incidence was sought, and weighted using a weighting function reflecting (a) the sampling density, or (b) the susceptible population density, to obtain average incidence. For population weighting, the population by age and sex was obtained from the 2011 Census for Eshowe and Mbongolwane, and the susceptible population size estimated using prevalence estimates from the survey data. Susceptible population-weighted estimates are presented as primary results.

The unweighted incidence spline function, λ(a), was weighted by a weighting function f(a), derived from either the sampling density or the susceptible population density, and the integral evaluated over the age range of interest (a0 to a1) in order to obtain weighted average incidence over that range, as shown in Eq 13.

$$\bar{\lambda} = \frac{\int_{a_0}^{a_1} \lambda(a)\, f(a)\, \mathrm{d}a}{\int_{a_0}^{a_1} f(a)\, \mathrm{d}a} \tag{13}$$

The procedure was performed separately for males and females, and in order to obtain overall average incidence, these estimates were then further weighted using the weighting functions for males and females.

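Defining total weights for the two sexes as $F_M = \int_{a_0}^{a_1} f_M(a)\,\mathrm{d}a$ and $F_F = \int_{a_0}^{a_1} f_F(a)\,\mathrm{d}a$, for any age interval and weighting function, the total incidence is then given by Eq 14.

$$\bar{\lambda}_{\mathrm{total}} = \frac{F_M \bar{\lambda}_M + F_F \bar{\lambda}_F}{F_M + F_F} \tag{14}$$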
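The weighted averaging over an age range, and the subsequent combination of the sex-specific averages, can be illustrated numerically as follows. This is a sketch under assumptions: the composite trapezoid rule is chosen here for simplicity and is not necessarily the integration method used in the analysis, and the function names and example inputs are hypothetical.

```python
from typing import Callable

AgeFunction = Callable[[float], float]


def integrate(func: AgeFunction, a0: float, a1: float, steps: int = 1000) -> float:
    """Composite trapezoid rule on the interval [a0, a1]."""
    width = (a1 - a0) / steps
    total = 0.5 * (func(a0) + func(a1))
    for i in range(1, steps):
        total += func(a0 + i * width)
    return total * width


def weighted_average_incidence(incidence: AgeFunction, weight: AgeFunction,
                               a0: float, a1: float) -> float:
    """Average of incidence(a) over [a0, a1], weighted by weight(a)."""
    numerator = integrate(lambda a: incidence(a) * weight(a), a0, a1)
    return numerator / integrate(weight, a0, a1)


def overall_incidence(inc_m: AgeFunction, weight_m: AgeFunction,
                      inc_f: AgeFunction, weight_f: AgeFunction,
                      a0: float, a1: float) -> float:
    """Combine sex-specific averages using each sex's total weight on [a0, a1]."""
    total_m = integrate(weight_m, a0, a1)
    total_f = integrate(weight_f, a0, a1)
    avg_m = weighted_average_incidence(inc_m, weight_m, a0, a1)
    avg_f = weighted_average_incidence(inc_f, weight_f, a0, a1)
    return (total_m * avg_m + total_f * avg_f) / (total_m + total_f)
```

Here `weight` stands for either the sampling density or the susceptible population density; swapping one for the other changes only the weighting function passed in, not the calculation.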

Confidence intervals were obtained by bootstrapping the data (10,000 iterations), and in each iteration estimating average incidence.

The source code utilised in this estimation procedure is made available in S1 Code.

Sensitivity analyses

In order to investigate the sensitivity of our analyses to uncertainty in the False-Recent Rate, we repeated the incidence estimation procedure using a range of FRRs between 0% and 1%. We further investigated sensitivity of average incidence to the weighting scheme.

The implementation of the method developed in this paper does not take into account change in prevalence (and incidence) in the time dimension. This is valid when the epidemic is relatively stable and most of the information is captured in the age structure of prevalence. In order to investigate sensitivity to possible change over time in age-specific prevalence, we investigated a number of hypothetical scenarios in which age-specific prevalence is increasing or decreasing exponentially.
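To see why a time trend in prevalence matters, consider one hypothetical scenario in which age-specific prevalence changes exponentially in time at rate r, so that the time derivative of prevalence at the survey date is r times p(a). Substituting this into the full age- and time-structured estimator (Eq 2) gives the sketch below. The parameterisation and all input values are assumptions for illustration; the scenarios actually analysed are those reported in S2 Appendix and may be specified differently.

```python
def incidence_with_time_trend(p: float, dp_da: float, delta: float, r: float) -> float:
    """Incidence at one age when prevalence changes exponentially in time.

    p      prevalence at this age
    dp_da  slope of prevalence with respect to age
    delta  excess mortality rate at this age
    r      assumed exponential rate of change of prevalence in time (per year)
    """
    # The time derivative of prevalence is r * p under the exponential assumption.
    return (dp_da + r * p) / (1.0 - p) + delta * p


# Hypothetical values: prevalence 20%, age slope 0.03/year, excess mortality 1%/year.
for rate in (-0.05, 0.0, 0.05):
    print(rate, round(incidence_with_time_trend(0.2, 0.03, 0.01, rate), 4))
```

Setting r to zero recovers the stable-epidemic estimator; a rising prevalence (positive r) means the age-only version underestimates incidence, and a falling prevalence means it overestimates.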

Sensitivity analyses are reported in S2 Appendix.

Results
Conventional analysis of the biomarkers for recent infection (in large age bins) yielded an overall HIV incidence estimate for individuals aged 15 to 59 years at the time of the survey of 1.60 cases/100 person-years (PY) (95% CI: 1.11,2.16). In males 15-59 the incidence was estimated at 0.71/100PY (0.22,1.25) and in females 15-59 at 2.26/100PY (1.48,3.14). Among individuals aged 15-29, the main group of interest in this work, overall incidence was estimated at 2.03/100PY (1.37,2.77), for males at 0.89/100PY (0.28,1.58) and for females at 3.05/100PY (1.87,4.37). These results are presented in Table 1. Smaller age bins do not yield informative results using the conventional approach, owing to the small case counts of recent infections.

By way of comparison, the age-continuous biomarker-based method, which is a key component of the combined method, yielded ‘average incidence’ estimates (weighted by susceptible population density) in individuals aged 15-29 overall of 1.87 cases/100PY (1.31,2.43), in males of 0.81/100PY (0.22,1.45) and in females of 2.95/100PY (1.98,4.04). The synthetic cohort method yielded susceptible population-weighted average incidence in individuals aged 15-29 overall of 3.19/100PY (2.83,3.56), in males of 2.00/100PY (1.53,2.46) and in females of 4.39/100PY (4.00,4.85). Using the combined method, we obtained incidence estimates in individuals aged 15-29 of 2.54/100PY (2.07,2.77), in males of 1.26/100PY (0.64,1.49) and in females of 3.83/100PY (3.35,4.37). These results (as well as for five-year age bins) are reported in Tables 2 and 3.

Table 2. ‘Average incidence’ estimates by age group using the biomarker and synthetic cohort methods.

Table 3. ‘Average incidence’ estimates by age group using the combined method.

Age-specific estimates using the combined method are shown in the figures. Incidence estimates are presented as continuous functions of age for individuals aged 15-29, with the contributions of the age-continuous biomarker and synthetic cohort methods. Fig 1 shows the overall results, Fig 2 the estimates for males and Fig 3 the estimates for females. Estimates become uninformative at ages over 30, owing to greatly increased statistical uncertainty.

Fig 1. HIV incidence by age in males and females aged 15-30, using the synthetic cohort, recency biomarker and combined methods.

Fig 2. HIV incidence by age in males aged 15-30, using the synthetic cohort, recency biomarker and combined methods.

Fig 3. HIV incidence by age in females aged 15-30, using the synthetic cohort, recency biomarker and combined methods.

Incidence in females rose steeply during the teenage years, from 1.31 cases/100PY (0.97,1.66) at age 15 to a peak of 4.95/100PY (4.09,5.81) at age 23. Incidence was lower—but still very high—in women in their late twenties and early 30s, with estimated incidence of 4.50/100PY (3.07,5.92) at age 30. Uncertainty in the estimates increased with age (with a standard error of approximately 0.7 at age 30, compared to 0.4 at age 23). Estimates were very imprecise for ages over 30: at age 35, incidence was estimated at 2.78/100PY, with a standard error of 1.67, resulting in a 95% CI of 0.00,6.06. Age-specific incidence in teenaged males was substantially lower than in females, estimated at 0.32 cases/100PY (0.00,0.65) at age 15, and rising sharply from the early twenties, peaking at 4.10/100PY (2.75,5.46) at age 30. Incidence in males aged 23 was estimated at 1.39/100PY (0.95,1.82). Overall incidence estimates reflect the estimates for males and females so that estimated incidence at age 15 was 0.82 cases/100PY (0.64,1.00), peaked at 4.47/100PY (3.52,5.41) at age 29, and was 4.42/100PY (3.37,5.47) at age 30.

Discussion
This study describes a novel hybrid method that allows for reasonably precise estimation of age-specific incidence up to about age 30 years. It constitutes a significant improvement over conventional cross-sectional incidence estimation using biomarkers of recent infection, where small case counts limit informative estimates to large age bins.

We confirm previously-described very high incidence among young women and also among slightly older young men. A compartmental mathematical model developed by Blaizot et al. [25] produced similar incidence estimates by sex and age group when calibrated to the same data [26]. In females, incidence peaked at age 23, and in males at age 30. We have previously described that young people were more likely to transmit HIV. In the same survey, among individuals aged 15-19 years and 20-34 years 34% and 35% respectively were unaware of their HIV status and 66% and 53% were virally unsuppressed; both factors were associated with higher-risk sexual behaviour [20]. Precise age-specific incidence estimates are important to identify the age and gender groups most at risk. These findings highlight the need for targeted prevention and HIV testing strategies for girls and young women, as well as men aged 20 to 40 years.

The conventional biomarker-based approach does not allow finely-grained age-specific incidence estimation, since small case counts (or sample proportions) result in very wide confidence intervals. Even analysis of the data in five-year age bins produces estimates that cannot be clearly distinguished from zero. Our adapted age-continuous biomarker estimates provide reasonably reproducible estimates in younger individuals, where the parameterisation of the prevalence of recent infection (amongst HIV-positive individuals) is likely to be sound. However, this method would be more challenging to implement in older individuals, where the distribution of recent infections is more complicated.

The synthetic cohort method provides additional information on incidence, and in certain age ranges is in fact more informative than the biomarker method. As can be seen in the figures, at younger ages the two estimates are very similar, but they diverge at older ages. At younger ages the synthetic cohort method has greater precision (estimates have lower variance). In females, the combined estimate is weighted in favour of the synthetic cohort method throughout the age range 15 to 29, but is more heavily skewed at younger ages (weighting factor of 0.84 at age 15 and 0.54 at age 29), whereas in males the weighting tips towards the biomarker method at age 25.

The idea of using demographic structure of prevalence data to infer incidence is certainly not new. Williams et al. [1] developed something very close to the approach we are taking—the main difference being their proposal (in light of data available at that time) to use age-averaged rather than age-specific estimates of time dependence of prevalence. We follow the instantaneously exact, fully age and time-structured, representation of the relation of prevalence, mortality and incidence that was introduced in Mahiane et al. [6]. That paper also considered the previously-published methods of Brunet and Struchiner [2, 3], Hallett et al. [4], and Brookmeyer and Konikoff [5], all of which were found to have substantial biases, noted to be the result of their various particular forms of dynamical approximation—essentially using assumptions of constant prevalence in age and time ranges.

An advantage of the hybrid approach is that it combines both age-specific HIV prevalence data and biomarker data, thus reducing the risk of bias in HIV incidence estimation. By combining the estimates from the two methods, age-specific incidence can be estimated with significantly greater precision than with the biomarker method alone. For example, at age 15 in females, the standard error on the age-continuous biomarker estimate of 0.91 cases/100PY is 0.42 (i.e. a coefficient of variation of 46%) and the standard error on the weighted average of 1.31/100PY is 0.18 (CoV = 13%). The narrower confidence bounds around the combined method estimates can be clearly seen in the figures. At certain ages, there is a very substantial improvement, for example in males aged 22, the CoV on the biomarker estimate is 47%, while on the combined estimate it is 18%. The precision of the combined method is not greatly enhanced over that of the synthetic cohort method, but estimates are likely to be more accurate, especially where information on change in the age structure of prevalence over time is not available, which may bias estimates.

For the recent infection case definition adopted for this analysis we estimated, using CEPHIA calibration data, a very small context-specific false-recent rate. Unfortunately, the FRR is also the test property that is hardest to estimate, and where the transferability from calibration data to the surveyed population is most problematic. While we adopted a sophisticated approach to context adaptation of the test property estimates, these challenges remain. For present purposes we assumed that the test properties (MDRI and FRR) do not vary with age, although it is likely that the longer (on average) time-since-infection in older individuals would impact the FRR and that biological changes in the immune system would impact the MDRI. In estimating context-specific FRR we make assumptions about the population-level distribution of times-since-infection, but a lack of data on past incidence precludes a more nuanced age-specific FRR estimate. This limitation is addressed by means of a sensitivity analysis with respect to FRR, as reported in S2 Appendix. Given the very low population-level FRR estimate, it is unlikely that this assumption introduces substantial bias. The sensitivity analyses indicate that our results are not highly sensitive to the false-recent rate, although they become more so at older ages, where the combined method relies more on the biomarker-based estimate.

While the use of pooled nucleic acid amplification testing increases the sensitivity of the screening algorithm, this strategy adds considerably to the cost. Defining NAAT yield (acutely infected) cases as recently infected also added approximately 15 days to the MDRI of the RITA [27]. However, we identified only two acute infections, and it is not clear that this strategy is feasible in most large population-based surveys.

The impact of ART on recency biomarkers is well-established—treated individuals tend to undergo partial seroreversion resulting in “false” recent classifications. The inclusion of a viral load threshold in the RITA ameliorates this problem, resulting in a very low FRR, although calibration data are lacking on treated but virally unsuppressed individuals (see S1 Appendix). The increasing adoption of early treatment (“Universal Test-and-Treat”) has the potential to impact the MDRI of RITAs that classify treated (or virally suppressed) individuals as non-recent, although at the time of this survey, very few (if any) individuals in the study population would have received ART within two years of infection. The impact of ART on the synthetic cohort method is likely to be largely innocuous, as long as accurate age-specific excess mortality estimates are employed. Increasing uptake of early ART would further reduce the already low excess mortality in young HIV-infected individuals.

A major limitation of this study is that we are analysing data from a single cross-sectional survey, providing no information on change in the (age-structured) prevalence over time. A second survey is planned in the study population, which would allow future analyses to be conducted that explicitly incorporate change over time. We investigated the sensitivity of our estimates to prevalence changes in time, and found that estimates from the combined method are not very sensitive to plausible rates of change in prevalence at the time of the survey. However, if the assumption of a stable epidemic were violated and rapid increases or declines in prevalence were taking place at the time of the survey, our method would exhibit significant bias. It would therefore be preferable to explicitly incorporate the time dimension in the analysis (by using data from serial prevalence surveys) and it is essential that the version of the method that ignores time is only applied in settings where the assumption of stability is sound.

Further, estimates become very uncertain at ages over 30 (see Fig 4), resulting in synthetic cohort, biomarker and combined method estimates with confidence intervals that stretch from close to zero to very large values. The failure of the method to provide informative estimates at higher ages requires further investigation. This limitation may, in part, reflect the particular parameterisation of regression models for HIV prevalence and for the prevalence of ‘recent infection’ used in the present analysis.

Fig 4. HIV incidence by age in males and females aged 15-35, using the synthetic cohort, recency biomarker and combined methods.


This analysis demonstrates the value of age-structured prevalence data when reasonable estimates of excess mortality are available, and shows that additional biomarkers of recent infection, when available, can be sensibly incorporated into age-specific incidence estimates. The novel hybrid method used in this analysis can be extended, without significant further conceptual development, to the analysis of serial prevalence (and, when available, recent infection) data, for maximally informative incidence estimation. Application of this method to large-scale population-based HIV prevalence surveys is likely to result in improved incidence surveillance over methods currently in wide use. Reasonably accurate and precise age-specific estimates of incidence are important for better targeting of prevention, diagnosis and care strategies.
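
As a rough illustration of why combining two estimates can improve precision, the sketch below applies simple inverse-variance weighting to two independent estimates. This is an assumption made for illustration: the weighting used in this analysis may differ, for example by accounting for correlation between estimates derived from the same sample. The function name and all numbers are hypothetical.

```python
import math


def combine_inverse_variance(est_a, se_a, est_b, se_b):
    """Inverse-variance weighted average of two independent estimates.

    Returns the combined estimate and its standard error.
    """
    weight_a = 1.0 / se_a ** 2
    weight_b = 1.0 / se_b ** 2
    combined = (weight_a * est_a + weight_b * est_b) / (weight_a + weight_b)
    combined_se = math.sqrt(1.0 / (weight_a + weight_b))
    return combined, combined_se


# Hypothetical incidence estimates (per person-year) and standard errors.
combined, combined_se = combine_inverse_variance(0.03, 0.01, 0.02, 0.02)
print(f"combined: {combined:.4f} (SE {combined_se:.4f})")
```

Under the independence assumption, the combined standard error is always smaller than either input standard error, and the more precise estimate receives the larger weight.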

Supporting information

S1 Appendix. Optimal RITA identification and calibration.


S1 Dataset. Anonymised dataset.

The variables id, cluster and ward are randomised participant, cluster (primary sampling unit) and electoral ward (stratum) identifiers, respectively. To replicate the multistage sampling frame during bootstrapping, wards were resampled with replacement, and within each sampled ward clusters were resampled with replacement. The age_years variable captures age in years, rounded to whole numbers as a safeguard against de-anonymisation. Final HIV status is captured in hiv_status, and participants identified as HIV-infected using nucleic acid amplification testing have “True” in naat_yield. Final LAg normalised optical density and Bio-Rad Avidity avidity index are captured in LAgODn and BioRadAI, respectively, and viral load in viral_load. For convenience, recency status according to the RITA utilised in this analysis is captured in recent.
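
The two-stage resampling described above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the code used in this analysis; the function name and the toy records are hypothetical, and only the id, ward and cluster variable names are taken from the dataset description.

```python
import random
from collections import defaultdict


def two_stage_bootstrap(records, rng):
    """Draw one bootstrap replicate of a ward/cluster survey sample.

    Wards are resampled with replacement; within each drawn ward,
    clusters are resampled with replacement. All participants in a
    drawn cluster are retained.
    """
    clusters_by_ward = defaultdict(lambda: defaultdict(list))
    for rec in records:
        clusters_by_ward[rec["ward"]][rec["cluster"]].append(rec)

    wards = sorted(clusters_by_ward)
    replicate = []
    for ward in rng.choices(wards, k=len(wards)):
        clusters = sorted(clusters_by_ward[ward])
        for cluster in rng.choices(clusters, k=len(clusters)):
            replicate.extend(clusters_by_ward[ward][cluster])
    return replicate


# Toy records (hypothetical): two wards, each with two clusters.
toy = [
    {"id": 1, "ward": "A", "cluster": "A1"},
    {"id": 2, "ward": "A", "cluster": "A1"},
    {"id": 3, "ward": "A", "cluster": "A2"},
    {"id": 4, "ward": "B", "cluster": "B1"},
    {"id": 5, "ward": "B", "cluster": "B2"},
    {"id": 6, "ward": "B", "cluster": "B2"},
]
rng = random.Random(2018)  # fixed seed so the replicate is reproducible
replicate = two_stage_bootstrap(toy, rng)
print(len(replicate), "records in this replicate")
```

Repeating the draw many times and recomputing the incidence estimate on each replicate yields a bootstrap distribution from which confidence intervals can be read off.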



Acknowledgments

The authors gratefully acknowledge the Centre for High Performance Computing for providing computational resources to this research project.

The authors are grateful to the Consortium for the Evaluation and Performance of HIV Incidence Assays (CEPHIA) for allowing us to use its recent infection biomarker calibration data in this study, specifically the principal investigators Gary Murphy, Christopher D. Pilcher, Michael P. Busch and Alex Welte, and the core team comprising Sheila Keating, Mila Lebedeva, Dylan Hampton, Jake Hall, Elaine McKinney, Kara Marson, Shelley Facente, Eduard Grebe, Reshma Kassanjee and Trust Chibawara. The analysis presented in this paper was conducted by Eduard Grebe.

CEPHIA comprises: Oliver Laeyendecker, Thomas Quinn, David Burns (National Institutes of Health); Alex Welte, Eduard Grebe, Reshma Kassanjee, David Matten, Hilmarié Brand, Trust Chibawara (South African Centre for Epidemiological Modelling and Analysis); Gary Murphy, Elaine McKinney, Jake Hall (Public Health England); Michael Busch, Sheila Keating, Mila Lebedeva, Dylan Hampton (Blood Systems Research Institute); Christopher Pilcher, Kara Marson, Shelley Facente, Jeffrey Martin (University of California, San Francisco); Susan Little (University of California, San Diego); Anita Sands (World Health Organization); Tim Hallett (Imperial College London); Sherry Michele Owen, Bharat Parekh, Connie Sexton (Centers for Disease Control and Prevention); Matthew Price, Anatoli Kamali (International AIDS Vaccine Initiative); Lisa Loeb (The Options Study – University of California, San Francisco); Jeffrey Martin, Steven G Deeks, Rebecca Hoh (The SCOPE Study – University of California, San Francisco); Zelinda Bartolomei, Natalia Cerqueira (The AMPLIAR Cohort – University of São Paulo); Breno Santos, Kellin Zabtoski, Rita de Cassia Alves Lira (The AMPLIAR Cohort – Grupo Hospital Conceição); Rosa Dea Sperhacke, Leonardo R Motta, Machline Paganella (The AMPLIAR Cohort – Universidade Caxias Do Sul); Esper Kallas, Helena Tomiyama, Claudia Tomiyama, Priscilla Costa, Maria A Nunes, Gisele Reis, Mariana M Sauer, Natalia Cerqueira, Zelinda Nakagawa, Lilian Ferrari, Ana P Amaral, Karine Milani (The São Paulo Cohort – University of São Paulo, Brazil); Salim S Abdool Karim, Quarraisha Abdool Karim, Thumbi Ndungu, Nelisile Majola, Natasha Samsunder (CAPRISA, University of KwaZulu-Natal); Denise Naniche (The GAMA Study – Barcelona Centre for International Health Research); Inácio Mandomando, Eusebio V Macete (The GAMA Study – Fundacao Manhica); Jorge Sanchez, Javier Lama (SABES Cohort – Asociación Civil Impacta Salud y Educación (IMPACTA)); Ann Duerr (The Fred Hutchinson Cancer Research Center); Maria R Capobianchi (National Institute for Infectious Diseases “L. Spallanzani”, Rome); Barbara Suligoi (Istituto Superiore di Sanità, Rome); Susan Stramer (American Red Cross); Phillip Williamson (Creative Testing Solutions / Blood Systems Research Institute); Marion Vermeulen (South African National Blood Service); and Ester Sabino (Hemocentro do Sao Paolo). General enquiries may be addressed to Gary Murphy at


References

  1. Williams B, Gouws E, Wilkinson D, Karim SA. Estimating HIV incidence rates from age prevalence data in epidemic situations. Statistics in Medicine. 2001;20(13):2003–2016. pmid:11427956
  2. Brunet RC, Struchiner CJ. Rate Estimation from Prevalence Information on a Simple Epidemiologic Model for Health Interventions. Theoretical Population Biology. 1996;50(3):209–226. pmid:9000488
  3. Brunet RC, Struchiner CJ. A Non-parametric Method for the Reconstruction of Age- and Time-Dependent Incidence from the Prevalence Data of Irreversible Diseases with Differential Mortality. Theoretical Population Biology. 1999;56(1):76–90. pmid:10438670
  4. Hallett TB, Zaba B, Todd J, Lopman B, Mwita W, Biraro S, et al. Estimating Incidence from Prevalence in Generalised HIV Epidemics: Methods and Validation. PLoS Medicine. 2008;5(4):e80. pmid:18590346
  5. Brookmeyer R, Konikoff J. Statistical Considerations in Determining HIV Incidence from Changes in HIV Prevalence. Statistical Communications in Infectious Diseases. 2011;3(1).
  6. Mahiane GS, Ouifki R, Brand H, Delva W, Welte A. A General HIV Incidence Inference Scheme Based on Likelihood of Individual Level Data and a Population Renewal Equation. PLoS ONE. 2012;7(9):e44377. pmid:22984497
  7. McWalter TA, Welte A. A Comparison of Biomarker Based Incidence Estimators. PLoS ONE. 2009;4(10):e7368. pmid:19809505
  8. Kassanjee R, McWalter TA, Bärnighausen T, Welte A. A New General Biomarker-based Incidence Estimator. Epidemiology. 2012;23(5):721–728. pmid:22627902
  9. Brookmeyer R, Laeyendecker O, Donnell D, Eshleman SH. Cross-Sectional HIV Incidence Estimation in HIV Prevention Research. JAIDS Journal of Acquired Immune Deficiency Syndromes. 2013;63:S233–S239. pmid:23764641
  10. Brookmeyer R, Konikoff J, Laeyendecker O, Eshleman SH. Estimation of HIV Incidence Using Multiple Biomarkers. American Journal of Epidemiology. 2013;177(3):264–272. pmid:23302151
  11. Kassanjee R, Pilcher CD, Busch MP, Murphy G, Facente SN, Keating SM, et al. Viral load criteria and threshold optimization to improve HIV incidence assay characteristics. AIDS. 2016;30(15):2361–2371. pmid:27454561
  12. Johnson LF, Hallett TB, Rehle TM, Dorrington RE. The effect of changes in condom usage and antiretroviral treatment coverage on human immunodeficiency virus incidence in South Africa: a model-based analysis. Journal of The Royal Society Interface. 2012;9(72):1544–1554.
  13. Brown T, Bao L, Eaton JW, Hogan DR, Mahy M, Marsh K, et al. Improvements in prevalence trend fitting and incidence estimation in EPP 2013. AIDS. 2014;28:S415–S425.
  14. Johnson LF, Chiu C, Myer L, Davies MA, Dorrington RE, Bekker LG, et al. Prospects for HIV control in South Africa: a model-based analysis. Global Health Action. 2016;9(1):30314. pmid:27282146
  15. Stover J, Brown T, Puckett R, Peerapatanapokin W. Updates to the Spectrum/Estimations and Projections Package model for estimating trends and current values for key HIV indicators. AIDS. 2017;31(October 2016):S5–S11.
  16. Rehle T, Johnson L, Hallett T, Mahy M, Kim A, Odido H, et al. A Comparison of South African National HIV Incidence Estimates: A Critical Appraisal of Different Methods. PLoS ONE. 2015;10(7):e0133255. pmid:26230949
  17. Huerga H, Van Cutsem G, Farhat JB, Reid M, Bouhenia M, Maman D, et al. Who needs to be targeted for HIV testing and treatment in KwaZulu-Natal? Results from a population-based survey. Journal of Acquired Immune Deficiency Syndromes. 2016;73(4):411. pmid:27243903
  18. Duong YT, Qiu M, De AK, Jackson K, Dobbs T, Kim AA, et al. Detection of Recent HIV-1 Infection Using a New Limiting-Antigen Avidity Assay: Potential for HIV-1 Incidence Estimates and Avidity Maturation Studies. PLoS ONE. 2012;7(3):e33328. pmid:22479384
  19. Masciotra S, Dobbs T, Candal D, Hanson D, Delaney K, Rudolph D, et al. Antibody avidity-based assay for identifying recent HIV-1 infections based on Genetic Systems TM1/2 Plus O EIA. Conference on Retroviruses and Opportunistic Infections; 2010 Feb 16-19; San Francisco, CA.
  20. Huerga H, Venables E, Ben-Farhat J, van Cutsem G, Ellman T, Kenyon C. Higher risk sexual behaviour is associated with unawareness of HIV-positivity and lack of viral suppression – implications for Treatment as Prevention. Scientific Reports. 2017;7(1):16117. pmid:29170407
  21. Huerga H, Shiferie F, Grebe E, Giuliani R, Farhat JB, Van-Cutsem G, Cohen K. A comparison of self-report and antiretroviral detection to inform estimates of antiretroviral therapy coverage, viral load suppression and HIV incidence in Kwazulu-Natal, South Africa. BMC Infectious Diseases. 2017;17(1):653. pmid:28969607
  22. Huerga H, Van Cutsem G, Farhat JB, Puren A, Bouhenia M, Wiesner L, et al. Progress towards the UNAIDS 90–90-90 goals by age and gender in a rural area of KwaZulu-Natal, South Africa: a household-based community cross-sectional survey. BMC Public Health. 2018;18(1):303. pmid:29499668
  23. Welte A, Grebe E, McIntosh A, Bäumler P. inctools: Incidence Estimation Tools; 2018. R package version 1.0.11. Available from:
  24. Johnson LF, May MT, Dorrington RE, Cornell M, Boulle A, Egger M, et al. Estimating the impact of antiretroviral treatment on adult mortality trends in South Africa: A mathematical modelling study. PLOS Medicine. 2017;14(12):e1002468. pmid:29232366
  25. Blaizot S, Riche B, Maman D, Mukui I, Kirubi B, Etard JF, Ecochard R. Estimation and short-term prediction of the course of the HIV epidemic using demographic and health survey methodology-like data. PLOS ONE. 2015;10(6):e0130387. pmid:26091253
  26. Blaizot S, Huerga H, Riche B, Ellman T, Shroufi A, Etard JF, Ecochard R. Combined interventions to reduce HIV incidence in KwaZulu-Natal: a modelling study. BMC Infectious Diseases. 2017;17(1):522. pmid:28747167
  27. Grebe E, Facente S, Powrie A, Gerber J, Priede G, Chibawara T, et al. Infection Dating Tool; 2018. Online application. Available from: