Prediction of extended high viremia among newly HIV-1-infected persons in sub-Saharan Africa

Objective: Prompt identification of newly HIV-infected persons, particularly those who are most at risk of extended high viremia (EHV), allows important clinical and transmission prevention benefits. We sought to determine whether EHV could be predicted during early HIV infection (EHI) from clinical, demographic, and laboratory indicators in a large HIV-1 incidence study in Africa.
Design: Adults acquiring HIV-1 infection were enrolled in an EHI study assessing acute retroviral syndrome (ARS) symptoms and viral dynamics.
Methods: Estimated date of infection (EDI) was based on a positive plasma viral load or p24 antigen test prior to seroconversion, or the mid-point between negative and positive serological tests. EHV was defined as mean untreated viral load ≥5 log10 copies/ml 130–330 days post-EDI. We used logistic regression to develop risk score algorithms for predicting EHV based on sex, age, number of ARS symptoms, and CD4 and viral load at diagnosis.
Results: Models based on the full set of five predictors had excellent performance both in the full population (c-statistic = 0.80) and when confined to persons with each of three HIV-1 subtypes (c-statistic = 0.80–0.83 within subtypes A, C, and D). Reduced models containing only 2–4 predictors performed similarly. In a risk score algorithm based on the final full-population model, predictor scores were one each for male sex and enrollment CD4 <350 cells/mm3, and two for enrollment viral load >4.9 log10 copies/ml. With a risk score cut-point of two, this algorithm was 85% sensitive (95% CI: 76%-91%) and 61% specific (55%-68%) in predicting EHV.
Conclusions: Simple risk score algorithms can reliably identify persons with EHI in sub-Saharan Africa who are likely to sustain high viral loads if treatment is delayed. These algorithms may be useful for prioritizing intensified efforts around care linkage and retention, treatment initiation, adherence support, and partner services to optimize clinical and prevention outcomes.

Introduction Antiretroviral therapy (ART) initiated early in HIV-1 infection preserves immune function [1], reduces adverse clinical outcomes [2][3][4], and prevents transmission [4]. Recognizing these benefits, HIV treatment guidelines recommend ART initiation at diagnosis, regardless of CD4 count [5,6]. However, as of July 2017, only 60% of low- and middle-income countries had adopted the "Treat All" policy, and only 9% had implemented this approach in a majority of treatment sites [7]. These lags in guideline implementation, combined with suboptimal care linkage among those who are ART-eligible at diagnosis [8,9] and poor care retention both before [10] and after [11] ART initiation, result in substantial losses of both clinical and transmission prevention benefits.
Delayed ART initiation and poor retention are particularly detrimental among persons who sustain high viral loads in the absence of treatment, as they are at especially high risk of onward transmission [12] and disease progression [13,14]. If persons likely to sustain high viremia could be identified at HIV diagnosis, then intensified efforts to support care linkage, ART initiation, partner services, care retention, and ART adherence specifically among these persons could optimize clinical and transmission prevention benefits. We therefore sought to develop risk score algorithms for identifying newly HIV-infected cases likely to have extended high viremia (EHV), that is, likely to sustain viral loads ≥5 log10 more than three months after HIV-1 acquisition, based on demographic, clinical, and laboratory indicators in a large cohort of HIV-1 seroconverters in Africa.

Study design, setting, population, and procedures
As described previously [15], adults at risk of HIV-1 infection in eastern and southern Africa were enrolled into a multi-center cohort study across nine research centers in Kenya (Nairobi, Kilifi), Uganda (Entebbe, Masaka), Rwanda (Kigali), Zambia (Lusaka, Copperbelt), and South Africa (Rustenburg, Cape Town). Study volunteers were tested for HIV-1 monthly or quarterly (depending on site), including p24 antigen testing to detect infection before seroconversion. Blood collected at antibody-negative visits was saved to enable retrospective HIV RNA testing for acute HIV infection if a subsequent sample was found antibody-positive. Volunteers with incident HIV-1 infection detected between March 2005 and December 2011 were invited to enroll in an EHI cohort.
At the time of EHI cohort enrollment, volunteers identified ≤90 days after their estimated date of infection (EDI) were asked whether they had experienced eleven symptoms of acute retroviral syndrome (ARS) in the three months before HIV-1 detection: fever, headache, night sweats, myalgia, fatigue, skin rash, oral ulcers, pharyngitis, lymphadenopathy, diarrhea, and anorexia [16]. As described previously [17], blood was drawn at enrollment for pol gene viral subtyping and CD4 and viral load quantification. The EDI was defined as 10 days before the first positive viral load test if antibody and p24 tests were negative at the time of the detectable viral load, 14 days before the first positive p24 test if no previous viral load or antibody tests were positive, or the midpoint between the last negative and first positive HIV-antibody test in the absence of any p24- or RNA-positive samples [17]. Following EHI study enrollment, viral load and CD4 quantification was performed monthly in the first three months after the EDI, quarterly until two years post-EDI, and semiannually thereafter.
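The three-branch EDI rule can be made concrete with a short sketch. This is an illustration only, not the study's code: the function name, parameter names, and the choice to round the midpoint down to a whole day are assumptions introduced here.

```python
from datetime import date, timedelta
from typing import Optional


def estimate_edi(
    rna_only_positive: Optional[date] = None,
    p24_first_positive: Optional[date] = None,
    last_antibody_negative: Optional[date] = None,
    first_antibody_positive: Optional[date] = None,
) -> date:
    """Estimated date of infection (EDI), following the three rules in the text.

    rna_only_positive: date of the first positive viral load drawn while
        antibody and p24 tests were still negative (hypothetical name).
    p24_first_positive: date of the first positive p24 test with no earlier
        positive viral load or antibody test (hypothetical name).
    """
    if rna_only_positive is not None:
        # Rule 1: 10 days before the first positive viral load.
        return rna_only_positive - timedelta(days=10)
    if p24_first_positive is not None:
        # Rule 2: 14 days before the first positive p24 test.
        return p24_first_positive - timedelta(days=14)
    if last_antibody_negative is None or first_antibody_positive is None:
        raise ValueError("need both antibody test dates for the midpoint rule")
    # Rule 3: midpoint between last negative and first positive antibody test.
    # Assumption: an odd gap is rounded down to a whole day.
    gap = first_antibody_positive - last_antibody_negative
    return last_antibody_negative + gap // 2
```

For instance, a last negative antibody test on 1 January 2010 and a first positive test on 31 January 2010 give a midpoint EDI of 16 January 2010.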

Statistical analyses
EHI study participants were included in the current analysis if their enrollment visit was ≤90 days post-EDI, ARS symptoms were assessed at enrollment, and at least one viral load measurement was available for EHV calculation. Similar to prior studies [18,19], we defined EHV as a mean pre-ART viral load ≥5 log10 copies/ml during the period 130-330 days post-EDI. Viral load measurements taken after ART initiation were censored. In descriptive analyses, we examined the distributions (overall and by subtype) of key demographic and clinical variables, including numbers of ARS symptoms and EHV prevalence.
We constructed logistic regression models with EHV as the outcome, first without and then with stratification by HIV-1 subtype. The overall (subtype-nonspecific) analyses were meant to represent model use in locations in which there is not a single predominant subtype, whereas the subtype-specific analyses were intended for locations in which a particular subtype is known to be most prevalent. Subtype-specific analyses were limited to subtypes A, C, and D (excluding a small number of recombinants), the only subtypes for which we had sufficient numbers of volunteers. Based on previous studies of viremia and/or clinical progression [20][21][22][23][24][25][26][27], we included sex, age (≥30 vs. <30), and a dichotomized measure of the number of ARS symptoms (≥2 vs. <2) as predictor variables in each full model. We also included enrollment viral load (dichotomized at the median: >4.9 vs. ≤4.9 log10 copies/ml) and CD4 stratum (<350 vs. ≥350 cells/mm3) as predictors. We conducted backward elimination with a stopping rule of p = 0.2 for each model. To assess the performance of the full and final models, we examined the c-statistic, representing the area under each model's receiver operating characteristic (ROC) curve. C-statistics of <0.69 were considered poor, 0.7-0.79 acceptable, 0.8-0.89 excellent, and ≥0.9 outstanding [28].
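As a minimal sketch of this outcome definition, the function below classifies one participant from a list of viral load measurements. Two points are assumptions made here rather than details stated in the text: the mean is taken over the log10 values themselves, and a measurement drawn on the day ART starts is still counted as pre-ART. All names are hypothetical.

```python
from typing import List, Optional, Tuple


def has_ehv(
    measurements: List[Tuple[int, float]],
    art_start_day: Optional[int] = None,
    window: Tuple[int, int] = (130, 330),
    threshold: float = 5.0,
) -> Optional[bool]:
    """Classify extended high viremia (EHV) for one participant.

    measurements: (days_post_edi, log10_copies_per_ml) pairs.
    art_start_day: days post-EDI at ART initiation, or None if untreated.
    Returns None when no eligible measurement falls in the window.
    """
    first_day, last_day = window
    eligible = [
        log10_vl
        for day, log10_vl in measurements
        if first_day <= day <= last_day
        # Censor measurements taken after ART initiation (assumption:
        # a draw on the ART start day itself is treated as pre-ART).
        and (art_start_day is None or day <= art_start_day)
    ]
    if not eligible:
        return None
    # Assumption: average the log10 values directly.
    return sum(eligible) / len(eligible) >= threshold
```

A participant with values of 5.2 and 4.9 log10 copies/ml at days 150 and 240 would be classified as EHV under these assumptions, since the mean of the two is 5.05.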
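To illustrate what the c-statistic measures, the sketch below computes it as the proportion of case/non-case pairs in which the case received the higher predicted probability, counting ties as one half; this is equivalent to the area under the ROC curve. It is a plain-Python illustration, not the SAS procedure used in the analysis. The labelling function treats the bands as half-open intervals (for example, 0.70 up to but not including 0.80 is "acceptable"), which is an assumption about boundary handling.

```python
from typing import Sequence


def c_statistic(predicted: Sequence[float], outcomes: Sequence[int]) -> float:
    """Concordance statistic: P(case scored above non-case), ties count 0.5."""
    cases = [p for p, y in zip(predicted, outcomes) if y == 1]
    noncases = [p for p, y in zip(predicted, outcomes) if y == 0]
    if not cases or not noncases:
        raise ValueError("need at least one case and one non-case")
    concordant = 0.0
    for p_case in cases:
        for p_noncase in noncases:
            if p_case > p_noncase:
                concordant += 1.0
            elif p_case == p_noncase:
                concordant += 0.5
    return concordant / (len(cases) * len(noncases))


def performance_label(c: float) -> str:
    """Map a c-statistic to the qualitative bands cited in the text."""
    if c >= 0.9:
        return "outstanding"
    if c >= 0.8:
        return "excellent"
    if c >= 0.7:
        return "acceptable"
    return "poor"
```

With four toy predictions (0.9, 0.8, 0.3, 0.2) and outcomes (1, 0, 1, 0), three of the four case/non-case pairs are concordant, so the c-statistic is 0.75.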
After constructing predictive models overall and by HIV-1 subtype, we sought to develop a risk score algorithm for predicting EHV in each context. For each of the four final models (one for the overall population and one for each of the three HIV-1 subtypes), we calculated a predictor score for each explanatory variable by rounding its beta coefficient (i.e., natural log of the odds ratio) to the nearest integer. After calculating a risk score for each volunteer by summing the applicable predictor scores from a given model, we calculated the sensitivity and specificity of each possible risk score cut-point within each model. All analyses were conducted with SAS 9.4 (SAS Institute, Cary, NC).
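The two steps described above, converting model coefficients to integer points and then evaluating each cut-point, can be sketched as follows. The odds ratios used in the usage note are made-up illustrative inputs, not estimates from this study, and the function names are hypothetical. Persons with scores at or above the cut-point are treated as predicted EHV.

```python
import math
from typing import Dict, Sequence, Tuple


def predictor_points(odds_ratios: Dict[str, float]) -> Dict[str, int]:
    """Round each beta coefficient, ln(odds ratio), to the nearest integer."""
    return {name: round(math.log(or_value)) for name, or_value in odds_ratios.items()}


def sensitivity_specificity(
    scores: Sequence[int], outcomes: Sequence[int], cut_point: int
) -> Tuple[float, float]:
    """Sensitivity and specificity when scores >= cut_point predict EHV."""
    pairs = list(zip(scores, outcomes))
    true_pos = sum(1 for s, y in pairs if y == 1 and s >= cut_point)
    false_neg = sum(1 for s, y in pairs if y == 1 and s < cut_point)
    true_neg = sum(1 for s, y in pairs if y == 0 and s < cut_point)
    false_pos = sum(1 for s, y in pairs if y == 0 and s >= cut_point)
    sensitivity = true_pos / (true_pos + false_neg)
    specificity = true_neg / (true_neg + false_pos)
    return sensitivity, specificity
```

For example, hypothetical odds ratios of 2.5, 6.0, and 1.2 have natural logs of roughly 0.92, 1.79, and 0.18, which round to 1, 2, and 0 points. Looping `sensitivity_specificity` over every attainable score gives the kind of cut-point trade-off curve the analysis describes.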

Results
Of 613 HIV-1 seroconverters enrolling in the EHI cohort, 422 (68.8%) enrolled and completed information on ARS symptoms ≤90 days after their EDI (Fig 1). Of these, 388 (91.9%) had at least one pre-ART viral load measurement (range = 1-5 measurements; mode = 2 measurements) between 130 and 330 days post-EDI, and were thus eligible for analysis. Approximately one-third (34.8%) of the eligible volunteers had HIV-1 subtype A, 43.8% subtype C, and 15.7% subtype D, with the remaining 5.7% comprising other subtypes (Table 1). Across subtypes, the median age among males and females was 30 and 28 years, respectively, and the majority (72.9%) of volunteers belonged to serodiscordant couples. A lower proportion of those with subtype A were female versus those with subtypes C and D, and although the age distribution in females was similar across subtypes, male volunteers with subtype A were appreciably younger than those with subtype C (but not D). Due to differences in at-risk source populations across research centers, serodiscordant couples were more prevalent among those with subtype C than with the other two subtypes, and men who have sex with men were more prevalent among those with subtype A. Overall, the median time between EDI and enrollment was 45 days, median number of ARS symptoms was 2, and median enrollment viral load was 4.9 log10 copies/ml. Time since EDI and enrollment viral load were similar across subtypes, but subtype A volunteers had a greater number of ARS symptoms than those with subtypes C and D (as previously reported in [16]). The proportions with CD4>350 and CD4>500 (83.1% and 54.3% overall, respectively) were similar across subtypes.
The relationship between number of ARS symptoms and EHV prevalence varied across subtypes (Fig 2). Subtype A volunteers exhibited a clear increase in EHV prevalence as the number of ARS symptoms increased from 0-1 to 2-7 to ≥8. Subtype C and D volunteers also experienced an increase in EHV prevalence as the number of ARS symptoms increased from 0-1 to 2-7, but subtype D estimates were imprecise. No subtype C or D volunteers with ≥8 ARS symptoms experienced EHV, but very few subtype C and D volunteers had this many symptoms (N = 4 and N = 1, respectively).
Logistic regression models including sex, age, number of ARS symptoms, enrollment viral load, and enrollment CD4 had excellent performance both in the full population (c-statistic = 0.80) and when confined to persons with each of the three HIV-1 subtypes (c-statistic = 0.83, 0.83, and 0.80 for subtypes A, C, and D, respectively). The final models obtained through backward selection performed similarly (c-statistics of 0.79, 0.82, 0.83, and 0.80 in the full, subtype A, subtype C, and subtype D populations, respectively). The final model for the full population included sex, age at infection, enrollment viral load, and enrollment CD4; the final model for subtype A infection included number of ARS symptoms, age at infection, and enrollment viral load; the final model for subtype C infection included male sex, age at infection, enrollment viral load, and enrollment CD4; and the final model for subtype D included age at infection and enrollment viral load. Using the beta coefficients from the final model applied in the full population, we developed a risk score algorithm in which male sex and enrollment CD4 <350 cells/mm3 were each assigned one point, and enrollment viral load >4.9 log10 copies/ml was assigned two points (Table 2). Based on the beta coefficients in the model developed specifically for subtype A infection, we assigned one point to number of ARS symptoms ≥2 and to age ≥30, and three points to enrollment viral load >4.9 log10 copies/ml. Point values for the subtype C algorithm were one point for male sex and age ≥30, two points for enrollment viral load >4.9 log10 copies/ml, and two points for enrollment CD4 <350 cells/mm3. In the algorithm for subtype D, both enrollment viral load >4.9 log10 copies/ml and age <30 at infection carried two points on the basis of their beta coefficients. The overall and subtype-specific algorithms performed well (Fig 3).
For example, with a cut-point of two in the full-population model, the sensitivity was 85% (95% CI: 76%-91%); that is, the algorithm would correctly identify 85% of persons who would subsequently experience EHV. The corresponding specificity was 61% (55%-68%); that is, 61% of those who did not ultimately experience EHV had scores <2. In other words, the algorithm would correctly rule out 61% of persons who would not experience EHV. In the subtype A model, a cut-point of three had a sensitivity of 91% (82%-100%) and a specificity of 63% (53%-73%); in the subtype C model, a cut-point of two had a sensitivity of 93% (80%-98%) and a specificity of 44% (33%-54%); and in the subtype D model, a cut-point of two had a sensitivity of 100% and a specificity of 33% (19%-47%).
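The full-population algorithm is simple enough to write out directly. The sketch below encodes the point values and the cut-point of two reported above; the function and parameter names are illustrative choices, not part of the published algorithm.

```python
def full_population_risk_score(
    is_male: bool, cd4_cells_per_mm3: float, log10_viral_load: float
) -> int:
    """Risk score from the final full-population model (range 0-4)."""
    score = 0
    if is_male:
        score += 1  # male sex: one point
    if cd4_cells_per_mm3 < 350:
        score += 1  # enrollment CD4 <350 cells/mm3: one point
    if log10_viral_load > 4.9:
        score += 2  # enrollment viral load >4.9 log10 copies/ml: two points
    return score


def predicted_ehv(score: int, cut_point: int = 2) -> bool:
    """Flag a person as likely EHV when the score is at or above the cut-point."""
    return score >= cut_point
```

Because high enrollment viral load alone carries two points, anyone above 4.9 log10 copies/ml is flagged at this cut-point regardless of sex or CD4, as is a man with CD4 <350 cells/mm3 and lower viral load.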

Discussion
Rapid identification and treatment of newly HIV-infected persons can have important clinical and public health benefits, but many persons with EHI are likely to have CD4 counts above current ART initiation thresholds in many sub-Saharan African countries [6,7], and suboptimal linkage and retention are prevalent even among treatment-eligible persons [8][9][10][11]. The detrimental effects of delaying treatment are particularly great among newly infected persons who sustain high viral loads, allowing unmitigated transmission for months or years before treatment begins. To determine whether limited resources for linkage, retention, treatment, and partner services could be efficiently targeted toward EHI cases with the highest potential for onward transmission and clinical progression, we sought to develop predictive models and risk score algorithms for EHV based on previously identified correlates of sustained high viremia.
Fig 3. Sensitivity and specificity of risk score models developed in the full and subtype-specific populations. The horizontal axes display all possible risk score cut-points that could be chosen for clinical implementation of a given algorithm. In clinical implementation, all persons with risk scores at or above a chosen cut-point would be identified as likely to subsequently have extended high viremia. Circles represent the proportion of all EHV cases with scores at or above a given risk score cut-point (i.e., sensitivity). Diamonds represent the proportion of all those who did not have EHV with scores below a given risk score cut-point (i.e., specificity). https://doi.org/10.1371/journal.pone.0192785.g003
The full predictive model containing number of ARS symptoms, male sex, age at infection, and enrollment CD4 and viral load had excellent performance in the overall and subtype-specific populations. This strong predictive ability was largely maintained in each of the more parsimonious, reduced models, suggesting that simple models with only 2-4 predictors could reliably identify candidates for intensified efforts around ART initiation, counseling, and monitoring in many sub-Saharan African settings.
Possible approaches for identifying potential EHI cases for algorithm application will vary across settings according to laboratory capabilities. Newly HIV-positive persons with recent HIV-negative test results (i.e., in the prior 3-6 months) should be considered as probable EHI cases, as should those with discordant rapid antibody tests in dual-test settings [29,30]. Though less commonly available, fourth-generation antibody/antigen tests or HIV RNA testing of antibody-negative persons offer more direct identification of EHI cases for EHV algorithm application.
The risk score algorithms that we developed both in the overall and subtype-specific populations performed well, suggesting that these algorithms could be useful in many sub-Saharan African diagnosis settings. In real-world implementation, clinical staff would complete a brief checklist of algorithm predictors for suspected or confirmed EHI cases and then sum predictor scores to calculate a patient's risk score. For example, consider a forty-year-old male who tests HIV-antibody-positive after testing negative two months earlier in a setting where subtype A predominates. If this probable EHI case has four ARS symptoms and a viral load >4.9 log10 copies/ml, then he would have a risk score of five (sum of predictor scores = 1 for age + 1 for number of symptoms + 3 for viral load). Patients with a risk score at or above a chosen cut-point would then be selected for intensified efforts around care engagement, treatment initiation, and/or partner services.
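The checklist-and-sum workflow for the subtype-specific algorithms might look like the sketch below, which reproduces the worked subtype A example. The point tables come from the Results; the dictionary keys, function names, and the use of current age as a stand-in for age at infection are assumptions made for illustration.

```python
from typing import Dict, Optional

# Point values reported for the three subtype-specific algorithms.
POINTS: Dict[str, Dict[str, int]] = {
    "A": {"ars_symptoms_ge_2": 1, "age_ge_30": 1, "viral_load_gt_4_9": 3},
    "C": {"male": 1, "age_ge_30": 1, "viral_load_gt_4_9": 2, "cd4_lt_350": 2},
    "D": {"viral_load_gt_4_9": 2, "age_lt_30": 2},
}


def checklist(
    age_years: int,
    is_male: bool,
    n_ars_symptoms: int,
    log10_viral_load: float,
    cd4_cells_per_mm3: Optional[float] = None,
) -> Dict[str, bool]:
    """Turn raw patient values into the yes/no items used by the algorithms."""
    return {
        "male": is_male,
        "age_ge_30": age_years >= 30,
        "age_lt_30": age_years < 30,
        "ars_symptoms_ge_2": n_ars_symptoms >= 2,
        "viral_load_gt_4_9": log10_viral_load > 4.9,
        # A missing CD4 count scores no points (assumption for this sketch).
        "cd4_lt_350": cd4_cells_per_mm3 is not None and cd4_cells_per_mm3 < 350,
    }


def risk_score(subtype: str, items: Dict[str, bool]) -> int:
    """Sum the points for every checklist item that applies."""
    return sum(pts for name, pts in POINTS[subtype].items() if items[name])


# Worked example from the text: 40-year-old male, four ARS symptoms,
# viral load above 4.9 log10 copies/ml (5.3 is an arbitrary value above it).
patient = checklist(age_years=40, is_male=True, n_ars_symptoms=4, log10_viral_load=5.3)
subtype_a_score = risk_score("A", patient)  # 1 (age) + 1 (symptoms) + 3 (viral load) = 5
```

Keeping the point values in a table rather than in branching code makes it straightforward to swap in the algorithm that matches the locally predominant subtype.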
Strengths of our study include the collection of relevant data under a standardized protocol and questionnaire at nine different sites across sub-Saharan Africa, as well as the relatively large population of newly HIV-infected persons arising from the multi-site design. We note, however, that the small numbers of volunteers within each subtype limited our ability to include a large number of predictors (e.g., each specific ARS symptom) in subtype-specific models. It is possible that subtype-specific algorithms based on particular symptoms or other variables would be more predictive. We also note that the timing of ARS assessment and viral load measurements within the course of HIV infection could differ in the real world, as detection of incident HIV occurred through regular testing in our population but may be more symptom-driven outside of the research context. It is difficult to predict the effect of such timing differences on symptom recall or viral load values, but any such effects could affect algorithm performance.
To our knowledge, only one other risk score algorithm relating ARS symptoms to longer-term viral load endpoints has been published [31]. That algorithm, which relied on expert opinion to select predictors and assign points (i.e., without a predictive model), included severe neurological symptoms (3 points), inpatient treatment (3 points), age ≥50 years (1 point), reported or documented fever (1 point), elevated liver enzymes (1 point), and thrombocytopenia (1 point). In applying the algorithm to persons infected predominantly with HIV-1 subtype B in Switzerland, higher risk scores were found to be associated with a higher setpoint viral load after 90 days of untreated infection, but algorithm performance in terms of sensitivity, specificity, and c-statistic was not reported in the original study [31] or in a recent validation study in a US population [32]. Although the general purpose of the Swiss algorithm is similar to ours, the risk scores and results are not directly comparable, due to differences in the specific predictors included, participants' HIV-1 subtype profiles, and study settings. In particular, the US and European contexts to which the previous algorithm is most likely to be generalizable are not subject to the same resource constraints as the settings in which our algorithm is intended to guide resource allocation.
Overall, our findings suggest that algorithms based on age, sex, CD4, viral load, and numbers of ARS symptoms could be useful in identifying newly HIV-infected persons in whom intensified efforts around ART initiation, retention, and/or partner services should be considered. By guiding efficient intervention targeting in resource-limited settings, such algorithms could enable important clinical and transmission prevention benefits from the earliest possible point in infection.