Indices to Measure Risk of HIV Acquisition in Rakai, Uganda

Introduction: Targeting most-at-risk individuals with HIV preventive interventions is cost-effective. We developed gender-specific indices to measure risk of HIV among sexually active individuals in Rakai, Uganda.
Methods: We used multivariable Cox proportional hazards models to estimate time-to-HIV infection associated with candidate predictors. Reduced models were determined using backward selection procedures with Akaike's information criterion (AIC) as the stopping rule. Model discrimination was determined using Harrell's concordance index (c index). Model calibration was determined graphically. Nomograms were used to present the final prediction models.
Results: We used samples of 7,497 women and 5,783 men. There were 342 new infections among women (incidence 1.11/100 person-years) and 225 among men (incidence 1.00/100 person-years). The final model for men included age, education, circumcision status, number of sexual partners, genital ulcer disease symptoms, alcohol use before sex, partner in high-risk employment, community type, being unaware of a partner's HIV status and community HIV prevalence. The model's optimism-corrected c index was 0.691 (95% CI = 0.66, 0.73). The final model for women included age, marital status, education, number of sex partners, new sex partner, alcohol consumption by self or partner before sex, concurrent sexual partners, being employed in a high-risk occupation, having genital ulcer disease symptoms, community HIV prevalence, and perceiving oneself or partner to be exposed to HIV. The model's optimism-corrected c index was 0.67 (95% CI = 0.64, 0.70). Both models were well calibrated.
Conclusion: These indices were discriminative and well calibrated. This provides proof-of-concept that population-based HIV risk indices can be developed. Further research to validate these indices for other populations is needed.

Moreover, the effectiveness of some preventive programs was shown to be higher among people most-at-risk of HIV. For example, male circumcision (MC) was more effective in studies of men deemed to be at high risk of HIV such as men recruited from clinics treating sexually transmitted infections and among truck drivers [8], or HIV uninfected men in discordant relationships with HIV-positive women [9]. Higher efficacy of MC was also suggested for men with many partners or genital ulcers in the Rakai circumcision trial [10].
With respect to PrEP, mathematical models suggest higher cost-effectiveness of oral PrEP among most-at-risk groups [11,12].
Unlike countries where the HIV epidemic is limited to specific groups (concentrated epidemics) and high-risk groups are readily identified, in sub-Saharan Africa (SSA) with a generalized epidemic, identifying high-risk groups to target is especially challenging.
Prediction indices have been used successfully to predict the risk for chronic diseases such as coronary heart disease and cancer [13,14] but have had limited application to HIV infection. To date, research into indices to predict HIV risk in SSA has been limited to discordant couple relationships [15,16]. Development of indices for the general population is especially critical for SSA, where identifiable discordant couple relationships contribute only modestly to HIV incidence [17,18].
We developed gender-specific indices to predict the risk of HIV acquisition based on the general population of sexually active individuals who participated in the Rakai Community Cohort Study (RCCS) in Rakai district, Uganda.

Ethics statement
The Rakai Community Cohort Study (RCCS) was approved by the Science and Ethics Committee of the Uganda Virus Research Institute, the Uganda National Council of Science and Technology and the US-based Western IRB. Written consent was obtained from all research participants. For participants younger than 18 years, parents, caretakers, or guardians provided written consent in addition to the participants' own written assent. Consent procedures were approved by the same three bodies.

Population
Derivation of the indices was based on data from the Rakai Community Cohort Study (RCCS), in Rakai district, South-Western Uganda. The cohort has been described previously [19,20]. Briefly, the RCCS is an open population-based prospective cohort of approximately 15,000 consenting participants aged 15-49 years who were interviewed in surveys conducted every 12-20 months since 1994. Participants provided information in structured interviews and provided blood samples for HIV serology. The data collected included demographics, sexual behaviors, health and contextual characteristics. We used longitudinal data collected from 2003 to 2011. The maximum follow-up time was 7.7 years. The indices were limited to sexually active individuals who reported sexual intercourse in the previous 12 months.

Potential predictor variables
The variables considered included participant demographics (age, marital status, education); sexual behaviors in the previous twelve months including number of sexual partners, frequency of condom use, use of alcohol before sex by either partner, casual sex, transactional sex, concurrent sexual partners and self-perception of exposure to HIV or perception of exposure by partner (for unmeasured risk factors); biomedical factors including genital ulcer symptoms, men's circumcision status, and use of hormonal contraception by women; HIV testing and counseling in the previous twelve months; contextual factors including community type (trading center versus village), whether one migrated to the community within the previous 2 years, community HIV prevalence, and whether or not the participant's employment type was associated with high risk of acquiring HIV; and partner characteristics including use of alcohol before sex, perception of partner's exposure to HIV, whether the partner had a high-risk employment type and whether the partner's HIV status was known.

Statistical methods
We performed frequency tabulations of potential categorical variables and summaries of continuous variables. We used a correlation matrix to determine pairwise correlations between potential variables. Highly correlated variables were combined if they measured similar domains of HIV risk. For this reason, use of alcohol before sex by oneself and by one's sexual partner were combined into one variable. The same was done for self-perception of exposure to HIV and perception of exposure to HIV by sexual partners. Two percent of males and about three percent of females had at least one of the candidate variables missing. To avoid possible selection bias we imputed missing values using the aregImpute R function from the Hmisc package. This function performs multiple imputation: for each imputation it draws a bootstrap sample (with replacement) from the original data, fits a flexible additive model to that sample, and uses the fitted model to predict all of the original missing and non-missing values of the variable being imputed [32]. We used Cox proportional hazards regression to model time-to-HIV infection as a function of candidate predictor variables. Because HIV infection was detected only at survey visits, the exact time of infection was unknown; it was assumed to be the midpoint of the interval between the last negative and first positive HIV test.
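The midpoint assumption for the infection date can be illustrated with a minimal Python sketch. The function name and the example dates are hypothetical and are not taken from the RCCS data or analysis code.

```python
from datetime import date


def midpoint_infection_date(last_negative: date, first_positive: date) -> date:
    """Return the date halfway between the last negative and first positive test.

    Hypothetical helper illustrating the midpoint assumption; any fractional
    day in the half-interval is dropped by date arithmetic.
    """
    if first_positive < last_negative:
        raise ValueError("first positive test precedes last negative test")
    return last_negative + (first_positive - last_negative) / 2


# Illustrative dates only: tests 30 days apart give a midpoint 15 days in.
example = midpoint_infection_date(date(2005, 1, 1), date(2005, 1, 31))
print(example)
```

The assumed infection date would then serve as the event time in the Cox model, in place of the unobserved true date.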
Analyses were stratified by gender because some of the important variables, such as circumcision status and use of hormonal contraception were gender-specific.
We assessed unadjusted associations between variables and HIV acquisition. To determine the optimal form for continuous variables (age and community HIV prevalence), we compared deviances of nested models of polynomials including linear, square, and cubic terms; as well as restricted cubic splines with 3-5 knots. As a result, age was modeled using a square term for men and a linear term for women. Community HIV prevalence was modeled using a linear term.
The global and variable-specific proportional hazards assumption was checked using the cox.zph R function [33] which examines the correlation coefficient between transformed survival time and scaled Schoenfeld residuals as well as the slope of the time-dependent coefficient.
In view of the evidence of higher effectiveness of male circumcision among high-risk men [8,10], we included an interaction term between number of sex partners and circumcision status in the full multivariable men's model. We used a main effects model for women. A reduced model was obtained using backward selection procedures with AIC as the stopping rule. For statistical inference, the AIC criterion requires that the increase in model χ² for a given variable be greater than two times the degrees of freedom for the variable, which corresponds to a χ² p-value not greater than 0.157 for one degree of freedom [34,35].
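The link between the AIC rule and the 0.157 threshold can be checked numerically. The sketch below is an illustration only: the function names are hypothetical, and it uses the standard identity that the upper-tail probability of a chi-square variable with one degree of freedom equals erfc(sqrt(x/2)).

```python
import math


def chi2_pvalue_1df(x: float) -> float:
    """Upper-tail p-value of a chi-square statistic with one degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))


def retained_by_aic(chi2_increase: float, df: int) -> bool:
    """AIC rule as stated in the text: keep a variable if its chi-square
    contribution exceeds twice its degrees of freedom."""
    return chi2_increase > 2 * df


# At the AIC boundary for one degree of freedom (chi-square = 2),
# the p-value is approximately 0.157.
p_at_boundary = chi2_pvalue_1df(2.0)
print(round(p_at_boundary, 3))
```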

Model performance
Model discrimination was assessed by using Harrell's concordance index (c index) for survival data and its 95 percent confidence interval [36]. We also examined discrimination graphically using a plot of cumulative HIV incidence by quartiles of predicted HIV risk score.
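For readers unfamiliar with the c index, the following is a simplified Python sketch of the pairwise concordance calculation for right-censored data. It is an illustration under stated assumptions, not the routine used in the analysis: pairs with tied follow-up times are skipped, which is a simplification, and the function name and toy data are hypothetical.

```python
from itertools import combinations


def harrell_c(times, events, risk_scores):
    """Simplified concordance index for right-censored survival data.

    A pair is usable when the subject with the shorter follow-up time had
    the event. The pair is concordant when that subject also has the higher
    predicted risk; tied risk scores count as half.
    """
    concordant = 0.0
    usable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` has the shorter follow-up time.
        a, b = (i, j) if times[i] <= times[j] else (j, i)
        if times[a] == times[b] or not events[a]:
            continue  # tied times (simplification) or censored first: skip
        usable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5
    if usable == 0:
        raise ValueError("no usable pairs")
    return concordant / usable


# Toy data: higher risk scores fail earlier, so concordance is perfect.
toy_c = harrell_c([1, 2, 3, 4], [1, 1, 0, 1], [4, 3, 2, 1])
print(toy_c)
```

A value of 0.5 corresponds to no discrimination and 1.0 to perfect discrimination.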
Model calibration was assessed graphically using a plot of 4-year observed versus predicted survival for groups of participants with different survival probabilities.

Internal Validation
A bias-corrected (corrected for possible over-fitting) c index was obtained using bootstrap resampling validation procedures [36] with 200 bootstrap samples from the original sample. The bias-corrected c index gives the best estimate of discrimination if the coefficients from this model were applied to another sample to predict HIV acquisition.
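The general shape of bootstrap optimism correction can be sketched as follows. This is a generic illustration, not the validation routine cited in [36]: the `fit` and `performance` callables are hypothetical placeholders, and the toy example uses a sample mean scored by negative mean squared error rather than a Cox model scored by the c index.

```python
import random


def optimism_corrected(data, fit, performance, n_boot=200, seed=0):
    """Bootstrap optimism correction of an apparent performance measure.

    For each bootstrap sample, a model is fitted and scored both on the
    bootstrap sample and on the original data; the mean difference is the
    estimated optimism, which is subtracted from the apparent performance.
    """
    rng = random.Random(seed)
    apparent = performance(fit(data), data)
    optimism = 0.0
    for _ in range(n_boot):
        boot = [rng.choice(data) for _ in data]  # resample with replacement
        model = fit(boot)
        optimism += performance(model, boot) - performance(model, data)
    return apparent - optimism / n_boot


# Toy stand-ins (assumptions for illustration only).
def fit_mean(values):
    return sum(values) / len(values)


def neg_mse(model, values):
    return -sum((v - model) ** 2 for v in values) / len(values)


corrected = optimism_corrected([1.0, 2.0, 4.0, 7.0, 11.0], fit_mean, neg_mse)
print(corrected)
```

In the setting described in the text, `fit` would refit the selected Cox model on each bootstrap sample and `performance` would return the c index.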

Proportion of new infections due to most-at-risk status
We determined the proportion of cumulative new infections contributed by the most-at-risk group. We used various thresholds of HIV risk scores to define most-at-risk status, including the upper quintile, upper two quintiles, upper quartile, upper third and upper half of the risk scores as possible thresholds.
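The threshold-based calculation can be sketched in Python as below. This is a simplified illustration that uses crude counts of infections rather than cumulative incidence, and the function name and toy data are hypothetical.

```python
def share_of_infections(risk_scores, infected, top_fraction):
    """Fraction of all incident infections found in the top `top_fraction`
    of risk scores (e.g. 0.25 for the upper quartile)."""
    n = len(risk_scores)
    n_top = max(1, int(n * top_fraction))
    # Rank participants from highest to lowest risk score.
    ranked = sorted(range(n), key=lambda i: risk_scores[i], reverse=True)
    total = sum(infected)
    if total == 0:
        raise ValueError("no infections observed")
    return sum(infected[i] for i in ranked[:n_top]) / total


# Toy data: eight participants, three infections (1 = infected).
scores = [1, 2, 3, 4, 5, 6, 7, 8]
status = [0, 0, 0, 0, 1, 0, 1, 1]
for label, fraction in [("upper quartile", 0.25), ("upper half", 0.5)]:
    print(label, share_of_infections(scores, status, fraction))
```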
The model was presented as a nomogram using Harrell's nomogram R function [37]. The nomogram provides scores

Sample
There were a total of 30,771 participants in the RCCS surveys from 2003 to 2011. Of these, 4,181 were newly enrolled into the cohort at the last survey and so did not provide follow-up for outcome analyses. Of the remaining 26,590, participants were excluded for the following reasons: 7,555 (28.4 percent) were seen only at one survey and so did not contribute to the incident outcome analyses; 2,620 (10.0 percent) were not sexually active during the study period; 2,013 (7.6 percent) were prevalent cases of HIV; and 1,122 (4.2 percent) had no HIV tests or had only one HIV test and therefore could not contribute to the outcome analysis. Therefore we used 13,280 initially HIV-uninfected participants with follow-up visits to develop the indices. Of these, 7,497 (56.4 percent) were women and 5,783 (43.6 percent) were men. The mean follow-up time was 4.0 years (SD = 1.98) with a maximum of 7.7 years.
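The participant flow can be verified with simple arithmetic. The sketch below only restates the counts given in the text; the variable names are chosen for illustration.

```python
# Counts as reported in the text.
total_participants = 30771
newly_enrolled_at_last_survey = 4181
exclusions = {
    "seen at only one survey": 7555,
    "not sexually active": 2620,
    "prevalent HIV cases": 2013,
    "no or only one HIV test": 1122,
}

# Participants remaining after removing those enrolled at the last survey.
remaining = total_participants - newly_enrolled_at_last_survey
# Analysis sample after the four exclusion criteria.
analysis_sample = remaining - sum(exclusions.values())

women, men = 7497, 5783
print(remaining, analysis_sample, women + men)
```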

Unadjusted analyses
For men, all of the variables we tested were significantly associated with HIV acquisition in the unadjusted analyses at p ≤ 0.157 (table 3) except high-risk employment and recent migration to the community. For the women, non-significant variables included education, having a partner in a high-risk occupation, and hormonal contraception.

Multivariable analyses and final model discrimination
In the multivariable analysis, 10 factors were selected in the men's model: age, education, circumcision status, number of sexual partners, alcohol consumption by self or partner, genital ulcers, being unaware of a partner's HIV status, community type, having a partner with a high-risk employment type and community HIV prevalence (table 4). Because age was modeled using a quadratic term, to obtain the effect of age we differentiated the regression equation with respect to age and obtained the following equation: $\frac{d\,\ln(\lambda_1/\lambda_0)}{d(\mathrm{age})} = 0.2084292 - 2 \times 0.0042433 \times \mathrm{age}$. This equation shows that the effect of age on the hazard of HIV acquisition is itself a function of age. The hazard increases most rapidly at age 15, less rapidly thereafter, and tends to decrease after 25 years of age. Men with post-primary education had 43 percent lower hazards of HIV infection compared to men with primary education or no education. Being circumcised was associated with 39 percent lower hazards compared to being non-circumcised. Compared to having one sexual partner in the previous 12 months, having two partners was associated with 21 percent higher hazards and having three or more partners with 90 percent higher hazards. Alcohol consumption before sex by self or partner was associated with 28 percent higher hazards. Having genital ulcers was associated with 81 percent higher hazards. Being unaware of a partner's HIV status was associated with 78 percent higher hazards. Living in a trading center was associated with 67 percent higher hazards compared to living in the village. Having a partner with a high-risk employment type was associated with 85 percent higher hazards. Every unit increase in community HIV prevalence was associated with three percent higher hazards.
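The age effect in the men's model can be checked numerically from the two coefficients quoted in the derivative above. The sketch below is illustrative; the constant and function names are hypothetical.

```python
# Coefficients quoted in the text for the men's model.
B_AGE = 0.2084292      # linear age term
B_AGE_SQ = 0.0042433   # magnitude of the (negative) quadratic age term


def age_slope(age: float) -> float:
    """Change in log hazard per additional year of age, evaluated at `age`."""
    return B_AGE - 2 * B_AGE_SQ * age


# Age at which the slope is zero, i.e. where the log hazard peaks.
turning_point = B_AGE / (2 * B_AGE_SQ)

print(round(turning_point, 1))
print(age_slope(15) > 0, age_slope(30) < 0)
```

The slope is positive at age 15 and changes sign at about 24.6 years, which agrees with the statement that the hazard tends to decrease after 25 years of age.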
A similar reduced model for the women included the following 11 factors: age, marital status, education, number of sex partners, having a new sex partner, alcohol consumption by self or partner before sex, having concurrent relationships, being employed in a high-risk occupation, having genital ulcers, community HIV prevalence, and perceiving oneself or partner to have been exposed to HIV infection (table 4). Each year increase in age was associated with a three percent reduction in the hazard of HIV acquisition. Post-primary education was associated with 17 percent lower hazards compared with primary education. Compared to monogamous marriage, being in a polygamous relationship was associated with 10 percent higher hazards of HIV acquisition, being separated or divorced was associated with two times the hazard, and women who were never married had 72 percent higher hazards. Compared to having one sexual partner in the previous 12 months, having two or more partners was associated with twice the hazard of infection. Having a new sexual partner in the previous 12 months was associated with 59 percent higher hazards. Concurrent relationships were associated with 50 percent higher hazards. Having genital ulcers was associated with 76 percent higher hazards. High-risk employment type was associated with 32 percent higher hazards. Every unit increase in community HIV prevalence was associated with 3.
Discrimination was also demonstrated graphically by plotting the cumulative HIV incidence by quartiles of risk score (a linear combination of products of model coefficients and levels of variables). The plot for men (figure 1) showed good separation of quartiles of risk, especially the highest quartile. A similar trend was observed in the women's plot; however, the separation was less for the lowest two quartiles (figure 2).
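The idea of a risk score as a linear combination of coefficients can be illustrated with the rounded hazard ratios quoted in the text for the men's model. This sketch is an assumption-laden approximation: it uses log hazard ratios derived from the rounded percentages rather than the exact coefficients in table 4, it omits age and community HIV prevalence for brevity, and the dictionary keys and function names are hypothetical.

```python
import math

# Rounded hazard ratios for binary factors, derived from the percentages
# quoted in the text for men (e.g. 39 percent lower hazard -> 0.61).
MEN_HAZARD_RATIOS = {
    "post_primary_education": 0.57,
    "circumcised": 0.61,
    "two_partners": 1.21,
    "three_or_more_partners": 1.90,
    "alcohol_before_sex": 1.28,
    "genital_ulcers": 1.81,
    "unaware_of_partner_status": 1.78,
    "trading_center": 1.67,
    "partner_high_risk_employment": 1.85,
}


def risk_score(present_factors):
    """Sum of log hazard ratios for the factors a man reports."""
    return sum(math.log(MEN_HAZARD_RATIOS[f]) for f in present_factors)


def relative_hazard(present_factors):
    """Hazard relative to a man with none of the listed factors."""
    return math.exp(risk_score(present_factors))


# Example: a circumcised man reporting three or more partners.
print(round(relative_hazard(["circumcised", "three_or_more_partners"]), 2))
```

Ranking participants by such a score is what allows them to be grouped into the quartiles plotted in figures 1 and 2.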

Model calibration
A plot of observed versus predicted probabilities of being HIV-free after four years of follow-up showed excellent agreement between observed and predicted probabilities (figures 3 and 4). Similar results were observed at two years of follow-up (results not shown).

Nomograms
Gender-specific nomograms are shown in figures 5 and 6. A score for each variable is obtained by drawing a vertical line to the top scale and reading off the variable's score for each individual. The total score for an individual is obtained by summing up all the individual variable scores. A scale for the total score and the corresponding 2-year and 4-year probabilities of being HIV-free are given at the bottom of the nomograms.

Proportion of cumulative incidence due to most-at-risk status
In table 5 we provided proportions of cumulative incidence due to most-at-risk status at various thresholds. The upper quartile of nomogram scores contributed 55 percent of incident HIV among men and 48 percent among women; while the upper two quintiles contributed 70 percent among men and 63 percent among women. Proportions at other thresholds are shown in table 5.

Discussion
Gender-specific indices to predict risk of HIV acquisition based on the Rakai cohort in Uganda were discriminative and well calibrated. Graphical methods showed that the indices discriminated best between the highest quartile of risk scores and the lower quartiles. Also, most-at-risk groups defined by upper thresholds of risk scores contributed substantially to cumulative incidence (table 5). We believe that this property of the indices makes them suitable for identifying individuals most-at-risk of HIV infection.
To the best of our knowledge, this is the first effort to develop indices to predict individual risk of HIV infection in the general population of SSA. Our study has several strengths. We used a population-based longitudinal cohort which provided high quality data on known predictors of HIV acquisition. We tested various transformations of age to obtain the optimal transformation for age and community HIV prevalence. We also used self-perceived exposure or perceived exposure of partner to HIV as a predictor variable to capture unmeasured predictors of HIV infection. In this population self-perceived risk of exposure to HIV was associated with a higher risk of HIV infection [38].
Our study has limitations. All the variables apart from HIV prevalence were self-reported and therefore subject to recall error and social-desirability bias. However, these self-reported variables were found to be predictive of HIV risk. We believe that the ease of obtaining self-reports makes it feasible to use these indices in clinic settings or HIV counseling offices. We also did not validate these indices in other settings; however, the indices performed well during internal validation, which provides a good indication that they are likely to perform well in other similar settings. Despite this performance in internal validation, we recommend that these indices be externally validated before use in other populations similar to Rakai. To facilitate external validation, we have provided model coefficients in table 4. In settings where these indices may not be sufficiently predictive, techniques are available to re-calibrate them and update them for these new settings [39].
Given successful validation in other populations, these tools could be used in the context of voluntary counseling and testing to identify people most-at-risk of HIV and target them with preventive programs. An example of such interventions is oral pre-exposure prophylaxis (PrEP) with Tenofovir and Emtricitabine. In the FEM-PrEP [40] and VOICE (MTN 003) [41] trials, oral PrEP was not efficacious for HIV prevention due to low adherence to medications, which was ascribed to low self-perceived risk of HIV [40]. Indices such as ours could help inform individuals of their true risk and thus offset false self-perceptions of risk. Therefore, these prediction indices, in addition to generating demand for HIV preventive services, could help maintain high levels of adherence to these services. Also, the routine use of these indices in HIV counseling may increase the efficiency of counseling by focusing on an individual's risk.
The implementation of these indices would require an HIV counselor to score a client's HIV risk during a post-test counseling session using the gender-specific nomograms provided in figures 5 and 6. The counselor would then determine the individual's level of HIV risk and provide appropriate risk-reduction counseling. We have provided a simple step-by-step guide to implementation in the online supplement (supplement S1).

Conclusion
We developed and internally validated gender-specific indices to predict risk of HIV infection. Our study provides proof-of-concept that indices to predict an individual's risk of HIV infection can be developed to increase the efficiency of HIV prevention programs. Further research to validate these indices for other populations is needed.

Supporting Information
Supplement S1 A step-by-step guide to implementation of nomograms. (DOCX)