Spatio-temporal estimates of HIV risk group proportions for adolescent girls and young women across 13 priority countries in sub-Saharan Africa

The Global AIDS Strategy 2021-2026 identifies adolescent girls and young women (AGYW) as a priority population for HIV prevention, and recommends differentiating intervention portfolios geographically based on local HIV incidence and individual risk behaviours. We estimated prevalence of HIV risk behaviours and associated HIV incidence at health district level among AGYW living in 13 countries in sub-Saharan Africa. We analysed 46 geospatially-referenced national household surveys conducted between 1999-2018 across 13 high HIV burden countries in sub-Saharan Africa. Female survey respondents aged 15-29 years were classified into four risk groups (not sexually active, cohabiting, non-regular or multiple partner[s] and female sex workers [FSW]) based on reported sexual behaviour. We used a Bayesian spatio-temporal multinomial regression model to estimate the proportion of AGYW in each risk group stratified by district, year, and five-year age group. Using subnational estimates of HIV prevalence and incidence produced by countries with support from UNAIDS, we estimated new HIV infections in each risk group by district and age group. We then assessed the efficiency of prioritising interventions according to risk group. Data consisted of 274,970 female survey respondents aged 15-29. Among women aged 20-29, cohabiting (63.1%) was more common in eastern Africa than non-regular or multiple partner(s) (21.3%), while in southern countries non-regular or multiple partner(s) (58.9%) were more common than cohabiting (23.4%). Risk group proportions varied substantially across age groups (65.9% of total variation explained), countries (20.9%), and between districts within each country (11.3%), but changed little over time (0.9%). Prioritisation based on behavioural risk, in combination with location- and age-based prioritisation, reduced the proportion of population required to be reached in order to find half of all expected new infections from 19.4% to 10.6%. FSW were 1.3% of the population but 10.6% of all expected new infections. Our risk group estimates provide data for HIV programmes to set targets and implement differentiated prevention strategies outlined in the Global AIDS Strategy. Successfully implementing this approach would result in more efficiently reaching substantially more of those at risk for infections.

First, we used a multinomial logistic regression model to infer the proportion of AGYW in the risk groups k ∈ {1, 2, 3 + }. One way to specify this model is using a multinomial likelihood where the number of women in risk group k is y itak , the sample size is m ita = 3 + k=1 y itak , p itak > 0 is the probability of membership of the kth risk group with 3 + k=1 p itak = 1, and taking k = 1 to be the baseline category, linear predictors may be specified for the additive log ratios log(p itak /p ita1 ). To facilitate inference in the R-INLA, we used an equivalent Poisson log-linear model via the multinomial-Poisson transformation (Baker 1994). This transformation, and further details about this model, are presented in Section 1.2.
Next, we fit a logistic regression model to estimate the proportion of those in the k = 3 + risk group that were in the k = 3 and k = 4 risk groups respectively. This model was of the form where q ia = p ia4 /(p ia3 + p ia4 ) = p ia4 /p ia3 + and the linear predictor η ia is chosen suitably, described in more detail in Section 1.3. Taking this two-step approach allowed us to include all surveys in the multinomial regression model, but only those surveys with a specific transactional sex question in Equation 2. As all such surveys occurred in 2013-2018, in the logistic regression model we assumed q ia to be constant over time.
To facilitate uncertainty quantification, we took 1000 posterior samples, indexed by s, from each of the multinomial {p s itak } and logistic regression {q s ia } models. Samples p s ita3 and p s ita4 were then generated by p s ita3 = (1 − q s ia )p ita3 + s and p s ita4 = q s ia p s ita3 + . As transactional sex does not directly correspond to sex work (Wamoyi et al. 2016), we adjusted our samples so that at a national level, the population size estimates match FSW population size estimates by age. Our methodology for producing these estimates is described in Section 2. In making this adjustment, we assumed that subnational variation in the FSW proportions corresponds to that of the transactional sex proportions.
We calculated the mean, median, and 2.5% and 97.5% quantiles for the risk group probabilities empirically using the adjusted samples. To produce aggregated estimates, such as the age category 15-24 or any national estimates, we weighted the adjusted samples by population sizes N ita = k N itak obtained from the Naomi model (Eaton et al. 2021).

The multinomial-Poisson transformation
The multinomial-Poisson transformation reframes a given multinomial logisitic regression model as an equivalent Poisson log-linear model of the form for certain choice of the linear predictor η itak . The basis of the transformation is that, conditional on their sum, Poisson counts are jointly multinomially distributed (McCullagh and Nelder 1989) as follows where κ ita = 3 + k=1 κ itak such that category probabilites are obtained by the softmax funciton In the equivalent model, the sample sizes m ita = k y itak are treated as random, rather than fixed as they would be in the multinomial logistic regression model, taking a Poisson distribution In the equivalent model, the joint distribution of p(y ita , m ita ) = p(y ita | m ita )p(m ita ) is corresponding to the product of independent Poisson likelihoods as in Equation 4. This model, including random sample sizes, is equivalent to the multinomial logistic regression only when these normalisation constants are recovered exactly. To ensure that this is the case, one approach is to include observation-specific random effects θ ita in the equation for the linear predictor. Multiplying each of {κ itak } 3 + k=1 by exp(θ ita ) has no effect on the category probabilities, but does provide the necessary flexibility for κ ita to recover m ita exactly. Although in theory an improper prior θ ita ∝ 1 should be used, in practise, by keeping η ita otherwise small using appropriate constraints, so that arbitrarily large values of θ ita are not required, it is sufficient (and practically preferable for inference) to instead use a vague prior.

Model specifications
Model ID Category (β k ) Country (ζ ck ) Age (α ack ) Spatial (ϕ ik ) Temporal (γ tk ) Spatio-temporal (δitk)   M1  IID  IID  IID  IID  IID  Not included  M2  IID  IID  IID  Besag  IID  Not included  M3  IID  IID  IID  IID  AR1  Not included  M4  IID  IID  IID  Besag  AR1 Not included Table A: The multinomial regression models that we considered. Observation random effects θ ita , included in all models, are omitted from this table.

Spatial random effects
The specifications we considered were IID ϕ ik ∼ N (0, τ −1 ϕ ), and Besag grouped by category given by the Kronecker product of the scaled Besag structure matrix R ⋆ b and the identity matrix I, and − denotes the generalised matrix inverse. Scaling of the structure matrix to have generalised variance one ensures interpretable priors may be placed on the precision parameter (Sørbye and Rue 2014). We followed the further recommendations of Freni-Sterrantino, Ventrucci, and Rue (2018) with regard to disconnected adjacency graphs, singletons and constraints. The Besag structure matrix R b is obtained by the precision matrix of the random effects where j ∼ i if the districts A i and A j are adjacent, and n δi is the number of districts adjacent to A i .
In preliminary testing, we excluded spatial random effects from the model, but found that this negatively effected performance. We also tested using the BYM2 model (Simpson et al. 2017) in place of the Besag, but found that the proportion parameter posteriors tended to be highly peaked at the value one. For simplicity and to avoid numerical issues, by using Besag random effects we decided to fix this proportion to one.

Temporal random effects
The specifications we considered were IID ϕ tk ∼ N (0, τ −1 ϕ ), and AR1 grouped by category where the scaled structure matrix R ⋆ γ = R ⋆ r ⊗ I is given by the Kronecker product of a scaled AR1 structure matrix R ⋆ r and the identity matrix I. The AR1 structure matrix R r is obtained by precision matrix of the random effects r = (r 1 , . . . , r T ) ⊤ specified by where ϵ t ∼ N (0, 1) and |ρ| < 1. For the lag-one correlation parameter ρ, we used the PC prior, as derived by Sørbye and Rue (2017), with base model ρ = 1 and condition P(ρ > 0 = 0.75). We chose the base model ρ = 1 corresponding to no change in behaviour over time, rather than the alternative ρ = 0 corresponding to no correlation in behaviour over time, as we judged the former to be more plausible a priori.

Constraints
To ensure interpretable posterior inferences of random effect contribution, we applied sum-to-zero constraints such that none of the category interaction random effects altered overall category probabilities. For the space-year-category random effects, we applied analogous sum-to-zero constraints to maintain roles of the space-category and year-category random effects. Together, these were: 1. Category k β k = 0 2. Country c ζ ck = 0, ∀ k 3. Age-country a α ack = 0, ∀ c, k, 4. Spatial i ϕ ik = 0, ∀ k 5. Temporal t γ tk = 0, ∀ k

Survey weighted likelihood
We included surveys which use a complex design, in which each individual has an unequal probability of being included in the sample. For example the DHS often employs a two-stage cluster design, first taking an urban rural stratified sample of ennumeration areas, before selecting households from each ennumeration area using systematic sampling (DHS 2012).
To account for this aspect of survey design, we use a weighted pseudo-likelihood where the observed counts y are replaced by effective counts y ⋆ calculated using the survey weights w j of all individuals j in the corresponding strata. We multiplied direct estimates produced using the survey package (Lumley 2004) by the Kish effective sample size (Kish 1965) to obtain y ⋆ . These counts may not be integers, and as such the Poisson likelihood we used in Equation 4 is not appropriate. Instead, we used a generalised Poisson pseudo-likelihood y ⋆ ∼ xPoisson(κ), given by as implemented by family = "xPoisson" in R-INLA, which accepts non-integer input.

Model selection
We performed model selection on the basis of the conditional predictive ordinate (CPO) criterion (Pettit 1990), selecting model M2. This model included Besag spatial random effects and IID temporal random effects. For reference, we also computed the deviance information criterion (DIC) (Spiegelhalter et al. 2002) and widely applicable information criterion (WAIC) (Watanabe 2013) (

Logistic regression model
Model ID Intercept (β0) Country (ζc) Age (αca) Spatial (ϕi) Covariates Table C: The logistic regression models that were considered. cfswever denotes the proportion of men who have ever paid for sex and cfswrecent denotes the proportion of men who have paid for sex in the past 12 months.
We considered six logistic regression models (Table C)  95% prior probability on the range 2-50% for the percentage of those with non-regular or multiple partners who report transactional sex. We considered two specifications (IID, Besag) for the spatial random effects ϕ i . To aid estimation with sparse data, we also considered national-level covariates for the proportion of men who have paid for sex ever cfswever or in the last twelve months cfswrecent, available from Hodgins et al. (2022). For both random effect precision parameters τ ∈ {τ α , τ ζ } we used the PC prior with base model σ = 0 and P(σ > 2.5 = 0.01). For the regression parameters β ∈ {β cfswever , β cfswrecent } we used the prior β ∼ N (0, 2.5 2 ).

Survey weighted likelihood
As with the multinomial regression model, we used survey weighted counts {y ⋆ itak } and sample sizes {m ⋆ itak }. We used a generalised binomial pseudo-likelihood y ⋆ ∼ xBinomial(m ⋆ , q), as implemented by family = "xBinomial" in R-INLA, given by to extend the binomial distribution to non-integer weighted counts and sample sizes.

Model selection
We selected the best model according to the CPO statistic, which was model L6. CPO values, along with DIC and WAIC values for reference, are presented in Table D and Fig B. Inclusion of Besag spatial random effects, rather than IID, consistently improved performance. Benefits from inclusion of covariates were more marginal. That said, as some countries had no suitable surveys, we preferred to include covariate information such that the estimates in these countries are based on some country-specific data.

Coverage assessment
To assess the calibration of our fitted model, we calculated the quantile q of each observation within the posterior predictive distribution. For calibrated models, these quantiles, known as probability integral

FSW population size estimation
To estimate the number of FSW by age group and country, we disaggregated country-specific estimates of adult (15-49) FSW population size from Stevens et al. (2022) by age group.
First, we calculated the total sexually debuted population in each age group, in each country. To describe the distribution of age at first sex, we used skew logistic distributions (Nguyen and Eaton 2022) with cumulative distribution function given by where κ c , µ c , γ c > 0 are country-specific shape, shape and skewness parameters respectively.
Next, we used the assumed Gamma(α = 10.4, β = 0.36) FSW age distribution in South Africa from the Thembisa model (Johnson and Dorrington 2020) to calculate the implied ratio between the number of FSW and the sexually debuted population in each age group. We assumed these ratios in South Africa were applicable to every country to calculate the number of FSW by age group in all 13 countries (Fig D).

Calculation of prevalence and PLHIV
We calculated HIV prevalence ρ iak and the number of people living with HIV (PLHIV) H iak stratified according to district, age group and risk group by disaggregating Naomi estimates by risk group.
To do so, we estimated HIV prevalence log odds ratios relative to the general population using age, country and risk group specific HIV prevalence bio-marker survey data. We also included general population HIV prevalence data, allowing us to fit a logistic regression model including indicator functions for each risk group, and an indicator for being in the general population. The regression coefficients in this model correspond to log odds, such that log odds ratios may be easily obtained by taking the difference.
To allow the log odds ratio for the highest risk group to vary based on general population prevalence we fit a linear regression of the FSW log odds against the general population log odds. We ensured that the log odds ratio for the highest risk group was at least as large as that of the second highest risk group.
Given the fitted log odds ratios log(OR k ), we disaggregated Naomi estimates of PLHIV H ia on the logit scale using numerically optimisation to obtain the value of θ minimising the function where logistic(x) = exp(x)/(1 + exp(x)) such that logistic(θ + log(OR k )) = ρ iak . Fig R -

Calculation of incidence and expected number of new infections
We calculated HIV incidence λ iak and number of new HIV infections I iak stratified according to district, age group and risk group by linear disaggregation Risk group specific HIV incidence estimates are then given by which we evaluated using Naomi model estimates of the number of new HIV infections I ia = λ ia N ia , HIV infection risk ratios {RR 3 , RR 4 (λ ia )}, and risk group population sizes as above. The risk ratio RR 4 (λ ia ) was defined as a function of general population incidence. Fig AE -AQ in S2 Text show the resulting HIV incidence estimates by risk group in each country. The number of new HIV infections are then I iak = λ iak N iak .

Calculation of expected new infections reached
We calculated the number of new infections that would be reached prioritising according to each possible stratification of the population-that is for all 2 3 = 8 possible combinations of stratification by location, age, and risk group. As an illustration, for stratification just by age, we aggregated the number of new HIV infections and HIV incidence as such Under this stratification, individuals in each age group a are prioritised according to the highest HIV incidence λ a . By cumulatively summing the expected infections, for each fraction of the total population reached we calculated the fraction of total expected new infections that would be reached. Fig AR -BD in S2 Text show the percentage of new infections that would be reached prioritising according to each possible stratification within each country.
This analysis was relatively simple. More involved analyses might consider prioritisation of a hypothetical intervention which has some, possibly varying, probability of preventing HIV acquisition, as well as the costs associated to its roll-out.