The authors confirm that co-authors Till Bärnighausen and Marie-Louise Newell are PLOS ONE Editorial Board members. However, this does not alter the authors’ adherence to PLOS ONE Editorial policies and criteria.
Analyzed the data: JL. Wrote the paper: JL JM TB MLN.
Population-based HIV surveillance is crucial to inform understanding of the HIV pandemic and evaluate HIV interventions, but little is known about longitudinal participation patterns in such settings. We investigated the dynamics of longitudinal participation patterns in a high HIV prevalence surveillance setting in rural South Africa between 2003 and 2012, taking into account demographic dynamics. At any given survey round, 22,708 to 30,495 persons were eligible. Although the yearly participation rates were relatively modest (26% to 46%), cumulative rates increased substantially with multiple recruitment opportunities: after five survey rounds, 68% of eligible persons had participated at least once, 48% at least twice and 31% at least three times. We identified two types of study fatigue: at the individual level, contact and consent rates decreased with multiple recruitment opportunities and, at the population level, these rates also decreased over calendar time, independently of multiple recruitment opportunities. Using sequence analysis and hierarchical clustering, we identified three broad individual participation profiles: consenters (20%), switchers (43%) and refusers (37%). Men were over-represented among refusers, women among consenters, and temporary non-residents among switchers. The specific subgroup of persons who were systematically not contacted or who systematically refused constitutes a challenge for population-based surveillance and interventions.
Over the past two decades, population-based longitudinal HIV surveillance [
However, little is known about the underlying dynamics of participation patterns (i.e. participation outcomes in a sequence of survey rounds) in HIV surveillance, which are embedded within demographic dynamics (mortality, aging in, migration) and how these participation dynamics affect HIV prevalence and incidence estimates. Previous work has mainly focused on factors associated with participation in single, or at best two cross-sectional surveys [
Furthermore, universal repeat HIV testing and immediate antiretroviral treatment (“test and treat”) strategies constitute one of the major current research questions in the HIV field [
One of the main challenges facing population-based HIV surveillance is to ensure adequate statistical representation of the general population, or, at the very least, adequate statistical representation of the general population within strata of key variables used to re-weight the surveyed population to reflect the general population. A high participation rate is required for accurate estimation of HIV prevalence, as participation is likely to depend on HIV status [
The main purpose of our study is thus to measure the evolution of annual participation rates, to identify potential study ‘fatigue’ over time, to investigate longitudinal participation patterns and to assess their potential impact on estimates of HIV prevalence and incidence in a high HIV prevalence surveillance setting in a rural area in KwaZulu-Natal, South Africa, where annual HIV surveys have been conducted since 2003 among the adult resident population.
The Africa Centre for Health and Population Studies has hosted a socio-demographic household surveillance in a rural sub-district of uMkhanyakude in northern KwaZulu-Natal (South Africa) since 2000. The surveillance area is 438 km² in size and includes a population of approximately 90,000 isiZulu-speaking people [
Starting in 2003, a nested HIV surveillance was conducted among resident adults, who were invited to respond to a health and sexual behaviour questionnaire and to provide a small blood sample which was tested for HIV [
All adults residing in the area and who were able to provide informed written consent were eligible to participate in HIV surveillance. From 2003 to 2006, eligibility was restricted to women aged 15–49 years and men aged 15–54. From 2007 onwards, all persons aged 15 years and older were eligible.
The population of eligible resident participants changes substantially from year to year. Thus, from 2004 onwards, a list of all resident persons eligible to participate in HIV surveillance was generated at the beginning of each calendar year from information available in the demographic database. During fieldwork operations, some persons became ineligible “a posteriori”, either because information on their death, out-migration or sickness was not available when the eligibility list was generated or because their situation changed afterwards. These persons were treated as ineligible for the purpose of our analysis.
Informed written consent was obtained from all eligible persons aged 15 years or older for participation in the surveillance and for provision of a small blood sample for HIV analysis for research purposes. As permitted by the regulatory framework governing research in South Africa at the time of the study [
Data are available from the INDEPTH data repository (doi:
In any given survey round, we defined the HIV surveillance acceptance rate as the proportion consenting to provide a blood sample for HIV testing among persons who were contacted; and the effective HIV surveillance coverage as the proportion consenting to provide a blood sample among all eligible persons.
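The two definitions above differ only in their denominator. A minimal sketch, assuming each eligible person in a round carries one of three hypothetical outcome labels (`"consented"`, `"refused"`, `"not_contacted"`; these labels and the function name are illustrative and not taken from the surveillance database):

```python
from collections import Counter


def round_rates(outcomes):
    """Return (acceptance_rate, coverage) for a single survey round.

    `outcomes` holds one label per eligible person. The labels are
    hypothetical: "consented", "refused" or "not_contacted".
    """
    counts = Counter(outcomes)
    eligible = len(outcomes)
    # Contacted persons are those who either consented or refused.
    contacted = counts["consented"] + counts["refused"]
    # Acceptance rate: consenters among contacted persons.
    acceptance = counts["consented"] / contacted if contacted else float("nan")
    # Effective coverage: consenters among all eligible persons.
    coverage = counts["consented"] / eligible if eligible else float("nan")
    return acceptance, coverage


# Made-up round: 10 eligible persons, 8 contacted, 4 consenting.
example = ["consented"] * 4 + ["refused"] * 4 + ["not_contacted"] * 2
acceptance, coverage = round_rates(example)
print(f"acceptance = {acceptance:.0%}, coverage = {coverage:.0%}")
```

Because non-contacted persons enter only the coverage denominator, coverage can never exceed the acceptance rate in the same round.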
The use of sequence analysis [
The participation sequence of a person, as illustrated on
To investigate the concept of ‘study fatigue’, which postulates that persons become less interested and thus less likely to participate repeatedly in a study over time, we used two binomial logistic regression models to estimate the probability of being contacted vs. not being contacted (using all participation outcomes, see
To investigate patterns of participation over the long term, we focused our analysis on persons with a long participation sequence, i.e. persons who were eligible at least seven times. We restricted the sequence analysis to long sequences (lengths of 7 to 9) because the distance metric we employed between two sequences is also influenced by differences in sequence length: a hierarchical classification of the entire population would have been unduly influenced by sequence length rather than by participation status dynamics. Dissimilarity between sequences was calculated using optimal matching techniques; the population was then divided into several groups (participation profiles) using hierarchical classification, and the resulting classes of participation sequences were described. We used the Longest Common Subsequence (LCS) distance to compute distances between sequences (see example on
Sankey diagrams (flow diagrams in which the thickness of the lines is proportional to the flow quantity) were used to represent the dynamics of the eligible population over time and, for each participation profile, the participation dynamics by sequence position.
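To make the LCS distance used for the sequence clustering more concrete, the sketch below assumes the commonly used definition d(x, y) = |x| + |y| − 2·LCS(x, y), where LCS(x, y) is the length of the longest common subsequence. The one-letter state codes (C = consented, R = refused, N = not contacted) and the example sequences are hypothetical, and the code is an illustration in Python rather than the authors' R implementation.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two sequences."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                # Extend the common subsequence ending before both items.
                curr.append(prev[j - 1] + 1)
            else:
                # Otherwise drop one item from either sequence.
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcs_distance(a, b):
    """LCS distance, assuming d = |a| + |b| - 2 * LCS(a, b)."""
    return len(a) + len(b) - 2 * lcs_length(a, b)


# Hypothetical participation sequences, one letter per survey round.
s1 = "CCCRCCC"    # mostly consenting, one refusal
s2 = "CCRCCCC"    # same pattern, refusal one round earlier
s3 = "RRRNRRR"    # mostly refusing
s4 = "CCCRCCCCC"  # s1 followed by two further consents

print(lcs_distance(s1, s2))  # similar behaviour, same length
print(lcs_distance(s1, s3))  # opposite behaviour, same length
print(lcs_distance(s1, s4))  # same behaviour, different length
```

In this sketch, `s1` and `s4` describe the same behaviour, yet their distance is non-zero purely because `s4` is two rounds longer. This is the length effect that motivates restricting the clustering to sequences of similar length.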
All statistical analyses were performed using R 3.0.1 [
Different colours represent the year in which persons first became eligible for HIV surveillance. The numbers in the grey bars represent the number of persons eligible for HIV surveillance during a given survey round. The numbers below the grey bars represent the persons who enter or exit the HIV surveillance eligible population because of death, in- and out-migration or ageing into the open cohort.
Annual participation rates and cumulative rates.
Overall, the HIV surveillance acceptance rate remained relatively stable at 32–41% after the first survey round (
Although the effective HIV surveillance participation rate in any single survey round might appear to be low, it is important to consider multiple recruitment opportunities in a longitudinal setting. Thus, cumulatively after five rounds, 68% of all eligible persons participated at least once, 48% at least twice and 31% at least three times (
The concept of study ‘fatigue’ postulates that persons become less likely to participate repeatedly in a study over time. Overall, we found that men and persons aged 20–49 years were less likely to be contacted or to consent to participate if contacted than women (p<0.001) and persons of other ages (
Category | Contacted vs. not contacted: aOR | p | sig. | Consented vs. refused (among contacted): aOR | p | sig. |
---|---|---|---|---|---|---|
1st | 1 (ref.) | | | 1 (ref.) | | |
2nd | 0.910 | 0.0028 | ** | 0.411 | <0.0001 | *** |
3rd | 0.704 | <0.0001 | *** | 0.290 | <0.0001 | *** |
4th | 0.654 | <0.0001 | *** | 0.249 | <0.0001 | *** |
5th | 0.591 | <0.0001 | *** | 0.240 | <0.0001 | *** |
6th | 0.518 | <0.0001 | *** | 0.217 | <0.0001 | *** |
7th | 0.450 | <0.0001 | *** | 0.199 | <0.0001 | *** |
8th | 0.392 | <0.0001 | *** | 0.196 | <0.0001 | *** |
9th | 0.302 | <0.0001 | *** | 0.178 | <0.0001 | *** |
female | 1 (ref.) | | | 1 (ref.) | | |
male | 0.576 | <0.0001 | *** | 0.722 | <0.0001 | *** |
15–19 | 1.589 | <0.0001 | *** | 1.326 | <0.0001 | *** |
20–29 | 1 (ref.) | | | 1 (ref.) | | |
30–39 | 0.801 | <0.0001 | *** | 0.908 | <0.0001 | *** |
40–49 | 0.853 | <0.0001 | *** | 1.023 | 0.1531 | - |
50 or more | 1.490 | <0.0001 | *** | 1.425 | <0.0001 | *** |
1 | 1 (ref.) | | | 1 (ref.) | | |
2 | 0.981 | 0.5763 | - | 1.043 | 0.1335 | - |
3 | 1.448 | <0.0001 | *** | 1.087 | 0.0018 | ** |
4 | 1.655 | <0.0001 | *** | 1.087 | 0.0030 | ** |
5 | 1.959 | <0.0001 | *** | 1.107 | <0.0001 | *** |
6 | 2.503 | <0.0001 | *** | 1.164 | <0.0001 | *** |
7 | 2.473 | <0.0001 | *** | 1.121 | 0.0001 | *** |
8 | 2.784 | <0.0001 | *** | 1.154 | <0.0001 | *** |
9 | 2.955 | <0.0001 | *** | 1.142 | <0.0001 | *** |
2003/2004 | 0.186 | <0.0001 | *** | 2.650 | <0.0001 | *** |
2005 | 0.879 | 0.0049 | ** | 1.332 | <0.0001 | *** |
2006 | 1.390 | <0.0001 | *** | 1.283 | <0.0001 | *** |
2007 | 0.564 | <0.0001 | *** | 1.029 | 0.1596 | - |
2008 | 1 (ref.) | | | 1 (ref.) | | |
2009 | 0.759 | <0.0001 | *** | 1.024 | 0.2438 | - |
2010 | 0.339 | <0.0001 | *** | 1.508 | <0.0001 | *** |
2011 | 0.305 | <0.0001 | *** | 1.472 | <0.0001 | *** |
2012 | 0.206 | <0.0001 | *** | 1.077 | 0.0013 | ** |
negative | 1 (ref.) | | | 1 (ref.) | | |
positive | 1.003 | 0.8965 | - | 0.919 | <0.0001 | *** |
unknown | 0.770 | <0.0001 | *** | 0.196 | <0.0001 | *** |
no | 1 (ref.) | | | 1 (ref.) | | |
yes | 0.636 | <0.0001 | *** | 0.986 | 0.2094 | - |
247,040 observations for model 1 (contacted vs. not contacted). 220,096 observations for model 2 (consented vs. refused). aOR: adjusted odds ratio; ref.: reference category.
*** p ≤ 0.001, ** 0.001 < p ≤ 0.01, * 0.01 < p ≤ 0.05, - p > 0.05.
In order to identify participation profiles, we conducted a cluster analysis of persons with long participation sequences, with a length of 7–9 survey rounds. While persons with long (7–9 rounds) participation sequences represented 19% of the 60,954 persons who were ever eligible, they represented 33–47% of eligible persons during any given survey round (
This flowchart represents the flows between participation outcomes from one sequence position to the next one.
Characteristic | Consenters | Switchers | Refusers | All |
---|---|---|---|---|
n | 2,401 | 5,076 | 4,321 | 11,798 |
% | 20.4 | 43.0 | 36.6 | 100.0 |
female (%) | 75.0 | 62.0 | 56.7 | 62.7 |
male (%) | 25.0 | 38.0 | 43.3 | 37.3 |
15–19 y. (%) | 24.6 | 27.3 | 17.2 | 23.0 |
20–29 y. (%) | 19.3 | 24.2 | 23.3 | 22.9 |
30–39 y. (%) | 19.4 | 21.0 | 28.3 | 23.4 |
40–49 y. (%) | 30.6 | 22.5 | 26.1 | 25.5 |
50+ y. (%) | 6.1 | 5.0 | 5.0 | 5.2 |
median (years) | 33 | 29 | 33 | 31 |
mean | 6.2 | 2.9 | 0.5 | 2.7 |
At least one (%) | 100.0 | 99.9 | 35.0 | 76.2 |
At least two (%) | 100.0 | 84.1 | 9.8 | 60.1 |
At least three (%) | 100.0 | 58.2 | 1.3 | 45.9 |
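observed HIV prevalence at 1st outcome (%) | 15.5 | 18.5 | 22.1 | 17.7 |
observed HIV prevalence at 2nd outcome (%) | 16.7 | 22.6 | 24.7 | 20.2 |
observed HIV prevalence at 3rd outcome (%) | 17.9 | 24.2 | 28.6 | 21.5 |
observed HIV prevalence at 4th outcome (%) | 20.9 | 26.6 | 28.6 | 23.7 |
observed HIV prevalence at 5th outcome (%) | 23.9 | 29.6 | 28.7 | 26.4 |
observed HIV prevalence at 6th outcome (%) | 26.3 | 31.6 | 30.3 | 28.7 |
observed HIV prevalence at 7th outcome (%) | 27.7 | 36.8 | 38.8 | 32.5 |
observed HIV prevalence at 8th outcome (%) | 27.2 | 35.4 | 40.3 | 31.7 |
observed HIV prevalence at 9th outcome (%) | 24.5 | 31.6 | 42.9 | 29.1 |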
number of conversions | 275 | 448 | 24 | 747 |
person-years of observation | 13,345 | 16,064 | 1,068 | 30,477 |
observed HIV incidence (‰) | 20.6 | 27.9 | 22.5 | 24.5 |
no | 61.9 | 55.5 | 60.5 | 58.6 |
yes | 38.1 | 44.5 | 39.5 | 41.4 |
Consenters generally had refusal rates below 20% at any sequence position (except for the 8th and the 9th outcomes, where they increased to 23% and 34%, respectively). Consenters who refused to participate in any round were more likely to consent to participate in the subsequent round (
Men were over-represented among refusers, women among consenters, and temporarily ineligible persons among switchers (
Observed HIV incidence among refusers (2.3%) was estimated only on the 9.8% who provided a sample for testing at least twice and is thus a limited indicator for this group: the subgroup is not representative and the sample size is too small. There was no statistically significant difference between observed incidence among refusers and incidence among consenters (p = 0.7643) or switchers (p = 0.3418), using Fisher's exact test. Another proxy of HIV incidence could be the increase in HIV prevalence. Between the first and the 9th outcomes, observed HIV prevalence increased by 9.0 percentage points for consenters, 13.1 for switchers and 20.8 for refusers. It should be noted, however, that prevalence could also increase with improved coverage of HIV treatment and the consequent reduction in HIV-related mortality.
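The incidence and prevalence figures quoted above follow directly from the counts in the participation-profile table. As a worked check, the sketch below recomputes observed incidence (conversions per 1,000 person-years) and the change in observed prevalence between the 1st and 9th outcomes. The dictionary and function names are illustrative; the numbers are copied from the table.

```python
# Counts and percentages copied from the participation-profile table.
conversions = {"consenters": 275, "switchers": 448, "refusers": 24}
person_years = {"consenters": 13345, "switchers": 16064, "refusers": 1068}
prevalence_1st = {"consenters": 15.5, "switchers": 18.5, "refusers": 22.1}
prevalence_9th = {"consenters": 24.5, "switchers": 31.6, "refusers": 42.9}


def incidence_per_1000(cases, pyears):
    """Observed incidence as seroconversions per 1,000 person-years."""
    return 1000 * cases / pyears


def prevalence_change(first, last):
    """Change in prevalence, in percentage points."""
    return last - first


for profile in conversions:
    inc = incidence_per_1000(conversions[profile], person_years[profile])
    change = prevalence_change(prevalence_1st[profile], prevalence_9th[profile])
    print(f"{profile}: incidence {inc:.1f} per 1,000 person-years, "
          f"prevalence change {change:+.1f} percentage points")
```

The refusers' estimate rests on only 24 conversions over 1,068 person-years, which illustrates why the text treats it as a limited indicator.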
Due to substantial levels of mortality [
One of the main criticisms levelled at population-based HIV surveillance is its low yearly participation rates [
Refusal could constitute an important source of bias for prevalence and incidence estimation. We showed that refusal was associated with sex and age, and we estimated HIV prevalence to be substantially higher among refusers [
We were able to detect two types of study ‘fatigue’. The first was clearly individual-level ‘fatigue’: the more frequently persons were asked to participate, the less likely they were to consent again. The second was a wider ‘surveillance system fatigue’, i.e. a tendency for participation rates to be lower in later rounds than in earlier rounds. This could be due to a changing local social perception of HIV surveillance, to persons not being convinced of the utility of HIV surveillance, or even to fatigue within the surveillance system itself, including among field workers. The latter could explain the observed increase in non-contacts in later survey rounds. However, qualitative work would be required to explore these hypotheses.
From an HIV intervention perspective, our results could be used to help assess the likely population impact of a test and treat strategy currently being evaluated through large trials. One of the main challenges of test and treat trials is to repeatedly reach a high proportion of the total population. Even if the setting of a trial differs from a surveillance setting, our study shows that refusal and non-contact could be a major issue. More than a fifth (23.4%, see
To our knowledge, this is the first time that Sankey diagrams and this type of sequence analysis have been applied to longitudinal HIV epidemiology. These methods allow a detailed description of the complex participation dynamics in an open longitudinal cohort and deserve wider attention in other similar settings.
The numbers in the grey bars represent the number of persons eligible for HIV surveillance during that round. The numbers below the grey bars represent the persons who enter (green) or exit (red) the HIV surveillance because of death, migration or ageing into the open cohort. Sequence length corresponds to the total number of times a person has been eligible for HIV surveillance. Individual sequences have been categorized as short (length of 1 to 3), mid (length of 4 to 6) or long (length of 7 to 9).
A dendrogram is a tree diagram illustrating the arrangement of the participation sequences produced by hierarchical clustering. Partition is obtained by cutting the dendrogram at a specific height and is represented by red rectangles. Profiles have been named according to their participation pattern (see