The authors have declared that no competing interests exist.
Conceived and designed the experiments: JLS SJB HJWS CO. Analyzed the data: JLS HJWS CO. Contributed reagents/materials/analysis tools: AWS. Wrote the paper: JLS HJWS SJB CO AWS.
Implementation of trachoma control strategies requires reliable district-level estimates of trachomatous inflammation–follicular (TF), generally collected using the recommended gold-standard cluster randomized surveys (CRS). Integrated Threshold Mapping (ITM) has been proposed as an integrated and cost-effective means of rapidly surveying trachoma in order to classify districts according to treatment thresholds. ITM differs from CRS in a number of important ways, including the use of a school-based sampling platform for children aged 1–9 and a different age distribution of participants. This study uses computerised sampling simulations to compare the performance of these survey designs and evaluate the impact of varying key parameters.
Realistic pseudo gold standard data for 100 districts were generated that maintained the relative risk of disease between important sub-groups and incorporated empirical estimates of disease clustering at the household, village and district level. To simulate the different sampling approaches, 20 clusters were selected from each district, with individuals sampled according to the protocol for ITM and CRS.
Although in some contexts the two methodologies may be equivalent, ITM can introduce a bias-dependent shift as prevalence of TF increases, resulting in a greater risk of misclassification around treatment thresholds. In addition to strengthening the evidence base around choice of trachoma survey methodologies, this study illustrates the use of a simulated approach in addressing operational research questions not only for trachoma but also for other NTDs.
Reliable district-level prevalence estimates of active trachoma are essential to targeting control interventions. While cluster randomised surveys (CRS) remain the recommended strategy for obtaining these estimates, more rapid and cost-effective methods that can be integrated with surveys for other diseases are under investigation. One proposed method is Integrated Threshold Mapping (ITM), which incorporates a school-based platform into the sampling protocol. This study uses a computerised sampling approach to evaluate whether ITM and CRS are equivalent, and to explore the impact of varying key parameters on the performance of these sampling methodologies. The results from these simulations reflect a known limitation of school-based sampling: that resulting prevalence estimates are unreliable when enrolment is low and/or the risk of disease in schools differs from that in communities. However, quantification of the performance of ITM at the district level highlights the variation in performance in different contexts and provides important information for national control programmes. The results from this study strengthen the evidence base around trachoma sampling methodologies and demonstrate the advantages of using a simulated approach to evaluate different sampling scenarios.
Since the establishment in 1998 of the Global Elimination of Trachoma by 2020 (GET2020) Alliance, an increasing number of endemic countries have implemented national programmes in an effort to meet elimination targets. These targets are less than one case of trachomatous trichiasis (TT) per 1000 total population unknown to the health system, and <5% trachomatous inflammation–follicular (TF) in children aged 1–9 years, at the sub-district level.
Both CRS and ITM diagnose trachoma based on the presence of key clinical signs using the 1987 WHO simplified grading system: TF in children aged 1–9 and TT in adults aged over 14.
TF Prevalence (district level) | Classification | Treatment strategy |
<5% | Active trachoma not a public health problem | No MDA |
5–9.9% | Hypo-endemic | Determine need for MDA at sub-district level |
10–29.9% | Meso-endemic | MDA at district level (≥3 years*) |
>30% | Hyper-endemic | MDA at district level (≥5 years*) |
*Before reassessment to determine whether to stop or continue.
Characteristic | CRS | ITM |
Platform | Community-based | School-based with younger children brought from the community |
Cluster selection | Probability proportional to size or random selection | Random selection: minimum 2 per subdistrict |
Participant selection | Household | Children aged 6–9 at school & 1–5 year old children from communities |
Sample size and age groups | 100 aged 1–9 years | 25 aged 1–5 years and 25 aged 6–9 years |
Although ITM was internally validated against CRS during the pilot phase of the methodology's development in Mali and Senegal
Computerised sampling simulations have recently provided a convenient platform to evaluate alternative survey designs for tropical diseases including soil-transmitted helminths, trachoma and schistosomiasis.
This analysis used computerised sampling simulations to compare the precision and accuracy of district level prevalence estimates based on ITM versus CRS. Furthermore, we compared the performance of both survey methodologies, in terms of their ability to correctly classify districts according to established TF prevalence thresholds and the factors that affect the degree of equivalence. Equivalence between the two survey methods, under different scenarios, was formally evaluated by testing the null hypothesis that ITM yields the same programmatic results compared to CRS.
Simulating sampling designs requires gold standard data from which to draw samples and against which to compare sample estimates. There are no perfect datasets available to conduct this analysis, which would necessitate standardised, full census datasets of demographic and epidemiological information for multiple districts. An alternative is to simulate these data, using parameter estimates from empirical data to generate realistic pseudo gold standard data on active trachoma.
One dataset used to parameterize this analysis comes from Kahe Village, Rombo District, northern Tanzania, which is a single community that consists of 90 local administrative units called balozis. A fully enumerated census and survey of trachoma was conducted between April and June 2000 by means of a house-to-house survey, using the WHO simplified grading system, prior to the initiation of any interventions against trachoma. A single examiner collected these data and clinical grading was validated through a live-patient inter-grader agreement exercise using an international expert reference grader with an agreement of 100% for TF. In total, the dataset comprises 5748 individuals in 1103 households, with 41–126 individuals and 8–23 households per balozi. The dataset includes information on the presence or absence of TF in 1831 children aged 1–9 years, among whom the prevalence was 33.4%. Data on school enrolment were also available for a subset (23%) of children aged 6–9 years.
The demographic (age and gender) and household structure present in Kahe was used for all simulated communities in the expanded dataset. This dataset was also used to provide initial values used to parameterize the models, including the relative risk of TF between children aged 1–5 years and 6–9 years and the intra-cluster correlation (ICC) measuring the degree of disease clustering within households. The subset of data with information on enrolment provided an initial value for the relative risk of TF in children aged 6–9 who were enrolled in school compared with those who were not. In addition, this dataset was used to assess whether there was an additional household level risk associated with having a schoolgoing/non-schoolgoing sibling and to inform the simulation model (results presented in the Technical Appendix).
Data on the prevalence of active trachoma were available for 305 clusters (non-overlapping sampling populations) from 29 districts in Kenya, surveyed as part of the National Trachoma Control Programme between 2004–2012 and included within the Global Atlas of Trachoma
Variance in the prevalence of active trachoma was quantified within 29 districts in Kenya. The mean within-district variance was then used to inform beta density functions for simulating cluster-level prevalence values for varying district level prevalence values.
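The step from a district mean and a within-district variance to a beta density can be sketched with a method-of-moments fit. The following is a minimal illustration, not the procedure documented in the Technical Appendix: the function names, the variance of 0.01 and the random seed are hypothetical, and only the Python standard library is used.

```python
import random

def beta_params(mean, variance):
    """Method-of-moments Beta(a, b) parameters for a given mean and variance."""
    if not 0.0 < mean < 1.0:
        raise ValueError("mean must lie strictly between 0 and 1")
    max_var = mean * (1.0 - mean)
    if not 0.0 < variance < max_var:
        raise ValueError("variance must be positive and below mean * (1 - mean)")
    common = max_var / variance - 1.0
    return mean * common, (1.0 - mean) * common

def simulate_cluster_prevalences(district_prev, within_var, n_clusters, rng):
    """Draw cluster-level prevalences scattered around a district-level value."""
    a, b = beta_params(district_prev, within_var)
    return [rng.betavariate(a, b) for _ in range(n_clusters)]

rng = random.Random(2024)  # arbitrary seed, for reproducibility only
# Hypothetical inputs: 20% district prevalence, within-district variance 0.01.
a, b = beta_params(0.20, 0.01)
cluster_prevs = simulate_cluster_prevalences(0.20, 0.01, 100, rng)
```

With these illustrative inputs the fitted density is Beta(3, 12), whose mean is 0.20 and variance 0.01; a larger assumed variance gives more heterogeneous clusters around the same district mean.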
The process of expanding the community dataset to simulate realistic data for 100 communities within each of 100 districts is fully described in the Technical Appendix (Supplement 1). In brief, district level prevalence estimates were generated covering all endemicity classes and used to simulate community level estimates of TF in children aged 1–9 years. The burden of TF within each simulated community was distributed among the population according to parameters initially defined by the above datasets (
Key Parameter | Rationale | Method for estimation & Initial Value | Sensitivity Analysis |
1. Age-specific prevalence of TF: TF in 1–5 years versus 6–9 years | In order to expand a cluster level prevalence estimate in children aged 1–9 years to the two age groups, need to know RR between groups. This will likely vary with endemicity. | Estimated from gold standard datasets; initial value: 2.0 | Varied parameter: 1.3, 1.5, 1.8, 1.0, 2.0 |
2. Risk of TF in enrolled children vs non-enrolled children | Likely that enrolled children will have lower TF prevalence | Estimated from gold standard datasets; initial value: 0.5 | Varied parameter: 0.25, 0.33, 0.5, 0.75, 1.0 |
3. School attendance | This will affect the sample size in schools of 6–15 year olds and affect the impact of parameter 2. | Ministry of Education data; initial value: 0.7 | Varied parameter: 0.4 and 0.7 |
4. Clustering within households: risk of TF in children aged 1–5 years with a TF positive/negative sibling | Clustering at the household level will mean that children with TF positive siblings are more likely to have TF | Estimated from gold standard datasets; initial value: 0.2 | Varied parameter: 0.1, 0.2, 0.3, 0.4, 0.5 |
TF: trachomatous inflammation–follicular; RR: relative risk.
Random selection of 20 clusters was used in simulations for both methodologies.
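Parameters 1 and 2 in the table above are relative risks between two sub-groups, so expanding a single cluster-level prevalence into sub-group prevalences amounts to solving a weighted average. The derivation below is a sketch of that arithmetic rather than the exact procedure in the Technical Appendix; the symbols $p$, $w$, $p_1$ and $p_2$ and the worked numbers are introduced here for illustration only.

```latex
% p: prevalence in the combined group; w: share of the combined group in sub-group 1 (assumed known)
% p_1, p_2: sub-group prevalences; RR = p_1 / p_2
\begin{align}
p &= w\,p_1 + (1-w)\,p_2, \qquad p_1 = \mathrm{RR}\,p_2 \\
p_2 &= \frac{p}{1 + w(\mathrm{RR}-1)}, \qquad
p_1 = \frac{\mathrm{RR}\,p}{1 + w(\mathrm{RR}-1)}
\end{align}
% Example with p = 0.30, RR = 2.0 and an assumed w = 0.5:
% p_2 = 0.30 / 1.5 = 0.20 and p_1 = 0.40, which average back to 0.30.
```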
To avoid basing simulations on data parameterised by single village-level and district-level datasets, additional pseudo-gold standard datasets were simulated varying each of the epidemiological parameters identified in
CRS for trachoma uses a standard two-stage or multi-stage design, often comprising a random selection of approximately 20 villages (clusters) at the first stage and selection of households at the second
A computerized simulation approach, using Monte Carlo methods, was used to randomly select 20 clusters from each district and sample individuals within each cluster according to the protocol for ITM and CRS (
District-level prevalence estimates generated by the two sampling methodologies were used to classify districts according to endemicity class for each simulation, using categories corresponding to established treatment thresholds: hypo-endemic (<10%), meso-endemic (10–30%) and hyper-endemic (>30%) (
Due to the complicated sampling distributions of these methodologies, it is not possible to calculate the full theoretical OC curves. However, we can visualize the empirical OC curves resulting from these simulation studies, which are generated from the proportion of times a district is correctly classified in each endemicity class using the two methodologies, over a “range” of district prevalence values. For each survey method, this allowed us to establish the range of district prevalence values in which the probability of correctly classifying a district is less than or equal to 0.80.
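One point on such an empirical OC curve can be sketched as follows. This is a deliberately simplified illustration and not the simulation used in this study: it draws children independently within each cluster (no household clustering, age structure or school enrolment), treats a prevalence of exactly 30% as hyper-endemic, and uses a hypothetical sample of 50 children per cluster. All function names are introduced here for illustration.

```python
import random

def classify(prevalence):
    """Endemicity class using the 10% and 30% thresholds.

    Assumption: a value of exactly 30% is treated as hyper-endemic.
    """
    if prevalence < 0.10:
        return "hypo"
    if prevalence < 0.30:
        return "meso"
    return "hyper"

def survey_estimate(cluster_prevs, n_clusters, n_per_cluster, rng):
    """One simulated survey: random clusters, then independent draws of children."""
    chosen = rng.sample(cluster_prevs, n_clusters)
    positives = 0
    for p in chosen:
        positives += sum(rng.random() < p for _ in range(n_per_cluster))
    return positives / (n_clusters * n_per_cluster)

def prob_correct(cluster_prevs, n_clusters=20, n_per_cluster=50,
                 n_sims=1000, seed=1):
    """Proportion of simulated surveys that recover the true endemicity class."""
    rng = random.Random(seed)
    truth = classify(sum(cluster_prevs) / len(cluster_prevs))
    hits = 0
    for _ in range(n_sims):
        estimate = survey_estimate(cluster_prevs, n_clusters, n_per_cluster, rng)
        hits += classify(estimate) == truth
    return hits / n_sims

# Two hypothetical districts of 100 communities with uniform prevalence.
far_from_threshold = prob_correct([0.20] * 100)    # true prevalence 20%
near_threshold = prob_correct([0.105] * 100)       # true prevalence 10.5%
```

Repeating `prob_correct` over districts spanning the full prevalence range, and plotting the result against true prevalence, gives the empirical curve; the district close to the 10% threshold is correctly classified far less often than the one in the middle of the meso-endemic band.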
Overall agreement in district endemicity classifications by the two methodologies was assessed using a weighted kappa-statistic. This statistic provides a measure of agreement between the two methods adjusted for chance, where a value of zero indicates agreement no better than chance. Weighting is useful when there are more than two ordered categories, so that the magnitude of disagreement between categories is allowed to vary (i.e., the difference between <10% and 10–30% is not as great as that between <10% and >30%). Increasing kappa values correspond to better agreement between the two methods, where agreement is often interpreted as slight (<0.2), fair (0.2–0.4), moderate (0.4–0.6), substantial (0.6–0.8) and almost perfect (≥0.8).
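A weighted kappa for three ordered endemicity classes can be computed as in the sketch below. The weighting scheme is not stated in the text, so linear disagreement weights are assumed here; the function name and the example classifications are hypothetical.

```python
def weighted_kappa(ratings_a, ratings_b, n_categories):
    """Cohen's kappa with linear disagreement weights (an assumed scheme).

    Ratings are integer codes 0..n_categories-1 for ordered classes,
    e.g. 0 = hypo-, 1 = meso-, 2 = hyper-endemic.
    """
    n = len(ratings_a)
    if n == 0 or n != len(ratings_b):
        raise ValueError("ratings must be non-empty and of equal length")
    k = n_categories
    # Observed joint proportions of the two classifications.
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(ratings_a, ratings_b):
        observed[a][b] += 1.0 / n
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]
    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(k):
        for j in range(k):
            w = abs(i - j) / (k - 1)  # 0 on the diagonal, 1 for the extremes
            weighted_observed += w * observed[i][j]
            weighted_expected += w * row[i] * col[j]
    if weighted_expected == 0.0:
        raise ValueError("kappa is undefined when only one category is used")
    return 1.0 - weighted_observed / weighted_expected

# Hypothetical classifications of six districts by two survey methods.
crs_classes = [0, 0, 1, 1, 2, 2]
itm_classes = [0, 1, 1, 1, 2, 1]
kappa = weighted_kappa(crs_classes, itm_classes, 3)
```

In this toy example both disagreements are between adjacent classes, so each carries half the weight that a hypo- versus hyper-endemic disagreement would.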
Equivalence between the two survey methods was formally evaluated by testing the null hypothesis that ITM yields the same programmatic results compared to CRS. The distribution of the difference in the proportion of correctly classified districts by ITM and CRS was generated and the mean and 95% CIs plotted in relation to delta, Δ, a threshold corresponding to a predefined level of difference deemed programmatically important. In these analyses, delta was initially assumed to be 20%, based on the rationale that this is equal to 80% of the simulations being classified the same by ITM and CRS and roughly corresponding to a standard level of acceptable error. Where the CI fell within this range, the survey methods were classified as equivalent for that district, while those that fell outside were classified as not equivalent and those that overlapped with the thresholds as inconclusive. Districts were stratified by the relative risk of TF and endemicity class to evaluate whether the equivalence of the two methodologies varied with these parameters.
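The decision rule described above can be sketched as a comparison of a confidence interval with the band from −Δ to +Δ. In this illustration the interval is a normal-approximation 95% CI around the mean of a list of differences, which is an assumption and may not match how the intervals in this study were derived; the function names and example values are hypothetical.

```python
import statistics

def mean_and_ci(differences, z=1.96):
    """Mean and normal-approximation 95% CI of a list of differences."""
    m = statistics.mean(differences)
    se = statistics.stdev(differences) / len(differences) ** 0.5
    return m, m - z * se, m + z * se

def equivalence_label(ci_low, ci_high, delta=0.20):
    """Compare a CI for the ITM-minus-CRS difference with the band [-delta, +delta]."""
    if -delta <= ci_low and ci_high <= delta:
        return "equivalent"        # CI lies wholly inside the band
    if ci_high < -delta or ci_low > delta:
        return "not equivalent"    # CI lies wholly outside the band
    return "inconclusive"          # CI straddles one of the thresholds

# Hypothetical differences in the proportion correctly classified (ITM - CRS).
mean_diff, low, high = mean_and_ci([-0.02, -0.05, 0.01, -0.04, -0.03])
label = equivalence_label(low, high)
```

Making `delta` a parameter allows the conclusion to be re-examined under a stricter or more lenient definition of a programmatically important difference.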
Overall, the results indicate that ITM under-estimates the true prevalence of TF compared to CRS and that the magnitude of difference between estimates from these methodologies increases with endemicity. This is illustrated in
Plots are generated using simulated data and present results from a single district within each endemicity class. The red line represents the true district-level prevalence; the curves are histograms of values from 1000 simulations using the CRS method (red) and ITM method (blue).
The proportion of times each of 100 districts was correctly classified by ITM and CRS was compared with true prevalence, where the relative risk of TF in enrolled versus non-enrolled children is equal to 0.5 and the enrolment rate is 0.7. The green lines correspond to the treatment thresholds and the boxes in red and grey around these thresholds to areas of “higher” misclassification, where the districts will be correctly classified less than 80% of the time.
Using a relative risk of TF in enrolled versus non-enrolled children equal to 0.5, there was “almost perfect” agreement (Kappa = 0.86) in district-level endemicity classification between ITM and CRS overall in the 1000 simulated samples. However, agreement between ITM and CRS decreased with increasing endemicity category, with substantial agreement found in hypoendemic districts (Kappa = 0.71) and only moderate agreement in mesoendemic (Kappa = 0.47) and hyperendemic districts (Kappa = 0.41).
The equivalence analysis in
The figure presents the difference in the proportion of times ITM correctly classified districts compared to CRS (over 1000 simulations) by endemicity class in relation to an assumed value (20%) representing an important programmatic difference. The blue square is the mean difference in proportions and the lines correspond to the difference in the 95% CI. The two methods are deemed equivalent when ITM correctly classifies districts differently to CRS no more than 20% of the time.
Sensitivity analysis of the impact of varying key parameters as shown in
Equivalence is determined by calculating the difference in the probabilities that CRS and ITM will correctly classify a given district over 1000 simulations, and estimating whether this difference exceeds a delta equal to 0.2, signifying that two methods classify districts differently no more than 20% of the time. The figure presents equivalence by endemicity class and relative risk of TF in enrolled and non-enrolled children, where enrolment is equal to 0.4 (blue) or 0.7 (green).
Range of values in which the risk of misclassifying a district using CRS and ITM sampling methodologies is greater than or equal to 0.20 around the 10% and 30% thresholds, with the enrolment rate equal to 0.7.
Our simulations show that over a range of epidemiological settings, ITM will under-estimate the true prevalence of TF. The error introduced by ITM also means that districts are more prone to misclassification according to treatment thresholds than by CRS. The extent of underestimation and misclassification of districts introduced by ITM is dependent on three main factors: (i) the district prevalence of TF; (ii) the relative risk of TF between enrolled and non-enrolled children within clusters; and (iii) the enrolment rate in schools. In general, the overall agreement between the two methods is high, but as the difference in risk of TF between enrolled and non-enrolled children becomes more pronounced, there is a shift in prevalence estimates corresponding to the magnitude of the bias. In these situations, the null hypothesis of programmatic equivalence between the two methodologies is not supported.
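How these three factors combine can be illustrated with a back-of-envelope calculation of the expected ITM estimate. The sketch below is not the simulation model used in this study. It assumes that children aged 1–5 and 6–9 make up equal shares of the 1–9 population, that the ITM estimate is the unweighted average of its two equal-sized age samples, and that all children aged 6–9 are sampled from among those enrolled; household clustering is ignored and all function names are hypothetical.

```python
def split_by_relative_risk(overall, rr, share_group1):
    """Split an overall prevalence into two sub-group prevalences.

    Group 1 has `rr` times the risk of group 2 and makes up
    `share_group1` of the combined population.
    """
    group2 = overall / (1.0 + share_group1 * (rr - 1.0))
    group1 = rr * group2
    if group1 > 1.0 or group2 > 1.0:
        raise ValueError("inputs imply a sub-group prevalence above 100%")
    return group1, group2

def expected_itm_estimate(true_prev, rr_enrolled=0.5, enrolment=0.7,
                          rr_age=2.0, share_young=0.5):
    """Expected ITM estimate under the simplifying assumptions stated above."""
    # Prevalence in children aged 1-5 and 6-9 (young versus old, RR = rr_age).
    young, old = split_by_relative_risk(true_prev, rr_age, share_young)
    # Within 6-9 year olds: enrolled versus non-enrolled (RR = rr_enrolled).
    enrolled, _ = split_by_relative_risk(old, rr_enrolled, enrolment)
    # Equal-sized samples of 1-5 year olds and enrolled 6-9 year olds.
    return 0.5 * young + 0.5 * enrolled

baseline = expected_itm_estimate(0.30)                  # RR 0.5, enrolment 0.7
no_school_effect = expected_itm_estimate(0.30, rr_enrolled=1.0)
low_enrolment = expected_itm_estimate(0.30, enrolment=0.4)
```

Under these assumptions a district with a true prevalence of 30% has an expected ITM estimate of about 27.7% at the initial parameter values, falling to 26.25% when enrolment is 0.4, and the shift disappears when enrolled and non-enrolled children share the same risk.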
Use of a school-based platform is a key methodological difference between CRS and ITM and, while the potential pitfalls of this approach are well recognised, the impact of this strategy on treatment decisions has not been systematically evaluated until now.
As a consequence of this potential bias, ITM may be less likely than CRS to misclassify areas as greater than 10% or 30% when the true prevalence is below this threshold, but more likely to misclassify areas as lower when the true prevalence is higher. Misclassification is more comparable between the two methodologies at the 10% threshold, particularly when the relative risk between enrolled and non-enrolled children is closer to one. At this threshold, the misclassification by ITM would result in resources being allocated for further surveys at the subdistrict level instead of implementing MDA for the entire district. In practice, the difference in performance is most likely to impact interventions around the 30% threshold, where areas misclassified by ITM would be treated for three years before an impact survey instead of being treated for five years. Districts that fall within areas of high misclassification are of operational interest and the optimal choice of survey design is likely to be a function of the cost of the surveys, the costs of treatment associated with misclassification around both thresholds and the likely impact of treatment decisions on long term transmission dynamics. For example, while a particular survey design may be a cost-effective method to classify districts at a given round, a more accurate but more expensive survey design may allow quicker elimination of the disease leading to cost-savings in the future. Incorporating costs and the impact of treatment decisions on transmission was beyond the scope of this paper, but is the focus of future study.
Our use of computerised simulation has a number of advantages over field evaluations of trachoma sampling approaches.
Although our study explored the performance of ITM and CRS in varying contexts, there are a number of potential limitations that may restrict its generalisability. First, although key factors were varied in order to test sampling strategies in different epidemiological scenarios, exploring datasets similar to the data from Kahe in Tanzania and from Kenya would allow a more realistic range of parameters to be incorporated. In addition, parameterisation of the model assumed constant relationships which may be more complex in reality. Certain factors, like household clustering of trachoma, may vary markedly based on local transmission intensity; however, no clear and consistent relationship was supported by available data. This may partly be due to random error introduced by the clinical sign TF, which is known to be an unreliable marker of
The results from this study strengthen the evidence base around trachoma sampling methodologies and demonstrate the advantages of using a simulated approach to evaluate different sampling scenarios. To a large extent, the results from these simulations reflect a known limitation of school-based sampling: that resulting prevalence estimates are unreliable when enrolment is low and/or the risk of disease in schools differs from that in communities. However, quantification of the performance of ITM at the district level in different contexts provides important information for national control programmes. In areas where enrolment is known to be very high, and it can be reliably inferred that the bias is minimized, ITM may provide a rapid, cost-effective alternative to CRS.
This paper serves as a demonstration of the use of sampling simulations to explore alternative sampling approaches not only for trachoma but also for other NTDs. We propose that this approach be adopted as a cost-effective means of identifying and evaluating potential strategies for mapping, monitoring and evaluation, and surveillance, prior to field testing in multiple settings. Such simulations can identify key parameters including performance of sampling strategies and help inform the design of field evaluations. In turn, field studies can provide better estimates of key parameters and serve to refine the simulations. We advocate an iterative process of simulation and field studies to identify optimal and cost-effective sampling strategies for a range of NTDs.
Technical appendix.
(DOC)
We are grateful to Dr. Michael Gichangi, officer of the Division of Ophthalmic Services in the Ministry of Public Health and Sanitation, Government of Kenya and Dr. Sheila West for providing data used to inform simulations, and to PJ Hooper and Danny Haddad for their support and encouragement.