
Two-stage lot quality assurance sampling framework for monitoring and evaluation of neglected tropical diseases, allowing for imperfect diagnostics and spatial heterogeneity

Abstract

Background

Monitoring and evaluation (M&E) is a key component of large-scale neglected tropical diseases (NTD) control programs. Diagnostic tests deployed in these M&E surveys are often imperfect, and it remains unclear how this affects population-based program decision-making.

Methodology

We developed a 2-stage lot quality assurance sampling (LQAS) framework for decision-making that allows for both imperfect diagnostics and spatial heterogeneity of infections. We applied the framework to M&E of soil-transmitted helminth control programs as a case study. For this, we explored the impact of diagnostic performance (sensitivity and specificity), spatial heterogeneity (intra-cluster correlation), and survey design on program decision-making around the prevalence decision thresholds recommended by WHO (2%, 10%, 20% and 50%) and the associated total survey costs.

Principal findings

The survey design currently recommended by WHO (5 clusters and 50 subjects per cluster) may lead to incorrect program decisions around the 2% and 10% prevalence thresholds, even when perfect diagnostic tests are deployed. To reduce the risk of incorrect decisions around the 2% prevalence threshold, including more clusters (≥10) and deploying highly specific diagnostic methods (≥98%) are the most cost-saving strategies when spatial heterogeneity is moderate-to-high (intra-cluster correlation >0.017). The higher cost and lower throughput of improved diagnostic tests are compensated by the lower required sample sizes, though only when the cost per test is <6.50 US$ and the sample throughput is ≥3 per hour.

Conclusion/Significance

Our framework provides a means to assess and update M&E guidelines and guide product development choices for NTD. Using soil-transmitted helminths as a case study, we show that current M&E guidelines may severely fall short, particularly in low-endemic and post-control settings. Furthermore, specificity rather than sensitivity is a critical parameter to consider. When the geographical distribution of an NTD within a district is highly heterogeneous, sampling more clusters (≥10) may be required.

Author summary

Periodic follow-up surveys for monitoring and evaluation (M&E) are an important aspect of large-scale neglected tropical diseases (NTD) programs. They are critical to measure progress and determine whether continuing or scaling down interventions is justified. In the absence of better alternatives, imperfect diagnostic tests are often deployed in these surveys. Yet, little is known about how this affects population-based program decision-making. We expanded an existing survey design framework to allow for imperfect diagnostic tests and applied it to M&E of intestinal worm (soil-transmitted helminth) control programs as a case study. In addition, we assessed the trade-off between cost per test, sample throughput and diagnostic performance of tests. We demonstrated that the current M&E guidelines may lead to incorrect program decisions, even when perfect diagnostic methods are deployed. To reduce the risk of incorrect decisions, sampling more clusters and deploying highly specific diagnostic methods proved to be the most cost-saving strategies. On the other hand, the higher cost and lower throughput of improved diagnostic tests can be compensated by the lower required sample sizes. In conclusion, our framework can contribute to more evidence-based guidelines and choices in diagnostic product development for M&E of NTD control programs.

Introduction

Neglected tropical diseases (NTD) are a diverse group of 20 parasitic, bacterial, and viral diseases, several of which are zoonotic, food- or vector-borne in nature [1–3]. They affect more than 1.5 billion people worldwide, but disproportionately impact the most impoverished communities in tropical countries [3], particularly those on the African continent [1]. Enormous progress has been made so far in controlling NTD, yet significant challenges remain, such as a lack of improved diagnostic tests, new interventions, and monitoring and evaluation (M&E) strategies. In response, WHO has recently published its new road map, aiming (i) to reduce the number of people requiring interventions against NTD by 90%, (ii) to reduce the global disease burden by 75%, (iii) to eliminate at least one NTD in 100 countries, and (iv) to eradicate two NTD by the end of 2030 [4].

To reach these ambitious 2030 targets, periodic follow-up surveys to measure progress and determine whether scaling down or stopping the interventions is justified, so-called M&E, are an important aspect of these large-scale NTD programs. To track progress towards the 2030 targets for NTD, WHO developed and published M&E guidelines for 15 of the 20 NTD [4]. Generally, these M&E guidelines involve recommendations for (i) which diagnostic test to use, (ii) which survey design to use (e.g., number of subjects and number of clusters), and (iii) the corresponding decision rules for continuing or reducing the frequency (or intensity) of interventions. However, the diagnostic tests deployed in these M&E surveys are often imperfect, and it remains unclear how this affects decision-making at the population level. For example, Gass (2020) recently urged a paradigm shift from sensitivity to specificity and a holistic approach to developing WHO M&E guidelines for population-based interventions, aligning survey designs and decision rules with the deployed diagnostic test [5].

In a first attempt to gain insight into this complex interplay between diagnostic performance, survey designs and decision rules, we previously developed a general framework based on a 1-stage lot quality assurance sampling (LQAS) approach that considers the impact of diagnostic performance at the individual level (sensitivity and specificity), survey design (e.g., number of subjects), and decision rules on the correctness of program decision-making at the population level [6]. We found that specificity rather than sensitivity becomes more important as the program approaches the endgame. In addition, the study found that the requirements for both parameters are inversely correlated, resulting in multiple combinations of sensitivity and specificity that allow for reliable decision-making. This study further highlighted that improving diagnostic performance results in smaller sample sizes for the same level of program decision-making. Thus, the additional cost per diagnostic test with improved diagnostic performance can be compensated by lower operational costs in the field. These findings have been instrumental in defining the required diagnostic performance of tests deployed in M&E of control programs targeting soil-transmitted helminthiasis [6]. However, an important limitation of this framework is the assumption that all tested subjects originate from the same cluster (e.g., community/school). As such, spatial heterogeneity of infections across clusters was ignored. Also, a detailed cost assessment was not included (e.g., operational costs to collect and screen samples, cost per test and sample throughput). This is important, as improved diagnostic tests might allow decisions to be based on fewer samples, which would compensate for the increased cost per test and reduced sample throughput.

In the present study, we developed a 2-stage LQAS framework for decision-making that allows for both imperfect diagnostics and spatial heterogeneity of infections. Second, we applied the framework using M&E of soil-transmitted helminth (STH) control programs as a case study. For this, we explored the impact of diagnostic performance (sensitivity and specificity), spatial heterogeneity (intra-cluster correlation), and survey design (number of clusters and number of subjects per cluster) on the correctness of program decision-making based on WHO-recommended thresholds, as well as the associated total survey costs. In addition, we evaluated to what extent an increased cost per test and a reduced sample throughput might be compensated by lower sample size requirements when using improved diagnostic tests.

Methodology

Development of a 2-stage LQAS framework for population-based decision-making using imperfect diagnostic tests

General concepts of the framework.

The 2-stage LQAS framework for population-based decision-making using imperfect diagnostic tests consists of four steps (Fig 1). In the first step, nclust clusters (e.g., schools or communities) are randomly selected from a given implementation unit i (e.g., a geographical area where preventive chemotherapy (PC) is administered). In the second step, nsub subjects are randomly selected within each cluster j. In the third step, all ntot (= nclust × nsub) subjects are screened using an imperfect diagnostic test D that has a sensitivity sed and a specificity spd. In the fourth and final step, the program decisions for unit i are made based on the number of positive test results (Xi+). The frequency (or intensity) of the intervention will be reduced (e.g., scaled down from 6-monthly to annual PC) in case Xi+ is less than some decision cut-off c. In contrast, when Xi+ ≥ c, the frequency or intensity of the intervention remains unchanged or will even be increased. Note that Xi+ includes both true and false positive test results. The framework also allows for spatial heterogeneity through the so-called intra-cluster correlation ρi, a measure of the extent to which positive test results are clustered within clusters of an implementation unit i, and as such, of how prevalences vary between clusters. The intra-cluster correlation can take on values between zero and one, with higher values representing more clustering of positive test results (i.e., higher geographical heterogeneity in infection levels).
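The four steps above can be sketched as a short Monte Carlo routine. The code below is a minimal illustration (all function and variable names are our own, not from the paper), assuming the 2-stage beta-binomial model described later in the Methods:

```python
import random

def simulate_positives(n_clust, n_sub, pi_i, rho_i, se, sp, rng):
    """One Monte Carlo draw of the total number of positive test results
    Xi+ in an implementation unit, under the 2-stage beta-binomial model."""
    # Moment parameterization: pi_i = a/(a+b) and rho_i = 1/(1+a+b)
    ab = 1.0 / rho_i - 1.0
    a, b = pi_i * ab, (1.0 - pi_i) * ab
    x_plus = 0
    for _ in range(n_clust):                  # step 1: select clusters
        pi_ij = rng.betavariate(a, b)         # true cluster-level prevalence
        # Expected proportion of positive results with an imperfect test D:
        p_ij = se * pi_ij + (1.0 - sp) * (1.0 - pi_ij)
        # steps 2-3: select and screen n_sub subjects in this cluster
        x_plus += sum(rng.random() < p_ij for _ in range(n_sub))
    return x_plus  # step 4: compare against the decision cut-off c

rng = random.Random(1)
x = simulate_positives(n_clust=5, n_sub=50, pi_i=0.50, rho_i=0.02,
                       se=0.80, sp=0.98, rng=rng)
reduce_pc = x < 90  # example decision rule with cut-off c = 90
```

With these toy values, the expected proportion of positive tests is 0.80 × 0.50 + 0.02 × 0.50 = 0.41, so a typical draw yields around 100 positives out of 250 and the decision would usually be to keep the current PC frequency.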

Fig 1. The different steps within the 2-stage LQAS framework for population-based decision-making using an imperfect test.

https://doi.org/10.1371/journal.pntd.0010353.g001

Ultimately, we wish to determine the survey design (nclust and nsub) and the decision cut-off c that allow for an acceptable risk of incorrect program decisions when an imperfect diagnostic test D is deployed in a unit i with intra-cluster correlation ρi. In other words, we should aim to minimize the probability that interventions are implemented at a frequency or intensity higher than officially necessary based on the true underlying prevalence (i.e., when the true prevalence πi is under the program decision threshold T), as this would lead to a waste of both time and resources. Simultaneously, we should aim to minimize the probability that interventions are scaled down prematurely (i.e., when πi ≥ T), which would lead to a preventable increase in infection and morbidity. Here, we assume that the target prevalence threshold T is defined in terms of the true prevalence πi (not as measured by some imperfect diagnostic test). We further assume that the value of T is appropriate for scaling interventions up or down, which may [7] or may not be the case [8–10].

Fig 2A illustrates the program decision-making process based on LQAS, using a toy example. It describes the probability of continuing or scaling up the frequency or intensity of the intervention as a function of the true underlying prevalence πi at the implementation unit level. For this, we assumed that a theoretical diagnostic method D was deployed (sensitivity sed = 80% and specificity spd = 98%) to screen nclust = 5 clusters and nsub = 50 subjects per cluster. For illustrative purposes, the LQAS decision cut-off was set at c = 90 (in a sample of ntot = nclust × nsub = 250). Given a program decision threshold T = 50% (vertical straight line), we can now deduce the error probability of unnecessarily continuing or upscaling the intervention at a frequency or intensity that is greater than needed (εovertreat) when the true prevalence πi < T, and the error probability of prematurely reducing the frequency or intensity of interventions (εundertreat) when the true prevalence πi ≥ T. These error probabilities can be considered community-level analogues of one minus the negative predictive value and one minus the positive predictive value [5], as used in recent NTD modelling exercises [8,11]. In Fig 2B, we deduce the decision cut-off c for which εovertreat and εundertreat do not exceed acceptable risk levels for arbitrary choices of πi < T and πi ≥ T. For this purpose, we define the minimal survey performance in terms of the acceptable risk levels (Eovertreat = 25% and Eundertreat = 5%) for a particular range of πi close to threshold T that is defined by a lower (LL) and upper limit (UL). Together, they form a “grey zone” that dictates the slope of the sigmoid curve (from the lower left corner to the top right corner of the grey zone). We then determine the range of values for the decision cut-off c that satisfy the conditions εovertreat(πi = LL) ≤ Eovertreat and εundertreat(πi = UL) ≤ Eundertreat, and we select the value of the decision cut-off c that offers the lowest attainable risk of εundertreat, as we considered this the most important risk to mitigate.
As can be seen from Fig 2B, the solid lines indicate the values of c that satisfy the conditions mentioned above (red line: c = 85 and blue line: c = 111), and we chose the decision cut-off c = 85, as it offers the lowest εundertreat (0.5%) compared to c = 111 (εundertreat = 4.7%). In contrast, the dotted lines represent values of c that do not satisfy the conditions (green line: c = 83 and purple line: c = 113).

Fig 2. Program decision-making process based on 2-stage LQAS framework allowing for an imperfect diagnostic test.

Panel A illustrates the risk of making a wrong program decision (εovertreat and εundertreat) for a given true prevalence πi when a decision cut-off c = 90 is applied. The vertical straight line indicates the program prevalence threshold T = 50%, whereas the horizontal dotted lines represent εovertreat and εundertreat for πi = 37.5% and πi = 62.5%, respectively. Panel B represents the decision cut-off c for a choice of grey zone width (lower limit (LL) = 37.5% and upper limit (UL) = 62.5%) and acceptable risk levels (Eovertreat = 25% and Eundertreat = 5%) for a particular range of πi. The solid sigmoid curves (in red and blue) indicate the values of c that satisfy the conditions εovertreat(πi = LL) ≤ Eovertreat and εundertreat(πi = UL) ≤ Eundertreat, whereas the dotted sigmoid curves (green and purple) do not. All graphs are based on the same theoretical diagnostic test D (sed = 80% and spd = 98%), survey design (nclust = 5 and nsub = 50), an intra-cluster correlation ρi = 0.02 and 10,000 Monte Carlo simulations.

https://doi.org/10.1371/journal.pntd.0010353.g002

The mathematical backbone of the framework

The mathematical backbone of the 2-stage LQAS population-based decision-making using an imperfect diagnostic test is summarized by the three equations below (Eq (1)–(3)). Tables 1 and 2 describe the definitions of the different parameters and derived variables used in these equations, respectively. Eq (1) describes the 2-stage sampling process that results in the total number of positive test results within a cluster. For this, we used a 2-stage beta-binomial model, where the beta distribution represents the variation in true cluster-level prevalence πij within an implementation unit i, and the binomial distribution represents the sampling variation in the number of positive tests Xij in cluster j of implementation unit i when an imperfect diagnostic test D with sensitivity sed and specificity spd is applied. The shape parameters αi and βi, both greater than zero, define the beta distribution, where index i indicates that these parameters can be implementation-unit specific. For the present study, we parameterized the beta distribution in terms of the expected value πi = αi / (αi + βi) and the intra-cluster correlation ρi = 1 / (1 + αi + βi). An added benefit of using the intra-cluster correlation as a measure of variation is that it is independent of the mean πi of the beta distribution, unlike its variance πi(1 − πi)ρi. To illustrate our framework, we considered an intra-cluster correlation ρi of 0.02 [12]. The binomial distribution for positivity of individual test results was parameterized in terms of pij = sed πij + (1 − spd)(1 − πij), the expected proportion of positive test results within a cluster j from implementation unit i, and the number of subjects screened per cluster (nsub):

πij ~ Beta(αi, βi), Xij ~ Binomial(nsub, pij) (Eq 1)
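For clarity, the moment parameterization used here can be written as two small helper functions (a sketch with our own naming, not code from the paper):

```python
def beta_shapes(pi_i, rho_i):
    """Invert pi_i = a/(a+b) and rho_i = 1/(1+a+b) to obtain the beta
    shape parameters (alpha_i, beta_i)."""
    ab = 1.0 / rho_i - 1.0          # a + b
    return pi_i * ab, (1.0 - pi_i) * ab

def apparent_prevalence(pi_ij, se, sp):
    """Expected proportion of positive test results p_ij in a cluster with
    true prevalence pi_ij: true positives plus false positives."""
    return se * pi_ij + (1.0 - sp) * (1.0 - pi_ij)

a, b = beta_shapes(pi_i=0.50, rho_i=0.02)        # a = b = 24.5
p = apparent_prevalence(0.50, se=0.80, sp=0.98)  # 0.41
```

Note that the beta variance a·b / ((a + b)² (a + b + 1)) then equals πi(1 − πi)ρi, consistent with the parameterization above.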

Table 1. Definitions of the parameters that describe the 2-stage LQAS framework.

https://doi.org/10.1371/journal.pntd.0010353.t001

Table 2. Definitions of the derived variables that describe the 2-stage LQAS framework.

https://doi.org/10.1371/journal.pntd.0010353.t002

Eq (2) and Eq (3) below represent the risks associated with incorrect program decision-making, denoting the parts of the graph line in Fig 2A left and right of the prevalence threshold T, respectively. Here, Eq (2) describes the probability that the interventions are implemented at a frequency or intensity higher than required for the true underlying prevalence when πi < T. Conversely, Eq (3) describes the probability that interventions are prematurely stopped or scaled down when πi ≥ T:

εovertreat(πi) = P(Xi+ ≥ c | πi < T) (Eq 2)

εundertreat(πi) = P(Xi+ < c | πi ≥ T) (Eq 3)

To be able to calibrate a value of the decision cut-off c, we need to choose what risk of over- and undertreatment (εovertreat and εundertreat) we find acceptable for a point πi < T and another point πi ≥ T. Together, these two true prevalence points, which correspond with the LL and UL of the grey zone, dictate the slope of the sigmoid curve illustrated in Fig 2B (the higher the desired steepness, the lower the risk we are willing to accept). We express this as εovertreat(πi = LL) ≤ Eovertreat and εundertreat(πi = UL) ≤ Eundertreat.

Simulation framework to determine the decision cut-off c

The determination of the decision cut-off c is straightforward in a 1-stage LQAS ignoring the spatial heterogeneity of infections across clusters [8], but there is no analytical solution for a 2-stage LQAS allowing for spatial heterogeneity [6]. We therefore applied a Monte Carlo (MC) simulation framework to determine the decision cut-off c. In brief, we first simulated the distribution of the total number of positive test results (Xi+) based on 10,000 MC draws from the 2-stage beta-binomial model (Eq (1)). A set of MC draws was produced for each combination of chosen prevalence threshold T and values of the limits of the grey zone (LL, UL), conditional on survey design (nclust, nsub) and spatial variation in true prevalences (ρi).

Subsequently, for each combination of c ∈ (0, nclust × nsub) and prevalence threshold T, we calculated the probability of overtreatment εovertreat (Eq (2)) when πi = LL and the probability of undertreatment εundertreat (Eq (3)) when πi = UL. Suitable values of the decision cut-off c were those that resulted in both εovertreat(πi = LL) ≤ Eovertreat and εundertreat(πi = UL) ≤ Eundertreat. When multiple values of c satisfied the grey zone conditions (Eq (2) and Eq (3)), we selected the value of the decision cut-off c that offered the lowest attainable risk of εundertreat, as we considered this the most important risk to mitigate.
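The calibration procedure just described can be sketched as follows: a simplified reimplementation with our own function names, where exact outputs vary slightly with the random seed:

```python
import bisect
import random

def draw_x_plus(pi_i, rho_i, n_clust, n_sub, se, sp, n_mc, seed):
    """n_mc Monte Carlo draws of Xi+ from the 2-stage beta-binomial model."""
    rng = random.Random(seed)
    ab = 1.0 / rho_i - 1.0
    a, b = pi_i * ab, (1.0 - pi_i) * ab
    draws = []
    for _ in range(n_mc):
        x = 0
        for _ in range(n_clust):
            pi_ij = rng.betavariate(a, b)
            p_ij = se * pi_ij + (1.0 - sp) * (1.0 - pi_ij)
            x += sum(rng.random() < p_ij for _ in range(n_sub))
        draws.append(x)
    return sorted(draws)

def calibrate_cutoff(ll, ul, e_over=0.25, e_under=0.05, n_clust=5, n_sub=50,
                     rho_i=0.02, se=0.80, sp=0.98, n_mc=10_000, seed=2022):
    """Return the feasible decision cut-off c with the lowest attainable
    eps_undertreat, or None when no c satisfies both grey zone conditions."""
    x_ll = draw_x_plus(ll, rho_i, n_clust, n_sub, se, sp, n_mc, seed)
    x_ul = draw_x_plus(ul, rho_i, n_clust, n_sub, se, sp, n_mc, seed + 1)
    feasible = []
    for c in range(1, n_clust * n_sub + 1):
        # eps_overtreat(LL) = P(Xi+ >= c); eps_undertreat(UL) = P(Xi+ < c)
        eps_over = (n_mc - bisect.bisect_left(x_ll, c)) / n_mc
        eps_under = bisect.bisect_left(x_ul, c) / n_mc
        if eps_over <= e_over and eps_under <= e_under:
            feasible.append(c)
    # eps_undertreat increases with c, so the smallest feasible c minimizes it
    return min(feasible, default=None)

c = calibrate_cutoff(ll=0.375, ul=0.625)  # close to 85 in the toy example
```

Sorting the draws once and using `bisect_left` makes the scan over all candidate cut-offs cheap, since each tail probability becomes a single binary search.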

In Fig 3, we further illustrate this process using the toy example of Fig 2. The top row (Fig 3A–3D) shows the process for the lower limit of the grey zone, whereas the bottom row (Fig 3E–3H) illustrates this for the upper limit of the grey zone. The first three panels in each row (Fig 3A–3C and Fig 3E–3G) represent the iterative process to determine the distribution of Xi+ based on a 2-stage beta-binomial model; the fourth panel in each row (Fig 3D and 3H) describes εovertreat and εundertreat across all potential values of c between zero and nclustnsub. Clearly, a decision cut-off c between 85 (Fig 3D: εovertreat(πi = LL)≤Eovertreat) and 111 (Fig 3H: εundertreat(πi = UL)≤Eundertreat) allowed for an acceptable risk of erroneous decision-making (see also Fig 2B). Note that in this example, there are multiple options for c, but this may not always be the case. Indeed, it is anticipated that for some combinations of nclust, nsub, sed, spd, LL, UL, and ρi, there will not be any decision cut-off that fulfils the conditions εovertreat(πi = LL)≤Eovertreat and εundertreat(πi = UL)≤Eundertreat.

Fig 3. Determination of the decision cut-off c based on 2-stage LQAS framework allowing for imperfect tests.

This figure describes the simulation framework to determine the decision cut-off c that allows for adequate (Eovertreat = 25% and Eundertreat = 5%) decision-making at a true underlying prevalence πi at the implementation unit i equal to 37.5% (LL) and 62.5% (UL) when an imperfect theoretical diagnostic test D (sed = 80% and spd = 98%) was deployed to screen nclust = 5 clusters with nsub = 50 subjects per cluster. We fixed the intra-cluster correlation ρi = 0.02 at both limits of the grey zone. The top row graphs (Panels A–D) illustrate the process for the lower limit of the grey zone, whereas the bottom row graphs (Panels E–H) show this for the upper limit. The horizontal dotted line in Panel D represents Eovertreat, whereas the one in Panel H represents Eundertreat. The dotted vertical lines in these panels represent the decision cut-off c, and the bullets represent εovertreat (Panel D) and εundertreat (Panel H). All graphs are based on the same set of 10,000 Monte Carlo simulations.

https://doi.org/10.1371/journal.pntd.0010353.g003

Application of the 2-stage LQAS framework to soil-transmitted helminths

Monitoring and evaluation of soil-transmitted helminth control programs as a case study.

STH are a group of intestinal roundworms, including Ascaris lumbricoides (giant roundworm), Trichuris trichiura (whipworm), and Ancylostoma duodenale and Necator americanus (hookworms). WHO recommends controlling STH-attributable morbidity through periodic administration of anthelmintic drugs to children and other at-risk populations living in endemic areas. The frequency of these large-scale deworming programs is based on the observed prevalence of STH infections (any species). That is, at the start of the program, it is recommended to distribute drugs twice a year when the prevalence is at least 50% and once a year when the prevalence is at least 20%. During the implementation phase, the prevalence of any STH infection is periodically re-evaluated to verify whether objectives are being met and, if necessary, to adjust the frequency of drug administration (prevalence ≥50%: 3x PC / year; 50% > prevalence ≥20%: maintain PC frequency; 20% > prevalence ≥10%: 1x PC / year; 10% > prevalence ≥2%: 1x PC / 2 years; prevalence <2%: stop PC). To monitor and evaluate STH control programs, WHO recommends screening 5 schools and 50 children per school. Traditionally, STH have been diagnosed by detecting worm eggs in a stool smear using a compound light microscope (the Kato-Katz thick smear method). This method is cheap and straightforward, yet notoriously imperfect, and sensitivity is more of a limitation (mainly for the detection of low-intensity infections) than specificity [13,14]. Notably, there is a lack of studies investigating whether its performance is sufficient to make reliable program decisions to stop PC and how the use of better diagnostics or sampling strategies could lead to improvement.
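The implementation-phase decision rules quoted above can be encoded directly. The sketch below (function name is ours) simply maps an observed prevalence to the recommended action:

```python
def pc_frequency_decision(prevalence):
    """WHO decision rules for STH during the implementation phase, as
    summarized in the text (observed prevalence of any STH infection)."""
    if prevalence >= 0.50:
        return "3x PC / year"
    if prevalence >= 0.20:
        return "maintain PC frequency"
    if prevalence >= 0.10:
        return "1x PC / year"
    if prevalence >= 0.02:
        return "1x PC / 2 years"
    return "stop PC"

decision = pc_frequency_decision(0.15)  # "1x PC / year"
```

In an M&E survey, however, the input to such a rule is not the true prevalence but an estimate based on imperfect test results, which is exactly the problem the 2-stage LQAS framework addresses.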

We will illustrate the application of the 2-stage LQAS framework in the context of M&E of STH control programs, with the aims (i) to determine the required diagnostic performance when the current WHO survey design is applied, (ii) to further optimize the survey design for imperfect diagnostic tests, and (iii) to customize the sample throughput and cost per test according to improvements in diagnostic performance. For each of these objectives, we applied the aforementioned 2-stage LQAS framework to strategically selected scenarios of nclust, nsub, sed, spd, LL, UL, and ρi, and verified whether we could render a decision cut-off c that fulfils the conditions εovertreat(πi = LL) ≤ Eovertreat and εundertreat(πi = UL) ≤ Eundertreat. To allow for two different operational definitions of ‘reliable’ program decision-making (‘adequate’ vs. ‘ideal’), we set Eundertreat at 5% for both definitions, whereas we set Eovertreat at either 25% (‘adequate’) or 10% (‘ideal’). These values for Eundertreat and Eovertreat have also been previously used to determine the sensitivity and specificity of diagnostic tests for other helminth diseases [5,6]. S1 Table provides an overview of the different nclust, nsub, sed, spd, LL, UL, ρi, Eundertreat and Eovertreat that were considered in our simulations, and S2 Table shows a summary of the simulated diagnostic methods. In the following sections, we briefly justify the choices made to meet the objectives. Also note that we assume that the sensitivity and specificity of a diagnostic test D do not vary across individuals, even though this assumption does not strictly hold: the sensitivity of detecting worm eggs in stool depends on the intensity of STH infection, which is known to vary between individuals [15].

Determine the required diagnostic performance when applying the current WHO survey design.

To determine the required diagnostic performance when applying the current WHO survey design (nclust = 5 and nsub = 50) across each of the 4 program decision thresholds T (2%, 10%, 20% and 50%), we varied both sed and spd from 60% to 100% (with 1% increments), resulting in a 41 × 41 grid of hypothetical diagnostic tests.

Given the wide range in program decision thresholds T (2% to 50%), we opted to define the limits of the grey zone proportional to T. We arbitrarily set the limits at T±25%. For example, for a T equal to 2%, LL and UL were set at 1.5% and 2.5%, respectively, while for a T equal to 50%, these numbers were 37.5% and 62.5%. We assumed that spatial heterogeneity (ρi = 0.02) was the same for all true prevalences of infection πi. For example, this assumption translates to a central 95%-confidence interval (95%-CI) of cluster-level true prevalence πij of 36.2–63.8% for an implementation unit i with an expected cluster-level true prevalence πi of 50%; analogously, for πi = 2% this translates to a 95% CI of 0.0–7.3%.
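The confidence intervals quoted in this paragraph can be reproduced by sampling cluster-level prevalences from the corresponding beta distribution. The sketch below (our own code) uses empirical quantiles of random draws rather than exact beta quantiles:

```python
import random

def cluster_prevalence_ci(pi_i, rho_i, n_draws=200_000, seed=7):
    """Empirical central 95% interval of cluster-level true prevalence
    pi_ij, for a beta distribution with mean pi_i and intra-cluster
    correlation rho_i."""
    ab = 1.0 / rho_i - 1.0
    a, b = pi_i * ab, (1.0 - pi_i) * ab
    rng = random.Random(seed)
    draws = sorted(rng.betavariate(a, b) for _ in range(n_draws))
    return draws[int(0.025 * n_draws)], draws[int(0.975 * n_draws)]

lo, hi = cluster_prevalence_ci(0.50, 0.02)    # roughly (0.36, 0.64)
lo2, hi2 = cluster_prevalence_ci(0.02, 0.02)  # lower limit near 0
```

Increasing ρi widens these intervals, which is why the required number of clusters grows with spatial heterogeneity in the analyses that follow.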

Optimise the survey design when using an imperfect diagnostic test.

To optimise the survey design, we explored the impact of the width of the grey zone (LL and UL), sed, spd, nclust and ρi on nsub. To this end, we determined the required nsub that allowed for adequate (Eundertreat = 5% and Eovertreat = 25%) program decisions around a program decision threshold of 2% for different scenarios of the width of the grey zone (T ±25%, ±40%, ±50%, ±60% and ±75%), sed (80%−100%, with 1% increments), spd (98%−100%, with 0.1% increments), and nclust (5, 10, 15 and 20), with ρi (initially) fixed at 0.02. We focused on T = 2% because the requirements for nclust and nsub become more stringent with decreasing values of T [6].

Finally, we further assessed the impact of the geographical variation in prevalence between clusters (ρi) and the number of sampled clusters (nclust) on the minimum required number of subjects per cluster (nsub). To this end, we arbitrarily fixed the limits of the grey zone at 1% and 3%, assumed a theoretical diagnostic test with sed = 80% and spd = 98%, explored three values for nclust (10, 15 and 20) and varied the intra-cluster correlation ρi from 0.012 (lower spatial heterogeneity of infections across the clusters) to 0.032 (higher spatial heterogeneity of infections across the clusters) with 0.004 increments.

We also estimated how the total survey costs (Ctot) may change with the diagnostic test characteristics (sed and spd), which dictate the minimum required nclust and nsub, given (LL, UL, Eundertreat and Eovertreat), reagent costs, and sample throughput. As a benchmark, we used a theoretical reference diagnostic test Dt1 (sedt1 = 80% and spdt1 = 98%, as in the example used in Fig 2), for which we assume that reagent costs and throughput are the same as for a single Kato-Katz thick smear, as recently estimated by Coffeng et al. [16]. These cost estimates include (i) the reagent cost for collection (0.57 US$ per sample) and testing (1.38 US$ per sample), (ii) the salary for a single mobile field team comprised of one nurse and three laboratory technicians (90 US$ per day, assuming 8 working hours), and (iii) the cost per day for car rental, including the salary of the driver and gasoline (90 US$ per day). As in Coffeng et al. [16], we adopt the assumption that a team collects samples in the morning (8:00–12:00) and that all collected samples are processed in the afternoon (13:00–17:00), which implies that the number of samples that can be collected daily is limited. Therefore, we calculated for each survey design the number of working days (ndays) required to screen all recruited subjects. For this, as in Coffeng et al. [16], we assume that it takes a person on average 412 seconds to process and test a single stool sample (which in Coffeng et al.’s analysis included preparation and examination of the stool smear, digitization of demographic information, and recording of the egg counting results). The sample throughput for the hypothetical reference test was therefore 9 samples per hour, and the reagent costs to collect and to test a sample were 0.57 US$ and 1.38 US$, respectively.

The equation to calculate ndays is described below:

ndays = nclust × ⌈ nsub / (ntech × hproc × q) ⌉ (Eq 4)

where q denotes the sample throughput (samples processed per person per hour), ntech = 3 the number of laboratory technicians processing samples, and hproc = 4 the number of afternoon processing hours per day.

Finally, (iv) we included a cost of 180 US$ (salary for one team + one car rental) per school included in the survey. This cost also covers the time required to inform the schools of the purpose of the study. The total survey cost Ctot is then as follows:

Ctot = ntot × (0.57 US$ + 1.38 US$) + ndays × (90 US$ + 90 US$) + nclust × 180 US$ (Eq 5)
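Putting the cost components together, a sketch of the total-cost calculation might look as follows. The day-count formula (three technicians processing during a four-hour afternoon) is our reading of the assumptions above, not code taken from Coffeng et al.:

```python
import math

# Cost components listed in the text (US$)
COST_COLLECT = 0.57       # reagents to collect one sample
COST_TEST = 1.38          # reagents to test one sample
COST_TEAM_DAY = 90.0      # field team salary per day
COST_CAR_DAY = 90.0       # car rental, driver and gasoline per day
COST_PER_SCHOOL = 180.0   # one team + one car per school visited

def survey_cost(n_clust, n_sub, throughput_per_hour,
                n_technicians=3, processing_hours=4):
    """Total survey cost Ctot: reagents, per-day team and car costs for the
    days needed to screen all subjects, plus a fixed cost per school."""
    samples_per_day = n_technicians * processing_hours * throughput_per_hour
    n_days = n_clust * math.ceil(n_sub / samples_per_day)
    reagents = n_clust * n_sub * (COST_COLLECT + COST_TEST)
    return (reagents + n_days * (COST_TEAM_DAY + COST_CAR_DAY)
            + n_clust * COST_PER_SCHOOL)

cost = survey_cost(n_clust=5, n_sub=50, throughput_per_hour=9)  # 2287.5 US$
```

Under these assumptions, a team processing 3 × 4 × 9 = 108 samples per day can handle one 50-child school per working day, so the reference design requires five days.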

We subsequently applied Ctot to determine which nclust and nsub represent the number of clusters and subjects per cluster that allow for adequate (Eundertreat = 5% and Eovertreat = 25%) program decisions around a program decision threshold T of 2% (LL = 1% and UL = 3%) for a given sed and spd. We fixed ρi at 0.02. We assumed that (i) there were no additional costs for laboratory infrastructure, (ii) the team received compensation only for working days (Monday–Friday), and (iii) the team did not take any breaks during processing.

Customize the sample throughput and cost per test according to the improvements in diagnostic performance.

To assess how the total cost of a survey depends on the diagnostic test performance (which dictates the minimal adequate design in terms of nclust and nsub), the sample throughput, and the cost per test, we defined three theoretical diagnostic methods Dt1–Dt3 with the same sensitivity (sed = 80%) but varying specificity (ranging from spdt1 = 98% to spdt3 = 94%), and applied these to varying scenarios of sample throughput (3–50 samples processed by one person in an hour, with 1-sample increments) and cost per test (1 US$−10 US$, with 0.01 US$ increments), applying Eqs (4) and (5) to estimate the total survey cost. For each combination of the parameters mentioned above, we then expressed the total survey cost relative to the total cost of a survey based on the theoretical reference diagnostic test Dt3 (sedt3 = 80% and spdt3 = 94%) with cost characteristics as for single Kato-Katz thick smears. In each of the scenarios, ρi was fixed at 0.02, and the grey zone was defined as T ±50% (LL = 1% and UL = 3%).

Application of the framework to develop evidence-based guidelines and direct R&D.

To further illustrate how our framework can contribute to the development of more evidence-based WHO guidelines, we explored the required number of subjects per cluster when 10, 15, and 20 clusters (intra-cluster correlation ρi of 0.02) were sampled, such that program decision-making is adequate (Eovertreat = 25% and Eundertreat = 5%). We considered the 2% prevalence threshold (LL = 1% and UL = 3%) separately for two theoretical diagnostic tests, Dtpp1 (setpp1 = 60% and sptpp1 = 99%) and Dtpp2. The values for the diagnostic performance of both tests were based on the minimal required sensitivity and specificity defined in the recently published WHO target product profiles (TPPs) for M&E of STH control programs [17]. We further estimated the total survey cost using Eq (4) and Eq (5). To this end, we used the required sample throughput (minimal: 7 samples per person per hour; ideal: 10 samples per person per hour) and reagent costs (minimal: 3 US$; ideal: 1 US$) listed in the WHO TPPs [17].

Further, we used our framework to gain insights into where R&D should be directed. For this, we fixed the diagnostic performance of a single Kato-Katz thick smear KK (sekk = 55% and spkk = 95%) to detect low-intensity infections [18–21]. Then, we considered two hypothetical improved Kato-Katz thick smear methods, one with improved sensitivity KKse (sekkse = 60% and spkkse = 95%) and one with improved specificity KKsp (sekksp = 55% and spkksp = 99%). We then estimated the required number of subjects per cluster and the associated total survey cost when 10, 15, and 20 clusters were sampled around the 2% program prevalence threshold (LL = 1% and UL = 3%) that allow for adequate decision-making (Eovertreat = 25% and Eundertreat = 5%). Additionally, we estimated the highest possible reagent cost per test for the diagnostic test KKsp. For this, we determined the survey cost for this improved Kato-Katz thick smear under varying scenarios of sample throughput (3–50 samples per person per hour) and reagent cost per sample (1 US$−10 US$) relative to the total cost of a survey based on the hypothetical diagnostic test that meets the WHO TPPs for M&E of STH control programs in terms of diagnostic performance Dtpp1 (setpp1 = 60% and sptpp1 = 99%), sample throughput (7 samples per person per hour) and reagent cost (3 US$ per test), while setting the number of clusters at 10.
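The Monte Carlo machinery underlying all of the analyses above can be sketched in a few lines. The snippet below is an illustrative reconstruction, not the authors' code: it assumes cluster-level true prevalences follow a beta distribution parameterised by the mean prevalence and the intra-cluster correlation ρi (so that the between-cluster variance is ρi·p·(1−p)), with each subject's test result misclassified according to the test's sensitivity and specificity; all function and variable names are hypothetical.

```python
import numpy as np

def simulate_positives(p_true, rho, n_clust, n_sub, se, sp,
                       n_sim=10_000, seed=1):
    """Simulate the total number of test-positives in a cluster survey.

    Cluster-level true prevalences are drawn from a beta distribution with
    mean p_true and variance rho * p_true * (1 - p_true); subjects are then
    classified by an imperfect test with sensitivity se and specificity sp.
    """
    rng = np.random.default_rng(seed)
    a = p_true * (1 - rho) / rho          # beta shape parameters implied by
    b = (1 - p_true) * (1 - rho) / rho    # the mean prevalence and the ICC
    p_clust = rng.beta(a, b, size=(n_sim, n_clust))
    # Apparent (test-positive) prevalence under an imperfect diagnostic
    p_app = se * p_clust + (1 - sp) * (1 - p_clust)
    positives = rng.binomial(n_sub, p_app)    # positives per cluster
    return positives.sum(axis=1)              # total positives per survey

# Example: the WHO design (5 clusters x 50 subjects) at a 2% true prevalence,
# with a test of 80% sensitivity and 98% specificity
totals = simulate_positives(p_true=0.02, rho=0.02, n_clust=5, n_sub=50,
                            se=0.80, sp=0.98)
```

A program decision would then compare each simulated total against a decision cut-off c; the fractions of wrong calls at the grey-zone limits estimate the overtreatment and undertreatment risks.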

Results

The required diagnostic performance when applying the current WHO survey design

Fig 4 illustrates the required sensitivity and specificity when applying the WHO survey design (nclust = 5 and nsub = 50 per cluster) around the three highest program prevalence thresholds T (Fig 4A: 10%; Fig 4B: 20%; Fig 4C: 50%). We fixed the intra-cluster correlation ρi at 0.02 and defined the limits of the grey zone as a proportion of T (LL = T−0.25∙T; UL = T+0.25∙T). Each of the panels represents a contour plot highlighting all possible combinations of sensitivity and specificity that rendered a decision cut-off c fulfilling the predefined error constraints for adequate (all combinations within the light blue and dark blue areas) and ideal program decision-making (all combinations in the dark blue area). The red area indicates all combinations of sensitivity and specificity that did not render a decision cut-off c that fulfilled the conditions for adequate decision-making. Three crucial aspects can be observed in Fig 4. First, the requirements for diagnostic test performance become less stringent as the program threshold approaches 50%. None of the combinations of sensitivity and specificity allowed for adequate or ideal decision-making around program thresholds of 10% (the area of Fig 4A is entirely red) and 2% (not shown as it is identical to 10%). For thresholds of 20% and 50%, program decisions were either adequate (Fig 4B contains red and light blue) or adequate to ideal (Fig 4C contains all three colours), respectively. Second, the requirements for specificity and sensitivity are inversely correlated. For instance, for adequate program decisions around a 50% threshold (Fig 4C) with a diagnostic test of 60% specificity, the sensitivity must be at least 70%, and vice versa. Third, the requirements for specificity are more stringent than those for sensitivity, which already becomes apparent when program decisions are made around a T of 20% (Fig 4B). When employing a diagnostic method with perfect specificity, the sensitivity should not drop below 72% to guarantee adequate decision-making. In contrast, the specificity can only drop to 86% when the sensitivity is set at 100%.
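The inverse correlation between the sensitivity and specificity requirements, and the dominance of specificity at low thresholds, both follow from the standard apparent-prevalence identity p′ = se·p + (1−sp)·(1−p). The worked arithmetic below is an illustration of this algebra, not output from the paper's simulations: the separation in apparent prevalence between the grey-zone limits is (UL−LL)·(se+sp−1), so sensitivity and specificity trade off one-for-one, while at a 2% threshold the false-positive floor (1−sp) inflates both limits and swamps that separation.

```python
def apparent(p, se, sp):
    """Apparent (test-positive) prevalence under an imperfect test."""
    return se * p + (1 - sp) * (1 - p)

# Around T = 50% (grey zone 37.5%-62.5%), with se = 70% and sp = 60%:
lo, hi = apparent(0.375, 0.70, 0.60), apparent(0.625, 0.70, 0.60)
# separation between the grey-zone limits = (UL - LL) * (se + sp - 1)
sep_50 = hi - lo          # 0.25 * 0.30 = 0.075

# Around T = 2% (grey zone 1%-3%), with se = 80% and sp = 94%:
lo2, hi2 = apparent(0.01, 0.80, 0.94), apparent(0.03, 0.80, 0.94)
# lo2 = 0.0674 and hi2 = 0.0822: the false-positive floor (~0.06) is roughly
# four times the separation (0.0148), so very large sample sizes are needed
# to tell the two grey-zone limits apart.
```

This is why improving specificity pays off disproportionately as programs approach the 2% threshold: it lowers the floor that both grey-zone limits sit on, rather than merely widening their separation.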

Fig 4. The required sensitivity and specificity when applying the current WHO study design.

This figure indicates the required sensitivity and specificity when applying the current WHO study design (nclust = 5 and nsub = 50) that allow for adequate or ideal decision-making around the three highest program prevalence thresholds T (Panel A: 10%; Panel B: 20%; Panel C: 50%). The intra-cluster correlation ρi was fixed at 0.02, and the limits of the grey zone were defined as a proportion of the program prevalence threshold (LL = T−0.25∙T; UL = T+0.25∙T). The red area indicates that the combination of the survey design (nclust, nsub) and the test performance (sed and spd) was inadequate for decision-making, while the light blue and the dark blue areas represent combinations that allow for adequate and ideal decision-making, respectively. All graphs are based on the same set of 10,000 Monte Carlo simulations.

https://doi.org/10.1371/journal.pntd.0010353.g004

Optimise the survey design when using an imperfect diagnostic test

We now proceed to optimise the survey design for a program prevalence threshold of 2%, for which the WHO-recommended survey design is not sufficient. The first step is to evaluate how stringency (i.e., grey zone width) affects the minimum survey design. Fig 5 illustrates the impact of the width of the grey zone on the number of subjects per cluster (nsub; Fig 5A), the total number of subjects sampled across nclust clusters (ntot; Fig 5B), the corresponding decision cut-off (c; Fig 5C), and the associated survey costs (Ctot; Fig 5D) required for adequate program decision-making (Eovertreat = 25% and Eundertreat = 5%). We fixed the intra-cluster correlation ρi at 0.02 and assumed the use of a theoretical diagnostic test Dt1 (sedt1 = 80% and spdt1 = 98%). Each of these four metrics (nsub, ntot, c and Ctot) decreased as the grey zone became wider (i.e., when less stringent requirements for decision-making were defined). For example, when the limits of the grey zone were defined as T±40% (LL = 1.2%, UL = 2.8%) and 10 clusters were randomly selected, the minimum required number of subjects per cluster was 540 (ntot = 5,400), whereas this was 175 (ntot = 1,750) when the limits were defined as T±50% (LL = 1%, UL = 3%) (Fig 5A and B). Additionally, it is important to note that this difference in ntot further decreased as the grey zone became wider: when the width of the grey zone was T±75%, the required ntot was 650 when nclust equalled 5 and 500 when nclust was set at 10 (Fig 5B). Fig 5C shows very similar trends as Fig 5B because the decision cut-off also decreased as the grey zone became wider. Finally, for relatively narrow grey zones, a higher number of clusters (15 or 20) is the most cost-efficient; only for a grey zone as wide as T±75% did sampling 5 clusters become the most cost-efficient design (Fig 5D).
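The search behind Fig 5A can be sketched as follows. This is a simplified reconstruction under beta-binomial assumptions, not the authors' implementation: for a given number of clusters, it increases the number of subjects per cluster until a decision cut-off c exists that keeps the undertreatment risk at the upper grey-zone limit below Eundertreat and the overtreatment risk at the lower limit below Eovertreat. All names, the grid step, and the cut-off rule are illustrative choices.

```python
import numpy as np

def survey_totals(p, rho, n_clust, n_sub, se, sp, rng, n_sim=5_000):
    """Total test-positives per simulated survey at true prevalence p."""
    a, b = p * (1 - rho) / rho, (1 - p) * (1 - rho) / rho
    p_clust = rng.beta(a, b, (n_sim, n_clust))
    p_app = se * p_clust + (1 - sp) * (1 - p_clust)
    return rng.binomial(n_sub, p_app).sum(axis=1)

def minimal_nsub(n_clust, se, sp, ll=0.01, ul=0.03, rho=0.02,
                 e_over=0.25, e_under=0.05, step=25, max_nsub=2_000, seed=1):
    """Smallest n_sub (on a coarse grid) with a cut-off c meeting both risks."""
    rng = np.random.default_rng(seed)
    for n_sub in range(step, max_nsub + 1, step):
        at_ll = survey_totals(ll, rho, n_clust, n_sub, se, sp, rng)
        at_ul = survey_totals(ul, rho, n_clust, n_sub, se, sp, rng)
        # pick c so that P(stop | prevalence = UL) = P(total < c) <= e_under
        c = int(np.quantile(at_ul, e_under))
        # adequate if P(continue | prevalence = LL) = P(total >= c) <= e_over
        if c > 0 and (at_ll >= c).mean() <= e_over:
            return n_sub, c
    return None

design = minimal_nsub(n_clust=10, se=0.80, sp=0.98)
```

Under these simplifying assumptions the search lands in the same order of magnitude as the ~175 subjects per cluster reported above for 10 clusters, though exact numbers depend on the precise decision rule and simulation details.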

Fig 5. Impact of the width of the grey zone on program decision-making when applying imperfect diagnostics.

This figure illustrates the impact of the width of the grey zone on the number of subjects per cluster (Panel A), the total number of subjects sampled across nclust clusters on the minimum required survey design for adequate program decision-making (Eovertreat = 25% and Eundertreat = 5%) (Panel B), the decision threshold c (Panel C), and the associated total survey cost (Panel D). We considered program prevalence threshold T of 2%, fixed the intra-cluster correlation ρi at 0.02 and assumed a theoretical diagnostic test Dt1 (sedt1 = 80% and spdt1 = 98%). The width of the grey zone is expressed as a proportion of program prevalence threshold T. The black bullet across the four panels indicates the chosen hypothetical reference grey zone width (T±50%), which was chosen to further illustrate the impact of the diagnostic performance (Fig 6) and geographical variation in prevalence between clusters on program decision-making (Fig 7), and the relative change in total survey cost based on newer diagnostic tests (Fig 8). All graphs are based on the same set of 10,000 Monte Carlo simulations.

https://doi.org/10.1371/journal.pntd.0010353.g005

To facilitate interpretation of the impact of the other parameters on program decision-making, we fixed the limits of the grey zone in all subsequent figures to 1% (= LL) and 3% (= UL). These limits correspond to T±50% (black dots in Fig 5) and resulted in a total sample size ntot of about 2,000 (200 subjects per cluster) and a decision cut-off of 60 when 10 clusters were sampled, a sample size that we considered feasible under field conditions.

We now proceed to evaluate the impact of diagnostic test performance on the minimum survey design. Fig 6 illustrates the impact of the diagnostic performance (sed≥80% and spd≥98%) and the number of sampled clusters (nclust: 5, 10, 15 and 20) on the minimum required number of subjects per cluster (nsub) for adequate program decision-making (Eovertreat = 25% and Eundertreat = 5%) around a 2% program prevalence threshold. We fixed the intra-cluster correlation ρi at 0.02. Generally, this figure highlights three important aspects. First, improving the diagnostic performance can substantially reduce the minimum required number of subjects per cluster and the total sample size. For example, when sampling 5 clusters and deploying a diagnostic test D (sed = 80% and spd = 98%), the minimum required number of subjects per cluster was ~1,300, but it dropped to 400 subjects per cluster when the diagnostic test specificity was set at 99.9% (Fig 6A). Second, these panels confirm that specificity is more important than sensitivity when evaluating prevalence around 2%. In other words, improving the specificity has more impact on the minimum required sample size than improving the sensitivity. This can be readily seen from the almost vertical orientation of the contour lines in all the panels of Fig 6, which indicates that the sample size changes fastest with specificity (i.e., approximately perpendicular to the contour lines). Third, as expected, sampling fewer clusters leads to a greater number of subjects per cluster and ultimately to an increased total sample size. For example, when sensitivity was set at 80% and specificity was set at 98%, the minimum required number of subjects per cluster was ~1,300 when 5 clusters were sampled (ntot = 6,500). For 10, 15 and 20 clusters, these numbers were about 150 (ntot = 1,500), 80 (ntot = 1,200), and 50 (ntot = 1,000), respectively (Fig 6B).

Fig 6. Impact of the diagnostic performance on program decision-making.

The contour lines illustrate the relationship between specificity (x-axis), the sensitivity (y-axis), and the minimum number of subjects per cluster nsub (contour lines) for adequate program decision-making (Eovertreat = 25% and Eundertreat = 5%) around a 2% program prevalence threshold for the number of clusters nclust equal to 5, 10, 15 and 20. We fixed the intra-cluster correlation ρi at 0.02 and the limits of the grey zone at 1% and 3%. All graphs are based on the same set of 10,000 Monte Carlo simulations.

https://doi.org/10.1371/journal.pntd.0010353.g006

Last, we consider the impact of geographical variation on the optimal study design for a threshold of 2%, assuming a test sensitivity and specificity of 80% and 98%, respectively. Fig 7 displays the impact of the geographical variation in prevalence (expressed as intra-cluster correlation, which ranged from 0.012 to 0.032) on the number of subjects per cluster (nsub; Fig 7A), the total number of subjects sampled across nclust clusters (ntot; Fig 7B), the corresponding decision cut-off (c; Fig 7C), and the associated total survey costs (Ctot; Fig 7D) required for adequate program decision-making (Eovertreat = 25% and Eundertreat = 5%) around a 2% program prevalence threshold. Each of these four metrics increased as the distribution of STH infections became more heterogeneous. In other words, if the spatial heterogeneity differs between implementation units, the optimal design for decision-making will differ too. For example, when fixing the number of clusters at 10, a minimum of 120 subjects per cluster was required when the intra-cluster correlation ρi equalled 0.012, while 188 subjects per cluster were needed when ρi was set at 0.025 (Fig 7A). Additionally, regarding the total survey cost (Fig 7D), sampling fewer clusters is preferred when infections are more homogeneously distributed among clusters (ρi<0.017). For example, it is apparent from Fig 7D that sampling 10 or 15 clusters was the most cost-efficient when the intra-cluster correlation ranged from 0.012 to 0.017. However, in case of a more heterogeneous distribution, Fig 7D indicates that it becomes more cost-efficient to sample more clusters and fewer subjects per cluster. For example, when fixing the intra-cluster correlation ρi at 0.012, the total survey cost was ~6,000 US$ when 10 clusters were sampled and ~7,000 US$ when 20 clusters were sampled. In contrast, for an intra-cluster correlation of 0.025, these numbers were ~8,500 US$ for 10 clusters and ~7,800 US$ for both 15 and 20 clusters (Fig 7D).
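To see why the intra-cluster correlation matters so much around a 2% threshold, it helps to translate ρi into the between-cluster spread of true prevalences. Under a beta model with mean p (an assumption we make here for illustration, matching the parameterisation commonly used for beta-binomial cluster sampling), the variance of the cluster-level prevalences is ρi·p·(1−p):

```python
import math

def cluster_sd(p, rho):
    """SD of cluster-level true prevalences under a beta model with mean p
    and intra-cluster correlation rho (Var = rho * p * (1 - p))."""
    return math.sqrt(rho * p * (1 - p))

# Spread (in percentage points) around a 2% mean prevalence:
spread = {rho: round(100 * cluster_sd(0.02, rho), 2)
          for rho in (0.012, 0.02, 0.032)}
# -> {0.012: 1.53, 0.02: 1.98, 0.032: 2.5}
```

At ρi = 0.032 the between-cluster SD (≈2.5 percentage points) exceeds the mean prevalence itself, so many clusters sit near 0% while others lie well above the 3% upper grey-zone limit; this is why sampling more clusters, rather than more subjects per cluster, becomes the more efficient way to pin down the district-level prevalence.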

Fig 7. Impact of geographical variation in prevalence between clusters on program decision-making.

This figure illustrates the impact of the geographical variation in prevalence (expressed as intra-cluster correlation, which ranged from 0.012 to 0.032) on the number of subjects per cluster (nsub; Panel A), the total number of subjects sampled across nclust clusters (ntot; Panel B), the corresponding decision cut-off c (Panel C), and the associated total survey costs (Ctot; Panel D) required for adequate program decision-making (Eovertreat = 25% and Eundertreat = 5%) around a 2% program prevalence threshold. The sensitivity and specificity were set at 80% and 98%, respectively, and the limits of the grey zone were fixed at 1% and 3%. All graphs are based on the same set of 10,000 Monte Carlo simulations.

https://doi.org/10.1371/journal.pntd.0010353.g007

Customise the sample throughput and cost per test according to the improvements in specificity

Let us now consider how the potentially increased cost and lower throughput of improved diagnostic tests may be offset by the lower required sample sizes due to higher test specificity. Fig 8 illustrates the survey cost for varying scenarios of sample throughput and reagent cost (i.e., the cost to test one sample) relative to the total cost of a survey based on the hypothetical reference diagnostic test with the sample throughput (9 samples per hour per person) and cost characteristics (1.38 US$ reagent cost per test) of single Kato-Katz thick smears. Three theoretical diagnostic methods Dt1–Dt3 were defined with varying specificity (spdt1 = 98%, spdt2 = 96%, and spdt3 = 94%) while setting the sensitivity at 80% (sedt1 = sedt2 = sedt3 = 80%) for all three tests. As the diagnostic test specificity affects the sample size (number of clusters and individuals per cluster) and thus the total survey costs (see Fig 6), we took as reference the diagnostic method with the lowest specificity, Dt3.

The estimated total survey cost for the reference diagnostic test Dt3 (10 clusters and 350 individuals per cluster tested) was ~14,500 US$ (indicated by a black bullet in Fig 6A). From Fig 8A, it becomes clear that to keep the total survey cost the same (i.e., the purple line), the requirements for reagent costs and sample throughput are correlated. For example, when we increase the reagent cost by 0.50 US$ to 1.88 US$, the required sample throughput should be at least 13 instead of 9 to result in the same total survey cost. The panel also indicates that the association describing the break-even point is non-linear. In other words, an increase of 0.50 US$ will not always require increasing the sample throughput by 4. Indeed, if we further increase the reagent cost by 0.50 US$ to 2.38 US$, the required throughput needs to be increased to 23 (an increase of 10), underscoring that the compensation of increased reagent cost by higher sample throughput is limited. In fact, for a survey based on the reference diagnostic test Dt3, the reagent cost per test cannot exceed 2.90 US$.
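The break-even points quoted above can be reproduced approximately with a simplified per-sample cost model: cost per sample = reagent cost + hourly labour cost / throughput. The paper's full cost model (Eqs 4 and 5) is not reproduced here; the hourly labour cost of ~14.6 US$ below is a value we back-calculated so that the simplified model matches the quoted numbers (1.38 US$ at 9 samples/hour ≈ 1.88 US$ at 13 ≈ 2.38 US$ at 23), not a figure stated in the paper.

```python
def per_sample_cost(reagent, throughput, wage=14.6):
    """Simplified cost to process one sample: reagents plus technician time.
    wage (US$/hour) is an assumed, back-calculated value."""
    return reagent + wage / throughput

def breakeven_throughput(reagent, ref=(1.38, 9), wage=14.6):
    """Throughput at which `reagent` matches the reference per-sample cost."""
    slack = per_sample_cost(*ref, wage=wage) - reagent
    return wage / slack if slack > 0 else float("inf")

# 1.88 US$ needs ~13 samples/hour; 2.38 US$ needs ~23; beyond ~3 US$ no
# throughput can compensate (cf. the ~2.90 US$ ceiling quoted in the text).
```

The ceiling arises because even infinite throughput cannot push the per-sample cost below the reagent cost itself, which is why compensation by throughput is inherently limited.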

Fig 8. The relative total survey cost of imperfect diagnostic tests with varying sample throughput and reagent costs.

The contour lines illustrate the total survey cost when applying imperfect diagnostic tests with varying sample throughput and reagent cost per sample relative to the total cost of a survey based on the hypothetical reference diagnostic test with the sample throughput (9 samples per hour per person) and cost characteristics (1.38 US$ reagent cost per test) of single Kato-Katz thick smears. We considered three hypothetical diagnostic tests Dt3 (reference diagnostic test; Panel A), Dt2 (Panel B), and Dt1 (Panel C), each with a different diagnostic performance (sedt1–dt3 = 80%; spdt1 = 98%, spdt2 = 96%, spdt3 = 94%). The intra-cluster correlation ρi was set at 0.02 and the number of clusters at 10. The number of subjects per cluster and the total survey cost were defined as the minimum required number of subjects per cluster and the minimal cost required for adequate decision-making (Eovertreat = 25% and Eundertreat = 5%) around a program prevalence threshold of 2%, with the grey zone defined as T±50% (LL = 1% and UL = 3%). All three panels were based on 10,000 Monte Carlo simulations.

https://doi.org/10.1371/journal.pntd.0010353.g008

Fig 8B and C illustrate the same trade-off between sample throughput and reagent costs for diagnostic tests Dt2 (sedt2 = 80% and spdt2 = 96%) and Dt1 (sedt1 = 80% and spdt1 = 98%), respectively. Generally, these panels indicate that when fixing the number of clusters at nclust = 10, the improved diagnostic performance reduced the required number of subjects per cluster (350 vs. 250 vs. 175), which allows for diagnostic tests with a lower sample throughput (downward shift of the contour lines) and higher reagent costs (shift of the contour lines to the right) while keeping the total survey cost the same (purple line). For example, to compensate for a 0.50 US$ increase in reagent costs, the required sample throughput should be at least 6 per hour for diagnostic test Dt2 and 4 per hour for diagnostic test Dt1, instead of 13 per hour for the reference test Dt3. Furthermore, the highest possible reagent cost per test that ensures the same total survey cost was ~4.30 US$ for diagnostic test Dt2 and ~6.50 US$ for diagnostic test Dt1.

Application of the framework to develop evidence-based guidelines and direct R&D

Given a 2% program prevalence threshold and a choice of grey zone width (T±50%) for adequate program decision-making (Eovertreat = 25% and Eundertreat = 5%), we investigated the potential contribution of our framework to more evidence-based WHO guidelines. For this, we considered two hypothetical tests Dtpp1 (setpp1 = 60% and sptpp1 = 99%) and Dtpp2 (setpp2 = 86% and sptpp2 = 94%), both identified as potential diagnostic tests in the recently published WHO TPPs for M&E of STH. The intra-cluster correlation (ρi) was fixed at 0.02 when sampling 10, 15, and 20 clusters. Furthermore, we set the sample throughput to 7 or 10 samples per person per hour and the reagent cost to 1 US$ or 3 US$ per test [17]. Table 3 lists the required number of subjects per cluster for each scenario. It is evident from this table that for diagnostic test Dtpp1 the number of subjects per cluster equalled 165 for 10 clusters, 92 for 15 clusters, and 60 for 20 clusters. For diagnostic test Dtpp2, these numbers were 330 (10 clusters), 162 (15 clusters), and 110 (20 clusters), respectively. It also becomes clear that the requirements for both sample throughput and reagent cost are less stringent for diagnostic test Dtpp1. For approximately the same survey cost (~11,000 US$), the sample throughput and cost per test for diagnostic test Dtpp1 can be 7 samples per hour and 3 US$, respectively, while these need to be 10 samples per hour and 1 US$ for diagnostic test Dtpp2.

Table 3. The use of the 2-stage LQAS framework to develop guidelines and strategic choices in R&D.

https://doi.org/10.1371/journal.pntd.0010353.t003

Let us now illustrate the potential contribution of our framework to R&D, considering a 2% program prevalence threshold (LL = 1% and UL = 3%). Table 3 highlights that R&D should be directed towards improved specificity rather than sensitivity. For instance, when sampling 20 clusters, 190 subjects per cluster must be sampled with the sensitivity-improved test (KKse), whereas only 80 per cluster are needed with the specificity-improved test (KKsp). Furthermore, we determined the highest possible reagent cost per test for the Kato-Katz thick smear with improved specificity KKsp (sekksp = 55% and spkksp = 99%). It can be observed from Fig 9 that for the same total survey cost, the reagent cost per test for the Kato-Katz thick smear with improved specificity should not exceed 1.85 US$ when one person can only process 7 samples per hour. This cost per test can be relaxed to 2.40 US$ and 2.64 US$ when the sample throughput is increased to 9 and 10 samples per hour, respectively.

Fig 9. The relative total survey cost of Kato-Katz thick smear with improved specificity.

The contour lines illustrate the total survey cost when applying a Kato-Katz thick smear with improved specificity KKsp (sekksp = 55% and spkksp = 99%) with varying sample throughput and reagent cost per sample relative to the total cost of a survey based on the hypothetical diagnostic test Dtpp1 (setpp1 = 60% and sptpp1 = 99%), which meets the WHO TPPs for M&E of STH control programs, with a sample throughput of 7 samples per hour per person and a reagent cost of 3 US$ per test. The dotted line indicates the maximum reagent cost per test for the improved Kato-Katz thick smear (1.85 US$) when the sample throughput was set at 7. The intra-cluster correlation ρi was set at 0.02 and the number of clusters at 10. The number of subjects per cluster was estimated as the minimum number of subjects required for adequate decision-making (Eovertreat = 25% and Eundertreat = 5%) around a program prevalence threshold of 2%, with the grey zone defined as T±50% (LL = 1% and UL = 3%). The total survey cost was defined as the minimal cost required. The graph was based on 10,000 Monte Carlo simulations.

https://doi.org/10.1371/journal.pntd.0010353.g009

Discussion

Diagnostic tests currently deployed in M&E surveys are often imperfect. In the present study, we developed a 2-stage LQAS framework for program decision-making that allows for both imperfect diagnostics and spatial heterogeneity of infections. We applied the framework using M&E of STH control programs as a case study. For this, we explored the impact of the diagnostic performance, spatial heterogeneity and survey design on both the program decision-making around the prevalence decision thresholds recommended by WHO and the total survey costs. In addition, we assessed the trade-off between cost per test, sample throughput and diagnostic performance of tests. Finally, we illustrated how our framework could support the development of evidence-based guidelines and direct R&D.

Revision of WHO guidelines for M&E of STH programs may be warranted

The survey design currently proposed by WHO (5 schools and 50 subjects per school per implementation unit) allows for adequate (prevalence threshold of 20%) to ideal (prevalence threshold of 50%) program decision-making. However, the risk of an incorrect program decision around a prevalence threshold of 2% or 10% may be too high given the geographical heterogeneity (ρi = 0.02) in infection levels and the predefined grey zone (LL = 1% and UL = 3%), as assumed here. Given that even perfect diagnostic tests cannot reduce this risk to an acceptable level under these assumptions (S1 Fig), revision of the WHO guidelines for M&E of STH programs may be warranted. To illustrate how our framework can contribute to the development of more evidence-based WHO guidelines, we provided an example calculation of the minimum survey design for diagnostic tests that satisfy the recently published WHO TPPs for M&E of STH control programs. The resulting required sample size for both TPP-compliant tests was feasible under field conditions (Table 3).

A paradigm shift from sensitivity to specificity is warranted

Our study confirms that the requirements for diagnostic test parameters are inversely correlated [6] and that specificity rather than sensitivity is a critical diagnostic parameter to consider when NTD programs progress towards the end-game [5]. These observations have already been explained in more detail elsewhere [5,6], and hence we will restrict ourselves to underscoring their impact on research and development (R&D) of future diagnostic tests for NTD. First, we highlight that an important shift in paradigm on diagnostic performance will be required from the NTD research community. Where the focus has mainly been on the diagnostic sensitivity (sustained by the focus and conclusions of various studies [22–24]), it has now become clear that obtaining a high diagnostic specificity will be the driving force for R&D. This need for more specific diagnostic tests is already apparent in the different TPPs that WHO recently published for onchocerciasis [25], lymphatic filariasis [26], soil-transmitted helminthiasis [17] and schistosomiasis [27]. In each of these TPPs, the required specificity did not drop below 94%, while the requirements for sensitivity were relaxed to 60%.

Second, we suggest a re-appraisal of the value of the current diagnostic standards. For example, a single Kato-Katz thick smear is the current diagnostic standard for STHs, and although it lacks sensitivity to detect infections of low intensity ([13,14,28,29]: Ascaris: ~55%, Trichuris: ~80%, hookworms: ~70%), it has a high specificity (≥98%) [19,30]. In other words, this cheap and simple method might still be valuable at the program's endgame, though the survey design will need to be adapted accordingly. For example, assuming the aforementioned diagnostic performance of a single Kato-Katz thick smear and the operational cost for Kato-Katz thick smear described in the methods section, sampling 20 clusters and 230 subjects per cluster (total sample size = 4,600) is the most cost-efficient survey design for Ascaris that allows for adequate program decision-making (Eovertreat = 25%, Eundertreat = 5%) around the 2% prevalence threshold (lower limit = 1% and upper limit = 3%) (Table 3). However, this survey design will require substantial financial resources (20,236 US$), though improvements to the Kato-Katz thick smear may drastically reduce this cost. Although there are several alternative diagnostics for STH (e.g., Mini-FLOTAC, McMaster, FECPAKG2 and qPCR), we strongly doubt whether they will outcompete the Kato-Katz thick smear. Their specificity too is not perfect [13,31], and they come with substantially higher operational costs.

Compensating improved diagnostic tests' higher cost and lower throughput by lower sample size

Our results confirm that improving diagnostic performance results in smaller sample sizes for the same level of program decision-making. They also highlight that improving the diagnostic performance and the corresponding reduced sample sizes can compensate for more costly tests and lower sample throughput. In addition, our results highlight that there is a limit to the extent to which higher reagent costs can be compensated by lower sample throughput and vice versa. For example, in our theoretical scenarios, the highest possible cost per test equaled 6.50 US$ and the minimal sample throughput 3 samples per person per hour when test sensitivity is 80%, and specificity is 98%. These findings are crucial for R&D and will guide developers in making strategic choices. Indeed, different pairs of sensitivity and specificity have been included in the WHO TPP. Yet, each of them will come with a different survey design (see above), and because of this, requirements for both sample throughput and reagent cost per test will be different.

Still a long way to go

Although we have illustrated the added value of our framework to develop more evidence-based M&E guidelines and make strategic R&D choices, there is still a long way to go. First, the optimal design will be specific to each implementation unit, as spatial heterogeneity is not expected to be the same across implementation units due to local dispersion of houses and hygienic behaviour. Second, the framework is applicable to any NTD, but it will need to be adapted for each specific NTD separately. This is because not only will the program prevalence thresholds differ, but the spatial heterogeneity is also expected to vary across NTDs. For instance, the geographical distribution of some NTDs like lymphatic filariasis and schistosomiasis may be more focal (higher spatial heterogeneity) due to relatively low vector mobility as compared to, for example, onchocerciasis (highly mobile vector) and STH (widespread environmental contamination). Moreover, it will be equally important for each NTD community to agree on the acceptable width of the grey zone separately for each program threshold. Third, although we focused primarily on the prevalence of STH infections of any intensity, the current framework can also be used to assess moderate-to-heavy intensity infections (WHO is striving to reduce the number of children that carry a moderate-to-heavy intensity infection to less than 2%) [32]. In this case, the presence of infections of any intensity is replaced by the presence of moderate-to-heavy intensity infections. Finally, we assumed that sensitivity and specificity do not vary across individuals. Yet, this assumption does not hold, as it is known that the sensitivity of the Kato-Katz thick smear (and most other parasitological diagnostic methods) is positively related to infection intensity and therefore differs between individuals [15], as well as between the lower and upper boundary of the grey zone around a decision threshold.
We believe that relaxing this assumption will impact the survey design, the sample size and the corresponding decision cut-off for any program prevalence thresholds. Therefore, future research should prioritize the expansion of the framework allowing for varying sensitivity as a function of egg counts and day-to-day variation in egg excretion. This expansion would allow for more insights into the impact of increased sampling (1 vs. 2 stool samples) and diagnostic effort (single vs. duplicate Kato-Katz thick smear) on the survey design, which in turn would provide a framework to make more evidence-based and cost-efficient recommendations.

Conclusion

Our framework allows for the assessment and updating of M&E guidelines for NTD and for product development choices. Using STH as a case study, we show that current M&E guidelines may severely fall short, especially in low-endemic and post-control settings. Furthermore, specificity rather than sensitivity becomes the critical parameter to consider, and sampling more clusters (≥10) may be necessary in case of considerable heterogeneity in the geographical distribution of NTD infections.

Supporting information

S1 Table. Parameterisation of the variables used to illustrate the 2-stage LQAS framework for STH control programs.

T: program decision prevalence threshold, LL: lower limit of the grey zone, Eovertreat: highest allowed probability of falsely continuing or upscaling an intervention within an implementation unit i, Eundertreat: highest allowed probability of prematurely stopping or scaling down interventions within an implementation unit i; ρi: intra-cluster correlation; sed: sensitivity of an imperfect diagnostic test D; spd: specificity of an imperfect diagnostic test D; nclust: number of clusters; nsub: number of subjects per cluster. _: output variable and hence not fixed on one value. *: The number of subjects per cluster was determined as the minimum required number of subjects that allow for adequate program decision-making (Eovertreat = 25%, Eundertreat = 5%) around a 2% program prevalence threshold. A total of 10,000 Monte Carlo simulations was used.

https://doi.org/10.1371/journal.pntd.0010353.s001

(DOCX)

S2 Table. Summary of diagnostic methods used to illustrate the 2-stage LQAS framework for STH control programs. TPPs: target product profiles.

https://doi.org/10.1371/journal.pntd.0010353.s002

(DOCX)

S1 Fig. The relationship between the risk of under- and overtreatment when deploying a perfect diagnostic test.

εundertreat: probability of prematurely reducing interventions within an implementation unit i; εovertreat: probability of falsely continuing or upscaling an intervention frequency within an implementation unit i. The blue area indicates the combinations of the risks of under- and overtreatment that allow for adequate decision-making (εundertreat ≤ 5%, εovertreat ≤ 25%). We defined the grey zone as T ± 50% (T = 2%: lower limit = 1%, upper limit = 3%). The intra-cluster correlation ρi was set at 0.02, the number of clusters at 5, and the number of subjects per cluster at 50. The graph is based on 10,000 Monte Carlo simulations.

https://doi.org/10.1371/journal.pntd.0010353.s003

(TIF)
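The Monte Carlo setup behind S1 Fig can be sketched as follows: cluster-level prevalences are drawn around the district-level prevalence with the stated intra-cluster correlation (here via a beta-binomial parameterisation, an assumption for illustration), subjects are tested with a diagnostic of given sensitivity and specificity (both set to 1 for the perfect test of S1 Fig), and the overtreatment risk is the fraction of simulated surveys at the lower grey-zone limit (1%) whose observed prevalence reaches the decision cut-off. The cut-off here is illustratively set equal to the 2% threshold; in the framework it is chosen so that the error constraints are met.

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate_survey(true_prev, rho, n_clust, n_sub, se, sp, n_sim=10_000):
    """Simulate 2-stage cluster surveys. Cluster-level true prevalences
    follow a beta distribution with mean true_prev and intra-cluster
    correlation rho; each sampled subject tests positive according to a
    diagnostic with sensitivity se and specificity sp. Returns the
    observed (test-based) prevalence of each simulated survey."""
    s = 1.0 / rho - 1.0  # beta parameters from mean and ICC: rho = 1 / (a + b + 1)
    p_clust = rng.beta(true_prev * s, (1.0 - true_prev) * s,
                       size=(n_sim, n_clust))
    # Probability that a sampled subject tests positive (true + false positives)
    p_pos = se * p_clust + (1.0 - sp) * (1.0 - p_clust)
    positives = rng.binomial(n_sub, p_pos)
    return positives.sum(axis=1) / (n_clust * n_sub)

# S1 Fig setting: perfect test, rho = 0.02, 5 clusters of 50 subjects,
# district prevalence at the lower grey-zone limit (1%)
obs = simulate_survey(true_prev=0.01, rho=0.02, n_clust=5, n_sub=50,
                      se=1.0, sp=1.0)
print("P(overtreat) =", (obs >= 0.02).mean())
```

Lowering specificity in this sketch inflates the observed prevalence at low endemicity, which is why the framework treats specificity as the critical parameter around the 2% threshold.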

References

  1. Engels D, Zhou X-N. Neglected tropical diseases: an effective global response to local poverty-related disease priorities. Infect Dis Poverty. 2020;9(1):1–9. pmid:31996251
  2. Bodimeade C, Marks M, Mabey D. Neglected tropical diseases: elimination and eradication. Clin Med (Northfield Il). 2019;19(2):157. pmid:30872302
  3. Reed SL, McKerrow JH. Why funding for neglected tropical diseases should be a global priority. Clin Infect Dis. 2018;67(3):323–6. pmid:29688342
  4. World Health Organization. Ending the neglect to attain the Sustainable Development Goals: A road map for neglected tropical diseases 2021–2030 [Internet]. World Health Organization. 2021 [cited 2022 Feb 10]. Available from: https://www.who.int/publications/i/item/9789240010352
  5. Gass K. Time for a diagnostic sea-change: Rethinking neglected tropical disease diagnostics to achieve elimination. PLoS Negl Trop Dis. 2020;14(12):e0008933. pmid:33382694
  6. Levecke B, Coffeng LE, Hanna C, Pullan RL, Gass KM. Assessment of the required performance and the development of corresponding program decision rules for neglected tropical diseases diagnostic tests: Monitoring and evaluation of soil-transmitted helminthiasis control programs as a case study. PLoS Negl Trop Dis. 2021;15(9):e0009740. pmid:34520474
  7. Hund L, Pagano M. Extending cluster lot quality assurance sampling designs for surveillance programs. Stat Med. 2014;33(16):2746–57. pmid:24633656
  8. Coffeng LE, Le Rutte EA, Munoz J, Adams E, de Vlas SJ. Antibody and antigen prevalence as indicators of ongoing transmission or elimination of visceral leishmaniasis: a modelling study. Clin Infect Dis. 2021;(ciab210). pmid:33906229
  9. Farrell SH, Coffeng LE, Truscott JE, Werkman M, Toor J, De Vlas SJ, et al. Investigating the effectiveness of current and modified world health organization guidelines for the control of soil-transmitted helminth infections. Clin Infect Dis. 2018;66(Suppl 4):S253–9. pmid:29860285
  10. Giardina F, Coffeng LE, Farrell SH, Vegvari C, Werkman M, Truscott JE, et al. Sampling strategies for monitoring and evaluation of morbidity targets for soil-transmitted helminths. PLoS Negl Trop Dis. 2019;13(6):e0007514. pmid:31242194
  11. Coffeng LE, Stolk WA, Golden A, de Los Santos T, Domingo GJ, de Vlas SJ. Predictive value of Ov16 antibody prevalence in different subpopulations for elimination of African onchocerciasis. Am J Epidemiol. 2019;188(9):1723–32. pmid:31062838
  12. Gyorkos TW, Maheu-Giroux M, Blouin B, Casapia M. Impact of health education on soil-transmitted helminth infections in schoolchildren of the Peruvian Amazon: a cluster-randomized controlled trial. PLoS Negl Trop Dis. 2013;7(9):e2397. pmid:24069469
  13. Cools P, Vlaminck J, Albonico M, Ame S, Ayana M, José Antonio BP, et al. Diagnostic performance of a single and duplicate Kato-Katz, Mini-FLOTAC, FECPAKG2 and qPCR for the detection and quantification of soil-transmitted helminths in three endemic countries. PLoS Negl Trop Dis. 2019;13(8):e0007446. pmid:31369558
  14. Khurana S, Sethi S. Laboratory diagnosis of soil transmitted helminthiasis. Trop Parasitol. 2017;7(2):86. pmid:29114485
  15. Bärenbold O, Garba A, Colley DG, Fleming FM, Assaré RK, Tukahebwa EM, et al. Estimating true prevalence of Schistosoma mansoni from population summary measures based on the Kato-Katz diagnostic technique. PLoS Negl Trop Dis. 2021;15(4):e0009310. pmid:33819266
  16. Coffeng LE, Vlaminck J, Cools P, Albonico M, Ame S, Ayana M, et al. A general framework to make cost-efficient choices about fecal egg count methods and study designs to inform large-scale STH deworming programs – monitoring of therapeutic drug efficacy as a case study. Manuscript in preparation. 2021.
  17. WHO. Diagnostic target product profile for monitoring and evaluation of soil-transmitted helminth control programmes [Internet]. [cited 2021 Aug 16]. Available from: https://www.who.int/publications/i/item/9789240031227
  18. Tarafder MR, Carabin H, Joseph L, Balolong E Jr, Olveda R, McGarvey ST. Estimating the sensitivity and specificity of Kato-Katz stool examination technique for detection of hookworms, Ascaris lumbricoides and Trichuris trichiura infections in humans in the absence of a ‘gold standard.’ Int J Parasitol. 2010;40(4):399–404. pmid:19772859
  19. Speich B, Ali SM, Ame SM, Albonico M, Utzinger J, Keiser J. Quality control in the diagnosis of Trichuris trichiura and Ascaris lumbricoides using the Kato-Katz technique: experience from three randomised controlled trials. Parasit Vectors. 2015;8(1):1–8.
  20. Knopp S, Salim N, Schindler T, Voules DAK, Rothen J, Lweno O, et al. Diagnostic accuracy of Kato–Katz, FLOTAC, Baermann, and PCR methods for the detection of light-intensity hookworm and Strongyloides stercoralis infections in Tanzania. Am J Trop Med Hyg. 2014;90(3):535. pmid:24445211
  21. Coulibaly JT, Ouattara M, Becker SL, Lo NC, Keiser J, N’Goran EK, et al. Comparison of sensitivity and faecal egg counts of Mini-FLOTAC using fixed stool samples and Kato-Katz technique for the diagnosis of Schistosoma mansoni and soil-transmitted helminths. Acta Trop. 2016;164:107–16. pmid:27591137
  22. Nikolay B, Brooker SJ, Pullan RL. Sensitivity of diagnostic tests for human soil-transmitted helminth infections: a meta-analysis in the absence of a true gold standard. Int J Parasitol. 2014;44(11):765–74. pmid:24992655
  23. Werkman M, Wright JE, Truscott JE, Easton AV, Oliveira RG, Toor J, et al. Testing for soil-transmitted helminth transmission elimination: Analysing the impact of the sensitivity of different diagnostic tools. PLoS Negl Trop Dis. 2018;12(1):e0006114. pmid:29346366
  24. Freeman MC, Akogun O, Belizario V Jr, Brooker SJ, Gyorkos TW, Imtiaz R, et al. Challenges and opportunities for control and elimination of soil-transmitted helminth infection beyond 2020. PLoS Negl Trop Dis. 2019;13(4):e0007201. pmid:30973872
  25. WHO. Onchocerciasis: diagnostic target product profile to support preventive chemotherapy [Internet]. [cited 2021 Aug 16]. Available from: https://www.who.int/publications/i/item/9789240024496
  26. WHO. Diagnostic test for surveillance of lymphatic filariasis: target product profile [Internet]. [cited 2021 Aug 16]. Available from: https://www.who.int/publications/i/item/9789240018648
  27. WHO. Diagnostic target product profiles for monitoring, evaluation and surveillance of schistosomiasis control programmes [Internet]. [cited 2021 Oct 11]. Available from: https://www.who.int/publications/i/item/9789240031104
  28. Niguse AF, Hailu T, Alemu M, Nibret E, Amor A, Munshea A. Evaluating the performance of diagnostic methods for soil transmitted helminths in the Amhara National Regional State, Northwest Ethiopia. BMC Infect Dis. 2020;
  29. Dunn JC, Papaiakovou M, Han KT, Chooneea D, Bettis AA, Wyine NY, et al. The increased sensitivity of qPCR in comparison to Kato-Katz is required for the accurate assessment of the prevalence of soil-transmitted helminth infection in settings that have received multiple rounds of mass drug administration. Parasit Vectors. 2020;13(1):1–11. pmid:31900233
  30. Vlaminck J, Cools P, Albonico M, Ame S, Ayana M, Dana D, et al. An in-depth report of quality control on Kato-Katz and data entry in four clinical trials evaluating the efficacy of albendazole against soil-transmitted helminth infections. PLoS Negl Trop Dis. 2020;14(9):e0008625. pmid:32956390
  31. Cools P, van Lieshout L, Koelewijn R, Addiss D, Ajjampur SSR, Ayana M, et al. First international external quality assessment scheme of nucleic acid amplification tests for the detection of Schistosoma and soil-transmitted helminths, including Strongyloides: A pilot study. PLoS Negl Trop Dis. 2020;14(6):e0008231. pmid:32544158
  32. World Health Organization. 2030 targets for soil-transmitted helminthiases control programmes [Internet]. World Health Organization; 2020 [cited 2022 Mar 15]. Available from: https://apps.who.int/iris/handle/10665/330611