
Strengthening data collection for neglected tropical diseases: What data are needed for models to better inform tailored intervention programmes?

  • Jaspreet Toor
  • Jonathan I. D. Hamley
  • Claudio Fronterre
  • María Soledad Castaño
  • Lloyd A. C. Chapman
  • Luc E. Coffeng
  • Federica Giardina
  • Thomas M. Lietman
  • Edwin Michael
  • Amy Pinsent


Locally tailored interventions for neglected tropical diseases (NTDs) are becoming increasingly important for ensuring that the World Health Organization (WHO) goals for control and elimination are reached. Mathematical models, such as those developed by the NTD Modelling Consortium, are able to offer recommendations on interventions but remain constrained by the data currently available. Data collection for NTDs needs to be strengthened as better data are required to indirectly inform transmission in an area. Addressing specific data needs will improve our modelling recommendations, enabling more accurate tailoring of interventions and assessment of their progress. In this collection, we discuss the data needs for several NTDs, specifically gambiense human African trypanosomiasis, lymphatic filariasis, onchocerciasis, schistosomiasis, soil-transmitted helminths (STH), trachoma, and visceral leishmaniasis. Similarities in the data needs for these NTDs highlight the potential for integration across these diseases and where possible, a wider spectrum of diseases.


The neglected tropical diseases (NTDs) are a diverse group of communicable diseases identified by the World Health Organization (WHO) which predominantly affect populations living in poverty, leading to increased morbidity and mortality [1]. In 2012, the WHO Roadmap on NTDs was developed to accelerate efforts towards elimination and control, whereby the diseases are no longer considered public health problems [1]. Disease-specific goals have been defined and set by WHO to be reached by 2020, with new Roadmap targets drafted for 2021 to 2030 [2]. High-quality data are needed to track progress towards the new WHO NTD Roadmap, but data challenges remain [3]. Furthermore, WHO recognises that monitoring and evaluation (M&E) for all NTDs is weak in many countries and that the capacity for data collection should be prioritised and strengthened [2].

Moving forward, it is clear that data collection and evaluation need to be strengthened to support decision-making. Mathematical models, such as those developed and investigated by the NTD Modelling Consortium [4–6], have an important role in evaluating current data and determining remaining data gaps. These models have recently been recognised by WHO for informing strategies against NTDs [7,8].

To inform the discussion on expanding data collection, we have performed focused analyses on priority data needs for 7 NTDs (gambiense human African trypanosomiasis, lymphatic filariasis, onchocerciasis, schistosomiasis, soil-transmitted helminths (STH), trachoma, and visceral leishmaniasis in the Indian subcontinent) in a special NTD Modelling Consortium collection of papers in PLOS Neglected Tropical Diseases [9]. Here, we summarise the key data requirements raised within that collection. The analyses address 2 main issues: first, the M&E data needed to better inform tailoring of programmes, and second, the key epidemiological uncertainties that are crucial for understanding the dynamics of these diseases in response to interventions and for planning towards WHO control or elimination goals.

Although this collection was written prior to the current Coronavirus Disease 2019 (COVID-19) pandemic which has postponed many NTD-related activities [10], upon their resumption, there is an opportunity to collect data which could be used to better tailor programmes, ensuring and, in some cases, accelerating progress towards WHO 2030 targets [11].

Indirectly estimating transmission

To reach WHO goals by 2030, tailoring of intervention programmes is becoming increasingly important, particularly as many of the NTDs face programmatic constraints (Table 1). Measures of transmission in an area are required to inform model-based recommendations for tailored interventions, i.e., the frequency, coverage, and duration of interventions required. However, as disease transmission cannot be directly measured, it must be estimated indirectly from data collected in the field. In most areas, local tailoring of interventions requires more information on local transmission than current surveillance delivers.

Table 1. Overview of the 7 NTDs analysed in the NTD Modelling Consortium collection [9].

Mathematical models have the potential to offer recommendations for locally tailored interventions but remain constrained by the data currently available. Better data will improve the quality of models and modelling recommendations in numerous ways, such as informing model parameters and assumptions, reducing uncertainty and verifying projections, thereby enabling more accurate tailoring of interventions and assessment of their progress. There are many ways to improve data collection activities to gain more information about transmission (summarised in Fig 1 and Tables 2 and 3).

Fig 1. Key data required to indirectly inform transmission which feeds into and improves modelling projections allowing for better assessment and tailoring of interventions.

WASH, water, sanitation, and hygiene.

Table 3. Summary of epidemiological data needs for 4 NTDs.

Improving monitoring and evaluation

To improve the outcomes and impact of NTD interventions, M&E activities are carried out to enhance performance and measure results [2]. A vital aspect of M&E is collecting data which can be used to assess whether interventions are on track for achieving WHO goals. To assess this and to determine areas where interventions need to be modified (e.g., intensified because they are not on track, or relaxed because of overtreatment or limited resources), more information about the interventions being implemented is needed. This includes data on the population that has been targeted, the timing and frequency of interventions, and, for mass drug administration (MDA) programmes, the coverage and adherence during each round of MDA (Fig 1).
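As a rough illustration of why adherence is worth recording alongside coverage, the sketch below compares two hypothetical extremes for the same reported coverage: people being missed at random in each MDA round, and the same people being missed in every round. The function name and the two adherence patterns are assumptions made for illustration only; they are not taken from the NTD Modelling Consortium models, and real adherence is likely to lie somewhere between the two.

```python
def never_treated_fraction(coverage: float, rounds: int, systematic: bool) -> float:
    """Fraction of the target population missed in every MDA round.

    Hypothetical extremes of adherence (an illustrative assumption):
    - systematic=False: each person is treated independently in each
      round with probability `coverage`.
    - systematic=True: the same people are missed in every round.
    """
    if not 0.0 <= coverage <= 1.0:
        raise ValueError("coverage must lie between 0 and 1")
    if rounds < 1:
        raise ValueError("rounds must be at least 1")
    if systematic:
        # The untreated group never changes, so its size stays at 1 - coverage.
        return 1.0 - coverage
    # Independent rounds: a person is missed every time with
    # probability (1 - coverage) ** rounds.
    return (1.0 - coverage) ** rounds


if __name__ == "__main__":
    for systematic in (False, True):
        frac = never_treated_fraction(coverage=0.75, rounds=5, systematic=systematic)
        label = "systematic" if systematic else "random"
        print(f"{label} non-adherence: {frac:.4f} never treated")
```

Under these assumptions, the same coverage figure is compatible with very different sizes of the never-treated group, which is why coverage alone cannot distinguish between the two situations.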

M&E data can be used to determine the optimal treatment strategy (i.e., frequency, coverage, and duration) required in a particular location (Table 2 and Fig 1). To determine the specific age groups that need to be targeted in a given area, data are required to inform the age profile of infection [13,16,21].

To assess how infection levels are impacted following a round of treatment, and to validate model projections, data collected at multiple time points, particularly pre- and posttreatment, are informative [13,16,19]. Furthermore, for diseases where the effectiveness of passive case detection is being assessed, such as gambiense human African trypanosomiasis, data on the stage of the disease are needed [12]. Where possible, collecting data at multiple time points within randomised controlled trials can provide greater insight into the impact attributable to an intervention.

It is important to note that reality cannot be perfectly observed, but collecting better data and using statistical tools will improve our understanding of the underlying biological processes of interest and allow us to take these limitations into account. Diagnostic test performance adds to the complexity of prevalence measures (Table 2). Additionally, as these diseases vary geographically, the prevalence is characterised, to various extents, by spatial heterogeneity. For example, for STH, sampling multiple villages/schools per implementation unit improves the accuracy in assessing progress towards targets [17]. Furthermore, spatial correlation can be used to optimise survey designs and improve the accuracy of predictive risk maps [25]. However, geostatistical models for disease prevalence rely strongly on the quality of the underlying data, especially on the reliability of the geographical coordinates of the survey locations [26]. Inaccuracies or incompleteness in this essential information reduce the quality of model outputs.
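One simple way to see how diagnostic test performance affects prevalence measures is the Rogan-Gladen correction, a standard adjustment of observed prevalence for imperfect sensitivity and specificity. The sketch below is illustrative only: the function name and the sensitivity and specificity values are hypothetical, not estimates for any particular NTD diagnostic, and the collection's analyses may handle diagnostic uncertainty differently.

```python
def rogan_gladen(apparent_prevalence: float, sensitivity: float, specificity: float) -> float:
    """Adjust an observed (apparent) prevalence for imperfect test performance.

    Uses the Rogan-Gladen estimator:
        true = (apparent + specificity - 1) / (sensitivity + specificity - 1)
    The result is clamped to [0, 1], because sampling noise can push the
    raw estimate outside that range.
    """
    denominator = sensitivity + specificity - 1.0
    if denominator <= 0.0:
        raise ValueError("test must perform better than chance (se + sp > 1)")
    adjusted = (apparent_prevalence + specificity - 1.0) / denominator
    return min(1.0, max(0.0, adjusted))


if __name__ == "__main__":
    # Hypothetical test: 80% sensitivity, 98% specificity.
    for observed in (0.01, 0.03, 0.20):
        estimate = rogan_gladen(observed, sensitivity=0.80, specificity=0.98)
        print(f"observed {observed:.2f} -> adjusted {estimate:.3f}")
```

At low prevalence, the adjusted estimate becomes very sensitive to the assumed specificity, which is one reason why diagnostic performance matters more as programmes approach elimination targets.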

Uncertain epidemiology—Learning more

As these diseases are neglected, and often characterised by complicated parasite life cycles, there is limited knowledge on their epidemiology and the population biology of the parasites causing them. Modelling insights remain limited by the lack of epidemiological and field data available [5]. Consequently, modelling assumptions have to be made resulting in uncertainty in model recommendations. There are key areas of uncertainty where epidemiological data are required for improving our understanding of the dynamics and model parameterisation, in order to improve the robustness of model insights (Table 3 and Fig 1). Although some parameters may never be estimable, there may be testable hypotheses which could inform our understanding of epidemiology.

The persistence of transmission when infection levels have been reduced through interventions is crucially dependent on heterogeneities in exposure, immunological processes, parasite aggregation, and ultimately transmission. These are very difficult to measure, even in epidemiological studies, but may be essential for achieving the long-term goals of NTD programmes. For vector-borne diseases, such as onchocerciasis and visceral leishmaniasis, human/vector mixing patterns play a role in local transmission dynamics. Hence, data on these patterns can reveal the degree of spatial clustering, assortative (nonhomogeneous) mixing and exposure heterogeneity allowing for improved prediction of village-level incidence and guidelines on spatially targeted interventions [14,15,22,27]. Additionally, for visceral leishmaniasis, data on immune responses and infection combined with presence or absence of symptoms can inform the duration of immunity and identify markers for infection [23,28]. Note that we focus on visceral leishmaniasis in the Indian subcontinent as it is believed to be entirely anthroponotic only there (i.e., humans are the only reservoir of infection) [22].
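To illustrate why parasite aggregation matters when interpreting prevalence data for helminth infections, the minimal sketch below assumes that worm burdens among hosts follow a negative binomial distribution with mean burden M and aggregation parameter k, a common assumption in helminth transmission models. Under that assumption, the expected prevalence is 1 - (1 + M/k)^(-k). The function name and the parameter values are hypothetical and chosen only for illustration.

```python
def prevalence_from_mean_burden(mean_burden: float, k: float) -> float:
    """Expected prevalence of infection under a negative binomial
    distribution of parasites among hosts (an illustrative assumption).

    mean_burden: average number of parasites per host (M >= 0)
    k: aggregation parameter (k > 0); smaller k means that parasites
       are concentrated in fewer hosts.
    """
    if mean_burden < 0.0:
        raise ValueError("mean_burden must be non-negative")
    if k <= 0.0:
        raise ValueError("k must be positive")
    # Probability that a host carries zero parasites is (1 + M/k) ** (-k).
    return 1.0 - (1.0 + mean_burden / k) ** (-k)


if __name__ == "__main__":
    # Same mean burden, different degrees of aggregation.
    for k in (0.05, 0.5, 5.0):
        prev = prevalence_from_mean_burden(mean_burden=2.0, k=k)
        print(f"k = {k}: expected prevalence {prev:.3f}")
```

Under this assumption, the same mean burden corresponds to a lower prevalence when parasites are more aggregated, so prevalence data alone cannot pin down both quantities; this is one example of why additional epidemiological data are needed.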

Water, sanitation, and hygiene (WASH) interventions have played a role across many of the NTDs. However, the value of WASH has been difficult to analyse, with reviews based on current evidence showing contrasting effects [29–31]. To better understand and predict the added value of WASH, detailed data on WASH-related behaviour are required, although these could be difficult to collect [18] (Table 3).

Better data but at what cost?

It is important to take into account that although there are great benefits to better data, data collection is typically limited due to various financial and programmatic constraints. Key constraints associated with obtaining data are summarised in Tables 2 and 3 and Fig 2.

Fig 2. Programmatic constraints associated with obtaining the required M&E and epidemiological data.

M&E, monitoring and evaluation; WASH, water, sanitation, and hygiene.

Although collecting the required data is likely to be more costly, it may be more cost-effective in the long term as it will allow for more effective decision-making. Hence, rather than a cost, this could be viewed as an investment. For example, for schistosomiasis, new diagnostic techniques may have a higher cost per test, but this may be outweighed by the long-term programmatic benefits, including being able to detect elimination and resurgence [32]. Furthermore, given the similarities in data needs across these diseases, integration of data collection activities across multiple NTDs could potentially reduce the total costs.

Data curation, integration, and availability

There are a variety of challenges surrounding the quality of current data. For example, data collected on paper require manual entry into databases, which is time-consuming and increases the risk of errors. Other challenges include partial reporting, whereby only a portion or summary of the data collected is made available, and the absence of standardised and consistent reporting both within and between countries at different time points, which makes data integration difficult and often results in a loss of data. Hence, better data refers not only to collecting a greater quantity of data but also to improving the quality of the data and data reporting protocols. For the NTD Modelling Consortium and for the wider scientific community, data curation, integration, and availability are key. Standardising and curating data and making them publicly available would ensure that they can be utilised by the scientific community. Electronic data collection tools are paving the way forward for addressing some of these challenges [33–36]. Alongside this, the Findability, Accessibility, Interoperability, and Reusability (FAIR) data principles have been designed to improve scientific data management and stewardship [37]. Publishing the models and outputs in a reproducible way is also important for driving forward progress on NTDs.


Better M&E and epidemiological data will improve our understanding of these NTDs by leading to more informed parameter values, validated model structures, and reduced uncertainty, thereby improving the reliability of assessments of intervention programmes and modelling recommendations for tailored interventions. On the one hand, more accurate models may give us greater confidence in whether the goal of an intervention strategy will be met. On the other, they might allow us to better assess the robustness of M&E strategies, which aim to verify whether a goal has been met, after an intervention has been implemented.

Further work is needed to encourage opportunities for the integration of data collection activities across the NTDs and where possible, a wider spectrum of diseases. Additionally, once NTD programmes are able to resume following the current disruption due to COVID-19, potential synergies between the COVID-19 control efforts and NTD programmes will be important to consider [10,11,38]. Moving forward, as transmission declines and programmes become more tailored, such opportunities will be important as data needs will continue to grow.


We are grateful to all of the NTD Modelling Consortium members and our external collaborators for contributing to this collection. We thank Hugo Turner for helpful comments on this viewpoint. Additionally, we thank Andreia Vasconcelos for overseeing the development of this viewpoint.


  1. World Health Organization. Accelerating work to overcome the global impact of neglected tropical diseases: a roadmap for implementation. 2012 [cited 2021 Jan 30]. Available from:
  2. World Health Organization. Ending the neglect to attain the Sustainable Development Goals: a road map for neglected tropical diseases 2021–2030. 2020 [cited 2021 Jan 30]. Available from:
  3. Malecela MN. Reflections on the decade of the neglected tropical diseases. Int Health. 2019. pmid:31529110
  4. Hollingsworth TD, Adams ER, Anderson RM, Atkins K, Bartsch S, Basáñez MG, et al. Quantitative analyses and modelling to support achievement of the 2020 goals for nine neglected tropical diseases. Parasit Vectors. 2015. pmid:26652272
  5. Hollingsworth TD. Counting Down the 2020 Goals for 9 Neglected Tropical Diseases: What Have We Learned From Quantitative Analysis and Transmission Modeling? Clin Infect Dis. 2018. pmid:29860293
  6. Hollingsworth TD, Medley GF. Learning from multi-model comparisons: Collaboration leads to insights, but limitations remain. Epidemics. 2017. pmid:28279450
  7. Gates Open Research. 2030 goals for neglected tropical diseases. 2020 [cited 2020 June 11]. Available from:
  8. World Health Organization. Modelling study widens viewpoints for new roadmap for neglected tropical diseases. 2019 [cited 2020 June 11]. Available from:
  9. PLOS Collections. NTD Modelling Consortium: Insights on data needs. 2019 [cited 2020 May 28]. Available from:
  10. World Health Organization. COVID-19: WHO issues interim guidance for implementation of NTD programmes. 2020 [cited 2020 June 11]. Available from:
  11. Toor J, Adams ER, Aliee M, Amoah B, Anderson RM, Ayabina D, et al. Predicted impact of COVID-19 on neglected tropical disease programmes and the opportunity for innovation. Clin Infect Dis. 2020. pmid:32984870
  12. Castano MS, Ndeffo-Mbah ML, Rock KS, Palmer C, Knock E, Miaka EM, et al. Assessing the impact of aggregating disease stage data in model predictions of human African trypanosomiasis transmission and control activities in Bandundu province (DRC). PLoS Negl Trop Dis. 2020. pmid:31961872
  13. Michael E, Sharma S, Smith ME, Touloupou P, Giardina F, Prada JM, et al. Quantifying the value of surveillance data for improving model predictions of lymphatic filariasis elimination. PLoS Negl Trop Dis. 2018. pmid:30296266
  14. de Vos AS, Stolk WA, de Vlas SJ, Coffeng LE. The effect of assortative mixing on stability of low helminth transmission levels and on the impact of mass drug administration: Model explorations for onchocerciasis. PLoS Negl Trop Dis. 2018. pmid:30296264
  15. Hamley JID, Milton P, Walker M, Basáñez MG. Modelling exposure heterogeneity and density dependence in onchocerciasis using a novel individual-based transmission model, EPIONCHO-IBM: Implications for elimination and data needs. PLoS Negl Trop Dis. 2019. pmid:31805049
  16. Toor J, Turner HC, Truscott JE, Werkman M, Phillips AE, Alsallaq R, et al. The design of schistosomiasis monitoring and evaluation programmes: The importance of collecting adult data to inform treatment strategies for Schistosoma mansoni. PLoS Negl Trop Dis. 2018. pmid:30296257
  17. Giardina F, Coffeng LE, Farrell SH, Vegvari C, Werkman M, Truscott JE, et al. Sampling strategies for monitoring and evaluation of morbidity targets for soil-transmitted helminths. PLoS Negl Trop Dis. 2019. pmid:31242194
  18. Coffeng LE, Nery SV, Gray DJ, Bakker R, de Vlas SJ, Clements ACA. Predicted short and long-term impact of deworming and water, hygiene, and sanitation on transmission of soil-transmitted helminths. PLoS Negl Trop Dis. 2018. pmid:30522129
  19. Pinsent A, Hollingsworth TD. Optimising sampling regimes and data collection to inform surveillance for trachoma control. PLoS Negl Trop Dis. 2018. pmid:30307939
  20. Lietman TM, Deiner MS, Oldenburg CE, Nash SD, Keenan JD, Porco TC. Identifying a sufficient core group for trachoma transmission. PLoS Negl Trop Dis. 2018. pmid:30296259
  21. Chapman LAC, Morgan ALK, Adams ER, Bern C, Medley GF, Hollingsworth TD. Age trends in asymptomatic and symptomatic Leishmania donovani infection in the Indian subcontinent: A review and analysis of data from diagnostic and epidemiological studies. PLoS Negl Trop Dis. 2018. pmid:30521526
  22. Chapman LAC, Jewell CP, Spencer SEF, Pellis L, Datta S, Chowdhury R, et al. The role of case proximity in transmission of visceral leishmaniasis in a highly endemic village in Bangladesh. PLoS Negl Trop Dis. 2018. pmid:30296295
  23. Bulstra CA, Le Rutte EA, Malaviya P, Hasker EC, Coffeng LE, Picado A, et al. Visceral leishmaniasis: Spatiotemporal heterogeneity and drivers underlying the hotspots in Muzaffarpur, Bihar, India. PLoS Negl Trop Dis. 2018. pmid:30521529
  24. Singh OP, Tiwary P, Kushwaha AK, Singh SK, Singh DK, Lawyer P, et al. Xenodiagnosis to evaluate the infectiousness of humans to sandflies in an area endemic for visceral leishmaniasis in Bihar, India: a transmission-dynamics study. Lancet. 2021. (20)30166–X
  25. Fronterre C, Amoah B, Giorgi E, Stanton MC, Diggle PJ. Design and Analysis of Elimination Surveys for Neglected Tropical Diseases. J Infect Dis. 2020. pmid:31930383
  26. Jacquez GM. A research agenda: does geocoding positional error matter in health GIS studies? Spat Spatiotemporal Epidemiol. 2012. pmid:22469487
  27. Chapman LAC, Spencer SEF, Pollington TM, Jewell CP, Mondal D, Alvar J, et al. Inferring transmission trees to guide targeting of interventions against visceral leishmaniasis and post–kala-azar dermal leishmaniasis. Proc Natl Acad Sci U S A. 2020. pmid:32973088
  28. Le Rutte EA, Zijlstra EE, de Vlas SJ. Post-Kala-Azar Dermal Leishmaniasis as a Reservoir for Visceral Leishmaniasis Transmission. Trends Parasitol. 2019. pmid:31266711
  29. Stocks ME, Ogden S, Haddad D, Addiss DG, McGuire C, Freeman MC. Effect of water, sanitation, and hygiene on the prevention of trachoma: a systematic review and meta-analysis. PLoS Med. 2014. pmid:24586120
  30. Vaz Nery S, Pickering AJ, Abate E, Asmare A, Barrett L, Benjamin-Chung J, et al. The role of water, sanitation and hygiene interventions in reducing soil-transmitted helminths: interpreting the evidence and identifying next steps. Parasit Vectors. 2019. pmid:31138266
  31. Ejere HOD, Alhassan MB, Rabiu M. Face washing promotion for preventing active trachoma. Cochrane Database Syst Rev. 2015. pmid:25697765
  32. Turner HC, Bettis AA, Dunn JC, Whitton JM, Hollingsworth TD, Fleming FM, et al. Economic Considerations for Moving beyond the Kato-Katz Technique for Diagnosing Intestinal Parasites As We Move Towards Elimination. Trends Parasitol. 2017. pmid:28187989
  33. World Health Organization Expanded Special Project for Elimination of Neglected Tropical Diseases. ESPEN Collect. 2021 [cited 2021 Jan 30]. Available from:
  34. Open Data Kit. Open Data Kit. 2020 [cited 2021 Jan 30]. Available from:
  35. Epicollect5. Free and easy-to-use mobile data-gathering platform. 2021 [cited 2021 Jan 30]. Available from:
  36. Tropical Data. 2021 [cited 2021 Mar 22]. Available from:
  37. Wilkinson M, Dumontier M, Aalbersberg IJ, Appleton G, Axton M, Baak A, et al. The FAIR Guiding Principles for scientific data management and stewardship. Sci Data. 2016. pmid:26978244
  38. Ehrenberg JP, Zhou XN, Fontes G, Rocha EMM, Tanner M, Utzinger J. Strategies supporting the prevention and control of neglected tropical diseases during and beyond the COVID-19 pandemic. Infect Dis Poverty. 2020. pmid:32646512