Strengthening data collection for neglected tropical diseases: What data are needed for models to better inform tailored intervention programmes?

Locally tailored interventions for neglected tropical diseases (NTDs) are becoming increasingly important for ensuring that the World Health Organization (WHO) goals for control and elimination are reached. Mathematical models, such as those developed by the NTD Modelling Consortium, are able to offer recommendations on interventions but remain constrained by the data currently available. Data collection for NTDs needs to be strengthened, as better data are required to indirectly inform transmission in an area. Addressing specific data needs will improve our modelling recommendations, enabling more accurate tailoring of interventions and assessment of their progress. In this collection, we discuss the data needs for several NTDs, specifically gambiense human African trypanosomiasis, lymphatic filariasis, onchocerciasis, schistosomiasis, soil-transmitted helminths (STH), trachoma, and visceral leishmaniasis. Similarities in the data needs for these NTDs highlight the potential for integration across these diseases and, where possible, a wider spectrum of diseases.


Introduction
The neglected tropical diseases (NTDs) are a diverse group of communicable diseases identified by the World Health Organization (WHO) which predominantly affect populations living in poverty, leading to increased morbidity and mortality [1]. In 2012, the WHO Roadmap on NTDs was developed to accelerate efforts towards elimination and control, whereby the diseases are no longer considered public health problems [1]. WHO defined disease-specific goals to be reached by 2020, and new Roadmap targets have been drafted for 2021 to 2030 [2]. High-quality data are needed to track progress towards the new WHO NTD Roadmap, but data challenges remain [3]. Furthermore, WHO recognises that monitoring and evaluation (M&E) for all NTDs is weak in many countries and that the capacity for data collection should be prioritised and strengthened [2].
Moving forward, it is clear that there is a need to strengthen data collection and evaluation for decision-making. Mathematical models, such as those developed and investigated by the NTD Modelling Consortium [4][5][6], have an important role in evaluating current data and determining remaining data gaps. These models have recently been recognised by WHO for providing information to inform strategies against NTDs [7,8].
To inform the discussion on expanding data collection, we have performed focused analyses of priority data needs for 7 NTDs (gambiense human African trypanosomiasis, lymphatic filariasis, onchocerciasis, schistosomiasis, soil-transmitted helminths (STH), trachoma, and visceral leishmaniasis in the Indian subcontinent), published as a special NTD Modelling Consortium collection in PLOS Neglected Tropical Diseases [9]. Here, we summarise the key data requirements raised within that collection. The analyses address 2 main issues: firstly, M&E needs for better informing the tailoring of programmes and, secondly, key epidemiological uncertainties that are crucial for understanding the dynamics of these diseases in response to interventions and for planning towards WHO control or elimination goals.
Although this collection was written prior to the current Coronavirus Disease 2019 (COVID-19) pandemic which has postponed many NTD-related activities [10], upon their resumption, there is an opportunity to collect data which could be used to better tailor programmes, ensuring and, in some cases, accelerating progress towards WHO 2030 targets [11].

Indirectly estimating transmission
To reach WHO goals by 2030, tailoring of intervention programmes is becoming increasingly important, particularly as many of the NTDs face programmatic constraints (Table 1). Measures of transmission in an area are required to inform model-based recommendations for tailored interventions, i.e., the frequency, coverage, and duration of interventions required. However, as disease transmission cannot be directly measured, it must be estimated indirectly from data collected in the field. In most areas, local tailoring of interventions requires more information on local transmission than current surveillance delivers.
Mathematical models have the potential to offer recommendations for locally tailored interventions but remain constrained by the data currently available. Better data will improve the quality of models and modelling recommendations in numerous ways, such as informing model parameters and assumptions, reducing uncertainty and verifying projections, thereby enabling more accurate tailoring of interventions and assessment of their progress. There are many ways to improve data collection activities to gain more information about transmission (summarised in Fig 1 and Tables 2 and 3).
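As a toy illustration of why transmission has to be inferred rather than observed, and of how survey size feeds into the uncertainty of that inference, the sketch below converts a survey prevalence into a transmission measure. It assumes a textbook SIS model with homogeneous mixing at endemic equilibrium, in which prevalence equals 1 - 1/R0. This is far simpler than the models used in the collection, and the function names (wilson_interval, sis_r0_from_prevalence, r0_range) are illustrative assumptions, not part of any Consortium code.

```python
import math


def wilson_interval(positives, sampled, z=1.96):
    """Wilson score interval for a binomial proportion (default ~95%)."""
    if sampled <= 0 or not 0 <= positives <= sampled:
        raise ValueError("need 0 <= positives <= sampled and sampled > 0")
    p = positives / sampled
    denom = 1 + z ** 2 / sampled
    centre = (p + z ** 2 / (2 * sampled)) / denom
    half = z * math.sqrt(p * (1 - p) / sampled + z ** 2 / (4 * sampled ** 2)) / denom
    return centre - half, centre + half


def sis_r0_from_prevalence(prevalence):
    """ASSUMPTION: simple SIS model at endemic equilibrium, prevalence = 1 - 1/R0."""
    if not 0 <= prevalence < 1:
        raise ValueError("prevalence must be in [0, 1)")
    return 1 / (1 - prevalence)


def r0_range(positives, sampled):
    """Map the prevalence interval to a range of R0 values under the toy model."""
    low, high = wilson_interval(positives, sampled)
    # Clip to the domain of the toy model to avoid division by zero.
    low = max(low, 0.0)
    high = min(high, 1 - 1e-9)
    return sis_r0_from_prevalence(low), sis_r0_from_prevalence(high)
```

Under these assumptions, r0_range(20, 100) gives a wider range than r0_range(200, 1000) even though both surveys observe 20% prevalence, which is one simple way to see how more data narrows the uncertainty in an indirectly estimated quantity.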

Improving monitoring and evaluation
Table 1. Overview of the 7 NTDs analysed in the NTD Modelling Consortium collection [9].
NTD and WHO target analysed in collection | Main mode of transmission | WHO-recommended strategy
Gambiense human African trypanosomiasis: Elimination of transmission [12] | Transmitted by tsetse flies | Intensified disease management via active and passive case finding, followed by treatment
Lymphatic filariasis (Elephantiasis): Elimination as a public health problem (<1% microfilarial prevalence) [13] | Transmitted by mosquitoes | Annual MDA
Onchocerciasis (River blindness): Elimination of transmission [14][15] | Transmitted by black flies | Annual MDA
Schistosomiasis (Bilharzia): Morbidity control (≤5% heavy-intensity prevalence in school-aged children aged 5-14 years) and elimination as a public health problem (≤1% heavy-intensity prevalence in school-aged children aged 5-14 years) [16] | Transmitted through parasite eggs in an infected individual's excreta contaminating freshwater sources

Table excerpt, onchocerciasis entries (caption and column headings not recovered):
Human/blackfly mixing patterns based on pre-control distribution of mf intensity levels in humans
Mean larval infection intensity per local blackfly population and the size of potential human subgroups linked to the same sites (e.g., fishermen near a specific fly-breeding site)
Model-predicted prospects of elimination through MDA strongly depend on the degree of assortative mixing. However, there is little quantitative evidence to inform elimination strategies on whether and how to respond to assortative mixing
Sampling from diverse individuals (skin snips). In settings with mf prevalence <30%, high skin mf density in those mf-positive (>20 mf/skin snip) may indicate assortative mixing
Interviewing the local human population (asking for main visited locations) and catching and dissecting blackflies from diverse locations. Trying to link local fly populations with high infection intensity levels to specific human subgroup(s) exposed to these flies
Difficult to quantify the extent of assortative mixing. Highly location-specific data and entomologist expertise are needed
Onchocerciasis [15]
Individual-level heterogeneity in exposure to fly bites
Exposure heterogeneity has a large impact on parasite resilience and is currently estimated using population level epidemiological data

To improve the outcomes and impact of NTD interventions, M&E activities are carried out to enhance performance and measure results [2]. A vital aspect of M&E is collecting data which can be used to assess whether interventions are on track for achieving WHO goals. To assess this and to determine areas where interventions need to be modified (e.g., intensified due to not being on track or relaxed due to being overtreated/limited resources), more information about the interventions being implemented is needed. This includes data on the population that has been targeted, the timing and frequency of interventions, and additionally for mass drug administration (MDA) programmes, the coverage and adherence during each round of MDA (Fig 1). M&E data can be used to determine the optimal treatment strategy (i.e., frequency, coverage, and duration) required in a particular location (Table 2 and Fig 1). To determine the specific age groups that need to be targeted in a given area, data are required to inform the age profile of infection [13,16,21].
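To make concrete why both coverage and adherence data are asked for, the following minimal sketch computes per-round coverage and the fraction of people never treated from individual treatment histories. The data layout (a person identifier mapped to one True/False entry per MDA round) and the function names are hypothetical, and "never treated" is used here only as one simple proxy for systematic non-adherence.

```python
from typing import Dict, List


def coverage_by_round(histories: Dict[str, List[bool]]) -> List[float]:
    """Fraction of the recorded eligible population treated in each MDA round."""
    if not histories:
        raise ValueError("no individuals recorded")
    lengths = {len(history) for history in histories.values()}
    if len(lengths) != 1:
        raise ValueError("every individual needs one entry per round")
    n_rounds = lengths.pop()
    n_people = len(histories)
    return [
        sum(history[r] for history in histories.values()) / n_people
        for r in range(n_rounds)
    ]


def never_treated_fraction(histories: Dict[str, List[bool]]) -> float:
    """Fraction of individuals who missed every round (a simple adherence proxy)."""
    if not histories:
        raise ValueError("no individuals recorded")
    return sum(not any(history) for history in histories.values()) / len(histories)


# Hypothetical two-person, two-round examples with identical coverage.
alternating = {"p1": [True, False], "p2": [False, True]}
systematic = {"p1": [True, True], "p2": [False, False]}
```

Both example populations have 50% coverage in each round, yet never_treated_fraction differs between them (0.0 for alternating, 0.5 for systematic). Aggregate coverage alone cannot distinguish these situations, which is why round-by-round individual-level records are informative.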
To assess how infection levels are impacted following a round of treatment, and to validate model projections, data collected at multiple time points, particularly pre- and posttreatment, are informative [13,16,19]. Furthermore, for diseases such as gambiense human African trypanosomiasis, where the effectiveness of passive case detection is being assessed, data on the stage of the disease are needed [12]. Where possible, collecting data at multiple time points within randomised controlled trials can provide greater insight into the impact attributable to an intervention.
It is important to note that reality cannot be perfectly observed, but collecting better data and using statistical tools will improve our understanding of the underlying biological processes of interest and allow us to take these limitations into account. Diagnostic test performance adds to the complexity of prevalence measures (Table 2). Additionally, as these diseases vary geographically, the prevalence is characterised, to various extents, by spatial heterogeneity. For example, for STH, sampling multiple villages/schools per implementation unit improves the accuracy in assessing progress towards targets [17]. Furthermore, spatial correlation can be beneficially used to optimise survey designs and improve the accuracy of predictive risk maps [25]. However, geostatistical models for disease prevalence strongly rely on the quality of the underlying data, especially on the reliability of the geographical coordinates of the survey locations [26]. Inaccuracies or incompleteness in this essential information reduce the quality of model outputs.
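One standard way to see how diagnostic performance complicates prevalence measures is the Rogan-Gladen correction, which adjusts an observed (apparent) prevalence for a test's sensitivity and specificity. The sketch below is a generic illustration, not the method used in the collection; the function name and the example values are assumptions, and the estimate is truncated to the interval [0, 1].

```python
def rogan_gladen(apparent_prevalence, sensitivity, specificity):
    """Estimate true prevalence from apparent prevalence and test performance.

    true = (apparent + specificity - 1) / (sensitivity + specificity - 1)
    """
    if not 0 <= apparent_prevalence <= 1:
        raise ValueError("apparent prevalence must be in [0, 1]")
    denominator = sensitivity + specificity - 1
    if denominator <= 0:
        # The test is no better than chance, so the correction is undefined.
        raise ValueError("sensitivity + specificity must exceed 1")
    estimate = (apparent_prevalence + specificity - 1) / denominator
    # Sampling noise can push the raw estimate outside [0, 1]; truncate it.
    return min(1.0, max(0.0, estimate))
```

With a hypothetical test of 80% sensitivity and 95% specificity, an apparent prevalence of 25% corresponds to an estimated true prevalence of about 26.7%, whereas an apparent prevalence of 3% is below the expected false-positive rate and is truncated to zero. The second case shows why low-prevalence settings near elimination are particularly sensitive to test specificity.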

Uncertain epidemiology: Learning more
As these diseases are neglected, and often characterised by complicated parasite life cycles, there is limited knowledge on their epidemiology and the population biology of the parasites causing them. Modelling insights remain limited by the lack of epidemiological and field data available [5]. Consequently, modelling assumptions have to be made resulting in uncertainty in model recommendations. There are key areas of uncertainty where epidemiological data are required for improving our understanding of the dynamics and model parameterisation, in order to improve the robustness of model insights (Table 3 and Fig 1). Although some parameters may never be estimable, there may be testable hypotheses which could inform our understanding of epidemiology.
The persistence of transmission when infection levels have been reduced through interventions is crucially dependent on heterogeneities in exposure, immunological processes, parasite aggregation, and ultimately transmission. These are very difficult to measure, even in epidemiological studies, but may be essential for achieving the long-term goals of NTD programmes. For vector-borne diseases, such as onchocerciasis and visceral leishmaniasis, human/vector mixing patterns play a role in local transmission dynamics. Hence, data on these patterns can reveal the degree of spatial clustering, assortative (nonhomogeneous) mixing, and exposure heterogeneity, allowing for improved prediction of village-level incidence and guidelines on spatially targeted interventions [14,15,22,27]. Additionally, for visceral leishmaniasis, data on immune responses and infection combined with presence or absence of symptoms can inform the duration of immunity and identify markers for infection [23,28]. Note that we focus on visceral leishmaniasis in the Indian subcontinent because only there is it believed to be entirely anthroponotic (i.e., humans are the only reservoir of infection) [22].
Water, sanitation, and hygiene (WASH) interventions have played a role across many of the NTDs. However, the value of WASH has been difficult to analyse with reviews based on current evidence showing contrasting effects [29][30][31]. To better understand and predict the added value of WASH, detailed data on WASH-related behaviour are required, although this could be difficult to collect [18] (Table 3).

Better data but at what cost?
Although better data bring great benefits, data collection is typically limited by various financial and programmatic constraints. Key constraints associated with obtaining data are summarised in Tables 2 and 3 and Fig 2. Although it is likely to be more costly to collect the required data, this may be more cost-effective in the long term as it will allow for more effective decision-making. Hence, rather than a cost, this could be viewed as an investment. As an example for schistosomiasis, new diagnostic techniques may have a higher cost per test, but this may be outweighed by the long-term programmatic benefits, including being able to detect elimination and resurgence [32]. Furthermore, given the similarities of data needs for these diseases, integration of data collection activities across multiple NTDs could potentially reduce the total costs.

Data curation, integration, and availability
There are a variety of challenges surrounding the quality of current data. For example, data collected on paper require manual entry into databases, which is time-consuming and increases the risk of errors. Other challenges include partial reporting, whereby only a portion or summary of the data collected is made available, and the absence of standardised and consistent reporting both within and between countries at different time points, which makes data integration difficult and often results in a loss of data. Hence, better data refers not only to collecting a greater quantity of data but also to improving the quality of the data and data reporting protocols. For the NTD Modelling Consortium and for the wider scientific community, data curation, integration, and availability are key. Standardising and curating data and making them publicly available would ensure that they can be utilised by the scientific community. Electronic data collection tools are paving the way forward for addressing some of these challenges [33][34][35][36]. Alongside this, the Findability, Accessibility, Interoperability, and Reusability (FAIR) data principles have been designed to improve scientific data management and stewardship [37]. Publishing the models and outputs in a reproducible way is also important for driving forward progress on NTDs.
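As a rough sketch of what standardised reporting could look like at the point of electronic data entry, the example below checks a single survey record for completeness and plausibility before it is accepted. The field names (site_id, latitude, longitude, survey_date, examined, positive) and the rules are hypothetical and are not taken from any existing NTD reporting standard or tool.

```python
from datetime import date

# Hypothetical minimal schema for one survey site.
REQUIRED_FIELDS = ("site_id", "latitude", "longitude", "survey_date", "examined", "positive")


def validate_survey_record(record):
    """Return a list of problems found in one record (empty list if none)."""
    problems = [
        f"missing field: {name}"
        for name in REQUIRED_FIELDS
        if record.get(name) in (None, "")
    ]
    if problems:
        return problems

    try:
        latitude = float(record["latitude"])
        longitude = float(record["longitude"])
    except (TypeError, ValueError):
        problems.append("coordinates are not numeric")
    else:
        if not -90 <= latitude <= 90:
            problems.append("latitude outside [-90, 90]")
        if not -180 <= longitude <= 180:
            problems.append("longitude outside [-180, 180]")

    try:
        date.fromisoformat(str(record["survey_date"]))
    except ValueError:
        problems.append("survey_date is not an ISO 8601 date (YYYY-MM-DD)")

    try:
        examined = int(record["examined"])
        positive = int(record["positive"])
    except (TypeError, ValueError):
        problems.append("counts are not integers")
    else:
        if examined <= 0:
            problems.append("examined must be positive")
        elif not 0 <= positive <= examined:
            problems.append("positive must be between 0 and examined")

    return problems
```

Rejecting or flagging records at entry time, rather than during later integration, is one way such tools could reduce the loss of data described above; the specific checks would need to be agreed by the programmes and data holders involved.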

Conclusions
Better M&E and epidemiological data will improve our understanding of these NTDs by leading to more informed parameter values, validated model structures, and reduced uncertainty, thereby improving the reliability of assessments of intervention programmes and modelling recommendations for tailored interventions. On the one hand, more accurate models may give us greater confidence in whether the goal of an intervention strategy will be met. On the other, they might allow us to better assess the robustness of M&E strategies, which aim to verify whether a goal has been met, after an intervention has been implemented.
Further work is needed to encourage opportunities for the integration of data collection activities across the NTDs and, where possible, a wider spectrum of diseases. Additionally, once NTD programmes are able to resume following the current disruption due to COVID-19, potential synergies between the COVID-19 control efforts and NTD programmes will be important to consider [10,11,38]. Moving forward, as transmission declines and programmes become more tailored, such opportunities will be important as data needs will continue to grow.