Quantifying the value of surveillance data for improving model predictions of lymphatic filariasis elimination

Background Mathematical models are increasingly being used to evaluate strategies aiming to achieve the control or elimination of parasitic diseases. Recently, owing to growing realization that process-oriented models are useful for ecological forecasts only if the biological processes are well defined, attention has focused on data assimilation as a means to improve the predictive performance of these models. Methodology and principal findings We report on the development of an analytical framework to quantify the relative values of various longitudinal infection surveillance data collected in field sites undergoing mass drug administrations (MDAs) for calibrating three lymphatic filariasis (LF) models (EPIFIL, LYMFASIM, and TRANSFIL), and for improving their predictions of the required durations of drug interventions to achieve parasite elimination in endemic populations. The relative information contribution of site-specific data collected at the time points proposed by the WHO monitoring framework was evaluated using model-data updating procedures, and via calculations of the Shannon information index and weighted variances from the probability distributions of the estimated timelines to parasite extinction made by each model. Results show that data-informed models provided more precise forecasts of elimination timelines in each site compared to model-only simulations. Data streams that included year 5 post-MDA microfilariae (mf) survey data, however, reduced each model’s uncertainty most compared to data streams containing only baseline and/or post-MDA 3 or longer-term mf survey data irrespective of MDA coverage, suggesting that data up to this monitoring point may be optimal for informing the present LF models. We show that the improvements observed in the predictive performance of the best data-informed models may be a function of temporal changes in inter-parameter interactions. Such best data-informed models may also produce more accurate predictions of the durations of drug interventions required to achieve parasite elimination. Significance Knowledge of relative information contributions of model only versus data-informed models is valuable for improving the usefulness of LF model predictions in management decision making, learning system dynamics, and for supporting the design of parasite monitoring programmes. The present results further pinpoint the crucial need for longitudinal infection surveillance data for enhancing the precision and accuracy of model predictions of the intervention durations required to achieve parasite elimination in an endemic location.


Methodology and principal findings
We report on the development of an analytical framework to quantify the relative values of various longitudinal infection surveillance data collected in field sites undergoing mass drug administrations (MDAs) for calibrating three lymphatic filariasis (LF) models (EPIFIL, LYM-FASIM, and TRANSFIL), and for improving their predictions of the required durations of drug interventions to achieve parasite elimination in endemic populations. The relative information contribution of site-specific data collected at the time points proposed by the WHO monitoring framework was evaluated using model-data updating procedures, and via calculations of the Shannon information index and weighted variances from the probability distributions of the estimated timelines to parasite extinction made by each model. Results show that data-informed models provided more precise forecasts of elimination timelines in each site compared to model-only simulations. Data streams that included year 5 post-MDA microfilariae (mf) survey data, however, reduced each model's uncertainty most compared to data streams containing only baseline and/or post-MDA 3 or longer-term mf survey data irrespective of MDA coverage, suggesting that data up to this monitoring point may be optimal for informing the present LF models. We show that the improvements observed in the predictive performance of the best data-informed models may be a function of temporal changes in inter-parameter interactions. Such best data-informed models may also produce PLOS

Introduction
Mathematical models of parasite transmission, via their capacity for producing dynamical forecasts or predictions of the likely future states of an infection system, offer an important tool for guiding the development and evaluation of strategies aiming to control or eliminate infectious diseases [1][2][3][4][5][6][7]. The power of these numerical simulation tools is based uniquely on their ability to appropriately incorporate the underlying nonlinear and multivariate processes of pathogen transmission in order to facilitate plausible predictions outside the range of conditions at which these processes are either directly observed or quantified [8][9][10][11]. The value of these tools for guiding policy and management decisions by providing comparative predictions of the outcomes of various strategies for achieving the control or elimination of the major Neglected Tropical Diseases (NTDs) has been highlighted in a series of recent publications [8,11,12], demonstrating the crucial role these quantitative tools are beginning to play in advancing policy options for these diseases.
While these developments underscore the utility of transmission models for supporting policy development in parasite control, a growing realization is that these models can be useful for this purpose only if the biological processes are well defined and demographic and environmental stochasticity are either well-characterized or unimportant for meeting the goal of the policy modelling exercise [9][10][11][13][14][15][16]. This is because the realized predictability of any model for a system depends on the initial conditions, parameterizations and process equations that are utilized in its simulation such that model outcomes are strongly sensitive to the choice of values used for these variables [17]. Any misspecification of these system attributes will lead to failure in accurately forecasting the future behaviour of a system, with predictions of actual future states becoming highly uncertain even when the exact representation of the underlying deterministic process is well established but precise specification of initial conditions or forcing and/or parameter values is difficult to achieve [17,18]. This problem becomes even more intractable when theoretical models depend on parameter estimates taken from other studies [5,17,19]. Both these challenges, viz. sensitivity to forcing conditions and use of parameter estimates from settings that are different from the dynamical environment in which a model will be used for simulation, imply that strong limits will be imposed on the realized predictability of any given model for an application [9,10,20]. As we have shown recently, if such uncertainties are ignored, the ability of parasite transmission models to form the scientific basis for management decisions can be severely undermined, especially when predictions are required over long time frames and across heterogeneous geographic locations [4,5,7].
These inherent difficulties with using an idealized model for producing predictions to guide management have led to consideration of data-driven modelling procedures that allow the use of information contained within observations to improve specification and hence the predictive performance of process-based models [9,10,14,[21][22][23]. Such approaches, termed model-data fusion or data assimilation methods, act by combining models with various data streams (including observations made at different spatial or temporal scales) in a statistically rigorous way to inform initial conditions, constrain model parameters and system states, and quantify model errors. The result is the discovery of models that can more adequately capture the prevailing system dynamics in a site, an outcome which in turn has been shown to result in the making of significantly improved predictions for management decision making [9,10,14,24]. Initially used in geophysics and weather forecasting, these methods are also beginning to be applied in ecological modelling, including more recently in the case of infectious disease modelling [9,10]. In the latter case, the approach has shown that it can reliably constrain a disease transmission model during simulation to yield results that approximate epidemiological reality as closely as possible, and as a consequence improve the accuracy of forecasts of the response of a pathogen system exposed to various control efforts [4-7, 21, 25-27].
More recently, attention has also focused on the notion that a model essentially represents a conditional proposition, i.e. that running a model in a predictive mode presupposes that the driving forces of the system will remain within the bounds of the model conceptualization or specification [28]. If these driving forces were to change, then it follows that even a model well-calibrated to a given historical dataset will fail. New developments in longitudinal data assimilation can mitigate this problem of potential time variation of parameters via the recursive adjustment of the model by assimilation of data obtained through time [22,29,30]. Apart from allowing assessment of whether stasis bias may occur in model predictions, such sequential model calibration with time-varying data can also be useful for quantifying the utility of the next measurement in maximizing the information gained from all measurements together [31]. Carrying out such longitudinal model-data analysis has thus the potential for providing information to improve the efficiency and cost-effectiveness of data monitoring campaigns [24,[31][32][33], along with facilitating more reliable model forecasts.
A key question, however, is evaluating which longitudinal data streams provide the most information to improve model performance [33]. Indeed, it is possible that from a modelling perspective using more data may not always lead to a better-constrained model [34]. This suggests that addressing this question is not only relevant to model developers, who need observational data to improve, constrain, and test models, but also for disease managers working on the design of disease surveillance plans. At a more philosophical level, we contend that these questions have implications for how current longitudinal monitoring data from parasite control programmes can best be exploited both scientifically and in management [31]. Specifically, we suggest that these surveillance data need to be analysed using models in a manner that allows the extraction of maximal information about the monitored dynamical systems so that this can be used to better guide both the collection of such data as well as the provision of more precise estimates of the system state for use in making state-dependent decisions [2,[35][36][37]. Currently, parasite control programmes use infection monitoring data largely from sentinel sites primarily to determine if an often arbitrarily set target is met [3]. Little consideration is given to whether these data could also be used to learn about the underlying transmission dynamics of the parasitic system, or how such learning can be effectively used by management to make better decisions regarding the interventions required in a setting to meet stated goals [2,4].
Here, we develop an analytical framework to investigate the value of using longitudinal LF infection data for improving predictions of the durations of drug interventions required for achieving LF elimination by coupling data collected during mass drug interventions (MDAs) carried out in three example field sites to three existing state-of-the-art lymphatic filariasis (LF) models [4,6,21,[38][39][40][41][42][43]. To be managerially relevant to current WHO-specified LF intervention surveillance efforts, we evaluated the usefulness of infection data collected in these sites at the time points proposed by the WHO monitoring framework in carrying out the present assessment [44]. This was specifically performed by ranking these different infection surveillance data streams according to the incremental information gain that each stream provided for reducing the prediction uncertainty of each model.

Data
Longitudinal pre-and post-infection and MDA data from representative sites located in each of the three major regions endemic for LF (Africa, India, and Papua New Guinea (PNG)) were assembled from the published literature for use in constraining the LF models employed in this study. The three sites (Kirare, Tanzania, Alagramam, India, and Peneng, PNG) were selected on the basis that each represents the average endemic transmission conditions (average level of infection, transmitting mosquito genus) of each of these three major extant LF regions, while providing details on the required model inputs and data for conducting this study. These data inputs encompassed information on the annual biting rate (ABR) and dominant mosquito genus, as well as MDA intervention details, including the relevant drug regimen, time and population coverage of MDA, and times and results of the conducted microfilaria (mf) prevalence surveys (Table 1). Note each site also provided these infection and MDA data at the time points pertinent to the existing WHO guidelines for conducting LF monitoring surveys during a MDA programme [44], which additionally, as pointed out above, allowed the assessment of the value of such infection data both for supporting effective model calibration and for producing more reliable intervention forecasts.

The models
The three existing LF models employed for this study included EPIFIL, a deterministic Monte Carlo population-based model, and LYMFASIM and TRANSFIL, which are both stochastic, individual-based models. All three models simulate LF transmission in a population by accounting for key biological and intervention processes such as impacts of vector density, the life cycle of the parasite, age-dependent exposure, density-dependent transmission processes, infection aggregation, and the effects of drug treatments as well as vector control [4, 21, 38-40, 42, 43, 49]. Although the three models structurally follow a basic coupled immigration-death model formulation, they differ in implementation (e.g. from individual to population-based), the total number of parameters included, and the way biological and intervention processes are mathematically incorporated and parameterized. The three models have been compared in recent work [8,12], with full details of the implementation and simulation procedures for each individual model also described [6,8,12,21,39,42,43,49,50]. Individual model parameters and fitting procedures specific to this work are given in detail in S1 Supplementary Information.

Longitudinal data assimilation procedures
We used longitudinal data assimilation methods to sequentially calibrate the three LF models with the investigated surveillance data such that parameter estimates and model predictions reflect not only the information contained in the baseline but also follow-up data points. The available mf prevalence data from each site were arranged into four different temporal data streams to imitate the current WHO guidelines regarding the time points for conducting monitoring surveys during an MDA programme. This protocol proposes that infection data be collected in sentinel sites before the first round of MDA to establish baseline conditions, no sooner than 6 months following the third round of MDA, and no sooner than 6 months following the fifth MDA to assess whether transmission has been interrupted (defined as reduction of mf prevalence to below 1% in a population) [44,51]. Thus, the four data streams considered for investigating the value of information gained from each survey were respectively: scenario 1-baseline mf prevalence data only, scenario 2-baseline and post-MDA 3 mf prevalence data, scenario 3-baseline, post-MDA 3, and post-MDA 5 mf prevalence data, and scenario 4-baseline and post-MDA 5 mf prevalence data. In addition to these four data streams, a fifth model-only scenario (scenario 0) was also considered where no site-specific data was introduced. In this case, simulations of interventions were performed using only model-specific parameter and ABR priors estimated for each region. The first step for all models during the data assimilation exercises reported here was to initially simulate the baseline infection conditions in each site using a large number of samples (100,000 for EPIFIL and TRANSFIL, and 10,000-30,000 for LYMFASIM) randomly selected from the parameter priors deployed by each model. The number of parameters which were left free to be fitted to these data by each model range from 3 (LYMFASIM and TRANSFIL) to 21 (EPIFIL). The ABR, a key transmission parameter in all three models, was also left as a free parameter whose distribution was influenced by the observed ABR (Table 1) and/or by fits to previous region-specific datasets (see S1 Supplementary Information for model-specific implementations). The subsequent steps used to incorporate longitudinal infection data into the model calibration procedure varied among the models, but in all cases the goodness-of-fit of the model outputs for the site-specific mf prevalence data was assessed using the chi-square metric (α = 0.05) [52].
EPIFIL used a sequential model updating procedure to iteratively modify the parameters with the introduction of each subsequent follow up data point through time [6]. This process uses parameter estimates from model fits to previous data as priors for the simulation of the next data which are successively updated with the introduction of each new observation, thus providing a flexible framework by which to constrain a model using newly available data. Fig 1  summarizes the iterative algorithm used for conducting this sequential model-data assimilation exercise [6]. LYMFASIM and TRANSFIL, by contrast, included all the data in each investigated stream together for selecting the best-fitting models for each time series-i.e. model selection for each data series was based on using all relevant observations simultaneously in the fitting process [30,53,54]. Although a limitation of this batch estimation approach is that the posterior probability of each model is fixed for the whole simulation period, unlike the case in sequential data assimilation where a restricted set of parameters is exposed to each observation (as a result of parameter constraining by data used in the previous time step)which thereby yields models that give better predictions for different portions of the underlying temporal process-here we use both methods to include and assess the impact that this implementation difference may have on the results presented below. For all models, the final updated parameter estimates from each data stream were used to simulate the impact of observed MDA rounds and for predicting the impact of continued MDA to estimate how many years were required to achieve 1% mf prevalence.

Intervention modelling
Interventions were modelled by using the updated parameter vectors or models selected from each scenario for simulating the impact of the reported as well as hypothetical future MDA rounds on the number of years required to reduce the observed baseline LF prevalence in each site to below the WHO transmission threshold of 1% mf prevalence [44]. When simulating these interventions, the observed MDA times, regimens, and coverages followed in each site were used (Table 1), while MDA was assumed to target all residents aged 5 years and above. For making mf prevalence forecasts beyond the observations made in each site, MDA simulations were extended for a total of 50 annual rounds in each site at an assumed coverage of 65%. While the drug-induced mf kill rate and the duration of adult worm sterilization were fixed among the models (Table 1), the worm kill rate was left as a free parameter to be estimated from post-intervention data to account for the uncertainty in this drug efficacy parameter [4,7,21]. The number of years of MDA required to achieve the threshold of 1% mf prevalence was calculated from model forecasts of changes in mf prevalence due to MDA for each modeldata fusion scenario.

Information contribution of model and data
The predictions from each model regarding timelines to achieve 1% mf for each fitting scenario were used to determine the information gained from each data stream compared to the In all scenarios, the initial EPIFIL models were initialized with parameter priors and a chi-square fitting criterion was applied to select those models which represent the baseline mf prevalence data sufficiently well (α = 0.05). The accepted models were then used to simulate the impact of interventions on mf prevalence. The chi-square fitting criterion was sequentially applied to refine the selection of models according to the post-MDA mf prevalence data included in the fitting scenario. The fitted parameters from selection of acceptable models at each data point were used to predict timelines to achieve 1% mf prevalence. The scenarios noted in the blue boxes indicate the final relevant updating step before using the fitted parameters to predict timelines to achieve 1% mf in that data fitting scenario. information attributable to the model itself [14,33,55]. The relative information gained from a particular data stream was calculated as I d = H m -H md where H measures the entropy or uncertainty associated with a random variable, H m denotes predictions from the model-only scenario (scenario 0) which essentially represents the impact of prior knowledge of the system, and H md signifies predictions from each of the four model-data scenarios (i.e. scenarios [1][2][3][4]. The values of I d for each data scenario or stream were compared in a site to infer which survey data are most useful for reducing model uncertainty. The Shannon information index was used to measure entropy, H, as follows: is the discrete probability density function (PDF) of the number of years of MDA predicted by each fitted model to reach 1% mf, and is estimated from a histogram of the respective model predictions for m bins (of equal width in the range between the minimum and maximum values of the PDFs) [14,56].
To statistically compare two entropy values, a permutation test using the differential Shannon entropy (DSE) was performed [57]. DSE is defined as |H 1 -H 2 | where H 1 was calculated from the distribution of timelines to achieve 1% mf for a given scenario, y 1 , and H 2 was calculated from the distribution of timelines to achieve 1% mf for a different scenario, y 2 . The list of elements in y 1 and y 2 were combined into a single list of size y 1 + y 2 and the list was permuted 20,000 times. DSE was then recalculated each time by calculating a new H 1 from the first y 1 elements and a new H 2 from the last y 2 elements from each permutation, from which p-values may be quantified as the proportion of all recalculated DSEs that were greater than the original DSE.

Weighting model predictions
Model predictions of the mean and variance in timelines to LF elimination were weighted according to the frequencies by which predictions occurred in a group of simulations. In general, if D 1 , D 2 ,. . .,D n are data points (model predictions in the present case) that occur in an ensemble of simulations with different weights or frequencies W 1 ,W 2 ,. . .,W n , then the weighted mean, Here, n is the number of data points and n 0 is the number of non-zero weights. In this study, the weighted variance of the distributions of predicted timelines to achieve 1% mf prevalence was calculated to provide a measure of the precision of model predictions in addition to the entropy measure, H. A similar weighting scheme was also used to pool the timeline predictions of all three models. Here, predictions made by each of the three models for each data scenario were weighted as above, and a composite weighted 95% percentile interval for the pooled predictions was calculated for each data stream. This was done by first computing the weighted percentiles for the combined model simulations from which the pooled 2.5 th and 97.5 th percentile values were quantified. The Matlab function, wprctile, was used to carry out this calculation.

Parameter constraints and interactions
The extent by which parameter constraints are achieved through the coupling of models with data was evaluated to determine if improvements in such constraints by the use of additional data may lead to reduced model prediction uncertainty [33]. Parameter constraint was calculated as the ratio of the mean standard deviation of all fitted parameter distributions to the mean standard deviation of all prior parameter distributions. A ratio of less than one indicates the fitted parameter space is more constrained than the prior parameter space [33]. This assessment was carried out using the EPIFIL model only. In addition, pairwise parameter correlations were also evaluated to assess whether the sign, magnitude, and significance of these correlations changed by scenario to determine if using additional data might alter these interactions to better constrain a model. For this assessment, Spearman's correlation coefficients and p-values testing the hypothesis of no correlation against the alternative of correlation were calculated, and the exercise was run using the estimated parameters from the EPIFIL model.

Sensitivity analyses
EPIFIL was used to conduct a sensitivity analysis investigating whether the trend in relative information gained by coupling the model with longitudinal data was dependent on the interventions simulated. The same series of simulations (for three LF endemic sites and five fitting scenarios) were completed with the extended MDA coverage beyond the observations given in Table 1 set here at 80% instead of 65% to represent an optimal control strategy. As before, the timelines to reach 1% mf prevalence in each fitting scenario were calculated and used to determine which data stream provided the model with the greatest gain of information. The results were compared to the original series of simulations to assess whether the trends are robust to changes in the intervention coverages simulated. EPIFIL was also used to perform another sensitivity analysis expanding the number of data streams to investigate if the WHO monitoring scheme is adequate for informing the making of reliable model-based predictions of timelines for achieving LF elimination. To perform this sensitivity analysis, pre-and post-MDA data from Villupuram district, India that provide extended data points (viz. scenario 1-4 as previously defined, plus scenario 5-baseline, post-MDA 3, post-MDA 5, and post-MDA 7 mf prevalence data, and scenario 6-baseline, post-MDA 3, post-MDA 5, post-MDA 7, and post-MDA 9 mf prevalence) were assembled from the published literature [47,58]. The timelines to reach 1% mf prevalence and the entropy for each of these additional scenarios were calculated to determine whether additional data streams over those recommended by WHO are required for achieving more reliable model constraints, which among these data might be considered as compulsory, and which might be optional for supporting predictions of elimination.

Statistical analyses
Differences in predicted medians, weighted variances and entropy values between data scenarios, models and sites were statistically evaluated using Kruskall-Wallis tests for equal medians, F-tests for equality of variance, and DSE permutation tests, respectively. P-values for assessing significance for all pairwise tests were obtained using the Benjamini-Hochberg procedure for controlling the false discovery rate, i.e. for protecting against the likelihood of obtaining false positive results when carrying out multiple testing [59].

Assessing the benefit of longitudinal monitoring data for modelling LF elimination
Here, our goal was twofold. First, to determine if data are required to improve the predictability of intervention forecasts by the present LF models in comparison with the use of theoretical models only, and second, to evaluate the benefit of using different longitudinal streams of mf survey data for calibrating the three models in order to determine which data stream was most informative for reducing the uncertainty in model predictions in a site. Table 2 summarises the key results from our investigation of these questions: these are the number of accepted best-fitting models for each data stream or scenario in the three study sites (Table 1), the predicted median and range (2.5 th -97.5 th percentiles) in years to achieve the mf threshold of 1% mf prevalence, the weighted variance and entropy values based on these predictions, and the relative information gained (in terms of reduced prediction uncertainty) by the use of longitudinal data for constraining the projections of each of the three LF models investigated. Even though the number of selected best-fit models based on the chi-square criterion (see Methods) differed for each site and model, these results indicate unequivocally that models constrained by data provided significantly more precise intervention predictions compared to model-only predictions ( Table 2). Note that this was also irrespective of the two types of longitudinal data assimilation procedures (sequential vs. simultaneous) used by the different models in this study. Thus, for all models and sites, model-only predictions made in the absence of data (scenario 0) showed the highest prediction uncertainty, highlighting the need for data to improve the predictive performance of the present models. The relative information gained by using each data stream in comparison to model-only predictions further support this finding, with the best gains in reducing model prediction uncertainty provided by those data constraining scenarios that gave the lowest weighted variance and entropy values; as much as 92% to 96% reductions in prediction variance were achieved by these scenarios in comparison to modelonly predictions between the three models ( Table 2). The results also show, however, that data streams including post-MDA 5 mf survey data (scenarios 3 and 4) reduced model uncertainty (based on both the variance and entropy measures) most compared to data streams containing only baseline and/or post-MDA 3 mf survey data (scenarios 1 and 2) ( Table 2). Although there were differences between the three models (due to implementation differences either in how the models are run (Monte Carlo deterministic vs. individual-based) or in relation to how the present data were assimilated (see above)), overall, scenario 3, which includes baseline, post-MDA 3, and post-MDA 5 data, was shown to most often reduce model uncertainty the greatest. Additionally, there was no statistical difference between the performances of scenarios 3 and 4 in those cases where scenario 4 resulted in the greatest gain of information (Table 2). It is also noticeable that the best constraining data stream for each combination of site and model also produced as expected the lowest range in predictions of the numbers of years of annual MDA required to achieve the 1% mf prevalence in each site, with the widest ranges estimated for model-only predictions (scenario 0) and the shorter data streams (scenario 1). In general, this constriction in predictions also led to lower estimates of the median times to achieve LF elimination, although this varied between models and sites ( Table 2).

Inter-and pooled model performance
The change in the distributions of predicted timelines to LF elimination without and with model constraining by the different longitudinal data streams is illustrated in Fig 2 for the Kirare site (see S2 Supplementary Information for results obtained for the other two study villages investigated here). The results illustrate that both the location and length of the tail of the prediction distributions can change as models are constrained with increasing lengths of longitudinal data, with inclusion of post-MDA 5 mf survey data consistently producing a narrower or sharper range of predictions compared to when this survey point is excluded. Fig 3 compares the uncertainty in predictions of timelines to achieve elimination made by each of the three models without (scenario 0) and via their constraining by the data streams providing the lowest prediction entropy for each of the models per site. Note that variations in scenario 0 predictions among the three models directly reflect the different model structures, parameterizations, and the presence (or absence) of stochastic elements. The boxplots in the figure, however, show that for all three sites and models, calibration of each model by data The lowest entropy scenario for each site is bolded and shaded grey. Additional scenarios shaded grey are not significantly different from the lowest entropy scenario. Data assimilation in filarial model predictions greatly reduces the uncertainty in predictions of the years of annual MDA required to eliminate LF compared to model-only predictions, with the data streams producing the lowest entropy for simulations in each site significantly improving the precision of these predictions ( Table 2). This gain in precision, and thus the information gained using these data streams, is, as expected, greater for the stochastic LYMFASIM and TRANSFIL models compared to the deterministic EPIFIL model. Note also that even though the ranges in predictions of the annual MDA years required to eliminate LF by the data streams providing the lowest prediction entropy differed statistically between the three models, the values overlapped markedly (e.g. for Kirare the ranges are 10-18, 9-14, 9-15 for EPIFIL, LYMFASIM and TRANSFIL Data assimilation in filarial model predictions respectively), suggesting the occurrence of a similar constraining of predictive behaviour among the three models. To investigate this potential for a differential model effect, we further pooled the predictions from all three models for all the data scenarios and evaluated the value of each investigated data stream for improving their combined predictive ability. The weighted 95% percentile intervals from the pooled predictions were used for carrying out this assessment. The results are depicted in Fig 4 and indicate that, as for the individual model predictions, uncertainty in the collective predictions by the three LF models for the required number of years to eliminate LF using annual MDA in each site may be reduced by model calibration to data, with the longitudinal mf prevalence data collected during the later monitoring periods (scenarios 3 and 4) contributing most to improving the multi-model predictions for each site. The boxplots show that by calibrating the models to data streams, more precise predictions are able to be made regarding timelines to achieve 1% mf prevalence across all models and sites. The results of pairwise F-tests for variance, performed to compare the weighted variance in timelines to achieve 1% mf prevalence between model-only simulations (scenario 0) and the lowest entropy simulations (best scenario) (see Table 2), show that the predictions for the best scenarios are significantly different from the predictions for the model-only simulations. Significance was determined using the Benjamini-Hochberg procedure for controlling the false discovery rate (q = 0.05). For EPIFIL, LYMFASIM and TRANSFIL, the best scenarios are scenarios 4, 3, and 4 for Kirare, scenarios 4, 4, and 3 for Alagramam, and scenarios 3, 3, and 3 for Peneng, respectively.

Parameter constraints and interactions
We attempted to investigate if model uncertainty in predictions by the use of longitudinal data was a direct function of parameter constraining by the addition of data. Given the similarity in outcomes of each model, we remark on the results from the fits of the technically easier to run EPIFIL model to evaluate this possibility here. The assessment of the parameter space constraint achieved through the inclusion of data was made by determining if the fitted parameter distributions for the model became reduced in comparison with priors as data streams were added to the system [33]. The exercise showed that the size of the estimated parameter distributions reduced with addition of data, with even scenario 1 data producing reductions for Kirare and Peneng (Fig 5). In the case of Alagramam, however, there was very little, if any, constraint in the fitted parameter space compared to the prior parameter space. This result, together with the fact that even using all the data in Kirare and Peneng produced up to only between 2.5 to 5% reductions in fitted parameter distributions when compared to the priors, indicate that the observed model prediction uncertainty in this study may be due to other complex factors connected with model parameterization. Table 3 provides the results of an analysis of pairwise parameter correlations of the selected best-fitting models for data scenario 1 compared to those selected by the data stream that gave the best reduction in EPIFIL prediction uncertainty for Alagramam (scenario 3). These results show that while the parameter space was not being constrained with the addition of more data, the pattern of parameter correlations changed in a complex manner between the two constraining data sets. For example, although the number of significantly correlated parameters did not differ, the magnitude and direction of parameter correlations were shown to change between the two data scenarios ( Table 3). The corresponding results for Kirare and Peneng are shown in S3 Supplementary  Information, and indicate that a broadly similar pattern of changes in parameter associations also occurred as a result of model calibration to the sequential data measured from those sites. This suggests that this outcome may constitute a general phenomenon at least with regards to the sequential constraining of EPIFIL using longitudinal mf prevalence data. An intriguing finding (from all three data settings) is that the most sensitive parameters in this regard, i.e. with respect to altered strengths in pairwise parameter correlations, may be those representing the relationship of various components of host immunity with different transmission processes, including with adult worm mortality, rates of production and survival of mf, larval development rates in the mosquito vector and infection aggregation (Table 3). This suggests that, as more constraining data are added, changes in the multidimensional parameter relationships related to host immunity could contribute to the sequential reductions in the LF model predictive uncertainty observed in this study.

Contributions of MDA coverage and length of monitoring data to model uncertainty
The LF elimination timeline predictions used above were based on modelling the impacts of annual MDA given the reported coverages in each site followed by an assumed standard coverage for making longer term predictions (see Methods). This raises the question as to whether the differences detected in the case of the best constraining data stream between the present study sites and between models ( Table 2) could be a function of the simulated MDA coverages in each site. To investigate this possibility, we used EPIFIL to model the outcome of changing the assumed MDA coverage in each site on the corresponding entropy and information gain trends in elimination predictions made from the models calibrated to each of the site-specific data scenarios/streams investigated here. The results of increasing the assumed coverage of MDA to 80% for each site are shown in Fig 6 and indicate that the choice of MDA coverage in this study are unlikely to have significantly influenced the conclusion made above that the best performing data streams for reducing model uncertainty for predicting LF elimination pertains to data scenarios 3 and 4. However, while the model-predicted timelines to achieve the 1% mf prevalence threshold using the observed MDA coverage followed by 80% MDA coverage showed that the data stream which most reduced uncertainty did not change from the impact of using the observed MDA coverage followed by 65% MDA coverage modelled for Kirare and Peneng (Table 2, Fig 6), this was not the case for Alagramam, where data from scenario 3 with a 80% coverage resulted in the greatest reduction in entropy compared to the original results using 65% coverage which indicated that scenario 4 data performed best (Table 2, Fig 6). Notably, though, the entropy values of predictions using the data scenario 3 and 4 constraints were not statistically different for this site (p-value < 0.05) (Fig 6).
EPIFIL was also used to expand the number of calibration scenarios using a dataset with longer term post-MDA data from Villupuram district, India. This dataset contained two addition data streams: scenario 5 which included baseline, post-MDA 3, post-MDA 5, and post-MDA 7 mf data, and scenario 6, which included baseline, post-MDA 3, post-MDA 5, post-MDA 7, and post-MDA 9 mf data. Scenario 6 thus contained the most post-MDA data and was demonstrated to be the most effective for reducing model uncertainty, but this effect was not statistically significantly different from the reductions produced by assimilating data contained in Table 3. Spearman parameter correlations for scenarios 1 (lower left triangle) and 3 (upper right triangle) for Alagramam, India. Data assimilation in filarial model predictions scenarios 3 and 5 ( Table 4). The inclusion of more data than are considered in scenario 3 therefore did not result in any significant additional reduction in model uncertainty.

Assessing prediction accuracy
EPIFIL was used to evaluate the accuracy of the data-driven predictions of the timelines required to meet the goal of LF elimination based on breaching the WHO-set target of 1% mf For all sites, either scenario 3 or 4 had the lowest entropies, and scenario 4 was not significantly different from scenario 3 for Kirare and Alagramam. These results were not statistically different from the results given 65% coverage (see Table 2), suggesting that the data stream associated with the lowest entropy is robust to changes in the interventions simulated. Scenarios where the weighted variance or entropy were not significantly different from the lowest entropy scenario are noted with the abbreviation NS. Significance was determined using the Benjamini-Hochberg procedure for controlling the false discovery rate (q = 0.05).

Relative information gained by data (%) +
Significance was determined using the Benjamini-Hochberg procedure for controlling the false discovery rate (q = 0.05) in all pairwise statistical tests. + information gained by each data stream (scenario1-6) are presented in comparison to the information contained in the model-only simulation (scenario 0) https://doi.org/10.1371/journal.pntd.0006674.t004 Data assimilation in filarial model predictions prevalence. This analysis was performed by using the longitudinal pre and post-infection and MDA data reported for the Nigerian site, Dokan Tofa, where elimination was achieved according to WHO recommended criteria after seven rounds of MDA (Table 5). The data from this site comprised information on the ABR and dominant mosquito genus, as well as details of the MDA intervention carried out, including the relevant drug regimen applied, time and population coverage of MDA, and outcomes from the mf prevalence surveys conducted at baseline and at multiple time points during MDA [60]. The results of model predictions of the timelines to reach below 1% mf prevalence as a result of sequential fitting to the mf prevalence data from this site pertaining to scenarios 0-4 (as defined above) are shown in Table 6. Note that in the post MDA 3, 5 and 7 surveys, as no LF positive individuals were detected among the sample populations, we used a one-sided 95% Clopper-Pearson interval to determine the expected upper one-sided 95% confidence limits for these sequentially observed zero infection values  Data assimilation in filarial model predictions using the "Rule of Three" approximation after K empty samples formula [61]. The results show that model constraining by scenario 2, which includes baseline and post-MDA 3 data, and scenario 3, which includes baseline, post-MDA 3, and post-MDA 5 data, resulted in both the least entropy values and the shortest predicted times, i.e., from as low as 2 to as high as 7 years, required for achieving LF elimination in this site ( Table 6). The data in Table 5 show that the first instance the calculated one-sided upper 95% confidence limit in this setting fell below 1% mf prevalence also occurred post MDA 7 (i.e after 7 years of MDA). This is a significant result, and indicates that apart from being able to reduce prediction uncertainty, the best data-constrained models are also able to more accurately predict the maximal time (7 years) by which LF elimination occurred in this site.

Discussion
Our major goal in this study was to compare the reliability of forecasts of timelines required for achieving parasite elimination made by generic LF models versus models constrained by sequential mf prevalence surveillance data obtained from field sites undergoing MDA. A secondary aim was to evaluate the relative value of data obtained at each of the sampling time points proposed by the WHO for monitoring the effects of LF interventions in informing these model predictions. This assessment allowed us to investigate the role of these data for learning system dynamics and measure their value for guiding the design of surveillance programmes in order to support better predictions of the outcomes of applied interventions. Fundamentally, however, this work addresses the question of how best to use predictive parasite transmission models for guiding management decision making, i.e. whether this should be based on the use of ideal models which incorporate generalized parameter values or on models with parameters informed by local data [10]. If we find that data-informed models can reduce prediction uncertainty significantly compared to the use of theoretical models unconstrained by data, then it is clear that to be useful for management decision making we require the application of model-data assimilation frameworks that can effectively incorporate information from appropriate data into models for producing reliable intervention projections. Antithetically, such a finding implies that using unconstrained ideal models in these circumstances will provide only approximate predictions characterized by a degree of uncertainty that might be too large to be useful for reliable decision making [14,33,62].
Here, we have used three state-of-the-art LF models calibrated to longitudinal human mf prevalence data obtained from three representative LF study sites to carry out a systematic analysis of these questions in parasite intervention modelling (see also Walker et al [63] for a recent study highlighting the importance of using longitudinal sentinel site data for improving the prediction performances of the closely-related onchocerciasis models). Further, by iteratively testing the reduction in the uncertainty of the projections of timelines required to achieve LF elimination in a site made by the models matching each observed data point, we have also quantified the relative values of temporal data streams, including assessing optimal record lengths, for informing the current LF models. Our results provide important insights as to how best to use process models for understanding and generating predictions of parasite dynamics. They also highlight how site-specific longitudinal surveillance data coupled with models can be useful for providing information about system dynamics and hence for improving predictions of relevance to management decision-making.
The first result of major importance from our work is that models informed by data can significantly reduce predictive uncertainty and hence improve performance of the present LF models for guiding policy and management decision-making. Our results show that these improvements in predictive precision were consistent between the three models and across all three of our study sites, and can be very substantive with up to as much as 92% to 96% reductions in prediction variance obtained by the best data-constrained models in a site compared to the use of model-only predictions ( Table 2). The practical policy implications of this finding can also be gleaned from appraising the actual numerical ranges in the predictions made by each individual model for each of the modelling scenarios investigated here. In the case of EPI-FIL, the best data-informed model (scenario 3 in Peneng) gave an elimination prediction range of 7-12 years, while the corresponding model-only predictions for this site indicated a need for between 6-29 years of annual MDA (Table 2). These gains in information from using data to inform model parameters and hence predictions were even larger for the two stochastic models investigated here, viz. LYMFASIM and TRANSFIL, where ranges as wide as 7-28 years predicted by model-only scenarios were reduced to 9-14 years for the best data-informed models in the case of LYMFASIM for Kirare village, and from as broad as 8-48 years to 7-22 years respectively in the case of TRANSFIL for Peneng (Table 2). These results unequivocally indicate that if parasite transmission models are used unconstrained by data, i.e. based on general parameter values uninformed by local data, it would lead to the making of predictions that would be marked by uncertainties that are likely to be far too large to be meaningful for practical policy making. If managers are risk averse, this outcome will also mean their need to plan interventions for substantially much longer than necessary, with major implications for the ultimate cost of the programme. Note also that although statistically significant changes in the median years of MDA required to achieve LF elimination were observed for the best datainformed models for all the three LF model types in each site, these were relatively small compared to the large reductions seen in each model's predictive uncertainly (Table 2, Fig 3). This result highlights that the major gains from constraining the present models by data lies in improving their predictive certainty rather than in advancing their average behaviour. However, our preliminary analysis of model predictive accuracy suggests that the best data-constrained models may also be able to generate more accurate predictions of the impact of control ( Table 6), indicating that, apart from simply reducing predictive uncertainty, such models could additionally have improved capability for producing more reliable predictions of the outcomes of interventions carried out in a setting.
The iterative testing of the reduction in forecast uncertainty using mf surveillance data measured at time points proposed by the WHO (to support assessment of whether the threshold of 1% mf prevalence has been reached before implementation units can move to post-treatment surveillance [44]) has provided further insights into the relative value of these data for improving the predictive performance of each of the present LF models. Our critical finding here is that parameter uncertainty in all three LF models was similarly reduced by the assimilation of a few additional longitudinal data records (Table 2). In particular, we show that data streams comprising baseline + post-MDA 3 + post-MDA 5 (scenario 3) and those comprising baseline + post-MDA 5 data (scenario 4) best reduced parameter-based uncertainty in model projections of the impact of MDAs carried out in each study site irrespective of the models used. Although preliminary, a potential key finding is that the use of longer-term data additional to the data measured at the WHO proposed monitoring time points did not lead to a significant further reduction in parameter uncertainty (Table 4). Also, the finding that the WHO data scenarios 3 and 4 were adequate for constraining the present LF models appears not to be an artefact of variations in the MDA coverages observed between the three study sites (Fig 6). These results suggest that up to 5 years of post-MDA mf prevalence data are sufficient to constrain model predictions of the impact of LF interventions at a time scale that can go up to as high as 7 to 22 years depending on the site and model, and that precision may not improve any further if more new data are added ( Table 2, Table 4). Given that the WHO post-MDA LF infection monitoring protocol was developed for the purpose solely focussed on supporting the meeting of set targets (e.g. the 1% mf prevalence threshold) and not on a priori hypotheses regarding how surveillance data could be used also to understand the evolution and hence prediction of the dynamical parasitic system in response to management action, our results are entirely fortuitous with respect to the value of the current LF monitoring data for learning about the LF system and its extinction dynamics in different settings [31]. They do, nonetheless, hint at the value that coupling models to data may offer to inform general theory for guiding the collection and use of monitoring data in parasite surveillance programmes in a manner that could help extract maximal information about the underlying parasite system of interest.
Our assessment of whether the incremental increase in model predictive performance observed as a result of assimilating longitudinal data may be due to parameter constraining by the addition of data has shed intriguing new light on the impact that qualitative changes in dynamical system behaviour may have on parameter estimates and structure, and hence on the nature of the future projections of system change we can make from models. Our major finding in this regard is that even though the parameter space itself may not be overly constrained by the best data stream (scenario 3 in this case for Alagramam village), the magnitude and direction of parameter correlations, particularly those representing the relationship of different components of host immunity with various transmission processes, changed markedly between the shorter (scenario 1) and seemingly optimal data streams (scenario 3). This qualitative change in system behaviour induced by alteration in parameter interactions in response to perturbations has been shown to represent a characteristic feature of complex adaptive ecological systems, particularly when these systems approach a critical boundary [64][65][66]. This underscores yet another important reason to incorporate parameter information from data for generating sound system forecasts [67]. The finding that additional data beyond 5 years post-MDA did not appear to significantly improve model predictive performance in this regard suggests that pronounced change in LF parameter interactions in response to MDA interventions may occur generally around this time point for this parasitic disease, and that once in this parameter regime further change appears to be unlikely. This is an interesting finding, which not only indicates that coupling models to at least 5 years post-MDA will allow detection of the boundaries delimiting the primary LF parameter regions with different qualitative behaviour, but also that the current WHO monitoring protocol might be sufficient to allow this discovery of system change.
Although our principal focus in this study was in investigating the value of longitudinal data for informing the predictive performance of the current LF models, the results presented here have also underscored the existence of significant spatial heterogeneity in the dynamics of parasite extinction between the present sites ( Table 2, Fig 3). In line with our previous findings, this observed conditional dependency of systems dynamics on local transmission conditions means that timelines or durations of interventions required to break LF transmission (as depicted in Table 2) will also vary from site to site even under similar control conditions [3][4][5]21]. As we indicated before, this outcome implies that we vitally require the application of models to detailed spatio-temporal infection surveillance data, such as that exemplified by the data collected by countries in sentinel sites as part of their WHO-directed monitoring and evaluation activities, if we are to use the present models to make more reliable intervention predictions to drive policy and management decisions (particularly with respect to the durations of interventions required, need for switching to more intensified or new MDA regimens, and need for enhanced supplementary vector control) in a given endemic setting [64]. As we have previously pointed out, the development of such spatially adaptive intervention plans will require the development and use of spatially-explicit data assimilation modelling platforms that can couple geostatistical interpolation of model inputs (eg. ABR and/or sentinel site mf/ antigen prevalence data) with discovery of localized models from such data in order to produce the required regional or national intervention forecasts [5].
The estimated parameter and prediction uncertainties presented here are clearly dependent on the model-data fusion methodology and its implementation, and the cost function used to discover the appropriate models for a data stream [20]. While we have attempted to evaluate differences in individual model structures, their computer implementation, and the data assimilation procedures followed (e.g. sequential vs. simultaneous data assimilation), via comparing the collective predictions of the three models versus the predictions provided by each model singly, and show that these factors are unlikely to play a major role in influencing the current results, we indicate that future work must address these issues adequately to improve the initial methods we have employed in this work. Currently, we are examining the development of sequential Bayesian-based multi-model ensemble approaches that will allow better integration of each model's behaviour as well as better calculation of each model's transient parameter space at each time a new observation becomes available [30]. This work also involves the development of a method to fuse information from several indicators of infection (e.g. mf, antigenemia, antibody responses [21]) together to achieve a more robust constraining of the present models. As different types of data can act as mutual constraints on a model, we also expect that such multiindicator model-data fusion methods will additionally address the problem of equifinality, which is known to complicate the parameterization of complex dynamical models [24,68].
Of course, the ultimate test of the results reported here, viz. that LF models constrained by coupling to year 5 post-MDA data can provide the best predictions of timelines for meeting the 1% mf prevalence threshold in a site, is by carrying out the direct validation of our results against independent observations (as demonstrated by the preliminary validation study carried out here using the Dokan Tofa data (Tables 5 and 6)). We expect that data useful for performing these studies at scale may be available at the sentinel site level in the countries carrying out the current WHO-led monitoring programme. The present results indicate that access to such data, and to post-treatment surveillance data which are beginning to be assembled by many countries, is now a major need if the present LF models are to provide maximal information about parasite system responses to management and thus generate better predictions of system states for use in policy making and in judging management effectiveness in different spatiotemporal settings [24,31]. Given that previous modelling work has indicated that if the globally fixed WHO-proposed 1% mf prevalence threshold is insufficient to break LF transmission in every setting (and thus conversely leading to significant infection recrudescence [21]), the modelling of such spatio-temporal surveillance data will additionally allow testing if meeting this recommended threshold will indeed result in successfully achieving the interruption of LF transmission everywhere.