
Predicting the environmental suitability for onchocerciasis in Africa as an aid to elimination planning


Recent evidence suggests that, in some foci, elimination of onchocerciasis from Africa may be feasible with mass drug administration (MDA) of ivermectin. To achieve continental elimination of transmission, mapping surveys will need to be conducted across all implementation units (IUs) for which endemicity status is currently unknown. Using boosted regression tree models with optimised hyperparameter selection, we estimated environmental suitability for onchocerciasis at the 5 × 5-km resolution across Africa. To classify IUs that include environmentally suitable locations, we used receiver operating characteristic (ROC) analysis to identify an optimal suitability threshold concordant with locations where onchocerciasis has previously been detected. This threshold was then used to classify each IU as more suitable or less suitable based on the location within the IU with the largest mean prediction. Mean estimates of environmental suitability suggest that large areas across West and Central Africa, as well as focal areas of East Africa, are suitable for onchocerciasis transmission, consistent with the locations of current control and elimination-of-transmission efforts. The ROC analysis identified a mean environmental suitability index of 0·71 as the classification threshold. Of the IUs considered for mapping surveys, 50·2% exceed this threshold in at least one 5 × 5-km location. The formidable scale of data collection required to map onchocerciasis endemicity across the African continent presents an opportunity to use spatial data to identify areas likely to be suitable for onchocerciasis transmission.
National onchocerciasis elimination programmes may wish to consider prioritising these IUs for mapping surveys, because human resources, laboratory capacity, and programmatic schedules may constrain survey implementation and could delay MDA initiation in areas that would ultimately qualify.

Author summary

As of 2018, it was unknown if onchocerciasis transmission occurred among approximately 2 400 implementation units (IUs; typically, second administrative-level units, such as districts) considered potentially endemic. These IUs have either never been surveyed for onchocerciasis or historical data are not sufficient to define contemporary endemicity status. Given the large number of IUs for which baseline data collection is likely required to achieve continental elimination, there is a need to prioritise areas for surveys to ensure that those suitable for endemic transmission, and therefore potentially eligible for mass drug administration, are able to initiate interventions as soon as possible. We used boosted regression trees to predict environmental suitability for onchocerciasis, with corresponding measures of uncertainty. We summarized the fine-scale spatial predictions at the IU level by using receiver operating characteristic (ROC) curve analysis to identify a threshold that maximized agreement with the occurrence locations, and applied this threshold to identify IUs that may warrant prioritisation for mapping surveys. This analysis suggests that approximately half of the IUs considered for surveys could be classified as environmentally suitable for onchocerciasis. In order to develop an elimination strategy, many national onchocerciasis elimination programmes (NOEPs) need a mechanism to synthesise historical data to define priority areas for surveys.


Onchocerciasis (a disease caused by infection with Onchocerca volvulus) can lead to permanent blindness and skin disease, and over 99% of people infected reside in Africa [1]. Since the mid-1970s, vector control of the Simulium black fly vectors and, since the late-1980s, mass drug administration (MDA) with ivermectin, have been implemented (in combination or using MDA alone) with the goal of reducing onchocerciasis-related morbidity in areas of meso- to hyper-endemicity [2]. To date, over one billion ivermectin treatments have been administered by national onchocerciasis control programmes, in addition to millions of ivermectin treatments provided for the elimination of lymphatic filariasis (LF) as a public health problem [2]. Preventive chemotherapy with MDA (in which individuals residing in endemic areas are offered ivermectin) has been identified as the primary intervention for the control of onchocerciasis-related morbidity and elimination of onchocerciasis transmission [2]. Under the former Onchocerciasis Control Programme (OCP) in West Africa and the African Programme for Onchocerciasis Control (APOC), as well as onchocerciasis-control programmes supported by other partners, areas eligible for MDA were often identified by purposively sampling communities near known or suspected Simulium breeding sites. Prevalence of onchocerciasis was estimated via skin snip biopsy to detect the presence of microfilariae under standardised protocols (for OCP) or nodule palpation (onchocercoma), the latter leading to the rapid epidemiological mapping for onchocerciasis (REMO) tool [3]. This approach was generally successful at identifying foci with moderate to high levels of transmission (nodule prevalence ≥20%) [4], but is less sensitive in low-prevalence settings [5].

In 2012, the paradigm for onchocerciasis programmes began to shift from control to elimination [6]. Recent evidence from the Americas [7,8] and Africa [9,10] has shown that annual or semi-annual MDA reaching at least 80% of the eligible population may halt transmission after a period of variable duration (in part determined by baseline endemicity) [11], achieving local elimination in some foci [12]. To ultimately achieve elimination of transmission, therefore, endemic areas must first be correctly delineated to ensure timely initiation of interventions [13]. Various stakeholders are now exploring the feasibility of eliminating onchocerciasis across Africa [14], and several methods are under consideration to identify areas eligible for MDA. As of 2018, the Expanded Special Project for Elimination of Neglected Tropical Diseases (ESPEN) at the World Health Organization Africa Region (WHO-AFRO) identified approximately 2 400 implementation units (IUs), typically second administrative-level units (such as districts), for which endemicity status is uncertain due to a lack of current prevalence data. Of these, 1 651 IUs have never received ivermectin MDA, and 783 IUs currently receive ivermectin (plus albendazole) as part of LF programmes, or may be under post-MDA surveillance for LF [15]. The objective of this analysis was to predict to what extent these IUs of uncertain endemicity status were likely to be environmentally suitable for onchocerciasis. The results of this analysis could be used by national programmes and implementing partners to identify priority areas for mapping surveys.


Data inputs

We first constructed an analytical dataset of locations at which onchocerciasis has been detected (‘occurrences’). The case definition of an occurrence included any geo-referenced data point or polygon (i.e., areal data) for which at least one person tested positive using any of the following diagnostics to measure prevalence of onchocerciasis infection or onchocerciasis-related disease: nodule palpation; skin snip microscopy; onchocerciasis-related eye or skin disease; Ov16 seropositivity; or other diagnostic tests (see S1 Text). Two alternative case definitions were tested in a sensitivity analysis, described in S1 Text. Inputs were obtained from a systematic literature review of the prevalence of human onchocerciasis and onchocerciasis-related morbidity published from 1975 to 2017, detailed elsewhere [16]. Since the majority of onchocerciasis prevalence data were reported by national onchocerciasis control and elimination programmes for the purposes of programme monitoring, we also extracted prevalence data from the ESPEN [15] online portal. We further requested prevalence data collected under the OCP (operational in West Africa from 1974 to 2002) from its former Director, BA Boatin, PhD (personal communication, January 2019). Locations missing geographical information (i.e., latitude-longitude for community-level prevalence or a shapefile for areal data) were geo-referenced following the procedures described in Hill et al. [16]. Further details on the dataset are presented in S1 Text.

For the boosted regression tree (BRT) [17] model to compare against a set of control conditions, we must also provide it with examples of environmental conditions where onchocerciasis has not been detected. Since methods used to detect onchocerciasis included skin biopsy and nodule palpation, it is possible that reports of zero prevalence may not be true absences, particularly among areas of low prevalence, due to low sensitivity of these methods [18]. Rather than use reported absence data (which may include false negatives), we therefore randomly sampled background data to provide a contrast signal, re-implementing protocols from prior ecological and epidemiological species distribution analyses akin to pseudo-absence data [19]. Background points were sampled independently across 100 bootstraps and uniformly from within 100 km of the input data locations (polygon boundaries and point locations) such that the number of samples from each region (defined as a 100-km buffer from an occurrence location) matched the number of occurrence records associated with it. Since the 100-km regions would overlap with the IUs for which endemicity status was known, we did not sample from IUs considered endemic for onchocerciasis, and avoided sampling within polygonal locations in the occurrence dataset. We used a shapefile provided by ESPEN to identify IUs to conduct background sampling (S1 Text).
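The background-sampling step can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the study area is represented as a list of 5 × 5-km grid-cell centroids, each carrying a hypothetical `excluded` flag that marks cells inside endemic IUs or occurrence polygons, and it approximates uniform sampling within 100 km by drawing eligible cell centroids with replacement. Distances use the haversine formula on a spherical Earth.

```python
import math
import random
from dataclasses import dataclass

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical approximation


@dataclass(frozen=True)
class Cell:
    """Centroid of a 5 x 5-km grid cell (hypothetical representation)."""
    lat: float
    lon: float
    excluded: bool  # True if the cell lies in an endemic IU or an occurrence polygon


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def sample_background(occurrence_sites, cells, radius_km=100.0, seed=0):
    """Draw background cells so that each occurrence site's buffer contributes
    as many background points as the site has occurrence records.

    occurrence_sites: list of (lat, lon, n_records) tuples.
    """
    rng = random.Random(seed)
    background = []
    for lat, lon, n_records in occurrence_sites:
        # Eligible cells: inside the buffer and not in an excluded area.
        eligible = [
            c for c in cells
            if not c.excluded and haversine_km(lat, lon, c.lat, c.lon) <= radius_km
        ]
        if eligible:
            background.extend(rng.choices(eligible, k=n_records))
    return background


def bootstrap_backgrounds(occurrence_sites, cells, n_boot=100):
    """One independent background sample per bootstrap (seeded by its index)."""
    return [sample_background(occurrence_sites, cells, seed=b) for b in range(n_boot)]
```

Scanning every cell for every site is quadratic in the number of inputs; a continental run would more plausibly use a spatial index, but the matching of sample counts to occurrence records is the same.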


Ten covariates were included in the analysis based on known evidence of an association with presence of the vector. These included variables that represent climatic factors (aridity, precipitation, and daytime temperature), vegetation (enhanced vegetation index, tasseled cap brightness, and tasseled cap wetness), breeding sites near fast-flowing rivers (distance to rivers, slope, and elevation) and transmission occurring in rural areas (urbanicity). Details regarding covariate selection, source information, and visualisations are included in S2 Text and S4 Table.

Statistical analysis

Since the purpose of the analysis was to predict environmental suitability for onchocerciasis among countries for which onchocerciasis endemicity was uncertain, we excluded all IUs from countries considered entirely non-endemic (as reported by ESPEN): Algeria, Botswana, Cabo Verde, Eritrea, Eswatini, the Gambia, Lesotho, Madagascar, Mauritius, Mauritania, Seychelles, South Africa, and Zimbabwe (see S1 Text). The rationale for this exclusion was to avoid selecting locations in the background sample that would introduce extreme covariate values into the analysis (such as the Sahara desert). To predict environmental suitability for onchocerciasis, we fitted 100 optimised BRT models [17] and combined them into a final, bootstrap-aggregated BRT model. The BRT method models environmental suitability for onchocerciasis transmission as a suitability index (from 0 to 1) based on the values of environmental covariates at the locations corresponding to occurrence inputs. We first employed Bayesian parameter optimisation [17,20–22] to select values for three hyperparameters needed to implement the BRT method: the number of leaves of each learned tree (tree complexity), the shrinkage applied to the contribution of each successive tree (learning rate), and the number of trees. Additional details on hyperparameter selection and the BRT methodology are presented in S3 Text. Once final hyperparameter values were selected, we implemented 100 BRT models to generate the mean, lower 2·5th percentile, and upper 97·5th percentile predictions for every 5 × 5-km location. We used covariate values from the year 2016 to generate predictions across the entire geographical extent of the analysis.
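The summary step, which turns 100 bootstrap predictions per grid cell into a mean and a 2·5th–97·5th percentile interval, can be sketched as below. The BRT fitting itself is not reproduced, and the percentile rule (linear interpolation between order statistics, as in R's default `quantile` type 7) is an assumption; the text does not state which rule was used.

```python
import math
from statistics import fmean


def percentile(values, q):
    """q-th percentile (0-100) by linear interpolation between order statistics,
    the same rule as R's default quantile (type 7)."""
    xs = sorted(values)
    if not xs:
        raise ValueError("percentile of empty data")
    pos = (len(xs) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def summarise_bootstraps(predictions):
    """predictions[b][i] is the suitability (0-1) predicted by bootstrap b for cell i.

    Returns one (mean, lower 2.5th percentile, upper 97.5th percentile) tuple per cell.
    """
    n_cells = len(predictions[0])
    summary = []
    for i in range(n_cells):
        draws = [boot[i] for boot in predictions]
        summary.append((fmean(draws), percentile(draws, 2.5), percentile(draws, 97.5)))
    return summary
```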

The final models for the environmental suitability of onchocerciasis were evaluated based on the average root mean square error (RMSE) and area under the receiver operating characteristic curve (AUC) across bootstraps, and on the relative influence and marginal effect curve of each covariate. Together, these allowed us to assess the relative importance of each environmental factor within the model and its behaviour relative to all other covariates and to the covariate values associated with the input data.
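For readers less familiar with the two fit statistics, the sketch below computes them for one bootstrap's labels (1 = occurrence, 0 = background) and predicted suitability, then averages across bootstraps. AUC is computed in its rank (Mann-Whitney) form: the probability that a randomly chosen occurrence outscores a randomly chosen background point. This section does not specify which data each bootstrap was evaluated on, so the functions simply take whatever labels and scores they are given.

```python
import math


def auc(labels, scores):
    """Area under the ROC curve via the Mann-Whitney statistic: the share of
    (occurrence, background) pairs in which the occurrence scores higher;
    tied pairs count one half. O(P x N), fine for illustration."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def rmse(labels, scores):
    """Root mean square error between 0/1 labels and predicted suitability."""
    return math.sqrt(sum((y - s) ** 2 for y, s in zip(labels, scores)) / len(labels))


def mean_over_bootstraps(metric, evaluations):
    """evaluations: list of (labels, scores) pairs, one per bootstrap."""
    values = [metric(labels, scores) for labels, scores in evaluations]
    return sum(values) / len(values)
```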

We comply with the Guidelines for Accurate and Transparent Health Estimates Reporting (GATHER) [23] as outlined in the S1 Table. Complete information on data sources is available from the Global Health Data Exchange. Related statistical code for R 3·1·2 is available at GitHub. Maps were produced using ArcGIS Desktop 10.6.

Thresholding and summarisation by implementation unit

We used the existing dataset to identify an optimised threshold value for converting the modelled environmental suitability index into a binary presence/absence classification [24]. The threshold selected was the value that minimised the classification error of the model, that is, the value that best placed reported occurrences in locations predicted to be environmentally suitable and sampled background points in locations predicted to be less suitable. Using the receiver operating characteristic (ROC) curve, we evaluated the trade-off between sensitivity and specificity across possible threshold values from 0 to 1, selecting the value that minimised the distance to the point (0, 1) on the ROC plot. Since an IU (or area within an IU) would generally qualify for MDA if any location were found to have evidence of onchocerciasis transmission through primary data collection, we classified each IU based on the 5 × 5-km location with the largest mean prediction within it. We then estimated the posterior probability that each IU would include at least one location exceeding the identified threshold. We compared these classifications with the reported endemicity status available in the shapefile used to conduct the background sample. Additional details are presented in S9 Fig.
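The threshold search can be sketched as follows: each candidate threshold is scored by the Euclidean distance between its ROC point and the ideal corner (0, 1), that is √((1 − sensitivity)² + (1 − specificity)²), and the closest candidate is kept. The 0·01 grid of candidate values and the rule that a prediction at or above the threshold counts as suitable are assumptions made for illustration.

```python
import math


def sensitivity_specificity(labels, scores, threshold):
    """Treat score >= threshold as 'suitable'; labels are 1 (occurrence) or 0 (background)."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    return tp / n_pos, tn / n_neg


def closest_to_top_left(labels, scores, step=0.01):
    """Return (threshold, distance) minimising the distance to (0, 1) on the ROC plot.
    Ties keep the smallest threshold."""
    best_t, best_d = None, math.inf
    for k in range(round(1 / step) + 1):
        t = k * step
        sens, spec = sensitivity_specificity(labels, scores, t)
        d = math.hypot(1 - sens, 1 - spec)
        if d < best_d:
            best_t, best_d = t, d
    return best_t, best_d
```

Unlike Youden's index, which maximises sensitivity + specificity − 1, this criterion penalises the shortfall in each quantity quadratically, so it tends to favour thresholds that balance the two.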


The final dataset contained 13 382 occurrence records: 11 094 from ESPEN; 689 non-ESPEN data points (from BK Mayala, PhD, at the Demographic and Health Surveys (DHS) Program, personal communication); 863 from the systematic review; and 736 from OCP historical records. Of these records, 128 were georeferenced as polygons. A summary of the original data extracted, by diagnostic used, year, and country, is presented in S1 Fig and S2 and S3 Tables. After de-duplication across all sources, a total of 987 records reported skin snip examination, 12 161 nodule palpation, 155 onchocerciasis-related skin or eye disease, and 98 serological antibody testing (Ov16 ELISA or rapid diagnostic test, RDT). Fig 1 presents a map of the locations of the occurrence and background samples. Overall, the Democratic Republic of the Congo (DRC) and Nigeria together accounted for 50% of occurrence locations (3 933 locations (29%) in the DRC; 2 755 locations (21%) in Nigeria).

Fig 1.

Location of data sources: (a) Location of occurrence data points are visualised in blue. (b) Locations chosen for the background sample are mapped in red. The background sample represents the locations chosen to compare against the occurrence data points. IUs for which endemicity status was uncertain and mapping surveys are considered were excluded from selection. Due to the density of background points chosen, they appear as polygon data in the map. Countries in grey with hatch marks were excluded from the analysis based on a review of national endemicity status. Areas in grey only represent locations masked due to sparse population. Maps were produced using ArcGIS Desktop 10.6 and shapefiles to visualize administrative units are available at

Results at the 5 × 5-km resolution

The results of the environmental suitability model for Africa are presented in Fig 2. The model predicts higher environmental suitability across most of the southern half of West Africa, the DRC, South Sudan, and western Ethiopia, as well as in areas of Tanzania and Mozambique. The highest 5 × 5-km grid-cell-level predictions (>0·98) were observed in Equatorial Guinea, Nigeria, the DRC, and Cameroon. The mean predictions identify areas suitable for transmission that generally agree with previous model-based geostatistical analyses [25,26]. In West Africa, our model predicts high suitability in Liberia, northwest Ghana, and northern Guinea, where a previous model [25] predicted lower endemicity. In other regions, we predict high suitability in the area bordering the Republic of the Congo, the DRC, and Angola, as well as in eastern and southern Malawi, northern Nigeria, western Kenya, and eastern Central African Republic; these areas were predicted to have low prevalence in prior estimates [26]. The mean AUC was 0·90 and the mean RMSE was 0·38. The marginal effects of the covariates were highest for aridity (0·22), elevation (0·16), and precipitation (0·15). Covariate influence plots and results of the sensitivity analyses are included in S3–S9 Figs. Model results are available via and country-level map results are included in S1 Appendix.

Fig 2.

Environmental suitability predictions: Visualisation of (a) mean, (b) lower 95% uncertainty interval, and (c) upper 95% uncertainty interval. Environmental suitability index predicted by the model is bounded from 0% (low) to 100% (high). Countries in grey were excluded from the analysis. Countries in grey with hatch marks were excluded from the analysis based on a review of national endemicity status. Areas in grey only represent locations masked due to sparse population. Maps were produced using ArcGIS Desktop 10.6 and shapefiles to visualize administrative units are available at

Results at the IU level

The ROC analysis identified a mean prediction of 0·71 as the threshold that best agreed with the locations of occurrences; IUs with a mean prediction at or above this value in at least one grid cell were classified as environmentally suitable. We identified a total of 3 087 IUs that include at least one location for which the mean model results suggest environmental suitability for onchocerciasis transmission. We summarise this across the four types of IU endemicity status (as reported at the time of this analysis) in Table 1. Overall, when each IU is classified by the maximum mean grid-cell-level prediction within its boundaries, the environmental suitability predictions are concordant with areas previously identified as endemic or non-endemic. Of the IUs considered for elimination mapping, a total of 828 IUs (50·2%) had environmental suitability predictions that exceeded the 0·71 threshold in at least one grid cell. The majority of IUs with high environmental suitability are located in Angola (81 IUs), the DRC (191 IUs), Ethiopia (94 IUs), Kenya (89 IUs), and Nigeria (79 IUs). Among IUs currently under MDA with ivermectin for the purpose of LF elimination, 495 (63%) were predicted to have at least one grid cell exceeding 0·71. In Fig 3, we present the posterior probability that at least one location within each IU exceeds the threshold, incorporating uncertainty into the classification of ‘suitable’.
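The IU-level summary can be sketched as below, assuming a hypothetical mapping from each grid cell to its IU and a bootstrap-by-cell prediction matrix. Each IU is classified from the largest mean prediction among its cells, and the posterior probability is approximated as the share of bootstraps in which at least one of the IU's cells reaches the threshold. This reading of "posterior probability" follows the Fig 3 caption (estimated from the 100 BRT bootstraps) and is an interpretation, not the authors' code.

```python
from collections import defaultdict
from statistics import fmean


def classify_ius(cell_iu, predictions, threshold=0.71):
    """cell_iu[i]: IU identifier of grid cell i (hypothetical input).
    predictions[b][i]: suitability predicted by bootstrap b for cell i.

    Returns {iu: (suitable, probability)} where `suitable` uses the largest
    mean prediction in the IU and `probability` is the share of bootstraps in
    which any cell of the IU reaches the threshold.
    """
    cells_by_iu = defaultdict(list)
    for i, iu in enumerate(cell_iu):
        cells_by_iu[iu].append(i)

    n_boot = len(predictions)
    results = {}
    for iu, cells in cells_by_iu.items():
        # Mean across bootstraps for each cell, then the IU maximum.
        cell_means = [fmean([boot[i] for boot in predictions]) for i in cells]
        suitable = max(cell_means) >= threshold
        # Per bootstrap: does any cell of this IU reach the threshold?
        hits = sum(1 for boot in predictions if max(boot[i] for i in cells) >= threshold)
        results[iu] = (suitable, hits / n_boot)
    return results
```

The two outputs can diverge: an IU whose largest mean prediction sits just above 0·71 may still fall below the threshold in a substantial share of bootstraps, which is the uncertainty Fig 3 is meant to convey.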

Fig 3. Posterior probability that any location within an implementation unit (IU) exceeds the threshold for suitability.

The posterior probability (%) that an IU includes a location exceeding the 0·71 threshold used to identify areas of suitability is estimated from the 100 BRT bootstraps. Areas in red are less likely to have at least one location defined as suitable; areas in blue are more likely to include environmentally suitable locations. Countries in grey with hatch marks were excluded from the analysis based on a review of national endemicity status. Areas in grey only represent locations masked due to sparse population. Maps were produced using ArcGIS Desktop 10.6 and shapefiles to visualize administrative units are available at

Table 1. Comparison of implementation unit (IU) classification using reported endemicity versus modelled environmental suitability.

Within-IU differences in environmental suitability

While programmatic decisions typically occur at the IU level, we summarised the range of predictions within each IU to determine whether any IUs identified as suitable were the result of smaller areas of high predicted suitability, which would suggest possible foci of transmission rather than potential transmission across the entire IU. Among IUs that exceeded the 0·71 threshold, the range of predictions within IU borders was as large as 0·95, indicating a high degree of variation within some IUs. Only 20% of the IUs identified as suitable had mean predictions that exceeded the threshold across all locations within IU boundaries. The range of mean suitability is presented in S3 and S5 Figs.
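The within-IU comparison can be expressed in a few lines, assuming a hypothetical dictionary that holds, for each IU, the mean predictions of its grid cells. Among IUs classified as suitable, the sketch reports the range (maximum minus minimum) of mean suitability and the share of IUs in which every cell reaches the threshold, the quantity reported as 20% in the paragraph above.

```python
def within_iu_summary(mean_by_iu, threshold=0.71):
    """mean_by_iu: {iu: [mean prediction of each grid cell in the IU]}.

    Among IUs whose largest mean prediction reaches the threshold, return the
    within-IU range of predictions and the share of those IUs in which every
    cell reaches the threshold.
    """
    suitable = {iu: v for iu, v in mean_by_iu.items() if max(v) >= threshold}
    ranges = {iu: max(v) - min(v) for iu, v in suitable.items()}
    n_all_above = sum(1 for v in suitable.values() if min(v) >= threshold)
    share_all_above = n_all_above / len(suitable) if suitable else 0.0
    return ranges, share_all_above
```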


The environmental suitability model identified approximately half of the IUs currently considered for elimination mapping surveys as environmentally similar to areas for which onchocerciasis has been previously detected. These results suggest that a large proportion of IUs considered for mapping might be of lower priority for survey implementation, particularly for programmes in countries such as Ethiopia that may need to survey as many as 461 IUs. Using the results of this analysis, the national program in Ethiopia could prioritise data collection for the 94 IUs that exceeded the threshold. While not the primary target of inference, these results also suggest that over half of the IUs currently under ivermectin MDA for the purposes of elimination of LF as a public health problem may also be suitable for onchocerciasis. Given the goal of elimination of transmission, the duration of MDA required for onchocerciasis is longer than that implemented for LF. Surveys of onchocerciasis prevalence would be warranted in these IUs to determine whether MDA with ivermectin should continue; otherwise, the reductions in onchocerciasis prevalence achieved through LF programme interventions may be lost if MDA is interrupted.

While the model results do not completely rule out the need to conduct mapping surveys in areas predicted to have less suitability, they may enable prioritisation of survey implementation planning, assuming some constraints on survey effort in space and time. Standard guidelines for determining IU eligibility for MDA are currently under development, with methods such as purposive sampling of villages based on proximity to breeding sites (‘first-line villages’) and IU-level cluster random surveys under consideration. Regardless of the mapping strategy chosen to identify IUs warranting MDA, NOEPs will likely need information to prioritise IUs for implementation of surveys. Prioritisation of data collection activities, such as considering environmental suitability along with other factors such as proximity to endemic districts and presence of existing programme infrastructure, could result in more efficient resource deployment, particularly given the costs of fieldwork, seasonal constraints such as weather and other health system activities, as well as demands on lab capacity should confirmation of Ov16 RDT results with ELISA be required. Ultimately, prioritisation of IUs likely to be endemic would also enable more rapid scale-up of MDA (or other strategies such as ‘test-and-treat’ in areas co-endemic for Loa loa filariasis); once evidence of onchocerciasis transmission is available, donated ivermectin can be deployed. We recommend national programs compare these results with data on Loa loa prevalence to inform intervention strategies in co-endemic settings.

Environmental suitability predictions, when overlaid with maps or satellite imagery of settlements, may also be useful to NOEPs for other applications aside from IU-level decision making. They may wish to consider use of these results to identify areas for sampling within IUs, especially those with no first-line village or breeding site information. Among the IUs for which our estimates suggest environmental suitability is high, in settings such as the DRC, Malawi, and Nigeria, predictions were variable within individual IUs, suggesting that differences within IUs may require consideration during baseline survey site selection or could inform sampling strategies. Once mapping for onchocerciasis in these areas is conducted, newly collected data can be used to validate this model’s performance (akin to a natural hold-out) and then subsequently included in future updates to improve the quality of predictions.


With over 13 000 occurrence point inputs, we exceed the number of inputs used for other global analyses of environmental suitability for disease transmission (e.g., dengue [27] and leishmaniasis [28]). The method employed to select values for the BRT hyperparameters avoids the computational demands of implementing a deterministic grid search and is superior to a random search, combining faster computation with iterative selection of the best hyperparameter values [29]. Conceptually, the choice to employ a background sample rather than observed absence data is analogous to certain case-control designs, where the controls are selected to represent the exposure distribution among the source population that gave rise to the cases. In this context, the background sample represents covariate patterns that describe onchocerciasis-endemic countries generally and allows the model to test for associations between covariate patterns identified across all occurrence locations within that setting. Further, since the background sample replaces the observed absence data (locations where no individuals tested positive), we avoid bias from diagnostic procedures with poor sensitivity and specificity, such as nodule palpation; some observations based on nodule palpation may have been false negatives [30] or false positives. By comparing the IUs known to be currently or historically endemic and those known to be non-endemic with the results of this analysis, we showed strong agreement, suggesting that the model has accurately characterised IUs for which knowledge of endemicity status exists. Finally, this analysis transforms detailed 5 × 5-km-level predictions into IU-level results, the unit of programmatic decision making.


The primary limitation of this analysis is that we predict mean values of environmental suitability as an index, a measure not directly comparable to other quantities. This analysis can only indicate how similar a location may be (relative to the covariates included in the analysis) to other locations where onchocerciasis has previously been detected; it does not predict the magnitude of infection. We define suitable regions to be any area with a mean estimate exceeding the optimal threshold. In this way, IUs with ‘high suitability’ are defined relative to each bootstrap and not relative to one another. Other thresholds could be used to aggregate these results to characterise individual IUs. Second, it is possible that covariate patterns identified as suitable for onchocerciasis transmission are biased towards IUs of higher prevalence, as data collection for onchocerciasis control prioritised identification of areas with greater morbidity, generally associated with higher levels of infection prevalence. It is also possible the model will predict high environmental suitability among locations similar to onchocerciasis-endemic settings even if the location lacks the vector or the parasite, or among settings where transmission has been eliminated. We encourage NOEPs to consider the model results alongside programmatic evidence. Third, it is also possible that the covariate patterns at the 5 × 5-km resolution do not adequately capture the specific ecological niche for Simulium breeding sites in all settings, and the vector can travel beyond a 5-km range [31]. Simulium abundance data are not available for the entire African continent, nor are they available per unique species; we therefore rely on other covariates as proxies to represent ecological conditions that might be suitable to the vector. In some settings, smaller rivers may serve as viable breeding sites, and future analyses should consider more detailed hydrological data sources.
Matching covariate values for temperature, precipitation, enhanced vegetation index, urbanicity, tasseled cap wetness, and tasseled cap brightness to input data by year of data collection was not possible for occurrence data collected before 2000 (approximately 20% of the inputs), which may also introduce bias for areas where substantial changes have occurred, although annual mean values would be less sensitive to seasonal variation from year to year. We are unable to account for temporal shifts in river locations, but note that calculating distance to rivers at the 5 × 5-km spatial scale likely reduces the potential error. Remote sensing methods have been used to generate a spectral signal to identify potential breeding sites at a much finer spatial scale (0·6 m²) [32], but it would be computationally infeasible to use those inputs for a continental analysis. Fourth, there may be differences in the ecological niche of onchocerciasis in West Africa compared to Central or East Africa driven by forest or savannah habitats [33]. Due to the limited data available for West Africa (beyond Nigeria), conducting separate sub-continental analyses was not feasible, particularly for the former OCP areas. Finally, BRT models are highly sensitive to the selection of inputs; results may vary by the case definition of an occurrence. Our sensitivity analysis (see S3–S6 Figs) did not produce qualitatively different mean predictions. We further did not exclude occurrence inputs reporting Ov16 seropositivity, which may not represent contemporary transmission in cases where only one or two individuals test positive. Inclusion of nodule palpation data could also be subject to bias in areas of low endemicity [30]. Fewer than 1% of the input records reported using serological tests, and excluding these data from preliminary models resulted in negligible differences in the results (results not shown).
It was also not possible to review the original source documentation of data reported via the ESPEN portal or historical data from OCP; bias may have been introduced if those sources contain inaccuracies. There were also not sufficient entomological monitoring data available to compare against all human prevalence data to include evidence of transmission in the vector population as part of our case definition of an occurrence. The background sample cannot account for possible bias in the occurrence data which may have been selected preferentially with respect to locations suspected or known to be endemic. For this reason, we do not recommend the model results be used to exclude any location from mapping to determine program eligibility for MDA or other interventions, but rather to use the model as a tool for prioritization and comparison alongside other data sources.

There are additional complexities that programmes should consider if these model results are used for planning. In settings where population movement due to factors such as conflict, instability, or seasonal migration results in transmission of infection occurring at locations distant from settlements, the results of survey mapping and the model estimates may be discordant. It will be important for NOEPs to characterise such communities, particularly if diagnostics that detect Ov16 seropositivity are used in adult populations, as individuals may test positive if they have been exposed to onchocerciasis earlier in life at other locations. Since the model can only identify locations for which covariate patterns are similar to areas for which onchocerciasis has previously been detected, we would encourage NOEPs to interpret these model results as a mechanism to prioritise surveys, not as a substitute for primary data collection.


The shift from morbidity control in meso- to hyper-endemic areas to elimination of transmission at the pan-African scale [13] provides a unique opportunity to develop and validate a model to help NOEPs identify endemic IUs more efficiently. Our analysis expands upon prior work [25,26] to incorporate more data sources, generate predictions for all countries suspected or known to be endemic for onchocerciasis (not only areas defined by regional control programmes), and translate detailed spatial predictions into IU-level results for use in programme decision-making. The large scale of data collection required across the African continent to achieve elimination of transmission can benefit from modelled estimates of environmental suitability to facilitate programme planning. If the IUs most likely to sustain onchocerciasis transmission can be surveyed earlier, MDA may be initiated sooner in those areas. Evidence from settings where local elimination of transmission has already been achieved suggests that at least 10 to 15 years of high-coverage annual or twice-yearly MDA may be required [11,12,34]. To reach elimination of onchocerciasis transmission, it is imperative that districts in need of MDA are identified as quickly as possible.

Supporting information

S1 Fig. Data coverage by year.

Here we visualise the volume of data used in the analysis by country and year. Larger circles indicate more data inputs. ‘NA’ indicates records for which no specific year was reported (eg, records dated only as ‘pre-2000’).


S2 Fig. Illustration of covariate values for year 2000.

Maps were produced using ArcGIS Desktop 10.6.


S3 Fig. Environmental suitability of onchocerciasis including locations that have received MDA for which no pre-intervention data are available.

This plot shows suitability predictions from green (low = 0%) to pink (high = 100%), representing those areas where environmental conditions are most similar to prior pathogen detections. Countries in grey with hatch marks were excluded from the analysis based on a review of national endemicity status. Areas in grey only represent locations masked due to sparse population. Maps were produced using ArcGIS Desktop 10.6 and shapefiles to visualize administrative units are available at


S4 Fig. Environmental suitability prediction uncertainty including locations that have received MDA for which no pre-intervention data are available.

This plot shows uncertainty associated with environmental suitability predictions colored from blue to red (least to most uncertain). Countries in grey with hatch marks were excluded from the analysis based on a review of national endemicity status. Areas in grey only represent locations masked due to sparse population. Maps were produced using ArcGIS Desktop 10.6 and shapefiles to visualize administrative units are available at


S5 Fig. Environmental suitability of onchocerciasis excluding morbidity data.

This plot shows suitability predictions from green (low = 0%) to pink (high = 100%), representing those areas where environmental conditions are most similar to prior pathogen detections. Countries in grey with hatch marks were excluded from the analysis based on a review of national endemicity status. Areas in grey only represent locations masked due to sparse population. Maps were produced using ArcGIS Desktop 10.6 and shapefiles to visualize administrative units are available at


S6 Fig. Environmental suitability prediction uncertainty excluding morbidity data.

This plot shows uncertainty associated with environmental suitability predictions colored from blue to red (least to most uncertain). Countries in grey with hatch marks were excluded from the analysis based on a review of national endemicity status. Areas in grey only represent locations masked due to sparse population.


S7 Fig. Covariate Effect Curves for all onchocerciasis occurrences (measures of infection prevalence and disability).

On the right set of axes we show the frequency density of occurrences across 20 bins of covariate values on the horizontal axis. The left set of axes shows the effect of each covariate on the model: the mean effect is plotted as a black line, and its uncertainty is represented by the upper and lower confidence interval bounds plotted in dark grey. The figures show the fit for each covariate relative to the data at specific values of that covariate.


S8 Fig. Covariate Effect Curves for all onchocerciasis occurrences (measures of infection prevalence and disability).

On the right set of axes we show the frequency density of occurrences across 20 bins of covariate values on the horizontal axis. The left set of axes shows the effect of each covariate on the model: the mean effect is plotted as a black line, and its uncertainty is represented by the upper and lower confidence interval bounds plotted in dark grey.
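The two summaries shown in these covariate effect plots can be sketched as follows. This is a minimal illustration under stated assumptions, not the analysis code: it assumes each of the 100 BRT models yields an effect curve evaluated on a shared grid of covariate values, and that the plotted bounds are the 2·5th and 97·5th percentiles across models (the captions do not state how the bounds were computed). The function names are hypothetical.

```python
from statistics import mean


def frequency_density(values, n_bins=20):
    """Frequency density of covariate values at occurrence locations over n_bins equal-width bins.

    The returned densities integrate to 1 over [min(values), max(values)].
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("covariate values must span a non-zero range")
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        # The maximum value lands exactly on the upper edge, so clamp it into the last bin.
        idx = min(int((v - lo) / width), n_bins - 1)
        counts[idx] += 1
    return [c / (len(values) * width) for c in counts]


def percentile(xs, q):
    """Linearly interpolated percentile of xs, with q in [0, 100]."""
    s = sorted(xs)
    pos = (len(s) - 1) * q / 100
    i = int(pos)
    j = min(i + 1, len(s) - 1)
    return s[i] + (s[j] - s[i]) * (pos - i)


def summarise_effects(curves, lower_q=2.5, upper_q=97.5):
    """Summarise an ensemble of effect curves evaluated on a shared covariate grid.

    curves[m][k] is the effect estimated by model m at grid point k (hypothetical layout).
    Returns one (mean, lower, upper) tuple per grid point.
    """
    summary = []
    for k in range(len(curves[0])):
        at_k = [curve[k] for curve in curves]
        summary.append((mean(at_k), percentile(at_k, lower_q), percentile(at_k, upper_q)))
    return summary
```

Passing the 100 per-model curves to `summarise_effects` would give the black mean line and the two dark grey bounds at each grid point; percentile bounds across an ensemble are one common choice, and a normal-approximation interval would be an equally plausible reading of the captions.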


S9 Fig. ROC analysis for threshold.

Results of the receiver operating characteristic (ROC) curve analysis are presented below, with false positive rate (FPR) on the x-axis and true positive rate (TPR) on the y-axis. The red dot marks the point on the curve corresponding to the threshold that most closely agreed with the input data. Using the mean predictions of the 100 BRT models, we estimated the optimal threshold, i.e. the value that maximised agreement between the predictions and the occurrence inputs (considered true positives), as 0·71.
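The threshold selection described in this caption can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes that mean predictions at occurrence locations are positives and mean predictions at background locations are negatives, and it uses Youden's J statistic (TPR − FPR) as the agreement criterion, which the caption does not specify ([24] compares several criteria). All function names and input values are hypothetical.

```python
from statistics import mean


def ensemble_mean(predictions_by_model):
    """Mean suitability per location; predictions_by_model[m][i] is model m's prediction at location i."""
    return [mean(values) for values in zip(*predictions_by_model)]


def roc_points(pos_scores, neg_scores, thresholds):
    """(threshold, FPR, TPR) for each candidate threshold; a score >= threshold counts as 'suitable'."""
    points = []
    for t in thresholds:
        tpr = sum(s >= t for s in pos_scores) / len(pos_scores)
        fpr = sum(s >= t for s in neg_scores) / len(neg_scores)
        points.append((t, fpr, tpr))
    return points


def youden_threshold(pos_scores, neg_scores):
    """Threshold maximising Youden's J = TPR - FPR (assumed criterion; ties go to the lowest threshold)."""
    candidates = sorted(set(pos_scores) | set(neg_scores))
    best = max(roc_points(pos_scores, neg_scores, candidates), key=lambda p: p[2] - p[1])
    return best[0]


# Hypothetical mean predictions at occurrence (positive) and background (negative) locations.
occurrence = [0.9, 0.8, 0.75, 0.6]
background = [0.2, 0.3, 0.65, 0.7]
print(youden_threshold(occurrence, background))  # 0.75
```

Once a threshold is chosen, an IU would be classified as more suitable if the largest mean prediction among its 5 × 5-km locations exceeds it.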


S1 Table. Guidelines for Accurate and Transparent Health Estimates Reporting (GATHER) checklist.


S2 Table. Total number of occurrence data classified as point and polygon inputs by diagnostic.

We present the total number of occurrence points extracted from the input data sources by diagnostic type. ‘Other diagnostics’ include: the DEC patch test; Knott’s method (Mazzotti test); two types of loop-mediated isothermal amplification (LAMP) assay; blood smears; and urine tests.


S3 Table. Total number of occurrence data classified as point and polygon inputs by location.


S1 Text. Details outlining construction of occurrence dataset.


S3 Text. Boosted regression tree methodology additional details.


S1 Appendix. Country-level maps and data results.

Maps were produced using ArcGIS Desktop 10.6 and shapefiles to visualize administrative units are available at



Data accessed via the ESPEN portal belong to the Ministries of Health and we would like to acknowledge the national onchocerciasis control and elimination programmes for making these data available.


1. World Health Organization. Progress report on the elimination of human onchocerciasis, 2016–2017. Wkly Epidemiol Rec 2017; 45: 681–94. pmid:29130679
2. Lawrence J, Sodahlon YK, Ogoussan KT, Hopkins AD. Growth, challenges, and solutions over 25 years of Mectizan and the impact on onchocerciasis control. PLoS Negl Trop Dis 2015; 9: e0003507. pmid:25974081
3. Noma M, Nwoke BEB, Nutall I, et al. Rapid epidemiological mapping of onchocerciasis (REMO): its application by the African Programme for Onchocerciasis Control (APOC). Ann Trop Med Parasitol 2002; 96 Suppl 1: S29–39.
4. Duerr HP, Raddatz G, Eichner M. Diagnostic value of nodule palpation in onchocerciasis. Trans R Soc Trop Med Hyg 2008; 102: 148–54. pmid:18082234
5. Vlaminck J, Fischer PU, Weil GJ. Diagnostic tools for onchocerciasis elimination programs. Trends Parasitol 2015; 31: 571–82. pmid:26458784
6. Mackenzie CD, Homeida MM, Hopkins AD, Lawrence JC. Elimination of onchocerciasis from Africa: possible? Trends Parasitol 2012; 28: 16–22. pmid:22079526
7. Sauerbrey M, Rakers LJ, Richards FO. Progress toward elimination of onchocerciasis in the Americas. Int Health 2018; 10: i71–8. pmid:29471334
8. Gonzalez RJ, Cruz-Ortiz N, Rizzo N, et al. Successful interruption of transmission of Onchocerca volvulus in the Escuintla-Guatemala focus, Guatemala. PLoS Negl Trop Dis 2009; 3: e404. pmid:19333366
9. Zarroug IMA, Hashim K, ElMubark WA, et al. The first confirmed elimination of an onchocerciasis focus in Africa: Abu Hamed, Sudan. Am J Trop Med Hyg 2016; 95: 1037–40. pmid:27352878
10. Lakwo TL, Garms R, Rubaale T, et al. The disappearance of onchocerciasis from the Itwara focus, western Uganda after elimination of the vector Simulium neavei and 19 years of annual ivermectin treatments. Acta Trop 2013; 126: 218–21. pmid:23458325
11. Coffeng LE, Stolk WA, Hoerauf A, et al. Elimination of African onchocerciasis: modeling the impact of increasing the frequency of ivermectin mass treatment. PLoS ONE 2014; 9: e115886. pmid:25545677
12. Verver S, Walker M, Kim YE, et al. How can onchocerciasis elimination in Africa be accelerated? Modeling the impact of increased ivermectin treatment frequency and complementary vector control. Clin Infect Dis 2018; 66: S267–74. pmid:29860291
13. Cantey PT, Roy SL, Boakye D, et al. Transitioning from river blindness control to elimination: steps toward stopping treatment. Int Health 2018; 10: i7–13. pmid:29471338
14. Lawrence J, Sodahlon YK. Onchocerciasis: the beginning of the end. Int Health 2018; 10: i1–2. pmid:29471347
15. World Health Organization: Regional Office for Africa. Onchocerciasis. ESPEN. 2019. (accessed May 14, 2019).
16. Hill E, Hall J, Letourneau ID, et al. A database of geopositioned onchocerciasis prevalence data. Sci Data 2019; 6: 1–6. pmid:30647409
17. Elith J, Leathwick JR, Hastie T. A working guide to boosted regression trees. J Anim Ecol 2008; 77: 802–13. pmid:18397250
18. Unnasch TR, Golden A, Cama V, Cantey PT. Diagnostics for onchocerciasis in the era of elimination. Int Health 2018; 10: i20–6. pmid:29471336
19. Phillips SJ, Dudík M, Elith J, et al. Sample selection bias and presence-only distribution models: implications for background and pseudo-absence data. Ecol Appl 2009; 19: 181–97. pmid:19323182
20. Head T, MechCoder, Louppe G, et al. scikit-optimize/scikit-optimize: v0.5.2. Zenodo, 2018.
21. Shahriari B, Swersky K, Wang Z, Adams RP, de Freitas N. Taking the human out of the loop: a review of Bayesian optimization. Proc IEEE 2016; 104: 148–75.
22. Friedman JH. Stochastic gradient boosting. Comput Stat Data Anal 2002; 38: 367–78.
23. Stevens GA, Alkema L, Black RE, et al. Guidelines for Accurate and Transparent Health Estimates Reporting: the GATHER statement. Lancet 2016; 388: e19–23.
24. Liu C, Berry PM, Dawson TP, Pearson RG. Selecting thresholds of occurrence in the prediction of species distributions. Ecography 2005; 28: 385–93.
25. O’Hanlon SJ, Slater HC, Cheke RA, et al. Model-based geostatistical mapping of the prevalence of Onchocerca volvulus in West Africa. PLoS Negl Trop Dis 2016; 10: e0004328. pmid:26771545
26. Zouré HG, Noma M, Tekle AH, et al. The geographic distribution of onchocerciasis in the 20 participating countries of the African Programme for Onchocerciasis Control: (2) pre-control endemicity levels and estimated number infected. Parasit Vectors 2014; 7: 326. pmid:25053392
27. Bhatt S, Gething PW, Brady OJ, et al. The global distribution and burden of dengue. Nature 2013; 496: 504–7. pmid:23563266
28. Pigott DM, Bhatt S, Golding N, et al. Global distribution maps of the leishmaniases. eLife 2014; 3: e02851. pmid:24972829
29. Snoek J, Larochelle H, Adams RP. Practical Bayesian optimization of machine learning algorithms. In: Pereira F, Burges CJC, Bottou L, Weinberger KQ, eds. Advances in Neural Information Processing Systems 25. Curran Associates, Inc., 2012: 2951–59.
30. Coffeng LE, Pion SDS, O’Hanlon S, et al. Onchocerciasis: the pre-control association between prevalence of palpable nodules and skin microfilariae. PLoS Negl Trop Dis 2013; 7. pmid:23593528
31. Thompson BH. Studies on the flight range and dispersal of Simulium damnosum (Diptera: Simuliidae) in the rain-forest of Cameroon. Ann Trop Med Parasitol 1976; 70: 343–54. pmid:971003
32. Jacob BG, Novak RJ, Toe LD, et al. Validation of a remote sensing model to identify Simulium damnosum s.l. breeding sites in sub-Saharan Africa. PLoS Negl Trop Dis 2013; 7: e2342. pmid:23936571
33. Choi Y-J, Tyagi R, McNulty SN, et al. Genomic diversity in Onchocerca volvulus and its Wolbachia endosymbiont. Nat Microbiol 2016; 2: 16207. pmid:27869792
34. Walker M, Stolk WA, Dixon MA, et al. Modelling the elimination of river blindness using long-term epidemiological and programmatic data from Mali and Senegal. Epidemics 2017; 18: 4–15. pmid:28279455