Model-Based Geostatistical Mapping of the Prevalence of Onchocerca volvulus in West Africa

Background The initial endemicity (pre-control prevalence) of onchocerciasis has been shown to be an important determinant of the feasibility of elimination by mass ivermectin distribution. We present the first geostatistical map of microfilarial prevalence in the former Onchocerciasis Control Programme in West Africa (OCP) before commencement of antivectorial and antiparasitic interventions. Methods and Findings Pre-control microfilarial prevalence data from 737 villages across the 11 constituent countries in the OCP epidemiological database were used as ground-truth data. These 737 data points, plus a set of statistically selected environmental covariates, were used in a Bayesian model-based geostatistical (B-MBG) approach to generate a continuous surface (at pixel resolution of 5 km x 5km) of microfilarial prevalence in West Africa prior to the commencement of the OCP. Uncertainty in model predictions was measured using a suite of validation statistics, performed on bootstrap samples of held-out validation data. The mean Pearson’s correlation between observed and estimated prevalence at validation locations was 0.693; the mean prediction error (average difference between observed and estimated values) was 0.77%, and the mean absolute prediction error (average magnitude of difference between observed and estimated values) was 12.2%. Within OCP boundaries, 17.8 million people were deemed to have been at risk, 7.55 million to have been infected, and mean microfilarial prevalence to have been 45% (range: 2–90%) in 1975. Conclusions and Significance This is the first map of initial onchocerciasis prevalence in West Africa using B-MBG. Important environmental predictors of infection prevalence were identified and used in a model out-performing those without spatial random effects or environmental covariates. Results may be compared with recent epidemiological mapping efforts to find areas of persisting transmission. These methods may be extended to areas where data are sparse, and may be used to help inform the feasibility of elimination with current and novel tools.


Methods and Findings
Pre-control microfilarial prevalence data from 737 villages across the 11 constituent countries in the OCP epidemiological database were used as ground-truth data. These 737 data points, plus a set of statistically selected environmental covariates, were used in a Bayesian model-based geostatistical (B-MBG) approach to generate a continuous surface (at pixel resolution of 5 km x 5km) of microfilarial prevalence in West Africa prior to the commencement of the OCP. Uncertainty in model predictions was measured using a suite of validation statistics, performed on bootstrap samples of held-out validation data. The mean Pearson's correlation between observed and estimated prevalence at validation locations was 0.693; the mean prediction error (average difference between observed and estimated values) was 0.77%, and the mean absolute prediction error (average magnitude of difference between observed and estimated values) was 12.2%. Within OCP boundaries, 17.8 million

Introduction
Human onchocerciasis (river blindness) is a parasitic infection caused by the filarial nematode Onchocerca volvulus. Onchocerciasis belongs to the group of neglected tropical diseases (NTDs). In January 2012, the London Declaration on NTDs [1], inspired by the World Health Organization (WHO) Road Map to overcome the impact of these diseases [2], committed to sustain, expand and extend the necessary supply of drugs and other interventions to help control onchocerciasis throughout Africa, and achieve elimination in selected African countries by 2020, with global elimination/eradication targeted by 2040 [1,2,3]. The life cycle of O. volvulus depends on transmission by biting flies of the genus Simulium. Persistent infection can lead to gradual loss of visual acuity, eventually leading to blindness, and to the development of a variety of severe and debilitating skin lesions [4,5]. The infection is also a cause of excess human mortality both indirectly through blindness and directly through (still poorly understood mechanisms caused by) heavy infection in sighted individuals [6,7].
Onchocerciasis has previously been highly prevalent across much of sub-Saharan Africa and in foci in Latin America and the Yemen [8]. Under the umbrella of the WHO, the Onchocerciasis Control Programme in West Africa (OCP) deployed large-scale antivectorial and/or antiparasitic interventions against onchocerciasis from 1975 to 2002 to control the disease in initially 7 (original area) and finally 11 (including extension areas) West African countries ( Fig  1A) [9]. The implementation of vector control, by weekly larviciding of Simulium damnosum s.l. breeding sites for a period at least as long as the estimated longevity of the parasite, led to the interruption of transmission across large swathes of the OCP area [10,11]. From 1990 onwards, vector control began to be stopped in those basins that had received at least 14 years of larviciding, and in which the epidemiological and entomological trends were satisfactory [12]. However, because of invasion by infective flies, some rivers on the border of the original area continued to be treated beyond the hypothetical period of 14 years, and up to the time the extension areas were set up. Although onchocerciasis ceased to constitute a public health problem in the original OCP area, transmission persisted in others, and towards the end of the OCP in 2002, special intervention zones were established to continue larviciding and/or to intensify distribution of the microfilaricidal drug ivermectin [13].
Currently, national control programmes in the former OCP and the WHO African Programme for Onchocerciasis Control (APOC) are shifting their focus from morbidity control to elimination of the infection where feasible through community-directed treatment with ivermectin (CDTI). Using 15-17 years of annual (or biannual) ivermectin distribution in the absence of vector control, elimination has been documented in some OCP foci in Mali and Senegal [14,15], and some APOC foci in Nigeria [16], providing proof-of-principle for the goals set by the WHO for elimination of onchocerciasis in selected African countries by the year 2020 [2]. However, a key question that needs urgent investigation is where else will it be feasible to achieve the 2020 goals in the former OCP and APOC areas using current intervention strategies, and if not, by when these goals could be achieved, or what other (novel or complementary) strategies are required.
Among the main determinants of the duration of ivermectin-based control and the feasibility of its elimination is the initial, pre-control, level of endemicity [17,18]. Endemicity levels are usually categorised in terms of the community prevalence of infection with O. volvulus microfilariae, with the community microfilarial load (CMFL) sometimes also used. (A highly non-linear relationship exists between infection prevalence and infection intensity, such that at high infection prevalence the CMFL may vary widely.) CMFL is calculated as the geometric mean number of microfilariae per skin snip in a cohort of adults aged 20 years and over, and is an indicator of the intensity of infection [19]. CMFL is a more accurate indicator of the true epidemiological situation in a population than prevalence, as for the reasons mentioned earlier there can be substantial variability in CMFL values at high prevalences. However, prevalence data are more widely available. Microfilarial prevalence is positively correlated with transmission intensity, measured in terms of the annual transmission potential (ATP)-the yearly number of infective larvae of O. volvulus potentially received by a person maximally exposed to simuliid bites in the same community, although the relationship between these two variables is also strongly non-linear [20,21].
Therefore, and to further understanding of the feasibility of onchocerciasis elimination, we embarked on the task of mapping the distribution of the baseline microfilarial prevalence in the former OCP area. Recent modelling studies [18,22], have indicated that reaching the operational elimination thresholds suggested by APOC [23] may be feasible, with an annual ivermectin distribution strategy, in areas with a baseline microfilarial prevalence less than 60% (the prevalence value above which there is hyperendemicity), provided that high levels of therapeutic coverage and treatment compliance are maintained. By contrast, in initially highly hyperendemic areas (prevalence of 80% or above), biannual ivermectin distribution or deployment of complementary strategies (e.g. focal vector control if possible; delivery of anti-wolbachial therapies [24], or others [25]) may be necessary. Mapping of infection prevalence is of importance not only for targeting current or novel control efforts appropriately, but also, once local elimination has been achieved or is deemed possible to achieve within proposed time frameworks, to predict areas at risk of recrudescence from geographically contiguous areas still harbouring infection. Finally, mapping the baseline prevalence can help to determine the population at risk and infected prior to the inception of control interventions, essential denominators to calculate the initial burden of disease and its changes as a result of such interventions [26].
The production of continuous prevalence maps is increasingly using model-based geostatistics (MBG) [27], often employing Bayesian inference (B-MBG) for spatial prediction and robust characterisation of uncertainty surrounding those predictions [28]. Detailed prevalence maps have been presented for a variety of directly-and indirectly-transmitted human helminthiases, e.g. [29][30][31][32], and vector-borne diseases, e.g. [28,[33][34][35], including lymphatic filariasis [36]. Zouré et al. (2014) recently presented a model-based geostatistical map of the prevalence of onchocerciasis in countries covered under APOC [37], based on nodule prevalence data obtained from Rapid Epidemiological Mapping of Onchocerciasis (REMO). This map marks substantial progress in mapping the pre-control distribution of onchocerciasis in APOC areas, but does not consider the potential influence of environmental covariates affecting onchocerciasis distribution, particularly in areas still lacking ground-truth data. In the study presented in this paper, we map the pre-control distribution of onchocerciasis-based on skin snip data for microfilarial prevalence-in the area of West Africa formerly covered by the OCP, whilst also extending the geostatistical methods to consider and incorporate important environmental explanatory covariates.
The overall aims of this study are to: a) present the first geostatistical map of human onchocerciasis (microfilarial) prevalence at the inception of the former OCP in West Africa; b) identify important environmental covariates that aid estimation in locations for which no data were collected by the OCP; c) to characterise areas where high transmission occurred prior to control (areas of initially high onchocerciasis endemicity), and d) estimate the number of people living in the OCP area that were at risk and infected with O. volvulus prior to the commencement of the control interventions.

Methods
The methods used in the epidemiological surveys of the OCP have been previously described [38]. At each survey a complete census of the village was conducted, and approximately 84% of persons enumerated in the census were examined [39]. The countries participating in the OCP signed a memorandum of agreement that covered all issues pertaining to the operations and included clearance for epidemiological, parasitological, and ophthalmological surveys. Additionally, a committee consisting of the Chief of Units of OCP ensured that the plans and the operational phases of the OCP. Until 1987 the method of control was almost exclusively anti-vectorial, through larviciding of vector breeding habitats. From 1987 ivermectin was used extensively, and almost exclusively in the Western Extension zones. C) Locations of the village surveys which met the criteria for inclusion in this study. A total of 737 survey sites were selected, from 5,817 surveys. The survey data displayed are coloured by endemicity class of crude microfilarial prevalence in the population aged !5 years. For the cut-off values of microfilarial prevalence corresponding to each endemicity category see Main Text and legend of Fig 3B. methodology of work were correctly followed by the technicians in the field. Communities were free to participate in the taking of skin snip samples. Therefore, our study satisfies the requirements for ethical clearance within the memorandum and the data were provided by BAB (former and last director of the OCP).

Onchocerciasis Prevalence Data
Relevant survey data were extracted from the OCP epidemiological database, which contains 5,816 surveys conducted in 2,581 (geo-referenced) villages across all operational phases of the OCP in the 11 participating countries in West Africa (see Fig A within S1 File for locations of all village sites in the OCP epidemiological database). There were 9 operational phases (summarised in Table 1 and depicted in Fig 1B). A total of 737 villages had parasitological surveys that met the inclusion (pre-control) criteria (see S1 File Text A 'Data description and pre-control data selection criteria' and Fig B therein). Prevalence ranged between 0 and 94.8% with a mean of 51.7% and a median of 56.4%. With the exception of one village, numbers examined per village ranged from 23 to 828 with a median of 171. One village had just 5 individuals examined (prevalence of 40%) but this represented 100% of the recorded population for this village. The geographical distribution and endemicity of the selected villages are shown in Fig  1C, with the endemicity classes defined as non-endemic/sporadic endemicity <10%; hypoendemic onchocerciasis: !10% and <35%; mesoendemic: !35% and < 60%; hyperendemic: ! 60% and < 80%, and highly hyperendemic (or holoendemic): ! 80% microfilarial prevalence. Although in the OCP it was standard epidemiological practice to adjust microfilarial prevalence by age and sex according to the OCP reference population [40], in this study we used crude microfilarial prevalence because counts of persons examined were not available in the data by age cohort. The correlation coefficient between crude and adjusted prevalence was 0.973 (determination coefficient 0.947, Fig C in S1 File), therefore the use of crude prevalence was not expected to bias results.

Geographical Limits of the Mapping
We restricted our mapping to the geographic extent visible in Fig 1A, which encompasses the OCP and immediate surrounding areas in West Africa. Our ground-truth data points, consisting of parasitological survey data from the OCP epidemiological database, were located almost entirely within the boundaries of the OCP, as depicted in Fig 1C ( [9,12], which led to the extensions described in Table 1. River basins in which the blackfly vectors of onchocerciasis can breed were identified by the OCP for the different species of the Simulium damnosum complex [41,42]. We let the northerly limits of the mapping extend to extremely arid, desert conditions where transmission of onchocerciasis is known not to occur, but we do not generate pseudo-absence points in these areas; instead, we allow the model to generate prevalence values based on the environmental factors associated with the data and distance from known (ground-truth) data points. The easterly limit extends to the fringes of Nigeria, where control is under the auspices of APOC and we do not consider estimation beyond these areas. (See Text B 'Parasitological survey methods and geographical limits of the mapping' in S1 File.)

Climatic and Environmental Data
An ensemble of explanatory environmental and climatic covariates which were deemed useful in the estimation of microfilarial prevalence was identified building on the methodology presented in a previous mapping study for another filarial infection [36]. Initial inclusion was broad and partly based on knowledge of parasite and vector ecology (e.g. distance of villages to fast flowing rivers), as well as variables considered to affect their distribution and life cycle (e.g. climatic data). Climatic data were extracted from WorldClim Global Climate Data, http:// www.worldclim.org [43]. Hydrologically-derived variables were extracted and calculated from the Hydro1K database [44,45], and an alternative set of hydrologically-derived variables was extracted from the Rivers of Africa dataset (derived from HydroSheds) [46], which includes alternative metadata about the stream lines. Normalized Difference Vegetation Index (NDVI) data were extracted from the Global Inventory Modeling and Mapping Studies (GIMMS) dataset for the years 1981 to 1986 [47][48][49] (the earliest data available in this dataset). Data layers, generated from the MODIS instrument on board the Terra satellites, were also used. These layers included land surface temperature (day and night), land cover classification, NDVI and enhanced vegetation index (EVI) [49][50][51][52][53]. A layer of vegetation type based on physiognomy and floristic composition from White's Vegetation Map of Africa was also extracted and used [54]. We initially tested layers for altitude derived from digital elevation models (DEMs) from different mapping projects, which were extracted from the Global 30 Arc-Second Elevation (GTOPO30) DEM [55,56] and the Global Land One-km Base Elevation Project (GLOBE) [57]. Where necessary, the original data sources were assembled together to form contiguous coverage of the study area, re-projected to the output map projection (Lambert Azimuthal Equal Area; Fig E in S1 File), re-sampled to 5km x 5km (25 km 2 , the resolution of the output map), cropped to the borders depicted in Fig 1A and, finally, those cells that overlay inland water bodies were masked as they represented no data values. Each pixel of the output map (n = 112, 295), therefore, had a value for each of the environmental covariates. The values of these covariates at each survey village were sampled from these layers. The initial ensemble of covariates tested and the data sources are summarised in Table 2 and further details of the processing of these datasets are described in Text C of S1 File 'Processing of environmental covariates'. Maps of the covariates that were included in the model are presented in Fig F.1 and Fig  F.2 of S1 File Text D 'Maps of environmental covariates'.

Temporal Trends in the Data
Although the baseline survey data used in this study were collected over a period of 16 years (1974-1990), exploratory inspection of pre-control prevalence boxplots by year showed that there was no significant temporal trend. It was, therefore, deemed unnecessary to include the effect of time as these data were assumed to be representative of the pre-control endemic equilibrium situation at the beginning of each OCP phase and commencement of intervention by river basin. Time-varying changes in the environmental covariates were also treated as negligible. The WorldClim dataset uses interpolated data averaged over 50 years, whilst the MODIS data, representing 10-day means, were averaged by ecological quarter (1st: Dec-Feb, 2nd: Mar-May, 3rd: Jun-Aug, 4th: Sep-Nov) across the entire duration of available data at the time of analysis (February 2000-July 2013). There was no discernable non-linear trend in NDVI from MODIS vs. NDVI from the Advanced Very High Resolution Radiometer, which has a much earlier temporal coverage (from 1981) but a coarser spatial resolution (64km 2 pixels). Therefore, the long-term climatic and environmental data were implicitly assumed to be representative of the average conditions present in 1975-1990 even if portions of the covariate data had been collected prior or subsequent to this date.

Spatial Dependence in the Data and Model-Based Geostatistical Analysis
A geostatistical model (used to develop a continuous map of pre-control microfilarial prevalence across the OCP area) is simply a Generalised Linear Mixed Model (GLMM) with environmental covariates to account for the fixed effects, a spatial term that accounts for the spatial correlation in the data, and a random effect term that accounts for the non-spatial variation in the data, yielding a Generalised Linear Spatial Model (GLSM). If we assume that the number of individuals found positive for O. volvulus microfilariae at location x i is Y i out of the total number of people examined by skin snip at such location, n i , then Y i is a binomial random variable, Y i * Bin(n i , p i ) where p i is the proportion of individuals positive for skin microfilariae at location i.
An exploratory tool widely used in geostatistics is the semivariogram [58]. Under the assumption of spatial dependence, there is a relationship between the variance of the difference between pairs of random variables (in this study microfilarial prevalence) and their separation distance, d. The empirical variogram represents the relationship between that separation distance, binned into intervals, and the mean semivariance of infection prevalence between all pairs of locations within each distance bin (Fig 2A). Firstly, we fitted a theoretical variogram model to an empirical variogram controlling for the environmental covariates ( Fig 2B)-using an exponential covariance function-to generate sensible prior values to use for the spatial component of the Bayesian model (see section on 'Prior Specification' below). Second, we used these in the GLSM to fit a model of linear predictors (the environmental variables as fixed effects), conditional on some stochastic process S(Á), i.e., the unobserved, underlying spatial process (which enters the model as a location-specific random offset in the linear predictor). Following [27], modifying the notation in [36], and conditional on S(Á), the responses Y i /n i at locations x i , for i = 1,. . .,k (k = 737 survey villages) are mutually independent random variables whose conditional expectations, , are given by, Semi-variogram analysis of spatial trend in the data. A) Empirical variogram assuming a constant mean trend across the study region (i.e. without the use of environmental covariates). B) The spatial trend present in the data after accounting for the effects of explanatory environmental covariates by incorporating a spatially-varying mean trend surface. The purple dots are the mean semivariance values between all pairs of villages with separation distance contained within a discrete distance bin, with the mid point of each bin used to locate the dots on the x-axis. The red solid line is the fitted theoretical variogram model. The dash-dot blue horizontal lines represent the nugget variance τ 2 , encompassing sources of non-spatial variation. The dash-dot green horizontal lines represent the sill variance, σ 2 , which is the estimated variance of the spatial process S(Á). The dashed orange vertical lines are the range at which the variance asymptotically reaches 95% of its sill value and is given by −ln(0.05)ϕ, where ϕ is the range parameter. Beyond this value the covariance between locations does not significantly depend on separation distance. The grey shaded areas are the Monte Carlo envelopes of variance expected by chance, obtained by randomly reassigning the values of microfilarial prevalence at data locations and calculating the empirical semivariogram. The empirical variogram (A) mostly lies outside this region and, therefore, there is therefore a significant spatial trend present in the data. The trend in semivariogram (B) is fitted to the residuals resulting after adjusting for the set of linear predictors of the environmental covariate model. The reduction in range parameter, compared to the mean trend shows that the environmental covariates account for some (but not all) of the observed spatial variation. As some of such variation still lies outside of the Monte-Carlo envelope of spatially random data, there is still a significant spatial trend. where μ i is the expected value of microfilarial prevalence at location x i ; S(x 1 ),. . .,S(x k ) are the unobserved random variables for the i = 1,. . .,k locations derived from the spatial process, S(Á); d j (x i ) represents the value of the j th environmental variable at location x i ; β j is the regression coefficient of the j th variable, and the link function is denoted by h(Á). For binomial data the link function is the logit function, i.e. h(μ) = log[μ/(1−μ)]. The spatial signal S(Á) has expectation E[S(x)] = 0, variance Var[S(x)] = σ 2 and covariance function, where σ 2 is the 'sill' or maximum variance of the spatial process (composed of the sum of the partial sill-the spatial variance-and the nugget variance, τ 2 -the non-spatial variance (representing sources of non-spatial variation in true prevalence such as measurement error derived from the skin snip method or unmeasured village-specific demographic or socio-economic factors, among others)-d = kx − x 0 k is the Euclidean distance between x and x 0 , and ρ(d) is the correlation function which controls how the variogram increases from the intercept to its asymptote. Initial fitting of semivariogram models suggested that the exponential correlation function gave a good fit to the data for both a mean trend without covariates (Fig 2A) and after controlling for the effect of environmental covariates ( Fig 2B). The exponential correlation function is given by, and the semivariance γ(d) of the exponential model with a nugget effect is given by, where ϕ is the range parameter, determining the rate at which the dependence between S(x) values decays with distance. Taking −ln(0.05)ϕ gives the distance at which the semivariance asymptotically reaches 95% of its sill value. The specification of this correlation function determines the smoothness of the underlying spatial surface (see section on 'Prior Specification' below). We denote the vector of j = 1,. . ., p regression parameters as β and the parameters of the correlation function as θ. We can use this model specification and the data to make predictions about locations of interest which may not have been sampled, but for which values of the regression coefficients are available (environmental covariates).
Continuing with the notation of [27], we may specify a vector of prediction at target loca-

Initial Selection of Environmental/Climatic Covariates
A systematic Bayesian step-wise procedure was used to select the best set of uncorrelated predictor covariates for use in the geostatistical model. This method builds upon that of Austin and Tu [61] by fitting univariate Bayesian GLSM models for both linear and quadratic functional forms of each of the initial predictor covariates. We tested model fit by means of a Kfolds cross validation, with K = 1,. . ., 10. The data were randomly partitioned into a training set, consisting of approximately 90% of the data (N T K = 663), whilst the remaining 10% (N V K = 74) were held-out and used in the validation exercises. In each K-fold a different set of input data points was held out, such that each of the 737 data points appeared in one and only one of the K-fold validation data set. For this reason, N V K¼10 comprised 71 data points and N T K¼10 666 training data points (see Text E 'K-folds model validation' and Fig G in S1 File for locations of data points in each K-fold). Each model was then independently fitted against the 10 training datasets. The Deviance Information Criterion (DIC), which is a Bayesian measure of model fit [62], was used to compare how well each covariate was able to estimate prevalence at a set of held-out validation locations. Ten datasets were chosen as a compromise between number of bootstraps and computational time.
For both linear and quadratic functional forms of all of the input covariates, using each bootstrap dataset in turn we calculated the DIC for that model. For each covariate, the functional form yielding the lowest DIC was used to select either linear or quadratic forms for further evaluation. The covariates were then ranked by their resulting DIC values (lowest to highest) and an iterative selection procedure applied to select the best set of uncorrelated covariates. In the first iteration, the overall best-performing covariate (the covariate generating the lowest DIC) was selected and the correlation between values at all data locations between the model with this covariate and with all other covariates was computed. Any covariates with a Pearson's correlation of > 0.7 were deemed to be collinear and were eliminated from further model selection. The second iteration started with the next best remaining covariate, i.e. the covariate yielding the next lowest DIC after the previous round. Collinear covariates were excluded as before and further iterations continued until all remaining covariates had either been eliminated or included in the set of uncorrelated covariates used to identify the best performing model via a Bayesian step-wise selection procedure.

Building a Multivariate Model
Given the set of uncorrelated potential predictors identified, we used a standard Bayesian forward step-wise procedure to build the most parsimonious model that provided the best fit to the observed data. We started by selecting the model which had the lowest possible DIC, identified from the previous step. Using this as a baseline model, bivariate models, each using an additional covariate, were run. Whichever, if any, of the new models had the greatest reduction in DIC from the baseline was selected to be the new baseline model, and additional covariates were added to this model. At each step we added the covariate which effected the greatest marginal reduction in DIC. The step-wise procedure stopped when the addition of further covariates did not reduce (or increased) the DIC. Table 3 lists the covariates which were included in the final Bayesian GLSM, their median values and their 75% and 95% Bayesian credible intervals (BCI), respectively containing 75% and 95% of the posterior distribution for each parameter.

Prior Specification
For each of the β j environmental regression parameters, a prior distribution was assigned, such that b j $ N ð0; s 2 b j Þ and s 2 b j was given a value of 10. Pragmatic priors for the spatial correlation function parameters (Eq 2) were specified after initial investigation by fitting a theoretical variogram, by maximum likelihood estimation, to the data conditioning on a mean trend model specified by the environmental covariates ( Fig 2B). Slightly at odds with traditional Bayesian prior selection, the maximum likelihood estimate was used as the mean of the prior for the range parameter ϕ, to help the model converge better. The prior was specified as a discretised uniform prior, defined across the interval ϕ'−10 < ϕ < ϕ' + 10 with 1,000 discrete support points, where ϕ' is the maximum likelihood estimate. The upper and lower limits of support were set to ensure exploration of a wide prior parameter space. The nugget variance, τ 2 -not a parameter estimated by the model-was specified as a relative nugget (i.e. as the proportion of the total variance of the spatial process that is due to non-spatial effects). The relative nugget was computed using the parameter estimates from the theoretical variogram model fitting procedure, giving a value of 0.279. The variance Var[S(x)] = σ 2 was assumed to follow a chisquared distribution, composed of the sum of squares of random normal deviates (from the mean of this distribution, set to 0), drawn at each data location. Although initial testing indicated that adequate mixing of most MCMC chains was achieved with a burn-in of less than 1.5×10 5 iterations, some bootstrap datasets did not converge, so the burn-in length was increased to 1.5×10 6 iterations to ensure that chains reached the posterior equilibrium distribution. A further 1×10 6 iterations were run to sample from the posterior distribution with thinning set to store every 500th subsample to reduce autocorrelation between successive subsamples, giving a total of 2,000 samples from the posterior distribution. Text F of S1 File provides further details on 'MCMC chain diagnostics' and Fig H and

Model Output
Model outputs consist of a series of realisations from the posterior distribution of microfilarial prevalence for each map pixel, which may be summarised in a number of ways. The mean of the posterior distribution was taken as a point estimate for each pixel (Fig 3A), and pixels were discretized into broad endemicity classes (Fig 3B) following the endemicity classification for onchocerciasis used by the OCP [63], but also including an extra holoendemic class for very high prevalence areas (previously defined). It is also straightforward to derive quantities that may be of operational interest in control programmes, such as threshold probability maps (Fig 4A-4C) or probability of endemicity class classification (Fig 4D). We mapped the predictive probability that prevalence did not exceed the threshold of hypoendemicity (35%), where the burden of onchocerciasis as a public health problem is relatively low. We also mapped the predictive probability that prevalence exceeds the epidemiologically important threshold of hyperendemicity (60%), above which there is a greater than non-linear increase in morbidity experienced by the population, and annual ivermectin distribution may not be sufficient to reach operational elimination thresholds within the WHO proposed timeframes [22]. To map the predictive probability that prevalence does not exceed 35% we approximated from the predictive posterior at each target location (x Ã ) by calculating the proportion of realisations (r) from the predictive posterior that do not exceed this threshold; ð1=rÞ taking a value of 0 or 1 depending on whether the threshold was exceeded or not. This surface is depicted in Fig 4A. The continuous probability surface that prevalence exceeds 60% is shown in Fig 4B, and a discretized surface of probability that prevalence exceeds 60% is depicted in Fig 4C, classed into P < 50%, 50% P < 75% and P ! 75%.
A surface of the probability of endemicity class assignment (P class ) was also calculated for each pixel (Fig 4D). Each posterior predictive realisation was assigned to an endemicity class based on the cut-off values detailed in the legend of Fig 3B. The mean of the posterior of each pixel was used to assign endemicity class. The probability of that assignment is then the proportion of posterior realisations which were in the same endemicity class. The number of endemicity classes was reduced to four with cut-off values set as follows: hypoendemic onchocerciasis: <35%; mesoendemic: !35% and <60%; hyperendemic: !60% and <80%, and highly hyperendemic (or holoendemic): !80% microfilarial prevalence. The latter category, although not specified in [63], has been used to define more accurately the relationship between the prevalence of infection and onchocerciasis-associated morbidity, and to investigate the feasibility of reaching operational elimination thresholds with ivermectin distribution alone [18,22]. With four endemicity classes the minimum probability of endemicity class membership is 25%.

Uncertainty
A key advantage of the MBG approach is that it permits robust evaluation of the uncertainty associated with model outputs [58]. Uncertainty may be directly inferred from the posterior distribution of the predictions, with greater uncertainty indicated by a larger variance of the posterior distribution at prediction locations. The standard deviation of the posterior predictive distribution and the population layer were used to calculate a population-weighted index of uncertainty ( Fig 5A) following [64]. Maps displaying the variance of the posterior predictive distribution at each map pixel were also calculated ( Fig 5B) as a visual representation of the uncertainty surrounding the prediction made at each target location. A) The mean of the predictive posterior distribution of microfilarial prevalence in the population aged ! 5 years (P mf ) for each pixel in the study area. There are no pseudo-absence data points in this model (i.e., pragmatically-generated points of zero-prevalence in areas of known absence of disease) and, therefore, the grey areas in the north of the map, where predictions of prevalence are very low, are entirely determined by the value of the environmental covariates (the closest villages are too distal to exert a spatial effect). The white boundary denotes the limits of the OCP area. B) Endemicity class classifications for the mean of the predictive posterior, based on a modification of the OCP endemicity categories. Areas where prevalence is very low are shown by sub-dividing the hypoendemic class into two categories, a non-endemic/sporadic endemicity class, where microfilarial prevalence is <10%, and a hypoendemic class where prevalence !10% but <35%. Other categories are mesoendemic: !35% and < 60%; hyperendemic: ! 60% and < 80%, and highly hyperendemic (or holoendemic): ! 80%.  A) The posterior predictive probability that microfilarial prevalence for each map pixel is less than 35%. The model output is mapped on a continuous scale from 0 to 1, with dark red representing a high degree of certainty that prevalence does not exceed the threshold of 35%. B) The continuous posterior predictive probability that microfilarial prevalence exceeds the threshold for hyperendemicity of 60%. C) Model output as in Fig 4B but categorised into broad probability intervals. Maps of this nature may have utility for the optimal targeting of current and novel control interventions by programme managers. Light-grey areas are pixels with a probability < 50% of exceeding the 60% threshold; pink pixels denote that the probability of exceeding hyperendemicity lies between 50% and 75%; deep-red colours represent a probability > 75% of exceeding this threshold. D) The probability of each map pixel being in the endemicity class to which it was assigned (P class ). With 4 endemicity classes the probability ranges from a minimum of 0.25 to 1, with a value of one meaning that all realisations from the predictive posterior were assigned to the same endemicity class. In all panels the red boundary denotes the OCP limits.

Model Validation
The ability of the final B-MBG GLSM model to estimate the prevalence of O. volvulus microfilarial infection prior to the inception of the OCP control interventions in the area covered by the programme was assessed, for comparability with published work, using validation criteria which have been applied in other mapping exercises using Bayesian inference to generate predictive distributions. [e.g. 64,65]. A K-folds cross-validation exercise [66] was performed as previously described. In each cross-validation (CV) run, a single subsample was retained to use as the validation dataset whilst the other K − 1 datasets were used for training the model. For each CV, posterior predictive distributions of microfilarial prevalence were generated at unobserved validation locations for comparison against the known values. Using K-folds validation, each datum was used once in the cross validation. A number of validation indicators were then calculated according to the methodology used by the Malaria Atlas Project (MAP; http://www. map.ox.ac.uk/).
As a measure of the accuracy of point estimations the following statistics were calculated: (a) the overall Pearson-product-moment correlation coefficient (r) between observed and estimated values at validation locations ( Fig 6A); (b) the mean prediction error; (c) the mean absolute prediction error, and (d) the ability of the model to allocate endemicity class correctly. The correlation coefficient provides a measure of linear association between observations and estimations. The mean prediction error is a measure of overall bias, i.e. the tendency of the model to systematically over-(if the value is positive) or under-(if the value is negative) estimate the true value. The mean absolute prediction error is a measure of overall accuracy, i.e. the magnitude of the difference between estimated and observed values [67]. We computed receiver operating characteristic (ROC) curves [64] and calculated the area under curve (AUC) statistic for each endemicity class to test congruency between observed endemicity classes and modelestimated endemicity classes (Fig 6B). Following [64] an AUC greater than 0.7 indicates good A) The population weighted index of uncertainty is calculated as in [61] and is a pragmatic representation of how important uncertainty in the predictive posteriors is likely to be. The uncertainty index is calculated by taking the log 10 (pop 75 +1)×1/ P class where pop 75 is the population count for 1975 and P class is the probability of endemicity class assignment (Fig 4D). B) Map of the variance of predictive posteriors. Higher values mean more diffuse posterior distributions and hence greater uncertainty in model estimates. The lowest uncertainty is found in pixels proximal to data locations. In both panels the red boundary denotes the OCP limits.  to excellent ability to discriminate correctly between points within and without an endemicity class.
Following [68], we tested how closely posterior distributions matched their assumed error distribution by means of a probability-probability (P-P) plot, which computes the expected value of microfilarial prevalence at each location, for a series of increasing probability thresholds (a total of 101 thresholds so that the range 0-100% can be explored), and compares them to the observed prevalence values (Fig 6C). Assuming errors are normally distributed, at a given probability threshold, the proportion of predicted values exceeding their observed value across all validation locations (the so-called 'coverage probability') should be equal to that probability threshold and therefore the P-P plot should show values lying perfectly along the 1:1 line. Deviations from this line indicate a tendency of the model to over-or under-represent uncertainty in point predictions. We also compared the distribution of estimated and observed prevalences at validation locations using a histogram (Fig 6D).

Estimating Human Population Counts, Numbers at Risk, and Numbers Infected with O. volvulus in 1975
In order to estimate numbers of persons infected, maps estimating mean endemicity were overlaid on a population count layer to calculate the number of persons expected to have been infected with O. volvulus microfilariae. The number of persons at risk was assumed to be the total population aged !5 years living within rural areas as onchocerciasis transmission requires clean, unpolluted and fast flowing bodies of water as breeding sites for the vectors. (The selection of the !5-year olds was because this is the age group examined for skin microfilariae in most surveys. Children younger than 5 years of age are at risk of acquiring onchocerciasis but there was a need to make the population groups comparable to those for which infection with O. volvulus is routinely reported.) Raster global population counts are readily available from the Gridded Population of the World (GPW) v.3 data (http://sedac.ciesin.columbia.edu/data/collection/gpw-v3) from the Center for International Earth Science Information Network (CIESIN) within the Earth Institute at Columbia University [69]. These data were downloaded for 1990 (the earliest available data) and projected backwards to 1975, on a per-pixel basis, using the relevant national medium-variant, inter-censal country population growth rates from the United Nations (UN), World Population Prospects, 2012 Revision (WPP2012) database [70]. As microfilarial prevalence is often expressed in terms of the population aged 5 years and older, the proportion of the population aged 0-4 years in 1975, according to the WPP2012 database, was removed (see above). The resulting raster was then re-projected and cropped to match the study area output maps (Fig 7A). Total population counts in the estimated 1975 map were very close to the census estimates from the WPP2012 database (Table 4). The area was delineated into rural and urban areas using the gridded rural-urban extents map (GRUMP v.1) [71]. Pixels falling in urban extents were considered not-at-risk and were excluded from calculations of at-risk and infected totals, as these areas are not suitable for simuliid breeding (Fig 7B). We considered estimates of infection for the total study area (as shown by the extent of the maps), and within the OCP control area (delineated by the boundary marked in Fig 1A).

Model Parameters
As may be expected from the ecology of onchocerciasis, the log 10 -transformed distance to the closest river was a negative and statistically significant environmental predictor of onchocerciasis microfilarial prevalence ( Table 3), confirming that as distance to rivers increases, microfilarial prevalence decreases. In addition to this, the only other (negatively  * At risk population is defined as the total population aged 5 years and above living in rural areas within the Onchocerciasis Control Programme borders. Rural areas are defined by the Urban Extents mask (Fig 7B). ǂ The product of the mean per-pixel microfilarial prevalence and the corresponding pixel at risk-population count. Only pixels within OCP borders count towards this total. ++ The average mean prevalence values by country, in percent. Country mean prevalence is the average of the per-pixel mean prevalence from the map output depicted in Fig 3A, within OCP borders (not the ratio between the numbers of infected and the numbers at risk).
+++ The values shown here do not include a large area in southern Ghana later incorporated into the OCP and that mainly comprised forest-type onchocerciasis.
doi:10.1371/journal.pntd.0004328.t004 associated) environmental covariates, which were statistically significant at the 95% highest posterior density (HPD; the shortest interval which contains 95% of the posterior probability), were: a) the second order polynomial term of log-transformed distance to river; b) the second order polynomial term of mean temperature of the wettest quarter (BIO8); c) precipitation seasonality (BIO15); and d) NDVI for the December-February ecological quarter. The spatial terms for the range (ϕ) and for the sill or variance of the spatial process (σ 2 ) were positive and significant predictors. At the 75% HPD (the interval which contains 75% of the posterior probability), in addition to the variables mentioned above, the intercept, the mean temperature annual range, and the NDVI of the second (Mar-May) and third (Jun-Aug) ecological quarters were positively and statistically significantly associated with microfilarial prevalence, whilst the second order polynomial terms of the temperature annual range and of the precipitation seasonality were negative and significant predictors.

Model Output: Map of Baseline Microfilarial Prevalence prior to OCP Interventions
The mean of the predictive posterior microfilarial prevalence in the population aged 5 years and older is shown in Fig 3A. The mean O. volvulus prevalence in pixels within the boundaries of the OCP area was 45.2%, ranging between 2.3% and 90.0%. The mean prevalence in map pixels outside the OCP control programme was 18.2%, ranging between 0.5% and 70.1%. It should be noted that we do not have ground-truth data points in areas outside the OCP boundary and, therefore, estimates made for such areas are more speculative and rely on data from within the OCP boundaries and their relationship with environmental covariates. A surface of mean prevalence discretized into the endemicity levels defined in the Methods (see 'Onchocerciasis Prevalence Data' section above) is shown in Fig 3B. The map of input data locations is displayed in Fig 1C.

Map Accuracy and Predictive Probability Maps
The predictive posterior distributions are an inherent realisation of the statistical uncertainty surrounding the mean estimates. We translated the posterior distributions into operationally useful maps, depicting areas where the prevalence is below or above a certain threshold. The probabilities that microfilarial prevalence is below 35% (where disease severity is low), or exceeds 60% (where disease severity is high) are given in Fig 4A and 4B, respectively. In both cases, the greatest levels of uncertainty were associated with areas where the posterior predictive mean of prevalence fell in the mesoendemic endemicity class, and in areas where we do not have ground-truth data points to inform the model, such as areas outside the OCP boundaries. Where predictions were either in hypo-or holoendemic classes, the probability of threshold exceedance was either very low or very high, respectively. The effect of the environmental covariate, distance to river, is visible in these continuous probability threshold maps. The outline of the stream network can be seen in Fig 4A as lighter shade in dark areas indicating a low probability that microfilarial prevalence is < 35% along the stream network. In Fig 4B the river network is also visible as darker colours in light areas, indicating a higher probability that microfilarial prevalence exceeds 60% close to the river. Fig 4C describes the probability that prevalence exceeds the clinically important threshold of 60% microfilarial prevalence, indicating areas with 0 to 50%, 50 to 75%, and 75 to 100% probability that areas are at least hyperendemic. There is a band of high-probability of exceedance of the hyper-endemic threshold across the core area of the OCP, from Guinea, through Côte d'Ivoire, Ghana and Southern Burkina Faso, to Togo and into Benin. This band also extends into Nigeria, despite there being no input data points there; a country which was later included in the mass treatment programmes under APOC (see Fig 2 and Fig 3 of [37] by way of validation of our results). The average probability of endemicity class membership across all map pixels was 66% (Fig 4D), with a range of 34% to 100%. This means that on average, 66% of the posterior realisations for any given map pixel were within the same endemicity class. The most uncertain endemicity class membership for individual map pixels appears to be in the core areas of the OCP, predominantly in mesoand hyper-endemic classified pixels. However, examination of the signal to noise ratio (SNR) map (see S1 File Text H 'Further maps derived from the predictive posterior distributions') indicates that these are the areas with the highest SNR values, as may be expected due to their proximity to ground-truth data points (Fig L of S1 File). This suggests that the prevalence cutoff values used to define endemicity classes are less well-suited to this study. Pixels with estimates of prevalence close to endemicity class boundaries will be less easily classified to a given endemicity class, and instead a more finely discretized map, such as the map in Fig 3A, would be more informative.
We modelled the probable importance of the uncertainty surrounding endemicity class membership as in [64], using a population-weighted index of uncertainty. Briefly, in areas where the probability of endemicity class membership is not well defined (a low probability within the overall range of 0.25-1 in this study), and the population density is also high, the possible marginal effect of that uncertainty in the predictive posterior distribution of microfilarial prevalence will be high. The population-weighted uncertainty map depicted in Fig 5A, which has an uncertainty index value scaled between 0 and 1, is highly overdispersed, with > 90% of the pixels in the map having an uncertainty index of < 0.5 assigned to them.

Model Validation
The overall correlation coefficient, r, between observed and estimated values at validation locations was 0.693 ( Fig 6A). Calculating separately a correlation coefficient, ρ, for each of the 10 K-folds individually, we obtained a range of values varying between 0.566 and 0.806 with a median of 0.699. The mean prediction error was 0.77% (i.e., overall, observed prevalence was greater than the estimated prevalence by less than 1%). The mean absolute error indicated that, on average, there was a difference of 12.3% between the observed and estimated microfilarial prevalence at validation locations. The variograms depicted in Fig 2 show that there was significant spatial dependence in the data (Fig 2A), even after accounting for the effect of environmental covariates (Fig 2B), as with increasing separation distance there was increasing semivariance that lies outside the envelope of semivariance values expected by chance (the grey shaded area-a Monte Carlo envelope calculated from random reassignment of prevalence values to different data locations). The expected onchocerciasis endemicity class exactly matched the observed endemicity class for 406/737 (52.7%) validation locations and was within one endemicity class at 711/737 (96.5%) locations, showing good agreement between observed and estimated classes. Serious misclassification of endemicity class (i.e. classifying a point to a nonadjacent class) occurred in 26/737 (3.5%) cases. A contingency table by endemicity class is given in Table 5. AUC statistics for each endemicity class were calculated using ROC curves (Fig 6B), and with the exception of the mesoendemic class, all other endemicity classes had good to very good agreement between observed and predicted endemicity classes with AUC values of 0.85 for hypoendemic; 0.77 for hyperendemic, and 0.81 for holoendemic classes. The mesoendemic class fell just slightly below the threshold for good predictive capacity, with a value of 0.68.
Following [64,65], we also tested how closely posterior distributions matched their assumed normal error distribution by means of a P-P plot (Fig 6C). The line is close to the 1:1 line of a perfect error model, but falls below the 1:1 line for values between probability thresholds of 0.15 to 0.60, meaning that there is a slight bias of the model to overestimate the probability of taking values closer to the middle of the range of possible prevalence values, whilst slightly under-representing extreme (very low or very high) microfilarial prevalence values.
This phenomenon can be seen on the histogram in Fig 6D. The histogram of prevalence at input data locations is negatively skewed, whilst the histogram of estimated prevalence at validation locations exhibits a smaller degree of skewness, with a greater proportion of estimated values falling in the meso-and hyper-endemic categories.

Estimated Numbers of People Living in the OCP Area, Infected with O. volvulus and At-Risk of Onchocerciasis in 1975
The population totals calculated in this study are very close to the country-level census numbers provided in [70]. The total gridded population living in OCP countries in 1975 according to our estimates was 52.5 million, comparing well with the total UN estimate for OCP countries of 52.4 million persons. We combined the map of mean microfilarial prevalence in those aged 5 years and above with a map of the estimated corresponding population for 1975 to derive estimates of numbers of people infected with O. volvulus. We estimated that there were 7.55 million persons with onchocerciasis at the outset of the OCP living within the control programme's boundaries (approximately 1.3 million km 2 [9]). The total number of persons estimated to be at risk of onchocerciasis within the OCP borders was estimated to be 17.8 million. Within the study area, there were an additional 11.6 million persons aged 5 years old or more, living in rural areas in OCP countries, but outside of the control programme's borders. From the model estimates, we calculated that an additional 2.8 million of these persons may have been living with onchocerciasis in these areas. Of these, 1.1 million persons were in Ghana, where the control programme subsequently extended southwards from the original OCP boundaries to cover forest areas known to be endemic but not previously covered under the OCP. The total numbers of persons at-risk and infected, by country, are shown in Table 4.

Discussion
To the best of our knowledge, we have presented the first continuous map of the pre-control prevalence of O. volvulus microfilariae in West Africa, in the region previously covered by the OCP and, in addition, have identified environmental covariates that aid in the estimation of onchocerciasis prevalence in un-sampled areas. For instance, our estimated pre-control infection prevalence distribution in the totality (Liberia) or part (Nigeria) of two non-OCP countries fares well with that presented, respectively, in [72] and [37]. We have also provided comprehensive estimates of the initial numbers of persons infected and at risk of infection, and consider a robust handling of the uncertainty surrounding those estimates. According to our model, the mean baseline infection prevalence in the rural areas of the OCP was 45%. By comparison, Zouré et al. [37] estimated that in APOC areas and for 2011-had control interventions not been implemented-the average microfilarial prevalence of infection in the entire rural population would have been 15%. This estimate, based on a conversion of nodule prevalence into microfilarial prevalence using the relationship described in Coffeng et al. [73], is much lower than ours. APOC covers a much larger and more environmentally and geopolitically heterogeneous area, which may contribute to explain some of the difference in infection prevalence estimated in the two studies.
Although levels of infection have decreased as a result of the interventions implemented by the OCP [8][9][10][11][12], and environmental change has occurred in the intervening years since 1975 [74], a map of the baseline levels of prevalence and endemicity provides an indicator of initial conditions for transmission, which are influential determinants of the feasibility of onchocerciasis elimination with ivermectin [17,18,22]. This is particularly relevant, since after cessation of OCP vector control operations in 2002, and subsequently in special intervention zones [13], blackfly vector populations have recuperated to original levels in many areas and there is evidence of continuing transmission within the former OCP area [75][76][77].

Representativeness of the Data
The OCP epidemiological dataset represents the most complete source of standardised parasitological survey data for onchocerciasis in West Africa. However, although extensive surveys were conducted (see Fig A in S1 File for a map of all locations surveyed during the OCP), only a small fraction of the total population was sampled (the subset of the OCP dataset which was selected as the most representative of pre-control prevalence consisted of approximately 145,000 persons skin snipped). One of the problems faced when using the OCP data for mapping purposes, was that the population and epidemiological surveys were not performed at random, introducing some bias in the data towards areas of higher endemicity. Original programme areas were delineated, for the purposes of enabling evaluation of the impact of vector control, using cartographic and entomological knowledge about the location of rivers, flow speeds and vector breeding sites. In a limited number of epidemiological settings more detailed studies were carried out with the objective to map the distribution of onchocerciasis, and villages were classified as first-line if there were no other settlements between themselves and vectors' breeding sites [78,79]. Further surveys were conducted in a selection of villages at regular intervals up-and down-stream from the breeding site, and a random sample taken from second-and third-line villages at increasing distances from the river [79]. Including this information in the model when estimating the effect of environmental covariates may have helped increase predictive accuracy, but it would be difficult to classify map pixels into first-line or otherwise without complete knowledge of the distribution of all villages across the study area. It was not necessary at the time to conduct a complete census of the resident population (although a census was carried out in each survey village) in the OCP area and its extensions, because control initially relied on vector control (and not on mass distribution of ivermectin, for which censuses are mandatory).

Use of Environmental Covariates and a Spatial Trend
The use of environmental covariates to improve model accuracy was investigated and a number of variables associated, or deemed to be potentially associated, with onchocerciasis was investigated. A model incorporating covariates was found to be a better predictor of onchocerciasis prevalence and endemicity than an intercept-only model, which incorporated only the spatial trend. Inclusion of covariates improved (reduced) the DIC values in model validation and improved the correlation between observed and predicted prevalence values at validation locations. In addition to the environmental covariates listed in Table 3, we also tested for any effects of spatial differences between the geographical distributions of different members of the S. damnosum complex but we could not detect any significant influence of vector taxon on the model's predictions. (Had we modelled infection intensity rather than prevalence, an effect of vector taxon might have become apparent.) However, the value of ϕ % 38 corresponds to a spatial trend that reaches 95% of its maximum semi-variance over a distance of approximately 113km (i.e., for any prediction location, its estimated value is influenced by locations up to 113km apart). The range parameter is consistent with the observation that blackfly vectors can travel several hundred kilometres from a breeding site [80]. However, Renz et al. (1987) found biting density to drop to 10% of its original level at distances of 5km from riverine breeding sites in Sudan-savanna focus of Northern Cameroon [81]. Other studies have shown that people may be bitten by female blackflies in locations distant from any known breeding site [82][83][84] and that re-invasions of parous flies into areas previously cleared through larviciding can happen from breeding sites as far as 300km away [85]. This lends strong epidemiological and ecological support to the operation of a spatial trend operating over this distance.
We extended the methods of Slater et al. [36] to carry out a robust, Bayesian step-wise selection procedure to identify the best set of uncorrelated explanatory covariates. The chosen covariates significantly improved the ability of the model to explain the variation in recorded microfilarial prevalence. The covariates also helped to define natural geographical limits to the mapping which would otherwise have been impossible, given the bias towards surveyed villages with a high probability of having onchocerciasis. In other mapping studies, pseudo-absence points have been introduced into the data in un-sampled areas which are suspected to be nonendemic, to help delineate risk-free areas. In this study we have avoided this, and instead allowed the limits to be defined by the environmental predictors and the spatial covariance function in the model. We found that the model estimated very low prevalence in the very dry arid areas north of the OCP boundaries, without recourse to including pseudo-absence points.
There is a zone of southeastern Côte d'Ivoire in Figs 3 and 4 where the model predicts nonendemicity and hypoendemicity with patches of mesoendemicity, although it captures a hyperendemic hotspot around the lower Bandama river. The prediction for an area east of this location is consistent with a zone where there are very few S. damnosum s.l. breeding sites in the middle section of the River Comoe and its surrounds (see Figures 2 and 3 of [42]). However, for the lower Bandama river and the lower Comoe river, a previous study by Dadzie et al. (1990) [86] found a high prevalence of onchocerciasis in 11 first-line villages in southern Côte d'Ivoire which, with one exception, were located south of the OCP boundary, but they did remark that the CMFL values were "relatively low". Similarly, hyperendemicity in the Cavally river valley in western Côte d'Ivoire [87], the map for which was erroneously transposed with the one in [88] was missed by the model. Whilst our map estimates show elevated prevalence along stream lines in the southeastern Côte d'Ivoire area, the prevalence is not as high as that found in the Dadzie and co-workers' study. The villages in that study were all situated 1km from known breeding sites in rivers, making this sample not wholly representative of the epidemiological situation in southern Côte d'Ivoire. The villages in our study are located between 0 and 26km from the nearest river (mean = 5.15km, median = 3.86km), and because our pixel resolution is of 5km x 5km, our model may have smoothed over some of the highly localised nature of the elevated endemicity reported in [86]. In addition, the model relied heavily on only two village data sets in this area with a considerable distance between them on a west to east axis, both of which were points of non-endemicity ( Fig 1C). Nevertheless, that these points reflected the true situation in the zone between the Bandama and Comoe rivers is supported by epidemiological data collected in an area due north of Abidjan and due east of Yamoussoukro in October 2014 that revealed only hypoendemic villages. An additional potential explanation is that we may be missing important environmental covariates that would be able to explain the high endemicity in this particular area. In the southern extremes of the study area the land type is dense forest, and forest-type onchocerciasis is known to produce high prevalence; we may be missing some important covariates associated with this bioclime, as most of the area for which the model produces estimates of initial onchocerciasis prevalence belongs to the savannah bioclimes. Finally, in areas with ground truth data, the variable distance to river is helpful for modelling the distribution of endemicity around rivers, but in those without ground truth data, the model may not be able to predict high endemicity levels on the basis of the environmental variables only. Some of these variables are probably not capturing determinants of high endemicity along river courses, such as the locations and productivity of vector-breeding grounds. This may require other environmental information such as that used in the remote sensing model presented in [89].
We confirmed that microfilarial prevalence is statistically significantly and negatively associated with increasing distance from rivers in both the linear term (p-value < 0.01) and the quadratic term (p-value < 0.05). Distance between village and river is an important determinant of vector biting rate [81] and onchocerciasis endemicity when the river contains rapids and other suitable and identifiable vector breeding sites, according to a classification of first, second and third line villages [90,91].
Mean temperature of the wettest quarter (BIO8) was also significantly and negatively correlated with prevalence (p-value < 0.05). Interpretation of this result is rather more difficult because the wettest quarter is necessarily different across different map pixels, but it suggests that conditions for blackfly breeding and transmission may be more favourable in areas in which the majority of precipitation occurs in the cooler parts of the year. This is linked with precipitation seasonality (BIO15), which was also significantly and negatively correlated with infection prevalence. Areas with greater intra-annual variability in rainfall (and therefore possibly representative of more seasonal transmission) were negatively associated with prevalence (the coefficient of variation of precipitation seasonality is the standard deviation of the weekly precipitation estimates expressed as a percentage of the mean of those estimates). Areas with a more constant weekly rate of precipitation (perhaps indicative of perennial transmission), which also occurs during the cooler parts of the year (possibly favouring vector survival), are positively associated with microfilarial prevalence. This is compounded by the negative parameter value for temperature annual range (BIO7). A smaller difference between the highest weekly maximum temperature and the lowest weekly maximum temperature had a positive effect on prevalence (presumably through an effect on developmental rates of blackfly immature stages, and perhaps also of the parasite larval stages within the vectors), which are highly dependent on temperature [92].
The normalized difference vegetation index (NDVI) for the December-February ecological quarter was a statistically significant and negative predictor of microfilarial prevalence (pvalue < 0.05). This quarter follows a period of low precipitation within the OCP area during which the temperature is still relatively mild and the highest NDVI values for this quarter are found only along the coastal regions, which are unsuitable as blackfly breeding habitats. When relaxing the BCIs to 75% (i.e. at p-value < 0.25), NDVI for other ecological quarters became significantly and positively associated with prevalence. In particular, areas with more lush green vegetation during March-May and June-August were positively associated with onchocerciasis prevalence. These greener areas occur within the OCP boundaries, whereas the desert areas situated north of the programme boundaries are extremely arid due to lower rainfall and higher temperatures, with very low NDVI values and negatively associated with the presence of onchocerciasis. Compound topographic index (a proxy for the slope and relative wetness of the land) had a weak effect in the model, as did the precipitation of the wettest quarter (BIO16).
Altitude has been found to be positively correlated with onchocerciasis prevalence in various foci of Cameroon [93], Venezuela [94], and Guatemala [95] up to an altitude of 1,500 metres above sea level, and negatively correlated above that altitude, with most of the proportional increase taking place at altitudes up to 500 m [93][94][95]. However, this association may also depend on the type of river and its geomorphological substrate, determining vector breeding suitability (and in Venezuela also an association with vector species composition and the relative vector competence of the various simuliid species for O. volvulus), and not merely on altitude [96,97]. In this study, altitude was not found to be a significant predictor of onchocerciasis; however, this could be partly explained because both rainfall and temperature variables were included in the model, which may remove some of the effect of altitude on prevalence, and the fact that > 94% of the total land area in the OCP is < 500 metres elevation.

Model Implementation and Validation
In line with other recent approaches to large-scale geostatistical mapping [28][29][30][31][32][33][34][35][36], we used MCMC methods to fit a Bayesian statistical model, which uses linear combinations of the survey data (microfilarial prevalence) and a spatial covariance function to obtain predictive posterior estimates at un-sampled locations. The distribution of such estimates readily translates into measures of uncertainty associated with microfilarial prevalence. For those non-surveyed locations, which are distant from those with data, the posterior distribution tends to be more diffuse, indicating greater uncertainty.
In this study we used 25km 2 map pixels. This was chosen as a compromise between achieving a desirable degree of spatial resolution and minimising computing time, but led to some loss of short-scale variation in prevalence below the chosen value. This size of individual map pixels means that very short-scale variation between closely situated villages is smoothed over and is likely to be important given the exponentially decaying relationship between distance from the river and biting density/proportion of parous flies that occurs within this distance [81]. A consequence of this smoothing effect is that although the predicted surface provides a robust prediction of endemicity at larger scales, it is less capable of representing variations over very short distances, and the estimation of extreme values may be truncated. The output range of prevalence values in those aged 5 years and over in the predictive map was slightly narrower (from 0.5% to 90%) than the input data (from 0% to 95%), a feature of spatial smoothing also reported by others [64,65,98].
Our validation statistics compare reasonably well with those presented in other mapping exercises, such as MAP for 2007 [64] and 2010 [65], which also applied MBG procedures but (in the case of MAP 2007) did not find a strong relationship between malaria prevalence (Plasmodium falciparum in those aged 2-10 years) and a range of remotely-sensed environmental covariates. (In their update of global P. falciparum endemicity map for 2010, Gething et al. [65] succeeded in identifying environmental covariates as predictors of malaria prevalence.) Although we obtained a lower correlation coefficient (0.69 in our case versus 0.82, for the 2007 version of MAP in Africa [62]), our sample size was 7 times lower (737 versus 5,307 input data points). The mean prediction error (0.8% vs. 0.8%, indicating a tendency to overestimate prevalence by less than 1%), and the mean absolute errors (12% vs. 11%) are comparable. (In the 2010 MAP iteration for Africa, the correlation coefficient between predicted and observed values was 0.86; the mean prediction error was -0.9%, indicating a tendency to underestimate prevalence by less than 1%, and the mean absolute error was 12% [65].) The ability to predict endemicity class correctly was also lower (59% vs. 71% in the 2007 MAP), but there was only a value of 3.5% for serious estimation errors (i.e. non-adjacent class classification) when using four endemicity classes (the hypoendemic class was expanded to include the non-endemic class). The ability of the model to allocate locations correctly into endemicity class partly depends on the boundaries and number of categories chosen to represent endemicity levels, and therefore it is not a wholly unbiased metric of map accuracy. (With n endemicity classes there are (n -1) 2 -(n -1) chances for misclassification-allocating to a non-adjacent endemicity class-in the n×n matrix; hence, the probability of misclassification increases non-linearly, asymptotically approaching 1, with increasing n).

Population at Risk and Numbers Infected with O. volvulus in 1975
The United Nations estimated the 1975 population for the 11 countries covered by the OCP at 52.4 million [70]. Our back-calculation sets this figure at 52.5 million (only 0.2% higher). Boatin [9] stated that by the time the OCP ended in 2002, 27 years later, the estimated population in the area covered by the programme was over 78 million.
The Third Report of the Expert Committee on Onchocerciasis, published in 1987 [99], indicated that in the OCP area, the number of persons at risk (as assessed in 1984-1985) was 7.2 million, with 2.4 million infected (a ratio of 3:1). Since this assessment was made approximately 10 years after the commencement of antivectorial operations in the seven initial countries of the OCP (namely, Benin, Burkina Faso, Côte d'Ivoire, Ghana, Mali, Niger and Togo), the report excluded persons living in previously endemic areas of these countries where transmission was deemed to be controlled from the calculations of the numbers at risk. In those areas, the numbers of persons infected with O. volvulus were calculated as 25% of those who would have been infected at the beginning of the control programme [99]. Consequently, the numbers infected at the start of the OCP would have been 5.9 million in the initial 7 countries, and 6.8 million adding Guinea, Guinea Bissau, Senegal, and Sierra Leone (which were incorporated later). According to our mapping exercise, the numbers of persons at risk would have been 2.5 times as large as the number reported in 1987, at 17.8 million (representing the total population aged 5 years and above living in rural areas within the OCP borders). The number of people with onchocerciasis would have been 7.55 million, i.e. three times as large as the number documented in 1987 (which gives directional support to the report's consideration that the figures given would have been 25% of those prior to control). The report also acknowledges that the figures for the prevalence of infection most likely underestimated the true extent of the problem, as they were obtained by extrapolation from relatively small sample surveys, without accurate knowledge of the local distribution of endemic foci (and certainly without considering spatial correlation between survey points). Survey villages were selected from historical records, for their hyperendemicity of onchocerciasis, and other criteria aimed to ensure long-term evaluation of vector control. Village selection was somewhat independent of the entomological evaluation network, and consequently, although successful in evaluating the vector control campaign, it could not have described the distribution and severity of the disease adequately and did not allow for correct estimates to be made of the number of people infected with O. volvulus [100].
A study of the distribution of onchocerciasis was conducted in 1987 in the Western Extension [100], partly conceived to overcome these limitations. After discarding population sizes greater than 800 inhabitants, the authors calculated a rural population of 4.5 million persons and reported that, of these, 1.5 million persons were infected. Using the Gridded Population of the World dataset, and satellite-derived GRUMP data as a clipping mask to remove urban extents, our model estimated that the rural population at risk within the Western Extension in 1975 would have been 4.5 million, with 1.9 million persons infected. The congruence between our study and that of [100] is excellent. Another study by De Sole et al. [101] investigated the distribution and severity of onchocerciasis in Southern Benin, Togo and Ghana. There is also excellent concordance between the hyperendemic areas delineated by the maps in our study and those areas proposed to be at elevated risk of onchocerciasis-associated blindness in the De Sole et al. [101] study. The district of Pru on the north-western fringes of Lake Volta in Ghana is particularly congruent between the two studies. Both of the De Sole et al. studies [100,101] were carried out in proposed extension areas in order to determine the pre-control geographical distribution of O. volvulus in such areas. Where this type of surveys was carried out, accurate knowledge of the distribution of pre-control onchocerciasis prevalence was obtained, now confirmed with the modern mapping methods we have applied. However, in the original OCP areas, where the main focus was not to determine the distribution of the disease but to identify vector breeding sites to monitor vector control operations, the pre-control prevalence of onchocerciasis was greatly underestimated [99].
Finally, Coffeng et al. [26] estimated that before the inception of APOC in 1995 about 32 million people in APOC areas were infected with onchocerciasis (Zouré et al. [37] estimated a figure of 35.6 million in 2011 had interventions not been implemented). Adding these figures to our 7.55 million infected prior to the inception of the OCP, we obtained a total value of 40-43 million people infected in Africa at baseline. This total lends support to the recent estimation of 37 million people infected with O. volvulus in the continent [8,102], and supersedes the frequently quoted, but grossly underestimated, previous figure of about 18 million infected [99].

Directions for Future Work and Concluding Remarks
The Disease Reference Group for Helminth Infections (DRG4) of the UNICEF/UNDP/World Bank/WHO Special Programme for Research and Training in Tropical Diseases (TDR), recently identified that an essential pre-requisite for statistical and dynamical modelling to effectively aid and inform helminthiasis control programmes, is that of creating open access, high quality, epidemiology and control databases [103]. It is the aim of other infectious disease mapping projects (e.g. Global Atlas of Helminth Infections, http://www.thiswormyworld.org; Malaria Atlas Project, http://www.map.ox.ac.uk; Global Atlas of Trachoma Prevalence, http:// www.trachomaatlas.org) to become an open-access information resource on the distribution of these infections, to which the community of public health workers can contribute their own data. Progress towards this goal has been made particularly for soil-transmitted helminthiases.
Among the questions for onchocerciasis that need addressing, are those concerning: a) how best to quantify the progress of the aforementioned control programmes, describing in time and space changes in infection prevalence, intensity, and morbidity after the commencement of interventions?; b) how to identify in time and space hotspots of persistent transmission and investigate the causes for such persistence (e.g. poor geographic/therapeutic coverage and compliance vs. decreased drug efficacy)?; c) how best to evaluate the feasibility of local elimination of the infection reservoir and the probability of recrudescence or reinvasion once interventions cease? and d) how is the burden of onchocercal disease changing under sustained control [26,104]?
The DRG4 group also identified, as a modelling research priority, the need for linking MBG maps with population dynamics frameworks to both update the information on endemicity in future iterations of the maps; to compare the outputs with those, derived from transmission models, of long-term projections of the impact of interventions on infection prevalence, intensity, and associated disease [26,103], and to project in which foci the 2020 goals will be achieved with current tools and where they will not be reached within reasonable time horizons without implementing novel and/or complementary control interventions. An insightful, albeit complex, analysis might be obtained by approaching this question as a spatio-temporal problem with interventions (of at least two kinds, namely antivectorial and antiparasitic) occurring at different times in different places. Finally, it would be possible to extend the geostatistical model developed here to map the endemicity of onchocerciasis in APOC's countries. APOC has made remarkable progress using REMO to guide the implementation of CDTI [105], and model-based geostatistical maps (based on the spatial dependency component of the model, without inclusion of environmental covariates), have recently been published [37]. However, application of the methodology described here may make it possible to refine estimation of infection prevalence at locations for which no ground-truth data are available by using information from the identified environmental covariates (such as distance to nearest river). It would also be crucial to develop MBG tools to jointly map onchocerciasis-loiasis endemicity and co-endemicity [34,[106][107][108] as the risk of severe adverse effects due to ivermectin treatment in the latter [109,110] presently hinders progress towards the control and elimination of onchocerciasis in the continent by 2020.

Acknowledgments
The work presented in this paper would not have been possible without the tireless and longstanding contributions made by all the previous directors, programme managers, scientific and technical advisory boards and funding bodies of the OCP. We are especially grateful to the community distributors and mobile teams who originally surveyed the villages within the OCP, collecting the valuable epidemiological data used in this study. Their work is a valuable and lasting contribution towards the goal of eliminating NTDs. We also thank Dr Bernard Philippon for advice regarding specific treatment interventions in certain watercourses of the OCP. The Malaria Modelling Group at the MRC Centre for Outbreak Analysis and Modelling, Imperial College London, generously allowed us time on their HPC cluster and we thank Dr Wes Hinsley in particular for support in that regard. The authors would also like to thank members of the Malaria Atlas Mapping (MAP) project from Oxford University, UK for helpful comments and advice regarding methodological issues presented in this paper. We especially would like to thank Peter Gething, Samir Bhatt, Daniel Weiss, and Nick Golding.