Geostatistical Model-Based Estimates of Schistosomiasis Prevalence among Individuals Aged ≤20 Years in West Africa

Background Schistosomiasis is a water-based disease that is believed to affect over 200 million people with an estimated 97% of the infections concentrated in Africa. However, these statistics are largely based on population re-adjusted data originally published by Utroska and colleagues more than 20 years ago. Hence, these estimates are outdated due to large-scale preventive chemotherapy programs, improved sanitation, water resources development and management, among other reasons. For planning, coordination, and evaluation of control activities, it is essential to possess reliable schistosomiasis prevalence maps. Methodology We analyzed survey data compiled on a newly established open-access global neglected tropical diseases database (i) to create smooth empirical prevalence maps for Schistosoma mansoni and S. haematobium for individuals aged ≤20 years in West Africa, including Cameroon, and (ii) to derive country-specific prevalence estimates. We used Bayesian geostatistical models based on environmental predictors to take into account potential clustering due to common spatially structured exposures. Prediction at unobserved locations was facilitated by joint kriging. Principal Findings Our models revealed that 50.8 million individuals aged ≤20 years in West Africa are infected with either S. mansoni, or S. haematobium, or both species concurrently. The country prevalence estimates ranged between 0.5% (The Gambia) and 37.1% (Liberia) for S. mansoni, and between 17.6% (The Gambia) and 51.6% (Sierra Leone) for S. haematobium. We observed that the combined prevalence for both schistosome species is two-fold lower in Gambia than previously reported, while we found an almost two-fold higher estimate for Liberia (58.3%) than reported before (30.0%). Our predictions are likely to overestimate overall country prevalence, since modeling was based on children and adolescents up to the age of 20 years who are at highest risk of infection. Conclusion/Significance We present the first empirical estimates for S. mansoni and S. haematobium prevalence at high spatial resolution throughout West Africa. Our prediction maps allow prioritizing of interventions in a spatially explicit manner, and will be useful for monitoring and evaluation of schistosomiasis control programs.


Introduction
Schistosomiasis is a water-based disease caused by trematodes of the genus Schistosoma. The five schistosome species that are known to infect humans are Schistosoma mansoni, S. haematobium, S. intercalatum, S. mekongi, and S. japonicum. School-aged children are at highest risk of infection and are the main target group for interventions [1].
Despite successful efforts to control schistosomiasis in different parts of the world, more than 200 million individuals are still estimated to be infected and the annual global burden due to schistosomiasis might exceed 4.5 million disability-adjusted life years (DALYs) lost [1][2][3]. A substantial amount of this burden is concentrated in West Africa, including Cameroon. Indeed, 72 million infections are thought to occur in this part of the world [4]. However, the current statistics, as presented by Chitsulo et al. (2000) [4], Steinmann et al. (2006) [2], and Utzinger et al. (2009) [5], are largely based on population re-adjusted data originally published by Utroska and colleagues in the late 1980s [6]. Hence, the estimates are likely to be outdated due to, among other reasons, large-scale preventive chemotherapy campaigns, improved sanitation, water resources development and management, and socio-economic development.
Recently, donors have provided new funds to control the socalled neglected tropical diseases (NTDs), including schistosomiasis. For cost-effective planning and evaluation of control activities, it is essential to have reliable baseline maps of the geographical distribution of at-risk population and disease burden. Early schistosomiasis mapping efforts have been based on climatic suitability thresholds [7,8]. These maps are not reliable because they are not based on disease data. Apart from a few studies [9][10][11][12], empirical maps of disease distribution over large areas are not available since there is a paucity of contemporary large-scale survey data.
The first comprehensive compilation of historical schistosomiasis prevalence surveys at a global scale was carried out by Doumenge et al. in the mid-1980s [13]. More recent collections are available by Brooker et al. (2010) [14] for soil-transmitted helminthiasis and schistosomiasis, but data access is limited. The European Union (EU)-funded CONTRAST project initiated the development of an open-access global NTD database, which is updated in real time (GNTD database; http://www.gntd.org) [15]. A key objective of CONTRAST is to employ this database for large-scale schistosomiasis prevalence mapping and prediction in sub-Saharan Africa for the spatial refinement of control interventions and the cost-effective allocation of resources.
Geographical locations in close proximity share common exposures which influence the disease outcome similarly. The geographical information of the survey locations in the GNTD database allows taking into account the potential spatial correlation and therefore creation of more realistic models. Standard statistical modeling approaches assume independence between locations [16]. Ignoring potential spatial correlation in neighboring areas due to common exposures could result in incorrect model estimates [17]. Geostatistical models take into account spatial clustering by introducing location-specific random effect parameters in the covariance matrix by a function of distance between locations [16]. Such models typically contain large numbers of parameters and cannot be estimated by the commonly used maximum likelihood approaches [18]. Bayesian model formulations enable model fit via Markov chain Monte Carlo (MCMC) simulations [16].
Bayesian geostatistical models have been applied in mapping schistosomiasis at different spatial scales, for example by Raso et al. (2005) [19] in the region of Man, western Côte d'Ivoire, and Clements et al. (2008) [9] in Mali, Niger, and Burkina Faso. Brooker et al. (2010) [14] developed a global predictive map highlighting those areas where preventive chemotherapy against schistosomiasis and soil-transmitted helminthiasis are warrant. However, to our knowledge, there is neither a model-based S. haematobium nor a S. mansoni large-scale prevalence map and spatially explicit burden estimates for the whole West African region.

Author Summary
Schistosomiasis is a parasitic disease caused by a blood fluke that mainly occurs in Africa. Current prevalence estimates of schistosomiasis are based on historical data, and hence might be outdated due to control programs, improved sanitation, and water resources development and management (e.g., construction of large dams and irrigation systems). To help planning, coordination, and evaluation of control activities, reliable schistosomiasis prevalence estimates are needed. We analyzed compiled survey data from 1980 onwards for West Africa, including Cameroon, focusing on individuals aged #20 years. Bayesian geostatistical models were implemented based on environmental and climatic predictors to take into account potential spatial clustering within the data. We created the first smooth data-driven prevalence maps for Schistosoma mansoni and S. haematobium at high spatial resolution throughout West Africa. We found that an estimated 50.8 million West Africans aged #20 years are infected with schistosome blood flukes. Country prevalence estimates ranged between 0.5% (in The Gambia) and 37.1% (in Liberia) for S. mansoni and between 17.6% (in The Gambia) and 51.6% (in Sierra Leone) for S. haematobium. Our results allow prioritization of areas where interventions are needed, and to monitor and evaluate the impact of control activities. In this paper, we developed Bayesian geostatistical models based on environmental and climatic risk factors to obtain reliable empirical schistosomiasis prevalence maps for individuals aged #20 years by analyzing the GNTD data for West Africa, including Cameroon. Prediction was based on joint kriging in order to summarize the results as population-adjusted country prevalence estimates. Emphasis was placed on the distribution of S. haematobium and S. mansoni. We neglected S. intercalatum due to low infection risks, especially outside Cameroon.

Disease data
The GNTD database was used to obtain prevalence data on schistosomiasis. This database assembles general information about the type of publication, authors, and publication year, as well as study-specific information about survey population, survey period, Schistosoma species, diagnostic test employed, and the number of infected individuals among those examined, stratified by age and sex (if available). Hospital studies, data on specific susceptible groups (such as HIV positives), and postintervention studies were not included in the database [15]. For this study, we analyzed all point-level data on settled populations in West Africa on either S. haematobium or S. mansoni: 4550 and 2611 survey locations, respectively. We excluded (i) surveys with missing geographical coordinates; (ii) missing numbers of individuals screened; (iii) surveys carried out before 1980; (iv) individuals aged .20 years; and (v) entries based on certain diagnostic techniques. With regard to the latter exclusion criteria, we rejected all non-direct diagnostic examination techniques, such as immunofluorescence tests, antigen detections or questionnaire data, and direct fecal smears that have very low diagnostic sensitivities (overall, 4% of the data for S. mansoni and 0.1% for S. haematobium were excluded). Hence, the surveys included were mainly based on the Kato-Katz thick smear method (S. mansoni) and urine filtration or sedimentation Figure 1. Study profile. Schematic overview of the study profiling process. The numbers in brackets in the acute-angled boxes represent the number of survey locations (which may not be unique) included in the current GNTD dataset, while the numbers outside the boxes represent the amount of survey dropped due to the reason given in the boxes with rounded corners. doi:10.1371/journal.pntd.0001194.g001 Schistosomiasis Prevalence in West Africa www.plosntds.org (S. haematobium). Sensitivity and specificity of the diagnostic techniques were not incorporated in the model due to usually unknown sampling effort (e.g., number of stool samples, number of slides examined under a microscope, etc.), which affect diagnostic accuracy.
We assumed that the proportion of rejected diagnostic techniques among the data with missing information on the technique (S. mansoni: 33.5% missing, S. haematobium: 20.6% missing) is similar. Therefore, we considered the bias that would arise from ignoring the missing data as larger than the bias from potentially rejected diagnostic techniques among the missing data. A separate model validation on the reduced datasets confirmed that by including data with incomplete records the predictive ability increased compared to the model excluding this information (results not presented).
Climatic, environmental, and population data Climatic, environmental, and population data were obtained from different freely accessible remote sensing data sources, as summarized in Table 1. Data on day and night temperature were extracted from land surface temperature (LST) data. The normalized difference vegetation index (NDVI) was used as a proxy for vegetation. Digitized maps on freshwater body sources (e.g., rivers, lakes, and wetlands) in West Africa were acquired with the characteristic of being either perennial or temporary.
Processing of the MODIS/Terra data was carried out using the 'MODIS Reprojection Tool' [20] and code implemented in Fortran 90 [21] to summarize the temporal changes by an overall yearly average based either on the mean (NDVI, day and night LST) or the mode (land cover). Furthermore, the land cover categories, as defined by the International Geosphere-Biosphere Programme, were re-grouped into six categories as follows: (i) sparsely vegetated; (ii) deciduous forest and savanna; (iii) evergreen forest; (iv) cropland; (v) urban; and (vi) wet areas. Rainfall estimates were processed via the software IDIRSI 32 [22]. Yearly averaged rainfall was calculated as summary measure. Distance calculations to the nearest freshwater body source were done in ArcMap version 9.2 of the Environmental Systems Research Institute (ESRI; Redlands, CA, USA) [23].
A classification scheme of West Africa into ecological zones was obtained using a demo version of the Earth Resources Data Analysis System Imagine 9.3 software [24]. The datasets were subjected to an unsupervised classification, via the 'Iterative Self-Organizing Data Analysis Technique' (ISODATA), to map areas of environmental clustering which were further summarized into five main classes based on between-class similarities. The resulting  [25] and the classes can be interpreted as (i) desert/semi-desert; (ii) sahelian zone; (iii) savannah; (iv) forest; and (v) tropical rainforest.
Population count data obtained from LandScan for 2008 were converted to 565 km spatial resolution and adjusted to 2010 using country-specific average annual rates of change for 2005-2010 provided by the United Nations (UN) [26]. Estimates for the percentage of individuals aged #20 years among the total population per country were extracted from the U.S. Census Bureau International Database [27] for the year 2010. Population counts were linked to the percentage of children. The estimated number of infected individuals #20 years was calculated by combining a sample of the joint predictive posterior distribution of the disease prevalence predicted at pixel level with the population size of that age group within the pixel. The predictive posterior distribution of the number of infected individuals per country was estimated by summing up the pixel-samples and calculating summary statistics. The combined schistosomiasis prevalence (infection with S. mansoni or S. haematobium or both) was calculated on the assumption that the two infections are independent from   Extraction of the remotely sensed data at the survey locations and at the prediction locations for the two databases was performed via a self written Fortran 90 code. The prediction surface for West Africa was built in ArcMap [23] with a spatial resolution of 0.05u60.05u (approximately 565 km) resulting in approximately 220,000 pixels covering the study region. The data were displayed in ArcMap.

Statistical analysis
For each Schistosoma species, bivariate logistic regressions were performed in STATA/IC 10.1 [28] in order to assess potential covariates in relation to the outcome (the number of infected individuals over the number of individuals screened per location). Continuous covariates were categorized into four groups based on quartiles to account for potential non-linearity in the outcomepredictor relationship on the logit. The Bayesian information criterion (BIC) was employed to detect whether linear or categorized covariates on the logits have smaller BIC and therefore predict the outcome more accurately. We used the following covariates in both linear and categorical scales: altitude, day LST, night LST, rainfall, NDVI, and distance to the nearest freshwater body. The type of freshwater body, ecological zone, and land cover were measured in categorical dimensions.
Relevance of continuous or categorized covariates to predict the outcome was assessed based on p-values resulting from likelihood ratio tests (LRTs) at significance levels of 0.15. All significant covariates were included in the Bayesian analysis.
Bayesian geostatistical logistic regression models were fitted with location-specific random effects. Spatial correlation was modeled assuming that the random effects follow a multivariate normal distribution with variance-covariance matrix related to an exponential correlation function between any pair of locations. via Bayesian kriging using joint predictive posterior distributions [16]. Due to computational issues, we modeled the multivariate Gaussian spatial process separately for each country. The performance of the models was assessed using model validation via different approaches: mean predictive errors (ME), mean absolute predictive errors (MAE), discriminatory performance on a 50% prevalence cut-off, and Bayesian credible interval (BCI) comparisons [17]. Further details pertaining to the Bayesian geostatistical model, sub-sampling, and model validation approaches are given in the Appendix S1.

Final datasets and preliminary statistics
A schematic overview of the study profile on obtaining prevalence data on schistosomiasis from the GNTD is given in Figure 1. The final datasets consisted of 1993 and 1179 survey locations for S. haematobium and S. mansoni, respectively, out of which 1722 and 1094 locations were unique. Observed prevalence of the survey locations ranged from 0% to 100% for each Schistosoma species with mean prevalence of 31.0% (median 15.0%, standard deviation (SD) 29.0%) for S. haematobium, and 17.7% (median 0.0%, SD 24.4%) for S. mansoni. The distribution and the prevalence level of the survey locations are shown in Figures 2 and  3 for S. haematobium and S. mansoni, respectively. An overview of the number of surveys with details given regarding sampling period, diagnostic technique, survey type, and mean prevalence, stratified by country, is given in Table 2.
Spatial distributions of potential covariates influencing the distribution of schistosomiasis are presented in Figure 4. Bivariate logistic regressions of the continuous factors in relation to the disease outcomes showed that categorical variables predicted better based on BIC values than linear variables for both Schistosoma species (results not presented). Each potential covariate considered for the analyses had a p-value of ,0.001 based on LRTs and was therefore included in the multivariate analyses. Backwards logistic regressions demonstrated the importance of the whole set of covariates for each species. The resulting odds ratios (ORs) of bivariate and multivariate non-spatial logistic regressions are summarized in Table 3 for S. haematobium, and Table 4 for S. mansoni. The only non-significant outcome-predictor relations in a multivariate framework for the former species were yearly averaged precipitation between 300 mm and 399 mm, and NDVI levels between 0.33 and 0.52. For the latter species, only altitude levels of at least 500 m above sea level and night LSTs between 20.0uC and 20.7uC were non-significant.

Spatial modeling outcomes
Model parameter estimates for S. haematobium and S. mansoni are presented in Table 3 and Table 4, respectively. Introduction of spatial correlation led to changes in the significance of covariates    Figure 5A presents the prevalence map for S. haematobium based on the median of the predictions. Low-prevalence areas (predicted infection prevalence ,10%) were primarily observed in the Sahara, Cameroon, north-west Côte d'Ivoire, and Senegal. Prevalence .50% are mainly spread along the Niger River, in Sierra Leone, east/central Senegal, and south Nigeria. The map of the SD of model predictions for this species ( Figure 5B) demonstrates that small prediction errors were primarily found around the survey locations used for sub-sampling.

Schistosomiasis prevalence maps
The median spatial S. mansoni prevalence map is shown in Figure 6A with the corresponding error presented in Figure 6B. High-prevalence areas (predicted prevalence .50%) were mainly found in north-east Liberia, east Côte d'Ivoire, west Ghana, north/central Benin, west Nigeria, north Cameroon, and central Mali in close proximity to Niger River. Very low prevalence areas (predicted prevalence ,10%) were predominant in Senegal, The Gambia, Guinea-Bissau, Mauritania, and Niger. Furthermore, low prevalence areas were predicted for north Mali, south Togo, and parts of Cameroon. Areas of high prediction accuracy were found around the sub-sampled survey locations and in desert/semi-desert ecological zones. Table 5 shows population-adjusted country prevalence estimates. For S. haematobium, prevalence estimates range between 17.6% (The Gambia) and 51.6% (Sierra Leone), whereas for S. mansoni they range between 0.5% (The Gambia) and 37.1% (Liberia). S. haematobium was found to be the predominant species throughout West Africa with a difference compared to S. mansoni of up to 30% in Burkina Faso and a minimum difference of about 4% in Liberia. Combined Schistosoma prevalence estimates, assuming independence of the occurrence of the two species, varied from 18.1% (The Gambia) to 58.3% (Liberia) with high numbers of infected individuals aged #20 years (more than 5 million) in Ghana and Nigeria. Lower numbers (,1 million) of infected individuals aged #20 years were found in The Gambia, Guinea-Bissau, Liberia, and Mauritania. The overall number of infected individuals aged #20 years in West Africa is 50.8 million.

Model validation results
Model validation based on 80% of the survey locations resulted in MEs of 21.7 for S. haematobium and 0.0 for S. mansoni, and respective MAEs of 19.5 and 7.3. The percentage of test locations correctly predicted by 95% BCIs was 72.9% for S. haematobium, and 72.5% for S. mansoni. ME and MAE comparisons between spatial and exchangeable random effect models showed that spatial models result in better predictive ability (S. haematobium: ME = 3.8, MAE = 27.7; S. mansoni: ME = 20.8, MAE = 14.9).
Discriminatory performance based on a 50% prevalence cut-off showed that the models correctly predicted 93.2% and 76.9% of the validation locations for S. mansoni and S. haematobium, respectively. False-high predictions were obtained for 5.5% (S. mansoni) and 18.8% (S. haematobium) of the test locations.

Discussion
To our knowledge, we provide the first model-based prevalence maps for both S. haematobium and S. mansoni for individuals aged #20 years in West Africa, including Cameroon. We used a readily available open-access database consisting of a large number of historical and contemporary geolocated and standardized survey data [15], coupled with Bayesian-based geostatistical tools. Standard geostatistical methods are not able to handle large numbers of survey locations due to computational problems. Therefore, for the first time, an approximation of the spatial process was implemented in Schistosoma prevalence modeling.
In comparison to existing prevalence estimates, major shortcomings of previous studies have been addressed, and hence our prevalence maps show a higher spatial resolution and we believe that they are more accurate than heretofore. This claim is justified as follows. First, our estimates are based on the GNTD database that has gone live in July 2010, developed as part of the EUfunded CONTRAST project. As of February 2010, the GNTD contained more than 4500 and 2600 unique entries in West Africa for S. haematobium and S. mansoni, respectively. Second, datatailored statistical methods based on Bayesian geostatistical modeling were used in order to incorporate spatial correlation between survey locations and to obtain more accurate estimates of the uncertainty of the predictions. Third, climatic and environmental covariates were employed in the models to evaluate the effect on the disease outcomes. The climatic and environmental factors were obtained at high spatial resolution to be able to predict small hotspots of risk, which could arise due to the focal distribution of schistosomiasis, which is an important epidemiological feature of the disease [32]. An existing S. haematobium prevalence map for three West African countries (i.e., Burkina Faso, Mali, and Niger) using Bayesian geostatistical modeling was previously presented by Clements et al. (2008) [33]   level of schistosomiasis prevalence but rather probabilities that the predicted prevalence is above a pre-defined cut-off, arbitrarily set at 50%. This cut-off has been proposed by the World Health Organization (WHO) [1] to distinguish between low and high risk areas, and hence such maps are useful to detect areas where preventive chemotherapy might be warranted on an annual basis. However, the maps do not provide detailed information for lower risk areas or the number of infected individuals and they cannot be used for monitoring and evaluation purposes following interventions. A more recent publication by   [9] presented a S. haematobium prevalence map for the same three West African countries. This map shows similar patterns to our map with the exception of north Burkina Faso. In this area, Clements and colleagues predicted prevalence levels of 10-20% for high and low egg-intensities, while our estimates suggest much higher prevalence (.50%). These discrepancies are most likely due to differences in the underlying survey data. The Clements et al. data were only partially included in the GNTD database as we could not access them fully. The estimated spatial correlation for both Schistosoma species was very strong with spatial ranges of approximately 400 km. Previously reported spatial ranges in parts of West Africa vary between 7.5 km [19] and approximately 180 km [33]. However, these estimates were based on recent surveys, and hence influenced by recently established control programs. Interventions are likely to reduce the predictive power of environmental and climatic factors on the distribution of schistosomiasis and, thus, reduce spatial correlation. Similar effects were found for malaria, where historic data showed stronger spatial correlation [34] than recent surveys [35,36].
We overlaid population data adjusted to 2010 on the predicted prevalence surfaces for the two Schistosoma species in order to obtain country-specific estimates of the number of infected individuals aged #20 years. Previous country estimates, for instance those presented by Chitsulo et al.  [5], are interpolations of limited observations for a whole country, and hence lack empirical modeling. Chitsulo and colleagues reported a higher number of infected people for West Africa (71.8 million) compared to our estimate (50.8 million). Of note, the Chitsulo et al. estimates are based on the whole population, while our new estimates concern the age group #20 years. Moreover, the Chitsulo et al. estimates pertain to mid-1990s population estimates, compared to our adjusted estimates for the year 2010. In countries like Cameroon, The Gambia, Ghana, and Liberia, characterized by high rural-tourban migration in the last decade, the Chitsulo et al. prevalence estimates should be treated with care due to rapid urbanization. Our study revealed that the combined prevalence of S. haematobium and S. mansoni in The Gambia, for example, is two-fold lower than previously reported by Chitsulo et al. (18.1% vs. 37.5%). However, in Benin, Guinea, Liberia, Nigeria, and Togo, we found prevalence estimates that are more than 10 percentage points higher than the previous estimates. On the one hand, differences might be related to sparse data, for example, in Benin, The Gambia, Guinea, Guinea-Bissau, Liberia, Mauritania, Nigeria, and Sierra Leone. Previous estimates failed to take into account model-based predictions on the basis of climate, environment and disease data. Since we modeled disease prevalence on individuals aged #20 years (highest risk groups), the prevalence estimates correspond to the former risk group. Therefore they are likely to overestimate the prevalence in the whole population.
We estimated the country-specific overall schistosomiasis prevalence by assuming independence between the occurrence of S. haematobium and S. mansoni in each area. However, it is conceivable that simultaneous infections with both species is more frequent than expected by chance in areas where the species coexist as infection pathways are similar and highly behavioral related. Hence, the combined prevalence estimates potentially underestimate the true schistosomiasis situation in West Africa. A modeling approach via joint spatial random effects [37] could assess the effect of potential dependence between the species, but would increase the number of spatial parameters and is therefore computationally challenging.
We might also underestimate schistosomiasis prevalence in Cameroon, Mali, and Nigeria because of the presence of S. intercalatum [4]. We did not include this species in the analysis since the GNTD database currently only contains 17 survey locations outside Cameroon. However, it is assumed that S. intercalatum has a low prevalence [4] and there are signs that this species is further declining in importance [38].
Model validation has shown that the S. haematobium predictions seem to overestimate the actual prevalence, while the S. mansoni model revealed no tendency to over-or underestimate the overall prevalence. The MAE for the S. haematobium model is nearly three times larger than the one for S. mansoni. This is expected because the mean prevalence for S. haematobium was about double than that for S. mansoni. Our models correctly predict about 72% of the survey locations when considering 95% BCIs. We are encouraged by these results, since perfect predictions are rather unlikely in reality due to the complexity of disease transmission.
However, our models are based on assumptions, which could influence model performance. We assumed that the diagnostic techniques employed have similar ability to detect an infection, but different diagnostic techniques show differences in sensitivity and specificity, which also depends on the overall prevalence and infection intensity [39]. This might have led to an underestimation of prevalence due to the imperfect sensitivity of direct diagnostic techniques [39]. Additional model parameters accounting for the performance of the different diagnostic techniques could be incorporated in the models. However in the absence of detailed information regarding sampling effort, assumptions would be required which may be debatable and introduce additional biases. We are currently examining the effect of different approaches on addressing this issue on the modelbased predictions.
We did not adjust the outcome according to age and sex even though the age groups differ and especially school surveys are likely to include more boys than girls due to prevailing cultural issues in many parts of West Africa. Therefore, our results are likely to be biased and potentially overestimate schistosome prevalence. However, many publications do not present stratified results by these subgroups. Age-adjustment models are feasible but difficult to implement because age-prevalence curves have to be fitted for different transmission settings [40]. Furthermore, disease data are often reported at wide age ranges (i.e., school-aged children) and individuals might not be well distributed within the range introducing bias even though an age-prevalence model is taken into account. Surveys are typically conducted in endemic areas leading to high observed prevalence levels. This could result in an overestimation of prevalence in the present analysis. However, in the data we analyzed, 45% of the locations for S. haematobium and 73% for S. mansoni had an observed prevalence levels below 10%. We therefore assume that a location selection bias is unlikely.
Another concern is the large amount of zero outcomes (i.e., none of the study participants found to be infected) especially for S. mansoni (S. mansoni: 54.1%; S. haematobium: 20.1%). To overcome this issue, zero-inflated models need to be incorporated, which modify the likelihood function and add an additional model parameter capturing the over-dispersion arising by the zeros [41].  The models presented in this manuscript did only include spatial random errors, and hence we ignored potential measurement errors. Inclusion of location-specific non-spatial error terms might have improved model predictions. However, locationspecific non-spatial error terms would have doubled the number of error terms leading to highly parameterized models.
We further assumed isotropic stationary models. Non-stationary models imply that the spatial random effect is varying from one region to another and is not stable throughout the study area [35]. This assumption has been confirmed by semi-variogram comparisons showing that the estimated spatial range parameters for S. mansoni differ between eco-zones. However, semi-variogram analyses did not indicate non-stationarity in the spatial distribution of S. haematobium. Isotropic models assume that the spatial correlation is the same within the same distance irrespective of direction [42]. This assumption might not be valid since intermediate host snails spread along rivers and lakeshores and, therefore, introduce correlation attributed to directions.
The choice and size of sub-sampled locations required to adequately approximate the spatial Gaussian process is a research area on its own in spatial statistics. Many different approaches are available to optimize selection. We implemented a method based on semi-variogram comparisons. This selection is aiming to preserve the spatial surface of the original dataset. However, it might fail to identify a sub-sample, which minimizes the prediction error. The spatially averaged predictive variance (SAPV) method proposed by Finley is trying to optimize the variance in the predictions, but implementation is computationally highly demanding [43].
Time-dependent covariates, such as the climatic factors, might have changed between the 1980s and the 2000s. However, our geographical covariates were solely based on recent remote sensing data (from 2000 onwards), because historical remote sensing data are, to our knowledge, not freely available at high spatial and temporal resolution. The long run averages of the recent data enable us to maintain high spatial resolution although they cannot capture variation in the observed outcome due to unusual climatic conditions or climate change that might have occurred since the 1980s and 1990s.
Preliminary residual analyses suggest that there is only weak temporal correlation in the data. We therefore only modeled a spatial rather than a spatio-temporal process. This led to a more parsimonious model and facilitated model fit. Nevertheless, we incorporated temporal trends in the prevalence estimation by including the survey year as covariate. Both Schistosoma species showed that the predicted prevalence was highest during the 1990s. This increase might be explained by water resources development and management activities (e.g., the construction of dams and irrigation systems), political unrests and civil restructuring. Water resources development and management projects might have improved the suitability of the environment for snail intermediate hosts that might have spread into previously snailfree zones together with the parasites. Since the beginning of the new millennium, a number of large-scale preventive chemother-apy programs are underway in parts of West Africa and it will be important to monitor how the prevalence of schistosomiasis changes in space and over time. The effectiveness of control interventions may vary across areas but, to our knowledge, a comprehensive database compiling this information with high spatio-temporal resolution has yet to be established.
Concluding, our country-specific Schistosoma prevalence estimates and numbers of individuals aged #20 years infected with either S. mansoni, or S. haematobium, or both species concurrently presented here are useful tools for disease control managers and other stakeholders to support decision-making on interventions.
Our maps can also serve as a benchmark to monitor the impact of control interventions and for long-term evaluation on transmission dynamics. Model-based estimates in areas with scarce data and high uncertainty could be improved by additional surveys to enhance our knowledge on the distribution of schistosomiasis and disease burden. We plan to further expand this work to other regions and address the issues of non-stationarity, diagnostic sensitivity, and age-heterogeneity across surveys. Finally, we will test the assumption of independence between the Schistosoma species to improve accuracy of the joint prevalence estimates.

Supporting Information
Alternative Language Abstract S1 Geostatistische modellbasierte Abschä tzungen zur Hä ufigkeit von Schistosomiasis in Westafrika fü r Personen im Alter von maximal 20 Jahren -Translation of abstract into German by Nadine Schur.

(PDF)
Appendix S1 Supporting information on geostatistical model formulation, spatial process approximation and model validation.